Converts binary indicator (one-hot) data into the wide sequence format
expected by build_network and tna::tna(). Each binary
column represents a state; rows where the value is 1 are marked with the
column name. Supports optional windowed aggregation.
Usage
prepare_onehot(
data,
cols,
actor = NULL,
session = NULL,
interval = NULL,
window_size = 1L,
window_type = c("non-overlapping", "overlapping"),
aggregate = FALSE
)Arguments
- data
Data frame with binary (0/1) indicator columns.
- cols
Character vector. Names of the one-hot columns to use.
- actor
Character or NULL. Name of the actor/ID column. If NULL, all rows are treated as a single sequence. Default: NULL.
- session
Character or NULL. Name of the session column for sub-grouping within actors. Default: NULL.
- interval
Integer or NULL. Number of rows per time point in the output. If NULL, all rows become a single time point group. Default: NULL.
- window_size
Integer. Number of consecutive rows to aggregate into each window. Default: 1 (no windowing).
- window_type
Character.
"non-overlapping"(fixed, separate windows) or"overlapping"(rolling, step = 1). Default:"non-overlapping".- aggregate
Logical. If TRUE, aggregate within each window by taking the first non-NA indicator per column. Default: FALSE.
Value
A data frame in wide format with columns named
W{window}_T{time} where each cell contains a state name or NA.
Attributes windowed, window_size, window_span
are set on the result.
See also
action_to_onehot for the reverse conversion.
Examples
# \donttest{
# Simple binary data
df <- data.frame(
A = c(1, 0, 1, 0, 1),
B = c(0, 1, 0, 1, 0),
C = c(0, 0, 0, 0, 0)
)
seq_data <- prepare_onehot(df, cols = c("A", "B", "C"))
# With actor grouping
df$actor <- c(1, 1, 1, 2, 2)
seq_data <- prepare_onehot(df, cols = c("A", "B", "C"), actor = "actor")
# With windowing
seq_data <- prepare_onehot(df, cols = c("A", "B", "C"),
window_size = 2, window_type = "non-overlapping")
# }