Convert wide or long sequence data into frequency counts, one-hot encoding, edge lists, or follows format.
Usage
convert_sequence_format(
data,
seq_cols = NULL,
id_col = NULL,
action = NULL,
time = NULL,
format = c("frequency", "onehot", "edgelist", "follows")
)
Arguments
- data
Data frame containing sequence data.
- seq_cols
Character vector. Names of columns containing sequential states (for wide format input). If NULL, all columns except id_col are used. Default: NULL.
- id_col
Character vector. Name(s) of the ID column(s). For wide format, defaults to the first column. For long format, required. Default: NULL.
- action
Character or NULL. Name of the column containing actions/states (for long format input). If provided, data is treated as long format. Default: NULL.
- time
Character or NULL. Name of the time column for ordering actions within sequences (for long format). Default: NULL.
- format
Character. Output format:
- "frequency"
Count of each action per sequence (wide, one column per state).
- "onehot"
Binary presence/absence of each action per sequence.
- "edgelist"
Consecutive transition pairs (from, to) per sequence.
- "follows"
Each action paired with the action that preceded it.
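To make the "frequency" and "onehot" formats concrete, the following base-R sketch computes equivalent tables by hand from a small wide-format data frame. It is an illustration only, not the function's implementation: the toy data, the column names T1 to T3, and the generated `id` column are assumptions made for this example.

```r
# Illustrative base-R sketch; NOT the implementation of convert_sequence_format().
# Hypothetical wide input: one row per sequence, one column per time step.
seqs <- data.frame(
  T1 = c("A", "B", "C"),
  T2 = c("B", "B", "A"),
  T3 = c("A", "C", "C"),
  stringsAsFactors = FALSE
)

# All states observed anywhere in the data, in sorted order.
states <- sort(unique(unlist(seqs, use.names = FALSE)))

# Frequency: for each state, count its occurrences within each row.
counts <- sapply(states, function(s) rowSums(seqs == s))
storage.mode(counts) <- "integer"

# Attach an assumed row-number ID so the shape is "ID + one column per state".
freq <- data.frame(id = seq_len(nrow(seqs)), counts)

# One-hot: 1 if the state occurs at least once in the sequence, else 0.
onehot <- data.frame(id = seq_len(nrow(seqs)), (counts > 0) * 1L)

print(freq)
print(onehot)
```

The one-hot table is derived from the counts, so the two formats always agree on which states occur in a sequence; only the multiplicity is dropped.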
Value
A data frame in the requested format:
- frequency
ID columns + one integer column per state with counts.
- onehot
ID columns + one binary column per state (0/1).
- edgelist
ID columns + from and to columns.
- follows
ID columns + act and follows columns.
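The edge-list shape described above (ID plus from and to) can be sketched in base R as follows. This is a hand-rolled illustration under stated assumptions, not the function's own code: the toy data, the T1 to T3 column names, and the row-number `id` are hypothetical, and missing values are not handled.

```r
# Illustrative base-R sketch; NOT the implementation of convert_sequence_format().
# Hypothetical wide input: one row per sequence, one column per time step.
seqs <- data.frame(
  T1 = c("A", "B"),
  T2 = c("B", "B"),
  T3 = c("C", "A"),
  stringsAsFactors = FALSE
)

# For each sequence, pair every state with the state that comes right after it.
edges <- do.call(rbind, lapply(seq_len(nrow(seqs)), function(i) {
  s <- unlist(seqs[i, ], use.names = FALSE)  # states of sequence i, in order
  data.frame(
    id   = i,            # assumed ID: the row number
    from = head(s, -1),  # every state except the last
    to   = tail(s, -1),  # every state except the first
    stringsAsFactors = FALSE
  )
}))

print(edges)
```

A sequence of n states yields n - 1 transition rows, so the two three-step sequences here produce four rows in total.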
See also
frequencies for building transition frequency matrices.
Examples
# \donttest{
# Wide format input (id_col is NULL, so the first column, V1, is treated as the ID)
seqs <- data.frame(V1 = c("A","B","A"), V2 = c("B","A","C"), V3 = c("A","C","B"))
convert_sequence_format(seqs, format = "frequency")
#>   rid V1 A B C
#> 1   1  A 1 1 0
#> 2   2  B 1 0 1
#> 3   3  A 0 1 1
convert_sequence_format(seqs, format = "edgelist")
#>   V1 from to
#> 1  A    B  A
#> 2  B    A  C
#> 3  A    C  B
# }