Functions for converting sequence data (long or wide format) into transition frequency matrices and other useful representations.
Convert long or wide format sequence data into a transition frequency matrix. The function counts how many times each transition from state i to state j occurs across all sequences.
Usage
frequencies(
  data,
  action = "Action",
  id = NULL,
  time = "Time",
  cols = NULL,
  format = c("auto", "long", "wide")
)
Arguments
- data
Data frame containing sequence data in long or wide format.
- action
Character. Name of the column containing actions/states (for long format). Default: "Action".
- id
Character vector. Name(s) of the column(s) identifying sequences. For long format, each unique combination of ID values defines a sequence. For wide format, the ID columns are excluded from the set of state columns. Default: NULL.
- time
Character. Name of the time column used to order actions within sequences (for long format). Default: "Time".
- cols
Character vector. Names of columns containing states (for wide format). If NULL, all non-ID columns are used. Default: NULL.
- format
Character. Format of input data: "auto" (detect automatically), "long", or "wide". Default: "auto".
Value
A square integer matrix of transition frequencies where
mat[i, j] is the number of times state i was followed by state j.
Row and column names are the sorted unique states. The matrix can be passed
directly to tna::tna().
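To make the shape of the return value concrete, the following base R sketch builds a matrix with the same layout from transition pairs written out by hand. It does not call frequencies(); the vectors from and to are hypothetical illustration data, and the result shows only what the description above implies (sorted state names, integer counts).

```r
# Transition pairs written out by hand for the sequences A-B-A, B-A-C, A-C-B.
# `from` and `to` are illustration data, not output of frequencies().
from <- c("A", "B", "B", "A", "A", "C")
to   <- c("B", "A", "A", "C", "C", "B")

# Sorted unique states become the row and column names
states <- sort(unique(c(from, to)))

# Cross-tabulate the pairs, then store the counts as a plain integer matrix
counts <- table(factor(from, levels = states), factor(to, levels = states))
mat <- matrix(as.integer(counts), nrow = length(states),
              dimnames = list(states, states))

mat["A", "C"]  # 2: state A was followed by state C twice
```

Rows index the state a transition starts from and columns index the state it moves to, so mat["A", "C"] and mat["C", "A"] are different counts.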
Details
For long format data, each row is a single action/event. Sequences
are defined by the id column(s), and actions are ordered by the
time column within each sequence. Consecutive actions within a
sequence form transition pairs.
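The long-format procedure can be sketched in base R as follows. This is a minimal illustration of the steps described above, not the package's implementation: count_long_sketch is a hypothetical helper, and it assumes a single ID column for simplicity.

```r
# Hypothetical helper illustrating the long-format steps; not part of the package.
# Assumes exactly one ID column.
count_long_sketch <- function(data, action = "Action", id = "Actor", time = "Time") {
  # Order rows by sequence ID, then by time within each sequence
  data <- data[order(data[[id]], data[[time]]), , drop = FALSE]
  states <- sort(unique(data[[action]]))

  # One vector of ordered actions per sequence
  by_seq <- split(data[[action]], data[[id]])

  # Pair each action with its successor inside the same sequence
  from <- unlist(lapply(by_seq, function(s) s[-length(s)]), use.names = FALSE)
  to   <- unlist(lapply(by_seq, function(s) s[-1]), use.names = FALSE)

  counts <- table(factor(from, levels = states), factor(to, levels = states))
  matrix(as.integer(counts), nrow = length(states),
         dimnames = list(states, states))
}

# Rows are deliberately out of time order to show that ordering matters
long <- data.frame(
  Actor  = rep(1:2, each = 3),
  Time   = rep(c(3, 1, 2), 2),
  Action = c("C", "A", "B", "C", "B", "A"),
  stringsAsFactors = FALSE
)
m_long <- count_long_sketch(long)
```

After ordering, actor 1 follows A, B, C and actor 2 follows B, A, C, so no transition is counted across the boundary between the two actors.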
For wide format data, each row is a sequence and columns represent
consecutive time points. Transitions are counted across consecutive columns,
skipping any NA values.
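A comparable base R sketch for wide format is shown below. The helper count_wide_sketch is hypothetical, and it rests on one reading of "skipping any NA values": NAs are dropped from a row first, and the remaining states are then paired in order. The package may handle NAs differently.

```r
# Hypothetical helper illustrating wide-format counting; not part of the package.
# Assumption: NAs are removed from each row before consecutive states are paired.
count_wide_sketch <- function(data, cols = names(data)) {
  rows <- as.matrix(data[, cols, drop = FALSE])
  states <- sort(unique(as.vector(rows[!is.na(rows)])))
  mat <- matrix(0L, length(states), length(states),
                dimnames = list(states, states))

  for (r in seq_len(nrow(rows))) {
    s <- rows[r, ]
    s <- s[!is.na(s)]            # drop missing time points
    if (length(s) < 2) next      # no transition possible
    for (k in seq_len(length(s) - 1)) {
      mat[s[k], s[k + 1]] <- mat[s[k], s[k + 1]] + 1L
    }
  }
  mat
}

# Second sequence has a missing middle time point
seqs_na <- data.frame(
  V1 = c("A", "B", "A"),
  V2 = c("B", NA, "C"),
  V3 = c("A", "C", "B"),
  stringsAsFactors = FALSE
)
m_wide <- count_wide_sketch(seqs_na)
```

Under this assumption the second row (B, NA, C) contributes a single B to C transition.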
See also
convert_sequence_format for converting to other
representations (frequency counts, one-hot, edge lists).
Examples
# \donttest{
# Wide format
seqs <- data.frame(V1 = c("A","B","A"), V2 = c("B","A","C"), V3 = c("A","C","B"))
freq <- frequencies(seqs, format = "wide")
# Long format
long <- data.frame(
  Actor = rep(1:2, each = 3), Time = rep(1:3, 2),
  Action = c("A","B","C","B","A","C")
)
freq <- frequencies(long, action = "Action", id = "Actor")
# }