Functions for converting sequence data (long or wide format) into transition frequency matrices and other useful representations.

Convert long or wide format sequence data into a transition frequency matrix. Counts how many times each transition from state_i to state_j occurs across all sequences.

Usage

frequencies(
  data,
  action = "Action",
  id = NULL,
  time = "Time",
  cols = NULL,
  format = c("auto", "long", "wide")
)

Arguments

data

Data frame containing sequence data in long or wide format.

action

Character. Name of the column containing actions/states (for long format). Default: "Action".

id

Character vector. Name(s) of the column(s) identifying sequences. For long format, each unique combination of ID values defines a sequence. For wide format, used to exclude non-state columns. Default: NULL.

time

Character. Name of the time column used to order actions within sequences (for long format). Default: "Time".

cols

Character vector. Names of columns containing states (for wide format). If NULL, all non-ID columns are used. Default: NULL.

format

Character. Format of input data: "auto" (detect automatically), "long", or "wide". Default: "auto".

Value

A square integer matrix of transition frequencies where mat[i, j] is the number of times state i was followed by state j. Row and column names are the sorted unique states. Can be passed directly to tna::tna().

Details

For long format data, each row is a single action/event. Sequences are defined by the id column(s), and actions are ordered by the time column within each sequence. Consecutive actions within a sequence form transition pairs.
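The long-format counting described above can be sketched in base R. This is an illustration only, not the package's implementation: `count_long` is a hypothetical helper, and it assumes a single id column and a sortable time column.

```r
# Hypothetical helper (not part of the package): count transitions in long data.
count_long <- function(data, action = "Action", id = "Actor", time = "Time") {
  states <- sort(unique(as.character(data[[action]])))
  mat <- matrix(0L, length(states), length(states),
                dimnames = list(states, states))
  # One chunk of rows per sequence, assuming a single id column
  for (seq_rows in split(data, data[[id]])) {
    # Order the actions of this sequence by time
    acts <- as.character(seq_rows[[action]][order(seq_rows[[time]])])
    if (length(acts) < 2) next
    from <- acts[-length(acts)]  # every action except the last
    to <- acts[-1]               # every action except the first
    for (k in seq_along(from)) {
      mat[from[k], to[k]] <- mat[from[k], to[k]] + 1L
    }
  }
  mat
}

long <- data.frame(
  Actor = rep(1:2, each = 3), Time = rep(1:3, 2),
  Action = c("A", "B", "C", "B", "A", "C")
)
long_counts <- count_long(long)
```

Here actor 1 contributes A to B and B to C, and actor 2 contributes B to A and A to C, so `long_counts` holds four transitions in total. No pair is formed across the boundary between the two actors.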

For wide format data, each row is a sequence and columns represent consecutive time points. Transitions are counted across consecutive columns, skipping any NA values.
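A comparable base R sketch for wide data follows. `count_wide` is a hypothetical helper, not the package's code. It takes one possible reading of "skipping NA values": NAs are dropped from each row before neighbours are paired, so `A, NA, B` would count as one A to B transition. The package may treat this case differently.

```r
# Hypothetical helper (not part of the package): count transitions in wide data.
count_wide <- function(data, cols = names(data)) {
  vals <- as.matrix(data[, cols, drop = FALSE])
  states <- sort(unique(as.vector(vals[!is.na(vals)])))
  mat <- matrix(0L, length(states), length(states),
                dimnames = list(states, states))
  for (r in seq_len(nrow(vals))) {
    row_states <- as.vector(vals[r, ])
    # Assumption: NAs are removed first, then the remaining neighbours are paired
    row_states <- row_states[!is.na(row_states)]
    if (length(row_states) < 2) next
    from <- row_states[-length(row_states)]
    to <- row_states[-1]
    for (k in seq_along(from)) {
      mat[from[k], to[k]] <- mat[from[k], to[k]] + 1L
    }
  }
  mat
}

seqs <- data.frame(V1 = c("A", "B", "A"), V2 = c("B", "A", "C"), V3 = c("A", "C", "B"))
wide_counts <- count_wide(seqs)
```

Each of the three rows has three states and therefore two transitions, so `wide_counts` sums to six. Rows index the "from" state and columns the "to" state, matching the `mat[i, j]` convention in the Value section.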

See also

convert_sequence_format for converting to other representations (frequency counts, one-hot, edge lists).

Examples

# Wide format
seqs <- data.frame(V1 = c("A","B","A"), V2 = c("B","A","C"), V3 = c("A","C","B"))
freq <- frequencies(seqs, format = "wide")

# Long format
long <- data.frame(
  Actor = rep(1:2, each = 3), Time = rep(1:3, 2),
  Action = c("A","B","C","B","A","C")
)
freq <- frequencies(long, action = "Action", id = "Actor")