Build a Higher-Order Network (HON) — build

Constructs a Higher-Order Network from sequential data, faithfully implementing the BuildHON algorithm (Xu, Wickramarathne & Chawla, 2016).

The algorithm detects when a first-order Markov model is insufficient to capture sequential dependencies and automatically creates higher-order nodes. Uses KL-divergence to determine whether extending a node's history provides significantly different transition distributions.

Usage

build_hon(
  data,
  max_order = 5L,
  min_freq = 1L,
  collapse_repeats = FALSE,
  method = "hon+"
)

Arguments

data

One of:

data.frame: rows are trajectories, columns are time steps. Trailing NAs are stripped. All non-NA values are coerced to character.
list: each element is a character (or coercible) vector representing one trajectory.
tna: a tna object with sequence data. Numeric state IDs are automatically converted to label names.
netobject: a netobject with sequence data.

max_order

Integer. Maximum order of the HON. Default 5. The algorithm may produce lower-order nodes if the data do not justify higher orders.

min_freq

Integer. Minimum frequency for a transition to be considered. Transitions observed fewer than min_freq times are treated as zero. Default 1.

collapse_repeats

Logical. If TRUE, adjacent duplicate states within each trajectory are collapsed before analysis. Default FALSE.

method

Character. Algorithm to use: "hon+" (default, parameter-free BuildHON+ with lazy observation building and MaxDivergence pruning) or "hon" (original BuildHON with eager observation building).

Value

An S3 object of class "net_hon" containing:

matrix: Weighted adjacency matrix (rows = from, cols = to). Rows and columns use readable arrow notation (e.g., "A -> B").
edges: Data frame with columns: path (full state sequence, e.g., "A -> B -> C"), from (context/conditioning states), to (predicted next state), count (raw frequency), probability (transition probability), from_order, to_order.
nodes: Character vector of HON node names in arrow notation.
n_nodes: Number of HON nodes.
n_edges: Number of edges.
first_order_states: Character vector of unique original states.
max_order_requested: The max_order parameter used.
max_order_observed: Highest order actually present.
min_freq: The min_freq parameter used.
n_trajectories: Number of trajectories after parsing.
directed: Logical. Always TRUE.

Details

Node naming convention: Higher-order nodes use readable arrow notation. A first-order node is simply "A". A second-order node representing the context "came from A, now at B" is "A -> B". Third-order: "A -> B -> C", etc.

Algorithm overview:

Count all subsequence transitions up to max_order + 1.
Build probability distributions, filtering by min_freq.
For each first-order source, recursively test whether extending the history (adding more context) produces a significantly different distribution (via KL-divergence vs. an adaptive threshold).
Build the network from the accepted rules, rewiring edges so higher-order nodes are properly connected.

References

Xu, J., Wickramarathne, T. L., & Chawla, N. V. (2016). Representing higher-order dependencies in networks. Science Advances, 2(5), e1600028.

Saebi, M., Xu, J., Kaplan, L. M., Ribeiro, B., & Chawla, N. V. (2020). Efficient modeling of higher-order dependencies in networks: from algorithm to application for anomaly detection. EPJ Data Science, 9(1), 15.

Examples

# \donttest{
# From list of trajectories
trajs <- list(
  c("A", "B", "C", "D", "A"),
  c("A", "B", "D", "C", "A"),
  c("A", "B", "C", "D", "A")
)
hon <- build_hon(trajs, max_order = 3, min_freq = 1)
print(hon)
#> Higher-Order Network (HON)
#>   Nodes:        4 (4 first-order states)
#>   Edges:        7
#>   Max order:    1 (requested 3)
#>   Min freq:     1
#>   Trajectories: 3
summary(hon)
#> Higher-Order Network (HON) Summary
#>   Nodes: 4 | Edges: 7 | Trajectories: 3
#>   First-order states: A, B, C, D
#>   Max order observed: 1 (requested: 3)
#>   Min frequency: 1
#>   Node order distribution:
#>     Order 1: 4 nodes

# From data.frame (rows = trajectories)
df <- data.frame(T1 = c("A", "A"), T2 = c("B", "B"),
                 T3 = c("C", "D"), T4 = c("D", "C"))
hon <- build_hon(df, max_order = 2)
# }