Constructs a k-th order De Bruijn graph from sequential trajectory data and uses a hypergeometric null model to detect paths with anomalous frequencies. Paths occurring significantly more or less often than expected under the null model are flagged as over- or under-represented, respectively.
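The graph-construction step can be illustrated with a short Python sketch. It is not this package's implementation. It assumes that for order k each node is a window of k consecutive states and each edge is a window of k + 1 states, so a path such as "A -> B -> C" would be one edge of the k = 2 graph. The function names `de_bruijn_edges` and `edge_table` are hypothetical, and the "->" formatting of the context is also an assumption.

```python
from collections import Counter


def de_bruijn_edges(trajectories, k=2):
    """Count edges of a k-th order De Bruijn graph (illustrative sketch).

    Assumption: each node is a window of k consecutive states and each
    edge is a window of k + 1 states, i.e. a path of length k.
    """
    counts = Counter()
    for seq in trajectories:
        # Slide a window of k + 1 states along the trajectory.
        for i in range(len(seq) - k):
            window = tuple(seq[i:i + k + 1])
            # Edge from the first k states to the last k states.
            counts[(window[:-1], window[1:])] += 1
    return counts


def edge_table(counts):
    """Lay edge counts out like the path / from / to / observed columns."""
    rows = []
    for (src, dst), observed in sorted(counts.items()):
        rows.append({
            "path": " -> ".join(src + dst[-1:]),  # full state sequence
            "from": " -> ".join(src),             # context (assumed format)
            "to": dst[-1],                        # next state
            "observed": observed,
        })
    return rows


trajectories = [["A", "B", "C"], ["A", "B", "C", "A"], ["B", "B", "C"]]
for row in edge_table(de_bruijn_edges(trajectories, k=2)):
    print(row)
```

With k = 2 the three toy trajectories yield three distinct paths: A -> B -> C (observed twice), B -> B -> C and B -> C -> A. Trajectories shorter than k + 1 states contribute no edges.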
Arguments
- data
A data.frame (rows = trajectories), a list of character vectors, a tna object, or a netobject with sequence data. For tna/netobject inputs, numeric state IDs are automatically converted to label names.
- k
Integer. Order of the De Bruijn graph (default 2). Detects anomalies in paths of length k.
- alpha
Numeric. Significance threshold for anomaly classification (default 0.05). Paths with HYPA score < alpha are under-represented; paths with score > 1-alpha are over-represented.
- min_count
Integer. Minimum observed count for a path to be classified as anomalous (default 2). Paths observed fewer than min_count times are always classified as "normal" regardless of their HYPA score, since very rare paths (such as single occurrences) are unreliable.
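The way alpha and min_count combine can be sketched as a small decision rule in Python. This is an illustration, not the package's code: the function name `classify_path` is hypothetical, and only the "normal" label comes from the documentation above, so the "over" and "under" strings are assumptions about what the anomaly column might hold.

```python
def classify_path(hypa_score, observed, alpha=0.05, min_count=2):
    """Label one path using the alpha / min_count rule (illustrative sketch).

    The labels "over" and "under" are assumed; only "normal" is documented.
    """
    if observed < min_count:
        # Too few observations: never flagged, whatever the score.
        return "normal"
    if hypa_score < alpha:
        return "under"
    if hypa_score > 1 - alpha:
        return "over"
    return "normal"


examples = [(0.01, 5), (0.99, 5), (0.5, 5), (0.999, 1)]
print([classify_path(score, n) for score, n in examples])
# ['under', 'over', 'normal', 'normal']
```

The min_count check runs first, so a path seen once stays "normal" even when its score is extreme.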
Value
An object of class net_hypa with components:
- scores
Data frame with columns path, from, to, observed, expected, ratio, hypa_score, and anomaly. The path column shows the full state sequence (e.g., "A -> B -> C"); from is the context (conditioning states); to is the next state; ratio is observed / expected.
- adjacency
Weighted adjacency matrix of the De Bruijn graph.
- xi
Fitted propensity matrix of the hypergeometric null model.
- k
Order of the De Bruijn graph.
- alpha
Significance threshold used.
- n_anomalous
Number of anomalous paths detected.
- n_over
Number of over-represented paths.
- n_under
Number of under-represented paths.
- n_edges
Total number of edges in the De Bruijn graph.
- nodes
Node names in the De Bruijn graph.
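The expected, ratio and hypa_score columns can be illustrated with a simplified Python sketch. It assumes, following the HYPA method, that an edge's score is the cumulative probability P(X ≤ observed) for X ~ Hypergeometric(M, Xi_uv, m), where Xi_uv is the edge's propensity, M the sum of all propensities and m the total observed edge weight; expected is then the hypergeometric mean m · Xi_uv / M. Unlike the fitted xi returned above, the sketch uses the plain product of out-strength and in-strength as the propensity. The function names, and the choice n_edges = number of scored edges, are hypothetical.

```python
from collections import defaultdict
from math import comb


def hypergeom_cdf(x, population, successes, draws):
    """P(X <= x) for X ~ Hypergeometric(population, successes, draws)."""
    lo = max(0, draws - (population - successes))
    hi = min(x, successes, draws)
    acc = sum(comb(successes, i) * comb(population - successes, draws - i)
              for i in range(lo, hi + 1))
    return acc / comb(population, draws)


def hypa_scores(edge_counts):
    """Score each observed edge of a De Bruijn graph with tuple-valued nodes.

    Simplifying assumption: the propensity of a valid edge u -> v is
    out_strength(u) * in_strength(v), not an iteratively fitted xi.
    """
    out_s, in_s = defaultdict(int), defaultdict(int)
    for (u, v), w in edge_counts.items():
        out_s[u] += w
        in_s[v] += w
    nodes = set(out_s) | set(in_s)
    # Only pairs whose states overlap (u without its first state equals
    # v without its last state) are possible De Bruijn edges.
    xi = {(u, v): out_s[u] * in_s[v]
          for u in nodes for v in nodes
          if u[1:] == v[:-1] and out_s[u] * in_s[v] > 0}
    population = sum(xi.values())       # M: total propensity
    draws = sum(edge_counts.values())   # m: total observed edge weight
    rows = []
    for (u, v), observed in sorted(edge_counts.items()):
        expected = draws * xi[(u, v)] / population  # hypergeometric mean
        rows.append({
            "path": " -> ".join(u + v[-1:]),
            "observed": observed,
            "expected": expected,
            "ratio": observed / expected,
            "hypa_score": hypergeom_cdf(observed, population, xi[(u, v)], draws),
        })
    return rows


def summarize(rows, alpha=0.05, min_count=2):
    """Count anomalies with the alpha / min_count rule from the arguments."""
    eligible = [r for r in rows if r["observed"] >= min_count]
    n_over = sum(r["hypa_score"] > 1 - alpha for r in eligible)
    n_under = sum(r["hypa_score"] < alpha for r in eligible)
    return {"n_over": n_over, "n_under": n_under,
            "n_anomalous": n_over + n_under, "n_edges": len(rows)}


edges = {
    (("A", "B"), ("B", "C")): 5, (("A", "B"), ("B", "A")): 1,
    (("C", "B"), ("B", "C")): 1, (("C", "B"), ("B", "A")): 5,
}
rows = hypa_scores(edges)
for r in rows:
    print(r)
print(summarize(rows))
```

In this toy graph every valid edge has the same propensity, so each expected count is 12 · 36 / 144 = 3. The two paths observed five times therefore get ratio 5/3 and a higher score than the two observed once.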