Constructs higher-order De Bruijn graphs from sequential trajectory data and selects the optimal Markov order using AIC, BIC, or likelihood ratio tests.
Usage
build_mogen(
data,
max_order = 5L,
criterion = c("aic", "bic", "lrt"),
lrt_alpha = 0.01
)Arguments
- data
A data.frame (rows = trajectories, columns = time points) or a list of character/numeric vectors (one per trajectory).
- max_order
Integer. Maximum Markov order to test (default 5).
- criterion
Character. Model selection criterion:
"aic"(default),"bic", or"lrt"(likelihood ratio test).- lrt_alpha
Numeric. Significance threshold for LRT (default 0.01).
Value
An object of class net_mogen with components:
- optimal_order
Selected optimal Markov order.
- criterion
Which criterion was used for selection.
- orders
Integer vector of tested orders (0 to max_order).
- aic
Named numeric vector of AIC values per order.
- bic
Named numeric vector of BIC values per order.
- log_likelihood
Named numeric vector of log-likelihoods.
- dof
Named integer vector of cumulative DOF per model.
- layer_dof
Named integer vector of per-layer DOF.
- transition_matrices
List of transition matrices (index 1 = order 0).
- states
Unique first-order states.
- n_paths
Number of trajectories.
- n_observations
Total number of state observations.
Details
At order k, nodes are k-tuples of states and edges represent transitions between overlapping k-tuples. The model tests increasingly complex Markov orders and selects the one that best balances fit and parsimony.
References
Scholtes, I. (2017). When is a Network a Network? Multi-Order Graphical Model Selection in Pathways and Temporal Networks. KDD 2017.
Gote, C. & Scholtes, I. (2023). Predicting variable-length paths in networked systems using multi-order generative models. Applied Network Science, 8, 62.
Examples
# \donttest{
trajs <- list(c("A","B","C","D"), c("A","B","D","C"),
c("B","C","D","A"), c("C","D","A","B"))
m <- build_mogen(trajs, max_order = 3)
print(m)
#> Multi-Order Generative Model (MOGen)
#> Optimal order: 2 (by aic)
#> Orders tested: 0 to 3
#> States: 4
#> Paths: 4 (16 observations)
#> AIC: 50.4 | 28.7 | 26.7 | 26.7
plot(m)
# }