Aggregates node-level network weights to cluster-level summaries. Computes both between-cluster transitions (how clusters connect to each other) and within-cluster transitions (how nodes connect within each cluster).
Usage
cluster_summary(
x,
clusters = NULL,
method = c("sum", "mean", "median", "max", "min", "density", "geomean"),
type = c("tna", "cooccurrence", "semi_markov", "raw"),
directed = TRUE,
compute_within = TRUE
)
csum(
x,
clusters = NULL,
method = c("sum", "mean", "median", "max", "min", "density", "geomean"),
type = c("tna", "cooccurrence", "semi_markov", "raw"),
directed = TRUE,
compute_within = TRUE
)Arguments
- x
Network input. Accepts multiple formats:
- matrix
Numeric adjacency/weight matrix. Row and column names are used as node labels. Values represent edge weights (e.g., transition counts, co-occurrence frequencies, or probabilities).
- netobject
A cograph network object. The function extracts the weight matrix from
x$weightsor converts viato_matrix(). Clusters can be auto-detected from node attributes.- tna
A tna object from the tna package. Extracts
x$weights.- cluster_summary
If already a cluster_summary, returns unchanged.
- clusters
Cluster/group assignments for nodes. Accepts multiple formats:
- NULL
(default) Auto-detect from netobject. Looks for columns named 'clusters', 'cluster', 'groups', or 'group' in
x$nodes. Throws an error if no cluster column is found. This option only works whenxis a netobject.- vector
Cluster membership for each node, in the same order as the matrix rows/columns. Can be numeric (1, 2, 3) or character ("A", "B"). Cluster names will be derived from unique values. Example:
c(1, 1, 2, 2, 3, 3)assigns first two nodes to cluster 1.- data.frame
A data frame where the first column contains node names and the second column contains group/cluster names. Example:
data.frame(node = c("A", "B", "C"), group = c("G1", "G1", "G2"))- named list
Explicit mapping of cluster names to node labels. List names become cluster names, values are character vectors of node labels that must match matrix row/column names. Example:
list(Alpha = c("A", "B"), Beta = c("C", "D"))
- method
Aggregation method for combining edge weights within/between clusters. Controls how multiple node-to-node edges are summarized:
- "sum"
(default) Sum of all edge weights. Best for count data (e.g., transition frequencies). Preserves total flow.
- "mean"
Average edge weight. Best when cluster sizes differ and you want to control for size. Note: when input is already a transition matrix (rows sum to 1), "mean" avoids size bias. Example: cluster with 5 nodes won't have 5x the weight of cluster with 1 node.
- "median"
Median edge weight. Robust to outliers.
- "max"
Maximum edge weight. Captures strongest connection.
- "min"
Minimum edge weight. Captures weakest connection.
- "density"
Sum divided by number of possible edges. Normalizes by cluster size combinations.
- "geomean"
Geometric mean of positive weights. Useful for multiplicative processes.
- type
Post-processing applied to aggregated weights. Determines the interpretation of the resulting matrices:
- "tna"
(default) Row-normalize so each row sums to 1. Creates transition probabilities suitable for Markov chain analysis. Interpretation: "Given I'm in cluster A, what's the probability of transitioning to cluster B?" Required for use with tna package functions. Diagonal represents within-cluster transition probability.
- "raw"
No normalization. Returns aggregated counts/weights as-is. Use for frequency analysis or when you need raw counts. Compatible with igraph's contract + simplify output.
- "cooccurrence"
Symmetrize the matrix: (A + t(A)) / 2. For undirected co-occurrence analysis.
- "semi_markov"
Row-normalize with duration weighting. For semi-Markov process analysis.
- directed
Logical. If
TRUE(default), treat network as directed. A->B and B->A are separate edges. IfFALSE, edges are undirected and the matrix is symmetrized before processing.- compute_within
Logical. If
TRUE(default), compute within-cluster transition matrices for each cluster. Each cluster gets its own n_i x n_i matrix showing internal node-to-node transitions. Set toFALSEto skip this computation for better performance when only between-cluster summary is needed.
Value
A cluster_summary object (S3 class) containing:
- between
List with two elements:
- weights
k x k matrix of cluster-to-cluster weights, where k is the number of clusters. Row i, column j contains the aggregated weight from cluster i to cluster j. Diagonal contains within-cluster totals. Processing depends on
type.- inits
Numeric vector of length k. Initial state distribution across clusters, computed from column sums of the original matrix. Represents the proportion of incoming edges to each cluster.
- within
Named list with one element per cluster. Each element contains:
- weights
n_i x n_i matrix for nodes within that cluster. Shows internal transitions between nodes in the same cluster.
- inits
Initial distribution within the cluster.
NULL if
compute_within = FALSE.- clusters
Named list mapping cluster names to their member node labels. Example:
list(A = c("n1", "n2"), B = c("n3", "n4", "n5"))- meta
List of metadata:
- type
The
typeargument used ("tna", "raw", etc.)- method
The
methodargument used ("sum", "mean", etc.)- directed
Logical, whether network was treated as directed
- n_nodes
Total number of nodes in original network
- n_clusters
Number of clusters
- cluster_sizes
Named vector of cluster sizes
See cluster_summary.
Details
This is the core function for Multi-Cluster Multi-Level (MCML) analysis.
Use as_tna() to convert results to tna objects for further
analysis with the tna package.
Workflow
Typical MCML analysis workflow:
# 1. Create network
net <- build_network(data, method = "relative")
net$nodes$clusters <- group_assignments
# 2. Compute cluster summary
cs <- cluster_summary(net, type = "tna")
# 3. Convert to tna models
tna_models <- as_tna(cs)
# 4. Analyze/visualize
plot(tna_models$macro)
tna::centralities(tna_models$macro)Between-Cluster Matrix Structure
The macro$weights matrix has clusters as both rows and columns:
Off-diagonal (row i, col j): Aggregated weight from cluster i to cluster j
Diagonal (row i, col i): Within-cluster total (sum of internal edges in cluster i)
When type = "tna", rows sum to 1 and diagonal values represent
"retention rate" - the probability of staying within the same cluster.
Choosing method and type
| Input data | Recommended | Reason |
| Edge counts | method="sum", type="tna" | Preserves total flow, normalizes to probabilities |
| Transition matrix | method="mean", type="tna" | Avoids cluster size bias |
| Frequencies | method="sum", type="raw" | Keep raw counts for analysis |
| Correlation matrix | method="mean", type="raw" | Average correlations |
Examples
# -----------------------------------------------------
# Basic usage with matrix and cluster vector
# -----------------------------------------------------
mat <- matrix(runif(100), 10, 10)
rownames(mat) <- colnames(mat) <- LETTERS[1:10]
clusters <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
cs <- cluster_summary(mat, clusters)
# Access results
cs$macro$weights # 3x3 cluster transition matrix
#> 1 2 3
#> 1 0.3096851 0.3024384 0.3878765
#> 2 0.3022261 0.2590855 0.4386884
#> 3 0.2565129 0.2628054 0.4806818
cs$macro$inits # Initial distribution
#> 1 2 3
#> 0.2868688 0.2746885 0.4384427
cs$clusters$`1`$weights # Within-cluster 1 transitions
#> A B C
#> A 0.35884842 0.6027075 0.0384441
#> B 0.07374815 0.4482965 0.4779554
#> C 0.31270223 0.3622318 0.3250660
cs$meta # Metadata
#> $type
#> [1] "tna"
#>
#> $method
#> [1] "sum"
#>
#> $directed
#> [1] TRUE
#>
#> $n_nodes
#> [1] 10
#>
#> $n_clusters
#> [1] 3
#>
#> $cluster_sizes
#> 1 2 3
#> 3 3 4
#>
# -----------------------------------------------------
# Named list clusters (more readable)
# -----------------------------------------------------
clusters <- list(
Alpha = c("A", "B", "C"),
Beta = c("D", "E", "F"),
Gamma = c("G", "H", "I", "J")
)
cs <- cluster_summary(mat, clusters, type = "tna")
cs$macro$weights # Rows/cols named Alpha, Beta, Gamma
#> Alpha Beta Gamma
#> Alpha 0.3096851 0.3024384 0.3878765
#> Beta 0.3022261 0.2590855 0.4386884
#> Gamma 0.2565129 0.2628054 0.4806818
cs$clusters$Alpha # Within Alpha cluster
#> $weights
#> A B C
#> A 0.35884842 0.6027075 0.0384441
#> B 0.07374815 0.4482965 0.4779554
#> C 0.31270223 0.3622318 0.3250660
#>
#> $inits
#> A B C
#> 0.2358790 0.4452836 0.3188374
#>
#> $labels
#> [1] "A" "B" "C"
#>
#> $data
#> NULL
#>
# -----------------------------------------------------
# Auto-detect clusters from netobject
# -----------------------------------------------------
# \donttest{
seqs <- data.frame(
V1 = sample(LETTERS[1:10], 30, TRUE), V2 = sample(LETTERS[1:10], 30, TRUE),
V3 = sample(LETTERS[1:10], 30, TRUE)
)
net <- build_network(seqs, method = "relative")
cs2 <- cluster_summary(net, c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3))
# }
# -----------------------------------------------------
# Different aggregation methods
# -----------------------------------------------------
cs_sum <- cluster_summary(mat, clusters, method = "sum") # Total flow
cs_mean <- cluster_summary(mat, clusters, method = "mean") # Average
cs_max <- cluster_summary(mat, clusters, method = "max") # Strongest
# -----------------------------------------------------
# Raw counts vs TNA probabilities
# -----------------------------------------------------
cs_raw <- cluster_summary(mat, clusters, type = "raw")
cs_tna <- cluster_summary(mat, clusters, type = "tna")
rowSums(cs_raw$macro$weights) # Various sums
#> Alpha Beta Gamma
#> 16.59946 14.45172 19.78782
rowSums(cs_tna$macro$weights) # All equal to 1
#> Alpha Beta Gamma
#> 1 1 1
# -----------------------------------------------------
# Skip within-cluster computation for speed
# -----------------------------------------------------
cs_fast <- cluster_summary(mat, clusters, compute_within = FALSE)
cs_fast$clusters # NULL
#> NULL
# -----------------------------------------------------
# Convert to tna objects for tna package
# -----------------------------------------------------
cs <- cluster_summary(mat, clusters, type = "tna")
tna_models <- as_tna(cs)
# tna_models$macro # tna object
# tna_models$clusters$Alpha # tna object
# \donttest{
mat <- matrix(runif(16), 4, 4)
rownames(mat) <- colnames(mat) <- LETTERS[1:4]
csum(mat, c(1, 1, 2, 2))
#> MCML Network
#> ============
#> Type: tna | Method: sum
#> Nodes: 4 | Clusters: 2
#>
#> Clusters:
#> 1 (2): A, B
#> 2 (2): C, D
#>
#> Macro (cluster-level) weights:
#> 1 2
#> 1 0.5383 0.4617
#> 2 0.4832 0.5168
# }