Title: A Statistical Framework for Comparing Sets of Trees
Version: 0.0.1
Description: Statistical framework for comparing sets of trees using hypothesis testing methods. Designed for transmission trees, phylogenetic trees, and directed acyclic graphs (DAGs), the package implements chi-squared tests to compare edge frequencies between sets and PERMANOVA to analyse topological dissimilarities with customisable distance metrics, following Anderson (2001) <doi:10.1111/j.1442-9993.2001.01070.pp.x>.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.2
Imports: igraph, treespace, vegan
Suggests: knitr, rmarkdown
VignetteBuilder: knitr
URL: https://cygei.github.io/mixtree/
NeedsCompilation: no
Packaged: 2025-03-03 11:48:59 UTC; cg1521
Author: Cyril Geismar
Maintainer: Cyril Geismar <c.geismar21@imperial.ac.uk>
Repository: CRAN
Date/Publication: 2025-03-05 13:00:06 UTC
Compute the Abouheif distance matrix
Description
The Abouheif distance is the product of the number of direct descendants of each node in the path between two nodes. It is a measure of the number of transmission events between two nodes.
Usage
abouheif(tree)
Arguments
tree
A data frame representing a transmission tree, with the first column containing the infector IDs and the second the infectee IDs.
Value
A square, symmetric matrix of Abouheif distances between nodes.
Examples
tree <- data.frame(from = c(1, 1, 2, 2, 3, 3), to = c(2, 3, 4, 5, 6, 7))
abouheif(tree)
Perform Chi-Square Test on Sets of Transmission Trees
Description
Tests whether the distribution of infector-infectee pairs differs between sets of transmission trees.
Usage
chisq_test(..., method = c("chisq", "fisher"), test_args = list())
Arguments
...
Two or more sets of transmission trees. Each set is a list of data frames with columns from and to.
method
Test to use: "chisq" or "fisher".
test_args
A list of additional arguments for the underlying test function.
Value
An htest object with the test results.
Examples
set.seed(1)
# No difference in the sets
setA <- replicate(10, igraph::as_long_data_frame(
  make_tree(n_cases = 10, R = 2, stochastic = TRUE)
), simplify = FALSE)
setB <- replicate(10, igraph::as_long_data_frame(
  make_tree(n_cases = 10, R = 2, stochastic = TRUE)
), simplify = FALSE)
chisq_test(setA, setB)
# Difference in the sets
setC <- replicate(10, igraph::as_long_data_frame(
  make_tree(n_cases = 10, R = 4, stochastic = TRUE)
), simplify = FALSE)
chisq_test(setA, setB, setC)
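The comparison of infector-infectee pair frequencies described for chisq_test can be pictured as a contingency table of pairs by set. The sketch below is an illustration under stated assumptions, not the package's implementation: pair_table() is a hypothetical helper, trees are assumed to have columns from and to, and passing the table to stats::chisq.test() with a simulated p-value is only one plausible way to run the test.

```r
# Hypothetical helper: count how often each infector-infectee pair
# occurs in each set of trees (assumes columns `from` and `to`).
pair_table <- function(...) {
  sets <- list(...)
  rows <- do.call(rbind, lapply(seq_along(sets), function(i) {
    edges <- do.call(rbind, sets[[i]])  # stack all trees of set i
    data.frame(
      pair = paste(edges$from, edges$to, sep = "->"),
      set = i
    )
  }))
  table(pair = rows$pair, set = rows$set)
}

# Two tiny hand-made sets of two trees each
setA <- list(
  data.frame(from = c(1, 1), to = c(2, 3)),
  data.frame(from = c(1, 2), to = c(2, 3))
)
setB <- list(
  data.frame(from = c(1, 2), to = c(2, 3)),
  data.frame(from = c(1, 2), to = c(2, 3))
)

tab <- pair_table(setA, setB)

# Monte Carlo p-value, because the expected counts are small here
res <- stats::chisq.test(tab, simulate.p.value = TRUE)
```

Each row of tab is one infector-infectee pair and each column one set, so the test asks whether the rows are distributed alike across the columns.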
Calculate the Euclidean distance between two distance matrices.
Description
This function computes the Euclidean distance between the lower triangular parts of two given matrices.
Usage
euclidean(mat1, mat2)
Arguments
mat1
A numeric matrix.
mat2
A numeric matrix.
Value
A numeric value representing the Euclidean distance between the lower triangular parts of mat1 and mat2.
Examples
mat1 <- matrix(c(1, 2, 3, 4), 2, 2)
mat2 <- matrix(c(4, 3, 2, 1), 2, 2)
euclidean(mat1, mat2)
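As a rough illustration of the computation described for euclidean, the sketch below takes the lower triangles of two equally sized matrices and computes the usual Euclidean norm of their difference. The name euclidean_sketch is hypothetical, and excluding the diagonal is an assumption; the package function may treat the diagonal differently.

```r
# Sketch: Euclidean distance between the lower triangles of two matrices.
# Assumption: the diagonal is excluded (lower.tri() default, diag = FALSE).
euclidean_sketch <- function(mat1, mat2) {
  stopifnot(identical(dim(mat1), dim(mat2)))
  diff <- mat1[lower.tri(mat1)] - mat2[lower.tri(mat2)]
  sqrt(sum(diff^2))
}

a <- matrix(c(0, 3, 4,
              3, 0, 0,
              4, 0, 0), nrow = 3, byrow = TRUE)
b <- matrix(0, nrow = 3, ncol = 3)

# Lower triangle of `a` is (3, 4, 0), so the distance is sqrt(9 + 16) = 5
d <- euclidean_sketch(a, b)
```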
Compute the Kendall distance matrix
Description
Kendall's distance measures the depth of the most recent common infector (MRCI) for each pair of nodes with respect to the source (patient 0).
Usage
kendall(tree)
Arguments
tree
A data frame representing a transmission tree, with the first column containing the infector IDs and the second the infectee IDs.
Value
A square, symmetric matrix of Kendall's distances between nodes.
References
Kendall, M. et al. (2018). A Metric to Compare Transmission Trees.
Examples
tree <- data.frame(from = c(1, 1, 2, 2, 3, 3), to = c(2, 3, 4, 5, 6, 7))
kendall(tree)
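To make the MRCI depth described for kendall concrete, the following base R sketch walks each node's chain of infectors back to the source and counts how many ancestors two chains share. It is not the package's implementation: mrci_depth_sketch is a hypothetical name, and placing each node's own depth on the diagonal is an assumption.

```r
# Sketch of an MRCI-depth matrix for a transmission tree given as a
# data frame (column 1 = infector, column 2 = infectee).
# Assumption: the diagonal holds each node's own depth.
mrci_depth_sketch <- function(tree) {
  ids <- sort(unique(c(tree[[1]], tree[[2]])))
  parent <- stats::setNames(tree[[1]], tree[[2]])  # infectee -> infector

  # Chain of infectors from a node back to the source (node itself first)
  lineage <- function(x) {
    path <- x
    repeat {
      p <- unname(parent[as.character(x)])
      if (is.na(p)) break  # reached the source
      path <- c(path, p)
      x <- p
    }
    path
  }

  lin <- lapply(ids, lineage)
  n <- length(ids)
  out <- matrix(0, n, n, dimnames = list(ids, ids))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      shared <- lin[[i]][lin[[i]] %in% lin[[j]]]
      # shared[1] is the MRCI; the remaining entries are its ancestors,
      # so their count equals the depth of the MRCI below the source
      out[i, j] <- length(shared) - 1
    }
  }
  out
}

tree <- data.frame(from = c(1, 1, 2, 2, 3, 3), to = c(2, 3, 4, 5, 6, 7))
m <- mrci_depth_sketch(tree)
```

In this tree, nodes 4 and 5 share infector 2, which sits one generation below the source, while nodes 4 and 6 only share the source itself.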
Generate a Transmission Tree
Description
Creates a transmission tree with a specified number of cases and branches per case. The tree can be generated with fixed or Poisson-distributed branching factors.
Usage
make_tree(n_cases, R = 2, stochastic = FALSE, plot = FALSE)
Arguments
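n_cases
Integer. The total number of cases (nodes) in the tree.
R
Integer. The fixed number of branches per case when stochastic is FALSE.
stochastic
Logical. If TRUE, the number of branches per case is Poisson-distributed instead of fixed.
plot
Logical. If TRUE, the tree is plotted.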
Value
An igraph object representing the transmission tree.
Examples
# Generate a deterministic transmission tree
deterministic_tree <- make_tree(n_cases = 15, R = 2, stochastic = FALSE, plot = TRUE)
# Generate a stochastic transmission tree
random_tree <- make_tree(n_cases = 15, R = 2, stochastic = TRUE, plot = TRUE)
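The difference between fixed and Poisson-distributed branching described for make_tree can be sketched on a plain edge list. This is an illustration under assumptions, not the package's code: make_tree_sketch is a hypothetical name, it returns a data frame rather than an igraph object, cases infect others in order of their IDs, R is used as the Poisson mean, and the last available case is forced to infect at least one other so the tree always reaches n_cases.

```r
# Sketch: grow a transmission tree as an edge list (from = infector,
# to = infectee). Node 1 is the source.
make_tree_sketch <- function(n_cases, R = 2, stochastic = FALSE) {
  from <- integer(0)
  to <- integer(0)
  n <- 1L  # number of cases created so far
  i <- 1L  # case currently infecting others
  while (n < n_cases) {
    # Fixed offspring number, or a Poisson draw with mean R (assumption)
    k <- if (stochastic) stats::rpois(1, R) else R
    # Simplification: keep the outbreak alive if the last case draws zero
    if (i == n && k == 0) k <- 1
    k <- min(k, n_cases - n)  # never exceed the requested size
    if (k > 0) {
      from <- c(from, rep(i, k))
      to <- c(to, n + seq_len(k))
      n <- n + k
    }
    i <- i + 1L
  }
  data.frame(from = from, to = to)
}

deterministic <- make_tree_sketch(n_cases = 7, R = 2)

set.seed(1)
random <- make_tree_sketch(n_cases = 15, R = 2, stochastic = TRUE)
```

With n_cases = 7 and R = 2, the deterministic call yields a balanced binary tree in which case 1 infects 2 and 3, case 2 infects 4 and 5, and case 3 infects 6 and 7.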
Compute the Patristic distance matrix
Description
The patristic distance is the number of generations separating any two nodes in a transmission tree.
Usage
patristic(tree)
Arguments
tree
A data frame representing a transmission tree, with the first column containing the infector IDs and the second the infectee IDs.
Value
A square, symmetric matrix of patristic distances between nodes.
Examples
tree <- data.frame(from = c(1, 1, 2, 2, 3, 3), to = c(2, 3, 4, 5, 6, 7))
patristic(tree)
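The number of generations separating two nodes, as described for patristic, equals the number of steps from each node up to their most recent common infector. The base R sketch below computes it that way. The name patristic_sketch is hypothetical, and the package itself may compute these distances differently.

```r
# Sketch: patristic distances for a transmission tree given as a data
# frame (column 1 = infector, column 2 = infectee).
patristic_sketch <- function(tree) {
  ids <- sort(unique(c(tree[[1]], tree[[2]])))
  parent <- stats::setNames(tree[[1]], tree[[2]])  # infectee -> infector

  # Chain of infectors from a node back to the source (node itself first)
  lineage <- function(x) {
    path <- x
    repeat {
      p <- unname(parent[as.character(x)])
      if (is.na(p)) break  # reached the source
      path <- c(path, p)
      x <- p
    }
    path
  }

  lin <- lapply(ids, lineage)
  n <- length(ids)
  out <- matrix(0, n, n, dimnames = list(ids, ids))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # First shared element of the two chains is the common infector
      mrci <- lin[[i]][lin[[i]] %in% lin[[j]]][1]
      # Steps from each node up to that common infector
      out[i, j] <- (match(mrci, lin[[i]]) - 1) + (match(mrci, lin[[j]]) - 1)
    }
  }
  out
}

tree <- data.frame(from = c(1, 1, 2, 2, 3, 3), to = c(2, 3, 4, 5, 6, 7))
p <- patristic_sketch(tree)
```

Siblings 4 and 5 are two generations apart (one step up to case 2, one step down), whereas 4 and 6 are four generations apart because their paths only meet at the source.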
Perform PERMANOVA on Sets of Transmission Trees
Description
Tests for topological differences between sets of transmission trees using PERMANOVA (via vegan::adonis2).
Usage
permanova_test(
...,
within_dist = patristic,
between_dist = euclidean,
test_args = list()
)
Arguments
...
Two or more sets of transmission trees. Each set is a list of data frames with columns from and to.
within_dist
A function to compute pairwise distances within a tree. Takes a data frame, returns a square matrix. Default is patristic.
between_dist
A function to compute distance between two trees. Takes two matrices, returns a numeric value. Default is euclidean.
test_args
A list of additional arguments to pass to vegan::adonis2.
Value
A vegan::adonis2 object containing the test results.
Examples
set.seed(1)
# No difference in the sets
setA <- replicate(10, igraph::as_long_data_frame(
  make_tree(n_cases = 10, R = 2, stochastic = TRUE)
), simplify = FALSE)
setB <- replicate(10, igraph::as_long_data_frame(
  make_tree(n_cases = 10, R = 2, stochastic = TRUE)
), simplify = FALSE)
permanova_test(setA, setB)
# Difference in the sets
setC <- replicate(10, igraph::as_long_data_frame(
  make_tree(n_cases = 10, R = 4, stochastic = TRUE)
), simplify = FALSE)
permanova_test(setA, setB, setC)
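The roles of within_dist and between_dist in permanova_test can be pictured as a two-step pipeline: each tree is summarised by a within-tree distance matrix, and a between-tree distance is then computed for every pair of trees to obtain the dissimilarity matrix that PERMANOVA works on. The sketch below illustrates the second step with toy matrices. All helper names are hypothetical, the toy matrices merely stand in for within_dist output, and the final vegan::adonis2 call is one plausible way to run the test, executed only if vegan is installed.

```r
# Sketch: build a tree-by-tree distance matrix from per-tree matrices.
between_tree_matrix <- function(mats, between_dist) {
  n <- length(mats)
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      out[i, j] <- between_dist(mats[[i]], mats[[j]])
    }
  }
  out
}

# Assumed between-tree distance: Euclidean distance of lower triangles
lower_tri_euclid <- function(m1, m2) {
  sqrt(sum((m1[lower.tri(m1)] - m2[lower.tri(m2)])^2))
}

# Toy symmetric matrices standing in for within_dist() output
toy_mat <- function(shift) {
  m <- matrix(0, 4, 4)
  m[lower.tri(m)] <- stats::rpois(6, 2) + shift
  m + t(m)
}

set.seed(1)
mats <- c(
  lapply(1:5, function(i) toy_mat(0)),  # stand-in for set A
  lapply(1:5, function(i) toy_mat(3))   # stand-in for set B
)
group <- factor(rep(c("A", "B"), each = 5))

D <- between_tree_matrix(mats, lower_tri_euclid)

# Run PERMANOVA on the tree-by-tree distances if vegan is available
if (requireNamespace("vegan", quietly = TRUE)) {
  d <- stats::as.dist(D)
  fit <- vegan::adonis2(d ~ group, data = data.frame(group = group))
}
```

Because the test only sees D, any within-tree and between-tree distance can be swapped in as long as the result is a valid dissimilarity matrix over trees.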
Shuffle Node IDs in a Graph
Description
Randomly shuffles the IDs of the nodes in a given graph and optionally plots the shuffled graph.
Usage
shuffle_graph_ids(g, plot = FALSE)
Arguments
g
An igraph object representing the graph.
plot
Logical. If TRUE, the shuffled graph is plotted.
Value
An igraph object with shuffled node IDs.
Examples
# Create an example graph
g <- make_tree(n_cases = 10, R = 2)
# Shuffle the node IDs
shuffled_graph <- shuffle_graph_ids(g, plot = TRUE)
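The effect of shuffling node IDs, as described for shuffle_graph_ids, can be sketched on an edge list: the IDs are permuted while the shape of the tree is left untouched. This is an illustration only. The name shuffle_ids_sketch is hypothetical, and it works on a data frame with columns from and to rather than on an igraph object.

```r
# Sketch: relabel the nodes of an edge list with a random permutation.
shuffle_ids_sketch <- function(tree) {
  ids <- sort(unique(c(tree$from, tree$to)))
  new_ids <- ids[sample.int(length(ids))]  # random permutation of the IDs
  data.frame(
    from = new_ids[match(tree$from, ids)],
    to = new_ids[match(tree$to, ids)]
  )
}

set.seed(42)
tree <- data.frame(from = c(1, 1, 2, 2, 3, 3), to = c(2, 3, 4, 5, 6, 7))
shuffled <- shuffle_ids_sketch(tree)
```

The shuffled tree has the same number of edges and the same out-degree pattern as the original; only the labels attached to each position change.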
Test Differences Between Sets of Transmission Trees
Description
Performs a statistical test to assess whether there are significant differences between sets of transmission trees.
Supports PERMANOVA (via vegan::adonis2), Chi-Square, or Fisher's Exact Test.
Usage
tree_test(
...,
method = c("permanova", "chisq", "fisher"),
within_dist = patristic,
between_dist = euclidean,
test_args = list()
)
Arguments
...
Two or more sets of transmission trees. Each set must be a list of data frames with columns from and to.
method
A character string specifying the test method. Options are "permanova", "chisq", or "fisher".
within_dist
A function to compute pairwise distances within a tree for PERMANOVA. Takes a data frame, returns a square matrix. Default is patristic.
between_dist
A function to compute distance between two trees for PERMANOVA. Takes two matrices, returns a numeric value. Default is euclidean.
test_args
A list of additional arguments to pass to the underlying test function (for example, vegan::adonis2 for PERMANOVA).
Details
This function compares sets of transmission trees using one of three statistical tests.
PERMANOVA: Evaluates whether the topology of transmission trees differs between sets.
- Null Hypothesis (H0): There is no difference in tree topologies between sets.
- Alternative Hypothesis (H1): At least one set of transmission trees has a different topological structure.
Chi-Square or Fisher’s Exact Test: Evaluates whether the distribution of infector-infectee pairs differs between sets.
- Null Hypothesis (H0): The frequency of infector-infectee pairs is consistent across all sets.
- Alternative Hypothesis (H1): The frequency of infector-infectee pairs differs between at least two sets.
Value
For "permanova": A vegan::adonis2 object containing the test results.
For "chisq" or "fisher": An htest object with the test results.
Examples
set.seed(1)
# Generate example sets
setA <- replicate(10, igraph::as_long_data_frame(
  make_tree(n_cases = 10, R = 2, stochastic = TRUE)
), simplify = FALSE)
setB <- replicate(10, igraph::as_long_data_frame(
  make_tree(n_cases = 10, R = 2, stochastic = TRUE)
), simplify = FALSE)
setC <- replicate(10, igraph::as_long_data_frame(
  make_tree(n_cases = 10, R = 4, stochastic = TRUE)
), simplify = FALSE)
# PERMANOVA test
tree_test(setA, setB, setC, method = "permanova")
# Chi-Square test
tree_test(setA, setB, setC, method = "chisq")
Validate a Set of Transmission Trees
Description
Ensures that the input is a list containing at least one data frame.
Usage
validate_set(set)
Arguments
set
A list containing at least one data frame.
Value
Invisible TRUE if the set is valid. Throws an error if invalid.
Validate sets of transmission trees
Description
Checks that the provided input is a list of at least two valid sets of transmission trees. Each set is expected to be a list containing at least one data frame, as verified by validate_set.
Usage
validate_sets(sets)
Arguments
sets
A list where each element represents a set of transmission trees. Each set must be a list containing one or more data frames.
Details
The function checks that at least two sets are provided, that each set is a list (and not a data frame itself), that each set contains at least one element, and that every element in each set is a data frame.
Value
Invisible TRUE if the sets are valid. Throws an error if invalid.
See Also
validate_set for validating an individual set.
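The checks listed for validate_sets translate almost directly into code. The following is a minimal sketch rather than the package's implementation: validate_sets_sketch is a hypothetical name and the error messages are made up for illustration.

```r
# Sketch of the checks described for validate_sets().
validate_sets_sketch <- function(sets) {
  # At least two sets, supplied as a list (a data frame is also a list)
  if (!is.list(sets) || is.data.frame(sets) || length(sets) < 2) {
    stop("At least two sets must be provided as a list.")
  }
  for (s in sets) {
    if (!is.list(s) || is.data.frame(s)) {
      stop("Each set must be a list, not a data frame.")
    }
    if (length(s) == 0) {
      stop("Each set must contain at least one element.")
    }
    if (!all(vapply(s, is.data.frame, logical(1)))) {
      stop("Every element of a set must be a data frame.")
    }
  }
  invisible(TRUE)
}

tree <- data.frame(from = c(1, 1), to = c(2, 3))
ok <- validate_sets_sketch(list(list(tree), list(tree, tree)))

# A bare data frame in place of a set is rejected
bad <- try(validate_sets_sketch(list(tree, list(tree))), silent = TRUE)
```

The explicit is.data.frame() test matters because a data frame is itself a list in R, so is.list() alone would let a single tree pass as a set.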
Validate a Transmission Tree
Description
Checks if a transmission tree meets specific topology criteria for our test. The tree must be a directed acyclic graph (DAG), weakly connected, and have at most one infector per node.
Usage
validate_tree(tree)
Arguments
tree
A data frame with columns from and to.
Value
Invisible TRUE if the tree is valid. Throws an error if invalid.
Examples
good_tree <- data.frame(from = c(1, 2, 3), to = c(2, 3, 4))
validate_tree(good_tree)
bad_tree <- data.frame(from = c(1, 2, 3), to = c(2, 3, 2))
try(validate_tree(bad_tree))
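The three topology criteria described for validate_tree can be checked on an edge list without a graph library. Once every node has at most one infector, the tree is a weakly connected DAG exactly when walking up the chain of infectors from every node ends at the same source. The sketch below follows that idea. It is not the package's implementation: validate_tree_sketch is a hypothetical name, node IDs are assumed to be numeric, and the error messages are illustrative.

```r
# Sketch of the topology checks described for validate_tree().
# Assumes numeric node IDs in columns `from` and `to`.
validate_tree_sketch <- function(tree) {
  ids <- unique(c(tree$from, tree$to))

  # At most one infector per node
  if (anyDuplicated(tree$to) > 0) {
    stop("A node has more than one infector.")
  }
  parent <- stats::setNames(tree$from, tree$to)  # infectee -> infector

  # Walk up from a node; more steps than nodes means we are in a cycle
  find_source <- function(x) {
    for (step in seq_along(ids)) {
      p <- unname(parent[as.character(x)])
      if (is.na(p)) return(x)  # no infector: this is a source
      x <- p
    }
    stop("The tree contains a cycle.")
  }

  # A single shared source means the tree is weakly connected
  sources <- unique(vapply(ids, find_source, numeric(1)))
  if (length(sources) != 1) {
    stop("The tree is not weakly connected.")
  }
  invisible(TRUE)
}

good_tree <- data.frame(from = c(1, 2, 3), to = c(2, 3, 4))
ok <- validate_tree_sketch(good_tree)

# Node 2 has two infectors (1 and 3)
bad_tree <- data.frame(from = c(1, 2, 3), to = c(2, 3, 2))
bad <- try(validate_tree_sketch(bad_tree), silent = TRUE)
```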