Title: | Measuring Disparity |
Maintainer: | Thomas Guillerme <guillert@tcd.ie> |
Version: | 1.9 |
Date: | 2024-11-13 |
Description: | A modular package for measuring disparity (multidimensional space occupancy). Disparity can be calculated from any matrix defining a multidimensional space. The package provides a set of implemented metrics to measure properties of the space and allows users to provide and test their own metrics. The package also provides functions for looking at disparity in a serial way (e.g. disparity through time) or per groups as well as visualising the results. Finally, this package provides several statistical tests for disparity analysis. |
Depends: | R (≥ 3.6.0), ape, stats |
Imports: | ade4, castor, Claddis, ellipse, geometry, GET, graphics, grDevices, MASS, methods, mnormt, parallel, phangorn, phyclust, phylolm, utils, vegan, scales, zoo, |
License: | GPL-3 | file LICENSE |
Suggests: | MCMCglmm, geoscale, testthat, knitr |
RoxygenNote: | 7.2.3 |
URL: | https://github.com/TGuillerme/dispRity |
NeedsCompilation: | yes |
Packaged: | 2024-11-13 16:20:53 UTC; tguillerme |
Author: | Thomas Guillerme |
Repository: | CRAN |
Date/Publication: | 2024-11-13 18:20:02 UTC |
Measuring Disparity in R
Description
A modular package for measuring disparity (multidimensional space occupancy). Disparity can be calculated from any matrix defining a multidimensional space. The package provides a set of implemented metrics to measure properties of the space and allows users to provide and test their own metrics (Guillerme (2018) <doi:10.1111/2041-210X.13022>). The package also provides functions for looking at disparity in a serial way (e.g. disparity through time - Guillerme and Cooper (2018) <doi:10.1111/pala.12364>) or per groups as well as visualising the results. Finally, this package provides several statistical tests for disparity analysis.
Author(s)
Thomas Guillerme <guillert@tcd.ie>
Beck and Lee 2014 datasets
Description
Example datasets from Beck and Lee 2014.
Format
three matrices and one phylogenetic tree.
Details
-
BeckLee_tree
A phylogenetic tree with 50 living and fossil taxa -
BeckLee_mat50
The ordinated matrix based on the 50 taxa cladistic distances -
BeckLee_mat99
The ordinated matrix based on the 50 taxa + 49 nodes cladistic distances -
BeckLee_ages
A list of first and last occurrence data for fossil taxa -
BeckLee_disparity
adispRity
object with estimated sum of variances in 120 time bins, boostrapped 100 times from the Beck and Lee data
References
Beck RMD & Lee MSY. 2014. Ancient dates or accelerated rates? Morphological clocks and the antiquity of placental mammals. Proc. R. Soc. B 2014 281 20141278; DOI: 10.1098/rspb.2014.1278
See Also
BeckLee_disparity disparity
BeckLee_disparity
Description
An example of a dispRity
object.
Format
one dispRity
object.
Details
This matrix is based on the BeckLee
dataset and split into 120 continuous subsets (chrono.subsets
).
It was bootstrapped 100 times (boot.matrix
) with four rarefaction levels.
Disparity was calculated as the sum
of the variances
(dispRity
).
See Also
BeckLee disparity
Examples
## Not run:
## Loading the data
data(BeckLee_mat99)
data(BeckLee_tree)
data(BeckLee_ages)
## Creating the 7 subsets
subsets <- chrono.subsets(BeckLee_mat99, BeckLee_tree,
time = seq(from = 0, to = 120, by = 1),
method = "continuous", model = "proximity",
FADLAD = BeckLee_ages)
## Bootstrapping and rarefying
bootstraps <- boot.matrix(subsets, bootstraps = 100)
## Calculating disparity
BeckLee_disparity <- dispRity(bootstraps, metric = c(sum, variances))
## End(Not run)
Imports data from Claddis
Description
Takes Claddis data and computes both the distance and the ordination matrix
Usage
Claddis.ordination(data, distance = "mord", ..., k, add = TRUE, arg.cmdscale)
Arguments
data |
Data from |
distance |
Distance type to be computed by |
... |
Any optional arguments to be passed to |
k |
The number of dimensions in the ordination. If left empty, the number of dimensions is set to number of rows - 1. |
add |
whether to use the Cailliez correction for negative eigen values ( |
arg.cmdscale |
Any optional arguments to be passed to |
Details
If data
is a file path, the function will use a modified version of read.nexus.data
(that handles polymorphic and ambiguous characters). The file content will then be converted into a read_nexus_matrix
type list treating all characters as unordered.
If the distance
is set to NULL
, data
will be only converted into a read_nexus_matrix
type list.
Author(s)
Thomas Guillerme
See Also
calculate_morphological_distances
, read_nexus_matrix
, build_cladistic_matrix
, cmdscale
, custom.subsets
, chrono.subsets
, boot.matrix
, dispRity
.
Examples
## Not run:
require(Claddis)
## Ordinating the distance matrix of Claddis example data
Claddis.ordination(Claddis::michaux_1989)
## Creating simple discrete morphological matrix (with polymorphisms)
cat(
"#NEXUS
BEGIN DATA;
DIMENSIONS NTAX=5 NCHAR=5;
FORMAT SYMBOLS= \" 0 1 2\" MISSING=? GAP=- ;
MATRIX
t1 {01}1010
t2 02120
t3 1210(01)
t4 01111
t5 00101
;
END;", file = "morpho_matrix.nex")
## Ordinating the matrix (using a distance matrix)
Claddis.ordination("morpho_matrix.nex")
## Only converting the nexus matrix into a Claddis format
Claddis_data <- Claddis.ordination("morpho_matrix.nex", distance = NULL)
file.remove("morpho_matrix.nex")
## End(Not run)
MCMCglmm.subsets
Description
Creating a dispRity object from a MCMCglmm posterior output
Usage
MCMCglmm.subsets(
data,
posteriors,
group,
tree,
rename.groups,
set.loc = TRUE,
...
)
Arguments
data |
The |
posteriors |
A |
group |
Optional, a named vector of which group to include from the posteriors (if left empty the random and residual terms are used). See details. |
tree |
Optional, the tree(s) used in the MCMCglmm analyses. |
rename.groups |
Optional, a vector of group names for renaming them. See details. |
set.loc |
Optional, if no location is available for a subset ( |
... |
Optional arguments to be passed to |
Details
For the
group
option, the group names must be ones found in theposteriors
formula in the format <Type = Term:FactorLevel> as returned byMCMCglmm.levels(posteriors)
. For example, for returning two random effect, the phylogenetic one ("animal"
) and one for a specific clade (say the 2nd clade) as well as two residual terms for a specific factor (say level 1 and 4) you can usegroup = c(random = "animal", random = "animal:clade2", residual = "units:myfactor1", residual = "units:myfactor4")
.For the
rename.groups
option, the vector must be of class"character"
and must of the same length as the number of random and residual terms inposteriors
or ofgroup
argument (if used). If thegroup
argument is left empty, the groups are extracted from theposteriors
in the following order: the random terms first then the residual terms as specified in theposteriors
object formulas (respectivelyposteriors$Random$formula
andposteriors$Residual$formula
).
NOTE that the output dispRity
inherits the dimensions used in the posteriors
argument. You can always check the selected dimensions using:
data$call$dimensions
Author(s)
Thomas Guillerme
See Also
Examples
data(charadriiformes)
## Creating a dispRity object from the charadriiformes model
MCMCglmm.subsets(data = charadriiformes$data,
posteriors = charadriiformes$posteriors)
## Same but selecting only the three first random terms
MCMCglmm.subsets(data = charadriiformes$data,
posteriors = charadriiformes$posteriors,
tree = charadriiformes$tree,
group = MCMCglmm.levels(
charadriiformes$posteriors)[1:3],
rename.groups = c("gulls", "plovers", "sandpipers"))
MCMCglmm object utility functions
Description
Different utility functions to extract aspects of a MCMCglmm
object.
Usage
MCMCglmm.traits(MCMCglmm)
MCMCglmm.levels(MCMCglmm, convert)
MCMCglmm.sample(MCMCglmm, n)
MCMCglmm.covars(MCMCglmm, n, sample)
MCMCglmm.variance(MCMCglmm, n, sample, levels, scale)
Arguments
MCMCglmm |
A |
convert |
Logical, whether to return the raw term names names as expressed in the model column names ( |
n |
Optional, a number of random samples to extract. |
sample |
Optional, the specific samples to extract (is ignored if |
levels |
Optional, a vector |
scale |
Logical, whether to scale the variance relative to all the levels ( |
Details
-
MCMCglmm.levels
returns the different random and residual terms levels of aMCMCglmm
object. This function uses the default optionconvert = TRUE
to convert the names into something more readable. Toggle toconvert = FALSE
for the raw names. -
MCMCglmm.traits
returns the column names of the different traits of aMCMCglmm
formula object. -
MCMCglmm.sample
returns a vector of sample IDs present in theMCMCglmm
object. Ifn
is missing, all the samples IDs are returned. Else, a random series of sample IDs are returned (with replacement if n greater than the number of available samples). -
MCMCglmm.covars
returns a list of covariance matrices and intercepts from aMCMCglmm
object (respectively fromMCMCglmm$VCV
andMCMCglmm$Sol
). By default, all the covariance matrices and intercepts are returned but you can use either of the argumentssample
to return specific samples (e.g.MCMCglmm.covars(data, sample = c(1, 42))
for returning the first and 42nd samples) orn
to return a specific number of random samples (e.g.MCMCglmm.covars(data, n = 42)
for returning 42 random samples). -
MCMCglmm.variance
returns a list of covariance matrices and intercepts from aMCMCglmm
object (respectively fromMCMCglmm$VCV
andMCMCglmm$Sol
). By default, all the covariance matrices and intercepts are returned but you can use either of the argumentssample
to return specific samples (e.g.MCMCglmm.covars(data, sample = c(1, 42))
for returning the first and 42nd samples) orn
to return a specific number of random samples (e.g.MCMCglmm.covars(data, n = 42)
for returning 42 random samples).
Author(s)
Thomas Guillerme
See Also
Examples
## Loading the charadriiformes model
data(charadriiformes)
model <- charadriiformes$posteriors
class(model) # is MCMCglmm
## Get the list of levels from the model
MCMCglmm.levels(model)
## The raw levels names (as they appear in the MCMCglmm object)
MCMCglmm.levels(model, convert = FALSE)
## Get the traits names from the model
MCMCglmm.traits(model)
## Get all the available samples in the model
length(MCMCglmm.sample(model))
## Get 5 random sample IDs from the model
MCMCglmm.sample(model, n = 5)
## Get one specific samples from the model
MCMCglmm.covars(model, sample = 42)
## Get two random samples from the model
MCMCglmm.covars(model, n = 2)
## Get the variance for each terms in the model
terms_variance <- MCMCglmm.variance(model)
boxplot(terms_variance, horizontal = TRUE, las = 1)
Add, remove or get trees (or subtrees)
Description
Adding, extracting or removing the tree component from a dispRity
object.
Usage
add.tree(data, tree, replace = FALSE)
get.tree(data, subsets = FALSE, to.root = FALSE)
remove.tree(data)
Arguments
data |
A |
tree |
A |
replace |
Logical, whether to replace any existing tree ( |
subsets |
Either a logical whether to extract the tree for each subset ( |
to.root |
Logical, whether to return the subset tree including the root of the tree ( |
Details
get.tree
allows to extract the trees specific to each subsets.
Author(s)
Thomas Guillerme and Jack Hadfield
See Also
custom.subsets
, chrono.subsets
, boot.matrix
, dispRity
.
Examples
## Loading a dispRity object
data(disparity)
## Loading a tree
data(BeckLee_tree)
## Removing the tree from the dispRity object
(tree_data <- remove.tree(disparity))
## Extracting the tree
get.tree(tree_data) # is null
## Adding a tree to the disparity object
tree_data <- add.tree(tree_data, tree = BeckLee_tree)
## Extracting the tree
get.tree(tree_data) # is a "phylo" object
## Adding the same tree again
tree_data <- add.tree(tree_data, tree = BeckLee_tree)
get.tree(tree_data) # is a "multiPhylo" object (2 trees)
## Replacing the two trees by one tree
tree_data <- add.tree(tree_data, tree = BeckLee_tree, replace = TRUE)
get.tree(tree_data) # is a "phylo" object
adonis dispRity (from vegan::adonis2
)
Description
Passing dispRity
objects to the adonis2
function from the vegan
package.
Usage
adonis.dispRity(
data,
formula = matrix ~ group,
method = "euclidean",
...,
warn = TRUE,
matrix = 1
)
Arguments
data |
A |
formula |
The model formula (default is |
method |
The distance method to be passed to |
... |
Any optional arguments to be passed to |
warn |
|
matrix |
|
Details
The first element of the formula (the response) must be called matrix
and the predictors must be existing in the subsets of the dispRity
object.
If data$matrix[[1]]
is not a distance matrix, distance is calculated using the dist
function. The type of distance can be passed via the standard method
argument that will be recycled by adonis2
.
If the dispRity
data has custom subsets with a single group, the formula is set to matrix ~ group
.
If the dispRity
data has custom subsets with multiple group categories (separated by a dot, e.g. c("group1.cat1", "group1.cat2", "group2.catA", "group2.catB")
being two groups with two categories each), the default formula is matrix ~ first_group
but can be set to any combination (e.g. matrix ~ first_group + second_group
).
If the dispRity
data has time subsets, the predictor is automatically set to time
.
Author(s)
Thomas Guillerme
References
Oksanen J, Simpson G, Blanchet F, Kindt R, Legendre P, Minchin P, O'Hara R, Solymos P, Stevens M, Szoecs E, Wagner H, Barbour M, Bedward M, Bolker B, Borcard D, Carvalho G, Chirico M, De Caceres M, Durand S, Evangelista H, FitzJohn R, Friendly M, Furneaux B, Hannigan G, Hill M, Lahti L, McGlinn D, Ouellette M, Ribeiro Cunha E, Smith T, Stier A, Ter Braak C, Weedon J (2024). vegan: Community Ecology Package_. R package version 2.6-8,
See Also
adonis2
, test.dispRity
, custom.subsets
, chrono.subsets
.
Examples
## Adonis with one groups
## Generating a random character matrix
character_matrix <- sim.morpho(rtree(20), 50, rates = c(rnorm, 1, 0))
## Calculating the distance matrix
distance_matrix <- as.matrix(dist(character_matrix))
## Creating two groups
random_groups <- list("group1" = 1:10, "group2" = 11:20)
## Generating a dispRity object
random_disparity <- custom.subsets(distance_matrix, random_groups)
## Running a default NPMANOVA
adonis.dispRity(random_disparity)
## Adonis with multiple groups
## Creating a random matrix
random_matrix <- matrix(data = rnorm(90), nrow = 10,
dimnames = list(letters[1:10]))
## Creating two groups with two states each
groups <- as.data.frame(matrix(data = c(rep(1,5), rep(2,5), rep(c(1,2), 5)),
nrow = 10, ncol = 2, dimnames = list(letters[1:10], c("g1", "g2"))))
## Creating the dispRity object
multi_groups <- custom.subsets(random_matrix, groups)
## Running the NPMANOVA
adonis.dispRity(multi_groups, matrix ~ g1 + g2)
## Adonis with time
## Creating time series
data(BeckLee_mat50)
data(BeckLee_tree)
data(BeckLee_ages)
time_subsets <- chrono.subsets(BeckLee_mat50, BeckLee_tree,
method = "discrete", inc.nodes = FALSE, time = c(100, 85, 65, 0),
FADLAD = BeckLee_ages)
## Running the NPMANOVA with time as a predictor
adonis.dispRity(time_subsets, matrix ~ time)
## Running the NPMANOVA with each time bin as a predictor
adonis.dispRity(time_subsets, matrix ~ chrono.subsets)
Apply inapplicable characters to a matrix.
Description
Apply inapplicable characters to discrete morphological matrix.
Usage
apply.NA(matrix, NAs, tree, invariant = FALSE, verbose = FALSE)
Arguments
matrix |
A discrete morphological matrix. |
NAs |
Either a numeric value of how many characters to make inapplicable or vector of characters inapplicability source (either |
tree |
If any inapplicable source is |
invariant |
Whether to allow invariant sites among the characters with inapplicable data. If |
verbose |
Whether to be verbose or not. |
Details
If the NAs
argument is a numeric value n, generates n characters with inapplicable data based on the "clade"
source.
The NAs
argument intakes a vector of character inapplicability source rendering a number of characters inapplicable using the following sources:
"character"
draws inapplicable characters directly from the character matrix, ignoring the phylogeny (i.e. for a random character X, an other random character Y will have inapplicable characters for each character states 0 for character X).
"clade"
draws inapplicable characters from the phylogeny: it will randomly apply inapplicable characters states for some characters by randomly selecting clades from the provided tree. The algorithm randomly assigns an inapplicable token for this character for all taxa in this clade or all taxa outside this clade.
For example NAs = c(rep("character", 2), rep("clade", 2))
will generate 4 characters with inapplicable data, two using previous characters and two other using random clades.
Author(s)
Thomas Guillerme
See Also
Examples
set.seed(4)
## A random tree with 15 tips
tree <- rcoal(15)
## setting up the parameters
my_rates = c(rgamma, rate = 10, shape = 5)
my_substitutions = c(runif, 2, 2)
## A Mk matrix (10*50)
matrixMk <- sim.morpho(tree, characters = 100, model = "ER",
states = c(0.85, 0.15), rates = my_rates, invariant = FALSE)
## Setting the number and source of inapplicable characters
my_inapplicables <- c(rep("character", 5), rep("clade", 5))
## Apply some inapplicable characters to the matrix
matrix <- apply.NA(matrixMk, my_inapplicables, tree, verbose = TRUE)
as.covar
Description
Changes a dispRity metric to use the covar element from a dispRity object.
Usage
as.covar(fun, ..., VCV = TRUE, loc = FALSE)
Arguments
fun |
a |
... |
any additional arguments to pass to fun. |
VCV |
logical, whether to use the |
loc |
logical, whether to use the |
Details
This function effectively transforms the input argument from matrix
(or matrix2
) to matrix = matrix$VCV
and adds a evaluation after the return call to indicate that the function works on a $covar
element.
Note that if the function does not have an argument called matrix
, the first argument is estimated as being the one to be transformed (e.g. if the function has its first argument x
, it will transform it to x = x$VCV
).
You can toggle between using the $VCV
or the $loc
argument in the $covar
matrix by using either VCV = TRUE, loc = FALSE
(to access only fun(matrix = matrix$VCV, ...)
), VCV = FALSE, loc = TRUE
(to access only matrix = matrix(matrix$loc, nrow = 1), ...
) or VCV = TRUE, loc = TRUE
(to access fun(matrix = matrix$VCV, loc = matrix$loc, ...)
; provided fun
has an extra loc
argument).
For between.groups
metrics with matrix
and matrix2
arguments, you can provide multiple logicals for VCV
and loc
to be applied repspectively to matrix
and matrix2
. For example VCV = TRUE
will reinterpret matrix
and matrix2
as matrix$VCV
and matrix2$VCV
but loc = c(TRUE, FALSE)
will only reinterpret matrix
as matrix$loc
(and matrix2
will not be reinterpreted).
Author(s)
Thomas Guillerme
See Also
Examples
## Creating a dispRity
data(charadriiformes)
## Creating a dispRity object from the charadriiformes model
covar_data <- MCMCglmm.subsets(data = charadriiformes$data,
posteriors = charadriiformes$posteriors)
## Get one matrix and one covar matrix
one_matrix <- get.matrix(covar_data, subsets = 1)
one_covar <- get.covar(covar_data, subsets = 1, n = 1)[[1]][[1]]
## Measure the centroids
centroids(one_matrix)
## Measure the centroids on the covar matrix
as.covar(centroids)(one_covar)
## Is the same as:
centroids(one_covar$VCV)
## Apply the measurement on a dispRity object:
## On the traitspace:
summary(dispRity(covar_data, metric = c(sum, centroids)))
## On the covariance matrices:
summary(dispRity(covar_data, metric = c(sum, as.covar(centroids))))
## The same but with additional options (centre = 100)
summary(dispRity(covar_data,
metric = c(sum, as.covar(centroids)),
centre = 100))
## Example with the VCV and loc options
## A metric that works with both VCV and loc
## (the sum of the variances minus the distance from the location)
sum.var.dist <- function(matrix, loc = rep(0, ncol(matrix))) {
## Get the sum of the diagonal of the matrix
sum_diag <- sum(diag(matrix))
## Get the distance between 0 and the loc
dist_loc <- dist(matrix(c(rep(0, ncol(matrix)), loc), nrow = 2, byrow = TRUE))[1]
## Return the sum of the diagonal minus the distance
return(sum_diag - dist_loc)
}
## Changing the $loc on one_covar for the demonstration
one_covar$loc <- c(1, 2, 3)
## Metric on the VCV part only
as.covar(sum.var.dist, VCV = TRUE, loc = FALSE)(one_covar)
## Metric on the loc part only
as.covar(sum.var.dist, VCV = FALSE, loc = TRUE)(one_covar)
## Metric on both parts
as.covar(sum.var.dist, VCV = TRUE, loc = TRUE)(one_covar)
Bhattacharyya Coefficient
Description
Calculates the probability of overlap between two distributions.
Usage
bhatt.coeff(x, y, bw = bw.nrd0, ...)
Arguments
x , y |
two distributions. |
bw |
the bandwidth size, either a |
... |
optional arguments to be passed to the |
Author(s)
Thomas Guillerme
References
Bhattacharyya A. 1943. On a measure of divergence between two statistical populations defined by their probability distributions. Bull. Calcutta Math. Soc., 35, pp. 99-109
See Also
Examples
## Two dummy distributions
x <- rnorm(1000, 0, 1)
y <- rnorm(1000, 1, 2)
## What is the probability of overlap of these distributions?
bhatt.coeff(x, y)
Bootstraps and rarefies data.
Description
Bootstraps and rarefies either a matrix or a list of matrices.
Usage
boot.matrix(
data,
bootstraps = 100,
boot.type = "full",
boot.by = "rows",
rarefaction = FALSE,
verbose = FALSE,
prob = NULL
)
Arguments
data |
A |
bootstraps |
The number of bootstrap pseudoreplicates ( |
boot.type |
The bootstrap algorithm to use ( |
boot.by |
Which dimension of the data to bootstrap: either |
rarefaction |
Either a |
verbose |
A |
prob |
Optional, a |
Details
data
: The data is considered as the multidimensional space and is not transformed (e.g. if ordinated with negative eigen values, no correction is applied to the matrix).
rarefaction
: when the input is numeric
, the number of elements is set to the value(s) for each bootstrap. If some subsets have fewer elements than the rarefaction value, the subsets is not rarefied.
When the input is "min"
, the smallest number of elements is used (or 3 if some subsets have less than 3 elements).
boot.type
: the different bootstrap algorithms are:
-
"full"
: resamples all the rows of the matrix and replaces them with a new random sample of rows (withreplace = TRUE
, meaning all the elements can be duplicated in each bootstrap). -
"single"
: resamples only one row of the matrix and replaces it with a new randomly sampled row (withreplace = FALSE
, meaning that only one element can be duplicated in each bootstrap). -
"null"
: resamples all rows of the matrix across subsets. I.e. for each subset of n elements, this algorithm resamples n elements across ALL subsets. If only one subset (or none) is used, this does the same as the"full"
algorithm.
prob
: This option allows to attribute specific probability to each element to be drawn.
A probability of 0 will never sample the element, a probability of 1 will always allow it to be sampled.
This can also be useful for weighting elements: an element with a weight of 10 will be sampled ten times more.
If the argument is a matrix
, it must have rownames attributes corresponding to the element names.
If the argument is a vector
, it must have names attributes corresponding to the element names.
Multiple trees: If the given data
is a chrono.subsets
based on multiple trees, the sampling is proportional to the presence of each element in each tree: \sum (1/n) / T
(with n being the maximum number of elements among the trees and T being the total numbers of trees).
For example, for a slice through two trees resulting in the selection of elements A
and B
in the first tree and A
, B
and C
in the second tree, the "full"
bootstrap algorithm will select three elements (with replacement) between A
, B
and C
with a probability of respectively p(A) = 1/3
(p(A) = (1/3 + 1/3) / 2
), p(B) = 1/3
and p(C) = 1/6
(p(C) = (0 + 1/3) / 2
).
Value
This function outputs a dispRity
object containing:
matrix |
the multidimensional space (a |
call |
A |
subsets |
A |
Use summary.dispRity to summarise the dispRity
object.
Author(s)
Thomas Guillerme
See Also
cust.subsets
, chrono.subsets
, dispRity
.
Examples
## Load the Beck & Lee 2014 matrix
data(BeckLee_mat50)
## Bootstrapping a matrix
## Bootstrapping an ordinated matrix 20 times
boot.matrix(BeckLee_mat50, bootstraps = 20)
## Bootstrapping an ordinated matrix with rarefaction
boot.matrix(BeckLee_mat50, bootstraps = 20, rarefaction = TRUE)
## Bootstrapping an ordinated matrix with only elements 7, 10 and 11 sampled
boot.matrix(BeckLee_mat50, bootstraps = 20, rarefaction = c(7, 10, 11))
## Bootstrapping an the matrix but without sampling Cimolestes and sampling Maelestes 10x more
boot.matrix(BeckLee_mat50, bootstraps = 20, prob = c("Cimolestes" = 0, "Maelestes" = 10))
## Bootstrapping a subsets of matrices
## Generating a dummy subsets of matrices
ordinated_matrix <- matrix(data = rnorm(90), nrow = 10, ncol = 9,
dimnames = list(letters[1:10]))
matrix_list <- custom.subsets(ordinated_matrix, list(A = 1:5, B = 6:10))
## Bootstrapping the subsets of matrices 20 times (each)
boot.matrix(matrix_list, bootstraps = 20)
Character differences
Description
Calculates the character difference from a discrete matrix
Usage
char.diff(
matrix,
method = "hamming",
translate = TRUE,
special.tokens,
special.behaviours,
order = FALSE,
by.col = TRUE,
correction
)
Arguments
matrix |
A discrete matrix or a list containing discrete characters. The differences is calculated between the columns (usually characters). Use |
method |
The method to measure difference: |
translate |
|
special.tokens |
optional, a named |
special.behaviours |
optional, a |
order |
|
by.col |
|
correction |
optional, an eventual |
Details
Each method for calculating distance is expressed as a function of d(x, y)
where x
and y
are a pair of columns (if by.col = TRUE
) or rows in the matrix and n is the number of comparable rows (if by.col = TRUE
) or columns between them and i is any specific pair of rows (if by.col = TRUE
) or columns.
The different methods are:
-
"hamming"
The relative distance between characters. This is equal to the Gower distance for non-numeric comparisons (e.g. character tokens; Gower 1966).d(x,y) = \sum[i,n](abs(x[i] - y[i])/n
-
"manhattan"
The "raw" distance between characters:d(x,y) = \sum[i,n](abs(x[i] - y[i])
-
"comparable"
The number of comparable characters (i.e. the number of tokens that can be compared):d(x,y) = \sum[i,n]((x[i] - y[i])/(x[i] - y[i]))
-
"euclidean"
The euclidean distance between characters:d(x,y) = \sqrt(\sum[i,n]((x[i] - y[i])^2))
-
"maximum"
The maximum distance between characters:d(x,y) = max(abs(x[i] - y[i]))
-
"mord"
The maximum observable distance between characters (Lloyd 2016):d(x,y) = \sum[i,n](abs(x[i] - y[i])/\sum[i,n]((x[i] - y[i])/(x[i] - y[i])
-
"none"
Returns the matrix with eventual converted and/or translated tokens. -
"binary"
Returns the matrix with the binary characters.
When using translate = TRUE
, the characters are translated following the xyz notation where the first token is translated to 1, the second to 2, etc. For example, the character 0, 2, 1, 0
is translated to 1, 2, 3, 1
. In other words when translate = TRUE
, the character tokens are not interpreted as numeric values. When using translate = TRUE
, scaled metrics (i.e "hamming"
and "gower"
) are divide by n-1
rather than n
due to the first character always being equal to 1.
special.behaviours
allows to generate a special rule for the special.tokens
. The functions should can take the arguments character, all_states
with character
being the character that contains the special token and all_states
for the character (which is automatically detected by the function). By default, missing data returns and inapplicable returns NA
, and polymorphisms and uncertainties return all present states.
missing = function(x,y) NA
inapplicable = function(x,y) NA
polymorphism = function(x,y) strsplit(x, split = "\\&")[[1]]
uncertainty = function(x,y) strsplit(x, split = "\\/")[[1]]
Functions in the list must be named following the special token of concern (e.g. missing
), have only x, y
as inputs and a single output a single value (that gets coerced to integer
automatically). For example, the special behaviour for the special token "?"
can be coded as: special.behaviours = list(missing = function(x, y) return(y)
to make all comparisons containing the special token containing "?"
return any character state y
.
IMPORTANT: Note that for any distance method, NA
values are skipped in the distance calculations (e.g. distance(A = {1, NA, 2}, B = {1, 2, 3}
) is treated as distance(A = {1, 2}, B = {1, 3}
)).
IMPORTANT: Note that the number of symbols (tokens) per character is limited by your machine's word-size (32 or 64 bits). If you have more than 64 tokens per character, you might want to use continuous data.
Value
A character difference value or a matrix of class char.diff
Author(s)
Thomas Guillerme
References
Felsenstein, J. 2004. Inferring phylogenies vol. 2. Sinauer Associates Sunderland. Gower, J.C. 1966. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53:325-338. Hamming, R.W. 1950. Error detecting and error correcting codes. The Bell System Technical Journal. DOI: 10.1002/j.1538-7305.1950.tb00463.x. Lloyd, G.T. 2016. Estimating morphological diversity and tempo with discrete character-taxon matrices: implementation, challenges, progress, and future directions. Biological Journal of the Linnean Society. DOI: 10.1111/bij.12746.
See Also
plot.char.diff
, vegdist
, dist
, calculate_morphological_distances
, daisy
Examples
## Comparing two binary characters
char.diff(list(c(0, 1, 0, 1), c(0, 1, 1, 1)))
## Pairwise comparisons in a morphological matrix
morpho_matrix <- matrix(sample(c(0,1), 100, replace = TRUE), 10)
char.diff(morpho_matrix)
## Adding special tokens to the matrix
morpho_matrix[sample(1:100, 10)] <- c("?", "0&1", "-")
char.diff(morpho_matrix)
## Modifying special behaviours for tokens with "&" to be treated as NA
char.diff(morpho_matrix,
special.behaviours = list(polymorphism = function(x,y) return(NA)))
## Adding a special character with a special behaviour (count "%" as "100")
morpho_matrix[sample(1:100, 5)] <- "%"
char.diff(morpho_matrix,
special.tokens = c("paragraph" = "\\%"),
special.behaviours = list(paragraph = function(x,y) as.integer(100)))
## Comparing characters with/without translation
char.diff(list(c(0, 1, 0, 1), c(1, 0, 1, 0)), method = "manhattan")
# no character difference
char.diff(list(c(0, 1, 0, 1), c(1, 0, 1, 0)), method = "manhattan",
translate = FALSE)
# all four character states are different
Charadriiformes
Description
An example of a MCMCglmm
model.
Format
one data.frame
, one phylo
and one MCMCglmm
.
Details
This dataset is based on a random subset of 359 Charadriiformes (gulls, plovers and sandpipers) from Cooney et al 2017 and trees from Jetz et al 2012. It contains:
-
data
A"data.frame"
. -
tree
A consensus tree of 359 charadriiformes species ("phylo"
). -
posteriors
The posteriors from a"MCMCglmm"
model (see example below). -
tree_distribution
A random distribution of 10 trees of the 359 charadriiformes species ("multiPhylo"
).
References
Cooney CR, Bright JA, Capp EJ, Chira AM,Hughes EC, Moody CJ, Nouri LO, Varley ZK, Thomas GH. Mega-evolutionary dynamics of the adaptive radiation of birds. Nature. 2017 Feb;542(7641):344-7.
Jetz W, Thomas GH, Joy JB, Hartmann K, Mooers AO. The global diversity of birds in space and time. Nature. 2012 Nov;491(7424):444-8.
Examples
## Not run:
## Reproducing the MCMCglmm model
require(MCMCglmm)
data(charadriiformes)
## Setting up the model parameters:
## 1 - The formula (the first three PC axes)
model_formula <- cbind(PC1, PC2, PC3) ~ trait:clade-1
## 2 - The residual term
model_residuals <- ~us(trait):units
## 3 - The random terms
## (one per clade and one for the whole phylogeny)
model_randoms <- ~ us(at.level(clade,1):trait):animal
+ us(at.level(clade,2):trait):animal
+ us(at.level(clade,3):trait):animal
+ us(trait):animal
## Flat priors for the residuals and random terms
flat_priors <- list(
## The residuals priors
R = list(
R1 = list(V = diag(3), nu = 0.002)),
## The random priors (the phylogenetic terms)
G = list(
G1 = list(V = diag(3), nu = 0.002),
G2 = list(V = diag(3), nu = 0.002),
G3 = list(V = diag(3), nu = 0.002),
G4 = list(V = diag(3), nu = 0.002)))
## Run the model for 110000 iterations
## sampled every 100 with a burnin (discard)
## of the first 10000 iterations)
model <- MCMCglmm(formula = model_formula,
rcov = model_residual,
random = model_randoms,
family = rep("gaussian", 3),
prior = flat_priors,
nitt = 110000,
burnin = 10000,
thin = 100,
pedigree = charadriiformes$tree,
data = charadriiformes$data)
## End(Not run)
Check a morphological matrix consistency levels.
Description
Performs a fast check of the phylogenetic signal in a morphological matrix using parsimony.
Usage
check.morpho(
matrix,
orig.tree,
parsimony = "fitch",
first.tree = c(phangorn::dist.hamming, phangorn::NJ),
distance = phangorn::RF.dist,
...,
contrast.matrix,
verbose = FALSE
)
Arguments
matrix |
A discrete morphological matrix. |
orig.tree |
Optional, the input tree to measure the distance between the parsimony and the original tree. |
parsimony |
Either the parsimony algorithm to be passed to |
first.tree |
A list of functions to generate the first most parsimonious tree (default = |
distance |
Optional, if orig.tree is provided, the function to use for measuring distance between the trees (default = |
... |
Any additional arguments to be passed to the parsimony algorithm. |
contrast.matrix |
An optional contrast matrix. By default, the function recognises any character state token as different apart from |
verbose |
Whether to be verbose or not ( |
Details
The
first.tree
argument must be a list of functions to be used in a cascade to transform the matrix (as aphyDat
object) into a tree using the functions iteratively. For example the defaultc(dist.hamming, NJ)
will apply the following to the matrix:NJ(dist.hamming(matrix))
Value
Returns the parsimony score (using parsimony
), the consistency and retention indices (using CI
and RI
) from the most parsimonious tree obtained from the matrix.
Can also return the topological distance from the original tree if provided.
Author(s)
Thomas Guillerme
See Also
sim.morpho
, get.contrast.matrix
, optim.parsimony
Examples
## Generating a random tree
random_tree <- rcoal(10)
## Generating a random matrix
random_matrix <- sim.morpho(random_tree, characters = 50, model = "ER",
rates = c(rgamma, 1, 1))
## Checking the matrix scores
check.morpho(random_matrix, orig.tree = random_tree)
Separating data in chronological subsets.
Description
Splits the data into a chronological (time) subsets list.
Usage
chrono.subsets(
data,
tree = NULL,
method,
time,
model,
inc.nodes = FALSE,
FADLAD = NULL,
verbose = FALSE,
t0 = FALSE,
bind.data = FALSE,
dist.data = FALSE
)
Arguments
data |
A |
tree |
|
method |
The time subsampling method: either |
time |
Either a single |
model |
One of the following models: |
inc.nodes |
A |
FADLAD |
|
verbose |
A |
t0 |
If |
bind.data |
If |
dist.data |
A |
Details
The data is considered as the multidimensional space with rows as elements and columns as dimensions and is not transformed (e.g. if ordinated with negative eigen values, no correction is applied to the matrix).
If method = "continuous"
and when the sampling is done along an edge of the tree, the data selected for the time subsets can be one of the following:
Punctuated models:
-
"acctran"
: always selecting the value from the ancestral node. -
"deltran"
: always selecting the value from the descendant node or tip. -
"random"
: randomly selecting between the ancestral node or the descendant node/tip. -
"proximity"
: selecting the ancestral node or the descendant node/tip with a probability relative to branch length.
-
Gradual models:
-
"equal.split"
: randomly selecting from the ancestral node or the descendant node or tip with a 50% probability each. -
"gradual.split"
: selecting the ancestral node or the descendant with a probability relative to branch length.
-
N.B. "equal.split"
and "gradual.split"
differ from the punctuated models by outputting a node/tip probability table rather than simply the node and the tip selected. In other words, when bootstrapping using boot.matrix
, the two former models will properly integrate the probability to the bootstrap procedure (i.e. different tips/nodes can be drawn) and the two latter models will only use the one node/tip determined by the model before the bootstrapping.
Author(s)
Thomas Guillerme
References
Guillerme T. & Cooper N. 2018. Time for a rethink: time sub-sampling methods in disparity-through-time analyses. Palaeontology. DOI: 10.1111/pala.12364.
See Also
tree.age
, slice.tree
, cust.subsets
, boot.matrix
, dispRity
.
Examples
## Load the Beck & Lee 2014 data
data(BeckLee_tree) ; data(BeckLee_mat50)
data(BeckLee_mat99) ; data(BeckLee_ages)
## Time binning (discrete method)
## Generate two discrete time bins from 120 to 40 Ma every 40 Ma
chrono.subsets(data = BeckLee_mat50, tree = BeckLee_tree, method = "discrete",
time = c(120, 80, 40), inc.nodes = FALSE, FADLAD = BeckLee_ages)
## Generate the same time bins but including nodes
chrono.subsets(data = BeckLee_mat99, tree = BeckLee_tree, method = "discrete",
time = c(120, 80, 40), inc.nodes = TRUE, FADLAD = BeckLee_ages)
## Time slicing (continuous method)
## Generate five equidistant time slices in the dataset assuming a proximity
## evolutionary model
chrono.subsets(data = BeckLee_mat99, tree = BeckLee_tree,
method = "continuous", model = "acctran", time = 5,
FADLAD = BeckLee_ages)
Cleaning phylogenetic data
Description
Cleans a table/tree to match with a given table/tree
Usage
clean.data(data, tree, inc.nodes = FALSE)
Arguments
data |
A |
tree |
A |
inc.nodes |
Logical, whether to check if the nodes in |
Details
Note if inc.nodes
is set to TRUE
, the function outputs an error message if there is no matching.
Value
A list
containing the cleaned data and tree(s) and information on the eventual dropped tips and rows.
Author(s)
Thomas Guillerme
See Also
Examples
##Creating a set of different trees
trees_list <- list(rtree(5, tip.label = LETTERS[1:5]), rtree(4,
tip.label = LETTERS[1:4]), rtree(6, tip.label = LETTERS[1:6]))
class(trees_list) <- "multiPhylo"
##Creating a matrix
dummy_data <- matrix(c(rnorm(5), runif(5)), 5, 2,
dimnames = list(LETTERS[1:5], c("var1", "var2")))
##Cleaning the trees and the data
cleaned <- clean.data(data = dummy_data, tree = trees_list)
##The taxa that where dropped (tips and rows):
c(cleaned$dropped_tips, cleaned$dropped_rows)
##The cleaned trees:
cleaned$tree
##The cleaned data set:
cleaned$data
covar.plot
Description
Visualising components of a dispRity
object with covar.
Usage
covar.plot(
data,
n,
points = TRUE,
major.axes = FALSE,
ellipses = FALSE,
level = 0.95,
dimensions = c(1, 2),
centres = colMeans,
scale,
transparent.scale,
add = FALSE,
apply.to.VCV = FALSE,
...
)
Arguments
data |
an |
n |
optional, a number of random posteriors to use. |
points |
logical, whether to plot the observed elements ( |
major.axes |
can be either logical for plotting all (or |
ellipses |
can be either logical for plotting all (or |
level |
the confidence interval level of the major axes and ellipses (default is |
dimensions |
which dimensions (default is |
centres |
optional, a way to determine ellipses or major axes positions. Can be either a |
scale |
optional, the name of a group from |
transparent.scale |
optional, if multiple major axes and/or ellipses are plotted, a scaling factor for the transparency. If left empty, the transparency is set to |
add |
logical, whether to add the plot to an existing plot ( |
apply.to.VCV |
logical, if |
... |
any graphical options to be passed to |
Details
When specifying optional arguments with ...
in a graph with multiple elements (e.g. points
, lines
, etc...) you can specify which specific element to affect using the syntax <element>.<argument>
. For example if you want everything in the plot to be in blue at the exception of the points to be red, you can use covar.plot(..., col = "blue", points.col = "red")
.
The arguments major.axes
and ellipses
can intake a function
for summarising the display of multiple variance covariance matrices (if n
is missing or greater than one). This can be any central tendency function such as mean
, median
or mode.val
.
The argument centres
allows to determine how to calculate the centre of each ellipses or major axes. The argument can be either:
A
function
to calculate the centre from a group like the defaultcolMeans
function that calculates the centroid coordinates of each group;A
numeric
value to be replicated as the coordinates for the centre of each group (e.g.centres = 0
sets all the centres at the coordinatesc(0,0,0,...)
); or a vector of numeric values to be directly used as the coordinates for each group (e.g.centres = c(1,2,3)
sets all the centres at the coordinatesc(1,2,3)
); or a list of numeric values or numeric vectors to be used as the coordinates for the centres of each group;-
"intercept"
for using the estimated posterior intercept for each sample.
NOTE that if the input contains more dimensions than the visualised dimensions (by default dimensions = c(1,2)
) the ellipses and major axes are projected from an n-dimensional space onto a 2D space which might make them look incorrect.
NOTE also that the ellipses and major axes are measured independently, when summarising both parameters (e.g. by using ellipses = mean
and major.axes = mean
), the displayed summarised major axes is not calculated from the summarised ellipse but from the coordinates of all major axes (and therefore might not match the coordinates of the ellipse).
Author(s)
Thomas Guillerme
See Also
MCMCglmm.subsets
covar.utilities
Examples
data(charadriiformes)
## Creating a dispRity object from the charadriiformes model
covar <- MCMCglmm.subsets(data = charadriiformes$data,
posteriors = charadriiformes$posteriors,
group = MCMCglmm.levels(
charadriiformes$posteriors)[1:4],
rename.groups = c("gulls", "plovers",
"sandpipers", "phylogeny"))
## Default plot
covar.plot(covar)
## Same plot with more options
covar.plot(covar, n = 50, ellipses = mean, major.axes = TRUE,
col = c("orange", "blue", "darkgreen", "grey", "grey"),
legend = TRUE, points = TRUE, points.cex = 0.2,
main = "Charadriiformes shapespace")
Utilities for a dispRity object with covariance matrices
Description
Different utility functions to extract aspects of a MCMCglmm
object.
Usage
get.covar(data, subsets, sample, n, dimensions)
axis.covar(data, subsets, sample, n, dimensions, level = 0.95, axis = 1)
Arguments
data |
a |
subsets |
optional, a |
sample |
optional, one or more specific posterior sample IDs (is ignored if n is used) or a function to summarise all axes. |
n |
optional, a random number of covariance matrices to sample (if left empty, all are used). |
dimensions |
optional, which dimensions to use. If missing the dimensions from |
level |
which confidence interval level to use (default is |
axis |
which major axis to calculate (default is |
Author(s)
Thomas Guillerme
See Also
Examples
## Load the Charadriiformes dataset
data(charadriiformes)
## Making a dispRity object with covar data
covar_data <- MCMCglmm.subsets(data = charadriiformes$data,
posteriors = charadriiformes$posteriors)
## Get the two first covar matrices for each level
get.covar(covar_data, sample = c(1,2))
## Get 2 random covar matrices in 2D for each level
get.covar(covar_data, n = 2, dimensions = c(1,2))
## Get mean covar matrix for each level
get.covar(covar_data, sample = mean)
## Get the 0.95 major axis for the 42th covar matrix
axis.covar(covar_data, sample = 42)
## Get the 0.5 major axis for 2 random samples
axis.covar(covar_data, n = 1, level = 0.5)
## Get the median 0.95 minor axis of the 2D ellipse
axis.covar(covar_data, sample = mean, dimensions = c(1,2), axis = 2)
Separates stem and crown species
Description
Selects the crown
Usage
crown.stem(tree, inc.nodes = TRUE, output.names = TRUE)
Arguments
tree |
a |
inc.nodes |
whether to include the nodes ( |
output.names |
whether to output the taxa names ( |
Author(s)
Thomas Guillerme
See Also
Examples
## A tree with fossil taxa
data(BeckLee_tree)
## Getting both crown and stem taxa lists
crown.stem(BeckLee_tree)
## Splitting the tree into two subtrees
crown_stem_trees <- crown.stem(BeckLee_tree, output.names = FALSE)
## Graphical parameters
op <- par(mfrow = c(1,3))
## Plotting the trees
plot(BeckLee_tree, main = "Full tree")
plot(crown_stem_trees$crown, main = "Crown group")
plot(crown_stem_trees$stem, main = "Stem group")
Separating data into custom subsets.
Description
Splits the data into a customized subsets list.
Usage
custom.subsets(data, group, tree = NULL, dist.data = FALSE)
Arguments
data |
A |
group |
Either a |
tree |
|
dist.data |
A |
Details
Note that every element from the input data can be assigned to multiple groups!
Author(s)
Thomas Guillerme
See Also
chrono.subsets
, boot.matrix
, dispRity
, crown.stem
.
Examples
## Generating a dummy ordinated matrix
ordinated_matrix <- matrix(data = rnorm(90), nrow = 10)
## Splitting the ordinated matrix into two groups using row numbers
custom.subsets(ordinated_matrix, list(c(1:4), c(5:10)))
## Splitting the ordinated matrix into three groups using row names
ordinated_matrix <- matrix(data = rnorm(90), nrow = 10,
dimnames = list(letters[1:10]))
custom.subsets(ordinated_matrix,
list("A" = c("a", "b", "c", "d"), "B" = c("e", "f", "g", "h", "i", "j"),
"C" = c("a", "c", "d", "f", "h")))
## Splitting the ordinated matrix into four groups using a dataframe
groups <- as.data.frame(matrix(data = c(rep(1,5), rep(2,5), rep(c(1,2), 5)),
nrow = 10, ncol = 2, dimnames = list(letters[1:10], c("g1", "g2"))))
custom.subsets(ordinated_matrix, groups)
## Splitting a matrix by clade
data(BeckLee_mat50)
data(BeckLee_tree)
custom.subsets(BeckLee_mat50, group = BeckLee_tree)
Demo datasets
Description
A set six trait spaces with different groups and different dimensions.
Details
The content of these datasets and the pipeline to build them is described in details in Guillerme et al 2020.
-
beck
A palaeobiology study of mammals. The data is a 105 dimensions ordination (PCO) of the distances between 106 mammals based on discrete morphological characters. -
wright
A palaeobiology study of crinoids. The data is a 41 dimensions ordination (PCO) of the distances between 42 crinoids based on discrete morphological characters. -
marcy
A geometric morphometric study of gophers (rodents). The data is a 134 dimensions ordination (PCA) the Procrustes superimposition of landmarks from 454 gopher skulls. -
hopkins
A geometric morphometric study of trilobites. The data is a 134 dimensions ordination (PCA) the Procrustes superimposition of landmarks from 46 trilobites cephala. -
jones
An ecological landscape study. The data is a 47 dimensions ordination (PCO) of the Jaccard distances between 48 field sites based on species composition. -
healy
A life history analysis of the pace of life in animals. The data is a 6 dimensions ordination (PCA) of 6 life history traits from 285 animal species.
Source
References
Guillerme T, Puttick MN, Marcy AE, Weisbecker V. 2020 Shifting spaces: Which disparity or dissimilarity measurement best summarize occupancy in multidimensional spaces?. Ecol Evol. 2020;00:1-16. (doi:10.1002/ece3.6452)
Beck, R. M., & Lee, M. S. (2014). Ancient dates or accelerated rates? Morphological clocks and the antiquity of placental mammals. Proceedings of the Royal Society B: Biological Sciences, 281(1793), 20141278.
Wright, D. F. (2017). Bayesian estimation of fossil phylogenies and the evolution of early to middle Paleozoic crinoids (Echinodermata). Journal of Paleontology, 91(4), 799-814.
Marcy, A. E., Hadly, E. A., Sherratt, E., Garland, K., & Weisbecker, V. (2016). Getting a head in hard soils: convergent skull evolution and divergent allometric patterns explain shape variation in a highly diverse genus of pocket gophers (Thomomys). BMC evolutionary biology, 16(1), 207.
Hopkins, M.J. and Pearson, J.K., 2016. Non-linear ontogenetic shape change in Cryptolithus tesselatus (Trilobita) using three-dimensional geometric morphometrics. Palaeontologia Electronica, 19(3), pp.1-54.
Jones, N. T., Germain, R. M., Grainger, T. N., Hall, A. M., Baldwin, L., & Gilbert, B. (2015). Dispersal mode mediates the effect of patch size and patch connectivity on metacommunity diversity. Journal of Ecology, 103(4), 935-944.
Healy, K., Ezard, T.H., Jones, O.R., Salguero-Gomez, R. and Buckley, Y.M., 2019. Animal life history is shaped by the pace of life and the distribution of age-specific mortality and reproduction. Nature ecology & evolution, p.1.
Examples
data(demo_data)
## Loading the Beck and Lee 2014 demo data
demo_data$beck
## Loading the Wright 2017 demo data
demo_data$wright
## Loading the Marcy et al. 2015 demo data
demo_data$marcy
## Loading the Hopkins and Pearson 2016 demo data
demo_data$hopkins
## Loading the Jones et al. 2015 demo data
demo_data$jones
## Loading the Healy et al. 2019 demo data
demo_data$healy
Calculates disparity from a matrix.
Description
Calculates disparity from a matrix, a list of matrices or subsets of a matrix, where the disparity metric can be user specified.
Usage
dispRity(
data,
metric,
dimensions = NULL,
...,
between.groups = FALSE,
dist.data = NULL,
verbose = FALSE,
tree = NULL
)
Arguments
data |
A matrix or a |
metric |
A vector containing one to three functions. At least of must be a dimension-level 1 or 2 function (see details). |
dimensions |
Optional, a vector of |
... |
Optional arguments to be passed to the metric. |
between.groups |
A |
dist.data |
A |
verbose |
A |
tree |
|
Details
The dispRity
object given to the data
argument can be: a list of matrices (typically output from the functions chrono.subsets
or custom.subsets
), a bootstrapped matrix output from boot.matrix
, a list of disparity measurements calculated from the dispRity
function or a matrix
object with rows as elements and columns as dimensions. In any of these cases, the data is considered as the multidimensional space and is not transformed (e.g. if ordinated with negative eigen values, no correction is applied to the matrix).
metric
should be input as a vector of functions.
The functions are sorted and used by dimension-level from 3 to 1 (see dispRity.metric
and make.metric
).
Typically dimension-level 3 functions take a matrix
and output a matrix
; dimension-level 2 functions take a matrix
and output a vector
and dimension-level 1 functions take a matrix
or a vector
and output a single value.
When more than one function is input, they are treated first by dimension-level (i.e. 3, 2 and finally 1).
Note that the functions can only take one metric of each dimension-level and thus can only take a maximum of three arguments!
Some metric functions are built into the dispRity
package: see dispRity.metric
For user specified metrics, please use make.metric
to ensure that the metric will work.
HINT: if using more than three functions you can always create your own function that uses more than one function (e.g. my_function <- function(matrix) cor(var(matrix))
is perfectly valid and allows one to use two dimension-level 3 functions - the correlation of the variance-covariance matrix in this case).
The between.groups
argument runs the disparity between groups rather within groups. If between.groups = TRUE
, the disparity will be calculated using the following inputs:
if the input is an output from
custom.subsets
, the series are run in a pairwise manner usingmetric(matrix, matrix2)
. For example for acustom.subset
contains 3 subsets m1, m2 and m3, the code loops through:metric(m1, m2)
,metric(m2, m3)
andmetric(m1, m3)
(looping throughlist(c(1,2), c(2,3), c(3,1))
).if the input is an output from
chrono.subsets
, the series are run in a paired series manner usingmetric(matrix, matrix2)
. For example for achrono.subsets
contains 3 subsets m1, m2, m3 and m4, the code loops through:metric(m1, m2)
andmetric(m2, m3)
(looping throughlist(c(1,2), c(2,3), c(3,4))
).
In both cases it is also possible to specify the input directly by providing the list to loop through. For example using between.groups = list(c(1,2), c(2,1), c(4,8))
will apply the metric
to the 1st and 2nd subsets, the 2nd and first and the 4th and 8th (in that specific order).
Value
This function outputs a dispRity
object containing at least the following:
matrix |
the multidimensional space (a list of |
call |
A |
subsets |
A |
disparity |
A |
Use summary.dispRity to summarise the dispRity
object.
Author(s)
Thomas Guillerme
See Also
custom.subsets
, chrono.subsets
, boot.matrix
, dispRity.metric
, summary.dispRity
, plot.dispRity
.
Examples
## Load the Beck & Lee 2014 data
data(BeckLee_mat50)
## Calculating the disparity as the sum of variances from a single matrix
sum_of_variances <- dispRity(BeckLee_mat50, metric = c(sum, variances))
summary(sum_of_variances)
## Bootstrapping this value
bootstrapped_data <- boot.matrix(BeckLee_mat50, bootstraps = 100)
dispRity(bootstrapped_data, metric = c(sum, variances))
## Calculating the disparity from a customised subset
## Generating the subsets
customised_subsets <- custom.subsets(BeckLee_mat50,
list(group1 = 1:(nrow(BeckLee_mat50)/2),
group2 = (nrow(BeckLee_mat50)/2):nrow(BeckLee_mat50)))
## Bootstrapping the data
bootstrapped_data <- boot.matrix(customised_subsets, bootstraps = 100)
## Calculating the sum of variances
sum_of_variances <- dispRity(bootstrapped_data, metric = c(sum, variances))
summary(sum_of_variances)
## Calculating disparity with different metrics of different dimension-levels
## Disparity is calculated as the distribution of the variances in each
## dimension (output are distributions)
disparity_level2 <- dispRity(BeckLee_mat50, metric = variances)
## Disparity is calculated as the mean of the variances in each dimension
## (output are single values)
disparity_level1 <- dispRity(disparity_level2, metric = mean)
## Both disparities have the same means but dimension-level 1 has no quantiles
summary(disparity_level2)
summary(disparity_level1)
Covar projection analyses wrapper
Description
Wrapper function for a covar projection analyses on dispRity objects
Usage
dispRity.covar.projections(
data,
type,
base,
sample,
n,
major.axis = 1,
level = 0.95,
output = c("position", "distance", "degree"),
inc.base = FALSE,
...,
verbose = FALSE
)
Arguments
data |
a |
type |
either |
base |
optional, a specific group to project the elements or the groups onto or a list of pairs of groups to compare (see |
sample |
optional, one or more specific posterior sample IDs (is ignored if n is used) or a function to summarise all axes. |
n |
optional, a random number of covariance matrices to sample (if left empty, all are used). |
major.axis |
which major axis to use (default is |
level |
the confidence interval to estimate the major axis (default is |
output |
which values to output from the projection. By default, the three values |
inc.base |
logical, when using |
... |
any optional arguments to pass to |
verbose |
logical, whether to be verbose ( |
Details
Effectively, the wrapper runs either of the following function (simplified here):
if
type = "groups"
:dispRity(data, metric = as.covar(projections.between), between.groups = TRUE, )
for the projections group indata
onto each other.if
type = "elements"
:dispRity(data, metric = as.covar(projections), ...)
for the projections of each element indata
onto their main axis.
If base
is specified:
-
type = "groups"
will run pairs elements each subset andbase
(instead of the full pairwise analyses). -
type = "elements"
will run the projection of each subset onto the major axis frombase
rather than its own.
Value
A list
of class "dispRity"
and "projection"
which contains dispRity
objects corresponding to each projection value from output
.
The elements of the list
can be accessed and analysed individually by selecting them by name (e.g. output$position
) or by ID (e.g. output[[1]]
).
Alternatively, the list can be summarised and plotted using summary.dispRity
plot.dispRity
.
Author(s)
Thomas Guillerme
References
Guillerme T, Bright JA, Cooney CR, Hughes EC, Varley ZK, Cooper N, Beckerman AP, Thomas GH. 2023. Innovation and elaboration on the avian tree of life. Science Advances. 9(43):eadg1641.
See Also
projections
projections.between
axis.covar
dispRity
MCMCglmm.subsets
Examples
data(charadriiformes)
## Creating a dispRity object with a covar component
my_covar <-MCMCglmm.subsets(
data = charadriiformes$data,
posteriors = charadriiformes$posteriors,
tree = charadriiformes$tree,
group = MCMCglmm.levels(
charadriiformes$posteriors)[1:4],
rename.groups = c("gulls", "plovers", "sandpipers", "phylo"))
## Running a projection analyses between groups (on 100 random samples)
between_groups <- dispRity.covar.projections(my_covar, type = "groups", base = "phylo", n = 100)
## Summarising the results
summary(between_groups)
## Measuring the projection of the elements on their own average major axis
elements_proj <- dispRity.covar.projections(my_covar, type = "elements", sample = mean,
output = c("position", "distance"))
## Visualising the results
plot(elements_proj)
## Visualising the correlation
plot(elements_proj, speicfic.args = list(correlation.plot = c("position", "distance")))
Fast dispRity
Description
Fast disparity calculations. THIS FUNCTION IS LESS SAFE TO USE THAN dispRity
(see details).
Usage
dispRity.fast(group, space, metric, ...)
Arguments
group |
a logical vector for grouping |
space |
a matrix |
metric |
a metric dispRity style (up to two levels) |
... |
any additional arguments to be passed to |
Details
IN DOUBT, USE dispRity
INSTEAD OF THIS FUNCTION.
This function should only be used in very specific cases requiring advanced optimisation or embedded customised functions.
This function is simply applying metric(space[group,])
for each group and returns a list of results.
It does not check the validity of the data, metric and groups.
It does not handle specific data (e.g. multiple matrices), specific metrics (e.g. no optional arguments), does not produce meaningful error messages and does not intake nor returns a dispRity object.
Author(s)
Thomas Guillerme
Examples
## A random space
space <- matrix(rnorm(25), 5, 5)
## A metric
metric <- c(sum, variances)
## A group of four observations
group <- c(TRUE, TRUE, TRUE, TRUE, FALSE)
## The disparity
dispRity.fast(group, space, metric)
Disparity metrics
Description
Different implemented disparity metrics.
Usage
dimension.level3.fun(matrix, ...)
dimension.level2.fun(matrix, ...)
dimension.level1.fun(matrix, ...)
between.groups.fun(matrix, matrix2, ...)
Arguments
matrix |
A matrix. |
... |
Optional arguments to be passed to the function. Usual optional arguments are |
matrix2 |
Optional, a second matrix for metrics between groups. |
Details
These are inbuilt functions for calculating disparity. See make.metric
for details on dimension.level3.fun
, dimension.level2.fun
, dimension.level1.fun
and between.groups.fun
. The dimensions levels (1, 2 and 3) can be seen as similar to ranks in linear algebra.
The currently implemented dimension-level 1 metrics are:
-
convhull.volume
: calculates the convex hull hypervolume of a matrix (callsconvhulln(x, options = "FA")$vol
).Both
convhull
functions call theconvhulln
function with the"FA"
option (computes total area and volume).WARNING: both
convhull
functions can be computationally intensive above 10 dimensions!
-
convhull.surface
: calculates the convex hull hypersurface of a matrix (callsconvhulln(x, options = "FA")$area
). -
diagonal
: calculates the longest distance in the ordinated space.WARNING: This function is the generalisation of Pythagoras' theorem and thus works only if each dimensions are orthogonal to each other.
-
ellipsoid.volume
: calculates the ellipsoid volume of a matrix. This function tries to determine the nature of the input matrix and uses one of these following methods to calculate the volume. You can always specify the method usingmethod = "my_choice"
to overrun the automatic method choice.-
"eigen"
: this method directly calculates the eigen values from the input matrix (usingeigen
). This method is automatically selected if the input matrix is "distance like" (i.e. square with two mirrored triangles and a diagonal). -
"pca"
: this method calculates the eigen values as the sum of the variances of the matrix (abs(apply(var(matrix),2, sum))
). This is automatically selected if the input matrix is NOT "distance like". Note that this method is faster than"eigen"
but only works if the input matrix is an ordinated matrix from a PCA, PCO, PCoA, NMDS or MDS. -
"axes"
: this method calculates the actual semi axes length using the input matrix. It is never automatically selected. By default this method calculates the length of the major axes based on the 0.95 confidence interval ellipse but this can be modified by providing additional arguments fromaxis.covar
. -
<a numeric vector>
: finally, you can directly provide a numeric vector of eigen values. This method is never automatically selected and overrides any other options.
-
-
func.div
: The functional divergence (Villeger et al. 2008): the ratio of deviation from the centroid (this is similar toFD::dbFD()$FDiv
). -
func.eve
: The functional evenness (Villeger et al. 2008): the minimal spanning tree distances evenness (this is similar toFD::dbFD()$FEve
). If the matrix used is not a distance matrix, the distance method can be passed using, for examplemethod = "euclidean"
(default). -
mode.val
: calculates the modal value of a vector. -
n.ball.volume
: calculate the volume of the minimum n-ball (ifsphere = TRUE
) or of the ellipsoid (ifsphere = FALSE
). -
roundness
: calculate the roundness of an elliptical representation of a variance-covariance matrix as the integral of the ranked distribution of the major axes. A value of 1 indicates a sphere, a value between 1 and 0.5 indicates a more pancake like representation and a value between 0.5 and 0 a more cigar like representation. You can force the variance-covariance calculation by using the optionvcv = TRUE
(default) that will calculate the variance-covariance matrix if the input is not one.
See also mean
, median
, sum
or prod
for commonly used summary metrics.
The currently implemented dimension-level 2 metrics are:
-
ancestral.dist
: calculates the distance between each elements coordinates in the matrix and their ancestors' coordinates (ifto.root = FALSE
; default) or to the root coordinates (ifto.root = TRUE
) for a giventree
. The distance is calculate as Euclidean by default but can be changed through themethods
argument (method = "euclidean"
; default). Note that the matrix must contain data for both tips and nodes in thetree
, otherwise you must provide a matrix to the argumentreference.data
that contains them. Note that if the function is used indispRity
, both thetree
andreference.data
can be automatically recycled from thedispRity
object (if present). -
angles
: calculates the angles of the main axis of variation per dimension in amatrix
. The angles are calculated using the least square algorithm from thelm
function. The unit of the angle can be changed through theunit
argument (either"degree"
(default),radian
orslope
) and a base angle to measure the angle from can be passed through thebase
argument (by defaultbase = 0
, measuring the angle from the horizontal line (note that thebase
argument has to be passed in the same unit asunit
). When estimating the slope throughlm
, you can use the optionsignificant
to only consider significant slopes (TRUE
) or not (FALSE
- default). -
centroids
: calculates the distance between each row and the centroid of the matrix (Laliberte 2010). This function can take an optional argumentscentroid
for defining the centroid (if missing (default), the centroid of the matrix is used). This argument can be either a subset of coordinates matching the matrix's dimensions (e.g.c(0, 1, 2)
for a matrix with three columns) or a single value to be the coordinates of the centroid (e.g.centroid = 0
will set the centroid coordinates toc(0, 0, 0)
for a three dimensional matrix). NOTE: distance is calculated as"euclidean"
by default, this can be changed using themethod
argument. -
count.neighbours
: counts the number of other elements neigbhouring each element within a certain radius. This function can take the optional argumentsradius
that is the radius for counting the neighbours. This can be either missing (by default this is half the longest distance), a function to calculate the distance takingx
as the sole argument (e.g.sd
orfunction(x) sum(x, na.rm = TRUE)/length(x)
) or anumeric
orinteger
value. The other option isrelative
to make the counts relative to the number of elements (relative = TRUE
; default) or not (relative = FALSE
). NOTE: distance is calculated as"euclidean"
by default, this can be changed using themethod
argument. -
deviations
: calculates the minimal Euclidean distance between each element in and the hyperplane (or line if 2D, or a plane if 3D). You can specify equation of hyperplane of d dimensions in theintercept + ax + by + ... + nd = 0
format. For example the liney = 3x + 1
should be entered asc(1, 3, -1)
or the planex + 2y - 3z = 44
asc(44, 1, 2, -3)
. If missing thehyperplane
(default) is calculated using a least square regression using a gaussianglm
. Extra arguments can be passed toglm
through...
. When estimating the hyperplane, you can use the optionsignificant
to only consider significant slopes (TRUE
) or not (FALSE
- default). -
displacements
: calculates the ratio between the distance to the centroid (seecentroids
above) and the distance from a reference (by default the origin of the space). The reference can be changed through thereference
argument. NOTE: distance is calculated as"euclidean"
by default, this can be changed using themethod
argument. -
edge.length.tree
: calculates the edge length from a given tree for each elements present in the matrix. Each edge length is either measured between the element and the root of the tree (to.root = TRUE
; default) or between the element and its last ancestor (to.root = FALSE
)) -
neighbours
: calculates the distance to a neighbour (Foote 1990). By default this is the distance to the nearest neighbour (which = min
) but can be set to any dimension level - 1 function (e.g.which = mean
gives the distance to the most average neighbour). NOTE: distance is calculated as"euclidean"
by default, this can be changed using themethod
argument. -
pairwise.dist
: calculates the pairwise distance between elements - callsvegdist(matrix, method = method, diag = FALSE, upper = FALSE, ...)
. The distance type can be changed via themethod
argument (seevegdist
- default:method = "euclidean"
). This function outputs a vector of pairwise comparisons in the following order: d(A,B), d(A,C), d(B,C) for three elements A, B and C. NOTE: distance is calculated as"euclidean"
by default, this can be changed using themethod
argument. -
projections
: projects each element on a vector defined as (point1
,point2
) and measures some aspect of this projection. The different aspects that can be measured are:-
measure = "position"
(default), the distance of each element on the vector (point1
,point2
). Negative values means the element projects on the opposite direction of the vector (point1
,point2
). -
measure = "distance"
, the euclidean distance of each element from the vector (point1
,point2
). -
measure = "degree"
, the angle between the vector (point1
,point2
) and any vector (point1
,element
) in degrees. -
measure = "radian"
, the angle between the vector (point1
,point2
) and any vector (point1
,element
) in radians. -
measure = "orthogonality"
, the angle between the vector (point1
,point2
) and any vector (point1
,element
) expressed in right angle ranging between 0 (non angle) and 1 (right angle).
By default,
point1
is the centre of the space (coordinates0, 0, 0, ...
) andpoint2
is the centroid of the space (coordinatescolMeans(matrix)
). Coordinates forpoint1
andpoint2
can be given as a single value to be repeated (e.g.point1 = 1
is translated intopoint1 = c(1, 1, ...)
) or a specific set of coordinates. Furthermore, by default, the space is scaled so that the vector (point1
,point2
) becomes the unit vector (distance (point1
,point2
) is set to 1; optionscale = TRUE
; default). You can use the unit vector of the space using the optionscale = FALSE
. Other options include the centering of the projections on 0.5 (centre = TRUE
; default is set toFALSE
) ranging the projection onto the vector (point1
,point2
) between -1 and 1 (higher or lower values project beyond the vector); and whether to output the projection values as absolute values (abs = FALSE
; default is set toFALSE
). These two last options only affect the results frommeasure = "position"
. -
-
projections.tree
: calculates theprojections
metric but drawing the vectors from a phylogenetic tree. This metric can intake any argument fromprojections
(see above) but forpoint1
andpoint2
that are replaced by the argumenttype
.type
is avector
or alist
of two elements that designates which vector to draw and can be any pair of the following options (the first element being the origin of the vector and the second where the vector points to):-
"root"
: the root of the tree (the first element in tree$node.label); -
"ancestor"
: the element's most recent ancestor; -
"tips"
: the centroid of the tips; -
"nodes"
: the centroid of all the nodes; -
"livings"
: the centroid of the tips the furthest from the root; -
"fossils"
: the centroid of all the tips that are not the furthest from the root; any numeric values that can be interpreted as
point1
andpoint2
inprojections
;or a user defined function that with the inputs
matrix
andtree
androw
(the element's ID, i.e. the row number inmatrix
).
NOTE: the elements to calculate the origin and end points of the vector are calculated by default on the provided input
matrix
which can be missing data from the tree if used withcustom.subsets
orchrono.subsets
. You can always provide the full matrix using the optionreference.data = my_matrix
. Additional arguments include any arguments to be passed toprojections
(e.g.centre
orabs
). -
-
quantiles
: calculates the quantile range of each axis of the matrix. The quantile can be changed using thequantile
argument (default isquantile = 95
, i.e. calculating the range on each axis that includes 95% of the data). An optional argument,k.root
, can be set toTRUE
to scale the ranges by using itskth
root (wherek
are the number of dimensions). By default,k.root = FALSE
. -
radius
: calculates a distance from the centre of each axis. Thetype
argument is the function to select which distance to calculate. By defaulttype = max
calculates the maximum distance between the elements and the centre for each axis (i.e. the radius for each dimensions) -
ranges
: calculates the range of each axis of the matrix (Wills 2001). An optional argument,k.root
, can be set toTRUE
to scale the ranges by using itskth
root (wherek
are the number of dimensions). By default,k.root = FALSE
. -
variances
: calculates the variance of each axis of the matrix (Wills 2001). This function can also take thek.root
optional argument described above. -
span.tree.length
: calculates the length of the minimum spanning tree (seespantree
). This function can get slow with big matrices. To speed it up, one can directly use distance matrices as the multidimensional space.
The currently implemented between.groups metrics are:
-
disalignment
: calculates the rejection of a point frommatrix
from the major axis ofmatrix2
. Options are,axis
to choose which major axis to reject from (default isaxis = 1
);level
for the ellipse' confidence interval (to calculate the axis) (default islevel = 0.95
) andpoint.to.reject
, a numeric value for designating which point inmatrix
to use or a function for calculating it (default ispoint.to.reject = colMeans
formatrix
's centroid). -
group.dist
: calculates the distance between two groups (by default, this is the minimum euclidean vector norm distance between groups). Negative distances are considered as 0. This function must intake two matrices (matrix
andmatrix2
) and the quantiles to consider. For the minimum distance between two groups, the 100th quantiles are considered (default:probs = c(0,1)
) but this can be changed to any values (e.g. distance between the two groups accounting based on the 95th CI:probs = c(0.025, 0.975)
; distance between centroids:probs = c(0.5)
, etc...). This function is the linear algebra equivalent of thehypervolume::hypervolume_distance
function. -
point.dist
: calculates the distance betweenmatrix
and a point calculated frommatrix2
. By default, this point is the centroid ofmatrix2
. This can be changed by passing a function to be applied tomatrix2
through thepoint
argument (for example, for the centroid:point.dist(..., point = colMeans)
). NOTE: distance is calculated as"euclidean"
by default, this can be changed using themethod
argument. -
projections.between
: calculates the projection of the major axis between two matrices. It allows the same arguments asprojections
. This function measures the major axis from both input matrices, centre their origins and projects the end of the vector ofmatrix
onto the vector frommatrix2
. Which axis to measure can be changed with the optionaxis
(for the major axis,axis = 1
; default) and the confidence interval can be changed usinglevel
(for the 95 confidence interval,level = 0.95
; default - seeaxis.covar
for more details).
When used in the dispRity
function, optional arguments are declared after the metric
argument: for example
dispRity(data, metric = centroids, centroid = 0, method = "manhattan")
Author(s)
Thomas Guillerme
References
Donohue I, Petchey OL, Montoya JM, Jackson AL, McNally L, Viana M, Healy K, Lurgi M, O'Connor NE, Emmerson MC. 2013. On the dimensionality of ecological stability. Ecology letters. 16(4):421-9.
Lalibert'e E, Legendre P. 2010. A distance-based framework for measuring functional diversity from multiple traits. Ecology, 91(1), pp.299-305.
Vill'eger S, Mason NW, Mouillot D. 2008. New multidimensional functional diversity indices for a multifaceted framework in functional ecology. Ecology. 89(8):2290-301.
Wills MA. 2001. Morphological disparity: a primer. In Fossils, phylogeny, and form (pp. 55-144). Springer, Boston, MA.
Foote, M. 1990. Nearest-neighbor analysis of trilobite morphospace. Systematic Zoology, 39(4), pp.371-382.
Guillerme T, Puttick MN, Marcy AE, Weisbecker V. 2020. Shifting spaces: Which disparity or dissimilarity measurement best summarize occupancy in multidimensional spaces?. Ecology and evolution. 10(14):7261-75.
See Also
dispRity
and make.metric
.
Examples
## A random matrix
dummy_matrix <- matrix(rnorm(90), 9, 10)
## ancestral.dist
## A random tree with node labels
rand_tree <- rtree(5) ; rand_tree$node.label <- paste0("n", 1:4)
## Adding the tip and node names to the matris
rownames(dummy_matrix) <- c(rand_tree$tip.label, rand_tree$node.label)
## Calculating the distances to the ancestors
ancestral.dist(dummy_matrix, tree = rand_tree)
## Calculating the manhattan distances to the root
ancestral.dist(dummy_matrix, tree = rand_tree,
to.root = TRUE, method = "manhattan")
## angles
## The angles in degrees of each axis
angles(dummy_matrix)
## The angles in slope from the 1:1 slope (Beta = 1)
angles(dummy_matrix, unit = "slope", base = 1)
## centroids
## Distances between each row and centroid of the matrix
centroids(dummy_matrix)
## Distances between each row and an arbitrary point
centroids(dummy_matrix, centroid = c(1,2,3,4,5,6,7,8,9,10))
## Distances between each row and the origin
centroids(dummy_matrix, centroid = 0)
## convhull.surface
## Making a matrix with more elements than dimensions (for convhull)
thinner_matrix <- matrix(rnorm(90), 18, 5)
## Convex hull hypersurface of a matrix
convhull.surface(thinner_matrix)
## convhull.volume
## Convex hull volume of a matrix
convhull.volume(thinner_matrix)
## count.neigbhours
## Counting the number of neighbours within a radius of half the traitspace
count.neighbours(dummy_matrix)
## The absolute number of neighbours within a radius of 3
count.neighbours(dummy_matrix, radius = 3, relative = FALSE)
## The relative number of neighbours within a radius of one standard deviation
count.neighbours(dummy_matrix, radius = sd, relative = FALSE)
## deviations
## The deviations from the least square hyperplane
deviations(dummy_matrix)
## The deviations from the plane between the x and y axis
deviations(dummy_matrix, hyperplane = c(0,1,1,0,0,0,0,0,0,0,0))
## diagonal
## Matrix diagonal
diagonal(dummy_matrix) # WARNING: only valid if the dimensions are orthogonal
## disalignment
## Two dummy matrices
matrix_1 <- matrix(rnorm(16), 4, 4)
matrix_2 <- matrix(rnorm(16), 4, 4)
## Measuring the disalignment of matrix_1 from matrix_2
disalignment(matrix_1, matrix_2)
## Same but using the 2nd major axis of the 0.75 CI ellipse
## from matrix_2 and the first point from matrix_1.
disalignment(matrix_1, matrix_2,
axis = 2, level = 0.75,
point.to.reject = 1)
## displacements
## displacement ratios (from the centre)
displacements(dummy_matrix)
## displacement ratios (from an arbitrary point)
displacements(dummy_matrix, reference = c(1,2,3,4,5,6,7,8,9,10))
## displacement ratios from the centre (manhattan distance)
displacements(dummy_matrix, method = "manhattan")
## edge.length.tree
## Making a dummy tree with node labels
dummy_tree <- makeNodeLabel(rtree((nrow(dummy_matrix)/2)+1))
## Naming the elements in the matrix
named_matrix <- dummy_matrix
rownames(named_matrix) <- c(dummy_tree$tip.label,
dummy_tree$node.label)
## The total edge length of each element in the matrix (to the root)
edge.length.tree(named_matrix, tree = dummy_tree)
## The edge lengths for each edge leading to the elements in the matrix
edge.length.tree(named_matrix, tree = dummy_tree, to.root = FALSE)
## ellipsoid.volume
## Ellipsoid volume of a matrix
ellipsoid.volume(dummy_matrix)
## Calculating the same volume with provided eigen values
ordination <- prcomp(dummy_matrix)
## Calculating the ellipsoid volume by providing your own eigen values
ellipsoid.volume(ordination$x, method = ordination$sdev^2)
## func.div
## Functional divergence
func.div(dummy_matrix)
## func.eve
## Functional evenness
func.eve(dummy_matrix)
## Functional evenness (based on manhattan distances)
func.eve(dummy_matrix, method = "manhattan")
## group.dist
## The distance between groups
dummy_matrix2 <- matrix(runif(40, min = 2, max = 4), 4, 10)
## The minimum distance between both groups
group.dist(dummy_matrix, dummy_matrix2)
## The distance between both groups' centroids
group.dist(dummy_matrix, dummy_matrix2, probs = 0.5)
## The minimum distance between the 50% CI of each group
group.dist(dummy_matrix, dummy_matrix2, probs = c(0.25, 0.75))
## mode.val
## Modal value of a vector
mode.val(dummy_matrix)
## neighbours
## The nearest neighbour euclidean distances
neighbours(dummy_matrix)
## The furthest neighbour manhattan distances
neighbours(dummy_matrix, which = max, method = "manhattan")
## pairwise.dist
## The pairwise distance
pairwise.dist(dummy_matrix)
## The average squared pairwise distance
mean(pairwise.dist(dummy_matrix)^2)
## equal to:
# geiger::disparity(data = dummy_matrix)
## point.dist
## The distances from the rows dummy_matrix
## to the centroids of dummy_matrix2
point.dist(dummy_matrix, dummy_matrix2)
## The average distances from dummy_matrix
## to the centroids of dummy_matrix2
mean(point.dist(dummy_matrix, dummy_matrix2))
## The manhattan distance from the rows dummy_matrix
## to the standard deviation of dummy_matrix2
point.dist(dummy_matrix, dummy_matrix2, point = sd, method = "manhattan")
## projections
## The distances on the vector defined from the centre of
## the matrix to its centroid (default)
projections(dummy_matrix)
## The distances from the vector defined from the third
## element of the matrix to the point of coordinated
## c(1,1,1, ...) the matrix to its centroid (default)
projections(dummy_matrix, measure = "distance",
point1 = dummy_matrix[3, ],
point2 = 1)
## projections.tree
## Making a dummy tree with node labels
dummy_tree <- makeNodeLabel(rtree((nrow(dummy_matrix)/2)+1))
## Naming the elements in the matrix
named_matrix <- dummy_matrix
rownames(named_matrix) <- c(dummy_tree$tip.label,
dummy_tree$node.label)
## The projection on the vector defined from the root of
## the tree to the ancestor of each element in the matrix
projections.tree(named_matrix, dummy_tree,
type = c("root", "ancestor"))
## The rejection from the vector defined from the centroid
## of the nodes to the centroids of the tips
projections.tree(named_matrix, dummy_tree,
type = c("nodes", "tips"),
measure = "distance")
## A user function that define coordinates based on the
## centroid of the three first nodes
user.fun <- function(matrix, tree, row = NULL) {
return(colMeans(matrix[tree$node.label[1:3], ]))
}
## The projection on the vector defined by the coordinates
## 0,0,0 and a user defined function
projections.tree(named_matrix, dummy_tree,
type = c(0, user.fun))
## projections.between
## Two dummy matrices
matrix_1 <- matrix(rnorm(16), 4, 4)
matrix_2 <- matrix(rnorm(16), 4, 4)
## Projecting the major axis of matrix_2 onto the one from matrix_1
projections.between(matrix_1, matrix_2)
## Projecting both second major 0.75 axes
## and getting the rejections (see projections() for option details)
projections.between(matrix_1, matrix_2,
measure = "distance",
axis = 2, level = 0.75)
## quantiles
## The 95 quantiles
quantiles(dummy_matrix)
## The 100 quantiles (which are equal to the ranges)
quantiles(dummy_matrix, quantile = 100) == ranges(dummy_matrix) # All TRUE
## radius
## The maximal radius of each axis (maximum distance from centre of each axis)
radius(dummy_matrix)
## ranges
## ranges of each column in a matrix
ranges(dummy_matrix)
## ranges of each column in the matrix corrected using the kth root
ranges(dummy_matrix, k.root = TRUE)
## roundness
## calculating the variance-covariance of the dummy_matrix
vcv <- var(dummy_matrix)
## calculating the roundness of it
roundness(vcv)
## calculating the roundness of the dummy matrix by calculating the vcv
roundness(dummy_matrix, vcv = TRUE)
## span.tree.length
## Minimum spanning tree length (default)
span.tree.length(dummy_matrix)
## Minimum spanning tree length from a distance matrix (faster)
distance <- as.matrix(dist(dummy_matrix))
span.tree.length(distance)
## Minimum spanning tree length based on Manhattan distance
span.tree.length(dummy_matrix, method = "manhattan")
span.tree.length(as.matrix(dist(dummy_matrix, method = "manhattan"))) # Same
## variances
## variances of a each column in the matrix
variances(dummy_matrix)
## variances of a each column in the matrix corrected using the kth root
variances(dummy_matrix, k.root = TRUE)
Disparity in different groups.
Description
Performs a disparity analysis between groups.
Usage
dispRity.per.group(data, group, metric = c(median, centroids), ...)
Arguments
data |
An ordinated |
group |
A |
metric |
A vector containing one to three functions (default = |
... |
Optional arguments to be passed to |
Details
Note that this is a wrapper function that allows users to run a basic disparity among groups analysis without too much effort.
As such it has a lot of defaults described in the functions that make up the analysis.
See custom.subsets
, boot.matrix
, dispRity.metric
, summary.dispRity
, plot.dispRity
for more details of the defaults used in each of these functions.
Note that any of these defaults can be changed within the disparity.through.time
function.
Value
A dispRity
object that can be passed to summary
or plot
.
Author(s)
Thomas Guillerme
See Also
custom.subsets
, boot.matrix
, dispRity.metric
, summary.dispRity
, plot.dispRity
.
Examples
## Load the Beck & Lee 2014 data
data(BeckLee_mat50)
## Run a simple disparity per group analysis comparing stem and crown mammals
result <- dispRity.per.group(BeckLee_mat50, list(crown = c(16, 19:41, 45:50),
stem = c(1:15, 17:18, 42:44)))
summary(result) ; plot(result)
## This is equivalent to run the following decomposed code
dispRity(boot.matrix(custom.subsets(BeckLee_mat50, list(crown = c(16, 19:41, 45:50),
stem = c(1:15, 17:18, 42:44))),
bootstraps = 100),
metric = c(median, centroids))
Disparity through time.
Description
Performs a disparity through time analysis.
Usage
dispRity.through.time(data, tree, time, metric = c(median, centroids), ...)
Arguments
data |
An ordinated |
tree |
A |
time |
A |
metric |
A vector containing one to three functions (default = |
... |
Optional arguments to be passed to |
Details
By default the time subsets use method = "discrete"
, the matrix is bootstrapped 100 times.
Note that this is a wrapper function that allows users to run a basic disparity-through-time analysis without too much effort.
As such it has a lot of defaults described in the functions that make up the analysis.
See chrono.subsets
, boot.matrix
, dispRity.metric
, summary.dispRity
, plot.dispRity
for more details of the defaults used in each of these functions.
Note that any of these defaults can be changed within the disparity.through.time
function.
Value
A dispRity
object that can be passed to summary
or plot
.
Author(s)
Thomas Guillerme
See Also
chrono.subsets
, boot.matrix
, dispRity.metric
, summary.dispRity
, plot.dispRity
.
Examples
## Load the Beck & Lee 2014 data
data(BeckLee_mat50) ; data(BeckLee_tree)
## Run a simple disparity through time analysis (with three time bins)
result <- dispRity.through.time(BeckLee_mat50, BeckLee_tree, 3)
summary(result) ; plot(result)
## This is equivalent to run the following decomposed code
dispRity(boot.matrix(chrono.subsets(BeckLee_mat50, BeckLee_tree, time = 3, method = "discrete"),
bootstraps = 100),
metric = c(median, centroids))
disparity
Description
An example of a dispRity
object.
Format
one dispRity
object.
Details
This matrix is based on the BeckLee
dataset and split into seven continuous subsets (chrono.subsets
).
It was bootstrapped 100 times (boot.matrix
) with four rarefaction levels.
Disparity was calculated as the median
of the centroids
(dispRity
).
See Also
BeckLee_disparity BeckLee
Examples
## Not run:
## Loading the data
data(BeckLee_mat99)
data(BeckLee_tree)
data(BeckLee_ages)
## Creating the 7 subsets
subsets <- chrono.subsets(BeckLee_mat99, BeckLee_tree,
time = seq(from = 30, to = 90, by = 10),
method = "continuous", model = "ACCTRAN",
FADLAD = BeckLee_ages)
## Bootstrapping and rarefying
bootstraps <- boot.matrix(subsets, bootstraps = 100,
rarefaction = c(20, 15, 10, 5))
## Calculating disparity
disparity <- dispRity(bootstraps, metric = c(median, centroids))
## End(Not run)
Randtest distance
Description
Measures the distance between the observed statistic from a "randtest"
object and some specific quantile of the simulated data.
Usage
distance.randtest(xtest, quantile = c(0.025, 0.975), abs = FALSE)
Arguments
xtest |
an object of class |
quantile |
a |
abs |
|
Details
To compare the observed value to the simulated median value, you can use quantile = 0.5
.
Also note that when using abs = FALSE
(default), a negative value means that the observed statistic is within the request quantiles.
Author(s)
Thomas Guillerme
See Also
Examples
## Simple example
dummy_matrix <- matrix(rnorm(500), 100, 5)
## Testing whether the mean of a random subset
## is different than the means of 100 subsets
dummy_test <- randtest.dispRity(dummy_matrix,
subset = sample(1:100, 20),
metric = mean)
dummy_test ; plot(dummy_test)
## The distance between the observed data and the 95% quantile
distance.randtest(dummy_test)
## The absolute distance from the median
distance.randtest(dummy_test, quantile = 0.5, abs = TRUE)
dtt dispRity (from geiger::dtt
)
Description
A wrapper for the geiger::dtt
function working with any disparity metric.
Usage
dtt.dispRity(
data,
metric,
tree,
nsim = 0,
model = "BM",
alternative = "two-sided",
scale.time = TRUE,
...
)
Arguments
data |
A |
metric |
The disparity metric to be passed to |
tree |
A |
nsim |
The number of simulations to calculate null disparity-through-time. |
model |
A evolutionary model for the simulations (see |
alternative |
The H1 alternative (for calculating the p-value). Can be |
scale.time |
Optional, whether to scale the time (between 0 and 1; |
... |
Any other arguments to be passed to |
Details
See geiger::dtt
for details.
Author(s)
Thomas Guillerme
See Also
test.dispRity
, custom.subsets
, chrono.subsets
, plot.dispRity
.
Examples
## Loading morphological data and a tree
data(BeckLee_mat50)
data(BeckLee_tree)
## The average squared pairwise distance metric (used in geiger::dtt)
average.sq <- function(X) mean(pairwise.dist(X)^2)
## Calculate the disparity of the dataset using dtt.dispRity
dispRity_dtt <- dtt.dispRity(data = BeckLee_mat50, metric = average.sq,
tree = BeckLee_tree, nsim = 20)
## Plotting the results
plot(dispRity_dtt)
Getting the time subsets before and after an extinction event
Description
Getting the reference (pre-extinction) and the comparison (post-extinction) time subsets
Usage
extinction.subsets(data, extinction, lag = 1, names = FALSE, as.list = FALSE)
Arguments
data |
a |
extinction |
|
lag |
|
names |
|
as.list |
|
Author(s)
Thomas Guillerme
See Also
Examples
## Loading some disparity data
data(disparity)
## Time subsets for the K-Pg extinction (66 Mya)
extinction.subsets(disparity, 66, names = TRUE)
## Extinction with a lag effect of 3 slices
extinction_time <- extinction.subsets(disparity, 66, lag = 3, as.list = TRUE)
## Testing the extinction effect with a lag
test.dispRity(disparity, wilcox.test, comparisons = extinction_time,
correction = "bonferroni")
Imports data from geomorph
Description
Takes geomorph Procrustes object or a geomorph.data.frame object and ordinates it.
Usage
geomorph.ordination(data, ordinate = TRUE, ...)
Arguments
data |
An array (p x k x n) typically obtained from a Procrustes superimposition |
ordinate |
Logical, whether to ordinate the data using |
... |
Any optional arguments to be passed to |
Details
If data
is a geomorph.data.frame
object containing factors, directly performs a custom.subsets
using these factors.
Value
A matrix
or a dispRity
object.
See Also
geomorph::gpagen
, geomorph::morphol.disparity
, prcomp
, custom.subsets
, chrono.subsets
, boot.matrix
, dispRity
.
Examples
## Not run:
require(geomorph)
## Loading the plethodon dataset
data(plethodon)
## Performing a Procrustes transform
procrustes <- geomorph::gpagen(plethodon$land, PrinAxes = FALSE)
## Obtaining the ordination matrix
geomorph.ordination(procrustes)
## Using a geomorph.data.frame
geomorph_df <- geomorph.data.frame(procrustes, species = plethodon$species)
geomorph.ordination(geomorph_df)
## Calculating disparity from dispRity or geomorph::morphol.disparity
geomorph_disparity <- geomorph::morphol.disparity(coords ~ 1,
groups= ~ species, data = geomorph_df)
dispRity_disparity <- dispRity(geomorph.ordination(geomorph_df),
metric = function(X) return(sum(X^2)/nrow(X)))
## Extracting the raw disparity values
geomorph_val <- round(as.numeric(geomorph_disparity$Procrustes.var), 15)
dispRity_val <- as.vector(summary(dispRity_disparity, digits = 15)$obs)
## Comparing the values (to the 15th decimal!)
geomorph_val == dispRity_val # all TRUE
## End(Not run)
Get time bins ages
Description
Gets time bins for a specific tree using stratigraphy
Usage
get.bin.ages(tree, what = "End", type = "Age", ICS = 2015)
Arguments
tree |
A |
what |
Which data to output. Can be |
type |
The type of stratigraphic frame. Can be |
ICS |
The reference year of the International Commission on Stratigraphy (default = |
Author(s)
Thomas Guillerme
See Also
Examples
## Loading the data
data(BeckLee_tree)
data(BeckLee_mat50)
## Getting the stratigraphic data
stratigraphy <- get.bin.ages(BeckLee_tree)
## Making stratigraphic time subsets
chrono.subsets(BeckLee_mat50, tree = BeckLee_tree, method = "discrete",
time = stratigraphy)
Generates a contrast matrix.
Description
Creates a contrast matrix using the observed character states in an input matrix.
Usage
get.contrast.matrix(matrix)
Arguments
matrix |
a discrete morphological character matrix. |
Author(s)
Thomas Guillerme
See Also
Examples
## A random multistate matrix
random_matrix <- matrix(sample(c(0,1,2), 100, TRUE), 10, 10)
## Get the contrast matrix
get.contrast.matrix(random_matrix)
## Adding inapplicable and missing data to the matrix
random_matrix[sample(1:100, 10)] <- "?"
random_matrix[sample(1:100, 10)] <- "-"
## Get the contrast matrix
get.contrast.matrix(random_matrix)
Extract elements from a dispRity
object.
Description
Extract a matrix or the disparity results from a dispRity
.
Usage
get.matrix(data, subsets, rarefaction, bootstrap, matrix)
get.disparity(data, subsets, rarefaction, observed, concatenate)
Arguments
data |
A |
subsets |
Optional, a |
rarefaction |
Optional, a single |
bootstrap |
Optional, a |
matrix |
A |
observed |
A |
concatenate |
When the disparity metric is a distribution, whether to concatenate it returning the median ( |
Author(s)
Thomas Guillerme
See Also
Examples
## Load the disparity data based on Beck & Lee 2014
data(disparity)
## To get the original matrix
get.matrix(disparity)
## To get the un-bootstrapped matrix from the subset called "80"
get.matrix(disparity, subsets = "80")
## To get the 52nd bootstrap draw of the second rarefaction level (15) of the
## same subset
get.matrix(disparity, subsets = 2, rarefaction = 2, bootstrap = 52)
## Extracting the observed disparity
get.disparity(disparity)
## Extracting the bootstrapped disparity
boot_disp <- get.disparity(disparity, observed = FALSE)
str(boot_disp)
## Or only the rarefied (5) data
boot_disp_rare <- get.disparity(disparity, observed = FALSE,
rarefaction = 5)
Extracts or modify subsets from a dispRity
object.
Description
Extracting or modify some subsets' data and information from a dispRity
object.
Usage
n.subsets(data)
name.subsets(data)
size.subsets(data)
get.subsets(data, subsets)
combine.subsets(data, subsets)
Arguments
data |
A |
subsets |
Either a |
Details
For the function combine.subsets
, the argument subsets
can ALSO be a numeric
value of the minimum of elements for each series.
If subset
is a vector, the subsets are merged in the given input order. c(1, 3, 4)
will merge subsets 1 and 3 into 4, while the opposite, c(3, 4, 1)
will merge subsets 3 and 4 into 1.
When a single numeric value is given, subsets are merged with the next subset until the correct number of elements for each subset is reached (apart from the last subset that gets merged with the previous one).
Author(s)
Thomas Guillerme
See Also
Examples
## Load the disparity data based on Beck & Lee 2014
data(disparity)
## How many subsets are in disparity?
n.subsets(disparity)
## What are the subset names
name.subsets(disparity)
## What are the number of elements per subsets?
size.subsets(disparity)
## Get one subset
get.subsets(disparity, "60")
## Get two subsets
get.subsets(disparity, c(1,5))
## Generate subsets from a dummy matrix
dummy_matrix <- matrix(rnorm(120), 40, dimnames = list(c(1:40)))
dummy_subsets <- custom.subsets(dummy_matrix,
group = list("a" = c(1:5), "b" = c(6:10), "c" = c(11:20),
"d" = c(21:24), "e" = c(25:30), "f" = c(31:40)))
## Merging the two first subsets
combine.subsets(dummy_subsets, c(1,2))
## Merging the three subsets by name
combine.subsets(dummy_subsets, c("d", "c", "e"))
## Merging the subsets to contain at least 20 taxa
combine.subsets(dummy_subsets, 10)
Make and fill dispRity
.
Description
Creating an empty dispRity
object from a matrix
Usage
make.dispRity(data, tree, call, subsets)
fill.dispRity(data, tree, check)
remove.dispRity(data, what)
Arguments
data |
A |
tree |
Optional, a |
call |
Optional, a |
subsets |
Optional, a |
check |
Logical, whether to check the data ( |
what |
Which elements to remove. Can be any of the following: |
Details
When using remove.dispRity
, the function recursively removes any other data depending on "what"
.
For example, for a data with disparity calculated for bootstrapped subsets, removing the subsets (what = "subsets"
) also removes the bootstraps and the disparity data.
But removing the bootstraps (what = "bootstraps"
) removes only the bootstraps draws and the disparity relating to the bootstraps (but keeps the subsets and the non-bootstrapped disparity values).
Author(s)
Thomas Guillerme
Examples
## An empty dispRity object
make.dispRity()
## Still an empty dispRity object (with a matrix)
(empty <- make.dispRity(data = matrix(rnorm(12), ncol = 3)))
## A dispRity object with a matrix of 4*3
fill.dispRity(empty)
## A dispRity object with a tree
my_tree <- rtree(4, tip.label = c(1:4))
fill.dispRity(empty, tree = my_tree)
Creating disparity metrics
Description
Testing the dimension-level of disparity metrics
Usage
make.metric(
fun,
...,
silent = FALSE,
check.between.groups = FALSE,
data.dim,
tree = NULL,
covar = FALSE,
get.help = FALSE
)
Arguments
fun |
A |
... |
Some arguments to be passed to |
silent |
|
check.between.groups |
|
data.dim |
optional, two |
tree |
optional, a |
covar |
|
get.help |
|
Details
This function tests:
1: if your function can deal with a matrix as an
input
.2: which dimension-level is your function (1, 2 or 3, see
dispRity.metric
).3: whether the function can properly be implemented in the
dispRity
function.
The three different metric levels correspond to the dimensions of the output and are:
"dimension-level 1": for functions that decompose a
matrix
into a single value."dimension-level 2": for functions that decompose a
matrix
into avector
."dimension-level 3": for functions that transform the
matrix
into anothermatrix
.
For example, the disparity metric sum
of variances
is composed of two metric dimension-levels:
The
variances
(dimension-level 2) that calculates the variances for each column in a matrix (aggregates amatrix
into avector
).The
sum
(dimension-level 1) that transforms thevector
of variances into a single value.
See function example for a concrete illustration (three different dimension-levels of the function sum
).
HINT: it is better practice to name the first argument of fun
matrix
to avoid potential argument conflicts down the line (the dispRity
function assumes the matrix
argument for the parsing the metrics).
The input fun
can be a "normal" metric function (i.e. that takes a matrix as first argument) or a "between.groups" metric (i.e. that takes two matrix as arguments). If the arguments are named matrix
and matrix2
, the metric will be assumed to be "between.groups" and be run in a for
loop rather than a apply
loop in dispRity
.
Author(s)
Thomas Guillerme
See Also
Examples
## A dimension-level 1 function
my_fun <- function(matrix) sum(matrix)
make.metric(my_fun)
## A dimension-level 2 function
my_fun <- function(matrix) apply(matrix, 2, sum)
make.metric(my_fun)
## A dimension-level 3 function
my_fun <- function(matrix) (matrix + sum(matrix))
make.metric(my_fun)
Match tips or nodes edge vector
Description
Match a vector of tips or tips and nodes with the an edge list from a "phylo"
or "multiPhylo"
.
Usage
match.tip.edge(
vector,
phylo,
replace.na,
use.parsimony = TRUE,
to.root = FALSE
)
Arguments
vector |
a vector of variables (equal to the number of tips or to the number of tips and nodes) or a vector of tips and nodes names or IDs. |
phylo |
a phylo or multiPhylo object. |
replace.na |
optional, what to replace NAs with. |
use.parsimony |
logical, whether to also colour internal edges parsimoniously ( |
to.root |
logical, if |
Value
If the input vector
is a vector of variables, the function returns a vector of variables equal to the number of edges in the tree (or a list of vectors if the phylo
input is of class "multiPhylo"
). Else it returns an integer
vector for the selected edges.
Author(s)
Thomas Guillerme
Examples
## A random tree
tree <- rtree(20)
## A random vector of two variables for each tips
tip_values <- sample(c("blue", "red"), 20, replace = TRUE)
## Matching the colors (blue and red) to the tips descendants
edge_colors <- match.tip.edge(tip_values, tree, replace.na = "grey")
## Plotting the results
plot(tree, show.tip.label = FALSE, edge.color = edge_colors)
tiplabels(1:20, bg = tip_values)
## Same but without assuming parsimony for the internal nodes
plot(tree, show.tip.label = FALSE,
edge.color = match.tip.edge(tip_values, tree,
use.parsimony = FALSE,
replace.na = "grey"))
## Matching the tips and nodes colors with the edges
node_values <- sample(c("blue", "red"), 19, replace = TRUE)
edge_colors <- match.tip.edge(c(tip_values, node_values), tree)
plot(tree, show.tip.label = FALSE, edge.color = edge_colors)
tiplabels(1:20, bg = tip_values)
nodelabels(1:19, bg = node_values)
## Matching the tips and nodes colours to the root
data(bird.orders)
## Getting the bird orders starting with a "C"
some_orders <- sort(bird.orders$tip.label)[4:9]
## Get the edges linking these orders
edges_of_interest <- match.tip.edge(vector = some_orders,
phylo = bird.orders)
## Create a colour vector for all edges
all_edges <- rep("grey", Nedge(bird.orders))
## Replacing the edges of interest by another colour
all_edges[edges_of_interest] <- "black"
## Plot the results
plot(bird.orders, edge.color = all_edges)
Model Test
Description
Fit models of disparity change through time
Usage
model.test(
data,
model,
pool.variance = NULL,
time.split = NULL,
fixed.optima = FALSE,
control.list = list(fnscale = -1),
verbose = TRUE
)
Arguments
data |
A |
model |
The model(s) of evolution to allow for changes in disparity-through-time using a homogenous or hetergenous model, either using a single input or a list containing different models (See Details). If a vector with multiple modes is supplied then the model will test for shifts in modes at the time supplied by |
pool.variance |
If |
time.split |
The age of the change in mode ( |
fixed.optima |
A |
control.list |
A |
verbose |
|
Details
DISCLAIMER: this function is working properly (i.e. it does what it is supposed to do), however, the interpretation of the results has not yet been thought through, discussed and peer-reviewed (what does a Brownian motion like disparity curve means biologically?).
The models are fit using maximum likelihood optimisation using the function optim. Fine-tuning of the search algorithms can be applied using the control.list
argument. Models can be fit using a homogenous model with the same process applied to the entire sequence or models with time splits that represent a change in parameters or a shift in mode. When a heterogeneous and/or a time-shift model is specified with a specified time.split
then the shift is tested at that value only. If no time shift is supplied then multiple shift times are tested, with all bins that allow for at least 10 bins either side of the split. If the entire sample is fewer than 30 samples long then no time splits are searched for (unless a time split is supplied by the user). Parameters are shared across different modes. For example, c("BM", "OU")
would fit a model in which the process starts with a BM model and shifts to an OU process. The ancestral value at the start of the sequence and sigma squared value are shared across the models. Any combination of the following homogenous models (with the exception of "multi.OU"
) can be fit to the data:
BM: Fits a unbiased random walk model of Brownian motion evolution (Felsenstein 1973; 1985; Hunt 2006). The model optimises the ancestral state and the 'step-variance' (sigma-squared).
OU: The Ornstein-Uhlenbeck model of evolution in which the change in variance is constrained to an optimum value (Hansen 1997). In this model there are three parameters: optima, alpha, and ancestral state. The strength of attraction based on the parameter alpha and the ancestral state is estimated from the data. The optima value is estimated from the data, and this can lead to optima being found outside the known data values, and thus the model can resemble a trend. If the argument
fixed.optima = TRUE
, the model will not estimate optima but constrain it to the first (ancestral) value in the sequence as is done in phylogenetic OU models.Trend: Fits a Brownian motion model with a directional component. This model is also known as the General Random Walk (Hunt 2006). This model has three parameters: the ancestral state, the 'step-variance' (sigma-squared), and the positive or negative trend.
Stasis: Fits a model in which traits evolve with variance (omega) around a mean (theta). This model is time-independent in that the model is guided only by the variance and attraction to the mean (Hunt 2006).
EB: Early-Burst, trait variance accumulates early in the evolution of a trait and decreases exponentially through time (Blomberg et al. 2003; Harmon et al. 2010). This model has three parameters: ancestral state, sigma-squared, and the exponential rate of decrease. Note this model expects the mean to remain unchanged through the model, so does not explicitly model a rapid change to a new mean or optimum value.
multi.OU: Fits a model in which the value of the optima shifts at one or more time splits. The values of the 'step-variance' (sigma squared) and attraction to the optima (alpha) are shared across all the samples. This model can not be fit with other models - the multi.OU system can be fit to the model only.
Value
A list of class dispRity
and model.test
that can be plotted and summarised via summary.dispRity
and plot.dispRity
.
The list is composed of:
-
$aic.models
summary for each model's small sample Akaike Information Criterion (AICc), delta AICc, and AICc weight -
$full.models
the list of the full models outputs fromoptim
with the estimated parameters, log-likelihood, convergence statistics, and the split.time if applicable -
$call
the model input -
$models.data
input data used by the model(s) -
$fixed.optima
Logical indicating whether a fixed optima was assumed for OU model(s)
Author(s)
Mark N Puttick and Thomas Guillerme
References
Blomberg SP, Garland T Jr, & Ives AR. 2003. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution. 57, 717-745.
Hansen TF. 1997. Stabilizing selection and the comparative analysis of adaptation. Evolution. 51, 1341-1351.
Harmon LJ, et al. 2010. Early bursts of body size and shape evolution are rare in comparative data. 64, 2385-2396.
Hunt G. 2006. Fitting and comparing models of phyletic evolution: random walks and beyond. Paleobiology. 32, 578-601. DOI: 10.1666/05070.1.
Hunt G, Hopkins MJ & Lidgard S. 2015. Simple versus complex models of trait evolution and stasis as a response to environmental change. Proceedings of the National Academy of Sciences. 112, 4885-4890. DOI: 10.1073/pnas.1403662111
Felsenstein J. 1973. Maximum-likelihood estimation of evolutionary trees from continuous characters. American Journal of Human Genetics. 25, 471-492.
Felsenstein J. 1985. Phylogenies and the comparative method. The American Naturalist. 51, 1-15.
Murrell DJ. 2018. A global envelope test to detect non-random bursts of trait evolution. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.13006
See Also
model.test.wrapper
, model.test.sim
, summary.dispRity
and plot.dispRity
Examples
## Not run:
## Mammal disparity through time
data(BeckLee_disparity)
## The four models to fit
models <- list("BM", "OU", "multi.OU", c("BM", "OU"))
## Fitting the four models to the disparity data
tests <- model.test(BeckLee_disparity, models, time.split = 66)
## Summarising the models
summary(tests)
## Plotting only the models support
plot(tests)
## End(Not run)
Simulate Model Test
Description
Simulate models of disparity change through time
Usage
model.test.sim(
sim = 1,
model,
model.rank = 1,
alternative = "two-sided",
time.split = NULL,
time.span = 100,
variance = 1,
sample.size = 100,
parameters = list(),
fixed.optima = FALSE
)
Arguments
sim |
The number of separate simulations to run. |
model |
Either (i) the named model of evolution to simulate for changes in disparity-through-time using a homogenous or hetergenous model (see list in |
model.rank |
If a |
alternative |
If the simulation is based on a |
time.split |
The age of the change in mode. The age is measured as the time before the most recent sample, and multiple ages can be supplied in a vector. Note this only applies to heterogenous models. |
time.span |
The length of the sequence ( |
variance |
The variance of each sample ( |
sample.size |
The sample size of each sample ( |
parameters |
A |
fixed.optima |
A |
Details
DISCLAIMER: this function is working properly (i.e. it does what it is supposed to do), however, the interpretation of the results has not yet been thought through, discussed and peer-reviewed (what does a Brownian motion like disparity curve means biologically?).
The parameters
is a list of arguments to be passed to the models.
These arguments can be:
ancestral.state
, ancestral value of the disparity applicable to all models (default =0.01
).sigma.squared
, rate of step variance to all models except Stasis (default =1
).alpha
, strength of attraction to the optimum in OU models (default =1
).optima.1
, the value of the optimum in a OU model, or the first bin optimum in a multi-OU model (default =0.15
).optima.2
, the second bin optimum in a multi-OU model (default =0.15
).optima.3
, the third bin optimum in a multi-OU model (default =0.15
).theta.1
, the mean in a Stasis model, or the first bin mean in a multi-Stasis model (default =1
).theta.2
, the second bin optimum in a multi-OU model (default =1
).theta.3
, the third bin optimum in a multi-OU model (default =1
).omega
, the variance in a Stasis model (default =1
).trend
, the trend parameter in the Trend model (default =0.5
).eb.rate
, the rate of exponential rate decrease in the EB model (default =-0.1
).
Value
A list of class dispRity
and model.sim
. Each list element contains the simulated central tendency, as well as the variance, sample size, and subsets used to simulate the data.
Author(s)
Mark N Puttick and Thomas Guillerme
References
Blomberg SP, Garland T Jr, & Ives AR. 2003. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution. 57, 717-745.
Hansen TF. 1997. Stabilizing selection and the comparative analysis of adaptation. Evolution. 51, 1341-1351.
Harmon LJ, et al. 2010. Early bursts of body size and shape evolution are rare in comparative data. 64, 2385-2396.
Hunt G. 2006. Fitting and comparing models of phyletic evolution: random walks and beyond. Paleobiology. 32, 578-601. DOI: 10.1666/05070.1.
Hunt G, Hopkins MJ & Lidgard S. 2015. Simple versus complex models of trait evolution and stasis as a response to environmental change. Proceedings of the National Academy of Sciences. 112, 4885-4890. DOI: 10.1073/pnas.1403662111
Felsenstein J. 1973. Maximum-likelihood estimation of evolutionary trees from continuous characters. American Journal of Human Genetics. 25, 471-492.
Felsenstein J. 1985. Phylogenies and the comparative method. The American Naturalist. 51, 1-15.
Murrell DJ. 2018. A global envelope test to detect non-random bursts of trait evolution. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.13006
Citation for the envelope code:
See Also
model.test
, model.test.wrapper
, summary.dispRity
and plot.dispRity
Examples
## Not run:
## Disparity through time data
data(BeckLee_disparity)
## List of models to test
models <- list("Trend", "BM")
## Testing the models on the observed disparity
model_test_output <- model.test(BeckLee_disparity, models, time.split = 66)
## simulations using the output from model.test
model_test_sim_output <- model.test.sim(sim = 100, model= model_test_output)
## Plot the simulated best model
plot(model_test_sim_output)
## Add the observed data
plot(BeckLee_disparity, add = TRUE, col = c("pink", "#ff000050", "#ff000050"))
## Simulating a specific model with specific parameters parameters
model_simulation <- model.test.sim(sim = 100, model = "BM", time.span = 120, variance = 0.1,
sample.size = 100, parameters = list(ancestral.state = 0,
sigma.squared = 0.1))
## Summarising the results
plot(model_simulation, main = "A simple Brownian motion")
## End(Not run)
Model test wrapper
Description
A wrapper function for model.test
to perform a model fitting analysis on disparity through time data.
Usage
model.test.wrapper(
data,
model,
pool.variance = NULL,
time.split = NULL,
fixed.optima = FALSE,
control.list = list(fnscale = -1),
verbose = TRUE,
sim = 1000,
plot.sim = TRUE,
col.sim,
col.obs = "hotpink",
lwd.obs = 2,
show.p = FALSE,
cex.p,
legend = FALSE,
...
)
Arguments
data |
A |
model |
The model(s) of evolution to allow for changes in disparity-through-time using a homogenous or hetergenous model, either using a single input or a list containing different models (see list in |
pool.variance |
If |
time.split |
The age of the change in mode ( |
fixed.optima |
A |
control.list |
A |
verbose |
|
sim |
The number of separate simulations (default = 1000). |
plot.sim |
Logical. If |
col.sim |
Colour options used for the plotting of simulated values. See |
col.obs |
Colour of the observed data on the plot. Default colour is |
lwd.obs |
Line width of the observed value. |
show.p |
Logical, when |
cex.p |
A numerical value for the the font size of the displayed p-value (if |
legend |
Logical, when |
... |
Any additional arguments to be passed to |
Details
This function gives the relative fit of model.test
output using log-likelihood and AICc values, as well as the Rank Envelope Test significance to elucidate if empirical data is significantly different to simulated data modelled using the estimated model parameters from model.test.sim
. This is equivalent to running test <- model.test.sim(sim = 1000, model = model.test(data, model)); summary(test) ; plot(test) ; plot(data, add = TRUE)
.
DISCLAIMER: this function is working properly (i.e. it does what it is supposed to do), however, the interpretation of the results has not yet been thought through, discussed and peer-reviewed (what does a Brownian motion like disparity curve means biologically?).
Value
A matrix with the relative fit, parameter values, and Rank Envelope test p values for each model, and a plot of simulated data from each model alongside observed data for each model if plot.sim is TRUE
Author(s)
Mark N Puttick and Thomas Guillerme
References
Blomberg SP, Garland T Jr, & Ives AR. 2003. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution. 57, 717-745.
Hansen TF. 1997. Stabilizing selection and the comparative analysis of adaptation. Evolution. 51, 1341-1351.
Harmon LJ, et al. 2010. Early bursts of body size and shape evolution are rare in comparative data. 64, 2385-2396.
Hunt G. 2006. Fitting and comparing models of phyletic evolution: random walks and beyond. Paleobiology. 32, 578-601. DOI: 10.1666/05070.1.
Hunt G, Hopkins MJ & Lidgard S. 2015. Simple versus complex models of trait evolution and stasis as a response to environmental change. Proceedings of the National Academy of Sciences. 112, 4885-4890. DOI: 10.1073/pnas.1403662111
Felsenstein J. 1973. Maximum-likelihood estimation of evolutionary trees from continuous characters. American Journal of Human Genetics. 25, 471-492.
Felsenstein J. 1985. Phylogenies and the comparative method. The American Naturalist. 51, 1-15.
Murrell DJ. 2018. A global envelope test to detect non-random bursts of trait evolution. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.13006
See Also
model.test
, model.test.sim
, summary.dispRity
and plot.dispRity
Examples
## Not run:
## Mammal disparity through time
data(BeckLee_disparity)
## The models to be fit to disparity data
models <- list("BM", "OU", "multi.OU", "Trend")
## test all models, and assess the significance of simulated data
## against the empirical distribution for each
model.test.wrapper(data = BeckLee_disparity, model = models, fixed.optima = TRUE,
time.split = 66, show.p = TRUE)
## End(Not run)
Ancestral states estimations with multiple trees
Description
Fast ancestral states estimations run on multiple trees using the Mk model from castor::asr_mk_model.
Usage
multi.ace(
data,
tree,
models,
threshold = TRUE,
special.tokens,
special.behaviours,
brlen.multiplier,
verbose = FALSE,
parallel = FALSE,
output,
options.args,
estimation.details = NULL
)
Arguments
data |
A |
tree |
A |
models |
A |
threshold |
either |
special.tokens |
optional, a named |
special.behaviours |
optional, a |
brlen.multiplier |
optional, a vector of branch length modifiers (e.g. to convert time branch length in changes branch length) or a list of vectors (the same length as |
verbose |
|
parallel |
Either a |
output |
optional, see |
options.args |
optional, a named list of options to be passed to function called by |
estimation.details |
optional, whether to also return the details for each estimation as returned by |
Details
Depending on the type of characters models
argument can be either:
the name of a single model to apply to all characters (if all characters are discrete or all are continuous); see below for the list of available names. For example
models = "ER"
applies the Equal Rates model to all characters (assuming they are all discrete characters).a vector of model names to apply to different type of characters (see below for the list). For example
models = c("ER", "ER", "BM")
applies the Equal Rates model to the two first characters (discrete) and the"BM"
model to the third character (continuous).a transition
"matrix"
to be applied to all characters (if discrete). For examplemodels = matrix(0.2, 2, 2)
.an single named list of arguments to be applied to all characters by passing it to
ape::ace
(if continuous). For examplemodels = list(method = "GLS", corStruct = corBrownian(1, my_tree))
.an un-ambiguous list of arguments to be passed to either
castor::asr_mk_model
(discrete characters) orape::ace
(continuous characters). For examplemodels = list("char1" = list(transition_matrix = matrix(0.2, 2, 2)), "char2" = list(method = "GLS", corStruct = corBrownian(1, my_tree)))
to be specifically passed to the characters named "char1" and "char2"
The available built-in models for discrete characters in castor::asr_mk_model
are:
-
"ER"
for all equal rates -
"SYM"
for symmetric rates -
"ARD"
all rates are different -
"SUEDE"
equal stepwise transitions (e.g. for meristic/counting characters) -
"SRD"
different stepwise transitions
See directly castor::asr_mk_model
for more models.
The available built-in models and methods for continuous characters in ape::ace
are:
-
"BM"
model: for a default Brownian Motion with the "REML" method -
"REML"
method: for a default Brownian Motion with the "REML" method (same as above) -
"ML"
method: for a default Brownian Motion with the "ML" method -
"pic"
method: for a default Brownian Motion with the "pic" (least squared) method
The threshold
option allows to convert ancestral states likelihoods into discrete states. When threshold = FALSE
, the ancestral state estimated is the one with the highest likelihood (or at random if likelihoods are equal). When threshold = TRUE
, the ancestral state estimated are all the ones that are have a scaled likelihood greater than the maximum observed scaled likelihood minus the inverse number of possible states (i.e. select_state >= (max(likelihood) - 1/n_states)
). This option makes the threshold selection depend on the number of states (i.e. if there are more possible states, a lower scaled likelihood for the best state is expected). Finally using a numerical value for the threshold option (e.g. threshold = 0.95
) will simply select only the ancestral states estimates with a scaled likelihood equal or greater than the designated value. This option makes the threshold selection absolute. Regardless, if more than one value is select, the uncertainty token (special.tokens["uncertainty"]
) will be used to separate the states. If no value is selected, the uncertainty token will be use between all observed characters (special.tokens["uncertainty"]
).
special.behaviours
allows to generate a special rule for the special.tokens
. The functions should can take the arguments character, all_states
with character
being the character that contains the special token and all_states
for the character (which is automatically detected by the function). By default, missing data returns and inapplicable returns all states, and polymorphisms and uncertainties return all present states.
missing = function(x,y) y
inapplicable = function(x,y) y
polymorphism = function(x,y) strsplit(x, split = "\\&")[[1]]
uncertainty = function(x,y) strsplit(x, split = "\\/")[[1]]
Functions in the list must be named following the special token of concern (e.g. missing
), have only x, y
as inputs and a single output a single value (that gets coerced to integer
automatically). For example, the special behaviour for the special token "?"
can be coded as: special.behaviours = list(missing = function(x, y) return(NA)
to make ignore the character for taxa containing "?"
.
When using the parallel option (either through using parallel = TRUE
by using the number of available cores minus on or manually setting the number of cores - e.g. parallel = 5
), the castor::asr_mk_model
function will use the designated number of cores (using the option Nthreads = <requested_number_of_cores>
). Additionally, if the input tree
is a "multiPhylo"
object, the trees will be run in parallel for each number of cores, thus decreasing computation time accordingly (e.g. if 3 cores are requested and tree
contains 12 "phylo"
objects, 4 different "phylo"
objects will be run in parallel on the 3 cores making the calculation around 3 times faster).
Value
Returns a "matrix"
or "list"
of ancestral states. By default, the function returns the ancestral states in the same format as the input matrix
. This can be changed using the option output = "matrix"
or "list"
to force the class of the output.
To output the combined ancestral states and input, you can use "combined"
(using the input format) or "combined.matrix"
or "combined.list"
.
If using continuous characters only, you can use the output option "dispRity"
to directly output a usable dispRity
object with all trees and all the data (estimated and input).
NOTE that if the input data had multiple character types (continuous and discrete) and that "matrix"
or "combined.matrix"
output is requested, the function returns a "data.frame"
.
Author(s)
Thomas Guillerme
See Also
castor::asr_mk_model
, char.diff
Examples
set.seed(42)
## A simple example:
## A random tree with 10 tips
tree <- rcoal(10)
## Setting up the parameters
my_rates = c(rgamma, rate = 10, shape = 5)
## A random Mk matrix (10*50)
matrix_simple <- sim.morpho(tree, characters = 50, model = "ER", rates = my_rates,
invariant = FALSE)
## Run a basic ancestral states estimations
ancestral_states <- multi.ace(matrix_simple, tree)
ancestral_states[1:5, 1:5]
## A more complex example
## Create a multiple list of 5 trees
multiple_trees <- rmtree(5, 10)
## Modify the matrix to contain missing and special data
matrix_complex <- matrix_simple
matrix_complex[sample(1:length(matrix_complex), 50)] <- "-"
matrix_complex[sample(1:length(matrix_complex), 50)] <- "0%2"
matrix_complex[sample(1:length(matrix_complex), 50)] <- "?"
matrix_complex[1:5,1:5]
## Set a list of extra special tokens
my_spec_tokens <- c("weirdtoken" = "%")
## Set some special behaviours for the "weirdtoken" and for "-" and "?"
my_spec_behaviours <- list()
## Inapplicable tokens "-" are ignored
my_spec_behaviours$inapplicable <- function(x,y) return(NA)
## Missing tokens "?" are considered as all states
my_spec_behaviours$missing <- function(x,y) return(y)
## Weird tokens are considered as state 0 and 3
my_spec_behaviours$weirdtoken <- function(x,y) return(c(1,2))
## Create a random branch length modifier to apply to each tree
branch_lengths <- rnorm(18)^2
## Setting a list of model ("ER" for the 25 first characters and then "SYM")
my_models <- c(rep("ER", 25), rep("SYM", 25))
## Run the ancestral states on all the tree with multiple options
ancestral_states <- multi.ace(matrix_complex, multiple_trees,
verbose = TRUE,
models = my_models,
threshold = 0.95,
special.tokens = my_spec_tokens,
special.behaviours = my_spec_behaviours,
brlen.multiplier = branch_lengths,
output = "combined.matrix")
## The results for the the two first characters for the first tree
ancestral_states[[1]][, 1:2]
## Not run:
## The same example but running in parallel
ancestral_states <- multi.ace(matrix_complex, multiple_trees,
verbose = TRUE,
models = my_models,
threshold = 0.95,
special.tokens = my_spec_tokens,
special.behaviours = my_spec_behaviours,
brlen.multiplier = branch_lengths,
output = "combined.matrix",
parallel = TRUE)
## End(Not run)
Testing a null hypothesis on multidimensional data.
Description
Testing the difference between the observed disparity and disparity under a null model.
Usage
null.test(
data,
replicates = 100,
null.distrib,
null.args = NULL,
null.cor = NULL,
null.scree = NULL,
alter = "two-sided",
scale = FALSE,
...
)
Arguments
data |
a |
replicates |
the number of replicates for the test (default = |
null.distrib |
one or more distribution functions to generate the null model to be passed to |
null.args |
any additional distribution arguments to be passed to |
null.cor |
an additional correlation matrix to be passed to |
null.scree |
an additional vector of variance per axis (equivalent to screeplot output); |
alter |
the type of alternative hypothesis (H1) as used in |
scale |
whether to scale the simulated and the observed data. |
... |
optional arguments to be passed to |
Author(s)
Thomas Guillerme
References
Diaz, S., Kattge, J., Cornelissen, J.H., Wright, I.J., Lavorel, S., Dray, S., Reu, B., Kleyer, M., Wirth, C., Prentice, I.C. and Garnier, E., 2016. The global spectrum of plant form and function. Nature, 529(7585), pp.167-171.
See Also
Examples
## Load the Beck & Lee 2014 data
data(BeckLee_mat50)
## Calculating the disparity as the ellipsoid volume
obs_disparity <- dispRity(BeckLee_mat50, metric = ellipsoid.volume)
## Testing against normal distribution
results <- null.test(obs_disparity, replicates = 100, null.distrib = rnorm)
results ; plot(results)
## Running the test on multiple subsets (may take some time!)
## Generating the subsets
groups <- as.data.frame(matrix(data = c(rep(1, 12), rep(2, 13), rep(3, 12),
rep(4, 13)), dimnames = list(rownames(BeckLee_mat50))), ncol = 1)
customised_subsets <- custom.subsets(BeckLee_mat50, groups)
## Bootstrapping the data
bootstrapped_data <- boot.matrix(customised_subsets, bootstraps = 100)
## Calculating variances of each dimension
sum_variances <- dispRity(bootstrapped_data, metric = c(sum, variances))
## Testing against normal distribution
results <- null.test(sum_variances, replicates = 100, null.distrib = rnorm)
summary(results) ; plot(results)
Plots pairwise comparisons
Description
Plots pairwise comparisons from a data frame (typically output from test.dispRity
).
Usage
pair.plot(
data,
what,
col = c("black", "white"),
legend = FALSE,
binary,
diag,
add,
lower = TRUE,
...
)
Arguments
data |
A |
what |
A |
col |
The two extremes of a color gradient (default = |
legend |
Logical, whether to plot the legend or not. |
binary |
Optional, if the results must be binary, a |
diag |
Optional, can be |
add |
Optional, whether to add significance tokens can be |
lower |
Optional, logical, whether to add tokens for values lower than |
... |
Any other options to be passed to |
Details
The number of rows (i.e. comparisons) in matrix
must be equal to the results of a pairwise combination.
In general, the number of rows x
must satisfy the equation: x = n^2 / 2 - n / 2
where n
must be an integer greater or equal than 2.
Author(s)
Thomas Guillerme
See Also
Examples
## A small matrix of two pairwise comparisons of seven elements (2*21 comparisons)
data <- matrix(data = runif(42), ncol = 2)
## Plotting the first column as a pairwise comparisons
pair.plot(data, what = 1, col = c("orange", "blue"), legend = TRUE, diag = 1)
## Adding some tokens for each value below 0.2 in the second column
pair.plot(data, what = 2, binary = 0.2, add = "*", cex = 2)
## Loading disparity data
data(disparity)
## Testing the pairwise difference between slices
tests <- test.dispRity(disparity, test = wilcox.test, correction = "bonferroni")
## Plotting the significance
pair.plot(as.data.frame(tests), what = "p.value", binary = 0.05)
phylolm dispRity (from phylolm::phylolm
)
Description
Passing dispRity
objects to the phylolm
function from the phylolm
package. Typically to run some PGLS.
Usage
pgls.dispRity(data, tree, formula, model = "BM", ..., optim = list())
Arguments
data |
A |
tree |
If |
formula |
The PGLS formula. If left empty, runs either |
model |
The covariance model (default is |
... |
Any optional arguments to be passed to |
optim |
An optional named list of arguments to be passed to the function |
Details
The formula
needs to be expressed by always naming the response variable disparity
to use the calculated disparity data from data
.
Optional arguments ...
correspond to all the non-ambiguous named arguments from the phylolm
. Optional arguments for the internal optim
function can be passed as a named list to the optim
argument.
Author(s)
Thomas Guillerme
See Also
phylolm
, test.dispRity
, custom.subsets
, chrono.subsets
.
Examples
## Simple example
data(BeckLee_mat50)
data(BeckLee_tree)
disparity <- dispRity(BeckLee_mat50, metric = centroids, tree = BeckLee_tree)
## Running a simple PGLS
model <- pgls.dispRity(disparity)
summary(model)
## More complex example running a PGLS
## on multiple trees and using groups as a predictor
Plots character differences
Description
Plots a character difference matrix from a discrete character matrix or its character differences density profile.
Usage
## S3 method for class 'char.diff'
plot(
x,
...,
type = "matrix",
legend = TRUE,
legend.title = "Difference",
legend.pos = "topleft",
legend.round = 0,
axis = TRUE,
xlim,
ylim,
xlab,
ylab,
col,
main
)
Arguments
x |
A discrete matrix or an already computed character difference matrix of class |
... |
Any additional graphical arguments to be passed to |
type |
Either |
legend |
A logical value stating whether to print the legend or not (default = |
legend.title |
A |
legend.pos |
The position of the legend. Can be two |
legend.round |
A |
axis |
A logical value stating whether to print the axis or not (default = |
xlim |
Two |
ylim |
Two |
xlab |
A |
ylab |
A |
col |
Two colors for forming the gradient if |
main |
An overall title for the plot. |
Author(s)
Thomas Guillerme
See Also
Examples
## Comparing two characters
char.diff(list(c(0, 1, 0, 1), c(0, 1, 1, 1)))
## Pairwise comparisons in a morphological matrix
morpho_matrix <- matrix(sample(c(0,1), 100, replace = TRUE), 10)
## Plotting a matrix
plot.char.diff(morpho_matrix)
## Plotting the density profile of a char.diff object
char.diff_matrix <- char.diff(morpho_matrix)
plot(char.diff_matrix, type = "density")
dispRity object plotting
Description
Plots a dispRity
object.
Usage
## S3 method for class 'dispRity'
plot(
x,
...,
type,
quantiles = c(50, 95),
cent.tend = median,
rarefaction = NULL,
elements = FALSE,
observed = FALSE,
add = FALSE,
density = NULL,
specific.args
)
Arguments
x |
A |
... |
Any optional arguments to be passed to |
type |
Either |
quantiles |
The quantiles to display (default is |
cent.tend |
A function for summarising the bootstrapped disparity values (default is |
rarefaction |
Either |
elements |
|
observed |
|
add |
|
density |
the density of shading lines to be passed to |
specific.args |
optional, a named list of arguments to be passed for some specific plot types. See details. |
Details
When specifying optional arguments with ...
in a graph with multiple elements (e.g. points
, lines
, etc...) you can specify which specific element to affect using the syntax <element>.<argument>
. For example if you want everything in the plot to be in blue at the exception of the points to be red, you can use plot(..., col = "blue", points.col = "red")
.
The different type
arguments are:
-
"continuous"
: plots the results as a continuous line. -
"box"
: plots the results as discrete box plots (note that this option ignores the user set quantiles and central tendency). -
"line"
: plots the results as discrete vertical lines with the user's set quantiles and central tendency. -
"polygon"
: identical as"line"
but using polygons rather than vertical lines. -
"preview"
: plots two dimensional preview of the space (default isc(1,2)
). WARNING: the plotted dimensions might not be representative of the full multi-dimensional space!
The different specific.args
arguments for the following options are:
if
type = "preview"
, the specific arguments can be:-
dimensions
: two specific dimensions to plot (default isspecific.args = list(dimensions = c(1,2)
); -
matrix
: which specific matrix to plot the data from (by default, all the matrices are used). -
tree
: whether to plot the underlying tree(s) or not. Can be either logical, whether to plot no tree (default isspecific.args = list(tree = FALSE)
), all trees (specific.args = list(tree = TRUE)
) or a specific set of trees (e.g.specific.args = list(tree = c(1,2))
)
-
if data is a
"test.metric"
result that was obtained with the optionsave.steps = TRUE
(seetest.metric
), it is possible to specify which steps to by specifying the following specific argument:specific.args = list(visualise.steps = c(1,4,5))
for visualising steps 1, 4 and 5 of the different shifts. By default, if the"test.metric"
was obtained with the optionsave.steps = TRUE
, four steps are displayed.if data is a
"dispRity"
and"projection"
object (fromdispRity.covar.projections
), it is possible to plot either the boxplot of disparity values for each projection (usingcorrelation.plot = NULL
; default) or to plot the correlation between two calculated elements (e.g.correlation.plot = c("position", "distance")
).
When plotting "randtest"
or "test.metric"
data or when using type = "preview"
a legend is plotted by default. To remove the legend you can use the argument legend = FALSE
. You can control specific arguments for the legend using the ...
optional arguments preceded by legend.
. For example, to change the legend position you can use legend.x = "topleft"
or legend.x = 4.2
and legend.y = 1.23
. General legend
arguments such as col
, legend
, pch
, etc... are recycled by the function but can always be specified using this syntax.
Author(s)
Thomas Guillerme
See Also
dispRity
, summary.dispRity
, null.test
, dtt.dispRity
, model.test
, model.test.sim
, test.metric
Examples
## Load the disparity data based on Beck & Lee 2014
data(disparity)
## Discrete plotting
plot(disparity, type = "box")
## Using polygons rather than boxes (quantiles and central tendency can be
## set by the user)
plot(disparity, type = "polygon", quantiles = c(10, 50, 95),
cent.tend = mean)
## Using different options
plot(disparity, type = "line", elements = TRUE, ylim = c(0, 3),
xlab = ("Time (Ma)"), ylab = "disparity")
## Continuous plotting (all default options)
plot(disparity, type = "continuous")
## Rarefactions plots
plot(disparity, rarefaction = TRUE)
## Observed data
plot(disparity, observed = TRUE)
## Observed data with graphical details
plot(disparity, observed = list("pch" = 19, col = "blue", cex = 4))
## For plotting dispRity objects with the dual classes "randtest", "dtt",
## "model.test", "model.sim" and "test.metric" see the examples
## in the specific function manuals from the "See also" section above
Prints a dispRity
object.
Description
Summarises the content of a dispRity
object.
Usage
## S3 method for class 'dispRity'
print(x, all = FALSE, ...)
Arguments
x |
A |
all |
|
... |
further arguments to be passed to |
Author(s)
Thomas Guillerme
See Also
custom.subsets
, chrono.subsets
, boot.matrix
, dispRity
.
Examples
## Load the disparity data based on Beck & Lee 2014
data(disparity)
## Displaying the summary of the object content
disparity
print(disparity) # the same
print.dispRity(disparity) # the same
## Displaying the full object
print.dispRity(disparity, all = TRUE)
Random circle
Description
Creates coordinates for a random circle
Usage
random.circle(n, distribution, inner = 0, outer = Inf, ...)
Arguments
n |
The number of pairs x,y of coordinates. |
distribution |
The distribution from which the coordinates are sampled. |
inner |
Optional, the radius for an empty inner circle. |
outer |
Optional, the maximum radius for the circle. |
... |
Any additional argument to be passed to |
Author(s)
Thomas Guillerme
See Also
Examples
## A simple uniform circle
plot(random.circle(1000, runif), pch = 20)
## A normal ring with inner and outer boundaries
plot(random.circle(1000, rnorm, inner = 0.5, outer = 5), pch = 20)
Random (permutation) test
Description
Performs a random test (aka permutation test) on a matrix
or a dispRity
object.
Usage
## S3 method for class 'dispRity'
randtest(
xtest,
subsets,
metric,
replicates = 100,
resample = TRUE,
alter = "two-sided",
...
)
Arguments
xtest |
The |
subsets |
A |
metric |
A |
replicates |
A |
resample |
|
alter |
The alternative hypothesis. Can be |
... |
optional arguments to be passed to |
Details
This test checks whether the metric calculated on a given subset of the data is significantly different from the metric calculated on any random subset of the same size. In other words: does the given subset have a clearly different disparity value than the rest of the data?
First, the metric
(statistic) is applied to the subset
sampled from the data
(population).
Second, the metric
is applied to random equally sized subsets from the data
.
If the observed difference falls out of the random differences distribution, the differences are significant.
This algorithm is based on a similar procedure than in link[ade4]{rantest}
.
If data
is a dispRity
object, the subsets
, metric
and replicates
can be left missing and are automatically inherited from the dispRity
if it contains respectively subsets (from chrono.subsets
or custom.subsets
) a metric
(from dispRity
) and bootstrap draws (from boot.matrix
).
If data
is a dispRity
object subsets can be a list of subsets to compare for example list(c("A", "B"), c("B", "A"))
will run two tests comparing respectively sample A to B and B to A. Note that it will only compare these two samples and use their combined size as the population size, if you want to compare a subset to all the subsets you can use list(c("A")
or write down the specific subsets to be used.
Value
This function returns a "randtest"
object that can be passed to the generic S3 functions print.randtest
or plot.randtest
.
The output also contains to extra elements output$observed
and output$random
containing the raw results of respectively the observed and random tests.
Author(s)
Thomas Guillerme
See Also
Examples
## Simple example
dummy_matrix <- matrix(rnorm(500), 100, 5)
## Testing whether the mean of a random subset
## is different than the means of 100 subsets
dummy_test <- randtest.dispRity(dummy_matrix,
subset = sample(1:100, 20),
metric = mean)
dummy_test ; plot(dummy_test)
## Applying this on dispRity objects
data(disparity)
test_disparity <- test.dispRity(disparity,
test = randtest.dispRity)
## The summarised results
summary(test_disparity)
## Plotting the results
plot(test_disparity)
## Applying this on a dispRity object with specific subset comparisons
test_disparity2 <- randtest.dispRity(disparity, subsets = list(
## Comparing subset 90 to the whole population (traitspace)
c(observed = "90"),
## Comparing subset "70" to "90", "70" and "30"
c(observed = "70", random = c("90", "70", "30"))))
## Summarising and plotting the results
summary(test_disparity2)
plot(test_disparity2)
Reduce a matrix
Description
Reduce the number of rows/columns in a matrix to optimise overlap
Usage
reduce.matrix(matrix, distance = "gower", by.row = TRUE, verbose = FALSE)
Arguments
matrix |
A matrix |
distance |
which distance to consider (passed to |
by.row |
Whether to do it by rows ( |
verbose |
Whether to do be verbose ( |
Author(s)
Thomas Guillerme
Examples
set.seed(1)
## A 10*5 matrix
na_matrix <- matrix(rnorm(50), 10, 5)
## Making sure some rows don't overlap
na_matrix[1, 1:2] <- NA
na_matrix[2, 3:5] <- NA
## Adding 50% NAs
na_matrix[sample(1:50, 25)] <- NA
## Illustrating the gappy matrix
image(t(na_matrix), col = "black")
## Reducing the matrix by row
(reduction <- reduce.matrix(na_matrix))
## Illustrating the overlapping matrix
image(t(na_matrix[-as.numeric(reduction$rows.to.remove), ]), col = "black")
## Reducing the matrix by columns (and being verbose)
reduce.matrix(na_matrix, by.row = FALSE, verbose = TRUE)
Reduce space
Description
Remove elements from a multidimensional space
Usage
reduce.space(
space,
type,
remove,
parameters,
tuning,
verbose = FALSE,
return.optim = FALSE
)
Arguments
space |
the trait space |
type |
how to reduce the space (either |
remove |
the proportion of elements to be removed (in probability) |
parameters |
the parameter(s) for removal selection (see details). If left empty, the |
tuning |
Optinal parameters for tuning the parameter estimations (if remove is required and parameters is missing) a list of three parameters: "max" for the maximum of operations, "tol" for the tuning (e.g. 0.1 close), "inc.steps" for the initial increment value during optimisation (default = 2 - the bigger the value, the slower the increment). |
verbose |
wether to be verbose or not |
return.optim |
logical, whether to also return the optimal value. |
Details
The type of reductions algorithms select the proportion of elements to remove (from the remove
parameter). The different algorithms are:
-
"random"
for randomly selecting a proportion of data points (usingsample(..., replace = FALSE)
). -
"size"
for selecting the proportion of data points closer to the centre. -
"density"
for selecting the proportion of data points with the lower nearest neigbhour distances. -
"evenness"
for randomly selecting the proportion of data points from the regions with most density.
The parameters for each reduction type algorithms are:
-
"size"
parameters: a list ofparameters$centre
, the centre from which to count the radius (if missing, is set to0
); andparameters$radius
, the radius for removal. -
"density"
parameters: a list ofparameters$what
"close" (default) for close neighbours or "distant" for distant ones;parameters$diameter
the diameter for considering closeness or distance;parameters$output
either "singles" or "pairs" to return the pairs of neighbours or one of them only (the first). -
"position"
parameters: a list ofparameters$value
, value the threshold value from which to remove elements. -
"evenness"
parameters: a list ofparameters$bw
, a bandwith selector function (bw.nrd0
by default); andparameters$power
a scaling factor for exaggerating the flatting/narrowing of the curve (the counts are set to this parameter exponent: default is1
).
See Guillerme et al. 2020 and https://github.com/TGuillerme/moms for details.
Value
A vector of logical
values of the rows to remove selected by the function. TRUE
corresponds to the following (and FALSE
to the opposite):
"random": the randomly selected points.
"size": the points closer to the centre of the space.
"density": the points closer to each other.
"position": the points on the "positive" side of the space (typically upper right corner in 2D).
"evenness": the randomly select points from the higher density regions.
Author(s)
Thomas Guillerme
References
Guillerme T, Puttick MN, Marcy AE, Weisbecker V. 2020 Shifting spaces: Which disparity or dissimilarity measurement best summarize occupancy in multidimensional spaces?. Ecol Evol. 2020;00:1-16. (doi:10.1002/ece3.6452)
See Also
Examples
set.seed(1)
## Creating a two dimensional space
space <- space.maker(100, 2, distribution = stats::rnorm)
## Generating the four types of reductions
random <- reduce.space(space, "random", remove = 0.5)
size <- reduce.space(space, "size", remove = 0.5)
density <- reduce.space(space, "density", remove = 0.5)
position <- reduce.space(space, "position", remove = 0.5)
evenness <- reduce.space(space, "evenness", remove = 0.5)
## Plotting the four different results
par(mfrow = c(3,2))
plot(space, pch = 19, col = c("grey", "black")[as.factor(random)],
main = "Random removal")
plot(space, pch = 19, col = c("grey", "black")[as.factor(size)],
main = "Size removal")
plot(space, pch = 19, col = c("grey", "black")[as.factor(density)],
main = "Density removal")
plot(space, pch = 19, col = c("grey", "black")[as.factor(position)],
main = "Position removal")
plot(space, pch = 19, col = c("grey", "black")[as.factor(evenness)],
main = "Evenness removal")
## The space reduction with specific parameters:
# Using the point with coordinates (2,2) as the centre
# Running over a maximum of 300 iterations
# With a tolerance of 0.05 (5%)
reduce.space(space, "size", remove = 0.2,
parameters = list("centre" = c(2,2)),
tuning = list("max" = 300, "tol" = 0.05))
## Remove a specific amount to match a specific parameter
reduce.space(space, type = "size", parameters = list("radius" = 1.206866))
Remove zero branch length
Description
Remove zero or negative branch lengths on trees by sliding nodes randomly in a postorder traversal based on slide.nodes
.
Usage
remove.zero.brlen(tree, slide, verbose = FALSE)
Arguments
tree |
A |
slide |
An optional sliding |
verbose |
A |
Details
The sliding value will be used to slide the nodes up and down to remove zero branch lengths by minimising the amount of branch changes. The algorithm slides the nodes up and down (when possible) on each node in a recursive way while there is still zero or negative branch lengths. If two recursions produce the same series of zero branches (e.g. by sliding node A towards node B equally so that the distance A:B becomes 0), the sliding value is divided by two until the next slide.
Value
A "phylo"
object with a postorder edge table and no zero branch lengths.
Author(s)
Thomas Guillerme
See Also
Examples
set.seed(42)
## Generating a tree
tree <- rtree(20)
## Adding some zero branch lengths (5)
tree$edge.length[sample(1:Nedge(tree), 5)] <- 0
any(tree$edge.length == 0) # TRUE
## And now removing these zero branch lengths!
tree_no_zero <- remove.zero.brlen(tree)
any(tree_no_zero$edge.length == 0) # FALSE
## Exaggerating the removal (to make it visible)
tree_exaggerated <- remove.zero.brlen(tree, slide = 1)
## Plot the differences
par(mfrow = c(3,1))
plot(tree, main = "zero branch length")
plot(tree_no_zero, main = "no zero branch length")
plot(tree_exaggerated, main = "exaggerated slidding")
## Removing negative branch lengths
## Generating a tree with negative branch length
set.seed(3)
tree_negative <- chronoMPL(rtree(10))
## Removing the negative branch length (and make it non-zero)
tree_positive <- remove.zero.brlen(tree_negative)
## Plot the differences
par(mfrow = c(2, 1))
plot(tree_negative, main = "Negative branch lengths")
plot(tree_positive, main = "Positive branch lengths")
Rescaling and centering disparity results.
Description
Scales or/and centers the disparity measurements.
Usage
## S3 method for class 'dispRity'
scale(x, center = FALSE, scale = TRUE, ...)
Arguments
x |
a |
center |
either a |
scale |
either a |
... |
optional arguments to be passed to |
Details
To scale or and center using the full distribution (i.e. all the disparity values) or only the distribution within each subsets of bootstraps you can use the optional argument use.all
as a logical. By default is use.all = TRUE
and uses all the disparity values not only the ones in the subset.
Author(s)
Thomas Guillerme
See Also
dispRity
, test.dispRity
, scale
.
Examples
## Load the disparity data based on Beck & Lee 2014
data(disparity)
## Scaling the data
summary(scale.dispRity(disparity, scale = TRUE)) # Dividing by the maximum
## Multiplying by 10 (dividing by 0.1)
summary(scale.dispRity(disparity, scale = 0.1))
Selects ordination axes
Description
Selects the axes required to explain a cumulative threshold amount of variance in an ordination (e.g. > 95%).
Usage
select.axes(data, group, threshold = 0.95, inc.threshold = TRUE)
Arguments
data |
The trait space to analyse. This can be either a |
group |
Optional, either a |
threshold |
The arbitrary threshold amount of variance (by default this is |
inc.threshold |
Logical, whether to output the axes that contain the threshold value ( , i.e. the axes necessary to include at least the threshold value |
Details
If inc.threshold = TRUE
, the returned axes are the ones that contains at least the threshold value (e.g. if the threshold is 0.95
, all the returned axes contain at least 0.95
of the variance, potentially more). If inc.threshold = FALSE
, the returned axes are the ones before reaching this threshold (e.g. the cumulative variance returned is strictly less or equal to 0.95
).
Value
A "dispRity"
, "axes"
object that can be printed, summarised and plot through generic print
, summary
and plot
functions.
The object is a list containing:
-
$dimensions
: the maximum number of dimensions selected across all groups; -
$dim.list
: the selected dimensions per group; -
$var
: the variance per axes per group; -
$scaled.var
: the variance scaled variance per axes per group; -
$cumsum.var
: the cumulative scaled variance per axes per group; -
$call
: a list containing the$threshold
value and the$inc.threshold
option used.
Author(s)
Thomas Guillerme
See Also
Examples
## Ordinating the USArrests dataset
ordination <- princomp(USArrests, cor = TRUE)
## Which dimensions to select?
(selected <- select.axes(ordination))
## The selected dimensions
selected$dimensions
## Visualising the results
plot(selected)
## Same but by grouping the data into three groups
states_groups <- list("Group1" = c("Mississippi","North Carolina",
"South Carolina", "Georgia", "Alabama",
"Alaska", "Tennessee", "Louisiana"),
"Group2" = c("Florida", "New Mexico", "Michigan",
"Indiana", "Virginia", "Wyoming", "Montana",
"Maine", "Idaho", "New Hampshire", "Iowa"),
"Group3" = c("Rhode Island", "New Jersey", "Hawaii",
"Massachusetts"))
(selected <- select.axes(ordination, group = states_groups))
## Note that the required number of axes is now 4 (instead of 3)
plot(selected)
## Loading some example dispRity data
data(demo_data)
## How many axes are required to explain 99% of the variance
## for each group in the Healy et al 2019 data?
(how_many <- select.axes(demo_data$healy, threshold = 0.99))
summary(how_many)
plot(how_many)
Adds root time to a tree
Description
Adds or replace root time to a tree by calculating it's root's depth
Usage
set.root.time(tree, present = 0)
Arguments
tree |
A |
present |
The age of the most recent tip. By default this is set to |
Examples
## A random tree with no root.time
my_tree <- rtree(10)
my_tree$root.time # is NULL
## Adding a root time
my_tree <- set.root.time(my_tree)
my_tree$root.time # is not NULL
## Rewrite the root time with a different present
my_tree <- set.root.time(my_tree, present = 10)
my_tree$root.time # is older
Simulates morphological data.
Description
Generates a morphological matrix using rTraitDisc
or gen.seq.HKY
functions.
Usage
sim.morpho(
tree,
characters,
states = 1,
model = "ER",
rates,
substitution = c(stats::runif, 2, 2),
invariant = TRUE,
verbose = FALSE
)
Arguments
tree |
A phylogenetic tree to use for generating the characters. |
characters |
The number of morphological characters to generate. |
states |
A |
model |
Either an implemented ( |
rates |
A function an its parameters for the rates distribution (see details). |
substitution |
A function an its parameters for the substitutions distribution (see details; |
invariant |
|
verbose |
Whether to be verbose or not ( |
Details
The
model
arguments must be either a user's defined function for generating the discrete morphological characters (that takes the states, rates and substitution arguments) or one of the two following:-
"ER"
uses theape::rTraitDisc
function with the"ER"
model argument (= Mk model). -
"HKY"
uses thephyclust::gen.seq.HKY
function withkappa
sampled from thesubstitution
argument,pi = runif(4)
(divided bysum(runif(4))
),rate.scale
sampled from therates
distribution andL
being the number ofcharacters
and transforms the purines (A, G) into 0 and the pyrimidines (C, T) into 1. -
"MIXED"
randomly uses"ER"
or"HKY"
for binary characters and"ER"
for any character with more than two states. the user defined model must be a
function
that generates a single discrete morphological character and takes one element from at least the following arguments:tree
,states
,rates
,substitution
.
-
The
states
argument attributes a number of states to each character by using the given probability vector for each number of states starting from two. For examplestates = c(0.7, 0.2, 0.1)
will generate 70% of characters with two states, 20% of characters with three states and 10% of characters with four states.The
rates
andsubstitution
arguments require a function that outputs a distribution and its optional parameters. For examplerates = c(runif, 1, 10)
creates a uniform distribution between 1 and 10 for the rates distribution.
Author(s)
Thomas Guillerme
See Also
check.morpho
, apply.NA
, rTraitDisc
, gen.seq.HKY
Examples
set.seed(4)
## A random tree with 15 tips
tree <- rcoal(15)
## Setting up the parameters
my_rates = c(rgamma, rate = 10, shape = 5)
my_substitutions = c(runif, 2, 2)
## HKY binary (15*50)
matrixHKY <- sim.morpho(tree, characters = 50, model = "HKY",
rates = my_rates, substitution = my_substitutions)
## Mk matrix (15*50) (for Mkv models)
matrixMk <- sim.morpho(tree, characters = 50, model = "ER", rates = my_rates)
## Mk invariant matrix (15*50) (for Mk models)
matrixMk <- sim.morpho(tree, characters = 50, model = "ER", rates = my_rates,
invariant = FALSE)
## MIXED model invariant matrix (15*50)
matrixMixed <- sim.morpho(tree, characters = 50, model = "MIXED",
rates = my_rates, substitution = my_substitutions, invariant = FALSE,
verbose = TRUE)
Time slicing a tree.
Description
Time slicing through a phylogenetic tree.
Usage
slice.tree(tree, age, model, FAD, LAD, keep.all.ancestors = FALSE)
Arguments
tree |
A |
age |
A single |
model |
One of the following models: |
FAD , LAD |
The first and last occurrence data. |
keep.all.ancestors |
Optional, whether to also include the ancestors of the tree slice ( |
Author(s)
Thomas Guillerme
References
Guillerme T. & Cooper N. 2018. Time for a rethink: time sub-sampling methods in disparity-through-time analyses. Palaeontology. DOI: 10.1111/pala.12364.
See Also
paleotree::timeSliceTree
, chrono.subsets
.
Examples
set.seed(1)
## Generate a random ultrametric tree
tree <- rtree(20)
## Add some node labels
tree$node.label <- letters[1:19]
## Add its root time
tree$root.time <- max(tree.age(tree)$ages)
## Slice the tree at age 1.5
tree_slice <- slice.tree(tree, age = 1.5, "deltran")
## The slice at age 0.5 but keeping all the ancestors
deep_slice <- slice.tree(tree, age = 0.5, "deltran",
keep.all.ancestors = TRUE)
## Visualising the trees
old_par <- par(mfrow = c(2,2))
plot(ladderize(tree), main = "full tree"); axisPhylo()
abline(v = tree$root.time - 1.5)
plot(ladderize(tree_slice), main = "tree slice"); axisPhylo()
plot(ladderize(deep_slice), main = "slice with ancestors"); axisPhylo()
par(old_par)
Stretching a tree
Description
Stretches a phylogenetic tree at a particular node
Usage
slide.nodes(nodes, tree, slide, allow.negative.root = FALSE)
Arguments
nodes |
A list of the ID nodes to slide ( |
tree |
a |
slide |
the sliding value. |
allow.negative.root |
logical, whether to allow negative branch lengths and moving the root node ( |
Details
The sliding works by subtracting the slide value to the branch leading to the node and adding it to the descendant branches. Note that the slide value can be negative to slide nodes the other way (up); the only requirement is that the slide does not lead to negative branch length values.
Value
A "phylo"
object.
Author(s)
Thomas Guillerme
See Also
Examples
set.seed(42)
## Generating a coalescent tree
tree <- rcoal(5)
## Stretching node 8 up and down
tree_slide_up <- slide.nodes(8, tree, slide = 0.075)
tree_slide_down <- slide.nodes(8, tree, slide = -0.075)
## Display the results
par(mfrow = c(3,1))
plot(tree) ; axisPhylo() ; nodelabels()
plot(tree_slide_up) ; axisPhylo() ; nodelabels()
plot(tree_slide_down) ; axisPhylo() ; nodelabels()
## Stretching many nodes
set.seed(42)
tree <- rtree(50)
move_nodes <- c(99, 93, 53, 86, 58, 63, 60, 84)
tree_slided <- slide.nodes(move_nodes, tree, slide = 0.07)
## Display the results
par(mfrow = c(2, 1))
node_colors <- c("lightblue", "orange")[((1:Nnode(tree))+Ntip(tree)) %in% move_nodes + 1]
plot(tree, show.tip.label = FALSE) ; axisPhylo()
nodelabels(bg = node_colors, cex = 0.5)
plot(tree_slided, show.tip.label = FALSE) ; axisPhylo()
nodelabels(bg = node_colors, cex = 0.5)
Sorting or ordering a dispRity
object.
Description
Sort (or order) the subsets of a dispRity
object.
Usage
## S3 method for class 'dispRity'
sort(x, decreasing = FALSE, sort, ...)
Arguments
x |
A |
decreasing |
|
sort |
An optional |
... |
optional arguments to be passed to |
Author(s)
Thomas Guillerme
See Also
dispRity
, test.dispRity
, plot.dispRity
, get.subsets
, get.disparity
.
Examples
## Load the disparity data based on Beck & Lee 2014
data(disparity)
## Sorting the data
summary(disparity)
summary(sort(disparity, decreasing = TRUE))
summary(sort(disparity, sort = c(7,1,3,4,5,2,6)))
Creating multidimensional spaces
Description
Creates a multidimensional space with a given number of elements and dimensions
Usage
space.maker(
elements,
dimensions,
distribution,
arguments = NULL,
cor.matrix = NULL,
scree = NULL,
elements.names = NULL,
replicates = NULL
)
Arguments
elements |
An |
dimensions |
An |
distribution |
One or more |
arguments |
Optional |
cor.matrix |
An optional correlation |
scree |
An optional proportional |
elements.names |
Optional, a |
replicates |
Optional, an |
Details
When passing additional arguments to different distributions, these must be given as a list
to each function in the order they appear.
For example if distribution = c(runif, rnorm, rgamma)
and one wants the distributions to be runif(elements, min = 1, max = 10)
, rnorm(elements, mean = 8)
and rgamma(elements, shape = 1, log = TRUE)
, the additional arguments should be passed as
c(list(min = 1, max = 10), list(mean = 8), list(shape = 1, log = TRUE)
. If no arguments have to be passed to a certain function, it can be left as NULL
(e.g. c(list(min = 1, max = 10), list(NULL), list(shape = 1, log = TRUE)
).
The cor.matrix
argument should be a correlation matrix between the dimensions.
If not NULL
, the multidimensional space is multiplied by the the Choleski decomposition (chol
) of the correlation matrix.
The scree
argument is simply a value multiplier for each dimension to adjust their variance to approximate the scree
one.
Author(s)
Thomas Guillerme
See Also
Examples
## A square space
plot(space.maker(5000, 2, runif), pch = 20)
## A circular space
plot(space.maker(5000, 2, rnorm), pch = 20)
## A 2-dimensional cylindrical space
plot(space.maker(5000, 2, c(rnorm, runif)), pch = 20)
## A 4-dimensional space with different distributions
space.maker(5, 4, c(runif, runif, rnorm, rgamma),
arguments = list(list(min = 1, max = 10), list(min = 1, max = 2),
list(mean = 8), list(shape = 1)))
## A 3-dimensional correlated space
cor_matrix <- matrix(cbind(1, 0.8 ,0.2, 0.8, 1, 0.7, 0.2, 0.7, 1), nrow = 3)
space <- space.maker(10000, 3, rnorm, cor.matrix = cor_matrix)
round(cor(space), 1) ; cor_matrix ## Both should be really similar matrices
## A 3-dimensional space with a priori approximated variance for each dimension
space <- space.maker(10000, 3, rnorm, scree = c(0.6, 0.3, 0.1))
## The resulting screeplot
barplot(apply(space, 2, var))
## Generate 3 2D normal spaces with rownames
space.maker(10, 2, rnorm, elements.names = letters[1:10], replicates = 3)
## Not run:
require(scatterplot3d)
## A cube space
scatterplot3d(space.maker(5000, 3, runif), pch = 20)
## A plane space
scatterplot3d(space.maker(5000, 3, c(runif, runif, runif),
arguments = list(list(min = 0, max = 0), NULL, NULL)), pch = 20)
## A sphere space
scatterplot3d(space.maker(5000, 3, rnorm), pch = 20)
## A 3D cylindrical space
scatterplot3d(space.maker(5000, 3, c(rnorm, rnorm, runif)), pch = 20)
## Generating a doughnut space
doughnut <- space.maker(5000, 3, c(rnorm, random.circle),
arguments = list(list(mean = 0), list(runif, inner = 0.5, outer = 1)))
## Reodering the axis for projecting the dougnut in 2D
scatterplot3d(doughnut[,c(2,1,3)], pch = 20)
## End(Not run)
dispRity object summary
Description
Creates a summary of a dispRity
object.
Usage
## S3 method for class 'dispRity'
summary(
object,
...,
quantiles = c(50, 95),
cent.tend = median,
recall = FALSE,
digits
)
Arguments
object |
A |
... |
Additional arguments to be passed to |
quantiles |
The quantiles to display (default is |
cent.tend |
A function for summarising the bootstrapped disparity values (default is |
recall |
|
digits |
Optional, a value for digits the values in the output table (default = 2). |
Details
If the dispRity
object to summarise comes from a chrono.subsets
using a "multiPhylo"
object, the displayed number of observations (n
) corresponds to the maximum number of observation at the specific time slice (some slices through some trees might have less observations).
Value
A data.frame
with:
subsets |
the subset names. |
n |
the maximum number of elements in each subset (see details). |
observed |
the observed disparity or the the observed central tendency (<cent_tend>) of disparity ( |
bootstraps... |
if |
Author(s)
Thomas Guillerme
See Also
Examples
## Load the disparity data based on Beck & Lee 2014
data(disparity)
## Summarising the results
summary(disparity) # default
## Using different options
summary(disparity, quantiles = 75, cent.tend = mean, digits = 8,
recall = TRUE)
Testing disparity hypotheses
Description
Applying statistical tests to dispRity objects
Usage
test.dispRity(
data,
test,
comparisons = "pairwise",
rarefaction = NULL,
correction = "none",
concatenate = TRUE,
conc.quantiles = c(mean, c(95, 50)),
details = FALSE,
...
)
Arguments
data |
A |
test |
A test |
comparisons |
If data contains more than two subsets, the type of comparisons to apply: either |
rarefaction |
A |
correction |
Which p-value correction to apply to |
concatenate |
Logical, whether to concatenate bootstrapped disparity values ( |
conc.quantiles |
If |
details |
Whether to output the details of each test (non-formatted; default = |
... |
Additional options to pass to the test |
Details
The comparison
argument can be:
-
"pairwise"
: pairwise comparisons of all the subsets (default). -
"referential"
: compares the first subset to all the others. -
"sequential"
: compares each subset sequentially (e.g. first against second, second against third, etc.). -
"all"
: compares all the subsets simultaneously to the data (i.e.bootstrapped disparity ~ subsets names
). This argument is used forlm
orglm
type tests. A list of pairs of number of subsets to compare. Each element of the list must contain two elements (e.g.
list(c("a","b"), ("b", "a"))
to compare "a" to "b" and then "b" to "a").-
If the called test is
null.test
, the comparison argument is ignored.
IMPORTANT: if you are performing multiple comparisons (e.g. when using "pairwise"
, "referential"
or "sequential"
), don't forget about the Type I error rate inflation. You might want to use a p-value correction (see p.adjust
).
Author(s)
Thomas Guillerme
See Also
dispRity
, null.test
, bhatt.coeff
, pair.plot
, adonis.dispRity
, randtest.dispRity
, test.dispRity
Examples
## Load the Beck & Lee 2014 data
data(BeckLee_mat50)
data(BeckLee_tree)
## Calculating the disparity from customised subsets
## Generating the subsets
groups <- crown.stem(BeckLee_tree, inc.nodes = FALSE)
customised_subsets <- custom.subsets(BeckLee_mat50, groups)
## Bootstrapping the data
bootstrapped_data <- boot.matrix(customised_subsets, bootstraps = 100)
## Calculating the sum of variances
sum_of_variances <- dispRity(bootstrapped_data, metric = c(sum, variances))
## Measuring the subset overlap
test.dispRity(sum_of_variances, bhatt.coeff, "pairwise")
## Measuring differences from a reference subset
test.dispRity(sum_of_variances, wilcox.test, "referential")
## Measuring disparity as a distribution
disparity_var <- dispRity(bootstrapped_data, metric = variances)
## Differences between the concatenated bootstrapped values of the subsets
test.dispRity(disparity_var, test = t.test, comparisons = "pairwise",
concatenate = TRUE, correction = "bonferroni")
## Differences between the subsets bootstrapped
test.dispRity(disparity_var, test = t.test, comparisons = "pairwise",
concatenate = FALSE, correction = "bonferroni",
conc.quantiles = c(mean, c(95, 5)))
Test disparity metric
Description
Test whether a metric captures changes trait space size, density and position.
Usage
test.metric(
data,
metric,
...,
shifts,
shift.options,
model,
replicates = 3,
steps = 10,
dimensions,
verbose = FALSE,
save.steps = FALSE
)
Arguments
data |
A matrix or a |
metric |
A vector containing one to three functions. At least of must be a dimension-level 1 or 2 function (see details). If |
... |
Optional arguments to be passed to the metric. |
shifts |
The types of shits to test, can be |
shift.options |
Optional, a |
model |
Optional, which model to fit for testing the metric. See details. |
replicates |
A |
steps |
The number of steps in the space reduction to output between 10% and 100%. By default |
dimensions |
Optional, a |
verbose |
A |
save.steps |
A |
Details
For the three non-random shifts: "size"
, "density"
, "evenness"
and "position"
, the function returns both of shifts as:
-
"size.inner"
and"size.outer"
removing data from the edges or the centre respectively (contracting the size and "hollowing" it respectively). -
"density.higher"
and"density.lower"
removing data to increase or decrease density respectively (increasing/decreasing nearest neighbour distance). -
"evenness.flattened"
and"evenness.compacted"
removing data to from the centre of the distribution or from the edges to resepectively "flatten" or "condense" the distribution. -
"position.top"
and"position.bottom"
removing data from one side or the other of the space (the sides are selected from the point with lowest/highest scores on each dimensions respectively).
See figure 2 in Guillerme et al. 2020 for more details.
The default model
is a linear model using the following function:
model = function(data) lm(disparity ~ reduction, data)
You can provide your own as long as it is a single function with data
as a single argument. The two terms from data should be called reduction
for the variable on the x axis and disparity
for the variable on the y axis. For example:
model = function(data) nls(disparity ~ a*reduction/(b+reduction), data)
Note that models (like this example) should be specific to the dataset. Any type of model can be fitted but only the ones with an associated summary
function will be correctly displayed by summary.dispRity
.
To not run any model, use model = NULL
.
Value
This function outputs a dispRity
object containing a list of simulated reductions in trait space. The results can be accessed through the usual S3 methods (print
, summary
, plot
) or accessed directly through x$<name_of_the_shift>
(e.g. x$random
for the random shift results).
Author(s)
Thomas Guillerme
References
Guillerme T, Puttick MN, Marcy AE, Weisbecker V. 2020 Shifting spaces: Which disparity or dissimilarity measurement best summarize occupancy in multidimensional spaces?. Ecol Evol. 2020;00:1-16. (doi:10.1002/ece3.6452)
See Also
reduce.space
dispRity
plot.dispRity
Examples
## Creating a 2D uniform space
space <- space.maker(300, 2, runif)
## A simple test with only 1 replicate for two shifts (random and size):
simple_test <- test.metric(space, metric = c(prod, ranges),
replicates = 1, shifts = c("random", "size"))
## Summarising the tests
summary(simple_test)
## Visualising the test
plot(simple_test)
## Applying the test directly on a disparity object
data(disparity)
median_centroid_test <- test.metric(disparity, shifts = "size")
## Summarising the tests
summary(median_centroid_test)
## Visualising the test
plot(median_centroid_test)
## Not run:
## Note that the tests can take several minutes to run.
## Testing the sum of variance on all shifts
sum_var_test <- test.metric(space, metric = c(sum, variances),
shifts = c("random", "size", "density", "position"))
## Summarising the tests
summary(sum_var_test)
## Visualising the test
plot(sum_var_test)
## Creating a 2D uniform space
space <- space.maker(300, 2, runif)
## Re-running the test on two shifts with data saving for visualisation
median_centroid_test <- test.metric(space,
metric = c(median, centroids),
shifts = c("random", "size"),
save.steps = TRUE)
## Visualising the tests results and display the shifts visualisation
plot(median_centroid_test)
## End(Not run)
Extracting the age of nodes and tips in a tree.
Description
Extracting the age of each node and tip in a tree give the height of the tree or some specified age.
Usage
tree.age(tree, age, order = "past", fossil = TRUE, digits = 4)
Arguments
tree |
A |
age |
The age of the tree. If missing the age is set to be the tree height. |
order |
Either "past" if the units express time since the present (e.g. million years ago), or "present" if the unit is expressed in time since the root. |
fossil |
|
digits |
A |
Details
When fossil = TRUE
, if the tree
contains a tree$root.time
element (for tree's root age), and that order
is set to "past"
, the output ages are adjusted to be starting from the root.time. Else, if no tree$root.time
exists or fossil = FALSE
, tips and nodes age is relative from the tip furthest away from the root. THIS FUNCTION DOES NOT ESTIMATE TREE AGES, it just extracts branch length information and converts it into time units. For basic dating functions in R, check chronos
, chronopl
, chronoMPL
or use more specialised dating software (e.g. MrBayes, BEAST, RAxML, etc.).
Author(s)
Thomas Guillerme
See Also
Examples
## A dated random phylogeny with a root 50 units of time old.
tree.age(rtree(10), age = 50)
## A random tree with the distance since the root.
tree.age(rtree(10), order = 'present')