Title: | Measuring Multivariate Dependence Using Distance Multivariance |
Version: | 2.4.1 |
Date: | 2021-10-06 |
Description: | Distance multivariance is a measure of dependence which can be used to detect and quantify dependence of arbitrarily many random vectors. The necessary functions are implemented in this package and examples are given. It includes: distance multivariance, distance multicorrelation, dependence structure detection, tests of independence and copula versions of distance multivariance based on the Monte Carlo empirical transform. Detailed references are given in the package description; as a starting point for the theoretic background we refer to: B. Böttcher, Dependence and Dependence Structures: Estimation and Visualization Using the Unifying Concept of Distance Multivariance. Open Statistics, Vol. 1, No. 1 (2020), <doi:10.1515/stat-2020-0001>. |
Depends: | R (≥ 3.3.0) |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
Imports: | igraph, graphics, stats, Rcpp, microbenchmark |
RoxygenNote: | 7.1.2 |
Suggests: | testthat |
LinkingTo: | Rcpp |
NeedsCompilation: | yes |
Packaged: | 2021-10-06 15:06:41 UTC; BB |
Author: | Björn Böttcher [aut, cre], Martin Keller-Ressel [ctb] |
Maintainer: | Björn Böttcher <bjoern.boettcher@tu-dresden.de> |
Repository: | CRAN |
Date/Publication: | 2021-10-06 15:50:05 UTC |
multivariance: Measuring Multivariate Dependence Using Distance Multivariance
Description
The multivariance package provides basic functions to calculate distance multivariance and related quantities. To test independence use multivariance.test; it provides an interface (via its arguments) to all the tests based on distance (m-/total-) multivariance. The package also offers several other functions related to distance multivariance, e.g. the detection and visualization of dependence structures via dependence.structure. See below for details on the full content of the package.
Details
Distance multivariance is a multivariate dependence measure, which can be used to detect dependencies between an arbitrary number of random vectors each of which can have a distinct dimension. The necessary functions are implemented in this package, and examples are given. For the theoretic background we refer to the papers [1,2,3,4,5,6]. Paper [3] includes a summary of the first two. It is the recommended starting point for users with an applied interest. Paper [4] is concerned with new (faster) p-value estimates for the independence tests, [5] introduces the copula versions of distance multivariance, [6] discusses the quantification of dependence using distance multicorrelations.
The current code is speed-improved in comparison to the former releases. Certainly there is still room for improvement and development. Questions, comments and remarks are welcome: bjoern.boettcher@tu-dresden.de
For information on the latest changes and/or updates to the package use news(package="multivariance").
To cite this package use the standard citation for R packages, i.e., the output of citation("multivariance").
Multivariance
multivariance
computes the distance multivariance
total.multivariance
computes the total distance multivariance
m.multivariance
computes the m-multivariance (introduced in [3])
It might be convenient to compute these simultaneously using multivariances.all.
copula.multivariance
computes the copula versions of the above (introduced in [5])
multicorrelation
computes the multicorrelations (discussed specifically in [6])
Functions to use and interpret multivariance
rejection.level
computes a (conservative) rejection level for a given significance level. This can be used for a conservative interpretation of distance multivariance. The counterpart is multivariance.pvalue, which computes a conservative p-value for a given distance multivariance. Both methods are distribution-free.
resample.rejection.level and resample.pvalue are the distribution-dependent versions of the above. They are approximately sharp, but computationally more expensive. Any resampling is done by resample.multivariance.
Using the methods developed in [4], approximate p-value estimates are provided by pearson.pvalue. This method is much faster than the resampling method.
multivariance.test
provides the corresponding tests of independence. Its output follows the format common for tests in R.
cdm and cdms compute the doubly centered distance matrix and matrices, respectively. These can be used to speed up repeated computations of distance multivariance.
In [4] various methods to estimate the moments of the test statistic under H0 were developed; these are (implicitly) implemented in this package only for the moments used in pearson.pvalue. Further, explicit functions can be added upon request. Please feel free to contact the author.
emp.transf
computes the Monte Carlo empirical transform of the data. This yields the copula version of distance multivariance. Note that the values become randomized due to the Monte Carlo empirical transform, i.e., in a finite sample setting the copula versions do not yield identical values for repeated runs.
For planning large projects or studies it might be convenient to estimate the computation time of multivariance via multivariance.timing.
Dependence structures
dependence.structure
performs the dependence structure detection algorithm as described in [3].
find.cluster is the basic building block of dependence.structure. It is recommended to use dependence.structure.
Examples
coins and tetrahedron generate samples of pairwise independent random variables with dependence of higher order.
dep_struct_iterated_13_100, dep_struct_ring_15_100, dep_struct_several_26_100 and dep_struct_star_9_100 are example data sets for the dependence structure detection. These might also serve as benchmark examples.
anscombe.extended provides an extension of Anscombe's quartet. It illustrates that a large value of Pearson's correlation can occur for very different dependencies and that this is not a small-sample problem. These dependencies are at least partly differentiated by values of distance multicorrelation.
References
[1] B. Böttcher, M. Keller-Ressel, R.L. Schilling, Detecting independence of random vectors: generalized distance covariance and Gaussian covariance. Modern Stochastics: Theory and Applications, Vol. 5, No. 3(2018) 353-383. https://www.vmsta.org/journal/VMSTA/article/127/info
[2] B. Böttcher, M. Keller-Ressel, R.L. Schilling, Distance multivariance: New dependence measures for random vectors. The Annals of Statistics, Vol. 47, No. 5 (2019) 2757-2789. doi: 10.1214/18-AOS1764
[3] B. Böttcher, Dependence and Dependence Structures: Estimation and Visualization using the Unifying Concept of Distance Multivariance. Open Statistics, Vol. 1, No. 1 (2020) 1-46. doi: 10.1515/stat-2020-0001
[4] G. Berschneider, B. Böttcher, On complex Gaussian random fields, Gaussian quadratic forms and sample distance multivariance. Preprint. https://arxiv.org/abs/1808.07280
[5] B. Böttcher, Copula versions of distance multivariance and dHSIC via the distributional transform – a general approach to construct invariant dependence measures. Statistics, (2020) 1-18. doi: 10.1080/02331888.2020.1748029
[6] B. Böttcher, Notes on the interpretation of dependence measures – Pearson's correlation, distance correlation, distance multicorrelations and their copula versions. Preprint. https://arxiv.org/abs/2004.07649
Computes the explicit coefficients for the finite sample variance for a sample of size N
Description
Computes the explicit coefficients for the finite sample variance for a sample of size N
Usage
N.coefficients(N)
Extended Anscombe's Quartett
Description
The dataset extends 'anscombe' provided in the standard R-package 'datasets'. All examples feature the same correlation of 0.82, but different types of dependencies. The main aim was to extend the classical examples, which have sample size 11, to larger sample sizes. This illustrates that the implied problems of Pearson's correlation are not small-sample problems! Distance multicorrelation (which coincides in this case with distance correlation) yields different values for the datasets.
Usage
anscombe.extended
Format
list
with elements:
anscombe.extended$N11
matrix with 11 samples for 5 examples; the first 4 are the classical Anscombe quartet, the fifth is a monotone relation which also features the same correlation
anscombe.extended$N100
same as above, but with 100 samples
anscombe.extended$N1000
same as above, but with 1000 samples
Details
Note: Anscombe's quartet features further identical parameters besides Pearson's correlation. The extended set is only concerned with correlation.
References
This example was introduced in the reference [6] given on the main help page of this package: multivariance-package.
Examples
# Code which generates plots of all included data:
op = par(mfrow = c(3,5),mar = c(0.5,0.5,3,0.5))
for (name in c("N11","N100","N1000")) {
for (i in 1:5) {
x = anscombe.extended[[name]][,2*i-1]
y = anscombe.extended[[name]][,2*i]
plot(x,y,main = paste0("cor = ",round(cor(x,y),2),
"\n Mcor = ",round(multicorrelation(cbind(x,y),type = "pairwise",squared = FALSE),2),
"\n CMcor = ",round(copula.multicorrelation(cbind(x,y),type = "pairwise",squared = FALSE),2)),
axes = FALSE,xlab ="",ylab = "", cex.main=1)
# for two variables 'pairwise' coincides with
# both values of 'total.upper.lower'.
box()
}
}
par(op)
computes a doubly centered distance matrix
Description
computes the doubly centered distance matrix
Usage
cdm(
x,
normalize = TRUE,
psi = NULL,
p = NULL,
isotropic = FALSE,
external.dm.fun = NULL
)
Arguments
x |
matrix, each row of the matrix is treated as one sample |
normalize |
logical, indicates if the matrix should be normalized |
psi |
if it is NULL (the default), the Euclidean distance is used; otherwise a function used to define the distance, see the Details below |
p |
numeric, if it is a value between 1 and 2 then the Minkowski distance with parameter p is used. |
isotropic |
logical, indicates if psi of the Euclidean distance matrix should be computed, i.e., if an isotropic distance should be used. |
external.dm.fun |
here one can supply an external function which computes the distance matrix for the rows of a given matrix |
Details
The doubly centered distance matrices are required for the computation of (total / m-) multivariance.
If normalize = TRUE then the value of multivariance is comparable and meaningful. It can be compared to the rejection.level, or its p-value can be computed via multivariance.pvalue.
More details: if normalize = TRUE the matrix is scaled such that the multivariance based on it, multiplied by the sample size, has in the limit (in the case of independence) the distribution of an L^2-norm of a Gaussian process with known expectation.
By default the Euclidean distance is used. The parameters psi, p, isotropic and external.dm.fun can be used to select a different distance. In particular, external.dm.fun can be used to provide any function which calculates a distance matrix for the rows of a given matrix.
References
For the theoretic background see the references given on the main help page of this package: multivariance-package.
Examples
x = coins(100)
cdm(x) # fast Euclidean distances
cdm(x,psi = function(x,y) sqrt(sum((x-y)^2))) # identical to the previous (but slower)
# the function cdm does the following three lines in a faster way
N = nrow(x)
C = diag(N) - matrix(1/N,nrow = N,ncol = N)
A = - C %*% as.matrix(stats::dist(x,method="euclidean")) %*% C
all(abs(A - cdm(x,normalize = FALSE)) < 10^(-12))
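# a hedged sketch of the other distance options described in the Details:
# cdm(x, p = 1.5) # Minkowski distance with parameter p
# cdm(x, psi = function(x,y) sum(abs(x-y))) # user-supplied distance function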
given the sample of a single variable the doubly centered distance matrix, mu (the limit moments) and bcd (the terms for the finite sample moments) are computed
Description
The normalization should be postponed to the moment calculation.
Usage
cdm.mu.bcd(
x,
normalize = FALSE,
psi = NULL,
p = NULL,
isotropic = FALSE,
unbiased.moments = TRUE,
external.dm.fun = NULL
)
computes the doubly centered distance matrices
Description
computes the doubly centered distance matrices
Usage
cdms(x, vec = 1:ncol(x), membership = NULL, ...)
Arguments
x |
matrix, each row is a sample |
vec |
vector which indicates which columns are treated as one sample |
membership |
deprecated, now use vec |
... |
these are passed to cdm |
Value
It returns a list of doubly centered distance matrices.
computes the doubly centered distance matrices, mus and bcds
Description
computes the doubly centered distance matrices, mus and bcds
Usage
cdms.mu.bcd(x, vec = 1:ncol(x), membership = NULL, cdm.normalize = TRUE, ...)
Arguments
x |
matrix, each row is a sample |
vec |
vector which indicates which columns are treated as one sample |
membership |
deprecated, now use vec |
... |
these are passed to cdm |
Value
A list containing the following components:
list.cdm
list of the doubly centered distance matrices; these are always normalized if 'cdm.normalize = TRUE',
mu
matrix with the limit moments in a column for each variable,
bcd
matrix with b, c, d (which are required for the computation of the finite sample moments) in columns for each variable,
mean
vector with the mean of each distance matrix.
calculates the coordinates of n points on a circle of radius r; if there is only 1 inner point, it is placed in the center
Description
calculates the coordinates of n points on a circle of radius r; if there is only 1 inner point, it is placed in the center
Usage
circle.coordinates(n, r = 0.5, add.angle = 0)
cleanup dependence structure graph
Description
Given a dependence structure graph: vertices representing the multivariances of only two vertices can be turned into an edge labeled with the label of the vertex. Moreover, only subsets of the graph can be selected.
Usage
clean.graph(
g,
only.level = NULL,
simplify.pairs = TRUE,
drop.label.pairs = FALSE
)
Arguments
g |
graph, created by dependence.structure |
only.level |
integer vector, if provided all edges and dependency nodes corresponding to dependence orders not given in 'only.level' are removed |
simplify.pairs |
boolean, if true dependency nodes which are only connected to two variables are turned into edges |
drop.label.pairs |
boolean, if true the labels for edges indicating pairwise dependence are removed |
Details
Note: The option 'only.level' only works properly for a full dependence structure graph; in the case of a clustered dependence structure graph, dependency nodes representing a cluster might be removed.
Value
graph
Examples
N = 200
y = coins(N,2)
x = cbind(y,y,y)
ds = dependence.structure(x,structure.type = "clustered")
plot(clean.graph(ds$graph))
plot(clean.graph(ds$graph,only.level = 2))
plot(clean.graph(ds$graph,only.level = 3)) # of limited use for a clustered graph,
# i.e., here the three-dependence node without edges indicates that
# all edges were connected to clusters
ds = dependence.structure(x,structure.type = "full")
plot(clean.graph(ds$graph))
plot(clean.graph(ds$graph,drop.label.pairs = TRUE))
plot(clean.graph(ds$graph,only.level = 2))
plot(clean.graph(ds$graph,only.level = 2,drop.label.pairs = TRUE))
plot(clean.graph(ds$graph,only.level = 3))
dependence example: k-independent coin sampling
Description
This function creates samples which are dependent but k-independent.
Usage
coins(N = 1000, k = 2, type = "even")
Arguments
N |
number of samples |
k |
each k-tuple will be independent |
type |
one of "even" or "odd" |
Details
Throw k independent fair coins. Now consider the k+1 events: the first shows heads, the second shows heads, ..., the k-th shows heads, and there is an even (or odd, as selected via type) number of heads. Each row contains the state of these k+1 events.
Value
It returns the samples as rows of an N by k+1 matrix. The columns are dependent but k-independent.
References
For the theoretic background see the reference [3] given on the main help page of this package: multivariance-package.
Examples
coins(200,4)
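# an illustrative sketch (assuming default arguments): the columns of coins(N,2)
# are pairwise independent, thus 2-multivariance stays small, while multivariance
# and total multivariance indicate the joint dependence
multivariances.all(coins(1000,2))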
copula versions of distance multicorrelation
Description
Formally it is nothing but distance multicorrelation applied to the Monte Carlo empirical transform of the data. Hence its values vary for repeated runs.
Usage
copula.multicorrelation(x, vec = 1:ncol(x), ...)
CMcor(x, vec = 1:ncol(x), ...)
Arguments
x |
either a data matrix or a list of doubly centered distance matrices |
vec |
if x is a matrix, then this indicates which columns are treated together as one sample; if x is a list, these are the indexes for which the multivariance is calculated. The default is all columns and all indexes, respectively. |
... |
are passed to multicorrelation |
References
For the theoretic background see the reference [5] given on the main help page of this package: multivariance-package.
See Also
For the corresponding tests of independence see copula.multicorrelation.test.
independence tests using the copula versions of distance multivariance
Description
Formally these are nothing but tests for distance multivariance applied to the Monte Carlo empirical transform of the data. Hence the results vary for repeated runs.
Usage
copula.multicorrelation.test(x, vec = 1:ncol(x), ...)
Arguments
x |
matrix, each row is a sample |
vec |
vector which indicates which columns are treated as one sample |
... |
these are passed to multivariance.test |
References
For the theoretic background see the reference [5] given on the main help page of this package: multivariance-package.
copula version of distance multivariance
Description
Formally it is nothing but distance multivariance applied to the Monte Carlo empirical transform of the data. Hence its values vary for repeated runs.
Usage
copula.multivariance(x, vec = 1:ncol(x), type = "total", ...)
Arguments
x |
either a data matrix or a list of doubly centered distance matrices |
vec |
if x is a matrix, then this indicates which columns are treated together as one sample; if x is a list, these are the indexes for which the multivariance is calculated. The default is all columns and all indexes, respectively. |
type |
default: "total.lower.upper", for details and other options see below |
... |
these are passed to the underlying multivariance functions |
References
For the theoretic background see the reference [5] given on the main help page of this package: multivariance-package.
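Examples
# a minimal sketch (values vary between runs due to the Monte Carlo transform):
x = matrix(rnorm(100*3),ncol = 3)
copula.multivariance(x) # copula version of total multivariance
copula.multivariance(x,type = "multi") # copula version of multivariance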
functions which are required for the calculation of the finite sample expectation and variance for m-multivariance and total multivariance
Description
functions which are required for the calculation of the finite sample expectation and variance for m-multivariance and total multivariance
Usage
d2(a, b)
example dataset for dependence.structure
Description
It was generated by
set.seed(532333356)
N = 100
x = matrix(sample.int(2,10*N,replace = TRUE)-1,ncol = 10)
for (i in c(2,5,9)) x = cbind(x,(rowSums(as.matrix(x[,1:(i-1)]))
dep_struct_iterated_13_100 = x
save(dep_struct_iterated_13_100,file ="dep_struct_iterated_13_100.rda")
Usage
dep_struct_iterated_13_100
Format
matrix
13 variables (columns), 100 independent samples (rows)
Details
To avoid irritation, note that the seed is just a simple integer hash value of the variable name.
example dataset for dependence.structure
Description
It was generated by
set.seed(436646700)
N = 100
n = 15
x = matrix(sample.int(2,N*n,replace = TRUE)-1,nrow = N)
x[,4] = rowSums(x[,1:3])
x[,7] = rowSums(x[,4:6])
x[,10] = rowSums(x[,7:9])
x[,13] = rowSums(x[,10:12])
x[,15] = rowSums(x[,c(13,14,1)])
dep_struct_ring_15_100 = x
save(dep_struct_ring_15_100,file ="dep_struct_ring_15_100.rda")
Usage
dep_struct_ring_15_100
Format
matrix
15 variables (columns), 100 independent samples (rows)
Details
To avoid irritation, note that the seed is just a simple integer hash value of the variable name.
example dataset for dependence.structure
Description
It was generated by
set.seed(1348879148)
N = 100
dep_struct_several_26_100 = cbind(coins(N,2),tetrahedron(N),coins(N,4),
  tetrahedron(N),tetrahedron(N),coins(N,3),coins(N,3),rnorm(N))
save(dep_struct_several_26_100,file ="dep_struct_several_26_100.rda")
Usage
dep_struct_several_26_100
Format
matrix
26 variables (columns), 100 independent samples (rows)
Details
To avoid irritation, note that the seed is just a simple integer hash value of the variable name.
example dataset for dependence.structure
Description
It was generated by
set.seed(222454572)
N = 100
y = coins(N,2)
dep_struct_star_9_100 = cbind(y,y,y)
save(dep_struct_star_9_100,file ="dep_struct_star_9_100.rda")
Usage
dep_struct_star_9_100
Format
matrix
9 variables (columns), 100 independent samples (rows)
Details
To avoid irritation, note that the seed is just a simple integer hash value of the variable name.
determines the dependence structure
Description
Determines the dependence structure as described in [3].
Usage
dependence.structure(
x,
vec = 1:ncol(x),
verbose = TRUE,
detection.aim = NULL,
type = "conservative",
structure.type = "clustered",
c.factor = 2,
list.cdm = NULL,
alpha = 0.05,
p.adjust.method = "holm",
stop.too.many = NULL,
...
)
Arguments
x |
matrix, each row of the matrix is treated as one sample |
vec |
vector, it indicates which columns are initially treated together as one sample |
verbose |
boolean, if TRUE details are printed during the detection |
detection.aim |
NULL or a list of vectors which indicate the expected detected dependence structures, see the Details below |
type |
the method used for the detection, one of 'conservative', 'resample', 'pearson_approx' or 'consistent' |
structure.type |
either the 'clustered' or the 'full' dependence structure, see the Details below |
c.factor |
numeric, larger than 0, a constant factor used in the case of 'consistent' estimation |
list.cdm |
not required, the list of doubly centered distance matrices corresponding to x |
alpha |
numeric between 0 and 1, the significance level used for the tests |
p.adjust.method |
a string indicating the p-value adjustment for multiple testing, see p.adjust |
stop.too.many |
numeric, upper limit for the number of tested tuples. A warning is issued if it takes effect |
... |
these are passed to find.cluster |
Details
Performs the detection of the dependence structure as described in [3]. In the 'clustered' structure variables are clustered and treated as one variable as soon as a dependence is detected; the 'full' structure always treats each variable separately. The detection is either based on tests with significance level alpha or on a 'consistent' estimator. The latter yields (in the limit for increasing sample size) under very mild conditions always the correct dependence structure (but the convergence might be very slow).
If fixed.rejection.level is not provided, the significance level alpha is used to determine which multivariances are significant using the distribution-free rejection level. By default the Holm method is used for the p-value correction corresponding to multiple testing.
The resulting graph can be simplified (pairwise dependence can be represented by edges instead of vertices) using clean.graph.
Advanced:
The argument detection.aim is currently only implemented for structure.type = "clustered". It can be used to check if an expected dependence structure was detected. This might be useful for simulation studies to determine the empirical power of the detection algorithm. Hereto detection.aim is set to a list of vectors which indicate the expected detected dependence structures (one for each run of find.cluster). The first element of each vector is the k for which k-tuples are detected (for this aim the detection stops without success if no k-tuple is found); the other elements indicate to which clusters all present vertices belong after the detection, e.g. c(3,2,2,1,2,1,1,2,1) expects that 3-tuples are detected and that the graph has 8 vertices (including those representing the detected 3-dependencies); the order of the 2's and 1's indicates which vertices belong to which cluster. If detection.aim is provided, the vector representing the actual detection is printed; thus one can use the output with copy-paste to successively fix the expected detection aims.
Note that a failed detection might invoke the warning:
run$mem == detection.aim[[k]][-1] : longer object length is not a multiple of shorter object length
Value
returns a list with elements:
multivariances
calculated multivariances,
cdms
calculated doubly centered distance matrices,
graph
graph representing the dependence structure,
detected
boolean, this is only included if a detection.aim is given,
number.of.dep.tuples
vector with the number of dependent tuples for each tested order. For the full dependence structure a value of -1 indicates that all tuples of this order are already lower order dependent, a value of -2 indicates that there were more than stop.too.many tuples,
structure.type
either clustered or full,
type
the type of p-value estimation or consistent estimation used,
total.number.of.tests
numeric vector, with the number of tests for each group of tests,
typeI.error.prob
estimated probability of a type I error,
alpha
significance level used if a p-value estimation procedure is used,
c.factor
factor used if a consistent estimation procedure is used,
parameter.range
significance levels (or 'c.factor' values) which yield the same detection result.
References
For the theoretic background see the reference [3] given on the main help page of this package: multivariance-package.
Examples
# structures for the datasets included in the package
dependence.structure(dep_struct_several_26_100)
dependence.structure(dep_struct_star_9_100)
dependence.structure(dep_struct_iterated_13_100)
dependence.structure(dep_struct_ring_15_100)
# basic examples:
x = coins(100) # 3-dependent
dependence.structure(x)
colnames(x) = c("A","B","C")
dependence.structure(x) # names of variables are used as labels
dependence.structure(coins(100),vec = c(1,1,2))
# 3-dependent rv of which the first two rv are used together as one rv, thus 2-dependence.
dependence.structure(x,vec = c(1,1,2)) # names of variables are used as labels
dependence.structure(cbind(coins(200),coins(200,k=5)),verbose = TRUE)
#1,2,3 are 3-dependent, 4,..,9 are 6-dependent
# similar to the previous example, but
# the pair 1,3 is treated as one sample,
# analogously the pair 2,4. In the resulting structure one does not
# see anymore that the dependence of 1,2,3,4 with the rest is due
# to 4.
dependence.structure(cbind(coins(200),coins(200,k=5)),
vec = c(1,2,1,2,3,4,5,6,7),verbose = TRUE)
### Advanced:
# How to check the empirical power of the detection algorithm?
# Use a dataset for which the structure is detected, e.g. dep_struct_several_26_100.
# run:
dependence.structure(dep_struct_several_26_100,
detection.aim = list(c(ncol(dep_struct_several_26_100))))
# The output provides the first detection aim. Now we run the same line with the added
# detection aim
dependence.structure(dep_struct_several_26_100,detection.aim = list(c(3,1, 1, 1, 2, 2, 2, 3, 4,
5, 6, 7, 8, 8, 8, 9, 9, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1, 2, 8, 9),
c(ncol(dep_struct_several_26_100))))
# and get the next detection aim ... thus we finally obtain all detection aims.
# now we can run the code with new sample data ....
N = 100
dependence.structure(cbind(coins(N,2),tetrahedron(N),coins(N,4),tetrahedron(N),
tetrahedron(N),coins(N,3),coins(N,3),rnorm(N)),
detection.aim = list(c(3,1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 8, 8,
9, 9, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1, 2, 8, 9),
c(4,1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11,
11, 12, 1, 2, 8, 9, 10, 11),
c(5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 1,
2, 4, 5, 6, 7, 3),
c(5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 1,
2, 4, 5, 6, 7, 3)))$detected
# ... and one could start to store the results and compute the rate of successes.
# ... or one could try to check how many samples are necessary for the detection:
re = numeric(100)
for (i in 2:100) {
re[i] =
dependence.structure(dep_struct_several_26_100[1:i,],verbose = FALSE,
detection.aim = list(c(3,1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8,
8, 8, 9, 9, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1, 2, 8, 9),
c(4,1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11,
11, 11, 12, 1, 2, 8, 9, 10, 11),
c(5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7,
8, 1, 2, 4, 5, 6, 7, 3),
c(5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7,
8, 1, 2, 4, 5, 6, 7, 3)))$detected
print(paste("First", i,"samples. Detected?", re[i]==1))
}
cat(paste("Given the 1 to k'th row the structure is not detected for k =",which(re == FALSE),"\n"))
functions to detect the full (without clustering) dependence structure
Description
functions to detect the full (without clustering) dependence structure
Usage
dependence.structure.full(
x,
vec = 1:ncol(x),
verbose = TRUE,
type = "conservative",
alpha = 0.05,
list.cdm = NULL,
maxk = max(vec),
stop.too.many = NULL,
c.factor = 2,
...
)
Examples
# multivariance:::dependence.structure.full(dep_struct_ring_15_100)
# dependence.structure(dep_struct_ring_15_100,structure.type = "full")
transforms a distance matrix to a matrix
Description
Does for a distance matrix generated via dist the same as as.matrix, only slightly faster.
Usage
dist.to.matrix(ds)
Arguments
ds |
a distance matrix object, e.g. generated by dist |
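Examples
# a hedged speed comparison, analogous to the fastdist example
# (use multivariance:::dist.to.matrix if the function is not exported):
# d = dist(rnorm(100))
# all(multivariance:::dist.to.matrix(d) == as.matrix(d))
# microbenchmark::microbenchmark(multivariance:::dist.to.matrix(d),as.matrix(d))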
distance matrix
Description
Currently only used for the bias corrected multicorrelations. It might be used globally to remove redundancies.
Usage
dm(x, psi = NULL, p = NULL, isotropic = FALSE, external.dm.fun = NULL)
list of distance matrices
Description
Currently only used for the bias corrected multicorrelations. It might be used globally to remove redundancies.
Usage
dms(x, vec = 1:ncol(x), ...)
double centering of a matrix
Description
The default changed after version 2.0.0.
Usage
double.center(dm, normalize = TRUE)
bias corrected double centering (included for speed comparison)
Description
bias corrected double centering (included for speed comparison)
Usage
doubleCenterBiasCorrected(x)
bias corrected double centering with normalizing (included for speed comparison)
Description
bias corrected double centering with normalizing (included for speed comparison)
Usage
doubleCenterBiasCorrectedUpper(x)
bias corrected double centering with normalizing constants for upper and lower bound
Description
bias corrected double centering with normalizing constants for upper and lower bound
Usage
doubleCenterBiasCorrectedUpperLower(x, n)
double center a symmetric matrix
Description
double center a symmetric matrix
Usage
doubleCenterSymMat(x, normalize)
Arguments
x |
symmetric matrix |
normalize |
boolean, indicates if the matrix should be normalized |
Monte Carlo empirical transform
Description
Transforms a matrix (rows: samples, columns: variables) into a matrix of uniform samples with the same dependence structure via the Monte Carlo empirical transform.
Usage
emp.transf(x, continuous = FALSE)
Arguments
x |
data matrix (rows: samples, columns: variables) |
continuous |
boolean, if TRUE it provides the classical (non-Monte-Carlo) transformation by the empirical distribution function, which is a reasonable choice for data of continuous distributions. |
References
For the theoretic background see the reference [5] given on the main help page of this package: multivariance-package.
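Examples
# a small sketch of the randomization (assuming default arguments):
x = matrix(rnorm(50*2),ncol = 2)
identical(emp.transf(x),emp.transf(x)) # typically FALSE, due to the randomization
identical(emp.transf(x,continuous = TRUE),
          emp.transf(x,continuous = TRUE)) # TRUE, the classical transform is deterministic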
A dependent Monte Carlo empirical transform
Description
It is dependent, since each component uses the same uniformly distributed sample.
Usage
emp.transf.dep(x)
Transform a vector of samples into a vector of samples of the uniform distribution such that, if applied to multiple (dependent) sample vectors, the dependence is preserved.
Description
Transform a vector of samples into a vector of samples of the uniform distribution such that, if applied to multiple (dependent) sample vectors, the dependence is preserved.
Usage
emp.transf.vec(x, unif.samples = stats::runif(length(x)))
Examples
# x = rnorm(100)
# plot(emp.transf.vec(x))
fast centered Euclidean distance matrix
Description
fast centered Euclidean distance matrix
Usage
fastEuclideanCdm(x, normalize)
Arguments
x |
matrix with sample rows for which the distance matrix is computed (to use with vectors, use as.matrix) |
normalize |
boolean, indicates if the matrix should be normalized |
fast Euclidean distance matrix
Description
fast Euclidean distance matrix
Usage
fastdist(x)
Arguments
x |
matrix with sample rows for which the distance matrix is computed (to use with vectors, use as.matrix) |
Examples
#require(microbenchmark)
#x = rnorm(100)
#microbenchmark(fastdist(as.matrix(x)),as.matrix(dist(x)))
cluster detection
Description
Performs the dependence structure detection algorithm until a cluster is found. This function is the basic building block of dependence.structure. Advanced users might use it directly.
Usage
find.cluster(
x,
vec = 1:ncol(x),
list.cdm = cdms(x, vec = vec),
mem = as.numeric(1:max(vec)),
cluster.to.vertex = 1:max(mem),
vertex.to.cdm = 1:max(mem),
previous.n.o.cdms = rep(0, max(mem)),
all.multivariances = numeric(0),
g = igraph::add.vertices(igraph::graph.empty(, directed = FALSE), max(mem), label =
sapply(1:max(mem), function(r) paste(colnames(x, do.NULL = FALSE, prefix = "")[vec ==
r], collapse = ",")), shape = "circle"),
fixed.rejection.level = NA,
alpha = 0.05,
p.adjust.method = "holm",
verbose = TRUE,
kvec = 2:max(mem),
parameter.range = NULL,
type = "conservative",
stop.too.many = NULL,
...
)
Arguments
x |
matrix with the samples |
vec |
vector, it indicates which columns are initially treated together as one sample |
list.cdm |
list of doubly centered distance matrices |
mem |
numeric vector, its length is the number of vertices, its content is the number of the corresponding cluster for the current iteration, i.e., vertex i belongs to cluster mem[i] |
cluster.to.vertex |
vector, contains the cluster to vertex relations, i.e., cluster.to.vertex[i] is the index of the vertex which represents cluster i |
vertex.to.cdm |
vector, contains the vertex to doubly centered distance matrix relations, i.e., vertex.to.cdm[i] is the index of the doubly centered distance matrix in list.cdm corresponding to vertex i |
previous.n.o.cdms |
vector, number of the doubly centered distance matrices in the previous iteration (it is used to ensure that previously checked tuples are not checked again) |
all.multivariances |
vector, which contains all distance multivariances which have been calculated so far. Only used to finally return all distance multivariances which have been calculated. |
g |
dependence structure graph |
fixed.rejection.level |
vector, if not NA the values are used as rejection levels instead of those derived from the significance level alpha |
alpha |
numeric, significance level used for the (distribution-free) tests |
p.adjust.method |
name of the method used to adjust the p-values for multiple testing, see |
verbose |
boolean, if |
kvec |
vector, k-tuples are only checked for each k in kvec |
parameter.range |
numeric matrix, which hosts the range of significance levels or 'c.factor' values which yield the same detection result |
type |
the method for the detection, one of 'conservative', 'resample', 'pearson_approx' or 'consistent' |
stop.too.many |
numeric, upper limit for the number of tested tuples. A warning is issued if it takes effect |
... |
are passed to cdms |
Details
For further details see dependence.structure.
test for independence
Description
Deprecated. Use multivariance.test instead. It provides all options and returns the test result in a standard R format.
Usage
independence.test(
x,
vec = 1:ncol(x),
alpha = 0.05,
type = "distribution_free",
verbose = TRUE,
...
)
Arguments
x |
either a data matrix or a list of doubly centered distance matrices |
vec |
if x is a matrix, then this indicates which columns are treated together as one sample; if x is a list, these are the indexes for which the multivariance is calculated. The default is all columns and all indexes, respectively. |
alpha |
significance level |
type |
one of "distribution_free", "resample" or "pearson_approx" |
verbose |
logical, if TRUE meaningful text output is generated. |
... |
these are passed to cdms |
Details
This computes a test of independence for the columns of a sample matrix (required for the resampling test) or for given doubly centered distance matrices (only possible for the distribution-free test).
The "pearson_approx"
and "resample"
are approximately sharp. The latter is based on a resampling approach and thus much slower. The "distribution_free"
test might be very conservative.
The doubly centered distance matrices can be prepared by cdms. But note that for the test based on Pearson's approximation and for the resampling test, the data matrix has to be given.
Value
Returns TRUE if the hypothesis of independence is NOT rejected, otherwise FALSE.
References
For the theoretic background see the references given on the main help page of this package: multivariance-package.
Examples
independence.test(coins(100)) #dependent sample which is 2-independent
independence.test(coins(100),type = "resample") #dependent sample which is 2-independent
independence.test(coins(100)[,2:3]) # independent sample
independence.test(coins(100)[,2:3],type = "resample") # independent sample
independence.test(coins(10),type = "resample") #dependent sample which is 2-independent
independence.test(coins(10)[,2:3],type = "resample") #dependent sample which is 2-independent
checks if a matrix is doubly centered
Description
Note: it works only for the biased estimators.
Usage
is.doubly.centered(mat)
Arguments
mat |
matrix |
Examples
multivariance:::is.doubly.centered(as.matrix(dist(rnorm(10))))
multivariance:::is.doubly.centered(multivariance:::double.center(as.matrix(dist(rnorm(10)))))
special igraph layout for the dependence structure visualization
Description
It places the variable nodes on an outer circle and the dependency nodes on an inner circle.
Usage
layout_on_circles(g, n = sum(is.na(igraph::V(g)$level)))
Arguments
g |
graph |
n |
number of vertices on outer circle |
Details
This is the standard layout for the full dependence structure, since in this case there are often too many nodes, which would make the other (usual) layout incomprehensible.
Examples
N = 200
y = coins(N,2)
x = cbind(y,y,y)
g = dependence.structure(x,structure.type = "clustered",verbose = FALSE)$graph
plot(g)
plot(g,layout = layout_on_circles(g))
check if lower order dependencies are present for the given tuple indices
Description
Checks if lower order dependencies are present for the given tuple indices. Here 'm.values' is a list of boolean matrices; matrix [[k]] corresponds to the k-tuples. For each number of tuples, the first columns of the matrix always contain the indices of the tuples.
Usage
lower.order(tuple, m.values)
m distance multivariance
Description
Computes m distance multivariance.
Usage
m.multivariance(
x,
vec = NA,
m = 2,
Nscale = TRUE,
Escale = TRUE,
squared = TRUE,
...
)
Arguments
x |
either a data matrix or a list of doubly centered distance matrices |
vec |
if x is a matrix, then this indicates which columns are treated together as one sample; if x is a list, these are the indexes for which the multivariance is calculated. The default is all columns and all indexes, respectively. |
m |
the m of the m-multivariance, see the Details and Examples below |
Nscale |
if TRUE the multivariance is scaled up by the sample size (and thus it is the test statistic used for the independence tests) |
Escale |
if TRUE the value is scaled by the number of multivariances which are (theoretically) summed up |
squared |
if FALSE the square root of the value is returned |
... |
these are passed to cdms |
Details
m-distance multivariance is by definition the scaled sum of certain distance multivariances, and it characterizes m-dependence.
As a rough guide to interpret the value of m distance multivariance note:
Large values indicate dependence.
If the random variables are (m-1)-independent and Nscale = TRUE, values close to 1 and smaller indicate m-independence, larger values indicate dependence. In fact, in the case of independence the test statistic is a Gaussian quadratic form with expectation 1 and samples of it can be generated by resample.multivariance.
If the random variables are (m-1)-independent and Nscale = FALSE, small values (close to 0) indicate m-independence, larger values indicate dependence.
Since random variables are always 1-independent, the case m=2 characterizes pairwise independence.
Finally note that, due to numerical (in)precision, the value of m-multivariance might become negative. In these cases it is set to 0. A warning is issued if the value is negative and further than the usual tolerance (used by all.equal) away from 0.
References
For the theoretic background see the reference [3] given on the main help page of this package: multivariance-package.
Examples
x = matrix(rnorm(3*30),ncol = 3)
# the following values are identical
m.multivariance(x,m =2)
1/choose(3,2)*(multivariance(x[,c(1,2)]) +
multivariance(x[,c(1,3)]) +
multivariance(x[,c(2,3)]))
# the following values are identical
m.multivariance(x,m=3)
multivariance(x)
# the following values are identical
1/4*(3*(m.multivariance(x,m=2)) + m.multivariance(x,m=3))
total.multivariance(x, Nscale = TRUE)
1/4*(multivariance(x[,c(1,2)], Nscale = TRUE) +
multivariance(x[,c(1,3)], Nscale = TRUE) +
multivariance(x[,c(2,3)], Nscale = TRUE) + multivariance(x, Nscale = TRUE))
Returns the row indices of matrix A which match with B. Use the fast C++ implementation 'match_rows' instead; this function is here just for reference.
Description
Returns the row indices of matrix A which match with B. Use the fast C++ implementation 'match_rows' instead; this function is here just for reference.
Usage
match.rows(A, B)
Examples
# A = t(utils::combn(10,3))
# B = A[sort(sample.int(nrow(A),10)),]
# match.rows(A,B)
for the fast detection of the full dependence structure
Description
Returns the row indices of matrix A which match with B.
Usage
match_rows(A, B)
Arguments
A |
matrix |
B |
matrix whose rows are a subset of the rows of A |
Examples
# A = t(utils::combn(10,3))
# B = A[sort(sample.int(nrow(A),10)),]
# match_rows(A,B)
computes the moments as required for Pearson's approximation
Description
computes the moments as required for Pearson's approximation
Usage
moments.for.pearson(N, bcd, mu, mmean, type = "multi")
Arguments
N |
sample size |
bcd |
an array with b c d |
mu |
the limit moments |
mmean |
the means of the distance matrices |
Details
Note: It is currently only implemented for the case of normalized multivariance, i.e., the OUTPUT values correspond to normalized multivariance!
given the distance matrix the unbiased estimate for mu3 is computed
Description
given the distance matrix the unbiased estimate for mu3 is computed
Usage
mu3.unbiased(B, b2ob = sum(tcrossprod(B) * B))
distance multicorrelation
Description
Computes various types of sample distance multicorrelation as defined and discussed in [3,4,6].
Usage
multicorrelation(
x,
vec = 1:ncol(x),
type = "total.upper.lower",
multicorrelation.type = "normalized",
estimator.type = "bias.corrected",
squared = TRUE,
...
)
Mcor(
x,
vec = 1:ncol(x),
type = "total.upper.lower",
multicorrelation.type = "normalized",
estimator.type = "bias.corrected",
squared = TRUE,
...
)
Arguments
x |
either a data matrix or a list of doubly centered distance matrices |
vec |
if x is a matrix, then this indicates which columns are treated together as one sample; if x is a list, these are the indexes for which the multivariance is calculated. The default is all columns and all indexes, respectively. |
type |
default: "total.lower.upper", for details and other options see below |
multicorrelation.type |
one of "normalized" or "unnormalized", see below |
estimator.type |
one of "biased" or "bias.corrected", see below |
squared |
if FALSE the square root of the value is returned |
... |
these are passed to cdms |
Details
There exist many variants of distance multicorrelation as discussed in [6] – and only in specific cases a direct comparison of the values is meaningful.
The implemented options are:
- total.upper.lower normalized bias.corrected: default; bounded by 1; fast; population limit characterizes independence by 0
- pairwise normalized bias.corrected: bounded by 1; fast; population limit characterizes pairwise independence by 0
- total.upper normalized biased: biased versions of the above
- total.lower normalized biased
- pairwise normalized biased
- multi normalized biased: population limit characterizes only in case of lower independence the independence of all variables by 0
- m.multi.3 normalized biased: population limit characterizes only in case of pairwise independence the 3-independence of all variables by 0
- pairwise unnormalized biased: population limit characterizes pairwise independence by 0 and relation by similarity transforms by 1
- multi unnormalized biased: population limit characterizes only in case of lower independence the independence of all variables by 0 and relation by similarity transforms by 1
- m.multi.3 unnormalized biased: population limit characterizes only in case of pairwise independence the 3-independence of all variables by 0 and relation by similarity transforms by 1
Further details:
The "bias.corrected"
versions require a data matrix, since they compute bias corrected centered distance matricies.
For "multi"
the unnormalized and normalized version coincide if an even number of variables is considered. They usually differ if an odd number of variables is considered. If all variables are related by similarity transforms the unnormalized "unnormalized"
multicorrelations are 1.
For "pairwise"
an alias is "m.multi.2"
.
For total multicorrelation there is currently only a feasible empirical estimator for a lower or upper bound. These are upper and lower bounds in the population setting. When using bias corrected estimators these are in general no longer proper bounds, but their range can be used as values for comparisons.
Value
Value of the multicorrelation(s).
References
For the theoretic background see the references [2,3,6] given on the main help page of this package: multivariance-package.
Examples
y = rnorm(100)
x = cbind(y,y*2,(y-2)/3,y+1,y*5) # all variables are related by similarity transforms
# compute all types of correlations for x:
for (ty in c("total.lower","total.upper","pairwise","m.multi.3","multi"))
for (mty in c("normalized"))
print(paste(format(multicorrelation(
x,type=ty,multicorrelation.type = mty,estimator.type = "biased")
,digits=3,nsmall = 3,width = 7),mty,ty,"correlation - biased estimate"))
for (ty in c("total.upper.lower","pairwise"))
for (mty in c("normalized"))
print(paste(format(multicorrelation(
x,type=ty,multicorrelation.type = mty,estimator.type = "bias.corrected")
,digits=3,nsmall = 3,width = 7),mty,ty,"correlation - bias corrected estimate"))
for (ty in c("m.multi.2","m.multi.3","multi"))
for (mty in c("unnormalized"))
print(paste(format(multicorrelation(
x,type=ty,multicorrelation.type = mty,estimator.type = "biased")
,digits=3,nsmall = 3,width = 7),mty,ty,"correlation - biased estimate"))
bias corrected total multicorrelations
Description
bias corrected total multicorrelations
Usage
multicorrelation.bias.corrected(
x,
vec = 1:ncol(x),
squared = FALSE,
type = "all",
...
)
distance multivariance
Description
Computes the distance multivariance, either for given data or a given list of doubly centered distance matrices.
Usage
multivariance(
x,
vec = NA,
Nscale = TRUE,
correlation = FALSE,
squared = TRUE,
...
)
Arguments
x |
either a data matrix or a list of doubly centered distance matrices |
vec |
if x is a matrix, then this indicates which columns are treated together as one sample; if x is a list, these are the indexes for which the multivariance is calculated. The default is all columns and all indexes, respectively. |
Nscale |
if |
correlation |
depreciated, please use the function |
squared |
if |
... |
these are passed to |
Details
If x is a matrix and vec is not given, then each column is treated as a separate sample. Otherwise vec has to have as many elements as x has columns, with values starting from 1 up to the number of 'variables'. E.g. if x is an N by 5 matrix and vec = c(1,2,1,3,1), then the multivariance of the 1-dimensional variables represented by columns 2 and 4 and of the 3-dimensional variable represented by columns 1, 3 and 5 is computed.
By default it computes the normalized Nscaled squared multivariance; for a multivariance without normalization the argument normalize = FALSE has to be passed to cdms.
correlation = TRUE yields values between 0 and 1. These can be interpreted similarly to classical correlations, see also multicorrelation.
As a rough guide to interpret the value of distance multivariance note:
If the random variables are not (n-1)-independent, large values indicate dependence, but small values are meaningless. Thus in this case use total.multivariance.
If the random variables are (n-1)-independent and Nscale = TRUE, values close to 1 and smaller indicate independence, larger values indicate dependence. In fact, in the case of independence the test statistic is a Gaussian quadratic form with expectation 1 and samples of it can be generated by resample.multivariance.
If the random variables are (n-1)-independent and Nscale = FALSE, small values (close to 0) indicate independence, larger values indicate dependence.
Finally note that, due to numerical (in)precision, the value of multivariance might become negative. In these cases it is set to 0. A warning is issued if the value is negative and further than the usual tolerance (used by all.equal) away from 0.
References
For the theoretic background see the references given on the main help page of this package: multivariance-package.
Examples
multivariance(matrix(rnorm(100*3),ncol = 3)) #independent sample
multivariance(coins(100)) #dependent sample which is 2-independent
x = matrix(rnorm(100*2),ncol = 2)
x = cbind(x,x[,2])
multivariance(x) #dependent sample which is not 2-independent (thus small values are meaningless!)
multivariance(x[,1:2]) #these are independent
multivariance(x[,2:3]) #these are dependent
multivariance(x[,2:3],correlation = TRUE)
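# using the vec argument as described in the details: columns 1,3,5 form one
# 3-dimensional variable, columns 2 and 4 are 1-dimensional variables
y = matrix(rnorm(100*5),ncol = 5)
multivariance(y,vec = c(1,2,1,3,1))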
transform multivariance to p-value
Description
Computes a conservative p-value for the hypothesis of independence for a given multivariance / m-multivariance / total multivariance.
Usage
multivariance.pvalue(x)
Arguments
x |
value of a normalized and Nscaled multivariance |
Details
This is based on a distribution-free approach. The p-value is conservative, i.e. the actual p-value might be much smaller. This is the counterpart to rejection.level. For a less conservative approach see resample.pvalue or pearson.pvalue.
p-values larger than 0.215 might be incorrect, since the distribution-free estimate on which the computation is based only holds up to 0.215.
References
For the theoretic background see the references given on the main help page of this package: multivariance-package.
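Examples
# a minimal sketch (assuming default arguments):
multivariance.pvalue(multivariance(coins(100))) # conservative p-value
# values above 0.215 should only be read as "larger than 0.215"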
independence tests based on (total-/2-/3-) multivariance
Description
This performs the (specified by type
and p.value.type
) independence test for the columns of a sample matrix.
Usage
multivariance.test(
x,
vec = 1:ncol(x),
type = "total",
p.value.type = "pearson_approx",
verbose = TRUE,
...
)
Arguments
x |
matrix, each row is a sample |
vec |
vector which indicates which columns are treated as one sample |
type |
one of "independence", "total", "pairwise independence", "m.multi.2", "m.multi.3" or "multi", see the Details below |
p.value.type |
one of "pearson_approx", "distribution_free", "resample" or "pearson_unif", see the Details below |
verbose |
logical, if TRUE meaningful text output is generated. |
... |
these are passed to cdm |
Details
For the use of vec see the examples below and the more detailed explanation of this argument for multivariance.
The types "independence"
and "total"
are identical: an independence test is performed.
Also the types "pairwise independence"
and "m.multi.2"
are identical: a test of pairwise independence is performed.
The type "m.multi.3"
, performs a test for 3-independence, assuming pairwise independence. The type "multi"
performs a test for n-independence, assuming (n-1)-independence.
There are several ways (determined by p.value.type) to estimate the p-value: The "pearson_approx" and "resample" estimates are approximately sharp. The latter is based on a resampling approach and thus much slower. The "distribution_free" test might be very conservative; its p-value estimates are only valid for p-values lower than 0.215 - values above should be interpreted as "larger than 0.215". Finally, "pearson_unif" uses fixed parameters in Pearson's estimate; it is only applicable for univariate uniformly distributed marginals.
All tests are performed using the standard Euclidean distance. Other distances can be supplied via ..., see cdm for the accepted arguments.
Value
A list with class "htest
" containing the following components:
statistic
the value of the test statistic,
p.value
the p-value of the test statistic,
method
a character string indicating the type of test performed,
data.name
a character string giving the name(s) of the data.
References
For the theoretic background see the references given on the main help page of this package: multivariance-package.
Examples
# an independence test
multivariance.test(dep_struct_several_26_100,p.value.type = "distribution_free") # conservative
multivariance.test(dep_struct_several_26_100,p.value.type = "resample") #sharp but slow
multivariance.test(dep_struct_several_26_100,p.value.type = "pearson_approx") #
# as an example, all tests for one data set:
coins100 = coins(100)
for (ty in c("total","m.multi.2","m.multi.3","multi"))
for (pvt in c("distribution_free","resample","pearson_approx"))
print(multivariance.test(coins100,type=ty,p.value.type = pvt))
# using the vec argument:
x = matrix(rnorm(50*6),ncol = 6) # a 50x6 data matrix
vec = c(1,2,3,4,5,6) # each column is treated as one variable
multivariance.test(x,vec,p.value.type = "distribution_free") # is the same as the default
vec = c(1,2,2,1,3,1)
# column 1,4,6 are treated as one variable
# column 2,3 are treated as one variable
# column 5 is treated as one variable
multivariance.test(x,vec,p.value.type = "distribution_free")
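# a hedged sketch of supplying another distance via '...' (see the Details):
# multivariance.test(x,p = 1.5) # Minkowski distance with parameter p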
estimate of the computation time
Description
Estimates the computation time. This is relatively rough. First run with determine.parameters = TRUE (which takes a while). Then use the computed parameters to determine the computation time or the sample size.
Usage
multivariance.timing(
N = NULL,
n,
sectime = NULL,
coef.cdm = 15.2,
coef.prod = 2.1,
coef.sum = 1.05,
determine.parameters = FALSE
)
Arguments
N |
number of samples. If NULL, the sample size which can be processed within the given sectime is computed |
n |
number of variables |
sectime |
desired computation time in seconds. If NULL, the computation time for the given N and n is estimated |
coef.cdm |
computation time parameter for the doubly centered distance matrices |
coef.prod |
computation time parameter for matrix products |
coef.sum |
computation time parameter for matrix sums |
determine.parameters |
if TRUE the machine-specific parameters are determined (this takes a while) |
Details
When detecting the parameters, the median of the computation times is used.
Examples
Ns = (1:100)*10
ns = 1:100
fulltime = outer(Ns,ns,FUN = function(N,n) multivariance.timing(N,n))
contour(Ns,ns,fulltime,xlab = "N",ylab = "n",
main = "computation time of multivariance in secs",
sub = "using default parameters -
use 'determine.parameters = TRUE' to compute machine specific values")
# Run to determine the parameters of your system:
# multivariance.timing(determine.parameters = TRUE)
simultaneous computation of multivariance and total/ 2-/ 3-multivariance
Description
Computes simultaneously multivariance, total multivariance, 2-multivariance and 3-multivariance.
Usage
multivariances.all(x, vec = NA, Nscale = TRUE, squared = TRUE, ...)
Arguments
x |
either a data matrix or a list of doubly centered distance matrices |
vec |
if x is a matrix, then this indicates which columns are treated together as one sample; if x is a list, these are the indexes for which the multivariance is calculated. The default is all columns and all indexes, respectively. |
Nscale |
if TRUE the multivariances are scaled up by the sample size (and thus they are the test statistics) |
squared |
if FALSE the square roots of the values are returned |
... |
these are passed to cdms |
Details
The computation is faster than the separate computations.
Value
Returns a vector with multivariance, total.multivariance, 2-multivariance and 3-multivariance
See Also
multivariance, total.multivariance, m.multivariance
Examples
x = coins(100,k = 3)
multivariances.all(x)
# yields the same as:
multivariance(x)
total.multivariance(x)
m.multivariance(x,m=2)
m.multivariance(x,m=3)
transforms a p-value into the corresponding label
Description
transforms a p-value into the corresponding label
Usage
p.value.to.star.label(pv)
Arguments
pv |
p-value |
pairwise multicorrelation
Description
pairwise multicorrelation
Usage
pairwise.multicorrelation.bias.corrected(
x,
vec = 1:ncol(x),
squared = FALSE,
...
)
fast p-value approximation
Description
Computes the p-value of a sample using Pearson's approximation of Gaussian quadratic forms with the estimators developed by Berschneider and Böttcher in [4].
Usage
pearson.pvalue(x, vec = NA, type = "multi", ...)
Arguments
x |
matrix, the rows should be iid samples |
vec |
vector, which indicates which columns of x are treated together as one sample |
type |
one of "multi", "total", "m.multi.2" or "m.multi.3" |
... |
these are passed to cdms |
Details
This is the method recommended in [4], i.e., using Pearson's quadratic form estimate with the unbiased finite sample estimators for the mean and variance of normalized multivariance together with the unbiased estimator for the limit skewness.
References
For the theoretic background see the reference [4] given on the main help page of this package: multivariance-package.
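Examples
# a minimal sketch of the recommended fast test (assuming default arguments;
# the type option "total" is assumed to be analogous to multivariance.test):
x = coins(100)
pearson.pvalue(x) # fast approximate p-value
pearson.pvalue(x,type = "total") # based on total multivariance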
compute the p-value by Pearson's approximation assuming uniform marginals and euclidean distance
Description
compute the p-value by Pearson's approximation assuming uniform marginals and euclidean distance
Usage
pearson.pvalue.unif(
x,
vec = NA,
type = "total",
psi = NULL,
isotropic = TRUE,
...
)
Examples
multivariance:::pearson.pvalue.unif(matrix(runif(300),ncol = 3))
## Not run:
library(microbenchmark)
x = matrix(runif(300*3),ncol = 3)
microbenchmark(
multivariance.test(x,p.value.type = "pearson_approx")$p.value,
multivariance:::pearson.pvalue.unif(emp.transf(x))
)
## End(Not run)
approximate distribution function of a Gaussian quadratic form
Description
Approximation of the value of the distribution function of a Gaussian quadratic form based on its first three moments.
Usage
pearson.qf(x, moment, lower.tail = TRUE, verbose = FALSE)
Arguments
x |
value at which the distribution function is to be evaluated |
moment |
vector with the mean, variance and skewness of the quadratic form |
lower.tail |
logical, indicating whether the lower or upper tail of the distribution function should be calculated |
verbose |
logical, if |
Details
This is Pearson's approximation for Gaussian quadratic forms as stated in [4] (equation (4.65) in arXiv:1808.07280v2).
References
For the theoretic background see the reference [4] given on the main help page of this package: multivariance-package.
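As a plausibility check (our own sketch, not part of the package documentation): a chi-squared variable with one degree of freedom is itself a Gaussian quadratic form with mean 1, variance 2 and skewness sqrt(8), so pearson.qf should nearly reproduce pchisq:
# upper tail probability at x = 4 for a chi^2_1 quadratic form
pearson.qf(4, moment = c(1, 2, sqrt(8)), lower.tail = FALSE)
pchisq(4, df = 1, lower.tail = FALSE)  # reference value, approx. 0.0455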
rejection level for the test statistic
Description
Under independence, the probability for the normalized and Nscaled (squared) multivariance to be above this level is less than alpha. The same holds for the normalized, Nscaled and Escaled (squared) total multivariance and m-multivariance.
Usage
rejection.level(alpha)
Arguments
alpha |
level of significance |
Details
This is based on a distribution-free approach; the value might be very conservative. This is the counterpart to multivariance.pvalue. For a less conservative approach see resample.rejection.level.
The estimate is only valid for alpha smaller than 0.215.
Examples
rejection.level(0.05) #the rejection level, for comparison with the following values
total.multivariance(matrix(rnorm(100*3),ncol = 3)) #independent sample
total.multivariance(coins(100)) #dependent sample which is 2-independent
# and the p values are (to compare with alpha)
multivariance.pvalue(total.multivariance(matrix(rnorm(100*3),ncol = 3))) #independent sample
multivariance.pvalue(total.multivariance(coins(100))) #dependent sample which is 2-independent
## Not run:
# visualization of the rejection level
curve(rejection.level(x),xlim = c(0.001,0.215),xlab = "alpha")
## End(Not run)
resampling (total-/m-) multivariance
Description
The distribution of the test statistic under the hypothesis of independence is required for the independence tests. This function generates approximate samples of this distribution either by sampling without replacement (permutations) or by sampling with replacement (bootstrap).
Usage
resample.multivariance(
x,
vec = 1:ncol(x),
times = 300,
type = "multi",
resample.type = "permutation",
...
)
Arguments
x |
matrix, the rows should be iid samples |
vec |
vector, which indicates which columns of x are treated together as one sample |
times |
integer, number of samples to generate |
type |
one of "multi", "total", "m.multi.2", "m.multi.3", "all" |
resample.type |
one of "permutation", "bootstrap" |
... |
is passed to |
Details
The resampling is done by sampling from the original data either without replacement ("permutation") or with replacement ("bootstrap"). Using resampling without replacement is (much) faster, due to special identities which only hold in this case.
For convenience the actual (total-/m-) multivariance of the original data and its p-value are also computed.
Value
A list with elements
resampled
the (total/m-)multivariances of the resampled data,
original
the (total/m-)multivariance of the original data,
p.value
the p-value of the original data, computed using the resampled data
References
For the theoretic background see the reference [3] given on the main help page of this package: multivariance-package.
Examples
re.m = resample.multivariance(matrix(rnorm(30*2),nrow = 30),
type= "multi",times = 300)$resampled
curve(ecdf(re.m)(x), xlim = c(0,4),main = "empirical distribution of the test statistic under H_0")
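To compare the two resampling schemes from the Details, a short sketch (our addition); both should give similar p-values, with permutation being the faster choice:
x = matrix(rnorm(30*2), nrow = 30)
resample.multivariance(x, resample.type = "permutation")$p.value
resample.multivariance(x, resample.type = "bootstrap")$p.value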
p-value via resampling
Description
Uses a resampling method to generate samples of the test statistic under the hypothesis of independence. Based on these samples, the p-value of a given test statistic value is computed.
Usage
resample.pvalue(value, ...)
Arguments
value |
numeric, the value of (total-/m-)multivariance for which the p-value shall be computed |
... |
passed to resample.multivariance |
Details
This function is useful if a p-value of a test statistic shall be computed based on the resampling values of the test statistic of a different sample. For the p-value based on the same sample, resample.multivariance(...)$p.value is sufficient.
Value
It returns 1 minus the value of the empirical distribution function of the resampling samples evaluated at the given value.
References
For the theoretic background see the reference [3] given on the main help page of this package: multivariance-package.
Examples
x = coins(100)
resample.pvalue(multivariance(x),x=x,times = 300)
resample.pvalue(multivariances.all(x),x=x,times = 300,type = "all")
rejection level via resampling
Description
Uses a resampling method to generate samples of the test statistic under the hypothesis of independence. The alpha quantile of these samples is returned.
Usage
resample.rejection.level(alpha = 0.05, ...)
Arguments
alpha |
numeric, the significance value |
... |
passed to resample.multivariance |
References
For the theoretic background see the reference [3] given on the main help page of this package: multivariance-package.
Examples
resample.rejection.level(0.05,matrix(rnorm(30*2),nrow = 30))
resample.rejection.level(0.05,matrix(rnorm(30*3),nrow = 30),vec = c(1,1,2))
resamples doubly centered distance matrices
Description
resamples doubly centered distance matrices
Usage
sample.cdms(list.cdm, replace = FALSE, incl.first = FALSE)
Arguments
list.cdm |
a list of doubly centered distance matrices |
replace |
boolean, sampling with or without replacement |
incl.first |
boolean, if TRUE the first component is also resampled |
Value
Returns a list of doubly centered distance matrices, each matrix corresponds to the resampled columns of the corresponding sample, using resampling with replacement (bootstrap) or without replacement (permutations).
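A usage sketch (our addition), assuming the package's cdms function is used to produce the input list; ':::' is used since the function may be internal:
x = matrix(rnorm(10*3), ncol = 3)
list.cdm = cdms(x)  # one doubly centered distance matrix per column
# permute each matrix (sampling without replacement, the default)
multivariance:::sample.cdms(list.cdm)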
resample the columns of a matrix
Description
resample the columns of a matrix
Usage
sample.cols(x, vec = 1:ncol(x), replace = TRUE, incl.first = TRUE)
Arguments
x |
matrix |
vec |
vector, indicates which columns belong together |
replace |
boolean, sampling with or without replacement |
incl.first |
boolean, if TRUE the first component is also resampled |
Value
Returns a matrix with the same dimensions as x. The columns are resampled from the original columns, either with replacement (replace = TRUE) or without (replace = FALSE). Columns which belong together (indicated by vec) are resampled jointly, i.e., the rows of these columns are kept together.
Examples
sample.cols(matrix(1:15,nrow = 5),vec = c(1,1,2))
sign-preserving square root
Description
sign-preserving square root
Usage
signed.sqrt(x)
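Presumably this computes sign(x) * sqrt(abs(x)); that reading of the name is our assumption. A quick sketch via the internal namespace:
# a negative input should keep its sign, e.g. -4 should map to -2
multivariance:::signed.sqrt(c(-4, 0, 9))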
Simple integer hash from text
Description
Used to compute a random seed for set.seed based on the example name, in order to avoid arbitrary seeds like '1234'.
Usage
simple.int.hash(x)
Examples
multivariance:::simple.int.hash("dep_struct_several_26_100")
This is the function GC, which is required for the computation of the finite sample variance of m- and total multivariance
Description
This is the function GC, which is required for the computation of the finite sample variance of m- and total multivariance.
Usage
sums.of.products(a, b, c, type = "multi")
dependence example: tetrahedron sampling
Description
This function creates samples of a tetrahedral die whose faces are colored r, g, b and rgb. Each sample indicates whether the colors r, g and b appear on the bottom face of the thrown die.
Usage
tetrahedron(N = 1000)
Arguments
N |
number of samples |
Value
It returns the samples of the events r, g and b as rows of an N by 3 matrix (the first column corresponds to r, the second to g, the third to b). TRUE indicates that the color is on the bottom face of the die. The columns are dependent but 2-independent.
References
For the theoretic background see the reference [3] given on the main help page of this package: multivariance-package.
Examples
tetrahedron(10)
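The claim that the columns are dependent but 2-independent can be illustrated with multivariances.all (a sketch of our own): the 2-multivariance should stay near the independence level, while multivariance and 3-multivariance pick up the dependence.
x = tetrahedron(1000)
# with the default Nscale = TRUE, values well above 1 indicate dependence
multivariances.all(x)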
Included for speed: it is faster than upper.lower
Description
Included for speed: it is faster than upper.lower.
Usage
total.multicorrelation.bias.corrected.upper(
x,
vec = 1:ncol(x),
squared = FALSE,
...
)
total distance multivariance
Description
computes the total distance multivariance
Usage
total.multivariance(
x,
vec = NA,
lambda = 1,
Nscale = TRUE,
Escale = TRUE,
squared = TRUE,
...
)
Arguments
x |
either a data matrix or a list of doubly centered distance matrices |
vec |
if x is a matrix, then this indicates which columns are treated together as one sample; if x is a list, these are the indexes for which the multivariance is calculated. The default is all columns and all indexes, respectively. |
lambda |
a scaling parameter >0. Each k-tuple multivariance gets weight lambda^(n-k) |
Nscale |
if TRUE the multivariance is scaled up by the sample size (as required for the test of independence) |
Escale |
if TRUE the sum is scaled by the number of multivariances which are summed up, so that in the case of independence the test statistic has expectation 1 |
squared |
if FALSE the square root of the total multivariance is returned |
... |
these are passed to cdms (which is only invoked if x is a matrix) |
Details
Total distance multivariance is by definition the scaled sum of certain distance multivariances, and it characterizes dependence.
As a rough guide to interpret the value of total distance multivariance note:
Large values indicate dependence.
For Nscale = TRUE values close to 1 and smaller indicate independence, larger values indicate dependence. In fact, in the case of independence the test statistic is a Gaussian quadratic form with expectation 1 and samples of it can be generated by resample.multivariance.
For Nscale = FALSE small values (close to 0) indicate independence, larger values indicate dependence.
Finally note that due to numerical imprecision the value of total multivariance might become negative. In these cases it is set to 0. A warning is issued if the value is negative and further from 0 than the usual tolerance (used by all.equal).
References
For the theoretic background see the references given on the main help page of this package: multivariance-package.
Examples
x = matrix(rnorm(100*3),ncol = 3)
total.multivariance(x) #for an independent sample
# the value coincides with
(multivariance(x[,c(1,2)],Nscale = TRUE) + multivariance(x[,c(1,3)],Nscale = TRUE)+
multivariance(x[,c(2,3)],Nscale = TRUE) + multivariance(x,Nscale = TRUE))/4
total.multivariance(coins(100)) #value for a dependent sample which is 2-independent