Type: | Package |
Title: | Covariance Measure Tests for Conditional Independence |
Version: | 0.1-1 |
Description: | Covariance measure tests for conditional independence testing against conditional covariance and nonlinear conditional mean alternatives. The package implements versions of the generalised covariance measure test (Shah and Peters, 2020, <doi:10.1214/19-aos1857>) and the projected covariance measure test (Lundborg et al., 2023, <doi:10.1214/24-AOS2447>). The tram-GCM test for censored responses is also implemented, including the Cox model and survival forests (Kook et al., 2024, <doi:10.1080/01621459.2024.2395588>). Application examples to variable significance testing and modality selection can be found in Kook and Lundborg (2024, <doi:10.1093/bib/bbae475>). |
Depends: | R (≥ 4.2.0) |
Imports: | ranger, glmnet, Formula, survival, coin |
License: | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.1 |
Suggests: | testthat (≥ 3.0.0), ggplot2, tidyr, dplyr, xgboost |
Config/testthat/edition: | 3 |
URL: | https://github.com/LucasKook/comets |
BugReports: | https://github.com/LucasKook/comets/issues |
NeedsCompilation: | no |
Packaged: | 2025-01-31 09:33:41 UTC; lkook |
Author: | Lucas Kook |
Maintainer: | Lucas Kook <lucasheinrich.kook@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-01-31 10:10:06 UTC |
Covariance measure tests with formula interface
Description
Covariance measure tests with formula interface
Usage
comet(formula, data, test = c("gcm", "pcm", "wgcm"), ...)
comets(formula, data, test = c("gcm", "pcm", "wgcm"), ...)
Arguments
formula |
Formula of the form y ~ x | z (response ~ covariates to be tested | conditioning covariates). |
data |
Data.frame containing the variables in formula. |
test |
Character string; one of "gcm" (default), "pcm" or "wgcm". |
... |
Additional arguments passed to gcm, pcm or wgcm. |
Details
Formula-based interface for the generalised, weighted generalised and projected covariance measure tests.
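To illustrate what the formula interface has to do, the sketch below splits a formula of the form y ~ x1 + x2 | z into the Y, X and Z blocks that gcm, pcm and wgcm take as separate arguments. The helper split_comet_formula is hypothetical and uses only base R; the package itself imports Formula and may parse formulas differently.

```r
# Hypothetical helper (not part of comets): split y ~ x1 + x2 | z into Y, X, Z.
split_comet_formula <- function(formula, data) {
  rhs <- formula[[3]]  # the call `x1 + x2 | z`
  if (!is.call(rhs) || !identical(rhs[[1]], as.name("|")))
    stop("expected a formula of the form y ~ x | z")
  list(
    Y = data[, all.vars(formula[[2]]), drop = FALSE],  # response block
    X = data[, all.vars(rhs[[2]]), drop = FALSE],      # covariates to be tested
    Z = data[, all.vars(rhs[[3]]), drop = FALSE]       # conditioning covariates
  )
}

tn <- 1e2
df <- data.frame(y = rnorm(tn), x1 = rnorm(tn), x2 = rnorm(tn), z = rnorm(tn))
parts <- split_comet_formula(y ~ x1 + x2 | z, data = df)
str(parts$X)
```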
Value
Object of class "gcm", "wgcm" or "pcm" and "htest". See gcm and pcm for details.
References
Kook, L., & Lundborg, A. R. (2024). Algorithm-agnostic significance testing in supervised learning with multimodal data. Briefings in Bioinformatics, 25(6). doi:10.1093/bib/bbae475
Examples
tn <- 1e2
df <- data.frame(y = rnorm(tn), x1 = rnorm(tn), x2 = rnorm(tn), z = rnorm(tn))
comet(y ~ x1 + x2 | z, data = df, test = "gcm")
Generalised covariance measure test
Description
Generalised covariance measure test
Usage
gcm(
Y,
X,
Z,
alternative = c("two.sided", "less", "greater"),
reg_YonZ = "rf",
reg_XonZ = "rf",
args_YonZ = NULL,
args_XonZ = NULL,
type = c("quadratic", "max", "scalar"),
B = 499L,
coin = TRUE,
cointrol = list(distribution = "asymptotic"),
return_fitted_models = FALSE,
multivariate = c("none", "YonZ", "XonZ", "both"),
...
)
Arguments
Y |
Vector or matrix of response values. |
X |
Matrix or data.frame of covariates. |
Z |
Matrix or data.frame of covariates. |
alternative |
A character string specifying the alternative hypothesis,
must be one of "two.sided" (default), "greater" or "less". |
reg_YonZ |
Character string or function specifying the regression for
Y on Z. See 'Implemented regression methods'. |
reg_XonZ |
Character string or function specifying the regression for
X on Z. See 'Implemented regression methods'. |
args_YonZ |
A list of named arguments passed to reg_YonZ. |
args_XonZ |
A list of named arguments passed to reg_XonZ. |
type |
Type of test statistic, one of "quadratic" (default), "max" or "scalar". |
B |
Number of bootstrap samples. Only applies if type = "max". |
coin |
Logical; whether or not to use the coin package for computing the test statistic and p-value. |
cointrol |
List; further arguments passed to independence_test() in the coin package. |
return_fitted_models |
Logical; whether to return the fitted regressions
(default is FALSE). |
multivariate |
Character; specifying which regression can handle
multivariate outcomes (default is "none"). |
... |
Additional arguments passed to |
Details
The generalised covariance measure test tests whether the expected conditional covariance of Y and X given Z, E[cov(Y, X | Z)], is zero.
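For intuition, the sketch below computes the univariate GCM statistic of Shah and Peters (2020): the normalised mean of the products of the Y on Z and X on Z residuals, compared with a standard normal. It is a simplified illustration, not the package's implementation: linear regressions via lm() stand in for the default random forests, X must be one-dimensional, and the function name gcm_sketch is hypothetical.

```r
# Hypothetical sketch of the univariate GCM statistic; lm() replaces the
# package's default random forest regressions.
gcm_sketch <- function(Y, X, Z) {
  Z <- as.matrix(Z)
  rY <- residuals(lm(Y ~ Z))  # residuals of the Y on Z regression
  rX <- residuals(lm(X ~ Z))  # residuals of the X on Z regression
  R <- rY * rX                # products of residuals
  n <- length(R)
  stat <- sqrt(n) * mean(R) / sqrt(mean(R^2) - mean(R)^2)
  list(statistic = stat, p.value = 2 * pnorm(-abs(stat)))  # two-sided p-value
}

set.seed(1)
n <- 500
Z <- matrix(rnorm(2 * n), ncol = 2)
X <- Z[, 1] + rnorm(n)
Y <- X + Z[, 2] + rnorm(n)   # Y depends on X given Z, so the p-value is small
gcm_sketch(Y, X, Z)
```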
Value
Object of class 'gcm' and 'htest' with the following components:
statistic |
The value of the test statistic. |
p.value |
The p-value for the null hypothesis. |
parameter |
In case X is multidimensional, this is the degrees of freedom used for the chi-squared test. |
hypothesis |
String specifying the null hypothesis. |
null.value |
String specifying the null hypothesis. |
method |
The string |
data.name |
A character string giving the name(s) of the data. |
rY |
Residuals for the Y on Z regression. |
rX |
Residuals for the X on Z regression. |
models |
List of fitted regressions if return_fitted_models is TRUE. |
References
Shah, R. D., & Peters, J. (2020). The hardness of conditional independence testing and the generalised covariance measure. The Annals of Statistics, 48(3), 1514-1538. doi:10.1214/19-aos1857
Examples
n <- 1e2
X <- matrix(rnorm(2 * n), ncol = 2)
colnames(X) <- c("X1", "X2")
Z <- matrix(rnorm(2 * n), ncol = 2)
colnames(Z) <- c("Z1", "Z2")
Y <- X[, 2]^2 + Z[, 2] + rnorm(n)
(gcm1 <- gcm(Y, X, Z))
Projected covariance measure test for conditional mean independence
Description
Projected covariance measure test for conditional mean independence
Usage
pcm(
Y,
X,
Z,
rep = 1,
est_vhat = TRUE,
reg_YonXZ = "rf",
reg_YonZ = "rf",
reg_YhatonZ = "rf",
reg_VonXZ = "rf",
reg_RonZ = "rf",
args_YonXZ = NULL,
args_YonZ = NULL,
args_YhatonZ = NULL,
args_VonXZ = NULL,
args_RonZ = NULL,
frac = 0.5,
indices = NULL,
coin = FALSE,
cointrol = NULL,
return_fitted_models = FALSE,
...
)
Arguments
Y |
Vector of response values. Can be supplied as a numeric vector or a single column matrix. |
X |
Matrix or data.frame of covariates. |
Z |
Matrix or data.frame of covariates. |
rep |
Number of repetitions of the PCM test. |
est_vhat |
Logical; whether to estimate the variance functional. |
reg_YonXZ |
Character string or function specifying the regression
for Y on X and Z, default is "rf". |
reg_YonZ |
Character string or function specifying the regression
for Y on Z, default is "rf". |
reg_YhatonZ |
Character string or function specifying the regression
for the predicted values of reg_YonXZ on Z, default is "rf". |
reg_VonXZ |
Character string or function specifying the regression
for estimating the conditional variance of Y given X and Z, default
is "rf". |
reg_RonZ |
Character string or function specifying the regression
for the estimated transformation of Y, X, and Z on Z, default is
"rf". |
args_YonXZ |
A list of named arguments passed to reg_YonXZ. |
args_YonZ |
A list of named arguments passed to reg_YonZ. |
args_YhatonZ |
A list of named arguments passed to reg_YhatonZ. |
args_VonXZ |
A list of named arguments passed to reg_VonXZ. |
args_RonZ |
A list of named arguments passed to reg_RonZ. |
frac |
Relative size of train split. |
indices |
A numeric vector of indices specifying the observations used
for estimating the direction (the other observations will
be used for computing the final test statistic). Default is NULL. |
coin |
Logical; whether or not to use the coin package for computing the test statistic and p-value. |
cointrol |
List; further arguments passed to independence_test() in the coin package. |
return_fitted_models |
Logical; whether to return the fitted regressions
(default is FALSE). |
... |
Additional arguments currently ignored. |
Details
The projected covariance measure test tests whether the conditional mean of Y given X and Z depends on X, that is, the null hypothesis is E[Y | X, Z] = E[Y | Z].
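The two-stage idea behind the PCM can be sketched as follows: on one split, estimate a direction fhat(X, Z) = E[Y | X, Z] - E[Y | Z]; on the other split, compute a GCM-type statistic between the residualised Y and the residualised fhat, and reject for large values. This is a deliberately simplified sketch under stated assumptions, not the package's implementation: it omits the variance weighting (est_vhat, reg_VonXZ) and the reg_RonZ step, and it uses linear regressions on raw and squared features instead of random forests. The name pcm_sketch is hypothetical.

```r
# Hypothetical, simplified sketch of the PCM idea (no variance weighting).
pcm_sketch <- function(Y, X, Z, frac = 0.5) {
  X <- as.matrix(X); Z <- as.matrix(Z); n <- length(Y)
  train <- seq_len(floor(frac * n))
  test <- setdiff(seq_len(n), train)
  feat <- function(M) cbind(1, M, M^2)  # intercept, linear and squared terms
  FXZ <- cbind(feat(X), feat(Z)[, -1])
  FZ <- feat(Z)
  # Stage 1 (train split): direction fhat(x, z) = E[Y | X, Z] - E[Y | Z]
  b_xz <- lm.fit(FXZ[train, , drop = FALSE], Y[train])$coefficients
  b_z <- lm.fit(FZ[train, , drop = FALSE], Y[train])$coefficients
  fhat <- drop(FXZ %*% b_xz - FZ %*% b_z)
  # Stage 2 (test split): covariance of residualised Y and residualised fhat
  rY <- lm.fit(FZ[test, , drop = FALSE], Y[test])$residuals
  rF <- lm.fit(FZ[test, , drop = FALSE], fhat[test])$residuals
  R <- rY * rF
  stat <- sqrt(length(R)) * mean(R) / sd(R)
  list(statistic = stat, p.value = pnorm(stat, lower.tail = FALSE))  # one-sided
}

set.seed(1)
n <- 2000
X <- matrix(rnorm(2 * n), ncol = 2)
Z <- matrix(rnorm(2 * n), ncol = 2)
Y <- X[, 2]^2 + Z[, 2] + rnorm(n)  # nonlinear conditional mean alternative
pcm_sketch(Y, X, Z)
```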
Value
Object of class 'pcm' and 'htest' with the following components:
statistic |
The value of the test statistic. |
p.value |
The p-value for the null hypothesis. |
parameter |
In case X is multidimensional, this is the degrees of freedom used for the chi-squared test. |
hypothesis |
Null hypothesis of conditional mean independence. |
null.value |
Null hypothesis of conditional mean independence. |
method |
The string |
data.name |
A character string giving the name(s) of the data. |
check.data |
A |
models |
List of fitted regressions if return_fitted_models is TRUE. |
References
Lundborg, A. R., Kim, I., Shah, R. D., & Samworth, R. J. (2022). The Projected Covariance Measure for assumption-lean variable significance testing. arXiv preprint. doi:10.48550/arXiv.2211.02039
Examples
n <- 1e2
X <- matrix(rnorm(2 * n), ncol = 2)
colnames(X) <- c("X1", "X2")
Z <- matrix(rnorm(2 * n), ncol = 2)
colnames(Z) <- c("Z1", "Z2")
Y <- X[, 2]^2 + Z[, 2] + rnorm(n)
(pcm1 <- pcm(Y, X, Z))
Equivalence test for the parameter in a partially linear model
Description
Equivalence test for the parameter in a partially linear model
Usage
plm_equiv_test(Y, X, Z, from, to, scale = c("plm", "cov", "cor"), ...)
Arguments
Y |
Vector or matrix of response values. |
X |
Matrix or data.frame of covariates. |
Z |
Matrix or data.frame of covariates. |
from |
Lower bound of the equivalence margin. |
to |
Upper bound of the equivalence margin. |
scale |
Scale on which to specify the equivalence margin. Default is "plm" (the parameter theta); alternatives are "cov" and "cor". |
... |
Further arguments passed to |
Details
The partially linear model postulates
$$Y = X\theta + g(Z) + \epsilon,$$
and the target of inference is $\theta$. The target is closely related to the conditional covariance between Y and X given Z:
$$\theta = \frac{E[\mathrm{cov}(X, Y \mid Z)]}{E[\mathrm{Var}(X \mid Z)]}.$$
The equivalence test (based on the GCM test) tests $H_0: \theta \notin [\mathtt{from}, \mathtt{to}]$ versus $H_1: \theta \in [\mathtt{from}, \mathtt{to}]$. Y, X (and $\theta$) can only be one-dimensional; there are no restrictions on Z. The equivalence test can also be performed on the conditional covariance scale directly (using scale = "cov") or on the conditional correlation scale (using scale = "cor"):
$$\frac{E[\mathrm{cov}(X, Y \mid Z)]}{\sqrt{E[\mathrm{Var}(X \mid Z)]\,E[\mathrm{Var}(Y \mid Z)]}}.$$
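To make the equivalence hypotheses concrete, the sketch below estimates $\theta$ as the ratio of mean residual products to mean squared X residuals and combines two one-sided tests (TOST), one against $\theta \le$ from and one against $\theta \ge$ to, taking the larger p-value. TOST and the influence-function standard error are assumptions chosen for illustration; this is not necessarily how plm_equiv_test computes its p-value, lm() replaces the default regressions, and plm_equiv_sketch is a hypothetical name.

```r
# Hypothetical TOST-style sketch of an equivalence test for theta.
plm_equiv_sketch <- function(Y, X, Z, from, to) {
  Z <- as.matrix(Z)
  rY <- residuals(lm(Y ~ Z))
  rX <- residuals(lm(X ~ Z))
  theta <- mean(rX * rY) / mean(rX^2)         # E[cov(X, Y | Z)] / E[Var(X | Z)]
  psi <- rX * (rY - theta * rX) / mean(rX^2)  # estimated influence function
  se <- sd(psi) / sqrt(length(psi))
  p_lower <- pnorm((theta - from) / se, lower.tail = FALSE)  # H0: theta <= from
  p_upper <- pnorm((theta - to) / se)                        # H0: theta >= to
  list(estimate = theta, p.value = max(p_lower, p_upper))
}

set.seed(1)
n <- 500
X <- rnorm(n)
Z <- matrix(rnorm(2 * n), ncol = 2)
Y <- 0.2 * X + Z[, 2] + rnorm(n)  # theta = 0.2 lies inside [-1, 1]
plm_equiv_sketch(Y, X, Z, from = -1, to = 1)
```

Under this construction a small p-value supports equivalence, matching the roles of $H_0$ and $H_1$ above.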
Value
Object of class 'gcm' and 'htest'.
Examples
n <- 150
X <- rnorm(n)
Z <- matrix(rnorm(2 * n), ncol = 2)
colnames(Z) <- c("Z1", "Z2")
Y <- X^2 + Z[, 2] + rnorm(n)
plm_equiv_test(Y, X, Z, from = -1, to = 1)
Plotting methods for COMETs
Description
Plotting methods for COMETs
Usage
## S3 method for class 'gcm'
plot(x, plot = TRUE, ...)
## S3 method for class 'pcm'
plot(x, plot = TRUE, ...)
## S3 method for class 'wgcm'
plot(x, plot = TRUE, ...)
Arguments
x |
Object of class 'gcm', 'pcm' or 'wgcm'. |
plot |
Logical; whether to print the plot (default: TRUE). |
... |
Currently ignored. |
Implemented regression methods
Description
Implemented regression methods
Usage
rf(y, x, ...)
survforest(y, x, ...)
qrf(y, x, ...)
lrm(y, x, ...)
glrm(y, x, ...)
lasso(y, x, ...)
ridge(y, x, ...)
postlasso(y, x, ...)
cox(y, x, ...)
tuned_rf(
y,
x,
max.depths = 1:5,
mtrys = list(1, function(p) ceiling(sqrt(p)), identity),
verbose = FALSE,
...
)
xgb(y, x, nrounds = 2, verbose = 0, ...)
tuned_xgb(
y,
x,
etas = c(0.1, 0.5, 1),
max_depths = 1:5,
nfold = 5,
nrounds = c(2, 10, 50),
verbose = 0,
metrics = list("rmse"),
...
)
Arguments
y |
Vector (or matrix) of response values. |
x |
Design matrix of predictors. |
... |
Additional arguments passed to the underlying regression method.
In case of |
max.depths |
Values for max.depth in ranger, tuned out-of-bag by tuned_rf. |
mtrys |
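Values for mtry in ranger, tuned out-of-bag by tuned_rf; each entry is a number or a function of the number of predictors p. |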
verbose |
See |
nrounds |
See |
etas |
Values for eta, tuned by cross-validation in tuned_xgb. |
max_depths |
Values for max_depth, tuned by cross-validation in tuned_xgb. |
nfold |
Number of folds for xgb.cv. |
metrics |
See xgb.cv. |
Details
The implemented choices are "rf" for random forests as implemented in ranger, "lasso" for cross-validated Lasso regression (using the one-standard-error rule), "ridge" for cross-validated ridge regression (using the one-standard-error rule), "cox" for the Cox proportional hazards model as implemented in survival, and "qrf" or "survforest" for quantile and survival random forests, respectively. The "postlasso" option refers to a cross-validated Lasso (using the one-standard-error rule) followed by OLS regression on the selected variables. The "lrm" option implements a standard linear regression model. The "xgb" and "tuned_xgb" options require the xgboost package.
The "tuned_rf" regression method tunes the mtry and max.depth parameters in ranger out-of-bag.
The "tuned_xgb" regression method uses k-fold cross-validation via xgb.cv to tune the nrounds, eta and max_depth parameters.
New regression methods can also be implemented and supplied; they need the following structure. The regression method "custom_reg" needs to take the arguments y, x and ..., fit the model using y and x as matrices, and return an object of a user-specified class, for instance, 'custom'. For the GCM test, implementing a residuals.custom method is sufficient, which should take the arguments object, response = NULL, data = NULL and .... For the PCM test, a predict.custom method is necessary for out-of-sample prediction and computation of residuals.
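A minimal custom regression following this structure might look as below, using an ordinary least-squares fit via lm.fit(). Two details are assumptions not fixed by the text above: that residuals.custom should return in-sample residuals when response and data are NULL and response minus predictions otherwise, and that predict.custom takes the new design matrix as its second argument (named data here). The names custom_reg, residuals.custom and predict.custom follow the example naming above.

```r
# Example custom regression: OLS with intercept, returning class 'custom'.
custom_reg <- function(y, x, ...) {
  fit <- lm.fit(cbind(1, as.matrix(x)), as.matrix(y))
  structure(list(coef = fit$coefficients, resid = fit$residuals),
            class = "custom")
}

# Out-of-sample prediction (assumed signature), needed for the PCM test.
predict.custom <- function(object, data, ...) {
  drop(cbind(1, as.matrix(data)) %*% object$coef)
}

# Residuals: in-sample if response/data are NULL, otherwise response - prediction.
residuals.custom <- function(object, response = NULL, data = NULL, ...) {
  if (is.null(response) || is.null(data)) return(drop(object$resid))
  drop(as.matrix(response) - predict(object, data))
}

set.seed(1)
n <- 50
x <- matrix(rnorm(2 * n), ncol = 2)
y <- 1 + x %*% c(2, -1) + rnorm(n)
m <- custom_reg(y, x)
head(residuals(m))
```

Because the reg_* arguments accept functions, such a method can then be supplied as, e.g., reg_YonZ = custom_reg.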
GCM test with pre-computed residuals
Description
GCM test with pre-computed residuals
Usage
rgcm(
rY,
rX,
alternative = "two.sided",
type = c("quadratic", "max", "scalar"),
...
)
Arguments
rY |
Vector or matrix of residuals from the Y on Z regression. |
rX |
Matrix or data.frame of residuals from the X on Z regression. |
alternative |
A character string specifying the alternative hypothesis,
must be one of "two.sided" (default), "greater" or "less". |
type |
Type of test statistic, one of "quadratic" (default), "max" or "scalar". |
... |
Further arguments passed to |
Value
Object of class 'gcm' and 'htest' with the following components:
statistic |
The value of the test statistic. |
p.value |
The p-value for the null hypothesis. |
parameter |
In case X is multidimensional, this is the degrees of freedom used for the chi-squared test. |
hypothesis |
String specifying the null hypothesis. |
null.value |
String specifying the null hypothesis. |
method |
The string |
data.name |
A character string giving the name(s) of the data. |
rY |
Residuals for the Y on Z regression. |
rX |
Residuals for the X on Z regression. |
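When X is multidimensional, residual products can be aggregated into a quadratic form and compared with a chi-squared distribution whose degrees of freedom equal the dimension of X, as the parameter component above suggests. The sketch below shows one natural such quadratic statistic computed from pre-computed residuals; it is an illustrative assumption and not necessarily the exact statistic computed by rgcm with type = "quadratic". The name gcm_quadratic_sketch is hypothetical.

```r
# Hypothetical quadratic-form GCM statistic from pre-computed residuals.
gcm_quadratic_sketch <- function(rY, rX) {
  rX <- as.matrix(rX)
  R <- as.numeric(rY) * rX        # n x d matrix with rows rY_i * rX_i
  n <- nrow(R)
  Rbar <- colMeans(R)
  Rc <- sweep(R, 2, Rbar)         # centred products
  Sigma <- crossprod(Rc) / n      # d x d covariance estimate
  stat <- n * drop(crossprod(Rbar, solve(Sigma, Rbar)))
  list(statistic = stat, parameter = ncol(R),
       p.value = pchisq(stat, df = ncol(R), lower.tail = FALSE))
}

set.seed(1)
n <- 500
rX <- matrix(rnorm(2 * n), ncol = 2)
rY <- rX[, 1] + rnorm(n)          # residuals correlated with the first column
gcm_quadratic_sketch(rY, rX)
```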
Weighted Generalised covariance measure test
Description
Weighted Generalised covariance measure test
Usage
wgcm(
Y,
X,
Z,
reg_YonZ = "rf",
reg_XonZ = "rf",
reg_wfun = "rf",
args_YonZ = NULL,
args_XonZ = NULL,
args_wfun = NULL,
frac = 0.5,
B = 499L,
coin = TRUE,
cointrol = NULL,
return_fitted_models = FALSE,
multivariate = c("none", "YonZ", "XonZ", "both"),
...
)
Arguments
Y |
Vector of response values. Can be supplied as a numeric vector or a single column matrix. |
X |
Matrix or data.frame of covariates. |
Z |
Matrix or data.frame of covariates. |
reg_YonZ |
Character string or function specifying the regression for
Y on Z. See 'Implemented regression methods'. |
reg_XonZ |
Character string or function specifying the regression for
X on Z. See 'Implemented regression methods'. |
reg_wfun |
Character string or function specifying the regression for
estimating the weighting function.
See 'Implemented regression methods'. |
args_YonZ |
A list of named arguments passed to reg_YonZ. |
args_XonZ |
A list of named arguments passed to reg_XonZ. |
args_wfun |
Additional arguments passed to reg_wfun. |
frac |
Relative size of train split. |
B |
Number of bootstrap samples. Only applies if |
coin |
Logical; whether or not to use the coin package for computing the test statistic and p-value. |
cointrol |
List; further arguments passed to independence_test() in the coin package. |
return_fitted_models |
Logical; whether to return the fitted regressions
(default is FALSE). |
multivariate |
Character; specifying which regression can handle
multivariate outcomes (default is "none"). |
... |
Additional arguments currently ignored. |
Details
The weighted generalised covariance measure test tests whether a weighted version of the conditional covariance of Y and X given Z is zero.
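Weighting matters when the conditional covariance changes sign with Z, so that it averages to zero and the unweighted GCM has no power. The sketch below, in the spirit of Scheidegger et al. (2022), estimates a weight function on one split as the sign of a fitted conditional covariance and computes a weighted residual-product statistic on the other split. It is a hypothetical simplification, not the package's implementation: lm() replaces random forests, the weight function is a sign of a linear fit, the p-value is one-sided normal, and wgcm_sketch is an illustrative name.

```r
# Hypothetical sketch of a weighted GCM with sign weights estimated on a split.
wgcm_sketch <- function(Y, X, Z, frac = 0.5) {
  Z <- as.matrix(Z); n <- length(Y)
  rY <- residuals(lm(Y ~ Z))  # Y on Z residuals
  rX <- residuals(lm(X ~ Z))  # X on Z residuals
  train <- seq_len(floor(frac * n))
  test <- setdiff(seq_len(n), train)
  # Weight function: sign of a linear fit of rY * rX on Z (train split only)
  b <- lm.fit(cbind(1, Z[train, , drop = FALSE]), (rY * rX)[train])$coefficients
  W <- sign(drop(cbind(1, Z[test, , drop = FALSE]) %*% b))
  R <- rY[test] * rX[test] * W  # weighted residual products on the test split
  stat <- sqrt(length(R)) * mean(R) / sd(R)
  list(statistic = stat, p.value = pnorm(stat, lower.tail = FALSE))
}

set.seed(1)
n <- 1000
Z <- matrix(rnorm(2 * n), ncol = 2)
X <- Z[, 1] + rnorm(n)
Y <- Z[, 1] + Z[, 1] * (X - Z[, 1]) + rnorm(n)  # cov(X, Y | Z) = Z1, mean zero
wgcm_sketch(Y, X, Z)
```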
Value
Object of class 'wgcm' and 'htest' with the following components:
statistic |
The value of the test statistic. |
p.value |
The p-value for the null hypothesis. |
parameter |
In case X is multidimensional, this is the degrees of freedom used for the chi-squared test. |
hypothesis |
String specifying the null hypothesis. |
null.value |
String specifying the null hypothesis. |
method |
The string |
data.name |
A character string giving the name(s) of the data. |
rY |
Residuals for the Y on Z regression. |
rX |
Weighted residuals for the X on Z regression. |
W |
Estimated weights. |
models |
List of fitted regressions if return_fitted_models is TRUE. |
References
Scheidegger, C., Hörrmann, J., & Bühlmann, P. (2022). The weighted generalised covariance measure. Journal of Machine Learning Research, 23(273), 1-68.
Examples
n <- 100
X <- matrix(rnorm(2 * n), ncol = 2)
colnames(X) <- c("X1", "X2")
Z <- matrix(rnorm(2 * n), ncol = 2)
colnames(Z) <- c("Z1", "Z2")
Y <- X[, 2]^2 + Z[, 2] + rnorm(n)
(wgcm1 <- wgcm(Y, X, Z))