Title: | Query Composite Hypotheses |
Version: | 2.1.0 |
Maintainer: | Tristan Mary-Huard <tristan.mary-huard@agroparistech.fr> |
Description: | Provides functions for the joint analysis of Q sets of p-values obtained for the same list of items. This joint analysis is performed by querying a composite hypothesis, i.e. an arbitrarily complex combination of simple hypotheses, as described in Mary-Huard et al. (2021) <doi:10.1093/bioinformatics/btab592> and De Walsche et al. (2023) <doi:10.1101/2024.03.17.585412>. In this approach, the Q-tuple of p-values associated with each item is distributed as a multivariate mixture, where each of the 2^Q components corresponds to a specific combination of simple hypotheses. The dependence between the p-value series is accounted for using a Gaussian copula function. A p-value for the composite hypothesis test is derived from the posterior probabilities. |
License: | GPL-3 |
Depends: | R (≥ 2.10) |
Imports: | copula, dplyr, graphics, ks, purrr, qvalue, Rcpp, stats, stringr, utils |
LinkingTo: | Rcpp, RcppArmadillo |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | yes |
Packaged: | 2025-07-04 11:22:16 UTC; Annaig |
Author: | Tristan Mary-Huard |
Repository: | CRAN |
Date/Publication: | 2025-07-04 12:50:02 UTC |
qch: Query Composite Hypotheses
Description
Provides functions for the joint analysis of Q sets of p-values obtained for the same list of items. This joint analysis is performed by querying a composite hypothesis, i.e. an arbitrarily complex combination of simple hypotheses, as described in Mary-Huard et al. (2021) doi:10.1093/bioinformatics/btab592 and De Walsche et al. (2023) doi:10.1101/2024.03.17.585412. In this approach, the Q-tuple of p-values associated with each item is distributed as a multivariate mixture, where each of the 2^Q components corresponds to a specific combination of simple hypotheses. The dependence between the p-value series is accounted for using a Gaussian copula function. A p-value for the composite hypothesis test is derived from the posterior probabilities.
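As a notational sketch of the mixture described above: the symbols w_c (prior weight of configuration c) and \psi_c (density of configuration c) follow the argument descriptions later in this manual, while \tau_{ic} for the posterior and the product form under conditional independence are assumptions based on the cited papers rather than statements from this page.

```latex
f(Z_i) = \sum_{c \in \{0,1\}^Q} w_c \, \psi_c(Z_i),
\qquad
\tau_{ic} = \frac{w_c \, \psi_c(Z_i)}{\sum_{c'} w_{c'} \, \psi_{c'}(Z_i)},
\qquad
\psi_c(Z_i) = \prod_{q=1}^{Q} f_{c_q}(Z_{iq}) \quad \text{(conditional independence)}
```

With a Gaussian copula, each \psi_c would additionally be multiplied by a copula density term that carries the dependence between series.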
Details
The main functions of the package, GetHconfig, GetH1AtLeast, GetH1Equal, qch.fit and qch.test, correspond to the 4 steps for querying a composite hypothesis:
1. Building all possible combinations of simple hypotheses H_0/H_1
2. Composite alternative hypothesis formulation
3. Inferring the null distribution
4. Testing the composite null hypothesis
Author(s)
Maintainer: Tristan Mary-Huard tristan.mary-huard@agroparistech.fr (ORCID)
Authors:
Annaig De Walsche annaig.de-walsche@inrae.fr (ORCID)
Other contributors:
Franck Gauthier franck.gauthier@inrae.fr (ORCID) [contributor]
Gaussian copula density for each H-configuration.
Description
Gaussian copula density for each H-configuration.
Usage
Copula.Hconfig_gaussian_density(Hconfig, F0Mat, F1Mat, R)
Arguments
Hconfig |
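A list of all possible combinations of H_0 and H_1 hypotheses generated by the GetHconfig() function. |
F0Mat |
a matrix containing the evaluation of the marginal cdf under H_0 at each item, each column corresponding to a p-value series. |
F1Mat |
a matrix containing the evaluation of the marginal cdf under H_1 at each item, each column corresponding to a p-value series. |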
R |
the correlation matrix. |
Value
A matrix containing the evaluation of the Gaussian density function for each H-configuration in columns.
EM calibration in the case of the Gaussian copula (unsigned)
Description
EM calibration in the case of the Gaussian copula (unsigned)
Usage
EM_calibration_gaussian(
Hconfig,
F0Mat,
F1Mat,
fHconfig,
R.init,
Prior.init,
Precision = 1e-06
)
Arguments
Hconfig |
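A list of all possible combinations of H_0 and H_1 hypotheses generated by the GetHconfig() function. |
F0Mat |
a matrix containing the evaluation of the marginal cdf under H_0 at each item, each column corresponding to a p-value series. |
F1Mat |
a matrix containing the evaluation of the marginal cdf under H_1 at each item, each column corresponding to a p-value series. |
fHconfig |
a matrix containing the H-configuration densities evaluated at each item, each column corresponding to a configuration. |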
R.init |
the initialization of the correlation matrix of the Gaussian copula parameter. |
Prior.init |
the initialization of prior probabilities for each of the H-configurations. |
Precision |
Precision for the stop criterion. (Default is 1e-6) |
Value
A list with the following elements:
priorHconfig | vector of estimated prior probabilities for each of the H-configurations. |
Rcopula | the estimated correlation matrix of the Gaussian copula. |
EM calibration in the case of the Gaussian copula (unsigned) with memory management
Description
EM calibration in the case of the Gaussian copula (unsigned) with memory management
Usage
EM_calibration_gaussian_memory(
Logf0Mat,
Logf1Mat,
F0Mat,
F1Mat,
Prior.init,
R.init,
Hconfig,
Precision = 1e-06,
threads_nb
)
Arguments
Logf0Mat |
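a matrix containing the log(f0(x_iq)) |
Logf1Mat |
a matrix containing the log(f1(x_iq)) |
F0Mat |
a matrix containing the evaluation of the marginal cdf under H_0 at each item, each column corresponding to a p-value series. |
F1Mat |
a matrix containing the evaluation of the marginal cdf under H_1 at each item, each column corresponding to a p-value series. |
Prior.init |
the initialization of prior probabilities for each of the H-configurations. |
R.init |
the initialization of the correlation matrix of the Gaussian copula parameter. |
Hconfig |
A list of all possible combinations of H_0 and H_1 hypotheses generated by the GetHconfig() function. |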
Precision |
Precision for the stop criterion. (Default is 1e-6) |
threads_nb |
The number of threads to use. |
Value
A list with the following elements:
priorHconfig | vector of estimated prior probabilities for each of the H-configurations. |
Rcopula | the estimated correlation matrix of the Gaussian copula. |
EM calibration in the case of conditional independence
Description
EM calibration in the case of conditional independence
Usage
EM_calibration_indep(fHconfig, Prior.init, Precision = 1e-06)
Arguments
fHconfig |
a matrix containing the H-configuration densities evaluated at each item, each column corresponding to a configuration. |
Prior.init |
the initialization of prior probabilities for each of the H-configurations. |
Precision |
Precision for the stop criterion. (Default is 1e-6) |
Value
a vector of estimated prior probabilities for each of the H-configurations.
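The calibration step above can be illustrated with a generic EM for mixture weights when the component densities are held fixed. This is a minimal base-R sketch, not the package's implementation: the function name em_prior_sketch, the max_iter argument, and the stopping rule (largest absolute change in the weights below the precision) are assumptions for illustration.

```r
# Hypothetical helper: EM for mixture weights with fixed component densities.
# fH: n x K matrix, fH[i, c] = density of configuration c evaluated at item i.
em_prior_sketch <- function(fH, prior_init, precision = 1e-6, max_iter = 1000) {
  prior <- prior_init
  for (iter in seq_len(max_iter)) {
    weighted <- sweep(fH, 2, prior, `*`)   # w_c * psi_c(x_i)
    tau <- weighted / rowSums(weighted)    # E-step: posterior of each configuration
    new_prior <- colMeans(tau)             # M-step: average posterior per configuration
    converged <- max(abs(new_prior - prior)) < precision  # assumed stop criterion
    prior <- new_prior
    if (converged) break
  }
  prior
}

# Toy data: N(0,1) versus N(3,1) with true weights 0.7 / 0.3.
set.seed(1)
n <- 5000
z <- rbinom(n, 1, 0.3)
x <- rnorm(n, mean = 3 * z)
fH <- cbind(dnorm(x), dnorm(x, mean = 3))
w <- em_prior_sketch(fH, c(0.5, 0.5))
w
```

The returned weights sum to one by construction, since each row of the posterior matrix sums to one.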
EM calibration in the case of conditional independence with memory management (unsigned)
Description
EM calibration in the case of conditional independence with memory management (unsigned)
Usage
EM_calibration_indep_memory(
Logf0Mat,
Logf1Mat,
Prior.init,
Hconfig,
Precision = 1e-06,
threads_nb
)
Arguments
Logf0Mat |
a matrix containing the log(f0(x_iq)) |
Logf1Mat |
a matrix containing the log(f1(x_iq)) |
Prior.init |
the initialization of prior probabilities for each of the H-configurations. |
Hconfig |
A list of all possible combinations of H_0 and H_1 hypotheses generated by the GetHconfig() function. |
Precision |
Precision for the stop criterion. (Default is 1e-6) |
threads_nb |
The number of threads to use. |
Value
a vector of estimated prior probabilities for each of the H-configurations.
FastKerFdr signed
Description
Kernel estimation of the density in a two-component mixture model where one component is a standard Gaussian density.
Usage
FastKerFdr_signed(
X,
p0 = NULL,
plotting = FALSE,
NbKnot = 1e+05,
tol = 1e-05,
max_iter = 10000
)
Arguments
X |
a vector of probit-transformed p-values (corresponding to a p-value series). |
p0 |
a priori proportion of |
plotting |
boolean, should some diagnostic graphs be plotted. (Default is FALSE.) |
NbKnot |
The (maximum) number of knot for the |
tol |
a tolerance value for convergence. (Default is 1e-5.) |
max_iter |
the maximum number of iterations allowed for the algorithm to converge or complete its process. (Default is 1e4.) |
Value
A list with the following elements:
p0 | vector of the estimated proportions of H_0 hypotheses for each p-value series. |
tau | the vector of H_1 posteriors. |
f1 | a numeric vector, each coordinate i corresponding to the evaluation of the H_1 density at point x_i, where x_i is the i-th item in X. |
F1 | a numeric vector, each coordinate i corresponding to the evaluation of the H_1 cdf at point x_i, where x_i is the i-th item in X. |
FastKerFdr unsigned
Description
Kernel estimation of the density in a two-component mixture model where one component is a standard Gaussian density. Here we suppose that the density to estimate lives in R^+.
Usage
FastKerFdr_unsigned(
X,
p0 = NULL,
plotting = FALSE,
NbKnot = 1e+05,
tol = 1e-05,
max_iter = 10000
)
Arguments
X |
a vector of probit-transformed p-values (corresponding to a p-value series). |
p0 |
a priori proportion of |
plotting |
boolean, should some diagnostic graphs be plotted. (Default is FALSE.) |
NbKnot |
The (maximum) number of knot for the |
tol |
a tolerance value for convergence. (Default is 1e-5.) |
max_iter |
the maximum number of iterations allowed for the algorithm to converge or complete its process. (Default is 1e4.) |
Value
A list with the following elements:
p0 | vector of the estimated proportions of H_0 hypotheses for each p-value series. |
tau | the vector of H_1 posteriors. |
f1 | a numeric vector, each coordinate i corresponding to the evaluation of the H_1 density at point x_i, where x_i is the i-th item in X. |
F1 | a numeric vector, each coordinate i corresponding to the evaluation of the H_1 cdf at point x_i, where x_i is the i-th item in X. |
Specify the configurations corresponding to the composite H_1 test "AtLeast".
Description
Specify which configurations among Hconfig correspond to the composite alternative hypothesis: {at least "AtLeast" H_1 hypotheses are of interest}
Usage
GetH1AtLeast(Hconfig, AtLeast, Consecutive = FALSE, SameSign = FALSE)
Arguments
Hconfig |
A list of all possible combinations of H_0 and H_1 hypotheses generated by the GetHconfig() function. |
AtLeast |
How many |
Consecutive |
Should the significant test series be consecutive? (optional, default is FALSE). |
SameSign |
Should the significant test series have the same sign? (optional, default is FALSE). |
Value
A vector 'Hconfig.H1' of components of Hconfig that correspond to the 'AtLeast' specification.
See Also
Examples
GetH1AtLeast(GetHconfig(4), 2)
Specify the configurations corresponding to the composite H_1 test "Equal".
Description
Specify which configurations among Hconfig correspond to the composite alternative hypothesis: {exactly "Equal" H_1 hypotheses are of interest}
Usage
GetH1Equal(Hconfig, Equal, Consecutive = FALSE, SameSign = FALSE)
Arguments
Hconfig |
A list of all possible combinations of H_0 and H_1 hypotheses generated by the GetHconfig() function. |
Equal |
What is the exact number of |
Consecutive |
Should the significant test series be consecutive? (optional, default is FALSE). |
SameSign |
Should the significant test series have the same sign? (optional, default is FALSE). |
Value
A vector 'Hconfig.H1' of components of Hconfig that correspond to the 'Equal' specification.
See Also
Examples
GetH1Equal(GetHconfig(4), 2)
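The two selection rules above (GetH1AtLeast and GetH1Equal) can be illustrated in base R without the package. This is a minimal sketch: the 0/1 matrix encoding, the helper names at_least() and exactly(), and the row ordering are assumptions for illustration and need not match what GetHconfig() returns. The Consecutive and SameSign options are not covered.

```r
# All 2^Q configurations for Q = 4, one row per configuration (1 = H_1, 0 = H_0).
# Assumed encoding for illustration; not the internal format used by qch.
Q <- 4
configs <- unname(as.matrix(expand.grid(rep(list(0:1), Q))))

# Hypothetical helpers: row indices of configurations with at least / exactly k H_1.
at_least <- function(configs, k) which(rowSums(configs) >= k)
exactly  <- function(configs, k) which(rowSums(configs) == k)

length(at_least(configs, 2))  # 11 of the 16 configurations
length(exactly(configs, 2))   # 6 of the 16 configurations
```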
Generate the H_0/H_1 configurations.
Description
Generate all possible combinations of simple hypotheses H_0/H_1.
Usage
GetHconfig(Q, Signed = FALSE)
Arguments
Q |
The number of test series to be combined. |
Signed |
Should the sign of the effect be taken into account? (optional, default is FALSE). |
Value
A list 'Hconfig' of all possible combinations of H_0 and H_1 hypotheses among the Q hypotheses tested.
Examples
GetHconfig(4)
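To see why the number of configurations grows as 2^Q in the unsigned case, the enumeration can be reproduced in base R. This is a sketch under assumptions: the helper name enumerate_configs, the matrix encoding and the label format are hypothetical, and the ordering may differ from the list returned by GetHconfig().

```r
# Hypothetical helper: one row per configuration, one column per test series
# (0 = H_0 holds, 1 = H_1 holds). Not the format returned by GetHconfig().
enumerate_configs <- function(Q) {
  unname(as.matrix(expand.grid(rep(list(0:1), Q))))
}

configs2 <- enumerate_configs(2)
# Readable labels such as "H0/H0" or "H1/H0", one per configuration.
labels2 <- apply(configs2, 1, function(h) paste0("H", h, collapse = "/"))
labels2

# The count doubles with each additional series.
sapply(1:5, function(Q) nrow(enumerate_configs(Q)))
```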
Synthetic example to illustrate the main qch functions
Description
PvalSets is a data.frame with 10,000 rows and 3 columns. Each row corresponds to an item, columns 'Pval1' and 'Pval2' each correspond to a test series over the items, and column 'Class' provides the truth, i.e. if item i belongs to class 1 then the H_0 hypothesis is true for the 2 tests, if item i belongs to class 2 (resp. 3) then the H_0 hypothesis is true for the first (resp. second) test only, and if item i belongs to class 4 then both H_0 hypotheses are false (for the first and the second test).
Usage
PvalSets
Format
A data.frame
Synthetic example to illustrate the main qch functions using Gaussian copula
Description
PvalSets_cor is a data.frame with 10,000 rows and 3 columns. Each row corresponds to an item, columns 'Pval1' and 'Pval2' each correspond to a test series over the items, and column 'Class' provides the truth, i.e. if item i belongs to class 1 then the H_0 hypothesis is true for the 2 tests, if item i belongs to class 2 (resp. 3) then the H_0 hypothesis is true for the first (resp. second) test only, and if item i belongs to class 4 then both H_0 hypotheses are false (for the first and the second test). The correlation between the two p-value series within each class is 0.3.
Usage
PvalSets_cor
Format
A data.frame
Gaussian copula correlation matrix Maximum Likelihood estimator.
Description
Gaussian copula correlation matrix Maximum Likelihood estimator.
Usage
R.MLE(Hconfig, zeta0, zeta1, Tau)
Arguments
Hconfig |
A list of all possible combinations of H_0 and H_1 hypotheses generated by the GetHconfig() function. |
zeta0 |
a matrix containing the qnorm(F0(x_iq)) |
zeta1 |
a matrix containing the qnorm(F1(x_iq)) |
Tau |
a matrix providing for each item (in row) its posterior probability to belong to each of the H-configurations (in columns). |
Value
Estimate of the correlation matrix.
Check the Gaussian copula correlation matrix Maximum Likelihood estimator
Description
Check the Gaussian copula correlation matrix Maximum Likelihood estimator
Usage
R.MLE.check(R)
Arguments
R |
Estimate of the correlation matrix. |
Value
Estimate of the correlation matrix.
Gaussian copula correlation matrix Maximum Likelihood estimator (memory handling)
Description
Gaussian copula correlation matrix Maximum Likelihood estimator (memory handling)
Usage
R.MLE.memory(
Hconfig,
fHconfig_sum,
OldPrior,
Logf0Mat,
Logf1Mat,
zeta0,
zeta1,
OldR,
OldRinv
)
Arguments
Hconfig |
A list of all possible combinations of H_0 and H_1 hypotheses generated by the GetHconfig() function. |
fHconfig_sum |
a vector containing sum_c(w_c*psi_c) |
OldPrior |
a vector containing the prior probabilities for each of the H-configurations. |
Logf0Mat |
a matrix containing log(f0(x_iq)) |
Logf1Mat |
a matrix containing log(f1(x_iq)) |
zeta0 |
a matrix containing qnorm(F0(x_iq)) |
zeta1 |
a matrix containing qnorm(F1(x_iq)) |
OldR |
the copula correlation matrix. |
OldRinv |
the inverse of copula correlation matrix. |
Value
Estimate of the correlation matrix.
Update the estimate of R correlation matrix of the gaussian copula, parallelized version
Description
Update the estimate of R correlation matrix of the gaussian copula, parallelized version
Usage
R_MLE_update_gaussian_copula_ptr_parallel(
Hconfig,
fHconfig_sum,
OldPrior,
Logf0Mat,
Logf1Mat,
zeta0,
zeta1,
OldR,
OldRinv,
RhoIndex,
threads_nb = 0L
)
Arguments
Hconfig |
list of vectors of 0 and 1, corresponding to the configurations |
fHconfig_sum |
a double vector containing sum_c(w_c*psi_c), obtained by fHconfig_sum_update_ptr_parallel() |
OldPrior |
a double vector containing the prior w_c |
Logf0Mat |
a double matrix containing the log(f0(x_iq)) |
Logf1Mat |
a double matrix containing the log(f1(x_iq)) |
zeta0 |
a double matrix containing the qnorm(F0(x_iq)) |
zeta1 |
a double matrix containing the qnorm(F1(x_iq)) |
OldR |
a double matrix corresponding to the copula parameter |
OldRinv |
a double matrix corresponding to the inverse copula parameter |
RhoIndex |
an int matrix containing the indices of the lower triangular part of a matrix |
threads_nb |
an int, the number of threads |
Value
a double vector containing the lower triangular part of the MLE of R
Signed case function: Separate f1 into f+ and f-
Description
Signed case function: Separate f1 into f+ and f-
Usage
f1_separation_signed(XMat, f0Mat, f1Mat, p0, plotting = FALSE)
Arguments
XMat |
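a matrix of probit-transformed p-values, each column corresponding to a p-value series. |
f0Mat |
a matrix containing the evaluation of the marginal density functions under H_0 at each item, each column corresponding to a p-value series. |
f1Mat |
a matrix containing the evaluation of the marginal density functions under H_1 at each item, each column corresponding to a p-value series. |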
p0 |
the proportions of |
plotting |
boolean, should some diagnostic graphs be plotted. (Default is FALSE.) |
Value
A list with the following elements:
f1plusMat | a matrix containing the evaluation of the marginal density functions under H_1^+ at each item, each column corresponding to a p-value series. |
f1minusMat | a matrix containing the evaluation of the marginal density functions under H_1^- at each item, each column corresponding to a p-value series. |
p1plus | an estimate of the proportions of H_1^+ items for each series. |
p1minus | an estimate of the proportions of H_1^- items for each series. |
Computation of the sum sum_c(w_c*psi_c) using Gaussian copula parallelized version
Description
Computation of the sum sum_c(w_c*psi_c) using Gaussian copula parallelized version
Usage
fHconfig_sum_update_gaussian_copula_ptr_parallel(
Hconfig,
NewPrior,
Logf0Mat,
Logf1Mat,
zeta0,
zeta1,
R,
Rinv,
threads_nb = 0L
)
Arguments
Hconfig |
list of vectors of 0 and 1, corresponding to the configurations |
NewPrior |
a double vector containing the prior w_c |
Logf0Mat |
a double matrix containing the log(f0(x_iq)) |
Logf1Mat |
a double matrix containing the log(f1(x_iq)) |
zeta0 |
a double matrix containing the qnorm(F0(x_iq)) |
zeta1 |
a double matrix containing the qnorm(F1(x_iq)) |
R |
a double matrix corresponding to the copula parameter |
Rinv |
a double matrix corresponding to the inverse copula parameter |
threads_nb |
an int, the number of threads |
Value
a double vector containing sum_c(w_c*psi_c)
Computation of the sum sum_c(w_c*psi_c) parallelized version
Description
Computation of the sum sum_c(w_c*psi_c) parallelized version
Usage
fHconfig_sum_update_ptr_parallel(
Hconfig,
NewPrior,
Logf0Mat,
Logf1Mat,
threads_nb = 0L
)
Arguments
Hconfig |
list of vectors of 0 and 1, corresponding to the configurations |
NewPrior |
a double vector containing the prior w_c |
Logf0Mat |
a double matrix containing the log(f0(x_iq)) |
Logf1Mat |
a double matrix containing the log(f1(x_iq)) |
threads_nb |
an int, the number of threads |
Value
a double vector containing sum_c(w_c*psi_c)
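For intuition, the quantity sum_c(w_c*psi_c) can be written in a few lines of serial base R. This sketch assumes conditional independence, so that psi_c for item i is the product over series q of f0 or f1 depending on the configuration. The function name mixture_density_sketch and the 0/1 matrix encoding of the configurations are hypothetical, and no claim is made about how the parallelized C++ routine is organized.

```r
# Hypothetical helper: per-item mixture density sum_c w_c * psi_c(x_i).
# configs: K x Q matrix of 0/1; prior: length-K weights w_c;
# logf0, logf1: n x Q matrices of log marginal densities under H_0 and H_1.
mixture_density_sketch <- function(configs, prior, logf0, logf1) {
  total <- numeric(nrow(logf0))
  for (c_idx in seq_len(nrow(configs))) {
    h <- configs[c_idx, ]
    # log psi_c(x_i): sum the log f0 terms where h = 0 and the log f1 terms where h = 1
    log_psi <- rowSums(logf0[, h == 0, drop = FALSE]) +
      rowSums(logf1[, h == 1, drop = FALSE])
    total <- total + prior[c_idx] * exp(log_psi)
  }
  total
}

# Toy check with Q = 2 and uniform weights.
set.seed(1)
x <- matrix(rnorm(10), nrow = 5, ncol = 2)
logf0 <- dnorm(x, log = TRUE)
logf1 <- dnorm(x, mean = 2, log = TRUE)
configs <- unname(as.matrix(expand.grid(rep(list(0:1), 2))))
dens <- mixture_density_sketch(configs, rep(0.25, 4), logf0, logf1)
dens
```

With uniform weights the sum factorizes, which gives a simple way to check the result: it equals 0.25 times the product over series of (f0 + f1).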
Gaussian copula density
Description
Gaussian copula density
Usage
gaussian_copula_density(zeta, R, Rinv)
Arguments
zeta |
the matrix of probit-transformed observations. |
R |
the correlation matrix. |
Rinv |
the inverse correlation matrix. |
Value
A numeric vector, each coordinate i corresponding to the evaluation of the Gaussian copula density function at observation zeta_i.
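The standard closed form of the Gaussian copula density, evaluated at probit-transformed observations, is |R|^(-1/2) exp(-zeta' (R^-1 - I) zeta / 2). A minimal base-R sketch of that formula follows; the function name is hypothetical and, unlike gaussian_copula_density(), it computes the inverse itself instead of taking Rinv as an argument.

```r
# Hypothetical helper implementing the textbook Gaussian copula density.
# zeta: n x Q matrix of probit-transformed observations; R: Q x Q correlation matrix.
gaussian_copula_density_sketch <- function(zeta, R) {
  A <- solve(R) - diag(nrow(R))        # R^{-1} - I
  quad <- rowSums((zeta %*% A) * zeta) # zeta_i' (R^{-1} - I) zeta_i, row by row
  exp(-0.5 * quad) / sqrt(det(R))
}

R <- matrix(c(1, 0.3, 0.3, 1), nrow = 2)
zeta <- rbind(c(0, 0), c(1, 1), c(-1, 2))
cop <- gaussian_copula_density_sketch(zeta, R)
cop
```

When R is the identity matrix the density is 1 everywhere, which corresponds to independent series.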
Update of the prior estimate in EM algo parallelized version
Description
Update of the prior estimate in EM algo parallelized version
Usage
prior_update_arma_ptr_parallel(
Hconfig,
fHconfig_sum,
OldPrior,
Logf0Mat,
Logf1Mat,
threads_nb = 0L
)
Arguments
Hconfig |
list of vectors of 0 and 1, corresponding to the configurations |
fHconfig_sum |
a double vector containing sum_c(w_c*psi_c), obtained by fHconfig_sum_update_ptr_parallel() |
OldPrior |
a double vector containing the prior w_c |
Logf0Mat |
a double matrix containing the log(f0(x_iq)) |
Logf1Mat |
a double matrix containing the log(f1(x_iq)) |
threads_nb |
an int, the number of threads |
Value
a double vector containing the new estimate of prior w_c
Update of the prior estimate in EM algo using Gaussian copula, parallelized version
Description
Update of the prior estimate in EM algo using Gaussian copula, parallelized version
Usage
prior_update_gaussian_copula_ptr_parallel(
Hconfig,
fHconfig_sum,
OldPrior,
Logf0Mat,
Logf1Mat,
zeta0,
zeta1,
R,
Rinv,
threads_nb = 0L
)
Arguments
Hconfig |
list of vectors of 0 and 1, corresponding to the configurations |
fHconfig_sum |
a double vector containing sum_c(w_c*psi_c), obtained by fHconfig_sum_update_ptr_parallel() |
OldPrior |
a double vector containing the prior w_c |
Logf0Mat |
a double matrix containing the log(f0(x_iq)) |
Logf1Mat |
a double matrix containing the log(f1(x_iq)) |
zeta0 |
a double matrix containing the qnorm(F0(x_iq)) |
zeta1 |
a double matrix containing the qnorm(F1(x_iq)) |
R |
a double matrix corresponding to the copula parameter |
Rinv |
a double matrix corresponding to the inverse copula parameter |
threads_nb |
an int, the number of threads |
Value
a double vector containing the new estimate of prior w_c
Infer posterior probabilities of H_0/H_1 configurations.
Description
For each item, estimate the posterior probability for each configuration. This function uses either the model accounting for the dependence structure through a Gaussian copula function (copula=="gaussian") or the model assuming conditional independence (copula=="indep"). Utilizes parallel computing, when available. For package documentation, see qch-package.
Usage
qch.fit(
pValMat,
EffectMat = NULL,
Hconfig,
copula = "indep",
threads_nb = 0,
plotting = FALSE,
Precision = 1e-06
)
Arguments
pValMat |
A matrix of p-values, each column corresponding to a p-value series. |
EffectMat |
A matrix of estimated effects corresponding to the p-values contained in |
Hconfig |
A list of all possible combinations of H_0 and H_1 hypotheses generated by the GetHconfig() function. |
copula |
A string specifying the form of copula to use. Possible values are "indep" and "gaussian". Default is "indep". |
threads_nb |
The number of threads to use. By default, the number of threads is set to the number of available cores. |
plotting |
A boolean. Should some diagnostic graphs be plotted? Default is FALSE. |
Precision |
The precision for the EM algorithm used to infer the parameters. Default is 1e-6. |
Value
A list with the following elements:
prior | vector of estimated prior probabilities for each of the H-configurations. |
Rcopula | the estimated correlation matrix of the Gaussian copula. (if applicable) |
Hconfig | the list of all configurations. |
null_prop | the estimated proportion of items under the null for each test series. |
If the storage permits, the list will additionally contain:
posterior | matrix providing for each item (in row) its posterior probability to belong to each of the H-configurations (in columns). |
fHconfig | matrix containing the \psi_c densities evaluated at each item, each column corresponding to a configuration. |
Else, the list will additionally contain:
f0Mat | matrix containing the evaluation of the marginal densities under H_0 at each item, each column corresponding to a p-value series. |
f1Mat | matrix containing the evaluation of the marginal densities under H_1 at each item, each column corresponding to a p-value series. |
F0Mat | matrix containing the evaluation of the marginal cdf under H_0 at each item, each column corresponding to a p-value series. |
F1Mat | matrix containing the evaluation of the marginal cdf under H_1 at each item, each column corresponding to a p-value series. |
fHconfig_sum | vector containing \sum_c w_c \psi_c(Z_i) for each item i. |
The elements of interest are the posterior probabilities matrix, posterior, the estimated proportion of observations belonging to each configuration, prior, and the estimated correlation matrix of the Gaussian copula, Rcopula.
The remaining elements are returned primarily for use by other functions.
Examples
data(PvalSets_cor)
PvalMat <- as.matrix(PvalSets_cor[, -3])
## Build the Hconfig objects
Q <- 2
Hconfig <- GetHconfig(Q)
## Run the function
res.fit <- qch.fit(pValMat = PvalMat, Hconfig = Hconfig, copula = "gaussian")
## Display the prior of each class of items
res.fit$prior
## Display the correlation estimate of the gaussian copula
res.fit$Rcopula
## Display the first posteriors
head(res.fit$posterior)
Perform composite hypothesis testing.
Description
Perform any composite hypothesis test by specifying the configurations 'Hconfig.H1' corresponding to the composite alternative hypothesis among all configurations 'Hconfig'.
Usage
qch.test(res.qch.fit, Hconfig, Hconfig.H1 = NULL, Alpha = 0.05, threads_nb = 0)
Arguments
res.qch.fit |
The result provided by the qch.fit() function. |
Hconfig |
A list of all possible combinations of H_0 and H_1 hypotheses generated by the GetHconfig() function. |
Hconfig.H1 |
An integer vector (or a list of such vector) of the |
Alpha |
the nominal Type I error rate for FDR control. Default is 0.05. |
threads_nb |
The number of threads to use. By default, the number of threads is set to the number of available cores. |
Details
By default, the function performs the composite hypothesis test of being associated with "at least q" simple tests, for q=1,...,Q.
Value
A list with the following elements:
Rejection | a matrix providing for each item the result of the composite hypothesis test, after adaptive Benjamini-Hochberg multiple testing correction. |
lFDR | a matrix providing for each item its local FDR estimate. |
Pvalues | a matrix providing for each item its p-value of the composite hypothesis test. |
See Also
qch.fit(), GetH1AtLeast(), GetH1Equal()
Examples
data(PvalSets_cor)
PvalMat <- as.matrix(PvalSets_cor[, -3])
Truth <- PvalSets_cor[, 3]
## Build the Hconfig objects
Q <- 2
Hconfig <- GetHconfig(Q)
## Infer the posteriors
res.fit <- qch.fit(pValMat = PvalMat, Hconfig = Hconfig, copula = "gaussian")
## Run the test procedure with FDR control
H1config <- GetH1AtLeast(Hconfig, 2)
res.test <- qch.test(res.qch.fit = res.fit, Hconfig = Hconfig, Hconfig.H1 = H1config)
table(res.test$Rejection$AtLeast_2, Truth == 4)
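To connect the posteriors returned by qch.fit() with the composite test, one natural quantity is the posterior probability of the composite null: one minus the summed posteriors of the configurations listed in Hconfig.H1. The base-R sketch below computes it on a hand-made posterior matrix. The function name and the toy numbers are hypothetical, and this is not claimed to be the exact lFDR or p-value computation performed by qch.test().

```r
# Hypothetical helper: posterior probability that an item belongs to the
# composite null, i.e. to none of the configurations indexed by h1_idx.
composite_null_posterior <- function(posterior, h1_idx) {
  1 - rowSums(posterior[, h1_idx, drop = FALSE])
}

# Toy posterior matrix: 2 items (rows) x 4 configurations (columns), rows sum to 1.
posterior <- rbind(c(0.90, 0.05, 0.03, 0.02),
                   c(0.05, 0.05, 0.10, 0.80))

# Suppose only the 4th configuration belongs to the composite alternative.
null_post <- composite_null_posterior(posterior, h1_idx = 4)
null_post  # 0.98 for the first item, 0.20 for the second
```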