Title: | Two-Sample Nonparametric Testing Under Heterogeneity |
Version: | 1.0.0 |
Description: | Implements the TRUH test statistic for two-sample testing under heterogeneity. TRUH incorporates the underlying heterogeneity and imbalance in the samples, and provides a conservative test for the composite null hypothesis that the two samples arise from the same mixture distribution but may differ with respect to the mixing weights. See Trambak Banerjee, Bhaswar B. Bhattacharya, and Gourab Mukherjee, Ann. Appl. Stat. 14(4): 1777-1805 (December 2020), <doi:10.1214/20-AOAS1362>, for more details. |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
URL: | https://github.com/natesmith07/truh |
Imports: | Rfast, cluster, doParallel, foreach, iterators, fpc, parallel |
RoxygenNote: | 7.1.1 |
Suggests: | rmarkdown, knitr |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2021-09-03 18:37:13 UTC; Jolly |
Author: | Nathan Smith [aut, cre], Trambak Banerjee [aut], Bhaswar Bhattacharya [aut], Gourab Mukherjee [aut] |
Maintainer: | Nathan Smith <nathan_smith_99@ku.edu> |
Repository: | CRAN |
Date/Publication: | 2021-09-08 08:00:02 UTC |
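The composite null hypothesis named in the Description field can be made concrete with a small simulation. The sketch below draws two samples from the same two-component Gaussian mixture but with different mixing weights. The component centers, the weights, the sample sizes, and the helper name `rmix` are illustrative assumptions and do not come from the package or the paper.

```r
# Illustrative simulation only: centers, weights, and sample sizes are
# arbitrary assumptions, not values taken from the package or the paper.
set.seed(1)
d <- 2
centers <- rbind(c(0, 0), c(4, 4))   # two mixture components (hypothetical)

# Draw `size` points from a Gaussian mixture with the given mixing weights.
rmix <- function(size, weights) {
  comp <- sample(nrow(centers), size, replace = TRUE, prob = weights)
  noise <- matrix(rnorm(size * d), nrow = size, ncol = d)
  list(x = centers[comp, , drop = FALSE] + noise, comp = comp)
}

U <- rmix(500, weights = c(0.5, 0.5))  # larger sample, balanced weights
V <- rmix(50,  weights = c(0.9, 0.1))  # same components, different weights

# The components agree across samples; only their proportions differ.
prop_U <- table(factor(U$comp, levels = 1:2)) / 500
prop_V <- table(factor(V$comp, levels = 1:2)) / 50
print(rbind(U = prop_U, V = prop_V))
```

Under this setup the two samples differ only in how often each component is visited, which is the situation the composite null is meant to cover.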
Nearest neighbor computation for the TRUH statistic
Description
For a given d-dimensional vector \mathbf{y}, this function finds the nearest neighbor of \mathbf{y} in an n \times d matrix \mathbf{U}.
Usage
nearest(y, U, n, d)
Arguments
y | a d-dimensional vector. |
U | an n \times d matrix. |
n | the sample size. |
d | dimension of each sample. |
Value
d1 - nearest neighbor of \mathbf{y} in \mathbf{U}
d2 - nearest neighbor of d1 in \mathbf{U}
Examples
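library(truh)
n = 100  # number of rows (candidate neighbors) in U
d = 3    # dimension of each observation
set.seed(1)
y = rnorm(d)  # query vector of length d
set.seed(2)
U = matrix(rnorm(n*d),nrow=n,ncol=d)
out = nearest(y,U,n,d)  # search U for the nearest neighbor of y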
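As a rough illustration of what a nearest neighbor search does, the following base R sketch finds the row of U closest to y by brute force. The function name `nearest_sketch`, the use of Euclidean distance, and the exclusion of the matched row in the second search are assumptions made for this example; it is not the package's implementation.

```r
# Hypothetical helper, not the package's implementation: brute-force
# Euclidean nearest neighbor of vector y among the rows of matrix U.
nearest_sketch <- function(y, U) {
  diffs <- sweep(U, 2, y)            # subtract y from every row of U
  dists <- sqrt(rowSums(diffs^2))    # Euclidean distance to each row
  idx <- which.min(dists)            # index of the closest row
  list(index = idx, point = U[idx, ], distance = dists[idx])
}

set.seed(2)
U <- matrix(rnorm(100 * 3), nrow = 100, ncol = 3)
y <- U[7, ] + 1e-6                   # a point almost identical to row 7
res <- nearest_sketch(y, U)          # res$index is 7 by construction

# Assumption: the neighbor of the matched point is searched among the
# remaining rows, so the point is not returned as its own neighbor.
second <- nearest_sketch(res$point, U[-res$index, , drop = FALSE])
```

A brute-force scan costs on the order of n times d operations per query, which is adequate for samples of the size used in the examples here.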
TRUH test statistic
Description
TRUH test statistic for nonparametric two-sample testing under heterogeneity.
Usage
truh(V, U, B, fc = 1, ncores = 2, seed = 1)
Arguments
V | an m \times d matrix. |
U | an n \times d matrix. |
B | number of bootstrap samples. |
fc | fold change constant. The default value is 1. See equation (2.8) of the referenced paper for more details. |
ncores | the number of computing cores available. The default value is 2. |
seed | random seed for replicability. The default value is 1. |
Value
teststat - TRUH test statistic.
k.hat - number of clusters detected in the uninfected sample.
pval - the maximum p-value across the detected clusters.
pval_all - p-value for each cluster.
dist.null_all - the approximate bootstrap-based null distribution.
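The relationship between the per-cluster p-values and the reported maximum can be sketched as follows. The simulated null draws, the observed statistic, the helper `emp_pval`, and the upper-tail formula with a +1 correction are all assumptions for illustration; the package may compute its p-values differently.

```r
# Hypothetical inputs: bootstrap null draws for two clusters and one
# observed statistic. None of these values come from the package.
set.seed(1)
B <- 100
observed <- 2.0
null_by_cluster <- list(
  cluster1 = rnorm(B, mean = 0),
  cluster2 = rnorm(B, mean = 1)
)

# Empirical upper-tail p-value with the common +1 correction
# (an assumption, chosen so the p-value is never exactly zero).
emp_pval <- function(null_draws, stat) {
  (1 + sum(null_draws >= stat)) / (length(null_draws) + 1)
}

pval_all <- vapply(null_by_cluster, emp_pval, numeric(1), stat = observed)
pval <- max(pval_all)   # small only if every cluster-wise p-value is small
```

Taking the maximum means the null is rejected only when the evidence is strong in every cluster, which is one way a test for a composite null can be made conservative.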
References
Banerjee, Trambak, Bhaswar B. Bhattacharya, and Gourab Mukherjee. "A nearest-neighbor based nonparametric test for viral remodeling in heterogeneous single-cell proteomic data." The Annals of Applied Statistics 14, no. 4 (2020): 1777-1805.
Examples
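library(truh)
n = 500  # number of rows in U
m = 10   # number of rows in V
d = 3    # dimension of each observation
set.seed(1)
V = matrix(rnorm(m*d),nrow=m,ncol=d)
set.seed(2)
U = matrix(rnorm(n*d),nrow=n,ncol=d)
out = truh(V,U,100)  # B = 100 bootstrap samples; other arguments at defaults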