Help for package truh

Title:

Two-Sample Nonparametric Testing Under Heterogeneity

Version:

1.0.0

Description:

Implements the TRUH test statistic for two-sample testing under heterogeneity. TRUH incorporates the underlying heterogeneity and imbalance in the samples, and provides a conservative test for the composite null hypothesis that the two samples arise from the same mixture distribution but may differ with respect to the mixing weights. See Trambak Banerjee, Bhaswar B. Bhattacharya and Gourab Mukherjee, Ann. Appl. Stat. 14(4): 1777-1805 (December 2020), <doi:10.1214/20-AOAS1362>, for more details.

License:

GPL (≥ 3)

Encoding:

UTF-8

URL:

https://github.com/natesmith07/truh

Imports:

Rfast, cluster, doParallel, foreach, iterators, fpc, parallel

RoxygenNote:

7.1.1

Suggests:

rmarkdown, knitr

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2021-09-03 18:37:13 UTC; Jolly

Author:

Nathan Smith [aut, cre], Trambak Banerjee [aut], Bhaswar Bhattacharya [aut], Gourab Mukherjee [aut]

Maintainer:

Nathan Smith <nathan_smith_99@ku.edu>

Repository:

CRAN

Date/Publication:

2021-09-08 08:00:02 UTC

Nearest neighbor computation for the TRUH statistic

Description

For a given d-dimensional vector \mathbf{y}, this function finds the nearest neighbor of \mathbf{y} among the rows of an n\times d matrix \mathbf{U}, and then the nearest neighbor of that point within \mathbf{U}.

Usage

nearest(y, U, n, d)

Arguments

y

a d-dimensional vector.

U

an n\times d matrix, where n is the sample size and d is the dimension of each observation.

n

the sample size, i.e. the number of rows of \mathbf{U}.

d

the dimension of each observation, i.e. the number of columns of \mathbf{U}.

Value

d1 - nearest neighbor of \mathbf{y} in \mathbf{U}
d2 - nearest neighbor of d1 in \mathbf{U}
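The two-step search described under Value can be sketched in base R as follows. This is a minimal sketch, not the package's implementation: the function name nearest_sketch is hypothetical, Euclidean distance is assumed, and the point found in the first step is assumed to be excluded when searching for its own neighbor. Because this page does not say whether d1 and d2 are returned as points or as distances, the sketch returns both the row indices and the distances. The n and d arguments are dropped because they can be read off U.

```r
# Hypothetical helper: brute-force sketch of the two-step nearest-neighbor
# search. Assumes Euclidean distance (assumption, not from the package source).
nearest_sketch <- function(y, U) {
  # Euclidean distance from y to every row of U
  dist_y <- sqrt(rowSums(sweep(U, 2, y)^2))
  i1 <- which.min(dist_y)            # row of U nearest to y ("d1")
  # Distances from U[i1, ] to every row of U
  dist_1 <- sqrt(rowSums(sweep(U, 2, U[i1, ])^2))
  dist_1[i1] <- Inf                  # exclude the point itself (assumption)
  i2 <- which.min(dist_1)            # row of U nearest to U[i1, ] ("d2")
  list(index    = c(d1 = i1, d2 = i2),
       distance = c(d1 = dist_y[i1], d2 = dist_1[i2]))
}

set.seed(2)
U <- matrix(rnorm(100 * 3), nrow = 100, ncol = 3)
set.seed(1)
y <- rnorm(3)
res <- nearest_sketch(y, U)
```

A brute-force scan costs O(nd) per query, which is adequate for moderate n; a k-d tree or similar index would be the usual choice for very large samples.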

Examples

library(truh)
n = 100
d = 3
set.seed(1)
y = rnorm(3)
set.seed(2)
U = matrix(rnorm(n*d),nrow=n,ncol=d)
out = nearest(y,U,n,d)

TRUH test statistic

Description

TRUH test statistic for nonparametric two sample testing under heterogeneity.

Usage

truh(V, U, B, fc = 1, ncores = 2, seed = 1)

Arguments

V

an m\times d matrix, where m is the size of this sample and d is the dimension of each observation.

U

an n\times d matrix, where n is the size of this sample and d is the dimension of each observation. The test is designed for the imbalanced setting m\ll n.

B

number of bootstrap samples.

fc

fold change constant. The default value is 1. See equation (2.8) of the referenced paper for more details.

ncores

the number of computing cores available. The default value is 2.

seed

random seed for replicability. The default value is 1.

Value

teststat - TRUH test statistic.
k.hat - number of clusters detected in the uninfected sample.
pval - the maximum p-value across the detected clusters.
pval_all - p-value for each cluster.
dist.null_all - the approximate bootstrap-based null distribution.
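The relationship between pval_all and pval can be illustrated with a small sketch. It assumes each per-cluster p-value is the right-tailed fraction of bootstrapped null statistics that are at least as large as the observed statistic; that tail convention, the input layout (one column of null draws per cluster) and the helper name combine_pvals are all assumptions, not taken from the package source. Only the final step, taking the maximum across clusters, is stated above.

```r
# Hypothetical helper: combine per-cluster bootstrap p-values into one p-value.
# stat:       observed statistic(s); recycled to one value per cluster
# null_draws: B x k matrix of bootstrapped null statistics, one column per cluster
combine_pvals <- function(stat, null_draws) {
  k <- ncol(null_draws)
  stat <- rep_len(stat, k)
  # Right-tailed empirical p-value for each cluster (assumed tail convention)
  pval_all <- vapply(seq_len(k),
                     function(j) mean(null_draws[, j] >= stat[j]),
                     numeric(1))
  # Report the largest p-value across clusters, as described under Value
  list(pval_all = pval_all, pval = max(pval_all))
}

null_draws <- cbind(c(1, 2, 3, 4), c(0, 0, 0, 10))
res <- combine_pvals(2.5, null_draws)
```

Rejecting when the maximum p-value falls below a level \alpha is equivalent to requiring every cluster's p-value to fall below \alpha.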

References

Banerjee, Trambak, Bhaswar B. Bhattacharya, and Gourab Mukherjee. "A nearest-neighbor based nonparametric test for viral remodeling in heterogeneous single-cell proteomic data." The Annals of Applied Statistics 14, no. 4 (2020): 1777-1805.

Examples

library(truh)
n = 500
m = 10
d = 3
set.seed(1)
V = matrix(rnorm(m*d),nrow=m,ncol=d)
set.seed(2)
U = matrix(rnorm(n*d),nrow=n,ncol=d)
out = truh(V,U,100)
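The composite null hypothesis from the package description (both samples come from the same mixture distribution, possibly with different mixing weights) can be illustrated by simulation. The sketch below draws both samples from the same two-component Gaussian mixture in three dimensions but with different weights; the helper name rmix, the component means, the weights and the sample sizes are illustrative assumptions.

```r
# Hypothetical helper: draw `size` observations from a Gaussian mixture whose
# component means are the rows of `means`, with the given mixing weights.
rmix <- function(size, weights, means, sd = 1) {
  d <- ncol(means)
  # Choose a component label for each observation according to the weights
  lab <- sample(seq_len(nrow(means)), size, replace = TRUE, prob = weights)
  # Add isotropic Gaussian noise around the chosen component mean
  means[lab, , drop = FALSE] +
    matrix(rnorm(size * d, sd = sd), nrow = size, ncol = d)
}

means <- rbind(c(0, 0, 0), c(4, 4, 4))                 # shared mixture components
set.seed(1)
V <- rmix(10,  weights = c(0.2, 0.8), means = means)  # small sample, m = 10
set.seed(2)
U <- rmix(500, weights = c(0.7, 0.3), means = means)  # large sample, n = 500
```

The resulting V and U satisfy the composite null with m\ll n and can be passed to truh(V, U, B) in the same way as in the example above.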
