Title: | Covariate Dependent Graph Estimation |
Version: | 1.0.1 |
Date: | 2022-09-16 |
Language: | en-US |
BugReports: | https://github.com/JacobHelwig/covdepGE/issues |
URL: | https://github.com/JacobHelwig/covdepGE |
Description: | A covariate-dependent approach to Gaussian graphical modeling as described in Dasgupta et al. (2022). Employs a novel weighted pseudo-likelihood approach to model the conditional dependence structure of data as a continuous function of an extraneous covariate. The main function, covdepGE::covdepGE(), estimates a graphical representation of the conditional dependence structure via a block mean-field variational approximation, while several auxiliary functions (inclusionCurve(), matViz(), and plot.covdepGE()) are included for visualizing the resulting estimates. |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.1 |
LinkingTo: | Rcpp, RcppArmadillo |
Imports: | doParallel, foreach, ggplot2, glmnet, latex2exp, MASS, parallel, Rcpp, reshape2, stats |
Suggests: | testthat (≥ 3.0.0), covr, vdiffr |
Config/testthat/edition: | 3 |
NeedsCompilation: | yes |
Packaged: | 2022-09-16 15:25:55 UTC; jacob.a.helwig |
Author: | Jacob Helwig [cre, aut], Sutanoy Dasgupta [aut], Peng Zhao [aut], Bani Mallick [aut], Debdeep Pati [aut] |
Maintainer: | Jacob Helwig <jacob.a.helwig@tamu.edu> |
Repository: | CRAN |
Date/Publication: | 2022-09-16 15:56:08 UTC |
covdepGE: Covariate Dependent Graph Estimation
Description
A covariate-dependent approach to Gaussian graphical modeling as described in Dasgupta et al. (2022). Employs a novel weighted pseudo-likelihood approach to model the conditional dependence structure of data as a continuous function of an extraneous covariate. The main function, covdepGE::covdepGE(), estimates a graphical representation of the conditional dependence structure via a block mean-field variational approximation, while several auxiliary functions (inclusionCurve(), matViz(), and plot.covdepGE()) are included for visualizing the resulting estimates.
Author(s)
Maintainer: Jacob Helwig jacob.a.helwig@tamu.edu
Authors:
Sutanoy Dasgupta sutanoy@stat.tamu.edu
Peng Zhao pzhao@stat.tamu.edu
Bani Mallick bmallick@stat.tamu.edu
Debdeep Pati debdeep@stat.tamu.edu
References
(1) Sutanoy Dasgupta, Peng Zhao, Prasenjit Ghosh, Debdeep Pati, and Bani Mallick. An approximate Bayesian approach to covariate-dependent graphical modeling. pages 1–59, 2022.
See Also
Useful links:
Report bugs at https://github.com/JacobHelwig/covdepGE/issues
Covariate Dependent Graph Estimation
Description
Model the conditional dependence structure of X
as a function
of Z
as described in (1)
Usage
covdepGE(
X,
Z = NULL,
hp_method = "hybrid",
ssq = NULL,
sbsq = NULL,
pip = NULL,
nssq = 5,
nsbsq = 5,
npip = 5,
ssq_mult = 1.5,
ssq_lower = 1e-05,
snr_upper = 25,
sbsq_lower = 1e-05,
pip_lower = 1e-05,
pip_upper = NULL,
tau = NULL,
norm = 2,
center_X = TRUE,
scale_Z = TRUE,
alpha_tol = 1e-05,
max_iter_grid = 10,
max_iter = 100,
edge_threshold = 0.5,
sym_method = "mean",
parallel = FALSE,
num_workers = NULL,
prog_bar = TRUE
)
Arguments
X |
|
Z |
Z <- rep(0, nrow(X)) If |
hp_method |
|
ssq |
ssq <- seq(ssq_lower, ssq_upper, length.out = nssq)
|
sbsq |
sbsq <- seq(sbsq_lower, sbsq_upper, length.out = nsbsq)
|
pip |
pip <- seq(pip_lower, pi_upper, length.out = npip)
|
nssq |
positive integer; number of points to generate for |
nsbsq |
positive integer; number of points to generate for |
npip |
positive integer; number of points to generate for |
ssq_mult |
positive numeric; if ssq_upper <- ssq_mult * stats::var(X_j) Then, |
ssq_lower |
positive numeric; if |
snr_upper |
positive numeric; upper bound on the signal-to-noise ratio.
If s2_sum <- sum(apply(X, 2, stats::var)) sbsq_upper <- snr_upper / (pip_upper * s2_sum) Then, |
sbsq_lower |
positive numeric; if |
pip_lower |
numeric in |
pip_upper |
lasso <- glmnet::cv.glmnet(X, X_j) non0 <- sum(glmnet::coef.glmnet(lasso, s = "lambda.1se")[-1] != 0) non0 <- min(max(non0, 1), p - 1) pip_upper <- non0 / p
|
tau |
|
norm |
numeric in |
center_X |
logical; if |
scale_Z |
logical; if |
alpha_tol |
positive numeric; end CAVI when the Frobenius norm of the
change in the alpha matrix is within |
max_iter_grid |
positive integer; if tolerance criteria has not been
met by |
max_iter |
positive integer; if tolerance criteria has not been met by
|
edge_threshold |
numeric in |
sym_method |
|
parallel |
logical; if doParallel::registerDoParallel(num_workers)
|
num_workers |
num_workers <- floor(parallel::detectCores() / 2)
|
prog_bar |
logical; if |
Value
Returns object of class covdepGE
with the following values:
graphs |
list with the following values:
|
variational_params |
list with the following values:
|
hyperparameters |
list of
|
model_details |
list with the following values:
|
weights |
list with the following values:
|
References
(1) Sutanoy Dasgupta, Peng Zhao, Prasenjit Ghosh, Debdeep Pati, and Bani Mallick. An approximate Bayesian approach to covariate-dependent graphical modeling. pages 1–59, 2022.
(2) Sutanoy Dasgupta, Debdeep Pati, and Anuj Srivastava. A Two-Step Geometric Framework For Density Modeling. Statistica Sinica, 30(4):2155–2177, 2020.
Examples
## Not run:
library(ggplot2)
# get the data
set.seed(12)
data <- generateData()
X <- data$X
Z <- data$Z
interval <- data$interval
prec <- data$true_precision
# get overall and within interval sample sizes
n <- nrow(X)
n1 <- sum(interval == 1)
n2 <- sum(interval == 2)
n3 <- sum(interval == 3)
# visualize the distribution of the extraneous covariate
ggplot(data.frame(Z = Z, interval = as.factor(interval))) +
geom_histogram(aes(Z, fill = interval), color = "black", bins = n %/% 5)
# visualize the true precision matrices in each of the intervals
# interval 1
matViz(prec[[1]], incl_val = TRUE) +
ggtitle(paste0("True precision matrix, interval 1, observations 1,...,", n1))
# interval 2 (varies continuously with Z)
cat("\nInterval 2, observations ", n1 + 1, ",...,", n1 + n2, sep = "")
int2_mats <- prec[interval == 2]
int2_inds <- c(5, n2 %/% 2, n2 - 5)
lapply(int2_inds, function(j) matViz(int2_mats[[j]], incl_val = TRUE) +
ggtitle(paste("True precision matrix, interval 2, observation", j + n1)))
# interval 3
matViz(prec[[length(prec)]], incl_val = TRUE) +
ggtitle(paste0("True precision matrix, interval 3, observations ",
n1 + n2 + 1, ",...,", n1 + n2 + n3))
# fit the model and visualize the estimated graphs
(out <- covdepGE(X, Z))
plot(out)
# visualize the posterior inclusion probabilities for variables (1, 3) and (1, 2)
inclusionCurve(out, 1, 2)
inclusionCurve(out, 1, 3)
## End(Not run)
Generate Covariate-Dependent Data
Description
Generate a 1
-dimensional extraneous covariate
and p
-dimensional Gaussian data with a precision matrix that varies as
a continuous function of the extraneous covariate. This data is distributed
similar to that used in the simulation study from (1)
Usage
generateData(p = 5, n1 = 60, n2 = 60, n3 = 60, Z = NULL, true_precision = NULL)
Arguments
p |
positive integer; number of variables in the data matrix. |
n1 |
positive integer; number of observations in the first interval.
|
n2 |
positive integer; number of observations in the second interval.
|
n3 |
positive integer; number of observations in the third interval.
|
Z |
|
true_precision |
|
Value
Returns list with the following values:
X |
a |
Z |
a |
true_precision |
list of |
interval |
vector of length |
Extraneous Covariate
If Z = NULL
, then the generation of Z
is as follows:
The first n1
observations have z_i
from from a uniform
distribution on the interval (-3, -1)
(the first interval).
Observations n1 + 1
to n1 + n2
have z_i
from from a uniform
distribution on the interval (-1, 1)
(the second interval).
Observations n1 + n2 + 1
to n1 + n2 + n3
have z_i
from a
uniform distribution on the interval (1, 3)
(the third interval).
Precision Matrices
If true_precision = NULL
, then the generation of the true precision
matrices is as follows:
All precision matrices have 2
on the diagonal and 1
in the
(2, 3)/ (3, 2)
positions.
Observations in the first interval have a 1
in the
(1, 2) / (1, 2)
positions, while observations in the third interval
have a 1
in the (1, 3)/ (3, 1)
positions.
Observations in the second interval have 2
entries that vary as a
linear function of their extraneous covariate. Let
\beta = 1/2
. Then, the (1, 2)/(2, 1)
positions for
the i
-th observation in the second interval are
\beta\cdot(1 - z_i)
, while the (1, 3)/ (3, 1)
entries are \beta\cdot(1 + z_i)
.
Thus, as z_i
approaches -1
from the right, the associated
precision matrix becomes more similar to the matrix for observations in the
first interval. Similarly, as z_i
approaches 1
from the left,
the matrix becomes more similar to the matrix for observations in the third
interval.
Examples
## Not run:
library(ggplot2)
# get the data
set.seed(12)
data <- generateData()
X <- data$X
Z <- data$Z
interval <- data$interval
prec <- data$true_precision
# get overall and within interval sample sizes
n <- nrow(X)
n1 <- sum(interval == 1)
n2 <- sum(interval == 2)
n3 <- sum(interval == 3)
# visualize the distribution of the extraneous covariate
ggplot(data.frame(Z = Z, interval = as.factor(interval))) +
geom_histogram(aes(Z, fill = interval), color = "black", bins = n %/% 5)
# visualize the true precision matrices in each of the intervals
# interval 1
matViz(prec[[1]], incl_val = TRUE) +
ggtitle(paste0("True precision matrix, interval 1, observations 1,...,", n1))
# interval 2 (varies continuously with Z)
cat("\nInterval 2, observations ", n1 + 1, ",...,", n1 + n2, sep = "")
int2_mats <- prec[interval == 2]
int2_inds <- c(5, n2 %/% 2, n2 - 5)
lapply(int2_inds, function(j) matViz(int2_mats[[j]], incl_val = TRUE) +
ggtitle(paste("True precision matrix, interval 2, observation", j + n1)))
# interval 3
matViz(prec[[length(prec)]], incl_val = TRUE) +
ggtitle(paste0("True precision matrix, interval 3, observations ",
n1 + n2 + 1, ",...,", n1 + n2 + n3))
# fit the model and visualize the estimated graphs
(out <- covdepGE(X, Z))
plot(out)
# visualize the posterior inclusion probabilities for variables (1, 3) and (1, 2)
inclusionCurve(out, 1, 2)
inclusionCurve(out, 1, 3)
## End(Not run)
Plot PIP as a Function of Index
Description
Plot the posterior inclusion probability of an edge between two variables as a function of observation index
Usage
inclusionCurve(
out,
col_idx1,
col_idx2,
line_type = "solid",
line_size = 0.5,
line_color = "black",
point_shape = 21,
point_size = 1.5,
point_color = "#500000",
point_fill = "white"
)
Arguments
out |
object of class |
col_idx1 |
integer in |
col_idx2 |
integer in |
line_type |
linetype; |
line_size |
positive numeric; thickness of the interpolating line.
|
line_color |
color; color of interpolating line. |
point_shape |
shape; shape of the points denoting observation-specific
inclusion probabilities; |
point_size |
positive numeric; size of probability points. |
point_color |
color; color of probability points. |
point_fill |
color; fill of probability points. Only applies to select
shapes. |
Value
Returns ggplot2
visualization of inclusion probability curve
Examples
## Not run:
library(ggplot2)
# get the data
set.seed(12)
data <- generateData()
X <- data$X
Z <- data$Z
interval <- data$interval
prec <- data$true_precision
# get overall and within interval sample sizes
n <- nrow(X)
n1 <- sum(interval == 1)
n2 <- sum(interval == 2)
n3 <- sum(interval == 3)
# visualize the distribution of the extraneous covariate
ggplot(data.frame(Z = Z, interval = as.factor(interval))) +
geom_histogram(aes(Z, fill = interval), color = "black", bins = n %/% 5)
# visualize the true precision matrices in each of the intervals
# interval 1
matViz(prec[[1]], incl_val = TRUE) +
ggtitle(paste0("True precision matrix, interval 1, observations 1,...,", n1))
# interval 2 (varies continuously with Z)
cat("\nInterval 2, observations ", n1 + 1, ",...,", n1 + n2, sep = "")
int2_mats <- prec[interval == 2]
int2_inds <- c(5, n2 %/% 2, n2 - 5)
lapply(int2_inds, function(j) matViz(int2_mats[[j]], incl_val = TRUE) +
ggtitle(paste("True precision matrix, interval 2, observation", j + n1)))
# interval 3
matViz(prec[[length(prec)]], incl_val = TRUE) +
ggtitle(paste0("True precision matrix, interval 3, observations ",
n1 + n2 + 1, ",...,", n1 + n2 + n3))
# fit the model and visualize the estimated graphs
(out <- covdepGE(X, Z))
plot(out)
# visualize the posterior inclusion probabilities for variables (1, 3) and (1, 2)
inclusionCurve(out, 1, 2)
inclusionCurve(out, 1, 3)
## End(Not run)
Visualize a matrix
Description
Create a visualization of a matrix
Usage
matViz(
x,
color1 = "white",
color2 = "#500000",
grid_color = "black",
incl_val = FALSE,
prec = 2,
font_size = 3,
font_color1 = "black",
font_color2 = "white",
font_thres = mean(x)
)
Arguments
x |
matrix; matrix to be visualized |
color1 |
color; color for low entries. |
color2 |
color; color for high entries. |
grid_color |
color; color of grid lines. |
incl_val |
logical; if |
prec |
positive integer; number of decimal places to round entries to if
|
font_size |
positive numeric; size of font if |
font_color1 |
color; color of font for low entries if |
font_color2 |
color; color of font for high entries if |
font_thres |
numeric; values less than |
Value
Returns ggplot2
visualization of matrix
Examples
## Not run:
library(ggplot2)
# get the data
set.seed(12)
data <- generateData()
X <- data$X
Z <- data$Z
interval <- data$interval
prec <- data$true_precision
# get overall and within interval sample sizes
n <- nrow(X)
n1 <- sum(interval == 1)
n2 <- sum(interval == 2)
n3 <- sum(interval == 3)
# visualize the distribution of the extraneous covariate
ggplot(data.frame(Z = Z, interval = as.factor(interval))) +
geom_histogram(aes(Z, fill = interval), color = "black", bins = n %/% 5)
# visualize the true precision matrices in each of the intervals
# interval 1
matViz(prec[[1]], incl_val = TRUE) +
ggtitle(paste0("True precision matrix, interval 1, observations 1,...,", n1))
# interval 2 (varies continuously with Z)
cat("\nInterval 2, observations ", n1 + 1, ",...,", n1 + n2, sep = "")
int2_mats <- prec[interval == 2]
int2_inds <- c(5, n2 %/% 2, n2 - 5)
lapply(int2_inds, function(j) matViz(int2_mats[[j]], incl_val = TRUE) +
ggtitle(paste("True precision matrix, interval 2, observation", j + n1)))
# interval 3
matViz(prec[[length(prec)]], incl_val = TRUE) +
ggtitle(paste0("True precision matrix, interval 3, observations ",
n1 + n2 + 1, ",...,", n1 + n2 + n3))
# fit the model and visualize the estimated graphs
(out <- covdepGE(X, Z))
plot(out)
# visualize the posterior inclusion probabilities for variables (1, 3) and (1, 2)
inclusionCurve(out, 1, 2)
inclusionCurve(out, 1, 3)
## End(Not run)
Plot the Graphs Estimated by covdepGE
Description
Create a list of the unique graphs estimated by covdepGE
Usage
## S3 method for class 'covdepGE'
plot(x, graph_colors = NULL, title_sum = TRUE, ...)
Arguments
x |
object of class |
graph_colors |
|
title_sum |
logical; if |
... |
additional arguments will be ignored |
Value
Returns list of ggplot2
visualizations of unique graphs estimated
by covdepGE
Examples
## Not run:
library(ggplot2)
# get the data
set.seed(12)
data <- generateData()
X <- data$X
Z <- data$Z
interval <- data$interval
prec <- data$true_precision
# get overall and within interval sample sizes
n <- nrow(X)
n1 <- sum(interval == 1)
n2 <- sum(interval == 2)
n3 <- sum(interval == 3)
# visualize the distribution of the extraneous covariate
ggplot(data.frame(Z = Z, interval = as.factor(interval))) +
geom_histogram(aes(Z, fill = interval), color = "black", bins = n %/% 5)
# visualize the true precision matrices in each of the intervals
# interval 1
matViz(prec[[1]], incl_val = TRUE) +
ggtitle(paste0("True precision matrix, interval 1, observations 1,...,", n1))
# interval 2 (varies continuously with Z)
cat("\nInterval 2, observations ", n1 + 1, ",...,", n1 + n2, sep = "")
int2_mats <- prec[interval == 2]
int2_inds <- c(5, n2 %/% 2, n2 - 5)
lapply(int2_inds, function(j) matViz(int2_mats[[j]], incl_val = TRUE) +
ggtitle(paste("True precision matrix, interval 2, observation", j + n1)))
# interval 3
matViz(prec[[length(prec)]], incl_val = TRUE) +
ggtitle(paste0("True precision matrix, interval 3, observations ",
n1 + n2 + 1, ",...,", n1 + n2 + n3))
# fit the model and visualize the estimated graphs
(out <- covdepGE(X, Z))
plot(out)
# visualize the posterior inclusion probabilities for variables (1, 3) and (1, 2)
inclusionCurve(out, 1, 2)
inclusionCurve(out, 1, 3)
## End(Not run)