Title: | Mixed, Low-Rank, and Sparse Multivariate Regression on High-Dimensional Data |
Version: | 0.1.0 |
Description: | Mixed, low-rank, and sparse multivariate regression ('mixedLSR') provides tools for performing mixture regression when the coefficient matrix is low-rank and sparse. 'mixedLSR' allows subgroup identification by alternating optimization with simulated annealing to encourage global optimum convergence. This method is data-adaptive, automatically performing parameter selection to identify low-rank substructures in the coefficient matrix. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.1 |
Depends: | R (≥ 4.1.0) |
Imports: | grpreg, purrr, MASS, stats, ggplot2 |
Suggests: | knitr, rmarkdown, mclust |
VignetteBuilder: | knitr |
BugReports: | https://github.com/alexanderjwhite/mixedLSR |
URL: | https://alexanderjwhite.github.io/mixedLSR/ |
NeedsCompilation: | no |
Packaged: | 2022-11-04 10:33:31 UTC; whitealj |
Author: | Alexander White |
Maintainer: | Alexander White <whitealj@iu.edu> |
Repository: | CRAN |
Date/Publication: | 2022-11-04 20:00:02 UTC |
Compute Bayesian information criterion for a mixedLSR model
Description
Compute Bayesian information criterion for a mixedLSR model
Usage
bic_lsr(a, n, llik)
Arguments
a |
A list of coefficient matrices. |
n |
The sample size. |
llik |
The log-likelihood of the model. |
Value
The BIC.
Examples
library(mixedLSR)
n <- 50
simulate <- simulate_lsr(n)
# alt_iter = 0: no alternation between EM and simulated annealing
model <- mixed_lsr(simulate$x, simulate$y, k = 2, init_lambda = c(1, 1), alt_iter = 0)
bic_lsr(model$A, n = n, llik = model$llik)
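For orientation, a BIC of the usual form can be sketched in base R. The helper name bic_sketch and the degrees-of-freedom rule (counting nonzero coefficients across the matrices in a) are assumptions made for illustration; the penalty that bic_lsr actually applies may differ.

```r
# Hypothetical helper, not part of mixedLSR: BIC = -2 * llik + df * log(n),
# with df ASSUMED to be the number of nonzero coefficients across groups.
bic_sketch <- function(a, n, llik) {
  df <- sum(vapply(a, function(mat) sum(mat != 0), integer(1)))
  -2 * llik + df * log(n)
}

# Two toy coefficient matrices with 3 and 2 nonzero entries.
a <- list(
  matrix(c(1, 0, 0, 2, 0, 3), nrow = 2),
  matrix(c(0, 0, 4, 0, 5, 0), nrow = 2)
)
bic_value <- bic_sketch(a, n = 50, llik = -120)
```

Under this convention a smaller value indicates a better trade-off between fit and model size.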
Internal Alternating Optimization Function
Description
Internal Alternating Optimization Function
Usage
fct_alt_optimize(
x,
y,
k,
clust_assign,
lambda,
alt_iter,
anneal_iter,
em_iter,
temp,
mu,
eps,
accept_prob,
sim_N,
verbose
)
Arguments
x |
A matrix of predictors. |
y |
A matrix of responses. |
k |
The number of groups. |
clust_assign |
The current clustering assignment. |
lambda |
A vector of penalization parameters. |
alt_iter |
The maximum number of times to alternate between the classification expectation maximization algorithm and the simulated annealing algorithm. |
anneal_iter |
The maximum number of simulated annealing iterations. |
em_iter |
The maximum number of EM iterations. |
temp |
The initial simulated annealing temperature, temp > 0. |
mu |
The fraction by which the simulated annealing temperature is decreased. Once the best configuration cannot be improved, the temperature T is reduced to (mu)T, 0 < mu < 1. |
eps |
The final simulated annealing temperature, eps > 0. |
accept_prob |
The simulated annealing probability of accepting a new assignment, 0 < accept_prob < 1. When closer to 1, trial assignments are only small perturbations of the current assignment. When closer to 0, trial assignments are closer to random. |
sim_N |
The number of simulated annealing iterations for reaching equilibrium. |
verbose |
A boolean indicating whether to print to screen. |
Value
A final mixedLSR fit.
Internal Double Penalized Projection Function
Description
Internal Double Penalized Projection Function
Usage
fct_dpp(
y,
x,
rank,
lambda = NULL,
alpha = 2 * sqrt(3),
beta = 1,
sigma,
ptype = "grLasso",
y_sparse = TRUE
)
Arguments
y |
A matrix of responses. |
x |
A matrix of predictors. |
rank |
The rank, if known. |
lambda |
A vector of penalization parameters. |
alpha |
A positive constant DPP parameter. |
beta |
A positive constant DPP parameter. |
sigma |
An estimated standard deviation. |
ptype |
A group penalized regression penalty type. See grpreg. |
y_sparse |
Should Y coefficients be treated as sparse? |
Value
A list containing estimated coefficients, covariance, and penalty parameters.
Internal EM Algorithm
Description
Internal EM Algorithm
Usage
fct_em(x, y, k, lambda, clust_assign, lik_track, em_iter, verbose)
Arguments
x |
A matrix of predictors. |
y |
A matrix of responses. |
k |
The number of groups. |
lambda |
A vector of penalization parameters. |
clust_assign |
The current clustering assignment. |
lik_track |
A vector storing the log-likelihood by iteration. |
em_iter |
The maximum number of EM iterations. |
verbose |
A boolean indicating whether to print to screen. |
Value
A mixedLSR model.
Internal Posterior Calculation
Description
Internal Posterior Calculation
Usage
fct_gamma(
x,
y,
k,
N,
clust_assign,
pi_vec,
lambda,
alpha,
beta,
y_sparse,
rank,
max_rank
)
Arguments
x |
A matrix of predictors. |
y |
A matrix of responses. |
k |
The number of groups. |
N |
The sample size. |
clust_assign |
The current clustering assignment. |
pi_vec |
A vector of mixing probabilities for each cluster label. |
lambda |
A vector of penalization parameters. |
alpha |
A positive constant DPP parameter. |
beta |
A positive constant DPP parameter. |
y_sparse |
Should Y coefficients be treated as sparse? |
rank |
The rank, if known. |
max_rank |
The maximum allowed rank. |
Value
A list with the posterior, coefficients, and estimated covariance.
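The posterior step of a mixture model can be illustrated with a small base R sketch. The helper name posterior_sketch and the N x k layout of the log-likelihood matrix are assumptions for illustration; this is not the body of fct_gamma, which also estimates coefficients and covariance.

```r
# Hypothetical helper: turn per-group log-likelihoods into posterior
# membership probabilities. `loglik` is an N x k matrix (ASSUMED layout).
posterior_sketch <- function(loglik, pi_vec) {
  # Add the log mixing weight of each group to its column.
  log_w <- sweep(loglik, 2, log(pi_vec), "+")
  # Subtract each row's maximum before exponentiating (log-sum-exp trick)
  # so that very negative log-likelihoods do not underflow to zero.
  row_max <- apply(log_w, 1, max)
  w <- exp(log_w - row_max)
  w / rowSums(w)
}

loglik <- matrix(c(-1000, -1002, -5, -3), nrow = 2, byrow = TRUE)
gamma <- posterior_sketch(loglik, pi_vec = c(0.5, 0.5))
```

Each row of gamma sums to one, and the first observation is assigned mostly to group 1 even though both of its log-likelihoods are far below the range where exp() is nonzero.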
Internal Partition Initialization Function
Description
Internal Partition Initialization Function
Usage
fct_initialize(k, N)
Arguments
k |
The number of groups. |
N |
The sample size. |
Value
A vector of assignments.
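A random starting partition can be sketched as follows. The helper name init_sketch and the choice of roughly balanced groups are assumptions; fct_initialize may draw labels differently.

```r
# Hypothetical helper: random partition of N observations into k groups,
# ASSUMING roughly balanced group sizes are wanted.
init_sketch <- function(k, N) {
  # Repeat the labels 1..k up to length N, then shuffle them.
  sample(rep_len(seq_len(k), N))
}

set.seed(1)
assign0 <- init_sketch(k = 3, N = 10)
```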
Internal Likelihood Function
Description
Internal Likelihood Function
Usage
fct_j_lik(
x,
y,
k,
clust_assign,
lambda,
alpha = 2 * sqrt(3),
beta = 1,
y_sparse = TRUE,
max_rank = 3,
rank = NULL
)
Arguments
x |
A matrix of predictors. |
y |
A matrix of responses. |
k |
The number of groups. |
clust_assign |
A vector of cluster labels. |
lambda |
A vector of penalization parameters. |
alpha |
A positive constant DPP parameter. |
beta |
A positive constant DPP parameter. |
y_sparse |
Should Y coefficients be treated as sparse? |
max_rank |
The maximum allowed rank. |
rank |
The rank, if known. |
Value
The weighted log-likelihood.
Internal Log-Likelihood Function
Description
Internal Log-Likelihood Function
Usage
fct_log_lik(mu_mat, sig_vec, y, N, m)
Arguments
mu_mat |
The mean matrix. |
sig_vec |
A vector of sigma. |
y |
The output matrix. |
N |
The sample size. |
m |
The number of y features. |
Value
A posterior matrix.
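As an illustration of the Gaussian log-density that the arguments above suggest, the following base R sketch sums per-feature normal log-densities for each observation. It assumes independent responses with one standard deviation per response column (so sig_vec has length m); the helper name loglik_sketch is hypothetical and its return shape is not claimed to match fct_log_lik.

```r
# Hypothetical helper: per-observation Gaussian log-likelihood, ASSUMING
# independent responses with one standard deviation per response column.
loglik_sketch <- function(mu_mat, sig_vec, y) {
  # Expand sig_vec so that column j of y uses sig_vec[j].
  sd_mat <- matrix(sig_vec, nrow = nrow(y), ncol = ncol(y), byrow = TRUE)
  dens <- matrix(dnorm(y, mean = mu_mat, sd = sd_mat, log = TRUE),
                 nrow = nrow(y))
  rowSums(dens)
}

y <- matrix(0, nrow = 2, ncol = 3)
mu_mat <- matrix(0, nrow = 2, ncol = 3)
ll <- loglik_sketch(mu_mat, sig_vec = c(1, 1, 1), y = y)
```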
Internal Perturb Function
Description
Internal Perturb Function
Usage
fct_new_assign(assign, k, p)
Arguments
assign |
The current clustering assignments. |
k |
The number of groups. |
p |
The acceptance probability. |
Value
A perturbed assignment.
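One way to perturb a clustering that matches the accept_prob description (values near 1 give small perturbations, values near 0 give nearly random labels) is sketched below. The rule itself and the name perturb_sketch are assumptions, not the confirmed behavior of fct_new_assign.

```r
# Hypothetical helper: keep each label with probability p, otherwise
# redraw it uniformly from 1..k (an ASSUMED perturbation rule).
perturb_sketch <- function(assign, k, p) {
  redraw <- runif(length(assign)) > p
  assign[redraw] <- sample.int(k, sum(redraw), replace = TRUE)
  assign
}

set.seed(1)
current <- rep(1:2, each = 10)
trial <- perturb_sketch(current, k = 2, p = 0.95)
```

With p = 1 no label is redrawn and the assignment is returned unchanged; with p = 0 every label is redrawn at random.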
Internal Pi Function
Description
Internal Pi Function
Usage
fct_pi_vec(clust_assign, k, N)
Arguments
clust_assign |
The current clustering assignment. |
k |
The number of groups. |
N |
The sample size. |
Value
A mixing vector.
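Mixing proportions are typically the share of observations in each group. A minimal sketch, with the hypothetical name mixing_sketch and assuming labels are integers in 1..k:

```r
# Hypothetical helper: proportion of observations assigned to each group.
# tabulate() keeps a zero count for groups that are currently empty.
mixing_sketch <- function(clust_assign, k, N) {
  tabulate(clust_assign, nbins = k) / N
}

pi_vec <- mixing_sketch(c(1, 1, 2, 2, 2, 3, 1, 2), k = 4, N = 8)
```

Using tabulate() rather than table() keeps the result at length k even when a group has no members.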
Internal Rank Estimation Function
Description
Internal Rank Estimation Function
Usage
fct_rank(x, y, sigma, eta)
Arguments
x |
A matrix of predictors. |
y |
A matrix of responses. |
sigma |
An estimated noise level. |
eta |
A rank selection parameter. |
Value
The estimated rank.
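A common rank selection rule in the reduced-rank regression literature counts the singular values of the projected response that exceed a noise-dependent threshold. The sketch below applies that rule with the threshold sigma * eta; whether fct_rank does exactly this, and the particular eta used in the usage lines, are assumptions. The name rank_sketch is hypothetical.

```r
# Hypothetical helper: estimate the rank as the number of singular values
# of the projection of y onto the column space of x that reach sigma * eta.
rank_sketch <- function(x, y, sigma, eta) {
  qr_x <- qr(x)
  # Orthonormal basis of the column space of x.
  q <- qr.Q(qr_x)[, seq_len(qr_x$rank), drop = FALSE]
  fitted <- q %*% crossprod(q, y)
  sum(svd(fitted)$d >= sigma * eta)
}

# Toy data with a rank-2 coefficient matrix and unit-variance noise.
set.seed(1)
n <- 100; p <- 30; m <- 35
x <- matrix(rnorm(n * p), n, p)
u <- qr.Q(qr(matrix(rnorm(p * 2), p, 2)))
v <- qr.Q(qr(matrix(rnorm(m * 2), m, 2)))
b <- u %*% diag(c(20, 10)) %*% t(v)
y <- x %*% b + matrix(rnorm(n * m), n, m)

# ASSUMED threshold constant, chosen to sit above the noise singular values.
eta <- sqrt(2 * m) + sqrt(2 * min(n, p))
est_rank <- rank_sketch(x, y, sigma = 1, eta = eta)
```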
Internal Penalty Parameter Selection Function
Description
Internal Penalty Parameter Selection Function
Usage
fct_select_lambda(
x,
y,
k,
clust_assign = NULL,
initial = FALSE,
type = "all",
verbose
)
Arguments
x |
A matrix of predictors. |
y |
A matrix of responses. |
k |
The number of groups. |
clust_assign |
The current clustering assignment. |
initial |
An initial penalty parameter. |
type |
A type. |
verbose |
A boolean indicating whether to print to screen. |
Value
A selected penalty parameter.
Internal Sigma Estimation Function
Description
Internal Sigma Estimation Function
Usage
fct_sigma(y, N, m)
Arguments
y |
A matrix of responses. |
N |
The sample size. |
m |
The number of outcome variables. |
Value
The estimated sigma.
Internal Simulated Annealing Function
Description
Internal Simulated Annealing Function
Usage
fct_sim_anneal(
x,
y,
k,
init_assign,
lambda,
temp,
mu,
eps,
accept_prob,
sim_N,
track,
anneal_iter = 1000,
verbose
)
Arguments
x |
A matrix of predictors. |
y |
A matrix of responses. |
k |
The number of groups. |
init_assign |
An initial clustering assignment. |
lambda |
A vector of penalization parameters. |
temp |
The initial simulated annealing temperature, temp > 0. |
mu |
The fraction by which the simulated annealing temperature is decreased. Once the best configuration cannot be improved, the temperature T is reduced to (mu)T, 0 < mu < 1. |
eps |
The final simulated annealing temperature, eps > 0. |
accept_prob |
The simulated annealing probability of accepting a new assignment, 0 < accept_prob < 1. When closer to 1, trial assignments are only small perturbations of the current assignment. When closer to 0, trial assignments are closer to random. |
sim_N |
The number of simulated annealing iterations for reaching equilibrium. |
track |
A likelihood tracking vector. |
anneal_iter |
The maximum number of simulated annealing iterations. |
verbose |
A boolean indicating whether to print to screen. |
Value
An updated clustering vector.
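The roles of temp, mu, eps, and sim_N can be seen in a toy simulated annealing loop over cluster labels. This is a generic sketch, not the package's implementation: the objective (negative within-cluster sum of squares on a numeric vector), the single-label trial move, the fixed cooling after every sim_N trials, and the names wss_score and anneal_sketch are all assumptions for illustration.

```r
# Toy objective (ASSUMED): negative within-cluster sum of squares of a
# numeric vector z under the labels in `assign`. Larger is better.
wss_score <- function(z, assign) {
  -sum(tapply(z, assign, function(v) sum((v - mean(v))^2)))
}

anneal_sketch <- function(z, k, init_assign, temp, mu, eps, sim_N) {
  current <- init_assign
  best <- current
  while (temp > eps) {
    for (i in seq_len(sim_N)) {
      # Trial move: relabel one randomly chosen observation.
      trial <- current
      trial[sample.int(length(z), 1)] <- sample.int(k, 1)
      delta <- wss_score(z, trial) - wss_score(z, current)
      # Metropolis rule: keep improvements, occasionally accept worse moves.
      if (delta >= 0 || runif(1) < exp(delta / temp)) current <- trial
      if (wss_score(z, current) > wss_score(z, best)) best <- current
    }
    temp <- mu * temp  # geometric cooling toward eps
  }
  best
}

set.seed(1)
z <- c(rnorm(10, mean = 0), rnorm(10, mean = 10))
init <- sample.int(2, length(z), replace = TRUE)
final <- anneal_sketch(z, k = 2, init_assign = init,
                       temp = 10, mu = 0.9, eps = 1e-3, sim_N = 20)
```

At high temperature the loop accepts many worsening moves, which helps it leave poor local optima; as the temperature approaches eps it behaves like a greedy search. The best assignment seen is returned, so the result is never worse than the starting partition under this objective.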
Internal Weighted Log Likelihood Function
Description
Internal Weighted Log Likelihood Function
Usage
fct_weighted_ll(gamma)
Arguments
gamma |
A posterior matrix. |
Value
A weighted log-likelihood vector.
Mixed Low-Rank and Sparse Multivariate Regression for High-Dimensional Data
Description
Mixed Low-Rank and Sparse Multivariate Regression for High-Dimensional Data
Usage
mixed_lsr(
x,
y,
k,
nstart = 1,
init_assign = NULL,
init_lambda = NULL,
alt_iter = 5,
anneal_iter = 1000,
em_iter = 1000,
temp = 1000,
mu = 0.95,
eps = 1e-06,
accept_prob = 0.95,
sim_N = 200,
verbose = TRUE
)
Arguments
x |
A matrix of predictors. |
y |
A matrix of responses. |
k |
The number of groups. |
nstart |
The number of random initializations; the result with the maximum likelihood is returned. |
init_assign |
A vector of initial assignments, NULL by default. |
init_lambda |
A vector with the values to initialize the penalization parameter for each group, e.g., c(1,1,1). Set to NULL by default. |
alt_iter |
The maximum number of times to alternate between the classification expectation maximization algorithm and the simulated annealing algorithm. |
anneal_iter |
The maximum number of simulated annealing iterations. |
em_iter |
The maximum number of EM iterations. |
temp |
The initial simulated annealing temperature, temp > 0. |
mu |
The fraction by which the simulated annealing temperature is decreased. Once the best configuration cannot be improved, the temperature T is reduced to (mu)T, 0 < mu < 1. |
eps |
The final simulated annealing temperature, eps > 0. |
accept_prob |
The simulated annealing probability of accepting a new assignment, 0 < accept_prob < 1. When closer to 1, trial assignments are only small perturbations of the current assignment. When closer to 0, trial assignments are closer to random. |
sim_N |
The number of simulated annealing iterations for reaching equilibrium. |
verbose |
A boolean indicating whether to print to screen. |
Value
A list containing the likelihood, the partition, the coefficient matrices, and the BIC.
Examples
simulate <- simulate_lsr(50)
mixed_lsr(simulate$x, simulate$y, k = 2, init_lambda = c(1,1), alt_iter = 0)
Heatmap Plot of the mixedLSR Coefficient Matrices
Description
Heatmap Plot of the mixedLSR Coefficient Matrices
Usage
plot_lsr(a, abs = TRUE)
Arguments
a |
A coefficient matrix from a mixed_lsr model. |
abs |
A boolean for taking the absolute value of the coefficient matrix. |
Value
A ggplot2 heatmap of the coefficient matrix, separated by subgroup.
Examples
simulate <- simulate_lsr()
plot_lsr(simulate$a)
Simulate Heterogeneous, Low-Rank, and Sparse Data
Description
Simulate Heterogeneous, Low-Rank, and Sparse Data
Usage
simulate_lsr(
N = 100,
k = 2,
p = 30,
m = 35,
b = 1,
d = 20,
h = 0.2,
case = "independent"
)
Arguments
N |
The sample size, default = 100. |
k |
The number of groups, default = 2. |
p |
The number of predictor features, default = 30. |
m |
The number of response features, default = 35. |
b |
The signal-to-noise ratio, default = 1. |
d |
The singular value, default = 20. |
h |
The lower bound for the singular matrix simulation, default = 0.2. |
case |
The covariance case, "independent" or "dependent", default = "independent". |
Value
A list of simulation values, including the x matrix, the y matrix, the coefficients, and the true clustering assignments.
Examples
simulate_lsr()