Title: | Ridge Redundancy Analysis for High-Dimensional Omics Data |
Version: | 0.1.1 |
Description: | Efficient framework for ridge redundancy analysis (rrda), tailored for high-dimensional omics datasets where the number of predictors exceeds the number of samples. The method leverages Singular Value Decomposition (SVD) to avoid direct inversion of the covariance matrix, enhancing scalability and performance. It also introduces a memory-efficient storage strategy for coefficient matrices, enabling practical use in large-scale applications. The package supports cross-validation for selecting regularization parameters and reduced-rank dimensions, making it a robust and flexible tool for multivariate analysis in omics research. Please refer to our article (Yoshioka et al., 2025) for more details. |
License: | GPL (≥ 3) |
Imports: | dplyr, furrr, ggplot2, MASS, reshape2, RSpectra, scales, stats |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-04-28 09:03:09 UTC; hayatoyoshioka |
Author: | Hayato Yoshioka |
Maintainer: | Julie Aubert <julie.aubert@inrae.fr> |
Repository: | CRAN |
Date/Publication: | 2025-04-29 09:10:05 UTC |
Generate a list of rank-specific Bhat matrices (the coefficient of Ridge Redundancy Analysis for each parameter lambda and nrank).
Description
Generate a list of rank-specific Bhat matrices (the coefficient of Ridge Redundancy Analysis for each parameter lambda and nrank). In our formula, Bhat is stored as LeftBhatlambda_k (p times r matrix) and RightBhatlambda_k (q times r). Here, n is the number of samples, p is the number of variables of X, q is the number of variables of Y, and r is the specified rank in the Ridge Redundancy Analysis.
Usage
Bhat_mat_rlist(Bhat_comp_1L, nrank)
Arguments
Bhat_comp_1L |
A list containing components of Bhat for each lambda value. |
nrank |
A numeric vector indicating the rank(s) of Bhat. |
Value
A list of matrices, each representing a rank-specific Bhat matrix.
Compute MSE for different ranks of the coefficient Bhat and lambda.
Description
For cross-validation, compute MSE for different ranks of the coefficient Bhat and lambda.
Usage
MSE_lambda_rank(Bhat_comp_1L, X, Y, nrank)
Arguments
Bhat_comp_1L |
A list containing components of Bhat. |
X |
A numeric matrix of explanatory variables. |
Y |
A numeric matrix of response variables. |
nrank |
A numeric vector indicating the rank(s) of Bhat. |
Value
A numeric vector of MSE values for each rank.
Generate a list of rank-specific Yhat matrices.
Description
Generate a list of rank-specific Yhat matrices.
Usage
Yhat_mat_rlist(Bhat_comp_1L, X, nrank)
Arguments
Bhat_comp_1L |
A list containing components of Bhat. |
X |
A numeric matrix of explanatory variables. |
nrank |
A numeric vector indicating the rank(s) of Bhat. |
Value
A list of matrices, each representing rank-specific predicted values Yhat.
Compute the components of the coefficient Bhat using SVD.
Description
Compute the components of Bhat using SVD. In our formula, Bhat is stored as LeftBhatlambda_k (p times r matrix) and RightBhatlambda_k (q times r). Here, n is the number of samples, p is the number of variables of X, q is the number of variables of Y, and r is the specified rank in the Ridge Redundancy Analysis.
Usage
get_Bhat_comp(rsid2_1L, DUtY, nrank, tV)
Arguments
rsid2_1L |
A numeric vector used for each lambda value. |
DUtY |
A numeric matrix (n times q). |
nrank |
A numeric vector indicating the rank(s) of Bhat. |
tV |
A numeric matrix. |
Value
A list containing the left and right components of Bhat (LeftBhatlambda_k
, RightBhatlambda_k
and singular values (Bd) for GSVD of Bhat).
Estimate an appropriate value for the ridge penalty (lambda).
Description
Estimate an appropriate value for the ridge penalty (lambda).
Usage
get_lambda(Y, X, scale.X = FALSE, sample.X = sample.X, sample.Y = sample.Y)
Arguments
Y |
A numeric matrix of response variables. |
X |
A numeric matrix of explanatory variables. |
scale.X |
Logical indicating if |
sample.X |
A number of variables sampled from X for the lamdba range estimate. Default is 1000. |
sample.Y |
A number of variables sampled from Y for the lamdba range estimate. Default is 1000. |
Value
A numeric vector of the range of lambda values.
Generate rank-specific matrices by combining the left and right components.
Description
Generate rank-specific matrices by combining the left and right components of the coefficeint Bhat. In our formula, Bhat is stored as LeftBhatlambda_k (the left component vectors, p times r matrix) and RightBhatlambda_k (the right component vectors, q times r) for each lambda value. Here, n is the number of samples, p is the number of variables of X, q is the number of variables of Y, and r is the specified rank in the Ridge Redundancy Analysis.
Usage
get_rlist(LV, RV, nrank)
Arguments
LV |
A numeric matrix of the left component vectors. |
RV |
A numeric matrix of the right component vectors. |
nrank |
A numeric vector indicating the rank(s) of Bhat. |
Value
A list of matrices of rank-specific combinations of LV and RV.
Generate simulated data for Ridge Redundancy Analysis (RDA).
Description
The function generates simulated data for Ridge Redundancy Analysis (RDA). It creates two data matrices, X and Y, based on a set of shared latent variables H. The function adds noise to the data and returns a list containing the matrices X, Y, the latent variables H, and the regression coefficients theta.y used for generating Y.
Usage
rdasim1(n, p, q, k, s2n = c(5, 5))
Arguments
n |
The number of samples. |
p |
The number of variables of X. |
q |
The number of variables of Y. |
k |
The number of latent variables. |
s2n |
The numeric parameters of signal to noise ratio for X and Y, default value is c(1,1). |
Value
A list containing matrices X, Y, H, and theta.y.
Examples
# Example usage of rdasim1
set.seed(10)
sim_data <- rdasim1(n = 10, p = 5, q = 3, k = 2)
str(sim_data)
Generate simulated data for Ridge Redundancy Analysis (RDA).
Description
The rdasim2 function generates simulated data for Ridge Redundancy Analysis (RDA) with adjustable signal-to-noise ratio and covariance structure for X. The data matrix Y is created by a low-rank model, where the rank is set by the product of two matrices A and C corresponding to the number of latent variables (k). The function allows control over the signal-to-noise ratio (s2n) and off-diagonal elements of the covariance matrix for X (xofd). It returns a list containing the matrices X, Y, the regression coefficient matrix B (obtained as the product of A and C), and the error matrix E.
Usage
rdasim2(n, p, q, k, s2n = 5, xofd = 0)
Arguments
n |
The number of samples. |
p |
The number of variables of X. |
q |
The number of variables of Y. |
k |
The number of latent variables. |
s2n |
The numeric parameter of signal to noise ratio, default value is 5. |
xofd |
The numeric parameter of the off-diagnal elements of covariance matrix of X, default is 0. |
Value
A list containing matrices X, Y, B, E.
Examples
# Example usage of rdasim2
set.seed(10)
sim_data2 <- rdasim2(n = 10, p = 5, q = 3, k = 2)
str(sim_data2)
Calculate the Bhat matrix from the return of the rrda.fit
function.
Description
This function calculates the coefficient Bhat (the coefficient of Ridge Redundancy Analysis for each parameter lambda and nrank) as a matrix form by using the Bhat components calculated by the rrda.fit function. This function obtain the matrix form of Bhat as follows
\hat{B}(\lambda, r) = FG^{\prime}
Here, the Bhat components F and G are obtained from the rrda.fit
function as follows
For i = 1, \ldots, r
, the matrices F
and G
are defined as:
F_{.i} = U_{\hat{B}(\lambda)}^{[i]}D_{\hat{B}(\lambda)}^{[i]}, \quad G_{.i} = V_{\hat{B}(\lambda)}^{[i]}
If the input already contains Bhat as matrix form (Bhat_mat), the function selects the preferred matrix from the list of Bhat matrices.
The function can handle different ranks (nrank) and ridge penalty values (lambda) based on the input. If nrank or lambda is NULL, the function will use the values from the Bhat components. Note that if lambda = NULL and B matrix is large (nrow(B)*ncol(B) > 100000), the function is performed for the minimum lambda value only.
Usage
rrda.coef(Bhat, nrank = NULL, lambda = NULL)
Arguments
Bhat |
A list of vectors of Bhat components, obtained by the |
nrank |
A numeric vector specifying the ranks of Bhat. Default is |
lambda |
A numeric vector of ridge penalty values. Default is |
Value
A list containing the Bhat matrix
Examples
set.seed(10)
simdata<-rdasim1(n = 100,p = 200,q = 200,k = 5)
X <- simdata$X
Y <- simdata$Y
Bhat <- rrda.fit(Y = Y, X = X, nrank = c(1:10))
Bhat_mat <- rrda.coef(Bhat = Bhat, nrank = 10)
Cross-validation for Ridge Redundancy Analysis
Description
This function performs cross-validation to evaluate the performance of Ridge Redundancy Analysis (RDA) models. It calculates the mean squared error (MSE) for different ranks and ridge penalty values through cross-validation folds. The function also supports centering and scaling of the input matrices.
The range of lambda for the cross-validation is automatically calculated following the method of "glmnet" (Friedman et al., 2010). When we have a matrix of response variables (Y; n times q matrix) and a matrix of explanatory variables (X; n times p matrix), the largest lambda for the validation is obtained as follows
\lambda_{\text{max}} = \frac{\max_{j \in \{1, 2, \dots, p\}} \sqrt{\sum_{k=1}^{q} \left( \sum_{i=1}^{n} (x_{ij}\cdot y_{ik}) \right)^2}}{N \times 10^{-3}}
Then, we define \lambda_{min}=10^{-4}\lambda_{max}
, and the sequence \lambda
is generated based on the range.
Also, to reduce the computation, the variable sampling is performed for the large matrix of X and Y (by default, when the number of the variables is over 1000). Alternatively, the range of lambda can be specified manually.
Usage
rrda.cv(
Y,
X,
maxrank = NULL,
lambda = NULL,
nfold = 5,
folds = NULL,
sample.X = 1000,
sample.Y = 1000,
scale.X = FALSE,
scale.Y = FALSE,
center.X = TRUE,
center.Y = TRUE,
verbose = TRUE
)
Arguments
Y |
A numeric matrix of response variables. |
X |
A numeric matrix of explanatory variables. |
maxrank |
A numeric vector specifying the maximum rank of the coefficient Bhat. Default is |
lambda |
A numeric vector of ridge penalty values. Default is |
nfold |
The number of folds for cross-validation. Default is 5. |
folds |
A vector specifying the folds. Default is |
sample.X |
A number of variables sampled from X for the lamdba range estimate. Default is 1000. |
sample.Y |
A number of variables sampled from Y for the lamdba range estimate. Default is 1000. |
scale.X |
Logical indicating if |
scale.Y |
Logical indicating if |
center.X |
Logical indicating if |
center.Y |
Logical indicating if |
verbose |
Logical indicating. If |
Value
A list containing the cross-validated MSE matrix, lambda values, rank values, and the optimal lambda and rank.
Examples
set.seed(10)
simdata<-rdasim1(n = 10,p = 30,q = 30,k = 3)
X <- simdata$X
Y <- simdata$Y
cv_result<- rrda.cv(Y = Y, X = X, maxrank = 5, nfold = 5)
rrda.summary(cv_result = cv_result)
##Complete Example##
# library(future) # <- if you want to compute in parallel
# plan(multisession) # <- if you want to compute in parallel
# cv_result<- rrda.cv(Y = Y, X = X, maxrank = 5, nfold = 5) # cv
# plan(multisession) # <- To come back to sequential computing
# rrda.summary(cv_result = cv_result) # cv result
p <- rrda.plot(cv_result) # cv result plot
print(p)
h <- rrda.heatmap(cv_result) # cv result heatmao
print(h)
estimated_lambda<-cv_result$opt_min$lambda # selected parameter
estimated_rank<-cv_result$opt_min$rank # selected parameter
Bhat <- rrda.fit(Y = Y, X = X, nrank = estimated_rank,lambda = estimated_lambda) # fitting
Bhat_mat<-rrda.coef(Bhat)
Yhat_mat <- rrda.predict(Bhat = Bhat, X = X) # prediction
Yhat<-Yhat_mat[[1]][[1]][[1]] # predicted values
cor_Y_Yhat<-diag(cor(Y,Yhat)) # correlation
summary(cor_Y_Yhat)
Calculate the coefficient Bhat by Ridge Redundancy Analysis.
Description
This function performs Ridge Redundancy Analysis (RRDA) to obtain the coefficient Bhat, which models the relationship between a matrix of response variables (Y; n \times q
matrix) and a matrix of explanatory variables (X; n \times p
matrix). Especially, the function is designed to facilitate a high-dimensional computation and storing (for the details, refer to the article Yoshioka et al. 2025).
The Ridge Redundancy Analysis model is represented as:
Y = XB + E
where:
-
Y
is the response matrix (n \times q
), -
X
is the predictor matrix (n \times p
), -
B
is the regression coefficient matrix (p \times q
), and -
E
is the error matrix (n \times q
).
The regularized estimate of B
is described as:
\hat{B}(\lambda) = \left(X'X + \lambda P_{X'}\right)^{-} X'Y
Additionally, the regularized-rank-restricted estimation of B
is represented as:
\hat{B}(\lambda, r) = U_{\hat{B}(\lambda)}^{[r]} D_{\hat{B}(\lambda)}^{[r]} V_{\hat{B}(\lambda)}^{[r]'}
Here:
-
U_{\hat{B}(\lambda)}^{[r]}
is ap \times r
matrix, -
D_{\hat{B}(\lambda)}^{[r]}
is ar \times r
diagonal matrix, and -
V_{\hat{B}(\lambda)}^{[r]}
is aq \times r
matrix.
The user can specify ranks (nrank), ridge penalty values (lambda), and whether to center and scale the X and Y matrices.
The Bhat can be returned as either component vectors or matrices. To store a large size of matrix, the coefficient Bhat is by default stored as LeftBhatlambda_k (F; p \times r
matrix) and RightBhatlambda_k (G; q \times r
). Here, r is the specified rank (nrank) in the Ridge Redundancy Analysis formula.
For i = 1, \ldots, r
, the matrices F
and G
are defined as:
F_{.i} = U_{\hat{B}(\lambda)}^{[i]}D_{\hat{B}(\lambda)}^{[i]}, \quad G_{.i} = V_{\hat{B}(\lambda)}^{[i]}
These definitions allow the decomposition of \hat{B}(\lambda)
into rank-specific components, facilitating the storing of the high-dimensional regression coefficients. To reconstruct the matrix form of Bhat, you can use the rrda.coef
function.
Usage
rrda.fit(
Y,
X,
nrank = NULL,
lambda = 1,
component = TRUE,
center.X = TRUE,
center.Y = TRUE,
scale.X = FALSE,
scale.Y = FALSE
)
Arguments
Y |
A numeric matrix of response variables. |
X |
A numeric matrix of explanatory variables. |
nrank |
A numeric vector specifying the ranks of Bhat. Default is |
lambda |
A numeric vector of ridge penalty values. Default value is 1. |
component |
Logical indicating if Bhat is returned as vectors or matrices. If |
center.X |
Logical indicating if |
center.Y |
Logical indicating if |
scale.X |
Logical indicating if |
scale.Y |
Logical indicating if |
Value
A list containing Bhat components or Bhat matrices (the coefficient of Ridge Redundancy Analysis for each parameter lambda and nrank), ranks, and lambda values.
Examples
set.seed(10)
simdata<-rdasim1(n = 100,p = 200,q = 200,k = 5)
X <- simdata$X
Y <- simdata$Y
# Sequential
Bhat <- rrda.fit(Y = Y, X = X, nrank = c(1:10))
names(Bhat)
Heatmap of the results of cross-validation for Bhat obtained from the rrda.cv
function.
Description
This function creates a heatmap to visualize the Mean Squared Error (MSE) results from the cross-validation of the Bhat matrix obtained from the rrda.cv function. The heatmap displays the MSE for different ranks of Bhat and values of the regularization parameter lambda, allowing users to visually assess the best combination of rank and lambda. The function also allows the user to highlight the points corresponding to the minimum MSE and the 1-standard error rule, helping to identify optimal model parameters.
Usage
rrda.heatmap(
cv_result,
nrank = NULL,
min_l = NULL,
max_l = NULL,
highlight_min = TRUE,
title = NULL
)
Arguments
cv_result |
A result list from the function |
nrank |
A numeric vector specifying the ranks of Bhat to be plotted. Default is |
min_l |
Minimum lambda value to be plotted. Default is |
max_l |
Maximum lambda value to be plotted. Default is |
highlight_min |
Logical indicating if the marks should be plotted on the best prediction point, and 1se point. Default is |
title |
Title of the figure |
Value
A heatmap of MSE cross-validation results.
Examples
set.seed(10)
simdata<-rdasim1(n = 10,p = 30,q = 30,k = 3) # data generation
X <- simdata$X
Y <- simdata$Y
cv_result<- rrda.cv(Y = Y, X = X, maxrank = 5, nfold = 5) # cv
rrda.summary(cv_result = cv_result)
rrda.heatmap(cv_result=cv_result)
Plot the results of cross-validation for Bhat obtained from the rrda.cv
function.
Description
This function visualizes the results of cross-validation for the estimated Bhat matrix obtained from the rrda.cv function. It creates a plot of the Mean Squared Error (MSE) for each combination of rank and lambda regularization parameter, allowing for the selection of specific ranks and lambda ranges to be plotted. Error bars representing the standard error of the MSE can be displayed for the best rank.
Usage
rrda.plot(
cv_result,
nrank = NULL,
min_l = NULL,
max_l = NULL,
show_error_bar = FALSE,
title = NULL
)
Arguments
cv_result |
A result list from the function |
nrank |
A numeric vector specifying the ranks of Bhat to be plotted. Default is |
min_l |
Minimum lambda value to be plotted. Default is |
max_l |
Maximum lambda value to be plotted. Default is |
show_error_bar |
Logical value indicating if the error bar is shown on the line that gives the best MSE value. |
title |
Title of the figure |
Value
A plot of MSE cross-validation results.
Examples
set.seed(10)
simdata<-rdasim1(n = 10,p = 30,q = 30,k = 3) # data generation
X <- simdata$X
Y <- simdata$Y
cv_result<- rrda.cv(Y = Y, X = X, maxrank = 5, nfold = 5) # cv
rrda.summary(cv_result = cv_result)
rrda.plot(cv_result = cv_result)
Calculate the predicted matrix Yhat using the coefficient Bhat obtained from the rrda.fit
function.
Description
This function calculates the predicted response matrix (Yhat) using Bhat (the coefficient of Ridge Redundancy Analysis) obtained from the rrda.fit function. The user can specify the ranks and lambda values to be used for the prediction. If not provided, the ranks and lambda values are set based on the Bhat components. The function returns the predicted Yhat matrix, as well as the ranks and lambda values used.
The predicted response matrix is calculated as:
\hat{Y}_r = X \left( \sum_{i=1}^{r} (F_{.i} G_{.i}^{\prime}) \right)
= \sum_{i=1}^{r} (X F_{.i} G_{.i}^{\prime})
= \sum_{i=1}^{r} (\tilde{X}_{.i} G_{.i}^{\prime})
Here, \tilde{X} = X F
, and r
is the rank of \hat{B}(\lambda, r)
.
The regularized-rank-restricted estimation of B
is obtained from the rrda.fit
function:
\hat{B}(\lambda, r) = FG^{\prime}
Usage
rrda.predict(Bhat, X, nrank = NULL, lambda = NULL)
Arguments
Bhat |
A list of vectors of Bhat components, obtained by the |
X |
A numeric matrix of explanatory variables. |
nrank |
A numeric vector specifying the ranks of Bhat. Default is |
lambda |
A numeric vector of ridge penalty values. Default is |
Value
A list containing the predicted Yhat matrix, ranks, and lambda values.
Examples
set.seed(10)
simdata<-rdasim1(n = 100,p = 200,q = 200,k = 5)
X <- simdata$X
Y <- simdata$Y
Bhat <- rrda.fit(Y = Y, X = X, nrank = c(1:10))
Yhat_mat <- rrda.predict(Bhat = Bhat, X = X, nrank = 10)
Summarize the results of cross-validation for the coefficient Bhat obtained from the rrda.cv
function.
Description
The function provides a summary of the results from cross-validation for Bhat obtained using the rrda.cv function. It displays the Mean Squared Error (MSE), rank, and lambda values for different cross-validation options.
Usage
rrda.summary(cv_result = cv_result)
Arguments
cv_result |
A result list from the function |
Value
Prints the estimated Coefficients, rank, lambda, and MSE from cross-validation.
Compute the square root of the inverse of (d^2 + lambda).
Description
Compute the square root of the inverse of (d^2 + lambda).
Usage
sqrt_inv_d2_lambda(d, lambda)
Arguments
d |
A scalar of singular value of X. |
lambda |
A scalar of the ridge penalty. |
Value
A scalar of the square root of the inverse of (d^2 + lambda).
Scale a matrix using unbiased estimators for the mean and standard deviation.
Description
Scale a matrix using unbiased estimators for the mean and standard deviation.
Usage
unbiased_scale(x)
Arguments
x |
A numeric matrix to be scaled. |
Value
A scaled numeric matrix.
Unscale a matrix based on provided mean and standard deviation values.
Description
Unscale a matrix based on provided mean and standard deviation values.
Usage
unscale_matrices(mat, means, std = NULL)
Arguments
mat |
A numeric matrix to be unscaled. |
means |
A numeric vector of means for each column. |
std |
A numeric vector of standard deviations for each column (optional). |
Value
An unscaled numeric matrix.
Apply unscaling to a nested list of matrices using specified mean and standard deviation values.
Description
Apply unscaling to a nested list of matrices using specified mean and standard deviation values.
Usage
unscale_nested_matrices_map(mat, means, std = NULL)
Arguments
mat |
A nested list of matrices. |
means |
A numeric vector of means for each matrix. |
std |
A numeric vector of standard deviations for each matrix (optional). |
Value
A nested list of unscaled matrices.