Type: | Package |
Title: | Efficiently Impute Large Scale Incomplete Matrix |
Version: | 0.2.4 |
Date: | 2024-07-22 |
Author: | Zhe Gao [aut, cre], Jin Zhu [aut], Junxian Zhu [aut], Xueqin Wang [aut], Yixuan Qiu [cph], Gael Guennebaud [cph, ctb], Jitse Niesen [cph, ctb], Ray Gardner [ctb] |
Maintainer: | Zhe Gao <gaozh8@mail.ustc.edu.cn> |
Description: | Efficiently imputes large-scale matrices with missing values via an unbiased low-rank matrix approximation. The main approach is the Hard-Impute algorithm proposed in https://www.jmlr.org/papers/v11/mazumder10a.html, which gains a substantial computational advantage from truncated singular value decomposition. |
License: | GPL-3 | file LICENSE |
Imports: | Rcpp (≥ 0.12.6) |
LinkingTo: | Rcpp, RcppEigen |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.1 |
NeedsCompilation: | yes |
Packaged: | 2024-07-22 12:57:11 UTC; AMA |
Suggests: | knitr |
VignetteBuilder: | knitr |
Repository: | CRAN |
Date/Publication: | 2024-07-22 22:10:05 UTC |
Data standardization
Description
Standardize the rows and/or columns of a matrix to have zero mean and/or unit variance.
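To make the idea of standardizing a matrix with missing entries concrete, here is a minimal base R sketch of the mean-only case. It is not the package's implementation: the helper name `center_rows_cols`, the alternating update order, and the stopping rule are all assumptions made for illustration. It alternately subtracts row and column means computed from the observed entries until the shifts become negligible.

```r
# Hypothetical helper, not part of the package: alternately center rows and
# columns using only the observed (non-NA) entries.
center_rows_cols <- function(x, thresh = 1e-05, maxit = 100) {
  alpha <- numeric(nrow(x))  # accumulated row offsets
  beta <- numeric(ncol(x))   # accumulated column offsets
  for (it in seq_len(maxit)) {
    row_shift <- rowMeans(x, na.rm = TRUE)
    x <- x - row_shift                 # recycling subtracts row_shift[i] from row i
    col_shift <- colMeans(x, na.rm = TRUE)
    x <- sweep(x, 2, col_shift)        # subtract col_shift[j] from column j
    alpha <- alpha + row_shift
    beta <- beta + col_shift
    # Assumed stopping rule: stop once both shifts are negligible.
    if (max(abs(row_shift), abs(col_shift)) < thresh) break
  }
  list(x.st = x, alpha = alpha, beta = beta)
}

# Small example: a 6 x 8 matrix with one missing entry per row.
set.seed(1)
x0 <- matrix(rnorm(48), 6, 8)
x0[cbind(1:6, 1:6)] <- NA
res <- center_rows_cols(x0)
round(rowMeans(res$x.st, na.rm = TRUE), 4)
```

Because the offsets are accumulated, the observed entries can be recovered as `res$x.st + outer(res$alpha, res$beta, "+")`. Scaling to unit variance would need an analogous pair of multiplicative updates, which this sketch leaves out.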
Usage
biscale(x, thresh.sd = 1e-05, maxit.sd = 100, control = list(...), ...)
Arguments
x |
an |
thresh.sd |
convergence threshold, measured as the relative change in the Frobenius norm between two successive estimates. |
maxit.sd |
maximum number of iterations. |
control |
a list of parameters that control details of the standardization procedure. See biscale.control. |
... |
arguments to be used to form the default control argument if it is not supplied directly. |
Value
A list containing the following components
x.st |
The matrix after standardization. |
alpha |
The row means after the iterative process. |
beta |
The column means after the iterative process. |
tau |
The row standard deviations after the iterative process. |
gamma |
The column standard deviations after the iterative process. |
References
Hastie, Trevor, Rahul Mazumder, Jason D. Lee, and Reza Zadeh. Matrix completion and low-rank SVD via fast alternating least squares. The Journal of Machine Learning Research 16, no. 1 (2015): 3367-3402.
Examples
################# Quick Start #################
m <- 100
n <- 100
r <- 10
x_na <- incomplete.generator(m, n, r)
###### Standardize both mean and variance ######
xs <- biscale(x_na, row.mean = TRUE, row.std = TRUE, col.mean = TRUE, col.std = TRUE)
###### Only standardize mean ######
xs_mean <- biscale(x_na, row.mean = TRUE, col.mean = TRUE)
###### Only standardize variance ######
xs_std <- biscale(x_na, row.std = TRUE, col.std = TRUE)
Control for the standardization procedure
Description
Various parameters that control aspects of the standardization procedure.
Usage
biscale.control(
row.mean = FALSE,
row.std = FALSE,
col.mean = FALSE,
col.std = FALSE
)
Arguments
row.mean |
if |
row.std |
if |
col.mean |
similar to |
col.std |
similar to |
Value
A list with components named as the arguments.
Efficiently impute missing values for a large scale matrix
Description
Fit a low-rank matrix approximation to a matrix with missing values. The algorithm iterates like EM: filling the missing values with the current guess, and then approximating the complete matrix via truncated SVD.
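The EM-like iteration described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's compiled implementation: the helper name `hard_impute_sketch` is hypothetical, a full `svd()` stands in for the truncated solver, the initial guess is assumed to be the overall mean of the observed entries, and observed entries are always kept.

```r
# Hypothetical helper, not the package's implementation: rank-r Hard-Impute
# with a full base-R svd() in place of a truncated SVD solver.
hard_impute_sketch <- function(x, r, thresh = 1e-05, maxit = 100) {
  miss <- is.na(x)
  z <- x
  z[miss] <- mean(x, na.rm = TRUE)  # assumed initial guess: overall observed mean
  for (it in seq_len(maxit)) {
    s <- svd(z, nu = r, nv = r)
    # Best rank-r approximation of the current completed matrix.
    low <- s$u %*% (s$d[seq_len(r)] * t(s$v))
    z_new <- x
    z_new[miss] <- low[miss]        # keep observed entries, refresh the guesses
    # Relative change in Frobenius norm between successive estimates.
    change <- sqrt(sum((z_new - z)^2)) / sqrt(sum(z^2))
    z <- z_new
    if (change < thresh) break
  }
  list(x.imp = z, iter.count = it)
}

# Noise-free rank-2 example with 20% of the entries hidden.
set.seed(1)
x_true <- matrix(rnorm(30 * 2), 30, 2) %*% matrix(rnorm(2 * 20), 2, 20)
x_obs <- x_true
x_obs[sample(length(x_obs), 120)] <- NA
fit <- hard_impute_sketch(x_obs, r = 2)
fit$iter.count
```

Each pass costs one SVD of the completed matrix, which is why a truncated or randomized SVD matters for large inputs: only the leading r singular triplets are needed.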
Usage
eimpute(
x,
r,
svd.method = c("tsvd", "rsvd"),
noise.var = 0,
thresh = 1e-05,
maxit = 100,
init = FALSE,
init.mat = 0,
override = FALSE,
control = list(...),
...
)
Arguments
x |
an |
r |
the rank of low-rank matrix for approximating |
svd.method |
a character string indicating the truncated SVD method.
If |
noise.var |
the variance of noise. |
thresh |
convergence threshold, measured as the relative change in the Frobenius norm between two successive estimates. |
maxit |
maximal number of iterations. |
init |
if init = FALSE (the default), the missing entries are initialized with the mean. |
init.mat |
the initialization matrix. |
override |
logical value indicating whether the observed elements in |
control |
a list of parameters that control details of the standardization procedure. See biscale.control. |
... |
arguments to be used to form the default control argument if it is not supplied directly. |
Value
A list containing the following components
x.imp |
the matrix after completion. |
rmse |
the relative mean square error of matrix completion, i.e., training error. |
iter.count |
the number of iterations. |
References
Rahul Mazumder, Trevor Hastie and Rob Tibshirani (2010). Spectral Regularization Algorithms for Learning Large Incomplete Matrices. Journal of Machine Learning Research 11, 2287-2322.
Nathan Halko, Per-Gunnar Martinsson and Joel A. Tropp (2011). Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. SIAM Review 53(2), 217-288.
Examples
################# Quick Start #################
m <- 100
n <- 100
r <- 10
x_na <- incomplete.generator(m, n, r)
head(x_na[, 1:6])
x_impute <- eimpute(x_na, r)
head(x_impute[["x.imp"]][, 1:6])
x_impute[["rmse"]]
Incomplete data generator
Description
Generate a matrix with missing values, where the positions of the missing values are chosen uniformly at random.
Usage
incomplete.generator(m, n, r, snr = 3, prop = 0.5, seed = 1)
Arguments
m |
the number of rows of the matrix. |
n |
the number of columns of the matrix. |
r |
the rank of the matrix. |
snr |
the signal-to-noise ratio in generating the matrix. Default |
prop |
the proportion of missing observations. Default |
seed |
the random seed. Default |
Details
We generate the matrix as $UV + \epsilon$, where $U$ and $V$ are $m$ by $r$ and $r$ by $n$ matrices whose entries follow the standard normal distribution, and the entries of $\epsilon$ follow a normal distribution with mean 0 and variance $\frac{r}{snr}$.
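The construction described above can be mirrored in base R as a rough sketch. The helper name `generate_incomplete_sketch` is hypothetical, and the way the seed is applied and the exact number of hidden entries are assumptions; the package's own generator may differ. Since each entry of the product of U and V is a sum of r products of independent standard normals, its variance is r, so a noise variance of r / snr gives a signal-to-noise ratio of snr.

```r
# Hypothetical helper mirroring the description; the package's generator may
# differ in details such as how the seed is applied.
generate_incomplete_sketch <- function(m, n, r, snr = 3, prop = 0.5, seed = 1) {
  set.seed(seed)
  u <- matrix(rnorm(m * r), m, r)
  v <- matrix(rnorm(r * n), r, n)
  eps <- matrix(rnorm(m * n, sd = sqrt(r / snr)), m, n)  # variance r / snr
  x <- u %*% v + eps
  # Assumption: exactly round(prop * m * n) positions, chosen uniformly at random.
  x[sample(m * n, round(prop * m * n))] <- NA
  x
}

x_sim <- generate_incomplete_sketch(100, 100, 10)
dim(x_sim)
mean(is.na(x_sim))  # proportion of missing entries
```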
Value
A matrix with missing values.
Examples
m <- 100
n <- 100
r <- 10
x_na <- incomplete.generator(m, n, r)
head(x_na[, 1:6])
Search rank magnitude of the best approximating matrix
Description
Estimate a suitable rank for fitting a low-rank matrix approximation to a matrix with missing values. The algorithm uses GIC or CV to search for the rank in a given range, and then fills the missing values using the estimated rank.
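As a rough illustration of the cross-validation rule, the sketch below hides one fold of the observed entries at a time, fits each candidate rank on the rest, and keeps the rank with the smallest prediction error on the hidden entries. Both helpers (`fit_rank`, `cv_rank_sketch`) are hypothetical and written in base R; the fixed iteration count, the fold assignment, and the error measure are assumptions, and the GIC rule is not covered here.

```r
# Hypothetical helpers, not the package's implementation.
# Fixed-iteration rank-r Hard-Impute fit using base-R svd().
fit_rank <- function(x, r, maxit = 50) {
  miss <- is.na(x)
  z <- x
  z[miss] <- mean(x, na.rm = TRUE)
  for (it in seq_len(maxit)) {
    s <- svd(z, nu = r, nv = r)
    low <- s$u %*% (s$d[seq_len(r)] * t(s$v))
    z[miss] <- low[miss]
  }
  z
}

cv_rank_sketch <- function(x, r.min = 1, r.max = 5, nfolds = 5, seed = 1) {
  set.seed(seed)
  obs <- which(!is.na(x))
  # Assign every observed entry to one of nfolds folds at random.
  fold <- sample(rep_len(seq_len(nfolds), length(obs)))
  ranks <- r.min:r.max
  cv_err <- sapply(ranks, function(r) {
    mean(sapply(seq_len(nfolds), function(k) {
      held <- obs[fold == k]
      x_train <- x
      x_train[held] <- NA          # hide one fold of observed entries
      z <- fit_rank(x_train, r)
      mean((z[held] - x[held])^2)  # prediction error on the hidden entries
    }))
  })
  list(r.est = ranks[which.min(cv_err)], cv.err = cv_err)
}

# Noise-free rank-2 example with 20% of the entries missing.
set.seed(2)
x_cv <- matrix(rnorm(40 * 2), 40, 2) %*% matrix(rnorm(2 * 30), 2, 30)
x_cv[sample(length(x_cv), 240)] <- NA
res_cv <- cv_rank_sketch(x_cv, 1, 5)
res_cv$r.est
```

The cost grows with the number of candidate ranks times the number of folds, which is one reason an information criterion such as GIC, needing a single fit per rank, can be attractive for large matrices.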
Usage
r.search(
x,
r.min = 1,
r.max = "auto",
svd.method = c("tsvd", "rsvd"),
rule.type = c("gic", "cv"),
noise.var = 0,
init = FALSE,
init.mat = 0,
maxit.rank = 1,
nfolds = 5,
thresh = 1e-05,
maxit = 100,
override = FALSE,
control = list(...),
...
)
Arguments
x |
an |
r.min |
the start rank for searching. Default |
r.max |
the max rank for searching. |
svd.method |
a character string indicating the truncated SVD method.
If |
rule.type |
a character string indicating the information criterion rule.
If |
noise.var |
the variance of noise. |
init |
if init = FALSE (the default), the missing entries are initialized with the mean. |
init.mat |
the initialization matrix. |
maxit.rank |
maximal number of iterations in searching rank. Default |
nfolds |
number of folds in cross validation. Default |
thresh |
convergence threshold, measured as the relative change in the Frobenius norm between two successive estimates. |
maxit |
maximal number of iterations. |
override |
logical value indicating whether the observed elements in |
control |
a list of parameters that control details of the standardization procedure. See biscale.control. |
... |
arguments to be used to form the default control argument if it is not supplied directly. |
Value
A list containing the following components
x.imp |
the matrix after completion with the estimated rank. |
r.est |
the rank estimation. |
rmse |
the relative mean square error of matrix completion, i.e., training error. |
iter.count |
the number of iterations. |
Examples
################# Quick Start #################
m <- 100
n <- 100
r <- 10
x_na <- incomplete.generator(m, n, r)
head(x_na[, 1:6])
x_impute <- r.search(x_na, 1, 15, "rsvd", "gic")
x_impute[["r.est"]]