Type: | Package |
Title: | Vintage Sparse PCA for Semi-Parametric Factor Analysis |
Version: | 0.1.2 |
Description: | Provides fast spectral estimation of latent factors in random dot product graphs using the vsp estimator. Under mild assumptions, the vsp estimator is consistent for (degree-corrected) stochastic blockmodels, (degree-corrected) mixed-membership stochastic blockmodels, and degree-corrected overlapping stochastic blockmodels. |
License: | MIT + file LICENSE |
URL: | https://rohelab.github.io/vsp/, https://github.com/RoheLab/vsp |
BugReports: | https://github.com/RoheLab/vsp/issues |
Depends: | R (≥ 3.1) |
Imports: | clue, ggplot2, glue, invertiforms, LRMF3, magrittr, Matrix, rlang, RSpectra, stats, tibble, withr |
Suggests: | covr, dplyr, GGally, igraph, igraphdata, knitr, purrr, rmarkdown, scales, testthat (≥ 3.0.0), tidygraph, tidyr |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2024-11-05 17:15:51 UTC; alex |
Author: | Karl Rohe [aut],
Muzhe Zeng [aut],
Alex Hayes |
Maintainer: | Alex Hayes <alexpghayes@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-11-05 19:40:02 UTC |
vsp: Vintage Sparse PCA for Semi-Parametric Factor Analysis
Description
Provides fast spectral estimation of latent factors in random dot product graphs using the vsp estimator. Under mild assumptions, the vsp estimator is consistent for (degree-corrected) stochastic blockmodels, (degree-corrected) mixed-membership stochastic blockmodels, and degree-corrected overlapping stochastic blockmodels.
Author(s)
Maintainer: Alex Hayes <alexpghayes@gmail.com> [copyright holder]
Authors:
Karl Rohe karlrohe@stat.wisc.edu
Muzhe Zeng mzeng6@wisc.edu
Fan Chen
See Also
Useful links:
Report bugs at https://github.com/RoheLab/vsp/issues
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Find features most associated with cluster membership
Description
Find features most associated with cluster membership
Usage
bff(loadings, features, num_best)
Arguments
loadings |
An |
features |
An |
num_best |
An integer indicating how many of the top features for differentiating between loadings you want. |
Details
See vignette("bff").
Value
An n by k matrix whose [i, j] entry is the ith "most important" feature for cluster j.
Add Z factor loadings to node table of tidygraph
Description
Add Z factor loadings to node table of tidygraph
Usage
bind_varimax_z(graph, fa, ...)
bind_varimax_y(graph, fa, ...)
bind_svd_u(graph, fa, ...)
bind_svd_v(graph, fa, ...)
Arguments
graph |
A tidygraph::tbl_graph object. |
fa |
Optionally, a vsp object to extract varimax loadings from. If you do not pass a vsp object, one will be created. |
... |
Arguments passed on to
|
Value
The same graph object with columns factor1, ..., factor{rank} in the table of node information.
Functions
- bind_varimax_y(): Add Y factor loadings to node table of tidygraph
- bind_svd_u(): Add left singular vectors to node table of tidygraph
- bind_svd_v(): Add right singular vectors to node table of tidygraph
Get left singular vectors in a tibble
Description
Get left singular vectors in a tibble
Usage
get_svd_u(fa, factors = 1:fa$rank)
get_svd_v(fa, factors = 1:fa$rank)
get_varimax_z(fa, factors = 1:fa$rank)
get_varimax_y(fa, factors = 1:fa$rank)
Arguments
fa |
A |
factors |
The specific columns to index into. The most reliable option here is to index with an integer vector of column indices, but you could also use a character vector if columns have been named. By default returns all factors/singular vectors. |
Value
A tibble::tibble() with one row for each node and one column for each requested factor or singular vector, plus an additional id column.
Functions
- get_svd_v(): Get right singular vectors in a tibble
- get_varimax_z(): Get varimax Z factors in a tibble
- get_varimax_y(): Get varimax Y factors in a tibble
Examples
data(enron, package = "igraphdata")
fa <- vsp(enron, rank = 30)
fa
get_svd_u(fa)
get_svd_v(fa)
get_varimax_z(fa)
get_varimax_y(fa)
Get most important hubs for each Z factor
Description
Get most important hubs for each Z factor
Usage
get_z_hubs(fa, hubs_per_factor = 10, factors = 1:fa$rank)
get_y_hubs(fa, hubs_per_factor = 10, factors = 1:fa$rank)
Arguments
fa |
A |
hubs_per_factor |
The number of important nodes to get per latent factor. Defaults to 10. |
factors |
The specific columns to index into. The most reliable option here is to index with an integer vector of column indices, but you could also use a character vector if columns have been named. By default returns all factors/singular vectors. |
Value
A tibble::tibble() where each row corresponds to a single hub, with three columns:
- id: Node id of hub node
- factor: Which factor that node is a hub for. Nodes can be hubs of multiple factors.
- loading: The actual value of the hub's factor loading for that factor.
Functions
- get_y_hubs(): Get most important hubs for each Y factor
Examples
data(enron, package = "igraphdata")
fa <- vsp(enron, rank = 30)
fa
get_z_hubs(fa)
get_y_hubs(fa)
Plot pairs of inverse participation ratios for singular vectors
Description
When the inverse participation ratio (IPR) of a singular vector is O(1) rather than O(1 / sqrt(n)), the singular vector may be localizing on a small subset of nodes. Localization often indicates overfitting. If you see IPR values that are not close to zero (judging "close to zero" takes some experience), investigate further to determine whether localization is present and whether it corresponds to overfitting. Note, however, that not all localization is overfitting.
Usage
plot_ipr_pairs(fa)
Arguments
fa |
A |
Value
A ggplot2::ggplot() plot or GGally::ggpairs() plot.
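As a rough illustration of the localization idea in the description above, the following base R sketch computes an inverse participation ratio for a flat vector and for a vector concentrated on a single node. The helper ipr_sketch and its definition (sum of fourth powers of the unit-normalized entries) are assumptions made for illustration; the package may normalize differently, which changes the rate in n but not the qualitative contrast.

```r
# Hypothetical helper, not part of vsp: IPR of a vector after scaling it
# to unit L2 norm, defined here as the sum of fourth powers of its entries.
ipr_sketch <- function(v) {
  v <- v / sqrt(sum(v^2))
  sum(v^4)
}

n <- 100
flat <- rep(1, n)             # mass spread evenly over all nodes
spike <- c(1, rep(0, n - 1))  # all mass on a single node

ipr_flat <- ipr_sketch(flat)    # equals 1 / n under this definition
ipr_spike <- ipr_sketch(spike)  # equals 1, i.e. fully localized
```

Under this convention, values near 1 flag vectors dominated by a handful of nodes, which are the ones worth inspecting by hand.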
Plot the mixing matrix B
Description
Plot the mixing matrix B
Usage
plot_mixing_matrix(fa)
Arguments
fa |
A |
Value
A ggplot2::ggplot() plot or GGally::ggpairs() plot.
Create a pairs plot of select Z factors
Description
To avoid overplotting, plots data for a maximum of 1000 nodes. If there are more than 1000 nodes, samples 1000 nodes at random with probability proportional to their row norms (i.e. nodes with embeddings larger in magnitude are more likely to be sampled).
Usage
plot_varimax_z_pairs(fa, factors = 1:min(5, fa$rank), ...)
plot_varimax_y_pairs(fa, factors = 1:min(5, fa$rank), ...)
plot_svd_u(fa, factors = 1:min(5, fa$rank))
plot_svd_v(fa, factors = 1:min(5, fa$rank))
Arguments
fa |
A |
factors |
The specific columns to index into. The most reliable option here is to index with an integer vector of column indices, but you could also use a character vector if columns have been named. By default plots the first five factors/singular vectors, or all of them if rank is less than five. |
... |
Arguments passed on to
|
Value
A ggplot2::ggplot() plot or GGally::ggpairs() plot.
Functions
- plot_varimax_y_pairs(): Create a pairs plot of select Y factors
- plot_svd_u(): Create a pairs plot of select left singular vectors
- plot_svd_v(): Create a pairs plot of select right singular vectors
Examples
data(enron, package = "igraphdata")
fa <- vsp(enron, rank = 3)
plot_varimax_z_pairs(fa)
plot_varimax_y_pairs(fa)
plot_svd_u(fa)
plot_svd_v(fa)
screeplot(fa)
plot_mixing_matrix(fa)
plot_ipr_pairs(fa)
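The subsampling described for the pairs plots (at most 1000 nodes, drawn with probability proportional to row norms) can be sketched in base R as follows. The matrix Z here is simulated stand-in data, and sampling without replacement is an assumption; the package's internal sampling may differ in detail.

```r
set.seed(42)

# Stand-in for a matrix of factor loadings (one row per node); made up here.
Z <- matrix(rnorm(5000 * 3), nrow = 5000, ncol = 3)
max_nodes <- 1000

# L2 norm of each row, used as the (unnormalized) sampling weight.
row_norms <- sqrt(rowSums(Z^2))

if (nrow(Z) > max_nodes) {
  # Sample without replacement, weighting each node by its row norm.
  keep <- sample(nrow(Z), size = max_nodes, prob = row_norms)
} else {
  keep <- seq_len(nrow(Z))
}

Z_sub <- Z[keep, , drop = FALSE]
```

Weighting by row norm keeps the nodes with large loadings, which carry most of the visible structure in a pairs plot, while thinning the dense cloud near the origin.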
Safe L2 row normalization
Description
Helper function for Kaiser normalization to handle rows with zero (or numerically zero) norm, which results in a divide by zero error in the stats::varimax() implementation.
Usage
safe_row_l2_normalize(x, eps = 1e-10)
Arguments
x |
A matrix to row normalize. |
eps |
Tolerance to use when assessing if squared L2 row norm is numerically larger or smaller than zero. |
Value
The row-rescaled matrix
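The idea behind the eps tolerance can be sketched in base R. The function below is a hypothetical re-implementation written for illustration (it is not the package source): rows whose squared norm falls below eps are divided by 1 instead of by their norm.

```r
# Hypothetical re-implementation for illustration; not the package source.
row_l2_normalize_sketch <- function(x, eps = 1e-10) {
  sq_norms <- rowSums(x^2)
  # Rows whose squared norm is numerically zero are divided by 1,
  # so they stay (near) zero instead of becoming NaN.
  sq_norms[sq_norms < eps] <- 1
  x / sqrt(sq_norms)
}

x <- rbind(c(3, 4), c(0, 0))

normalized <- row_l2_normalize_sketch(x)
naive <- x / sqrt(rowSums(x^2))  # second row becomes NaN (0 / 0)
```

In this sketch the first row is rescaled to unit length and the all-zero row is returned unchanged, whereas the naive version produces NaN entries that would propagate into any later rotation.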
Create a screeplot from a factor analysis object
Description
Create a screeplot from a factor analysis object
Usage
## S3 method for class 'vsp_fa'
screeplot(x, ...)
Arguments
x |
A |
... |
Ignored, included only for consistency with S3 generic. |
Value
A ggplot2::ggplot() plot or GGally::ggpairs() plot.
Give the dimensions of Z factors informative names
Description
Give the dimensions of Z factors informative names
Usage
set_z_factor_names(fa, names)
set_y_factor_names(fa, names)
Arguments
fa |
A |
names |
New names for the Z/Y factors. |
Value
A new vsp_fa() object in which the column names of Z and the row names of B have been set to names (for set_z_factor_names), or the column names of B and the column names of Y have been set to names (for set_y_factor_names).
Functions
- set_y_factor_names(): Give the dimensions of Y factors informative names
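A possible usage sketch, reusing the ml100k data from the vsp() example. It assumes that names is a character vector with one entry per factor, which the argument description does not state explicitly; the names factor_a and factor_b are made up. The calls are guarded so the snippet is skipped when the packages are not installed.

```r
# Assumed: `names` is a character vector with one entry per factor.
new_names <- c("factor_a", "factor_b")

if (requireNamespace("vsp", quietly = TRUE) &&
    requireNamespace("LRMF3", quietly = TRUE)) {
  library(vsp)
  library(LRMF3)  # attached as in the vsp() example, which uses ml100k

  fa <- vsp(ml100k, rank = 2)

  # Returns a new object; the original `fa` is left unchanged.
  fa_named <- set_z_factor_names(fa, new_names)
  colnames(fa_named$Z)
}
```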
Semi-Parametric Factor Analysis via Vintage Sparse PCA
Description
This code implements semi-parametric factor analysis via the vsp (vintage sparse PCA) estimator.
Usage
vsp(x, rank, ...)
## Default S3 method:
vsp(x, rank, ...)
## S3 method for class 'matrix'
vsp(
x,
rank,
...,
center = FALSE,
recenter = FALSE,
degree_normalize = TRUE,
renormalize = FALSE,
tau_row = NULL,
tau_col = NULL,
kaiser_normalize_u = FALSE,
kaiser_normalize_v = FALSE,
rownames = NULL,
colnames = NULL,
match_columns = TRUE
)
## S3 method for class 'Matrix'
vsp(
x,
rank,
...,
center = FALSE,
recenter = FALSE,
degree_normalize = TRUE,
renormalize = FALSE,
tau_row = NULL,
tau_col = NULL,
kaiser_normalize_u = FALSE,
kaiser_normalize_v = FALSE,
rownames = NULL,
colnames = NULL,
match_columns = TRUE
)
## S3 method for class 'dgCMatrix'
vsp(
x,
rank,
...,
center = FALSE,
recenter = FALSE,
degree_normalize = TRUE,
renormalize = FALSE,
tau_row = NULL,
tau_col = NULL,
kaiser_normalize_u = FALSE,
kaiser_normalize_v = FALSE,
rownames = NULL,
colnames = NULL,
match_columns = TRUE
)
## S3 method for class 'igraph'
vsp(x, rank, ..., edge_weights = NULL)
Arguments
x |
Either a graph adjacency matrix, igraph::igraph or
tidygraph::tbl_graph. If |
rank |
The number of factors to calculate. |
... |
These dots are for future extensions and must be empty. |
center |
Should the adjacency matrix be row and column centered? Defaults to FALSE. |
recenter |
Should the varimax factors be re-centered around the
original factor means? Only used when |
degree_normalize |
Should the regularized graph Laplacian be used instead of the raw adjacency matrix? Defaults to TRUE. |
renormalize |
Should the regularized graph laplacian be used instead of the
raw adjacency matrix? Defaults to |
tau_row |
Row regularization term. Default is |
tau_col |
Column regularization term. Default is |
kaiser_normalize_u |
Whether or not to use Kaiser normalization
when rotating the left singular vectors |
kaiser_normalize_v |
Whether or not to use Kaiser normalization
when rotating the right singular vectors |
rownames |
Character vector of row names of |
colnames |
Character vector of column names of |
match_columns |
Should the columns of |
edge_weights |
When |
Details
Sparse SVDs use RSpectra for performance.
Value
An object of class vsp_fa. TODO: Details
Examples
library(LRMF3)
vsp(ml100k, rank = 2)
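The degree_normalize, tau_row and tau_col arguments refer to a regularized graph Laplacian. The base R sketch below shows one common form of that normalization, in which each entry of the adjacency matrix is divided by the square root of the regularized row degree times the regularized column degree. The helper name and the choice of mean degree as the default regularizer are assumptions for illustration; the package's own transformation may differ in detail.

```r
# Hypothetical helper, not the package implementation.
regularized_laplacian_sketch <- function(A,
                                         tau_row = mean(rowSums(A)),
                                         tau_col = mean(colSums(A))) {
  row_scale <- 1 / sqrt(rowSums(A) + tau_row)
  col_scale <- 1 / sqrt(colSums(A) + tau_col)
  # Entry [i, j] becomes A[i, j] / sqrt((d_i + tau_row) * (d_j + tau_col)).
  A * outer(row_scale, col_scale)
}

# Two nodes joined by a single edge: every degree is 1, so tau is 1 as well.
A <- matrix(c(0, 1,
              1, 0), nrow = 2, byrow = TRUE)

L <- regularized_laplacian_sketch(A)
```

Adding tau to the degrees keeps low-degree nodes from receiving very large weights, which is the usual motivation for regularizing before taking the SVD.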
Perform varimax rotation on a low rank matrix factorization
Description
Perform varimax rotation on a low rank matrix factorization
Usage
## S3 method for class 'svd_like'
vsp(
x,
rank,
...,
centerer = NULL,
scaler = NULL,
recenter = FALSE,
renormalize = FALSE,
kaiser_normalize_u = FALSE,
kaiser_normalize_v = FALSE,
rownames = NULL,
colnames = NULL,
match_columns = TRUE
)
Arguments
x |
Either a graph adjacency matrix, igraph::igraph or
tidygraph::tbl_graph. If |
rank |
The number of factors to calculate. |
... |
These dots are for future extensions and must be empty. |
centerer |
TODO |
scaler |
TODO |
recenter |
Should the varimax factors be re-centered around the
original factor means? Only used when |
renormalize |
Should the regularized graph laplacian be used instead of the
raw adjacency matrix? Defaults to |
kaiser_normalize_u |
Whether or not to use Kaiser normalization
when rotating the left singular vectors |
kaiser_normalize_v |
Whether or not to use Kaiser normalization
when rotating the right singular vectors |
rownames |
Character vector of row names of |
colnames |
Character vector of column names of |
match_columns |
Should the columns of |
Examples
library(LRMF3)
library(RSpectra)
s <- svds(ml100k, k = 2)
mf <- as_svd_like(s)
fa <- vsp(mf, rank = 2)
Create a vintage sparse factor analysis object
Description
vsp_fa objects are a subclass of LRMF3::fa_like(), with additional fields u, d, v, transformers, R_U, and R_V.
Usage
vsp_fa(
u,
d,
v,
Z,
B,
Y,
transformers,
R_U,
R_V,
rownames = NULL,
colnames = NULL
)
Arguments
u |
A |
d |
A |
v |
A |
Z |
A matrix of embeddings for each observation. |
B |
A mixing matrix describing how observation embeddings and topics interact. Does not have to be diagonal! |
Y |
A matrix describing the compositions of various topics or factors. |
transformers |
A list of transformations from the |
R_U |
Varimax rotation matrix used to transform |
R_V |
Varimax rotation matrix used to transform |
rownames |
Identifying names for each row of the original
data. Defaults to |
colnames |
Identifying names for each column of the original
data. Defaults to |
Value
A vsp_fa object.
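To see how the fields u, d, v, Z, B, Y, R_U and R_V can relate to one another, here is a base R sketch on simulated data. It assumes that Z and Y are obtained by rotating u and v with orthogonal varimax rotation matrices and that B absorbs the inverse rotations; it deliberately omits any rescaling and the transformers step, so it is a simplification and not the package's exact construction.

```r
set.seed(1)

# Simulated data matrix; any numeric matrix would do for this sketch.
A <- matrix(rnorm(60), nrow = 10, ncol = 6)

# Rank-2 truncated SVD.
s <- svd(A, nu = 2, nv = 2)
U <- s$u
V <- s$v
D <- diag(s$d[1:2])

# Orthogonal varimax rotation matrices for the left and right singular vectors.
R_U <- stats::varimax(U, normalize = FALSE)$rotmat
R_V <- stats::varimax(V, normalize = FALSE)$rotmat

Z <- U %*% R_U             # rotated left factors
Y <- V %*% R_V             # rotated right factors
B <- t(R_U) %*% D %*% R_V  # mixing matrix, generally not diagonal

low_rank <- U %*% D %*% t(V)
rotated <- Z %*% B %*% t(Y)

# Because R_U and R_V are orthogonal, both products agree up to rounding.
max_diff <- max(abs(low_rank - rotated))
```

The rotation changes how the low-rank approximation is split into factors without changing the approximation itself, which is why B does not have to be diagonal.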