Title: | Analysis of Biological Count Data, Especially from Single-Cell RNA-Seq |
Version: | 0.1.2 |
Description: | A set of functions for applying a restricted linear algebra to the analysis of count-based data. See the accompanying preprint manuscript: "Normalizing need not be the norm: count-based math for analyzing single-cell data" Church et al (2022) <doi:10.1101/2022.06.01.494334> This tool is specifically designed to analyze count matrices from single cell RNA sequencing assays. The tools implement several count-based approaches for standard steps in single-cell RNA-seq analysis, including scoring genes and cells, comparing cells and clustering, calculating differential gene expression, and several methods for rank reduction. There are many opportunities for further optimization that may prove useful in the analysis of other data. We provide the source code freely available at https://github.com/shchurch/countland and encourage users and developers to fork the code for their own purposes. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.0 |
URL: | https://github.com/shchurch/countland |
BugReports: | https://github.com/shchurch/countland/issues |
Imports: | methods, rlang, Matrix, ggplot2 |
Suggests: | tidyverse, viridis, gridExtra, igraph, RSpectra, matrixTests, rdist, stats, Seurat, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2024-02-01 17:48:49 UTC; samuelchurch |
Author: | Church Samuel H. |
Maintainer: | Church Samuel H. <samuelhchurch@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-02-01 18:00:02 UTC |
Recapitulate Seurat centering scaled and transformed data
Description
Recapitulate Seurat centering scaled and transformed data
Usage
Center(C)
Arguments
C |
countland object |
Value
countland object with slots centered_counts
Perform spectral clustering on dot products.
Description
Perform spectral clustering on dot products.
Usage
Cluster(C, n_clusters, n_components = NULL)
Arguments
C |
countland object |
n_clusters |
number of clusters, integer |
n_components |
number of components from spectral embedding to use (default NULL, will be set to n_clusters), integer |
Value
countland object with slot cluster_labels
Examples
gold_path <- system.file("testdata", package = "countland", mustWork = TRUE)
gold.data <- Seurat::Read10X(data.dir = gold_path)
C <- countland(gold.data)
C <- Dot(C)
C <- Embed(C,n_components=5)
C <- Cluster(C,n_clusters=3)
Internal function for calculating count index.
Description
Internal function for calculating count index.
Usage
CountIndex(lm)
Arguments
lm |
column vector |
Value
count index = largest n where n cells have >= n counts
Calculate pairwise dot products of counts between all cells.
Description
Calculate pairwise dot products of counts between all cells.
Usage
Dot(C, subsample = FALSE)
Arguments
C |
countland object |
subsample |
if TRUE, use subsampled counts, otherwise use counts (default=FALSE) |
Value
countland object with slot dots
Examples
gold_path <- system.file("testdata", package = "countland", mustWork = TRUE)
gold.data <- Seurat::Read10X(data.dir = gold_path)
C <- countland(gold.data)
C <- Dot(C)
Perform spectral embedding on dot products.
Description
Perform spectral embedding on dot products.
Usage
Embed(C, n_components = 10)
Arguments
C |
countland object |
n_components |
number of components, integer (default=10) |
Value
countland object with slot embedding
, eigenvals
Examples
gold_path <- system.file("testdata", package = "countland", mustWork = TRUE)
gold.data <- Seurat::Read10X(data.dir = gold_path)
C <- countland(gold.data)
C <- Dot(C)
C <- Embed(C,n_components=5)
run integer matrix approximation
Description
run integer matrix approximation
Usage
IMA(X, params)
Arguments
X |
observed data matrix |
params |
parameter object |
Value
U, V, and Lambda matrix factors
rescale if max val is above upper bound
Description
rescale if max val is above upper bound
Usage
IMA_Compute_Init_Scaled(h, l_bound, u_bound)
Arguments
h |
matrix to be rescaled |
l_bound |
lower bound |
u_bound |
upper bound |
Value
rescaled matrix
Update factor matrix - see SUSTain code
Description
Update factor matrix - see SUSTain code
Usage
IMA_Update_Factor(M, coeff, mkrp, mode, lambda_, params)
Arguments
M |
matrix to be updated (either U or V) |
coeff |
matrix used in updating algorithm |
mkrp |
matrix used in updating algorithm |
mode |
whether update U or V |
lambda_ |
scaling matrix |
params |
parameter object |
Value
updated matrix and scaling factors
function to initialize U, V, and Lambda
Description
function to initialize U, V, and Lambda
Usage
IMA_init(X, params)
Arguments
X |
observed data matrix |
params |
parameter object |
Value
initialized U, V, and Lambda matrices
Parameter class for IMA
Description
Parameter class for IMA
Usage
IMA_params(
rank,
u_bounds,
l_bounds = c(0, 0),
maxiter = 1e+06,
stop_crit = 1e-04
)
Arguments
rank |
target number of features in final matrices |
u_bounds |
upper bounds on integers |
l_bounds |
lower bounds on integers (default = c(0,0)) |
maxiter |
maximum number of iterations (default = 1000000) |
stop_crit |
criterion of difference at which to stop (default = 0.0001) |
Value
parameter object
Recapitulate Seurat log transformation
Description
Recapitulate Seurat log transformation
Usage
Log(C)
Arguments
C |
countland object |
Value
countland object with slots log_counts
Recapitulate Seurat normalization
Description
Recapitulate Seurat normalization
Usage
Normalize(C)
Arguments
C |
countland object |
Value
countland object with slots norm_factor
, norm_counts
Plots eigenvalues to investigate the optimal number of clusters
Description
Plots eigenvalues to investigate the optimal number of clusters
Usage
PlotEigengap(C)
Arguments
C |
countland object |
Value
generates plot of eigenvalues by number of components
Examples
gold_path <- system.file("testdata", package = "countland", mustWork = TRUE)
gold.data <- Seurat::Read10X(data.dir = gold_path)
C <- countland(gold.data)
C <- Dot(C)
C <- Embed(C,n_components=5)
PlotEigengap(C)
Plot cells using spectral embedding of dot products.
Description
Plot cells using spectral embedding of dot products.
Usage
PlotEmbedding(C, colors = color_palette)
Arguments
C |
countland object |
colors |
color palette for ggplot2, default=palette of 11 colors |
Value
generates plot of cells in two spectral embedding dimensions
Examples
gold_path <- system.file("testdata", package = "countland", mustWork = TRUE)
gold.data <- Seurat::Read10X(data.dir = gold_path)
C <- countland(gold.data)
C <- Dot(C)
C <- Embed(C,n_components=5)
C <- Cluster(C,n_clusters=3)
PlotEmbedding(C)
Generate a strip plot for counts across selected genes
Description
Generate a strip plot for counts across selected genes
Usage
PlotGeneCounts(C, gene_indices, colors = color_palette)
Arguments
C |
countland object |
gene_indices |
vector of gene index values |
colors |
color palette for ggplot2, default=palette of 11 colors |
Value
generates plot of gene count distributions
Examples
gold_path <- system.file("testdata", package = "countland", mustWork = TRUE)
gold.data <- Seurat::Read10X(data.dir = gold_path)
C <- countland(gold.data)
PlotGeneCounts(C,gene_indices=1:10)
Plot cells using integer matrix approximation
Description
Plot cells using integer matrix approximation
Usage
PlotIMA(C, x = 1, y = 2, colors = color_palette, subsample = TRUE)
Arguments
C |
countland object |
x |
feature on x-axis, integer (default=1) |
y |
feature on y-axis, integer (default=2) |
colors |
color palette for ggplot2, default=palette of 11 colors |
subsample |
if TRUE, use subsampled counts (default), otherwise use counts |
Value
generates plot of cells using integer matrix approximation
Plot the difference between the observed and reconstructed count matrix using integer matrix approximation and a series of total features.
Description
Plot the difference between the observed and reconstructed count matrix using integer matrix approximation and a series of total features.
Usage
PlotIMAElbow(C, max_features, u_bounds, subsample = TRUE)
Arguments
C |
countland object |
max_features |
maximum number of features to assess, integer |
u_bounds |
upper bounds for U and V matrices, vector of length 2 |
subsample |
if TRUE, use subsampled counts (default), otherwise use counts |
Value
generates elbow plot for the difference between observed and reconstructed matrices as number of features increases
Plot cell using spectral embedding and display counts in a given gene.
Description
Plot cell using spectral embedding and display counts in a given gene.
Usage
PlotMarker(C, gene_index, colors = color_palette)
Arguments
C |
countland object |
gene_index |
index value for gene to visualize |
colors |
color palette for ggplot2, default=palette of 11 colors |
Value
generates plot of cells with spectral embedding, colored by marker gene counts
Plot cells using matrix of counts summed by clusters of genes.
Description
Plot cells using matrix of counts summed by clusters of genes.
Usage
PlotSharedCounts(C, x = 1, y = 2, colors = color_palette)
Arguments
C |
countland object |
x |
gene cluster to plot on x-axis, integer (default=1) |
y |
gene cluster to plot on y-axis, integer (default=2) |
colors |
color palette for ggplot2, default=palette of 11 colors |
Value
generates plot of cells using shared counts
Restore count matrix to original state
Description
Restore count matrix to original state
Usage
PrintGeneNumber(C)
Arguments
C |
countland object |
Value
countland object
Rank the top marker genes for each cluster from spectral clustering.
Description
Rank the top marker genes for each cluster from spectral clustering.
Usage
RankMarkerGenes(C, method = "prop-zero", subsample = FALSE)
Arguments
C |
countland object |
method |
|
subsample |
if TRUE, use subsampled counts, otherwise use counts (default=FALSE) |
Value
countland object with slots marker_genes
and marker_full
Examples
gold_path <- system.file("testdata", package = "countland", mustWork = TRUE)
gold.data <- Seurat::Read10X(data.dir = gold_path)
C <- countland(gold.data)
C <- Dot(C)
C <- Embed(C,n_components=5)
C <- Cluster(C,n_clusters=3)
C <- RankMarkerGenes(C,method='prop-zero',subsample=FALSE)
Internal function to remove empty columns and rows
Description
Internal function to remove empty columns and rows
Usage
RemoveEmpty(C)
Arguments
C |
countland object |
Value
countland object, count matrix updated
Recapitulate Seurat scaling to unit variance
Description
Recapitulate Seurat scaling to unit variance
Usage
RescaleVariance(C)
Arguments
C |
countland object |
Value
countland object with slots scaled_counts
Restore count matrix to original state
Description
Restore count matrix to original state
Usage
RestoreCounts(C)
Arguments
C |
countland object |
Value
countland object
Examples
gold_path <- system.file("testdata", package = "countland", mustWork = TRUE)
gold.data <- Seurat::Read10X(data.dir = gold_path)
C <- countland(gold.data)
C <- SubsetGenes(C,gene_indices=1:200)
C <- SubsetCells(C,cell_indices=1:50)
C <- RestoreCounts(C)
Perform integer matrix approximation on count matrix.
Description
Perform integer matrix approximation on count matrix.
Usage
RunIMA(
C,
features,
u_bounds,
l_bounds = c(0, 0),
maxiter = 1e+06,
stop_crit = 1e-04,
subsample = TRUE
)
Arguments
C |
countland object |
features |
target number of features, integer |
u_bounds |
upper bounds for U and V matrices, vector of length 2 |
l_bounds |
lower bounds for U and V matrices, vector of length 2 (default=c(0,0)) |
maxiter |
maximum number of iterations, integer (default=1000000) |
stop_crit |
criterion for stopping based on difference between iterations, numeric (default=0.0001) |
subsample |
if TRUE, use subsampled counts (default), otherwise use counts |
Value
countland object with slots matrixU
, matrixV
, matrixLambda
Examples
gold_path <- system.file("testdata", package = "countland", mustWork = TRUE)
gold.data <- Seurat::Read10X(data.dir = gold_path)
C <- countland(gold.data)
C <- RunIMA(C,features=10,u_bounds=c(10,10),subsample=FALSE)
Recapitulate scikit.manifold.spectral_embedding from python.
Description
Recapitulate scikit.manifold.spectral_embedding from python.
Usage
ScikitManifoldSpectralEmbedding(A, n_components)
Arguments
A |
similarity matrix, dgCMatrix |
n_components |
number of eigenvectors to retain, integer |
Value
matrix of eigenvectors
Calculate several scores for counts across cells
Description
Calculate several scores for counts across cells
Usage
ScoreCells(C, gene_string = NULL)
Arguments
C |
countland object |
gene_string |
string with regular expression expression matching gene names of interest (default=NULL) |
Value
countland object with slot cell_scores
Examples
gold_path <- system.file("testdata", package = "countland", mustWork = TRUE)
gold.data <- Seurat::Read10X(data.dir = gold_path)
C <- countland(gold.data)
C <- ScoreCells(C,gene_string="*149932$")
Calculate several scores for count-based gene expression.
Description
Calculate several scores for count-based gene expression.
Usage
ScoreGenes(C, subsample = FALSE)
Arguments
C |
countland object |
subsample |
if TRUE, use subsampled counts, otherwise use counts (default=FALSE) |
Value
countland object with slot gene_scores
Examples
gold_path <- system.file("testdata", package = "countland", mustWork = TRUE)
gold.data <- Seurat::Read10X(data.dir = gold_path)
C <- countland(gold.data)
C <- ScoreGenes(C)
Combine groups of genes with similar counts by clustering and summing.
Description
Combine groups of genes with similar counts by clustering and summing.
Usage
SharedCounts(C, n_clusters, n_cells = 100, subsample = TRUE)
Arguments
C |
countland object |
n_clusters |
number of clusters |
n_cells |
number of cells to sample for gene clustering |
subsample |
if TRUE, use subsampled counts (default), otherwise use counts |
Value
countland object with slots shared_counts
, sum_sharedcounts
, sum_sharedcounts_all
Examples
gold_path <- system.file("testdata", package = "countland", mustWork = TRUE)
gold.data <- Seurat::Read10X(data.dir = gold_path)
C <- countland(gold.data)
C <- SharedCounts(C,n_clusters=10,subsample=FALSE)
Subsample cells to a standard number of counts by randomly sampling observations without replacement.
Description
Subsample cells to a standard number of counts by randomly sampling observations without replacement.
Usage
Subsample(C, gene_counts = NA, cell_counts = NA)
Arguments
C |
countland object |
gene_counts |
maximum total counts for genes |
cell_counts |
sequencing depth for all cells, or "min" to use the minimum cell total |
Value
countland object with slot subsample
Examples
gold_path <- system.file("testdata", package = "countland", mustWork = TRUE)
gold.data <- Seurat::Read10X(data.dir = gold_path)
C <- countland(gold.data)
C <- Subsample(C,gene_counts=250,cell_counts=100)
Internal function for subsampling a column from a sparse matrix.
Description
Internal function for subsampling a column from a sparse matrix.
Usage
SubsampleCol(lm, li, j, n_counts)
Arguments
lm |
column vector |
li |
row positions |
j |
column index |
n_counts |
count to sample |
Value
subsampled column as dgTMatrix components
Subsets cells using a vector of cell indices
Description
Subsets cells using a vector of cell indices
Usage
SubsetCells(C, cell_indices, remove_empty = TRUE)
Arguments
C |
countland object |
cell_indices |
vector of cell index values |
remove_empty |
filter out cells and genes with no observed counts (default=TRUE) |
Value
countland object, count matrix updated
Examples
gold_path <- system.file("testdata", package = "countland", mustWork = TRUE)
gold.data <- Seurat::Read10X(data.dir = gold_path)
C <- countland(gold.data)
C <- SubsetCells(C,cell_indices=1:50)
Subsets genes using a vector of gene indices
Description
Subsets genes using a vector of gene indices
Usage
SubsetGenes(C, gene_indices, remove_empty = TRUE)
Arguments
C |
countland object |
gene_indices |
vector of gene index values |
remove_empty |
filter out cells and genes with no observed counts (default=TRUE) |
Value
countland object, count matrix updated
Examples
gold_path <- system.file("testdata", package = "countland", mustWork = TRUE)
gold.data <- Seurat::Read10X(data.dir = gold_path)
C <- countland(gold.data)
C <- SubsetGenes(C,gene_indices=1:200)
Initialize a countland object from a dgCMatrix
Description
Initialize a countland object from a dgCMatrix
Usage
countland(m, remove_empty = TRUE, verbose = TRUE)
Arguments
m |
A matrix of counts (dense or sparse) |
remove_empty |
filter out cells and genes with no observed counts (default=TRUE) |
verbose |
show stderr message statements (default=TRUE) |
Value
countland object
Examples
gold_path <- system.file("testdata", package = "countland", mustWork = TRUE)
gold.data <- Seurat::Read10X(data.dir = gold_path)
C <- countland(gold.data)
An S4 class to represent a countland object
Description
An S4 class to represent a countland object
Slots
counts
A dgCMatrix with rows as cells, columns as genes.
names_genes
A character vector of column names.
names_cells
A character vector of row names.
raw_counts
The count dgCMatrix as originally loaded.
raw_names_genes
The gene name character vector as originally loaded.
raw_names_cells
The cell name character vector as originally loaded.
subsample
A dgCMatrix with row sums equal.
cell_scores
A data.frame of cell count measures.
gene_scores
A data.frame of gene expression measures.
dots
A similarity dgCMatrix of dot products.
eigenvals
An vector of eigenvalues from spectral embedding
embedding
An array of two columns (spectral embeddings).
cluster_labels
A numeric vector of cluster assignments of length n cells.
marker_full
A list of data.frames with genes ranked for each cluster.
marker_genes
A data.frame of top ten marker genes per cluster.
matrixU
A dgCMatrix of dimensions cells x features.
matrixV
A dgCMatrix of dimensions genes x features.
matrixLambda
A diagonal dgCMatrix of scaling factors.
sharedcounts
A similarity dgCMatrix of shared counts between genes.
sum_sharedcounts
A dgCMatrix with counts summed within gene clusters.
sum_sharedcounts_all
A dgCMatrix with counts summed and including all genes not present in any cluster.
norm_factor
A numeric vector of cell normalization factors.
norm_counts
A dgCMatrix of normalized counts.
log_counts
A dgCMatrix of log transformed counts.
scaled_counts
A dgCMatrix of counts scaled by gene unit variance.
centered_counts
A dgCMatrix of counts centered at zero.
verbose
A T/F object for suppressing messages
Split dgCMatrix into column vectors.
Description
Split dgCMatrix into column vectors.
Usage
listCols(m)
Arguments
m |
dgCMatrix |
Value
list of column vectors, numeric