Type: | Package |
Title: | Simplification of scRNA-Seq Data by Merging Together Similar Cells |
Version: | 1.0.1 |
Description: | Aggregates large single-cell data into metacell dataset by merging together gene expression of very similar cells. 'SuperCell' uses 'velocyto.R' <doi:10.1038/s41586-018-0414-6> https://github.com/velocyto-team/velocyto.R for RNA velocity. We also recommend installing 'scater' Bioconductor package <doi:10.18129/B9.bioc.scater> https://bioconductor.org/packages/release/bioc/html/scater.html. |
License: | GPL-3 |
BugReports: | https://github.com/GfellerLab/SuperCell/issues |
Encoding: | UTF-8 |
LazyData: | true |
LazyDataCompression: | xz |
biocViews: | Software |
Additional_repositories: | https://mteleman.github.io/drat |
Imports: | igraph, RANN, WeightedCluster, corpcor, weights, Hmisc, Matrix, matrixStats, plyr, irlba, grDevices, patchwork, ggplot2, umap, entropy, Rtsne, dbscan, scales, plotfunctions, proxy, methods, rlang, |
RoxygenNote: | 7.3.2 |
Suggests: | SingleCellExperiment, SummarizedExperiment, cowplot, scater, Seurat, knitr, rmarkdown, remotes, bluster, velocyto.R, testthat (≥ 3.0.0) |
Depends: | R (≥ 4.0.0) |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2024-10-25 11:08:29 UTC; admin |
Author: | Mariia Bilous [aut], Leonard Herault [cre] |
Maintainer: | Leonard Herault <leonard.herault@unil.ch> |
Repository: | CRAN |
Date/Publication: | 2024-10-25 11:30:02 UTC |
Detection of metacells with the SuperCell approach
Description
This function detects metacells (former super-cells) from single-cell gene expression matrix
Usage
SCimplify(
X,
genes.use = NULL,
genes.exclude = NULL,
cell.annotation = NULL,
cell.split.condition = NULL,
n.var.genes = min(1000, nrow(X)),
gamma = 10,
k.knn = 5,
do.scale = TRUE,
n.pc = 10,
fast.pca = TRUE,
do.approx = FALSE,
approx.N = 20000,
block.size = 10000,
seed = 12345,
igraph.clustering = c("walktrap", "louvain"),
return.singlecell.NW = TRUE,
return.hierarchical.structure = TRUE,
...
)
Arguments
X |
log-normalized gene expression matrix with rows to be genes and cols to be cells |
genes.use |
a vector of genes used to compute PCA |
genes.exclude |
a vector of genes to be excluded when computing PCA |
cell.annotation |
a vector of cell type annotation, if provided, metacells that contain single cells of different cell type annotation will be split in multiple pure metacell (may result in slightly larger numbe of metacells than expected with a given gamma) |
cell.split.condition |
a vector of cell conditions that must not be mixed in one metacell. If provided, metacells will be split in condition-pure metacell (may result in significantly(!) larger number of metacells than expected) |
n.var.genes |
if |
gamma |
graining level of data (proportion of number of single cells in the initial dataset to the number of metacells in the final dataset) |
k.knn |
parameter to compute single-cell kNN network |
do.scale |
whether to scale gene expression matrix when computing PCA |
n.pc |
number of principal components to use for construction of single-cell kNN network |
fast.pca |
use irlba as a faster version of prcomp (one used in Seurat package) |
do.approx |
compute approximate kNN in case of a large dataset (>50'000) |
approx.N |
number of cells to subsample for an approximate approach |
block.size |
number of cells to map to the nearest metacell at the time (for approx coarse-graining) |
seed |
seed to use to subsample cells for an approximate approach |
igraph.clustering |
clustering method to identify metacells (available methods "walktrap" (default) and "louvain" (not recommended, gamma is ignored)). |
return.singlecell.NW |
whether return single-cell network (which consists of approx.N if |
return.hierarchical.structure |
whether return hierarchical structure of metacell |
... |
other parameters of build_knn_graph function |
Value
a list with components
graph.supercells - igraph object of a simplified network (number of nodes corresponds to number of metacells)
membership - assigmnent of each single cell to a particular metacell
graph.singlecells - igraph object (kNN network) of single-cell data
supercell_size - size of metacells (former super-cells)
gamma - requested graining level
N.SC - number of obtained metacells
genes.use - used genes
do.approx - whether approximate coarse-graining was perfirmed
n.pc - number of principal components used for metacells construction
k.knn - number of neighbors to build single-cell graph
sc.cell.annotation. - single-cell cell type annotation (if provided)
sc.cell.split.condition. - single-cell split condition (if provided)
SC.cell.annotation. - super-cell cell type annotation (if was provided for single cells)
SC.cell.split.condition. - super-cell split condition (if was provided for single cells)
Examples
data(cell_lines) # list with GE - gene expression matrix (logcounts), meta - cell meta data
GE <- cell_lines$GE
SC <- SCimplify(GE, # log-normalized gene expression matrix
gamma = 20, # graining level
n.var.genes = 1000,
k.knn = 5, # k for kNN algorithm
n.pc = 10, # number of principal components to use
do.approx = TRUE) #
Construct super-cells from spliced and un-spliced matrices
Description
Construct super-cells from spliced and un-spliced matrices
Usage
SCimplify_for_velocity(emat, nmat, gamma = NULL, membership = NULL, ...)
Arguments
emat |
spliced (exonic) count matrix |
nmat |
unspliced (nascent) count matrix |
gamma |
graining level of data (proportion of number of single cells in the initial dataset to the number of super-cells in the final dataset) |
membership |
metacell membership vector (if provided, will be used for |
... |
other parameters from SCimplify |
Value
list containing vector of membership, spliced count and un-spliced count matrices
Detection of metacells with the SuperCell approach from low dim representation
Description
This function detects metacells (former super-cells) from single-cell gene expression matrix
Usage
SCimplify_from_embedding(
X,
cell.annotation = NULL,
cell.split.condition = NULL,
gamma = 10,
k.knn = 5,
n.pc = 10,
do.approx = FALSE,
approx.N = 20000,
block.size = 10000,
seed = 12345,
igraph.clustering = c("walktrap", "louvain"),
return.singlecell.NW = TRUE,
return.hierarchical.structure = TRUE,
...
)
Arguments
X |
low dimensional embedding matrix with rows to be cells and cols to be low-dim components |
cell.annotation |
a vector of cell type annotation, if provided, metacells that contain single cells of different cell type annotation will be split in multiple pure metacell (may result in slightly larger numbe of metacells than expected with a given gamma) |
cell.split.condition |
a vector of cell conditions that must not be mixed in one metacell. If provided, metacells will be split in condition-pure metacell (may result in significantly(!) larger number of metacells than expected) |
gamma |
graining level of data (proportion of number of single cells in the initial dataset to the number of metacells in the final dataset) |
k.knn |
parameter to compute single-cell kNN network |
n.pc |
number of principal components to use for construction of single-cell kNN network |
do.approx |
compute approximate kNN in case of a large dataset (>50'000) |
approx.N |
number of cells to subsample for an approximate approach |
block.size |
number of cells to map to the nearest metacell at the time (for approx coarse-graining) |
seed |
seed to use to subsample cells for an approximate approach |
igraph.clustering |
clustering method to identify metacells (available methods "walktrap" (default) and "louvain" (not recommended, gamma is ignored)). |
return.singlecell.NW |
whether return single-cell network (which consists of approx.N if |
return.hierarchical.structure |
whether return hierarchical structure of metacell |
... |
other parameters of build_knn_graph function |
Value
a list with components
graph.supercells - igraph object of a simplified network (number of nodes corresponds to number of metacells)
membership - assigmnent of each single cell to a particular metacell
graph.singlecells - igraph object (kNN network) of single-cell data
supercell_size - size of metacells (former super-cells)
gamma - requested graining level
N.SC - number of obtained metacells
genes.use - used genes (NA due to low-dim representation)
do.approx - whether approximate coarse-graining was perfirmed
n.pc - number of principal components used for metacells construction
k.knn - number of neighbors to build single-cell graph
sc.cell.annotation. - single-cell cell type annotation (if provided)
sc.cell.split.condition. - single-cell split condition (if provided)
SC.cell.annotation. - super-cell cell type annotation (if was provided for single cells)
SC.cell.split.condition. - super-cell split condition (if was provided for single cells)
Convert Anndata metacell object (Metacell-2 or SEACells) to Super-cell like object
Description
Convert Anndata metacell object (Metacell-2 or SEACells) to Super-cell like object
Usage
anndata_2_supercell(adata, simplification.algo = "unknown")
Arguments
adata |
anndata object of metacells
(for example, the output of Please, **make sure**, adata has ‘uns[’sc.obs']' field containing observation information of single-cell data, in particular, a column 'membership' (single-cell assignemnt to metacells) |
simplification.algo |
metacell construction algorithm (i.e., Metacell2 or SEACells) |
Value
a list of super-cell like object (similar to the output of SCimplify)
Build kNN graph
Description
Build kNN graph either from distance (from == "dist") or from coordinates (from == "coordinates")
Usage
build_knn_graph(
X,
k = 5,
from = c("dist", "coordinates"),
use.nn2 = TRUE,
return_neighbors_order = FALSE,
dist_method = "euclidean",
cor_method = "pearson",
p = 2,
directed = FALSE,
DoSNN = FALSE,
which.snn = c("bluster", "dbscan"),
pruning = NULL,
kmin = 0,
...
)
Arguments
X |
either distance or matrix of coordinates (rows are samples and cols are coordinates) |
k |
kNN parameter |
from |
from which data type to build kNN network: "dist" if X is a distance (dissimilarity) or "coordinates" if X is a matrix with coordinates as cols and cells as rows |
use.nn2 |
whether use nn2 method to buid kNN network faster (available only for "coordinates" option) |
return_neighbors_order |
whether return order of neighbors (not available for nn2 option) |
dist_method |
method to compute dist (if X is a matrix of coordinates) available: c("cor", "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski") |
cor_method |
if distance is computed as correlation (dist_method == "cor), which type of correlation to use (available: "pearson", "kendall", "spearman") |
p |
p param in |
directed |
whether to build a directed graph |
DoSNN |
whether to apply shared nearest neighbors (default is |
which.snn |
whether to use neighborsToSNNGraph or sNN for sNN graph construction |
pruning |
quantile to perform edge pruning (default is |
kmin |
keep at least |
... |
other parameters of neighborsToSNNGraph or sNN |
Value
a list with components
graph.knn - igraph object
order - Nxk matrix with indices of k nearest neighbors ordered by relevance (from 1st to k-th)
Build kNN graph using RANN::nn2
(used in "build_knn_graph"
)
Description
Build kNN graph using RANN::nn2
(used in "build_knn_graph"
)
Usage
build_knn_graph_nn2(
X,
k = min(5, ncol(X)),
mode = "all",
DoSNN = FALSE,
which.snn = c("bluster", "dbscan"),
pruning = NULL,
kmin = 0,
...
)
Arguments
X |
matrix of coordinates (rows are samples and cols are coordinates) |
k |
kNN parameter |
mode |
mode of graph_from_adj_list ('all' – undirected graph, 'out' – directed graph) |
DoSNN |
whether to apply shared nearest neighbors (default is |
which.snn |
whether to use neighborsToSNNGraph or sNN for sNN graph construction |
pruning |
quantile to perform edge pruning (default is |
kmin |
keep at least |
... |
other parameters of neighborsToSNNGraph or sNN |
Value
a list with components
graph.knn - igraph object
Cancer cell lines dataset
Description
ScRNA-seq data of 5 cancer cell lines from [Tian et al., 2019](https://doi.org/10.1038/s41592-019-0425-8).
Usage
cell_lines
Format
A list with gene expression (i.e., log-normalized counts) (GE), and metadata data (meta):
- GE
gene expression (log-normalized counts) matrix
- meta
cells metadata (cell line annotation)
Details
Data available at authors' [GitHub](https://github.com/LuyiTian/sc_mixology/blob/master/data/) under file name *sincell_with_class_5cl.Rdata*.
Source
Build kNN graph from distance
(used in "build_knn_graph"
)
Description
Build kNN graph from distance
(used in "build_knn_graph"
)
Usage
knn_graph_from_dist(D, k = 5, return_neighbors_order = TRUE, mode = "all")
Arguments
D |
dist matrix or dist object (preferentially) |
k |
kNN parameter |
return_neighbors_order |
whether return order of neighbors (not available for nn2 option) |
mode |
mode of graph_from_adj_list ('all' – undirected graph, 'out' – directed graph) |
Value
a list with components
graph.knn - igraph object
order - Nxk matrix with indices of k nearest neighbors ordered by relevance (from 1st to k-th)
Convert Metacells (Metacell-2) to Super-cell like object
Description
Convert Metacells (Metacell-2) to Super-cell like object
Usage
metacell2_anndata_2_supercell(adata, obs.sc)
Arguments
adata |
anndata object of metacells (the output of |
obs.sc |
a dataframe of the single-cell anndata object used to compute metacells (anndata after applying |
Value
a list of super-cell like object (similar to the output of SCimplify)
Compute mixing of single-cells within supercell
Description
Compute mixing of single-cells within supercell
Usage
sc_mixing_score(SC, clusters)
Arguments
SC |
super-cell object (output of SCimplify function) |
clusters |
vector of clustering assignment (reference assignment) |
Value
a vector of single-cell mixing within super-cell it belongs to, which is defined as: 1 - proportion of cells of the same annotation (e.g., cell type) within the same super-cell With 0 meaning that super-cell consists of single cells from one cluster (reference assignment) and higher values correspond to higher cell type mixing within super-cell
Super-cells to Seurat object
Description
This function transforms super-cell gene expression and super-cell partition into Seurat object
Usage
supercell_2_Seurat(
SC.GE,
SC,
fields = c(),
var.genes = NULL,
do.preproc = TRUE,
is.log.normalized = TRUE,
do.center = TRUE,
do.scale = TRUE,
N.comp = NULL,
output.assay.version = "v4"
)
Arguments
SC.GE |
gene expression matrix with genes as rows and cells as columns |
SC |
super-cell (output of SCimplify function) |
fields |
which fields of |
var.genes |
set of genes used as a set of variable features of Seurat (by default is the set of genes used to generate super-cells), ignored if |
do.preproc |
whether to do prepocessing, including data normalization, scaling, HVG, PCA, nearest neighbors, |
is.log.normalized |
whether |
do.center |
whether to center gene expression matrix to compute PCA, ignored if |
do.scale |
whether to scale gene expression matrix to compute PCA, ignored if |
N.comp |
number of principal components to use for construction of single-cell kNN network, ignored if |
output.assay.version |
version of the seurat assay in output, |
Details
Since the input of CreateSeuratObject should be unnormalized count matrix (UMIs or TPMs, see CreateSeuratObject).
Thus, we manually set field `assays$RNA@data`
to SC.GE
if is.log.normalized == TRUE
.
Avoid running NormalizeData for the obtained Seurat object, otherwise this will overwrite field `assays$RNA@data`
.
If you have run NormalizeData, then make sure to replace `assays$RNA@data`
with correct matrix by running
`your_seurat@assays$RNA@data <- your_seurat@assays$RNA@counts`
.
Since super-cells have different size (consist of different number of single cells), we use sample-weighted algorithms for all
possible steps of the downstream analysis, including scaling and dimensionality reduction. Thus, generated Seurat object comes
with the results of sample-wighted scaling (available as `your_seurat@assays$RNA@scale.data`
or
`your_seurat@assays$RNA@misc[["scale.data.weighted"]]`
to reproduce if the first one has been overwritten) and PCA (available as
`your_seurat@reductions$pca`
or `your_seurat@reductions$pca_weighted`
to reproduce if the first one has been overwritten).
Value
Seurat object
Examples
data(cell_lines)
SC <- SCimplify(X=cell_lines$GE, gamma = 20)
SC$ident <- supercell_assign(clusters = cell_lines$meta, supercell_membership = SC$membership)
SC.GE <- supercell_GE(cell_lines$GE, SC$membership)
m.seurat <- supercell_2_Seurat(SC.GE = SC.GE, SC = SC, fields = c("ident"))
Super-cells to SingleCellExperiment object
Description
This function transforms super-cell gene expression and super-cell partition into SingleCellExperiment object
Usage
supercell_2_sce(
SC.GE,
SC,
fields = c(),
var.genes = NULL,
do.preproc = TRUE,
is.log.normalized = TRUE,
do.center = TRUE,
do.scale = TRUE,
ncomponents = 50
)
Arguments
SC.GE |
gene expression matrix with genes as rows and cells as columns |
SC |
super-cell (output of SCimplify function) |
fields |
which fields of |
var.genes |
set of genes used as a set of variable features of SingleCellExperiment (by default is the set of genes used to generate super-cells) |
do.preproc |
whether to do prepocessing, including data normalization, scaling, HVG, PCA, nearest neighbors, |
is.log.normalized |
whether |
do.center |
whether to center gene expression matrix to compute PCA |
do.scale |
whether to scale gene expression matrix to compute PCA |
ncomponents |
number of principal components to compute |
Value
SingleCellExperiment object
Examples
data(cell_lines)
SC <- SCimplify(cell_lines$GE, gamma = 20)
SC$ident <- supercell_assign(clusters = cell_lines$meta, supercell_membership = SC$membership)
SC.GE <- supercell_GE(cell_lines$GE, SC$membership)
sce <- supercell_2_sce(SC.GE = SC.GE, SC = SC, fields = c("ident"))
Plot metacell 2D plot (PCA, UMAP, tSNE etc)
Description
Plots 2d representation of metacells
Usage
supercell_DimPlot(
SC,
groups = NULL,
dim.name = "PCA",
dim.1 = 1,
dim.2 = 2,
color.use = NULL,
asp = 1,
alpha = 0.7,
title = NULL,
do.sqtr.rescale = FALSE
)
Arguments
SC |
SuperCell computed metacell object (the output of SCimplify) |
groups |
an assigment of metacells to any group (for ploting in different colors) |
dim.name |
name of the dimensionality reduction to plot (must be a field in |
dim.1 |
dimension to plot on X-axis |
dim.2 |
dimension to plot on Y-axis |
color.use |
colros to use for groups, if |
asp |
aspect ratio |
alpha |
a rotation of the layout (either provided or computed) |
title |
a title of a plot |
do.sqtr.rescale |
whether to sqrt-scale node size (to balance plot if some metacells are large and covers smaller metacells) |
Value
Differential expression analysis of supep-cell data. Most of the parameters are the same as in Seurat FindAllMarkers (for simplicity)
Description
Differential expression analysis of supep-cell data. Most of the parameters are the same as in Seurat FindAllMarkers (for simplicity)
Usage
supercell_FindAllMarkers(
ge,
clusters,
supercell_size = NULL,
genes.use = NULL,
logfc.threshold = 0.25,
min.expr = 0,
min.pct = 0.1,
seed = 12345,
only.pos = FALSE,
return.extra.info = FALSE,
do.bootstrapping = FALSE
)
Arguments
ge |
gene expression matrix for super-cells (rows - genes, cols - super-cells) |
clusters |
a vector with clustering information (ordered the same way as in |
supercell_size |
a vector with supercell size (ordered the same way as in |
genes.use |
set of genes to test. Defeult – all genes in |
logfc.threshold |
log fold change threshold for genes to be considered in the further analysis |
min.expr |
minimal expression (default 0) |
min.pct |
remove genes with lower percentage of detection from the set of genes which will be tested |
seed |
random seed to use |
only.pos |
whether to compute only positive (upregulated) markers |
return.extra.info |
whether to return extra information about test and its statistics. Default is FALSE. |
do.bootstrapping |
whether to perform bootstrapping when computing standard error and p-value in wtd.t.test |
Value
list of results of supercell_FindMarkers
Differential expression analysis of supep-cell data. Most of the parameters are the same as in Seurat FindMarkers (for simplicity)
Description
Differential expression analysis of supep-cell data. Most of the parameters are the same as in Seurat FindMarkers (for simplicity)
Usage
supercell_FindMarkers(
ge,
supercell_size = NULL,
clusters,
ident.1,
ident.2 = NULL,
genes.use = NULL,
logfc.threshold = 0.25,
min.expr = 0,
min.pct = 0.1,
seed = 12345,
only.pos = FALSE,
return.extra.info = FALSE,
do.bootstrapping = FALSE
)
Arguments
ge |
gene expression matrix for super-cells (rows - genes, cols - super-cells) |
supercell_size |
a vector with supercell size (ordered the same way as in |
clusters |
a vector with clustering information (ordered the same way as in |
ident.1 |
name(s) of cluster for which markers are computed |
ident.2 |
name(s) of clusters for comparison. If |
genes.use |
set of genes to test. Defeult – all genes in |
logfc.threshold |
log fold change threshold for genes to be considered in the further analysis |
min.expr |
minimal expression (default 0) |
min.pct |
remove genes with lower percentage of detection from the set of genes which will be tested |
seed |
random seed to use |
only.pos |
whether to compute only positive (upregulated) markers |
return.extra.info |
whether to return extra information about test and its statistics. Default is FALSE. |
do.bootstrapping |
whether to perform bootstrapping when computing standard error and p-value in wtd.t.test |
Value
a matrix with a test name (t-test), statisctics, adjusted p-values, logFC, percenrage of detection in eacg ident and mean expresiion
Simplification of scRNA-seq dataset
Description
This function converts (i.e., averages or sums up) gene-expression matrix of single-cell data into a gene expression matrix of metacells
Usage
supercell_GE(
ge,
groups,
mode = c("average", "sum"),
weights = NULL,
do.median.norm = FALSE
)
Arguments
ge |
gene expression matrix (or any coordinate matrix) with genes as rows and cells as cols |
groups |
vector of membership (assignment of single-cell to metacells) |
mode |
string indicating whether to average or sum up 'ge' within metacells |
weights |
vector of a cell weight (NULL by default), used for computing average gene expression withing cluster of metaells |
do.median.norm |
whether to normalize by median value (FALSE by default) |
Value
a matrix of simplified (averaged withing groups) data with ncol equal to number of groups and nrows as in the initial dataset
Simplification of scRNA-seq dataset (old version, not used since 12.02.2021)
Description
This function converts gene-expression matrix of single-cell data into a gene expression matrix of super-cells
Usage
supercell_GE_idx(ge, groups, weights = NULL, do.median.norm = FALSE)
Arguments
ge |
gene expression matrix (or any coordinate matrix) with genes as rows and cells as cols |
groups |
vector of membership (assignment of single-cell to super-cells) |
weights |
vector of a cell weight (NULL by default), used for computing average gene expression withing cluster of super-cells |
do.median.norm |
whether to normalize by median value (FALSE by default) |
Value
a matrix of simplified (averaged withing groups) data with ncol equal to number of groups and nrows as in the initial dataset
Gene-gene correlation plot
Description
Plots gene-gene expression and computes their correaltion
Usage
supercell_GeneGenePlot(
ge,
gene_x,
gene_y,
supercell_size = NULL,
clusters = NULL,
color.use = NULL,
idents = NULL,
pt.size = 1,
alpha = 0.9,
x.max = NULL,
y.max = NULL,
same.x.lims = FALSE,
same.y.lims = FALSE,
ncol = NULL,
combine = TRUE,
sort.by.corr = TRUE
)
Arguments
ge |
a gene expression matrix of super-cells (ncol same as number of super-cells) |
gene_x |
gene or vector of genes (if vector, has to be the same lenght as gene_y) |
gene_y |
gene or vector of genes (if vector, has to be the same lenght as gene_x) |
supercell_size |
a vector with supercell size (ordered the same way as in |
clusters |
a vector with clustering information (ordered the same way as in |
color.use |
colors for idents |
idents |
idents (clusters) to plot (default all) |
pt.size |
point size (if supercells have identical sizes) |
alpha |
transparency |
x.max |
max of x axis |
y.max |
max of y axis |
same.x.lims |
same x axis for all plots |
same.y.lims |
same y axis for all plots |
ncol |
number of colums in combined plot |
combine |
combine plots into a single patchworked ggplot object. If FALSE, return a list of ggplot |
sort.by.corr |
whether to sort plots by absolute value of correlation (fist plot genes with largest (anti-)correlation) |
Value
a list with components
p - is a combined ggplot or list of ggplots if
combine = TRUE
w.cor - weighted correlation between genes
a list, where
Plot Gene-gene correlation plot for 1 feature
Description
Used for supercell_GeneGenePlot
Usage
supercell_GeneGenePlot_single(
ge_x,
ge_y,
gene_x_name,
gene_y_name,
supercell_size = NULL,
clusters = NULL,
color.use = NULL,
x.max = NULL,
y.max = NULL,
pt.size = 1,
alpha = 0.9
)
Arguments
ge_x |
first gene expression vector (same length as number of super-cells) |
ge_y |
second gene expression vector (same length as number of super-cells) |
gene_x_name |
name of gene x |
gene_y_name |
name of gene y |
supercell_size |
a vector with supercell size (ordered the same way as in |
clusters |
a vector with clustering information (ordered the same way as in |
color.use |
colors for idents |
x.max |
max of x axis |
y.max |
max of y axis |
pt.size |
point size (0 by default) |
alpha |
transparency of dots |
Compute UMAP of super-cells
Description
Computes UMAP of super-cells
Usage
supercell_UMAP(SC, PCA_name = "SC_PCA", n.comp = NULL, n_neighbors = 15, ...)
Arguments
SC |
super-cell structure (output of SCimplify) with a field |
PCA_name |
name of |
n.comp |
number of vector of principal components to use for computing UMAP |
n_neighbors |
number of neighbors (parameter of umap) |
... |
other parameters of umap |
Value
umap result
Violin plots
Description
Violin plots (similar to VlnPlot with some changes for super-cells)
Usage
supercell_VlnPlot(
ge,
supercell_size = NULL,
clusters,
features = NULL,
idents = NULL,
color.use = NULL,
pt.size = 0,
pch = "o",
y.max = NULL,
y.min = NULL,
same.y.lims = FALSE,
adjust = 1,
ncol = NULL,
combine = TRUE,
angle.text.y = 90,
angle.text.x = 45
)
Arguments
ge |
a gene expression matrix (ncol same as number of super-cells) |
supercell_size |
a vector with supercell size (ordered the same way as in |
clusters |
a vector with clustering information (ordered the same way as in |
features |
name of genes of for which gene expression is plotted |
idents |
idents (clusters) to plot (default all) |
color.use |
colors for idents |
pt.size |
point size (0 by default) |
pch |
shape of jitter dots |
y.max |
max of y axis |
y.min |
min of y axis |
same.y.lims |
same y axis for all plots |
adjust |
param of geom_violin |
ncol |
number of columns in combined plot |
combine |
combine plots into a single patchworked ggplot object. If FALSE, return a list of ggplot |
angle.text.y |
rotation of y text |
angle.text.x |
rotation of x text |
Value
combined ggplot or list of ggplots if combine = TRUE
Plot Violin plot for 1 feature
Description
Used for supercell_VlnPlot
Usage
supercell_VlnPlot_single(
ge1,
supercell_size = NULL,
clusters,
feature = NULL,
color.use = NULL,
pt.size = 0,
pch = "o",
y.max = NULL,
y.min = NULL,
adjust = 1,
angle.text.y = 90,
angle.text.x = 45
)
Arguments
ge1 |
a gene expression vector (same length as number of super-cells) |
supercell_size |
a vector with supercell size (ordered the same way as in |
clusters |
a vector with clustering information (ordered the same way as in |
feature |
gene to plot |
color.use |
colors for idents |
pt.size |
point size (0 by default) |
pch |
shape of jitter dots |
y.max |
max of y axis |
y.min |
min of y axis |
adjust |
param of geom_violin |
angle.text.y |
rotation of y text |
angle.text.x |
rotation of x text |
Assign super-cells to the most aboundant cluster
Description
Assign super-cells to the most aboundant cluster
Usage
supercell_assign(
clusters,
supercell_membership,
method = c("jaccard", "relative", "absolute")
)
Arguments
clusters |
a vector of clustering assignment |
supercell_membership |
a vector of assignment of single-cell data to super-cells (membership field of SCimplify function output) |
method |
method to define the most abuldant cell cluster within super-cells. Available: "jaccard" (default), "relative", "absolute".
|
Value
a vector of super-cell assignment to clusters
Cluster super-cell data
Description
Cluster super-cell data
Usage
supercell_cluster(
D,
k = 5,
supercell_size = NULL,
algorithm = c("hclust", "PAM"),
method = NULL,
return.hcl = TRUE
)
Arguments
D |
a dissimilarity matrix or a dist object |
k |
number of clusters |
supercell_size |
a vector with supercell size (ordered the same way as in D) |
algorithm |
which algorithm to use to compute clustering: |
method |
which method of algorithm to use:
|
return.hcl |
whether to return a result of |
Value
a list with components
clustering - vector of clustering assignment of super-cells
algo - the algorithm used
method - method used with an algorithm
hlc - hclust result (only for
"hclust"
algorithm whenreturn.hcl
is TRUE)
Run RNAvelocity for super-cells (slightly modified from https://github.com/velocyto-team/velocyto.R) Not yet adjusted for super-cell size (not sample-weighted)
Description
Run RNAvelocity for super-cells (slightly modified from https://github.com/velocyto-team/velocyto.R) Not yet adjusted for super-cell size (not sample-weighted)
Usage
supercell_estimate_velocity(
emat,
nmat,
smat = NULL,
membership = NULL,
supercell_size = NULL,
do.run.avegaring = (ncol(emat) == length(membership)),
kCells = 10,
...
)
Arguments
emat |
spliced (exonic) count matrix (see https://github.com/velocyto-team/velocyto.R) |
nmat |
unspliced (nascent) count matrix (https://github.com/velocyto-team/velocyto.R) |
smat |
optional spanning read matrix (used in offset calculations) (https://github.com/velocyto-team/velocyto.R) |
membership |
supercell membership ('membership' field of SCimplify) |
supercell_size |
a vector with supercell size (if emat and nmat provided at super-cell level) |
do.run.avegaring |
whether to run averaging of emat & nmat (if nmat provided at a single-cell level) |
kCells |
number of k nearest neighbors (NN) to use in slope calculation smoothing (see https://github.com/velocyto-team/velocyto.R) |
... |
other parameters from https://github.com/velocyto-team/velocyto.R |
Value
results of https://github.com/velocyto-team/velocyto.R plus metacell size vector
Merging independent SuperCell objects
Description
This function merges independent SuperCell objects
Usage
supercell_merge(SCs, fields = c())
Arguments
SCs |
list of SuperCell objects (results of SCimplify ) |
fields |
which additional fields (e.g., metadata) of the the SuperCell objects to keep when merging |
Value
a list with components
membership - assignment of each single cell to a particular metacell
cell.ids - the original ids of single-cells
supercell_size - size of metacells (former super-cells)
gamma - graining level of the merged object (estimated as an average size of metacells as the independent SuperCell objects might have different graining levels)
N.SC - number of obtained metacells
Examples
data(cell_lines) # list with GE - gene expression matrix (logcounts), meta - cell meta data
GE <- cell_lines$GE
cell.meta <- cell_lines$meta
cell.idx.HCC827 <- which(cell.meta == "HCC827")
cell.idx.H838 <- which(cell.meta == "H838")
SC.HCC827 <- SCimplify(GE[,cell.idx.HCC827], # log-normalized gene expression matrix
gamma = 20, # graining level
n.var.genes = 1000,
k.knn = 5, # k for kNN algorithm
n.pc = 10) # number of principal components to use
SC.HCC827$cell.line <- supercell_assign(
cell.meta[cell.idx.HCC827],
supercell_membership = SC.HCC827$membership)
SC.H838 <- SCimplify(GE[,cell.idx.H838], # log-normalized gene expression matrix
gamma = 30, # graining level
n.var.genes = 1000, # number of top var genes to use for the dim reduction
k.knn = 5, # k for kNN algorithm
n.pc = 15) # number of proncipal components to use
SC.H838$cell.line <- supercell_assign(
cell.meta[cell.idx.H838],
supercell_membership = SC.H838$membership)
SC.merged <- supercell_merge(list(SC.HCC827, SC.H838), fields = c("cell.line"))
# compute metacell gene expression for SC.HCC827
SC.GE.HCC827 <- supercell_GE(GE[, cell.idx.HCC827], groups = SC.HCC827$membership)
# compute metacell gene expression for SC.H838
SC.GE.H838 <- supercell_GE(GE[, cell.idx.H838], groups = SC.H838$membership)
# merge GE matricies
SC.GE.merged <- supercell_mergeGE(list(SC.GE.HCC827, SC.GE.H838))
Merging metacell gene expression matrices from several independent SuperCell objects
Description
This function merges independent SuperCell objects
Usage
supercell_mergeGE(SC.GEs)
Arguments
SC.GEs |
list of metacell gene expression matricies (result of supercell_GE ), make sure the order of the gene expression metricies is the same as in the call of supercell_merge |
Value
a merged matrix of gene expression
Examples
# see examples in \link{supercell_merge}
Plot metacell NW
Description
Plot metacell NW
Usage
supercell_plot(
SC.nw,
group = NULL,
color.use = NULL,
lay.method = c("nicely", "fr", "components", "drl", "graphopt"),
lay = NULL,
alpha = 0,
seed = 12345,
main = NA,
do.frames = TRUE,
do.extra.log.rescale = FALSE,
do.directed = FALSE,
log.base = 2,
do.extra.sqtr.rescale = FALSE,
frame.color = "black",
weights = NULL,
min.cell.size = 0,
return.meta = FALSE
)
Arguments
SC.nw |
a super-cell (metacell) network (a field |
group |
an assigment of metacells to any group (for ploting in different colors) |
color.use |
colros to use for groups, if |
lay.method |
method to compute layout of the network (for the moment there several available: "nicely" for layout_nicely and "fr" for layout_with_fr, "components" for layout_components, "drl" for layout_with_drl, "graphopt" for layout_with_graphopt). If your dataset has clear clusters, use "components" |
lay |
a particular layout of a graph to plot (in is not |
alpha |
a rotation of the layout (either provided or computed) |
seed |
a random seed used to compute graph layout |
main |
a title of a plot |
do.frames |
whether to keep vertex.frames in the plot |
do.extra.log.rescale |
whether to log-scale node size (to balance plot if some metacells are large and covers smaller metacells) |
do.directed |
whether to plot edge direction |
log.base |
base with thich to log-scale node size |
do.extra.sqtr.rescale |
whether to sqrt-scale node size (to balance plot if some metacells are large and covers smaller metacells) |
frame.color |
color of node frames, black by default |
weights |
edge weights used for some layout algorithms |
min.cell.size |
do not plot cells with smaller size |
return.meta |
whether to return all the meta data |
Value
plot of a super-cell network
Examples
data(cell_lines) # list with GE - gene expression matrix (logcounts), meta - cell meta data
GE <- cell_lines$GE
cell.meta <- cell_lines$meta
SC <- SCimplify(GE, # gene expression matrix
gamma = 20) # graining level
# Assign metacell to a cell line
SC2cellline <- supercell_assign(
clusters = cell.meta, # single-cell assignment to cell lines
supercell_membership = SC$membership) # single-cell assignment to metacells
# Plot metacell network colored by cell line
supercell_plot(SC$graph.supercells, # network
group = SC2cellline, # group assignment
main = "Metacell colored by cell line assignment",
lay.method = 'nicely')
Plot super-cell NW colored by an expression of a gene (gradient color)
Description
Plot super-cell NW colored by an expression of a gene (gradient color)
Usage
supercell_plot_GE(
SC.nw,
ge,
color.use = c("gray", "blue"),
n.color.gradient = 10,
main = NA,
legend.side = 4,
gene.name = NULL,
...
)
Arguments
SC.nw |
a super-cell network (a field |
ge |
a gene expression vector (same length as number of super-cells) |
color.use |
colors of gradient |
n.color.gradient |
number of bins of the gradient, default is 10 |
main |
plot title |
legend.side |
a side parameter of gradientLegend function (default is 4) |
gene.name |
name of gene of for which gene expression is plotted |
... |
rest of the parameters of supercell_plot function |
Value
plot of a super-cell network with color representing an expression level
Plot super-cell UMAP (Use supercell_DimPlot instead) Plots super-cell UMAP (result of supercell_UMAP)
Description
Plot super-cell UMAP (Use supercell_DimPlot instead) Plots super-cell UMAP (result of supercell_UMAP)
Usage
supercell_plot_UMAP(
SC,
groups,
UMAP_name = "SC_UMAP",
color.use = NULL,
asp = 1,
alpha = 0.7,
title = NULL
)
Arguments
SC |
super-cell structure (output of SCimplify) with a field |
groups |
coloring metacells by groups |
UMAP_name |
the mane of the field containing UMAP result |
color.use |
colors of groups |
asp |
plot aspect ratio |
alpha |
transparency of |
title |
title of the plot |
Value
Plot super-cell tSNE (Use supercell_DimPlot instead) Plots super-cell tSNE (result of supercell_tSNE)
Description
Plot super-cell tSNE (Use supercell_DimPlot instead) Plots super-cell tSNE (result of supercell_tSNE)
Usage
supercell_plot_tSNE(
SC,
groups,
tSNE_name = "SC_tSNE",
color.use = NULL,
asp = 1,
alpha = 0.7,
title = NULL
)
Arguments
SC |
super-cell structure (output of SCimplify) with a field |
groups |
coloring metacells by groups |
tSNE_name |
the mane of the field containing tSNE result |
color.use |
colors of groups |
asp |
plot aspect ratio |
alpha |
transparency of |
title |
title of the plot |
Value
compute PCA for super-cell data (sample-weighted data)
Description
compute PCA for super-cell data (sample-weighted data)
Usage
supercell_prcomp(
X,
genes.use = NULL,
genes.exclude = NULL,
supercell_size = NULL,
k = 20,
do.scale = TRUE,
do.center = TRUE,
fast.pca = TRUE,
seed = 12345
)
Arguments
X |
super-cell transposed gene expression matrix (! where rows represent super-cells and cols represent genes) |
genes.use |
genes to use for dimensionality reduction |
genes.exclude |
genes to exclude from dimensionaloty reduction |
supercell_size |
a vector with supercell sizes (ordered the same way as in X) |
k |
number of components to compute |
do.scale |
scale data before PCA |
do.center |
center data before PCA |
fast.pca |
whether to run fast PCA (works for datasets with |super-cells| > 50) |
seed |
a seed to use for |
Value
the same object as prcomp result
Compute purity of super-cells
Description
Compute purity of super-cells
Usage
supercell_purity(
clusters,
supercell_membership,
method = c("max_proportion", "entropy")[1]
)
Arguments
clusters |
vector of clustering assignment (reference assignment) |
supercell_membership |
vector of assignment of single-cell data to super-cells (membership field of SCimplify function output) |
method |
method to compute super-cell purity.
|
Value
a vector of super-cell purity, which is defined as:
- proportion of the most abundant cluster within super-cell for method = "max_proportion"
or
- Shanon entropy for method = "entropy"
.
With 1 meaning that super-cell consists of single cells from one cluster (reference assignment)
Rescale supercell object
Description
This function recomputes super-cell structure at a different graining level (gamma
) or
for a specific number of super-cells (N.SC
)
Usage
supercell_rescale(SC.object, gamma = NULL, N.SC = NULL)
Arguments
SC.object |
super-cell object (an output from SCimplify function) |
gamma |
new grainig level (provide either |
N.SC |
new number of super-cells (provide either |
Value
the same object as SCimplify at a new graining level
Compute Silhouette index accounting for samlpe size (super cells size) ###
Description
Compute Silhouette index accounting for samlpe size (super cells size) ###
Usage
supercell_silhouette(x, dist, supercell_size = NULL)
Arguments
x |
– clustering |
dist |
- distance among super-cells |
supercell_size |
– super-cell size |
Value
silhouette result
Compute tSNE of super-cells
Description
Computes tSNE of super-cells
Usage
supercell_tSNE(
SC,
PCA_name = "SC_PCA",
n.comp = NULL,
perplexity = 30,
seed = 12345,
...
)
Arguments
SC |
super-cell structure (output of SCimplify) with a field |
PCA_name |
name of |
n.comp |
number of vector of principal components to use for computing tSNE |
perplexity |
perplexity parameter (parameter of Rtsne) |
seed |
random seed |
... |
other parameters of Rtsne |
Value
Rtsne result