Iterative Latent Semantic Indexing (LSI) for Single-Cell Data (Compatible with Monocle3 and Seurat) with optional UMAP
iterative_LSI.Rd
This function performs iterative LSI on single-cell data to reduce batch effects and accentuate cell type differences. It accepts either a Monocle3 cell_data_set object or a Seurat object as input. In each iteration, the function:
1. Applies TF-IDF transformation and Singular Value Decomposition (SVD) to normalize the data.
2. Clusters the normalized data using Leiden clustering in high-dimensional space.
3. Identifies over-represented features in the resulting clusters using a simple counting method.
The features identified in step 3 are used to subset the matrix that is normalized in step 1 of the next iteration, and the cycle runs for a specified number of iterations. This method is inspired by Granja et al. (2019) and aims to enhance the separation of cell types while reducing batch effects.
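The loop described above can be sketched in base R on simulated counts. This is an illustrative toy, not the package's implementation: `tf_idf()`, `lsi_embed()`, and `top_features()` are hypothetical helpers, k-means stands in for Leiden clustering, and ranking features by variance across cluster aggregates stands in for the counting method in step 3.

```r
set.seed(2020)

# Simulated counts: 200 features x 60 cells; the first 20 features are
# enriched in the first 30 cells.
n_feat <- 200; n_cell <- 60
group  <- rep(c(1, 2), each = n_cell / 2)
lambda <- matrix(1, n_feat, n_cell)
lambda[1:20, group == 1] <- 6
counts <- matrix(rpois(n_feat * n_cell, as.vector(lambda)), n_feat, n_cell,
                 dimnames = list(paste0("f", seq_len(n_feat)),
                                 paste0("c", seq_len(n_cell))))

# Step 1a: one common TF-IDF variant (assumed; other variants exist).
tf_idf <- function(m) {
  tf  <- sweep(m, 2, colSums(m), "/")        # term frequency within each cell
  idf <- log(1 + ncol(m) / rowSums(m > 0))   # inverse document frequency per feature
  tf * idf                                   # idf is recycled down the rows
}

# Step 1b: SVD; cell embeddings are right singular vectors scaled by singular values.
lsi_embed <- function(m, num_dim) {
  s <- svd(tf_idf(m), nu = 0, nv = num_dim)
  s$v %*% diag(s$d[seq_len(num_dim)], num_dim)   # cells x num_dim
}

# Step 3 stand-in: aggregate counts per cluster, depth-scale, log-transform,
# and keep the n features that vary most across clusters.
top_features <- function(m, clusters, n) {
  agg <- sapply(sort(unique(clusters)),
                function(k) rowSums(m[, clusters == k, drop = FALSE]))
  agg <- log1p(sweep(agg, 2, colSums(agg), "/") * 1e4)
  names(sort(apply(agg, 1, var), decreasing = TRUE))[seq_len(n)]
}

n_clusters   <- c(2, 2, 2)      # one entry per iteration (stand-in for `resolution`)
num_features <- c(50, 50, 50)
features     <- rownames(counts)

for (i in seq_along(n_clusters)) {
  emb      <- lsi_embed(counts[features, ], num_dim = 5)                   # step 1
  clusters <- kmeans(emb, centers = n_clusters[i], nstart = 10)$cluster    # step 2
  if (i < length(n_clusters)) {
    features <- top_features(counts, clusters, num_features[i])            # step 3
  }
}
```

After the loop, `emb` holds the embedding from the last iteration and `features` holds the feature set that embedding was computed on.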
Usage
iterative_LSI(
  object,
  num_dim = 25,
  starting_features = NULL,
  resolution = c(1e-04, 3e-04, 5e-04),
  do_tf_idf = TRUE,
  num_features = c(3000, 3000, 3000),
  exclude_features = NULL,
  binarize = FALSE,
  scale = TRUE,
  log_transform = TRUE,
  LSI_method = 1,
  partition_qval = 0.05,
  seed = 2020,
  scale_to = 10000,
  leiden_k = 20,
  leiden_weight = FALSE,
  leiden_iter = 1,
  verbose = FALSE,
  return_iterations = FALSE,
  run_umap = FALSE,
  ...
)
Arguments
- object
A Monocle3 cell_data_set object or a Seurat object.
- num_dim
Integer specifying the number of principal components to use in downstream analysis. Default is 25.
- starting_features
Optional character vector of starting features (e.g., genes or peaks) to use in the first iteration.
- resolution
Numeric vector specifying the resolution parameter for Leiden clustering at each iteration. The number of iterations is determined by the length of this vector. Default is c(1e-04, 3e-04, 5e-04), which gives three iterations.
- do_tf_idf
Logical indicating whether to perform TF-IDF transformation. Default is TRUE.
- num_features
Integer or numeric vector specifying the number of features to use for dimensionality reduction at each iteration. If a single integer is provided, it is used for all iterations. Default is c(3000, 3000, 3000), i.e. 3000 features in each iteration.
- exclude_features
Optional character vector of features (rownames of the data) to exclude from analysis.
- binarize
Logical indicating whether to binarize the data prior to TF-IDF transformation. Default is FALSE.
- scale
Logical indicating whether to scale the data to scale_to. Default is TRUE.
- log_transform
Logical indicating whether to log-transform the data after scaling. Default is TRUE.
- scale_to
Numeric value specifying the scaling factor if scale is TRUE. Default is 10000.
- leiden_k
Integer specifying the number of nearest neighbors (k) for Leiden clustering. Default is 20.
- leiden_weight
Logical indicating whether to use edge weights in Leiden clustering. Default is FALSE.
- leiden_iter
Integer specifying the number of iterations for Leiden clustering. Default is 1.
- verbose
Logical indicating whether to display progress messages. Default is FALSE.
- run_umap
Logical indicating whether to run UMAP. Default is FALSE.
- ...
Additional arguments passed to lower-level functions.
- random_seed
Integer specifying the random seed for reproducibility. Default is 2020.
- return_object
Logical indicating whether to return the updated input object with LSI reduction and clustering results. Default is TRUE.
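How the per-iteration arguments and the scaling options fit together can be illustrated with a short base R sketch. It assumes that a single `num_features` value is recycled to the length of `resolution`, that scaling means per-cell depth normalization to `scale_to`, and that the log transform is `log1p`; none of these details is confirmed by this page, and the variable names are for illustration only.

```r
# Per-iteration arguments: the length of `resolution` sets the iteration count.
resolution   <- c(1e-04, 3e-04, 5e-04)
num_features <- 3000                      # a single value applies to every iteration
n_iter       <- length(resolution)
if (length(num_features) == 1) {
  num_features <- rep(num_features, n_iter)
}
stopifnot(length(num_features) == n_iter)

# Scaling and log transform on a tiny 3-feature x 2-cell count matrix.
counts   <- matrix(c(1, 3, 0, 4, 2, 2), nrow = 3)
scale_to <- 10000
scaled   <- sweep(counts, 2, colSums(counts), "/") * scale_to  # each cell sums to scale_to
logged   <- log1p(scaled)                                      # assumed log transform
```

Under these assumptions every column of `scaled` sums to `scale_to`, so cells with different sequencing depth become comparable before the log transform compresses large values.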
Value
If return_object is TRUE, returns the updated input object (Monocle3 cell_data_set or Seurat object) with LSI reduction and clustering results added. If FALSE, returns a list with elements:
- lsi_embeddings
The final LSI embeddings.
- clusters
The clustering assignments.
- iterations
A list containing intermediate results from each iteration.
Details
The function performs iterative LSI as described in Granja et al. (2019), adapting methods from Cusanovich et al. (2018). It is suitable for processing single-cell ATAC-seq or RNA-seq data to identify meaningful clusters and reduce batch effects.
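For orientation, one common formulation of the TF-IDF and SVD steps is written out below. It is an assumption for illustration: this page does not state which TF-IDF variant LSI_method selects, and the symbols are introduced here. Let $x_{ij}$ be the count for feature $i$ in cell $j$, $N$ the number of cells, and $k$ the number of retained dimensions (num_dim).

```latex
\begin{aligned}
\mathrm{TF}_{ij}    &= \frac{x_{ij}}{\sum_{i'} x_{i'j}}, \\
\mathrm{IDF}_{i}    &= \log\!\left(1 + \frac{N}{\sum_{j} \mathbf{1}[x_{ij} > 0]}\right), \\
\mathrm{TFIDF}_{ij} &= \mathrm{TF}_{ij} \cdot \mathrm{IDF}_{i}, \\
\mathrm{TFIDF}      &\approx U_k \Sigma_k V_k^{\top}, \qquad
\text{cell embeddings} = V_k \Sigma_k .
\end{aligned}
```

Each row of $V_k \Sigma_k$ gives the coordinates of one cell in the $k$-dimensional LSI space on which clustering is performed.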
References
Granja, J. M., et al. (2019). Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nature Biotechnology, 37(12), 1458–1465.
Cusanovich, D. A., et al. (2018). The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature, 555(7697), 538–542.