
This function performs iterative latent semantic indexing (LSI) on single-cell data to minimize batch effects and accentuate cell type differences. It accepts either a Monocle3 cell_data_set object or a Seurat object as input. In each iteration, the function:

  1. Applies TF-IDF transformation and Singular Value Decomposition (SVD) to normalize the data.

  2. Clusters the normalized data using Leiden clustering in high-dimensional space.

  3. Identifies over-represented features in the resulting clusters using a simple counting method.

The features identified in step 3 are used to subset the matrix that is normalized in step 1 of the next iteration, and the cycle runs for a specified number of iterations. This method is inspired by Granja et al. (2019) and aims to enhance the separation of cell types while reducing batch effects.
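The loop described above can be sketched in base R on a toy matrix. This is only an illustration under stated assumptions, not the package's implementation: the helpers `tfidf_svd()` and `top_features()` are hypothetical, the toy counts are simulated, k-means stands in for Leiden clustering, and variance of per-cluster detection rates stands in for the counting method of step 3.

```r
set.seed(2020)

# Toy count matrix (assumption): 200 features (rows) x 60 cells (columns).
counts <- matrix(rpois(200 * 60, lambda = 2), nrow = 200,
                 dimnames = list(paste0("feat", 1:200), paste0("cell", 1:60)))

# Hypothetical helper: TF-IDF followed by a truncated SVD.
# Returns a cells x n_dim embedding.
tfidf_svd <- function(mat, n_dim) {
  tf  <- sweep(mat, 2, pmax(colSums(mat), 1), "/")        # per-cell term frequency
  idf <- log(1 + ncol(mat) / pmax(rowSums(mat > 0), 1))   # per-feature inverse document frequency
  sv  <- svd(tf * idf, nu = 0, nv = n_dim)                # idf is recycled down the rows
  sv$v %*% diag(sv$d[seq_len(n_dim)], nrow = n_dim)
}

# Hypothetical helper: rank features by how unevenly they are detected
# across clusters (a stand-in for the counting method in step 3).
top_features <- function(mat, clusters, n) {
  detected <- sapply(split(seq_len(ncol(mat)), clusters),
                     function(idx) rowMeans(mat[, idx, drop = FALSE] > 0))
  scores <- apply(detected, 1, var)
  names(sort(scores, decreasing = TRUE))[seq_len(n)]
}

features   <- rownames(counts)   # first pass uses all features
n_clusters <- c(2, 3, 4)         # stand-in for increasing Leiden resolution

for (i in seq_along(n_clusters)) {
  # Step 1: normalize the current feature subset and reduce dimensions.
  lsi <- tfidf_svd(counts[features, , drop = FALSE], n_dim = 5)
  # Step 2: cluster cells in the reduced space (k-means instead of Leiden).
  clusters <- kmeans(lsi, centers = n_clusters[i], nstart = 5)$cluster
  # Step 3: pick features for the next iteration.
  features <- top_features(counts, clusters, n = 50)
}

dim(lsi)
```

After the loop, `lsi` holds the embedding from the last iteration and `features` holds the features that a further iteration would use.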

Usage

iterative_LSI(
  object,
  num_dim = 25,
  starting_features = NULL,
  resolution = c(1e-04, 3e-04, 5e-04),
  do_tf_idf = TRUE,
  num_features = c(3000, 3000, 3000),
  exclude_features = NULL,
  binarize = FALSE,
  scale = TRUE,
  log_transform = TRUE,
  LSI_method = 1,
  partition_qval = 0.05,
  seed = 2020,
  scale_to = 10000,
  leiden_k = 20,
  leiden_weight = FALSE,
  leiden_iter = 1,
  verbose = FALSE,
  return_iterations = FALSE,
  run_umap = FALSE,
  ...
)

Arguments

object

A Monocle3 cell_data_set object or a Seurat object.

num_dim

Integer specifying the number of LSI components (dimensions retained from the SVD) to use in downstream analysis. Default is 25.

starting_features

Optional character vector of starting features (e.g., genes or peaks) to use in the first iteration.

resolution

Numeric vector specifying the resolution parameters for Leiden clustering at each iteration. The number of iterations is determined by the length of this vector. Default is c(1e-04, 3e-04, 5e-04), which gives three iterations.

do_tf_idf

Logical indicating whether to perform TF-IDF transformation. Default is TRUE.

num_features

Integer or numeric vector specifying the number of features to use for dimensionality reduction at each iteration. If a single integer is provided, it is used for all iterations. Default is c(3000, 3000, 3000).
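As a rough illustration of how a single num_features value relates to the per-iteration resolution vector, the following sketch recycles a scalar to one value per iteration. How the function handles this internally is an assumption; only the documented behavior (one resolution per iteration, a single value reused for all iterations) is taken from this page.

```r
resolution   <- c(1e-04, 3e-04, 5e-04)   # one value per iteration
num_features <- 3000                     # single value supplied by the user

# The number of iterations follows from the length of `resolution`.
n_iter <- length(resolution)

# Reuse the single value for every iteration.
num_features <- rep_len(num_features, n_iter)
num_features
```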

exclude_features

Optional character vector of features (rownames of the data) to exclude from analysis.

binarize

Logical indicating whether to binarize the data prior to TF-IDF transformation. Default is FALSE.

scale

Logical indicating whether to scale the data to scale_to. Default is TRUE.

log_transform

Logical indicating whether to log-transform the data after scaling. Default is TRUE.

scale_to

Numeric value specifying the scaling factor if scale is TRUE. Default is 10000.
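The preprocessing options binarize, scale, scale_to and log_transform can be pictured with a small base R sketch. It assumes that scaling means normalizing each cell (column) to a total of scale_to and that the log transform is log1p; the function's exact formulas may differ, and the toy matrix is made up for illustration.

```r
# Toy counts (assumption): 4 features (rows) x 3 cells (columns).
counts <- matrix(c(1, 0, 3, 6,
                   2, 2, 0, 4,
                   0, 5, 5, 0), nrow = 4)

# binarize = TRUE would reduce counts to presence/absence before TF-IDF.
binarized <- (counts > 0) * 1

# scale = TRUE: rescale every cell so that its values sum to scale_to.
scale_to <- 10000
scaled <- sweep(counts, 2, colSums(counts), "/") * scale_to

# log_transform = TRUE: log-transform after scaling; log1p keeps zeros at zero.
logged <- log1p(scaled)

colSums(scaled)
```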

leiden_k

Integer specifying the number of nearest neighbors (k) for Leiden clustering. Default is 20.

leiden_weight

Logical indicating whether to use edge weights in Leiden clustering. Default is FALSE.

leiden_iter

Integer specifying the number of iterations for Leiden clustering. Default is 1.

verbose

Logical indicating whether to display progress messages. Default is FALSE.

run_umap

Logical indicating whether to run UMAP. Default is FALSE.

...

Additional arguments passed to lower-level functions.

seed

Integer specifying the random seed for reproducibility. Default is 2020.

return_object

Logical indicating whether to return the updated input object with LSI reduction and clustering results. Default is TRUE.

Value

If return_object is TRUE, returns the updated input object (Monocle3 cell_data_set or Seurat object) with LSI reduction and clustering results added. If FALSE, returns a list with elements:

lsi_embeddings

The final LSI embeddings.

clusters

The clustering assignments.

iterations

A list containing intermediate results from each iteration.

Details

The function performs iterative LSI as described in Granja et al. (2019), adapting methods from Cusanovich et al. (2018). It is suitable for processing single-cell ATAC-seq or RNA-seq data to identify meaningful clusters and reduce batch effects.

References

Granja, J. M., et al. (2019). Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nature Biotechnology, 37(12), 1458–1465.

Cusanovich, D. A., et al. (2018). The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature, 555(7697), 538–542.

Examples

if (FALSE) {
# For a Monocle3 cell_data_set object:
cds <- iterative_LSI(
  object = cds,
  num_dim = 30,
  resolution = c(1e-4, 3e-4, 5e-4)
)

# For a Seurat object:
seurat_obj <- iterative_LSI(
  object = seurat_obj,
  num_dim = 30,
  resolution = c(0.2, 0.5, 0.8)
)
}