
Projects data from one single-cell object (or a similar structure) into the UMAP embedding of another object. The target object must already contain a reduced dimension computed with a method such as LSI or PCA, along with the UMAP model fit on it. This allows new data to be co-embedded in an existing embedding space.

Usage

project_data(
  projector = NULL,
  projectee = NULL,
  ncells_coembedding = 5000,
  scale = FALSE,
  reduced_dim = "LSI",
  embedding = "UMAP",
  make_pseudo_single_cells = FALSE,
  features = c("annotation-based", "range-based"),
  n = 250,
  verbose = TRUE,
  threads = 6,
  seed = 2020,
  force = FALSE
)

Arguments

projector

A cell_data_set or Seurat object containing a reduced dimension matrix (e.g., LSI) and the model used to generate a UMAP embedding.

projectee

A SummarizedExperiment, cell_data_set, or Seurat object that will be projected into the UMAP space defined by the projector.

ncells_coembedding

Numeric, the number of cells to use from the projector for co-embedding with the pseudo-single cells. Default is 5000, or the total number of cells if less than 5000.
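
The default described above amounts to capping the request at the number of cells available in the projector. A minimal base R sketch follows; `pick_coembedding_cells` is a hypothetical helper, not part of the package, and random subsampling is an assumption about how the cells are chosen.

```r
# Hypothetical helper (not a package function): choose projector cells to co-embed.
pick_coembedding_cells <- function(n_projector_cells,
                                   ncells_coembedding = 5000,
                                   seed = 2020) {
  set.seed(seed)
  # Never ask for more cells than the projector holds.
  n_use <- min(ncells_coembedding, n_projector_cells)
  # Assumption: cells are drawn at random without replacement.
  sort(sample.int(n_projector_cells, n_use))
}

small <- pick_coembedding_cells(1200)   # fewer than 5000 cells: all are kept
large <- pick_coembedding_cells(20000)  # more than 5000 cells: 5000 are sampled
```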

scale

Logical, whether to scale the projected data after projection. Default is FALSE.

reduced_dim

Character, specifies the reduced dimension method (e.g., "LSI" or "PCA"). Default is "LSI".

embedding

Character, specifies the embedding type to use (currently only "UMAP" is supported). Default is "UMAP".

make_pseudo_single_cells

Logical, whether to simulate pseudo-single cells from the projectee data, useful for bulk data. Default is FALSE.

features

Character vector, specifying whether the projection should be based on "annotation-based" or "range-based" features. Default is c("annotation-based", "range-based").
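
A vector default of this form is commonly resolved in R with match.arg(), which returns the first element when the caller leaves the argument untouched. Whether project_data does this internally is an assumption; the sketch below only illustrates the idiom, using a hypothetical function `choose_features`.

```r
# Hypothetical function showing how a vector default is usually resolved.
choose_features <- function(features = c("annotation-based", "range-based")) {
  # match.arg() returns the first choice when the argument is left at its default,
  # and validates any value the caller passes explicitly.
  match.arg(features)
}

default_choice <- choose_features()
explicit_choice <- choose_features("range-based")
```

If the function follows this idiom, passing a single value such as features = "range-based" makes the choice explicit rather than relying on the order of the default vector.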

n

Integer, the number of subsampled "pseudo single cells" per bulk sample, relevant when make_pseudo_single_cells is TRUE. Default is 250.
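
To make the idea of subsampled pseudo single cells concrete, the sketch below draws n sparse count profiles from one bulk sample by multinomial sampling. This is an illustration under stated assumptions, not the package's implementation: `simulate_pseudo_cells` and its `depth` parameter (counts per pseudo cell) are hypothetical.

```r
# Hypothetical helper: draw n pseudo single cells from one bulk count vector.
simulate_pseudo_cells <- function(bulk_counts, n = 250, depth = 1000, seed = 2020) {
  set.seed(seed)
  # Each column is one pseudo cell: `depth` counts are distributed across
  # features with probability proportional to the bulk signal.
  pseudo <- rmultinom(n, size = depth, prob = bulk_counts)
  rownames(pseudo) <- names(bulk_counts)
  pseudo
}

bulk <- c(peak1 = 500, peak2 = 0, peak3 = 1500, peak4 = 3000)
pseudo_cells <- simulate_pseudo_cells(bulk, n = 250, depth = 1000)
```

The result is a features-by-cells matrix with 250 columns, each of which can be treated like a single cell during projection.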

verbose

Logical, whether to print verbose output during execution. Default is TRUE.

threads

Integer, the number of threads for parallel execution. Default is 6.

seed

Integer, seed for reproducibility. Default is 2020.

force

Logical, whether to continue even if the overlap ratio of features between the projector and projectee is below a certain threshold. Default is FALSE.
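
The kind of guard that force overrides can be sketched as follows. The helper `check_feature_overlap`, the definition of the overlap ratio (the fraction of projector features found in the projectee), and the 0.5 threshold are all assumptions made for illustration; the actual threshold is not stated here.

```r
# Hypothetical check; the 0.5 threshold is a placeholder, not the package's value.
check_feature_overlap <- function(projector_features, projectee_features,
                                  min_overlap = 0.5, force = FALSE) {
  # Fraction of projector features that are also present in the projectee.
  overlap <- mean(projector_features %in% projectee_features)
  if (overlap < min_overlap && !force) {
    stop(sprintf("Feature overlap %.2f is below %.2f; set force = TRUE to continue.",
                 overlap, min_overlap))
  }
  if (overlap < min_overlap) {
    warning(sprintf("Continuing with low feature overlap (%.2f).", overlap))
  }
  overlap
}

ref_features <- c("geneA", "geneB", "geneC", "geneD")
new_features <- c("geneA", "geneX")

# With force = TRUE the check warns and returns the ratio.
forced <- suppressWarnings(
  check_feature_overlap(ref_features, new_features, force = TRUE)
)
# Without force the same input stops with an error.
blocked <- tryCatch(
  check_feature_overlap(ref_features, new_features),
  error = function(e) "stopped"
)
```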

Value

A SimpleList object containing the UMAP coordinates of the projected data (both original single-cell and simulated bulk data) and the reduced dimension matrix.

Details

The function projects data through the UMAP model stored in the projector, which suits comparisons between single-cell datasets. It can also simulate pseudo-single cells so that bulk data can be projected into the same space. It is adapted from "ArchR: An integrative and scalable software package for single-cell chromatin accessibility analysis" by Granja et al. (2020).