Project Data into UMAP Embedding of Single-Cell Object (Compatible with Monocle3 and Seurat)
project_data.Rd
This function projects data from one single-cell object (or a similar structure) into the UMAP embedding of another object. The projector must already contain a reduced dimension matrix (e.g., from LSI or PCA) together with the UMAP model computed from it. This allows new data to be co-embedded into an existing embedding space.
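The projection step can be illustrated with a minimal base-R sketch of LSI projection. It assumes a simple TF-IDF variant and uses `svd()` in place of a truncated solver. The helpers `tf_idf()`, `fit_lsi()`, and `project_lsi()` are hypothetical and are not part of this package. The idea it shows is that the projectee is transformed with the projector's stored IDF weights and singular vectors, rather than refitting them.

```r
# Hypothetical sketch (not this package's implementation): fit a simple LSI
# on a projector count matrix (features x cells) and project new data into it.

tf_idf <- function(counts, idf) {
  # Term frequency: normalise each cell (column) by its total counts
  tf <- sweep(counts, 2, colSums(counts), "/")
  # Weight each feature (row) by the fixed IDF, then log-transform
  log1p(tf * idf * 1e4)
}

fit_lsi <- function(counts, n_dims = 2) {
  # IDF is learned from the projector only and reused at projection time
  idf <- ncol(counts) / pmax(rowSums(counts > 0), 1)
  mat <- tf_idf(counts, idf)
  s <- svd(mat, nu = n_dims, nv = n_dims)
  d <- s$d[seq_len(n_dims)]
  list(
    idf = idf,
    u = s$u,
    d = d,
    # Cell embedding V %*% diag(d): one row per projector cell
    embedding = s$v %*% diag(d, nrow = n_dims)
  )
}

project_lsi <- function(model, new_counts) {
  # Apply the projector's IDF, then project onto its left singular vectors
  mat <- tf_idf(new_counts, model$idf)
  t(mat) %*% model$u
}
```

Projecting the projector's own counts reproduces its stored embedding, which is a quick consistency check. In the full workflow these reduced-dimension coordinates would then be passed through the projector's stored UMAP model; the sketch stops before that step.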
Usage
project_data(
  projector = NULL,
  projectee = NULL,
  ncells_coembedding = 5000,
  scale = FALSE,
  reduced_dim = "LSI",
  embedding = "UMAP",
  make_pseudo_single_cells = FALSE,
  features = c("annotation-based", "range-based"),
  n = 250,
  verbose = TRUE,
  threads = 6,
  seed = 2020,
  force = FALSE
)
Arguments
- projector
A cell_data_set or Seurat object containing a reduced dimension matrix (e.g., LSI) and the model used to generate its UMAP embedding.
- projectee
A SummarizedExperiment, cell_data_set, or Seurat object that will be projected into the UMAP space defined by the projector.
- ncells_coembedding
Numeric, the number of cells from the projector to co-embed with the pseudo-single cells. Default is 5000; if the projector has fewer than 5000 cells, all of its cells are used.
- scale
Logical, whether to scale the data after projection. Default is FALSE.
- reduced_dim
Character, specifies the reduced dimension method (e.g., "LSI" or "PCA"). Default is "LSI".
- embedding
Character, the embedding type to use (currently only "UMAP" is supported). Default is "UMAP".
- make_pseudo_single_cells
Logical, whether to simulate pseudo-single cells from the projectee data; useful when the projectee contains bulk samples. Default is FALSE.
- features
Character vector specifying whether the projection is based on "annotation-based" or "range-based" features. Default is c("annotation-based", "range-based").
- n
Integer, the number of pseudo-single cells subsampled per bulk sample; used only when make_pseudo_single_cells is TRUE. Default is 250.
- verbose
Logical, whether to print progress messages during execution. Default is TRUE.
- threads
Integer, the number of threads for parallel execution. Default is 6.
- seed
Integer, seed for reproducibility. Default is 2020.
- force
Logical, whether to continue even if the overlap ratio of features between the projector and projectee is below the function's minimum threshold. Default is FALSE.
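Two of the arguments above describe simple checks that can be sketched in base R: the feature-overlap guard controlled by `force`, and the cap on co-embedded projector cells set by `ncells_coembedding`. This is a hypothetical illustration only. The helper names, the 0.5 threshold, and the definition of the overlap ratio (shared features divided by projector features) are assumptions, since the actual threshold and ratio are not documented here.

```r
# Hypothetical helpers (not part of this package) illustrating `force`
# and `ncells_coembedding`. The 0.5 threshold is an assumed value.

check_feature_overlap <- function(projector_features, projectee_features,
                                  min_ratio = 0.5, force = FALSE) {
  # Assumed ratio: fraction of projector features present in the projectee
  ratio <- length(intersect(projector_features, projectee_features)) /
    length(unique(projector_features))
  if (ratio < min_ratio) {
    msg <- sprintf("Feature overlap ratio %.2f is below %.2f", ratio, min_ratio)
    # Stop by default; with force = TRUE, warn and continue
    if (!force) stop(msg, call. = FALSE)
    warning(msg, call. = FALSE)
  }
  ratio
}

choose_coembedding_cells <- function(cell_ids, ncells_coembedding = 5000,
                                     seed = 2020) {
  # Use all cells when there are fewer than ncells_coembedding
  k <- min(ncells_coembedding, length(cell_ids))
  set.seed(seed)  # note: this resets the global RNG state
  # Index-based sampling avoids sample()'s length-one numeric gotcha
  cell_ids[sample.int(length(cell_ids), k)]
}
```

Stopping by default and only warning under `force = TRUE` mirrors the documented behaviour of `force`: a poorly overlapping feature set usually means the projectee cannot be placed meaningfully in the projector's space.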
Value
A SimpleList object containing the UMAP coordinates of the projected data (both the original single cells and the simulated bulk data) and the reduced dimension matrix.
Details
The function projects new data with the UMAP model stored in the projector, so that new samples can be compared with existing single-cell data in a shared embedding. It can also simulate pseudo-single cells from bulk data so that bulk samples can be projected alongside single cells. It is adapted from ArchR: "ArchR: An integrative and scalable software package for single-cell chromatin accessibility analysis" by Granja et al. (2020).
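Pseudo-single-cell simulation can be sketched in base R as repeated subsampling of each bulk profile. This sketch assumes that each pseudo-cell is a multinomial draw of a fixed number of reads from the bulk sample's feature proportions. The helper `make_pseudo_cells()` and its `depth` parameter are hypothetical and are not arguments of project_data(); only `n` and `seed` correspond to documented arguments.

```r
# Hypothetical sketch (not this package's implementation) of simulating
# `n` pseudo-single cells per bulk sample by multinomial read sampling.

make_pseudo_cells <- function(bulk, n = 250, depth = 1000, seed = 2020) {
  # bulk: features x samples count matrix with column names
  set.seed(seed)
  sims <- lapply(seq_len(ncol(bulk)), function(j) {
    # Feature proportions of bulk sample j
    prob <- bulk[, j] / sum(bulk[, j])
    # Each column is one pseudo-cell with exactly `depth` reads
    cells <- rmultinom(n, size = depth, prob = prob)
    rownames(cells) <- rownames(bulk)
    colnames(cells) <- paste0(colnames(bulk)[j], "#", seq_len(n))
    cells
  })
  # Combine all samples into one features x (n * samples) matrix
  do.call(cbind, sims)
}
```

The resulting matrix has the same features as the bulk input, so it could then be transformed with the projector's reduced dimension model before UMAP projection. Features with zero counts in a bulk sample are never sampled for its pseudo-cells.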