Function to infer cell labels using a trained model
viewmastR_infer.Rd
This function infers cell labels using a trained model and updates the input dataset with the inferred labels.
Usage
viewmastR_infer(
  query_cds,
  model_dir,
  selected_features,
  query_celldata_col = "viewmastR_inferred",
  labels = NULL,
  verbose = TRUE,
  return_probs = FALSE,
  return_type = c("object", "list"),
  chunks = 1,
  workers = 1,
  batch_size = NULL,
  show_progress = TRUE
)
Arguments
- query_cds
Seurat or cell_data_set object - The dataset for which cell labels are to be inferred.
- model_dir
character - path to the trained model file.
- selected_features
character vector - Features used for inference (must be the same used during model creation).
- query_celldata_col
character - name of the column in the query dataset in which to store the inferred cell labels. Default is "viewmastR_inferred".
- labels
character vector - optional labels corresponding to the class indices. Default is NULL.
- verbose
logical - whether to show messages. Default is TRUE.
- return_probs
logical - If TRUE, returns the class probabilities. Default is FALSE.
- return_type
A character string, either "object" or "list". If "object", the updated query_cds is returned. If "list", a list containing the updated object and the raw inference results is returned. Default is "object".
- chunks
An integer indicating the number of chunks to split the data into for parallelization. Default is 1 (no chunking).
- workers
An integer specifying the number of parallel workers to use. Default is 1 (no parallelization).
- batch_size
An integer specifying the batch size used during inference. If NULL, a heuristic is used to determine a suitable batch size based on the data and number of chunks.
- show_progress
A logical indicating whether to show a progress bar with total elapsed time. Default is TRUE.
Value
Depending on return_type:
- "object"
Returns the updated query_cds with inferred labels in query_celldata_col and, optionally, probabilities in the metadata.
- "list"
Returns a list containing object (the updated query_cds) and training_output (the raw inference results, including probabilities).
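A call requesting the "list" return type might look like the sketch below. The wrapper name infer_labels_example and its features argument are hypothetical, introduced only for illustration; the argument names passed to viewmastR_infer are taken from the Usage section, and the example assumes the function is exported from an installed viewmastR package.

```r
# Hypothetical wrapper (not part of viewmastR): the caller supplies the query
# object, the path to the trained model, and the feature vector used in training.
infer_labels_example <- function(query_cds, model_dir, features) {
  if (!requireNamespace("viewmastR", quietly = TRUE)) {
    stop("The viewmastR package is required for this example.")
  }
  viewmastR::viewmastR_infer(
    query_cds = query_cds,
    model_dir = model_dir,
    selected_features = features,
    query_celldata_col = "viewmastR_inferred",
    return_probs = TRUE,
    return_type = "list"   # ask for both the object and the raw results
  )
}

# Usage, with placeholder inputs that must exist in your session:
# res <- infer_labels_example(my_query, "path/to/model", my_features)
# res$object           # the updated query object
# res$training_output  # the raw inference results
```

With return_type = "object" (the default), the call would instead return only the updated query object.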
Details
The function first checks that all variable features specified in selected_features
are present in query_cds, extracts normalized counts, and determines whether to run
sequentially or in parallel. When parallelization is enabled (workers > 1), the
dataset is split into chunks, and each chunk is processed and run through the model
inference in parallel. A progress bar is displayed showing the number of completed
chunks and total elapsed time.
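The feature check and chunking described above can be illustrated with a small base R sketch. This is not the package's implementation: the toy matrix, the chunking formula, and the placeholder fake_infer are all assumptions made for illustration, and lapply() stands in for whatever parallel backend is actually used.

```r
# Toy stand-in for normalized counts: features in rows, cells in columns.
set.seed(1)
norm_counts <- matrix(
  runif(4 * 10), nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:10))
)
selected_features <- c("gene1", "gene3", "gene4")
chunks <- 3

# 1. Fail early if any requested feature is absent from the query data.
missing_features <- setdiff(selected_features, rownames(norm_counts))
if (length(missing_features) > 0) {
  stop("Features missing from query: ", paste(missing_features, collapse = ", "))
}
query_mat <- norm_counts[selected_features, , drop = FALSE]

# 2. Assign each cell (column) to one of `chunks` contiguous, near-equal groups.
n_cells <- ncol(query_mat)
chunk_id <- ceiling(seq_len(n_cells) * chunks / n_cells)
chunk_cols <- split(seq_len(n_cells), chunk_id)

# 3. Process chunks one at a time; a parallel backend would replace lapply().
#    `fake_infer` is a placeholder, not the package's model call.
fake_infer <- function(m) colSums(m)
chunk_results <- lapply(
  chunk_cols,
  function(cols) fake_infer(query_mat[, cols, drop = FALSE])
)

# 4. Reassemble per-cell results; chunks are contiguous, so order is preserved.
per_cell <- unlist(chunk_results, use.names = FALSE)
names(per_cell) <- colnames(query_mat)
```

Because the chunks are contiguous blocks of columns, concatenating the chunk results restores the original cell order without any re-sorting.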
The underlying model is loaded from the specified model_dir, and class
probabilities (log-odds) are computed for each cell. The function then assigns the
most likely label to each cell. Optionally (return_probs = TRUE), the probabilities
are added to the object's metadata.
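The label assignment step can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: it assumes the model emits one score per class for each cell and that a softmax converts those scores to probabilities. The score values and label names are invented for the example.

```r
# Toy score matrix: one row per cell, one column per class index.
scores <- matrix(
  c( 0.2,  2.0, -1.0,
     1.5, -0.5,  0.1,
    -2.0,  0.3,  3.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cell", 1:3), NULL)
)
# Hypothetical labels, ordered to match the class indices (columns).
labels <- c("T cell", "B cell", "Monocyte")

# Softmax (assumed): subtracting each row's maximum first keeps exp() stable.
shifted <- scores - apply(scores, 1, max)
probs <- exp(shifted) / rowSums(exp(shifted))

# The most likely class is the column with the highest probability in each row.
class_idx <- max.col(probs, ties.method = "first")
predicted <- labels[class_idx]
```

Since softmax preserves the ordering of scores within a row, the assigned label is the same whether the maximum is taken over the raw scores or over the probabilities.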