Function to infer cell labels using a trained model

This function infers cell labels using a trained model and updates the input dataset with the inferred labels.

Usage

viewmastR_infer(
  query_cds,
  model_dir,
  selected_features,
  query_celldata_col = "viewmastR_inferred",
  labels = NULL,
  verbose = TRUE,
  return_probs = FALSE,
  return_type = c("object", "list"),
  chunks = 1,
  workers = 1,
  batch_size = NULL,
  show_progress = TRUE
)

Arguments

query_cds: Seurat or cell_data_set object - The dataset for which cell labels are to be inferred.
model_dir: character - Path to the trained model file.
selected_features: character vector - Features used for inference (must be the same used during model creation).
query_celldata_col: character string - Name of the column in the query dataset in which to store the inferred cell labels. Default is "viewmastR_inferred".
labels: character vector - optional labels corresponding to the class indices. Default is NULL.
verbose: logical - If TRUE, prints progress messages. Default is TRUE.
return_probs: logical - If TRUE, returns the class probabilities. Default is FALSE.
return_type: A character string, either "object" or "list". If "object", the updated query_cds is returned. If "list", a list containing the updated object and the raw inference results is returned. Default is "object".
chunks: An integer indicating the number of chunks to split the data into for parallelization. Default is 1 (no chunking).
workers: An integer specifying the number of parallel workers to use. Default is 1 (no parallelization).
batch_size: An integer specifying the batch size used during inference. If NULL, a heuristic is used to determine a suitable batch size based on the data and number of chunks.
show_progress: A logical indicating whether to show a progress bar with total elapsed time. Default is TRUE.

Value

Depending on return_type:

"object": Returns the updated query_cds with inferred labels in query_celldata_col and optionally probabilities in the metadata.
"list": Returns a list containing:
  object: The updated query_cds.
  training_output: The raw inference results including probabilities.
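
The two return shapes can be consumed as in the following minimal sketch. It assumes viewmastR is installed and exports viewmastR_infer; the wrapper name infer_with_probs and its arguments (query, model_dir, features) are hypothetical and only serve the illustration. The element names object and training_output are taken from the description above.

```r
# Hypothetical wrapper (not part of viewmastR): requests the "list" return
# type and unpacks the two documented elements.
infer_with_probs <- function(query, model_dir, features) {
  res <- viewmastR::viewmastR_infer(
    query_cds = query,
    model_dir = model_dir,
    selected_features = features,
    return_probs = TRUE,
    return_type = "list"
  )
  # Element names as documented in the Value section.
  list(annotated = res$object, raw = res$training_output)
}
```

With return_type = "object" (the default), the call would return only the updated dataset, so no unpacking is needed.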

Details

The function first checks that all features specified in selected_features are present in query_cds, extracts the normalized counts, and determines whether to run sequentially or in parallel. When parallelization is enabled (workers > 1), the dataset is split into chunks and model inference runs on the chunks in parallel. If show_progress is TRUE, a progress bar shows the number of completed chunks and the total elapsed time.
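
The feature check and chunking described above can be sketched in base R as follows. This is an illustration under assumptions, not the package's internal code: the toy matrix norm_counts, the contiguous splitting scheme, and the sequential lapply (standing in for whatever parallel backend the package uses) are all hypothetical.

```r
# Toy normalized-count matrix: features in rows, cells in columns.
set.seed(1)
norm_counts <- matrix(
  runif(5 * 10), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:10))
)
selected_features <- c("gene1", "gene3", "gene5")

# Stop early if any requested feature is absent from the query data.
missing_features <- setdiff(selected_features, rownames(norm_counts))
if (length(missing_features) > 0) {
  stop("Features missing from query: ", paste(missing_features, collapse = ", "))
}

# Split the cell (column) indices into `chunks` roughly equal contiguous groups.
chunks <- 3
cell_idx <- seq_len(ncol(norm_counts))
chunk_id <- ceiling(cell_idx * chunks / length(cell_idx))
chunk_list <- split(cell_idx, chunk_id)

# Each chunk is a feature-by-cell submatrix that a worker could process.
chunk_mats <- lapply(chunk_list, function(idx) {
  norm_counts[selected_features, idx, drop = FALSE]
})
```

Using drop = FALSE keeps each chunk a matrix even when a chunk holds a single cell.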

The underlying model is loaded from the specified model_dir, and class probabilities (log-odds) are computed for each cell. The function then assigns the most likely label to each cell. Optionally, the probabilities are added to the object's metadata.
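
Assigning the most likely label amounts to taking the highest-scoring class per cell and mapping its index to a label. The sketch below uses a made-up probability matrix and label vector, and assumes 1-based class indices with cells in rows; the layout of the package's actual output is not specified here.

```r
# Toy class-probability matrix: one row per cell, one column per class.
probs <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.3, 0.6,
    0.2, 0.5, 0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cell", 1:3), NULL)
)
labels <- c("B cell", "T cell", "Monocyte")

# Index of the highest-scoring class per cell; ties go to the first maximum.
best_class <- max.col(probs, ties.method = "first")

# Map class indices to human-readable labels.
inferred <- labels[best_class]
names(inferred) <- rownames(probs)
```

The same argmax applies to log-odds, since the ordering of classes within a cell is unchanged by a monotonic transformation.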
