This function infers cell labels using a trained model and updates the input dataset with the inferred labels.

Usage

viewmastR_infer(
  query_cds,
  model_dir,
  selected_features,
  query_celldata_col = "viewmastR_inferred",
  labels = NULL,
  verbose = TRUE,
  return_probs = FALSE,
  return_type = c("object", "list"),
  chunks = 1,
  workers = 1,
  batch_size = NULL,
  show_progress = TRUE
)

Arguments

query_cds

Seurat or cell_data_set object - The dataset for which cell labels are to be inferred.

model_dir

character - Path to the trained model file.

selected_features

character vector - Features used for inference (must be the same used during model creation).

query_celldata_col

character string - Name of the column in which to store the inferred cell labels in the query dataset. Default is "viewmastR_inferred".

labels

character vector - Optional labels corresponding to the class indices. Default is NULL.

verbose

logical - If TRUE, prints status messages. Default is TRUE.

return_probs

logical - If TRUE, returns the class probabilities. Default is FALSE.

return_type

A character string, either "object" or "list". If "object", the updated query_cds is returned. If "list", a list containing the updated object and the raw inference results is returned. Default is "object".

chunks

An integer indicating the number of chunks to split the data into for parallelization. Default is 1 (no chunking).

workers

An integer specifying the number of parallel workers to use. Default is 1 (no parallelization).

batch_size

An integer specifying the batch size used during inference. If NULL, a heuristic is used to determine a suitable batch size based on the data and number of chunks.

show_progress

A logical indicating whether to show a progress bar with total elapsed time. Default is TRUE.

Value

Depending on return_type:

  • "object": Returns the updated query_cds with inferred labels in query_celldata_col and, optionally, probabilities in the metadata.

  • "list": Returns a list containing:

      • object: The updated query_cds.

      • training_output: The raw inference results, including probabilities.
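As an illustration of the "list" return type, the sketch below wraps a call in a small helper. The helper name infer_with_details and its arguments are hypothetical; the argument names passed to viewmastR_infer and the element names object and training_output are taken from the Usage and Value sections of this page. The function is only defined here, not called, since a call needs a real dataset and a trained model.

```r
# Hypothetical wrapper (not part of viewmastR): the caller supplies a Seurat or
# cell_data_set object, the model path, and the features used at training time.
infer_with_details <- function(query, model_dir, features) {
  res <- viewmastR::viewmastR_infer(
    query_cds = query,
    model_dir = model_dir,
    selected_features = features,
    return_probs = TRUE,
    return_type = "list"
  )
  # Element names follow the Value section of this page.
  list(annotated = res$object, raw = res$training_output)
}
```

With return_type = "object" (the default), the same call would return only the updated dataset.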

Details

The function first checks that all features specified in selected_features are present in query_cds, extracts normalized counts, and determines whether to run sequentially or in parallel. When parallelization is enabled (workers > 1), the dataset is split into chunks, and each chunk is run through model inference in parallel. If show_progress is TRUE, a progress bar shows the number of completed chunks and the total elapsed time.
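The feature check and chunking steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: check_features and split_into_chunks are hypothetical helpers, and the matrix is toy data with features in rows and cells in columns.

```r
# Hypothetical helper: stop if any requested feature is missing from the data.
check_features <- function(mat, selected_features) {
  missing <- setdiff(selected_features, rownames(mat))
  if (length(missing) > 0) {
    stop("Features not found in query: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Hypothetical helper: split cell (column) indices into contiguous groups.
split_into_chunks <- function(n_cells, chunks) {
  idx <- seq_len(n_cells)
  if (chunks <= 1) return(list(idx))
  unname(split(idx, cut(idx, breaks = chunks, labels = FALSE)))
}

# Toy data: 4 features x 10 cells.
set.seed(1)
mat <- matrix(rnorm(4 * 10), nrow = 4,
              dimnames = list(paste0("gene", 1:4), paste0("cell", 1:10)))
features <- c("gene1", "gene3")

check_features(mat, features)
chunk_idx <- split_into_chunks(ncol(mat), chunks = 3)

# Each chunk keeps only the selected features for its subset of cells.
chunk_mats <- lapply(chunk_idx, function(i) mat[features, i, drop = FALSE])
```

Each element of chunk_mats could then be handed to a separate worker; how viewmastR actually dispatches chunks to workers is not shown here.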

The underlying model is loaded from the specified model_dir, and class probabilities (log-odds) are computed for each cell. The function then assigns the most likely label to each cell. If return_probs is TRUE, the probabilities are also added to the object's metadata.
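The label assignment step can be illustrated with a small base R sketch. It assumes the model returns one row of unnormalized scores per cell and one column per class, and it uses a softmax to turn those scores into probabilities; both the layout and the softmax are assumptions made for illustration, not a description of viewmastR's internals. The scores and labels are toy values.

```r
# Toy scores: one row per cell, one column per class (assumed layout).
scores <- matrix(c( 2.0, 0.5, -1.0,
                   -0.3, 1.2,  0.1,
                    0.0, 0.2,  3.0),
                 nrow = 3, byrow = TRUE)
labels <- c("B cell", "T cell", "Monocyte")

# Row-wise softmax; subtracting each row's maximum keeps exp() stable.
softmax_rows <- function(x) {
  shifted <- x - apply(x, 1, max)
  e <- exp(shifted)
  e / rowSums(e)
}

probs <- softmax_rows(scores)

# Index of the highest-probability class for each cell.
best <- max.col(probs, ties.method = "first")

# Map class indices to labels; fall back to the indices when labels is NULL.
inferred <- if (is.null(labels)) best else labels[best]
```

Because softmax preserves the ordering within each row, taking the maximum of the raw scores would select the same class; the probabilities matter only when they are stored alongside the labels.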

See also