ViewmastR is a tool designed to predict cell type assignments in a query dataset based on reference data. In this tutorial, you’ll learn how to install and use viewmastR, load data, and evaluate its predictions.

Prerequisites

Before we begin, ensure you have an updated Rust installation, as it’s a core dependency. You can follow the instructions provided on the official Rust installation page.
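
As a quick check from within R, the sketch below looks for the Rust toolchain on the PATH. It uses only base R. It assumes that rustc and cargo are the executables needed at build time, which this tutorial does not state explicitly.

```r
# Look up the Rust compiler and its build tool on the PATH (base R only).
# Assumption: these two executables are what the package build needs.
rust_tools <- Sys.which(c("rustc", "cargo"))

# Sys.which() returns "" for anything it cannot find.
missing_tools <- names(rust_tools)[!nzchar(rust_tools)]

if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("Rust toolchain found: ", paste(rust_tools, collapse = ", "))
}
```

If either tool is reported as missing, install or update Rust before continuing.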

Installing viewmastR

First, make sure the devtools R package is installed; it lets you install packages directly from GitHub. With devtools available, install viewmastR using the following command:

devtools::install_github("furlan-lab/viewmastR")

This will fetch the latest version of viewmastR from GitHub and install it.
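
To avoid reinstalling packages that are already present, the install step can be guarded. The helper below, install_if_missing, is a hypothetical name introduced here for illustration and is not part of devtools or viewmastR. It relies only on base R's requireNamespace.

```r
# Hypothetical helper (not part of devtools or viewmastR): run `installer`
# only when `pkg` cannot be loaded, and report whether it was run.
install_if_missing <- function(pkg, installer) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  installer()
  TRUE
}

# Demonstration with a package that ships with R, so nothing is installed.
ran <- install_if_missing("stats", function() stop("installer should not run"))

# Intended use (commented out because it downloads from the network):
# install_if_missing("devtools", function() install.packages("devtools"))
# install_if_missing("viewmastR", function() devtools::install_github("furlan-lab/viewmastR"))
```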

Loading the Data

In this section, we’ll load two Seurat objects:
- Query dataset (seu): Contains the data you want to classify.
- Reference dataset (seur): Contains known cell type labels used to train the model.

ViewmastR predicts the cell types of your query dataset by leveraging the features associated with cell type labels in the reference data.

# Load required packages
suppressPackageStartupMessages({
  library(viewmastR)
  library(Seurat)
  library(ggplot2)
  library(scCustomize)
})


# Load query and reference datasets
seu <- readRDS(file.path(ROOT_DIR1, "240813_final_object.RDS"))
seur <- readRDS(file.path(ROOT_DIR2, "230329_rnaAugmented_seurat.RDS"))
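
Before training, it can help to confirm that the reference metadata actually contains the label column and that no labels are missing. The sketch below uses a toy data frame with made-up values in place of the real metadata; check_label_column is a hypothetical helper, not a viewmastR function.

```r
# Hypothetical helper: confirm a label column exists and has no missing values,
# then return the number of cells per label.
check_label_column <- function(meta, col) {
  if (!col %in% colnames(meta)) {
    stop("Column '", col, "' not found in metadata")
  }
  if (anyNA(meta[[col]])) {
    stop("Column '", col, "' contains missing labels")
  }
  table(meta[[col]])
}

# Toy stand-in for a reference object's metadata (made-up values).
toy_meta <- data.frame(
  SFClassification = c("14_B", "14_B", "16_CD8.N"),
  row.names = c("cell1", "cell2", "cell3")
)
label_counts <- check_label_column(toy_meta, "SFClassification")
print(label_counts)

# With the real object this would be:
# check_label_column(seur@meta.data, "SFClassification")
```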

Defining “Ground Truth” in the Query Dataset

Although we don’t know the cell type labels for the query dataset a priori, we can approximate the ground truth by using cluster-based cell type assignments. This approximation will help us evaluate the accuracy of viewmastR’s predictions. We can visualize the query dataset with its ground truth labels to get an initial idea of the cell types we’re working with.

DimPlot(seu, group.by = "ground_truth", cols = seur@misc$colors)

Finding Common Features

The performance of viewmastR is enhanced when the features (genes) are consistent between the query and reference datasets. We’ll now identify and select highly variable genes in both datasets and find the common genes to use for training the model.

# Calculate and plot gene dispersion in query dataset
seu <- calculate_gene_dispersion(seu)
plot_gene_dispersion(seu)

seu <- select_genes(seu, top_n = 10000, logmean_ul = -1, logmean_ll = -8)
plot_gene_dispersion(seu)

vgq <- get_selected_genes(seu)

# Repeat the process for the reference dataset
seur <- calculate_gene_dispersion(seur)
plot_gene_dispersion(seur)

seur <- select_genes(seur, top_n = 10000, logmean_ul = -1, logmean_ll = -8)
plot_gene_dispersion(seur)

vgr <- get_selected_genes(seur)

# Find common genes
vg <- intersect(vgq, vgr)
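
The intersection step can be illustrated on toy gene lists with base R. The gene names and the overlap measure below are illustrative choices, not output from the datasets in this tutorial.

```r
# Toy variable-gene lists standing in for vgq and vgr (made-up values).
vgq_toy <- c("CD3E", "CD19", "MS4A1", "NKG7", "LYZ")
vgr_toy <- c("CD19", "MS4A1", "LYZ", "GNLY")

# Genes selected in both datasets.
vg_toy <- intersect(vgq_toy, vgr_toy)

# Fraction of the smaller list that is shared; a low value suggests the
# two datasets were filtered very differently.
overlap <- length(vg_toy) / min(length(vgq_toy), length(vgr_toy))

print(vg_toy)
print(overlap)
```

Checking length(vg) on the real data in the same way is a cheap sanity check before training.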

Visualizing Reference Cell Types

Next, we visualize the reference dataset to see the known cell type classifications that viewmastR will use to train its model.

DimPlot(seur, group.by = "SFClassification", cols = seur@misc$colors)

Running viewmastR

Now we run viewmastR to predict cell types in the query dataset. This function will learn from the reference dataset’s cell type annotations and apply its knowledge to classify the query cells.

seu <- viewmastR(seu, seur, ref_celldata_col = "SFClassification", selected_genes = vg, max_epochs = 4)

Visualizing Predictions

After running viewmastR, we can visualize the predicted cell types for the query dataset.

DimPlot(seu, group.by = "viewmastR_pred", cols = seur@misc$colors)

Evaluating Model Accuracy with a Confusion Matrix

We can further evaluate the accuracy of viewmastR’s predictions by comparing them to the ground truth labels (approximated earlier) using a confusion matrix.

confusion_matrix(pred = factor(seu$viewmastR_pred), gt = factor(seu$ground_truth), cols = seur@misc$colors)
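
To make the numbers behind such a plot concrete, here is a minimal base R sketch on made-up labels. It uses table() rather than viewmastR's confusion_matrix function and computes overall accuracy and per-class recall.

```r
# Made-up ground truth and predictions sharing the same factor levels.
lv   <- c("B", "NK", "T")
gt   <- factor(c("B", "B", "T", "T", "NK", "NK"), levels = lv)
pred <- factor(c("B", "T", "T", "T", "NK", "B"),  levels = lv)

# Rows are predictions, columns are ground truth.
cm <- table(predicted = pred, truth = gt)

# Overall accuracy: correctly classified cells over all cells.
accuracy <- sum(diag(cm)) / sum(cm)

# Per-class recall: correct calls for a class over all true cells of that class.
per_class_recall <- diag(cm) / colSums(cm)

print(cm)
print(accuracy)
print(per_class_recall)
```

Using shared factor levels keeps the table square even when a class is never predicted.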

Analyzing Training Performance

ViewmastR can also return a detailed training history, including metrics like training loss and validation loss over time. This helps diagnose overfitting or underfitting during model training.

To access these metrics, you need to set the return_type parameter to "list". Here’s an example of how to retrieve and plot the training data:

# Run viewmastR with return_type = "list" to get the training history
output_list <- viewmastR(seu, seur, ref_celldata_col = "SFClassification", selected_genes = vg, return_type = "list")

We can now visualize how the training and validation losses change over the epochs. If the training loss keeps decreasing while the validation loss plateaus or increases, it may indicate overfitting.

plt <- plot_training_data(output_list)
plt
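
The overfitting pattern described above can also be checked numerically. The sketch below uses made-up loss values and plain vectors, since the structure of the list returned by viewmastR is not shown in this tutorial.

```r
# Made-up per-epoch losses (not viewmastR output).
train_loss <- c(1.20, 0.80, 0.55, 0.40, 0.30, 0.22)
val_loss   <- c(1.25, 0.90, 0.70, 0.66, 0.71, 0.78)

# Epoch with the lowest validation loss.
best_epoch <- which.min(val_loss)

# Element i of diff() is the change from epoch i to epoch i + 1.
# Flag steps where validation loss rose while training loss still fell.
diverging <- diff(val_loss) > 0 & diff(train_loss) < 0

# First epoch at which that divergence is observed.
first_overfit_epoch <- which(diverging)[1] + 1

print(best_epoch)
print(first_overfit_epoch)
```

In this toy example, training beyond the best epoch only improves the fit to the training data.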

Probabilities

Finally, we can inspect prediction probabilities with the return_probs argument. Setting it adds one metadata column per predicted class to the object, each prefixed with “prob_”. The values are the model’s log-odds outputs converted to probabilities with R’s plogis function.

seu <- viewmastR(seu, seur, ref_celldata_col = "SFClassification", selected_genes = vg, backend = "candle", max_epochs = 4, return_probs = TRUE)
FeaturePlot_scCustom(seu, features = "prob_14_B")

FeaturePlot_scCustom(seu, features = "prob_16_CD8.N")
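
For intuition about the plogis transform mentioned above, here is a small base R sketch on made-up log-odds. The class names and values are illustrative only.

```r
# Made-up log-odds for one cell across three classes.
log_odds <- c(B = 2.0, CD8.N = 0, NK = -3.0)

# plogis() is the logistic function 1 / (1 + exp(-x)), applied elementwise.
probs <- plogis(log_odds)

# qlogis() is its inverse and recovers the original log-odds.
recovered <- qlogis(probs)

print(round(probs, 3))
print(sum(probs))
```

Because the transform is applied to each value on its own, the results need not sum to 1 across classes, unlike a softmax.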

Appendix

## R version 4.4.0 (2024-04-24)
## Platform: x86_64-apple-darwin20
## Running under: macOS Ventura 13.6.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] scCustomize_2.1.2  ggplot2_3.5.1      Seurat_5.1.0       SeuratObject_5.0.2
## [5] sp_2.1-4           viewmastR_0.2.3   
## 
## loaded via a namespace (and not attached):
##   [1] fs_1.6.4                    matrixStats_1.3.0          
##   [3] spatstat.sparse_3.0-3       RcppMsgPack_0.2.3          
##   [5] lubridate_1.9.3             httr_1.4.7                 
##   [7] RColorBrewer_1.1-3          doParallel_1.0.17          
##   [9] tools_4.4.0                 sctransform_0.4.1          
##  [11] backports_1.5.0             utf8_1.2.4                 
##  [13] R6_2.5.1                    lazyeval_0.2.2             
##  [15] uwot_0.2.2                  GetoptLong_1.0.5           
##  [17] withr_3.0.0                 gridExtra_2.3              
##  [19] progressr_0.14.0            cli_3.6.2                  
##  [21] Biobase_2.64.0              textshaping_0.4.0          
##  [23] spatstat.explore_3.2-7      fastDummies_1.7.3          
##  [25] labeling_0.4.3              sass_0.4.9                 
##  [27] spatstat.data_3.0-4         proxy_0.4-27               
##  [29] ggridges_0.5.6              pbapply_1.7-2              
##  [31] pkgdown_2.0.9               systemfonts_1.1.0          
##  [33] foreign_0.8-86              R.utils_2.12.3             
##  [35] parallelly_1.37.1           rstudioapi_0.16.0          
##  [37] generics_0.1.3              shape_1.4.6.1              
##  [39] crosstalk_1.2.1             ica_1.0-3                  
##  [41] spatstat.random_3.2-3       dplyr_1.1.4                
##  [43] Matrix_1.7-0                ggbeeswarm_0.7.2           
##  [45] fansi_1.0.6                 S4Vectors_0.42.0           
##  [47] abind_1.4-5                 R.methodsS3_1.8.2          
##  [49] lifecycle_1.0.4             yaml_2.3.8                 
##  [51] snakecase_0.11.1            SummarizedExperiment_1.34.0
##  [53] recipes_1.1.0               SparseArray_1.4.8          
##  [55] Rtsne_0.17                  paletteer_1.6.0            
##  [57] grid_4.4.0                  promises_1.3.0             
##  [59] crayon_1.5.2                miniUI_0.1.1.1             
##  [61] lattice_0.22-6              cowplot_1.1.3              
##  [63] pillar_1.9.0                knitr_1.46                 
##  [65] ComplexHeatmap_2.20.0       GenomicRanges_1.56.0       
##  [67] rjson_0.2.21                boot_1.3-30                
##  [69] future.apply_1.11.2         codetools_0.2-20           
##  [71] leiden_0.4.3.1              glue_1.7.0                 
##  [73] data.table_1.15.4           vctrs_0.6.5                
##  [75] png_0.1-8                   spam_2.10-0                
##  [77] gtable_0.3.5                rematch2_2.1.2             
##  [79] assertthat_0.2.1            cachem_1.1.0               
##  [81] gower_1.0.1                 xfun_0.44                  
##  [83] S4Arrays_1.4.1              mime_0.12                  
##  [85] prodlim_2024.06.25          survival_3.6-4             
##  [87] timeDate_4041.110           SingleCellExperiment_1.26.0
##  [89] iterators_1.0.14            pbmcapply_1.5.1            
##  [91] hardhat_1.4.0               lava_1.8.0                 
##  [93] fitdistrplus_1.1-11         ROCR_1.0-11                
##  [95] ipred_0.9-15                nlme_3.1-164               
##  [97] RcppAnnoy_0.0.22            GenomeInfoDb_1.40.1        
##  [99] bslib_0.7.0                 irlba_2.3.5.1              
## [101] vipor_0.4.7                 KernSmooth_2.23-24         
## [103] rpart_4.1.23                colorspace_2.1-0           
## [105] BiocGenerics_0.50.0         Hmisc_5.1-2                
## [107] nnet_7.3-19                 ggrastr_1.0.2              
## [109] tidyselect_1.2.1            compiler_4.4.0             
## [111] htmlTable_2.4.2             desc_1.4.3                 
## [113] DelayedArray_0.30.1         plotly_4.10.4              
## [115] checkmate_2.3.1             scales_1.3.0               
## [117] lmtest_0.9-40               stringr_1.5.1              
## [119] digest_0.6.35               goftest_1.2-3              
## [121] spatstat.utils_3.1-0        minqa_1.2.7                
## [123] rmarkdown_2.27              XVector_0.44.0             
## [125] htmltools_0.5.8.1           pkgconfig_2.0.3            
## [127] base64enc_0.1-3             lme4_1.1-35.3              
## [129] sparseMatrixStats_1.16.0    MatrixGenerics_1.16.0      
## [131] highr_0.10                  fastmap_1.2.0              
## [133] rlang_1.1.4                 GlobalOptions_0.1.2        
## [135] htmlwidgets_1.6.4           UCSC.utils_1.0.0           
## [137] shiny_1.8.1.1               DelayedMatrixStats_1.26.0  
## [139] farver_2.1.2                jquerylib_0.1.4            
## [141] zoo_1.8-12                  jsonlite_1.8.8             
## [143] ModelMetrics_1.2.2.2        R.oo_1.26.0                
## [145] magrittr_2.0.3              Formula_1.2-5              
## [147] GenomeInfoDbData_1.2.12     dotCall64_1.1-1            
## [149] patchwork_1.2.0             munsell_0.5.1              
## [151] Rcpp_1.0.12                 reticulate_1.37.0          
## [153] stringi_1.8.4               pROC_1.18.5                
## [155] zlibbioc_1.50.0             MASS_7.3-60.2              
## [157] plyr_1.8.9                  parallel_4.4.0             
## [159] listenv_0.9.1               ggrepel_0.9.5              
## [161] forcats_1.0.0               deldir_2.0-4               
## [163] splines_4.4.0               tensor_1.5                 
## [165] circlize_0.4.16             igraph_2.0.3               
## [167] spatstat.geom_3.2-9         RcppHNSW_0.6.0             
## [169] reshape2_1.4.4              stats4_4.4.0               
## [171] evaluate_0.23               ggprism_1.0.5              
## [173] nloptr_2.0.3                foreach_1.5.2              
## [175] httpuv_1.6.15               RANN_2.6.1                 
## [177] tidyr_1.3.1                 purrr_1.0.2                
## [179] polyclip_1.10-6             future_1.33.2              
## [181] clue_0.3-65                 scattermore_1.2            
## [183] janitor_2.2.0               xtable_1.8-4               
## [185] monocle3_1.3.7              e1071_1.7-16               
## [187] RSpectra_0.16-1             later_1.3.2                
## [189] viridisLite_0.4.2           class_7.3-22               
## [191] ragg_1.3.2                  tibble_3.2.1               
## [193] memoise_2.0.1               beeswarm_0.4.0             
## [195] IRanges_2.38.0              cluster_2.1.6              
## [197] timechange_0.3.0            globals_0.16.3             
## [199] caret_6.0-94