Calculate dispersion genes in a cell_data_set object — calculate_gene

Monocle3 aims to learn how cells transition through a biological program of gene expression changes in an experiment. Each cell can be viewed as a point in a high-dimensional space, where each dimension describes the expression of a different gene. Identifying the program of gene expression changes is equivalent to learning a trajectory that the cells follow through this space. However, the more dimensions there are in the analysis, the harder the trajectory is to learn. Fortunately, many genes typically co-vary with one another, so the dimensionality of the data can be reduced with a wide variety of algorithms. Monocle3 provides two algorithms for dimensionality reduction via reduce_dimensions (UMAP and tSNE).

The function calculate_gene_dispersion is an optional step in the trajectory-building process, run before preprocess_cds. After dispersion has been calculated for a cell_data_set with calculate_gene_dispersion, the select_features function lets the user identify the set of genes to be used in downstream dimensionality reduction. These genes, together with their dispersion and mean expression, can be plotted with the plot_gene_dispersion function.

This function calculates per-gene dispersion and mean expression in a cell_data_set object for use in downstream analysis.

Usage

calculate_gene_dispersion(
  cds,
  q = 3,
  id_tag = "id",
  symbol_tag = "gene_short_name",
  method = "m3addon",
  removeOutliers = T
)

Arguments

cds: The cell data set upon which to perform this operation.
q: The polynomial degree.
id_tag: The name of the feature data column corresponding to the unique id.
symbol_tag: The name of the feature data column corresponding to the gene symbol.
upper_lim: The upper limit of dispersion to consider.
verbose: Boolean indicating whether to display verbose output.

Value

An updated cell_data_set object with dispersion and mean expression saved.

A vector of dispersion genes.
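
Examples

The following base R sketch illustrates what a per-gene mean and dispersion look like and how a polynomial of degree q can relate the two. It is not the package's implementation: the toy count matrix, the choice of the squared coefficient of variation as the dispersion estimate, and the log-log polynomial fit are all assumptions made for illustration, and calculate_gene_dispersion may compute these quantities differently.

```r
# Toy count matrix (made-up values): 6 genes (rows) x 8 cells (columns).
counts <- rbind(
  gene1 = c(0, 1, 0, 2, 1, 0, 3, 1),
  gene2 = c(2, 3, 1, 4, 2, 3, 2, 3),
  gene3 = c(5, 0, 12, 1, 9, 0, 7, 2),
  gene4 = c(10, 11, 9, 12, 10, 8, 11, 9),
  gene5 = c(20, 2, 35, 0, 41, 5, 18, 3),
  gene6 = c(50, 48, 55, 47, 52, 49, 51, 48)
)

# Polynomial degree, mirroring the default q = 3 shown under Usage.
q <- 3

# Per-gene mean expression and variance across cells.
mean_expr <- rowMeans(counts)
var_expr <- apply(counts, 1, var)

# ASSUMPTION: dispersion is taken here to be the squared coefficient of
# variation (variance / mean^2); the package may use another estimator.
dispersion <- var_expr / mean_expr^2

# ASSUMPTION: model log(dispersion) as a degree-q polynomial of log(mean).
fit <- lm(log(dispersion) ~ poly(log(mean_expr), degree = q))

# Positive residuals mark genes more dispersed than the fitted trend predicts.
result <- data.frame(
  gene = rownames(counts),
  mean_expr = mean_expr,
  dispersion = dispersion,
  excess = residuals(fit)
)

# Rank genes from most to least over-dispersed relative to the trend.
print(result[order(-result$excess), ])
```

Genes at the top of the printed table are the kind of candidates a feature-selection step would favour, since their variability exceeds what their mean expression alone would suggest under this toy model.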