Introduction

As of version 3.0.0, VISION now supports the analysis of scRNA-seq data with respect to a user-defined cell lineage, or phylogeny (e.g., the CRISPR/Cas9-based lineage tracing technologies described in Chan et al, Nature 2019). We refer to this analysis pipeline as PhyloVision. PhyloVision uses the relationships between cells as specified by the phylogeny, as opposed to the cell-cell similarities that can be inferred from the scRNA-seq data, to conduct the autocorrelation analysis. In this way, the analyses performed in PhyloVision identify gene modules that are heritable, or structured meaningfully on a cell lineage.

While several types of trees can be used to relate cells to one another (e.g., those inferred via B Cell Receptor sequencing), in this vignette, we focus on applying PhyloVision to a recently published dataset using CRISPR/Cas9-based synthetic lineage tracing from Quinn et al, Science 2021. This technology relies on Cas9 to introduce heritable, irreversible mutations at a defined locus in the 3’ UTR of a fluorescent protein (“target site”) across cell divisions, encoding nested relationships between cells. Utilizing conventional scRNA-seq platforms, this data simultaneously captures single-cell transcriptomes as well as target sites harboring lineage barcodes. Using this dataset, we demonstrate how a user might use PhyloVision.

Specifically, we discuss four items: - Preparing data for input into PhyloVision, including passing external meta-data (e.g., metastatic rates) to the pipeline. - Performing the PhyloVision analysis pipeline - Detecting de-novo gene modules with Hotspot (discussed more in the Spatial Hotspot Vignette). - Launching an interactive, web-based report for viewing.

Preliminaries

If you have yet to install VISION, we recommend installing the package from Github to install this package. Full source code can be found at the VISION Github repository, available here.

require(devtools)
install_github("YosefLab/VISION")

You’ll also want to be sure that you have ape, Matrix, and reticulate installed:

install.packages('ape')
install.packages('Matrix')
install.packages('reticulate')

Once VISION and R are installed, you may load in VISION using library(VISION).

To enable the Hotspot analysis below, install it directly from the git repository using the following command:

pip install git+https://github.com/yoseflab/Hotspot.git

If you are having trouble installing Hotspot, please refer to the documentation website here.

Running PhyloVision

First, we need to load VISION, reticulate and ape.

Creating a PhyloVision Object with a Tree

We’ll proceed by reading in the raw expression matrix, tree, and meta data for a single clone (CP007) of approximately 600 cells from Quinn et al, Science 2021.

# Read in expression data
expression <- read.table('data/metastasis/cp007_expression.tsv.gz', sep='\t',
                         header=T, row.names=1, check.names = FALSE,
                         stringsAsFactors = F)

# Read in tree
tree <- read.tree('data/metastasis/cp007_tree.nwk.gz')
# Collapse one mutations with ape
tree <- collapse.singles(tree)

# Read in meta data
meta <- read.table('data/metastasis/cp007_meta.tsv.gz', sep='\t',
                   header=T, row.names=1)

# Signature file
sig <- paste("data", "h.all.v5.2.symbols.gmt", sep="/")

A quick inspection of the meta data indicates that we have several types of data associated with each of the leaves of this tree. Not only do we have the anatomical site from which the cell was sampled (sampleID) but also pre-computed statistics like the Metastatic Rate (scTreeMetRate). In fact, users can specify an arbitrary number of data items that have been precomputed. For example, if a user has performed fitness inference as in Yang et al, bioRxiv 2021, they can add these statistics as a column to the meta data file.

In addition, a user might be interested in defining their own clustering methodology on the tree. A simple clustering would entail cutting the tree at a defined depth and grouping together cells that are below each internal node at the defined depth (similar to the cuttree method for grouping samples together from a hierarchical clustering).

clusterByDepth <- function(tree, depth = 2) {
  
  clusters <- list()
  current_depth = 0
  current_cluster_name <- 1
  root <- VISION:::find_root(tree)
  
  queue <- c(root)
  while(length(queue) > 0) {
    
    if (current_depth == depth) {
      for (node in queue) {
        clusters[[as.character(current_cluster_name)]] <- tree$tip.label[VISION:::get_all_children(tree, node)]
        current_cluster_name <- current_cluster_name + 1
      }
      return(clusters)
    }
    
    current_depth <- current_depth + 1
    
    new_queue <- c()
    for (node in queue) {
      children <- VISION:::get_children(tree, node)
      if (length(children) == 1 && children == node) {
        clusters[[as.character(current_cluster_name)]] <- c(tree$tip.label[[node]])
        current_cluster_name <- current_cluster_name + 1
      } else {
        new_queue <- c(new_queue, children)
      }
    }
    queue <- new_queue
  }
  stop("Targeted depth is larger than the depth of the tree.
       Repeat call with smaller target depth.")
}

# add in depth based cluster annotations
depth_clusters <- clusterByDepth(tree, depth=1) 
meta$DepthClusters <- NA
for (cluster in names(depth_clusters)) {
  meta[depth_clusters[[cluster]], 'DepthClusters'] <- cluster
}
# Construct the Vision object
projection_genes <- VISION:::filterGenesFano(expression)
vis <- PhyloVision(tree=tree, data=expression, signatures=sig,
                   meta=meta, num_neighbors=30, projection_genes= projection_genes)

More details on input

Tree

The ape phylogeny object. Singleton edges should be collapsed prior to use. Ensure that leaf names correspond to the same names used in the expression matrix.

Expression Data

The provided expression data should be library-normalized. The expression data should not be log-transformed prior to loading into VISION. For more information on the input to VISION, please refer to the central VISION vignette.

Signatures

Signatures can be provided as a list of paths to signature files (*.gmt) or Signature objects.

See the signature vignette for more information on finding or creating gene Signatures.

Meta Data

An R data.frame with cell-level meta-data. This could be confounding variables (e.g. percentage of mitochondrial RNA, number of genes detected) or experimental covariates (e.g. genotype, donor, batch).

This input is optional if Signatures are provided.

Projection Genes These genes are used to infer projections for clustering in the transcriptional space and 2D visualizations.

Other Options

Other options and inputs can be provided to customize how VISION runs.

Running PhyloVision analysis

Next, we can perform the normal Vision analysis using the tree as the latent space. In addition to the typical VISION pipeline, phyloAnlayze will compute plasticity scores for each categorical meta data. For example, in this case, a plasticity index on the sampleID (i.e., which anatomical site a cell was harvested from) would correspond to the scMetRate discussed in the original study.

vis <- phyloAnalyze(vis)

Note: phyloAnalyze will add new entries to the object@metaData slot that can be viewed on the interactive report and evaluated with autocorrelation statistics. Specifically, a default tree-based clustering will be added (named VISION_Cluster_Tree) and numerical plasticity scores will be computed as meta-data for each of the categorical data columns in the user-specified metaData.

Running Hotspot analysis

As mentioned above, PhyloVision can take advantage of the new Hotspot functionality in VISION. Briefly, Hotspot is a tool for inferring modules of genes that are significantly autocorrelated with one another and a particular latent space (e.g., the first 30 principal components of a gene expression matrix). When combined with PhyloVision, the Hotspot functionality will use the user-defined phylogeny as the latent space.

We can invoke the Hotspot analysis with the function runHotspot and setting tree=True. Upon doing so, this will identify modules of genes and add these as Signatures to the VISION object for autocorrelation evaluation. Moreover, to add interpretability to this analysis, VISION will also compute the enrichment between all the user-defined Signatures and each Hotspot module. This information will be accessible on the web-based report by selecting the “Hotspot” mode in the top-right of the web-page.

The full Hotspot API is exposed for analysis as well. For more on the Hotspot API see here. To note, while it is typically not recommended to use model="normal" and logdata=FALSE in Hotpsot, we elect to do so because the data here has been previously log-normalized. In more typical single-cell anlayses where the user is working with a library-normalized count matrix, we suggest setting model="danb" and logdata=TRUE.

vis <- runHotspot(vis, model="danb", tree=TRUE, min_gene_threshold=50, n_neighbors = 30, number_top_genes = nrow(expression), logdata=FALSE)

We have additionally exposed the full Hotspot pipeline should a user want to iterate with different parameters:

# Init Hotspot
hs <- hsInit(vis, model = "normal", tree=TRUE, logdata=FALSE)
# Init Hotspot KNN
hs <- hsCreateKnnGraph(hs, vis, n_neighbors=30)
# perform Hotspot analysis and store results in R
hs_genes <- hsComputeAutoCorrelations(hs, number_top_genes=12438)
# Compute localcorr
hs <- hsComputeLocalCorrelations(hs, hs_genes)
# Calculate Hotspot Module Scores for informative genes
hs <- hsCalculateModuleScores(hs)
# Cluster Hotspot modules and perform Vision based analysis on HS Modules and 
vis <- analyzeHotspotObjectVision(vis, hs, tree=TRUE)

We can also access further Hotspot functionality using Reticulate and Python.

hs <- loadHotspotObject(bytes=vis@Hotspot)
library(reticulate)
use_python('/usr/bin/python3')
import matplotlib.pyplot as plt
import hotspot
hs.plot_local_correlations()
plt.show()

Note: the heatmap can also be visualized like so:

Viewing Results

Finally, we can launch the Vision browser. If you are local, the following will work:

Else, if you would like to launch from a server, you can use the following settings:

viewResults(vis, host='0.0.0.0', port=8100, browser=F)

Now, the report will be visible from port 8100 from your server (e.g., http://server-name:8100/Results.html)