As of version 3.0.0, VISION now supports the analysis of scRNA-seq data with respect to a user-defined cell lineage, or phylogeny (e.g., the CRISPR/Cas9-based lineage tracing technologies described in Chan et al, Nature 2019). We refer to this analysis pipeline as PhyloVision. PhyloVision uses the relationships between cells as specified by the phylogeny, as opposed to the cell-cell similarities that can be inferred from the scRNA-seq data, to conduct the autocorrelation analysis. In this way, the analyses performed in PhyloVision identify gene modules that are heritable, or structured meaningfully on a cell lineage.
While several types of trees can be used to relate cells to one another (e.g., those inferred via B Cell Receptor sequencing), in this vignette, we focus on applying PhyloVision to a recently published dataset using CRISPR/Cas9-based synthetic lineage tracing from Quinn et al, Science 2021. This technology relies on Cas9 to introduce heritable, irreversible mutations at a defined locus in the 3’ UTR of a fluorescent protein (“target site”) across cell divisions, encoding nested relationships between cells. Utilizing conventional scRNA-seq platforms, this data simultaneously captures single-cell transcriptomes as well as target sites harboring lineage barcodes. Using this dataset, we demonstrate how a user might use PhyloVision.
Specifically, we discuss four items: - Preparing data for input into
PhyloVision
, including passing external meta-data (e.g.,
metastatic rates) to the pipeline. - Performing the
PhyloVision
analysis pipeline - Detecting de-novo gene
modules with Hotspot (discussed more in the Spatial Hotspot Vignette). - Launching an
interactive, web-based report for viewing.
If you have yet to install VISION, we recommend installing the package from Github to install this package. Full source code can be found at the VISION Github repository, available here.
require(devtools)
install_github("YosefLab/VISION")
You’ll also want to be sure that you have ape
,
Matrix
, and reticulate
installed:
install.packages('ape')
install.packages('Matrix')
install.packages('reticulate')
Once VISION and R are installed, you may load in VISION using
library(VISION)
.
To enable the Hotspot analysis below, install it directly from the git repository using the following command:
pip install git+https://github.com/yoseflab/Hotspot.git
If you are having trouble installing Hotspot, please refer to the documentation website here.
First, we need to load VISION, reticulate and ape.
knitr::opts_chunk$set(echo = TRUE)
library(VISION)
library(reticulate)
library(ape)
We’ll proceed by reading in the raw expression matrix, tree, and meta data for a single clone (CP007) of approximately 600 cells from Quinn et al, Science 2021.
# Read in expression data
expression <- read.table('data/metastasis/cp007_expression.tsv.gz', sep='\t',
header=T, row.names=1, check.names = FALSE,
stringsAsFactors = F)
# Read in tree
tree <- read.tree('data/metastasis/cp007_tree.nwk.gz')
# Collapse one mutations with ape
tree <- collapse.singles(tree)
# Read in meta data
meta <- read.table('data/metastasis/cp007_meta.tsv.gz', sep='\t',
header=T, row.names=1)
# Signature file
sig <- paste("data", "h.all.v5.2.symbols.gmt", sep="/")
A quick inspection of the meta data indicates that we have several
types of data associated with each of the leaves of this tree. Not only
do we have the anatomical site from which the cell was sampled
(sampleID
) but also pre-computed statistics like the
Metastatic Rate (scTreeMetRate
). In fact, users can specify
an arbitrary number of data items that have been precomputed. For
example, if a user has performed fitness inference as in Yang
et al, bioRxiv 2021, they can add these statistics as a column to
the meta data file.
In addition, a user might be interested in defining their own
clustering methodology on the tree. A simple clustering would entail
cutting the tree at a defined depth and grouping together cells that are
below each internal node at the defined depth (similar to the cuttree
method for grouping samples together from a hierarchical
clustering).
clusterByDepth <- function(tree, depth = 2) {
clusters <- list()
current_depth = 0
current_cluster_name <- 1
root <- VISION:::find_root(tree)
queue <- c(root)
while(length(queue) > 0) {
if (current_depth == depth) {
for (node in queue) {
clusters[[as.character(current_cluster_name)]] <- tree$tip.label[VISION:::get_all_children(tree, node)]
current_cluster_name <- current_cluster_name + 1
}
return(clusters)
}
current_depth <- current_depth + 1
new_queue <- c()
for (node in queue) {
children <- VISION:::get_children(tree, node)
if (length(children) == 1 && children == node) {
clusters[[as.character(current_cluster_name)]] <- c(tree$tip.label[[node]])
current_cluster_name <- current_cluster_name + 1
} else {
new_queue <- c(new_queue, children)
}
}
queue <- new_queue
}
stop("Targeted depth is larger than the depth of the tree.
Repeat call with smaller target depth.")
}
# add in depth based cluster annotations
depth_clusters <- clusterByDepth(tree, depth=1)
meta$DepthClusters <- NA
for (cluster in names(depth_clusters)) {
meta[depth_clusters[[cluster]], 'DepthClusters'] <- cluster
}
# Construct the Vision object
projection_genes <- VISION:::filterGenesFano(expression)
vis <- PhyloVision(tree=tree, data=expression, signatures=sig,
meta=meta, num_neighbors=30, projection_genes= projection_genes)
Tree
The ape
phylogeny object. Singleton edges should be
collapsed prior to use. Ensure that leaf names correspond to the same
names used in the expression matrix.
Expression Data
The provided expression data should be library-normalized. The expression data should not be log-transformed prior to loading into VISION. For more information on the input to VISION, please refer to the central VISION vignette.
Signatures
Signatures can be provided as a list of paths to signature files (*.gmt) or Signature objects.
See the signature vignette for more information on finding or creating gene Signatures.
Meta Data
An R data.frame with cell-level meta-data. This could be confounding variables (e.g. percentage of mitochondrial RNA, number of genes detected) or experimental covariates (e.g. genotype, donor, batch).
This input is optional if Signatures are provided.
Projection Genes These genes are used to infer projections for clustering in the transcriptional space and 2D visualizations.
Other Options
Other options and inputs can be provided to customize how VISION runs.
Next, we can perform the normal Vision analysis using the tree as the
latent space. In addition to the typical VISION pipeline, phyloAnlayze
will compute plasticity scores for each categorical meta data. For
example, in this case, a plasticity index on the sampleID
(i.e., which anatomical site a cell was harvested from) would correspond
to the scMetRate discussed in the original study.
vis <- phyloAnalyze(vis)
Note: phyloAnalyze
will add new entries to the
object@metaData
slot that can be viewed on the interactive
report and evaluated with autocorrelation statistics. Specifically, a
default tree-based clustering will be added (named
VISION_Cluster_Tree
) and numerical plasticity scores will
be computed as meta-data for each of the categorical data columns in the
user-specified metaData
.
As mentioned above, PhyloVision can take advantage of the new Hotspot functionality in VISION. Briefly, Hotspot is a tool for inferring modules of genes that are significantly autocorrelated with one another and a particular latent space (e.g., the first 30 principal components of a gene expression matrix). When combined with PhyloVision, the Hotspot functionality will use the user-defined phylogeny as the latent space.
We can invoke the Hotspot analysis with the function
runHotspot
and setting tree=True
. Upon doing
so, this will identify modules of genes and add these as Signatures to
the VISION object for autocorrelation evaluation. Moreover, to add
interpretability to this analysis, VISION will also compute the
enrichment between all the user-defined Signatures and each Hotspot
module. This information will be accessible on the web-based report by
selecting the “Hotspot” mode in the top-right of the web-page.
The full Hotspot API is exposed for analysis as well. For more on the
Hotspot API see here. To note,
while it is typically not recommended to use model="normal"
and logdata=FALSE
in Hotpsot, we elect to do so because the
data here has been previously log-normalized. In more typical
single-cell anlayses where the user is working with a library-normalized
count matrix, we suggest setting model="danb"
and
logdata=TRUE
.
vis <- runHotspot(vis, model="danb", tree=TRUE, min_gene_threshold=50, n_neighbors = 30, number_top_genes = nrow(expression), logdata=FALSE)
We have additionally exposed the full Hotspot pipeline should a user want to iterate with different parameters:
# Init Hotspot
hs <- hsInit(vis, model = "normal", tree=TRUE, logdata=FALSE)
# Init Hotspot KNN
hs <- hsCreateKnnGraph(hs, vis, n_neighbors=30)
# perform Hotspot analysis and store results in R
hs_genes <- hsComputeAutoCorrelations(hs, number_top_genes=12438)
# Compute localcorr
hs <- hsComputeLocalCorrelations(hs, hs_genes)
# Calculate Hotspot Module Scores for informative genes
hs <- hsCalculateModuleScores(hs)
# Cluster Hotspot modules and perform Vision based analysis on HS Modules and
vis <- analyzeHotspotObjectVision(vis, hs, tree=TRUE)
We can also access further Hotspot functionality using Reticulate and Python.
hs <- loadHotspotObject(bytes=vis@Hotspot)
library(reticulate)
use_python('/usr/bin/python3')
import matplotlib.pyplot as plt
import hotspot
hs.plot_local_correlations() plt.show()
Note: the heatmap can also be visualized like so:
hs <- loadHotspotObject(bytes=vis@Hotspot)
library(paletteer)
draw_hotspot_heatmap(hs)
Finally, we can launch the Vision browser. If you are local, the following will work:
viewResults(vis)
Else, if you would like to launch from a server, you can use the following settings:
viewResults(vis, host='0.0.0.0', port=8100, browser=F)
Now, the report will be visible from port 8100
from your
server (e.g., http://server-name:8100/Results.html)