In this vignette, we demonstrate how to use the Hotspot pipeline in VISION to identify de-novo transcriptional gene modules and interpret them. To this end, we use Hotspot - an algorithm that identifies modules of genes that are significantly autocorrelated with one another on some latent space that can be used to define cell-cell similarity (e.g., the first 30 principal components). Qualitatively, this can be interpreted as a co-expression analysis that takes into account the distribution of gene expression values and nuanced cell-cell similarities. If you are interested in learning more about the algorithm, you can read the original publication.
In the example below, we use spatial transcriptomic data from the Slide-seq technology for the Hotspot analysis, following the original Hotspot tutorial. Though this analysis is extensible to more typical latent spaces, this is an interesting example where we show that VISION can use spatial coordinates to define cell-cell similarities.
If you have yet to install VISION, we recommend installing the package from Github to install this package. Full source code can be found at the VISION Github repository, available here.
require(devtools)
install_github("YosefLab/VISION")
Once VISION and R are installed, you may load in VISION using
library(VISION)
. Also make sure that you have installed
reticulate
appropriately, as this will be necessary for
running Hotspot.
You’ll also need to install Hotspot, which can be installed directly from the git repository using the following command:
pip install git+https://github.com/yoseflab/Hotspot.git
Below, we first run the VISION pipeline as usual. In addition to
importing VISION, we’ll have to import reticulate
for the
Hotspot analaysis.
First, we need to load VISION and reticulate.
knitr::opts_chunk$set(echo = TRUE)
library(VISION)
library(reticulate) # for the Hotspot analysis
First, we create a Vision object
# Read the expression and meta data
file_path <- "data/spatial" # Replace the path
expr <- read.table(paste(file_path, "expr.tsv.gz", sep="/"), check.names = FALSE, sep = "\t")
meta <- read.table(paste(file_path, "meta.tsv", sep="/"), check.names = FALSE, sep = "\t", row.names = 1, header=TRUE)
# Signature file
sig <- paste("data", "h.all.v5.2.symbols.gmt", sep="/")
# Read and create the coordinates
pos <- read.table(paste(file_path, "BeadLocationsForR.csv", sep="/"), sep=",", check.names = FALSE, row.names=1, header=TRUE)
pos["X"] <- pos$ycoord
pos["Y"] <- -1 * pos$xcoord
pos <- pos[c("X", "Y")]
# Construct the Vision object
vis <- Vision(expr, signatures=c(sig), latentSpace = pos, meta=meta) # TODO add relevant signatures
Expression Data
The provided expression data should be library-normalized.
The expression data should not be log-transformed prior to loading into VISION. For more information about how to transform the expression data, refer to the central VISION vignette.
Signatures
Signatures can be provided as a list of paths to signature files (*.gmt) or Signature objects.
See the signature vignette for more information on finding or creating gene Signatures.
Meta Data
An R data.frame with cell-level meta-data. This could be confounding variables (e.g. percentage of mitochondrial RNA, number of genes detected) or experimental covariates (e.g. genotype, donor, batch).
This input is optional if Signatures are provided.
Other Options
Other options and inputs can be provided to customize how VISION runs.
Next, we can perform the normal Vision analysis using the spatial coordinates as the latent space.
vis <- analyze(vis)
Conveniently, the Hotspot analysis can be performed directly on the
VISION object we just processed. To do so, we’ll use the
runHotspot
function which invokes the Hotspot pipeline. For
more information about the analysis pipeline & Hotspot API, you can
refer the documentation website here and the PhyloVision vignette.
As described in the original Hotspot vignette, we use the
bernoulli
model because of the low capture rate of the
spatial technology. In a typical anlaysis, we would use the
danb
model.
vis@params$latentSpace$projectionGenes <- rownames(vis@exprData) # Use all genes.
vis <- runHotspot(vis, model="bernoulli", num_umi=meta["num_umi"], logdata=FALSE)
The PhyloVision vignette describes the additional API that is exposed for iterating on the Hotspot analysis.
After running the Hotpsot pipeline, the VISION object will have populated slots storing the gene modules, the scores for each cell with respect to these new gene modules, and the enrichment score between user-specified signatures and the new Hotspot gene modules. These results can be explored in the Hotspot mode of the VISION web-based report.
Finally, we can launch the Vision browser. The results from the Hotspot analysis can be viewed by clicking on the “Hotspot” button on the top-right of the web report.
viewResults(vis)