Spatial Data and Hotspot • VISION

Introduction

In this vignette, we demonstrate how to use the Hotspot pipeline in VISION to identify de-novo transcriptional gene modules and interpret them. To this end, we use Hotspot - an algorithm that identifies modules of genes that are significantly autocorrelated with one another on some latent space that can be used to define cell-cell similarity (e.g., the first 30 principal components). Qualitatively, this can be interpreted as a co-expression analysis that takes into account the distribution of gene expression values and nuanced cell-cell similarities. If you are interested in learning more about the algorithm, you can read the original publication.

In the example below, we use spatial transcriptomic data from the Slide-seq technology for the Hotspot analysis, following the original Hotspot tutorial. Though this analysis is extensible to more typical latent spaces, this is an interesting example where we show that VISION can use spatial coordinates to define cell-cell similarities.

Preliminaries

If you have yet to install VISION, we recommend installing the package from Github to install this package. Full source code can be found at the VISION Github repository, available here.

require(devtools)
install_github("YosefLab/VISION")

Once VISION and R are installed, you may load in VISION using library(VISION). Also make sure that you have installed reticulate appropriately, as this will be necessary for running Hotspot.

You’ll also need to install Hotspot, which can be installed directly from the git repository using the following command:

pip install git+https://github.com/yoseflab/Hotspot.git

Using VISION

Below, we first run the VISION pipeline as usual. In addition to importing VISION, we’ll have to import reticulate for the Hotspot analaysis.

First, we need to load VISION and reticulate.

knitr::opts_chunk$set(echo = TRUE)
library(VISION)
library(reticulate) # for the Hotspot analysis

Creating a Vision Object

First, we create a Vision object

# Read the expression and meta data
file_path <- "data/spatial" # Replace the path

expr <- read.table(paste(file_path, "expr.tsv.gz", sep="/"), check.names = FALSE, sep = "\t")
meta <- read.table(paste(file_path, "meta.tsv", sep="/"), check.names = FALSE, sep = "\t", row.names = 1, header=TRUE)

# Signature file
sig <- paste("data", "h.all.v5.2.symbols.gmt", sep="/")

# Read and create the coordinates
pos <- read.table(paste(file_path, "BeadLocationsForR.csv", sep="/"), sep=",", check.names = FALSE, row.names=1, header=TRUE)
pos["X"] <- pos$ycoord
pos["Y"] <- -1 * pos$xcoord
pos <- pos[c("X", "Y")]

# Construct the Vision object
vis <- Vision(expr, signatures=c(sig), latentSpace = pos, meta=meta) # TODO add relevant signatures

Expression Data

The provided expression data should be library-normalized.

The expression data should not be log-transformed prior to loading into VISION. For more information about how to transform the expression data, refer to the central VISION vignette.

Signatures

Signatures can be provided as a list of paths to signature files (*.gmt) or Signature objects.

See the signature vignette for more information on finding or creating gene Signatures.

Meta Data

An R data.frame with cell-level meta-data. This could be confounding variables (e.g. percentage of mitochondrial RNA, number of genes detected) or experimental covariates (e.g. genotype, donor, batch).

This input is optional if Signatures are provided.

Other Options

Other options and inputs can be provided to customize how VISION runs.

Running an analysis

Next, we can perform the normal Vision analysis using the spatial coordinates as the latent space.

vis <- analyze(vis)

Running Hotpsot analysis

Conveniently, the Hotspot analysis can be performed directly on the VISION object we just processed. To do so, we’ll use the runHotspot function which invokes the Hotspot pipeline. For more information about the analysis pipeline & Hotspot API, you can refer the documentation website here and the PhyloVision vignette.

As described in the original Hotspot vignette, we use the bernoulli model because of the low capture rate of the spatial technology. In a typical anlaysis, we would use the danb model.

vis@params$latentSpace$projectionGenes <- rownames(vis@exprData) # Use all genes.
vis <- runHotspot(vis, model="bernoulli", num_umi=meta["num_umi"], logdata=FALSE)

The PhyloVision vignette describes the additional API that is exposed for iterating on the Hotspot analysis.

After running the Hotpsot pipeline, the VISION object will have populated slots storing the gene modules, the scores for each cell with respect to these new gene modules, and the enrichment score between user-specified signatures and the new Hotspot gene modules. These results can be explored in the Hotspot mode of the VISION web-based report.

Viewing results

Finally, we can launch the Vision browser. The results from the Hotspot analysis can be viewed by clicking on the “Hotspot” button on the top-right of the web report.

viewResults(vis)