If you have not yet installed VISION, we recommend installing the package from GitHub. Full source code can be found in the VISION GitHub repository (YosefLab/VISION).
require(devtools)
install_github("YosefLab/VISION")
If you encounter errors in the installation, it is likely because a dependency is not installed correctly. Pay special attention to the error message, and in particular whether or not it notifies you that specific dependencies are not found.
After a successful installation, you can proceed by loading VISION using library(VISION).
Running an analysis with VISION consists of three steps:
1. Creating the VISION object
2. Running the analyze function
3. Viewing the results
Creating the VISION object requires a gene expression matrix and either a list of Gene Signatures or a data.frame of meta-data.
In this example, both are provided.
# Load VISION
library(VISION)

# Read in expression counts (Genes X Cells)
counts <- read.table("data/expression_matrix.txt.gz",
                     header = TRUE,
                     sep = '\t',
                     row.names = 1)

# Scale counts within a sample
n.umi <- colSums(counts)
scaled_counts <- t(t(counts) / n.umi) * median(n.umi)

# Read in meta data (Cells x Vars)
meta <- read.table("data/glio_meta.txt.gz", sep = '\t', header = TRUE, row.names = 1)

vis <- Vision(scaled_counts,
              signatures = c("data/h.all.v5.2.symbols.gmt"),
              meta = meta)
Expression Data
The provided expression data should be scaled and normalized. The example above shows just a simple UMI-scaling, but it is recommended to apply more advanced normalization procedures such as batch correction or removal of technical confounders.
The expression data should not be log-transformed prior to loading into VISION.
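To make the UMI-scaling step concrete, here is a minimal sketch on a hypothetical 3-gene by 3-cell matrix. The values and the names toy_counts and toy_scaled are made up for illustration. After scaling, every cell has the same total, equal to the median library size, and no log transform is applied.

```r
# Hypothetical toy counts (Genes x Cells); values are for illustration only.
# matrix() fills column by column, so each line below is one cell.
toy_counts <- matrix(c(1, 3, 6,    # cell1
                       2, 2, 4,    # cell2
                       5, 5, 10),  # cell3
                     nrow = 3,
                     dimnames = list(c("geneA", "geneB", "geneC"),
                                     c("cell1", "cell2", "cell3")))

# Total UMIs per cell
n_umi <- colSums(toy_counts)

# Divide each cell (column) by its total, then rescale to the median total
toy_scaled <- t(t(toy_counts) / n_umi) * median(n_umi)

# Every column now sums to median(n_umi)
colSums(toy_scaled)
```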
Signatures
Signatures can be provided as a list of paths to signature files (*.gmt) or Signature objects.
See the signature vignette for more information on finding or creating gene Signatures.
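As a rough illustration of the .gmt format (one signature per line: name, description, then gene symbols, all tab-separated), the sketch below writes a hypothetical signature file using base R. The signature name, description, gene list, and file path are all made up; see the signature vignette for the options VISION actually supports.

```r
# Hypothetical signature: name, description and genes are illustrative only
sig_name  <- "MY_SIGNATURE"
sig_desc  <- "example signature"
sig_genes <- c("GENE1", "GENE2", "GENE3")

# One GMT line: name <TAB> description <TAB> gene1 <TAB> gene2 ...
gmt_line <- paste(c(sig_name, sig_desc, sig_genes), collapse = "\t")

# Write to a temporary location (any writable path would do)
gmt_path <- file.path(tempdir(), "my_signatures.gmt")
writeLines(gmt_line, gmt_path)

# The path can then be supplied in the `signatures` argument
signature_files <- c(gmt_path)
```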
Meta Data
An R data.frame with cell-level meta-data. This could be confounding variables (e.g. percentage of mitochondrial RNA, number of genes detected) or experimental covariates (e.g. genotype, donor, batch).
This input is optional if Signatures are provided.
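As an example of what such a data.frame might contain, the sketch below derives a few cell-level covariates from a hypothetical toy count matrix. The gene names, counts, and batch labels are invented, and treating the "MT-" prefix as marking mitochondrial genes is an assumption that holds for human gene symbols only.

```r
# Hypothetical toy counts (Genes x Cells); each line below is one cell
toy_counts <- matrix(c(4, 0, 6,   # cell1
                       0, 3, 7,   # cell2
                       2, 2, 1),  # cell3
                     nrow = 3,
                     dimnames = list(c("MT-CO1", "ACTB", "GAPDH"),
                                     c("cell1", "cell2", "cell3")))

# Assumption: mitochondrial genes are those whose symbol starts with "MT-"
is_mito <- grepl("^MT-", rownames(toy_counts))

# One row per cell, one column per covariate (Cells x Vars)
toy_meta <- data.frame(
  num_umi        = colSums(toy_counts),
  genes_detected = colSums(toy_counts > 0),
  percent_mito   = 100 * colSums(toy_counts[is_mito, , drop = FALSE]) /
                     colSums(toy_counts),
  batch          = factor(c("A", "A", "B")),
  row.names      = colnames(toy_counts)
)

toy_meta
```

Row names are set to the cell identifiers so that each row can be matched to a column of the expression matrix.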
Other Options
Other options and inputs can be provided to customize how VISION runs. For information on this, see the “Customizing VISION Analysis” section below.
Once the Vision object has been processed with the analyze function (vis <- analyze(vis)), a dynamic web report can be generated with the viewResults() function.
viewResults(vis)
This will launch a browser running the interactive report.
Other options (port, host, browser) can be provided to control how this occurs. For example, if you are launching a report on a remote server (such as an AWS instance) and want to make it accessible to others, run this with host="0.0.0.0", some selected port number (e.g. port=8888), and browser=FALSE (so a browser isn’t auto-opened). Then the report should be available at “<your instance IP address>:8888”. (Note: You will also likely need to enable inbound traffic on your selected port for this to work correctly.)
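For instance, the remote-server setup described above might look like the following sketch. It assumes vis is a Vision object that has already been analyzed. The variable names report_host and report_port are illustrative, and the call is guarded so the snippet does nothing where VISION or vis is unavailable.

```r
# Settings from the text above: listen on all interfaces, on a fixed port
report_host <- "0.0.0.0"
report_port <- 8888

# Only launch when VISION is installed and a `vis` object exists
if (requireNamespace("VISION", quietly = TRUE) && exists("vis")) {
  VISION::viewResults(vis,
                      host = report_host,
                      port = report_port,
                      browser = FALSE)  # do not auto-open a browser
}
```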
Alternatively, you can work with the VISION object directly in R. For example:
# Display autocorrelation coefficients, p-values for signatures
head(getSignatureAutocorrelation(vis))
# Plot signature scores for a signature of interest
tsne <- getProjections(vis)[["tSNE30"]]
sigScores <- getSignatureScores(vis)[, "HALLMARK_INTERFERON_GAMMA_RESPONSE"]
library(ggplot2)
ggplot() + aes(x=tsne[, 1], y=tsne[, 2], color=sigScores) + geom_point()
For more details on accessing computed data within the VISION object, see the References page.
Customizing VISION Analysis
VISION requires a latent space to model the similarity between cells (used to determine a cell’s local neighborhood).
By default, this is calculated via PCA on a subset of the genes. A few arguments control this process:
projection_genes - controls how genes are selected for PCA. Options are:
  "threshold" - use a threshold to select genes. The threshold argument specifies either the number or proportion of cells in which a gene must be expressed to be included.
  "fano" - first the “threshold” filter is applied. Then genes are ordered by mean expression into 30 bins and within each bin, genes with a high Fano factor (2 MAD above the median) are retained.
scRNA-seq experiments have grown in size over the past couple of years, so we provide an algorithm for pooling together similar cells before continuing with the analysis, which reduces the cell-wise time complexity of the VISION pipeline. For more information, please see our micropooling vignette.
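To give a feel for what pooling similar cells means, here is a minimal sketch of the general idea on simulated data. It is not VISION's micropooling algorithm: k-means is used here purely as a stand-in grouping step, and all data, names, and sizes are hypothetical.

```r
set.seed(1)

# Hypothetical data: 50 genes x 200 cells, plus a 5-dimensional latent space
n_cells <- 200
expr <- matrix(rpois(50 * n_cells, lambda = 5), nrow = 50,
               dimnames = list(paste0("gene", 1:50),
                               paste0("cell", 1:n_cells)))
latent <- matrix(rnorm(n_cells * 5), nrow = n_cells,
                 dimnames = list(colnames(expr), NULL))

# Target pool size; the number of pools follows from it
cells_per_pool <- 10
n_pools <- ceiling(n_cells / cells_per_pool)

# Stand-in grouping step: k-means on the latent space (not VISION's method)
pool_id <- kmeans(latent, centers = n_pools, iter.max = 50)$cluster

# Average the expression of the cells in each pool -> Genes x Pools matrix
pooled <- sapply(split(seq_len(n_cells), pool_id),
                 function(idx) rowMeans(expr[, idx, drop = FALSE]))

dim(pooled)
```

Downstream steps then operate on the columns of pooled rather than on individual cells, which is where the reduction in cell-wise cost comes from. Pool sizes produced by k-means only approximate the target size.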
You can control the parameters of the micropooling algorithm using two arguments:
pool - boolean specifying whether or not to apply micropooling. By default this is set to ‘auto’ and micropooling is run when the number of cells exceeds 15,000.
cellsPerPartition - integer specifying the target number of cells per micropool.
Two-dimensional projections are used to visualize the data in the output report.
By default, VISION computes tSNE on the latent space for visualization. However, other options are available via the projection_methods argument.
Users often have pre-computed projections that they would like to use for visualizing their data (e.g. a pre-computed tSNE or UMAP projection). In this case, the addProjection() method can be used to add this view of the data to the output report.
projection <- read.csv("umap_results.csv")
# projection is a matrix or data.frame of dimension (Cells x 2)
vis <- addProjection(vis, "UMAP", projection)
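Before adding a pre-computed projection, it can help to confirm that it has the expected shape. The sketch below uses a hypothetical helper, check_projection, and made-up coordinates. It assumes the row names of the projection should be the same cell identifiers used as column names of the expression matrix; that is an assumption about the input, not a documented guarantee.

```r
# Hypothetical cell identifiers and a made-up 2-D embedding (Cells x 2)
cell_ids <- paste0("cell", 1:5)
projection <- data.frame(UMAP1 = c(0.1, 1.2, -0.7, 2.3, 0.0),
                         UMAP2 = c(1.5, -0.3, 0.8, 0.2, -1.1),
                         row.names = cell_ids)

# Hypothetical helper: verify shape and labels, return a numeric matrix
check_projection <- function(projection, cell_ids) {
  coords <- as.matrix(projection)
  stopifnot(is.numeric(coords),
            ncol(coords) == 2,
            !is.null(rownames(coords)),
            setequal(rownames(coords), cell_ids))
  # Reorder rows so they follow the order of cell_ids
  coords[cell_ids, , drop = FALSE]
}

coords <- check_projection(projection, cell_ids)
```

When reading coordinates from a CSV file, passing row.names = 1 to read.csv is one way to obtain row names from the first column, provided the file stores cell identifiers there.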
As of version 3.0.0, users can perform de-novo gene module identification with Hotspot (DeTomaso and Yosef, Cell Systems 2021) from within VISION.
As described in the original Hotspot vignette, we’ll use the danb model for standard single-cell RNA-seq datasets. While this works in general, pay special attention to the characteristics of your data (e.g., in spatial examples where capture is low, we recommend using the bernoulli model).
vis@params$latentSpace$projectionGenes <- rownames(vis@exprData) # Use all genes.
vis <- runHotspot(vis, model="danb", num_umi=meta["num_umi"], logdata=FALSE)
For more information about the analysis pipeline and the Hotspot API, refer to the documentation website and the PhyloVision vignette.
As of version 3.0.0, users can save the relevant information from an interactive report in order to reproduce it in R. This is done by downloading the “state” of the report using the Download button below “Save Report Info” in the upper right of the interactive web-based report. Below, we provide an example workflow.
Clicking Download will download three files: (1) a DE cache; (2) a cache for selections; and (3) a json file containing the current projection, values being plotted, and selected cells. If you would like to view the state in the UI, you can read the first two objects into VISION and launch the report again:
vis_new <- load_de_cache(vis, "de_cache_download_path")
vis_new <- load_selections(vis_new, "selections_download_path")
viewResults(vis_new, ...)
Otherwise, if you would like to reproduce the plot in R, you can use the third output. This json object contains 5 pieces of information that are useful for reproducing plots:
item_key - the name of the item being plotted
values - the values being plotted
projection - the coordinates of each cell in the current projection
projection_keyX and projection_keyY - the names of the two projection axes
Below is an example function that will reproduce the visualization in ggplot2. Be sure to have ggplot2, jsonlite, and viridis (for coloring) installed:
install.packages('ggplot2')
install.packages('viridis')
install.packages('jsonlite')
plot_saved_state <- function(json_file_path) {
  library(jsonlite)
  library(viridis)
  library(ggplot2)

  vision_state <- read_json(json_file_path)

  # extract meaningful info
  value_name <- vision_state$item_key
  scatter <- vision_state$projection
  values <- vision_state$values
  selection <- vision_state$selected_cells
  x_name <- vision_state$projection_keyX[[2]]
  y_name <- vision_state$projection_keyY[[2]]

  embedding <- data.frame(dim1 = unlist(lapply(scatter, function(x) x[[1]])),
                          dim2 = unlist(lapply(scatter, function(x) x[[2]])),
                          vals = unlist(values))

  if (length(selection) > 0) {
    selected_cells <- (rownames(embedding) %in% selection)
    g <- ggplot() +
      geom_point(data = embedding[!selected_cells, ],
                 aes(x = dim1, y = dim2, color = vals),
                 stroke = 0.5, shape = 16, alpha = .5) +
      geom_point(data = embedding[selected_cells, ],
                 aes(x = dim1, y = dim2, color = vals), size = 1.6)
  } else {
    g <- ggplot() +
      geom_point(data = embedding, aes(x = dim1, y = dim2, color = vals))
  }

  if (is.numeric(embedding$vals)) {
    g <- g + scale_colour_viridis()
  } else {
    g <- g + scale_colour_discrete()
  }

  g <- g +
    labs(colour = value_name, x = x_name, y = y_name, title = value_name)

  return(g)
}
saved_state_file <- '/path/to/saved_scatter.json'
plot_saved_state(saved_state_file)