In this notebook, we propose a demonstration of our reimplementation of Stereoscope in the new scvi-tools codebase. We apply our implementation as well as the original code on 10x Visium data from the hippocampus of a mouse brain.
Stereoscope¶
Install scvi-tools¶
In [1]:
Copied!
import sys
#if True, will install via pypi, else will install from source
stable = False
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB and stable:
!pip install --quiet scvi-tools[tutorials]==0.14.1
# !pip install --upgrade pytorch-lightning==1.4.9
elif IN_COLAB and not stable:
!pip install --quiet --upgrade jsonschema
!pip install --quiet git+https://github.com/yoseflab/scvi-tools@stereoscope_sum#egg=scvi-tools[tutorials]
if IN_COLAB:
!mkdir -p figs/stereoscope
import sys
#if True, will install via pypi, else will install from source
stable = False
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB and stable:
!pip install --quiet scvi-tools[tutorials]==0.14.1
# !pip install --upgrade pytorch-lightning==1.4.9
elif IN_COLAB and not stable:
!pip install --quiet --upgrade jsonschema
!pip install --quiet git+https://github.com/yoseflab/scvi-tools@stereoscope_sum#egg=scvi-tools[tutorials]
if IN_COLAB:
!mkdir -p figs/stereoscope
|████████████████████████████████| 69 kB 4.2 MB/s
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
nbclient 0.5.4 requires jupyter-client>=6.1.5, but you have jupyter-client 5.3.5 which is incompatible.
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing wheel metadata ... done
|████████████████████████████████| 212 kB 6.7 MB/s
|████████████████████████████████| 127 kB 55.6 MB/s
|████████████████████████████████| 813 kB 65.6 MB/s
|████████████████████████████████| 242 kB 99.8 MB/s
|████████████████████████████████| 678 kB 62.4 MB/s
|████████████████████████████████| 1.4 MB 27.1 MB/s
|████████████████████████████████| 2.0 MB 76.6 MB/s
|████████████████████████████████| 8.8 MB 20.2 MB/s
|████████████████████████████████| 41 kB 146 kB/s
|████████████████████████████████| 3.2 MB 59.0 MB/s
|████████████████████████████████| 48 kB 7.1 MB/s
|████████████████████████████████| 125 kB 71.5 MB/s
|████████████████████████████████| 636 kB 75.3 MB/s
|████████████████████████████████| 829 kB 69.5 MB/s
|████████████████████████████████| 282 kB 83.8 MB/s
|████████████████████████████████| 1.3 MB 60.1 MB/s
|████████████████████████████████| 51 kB 9.2 MB/s
|████████████████████████████████| 80 kB 12.2 MB/s
|████████████████████████████████| 1.1 MB 77.5 MB/s
|████████████████████████████████| 160 kB 76.0 MB/s
|████████████████████████████████| 271 kB 80.5 MB/s
|████████████████████████████████| 63 kB 2.4 MB/s
Building wheel for scvi-tools (PEP 517) ... done
Building wheel for docrep (setup.py) ... done
Building wheel for loompy (setup.py) ... done
Building wheel for future (setup.py) ... done
Building wheel for umap-learn (setup.py) ... done
Building wheel for pynndescent (setup.py) ... done
Building wheel for numpy-groupies (setup.py) ... done
Building wheel for sinfo (setup.py) ... done
Get the single-cell data¶
In [2]:
Copied!
!wget https://github.com/YosefLab/scVI-data/blob/master/scRefSubsampled1000_hippocampus_compressed.h5ad?raw=true -O scRef.h5ad
!wget https://github.com/YosefLab/scVI-data/blob/master/scRefSubsampled1000_hippocampus_compressed.h5ad?raw=true -O scRef.h5ad
--2021-10-14 17:21:41-- https://github.com/YosefLab/scVI-data/blob/master/scRefSubsampled1000_hippocampus_compressed.h5ad?raw=true Resolving github.com (github.com)... 192.30.255.112 Connecting to github.com (github.com)|192.30.255.112|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https://github.com/YosefLab/scVI-data/raw/master/scRefSubsampled1000_hippocampus_compressed.h5ad [following] --2021-10-14 17:21:41-- https://github.com/YosefLab/scVI-data/raw/master/scRefSubsampled1000_hippocampus_compressed.h5ad Reusing existing connection to github.com:443. HTTP request sent, awaiting response... 302 Found Location: https://raw.githubusercontent.com/YosefLab/scVI-data/master/scRefSubsampled1000_hippocampus_compressed.h5ad [following] --2021-10-14 17:21:41-- https://raw.githubusercontent.com/YosefLab/scVI-data/master/scRefSubsampled1000_hippocampus_compressed.h5ad Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.108.133, ... Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 40290764 (38M) [application/octet-stream] Saving to: ‘scRef.h5ad’ scRef.h5ad 100%[===================>] 38.42M 254MB/s in 0.2s 2021-10-14 17:21:43 (254 MB/s) - ‘scRef.h5ad’ saved [40290764/40290764]
Import libraries¶
In [3]:
Copied!
import scanpy as sc
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline
import scanpy as sc
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline
In [4]:
Copied!
sns.reset_orig()
sc.settings._vector_friendly = True
# p9.theme_set(p9.theme_classic)
plt.rcParams["svg.fonttype"] = "none"
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["savefig.transparent"] = True
plt.rcParams["figure.figsize"] = (4, 4)
plt.rcParams["axes.titlesize"] = 15
plt.rcParams["axes.titleweight"] = 500
plt.rcParams["axes.titlepad"] = 8.0
plt.rcParams["axes.labelsize"] = 14
plt.rcParams["axes.labelweight"] = 500
plt.rcParams["axes.linewidth"] = 1.2
plt.rcParams["axes.labelpad"] = 6.0
plt.rcParams["axes.spines.top"] = False
plt.rcParams["axes.spines.right"] = False
plt.rcParams["font.size"] = 11
# plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.sans-serif'] = ['Helvetica', "Computer Modern Sans Serif", "DejaVU Sans"]
plt.rcParams['font.weight'] = 500
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['xtick.minor.size'] = 1.375
plt.rcParams['xtick.major.size'] = 2.75
plt.rcParams['xtick.major.pad'] = 2
plt.rcParams['xtick.minor.pad'] = 2
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['ytick.minor.size'] = 1.375
plt.rcParams['ytick.major.size'] = 2.75
plt.rcParams['ytick.major.pad'] = 2
plt.rcParams['ytick.minor.pad'] = 2
plt.rcParams["legend.fontsize"] = 12
plt.rcParams['legend.handlelength'] = 1.4
plt.rcParams['legend.numpoints'] = 1
plt.rcParams['legend.scatterpoints'] = 3
plt.rcParams['legend.frameon'] = False
plt.rcParams['lines.linewidth'] = 1.7
DPI = 300
sns.reset_orig()
sc.settings._vector_friendly = True
# p9.theme_set(p9.theme_classic)
plt.rcParams["svg.fonttype"] = "none"
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["savefig.transparent"] = True
plt.rcParams["figure.figsize"] = (4, 4)
plt.rcParams["axes.titlesize"] = 15
plt.rcParams["axes.titleweight"] = 500
plt.rcParams["axes.titlepad"] = 8.0
plt.rcParams["axes.labelsize"] = 14
plt.rcParams["axes.labelweight"] = 500
plt.rcParams["axes.linewidth"] = 1.2
plt.rcParams["axes.labelpad"] = 6.0
plt.rcParams["axes.spines.top"] = False
plt.rcParams["axes.spines.right"] = False
plt.rcParams["font.size"] = 11
# plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.sans-serif'] = ['Helvetica', "Computer Modern Sans Serif", "DejaVU Sans"]
plt.rcParams['font.weight'] = 500
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['xtick.minor.size'] = 1.375
plt.rcParams['xtick.major.size'] = 2.75
plt.rcParams['xtick.major.pad'] = 2
plt.rcParams['xtick.minor.pad'] = 2
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['ytick.minor.size'] = 1.375
plt.rcParams['ytick.major.size'] = 2.75
plt.rcParams['ytick.major.pad'] = 2
plt.rcParams['ytick.minor.pad'] = 2
plt.rcParams["legend.fontsize"] = 12
plt.rcParams['legend.handlelength'] = 1.4
plt.rcParams['legend.numpoints'] = 1
plt.rcParams['legend.scatterpoints'] = 3
plt.rcParams['legend.frameon'] = False
plt.rcParams['lines.linewidth'] = 1.7
DPI = 300
Read Spatial transcriptomics data¶
In [5]:
Copied!
st_adata = sc.datasets.visium_sge(sample_id="V1_Adult_Mouse_Brain")
st_adata.var_names_make_unique()
st_adata = sc.datasets.visium_sge(sample_id="V1_Adult_Mouse_Brain")
st_adata.var_names_make_unique()
Variable names are not unique. To make them unique, call `.var_names_make_unique`. Variable names are not unique. To make them unique, call `.var_names_make_unique`.
In [6]:
Copied!
sc.pl.spatial(st_adata, img_key="hires", color=["Ptk2b"])
sc.pl.spatial(st_adata, img_key="hires", color=["Ptk2b"])
... storing 'feature_types' as categorical ... storing 'genome' as categorical
In [7]:
Copied!
# Take a slice and focus on the hippocampus
sl_adata = st_adata[st_adata.obs["array_row"] > 15]
sl_adata = sl_adata[sl_adata.obs["array_row"] < 32]
sl_adata = sl_adata[sl_adata.obs["array_col"] > 20]
sl_adata = sl_adata[sl_adata.obs["array_col"] < 80]
st_adata.obs["filter"] = pd.Categorical([x in sl_adata.obs.index for x in st_adata.obs.index])
sc.pl.embedding(st_adata, basis="spatial", color="filter")
# Take a slice and focus on the hippocampus
sl_adata = st_adata[st_adata.obs["array_row"] > 15]
sl_adata = sl_adata[sl_adata.obs["array_row"] < 32]
sl_adata = sl_adata[sl_adata.obs["array_col"] > 20]
sl_adata = sl_adata[sl_adata.obs["array_col"] < 80]
st_adata.obs["filter"] = pd.Categorical([x in sl_adata.obs.index for x in st_adata.obs.index])
sc.pl.embedding(st_adata, basis="spatial", color="filter")
Marker genes
- CA1: Fibcd1
- CA3: Npy2r
- Dentate: Prox1
- Oligo: Plp1
- Astrocytes: Aldoc
In [8]:
Copied!
sc.pl.embedding(sl_adata, basis="spatial", color=["Fibcd1", "Plp1", "Npy2r", "Prox1"], size=200)
sc.pl.embedding(sl_adata, basis="spatial", color=["Fibcd1", "Plp1", "Npy2r", "Prox1"], size=200)
Read single-cell RNA Sequencing data¶
In [9]:
Copied!
sc_adata = sc.read_h5ad("scRef.h5ad")
sc_adata.obs["liger_ident_coarse"].value_counts()
sc_adata = sc.read_h5ad("scRef.h5ad")
sc_adata.obs["liger_ident_coarse"].value_counts()
Out[9]:
Polydendrocyte 1000 Oligodendrocyte 1000 CA1 1000 CA3 1000 Denate 1000 Endothelial_Stalk 1000 Entorihinal 1000 Interneuron 1000 Microglia_Macrophages 1000 Mural 1000 Neuron.Slc17a6 1000 Astrocyte 1000 Neurogenesis 988 Ependymal 715 Endothelial_Tip 676 Cajal_Retzius 672 Choroid 44 Name: liger_ident_coarse, dtype: int64
In [10]:
Copied!
# preprocessing
sc.pp.filter_genes(sc_adata, min_counts=10)
sc_adata.layers["counts"] = sc_adata.X.copy()
# normalization for selection of highly variable genes (not for the deconvolution)
sc.pp.normalize_total(sc_adata, target_sum=10e4)
sc.pp.log1p(sc_adata)
sc_adata.raw = sc_adata
sc.pp.highly_variable_genes(
sc_adata,
n_top_genes=2000,
subset=True,
layer="counts",
flavor="seurat_v3"
)
# filter genes to be the same on the spatial data
intersect = np.intersect1d(sc_adata.var_names, st_adata.var_names)
sl_adata = sl_adata[:, intersect].copy()
sc_adata = sc_adata[:, intersect].copy()
# preprocessing
sc.pp.filter_genes(sc_adata, min_counts=10)
sc_adata.layers["counts"] = sc_adata.X.copy()
# normalization for selection of highly variable genes (not for the deconvolution)
sc.pp.normalize_total(sc_adata, target_sum=10e4)
sc.pp.log1p(sc_adata)
sc_adata.raw = sc_adata
sc.pp.highly_variable_genes(
sc_adata,
n_top_genes=2000,
subset=True,
layer="counts",
flavor="seurat_v3"
)
# filter genes to be the same on the spatial data
intersect = np.intersect1d(sc_adata.var_names, st_adata.var_names)
sl_adata = sl_adata[:, intersect].copy()
sc_adata = sc_adata[:, intersect].copy()
In [11]:
Copied!
# visualize for fun
sc.tl.pca(sc_adata, svd_solver='arpack')
sc.pp.neighbors(sc_adata, n_pcs = 30, n_neighbors = 20)
sc.tl.umap(sc_adata)
# visualize for fun
sc.tl.pca(sc_adata, svd_solver='arpack')
sc.pp.neighbors(sc_adata, n_pcs = 30, n_neighbors = 20)
sc.tl.umap(sc_adata)
/usr/local/lib/python3.7/dist-packages/numba/np/ufunc/parallel.py:363: NumbaWarning: The TBB threading layer requires TBB version 2019.5 or later i.e., TBB_INTERFACE_VERSION >= 11005. Found TBB_INTERFACE_VERSION = 9107. The TBB threading layer is disabled.
warnings.warn(problem)
In [12]:
Copied!
sc.pl.umap(sc_adata, color=['liger_ident_coarse'])
sc.pl.umap(sc_adata, color=['liger_ident_coarse'])
In [13]:
Copied!
# restore counts for CLI Stereoscope
sc_adata.X = sc_adata.layers["counts"].copy()
# restore counts for CLI Stereoscope
sc_adata.X = sc_adata.layers["counts"].copy()
Learn RNA model (on scRNA-seq data)¶
In [14]:
Copied!
import scvi
from scvi.external.stereoscope import RNAStereoscope, SpatialStereoscope
import scvi
from scvi.external.stereoscope import RNAStereoscope, SpatialStereoscope
Global seed set to 0
In [15]:
Copied!
scvi.data.setup_anndata(sc_adata, layer="counts", labels_key="liger_ident_coarse")
scvi.data.setup_anndata(sc_adata, layer="counts", labels_key="liger_ident_coarse")
INFO No batch_key inputted, assuming all cells are same batch INFO Using labels from adata.obs["liger_ident_coarse"] INFO Using data from adata.layers["counts"] INFO Successfully registered anndata object containing 15095 cells, 1898 vars, 1 batches, 17 labels, and 0 proteins. Also registered 0 extra categorical covariates and 0 extra continuous covariates. INFO Please do not further modify adata until model is trained.
/usr/local/lib/python3.7/dist-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function setup_anndata is deprecated; Please use the model-specific setup_anndata methods instead. The global method will be removed in version 0.15.0. warnings.warn(msg, category=FutureWarning)
In [16]:
Copied!
# time per epoch on colab for this is 2.15 seconds per epoch
# original implementation recommends 75,000 epochs. That would take 44 hours with our code.
# early stopping stopped at 124 epochs!
train = True
if train:
sc_model = RNAStereoscope(sc_adata)
# here we use "weight_decay=0" to reproduce Stereoscope, but we advise to use the default parameter for all other applications
sc_model.train(max_epochs=100, plan_kwargs={"weight_decay": 0}, check_val_every_n_epoch=1, train_size=1.0, validation_size=0.0,
early_stopping=False, early_stopping_monitor="elbo_validation", early_stopping_patience=5, early_stopping_min_delta=1)
plt.plot(sc_model.history["elbo_train"][10:], label="train")
plt.title("loss over training epochs")
plt.legend()
plt.show()
sc_model.save("scmodel", overwrite=True)
else:
sc_model = RNAStereoscope.load("scmodel", sc_adata)
print("Loaded RNA model from file!")
# time per epoch on colab for this is 2.15 seconds per epoch
# original implementation recommends 75,000 epochs. That would take 44 hours with our code.
# early stopping stopped at 124 epochs!
train = True
if train:
sc_model = RNAStereoscope(sc_adata)
# here we use "weight_decay=0" to reproduce Stereoscope, but we advise to use the default parameter for all other applications
sc_model.train(max_epochs=100, plan_kwargs={"weight_decay": 0}, check_val_every_n_epoch=1, train_size=1.0, validation_size=0.0,
early_stopping=False, early_stopping_monitor="elbo_validation", early_stopping_patience=5, early_stopping_min_delta=1)
plt.plot(sc_model.history["elbo_train"][10:], label="train")
plt.title("loss over training epochs")
plt.legend()
plt.show()
sc_model.save("scmodel", overwrite=True)
else:
sc_model = RNAStereoscope.load("scmodel", sc_adata)
print("Loaded RNA model from file!")
GPU available: True, used: True TPU available: False, using: 0 TPU cores LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Epoch 100/100: 100%|██████████| 100/100 [00:55<00:00, 1.82it/s, loss=5.67e+04, v_num=1]
small detour to get the gene by cell type matrix for the figure
In [17]:
Copied!
params = sc_model.module.get_params()
import scipy.cluster.hierarchy as sch
Y = sch.linkage(params[0], method="centroid")
Z1 = sch.dendrogram(Y, orientation="left")
X = sch.linkage(params[0].T, method="single")
Z2 = sch.dendrogram(X)
# .matshow(D, aspect='auto', origin='lower', cmap=pylab.cm.YlGnBu)
plt.figure(figsize=(3, 3))
plt.imshow(params[0][Z1["leaves"]][:, Z2["leaves"]][:200], aspect=0.1, cmap="YlGnBu")
plt.gca().axes.xaxis.set_ticks([])
plt.gca().axes.yaxis.set_ticks([])
plt.xlabel("Cell types")
plt.ylabel("Genes")
plt.savefig("figs/stereoscope/dict_gene_exp.pdf")
plt.show()
params = sc_model.module.get_params()
import scipy.cluster.hierarchy as sch
Y = sch.linkage(params[0], method="centroid")
Z1 = sch.dendrogram(Y, orientation="left")
X = sch.linkage(params[0].T, method="single")
Z2 = sch.dendrogram(X)
# .matshow(D, aspect='auto', origin='lower', cmap=pylab.cm.YlGnBu)
plt.figure(figsize=(3, 3))
plt.imshow(params[0][Z1["leaves"]][:, Z2["leaves"]][:200], aspect=0.1, cmap="YlGnBu")
plt.gca().axes.xaxis.set_ticks([])
plt.gca().axes.yaxis.set_ticks([])
plt.xlabel("Cell types")
plt.ylabel("Genes")
plt.savefig("figs/stereoscope/dict_gene_exp.pdf")
plt.show()
We can check that marker genes have high value of the parameters in the single-cell model. For example, a CA1 marker. Check that the parameter of the marker gene is larger for the CA1 cell type!
In [18]:
Copied!
list(zip(sc_adata.uns["_scvi"]["categorical_mappings"]["_scvi_labels"]["mapping"], params[0][np.where(sc_adata.var_names == "Fibcd1")[0][0],]))
list(zip(sc_adata.uns["_scvi"]["categorical_mappings"]["_scvi_labels"]["mapping"], params[0][np.where(sc_adata.var_names == "Fibcd1")[0][0],]))
Out[18]:
[('Astrocyte', -5.235477), ('CA1', -0.7815605), ('CA3', -4.9932957), ('Cajal_Retzius', -3.7664227), ('Choroid', -4.8968277), ('Denate', -4.054734), ('Endothelial_Stalk', -5.283406), ('Endothelial_Tip', -5.260552), ('Entorihinal', -4.5145183), ('Ependymal', -5.1968904), ('Interneuron', -4.912203), ('Microglia_Macrophages', -4.367293), ('Mural', -5.234314), ('Neurogenesis', -5.26265), ('Neuron.Slc17a6', -4.734822), ('Oligodendrocyte', -5.039871), ('Polydendrocyte', -4.4008713)]
In [19]:
Copied!
list(zip(sc_adata.uns["_scvi"]["categorical_mappings"]["_scvi_labels"]["mapping"], params[0][np.where(sc_adata.var_names == "Fibcd1")[0][0],]))
list(zip(sc_adata.uns["_scvi"]["categorical_mappings"]["_scvi_labels"]["mapping"], params[0][np.where(sc_adata.var_names == "Fibcd1")[0][0],]))
Out[19]:
[('Astrocyte', -5.235477), ('CA1', -0.7815605), ('CA3', -4.9932957), ('Cajal_Retzius', -3.7664227), ('Choroid', -4.8968277), ('Denate', -4.054734), ('Endothelial_Stalk', -5.283406), ('Endothelial_Tip', -5.260552), ('Entorihinal', -4.5145183), ('Ependymal', -5.1968904), ('Interneuron', -4.912203), ('Microglia_Macrophages', -4.367293), ('Mural', -5.234314), ('Neurogenesis', -5.26265), ('Neuron.Slc17a6', -4.734822), ('Oligodendrocyte', -5.039871), ('Polydendrocyte', -4.4008713)]
Infer proportion for spatial data¶
In [20]:
Copied!
sl_adata.layers["counts"] = sl_adata.X.copy()
scvi.data.setup_anndata(sl_adata, layer="counts")
sl_adata.layers["counts"] = sl_adata.X.copy()
scvi.data.setup_anndata(sl_adata, layer="counts")
INFO No batch_key inputted, assuming all cells are same batch INFO No label_key inputted, assuming all cells have same label INFO Using data from adata.layers["counts"] INFO Successfully registered anndata object containing 472 cells, 1898 vars, 1 batches, 1 labels, and 0 proteins. Also registered 0 extra categorical covariates and 0 extra continuous covariates. INFO Please do not further modify adata until model is trained.
/usr/local/lib/python3.7/dist-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function setup_anndata is deprecated; Please use the model-specific setup_anndata methods instead. The global method will be removed in version 0.15.0. warnings.warn(msg, category=FutureWarning)
In [21]:
Copied!
# time per epoch on colab for this is 0.09 seconds per epoch
# original implementation uses 75,000 epochs, that would take 1.8 hours with this code
# early stopping stopped at
train = True
if train:
# here we use "minibatch" to reproduce Stereoscope, but we advise to use the default parameter for all other applications
# here we use "weight_decay=0" to reproduce Stereoscope, but we advise to use the default parameter for all other applications
spatial_model = SpatialStereoscope.from_rna_model(
sl_adata, sc_model, prior_weight="minibatch"
)
spatial_model.train(max_epochs=5000, plan_kwargs={"weight_decay": 0}, check_val_every_n_epoch=1, early_stopping=False,
early_stopping_monitor="elbo_train", early_stopping_patience=5, early_stopping_min_delta=0.4)
plt.plot(spatial_model.history["elbo_train"], label="train")
plt.title("loss over training epochs")
plt.legend()
plt.show()
spatial_model.save("stmodel", overwrite=True)
else:
spatial_model = SpatialStereoscope.load("stmodel", sl_adata)
print("Loaded Spatial model from file!")
# time per epoch on colab for this is 0.09 seconds per epoch
# original implementation uses 75,000 epochs, that would take 1.8 hours with this code
# early stopping stopped at
train = True
if train:
# here we use "minibatch" to reproduce Stereoscope, but we advise to use the default parameter for all other applications
# here we use "weight_decay=0" to reproduce Stereoscope, but we advise to use the default parameter for all other applications
spatial_model = SpatialStereoscope.from_rna_model(
sl_adata, sc_model, prior_weight="minibatch"
)
spatial_model.train(max_epochs=5000, plan_kwargs={"weight_decay": 0}, check_val_every_n_epoch=1, early_stopping=False,
early_stopping_monitor="elbo_train", early_stopping_patience=5, early_stopping_min_delta=0.4)
plt.plot(spatial_model.history["elbo_train"], label="train")
plt.title("loss over training epochs")
plt.legend()
plt.show()
spatial_model.save("stmodel", overwrite=True)
else:
spatial_model = SpatialStereoscope.load("stmodel", sl_adata)
print("Loaded Spatial model from file!")
GPU available: True, used: True TPU available: False, using: 0 TPU cores LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Epoch 5000/5000: 100%|██████████| 5000/5000 [02:36<00:00, 32.01it/s, loss=2.58e+05, v_num=1]
In [22]:
Copied!
spatial_model.history["elbo_train"][-20:]
spatial_model.history["elbo_train"][-20:]
Out[22]:
elbo_train | |
---|---|
epoch | |
4980 | 2111.38 |
4981 | 2111.33 |
4982 | 2111.27 |
4983 | 2111.21 |
4984 | 2111.15 |
4985 | 2111.1 |
4986 | 2111.05 |
4987 | 2110.99 |
4988 | 2110.92 |
4989 | 2110.87 |
4990 | 2110.82 |
4991 | 2110.77 |
4992 | 2110.7 |
4993 | 2110.64 |
4994 | 2110.58 |
4995 | 2110.52 |
4996 | 2110.46 |
4997 | 2110.41 |
4998 | 2110.35 |
4999 | 2110.29 |
In [23]:
Copied!
sl_adata.obsm["deconvolution"] = spatial_model.get_proportions()
# unconstrained proportions
import torch
import pandas as pd
with torch.no_grad():
keep_noise = False
res = torch.nn.functional.softplus(spatial_model.module.V).cpu().numpy().T
if not keep_noise:
res = res[:, :-1]
column_names = spatial_model.cell_type_mapping
sl_adata.obsm["deconvolution_unconstr"] = pd.DataFrame(
data=res,
columns=column_names,
index=spatial_model.adata.obs.index,
)
sl_adata.obsm["deconvolution"] = spatial_model.get_proportions()
# unconstrained proportions
import torch
import pandas as pd
with torch.no_grad():
keep_noise = False
res = torch.nn.functional.softplus(spatial_model.module.V).cpu().numpy().T
if not keep_noise:
res = res[:, :-1]
column_names = spatial_model.cell_type_mapping
sl_adata.obsm["deconvolution_unconstr"] = pd.DataFrame(
data=res,
columns=column_names,
index=spatial_model.adata.obs.index,
)
In [24]:
Copied!
for ct in sl_adata.obsm["deconvolution"].columns:
sl_adata.obs[ct] = sl_adata.obsm["deconvolution"][ct]
for ct in sl_adata.obsm["deconvolution"].columns:
sl_adata.obs[ct] = sl_adata.obsm["deconvolution"][ct]
In [25]:
Copied!
sc.pl.embedding(sl_adata, basis="spatial", color=["CA1", "CA3", "Denate", "Oligodendrocyte"], size=200)
sc.pl.embedding(sl_adata, basis="spatial", color=["CA1", "CA3", "Denate", "Oligodendrocyte"], size=200)
In [26]:
Copied!
sc.pl.embedding(sl_adata, basis="spatial", color=["Astrocyte", "Entorihinal", "Endothelial_Stalk", "Endothelial_Tip"], size=200)
sc.pl.embedding(sl_adata, basis="spatial", color=["Astrocyte", "Entorihinal", "Endothelial_Stalk", "Endothelial_Tip"], size=200)
In [27]:
Copied!
sl_adata.obsm["deconvolution"].columns
sl_adata.obsm["deconvolution"].columns
Out[27]:
Index(['Astrocyte', 'CA1', 'CA3', 'Cajal_Retzius', 'Choroid', 'Denate', 'Endothelial_Stalk', 'Endothelial_Tip', 'Entorihinal', 'Ependymal', 'Interneuron', 'Microglia_Macrophages', 'Mural', 'Neurogenesis', 'Neuron.Slc17a6', 'Oligodendrocyte', 'Polydendrocyte'], dtype='object')
small detour to get a nice plot for the paper
In [28]:
Copied!
plt.scatter(sl_adata.obsm["spatial"][:, 0], sl_adata.obsm["spatial"][:, 1], c=sl_adata.obsm["deconvolution"].values.argmax(1), cmap="Set2")
# cmap="YlGnBu")
plt.gca().axes.xaxis.set_ticks([])
plt.gca().axes.yaxis.set_ticks([])
# plt.xlabel("Cell types")
# plt.ylabel("Genes")
plt.savefig("figs/stereoscope/cell_type_location.pdf")
plt.scatter(sl_adata.obsm["spatial"][:, 0], sl_adata.obsm["spatial"][:, 1], c=sl_adata.obsm["deconvolution"].values.argmax(1), cmap="Set2")
# cmap="YlGnBu")
plt.gca().axes.xaxis.set_ticks([])
plt.gca().axes.yaxis.set_ticks([])
# plt.xlabel("Cell types")
# plt.ylabel("Genes")
plt.savefig("figs/stereoscope/cell_type_location.pdf")
Save files for input to original Stereoscope
In [29]:
Copied!
del sl_adata.obs["filter"]
sl_adata.write_h5ad("sl_adata.h5ad")
sc_adata.write_h5ad("sc_adata.h5ad")
del sl_adata.obs["filter"]
sl_adata.write_h5ad("sl_adata.h5ad")
sc_adata.write_h5ad("sc_adata.h5ad")
Install Stereoscope from source¶
In [30]:
Copied!
# must use my own fork because of circular dependencies in the original codebase that didn't work out for Colab (unknown reason; only two import lines deleted)
!git clone https://github.com/adamgayoso/stereoscope
# must use my own fork because of circular dependencies in the original codebase that didn't work out for Colab (unknown reason; only two import lines deleted)
!git clone https://github.com/adamgayoso/stereoscope
Cloning into 'stereoscope'... remote: Enumerating objects: 590, done. remote: Counting objects: 100% (282/282), done. remote: Compressing objects: 100% (216/216), done. remote: Total 590 (delta 151), reused 171 (delta 63), pack-reused 308 Receiving objects: 100% (590/590), 57.61 MiB | 32.48 MiB/s, done. Resolving deltas: 100% (304/304), done.
In [31]:
Copied!
%%capture
!cd stereoscope; ./setup.py install
%%capture
!cd stereoscope; ./setup.py install
In [32]:
Copied!
!python -c "import stsc; print(stsc.__version__)"
!stereoscope test
!python -c "import stsc; print(stsc.__version__)"
!stereoscope test
stereoscope : 0.2.0 Successfully installed stereoscope CLI
Run Stereoscope & read results¶
In [33]:
Copied!
# SCMODEL: time per epoch on colab for this is 1.8 seconds per epoch
# original implementation uses 75,000 epochs, that would take 37.5 hours
# STMODEL: time per epoch on colab for this is 0.036 seconds per epoch
# original implementation uses 75,000 epochs, that would take 45 min
import time
start = time.time()
!stereoscope run --sc_cnt sc_adata.h5ad --label_colname liger_ident_coarse --st_cnt sl_adata.h5ad \
-sce 100 -o stereo-out -n 2000 -ste 5000 --gpu -stb 128 -scb 128
end = time.time()
print(end-start)
# SCMODEL: time per epoch on colab for this is 1.8 seconds per epoch
# original implementation uses 75,000 epochs, that would take 37.5 hours
# STMODEL: time per epoch on colab for this is 0.036 seconds per epoch
# original implementation uses 75,000 epochs, that would take 45 min
import time
start = time.time()
!stereoscope run --sc_cnt sc_adata.h5ad --label_colname liger_ident_coarse --st_cnt sl_adata.h5ad \
-sce 100 -o stereo-out -n 2000 -ste 5000 --gpu -stb 128 -scb 128
end = time.time()
print(end-start)
[2021-10-14 17:26:57,344 - stsc - INFO ] >> Using device cuda [2021-10-14 17:26:57,344 - stsc - INFO ] >> fitting sc data | count file : sc_adata.h5ad | labels file : None [2021-10-14 17:26:57,808 - stsc - INFO ] >> SC data GENES : 1898 SC data CELLS : 15095 SC data TYPES : 17 Batch size: 128 Epoch : 100/100 | Loss : 6.791583E+06 | [ =================== ] [2021-10-14 17:27:53,717 - stsc - INFO ] >> fit st data section(s) : ['sl_adata.h5ad'] [2021-10-14 17:27:53,810 - stsc - INFO ] >> ST data GENES : 1898 ST data SPOTS : 472 Batch size: 128 Epoch : 5000/5000 | Loss : 1.029873E+06 | [ =================== ] [2021-10-14 17:29:03,896 - stsc - INFO ] >> saving proportions for section sl_adata to stereo-out/sl_adata/W.2021-10-14172657.337016.tsv [2021-10-14 17:29:03,906 - stsc - INFO ] >> saving proportions (unconstr) for section sl_adata to stereo-out/sl_adata/W_unconstr.2021-10-14172657.337016.tsv 131.0823802947998
In [34]:
Copied!
# second run for assessing stochasticity reproducibility
start = time.time()
!stereoscope run --sc_cnt sc_adata.h5ad --label_colname liger_ident_coarse --st_cnt sl_adata.h5ad \
-sce 100 -o stereo-out-2 -n 2000 -ste 5000 --gpu -stb 128 -scb 128
end = time.time()
print(end-start)
# second run for assessing stochasticity reproducibility
start = time.time()
!stereoscope run --sc_cnt sc_adata.h5ad --label_colname liger_ident_coarse --st_cnt sl_adata.h5ad \
-sce 100 -o stereo-out-2 -n 2000 -ste 5000 --gpu -stb 128 -scb 128
end = time.time()
print(end-start)
[2021-10-14 17:29:08,424 - stsc - INFO ] >> Using device cuda [2021-10-14 17:29:08,424 - stsc - INFO ] >> fitting sc data | count file : sc_adata.h5ad | labels file : None [2021-10-14 17:29:08,907 - stsc - INFO ] >> SC data GENES : 1898 SC data CELLS : 15095 SC data TYPES : 17 Batch size: 128 Epoch : 100/100 | Loss : 6.788670E+06 | [ =================== ] [2021-10-14 17:30:03,800 - stsc - INFO ] >> fit st data section(s) : ['sl_adata.h5ad'] [2021-10-14 17:30:03,888 - stsc - INFO ] >> ST data GENES : 1898 ST data SPOTS : 472 Batch size: 128 Epoch : 5000/5000 | Loss : 1.019606E+06 | [ =================== ] [2021-10-14 17:31:14,477 - stsc - INFO ] >> saving proportions for section sl_adata to stereo-out-2/sl_adata/W.2021-10-14172908.417238.tsv [2021-10-14 17:31:14,488 - stsc - INFO ] >> saving proportions (unconstr) for section sl_adata to stereo-out-2/sl_adata/W_unconstr.2021-10-14172908.417238.tsv 130.58961939811707
In [35]:
Copied!
import os
path = "stereo-out/sl_adata/"
pl = os.listdir(path)
filename = pl[0] if "unconstr" in pl[0] else pl[1]
stereo_out = pd.read_csv(path+filename, sep="\t", index_col=0)
stereo_out.index = sl_adata.obs.index.copy()
path = "stereo-out-2/sl_adata/"
pl = os.listdir(path)
filename = pl[0] if "unconstr" in pl[0] else pl[1]
stereo_out_2 = pd.read_csv(path+filename, sep="\t", index_col=0)
stereo_out_2.index = sl_adata.obs.index.copy()
import os
path = "stereo-out/sl_adata/"
pl = os.listdir(path)
filename = pl[0] if "unconstr" in pl[0] else pl[1]
stereo_out = pd.read_csv(path+filename, sep="\t", index_col=0)
stereo_out.index = sl_adata.obs.index.copy()
path = "stereo-out-2/sl_adata/"
pl = os.listdir(path)
filename = pl[0] if "unconstr" in pl[0] else pl[1]
stereo_out_2 = pd.read_csv(path+filename, sep="\t", index_col=0)
stereo_out_2.index = sl_adata.obs.index.copy()
In [36]:
Copied!
def prettify_axis(ax):
# Hide the right and top spines
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# Only show ticks on the left and bottom spines
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')
def prettify_axis(ax):
# Hide the right and top spines
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# Only show ticks on the left and bottom spines
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')
In [37]:
Copied!
from scipy.stats import spearmanr, pearsonr
scores_spearman = []
scores_pearson = []
for i, ct in enumerate(sl_adata.obsm["deconvolution_unconstr"].columns):
data = sl_adata.obsm["deconvolution_unconstr"][ct]
scores_spearman.append(spearmanr(data, stereo_out[ct]).correlation)
scores_pearson.append(pearsonr(data, stereo_out[ct])[0])
scores_spearman_2 = []
scores_pearson_2 = []
for i, ct in enumerate(sl_adata.obsm["deconvolution_unconstr"].columns):
data = stereo_out_2[ct]
scores_spearman_2.append(spearmanr(data, stereo_out[ct]).correlation)
scores_pearson_2.append(pearsonr(data, stereo_out[ct])[0])
from scipy.stats import spearmanr, pearsonr
scores_spearman = []
scores_pearson = []
for i, ct in enumerate(sl_adata.obsm["deconvolution_unconstr"].columns):
data = sl_adata.obsm["deconvolution_unconstr"][ct]
scores_spearman.append(spearmanr(data, stereo_out[ct]).correlation)
scores_pearson.append(pearsonr(data, stereo_out[ct])[0])
scores_spearman_2 = []
scores_pearson_2 = []
for i, ct in enumerate(sl_adata.obsm["deconvolution_unconstr"].columns):
data = stereo_out_2[ct]
scores_spearman_2.append(spearmanr(data, stereo_out[ct]).correlation)
scores_pearson_2.append(pearsonr(data, stereo_out[ct])[0])
In [38]:
Copied!
fig, axs = plt.subplots(2, 3, figsize=(7, 4))
indices = [11, 10, 4]
for i, ct in enumerate(["Interneuron", "Astrocyte", "Denate"]):
prettify_axis(axs[0, i])
axs[0, i].scatter(
sl_adata.obsm["deconvolution_unconstr"][ct], stereo_out[ct], marker="x", s=10
)
ax = axs[0, i]
lims = [
np.min([ax.get_xlim(), ax.get_ylim()]), # min of both axes
np.max([ax.get_xlim(), ax.get_ylim()]), # max of both axes
]
# now plot both limits against eachother
ax.plot(lims, lims, "k-", alpha=0.75, zorder=0)
axs[0, i].set_title(ct)
axs[0, i].text(
0.05,
0.7,
"r=" + "{:0.2f}".format(scores_spearman[indices[i]]),
fontsize=12,
transform=axs[0, i].transAxes,
)
axs[0, i].set_aspect("equal", adjustable="box")
ax.set_xlim(lims)
ax.set_ylim(lims)
# axs[i].plot(np.linspace(0, 1, 50), np.linspace(0, 1, 50), c="black")
indices = [7, 9, 12]
for i, ct in enumerate(["Endothelial_Tip", "Ependymal", "Mural"]):
prettify_axis(axs[1, i])
axs[1, i].scatter(
sl_adata.obsm["deconvolution_unconstr"][ct], stereo_out[ct], marker="x", s=10
)
ax = axs[1, i]
lims = [
np.min([ax.get_xlim(), ax.get_ylim()]), # min of both axes
np.max([ax.get_xlim(), ax.get_ylim()]), # max of both axes
]
# now plot both limits against eachother
ax.plot(lims, lims, "k-", alpha=0.75, zorder=0)
axs[1, i].set_title(ct)
axs[1, i].text(
0.05,
0.7,
"r=" + "{:0.2f}".format(scores_spearman[indices[i]]),
fontsize=12,
transform=axs[1, i].transAxes,
)
axs[1, i].set_aspect("equal", adjustable="box")
ax.set_xlim(lims)
ax.set_ylim(lims)
# axs[i].plot(np.linspace(0, 1, 50), np.linspace(0, 1, 50), c="black")
plt.tight_layout()
plt.savefig("figs/stereoscope/stereoscope_scatter.pdf", bbox_inches="tight", dpi=DPI)
plt.show()
fig, axs = plt.subplots(2, 3, figsize=(7, 4))
indices = [11, 10, 4]
for i, ct in enumerate(["Interneuron", "Astrocyte", "Denate"]):
prettify_axis(axs[0, i])
axs[0, i].scatter(
sl_adata.obsm["deconvolution_unconstr"][ct], stereo_out[ct], marker="x", s=10
)
ax = axs[0, i]
lims = [
np.min([ax.get_xlim(), ax.get_ylim()]), # min of both axes
np.max([ax.get_xlim(), ax.get_ylim()]), # max of both axes
]
# now plot both limits against eachother
ax.plot(lims, lims, "k-", alpha=0.75, zorder=0)
axs[0, i].set_title(ct)
axs[0, i].text(
0.05,
0.7,
"r=" + "{:0.2f}".format(scores_spearman[indices[i]]),
fontsize=12,
transform=axs[0, i].transAxes,
)
axs[0, i].set_aspect("equal", adjustable="box")
ax.set_xlim(lims)
ax.set_ylim(lims)
# axs[i].plot(np.linspace(0, 1, 50), np.linspace(0, 1, 50), c="black")
indices = [7, 9, 12]
for i, ct in enumerate(["Endothelial_Tip", "Ependymal", "Mural"]):
prettify_axis(axs[1, i])
axs[1, i].scatter(
sl_adata.obsm["deconvolution_unconstr"][ct], stereo_out[ct], marker="x", s=10
)
ax = axs[1, i]
lims = [
np.min([ax.get_xlim(), ax.get_ylim()]), # min of both axes
np.max([ax.get_xlim(), ax.get_ylim()]), # max of both axes
]
# now plot both limits against eachother
ax.plot(lims, lims, "k-", alpha=0.75, zorder=0)
axs[1, i].set_title(ct)
axs[1, i].text(
0.05,
0.7,
"r=" + "{:0.2f}".format(scores_spearman[indices[i]]),
fontsize=12,
transform=axs[1, i].transAxes,
)
axs[1, i].set_aspect("equal", adjustable="box")
ax.set_xlim(lims)
ax.set_ylim(lims)
# axs[i].plot(np.linspace(0, 1, 50), np.linspace(0, 1, 50), c="black")
plt.tight_layout()
plt.savefig("figs/stereoscope/stereoscope_scatter.pdf", bbox_inches="tight", dpi=DPI)
plt.show()
In [39]:
Copied!
# the order of the three less reproducible cell types is the same (7, 9, 12) <-> (Endothelial_Tip, Ependymal, Mural)
# the first two of those list are among the least present in the scRNA-seq
plt.figure(figsize=(3, 3))
plt.boxplot([scores_pearson, scores_pearson_2, scores_spearman, scores_spearman_2])
plt.xticks([1, 2, 3, 4], ['Pearson \n ours <-> theirs', 'Pearson \n theirs <-> theirs', 'Spearman \n ours <-> theirs', 'Spearman \n theirs <-> theirs',], rotation=90)
# plt.ylim((0, 1.1))
plt.title("Reproducibility")
plt.savefig("figs/stereoscope/stereoscope_box.pdf", bbox_inches="tight", dpi=DPI)
plt.show()
# the order of the three less reproducible cell types is the same (7, 9, 12) <-> (Endothelial_Tip, Ependymal, Mural)
# the first two of those list are among the least present in the scRNA-seq
plt.figure(figsize=(3, 3))
plt.boxplot([scores_pearson, scores_pearson_2, scores_spearman, scores_spearman_2])
plt.xticks([1, 2, 3, 4], ['Pearson \n ours <-> theirs', 'Pearson \n theirs <-> theirs', 'Spearman \n ours <-> theirs', 'Spearman \n theirs <-> theirs',], rotation=90)
# plt.ylim((0, 1.1))
plt.title("Reproducibility")
plt.savefig("figs/stereoscope/stereoscope_box.pdf", bbox_inches="tight", dpi=DPI)
plt.show()
In [39]:
Copied!