In this notebook, we propose a demonstration of our reimplementation of Stereoscope in the new scvi-tools codebase. We apply our implementation as well as the original code on 10x Visium data from the hippocampus of a mouse brain.

Stereoscope¶

Install scvi-tools¶

In [1]:

import sys

#if True, will install via pypi, else will install from source
stable = False
IN_COLAB = "google.colab" in sys.modules

if IN_COLAB and stable:
    !pip install --quiet scvi-tools[tutorials]==0.14.1
    # !pip install --upgrade pytorch-lightning==1.4.9
elif IN_COLAB and not stable:
    !pip install --quiet --upgrade jsonschema
    !pip install --quiet git+https://github.com/yoseflab/scvi-tools@stereoscope_sum#egg=scvi-tools[tutorials]
    
if IN_COLAB:
    !mkdir -p figs/stereoscope
import sys

#if True, will install via pypi, else will install from source
stable = False
IN_COLAB = "google.colab" in sys.modules

if IN_COLAB and stable:
    !pip install --quiet scvi-tools[tutorials]==0.14.1
    # !pip install --upgrade pytorch-lightning==1.4.9
elif IN_COLAB and not stable:
    !pip install --quiet --upgrade jsonschema
    !pip install --quiet git+https://github.com/yoseflab/scvi-tools@stereoscope_sum#egg=scvi-tools[tutorials]
    
if IN_COLAB:
    !mkdir -p figs/stereoscope

     |████████████████████████████████| 69 kB 4.2 MB/s 
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
nbclient 0.5.4 requires jupyter-client>=6.1.5, but you have jupyter-client 5.3.5 which is incompatible.
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
     |████████████████████████████████| 212 kB 6.7 MB/s 
     |████████████████████████████████| 127 kB 55.6 MB/s 
     |████████████████████████████████| 813 kB 65.6 MB/s 
     |████████████████████████████████| 242 kB 99.8 MB/s 
     |████████████████████████████████| 678 kB 62.4 MB/s 
     |████████████████████████████████| 1.4 MB 27.1 MB/s 
     |████████████████████████████████| 2.0 MB 76.6 MB/s 
     |████████████████████████████████| 8.8 MB 20.2 MB/s 
     |████████████████████████████████| 41 kB 146 kB/s 
     |████████████████████████████████| 3.2 MB 59.0 MB/s 
     |████████████████████████████████| 48 kB 7.1 MB/s 
     |████████████████████████████████| 125 kB 71.5 MB/s 
     |████████████████████████████████| 636 kB 75.3 MB/s 
     |████████████████████████████████| 829 kB 69.5 MB/s 
     |████████████████████████████████| 282 kB 83.8 MB/s 
     |████████████████████████████████| 1.3 MB 60.1 MB/s 
     |████████████████████████████████| 51 kB 9.2 MB/s 
     |████████████████████████████████| 80 kB 12.2 MB/s 
     |████████████████████████████████| 1.1 MB 77.5 MB/s 
     |████████████████████████████████| 160 kB 76.0 MB/s 
     |████████████████████████████████| 271 kB 80.5 MB/s 
     |████████████████████████████████| 63 kB 2.4 MB/s 
  Building wheel for scvi-tools (PEP 517) ... done
  Building wheel for docrep (setup.py) ... done
  Building wheel for loompy (setup.py) ... done
  Building wheel for future (setup.py) ... done
  Building wheel for umap-learn (setup.py) ... done
  Building wheel for pynndescent (setup.py) ... done
  Building wheel for numpy-groupies (setup.py) ... done
  Building wheel for sinfo (setup.py) ... done

Get the single-cell data¶

In [2]:

!wget https://github.com/YosefLab/scVI-data/blob/master/scRefSubsampled1000_hippocampus_compressed.h5ad?raw=true -O scRef.h5ad
!wget https://github.com/YosefLab/scVI-data/blob/master/scRefSubsampled1000_hippocampus_compressed.h5ad?raw=true -O scRef.h5ad

--2021-10-14 17:21:41--  https://github.com/YosefLab/scVI-data/blob/master/scRefSubsampled1000_hippocampus_compressed.h5ad?raw=true
Resolving github.com (github.com)... 192.30.255.112
Connecting to github.com (github.com)|192.30.255.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/YosefLab/scVI-data/raw/master/scRefSubsampled1000_hippocampus_compressed.h5ad [following]
--2021-10-14 17:21:41--  https://github.com/YosefLab/scVI-data/raw/master/scRefSubsampled1000_hippocampus_compressed.h5ad
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/YosefLab/scVI-data/master/scRefSubsampled1000_hippocampus_compressed.h5ad [following]
--2021-10-14 17:21:41--  https://raw.githubusercontent.com/YosefLab/scVI-data/master/scRefSubsampled1000_hippocampus_compressed.h5ad
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 40290764 (38M) [application/octet-stream]
Saving to: ‘scRef.h5ad’

scRef.h5ad          100%[===================>]  38.42M   254MB/s    in 0.2s    

2021-10-14 17:21:43 (254 MB/s) - ‘scRef.h5ad’ saved [40290764/40290764]

Import libraries¶

In [3]:

import scanpy as sc
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline
import scanpy as sc
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline

In [4]:

sns.reset_orig()
sc.settings._vector_friendly = True
# p9.theme_set(p9.theme_classic)
plt.rcParams["svg.fonttype"] = "none"
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["savefig.transparent"] = True
plt.rcParams["figure.figsize"] = (4, 4)

plt.rcParams["axes.titlesize"] = 15
plt.rcParams["axes.titleweight"] = 500
plt.rcParams["axes.titlepad"] = 8.0
plt.rcParams["axes.labelsize"] = 14
plt.rcParams["axes.labelweight"] = 500
plt.rcParams["axes.linewidth"] = 1.2
plt.rcParams["axes.labelpad"] = 6.0
plt.rcParams["axes.spines.top"] = False
plt.rcParams["axes.spines.right"] = False


plt.rcParams["font.size"] = 11
# plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.sans-serif'] = ['Helvetica', "Computer Modern Sans Serif", "DejaVU Sans"]
plt.rcParams['font.weight'] = 500

plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['xtick.minor.size'] = 1.375
plt.rcParams['xtick.major.size'] = 2.75
plt.rcParams['xtick.major.pad'] = 2
plt.rcParams['xtick.minor.pad'] = 2

plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['ytick.minor.size'] = 1.375
plt.rcParams['ytick.major.size'] = 2.75
plt.rcParams['ytick.major.pad'] = 2
plt.rcParams['ytick.minor.pad'] = 2

plt.rcParams["legend.fontsize"] = 12
plt.rcParams['legend.handlelength'] = 1.4
plt.rcParams['legend.numpoints'] = 1
plt.rcParams['legend.scatterpoints'] = 3
plt.rcParams['legend.frameon'] = False

plt.rcParams['lines.linewidth'] = 1.7

DPI = 300
sns.reset_orig()
sc.settings._vector_friendly = True
# p9.theme_set(p9.theme_classic)
plt.rcParams["svg.fonttype"] = "none"
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["savefig.transparent"] = True
plt.rcParams["figure.figsize"] = (4, 4)

plt.rcParams["axes.titlesize"] = 15
plt.rcParams["axes.titleweight"] = 500
plt.rcParams["axes.titlepad"] = 8.0
plt.rcParams["axes.labelsize"] = 14
plt.rcParams["axes.labelweight"] = 500
plt.rcParams["axes.linewidth"] = 1.2
plt.rcParams["axes.labelpad"] = 6.0
plt.rcParams["axes.spines.top"] = False
plt.rcParams["axes.spines.right"] = False


plt.rcParams["font.size"] = 11
# plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.sans-serif'] = ['Helvetica', "Computer Modern Sans Serif", "DejaVU Sans"]
plt.rcParams['font.weight'] = 500

plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['xtick.minor.size'] = 1.375
plt.rcParams['xtick.major.size'] = 2.75
plt.rcParams['xtick.major.pad'] = 2
plt.rcParams['xtick.minor.pad'] = 2

plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['ytick.minor.size'] = 1.375
plt.rcParams['ytick.major.size'] = 2.75
plt.rcParams['ytick.major.pad'] = 2
plt.rcParams['ytick.minor.pad'] = 2

plt.rcParams["legend.fontsize"] = 12
plt.rcParams['legend.handlelength'] = 1.4
plt.rcParams['legend.numpoints'] = 1
plt.rcParams['legend.scatterpoints'] = 3
plt.rcParams['legend.frameon'] = False

plt.rcParams['lines.linewidth'] = 1.7

DPI = 300

Read Spatial transcriptomics data¶

In [5]:

st_adata = sc.datasets.visium_sge(sample_id="V1_Adult_Mouse_Brain")
st_adata.var_names_make_unique()
st_adata = sc.datasets.visium_sge(sample_id="V1_Adult_Mouse_Brain")
st_adata.var_names_make_unique()

Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.

In [6]:

sc.pl.spatial(st_adata, img_key="hires", color=["Ptk2b"])
sc.pl.spatial(st_adata, img_key="hires", color=["Ptk2b"])

... storing 'feature_types' as categorical
... storing 'genome' as categorical

In [7]:

# Take a slice and focus on the hippocampus
sl_adata = st_adata[st_adata.obs["array_row"] > 15]
sl_adata = sl_adata[sl_adata.obs["array_row"] < 32]
sl_adata = sl_adata[sl_adata.obs["array_col"] > 20]
sl_adata = sl_adata[sl_adata.obs["array_col"] < 80]

st_adata.obs["filter"] = pd.Categorical([x in sl_adata.obs.index for x in st_adata.obs.index])
sc.pl.embedding(st_adata, basis="spatial", color="filter")
# Take a slice and focus on the hippocampus
sl_adata = st_adata[st_adata.obs["array_row"] > 15]
sl_adata = sl_adata[sl_adata.obs["array_row"] < 32]
sl_adata = sl_adata[sl_adata.obs["array_col"] > 20]
sl_adata = sl_adata[sl_adata.obs["array_col"] < 80]

st_adata.obs["filter"] = pd.Categorical([x in sl_adata.obs.index for x in st_adata.obs.index])
sc.pl.embedding(st_adata, basis="spatial", color="filter")

Marker genes

CA1: Fibcd1
CA3: Npy2r
Dentate: Prox1
Oligo: Plp1
Astrocytes: Aldoc

In [8]:

sc.pl.embedding(sl_adata, basis="spatial", color=["Fibcd1", "Plp1", "Npy2r", "Prox1"], size=200)
sc.pl.embedding(sl_adata, basis="spatial", color=["Fibcd1", "Plp1", "Npy2r", "Prox1"], size=200)

Read single-cell RNA Sequencing data¶

In [9]:

sc_adata = sc.read_h5ad("scRef.h5ad")
sc_adata.obs["liger_ident_coarse"].value_counts()
sc_adata = sc.read_h5ad("scRef.h5ad")
sc_adata.obs["liger_ident_coarse"].value_counts()

Out[9]:

Polydendrocyte           1000
Oligodendrocyte          1000
CA1                      1000
CA3                      1000
Denate                   1000
Endothelial_Stalk        1000
Entorihinal              1000
Interneuron              1000
Microglia_Macrophages    1000
Mural                    1000
Neuron.Slc17a6           1000
Astrocyte                1000
Neurogenesis              988
Ependymal                 715
Endothelial_Tip           676
Cajal_Retzius             672
Choroid                    44
Name: liger_ident_coarse, dtype: int64

In [10]:

# preprocessing
sc.pp.filter_genes(sc_adata, min_counts=10)
sc_adata.layers["counts"] = sc_adata.X.copy()
# normalization for selection of highly variable genes (not for the deconvolution)
sc.pp.normalize_total(sc_adata, target_sum=10e4)
sc.pp.log1p(sc_adata)
sc_adata.raw = sc_adata
sc.pp.highly_variable_genes(
    sc_adata,
    n_top_genes=2000,
    subset=True,
    layer="counts",
    flavor="seurat_v3"
)
# filter genes to be the same on the spatial data
intersect = np.intersect1d(sc_adata.var_names, st_adata.var_names)
sl_adata = sl_adata[:, intersect].copy()
sc_adata = sc_adata[:, intersect].copy()
# preprocessing
sc.pp.filter_genes(sc_adata, min_counts=10)
sc_adata.layers["counts"] = sc_adata.X.copy()
# normalization for selection of highly variable genes (not for the deconvolution)
sc.pp.normalize_total(sc_adata, target_sum=10e4)
sc.pp.log1p(sc_adata)
sc_adata.raw = sc_adata
sc.pp.highly_variable_genes(
    sc_adata,
    n_top_genes=2000,
    subset=True,
    layer="counts",
    flavor="seurat_v3"
)
# filter genes to be the same on the spatial data
intersect = np.intersect1d(sc_adata.var_names, st_adata.var_names)
sl_adata = sl_adata[:, intersect].copy()
sc_adata = sc_adata[:, intersect].copy()

In [11]:

# visualize for fun
sc.tl.pca(sc_adata, svd_solver='arpack')
sc.pp.neighbors(sc_adata, n_pcs = 30, n_neighbors = 20)
sc.tl.umap(sc_adata)
# visualize for fun
sc.tl.pca(sc_adata, svd_solver='arpack')
sc.pp.neighbors(sc_adata, n_pcs = 30, n_neighbors = 20)
sc.tl.umap(sc_adata)

/usr/local/lib/python3.7/dist-packages/numba/np/ufunc/parallel.py:363: NumbaWarning: The TBB threading layer requires TBB version 2019.5 or later i.e., TBB_INTERFACE_VERSION >= 11005. Found TBB_INTERFACE_VERSION = 9107. The TBB threading layer is disabled.
  warnings.warn(problem)

In [12]:

sc.pl.umap(sc_adata, color=['liger_ident_coarse'])
sc.pl.umap(sc_adata, color=['liger_ident_coarse'])

In [13]:

# restore counts for CLI Stereoscope 
sc_adata.X = sc_adata.layers["counts"].copy()
# restore counts for CLI Stereoscope 
sc_adata.X = sc_adata.layers["counts"].copy()

Learn RNA model (on scRNA-seq data)¶

In [14]:

import scvi
from scvi.external.stereoscope import RNAStereoscope, SpatialStereoscope
import scvi
from scvi.external.stereoscope import RNAStereoscope, SpatialStereoscope

Global seed set to 0

In [15]:

scvi.data.setup_anndata(sc_adata, layer="counts", labels_key="liger_ident_coarse")
scvi.data.setup_anndata(sc_adata, layer="counts", labels_key="liger_ident_coarse")

INFO     No batch_key inputted, assuming all cells are same batch                            
INFO     Using labels from adata.obs["liger_ident_coarse"]                                   
INFO     Using data from adata.layers["counts"]                                              
INFO     Successfully registered anndata object containing 15095 cells, 1898 vars, 1 batches,
         17 labels, and 0 proteins. Also registered 0 extra categorical covariates and 0     
         extra continuous covariates.                                                        
INFO     Please do not further modify adata until model is trained.

/usr/local/lib/python3.7/dist-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function setup_anndata is deprecated; Please use the model-specific setup_anndata methods instead. The global method will be removed in version 0.15.0.
  warnings.warn(msg, category=FutureWarning)

In [16]:

# time per epoch on colab for this is 2.15 seconds per epoch
# original implementation recommends 75,000 epochs. That would take 44 hours with our code.
# early stopping stopped at 124 epochs!
train = True
if train:
    sc_model = RNAStereoscope(sc_adata)
    # here we use "weight_decay=0" to reproduce Stereoscope, but we advise to use the default parameter for all other applications
    sc_model.train(max_epochs=100, plan_kwargs={"weight_decay": 0}, check_val_every_n_epoch=1, train_size=1.0, validation_size=0.0,
                   early_stopping=False, early_stopping_monitor="elbo_validation", early_stopping_patience=5, early_stopping_min_delta=1)
    plt.plot(sc_model.history["elbo_train"][10:], label="train")
    plt.title("loss over training epochs")
    plt.legend()
    plt.show()
    sc_model.save("scmodel", overwrite=True)
else:
    sc_model = RNAStereoscope.load("scmodel", sc_adata)
    print("Loaded RNA model from file!")
# time per epoch on colab for this is 2.15 seconds per epoch
# original implementation recommends 75,000 epochs. That would take 44 hours with our code.
# early stopping stopped at 124 epochs!
train = True
if train:
    sc_model = RNAStereoscope(sc_adata)
    # here we use "weight_decay=0" to reproduce Stereoscope, but we advise to use the default parameter for all other applications
    sc_model.train(max_epochs=100, plan_kwargs={"weight_decay": 0}, check_val_every_n_epoch=1, train_size=1.0, validation_size=0.0,
                   early_stopping=False, early_stopping_monitor="elbo_validation", early_stopping_patience=5, early_stopping_min_delta=1)
    plt.plot(sc_model.history["elbo_train"][10:], label="train")
    plt.title("loss over training epochs")
    plt.legend()
    plt.show()
    sc_model.save("scmodel", overwrite=True)
else:
    sc_model = RNAStereoscope.load("scmodel", sc_adata)
    print("Loaded RNA model from file!")

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

Epoch 100/100: 100%|██████████| 100/100 [00:55<00:00,  1.82it/s, loss=5.67e+04, v_num=1]

small detour to get the gene by cell type matrix for the figure

In [17]:

params = sc_model.module.get_params()

import scipy.cluster.hierarchy as sch

Y = sch.linkage(params[0], method="centroid")
Z1 = sch.dendrogram(Y, orientation="left")
X = sch.linkage(params[0].T, method="single")
Z2 = sch.dendrogram(X)

# .matshow(D, aspect='auto', origin='lower', cmap=pylab.cm.YlGnBu)
plt.figure(figsize=(3, 3))
plt.imshow(params[0][Z1["leaves"]][:, Z2["leaves"]][:200], aspect=0.1, cmap="YlGnBu")
plt.gca().axes.xaxis.set_ticks([])
plt.gca().axes.yaxis.set_ticks([])
plt.xlabel("Cell types")
plt.ylabel("Genes")
plt.savefig("figs/stereoscope/dict_gene_exp.pdf")
plt.show()
params = sc_model.module.get_params()

import scipy.cluster.hierarchy as sch

Y = sch.linkage(params[0], method="centroid")
Z1 = sch.dendrogram(Y, orientation="left")
X = sch.linkage(params[0].T, method="single")
Z2 = sch.dendrogram(X)

# .matshow(D, aspect='auto', origin='lower', cmap=pylab.cm.YlGnBu)
plt.figure(figsize=(3, 3))
plt.imshow(params[0][Z1["leaves"]][:, Z2["leaves"]][:200], aspect=0.1, cmap="YlGnBu")
plt.gca().axes.xaxis.set_ticks([])
plt.gca().axes.yaxis.set_ticks([])
plt.xlabel("Cell types")
plt.ylabel("Genes")
plt.savefig("figs/stereoscope/dict_gene_exp.pdf")
plt.show()

We can check that marker genes have high value of the parameters in the single-cell model. For example, a CA1 marker. Check that the parameter of the marker gene is larger for the CA1 cell type!

In [18]:

list(zip(sc_adata.uns["_scvi"]["categorical_mappings"]["_scvi_labels"]["mapping"], params[0][np.where(sc_adata.var_names == "Fibcd1")[0][0],]))
list(zip(sc_adata.uns["_scvi"]["categorical_mappings"]["_scvi_labels"]["mapping"], params[0][np.where(sc_adata.var_names == "Fibcd1")[0][0],]))

Out[18]:

[('Astrocyte', -5.235477),
 ('CA1', -0.7815605),
 ('CA3', -4.9932957),
 ('Cajal_Retzius', -3.7664227),
 ('Choroid', -4.8968277),
 ('Denate', -4.054734),
 ('Endothelial_Stalk', -5.283406),
 ('Endothelial_Tip', -5.260552),
 ('Entorihinal', -4.5145183),
 ('Ependymal', -5.1968904),
 ('Interneuron', -4.912203),
 ('Microglia_Macrophages', -4.367293),
 ('Mural', -5.234314),
 ('Neurogenesis', -5.26265),
 ('Neuron.Slc17a6', -4.734822),
 ('Oligodendrocyte', -5.039871),
 ('Polydendrocyte', -4.4008713)]

In [19]:

list(zip(sc_adata.uns["_scvi"]["categorical_mappings"]["_scvi_labels"]["mapping"], params[0][np.where(sc_adata.var_names == "Fibcd1")[0][0],]))
list(zip(sc_adata.uns["_scvi"]["categorical_mappings"]["_scvi_labels"]["mapping"], params[0][np.where(sc_adata.var_names == "Fibcd1")[0][0],]))

Out[19]:

[('Astrocyte', -5.235477),
 ('CA1', -0.7815605),
 ('CA3', -4.9932957),
 ('Cajal_Retzius', -3.7664227),
 ('Choroid', -4.8968277),
 ('Denate', -4.054734),
 ('Endothelial_Stalk', -5.283406),
 ('Endothelial_Tip', -5.260552),
 ('Entorihinal', -4.5145183),
 ('Ependymal', -5.1968904),
 ('Interneuron', -4.912203),
 ('Microglia_Macrophages', -4.367293),
 ('Mural', -5.234314),
 ('Neurogenesis', -5.26265),
 ('Neuron.Slc17a6', -4.734822),
 ('Oligodendrocyte', -5.039871),
 ('Polydendrocyte', -4.4008713)]

Infer proportion for spatial data¶

In [20]:

sl_adata.layers["counts"] = sl_adata.X.copy()
scvi.data.setup_anndata(sl_adata, layer="counts")
sl_adata.layers["counts"] = sl_adata.X.copy()
scvi.data.setup_anndata(sl_adata, layer="counts")

INFO     No batch_key inputted, assuming all cells are same batch                            
INFO     No label_key inputted, assuming all cells have same label                           
INFO     Using data from adata.layers["counts"]                                              
INFO     Successfully registered anndata object containing 472 cells, 1898 vars, 1 batches, 1
         labels, and 0 proteins. Also registered 0 extra categorical covariates and 0 extra  
         continuous covariates.                                                              
INFO     Please do not further modify adata until model is trained.

/usr/local/lib/python3.7/dist-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function setup_anndata is deprecated; Please use the model-specific setup_anndata methods instead. The global method will be removed in version 0.15.0.
  warnings.warn(msg, category=FutureWarning)

In [21]:

# time per epoch on colab for this is 0.09 seconds per epoch
# original implementation uses 75,000 epochs, that would take 1.8 hours with this code
# early stopping stopped at 
train = True
if train:
    # here we use "minibatch" to reproduce Stereoscope, but we advise to use the default parameter for all other applications
    # here we use "weight_decay=0" to reproduce Stereoscope, but we advise to use the default parameter for all other applications
    spatial_model = SpatialStereoscope.from_rna_model(
        sl_adata, sc_model, prior_weight="minibatch"
    )
    spatial_model.train(max_epochs=5000, plan_kwargs={"weight_decay": 0}, check_val_every_n_epoch=1, early_stopping=False, 
                        early_stopping_monitor="elbo_train", early_stopping_patience=5, early_stopping_min_delta=0.4)
    plt.plot(spatial_model.history["elbo_train"], label="train")
    plt.title("loss over training epochs")
    plt.legend()
    plt.show()
    spatial_model.save("stmodel", overwrite=True)
else:
    spatial_model = SpatialStereoscope.load("stmodel", sl_adata)
    print("Loaded Spatial model from file!")
# time per epoch on colab for this is 0.09 seconds per epoch
# original implementation uses 75,000 epochs, that would take 1.8 hours with this code
# early stopping stopped at 
train = True
if train:
    # here we use "minibatch" to reproduce Stereoscope, but we advise to use the default parameter for all other applications
    # here we use "weight_decay=0" to reproduce Stereoscope, but we advise to use the default parameter for all other applications
    spatial_model = SpatialStereoscope.from_rna_model(
        sl_adata, sc_model, prior_weight="minibatch"
    )
    spatial_model.train(max_epochs=5000, plan_kwargs={"weight_decay": 0}, check_val_every_n_epoch=1, early_stopping=False, 
                        early_stopping_monitor="elbo_train", early_stopping_patience=5, early_stopping_min_delta=0.4)
    plt.plot(spatial_model.history["elbo_train"], label="train")
    plt.title("loss over training epochs")
    plt.legend()
    plt.show()
    spatial_model.save("stmodel", overwrite=True)
else:
    spatial_model = SpatialStereoscope.load("stmodel", sl_adata)
    print("Loaded Spatial model from file!")

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

Epoch 5000/5000: 100%|██████████| 5000/5000 [02:36<00:00, 32.01it/s, loss=2.58e+05, v_num=1]

In [22]:

spatial_model.history["elbo_train"][-20:]
spatial_model.history["elbo_train"][-20:]

Out[22]:

	elbo_train
epoch
4980	2111.38
4981	2111.33
4982	2111.27
4983	2111.21
4984	2111.15
4985	2111.1
4986	2111.05
4987	2110.99
4988	2110.92
4989	2110.87
4990	2110.82
4991	2110.77
4992	2110.7
4993	2110.64
4994	2110.58
4995	2110.52
4996	2110.46
4997	2110.41
4998	2110.35
4999	2110.29

In [23]:

sl_adata.obsm["deconvolution"] = spatial_model.get_proportions()

# unconstrained proportions
import torch
import pandas as pd
with torch.no_grad():
    keep_noise = False
    res = torch.nn.functional.softplus(spatial_model.module.V).cpu().numpy().T
    if not keep_noise:
        res = res[:, :-1]
column_names = spatial_model.cell_type_mapping
sl_adata.obsm["deconvolution_unconstr"] = pd.DataFrame(
    data=res,
    columns=column_names,
    index=spatial_model.adata.obs.index,
)
sl_adata.obsm["deconvolution"] = spatial_model.get_proportions()

# unconstrained proportions
import torch
import pandas as pd
with torch.no_grad():
    keep_noise = False
    res = torch.nn.functional.softplus(spatial_model.module.V).cpu().numpy().T
    if not keep_noise:
        res = res[:, :-1]
column_names = spatial_model.cell_type_mapping
sl_adata.obsm["deconvolution_unconstr"] = pd.DataFrame(
    data=res,
    columns=column_names,
    index=spatial_model.adata.obs.index,
)

In [24]:

for ct in sl_adata.obsm["deconvolution"].columns:
    sl_adata.obs[ct] = sl_adata.obsm["deconvolution"][ct]
for ct in sl_adata.obsm["deconvolution"].columns:
    sl_adata.obs[ct] = sl_adata.obsm["deconvolution"][ct]

In [25]:

sc.pl.embedding(sl_adata, basis="spatial", color=["CA1", "CA3", "Denate", "Oligodendrocyte"], size=200)
sc.pl.embedding(sl_adata, basis="spatial", color=["CA1", "CA3", "Denate", "Oligodendrocyte"], size=200)

In [26]:

sc.pl.embedding(sl_adata, basis="spatial", color=["Astrocyte", "Entorihinal", "Endothelial_Stalk", "Endothelial_Tip"], size=200)
sc.pl.embedding(sl_adata, basis="spatial", color=["Astrocyte", "Entorihinal", "Endothelial_Stalk", "Endothelial_Tip"], size=200)

In [27]:

sl_adata.obsm["deconvolution"].columns
sl_adata.obsm["deconvolution"].columns

Out[27]:

Index(['Astrocyte', 'CA1', 'CA3', 'Cajal_Retzius', 'Choroid', 'Denate',
       'Endothelial_Stalk', 'Endothelial_Tip', 'Entorihinal', 'Ependymal',
       'Interneuron', 'Microglia_Macrophages', 'Mural', 'Neurogenesis',
       'Neuron.Slc17a6', 'Oligodendrocyte', 'Polydendrocyte'],
      dtype='object')

small detour to get a nice plot for the paper

In [28]:

plt.scatter(sl_adata.obsm["spatial"][:, 0],  sl_adata.obsm["spatial"][:, 1], c=sl_adata.obsm["deconvolution"].values.argmax(1), cmap="Set2")
# cmap="YlGnBu")
plt.gca().axes.xaxis.set_ticks([])
plt.gca().axes.yaxis.set_ticks([])
# plt.xlabel("Cell types")
# plt.ylabel("Genes")
plt.savefig("figs/stereoscope/cell_type_location.pdf")
plt.scatter(sl_adata.obsm["spatial"][:, 0],  sl_adata.obsm["spatial"][:, 1], c=sl_adata.obsm["deconvolution"].values.argmax(1), cmap="Set2")
# cmap="YlGnBu")
plt.gca().axes.xaxis.set_ticks([])
plt.gca().axes.yaxis.set_ticks([])
# plt.xlabel("Cell types")
# plt.ylabel("Genes")
plt.savefig("figs/stereoscope/cell_type_location.pdf")

Save files for input to original Stereoscope

In [29]:

del sl_adata.obs["filter"]
sl_adata.write_h5ad("sl_adata.h5ad")
sc_adata.write_h5ad("sc_adata.h5ad")
del sl_adata.obs["filter"]
sl_adata.write_h5ad("sl_adata.h5ad")
sc_adata.write_h5ad("sc_adata.h5ad")

Install Stereoscope from source¶

In [30]:

# must use my own fork because of circular dependencies in the original codebase that didn't work out for Colab (unknown reason; only two import lines deleted)
!git clone https://github.com/adamgayoso/stereoscope
# must use my own fork because of circular dependencies in the original codebase that didn't work out for Colab (unknown reason; only two import lines deleted)
!git clone https://github.com/adamgayoso/stereoscope 

Cloning into 'stereoscope'...
remote: Enumerating objects: 590, done.
remote: Counting objects: 100% (282/282), done.
remote: Compressing objects: 100% (216/216), done.
remote: Total 590 (delta 151), reused 171 (delta 63), pack-reused 308
Receiving objects: 100% (590/590), 57.61 MiB | 32.48 MiB/s, done.
Resolving deltas: 100% (304/304), done.

In [31]:

%%capture
!cd stereoscope; ./setup.py install
%%capture
!cd stereoscope; ./setup.py install

In [32]:

!python -c "import stsc; print(stsc.__version__)"
!stereoscope test
!python -c "import stsc; print(stsc.__version__)"
!stereoscope test

stereoscope : 0.2.0
Successfully installed stereoscope CLI

Run Stereoscope & read results¶

In [33]:

# SCMODEL: time per epoch on colab for this is 1.8 seconds per epoch
# original implementation uses 75,000 epochs, that would take 37.5 hours

# STMODEL: time per epoch on colab for this is 0.036 seconds per epoch
# original implementation uses 75,000 epochs, that would take 45 min

import time


start = time.time()
!stereoscope run --sc_cnt sc_adata.h5ad --label_colname liger_ident_coarse --st_cnt sl_adata.h5ad \
-sce 100  -o stereo-out -n 2000 -ste 5000 --gpu -stb 128 -scb 128
end = time.time()

print(end-start)
# SCMODEL: time per epoch on colab for this is 1.8 seconds per epoch
# original implementation uses 75,000 epochs, that would take 37.5 hours

# STMODEL: time per epoch on colab for this is 0.036 seconds per epoch
# original implementation uses 75,000 epochs, that would take 45 min

import time


start = time.time()
!stereoscope run --sc_cnt sc_adata.h5ad --label_colname liger_ident_coarse --st_cnt sl_adata.h5ad \
-sce 100  -o stereo-out -n 2000 -ste 5000 --gpu -stb 128 -scb 128
end = time.time()

print(end-start)

[2021-10-14 17:26:57,344 - stsc - INFO ] >> Using device cuda
[2021-10-14 17:26:57,344 - stsc - INFO ] >> fitting sc data | count file : sc_adata.h5ad | labels file : None
[2021-10-14 17:26:57,808 - stsc - INFO ] >> SC data GENES : 1898  SC data CELLS : 15095  SC data TYPES : 17 


Batch size: 128
Epoch : 100/100 | Loss : 6.791583E+06 | [ =================== ] 

[2021-10-14 17:27:53,717 - stsc - INFO ] >> fit st data section(s) : ['sl_adata.h5ad']
[2021-10-14 17:27:53,810 - stsc - INFO ] >> ST data GENES : 1898  ST data SPOTS : 472 


Batch size: 128
Epoch : 5000/5000 | Loss : 1.029873E+06 | [ =================== ] 

[2021-10-14 17:29:03,896 - stsc - INFO ] >> saving proportions for section sl_adata to stereo-out/sl_adata/W.2021-10-14172657.337016.tsv
[2021-10-14 17:29:03,906 - stsc - INFO ] >> saving proportions (unconstr) for section sl_adata to stereo-out/sl_adata/W_unconstr.2021-10-14172657.337016.tsv
131.0823802947998

In [34]:

# second run for assessing stochasticity reproducibility

start = time.time()
!stereoscope run --sc_cnt sc_adata.h5ad --label_colname liger_ident_coarse --st_cnt sl_adata.h5ad \
-sce 100  -o stereo-out-2 -n 2000 -ste 5000 --gpu -stb 128 -scb 128
end = time.time()

print(end-start)
# second run for assessing stochasticity reproducibility

start = time.time()
!stereoscope run --sc_cnt sc_adata.h5ad --label_colname liger_ident_coarse --st_cnt sl_adata.h5ad \
-sce 100  -o stereo-out-2 -n 2000 -ste 5000 --gpu -stb 128 -scb 128
end = time.time()

print(end-start)

[2021-10-14 17:29:08,424 - stsc - INFO ] >> Using device cuda
[2021-10-14 17:29:08,424 - stsc - INFO ] >> fitting sc data | count file : sc_adata.h5ad | labels file : None
[2021-10-14 17:29:08,907 - stsc - INFO ] >> SC data GENES : 1898  SC data CELLS : 15095  SC data TYPES : 17 


Batch size: 128
Epoch : 100/100 | Loss : 6.788670E+06 | [ =================== ] 

[2021-10-14 17:30:03,800 - stsc - INFO ] >> fit st data section(s) : ['sl_adata.h5ad']
[2021-10-14 17:30:03,888 - stsc - INFO ] >> ST data GENES : 1898  ST data SPOTS : 472 


Batch size: 128
Epoch : 5000/5000 | Loss : 1.019606E+06 | [ =================== ] 

[2021-10-14 17:31:14,477 - stsc - INFO ] >> saving proportions for section sl_adata to stereo-out-2/sl_adata/W.2021-10-14172908.417238.tsv
[2021-10-14 17:31:14,488 - stsc - INFO ] >> saving proportions (unconstr) for section sl_adata to stereo-out-2/sl_adata/W_unconstr.2021-10-14172908.417238.tsv
130.58961939811707

In [35]:

import os
path = "stereo-out/sl_adata/"
pl = os.listdir(path)
filename = pl[0] if "unconstr" in pl[0] else pl[1]
stereo_out = pd.read_csv(path+filename,  sep="\t", index_col=0)
stereo_out.index = sl_adata.obs.index.copy()

path = "stereo-out-2/sl_adata/"
pl = os.listdir(path)
filename = pl[0] if "unconstr" in pl[0] else pl[1]
stereo_out_2 = pd.read_csv(path+filename,  sep="\t", index_col=0)
stereo_out_2.index = sl_adata.obs.index.copy()
import os
path = "stereo-out/sl_adata/"
pl = os.listdir(path)
filename = pl[0] if "unconstr" in pl[0] else pl[1]
stereo_out = pd.read_csv(path+filename,  sep="\t", index_col=0)
stereo_out.index = sl_adata.obs.index.copy()

path = "stereo-out-2/sl_adata/"
pl = os.listdir(path)
filename = pl[0] if "unconstr" in pl[0] else pl[1]
stereo_out_2 = pd.read_csv(path+filename,  sep="\t", index_col=0)
stereo_out_2.index = sl_adata.obs.index.copy()

In [36]:

def prettify_axis(ax):
    # Hide the right and top spines
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)

    # Only show ticks on the left and bottom spines
    ax.yaxis.set_ticks_position('left')
    ax.xaxis.set_ticks_position('bottom')
def prettify_axis(ax):
    # Hide the right and top spines
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)

    # Only show ticks on the left and bottom spines
    ax.yaxis.set_ticks_position('left')
    ax.xaxis.set_ticks_position('bottom')

In [37]:

from scipy.stats import spearmanr, pearsonr
scores_spearman = []
scores_pearson = []

for i, ct in enumerate(sl_adata.obsm["deconvolution_unconstr"].columns):
    data = sl_adata.obsm["deconvolution_unconstr"][ct]
    scores_spearman.append(spearmanr(data, stereo_out[ct]).correlation)
    scores_pearson.append(pearsonr(data, stereo_out[ct])[0])

scores_spearman_2 = []
scores_pearson_2 = []

for i, ct in enumerate(sl_adata.obsm["deconvolution_unconstr"].columns):
    data = stereo_out_2[ct]
    scores_spearman_2.append(spearmanr(data, stereo_out[ct]).correlation)
    scores_pearson_2.append(pearsonr(data, stereo_out[ct])[0])
from scipy.stats import spearmanr, pearsonr
scores_spearman = []
scores_pearson = []

for i, ct in enumerate(sl_adata.obsm["deconvolution_unconstr"].columns):
    data = sl_adata.obsm["deconvolution_unconstr"][ct]
    scores_spearman.append(spearmanr(data, stereo_out[ct]).correlation)
    scores_pearson.append(pearsonr(data, stereo_out[ct])[0])

scores_spearman_2 = []
scores_pearson_2 = []

for i, ct in enumerate(sl_adata.obsm["deconvolution_unconstr"].columns):
    data = stereo_out_2[ct]
    scores_spearman_2.append(spearmanr(data, stereo_out[ct]).correlation)
    scores_pearson_2.append(pearsonr(data, stereo_out[ct])[0])

In [38]:

fig, axs = plt.subplots(2, 3, figsize=(7, 4))
indices = [11, 10, 4]
for i, ct in enumerate(["Interneuron", "Astrocyte", "Denate"]):
    prettify_axis(axs[0, i])
    axs[0, i].scatter(
        sl_adata.obsm["deconvolution_unconstr"][ct], stereo_out[ct], marker="x", s=10
    )
    ax = axs[0, i]
    lims = [
        np.min([ax.get_xlim(), ax.get_ylim()]),  # min of both axes
        np.max([ax.get_xlim(), ax.get_ylim()]),  # max of both axes
    ]
    # now plot both limits against eachother
    ax.plot(lims, lims, "k-", alpha=0.75, zorder=0)
    axs[0, i].set_title(ct)
    axs[0, i].text(
        0.05,
        0.7,
        "r=" + "{:0.2f}".format(scores_spearman[indices[i]]),
        fontsize=12,
        transform=axs[0, i].transAxes,
    )
    axs[0, i].set_aspect("equal", adjustable="box")
    ax.set_xlim(lims)
    ax.set_ylim(lims)
    # axs[i].plot(np.linspace(0, 1, 50), np.linspace(0, 1, 50), c="black")
indices = [7, 9, 12]
for i, ct in enumerate(["Endothelial_Tip", "Ependymal", "Mural"]):
    prettify_axis(axs[1, i])
    axs[1, i].scatter(
        sl_adata.obsm["deconvolution_unconstr"][ct], stereo_out[ct], marker="x", s=10
    )
    ax = axs[1, i]
    lims = [
        np.min([ax.get_xlim(), ax.get_ylim()]),  # min of both axes
        np.max([ax.get_xlim(), ax.get_ylim()]),  # max of both axes
    ]
    # now plot both limits against eachother
    ax.plot(lims, lims, "k-", alpha=0.75, zorder=0)
    axs[1, i].set_title(ct)
    axs[1, i].text(
        0.05,
        0.7,
        "r=" + "{:0.2f}".format(scores_spearman[indices[i]]),
        fontsize=12,
        transform=axs[1, i].transAxes,
    )
    axs[1, i].set_aspect("equal", adjustable="box")
    ax.set_xlim(lims)
    ax.set_ylim(lims)
    # axs[i].plot(np.linspace(0, 1, 50), np.linspace(0, 1, 50), c="black")
plt.tight_layout()
plt.savefig("figs/stereoscope/stereoscope_scatter.pdf", bbox_inches="tight", dpi=DPI)
plt.show()
fig, axs = plt.subplots(2, 3, figsize=(7, 4))
indices = [11, 10, 4]
for i, ct in enumerate(["Interneuron", "Astrocyte", "Denate"]):
    prettify_axis(axs[0, i])
    axs[0, i].scatter(
        sl_adata.obsm["deconvolution_unconstr"][ct], stereo_out[ct], marker="x", s=10
    )
    ax = axs[0, i]
    lims = [
        np.min([ax.get_xlim(), ax.get_ylim()]),  # min of both axes
        np.max([ax.get_xlim(), ax.get_ylim()]),  # max of both axes
    ]
    # now plot both limits against eachother
    ax.plot(lims, lims, "k-", alpha=0.75, zorder=0)
    axs[0, i].set_title(ct)
    axs[0, i].text(
        0.05,
        0.7,
        "r=" + "{:0.2f}".format(scores_spearman[indices[i]]),
        fontsize=12,
        transform=axs[0, i].transAxes,
    )
    axs[0, i].set_aspect("equal", adjustable="box")
    ax.set_xlim(lims)
    ax.set_ylim(lims)
    # axs[i].plot(np.linspace(0, 1, 50), np.linspace(0, 1, 50), c="black")
indices = [7, 9, 12]
for i, ct in enumerate(["Endothelial_Tip", "Ependymal", "Mural"]):
    prettify_axis(axs[1, i])
    axs[1, i].scatter(
        sl_adata.obsm["deconvolution_unconstr"][ct], stereo_out[ct], marker="x", s=10
    )
    ax = axs[1, i]
    lims = [
        np.min([ax.get_xlim(), ax.get_ylim()]),  # min of both axes
        np.max([ax.get_xlim(), ax.get_ylim()]),  # max of both axes
    ]
    # now plot both limits against eachother
    ax.plot(lims, lims, "k-", alpha=0.75, zorder=0)
    axs[1, i].set_title(ct)
    axs[1, i].text(
        0.05,
        0.7,
        "r=" + "{:0.2f}".format(scores_spearman[indices[i]]),
        fontsize=12,
        transform=axs[1, i].transAxes,
    )
    axs[1, i].set_aspect("equal", adjustable="box")
    ax.set_xlim(lims)
    ax.set_ylim(lims)
    # axs[i].plot(np.linspace(0, 1, 50), np.linspace(0, 1, 50), c="black")
plt.tight_layout()
plt.savefig("figs/stereoscope/stereoscope_scatter.pdf", bbox_inches="tight", dpi=DPI)
plt.show()

In [39]:

# the order of the three less reproducible cell types is the same (7, 9, 12) <-> (Endothelial_Tip, Ependymal, Mural) 
# the first two of those list are among the least present in the scRNA-seq
plt.figure(figsize=(3, 3))
plt.boxplot([scores_pearson, scores_pearson_2, scores_spearman, scores_spearman_2])
plt.xticks([1, 2, 3, 4], ['Pearson \n ours <-> theirs', 'Pearson \n theirs <-> theirs', 'Spearman \n ours <-> theirs', 'Spearman \n theirs <-> theirs',], rotation=90)
# plt.ylim((0, 1.1))
plt.title("Reproducibility")
plt.savefig("figs/stereoscope/stereoscope_box.pdf", bbox_inches="tight", dpi=DPI)
plt.show()
# the order of the three less reproducible cell types is the same (7, 9, 12) <-> (Endothelial_Tip, Ependymal, Mural) 
# the first two of those list are among the least present in the scRNA-seq
plt.figure(figsize=(3, 3))
plt.boxplot([scores_pearson, scores_pearson_2, scores_spearman, scores_spearman_2])
plt.xticks([1, 2, 3, 4], ['Pearson \n ours <-> theirs', 'Pearson \n theirs <-> theirs', 'Spearman \n ours <-> theirs', 'Spearman \n theirs <-> theirs',], rotation=90)
# plt.ylim((0, 1.1))
plt.title("Reproducibility")
plt.savefig("figs/stereoscope/stereoscope_box.pdf", bbox_inches="tight", dpi=DPI)
plt.show()

In [39]: