RNA velocity in dentate gyrus#

RNA velocity analysis with the EM and steady-state models using data preprocessed with alevin_prepref_isoseparate_cdna_introns_decoy_gentrome.

Requires

  • adata_generation.ipynb

  • starsolo_var_names.csv from starsolo_vi.ipynb

Output

  • DATA_DIR/dentategyrus/velocities/alevin_sep_decoy_gtr_steady_state.pickle

  • DATA_DIR/dentategyrus/velocities/alevin_sep_decoy_gtr_em.pickle

Library imports#

import sys

import pandas as pd

import scanpy as sc
import scvelo as scv

sys.path.insert(0, "../../../")
from paths import DATA_DIR
sc.logging.print_version_and_date()
Running Scanpy 1.9.1, on 2022-07-19 09:42.

General settings#

# set verbosity levels
sc.settings.verbosity = 2
scv.settings.verbosity = 3
scv.settings.set_figure_params('scvelo', dpi_save=400, dpi=80, transparent=True, fontsize=20, color_map='viridis')
scv.settings.plot_prefix = ""

Constants#

N_JOBS = 8
STARSOLO_VAR_NAMES = pd.read_csv('starsolo_var_names.csv', index_col=0, header=None).index.tolist()

Data loading#

adata = sc.read(
    DATA_DIR / 'dentategyrus' / "alevin_prepref_isoseparate_cdna_introns_decoy_gentrome.h5ad"
)
adata
AnnData object with n_obs × n_vars = 2914 × 54144
    obs: 'cell_index', 'clusters', 'age.days.', 'clusters_enlarged'
    layers: 'spliced', 'unspliced'
scv.pl.proportions(adata)
../../_images/ba06e950b7af9ef441813829729c40e7923b04dac90a97b3fd7e745ba4fdb0ad.png

Data pre-processing#

scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000, retain_genes=STARSOLO_VAR_NAMES)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
Filtered out 45511 genes that are detected 20 counts (shared).
WARNING: Did not normalize X as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not normalize spliced as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not normalize unspliced as it looks processed already. To enforce normalization, set `enforce=True`.
Extracted 2279 highly variable genes.
Logarithmized X.
computing PCA
    on highly variable genes
    with n_comps=30
    finished (0:00:02)
computing neighbors
    finished (0:00:35) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:03) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)

Model fitting#

Steady state#

scv.tl.velocity(adata, mode="steady_state")
adata = adata[:, STARSOLO_VAR_NAMES].copy()
computing velocities
    finished (0:00:01) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
pd.DataFrame(
    adata.layers['velocity'],
    index=adata.obs_names,
    columns=adata.var_names
).to_pickle(
    DATA_DIR / 'dentategyrus' / 'velocities' / 'alevin_sep_decoy_gtr_steady_state.pickle'
)

EM#

scv.tl.recover_dynamics(adata, var_names=STARSOLO_VAR_NAMES, n_jobs=N_JOBS)
scv.tl.velocity(adata, mode='dynamical')
recovering dynamics (using 8/112 cores)
    finished (0:04:00) --> added 
    'fit_pars', fitted parameters for splicing dynamics (adata.var)
computing velocities
    finished (0:00:01) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
pd.DataFrame(
    adata[:, ~adata.var['fit_alpha'].isnull()].layers['velocity'],
    index=adata.obs_names,
    columns=adata.var_names[~adata.var['fit_alpha'].isnull()]
).to_pickle(
    DATA_DIR / 'dentategyrus' / 'velocities' / 'alevin_sep_decoy_gtr_em.pickle'
)