Note
This page was generated from notebooks\Demo-micropools.ipynb.
Compass Micropooled Analysis
To install the required Python packages, uncomment the “install_reqs()” call.
[1]:
def install_reqs():
    !pip install pandas
    !pip install numpy
#install_reqs()
[2]:
import sys
# On Google Colab, fetch the example data from the Compass repository and install the requirements
if "google.colab" in sys.modules:
    !git clone -b docs https://github.com/YosefLab/Compass.git --depth 1
    !cp -r Compass/notebooks/extdata ./
    !rm -r /content/Compass
    install_reqs()
[3]:
import pandas as pd
import numpy as np
This notebook demonstrates how to analyze Compass results computed on micropooled data, in particular how to determine which cell type each cluster (pool) represents.
[4]:
cell_md = pd.read_csv("extdata/Th17/cell_metadata.csv", index_col=0)
reaction_penalties = pd.read_csv("extdata/Th17-micropooled/reactions.tsv", sep="\t", index_col=0)
micropools = pd.read_csv("extdata/Th17-micropooled/micropools.tsv", sep="\t", index_col=0)
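The counting step further down assumes that every cell listed in micropools.tsv also has an entry in the cell metadata, and that the only cell types present are Th17p and Th17n. A minimal sanity check could look like the sketch below; `check_micropool_inputs` is a hypothetical helper, and the toy tables use made-up cell names in place of the real files.

```python
import pandas as pd

def check_micropool_inputs(cell_md: pd.DataFrame, micropools: pd.DataFrame) -> None:
    """Hypothetical helper: verify the assumptions the counting step relies on."""
    # The counting loop looks up cell_md.loc[cell, 'cell_type'] for every
    # pooled cell, so each pooled cell needs a metadata entry.
    missing = micropools.index.difference(cell_md.index)
    if len(missing) > 0:
        raise KeyError(f"{len(missing)} pooled cells lack metadata, e.g. {list(missing[:5])}")
    # The notebook only expects the two labels Th17p and Th17n.
    labels = set(cell_md.loc[micropools.index, "cell_type"])
    unexpected = labels - {"Th17p", "Th17n"}
    if unexpected:
        raise ValueError(f"unexpected cell types: {sorted(unexpected)}")

# Toy stand-ins with the same layout as cell_metadata.csv and micropools.tsv.
toy_cell_md = pd.DataFrame({"cell_type": ["Th17p", "Th17n"]}, index=["cell_a", "cell_b"])
toy_micropools = pd.DataFrame({"microcluster": [0, 1]}, index=["cell_a", "cell_b"])
check_micropool_inputs(toy_cell_md, toy_micropools)  # passes silently
```

On the real data this would be called as `check_micropool_inputs(cell_md, micropools)`; the `ValueError` branch corresponds to the “Should not happen” case in the counting loop.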
[5]:
# Map each microcluster id to the list of cells pooled into it
clusters = {}
for cell in micropools.index:
    mc = micropools.loc[cell, 'microcluster']
    if mc in clusters:
        clusters[mc] += [cell]
    else:
        clusters[mc] = [cell]
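The same cluster-to-cells mapping can be built with pandas grouping instead of an explicit loop. The sketch below uses `DataFrame.groupby(...).groups`, which maps each group key to the index labels of its rows; `group_cells_by_microcluster` is a hypothetical helper name and the toy table is made up.

```python
import pandas as pd

def group_cells_by_microcluster(micropools: pd.DataFrame) -> dict:
    """Map each microcluster id to the list of cells pooled into it."""
    # groupby(...).groups maps each group key to the Index of its rows.
    groups = micropools.groupby("microcluster").groups
    return {mc: list(cells) for mc, cells in groups.items()}

toy_micropools = pd.DataFrame(
    {"microcluster": [0, 1, 0, 2, 1]},
    index=["cell_a", "cell_b", "cell_c", "cell_d", "cell_e"],
)
print(group_cells_by_microcluster(toy_micropools))
```

One difference to keep in mind: `groupby` sorts the keys by default, whereas the loop above keeps the order in which clusters first appear, which is why the notebook output lists clusters as 10, 24, 12, … The contents of the mapping are the same either way.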
[6]:
# Count Th17p and Th17n cells in each cluster
Th17p, Th17n = {cl: 0 for cl in clusters}, {cl: 0 for cl in clusters}
for cl in clusters:
    for cell in clusters[cl]:
        cell_type = cell_md.loc[cell, 'cell_type']
        if cell_type == 'Th17p':
            Th17p[cl] += 1
        elif cell_type == 'Th17n':
            Th17n[cl] += 1
        else:
            print("Should not happen")
# Fraction of each cluster's cells that are Th17p
pctTh17p = {cl: Th17p[cl] / (Th17p[cl] + Th17n[cl]) for cl in clusters}
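The per-cluster counts can also be computed in one step with `pd.crosstab`, which tabulates cluster membership against cell type. This is a sketch under the same assumptions as the loop (only Th17p and Th17n matter); `fraction_th17p` is a hypothetical helper and the toy tables are made up.

```python
import pandas as pd

def fraction_th17p(cell_md: pd.DataFrame, micropools: pd.DataFrame) -> pd.Series:
    """Fraction of Th17p cells per microcluster (same quantity as pctTh17p)."""
    # One row per cluster, one column per cell type, entries are cell counts.
    counts = pd.crosstab(
        micropools["microcluster"],
        cell_md.loc[micropools.index, "cell_type"],
    )
    # Keep only the two expected types; a type absent from the data gets zeros.
    counts = counts.reindex(columns=["Th17p", "Th17n"], fill_value=0)
    return counts["Th17p"] / (counts["Th17p"] + counts["Th17n"])

toy_cell_md = pd.DataFrame(
    {"cell_type": ["Th17p", "Th17p", "Th17n", "Th17n", "Th17p"]},
    index=["c1", "c2", "c3", "c4", "c5"],
)
toy_micropools = pd.DataFrame(
    {"microcluster": [0, 0, 1, 1, 2]},
    index=["c1", "c2", "c3", "c4", "c5"],
)
print(fraction_th17p(toy_cell_md, toy_micropools))
```

Applied to the notebook's data, `fraction_th17p(cell_md, micropools).to_dict()` should hold the same values as `pctTh17p`, keyed by cluster id in sorted order.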
This gives, for each cluster, the fraction of its cells that are Th17p (the remainder are Th17n). In this dataset every cluster consists entirely of one cell type or the other, so you can apply the regular analysis to the micropooled data and treat each cluster as its predominant cell type.
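Written as a formula, with $n^{+}_c$ and $n^{-}_c$ denoting the number of Th17p and Th17n cells in cluster $c$ (notation introduced here for illustration), the value computed above is:

```latex
\text{pctTh17p}_c = \frac{n^{+}_c}{n^{+}_c + n^{-}_c}
```

A value of 1.0 means the cluster is purely Th17p, and 0.0 means it is purely Th17n.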
[7]:
pctTh17p
[7]:
{10: 1.0,
24: 1.0,
12: 1.0,
11: 1.0,
14: 1.0,
9: 1.0,
27: 1.0,
23: 1.0,
28: 1.0,
15: 1.0,
26: 1.0,
25: 1.0,
16: 1.0,
13: 1.0,
4: 0.0,
3: 0.0,
7: 0.0,
20: 0.0,
18: 0.0,
2: 0.0,
21: 0.0,
17: 0.0,
6: 0.0,
8: 0.0,
0: 0.0,
5: 0.0,
1: 0.0,
19: 0.0,
22: 0.0}
For this dataset micropooling worked very well: every pool/cluster is composed of a single cell type. For other datasets you may want to set a purity cutoff, such as the 90 percent used in the code below.
[8]:
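def mc_type(pct):
    # Label a cluster by its Th17p fraction, using a 90 percent purity cutoff
    if pct > 0.9:
        return 'Th17p'
    elif pct < 0.1:
        return 'Th17n'
    else:
        return 'Uncertain'
# One row per cluster, named 'cluster_<id>', with its assigned cell type
micropool_md = {'cluster_'+str(cl): mc_type(pctTh17p[cl]) for cl in pctTh17p}
micropool_md = pd.DataFrame.from_dict(micropool_md, orient='index', columns=['cell_type'])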
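The cutoff rule above is written for exactly two cell types. If a dataset has more, one way to generalize it is to label each cluster with its majority cell type when that type's fraction exceeds the cutoff, and “Uncertain” otherwise. The sketch below assumes a clusters × cell-types table of cell counts (for example, one built with `pd.crosstab`); `label_clusters` is a hypothetical helper and the counts are made up.

```python
import pandas as pd

def label_clusters(counts: pd.DataFrame, cutoff: float = 0.9) -> pd.DataFrame:
    """Label each cluster with its majority cell type if that type's fraction
    strictly exceeds `cutoff`, otherwise 'Uncertain'."""
    # Convert counts to per-cluster fractions (each row sums to 1).
    fractions = counts.div(counts.sum(axis=1), axis=0)
    majority = fractions.idxmax(axis=1)
    purity = fractions.max(axis=1)
    labels = majority.where(purity > cutoff, "Uncertain")
    # Name clusters the same way as micropool_md ('cluster_<id>').
    labels.index = ["cluster_" + str(cl) for cl in labels.index]
    return labels.to_frame(name="cell_type")

# Made-up counts for three clusters of ten cells each.
toy_counts = pd.DataFrame({"Th17n": [0, 10, 1], "Th17p": [10, 0, 9]}, index=[0, 1, 2])
print(label_clusters(toy_counts))
```

With two cell types this gives the same labels as `mc_type`. Note that both use a strict comparison, so a cluster with 9 of 10 cells of one type (exactly 0.9) is labeled “Uncertain”.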
[9]:
micropool_md.to_csv("extdata/Th17-micropooled/cluster_metadata.csv")
This produces a metadata table describing what each cluster represents, analogous to the regular cell metadata. The micropooled data can then be analyzed as you would a regular dataset.
In the Python notebook that demonstrates an analysis of Compass results, this can be done simply by changing the input files: replace “extdata/Th17/reactions.tsv” with “extdata/Th17-micropooled/reactions.tsv” and “extdata/Th17/cell_metadata.csv” with “extdata/Th17-micropooled/cluster_metadata.csv”.
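Downstream, the cluster metadata plays the same role as cell metadata: it says which columns of the reaction-penalty table belong to each group. A minimal sketch, assuming the columns of reactions.tsv are the same “cluster_<id>” names used as the index of cluster_metadata.csv; `split_by_cell_type` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd

def split_by_cell_type(reaction_penalties: pd.DataFrame, cluster_md: pd.DataFrame) -> dict:
    """Map each cell-type label to the reaction-penalty columns of its clusters."""
    # Assumed: every reactions.tsv column has a row in the cluster metadata.
    missing = set(reaction_penalties.columns) - set(cluster_md.index)
    if missing:
        raise ValueError(f"columns without metadata: {sorted(missing)}")
    types = cluster_md.loc[reaction_penalties.columns, "cell_type"]
    return {
        label: reaction_penalties.loc[:, (types == label).to_numpy()]
        for label in types.unique()
    }

# Toy stand-ins: reactions x clusters penalties and the matching metadata.
toy_penalties = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    index=["RXN_A", "RXN_B"],
    columns=["cluster_0", "cluster_1", "cluster_2"],
)
toy_cluster_md = pd.DataFrame(
    {"cell_type": ["Th17p", "Th17n", "Th17p"]},
    index=["cluster_0", "cluster_1", "cluster_2"],
)
groups = split_by_cell_type(toy_penalties, toy_cluster_md)
print({label: df.shape for label, df in groups.items()})
```

Any clusters labeled “Uncertain” end up in their own group, so they can simply be left out of a Th17p versus Th17n comparison.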
[10]:
micropool_md
[10]:
cluster | cell_type |
---|---|
cluster_10 | Th17p |
cluster_24 | Th17p |
cluster_12 | Th17p |
cluster_11 | Th17p |
cluster_14 | Th17p |
cluster_9 | Th17p |
cluster_27 | Th17p |
cluster_23 | Th17p |
cluster_28 | Th17p |
cluster_15 | Th17p |
cluster_26 | Th17p |
cluster_25 | Th17p |
cluster_16 | Th17p |
cluster_13 | Th17p |
cluster_4 | Th17n |
cluster_3 | Th17n |
cluster_7 | Th17n |
cluster_20 | Th17n |
cluster_18 | Th17n |
cluster_2 | Th17n |
cluster_21 | Th17n |
cluster_17 | Th17n |
cluster_6 | Th17n |
cluster_8 | Th17n |
cluster_0 | Th17n |
cluster_5 | Th17n |
cluster_1 | Th17n |
cluster_19 | Th17n |
cluster_22 | Th17n |