Note

This page was generated from notebooks\Demo-micropools.ipynb.

Compass Micropooled Analysis

To install the required Python packages, uncomment the “install_reqs()” call.

[1]:
def install_reqs():
    !pip install pandas
    !pip install numpy
#install_reqs()
[2]:
import sys
if "google.colab" in sys.modules:
  !git clone -b docs https://github.com/YosefLab/Compass.git --depth 1
  !cp -r Compass/notebooks/extdata ./
  !rm -r /content/Compass
  install_reqs()
[3]:
import pandas as pd
import numpy as np

This notebook demonstrates how to analyze Compass results that were computed with micropooling, and in particular how to determine which cell type each cluster/pool represents.

[4]:
cell_md = pd.read_csv("extdata/Th17/cell_metadata.csv", index_col=0)
reaction_penalties = pd.read_csv("extdata/Th17-micropooled/reactions.tsv", sep="\t", index_col=0)
micropools = pd.read_csv("extdata/Th17-micropooled/micropools.tsv", sep="\t", index_col=0)
[5]:
# Map each micropool ID to the list of cells assigned to it.
clusters = {}
for cell in micropools.index:
    mc = micropools.loc[cell, 'microcluster']
    clusters.setdefault(mc, []).append(cell)
[6]:
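# Count the Th17p and Th17n cells in each micropool.
Th17p, Th17n = {cl:0 for cl in clusters}, {cl:0 for cl in clusters}
for cl in clusters:
    for cell in clusters[cl]:
        cell_type = cell_md.loc[cell, 'cell_type']
        if cell_type == 'Th17p':
            Th17p[cl] += 1
        elif cell_type == 'Th17n':
            Th17n[cl] += 1
        else:
            # Only the two labels above are expected in this metadata.
            print("Unexpected cell type for", cell, ":", cell_type)
# Fraction of cells in each micropool that are Th17p.
pctTh17p = {cl:Th17p[cl] / (Th17p[cl] + Th17n[cl]) for cl in clusters}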

This gives the fraction of cells in each cluster that are Th17p; the remainder are Th17n. In this case, every cluster consists entirely of one cell type or the other. You can then apply the regular analysis to the micropooled data, treating each cluster as its predominant cell type.
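The per-cluster counting above can also be written with a pandas groupby. The following is a minimal sketch, not part of the original notebook: the two small data frames are hypothetical stand-ins for micropools.tsv and cell_metadata.csv, and the name pct_th17p is chosen for illustration. Unlike the loop above, this version does not report unexpected cell types; any label other than 'Th17p' simply counts as not Th17p.

```python
import pandas as pd

# Hypothetical toy data standing in for micropools.tsv and cell_metadata.csv.
cells = ["c1", "c2", "c3", "c4", "c5"]
micropools = pd.DataFrame({"microcluster": [0, 0, 1, 1, 1]}, index=cells)
cell_md = pd.DataFrame(
    {"cell_type": ["Th17n", "Th17n", "Th17p", "Th17p", "Th17n"]},
    index=cells,
)

# Align the cell types to the micropool table by cell name.
cell_types = cell_md["cell_type"].reindex(micropools.index)

# The mean of a boolean indicator within each micropool is the Th17p fraction.
is_th17p = cell_types == "Th17p"
pct_th17p = is_th17p.groupby(micropools["microcluster"]).mean().to_dict()
```

Here micropool 0 contains only Th17n cells and micropool 1 contains two Th17p cells out of three, so the toy data also exercises the mixed case that does not occur in the Th17 example.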

[7]:
pctTh17p
[7]:
{10: 1.0,
 24: 1.0,
 12: 1.0,
 11: 1.0,
 14: 1.0,
 9: 1.0,
 27: 1.0,
 23: 1.0,
 28: 1.0,
 15: 1.0,
 26: 1.0,
 25: 1.0,
 16: 1.0,
 13: 1.0,
 4: 0.0,
 3: 0.0,
 7: 0.0,
 20: 0.0,
 18: 0.0,
 2: 0.0,
 21: 0.0,
 17: 0.0,
 6: 0.0,
 8: 0.0,
 0: 0.0,
 5: 0.0,
 1: 0.0,
 19: 0.0,
 22: 0.0}

For this dataset the micropooling worked very well: every pool/cluster is composed of a single cell type. For other datasets you may want to set a cutoff, such as the 90 percent used in the code below.

[8]:
def mc_type(pct):
    if pct > 0.9:
        return 'Th17p'
    elif pct < 0.1:
        return 'Th17n'
    else:
        return 'Uncertain'
micropool_md = {'cluster_'+str(cl):mc_type(pctTh17p[cl]) for cl in pctTh17p}
micropool_md = pd.DataFrame.from_dict(micropool_md, orient='index', columns=['cell_type'])
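If a different purity threshold is more appropriate for another dataset, the cutoff can be made a parameter. This is a sketch under assumptions: the function name label_pool and its cutoff argument are hypothetical and not part of Compass. With the default of 0.9 it uses the same strict comparisons as mc_type above.

```python
def label_pool(pct_th17p, cutoff=0.9):
    """Label a pool by its Th17p fraction.

    `cutoff` is the fraction that one cell type must exceed for the pool
    to be assigned that type; anything in between is 'Uncertain'.
    """
    # Below 0.5 both branches could match; at 1.0 nothing could ever match.
    if not 0.5 <= cutoff < 1.0:
        raise ValueError("cutoff must be in [0.5, 1.0)")
    if pct_th17p > cutoff:
        return "Th17p"
    if pct_th17p < 1.0 - cutoff:
        return "Th17n"
    return "Uncertain"


# Hypothetical fractions for three pools, labeled with a looser 80% cutoff.
example_fractions = {0: 0.0, 1: 0.85, 2: 0.5}
example_labels = {
    "cluster_" + str(cl): label_pool(pct, cutoff=0.8)
    for cl, pct in example_fractions.items()
}
```

Pools labeled 'Uncertain' are worth inspecting before they are included in a comparison between cell types.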
[9]:
micropool_md.to_csv("extdata/Th17-micropooled/cluster_metadata.csv")

This gives a metadata table to use when determining what each cluster represents, analogous to the regular cell metadata. The data can then be analyzed as you would a regular dataset.

For the Python notebook in which we demonstrate an analysis of Compass results, this can be done simply by changing the input files: replace “extdata/Th17/reactions.tsv” with “extdata/Th17-micropooled/reactions.tsv” and “extdata/Th17/cell_metadata.csv” with “extdata/Th17-micropooled/cluster_metadata.csv”.
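Before swapping the files, it can help to confirm that the metadata labels line up with the columns of the micropooled reactions table. The sketch below assumes that the table has one column per pool named like the metadata index ("cluster_0", "cluster_1", ...); that layout is an assumption here, as are the helper name unmatched_labels and the toy values.

```python
import pandas as pd


def unmatched_labels(reactions, metadata):
    """Return (columns lacking metadata, metadata rows lacking a column)."""
    cols, rows = set(reactions.columns), set(metadata.index)
    return sorted(cols - rows), sorted(rows - cols)


# Hypothetical toy tables; real data would come from pd.read_csv as above.
reactions = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    index=["rxn_a", "rxn_b"],
    columns=["cluster_0", "cluster_1"],
)
metadata = pd.DataFrame(
    {"cell_type": ["Th17n", "Th17p", "Th17p"]},
    index=["cluster_0", "cluster_1", "cluster_2"],
)

no_metadata, no_column = unmatched_labels(reactions, metadata)
```

Two empty lists mean every pool has a label and every label has a pool. In the toy data, "cluster_2" has a metadata row but no column in the reactions table.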

[10]:
micropool_md
[10]:
cell_type
cluster_10 Th17p
cluster_24 Th17p
cluster_12 Th17p
cluster_11 Th17p
cluster_14 Th17p
cluster_9 Th17p
cluster_27 Th17p
cluster_23 Th17p
cluster_28 Th17p
cluster_15 Th17p
cluster_26 Th17p
cluster_25 Th17p
cluster_16 Th17p
cluster_13 Th17p
cluster_4 Th17n
cluster_3 Th17n
cluster_7 Th17n
cluster_20 Th17n
cluster_18 Th17n
cluster_2 Th17n
cluster_21 Th17n
cluster_17 Th17n
cluster_6 Th17n
cluster_8 Th17n
cluster_0 Th17n
cluster_5 Th17n
cluster_1 Th17n
cluster_19 Th17n
cluster_22 Th17n