Function Reference

class hotspot.Hotspot(counts, model='danb', latent=None, distances=None, tree=None, umi_counts=None)

Initialize a Hotspot object for analysis.

One of latent, distances, or tree is required.

Parameters:
  • counts (pandas.DataFrame) – Count matrix (shape is genes x cells)
  • model (string, optional) –

    Specifies the null model to use for gene expression. Valid choices are:

    • 'danb': Depth-Adjusted Negative Binomial
    • 'bernoulli': Models probability of detection
    • 'normal': Depth-Adjusted Normal
    • 'none': Assumes data has been pre-standardized
  • latent (pandas.DataFrame, optional) – Latent space encoding cell-cell similarities with Euclidean distances. Shape is (cells x dims).
  • distances (pandas.DataFrame, optional) – Distances encoding cell-cell similarities directly. Shape is (cells x cells).
  • tree (ete3.coretype.tree.TreeNode, optional) – Root tree node. Can be created using ete3.Tree.
  • umi_counts (pandas.Series, optional) – Total UMI count per cell. Used as a size factor. If omitted, the sum over genes in the counts matrix is used.
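
Because counts is laid out as genes x cells, the default umi_counts is a per-cell (column-wise) sum. The sketch below illustrates that orientation with plain nested lists standing in for the DataFrame; the helper name total_umi_per_cell is hypothetical and not part of the library. With a pandas.DataFrame the same quantity is counts.sum(axis=0).

```python
def total_umi_per_cell(counts):
    """Sum a genes x cells matrix over genes, giving one total per cell."""
    # zip(*counts) iterates over columns, i.e. over cells.
    return [sum(column) for column in zip(*counts)]


# Rows are genes, columns are cells (illustrative values only).
counts = [
    [1, 0, 3],  # geneA
    [2, 5, 0],  # geneB
]
umi_counts = total_umi_per_cell(counts)
```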

create_knn_graph(weighted_graph=False, n_neighbors=30, neighborhood_factor=3)

Creates the KNN graph and graph weights.

The resulting matrices containing the neighbors and weights are stored in the object at self.neighbors and self.weights.

Parameters:
  • weighted_graph (bool) – Whether or not to create a weighted graph
  • n_neighbors (int) – Neighborhood size
  • neighborhood_factor (float) – Used when creating a weighted graph. Sets how quickly weights decay relative to the distances within the neighborhood. The weight for a cell with a distance d will decay as exp(-d/D) where D is the distance to the n_neighbors/neighborhood_factor-th neighbor.
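
The decay rule described for neighborhood_factor can be sketched in a few lines. This is an illustration of the formula exp(-d/D) as stated above, not the library's implementation: the function name neighbor_weights is hypothetical, and the choices to round the neighbor index down, to leave the weights unnormalized, and to assume strictly positive distances are all assumptions.

```python
import math


def neighbor_weights(distances, n_neighbors=30, neighborhood_factor=3):
    """Return exp(-d / D) for one cell's nearest-neighbor distances."""
    nearest = sorted(distances)[:n_neighbors]
    # Assumption: the neighbor index is rounded down and kept within range.
    k = max(1, int(n_neighbors / neighborhood_factor))
    k = min(k, len(nearest))
    bandwidth = nearest[k - 1]  # D: distance to the k-th nearest neighbor
    return [math.exp(-d / bandwidth) for d in nearest]


# Six neighbors with factor 3, so D is the distance to the 2nd neighbor (1.0).
weights = neighbor_weights(
    [0.5, 1.0, 2.0, 4.0, 8.0, 16.0], n_neighbors=6, neighborhood_factor=3
)
```

A larger neighborhood_factor picks a closer neighbor for D, so the weights fall off faster across the neighborhood.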

compute_hotspot(jobs=1)

Perform feature selection using local autocorrelation

In addition to returning output, this also stores the output in self.results.

Alias for self.compute_autocorrelations

Parameters:
  • jobs (int) – Number of parallel jobs to run
Returns:

results – A dataframe with four columns:

  • C: Autocorrelation coefficients, scaled to the range -1 to 1
  • Z: Z-score for autocorrelation
  • Pval: P-values computed from Z-scores
  • FDR: Q-values using the Benjamini-Hochberg procedure

Gene ids are in the index.

Return type:

pandas.DataFrame

compute_autocorrelations(jobs=1)

Perform feature selection using local autocorrelation

In addition to returning output, this also stores the output in self.results

Parameters:
  • jobs (int) – Number of parallel jobs to run
Returns:

results – A dataframe with four columns:

  • C: Autocorrelation coefficients, scaled to the range -1 to 1
  • Z: Z-score for autocorrelation
  • Pval: P-values computed from Z-scores
  • FDR: Q-values using the Benjamini-Hochberg procedure

Gene ids are in the index.

Return type:

pandas.DataFrame
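
To make the relationship between the Z, Pval, and FDR columns concrete, here is a minimal standard-library sketch. It assumes a one-sided (upper-tail) normal test for the p-values, which this reference does not state, and the function names and example Z-scores are hypothetical. The Benjamini-Hochberg step is the standard procedure.

```python
import math


def upper_tail_pvalue(z):
    """P(Z >= z) for a standard normal variable (assumed one-sided test)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Illustrative Z-scores keyed by gene id.
z_scores = {"geneA": 5.0, "geneB": 2.0, "geneC": 0.0}
pvals = {gene: upper_tail_pvalue(z) for gene, z in z_scores.items()}
fdr = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))
```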

compute_local_correlations(genes, jobs=1)

Define gene-gene relationships with pairwise local correlations.

In addition to returning output, this method stores its result in self.local_correlation_z.

Parameters:
  • genes (iterable of str) – Gene identifiers to compute local correlations on. Should be a smaller subset of all genes.
  • jobs (int) – Number of parallel jobs to run
Returns:

local_correlation_z – Local correlation Z-scores between genes. Shape is genes x genes.

Return type:

pandas.DataFrame

create_modules(min_gene_threshold=20, core_only=True, fdr_threshold=0.05)

Groups genes into modules

In addition to being returned, the results of this method are retained in the object at self.modules. Additionally, the linkage matrix (in the same form as that of scipy.cluster.hierarchy.linkage) is saved in self.linkage for plotting or manual clustering.

Parameters:
  • min_gene_threshold (int) – Controls how small modules can be. Increase if too many modules are being formed. Decrease if substructure is not being captured.
  • core_only (bool) – Whether to assign ambiguous genes to a module or leave them unassigned
  • fdr_threshold (float) – Correlation threshold at which to stop assigning genes to modules
Returns:

modules – Maps gene to module number. Unassigned genes are indicated with -1.

Return type:

pandas.Series
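
A common next step is to turn the gene-to-module mapping into per-module gene lists while dropping the -1 entries. The helper below is a hypothetical sketch, not a library function. It relies only on an .items() method, so it accepts either a plain dict (used here as a stand-in) or the returned pandas.Series.

```python
from collections import defaultdict


def genes_by_module(modules):
    """Group gene ids by module number, skipping unassigned genes (-1)."""
    grouped = defaultdict(list)
    for gene, module in modules.items():
        if module == -1:
            continue  # -1 marks genes that were not assigned to any module
        grouped[module].append(gene)
    return dict(grouped)


# Illustrative mapping in the documented form: gene id -> module number.
modules = {"geneA": 1, "geneB": 2, "geneC": 1, "geneD": -1}
grouped = genes_by_module(modules)
```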

calculate_module_scores()

Calculate Module Scores

In addition to returning its result, this method stores its output in the object at self.module_scores.

Returns:

module_scores – Scores for each module for each cell. Dimensions are cells x modules.

Return type:

pandas.DataFrame

plot_local_correlations(mod_cmap='tab10', vmin=-8, vmax=8, z_cmap='RdBu_r', yticklabels=False)

Plots a clustergrid of the local correlation values

Parameters:
  • mod_cmap (valid matplotlib colormap str or object) – discrete colormap for module assignments on the left side
  • vmin (float) – minimum value for colorscale for Z-scores
  • vmax (float) – maximum value for colorscale for Z-scores
  • z_cmap (valid matplotlib colormap str or object) – continuous colormap for correlation Z-scores
  • yticklabels (bool) – Whether or not to plot all gene labels. Default is False as there are too many. However, if using this plot interactively, you may wish to set it to True so you can zoom in and read gene names.
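
The methods above can be chained into a single analysis. The sketch below uses only the signatures listed in this reference. The wrapper function run_hotspot_workflow, its n_jobs and fdr_cutoff parameters, the FDR-based gene selection, and the call order (inferred from which attributes each method stores and consumes) are assumptions rather than a documented recipe.

```python
def run_hotspot_workflow(counts, latent, n_jobs=1, fdr_cutoff=0.05):
    """Run the documented steps in order; return the object and its outputs.

    counts: pandas.DataFrame, genes x cells
    latent: pandas.DataFrame, cells x dims
    """
    # Imported lazily so this sketch can be defined without the package.
    import hotspot

    hs = hotspot.Hotspot(counts, model="danb", latent=latent)
    hs.create_knn_graph(weighted_graph=False, n_neighbors=30)

    # Feature selection: keep genes with significant autocorrelation.
    results = hs.compute_autocorrelations(jobs=n_jobs)
    selected = results.index[results["FDR"] < fdr_cutoff]

    # Pairwise local correlations on the selected subset only.
    hs.compute_local_correlations(selected, jobs=n_jobs)

    modules = hs.create_modules(
        min_gene_threshold=20, core_only=True, fdr_threshold=0.05
    )
    module_scores = hs.calculate_module_scores()
    return hs, modules, module_scores
```

After such a run, hs.plot_local_correlations() can be called on the returned object to inspect the modules.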