Endotyper API
- class EndotypY.endotyper.Endotyper
Bases: object
Endotyper class for endotyping analysis.
This class provides methods to perform endotyping analysis using a random walk approach. It includes methods for reading networks, preparing the random walk matrix, and performing the endotyping analysis.
Attributes:
- network_file (str | Path)
The path to the input network file.
- r (float)
The damping factor for the random walk.
- annotate_local_neighborhood(enrichr_lib: str, organism='Human', sig_threshold=0.01)
Get the Gene Ontology (GO) terms for a given gene and its RWR-defined neighbors. This function uses the Enrichr library to perform Gene Set Enrichment Analysis (GSEA) and returns the significant terms for the expanded neighborhood of each gene. The significance threshold is applied to the enrichment p-value.
- Parameters:
enrichr_lib (str) – The name of the Enrichr library to use for GSEA.
organism (str) – The organism for which the GSEA is performed. Default is ‘Human’.
sig_threshold (float) – The significance threshold for the GSEA results. Default is 0.01.
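To illustrate how a p-value threshold such as sig_threshold separates enriched terms from the rest, here is a minimal sketch that scores each term with a one-sided hypergeometric test. The function names, the toy gene sets, and the background size are hypothetical, and the test shown is an assumption chosen for illustration. It is not taken from Enrichr or from the EndotypY source.

```python
from math import comb


def hypergeom_pvalue(overlap, query_size, term_size, background):
    """P(X >= overlap) when query_size genes are drawn from a background
    of which term_size genes carry the annotation."""
    upper = min(query_size, term_size)
    total = comb(background, query_size)
    tail = sum(
        comb(term_size, k) * comb(background - term_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def significant_terms(neighborhood, library, background, sig_threshold=0.01):
    """Return {term: p-value} for terms whose p-value is below sig_threshold."""
    query = set(neighborhood)
    hits = {}
    for term, genes in library.items():
        overlap = len(query & set(genes))
        if overlap == 0:
            continue  # no shared genes, nothing to test
        p = hypergeom_pvalue(overlap, len(query), len(genes), background)
        if p < sig_threshold:
            hits[term] = p
    return hits


# Hypothetical toy data: a 4-gene neighborhood and two annotation terms.
toy_library = {
    "term_A": ["G1", "G2", "G3", "G4"],                  # small, fully covered
    "term_B": ["G1"] + [f"X{i}" for i in range(199)],    # large, one shared gene
}
toy_hits = significant_terms(["G1", "G2", "G3", "G4"], toy_library, background=1000)
```

In this toy case only term_A passes the threshold: term_B shares a single gene with the neighborhood, and with 200 annotated genes in the background that overlap is expected by chance.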
- define_endotypes()
Define endotypes based on the annotated local neighborhoods. This function computes a binary feature matrix from the neighborhood annotations, where rows are genes and columns are enrichment terms. Each entry is 1 if the term is enriched for the gene together with its local neighborhood, and 0 otherwise.
It then performs recursive clustering to identify endotypes.
- Returns:
The Endotyper object with the endotypes defined.
- Return type:
self
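The binary feature matrix described above can be sketched as follows. This is a minimal illustration, assuming the annotations are available as a mapping from gene to a set of enriched terms. The helper name build_feature_matrix and the toy data are hypothetical, and the recursive clustering step is not shown because its details are not given here.

```python
def build_feature_matrix(annotations):
    """Turn {gene: iterable of terms} into (genes, terms, binary matrix).

    Rows follow the sorted gene order, columns the sorted term order.
    """
    genes = sorted(annotations)
    terms = sorted({t for ts in annotations.values() for t in ts})
    column = {t: j for j, t in enumerate(terms)}
    matrix = [[0] * len(terms) for _ in genes]
    for i, gene in enumerate(genes):
        for term in annotations[gene]:
            matrix[i][column[term]] = 1  # term is present for this gene
    return genes, terms, matrix


# Hypothetical annotations for three genes.
toy_annotations = {
    "GENE_A": {"term_1", "term_2"},
    "GENE_B": {"term_2"},
    "GENE_C": set(),
}
genes, terms, features = build_feature_matrix(toy_annotations)
```

A gene with no significant terms, such as GENE_C here, yields an all-zero row. Whether such rows are kept before clustering is a design decision this sketch leaves open.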
- define_local_neighborhood(neighbor_percentage=1, scaling=True)
Run RWR starting from each gene in seed_genes and extract the top percentage of genes, ranked by visiting probability, around each seed gene.
- Parameters:
neighbor_percentage (int) – Percentage of top-ranked genes to keep. Default is 1.
scaling (bool) – Whether to apply scaling to the RWR. Default is True.
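Selecting the top percentage of genes from a vector of visiting probabilities might look like the sketch below. The function name, the decision to exclude the seed itself, and the rounding rule (round up, keep at least one gene) are all assumptions made for illustration. The library may handle these details differently.

```python
import math


def top_percent_neighbors(visit_prob, seed, neighbor_percentage=1):
    """Return the highest-ranked genes (excluding the seed) that make up
    the top neighbor_percentage percent of all scored genes."""
    n_keep = max(1, math.ceil(len(visit_prob) * neighbor_percentage / 100))
    ranked = sorted(
        (g for g in visit_prob if g != seed),
        key=lambda g: visit_prob[g],
        reverse=True,
    )
    return ranked[:n_keep]


# Hypothetical visiting probabilities for a walk restarted at seed "S".
toy_prob = {
    "S": 0.5, "A": 0.2, "B": 0.1, "C": 0.08, "D": 0.05,
    "E": 0.03, "F": 0.02, "G": 0.01, "H": 0.006, "I": 0.004,
}
toy_top20 = top_percent_neighbors(toy_prob, "S", neighbor_percentage=20)
toy_top1 = top_percent_neighbors(toy_prob, "S")
```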
- explore_seed_clusters(scaling=True, k=200)
Run the seed clustering process. This function computes the RWR for each seed gene, clusters the seeds based on their neighborhoods, and plots the results.
- Parameters:
k (int) – Maximum neighborhood size to test. Default is 200.
scaling (bool) – Whether to apply scaling to the RWR. Default is True.
- import_network(network_file: str)
Imports a network from a file.
- Parameters:
network_file (str) – Path to the network file. Supported formats are ‘.txt’, ‘.tsv’, or ‘.csv’ files with two tab-separated columns representing edges.
- Returns:
The Endotyper object.
- Return type:
self
Notes
Lines that start with ‘#’ will be ignored.
Self-loops are eliminated in the last filtering step.
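The file conventions above (two tab-separated columns, ‘#’ comment lines ignored, self-loops removed) can be illustrated with a small parser. This is a sketch only: the function name read_edge_list is hypothetical, edges are assumed to be undirected, and self-loops are dropped while parsing rather than in a separate final filtering step.

```python
import io


def read_edge_list(handle):
    """Parse a two-column, tab-separated edge list into a set of
    undirected edges, skipping '#' lines and self-loops."""
    edges = set()
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # malformed line, ignored in this sketch
        u, v = fields[0], fields[1]
        if u == v:
            continue  # self-loop
        edges.add(tuple(sorted((u, v))))  # store each undirected edge once
    return edges


# In-memory stand-in for a network file.
toy_file = io.StringIO("# comment line\nA\tB\nB\tA\nB\tC\nC\tC\n")
toy_edges = read_edge_list(toy_file)
```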
- import_seeds(seeds_file: str)
Imports seeds from a file and sets them as the seeds for the object.
- Parameters:
seeds_file (str) – The path to the seeds file.
- Returns:
The Endotyper object.
- Return type:
self
Notes
The seeds file should contain a list of seed genes, one per line.
Alternatively, the seeds file may contain tab-separated entries on its first line.
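Both seed file layouts described in the notes can be handled by one small parser. The sketch below is an illustration under assumptions: the function name parse_seeds is hypothetical, and the format is detected simply by checking whether the first non-empty line contains a tab.

```python
def parse_seeds(text):
    """Accept either one seed per line or tab-separated seeds on the first line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    if "\t" in lines[0]:
        # Tab-separated layout: only the first line is read.
        return [s.strip() for s in lines[0].split("\t") if s.strip()]
    # One-per-line layout.
    return lines


seeds_per_line = parse_seeds("TP53\nBRCA1\n")
seeds_tabbed = parse_seeds("TP53\tBRCA1\tEGFR\n")
```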
- plot_endotype(iteration: int, cluster_id: int = None, node_size: list = ['degree', 'betweenness'], path_length: int = 2)
Plots the endotype network for a given iteration and cluster. This function generates a network plot visualizing the identified endotype, highlighting seed genes, endotype genes, and connecting genes within the larger network.
- Parameters:
iteration (int) – The iteration number of the endotyping clustering process.
cluster_id (int, optional) – The ID of the cluster to plot. If None, the first cluster is plotted. Defaults to None.
node_size (list, optional) – A list of network measures to use for node sizing. Defaults to [‘degree’, ‘betweenness’].
path_length (int, optional) – The path length to consider when connecting endotype genes. Defaults to 2.
- plot_multiple_endotypes(node_size: list = ['degree', 'betweenness'], layout: str = 'spring', path_length: int = 2)
Plots multiple endotypes on the network. This function iterates through the endotypes dictionary, combining endotypes from different iterations into a single dictionary. It then calls the plot_multiple_endotypes function to visualize these combined endotypes on the network.
- Parameters:
node_size (list, optional) – Network measures to use for node sizing. Defaults to [‘degree’, ‘betweenness’].
layout (str, optional) – The layout algorithm to use for the network plot. Defaults to ‘spring’.
path_length (int, optional) – The path length to use for shortest path calculations. Defaults to 2.
- prepare_rwr(r=0.8)
Prepares the Random Walk with Restart (RWR) matrix.
This function computes the RWR matrix based on the network and restart probability, using the formula (I-r*M)^-1 where M is the column-wise normalized Markov matrix according to M = A D^{-1}.
To provide the option of scaling the visiting probabilities, a scaling matrix is also created, which is the diagonal matrix of the inverse degree of the nodes in graph G.
- Parameters:
r (float, optional) – Damping factor/restart probability. Defaults to 0.8.
- Returns:
The Endotyper object with the RWR matrix, scaling matrix, and index-to-Ensembl mapping stored as attributes.
- Return type:
self
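The formula documented for prepare_rwr, (I-r*M)^-1 with M = A D^{-1}, together with the inverse-degree scaling matrix, can be worked through on a tiny graph. The sketch below uses plain Python lists and a hand-written Gauss-Jordan inversion so that it has no dependencies. A real implementation would more likely rely on NumPy or SciPy. The function names and the toy graph are hypothetical, and the code assumes every node has at least one edge.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def rwr_matrices(adjacency, r=0.8):
    """Return ((I - r*M)^-1, D^-1) for M = A D^-1 (column-normalized)."""
    n = len(adjacency)
    degree = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]  # column sums
    m = [[adjacency[i][j] / degree[j] for j in range(n)] for i in range(n)]
    system = [[(1.0 if i == j else 0.0) - r * m[i][j] for j in range(n)] for i in range(n)]
    w = invert(system)
    scaling = [[1.0 / degree[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return w, scaling


# Toy path graph: node 0 - node 1 - node 2.
toy_adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
toy_w, toy_scaling = rwr_matrices(toy_adj, r=0.8)
```

Because M is column-stochastic, each column of (I - r*M)^-1 sums to 1/(1 - r), which gives a quick sanity check on the result (5 for r = 0.8).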