partipy.extract_specific_processes

partipy.extract_specific_processes(est, pval, n=20, p_threshold=0.05)

Extract the top biological processes that are uniquely enriched in each archetype.

This function identifies the top n biological processes for each archetype based on their enrichment scores (est) and associated p-values (pval). Only processes with p-values below p_threshold in a given archetype are considered. A “specificity” score is computed for each process, reflecting how much more enriched it is in the target archetype compared to others.
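The filtering and ranking described above can be illustrated with a small pandas sketch. The exact specificity formula is not stated on this page, so the definition used here (score in the target archetype minus the highest score in any other archetype) is an assumption chosen for illustration. The helper name `specificity_sketch` is hypothetical and is not part of partipy.

```python
import pandas as pd


def specificity_sketch(est, pval, n=20, p_threshold=0.05):
    """Illustrative only: NOT partipy's implementation."""
    results = {}
    for arch in est.index:
        # Keep only processes whose p-value is below the threshold in this archetype.
        mask = (pval.loc[arch] < p_threshold).to_numpy()
        significant = pval.columns[mask]
        # ASSUMED specificity: target score minus the best score among other archetypes.
        others = est.drop(index=arch)
        spec = est.loc[arch, significant] - others[significant].max(axis=0)
        # Rank by specificity and keep the top n processes.
        top = spec.sort_values(ascending=False).head(n)
        results[arch] = pd.DataFrame(
            {"Process": top.index, "specificity": top.to_numpy()}
        )
    return results


# Toy data: 2 archetypes (rows) x 3 processes (columns).
est = pd.DataFrame(
    [[2.0, 0.5, 1.0], [0.2, 1.8, 1.1]],
    index=[0, 1],
    columns=["glycolysis", "hypoxia", "apoptosis"],
)
pval = pd.DataFrame(
    [[0.01, 0.40, 0.03], [0.60, 0.02, 0.04]],
    index=[0, 1],
    columns=est.columns,
)
demo = specificity_sketch(est, pval, n=2)
```

In this toy data, "hypoxia" is excluded for archetype 0 because its p-value there (0.40) is above the threshold, while "glycolysis" ranks first because its score is far higher in archetype 0 than in archetype 1.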

Parameters:
  • est (pd.DataFrame) – A DataFrame of shape (n_archetypes, n_processes) containing the estimated enrichment scores for each process and archetype.

  • pval (pd.DataFrame) – A DataFrame of shape (n_archetypes, n_processes) containing the p-values corresponding to the enrichment scores in est.

  • n (int, default: 20) – The number of top processes to extract per archetype.

  • p_threshold (float, default: 0.05) – The p-value threshold for filtering processes. Only processes with p-values below this threshold are considered.
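As a rough sketch of the expected inputs, the snippet below builds est and pval with matching shape (n_archetypes, n_processes), archetypes as rows and processes as columns. The process names, the random values, and the integer archetype labels are placeholders assumed for illustration. The call itself is left as a comment so the snippet runs without partipy installed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical labels: 3 archetypes (rows) and 4 processes (columns).
archetypes = [0, 1, 2]
processes = ["glycolysis", "hypoxia", "apoptosis", "angiogenesis"]

# Placeholder enrichment scores and p-values with identical shape and labels.
est = pd.DataFrame(rng.normal(size=(3, 4)), index=archetypes, columns=processes)
pval = pd.DataFrame(rng.uniform(size=(3, 4)), index=archetypes, columns=processes)

# Sanity check: both frames must describe the same archetypes and processes.
inputs_aligned = (
    est.shape == pval.shape
    and est.index.equals(pval.index)
    and est.columns.equals(pval.columns)
)

# With partipy installed, the call would follow the documented signature:
# import partipy
# specific = partipy.extract_specific_processes(est, pval, n=2, p_threshold=0.05)
```

Checking alignment before the call is a defensive habit rather than a documented requirement; this page does not say how mismatched inputs are handled.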

Return type:

dict[int, DataFrame]

Returns:

dict[int, pd.DataFrame]: A dictionary mapping each archetype index to a DataFrame containing the top n processes specific to that archetype. Each DataFrame includes:

  • “Process”: Name of the biological process.

  • “{archetype indices}”: Enrichment score in the given archetype.

  • “act_{archetype indices}”: Duplicate enrichment score columns for future compatibility.

  • “pval_{archetype indices}”: P-values corresponding to each enrichment score.

  • “specificity”: Score indicating how uniquely enriched the process is compared to other archetypes.