partipy.extract_specific_processes
- partipy.extract_specific_processes(est, pval, n=20, p_threshold=0.05)
Extract the top biological processes that are uniquely enriched in each archetype.
This function identifies the top n biological processes for each archetype based on their enrichment scores (est) and associated p-values (pval). Only processes with p-values below p_threshold in a given archetype are considered for that archetype. A “specificity” score is computed for each process, reflecting how much more enriched it is in the target archetype than in the other archetypes.
- Parameters:
est (pd.DataFrame) – A DataFrame of shape (n_archetypes, n_processes) containing the estimated enrichment scores for each process and archetype.
pval (pd.DataFrame) – A DataFrame of shape (n_archetypes, n_processes) containing the p-values corresponding to the enrichment scores in est.
n (int, default: 20) – The number of top processes to extract per archetype.
p_threshold (float, default: 0.05) – The p-value threshold for filtering processes. Only processes with p-values below this threshold are considered.
- Return type:
dict[int, pd.DataFrame]
- Returns:
dict[int, pd.DataFrame] – A dictionary mapping each archetype index to a DataFrame containing the top n processes specific to that archetype. Each DataFrame includes the following columns:
  - “Process”: Name of the biological process.
  - “{archetype index}”: Enrichment score in the corresponding archetype (one column per archetype).
  - “act_{archetype index}”: Duplicate enrichment score columns, kept for future compatibility.
  - “pval_{archetype index}”: P-values corresponding to each enrichment score.
  - “specificity”: Score indicating how uniquely enriched the process is compared to the other archetypes.
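The filtering and ranking described above can be illustrated with a small, self-contained sketch. This is not partipy's implementation. The function name extract_specific_processes_sketch is hypothetical, and the specificity formula (score in the target archetype minus the highest score among the other archetypes) is an assumption, because this page does not say how specificity is computed. The sketch also assumes that est and pval share the same row and column labels, with rows ordered by archetype index. The toy data in the usage part is made up.

```python
import pandas as pd


def extract_specific_processes_sketch(
    est: pd.DataFrame,
    pval: pd.DataFrame,
    n: int = 20,
    p_threshold: float = 0.05,
) -> dict[int, pd.DataFrame]:
    """Illustrative sketch only; NOT partipy's implementation.

    Assumption: specificity = score in the target archetype minus the
    highest score among all other archetypes.
    """
    if not (est.index.equals(pval.index) and est.columns.equals(pval.columns)):
        raise ValueError("est and pval must share row and column labels")
    n_archetypes = est.shape[0]
    if n_archetypes < 2:
        raise ValueError("specificity needs at least two archetypes")

    results: dict[int, pd.DataFrame] = {}
    for i in range(n_archetypes):
        # Keep only processes that are significant in archetype i
        keep = pval.iloc[i].to_numpy() < p_threshold
        scores = est.loc[:, pval.columns[keep]]

        # Assumed specificity: margin over the best competing archetype
        others = [k for k in range(n_archetypes) if k != i]
        specificity = scores.iloc[i] - scores.iloc[others].max(axis=0)
        top = specificity.sort_values(ascending=False).head(n)

        # Assemble the output table with the documented column layout
        table = pd.DataFrame({"Process": top.index.to_list()})
        for j in range(n_archetypes):
            scores_j = est.iloc[j].loc[top.index].to_numpy()
            table[str(j)] = scores_j
            table[f"act_{j}"] = scores_j  # duplicate column, as documented
            table[f"pval_{j}"] = pval.iloc[j].loc[top.index].to_numpy()
        table["specificity"] = top.to_numpy()
        results[i] = table
    return results


# Toy usage: two archetypes, three made-up processes
processes = ["proc_a", "proc_b", "proc_c"]
est = pd.DataFrame([[3.0, 0.5, 2.0], [1.0, 2.5, 1.8]], columns=processes)
pval = pd.DataFrame([[0.01, 0.20, 0.03], [0.04, 0.001, 0.02]], columns=processes)

result = extract_specific_processes_sketch(est, pval, n=2)
print(result[0][["Process", "specificity"]])
```

Here proc_b is excluded for archetype 0 because its p-value (0.20) is above the threshold. Taking the maximum over competing archetypes rewards only processes that beat every other archetype; a margin over the mean of the other archetypes would be a gentler alternative.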