partipy.extract_enriched_processes
- partipy.extract_enriched_processes(est, pval, order='desc', n=20, p_threshold=0.05)
Extract top enriched biological processes for each archetype based on significance and enrichment score.
This function filters and ranks biological processes using enrichment estimates (est) and p-values (pval) from decoupler output. For each archetype, it selects the top n processes with p-values below p_threshold, ranked by either the highest or the lowest enrichment scores depending on order. It also computes a “specificity” score indicating how uniquely enriched a process is for a given archetype compared to others.
- Parameters:
est (pd.DataFrame) – A DataFrame of shape (n_archetypes, n_processes) containing the estimated enrichment scores for each process and archetype.
pval (pd.DataFrame) – A DataFrame of shape (n_archetypes, n_processes) containing the p-values corresponding to the enrichment scores in est.
order (str, default "desc") – The sorting order for selecting the top processes. Options are:
  - "desc": Selects the top n processes with the highest enrichment scores.
  - "asc": Selects the top n processes with the lowest enrichment scores.
n (int, default 20) – The number of top processes to extract per archetype.
p_threshold (float, default 0.05) – The p-value threshold for filtering processes. Only processes with p-values below this threshold are considered.
- Return type:
dict[int, pd.DataFrame]
- Returns:
A dictionary mapping each archetype index to a DataFrame of the top n enriched processes. Each DataFrame has the following columns:
  - “Process”: Name of the biological process.
  - “{archetype indices}”: Enrichment score for that process.
  - “act_{archetype indices}”: Duplicate enrichment score columns for future compatibility.
  - “pval_{archetype indices}”: P-values corresponding to each enrichment score.
  - “specificity”: A score indicating how uniquely enriched the process is in the given archetype.
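
The filter-and-rank step described above can be illustrated with plain pandas. The following is a minimal sketch, not partipy's implementation: `top_processes_sketch` is a hypothetical helper, the toy values are made up, the output columns are simplified to "Process", "score", and "pval", and the "specificity" score is omitted because its formula is not given here. It assumes archetypes are addressed by row position and that "below" means a strict `<` comparison.

```python
import pandas as pd


def top_processes_sketch(est, pval, order="desc", n=20, p_threshold=0.05):
    """Hypothetical helper: filter by p-value, then rank by enrichment score."""
    if order not in ("desc", "asc"):
        raise ValueError("order must be 'desc' or 'asc'")
    results = {}
    for arch_idx in range(est.shape[0]):
        scores = est.iloc[arch_idx]
        pvals = pval.iloc[arch_idx]
        # Keep only processes whose p-value is below the threshold (assumed strict).
        significant = scores[pvals < p_threshold]
        # Highest scores first for "desc", lowest scores first for "asc".
        ranked = significant.sort_values(ascending=(order == "asc")).head(n)
        results[arch_idx] = pd.DataFrame(
            {
                "Process": ranked.index,
                "score": ranked.to_numpy(),
                "pval": pvals[ranked.index].to_numpy(),
            }
        )
    return results


# Toy inputs: 2 archetypes x 4 processes (made-up values, for illustration only).
processes = ["glycolysis", "hypoxia", "apoptosis", "cell_cycle"]
est = pd.DataFrame([[2.0, 0.5, -1.5, 3.0], [-0.5, 1.8, 2.2, 0.1]], columns=processes)
pval = pd.DataFrame([[0.01, 0.20, 0.03, 0.001], [0.04, 0.02, 0.30, 0.50]], columns=processes)

top = top_processes_sketch(est, pval, order="desc", n=2)
print(top[0])
```

With these toy inputs, archetype 0 keeps three processes after the p-value filter and returns the two with the highest scores. Passing order="asc" returns the lowest-scoring significant processes instead.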