partipy.extract_enriched_processes

partipy.extract_enriched_processes(est, pval, order='desc', n=20, p_threshold=0.05)

Extract top enriched biological processes for each archetype based on significance and enrichment score.

This function filters and ranks biological processes using the enrichment estimates (est) and p-values (pval) from decoupler output. For each archetype, it keeps the processes whose p-value is below p_threshold, sorts them by enrichment score (highest first for order="desc", lowest first for order="asc"), and selects the top n. It also computes a “specificity” score indicating how uniquely enriched a process is for a given archetype compared to the others.

Parameters:
  • est (pd.DataFrame) – A DataFrame of shape (n_archetypes, n_processes) containing the estimated enrichment scores for each process and archetype.

  • pval (pd.DataFrame) – A DataFrame of shape (n_archetypes, n_processes) containing the p-values corresponding to the enrichment scores in est.

  • order (str, default "desc") –

    The sorting order for selecting the top processes. Options are:

    • "desc": Selects the top n processes with the highest enrichment scores.

    • "asc": Selects the top n processes with the lowest enrichment scores.

  • n (int, default 20) – The number of top processes to extract per archetype.

  • p_threshold (float, default 0.05) – The p-value threshold for filtering processes. Only processes with p-values below this threshold are considered.

Return type:

dict[int, DataFrame]

Returns:

A dictionary mapping each archetype index to a DataFrame of the top n enriched processes. Each DataFrame has the following columns:

  • “Process”: Name of the biological process.

  • “{archetype indices}”: Enrichment score for that process.

  • “act_{archetype indices}”: Duplicate enrichment score columns for future compatibility.

  • “pval_{archetype indices}”: P-values corresponding to each enrichment score.

  • “specificity”: A score indicating how uniquely enriched the process is in the given archetype.
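
The filter-and-rank step described above can be sketched with plain pandas. This is a minimal illustration and not partipy's implementation: the helper name top_processes_sketch, the toy data, and the output column names "score" and "pval" are assumptions made for the example. The specificity score is left out because its formula is not given here.

```python
import pandas as pd


def top_processes_sketch(est, pval, order="desc", n=20, p_threshold=0.05):
    """Hypothetical helper: filter by p-value, then rank by enrichment score."""
    if order not in ("desc", "asc"):
        raise ValueError("order must be 'desc' or 'asc'")
    result = {}
    for arch in est.index:
        # Keep only processes whose p-value for this archetype is below the threshold.
        significant = pval.loc[arch] < p_threshold
        scores = est.loc[arch][significant]
        # Sort by enrichment score and keep the first n entries.
        ranked = scores.sort_values(ascending=(order == "asc")).head(n)
        # Column names here are illustrative, not the ones partipy returns.
        result[arch] = pd.DataFrame(
            {
                "Process": ranked.index,
                "score": ranked.to_numpy(),
                "pval": pval.loc[arch, ranked.index].to_numpy(),
            }
        )
    return result


# Toy inputs of shape (n_archetypes, n_processes), as described for est and pval.
processes = ["proc_a", "proc_b", "proc_c"]
est = pd.DataFrame([[2.0, -1.0, 0.5], [0.1, 3.0, -2.0]], index=[0, 1], columns=processes)
pval = pd.DataFrame([[0.01, 0.20, 0.03], [0.50, 0.001, 0.04]], index=[0, 1], columns=processes)

top = top_processes_sketch(est, pval, order="desc", n=2)
for arch, table in top.items():
    print(f"archetype {arch}")
    print(table)
```

With order="asc" the same filter applies, but the most negative significant scores come first, which is useful when looking for depleted rather than enriched processes.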