partipy.compute_bootstrap_variance

partipy.compute_bootstrap_variance(adata, n_bootstrap, n_archetypes_list=None, init=None, optim=None, weight=None, max_iter=None, early_stopping=True, rel_tol=None, coreset_algorithm=None, coreset_fraction=0.1, coreset_size=None, delta=0.0, seed=42, save_to_anndata=True, return_result=False, n_jobs=-1, verbose=False, force_recompute=False, **optim_kwargs)

Perform bootstrap sampling to compute archetypes and assess their stability.

This function generates bootstrap samples from the data, computes archetypes for each sample, aligns them with the reference archetypes, and stores the results in adata.uns["AA_bootstrap"]. It allows assessing the stability of the archetypes across multiple bootstrap iterations.
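The resample, fit, align, and aggregate steps described above can be sketched in plain Python. This is a toy illustration of the general idea, not partipy's implementation: `fit_toy_archetypes`, `align_to_reference`, and `bootstrap_variance` are hypothetical helpers, and the "fit" simply picks extreme points along three fixed directions instead of running archetypal analysis.

```python
import itertools
import random
import statistics


def fit_toy_archetypes(points):
    """Hypothetical stand-in for an archetype fit (NOT partipy's algorithm):
    return the most extreme 2D point along each of three fixed directions."""
    directions = [(1.0, 0.0), (-1.0, 1.0), (-1.0, -1.0)]
    return [max(points, key=lambda p: p[0] * d[0] + p[1] * d[1]) for d in directions]


def align_to_reference(archetypes, reference):
    """Reorder archetypes to minimise the total squared distance to the reference."""
    def cost(perm):
        return sum((a[0] - r[0]) ** 2 + (a[1] - r[1]) ** 2 for a, r in zip(perm, reference))

    return list(min(itertools.permutations(archetypes), key=cost))


def bootstrap_variance(points, n_bootstrap, seed=42):
    rng = random.Random(seed)
    reference = fit_toy_archetypes(points)  # fit on the full data set
    aligned_runs = []
    for _ in range(n_bootstrap):
        sample = rng.choices(points, k=len(points))  # resample with replacement
        archetypes = fit_toy_archetypes(sample)
        rng.shuffle(archetypes)  # mimic the arbitrary ordering of a real fit
        aligned_runs.append(align_to_reference(archetypes, reference))

    # For each archetype: variance of every coordinate across bootstrap runs,
    # averaged over the coordinates.
    variance_per_archetype = []
    for k in range(len(reference)):
        coord_vars = [
            statistics.pvariance([run[k][dim] for run in aligned_runs])
            for dim in range(2)
        ]
        variance_per_archetype.append(statistics.fmean(coord_vars))
    mean_variance = statistics.fmean(variance_per_archetype)
    return reference, variance_per_archetype, mean_variance


# Toy data: 200 points drawn uniformly from the unit square.
data_rng = random.Random(0)
points = [(data_rng.random(), data_rng.random()) for _ in range(200)]
reference, var_per_archetype, mean_var = bootstrap_variance(points, n_bootstrap=20, seed=1)
```

A low variance means that an archetype lands in roughly the same place regardless of which cells are resampled. Whether partipy uses the population or sample variance, and how it performs the alignment, is not stated here, so those choices are assumptions of the sketch.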

Parameters:

adata (anndata.AnnData) – The AnnData object containing the data to fit the archetypes. The data should be available in adata.obsm[obsm_key].
n_bootstrap (int) – The number of bootstrap samples to generate.
n_archetypes_list (Union[int, List[int]], default list(range(2, 8))) – A list specifying the numbers of archetypes to evaluate. Can also be a single int.
optim ({"regularized_nnls", "projected_gradients", "frank_wolfe"}, default "projected_gradients") –
Optimization algorithm to use (aliases "PCHA" → "projected_gradients" and "FW" → "frank_wolfe" are also accepted). Options are:
- "projected_gradients": Projected gradient descent (also known as PCHA) [MH12].
- "frank_wolfe": Frank-Wolfe algorithm (often abbreviated FW) [BKHT15].
- "regularized_nnls": Regularized non-negative least squares [CB94].
See partipy.schema.OPTIM_ALGS for all available options.
init ({"uniform", "furthest_sum", "plus_plus"}, default "plus_plus") –
Initialization method for the archetypes. Options are:
- "plus_plus": Archetypal++ initialization [MS24].
- "furthest_sum": Utilizes the furthest sum algorithm [MH12].
- "uniform": Random initialization.
See partipy.schema.INIT_ALGS for all available options.
seed (int, default 42) – Random seed to use for reproducible results.
save_to_anndata (bool, default True) – Whether to save the results to adata.uns["AA_bootstrap"]. If False, the result is returned.
n_jobs (int, default -1) – The number of jobs to run in parallel. -1 uses all available cores.
verbose (bool, default False) – Whether to print progress information.
force_recompute (bool, default False) – Recompute bootstrap samples even if cached results for the given configuration already exist.
**optim_kwargs – Additional keyword arguments passed to the AA class.
weight (None | str)
max_iter (None | int)
early_stopping (bool)
rel_tol (None | float)
coreset_algorithm (None | str)
coreset_fraction (float)
coreset_size (None | int)
delta (float)
return_result (bool)

Return type:

None | dict[str, DataFrame]

Returns:

None – The results are stored in adata.uns["AA_bootstrap"] as a DataFrame with the following columns:
- x_i: The coordinate of the archetype along the i-th principal component.
- archetype: The archetype index.
- iter: The bootstrap iteration index (0 for the reference archetypes).
- reference: A boolean indicating whether the archetype is from the reference model.
- mean_variance: The mean variance of all archetype coordinates across bootstrap samples.
- variance_per_archetype: The mean variance of each archetype's coordinates across bootstrap samples.
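To make the column layout concrete, the following sketch builds a tiny hand-written table with the x_i, archetype, iter, and reference columns and derives the two variance summaries from it. The values are invented for illustration, plain dictionaries stand in for DataFrame rows, and excluding the reference rows from the variance and using the population variance are assumptions rather than documented behaviour.

```python
from collections import defaultdict
from statistics import fmean, pvariance

# Hypothetical rows: two archetypes, one reference fit (iter 0), two bootstrap fits.
rows = [
    {"x_0": 0.0, "x_1": 0.0, "archetype": 0, "iter": 0, "reference": True},
    {"x_0": 1.0, "x_1": 0.0, "archetype": 1, "iter": 0, "reference": True},
    {"x_0": 0.1, "x_1": -0.1, "archetype": 0, "iter": 1, "reference": False},
    {"x_0": 1.2, "x_1": 0.0, "archetype": 1, "iter": 1, "reference": False},
    {"x_0": -0.1, "x_1": 0.1, "archetype": 0, "iter": 2, "reference": False},
    {"x_0": 0.8, "x_1": 0.0, "archetype": 1, "iter": 2, "reference": False},
]
coord_cols = ["x_0", "x_1"]

# Group the bootstrap rows (assumption: reference rows are left out) by archetype.
by_archetype = defaultdict(list)
for row in rows:
    if not row["reference"]:
        by_archetype[row["archetype"]].append(row)

# Per archetype: variance of each coordinate across iterations, averaged over coordinates.
variance_per_archetype = {
    k: fmean(pvariance([r[c] for r in group]) for c in coord_cols)
    for k, group in by_archetype.items()
}

# Single summary value: the mean over all archetypes.
mean_variance = fmean(variance_per_archetype.values())
```

In this toy table archetype 1 varies more than archetype 0, which would mark it as the less stable of the two.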
