partipy.compute_archetypes

partipy.compute_archetypes(adata, n_archetypes, n_restarts=5, init=None, optim=None, weight=None, max_iter=None, early_stopping=True, rel_tol=None, coreset_algorithm=None, coreset_fraction=0.1, coreset_size=None, delta=0.0, verbose=None, seed=42, n_jobs=-1, save_to_anndata=True, return_result=False, archetypes_only=False, force_recompute=False, **optim_kwargs)

Perform Archetypal Analysis (AA) on the input data.

This function is a wrapper around the AA class, offering a simplified interface for fitting the model and returning the results, or saving them to the AnnData object. It allows users to customize the archetype computation with various parameters for initialization, optimization, convergence, and output.

Parameters:
  • adata (anndata.AnnData) – The AnnData object containing the data to fit the archetypes. The data should be available in adata.obsm[obsm_key].

  • n_archetypes (int) – The number of archetypes to compute.

  • n_restarts (int, default 5) – Number of times the optimization is run. The run with the lowest RSS is kept.

  • init ({"uniform", "furthest_sum", "plus_plus"}, default "plus_plus") –

    Initialization method for the archetypes. Options are:

    • "plus_plus": Archetypal++ initialization [MS24].

    • "furthest_sum": Utilizes the furthest sum algorithm [MH12].

    • "uniform": Random initialization.

    See partipy.schema.INIT_ALGS for all available options.

  • optim ({"regularized_nnls", "projected_gradients", "frank_wolfe"}, default "projected_gradients") –

    Optimization algorithm to use. The aliases "PCHA" (for "projected_gradients") and "FW" (for "frank_wolfe") are also accepted. Options are:

    • "projected_gradients": Projected gradient descent (also known as PCHA) [MH12].

    • "frank_wolfe": Frank-Wolfe algorithm (often abbreviated FW) [BKHT15].

    • "regularized_nnls": Regularized non-negative least squares [CB94].

    See partipy.schema.OPTIM_ALGS for all available options.

  • weight ({None, "bisquare", "huber"}, default None) –

    Weighting scheme for robust archetypal analysis, based on [EL11]. Options are:

    • None: No weighting (standard archetypal analysis).

    • "bisquare": Bisquare weighting for robust estimation.

    • "huber": Huber weighting for robust estimation.

    See partipy.schema.WEIGHT_ALGS for all available options.

  • max_iter (int, default 500) – Maximum number of iterations for the optimization algorithm.

  • early_stopping (bool, default True) – Whether to stop the optimization early if the relative change in RSS is below a certain threshold.

  • rel_tol (float, default 0.0001) – Tolerance for convergence of the optimization algorithm.

  • coreset_algorithm ({None, "standard", "lightweight_kmeans", "uniform"}, default None) –

    Coreset algorithm to use for data reduction, based on [MB19]. Options are:

    • None: No coreset is used.

    • "standard": Coreset construction for archetypal analysis [MB19]. Recommended option if data reduction is needed.

    • "lightweight_kmeans": Lightweight coreset for k-means clustering [LBK16].

    • "uniform": Coreset based on uniform sampling.

    See partipy.schema.CORESET_ALGS for all available options.

  • coreset_fraction (float, default 0.1) – Fraction of the data to use for the coreset. Only used if coreset_algorithm is not None and coreset_size is None.

  • coreset_size (int, default None) – Number of data points in the coreset. If None, it is set to n_samples * coreset_fraction. Otherwise it overrides the coreset_fraction argument.

  • delta (float, default 0.0) – Parameter that relaxes the constraint that B must be a convex combination of the data points. Must be in the interval [0, 1).

  • verbose (bool, default False) – Whether to display progress messages and additional execution details.

  • seed (int, default 42) – Random seed to use for reproducible results.

  • n_jobs (int, default -1) – Number of jobs for parallel computation. -1 uses all available cores.

  • save_to_anndata (bool, default True) – Whether to save the results to the AnnData object. If False, the results are returned as a tuple. If adata is not an AnnData object, this is ignored.

  • archetypes_only (bool, default False) – Whether to save/return only the archetypes matrix Z (if set to True) or the full outputs, which also include the matrices A and B, the RSS, and the variance explained varexpl.

  • optim_kwargs (dict | None, default None) – Additional keyword arguments that are passed to partipy.arch.AA.

  • return_result (bool, default False)

  • force_recompute (bool, default False)
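
As an illustration of how coreset_fraction, coreset_size, rel_tol, and max_iter are described above, the following is a minimal sketch in plain Python. It is not partipy's implementation: the helper names are hypothetical, rounding to the nearest integer is an assumption, and the convergence rule (relative change in RSS measured against the previous iteration) is one common reading of "relative change in RSS".

```python
def resolve_coreset_size(n_samples, coreset_fraction=0.1, coreset_size=None):
    """Return the number of points kept in the coreset (hypothetical helper)."""
    if coreset_size is not None:
        # An explicit size takes precedence over the fraction.
        return coreset_size
    # Assumption: round to the nearest integer; the library may round differently.
    return max(1, round(n_samples * coreset_fraction))


def has_converged(rss_prev, rss, rel_tol=1e-4):
    """Check whether the relative change in RSS fell below rel_tol."""
    if rss_prev == 0:
        return True  # already a perfect fit, nothing left to improve
    # Assumption: the change is measured relative to the previous RSS.
    return abs(rss_prev - rss) / rss_prev < rel_tol


def first_stopping_iteration(rss_trace, rel_tol=1e-4, max_iter=500):
    """Return the first iteration index at which early stopping would trigger."""
    for i in range(1, min(len(rss_trace), max_iter)):
        if has_converged(rss_trace[i - 1], rss_trace[i], rel_tol):
            return i
    return None  # never converged within the trace or max_iter


# A made-up RSS trace that flattens out after a few iterations.
rss_trace = [100.0, 60.0, 50.0, 49.999, 49.9989]
stop_at = first_stopping_iteration(rss_trace)
```

With early_stopping=False, the analogous loop would simply run for max_iter iterations regardless of how little the RSS changes.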

Return type:

ndarray | tuple[ndarray, ndarray, ndarray, list[float] | ndarray, float] | None

Returns:

np.ndarray or tuple or None. The output depends on the values of save_to_anndata and archetypes_only:

  • If archetypes_only is True:

    Only the archetype matrix Z is returned or saved.

  • If archetypes_only is False:

    A tuple is returned or saved, containing:

    • A: ndarray of shape (n_samples, n_archetypes)

      The matrix of weights for the data points.

    • B: ndarray of shape (n_archetypes, n_samples)

      The matrix of weights for the archetypes.

    • Z: ndarray of shape (n_archetypes, n_features)

      The archetypes matrix.

    • RSS: float

      The residual sum of squares from optimization.

    • varexpl: float

      The variance explained by the model.

      The variance explained by the model.

  • If save_to_anndata is True:

    Returns None. Results are saved to adata.uns["AA_results"].

  • If save_to_anndata is False:

    The results described above are returned.
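
To make the shapes of A, B, and Z concrete, here is a small self-contained sketch on a toy data set, written in plain Python so that it runs without extra dependencies. It follows the standard archetypal analysis formulation, in which the archetypes are Z = B X and the data are approximated by A Z. The toy matrices are made up, and the variance-explained formula (1 - RSS / TSS, with TSS taken around the feature means) is an assumption that may differ from how partipy computes varexpl.

```python
def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


# Toy data: 4 samples, 2 features (the corners of the unit square).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# B has shape (n_archetypes, n_samples); each row is a convex combination
# of data points. Here each archetype is exactly one of the first three samples.
B = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

# A has shape (n_samples, n_archetypes); each row is a convex combination
# of archetypes. The last sample is approximated by the midpoint of two archetypes.
A = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.5, 0.5]]

Z = matmul(B, X)      # archetypes, shape (n_archetypes, n_features)
X_hat = matmul(A, Z)  # reconstruction of the data, same shape as X

# Residual sum of squares between the data and its reconstruction.
rss = sum((x - xh) ** 2
          for row, row_hat in zip(X, X_hat)
          for x, xh in zip(row, row_hat))

# Assumption: total sum of squares is taken around the per-feature means.
means = [sum(col) / len(col) for col in zip(*X)]
tss = sum((x - m) ** 2 for row in X for x, m in zip(row, means))
varexpl = 1.0 - rss / tss
```

In this toy case only the sample at (1, 1) lies outside the triangle spanned by the three archetypes, so it alone contributes to the RSS.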