partipy.AA

class partipy.AA(n_archetypes, init='plus_plus', optim='projected_gradients', weight=None, max_iter=500, rel_tol=0.0001, early_stopping=True, coreset_algorithm=None, coreset_fraction=0.1, coreset_size=None, delta=0.0, centering=True, scaling=True, verbose=False, seed=42, **optim_kwargs)

Archetypal analysis approximates each data point as a convex combination of a set of archetypes, which are themselves convex combinations of the data points. The goal is to find the best such approximation for a given number of archetypes, so that the archetypes summarize the structure of the data in a lower-dimensional space.

The model is defined as follows:

\[\hat{X} = A B X = A Z\]
where:
  • \(X \in \mathbb{R}^{N \times D}\) is the data matrix, where \(N\) is the number of samples and \(D\) is the number of features.

  • \(A \in \mathbb{R}^{N \times K}\) is the coefficient matrix mapping each data point to a convex combination of archetypes, where \(K\) is the number of archetypes (n_archetypes).

  • \(B \in \mathbb{R}^{K \times N}\) is the coefficient matrix mapping each archetype to a convex combination of data points.

  • \(Z = B X \in \mathbb{R}^{K \times D}\) is the matrix containing the archetype coordinates.

The optimization problem minimizes the residual sum of squares (RSS), \(\text{RSS} = \| X - A Z \|_F^2\), subject to the constraints that \(A\) and \(B\) are non-negative and that their rows sum to 1, which ensures that both define convex combinations.
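As a minimal NumPy sketch of the model above (not partipy's implementation), the following builds random row-stochastic matrices A and B, derives Z and the reconstruction, and evaluates the RSS. The sizes N, D, K and the Dirichlet sampling are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 100, 5, 3                      # samples, features, archetypes (illustrative)

X = rng.normal(size=(N, D))              # data matrix, shape (N, D)

# Rows drawn from a Dirichlet distribution are non-negative and sum to 1,
# so each row is a valid set of convex-combination coefficients.
A = rng.dirichlet(np.ones(K), size=N)    # shape (N, K): data points -> archetypes
B = rng.dirichlet(np.ones(N), size=K)    # shape (K, N): archetypes -> data points

Z = B @ X                                # archetype coordinates, shape (K, D)
X_hat = A @ Z                            # reconstruction, shape (N, D)

# Residual sum of squares: squared Frobenius norm of the residual.
rss = np.linalg.norm(X - X_hat, ord="fro") ** 2
print(f"RSS = {rss:.3f}")
```

Random coefficients like these give a poor fit; the optimizer's job is to choose A and B so that this RSS becomes as small as possible under the same constraints.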

Parameters:
  • n_archetypes (int) – Number of archetypes to compute.

  • init ({"uniform", "furthest_sum", "plus_plus"}, default "plus_plus") –

    Initialization method for the archetypes. Options are:

    • "plus_plus": Archetypal++ initialization [MS24].

    • "furthest_sum": Utilizes the furthest sum algorithm [MH12].

    • "uniform": Random initialization.

    See partipy.schema.INIT_ALGS for all available options.

  • optim ({"regularized_nnls", "projected_gradients", "frank_wolfe"}, default "projected_gradients") –

    Optimization algorithm to use. The aliases "PCHA" (for "projected_gradients") and "FW" (for "frank_wolfe") are also accepted. Options are:

    • "projected_gradients": Projected gradient descent (also known as PCHA) [MH12].

    • "frank_wolfe": Frank-Wolfe algorithm (often abbreviated FW) [BKHT15].

    • "regularized_nnls": Regularized non-negative least squares [CB94].

    See partipy.schema.OPTIM_ALGS for all available options.

  • weight ({None, "bisquare", "huber"}, default None) –

    Weighting scheme for robust archetypal analysis, based on [EL11]. Options are:

    • None: No weighting (standard archetypal analysis).

    • "bisquare": Bisquare weighting for robust estimation.

    • "huber": Huber weighting for robust estimation.

    See partipy.schema.WEIGHT_ALGS for all available options.

  • max_iter (int, default 500) – Maximum number of iterations for the optimization algorithm.

  • rel_tol (float, default 0.0001) – Relative tolerance used to decide whether the optimization algorithm has converged.

  • early_stopping (bool, default True) – Whether to stop the optimization early if the relative change in RSS is below a certain threshold.

  • coreset_algorithm ({None, "standard", "lightweight_kmeans", "uniform"}, default None) –

    Coreset algorithm to use for data reduction, based on [MB19]. Options are:

    • None: No coreset is used.

    • "standard": Coreset construction for archetypal analysis [MB19]. Recommended option if data reduction is needed.

    • "lightweight_kmeans": Lightweight coreset for k-means clustering [LBK16].

    • "uniform": Coreset based on uniform sampling.

    See partipy.schema.CORESET_ALGS for all available options.

  • coreset_fraction (float, default 0.1) – Fraction of the data to use for the coreset. Only used if coreset_algorithm is not None and coreset_size is None.

  • coreset_size (int, default None) – Number of samples in the coreset. If None, it is set to n_samples * coreset_fraction; otherwise it overrides the coreset_fraction argument.

  • delta (float, default 0.0) – Parameter that relaxes the constraint that each row of B must define a convex combination of the data points. Must be in the interval [0, 1).

  • centering (bool, default True) – Whether to center the data by subtracting the feature means before optimization.

  • scaling (bool, default True) – Whether to scale the data globally by dividing by the global norm before optimization.

  • verbose (bool, default False) – Whether to display progress messages and additional execution details.

  • seed (int, default 42) – Random seed to use for reproducible results.

  • optim_kwargs (dict) – Additional keyword arguments passed to compute_A and compute_B.
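The preprocessing, coreset-size, and stopping parameters can be illustrated with a few hypothetical helpers. The names preprocess, resolve_coreset_size, and has_converged are not part of partipy. Using the Frobenius norm as the "global norm", centering before scaling, truncating the coreset size to an integer, and the exact form of the relative-change test are all assumptions made for illustration.

```python
import numpy as np

def preprocess(X, centering=True, scaling=True):
    """Hypothetical helper mirroring the `centering` and `scaling` options."""
    X = np.asarray(X, dtype=float)
    if centering:
        X = X - X.mean(axis=0)           # subtract the per-feature means
    if scaling:
        # Assumption: "global norm" is the Frobenius norm of the whole matrix.
        X = X / np.linalg.norm(X)
    return X

def resolve_coreset_size(n_samples, coreset_fraction=0.1, coreset_size=None):
    """Hypothetical helper: an explicit coreset_size wins over the fraction."""
    if coreset_size is not None:
        return coreset_size
    # Assumption: the product is truncated to an integer.
    return int(n_samples * coreset_fraction)

def has_converged(rss_prev, rss_new, rel_tol=1e-4):
    """Hypothetical relative-change test on two consecutive RSS values."""
    return abs(rss_prev - rss_new) <= rel_tol * rss_prev

X = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 1.0]])
P = preprocess(X)
print(P.mean(axis=0), np.linalg.norm(P))
print(resolve_coreset_size(200, coreset_fraction=0.25))
print(has_converged(100.0, 99.999))
```

Under these assumptions, max_iter bounds the number of iterations, while early_stopping with rel_tol can end the loop sooner once consecutive RSS values barely differ.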

Methods table

fit(X)

Computes the archetypes and the RSS from the data X and stores them in the corresponding attributes.

transform(X)

Computes the best convex approximation A of X by the archetypes Z.

Methods

AA.fit(X)

Computes the archetypes and the RSS from the data X and stores them in the corresponding attributes.

Parameters:

X (np.ndarray) – Data matrix with shape (n_samples, n_features).

Returns:

self (AA) – The instance of the AA class, with the computed archetypes and RSS stored as attributes.
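To make concrete what fit estimates, here is a self-contained NumPy sketch that alternates projected gradient steps on A and B. It is an illustration under stated assumptions, not partipy's code: fit_sketch and project_rows_to_simplex are hypothetical names, the archetypes are initialized at randomly chosen data points, each block uses a step size of one over its gradient's Lipschitz constant, and the stopping rule is a simple relative-change test.

```python
import numpy as np

def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto the probability simplex."""
    n, k = V.shape
    U = np.sort(V, axis=1)[:, ::-1]               # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    cond = U - css / np.arange(1, k + 1) > 0      # True on a prefix of each row
    rho = cond.sum(axis=1)                        # number of entries kept positive
    theta = css[np.arange(n), rho - 1] / rho      # per-row shift
    return np.maximum(V - theta[:, None], 0.0)

def fit_sketch(X, n_archetypes, max_iter=200, rel_tol=1e-4, seed=42):
    """Hypothetical alternating projected-gradient fit of X ~ A @ B @ X."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Assumed initialization: each archetype starts at a random data point.
    B = np.zeros((n_archetypes, N))
    B[np.arange(n_archetypes), rng.choice(N, n_archetypes, replace=False)] = 1.0
    A = rng.dirichlet(np.ones(n_archetypes), size=N)
    xxt_norm = np.linalg.norm(X @ X.T, 2)         # spectral norm, reused in every B-step
    rss_trace = [np.linalg.norm(X - A @ B @ X) ** 2]
    for _ in range(max_iter):
        Z = B @ X
        # A-step: gradient of ||X - A Z||_F^2 with respect to A, then projection.
        step_A = 1.0 / (2.0 * np.linalg.norm(Z @ Z.T, 2) + 1e-12)
        A = project_rows_to_simplex(A - step_A * 2.0 * (A @ Z - X) @ Z.T)
        # B-step: gradient of ||X - A B X||_F^2 with respect to B, then projection.
        step_B = 1.0 / (2.0 * np.linalg.norm(A.T @ A, 2) * xxt_norm + 1e-12)
        grad_B = 2.0 * A.T @ (A @ B @ X - X) @ X.T
        B = project_rows_to_simplex(B - step_B * grad_B)
        rss_trace.append(np.linalg.norm(X - A @ B @ X) ** 2)
        # Stop once the relative change in RSS falls below rel_tol.
        if abs(rss_trace[-2] - rss_trace[-1]) <= rel_tol * rss_trace[-2]:
            break
    return A, B, B @ X, rss_trace

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
A, B, Z, rss_trace = fit_sketch(X_demo, n_archetypes=3, max_iter=50)
print(Z.shape, rss_trace[0], rss_trace[-1])
```

The conservative step sizes keep the RSS from increasing between iterations, at the cost of slow progress; practical solvers typically adapt the step size instead.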

AA.transform(X)

Computes the best convex approximation A of X by the archetypes Z.

Parameters:

X (np.ndarray) – Data matrix with shape (n_samples, n_features).

Return type:

ndarray

Returns:

np.ndarray – The matrix A with shape (n_samples, n_archetypes).
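A comparable sketch for transform: with the archetypes Z held fixed, each row of A solves a least-squares problem over the probability simplex. The hypothetical transform_sketch below uses Frank-Wolfe steps with a closed-form line search, which is available because the objective is quadratic. partipy's own solver, selected via optim, may differ.

```python
import numpy as np

def transform_sketch(X, Z, n_iter=100):
    """Hypothetical Frank-Wolfe solver for A in X ~ A @ Z with Z held fixed."""
    N, K = X.shape[0], Z.shape[0]
    A = np.full((N, K), 1.0 / K)                  # start at the simplex barycenter
    rows = np.arange(N)
    for _ in range(n_iter):
        grad = 2.0 * (A @ Z - X) @ Z.T            # gradient of the RSS w.r.t. A
        S = np.zeros_like(A)
        S[rows, grad.argmin(axis=1)] = 1.0        # best simplex vertex for each row
        D = S - A                                 # Frank-Wolfe direction
        # Exact per-row line search for the quadratic objective, clipped to [0, 1]
        # so that every row stays a convex combination.
        num = -np.sum(grad * D, axis=1)
        den = 2.0 * np.sum((D @ Z) ** 2, axis=1) + 1e-12
        gamma = np.clip(num / den, 0.0, 1.0)
        A = A + gamma[:, None] * D
    return A

# Three archetypes forming a triangle in 2D, plus query points to express.
Z_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X_new = np.vstack([Z_demo, Z_demo.mean(axis=0)])
A_new = transform_sketch(X_new, Z_demo)
print(np.round(A_new, 3))
```

In this toy setup the first three query points coincide with the archetypes and the fourth is their centroid, so the recovered coefficients should be close to one-hot rows followed by a uniform row.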