Accessing, Saving, and Reloading Cached Results
ParTIpy keeps everything it computes inside the working AnnData object. This notebook walks through the cache layout, the accessor helpers that read those artefacts back, and the utilities that make the caches safe to round-trip through .h5ad files. Because each cache is keyed by an immutable ArchetypeConfig, the settings that produced an analysis travel together with its results.
Setup
We start by importing the dependencies we will use. The examples below assume you already configured which embedding to use via set_obsm and that you have an AnnData object named adata.
import anndata as ad
import numpy as np
import partipy as pt
import scanpy as sc
from partipy.datasets import load_hepatocyte_data_2
adata = load_hepatocyte_data_2()
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)
sc.pp.pca(adata, mask_var="highly_variable")
adata.layers["z_scaled"] = sc.pp.scale(adata.X, max_value=10, copy=True)
pt.compute_shuffled_pca(adata, mask_var="highly_variable")
pt.set_obsm(adata=adata, obsm_key="X_pca", n_dimensions=3)
adata
AnnData object with n_obs × n_vars = 1999 × 8354
obs: 'cell_type', 'zone', 'run_id', 'time_point', 'UMAP_X', 'UMAP_Y'
var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
uns: 'log1p', 'hvg', 'pca', 'AA_pca', 'AA_config'
obsm: 'X_pca'
varm: 'PCs'
layers: 'z_scaled'
Computing and Caching Results
The high-level ParTIpy routines both perform the computation and persist their outputs to adata.uns. Each cache entry is keyed by the full ArchetypeConfig, so repeated calls with the same settings reuse what was already computed.
ArchetypeConfig is a frozen Pydantic model that records every optimisation knob: the embedding key, dimensionality, solver choices, regularisation parameters, coreset options, and even any extra optim_kwargs. The getter functions reuse those field names, so you can pass any of them as keyword filters when retrieving cached entries.
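To see why a frozen configuration object makes a workable cache key, here is a minimal sketch that uses a standard-library frozen dataclass as a stand-in for ArchetypeConfig. The real class is a Pydantic model with many more fields; ToyConfig, toy_cache, and compute_or_reuse are hypothetical names for illustration only and are not part of ParTIpy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical stand-in for ArchetypeConfig with only three fields."""

    obsm_key: str
    n_archetypes: int
    delta: float = 0.0


toy_cache = {}  # plays the role of one store inside adata.uns
n_fits = 0  # counts how often the "expensive" step actually runs


def compute_or_reuse(config, force_recompute=False):
    """Return the cached payload for config, computing it only when needed."""
    global n_fits
    if force_recompute or config not in toy_cache:
        n_fits += 1
        toy_cache[config] = {"n_archetypes": config.n_archetypes, "fit_id": n_fits}
    return toy_cache[config]


first = compute_or_reuse(ToyConfig("X_pca", 3))
second = compute_or_reuse(ToyConfig("X_pca", 3))  # equal config, so a cache hit
third = compute_or_reuse(ToyConfig("X_pca", 3), force_recompute=True)
print(first is second, n_fits, len(toy_cache))
```

Because frozen instances are hashable and compare by field values, two separately constructed configs with identical settings resolve to the same cache entry, which is the behaviour the text describes for repeated calls.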
| Function | Cached location | Notes |
|---|---|---|
| compute_archetypes | adata.uns["AA_results"] | Stores weights ( |
| compute_selection_metrics | adata.uns["AA_selection_metrics"] | Evaluates multiple archetype counts; each fit is cached or reused via |
| compute_archetype_weights | adata.uns["AA_cell_weights"] | Saves the cell-by-archetype weight matrix. |
| compute_bootstrap_variance | adata.uns["AA_bootstrap"] | Aligns bootstrap archetypes to the cached reference fit, reusing or populating AA results as needed. |
| | | Stores the polytope t-ratio for the resolved AA configuration. |
| | | Persists permutation-based null distributions (t-ratio and RSS) for later inspection. |
A few practical tips:
- Use force_recompute=True on any compute function to refresh a cached entry.
- Keep track of the configuration you ran: filters passed to the getter utilities must uniquely identify one ArchetypeConfig.
- Once results are cached, the getter functions (get_aa_result, get_aa_metrics, get_aa_cell_weights, get_aa_bootstrap, summarize_aa_metrics) provide the recommended read-only interface.
- When you need to persist the caches, prefer the helper functions demonstrated later in this notebook (pt.write_h5ad / pt.read_h5ad).
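The requirement that filters identify exactly one configuration can be pictured with the following sketch. It is an assumption about the general idea, not ParTIpy's actual lookup code: ToyConfig and resolve_config are hypothetical, and the real getters may produce different error messages.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical stand-in for ArchetypeConfig."""

    n_archetypes: int
    optim: str = "projected_gradients"
    delta: float = 0.0


def resolve_config(store, **filters):
    """Return the single key in store whose fields match every filter."""
    matches = [
        cfg
        for cfg in store
        if all(asdict(cfg)[name] == value for name, value in filters.items())
    ]
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one match for {filters}, found {len(matches)}."
        )
    return matches[0]


store = {
    ToyConfig(n_archetypes=3): "fit A",
    ToyConfig(n_archetypes=3, delta=0.1): "fit B",
    ToyConfig(n_archetypes=4): "fit C",
}

unique = resolve_config(store, n_archetypes=4)
try:
    resolve_config(store, n_archetypes=3)  # two configs share n_archetypes=3
    ambiguous = False
except ValueError:
    ambiguous = True
narrowed = resolve_config(store, n_archetypes=3, delta=0.1)
```

In this toy store, n_archetypes=3 alone is ambiguous, while adding delta=0.1 narrows the match to a single entry.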
The following cell runs a compact example that populates each cache so you can experiment with the accessors in later sections.
# Run archetypal analysis for three archetypes and cache the full result payload
pt.compute_archetypes(
adata=adata,
n_archetypes=3,
save_to_anndata=True,
archetypes_only=False,
)
# Evaluate a small grid of archetype counts; metrics are saved in adata.uns["AA_selection_metrics"]
pt.compute_selection_metrics(
adata=adata,
n_archetypes_list=[2, 3, 4],
)
# Cache cell weights and bootstrap variance for later inspection
pt.compute_archetype_weights(adata, result_filters={"n_archetypes": 3})
pt.compute_bootstrap_variance(
adata=adata,
n_bootstrap=5,
n_archetypes_list=[3],
save_to_anndata=True,
)
Applied length scale is 3.12.
Save Cached Results to .h5ad
ParTIpy caches use ArchetypeConfig objects as keys. HDF5 only accepts string keys, so saving via adata.write_h5ad fails once AA artefacts are present. Call pt.write_h5ad to temporarily serialize the cache keys before writing. Loading with pt.read_h5ad restores them automatically.
Under the hood each ArchetypeConfig is converted into a JSON string prefixed with ArchetypeConfig::, which keeps the file standards-compliant without losing any information. During the read step those strings are decoded back into the original frozen objects, so the cached entries continue to work with the accessors and accept the same filter arguments as they did before saving.
pt.write_h5ad(adata, "analysis.h5ad") # serializes ArchetypeConfig keys before writing
adata = pt.read_h5ad("analysis.h5ad") # restores the keys after loading
adata
AnnData object with n_obs × n_vars = 1999 × 8354
obs: 'cell_type', 'zone', 'run_id', 'time_point', 'UMAP_X', 'UMAP_Y'
var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
uns: 'AA_bootstrap', 'AA_cell_weights', 'AA_config', 'AA_pca', 'AA_results', 'AA_selection_metrics', 'hvg', 'log1p', 'pca'
obsm: 'X_pca'
varm: 'PCs'
layers: 'z_scaled'
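The key conversion described above can be sketched roughly as follows. Only the ArchetypeConfig:: prefix and the use of JSON come from the text; ToyConfig, encode_key, and decode_key are hypothetical stand-ins, and the exact JSON layout ParTIpy writes is not shown here.

```python
import json
from dataclasses import asdict, dataclass

PREFIX = "ArchetypeConfig::"  # prefix named in the text above


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical stand-in for ArchetypeConfig with three fields."""

    obsm_key: str
    n_dimensions: tuple
    n_archetypes: int


def encode_key(config):
    """Turn a config object into a plain string that HDF5 can store as a key."""
    return PREFIX + json.dumps(asdict(config), sort_keys=True)


def decode_key(key):
    """Rebuild the frozen config from its prefixed JSON string."""
    fields = json.loads(key[len(PREFIX):])
    fields["n_dimensions"] = tuple(fields["n_dimensions"])  # JSON has no tuples
    return ToyConfig(**fields)


original = ToyConfig("X_pca", (0, 1, 2), 3)
encoded = encode_key(original)
restored = decode_key(encoded)
print(encoded)
```

The restored object compares equal to the original and hashes identically, so it addresses the same cache entry after a round trip.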
Loading files created outside ParTIpy
If you load a file through anndata.read_h5ad or scanpy.read_h5ad, the cache keys are still serialized strings. Call pt.ensure_archetype_config_keys(adata) once to convert them back into ArchetypeConfig objects so that get_aa_result and related utilities continue to work. The function is idempotent, so it is safe to run even if the caches already use ArchetypeConfig keys. Advanced workflows can reach for pt.serialize_archetype_caches when they need to manage the round-trip manually (for example, when writing to a custom storage backend) and then call pt.ensure_archetype_config_keys afterwards to rehydrate the keys.
pt.write_h5ad(adata, "analysis.h5ad") # serializes ArchetypeConfig keys before writing
adata = sc.read_h5ad("analysis.h5ad")  # plain reader: cache keys are still serialized strings
pt.ensure_archetype_config_keys(adata)  # restores the ArchetypeConfig keys
pt.get_aa_result(adata, n_archetypes=4)["Z"].shape
(4, 3)
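Idempotence here means that restoring keys twice gives the same result as restoring them once. The sketch below illustrates that property on a plain dictionary; rehydrate_keys is a hypothetical helper, and a sorted tuple of fields stands in for the real ArchetypeConfig object.

```python
import json

PREFIX = "ArchetypeConfig::"


def rehydrate_keys(store):
    """Replace prefixed string keys with hashable tuples; leave other keys alone."""
    restored = {}
    for key, value in store.items():
        if isinstance(key, str) and key.startswith(PREFIX):
            fields = json.loads(key[len(PREFIX):])
            key = tuple(sorted(fields.items()))  # toy substitute for a config object
        restored[key] = value
    return restored


serialized = {PREFIX + json.dumps({"n_archetypes": 3}): "payload"}
once = rehydrate_keys(serialized)
twice = rehydrate_keys(once)  # already restored keys pass through unchanged
print(once == twice)
```

Keys that are already restored are not strings with the prefix, so a second pass leaves them untouched.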
Retrieving Cached AA Results
get_aa_result returns the payload that was stored by compute_archetypes. You can optionally pass filters to disambiguate between multiple cached configurations. Filters accept any field of ArchetypeConfig, for example n_archetypes, delta, or optim.
# Retrieve the cached AA result for four archetypes and inspect the archetype coordinates
result_payload = pt.get_aa_result(adata, n_archetypes=4)
A = result_payload["A"]
B = result_payload["B"]
Z = result_payload["Z"]
print("Archetypes shape:", Z.shape)
print(Z)
Archetypes shape: (4, 3)
[[-4.1829314 4.8940554 -0.42909184]
[ 3.4212031 -2.0040917 -5.227825 ]
[-3.1794775 -3.8219402 1.0898299 ]
[ 7.8230276 0.9535061 3.1575007 ]]
Accessing Selection Metrics
Selection diagnostics (variance explained, RSS, etc.) are stored per configuration in adata.uns["AA_selection_metrics"]. Use get_aa_metrics to retrieve a specific table, or call pt.summarize_aa_metrics to concatenate the entries that share the same optimization settings (aside from the number of archetypes).
metrics_df = pt.summarize_aa_metrics(adata)
metrics_df
| | k | n_archetypes | n_restarts | seed | varexpl | IC | RSS |
|---|---|---|---|---|---|---|---|
| 0 | 2 | 2 | 5 | 42 | 0.505511 | 4112.441228 | 2965.446045 |
| 1 | 3 | 3 | 5 | 42 | 0.780190 | 4045.360678 | 1318.203125 |
| 2 | 4 | 4 | 5 | 42 | 0.933783 | 4357.642608 | 397.105713 |
Accessing Bootstrap Results
Bootstrap runs are stored in adata.uns["AA_bootstrap"] as tidy DataFrames. Use get_aa_bootstrap to read them back for plotting or downstream analysis. As with other getters, filters ensure the correct configuration is selected.
bootstrap_df = pt.get_aa_bootstrap(adata, n_archetypes=3)
bootstrap_df.head()
| | X_pca_0 | X_pca_1 | X_pca_2 | archetype | iter | reference | mean_variance | variance_per_archetype |
|---|---|---|---|---|---|---|---|---|
| 0 | -4.239301 | 4.869498 | -0.415050 | 0 | 1 | False | 0.026822 | 0.028961 |
| 1 | 7.639462 | 0.079809 | -0.067928 | 1 | 1 | False | 0.026822 | 0.017955 |
| 2 | -2.752828 | -3.950900 | 0.186937 | 2 | 1 | False | 0.026822 | 0.033550 |
| 0 | -3.541122 | 4.984828 | -0.393629 | 0 | 2 | False | 0.026822 | 0.028961 |
| 1 | 7.669269 | -0.178665 | -0.022822 | 1 | 2 | False | 0.026822 | 0.017955 |
Accessing Cached Cell Weights
All cell weight matrices computed via compute_archetype_weights live in adata.uns["AA_cell_weights"]. To retrieve one, call get_aa_cell_weights. Setting return_config=True also returns the ArchetypeConfig key that was matched.
config, weights = pt.get_aa_cell_weights(adata, return_config=True)
print(config)
print("Weights shape:", weights.shape)
obsm_key='X_pca' n_dimensions=(0, 1, 2) n_archetypes=3 init='plus_plus' optim='projected_gradients' weight=None max_iter=500 rel_tol=0.0001 early_stopping=True coreset_algorithm=None coreset_fraction=0.1 coreset_size=None delta=0.0 seed=42 optim_kwargs=()
Weights shape: (1999, 3)
Tips for Working with Cached Results
- Every getter raises a descriptive ValueError if the requested configuration is missing or the cache is empty. Handle these exceptions to provide actionable messages in your pipelines.
- To refresh a cached entry, rerun the corresponding compute function with force_recompute=True.
- When you plan to keep several configurations, pass explicit filters (for example, delta=0.1, optim="frank_wolfe") to the getter utilities to avoid ambiguity.
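One way to handle the ValueError in a pipeline is a small wrapper around whichever getter you call. The sketch below is only a suggestion: get_or_none and fake_getter are hypothetical, and the stub exists so the example runs without ParTIpy or a populated AnnData object.

```python
def get_or_none(getter, adata, **filters):
    """Call a cache getter and return None with a hint when the lookup fails."""
    try:
        return getter(adata, **filters)
    except ValueError as err:
        print(f"No unique cached entry for {filters}: {err}")
        return None


def fake_getter(adata, **filters):
    """Stub that mimics a getter with a single cached entry."""
    if filters.get("n_archetypes") != 3:
        raise ValueError("no cached configuration matches the filters")
    return {"Z": [[0.0, 1.0]]}


hit = get_or_none(fake_getter, adata=None, n_archetypes=3)
miss = get_or_none(fake_getter, adata=None, n_archetypes=7)
```

In a real session you would pass one of the getters used in this notebook, such as pt.get_aa_result, in place of the stub and decide whether to recompute or stop when None comes back.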
With these helpers you can treat the AnnData object as the single source of truth for all AA artefacts, keeping reproducible analyses compact and self-contained.