petitRADTRANS.sbi.calibration#

Calibration and posterior-predictive reporting for amortized SBI.

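This module implements simulation-based calibration (SBC) and posterior-predictive checks, together with local sensitivity diagnostics and summary/serialization helpers. SBC runs inference on held-out simulated observations whose true parameters are known and checks whether the resulting posteriors are statistically consistent with those parameters.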

Classes#

`CoverageLevelReport`	Coverage summary for one nominal interval level.
`SimulationBasedCalibrationReport`	Rank-based SBC summary over a held-out set of observations.
`PosteriorPredictiveReport`	Aggregate posterior-predictive summaries for held-out observations.
`LocalSensitivityPointReport`	Local linear-identifiability summary around one representative point.
`LocalSensitivityReport`	Aggregate local information-content diagnostics for one observation.

Functions#

`generate_local_sensitivity_report`(task, ...)	Diagnose local physical identifiability around representative posterior points.
`local_sensitivity_report_to_payload`(→ dict[str, Any])	Convert a local sensitivity report into a JSON-serializable payload.
`central_credible_interval`(→ tuple[numpy.ndarray, ...)	Compute a central credible interval from posterior samples.
`empirical_interval_coverage`(→ float)	Measure empirical central-interval coverage over a batch of posteriors.
`simulation_based_calibration_report`(samples, truths, levels, metadata)	Build an SBC report from posterior draws and matched ground truths.
`generate_sbc_report`(posterior, dataset_reader, ..., max_cases, seed, data_parallel)	Run SBC over a dataset reader using normalized observation batches.
`posterior_predictive_report_from_samples`(...)	Summarize posterior-predictive draws for one observed system.
`generate_posterior_predictive_report`(...)	Generate posterior-predictive summaries for a held-out dataset split.
`summarize_training_diagnostics`(→ dict[str, Any])	Summarize per-epoch stability/inverse-error diagnostics of a trained flow.
`summarize_posterior_samples`(→ dict[str, dict[str, ...)	Per-parameter posterior p16/median/p84, with optional truth values.
`summarize_posterior_contraction`(→ dict[str, dict[str, ...)	Per-parameter posterior contraction relative to the prior.
`summarize_predictive_report`(→ dict[str, Any])	Serialize a posterior-predictive report's scalar metrics to JSON-safe dicts.
`serialize_ood_diagnostic`(→ dict[str, Any] | None)	Serialize an OOD diagnostic to a JSON-safe dict (or None).

Module Contents#

class petitRADTRANS.sbi.calibration.CoverageLevelReport#

Coverage summary for one nominal interval level.

Attributes#

level:: Nominal central credible-interval level.
empirical_coverage:: Measured coverage fraction over the evaluated cases.
absolute_error:: Absolute difference between empirical and nominal coverage.

level: float#

empirical_coverage: float#

absolute_error: float#

class petitRADTRANS.sbi.calibration.SimulationBasedCalibrationReport#

Rank-based SBC summary over a held-out set of observations. For each held-out case and parameter dimension, the rank is the number of posterior samples that fall below the ground-truth parameter value. If the posterior is calibrated, these ranks are uniformly distributed.

Attributes#

ranks:: Integer rank of the ground-truth parameter within posterior samples for each held-out case and parameter dimension.
rank_histogram_counts:: Per-parameter SBC histogram counts.
posterior_means:: Posterior mean for each held-out case.
truths:: Ground-truth parameters paired with each posterior sample set.
coverages:: Coverage summaries for the requested nominal interval levels.
mean_rank:: Mean empirical rank for each parameter dimension.
normalized_mean_rank_error:: Absolute mean-rank error normalized by the expected average rank.
metadata:: Auxiliary run metadata such as split name and number of posterior draws.

ranks: numpy.ndarray#

rank_histogram_counts: numpy.ndarray#

posterior_means: numpy.ndarray#

truths: numpy.ndarray#

coverages: tuple[CoverageLevelReport, ...]#

mean_rank: numpy.ndarray#

normalized_mean_rank_error: numpy.ndarray#

metadata: Mapping[str, Any]#
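As an illustration of the rank statistics stored in this report, the following NumPy sketch computes ranks, a mean rank, a normalized mean-rank error, and histogram counts for a toy problem. It is not the module's implementation: the helper name `sbc_ranks`, the toy Gaussian setup, and the choice of `n_draws / 2` as the expected average rank are assumptions made for this example.

```python
import numpy as np


def sbc_ranks(samples: np.ndarray, truths: np.ndarray) -> np.ndarray:
    """Count, per case and dimension, the posterior draws below the truth.

    samples has shape (n_cases, n_draws, n_dim); truths has shape (n_cases, n_dim).
    """
    return np.sum(samples < truths[:, None, :], axis=1)


rng = np.random.default_rng(0)
n_cases, n_draws, n_dim = 500, 100, 2

# Toy calibrated setup (assumption): truths and posterior draws share one distribution.
truths = rng.normal(size=(n_cases, n_dim))
samples = rng.normal(size=(n_cases, n_draws, n_dim))

ranks = sbc_ranks(samples, truths)            # integers in [0, n_draws]
mean_rank = ranks.mean(axis=0)
expected_rank = n_draws / 2.0                 # mean of a uniform rank on 0..n_draws
normalized_error = np.abs(mean_rank - expected_rank) / expected_rank

# One histogram per parameter dimension, one bin per possible rank value.
hist_counts = np.stack(
    [np.bincount(ranks[:, d], minlength=n_draws + 1) for d in range(n_dim)]
)
```

For a calibrated posterior the histogram counts should be roughly flat and the normalized error close to zero; a U-shaped or peaked histogram would point to an over- or under-confident posterior.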

class petitRADTRANS.sbi.calibration.PosteriorPredictiveReport#

Aggregate posterior-predictive summaries for held-out observations.

Attributes#

observed_values:: Observed values kept on the original observation scale.
predictive_mean:: Posterior-predictive mean curves or vectors.
predictive_std:: Posterior-predictive standard deviations.
interval_lower, interval_upper:: Central predictive interval bounds for the requested level.
interval_coverage:: Per-dataset fraction of observed points covered by the predictive interval.
mean_absolute_error:: Mean absolute deviation between predictive mean and observed values.
metadata:: Auxiliary metadata such as split name, number of cases, and parameter space used during predictive generation.

observed_values: Mapping[str, numpy.ndarray]#

predictive_mean: Mapping[str, numpy.ndarray]#

predictive_std: Mapping[str, numpy.ndarray]#

interval_lower: Mapping[str, numpy.ndarray]#

interval_upper: Mapping[str, numpy.ndarray]#

interval_coverage: Mapping[str, float]#

mean_absolute_error: Mapping[str, float]#

mean_absolute_error_sigma: Mapping[str, float]#

median_interval_width_over_uncertainty: Mapping[str, float]#

crps: Mapping[str, float]#

metadata: Mapping[str, Any]#

class petitRADTRANS.sbi.calibration.LocalSensitivityPointReport#

Local linear-identifiability summary around one representative point.

Attributes#

label:: Short human-readable label for the representative point.
parameters:: Physical parameter vector at which the Jacobian was evaluated.
finite_difference_steps:: Per-parameter finite-difference step sizes used during Jacobian construction.
finite_difference_schemes:: Per-parameter scheme labels such as 'central' or 'forward'.
whitened_jacobian:: Observation Jacobian divided by observational uncertainty, with shape (n_observation_values, n_parameters).
singular_values:: Singular values of the whitened Jacobian.
effective_rank:: Number of singular values larger than the configured relative cutoff.
condition_number:: Condition number inferred from the singular spectrum.
fisher_matrix:: Approximate Fisher information matrix J^T J, where J is the whitened Jacobian.
fisher_covariance:: Damped pseudo-inverse of the Fisher matrix.
fisher_correlation:: Correlation matrix derived from the approximate Fisher covariance.
parameter_sensitivity_norm:: Per-parameter square-root Fisher diagonal.
local_sigma:: Approximate local standard deviation from the Fisher covariance.
metadata:: Auxiliary diagnostics such as the ridge term and failed columns.

label: str#

parameters: numpy.ndarray#

finite_difference_steps: numpy.ndarray#

finite_difference_schemes: tuple[str, ...]#

whitened_jacobian: numpy.ndarray#

singular_values: numpy.ndarray#

effective_rank: int#

condition_number: float#

fisher_matrix: numpy.ndarray#

fisher_covariance: numpy.ndarray#

fisher_correlation: numpy.ndarray#

parameter_sensitivity_norm: numpy.ndarray#

local_sigma: numpy.ndarray#

metadata: Mapping[str, Any]#
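The quantities listed for this class can be related to one another with a short NumPy sketch. This is only an illustration under stated assumptions: the random Jacobian, the uniform observational uncertainty, the relative cutoff `rel_tol`, and the ridge-regularized inverse used as a stand-in for the "damped pseudo-inverse" are all hypothetical choices, not the module's actual settings.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_par = 40, 3

# Hypothetical raw Jacobian d(model)/d(parameter) and per-point uncertainties.
jacobian = rng.normal(size=(n_obs, n_par))
sigma_obs = np.full(n_obs, 0.5)

# Whitening: divide each observation row by its uncertainty.
whitened = jacobian / sigma_obs[:, None]              # (n_obs, n_par)
singular_values = np.linalg.svd(whitened, compute_uv=False)  # descending order

rel_tol = 1e-3                                         # assumed relative cutoff
effective_rank = int(np.sum(singular_values > rel_tol * singular_values[0]))
condition_number = float(singular_values[0] / singular_values[-1])

fisher = whitened.T @ whitened                         # J^T J
ridge = 1e-12 * np.trace(fisher) / n_par               # assumed damping term
fisher_cov = np.linalg.inv(fisher + ridge * np.eye(n_par))

local_sigma = np.sqrt(np.diag(fisher_cov))             # local standard deviations
fisher_corr = fisher_cov / np.outer(local_sigma, local_sigma)
sensitivity_norm = np.sqrt(np.diag(fisher))            # square-root Fisher diagonal
```

A small effective rank or a very large condition number in such a calculation indicates parameter combinations that the observation barely constrains near the evaluated point.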

class petitRADTRANS.sbi.calibration.LocalSensitivityReport#

Aggregate local information-content diagnostics for one observation.

Attributes#

parameter_names:: Parameter ordering used throughout the report.
posterior_mean:: Posterior mean in physical parameter space.
posterior_std:: Posterior standard deviation in physical parameter space.
posterior_median:: Posterior median in physical parameter space.
posterior_iqr:: Posterior interquartile range in physical parameter space.
representative_points:: Local sensitivity summaries evaluated at representative posterior points such as the posterior mean and highest-density sample.
aggregate_local_sigma:: Median local Fisher sigma across representative points.
aggregate_parameter_sensitivity_norm:: Median parameter sensitivity norm across representative points.
posterior_to_local_sigma_ratio:: Ratio between posterior standard deviation and local Fisher sigma.
parameter_diagnostics:: Per-parameter heuristic summary separating weak data constraints from broader-than-local posterior structure.
metadata:: Auxiliary metadata such as quantile levels and observation slices.

parameter_names: tuple[str, ...]#

posterior_mean: numpy.ndarray#

posterior_std: numpy.ndarray#

posterior_median: numpy.ndarray#

posterior_iqr: numpy.ndarray#

representative_points: tuple[LocalSensitivityPointReport, ...]#

aggregate_local_sigma: numpy.ndarray#

aggregate_parameter_sensitivity_norm: numpy.ndarray#

posterior_to_local_sigma_ratio: numpy.ndarray#

parameter_diagnostics: Mapping[str, Mapping[str, Any]]#

metadata: Mapping[str, Any]#

petitRADTRANS.sbi.calibration.generate_local_sensitivity_report(task: petitRADTRANS.sbi.task.SBITask, posterior_samples: Any, observation_blocks: Sequence[Any], posterior_log_probabilities: Any = None, parameter_space: str = 'physical', simulator: petitRADTRANS.sbi.simulator.RuntimeSimulator | None = None, quantile_levels: Sequence[float] = (0.1, 0.5, 0.9), finite_difference_relative_step: float = 0.001, finite_difference_std_fraction: float = 0.1, finite_difference_absolute_floor: float = 1e-05, max_step_reduction_attempts: int = 6, svd_relative_tolerance: float = 0.001, posterior_underexploited_ratio_threshold: float = 1.5, weak_sensitivity_fraction_threshold: float = 0.15, seed: int | None = None) → LocalSensitivityReport#

Diagnose local physical identifiability around representative posterior points.

The report evaluates a deterministic simulator Jacobian at representative posterior points, whitens it by observational uncertainty, and derives a Fisher-style local covariance approximation for each point.
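The Jacobian step can be pictured with a central finite-difference sketch. Everything here is illustrative: `central_difference_jacobian` and `toy_forward` are hypothetical names, the toy model stands in for the real simulator, and the step rule (a relative step with an absolute floor) is only one plausible way of combining the `finite_difference_relative_step` and `finite_difference_absolute_floor` arguments, not a description of what the function does internally.

```python
import numpy as np


def central_difference_jacobian(forward, theta, steps):
    """Approximate d forward / d theta column by column with central differences."""
    theta = np.asarray(theta, dtype=float)
    columns = []
    for i, h in enumerate(steps):
        shift = np.zeros_like(theta)
        shift[i] = h
        columns.append((forward(theta + shift) - forward(theta - shift)) / (2.0 * h))
    return np.stack(columns, axis=1)    # (n_observation_values, n_parameters)


# Hypothetical deterministic forward model standing in for the simulator.
x = np.linspace(0.0, 1.0, 5)


def toy_forward(theta):
    return theta[0] * x + theta[1] * x ** 2


theta0 = np.array([2.0, -1.0])
# Assumed step rule: relative step with an absolute floor.
steps = np.maximum(1e-3 * np.abs(theta0), 1e-5)
jac = central_difference_jacobian(toy_forward, theta0, steps)
```

Because the toy model is linear in its parameters, the columns of `jac` reproduce `x` and `x ** 2` up to rounding, which makes the sketch easy to check by hand.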

petitRADTRANS.sbi.calibration.local_sensitivity_report_to_payload(report: LocalSensitivityReport) → dict[str, Any]#: Convert a local sensitivity report into a JSON-serializable payload.

petitRADTRANS.sbi.calibration.central_credible_interval(samples: Any, level: float = 0.8) → tuple[numpy.ndarray, numpy.ndarray]#

Compute a central credible interval from posterior samples.

Parameters#

samples:: Posterior samples for one observation, with shape (n_samples,) or (n_samples, n_dim).
level:: Nominal central interval level between 0 and 1.

Returns#

tuple[np.ndarray, np.ndarray]: Lower and upper quantile bounds for each inferred parameter dimension. The returned arrays are one-dimensional even for scalar posteriors.
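A central interval at level `level` leaves `(1 - level) / 2` of the probability mass in each tail. The sketch below shows that idea with `np.quantile`; the helper name `central_interval` is hypothetical and the code is an approximation of the documented behavior, not the function's source.

```python
import numpy as np


def central_interval(samples, level=0.8):
    """Lower/upper quantile bounds of the central `level` interval."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 1:
        samples = samples[:, None]       # promote scalar posteriors to (n_samples, 1)
    tail = (1.0 - level) / 2.0
    lower = np.quantile(samples, tail, axis=0)
    upper = np.quantile(samples, 1.0 - tail, axis=0)
    return lower, upper


draws = np.linspace(0.0, 1.0, 101)       # evenly spaced stand-in for posterior draws
lower, upper = central_interval(draws, level=0.8)
```

With these evenly spaced draws the 80% interval runs from about 0.1 to about 0.9, and both bounds come back as one-dimensional arrays of length 1.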

petitRADTRANS.sbi.calibration.empirical_interval_coverage(samples: Any, truths: Any, level: float = 0.8) → float#

Measure empirical central-interval coverage over a batch of posteriors.

Parameters#

samples:: Posterior draws for one or more held-out cases. Expected shape is (n_cases, n_draws, n_dim) or a shape that can be promoted to it.
truths:: Ground-truth parameters matched to each held-out case.
level:: Nominal central interval level between 0 and 1.

Returns#

float: Fraction of truth values falling inside the requested central interval.
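The coverage measurement can be sketched as follows. The helper `interval_coverage` is a hypothetical name, and averaging the inside/outside indicator over both cases and parameter dimensions is an assumption about how the single returned float is formed.

```python
import numpy as np


def interval_coverage(samples, truths, level=0.8):
    """Fraction of truth values inside the central `level` interval."""
    samples = np.asarray(samples, dtype=float)    # (n_cases, n_draws, n_dim)
    truths = np.asarray(truths, dtype=float)      # (n_cases, n_dim)
    tail = (1.0 - level) / 2.0
    lower = np.quantile(samples, tail, axis=1)    # (n_cases, n_dim)
    upper = np.quantile(samples, 1.0 - tail, axis=1)
    inside = (truths >= lower) & (truths <= upper)
    return float(inside.mean())


rng = np.random.default_rng(2)
# Toy calibrated setup (assumption): truths and draws share one distribution.
truths = rng.normal(size=(2000, 1))
samples = rng.normal(size=(2000, 200, 1))

level = 0.8
coverage = interval_coverage(samples, truths, level)
absolute_error = abs(coverage - level)
```

The last line mirrors the `absolute_error` field of `CoverageLevelReport`: for a calibrated posterior the empirical coverage should sit close to the nominal level.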

petitRADTRANS.sbi.calibration.simulation_based_calibration_report(samples: Any, truths: Any, levels: Sequence[float] = (0.5, 0.8, 0.95), metadata: Mapping[str, Any] | None = None) → SimulationBasedCalibrationReport#

Build an SBC report from posterior draws and matched ground truths.

Parameters#

samples:: Posterior draws for held-out cases with shape (n_cases, n_draws, n_dim) or a promotable equivalent.
truths:: Ground-truth parameters paired with each held-out case.
levels:: Coverage levels to summarize alongside the rank statistics.
metadata:: Optional metadata attached to the returned report.

Returns#

SimulationBasedCalibrationReport: SBC summary containing ranks, histograms, posterior means, and coverage summaries.

petitRADTRANS.sbi.calibration.generate_sbc_report(posterior: Any, dataset_reader: petitRADTRANS.sbi.dataset.NormalizedObservationDatasetReader, split: petitRADTRANS.sbi.dataset.DatasetSplit = DatasetSplit.TEST, n_posterior_samples: int = 256, batch_size: int = 32, parameter_space: str | None = None, levels: Sequence[float] = (0.5, 0.8, 0.95), max_cases: int | None = None, seed: int | None = None, data_parallel: bool | None = None) → SimulationBasedCalibrationReport#

Run SBC over a dataset reader using normalized observation batches.

Parameters#

posterior:: Trained posterior estimator exposing encode_observation and sample_posterior.
dataset_reader:: Reader yielding normalized held-out observations and matched parameter values.
split:: Dataset split used for the SBC evaluation.
n_posterior_samples:: Number of posterior draws generated per held-out case.
batch_size:: Reader batch size used during report generation.
parameter_space:: Optional parameter space override. Defaults to the posterior’s own configured parameter space.
levels:: Coverage levels summarized in the returned report.
max_cases:: Optional cap on the number of held-out observations evaluated.
seed:: Optional base seed used to generate reproducible posterior draws.

Returns#

SimulationBasedCalibrationReport: SBC summary computed from the requested dataset split.

petitRADTRANS.sbi.calibration.posterior_predictive_report_from_samples(task: petitRADTRANS.sbi.task.SBITask, posterior_samples: Any, observation_blocks: Sequence[Any], parameter_space: str = 'physical', interval_level: float = 0.9, simulator: petitRADTRANS.sbi.simulator.RuntimeSimulator | None = None, seed: int | None = None, n_predictive_forward_model_samples: int | None = None) → PosteriorPredictiveReport#

Summarize posterior-predictive draws for one observed system.

Parameters#

task:: SBI task used to map posterior samples back into physical parameter space and construct the simulator.
posterior_samples:: Posterior draws in the specified parameter space.
observation_blocks:: Original user-facing observation blocks. Reported observations remain on this raw scale.
parameter_space:: Coordinate system in which posterior_samples are expressed.
interval_level:: Central predictive interval level used for reporting.
simulator:: Optional simulator override. When omitted, a new runtime-backed simulator is constructed from task.
seed:: Optional seed used when instantiating the simulator.
n_predictive_forward_model_samples:: Number of posterior draws passed through the forward model to generate the predictive distribution. When None, all posterior_samples are used. Setting this to a smaller value (e.g. 50–200) reduces the number of expensive petitRADTRANS calls proportionally, usually with little effect on the predictive summary.

Returns#

PosteriorPredictiveReport: Posterior-predictive summary for the supplied observation.

Notes#

The observations in the returned report remain on the original observation scale even when the posterior itself was trained on normalized inputs.
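The summaries held by a `PosteriorPredictiveReport` can be illustrated with a toy example. This is a sketch under several assumptions: `toy_forward` is a hypothetical cheap stand-in for a petitRADTRANS call, the Gaussian posterior draws are synthetic, and adding observational noise to the predictive draws is a modelling choice made here, not a statement about what the module does.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 50)


def toy_forward(theta):
    """Hypothetical cheap forward model standing in for a petitRADTRANS call."""
    return theta[0] + theta[1] * x


posterior_samples = rng.normal(loc=[1.0, 2.0], scale=0.05, size=(1000, 2))
sigma_obs = 0.1
observed = toy_forward(np.array([1.0, 2.0])) + rng.normal(scale=sigma_obs, size=x.size)

# Subsample the draws sent through the forward model to limit its cost.
n_forward = 100
idx = rng.choice(posterior_samples.shape[0], size=n_forward, replace=False)
predictive = np.stack([toy_forward(theta) for theta in posterior_samples[idx]])
predictive += rng.normal(scale=sigma_obs, size=predictive.shape)  # assumed noise model

level = 0.9
tail = (1.0 - level) / 2.0
predictive_mean = predictive.mean(axis=0)
predictive_std = predictive.std(axis=0)
interval_lower = np.quantile(predictive, tail, axis=0)
interval_upper = np.quantile(predictive, 1.0 - tail, axis=0)

inside = (observed >= interval_lower) & (observed <= interval_upper)
interval_coverage = float(inside.mean())
mean_absolute_error = float(np.mean(np.abs(predictive_mean - observed)))
```

Only 100 of the 1000 posterior draws reach the forward model here, which is the role `n_predictive_forward_model_samples` plays in the real function.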

petitRADTRANS.sbi.calibration.generate_posterior_predictive_report(task: petitRADTRANS.sbi.task.SBITask, posterior: Any, dataset_reader: petitRADTRANS.sbi.dataset.NormalizedObservationDatasetReader, split: petitRADTRANS.sbi.dataset.DatasetSplit = DatasetSplit.TEST, n_posterior_samples: int = 256, interval_level: float = 0.9, max_cases: int | None = None, seed: int | None = None, simulator: petitRADTRANS.sbi.simulator.RuntimeSimulator | None = None, n_predictive_forward_model_samples: int | None = None, checkpoint_directory: str | pathlib.Path | None = None, data_parallel: bool | None = None) → PosteriorPredictiveReport#

Generate posterior-predictive summaries for a held-out dataset split.

Parameters#

task:: SBI task defining parameter transforms and the simulator configuration.
posterior:: Trained posterior estimator used to sample held-out predictive draws.
dataset_reader:: Reader providing normalized observations and preprocessing metadata.
split:: Held-out split used for the predictive report.
n_posterior_samples:: Number of posterior draws generated per held-out observation.
interval_level:: Central predictive interval level reported for each dataset.
max_cases:: Optional cap on the number of held-out observations evaluated.
seed:: Optional base seed used to make predictive sampling reproducible.
simulator:: Optional runtime simulator override.
n_predictive_forward_model_samples:: Number of posterior draws passed through the forward model per held-out case. When None, all n_posterior_samples draws are forwarded. Subsampling here is the primary lever for keeping the total number of petitRADTRANS calls manageable when evaluating many held-out cases.
checkpoint_directory:: Optional directory for per-case checkpoints. When provided, each completed case is written to disk as a compressed .npz file and skipped on resume. This makes the expensive forward-model loop restartable after interruption.

Returns#

PosteriorPredictiveReport: Aggregate posterior-predictive summary over the requested split.

Notes#

Observations are normalized internally for posterior encoding but compared on the original observation scale in the returned report.
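The restartable per-case loop enabled by `checkpoint_directory` follows a common pattern, sketched below. The function `run_cases_with_checkpoints`, the file naming scheme, and the stored array key are hypothetical; only the use of compressed `.npz` files and the skip-on-resume behavior come from the description above.

```python
import tempfile
from pathlib import Path

import numpy as np


def run_cases_with_checkpoints(case_ids, compute_case, checkpoint_directory):
    """Compute each case once, reusing any checkpoint already on disk."""
    directory = Path(checkpoint_directory)
    directory.mkdir(parents=True, exist_ok=True)
    results, n_computed = {}, 0
    for case_id in case_ids:
        path = directory / f"case_{case_id:06d}.npz"    # hypothetical file naming
        if path.exists():
            # Completed earlier: load the stored result and skip the computation.
            with np.load(path) as stored:
                results[case_id] = stored["predictive_mean"]
            continue
        predictive_mean = compute_case(case_id)          # expensive forward-model step
        np.savez_compressed(path, predictive_mean=predictive_mean)
        results[case_id] = predictive_mean
        n_computed += 1
    return results, n_computed


def toy_case(case_id):
    """Hypothetical stand-in for the per-case predictive computation."""
    return np.full(4, float(case_id))


with tempfile.TemporaryDirectory() as tmp:
    _, first_run = run_cases_with_checkpoints(range(3), toy_case, tmp)
    resumed, second_run = run_cases_with_checkpoints(range(5), toy_case, tmp)
```

The second call recomputes only the two cases that were not already on disk, which is what makes an interrupted run cheap to resume.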

petitRADTRANS.sbi.calibration.summarize_training_diagnostics(posterior: Any) → dict[str, Any]#: Summarize per-epoch stability/inverse-error diagnostics of a trained flow.

petitRADTRANS.sbi.calibration.summarize_posterior_samples(posterior_samples: numpy.ndarray, parameter_names: list[str], truth_parameter_values: Mapping[str, float] | None = None) → dict[str, dict[str, float | None]]#: Per-parameter posterior p16/median/p84, with optional truth values.
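A percentile summary of this kind might look like the sketch below. The helper name `percentile_summary` and the dictionary keys (`p16`, `median`, `p84`, `truth`) are assumptions chosen for illustration; the keys actually returned by `summarize_posterior_samples` are not stated here.

```python
import numpy as np


def percentile_summary(posterior_samples, parameter_names, truth=None):
    """Per-parameter p16/median/p84, plus the truth value when one is supplied."""
    p16, p50, p84 = np.percentile(posterior_samples, [16, 50, 84], axis=0)
    summary = {}
    for i, name in enumerate(parameter_names):
        has_truth = truth is not None and name in truth
        summary[name] = {
            "p16": float(p16[i]),
            "median": float(p50[i]),
            "p84": float(p84[i]),
            "truth": float(truth[name]) if has_truth else None,
        }
    return summary


rng = np.random.default_rng(4)
samples = rng.normal(loc=[0.0, 10.0], scale=[1.0, 2.0], size=(4000, 2))
summary = percentile_summary(samples, ["a", "b"], truth={"a": 0.0})
```

Parameters without a supplied truth value get `None`, consistent with the `float | None` values in the documented return type.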

petitRADTRANS.sbi.calibration.summarize_posterior_contraction(posterior_samples: numpy.ndarray, parameter_names: list[str], retrieval_config: Any) → dict[str, dict[str, float | None]]#

Per-parameter posterior contraction relative to the prior.

Contraction is the ratio of the posterior central 68% width to the prior central 68% width. Values near 1.0 mean the marginal is effectively the prior (uninformative); values well below 1.0 mean the observation constrains that parameter. This diagnostic separates an informative posterior from a calibrated-but-uninformative one, which SBC rank histograms alone cannot reveal, since a posterior that returns the prior is trivially calibrated.
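The width ratio can be sketched with percentiles. This example makes two assumptions: the prior is represented by samples from a uniform distribution rather than read from a `retrieval_config`, and the 16th and 84th percentiles are used as the bounds of the central 68% interval. The helper name `central_68_width` is hypothetical.

```python
import numpy as np


def central_68_width(samples):
    """Width of the central 68% interval per parameter (p84 minus p16)."""
    p16, p84 = np.percentile(samples, [16, 84], axis=0)
    return p84 - p16


rng = np.random.default_rng(5)
prior = rng.uniform(0.0, 1.0, size=(20000, 2))   # assumed uniform prior on [0, 1]

posterior = np.column_stack([
    rng.normal(0.5, 0.02, size=5000),             # parameter 0: well constrained
    rng.uniform(0.0, 1.0, size=5000),             # parameter 1: returns the prior
])

contraction = central_68_width(posterior) / central_68_width(prior)
```

In this toy case the first parameter has a contraction far below 1, while the second stays near 1 even though a posterior equal to the prior would still pass an SBC rank test.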

petitRADTRANS.sbi.calibration.summarize_predictive_report(predictive_report: PosteriorPredictiveReport) → dict[str, Any]#: Serialize a posterior-predictive report’s scalar metrics to JSON-safe dicts.

petitRADTRANS.sbi.calibration.serialize_ood_diagnostic(ood_diagnostic: Any) → dict[str, Any] | None#: Serialize an OOD diagnostic to a JSON-safe dict (or None).