petitRADTRANS.sbi.benchmark#

Benchmarking interfaces for comparing amortized and exact retrievals.

Classes#

RetrievalBenchmarkCase

One benchmark problem used to compare inference backends.

BenchmarkMetrics

Metrics summarizing agreement and predictive performance.

BenchmarkComparison

Compare an amortized result to one or more exact retrieval baselines.

RetrievalBenchmarkSuite

Run standardized benchmark comparisons for SBI tasks.

Module Contents#

class petitRADTRANS.sbi.benchmark.RetrievalBenchmarkCase#

One benchmark problem used to compare inference backends.

name: str#
task: petitRADTRANS.sbi.task.SBITask#
observation: Any#
reference_posterior: Any = None#
metadata: Mapping[str, Any]#

class petitRADTRANS.sbi.benchmark.BenchmarkMetrics#

Metrics summarizing agreement and predictive performance.

calibration: Mapping[str, float]#
posterior_distance: Mapping[str, float]#
predictive_checks: Mapping[str, float]#
runtime: Mapping[str, float]#
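The four fields above are plain name-to-value mappings, so a suite is free to choose which metrics it records. The following is a minimal sketch of how they might be populated. `MetricsSketch` is a stand-in written for this example, not the real `BenchmarkMetrics`, and the metric names (`mean_shift_sigma`, `amortized_s`, `exact_s`, `speedup`) and the helper `mean_shift_in_sigma` are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev
from typing import Mapping, Sequence


@dataclass(frozen=True)
class MetricsSketch:
    """Stand-in with the four documented fields; not the real class."""

    calibration: Mapping[str, float] = field(default_factory=dict)
    posterior_distance: Mapping[str, float] = field(default_factory=dict)
    predictive_checks: Mapping[str, float] = field(default_factory=dict)
    runtime: Mapping[str, float] = field(default_factory=dict)


def mean_shift_in_sigma(amortized: Sequence[float], exact: Sequence[float]) -> float:
    """Absolute difference of the sample means, in units of the exact sample std."""
    return abs(mean(amortized) - mean(exact)) / stdev(exact)


# Toy one-parameter posterior samples (illustrative numbers only).
amortized_samples = [1.0, 2.0, 3.0]
exact_samples = [2.0, 3.0, 4.0]

# Hypothetical wall-clock timings in seconds.
amortized_s, exact_s = 0.5, 50.0

metrics = MetricsSketch(
    posterior_distance={
        "mean_shift_sigma": mean_shift_in_sigma(amortized_samples, exact_samples)
    },
    runtime={
        "amortized_s": amortized_s,
        "exact_s": exact_s,
        "speedup": exact_s / amortized_s,
    },
)
```

Keeping each field a flat `Mapping[str, float]` lets different suites record different metrics without changing the container type.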

class petitRADTRANS.sbi.benchmark.BenchmarkComparison#

Compare an amortized result to one or more exact retrieval baselines.

case_name: str#
amortized_result: petitRADTRANS.sbi.inference.AmortizedRetrievalResult#
exact_results: Mapping[str, Any]#
metrics: BenchmarkMetrics#
metadata: Mapping[str, Any]#

class petitRADTRANS.sbi.benchmark.RetrievalBenchmarkSuite(cases: list[RetrievalBenchmarkCase])#

Run standardized benchmark comparisons for SBI tasks.

cases#
abstractmethod run_case(case: RetrievalBenchmarkCase) → BenchmarkComparison#

Run one benchmark case and compute comparison metrics.

Parameters#

case:

Benchmark case describing the task, observation, and optional exact reference posterior to compare against.

Returns#

BenchmarkComparison

Comparison payload combining amortized and exact results together with any derived metrics.

Notes#

The base class is intentionally abstract. Concrete suites are expected to bind exact and amortized inference backends and define the metric computations appropriate for the comparison.

run_all() → list[BenchmarkComparison]#

Run all configured benchmark cases and return the resulting comparisons.
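The Notes under `run_case` say that concrete suites bind the inference backends and define the metric computations. The following is a minimal sketch of that pattern. It uses simplified stand-in classes (`Case`, `Comparison`, `Suite`) rather than importing the real ones, and the `ToySuite` "backends" are trivial placeholders. The `run_all` shown here simply applies `run_case` to each case in order, which is an assumption about the real implementation.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Mapping


# Stand-ins that mirror a subset of the documented fields. The real classes
# live in petitRADTRANS.sbi.benchmark and may differ in defaults and behaviour.
@dataclass
class Case:
    name: str
    observation: Any
    reference_posterior: Any = None
    metadata: Mapping[str, Any] = field(default_factory=dict)


@dataclass
class Comparison:
    case_name: str
    amortized_result: Any
    exact_results: Mapping[str, Any]
    metrics: Mapping[str, Mapping[str, float]]


class Suite(ABC):
    def __init__(self, cases: list[Case]):
        self.cases = cases

    @abstractmethod
    def run_case(self, case: Case) -> Comparison:
        """Run one benchmark case and compute comparison metrics."""

    def run_all(self) -> list[Comparison]:
        # Assumption: run_all calls run_case once per case, in order.
        return [self.run_case(case) for case in self.cases]


class ToySuite(Suite):
    """Hypothetical concrete suite with placeholder 'backends'."""

    def run_case(self, case: Case) -> Comparison:
        # Placeholder amortized backend: the mean of the observation.
        amortized = sum(case.observation) / len(case.observation)
        # Placeholder exact baseline: the stored reference value, if any.
        exact = {"reference": case.reference_posterior}
        distance: dict[str, float] = {}
        if case.reference_posterior is not None:
            distance["abs_mean_error"] = abs(amortized - case.reference_posterior)
        return Comparison(case.name, amortized, exact, {"posterior_distance": distance})


suite = ToySuite(
    [
        Case("with_reference", [1.0, 2.0, 3.0], reference_posterior=2.5),
        Case("without_reference", [4.0]),
    ]
)
comparisons = suite.run_all()
```

Because `run_case` is abstract, instantiating the base `Suite` directly raises `TypeError`. Only subclasses that implement `run_case` can be constructed, which matches the intent stated in the Notes.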