petitRADTRANS.radtrans_core.linalg

Attributes

`TINIEST`
`HUGE_SQRT`
`GRAD_SAFE_DENOM`
`_USE_GPU`
`_FEAUTRIER_THOMAS_MIN_BATCH`
`_FEAUTRIER_INV_DTAU_CAP`
`_FEAUTRIER_INV_DTAU2_CAP`

Functions

`_feautrier_use_thomas`(n_gf, n_angles) → bool	Whether to use the Thomas-factorisation Feautrier path over PCR.
`solve_tridiagonal_pcr`(a, b, c, d)	Solves a batch of tridiagonal systems using the parallel cyclic reduction (PCR) algorithm.
`_prepare_feautrier_system_lm_cpu`(dtau1_lm, dtau2_lm, ...)	Fused coefficient construction + Thomas pre-factoring + Lambda diagonal.
`_prepare_feautrier_system_lm_gpu`(dtau1_lm, dtau2_lm, ...)	GPU variant: materialise the tridiagonal rows for the PCR solver.
`prepare_feautrier_system_lm`(dtau1_lm, dtau2_lm, ...)	Build the per-chunk Feautrier solver state, Lambda diagonal and surface stencil coefficient.
`_solve_feautrier_system_lm_cpu`(factors, source_lm, ...)	Layer-major prefactored Thomas solve.
`_affine_compose`(earlier, later)	Compose two affine maps applied earlier-then-later.
`_solve_feautrier_system_lm_assoc`(factors, source_lm, ...)	Associative-scan variant of `_solve_feautrier_system_lm_cpu()`.
`_solve_feautrier_system_lm_gpu`(state, source_lm, ...)	Layer-major Feautrier solve through the PCR kernel (GPU).
`solve_feautrier_system_lm`(state, source_lm, r_surf_ga, ...)	Solve the prepared layer-major Feautrier system for one source iterate.
`linear_fit`(x, y)	Calculate slope and y-axis intercept of x,y data, assuming zero error on data.

Module Contents

petitRADTRANS.radtrans_core.linalg.TINIEST

petitRADTRANS.radtrans_core.linalg.HUGE_SQRT

petitRADTRANS.radtrans_core.linalg.GRAD_SAFE_DENOM

petitRADTRANS.radtrans_core.linalg._USE_GPU

petitRADTRANS.radtrans_core.linalg._FEAUTRIER_THOMAS_MIN_BATCH

petitRADTRANS.radtrans_core.linalg._feautrier_use_thomas(n_gf: int, n_angles: int) → bool

Whether to use the Thomas-factorisation Feautrier path over PCR.

The Thomas factorisation (the prepare step) feeds either the sequential-scan solve on CPU or the associative-scan solve on GPU; PCR is the alternative. Always true on CPU. On GPU, true when the batch n_gf * n_angles is large enough that the associative-scan solve's speed outweighs the Thomas factorisation's higher one-time prepare cost. Because n_gf and n_angles are static, the decision resolves at trace time and keeps the prepare and solve choices consistent (their solver states are not interchangeable).
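A minimal sketch of the decision rule described above. The names are hypothetical: min_batch stands in for _FEAUTRIER_THOMAS_MIN_BATCH (whose value is not given here), use_gpu stands in for _USE_GPU, and the >= comparison is an assumption. Because the inputs are plain Python ints marked static, the Python if runs while tracing, so only one branch is ever compiled:

```python
from functools import partial

import jax
import jax.numpy as jnp


def feautrier_use_thomas(n_gf: int, n_angles: int, use_gpu: bool, min_batch: int) -> bool:
    """Hypothetical restatement: Thomas always on CPU; on GPU only when
    the flattened batch n_gf * n_angles reaches min_batch (assumed >=)."""
    if not use_gpu:
        return True
    return n_gf * n_angles >= min_batch


@partial(jax.jit, static_argnames=("n_gf", "n_angles", "use_gpu", "min_batch"))
def which_path(x, n_gf, n_angles, use_gpu, min_batch):
    # Static ints make this a trace-time branch: no lax.cond is needed,
    # and a matching prepare/solve pair would take the same branch.
    if feautrier_use_thomas(n_gf, n_angles, use_gpu, min_batch):
        return x + 1.0  # stand-in for the Thomas path
    return x - 1.0  # stand-in for the PCR path
```

Changing any static argument triggers a retrace, which is the mechanism that keeps the prepare and solve choices in lockstep.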

petitRADTRANS.radtrans_core.linalg.solve_tridiagonal_pcr(a, b, c, d)

Solves a batch of tridiagonal systems using the parallel cyclic reduction (PCR) algorithm.

This implementation is based on a parallel CUDA implementation and is suitable for GPU execution. For the method to be numerically stable, the matrix should be diagonally dominant.

The shapes of the input arrays should be (…, N), where (…) is one or more batch dimensions and N is the size of each tridiagonal system.

Source: https://github.com/tanim72/15418-final-project

Args:

a (jnp.ndarray): The lower diagonal of the matrix A, shape (…, N). a[…, 0] should be 0.

b (jnp.ndarray): The main diagonal of the matrix A, shape (…, N).

c (jnp.ndarray): The upper diagonal of the matrix A, shape (…, N). c[…, -1] should be 0.

d (jnp.ndarray): The right-hand side of the equation, shape (…, N).

Returns:

jnp.ndarray: The solution of the system, shape (…, N).
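The reduction can be sketched in JAX as follows. This is an independent re-derivation of the textbook PCR recurrence under the same conventions (a[…, 0] = 0, c[…, -1] = 0), not the module's code, and the names _take_prev, _take_next and pcr_solve are hypothetical. At stride s, row i eliminates its neighbours at i - s and i + s by adding multiples of those rows. After ceil(log2 N) stride doublings every row is decoupled, so x = d / b:

```python
import jax

jax.config.update("jax_enable_x64", True)  # float64 for a tight residual check
import jax.numpy as jnp


def _take_prev(x, s, fill):
    # y[..., i] = x[..., i - s]; positions with i - s < 0 get `fill`.
    pad = jnp.full(x.shape[:-1] + (s,), fill, dtype=x.dtype)
    return jnp.concatenate([pad, x[..., :-s]], axis=-1)


def _take_next(x, s, fill):
    # y[..., i] = x[..., i + s]; positions with i + s >= N get `fill`.
    pad = jnp.full(x.shape[:-1] + (s,), fill, dtype=x.dtype)
    return jnp.concatenate([x[..., s:], pad], axis=-1)


def pcr_solve(a, b, c, d):
    """Batched PCR sketch for rows a*x[i-s] + b*x[i] + c*x[i+s] = d."""
    n = b.shape[-1]
    s = 1
    while s < n:  # static Python loop: ceil(log2(n)) reduction steps
        # Out-of-range neighbours act as the trivial row 0*x + 1*x + 0*x = 0.
        fills = ((a, 0.0), (b, 1.0), (c, 0.0), (d, 0.0))
        a_m, b_m, c_m, d_m = (_take_prev(v, s, f) for v, f in fills)
        a_p, b_p, c_p, d_p = (_take_next(v, s, f) for v, f in fills)
        alpha = -a / b_m  # multiplier that cancels x[i - s]
        gamma = -c / b_p  # multiplier that cancels x[i + s]
        a, b, c, d = (
            alpha * a_m,
            b + alpha * c_m + gamma * a_p,
            gamma * c_p,
            d + alpha * d_m + gamma * d_p,
        )
        s *= 2
    return d / b
```

Every row is updated independently at each step, which is why PCR maps well onto GPUs. The price is O(N log N) total work, compared with O(N) for the sequential Thomas algorithm.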

petitRADTRANS.radtrans_core.linalg._FEAUTRIER_INV_DTAU_CAP = 10000000000.0

petitRADTRANS.radtrans_core.linalg._FEAUTRIER_INV_DTAU2_CAP = 5000000000.0

petitRADTRANS.radtrans_core.linalg._prepare_feautrier_system_lm_cpu(dtau1_lm, dtau2_lm, emission_cos_angles, emission_cos_angles_weights, n_gf, n_angles, n_layers)

Fused coefficient construction + Thomas pre-factoring + Lambda diagonal.

One scan over layers builds the Feautrier tridiagonal rows on the fly from the delta-optical-depths, immediately eliminates them (Thomas forward elimination), and accumulates the angle-quadrature Lambda diagonal, so the full (batch, n_layers) coefficient arrays are never materialised.

The per-row formulas are exactly those of the original gf-major construction: row 0 uses the one-sided top boundary stencil, rows 1..n_layers-2 the centered stencil, and row n_layers-1 is the identity (the surface boundary value is imposed through the RHS).
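To make the prepare/solve split concrete, here is a sketch of Thomas forward elimination that stores only per-row factors under the names listed in the Returns section (c_prime, inv_denom, a_times_inv_denom). It assumes those factors follow the textbook Thomas definitions their names suggest. It is written for a generic layer-major tridiagonal system, not the Feautrier stencils, whose formulas are not reproduced here, and the function names are hypothetical. With denom_i = b_i - a_i c'_{i-1}, the forward RHS sweep becomes d'_i = inv_denom_i d_i - a_times_inv_denom_i d'_{i-1}, so each new source iterate needs no divisions:

```python
import jax

jax.config.update("jax_enable_x64", True)
import jax.numpy as jnp
from jax import lax


def thomas_prefactor(a, b, c):
    """Rows a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], arrays (n_layers, batch).
    Requires a[0] == 0 and c[-1] == 0. Returns (c_prime, inv_denom, a_times_inv_denom)."""
    def step(c_prev, row):
        a_i, b_i, c_i = row
        inv_denom = 1.0 / (b_i - a_i * c_prev)
        c_prime = c_i * inv_denom
        return c_prime, (c_prime, inv_denom, a_i * inv_denom)

    _, factors = lax.scan(step, jnp.zeros_like(b[0]), (a, b, c))
    return factors  # each (n_layers, batch)


def thomas_solve(factors, d):
    """Reuse the factors for one right-hand side d of shape (n_layers, batch)."""
    c_prime, inv_denom, a_inv = factors

    def forward(d_prev, row):
        d_i, inv_i, a_inv_i = row
        d_new = d_i * inv_i - a_inv_i * d_prev
        return d_new, d_new

    _, d_prime = lax.scan(forward, jnp.zeros_like(d[0]), (d, inv_denom, a_inv))

    def backward(x_next, row):
        d_p, c_p = row
        x_i = d_p - c_p * x_next
        return x_i, x_i

    # reverse=True walks from the last layer up; outputs stay in layer order.
    _, x = lax.scan(backward, jnp.zeros_like(d[0]), (d_prime, c_prime), reverse=True)
    return x
```

Factoring once and calling thomas_solve repeatedly amortises the divisions across source iterations.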

Parameters

dtau1_lm : jax.Array, shape (n_layers - 1, n_gf): First-neighbor optical-depth differences, floored away from zero.
dtau2_lm : jax.Array, shape (n_layers - 2, n_gf): Second-neighbor optical-depth differences, floored away from zero.
emission_cos_angles : jax.Array, shape (n_angles,): Cosines of the emission angles.
emission_cos_angles_weights : jax.Array, shape (n_angles,): Quadrature weights for the emission angles.
n_gf, n_angles, n_layers : int: Static shape parameters; n_layers must be >= 3.

Returns

factors : tuple of jax.Array, each shape (n_layers, batch): (c_prime, inv_denom, a_times_inv_denom) Thomas factors, with batch = n_gf * n_angles (angle fastest).
lambda_loc_lm : jax.Array, shape (n_layers, n_gf): Angle-quadrature diagonal of the approximate Lambda operator.
f_surface_ga : jax.Array, shape (n_gf, n_angles): mu / dtau1[n_layers - 2] (clipped), used for the surface derivative of the emergent Feautrier variable.

petitRADTRANS.radtrans_core.linalg._prepare_feautrier_system_lm_gpu(dtau1_lm, dtau2_lm, emission_cos_angles, emission_cos_angles_weights, n_gf, n_angles, n_layers)

GPU variant: materialise the tridiagonal rows for the PCR solver.

Same formulas as _prepare_feautrier_system_lm_cpu(), but built vectorised (PCR has no separable factorisation step, so the coefficient arrays must exist). Returns (a, b, c) in the (batch, n_layers) layout expected by solve_tridiagonal_pcr() as the solver state.
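The layout conversions implied here can be sketched with broadcasting and reshapes. The inputs are layer-major (n_layers, n_gf) arrays. The batch is flattened as batch = n_gf * n_angles with the angle index fastest. The PCR solver expects rows in (batch, n_layers) order. All helper names are hypothetical. The per-angle example uses the mu / dtau ratio that appears in f_surface_ga; it is not presented as the module's full stencil:

```python
import jax.numpy as jnp


def slant_inverse_steps(dtau1_lm, mu):
    """mu / dtau for every (layer, gf, angle): shape (n_layers - 1, n_gf, n_angles)."""
    return mu[None, None, :] / dtau1_lm[:, :, None]


def lm_to_batch_rows(x_lga):
    """(n_layers, n_gf, n_angles) -> (n_gf * n_angles, n_layers), angle index fastest."""
    n_layers, n_gf, n_angles = x_lga.shape
    # C-order reshape puts (g, k) at flat index g * n_angles + k.
    return x_lga.reshape(n_layers, n_gf * n_angles).T


def batch_rows_to_lm(x_bl, n_gf, n_angles):
    """Inverse of lm_to_batch_rows: (n_gf * n_angles, n_layers) -> (n_layers, n_gf, n_angles)."""
    return x_bl.T.reshape(-1, n_gf, n_angles)
```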

petitRADTRANS.radtrans_core.linalg.prepare_feautrier_system_lm(dtau1_lm, dtau2_lm, emission_cos_angles, emission_cos_angles_weights, n_gf, n_angles, n_layers)

Build the per-chunk Feautrier solver state, Lambda diagonal and surface stencil coefficient from layer-major delta-optical-depths.

Dispatches to the Thomas fused scan (CPU always, and GPU at large batch) or to the materialising PCR preparation (GPU at small batch); the returned state is consumed by solve_feautrier_system_lm(), which must make the same choice. See _feautrier_use_thomas().

petitRADTRANS.radtrans_core.linalg._solve_feautrier_system_lm_cpu(factors, source_lm, r_surf_ga, angle_weight_a, n_gf, n_angles, n_layers)

Layer-major prefactored Thomas solve.

Parameters

factors : tuple of jax.Array, each (n_layers, batch): Thomas factors from _prepare_feautrier_system_lm_cpu().
source_lm : jax.Array, shape (n_layers, n_gf): Angle-independent source function; only rows 0 .. n_layers-2 enter the RHS (the surface row is r_surf_ga).
r_surf_ga : jax.Array, shape (n_gf, n_angles): Surface boundary RHS.
angle_weight_a : jax.Array, shape (n_angles,): Angular quadrature weights for the mean-intensity accumulation.

Returns

I_H_top : jax.Array, shape (n_gf, n_angles)
x_last : jax.Array, shape (n_gf, n_angles)
x_nm2 : jax.Array, shape (n_gf, n_angles)
J_bol_lm : jax.Array, shape (n_layers, n_gf)
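The mean-intensity accumulation behind J_bol_lm amounts to contracting the angle axis of the per-angle solution with angle_weight_a. The sketch below assumes the solution is held layer-major as (n_layers, batch) with the angle index fastest, as for the Thomas factors. Any further normalisation the module may apply is not shown, and the helper name is hypothetical:

```python
import jax.numpy as jnp


def angle_weighted_mean(x_lb, angle_weight_a, n_gf, n_angles):
    """(n_layers, n_gf * n_angles) solution -> (n_layers, n_gf) weighted sum over angles."""
    n_layers = x_lb.shape[0]
    x_lga = x_lb.reshape(n_layers, n_gf, n_angles)  # angle index is fastest
    return jnp.einsum("lga,a->lg", x_lga, angle_weight_a)
```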

petitRADTRANS.radtrans_core.linalg._affine_compose(earlier, later)

Compose two affine maps applied earlier-then-later.

Each operand is a pytree (A, B) representing x -> A x + B. Returns the composition later ∘ earlier. Used as the associative operator for lax.associative_scan(), whose inclusive prefix at position k is the map taking the initial value to y_k.
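For maps x -> A x + B, applying earlier and then later gives later(earlier(x)) = A_l A_e x + (A_l B_e + B_l). This operator is associative but not commutative. Below is a minimal sketch with elementwise (diagonal) A, an assumption that fits a per-row scalar recurrence. It checks the associative-scan result against a sequential lax.scan for y_k = A_k y_{k-1} + B_k. The function names are hypothetical, and affine_compose is an independent restatement rather than the module's _affine_compose:

```python
import jax

jax.config.update("jax_enable_x64", True)
import jax.numpy as jnp
from jax import lax


def affine_compose(earlier, later):
    """Return later ∘ earlier for elementwise affine maps x -> A * x + B."""
    a_e, b_e = earlier
    a_l, b_l = later
    return a_l * a_e, a_l * b_e + b_l


def recurrence_assoc(A, B, y0):
    """y_k = A[k] * y_{k-1} + B[k] for k = 0..n-1, with y_{-1} = y0, in O(log n) depth."""
    A_pref, B_pref = lax.associative_scan(affine_compose, (A, B))
    # Inclusive prefix k is the map taking the initial value y0 to y_k.
    return A_pref * y0 + B_pref


def recurrence_seq(A, B, y0):
    """Same recurrence, evaluated one step at a time."""
    def step(y, ab):
        y_new = ab[0] * y + ab[1]
        return y_new, y_new

    _, ys = lax.scan(step, y0, (A, B))
    return ys
```

lax.associative_scan passes the element with the lower index as the first argument, which is why the operator's argument order (earlier, later) matters.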

petitRADTRANS.radtrans_core.linalg._solve_feautrier_system_lm_assoc(factors, source_lm, r_surf_ga, angle_weight_a, n_gf, n_angles, n_layers)

Associative-scan variant of _solve_feautrier_system_lm_cpu().

Identical interface, factors and result; the forward RHS sweep and the backward substitution are evaluated with lax.associative_scan() (O(log n_layers) depth) rather than a sequential lax.scan(). Unlike the fused sequential solve, this variant materialises the full per-layer solution once to form the angle-weighted J_bol, trading memory for parallel depth.
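Both sweeps of a prefactored Thomas solve are first-order affine recurrences, so each can be evaluated with lax.associative_scan(). The forward sweep is d'_i = (-a_times_inv_denom_i) d'_{i-1} + inv_denom_i d_i. The backward sweep x_i = (-c'_i) x_{i+1} + d'_i runs as a forward scan over flipped rows. The sketch assumes textbook Thomas factors shaped (n_layers, batch). It omits the surface RHS and J_bol bookkeeping and uses hypothetical names. A compact sequential factoring is included so the block is self-contained:

```python
import jax

jax.config.update("jax_enable_x64", True)
import jax.numpy as jnp
from jax import lax


def affine_compose(earlier, later):
    # later ∘ earlier for elementwise maps x -> A * x + B
    a_e, b_e = earlier
    a_l, b_l = later
    return a_l * a_e, a_l * b_e + b_l


def affine_recurrence(A, B):
    """y_k = A[k] * y_{k-1} + B[k] along axis 0, starting from y_{-1} = 0."""
    _, B_pref = lax.associative_scan(affine_compose, (A, B))
    return B_pref  # with a zero initial value only the offset term survives


def thomas_prefactor(a, b, c):
    """Sequential factoring into (c_prime, inv_denom, a_times_inv_denom)."""
    def step(c_prev, row):
        a_i, b_i, c_i = row
        inv = 1.0 / (b_i - a_i * c_prev)
        return c_i * inv, (c_i * inv, inv, a_i * inv)

    _, factors = lax.scan(step, jnp.zeros_like(b[0]), (a, b, c))
    return factors


def thomas_solve_assoc(factors, d):
    """Prefactored Thomas solve with both sweeps as parallel affine scans."""
    c_prime, inv_denom, a_inv = factors  # each (n_layers, batch)
    # Forward: d'_i = -a_inv_i * d'_{i-1} + inv_denom_i * d_i.
    d_prime = affine_recurrence(-a_inv, inv_denom * d)
    # Backward: x_i = -c'_i * x_{i+1} + d'_i, as a forward scan on flipped rows.
    x_rev = affine_recurrence(-c_prime[::-1], d_prime[::-1])
    return x_rev[::-1]
```

Flipping explicitly, instead of passing reverse=True, keeps the operand order of the non-commutative compose obvious.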

petitRADTRANS.radtrans_core.linalg._solve_feautrier_system_lm_gpu(state, source_lm, r_surf_ga, angle_weight_a, n_gf, n_angles, n_layers)

Layer-major Feautrier solve through the PCR kernel (GPU).

petitRADTRANS.radtrans_core.linalg.solve_feautrier_system_lm(state, source_lm, r_surf_ga, angle_weight_a, n_gf, n_angles, n_layers)

Solve the prepared layer-major Feautrier system for one source iterate.

state must come from prepare_feautrier_system_lm() with the same batch (the Thomas-factor and PCR-coefficient states are not interchangeable); the shared _feautrier_use_thomas() decision guarantees this. Returns (I_H_top, x_last, x_nm2, J_bol_lm) with the angular arrays shaped (n_gf, n_angles) and J_bol_lm shaped (n_layers, n_gf).

Solver choice (see GPU benchmarking):

- Thomas-factor path (CPU always, GPU at large batch): on GPU the associative-scan solve is fastest at every measured size (e.g. ~6x faster than PCR and ~1.8x faster than the sequential Thomas scan at xl), so it is used there; on CPU the sequential scan stays best (the parallel scan's O(n_layers log n_layers) extra work does not pay off without massive device parallelism).
- PCR path (GPU at small batch): kept because its near-free vectorised prepare wins the full prepare+solve loop when the batch is too small to amortise the Thomas factorisation.

petitRADTRANS.radtrans_core.linalg.linear_fit(x, y)

Calculate slope and y-axis intercept of x,y data, assuming zero error on data. Translated from Fortran linear_fit.

Args:

x: 1D array of x values.
y: 1D array of y values.

Returns:

Tuple (a, b) where a is the y-intercept and b is the slope.
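With zero error on the data this is ordinary least squares. Its closed form is b = (n Σxy - Σx Σy) / (n Σx² - (Σx)²) and a = (Σy - b Σx) / n. Below is a minimal sketch of that formula under a hypothetical name, returning (intercept, slope) in the documented order. The Fortran original may organise the sums differently. The raw-sum form can lose precision when x has a large offset, and centring x first avoids that:

```python
import jax.numpy as jnp


def linear_fit_sketch(x, y):
    """Least-squares line y ≈ a + b * x; returns (a, b) = (intercept, slope)."""
    x = jnp.asarray(x)
    y = jnp.asarray(y)
    n = x.shape[0]
    sx, sy = jnp.sum(x), jnp.sum(y)
    sxx, sxy = jnp.sum(x * x), jnp.sum(x * y)
    b = (n * sxy - sx * sy) / (n * sxx - sx**2)  # slope
    a = (sy - b * sx) / n  # intercept
    return a, b
```

For x = [0, 1, 2, 3] and y = [2, 5, 8, 11], the sums give b = 60 / 20 = 3 and a = (26 - 18) / 4 = 2.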