petitRADTRANS.retrieval.data

Module Contents

Classes

Data

This class stores the spectral data to be retrieved from a single instrument or observation.

class petitRADTRANS.retrieval.data.Data(name, path_to_observations=None, data_resolution=None, model_resolution=None, distance=None, external_pRT_reference=None, model_generating_function=None, wlen_range_micron=None, scale=False, scale_err=False, offset_bool=False, wlen_bins=None, photometry=False, photometric_transformation_function=None, photometric_bin_edges=None, opacity_mode='c-k', pRT_grid=False, pRT_object=None, wlen=None, flux=None, flux_error=None, mask=None)

This class stores the spectral data to be retrieved from a single instrument or observation.

Each dataset is associated with an instance of petitRadTrans and an atmospheric model. The pRT instance can be overwritten, and associated with an existing pRT instance with the external_pRT_reference parameter. This setup allows for joint or independant retrievals on multiple datasets.

Args:
namestr

Identifier for this data set.

path_to_observationsstr

Path to observations file, including filename. This can be a txt or dat file containing the wavelength, flux, transit depth and error, or a fits file containing the wavelength, spectrum and covariance matrix.

distancefloat

The distance to the object in cgs units. Defaults to a 10pc normalized distance.

data_resolutionfloat

Spectral resolution of the instrument. Optional, allows convolution of model to instrumental line width.

model_resolutionfloat

Will be None by default. The resolution of the c-k opacity tables in pRT. This will generate a new c-k table using exo-k. The default (and maximum) correlated k resolution in pRT is \(\\lambda/\\Delta \\lambda > 1000\) (R=500). Lowering the resolution will speed up the computation. If integer positive value, and if opacities == 'lbl' is True, then this will sample the the high-resolution opacities at the specified resolution. This may be desired in the case where medium-resolution spectra are required with a \(\\lambda/\\Delta \\lambda > 1000\), but much smaller than \(10^6\), which is the resolution of the lbl mode. In this case it may make sense to carry out the calculations with lbl_opacity_sampling = 10e5, for example, and then rebinning to the final desired resolution: this may save time! The user should verify whether this leads to solutions which are identical to the rebinned results of the fiducial \(10^6\) resolution. If not, this parameter must not be used. Note the difference between this parameter and the lbl_opacity_sampling parameter in the RadTrans class - the actual desired resolution should be set here.

external_pRT_referenceobject

An existing RadTrans object. Leave as none unless you’re sure of what you’re doing.

model_generating_functionmethod

A function, typically defined in run_definition.py that returns the model wavelength and spectrum (emission or transmission). This is the function that contains the physics of the model, and calls pRT in order to compute the spectrum.

wlen_range_microntuple,list
Set the wavelength range of the pRT object. Defaults to a range +/-5% greater than that of the data. Must at

least be equal to the range of the data.

scalebool

Turn on or off scaling the data by a constant factor. Set to True if scaling the data during the retrieval.

wlen_binsnumpy.ndarray

Set the wavelength bin width to bin the pRT model to the data. Defaults to the data bins.

photometrybool

Set to True if using photometric data.

photometric_transformation_functionmethod

Transform the photometry (account for filter transmission etc.). This function must take in the wavelength and flux of a spectrum, and output a single photometric point (and optionally flux error).

photometric_bin_edgesTuple, numpy.ndarray

The edges of the photometric bin in micron. [low,high]

pRT_grid: bool

Set to true if data has been binned to pRT R = 1,000 c-k grid.

opacity_modestr

Should the retrieval be run using correlated-k opacities (default, ‘c-k’), or line by line (‘lbl’) opacities? If ‘lbl’ is selected, it is HIGHLY recommended to set the model_resolution parameter. In general, ‘c-k’ mode is recommended for retrievals of everything other than high-resolution (R>40000) spectra.

loadtxt(path, delimiter=',', comments='#')

This function reads in a .txt or .dat file containing the spectrum. Headers should be commented out with ‘#’, the first column must be the wavelength in micron, the second column the flux or transit depth, and the final column must be the error on each data point. Checks will be performed to determine the correct delimiter, but the recommended format is to use a csv file with columns for wavlength, flux and error.

Args:
pathstr

Directory and filename of the data.

delimiterstring, int

The string used to separate values. By default, commas act as delimiter. An integer or sequence of integers can also be provided as width(s) of each field.

commentsstring

The character used to indicate the start of a comment. All the characters occurring on a line after a comment are discarded

load_jwst(path)

Load in an x1d fits file as produced by the STSci JWST pipeline. Expects units of Jy for the flux and micron for the wavelength.

Args:
pathstr

Directory and filename of the data.

loadfits(path)

Load in a particular style of fits file. Must include extension SPECTRUM with fields WAVLENGTH, FLUX and COVARIANCE (or ERROR).

Args:
pathstr

Directory and filename of the data.

set_distance(distance)

Sets the distance variable in the data class. This does NOT rescale the flux to the new distance. In order to rescale the flux and error, use the scale_to_distance method.

Args:
distancefloat

The distance to the object in cgs units.

update_bins(wlens)
scale_to_distance(new_dist)

Updates the distance variable in the data class. This will rescale the flux to the new distance.

Args:
new_distfloat

The distance to the object in cgs units.

get_chisq(wlen_model, spectrum_model, plotting, parameters=None, per_datapoint=False)

Calculate the chi square between the model and the data.

Args:
wlen_modelnumpy.ndarray

The wavlengths of the model

spectrum_modelnumpy.ndarray

The model flux in the same units as the data.

plottingbool

Show test plots.

Returns:
logLfloat

The log likelihood of the model given the data.

get_log_likelihood(spectrum_model, alpha=1.0)

Calculate the log-likelihood between the model and the data.

The spectrum model must be on the same wavelength grid than the data.

Args:
spectrum_model: numpy.ndarray

The model flux in the same units as the data.

alpha: float, optional

Model scaling coefficient.

Returns:
logLfloat

The log likelihood of the model given the data.

static log_likelihood_gibson(model, data, uncertainties, alpha=1.0, beta=None)

Calculate the log-likelihood between the model and the data.

The spectrum model must be on the same wavelength grid than the data.

From Gibson et al. 2020 (https://doi.org/10.1093/mnras/staa228). Constant terms are dropped and constant coefficients are set to 1.

Set:

chi2(A, B) = sum(((f_i - A * m_i) / (B * sig_i)) ** 2), implicit sum on i

with f_i the data, m_i the model, and sig_i the uncertainty; i denotes wavelength/time variation. Starting from (implicit product on i):

L = prod(1 / sqrt(2 * pi * B * sig_i ** 2) * exp(-1/2 * sum(((f_i - A * m_i) / (B * sig_i)) ** 2))), => L = prod( 1 / sqrt(2 * pi * B * sig_i ** 2) * exp(-1/2 * chi2(A, B)) ), => ln(L) = -N/2 * ln(2 * pi) - N * ln(B) - sum(ln(sig_i)) - 1/2 * chi2(A, B).

Dropping constant terms:

ln(L)^* = - N * ln(B) - 1/2 * chi2(A, B).

B can be automatically optimised by nulling the ln(L) partial derivative with respect to B. Using the best estimator of B instead of the true value:

d_ln(L) / d_B = - N / B + 1 / B ** 3 * chi2(A, B=1) = 0, => B = sqrt( 1/N * chi2(A, B=1)).

Replacing:
ln(L)^**= - N * ln(B) - 1/2 * chi2(A, B),

= - N * ln(sqrt(1/N * chi2(A, B=1))) - 1/2 * sum(((f_i - A * m_i) / (B * sig_i)) ** 2), = - N/2 * ln(1/N * chi2(A, B=1)) - 1/2 / B ** 2 * sum(((f_i - A * m_i) / sig_i) ** 2), B cst with i = - N/2 * ln(1/N * chi2(A, B=1)) - 1/2 * N / chi2(A, B=1) * chi2(A, B=1), = - N/2 * ln(1/N * chi2(A, B=1)) - N/2.

Dropping constant terms:

ln(L)^*** = - N/2 * ln(1/N * chi2(A, B=1)).

Args:
model: numpy.ndarray

The model flux in the same units as the data.

data: numpy.ndarray

The data.

uncertainties: numpy.ndarray

The uncertainties on the data.

alpha: float, optional

Model scaling coefficient.

beta: float, optional

Noise scaling coefficient. If None,

Returns:
logLfloat

The log likelihood of the model given the data.

line_b_uncertainty_scaling(parameters)

This function implements the 10^b scaling from Line 2015, which allows for us to account for underestimated uncertainties:

We modify the standard error on the data point by the factor 10^b to account for underestimated uncertainties and/or unknown missing forward model physics (Foreman-Mackey et al. 2013, Hogg et al. 2010, Tremain et al. 2002), e.g., imperfect fits. This results in a more generous estimate of the parameter uncertainties. Note that this is similar to inflating the error bars post-facto in order to achieve reduced chi-squares of unity, except that this approach is more formal because uncertainties in this parameter are properly marginalized into the other relevant parameters. Generally, the factor 10^b takes on values that fall between the minimum and maximum of the square of the data uncertainties.

Args:
parameters: Dict

Dictionary of Parameters, should contain key ‘uncertianty_scaling_b’. This can be done for all data sets, or specified with a tag at the end of the key to apply different factors to different datasets.

Returns:
b: float

10**b error bar scaling factor.

static convolve(input_wavelength, input_flux, instrument_res)

This function convolves a model spectrum to the instrumental wavelength using the provided data_resolution Args:

input_wavelengthnumpy.ndarray

The wavelength grid of the model spectrum

input_fluxnumpy.ndarray

The flux as computed by the model

instrument_resfloat

\(\\lambda/\\Delta \\lambda\), the width of the gaussian kernel to convolve with the model spectrum.

Returns:
flux_lsf

The convolved spectrum.