Analysis Module (brutus.analysis)#

The analysis module implements the statistical inference machinery for brutus. It provides high-level fitting functions that combine stellar models with Bayesian inference to estimate parameters from photometric and astrometric data.

Key Components:

  • Individual Star Fitting: BruteForce class performs grid-based Bayesian inference for field stars

  • Population Analysis: Functions for fitting coeval stellar populations (clusters) with mixture models

  • Photometric Offsets: Tools for empirical calibration of systematic photometric errors

  • Line-of-Sight Dust: Functions for 3-D dust mapping from stellar ensembles

Design Philosophy:

The module implements the complete Bayesian inference workflow:

  1. Model Evaluation: Compute likelihoods over stellar model grids

  2. Prior Application: Incorporate Galactic priors on stellar properties and dust

  3. Optimization: Find best-fit parameters in distance-extinction space

  4. Marginalization: Integrate over nuisance parameters

  5. Sampling: Generate posterior samples for uncertainty quantification

Typical Usage Patterns:

For individual field stars with photometry and parallax:

from brutus.analysis import BruteForce
from brutus.core import StarGrid
from brutus.data import load_models
from brutus.utils import inv_magnitude
import numpy as np
import h5py

# Load pre-computed grid
models, labels, label_mask = load_models('grid_mist_v9.h5')
grid = StarGrid(models, labels)

# Initialize fitter
fitter = BruteForce(grid)

# Observed data (Nstars=1 in this example)
# BruteForce.fit expects linear flux densities ("maggies"), NOT magnitudes.
# Convert observed magnitudes + errors to maggies via inv_magnitude.
mag = np.array([[16.5, 15.2, 14.8, 13.5, 13.1]])  # shape (1, 5)
mag_err = np.array([[0.01, 0.01, 0.02, 0.03, 0.03]])
flux, flux_err = inv_magnitude(mag, mag_err)  # maggies = 10**(-0.4*mag)
mask = np.array([[True, True, True, True, True]])
obj_ids = np.array([[1]])
parallax = np.array([2.5])  # mas
parallax_err = np.array([0.1])  # mas

# Fit and save results to HDF5
output_file = fitter.fit(
    data=flux, data_err=flux_err, data_mask=mask,
    data_labels=obj_ids, save_file='results.h5',
    parallax=parallax, parallax_err=parallax_err,
    Ndraws=250
)

# Read posterior samples from HDF5
with h5py.File(output_file, 'r') as f:
    dist = f['samps_dist'][0]  # (Ndraws,) distances in kpc
    av = f['samps_red'][0]     # (Ndraws,) A_V values
print(f"Distance: {np.median(dist)*1000:.0f} pc")
print(f"Extinction: {np.median(av):.2f} mag")

For stellar clusters with MCMC:

from brutus.analysis import isochrone_population_loglike
from brutus.core import Isochrone, StellarPop
import emcee
import numpy as np

# Initialize population model
iso = Isochrone()
pop = StellarPop(isochrone=iso)

# Observed cluster data (N stars, M filters)
obs_flux = np.array([...])  # shape (N, M)
obs_err = np.array([...])   # shape (N, M)

# Define log-likelihood for MCMC
# theta = [feh, loga, av, rv, dist]
def lnprob(theta):
    return isochrone_population_loglike(
        theta,
        stellarpop=pop,
        obs_phot=obs_flux,
        obs_err=obs_err,
        cluster_prob=0.9  # External membership prior
    )

# Run MCMC
ndim, nwalkers = 5, 32
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob)
sampler.run_mcmc(initial_pos, 5000, progress=True)

# Extract posterior samples
samples = sampler.get_chain(discard=1000, thin=10, flat=True)

Key Methodological Features:

  • Brute-force grid evaluation: Systematically explores parameter space without convergence issues

  • Multi-stage optimization: Fast magnitude-space → accurate flux-space → Bayesian posterior

  • Mixture-before-marginalization: Mathematically correct treatment of field contamination in clusters

  • Importance sampling: Efficient posterior sampling weighted by probability

See Also:

Individual Star Fitting#

class brutus.analysis.BruteForce(star_grid, verbose=True)[source]#

Bases: object

Bayesian parameter estimation for individual stars using grid-based models.

This class performs brute-force fitting over a pre-computed stellar model grid to estimate stellar parameters, distances, and extinction. It uses the StarGrid infrastructure for model management and applies Bayesian priors for robust inference.

Parameters:
  • star_grid (StarGrid) – Pre-loaded stellar model grid for SED generation.

  • verbose (bool, optional) – Whether to print initialization information. Default is True.

star_grid#

The underlying stellar model grid.

Type:

StarGrid

models#

Magnitude coefficients from the grid.

Type:

numpy.ndarray

models_labels#

Labels for each model in the grid.

Type:

structured numpy.ndarray

labels_mask#

Mask indicating which labels are grid parameters (True) vs predictions (False).

Type:

dict

See also

brutus.core.StarGrid

Stellar model grid infrastructure

brutus.data.load_models

Load pre-computed grids

loglike_grid

Compute log-likelihoods

logpost_grid

Compute log-posteriors

Notes

The BruteForce fitter uses a two-stage approach:

  1. Likelihood computation (loglike_grid): Optimizes distance and extinction for each grid point to find maximum likelihood

  2. Posterior computation (logpost_grid): Integrates over distance and extinction uncertainty using Monte Carlo, applying priors for Galactic structure, dust maps, and astrometry

The fitter automatically handles: - Grid parameter vs. prediction distinction - Age weighting for proper sampling - Grid spacing corrections - Parallax constraints - Galactic structure priors - Dust map priors

Examples

Basic usage with a pre-loaded grid:

>>> from brutus.core import StarGrid
>>> from brutus.analysis.individual import BruteForce
>>> from brutus.data import load_models
>>>
>>> # Load grid
>>> models, labels, params = load_models('grid_mist_v9.h5')
>>> grid = StarGrid(models, labels, params)
>>>
>>> # Initialize fitter
>>> fitter = BruteForce(grid)
>>>
>>> # Fit data
>>> results = fitter.fit(
...     data, data_err, data_mask,
...     data_labels, save_file='results.h5',
...     parallax=parallax_data,
...     data_coords=coordinates
... )
__init__(star_grid, verbose=True)[source]#

Initialize BruteForce with a StarGrid instance.

property nmodels#

Number of models in the grid.

property nfilters#

Number of filters in the grid.

get_sed_grid(indices=None, av=None, rv=None, return_flux=False)[source]#

Compute SEDs for multiple grid points simultaneously.

This is the grid-based batch computation method, distinct from StarGrid.get_seds() which handles single star synthesis.

Parameters:
  • indices (array-like, optional) – Grid indices to compute SEDs for. If None, uses all models.

  • av (array-like or float, optional) – A(V) values for each model. If None, defaults to 0.

  • rv (array-like or float, optional) – R(V) values for each model. If None, defaults to 3.3.

  • return_flux (bool, optional) – If True, return fluxes instead of magnitudes. Default is False.

Returns:

  • seds (numpy.ndarray of shape (Nmodels, Nbands)) – Computed SEDs.

  • rvecs (numpy.ndarray of shape (Nmodels, Nbands)) – Reddening vectors.

  • drvecs (numpy.ndarray of shape (Nmodels, Nbands)) – Differential reddening vectors with respect to Rv.

See also

brutus.core.sed_utils._get_seds

Underlying SED computation

StarGrid.get_seds

Single star SED generation

Notes

This method performs batch SED computation for multiple grid points simultaneously, which is more efficient than calling StarGrid.get_seds() repeatedly.

The scale factor relates to distance as \(s = 1/d^2\) where d is distance in parsecs. The reddening is applied as:

\[\begin{split}m(\\lambda) = m_0(\\lambda) + A_V \\cdot [r_0(\\lambda) + R_V \\cdot dr(\\lambda)]\end{split}\]

where \(r_0\) and \(dr\) are the reddening vector components.

loglike_grid(data, data_err, data_mask, avlim=(0.0, 20.0), av_gauss=(0.0, 1000000.0), rvlim=(1.0, 8.0), rv_gauss=(3.32, 0.18), av_init=None, rv_init=None, dim_prior=True, ltol=0.03, ltol_subthresh=0.01, init_thresh=0.005, parallax=None, parallax_err=None, return_vals=False, indices=None, **kwargs)[source]#

Compute log-likelihood over the stellar model grid.

This is a wrapper around the module-level loglike_grid function that uses the instance’s model grid.

Parameters:
  • data (~numpy.ndarray of shape (Nfilt)) – Measured flux densities.

  • data_err (~numpy.ndarray of shape (Nfilt)) – Measurement errors.

  • data_mask (~numpy.ndarray of shape (Nfilt)) – Binary mask for valid data.

  • indices (array-like, optional) – Subset of model indices to use. If None, uses all models.

  • avlim (tuple, optional) – (min, max) bounds on A(V). Default is (0.0, 20.0).

  • av_gauss (tuple, optional) – (mean, std) for Gaussian prior on A(V). Default is (0.0, 1e6).

  • rvlim (tuple, optional) – (min, max) bounds on R(V). Default is (1.0, 8.0).

  • rv_gauss (tuple, optional) – (mean, std) for Gaussian prior on R(V). Default is (3.32, 0.18).

  • parallax (float, optional) – Parallax measurement in mas.

  • parallax_err (float, optional) – Parallax error in mas.

  • return_vals (bool, optional) – If True, return full results including covariances. Default is False.

  • **kwargs – Passed to optimization functions.

Returns:

  • If return_vals=False

    lnlnumpy.ndarray

    Log-likelihoods for each grid point

    Ndimint

    Number of dimensions (filters)

    chi2numpy.ndarray

    Chi-squared values

  • If return_vals=True – Also includes scale, av, rv, icov_sar arrays

See also

logpost_grid

Compute log-posteriors from likelihoods

_optimize_fit_mag

Magnitude-space optimization

_optimize_fit_flux

Flux-space optimization

Notes

This method optimizes (distance, Av, Rv) for each grid point by:

  1. Initial magnitude-space fit for numerical stability

  2. Iterative flux-space refinement until convergence

  3. Optional parallax constraint during optimization

The optimization uses priors on Av and Rv but not on distance/scale (distance priors are applied in logpost_grid).

logpost_grid(results, parallax=None, parallax_err=None, coord=None, Nmc_prior=100, lnprior=None, wt_thresh=0.001, cdf_thresh=0.002, max_models=50000, precision_shrinkage=0.0, subsample_mode='representative', lngalprior=None, lndustprior=None, dustfile=None, dlabels=None, avlim=(0.0, 20.0), rvlim=(1.0, 8.0), mem_lim=8000.0, rstate=None, apply_av_prior=True, R_solar=8.2, Z_solar=0.025, **kwargs)[source]#

Compute log-posterior over the stellar model grid.

This is a wrapper around the module-level logpost_grid function.

Parameters:
  • results (tuple) – Results from loglike_grid with return_vals=True.

  • logpost_grid. (Other parameters are passed to)

Return type:

Results from logpost_grid.

fit(data, data_err, data_mask, data_labels, save_file, phot_offsets=None, parallax=None, parallax_err=None, Nmc_prior=50, avlim=(0.0, 20.0), av_gauss=None, rvlim=(1.0, 8.0), rv_gauss=(3.32, 0.18), lnprior=None, wt_thresh=0.001, cdf_thresh=0.002, Ndraws=250, apply_agewt=True, apply_grad=True, lngalprior=None, lndustprior=None, dustfile=None, apply_dlabels=True, data_coords=None, logl_dim_prior=True, ltol=0.03, ltol_subthresh=0.01, logl_initthresh=0.005, mag_max=50.0, merr_max=0.25, rstate=None, save_dar_draws=True, running_io=True, mem_lim=8000.0, max_models=50000, precision_shrinkage=0.0, subsample_mode='representative', verbose=True, R_solar=8.2, Z_solar=0.025)[source]#

Fit all input models to the input data to compute log-posteriors.

This is the main interface for fitting stellar parameters using grid-based Bayesian inference. Results are saved to an HDF5 file.

Parameters:
  • data (numpy.ndarray of shape (Ndata, Nfilt)) – Observed flux densities for each object.

  • data_err (numpy.ndarray of shape (Ndata, Nfilt)) – Associated errors on the flux densities.

  • data_mask (numpy.ndarray of shape (Ndata, Nfilt)) – Binary mask (0/1) indicating whether each measurement is valid.

  • data_labels (numpy.ndarray of shape (Ndata, Nlabels)) – Labels for each object to be stored in the output file.

  • save_file (str) – Path to the output HDF5 file. The ‘.h5’ extension will be added if not present.

  • phot_offsets (numpy.ndarray of shape (Nfilt,), optional) – Multiplicative photometric offsets applied to data and errors.

  • parallax (numpy.ndarray of shape (Ndata,), optional) – Parallax measurements in mas for each object.

  • parallax_err (numpy.ndarray of shape (Ndata,), optional) – Errors on parallax measurements. Required if parallax is provided.

  • Nmc_prior (int, optional) – Number of Monte Carlo samples for prior integration. Default is 50.

  • avlim (tuple, optional) – (min, max) bounds on A(V). Default is (0.0, 20.0).

  • av_gauss (tuple, optional) – (mean, std) for Gaussian prior on A(V). If provided, this is used instead of the distance-reddening prior during fitting.

  • rvlim (tuple, optional) – (min, max) bounds on R(V). Default is (1.0, 8.0).

  • rv_gauss (tuple, optional) – (mean, std) for Gaussian prior on R(V). Default is (3.32, 0.18) based on Schlafly et al. (2016).

  • lnprior (numpy.ndarray of shape (Nmodel,), optional) – Log-prior for each model. If not provided, defaults to Kroupa IMF prior on initial mass (for MIST models) or PS1 luminosity function prior (for Bayestar models).

  • wt_thresh (float, optional) – Threshold wt_thresh * max(weight) for model selection. Default is 1e-3.

  • cdf_thresh (float, optional) – CDF threshold for model selection (used if wt_thresh is None). Default is 2e-3.

  • Ndraws (int, optional) – Number of posterior draws to save per object. Default is 250.

  • apply_agewt (bool, optional) – Whether to apply age weighting from MIST models. Default is True.

  • apply_grad (bool, optional) – Whether to apply grid spacing corrections. Default is True.

  • lngalprior (callable, optional) – Galactic structure prior function. If not provided, uses the default Galactic model from Green et al. (2014).

  • lndustprior (callable, optional) – Dust prior function with signature f(avs, dustmap, coord, distance=None). If not provided and dustfile is given, uses logp_extinction.

  • dustfile (str or ~brutus.dust.Bayestar, optional) – 3D dust map for extinction priors. Can be a file path (string) to a Bayestar HDF5 file, which will be loaded automatically, or a pre-loaded Bayestar object.

  • apply_dlabels (bool, optional) – Whether to pass model labels to Galactic prior. Default is True.

  • data_coords (numpy.ndarray of shape (Ndata, 2), optional) – Galactic (l, b) coordinates for each object in degrees. Required if using default Galactic prior.

  • logl_dim_prior (bool, optional) – Whether to apply dimensional correction to log-likelihood. Default is True.

  • ltol (float, optional) – Convergence tolerance for likelihood optimization. Default is 3e-2.

  • ltol_subthresh (float, optional) – Sub-threshold for convergence. Default is 1e-2.

  • logl_initthresh (float, optional) – Initial likelihood threshold for model culling. Default is 5e-3.

  • mag_max (float, optional) – Maximum allowed magnitude for valid data. Default is 50.0.

  • merr_max (float, optional) – Maximum allowed magnitude error. Default is 0.25.

  • rstate (numpy.random.RandomState, optional) – Random state for reproducibility.

  • save_dar_draws (bool, optional) – Whether to save distance, A(V), and R(V) draws. Default is True.

  • running_io (bool, optional) – If True, writes results to disk after each object (safer for long runs). If False, accumulates results in memory and writes at end (faster for slow filesystems). Default is True.

  • mem_lim (float, optional) – Memory limit in MB for Monte Carlo sampling. Default is 8000.0.

  • verbose (bool, optional) – Whether to print progress to stderr. Default is True.

  • max_models (int, optional) – When the number of models selected for an object exceeds this, the models are subsampled (see subsample_mode) to bound memory and runtime. Default is 50000.

  • precision_shrinkage (float, optional) – Fractional shrinkage applied to the off-diagonal terms of the 3x3 (scale, A(V), R(V)) precision matrix before Monte Carlo sampling, which stabilizes strongly-correlated fits. 0 disables it; ~0.03 is a reasonable nonzero value. Default is 0.0.

  • subsample_mode (str, optional) – Strategy used to thin models when the selection exceeds max_models: 'representative' (Gumbel-max sampling weighted by likelihood) or 'topk' (the highest-likelihood models). Default is 'representative'.

  • R_solar (float, optional) – Solar galactocentric radius in kpc, used by the Galactic structure prior. Default is 8.2.

  • Z_solar (float, optional) – Solar height above the Galactic midplane in kpc, used by the Galactic structure prior. Default is 0.025.

Returns:

Path to the output HDF5 file.

Return type:

str

Notes

The output HDF5 file contains the following datasets:

  • labels: Object labels (Ndata, Nlabels)

  • model_idx: Resampled model indices (Ndata, Ndraws)

  • ml_scale: Maximum likelihood scale factors (Ndata, Ndraws)

  • ml_av: Maximum likelihood A(V) values (Ndata, Ndraws)

  • ml_rv: Maximum likelihood R(V) values (Ndata, Ndraws)

  • ml_cov_sar: Covariance matrices (Ndata, Ndraws, 3, 3)

  • obj_log_post: Log-posteriors (Ndata, Ndraws)

  • obj_log_evid: Log-evidence per object (Ndata,)

  • obj_chi2min: Minimum chi-squared per object (Ndata,)

  • obj_Nbands: Number of bands used per object (Ndata,)

  • mc_ess: Monte Carlo effective sample size (Ndata, Ndraws). For each posterior draw, the ESS of the MC integration over (distance, Av, Rv) for that draw’s source model. Low ESS indicates poor overlap between the Gaussian proposal and the target posterior.

If save_dar_draws=True, also includes:

  • samps_dist: Distance draws in kpc (Ndata, Ndraws)

  • samps_red: A(V) draws (Ndata, Ndraws)

  • samps_dred: R(V) draws (Ndata, Ndraws)

  • samps_logp: Log-weights for draws (Ndata, Ndraws)

Examples

>>> from brutus.analysis import BruteForce
>>> from brutus.core import StarGrid
>>> from brutus.data import load_models
>>>
>>> # Load model grid
>>> models, labels, params = load_models('grid.h5')
>>> grid = StarGrid(models, labels, params)
>>> fitter = BruteForce(grid)
>>>
>>> # Fit photometry
>>> results_file = fitter.fit(
...     phot, phot_err, phot_mask, obj_labels,
...     save_file='results.h5',
...     parallax=parallax, parallax_err=parallax_err,
...     data_coords=coords
... )
__repr__()[source]#

Return string representation of BruteForce object.

Population Analysis#

brutus.analysis.isochrone_population_loglike(theta, stellarpop, obs_phot, obs_err, parallax=None, parallax_err=None, cluster_prob=0.95, dim_prior=True, outlier_model_func=None, smf_grid=None, eep_grid=None, mini_bound=0.08, eep_binary_max=480.0, return_components=False, mask=None, **outlier_kwargs)[source]#

Compute log-likelihood for coeval stellar population using isochrone fitting.

Uses the mathematically correct mixture-before-marginalization approach: 1. Generate isochrone grid over (mass, SMF) 2. Compute cluster and outlier likelihoods at each grid point 3. Apply mixture model at each grid point 4. Marginalize over (mass, SMF) with proper geometric factors 5. Sum over all objects

Parameters:
  • theta (array-like, shape (6,)) – Population parameters: [feh, loga, av, rv, dist, field_frac] where field_frac is the field contamination fraction (0 to 1).

  • stellarpop (StellarPop object) – StellarPop model from core.populations module with get_seds method

  • obs_phot (array-like, shape (N_objects, N_filters)) – Observed flux densities in units of 10**(-0.4 * mag)

  • obs_err (array-like, shape (N_objects, N_filters)) – Flux density errors in same units

  • parallax (array-like, shape (N_objects,), optional) – Parallax measurements (mas)

  • parallax_err (array-like, shape (N_objects,), optional) – Parallax errors (mas)

  • cluster_prob (float, optional) – Prior probability of cluster membership. Default 0.95

  • dim_prior (bool, optional) – Use chi-square (True) or normal (False) likelihood. Default True

  • outlier_model_func (callable, optional) – Custom outlier model function

  • smf_grid (array-like, optional) – Secondary mass fraction grid for binary modeling

  • eep_grid (array-like, optional) – EEP grid for isochrone evaluation

  • mini_bound (float, optional) – Minimum initial mass for isochrone. Default 0.08

  • eep_binary_max (float, optional) – Maximum EEP for binary modeling. Default 480.0

  • return_components (bool, optional) – Return intermediate results for debugging. Default False

  • mask (array-like, shape (N_objects, N_filters), optional) – Data validity mask

  • **outlier_kwargs (dict) – Additional arguments passed to outlier model

Returns:

  • lnl_total (float) – Total log-likelihood summed over all objects

  • components (dict, optional) – Intermediate results (if return_components=True) containing: - ‘lnl_total’: same as primary return value - ‘lnl_per_object’: array of per-object likelihoods - ‘isochrone_grid’: the generated grid dictionary - ‘lnl_cluster’: cluster likelihood array - ‘lnl_outlier’: outlier likelihood array - ‘lnl_mixture’: mixed likelihood array

See also

generate_isochrone_population_grid

Step 1 - Grid generation

compute_isochrone_cluster_loglike

Step 2 - Cluster likelihood

compute_isochrone_outlier_loglike

Step 3 - Outlier likelihood

apply_isochrone_mixture_model

Step 4 - Mixture model

marginalize_isochrone_grid

Step 5 - Marginalization

brutus.core.populations.StellarPop

Stellar population model

Notes

This function implements the complete mixture-before-marginalization workflow:

\[\begin{split}\\ln L(\\theta) = \\sum_i \\ln \\left[ \\int \\int \\left( w_c P_c(d_i|m, s, \\theta) + w_o P_o(d_i) \\right) dm \\, ds \\right]\end{split}\]

where: - \(\\theta = [{\\rm Fe/H}, \\log {\\rm age}, A_V, R_V, d, f_{\\rm field}]\) are population parameters - \(m\) is stellar mass - \(s\) is secondary mass fraction (SMF) - \(w_c, w_o\) are mixture weights - \(P_c, P_o\) are cluster and outlier likelihoods - \(d_i\) is data for object i

The function is designed for use with MCMC samplers like emcee or nested sampling codes like dynesty. It handles errors gracefully by returning -∞ for failed computations.

Examples

Use with emcee for MCMC sampling:

>>> def lnprob(theta):
...     if not in_prior_bounds(theta):
...         return -np.inf
...     return isochrone_population_loglike(theta, pop, flux, err)
>>>
>>> sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob)
>>> sampler.run_mcmc(p0, nsteps)

Extract intermediate results for diagnostics:

>>> lnl, components = isochrone_population_loglike(
...     theta, pop, flux, err, return_components=True
... )
>>> print(f"Per-object likelihoods: {components['lnl_per_object']}")
brutus.analysis.generate_isochrone_population_grid(stellarpop, feh, loga, av, rv, dist, smf_grid=None, eep_grid=None, mini_bound=0.08, eep_binary_max=480.0, corr_params=None)[source]#

Generate isochrone population grid over (mass, SMF) parameter space.

Parameters:
  • stellarpop (StellarPop object) – StellarPop model from core.populations module with get_seds method

  • feh (float) – Stellar population parameters (metallicity, log age, extinction, distance)

  • loga (float) – Stellar population parameters (metallicity, log age, extinction, distance)

  • av (float) – Stellar population parameters (metallicity, log age, extinction, distance)

  • rv (float) – Stellar population parameters (metallicity, log age, extinction, distance)

  • dist (float) – Stellar population parameters (metallicity, log age, extinction, distance)

  • smf_grid (array-like, optional) – Secondary mass fraction grid. Default is 21 uniform points from 0.0 to 1.0

  • eep_grid (array-like, optional) – EEP grid for isochrone evaluation. Default is 1000 points from 202 to 808

  • mini_bound (float, optional) – Minimum initial mass for evaluation. Default 0.08 solar masses

  • eep_binary_max (float, optional) – Maximum EEP for binary modeling. Default 480.0

  • corr_params (array-like, optional) – Empirical correction parameters [dtdm, drdm, msto_smooth, feh_scale]

Returns:

grid – Dictionary containing: - ‘photometry’: array, shape (N_total_points, N_filters) - model photometry - ‘masses’: array, shape (N_total_points,) - stellar masses - ‘smf_values’: array, shape (N_total_points,) - SMF values for each point - ‘mass_jacobians’: array, shape (N_total_points,) - mass grid spacing - ‘smf_jacobians’: array, shape (N_total_points,) - SMF grid spacing - ‘grid_info’: dict with SMF grid structure information

Return type:

dict

See also

compute_isochrone_cluster_loglike

Use this grid for likelihood computation

StellarPop.get_seds

Underlying SED generation

Notes

The grid is constructed by:

  1. Looping over SMF values (binary mass ratios)

  2. For each SMF, computing isochrone along EEP dimension

  3. Extracting masses from the isochrone

  4. Computing geometric jacobians (grid spacings) for proper integration

  5. Filtering invalid models (NaN photometry from impossible binaries)

The jacobians are critical for proper marginalization - they represent the geometric factors \(dm\) and \(d({\\rm SMF})\) in the integral:

\[\begin{split}P({\\rm data}) = \\int \\int P({\\rm data}|m, {\\rm SMF}) \\, dm \\, d({\\rm SMF})\end{split}\]

Binary models are only computed for EEP ≤ eep_binary_max (typically main sequence) to avoid unphysical binary configurations.

brutus.analysis.compute_isochrone_cluster_loglike(obs_flux, obs_err, isochrone_grid, parallax=None, parallax_err=None, distance=None, dim_prior=True, mask=None)[source]#

Compute cluster membership likelihood using existing photometry infrastructure.

Parameters:
  • obs_flux (array-like, shape (N_objects, N_filters)) – Observed flux densities

  • obs_err (array-like, shape (N_objects, N_filters)) – Flux errors

  • isochrone_grid (dict) – Isochrone grid from generate_isochrone_population_grid()

  • parallax (array-like, shape (N_objects,), optional) – Parallax measurements (mas)

  • parallax_err (array-like, shape (N_objects,), optional) – Parallax errors (mas)

  • distance (float, optional) – Population distance (pc). Required if parallax provided

  • dim_prior (bool, optional) – Whether to use chi-square (True) or normal (False) likelihood

  • mask (array-like, shape (N_objects, N_filters), optional) – Data mask (1=use, 0=skip)

Returns:

lnl_cluster – Cluster membership log-likelihood for each grid point and object. Invalid models (NaN photometry) are assigned NaN likelihood.

Return type:

array-like, shape (N_grid_points, N_objects)

See also

brutus.utils.photometry.phot_loglike

Photometric likelihood function

compute_isochrone_outlier_loglike

Complementary outlier likelihood

generate_isochrone_population_grid

Creates the isochrone_grid input

Notes

The likelihood includes both photometric and parallax components:

\[\begin{split}\\ln L_{\\rm cluster} = \\ln L_{\\rm phot} + \\ln L_{\\rm parallax}\end{split}\]

where the photometric likelihood uses either chi-square (dim_prior=True) or Gaussian (dim_prior=False) formulation.

Invalid models with NaN photometry are preserved as NaN in the output to be properly handled during marginalization (they contribute zero probability via logsumexp).

brutus.analysis.compute_isochrone_outlier_loglike(obs_flux, obs_err, isochrone_grid=None, parallax=None, parallax_err=None, dim_prior=True, outlier_model_func=None, **outlier_kwargs)[source]#

Compute outlier likelihood with stellar-parameter-aware interface.

Parameters:
  • obs_flux (array-like, shape (N_objects, N_filters)) – Observed flux densities

  • obs_err (array-like, shape (N_objects, N_filters)) – Flux errors

  • isochrone_grid (dict, optional) – Isochrone grid containing stellar parameters for potential dependence

  • parallax (array-like, shape (N_objects,), optional) – Parallax measurements (mas)

  • parallax_err (array-like, shape (N_objects,), optional) – Parallax errors (mas)

  • dim_prior (bool, optional) – Use chi-square (True) or uniform (False) outlier model

  • outlier_model_func (callable, optional) – Custom outlier model function

  • **outlier_kwargs (dict) – Additional arguments for outlier model

Returns:

lnl_outlier – Outlier likelihood for each grid point and object

Return type:

array-like, shape (N_grid_points, N_objects)

brutus.analysis.apply_isochrone_mixture_model(lnl_cluster, lnl_outlier, cluster_prob, field_fraction)[source]#

Apply mixture model at each grid point: mixture before marginalization.

For each (grid_point, object) pair: P(data|mass,SMF) = P_cluster * P(data|cluster) + P_outlier * P(data|outlier)

Parameters:
  • lnl_cluster (array-like, shape (N_grid_points, N_objects)) – Cluster membership likelihoods

  • lnl_outlier (array-like, shape (N_grid_points, N_objects)) – Outlier model likelihoods

  • cluster_prob (float) – Prior probability of cluster membership (external)

  • field_fraction (float) – Fraction of cluster stars that are field contaminants (fitted parameter)

Returns:

lnl_mixture – Mixed log-likelihoods for each grid point and object

Return type:

array-like, shape (N_grid_points, N_objects)

See also

marginalize_isochrone_grid

Next step after mixture application

compute_isochrone_cluster_loglike

Cluster likelihood component

compute_isochrone_outlier_loglike

Outlier likelihood component

Notes

The mixture model is applied as:

\[\begin{split}P({\\rm data}|m, {\\rm SMF}) = w_c \\cdot P_c + w_o \\cdot P_o\end{split}\]

where: - \(w_c = P_{\\rm cluster} \\cdot (1 - f_{\\rm field})\) - \(w_o = 1 - w_c\) - \(P_{\\rm cluster}\) is the prior probability (cluster_prob) - \(f_{\\rm field}\) is the field contamination fraction

This is computed in log-space using logsumexp for numerical stability:

\[\begin{split}\\ln L_{\\rm mix} = \\ln(\\exp(\\ln L_c + \\ln w_c) + \\exp(\\ln L_o + \\ln w_o))\end{split}\]

The key distinction from traditional approaches is that this mixture is applied before marginalization over stellar parameters, which is mathematically correct for contaminated populations.

brutus.analysis.marginalize_isochrone_grid(lnl_mixture, mass_jacobians, smf_jacobians)[source]#

Marginalize mixed likelihoods over (mass, SMF) grid with geometric jacobians.

Performs: P(data|population_params) = ∫∫ P(data|mass,SMF) dm d(SMF)

Parameters:
  • lnl_mixture (array-like, shape (N_grid_points, N_objects)) – Mixed likelihoods at each grid point

  • mass_jacobians (array-like, shape (N_grid_points,)) – Mass grid spacing (geometric factors for integration)

  • smf_jacobians (array-like, shape (N_grid_points,)) – SMF grid spacing (geometric factors for integration)

Returns:

lnl_marginalized – Marginalized log-likelihoods for each object

Return type:

array-like, shape (N_objects,)

See also

apply_isochrone_mixture_model

Previous step before marginalization

generate_isochrone_population_grid

Provides the jacobians

Notes

Performs the integration:

\[\begin{split}P({\\rm data}|\\theta) = \\int \\int P({\\rm data}|m, {\\rm SMF}, \\theta) \\, dm \\, d({\\rm SMF})\end{split}\]

numerically using a grid-based approach:

\[\begin{split}\\ln P \\approx \\ln \\sum_{i,j} \\exp(\\ln L_{i,j}) \\cdot \\Delta m_i \\cdot \\Delta({\\rm SMF})_j\end{split}\]

where: - \(\\ln L_{i,j}\) is the mixed likelihood at grid point (i,j) - \(\\Delta m_i\) is the mass grid spacing (mass_jacobians) - \(\\Delta({\\rm SMF})_j\) is the SMF grid spacing (smf_jacobians)

Invalid models (with NaN likelihood) are converted to -∞ before the logsumexp operation, so they contribute zero probability.

The jacobians represent geometric integration weights and are crucial for obtaining unbiased parameter estimates.

Photometric Offsets#

class brutus.analysis.PhotometricOffsetsConfig(min_bands_used: int = 4, min_bands_unused: int = 3, n_bootstrap: int = 300, uncertainty_method: str = 'bootstrap_iqr', progress_interval: int = 10, use_vectorized_bootstrap: bool = True, random_seed: int | None = None, validate_inputs: bool = True)[source]#

Bases: object

Configuration class for photometric offsets computation.

This class encapsulates all configuration parameters and provides sensible defaults with the ability to customize behavior.

Parameters:
  • min_bands_used (int, optional) – Minimum number of bands required for objects where the current band was used in fitting. Default is 4 (equivalent to >3+1).

  • min_bands_unused (int, optional) – Minimum number of bands required for objects where the current band was not used in fitting. Default is 3.

  • n_bootstrap (int, optional) – Number of bootstrap realizations for uncertainty estimation. Default is 300.

  • uncertainty_method (str, optional) – Method for uncertainty estimation. Options: - ‘bootstrap_std’: Standard deviation of bootstrap medians - ‘bootstrap_iqr’: Scaled interquartile range of bootstrap medians Default is ‘bootstrap_iqr’.

  • progress_interval (int, optional) – Print progress every N iterations. Set to 0 for no progress. Default is 10.

  • use_vectorized_bootstrap (bool, optional) – Use vectorized bootstrap implementation for better performance. Default is True.

  • random_seed (int, optional) – Random seed for reproducible results. Default is None.

  • validate_inputs (bool, optional) – Perform input validation. Default is True.

See also

photometric_offsets

Main function using this configuration

Notes

The configuration defaults are chosen to balance statistical robustness with computational efficiency:

  • min_bands_used=4: Ensures robust likelihood reweighting

  • min_bands_unused=3: Minimum for meaningful photometric constraints

  • n_bootstrap=300: Sufficient for stable uncertainty estimates

  • bootstrap_iqr: More robust to outliers than standard deviation

Examples

>>> config = PhotometricOffsetsConfig(
...     min_bands_used=5,
...     n_bootstrap=500,
...     random_seed=42
... )
>>> offsets, errors, n_used = photometric_offsets(
...     phot, err, mask, models, idxs, avs, rvs, dists,
...     config=config
... )
__init__(min_bands_used: int = 4, min_bands_unused: int = 3, n_bootstrap: int = 300, uncertainty_method: str = 'bootstrap_iqr', progress_interval: int = 10, use_vectorized_bootstrap: bool = True, random_seed: int | None = None, validate_inputs: bool = True)[source]#
brutus.analysis.photometric_offsets(phot: ndarray, err: ndarray, mask: ndarray, models: ndarray, idxs: ndarray, reds: ndarray, dreds: ndarray, dists: ndarray, sel: ndarray | None = None, weights: ndarray | None = None, mask_fit: ndarray | None = None, old_offsets: ndarray | None = None, dim_prior: bool = True, prior_mean: ndarray | None = None, prior_std: ndarray | None = None, verbose: bool = True, config: PhotometricOffsetsConfig | None = None, rng: Generator | None = None) Tuple[ndarray, ndarray, ndarray][source]#

Compute multiplicative photometric offsets between data and models.

This function computes photometric offsets that account for systematic differences between observed photometry and model predictions. The offsets are computed by comparing model/data flux ratios across a sample of objects, with proper uncertainty estimation and optional prior constraints.

Parameters:
  • phot (np.ndarray of shape (n_objects, n_filters)) – Observed flux densities for all objects.

  • err (np.ndarray of shape (n_objects, n_filters)) – Associated flux errors for all objects.

  • mask (np.ndarray of shape (n_objects, n_filters)) – Binary mask (0/1) indicating observed bands for each object.

  • models (np.ndarray of shape (n_models, n_filters, n_coeffs)) – Magnitude polynomial coefficients for generating reddened photometry.

  • idxs (np.ndarray of shape (n_objects, n_samples)) – Model indices fit to each object.

  • reds (np.ndarray of shape (n_objects, n_samples)) – A(V) reddening values for each object and sample.

  • dreds (np.ndarray of shape (n_objects, n_samples)) – R(V) reddening curve shape values for each object and sample.

  • dists (np.ndarray of shape (n_objects, n_samples)) – Distance values (kpc) for each object and sample.

  • sel (np.ndarray of shape (n_objects), optional) – Boolean selection of objects to use. Default uses all objects.

  • weights (np.ndarray of shape (n_objects, n_samples), optional) – Sample weights for each object. Default uses uniform weights.

  • mask_fit (np.ndarray of shape (n_filters), optional) – Boolean mask indicating which filters were used in fitting. Default assumes all filters were used.

  • old_offsets (np.ndarray of shape (n_filters), optional) – Previous offsets to remove before computing new ones. Default is no previous offsets.

  • dim_prior (bool, optional) – Whether to apply dimensionality prior in likelihood reweighting. Default is True.

  • prior_mean (np.ndarray of shape (n_filters), optional) – Gaussian prior means for offsets. Must be provided with prior_std.

  • prior_std (np.ndarray of shape (n_filters), optional) – Gaussian prior standard deviations for offsets.

  • verbose (bool, optional) – Whether to print progress information. Default is True.

  • config (PhotometricOffsetsConfig, optional) – Configuration object with analysis parameters. Default uses standard configuration.

  • rng (np.random.Generator, optional) – Random number generator for reproducible results. Default creates new generator.

Returns:

  • offsets (np.ndarray of shape (n_filters)) – Multiplicative photometric offsets (model/data ratios).

  • offset_errors (np.ndarray of shape (n_filters)) – Uncertainties on the photometric offsets.

  • n_objects_used (np.ndarray of shape (n_filters)) – Number of objects used to compute each offset.

Examples

>>> import numpy as np
>>> from brutus.analysis.offsets import photometric_offsets
>>>
>>> # Mock data for demonstration
>>> n_obj, n_filt, n_samp = 100, 5, 50
>>> phot = np.random.uniform(0.1, 10, (n_obj, n_filt))
>>> err = 0.1 * phot
>>> mask = np.random.choice([0, 1], (n_obj, n_filt), p=[0.1, 0.9])
>>>
>>> # Mock fitted parameters
>>> models = np.random.random((1000, n_filt, 3))
>>> idxs = np.random.randint(0, 1000, (n_obj, n_samp))
>>> reds = np.random.uniform(0, 2, (n_obj, n_samp))
>>> dreds = np.random.uniform(2.5, 4.5, (n_obj, n_samp))
>>> dists = np.random.uniform(0.1, 10, (n_obj, n_samp))
>>>
>>> # Compute offsets
>>> offsets, errors, n_used = photometric_offsets(
...     phot, err, mask, models, idxs, reds, dreds, dists
... )
>>> print(f"Computed offsets: {offsets}")

See also

PhotometricOffsetsConfig

Configuration options

brutus.core.sed_utils.get_seds

Model SED generation

brutus.utils.photometry.phot_loglike

Likelihood computation

brutus.analysis.individual.BruteForce

Source of fitted parameters

Notes

The photometric offset for each band is computed as:

  1. Generate model SEDs for all fitted objects and posterior samples

  2. Scale by distance: \(F_{\\rm model} = F_0 / d^2\)

  3. Compute flux ratios: \(r = F_{\\rm model} / F_{\\rm obs}\)

  4. Reweight samples: For bands used in fitting, recompute \(P(M|D_{-i})\) excluding band i to avoid circularity

  5. Bootstrap: Resample objects and models with weights, compute median

  6. Uncertainty: From bootstrap distribution (IQR or std)

  7. Apply priors (optional): Bayesian combination with prior

The reweighting in step 4 is critical: if a band was used in the original fit, including it in offset computation would create a circular dependency. Instead, we recompute posteriors excluding that band.

The offsets should be applied as:

\[\begin{split}F_{\\rm corrected} = F_{\\rm observed} \\times {\\rm offset}\end{split}\]

For iterative refinement, provide old_offsets from previous iteration.

References

The bootstrap methodology follows standard non-parametric uncertainty estimation. The likelihood reweighting approach ensures unbiased estimates for bands included in the original fit.

Line-of-Sight Dust#

brutus.analysis.los_clouds_priortransform(u, rlims=(0.0, 6.0), dlims=(4.0, 19.0), pb_params=(-3.0, 0.7, -inf, 0.0), s_params=(-3.0, 0.3, -inf, 0.0), dust_template=False, nlims=(0.2, 2))[source]#

Transform unit cube samples to physical parameters for LOS dust fitting.

The “prior transform” for the LOS fit that converts from draws on the N-dimensional unit cube to samples from the prior. Used in nested sampling methods. Assumes uniform priors for distance and reddening and a (truncated) log-normal in outlier fraction.

Parameters:
  • u (array_like, shape (Nparams,)) – The Nparams values drawn from the unit cube. Contains the portion of outliers P_b, followed by the foreground smoothing sfore and background smoothing sback, followed by the foreground reddening fred, followed by a series of (dist, red) pairs for each “cloud” along the LOS.

  • rlims (tuple of float, optional) – The reddening bounds within which we’d like to sample. Default is (0., 6.), which assumes reddening is in units of A_V.

  • dlims (tuple of float, optional) – The distance bounds within which we’d like to sample. Default is (4., 19.), which assumes distance is in units of distance modulus.

  • pb_params (tuple of float, optional) – Mean, standard deviation, lower bound, and upper bound for a truncated log-normal distribution used as a prior for the outlier model. The default is (-3., 0.7, -np.inf, 0.), which corresponds to a mean of 0.05, a standard deviation of a factor of 2, a lower bound of 0, and an upper bound of 1.

  • s_params (tuple of float, optional) – Mean, standard deviation, lower bound, and upper bound for a truncated log-normal distribution used as a prior for the smoothing along the reddening axis (in %). The default is (-3., 0.3, -np.inf, 0.), which corresponds to a mean of 0.05, a standard deviation of a factor of 1.35, a lower bound of 0, and an upper bound of 1.

  • dust_template (bool, optional) – Whether or not to use a spatial distribution for the dust based on a particular template. If true, dust along the line of sight will be in terms of rescalings of the template rather than A_V. Default is False.

  • nlims (tuple of float, optional) – Lower and upper bounds for the uniform prior for the rescaling applied to the Planck spatial reddening template. Default is (0.2, 2.).

Returns:

x – The transformed physical parameters in order: [pb, s_fore, s_back, A_V_fore, d1, A_V1, d2, A_V2, …] where di are sorted in increasing order.

Return type:

ndarray, shape (Nparams,)

See also

los_clouds_loglike_samples

Likelihood function for these parameters

kernel_gauss

Gaussian smoothing kernel

kernel_lorentz

Lorentzian smoothing kernel

Notes

The prior distributions are:

  • Outlier fraction (P_b): Truncated log-normal with median ~0.05

  • Smoothing (s_fore, s_back): Truncated log-normal with median ~0.05

  • Foreground extinction: Uniform over rlims

  • Cloud distances: Uniform over dlims, sorted in increasing order

  • Cloud extinctions: Uniform over rlims, ordered by distance

The sorting ensures clouds are ordered by increasing distance, which is required for the cumulative extinction calculation.

The log-normal priors on outlier fraction and smoothing concentrate probability near zero while allowing occasional larger values.

Examples

>>> import numpy as np
>>> # For 1-cloud model: pb, s0, s, fred, dist1, red1 (6 parameters)
>>> u = np.random.uniform(0, 1, 6)
>>> params = los_clouds_priortransform(u)
>>> print(f"Outlier fraction: {params[0]:.3f}")
>>> print(f"Cloud distance (DM): {params[4]:.1f}")
>>> print(f"Cloud extinction (A_V): {params[5]:.2f}")
brutus.analysis.los_clouds_loglike_samples(theta, dsamps, rsamps, kernel='gauss', rlims=(0.0, 6.0), template_reds=None, Ndraws=25, additive_foreground=False, monotonic=True)[source]#

Compute log-likelihood for multi-cloud extinction model along line-of-sight.

Compute the log-likelihood for the cumulative reddening along the line of sight (LOS) parameterized by theta, given a set of input reddening and distance draws. Assumes a uniform outlier model in distance and reddening across our binned posteriors.

Parameters:
  • theta (array_like, shape (Nparams,)) – A collection of parameters that characterizes the cumulative reddening along the LOS. Contains the fraction of outliers P_b followed by the fractional reddening smoothing for the foreground s0 and background s followed by the foreground reddening fred followed by a series of (dist, red) pairs for each “cloud” along the LOS.

  • dsamps (array_like, shape (Nobj, Nsamps)) – Distance samples for each object. Follows the units used in theta.

  • rsamps (array_like, shape (Nobj, Nsamps)) – Reddening samples for each object. Follows the units in theta.

  • kernel (str or callable, optional) – The kernel used to weight the samples along the LOS. If a string is passed, a pre-specified kernel will be used. Options include ‘lorentz’, ‘gauss’, and ‘tophat’. Default is ‘gauss’.

  • rlims (tuple of float, optional) – The reddening bounds within which we’d like to sample. Default is (0., 6.), which assumes reddening is in units of A_V.

  • template_reds (array_like, shape (Nobj,), optional) – Reddenings for each star based on a spatial dust template. If not provided, the same reddening value in a given distance bin will be fit to all stars. If provided, a rescaled version of the individual reddenings will be fit instead.

  • Ndraws (int, optional) – The number of draws to use for each star. Default is 25.

  • additive_foreground (bool, optional) – Whether the foreground is treated as just another value or added to all background values. Default is False.

  • monotonic (bool, optional) – Whether to enforce monotonicity in the fits so that the values must get larger with distance. Default is True.

Returns:

loglike – The computed log-likelihood.

Return type:

float

Examples

>>> import numpy as np
>>> # Generate synthetic stellar samples
>>> nstars = 10
>>> nsamps = 50
>>> dsamps = np.random.uniform(6, 12, (nstars, nsamps))  # distance modulus
>>> rsamps = np.random.uniform(0, 2, (nstars, nsamps))   # A_V extinction
>>>
>>> # 1-cloud model parameters: [pb, s0, s, fred, dist1, red1]
>>> theta = [0.1, 0.05, 0.05, 0.2, 8.0, 0.5]
>>>
>>> # Compute likelihood
>>> loglike = los_clouds_loglike_samples(theta, dsamps, rsamps)
>>> print(f"Log-likelihood: {loglike:.2f}")
brutus.analysis.kernel_tophat(reds, kp)[source]#

Compute weighted log-probabilities using a Top-Hat kernel.

The kernel defines uniform dispersion of extinction values around the cloud mean: samples within mean +/- half_bin_width are equally likely, and samples outside are assigned zero probability.

Parameters:
  • reds (array_like, shape (Nsamps,)) – Reddening samples for each object.

  • kp (tuple of float) – The kernel parameters (mean, half_bin_width).

Returns:

logw – Log-weights for each sample.

Return type:

ndarray, shape (Nsamps,)

Examples

>>> import numpy as np
>>> reds = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
>>> kp = (0.5, 0.2)  # mean=0.5, half-width=0.2, so range [0.3, 0.7]
>>> logw = kernel_tophat(reds, kp)
>>> # Samples within [0.3, 0.7] should have higher weights
brutus.analysis.kernel_gauss(reds, kp)[source]#

Compute weighted log-probabilities using a Gaussian kernel.

The kernel defines Gaussian dispersion of extinction values around the cloud mean: samples are weighted by a normal distribution centered on the mean extinction with the given standard deviation.

Parameters:
  • reds (array_like, shape (Nsamps,)) – Reddening samples for each object.

  • kp (tuple of float) – The kernel parameters (mean, standard_deviation).

Returns:

logw – Log-weights for each sample.

Return type:

ndarray, shape (Nsamps,)

Examples

>>> import numpy as np
>>> reds = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
>>> kp = (0.5, 0.1)  # mean=0.5, std=0.1
>>> logw = kernel_gauss(reds, kp)
>>> # Sample at 0.5 should have highest weight
brutus.analysis.kernel_lorentz(reds, kp)[source]#

Compute weighted log-probabilities using a Lorentzian kernel.

The kernel defines heavy-tailed dispersion of extinction values around the cloud mean: samples are weighted by a Cauchy/Lorentzian distribution centered on the mean extinction, allowing larger deviations from the mean than a Gaussian kernel.

Parameters:
  • reds (array_like, shape (Nsamps,)) – Reddening samples for each object.

  • kp (tuple of float) – The kernel parameters (mean, HWHM) where HWHM is the half-width at half-maximum.

Returns:

logw – Log-weights for each sample.

Return type:

ndarray, shape (Nsamps,)

Examples

>>> import numpy as np
>>> reds = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
>>> kp = (0.5, 0.1)  # mean=0.5, HWHM=0.1
>>> logw = kernel_lorentz(reds, kp)
>>> # Sample at 0.5 should have highest weight, with heavy tails