A recent arXival presents an alternative method for approximating the posterior of some cosmological parameters given an observational dataset, in a setting where one must also marginalise over a set of nuisance parameters describing the non-linear regime. Previous approaches have run expensive simulations of the cosmology at a set of test cosmological parameters and, attached to each of these, a round of simulations of the observable data under the non-linear regime model at a set of test parameters. An emulator for the conditional mean of the observable data (conditional on both cosmological and non-linear regime parameters) is then built against this collection of mock data and plugged into an MCMC sampler as a proxy model. The new approach instead approximates the marginal likelihood of the observed data at each set of cosmological parameters, marginalising over the non-linear regime part of the model, and then applies an emulator (currently a simple variational approximation) to the posterior in cosmological parameter space.
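In symbols, the target of the new approach can be written as follows. The notation is my own rather than the authors': θ denotes the cosmological parameters, φ the non-linear regime nuisance parameters, d the observed data, and π the priors.

```latex
\mathcal{Z}(\theta) = \int \mathcal{L}(d \mid \theta, \phi)\, \pi(\phi \mid \theta)\, \mathrm{d}\phi,
\qquad
p(\theta \mid d) \propto \pi(\theta)\, \mathcal{Z}(\theta).
```

The emulator is then fitted to the right-hand side as a function of θ alone, rather than to the mock data as a function of both θ and φ.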
This is an approach I’ve encouraged cosmologists to explore in the past, although it wasn’t obvious to me that it would be a guaranteed improvement: in the case I had considered (cosmological covariance matrices) it seemed that the marginal likelihood approximation would still require mock data simulations at each test set of cosmological parameters. In this arXival the authors seem to side-step the need for mock data simulations from the non-linear model component by adopting a so-called tabulation strategy. Since I’m unfamiliar with that particular approach it is unclear to me whether it gives an unbiased approximation to the likelihood or is a somewhat crude estimator, in which case some of the efficiency gain may simply be a sleight-of-hand.
In any case, since I’m again discussing this problem I may as well say the same thing I always say, which is ‘why not go further and do multi-fidelity Bayesian optimisation?’. Estimating the marginal likelihood (using the tabulation trick and nested sampling) takes approximately 0.3 hours to reach a reasonable level of accuracy. Presumably one could get the broad shape of the posterior by running nested sampling with far fewer live particles, and then decide in an automatic manner where to refine the estimates with more live particles. Ditto for the scheduling of further cosmological simulations.
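To make the idea concrete, here is a toy sketch of a two-fidelity refinement, which is a much cruder scheme than proper Bayesian optimisation with an acquisition function. Everything in it is an assumption for illustration: `true_log_z` is a made-up one-dimensional log marginal likelihood, `noisy_log_z` stands in for a nested sampling run by adding Gaussian noise with standard deviation sqrt(H / n_live) (the usual rough scaling of the nested sampling error on log Z, with the information H fixed at an arbitrary value), and the thresholds are arbitrary.

```python
import numpy as np


def true_log_z(theta):
    # Hypothetical log marginal likelihood over one cosmological parameter.
    return -0.5 * ((theta - 0.3) / 0.05) ** 2


def noisy_log_z(theta, n_live, rng, info=4.0):
    # Toy stand-in for a nested sampling run: the error on log Z is taken
    # to scale as sqrt(H / n_live), with H (info) an assumed constant.
    sigma = np.sqrt(info / n_live)
    noise = rng.normal(0.0, sigma, size=np.shape(theta))
    return true_log_z(theta) + noise, sigma


def two_stage_estimate(grid, n_cheap=25, n_fine=1000, keep_within=10.0, seed=0):
    rng = np.random.default_rng(seed)

    # Stage 1: cheap, noisy estimates everywhere on the grid.
    log_z, sigma = noisy_log_z(grid, n_cheap, rng)
    n_live = np.full(grid.shape, n_cheap)

    # Stage 2: refine only the points that could plausibly lie within
    # keep_within log units of the peak, allowing 3 sigma of noise.
    refine = log_z > log_z.max() - keep_within - 3.0 * sigma
    log_z[refine], _ = noisy_log_z(grid[refine], n_fine, rng)
    n_live[refine] = n_fine

    return log_z, n_live, refine


grid = np.linspace(0.0, 1.0, 101)
log_z, n_live, refine = two_stage_estimate(grid)
cost_ratio = n_live.sum() / (grid.size * 1000)  # cost relative to all-fine runs
```

Here `cost_ratio` compares the total number of live particles used against running every grid point at the fine setting. A real implementation would replace the fixed grid and threshold with a surrogate model and an acquisition rule that trades off the cost of each fidelity against the expected gain in information.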