Long-time readers of this blog will know that I’ve been bigging up the potential of Bayesian optimisation for fitting expensive astronomical simulations for many years now. In particular, when each likelihood evaluation is very expensive, and possibly stochastic because some kind of crazy physics simulator has to run at the back end, it makes sense to approximate the posterior iteratively using Bayesian optimisation. And if you’re going to do that you should learn from the wealth of experience in the statistics community (and in other fields doing applied statistics), because, yeah, this problem is not unique to astronomy.
Nevertheless I was not surprised to see a recent arxival (which pointed to another that I’d missed and which has the same flaws) in which Bayesian optimisation is presented as if it were a newly developed methodology invented by the authors. And since they don’t cite any of the relevant literature it also isn’t a surprise to see them fall into all the traps that could have been avoided. For instance, we know that the squared exponential kernel is a shitty choice for emulating a log-likelihood surface, that homoskedastic noise is a shitty assumption, that it all works a lot better if we throw in a linear kernel built from linear and quadratic versions of the input parameters, and that if (as we do in cosmology) we have some low-cost-but-not-perfect versions of the physical simulators then we should throw those into the mix with multi-fidelity Bayesian optimisation. I would also suggest that for problems like recovering the CMB posterior there are gains to be made in building separate GPs for each of the log-likelihood contributions of the different observational components (e.g. angular power spectrum, polarization, etc.), since each is informative about only a smaller subset of dimensions and can be optimally emulated with an automatic relevance determination kernel.
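To make a few of those ingredients concrete, here is a minimal NumPy sketch of a GP emulator for a toy log-likelihood. It is an illustration under my own assumptions, not anybody's production code: the Matérn-5/2 kernel is just one common alternative to the squared exponential, the toy Gaussian log-likelihood and its noise model are made up, the hyperparameters are fixed by hand rather than learnt, and the per-point noise variances are treated as known (in practice you would estimate them, e.g. from repeat simulator runs). All function names are hypothetical.

```python
import numpy as np

def matern52_ard(X1, X2, lengthscales, variance):
    # Pairwise distances scaled by one length-scale per input dimension (ARD).
    d = (X1[:, None, :] - X2[None, :, :]) / lengthscales
    r = np.sqrt(np.sum(d**2, axis=-1))
    s = np.sqrt(5.0) * r
    return variance * (1.0 + s + s**2 / 3.0) * np.exp(-s)

def quad_features(X):
    # Constant, linear and (diagonal) quadratic terms of the input parameters.
    return np.hstack([np.ones((X.shape[0], 1)), X, X**2])

def kernel(X1, X2, lengthscales, variance, basis_variance):
    # Matern term for local structure plus a linear kernel on the quadratic
    # features (equivalent to a Gaussian prior on the regression weights).
    K_basis = basis_variance * quad_features(X1) @ quad_features(X2).T
    return matern52_ard(X1, X2, lengthscales, variance) + K_basis

def gp_predict(X, y, noise_var, Xs, **hyp):
    # noise_var is a vector: heteroskedastic noise, one variance per run.
    K = kernel(X, X, **hyp) + np.diag(noise_var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = kernel(Xs, X, **hyp)
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(kernel(Xs, Xs, **hyp)) - np.sum(v**2, axis=0)
    return mean, var

def toy_loglike(X):
    # Made-up Gaussian log-likelihood: tight in dimension 0, broad in dimension 1.
    return -0.5 * ((X[:, 0] / 0.5) ** 2 + (X[:, 1] / 2.0) ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(40, 2))
truth = toy_loglike(X)
noise_sd = 0.05 + 0.02 * np.abs(truth)   # assumed: noisier far from the peak
y = truth + noise_sd * rng.standard_normal(len(truth))

Xs = np.array([[0.0, 0.0], [1.0, -1.0]])
truth_s = toy_loglike(Xs)
mean, var = gp_predict(
    X, y, noise_sd**2, Xs,
    lengthscales=np.array([1.0, 3.0]),   # hand-picked, not optimised
    variance=1.0,
    basis_variance=100.0,
)
print(np.c_[truth_s, mean, var])
```

The quadratic basis does most of the work here because a near-Gaussian posterior has a near-quadratic log-likelihood, which leaves the Matérn term to mop up the departures from Gaussianity. In a real application you would learn the length-scales and variances by maximising the marginal likelihood, and the acquisition step that chooses the next simulator run is left out entirely.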
To be fair to the authors, one of the problems with our current academic system is that it rewards quantity over quality, so there’s little incentive to make a thorough investigation of a topic if a quick investigation can lead to a quick publication.