What is pdBIL-MCMC, you ask? Following Drovandi et al. (2014), it's 'parametric data Bayesian indirect likelihood Markov Chain Monte Carlo'; and the unwitting use of it was in a paper on the Milky Way's Initial Mass Function (IMF) appearing on yesterday's astro-ph by Rybizki & Just (2015). By the way, although the term 'unwitting' might sound pejorative because it contains 'un-wit', I really mean it here as a near synonym for 'unintentional'.
The statistical problem here is that the authors have an observed colour-magnitude diagram (CMD) with data coming from multiple surveys, and they would like to use it to tune the parameters of a simulation code they've built to model the data-generating process. To compute a 'likelihood' for the observed data given a set of input model parameters, the authors proceed as follows: first, they divide their CMD into 12 magnitude bins for dwarf stars and 7 for giant stars and tally the number of observed stars in each; second, they suppose that the observed count in each bin is a draw from a Poisson distribution with expectation set by the model; and third, to estimate the model-predicted expectation in each bin, they run their model 400 times and average the 'mock observed' numbers of simulated stars. Incidentally, this average of the mock data is also the maximum likelihood estimator for the underlying Poisson rate.
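The recipe above can be sketched in a few lines of Python; here `simulate` is a hypothetical stand-in for the authors' mock-observation code (returning one mock count per CMD bin), not their actual implementation:

```python
import numpy as np
from scipy.stats import poisson

def estimate_log_likelihood(theta, observed_counts, simulate, n_sims=400, rng=None):
    """Plug-in Poisson log-likelihood for binned CMD counts.

    The Poisson rate in each bin is estimated by the mean over `n_sims`
    mock realisations (the Poisson MLE), then plugged into the Poisson
    pmf evaluated at the observed counts.
    """
    rng = np.random.default_rng(rng)
    mock = np.array([simulate(theta, rng) for _ in range(n_sims)])
    rates = mock.mean(axis=0)           # MLE of the Poisson rate, per bin
    rates = np.maximum(rates, 1e-10)    # guard against bins with no mock stars
    return poisson.logpmf(observed_counts, rates).sum()
```

The `1e-10` floor is my own guard, not something stated in the paper; a bin whose 400 mock realisations are all empty would otherwise give a degenerate rate of zero.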
Readers familiar with ABC and the pseudo-marginal method will recognise the general form of this problem: the likelihood cannot be computed directly given the model parameters, so simulations are needed to generate mock data. Yet the proposed approach is not exactly either. It's not an ordinary ABC algorithm, because there's no thresholding (or fixed kernel weighting) of a discrepancy distance (although the binned counts might be seen as a summary statistic); and it's not pseudo-marginal, because it's not taking the average likelihood over multiple realisations of the mock data (and in any case this would not be an unbiased estimator for the likelihood!). In fact, unlike ABC, it is important in this set-up to run more than one instance of the mock-data simulation per MCMC step (i.e., to improve estimation of the underlying Poisson rate).
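The distinction is easy to see numerically for a single bin: evaluating the Poisson pmf at the averaged mock count (the plug-in scheme used here) is not the same as averaging per-realisation likelihoods (the pseudo-marginal flavour), since the pmf is non-linear in the rate. A toy illustration, with numbers chosen purely for the demonstration:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
y = 12                                      # an observed bin count
mocks = rng.poisson(10.0, size=400)         # 400 mock counts for this bin

# Plug-in scheme: likelihood evaluated at the averaged (MLE) rate ...
plug_in = poisson.pmf(y, mocks.mean())

# ... versus averaging per-realisation likelihoods, here treating each
# mock count itself as a candidate rate (a pseudo-marginal-style average)
averaged = poisson.pmf(y, mocks).mean()

# The two disagree (Jensen's inequality: a non-linear function of the
# mean is not the mean of the function), and neither is an unbiased
# estimator of the true likelihood in general.
```

This is only a cartoon of the contrast; a genuine pseudo-marginal scheme would average unbiased likelihood estimates, which, as noted above, the plug-in construction does not provide.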
It turns out that the algorithm presented corresponds to the pdBIL-MCMC algorithm studied by Drovandi et al. (2014), if we allow the binning structure, together with the Poisson assumption, to represent an auxiliary model as per the indirect inference paradigm. Why is this worthwhile to recognise? Well, (i) once we know which method we're dealing with we can compare 'our' approach against the results of other studies using the same method (e.g. why 400 realisations, not 200 or 10000? what is the role of estimation error in the auxiliary model parameters in bounding the physical model parameter estimation error?); and (ii) we see that this isn't the only possible method for analysing the given dataset (e.g. how would ordinary ABC compare? would a well-chosen summary statistic give greater inference power than that allowed by the auxiliary model with its 'crude binning' structure and uneven assignment of 'likelihood' importance between bins?).
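For concreteness, a minimal Metropolis sketch of the pdBIL-MCMC idea follows. The interface is my own assumption: `log_like(theta, rng)` is taken to run the mock-data simulations internally and return the plug-in Poisson log-likelihood, and the (improper) flat prior and Gaussian random-walk proposal are illustrative choices, not the paper's:

```python
import numpy as np

def pdbil_mcmc(log_like, theta0, n_steps=2000, prop_sd=0.5, rng=None):
    """Toy pdBIL-MCMC: Metropolis with a re-estimated plug-in likelihood.

    The simulation-based log-likelihood is re-estimated afresh at each
    proposed parameter value, so the chain targets (approximately) the
    posterior under the auxiliary (binned Poisson) model.
    """
    rng = np.random.default_rng(rng)
    theta, ll = theta0, log_like(theta0, rng)
    chain = [theta]
    for _ in range(n_steps):
        prop = theta + prop_sd * rng.standard_normal()
        ll_prop = log_like(prop, rng)
        if np.log(rng.uniform()) < ll_prop - ll:   # flat prior assumed
            theta, ll = prop, ll_prop
        chain.append(theta)
    return np.array(chain)
```

Note that, unlike an exact pseudo-marginal scheme, the plug-in likelihood here is noisy and biased for finite `n_sims`, which is exactly why questions like 'why 400 realisations?' deserve scrutiny.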