## Six key trends in contemporary statistics that really could revolutionise astronomical data analysis …

(4) ABC and the pseudo-marginal method.
Of the six trends discussed here, Approximate Bayesian Computation (ABC) is my personal favourite, in part because I’ve enjoyed applying it to the estimation of Chlamydia incidence in a recent Australian study with collaborators at the Kirby Institute (almost submitted).  Ordinary Bayesian inference problems are often ‘a little bit intractable’ in that the posterior can be evaluated only up to an unknown normalising constant, but a number of interesting models (such as the Ising/Potts model, cf. Wu 1982) lead to doubly intractable posteriors for which we cannot readily evaluate either the likelihood function or the normalising constant.  Obviously this can throw a spanner in the works, but if we can at least simulate mock datasets from the likelihood function for any given set of input parameters then ABC (and its cousin, the pseudo-marginal method) provides a well-studied computational algorithm for approximating the posterior.  Pioneering applications include those of Tavaré et al. 1997, Beaumont et al. 2002 and Marjoram 2003 in the field of population genetics (e.g. inference of coalescent times from DNA data).

The trick for conducting inference under the ABC paradigm is to replace the true posterior with an approximation based on the comparison of simulated (mock) datasets generated under a range of input parameters (drawn or weighted in proportion to their prior density) against the observed dataset, accepting only those input parameters producing sufficiently ‘close’ matches as ABC posterior draws.  Of course, the metric of comparison requires careful definition: typically a Euclidean distance is used in a lower dimensional space corresponding to the range of summary statistics for the true data (see e.g. Nunes & Balding 2010).  In the typical situation where we lack sufficient statistics for our model, the choice of summary statistic needs careful consideration, and there exist a number of schemes proposing principled methods of construction (e.g. Fearnhead & Prangle 2012, Aeschbacher et al. 2012).  Key techniques for exploration of the ABC posterior include SMC ABC (Del Moral et al. 2011, Toni et al. 2009, Drovandi & Pettitt 2010) and ABC MCMC (Marjoram 2003; though note that the 1-hit approach of Lee et al. [see discussion here] provides ‘better’ convergence to stationarity; Lee & Latuszynski 2012).
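
To make the accept/reject idea concrete, here is a minimal sketch of the simplest variant (plain rejection ABC, not the SMC or MCMC schemes cited above).  Everything in it is an illustrative assumption rather than anything from the papers mentioned: the toy model is a Gaussian with unknown mean and unit variance, the prior is Uniform(-5, 5), the summary statistic is the sample mean, and the names `abc_rejection`, `prior_sample`, `simulate`, `summary` and `epsilon` are hypothetical.

```python
import math
import random
import statistics


def abc_rejection(observed, prior_sample, simulate, summary, epsilon, n_draws, rng):
    """Keep the prior draws whose mock-data summaries fall within
    Euclidean distance `epsilon` of the observed summaries."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                    # draw from the prior
        mock = simulate(theta, len(observed), rng)   # mock dataset given theta
        s_sim = summary(mock)
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(s_sim, s_obs)))
        if distance <= epsilon:                      # 'close enough' match
            accepted.append(theta)
    return accepted


# Toy problem (assumed for illustration): unknown mean, known unit variance.
rng = random.Random(1)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

draws = abc_rejection(
    observed,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda theta, n, r: [r.gauss(theta, 1.0) for _ in range(n)],
    summary=lambda data: (statistics.fmean(data),),
    epsilon=0.1,
    n_draws=20000,
    rng=rng,
)
print(len(draws), statistics.fmean(draws))
```

Here the sample mean happens to be sufficient, so the only approximation comes from the tolerance `epsilon`; in the realistic case described above the summary statistics lose information too, and shrinking `epsilon` trades a better approximation against a lower acceptance rate.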

For some models the posterior might be more like ‘one and a half times’ intractable in that we can write down the likelihood function conditional upon some vector of auxiliary variables with doubly intractable likelihood, in which case the thresholding procedure of ABC may be dispensed with but the key step of simulation-based inference remains.  The resulting pseudo-marginal methodology (Andrieu & Roberts 2009, Andrieu & Vihola 2012) shares many of the design concerns of ABC algorithms, and provides important solutions for much the same type of applied statistics problems (e.g. Stramer & Bognar 2011).
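
As a rough illustration of how simulation stands in for the missing marginal likelihood, the sketch below runs a pseudo-marginal Metropolis-Hastings chain on a deliberately trivial toy model.  All of it is assumed for illustration: each observation is y ~ N(z, 1) with an auxiliary variable z ~ N(theta, 1), the prior on theta is N(0, 10²), and the function names are hypothetical.  The marginal likelihood is actually available in closed form here, which is what makes the toy easy to check.

```python
import math
import random


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_estimate(theta, data, n_aux, rng):
    """Unbiased Monte Carlo estimate of p(data | theta): for each observation,
    average the conditional likelihood over simulated auxiliary variables."""
    estimate = 1.0
    for y in data:
        aux = [rng.gauss(theta, 1.0) for _ in range(n_aux)]
        estimate *= sum(normal_pdf(y, z, 1.0) for z in aux) / n_aux
    return estimate


def pseudo_marginal_mh(data, n_iter, n_aux, step, rng):
    theta = 0.0
    like = likelihood_estimate(theta, data, n_aux, rng)
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)      # symmetric random walk
        like_prop = likelihood_estimate(proposal, data, n_aux, rng)
        prior_ratio = normal_pdf(proposal, 0.0, 10.0) / normal_pdf(theta, 0.0, 10.0)
        # Accept with probability min(1, ratio). The estimate for the current
        # state is recycled, never refreshed: this keeps the target exact.
        if rng.random() * like < like_prop * prior_ratio:
            theta, like = proposal, like_prop
        chain.append(theta)
    return chain


# Toy data (assumed): true theta = 1.5, so marginally y ~ N(1.5, 2).
rng = random.Random(7)
data = [rng.gauss(rng.gauss(1.5, 1.0), 1.0) for _ in range(20)]

chain = pseudo_marginal_mh(data, n_iter=2000, n_aux=20, step=0.5, rng=rng)
kept = chain[500:]                                   # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(posterior_mean)
```

Note there is no tolerance or distance metric anywhere: the noisy but unbiased likelihood estimate is plugged straight into the acceptance ratio.  The main design concern is the number of auxiliary draws `n_aux`, since a very noisy estimate tends to leave the chain stuck on a lucky overestimate.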

Notes/References.  Recently both I (see Cameron & Pettitt 2012) and Weyant et al. (2012) have presented novel applications of ABC to astronomical model analysis: in my case the subject was galaxy evolution and in Anja Weyant’s case it was cosmological parameter estimation from supernova data.  There have also been a number of astronomical papers proposing ad hoc methods quite similar to ABC, including e.g. Vaughan et al. 2003 for X-ray light curve analysis and Kashyap et al. 2002 in the study of stellar flares, which may give some further insight into the types of astronomical problems amenable to ABC.  Likewise, there have been numerous “pseudo-pseudo-marginal” applications in astronomy: a good example is the stellar cluster modelling (with IMF stochastic sampling) by Fouesneau & Lancon 2009.