## ABC for Lyman forest studies …

I enjoyed reading this recent arXival by Davies et al. in which the authors present an ABC  (approximate Bayesian computation) analysis of the cosmological reionization history.  In particular, the idea is to use ABC to overcome intractability of the joint likelihood function for Lyman-$alpha$ and Lyman-$beta$ absorption along the line (or lines) of sight to distant ($z>5$) quasars under realistic models for the generating process.  The reason I enjoyed this paper was because the steps of the ABC analysis were comprehensively described and seem to have been carefully thought through: the origin of the intractability in the likelihood and the choice of summary statistics are thoroughly discussed, and mock data simulations under known parameters are used to test the pipeline.  Perhaps not surprising because I see in the acknowledgements both Foreman-Mackey and Hogg, and also Fouesneau who I might one day hope to see apply ABC to his stochastic IMF models.  Also, of interest is that one of the summary statistics explored is a pseudo-likelihood statistic motivated by the connection between ABC and Indirect Inference discussed in (e.g.) Drovandi et al. (2015).

Some outstanding questions I was left with are as follows.  In their demonstration of the inadequacy of a multivariate Gaussian likelihood (e.g. Fig 5) for the sampling distribution of transmission it looks as if the only problem is the approximation of the Poisson by a Gaussian rather than necessarily any non-Gaussianity in the underlying rate parameters of the Poisson.  In that case it looks like a pure indirect inference model in which the rate parameters of the Poisson in each bin are estimated from mock data and then a Poisson likelihood is supplied might be a good option; ignoring the correlation between bins (is this even important) but making the sampling distribution more realistic?  With possible extension to estimating the full mean and covariance matrix of a (multivariate) Gamma-Poisson auxiliary model.

The use of only a PIT (probability integral transform) style test of Bayesian parameter coverage also puzzled me, since this test is only confirming that the ABC likelihood function behaves like a likelihood function (in terms of introducing some new information to the prior) but doesn’t speak to the length of the confidence intervals being generated.  That is, such a test is a good sanity check on the numerical implementation of the ABC algorithm but can’t satisfy the usual role of the PIT run out-of-sample for model checking, nor can it be used to rank summary statistics a la the RMSSE framework of Nunes & Balding (2010).

An interesting side bar is that David Balding is well known for his work on statistics for forensic science.  Unrelated brain dump: the other night I found myself watching a forensic files episode in which an astronomer helps solve the case.