I noticed the revised version of a paper on astro-ph today by Dorn et al. (http://arxiv.org/pdf/1307.3889.pdf) concerning a diagnostic for errors in one’s code for approximate posterior reconstruction. In the usual case the approximation is simply the point-mass approximation to the posterior formed from the output of a Markov Chain Monte Carlo simulation (which, user errors aside, should be nicely consistent), though it could just as well be a Gaussian ansatz representing an (expected-to-be-)biased approximation to the posterior (their example) or perhaps an ABC approximation to the complete-likelihood posterior (my suggestion).

The basic idea (for *univariate* theta) follows the method suggested by Cook et al. (2006) and involves simulating theta_true from the prior, simulating mock data given theta_true from the likelihood, and then applying one’s (approximate) code for estimating the resulting posterior. If the distribution function of the true posterior for theta (given any data from the likelihood function) is strictly monotonic, then it is easily shown that the resulting distribution of P(theta<theta_true) over these repeated simulated posteriors should be uniform *if* the posterior-estimating code produces unbiased estimates of the true posterior.

*The point being that if we run this simulation process and examine the output distribution of P(theta<theta_true)’s, we can identify (and perhaps later correct) unwanted biases in our posterior approximation code/method as departures from uniformity.*
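To make the recipe concrete, here’s a minimal sketch in Python of the uniformity check for a toy conjugate-Gaussian model (the model and all numerical settings are my own illustrative choices, not anything from the paper); since the posterior here is computed exactly, the P(theta<theta_true)’s should come out uniform:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy conjugate model: theta ~ N(0, 1) prior, data y_i ~ N(theta, sigma^2).
sigma, n_obs, n_sims = 2.0, 10, 500
p_values = np.empty(n_sims)

for i in range(n_sims):
    theta_true = rng.normal(0.0, 1.0)              # draw theta_true from the prior
    y = rng.normal(theta_true, sigma, size=n_obs)  # mock data from the likelihood
    # Exact conjugate posterior: N(mu_post, tau2)
    tau2 = 1.0 / (1.0 + n_obs / sigma**2)
    mu_post = tau2 * (y.sum() / sigma**2)
    # Rank statistic P(theta < theta_true) under the (here exact) posterior
    p_values[i] = stats.norm.cdf(theta_true, loc=mu_post, scale=np.sqrt(tau2))

# If the posterior code is unbiased these should look Uniform(0, 1);
# a KS test against the uniform is one simple summary of the departure.
ks_stat, ks_p = stats.kstest(p_values, "uniform")
```

In a real application the analytic CDF line would be replaced by the empirical rank of theta_true among one’s MCMC (or ABC) draws for that mock dataset.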

(As Dorn et al. note, however, for multi-dimensional theta one would need to look at projections of the multi-dimensional posterior down to one dimension. [From the Cramér–Wold theorem I might guess that looking at a basis set of normed linear mappings of the marginal P()’s is the way to go.])
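For what it’s worth, one way to operationalize that Cramér–Wold-style suggestion is to project the posterior samples onto a set of random unit directions and compute the rank statistic separately along each direction; the `projection_pvalues` helper below is entirely my own hypothetical sketch, not anything from Dorn et al.:

```python
import numpy as np

rng = np.random.default_rng(1)

def projection_pvalues(posterior_samples, theta_true, n_proj=8, rng=rng):
    """Cramér–Wold-style check: reduce a d-dimensional posterior to a set of
    1-d problems by projecting onto random unit directions, then compute the
    rank statistic P(u.theta < u.theta_true) along each direction u."""
    d = theta_true.size
    u = rng.normal(size=(n_proj, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)  # random unit directions
    proj_samples = posterior_samples @ u.T         # shape (n_samples, n_proj)
    proj_true = u @ theta_true                     # shape (n_proj,)
    return (proj_samples < proj_true).mean(axis=0)  # one rank statistic per direction
```

Repeating this over many simulated datasets, each direction’s rank statistics should separately look uniform if the multivariate posterior code is unbiased.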

Two thoughts:

(1) Pragmatically, it might be overkill to aim for a faithful approximation of the posterior over the complete product space of theta from the prior × data from the likelihood function. A more useful goal for a single experiment is good recovery of theta for data generated (via the likelihood function) from theta near the current posterior. That is, rather than use our original prior in Dorn et al.’s test we might instead use an ansatz for our current posterior. This is by analogy to what is already done for refinement of ABC summary statistics with respect to the RMSSE (e.g. Nunes & Balding 2010). [Before anyone gets the wrong idea, I’m not suggesting we use the data twice in our main inference step by actually changing our prior to the posterior. But I am suggesting we do it for these validation simulations!]
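A sketch of what I mean by (1): in the validation simulations, swap the prior draw for a draw from a (here Gaussian) ansatz of the current posterior, leaving the likelihood step untouched. The summaries `post_mean` and `post_sd` are assumed inputs from one’s actual analysis (the values below are placeholders), and of course with this localized version the exact uniformity guarantee no longer holds, so it reads as a diagnostic near the fitted region rather than a global test:

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder summaries of the current posterior (would come from one's analysis)
post_mean, post_sd = 0.7, 0.15
sigma, n_obs = 2.0, 10

def draw_validation_case(rng):
    """One validation simulation: theta_true from a Gaussian ansatz of the
    current posterior (instead of the original prior), mock data from the
    same likelihood as the main analysis."""
    theta_true = rng.normal(post_mean, post_sd)    # ansatz draw, not the prior
    y = rng.normal(theta_true, sigma, size=n_obs)  # likelihood step unchanged
    return theta_true, y
```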

(2) Following on from my mention of the RMSSE (root mean sum of squared errors), it might make sense (again pragmatically, and particularly for the multivariate case) to focus on posterior validation for a limited set of posterior functionals (e.g. the mean and covariance matrix), without being too concerned about the higher moments. At the end of the day we’re often interested in only a few such quantities, and we’d be happy to get these right at the expense of almost everything else.
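And a sketch of (2), under one reading of the RMSSE as the root of the mean (over validation runs) of the sum of squared errors between the recovered posterior mean and the generating theta_true (the function name and interface are my own):

```python
import numpy as np

def rmsse(theta_true, posterior_means):
    """Root mean sum of squared errors over validation simulations.

    theta_true:      array of shape (n_sims, d), the generating parameters
    posterior_means: array of shape (n_sims, d), the recovered posterior means
    """
    err = posterior_means - theta_true
    return np.sqrt(np.mean(np.sum(err**2, axis=-1)))
```

The same pattern extends to other targeted functionals, e.g. comparing the recovered posterior covariance against the empirical scatter of theta_true about the posterior means.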

As it stands I think the paper raises an important issue (validation of one’s posterior simulations), but more work is needed to make it usefully applicable to multi-dimensional posteriors (which are the ones we usually have difficulty simulating in the first place!).