I noticed this paper by Asensio Ramos & Martinez Gonzalez (2014) on astro-ph today, presenting a hierarchical model for inferring the distribution of magnetic field Stokes parameters in the quiet Sun. The use of a hierarchical model for this problem is a natural way to pool the information from each pixel, as convincingly argued by the authors. What caught my eye, though, was their method for exploring the posterior, which can be blocked into two groups of parameters: (1) the 4 hyperparameters of interest affecting all pixels, and (2) the 4 individual latent variable values of each of the N ~ 10^{5-6} pixels. Their strategy for exploring the marginal posterior of (1) is to first sample via MCMC from the pseudo-posterior of each pixel in (2) under a dummy prior, and then to run MCMC over the four hyperparameters of (1), with the marginal likelihood of each pixel (given these four hyperparameters) approximated by an importance sample reweighting (personally, I call this pseudo-importance sampling) of the pseudo-posterior draws. The strategy is credited to Hogg et al. (2010), but statisticians reading this will recognise it as the pseudo-marginal method, here with pseudo-importance sampling rather than true importance sampling: hence the title of this post (i.e., pseudo-pseudo-marginal). The point of noting this is that (a) with pseudo-importance sampling the algorithm is, technically, no longer targeting the true stationary distribution of the hyperparameters (though in practice, for this example, this may not be a significant problem), and (b) as I've noted before, one can draw on existing insights and theory on the pseudo-marginal method for, e.g., tailoring the acceptance ratio (cf. Doucet et al. 2012/14).
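To make the scheme concrete, here is a minimal toy sketch of the two-stage strategy in Python. Everything here is hypothetical: a scalar Gaussian latent per pixel, a flat interim ("dummy") prior, and hyperparameters (mu, tau), not the authors' actual Stokes-profile model. Note that under a flat interim prior the pseudo-posterior is just the per-pixel likelihood, so its normalising constant cancels in the Metropolis ratio and the importance weights reduce to the hierarchical prior density at the stored draws. Reusing the same fixed draws at every iteration is exactly the "pseudo"-importance-sampling step that breaks exact pseudo-marginal validity.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy setup (illustrative stand-in, not the paper's model) ---------------
# Pixel latents: theta_i ~ N(mu, tau); data: d_i ~ N(theta_i, sigma_d).
N, K = 50, 100               # pixels, pseudo-posterior draws per pixel
sigma_d = 1.0
true_mu, true_tau = 2.0, 0.5
data = rng.normal(rng.normal(true_mu, true_tau, N), sigma_d)

# Stage 1: draw from each pixel's pseudo-posterior under a flat interim prior.
# Here that pseudo-posterior is N(d_i, sigma_d), so we sample it directly;
# in the paper this stage is done by per-pixel MCMC.
theta_draws = rng.normal(data[:, None], sigma_d, (N, K))   # shape (N, K)

def log_norm_pdf(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - np.log(sd) - 0.5 * np.log(2 * np.pi)

def log_marginal_estimate(mu, tau):
    """Importance-reweighting estimate of sum_i log p(d_i | mu, tau).

    With a flat interim prior, the weight of each stored draw is simply
    the hierarchical prior density p(theta_ik | mu, tau)."""
    logw = log_norm_pdf(theta_draws, mu, tau)              # (N, K)
    return np.sum(np.logaddexp.reduce(logw, axis=1) - np.log(K))

# Stage 2: random-walk Metropolis over the hyperparameters, reusing the SAME
# fixed draws at every iteration (the pseudo-pseudo-marginal step).
mu, log_tau = 0.0, 0.0
logp = log_marginal_estimate(mu, np.exp(log_tau))
chain = []
for _ in range(5000):
    mu_p, log_tau_p = mu + 0.1 * rng.normal(), log_tau + 0.1 * rng.normal()
    logp_p = log_marginal_estimate(mu_p, np.exp(log_tau_p))
    if np.log(rng.uniform()) < logp_p - logp:
        mu, log_tau, logp = mu_p, log_tau_p, logp_p
    chain.append((mu, np.exp(log_tau)))

chain = np.array(chain)[1000:]
print("posterior mean of mu ~", chain[:, 0].mean())
```

A true pseudo-marginal chain would instead refresh the per-pixel estimates with new draws at each proposal (or store and recycle the estimate at the current state), which is where the theory of Andrieu & Roberts and of Doucet et al. on tuning the number of samples K applies.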

I would also disagree with the authors' motivation for using a pseudo-pseudo-marginal approach here: namely, that the large dimensionality of the posterior would preclude efficient MCMC sampling, even via Hamiltonian MCMC. In fact, since the likelihood function of the latent variables of each pixel at a given set of hyperparameters is independent of all other pixels, one could easily treat this via rejection Gibbs sampling, for which I'd wager JAGS is a sufficiently powerful tool in this case (given that the authors note that 100 pseudo-posterior samples are sufficient for their importance sample reweighting scheme, indicating that JAGS should be able to adapt its rejection sampling algorithm to a high level of efficiency). Here it is worth noting a *rare* recent use of JAGS (for an entirely different purpose) in the astronomical literature, by Sereno et al. 2014.

One of my least favourite things about the statistics literature is the need to give every little hack a fancy name and call it a new method. I've used this same approach before, and I referenced Hogg for it too, just to point to another place where it's been used; I of course don't believe that he invented it.

Thanks for the criticisms, which turned out to be several times more useful than the referee report we received. First, we share Brendon's motivation for citing Hogg's work: to my limited knowledge of the astro literature, this was probably one of the first times this trick was used. However, there is still time to cite earlier works. Can you send me a few references by e-mail?

Second, concerning your suggestion to use rejection Gibbs sampling: I have in the past used Metropolis-within-Gibbs (which I understand is essentially the same kind of method) for a problem like this, with independent latent variables for the different cases and hyperparameters shared by all the cases. The Metropolis-within-Gibbs method did its work nicely, but only after solving some convergence problems. In the present case I wanted to consider a very large number of variables and didn't want to spend time tweaking parameters until reaching convergence. The method might be approximate, but it scales fantastically well for problems with thousands of independent latent variables.
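For comparison, the Metropolis-within-Gibbs alternative being discussed can be sketched as follows, again on a hypothetical toy Gaussian hierarchy rather than the paper's Stokes model. Because each pixel's full conditional is independent of the others given the hyperparameters, all latents can be updated in one vectorised Metropolis sweep; the shared hyperparameters are then updated given the current latents.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy hierarchy (illustrative, not the paper's model):
# theta_i ~ N(mu, tau); d_i ~ N(theta_i, 1), with a flat hyperprior on (mu, log tau).
N = 200
true_mu, true_tau = 2.0, 0.5
data = rng.normal(rng.normal(true_mu, true_tau, N), 1.0)

def log_norm(x, m, s):
    # Gaussian log-density up to an additive constant
    return -0.5 * ((x - m) / s) ** 2 - np.log(s)

theta = data.copy()          # initialise latents at the data
mu, log_tau = 0.0, 0.0
mus = []
for _ in range(4000):
    tau = np.exp(log_tau)
    # (a) one vectorised random-walk Metropolis step per latent: each pixel's
    # conditional depends only on its own datum and the hyperparameters
    prop = theta + 0.5 * rng.normal(size=N)
    logr = (log_norm(data, prop, 1.0) + log_norm(prop, mu, tau)
            - log_norm(data, theta, 1.0) - log_norm(theta, mu, tau))
    accept = np.log(rng.uniform(size=N)) < logr
    theta = np.where(accept, prop, theta)
    # (b) random-walk Metropolis step on the shared hyperparameters
    mu_p, lt_p = mu + 0.05 * rng.normal(), log_tau + 0.05 * rng.normal()
    logr = (np.sum(log_norm(theta, mu_p, np.exp(lt_p)))
            - np.sum(log_norm(theta, mu, tau)))
    if np.log(rng.uniform()) < logr:
        mu, log_tau = mu_p, lt_p
    mus.append(mu)

print("posterior mean of mu ~", np.mean(mus[1000:]))
```

The convergence issues mentioned above typically show up as slow mixing between the latents and the hyperparameters; the step sizes here (0.5 and 0.05, both hypothetical) are precisely the kind of tuning parameters that JAGS's adaptive phase would handle automatically.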

Hi Andrés,

The best reference for the pseudo-marginal method, in the sense of providing a reasonably in-depth theoretical discussion and historical review, is the Andrieu & Roberts (2009) paper: http://projecteuclid.org/download/pdfview_1/euclid.aos/1236693147

There are certainly a lot of past uses in the literature, but the easiest to find (being those linking directly to one of the main stats papers) are, by a selection effect, inevitably the most complicated and convoluted!

Regarding optimisation of the Metropolis-within-Gibbs method: indeed, some tuning will likely be necessary, though one nice property of JAGS is that it has an automatic routine for self-adaptation, which is performed during the burn-in phase. So it could be worth trying at some point.

cheers,

Ewan.

Ah, this one is a good applied reference in the present context: http://www.maths.bris.ac.uk/~mazjcr/supranetAddendum.pdf