Particle Gibbs with Ancestor Sampling …

The day after the night before.  Feeling a bit hung-over after hitting the Royal Oak with some of the local statisticians, but I can’t complain because the drinking was fun and it followed a very interesting talk by Thomas Schön from Uppsala on “Particle Gibbs with Ancestor Sampling” (PGAS).  The type of inference problem for which these methods were developed is best characterised by a state space model in which the observed data form a time series of noisy measurements of some underlying probabilistic trajectory with possibly non-Markovian dynamics.  I.e., we observe a series of Y_t ~ f(.|X_t,theta) driven by a hidden process X_t ~ g(X_t|X_1,X_2,…,X_{t-1},theta) with the forms of f() and g() known, and we aim to infer theta and/or the hidden path of X_t.  For instance, in the epidemiological context we might have a series of observed incidence counts ({Y_t}) from a small sample of the population over a period of time, and we wish to infer the unobserved path of underlying (true) incidence ({X_t}) as well as some transition probabilities of our SIR model (theta).  In a simple SIR model the transition probabilities will be strictly Markovian (i.e., g(X_t|X_{t-1},theta)), but if immunity is built up by past exposure (as in the case of P. falciparum malaria) this Markovian structure may well be destroyed … so it’s a good thing that PGAS doesn’t require it!  In any case, there is an obvious relevance to my current work, which I knew before attending the talk.  What surprised me was the more general applicability of the PGAS method to probabilistic graphical models (as explained by Naesseth et al.)—this flexible class of constructs being useful in a wide range of computational applications, including survey astronomy, where they are heavily promoted by, e.g., David Hogg.
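To make the setup concrete, here is a minimal sketch of simulating from such a state space model.  I've used a hypothetical linear-Gaussian model (not the SIR example above) purely for illustration; the function name `simulate_ssm` and the parameter values are my own inventions:

```python
import random

random.seed(1)

# Hypothetical linear-Gaussian state-space model, an illustrative stand-in
# for the general setup Y_t ~ f(.|X_t,theta), X_t ~ g(.|X_{t-1},theta):
#   X_t = a * X_{t-1} + process noise,   Y_t = X_t + measurement noise
def simulate_ssm(T, a=0.9, sigma_x=1.0, sigma_y=0.5):
    xs, ys = [], []
    x = random.gauss(0.0, 1.0)                   # initial hidden state
    for _ in range(T):
        x = a * x + random.gauss(0.0, sigma_x)   # hidden dynamics g(.)
        y = x + random.gauss(0.0, sigma_y)       # noisy observation f(.)
        xs.append(x)
        ys.append(y)
    return xs, ys

hidden, observed = simulate_ssm(100)
```

In a real analysis we would see only `observed` and aim to recover `hidden` (and the parameters a, sigma_x, sigma_y, playing the role of theta).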
It can be a steep learning curve to go from regular MCMC and Gibbs sampling ideas to mastering these advanced particle-based methods, but I am quite sure that a lot of astronomers would find it a worthwhile journey.

Update/continuation (after returning from an interesting production of Noel Coward’s Semi-Monde at the Oxford Playhouse).  I thought I might try to give a very brief summary of PG and PGAS in case this helps illustrate their potential.  With the space of paths, {X_t}, sharing the same dimension as the data, {Y_t}, it can become very difficult to design efficient MCMC proposals for this type of state-space problem.  So, one approach is to build a Sequential Monte Carlo (SMC), particle-based approximation to {X_t}|{Y_t},theta.  The way this is done is to (i) draw a set of particles at t=0 from some initial proposal distribution, (ii) importance weight them to approximate {X_0}|{Y_0},theta, (iii) resample them according to these importance weights, (iv) propagate the survivors via a suitable proposal kernel targeting {X_1}, and repeat the process of importance weighting, resampling, and propagation.  The great advantage of this strategy is that one needs only a reasonable proposal kernel for each sequential propagation step; the great disadvantage is that of path degeneracy: for large t it is almost inevitable that most of the surviving particles share a common ancestor.  This will be problematic if we aim to use this SMC method over and over again while Gibbs sampling over the full posterior, {X_t},theta|{Y_t} … it looks like we’re going to have to jack up the number of particles at great computational expense.
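Steps (i)–(iv) above are essentially the bootstrap particle filter, which can be sketched in a few lines.  Again I assume a hypothetical linear-Gaussian model (my choice, not from the talk), with the transition density as the propagation proposal:

```python
import math
import random

random.seed(2)

def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)

def bootstrap_filter(ys, n_particles=200, a=0.9, sigma_x=1.0, sigma_y=0.5):
    """Steps (i)-(iv) for an assumed linear-Gaussian model:
    X_t = a*X_{t-1} + N(0, sigma_x^2),  Y_t = X_t + N(0, sigma_y^2)."""
    # (i) draw initial particles from a proposal distribution
    particles = [random.gauss(0.0, 1.0) for _ in range(n_particles)]
    filtered_means = []
    for y in ys:
        # (iv) propagate via the transition kernel g(x_t | x_{t-1})
        particles = [a * x + random.gauss(0.0, sigma_x) for x in particles]
        # (ii) importance-weight against the observation density f(y_t | x_t)
        logw = [normal_logpdf(y, x, sigma_y) for x in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        filtered_means.append(sum(wi * xi for wi, xi in zip(w, particles)))
        # (iii) multinomial resampling according to the normalised weights
        particles = random.choices(particles, weights=w, k=n_particles)
    return filtered_means
```

The path-degeneracy problem shows up if you track each particle's full history rather than just its current state: after enough resampling steps, almost every survivor traces back to the same early ancestor.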

But this is where PG and PGAS come to the rescue!  First, the PG method exploits the (not at all obvious) observation that in fact we can target the true posterior even with a degenerate SMC step, provided we make one tiny change to the SMC algorithm: instead of resampling all particles we add a single reference particle on a pre-defined path which we do not resample (during our SMC kernel).  This reference particle acts to guide the other particles, and after each application of the full SMC kernel we sample a new reference particle path from its output and iterate our Gibbs sampler.  I.e., we draw {X_t}|{Y_t},X*,theta (where X* is the reference path) using our SMC kernel, then choose a new X*|{X_t},{Y_t},theta, and finally draw a new theta|{X_t},X*,{Y_t} (this last step is assumed easy since the theta parameter vector is most often low-dimensional, or at least much lower-dimensional than {X_t}).  Of course, under this ordinary PG algorithm the mixing can be extremely slow, owing again to path degeneracy: most of the {X_t}|{Y_t},X*,theta drawn from our SMC kernel will be very close to the reference path.  The trick of PGAS is to add a further step of “ancestor sampling”, which is to my mind much like the ‘losing the labels’ strategy of biased sampling (or RLR, or AMIS, or INS).  That is, although we know the path we’ve drawn as our new reference, one resamples its ancestry at each step by (in effect) pretending not to know its sequence of ancestors.  The power of this little trick in improving the mixing of the particle Gibbs sampler in computational experiments is really remarkable, such that in fact one can sample the full posterior {X_t},theta|{Y_t} very well using as few as twenty particles in the SMC kernel phase.
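For the Markovian case, one sweep of conditional SMC with ancestor sampling can be sketched as below.  This is a rough illustration under my assumed linear-Gaussian model with theta held fixed (so it covers only the {X_t}|{Y_t},X*,theta step of the Gibbs sampler, not the theta update); the function and variable names are my own:

```python
import math
import random

random.seed(3)

# Hypothetical linear-Gaussian model; theta = (A, SX, SY) held fixed here
A, SX, SY = 0.9, 1.0, 0.5

def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)

def normalise(logw):
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    s = sum(w)
    return [wi / s for wi in w]

def csmc_with_ancestor_sampling(ys, ref, n=20):
    """One sweep of conditional SMC: particle 0 is pinned to the reference
    path `ref`, and its ancestor is redrawn at every step (ancestor
    sampling) rather than inherited -- this is what restores mixing."""
    T = len(ys)
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    x[0] = ref[0]                               # pin the reference particle
    trajs = [[xi] for xi in x]
    logw = [normal_logpdf(ys[0], xi, SY) for xi in x]
    for t in range(1, T):
        w = normalise(logw)
        # ordinary multinomial resampling for the 'free' particles
        anc = random.choices(range(n), weights=w, k=n)
        # ancestor sampling: draw the reference particle's ancestor with
        # probability proportional to w_i * g(ref[t] | x_i)
        as_logw = [logw[i] + normal_logpdf(ref[t], A * x[i], SX) for i in range(n)]
        anc[0] = random.choices(range(n), weights=normalise(as_logw), k=1)[0]
        x = [A * x[anc[i]] + random.gauss(0.0, SX) for i in range(n)]
        x[0] = ref[t]                           # keep the reference pinned
        trajs = [trajs[anc[i]] + [x[i]] for i in range(n)]
        logw = [normal_logpdf(ys[t], xi, SY) for xi in x]
    # draw the new reference path from the final weighted particle set
    k = random.choices(range(n), weights=normalise(logw), k=1)[0]
    return trajs[k]
```

A (fixed-theta) PGAS run then just iterates `ref = csmc_with_ancestor_sampling(ys, ref)`, interleaving the theta draw in the full sampler; note the default of n=20 particles, in the spirit of the remark above.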

This entry was posted in Particle MCMC.
