As I mentioned in the update to my previous post, I’ve lately been having a careful read through a classic paper on MCMC algorithms: Tierney (1994). One interesting property of Tierney’s ‘*Markov chains for exploring posterior distributions*’ is that, of its more than 3,000 citations, the vast majority come from other highly cited papers within the statistics and applied probability literature; and after following up on just a handful of these it is my impression that the citations are well deserved, in the sense that Tierney’s paper legitimately contributed a useful piece of knowledge to each citing paper. [Incidentally, in one of the citing papers I clicked through I found a very earnest-looking photo of our friend from Oxford stats—a young Arnaud Doucet.]

On the other hand, despite discussing a diverse range of MCMC varieties (Metropolis-Hastings proposals, independence sampling, Gibbs sampling, rejection Gibbs sampling, grid-based sampling, etc.), this paper appears not to be so well known in the physical sciences, where MCMC methods are now quite commonplace. One reason for this may be that Tierney brings out the measure theory quite early on (in order to be sure of his discussion covering *general* state spaces); but my advice for (astro-)physicists not familiar with measure theory would be to just plough through Sections 1 (brief introduction), 2 (introduction to the language of general state space, discrete time Markov chains, and discussion of key MCMC strategies), 4 (implementation issues, including conditioning) and 5 (numerical example) regardless, reading all integrals as Riemann integrals with Dirac delta functions allowed, as we’re already thoroughly comfortable doing. Section 3 can then be tackled at leisure depending on interest. The value of Section 2 is that it gives a rigorous mathematical description of the Metropolis-Hastings algorithm (with a three-step proof that the targeted posterior is indeed its invariant distribution) and then uses this consistent language to describe and unify the host of alternative MCMC schemes listed above.
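For readers who’d like the algorithm in concrete form before tackling the measure-theoretic statement, here is a minimal random-walk Metropolis-Hastings sketch in Python. The Gaussian proposal and standard-normal target are my illustrative choices, not anything from the paper:

```python
import math
import random

def metropolis_hastings(log_target, x0, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis-Hastings: propose x' = x + N(0, step_size^2),
    accept with probability min(1, pi(x')/pi(x)) (proposal is symmetric,
    so the q(x,x') terms cancel in the acceptance ratio)."""
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n_steps):
        x_prop = x + rng.gauss(0.0, step_size)
        log_alpha = log_target(x_prop) - log_target(x)
        if math.log(rng.random()) < log_alpha:
            x = x_prop       # accept the proposal
        chain.append(x)      # on rejection the chain stays put at x
    return chain

# Target: standard normal; an unnormalised log density is all we need.
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000)
burned = chain[5000:]        # discard burn-in
mean = sum(burned) / len(burned)
```

Note that only the ratio of target densities appears, which is exactly why MCMC is so convenient for posteriors known only up to a normalising constant.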

Despite cranking up the maths level somewhat, Section 3 will not be intractable to anyone already familiar with basic discrete space, discrete time Markov chain theory (i.e. types of chains [periodic/aperiodic, positive recurrent, null recurrent], limiting/stationary distributions, etc.; e.g. Haggstrom’s textbook) and some measure theory. In fact, as this was my first time reading a formal treatment of general state space chains, I got the most enjoyment out of spotting the similarities and differences with the discrete state space regime: e.g. while we don’t need to explicitly state the metric of convergence when writing lim_{k->inf} P^k = 1 pi (the limiting matrix with every row equal to pi) in the irreducible, aperiodic, finite discrete state case, it turns out we do need to specify *pi-almost everywhere* convergence in *total variation norm* for the general state space case, which makes intuitive sense and helped me clarify some thinking about the total variation distance. The real value of Section 3, though, is that it identifies the importance of distinguishing Harris recurrence from boring old pi-almost everywhere recurrence, and simple ergodicity from geometric ergodicity: distinctions which have in turn been re-emphasised and discussed further in the influential paper (it had a lot of verbal references at the i-like meeting here in Oxford earlier this year) by Roberts & Rosenthal (2007).
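The finite, discrete state picture is easy to make concrete numerically. Here is a toy numpy sketch (the 3-state chain is made up, not from the paper) showing the rows of P^k collapsing onto pi, i.e. P^k -> 1 pi, with the total variation distance of each row from pi:

```python
import numpy as np

# A made-up irreducible, aperiodic 3-state transition matrix (rows sum to 1).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

# Stationary distribution: the left eigenvector pi = pi P at eigenvalue 1
# (computed as a right eigenvector of P transposed, then normalised).
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()

# After many steps every row of P^k is (numerically) pi: P^k -> 1 pi.
Pk = np.linalg.matrix_power(P, 50)

# Total variation distance of each row from pi:
# (1/2) * sum of absolute differences over states.
tv = 0.5 * np.abs(Pk - pi).sum(axis=1)
```

The decay of `tv` is geometric, governed by the second-largest eigenvalue modulus of P; in the general state space setting this is exactly where the simple vs geometric ergodicity distinction starts to bite.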

So … overall I would thoroughly recommend Tierney (1994) to my blog readers as a worthwhile paper to get stuck into when you have a decent block of free thinking time (which can sometimes feel like never, of course).

p.s. If I had one quibble with the terminology in this paper it would be the term “harmonic function” for functions satisfying h = Ph: this just confused me, thanks to the existence of a different type of harmonic function in physics. In my head I think of the pi = pi P functions (/measures) as “outwards-going self-similar inputs” to P, and the “harmonic functions” as their counterpart: “inwards-going self-similar inputs” to P (although I understand this corrupts the sense of the word ‘input’ somewhat!).
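In matrix terms the two kinds of self-similar input are just the left and right eigenvectors of P at eigenvalue 1. A toy numpy check (the chain is again my own made-up example) makes the pairing explicit; note the constant function is always harmonic, since the rows of P sum to one:

```python
import numpy as np

# Made-up 3-state transition matrix (rows sum to 1).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

# "Inwards-going" self-similar input: a column vector h with h = Ph
# (a right eigenvector at eigenvalue 1). Constant functions always qualify.
h = np.ones(3)
Ph = P @ h          # equals h

# "Outwards-going" self-similar input: a row vector pi with pi = pi P
# (a left eigenvector at eigenvalue 1) -- the stationary distribution.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()
piP = pi @ P        # equals pi
```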