Uniform ergodicity

A first thought here is that some advantage may be gained from the observation that it should be relatively straightforward to identify conditions on the prior+likelihood pairing that ensure every likelihood-constrained prior admits a uniformly ergodic random walk MCMC algorithm. That is, for n-step transition kernel P^n(x,\cdot), target \pi^\ast(\cdot) \propto \pi(\cdot)I(L(\cdot)>L^\ast), positive constant M, positive constant r < 1, and state space E,
\mathrm{sup}_{x \in E} ||P^n(x,\cdot)-\pi^\ast(\cdot)|| \le Mr^n,
where ||\cdot|| denotes the total variation distance (see Meyn & Tweedie’s excellent, free, online book for a high level of detail).
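To make the bound concrete, here is a minimal numerical sketch (mine, not taken from any of the papers discussed). It discretises a toy likelihood-constrained prior onto a grid, builds the random walk Metropolis transition matrix on its support, and tracks \mathrm{sup}_x ||P^n(x,\cdot)-\pi^\ast(\cdot)|| as n grows. Everything specific is an assumption chosen for illustration: the N(0,1) prior, the Gaussian likelihood centred at 1, the threshold L^\ast = 0.3, the grid spacing, and the proposal half-width. Total variation is taken as half the L1 distance. Note that an irreducible, aperiodic chain on a finite state space is always uniformly ergodic, so this only illustrates the definition; it says nothing about the continuous case.

```python
import math

def prior(x):
    return math.exp(-0.5 * x * x)            # unnormalised N(0, 1) prior (assumed)

def like(x):
    return math.exp(-0.5 * (x - 1.0) ** 2)   # toy Gaussian likelihood (assumed)

L_STAR = 0.3                                 # likelihood threshold L* (assumed)
grid = [-4.0 + 0.25 * i for i in range(33)]  # grid on [-4, 4]
support = [x for x in grid if like(x) > L_STAR]   # discretised E^+
z = sum(prior(x) for x in support)
pi_star = [prior(x) / z for x in support]    # discretised likelihood-constrained prior

def rw_metropolis_matrix(target, half_width):
    """Random walk Metropolis on the support: propose one of the 2*half_width
    nearest grid offsets uniformly; proposals leaving the support are rejected."""
    n = len(target)
    q = 1.0 / (2 * half_width)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(max(0, i - half_width), min(n, i + half_width + 1)):
            if j != i:
                P[i][j] = q * min(1.0, target[j] / target[i])
        P[i][i] = 1.0 - sum(P[i])            # rejected mass stays put
    return P

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sup_tv(P_n, target):
    """Worst-case (over starting states) total variation distance to the target."""
    return max(0.5 * sum(abs(p - t) for p, t in zip(row, target)) for row in P_n)

P = rw_metropolis_matrix(pi_star, half_width=2)
curve = []                                   # curve[n-1] = sup_x TV(P^n(x, .), pi*)
P_n = P
for n in range(1, 101):
    curve.append(sup_tv(P_n, pi_star))
    P_n = matmul(P_n, P)

for n in (1, 10, 25, 50, 100):
    print(n, curve[n - 1])
```

On a log scale the printed values should fall roughly along a straight line for large n, and the slope gives an empirical estimate of \log r for this particular toy chain.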

Now, I say that conditions for the NS case should be relatively straightforward to identify based on Corollary 3 from Tierney (1994):
“A Metropolis kernel with \mu(E^+) < \infty and q and \pi bounded and bounded away from 0 on E^+ satisfies a minorization condition M(1,\beta,E,\nu) with \nu proportional to the restriction of \mu to E^+, and is therefore uniformly ergodic.”
Here E^+ is the support of (what in NS is our) \pi^\ast, the likelihood-constrained prior; so if E is \mathcal{R}^k then, provided the posterior is proper, the density in its tails must go to zero and E^+ becomes bounded for any L^\ast > 0 (hence, \mu(E^+) < \infty modulo measurability concerns).
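For reference, here is my own sketch of the standard argument behind a corollary of this type; it is not a transcription of Tierney's proof. The symbols q_{\min}, \pi^\ast_{\min} and \pi^\ast_{\max} are labels I am introducing for the assumed bounds on E^+, I write \pi^\ast for the target density (Tierney's \pi), and I only treat starting points x \in E^+.

```latex
% Assumed bounds on E^+ (labels are mine):
%   q(x,y) >= q_min > 0   and   0 < pi*_min <= pi*(y) <= pi*_max < infinity.
\begin{align*}
P(x,A) &\ge \int_{A \cap E^+} q(x,y)\,
          \min\!\left\{1,\frac{\pi^\ast(y)}{\pi^\ast(x)}\right\}\mu(dy)
        \ge q_{\min}\,\frac{\pi^\ast_{\min}}{\pi^\ast_{\max}}\,\mu(A \cap E^+)
        = \beta\,\nu(A), \qquad x \in E^+, \\
\beta &= q_{\min}\,\frac{\pi^\ast_{\min}}{\pi^\ast_{\max}}\,\mu(E^+), \qquad
\nu(A) = \frac{\mu(A \cap E^+)}{\mu(E^+)}, \\
\sup_{x \in E^+}\,\sup_{A}\,\left|P^n(x,A) - \pi^\ast(A)\right| &\le (1-\beta)^n .
\end{align*}
```

The last line is the usual coupling consequence of a one-step minorization over the whole space, so in the notation above it corresponds to M = 1 and r = 1-\beta. If the total variation norm is instead defined as \mathrm{sup}_A \mu(A) - \mathrm{inf}_A \mu(A), the same argument gives M = 2.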

Still to do is to make precise the conditions on \pi(\cdot) and L(\cdot) needed to make this true, and generally useful. The utility of this observation may also depend on any statements about the existence and nature of a corresponding drift condition (possibly implied by uniform ergodicity; I need to research this topic further).

A note of confusion: The theorem from Tierney (1994) quoted above is directly contradicted by Theorem 3.1 in Mengersen & Tweedie (1996); namely,
“If Q satisfies (23) [23: is a random walk MCMC algorithm, my note] on \mathcal{R}^k, then the Metropolis algorithm is not uniformly ergodic for any \pi.”
The difference here seems to be that these papers take a very different definition of the Metropolis algorithm when x lies outside of the support of the target density, i.e., \pi(x)=0. In both papers the authors define the kernel to have acceptance 1 when \pi(x)q(x,y)=0, but in Tierney (1994) it is assumed that “to avoid some trivial special cases, let Q(x,E^+)=1 for x \not\in E^+”, whereas in Mengersen & Tweedie (1996) this is not stipulated. To my mind the Tierney (1994) definition is more ‘useful’, so for the remainder I will assume this one.

It is also worth noting that there are quite a few think-o’s in the introduction to Mengersen & Tweedie (1996) that can cause confusion. For instance, the definition of a small set is quoted as being from Meyn & Tweedie, but includes a clause not in their definition, namely that \nu be concentrated on C. [I’m wondering if this might be deliberate, though, to avoid having to specify the aperiodic+irreducible clause on Theorem 1.3 (ii) and (iii), which is omitted in Mengersen & Tweedie but present in Meyn & Tweedie??] Also, the definition of the total variation norm given after their Equation (9) is missing a -\mathrm{inf} |\mu(A)|. And the proof of Lemma 1.2 is quite a bit off: “non-empty” should be \mu(C) > 0, the equality in (8) should be an inequality, and the “choose B \subset C” should be omitted to focus on A only.

I’ve just come across Roberts & Tweedie (1996), which gives some conditions under which the contours of densities in \mathcal{R}^k are smooth enough for geometric ergodicity, so I will read this and comment when I have time.
