# Evans' $\to_p$ proof

Perhaps the first convergence proof proposed for the marginal likelihood estimate in nested sampling is that of its convergence in probability to the true marginal likelihood, shown by Michael Evans in the discussion of Skilling's Valencia paper.  It's a pretty straightforward proof focusing on the error introduced to the Riemann integral by the probabilistic association of $L_i$'s (in his notation, $\psi_i$'s) to $X_i$'s set to their (exp-log-)expectation, $\exp(-i/N)$.  So again this proof assumes exact sampling of the replacement particles from the likelihood-constrained prior; and, as Michael notes, it's only a convergence in probability, so (unlike the later distributional proof by Chopin & Robert) it doesn't tell us about the rate.
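To make the objects in the proof concrete, here's a minimal toy sketch (my own construction, not Michael's) of the estimator in question: likelihood values observed at the random prior masses $X_i$ are paired with the deterministic stand-ins $\exp(-i/N)$, and exact sampling is mimicked in prior-mass space via the fact that the shrinkage factor at each step, the maximum of $N$ uniforms, is distributed as $U^{1/N}$.  The specific test likelihood and run lengths are illustrative choices.

```python
import math
import random

def nested_sampling_Z(L, N=500, iters=5000, seed=0):
    """Toy nested sampling run in prior-mass space with exact sampling.

    The shrinkage at each step (the prior mass of the worst of N live
    particles, relative to the current constrained-prior mass) is the
    maximum of N uniforms, distributed as U**(1/N)."""
    rng = random.Random(seed)
    Z = 0.0
    X = 1.0          # true (random) prior mass of the discarded particle
    X_prev_det = 1.0
    for i in range(1, iters + 1):
        X *= rng.random() ** (1.0 / N)   # exact shrinkage: max of N uniforms
        L_i = L(X)                       # the observed likelihood value
        X_det = math.exp(-i / N)         # deterministic stand-in for X_i
        Z += (X_prev_det - X_det) * L_i  # Riemann increment
        X_prev_det = X_det
    return Z

# Toy problem with an analytic answer: Z = int_0^1 exp(-10 X) dX
L = lambda X: math.exp(-10.0 * X)
Z_true = (1.0 - math.exp(-10.0)) / 10.0
print(nested_sampling_Z(L), Z_true)
```

The estimator's randomness lives entirely in where the $L_i$'s are evaluated; the quadrature weights are the fixed $\exp(-(i-1)/N) - \exp(-i/N)$, which is exactly the mismatch the proof controls.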

I’ve extracted the key section in the (large; apologies!) jpegs shown below and made a couple of margin notes that may help explain things.  Basically Michael’s strategy is to focus on continuous $L(X)$ (or more generally any function of the remaining prior mass to be integrated, not necessarily the inverse likelihood function, taking the form in his notation $g(\psi)$) defined on the entire compact interval $[0,1]$, for which the Weierstrass approximation theorem ensures a controlled error when $g(\cdot)$ is represented by a polynomial (of finite order, obviously).  This allows the convergence proof to focus on powers of $\psi$, for which the mean and variance of the Riemann approximation error can be computed and shown to be vanishing and bounded, respectively.  Convergence in probability of the sum to the true integral then follows by an application of Chebyshev’s inequality (Michael calls this Markov’s inequality, but I think of Markov’s inequality as the more general, though less powerful, version that assumes only the existence of the mean, not the variance as well).
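The power-function step is easy to check numerically.  Below is a small simulation (again my construction, with the particular $k$, $N$ values, and replication count chosen just for illustration) estimating $\int_0^1 \psi^k \, d\psi = 1/(k+1)$ with the nested-sampling quadrature: the mean error and the variance both shrink as $N$ grows, which is exactly what Chebyshev's inequality, $P(|\hat{Z} - Z| \geq \epsilon) \leq \mathrm{Var}(\hat{Z})/\epsilon^2$, needs to deliver convergence in probability.

```python
import math
import random

def ns_power_estimate(k, N, rng, iters_per_N=10):
    """Nested-sampling quadrature for int_0^1 psi^k dpsi: g evaluated at
    the random prior masses, weights from the deterministic exp(-i/N)."""
    Z, X, X_prev_det = 0.0, 1.0, 1.0
    for i in range(1, iters_per_N * N + 1):
        X *= rng.random() ** (1.0 / N)   # exact shrinkage (max of N uniforms)
        X_det = math.exp(-i / N)
        Z += (X_prev_det - X_det) * X ** k
        X_prev_det = X_det
    return Z

rng = random.Random(1)
k, true = 2, 1.0 / 3.0
stats = {}
for N in (50, 200, 800):
    errs = [ns_power_estimate(k, N, rng) - true for _ in range(30)]
    mean = sum(errs) / len(errs)
    var = sum((e - mean) ** 2 for e in errs) / len(errs)
    stats[N] = (mean, var)
    print(N, mean, var)
```

Since the Weierstrass theorem reduces a general continuous $g(\psi)$ to a finite sum of such powers (plus a uniformly small remainder), controlling each power controls the whole estimator.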

One interesting point that Michael makes later in the discussion (which I do not believe has ever been followed up in practice, although I remember thinking about it while working on Cameron & Pettitt) is, in my words, that one can perform nested sampling against an auxiliary density chosen closer to $L(\cdot)\pi(\cdot)$ than $\pi(\cdot)$ is, from which a simple reweighting gives the marginal likelihood.  One would imagine a well-chosen auxiliary can readily improve the performance of nested sampling, just as it can in path sampling / thermodynamic integration (see e.g. Lefebvre et al.), or in SMC.
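To illustrate the auxiliary-density idea, here's a hedged toy sketch (my construction; the truncated-exponential auxiliary $q$, the rejection-sampling inner loop, and all parameter values are illustrative assumptions, not anything from Michael's discussion).  Writing $Z = \int L\pi \, d\theta = \int (L\pi/q) \, q \, d\theta$, one runs nested sampling under the auxiliary "prior" $q$ with pseudo-likelihood $\tilde{L} = L\pi/q$; when $q$ sits closer to $L\pi$ than $\pi$ does, $\tilde{L}$ is flatter and the run is easier.

```python
import math
import random

def ns_with_auxiliary(a=10.0, b=5.0, N=200, iters=1200, seed=3):
    """Estimate Z = int_0^1 exp(-a theta) dtheta (uniform prior on [0,1])
    by nested sampling run against an auxiliary density q(theta)
    proportional to exp(-b theta) on [0,1], using the pseudo-likelihood
    L~ = L pi / q so that int L~ q dtheta = Z."""
    rng = random.Random(seed)
    c_q = b / (1.0 - math.exp(-b))                    # q's normalising constant
    q_draw = lambda: -math.log(1.0 - rng.random() * (1.0 - math.exp(-b))) / b
    L_tilde = lambda th: math.exp(-a * th) / (c_q * math.exp(-b * th))
    live = [q_draw() for _ in range(N)]               # live points drawn from q
    Z, X_prev = 0.0, 1.0
    for i in range(1, iters + 1):
        j = max(range(N), key=lambda m: live[m])      # worst point: L~ decreases in theta
        X_det = math.exp(-i / N)                      # deterministic prior-mass grid
        Z += (X_prev - X_det) * L_tilde(live[j])
        X_prev = X_det
        while True:                                   # exact replacement by rejection
            th = q_draw()                             # sampling from q under the
            if th < live[j]:                          # constraint L~(th) > L~(live[j])
                live[j] = th
                break
    return Z

Z_true = (1.0 - math.exp(-10.0)) / 10.0
print(ns_with_auxiliary(), Z_true)
```

Here the pseudo-likelihood only spans about two orders of magnitude instead of the four-plus of the original $L$, so the run reaches the bulk of the integrand in fewer iterations; that compression is the whole appeal of the auxiliary-density trick.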