# Feynman-Kac flow

It’s looking increasingly likely that some progress on this convergence proof, and perhaps some good ideas for understanding practical nested sampling implementations more generally, might be found by considering the nested sampling algorithm as a particle approximation to a Feynman-Kac flow (see Pierre Del Moral’s book).  That is, to focus on the convergence of the sequence of live particles to the discrete time Feynman-Kac flow:

$Q_n(d(\theta_0,\ldots,\theta_n)) = \frac{1}{C_n} \exp\left\{\sum_{s=0}^{n-1} V_s(\theta_s)\right\}\Pi(d(\theta_0,\ldots,\theta_n))$

where $V_s := 0$ for $L(\theta)>\psi(\exp(-s/S))$; $0$ otherwise, and $\psi(X) = L : \mathrm{Pr}\{L(\theta)>L\} = X$.  (Here $C_n$ is the normalization constant usually written $Z_n$ in Feynman-Kac formulae, but not here to distinguish it from the marginal likelihood targeted by the NS integral, $Z=\int_0^1 \psi(X) dX$; and $\Pi(\cdot)$ is $n$ copies of the prior to the product space).

Which in turn converges to the continuous time flow:

$dQ_t = \frac{1}{C_t} \exp\left\{\int_{s=0}^{t} V_s(\theta_s)dt\right\}d\Pi$.

So, in principle, all one needs to do is find the appropriate convergence theorems for particle algorithms with MCMC moves and find the restrictions on NS algorithms (e.g. ratio of attempted MCMC moves to live particles; necessary smoothness of the constrained likelihood contours?) to ensure that the conditions on those theorems hold.  Right … ?  (To be continued!)

Update (18/09/15): I’m increasingly confident that we’re on the right track now with the Feynman-Kac approach to study NS, having been pointed by Arnaud Doucet to a related application in Arnaud Guyader’s Phd thesis (chapter 4) [see also Cerou & Guyader].  In Arnaud (Guyader)’s case the problem is not strictly nested sampling for marginal likelihood estimation, but rather using the same live particle evolution within likelihood-constrained contours to estimate rare event probabilities, $p=\Pi(L(\theta)>L^\ast)$ (in my ‘Bayesian notation’) where $p$ is of order $10^{-9}$ so cannot be simply estimated by sampling directly from the prior.  In this context the nested sampling path of live particle evolution is identified as a minimum variance, adaptive version of the “Importance splitting” algorithm of Kahn & Harris (1951), and Rosenbluth & Rosenbluth (1955).  [Interestingly, an alternative version is also considered in which the quantile for rejection is not simply $(N-1)/N$, like we tried in Cameron & Pettitt  (and I think there’s a de Freitas paper with this too)].

The encouraging part for me is that Guyader identifies the connection between this nested sampling algorithm and Feynman-Kac flows (referencing, of course, Del Moral), and in fact uses this framework to derive a fluctuation result for $\sqrt{N}\frac{\hat{p}-p}{p}$ in the quantile version.  (There’s also some convergence theorems for the NS-like version which, not surprisingly, require similar conditions to the Chopin & Robert proof: i.e. $\psi(X)$ twice differentiable with continuous second derivate.)   However, throughout only the idealised algorithm with exact draws from the likelihood-constrained prior is assumed, despite the applications being with MCMC kernel.   There is a note on the MCMC version though which mentions the RW MCMC kernel being Harris recurrent, but as I’ve noted earlier  it shouldn’t be too restrictive to specify $\pi(\cdot)L(\cdot)$ combinations for which the RW MCMC sampling of the likelihood-constrained prior is uniformly ergodic (that is, we have a bound on the rate of convergence): this is why I believe the Feynman-Kac formulae can be exploited further to study the non-idealised algorithm.

All in all I am glad to have been shown this connection between rare event estimation and nested sampling; hopefully the two communities can learn something from each other in the future!

Update (19/09/15): I just discovered the 2015 paper by Cerou & Guyader which addresses the non-idealised case! Of course the focus is still on the good behaviour of the algorithm to estimate a tail probability but every other part is identical to nested sampling with MCMC refreshment kernel.  I will need a little time to go through this proof but at this stage it seems that all the heavy lifting has been done: all that’s needed now is to examine the behaviour of the NS marginal likelihood estimate with their theorem & examine the consequences for various choices of MCMC kernel and prior+likelihood pairings.