I noticed this paper on astro-ph today by Hou, Goodman, & Hogg describing their “new” idea for marginal likelihood estimation using the geometric path. A quick inspection reveals that the idea being presented is in fact exactly the same as Fan et al.'s (Jan 2011) generalised “stepping stone” estimator, which in turn is an extension of Xie et al.'s (2010) naive “stepping stone” estimator by integration from a well-chosen reference density, following the ideas of Lefebvre et al. (2010) (and Friel & Pettitt 2008) regarding the connection between the error of thermodynamic integration along the geometric path and the J-divergence between the two endpoint distributions.
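For readers who haven't met the construction before, here's a minimal sketch of the naive stepping stone estimator along the geometric path from prior to posterior. The model, ladder, and numbers are my own toy example (a conjugate Gaussian, so the exact marginal likelihood is available for comparison), not anything from the papers above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy conjugate model: theta ~ N(0, 3^2), y | theta ~ N(theta, 1), y = 1,
# so the exact marginal likelihood is N(y; 0, sqrt(10)).
y = 1.0
log_like = lambda th: stats.norm.logpdf(y, loc=th, scale=1.0)

betas = np.linspace(0.0, 1.0, 21)  # temperature ladder along the geometric path
n = 20000                          # draws per rung
log_Z = 0.0
for b0, b1 in zip(betas[:-1], betas[1:]):
    # sample the power posterior ~ prior * like^b0 (exact here: it's Gaussian)
    var = 1.0 / (1.0 / 9.0 + b0)
    th = rng.normal(var * b0 * y, np.sqrt(var), size=n)
    # stepping stone ratio estimate of Z_{b1} / Z_{b0}
    lw = (b1 - b0) * log_like(th)
    log_Z += lw.max() + np.log(np.mean(np.exp(lw - lw.max())))

exact = stats.norm.logpdf(y, scale=np.sqrt(10.0))
print(log_Z, exact)
```

Each rung estimates the ratio of adjacent normalising constants by importance sampling from the colder density, and the product of ratios telescopes into the marginal likelihood; the generalised version simply replaces the prior endpoint with a reference density closer to the posterior.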
It’s odd that the authors didn’t mention any of these previous presentations of this marginal likelihood estimator. For one thing, if ever you go on the internet you will find that the stepping stone estimator is all up in there like lol cats, memes, and bad selfies. (Just type “geometric path” and “marginal likelihood” into Google; see e.g. the review by Baele et al. 2012.) For another thing, I sent Hogg & Hou a copy of my recursive marginal likelihood estimators paper (which discusses this stepping stone business and the choice of reference density for geometric path sampling) in October 2013 (and it had been on the arXiv for many months before then). I even wrote on Hogg’s blog just a few days ago that I thought what I’d heard of their paper sounded a lot like Lefebvre’s ideas. At that time Hogg’s reply was rather cryptic: “Yes, didn’t mean to claim novelty! Just invention.” Overlooking the fact that the dictionary definition of invention gives novelty as its implication (e.g. “create or design [something that has not existed before]”: link), I would at least have thought they’d cite Lefebvre et al. in their arXived version. (Strangely, they did cite my recursive estimators paper in the context of prior-sensitivity; which was nice!)
For those interested in using the stepping stone estimator for geometric path sampling from an auxiliary density, it’s worth noting here a few other ideas overlooked by Hou, Goodman, & Hogg. First, although they state that “their” method has computational advantages over reversible jump MCMC, parallel tempering, nested sampling, and population Monte Carlo, this is certainly not guaranteed in general and will very likely be untrue for the specific cases of multi-modal or fat-tailed posteriors. The point is that since the error of this method depends on the J-divergence between the two endpoint densities (prior and auxiliary), it can easily happen that one chooses a poor auxiliary with worse J-divergence against the posterior than the prior itself. Indeed, for some posterior–auxiliary combinations one can end up in the infinite variance quagmire of the dreaded harmonic mean. As anyone who truly understands marginal likelihood estimation will tell you: it’s horses for courses, baby! One way to limit the potential damage caused by unwittingly adopting a poor auxiliary is via a defensive importance sampling strategy (Hesterberg 1995), i.e., taking instead of Hou et al.’s Laplace approximation-esque g(theta) a defensive h(theta) = alpha*pi(theta)+(1-alpha)*g(theta) [for 0 < alpha < 1; alpha small].
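To make the defensive strategy concrete, here's a toy sketch of my own (the densities are invented for illustration; this is emphatically not Hou et al.'s actual auxiliary): a deliberately overconfident Gaussian g against a fat-tailed posterior, rescued by mixing in a small amount of the prior so the importance weights stay bounded by max(like)/alpha:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Toy target: unnormalised posterior q = prior * like, with fat likelihood tails
log_prior = lambda th: stats.norm.logpdf(th, scale=5.0)
log_like = lambda th: stats.t.logpdf(th, df=2, loc=2.0)   # Student-t, heavy tails
log_q = lambda th: log_prior(th) + log_like(th)

alpha, n = 0.1, 50000
g = stats.norm(loc=2.0, scale=0.3)   # deliberately overconfident auxiliary

# Sample the defensive mixture h = alpha * prior + (1 - alpha) * g (Hesterberg 1995)
comp = rng.random(n) < alpha
th = np.where(comp, rng.normal(0.0, 5.0, size=n), g.rvs(n, random_state=rng))
log_h = np.logaddexp(np.log(alpha) + log_prior(th),
                     np.log(1.0 - alpha) + g.logpdf(th))

# Importance sampling estimate of the marginal likelihood Z = integral of q
lw = log_q(th) - log_h
Z_hat = np.exp(lw.max() + np.log(np.mean(np.exp(lw - lw.max()))))

# Brute-force quadrature reference for this 1D toy problem
grid = np.linspace(-80.0, 80.0, 400001)
Z_exact = np.sum(np.exp(log_q(grid))) * (grid[1] - grid[0])
print(Z_hat, Z_exact)
```

Plain importance sampling from g alone would suffer enormous (here effectively infinite-variance-like) weights out in the t-distribution's tails; the alpha*pi component of h guarantees each weight q/h is at most like/alpha, which is exactly the damage limitation being advertised.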
It’s also worth remembering, as I stress in my recursive estimators paper, that the stepping stone path is not unique to the stepping stone method: it existed previously as the preferred identity for estimating marginal likelihoods via the population Monte Carlo method (e.g. del Moral et al. 2006). In this case it is really the mechanics by which the tempered bridging distributions are explored that sets the methods apart; again, which is better will ultimately come down to implementation- and problem-specific considerations. Finally, in my opinion (forged from experience in practical applications, though I have only a heuristic argument to back it up), the biased sampling (Vardi 1985) summation presented in my recursive estimators paper will always provide a more accurate estimate of the marginal likelihood than the stepping stone given the exact same draws: the pseudo-importance sampling character of biased sampling makes more efficient use of the available information, and the biased sampling version gives direct access to an efficient estimator of prior-sensitivity. So if you use Hou et al.’s method then you ought to try my (by which I mean Vardi’s, Geyer’s, Kong’s, etc.) summation!
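For the curious, the biased sampling summation can be sketched as follows; this is my own toy rendering of the Vardi/Geyer-style self-consistency iteration (pooling all draws across the temperature ladder, rather than using each rung's draws only for its own ratio as the stepping stone does), again in a conjugate Gaussian model where the exact answer is known:

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(2)

# Toy conjugate model again: theta ~ N(0, 3^2), y | theta ~ N(theta, 1), y = 1.
y = 1.0
betas = np.linspace(0.0, 1.0, 11)  # ladder of tempered densities q_b ~ prior * like^b
n = 5000                           # draws per rung

# Exact draws from each power posterior (Gaussian here), then pooled
draws = []
for b in betas:
    var = 1.0 / (1.0 / 9.0 + b)
    draws.append(rng.normal(var * b * y, np.sqrt(var), size=n))
th = np.concatenate(draws)

# log q_b(theta_i) for every rung b and every pooled draw i
ll = stats.norm.logpdf(y, loc=th, scale=1.0)
lp = stats.norm.logpdf(th, scale=3.0)
logq = lp[None, :] + betas[:, None] * ll[None, :]

# Self-consistency iteration for the normalising constants:
#   Z_b = sum_i q_b(th_i) / sum_b' [ n * q_b'(th_i) / Z_b' ]
logZ = np.zeros(len(betas))
for _ in range(200):
    logden = np.log(n) + logsumexp(logq - logZ[:, None], axis=0)
    logZ = logsumexp(logq - logden[None, :], axis=1)
    logZ -= logZ[0]                # pin Z at beta = 0 (the prior integrates to 1)

log_ml = logZ[-1]                  # estimated log marginal likelihood
exact = stats.norm.logpdf(y, scale=np.sqrt(10.0))
print(log_ml, exact)
```

The "pseudo-importance sampling" character is visible in the denominator: every pooled draw is weighted against the full mixture of tempered densities, so each rung's draws inform every normalising constant at once.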