One of my favourite textbooks begins its preface with a quotation attributed to Edward Davenant, who would (apparently) “have a man knockt in the head that should write anything in Mathematiques that had been written of before”. And that was back in the stone age, or at least, before the internet put more-or-less anything you’d like to read (modulo the paywall system for certain academic journals) at your fingertips.
Not surprisingly, I feel much the same way about the relentless stream of review papers on marginal likelihood estimation: it’s as if every sub-discipline of physics and astronomy requires its own review to say in a limited way what has already been said so many times before in the statistics literature, including in some excellent review articles there (e.g. Kass & Raftery 1995; Friel & Wyse 2012; Robert & Wraith 2009).
Hence, I didn’t find much to interest me in today’s contribution by Knuth et al. concerning Bayesian model selection, written for the Digital Signal Processing journal and posted on the arXiv. Although quite a few contemporary techniques get a brief shout-out, the focus of the text is on importance sampling*, thermodynamic integration, and nested sampling, i.e., a very physics-y selection. Perhaps the one version of the marginal likelihood review I would like to see would be one for physicists presenting contemporary techniques yet to penetrate the field widely, such as Sequential Monte Carlo, Adaptive Multiple Importance Sampling, or Marginal Posteriors.
However, even within the topics presented the discussion is quite limited: e.g., the use of auxiliary densities (cf. Lefebvre et al.’s J-divergence paper) for thermodynamic integration / path sampling is missed, as are alternative types of tempering like data tempering (à la Chopin 2002); recent developments in nested sampling, such as diffusive nested sampling and importance nested sampling, are ignored too. The connection between the density of states and nested sampling, which I described in my Statistical Science paper, is also presented with a dodgy measure-free treatment (but no citation to my careful measure-theoretic version); hopefully this will be corrected in revision now that I’ve pointed it out to them, but it’s still a bit ‘story of my life’.
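For readers unfamiliar with the technique being discussed here, the path sampling identity at stake is the standard one (my own summary, not a quotation from the paper): running along the tempered path p_t(θ) ∝ L(θ)^t π(θ) between prior (t = 0) and posterior (t = 1),

```latex
% Thermodynamic integration / path sampling identity (standard result):
% with p_t(\theta) \propto L(\theta)^t \pi(\theta) and Z_t its normalising constant,
\log Z \;=\; \log \frac{Z_1}{Z_0}
       \;=\; \int_0^1 \mathbb{E}_{\theta \sim p_t}\!\left[\log L(\theta)\right] \, \mathrm{d}t .
```

The auxiliary-density idea is that one is free to choose a better bridge between the two endpoint densities than this default power-posterior path, which can substantially reduce the variance of the discretised integral.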
* The discussion of importance sampling in this paper reminds me to point out, with regard to the previous post, that when using importance sampling within a pseudo-marginal MCMC algorithm one really should not force the weights to sum to one, as this self-normalised version of importance sampling is biased (cf. Hesterberg 199x).
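A minimal numpy sketch of the point, on a toy problem of my own construction (nothing from the paper): the raw-weight estimator is unbiased at any sample size, while the self-normalised one (weights forced to sum to one) is consistent but biased at finite n — which is exactly what breaks the pseudo-marginal argument.

```python
import numpy as np

# Toy setup: estimate E_p[x^2] = 1 under p = N(0, 1) using draws from a
# wider proposal q = N(0, 2^2).  The importance weight simplifies to
#   w(x) = p(x) / q(x) = 2 * exp(-3 x^2 / 8).
rng = np.random.default_rng(0)
n, reps = 5, 20000          # deliberately tiny n to expose the finite-sample bias

x = rng.normal(0.0, 2.0, size=(reps, n))
w = 2.0 * np.exp(-0.375 * x**2)
f = x**2

raw = (w * f).mean(axis=1)                       # plain IS: unbiased for E_p[f] = 1
self_norm = (w * f).sum(axis=1) / w.sum(axis=1)  # weights forced to sum to one: biased

print("raw IS mean:         ", raw.mean())        # close to 1.0
print("self-normalised mean:", self_norm.mean())  # noticeably above 1.0 at this n
```

Averaged over many replications the raw estimator centres on the true value, whereas the self-normalised version carries an O(1/n) bias; only the former yields the unbiased likelihood estimates that pseudo-marginal MCMC requires.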