On path integral evidence (marginal likelihood) computations …

I noticed the latest paper by Kitching & Taylor, entitled Path-integral evidence, on astro-ph recently and decided it was high time for me to take a careful read through the full series (i.e. including TK2010 and KT2011). After parsing the astronomers’ statistical writing into my own ‘Ewan-compileable’ language, I believe that the general scheme presented for (perhaps approximate) analytic marginalisation over nuisance parameters (whether they be parametric descriptions of instrumental systematics or non-parametric [i.e., here Gaussian process] ‘fields’/functions representing non-standard cosmologies) is (a) quite appealing for cases where the conditional likelihoods for the nuisance parameters at given (non-nuisance) model parameters are close to Gaussian; and (b) shares an interesting convergence of philosophy with the Integrated Nested Laplace Approximation (INLA) scheme I’m so fond of.

If I understand the astronomers’ method correctly, then the similarities and differences with INLA are as follows.
(1) Both schemes seek to make efficient (perhaps doubly-, at least singly-) approximate inference for the posterior of a model consisting of some (typically) very high dimensional set of nuisance parameters, \psi, coupled to a low dimensional (non-nuisance) model parameter vector, \theta, of primary interest. In INLA the nuisance parameters are typically the mesh nodes of the latent Gaussian Markov random field and the model parameters of interest are its corresponding hyper-parameters and possibly the coefficients of some linear predictors; in the astronomers’ case the nuisance parameters are the finite projections at the observed datapoints of a continuous-space GP with fixed covariance structure, and the model parameters of interest are cosmological parameters describing the mean of the main signal component at those points.
[Don’t make too big a fuss over the fact that INLA uses a GMRF and the astronomers use a continuous GP; see the GP -> GMRF SPDE connection: Lindgren et al. (2011).  Also, forgive the astronomers the (to a statistical eye) ‘hand wavy’ treatment of integration over functional priors and cancelling of infinities.]
(2) Both schemes seek computational efficiency by introducing an approximation to the conditional probability in \psi space at each given value of \theta, i.e., \pi(\psi|\theta,D), such that exploration of the (typically) lower-dimensional \theta space can be reduced to (e.g.) direct, brute-force exploration.
(3) Whereas INLA approximates \pi(\psi|\theta,D) via a Laplace approximation about the mode \psi^\ast|\theta of \pi(\psi|\theta,D), the astronomers make their approximation using a second-order Taylor expansion of the log conditional likelihood, L(D|\theta,\psi), about a single \psi_0 fixed for all \theta at the global posterior mode. (I’ve tried to illustrate this in the diagram below.)
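As a toy sanity check of the distinction (my own one-dimensional conjugate-Gaussian sketch, not the astronomers’ actual pipeline): when the conditional is exactly Gaussian, expanding about the conditional mode \psi^\ast|\theta (INLA-style) and Taylor-expanding about an arbitrary fixed \psi_0 integrate to the same marginal, and both match the analytic answer.

```python
import numpy as np

# Toy model: y | psi ~ N(psi, sig2);  psi | theta ~ N(theta, tau2).
# The conditional posterior pi(psi | theta, y) is exactly Gaussian, so both
# approximations should recover the analytic marginal N(y; theta, sig2 + tau2).
y, theta, sig2, tau2 = 1.3, 0.4, 0.5, 2.0

def log_joint(psi):
    # log p(y | psi) + log p(psi | theta), with all constants kept
    return (-0.5 * (y - psi)**2 / sig2 - 0.5 * np.log(2 * np.pi * sig2)
            - 0.5 * (psi - theta)**2 / tau2 - 0.5 * np.log(2 * np.pi * tau2))

H = 1.0 / sig2 + 1.0 / tau2             # negative Hessian of log_joint (constant here)

# INLA-style Laplace: expand about the conditional mode psi_star | theta
psi_star = (y / sig2 + theta / tau2) / H
laplace = log_joint(psi_star) + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(H)

# Fixed-point variant: second-order Taylor expansion about a single psi_0
# (standing in for the global posterior mode), then integrate the Gaussian:
# log Z ~= f(psi_0) + g^2 / (2H) + 0.5 * log(2*pi / H),  with g = f'(psi_0)
psi0 = 0.9                               # any fixed point; exact for quadratic f
g = -(psi0 - y) / sig2 - (psi0 - theta) / tau2
taylor = log_joint(psi0) + g**2 / (2 * H) + 0.5 * np.log(2 * np.pi / H)

# Analytic marginal: y | theta ~ N(theta, sig2 + tau2)
v = sig2 + tau2
exact = -0.5 * (y - theta)**2 / v - 0.5 * np.log(2 * np.pi * v)

print(laplace, taylor, exact)            # all three agree for a Gaussian conditional
```

For a non-Gaussian conditional the Taylor expansion would of course depend on the choice of \psi_0, which is exactly where the two schemes part company.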

(4) If the conditional probability in \psi space were exactly Gaussian then both schemes would be exact, but since this assumption rarely holds neither will be exact in practice. Also, for non-Gaussian likelihoods the astronomers’ solution risks returning a non-positive-definite covariance matrix, which breaks the scheme; but for Gaussian likelihoods this approach could be faster than INLA, since it is only necessary to compute a single vector of first derivatives and a matrix of second derivatives at each \theta, rather than having to do this after first finding \psi^\ast|\theta.
– In the introduction to Rue et al. (2009) [*the* INLA paper] the authors seem to suggest that this type of scheme, attributed to Breslow & Clayton (1993) and Ainsworth & Dean (2006), is from experience too inaccurate for the non-Gaussian conditional likelihoods one tends to encounter in applied spatial statistics (particularly geostatistics/forestry/mining/epidemiology).
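To see the non-positive-definiteness risk concretely, here is a minimal sketch (my own toy example, not from the papers) using a Student-t likelihood in a location parameter: the log-likelihood’s curvature changes sign once the expansion point sits more than \sqrt{\nu} from the datum, so a fixed \psi_0 far from the data hands back a negative ‘precision’, whereas expansion at the mode is safe.

```python
import numpy as np

# Student-t log-likelihood in a location parameter psi (heavy-tailed, non-Gaussian).
# Its curvature flips sign for |y - psi| > sqrt(nu), so a second-order expansion
# about a fixed psi_0 far from the data yields a negative "precision", i.e. a
# non-positive-definite covariance -- exactly the failure mode described above.
nu, y = 4.0, 0.0

def d2_loglik(psi):
    # analytic second derivative of -((nu+1)/2) * log(1 + (y-psi)^2/nu) w.r.t. psi
    r = y - psi
    return (nu + 1) * (r**2 - nu) / (nu + r**2)**2

prec_at_mode = -d2_loglik(y)          # expansion at the mode: positive precision
prec_far_away = -d2_loglik(y + 5.0)   # fixed psi_0 far from y: negative precision!
print(prec_at_mode, prec_far_away)
```

In higher dimensions the analogous failure is one or more non-positive eigenvalues in the negative Hessian, which no Cholesky factorisation will forgive.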

If my thoughts above are correct then I must say I am intrigued by the potential performance of the astronomers’ algorithm for some of the models we investigate in malariology: in particular, although we have non-Gaussian response data we often transform it such that we can approximately use a Gaussian likelihood (e.g. via the empirical logit transform), because even with INLA this tends to give us better performance in Bayesian coverage and cross-validation diagnostics. Also, with codes taking advantage of autodiff, like TMB (as used to great effect in the Global Burden of Disease project), finding the global posterior mode and computing the necessary first and second derivatives should be super quick (especially if the field is reduced to a GMRF using the INLA mesh approximation). Of course, I’m not suggesting that it will be as accurate as INLA, but it might be competitive for first-order solutions in some BIG DATA or model-building/testing regimes … ??
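To make the ‘differentiate at a fixed \psi_0, then sweep \theta’ idea concrete, here is a hedged numpy sketch of my own devising (finite differences standing in for what TMB or any autodiff would give far more cheaply; the GMRF-style precision and all parameter values are invented for illustration). Because this toy model is Gaussian, the second-order expansion integrates exactly, and the resulting log-evidence curve over a brute-force \theta grid is smooth and concave.

```python
import numpy as np

# Toy: y_i = psi_i + noise, with psi ~ N(theta * 1, Q^{-1}) for a GMRF-style
# tridiagonal precision Q. We expand the log-joint in psi about a single fixed
# psi_0 and reuse the Hessian across a brute-force grid in theta.
rng = np.random.default_rng(0)
n = 5
Q = (np.diag(np.full(n, 2.0)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1))          # invented GMRF-style precision
y = rng.normal(size=n)
sig2 = 0.5

def log_joint(psi, theta):
    # Gaussian likelihood + GMRF prior with mean theta * 1 (constants dropped)
    r = psi - theta
    return -0.5 * np.sum((y - psi)**2) / sig2 - 0.5 * r @ Q @ r

def grad_hess(f, x, h=1e-5):
    # central finite differences: a crude stand-in for automatic differentiation
    m = x.size
    g, H = np.zeros(m), np.zeros((m, m))
    for i in range(m):
        ei = np.zeros(m); ei[i] = h
        g[i] = (f(x + ei) - f(x - ei)) / (2 * h)
        for j in range(m):
            ej = np.zeros(m); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return g, H

psi0 = np.zeros(n)                            # fixed expansion point (stand-in for the global mode)
thetas = np.linspace(-1, 1, 9)
logmarg = []
for th in thetas:
    g, H = grad_hess(lambda p: log_joint(p, th), psi0)
    A = -H                                    # precision of the Gaussian approximation
    # integrate the second-order expansion analytically:
    # log Z ~= f(psi0) + g^T A^{-1} g / 2 + 0.5 * (n log 2*pi - log det A)
    sign, logdet = np.linalg.slogdet(A)
    logmarg.append(log_joint(psi0, th) + 0.5 * g @ np.linalg.solve(A, g)
                   + 0.5 * (n * np.log(2 * np.pi) - logdet))
theta_hat = thetas[int(np.argmax(logmarg))]
print(theta_hat)
```

With autodiff replacing `grad_hess` and a sparse Cholesky replacing `slogdet`/`solve`, the per-\theta cost is just one gradient plus (for non-Gaussian likelihoods) one Hessian refresh, which is the speed argument in point (4).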

????????? Needs more thought ….

Some feedback from friends:
– The same scheme *might* also pass under the umbrella of ‘the Laplace approximation’ in machine learning; e.g. Flaxman et al.
– The same scheme *might* also be able to be run in INLA by nominating strategy="gaussian".
But in both cases I have yet to satisfy myself that the implementation is the same as the astronomers’ one above, with expansion about a global \psi_0, and not just a Gaussian approximation about each \psi^\ast|\theta.
– Both schemes need to fill in a second-derivative matrix at each \theta (?), which is likely the most expensive operation in either case; finding the conditional posterior mode is much cheaper.
