Six key trends in contemporary statistics that really could revolutionise astronomical data analysis …

(2) the SPDE approach to large-scale random fields.
Cosmologists have long been familiar with the statistics of random fields, owing in part to the prominent role of the Gaussian and log-Gaussian in models of the initial density perturbations (Bardeen et al. 1986) and large-scale structure (Coles & Jones 1991), respectively—and indeed cosmologists have in the past led the way to development of key techniques for excursion set analysis and efficient simulation from Gaussian fields (e.g. Arnoud Doucet credits Hoffman & Ribak 1991 for introducing one of the most widely used tricks for sampling the posterior at indirectly constrained locations).  However, in the past decade the thrust of new ‘applied statistics’ ideas in random field theory have been driven by geostatistical research, where the embedding of Gaussian fields within the Bayesian hierarchical modelling framework has proven an exceptionally powerful technique for mapping continuous spatial signals (cf. Diggle et al. 1998, Banerjee et al. 2004 and examples in epidemiology: Diggle et al. 2002, Kelsall & Wakefield 2002, Gething et al. 2010).  And it has been within this context of geostatistical modelling that the SPDE (stochastic partial differential equation) approach to the modelling of large-scale random fields has lately emerged as a revolutionary new trend.

By far the greatest challenge for computational solution of large-scale inference of Gaussian random fields is that of matrix factorisation for their dense covariance structures.  The SPDE strategy described by Lindgren et al. (2011) identifies an explicit link between general Gaussian random fields and Gaussian Markov random fields (GMRFs), and makes use of this link to develop mesh-based approximations in the style of the finite element method; the result of which are sparse precision matrix proxies for non-sparse Gaussian random field parent models.  The computational advantages of this widely-applicable approach (see also Simpson et al. 2011 for key examples) are immense—and even more so when coupled to the Integrated Nested Laplace Approximation (INLA) strategy for GMRF posterior approximation (Rue et al. 2009) which solves the other key challenge of Gaussian field modelling: efficient exploration of the joint hyper-parameter–latent field space.

With each observation giving rise to its own latent random variable, in large observational studies (or observational studies gridded over large areas/volumes) the dimension of the posterior in Gaussian field models can easily be in the hundreds, thousands, or hundreds of thousands!  As conventional MCMC schemes quickly begin to struggle at the lower end of this spectrum the now ‘conventional’ approach is to use Hamiltonian MCMC (Neal 1997, Girolami & Calderhead 2011Neal 2012; and for an astrophysical example: Jasche & Kitaura 2009).  However, for the GMRF-based proxies provided by the SPDE method INLA proposes a super-fast alternative: since the Gaussian prior typically leads to a near-Gaussian posterior for the latent field why not take the Laplace approximation (Tierney & Kadane 1986) here and run numerical quadrature over the remaining hyper-parameters?  It turns out that this MCMC-free algorithm is more than sufficiently accurate for a lot of practical inference tasks—and, importantly, it runs a heck of a lot faster (we’re talking hours on a desktop compared to weeks on a cluster in some cases)!  Recent successful applications of the SPDE / INLA methodology outside of astronomy include the mapping of global ozone levels (Bolin & Lindgren 2011) and air pollution (Cameletti et al. 2013, Clifford et al. 2013), bovine diarrhoea(!) (Schroedle et al. 2011), forest fires (Natario et al. 2013), and disease genealogies (Palacios & Minin 2012), amongst others.  Important to note is that SPDE / INLA can also be used for time series modelling (cf. Martino 2008).

inlaAn example SPDE approximation (Lindgren & Rue 2010).

References/Notes.  The most ‘obvious’ potential application of the SPDE / INLA method in astronomy would be for mapping large-scale structure (and inferring its correlation function).  Where the currently-used methods are either slow and computationally expensive (e.g. Jasche & Wandelt 2012, ) or rely on Normal approximations to the data generating process and a regular-gridded cubic domain for speed ups from FFT-based matrix manipulations (e.g. Jasche & Lavaux 2014), SPDE / INLA could do the same job quickly and cheaply, on possibly irregular grids, and without sacrificing the real (or realistic) latent field-to-data link function (here: Poisson point process).  However, there are likely many other potential applications of SPDE / INLA within the astronomical domain: for modelling simultaneous the systematics in time-series of many thousands of targets? for fast mapping of the CMB / pixel in-painting? for faster model emulation or experimental design?  Only time can tell (as Richard Hell might say) …

This entry was posted in Astrostatistics, Gaussian Processes, Infinite-Dimensional Inference, INLA, Statistics. Bookmark the permalink.

3 Responses to Six key trends in contemporary statistics that really could revolutionise astronomical data analysis …

  1. Andres Asensio Ramos says:

    Do you have any idea whether this SPDE idea to large-scale random fields can be applied to observations that are obtained as integrals of the random field (for instance, a random field observed with “pixels” that are larger than the smallest scale in the field? More interesting would be the case in which the field can be positive and negative, so that you get cancellations in the “pixels”.

    • Broadly speaking, both the MCMC and SPDE/INLA algorithms can be run on hierarchical Bayesian models that would describe such an observational process: as long as its likelihood function can be computed conditional on each approximate representation of the latent random field. In this pixelized case, for pragmatic reasons of computational time / desired level of accuracy, one would usually just simplify the model to equate the pixel measurement with a point measurement of the field at its centre.

      But, if we did want to go sub-pixel then the usual strategy would be to make a sub-pixel mesh, and this is where the SPDE method could be interestingly different from the MCMC method. The default (and rather ad hoc) strategy would be use a regular grid: start with, say, 9x subsampling and see how it compares to the non-subsampled model; if it differs significantly go to, say, 16x subsampling and run again; until you eventually run out of computational power! However, the SPDE approach would advocate a principled approach to the placement of sub-pixel nodes for interpolation of the true field based on functional approximation theory. To this end I would imagine that accurate results could be recovered by placing nodes at the corner of each pixel plus one in the centre, with the next best approximation being with an extra four nodes. (?) In any case, since INLA typically runs orders of magnitudes faster than MCMC it would be a lot easier to explore the effect of pixel sub-sampling to figure this out!

  2. telescoper says:

    Reblogged this on In the Dark and commented:
    I’ve just come across this very interesting astrostatistics site, and I thought I’d reblog a piece from it. In fact I did make a very crude attempt back in the 90s to do something very like the SPDE analysis described here, but it came to nothing. That seems that there’s been a great deal of activity in this area more recently so it might be worth reviving interest in it.

    Now. Where did I put those notes.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s