Six key trends in contemporary statistics that really could revolutionise astronomical data analysis …

With all the schools (Canary Islands, Penn State), conferences (IAUS306; 2013: Astroinformatics), books (Statistics, Data Mining, and Machine Learning in Astronomy), and articles (e.g. Hilbe 2014) hyping the potential of contemporary statistical methodology for revolutionising the practice of astronomy and astronomical data analysis, you’d be forgiven for thinking that the discussion here was about more than just introductory Bayesian inference and MCMC.  But with few exceptions (e.g. the Penn State school will touch on a little SMC and ABC) the tools being talked about are very much Bayesian Inference 101, and in many other fields (genetics, health care, epidemiology, geography, mineral exploration, finance, and computational chemistry, to name a few) they would barely rate a yawn.

So, to inject a little pick-me-up of genuinely modern statistical methodology into the discussion, over the next few days I will present “Six key trends in contemporary statistics that really could revolutionise astronomical data analysis …”: (1) Bayesian non-parametrics / semi-parametrics, (2) the SPDE approach to large-scale random fields, (3) particle filters and population Monte Carlo, (4) ABC and the pseudo-marginal method, (5) Big Data Bayes, and (6) sequential/adaptive experimental design.  To date only a handful of astronomical/cosmological publications have used any of these ideas, and where they have been used we have seen only the tip of the iceberg of potential applications in the field (if I’ve missed any key references, feel free to let me know through the comments box).  Also, although I now use my powers for good (in the study of malaria epidemiology) rather than evil (galaxy evolution), I would be more than happy to give some pointers or answer technical questions on any of these topics if there are any astronomers out there interested in what follows.

(1) Bayesian non-parametrics / semi-parametrics.
One of the promises of the “Big Data” revolution is that we can become less reliant on the boring old exponential family of distributions (Gaussian/Normal, Exponential, Poisson, Binomial, etc.) to model our observational datasets; instead we can let the data speak more strongly for itself through the flexible class of Bayesian non-parametric / semi-parametric priors based on stochastic processes: most notably, the Dirichlet process and the Gaussian process.  While it’s true that the Gaussian process (well known to cosmologists) is starting to become popular in astronomy for modelling time series (Brewer & Stello 2009) and the systematics thereof (Gibson et al. 2013), non-parametric regression (Seikel et al. 2012), and image processing (Sutter et al. 2014), it’s also true that there remains great potential yet to be explored (particularly with respect to very large scale Gaussian process modelling, as we’ll see in part 2), and the Dirichlet process has remained almost completely undiscovered!
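To make the Gaussian process idea concrete, here is a minimal sketch (my own illustration, not taken from any of the papers cited above) of drawing one random function from a zero-mean Gaussian process prior evaluated on a grid of times.  The squared-exponential kernel, its hyperparameter values, the small “jitter” added to the diagonal for numerical stability, and all of the function names are assumptions chosen purely for illustration; only the Python standard library is used.

```python
import math
import random


def sq_exp_kernel(x1, x2, amplitude=1.0, length_scale=1.0):
    """Squared-exponential covariance (an illustrative choice of kernel)."""
    return amplitude ** 2 * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gp_prior(xs, kernel=sq_exp_kernel, jitter=1e-6, rng=None):
    """Draw f(xs) ~ N(0, K), where K[i][j] = kernel(xs[i], xs[j])."""
    if rng is None:
        rng = random.Random()
    n = len(xs)
    # Covariance matrix, with jitter on the diagonal to keep it well conditioned.
    cov = [[kernel(xs[i], xs[j]) + (jitter if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent N(0, 1) draws
    # f = L z has covariance L L^T = K.
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


if __name__ == "__main__":
    times = [float(t) for t in range(10)]
    for t, f in zip(times, sample_gp_prior(times, rng=random.Random(42))):
        print(f"{t:4.1f}  {f:+.3f}")
```

Each call returns a different smooth-looking curve; shortening `length_scale` gives wigglier draws.  The Cholesky step costs O(n^3) in the number of grid points, which is exactly why very large scale Gaussian process modelling needs something cleverer.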

In contrast to the Gaussian process, which offers a convenient prior over the space of continuous functions, the Dirichlet process offers instead a (highly flexible: i.e. “non-parametric”) prior over the space of probability distributions.  As such it can serve as a convenient basis for non-parametric modelling of unknown error forms, such as those arising from instrumental systematics, an approach commonly used in clinical meta-analysis studies (cf. Burr & Doss 2004, Muller et al. 2004); and, when nested below a Normal error source in a hierarchical framework, it provides the infinite-mixture model (cf. Escobar & West 1995, Cameron & Pettitt 2013a) for semi-parametric analysis of uncertain distributional forms.  But the fun doesn’t stop there!  We can make even more flexible non-parametric distributions from the general class of Polya urn and Chinese restaurant processes to which the Dirichlet process belongs (cf. Zhou & Carin 2012, Muller & Quintana 2004).  Lately the Dirichlet process has also been found highly useful for non-parametric “on-line” object classification with Mondrian forests / Bayesian decision trees (Roy & Teh 2009, Aderhold et al. 2013, Lakshminarayanan et al. 2013); the “on-line” part of which is perhaps yet another trend deserving its own discussion.  (See also the role of the Dirichlet process in novel analyses of graphical data and graphical models: e.g. Tsuda & Kurihara 2008 and Heinz 2009!)
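For readers who have never met a Dirichlet process, the following sketch (again my own illustration rather than code from any of the cited papers) shows two standard ways of simulating from one: a truncated stick-breaking draw of a random distribution G ~ DP(alpha, G0), and the Polya urn scheme for drawing observations with G integrated out.  The standard Normal base measure, the concentration alpha = 2, the truncation tolerance, and the function names are all assumptions made for the example.

```python
import random


def sample_dp_stick_breaking(alpha, base_sampler, tol=1e-8, rng=None):
    """Draw one (truncated) random distribution G ~ DP(alpha, G0).

    Returns a list of (atom, weight) pairs.  Sticks are broken off until
    the unallocated length falls to `tol` or below, so the weights sum to
    1 to within `tol`.
    """
    if rng is None:
        rng = random.Random()
    atoms, weights = [], []
    remaining = 1.0  # length of stick not yet allocated
    while remaining > tol:
        v = rng.betavariate(1.0, alpha)  # fraction of the remainder to break off
        weights.append(remaining * v)
        atoms.append(base_sampler(rng))  # atom location drawn from G0
        remaining *= 1.0 - v
    return list(zip(atoms, weights))


def polya_urn_draws(n, alpha, base_sampler, rng=None):
    """Draw n observations from a DP-distributed G, with G integrated out.

    Observation i (counting from 0) is a fresh draw from G0 with
    probability alpha / (alpha + i); otherwise it copies one of the
    earlier observations chosen uniformly at random.
    """
    if rng is None:
        rng = random.Random()
    draws = []
    for i in range(n):
        if rng.random() < alpha / (alpha + i):
            draws.append(base_sampler(rng))
        else:
            draws.append(rng.choice(draws))
    return draws


def standard_normal(rng):
    """Illustrative base measure G0 = N(0, 1)."""
    return rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    g = sample_dp_stick_breaking(2.0, standard_normal, rng=random.Random(1))
    print("atoms kept after truncation:", len(g))
    xs = polya_urn_draws(50, 2.0, standard_normal, rng=random.Random(2))
    print("distinct values among 50 draws:", len(set(xs)))
```

The repeated values in the urn output are the point: a draw from a Dirichlet process is discrete with probability one, which is why the clustering behaviour comes for free and why the process is usually nested below a continuous (e.g. Normal) kernel when a smooth density is wanted.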

Thomas Ferguson: Father of the Dirichlet Process (Ferguson 1973)

References/Notes.  Past uses of the Dirichlet process in astronomy: Chattopadhyay et al. 2007 and Shin et al. 2009 have used the DP for mixture model-based classification; Magorrian 2013 uses the DP as a non-parametric prior on a galactic action-space distribution; and we (Cameron & Pettitt 2013a) have used it with Bayesian model selection as a systematic error model for investigating the fine structure constant dipole [importantly, we also show how the Radon-Nikodym derivative of the Dirichlet process marginals allows for efficient prior-sensitivity analysis].
