Earlier this week I was up in London for the Statistical Data Science 2017 conference organised by Imperial College London and hosted by Winton Capital. Naturally quite a few of the talks focussed on machine learning methods and Gaussian processes. My talk was on the use of both: that is, stacked generalisation with GPs (e.g. Bhatt et al. 2017) for Bayesian optimisation (one type of model emulation; see also ‘history matching’ for semi-analytical galaxy sims). One idea that got me thinking was Joe Meagher’s use of Gaussian processes to reconstruct the echolocation call of the ancestral bat species: in this case echolocation calls of different species were represented as basis functions with weights distributed according to a Gaussian process with covariance determined by a distance over the space of phylogenetic trees relating these bats. Apparently something similar has been done for reconstructing ancient speech patterns. Another interesting talk concerned an investigation by Francisco Rodriguez-Algarra of the rules learnt by deep classifiers to distinguish between different music genres: in one application it turned out that the classifier was predominantly based on sub-audible frequencies below 20 Hz! So although it was doing a good job on the training dataset it clearly wasn’t learning the type of difference notable to a human ear, and hence its potential to generalise over different qualities of recordings that may not have such accurate sub-bass reproduction was expected to be poor.

Of relevance to today’s discussion was the talk by Lukas Mosser on the use of generative adversarial networks (GANs) to model the distribution of structures seen in various types of porous media (a beadpack, a Berea sandstone, and a Ketton limestone). Although a flexible Gaussian process model fit to 3D scans of these rocks was able to learn the correlation function of filled/unfilled voxels, it was observed that mock datasets drawn from the GP posterior simply didn’t look convincingly like real porous media; evidently there was information contained in higher order structure functions not captured by the GP. Hence, Lukas and his team turned to GANs as a tool for generating more realistic mock images of the porous media. Reassuringly, they were able to show that the mock images from the GAN reproduced not only the correlation function found by the GP but also key higher order functions.

After his talk Lukas was able to point me towards a recent astronomy paper that I’d missed on the arXiv by Mustafa Mustafa et al. called “Creating Virtual Universes Using Generative Adversarial Networks”. In this arXival the authors train a GAN to mimic weak lensing maps produced by an expensive cosmological simulation code. They show that this can be done successfully, such that the fitted GAN can then be used to easily draw additional mock weak lensing maps statistically indistinguishable from their original training set. The authors motivate the use of GANs in astronomy with reference to the use of cosmological simulations to learn the covariance matrix of key observational summary statistics under different parameter settings. As has been discussed previously on this blog, there are various ways this can be done with different biased and unbiased estimators for the covariance, although when used as plug-ins all of them still produce biased likelihood estimates (Price, Drovandi & Lee 2017; see also Sellentin & Heavens 2015). The utility of GANs in this scenario is not readily apparent to me, however. If one imagines using a large number of mock images drawn from the fitted GAN to estimate the covariance matrix then I’m sure you will effectively get a point estimate, but it won’t capture any more information than is available in the original set of training simulations; rather it’ll just have a somewhat opaque regularisation to it (by the structure of the GAN) and will have been extra-expensive to obtain (because of the cost of fitting the GAN).
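To make that last point concrete, here is a minimal toy sketch in Python. It is an illustration under loud assumptions and not the setup of the paper: the "expensive simulator" is just a unit Gaussian producing a single summary statistic, and the "fitted GAN" is replaced by a Gaussian matched to the mean and variance of the training draws. All of the names (`n_train`, `n_generated`, `sample_variance`) are hypothetical.

```python
import math
import random

rng = random.Random(42)

def sample_variance(values):
    """Unbiased (N-1 normalised) sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

# ASSUMPTION: the "expensive simulator" is a unit Gaussian, so the true variance is 1.
n_train, n_generated = 20, 50_000
training = [rng.gauss(0.0, 1.0) for _ in range(n_train)]
train_var = sample_variance(training)

# ASSUMPTION: stand-in for the fitted GAN, a Gaussian matched to the training set.
fit_mean = sum(training) / n_train
fit_sd = math.sqrt(train_var)
generated = [rng.gauss(fit_mean, fit_sd) for _ in range(n_generated)]
generated_var = sample_variance(generated)

print(f"variance from {n_train} training sims:    {train_var:.3f}")
print(f"variance from {n_generated} generated mocks: {generated_var:.3f}")
print("true variance:                        1.000")
```

However many mocks are drawn, the estimate converges on the variance of the twenty training simulations rather than on the truth: the generator can only hand back what it was given. A real GAN would impose a different (and less transparent) regularisation than this Gaussian stand-in, but it faces the same limit.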

I suspect the authors recognise this, as in their penultimate paragraph they point towards future work on a parametrised GAN to learn the generative model over a suite of mock datasets from different parameterisations of the base model. This makes sense to me but brings the problem back from the specific power of GANs to the regime of other deep generative models which have been used for this purpose, such as the example by Papamakarios & Murray (2016) and the variational Bayes + GAN version by Tran et al. (2017). I think there certainly are use cases for such approaches but I don’t have faith that precision cosmological modelling is one of them, because of the sheer precision required for knowledge of the cosmological covariance matrices (or, if you like, their precision matrices) in order to make progress on the current understanding. I only hope that the cosmologists planning the next generation of precision observations are budgeting for some serious computational simulation time and not just pinning their hopes on some new algorithmic or statistical solution. I know I don’t have any easy answers.
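For a feel of why the simulation budget matters so much here, the following rough Python sketch shows how badly the inverse of a sample covariance behaves when the number of simulations is not much larger than the number of summary statistics. It assumes Gaussian mocks with an identity true covariance, and applies the standard debiasing factor (N − p − 2)/(N − 1) usually attributed to Hartlap et al. (2007). The function names are hypothetical, and the matrix routines are written by hand only to keep the example dependency-free.

```python
import random

def sample_covariance(rows):
    """Unbiased (N-1 normalised) sample covariance of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - means[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov

def inverse_trace(matrix):
    """Trace of the inverse via Gauss-Jordan elimination with partial pivoting."""
    p = len(matrix)
    # Augment with the identity: [A | I] is reduced to [I | A^-1].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return sum(aug[i][p + i] for i in range(p))

def mean_precision(n_sims, n_stats, n_repeats=2000, seed=0):
    """Average diagonal element of the estimated precision matrix (truth is 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_repeats):
        # ASSUMPTION: Gaussian mocks with identity covariance.
        mocks = [[rng.gauss(0.0, 1.0) for _ in range(n_stats)]
                 for _ in range(n_sims)]
        total += inverse_trace(sample_covariance(mocks)) / n_stats
    naive = total / n_repeats
    hartlap = (n_sims - n_stats - 2) / (n_sims - 1)
    return naive, hartlap * naive

naive, corrected = mean_precision(n_sims=15, n_stats=5)
print(f"naive precision: {naive:.2f}, debiased: {corrected:.2f} (truth is 1)")
```

With 15 simulations and 5 summary statistics the naive inverse should overestimate the precision by a factor of roughly (N − 1)/(N − p − 2) = 14/8, and the correction only repairs the mean: the scatter between realisations stays large, and that only shrinks with more simulations.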

As a minor comment I would note that I’m not a fan of (mis-)using p-values as evidence for the null, even if it’s in a somewhat tautological case (here the network is trained until it passes a p-value threshold of the K-S test).
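A small simulation shows why passing a K-S threshold is weak evidence for the null. This is a sketch under stated assumptions rather than a reproduction of the paper's procedure: the "real" and "generated" samples are one-dimensional Gaussians of 50 draws each, the generated ones are deliberately offset by 0.3 standard deviations, and the 5% critical value uses the usual large-sample approximation. The names `ks_statistic` and `pass_rate` are hypothetical.

```python
import math
import random

def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value <= x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def pass_rate(shift, n=50, n_repeats=500, alpha=0.05, seed=0):
    """Fraction of repeats in which the K-S test fails to reject at level alpha."""
    rng = random.Random(seed)
    # Large-sample critical value for two samples of equal size n.
    d_crit = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt(2 / n)
    passes = 0
    for _ in range(n_repeats):
        real = [rng.gauss(0.0, 1.0) for _ in range(n)]
        fake = [rng.gauss(shift, 1.0) for _ in range(n)]  # ASSUMPTION: toy "generator"
        passes += ks_statistic(real, fake) <= d_crit
    return passes / n_repeats

same = pass_rate(shift=0.0)
different = pass_rate(shift=0.3)
print(f"pass rate, identical distributions: {same:.2f}")
print(f"pass rate, shifted distributions:   {different:.2f}")
```

At this sample size the shifted generator still passes the test in most repeats, so "it passed" says more about the power of the test than about whether the two distributions agree.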