On the bus to Cambridge today for a workshop/un-conference on the genetics of the malaria parasite, which gives me almost four pile-inducing hours of bus travel around the outskirts of London and hence the time for a batch job of astro-ph commentary. The first two papers concern what I see to be applications of Bayesian inference on the borderline of its practical competitiveness with more ‘algorithmic’ strategies, while the latter two concern applications of advanced Bayesian methodologies to cosmological problems. Papers 2 and 3 are also linked by the need for pragmatic decisions about the limits of hierarchical modelling.

1) “Matching radio catalogs with Realistic Geometry …” by Fan et al. (including Ray Norris, who I’ve long thought was a bit of an ass-hat since he turned down my application for a summer studentship at the AAO many years ago 🙂 ) considers the utility of Bayesian model selection for distinguishing between genuine double-lobed radio sources and chance three-way alignments of unrelated objects in large-scale imaging fields (with additional optical and near-IR data). In this case the limited range of radio source morphologies considered (i.e. ‘neat’ double-lobed, symmetric sources) allows relatively tight priors to be placed on each unknown model parameter, such that the model selection criterion corresponds pretty closely to what one might write down from a purely ‘algorithmic’ standpoint, i.e., if one was aiming to write down a set of rules to computationally identify double-lobed radio sources (much like the algorithm for finding asteroids in optical survey data that I eventually wrote down during my back-up summer studentship at MSO). Indeed, it will be interesting to see how the authors get on with their further experiments in this direction, where some challenges for optimising this pipeline within a fully Bayesian frame are already noted: e.g. “*To improve the algorithm, greater weight needs to be given to the presence of IR counterparts …. It is difficult to see how such information can readily be incorporated into a Bayesian algorithm.*“.

One annoying part of the paper for me was the lack of explanation as to how the Bayes factors were estimated. “*To scale to the large number of possibilities, the numerical integration has to be very fast. Our implementation uses importance sampling for all integrals (e.g., Flannery et al. 1992). We speed up the calculations by always sampling from the tightest possible Gaussian.*” We’ve encountered importance sampling frequently on this blog, so that’s understood: what I don’t get is what is meant by ‘the tightest possible Gaussian’. Also, for such a low dimensional model I would have thought that quadrature would be faster per unit accuracy; especially if there is some ‘tightest possible Gaussian’ reference point from which Gauss-Hermite integration could be conducted. It’s also annoying that the priors aren’t precisely specified: I’d like to know whether or not anything was lost by not (apparently) truncating the Normal prior on k (eqn 4) to the positive reals.
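For concreteness, here’s a minimal sketch (with a toy one-dimensional likelihood and prior of my own invention, nothing to do with the actual model of the paper) of what I take ‘sampling from the tightest possible Gaussian’ to mean: importance sampling the evidence integral with a Gaussian proposal matched to the mode of the integrand:

```python
import numpy as np

# Toy sketch: estimate the evidence Z = \int L(theta) p(theta) dtheta by
# importance sampling from a 'tight' Gaussian proposal q centred on the
# posterior mode. The likelihood/prior below are illustrative assumptions,
# not the Fan et al. model.

rng = np.random.default_rng(0)

def log_like(theta):    # toy (unnormalised) Gaussian likelihood centred at 2
    return -0.5 * (theta - 2.0)**2

def log_prior(theta):   # toy N(0, 3^2) prior
    return -0.5 * (theta / 3.0)**2 - np.log(3.0 * np.sqrt(2 * np.pi))

# Locate the posterior mode crudely by grid search
grid = np.linspace(-10, 10, 2001)
mode = grid[np.argmax(log_like(grid) + log_prior(grid))]
q_sigma = 1.0           # assumed proposal width, roughly the posterior sd

theta = rng.normal(mode, q_sigma, size=50_000)
log_q = -0.5 * ((theta - mode) / q_sigma)**2 - np.log(q_sigma * np.sqrt(2 * np.pi))
log_w = log_like(theta) + log_prior(theta) - log_q
Z_hat = np.mean(np.exp(log_w))   # importance-sampling estimate of the evidence
```

(For this conjugate toy the answer is available in closed form, which is precisely why quadrature or a Laplace approximation would be my first instinct in such low dimensions.)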

2) “Celeste: Variational inference for a generative model of astronomical images” by Regier et al. presents an initial application of a variational inference approach to modelling astronomical images, with a further aim of the manuscript being to highlight the potential of data-intensive problems in astronomy as challenges for machine learning. (Or at least ‘pro-bono’ machine learning since there aren’t so many astronomical problems on kaggle, or of interest to google labs, yet? 🙂 ). Despite including a couple of well-known astronomers the authors give a rather unfair impression of the degree of past work in astronomy on generative models for astronomical images, but nevertheless highlight some interesting points which deserve further discussion. Alors …

First of all, let’s correct the record. “*But, to our knowledge, fully generative models for inferring celestial bodies’ locations and characteristics have not yet been examined.*” [footnote to single workshop note by Hogg et al.] One previous example close to the specific application considered here is that of Brewer et al. (and see also this) in which a fully Bayesian solution is presented for the case of locating and characterising stars in a crowded field with the number of stars not fixed *a priori* (thereby addressing one of the ‘future work’ challenges flagged up by Regier et al.). In observational cosmology there’s been a bunch of work done on Bayesian characterisations of galaxy shapes in large imaging fields (e.g. Lance Miller et al.; and Kitching et al.). And even in galaxy structural decomposition the old Gim2D code gives a Bayesian exploration of the 2D galaxy profile (image) space with PSF convolution in its generative model (albeit with effectively a Normal approximation to the true Poisson likelihood) (see also the MultiNest capabilities of MegaMorph). etc. etc.

Now, what is the USP of the present paper aside from (not really) being novel just for introducing a generative image model? Well, in fact, a few things. What Regier et al. do is to construct a hierarchical model for two types of celestial object—galaxy or star—to which they supply tight priors (much like in paper 1 above) derived from huge libraries of previous observations, with every emphasis on computational speed: the galaxy model includes approximate versions of the deV and exp profiles made of Gaussian mixtures, which allows for analytic convolution supposing a Gaussian PSF, and a fast variational scheme is developed for posterior approximation (see Magorrian for a previous use of variational inference in astronomy). A good deal of attention is also given to decisions regarding which components of (what I’m calling) a hypothetical ‘master’ hierarchical model to subject to posterior constraint and which to leave as point estimates from past data processing steps. For example, in this study it is decided to leave the background level fixed, as estimated during image processing, rather than allowing it to vary as a parameter of the model. Similarly, hyperparameters of the prior distributions for galaxy colours and so on are fixed to the values inferred from the reference libraries, when in principle they could be fitted too. Such decisions are indeed crucially important for Bayesian modelling at scale, so it is useful to see this issue discussed (here and in paper 3 below).
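A quick sketch of the analytic-convolution trick, with made-up mixture weights and widths rather than the actual deV/exp fits used by Celeste: convolving a mixture of concentric Gaussians with a Gaussian PSF just adds the PSF variance to each component, so no numerical convolution is ever needed:

```python
import numpy as np

# Hypothetical sketch: a galaxy profile approximated as a mixture of
# concentric, unit-flux Gaussians convolves analytically with a Gaussian
# PSF, since N(0, s1^2) * N(0, s2^2) = N(0, s1^2 + s2^2).
# Weights/sigmas below are illustrative, not the Celeste values.

def convolve_mixture(weights, sigmas, psf_sigma):
    """Mixture parameters of (profile convolved with PSF): weights are
    unchanged; each component's variance gains the PSF variance."""
    weights = np.asarray(weights, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    return weights, np.sqrt(sigmas**2 + psf_sigma**2)

def mixture_profile(r, weights, sigmas):
    """Evaluate the mixture of normalised 1D Gaussians at radii r."""
    r = np.atleast_1d(r)[:, None]
    norm = sigmas * np.sqrt(2 * np.pi)
    return np.sum(weights / norm * np.exp(-0.5 * (r / sigmas)**2), axis=1)

# Toy exp-like profile from three Gaussians, seen through a psf_sigma = 1 PSF
w, s = convolve_mixture([0.6, 0.3, 0.1], [0.5, 1.5, 4.0], psf_sigma=1.0)
```

Note that total flux (the sum of the weights) is preserved exactly under the convolution, which is part of what makes the approximation so computationally attractive.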

One interesting observation the authors make is that “*We have not found the speed of automatic differentiation toolkits competitive with manually coded derivatives, so our current results use the latter.*” This is surprising to me, given how greatly auto-diff has been hyped by the Stan user community and the TMB community (esp. our friends at IHME). I wonder to what extent this is a limitation of the generic libraries for auto-diff that are available, or simply an over-hyping of the auto-diff technique itself? In our experiments with Stan it seemed that the speed simply wasn’t there for the type of Gaussian Markov Random Field models we need to fit in malaria mapping, but this could well be explained by the fact that at that point (perhaps still) the type of Kronecker product matrix structures we use weren’t yet optimised in Stan.
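For readers unfamiliar with the technique being benchmarked here, a toy forward-mode auto-diff (hand-rolled via dual numbers, not any of the actual toolkits) set against a manually coded derivative; the two agree to machine precision, and the whole speed question is about the overhead of propagating those extra derivative terms through every operation:

```python
# Toy forward-mode automatic differentiation via dual numbers: each value
# carries its derivative along, so f'(x) falls out of evaluating f once.
# Purely illustrative; real toolkits (Stan, TMB, etc.) are far more elaborate.

class Dual:
    """Dual number v + d*eps with eps^2 = 0; .d carries the derivative."""
    def __init__(self, v, d=0.0):
        self.v, self.d = v, d
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v + o.v, self.d + o.d)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v * o.v, self.v * o.d + self.d * o.v)  # product rule
    __rmul__ = __mul__

def f(x):                 # f(x) = 3x^2 + 2x
    return 3 * x * x + 2 * x

def f_manual_grad(x):     # the manually coded derivative: 6x + 2
    return 6 * x + 2

auto = f(Dual(1.5, 1.0)).d     # auto-diff: seed derivative 1, read off .d
manual = f_manual_grad(1.5)
```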

Finally, one butt-clenching moment for me in reading this paper was the penultimate sentence: “*Also, a model that assigns more degrees of freedom to galaxies—unconstrained by prior distributions—than stars biases the classification in favor of galaxies.*” As a Bayesian this doesn’t parse: more (effective) degrees of freedom should be penalized by the Occam’s razor effect of the Bayes factor relative to stars, which the authors have shown already follow very closely a single locus in the colour-colour diagram and have point source morphologies.
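To make the Occam’s razor point concrete, a toy calculation (with invented numbers, nothing to do with the actual Celeste star/galaxy models): a one-parameter ‘galaxy’ model with a weak prior is automatically penalised in the Bayes factor relative to a zero-parameter ‘star’ model when the data sit near the latter’s locus:

```python
import numpy as np

# Toy Occam's razor: compare evidences for a datum x near the 'star' locus.
# Model A ('star'):   x ~ N(0, 1), no free parameters.
# Model B ('galaxy'): x ~ N(mu, 1) with a weak prior mu ~ N(0, 10^2),
# whose marginal is x ~ N(0, 1 + 10^2). All numbers are illustrative.

x = 0.1   # a single observation close to the star locus

logZ_A = -0.5 * np.log(2 * np.pi) - 0.5 * x**2
var_B = 1.0 + 100.0
logZ_B = -0.5 * np.log(2 * np.pi * var_B) - 0.5 * x**2 / var_B

log_BF = logZ_A - logZ_B   # positive => the simpler 'star' model is favoured
```

The extra, weakly constrained degree of freedom costs Model B roughly half the log of its prior-to-posterior volume ratio, so the flexible model is *disfavoured*, not favoured, which is why the quoted sentence doesn’t parse for me.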

Passing through Buckingham on the bus now = three minute pause to enjoy Ween’s “Buckingham Green“.

3) “Hierarchical Cosmic Shear Power Spectrum Inference” by Alsing et al. presents an impressive application of the ‘messenger field’ technique to a problem of genuine computational difficulty: inferring the power spectrum of cosmic shear from deep field galaxy survey data. At first I thought this application contradicted my initial impressions of the ‘messenger field’ technique (i.e., that it’s an auxiliary variable method exclusively for cases of the Normal linear model where the field is structured such that the component matrices are sparse; see earlier post), but on re-reading I think my first impression holds: here the survey field is encompassed by a rectangular ‘frame’ within which (after invoking the flat-sky approximation) the Fourier representation is sparse. (Unless I missed something and the rectangular geometry is not important?? {the other limiting case of harmonics on the full celestial sphere also gives a sparse [Toeplitz] matrix}).

The authors’ explicit decision to reduce the analysis to a single piece of the hierarchy is again interesting (see paper 2 above). If only there were some decision-theoretic, principled framework for making/justifying these choices? (How’s that paper going, Pierre Jacob, does it have a beginning, middle, and end yet? 🙂 )

4) “A new model to predict weak-lensing peak counts. II. Parameter constraint strategies” by Lin & Kilbinger presents an application of ABC to cosmological parameter estimation with weak-lensing data. The second author is already well-known in astrostatistics for promoting population Monte Carlo methods (still vastly under-utilised in astronomy), so it is no surprise that the ABC solution developed for this particular problem is also PMC (/SMC) ABC [like my SMC ABC paper, how come only Anja gets a citation for that! 🙂 ]. While I approve of all the modelling decisions here I have a few quibbles with some of the introduction and discussion of ABC:

– I feel the nature of the ABC (double-)approximation is not clearly elucidated: namely, that unless the summary statistics are sufficient the ABC posterior converges only to the posterior conditioned on those summaries, not to the fully Bayesian posterior, even as epsilon goes to zero. [e.g. “*Approximate Bayesian computation (ABC) is an algorithm that provides the posterior distribution of a complex stochastic process when evaluation of the likelihood is expensive or unreachable.*“]

– some conflation of the ABC error (from non-zero epsilon and non-sufficient summary statistics) with the Monte Carlo error from one-sample accept-reject under the likelihood (which is of course unbiased; the ‘of course’ for MC estimation also being true for MCMC thanks to pseudo-marginal theory). [e.g. “*Therefore, the one-sample test with tolerance generates samples under P(ε). This means that the fact that only one model for a given parameter π is tested does not add additional noise compared to ordinary Monte Carlo sampling.*“]

– the reference to McKinley et al. (2009) should be supplemented with more recent results: e.g. Vihola & Andrieu (and this talk); and Bornn et al.; and Anthony Lee’s one-hit ABC MCMC algorithm, which has geometric ergodicity. But that’s a higher citation standard than I’d hold a usual astrostatistics paper to.
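For the record, a minimal rejection-ABC sketch (toy Normal-mean problem of my own devising, not the weak-lensing peak-count model) illustrating the first quibble: even as epsilon shrinks, the ABC posterior targets the posterior given the summary statistic, which matches the full posterior only when that summary is sufficient (as the sample mean is here):

```python
import numpy as np

# Toy rejection ABC for the mean of a Normal with known unit variance.
# With a sufficient summary (the sample mean) and small epsilon, the ABC
# posterior approaches the true posterior; with a non-sufficient summary
# it would converge to p(theta | s(x_obs)) instead, whatever epsilon.

rng = np.random.default_rng(1)

x_obs = rng.normal(0.7, 1.0, size=50)    # 'observed' data from N(0.7, 1)
s_obs = np.mean(x_obs)                   # sufficient summary for the mean

def abc_rejection(s_obs, eps, n_prop=50_000):
    theta = rng.uniform(-3, 3, size=n_prop)        # flat prior on [-3, 3]
    # One simulated dataset per proposed theta (the 'one-sample test')
    x_sim = rng.normal(theta[:, None], 1.0, size=(n_prop, 50))
    s_sim = x_sim.mean(axis=1)
    return theta[np.abs(s_sim - s_obs) < eps]      # accept within tolerance

post = abc_rejection(s_obs, eps=0.05)
```

The accepted `theta` values concentrate around the observed sample mean, as the true posterior does; swap `np.mean` for, say, the sample median and the epsilon → 0 limit quietly changes target.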