Last week’s astro-ph mail-out saw an interesting contribution by Heavens et al., who present a formula for computing the Fisher information of an experiment to estimate what appears to be the slope and intercept of a linear regression model with (possibly correlated) Normal errors in both the X and Y variables. The main contribution is some impressive matrix algebra giving an analytic form for the likelihood function under a uniform prior on the distribution of the (latent) design variables. This should, of course, sound odd, since it implies an improper experimental design. The authors note that their formula also holds when the errors in the X variables are much smaller than the scale of the design distribution, but in that case I would imagine one could do just as well by estimating the Fisher information while simply ignoring the X errors. What really caught my eye, however, was that they submitted the paper to JRSS(B)! For those who don’t know, this is one of the *top* statistics journals, with a brutally low acceptance rate: under the old ranking system it was an A* journal, i.e., a lot of my colleagues in statistics would happily sell their soul at the crossroads on the edge of town to get published in JRSS(B). It would certainly be unusual for a JRSS(B) publication to refer readers to some astronomical papers for “pedagogical discussions of straight-line fitting and Bayesian approaches to fitting”, as the Heavens et al. manuscript does! 🙂 [Without being too bitchy, I think the authors (and other astronomers with similar papers) might look at some of the journals on the B list: e.g. The American Statistician often publishes this sort of thing (and its papers can be quite widely cited, especially if they’re picked up by biologists or social scientists in addition to the target audience).]
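To make the “just ignore the X errors” suggestion concrete: if the x values are treated as exact and only the Y errors (known standard deviations sigma_i) are kept, the Fisher information for the intercept a and slope b of y_i = a + b x_i + e_i reduces to the textbook weighted-least-squares form, and its inverse gives the Cramér–Rao bound. This is a minimal sketch under that assumption only; it is *not* Heavens et al.’s formula, which handles correlated errors in both variables, and the function names are hypothetical.

```python
def fisher_information(xs, sigmas):
    """Fisher matrix for (intercept a, slope b) in y_i = a + b*x_i + e_i,
    e_i ~ N(0, sigma_i^2), ASSUMING the x_i are known exactly (X errors ignored)."""
    i_aa = sum(1.0 / s ** 2 for s in sigmas)
    i_ab = sum(x / s ** 2 for x, s in zip(xs, sigmas))
    i_bb = sum(x ** 2 / s ** 2 for x, s in zip(xs, sigmas))
    return [[i_aa, i_ab], [i_ab, i_bb]]


def cramer_rao_bound(info):
    """Invert the 2x2 Fisher matrix; the diagonal holds the minimum
    variances attainable by unbiased estimators of a and b."""
    (p, q), (_, r) = info
    det = p * r - q * q
    if det <= 0:
        raise ValueError("singular Fisher matrix: need at least two distinct x values")
    return [[r / det, -q / det], [-q / det, p / det]]


# Usage: three design points with unit Y errors.
info = fisher_information([0.0, 1.0, 2.0], [1.0, 1.0, 1.0])
print(info)                    # [[3.0, 3.0], [3.0, 5.0]]
print(cramer_rao_bound(info))  # variances 5/6 (intercept) and 1/2 (slope) on the diagonal
```

When the X errors are not negligible relative to the spread of the design points, bounds computed this way would typically be optimistic, which is exactly the regime where a formula like Heavens et al.’s earns its keep.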
One paper catching my eye in today’s astro-ph listing was that by Ciollaro et al. concerning “Functional regression for quasar spectra”. The challenge, in brief, is to estimate the functional form (wavelength → flux) of a high-redshift quasar spectrum below the Ly-alpha limit, conditional upon its observed functional form above this limit, given a low-redshift sample of quasar spectra (with Ly-alpha forest damping) as reference. The authors’ solution is to apply modern Functional Data Analysis techniques to derive a non-parametric functional regression estimator, which is actually not as exciting as it sounds: essentially it amounts to taking an average of the low-redshift reference spectra, weighted by a kernel applied to a semi-metric distance between the input spectrum and each library spectrum in the wavelength regime above Ly-alpha. Ciollaro et al. do, however, make some interesting contributions to the estimation of confidence intervals on the predicted spectrum (including a use of my favourite bootstrap, the wild bootstrap), such that the end result is like a principled version of the sort of “machine learning” algorithm astronomers might well have stumbled towards by themselves one day.
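The estimator described above is essentially a Nadaraya–Watson average over curves. Below is a minimal sketch, assuming every spectrum is sampled on a common wavelength grid, with a discretised L2 distance as the semi-metric and a Gaussian kernel with bandwidth h; those choices, the function names, and the simple percentile band built from a Rademacher wild bootstrap are all illustrative assumptions, not Ciollaro et al.’s exact procedure (their semi-metric, kernel, bandwidth selection and interval construction may well differ).

```python
import math
import random


def semi_metric(f, g):
    """Discretised L2 distance between two curves on a common grid (assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))


def kernel_predict(query_red, ref_red, ref_blue, h):
    """Kernel-weighted average of the reference blue-side spectra, with weights
    from a Gaussian kernel on semi_metric(query, reference) / h. Distances are
    shifted by the smallest one so the nearest reference gets weight 1 and the
    weight sum cannot underflow to zero (the shift cancels on normalisation)."""
    d = [semi_metric(query_red, r) for r in ref_red]
    d_min = min(d)
    w = [math.exp(-0.5 * (di ** 2 - d_min ** 2) / h ** 2) for di in d]
    total = sum(w)
    return [sum(wi * b[k] for wi, b in zip(w, ref_blue)) / total
            for k in range(len(ref_blue[0]))]


def wild_bootstrap_band(query_red, ref_red, ref_blue, h, n_boot=200, alpha=0.1, seed=0):
    """Pointwise percentile band for the prediction via a wild bootstrap:
    each reference curve's residual is multiplied by a random sign (one
    Rademacher draw per curve) and the estimator is recomputed."""
    rng = random.Random(seed)
    fitted = [kernel_predict(r, ref_red, ref_blue, h) for r in ref_red]
    resid = [[b - f for b, f in zip(bl, fi)] for bl, fi in zip(ref_blue, fitted)]
    draws = []
    for _ in range(n_boot):
        signs = [rng.choice((-1.0, 1.0)) for _ in ref_red]
        boot_blue = [[f + v * e for f, e in zip(fi, ri)]
                     for fi, ri, v in zip(fitted, resid, signs)]
        draws.append(kernel_predict(query_red, ref_red, boot_blue, h))
    lo_idx = int(math.floor(alpha / 2 * (n_boot - 1)))
    hi_idx = int(math.ceil((1 - alpha / 2) * (n_boot - 1)))
    lower, upper = [], []
    for k in range(len(ref_blue[0])):
        col = sorted(draw[k] for draw in draws)
        lower.append(col[lo_idx])
        upper.append(col[hi_idx])
    return lower, upper
```

Usage, with toy two-pixel “red” curves and one-pixel “blue” curves: `kernel_predict([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[10.0], [20.0]], 0.3)` returns `[15.0]`, since the query is equidistant from both references. As h shrinks the prediction collapses onto the nearest reference spectrum, and as h grows it tends to the plain library average, which is why bandwidth choice matters so much in practice.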
Finally, it’s worth noting a couple of recent papers doing hierarchical Bayesian inference in astronomy: Sanders et al. and Ramos et al. This is a basic tool of contemporary Bayesian inference (cf. Gelman et al., “Bayesian Data Analysis”), one that should be taught in every “Introduction to Bayesian Astrostatistics” course, so it’s always encouraging to see this approach in action. I noticed that Sanders et al. use STAN: for those who haven’t discovered this neat package for building fast (i.e., compiled C++) MCMC-based inference codes, it’s well worth looking up. In fact this was only the second use of STAN in astronomy that I know of, but I’ve forgotten the first, which was in a paper only a couple of weeks ago(?).
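For anyone who hasn’t yet met hierarchical models, the canonical textbook example is the normal–normal model (the “eight schools” setting in Gelman et al.): each object’s measurement y_j has a known error s_j, the true values theta_j are drawn from a population N(mu, tau^2), and the population parameters are inferred jointly with the theta_j. Below is a minimal Gibbs-sampler sketch in plain Python; the flat prior on mu and the InvGamma(a, b) prior on tau^2 are assumptions chosen for conjugacy, and this is not the model used by Sanders et al. or Ramos et al. STAN itself would sample such a model with the No-U-Turn variant of Hamiltonian Monte Carlo rather than Gibbs updates.

```python
import math
import random


def gibbs_normal_hierarchy(y, s, n_iter=2000, burn=500, a=1.0, b=1.0, seed=1):
    """Gibbs sampler for the normal-normal hierarchical model
        y_j     ~ N(theta_j, s_j^2)    (s_j known measurement errors)
        theta_j ~ N(mu, tau^2)
        p(mu) flat, tau^2 ~ InvGamma(a, b)    (ASSUMED hyperpriors)
    Returns the post-burn-in draws of theta, one list per iteration."""
    rng = random.Random(seed)
    J = len(y)
    mu, tau2 = sum(y) / J, 1.0
    theta_draws = []
    for it in range(n_iter):
        # theta_j | mu, tau2, y: precision-weighted compromise between y_j and mu
        theta = []
        for yj, sj in zip(y, s):
            prec = 1.0 / sj ** 2 + 1.0 / tau2
            mean = (yj / sj ** 2 + mu / tau2) / prec
            theta.append(rng.gauss(mean, math.sqrt(1.0 / prec)))
        # mu | theta, tau2 (flat prior): Normal centred on the mean of the theta_j
        mu = rng.gauss(sum(theta) / J, math.sqrt(tau2 / J))
        # tau2 | theta, mu: conjugate inverse-gamma update, drawn as 1 / Gamma
        # (random.gammavariate takes shape and *scale*, hence the 1 / rate)
        ss = sum((t - mu) ** 2 for t in theta)
        tau2 = 1.0 / rng.gammavariate(a + J / 2.0, 1.0 / (b + ss / 2.0))
        if it >= burn:
            theta_draws.append(theta)
    return theta_draws
```

Running it on `y = [10, 0, 0, 0, 0]` with unit errors shows the characteristic “partial pooling”: the posterior mean of the outlying theta_0 is pulled somewhat below its raw measurement of 10, towards the population mean, by an amount the data themselves determine through tau^2.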