Fisher matrices, functional regression, and stan …

Last week’s astro-ph mailout saw an interesting contribution by Heavens et al., who present a formula for computing the Fisher information of an experiment to estimate what appears to be the slope and intercept of a linear regression model with (possibly correlated) Normal errors in both the X and Y variables.  The main contribution is some impressive matrix algebra giving an analytic form for the likelihood function under a uniform prior on the distribution of the (latent) design variables.  This should, of course, sound odd, since it implies an improper experimental design; the authors note that their formula can also hold when the errors in the X variables are much smaller than the scale of the design distribution, but in that case I would imagine one would do just as well in estimating the Fisher information by simply ignoring the X errors.

What really caught my eye, however, was that they submitted the paper to JRSS(B)!  For those who don’t know, this is one of the *top* statistics journals, with a brutally low acceptance rate: under the old ranking system it was an A* journal, i.e., a lot of my colleagues in statistics would happily sell their souls at the crossroads on the edge of town to get published in JRSS(B).  Certainly it would be unusual to see a JRSS(B) publication referring readers to some astronomical papers for “pedagogical discussions of straight-line fitting and Bayesian approaches to fitting”, as the Heavens et al. manuscript does! 🙂  [Without being too bitchy, I think the authors (and other astronomers with similar papers) might look at some of the journals on the B list: e.g., The American Statistician often publishes this sort of thing (and its papers can be quite widely cited, especially if they’re picked up by biologists or social scientists in addition to the target audience).]
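
[As a point of reference, the “ignore the X errors” baseline is just the textbook Fisher matrix for a straight-line fit with Gaussian errors on Y alone; a minimal numpy sketch of that calculation (my own illustration, not the Heavens et al. formula):]

```python
import numpy as np

def fisher_straight_line(x, sigma_y):
    """Fisher matrix for (intercept, slope) of y = a + b*x, with
    independent Gaussian errors sigma_y on y and the x treated as exact."""
    x = np.asarray(x, dtype=float)
    w = 1.0 / np.asarray(sigma_y, dtype=float) ** 2  # inverse variances
    # F_ij = sum_k w_k * d(mu_k)/d(theta_i) * d(mu_k)/d(theta_j),
    # with mu_k = a + b * x_k and theta = (a, b)
    return np.array([[np.sum(w),     np.sum(w * x)],
                     [np.sum(w * x), np.sum(w * x ** 2)]])

# Example: 20 design points with homoscedastic errors of 0.5
x = np.linspace(0.0, 10.0, 20)
F = fisher_straight_line(x, 0.5 * np.ones_like(x))
print(np.linalg.inv(F))  # Cramer-Rao bound on the (a, b) covariance
```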

One paper catching my eye in today’s astro-ph listing was that by Ciollaro et al. concerning “Functional regression for quasar spectra”.  The challenge, in brief, is to estimate the functional form (wavelength->flux) of a high redshift quasar spectrum below the Ly-alpha limit conditional upon its observed functional form above this limit, given a low redshift sample of quasar spectra (with Ly-alpha forest damping) as reference.  The authors’ solution is to apply modern Functional Data Analysis techniques to derive a non-parametric functional regression estimator, which is actually not as exciting as it sounds: essentially it amounts to taking an average of the low redshift reference spectra, weighted by a kernel of the semi-metric distance between the input spectrum and each library spectrum in the above-Ly-alpha wavelength regime.  Ciollaro et al. do, however, make some interesting contributions to the estimation of confidence intervals on the predicted spectrum (including a use of my favourite bootstrap: the wild bootstrap), such that the end result is like a principled version of the sort of “machine learning” algorithm astronomers might well have stumbled towards by themselves one day.
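
[To make the estimator concrete: a minimal numpy sketch of such a kernel-weighted average, where the shared wavelength grid, L2 semi-metric, and one-sided quadratic kernel are illustrative placeholders rather than Ciollaro et al.’s exact choices:]

```python
import numpy as np

def predict_blue_side(target_red, library_red, library_blue, h):
    """Predict the blue-side (below Ly-alpha) spectrum of a new quasar as a
    kernel-weighted average of the library blue sides, with weights set by
    the red-side (above Ly-alpha) distances to the library spectra.

    target_red   : (n_red,) red-side fluxes of the new quasar
    library_red  : (N, n_red) red-side fluxes of the reference sample
    library_blue : (N, n_blue) blue-side fluxes of the reference sample
    h            : kernel bandwidth (wide enough to catch some neighbours)
    """
    # L2 semi-metric on the shared red-side wavelength grid
    d = np.sqrt(np.sum((library_red - target_red) ** 2, axis=1))
    # One-sided quadratic kernel, as is typical in the FDA literature
    u = d / h
    w = np.where(u < 1.0, 1.0 - u ** 2, 0.0)
    return (w / w.sum()) @ library_blue
```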

Finally it’s worth noting a couple of recent papers doing hierarchical Bayesian inference in astronomy: Sanders et al. and Ramos et al.  This is a basic tool of contemporary Bayesian inference (cf. Gelman et al.’s “Bayesian Data Analysis”)—one that should be taught in every “Introduction to Bayesian Astrostatistics” course—so it’s always encouraging to see this approach in action.  I noticed that Sanders et al. use Stan: for those who haven’t discovered this neat package for building fast (i.e., C++ compiled) MCMC-based inference codes, it’s well worth looking up.  In fact, this was the second use of Stan in astronomy to my knowledge, though I’ve forgotten the first, which appeared in a paper only a couple of weeks ago(?).
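
[For the uninitiated, a minimal sketch of the partial-pooling structure such hierarchical analyses share, written here as a Stan model driven from Python via PyStan; the toy model, priors, and fake data are my own, not those of Sanders et al.:]

```python
import numpy as np
import pystan  # assumes the PyStan interface; RStan/CmdStan work similarly

model_code = """
data {
  int<lower=1> N;            // number of objects
  vector[N] y;               // one noisy measurement per object
  vector<lower=0>[N] sigma;  // known measurement errors
}
parameters {
  real mu;                   // population mean
  real<lower=0> tau;         // population scatter
  vector[N] theta;           // latent true value for each object
}
model {
  mu ~ normal(0, 10);        // weak hyperpriors
  tau ~ cauchy(0, 5);        // half-Cauchy, via the lower=0 constraint
  theta ~ normal(mu, tau);   // population level
  y ~ normal(theta, sigma);  // measurement level
}
"""

# Fake data: 8 objects drawn from a common population
rng = np.random.RandomState(42)
truth = rng.normal(1.0, 0.5, size=8)
data = dict(N=8, y=truth + rng.normal(0, 0.3, size=8), sigma=0.3 * np.ones(8))

sm = pystan.StanModel(model_code=model_code)  # compiles the model to C++
fit = sm.sampling(data=data, iter=2000, chains=4)
print(fit)
```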

12 Responses to Fisher matrices, functional regression, and stan …

  1. lukebarnes says:

    I was doing that kind of Lyman alpha forest thing at ETH. The literature is a bit of a mess …
    * power-law extrapolation
    * Find the mean, reject everything 2 sigma below the mean, find the mean again, and repeat until nothing is rejected.
    * Principal component analysis, somehow.

    I messed around with a few methods. Is the Ciollaro et al. method reasonable? Is there anything better in the statistics toolbox?

    • Yeah, I thought you might have encountered this problem at some point in the past with your Lyman alpha absorbers. The Ferraty & Vieu-type non-parametric estimator that Ciollaro et al. use seems like quite a reasonable choice to me, but there’s a whole world of Functional Data Analysis (e.g. the monograph by Ramsay & Silverman) that I don’t know much about.

      Coincidentally I’ve been reading a paper by Delaigle & Hall on “densities” for probabilities over function spaces, which discusses the idea of posterior “mode”-based estimators and their connection to PCA representations. Long story short, I suspect that most of these FDA ideas have analogous PCA-based versions that do much the same thing.
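
      For concreteness, the empirical PCA representation of a spectral library is nothing more than an SVD of the centred data matrix; a quick numpy sketch (assuming the spectra share a common wavelength grid):

      ```python
      import numpy as np

      def functional_pca(spectra, n_components=5):
          """Empirical functional PCA of a spectral library: rows of
          `spectra` are spectra, columns are shared wavelength points."""
          mean = spectra.mean(axis=0)
          U, s, Vt = np.linalg.svd(spectra - mean, full_matrices=False)
          scores = U[:, :n_components] * s[:n_components]  # coefficients
          components = Vt[:n_components]                   # eigenfunctions
          # Each spectrum is approximated by the truncated expansion
          #   mean + scores @ components
          return mean, components, scores
      ```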

  2. lukebarnes says:

    Also, a suggestion: collect your “Six key trends in contemporary statistics” posts into one summary post, with the following format:

    Before you read these posts, you should know the material in this textbook … (Gelman?)

    1. Bayesian non-parametrics / semi-parametrics. (Link)
    The Dirichlet process offers a highly flexible prior over the space of probability distributions, and can serve as a convenient basis for non-parametric modelling of unknown error forms.
    Start here: http://www.gatsby.ucl.ac.uk/~ywteh/research/npbayes/dp.pdf

    2. …

    Once the posts leave the front page of your blog, and particularly if people arrive at the first post via a link, it’s hard to find the others and to read them in the right order.

    • Thanks for the suggestions: I am planning to update the blog soon(ish) with, e.g., a top menu bar that includes a link directly to the six key trends, and also to some other overview-type posts I have in the pipeline.

  3. Brendon’s list of overrated things:

    * Stan
    * Affine invariant MCMC

    • Really? Have you tried it (Stan) out on a non-trivial problem and found it falling short of expectations? I’ve only played with simple modifications to its Gaussian Process examples but it seems very quick for GPs up to 500 data points (=500 latent variables).

      • I have only tried Stan properly twice. On one problem it was outperformed by JAGS; however, this was fixable by changing the coordinate system, after which the two seemed roughly similar. On another problem I was getting well-mixed results about 10x faster in DNest (although with about 10x the coding effort).

        Your mileage may vary.

      • Interesting. I would expect that Stan’s Hamiltonian MCMC might really come to the fore for high (say >50) dimensional problems, but indeed, mileage may vary!

  4. aasensio says:

    I hate it when you have a probabilistic model more or less clear in your mind and then spend days or weeks working on the (technical issue of) sampling to make inferences. It is nice that some samplers are freely available (affine invariant in emcee, HMC in Stan, etc.) but my experience is similar to Brendon’s. Especially when you end up having thousands of variables, it is better to code HMC from scratch and tweak it to make it work on your problem.

    • Thousands of variables … hypothetically, or do you have a particular problem in mind?! Does the prior feature a GP, i.e., could this be a job for INLA?

      • aasensio says:

        I can think of a few problems in Solar Physics in which I need this many parameters: many pixels observed, each one with different physical conditions, and all of them coming from a common distribution. I’ve never considered INLA as a possibility but I will read some of the literature.
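
    [Apropos of coding HMC from scratch, as aasensio suggests above: the core update really is only a few dozen lines, as in the bare-bones numpy sketch below; the fixed step size, trajectory length, and standard-normal target are placeholders of mine, and tuning the first two is exactly where the hand-tweaking effort goes.]

    ```python
    import numpy as np

    def hmc_step(log_prob, grad_log_prob, q, eps, n_leapfrog, rng):
        """One HMC update with a unit mass matrix: resample momenta,
        integrate Hamilton's equations by leapfrog, then accept/reject."""
        p = rng.standard_normal(q.shape)
        q_new, p_new = q.copy(), p.copy()
        p_new += 0.5 * eps * grad_log_prob(q_new)  # half step in momentum
        for _ in range(n_leapfrog - 1):
            q_new += eps * p_new                   # alternating full steps
            p_new += eps * grad_log_prob(q_new)
        q_new += eps * p_new
        p_new += 0.5 * eps * grad_log_prob(q_new)  # final half step
        h_old = -log_prob(q) + 0.5 * (p @ p)       # total energy before ...
        h_new = -log_prob(q_new) + 0.5 * (p_new @ p_new)  # ... and after
        return q_new if rng.random() < np.exp(h_old - h_new) else q

    # Example: a standard normal target in 1000 dimensions
    rng = np.random.default_rng(0)
    q = rng.standard_normal(1000)
    for _ in range(100):
        q = hmc_step(lambda x: -0.5 * (x @ x), lambda x: -x, q, 0.1, 20, rng)
    ```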

  5. Alan Heavens says:

    Thanks for your appreciation of the substance of the paper, Ewan. In fact the paper shows that the Fisher matrix does depend on the x errors even in the limit where the design distribution is broad, and that the standard Fisher calculation may be significantly in error. The revised version of the paper (linked as above) also shows that ‘x’ can be any quantity on which the model depends, so the results are rather general.
