A sociological observation segueing into a note on Drton & Plummer.
I noticed on today’s astro-ph a paper on ‘extreme deconvolution’, which is an astro-statistics term for fitting a Normal mixture model to noisy data; I’m not sure if the technique is extreme per se or if it needs to be applied to a large dataset (as in a pioneering example) to properly garner the complete appellation. In this instance the application envisaged (and demonstrated with a small example) is the modelling of the multivariate distribution of supernovae in the SALT2 dataset, which I believe to be on the order of 800 objects.
So, the sociological observation I have is that the new generation of astronomers (let’s call them Gen Y, taking my 1981 birth year as the left-closed, right-open boundary of Gen X) seem to view their wrapper scripts for running functions as a worthwhile contribution to the published literature. That is, the paper at hand doesn’t present a new algorithm or methodology for performing extreme deconvolution, nor does it add any novel contribution to thinking about extreme deconvolution methodologies in astronomy; rather, it simply describes the authors’ wrapper script for running either of two existing extreme deconvolution codes and computing conditional densities from their output. The latter is simply an application of the well-known rules for manipulating multivariate Normals, which are not in any sense difficult to implement, and certainly not to the extent that I could imagine anyone seeking a third-party application rather than just looking them up on Wikipedia. But I guess if editors are happy to publish it then c’est la vie.
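To illustrate just how routine those rules are, here is a minimal numpy sketch of conditioning a multivariate Normal on a subset of its coordinates, and of conditioning a Gaussian mixture by applying the same formula component-wise and reweighting each component by its marginal density at the observed values. The function names and interface are my own invention, not anything from the paper in question.

```python
import numpy as np

def conditional_gaussian(mu, cov, obs_idx, x_obs):
    """Condition N(mu, cov) on coordinates obs_idx taking values x_obs.

    Standard result: x_f | x_o = a  ~  N(mu_f + S_fo S_oo^{-1} (a - mu_o),
                                         S_ff - S_fo S_oo^{-1} S_of).
    """
    obs_idx = list(obs_idx)
    free_idx = [i for i in range(len(mu)) if i not in obs_idx]
    mu_f, mu_o = mu[free_idx], mu[obs_idx]
    S_ff = cov[np.ix_(free_idx, free_idx)]
    S_fo = cov[np.ix_(free_idx, obs_idx)]
    S_oo = cov[np.ix_(obs_idx, obs_idx)]
    mu_cond = mu_f + S_fo @ np.linalg.solve(S_oo, x_obs - mu_o)
    cov_cond = S_ff - S_fo @ np.linalg.solve(S_oo, S_fo.T)
    return mu_cond, cov_cond

def _gauss_density(x, mu, cov):
    """Multivariate Normal density (used to reweight mixture components)."""
    diff = np.asarray(x) - mu
    quad = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2 * np.pi) ** len(mu) * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def conditional_mixture(weights, mus, covs, obs_idx, x_obs):
    """Condition a Gaussian mixture: condition each component and
    reweight by its marginal density at the observed coordinates."""
    obs_idx = list(obs_idx)
    new_w = np.array([
        w * _gauss_density(x_obs, mu[obs_idx],
                           cov[np.ix_(obs_idx, obs_idx)])
        for w, mu, cov in zip(weights, mus, covs)
    ])
    comps = [conditional_gaussian(mu, cov, obs_idx, x_obs)
             for mu, cov in zip(mus, covs)]
    return new_w / new_w.sum(), comps
```

That really is the whole trick: one Schur complement per component plus a renormalisation of the mixture weights.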
Another thing that surprised me about this paper was that, while two methods for selecting the number of mixture components are presented (the BIC and a cross-validation-based mean log-likelihood method), the motivations for each are barely discussed and nothing is said of the (order 1) nature of these approximations, especially so in the extreme deconvolution case, where the observational uncertainties mean that n is not quite the n envisaged by Schwarz. Moreover, it’s observed that on the toy example the BIC points towards 5 components and the cross-validation method towards something greater than 10, but that one should probably use the BIC because it’s too expensive to add lots of components. So, uh, yeah.
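For readers wanting to see the two criteria side by side, here is a generic scikit-learn sketch on made-up 1-D data, ignoring observational noise entirely (so this is an ordinary mixture fit, not actual extreme deconvolution, and bears no relation to the paper’s toy example or the SALT2 set):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

# Hypothetical toy data: two well-separated 1-D clusters.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-5.0, 1.0, 200),
                    rng.normal(5.0, 1.0, 200)]).reshape(-1, 1)

ks = range(1, 6)

# Criterion 1: the BIC (in-sample fit penalised by d log n over the
# number of free parameters d); smaller is better.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in ks}
k_bic = min(bics, key=bics.get)

# Criterion 2: cross-validated mean held-out log-likelihood; larger is better.
def cv_mean_loglik(k, X, n_folds=5):
    scores = []
    for train, test in KFold(n_folds, shuffle=True, random_state=0).split(X):
        gm = GaussianMixture(n_components=k, random_state=0).fit(X[train])
        scores.append(gm.score(X[test]))  # mean log-likelihood per point
    return float(np.mean(scores))

cvs = {k: cv_mean_loglik(k, X) for k in ks}
k_cv = max(cvs, key=cvs.get)
```

Neither criterion, of course, addresses the point above: both are derived under regularity (and noise-free) assumptions that the mixture-plus-measurement-error setting does not satisfy.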
Having got that off my chest, I thought it worth pointing out the existence of the recently ‘read’ (at the RSS) Drton & Plummer paper on the problem of model selection for nested singular models, of which Gaussian mixtures (sans observational noise) are one of the given examples. In particular, the authors offer a strategy for applying Watanabe’s method to this problem that side-steps the self-defeating requirement of knowing in advance which is the true model, but does not side-step the requirement of being able to identify the ‘learning rate’ of each model.
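For context, the result at stake is Watanabe’s generalisation of the Schwarz expansion. For a (possibly singular) model M the log marginal likelihood behaves asymptotically as

```latex
\log p(X^n \mid M) = \log L(\hat{\theta}) - \lambda \log n + (m - 1)\log\log n + O_p(1),
```

where \(\lambda\) is the model’s learning coefficient (its real log canonical threshold) and \(m\) its multiplicity; for regular models \(\lambda = d/2\) and \(m = 1\), recovering the ordinary BIC penalty. The catch, as above, is that for singular models like Gaussian mixtures \(\lambda\) depends on the true data-generating model, which is precisely what one is trying to select.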