Classification with Bayes factors; also regularization

Sifting through the backlog of astro-ph papers I’d flagged for reading after the Christmas holidays, I’ve already found two interesting studies.  The first is “Breaking the curve with CANDELS …” by Salmon et al., in which marginal likelihood ratios, computed by fitting model SED + dust extinction templates to filter photometry of high-z galaxies, are used to assign posterior probabilities to the type of dust law acting in a given galaxy (starburst vs SMC-like).  Almost every astronomer who’s worked on observational studies of galaxy evolution will have gone through a similar template-fitting exercise in the past (e.g. in my PhD we did this with a simple maximum-likelihood approach), so it’s a familiar and important problem in the field.  At a threshold of 10 in Bayes factor (plot below is $2 \times \log \mathrm{BF}$) the authors find ‘strong’ evidence for 4 starburst templates and 17 SMC templates, with 123(?) galaxies effectively unclassified in their spectroscopic sample.
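To make the threshold concrete, here is a minimal sketch of how a Bayes factor maps to a posterior model probability and a three-way label.  The function names, the equal prior odds, the reading of $\log$ as the natural log, and the sign convention (positive values favouring starburst) are all my assumptions for illustration, not details taken from Salmon et al.

```python
import math

def posterior_prob_from_bf(bf, prior_prob=0.5):
    """Posterior probability of model 1 given BF = p(D|M1) / p(D|M2).

    prior_prob is the prior probability of model 1 (assumed 0.5 here).
    """
    prior_odds = prior_prob / (1.0 - prior_prob)
    post_odds = bf * prior_odds
    return post_odds / (1.0 + post_odds)

def classify(two_ln_bf, threshold_bf=10.0):
    """Label a galaxy from 2 ln BF, cutting at a Bayes factor of threshold_bf.

    Assumed convention: positive values favour the starburst dust law.
    """
    cut = 2.0 * math.log(threshold_bf)  # BF = 10 corresponds to 2 ln BF ~ 4.6
    if two_ln_bf >= cut:
        return "starburst"
    if two_ln_bf <= -cut:
        return "SMC-like"
    return "unclassified"

# Made-up values of 2 ln BF for three hypothetical galaxies.
for value in (5.0, -5.0, 1.0):
    print(value, classify(value))
```

With equal prior odds, a Bayes factor of 10 corresponds to a posterior probability of 10/11 for the favoured dust law, so the ‘strong’ threshold still leaves roughly a 9% chance of the alternative.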

While these results are encouraging, I wonder whether one could get even more galaxies over the classification threshold by folding all the model fits into a single hierarchical model with MCMC moves over the dust template space (i.e. the logical vector over all 144 dust template types)?  Exploring such a discrete model space requires a clever algorithm for jumping between models (e.g. transdimensional nested sampling), but it would allow the introduction of hyper-parameters to the priors on the template parameters (age, metallicity, E(B-V)) for galaxies of each dust extinction type; at present the priors are fixed, with no prior-sensitivity analysis.  Certainly the above plot suggests a degree of separation of these populations by stellar mass, so likely the same is true for age and metallicity.  In principle the single hierarchical model approach would also allow one to search for (& correct) small systematic ‘biases’ (or ‘model vs reality disagreements’) in each filter, which is something we did in a more ad hoc way in my PhD research.

The second interesting paper I saw was “Dust in a compact, cold, high-velocity cloud …” by Lenz et al., in which the authors adopt a GLM (Generalized Linear Model; see e.g. de Souza et al. 2014) for predicting FIR emission intensity from a set of observed emissivities at longer wavelengths.  This time, though, the authors apply a LASSO penalty as regularizer to limit their exposure to over-fitting (as opposed to, e.g., regularization via priors with Bayes’ theorem).  The implementation seems to be correct (with cross-validation used to find the optimal strength of the L1 coefficient), and the resulting model seems to do well at removing the foreground FIR dust signal from the high-velocity cloud under study.  My only suggestion would be that in such an application it can be worthwhile to try OLS post-LASSO (e.g. Belloni & Chernozhukov 2009) and compare its performance against LASSO-only with cross-validation on an appropriate hold-out sample.
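For anyone wanting to try the comparison, here is a rough sketch of the idea on synthetic data.  It is not the Lenz et al. pipeline: I assume a plain Gaussian linear model with no intercept, use a hand-rolled coordinate-descent LASSO written with NumPy, and all function names, the data, and the grid of L1 strengths are made up for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=100):
    """Minimise (1/2n)||y - Xb||^2 + alpha * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float)  # beta starts at zero, so the residual is y
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]   # drop coordinate j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
            resid -= X[:, j] * beta[j]   # put the updated coordinate back
    return beta

def post_lasso_ols(X, y, beta_lasso):
    """Refit by ordinary least squares on the predictors LASSO kept."""
    support = np.flatnonzero(beta_lasso)
    beta = np.zeros_like(beta_lasso)
    if support.size:
        beta[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return beta

def mse(X, y, beta):
    return float(np.mean((y - X @ beta) ** 2))

def cv_alpha(X, y, alphas, n_folds=5):
    """Pick the L1 strength with the lowest mean K-fold validation MSE."""
    idx = np.arange(len(y))
    folds = np.array_split(idx, n_folds)
    scores = []
    for alpha in alphas:
        errs = []
        for held in folds:
            train = np.setdiff1d(idx, held)
            beta = lasso_cd(X[train], y[train], alpha)
            errs.append(mse(X[held], y[held], beta))
        scores.append(np.mean(errs))
    return alphas[int(np.argmin(scores))]

# Synthetic data: 15 predictors, only 3 truly active (all values made up).
rng = np.random.default_rng(0)
n, p = 150, 15
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# First 100 rows for fitting and cross-validation, last 50 as the hold-out.
X_fit, y_fit = X[:100], y[:100]
X_hold, y_hold = X[100:], y[100:]

alphas = np.logspace(-3, 0, 8)
alpha_best = cv_alpha(X_fit, y_fit, alphas)
beta_lasso = lasso_cd(X_fit, y_fit, alpha_best)
beta_post = post_lasso_ols(X_fit, y_fit, beta_lasso)

print("hold-out MSE, LASSO only:    ", mse(X_hold, y_hold, beta_lasso))
print("hold-out MSE, OLS post-LASSO:", mse(X_hold, y_hold, beta_post))
```

The refit removes the shrinkage that the L1 penalty applies to the retained coefficients, so on the fitting sample OLS post-LASSO can never do worse than LASSO itself; whether it also wins on the hold-out sample is exactly what the comparison is meant to reveal.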