I noticed this neat little paper by Uemura et al. on astro-ph today demonstrating the application of LASSO regression to variable selection in the case of supernova-based distance/luminosity estimation. While a quick Google Scholar search suggests this is not the first application of the LASSO method to an astronomical problem, I’d wager that it’s the first with such a clear methodological explanation. The basic idea behind the LASSO is to introduce an additional constraint to the regression on top of the ordinary least squares objective: a limit on the L1 norm of the vector of regression coefficients (supposing the predictor variables have first been standardised to zero sample mean and unit sample variance). As illustrated neatly by the graphic below from one of the most highly cited stats papers ever (Tibshirani 1996), the choice of L1 norm can yield exact zeros in the constrained best-fit coefficient vector; hence this method is closely related to the various sparsity tools used in astronomical image compression and cosmological signal reconstruction. More generally, the idea of avoiding over-fitting through regularisation with a hard threshold distance penalty is a foundational idea in statistical modelling, and appears in many guises including the maximum entropy method and the calibration of artificial neural networks.
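To see the sparsity in action, here's a minimal sketch (not from the Uemura et al. paper; the data are simulated and the penalty strength is an arbitrary choice) using scikit-learn's `Lasso`, whose `alpha` parameter sets the strength of the L1 penalty. With irrelevant predictors in the mix, the fitted coefficient vector contains exact zeros, just as the Tibshirani graphic suggests:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.normal(size=(n, p))

# True model uses only the first two predictors; the other six are irrelevant.
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Standardise predictors to zero sample mean and unit sample variance first.
X = (X - X.mean(axis=0)) / X.std(axis=0)

lasso = Lasso(alpha=0.2)  # alpha tunes the L1 penalty strength
lasso.fit(X, y)
print(lasso.coef_)  # coefficients of irrelevant predictors shrunk exactly to zero
```

The zeros are exact (not merely small) because the coordinate-descent solver soft-thresholds each coefficient, which is the practical payoff of the L1 geometry over, say, an L2 (ridge) penalty.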
For Bayesians it is worth noting that LASSO-esque types of shrinkage regularisation are possible with priors on the coefficients of a regression (or other) model having a sharp concentration of density around zero, as in the double exponential priors available in JAGS (and see Bhattacharya et al.). Fans of Gaussian processes might also point out that a more sophisticated solution is to go beyond variable selection over the finite model space of linear predictors (remembering that this means linear in the coefficients, not a restriction on the transformations of observables used to produce the set of candidate explanatory variables) using Gaussian process regression, the prior here playing the role of regulariser/shrinker, though not to the level of sparsity produced by the L1 norm in the LASSO. Yet another Bayesian approach is simply to run Monte Carlo sampling over the huge space of models (combinations of included/excluded explanatory variables), which might be too big to ever explore fully, though meaningful results can be extracted from the marginals of a partial sampling; here the marginal likelihood (usually approximated by the BIC for computational speed in such applications) quantifies the quality of each model and plays the regularisation role.
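The model-space idea in that last sentence can be sketched in a few lines. The example below (simulated data, my own choice of BIC formula for Gaussian-error OLS: n·log(RSS/n) + k·log n) enumerates every subset of a small pool of candidate predictors rather than sampling, which is only feasible because p is tiny here; the Monte Carlo version the paragraph describes replaces the exhaustive loop with a sampler over inclusion indicators:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, p = 100, 5
X = rng.normal(size=(n, p))
# True model uses predictors 0 and 2 only.
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

def bic(X_sub, y):
    """BIC of an OLS fit with intercept: n*log(RSS/n) + k*log(n)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X_sub])  # intercept + chosen predictors
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + A.shape[1] * np.log(n)

# Score every combination of included/excluded predictors; lower BIC is better.
best = min(
    (subset for r in range(1, p + 1) for subset in combinations(range(p), r)),
    key=lambda s: bic(X[:, list(s)], y),
)
print(best)  # the penalty term discourages spurious extra predictors
```

The k·log n term is what does the regularising: a candidate predictor only survives if it improves the fit by more than its share of the penalty, which is the same over-fitting guard the LASSO achieves through its L1 constraint.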