I noticed on astro-ph last week a new paper by Warren et al. promoting a method of “Sparse Bayesian Inference” for probing the temperature structure of the solar corona. In this case (as in many Bayesian problems) the model-prior-likelihood triple plays the role of “regulariser”, bringing an ill-posed inversion problem back into the realm of identifiability. In particular, the aim here is to infer the form of the Differential Emission Measure (DEM)—a continuous function $\xi(T)$ from temperature to emissivity—probed via emission intensity observations for a given line list, which serve as evaluations of known test functions, $K_i(T)$, integrated against the unknown DEM as $F_i = \int K_i(T)\,\xi(T)\,dT$. The DEM in this example is proposed to be modelled as the sum of a series of evenly spaced spline basis functions. The sparsity comes into play where the authors introduce a prior specification targeting solutions in which the coefficients on most of these spline bases are close to zero, implemented via a Bayesian LASSO (i.e., an independent Cauchy* prior on each coefficient). Although I have previously advocated the use of such approaches for ordinary regression problems, I do not think this is a good approach for the present application.
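To make the set-up concrete, here is a minimal sketch of the discretised forward model in numpy. Everything here is a toy stand-in for illustration only: the triangular (linear spline) basis, the Gaussian temperature-response kernels, the grid limits, and the single-feature “true” DEM are my own assumptions, not the authors' line list or atomic data.

```python
import numpy as np

# log10 temperature grid over a plausible coronal range (assumed, not the paper's)
logT = np.linspace(5.5, 7.5, 200)

# DEM modelled as a sum of evenly spaced basis functions with coefficients c_j;
# triangular (linear spline) bumps stand in for the paper's spline basis.
n_basis = 20
centres = np.linspace(logT[0], logT[-1], n_basis)
width = centres[1] - centres[0]
B = np.maximum(0.0, 1.0 - np.abs(logT[:, None] - centres[None, :]) / width)

# Hypothetical emission kernels K_i(T): Gaussian response functions standing
# in for the real contribution functions of each spectral line.
n_lines = 8
peaks = np.linspace(5.8, 7.2, n_lines)
K = np.exp(-0.5 * ((logT[:, None] - peaks[None, :]) / 0.15) ** 2)

def intensities(c):
    """F_i = int K_i(T) xi(T) dT with xi(T) = sum_j c_j B_j(T), via the trapezoid-free
    rectangle rule on the uniform grid."""
    xi = B @ c                      # the DEM on the temperature grid
    dT = logT[1] - logT[0]
    return K.T @ xi * dT            # one intensity per line in the list

# A sparse 'truth': a single narrow DEM feature carried by one basis coefficient.
c_true = np.zeros(n_basis)
c_true[6] = 1.0
F = intensities(c_true)
```

The inversion problem is then: given the noisy $F_i$ and known $K_i$, recover the coefficients $c_j$ — ill-posed because many coefficient vectors reproduce the handful of observed intensities almost equally well.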
The red flag for me here is that the posterior fits resulting from this method don’t look particularly sensible: the sum of a small number of spline basis functions selected from a much larger set of candidates tends to give DEM models that look like a bad Halloween costume.
The introduction to the paper suggests that real DEM features are expected to be intrinsically narrow in range under one theoretical paradigm, so we might be prepared to motivate a prior favouring compression of ‘mass’ in the fitted DEM within small bands of temperature as opposed to diffused across the entire temperature range, but I still suspect there remains a better choice than sparsity-penalised splines. One problem with the splines is that the width of the chosen spline basis functions is (without further theoretical guidance) quite arbitrary and will lead to drastically different solutions—an effect amplified by the sparsity condition. Perhaps this issue could be alleviated by introducing a further regularisation in the form of a smoothness (mean absolute second derivative) penalty, but how to choose the relative weights? Well, the natural step would be to draw DEMs from a candidate prior and tweak its structure and hyper-parameters until the prior is generating plausible DEMs before it sees any data. And I think this is my point: that sparsity in itself lacks purpose unless its effect is to render a prior that generates more plausible DEMs than any feasible alternative. On this point I remain to be convinced.
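That prior predictive check is easy to sketch. The snippet below draws coefficient vectors from a candidate sparsity prior (a Laplace prior here, per the usual Bayesian LASSO; the paper uses Cauchy) over a toy evenly spaced triangular basis, and summarises the DEMs the prior generates before it sees any data — the basis, grid, and scale values are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy basis set-up: evenly spaced triangular (linear spline) bumps on a
# log-temperature grid; a stand-in for the paper's spline basis.
logT = np.linspace(5.5, 7.5, 200)
n_basis = 20
centres = np.linspace(logT[0], logT[-1], n_basis)
width = centres[1] - centres[0]
B = np.maximum(0.0, 1.0 - np.abs(logT[:, None] - centres[None, :]) / width)

def draw_prior_dems(scale, n_draws=500):
    """DEM draws xi(T) = sum_j c_j B_j(T) with c_j ~ Laplace(0, scale).
    Note: raw Laplace draws can go negative; a serious check would also
    enforce non-negativity of the DEM."""
    c = rng.laplace(0.0, scale, size=(n_draws, n_basis))
    return c @ B.T                  # shape (n_draws, len(logT))

# Tweak the hyper-parameter and eyeball what the prior considers plausible,
# e.g. via the typical within-draw variability of the implied DEMs.
for scale in (0.1, 1.0, 10.0):
    dems = draw_prior_dems(scale)
    spread = np.mean(np.std(dems, axis=1))
    print(f"scale={scale}: mean within-draw std = {spread:.2f}")
```

In practice one would plot a fan of these draws against DEMs considered physically reasonable and iterate on the prior's structure and hyper-parameters until the two agree — which is exactly the test that, to my mind, a sparsity prior on spline coefficients is unlikely to pass.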
* Technically, the Bayesian LASSO is usually taken to use the double-exponential (or Laplace) distribution, but the Cauchy has a similar effect!