A ubiquitous saying in boxing analysis is “styles make fights”: to predict what a match-up will look like you need to think about how the characteristic styles of the two opponents might work (or not) with respect to each other. Two strong counter-punchers might find themselves circling awkwardly for twelve rounds, neither willing to come forward and press the action, while a match-up between two aggressive pressure fighters might turn on the question of whether their styles work only while moving forwards. As an analyst of Bayesian statistics my equivalent maxim is “priors make models”. Well-chosen priors can sensibly regularise the predictions of a highly flexible model, achieve powerful shrinkage across a hierarchical structure, or push a model towards better frequentist coverage behaviour. For that reason I don’t understand why cosmologists are so keen on ‘uninformative’ priors: it’s like throwing away the best part of Bayesian modelling.
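To make the shrinkage point concrete, here is a toy Gaussian hierarchy (all numbers invented for illustration) in which partial pooling under the hierarchical prior beats the flat-prior estimate of each group effect in root-mean-squared error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hierarchy: group effects theta_j ~ N(mu, tau^2),
# one noisy observation per group, y_j ~ N(theta_j, sigma^2).
mu, tau, sigma, J = 0.0, 1.0, 2.0, 50
theta = rng.normal(mu, tau, size=J)  # true group effects
y = rng.normal(theta, sigma)         # noisy measurements

# Flat ('uninformative') prior on each theta_j: the posterior mean
# is just the raw measurement.
theta_flat = y

# Partial pooling under the (here, known) hierarchical prior:
# the posterior mean shrinks each y_j towards mu by tau^2/(tau^2+sigma^2).
w = tau**2 / (tau**2 + sigma**2)
theta_shrunk = mu + w * (y - mu)

print("RMSE, flat prior:", np.sqrt(np.mean((theta_flat - theta) ** 2)))
print("RMSE, shrinkage: ", np.sqrt(np.mean((theta_shrunk - theta) ** 2)))
```

The shrinkage estimator wins comfortably here because the prior is well matched to the data-generating process; the point of the maxim is that choosing it well is where the modelling actually happens.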
Anyway, two papers from the arXiv last week caught my eye. The first proposes a statistic for ‘quantifying tension between correlated datasets with wide uninformative priors’. So, aside from its focus on a type of prior (wide, uninformative) that I don’t care for, I’m also puzzled by the obsession of cosmologists with searching for ‘tension’ in the posteriors of models with shared parameters fitted to different datasets (or to different aspects of the same dataset), as an indicator of either systematic errors or new physics. As this paper makes clear, there is a huge variety of techniques proposed for this purpose, yet all of them come from the cosmology literature: how is it that no other field of applied statistics has got itself twisted up in this same problem? An example use of the statistic is given in which a model with shared parameters is fitted to four redshift slices from a survey, and the decision whether or not to combine the posteriors is made according to how much the separately fitted posteriors overlap.
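For readers unfamiliar with the genre: the simplest member of this family of tension measures (not the statistic proposed in the paper, just the familiar Gaussian ‘parameter difference’ version) treats the two posteriors as independent Gaussians and asks how far the difference of means sits from zero. A minimal sketch, with made-up numbers standing in for, say, two redshift slices:

```python
import numpy as np
from scipy import stats

def gaussian_tension(mean1, cov1, mean2, cov2):
    """Number-of-sigma tension between two independent, approximately
    Gaussian posteriors over shared parameters, via the distribution
    of the parameter difference: d ~ N(mean1 - mean2, cov1 + cov2)."""
    d = np.asarray(mean1) - np.asarray(mean2)
    cov = np.asarray(cov1) + np.asarray(cov2)
    chi2 = d @ np.linalg.solve(cov, d)      # squared Mahalanobis distance
    p = stats.chi2.sf(chi2, df=len(d))      # prob. of a larger difference
    return stats.norm.isf(p / 2)            # two-sided 'n sigma' equivalent

# Two hypothetical posteriors over a pair of shared parameters:
n_sigma = gaussian_tension([0.30, 0.81], np.diag([0.01**2, 0.02**2]),
                           [0.32, 0.75], np.diag([0.02**2, 0.03**2]))
print(f"{n_sigma:.1f} sigma tension")
```

The difficulties the cosmology literature wrestles with begin exactly where this sketch ends: correlated datasets, non-Gaussian posteriors, and the dependence of the answer on those wide priors.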
The other paper I read this week concerns the coupling of a variational autoencoder, serving as a generative prior for galaxy images, with a physics-based gravitational lensing model. The proposal for this type of model, and the authors’ advocacy for modern auto-diff packages like PyTorch, makes a lot of sense. However, it seems that a lot of work remains to be done on both the prior over galaxy images and the posterior inference technique, because the two examples shown suggest a serious under-coverage problem in the moderate and high signal-to-noise regimes. Also, the cost of recovering even a small number of HMC samples is very high here (many hours); I don’t think that HMC is a viable posterior approximation scheme for this type of model. And why bother when the coverage is so bad? Most likely a better option will be some kind of variational approximation, which will be quicker to fit and may improve coverage partly by accident and partly by design through its approximate nature; i.e., by deliberately slowing the learning rate. That may sound crazy to some, but remember that the variational autoencoder here is itself trained via stochastic gradient descent with a predictive-accuracy-based stopping rule, which is just another way of slowing the learning rate or artificially regularising a model.
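The kind of coverage check at issue is easy to illustrate in a toy, conjugate Normal model (all settings invented): draw truths from the prior, simulate data, and count how often the nominal 68% credible interval contains the truth. A well-specified model covers at the nominal rate; an over-confident prior, like a too-tight generative model of galaxy images, under-covers badly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Truths drawn from a N(0, tau^2) prior, one observation each with noise sigma.
n_trials, tau, sigma = 2000, 1.0, 1.0
theta = rng.normal(0.0, tau, size=n_trials)
y = rng.normal(theta, sigma)

def coverage(prior_sd):
    """Empirical coverage of the central 68% credible interval computed
    under a N(0, prior_sd^2) prior (matched or mis-specified)."""
    w = prior_sd**2 / (prior_sd**2 + sigma**2)
    post_mean = w * y
    post_sd = np.sqrt(w * sigma**2)          # exact conjugate posterior sd
    inside = np.abs(theta - post_mean) < post_sd
    return inside.mean()

print("well-specified prior:", coverage(1.0))   # close to the nominal 0.68
print("over-confident prior:", coverage(0.3))   # severe under-coverage
```

In the conjugate case only the mis-specified prior hurts; in the VAE-plus-lensing setting both the learned prior and the approximate posterior can push coverage away from nominal, which is why I suspect a deliberately broadened variational approximation could actually help.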