## More hyper parameters for softening likelihoods in cosmological inference …

I noticed this paper, “Determining $H_0$ with Bayesian hyper-parameters”, by Cardona et al. on astro-ph today. The problem is one that arises often in astronomy: the observed data are assumed to have a Normal likelihood function, but the quoted standard deviation for each datapoint may in fact be ‘not quite right’. Sometimes (and indeed in today’s paper) this is called a ‘systematic error’, although I would disagree with that terminology because for me a systematic error is one that can’t be overcome simply by observing more of the same type of data. The motivating examples given are where the quoted standard deviations are correct in a relative sense but their absolute scale is incorrect (as per Lahav et al. 2000), or where most of the quoted standard deviations are correct but there is a small population of extreme outliers not drawn from the supposed Normal (my suggested citation for this would be Taylor et al. 2015). For the former scenario the previously given solution was to introduce a Jeffreys prior on the unknown scale and to marginalise this out explicitly, while for the latter it was to introduce a mixture model with latent variables representing labels for ‘good’ and ‘bad’ datapoints.
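
As a quick sketch of the first of these solutions (in my own notation, which is an assumption and not necessarily that of Lahav et al.): suppose all $N$ quoted standard deviations are rescaled by one shared factor, $\sigma_i \rightarrow \sigma_i/\sqrt{\alpha}$, and write $\chi^2 = \sum_i (x_i-\mu_i)^2/\sigma_i^2$ for the usual statistic computed with the quoted errors. With the Jeffreys prior $p(\alpha) \propto 1/\alpha$ the scale integrates out analytically,

$$\int_0^\infty \frac{1}{\alpha}\,\alpha^{N/2}\,e^{-\alpha\chi^2/2}\,\mathrm{d}\alpha = \Gamma\!\left(\frac{N}{2}\right)\left(\frac{\chi^2}{2}\right)^{-N/2},$$

so that, up to an additive constant, $-2\ln L$ becomes $N\ln\chi^2$ in place of $\chi^2$.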

Interestingly though, in this case Cardona et al. have decided to use neither of the previous approaches and to instead suppose each quoted standard deviation is to be divided by the square root of its own (latent) independent uniform random variable, i.e., $\sigma_i^\mathrm{new} \leftarrow \sigma_i^\mathrm{old}/\sqrt{\alpha_i}$ for $\alpha_i \sim U(0,1)$. To me this seems like a strange choice because now the softening parameters, $\alpha_i$, on each likelihood contribution can only influence each other weakly through the product of likelihoods and not more directly through a hyperprior structure. In the Lahav et al. version they directly influence each other through the shared absolute scale parameter, while in the Taylor et al. version they influence each other not quite as directly, through learning the proportion of data that is ‘bad’. In principle one can get close to the Lahav et al. version in the Cardona et al. framework by introducing a beta distribution, $\alpha_i \sim \mathrm{Beta}(\beta_1,\beta_2)$ with $\beta_1,\beta_2$ shared, and the Taylor et al. version could be approximated by a mixture of two beta distributions. And indeed one can imagine even more sophisticated schemes based on the Dirichlet Process or Dirichlet Process Mixtures as we explored in Cameron & Pettitt (2013). Certainly this type of hyper parameter problem in astronomy requires a prior-sensitivity analysis to properly understand how the assumed error model influences the conclusions.
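
To make the independent-uniform scheme a little more concrete, here is a minimal Python sketch (my own illustration, not code from Cardona et al.) of what each likelihood contribution looks like once its $\alpha_i \sim U(0,1)$ has been integrated out. The function names are hypothetical. Writing $a = z^2/2$ with $z = (x-\mu)/\sigma$, the integral $\int_0^1 \sqrt{\alpha}\,e^{-\alpha a}\,\mathrm{d}\alpha$ reduces to a lower incomplete gamma function, $\gamma(3/2,a)/a^{3/2}$, which can be written with the error function; the sketch checks that closed form against brute-force quadrature.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Ordinary Normal density with the quoted standard deviation."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def softened_pdf_numeric(x, mu, sigma, n_grid=2000):
    """Average of N(x | mu, sigma^2 / alpha) over alpha ~ U(0, 1), midpoint rule."""
    total = 0.0
    for k in range(n_grid):
        alpha = (k + 0.5) / n_grid  # midpoint of the k-th sub-interval of (0, 1)
        total += gaussian_pdf(x, mu, sigma / math.sqrt(alpha))
    return total / n_grid


def softened_pdf_closed(x, mu, sigma):
    """Same marginal, using gamma(3/2, a) = (sqrt(pi)/2) erf(sqrt(a)) - sqrt(a) exp(-a)."""
    a = 0.5 * ((x - mu) / sigma) ** 2
    norm = sigma * math.sqrt(2.0 * math.pi)
    if a < 1e-8:
        # Limit a -> 0: the integral of sqrt(alpha) over (0, 1) is 2/3.
        return (2.0 / 3.0) / norm
    lower_gamma = 0.5 * math.sqrt(math.pi) * math.erf(math.sqrt(a)) - math.sqrt(a) * math.exp(-a)
    return lower_gamma / (a ** 1.5 * norm)


if __name__ == "__main__":
    # Compare the quoted Normal with its softened version at a few residuals.
    for z in (0.0, 1.0, 3.0, 5.0):
        print(z, gaussian_pdf(z, 0.0, 1.0), softened_pdf_closed(z, 0.0, 1.0))
```

The marginal has a fixed shape, with tails falling off like $|z|^{-3}$ rather than exponentially, and it contains nothing to be learned from the other datapoints: once the $\alpha_i$ are integrated out the product of likelihoods is just a product of these fixed heavy-tailed densities. A shared $\mathrm{Beta}(\beta_1,\beta_2)$ hyperprior would instead let the data set the degree of softening.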