In a recent arXival it is proposed to estimate the relationship between redshift and the log comoving density of ionising photons with a “non-parametric” Bayesian approach, such that the log density “can take any value at any redshift”. In fact, the authors then introduce the following model structure: at the midpoint of each of 11 redshift bins we have a unique value of the log ionising photon density, and in between we have a linear interpolation (evaluated at a discrete set of grid points; so really a fine step interpolation). The maximum variation in the log density between redshift bins is assigned a uniform prior between -1 and 1 dex. So, if the bounds of the uniform prior play a negligible role in constraining the fit, what we have is a series of local linear regressions in which information is shared only between observations within each redshift bin, with a prior favouring logarithmic trends in the density. I can’t say whether or not this is a good model for the observed data, but it’s certainly not a non-parametric model in the strict sense: the hard linear regression within fixed redshift bins is a strong structural parameterisation. It’s also not the kind of model I would propose for this purpose.
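As I read it, the structural assumption can be sketched in a few lines. This is a hypothetical reconstruction, not the authors’ code: the bin edges, grid resolution, and the uniform draws standing in for the prior on inter-bin variation are all placeholder values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical set-up: 11 redshift bins (edges and range are placeholders)
bin_edges = np.linspace(0.0, 11.0, 12)
midpoints = 0.5 * (bin_edges[:-1] + bin_edges[1:])

# One free parameter per bin midpoint: the log comoving density there.
# Uniform(-1, 1) dex increments between bins stand in for the stated prior.
log_rho_at_midpoints = rng.uniform(-1.0, 1.0, size=11).cumsum()

# The "non-parametric" curve: linear interpolation between midpoints,
# evaluated on a fine grid of redshifts (a fine step interpolation)
z_grid = np.linspace(midpoints[0], midpoints[-1], 500)
log_rho = np.interp(z_grid, midpoints, log_rho_at_midpoints)

# Between any two midpoints the model is exactly linear in z:
# a hard local linear regression within each bin
```

The point of the sketch is that the full curve is pinned down by just 11 numbers plus a fixed interpolation rule, which is a parametric model by any strict definition.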
The beauty of Bayesian models is that they can be built hierarchically with the ability to take advantage of shrinkage at a data-adaptive scale: hopefully thereby exploiting the bias–variance trade-off to improve estimation accuracy. For instance, here we might add a hyper-prior structure to allow the model to learn an appropriate prior for the slopes, shrinking them towards their empirical mean. Or one could introduce a redshift-dependent hyper-prior over the slopes, shrinking together only those neighbouring a given bin. And since we’re focusing on the slope and we’re going to model the density at a sub-bin scale anyway, why not switch to an integrated Gaussian process model (i.e., a Gaussian process for the rate of change), such as is used for sea-level reconstructions in the climate change literature? Of course, care needs to be taken to prevent the credible sets from these models becoming over-confident (as mentioned in the previous post), but the smoothing provided by this type of model is usually worth the effort of checking that it’s well behaved. (And this checking is something one should be doing for any model.)
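To make the integrated Gaussian process idea concrete, here is a minimal prior-sample sketch: place a GP on the slope of the log density, then integrate it numerically to get the curve itself. The kernel choice, amplitude, and length-scale are illustrative assumptions, not fitted values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Grid of redshifts on which to build the model
z = np.linspace(0.0, 11.0, 200)

# Squared-exponential covariance for the *slope* d(log rho)/dz;
# amplitude and length-scale here are illustrative, not fitted
amp, length = 0.5, 2.0
K = amp**2 * np.exp(-0.5 * (z[:, None] - z[None, :])**2 / length**2)

# Draw a slope function from the GP prior (jitter for numerical stability)
slope = rng.multivariate_normal(np.zeros_like(z), K + 1e-10 * np.eye(len(z)))

# Integrate the slope (trapezoidal rule) to obtain a smooth curve:
# an "integrated Gaussian process" for log rho itself
dz = np.diff(z)
log_rho = np.concatenate([[0.0], np.cumsum(0.5 * (slope[:-1] + slope[1:]) * dz)])
```

Because the smoothness is controlled by the kernel hyper-parameters rather than by fixed bin edges, the effective scale of information sharing can itself be learned from the data.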
One minor technical issue I noticed in the same arXival regards what seems to be a method for marginalisation over uncertainties in some photometric redshifts. The authors write that “we draw 1000 redshifts from the photometric redshift distribution for sources used in each measurement, maintaining detailed balance by using same drawn redshifts in every likelihood calculation, and use the median of the likelihoods calculated for the drawn redshifts”. It sounds to me like the drawn redshifts are serving as fixed sample points for an arithmetic-mean estimate of the marginal likelihood; for this reason it would be the mean, rather than the median, that is desired. In particular, if the median was chosen because the likelihoods over these sample points span a large dynamic range, that’s probably a good sign that the current strategy is breaking down and that the unknown redshifts should instead be treated as nuisance parameters and integrated out as latent variables within the MCMC sampler. For reference, it is possible to construct an MCMC sampler that targets the true posterior given noisy arithmetic-mean estimates of the marginal likelihood, but apart from using the mean the other important trick is to draw a new set of samples for each proposal while not updating the estimate of the marginal likelihood at the current chain position. This is the pseudo-marginal MCMC method. The version the authors suggest would preserve detailed balance only for a chain targeting a biased version of the posterior, so the detailed balance isn’t really helping.
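The pseudo-marginal recipe can be sketched on a toy problem. All the densities below are illustrative stand-ins (a Gaussian latent playing the role of the uncertain redshift), not the arXival’s actual likelihood; the key lines are the arithmetic mean in the estimator and the recycling of the current position’s estimate.

```python
import numpy as np

rng = np.random.default_rng(2)

y = 1.5  # a single toy observation

def likelihood_estimate(theta, n_draws=1000):
    """Arithmetic-mean (unbiased) estimate of the marginal likelihood:
    average the likelihood over fresh draws of the latent variable."""
    latent = rng.normal(0.0, 1.0, size=n_draws)   # draws from p(z)
    lik = np.exp(-0.5 * (y - theta - latent)**2)  # p(y | theta, z)
    return lik.mean()                             # the mean, NOT the median

def log_prior(theta):
    return -0.5 * theta**2  # standard normal prior on theta

# Pseudo-marginal Metropolis-Hastings: draw a *new* estimate at each
# proposal, but keep (recycle) the estimate at the current position
theta, L_hat = 0.0, likelihood_estimate(0.0)
chain = []
for _ in range(2000):
    theta_prop = theta + rng.normal(0.0, 0.5)
    L_prop = likelihood_estimate(theta_prop)
    log_alpha = (np.log(L_prop) + log_prior(theta_prop)
                 - np.log(L_hat) - log_prior(theta))
    if np.log(rng.uniform()) < log_alpha:
        theta, L_hat = theta_prop, L_prop  # accept: carry estimate forward
    chain.append(theta)
```

Swapping the mean for a median, or re-estimating the likelihood at the current position on every iteration, breaks the unbiasedness that makes this chain target the exact posterior.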