Today I read the new arXival by Leistedt et al. on the topic of photometric redshifting, in which a hierarchical model is proposed that learns something of the inherent distribution of galaxies in redshift-magnitude-template space from the data itself: constraining the hyper-parameters of a distribution model and thereby tightening the posteriors on each individual galaxy’s inferred redshift through Bayesian shrinkage.
I link to David van Dyk’s talk on this topic at the IAUS306 since afterwards I wrote to Narciso Benitez & David: “I’m not sure if you’ve talked to each other at the meeting yet, but when I was watching Narciso’s presentation on the importance of the prior for photometric redshift estimation I thought it seemed a very natural place to apply the Bayesian shrinkage ideas that David presented. I.e., allow the data itself to improve the prior for both the redshift distribution and the proportion of each template type in the survey under study using a simple hierarchical model with some hyper parameters controlling the shape of p(z) and the probability of each template!” Though I do not believe they did end up collaborating …
All that to say, I think this is a good idea. With regard to specifics, I think the proposed model in this particular paper is a little simplistic. For instance, the intrinsic distribution is formed by (arbitrarily) dividing the redshift-magnitude-template space into rectangular prisms, across which a constant contribution to the prior mass is assigned by the Dirichlet. Alternatively, one could avoid arbitrary binning by supposing a distribution on the partitioning (a la methods for treed Gaussian processes; Gramacy & Lee or Kim & Holmes) and then marginalising over this unknown partition (which, in the treed GP case, does tend to produce smooth marginals despite the inherently discontinuous nature of the sub-models). Moreover, a stronger prior correlation between bins could well be motivated by the adjacency of the prisms in the physical space. Another minor problem with the proposed implementation seems to be that, since the stated likelihood does not depend on the magnitude, each galaxy is presumably assigned to a bin solely according to its (noisily) observed flux in the reference band.
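To illustrate the contrast I have in mind between the paper’s exchangeable Dirichlet prior and one that respects bin adjacency, here is a minimal sketch (all names and the bin count are my own hypothetical choices, and I use a logistic-normal construction as just one simple way to induce correlation between neighbouring bins):

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins = 20  # hypothetical number of bins along (say) the redshift axis

# Flat Dirichlet prior: bin probabilities are exchangeable, so the prior
# carries no notion that adjacent bins should have similar mass.
dirichlet_draw = rng.dirichlet(np.ones(n_bins))

# Logistic-normal alternative: draw a Gaussian vector whose covariance
# decays with bin separation, then squash through a softmax; adjacent
# bins are now positively correlated a priori.
idx = np.arange(n_bins)
cov = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 2.0) ** 2)
cov += 1e-6 * np.eye(n_bins)  # jitter for numerical stability
g = rng.multivariate_normal(np.zeros(n_bins), cov)
g -= g.max()  # stable softmax
logistic_normal_draw = np.exp(g) / np.exp(g).sum()
```

Both draws live on the simplex; only the second encodes the smoothness-via-adjacency that the physical space would seem to motivate.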
On a more general level I wonder at the value of the inferred intrinsic distribution as compared to the posterior draws of the true photometric redshift lists themselves. In this type of ‘deconvolution’ problem the posterior for the intrinsic distribution is typically highly prior/model sensitive (see, e.g., mixture model examples). If one reports the posterior samples of the true redshifts along with their prior densities, it becomes easy for others to come along with their own priors/models for these latent variables and apply a simple reweighting to approximate their own posteriors (as in my mixture modelling paper). Ultimately, for comparison to cosmological models, one would, I imagine, like a prior tuned (e.g.) for consistency with simulations of mock galaxies from that family of models. Likewise, one can compute whatever summary statistics concerning the redshift distribution one might be interested in as functionals over the true redshift samples.
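The reweighting step I have in mind is just importance reweighting of the reported posterior samples by the ratio of the new prior to the old; a minimal sketch, with mock samples and entirely hypothetical prior densities standing in for the reported ones:

```python
import numpy as np

rng = np.random.default_rng(1)

# Mock stand-in for a galaxy's reported posterior samples of its true
# redshift, drawn under some original prior p_old(z).
z_samples = rng.gamma(shape=4.0, scale=0.25, size=5000)

def p_old(z):
    # original prior density, up to normalisation (hypothetical)
    return np.exp(-z / 2.0)

def p_new(z):
    # alternative prior, e.g. tuned to mock-galaxy simulations (hypothetical)
    return z ** 2 * np.exp(-z / 0.5)

# Importance weights: ratio of new to old prior at each sampled redshift,
# normalised to sum to one.
w = p_new(z_samples) / p_old(z_samples)
w /= w.sum()

# Any functional of the redshift distribution can now be approximated
# under the new prior, e.g. the posterior mean redshift:
mean_new = np.sum(w * z_samples)
```

This is exactly why reporting both the samples and their prior densities is so useful: the reweighting requires no re-run of the original inference.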
Edit: I forgot to mention that I think the toy simulated dataset example should be augmented with a test of performance (in RMSE, say) on one of the photometric redshifting challenge datasets.