As one comes to know more colleagues within astrostatistics it naturally becomes more difficult to criticise their work without hesitation. I do, however, feel that the latest contribution from Stein (who I don’t know), van Dyk (who I do know), Kashyap (who I’ve emailed) & Siemiginowska (ditto) deserves some bashing up here 🙂
In the manuscript, “Detecting Unspecified Structure in Low-Count Images”, the authors propose a framework for ‘testing whether inferred structure in an image with Poisson noise represents a significant departure from a baseline (null) model of the image’. The chosen test statistic is a little unusual: the tail probability, under the Bayesian posterior, of a parameter describing the relative strength of the unspecified component. It is not so unusual, though, for astronomers used to thinking of (e.g.) Bayes factors as test statistics (see Jenkins & Peacock 2011). The particular context in mind is that of detecting (faint) X-ray jets beside (bright) AGN in Chandra X-ray Observatory data.
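To fix ideas, here is how I read the basic recipe, as a minimal Python sketch: the test statistic is the posterior tail probability of the added-component strength, and it is calibrated by recomputing that statistic on images simulated under the null. The dummy Gaussian ‘draws’ and the +1-corrected Monte Carlo p-value are my own illustrative choices, not the authors’ code.

```python
import numpy as np

def posterior_tail_prob(tau_draws, tau0=0.0):
    """Test statistic: posterior Pr(tau > tau0), with tau the relative
    strength of the unspecified (added) component, estimated from MCMC draws."""
    return float(np.mean(np.asarray(tau_draws) > tau0))

def monte_carlo_p_value(t_obs, t_null):
    """Calibration against statistics recomputed on images simulated under the
    baseline (null) model; the +1 keeps the estimate valid (conservative)."""
    t_null = np.asarray(t_null)
    return (1 + np.sum(t_null >= t_obs)) / (1 + t_null.size)

# Toy usage with made-up draws; in the paper each entry of t_null would cost
# a full MCMC fit to a simulated null image, hence the expense.
rng = np.random.default_rng(0)
t_obs = posterior_tail_prob(rng.normal(0.5, 0.3, size=5000))
t_null = [posterior_tail_prob(rng.normal(0.0, 0.3, size=5000)) for _ in range(50)]
print(monte_carlo_p_value(t_obs, t_null))
```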
For me there are two major problems with this paper. First, I simply don’t believe that it is worthwhile for such applications to choose a test statistic that requires an MCMC procedure acknowledged to be ‘computationally expensive’. Although the authors adopt a lower bound estimator for the p-value to limit the damage of only being able to run a limited number of simulations under the null, I don’t believe this saves the procedure in practical terms. Far better, in my opinion, would be to devise a cheap-to-compute test statistic for which the p-values can be computed precisely. The authors note that they use a Bayesian model to infer image structure because ‘it provides a principled way to account for a PSF, varying probability of photon detection, and varying amounts of smoothing at different image resolutions’. Acknowledging this motivation, I would point towards analogous deconvolution problems for Gaussian-distributed data in which the Wiener filter provides a fast, exact solution for the posterior mean (e.g. Ensslin et al. 2009); that is, if a high-level Bayesian approach is sought, I’m suggesting one’s efforts would be better directed towards a fast estimator of some particular posterior functional under the Poisson likelihood. Otherwise, I would advocate abandoning the Bayesian model altogether and constructing a test statistic from a faster direct transform of the observational data; although described from the Bayesian point of view, the example of Lochner et al., who use the angle between the AGN and the apparent twin jets, provides direction for formulating an appropriate statistic. Indeed, disentangling the p-value from the Bayesian image reconstruction procedure would seem to dovetail well with the authors’ aim to provide ‘an additional check to prevent over-interpreting apparent structure in the noise as discovery of real structure in the signal’.
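For contrast, in the Gaussian analogue I have in mind the whole ‘reconstruction’ is a single pass of elementwise arithmetic in Fourier space. A toy sketch (assuming a periodic grid, a PSF supplied centred on the same grid as the data, and flat signal and noise power spectra; none of this is from the paper, it is only meant to show the kind of speed one gives up by insisting on MCMC):

```python
import numpy as np

def wiener_deconvolve(data, psf, signal_power, noise_power):
    """Posterior-mean (Wiener filter) image for data = PSF * image + Gaussian
    noise, with flat signal/noise spectra: one FFT round trip, no sampling."""
    H = np.fft.fft2(np.fft.ifftshift(psf))  # PSF centred, same shape as data
    W = np.conj(H) * signal_power / (np.abs(H) ** 2 * signal_power + noise_power)
    return np.real(np.fft.ifft2(W * np.fft.fft2(data)))
```

A Poisson-likelihood analogue won’t be exact like this, but even a fast approximation to a single posterior functional would already change the practical calculus.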
My second (major) problem is that I don’t believe enough (any?!) attention has been given to the potential for misuse of p-values in this problem, for instance that due to multiple hypothesis testing, since astronomers would no doubt like to roll this approach out to many hundreds, thousands, or even tens of thousands of images. Or, perhaps more likely, they will run another algorithmic search procedure to mine the data for images with apparent departures, then seek the p-value from this Bayesian test statistic as ‘proof’ of a real companion. I particularly worry that such behaviour will be encouraged by the authors’ suggestion in Section 3.2 of considering multiple versions of the test statistic according to different cut-outs of the target region: one can foresee astronomers simply playing with the image cut-out until they get the p-value they desire. To my mind, the authors’ single caveat, “Although very popular in practice, p-values are criticized on theoretical grounds from both frequentist and Bayesian perspectives (e.g., Berger and Delampady 1987; Zhao 2014)”, is not sufficient to discharge their responsibility to steer the audience towards better practice in the use and interpretation of p-values.
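To make the multiple-testing worry concrete, this is the sort of machinery (standard Benjamini–Hochberg FDR control; my illustration, not anything provided in the paper) that I would want to see discussed alongside any survey-scale rollout of these p-values:

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of 'discoveries' with the false discovery rate across
    all tested images controlled at level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# With a thousand mostly-null images, naively thresholding each p-value at
# 0.05 would flag ~50 spurious 'jets'; BH instead controls the expected
# fraction of false discoveries among whatever gets flagged.
```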
A couple of minor concerns of mine. (1) The multiscale smoothing model based on quadrants of the image seems a bit clunky in the era of machine learning and fast Bayesian non-parametrics; and indeed the use of a Dirichlet prior with a hyper-prior weighting its hyper-parameters towards zero (i.e. highly un-smooth) places responsibility for the smoothing on the number of layers allowed in the multiscale subdivision. (2) The estimation of tail probabilities from an ordinary MCMC run is a bit uninspired, given that astronomers were early adopters of nested sampling, which has a close connection to equivalent procedures in rare event estimation (e.g. Guyader’s PhD thesis).
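On point (2), a back-of-envelope for the cost of estimating small tail probabilities by simply counting MCMC draws (a binomial approximation; the numbers are mine, purely illustrative):

```python
import numpy as np

def tail_prob_relative_error(q, n_eff):
    """Approximate relative standard error of a tail probability q estimated
    as the fraction of n_eff (roughly independent) draws falling in the tail."""
    return np.sqrt((1 - q) / (q * n_eff))

for q in (1e-1, 1e-2, 1e-3):
    print(f"q = {q:g}: ~{100 * tail_prob_relative_error(q, n_eff=10_000):.0f}% relative error")
# At q = 0.001 even 10,000 effective draws leave ~30% relative error, which
# is where nested-sampling / rare-event style estimators start to earn their keep.
```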
That said, it’s worth remembering that I try to hold astrostatistical contributions by statisticians (and the more competent, data-literate astronomers) to a higher standard than ordinary astro-stats papers.