I noticed two papers on astro-ph in the past week that attempt hypothesis testing by comparing parameter estimates for subsamples of the data, the hypotheses in each case being that the subsamples do or do not share the same parent distribution. The first is by Sami Dib, in a study of **IMF universality** (here), and the other is by Appleby & Shafieloo, in a study of **local universe homogeneity** (here): both important topics in contemporary astronomical research.

In the Dib paper a variety of parametric IMF forms are fitted against the ‘observed’ (estimated) mass functions of eight stellar clusters, and the Bayesian posterior credible intervals of their parameters are then compared. Since a number of these intervals are more widely separated than one might expect under the hypothesis of a single universal IMF form, the author concludes that IMF universality has been disproven. However, two methodological problems are immediately evident. First, no posterior predictive checks are made to confirm that the Poisson sampling likelihood function gives a good description of the observational data; unaccounted-for noise sources (such as the acknowledged possibility of binary contamination) may well leave the recovered posterior a poor model for the observed data, in which case the posterior credible intervals may not be meaningful. Second, within the Bayesian framework the conventional approach to hypothesis testing would be to compare the model evidence (marginal likelihood) for the hypothesis that all clusters were drawn from a single IMF parameterisation against that for the alternative hypothesis that some clusters were drawn from different IMF parameterisations. Because the evidence averages the likelihood over the prior, the more flexible hypothesis is automatically penalised for its extra free parameters: this is the “Occam’s Razor” effect oft cited as a strength of the Bayesian approach.
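
To make the evidence comparison concrete, here is a minimal toy sketch of my own (not the analysis in the Dib paper). It assumes each cluster's masses follow a pure power law above a known `m_min`, with no binaries, incompleteness or measurement noise, and it places an arbitrary Gamma(`k`, rate `b`) prior on the power-law index `a`, chosen only because it makes the marginal likelihood available in closed form. The function names and the prior settings are illustrative assumptions.

```python
import math
import random


def log_evidence(masses, m_min=1.0, k=2.0, b=1.0):
    """Log marginal likelihood for a pure power law above m_min.

    Assumed density: p(m | a) = a * m_min**a / m**(a + 1), with a
    Gamma(k, rate=b) prior on the index a (an illustrative choice).
    Integrating a out analytically gives the expression below.
    """
    n = len(masses)
    t = sum(math.log(m / m_min) for m in masses)
    return (
        -sum(math.log(m) for m in masses)
        + k * math.log(b) - math.lgamma(k)
        + math.lgamma(k + n) - (k + n) * math.log(b + t)
    )


def log_bayes_factor(clusters, **kwargs):
    """ln[p(data | one shared index) / p(data | one index per cluster)]."""
    pooled = [m for cluster in clusters for m in cluster]
    shared = log_evidence(pooled, **kwargs)
    separate = sum(log_evidence(c, **kwargs) for c in clusters)
    return shared - separate


if __name__ == "__main__":
    rng = random.Random(1)
    # Three mock clusters sharing one index (a = 1.35, i.e. dN/dm ~ m**-2.35)
    same = [[rng.paretovariate(1.35) for _ in range(200)] for _ in range(3)]
    # Three mock clusters with clearly different indices
    diff = [[rng.paretovariate(a) for _ in range(200)]
            for a in (1.0, 1.35, 2.5)]
    print("ln BF, shared-index mocks:   ", log_bayes_factor(same))
    print("ln BF, different-index mocks:", log_bayes_factor(diff))
```

A positive log Bayes factor favours the single shared index and a negative one favours separate indices. Even when the separate fits match the data slightly better, the shared model can still win because it spends less prior volume, which is the Occam penalty at work. In a real application the result would of course depend on the prior and on how well the likelihood describes the data, which is exactly why the posterior predictive check matters.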

In the Appleby & Shafieloo paper the authors fit Schechter luminosity functions to the 2MASS local galaxy population in four separate quadrants of the sky. Although the fitting is done with a maximum likelihood approach, the authors’ hypothesis testing approach is much the same as Dib’s: fit the parameters of each quadrant separately and compare the (approximate, maximum likelihood-based) confidence intervals. My thoughts on this again follow the discussion above. It’s interesting to note that no plot is shown of the binned luminosity function against the fitted model; from past experience I would expect the Schechter function to be a poor model for the observed data, hence the credibility of the derived parameter constraints is questionable. In some sense the Schechter function parameters are serving here as a summary statistic for the observed data, when in fact it would make better sense to compare the empirical distribution functions in each quadrant directly via, e.g., the K-S test. Assuming that such a test would provide even stronger evidence of local inhomogeneity, two follow-up questions spring to mind: are there any unmodelled effects that could cause this result (e.g. different galactic dust extinction as a function of viewing direction)? And, in any case, is this level of observed inhomogeneity on this scale noteworthy in a cosmological sense?
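
For illustration, a direct two-sample K-S comparison needs very little machinery. The sketch below is a standard-library version with hypothetical function names and made-up Gaussian "magnitudes" standing in for two quadrants; it is not real 2MASS data, and it ignores selection effects and extinction entirely. The p-value uses the large-sample Kolmogorov approximation, so it should not be trusted for small samples. In practice one would more likely reach for a library routine such as `scipy.stats.ks_2samp`.

```python
import math
import random


def ks_statistic(x, y):
    """Maximum absolute difference between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Step both ECDFs past every value <= v (this handles ties)
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic p-value: Q(lam) = 2 * sum_j (-1)**(j-1) * exp(-2 j^2 lam^2)."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # The series converges slowly here, and Q is 1 to many decimals
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical absolute magnitudes for two sky quadrants (mock data)
    quad_a = [rng.gauss(-23.0, 1.0) for _ in range(2000)]
    quad_b = [rng.gauss(-22.9, 1.0) for _ in range(2000)]
    d = ks_statistic(quad_a, quad_b)
    print("D =", d, " p =", ks_pvalue(d, len(quad_a), len(quad_b)))
```

With samples as large as a 2MASS quadrant, even a tiny shift in the distribution will yield a very small p-value, which is why the second follow-up question (whether the difference matters cosmologically) is the more important one.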