I noticed an unusual contribution to the philosophy of science via Bayesian model selection by Gubitosi et al. on astro-ph the other day, in which some rather bold claims are made, e.g.

*“By considering toy models we illustrate how unfalsifiable models and paradigms are always favoured by the Bayes factor.”*

Despite the authors making a number of sniping comments about the sociology of “proof of inflation” claims in astronomy, their meta-reflections did not reach a point of self-awareness at which they were able to escape my own sociological observation: the bolder the claims made by astronomers about Bayes’ theorem, the narrower their reading of the past literature on the subject. Indeed, in this manuscript there are no references at all to any previous work on the role of Bayes factors in scientific decision making, even from within the astronomical canon (leaving aside the history of statistics); more precisely, it seems the authors have been spending a lot of research time on something they could have looked up online before starting.

Perhaps the most relevant past work on this topic in astronomy is the paper by Jenkins & Peacock (2011) called “The power of Bayesian evidence in astronomy” in which the frequentist properties (type I/II error rates) of model selection via the Bayes factor are examined for a number of examples. Computing the power of the Bayes factor under a given base hypothesis is directly analogous to computing the ‘falsifiability’ score suggested by Gubitosi et al.; only the power has a precise statistical interpretation, whereas the ‘falsifiability’ is somewhat nebulous (especially when the “reasonable range of data” over which the induced distribution of the evidence is to be computed is not explicitly identified as the null hypothesis).
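To make the idea of frequentist error rates for a Bayes factor concrete, here is a minimal Monte Carlo sketch. It is not one of the Jenkins & Peacock examples; the setup is an assumed toy (a single Gaussian datum, a model with theta fixed at 0 treated as the null, and a rival with a flat prior of assumed width 20), and the function names and decision threshold are purely illustrative.

```python
import math
import random


def log_bayes_factor(x, width=20.0, sigma=1.0):
    """Log of evidence(flat-prior model) / evidence(model with theta fixed at 0)."""
    # Fixed model: x ~ N(0, sigma^2), so the evidence is just the likelihood at theta = 0.
    log_ev_fixed = -0.5 * (x / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

    # Flat model: theta ~ U(-width/2, width/2), x | theta ~ N(theta, sigma^2).
    # The evidence is the Gaussian mass inside the prior range divided by the width.
    def std_normal_cdf(t):
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

    mass = std_normal_cdf((width / 2 - x) / sigma) - std_normal_cdf((-width / 2 - x) / sigma)
    log_ev_flat = math.log(max(mass, 1e-300) / width)  # guard against log(0)
    return log_ev_flat - log_ev_fixed


def error_rates(n_sim=20000, threshold=0.0, width=20.0, sigma=1.0, seed=0):
    """Estimate (type I, type II) error rates, taking the fixed model as the null."""
    rng = random.Random(seed)
    false_rejections = 0  # data from the fixed model, but the flat model is preferred
    false_retentions = 0  # data from the flat model, but the fixed model is preferred
    for _ in range(n_sim):
        x_null = rng.gauss(0.0, sigma)
        if log_bayes_factor(x_null, width, sigma) > threshold:
            false_rejections += 1
        theta = rng.uniform(-width / 2, width / 2)
        x_alt = rng.gauss(theta, sigma)
        if log_bayes_factor(x_alt, width, sigma) <= threshold:
            false_retentions += 1
    return false_rejections / n_sim, false_retentions / n_sim


if __name__ == "__main__":
    type_1, type_2 = error_rates()
    print(f"type I rate: {type_1:.3f}, type II rate: {type_2:.3f}")
```

One minus the type II rate is the power of the Bayes factor test against the flat-prior alternative; both numbers depend on the assumed prior width and threshold, which is exactly the sort of thing a ‘falsifiability’ score leaves implicit.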

*“These results are sensible from the point of view of probability theory: “if you want to have a high probability of winning, then hedge your bets”. However, science is not about playing the lottery and winning, but falsifiability instead, that is, about winning given that you have bore the full brunt of potential loss, by taking full chances of not winning a priori.”*

The above discussion of betting is similarly naive, ignoring whole swathes of papers on mathematical decision theory, loss functions and expected utility, and experimental design. I just can’t even. Depending on one’s choice of loss function one could write the exact same paper from the opposite point of view: that wacky theories making very precise claims about a future observation are overly rewarded relative to the prevailing paradigm if both the cost of producing the wacky theories and the loss of face in having them fail are small relative to the reward of a lucky guess.

Some more specific points:

*“If model M1 gets its prediction wrong it is penalised exponentially. In contrast, model M2 is penalised for not making a prediction merely as a power-law. We emphasise that this difference in penalisations is not related to the fact that one model has a Gaussian prior while the other has a flat prior. As we will see in Section IIIC, the same behaviour is observed for an intermediate model, and thus we conclude that this penalisation behaviour is the same regardless the particular form of the priors.”*

(1) The authors seem confused about the difference between prior and likelihood. In the toy example presented one model has a delta function (dogmatic) prior and the other has a flat prior; neither has a Gaussian prior. The Gaussianity and exponential penalty emerge from the Gaussian likelihood function, so it is no surprise that the penalty takes the same form in their intermediate model, which simply adds Gaussian blurring to the original delta function.
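A minimal sketch of the three evidences makes the point explicit. The set-up is assumed rather than copied from the paper: one Gaussian datum x with noise sigma, a dogmatic model with theta fixed at theta0, a flat prior of assumed width `width` centred on zero, and an intermediate model whose prior is the delta function blurred by a Gaussian of width tau. All names and default values are illustrative.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def evidence_delta(x, theta0=0.0, sigma=1.0):
    # Delta-function prior: the evidence is the Gaussian likelihood at theta0,
    # so the log-evidence falls off as -(x - theta0)^2 / (2 sigma^2).
    return normal_pdf(x, theta0, sigma)


def evidence_flat(x, width=20.0, sigma=1.0):
    # Flat prior on [-width/2, width/2]: integrate the Gaussian likelihood over theta.
    # For x well inside the range this is close to 1/width, whatever x is.
    lo, hi = -width / 2, width / 2
    return (normal_cdf(hi, x, sigma) - normal_cdf(lo, x, sigma)) / width


def evidence_blurred(x, theta0=0.0, tau=0.5, sigma=1.0):
    # Gaussian-blurred delta prior: the convolution of two Gaussians is Gaussian,
    # so the same quadratic penalty appears with variance sigma^2 + tau^2.
    return normal_pdf(x, theta0, math.sqrt(sigma ** 2 + tau ** 2))


if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0, 3.0, 4.0):
        print(x,
              math.log(evidence_delta(x)),
              math.log(evidence_flat(x)),
              math.log(evidence_blurred(x)))
```

In this sketch the quadratic drop in log-evidence for the dogmatic and blurred models comes entirely from `normal_pdf`, i.e. from the likelihood, while the flat model pays only a 1/width price set by its prior range.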

*“The denominator P(D) normalises the posterior distribution …”*

(2) Interestingly, although the authors emphasise in their introduction that the evidence plays the role of normalizing constant for the likelihood-weighted prior (aka the unnormalized posterior), in the toy example given the delta function prior is in fact a rare measure for which this normalization role is not fulfilled. To understand why this is so from a technical point of view one needs to look at the measure-theoretic definition of conditional probability (e.g. Rosenthal’s “A First Look at Rigorous Probability Theory”), but informally it is simply because the posterior is identical to the prior under this model: theta is fixed and assumed exactly known.

*“By considering toy models we illustrate how unfalsifiable models and paradigms are always favoured by the Bayes factor.”*

I might be misinterpreting their statement, but what it claims seems impossible in principle. If you have two competing hypotheses with certain prior probabilities, it’s mathematically impossible for one to be favoured for all possible datasets, since the expected value of its posterior probability (averaged over all possible datasets, weighted by how probable each dataset is a priori) has to equal its prior probability.
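That identity can be checked exactly in a small assumed example (not one taken from the paper): n coin flips, a sharp hypothesis that the coin is fair, and a flexible hypothesis with a uniform prior on the heads probability, under which every head count from 0 to n is equally likely. The function names are illustrative, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from math import comb


def likelihood_fair(k, n):
    # P(k heads in n flips | coin is exactly fair)
    return Fraction(comb(n, k), 2 ** n)


def likelihood_flexible(k, n):
    # P(k heads | heads probability uniform on [0, 1]) = 1 / (n + 1) for every k
    return Fraction(1, n + 1)


def posterior_flexible(k, n, prior_flexible):
    # Posterior probability of the flexible hypothesis after seeing k heads
    num = prior_flexible * likelihood_flexible(k, n)
    den = num + (1 - prior_flexible) * likelihood_fair(k, n)
    return num / den


def expected_posterior_flexible(n, prior_flexible):
    # Average the posterior over all datasets, weighted by the prior predictive P(k)
    total = Fraction(0)
    for k in range(n + 1):
        marginal = (prior_flexible * likelihood_flexible(k, n)
                    + (1 - prior_flexible) * likelihood_fair(k, n))
        total += marginal * posterior_flexible(k, n, prior_flexible)
    return total


if __name__ == "__main__":
    n, prior = 10, Fraction(1, 2)
    for k in range(n + 1):
        print(k, float(posterior_flexible(k, n, prior)))
    print("expected posterior:", expected_posterior_flexible(n, prior))
```

Because the average must come back to the prior, any datasets that raise the probability of the flexible hypothesis (very lopsided head counts here) have to be balanced by datasets that lower it (head counts near n/2).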

The paper concerned has a number of detailed problems, but the biggest difficulty I think is that it starts from a false premise. From a Bayesian perspective the relevant criterion for a scientific theory is not “falsifiability” but “testability”. Falsifiability is just one side of the coin. What actually matters is that data should be able to change the probability of a theory, either increasing it or decreasing it. Concentrating on the latter possibility only leads to the kind of muddle that this paper gets into.

Reblogged this on In the Dark and commented:

Yesterday’s post is generating quite a lot of traffic for a weekend so I thought I would reblog this piece on the same topic.