I noticed some silliness in a pulsar timing paper recently (Babak et al.), which was a bit surprising given the number of authors. Some of them I know to be experienced Bayesians, so I assume they must only have skim-read the manuscript. The silliness I’m referring to is this:
“We compute the evidence using both MultiNest and parallel tempering MCMC searches. In all the Bayesian searches with fixed noise we obtain Bayes factors close to zero, consistent with a non-detection and with the outcome of the frequentist analysis. In particular, we get log(B) = −0.27 for the search Bayes_E, and log(B) = −0.31 for the search Bayes_EP_NoEv.”
To my mind this is very poorly worded for two reasons:
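(1) Bayes factors range from 0 (‘null’ hypothesis infinitely preferred) to ∞ (‘alternative’ hypothesis infinitely preferred), with 1 representing equal evidence for each. Read literally, a Bayes factor “close to zero” would mean overwhelming support for the null, which is presumably not what the quoted values of log(B) are meant to convey. Hence, evaluating Bayes factors on a linear scale is misleading; the log scale forms the natural distance.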
(2) Log Bayes factors of -0.27 and -0.31 are so close to log(1) = 0 that they would generally be considered inconclusive evidence either way, rather than consistent with the null hypothesis. For the paper's conclusion to be sensible, one would need to specify prior model probabilities markedly favoring the null.
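To put rough numbers on point (2), here is a minimal sketch that converts a log Bayes factor into a posterior probability for the alternative. It rests on assumptions of mine, not statements from the paper: that B is defined as evidence(alternative) / evidence(null), that the two hypotheses are the only ones on the table, and that the prior odds are 1:1 unless stated otherwise. The quote does not say whether log means natural log or log10, so the hypothetical helper `posterior_prob_alt` takes the base as a parameter.

```python
import math


def posterior_prob_alt(log_b, prior_prob_alt=0.5, base=math.e):
    """Posterior probability of the alternative hypothesis.

    Assumes B = Z_alt / Z_null and that exactly two hypotheses are compared.
    `base` is the base of the logarithm used to report log_b (an assumption,
    since the quoted passage does not specify it).
    """
    bayes_factor = base ** log_b
    prior_odds = prior_prob_alt / (1.0 - prior_prob_alt)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    for log_b in (-0.27, -0.31):
        p_ln = posterior_prob_alt(log_b)                 # natural log assumed
        p_10 = posterior_prob_alt(log_b, base=10.0)      # log10 assumed
        print(f"log(B) = {log_b:+.2f}: P(alt | data) = {p_ln:.3f} (ln), {p_10:.3f} (log10)")
```

With equal prior odds and natural logs, both quoted values leave the alternative with a posterior probability a little above 0.4; even under the log10 reading it stays at roughly a third. Neither is anywhere near a decisive preference for the null.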
Also worth noting is that the analysis performed here uses uniform priors, which are open to some criticism when used for Bayesian model comparison: the Bayes factor is notably sensitive to the ranges of the uniform priors adopted, and those ranges can rarely be agreed upon by expert consensus.
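The sensitivity to the prior range is easy to see in a toy problem, which is in no way the model used in the paper. Suppose we have a single measurement x with known Gaussian noise sigma, a null hypothesis fixing the amplitude at mu = 0, and an alternative placing a uniform prior on mu over [-w, w]. All of these names and numbers are illustrative assumptions. The evidence for the alternative is the likelihood averaged over the prior, so once w is much larger than sigma the Bayes factor scales like 1/w.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bayes_factor(x, sigma, half_width):
    """B = Z_alt / Z_null for the toy model.

    Null: mu = 0 exactly.
    Alternative: mu ~ Uniform(-half_width, half_width).
    """
    z_null = normal_pdf(x, 0.0, sigma)
    # Integral of N(x | mu, sigma^2) over mu in [-w, w], divided by the prior width 2w.
    mass = normal_cdf((half_width - x) / sigma) - normal_cdf((-half_width - x) / sigma)
    z_alt = mass / (2.0 * half_width)
    return z_alt / z_null


if __name__ == "__main__":
    x, sigma = 1.0, 1.0  # illustrative data: a one-sigma measurement
    for w in (3.0, 10.0, 100.0, 1000.0):
        b = bayes_factor(x, sigma, w)
        print(f"half-width {w:7.1f}: B = {b:.4g}, ln B = {math.log(b):+.2f}")
```

The same data give a progressively stronger apparent preference for the null as the prior is widened, purely because of the choice of w. This is why the adopted prior ranges need to be stated and justified before a Bayes factor can be interpreted.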