Some thoughts about today’s astro-ph paper on Bayesian model selection for exoplanet discovery by Tuomi et al. While the methodology is more-or-less reasonable, I feel like there’s always one or two things a bit ‘off’ with Tuomi & Jones’ statistical analyses. In this case one thing that caught my eye was that they go to the trouble of defining Bayesian credible sets for the marginal posterior of a given parameter, theta, as the interval whose two end points share an *equal posterior density* while enclosing the specified fraction, d, of posterior mass (say, 99%). That is as opposed to the more frequently used Bayesian credible intervals, defined simply by finding the lower and upper bounds that exclude (1-d)/2 worth of mass below and above, respectively. While it is possible to estimate the former via MCMC (e.g. Chen & Shao 1998), it’s generally a bit of a faff since it requires estimating the density, whereas the ‘ordinary’ credible intervals are trivially recovered from MCMC. As there’s no mention of how Tuomi et al. went about recovering their credible sets, I wonder whether they actually computed ordinary credible intervals instead? Hmm … too confusing.

Likewise, I notice they temper on pi(theta)^beta*L(y|theta)^beta rather than the more conventional (and I would say more sensible) pi(theta)*L(y|theta)^beta, and then use the old Newton & Raftery (1994) prior + posterior estimator of the evidence rather than something more powerful (if you’re tempering anyway then thermodynamic integration or biased sampling should be far more robust).
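To make the difference between the two kinds of interval concrete, here is a minimal sketch in Python. It is not what Tuomi et al. did (the paper doesn’t say), and it leans on an assumption: that the marginal is unimodal, in which case the equal-density interval is also the shortest interval holding a fraction d of the mass and can be approximated from sorted draws without an explicit density estimate. The function names and the gamma-distributed stand-in for an MCMC chain are made up for illustration.

```python
import numpy as np


def equal_tailed_interval(samples, d=0.99):
    """'Ordinary' credible interval: exclude (1 - d)/2 of the mass in each tail."""
    tail = (1.0 - d) / 2.0
    lo, hi = np.quantile(samples, [tail, 1.0 - tail])
    return float(lo), float(hi)


def shortest_interval(samples, d=0.99):
    """Shortest interval containing a fraction d of the draws.

    Assumption: the marginal posterior is unimodal. For a multimodal
    marginal the highest-density set need not be a single interval.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(d * n))               # draws each candidate interval must hold
    widths = x[k - 1:] - x[: n - k + 1]   # widths of all windows of k consecutive draws
    i = int(np.argmin(widths))            # left end of the narrowest window
    return float(x[i]), float(x[i + k - 1])


# Hypothetical stand-in for the MCMC draws of a skewed marginal posterior.
rng = np.random.default_rng(0)
draws = rng.gamma(shape=2.0, scale=1.0, size=20000)

et = equal_tailed_interval(draws, d=0.95)
hd = shortest_interval(draws, d=0.95)
print("equal-tailed:", et)
print("shortest:    ", hd)
```

For a symmetric unimodal marginal the two should roughly agree; for a skewed one like this the shortest interval shifts towards the mode. With real (autocorrelated) MCMC output the end points of the shortest interval will typically be noisier than the quantile-based ones.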

Interesting to see that they used an AR(1) model for the noise; if any exoplaneters out there are interested in something like MA(>1) noise models then I have a lot of thoughts about particle Gibbs strategies for efficient posterior sampling 🙂
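For anyone who hasn’t met AR(1) noise before, here is a rough sketch of the exact Gaussian log-likelihood for AR(1) residuals. It assumes evenly spaced observations, a stationary process with |phi| < 1 and zero-mean residuals (data minus model), which is a textbook simplification and not necessarily the form used in the paper. The names `ar1_loglike`, `phi` and `sigma` are mine.

```python
import numpy as np


def ar1_loglike(resid, phi, sigma):
    """Exact log-likelihood of r_t = phi * r_{t-1} + eps_t, eps_t ~ N(0, sigma^2).

    Assumes even sampling and stationarity, so r_0 ~ N(0, sigma^2 / (1 - phi^2)).
    """
    r = np.asarray(resid, dtype=float)
    n = r.size
    innov = r[1:] - phi * r[:-1]   # one-step-ahead prediction errors
    quad = (1.0 - phi**2) * r[0] ** 2 + np.sum(innov**2)
    return (
        -0.5 * n * np.log(2.0 * np.pi * sigma**2)
        + 0.5 * np.log(1.0 - phi**2)
        - 0.5 * quad / sigma**2
    )


def dense_gaussian_loglike(resid, phi, sigma):
    """Same likelihood via the full covariance matrix, as an O(n^3) cross-check."""
    r = np.asarray(resid, dtype=float)
    n = r.size
    idx = np.arange(n)
    lags = np.abs(idx[:, None] - idx[None, :])
    cov = sigma**2 / (1.0 - phi**2) * phi**lags
    _, logdet = np.linalg.slogdet(cov)
    quad = r @ np.linalg.solve(cov, r)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)


# Simulate a short AR(1) series with made-up parameters and compare the two.
rng = np.random.default_rng(1)
phi_true, sigma_true, n_obs = 0.6, 1.5, 50
series = np.empty(n_obs)
series[0] = rng.normal(0.0, sigma_true / np.sqrt(1.0 - phi_true**2))
for t in range(1, n_obs):
    series[t] = phi_true * series[t - 1] + rng.normal(0.0, sigma_true)

fast = ar1_loglike(series, phi_true, sigma_true)
slow = dense_gaussian_loglike(series, phi_true, sigma_true)
print(fast, slow)
```

The point of the first function is that the AR(1) structure factorises the likelihood into one-step terms, so it costs O(n) with no matrix inversion. That convenience is lost once unobserved innovations enter the model, as they do for MA noise.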

Highest density intervals and the like are stupid.

Also, big warning for exoplaneteers calculating the evidence: the kind of data you have can quite easily give you first-order phase transitions.

Agreed. Re the HPDs, I imagine there are some highly restrictive conditions (that one probably couldn’t verify on a practical problem) to ensure that they’re well behaved. Somewhere on my future reading list is Donoho (1988), which deals with the problem that, with finitely many draws from a density of unknown form, you can only ever place lower bounds on its number of modes.