Yesterday I was having some back and forth on Facebook with Till Sawala about the relative standards of statistical practice in medicine and astronomy prompted by this article by Carlos Frenk in which he suggests the widespread problems identified recently in the medical literature would not cross over to physics because we have a more mathematical understanding of statistics. So, feeling bitchy, I decided to check the statistics of Sawala and Frenk’s recent papers and I found some pretty glaring examples of bad stats. In particular, let’s take a look at Sawala et al. 2013.
After a lot of highly complex & computationally intensive astrophysical simulations (which I don’t criticise) the authors aim to present the relationship between the median GIMIC and DMO simulated masses of galaxies as a function of DMO mass in their Fig 8. They say they’re plotting in each bin of DMO mass “the estimate of the median ratio and its statistical error”, so I decided to zoom in and see how they calculated that (since it wasn’t described in the text). After measuring the width of the overlaid error bars and then deleting these in Inkscape I discovered the following mistakes:
(1) in bins of an even number of objects they took as estimate of the median the value of the ceiling((n+1)/2) ranked object rather than the midpoint of the floor((n+1)/2) and ceiling((n+1)/2) objects (see wikipedia for a simple explanation); and
(2) for confidence interval on the median they took the median +/- sd(objects)/sqrt(n), which is what one might do for the confidence interval on the mean, but is not appropriate for the median. A better simple formula is that given by 1.5*IQR/sqrt(n) (as explained for NZ school children here); and a formal non-parametric CI on the median (and any other quantile) can be found using order statistics (as explained here).
Following on from these mistakes I would also guess that the authors used a naive methodology to fit their parametric forms for the median as a function of DMO mass (which is again not explained in the text). That is, I would expect that they performed a least squares fit to their (mis-)estimated sample medians in each bin. A better technique for this type of fitting is to use quantile regression. Quantile regression codes are available freely in R and a number of other languages and typically include sophisticated estimators for the CIs on each fit parameter. Interesting to note is that the authors do not quote any uncertainties in their Table A1, which might also be considered bad statistical practice.
Ahh, the hubris of astronomers …