Everything against everything …

One of my past bosses in galaxy evolution research used to love telling us post-docs and PhD students, as we tried to make sense of our precious ‘reduced’ datasets (in this case, enormous galaxy catalogues compiling 100-200 different variables for each of 100,000 objects), to go away and plot “everything against everything”. The idea was to search for some hitherto unknown trend of one variable against another (usually across some huge collection of subsets of the galaxy population) and, hopefully, write a paper about it.  Reading up today on the guidelines (e.g. QUOROM, PRISMA) for meta-analyses of clinical research, I couldn’t help but be struck by the contrast.  In particular, those guidelines couldn’t be clearer about the importance of pre-specifying the hypotheses under investigation in order to control the Type I error rate.

I wonder whether any astronomical study has ever attempted to correct its significance claims for this common design flaw of our usual exploratory analysis phase? Seems unlikely.
[For a relevant review of significance level computation in the multiple testing setting see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1713204/ ].
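To put a rough number on the problem, here is a minimal Python sketch of how many pairwise plots “everything against everything” implies for a catalogue of 100-200 variables, and how many spurious “trends” one would expect if every null hypothesis were true. The function names, the significance level of 0.05 and the assumption of independent tests are mine, for illustration only; real catalogue variables are correlated and the subsets multiply the count further, so treat this as an order-of-magnitude guide.

```python
from math import comb


def n_pairwise_tests(n_variables: int) -> int:
    """Number of distinct variable pairs, i.e. one test per scatter plot."""
    return comb(n_variables, 2)


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among n_tests tests.

    Assumes independent tests with every null hypothesis true
    (an illustrative simplification, not a property of real catalogues).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    alpha = 0.05  # assumed per-test significance level
    for n_vars in (100, 200):
        m = n_pairwise_tests(n_vars)
        print(
            f"{n_vars} variables: {m} pairs, "
            f"about {alpha * m:.0f} false positives expected, "
            f"family-wise error rate {familywise_error_rate(m, alpha):.6f}"
        )
```

With 100 variables there are already 4950 pairs, and with 200 there are 19900, before any splitting into subsets of the galaxy population.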

This entry was posted in Astronomy, Astrostatistics, Statistics.

2 Responses to Everything against everything …

  1. lukebarnes says:

    Must you pre-specify the hypotheses? Couldn’t the null hypothesis rejection criteria be a function of the number of hypotheses tested with the data?

    • Totes! Well, at least in the astronomy context you should do that: adjust your p-value threshold for a given significance level (or vice versa). But I can’t recall ever having seen someone actually do that …
      In the meta-analysis case it seems to be bad form not to clearly pre-specify your hypotheses (since there have been some bad mistakes in the past). Interestingly, there’s actually a journal devoted to publications outlining hypotheses and study designs for meta-analyses, including details such as which search terms will be used to identify candidate datasets and the inclusion and/or rejection criteria for the same.
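For what it's worth, the adjustment discussed in this thread can be sketched in a few lines of Python. I've used the Bonferroni and Šidák corrections as two standard choices; neither is named in the exchange above, and the function names and example p-values are hypothetical. Note that `n_tests` should count every comparison actually looked at, not just the ones that end up in the paper.

```python
import math
from typing import Optional, Sequence


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold keeping the family-wise error rate <= alpha."""
    return alpha / n_tests


def sidak_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold giving family-wise rate alpha for independent tests.

    Equivalent to 1 - (1 - alpha) ** (1 / n_tests), written with
    log1p/expm1 to stay accurate when n_tests is large.
    """
    return -math.expm1(math.log1p(-alpha) / n_tests)


def reject_nulls(
    p_values: Sequence[float],
    alpha: float = 0.05,
    n_tests: Optional[int] = None,
) -> list:
    """Flag which p-values survive a Bonferroni correction.

    n_tests defaults to len(p_values); pass the full number of tests
    performed if only a subset of p-values is supplied.
    """
    m = len(p_values) if n_tests is None else n_tests
    threshold = bonferroni_threshold(alpha, m)
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # Hypothetical p-values from three of the 4950 pairwise plots
    # that 100 catalogue variables would generate.
    example_p = [1e-7, 0.003, 0.04]
    print(bonferroni_threshold(0.05, 4950))
    print(sidak_threshold(0.05, 4950))
    print(reject_nulls(example_p, alpha=0.05, n_tests=4950))
```

Bonferroni is conservative but makes no independence assumption, which is probably the safer default for strongly correlated galaxy properties; Šidák is marginally less strict but exact only for independent tests.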
