More p-values instead of likelihoods …

I noticed this paper by Dressler et al. on astro-ph today concerning the potential of Lyman alpha emitters (LAEs) at z=5.7 to reionize the universe. The observational component of the study presents redshifts for a sample of high-redshift LAE candidates, 40 of which have fluxes in the range 2-20 x 10^-18 ergs/s/cm^2: 13 of these are confirmed as true positives, the remainder being identified as contaminating foreground sources. The subsequent statistical analysis constrains the slope of the LAE luminosity function (LF) using this binomial statistic: i.e., the expected LAE-to-total source fraction for various possible foreground LFs and LAE LFs is compared against that observed to weight the plausibility of the various LAE LF slopes.

The problem I have with the authors’ approach is that they make the classic astronomy mistake of using a p-value for parameter estimation instead of the likelihood. Details are given in Table 3 of the paper, in which the authors identify the expected number of LAEs in a sample of 40 LAE + foreground sources according to each set of LF parameters; they then compute the corresponding tail probability of 13 or more LAEs (although they report this in the table description [but not the text] as the probability of n=13 exactly); and finally they compute a sort of p-value which they use as a likelihood: for cases where the expectation is for fewer than 13 LAEs they take the above tail probability, and for cases where the expectation is for more than 13 LAEs they take its complement.

The use of p-values instead of likelihoods has been discussed frequently on this blog, though mostly for cases where authors use the K-S test statistic (unwittingly) as a summary statistic. Perhaps in this case the authors were thrown by the discrete sample space of the observations or the discrete set of trial LF parameters considered? A hint at this perhaps comes from the fact that they describe using Monte Carlo simulations to estimate the probability of 13 or more LAEs in a sample of 40 given X expected: a number which can easily be recovered analytically from the binomial survival function.
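To make the distinction concrete, here is a minimal sketch of the two quantities for 13 LAEs out of 40 sources. It assumes that an expected LAE count X maps onto a binomial success probability of X/40. The trial values of X are made up for illustration and are not taken from Table 3 of the paper; the function names are mine, and the handling of X exactly equal to 13 is an arbitrary choice, since that case is not covered by the description above.

```python
from math import comb

N_TOTAL = 40  # sources in the 2-20 x 10^-18 flux range (from the post)
N_LAE = 13    # confirmed LAEs among them (from the post)


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed analytically (no Monte Carlo)."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))


def tail_weight(expected, n=N_TOTAL, k=N_LAE):
    """The p-value-like weight described in the post.

    Tail probability P(X >= k) when fewer than k LAEs are expected,
    its complement otherwise. Treating expected == k via the complement
    branch is an assumption made here, not something the post specifies.
    """
    tail = binom_sf(k, n, expected / n)
    return tail if expected < k else 1.0 - tail


def likelihood(expected, n=N_TOTAL, k=N_LAE):
    """The binomial likelihood of observing exactly k LAEs."""
    return binom_pmf(k, n, expected / n)


# Hypothetical expected LAE counts, NOT the values from the paper.
for expected in (8.0, 11.0, 12.9, 13.1, 15.0, 18.0):
    print(
        f"expected={expected:5.1f}  "
        f"tail weight={tail_weight(expected):.4f}  "
        f"likelihood={likelihood(expected):.4f}"
    )
```

The likelihood is a smooth function of the expected count and peaks where the expected fraction matches 13/40, whereas the tail-based weight switches definition at 13. If SciPy is available, `scipy.stats.binom.sf(12, 40, p)` and `scipy.stats.binom.pmf(13, 40, p)` should reproduce the same tail probability and likelihood.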

In other news, I noticed another upcoming astrostatistics conference for mid-2015: the Local Group Astrostatistics Conference at U Michigan. I have the impression there’s a bit of a US-Europe divide forming in the world of astrostatistics, with the NYU/Vanderbilt/Caltech-centred collaboration on one side and the Imperial/CEA/Cambridge-centred collaboration on the other. Each organises its own conferences, runs its own workshops, and promotes its own list of astrostatistics experts, and there’s seemingly very little cross-over between the two. Possibly this can be explained from an economics perspective, say with a cellular automaton model for collaboration and competition in an environment of limited resources? All well and good if you want to see a bunch of people backslapping each other, but I suspect the science would benefit from a bit more rough and tumble.
