The bitch is back (to quote Elton John): on testing for bimodality …

I noticed this paper by Zafar et al. presenting “Evidence of bimodality in the [N/alpha] distribution” in damped Lyman alpha absorbers (accepted to MNRAS), which is to my mind quite indicative of the poor level of statistics in astronomy—both amongst practitioners and journal editors / referees.  The application of statistical procedures described in their manuscript is simply bizarre.  

The available dataset consists of 69 estimated [N/alpha] values with standard errors, 28 upper limits, and 11 lower limits.  Section 4.3.1 contain their description of bimodality testing.  First the authors use a chi-squared test and a Ryan-Joiner test to illustrate rejections of the null hypotheses that the data come from a uniform or a (unimodal) Normal distribution, respectively: from which they conclude that “this demonstrates departure from unimodal behaviour”.  [Of course not, it simply demonstrates rejection of these two strawman hypotheses; there are many unimodal distributions that would also be hopefully rejected by these tests, e.g. skew-Normal, Student’s t, etc.]  

As a next step, the authors divide their [N/alpha] sample into two groups: the subset with [N/alpha] < -1.2 and the remainder, and then apply (1) a Wilcoxon Rank-Sum test to reject the hypothesis that the two sub-samples are drawn from distributions with the same mean [but obviously they’re not: they’ve been thresholded above and below -1.2] and (2) the F-test to reject the hypothesis that the two sub-samples come from distributions with the same variance [which assumes normality; and again the issue of the thresholding].  From this they conclude: “these various statistical tests demonstrate with a high confidence level that the [N/alpha] distribution is bimodal.”  [At no point has bimodality been properly addressed.]  They also claim to have performed these tests with and without the censored (upper and lower limit) datapoints in their sample, but since no details are given and there are no widely-used procedures for performing these tests with censored data I remain skeptical about this too.

Although the methods presented in the above paper are clearly quite wacky it is worth acknowledging that testing for bimodality is not an easy task under either the Bayesian or Frequentist regimes.  In astronomy a popular approach is the likelihood ratio test for comparing a unimodal Gaussian (Normal) distribution against a pair of Gaussians as described by Ashman et al. (1994) and Gnedin (2010).  These papers are written with a focus on the ‘simple’ case of uncensored data observed without errors, and while it would be easy enough for someone thoroughly familiar with the relevant statistical methods to extend their approach to the case of the upper, lower, and standard error types of the Zafar et al dataset I suspect it would defeat most astronomers.  

My suggestion in this case would be to write the “bimodality test” as a Bayesian model selection problem for two reasons: (1) because writing the likelihood as a hierarchical model should be easier for the non-statistician, and (2) because the posterior sampling and marginal likelihood computations can be done in readily available software for hierarchical models, such as JAGS.  After extracting the data from Zafar et al’s published table I was able to quickly bang out the code below for computing the Bayes factor of a two component Normal mixture model versus a one component Normal.  The marginal likelihood is computed dubiously via the harmonic mean estimator (a better approach would be, e.g., thermodynamic integration; a slightly longer job to loop over runs at different temperatures in JAGS and do some book-keeping on the latent variables), but it’s interesting to note that the initial estimate I get here is in fact essentially neutral: no preference either way for the one or two-component model.  A sensible extension would be to look at mixtures of non-Gaussian densities (e.g. Student’s t, skew Normal) [all of which can be done with basic JAGS code too], and one should also look at prior sensitivity (of course!).


### R script to demostrate Normal mixture modelling via Gibbs sampling with the “rjags” and “bayesmix” packages in R


# note: before installing the “rjags” package for R one must already have installed the ordinary JAGS package:

# documentation for “bayesmix”:
# and introduction:

# note: the HME is not a good marginal likelihood estimator, so consider it’s use here only temporary / for illustrative purposes
# note: we are only able to ignore the censoring restrictions on the latent variables here because I have chosen identical hyperparameters

## Load Zafar et al.’s N/alpha dataset which I cut and pasted in parts from their manuscript
nah.normalerrs <- read.table(“nolimits.dat”)$V1 <- read.table(“nolimits.dat”)$V2

nah.lowerlimits <- read.table(“lowerlimits.dat”)$V1 # > limit
nah.upperlimits <- read.table(“upperlimits.dat”)$V1 # < limit

## Fit simple mixture models to N/alpha dataset: with Normal errors and censoring

# Set up “rjags” run for one component model
k1_errs_censored.bug <- “model {
for (i in 1:N1) {
y1[i] ~ dnorm(z1[i],obstau[i]);
z1[i] ~ dnorm(mu,tau);
for (i in 1:N2) {
y2[i] ~ dunif(-100,upperlim[i]);
z2[i] ~ dnorm(mu,tau);
for (i in 1:N3) {
y3[i] ~ dunif(lowerlim[i],100);
z3[i] ~ dnorm(mu,tau);
mu ~ dnorm(b0,B0inv);
tau ~ dgamma(nu0Half,nu0S0Half);
jags.k1 <- jags.model(textConnection(k1_errs_censored.bug),data = list(‘y1′ = nah.normalerrs,’obstau’=1/^2,’N1’=length(nah.normalerrs),’y2′ = (numeric(length(nah.upperlimits))-100),’upperlim’=nah.upperlimits,’N2′ = length(nah.upperlimits),’y3′ = (numeric(length(nah.lowerlimits))+100),’lowerlim’=nah.lowerlimits,’N3’ = length(nah.lowerlimits),b0=mean(nah.normalerrs),B0inv=1/var(nah.normalerrs)/2,nu0Half=1/var(nah.normalerrs)*0.25*4,nu0S0Half=0.25),inits=list(z2=rep(-10,length(nah.upperlimits)),z3=rep(50,length(nah.lowerlimits))),n.chains=1,n.adapt=0) # Specify hyperparameters for prior
posterior.k1 <- jags.samples(jags.k1,c(‘mu’, ‘tau’,’z1′),n.iter=1000000,thin=10)
posterior.z1 <- as.mcmc.list(posterior.k1$z1)[[1]]
posterior.k1 <- cbind(as.numeric(as.mcmc.list(posterior.k1$mu)[[1]]),as.numeric(as.mcmc.list(posterior.k1$tau)[[1]]))
posterior.k1[,2] <- 1/sqrt(posterior.k1[,2]) # Convert precision to standard deviation
lg.likelihoods.k1 <- numeric(100000)
for (i in 1:100000) {lg.likelihoods.k1[i] <- sum(dnorm(posterior.z1[i,],nah.normalerrs,,log=T))} # Compute likelihood for each draw
hme.k1 <- log(1/mean(1/exp(lg.likelihoods.k1))) # Compute marginal likelihood via the dubious HME method

# Set up “rjags” run for one component model
k2_errs_censored.bug <- “model {
for (i in 1:N1) {
y1[i] ~ dnorm(z1[i],obstau[i]);
z1[i] ~ dnorm(mu[S1[i]],tau[S1[i]]);
S1[i] ~ dcat(eta[]);
for (i in 1:N2) {
y2[i] ~ dunif(-100,upperlim[i]);
z2[i] ~ dnorm(mu[S2[i]],tau[S2[i]]);
S2[i] ~ dcat(eta[]);
for (i in 1:N3) {
y3[i] ~ dunif(lowerlim[i],100);
z3[i] ~ dnorm(mu[S3[i]],tau[S3[i]]);
S3[i] ~ dcat(eta[]);
for (j in 1:2) {
mu[j] ~ dnorm(b0,B0inv);
tau[j] ~ dgamma(nu0Half,nu0S0Half);
eta ~ ddirch(e);
jags.k2 <- jags.model(textConnection(k2_errs_censored.bug),data = list(‘y1′ = nah.normalerrs,’obstau’=1/^2,’N1’=length(nah.normalerrs),’y2′ = (numeric(length(nah.upperlimits))-100),’upperlim’=nah.upperlimits,’N2′ = length(nah.upperlimits),’y3′ = (numeric(length(nah.lowerlimits))+100),’lowerlim’=nah.lowerlimits,’N3’ = length(nah.lowerlimits),e=c(1,1),b0=mean(nah.normalerrs),B0inv=1/var(nah.normalerrs)/2,nu0Half=1/var(nah.normalerrs)*0.25*4,nu0S0Half=0.25),inits=list(z2=rep(-10,length(nah.upperlimits)),z3=rep(50,length(nah.lowerlimits))),n.chains=1,n.adapt=0) # Specify hyperparameters for prior
posterior.k2 <- jags.samples(jags.k2,c(‘mu’, ‘tau’,’z1′,’z2′,’eta’),n.iter=1000000,thin=10)
posterior.z1 <- as.mcmc.list(posterior.k2$z1)[[1]]
posterior.e <- as.mcmc.list(posterior.k2$eta)[[1]]
posterior.k2 <- cbind(as.numeric(as.mcmc.list(posterior.k2$mu)[[1]][,1]),as.numeric(as.mcmc.list(posterior.k2$tau)[[1]][,1]),as.numeric(as.mcmc.list(posterior.k2$mu)[[1]][,2]),as.numeric(as.mcmc.list(posterior.k2$tau)[[1]][,2]))
posterior.k2[,2] <- 1/sqrt(posterior.k2[,2]) # Convert precision to standard deviation
posterior.k2[,4] <- 1/sqrt(posterior.k2[,4]) # Convert precision to standard deviation
lg.likelihoods.k2 <- numeric(100000)
for (i in 1:100000) {lg.likelihoods.k2[i] <- sum(dnorm(posterior.z1[i,],nah.normalerrs,,log=T))} # Compute likelihood for each draw
hme.k2 <- log(1/mean(1/exp(lg.likelihoods.k2))) # Compute marginal likelihood via the dubious HME method

### Plot Results


x <- seq(-2.1,0.1,length.out=100)
y <- dnorm(x,posterior.k1[which.max(lg.likelihoods.k1),1],posterior.k1[which.max(lg.likelihoods.k1),2])*4

y1 <- dnorm(x,posterior.k2[which.max(lg.likelihoods.k2),1],posterior.k2[which.max(lg.likelihoods.k2),2])*4*posterior.e[which.max(lg.likelihoods.k2),1]
y2 <- dnorm(x,posterior.k2[which.max(lg.likelihoods.k2),3],posterior.k2[which.max(lg.likelihoods.k2),4])*4*posterior.e[which.max(lg.likelihoods.k2),2]

y1 <- dnorm(x,-1.41,0.14)*4*0.25
y2 <- dnorm(x,-0.85,0.20)*4*0.75

mtext(“Number per bin”,side=2,line=1.7)

legend(“topleft”,c(“Observed w/ Std. Err.”,”Upper limit available”,”Lower limit available”),bty=’n’,col=c(“grey70″,”red”,”blue”),lty=1)
legend(“topright”,c(“One Comp. Max. L. Fit”,”Two Comp. Max L. Fit”,”Zafar et al. 2014″),lty=c(1,2,2),lwd=1.5,col=c(“magenta”,”magenta”,”green”),bty=’n’)
legend(“left”,c(expression(plain(“Bayes Factor of”)),expression(H[plain(“Two Comp.”)]/H[plain(“One Comp.”)]),paste(” ~ “,sprintf(“%2.2f”,exp(hme.k2-hme.k1)),sep=””)),bty=’n’)

This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s