After my previous post on forced identifiability for exoplanet modelling, it was brought to my attention that this idea had previously been explored, and rejected, in earlier papers by Phil Gregory. So I decided to take a look at these papers (Gregory 2007; Gregory 2010), and I think the idea may have been abandoned too soon.

First, it is true that Phil tried out a version of forced identifiability in the period distribution of his 2007 exoplanet model and came to the conclusion (my wording) that *for the purposes of posterior exploration (only) it is 'more efficient' to relax the ordering constraint during MCMC and simply relabel the planets with the desired ordering afterwards.* This seems quite reasonable to me, and I would note that a number of the Gibbs samplers for univariate mixture models available in R offer exactly this mode of operation. However, Phil's comment was with regard to posterior exploration rather than marginal likelihood estimation; and marginal likelihood estimation is precisely where his papers get ugly.
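In code terms the relabel-afterwards scheme amounts to nothing more than a row-wise sort of the unconstrained posterior draws. A minimal sketch, with mock gamma draws standing in for real MCMC output (all names here are illustrative):

```r
# Mock 'unconstrained' posterior draws for three period parameters
# (stand-ins for real MCMC output, so the column labels are arbitrary)
set.seed(1)
unconstrained.draws <- matrix(rgamma(30000, shape=1, rate=0.01), ncol=3)

# Relabel each draw so that planet 1 has the shortest period, etc.
relabelled.draws <- t(apply(unconstrained.draws, 1, sort))
```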

[Worth noting: in 2007 Phil could not have been aware of reverse logistic regression, which in fact provides a means of marginal likelihood estimation from tempered posterior simulations requiring only a handful of temperatures, and would have suited his purposes nicely!]
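For readers unfamiliar with the technique: reverse logistic regression (Geyer 1994) treats pooled draws from a set of unnormalized tempered densities as a classification problem, and the fitted offsets recover the differences in log normalizing constants, i.e. the marginal likelihood ladder. A toy sketch with two Gaussian 'temperatures' whose normalizers are known analytically (this is my own illustration, not anything from Phil's papers):

```r
set.seed(1)
n <- 5000
# Draws from two 'tempered' densities with known normalizers:
# q1(x) = exp(-x^2/2), Z1 = sqrt(2*pi); q2(x) = exp(-x^2/8), Z2 = sqrt(8*pi)
x <- c(rnorm(n, 0, 1), rnorm(n, 0, 2))
labels <- rep(1:2, each = n)
lq <- cbind(-x^2/2, -x^2/8)  # unnormalized log densities at all pooled draws

# Reverse logistic regression: fix eta1 = 0 and fit eta2 by maximizing the
# quasi-likelihood of each draw's chain label under the implied mixture
neg.quasi.loglik <- function(eta2) {
  a <- sweep(lq, 2, c(0, eta2), "+")  # equal sample sizes, so n_k terms cancel
  amax <- pmax(a[,1], a[,2])
  log.denom <- amax + log(exp(a[,1] - amax) + exp(a[,2] - amax))
  -sum(a[cbind(seq_along(labels), labels)] - log.denom)
}
eta2.hat <- optimize(neg.quasi.loglik, c(-5, 5))$minimum
log.Z.ratio <- -eta2.hat  # estimates log(Z2/Z1); truth is log(2) ~ 0.693
```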

I also *suspect* (since few details are given) that for his forced-identifiability experiments Phil may have run his MCMC chain directly in period space under the hard constraint, p1 < p2 < … < pn, which carves out an awkward boundary for the sampler to explore. A better way to handle this type of problem is to make a transformation of variables to the unit (hyper-)cube based on the sequence of conditional distribution functions for the ordered variates (conditioning here from the largest down: F(f3), then F(f2|f3), then F(f1|f2)), representing the quantiles of an ordered draw from the original prior on the period parameter. The R script pasted below demonstrates this in code for a mock three-planet problem. Even for the case of a likelihood function favouring two planets with similar periods (which has been suggested as a difficult one for forced identifiability), the posterior in our transformed space is just as easy to explore with MCMC as its parent density in period space (though MultiNest should be even better here!). The burn-in and posterior sample in our transformed space (u1,u2,u3), and their mapping back to period space (f1,f2,f3), look like this:

### R script for sampling from the order statistics of a three planet period distribution with gamma parent density

# Shape & rate of our gamma prior: the hyperparameters chosen here give a [1-99]% quantile range of 1 to 460 days
g.shape <- 1
g.rate <- 0.01

# Transformation from our unit cube sample space to R+^3 period space: derived from
# distribution functions of the unit interval order statistics conditioned iteratively
# from the top down: F(u3), F(u2|u3), F(u1|u2)
trans.coords <- function(u) {
  u1 <- u[1]
  u2 <- u[2]
  u3 <- u[3]
  f3 <- u3^(1/3)        # quantile of the largest of three uniforms
  f2 <- sqrt(u2*f3^2)   # middle order statistic, conditional on f3
  f1 <- u1*f2           # smallest order statistic, conditional on f2
  f <- qgamma(c(f1,f2,f3),g.shape,g.rate)
  return(f)}

# Sample from unit cube and transform to ordered periods
f.draws <- matrix(0,nrow=10000,ncol=3)
for (i in 1:10000) {f.draws[i,] <- trans.coords(runif(3))}

# Sanity check: confirm sample quantiles and correlation matrix agree with draws under manual re-ordering
f.alt.draws <- matrix(0,nrow=10000,ncol=3)
for (i in 1:10000) {f.alt.draws[i,] <- sort(rgamma(3,g.shape,g.rate))}
quantile(f.draws[,1],c(0.25,0.75))
quantile(f.alt.draws[,1],c(0.25,0.75))
quantile(f.draws[,2],c(0.25,0.75))
quantile(f.alt.draws[,2],c(0.25,0.75))
quantile(f.draws[,3],c(0.25,0.75))
quantile(f.alt.draws[,3],c(0.25,0.75))
cor(f.draws)
cor(f.alt.draws)

# Construct a mock (log) likelihood function in which two planets have close
# periods (10 & 11 days) and the third is wildly different (300 days)
library(tmvtnorm)
library(rgl) # for the plot3d calls below
m.means <- c(10,11,300)
m.cov <- diag(c(2,2,50))
mock.log.likelihood.fn <- function(f) {
  f <- sort(f) # non-identifiable version (if that's how we were doing things!)
  # Warning: in my experience dtmvnorm can return inexact log likelihoods in certain regimes
  f.log.like <- dtmvnorm(f,m.means,m.cov,lower=c(0,0,0),log=T)
  return(f.log.like)}

# Sample via MCMC within our unit cube
current.position <- c(0.5,0.5,0.5)
current.log.likelihood <- mock.log.likelihood.fn(trans.coords(current.position))
step <- c(0.01,0.01,0.01)

# Burn-in
track <- list()
for (i in 1:2000) {
  proposal <- current.position + rnorm(3,0,step)
  if (all(proposal > 0) && all(proposal < 1)) {
    proposal.log.likelihood <- mock.log.likelihood.fn(trans.coords(proposal))
    if (proposal.log.likelihood - current.log.likelihood > log(runif(1))) {
      current.position <- proposal
      current.log.likelihood <- proposal.log.likelihood}}
  track[[i]] <- current.position}
track <- do.call(rbind,track)
plot3d(track,type='l',xlim=c(0,1),ylim=c(0,1),zlim=c(0,1),xlab="u1",ylab="u2",zlab="u3")

# Output
step <- sqrt(diag(cov(track[1000:2000,])))
track <- list()
for (i in 1:10000) {
  proposal <- current.position + rnorm(3,0,step)
  if (all(proposal > 0) && all(proposal < 1)) {
    proposal.log.likelihood <- mock.log.likelihood.fn(trans.coords(proposal))
    if (proposal.log.likelihood - current.log.likelihood > log(runif(1))) {
      current.position <- proposal
      current.log.likelihood <- proposal.log.likelihood}}
  track[[i]] <- current.position}
track <- do.call(rbind,track)
plot3d(track,type='p',xlab="u1",ylab="u2",zlab="u3")
length(unique(track[,1]))/10000 # rough acceptance fraction
f.track <- track
for (i in 1:10000) {f.track[i,] <- trans.coords(track[i,])}
plot3d(f.track,type='p',xlab="f1",ylab="f2",zlab="f3")

# Sample via MCMC within our original period space for comparison
current.position <- c(50,50,50)
current.log.likelihood <- mock.log.likelihood.fn(current.position)
current.log.prior.density <- sum(dgamma(current.position,g.shape,g.rate,log=T))
step <- c(5,5,5)

# Burn-in
track <- list()
for (i in 1:2000) {
  proposal <- current.position + rnorm(3,0,step)
  proposal.log.likelihood <- mock.log.likelihood.fn(proposal)
  proposal.log.prior.density <- sum(dgamma(proposal,g.shape,g.rate,log=T))
  if (proposal.log.likelihood - current.log.likelihood +
      proposal.log.prior.density - current.log.prior.density > log(runif(1))) {
    current.position <- proposal
    current.log.likelihood <- proposal.log.likelihood
    current.log.prior.density <- proposal.log.prior.density}
  track[[i]] <- current.position}
track <- do.call(rbind,track)
plot3d(track,type='l',xlab="p1",ylab="p2",zlab="p3")

# Output
step <- sqrt(diag(cov(track[1000:2000,])))
track <- list()
for (i in 1:10000) {
  proposal <- current.position + rnorm(3,0,step)
  proposal.log.likelihood <- mock.log.likelihood.fn(proposal)
  proposal.log.prior.density <- sum(dgamma(proposal,g.shape,g.rate,log=T))
  if (proposal.log.likelihood - current.log.likelihood +
      proposal.log.prior.density - current.log.prior.density > log(runif(1))) {
    current.position <- proposal
    current.log.likelihood <- proposal.log.likelihood
    current.log.prior.density <- proposal.log.prior.density}
  track[[i]] <- current.position}
track <- do.call(rbind,track)
plot3d(track,type='p',xlab="p1",ylab="p2",zlab="p3")
length(unique(track[,1]))/10000 # rough acceptance fraction
f.alt.track <- track
for (i in 1:10000) {f.alt.track[i,] <- sort(track[i,])}
plot3d(f.alt.track,type='p',xlab="f1",ylab="f2",zlab="f3")
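As an aside, the top-down construction inside trans.coords generalizes to any number of planets: the largest of n uniform order statistics has quantile u^(1/n), and each subsequent variate is the maximum of one fewer uniform on a shrinking interval. A sketch of a generic version (my own generalization, not code from the script above):

```r
# Map a point u in the unit n-cube to n ordered quantile levels in (0,1),
# conditioning from the largest order statistic down; feed the result
# through qgamma (or any prior quantile function) for ordered period draws
ordered.quantiles <- function(u) {
  n <- length(u)
  f <- numeric(n)
  upper <- 1
  for (k in n:1) {
    upper <- upper * u[k]^(1/k)  # largest of k uniforms on (0, upper)
    f[k] <- upper
  }
  f
}
```

For n = 3 this reproduces the f1, f2, f3 formulae used in trans.coords.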