Back from my travels abroad with the standard amount of jetlag and a non-standard amount of enthusiasm for getting my house in order & resubmitting my backlog of refereed and rejected papers! But on to identifiability for now …

Playing catch-up with astro-ph at 5am this morning I noticed a new submission ( http://arxiv.org/pdf/1307.6984.pdf ) from some of my recent collaborators, Farhan Feroz & Mike Hobson, describing their Bayesian re-analysis of a ‘controversial’ exoplanet dataset: the controversy being that some previous studies have claimed the detection of up to 6 or 7 planets in the noisy radial velocity signal, though Farhan & Mike’s study argues for only 2-3. What caught my eye was the unusual method the authors use here (and in previous studies; e.g. http://arxiv.org/pdf/1012.5129v2.pdf ) to perform their model selection, which they dub ‘the residual data model’ … or, more precisely, what caught my eye was their stated motivation for this unorthodox approach: *the n! growth in the number of posterior modes in their exoplanet model posterior owing to ‘the counting degeneracy’*.

In statistical terms the exoplanet model has an **identifiability** problem equivalent to that of a basic mixture model: the likelihood of the n planets having the set of parameter vectors {θ_1, θ_2, … , θ_n } is identical to that for {θ_2, θ_1, … , θ_n }, and likewise for the remaining n!−2 permutations of parameter vectors over planet index. At face value this presents some difficulties for marginal likelihood estimation, since if (in the best-case scenario) each permutation contributes a single mode to the posterior, the total number of modes we need to integrate over grows as n!.
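A toy sketch of this counting degeneracy (invented data and a circular-orbit radial-velocity model, purely illustrative): permuting the planet labels leaves the likelihood value untouched, so every labelling of the same parameter set seeds its own posterior mode.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)
t = np.linspace(0.0, 100.0, 40)        # observation epochs (days), invented
v_obs = rng.normal(0.0, 3.0, t.size)   # fake radial-velocity data (m/s)

def log_like(planets, sigma=3.0):
    """Gaussian log-likelihood of a toy circular-orbit RV model;
    each planet is a parameter vector (amplitude K, period P, phase phi)."""
    v_model = sum(K * np.sin(2.0 * np.pi * t / P + phi) for K, P, phi in planets)
    r = v_obs - v_model
    return -0.5 * np.sum(r**2) / sigma**2

theta = [(5.0, 12.3, 0.4), (2.0, 33.1, 1.7), (1.0, 71.0, 2.9)]
vals = [log_like(p) for p in permutations(theta)]  # all 3! = 6 label orderings
# every permutation yields the same log-likelihood value
```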

[Interestingly, I was discussing another example of this type of problem with Jonathan Whitmore and Tyler Evans at the Varying Constants meeting in Sesto, with regard to the Voigt profile mixture modelling of quasar absorption spectra … a domain where evaluating the likelihood function is just slow enough that its computational cost cannot be disregarded, owing to the mathematical awkwardness (or beauty, if you prefer) of the Voigt function.]

In the mixture modelling domain the typical way to handle this problem is by way of **forced identifiability**: we simply impose the constraint that, e.g., the (aphelion) orbital radius of the planet labelled 1 is always less than that of the planet labelled 2, and so on. This can be implemented in the actual marginal likelihood computation by, e.g., specifying that the prior on these radii corresponds to the order-statistic distribution of our original prior. Under this restriction the number of posterior modes is reduced, of course, by a factor of n!. (One bookkeeping note: if we simply truncate the original prior to the ordered region its mass also shrinks by n!, so the resulting estimate must be multiplied back up by n!; with the properly normalised order-statistic prior, which carries that n! in its density, the estimate is the full model's marginal likelihood as-is.)
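One standard way to realise such an ordered prior in a unit-hypercube sampler is the sorted-uniform order-statistics transform; this is a generic sketch, not necessarily how MultiNest or Feroz & Hobson implement it.

```python
import numpy as np

def forced_identifiability_transform(u):
    """Map iid Uniform(0,1) draws u (e.g. a unit-hypercube point in
    nested sampling) to the order statistics of a Uniform(0,1) sample
    of the same size, sorted ascending. Restricting the prior this way
    collapses the n! permutation modes down to one."""
    n = len(u)
    x = np.empty(n)
    x[-1] = u[-1] ** (1.0 / n)              # the maximum
    for k in range(n - 2, -1, -1):          # work downwards through the ranks
        x[k] = x[k + 1] * u[k] ** (1.0 / (k + 1))
    return x

u = np.random.default_rng(42).random(4)
x = forced_identifiability_transform(u)     # x[0] <= x[1] <= x[2] <= x[3]
```

The ordered radii themselves would then come from pushing each x[k] through the inverse CDF of the original single-planet radius prior.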

As far as I can tell, none of the other exoplanet Bayesians have caught on to this little trick yet either … so if anyone finds this useful, you’re welcome and you owe me a beer!

I am glad that they were able to get MultiNest to work on this problem by enforcing a restriction that gets rid of the N! modes. I imagine it might introduce other awkward structures in the posterior, but good samplers are okay with that.

I like to use a sampler that has no problem with N! modes in the first place 😉

This n! growth of modes should be a function of the model specification rather than the sampler, I reckon.

Awesome that they used a correlated noise model though. A lot of “detections” of things in astronomy are probably due to naive use of the “iid Gaussian noise” assumption. When the number of data points is large, that kind of model can make it look like you have a lot of information.
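A quick numerical illustration of this point (invented squared-exponential covariance and parameter values): the Fisher information for a constant offset in the data is N/σ² under iid noise but 1ᵀC⁻¹1 under correlated noise, and the latter can be far smaller.

```python
import numpy as np

N, sigma, ell = 500, 1.0, 20.0   # invented: 500 points, unit noise std, correlation length 20
t = np.arange(N, dtype=float)

# squared-exponential covariance for temporally correlated noise
C = sigma**2 * np.exp(-0.5 * ((t[:, None] - t[None, :]) / ell) ** 2)
C += 1e-6 * np.eye(N)            # jitter for numerical stability

ones = np.ones(N)
info_iid = N / sigma**2                      # Fisher info for an offset, iid model
info_corr = ones @ np.linalg.solve(C, ones)  # same quantity, correlated model

# info_iid vastly exceeds info_corr: the iid assumption overstates the
# information available for a "detection" when the noise is really correlated
```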

Final comment: I’m not a fan of the approach where if N is unknown you run an N=1 model and an N=2 model and so on, and get the marginal likelihood of each. I reckon it’s usually much more efficient to make a big model where N is unknown and do it in one run. But I can’t prove that (except to say that when I look for ~1000 stars in a noisy image there’s no way I’m doing hundreds of separate runs!)

The correlated noise model is nice!

Mike & I have already played around with the forced identifiability approach exactly as you described. Brendon is right: we found that it introduces very awkward features in the posterior, making exploration of the parameter space very difficult; e.g., with this approach, finding two objects with similar values of the parameter you are ordering on is very tricky. Another exo-planet study (by Phil Gregory; need to find the exact reference) came to the same conclusion.

Brendon, the ideal solution for dealing with an unknown number of objects in a data-set is to do what you just described, make a larger model, & indeed we do accept this in our paper (arXiv:1012.5129), in which we discuss different object detection approaches. However, in our experience this approach is not very practical with complicated models, which is why we resort to this poor man’s approximation to the gold standard. Obviously when the number of objects is too large this approach is not practical either, but there are other ways to deal with it provided certain assumptions are satisfied by the problem (e.g. look at the single source model in arXiv:1012.5129).

Interesting. I don’t see why this should introduce awkward features in the posterior: it should just be a regular sub-region of the original posterior? Definitely point me in the direction of the Gregory paper …

Finally found the Gregory paper. It’s arXiv:1003.5549. See the 2nd paragraph of the 2nd column on page 6.

Thanks for commenting Farhan. I really need to spend a day going back and rereading all of your papers properly. I think I have a rough idea of what you’ve been doing but we do work on quite similar things so I should know more details!