Apologies for the long delay between posts here; I’ve been hunkered down in my office getting funky with DPs (Dirichlet Processes) and writing talks for my upcoming Giro d’Italia / UK tour (Sesto -> Venice -> Oxford -> Bologna). Anyway, in this post I thought I might focus on a preprint by John Magorrian (http://arxiv.org/abs/1303.6099), entitled “Bayes vs the Virial Theorem” (BvVT), since it touches closely on some work I’ve been doing on semi-parametric analysis of the fine structure constant quasar dataset (the Webb et al. 2011 one, that is).

In BvVT Magorrian sets himself the ambitious task of devising a fully Bayesian scheme for inferring a galactic potential function from observations of (the positions and velocities of) individual stars drawn from its distribution function. The motivation is that a fully Bayesian approach should marginalise over a prior on all possible distribution functions consistent with the observed data, rather than simply picking one (e.g. via maximum likelihood) and sticking with it (as is done at present). To this end the author adopts a Dirichlet Process prior on the unknown distribution function; the motivation (with some interpretation on my part) being that the support of DP(alpha,H) contains a weakly convergent approximation to every continuous distribution function having matching support and absolute continuity with respect to its baseline measure, H. That is, the DP prior gives (in some sense) the most “flexible” prior over distribution space that remains amenable to ordinary Bayesian inference.
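For readers who haven’t played with DPs before: a draw from DP(alpha,H) is almost surely a discrete (atomic) distribution, which can be simulated via the stick-breaking construction. A minimal sketch in Python (the truncation level and the Uniform base measure here are purely my own illustrative choices, not anything from BvVT):

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_draw(alpha, base_sampler, n_atoms=500):
    # Truncated stick-breaking: v_i ~ Beta(1, alpha), w_i = v_i * prod_{j<i}(1 - v_j)
    v = rng.beta(1.0, alpha, size=n_atoms)
    w = v * np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return w, base_sampler(n_atoms)  # atom weights, and locations drawn iid from H

# e.g. base measure H = Uniform(0, 2*pi) on a single phase-space coordinate
weights, atoms = dp_draw(alpha=5.0, base_sampler=lambda n: rng.uniform(0, 2 * np.pi, n))
print(weights.sum())  # ~1: the truncated tail carries negligible mass
```

Larger alpha spreads the mass over more atoms (draws “closer” to H); smaller alpha concentrates it on a few spikes.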

A computational difficulty then arises since for each galactic potential function to be tested we must compute the marginal likelihood with respect to this DP prior. To assist with the latter (or perhaps more so to give a flavour for the model form under observational errors) Magorrian adds a characteristic “blob” smoothing kernel to the spikes of each atomic draw from the DP; in this case a multivariate Normal with “circular” boundaries at [0,2pi] in each coordinate. [Interestingly (I think) this domain is topologically equivalent to the N-dimensional torus; which may or may not allow someone more familiar with probability distributions on such structures to add some insight here!] [Also interestingly (I think) the smoothed model so described for a given galactic potential is very much like the DP mixed effects model sometimes used in meta-analyses.]
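To illustrate the kind of smoothed model this gives (in one dimension only, and with kernel width, atom count, and weights all my own toy assumptions rather than Magorrian’s actual settings): the blob-smoothed density is a weighted mixture of Normal kernels wrapped at the [0,2pi] boundaries, so mass leaving one edge re-enters at the other:

```python
import numpy as np
from scipy.stats import norm

def wrapped_blob_density(x, centers, weights, sigma, n_wrap=3):
    # Mixture of Normal "blobs" wrapped at the [0, 2*pi] boundary: each blob is
    # replicated at shifts of 2*pi*k so the density is periodic on the circle.
    dens = np.zeros_like(x, dtype=float)
    for k in range(-n_wrap, n_wrap + 1):
        dens += (weights * norm.pdf(x[:, None], centers + 2 * np.pi * k, sigma)).sum(axis=1)
    return dens

rng = np.random.default_rng(0)
centers = rng.uniform(0, 2 * np.pi, 20)   # stand-ins for the atoms of a DP draw
weights = rng.dirichlet(np.ones(20))      # stand-ins for the stick-breaking weights
x = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
dens = wrapped_blob_density(x, centers, weights, sigma=0.3)
print(dens.mean() * 2 * np.pi)  # ~1: the wrapped mixture integrates to unity
```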

Although sampling from the posterior of DP models is now quite a routine problem in Bayesian analysis (particularly when the priors adopted are conjugate, e.g. MacEachern & West 1998; and even when they’re not, e.g. Neal 2000), the problem of marginal likelihood computation for such models has been little explored. Basu & Chib (2003) give one possible solution (again for a conjugate model) based on Chib’s estimator (though I wonder how effective this really is, owing to the known limitations of Chib’s estimator when mixing is slow, as it can be for DP problems; see Radford Neal’s online note on Chib’s estimator for the “galaxy dataset”). Doss (2009) exhibits a reverse logistic regression scheme for computing relative Bayes factors of DP models at varying alpha (concentration parameter) relative to some nominal alpha_0, exploiting his derivation of the Radon-Nikodym derivative of the DP marginals. However, owing to his non-conjugate priors Magorrian has in this instance opted for his own, seemingly computationally-intensive scheme using the variational method to identify a lower bound.

One solution I’ve been having reasonable success with this week is to use reverse logistic regression (RLR) over a series of likelihood draws from tempered versions of the posterior, with temperatures (the t’s in L^t) ranging between zero and one (prior to full posterior), and beyond (to, say, t=5; the powered-up posterior). The nice thing about this procedure is that once you’ve finished RLR you have not only the normalisation of your posterior but also that of each of the other tempered distributions explored; the latter allows for efficient prior-sensitivity analysis (a la Cameron & Pettitt 2013) via pseudo-importance sampling reweighting under alternative priors. (The only novelty being that one needs Hani Doss’s Radon-Nikodym formula when making changes to alpha and/or H.) Compared to the usual parametric Bayes implementation of RLR I have also found it necessary to use a much longer, more gradual heating sequence (e.g. t=(0,1/n,2/n,…,n/n,(n+1)/n)^10 with n ~ 50, plus some higher t’s, say, 2, 3, 4, 5, 10) so as to ensure that the empirical log-likelihood distributions drawn from consecutive tempered posteriors overlap to at least some extent [this is like a stronger version of the non-separability required for the RLR method to work in the first place; cf. Vardi 1985, Geyer 1994].
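To make the RLR step concrete, here’s a self-contained toy sketch (a conjugate Normal-Normal model of my own choosing, so the true normalisers are known in closed form; a short five-rung ladder rather than my actual ~50-rung one): draws are taken from each tempered posterior, and Geyer’s reverse logistic regression objective is maximised to recover the log normalising constant at every temperature simultaneously:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(1)

# Toy conjugate setup: prior theta ~ N(0,1), likelihood L(theta) = N(y=2 | theta, 1).
# The tempered posterior at temperature t is N(y*t/(1+t), 1/(1+t)), and the true
# log-normaliser is log Z_t = -(t/2) log(2 pi) - (1/2) log(1+t) - 2t/(1+t).
y, n_per = 2.0, 2000
temps = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

def true_logz(t):
    return -0.5 * t * np.log(2 * np.pi) - 0.5 * np.log1p(t) - 2.0 * t / (1.0 + t)

# Draw from each tempered posterior; RLR needs only the log-likelihood values.
loglik, labels = [], []
for j, t in enumerate(temps):
    theta = rng.normal(y * t / (1 + t), 1.0 / np.sqrt(1 + t), size=n_per)
    loglik.append(-0.5 * np.log(2 * np.pi) - 0.5 * (theta - y) ** 2)
    labels.append(np.full(n_per, j))
loglik, labels = np.concatenate(loglik), np.concatenate(labels)

def neg_rlr(eta_free):
    # eta_k estimates log Z_{t_k}; eta_0 is pinned at 0 since Z_0 = 1 (the prior).
    # Equal sample sizes per rung, so the log n_j terms cancel and are omitted.
    eta = np.concatenate([[0.0], eta_free])
    logw = temps[None, :] * loglik[:, None] - eta[None, :]  # (N draws, m rungs)
    return -(logw[np.arange(len(loglik)), labels] - logsumexp(logw, axis=1)).sum()

eta_hat = np.concatenate([[0.0], minimize(neg_rlr, np.zeros(len(temps) - 1)).x])
print(eta_hat[-1], true_logz(1.0))  # RLR estimate vs exact log-evidence at t=1
```

The same pooled machinery returns eta_hat at every rung at once, which is what enables the prior-sensitivity reweighting mentioned above; if the log-likelihood histograms of adjacent rungs fail to overlap, the optimisation becomes ill-conditioned, hence the gradual heating sequence.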

One nice point about this method: for a Normal smoothing kernel, N(e_i,sigma^2), [e_i the blob centre drawn from a realisation of the DP, and sigma its standard deviation] the tempered posterior at temperature t is distributed according to the usual prior x likelihood function with the N(e_i,sigma^2) in the likelihood replaced by N(e_i,sigma^2/t). So we can use the same machinery to explore each of these tempered configurations. (Neat, huh?)
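A quick numerical sanity check of this identity (with arbitrary toy values for e, sigma, and t): raising the Normal kernel to the power t differs from the variance-shrunk Normal only by a constant in x, which gets absorbed into the normalisation:

```python
import numpy as np
from scipy.stats import norm

e, sigma, t = 0.7, 0.5, 3.0          # arbitrary toy values for the identity check
x = np.linspace(-2.0, 3.0, 11)

# N(x; e, sigma^2)^t has the shape of N(x; e, sigma^2/t), up to a constant:
log_powered = t * norm.logpdf(x, loc=e, scale=sigma)
log_heated = norm.logpdf(x, loc=e, scale=sigma / np.sqrt(t))
offset = log_powered - log_heated
print(offset.min(), offset.max())  # equal: an x-independent constant offset
```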

Off to Italy tomorrow … booyah!
