Having had my interest in pseudo-marginal methods rekindled by Patrick Muchmore's talk at the recent ABC in Banff conference, I've started evangelising to cosmologists who might be interested in fits based on multivariate Normal (Gaussian) distributions with estimated covariance matrices: for example, estimating cosmological parameters from the (empirical) two-point correlation function of a galaxy survey. Here, although the power spectrum of the cosmological field is determined by the model parameters, the covariance matrix of the observable two-point correlation function (in a given set of radial bins) cannot be computed directly owing to (e.g.) the irregular survey window, variations of imaging depth across the field, missing data due to bright star cut-outs, redshift fibre collisions, and so forth.
In such circumstances a sensible solution is to use mock datasets to sidestep the intractable calculation of exact likelihoods. Sometimes this is done with full cosmological N-body simulations [Option A], which give the most faithful representation of the random field but are extremely expensive; typically one will only have the means to produce a limited number of independent N-body realisations, which can be subsampled and bootstrapped to generate estimates of the observational covariance matrix. Other times the mock datasets are constructed by directly bootstrapping the original dataset [Option B], though of course this approach cannot capture variability (cosmic variance) on scales beyond the survey footprint. Yet another approach is to approximate the cosmological field as a log-Normal process [Option C], which brings fast Fourier methods for field simulation into play. For the following discussion I am interested in Option C as perhaps the only computationally feasible method of the three with which to address the dependence of the observational covariance matrix on the true (unknown) cosmological parameters.
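To give a flavour of why Option C is cheap, here is a minimal sketch of simulating a log-Normal field with FFTs: colour white noise to a target power spectrum in Fourier space, then exponentiate the resulting Gaussian field. The power-law spectrum, grid size, and normalisation below are toy placeholders, not a real cosmological model.

```python
import numpy as np

# Sketch: simulate a 2D log-Normal random field on a periodic grid via FFT.
# The power spectrum here is an illustrative power law, not a real
# cosmological P(k); grid size and slope are placeholders.
rng = np.random.default_rng(1)
n = 64                                    # grid points per side
k = np.fft.fftfreq(n)                     # frequencies along one axis
kx, ky = np.meshgrid(k, k, indexing="ij")
kmag = np.sqrt(kx**2 + ky**2)

power = np.zeros_like(kmag)               # toy power-law spectrum, P(0) = 0
mask = kmag > 0
power[mask] = kmag[mask] ** -1.5

# Draw a Gaussian field with this spectrum: white noise, coloured in k-space.
white = rng.standard_normal((n, n))
gauss = np.fft.ifft2(np.fft.fft2(white) * np.sqrt(power)).real

# Exponentiate to get a strictly positive log-Normal field,
# rescaled to have unit mean (like a density contrast 1 + delta).
lognorm = np.exp(gauss)
lognorm /= lognorm.mean()
```

Each mock dataset then costs only a couple of FFTs, which is what makes re-simulating at every proposed set of cosmological parameters conceivable at all.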
It is well known (thanks to Hartlap et al.) that inverting the sample covariance matrix does not give an unbiased estimator of the precision matrix, though a simple multiplicative correction factor allows construction of an unbiased estimate. This in turn provides an unbiased estimate of the log-likelihood (of a multivariate Normal), but not of the likelihood itself; indeed, it is in general impossible to convert an unbiased estimator taking values on the entire real line into an unbiased estimator of a non-negative statistic. That said, there is a different family of estimators (mostly developed in the 1950s, 60s, and 70s, sometimes for the purpose of studying bombing accuracy) that directly target the multivariate Normal likelihood. These have been identified by Price et al. and Muchmore & Marjoram as key ingredients for constructing pseudo-marginal MCMC algorithms on multivariate Normal targets.
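For concreteness, the Hartlap et al. correction is just a scalar factor applied to the inverse sample covariance; a small sketch, with toy mocks whose true covariance is the identity:

```python
import numpy as np

# Sketch of the Hartlap et al. debiasing factor: the inverse of the sample
# covariance from n mocks overestimates the precision matrix on average;
# multiplying by (n - d - 2)/(n - 1) makes it unbiased (valid for n > d + 2).
rng = np.random.default_rng(0)
d, n = 5, 200                        # data dimension, number of mock datasets
mocks = rng.standard_normal((n, d))  # toy mocks: true covariance = identity

S = np.cov(mocks, rowvar=False)             # sample covariance (d x d)
hartlap = (n - d - 2) / (n - 1)             # debiasing factor, < 1
precision_hat = hartlap * np.linalg.inv(S)  # unbiased precision estimate
```

Note that this gives an unbiased precision matrix, and hence an unbiased log-likelihood, but (as discussed above) not an unbiased likelihood.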
The achievement of the pseudo-marginal MCMC algorithm is that, in the unlucky case that one cannot exactly compute the likelihood function at a given set of parameters, one can still construct an MCMC chain targeting the same Bayesian posterior as in the exact case, as long as one has access to an unbiased estimate of the likelihood: just plug the unbiased likelihood estimate for each new proposal into the Metropolis-Hastings accept/reject step (and, importantly, save and re-use the likelihood estimate attached to each accepted state). While the pseudo-marginal chain will always be noisier and stickier (and hence demand a longer run) than the equivalent MCMC algorithm using exact likelihoods, it will nevertheless (quite amazingly!) represent exactly the same posterior. This is not guaranteed for, e.g., a noisy MCMC algorithm built by plugging in the likelihood corresponding to the Hartlap et al. unbiased precision matrix estimator; likewise for plugging in the marginalised t-distribution likelihood of Sellentin & Heavens, or the likelihood corresponding to the shrinkage estimator of Joachimi. (These latter two should rather be considered methods relating to Options A and B mentioned earlier.)
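The mechanics are easiest to see in code. Below is a minimal pseudo-marginal Metropolis-Hastings sketch on a one-dimensional toy problem with a flat prior; likelihood_hat stands in for an unbiased likelihood estimator built from mock datasets, here faked as the exact Gaussian likelihood multiplied by positive, unit-mean noise (which keeps it unbiased).

```python
import numpy as np

rng = np.random.default_rng(42)
data_mean = 1.3   # toy "observed" summary statistic

def likelihood_hat(theta):
    # Exact Gaussian likelihood times positive unit-mean Gamma noise:
    # a stand-in for a genuinely unbiased, mock-based estimator.
    exact = np.exp(-0.5 * (data_mean - theta) ** 2)
    return exact * rng.gamma(shape=50.0, scale=1.0 / 50.0)

theta, lhat = 0.0, likelihood_hat(0.0)
chain = []
for _ in range(2000):
    prop = theta + 0.5 * rng.standard_normal()  # random-walk proposal
    lhat_prop = likelihood_hat(prop)            # fresh estimate for proposal
    # Flat prior: accept with the ratio of likelihood *estimates*. Crucially,
    # the estimate attached to the current state is re-used, never refreshed.
    if rng.random() < lhat_prop / lhat:
        theta, lhat = prop, lhat_prop
    chain.append(theta)
```

The one line that makes it pseudo-marginal rather than merely "noisy MCMC" is the re-use of lhat for the current state: refreshing it at each iteration would break the exact-posterior guarantee.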
The reason the cost of producing mock datasets is an important consideration for this approach is that at each proposal of the MCMC chain a new unbiased estimate has to be computed, and for a multivariate Normal likelihood this will usually mean generating more than one mock dataset. For instance, the estimator used by Price et al. requires more than d + 3 mock datasets, where d is the dimension of the observed summary statistic (typically ~20-50 for an empirical two-point correlation function). The Muchmore & Marjoram estimator can cope with just one mock dataset but will be very noisy in that regime. Tuning of the pseudo-marginal algorithm should focus on adjusting this number of mock datasets, along with the proposal distribution, to target a sensible acceptance rate (see Andrieu & Vihola and Sherlock et al.). Efficient use of parallel computing architectures would be worthwhile when using this algorithm in anger, even via brute-force approaches such as batching the mock datasets to be produced over the available cores at each proposal.
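Even the brute-force batching idea is only a few lines. The sketch below uses a thread pool purely for illustration (a process pool would sidestep the GIL for heavy pure-Python simulation code); make_mock is a hypothetical placeholder for whatever expensive log-Normal simulation produces one d-dimensional mock summary-statistic vector.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Sketch: at each MCMC proposal, farm the m required mock datasets out over
# a pool of workers. make_mock is a placeholder mock generator; in practice
# it would be the log-Normal field simulation plus summary-statistic
# measurement, evaluated at the proposed cosmological parameters.
d, m, workers = 20, 30, 4

def make_mock(seed):
    # Placeholder: one d-dimensional mock summary statistic.
    return np.random.default_rng(seed).standard_normal(d)

def mocks_for_proposal(base_seed):
    seeds = range(base_seed, base_seed + m)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return np.stack(list(pool.map(make_mock, seeds)))

mocks = mocks_for_proposal(1000)   # (m, d) array of mock summaries
```

Since the m mocks at a given proposal are independent of one another, this embarrassingly parallel step scales straightforwardly with the number of available cores.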