About six months ago I discovered a series of papers by Torsten Ensslin & collaborators promoting a methodology they had developed, called Information Field Theory (IFT), for performing Bayesian inference on random fields; that is, Bayesian inference in infinite-dimensional space. The group’s website is here. At the time my gut reaction was that the calculations looked suspicious: they involved numerous delta functions, and in particular compositions of functions inside delta functions, and relied on physics-style path integration, a technique notoriously ill-defined from a mathematical perspective. But, having limited experience with the manipulation of probability measures beyond the standard R^k space, I was not really in a position to articulate my concerns properly. Now, however, having got up to speed with the latest statistical research in this area, I can show exactly why the mathematical foundation of IFT is incomplete, and why in all its practical applications to date IFT has never actually interrogated an infinite-dimensional posterior.
The basic premise of IFT (see, e.g., Ensslin & Weig 2010) is that, whatever infinite-dimensional space we’re interested in (their suggestion is, e.g., the Hilbert space of all L^2-integrable functions over some domain), we simply give it a prior probability density, compute the likelihood of our observed data, and obtain (at least as a thought experiment) a posterior probability density function (pdf) via the ordinary Bayes theorem. The information part of IFT then comes in approximating the posterior pdf by an expansion about the point that minimises its Gibbs free energy.
So, where are the problems? The first is the existence of a suitable prior density on our infinite-dimensional space. From the Radon-Nikodym theorem we know that, in order to have a probability density function, we first need a sigma-finite reference measure with respect to which our proposed prior probability measure is absolutely continuous. In the familiar, finite-dimensional R^k case, Lebesgue measure provides a handy default suitable for most situations; in infinite-dimensional space, however, there is no analogous locally finite, translation-invariant measure (other than the trivial zero measure), so defining our prior probability density is not going to be easy. The only feasible option that I know of is to use Wiener measure as the reference, and indeed there are Bayesian applications of this in the statistical literature (e.g. Beskos & Stuart; Wolpert & Ickstadt) for the inversion of SDEs and Fredholm integral equations, but no mention of Wiener measure appears anywhere in the IFT papers.
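To make the reference-measure point concrete, here is a small numerical sketch (my own illustration, not anything from the IFT papers), assuming a standard Brownian motion on [0, 1] discretised at k equally spaced times. Its log density with respect to Lebesgue measure on R^k has no limit as the resolution k grows, whereas the log density of the same process shifted by the drift h(t) = t, taken with respect to the discretised Wiener measure, equals W(1) - 1/2 at every resolution (the Cameron-Martin formula). The helper names are hypothetical.

```python
import numpy as np


def brownian_path(k, rng):
    """Standard Brownian motion on [0, 1] sampled at times 1/k, 2/k, ..., 1."""
    increments = rng.normal(0.0, np.sqrt(1.0 / k), size=k)
    return np.cumsum(increments)


def log_density_lebesgue(path):
    """Log density of the discretised path w.r.t. Lebesgue measure on R^k."""
    k = path.size
    dx = np.diff(path, prepend=0.0)  # independent N(0, 1/k) increments
    var = 1.0 / k
    return np.sum(-0.5 * np.log(2.0 * np.pi * var) - 0.5 * dx**2 / var)


def log_density_vs_wiener(path):
    """Log density of Brownian motion with drift h(t) = t w.r.t. the
    discretised Wiener measure, evaluated at the given path."""
    k = path.size
    dx = np.diff(path, prepend=0.0)
    dh = 1.0 / k  # increments of the drift h(t) = t
    var = 1.0 / k
    return np.sum((dx * dh - 0.5 * dh**2) / var)


rng = np.random.default_rng(0)
for k in (10, 1_000, 100_000):
    path = brownian_path(k, rng)
    # Columns: resolution, Lebesgue-based log density, Wiener-based log density, W(1) - 1/2
    print(k, log_density_lebesgue(path), log_density_vs_wiener(path), path[-1] - 0.5)
```

The Lebesgue-based value grows roughly like (k/2) log k, while the last two columns agree at every k: only the density relative to a Gaussian (Wiener) reference measure survives refinement.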
Supposing, however, that a well-behaved pdf has now been constructed on the original, infinite-dimensional sample space (which presumably includes some ancillary bumpf in addition to the signal we’re interested in), the IFT proposal is to use this pdf to induce a pdf on a subspace called the signal via path integration. For this to be a valid operation, the transformation from the original space to the signal space must be a measurable mapping whose induced measure is again a sigma-finite measure absolutely continuous with respect to a suitable reference measure. Examples of transformations between finite-dimensional spaces where these conditions fail are easily constructed: for instance, integrating a 2D field over the contour paths of another 2D function gives an ill-defined probability density with respect to Lebesgue measure when the latter contains flat segments, since each flat region sends a set of positive probability to a single value. So one would imagine we need to be extra cautious in infinite-dimensional space. But again, no mention is made of these caveats in the IFT papers.
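The finite-dimensional counterexample is easy to check by simulation. This sketch rests on my own assumptions: a uniform density on the unit square and a hypothetical function g(x, y) = max(x + y - 1, 0), which is flat (identically zero) on the lower-left triangle. The induced distribution of g puts probability 1/2 on the single value 0, so the histogram estimate P(g <= eps)/eps blows up as eps shrinks and no density with respect to Lebesgue measure exists there.

```python
import numpy as np


def plateau_map(x, y):
    """Hypothetical 2D function, constant (= 0) on the triangle x + y <= 1."""
    return np.maximum(x + y - 1.0, 0.0)


rng = np.random.default_rng(1)
n = 200_000
x = rng.uniform(size=n)  # uniform density on the unit square
y = rng.uniform(size=n)
s = plateau_map(x, y)

# Mass of the atom at 0: close to the area of the flat triangle, 1/2.
atom_mass = np.mean(s == 0.0)
print("P(g = 0) ~", atom_mass)

# A finite density at 0 would make P(g <= eps) / eps converge; instead it diverges.
for eps in (1e-1, 1e-2, 1e-3):
    mass = np.mean(s <= eps)
    print(eps, mass, mass / eps)
```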
[Incidentally, when I emailed Torsten to ask about this issue, his reply was that as long as the solution holds in the finite space of real-world, pixelated observations, we can ignore any technicalities.]
Ignoring these technicalities (as well as a number of others, such as conditions on the nature of the likelihood function, well described for the Wiener measure case by Beskos & Stuart), it makes sense to look at how IFT is actually implemented. Well, here it turns out that we can forget all of the above, since in fact all applications of IFT so far skip the challenge of defining a prior probability density on infinite-dimensional space and go straight to adopting a prior on the finite (but high) dimensional observational space (e.g. the product space Z+^k of natural-number counts on an array of k pixels); cf. Weig & Ensslin’s “signals from Poissonian data” paper or Oppermann et al. 2011’s “improved map of the Galactic Faraday sky”. So, as it stands, IFT is neither properly defined for application to infinite-dimensional spaces nor has it ever actually been applied to achieve Bayesian inference on infinite-dimensional spaces, even though its advocates try to give the impression that it has already achieved both.
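For contrast, here is roughly what a computation on the finite pixel space looks like. The sketch assumes a hypothetical model chosen purely for illustration (not code or settings from the cited papers): counts d_i ~ Poisson(exp(s_i)) on k pixels of a 1D grid, with a zero-mean Gaussian prior on s whose exponential covariance is an arbitrary choice. Every density here is taken with respect to Lebesgue measure on R^k, with k fixed by the pixel grid. A damped Newton iteration finds the posterior mode, and the inverse Hessian there gives a plain Laplace (Gaussian) approximation, which is a stand-in and not IFT's own free-energy expansion.

```python
import numpy as np


def exponential_cov(k, length=5.0, amp=1.0):
    """Hypothetical prior covariance for the log-intensity s on a 1D grid of k pixels."""
    idx = np.arange(k)
    return amp * np.exp(-np.abs(idx[:, None] - idx[None, :]) / length)


def neg_log_posterior(s, counts, prec):
    """-log p(s | d) up to a constant: Poisson(exp(s)) likelihood plus Gaussian prior."""
    return np.sum(np.exp(s) - counts * s) + 0.5 * s @ prec @ s


def map_and_laplace(counts, cov, iters=100, tol=1e-10):
    """Damped Newton search for the posterior mode on R^k, plus the inverse Hessian there."""
    prec = np.linalg.inv(cov)
    s = np.log(counts + 1.0)  # crude starting point
    for _ in range(iters):
        grad = np.exp(s) - counts + prec @ s
        hess = np.diag(np.exp(s)) + prec  # positive definite: the objective is convex
        step = np.linalg.solve(hess, grad)
        f0 = neg_log_posterior(s, counts, prec)
        t = 1.0
        # Backtracking (Armijo) line search keeps the Newton steps stable.
        while t > 1e-12 and (
            neg_log_posterior(s - t * step, counts, prec)
            > f0 - 1e-4 * t * (grad @ step)
        ):
            t *= 0.5
        s = s - t * step
        if np.max(np.abs(t * step)) < tol:
            break
    hess = np.diag(np.exp(s)) + prec
    return s, np.linalg.inv(hess)


rng = np.random.default_rng(2)
k = 64  # the "field" is just a vector in R^k, with k set by the pixel grid
cov = exponential_cov(k)
true_s = rng.multivariate_normal(np.zeros(k), cov)
counts = rng.poisson(np.exp(true_s))
s_map, post_cov = map_and_laplace(counts, cov)
print(s_map.shape, post_cov.shape)
```

Refining the grid simply changes k, and with it the space on which the posterior density lives; nothing in the computation refers to a measure on an infinite-dimensional function space.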