About six months ago I discovered a series of papers by Torsten Ensslin & collaborators promoting a methodology they had developed, called Information Field Theory (IFT), for performing Bayesian inference on random fields; that is, Bayesian inference in infinite-dimensional space. The group’s website is here. At the time my gut reaction was that the calculations looked suspicious because of the numerous delta functions involved, and in particular the compositions of functions inside delta functions, which invoke the notoriously ill-defined (from a mathematical perspective) technique of physics-style path integration. But, having limited experience with the manipulation of probability measures beyond the standard R^k space, I was not really in a position to properly explain my concerns. Now, however, after getting up to speed with the latest statistical research in this area, I can show exactly why the mathematical foundation of IFT is incomplete, and why in all its practical applications to date IFT has never actually interrogated an infinite-dimensional posterior.

The basic premise of IFT (see, e.g., Ensslin & Weig 2010) is that for whatever infinite-dimensional space we’re interested in (e.g., their suggestion, the Hilbert space of all L^2-integrable functions over some domain) we simply assign a prior probability density, compute the likelihood of our observed data, and obtain (at least in thought experiment) a posterior probability density (pdf) via the ordinary Bayes theorem. The *information* part of IFT then comes in the approximation of the posterior pdf via an expansion about its Gibbs-free-energy-minimising point.

So, where are the problems? First, the existence of a suitable prior density in our infinite-dimensional space. From the Radon-Nikodym theorem we know that in order to have a probability density function we first need a *sigma-finite* reference measure against which our proposed prior probability measure is *absolutely continuous*. Unlike in the familiar, finite-dimensional R^k case, where Lebesgue measure provides a handy default suitable for most situations, there is no analogous translation-invariant measure (except the trivial one) in infinite-dimensional space, so defining our prior probability density is not going to be easy. The only feasible option that I know of is to use Wiener measure, and indeed there are Bayesian applications of this in the statistical literature (e.g. Beskos & Stuart; Wolpert & Ickstadt) for inversion of SDEs and Fredholm integral equations, but no mention of Wiener measure appears anywhere in the IFT papers.
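To make the Wiener measure option concrete, here is a minimal toy sketch of my own (the function name is mine, not from any IFT code): sampling discretized Brownian-motion paths, which converge in distribution, as the grid is refined, to draws from Wiener measure on C[0, T].

```python
import numpy as np

def sample_wiener_prior(n_grid, n_samples, T=1.0, seed=0):
    """Draw discretized Brownian-motion paths on [0, T].

    As the grid is refined these converge in distribution to draws
    from Wiener measure on C[0, T] -- a reference measure that,
    unlike Lebesgue measure, actually exists in infinite dimensions.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_grid
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_samples, n_grid))
    paths = np.cumsum(increments, axis=1)
    # Brownian motion starts at zero: prepend W(0) = 0.
    return np.concatenate([np.zeros((n_samples, 1)), paths], axis=1)

paths = sample_wiener_prior(n_grid=1000, n_samples=500)
# Sanity check against Var[W(t)] = t at t = T = 1.
print(paths[:, -1].var())
```

The Beskos & Stuart constructions then place the statistical model as a density (a Radon-Nikodym derivative) with respect to exactly this kind of Gaussian reference measure.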

Supposing, however, that a well-behaved pdf has now been constructed in the original, infinite-dimensional sample space (which presumably includes some ancillary bumpf in addition to the signal we’re interested in), the IFT proposal is to use this pdf to induce a pdf for a subspace called the signal via path integration. In order for this to be a valid operation, the transformation from the original space to the signal space must be a measurable mapping for which the induced measure is again a sigma-finite measure absolutely continuous with respect to a suitable reference measure. Examples of transformations between finite-dimensional spaces where these conditions do not hold are easily constructed (for instance, the integration of a 2D field over the contour paths of another 2D function gives an ill-defined probability density with respect to Lebesgue measure when the latter contains flat segments), so one would imagine we need to be extra cautious in infinite-dimensional space. But again, no mention is made of these caveats in the IFT papers.
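A simpler, one-dimensional variant of the same phenomenon is easy to simulate (my own toy numpy sketch): pushing a uniform, Lebesgue-absolutely-continuous distribution through a map with a flat segment produces an atom, so the induced measure admits no density with respect to Lebesgue measure.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=100_000)  # has a density w.r.t. Lebesgue

# A perfectly measurable map with a flat segment: g(x) = max(x, 0.5).
y = np.maximum(x, 0.5)

# The pushforward measure puts mass P(Y = 0.5) = 0.5 on a single
# point, so it is not absolutely continuous w.r.t. Lebesgue measure
# and no pdf for Y exists.
atom_mass = np.mean(y == 0.5)
print(atom_mass)
```

Here the "signal" Y inherits a point mass from the flat segment of g, which is exactly the kind of pathology that the absolute-continuity condition rules out.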

[Incidentally, when I emailed Torsten to ask about this issue his reply was that as long as the solution holds in the finite space of real-world, pixelated observations we can ignore any technicalities.]

Ignoring these technicalities (as well as a number of others, such as conditions on the nature of the likelihood function, well described for the Wiener measure case by Beskos & Stuart), it makes sense to look at how IFT is actually implemented. Well, here it turns out that *we can forget all of the above*, since in fact all applications of IFT so far skip the challenges of defining a prior probability density on infinite-dimensional space and go straight to adopting a prior on the finite (but high) dimensional observational space (e.g. the Z_+^k product space of natural-number counts on an array of k pixels); cf. Weig & Ensslin’s “signals from Poissonian data” paper or Oppermann et al. 2011’s “improved map of the Galactic Faraday sky”. *So, as it stands, IFT is neither properly defined for application to infinite-dimensional spaces, nor has it actually ever been applied to achieve Bayesian inference on infinite-dimensional spaces, even though its advocates try to give the impression that it has already achieved both*.

Dear Ewan,

thanks for your continued interest in Information Field Theory. When you stopped our email conversation I assumed you were satisfied with my explanations. Now I see that you prefer to argue in public. Fine with me.

The way you cite me shortens and twists my arguments. The point is not that it is sufficient to define a probability measure for Information Field Theory in a finite-dimensional pixelization. The point is that if such a measure can be constructed in any sufficiently fine-grained pixelization (fine enough to resolve the relevant physics), and gives consistent results in all these pixelizations, then the continuum limit can be taken and we can talk about a field theory. This is the usual way statistical field theories are defined, and large branches of solid state physics using them would be in trouble if your arguments were substantial.

This was actually expressed in my initial answer to your question about the definition of delta functions in such spaces:

“Anyhow, if it is about the question of how to define a delta function over a functional space, then I might help you. The idea is to think about the functional space just as an abstraction for a series of finer and finer discretized function spaces. The delta function is well defined for each of the discretized cases, and the limit need not be taken if any physical result calculated for a finite discretization does not change any more with increasing resolution.

I tried to formulate this a bit more clearly in an in-a-nutshell introduction to IFT:

‘Information field theory’ – an in-a-nutshell introduction to IFT, Torsten A. Enßlin, accepted for the proceedings of MaxEnt 2012, the 32nd International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, arXiv:1301.2556”

Here is the link to the cited paper: http://arxiv.org/abs/1301.2556 . I believe it gives a clearer summary of IFT than your blog, where I spotted some confusion about the terms signal and data space, likelihood, and other details.

In practice, on a computer, we have to work with finite pixelizations in applied IFT. However, as soon as the pixelization is sufficiently fine, the results do not change any more. On the code level, we can formulate our IFT algorithms in a pixelization-free way if we use the NIFTy package (http://www.mpa-garching.mpg.de/ift/nifty/ ), which takes care of all the technical details.
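This resolution-independence argument can be illustrated with a toy numpy sketch (my own illustration, not NIFTy code): pairing a test function with a pixelized delta function gives a "physical result" that stabilizes once the grid is fine enough.

```python
import numpy as np

def delta_pairing(f, x0, n_pix):
    """Integrate f against a pixelized delta function at x0 on [0, 1].

    On an n_pix grid the delta function is 1/dx on the pixel containing
    x0 and zero elsewhere, so it integrates to one at every resolution.
    The pairing sum(f * delta) * dx stops changing once the grid
    resolves f -- the sense in which the continuum limit "need not be
    taken".
    """
    dx = 1.0 / n_pix
    grid = (np.arange(n_pix) + 0.5) * dx  # pixel centres
    delta = np.zeros(n_pix)
    delta[int(x0 / dx)] = 1.0 / dx
    return np.sum(f(grid) * delta) * dx

coarse = delta_pairing(np.sin, x0=0.3, n_pix=100)
fine = delta_pairing(np.sin, x0=0.3, n_pix=10_000)
print(coarse, fine)  # both near sin(0.3)
```

At every finite resolution the object is perfectly well defined; the dispute above is precisely over whether this sequence of finite-dimensional constructions suffices to define the infinite-dimensional one.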

So after this reply, I want to congratulate you on your Bayesian blog. It is a good idea to reflect on literature and practice.

Best Regards,

Torsten Enßlin.

Hi Torsten,

I think this is really a case where mathematical statistics and physics-style statistics just don’t agree; so, as a result, I suspect we will never agree (on this topic). The point for me is that, from a mathematical perspective, the infinite-dimensional spaces of functions and their analogues have entirely different properties (in particular, topologies) from those of finite-dimensional spaces and their countably infinite extrapolations. For example, though the space R^infinity (a [countably] infinite number of copies of the real line) is a little different to R^N, in many ways for probability theory it is also the same (e.g. in the sense that the convergence-determining classes are more-or-less identical), whereas C[Omega] (the space of continuous functions on a compact domain) is remarkably different in almost every way. Most importantly, convergence at finitely many points on the line extrapolated to infinity does not guarantee convergence in C[0,1]. For this reason I would point to, e.g., any of the Beskos & Stuart papers on infinite-dimensional MCMC to give a flavour of what I think is required to call a method of statistical inference “infinite dimensional”.

cheers,

Ewan.

& I forgot to mention that Delaigle & Hall (2010) also give a description of the problem for more general (i.e. not “simply” Wiener measure) random functions.

Dear Ewan,

I think you have just stumbled over the measure problems of field theories in general. If there is a problem, it is shared by quantum field theory (QFT) and statistical field theory (SFT) and is not special to information field theory (IFT). The measure implicitly used by the path integral formalism (basically R^N with N going to infinity) might not be the Lebesgue measure you insist on requiring as the basis for probabilities over the space of all possible functions. I guess the difference is that the former has probably only a countably infinite number of degrees of freedom, whereas for the latter the number of d.o.f. is uncountably infinite.

If you want to claim that IFT is not a field theory, you have to show in what respects it differs from QFT and SFT.

If your logic were correct, any known field theory would *not* be a field theory!

Regards,

Torsten.

Hi Torsten,

I would more-or-less agree with this: that the mathematical problems of IFT are essentially the same as those of QFT and SFT. And it is accepted that the QFT approach of constructing the path integral as the limit of discretised space can work (i.e. give useful results) and be mathematically rigorous for a selection of important physics problems; though certain other cases give counterexamples resisting any renormalisation efforts.

I suppose the problem I have, then, is in accepting that this approach should be taken outside of the QFT domain and advocated as a Bayesian methodology for inference on infinite-dimensional spaces. For me the strategy of trying to build the problem from the ground up, as it were, by looking at successively finer discretisations, is no longer sensible, since (1) we already have powerful statistical methodologies that start with infinite-dimensional measures encoded in the prior (thereby avoiding the need to demonstrate convergence/normalization on a case-by-case basis), and (2) there is no longer the driving motivation from physics for trying to make theories we can manage on discrete space somehow extend to the continuum.

For problems of image reconstruction I think it makes sense to work on pixelised discrete spaces (perhaps even on finer grids than the data images for certain problems), and for these problems I cannot think of a need to deal with the infinite limit. For problems concerning real signals (e.g. the cosmic microwave background in cosmology; or e.g. surface elevation maps in geostatistics) I think one should begin with an infinite-dimensional prior (in these examples, typically the Gaussian process prior; though this is by no means the only available choice).

In these (signal space) cases it is then essential to identify one’s aim: is it parameter inference (e.g. learning a feature of the primordial power spectrum) or prediction (e.g. filling in a map across regions of missing coverage)? For the former it may well be that consistent estimators are available that depend little (if at all) on the characteristics of our assumed prior. On the other hand, for the latter the prior choice may be very important. Some stochastic process priors will tend towards their prior means over regions of missing coverage, while others will tend to maintain their local means. It is in this type of problem that the value of the top-down statistical approach is revealed, since it allows access to the huge body of published knowledge concerning these processes and the map-making problem (both for choosing priors and then the computational procedures for fitting and prediction given the data).
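The prior-mean-reversion point can be seen in a few lines of numpy (a toy Gaussian process sketch of my own; the kernel and function names are illustrative, not from any particular package): the posterior mean tracks the data where coverage exists and decays back to the prior mean over the gap.

```python
import numpy as np

def gp_posterior_mean(x_train, y_train, x_test, length=0.1, noise=1e-6):
    """Posterior mean of a zero-mean GP with squared-exponential kernel."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)
    K = k(x_train, x_train) + noise * np.eye(len(x_train))
    return k(x_test, x_train) @ np.linalg.solve(K, y_train)

x_train = np.array([0.1, 0.15, 0.2])   # observed region only
y_train = np.array([1.0, 1.2, 0.9])
# Near the data the mean follows the observations; far into the region
# of missing coverage it reverts to the prior mean (zero here).
near, far = gp_posterior_mean(x_train, y_train, np.array([0.15, 0.9]))
print(near, far)
```

Swapping the kernel (or its length-scale) changes how quickly the reconstruction reverts to the prior mean over the gap, which is exactly why the prior choice matters so much for prediction problems.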

cheers,

Ewan.

A reference I’m thinking of for the Bayesian image reconstruction approach (which also uses Gibbs energy formulations for its theory) is Winkler’s “Image Analysis, Random Fields and Markov Chain Monte Carlo Methods: A Mathematical Introduction”. There also seems to have been a “first-wave” of astronomical forays into fields and spatial statistics for image and field reconstruction in the 1990s. E.g. Molina, Katsaggelos, Mateos & Abad 1996, Pasztor & Toth 1995.

Hi Ewan,

it’s good to see that you finally agree: Information field theory *is* a field theory. Field theories might not be completely understood on a rigorous mathematical basis, but this is a different question. They are undoubtedly extremely useful.

In practice, one always works with a finite pixelization, since our computers are finite. There is, however, a good reason to set up IFT with an existing continuous limit, namely to ensure that all sufficiently fine pixelizations give consistent results. If one starts with finite pixels right away, it easily happens that a method is designed that depends explicitly on the pixelization and does not have a well-defined continuous limit. I would argue that one then always has to prove that the continuous limit gives consistent results if one did not start from it. However, if one uses NIFTy (our numerical IFT package in Python: http://www.mpa-garching.mpg.de/mpa/institute/news_archives/news1301_bbb/news1301_bbb-en.html) then one can formulate the algorithms abstractly, independently of the pixelization used during code execution.

For the references you give: yes, there are plenty of previous works addressing the problem of how to reconstruct a field. A long, but still incomplete, attempt to list them can be found in Enßlin, Frommert & Kitaura (2009). In particular, the mentioned Markov random fields are a subset of the field statistics accessible with IFT (they usually assume, implicitly, a k^-2 power spectrum). What is the new element of IFT, given that many of its ingredients already existed? Well, it’s the usage of methods developed in theoretical physics to solve the computational challenges of relevant inference problems. If one wants to solve real-world problems, one is often dealing with complex and hierarchical probabilities. IFT provides a number of approximations (Feynman diagrams, renormalization, the thermodynamical approach) that might help to cast the problem in a form that can be handled in practice. Some of these approaches were known before: minimizing the Gibbs free energy is equivalent to minimizing the Kullback-Leibler divergence (but note that the concept of Gibbs free energy was there first).
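The Gibbs-free-energy/KL equivalence (at unit temperature, up to the q-independent constant log Z) can be checked numerically in one dimension; this toy quadrature sketch is an editorial illustration, not NIFTy code:

```python
import numpy as np

s = np.linspace(-10.0, 14.0, 20_001)
ds = s[1] - s[0]

H = 0.5 * (s - 2.0) ** 2           # information Hamiltonian, H = -log P(d, s)
Z = np.sum(np.exp(-H)) * ds        # evidence (normalization)
post = np.exp(-H) / Z              # posterior P(s | d)

def gaussian(m, v):
    return np.exp(-0.5 * (s - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

def gibbs_free_energy(q):
    # G[q] = <H>_q - S[q] at temperature T = 1 (S[q] = -int q log q).
    return np.sum(q * H) * ds + np.sum(q * np.log(q + 1e-300)) * ds

def kl(q):
    # KL(q || posterior)
    return np.sum(q * np.log((q + 1e-300) / post)) * ds

q1, q2 = gaussian(0.0, 1.0), gaussian(3.0, 0.5)
# G[q] - KL(q || post) = -log Z for every q: minimizing one minimizes
# the other.
print(gibbs_free_energy(q1) - kl(q1), gibbs_free_energy(q2) - kl(q2))
```

Since the offset -log Z does not depend on the approximating distribution q, any q minimizing the Gibbs free energy also minimizes the KL divergence to the posterior.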

To (hopefully) conclude this debate:

Information field theory *is* a field theory!

Regards,

Torsten.

I have put a short summary of the essential point of this debate – from my perspective – to this web address:

http://www.mpa-garching.mpg.de/ift/Why_IFT_is_a_field_theory.html

Torsten.