Two interesting papers that I found myself re-reading this week were “Functional Uniform Priors for Non-Linear Modelling” by Bornkamp and “Nonparametric Importance Sampling” by Zhang. The first I had encountered in 2012 thanks to Bornkamp’s talk at the Design of Experiments [DAE3] workshop at the Isaac Newton Institute in Cambridge; the second came up while trying to follow Delyon & Portier’s recent proof of faster-than-root-n convergence when the usual importance sampling weight, f(theta)/g(theta) [theta ~ g(), f() the target], is replaced by a kernel density estimator of g(), regardless of whether or not g() is known!
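To fix ideas before going further, here is a minimal sketch of the ordinary importance sampling estimator with weight f(theta)/g(theta); the toy setup (standard normal target f, wider normal proposal g, estimating E_f[theta^2] = 1) is my own choice for illustration and comes from neither paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Target f: N(0, 1); proposal g: N(0, 2).  We estimate E_f[theta^2] = 1.
theta = rng.normal(0.0, 2.0, size=n)          # theta ~ g()

def f(x):  # target density, N(0, 1)
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def g(x):  # proposal density, N(0, 2)
    return np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi))

w = f(theta) / g(theta)                       # the usual IS weight
estimate = np.mean(w * theta**2)              # unbiased for E_f[theta^2]
```

Delyon & Portier’s result concerns what happens when the g() in the denominator of w is swapped for a kernel density estimate built from the very same sample.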
Briefly, the idea behind functional uniform priors is to provide a construction for prior choice in non-linear regression models, where uniform priors on the parameters produce a highly non-uniform ‘distribution’ of regression curves in function space. The trick is to form a notion of uniform ‘densities’ in the model’s function space via (approximately) the packing number, which can be (further) approximated for well-behaved models using a Taylor expansion of the distance mapping to function space. (Which all sounds much more complicated than it actually is … don’t be put off by the jargon!) Bornkamp gives a number of illustrative examples of how seemingly non-informative priors on the model parameters can have a strong influence on the posterior regression function, and of the relative advantages of functional uniform priors in allowing the data to speak for itself. Interesting to note are the examples Bornkamp gives of regression designs in which the Jeffreys prior cannot be computed—I shall cite these next time I’m arguing with someone that there’s no universal formula for prior choice.
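For the well-behaved case, the Taylor-expansion approximation boils down to a Jacobian-type density, p(theta) proportional to sqrt(det(J(theta)'J(theta))) with J the gradient of the regression curve evaluated at the design points. A minimal numerical sketch, for a toy one-parameter exponential model and design grid of my own choosing (not one of Bornkamp’s examples):

```python
import numpy as np

# Design points at which the regression curve mu(x; theta) = exp(-theta * x)
# is evaluated; a toy choice of model and grid, purely for illustration.
x = np.linspace(0.0, 5.0, 50)

def jacobian_norm(theta):
    # d mu / d theta = -x * exp(-theta * x); for a scalar parameter,
    # sqrt(det(J'J)) reduces to the Euclidean norm of this gradient.
    grad = -x * np.exp(-theta * x)
    return np.linalg.norm(grad)

# Unnormalised functional uniform 'density' over a parameter grid ...
theta_grid = np.linspace(0.01, 3.0, 300)
density = np.array([jacobian_norm(t) for t in theta_grid])

# ... normalised by a simple rectangle rule so it integrates to one.
dtheta = theta_grid[1] - theta_grid[0]
density /= density.sum() * dtheta
```

Relative to a flat prior on theta, this density downweights regions of the parameter space where the curve barely moves—uniformity in function space rather than parameter space.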
Also briefly, the idea behind nonparametric importance sampling is to perform adaptive importance sampling using as proposal a kernel density estimator targeting the optimal proposal. The idea does not appear to be widely used in comparison with the parametric versions of adaptive importance sampling with which I’m more familiar; but, the usual problems of dimensionality scaling with kernel density estimators aside, it might be worth my while to try this out. Likewise with Delyon & Portier’s trick of replacing the ordinary weight with a kernel-based estimate; though I would be surprised if the performance in practice (once computational time is taken into account) is as impressive as the theoretical result (which is certainly surprising in itself).
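A rough sketch of the Delyon & Portier trick, on the same kind of toy problem as above (standard normal target, N(0,2) proposal, estimating E_f[theta^2] = 1); the leave-one-out construction follows the spirit of their estimator, but the Silverman-style bandwidth is a crude choice of mine, not theirs:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000

# Target f: N(0, 1); proposal g: N(0, 2).  Estimate E_f[theta^2] = 1.
theta = rng.normal(0.0, 2.0, size=n)          # theta ~ g()

def f(x):  # target density, N(0, 1)
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

# Leave-one-out Gaussian kernel density estimate of g at each sample,
# replacing the true proposal density in the importance weight.
h = 1.06 * theta.std() * n ** (-1 / 5)        # Silverman-style bandwidth
diffs = (theta[:, None] - theta[None, :]) / h
K = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
np.fill_diagonal(K, 0.0)                      # leave-one-out: drop own point
g_hat = K.sum(axis=1) / ((n - 1) * h)

estimate = np.mean(f(theta) / g_hat * theta**2)
```

Note that g() is never evaluated in the weight—only the sample from it is used—which is exactly why the result holds regardless of whether g() is known. The n-by-n kernel matrix also makes the O(n^2) cost of the naive implementation plain, hence my scepticism about wall-clock performance.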