A Bayesian Nadaraya-Watson estimator?

The other day I was wondering whether it would be possible to formulate the Nadaraya-Watson (i.e. kernel smoothing) non-parametric regression estimator within the context of a hierarchical Bayesian model.  The idea would be to take advantage of the exceptional flexibility of the N-W estimator to adapt itself to the (true, unknown) regression curve as more and more data become available, while simultaneously benefiting from prior knowledge (or prejudice) regarding the optimal bandwidth parameter, the parametric regression curve, and various nuisance parameters.

To first order I think the answer to the above question is ‘yes’, using the following sort of hierarchical structure (here the example dataset is a prevalence(x)-incidence(y) relationship for falciparum malaria):
y_i \sim \mathrm{Pois}(t_i \nu_i \exp(f(x_i)))
f(x^\ast) = \frac{\int z_x K(|x^\ast-x|/h)\,dF(x)}{\int K(|x^\ast-x|/h)\,dF(x)}
\{z_x\} \sim \mathrm{N}(a F(x)+b,\sigma^2)
x_i \sim F
F \sim \mathrm{DP}(G_\theta,\alpha)
\nu_i \sim \Gamma(r,r)
h,a,b,\theta,\alpha,r,\sigma \sim \pi
where K(\cdot) is some positive definite kernel (like the Normal density), and G_\theta is a suitable parametric density for the design.  In this particular example, t_i is the number of ‘person years observed’ at each site and \nu_i accounts for over-dispersion.
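The Nadaraya-Watson smooth at the heart of the model (a kernel-weighted average of the latent z values over the design atoms) can be sketched in a few lines of plain Python. This is an illustrative sketch, not the model code itself; the function and variable names are my own, and the kernel here is the exponential kernel exp(-|dx|/h) that the JAGS code below actually uses:

```python
import math

def nw_smooth(x_star, atoms_x, atoms_z, h):
    """Nadaraya-Watson estimate at x_star: the kernel-weighted
    average of latent curve values z over the design atoms,
    normalized by the total kernel weight."""
    weights = [math.exp(-abs(x_star - xa) / h) for xa in atoms_x]
    return sum(w * z for w, z in zip(weights, atoms_z)) / sum(weights)

# Toy check: atoms lying on the line z = 2x + 1; with atoms placed
# symmetrically around x* = 1 the weighted average recovers z(1) = 3.
atoms_x = [0.0, 0.5, 1.0, 1.5, 2.0]
atoms_z = [2 * xa + 1 for xa in atoms_x]
print(round(nw_smooth(1.0, atoms_x, atoms_z, 0.1), 3))  # → 3.0
```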

I coded this model up in JAGS using a hacky truncated stick-breaking construction of the DP, plus a hacky super-precise Normal density on the ‘observation’ of x_i to condition the DP appropriately.  I’m also working in the ilogit space for x.  The JAGS code is pasted below.  It runs quite slowly with only 100 atoms in the approximate DP draw, but the result is quite consistent with my expectations for an empirical falciparum prevalence-incidence relationship.  Fun times!
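The truncated stick-breaking construction used below can be illustrated in Python (again a sketch under my own naming, not the R/JAGS original): draw w_j ~ Beta(1, alpha), give atom j the fraction w_j of the stick that remains, and lump whatever is left after M-1 breaks onto the final atom so the weights sum to one.

```python
import random

def stick_breaking_weights(alpha, M, seed=1):
    """Truncated stick-breaking draw of DP weights:
    w_j ~ Beta(1, alpha); weight_j = w_j * prod_{k<j} (1 - w_k),
    with the leftover stick mass assigned to atom M."""
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(M - 1):
        w = rng.betavariate(1.0, alpha)
        weights.append(w * remaining)
        remaining *= (1.0 - w)
    weights.append(remaining)  # tail mass on the final atom
    return weights

weights = stick_breaking_weights(alpha=2.0, M=100)
print(abs(sum(weights) - 1.0) < 1e-9)  # → True
```

Larger alpha breaks the stick into smaller pieces, spreading mass over more atoms; this is the same role alpha plays in the DP prior above.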

[Figure: previnc — posterior draws of the prevalence-incidence curve; see plotting code below]

model.jags <- "model {
  # Poisson likelihood: rate = person-years * over-dispersion effect * exp(f(x))
  for (i in 1:N) {y[i] ~ dpois(pyo[i]*reffect[i]*exp(y.curve[components[i]]));}
  for (i in 1:N) {reffect[i] ~ dgamma(r,r);}
  # Nadaraya-Watson smooth evaluated at each atom
  for (j in 1:M) {y.curve[j] <- sum(atomsy*sweights[j,1:M])/sum(sweights[j,1:M]);}
  for (j in 1:M) {for (k in 1:M) {sweights[j,k] <- exp(-sqrt((atomsx[k]-atomsx[j])^2)/h);}}
  # Parametric (linear) prior on the latent curve values
  for (j in 1:M) {atomsy[j] ~ dnorm(atomsx[j]*slope+intercept,1/sigmay^2);}
  # Super-precise Normal 'observation' of x_i to condition the DP
  for (i in 1:N) {
    x[i] ~ dnorm(atomsx[components[i]],1/0.000001^2);
    components[i] ~ dcat(weights);
  }
  for (j in 1:M) {atomsx[j] ~ dnorm(mux,1/sigmax^2);}
  # Truncated stick-breaking construction of the DP weights
  for (j in 1:(M-1)) {w[j] ~ dbeta(1,alpha);}
  weights[1] <- w[1]
  for (j in 2:(M-1)) {weights[j] <- w[j]*(1-w[j-1])*weights[j-1]/w[j-1];}
  weights.sum <- sum(weights[1:(M-1)])
  weights[M] <- 1-weights.sum
  # Hyper-priors
  h ~ dgamma(h1,h2)
  alpha ~ dgamma(g1,g2)
  mux ~ dnorm(mu.mu,1/mu.sigma^2)
  sigmax ~ dgamma(s1,s2)
  sigmay ~ dgamma(s3,s4)
  slope ~ dnorm(mu.s,1/sigma.s^2)
  intercept ~ dnorm(mu.i,1/sigma.i^2)
  r ~ dgamma(r1,r2)
  # Fixed hyper-parameter values
  mu.mu <- -1
  mu.sigma <- 0.3
  g1 <- 2
  g2 <- 2
  s1 <- 1
  s2 <- 1
  s3 <- 1
  s4 <- 1
  mu.s <- 0
  sigma.s <- 2
  mu.i <- -1
  sigma.i <- 2
  h1 <- 2.5
  h2 <- 10
  r1 <- 1
  r2 <- 1
}"
jags.mod <- jags.model(textConnection(model.jags),
                       data = list('N'=length(x),'x'=x,'y'=as.integer(inc.obs),'pyo'=pyo.obs,'M'=100),
                       inits = list('atomsx'=c(x,rep(1,100-length(x))),'atomsy'=c(y,rep(1,100-length(y)))),
                       n.chains=1, n.adapt=500)
update(jags.mod,500)
posterior.jags <- jags.samples(jags.mod,c('x','alpha','mux','sigmax','atomsx','atomsy','y.curve','h'),n.iter=1000,thin=10)

alpha.post <- as.numeric(as.mcmc.list(posterior.jags$alpha)[[1]])
atomsx.post <- as.matrix(as.mcmc.list(posterior.jags$atomsx)[[1]])
atomsy.post <- as.matrix(as.mcmc.list(posterior.jags$atomsy)[[1]])
h.post <- as.numeric(as.mcmc.list(posterior.jags$h)[[1]])

plot(1/(1+exp(x)),exp(y),ylim=c(0,3.5),xlab="Prevalence",ylab="Incidence (#/PYO)")
xplot <- yplot <- seq(-5.7,2.4,length.out=100)
for (i in 1:100) {
  # recompute the N-W smooth for each posterior draw, using that draw's bandwidth
  for (j in 1:100) {
    kw <- exp(-sqrt((xplot[j]-atomsx.post[i,])^2)/h.post[i])
    yplot[j] <- sum(atomsy.post[i,]*kw)/sum(kw)
  }
  lines(1/(1+exp(xplot)),exp(yplot),col=hsv(0,s=1,v=0.75,alpha=0.2))
}
points(1/(1+exp(x)),exp(y))

## DATA

> x
 [1] -1.0252810 -0.3791717 -1.4051510 -1.5998685  0.3866409 -1.4500102
 [7] -4.4671639 -5.6514862  0.1804884 -0.3846743 -1.6434218 -2.7515353
[13] -1.7765820 -2.1477539 -0.9286282 -0.9286282  2.3136349  1.9736255
[19]  1.5645131 -3.0783251  2.2894558 -0.9895549 -2.6021528 -2.2115872
[25] -0.5065612  0.4054651  0.3683925  0.4768340 -0.3988495 -2.0527262
[31] -1.0563897 -1.9199970 -0.4895482 -0.2626496 -3.1988907 -2.0527263
[37] -1.8489179 -3.4422774 -3.6233148  2.2540581 -0.3540089

> pyo.obs
 [1]   352.00000  1081.00000   527.00000   197.00000   374.50846   139.52221
 [7]  1101.00000   929.00000   196.00000   199.00000   201.00000   199.00000
[13]   420.19231   608.16667 19531.00000 20530.00000   154.10000   154.90000
[19]   146.20000 12061.00000   112.00000   110.50000   112.00000   123.00000
[25]  1637.21000  1564.02143    73.63000   454.10000   152.02466    94.00000
[31]  2316.25000   370.40385   795.61876    52.44231  2542.50000  1113.00000
[37]   547.16986   819.94247   120.25479    68.66667  1075.83333

> inc.obs
 [1]  486.00000 1048.00000  263.00000   70.00000  696.00000  199.00000
 [7]   27.00000   14.00000  262.00000  271.00000  379.00000  197.00000
[13]  552.00000  128.00000 8893.00000 5871.00000  157.00000   90.00000
[19]  121.00000  224.00000  113.90230   16.95521    3.02445   51.00000
[25]  800.35249  564.31589  175.00000 1071.00000 1058.00000  120.00000
[31]  626.00000   53.00000  130.00000   30.00000 3795.00000 1265.00000
[37]  358.00000   40.00000   66.00000  132.00000  188.00000

> y
 [1]  0.32257745 -0.03100295 -0.69504652 -1.03470849  0.61973528  0.35508102
 [7] -3.70813727 -4.19505141  0.29022984  0.30881400  0.63423130 -0.01010110
[13]  0.27283556 -1.55841870 -0.78673850 -1.25186225  0.01864406 -0.54297008
[19] -0.18918500 -3.98608633  0.01684215 -1.87444030 -3.61176961 -0.88035872
[25] -0.71569661 -1.01940143  0.86573342  0.85803063  1.94009288  0.24419696
[31] -1.30835441 -1.94430197 -1.81158568 -0.55851628  0.40053656  0.12801305
[37] -0.42422630 -3.02035472 -0.59995804  0.65353804 -1.74440887


One Response to A Bayesian Nadaraya-Watson estimator?

  1. On a tip-off from Yee Whye I’ve found the correct search terms to identify work in more-or-less the same direction (e.g. http://www3.stat.sinica.edu.tw/statistica/oldpdf/A20n48.pdf), but I think my toy model is not exactly this.
