A recent arXival brought to my attention some interesting work (including an author who’s a friend of mine from our St Andrews days, so some personal bias here) modelling the time-series and frequency-dependent time-lags of X-ray sources. The model is described most clearly in Zoghbi et al. (2013), which builds on earlier work by Miller et al. (2010). In this field it is observed that a sensible statistical description of the X-ray time-series in a given band can be made via the shape of the power spectrum in the frequency domain, and that mapping a parametric form of this spectrum back to the time domain defines the covariance function for a Gaussian process, making likelihood-based inference straightforward. This is interesting because the technique by which the mapping between frequency space and the time domain is numerically approximated is equivalent to the approximation underlying the random Fourier features method from machine learning. Inference in Zoghbi et al. (2013) is via maximum likelihood in the time domain, but observing more directly the connection to random Fourier features suggests options for developing the representation for computational efficiency, or for including non-stationary covariance functions. To estimate the time-lags between different frequencies the authors introduce an explicit model for the cross-covariance function, and I wonder here whether there is also some capacity to play with Stein shrinkage via an explicit random feature representation.
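To make the equivalence concrete: by Bochner's theorem the autocovariance is the Fourier transform of the power spectrum, so drawing frequencies from the normalised spectrum gives a Monte Carlo approximation of the kernel, which is exactly the random Fourier features construction. A minimal sketch, using an illustrative bending power-law spectrum (my own toy choice, not the exact parametric form used in the papers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative bending power-law PSD (a toy stand-in, not the papers' model)
def psd(f, f_bend=1e-3, alpha=2.0):
    return 1.0 / (1.0 + (f / f_bend) ** alpha)

# Draw frequencies from the normalised PSD by inverse-CDF sampling on a grid
f_grid = np.linspace(1e-5, 0.05, 20000)
weights = psd(f_grid)
cdf = np.cumsum(weights)
cdf /= cdf[-1]
M = 2000
freqs = np.interp(rng.uniform(size=M), cdf, f_grid)

# Monte Carlo approximation of the autocovariance at lag tau:
# k(tau) ~ E_f[cos(2 pi f tau)], with f drawn from the normalised PSD
def k_rff(tau):
    return np.cos(2 * np.pi * np.outer(np.atleast_1d(tau), freqs)).mean(axis=1)

# Equivalent explicit random feature map phi(t), so k(t, t') ~ phi(t) . phi(t');
# this is the finite-dimensional representation one could exploit for
# computational tricks (or shrinkage on the feature weights)
def phi(t):
    ang = 2 * np.pi * np.outer(np.atleast_1d(t), freqs)
    return np.hstack([np.cos(ang), np.sin(ang)]) / np.sqrt(M)
```

The inner product of feature vectors reproduces the sampled kernel exactly, since cos(a)cos(b) + sin(a)sin(b) = cos(a−b); that explicit finite basis is what opens the door to the efficiency and shrinkage ideas mentioned above.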
On the other hand, I’m confused about the purpose of the more recent arXival, since the author there proposes to throw away the explicit modelling of the cross-covariance, instead just using the GP as an interpolator to gap-fill the time-series and register them to a common grid, after which the inconsistent empirical estimator of the lag-frequency function is applied to draws from the posterior predictive. This is similar to the approach taken by the H0LiCOW team for estimating time delays in their lensing systems, although there they use only the posterior mean GP path. In both cases, it seems strange not to want to finesse estimation of the delay along the bias-variance trade-off using a joint/complete Bayesian model, especially when we know that posterior functionals of the GP are typically themselves biased and in pathological cases can be disastrous! I also didn’t really understand the point of Section 3, in which different kernels are examined empirically to see which is best at representing the process resulting from a power spectrum of given frequency dependence, since we can already just look this up from a table of correspondences (e.g. Table 1 here).
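For concreteness, here is a toy sketch of the interpolate-then-estimate recipe being criticised: fit independent GPs to two irregularly sampled series, draw posterior predictive paths on a common grid, and apply the plug-in cross-correlation lag estimator to each draw. The kernel, its parameters, and the toy signal are all invented for illustration (a squared-exponential kernel rather than anything derived from a physical power spectrum):

```python
import numpy as np

rng = np.random.default_rng(1)

# Squared-exponential kernel (an illustrative choice, not a physical one)
def rbf(t1, t2, amp=1.0, ell=5.0):
    return amp * np.exp(-0.5 * (t1[:, None] - t2[None, :]) ** 2 / ell**2)

# Standard GP regression posterior on a prediction grid t_star
def gp_predict(t_obs, y, t_star, noise=0.05):
    K = rbf(t_obs, t_obs) + noise**2 * np.eye(len(t_obs))
    Ks = rbf(t_star, t_obs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks.T)
    return Ks @ alpha, rbf(t_star, t_star) - v.T @ v

# Toy lagged signal observed on two different irregular grids
true_lag = 3.0
t1 = np.sort(rng.uniform(0, 100, 60))
t2 = np.sort(rng.uniform(0, 100, 60))
signal = lambda t: np.sin(2 * np.pi * t / 20.0)
y1 = signal(t1) + 0.05 * rng.standard_normal(60)
y2 = signal(t2 - true_lag) + 0.05 * rng.standard_normal(60)

# Register both series to a common grid via GP posterior draws, then apply
# the empirical (plug-in) cross-correlation lag estimator to each draw
grid = np.linspace(5, 95, 181)  # spacing 0.5
mu1, c1 = gp_predict(t1, y1, grid)
mu2, c2 = gp_predict(t2, y2, grid)
jitter = 1e-9 * np.eye(len(grid))
shift = np.arange(-len(grid) + 1, len(grid)) * (grid[1] - grid[0])
lags = []
for _ in range(50):
    d1 = rng.multivariate_normal(mu1, c1 + jitter)
    d2 = rng.multivariate_normal(mu2, c2 + jitter)
    xc = np.correlate(d2 - d2.mean(), d1 - d1.mean(), mode="full")
    lags.append(shift[np.argmax(xc)])
lag_est = np.median(lags)
```

The point of the sketch is that the lag enters nowhere in the model: it is read off a nonparametric functional of independently fitted posteriors, so nothing in the inference machinery trades bias against variance for the quantity actually of interest, which is what a joint model of the cross-covariance would buy.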