Once again there was a nice synchronicity between the astronomy and stats astro-ph this week: this time involving kernel-based methods. The first kernel paper was by Zuluaga et al., investigating the use of a kernel-based metric to define a discrepancy distance between real and mock datasets in ABC (Approximate Bayesian Computation). Judging from the authors’ experiments on the example problem from Wood’s Nature paper, this appears to be a sensible choice for some types of ABC analysis; though admittedly I am favourably biased in this regard, having had the same idea (which I never followed up) after seeing this distance used in Minsker et al.’s subset posterior Bayes paper last year. Although it’s not taken up in Zuluaga et al.’s paper, there are a number of attractive features of embeddings in reproducing kernel Hilbert spaces that make convergence proofs fall out ‘easily’ (quote-unquote; see the many pages of working in Minsker et al.!) for a variety of statistical problems.
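For readers unfamiliar with the idea, the flavour of a kernel-based discrepancy inside rejection ABC can be sketched in a few lines. This is a minimal toy illustration, not the construction from Zuluaga et al.’s paper: I assume a Gaussian kernel, a simple (biased, V-statistic) MMD estimate, and a one-parameter Normal model of my own choosing.

```python
import numpy as np

def mmd2(x, y, bandwidth=1.0):
    # Biased (V-statistic) estimate of the squared maximum mean discrepancy
    # between 1D samples x and y under a Gaussian kernel.
    def gram(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2 * gram(x, y).mean()

def abc_rejection(observed, simulate, prior_draw, n_draws=2000, keep_frac=0.02):
    # Toy rejection ABC: keep the prior draws whose mock datasets lie
    # closest to the observed data under the MMD discrepancy.
    thetas = np.array([prior_draw() for _ in range(n_draws)])
    dists = np.array([mmd2(observed, simulate(t)) for t in thetas])
    cutoff = np.quantile(dists, keep_frac)
    return thetas[dists <= cutoff]

# Toy inference problem: recover the mean of a Normal(mu, 1) population.
rng = np.random.default_rng(0)
observed = rng.normal(2.0, 1.0, size=100)
posterior = abc_rejection(
    observed,
    simulate=lambda mu: rng.normal(mu, 1.0, size=100),
    prior_draw=lambda: rng.uniform(-5.0, 5.0),
)
```

The appeal (and the connection to the RKHS theory mentioned above) is that the MMD compares the full empirical distributions of the real and mock datasets, rather than a hand-picked vector of summary statistics.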
The astronomical focus on kernel-based methods this week concerned the estimation of photometric redshifts; specifically, the estimation of the distribution of redshifts (effectively ‘line-of-sight distances’ for non-astronomers) of an entire population of objects, given some photometric measurements plus a training dataset of objects at precisely known (spectroscopic) redshifts with the same photometric measurements. A model-free methodology for tackling this problem is of course to use some kind of classic non-parametric estimator built on the kernel-based distance from the training data to the targets of the prediction. And indeed this is the approach explored by Rau et al. through the machine learning toolkits of quantile regression forests and ordinal class PDF estimation.
While I don’t claim to be an expert on machine learning techniques, I had both good and bad impressions of the Rau et al. paper. The good: there seemed to be quite some care put into choosing a sensible evaluation metric for comparing the different tools considered here (the authors’ choice being the KL divergence, imprecisely called a ‘distance’) and into bandwidth selection. The bad: (1) the quantile regression trees are discussed only with a mean squared error loss function, which wouldn’t make them quantile regression trees at all! And (2) I couldn’t find the word ‘boosting’ in the discussion of regression trees, the difference between boosted and ordinary regression trees being (in my very loose understanding of these machine learning tools) like the difference between conventional weapons and a nuclear bomb. Feedback from an expert reader might well be appreciated here.
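On point (1), the distinction is easy to demonstrate numerically: squared error is minimised by the conditional mean, whereas the quantile (‘pinball’) loss at level tau is minimised by the tau-th quantile, which is the whole point of quantile regression. A small sketch with skewed data, where the mean and median disagree (my own toy example, not anything from the paper):

```python
import numpy as np

def pinball_loss(y, q, tau):
    # Quantile ('pinball') loss: minimised in expectation by the tau-th
    # quantile of y, whereas squared error is minimised by the mean.
    r = y - q
    return np.mean(np.maximum(tau * r, (tau - 1.0) * r))

# Lognormal samples: median = exp(0) = 1, mean = exp(0.5) ~ 1.65.
rng = np.random.default_rng(2)
y = rng.lognormal(0.0, 1.0, size=100_000)

# Minimise each loss over a grid of constant predictors.
candidates = np.linspace(0.5, 2.5, 201)
mse_argmin = candidates[np.argmin([np.mean((y - c) ** 2) for c in candidates])]
pinball_argmin = candidates[np.argmin([pinball_loss(y, c, 0.5) for c in candidates])]
```

Here `mse_argmin` lands near the mean and `pinball_argmin` near the median: a tree fit under squared error is estimating conditional means, not quantiles.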
Happy holiday of the sexy pagan goddess to one and all!