A fairly common technique in astronomical data analysis is to estimate the impact of some particular physical process on a nominated observable (e.g. the impact of AGN activity on host spheroid SFR) from the difference in the mean value of that observable between two samples, one that has experienced the process (e.g. AGN hosts) and one that hasn’t (e.g. AGN-less galaxies), *matched* pairwise on a series of other possible explanatory variables (e.g. galaxy mass, environment, Hubble type). Examples include: Martini et al. 2003, Gabor et al. 2009, Silverman et al. 2011, Kocevski et al. 2012 [and many more].

The motivation for this technique is quite intuitive and, one would imagine, readily appreciable even to non-scientists, perhaps even more so if we describe the problem in its more common context: medical research. Here we want to estimate the typical impact of a given treatment on population health, but for ethical or practical reasons we cannot run a randomized controlled clinical trial, so instead we look for two otherwise identical groups in the population who have and have not, respectively, experienced this treatment. For instance, we might aim to estimate the mean difference in life expectancy between smokers and non-smokers after controlling for other lifestyle factors (e.g. alcohol consumption, physical activity) that might skew our results.

There is, however, one major difficulty for such studies, which typically arises when the number of possible confounding factors is large: with limited data it can become very difficult to find a sufficiently large control sample in which each member is well matched, on every property, to a counterpart in our treatment group. (A related difficulty is the seeming arbitrariness of the distance metric used to define similarity in this context; for continuous random variables exact matches have probability zero, of course, so we must accept near-matches.) For this reason it is common in population health studies to match instead on the propensity score, defined in somewhat the reverse sense to the original matching as “the conditional probability of assignment to a particular treatment given a vector of observed covariates”. The classic paper is Rosenbaum and Rubin (1983): http://biomet.oxfordjournals.org/content/70/1/41.short . Under certain (not particularly restrictive) conditions, chiefly that treatment assignment depends only on the observed covariates so that there are no unmeasured confounders, it may be shown that propensity score matching will give unbiased results, just like for the “ordinary” pairwise matching.
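To make the idea concrete, here is a minimal sketch in Python (NumPy only) run on simulated data. Everything in it is an assumption made for illustration rather than anything taken from Rosenbaum and Rubin or from the astronomical papers above: the function names `fit_propensity` and `matched_effect` are hypothetical, the propensity score e(x) = P(treated | x) is modelled with a plain logistic regression, and the matching is one-to-one nearest neighbour on the score, with replacement. Other choices (calipers, stratification, weighting) are equally common.

```python
import numpy as np

def fit_propensity(X, t, n_iter=25, ridge=1e-8):
    """Estimate e(x) = P(t = 1 | x) with a logistic regression fitted by Newton's method.

    Assumption: a logistic model linear in the covariates is adequate.
    """
    A = np.column_stack([np.ones(len(X)), X])   # design matrix with intercept
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))
        w = p * (1.0 - p)
        grad = A.T @ (t - p)
        # small ridge term keeps the Hessian invertible
        hess = (A * w[:, None]).T @ A + ridge * np.eye(A.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-(A @ beta)))

def matched_effect(y, t, score):
    """Mean treated-minus-control difference in y, pairing each treated unit
    with the control whose propensity score is closest (with replacement)."""
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    gaps = np.abs(score[treated][:, None] - score[control][None, :])
    nearest = control[gaps.argmin(axis=1)]
    return float(np.mean(y[treated] - y[nearest]))

# Simulated example: three confounders drive both "treatment" and outcome.
rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 3))
true_effect = 1.0
p_true = 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2])))
t = (rng.random(n) < p_true).astype(int)
y = (2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2]
     + true_effect * t + rng.normal(scale=0.5, size=n))

naive = float(y[t == 1].mean() - y[t == 0].mean())   # ignores confounding
score = fit_propensity(X, t)
att = matched_effect(y, t, score)

print(f"true effect:      {true_effect:.2f}")
print(f"naive difference: {naive:.2f}")
print(f"matched estimate: {att:.2f}")
```

In this toy setup the naive difference of means is contaminated by the confounders, whereas the matched estimate should land much closer to the injected effect. Note that the matching reduces a three-dimensional comparison to a one-dimensional one, which is the whole appeal when the list of confounders is long.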

This seems like it would be *very* useful for astronomical studies. So I was wondering: has this technique in fact already been used in astronomy (i.e. I just don’t know about it)? And, if not, why is it not yet popular?

[One barrier to its use in astronomy may well be the obscure notation used in the Rosenbaum and Rubin paper; in order to make sense of it for myself I had to re-derive a number of their proofs from scratch! But, then again, there are more modern reviews of propensity score matching which overcome this problem too.]