On Poisson Point Processes (PPPs) in astronomy …

All quiet on the blog for the past few weeks on account of multiple trips abroad (including visits to the Institute for Disease Modeling, the Institute for Health Metrics Evaluation & the Bill & Melinda Gates Foundation in Seattle, and the Swiss Tropical & Public Health Institute in Basel). Astronomers wondering what sort of role BigData™ ideas play in healthcare economics might well be interested in the brand new biography of IHME’s Chris Murray, Epic Measures.

Having thus been so busy with travel & work I could find time to do little more than glance over astro-ph each day, with three results: (1) that I have a few papers downloaded to read in the future; (2) that I went apeshit when I saw a paper (re-)introducing ABC for astronomers that managed to ignore all past ABC work in astronomy including my own (cf. Xian’s Og for some context); and finally (3) that I found cause to dig into the definition of the Poisson Point Process, as I will describe below.

The beginning of my saga is with Mario Gennaro et al.’s recent arXival presenting “A new method for deriving the stellar birth function of resolved stellar populations”, in which the authors take a Poisson Point Process approach to constructing a likelihood function for observations that might be used to reconstruct the global population parameters of a collection of stars. Personally I find this presentation (of starting with a complex PPP form for the upper layer of the eventual hierarchical likelihood) to be more confusing than necessary in comparison to the alternative (of starting from the bottom of the hierarchy to define the population level distribution). But as it yields the same solution, more power to the authors. One quirky detail in this paper that caught my eye and triggered an investigation leading down the rabbit hole of measure theory and Hausdorff metrics was their Eqn 6, which transcribes Eqn 2.11 from Streit’s 2010 book, “Poisson Point Processes” (whose notation I will use):

p_{\chi|N}(\{x_1,\ldots,x_n\}|n) = n! \prod_{j=1}^{n} p_X(x_j).

In Streit’s introductory book the Poisson Point Process on a compact subset R of S (with S more-or-less assumed to be \mathbb{R}^m) is defined via an algorithmic construction: first, one draws the integer number of points, n, from the ordinary Poisson distribution with mean parameter set equal to the integral of the intensity function, \lambda(s), over R; second, for n > 0, n points, (x_1,\ldots,x_n), are drawn iid according to the density, p_X(x), corresponding to a normalized version of \lambda(s); and third, (x_1,\ldots,x_n) is returned as the ‘unordered set’, \{x_1,\ldots,x_n\}. Elements of the space of such unordered sets (known to mathematical physicists as configuration space, C_nS) are denoted \chi (for given n) in Streit’s book to distinguish them from elements of the space of ordered n-tuples, X. Eqn 2.11 above is therefore presented as the conditional pdf of \chi|N\ (=n). However, from a measure theory (and, hence, formal probability theory) perspective it seems to me the statement of Eqn 2.11 is meaningless.
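The three-step construction above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions of my own choosing, none of which come from Streit's book: a one-dimensional region R = [0, 1], a toy intensity \lambda(s) = c s (so the integral over R is c/2 and the normalized density is p_X(x) = 2x), and a hypothetical function name `sample_ppp`.

```python
import numpy as np

def sample_ppp(c, rng):
    """One realisation of a PPP on R = [0, 1] with toy intensity lambda(s) = c * s."""
    mean_count = c / 2.0             # integral of c * s over [0, 1]
    n = rng.poisson(mean_count)      # step 1: draw the number of points
    u = rng.uniform(size=n)
    x = np.sqrt(u)                   # step 2: n iid draws from p_X(x) = 2x (inverse CDF)
    return frozenset(x.tolist())     # step 3: return the points with labels discarded

rng = np.random.default_rng(0)
realisation = sample_ppp(c=20.0, rng=rng)
```

Returning a `frozenset` is one way to mimic the 'unordered set' of step three: the labels (array positions) are simply no longer available to the caller.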

To see what’s going on we first need to observe what is meant by ‘ordered’ here. The ordered n-tuple (x_1,\ldots,x_n) places the simulated x_i in the order in which they were simulated: so we’re not talking [yet] about a total ordering on the space, S; rather we’re talking about having ‘visible’ the labels, (1,\ldots,n), that tell us which x_i is which. When we construct \{x_1,\ldots,x_n\} we simply lose the labels. Now the measure P_{\chi} (dropping the |N notation reminding us of the conditioning on N), of which p_\chi might be thought of as a density with respect to some underlying measure on the configuration space, C_nS (though I know of no automatic equivalent to Lebesgue measure on this space), is the measure induced from our base measure, P_X, on the space of ordered n-tuples (x_1,\ldots,x_n), S^n. To understand how this induced measure works (following the classic Halmos approach) we introduce the measurable mapping H:\ (B[S^n],(x^n)) \rightarrow (B[C_nS],\{x^n\}) given by the label-losing operation. Here the definition of measurable is that the pullback of any set in B[C_nS] is in B[S^n], with B[] denoting the generated Borel sigma algebra on the respective metric spaces (taking Euclidean distance for S^n and the equivalent Hausdorff distance for C_nS), and with the first of those algebra-set pairs having the base measure P_X to form a probability triple. For any B[C_nS]-measurable function, F(\{x^n\}), and any set G in B[C_nS], integration against the induced measure is evaluated as

\int_G F(\{x^n\})dP_\chi(\{x^n\}) = \int_{H^{-1}(G)} F(H(x^n))dP_X((x^n))

with the latter equal to \int_{H^{-1}(G)}F(H(x^n))\prod_{j=1}^n p_X(x_j)dx_j. With a little visualisation one can see that for some small sets, which we’ll denote K, the pullback operation effectively makes n! symmetric subsets in S^n (call these copies of k, where k is, say, the copy containing the n-tuples of lowest x_1 value) which have equal measure, so that P_\chi(K) = n! P_X(k). But generally this isn’t true: take for instance the whole of C_nS, for which the induced measure obviously evaluates to one.
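A quick Monte Carlo check makes the n! copies concrete. The setup here is a toy assumption of mine rather than anything from Streit or Gennaro et al.: n = 2, S = [0, 1], p_X uniform, and K taken to be the set of unordered pairs with one point in [0, 0.1] and the other in [0.5, 0.6], so that its pullback is two mirror-image rectangles in S^2.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 400_000
x = rng.uniform(size=(n_draws, 2))      # labelled pairs (x_1, x_2), p_X uniform on [0, 1]

# Label-free description of each pair {x_1, x_2}: its smaller and larger element.
lo, hi = x.min(axis=1), x.max(axis=1)

# K: unordered pairs with one point in [0, 0.1] and the other in [0.5, 0.6].
in_K = (lo <= 0.1) & (hi >= 0.5) & (hi <= 0.6)
# k: the single labelled copy with x_1 in [0, 0.1] and x_2 in [0.5, 0.6].
in_k = (x[:, 0] <= 0.1) & (x[:, 1] >= 0.5) & (x[:, 1] <= 0.6)

p_K = in_K.mean()   # Monte Carlo estimate of P_chi(K); exact value 2 * 0.01
p_k = in_k.mean()   # Monte Carlo estimate of P_X(k); exact value 0.1 * 0.1
```

For this K the estimates satisfy p_K ≈ 2! p_k, whereas for the whole space the same bookkeeping would give 2! P_X(S^2) = 2, which cannot be a probability.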

Interestingly, if we were to make a measure that would have for every measurable set, G, the relation P_\chi(H^{-1}(G)) = P_{X'}(G) = n! \int_{G} \prod_{j=1}^n p_X(x_j)dx_j it would have to be induced from the ordered n-tuple X' : (x_{(1)},\ldots,x_{(n)}) where here the ordering no longer refers to labelling but to a total ordering on the space S. For one-dimensional S this could be simply the real line ordering, but for multi-dimensional S we would need to introduce some function to define the ordering (playing the role of the likelihood in nested sampling).
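As a sanity check that the n! factor is exactly what the totally ordered tuple needs, here is the smallest case worked through. The choices n = 2, S = [0, 1] and uniform p_X are illustrative assumptions of mine, not taken from either reference. Integrating over the whole ordered region gives

```latex
\begin{aligned}
P_{X'}\bigl(\{x_{(1)} < x_{(2)}\}\bigr)
  &= 2! \int_0^1 \int_0^{x_{(2)}} p_X(x_{(1)})\, p_X(x_{(2)})\, dx_{(1)}\, dx_{(2)} \\
  &= 2 \int_0^1 x_{(2)}\, dx_{(2)} \\
  &= 1 ,
\end{aligned}
```

so the density n! \prod_j p_X(x_{(j)}) is properly normalized on the ordered region, which has only 1/n! of the volume of S^n.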

Is any of this important? Not really: in Bayesian parameter estimation (for a parametric model of the underlying intensity function) any normalizing constants in front of the likelihood function drop out, so applying either formula 2.11 or 2.12 as if they apply to S^n with labels known or unknown will give the same result. Likewise, in any case Streit comes by (IMO) fudgy means to a formula in Eqn 2.23 that will (if we close our eyes and forget we don’t know the labels) allow us to solve for posterior functionals. But it does give me an idea of how I’d improve the constructivist approach to introducing the PPP (to readers uninterested in measure or stochastic process theory): do Streit’s algorithm steps (1) and (2), but then instead of (3), which returns the unordered set, I’d introduce a random label switching operation to return an n-tuple (y_1,\ldots,y_n) being a random permutation of (x_1,\ldots,x_n). This would remove all confusion about ordering (i.e. whether it means label-switching or total ordering), remove all confusion about n! factors in densities, and give a neat explanation for the symmetrization operation of Streit’s Eqn 2.24.
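A minimal sketch of this modified construction might look as follows. The function name `sample_ppp_shuffled`, the `draw_points` callback and the toy density p_X(x) = 2x on [0, 1] are all hypothetical choices made for illustration; only the three steps themselves follow the description above.

```python
import numpy as np

def sample_ppp_shuffled(mean_count, draw_points, rng):
    """Steps (1) and (2) of the construction, then a random relabelling in place of (3)."""
    n = rng.poisson(mean_count)     # step 1: number of points
    x = draw_points(n, rng)         # step 2: labelled iid draws (x_1, ..., x_n)
    y = rng.permutation(x)          # proposed step 3: random label switching
    return y                        # an ordinary n-tuple (y_1, ..., y_n)

def draw_toy_points(n, rng):
    # Hypothetical p_X(x) = 2x on [0, 1], sampled by inverse CDF.
    return np.sqrt(rng.uniform(size=n))

y = sample_ppp_shuffled(10.0, draw_toy_points, np.random.default_rng(2))
```

Because the x_j are iid, the shuffled tuple still has density \prod_j p_X(y_j) on S^n, with no n! in sight, while its position labels carry no information about the order of simulation.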

Much respect to anyone who actually read this far!
