A bold proclamation by Pflamm-Altenburg et al. today (link) that the theory of pure stochastic star formation for the initial cluster (stellar) mass function (ICMF) is ruled out by the Sharma et al. (2011) mass – radius dataset.

The ICMF is thought to be something like a power-law distribution with slope beta=2 (i.e., p(m) proportional to M^-beta). For beta>=2 we should take it as implicit that there exists some lower mass threshold to this distribution such that the density is normalizable; however, for practical purposes we can only observe down to a certain lower mass limit due to selection effects and so on. While an upper limit is unnecessary from such normalization arguments there seems to be an assumption in the literature that one exists (physics, I guess), and the major debate (at least in this paper) therefore centers on whether or not this upper limit is the same in all environments. PA et al. aim to demonstrate that the upper limit depends on galactocentric distance.

Their first (correct) step is to observe that if one looks at fixed bins of galactocentric distance then the distribution of the i-th most massive star will follow the distribution of the i-th order statistic, and therefore will depend on the number of clusters in the bin. Therefore, if we’re trying to figure out what the mass cut-off is doing by comparing, say, the distributions of i-th ranked stars we’d better choose equal-sized bins. So far so good.

To this end PA et al. examine the masses of each of the 1st, 2nd, …, 5th ranked clusters in a series of galactocentric radius bins of equal size (i.e., equal number of clusters per bin); this is shown in their Fig 3. Looking at this data it does seem like at least the 2nd, …, 5th ranked masses are declining with galactocentric distance, although the 1st ranked (most massive) appears relatively independent of distance. *To my mind this would suggest that it’s not necessarily the upper mass limit we’d better be worrying about (contrary to the title of their paper), but rather the beta of the power law that seems to be changing.*

To quantify these observations PA et al perform a series of least squares linear fits to the trends of 1st, …, 5th ranked masses against galactocentric distance. This is the second point of departure between my thinking and theirs. If you’re going to look at trends in quantiles you better use ** quantile regression** (cf. anything by Roger Koenker [who’s a thoroughly nice guy, btw]). This avoids binning and gives the appropriate weighting to each datapoint: whereas least squares takes a (x-y)^2 distance, quantile regression takes an |x-y| absolute difference instead.

In the figure below I show the results of a quantile regression applied to the data from JA et al. At each quantile value we fit an intercept and slope and plot the p-value of the slope significance (recovered by bootstrapping in the quantreg package for R). A significant slope has p < 0.05; while at p > 0.1 we’re probably just fitting noise. Reassuringly, given the amount of MC simulations JA et al did to verify their fitting significances we get relatively similar results: there does seem to be a highly significant change in the distribution of cluster masses around the intermediate quantiles (not the upper mass limit) in this dataset.

However, the JA et al. analysis is focussed entirely on what’s going on above the 0.7 quantile (5th most massive star out of 17). The point they neglect is that these results depend most sensitively on the reliability of our mass estimates and completeness of the sample *at the low mass end! *That is, there’s a lot riding on their assumption that 600 Msun is an appropriate completeness limit for this sample at all galactocentric distances. If we truncate our sample at slightly higher mass limits of 1000 Msun or 1500 Msun the significance of these results is not nearly as impressive (the lower panels of the above figure). True, part of the loss of significance goes with the reduction in sample size, but not all of it. *The point is that with these power law distribution functions completeness and Malmquist-type biases are something to be very concerned with, even when we’re focussing on the distribution of the upper i-th ranked masses*.

For this reason I really do wonder if JA et al might one day find themselves rueing the hubris of their couplet: “Pure stochastic star formation is thereby ruled out. We use this example to elucidate how naive analysis of data can lead to unphysical conclusions.” *Hmmm..*.

Nice post 🙂

Here’s a couple of my thoughts on the topic. First of all, finding a non-universal ICMF is not the same as ruling out stochasticity of star formation. Their definition of stochastic star formation has always been that the total distribution function of stellar masses in all clusters thrown together is the same as the distribution function underlying each and every cluster (and it could be different, for example through their ‘sorted sampling’ of the stellar IMF in cluster following a steep ICMF). Stellar masses are likely set by very small scale physics, whereas the cluster masses are more likely set by their galactic environment.

All well, let’s go on with their actual claim, that the ICMF depends on galactocentric radius (something that has been claimed by others, including me in my very first, and slightly naive paper). I fully agree that the lower mass end of the cluster masses is an important ingredient in such studies, look e.g. at their own eq.2…. There are a lot of concerns with the cluster sample. They only use young clusters, which cluster strongly, so confusion/blending may be a major issue. Completeness may be fine, although for those cluster with a large radius they may get the mass wrong, as it is hard to estimate what light belongs to the cluster and what doesn’t (star clusters are marginally resolved at the distance of M33). A wrong estimate of the dust extinction (which is highly degenerate with age and metallicity) might exclude some clusters from this young sample, at a probability that is higher for more embedded, and potentially more massive clusters.

I won’t at all argue with your statistical analysis. I love your final paragraph. How did you get their data? You just read them all off from Fig. 3? Would it be worth contacting the authors? It seems that this paper already is accepted, by a referee who isn’t even acknowledged…

Interesting … I guess what is really required is to fold in some estimates of all these sources of uncertainty and possible bias (the confusion, the dust, etc.). I made a rough version of the data by opening their Fig 2 (from the astro ph source file) in adobe illustrator, deleting everything but the datapoints themselves, saving as an eps (which gives x-y coordinates for the points relative to some arbitrary point on the page, then rescaling these points to match the limits of the graph!

I think it would indeed be worth asking for the original catalogue from the authors (possibly before the truncation) and trying to cross match with the “original” data paper to flag for quality of the multiband photometry (and hence for quality of the mass estimates).

I can email them. If you want a cc, then please give me an email addy somehow (private fb message?). Shall I refer them to your post here, or would you rather have me being descriptive in the idea and ask for the data without showing that you basically already proved them wrong?

Hi Marcel, yes, please cc me; I’ll send you my email address on facebook. I already sent a link of the blog post to my friend, Carsten Weidner, who collaborates with Kroupa et al. so emailing them the link at the same time as the request is probably superfluous. Looking forward to seeing what this full dataset looks like …

Pingback: Another MNRAS stats disaster … | Another Astrostatistics Blog