[Non stats post] I love Australia, but seriously, f*£k Australia!

Apologies for the off-topic post, but as an Australian blogger I thought it important to give emphasis to the latest evidence of sadistic abuse going on under the authority of the Australian people at our offshore detention centres: The Nauru Files.  The non-Aboriginal peoples of our country have long had a difficult relationship with other new arrivals, from the Lambing Flat riots to the White Australia policy, and the widespread racism experienced by the ‘wogs’ who settled here after the Second World War and the ‘gooks’ who arrived after the Vietnam War.  In recent times (that is, for at least the past fifteen years) our politicians have had great success in harnessing racist attitudes amongst the population by whipping up fear of boat people and promising ever tougher measures to deter the huddled masses from attempting passage to Australia by boat.  A case in point is the Tampa affair, which saw John Howard win huge support for refusing entry to Australian waters to a Norwegian commercial vessel that had rescued 438 refugees (mostly Afghans) from a sinking ship in international waters between Australia and Indonesia.

herd.jpg
The song “77%” by The Herd reflects on the Tampa Affair.

Since the Tampa Affair both sides of politics—namely, the Australian Labor Party and the (ironically named) Liberal Party of Australia—have taken a hardline attitude to ‘illegal’ arrivals of refugees by boat, building up the myth that we need a strong deterrent against those seeking to ‘jump the queue’ of ‘legal’ refugees.  The prime focus of this deterrent has been the consignment of ‘illegal’ arrivals by boat to detention centres on Nauru (a tiny Micronesian republic) and Manus Island (part of Papua New Guinea).  Here the asylum seekers (including women and children) are held for years at a time while their cases are ‘investigated’ to determine whether or not the government will grant them refugee status.  The living conditions in these internment camps are squalid, the guards are known to harass the women and kids for sexual favours, and rates of mental illness amongst detainees are consequently staggeringly high.  In PNG homosexuality is illegal, and gay refugees on Manus Island live in fear for their safety.  The latest leak of documents from the Nauru detention centre paints a shocking picture which, it has been argued, amounts to raw evidence of torture.

As much as I feel pride in my fellow Aussies when I see them doing well at the Olympics or excelling in science and the arts, I feel just as much shame at our dickhead attitude towards refugees.

I’ll finish here with some thought-provoking words from “The Phillip Ruddock Blues” by a great Aussie band called TISM:

Why should we let towel-heads in
’cause their ships won’t float?
what other race has ever come to Australia on a boat?
and if self-interest should rule
five miles out from shore
why the hell don’t it apply to those who live next door?

tism.jpg


Learning GP covariance matrices by example …

Another astro ph submission that I’m reviewing a few weeks late here is this one by Garnett et al., entitled “Detecting Damped Lyman-\alpha Absorbers with Gaussian Processes”.  The application here is pretty much what it says in the title; that is, probabilistic classification of QSO spectra into those containing DLAs and those without (plus, of course, an estimation of the redshift and column density of any candidate absorption systems).  Under the hood this is a straightforward Bayesian model selection problem in which the underlying quasar spectrum is modelled as the realisation of a Gaussian process and any DLAs as Voigt profiles acting multiplicatively on top.

The strength of the analysis here is that a series of sensible decisions are made to yield a fast and effective implementation.  In particular, (i) a large training dataset of labelled SDSS quasar spectra is used to learn a reasonable mean and covariance function for quasars without DLAs, with the covariance represented via a low-rank approximation; (ii) the Woodbury identity and Sylvester determinant theorem are used to make matrix operations on the Gaussian efficient; and (iii) the two parameters of each absorber are integrated out numerically via a quasi-Monte Carlo ‘grid’ (FYI: Gerber & Chopin give some useful notes on the relative performance of MC and QMC methods at given dimension in the introduction to their SQMC read paper).  While it may not be common in purely observational data analysis problems to have a pre-labelled catalogue against which to learn a covariance function, it is often the case in astronomy that one has a reference set of simulated mock datasets against which a covariance can be learnt, so this approach could well be useful in a number of other settings: for example, to learn the (presumably non-stationary) spatial covariance function underlying a log-Gaussian Cox process of the galaxy distribution in a magnitude-limited survey.
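To make (ii) concrete, here’s a minimal numpy sketch (my own illustration, not the authors’ code) of evaluating a Gaussian log-likelihood under a diagonal-plus-low-rank covariance: the Woodbury identity handles the inverse and the Sylvester/matrix determinant lemma the determinant, so nothing bigger than a k x k system is ever factorised.

import numpy as np

def lowrank_gaussian_loglike(y, mu, U, d):
    # log N(y | mu, diag(d) + U U^T), with U an (n, k) low-rank factor
    # learnt from training spectra and d the per-pixel noise variances
    n, k = U.shape
    r = y - mu
    Dinv_r = r / d                          # D^{-1} r
    Dinv_U = U / d[:, None]                 # D^{-1} U
    C = np.eye(k) + U.T @ Dinv_U            # k x k 'capacitance' matrix
    L = np.linalg.cholesky(C)
    w = np.linalg.solve(L, U.T @ Dinv_r)    # so that w @ w = b^T C^{-1} b
    quad = r @ Dinv_r - w @ w               # Woodbury quadratic form
    logdet = 2.0 * np.log(np.diag(L)).sum() + np.log(d).sum()  # determinant lemma
    return -0.5 * (quad + logdet + n * np.log(2.0 * np.pi))

The cost is O(nk^2) rather than the O(n^3) of a dense Cholesky, which is what makes scanning many thousands of spectra feasible.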

Looking forwards, I wonder as to the validity of the assumed Gaussian likelihood (as proxy for Poisson photon counting statistics) as the method is rolled out to noisier spectra; of course, removing Gaussianity from the top layer of the likelihood function would mean analytic solutions are no longer available for key marginalisation steps, with presumably a massive increase in computational cost to the algorithm.

Minor notes:
– I was surprised to see an unweighted mean in Eqn 22: since each quasar will typically carry a different observational noise in its j-th wavelength bin, one would usually use inverse-variance weighting (see the formula after these notes).
– The term ‘unbiased’ is used improperly (from a statistical point of view) in describing the QMC integral of Eqn 57.
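For reference, the inverse-variance weighted estimate of the mean in the j-th bin that I have in mind (standard, though the notation here is mine rather than the paper’s) would be
\hat{\mu}_j = \left( \sum_i y_{ij}/\sigma_{ij}^2 \right) / \left( \sum_i 1/\sigma_{ij}^2 \right),
which reduces to the unweighted mean only when all the \sigma_{ij} are equal.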


Some Bayesian unsupervised learning …

I noticed the latest IFT group paper on astro-ph (as a replacement) this morning—“SOMBI*: Bayesian identification of parameter relations in unstructured cosmological data” by Frank et al.  [* which I imagine to be pronounced thusly]—and, since I have some free time while waiting for simulations to finish, I thought I would comment on this one for today’s blog entry.

The idea of the SOMBI code/model is to first ‘automatically identify data clusters in high-dimensional datasets via the Self Organizing Map’, and then second to recover parameter relations ‘by means of a Bayesian inference within respective identified data clusters’.  Notionally we are to imagine that ‘each sub-sample is drawn from one generation process and that the correlations corresponding to one process can be modeled by unique correlation functions.’  In effect, the authors cut their model such that the identification of clusters is not informed by the nature of any inferred correlation functions; nor is there any shrinkage achieved by a hierarchical structure of shared hyperparameters during the regression on each subset.  This seems to me an unfortunate choice because it turns out in the given application (and one would expect more generally) that some parameter relationships are not entirely dissimilar between groups (for which shrinkage would be sensible); likewise, two groups turning out to have similar relationships might sensibly be pooled back into one group in a model without cuts (and so forth).

For those interested the model for regression in each group is simply:
y \sim N(X'B,\sigma^2),
B_{1:m} \sim U(-\infty,\infty),
\sigma \sim U(0,\infty),
with X being a design matrix whose columns are the 1st through m-th powers of the independent variable, x.  The normal-uniform likelihood-prior choice allows explicit marginalisation over B, and the empirical Bayes solution for \sigma|m is adopted, with the polynomial order m chosen via the BIC.
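Here’s how I read that regression step, as a numpy paraphrase (assuming the standard Gaussian-likelihood form of the BIC and a least-squares fit for the flat-prior posterior mode; not the SOMBI code itself):

import numpy as np

def fit_poly_bic(x, y, m_max=6):
    # for each order m: least-squares B, plug-in ML sigma, then score by BIC
    n = len(y)
    best = None
    for m in range(1, m_max + 1):
        X = np.vander(x, m + 1, increasing=True)  # columns 1, x, ..., x^m
        B, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ B
        sigma2 = resid @ resid / n                # empirical Bayes / ML variance
        bic = n * np.log(sigma2) + (m + 2) * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, m, B, np.sqrt(sigma2))
    return best   # (BIC, order, coefficients, sigma) of the chosen model

(I’ve included an intercept column for stability; whether the paper’s X does likewise isn’t crucial to the point.)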

One criticism I would make is that this work is presented in vacuo without reference to the myriad other approaches to automatic data clustering and/or variable selection/regression already existing in the stats and ML literature.  Certainly the BIC has a long history of use for selection of regression coefficients in linear (and GLM) models and its properties are well understood; but these are typically in the context of Bayesian models with priors that inform the expected scale of power terms and without the free parameter of unknown \sigma.  (For reference, Gelman et al. recommend a standardisation of inputs before construction of power terms under their Cauchy prior structure for logistic regression).  Alternative penalties for automatic covariate selection are (e.g.) the LASSO and ridge regression, and Bayesian model selection with Zellner’s g-prior.  Likewise, there are numerous Bayesian schemes which aim to achieve clustering and regression inference without cuts to the model, typically involving a clustering over some latent space (with dimensional reduction; e.g. Titsias & Lawrence).  In particular, it would have been useful to see such a comparison performed over a common dataset (e.g. one from the default R package) as is common in NIPS submissions.
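For instance, the sort of stock-dataset baseline I have in mind takes only a few lines with off-the-shelf tools (a hypothetical illustration on scikit-learn’s diabetes data, standing in here for ‘a default R dataset’):

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)   # standardise inputs, per Gelman et al.

lasso = LassoCV(cv=5).fit(X, y)         # L1 penalty: automatic covariate selection
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)  # L2 shrinkage

print("LASSO kept", np.sum(lasso.coef_ != 0), "of", X.shape[1], "covariates")
print("ridge R^2:", ridge.score(X, y))

Reporting SOMBI’s regression step against baselines like these on a common dataset would make the comparison with the existing literature immediate.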


Once more unto the breach …

It’s been a long time between posts here at Another Astrostatistics Blog; there’s no single reason for my hiatus, more like a diverse series of distractions, including travel to Seattle for the IDM Disease Modelling Symposium and to Pula (Sardinia) for the ISBA World Meeting.  Both of these were excellent conferences and I will hopefully find the time to blog about some of the more interesting presentations as I remember them.  A more recent distraction has been my referee duties for NIPS 2016; of the seven manuscripts I reviewed from the fields of ‘ABC’, ‘non-parametric models’ and ‘Gaussian processes’, at least two could be important tools for astronomical data analyses—the \epsilon-free ABC approach of Papamakarios & Murray and the non-parametric (GP/spline-based) model of Schulam & Arora for clustering disease trajectory maps (i.e., irregularly-sampled time series).  The former has already had an unpublished application to the fitting of cosmological models, described by Iain Murray at the MaxEnt 2013 meeting, while the latter could well prove useful in the study of x-ray or \gamma-ray bursts (for instance).

As a show of good faith for loyal readers wondering if indeed I really have resumed blogging duties, I give below some thoughts upon reading the new paper by Licquia & Newman, in which the authors seek an ensemble estimate for the radial scale-length of the Milky Way disk through a meta-analysis of previously published estimates.  Methodologically this new paper is quite similar to their earlier meta-analysis study of the Milky Way’s stellar mass and SFR, albeit with an added component of Bayesian model averaging to allow for uncertainty in the optimum choice of model for their ‘bad measurement’ category.  By way of reminder, Licquia & Newman’s meta-analysis model is a mixture of two likelihoods—‘good’ studies contribute to the likelihood exactly as per their quoted uncertainties on the disk scale length in what amounts to a fixed-effect meta-analysis model, while ‘bad’ studies either don’t contribute at all or contribute in a weakened sense via a re-scaling of their error bars (depending on the model choice).

As with their earlier paper I have a positive impression of the analysis and implementation, and I think there remains great potential for meta-analysis studies to forge more precise estimates of key astronomical quantities from the many conflicting results reported in the literature.  However, while reading this manuscript I began to question whether the adopted meta-analysis model—the mixture of a fixed-effect component with an over-dispersed component—is philosophically the ‘right’ model for this purpose.  The fixed-effect model is typically used in (e.g.) clinical trial meta-analyses when it is believed that each separate study is targeting the same underlying parameter, with disagreements between estimates driven primarily by the sampling function rather than systematics like structural differences in the study cohorts.  In the case of the Milky Way there is ultimately only one galaxy with a fixed population of stars to observe, so the variation in estimated disk scale lengths is driven primarily by differences in the region of the disk included for study, treatment of dust extinction, the adopted galaxy model and the method of fitting it.  From this perspective it seems the answer sought requires a model for the variation introduced by different astronomers’ decisions regarding these analysis choices, for which a random-effects meta-analysis would seem more appropriate.  Or at least, this seems a question which deserves some further investigation if the community is to be convinced that meta-analyses are indeed a good thing for astronomy.
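For concreteness, the random-effects alternative I have in mind is the standard one (notation mine): each study i reports an estimate y_i with quoted error \sigma_i, but is taken to target its own effective scale length \theta_i, reflecting its particular analysis choices:
y_i \sim N(\theta_i, \sigma_i^2),
\theta_i \sim N(\mu, \tau^2),
with \mu the consensus disk scale length and \tau capturing the between-study variation introduced by those choices; the fixed-effect model is recovered in the \tau = 0 limit.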

One side note: Although it’s not discussed explicitly it seems to me that the authors’ computation of the marginal likelihood for each model and the posteriors in L_d and f_\mathrm{good} are based on summation of the mixture likelihood such that the ‘labels’ (i.e., ‘good’ or ‘bad’) are not sampled in the posterior representation, and hence each study cannot be assigned a posterior probability of being either ‘good’ or ‘bad’.  It’s natural to perform the computation in this manner to avoid having to perform Monte Carlo over a high dimensional discrete space (the labels) but it seems a shame not to be able to report the inferred ‘quality’ of each study in the results.  I wonder (not rhetorically) if there’s a simple a posteriori ‘trick’ that one can use to compute or approximate these scores?
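For what it’s worth, here’s a toy numpy version of the summed-over-labels mixture likelihood as I understand it (my paraphrase, with a single hypothetical error-bar inflation factor standing in for their family of ‘bad’ models), together with the standard mixture-model identity that would seem to give the per-study scores I’m after:

import numpy as np
from scipy.stats import norm

def mixture_loglike(Ld, f_good, y, sigma, inflate=3.0):
    # 'good' studies enter at face value; 'bad' ones with inflated errors;
    # summing over the labels analytically removes them from the posterior
    good = norm.pdf(y, loc=Ld, scale=sigma)
    bad = norm.pdf(y, loc=Ld, scale=inflate * sigma)
    return np.log(f_good * good + (1.0 - f_good) * bad).sum()

def p_good(Ld, f_good, y, sigma, inflate=3.0):
    # per-study probability of the 'good' label conditional on (Ld, f_good);
    # averaging this over existing posterior draws of (Ld, f_good) would
    # Rao-Blackwellise the labels without ever sampling them
    good = f_good * norm.pdf(y, loc=Ld, scale=sigma)
    bad = (1.0 - f_good) * norm.pdf(y, loc=Ld, scale=inflate * sigma)
    return good / (good + bad)

If that reading is right, the per-study ‘quality’ scores would come essentially for free from the existing posterior samples.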


More fine structure silliness …

I noticed this paper by Pinho & Martins on astro ph today (accepted to Phys Lett B) concerning the alleged spatial variation of the fine structure constant; I say alleged but from reading this paper you’d think it was a settled debate with only the precise functional form of the final spatial model left to be decided.  In this latest instalment the authors propose to consider what updates can be made to the parameters of the spatial dipole model given 10 new quasar absorption datapoints (along 7 unique sight lines) drawn from post-Webb et al. studies published in the recent literature, with “the aim of ascertaining whether the evidence for the dipolar variation is preserved”.  Which, since they don’t consider the possibility of systematic errors* in the Webb et al. dataset, it is … since 10 data points with slightly lower standard measurement errors—and supposedly lower systematic errors—cannot trump the couple of hundred original measurements used in the Webb et al. analysis.

*Here I pause to note that Pinho & Martins propagate Webb/King et al.’s obtuse definition of “systematic uncertainties” as being strictly zero-mean random errors, whereas usually systematic errors are taken to include the possibility of an unknown bias inherent to the instrument or technique.  That is, according to the canonical definition, if my bathroom scales have a systematic error of +/- 0.1 kg I cannot hope to learn my true weight to better than a +/- 0.1 kg accuracy no matter how many times I weigh myself and average the measurements.  In order to improve the accuracy I would need to average measurements taken with many different bathroom scales. (Replace bathroom scales with telescope+instrument pairings.)
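(A quick simulation of the point, with invented numbers:

import numpy as np

rng = np.random.default_rng(1)
true_w, bias = 70.0, 0.1    # kg; one set of scales with a fixed +0.1 kg bias
meas = true_w + bias + rng.normal(0.0, 0.5, size=10_000)
print(meas.mean() - true_w)  # ~0.1: averaging defeats the noise, not the bias

No amount of re-weighing on the same scales moves that residual 0.1 kg.)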

Problematically, the authors don’t investigate the new data in terms of hypothesis testing, which would have been a worthwhile approach since the Webb/King et al. model makes quite specific predictions for \Delta \alpha / \alpha along these sight lines.  Since the dataset is so small I was easily able to code this test up (see plot below) and compute a Bayes factor comparing the marginal likelihood of the new data points under the dipole model (marginalising over the posterior parameter uncertainties from the King et al. paper) against that of the null (\Delta \alpha / \alpha = 0 everywhere).  The result is a Bayes factor of 3.9 in favour of the null; obviously not strong evidence (which is no surprise given the sample size) but also obviously not in any way strengthening the dipole hypothesis.  So, more people compliment the emperor on his wonderful new clothes and the charade continues …

finestructure.jpg
(Blue is predicted by the dipole model; magenta is observed.  These error bars are 95% credible intervals for the predictions and 2\times the standard errors [i.e., also ~95%] for the observations.)
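The flavour of the computation is roughly the following (a reconstruction from the description above rather than my original script; dipole_pred_draws is a hypothetical array holding the model’s predictions for these sight lines under posterior draws of the King et al. dipole parameters):

import numpy as np
from scipy.stats import norm

def log_bayes_factor(y, err, dipole_pred_draws):
    # y, err : new Delta-alpha/alpha measurements and their standard errors
    # marginal likelihood under the dipole: average over posterior draws
    ll = norm.logpdf(y, loc=dipole_pred_draws, scale=err).sum(axis=1)
    log_ml_dipole = np.logaddexp.reduce(ll) - np.log(len(ll))
    # the null (Delta-alpha/alpha = 0 everywhere) has no free parameters
    log_ml_null = norm.logpdf(y, loc=0.0, scale=err).sum()
    return log_ml_dipole - log_ml_null    # positive favours the dipole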



Bayesian shrinkage …

Today I read the new arXival by Leistedt et al. on the topic of photometric redshifting, in which a hierarchical model is proposed to include learning something of the inherent distribution of galaxies in redshift-magnitude-template space from the data itself: constraining the hyper-parameters of a distribution model and tightening the posteriors on each individual galaxy’s inferred redshift through Bayesian shrinkage.

I link to David van Dyk’s talk on this topic at the IAUS306 since afterwards I wrote to Narciso Benitez & David: “I’m not sure if you’ve talked to each other at the meeting yet, but when I was watching Narciso’s presentation on the importance of the prior for photometric redshift estimation I thought it seemed a very natural place to apply the Bayesian shrinkage ideas that David presented.  I.e., allow the data itself to improve the prior for both the redshift distribution and the proportion of each template type in the survey under study using a simple hierarchical model with some hyper parameters controlling the shape of p(z) and the probability of each template!”.  Though I do not believe they did end up collaborating …

All that to say, I think this is a good idea.  With regards to specifics I think the proposed model in this particular paper is a little simplistic.  For instance, the intrinsic distribution is formed by (arbitrarily) dividing the redshift-magnitude-template space into rectangular prisms across which a constant contribution to the prior mass is assigned by the Dirichlet.  Alternatively one could avoid arbitrary binning by supposing a distribution on the partitioning (a la methods for treed Gaussian processes; Gramacy & Lee or Kim & Holmes) and then marginalising over this unknown distribution (which, in the treed GP case, does tend to produce smooth marginals despite the inherently discontinuous nature of the sub-models).  Moreover, a stronger prior correlation between bins could well be motivated by the adjacency of the prisms in the physical space.  Another minor problem with the proposed implementation seems to be that, since the stated likelihood does not depend on the magnitude, each galaxy is presumably assigned to a bin solely according to its (noisily) observed flux in the reference band.
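To fix ideas, here is a toy empirical-Bayes caricature of the binned-prior shrinkage (my own sketch of the general idea, not the paper’s sampler): galaxies share a Dirichlet-distributed prior f over redshift bins, and alternating between soft bin assignments and updates of f lets the population tighten each galaxy’s individual posterior.

import numpy as np

rng = np.random.default_rng(0)
n_bins, n_gal = 20, 500
alpha = np.ones(n_bins)                         # flat Dirichlet hyperprior
true_f = rng.dirichlet(5.0 * np.ones(n_bins))   # 'intrinsic' bin distribution
z = rng.choice(n_bins, size=n_gal, p=true_f)    # latent redshift bins

# per-galaxy likelihoods over bins: a crude stand-in for noisy photometry
bins = np.arange(n_bins)
like = np.exp(-0.5 * ((bins - z[:, None]) / 2.0) ** 2)

f = np.ones(n_bins) / n_bins
for _ in range(50):
    post = like * f                              # shrink by the shared prior
    post /= post.sum(axis=1, keepdims=True)      # per-galaxy bin posteriors
    counts = post.sum(axis=0)
    f = (counts + alpha - 1.0) / (n_gal + alpha.sum() - n_bins)  # MAP update

The learnt f pulls each galaxy’s posterior toward the population distribution, which is the shrinkage effect at issue; the full hierarchical treatment would of course propagate the uncertainty in f as well.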

On a more general level I wonder at the value of the inferred intrinsic distribution as compared to the posterior draws of the true photometric redshift lists themselves.  In this type of ‘deconvolution’ problem the posterior for the intrinsic distribution is typically highly prior/model sensitive (mixture models being the classic example).  If one reports the posterior samples of the true redshifts along with their prior densities, it becomes easy for others to come along with their own priors/models for these latent variables and apply a simple reweighting to approximate their own posterior (as in my mixture modelling paper).  Ultimately, for comparison to cosmological models, one would presumably like a prior tuned (e.g.) for consistency with simulations of mock galaxies from that family of models.  Likewise, one can compute whatever summary statistics concerning the redshift distribution one might be interested in as functionals over the true redshift samples.
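(Explicitly, the reweighting I mean is standard importance weighting over the posterior draws z^{(s)}:
w_s \propto \tilde{p}(z^{(s)}) / p(z^{(s)}),
with p the prior density used in the original analysis and \tilde{p} the newcomer’s preferred alternative; expectations under the new posterior are then just weighted averages over the existing samples, valid so long as \tilde{p} is not too far from p.)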

Edit: I forgot to mention that I think the toy simulated dataset example should be augmented with a test of performance (in RMSE) on one of the photometric redshifting challenge datasets.


Aussie science continues its slide down the toilet …

Recent weeks have seen some disturbing science news coming out of Australia: the University of Wollongong awarding a PhD for an anti-vaxxer thesis; the CSIRO axing its Oceans and Atmosphere division; the CSIRO claiming credit for its role in the discovery of gravitational waves—work done by a unit it had already axed; and now this: the ABC’s flagship ‘science’ program, Catalyst, running a Bart’s People-esque story on the harmful effects of wifi signals and mobile phones, followed up by a piece in the Guardian by the same ‘journalist’, Maryanne Demasi.

bartspeople.png
It’s the classic mix of conspiracy theory, ‘undone science’, and general mumbo-jumbo we see in all sorts of pseudo-science narratives, from anti-vaxxers to flat-earthers (incidentally, I find hollow-earthers the most interesting of nut jobs).  In particular, Maryanne fails to understand the nature of scientific proof and rational decision making, taking the position that if scientists can’t ‘prove’ these signals are safe we must assume they’re harmful.  The only ‘evidence’ of harm she has is reference to a 2011 IARC monograph review of studies investigating the possibility of a link between mobile phone use and brain cancer, which concluded:

“The human epidemiological evidence was mixed. Several small early case–control studies were considered to be largely uninformative. A large cohort study showed no increase in risk of relevant tumours, but it lacked information on level of mobile-phone use and there were several potential sources of misclassification of exposure. The bulk of evidence came from reports of the INTERPHONE study, a very large international, multicentre case–control study and a separate large case–control study from Sweden on gliomas and meningiomas of the brain and acoustic neuromas. While affected by selection bias and information bias to varying degrees, these studies showed an association between glioma and acoustic neuroma and mobile-phone use; specifically in people with highest cumulative use of mobile phones, in people who had used mobile phones on the same side of the head as that on which their tumour developed, and in people whose tumour was in the temporal lobe of the brain (the area of the brain that is most exposed to RF radiation when a wireless phone is used at the ear).”

The ultimate decision of the IARC working group was to categorise RF radiation in group 2B: “Possibly carcinogenic to humans”.  Many scientists thought this was overly conservative (i.e., erring on the side of an unnecessary warning) and that a more appropriate category would be 3: “Not classifiable as to its carcinogenicity to humans”, owing to the noted limitations of the INTERPHONE study (e.g. Dr Ken Karipidis, a rare voice of sanity in the Catalyst episode, says “On a personal level, I don’t think it should be a 2B.”)  In fact, the argument could be made that the IARC working group misinterpreted the INTERPHONE study results, given that their conclusions were quite different to those of the INTERPHONE study team itself:

“Glioma and meningioma
Overall, no increase in risk of glioma or meningioma was observed with use of mobile phones. There were suggestions of an increased risk of glioma at the highest exposure levels, but biases and error prevent a causal interpretation. The possible effects of long‐term heavy use of mobile phones require further investigation.

Acoustic neuroma
There was no increase in risk of acoustic neuroma with ever regular use of a mobile phone or for users who began regular use 10 years or more before the reference date. Elevated odds ratios observed at the highest level of cumulative call time could be due to chance, reporting bias or a causal effect. As acoustic neuroma is usually a slowly growing tumour, the interval between introduction of mobile phones and occurrence of the tumour might have been too short to observe an effect, if there is one.”

Interestingly, this is not the first time that Maryanne Demasi has been under fire for pseudo-science and bad journalism with Catalyst: her earlier debacle concerning the efficacy of statins was the subject of a memorable Media Watch episode.  The same problems are evident in this new episode: a parade of ‘maverick scientists’ given the lion’s share of airtime with only a couple of token soundbites from ‘the establishment’.  Only this time there’s an extra pinch of “Won’t somebody please think of the children?”.

maude
Anyway, I’d be happy to explain in detail to the ABC and/or Maryanne Demasi how retrospective cohort studies work and the logic behind the ‘establishment’ position of ‘we don’t have any evidence that RF radiation from mobile phones or WiFi is harmful but we’re not prepared to say that they’ve been proven safe’, should they fancy a lesson in epidemiological statistics.
