Data compression for cosmology

Today I read an arXival on the topic of data compression or likelihood compression for cosmology, which translates to the construction of approximately (or exactly) sufficient summary statistics via a transformation defined with reference to the assumed likelihood function. As Alsing & Wandelt point out, this type of transformation falls in the class of score-function likelihood summaries. Both papers mention the possibility of using these summaries as low-dimensional targets for likelihood-free inference methods comparing mock data simulations to the observed cosmological data. In this case the technique becomes the ABC-IS of Gleim & Pigorsch (see also here & here), and has a connection to indirect inference and the method of moments. I’m somewhat skeptical that a realistic application would meet the conditions needed for sufficiency (with respect to the simulation-based model) of the auxiliary summary statistics; but some of the general insights might cross over (e.g. the auxiliary model should aim to be as richly descriptive as the simulation-based model, even if structurally/computationally simpler).
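To make the score-compression idea concrete: for a Gaussian likelihood with a parameter-independent covariance, the gradient of the log-likelihood at a fiducial parameter point compresses the full data vector down to one number per parameter. A minimal numpy sketch on a toy linear model (the model, matrices, and numbers here are my own illustrative choices, not from either paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data model (illustrative only): d = A @ theta + noise, Gaussian noise
# with fixed covariance C.
n_data, n_params = 50, 2
A = rng.normal(size=(n_data, n_params))        # design matrix (mock "physics")
C = np.diag(rng.uniform(0.5, 2.0, n_data))     # noise covariance
Cinv = np.linalg.inv(C)

theta_true = np.array([1.0, -0.5])
d = A @ theta_true + rng.multivariate_normal(np.zeros(n_data), C)

# Score compression at a fiducial point: with parameter-independent C,
# grad_theta log L = A.T @ Cinv @ (d - A @ theta_fid), so the
# n_params-dimensional score t is a (here exactly) sufficient summary of d.
theta_fid = np.array([0.8, -0.3])
t = A.T @ Cinv @ (d - A @ theta_fid)

# In this linear-Gaussian case the MLE is recoverable from the summary alone:
F = A.T @ Cinv @ A                             # Fisher information matrix
theta_mle = theta_fid + np.linalg.solve(F, t)  # Newton step from the fiducial
print(t.shape, theta_mle)
```

The compression here is from 50 data points to 2 summaries; in a realistic nonlinear setting the same construction at a well-chosen fiducial point gives only approximate sufficiency.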


4 Responses to Data compression for cosmology

  1. Alan Heavens says:

    Not sure what the source of your skepticism is, Ewan. As far as I’m aware MOPED has essentially always worked for the problems it’s designed for, except in one case by Graff, Hobson and Lasenby ( ), which showed a degeneracy (and where it is easily modified to deal with it). Examples, including real applications and simulated data, include , , and for the CMB.
    Best, Alan

    • To clarify: I’m definitely in favour of score-based data summaries; my skepticism is with regards to the sufficiency condition for application in an ABC setting given by Gleim & Pigorsch ever being satisfied in a real application. That said, the general mood (e.g. over on xi’an’s ‘og) is that the lack of sufficiency in summary statistics for ABC is not the big deal it’s sometimes made out to be anyway. I’m actually very enthusiastic about using simulations from complex models (or bootstrapping schemes) with likelihoods (or likelihood summaries) from simple models.

      Btw. I wanted to ask you about my post on coverage ( ). Do you know of any papers in which a fixed set of cosmological parameters is chosen, a bunch of mock datasets are simulated, and the Bayesian posterior for each is constructed and compared with the fixed parameters? (Ideally one that looks at the Hubble constant).
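      For what it’s worth, the check I have in mind is easy to sketch on a toy model: fix a true parameter, simulate many mock datasets, build the posterior for each, and count how often the credible interval traps the truth. A hypothetical numpy example of my own (normal mean with known sigma and a flat prior, H0-flavoured numbers), where the 95% credible interval should also achieve roughly 95% frequentist coverage:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed "true" parameter, many mock datasets, one posterior per dataset.
mu_true, sigma, n_obs, n_mock = 70.0, 5.0, 25, 2000

hits = 0
for _ in range(n_mock):
    data = rng.normal(mu_true, sigma, n_obs)
    # Flat prior on mu with known sigma: posterior is N(mean(data), sigma^2/n).
    post_mean = data.mean()
    post_sd = sigma / np.sqrt(n_obs)
    lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
    hits += (lo <= mu_true <= hi)

coverage = hits / n_mock
print(f"empirical coverage of the 95% credible interval: {coverage:.3f}")
```

Of course in this conjugate case the agreement is guaranteed; the interesting question is what the same experiment reports for a real degenerate, high-dimensional cosmological posterior.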

      • Alan Heavens says:

        Dear Ewan,

        Distance ladder analyses are rarely Bayesian, and the most sophisticated BHM analysis of Feeney, Mortlock and Dalmasso, MNRAS, 476, 3861 (2018) doesn’t look at coverage. The nearest to what you want, that I’m aware of, is for supernovae: there are some coverage results in March et al, MNRAS 437, 3298–3311 (2014) (note that the arXiv version doesn’t have them). In an idealised context there is a coverage study in

        On the more general question, I’m unclear whether to ignore mismatches between frequentist and Bayesian intervals completely in cases where Bernstein-von Mises does not apply (because it does not apply…), or whether to use the disagreement as a useful if non-rigorous indication of model misspecification. Perhaps there is no general rule-of-thumb.


  2. Thanks for the references; I couldn’t quite derive an impression on coverage from the March et al one because it looks like they just quantify bias from pooling samples from the posteriors of 100 mock datasets. I did find in which they do something very close to checking coverage, which was to test the average of “the distance between true and posterior mode parameters divided by the 1 sigma credible interval width”; but crucially they only do so for the transformed parameter sets (e.g. sigma_M h^2) designed to reduce the impact of degeneracies; nothing on H0 itself.
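    That diagnostic (distance from truth to the posterior mode, in units of the 1-sigma credible width) is simple to emulate on a toy model: for a well-calibrated Gaussian posterior the absolute version of this quantity behaves like |N(0,1)|, so its average should sit near sqrt(2/pi) ≈ 0.8. Again a hypothetical setup of my own, not the March et al analysis:

```python
import numpy as np

rng = np.random.default_rng(2)

mu_true, sigma, n_obs, n_mock = 0.7, 0.1, 100, 2000
scores = []
for _ in range(n_mock):
    data = rng.normal(mu_true, sigma, n_obs)
    post_mode = data.mean()              # flat prior, known sigma
    post_sd = sigma / np.sqrt(n_obs)     # 1-sigma credible half-width
    scores.append(abs(post_mode - mu_true) / post_sd)

# |post_mode - mu_true| / post_sd is distributed as |N(0,1)|,
# whose mean is sqrt(2/pi) ~ 0.798 under correct calibration.
avg = float(np.mean(scores))
print(avg)
```

A systematic excess over 0.8 would flag under-coverage of the credible intervals, which is presumably what such a test is after.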

    I also found a 2004 article by Genovese et al echoing (in a time-reversed sense!) my exact concerns:
    “Interestingly, there seems to be some confusion about the validity of frequentist inference in cosmology. Since we have access to only one Universe–and thus cannot replicate it–some feel that it makes no sense to make frequentist inferences. This represents a common misunderstanding about frequentist inference in general and confidence intervals in particular. The frequency statements for confidence intervals refer to the procedure, not the target of the inference. Our method for constructing confidence balls traps the true function 95 percent of the time, even over a sequence of different, unrelated problems. There is no need to replicate the given experiment, or

    These confusions have led to an interesting movement towards Bayesian methods in cosmology. Of course, when used properly, Bayesian methods can be very effective. Currently however, the Bayesian interval estimates in the physics literature seem questionable, being based on unfettered use of marginalizing over high-dimensional, degenerate likelihoods using flat priors chosen mainly for convenience. Indeed, an active area of research is finding corrections for such intervals to make them have correct coverage. Moreover, the potentially poor coverage of the Bayesian interval seems not to have been widely recognized in the Physics literature.”
