This is an approach I’ve encouraged cosmologists to explore in the past, although it wasn’t obvious to me that it would be a guaranteed improvement: for the case of cosmological covariance matrices, it seemed to me that the marginal likelihood approximation would still require mock data simulations at each test set of cosmological parameters. In this arXival the authors appear to side-step the need for mock data simulations from the non-linear model component by adopting a so-called tabulation strategy. Since I’m unfamiliar with that particular approach, it is unclear to me whether this gives an unbiased approximation to the likelihood or is somehow a crude estimator; if the latter, some of the efficiency gain may simply be a sleight of hand.

In any case, since I’m again discussing this problem I may as well say what I always say: why not go further and do multi-fidelity Bayesian optimisation? Estimation of the marginal likelihood (using the tabulation trick and nested sampling) takes approximately 0.3 hours to achieve a reasonable estimate. Presumably one could get the broad shape of the posterior by running nested sampling with far fewer live points, and then decide where to refine estimates with more live points in an automatic manner. Ditto for the scheduling of further cosmological simulations.
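To illustrate the fidelity knob I have in mind (this is a toy of my own construction, not the authors’ tabulation/nested-sampling pipeline): a bare-bones nested sampler for a 1D Gaussian likelihood under a uniform prior, where the number of live points trades runtime against the noise on the log-evidence roughly as sqrt(H / n_live).

```python
import numpy as np

def ns_logz(n_live, rng):
    """Toy nested sampling for likelihood N(theta; 0, 1) under a uniform prior
    on [-5, 5]. Exact evidence: Z = (Phi(5) - Phi(-5)) / 10 ~= 0.1."""
    loglike = lambda th: -0.5 * th**2 - 0.5 * np.log(2.0 * np.pi)
    live = rng.uniform(-5.0, 5.0, n_live)
    logl = loglike(live)
    logz, log_x_prev = -np.inf, 0.0
    for i in range(1, 15 * n_live + 1):
        worst = np.argmin(logl)
        log_x = -i / n_live                                  # deterministic shrinkage
        log_w = np.log(np.exp(log_x_prev) - np.exp(log_x))   # prior-volume shell
        logz = np.logaddexp(logz, logl[worst] + log_w)
        # exact constrained-prior draw: {L > L_min} is the interval |theta| < r
        r = np.sqrt(-2.0 * (logl[worst] + 0.5 * np.log(2.0 * np.pi)))
        live[worst] = rng.uniform(-r, r)
        logl[worst] = loglike(live[worst])
        log_x_prev = log_x
    # add the contribution of the remaining live points
    return np.logaddexp(logz, np.log(np.mean(np.exp(logl))) + log_x_prev)

rng = np.random.default_rng(1)
truth = np.log(0.1)
cheap = ns_logz(50, rng)   # low fidelity: fast, noisy log-evidence
dear = ns_logz(500, rng)   # high fidelity: ~10x the cost for ~sqrt(10)x less noise
```

The multi-fidelity idea would be to let an outer optimisation loop decide, per test cosmology, whether the cheap or the dear estimate is worth paying for.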

My main gripe is that the authors put a lot of emphasis on the advantages of the (estimator of!) their preferred CDE loss function, but nowhere here examine: (i) how close their estimator of the CDE loss is to the actual loss; (ii) how close their estimated-loss-minimising method is to the true posterior; or (iii) what implications this loss function might have on model building for the purpose of error propagation. [The first two of these are similar concerns to those raised on Xi’an’s Og in regards to an earlier paper by some of the authors presenting these ideas.] Instead, in Example 1 the authors claim that a goodness-of-fit test can’t distinguish a clearly nonsense model from a good model, but that their CDE loss can. In fact, the supposed goodness-of-fit test that fails here is the PIT test, which is not a goodness-of-fit test at all: it is a means to investigate an aspect of Bayesian uncertainty calibration, not accuracy in location. In Example 2 the authors examine their RFCDE approach as a post-adjustment method for ABC posteriors and establish its supposed superiority according to their estimator of the CDE loss function. But if you visually compare (by inspecting their Figure 3) the RFCDE posterior approximation against the raw ABC posterior, it is clear that their method’s contribution is to reshape the ABC posterior so as to place vanishingly small amounts of posterior mass on the ground-truth line of degeneracy (along which the generated data are indistinguishable) in some places (e.g. for all of ). Is that really an improvement?
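To make the PIT point concrete, here is a toy reconstruction of the Example 1 phenomenon with my own made-up densities (not the authors’ code): a “nonsense” model that predicts the marginal distribution of y for every x has perfectly uniform PIT values, so a PIT-uniformity check cannot flag it, while the empirical CDE loss does separate it from the true conditional model.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(0.0, 1.0, n)
y = rng.normal(x, 1.0)  # true conditional: y | x ~ N(x, 1); marginally y ~ N(0, 2)

# Two candidate conditional density estimates f(y | x):
good = lambda yy, xx: stats.norm.pdf(yy, loc=xx, scale=1.0)           # the truth
bad = lambda yy, xx: stats.norm.pdf(yy, loc=0.0, scale=np.sqrt(2.0))  # marginal; ignores x

def cde_loss(f, int_f2, x, y):
    """Empirical CDE loss: mean over x of (integral of f(y|x)^2 dy)
    minus 2 * mean of f(y_i | x_i). For a Gaussian f with sd sigma,
    the integral term is 1 / (2 * sigma * sqrt(pi))."""
    return np.mean(int_f2(x)) - 2.0 * np.mean(f(y, x))

loss_good = cde_loss(good, lambda xx: np.full_like(xx, 1.0 / (2.0 * np.sqrt(np.pi))), x, y)
loss_bad = cde_loss(bad, lambda xx: np.full_like(xx, 1.0 / (2.0 * np.sqrt(2.0 * np.pi))), x, y)

# PIT values for the nonsense model are uniform, because it matches the
# marginal of y; a uniformity test therefore cannot reject it.
pit_bad = stats.norm.cdf(y, loc=0.0, scale=np.sqrt(2.0))
ks_p = stats.kstest(pit_bad, "uniform").pvalue
```

Which is exactly why the PIT test should be billed as a calibration check, not a goodness-of-fit test: both models here are perfectly calibrated in the PIT sense, yet only one of them is conditionally accurate.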

With regard to the topic of error propagation, I’m not sure that approaches adding kernel density estimators actually improve on point-cloud-based posterior summaries or other representations (such as clever parametric summaries). A previous application of random forests to reconstruct the Bayesian posterior as a point cloud (the first step of RFCDE, before the final KDE is added) in an astronomical context can be seen here. (Although note that the figures and results shown in that paper were not actually created with the exact method referred to in the paper, which is another discussion entirely!) The topic that really needs investigating in this area is how to construct efficient posterior representations that allow for prior-reweighting: such as when individual photo-z posteriors fit under a simple prior might later be reweighted to reflect a model with shrinkage based on spatial clustering. Draws from nested sampling help here because they naturally include samples from a broader region of parameter space than a purely posterior-focussed simulator. But there can indeed be challenges in storing the draws for fits to a large catalog of objects, so maybe we’re back to clever parametric summaries that allow some separation of the prior and likelihood contributions?
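The kind of prior-reweighting I mean is just importance reweighting of stored posterior draws. A minimal Gaussian sketch (all numbers invented, chosen so the analytic answer is available to check against):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stored draws from a posterior fit under a simple interim prior:
# prior_0: z ~ N(0, 10^2); likelihood: zhat | z ~ N(z, 1); observed zhat = 1.0
zhat, tau0 = 1.0, 10.0
post0_prec = 1.0 + 1.0 / tau0**2
post0_mean = zhat / post0_prec
draws = rng.normal(post0_mean, post0_prec**-0.5, 100_000)

# Later, reweight to an informative prior N(2, 1) (e.g. shrinkage from
# spatial clustering) via importance weights w(z) = prior_1(z) / prior_0(z)
log_norm = lambda z, mu, s: -0.5 * ((z - mu) / s) ** 2 - np.log(s)
logw = log_norm(draws, 2.0, 1.0) - log_norm(draws, 0.0, tau0)
w = np.exp(logw - logw.max())
w /= w.sum()
rw_mean = np.sum(w * draws)

# Analytic posterior under the new prior, for comparison
post1_prec = 1.0 + 1.0
post1_mean = (zhat + 2.0) / post1_prec
```

The catch, as above, is that reweighting only works where the stored draws actually cover the new prior’s mass, which is why nested sampling’s broader coverage helps and a purely posterior-focussed point cloud can fail silently.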

across ad hoc methods for Monte Carlo integration of posteriors. Of course, there are ready-made tools that can be downloaded and easily used for most problems; still, one occasionally finds a lack of literacy in statistics leading to methods which are, to use the words of Ewan, without …

There is confusion concerning Bayesian methods in x-ray polarization studies. Much of this comes from the classic misunderstanding of credible regions versus confidence regions: you’ll see astronomers worried about not being able to … a parameter boundary. This led to some Bayesian magic here that touches a bit on the issue, but applies only to a very fragile and specific type of polarization data: one where the data are directly the latent polarization and everything is beautifully Gaussian. In other words, not x-ray data! The point of the work was to derive an analytic posterior.

Nevertheless, the idea was picked up by x-ray astronomers and non-principled methods have taken over. For example, here is an application that ignores all the fragility inherent in Poisson-distributed data to derive an “analytic” posterior assuming that low-count Poisson data are Gaussian; something that is simply not true. But, as always, throw in enough fancy statistics buzzwords and no one will notice.

The real problem with the use of Vallincourt’s work for x-ray data, though, is that Vallincourt derived a **posterior** for the measurements: assuming a prior and a Gaussian likelihood, the Rice distribution was derived. But the works above, and these here, use the Rice distribution as the likelihood! They place further priors on their parameters, do things like background-subtracting Poisson data, and run some form of random number generator to obtain trace plots. It is difficult to explain why these approaches are wrong, as they are simply made up via some form of heuristic intuition, missing the core concepts of what a Bayesian analysis is. But the word Bayesian is in the papers, so it must be sophisticated, right?

As a side note, we attempted to solve this issue here by deriving the likelihood, a conditional Poisson likelihood, and performing Bayesian estimation of the latent parameters. We, of course, recover the Rice distribution; because that is the posterior.

And yet, this week we had a new heuristic approach to Bayesian estimation for x-ray polarization here. There are many misconceptions in the work, but we can concentrate on the issues with statistics. First, there is a claim of doing Bayesian estimation to derive **confidence intervals**. The fun part is the appendix on how the Monte Carlo was performed: it appears the authors drew random values of the **data** from Gaussian distributions (yes, it is Poisson data), then performed a least-squares fit to this randomly drawn data for the parameter values… full stop… what the &lt;insert favorite Bayesian buzzword here&gt;. Thus, we have a long way to go in communicating how to perform Bayesian analysis, even if you can download every tested sampler, easily write a Metropolis-Hastings algorithm, or just google for a few interesting Bayesian blog posts. Intuition is a great tool, but principled Monte Carlo integration techniques are not proven by intuition alone. When in doubt, consult a statistician.
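The Rice-distribution point is easy to check numerically: in the regime where the Gaussian setup actually applies, the measured polarization amplitude given the true amplitude is Rice-distributed, with no room for further heuristic priors or background subtraction. A quick simulation (all numbers made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Gaussian Stokes-parameter toy model -- the regime where Rice applies;
# low-count Poisson x-ray data are NOT this.
q0, u0, sigma = 0.3, 0.4, 0.1   # true Stokes parameters and noise level
p0 = np.hypot(q0, u0)           # true polarization amplitude = 0.5

q = rng.normal(q0, sigma, 200_000)
u = rng.normal(u0, sigma, 200_000)
p = np.hypot(q, u)              # measured amplitude, one draw per mock observation

# The measured amplitude follows a Rice distribution with shape b = p0 / sigma,
# in scipy's parameterisation, rice(b, scale=sigma)
rice = stats.rice(p0 / sigma, scale=sigma)
```

Note the bias visible here even in the nice Gaussian case: the mean of the measured amplitude exceeds the true p0, which is exactly the kind of structure that gets lost when the Rice form is blindly repurposed as a likelihood.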

Still, things can get completely out of hand: here is an example where everything from maximum likelihood to MCMC to the word “distribution” is misused so badly that the terms become meaningless. However, with enough of the right stats buzzwords, a reviewer who is not an expert might just accept the sophistication of the work based on the language alone. We have to be careful and investigate the methods we use, the questions they address, and the words associated with them. Otherwise, we might as well use chi-by-eye fits, as the random BIC numbers being spewed out are just as meaningless.

*Editor’s note [E.C.]: The silly title is entirely my own; ‘polishing the unpolishable’ is a euphemism for an English expression that describes one’s endeavours when fruitlessly trying to fix something that is fundamentally broken, i.e., ‘polishing a turd’.

Heck, if they’d read into the problem from a statistical point of view they might even have learned a bit about how to do subset posteriors better. For instance, they write that the multiplication of these sub-posteriors is only possible if you use uniform priors, and consequently they go to a lot of trouble to choose transformations allowing for the construction of entirely uniform priors. But, wait for it, what about this from consensus MC: ?!
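For reference, the consensus MC fix is simply to give each of the K shards the fractionated prior p(θ)^(1/K), so that the product of sub-posteriors recovers the full posterior for any prior, uniform or not. A Gaussian toy check (all numbers invented), where the precision-weighted consensus combination is exact:

```python
import numpy as np

rng = np.random.default_rng(4)

# Full problem: y_i ~ N(theta, 1) for i = 1..1000, with a N(0, 10^2) prior on theta
theta_true, n, K = 1.3, 1000, 10
y = rng.normal(theta_true, 1.0, n)
prior_prec = 1.0 / 10.0**2

# Each shard uses the FRACTIONATED prior p(theta)^(1/K), i.e. prior precision / K,
# so the product of the K sub-posteriors equals the full posterior.
sub_mean, sub_prec = [], []
for s in np.split(y, K):
    prec = prior_prec / K + len(s)   # Gaussian conjugate update on the shard
    sub_prec.append(prec)
    sub_mean.append(s.sum() / prec)  # prior mean is zero

# Consensus combination: precision-weighted average of sub-posterior draws
draws = np.array([rng.normal(m, p**-0.5, 50_000) for m, p in zip(sub_mean, sub_prec)])
w = np.array(sub_prec)[:, None]
combined = (w * draws).sum(axis=0) / w.sum()

# Analytic full posterior (non-uniform prior; no transformations needed!)
full_prec = prior_prec + n
full_mean = y.sum() / full_prec
```

No uniformising transformations required: the prior enters once, split evenly across the shards.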

On the other hand, I’m confused about the purpose of the more recent arXival, since the author there proposes to throw away the explicit modelling of cross-covariance, instead just using the GP as an interpolator to gap-fill the time series and register them to a common grid, after which the inconsistent empirical estimator of the lag frequency function is applied to draws from the posterior predictive. This is similar to the approach taken by the H0LiCOW team for estimating time delays in their lensing systems, although there they use only the posterior-mean GP path. In both cases it seems strange not to finesse estimation of the delay along the bias-variance trade-off using a joint/complete Bayesian model, especially when we know that posterior functionals of the GP are typically themselves biased and in pathological cases can be disastrous! I also didn’t really understand the point of Section 3, in which different kernels are examined empirically to see which is best at representing the process resulting from a power spectrum of given frequency dependence, since we can already just look this up from a table of correspondences (e.g. Table 1 here).
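For clarity, the “interpolate, register, then cross-correlate” recipe, in the crudest posterior-mean-only form I’m attributing to it here, is only a few lines (synthetic light curves, RBF kernel, and every parameter below is my own invention):

```python
import numpy as np

rng = np.random.default_rng(5)

def gp_mean(t_obs, y_obs, t_grid, ell=1.5, sn=0.05):
    """Posterior mean of a zero-mean GP with an RBF kernel (length scale ell,
    observation noise sn): the gap-fill-and-register interpolator."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)
    K = k(t_obs, t_obs) + sn**2 * np.eye(len(t_obs))
    return k(t_grid, t_obs) @ np.linalg.solve(K, y_obs)

# Two irregularly sampled sinusoidal "light curves", the second lagging by 2.0
true_lag, period = 2.0, 10.0
t1, t2 = (np.sort(rng.uniform(0.0, 50.0, 80)) for _ in range(2))
y1 = np.sin(2 * np.pi * t1 / period) + rng.normal(0.0, 0.05, 80)
y2 = np.sin(2 * np.pi * (t2 - true_lag) / period) + rng.normal(0.0, 0.05, 80)

# Register both onto a common grid via the GP posterior mean, then scan lags
dt = 0.05
grid = np.arange(0.0, 50.0, dt)
f1, f2 = gp_mean(t1, y1, grid), gp_mean(t2, y2, grid)
shifts = np.arange(1, int(5.0 / dt))  # candidate lags in (0, 5)
corr = [np.corrcoef(f1[:-s], f2[s:])[0, 1] for s in shifts]
lag_hat = shifts[np.argmax(corr)] * dt
```

The arXival’s version applies the empirical estimator to posterior predictive draws rather than the mean path, but either way the delay never enters a joint model, so the bias-variance trade-off of the estimator is left entirely to the whims of the kernel and the sampling pattern.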

All that to say: there’s no doubt that GAN projects are a good way to take a slice of the machine-learning funding pie, and getting students to work on them will make for students who are readily employable outside of academia, but I have yet to see them answer a meaningful science question. They’re a fun idea, though, so I’d be happy to eventually be proved wrong on this.
