I noticed an interesting paper arXived just before Xmas proposing to model stellar spectra as “sparse, data-driven non-Gaussian processes”. The non-Gaussian part is explained to mean that because a prior is placed on the covariance function and various hyper-parameters, when these are marginalised over the resulting posterior distributions for the latent spectral profiles are highly non-Gaussian. I don’t personally care for the use of the term “non-Gaussian process” to describe this situation, since it can confuse thinking around how the hierarchical GP model is behaving in the sense of the hyper-parameters learning a data-adaptive smoothness penalty for the GP which then sets the characteristic performance condition under that penalty and the data (see e.g. my favorite paper of last year). However, in this particular stellar spectra fitting scenario instance I think the proposed model should not even be thought of as having anything to do with a Gaussian process in the first place. The reason being that the model is better classified as a high dimensional Gaussian mixture model.

When we think of a stochastic process we typically think of a potentially infinite collection of random variables indexed by either a countable (discrete time) or uncountable (continuous time) set. Even when we map at pixel resolution with a GP or model a fixed period of financial tick (discrete time) data with one, we can mathematically extend the same model to an arbitrary resolution or project to infinite extent without changing fundamentally the behaviour of the model or its learning rate with respect to the available data. In this paper, the index set is a finite collection of spectral pixels and their covariance function is learned using a Wishart prior from having thousands of multiple instances observed, assuming no wavelength dependent kernel structure. So, although one can technically call this a model based on the Gaussian process, that would only be true in the sense that any model that uses a standard Normal distribution could also be labelled as such.

Aside from enjoyable pedantry, there is actual value in identifying the optimal description of the model class (here: high dimensional Gaussian mixture model) since it allows one to identify statistical theory and existing methods to understand and implement this model class. I must admit that it’s not a topic I have much experience in, but a quick google search returns papers by a number of known experts in high dimensional covariance estimation for instance [1, 2].