People tell me that, broadly speaking, the difference between “machine learning” and “statistics” is that the former concerns itself with learning highly complex functional relationships in the high signal-to-noise regime, while the latter concerns itself with estimating simple functional relationships in the low signal-to-noise regime. (I believe this was on a slide in Geoffrey Hinton’s talk at the Oxford Martin School.) For instance, learning to assign text descriptions to photographs given a huge library of images with somewhat reliable meta-data would call for a machine learning algorithm (like ANNs with deep learning). Conversely, estimating the relative risk of stomach ulcers as a side-effect of a new headache pill given only a handful of uncertain results from heterogeneous studies would call for a statistical approach (e.g. a Bayesian meta-analysis).
Now, if one can allow such a broad-brush classification, one can imagine my surprise at seeing logistic regression (at face value, statistical) lumped together with support vector machines (intermediate) and random forests (machine learning) in a recent arXival by de los Rios et al. Especially when there’s no detailed explanation of how the logistic regression model (for identification of merging clusters) was put together, and certainly no indication that terms other than linear were considered, nor of any model discrimination / regularization constraints, such as would be required to push ordinary logistic regression into the machine learning regime. Indeed, all we are told is that the three off-the-shelf algorithms were tried in R and only random forests did anything. Looking at the ROC curve in their Fig 1 makes a stronger point: random forests was the only algorithm doing better than random guessing! For the referee this should have been an instant red flag pointing to either lazy or naive application of what are not fully automatic machine learning tools.
Compare this to de Souza et al., where we use only logistic regression but with a non-trivial set of candidate models, including polynomial terms & interactions between covariates, with an information criterion (not shown) computed for model selection. To wit, it’s not the flavour of your algorithm, it’s what you do with it!
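To make the point concrete, here is a minimal sketch of what "doing something" with logistic regression might look like: fit a few candidate models (linear only, plus polynomial terms, plus an interaction) and rank them with an information criterion. This is not the de Souza et al. analysis. The data are synthetic, the choice of AIC is my assumption (the post does not say which criterion was used), and all names (`fit_logistic`, `candidates`, `x1`, `x2`) are made up for illustration. The fit is plain Newton-Raphson in NumPy, so nothing beyond NumPy is needed.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    Returns the coefficient vector and the maximised log-likelihood.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)       # linear predictor, clipped for safety
        p = 1.0 / (1.0 + np.exp(-eta))         # predicted probabilities
        grad = X.T @ (y - p)                   # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = np.clip(X @ beta, -30, 30)
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return beta, loglik

def aic(loglik, k):
    """Akaike information criterion for a model with k free parameters."""
    return 2 * k - 2 * loglik

# Synthetic data (an assumption for illustration): the true log-odds depend
# on x1 only through x1**2 and an x1*x2 interaction, so a linear-only model
# has almost nothing to work with.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
true_eta = -0.5 + 1.0 * x1**2 - 1.5 * x1 * x2
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-true_eta))).astype(float)

ones = np.ones(n)
candidates = {
    "linear": np.column_stack([ones, x1, x2]),
    "quadratic": np.column_stack([ones, x1, x2, x1**2, x2**2]),
    "quadratic+interaction": np.column_stack(
        [ones, x1, x2, x1**2, x2**2, x1 * x2]
    ),
}

scores = {}
for name, X in candidates.items():
    _, loglik = fit_logistic(X, y)
    scores[name] = aic(loglik, X.shape[1])

best = min(scores, key=scores.get)  # lowest AIC wins
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:>24s}  AIC = {score:.1f}")
```

On data like these, the linear-only model is left with little more than the intercept, which is roughly the situation one would expect behind a ROC curve hugging the diagonal; the richer candidates recover the structure with the very same "algorithm".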