by Jason Eisner (2015)
Everyone should read Leo Breiman’s 2001 article,
Statistical Modeling: The Two Cultures. (Summary: Traditional statisticians
start with a distribution. They try to identify the parameters of the
distribution from data that were actually generated from it. Applied
statisticians start with data. They have no idea where their data
really came from and are happy to fit any model that makes good predictions.)
I think there are currently three cultures of machine
learning. Different people or projects will fall in different places
on this “ML simplex” depending on what they care about most.
They start with something in green
and attempt to get blue as a way of
At the top of the triangle, we have exuberantly rationalist
approaches for when we think we know something about the data (e.g., a
generative story). These “scientific” approaches are not exclusively
Bayesian, but Bayesian ML practitioners cluster up here.
At the right vertex, we have Breiman’s know-nothing
approach—high-capacity models like neural nets, decision
forests, and nonparametrics that will fit anything given enough data.
This is engineering with less science
remarks). Deep learning people cluster here.
Estimators for both of the above approaches usually have to
solve intractable optimization problems. Thus, they fall back on
approximations and get stuck in local maxima, and you don’t
really know what you’re getting.
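To make the local-maxima point concrete, here is a minimal sketch, not drawn from any particular estimator. The objective `f`, the step size, and the starting points are all made up for illustration. Plain gradient ascent on a one-dimensional function with two peaks ends up at a different answer depending only on where it starts.

```python
def f(x):
    # Hypothetical non-concave objective with two local maxima:
    # a lower peak near x = -0.96 and a higher peak near x = 1.04.
    return -(x**2 - 1) ** 2 + 0.3 * x


def grad_f(x):
    # Derivative of f, worked out by hand.
    return -4 * x * (x**2 - 1) + 0.3


def gradient_ascent(x0, lr=0.01, steps=2000):
    # Fixed-step gradient ascent; lr and steps are illustrative choices.
    x = x0
    for _ in range(steps):
        x += lr * grad_f(x)
    return x


x_left = gradient_ascent(-2.0)   # climbs to the lower peak
x_right = gradient_ascent(2.0)   # climbs to the higher peak

print(f"start -2.0 -> x = {x_left:.3f}, f(x) = {f(x_left):.3f}")
print(f"start +2.0 -> x = {x_right:.3f}, f(x) = {f(x_right):.3f}")
```

Both runs report a point where the gradient vanishes, and nothing in either run reveals that the first one settled for the worse peak. In high dimensions you cannot plot the objective to check.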
But in simple settings, the errors of both approaches can be
analyzed. This gratifies the people at the left vertex.
Frequentist statisticians and COLT folks (computational learning
theorists) cluster around that vertex; they try to bound the error.
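As one illustration of what "bounding the error" can look like in a simple setting, here is a small sketch based on Hoeffding's inequality. It assumes a single predictor fixed before seeing the data, a loss bounded in [0, 1], and n i.i.d. test examples. The function name `hoeffding_radius` is made up for this example. Under those assumptions, with probability at least 1 - delta the empirical error is within sqrt(ln(2/delta) / (2n)) of the true error.

```python
import math


def hoeffding_radius(n, delta):
    """Two-sided Hoeffding bound on |empirical error - true error|.

    Assumes one fixed predictor, a loss in [0, 1], and n i.i.d. samples.
    The bound holds with probability at least 1 - delta.
    """
    if n <= 0 or not 0 < delta < 1:
        raise ValueError("need n > 0 and 0 < delta < 1")
    return math.sqrt(math.log(2 / delta) / (2 * n))


# The guarantee tightens at the familiar 1/sqrt(n) rate.
for n in (100, 1000, 10000):
    print(n, round(hoeffding_radius(n, delta=0.05), 4))
```

The bound says nothing about how the predictor was obtained, which is why it applies to either of the other two vertices. It also stops holding as stated once the predictor is selected using the same data.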
For my take on the different priorities of frequentists
Finding an ML attack on an applied problem usually involves
combining elements of multiple traditions. It also involves using
various computational tricks (MCMC, variational approximations, convex
relaxations, optimization algorithms, etc.) to try to handle the
maximizations and integrations that are needed for learning and inference.
(I suppose the drawing is mostly about prediction. It omits
reinforcement learning and causal learning. But such problems
involve prediction, so the same competing priorities guide how
practitioners approach problems.)
Update: Mark Tygert alerted me that Efron
(1998) gave a similar simplex diagram (Fig. 8, “A
barycentric picture of modern statistical research”) whose
corners were the Bayesian, frequentist and Fisherian philosophies.
Original URL: http://feedproxy.google.com/~r/feedsapi/BwPx/~3/MZpeNiGca1s/ml-simplex.html