The Three Cultures of Machine Learning

by Jason Eisner (2015)

Everyone should read Leo Breiman’s 2001 article, “Statistical
Modeling: The Two Cultures.” (Summary: Traditional statisticians
start with a distribution. They try to identify the parameters of the
distribution from data that were actually generated from it. Applied
statisticians start with data. They have no idea where their data
really came from and are happy to fit any model that makes good
predictions.)

I think there are currently three cultures of machine
learning. Different people or projects will fall in different places
on this “ML simplex” depending on what they care about most.
They start with something in green and attempt to get blue as a way
of achieving red.

  • At the top of the triangle, we have exuberantly rationalist
    approaches for when we think we know something about the data (e.g., a
    generative story). These “scientific” approaches are not exclusively
    Bayesian, but Bayesian ML practitioners cluster up here.

  • At the right vertex, we have Breiman’s know-nothing
    approach—high-capacity models like neural nets, decision
    forests, and nonparametrics that will fit anything given enough data.
    This is engineering with less science (see these). Deep learning
    people cluster here.

  • Estimators for both of the above approaches usually have to
    solve intractable optimization problems. Thus, they fall back on
    approximations and get stuck in local maxima, and you don’t
    really know what you’re getting.

    But in simple settings, the errors of both approaches can be
    analyzed. This gratifies the people at the left vertex.
    Frequentist statisticians and COLT folks (computational learning
    theorists) cluster around that vertex; they try to bound the error.
    For my take on the different priorities of frequentists and
    Bayesians, see here.
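
The local-maxima problem mentioned above can be seen in a minimal
sketch. The objective below is a made-up one-dimensional function with
two bumps (an assumption for illustration, not a model from any of the
three cultures), and plain gradient ascent stands in for whatever
optimizer an estimator might use. Which bump you end up on depends
only on where you start.

```python
import math

def objective(x):
    # Hypothetical non-concave objective: a lower bump near x = -2
    # and a higher bump near x = +2.
    return 0.6 * math.exp(-(x + 2.0) ** 2) + 1.0 * math.exp(-(x - 2.0) ** 2)

def gradient(x):
    # Analytic derivative of objective(x).
    return (-2.0 * (x + 2.0) * 0.6 * math.exp(-(x + 2.0) ** 2)
            - 2.0 * (x - 2.0) * 1.0 * math.exp(-(x - 2.0) ** 2))

def gradient_ascent(x0, step=0.1, iters=2000):
    # Fixed-step gradient ascent; converges to the nearest local maximum.
    x = x0
    for _ in range(iters):
        x += step * gradient(x)
    return x

left = gradient_ascent(-1.0)   # starts in the basin of the lower bump
right = gradient_ascent(1.0)   # starts in the basin of the higher bump
print(left, objective(left))
print(right, objective(right))
```

Both runs report convergence, but only the second finds the better
optimum, and nothing in a single run tells you which case you are in.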

Finding an ML attack on an applied problem usually involves
combining elements of multiple traditions. It also involves using
various computational tricks (MCMC, variational approximations, convex
relaxations, optimization algorithms, etc.) to try to handle the
maximizations and integrations that are needed for learning and
inference.
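
As one small example of such a trick, here is a sketch of using MCMC
(random-walk Metropolis) to approximate an integral, namely a posterior
mean. The setup is assumed purely for illustration: a coin with 7 heads
and 3 tails under a uniform prior, so the exact posterior is
Beta(8, 4) with mean 8/12 and the estimate can be checked. The function
names, proposal width, and sample counts are all hypothetical choices.

```python
import math
import random

def log_unnormalized_posterior(theta, heads=7, tails=3):
    # Uniform prior on (0, 1) times a Bernoulli likelihood,
    # with the normalizing constant dropped.
    if not 0.0 < theta < 1.0:
        return -math.inf
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def metropolis_posterior_mean(n_samples=20000, burn_in=1000,
                              proposal_sd=0.2, seed=0):
    rng = random.Random(seed)
    theta = 0.5
    log_p = log_unnormalized_posterior(theta)
    total = 0.0
    for i in range(burn_in + n_samples):
        # Symmetric Gaussian random-walk proposal.
        candidate = theta + rng.gauss(0.0, proposal_sd)
        log_p_candidate = log_unnormalized_posterior(candidate)
        # Accept with probability min(1, p(candidate) / p(theta)).
        accept_prob = math.exp(min(0.0, log_p_candidate - log_p))
        if rng.random() < accept_prob:
            theta, log_p = candidate, log_p_candidate
        if i >= burn_in:
            total += theta
    return total / n_samples

estimate = metropolis_posterior_mean()
exact = 8.0 / 12.0  # mean of Beta(8, 4)
print(estimate, exact)
```

The sampler only ever evaluates the unnormalized density, which is the
point: the integral for the normalizer is never computed.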

(I suppose the drawing is mostly about prediction. It omits
reinforcement learning and causal learning. But such problems
involve prediction, so the same competing priorities guide how
practitioners approach problems.)

Update: Mark Tygert alerted me that Efron gave a similar simplex
diagram (Fig. 8, “A barycentric picture of modern statistical
research”) whose corners were the Bayesian, frequentist and Fisherian
philosophies.
