Who Wrote the Anti-Trump New York Times Op-Ed? Using Tidytext (rstat)

Like a lot of people, I was intrigued by “I Am Part of the Resistance Inside the Trump Administration”, an anonymous New York Times op-ed written by a “senior official in the Trump administration”. And like many data scientists, I was curious about what role text mining could play.

This is a useful opportunity to demonstrate how to use the tidytext package that Julia Silge and I developed, and in particular to apply three methods:

Using TF-IDF to find words specific to each document (examined in more detail in Chapter 3 of our book)
Using widyr to compute pairwise cosine similarity
Making similarity interpretable by breaking it down by word

Since my goal is R education more than it is political analysis, I show all the code in the post.
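
To make the first two methods concrete before working with real text, here is a minimal sketch on a toy dataset. The `docs` table, its three document labels, and their contents are made up purely for illustration and are not the data analyzed in this post; the function calls (`unnest_tokens()`, `bind_tf_idf()`, and widyr's `pairwise_similarity()`) are the standard ones from those packages.

```r
library(dplyr)
library(tidytext)
library(widyr)

# Hypothetical toy documents (an assumption for illustration only)
docs <- tibble(
  document = c("a", "b", "c"),
  text = c(
    "the economy and the tax cuts",
    "tax cuts and deregulation helped the economy",
    "the weather was cold and rainy"
  )
)

# One row per document-word pair, with a count and a TF-IDF weight.
# Words that appear in every document ("the", "and") get a TF-IDF of 0.
word_tf_idf <- docs %>%
  unnest_tokens(word, text) %>%
  count(document, word) %>%
  bind_tf_idf(word, document, n)

# Cosine similarity between each pair of documents, using the
# TF-IDF weights as the vector for each document
similarities <- word_tf_idf %>%
  pairwise_similarity(document, word, tf_idf, upper = FALSE, sort = TRUE)

similarities
```

In this toy example, documents "a" and "b" share the distinctive words "economy", "tax", and "cuts", so they should come out as the most similar pair, while the words all three documents share contribute nothing.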

Even in the less than 24 hours since the article was posted, I’m far from the first to run text analysis on

