The Art of Topic Modeling for Total Beginners

A Tri-Co Seminar in Critical Digital Studies (Spring 2015-Fall 2015)

Call for Applications from Faculty & Staff (Deadline November 26)

(Contact Rachel Sagner Buurma rbuurma1 at swarthmore dot edu)


What is topic modeling?

Topic modeling lets you use ready-made probabilistic algorithms to quickly survey large sets of documents without having to read them all yourself. It can be a useful tool for creating a general impression of a large corpus as a prelude to further research and interpretive work. It can also help answer some specific questions about a large corpus.
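For readers who want to see the idea in miniature, the following is a rough sketch and not the software the seminar will use. It assumes one common topic modeling approach (latent Dirichlet allocation fitted by collapsed Gibbs sampling). The function name `toy_lda`, its parameters, and the four-document corpus are all invented for illustration.

```python
import random
from collections import Counter


def toy_lda(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy LDA via collapsed Gibbs sampling (illustrative, hypothetical helper).

    docs is a list of documents, each a list of word strings.
    Returns (top words per topic, topic proportions per document).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables: topics per document, words per topic, words per topic overall.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start by assigning every word occurrence to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    # Repeatedly re-sample each word's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    top_words = [
        [w for w, c in topic_word[t].most_common(3) if c > 0]
        for t in range(num_topics)
    ]
    doc_mix = [
        [(c + alpha) / (len(doc) + alpha * num_topics) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    return top_words, doc_mix


# A tiny made-up corpus: two "seafaring" documents and two "domestic" ones.
docs = [
    "whale ship sea captain whale".split(),
    "sea ship voyage whale captain".split(),
    "marriage estate letter sister marriage".split(),
    "sister letter estate marriage ball".split(),
]
topics, mixtures = toy_lda(docs)
for t, words in enumerate(topics):
    print("topic", t, words)
for d, mix in enumerate(mixtures):
    print("document", d, [round(p, 2) for p in mix])
```

Each "topic" is simply a group of words that tend to occur together, and each document is described as a mixture of topics. Real projects rely on established tools and far larger corpora; interpreting the resulting word groups remains the researcher's job.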

This five-meeting workshop and seminar series

1) introduces the concept and teaches the practice of topic modeling and
2) opens a robust, critical, practice-based conversation about the possibilities, limits, and dangers of topic modeling – and of machine learning and computer-assisted humanities methods more broadly.

Participants will attend three meetings in Spring 2015 and two in Fall 2015, including two sessions taught by Dave Mimno ’99 (assistant professor, Information Science, Cornell University).

How do I apply?

In a one-page statement, please describe your interest in the Seminar and its relation to your pedagogy or research/scholarship. Participants will be expected to attend all sessions.  The group of faculty and staff, which will include participants from Swarthmore, Bryn Mawr and Haverford, will finalize its spring meeting schedule in December.

Who should apply?

Faculty and staff from any division, major, and research background are welcome to apply; our approach and software tilt toward the types of text collections most often used in the humanities and humanistic social sciences. We encourage applications from individuals and small groups. We especially encourage projects related to Spring 2015 or AY 2015-16 courses. No previous experience or expertise required. Stipends available for eligible seminar members.

Faculty organizers include: Timothy Burke (History), Rachel Sagner Buurma (English), William Turpin (Classics), and Rich Wicentowski (Computer Science)

Offered by the Tri-Co DH Initiative, with support from the Swarthmore Libraries and ITS.

——-More Information——

What do I need?

Some idea of a corpus of documents of interest to you, along with the beginnings of some questions about their contents. Any topic related to your research or teaching that involves a set of documents that seems large to you could be a good candidate. Examples include a full run of a specific newspaper, a set of 5,000 nineteenth-century novels, or all of your teaching notes from the last decade. We will offer assistance in identifying and building your corpus.

Can you give me some examples of how I might use topic modeling in my own teaching and research?

  • A cultural sociologist interested in adapting Kenneth Burke’s “grammar of motives” to the analysis of a large corpus of U.S. National Security Strategy (NSS) documents could use topic modeling to determine shifts over time in the rhetorical focus of those documents.
  • A historian interested in nineteenth-century newspaper publication could create maps of several regional sets of papers.
  • A teacher interested in gaining another view of their class notes might create models of a decade of teaching notes from a particular class or set of classes.
  • A classicist working on an interdisciplinary project at the intersection of several fields, each involving a rich and extensive scholarly literature, might use topic modeling to get one overview of those fields’ intersections and to see what might repay detailed reading. (For a related project in the field of literary studies see here.)
  • A literary critic working on Charles Dickens’s novel Our Mutual Friend might use topic modeling to trace a theme she discovered in her close readings of that novel across Dickens’s entire oeuvre, or across the entire corpus of Victorian novels published in 1865.

I want to know more (note: it is not necessary to read ANY of this in order to apply).

Topic Modeling and Digital Humanities by David Blei, in the special topic modeling issue of the Journal of Digital Humanities, offers an accessible introduction. In the same issue, you may want to read or skim Megan Brett’s Topic Modeling: A Basic Introduction and then skim through the “applications” essay by Lisa Rhody and others. Once you have oriented yourself this way, you may wish to turn to Ted Underwood’s blog post Topic modeling made just simple enough, which offers a more technical and detailed perspective.


