Importing and analyzing theater plays

case study

In a previous post, we saw how to use Textable to build a corpus of theater plays from data available on the web. In this post, we’ll use a newly released widget to access the same data much more easily, and illustrate one way of analyzing it.

The Textable-Prototypes add-on

Textable was initially developed as part of a pedagogical innovation project at the University of Lausanne (Unil), where it has been used for teaching humanities computing since 2012. Textable-Prototypes is a companion add-on that collects widgets created at Unil by teachers and students. It has just been adapted for Orange 3 by Aris Xanthos, so Textable 3 users can access a new widget named Theatre Classique simply by installing Textable-Prototypes through Orange’s Options > Add-ons menu.

Building a corpus with Theatre Classique

The new widget offers a straightforward way to import theater plays from the Théâtre Classique website. This project provides free, open access to nearly 1000 French theater plays in a richly annotated format (XML-TEI). Retrieving some of these data is as easy as creating an instance of the Theatre Classique widget and selecting the plays you’re interested in, optionally sorted by author, genre or year.

In the above example, the widget is used with Advanced settings activated, so that a search criterion can be applied. Here the criterion is the author, and a set of plays by Jean Racine has been selected and imported. Connecting the Theatre Classique widget to an instance of Display lets us view the metadata automatically associated with each play in the form of annotations.
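To give a rough idea of what the widget handles behind the scenes, here is a minimal Python sketch that extracts the spoken text and two metadata fields (title and author) from a TEI-encoded play, returning them as a text plus a dictionary of annotations. It is not the widget’s implementation. It assumes the standard TEI P5 namespace and a simplified structure (speeches in `<sp>` elements, verse in `<l>`, prose in `<p>`), uses a made-up miniature document instead of a real file from Théâtre Classique, and the names `parse_play` and `SAMPLE_TEI` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: files use the standard TEI P5 namespace.
TEI_URI = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI_URI}
# Verse lines (<l>) and prose paragraphs (<p>) count as spoken text.
SPOKEN_TAGS = {"{%s}l" % TEI_URI, "{%s}p" % TEI_URI}

# A tiny, made-up TEI document standing in for a downloaded play.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Exemple</title>
        <author>Anonyme</author>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <sp who="a"><speaker>A</speaker><l>Premier vers.</l></sp>
      <sp who="b"><speaker>B</speaker><l>Second vers.</l></sp>
    </body>
  </text>
</TEI>"""


def parse_play(tei_xml):
    """Return (text, annotations) for one TEI-encoded play (hypothetical helper)."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)
    author = root.findtext(".//tei:titleStmt/tei:author", default="", namespaces=TEI_NS)

    # Keep the order of lines within each speech; speaker names are skipped.
    lines = []
    for speech in root.iterfind(".//tei:sp", TEI_NS):
        for elem in speech:
            if elem.tag in SPOKEN_TAGS:
                lines.append("".join(elem.itertext()).strip())

    annotations = {"title": title.strip(), "author": author.strip()}
    return "\n".join(lines), annotations
```

Calling `parse_play(SAMPLE_TEI)` yields the two verse lines joined by a newline, together with the title and author as annotations, which mirrors the idea of a text segment carrying metadata.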

An example analysis

As an illustration of how these richly annotated texts can be analyzed, the following workflow gathers a set of 32 plays by Pierre Corneille, segments them into about 820,000 words, and, for each play, computes the average number of distinct word types in 100 subsamples of 1000 word tokens drawn at random (without replacement).
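The lexical richness measure described above can be sketched in plain Python as follows. This is not the code behind the Textable workflow: the tokenizer (lowercased runs of word characters) is an assumption, and the function names are hypothetical. Each subsample draws 1000 distinct token positions without replacement, and the score is the mean number of distinct types across the 100 subsamples.

```python
import random
import re


def tokenize(text):
    # Assumption: a "word" is a maximal run of word characters, lowercased.
    return re.findall(r"\w+", text.lower())


def mean_types_in_subsamples(tokens, sample_size=1000, n_samples=100, seed=None):
    """Average number of distinct types in random subsamples of one play's tokens.

    Each subsample of `sample_size` tokens is drawn without replacement.
    """
    if len(tokens) < sample_size:
        raise ValueError("play has fewer tokens than the subsample size")
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        sample = rng.sample(tokens, sample_size)  # no token position drawn twice
        total += len(set(sample))  # number of distinct word types
    return total / n_samples
```

For a given play, `mean_types_in_subsamples(tokenize(play_text), seed=0)` returns one score. Working with fixed-size subsamples rather than whole plays keeps the scores comparable, since the raw number of types grows with text length.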

The resulting box plots reveal interesting contrasts in lexical richness between the genres represented in this corpus.
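Once each play has a single score, the box plots summarize these scores by genre. Outside Orange, the statistics behind such a plot could be computed as in the sketch below. The input format (a list of `(genre, score)` pairs, one per play) and the function name are assumptions, and quartile conventions differ between tools, so the values may not match a given plotting widget exactly.

```python
import statistics
from collections import defaultdict


def summarize_by_genre(results):
    """Compute box-plot statistics of per-play scores, grouped by genre.

    `results` is an iterable of (genre, score) pairs, one pair per play.
    Returns {genre: (minimum, first quartile, median, third quartile, maximum)}.
    """
    groups = defaultdict(list)
    for genre, score in results:
        groups[genre].append(score)

    summary = {}
    for genre, scores in groups.items():
        if len(scores) == 1:
            # quantiles() needs at least two data points in older Python versions.
            only = scores[0]
            summary[genre] = (only, only, only, only, only)
            continue
        # "inclusive" treats the scores as the full set of plays in that genre.
        q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        summary[genre] = (min(scores), q1, median, q3, max(scores))
    return summary
```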

Readers interested in replicating this analysis are invited to download the workflow (after installing the Textable-Prototypes add-on) and experiment further with it, for instance by importing plays by other authors (note that retrieving the data from the web and running the calculations can take a few minutes): Lexical richness in Corneille's plays

