I’m posting below the remarks from my part of a presentation given together with Anna Kijas, Senior Digital Scholarship Librarian (Boston College Libraries), at the joint International Association of Music Libraries/International Musicological Society Congress in New York, held June 21-26, 2015. Her talk and slides are available here. In our presentation, entitled “Digital Madeleines and Breadcrumbs: Discovering the Musical Past through Multimodal Analyses,” we spoke about the process involved in creating our digital projects and reflected on the similarities and differences with print scholarship.


I will be presenting some preliminary analysis I’ve done on an opera libretto written by Felice Romani (1788-1865) and on its source play by Honoré-Antoine Richaud Martelly (1751-1817). Romani wrote I due Figaro in 1820 for a new production with the composer Michele Carafa at La Scala in Milan. For this commission, he chose to adapt what was at the time an immensely popular play, Martelly’s Les deux Figaro, which first appeared on the Parisian stage in 1794, during the French Revolution, and enjoyed a long-lived success across the European continent for the next fifty or so years. Martelly himself had hoped to capitalize on the fame of Beaumarchais’s protagonist with this spurious third installment, in which the wily servant Figaro gets his just deserts at the hands of his noble masters.

Although I originally wrote an analysis of Romani’s libretto and its source play as a graduate student in Dr. Nardini’s music research methods class at the Butler School of Music (UT Austin), I have wanted for some time to explore ways of visualizing the process by which a librettist adapts a stage play for the lyric stage. Since I have done a close reading and am now experimenting with digital analysis techniques, I can offer an exploratory report on my findings, as well as remark on some of the differences in approach and process that the two techniques entail.

Hermeneutics vs. sandwich making

One of the most difficult aspects of this project was the need to reframe my research questions once I was forced to acknowledge the messy state of my data and the limited resources at my disposal to whip it into the sort of shape that would make certain data analysis techniques available to me. I had initially hoped to extract what Figaro says in the French play and in the Italian libretto and run algorithms to detect, among other things, the most commonly used words, the most frequently mentioned persons, and the quantity of speech he utters in each version. For this sort of analysis, I needed structured data of the type provided by TEI, but none was available to me. In fact, there wasn’t even a digital text of either the French play or the Italian libretto. What I did have were two scans of 18th- and 19th-century editions, with text layers produced by OCR. These included text, certainly, but also errors, extraneous characters, and hashbangs of the sort familiar to anyone who wrestles with OCR algorithms and historical texts. In short, text analysis of such a small corpus would not work for me, at least not until I could get clean transcriptions and perhaps structured TEI markup.
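
None of this was possible with the OCR text, but it may help to show what structured markup would allow. The following is a minimal, hypothetical sketch. It assumes the play had been encoded with the TEI drama elements `<sp who="…">`, `<speaker>`, `<p>` and `<l>`, and that Figaro carried the id `figaro`. The function name `speech_by` and the sample lines are invented for illustration. The function collects the words a given character speaks, and word counts and frequencies follow directly from that list.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"  # the TEI namespace, in ElementTree's {uri}tag form

def speech_by(tei_xml: str, char_id: str) -> list[str]:
    """Return the lowercased words spoken by the character whose TEI id is char_id.

    Hypothetical sketch: assumes speeches are encoded as <sp who="#id"> elements
    whose spoken text sits in <p> (prose) or <l> (verse) descendants.
    """
    root = ET.fromstring(tei_xml)
    words = []
    for sp in root.iter(f"{TEI}sp"):
        # @who may list several ids, e.g. who="#figaro #cherubino"
        if f"#{char_id}" not in sp.get("who", "").split():
            continue
        for el in sp.iter():
            # keep prose paragraphs and verse lines; the <speaker> label is skipped
            if el.tag in (f"{TEI}p", f"{TEI}l"):
                words.extend(re.findall(r"\w+", "".join(el.itertext()).lower()))
    return words

if __name__ == "__main__":
    # invented sample text, not quoted from either edition
    sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
      <sp who="#figaro"><speaker>Figaro</speaker><p>Ah, Susanna! Susanna mia.</p></sp>
      <sp who="#susanna"><speaker>Susanna</speaker><p>Che vuoi?</p></sp>
    </body></text></TEI>"""
    figaro = speech_by(sample, "figaro")
    print(len(figaro), Counter(figaro).most_common(1))  # prints: 4 [('susanna', 2)]
```

With clean transcriptions, the same list would also give a rough measure of the quantity of speech in each version (`len(words)`). The French and Italian texts would of course need separate handling of elisions and apostrophes.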

So what was left? My close reading of the texts had given me a couple of ideas that I wished to explore as visualizations. The first had to do with the gender balance in the French and Italian texts. The second was a related question about the power and class dynamics among the characters of both texts. The major shifts I observed in reading Romani’s adaptation and Martelly’s play were:

  1. A displacement of the conflict between Figaro and Cherubino, the play’s eponymous Figaros, in favor of the love story between Cherubino and Inez, the daughter of the Count and Countess.
  2. A heightening of the source play’s anti-republican message, achieved by ennobling the role of the Count, who is a scoundrel in the French play, and recasting Susanna from a clever but essentially powerless lady’s maid to a scheming, Commedia-style Colombina, or tricky slave.

After tabling the issue of text analysis, I decided that I could proceed with modeling the character interactions in each text. To do so, I turned to social network theory as a possible interpretive framework. In the vocabulary of network analysis, the characters become vertices or nodes, and the relationships between them are the ties or edges. I wished to create a weighted network graph, which means that the edges have a “strength.” For example, Figaro talks to the Count more than he talks to Susanna, so the edge between Figaro and the Count is thicker. Because Figaro also has a greater number of character interactions overall than Susanna, his node is drawn larger. Next came the question of how to define and count those character interactions in both texts.
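
The following sketch shows how such weights could be derived. It is not the workflow actually used for this project. It takes a list of coded interactions, given as hypothetical (speaker, addressee) pairs, and computes two values. The first is an edge weight for each pair of characters, which would set edge thickness. The second is each character’s strength, or weighted degree (the sum of the weights of its edges), which would set node size. For simplicity, the graph is treated as undirected, so Figaro addressing the Count and the Count addressing Figaro add to the same edge. The function name is illustrative.

```python
from collections import Counter

def weighted_graph(interactions):
    """Return (edge_weights, node_strength) for a list of (speaker, addressee) pairs.

    Undirected: each edge is keyed by a frozenset of the two characters, so
    ("Figaro", "Count") and ("Count", "Figaro") land on the same edge.
    A self-loop (e.g. an aside) becomes a one-element frozenset.
    """
    edges = Counter(frozenset(pair) for pair in interactions)
    strength = Counter()
    for pair, weight in edges.items():
        for character in pair:  # a self-loop adds its weight to its character once
            strength[character] += weight
    return edges, strength

if __name__ == "__main__":
    coded = [("Figaro", "Count"), ("Count", "Figaro"),
             ("Figaro", "Susanna"), ("Figaro", "Figaro")]
    edges, strength = weighted_graph(coded)
    print(edges[frozenset(("Figaro", "Count"))], strength.most_common())
    # prints: 2 [('Figaro', 4), ('Count', 2), ('Susanna', 1)]
```

Whether a self-loop should count once or twice toward a node’s size is itself a modeling choice. Graph libraries commonly count it twice.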

Amanda Visconti and Marten Düring have shared their coding schemes for James Joyce’s Ulysses and Ralph Neumann’s autobiography, respectively, in which the ties are weighted by the type of interaction. For dramatic texts, I decided that I could forgo some of this complexity, given that one of the most basic ties was also the most commonly occurring: one character speaking directly to another. In this first pass through the data, I have recorded when one character speaks to another on a per-scene basis. For example, whether Figaro addresses Susanna once or twelve times in a scene, the interaction is counted only once. Soliloquies and asides are recorded as self-loops; in other words, as the character talking to himself. This method has the advantage of expediency and the shortcoming of being rather simplistic. Weighting the interactions by the amount of text spoken could yield a more accurate model, although the soliloquies would introduce a counting problem. For instance, a one-line interjection can be interpreted as the character talking to himself once, but what to make of a three-hundred-word soliloquy in prose, or an eleven-line aria in rhyming couplets? What should be the unit of speech, and does it matter if it differs between the prose play and the lyric libretto?
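
The once-per-scene rule is simple enough to state in a few lines of code. The sketch below assumes, hypothetically, that each address has been transcribed as a (scene, speaker, addressee) record, with soliloquies and asides entered as the speaker addressing himself. The scene labels and the function name are illustrative. Collapsing duplicate records before counting means the result is the number of scenes in which one character addresses another, not the number of individual addresses.

```python
from collections import Counter

def per_scene_counts(records):
    """Count, for each (speaker, addressee) pair, the scenes in which the address occurs.

    records: iterable of (scene, speaker, addressee) tuples, one per address.
    Repeats within a scene collapse to a single interaction; an aside or
    soliloquy is coded with addressee == speaker and so becomes a self-loop.
    Direction is kept: Figaro -> Susanna and Susanna -> Figaro are distinct.
    """
    unique = set(records)  # drops repeated addresses within the same scene
    return Counter((speaker, addressee) for _scene, speaker, addressee in unique)

if __name__ == "__main__":
    raw = [
        ("I.1", "Figaro", "Susanna"),
        ("I.1", "Figaro", "Susanna"),   # a second address in the same scene
        ("I.1", "Susanna", "Figaro"),
        ("I.2", "Figaro", "Susanna"),
        ("I.2", "Figaro", "Figaro"),    # an aside
    ]
    print(per_scene_counts(raw)[("Figaro", "Susanna")])  # prints 2
```

Weighting by text instead would mean replacing each record with, say, a word or line count and summing rather than deduplicating. This is where the question of the unit of speech comes back in.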

These sorts of questions get to the heart of a problem described at length in an essay by Franco Moretti entitled “Operationalizing.” Briefly, operationalizing describes the process of taking a question and devising some sort of measurement to arrive at a plausible answer to it. Frequently there are many possible measurements, and the choice of one or another will depend on the research question, the availability and pliability of the data, and the interests of the researcher. I confess that when I started, I had thought that making a network graph would be something like what Jer Thorp describes in a blog post called “Visualization as Process”: in other words, I thought I’d be making a roast beef sandwich. Sure, there might be small refinements in the choice of this condiment or that bread, but the end result would still basically be a roast beef sandwich. By contrast, many researchers argue that the real value of visualization lies in the insight gained during the iterative process of developing a data model in response to a question and interacting with that model within the chosen analysis framework. The output, the visualization itself, is the result of a series of decisions about what and how to measure, and each of those decisions can produce a radically different, highly subjective result. The visualization, in short, is only one of many possible interpretations. And the end result may simply be more (or better) questions… which makes for a rather unpredictable sandwich.

Feminizing Figaro (or not)

In shifting the focus of the play from a class-inflected battle of wits to a more conventional love story, Romani used some common strategies. These included cuts to the male characters, the reappropriation of dialogue to build up the lesser character of Inez, and the elimination of the harsher marital reflections of both the Count and Figaro. From my reading, it had seemed that the female characters enjoyed a much more prominent role in the Italian version than in the French. This was largely because Romani was obliged to provide the text of an aria for each major vocal part, not to mention a love duet.

Breakdown of character gender by scene in Martelly’s play (left) and Romani’s libretto (right). The ladies are in blue; men in red. Made using the Treemap chart in RAW.

The above attempt to visualize the gender balance by scene yielded somewhat unexpected results. Each floating mono- or dichromatic block represents a scene, and it is clear that there are many more all-male scenes in the French play than in the Italian libretto. Even so, there doesn’t appear to be much of an observable difference in gender balance across the two texts. Female characters account for a slightly higher share of character appearances in Romani’s libretto (31%) than in Martelly’s play (28%). These figures are calculated by dividing the number of female character appearances across all scenes by the total number of appearances by characters of either gender. The difference, however, is not particularly large. It is also notable from this visualization that the women in Martelly’s play have several scenes to themselves, whereas in Romani’s libretto they almost never appear without male characters. As far as the reader’s perception is concerned, it could in fact be a decisive factor that so many scenes in Martelly’s play feature men only. But at five acts and 77 scenes, Martelly’s play is also much longer than Romani’s adaptation, which consists of only two acts and 36 scenes.
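
The percentages above come from a simple ratio, which can be written out as a short sketch. It assumes, hypothetically, that each scene is recorded as a list of the characters who appear in it, and that each character is tagged with a gender. The function computes the female share of all character appearances and counts the all-male and all-female scenes. The names and the toy data are illustrative, not the project’s data. Dividing the scene counts by the number of scenes would also make it easier to compare a 77-scene play with a 36-scene libretto.

```python
def gender_summary(scenes, gender):
    """Summarize gender balance across scenes.

    scenes: list of cast lists, one per scene (each assumed non-empty).
    gender: dict mapping each character to "F" or "M".
    Returns (female share of all appearances, all-male scenes, all-female scenes).
    """
    appearances = [gender[name] for cast in scenes for name in cast]
    female_share = appearances.count("F") / len(appearances)
    all_male = sum(all(gender[name] == "M" for name in cast) for cast in scenes)
    all_female = sum(all(gender[name] == "F" for name in cast) for cast in scenes)
    return female_share, all_male, all_female

if __name__ == "__main__":
    gender = {"Figaro": "M", "Count": "M", "Susanna": "F", "Countess": "F"}
    scenes = [["Figaro", "Count"], ["Susanna", "Countess"],
              ["Figaro", "Susanna"], ["Count"]]
    share, all_male, all_female = gender_summary(scenes, gender)
    print(f"{share:.0%} female; {all_male} all-male, {all_female} all-female scenes")
    # prints: 43% female; 2 all-male, 1 all-female scenes
```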

Character networks

The following two visualizations represent the betweenness centrality of the characters in the source play and in the libretto (this Wikipedia article provides a good description of the different measures of centrality). Roughly speaking, a node’s betweenness centrality measures how often that node lies on the shortest paths between all other pairs of nodes. A character with high betweenness therefore acts as a bridge between characters who would otherwise be only distantly connected. By this measure, and with the per-scene character data that I compiled, Figaro, the Count and Cherubino are the most central characters of the French play, which corroborates my impressions from reading. By contrast, it is immediately observable that the Italian libretto is more of an ensemble piece: Susanna shows a much higher degree of centrality, and Cherubino a lesser one. The chorus (vassalli, villanelle, paesani), a role invented for the libretto, also assumes a prominent position in the Italian text.
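
The following compact, pure-Python sketch of Brandes’ algorithm shows what the measure computes for an unweighted, undirected graph. It is an illustration under stated assumptions and not necessarily the calculation performed by the tool used to make the visualizations. Analysis tools may take edge weights into account or normalize the scores. The adjacency format, the function name and the toy edges are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm: unnormalized betweenness for an undirected, unweighted graph.

    adj: dict mapping each node to the set of its neighbours (must be symmetric).
    Self-loops are ignored, since they never lie on a shortest path between two others.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma) to each node
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:             # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph every pair (s, t) was visited from both ends
    return {v: c / 2 for v, c in score.items()}

if __name__ == "__main__":
    edges = [("Count", "Figaro"), ("Figaro", "Susanna"), ("Susanna", "Countess")]
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    print(betweenness(adj))
    # prints: {'Count': 0.0, 'Figaro': 2.0, 'Susanna': 2.0, 'Countess': 0.0}
```

Normalized variants divide each score by the number of pairs of other nodes, (n-1)(n-2)/2 for an undirected graph with n nodes. This makes graphs of different sizes, such as those for the play and the libretto, easier to compare.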

Here too there were surprises. For example, Susanna’s centrality in the French play was much lower than I had anticipated. In both texts, she had seemed to be an important bridge between the two groups of schemers: Figaro and Don Alvaro on one side, and the Countess, Inez, Cherubino and herself on the other. On the one hand, it could be that my operationalization of the variable of character interaction is at fault. On the other, it could be that Cherubino performs this bridging role more effectively in the French play, whereas Susanna takes it over in the Italian libretto. In this light, her relative isolation in Martelly’s text could be interpreted as a measure of her powerlessness to counteract Figaro’s immoral plot. Similarly, Cherubino’s diminished centrality in the Italian text allows him to draw closer to Inez, and thereby create the love story that receives such scant attention in the French play.


My analyses of these two texts are still very much in progress, and I may decide that the way I have operationalized my questions doesn’t work and look for new ones. That said, the process so far has yielded many refinements to my original questions, and more approaches than I had thought possible.

Devising even a rudimentary coding scheme in response to my questions has prompted very careful and repeated reading, such that I can now say I have a deeper knowledge of the texts. Ambiguities had to be resolved or discarded for the sake of developing a rigorously consistent model. I had to decide which character attributes were important to track. For now, it is gender; going forward, it could also be class and community. Developing and refining this coding scheme, and interacting with the resulting data in the analysis tool, forced me to question my own assumptions and verify impressions from reading the texts.

Working through the problem of representing gender and character interaction across two related texts has of course been time-consuming, but it has also been a productive exercise in investigating my own biases and understandings. In some cases, my impressions of the texts were confirmed, but in just as many, I was surprised. Needless to say, the visualizations do not supersede the need for careful reading, since a solid understanding of the underlying data is necessary to interpret the output. But as a complement to close reading, the process of visualization is an informative, not to mention fun, way of analyzing musical and dramatic texts.