CalDraCor

The 205 dramas that make up the Calderón Drama Corpus were converted to TEI in a joint effort headed by Prof. Dr. Hanno Ehrlicher and coordinated by Dr. Jörg Lehmann at the University of Tübingen, Institute of Romance Languages and Literatures, and by the research group coordinated by Dr. Simon Kroll at the University of Vienna, Institute of Romance Studies.

Calderón Drama Corpus (CalDraCor)

The Calderón Drama Corpus (CalDraCor) is a collection of Spanish Golden Age plays by Pedro Calderón de la Barca prepared for computational analysis. It comprises 205 TEI-encoded plays (comedias and autos sacramentales, among other dramatic genres) and totals almost three million words.

As part of the DraCor infrastructure for programmable drama corpora, CalDraCor allows researchers to study Calderón’s oeuvre not only as literature, but also as a complex system of characters, dialogues, and linguistic patterns, and to compare his plays with other corpora hosted on the same platform.

The corpus has been developed collaboratively since 2019 by teams at the University of Tübingen and the University of Vienna, later joined by colleagues from the University of Stuttgart. The Tübingen team (led by Hanno Ehrlicher and coordinated by Jörg Lehmann) contributed 129 TEI-encoded comedias, the Vienna team (led by Simon Kroll) added 64 plays, and 12 additional plays were generated automatically. Since 2023, Antonio Rojas Castro has been responsible for the curation of the corpus, in particular for the annotation of characters with <trait> and for the scene segmentation encoded as <div type="scene">.

CalDraCor is distributed under a Creative Commons CC0 1.0 licence and is also available via DraCor and Zenodo, where the latest citable version can be accessed.


Overview

At present, CalDraCor consists of 205 dramatic works encoded in TEI P5, with an average length of about 13,840 words per play. The corpus is text-centric: each play is stored in its own TEI file rather than being wrapped in a single <teiCorpus> container.

Chronologically, the corpus covers plays composed between 1622 and 1675, with a concentration of works in the mid-1630s (notably 1635–1636) and a decline in production after 1650. In a few cases, CalDraCor includes multiple versions or generic transpositions of the same subject (for instance, La vida es sueño as comedia and as auto sacramental), which makes the corpus suitable for studying rewriting and creative genesis.

From the perspective of dramatic genre, CalDraCor follows the taxonomy proposed by Simon Kroll. Autos sacramentales dominate the corpus (88 plays, 42.93%), followed by “comedia cómica” (44 plays, 21.46%). Together, these two genres account for 132 of the 205 plays (64.39%). The other genres (mythological, religious, historical, tragic / honour plays, novelistic / chivalric, and zarzuela) each appear with lower frequencies.
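
The genre shares quoted above follow directly from the play counts and the corpus size. A minimal Python sketch of that arithmetic, using only the figures given in this description (the names TOTAL_PLAYS, genre_counts and share are illustrative, not part of any CalDraCor tooling):

```python
# Figures taken from the corpus description. The other genres are left out
# because their individual counts are not stated here.
TOTAL_PLAYS = 205
genre_counts = {"auto sacramental": 88, "comedia cómica": 44}


def share(count: int, total: int = TOTAL_PLAYS) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Percentage of the corpus for each of the two largest genres.
shares = {genre: share(n) for genre, n in genre_counts.items()}

# Combined share of the two largest genres.
combined = share(sum(genre_counts.values()))

for genre, pct in shares.items():
    print(f"{genre}: {pct}%")
print(f"two largest genres combined: {combined}%")
```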

The corpus encodes more than 3,400 characters, with an average of about 15 characters per play. Most are individual characters (over 3,100), while a smaller subset are collective personae. Each character is assigned a unique identifier and a name as it appears in the play, and, where possible, a sex value (MALE, FEMALE, UNKNOWN) and a role label such as galán, dama or gracioso via <trait>.
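
To illustrate how such character metadata could be read, here is a minimal Python sketch using the standard library. The fragment it parses is hypothetical: the <listPerson>, <person> and <personGrp> layout, the way the role is nested inside <trait>, and all names and identifiers are assumptions made for this example, not copied from a CalDraCor file. Check an actual file before relying on these paths.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical cast list: structure and values are invented for illustration.
SAMPLE_LIST_PERSON = """
<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="don-felix" sex="MALE">
    <persName>Don Félix</persName>
    <trait type="role"><desc>galán</desc></trait>
  </person>
  <person xml:id="dona-ana" sex="FEMALE">
    <persName>Doña Ana</persName>
    <trait type="role"><desc>dama</desc></trait>
  </person>
  <person xml:id="tristan" sex="MALE">
    <persName>Tristán</persName>
    <trait type="role"><desc>gracioso</desc></trait>
  </person>
  <personGrp xml:id="musicos" sex="UNKNOWN">
    <name>Músicos</name>
  </personGrp>
</listPerson>
"""


def read_characters(xml_text):
    """Return one dict per character with id, name, sex, role and kind."""
    root = ET.fromstring(xml_text)
    characters = []
    # Individual characters first, then collective ones.
    for tag, kind in (("person", "individual"), ("personGrp", "collective")):
        for el in root.findall(f".//tei:{tag}", NS):
            name = (el.findtext("tei:persName", namespaces=NS)
                    or el.findtext("tei:name", namespaces=NS))
            characters.append({
                "id": el.get(XML_ID),
                "name": name,
                "sex": el.get("sex", "UNKNOWN"),
                # None when no role has been assigned.
                "role": el.findtext("tei:trait/tei:desc", namespaces=NS),
                "kind": kind,
            })
    return characters


characters = read_characters(SAMPLE_LIST_PERSON)
print(Counter(c["kind"] for c in characters))
print(Counter(c["sex"] for c in characters))
```

Returning plain dictionaries keeps the result easy to load into a table or data frame for per-play or corpus-wide tallies of sex and role values.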

Structurally, CalDraCor reveals a relatively stable dramatic architecture. On average, each play has:

  • 50.39 scenes
  • 2,744 verse lines
  • 592.36 character speeches
  • about 117 stage directions

This segmentation is consistently encoded in TEI and supports quantitative approaches such as network analysis, character-centric studies, and rhythm or pacing analyses.
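
As an illustration of how the structural features listed above could be counted, the following Python sketch tallies scenes, verse lines, speeches and stage directions in a single TEI document. It assumes that scenes are <div type="scene"> (as stated in this description) and that verse lines, speeches and stage directions use the standard TEI elements <l>, <sp> and <stage>. The sample play and the function name structural_counts are invented for the example, and the counting rules behind the published averages may differ (for instance in how split verse lines are treated).

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Tiny invented play used only to make the sketch self-contained.
SAMPLE_PLAY = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="act">
      <div type="scene">
        <stage>Sale un galán.</stage>
        <sp who="#galan"><speaker>GALÁN</speaker>
          <l>Primer verso de ejemplo,</l>
          <l>segundo verso de ejemplo.</l>
        </sp>
      </div>
      <div type="scene">
        <sp who="#dama"><speaker>DAMA</speaker>
          <l>Tercer verso de ejemplo.</l>
        </sp>
        <stage>Vanse.</stage>
      </div>
    </div>
  </body></text>
</TEI>
"""


def structural_counts(xml_text):
    """Count scenes, verse lines, speeches and stage directions in one play."""
    root = ET.fromstring(xml_text)
    return {
        "scenes": len(root.findall(".//tei:div[@type='scene']", NS)),
        # Each <l> element counts once, so a verse line split across
        # speakers would be counted more than once here.
        "verse_lines": len(root.findall(".//tei:l", NS)),
        "speeches": len(root.findall(".//tei:sp", NS)),
        "stage_directions": len(root.findall(".//tei:stage", NS)),
    }


counts = structural_counts(SAMPLE_PLAY)
for feature, n in counts.items():
    print(f"{feature}: {n}")
```

Because each play is stored in its own TEI file, corpus-level averages could be obtained by applying the same function to every file and taking the mean of each feature.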


Image credit: Portrait of Pedro Calderón de la Barca, engraved by Pedro de Villafranca y Malagón. Madrid, Imprenta Imperial de Joseph Fernández de Buendía, 1676. Copper engraving, 188 × 130 mm. Public domain.