European Literary Text Collection (ELTeC)

Distant Reading for European Literary History (COST Action CA16204) is a project aiming to create a vibrant and diverse network of researchers jointly developing the resources and methods necessary to change the way European literary history is written. Grounded in the Distant Reading paradigm (i.e. using computational methods of analysis for large collections of literary texts), the Action will create a shared theoretical and practical framework to enable innovative, sophisticated, data-driven, computational methods of literary text analysis across at least 10 European languages. Fostering insight into cross-national, large-scale patterns and evolutions across European literary traditions, the Action will facilitate the creation of a broader, more inclusive and better-grounded account of European literary history and cultural identity.

European Literary Text Collection (ELTeC) in TextGrid Repository

This is the project site of the European Literary Text Collection (ELTeC) in the TextGrid Repository. The goal of adding the ELTeC to TextGrid Repository is to publish and archive this valuable set of corpora in European languages and to combine them with the technical possibilities that TextGrid Repository offers. Below, we list some of the possibilities that TextGrid Repository offers to researchers and readers who are interested in the ELTeC. Currently, we have imported 15 corpora of the ELTeC.

Browsing the ELTeC in TextGrid Repository

Here are some ways to browse the ELTeC in TextGrid Repository:

In all these cases, you can add further filters with the facets on the left.

Corpora and Languages

Here are links to the subcorpus for each language:

Filtering through Specific Metadata of the ELTeC (Facets)

Because some specific metadata fields are relevant for the composition of the ELTeC, these have been added to TextGrid Repository as new searchable metadata and facets. You can use these facets either by selecting them in the menu on the left of the project's results page or by including them in a query. Here we present some possible queries specific to the ELTeC:

Of course, queries combining these facets are possible and they can be combined with fulltext queries, such as:

For further information about querying TextGrid Repository, consult the documentation.

Basic Classification

The Basic Classification is a library classification system originally developed in the Netherlands and is similar to other systems such as the Dewey Decimal Classification (DDC) or the Regensburger Verbundklassifikation (RVK). It is used in several library networks in the German-speaking area, where it is one of the most widely used classification systems. In contrast to other library classification systems, it has a small number of classes (about 2,000), is freely available, and is published as a Linked Open Data (LOD) resource. For more information, see the Wiki of the K10plus or its BARTOC entry.

With the integration of the ELTeC corpora, TextGrid now supports Basic Classification in several ways. First, the classes are displayed in the left menu when selecting a project that has assigned them. Second, they can be used for queries, both simple and complex. For example, we can select a class such as 18.37 Portuguese Literature and use the following query: work.subject.id.value:18.37. Of course, that would give the same result as using the language metadata. The benefit of the Basic Classification comes from its hierarchical structure, which makes it possible, for example, to query all corpora of a given language group. Here are some examples:

These queries can be combined with further possibilities presented before.
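
As a rough illustration of how such a combined query could be assembled programmatically, the following Python sketch joins the classification clause from above with a full-text term and turns the result into a search link. Only the field name work.subject.id.value and the class 18.37 come from this page. The search address, the "query" URL parameter, the AND operator and the search term "mar" are assumptions made for the example and should be checked against the TextGrid Repository documentation.

```python
from urllib.parse import urlencode

# Assumption: the search page lives at this address and reads the query
# from a URL parameter called "query".
SEARCH_URL = "https://textgridrep.org/search"


def field_clause(field: str, value: str) -> str:
    """Return a 'field:value' clause, quoting values that contain spaces."""
    if " " in value:
        value = f'"{value}"'
    return f"{field}:{value}"


def combine(*clauses: str, operator: str = "AND") -> str:
    """Join non-empty clauses with a boolean operator (assumed syntax)."""
    return f" {operator} ".join(clause for clause in clauses if clause)


def search_url(query: str) -> str:
    """Build a URL-encoded search link for the given query string."""
    return f"{SEARCH_URL}?{urlencode({'query': query})}"


# Class 18.37 (Portuguese Literature) is the example given on this page.
portuguese = field_clause("work.subject.id.value", "18.37")

# "mar" is a hypothetical full-text search term.
query = combine(portuguese, "mar")

print(query)
print(search_url(query))
```

Keeping the clause builder separate from the URL builder means further facets can be added as extra clauses without changing how the link is encoded.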

Benefits of ELTeC in TextGrid Repository

The ELTeC corpora are already available as GitHub repositories and in Zenodo. So, what is the motivation for also publishing them in TextGrid Repository? In our opinion, TextGrid Repository can offer a series of advantages to the ELTeC and its community of users:

  1. Long-term archive: TextGrid Repository is a long-term repository awarded with the CoreTrustSeal
  2. Findability through harvesting: By including the ELTeC editions in TextGrid Repository, these texts can be found on further platforms. Aggregators or registries like re3data, OpenAIRE, VLO (CLARIN Virtual Language Observatory) or DARIAH Collection Registry harvest the information of the TextGrid Repository. The corpora of ELTeC will become more visible and easier to find for interested scholars
  3. Identification: TextGrid Repository assigns persistent identifiers to all corpora, works and editions of the ELTeC
  4. Integration: in TextGrid Repository, the ELTeC is integrated into one of the largest openly available literary corpora
  5. Queries using TextGrid metadata: users can query the corpora using the metadata organized in TextGrid's work, edition, and text objects
  6. Queries using project-specific metadata: users can query the corpora using project-specific metadata; in the case of ELTeC, this could be the gender of the author, the period of publication, or the size of the text
  7. Queries using library classes: users can query the corpora using the library classification system Basic Classification, as they would do in a classical library catalog, e.g. query only specific language groups
  8. Full text queries: users can also search for words or phrases in the texts
  9. Combined queries: users can combine different types of queries into a single complex query
  10. Combination with other corpora: users can easily combine some texts of the ELTeC with other corpora, for example by filtering the entire TextGrid Repository by language or year of publication
  11. Shelf function: TextGrid Repository offers the shelf function, with which any user can combine texts of their choice
  12. Publication in HTML: in contrast to other platforms, the TEI files are also published as HTML, enabling search engines to find them easily
  13. Transformation: Besides the HTML format, all texts in TextGrid Repository are automatically transformed into other formats (zip, ePUB, plaintext)
  14. Analysis: TextGrid allows sending single texts or entire corpora to Natural Language Processing tools (via Switchboard) and Digital Humanities tools (Voyant)
  15. Integration in the NFDI Consortium Text+ Portfolio: TextGrid Repository is part of the services of the Consortium Text+ as part of the German National Strategy of Research Data
  16. Integration in future services: TextGrid Repository is being further developed in association with several ongoing projects. With its integration, the ELTeC will profit from future features and developments, such as the Python library currently in development

TextGrid Metadata Files

The basic metadata is covered by the Edition and Work metadata of the TextGrid Metadata schema. All additional project-specific metadata is covered by the metadata added to the works. Please see the following examples from the Digital Library:

Citation Suggestion

To cite an individual corpus, please click on it in the links above; you will find a citation suggestion at the bottom of its page. To cite all ELTeC corpora in TextGrid Repository, we suggest the following reference: