Språkbanken Text is a department within Språkbanken.

OpenEDGeS

Citation Information

Coussé, Evie, Dijkstra, Trude, & van der Sijs, Nicoline (2024). OpenEDGeS (updated: 2024-01-25). [Data set]. Språkbanken Text. https://doi.org/10.23695/fhat-dd64
The public license subset of the EDGeS Diachronic Bible Corpus, a diachronically and synchronically parallel corpus of Bible translations in Dutch, English, German and Swedish, with texts from the 14th century until today.

The EDGeS Diachronic Bible Corpus is a diachronically and synchronically parallel corpus of Bible translations in Dutch, English, German and Swedish, with texts from the 14th century until today. The wish list underlying the final compilation of 36 Bibles was that they should be a) first editions of complete Bible translations, not modernizations; and b) translations in a narrow sense (not harmonies, paraphrases, rhyming Bibles, etc.) into a language variety that was current at the time of publication. Furthermore, they c) must have made a historical impact, typically through wide dissemination; and d) their text must be available electronically, with a traceable link to the original. The documentation included in the archive and the related publication describe the extent to which these ideals were met in the final selection.

For all Bibles, we include at least the New Testament, and for most we also have the Old Testament. A smaller number of Bibles have Apocryphal books.

The parallel Bible texts were split into book-chapter-verse segments and automatically aligned at verse level, using the contemporary Dutch Nieuwe Bijbelvertaling as a pivot. We have also compiled meta-information for each of the Bibles.
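Because every Bible is aligned to the same pivot, any two translations can be compared indirectly by combining their pivot alignments. The following is a minimal sketch of that idea, not the method used to build the corpus. It assumes each alignment is held as a mapping from a pivot verse identifier to a list of verse identifiers in the other Bible; the function name, the identifier format and the sample data are hypothetical.

```python
def align_via_pivot(pivot_to_a, pivot_to_b):
    """Pair verses of Bible A and Bible B that share a pivot verse.

    Both arguments map a pivot verse id to a list of verse ids in the
    respective Bible. A list is used because one pivot verse may
    correspond to several verses, or to none.
    """
    pairs = []
    for pivot_id, a_ids in pivot_to_a.items():
        b_ids = pivot_to_b.get(pivot_id, [])
        # Keep only pivot verses that are covered by both Bibles.
        if a_ids and b_ids:
            pairs.append((pivot_id, a_ids, b_ids))
    return pairs


# Hypothetical identifiers, for illustration only.
pivot_to_a = {"GEN.1.1": ["GEN.1.1"], "GEN.1.2": ["GEN.1.2", "GEN.1.3"]}
pivot_to_b = {"GEN.1.1": ["GEN.1.1"], "GEN.1.3": ["GEN.1.4"]}
print(align_via_pivot(pivot_to_a, pivot_to_b))
```

Only the first pivot verse appears in both mappings, so only that verse yields a pair. Verses missing from either Bible are silently dropped, which is one possible design choice among several.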

OpenEDGeS is the open subpart of EDGeS, distributed under a public license, made up of 31 historical Bibles.

Annotation

The parallel texts come in two formats: a) as a text file containing the complete Bible texts, one verse per line, without headings or verse numbers, and b) as a collection of TSV files, one file per Bible book, which include headings and book-chapter-verse (bkv) identifiers. The alignment information is delivered as TSV files in which each row contains aligned bkv identifiers. Because different Bibles may organize the text in different ways, we repackaged individual books into so-called virtual books before alignment, so that they follow the division of the pivot, the Nieuwe Bijbelvertaling. This reorganization is an additional annotation, and the link to the original division is fully transparent. The virtual Bible books are included in the resource as TSV files to facilitate possible future alignment efforts.
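A minimal sketch of how the per-book TSV files and an alignment TSV file might be combined in Python. The column layout is an assumption and should be checked against the documentation in the archive: book files are assumed to hold an identifier in the first column and the text in the second, with heading rows having an empty identifier, and alignment files are assumed to hold the pivot identifier in the first column and the aligned identifier in the second. All function names and sample data are hypothetical.

```python
import csv
import io


def read_book_tsv(handle):
    """Return {verse_id: text} from a TSV stream.

    Assumed layout: column 1 = identifier, column 2 = text.
    Rows with an empty identifier (assumed to be headings) are skipped.
    """
    verses = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2 and row[0]:
            verses[row[0]] = row[1]
    return verses


def read_alignment_tsv(handle):
    """Return a list of (pivot_id, other_id) pairs from a TSV stream."""
    return [
        (row[0], row[1])
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        if len(row) >= 2
    ]


def aligned_texts(pivot, other, alignment):
    """Yield (pivot_text, other_text) for aligned ids found in both books."""
    for pivot_id, other_id in alignment:
        if pivot_id in pivot and other_id in other:
            yield pivot[pivot_id], other[other_id]


# Made-up sample data; real files would be opened with open(path, newline="").
pivot = read_book_tsv(io.StringIO("\tA heading\nB.1.1\tpivot verse one\n"))
other = read_book_tsv(io.StringIO("B.1.1\tother verse one\n"))
alignment = read_alignment_tsv(io.StringIO("B.1.1\tB.1.1\n"))
for pair in aligned_texts(pivot, other, alignment):
    print(pair)
```

Quoting is disabled with `csv.QUOTE_NONE` so that quotation marks inside verse text are read literally rather than treated as field delimiters.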

References

  • Gerlof Bouma, Evie Coussé, Trude Dijkstra, Nicoline van der Sijs (2020): The EDGeS Diachronic Bible Corpus, in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), May 11-16, 2020, Marseille, France

File Size Modified Licence
OpenEDGeS_v1.01.zip (zip) 121.17 MB 2024-01-25 CC BY-NC-SA 4.0 (attribution, non-commercial, share-alike)
72.89 MB 2024-01-25 For license details of the previous versions, see the 'Read me.txt' file in the download.

Type

  • Corpus

Language

Swedish
English
German
Dutch

Size

Tokens: 19,399,149

Keywords

  • Bible text
  • parallel
  • historical
  • English
  • Swedish
  • Dutch
  • German

Creators

  • Coussé, Evie
  • Dijkstra, Trude
  • van der Sijs, Nicoline

Updated

2024-01-25

Contact

Språkbanken
sb-info@svenska.gu.se