This is the second edition of the Stockholm-Umeå Corpus, Version 2.0 (SUC 2.0). The SGML-annotated corpus is distributed in three formats. Version 2.0c has morphosyntactic descriptions in SUC format, while version 2.0d has them in PAROLE format. The third format has SUC morphosyntactic descriptions and elaborate structural markup, but its files lack bibliographic headers. These files are SGML-conformant but not TEI-conformant documents, so bibliographic information must be provided in a separate file.
The first edition of the complete Stockholm-Umeå Corpus (SUC 1.0) was distributed in 1997. A subset of the annotated SUC corpus of approximately 300 000 words (swe01), created on October 31, 1992, was distributed in 1994 as part of the ACL European Corpus Initiative.
The SUC-format version of the corpus (2.0c) is 59 690 705 bytes. A punctuation token is defined as a token that is assigned the part of speech F in PAROLE format, or MAD, MID or PAD in SUC format. A word is defined as a token that is not a punctuation token.
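The word/punctuation distinction above can be applied mechanically when counting tokens. The sketch below is a minimal illustration, not part of the SUC distribution. It assumes each token has already been read into a `(form, pos)` pair. For SUC format, `pos` is taken to be the bare part-of-speech tag. For PAROLE format, the part of speech is assumed to be the first character of the morphosyntactic tag. The function names and the sample tags are hypothetical.

```python
from typing import Iterable, Tuple

# SUC-format part-of-speech tags that mark punctuation, per the definition above.
SUC_PUNCTUATION_TAGS = {"MAD", "MID", "PAD"}


def is_punctuation(pos: str, fmt: str) -> bool:
    """Return True if a token with part-of-speech `pos` is a punctuation token.

    fmt is "suc" or "parole". Assumption (hypothetical helper): in PAROLE
    format the part of speech is the first character of the tag, so any
    tag starting with "F" counts as punctuation.
    """
    if fmt == "suc":
        return pos in SUC_PUNCTUATION_TAGS
    if fmt == "parole":
        return pos.startswith("F")
    raise ValueError(f"unknown format: {fmt!r}")


def count_tokens(tokens: Iterable[Tuple[str, str]], fmt: str) -> Tuple[int, int]:
    """Count (words, punctuation tokens) in a sequence of (form, pos) pairs.

    A word is any token that is not a punctuation token.
    """
    words = punct = 0
    for _form, pos in tokens:
        if is_punctuation(pos, fmt):
            punct += 1
        else:
            words += 1
    return words, punct


if __name__ == "__main__":
    # Illustrative tokens only; not taken from the corpus.
    suc = [("Hej", "IN"), (",", "MID"), ("världen", "NN"), ("!", "MAD")]
    parole = [("Hej", "I"), (",", "FI"), ("världen", "NCUSN@DS"), ("!", "FE")]
    print(count_tokens(suc, "suc"))        # (2, 2)
    print(count_tokens(parole, "parole"))  # (2, 2)
```

Keeping the format check in a single predicate means both SUC 2.0c and 2.0d files yield the same word counts, as long as the tag mapping is consistent.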
© 2002 Dept of Linguistics, Stockholm University, and Dept of Linguistics, Umeå University
Responsibility | Responsible |
---|---|
Principal | Eva Ejerhed, Umeå University (UmU) |
Funder | HSFR (Swedish Council for Research in the Humanities and Social Sciences) |
Funder | STU/NUTEK (The Swedish National Board for Industrial and Technical Development) |
Funder | The Faculty of Humanities, Umeå University |
Project management at SU and UmU respectively | Gunnel Källgren, Eva Ejerhed |
Compilation of corpus and text type taxonomy | Gunnel Källgren |
Creation of the SUC tagset for morphosyntactic descriptions | Eva Ejerhed |
Data acquisition and legal agreements | Gunnel Källgren |
English translation of legal agreements | Teresa Bjelkhagen |
Bibliography for SUC texts | Britt Hartmann |
Programming (SU) | Gunnar Eriksson, Sune Magnberg |
Selection of text samples and creation of raw text | Gunnel Källgren, Britt Hartmann |
Preprocessing texts for manual annotation, assigning lexical analyses to word tokens | Eva Ejerhed and Magnus Åström in collaboration with Fred Karlsson, University of Helsinki |
Programming library for SUC format in SUC 1.0, corpus production tools | Magnus Åström |
Manual annotation, morphosyntactic descriptions | SU: Janne Lindberg, Cecilia Lyckow, Ulrika Kvist, Svensson-Lindberg; UmU: Joana Arnesson, Eva Ejerhed, Ola Wennstedt, Anna-Lena Wiklund |
Post-processing manually annotated text | Eva Ejerhed, Joana Arnesson, Anna-Lena Wiklund, Åström, Fredrick Backman, Rolf Sandberg |
Preparing SUC 1.0 for distribution | Eva Ejerhed, Magnus Åström, Fredrick Backman, Arnholm, in collaboration with Daniel Ridings and Pernilla Danielsson, Gothenburg University |
Construction of the SGML (TEI) tag set for SUC 2.0 | Gunnar Eriksson, Gunnel Källgren |
DTD for manual SGML-markup of SUC 2.0 | Gunnar Eriksson |
Manual SGML markup | Maria Arnstad, Harald Berthelsen, Christina Ericsson, Malin Ericson, Tove Gerholm, Sofia Gustafson-Capkova, Sara Rydin |
Management of hard copies | Britt Hartmann |
Project management in Stockholm 1999-2001 | Benny Brodda, Sofia Gustafson-Capkova |
Artistic design of CD-cover for SUC 2.0 | Ulrika Kvist Darnell |
Preparation and compiling in Corpus Workbench | Sofie Johansson Kokkinakis, Språkbanken, University of Gothenburg |
Web-interface and search routines | Torgny Rasmark |