Språkbanken Text is a department within Språkbanken.

SweLL-pilot

Citation

Språkbanken Text. SweLL-pilot [Data set]. Språkbanken Text. https://doi.org/10.23695/d3bg-8g24
Essays written by adult learners of Swedish, manually anonymized and annotated with error categories. The corpus contains both the original text and a normalized version of each essay. Collection period: 2006-2015.

The SweLL-pilot corpus is a collection of essays written by adult learners of Swedish. It was collected between 2006 and 2015 and contains 502 essays that have been anonymized and graded with CEFR labels.

There are three subcorpora in the SweLL-pilot collection:

  • SpIn - 256 essays collected from mid-term exams in a Language Introduction course for newly arrived refugees. Some students appear more than once.
  • Sw1203 - 141 essays collected from university students in an exam setting, most of whom wrote three essays each.
  • TISUS - 105 essays written as part of the Test In Swedish for University Studies. All essays are argumentative and on the same topic, “Stress”.

Annotation

Essays are manually transcribed and anonymized, and graded on the CEFR scale by human teachers. In addition, they are linguistically annotated (POS tagging, lemmatization, dependency annotation) using Sparv.

Caveats

Data collection is limited to a small geographical area and a short period of time. Although several language backgrounds are represented, the corpus is very unbalanced in this respect and is consequently not well suited for native language identification tasks. The corpus consists of three subcorpora, each from a different source, with different proficiency levels and different collection periods. While the three subcorpora can be used together, care should be taken to ensure that artifacts from these differences do not leak into models.
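One way to check whether a model relies on such source-specific artifacts is to hold out one subcorpus at a time and evaluate on it. The following is a minimal sketch of that split, not a prescribed procedure: the record layout and the field names ("id", "subcorpus", "cefr", "text") are assumptions made for illustration and do not reflect the actual distribution format of the corpus.

```python
from collections import defaultdict

# Toy records. The field names and values are hypothetical placeholders,
# not the real SweLL-pilot file format or real essay data.
essays = [
    {"id": "e1", "subcorpus": "SpIn", "cefr": "A2", "text": "(essay text)"},
    {"id": "e2", "subcorpus": "SpIn", "cefr": "B1", "text": "(essay text)"},
    {"id": "e3", "subcorpus": "Sw1203", "cefr": "B2", "text": "(essay text)"},
    {"id": "e4", "subcorpus": "TISUS", "cefr": "C1", "text": "(essay text)"},
]


def leave_one_subcorpus_out(records):
    """Yield (held_out_name, train, test), holding out one subcorpus per fold."""
    by_source = defaultdict(list)
    for rec in records:
        by_source[rec["subcorpus"]].append(rec)

    for held_out in sorted(by_source):
        # Train on every essay that does not come from the held-out source.
        train = [
            rec
            for name, recs in by_source.items()
            if name != held_out
            for rec in recs
        ]
        test = list(by_source[held_out])
        yield held_out, train, test


for name, train, test in leave_one_subcorpus_out(essays):
    print(f"held out {name}: {len(train)} train essays, {len(test)} test essays")
```

A large drop in performance on the held-out subcorpus compared with a random split would suggest that the model has picked up on source-specific cues (topic, genre, collection setting) rather than proficiency. Since some students contributed several essays, a similar grouping by student would be needed to keep one writer out of both training and test data; that requires a student identifier, which is assumed here and not confirmed by this description.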

Intended use

Automated grading using the CEFR scale, anonymization, second language acquisition studies

Type

  • Corpus
  • Collection

Language

Swedish

Size

Resources: 3

Keywords

  • L2 Swedish
  • second language
  • language learning
  • essays

Contact

Språkbanken Text
sb-info@svenska.gu.se