SweLL-pilot

Standard reference

Elena Volodina, Ildikó Pilán, Ingegerd Enström, Lorena Llozhi, Peter Lundkvist, Gunlög Sundberg, Monica Sandell (2016): SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies, in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), May 23-28, 2016, Portorož, Slovenia

Data citation

Volodina, Elena, Pilán, Ildikó, Enström, Ingegerd, Llozhi, Lorena, Lundkvist, Peter, Sundberg, Gunlög, & Sandell, Monica (2016). SweLL-pilot (updated: 2016-01-01). [Data set]. Processed and distributed by Språkbanken. https://doi.org/10.23695/d3bg-8g24

Additional ways to cite the dataset.

Essays written by adult learners of Swedish, annotated with CEFR levels (a European scale of proficiency levels in language learning). The essays were collected during the period 2006-2015.

The SweLL-pilot corpus is a corpus of essays written by adult learners of Swedish. It was collected during the period 2006-2015 and contains 502 essays that have been anonymized and graded with CEFR labels.

There are three subcorpora in the SweLL-pilot collection:

SpIn - 256 essays collected from the Language Introduction course (mid-term exams) for newly arrived refugees. Some students contributed more than one essay.
Sw1203 - 141 essays collected from university students in an exam setting, most of whom wrote three essays each.
TISUS - 105 essays written as part of the Test in Swedish for University Studies (TISUS). All essays are argumentative and on the same topic, “Stress”.

Links

Annotation

The essays were manually graded on the CEFR scale by teachers. In addition, they are linguistically annotated (POS tagging, lemmatization, dependency annotation) using Sparv.

Caveats

Data collection is limited to a small geographical area and a short period of time. Although several language backgrounds are represented, the corpus is very unbalanced in this respect and is therefore not well suited for native language identification tasks. The corpus consists of three subcorpora, each from a different source, with different proficiency levels and different collection periods. The three subcorpora can be used together, but care should be taken that artifacts of these differences do not leak into models.
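Because the same writer can appear several times (SpIn has returning students, and most Sw1203 writers contributed three essays), a random essay-level split can place one writer's essays in both training and test data. A minimal sketch of two possible safeguards follows: splitting by writer, and cross-tabulating CEFR labels per subcorpus to see how strongly source and level are confounded. It assumes hypothetical per-essay records with `writer_id`, `subcorpus` and `cefr` fields; whether the release exposes writer identifiers in this form is an assumption, so check the README before relying on it. The toy records are invented for illustration and are not corpus data.

```python
import random
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Essay:
    # Hypothetical record layout; the actual release format may differ.
    essay_id: str
    writer_id: str
    subcorpus: str  # e.g. "SpIn", "Sw1203", "TISUS"
    cefr: str       # CEFR label, e.g. "A2", "B1"


def split_by_writer(essays, test_fraction=0.2, seed=0):
    """Split so that all essays by one writer end up on the same side."""
    by_writer = defaultdict(list)
    for e in essays:
        by_writer[e.writer_id].append(e)
    writers = sorted(by_writer)
    random.Random(seed).shuffle(writers)  # deterministic shuffle of writers
    n_test = round(len(writers) * test_fraction)
    test_writers = set(writers[:n_test])
    train = [e for w in writers if w not in test_writers for e in by_writer[w]]
    test = [e for w in writers if w in test_writers for e in by_writer[w]]
    return train, test


def label_counts_by_subcorpus(essays):
    """Count CEFR labels per subcorpus to spot source/level confounding."""
    subcorpora = sorted({e.subcorpus for e in essays})
    return {s: Counter(e.cefr for e in essays if e.subcorpus == s) for s in subcorpora}


# Invented toy records, for illustration only.
toy = [
    Essay("e1", "w1", "Sw1203", "B1"),
    Essay("e2", "w1", "Sw1203", "B1"),
    Essay("e3", "w2", "TISUS", "C1"),
    Essay("e4", "w3", "SpIn", "A2"),
]
train, test = split_by_writer(toy, test_fraction=0.34)
print(label_counts_by_subcorpus(toy))
```

If one subcorpus holds almost all essays at a given level, a model may learn to recognise the source (topic, task format) rather than proficiency; a writer-level split does not remove that effect, so the cross-tabulation is worth inspecting separately.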

Intended use

Automated grading on the CEFR scale, anonymization, second language acquisition studies

References

[HOW TO CITE 1]: Elena Volodina (2024): On two SweLL learner corpora–SweLL-pilot and SweLL-gold, in Huminfra Conference, pp. 83-94. https://doi.org/10.3384/ecp205012
[README]: Elena Volodina (2021): README for SweLL-pilot. https://spraakbanken.github.io/swell-release-v1/Readme-SweLL-pilot
[HOW TO CITE 2]: Mats Wirén, Arild Matsson, Dan Rosén, Elena Volodina (2018): SVALA: Annotation of Second-Language Learner Text Based on Mostly Automatic Alignment of Parallel Corpora, in Selected papers from the CLARIN Annual Conference 2018, Pisa, 8-10 October 2018, edited by Inguna Skadina, Maria Eskevich

Available via

Access	Platform	License
Application form (https://sunet.artologik.net/gu/swell)		CLARIN-ID, -PRIV, -NORED, -BY (https://www.kielipankki.fi/support/clarin-eula/#res)

Datasets in the collection


Resource	Type	Language	Access
SpIn v1: 256 essays collected from the Language Introduction course (mid-term exams) for newly arrived refugees. Some students contributed more than one essay.	Corpus	Swedish	
SW1203 essays: Essays written by L2 Swedish learners in university courses	Corpus	Swedish	Word statistics: stats_SW1203.txt.zip (2025-04-22, 71.13 KB, CC-BY-4.0)
TISUS texts: Essays written by L2 Swedish learners as part of a TISUS exam	Corpus	Swedish	

