NLP4CALL workshop at NoDaLiDa, online, May 31, 2021

For further information, see the workshop website.

The NLP4CALL workshop is co-located with NoDaLiDa 2021.
Just like the main conference, the workshop will be organized as an online event.


Description of the workshop
The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. The latter include, among others, insights from Second Language Acquisition (SLA) research on the one hand, and the promotion of "Computational SLA" through the setting up of second language research infrastructure(s) on the other.

The intersection of Natural Language Processing (or Language Technology / Computational Linguistics) and Speech Technology with Computer-Assisted Language Learning (CALL) brings "understanding" of language to CALL tools, thus making CALL intelligent. This is what gave this area of research its name: Intelligent CALL (ICALL). As the definition suggests, apart from having excellent knowledge of Natural Language Processing and/or Speech Technology, ICALL researchers need good insights into second language acquisition theories and practices, as well as knowledge of second language pedagogy and didactics. This workshop therefore invites a wide range of ICALL-relevant research, including studies where NLP-enriched tools are used for testing SLA and pedagogical theories, and vice versa, where SLA theories, pedagogical practices or empirical data are modeled in ICALL tools.

The NLP4CALL workshop series aims to bring together expertise from these areas to share experiences and brainstorm about the future of the field.

Important dates
13 January: first call for papers
8 February: second call for papers
1 March: third call for papers
11 March: final call for papers
18 March: paper submission deadline (long papers, short papers, demo papers and research notes)
19 April: notification of acceptance (regular papers)
27 April: notification of acceptance (research notes)
6 May: camera-ready deadline
31 May: workshop date


A new version of the text analysis tool Sparv is now available. Sparv annotates texts with linguistic information and is one of many language technology tools developed within Nationella språkbanken (the Swedish National Language Bank).

Read the full news item at språ

As COVID-19 became a pandemic in March 2020, a massive amount of (time-stamped, written) data became available, such as news and newspaper reports, scientific articles, social media posts (e.g. blogs and Twitter), surveys and other information about the virus and its symptoms, prevention, management and transmission.

Such data contained valid, reliable information and relevant facts from trusted sources, but also rumors, conspiracy theories and misinformation from unofficial ones. However, it was not only the amount of (written) data and information …

Continue reading ”A Swedish COVID-19 (sv-COVID-19) corpus and its exploration … smorgasbord”

Artificial intelligence systems that deal with (human) natural language rely on language models, which predict which words occur together.

To better understand how such models work, and where they fail, when applied to Swedish texts, we need Swedish test data. A collection of test data addressing various aspects of understanding and generating text allows us to evaluate and compare models. In autumn 2020 we started developing evaluation data for Swedish language models at Språkbanken Text. This …
Continue reading ”The SwedishGLUE project”

On 25-27 November, the eighth edition of SLTC, the Swedish Language Technology Conference, took place at Humanisten here in Göteborg.

Or rather, it would have, had a certain virus not put a stop to it. Instead, like everyone else, we had to switch to a fully digital edition, but that worked too. We had a record number of registrations: 193 participants from 34 different countries! (The majority, 60%, came from Sweden, though.) Not everyone showed up, of course: for one thing, registration was free, and for another …
Continue reading ”Reflektioner från SLTC 2020” (Reflections from SLTC 2020)

We at Språkbanken Text have just released a new corpus of native (L1) and non-native (L2) speech in four languages: English, Spanish, French and Italian.

The corpus contains more than 170 million words produced by more than 97 thousand speakers (though its size varies considerably across the four languages). The corpus was created by scraping the WordReference forums, where users discuss various questions about languages. Importantly, every user has to provide their native language, and this information, along with the nickname, is …
Fortsätt läsa ”How native and non-native speakers talk to each other”


A new version of Språkbanken Text's corpus pipeline Sparv has now been released. In this version we have rewritten Sparv from scratch and made the tool more user-friendly. Sparv has also received new language models that lead to better part-of-speech tagging and dependency parsing. In addition, Sparv can now produce more export formats, such as XML, CSV, CoNLL and word frequency lists. Read more about what is new at

You can find the documentation and installation instructions here:

The source code is available at

Göteborgs universitet (the University of Gothenburg) is advertising intermittent employment as a project assistant (one or more positions), based at Språkbanken Text, Department of Swedish (institutionen för svenska språket). Read more here:

Project assistant - intermittent hourly employment (one or more positions)

Image: querying a word embedding model

Yesterday, we released word embedding models trained on our historical newspaper archive, Kubhist 2. Word embedding models represent words as vectors and place them in their semantic neighbourhood, so that similar words are closer together. This makes it easy to look for semantic similarity between words and to detect relations between them. We have released diachronic models, one for each 20-year period of Kubhist 2. The interesting feature of diachronic embeddings is that they allow for the study of a word's semantics over time. The models are currently available on Zenodo and will be available via the SBX page shortly. A thorough description will soon appear in the Journal of Open Humanities Data.
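Since similar words lie close together in the vector space, querying an embedding model often comes down to ranking the other words by a similarity measure; cosine similarity is a common choice. The following is a minimal sketch in plain Python, not the Kubhist 2 tooling: the helper names `cosine` and `nearest_neighbours`, the three-dimensional vectors and the two period models are all hypothetical and invented for illustration. Comparing a word's nearest neighbours across period models is one simple way to look for change in its meaning.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)  # assumes non-zero vectors


def nearest_neighbours(word, model, k=2):
    """Return the k words (excluding `word`) whose vectors are most similar to `word`'s."""
    target = model[word]
    scores = [(other, cosine(target, vec)) for other, vec in model.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:k]]


# Hypothetical toy models for two 20-year periods. The vectors are invented;
# real models have many more dimensions and are loaded from the released files.
model_1800_1819 = {
    "tåg": [0.9, 0.1, 0.0],
    "procession": [0.8, 0.2, 0.1],
    "järnväg": [0.1, 0.9, 0.2],
    "lok": [0.0, 0.8, 0.3],
}
model_1880_1899 = {
    "tåg": [0.1, 0.9, 0.1],
    "procession": [0.8, 0.2, 0.1],
    "järnväg": [0.1, 0.9, 0.2],
    "lok": [0.0, 0.8, 0.3],
}

print(nearest_neighbours("tåg", model_1800_1819, k=1))  # ['procession']
print(nearest_neighbours("tåg", model_1880_1899, k=1))  # ['järnväg']
```

With these made-up vectors, the nearest neighbour of "tåg" moves from "procession" in the earlier model to "järnväg" in the later one. The numbers only demonstrate the mechanics of such a comparison and say nothing about what the real Kubhist 2 models contain.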

Image: searching for subordinating conjunctions in SIC2
SIC2, a small corpus of blogs with gold part-of-speech, morphosyntactic and named-entity tags, as well as basic info about the authors, has been added to Korp.

Any corpus is a welcome addition to Korp, but gold corpora (those where the annotation quality has been manually controlled) are particularly valuable. We have now added SIC2, a slightly modified version of the Stockholm Internet Corpus, originally created by Robert Östling et al. SIC2 is a small corpus of blogs, but it has gold part-of-speech, morphosyntactic and named-entity tags (SUC-style). Basic information about the authors is also available. The corpus can be downloaded.

The integration of SIC2 into Korp also served as a test drive for the new version of our annotation pipeline Sparv, to be released very soon.
