Language resources
On this page you can browse and search our datasets. Click on a row name to see what files are available for download. You can go directly to the search interface by clicking on the tool logo.
All (1367)
Collections (31)
Corpora (1223)
Lexicons (69)
Training and evaluation data (27)
Models (48)
Resource
Type
Language
Access
Dependency parsing model: Stanza
Pretrained models for dependency parsing.
Model
Swedish
Dataset:
synt_stanza_eval.zip
2020-12-09 – 99.05 MB – CC BY 4.0
Dataset:
synt_stanza_full2.zip
2020-12-09 – 99.17 MB – CC BY 4.0
Dataset:
stanza_pretrain.zip
2025-02-20 – 91.7 MB – CC BY 4.0
Collection
Kubord-fasttext
A collection of fasttext models trained on modern newspaper texts from the National Library of Sweden
Model
Swedish
See 12 collected resources
Kubord-fasttext - Aftonbladet 2010–2022 - lemma
Fasttext model trained on Aftonbladet 2010–2022
Model
Swedish
Dataset:
kubord-fasttext-afb-2010-2022-lemma.zip
2024-08-05 – 2.94 GB – CC BY 4.0
Kubord-fasttext - Aftonbladet 2010–2022 - token
Fasttext model trained on Aftonbladet 2010–2022
Model
Swedish
Dataset:
kubord-fasttext-afb-2010-2022-token.zip
2024-06-11 – 3.18 GB – CC BY 4.0
Kubord-fasttext - Aftonbladet 2010–2024 - lemma
Fasttext model trained on Aftonbladet 2010–2024
Model
Swedish
Dataset:
kubord-fasttext-afb-2010-2024-lemma.zip
2025-06-18 – 3 GB – CC BY 4.0
Kubord-fasttext - Aftonbladet 2010–2024 - token
Fasttext model trained on Aftonbladet 2010–2024
Model
Swedish
Dataset:
kubord-fasttext-afb-2010-2024-token.zip
2025-06-18 – 3.17 GB – CC BY 4.0
Kubord-fasttext - Dagens Nyheter 2010–2022 - lemma
Fasttext model trained on Dagens Nyheter 2010–2022
Model
Swedish
Dataset:
kubord-fasttext-dn-2010-2022-lemma.zip
2024-08-05 – 2.81 GB – CC BY 4.0
Kubord-fasttext - Dagens Nyheter 2010–2022 - token
Fasttext model trained on Dagens Nyheter 2010–2022
Model
Swedish
Dataset:
kubord-fasttext-dn-2010-2022-token.zip
2024-06-11 – 3.1 GB – CC BY 4.0
Kubord-fasttext - Dagens Nyheter 2010–2024 - lemma
Fasttext model trained on Dagens Nyheter 2010–2024
Model
Swedish
Dataset:
kubord-fasttext-dn-2010-2024-lemma.zip
2025-06-18 – 2.9 GB – CC BY 4.0
Kubord-fasttext - Dagens Nyheter 2010–2024 - token
Fasttext model trained on Dagens Nyheter 2010–2024
Model
Swedish
Dataset:
kubord-fasttext-dn-2010-2024-token.zip
2025-06-18 – 3.1 GB – CC BY 4.0
Kubord-fasttext - Göteborgsposten 2013–2022 - lemma
Fasttext model trained on Göteborgsposten 2013–2022
Model
Swedish
Dataset:
kubord-fasttext-gp-2013-2022-lemma.zip
2024-08-05 – 2.69 GB – CC BY 4.0
Kubord-fasttext - Göteborgsposten 2013–2022 - token
Fasttext model trained on Göteborgsposten 2013–2022
Model
Swedish
Dataset:
kubord-fasttext-gp-2013-2022-token.zip
2024-06-11 – 2.84 GB – CC BY 4.0
Kubord-fasttext - Göteborgsposten 2013–2024 - lemma
Fasttext model trained on Göteborgsposten 2013–2024
Model
Swedish
Dataset:
kubord-fasttext-gp-2013-2024-lemma.zip
2025-06-18 – 2.74 GB – CC BY 4.0
Kubord-fasttext - Göteborgsposten 2013–2024 - token
Fasttext model trained on Göteborgsposten 2013–2024
Model
Swedish
Dataset:
kubord-fasttext-gp-2013-2024-token.zip
2025-06-18 – 2.89 GB – CC BY 4.0
Lemmatization model: Stanza
Pretrained model for lemmatization.
Model
Swedish
Dataset:
lem_stanza.zip
2020-11-19 – 3.74 MB – CC BY 4.0
POS-tagging model: Flair
Pretrained models for POS-tagging.
Model
Swedish
Dataset:
flair_eval.zip
2020-06-18 – 1.37 GB – CC BY 4.0
Dataset:
flair_full.zip
2020-06-18 – 1.37 GB – CC BY 4.0
POS-tagging model: Marmot
Pretrained models for POS-tagging.
Model
Swedish
Dataset:
marmot_eval.marmot
2020-06-29 – 108.59 MB – CC BY 4.0
Dataset:
marmot_full.marmot
2020-06-29 – 113.41 MB – CC BY 4.0
Dataset:
saldo_marmot.txt
2020-06-29 – 46.33 MB – CC BY 4.0
POS-tagging model: Stanza
Pretrained models for POS-tagging.
Model
Swedish
Dataset:
morph_stanza_eval.zip
2020-12-09 – 19.94 MB – CC BY 4.0
Dataset:
morph_stanza_full2.zip
2020-12-09 – 20.19 MB – CC BY 4.0
Dataset:
stanza_pretrain.zip
2025-02-20 – 91.7 MB – CC BY 4.0
Pretrained embeddings
A list of pretrained embeddings for Swedish
Model
Swedish
sbx/KB-bert-base-swedish-cased_PI-detection-basic
A model based on KB/bert-base-swedish-cased, trained to detect personal information, particularly in student essays.
Model
Swedish
Dataset:
KB-bert-base-swedish-cased_PI-detection-basic
109.41 KB – GPL-3.0
sbx/KB-bert-base-swedish-cased_PI-detection-basic-iob
A model based on KB/bert-base-swedish-cased, trained to detect personal information, particularly in student essays.
Model
Swedish
Dataset:
KB-bert-base-swedish-cased_PI-detection-basic-iob
109.62 KB – GPL-3.0
sbx/KB-bert-base-swedish-cased_PI-detection-detailed
A model based on KB/bert-base-swedish-cased, trained to detect personal information, particularly in student essays.
Model
Swedish
Dataset:
KB-bert-base-swedish-cased_PI-detection-detailed
109.91 KB – GPL-3.0
sbx/KB-bert-base-swedish-cased_PI-detection-detailed-iob
A model based on KB/bert-base-swedish-cased, trained to detect personal information, particularly in student essays.
Model
Swedish
Dataset:
KB-bert-base-swedish-cased_PI-detection-detailed-iob
110.07 KB – GPL-3.0
sbx/KB-bert-base-swedish-cased_PI-detection-general
A model based on KB/bert-base-swedish-cased, trained to detect personal information, particularly in student essays.
Model
Swedish
Dataset:
KB-bert-base-swedish-cased_PI-detection-general
109.58 KB – GPL-3.0
sbx/KB-bert-base-swedish-cased_PI-detection-general-iob
A model based on KB/bert-base-swedish-cased, trained to detect personal information, particularly in student essays.
Model
Swedish
Dataset:
KB-bert-base-swedish-cased_PI-detection-general-iob
109.83 KB – GPL-3.0
Swedish Diachronic Word Embeddings
Swedish Diachronic Word Embedding Models Trained on Historical Newspaper Data
Model
Swedish
Dataset:
HENGCHEN-TAHMASEBI_-_2020_-_Kubhist2_diachronic_embeddings.zip
2024-01-25 – 15.13 GB – CC BY 4.0
Word Embeddings trained on English Wikipedia
Word Embeddings trained on English Wikipedia
Model
English
Dataset:
wiki_300_5_word2vec.model
2024-01-25 – 112.01 MB – CC BY 4.0
Dataset:
wiki_300_5_word2vec.model.syn1neg.npy
2024-01-25 – 3.75 GB – CC BY 4.0
Dataset:
wiki_300_5_word2vec.model.wv.vectors.npy
2024-01-25 – 3.75 GB – CC BY 4.0
Dataset:
wiki_300_50_word2vec.model
2024-01-25 – 28.04 MB – CC BY 4.0
Dataset:
wiki_300_50_word2vec.model.syn1neg.npy
2024-01-25 – 949.26 MB – CC BY 4.0
Dataset:
wiki_300_50_word2vec.model.wv.vectors.npy
2024-01-25 – 949.26 MB – CC BY 4.0