Background
This project (2009-2011, http://su.avedas.com/converis/contract/321) aimed to develop language-learning word cards containing a language's most frequent words, aligned with the Common European Framework of Reference (CEFR). The cards were developed as complementary learning material for nine languages important for trade and associated countries, both LWUTL (Swedish, Norwegian, Greek, Polish) and MWUTL (Arabic, English, Chinese, Russian and Italian).
Download Kelly lists here (Arabic, Chinese, English, Greek, Italian, Norwegian, Russian, Swedish).
Project description
Word cards are "focused, efficient and certain" (Nation) because they stimulate the necessary mental effort. A native word is on one side of the card and the target word on the other. Before turning the card over, you are stimulated to think, and it is when you think that you learn. Research shows that you can learn 30-100 words in an hour, which is far more efficient than conventional learning methods.
The lists of most frequent words have been compiled in cooperation with lexical computational centers in partner countries and validated through translation. "If you know the most frequent 1,500 words in a language, then you understand 75 percent of a normal text." (Nation). "...an immediate priority ... crucial for learners to master these high-frequency words" (Schmitt). "The pedagogical advantage is that it focuses the attention on the tasks to come, enhancing motivation" (Nunan).
Swedish Kelly-list
The Swedish Kelly-list is a freely available frequency-based vocabulary list covering the general-purpose vocabulary of modern Swedish. The list was generated from a large web-acquired corpus (SweWAC) of 114 million words dating from the 2010s. It is adapted to the needs of language learners and contains the 8 425 most frequent lemmas, which cover 80% of SweWAC.
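The coverage figure can be read as a cumulative share of corpus tokens. The following is a minimal sketch of that idea only; the function name and the toy counts are invented for illustration and are not taken from SweWAC or from the actual list generation process.

```python
from itertools import accumulate


def lemmas_needed_for_coverage(frequencies, target=0.80):
    """Return how many of the most frequent lemmas are needed so that
    their summed frequency reaches `target` of all corpus tokens."""
    ranked = sorted(frequencies, reverse=True)
    total = sum(ranked)
    for count, running in enumerate(accumulate(ranked), start=1):
        if running >= target * total:
            return count
    return len(ranked)


# Toy lemma counts (hypothetical, for illustration only).
toy_counts = [50, 20, 15, 8, 4, 2, 1]
needed = lemmas_needed_for_coverage(toy_counts)
print(f"{needed} of {len(toy_counts)} lemmas cover 80% of the toy corpus")
```

In this toy corpus a few high-frequency lemmas account for most of the tokens, which is the same effect that lets a list of 8 425 lemmas cover 80% of a 114-million-word corpus.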
Because of the way it is compiled, the Swedish Kelly-list is a reliable resource for suggesting lexical syllabi for CEFR-based courses in Swedish, for evaluating whether texts are appropriate for learners at different CEFR levels, for compiling course books, creating vocabulary exercises and tests, compiling dictionaries, and for a number of other language learning purposes and NLP applications. The list can be used by language learners and teachers, test creators, lexicographers, comparative linguists, corpus linguists, computational linguists, and many other user groups.
Each headword in the Swedish Kelly-list comes with the following information (see also the table below):
- id/running number (i.e. relative placement in the frequency band);
- raw frequency (RF);
- relative frequency, i.e. “word-per-million” (WPM);
- CEFR level (A1, A2, B1, B2, C1, C2);
- source of lemma (indication whether the headword comes from SweWAC, from translation list (T2) or has been manually added);
- grammar information, i.e. article or infinitive marker;
- lemma, sometimes provided together with its spelling/stylistic variant;
- word class;
- comments/examples for some of the headwords
Example of an item in the Swedish Kelly-list
ID | 88
Raw Freq | 2 624 032
Word per Million | 23 017,26
CEFR level | A1
Source | SweWaC
Grammar marker | att
Item | vara (vardagl. va)
POS | verb
Example | e.g. var så god!
The information should be read in the following way: the verb “att vara” (Eng. “to be, to last”) has a colloquial variant “va”; it can be used in the phrase “var så god!” (Eng. “here you go!”); it has rank 88 in the list and thus belongs to the language’s top 100 words. It occurs 2 624 032 times in SweWAC (RF), which corresponds to 23 017,26 words per million (WPM). The item belongs to the most important vocabulary for language learners and should be learnt at CEFR level A1.
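The relation between raw frequency and WPM can be checked with a short calculation. The sketch below is illustrative only: the class and function names are hypothetical, the field names are not the column names of the downloadable files, and the corpus size is assumed to be the rounded figure of 114 million words, so the recomputed value is close to the listed one but not identical.

```python
from dataclasses import dataclass


@dataclass
class KellyEntry:
    """One headword row; field names are illustrative, not the list's own."""
    rank: int
    raw_freq: int
    wpm: float
    cefr: str
    source: str
    grammar_marker: str
    item: str
    pos: str
    example: str


def parse_swedish_number(text: str) -> float:
    """Parse numbers written with spaces as thousands separators
    and a comma as the decimal mark, e.g. '23 017,26'."""
    cleaned = text.replace("\u00a0", "").replace(" ", "").replace(",", ".")
    return float(cleaned)


def words_per_million(raw_freq: int, corpus_size: int) -> float:
    """Relative frequency: occurrences per one million corpus words."""
    return raw_freq / corpus_size * 1_000_000


entry = KellyEntry(
    rank=88,
    raw_freq=int(parse_swedish_number("2 624 032")),
    wpm=parse_swedish_number("23 017,26"),
    cefr="A1",
    source="SweWaC",
    grammar_marker="att",
    item="vara (vardagl. va)",
    pos="verb",
    example="var så god!",
)

# Assumption: corpus size taken as the rounded 114 million words.
ASSUMED_CORPUS_SIZE = 114_000_000
estimate = words_per_million(entry.raw_freq, ASSUMED_CORPUS_SIZE)
print(f"listed WPM: {entry.wpm:.2f}, recomputed WPM: {estimate:.2f}")
```

The small gap between the two values comes from using the rounded corpus size; the exact token count of SweWAC is not given here.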
Further information on the list generation process is provided in the publications on Kelly (see "Publications" below). You can search the full database here: http://kelly.sketchengine.co.uk/
Institutions / organisations
- Adam Mickiewicz University, Poland
- Cambridge Lexicography and Language Services, UK
- Consiglio Nazionale delle Ricerche, Italy
- Institute for Language and Speech Processing/R.C. “Athena”, Greece
- Keewords, Sweden
- Lexical Computing Ltd, UK
- University of Gothenburg, Sweden
- University of Leeds, UK
- University of Oslo, Norway
- University of Stockholm, Sweden (coordinating partner)
Available/downloadable Kelly-products
The Swedish Kelly-list is a freely available electronic resource distributed under the CC-BY-SA 3.0, LGPL 3.0 license agreement. If you use the Swedish Kelly-list, you are encouraged to cite one of the articles describing it.
- A human-friendly Excel file with the Swedish Kelly-list can be downloaded here.
- The machine-friendly version can be downloaded here.
As a side effect, a number of other products were created during the KELLY project, including:
- the Kelly Database, where a word in any of the partner languages can be entered and its translations into the other partner languages are shown, if the item is present in the database: http://kelly.sketchengine.co.uk/
- lists of universal vocabulary for 9 and 8 languages, i.e. items that can be used multidirectionally as translation lexica
- list of unique Swedish items, i.e. items never used by translators from other languages into Swedish (can be downloaded here)
Post-project plans
We plan to expand and exploit the Swedish Kelly-list further, among other things by creating a dynamic lexical database in which lists of domain words can be selected and corpus examples and translation equivalents can be added. By linking this resource to other lexicons available through the Swedish Language Bank, we can obtain morphological analyses of the headwords, their monolingual definitions, and a number of other interesting options. Test item generation and lexical analysis of text complexity are also among the planned uses of the list.
The project will be evaluated while in use by students in upper secondary schools and adult education. Results will be promoted on a webpage, in the media, and at conferences and seminars.
Publications
You are welcome to contact Sofie Kokkinakis or Elena Volodina if you have any questions.
- Johansson Kokkinakis, S. & Volodina, E. (2011). Corpus-based approaches for the creation of a frequency based vocabulary list in the EU project KELLY – issues on reliability, validity and coverage. eLex 2011, Slovenia. http://www.trojina.si/elex2011/Vsebine/proceedings/eLex2011-16.pdf [pdf]
- Kilgarriff, A., Charalabopoulou, F., Gavrilidou, M., Bondi Johannessen, J., Khalil, S., Johansson Kokkinakis, S., Lew, R., Sharoff, S., Vadlapudi, R. & Volodina, E. (2014). Corpus-Based Vocabulary lists for Language Learners for Nine Languages. Language Resources and Evaluation Journal 48.1: 121-163. Springer, Netherlands. http://dx.doi.org/10.1007/s10579-013-9251-2 [Download pdf through open access]
- Volodina, E. & Johansson Kokkinakis, S. (2012). Introducing Swedish Kelly-list, a new lexical e-resource for Swedish. LREC 2012, Turkey. [pdf]
- Volodina, E. & Johansson Kokkinakis, S. (2012). Swedish Kelly: Technical Report. GU-ISS-2012-01. The Swedish Language Bank, Gothenburg University. [pdf]
- Charalabopoulou, F., Gavrilidou, M., Johansson Kokkinakis, S. & Volodina, E. (2012). Building corpus-informed word lists for L2 vocabulary learning in nine languages. EuroCALL 2012 Proceedings, Gothenburg. [pdf]