	Compiling a corpus of CEFR-related texts.
	This paper reports on initial efforts to compile a corpus of course book texts used to teach CEFR-based Swedish courses to adult immigrants. The research agenda behind the corpus is twofold. First, studying these normative "input" texts can reveal what is being taught in terms of explicit grammar, receptive vocabulary, and text and sentence readability. Second, it can provide insight into the linguistic characteristics of normative texts, which may help anticipate learner performance in classroom and testing settings, for example in active vocabulary and grammatical competence.
The CEFR "can-do" statements are known to be flexible enough to be interpreted for different languages and target groups. However, they are also nonspecific, which makes it difficult to determine which kinds of competence, and what level of accuracy, learners need at each CEFR level in order to perform the communicative tasks described. To address this problem, a systematic study needs to be carried out for each individual language, covering both normative "input" texts and learner-produced "output" texts. In this project we take the first step by collecting and studying normative texts for Swedish.
The article describes the process of corpus compilation, the annotation scheme for CEFR-relevant parameters, and the methods proposed for text analysis, namely statistical and empirical methods as well as techniques from computational linguistics and machine learning.
	Proceedings of the Language Testing and CEFR conference, Antwerpen, Belgium, May 27-29, 2013
	Volodina, Elena and Johansson Kokkinakis, Sofie
	2013