When it comes to L2 infrastructure, there are three major challenges: the availability of data, the need for coordination, and the availability of methods for processing L2 data. These challenges stem largely from the following:
(1) L2 learner data, such as essays, is non-trivial to collect, since it is not available online for download as is. Collecting it requires good contacts with teachers and assessors and, through them, with learners or their parents, who have to be convinced to sign permits for its use. The data is inherently sensitive and often contains personal details that need to be anonymized.
(2) So far, research on learner data has been carried out in different fields, including linguistics, computational linguistics, and second language acquisition, in a rather uncoordinated fashion: from different points of view and with different purposes and methods. There has been little dialogue or coordination within or between the fields. Scattered individual efforts to collect L2 learner data such as essays, exercise logs and oral transcripts have been driven by the purposes of individual projects, which has influenced the type of learner metadata, permits, data formats, databases and search tools. As a result, data collected in one project often cannot be compared to or complemented with data collected in another. In some cases, the type of permit even prohibits the data from being reused in new projects.
(3) Automatic annotation of L2 data is problematic because of the large number of deviations from normative Swedish. Existing computational linguistics methods for text processing were developed with normative language in mind and cannot be applied to L2 texts in their current form. Annotating learner data manually, however, is extremely time-consuming. To cater for the grammatical and orthographical infelicities in L2 texts, and to make annotation of L2 data more time-efficient, computational linguistics methods need to be adapted to the challenges posed by interlanguage (see e.g. Hawkins and Buttery (2010), Rosen et al. (2014)).
For further details and publications, see the SweLL page, available in English and in Swedish.