Fall 2011
Introduction
This course is a part of the Language Technology Master Programme.
Lecturers
- Course coordinator and main lecturer: Lars Borin (LB)
- Lexical resources: Markus Forsberg (MF)
- Speech resources: Jonas Lindh (JL)
Course literature
- Bird, Steven and Gary Simons 2003. Seven dimensions of portability for language documentation and description. Language 79: 557-582. Preprint version at: <http://www.language-archives.org/documents/portability.pdf>
- Borin, Lars 2009. Linguistic diversity in the information society. Proceedings of the SALTMIL 2009 workshop on Information Retrieval and Information Extraction for Less Resourced Languages. 1-7. Donostia: SALTMIL. <http://spraakbanken.gu.se/personal/lars/pblctns/saltmil-2009.pdf>
- Borin, Lars, Dana Dannélls, Markus Forsberg, Maria Toporowska Gronostaj and Dimitrios Kokkinakis 2010. The past meets the present in Swedish FrameNet++. 14th EURALEX International Congress, 269-281. Leeuwarden: EURALEX. <https://svn.spraakdata.gu.se/sb/fnplusplus/pub/SweFN_Euralex_extended.pdf>
- Borin, Lars, Markus Forsberg and Dimitrios Kokkinakis 2010. Diabase: Towards a diachronic BLARK in support of historical studies. Proceedings of LREC 2010. 35-42. Valletta: ELRA. <http://spraakbanken.gu.se/personal/lars/pblctns/lrec2010-diabase.pdf>
- Cieri, C., L. Corson, D. Graff and K. Walker 2007. Resources for new research directions in speaker recognition: The Mixer 3, 4 and 5 corpora. Proc. Interspeech2007, 950-954. <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.78.7500&rep=rep1&type=pdf>
- CLARIN deliverable D5C2: Language resources and tools survey and taxonomy and criteria for the quality assessment. <http://www-sk.let.uu.nl/u/D5C-2.pdf>
- CLARIN deliverable D5R3a: Linguistic processing chains as Web Services: Initial linguistic considerations. <http://www-sk.let.uu.nl/u/D5R-3a.pdf>
- Francopoulo G. et al. 2006. Lexical Markup Framework (LMF). LREC 2006 – 5th International Conference on Language Resources and Evaluation. Proceedings, 233-236. <http://lirics.loria.fr/doc_pub/LMFPaperForLREC2006FinalSubmission6March06.pdf>
- Giesbrecht, Eugenie and Stefan Evert 2009. Part-of-speech tagging - a solved task? An evaluation of POS taggers for the Web as corpus. In I. Alegria, I. Leturia, and S. Sharoff, editors, Proceedings of the 5th Web as Corpus Workshop (WAC5). San Sebastian, Spain. <http://purl.org/stefan.evert/PUB/GiesbrechtEvert2009_Tagging.pdf>
- Helgason, Pétur 2010. Speech databases and speech corpora. <http://www.anst.uu.se/pehel169/Computer_tools_course/Speech%20databases.pdf>
- Ide, Nancy and James Pustejovsky 2010. What does interoperability mean, anyway? Toward an operational definition of interoperability. Proceedings of the Second International Conference on Global Interoperability for Language Resources (ICGL 2010). Hong Kong. <http://www.cs.vassar.edu/~ide/papers/ICGL10.pdf>
- Ide, Nancy, James Pustejovsky, Nicoletta Calzolari and Claudia Soria 2009. The SILT and FlaReNet international collaboration for interoperability. Proceedings of the Third Linguistic Annotation Workshop, held in conjunction with ACL 2009, Singapore. <http://www.aclweb.org/anthology/W/W09/W09-3034.pdf>
- Ide, Nancy and Keith Suderman 2009. Bridging the gaps: Interoperability for GrAF, GATE, and UIMA. Proceedings of the Third Linguistic Annotation Workshop, held in conjunction with ACL 2009, Singapore. <http://www.aclweb.org/anthology/W/W09/W09-3004.pdf>
- Ide, Nancy and Yorick Wilks 2006. Making sense about sense. In E. Agirre and P. Edmonds (eds.), Word Sense Disambiguation: Algorithms and applications. Springer. <http://clara.uib.no/files/2011/06/ide.pdf>
- Jurafsky, Daniel and James H. Martin 2009. Speech and language processing, 2nd ed. Prentice-Hall.
- Kilgarriff, Adam 2007. Googleology is bad science. Computational Linguistics 33 (1): 147-151. <http://www.aclweb.org/anthology-new/J/J07/J07-1010.pdf>
- Krauwer, Steven 2003. The Basic Language Resource Kit (BLARK) as the first milestone for the language resources roadmap. Proceedings of SPECOM 2003. Moscow. <http://www.elsnet.org/dox/krauwer-specom2003.pdf>
- Lamel, L.F., R.H. Kassel and S. Seneff 1989. Speech database development: Design and analysis of the acoustic-phonetic corpus. Speech Input/Output Assessment and Speech Databases. <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.78.7500&rep=rep1&type=pdf>
- Sanderson, Mark, Martin Braschler and Nicola Ferro 2009. Best practices for test collection creation, evaluation methodologies and language processing technologies. TrebleCLEF deliverable D4.2. University of Sheffield. <www.trebleclef.eu/getfile.php?id=258>
- Wynne, Martin (ed.) 2005. Developing linguistic corpora: a guide to good practice. Oxford: AHDS. <http://icar.univ-lyon2.fr/ecole_thematique/contaci/documents/Baude/wynne.pdf>
Course requirements
There are three graded assignments, all of which must be completed to pass the course. The weight of each assignment in the final grade is shown in parentheses below.
- Take-home exam (40%): See attached instructions at the end of this web page; deadline for submission of answers: 31st October 2011
- Course paper (40%): See attached instructions at the end of this web page; deadline for selection of topic: 3rd October 2011, 12.00 noon; deadline for submission of written paper: 14th October 2011
- Course paper presentation (20%): TBA; oral presentations on the 18th and 19th October 2011
Schedule
All lectures will be held in room K333, Lennart Torstenssonsgatan 6.
Lecture slides will be added as attachments at the bottom of this web page after the lectures.
| Date and time (lecturer) | Topic (recordings) | Literature |
|---|---|---|
| 1. Tue 6th Sep 10.15-12 (LB) | Introduction; what are LTR; why do we need LTR (part1 part2) | Bird/Simons 2003; Borin 2009; Ide et al 2009 |
| 2. Thu 15th Sep 13.15-15 (LB) | Types of LTR; resource acquisition and creation; language corpora (part1 part2) | Borin 2009; Giesbrecht/Evert 2009; Kilgarriff 2007; CLARIN D5C2; Wynne 2005 |
| 3. Wed 21st Sep 10.15-12 (MF) | Lexical resources (part1 part2) | Jurafsky/Martin 2009, ch. 19; Borin et al. 2010; Francopoulo et al 2006; Ide/Wilks 2006 |
| 4. Thu 22nd Sep 13.15-15 (MF) | Lexical resources (part1 part2) | Jurafsky/Martin 2009, ch. 19; Borin et al. 2010; Francopoulo et al 2006; Ide/Wilks 2006 |
| 5. Tue 27th Sep 10.15-12 (LB) | Corpora, IR collections and other text resources (part1 part2) | Kilgarriff 2007; Sanderson et al 2009; Wynne 2005 |
| 6. Wed 28th Sep 10.15-12 (LB) | Tools for LTR; the BLARK; practical matters in connection with the course paper (recording) | Borin/Forsberg/Kokkinakis 2010; CLARIN D5R3a; Krauwer 2003 |
| 7. Thu 29th Sep 13.15-15 (JL) | Speech resources and tools (part1 part2) | Cieri et al. 2007; Helgason 2010; Lamel et al 1989 |
| Mon 3rd Oct 12.00 noon | Deadline for selection of course paper topic | |
| 8. Thu 6th Oct 13.15-15 (LB) | Standards and infrastructure for LTR (part1 part2) | Borin 2009; Ide/Pustejovsky 2010; Ide et al 2009; Ide/Suderman 2009 |
| Fri 14th Oct | Written course paper deadline | |
| 9. Tue 18th Oct 9.15-12 (LB) | Course paper presentations | |
| 10. Wed 19th Oct 13.15-17 (LB) | Course paper presentations | |
| Mon 31st Oct | Take-home exam deadline | |

| Attachment | Size |
|---|---|
| take-home-exam-2011.pdf | 66.17 KB |
| course-paper-2011.pdf | 51.67 KB |
| LR11-01-nup.pdf | 761.58 KB |
| LR11-02-nup.pdf | 289.14 KB |
| LR11-03-nup.pdf | 2.98 MB |
| LR11-04-nup.pdf | 268.68 KB |
| LR11-05-nup.pdf | 120.24 KB |
| LR11-06-nup.pdf | 3.22 MB |
