Introduction
The course is part of the Language Technology Master Programme.
News
For old news, see below.
 The lecture on January 19 is canceled due to the weather situation.
 The course will start on January 21 in room L308 on LTgatan.
Teachers
 Course responsible: Richard Johansson
 Labs: Mehdi Ghanimifard
 Lecturer (machine translation): Prasanth Kolachina
Course Literature
 Manning, C. D. & Schütze, H. (1999) Foundations of Statistical Natural Language Processing. MIT Press.
 Krenn, B. & Samuelsson, C. (1997) The Linguist's Guide to Statistics.
 Mitchell, T. (2005) Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression. Extra chapter from the book Machine Learning, McGraw Hill, 1997.
 Lecture notes by Michael Collins.
 Blei, D. M. (2012) Probabilistic topic models. Communications of the ACM 55(4).
 Heinrich, G. (2008) Parameter estimation for text analysis. Technical note, University of Leipzig.
Links
 Reference documentation for scipy's statistical functions and random variables.
 Reference documentation for the plotting library.
 Table of several common probability distributions.
Mandatory assignments
 Computer exercise 1: Probability distributions (Deadline: January 30)
 Computer exercise 2: Estimation (Deadline: February 8)
 Programming assignment 1: Classification with Naive Bayes (Deadline: February 25)
 Programming assignment 2: Evaluation (Deadline: March 3)
 Programming assignment 3: Implementing a part-of-speech tagger (Deadline: March 13)
Optional assignments for VG
 VG assignment 1: Topic modeling with LDA (Deadline: March 28)
 VG assignment 2: Machine translation (Deadline: March 28) (code)
Schedule
Tuesday 10.15–12.00  Thursday 13.15–15.00  Friday 10.15–12.00

Week 3, Jan 18–22 
Lecture 1: Course introduction; data analysis; probability and randomness (RJ) (video part 1, part 2) L308 M&S: 1, 2.1.1–2.1.2; K&S: 1.2.1–1.2.4
Lecture 2: Random variables (RJ) (video part 1, part 2) L307 M&S: 2.1.3–2.1.9; K&S: 1.3.1–1.3.5, 1.5.1

Week 4, Jan 25–29  Lecture 3/lab exercise: Distributions (notes) (RJ) G212
Lecture 4: Estimation (RJ) (video) L308 M&S: 6.2.1; K&S: 1.7.1–1.7.3, 1.7.5

Week 5, Feb 1–5  Lab exercise: Estimation (notes) (RJ) G212
Lecture 5: Classification (RJ) (video part 1, part 2) L308 Mitchell, mainly 1–2; M&S, 7.1.1, 7.2, 16 except 16.2.1; Collins, 1–3 

Week 6, Feb 8–12  Lab assignment 1: classification (MG) G212 
Lab assignment 1, continued (RJ) G212 

Week 7, Feb 15–19  Lab assignment 2: evaluation (notes, video) (RJ, MG) G212
Lab assignment 2, continued (MG) G212

Week 8, Feb 22–26 
Lecture 7: tagging, bootstrapping (RJ) (video part 1, part 2) L308 M&S: 10, in particular 10.2–3, and optionally 9; Collins
Lab assignment 3: implementing a tagger (MG) G212

Week 9, Feb 28–Mar 4  Lab assignment 3, continued (MG) G212 
Lecture 8: unsupervised and semi-supervised methods (k-means) (RJ) (video part 1, part 2) L308 Collins, 5–6; M&S, 14.2; Blei, Heinrich

Week 10, Mar 7–11  Catch-up session (MG) G212
Lecture 9: statistical machine translation (PK) L308 Collins; M&S, 13 

Week 11, Mar 14–18  VG assignments / catch-up (MG/PK) G212; NB: 13:15–15:00
VG assignments / catch-up (MG/PK) G212