Natural Language Processing
http://cs.jhu.edu/~jason/465
Course catalog entry: This course is an in-depth overview of techniques for processing human language. How should linguistic structure and meaning be represented? What algorithms can recover them from text? And crucially, how can we build statistical models to choose among the many legal answers?
The course covers methods for trees (parsing and semantic interpretation), sequences (finite-state transduction such as tagging and morphology), and words (sense and phrase induction), with applications to practical engineering tasks such as information retrieval and extraction, text classification, part-of-speech tagging, speech recognition, and machine translation. There are a number of structured but challenging programming assignments. Prerequisite: 600.226 or equivalent. [Applications, 3 credits]
Course objectives: Welcome! This course is designed to introduce you to some of the problems and solutions of NLP, and their relation to linguistics and statistics. You need to know how to program (e.g., 600.120) and use common data structures (600.226). It might also be nice—though it's not required—to have some previous familiarity with automata (600.271) and probabilities (600.475, 550.420, or 550.310). At the end you should agree (I hope!) that language is subtle and interesting, feel some ownership over some of NLP's formal and statistical techniques, and be able to understand research papers in the field.
Lectures: MWF 3-4 or 3-4:15, Maryland 109
Prof: Jason Eisner
TA: Dingquan Wang
CA: Roger Que
Office hrs:
For Prof: after class until 4:30, or by appointment in Hackerman 324C
For TA: Tue 9:30-10:30, Fri 10-1 in Hackerman 322, or by appointment in Hackerman 321
For CA: TBA
Discussion session: optional TA-led session for activities, discussion, questions, and review (schedule TBA)
Discussion site: http://piazza.com/jhu/fall2014/600465 (public questions, discussion, announcements)
Web page: http://cs.jhu.edu/~jason/465
Textbook:
Jurafsky & Martin, 2nd ed. (semi-required; P98.J87 2009 in Science Ref section on C-Level)
Roark & Sproat (recommended; P98.R63 2007 in same section)
Manning & Schütze (recommended; a free online PDF version is available)
Policies:
Grading: homework 50%, participation 5%, midterm 15%, final 30%
Submission: TBA
Lateness: floating late days policy
Honesty: CS integrity code, JHU undergraduate policies, JHU graduate policies
Intellectual engagement: much encouraged
Disabilities: If you need accommodations for a disability, obtain a letter from Student Disability Services, 385 Garland, (410) 516-4720.
Announcements: Read the mailing list and this page!
This class is in the "flexible time slot" MWF 3-4:30. Please keep the entire slot open. Class will usually run 3-4, followed by office hours in the classroom from 4-4:30 (stick around to get your money's worth). However, class will sometimes run till 4:15 in order to keep up with the syllabus. I'll try to give advance notice of these "long classes," which among other things make up for no-class days when I'm out of town.
We'll also schedule a once-per-week discussion session led by your TA. This optional session will focus on solving problems together. That's meant as an efficient and cooperative way to study for an hour: it reinforces the past week's class material without adding to your homework load. Also, if you come to discussion session as recommended, you won't be startled by the exam style — the discussion problems are taken from past exams and are generally interesting.
Warning: The schedule below may change. Links to future lectures and assignments may also change (they currently point to last year's versions).
Warning: I sometimes turn off the PDF links when they are not up to date with the PPT links. If they don't work, just click on "ppt" instead.
Week | Monday | Wednesday | Friday | Suggested Reading
8/25 | Introduction (ppt) | | |
9/1 | No class (Labor Day) | Assignment 1 given: Designing CFGs; Chomsky hierarchy (ppt) | Language models (ppt) |
9/8 | Probability concepts (ppt; video lecture) | Bayes' Theorem (ppt); Smoothing n-grams (ppt) | Assignment 2 given: Probabilities; Smoothing continued |
9/15 | (& another sign meant 3 ... ?) | Assignment 3 given: Language Models; Context-free parsing (ppt) | Assignment 2 due; Context-free parsing |
9/22 | Quick in-class quiz: Log-linear models; Earley's algorithm (ppt) | No class (Rosh Hashanah) | Extending CFG (summary; ppt) |
9/29 | Probabilistic parsing (ppt) | Assignment 4 given: Parsing; Parsing tricks (ppt) | Assignment 3 due; Human sentence processing (ppt) |
10/6 | Semantics (ppt) | Semantics continued; Assignment 5 given: Semantics | Semantics continued |
10/13 (Fri class meets on Thu this week) | Midterm exam (3-4:30 in classroom) | Forward-backward algorithm (ppt; Excel spreadsheet; Viterbi version; lesson plan; video lecture) | Forward-backward continued |
10/20 | Assignment 4 due; Assignment 6 given: Hidden Markov Models; Expectation Maximization (ppt) | Finite-state algebra (ppt) | Finite-state machines |
10/27 | No class? (prof away) | No class? (prof away) | Assignment 5 due; Finite-state implementation (ppt) |
11/3 | Assignment 7 given: Finite-State Modeling; Finite-state tagging (ppt) | Noisy channels and FSTs (ppt) | More FST examples (ppt) |
11/10 | Assignment 6 due; Programming with regexps (ppt) | | Optimal paths in graphs |
11/17 | Structured prediction (ppt) | Current NLP tasks and competitions (ppt) | Applied NLP continued (ppt) | Explore links in "NLP tasks" slides
11/24 | No class (Thanksgiving break) | No class (Thanksgiving break) | No class (Thanksgiving break) |
12/1 | Applied NLP continued (ppt) | Topic models | Assignment 7 due; Machine translation |
12/8 | Thu 12/11 is the absolute deadline for late assignments. Final exam: Tue 12/16, 2-5pm.