Statistical Learning for Structured Prediction with Applications to Natural Language Processing
Instructor: Ivan Titov
Time: Friday, 2.15 - 3.45 pm (may be changed later)
Location: Building C 7.2, room 2.11 (may be changed later)
Office hours: send me a message by e-mail.
Note: First class is on October 28
The class will cover machine learning methods for structured prediction problems. The main focus will be on problems from natural language processing, but most of the methods covered also have applications in other domains (e.g., bioinformatics, vision, information retrieval).
Structured prediction problems are classification problems where the classifier predicts not a binary/multiclass label but rather an element of some structured space. Examples of structured problems include sequence labeling problems, segmentation problems, parsing (syntactic or semantic in NLP or, e.g., image parsing in vision), and many others.
In the class we will cover most of the state-of-the-art methods for this class of problems, ranging from hidden Markov models, the structured perceptron, and conditional random fields to more advanced techniques such as structured SVMs, SEARN and others.
Though most of the papers considered will be from the NLP domain, I do not require any prior exposure to NLP (though it would be a plus). Ideally, I expect that you have some prior experience with machine learning, statistical NLP or IR. If in doubt, feel free to contact me and ask.
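To give a concrete flavor of the methods listed above, here is a minimal sketch of the structured perceptron for sequence labeling. The tag set, feature templates, and toy data are purely illustrative (not from the course materials), and the exhaustive argmax over tag sequences stands in for the Viterbi dynamic program used in practice:

```python
# Toy structured perceptron for sequence labeling (illustrative sketch).
# Features: emission (word, tag) and transition (prev_tag, tag) indicators.
from collections import defaultdict
from itertools import product

TAGS = ["DET", "NOUN", "VERB"]  # illustrative tag set

def features(words, tags):
    """Global feature counts for a (sentence, tag sequence) pair."""
    f = defaultdict(int)
    prev = "<s>"
    for w, t in zip(words, tags):
        f[("emit", w, t)] += 1
        f[("trans", prev, t)] += 1
        prev = t
    return f

def score(weights, words, tags):
    return sum(weights[k] * v for k, v in features(words, tags).items())

def decode(weights, words):
    """Exhaustive argmax over tag sequences (fine for toy sentences;
    real implementations use Viterbi dynamic programming)."""
    return max(product(TAGS, repeat=len(words)),
               key=lambda tags: score(weights, words, tags))

def train(data, epochs=10):
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = decode(weights, words)
            if list(pred) != list(gold):
                # Perceptron update: reward gold features,
                # penalize features of the wrong prediction.
                for k, v in features(words, gold).items():
                    weights[k] += v
                for k, v in features(words, pred).items():
                    weights[k] -= v
    return weights

data = [(["the", "dog", "barks"], ["DET", "NOUN", "VERB"]),
        (["a", "cat", "sleeps"], ["DET", "NOUN", "VERB"])]
w = train(data)
print(decode(w, ["the", "cat", "barks"]))  # → ('DET', 'NOUN', 'VERB')
```

The update rule is the whole algorithm: whenever the highest-scoring tag sequence differs from the gold one, the weights of the gold features are increased and those of the predicted features decreased. The more advanced methods in the course (CRFs, structured SVMs) keep this factorized scoring but change the training objective.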
- Present a paper to the class (30 - 45 minutes long presentation)
- Write 2 critical reviews (surveys) on two selected topics (1 - 2 pages each)
- Write a term paper (12 - 15 pages) (If you are registered for 4 points you do not need to write the term paper)
- Read papers before the talks and participate in discussion
- Class participation grade: 60 %
- Your talk and discussion after the talk
- Participation in discussion of other papers
- 2 reviews (5% each)
- Term paper grade: 40 %
- Only if you are registered for 7 points; otherwise class participation constitutes 100 %
You can skip ONE class without giving any explanation to me (if it is not the class on which you are presenting).
If you need to skip more, you will need to write an additional critical review for every paper presented while you were absent.
- Present the chosen paper in an accessible way
- Present sufficient background; do not expect the audience to know much about Machine Learning or Natural Language Processing beyond the material already covered in the class (according to the surveys, a good number of attendees have no ML background)
- Have a critical view on the paper: discuss shortcomings, possible future work, etc
- To give a good presentation, in most cases you will need to read one or two additional papers (e.g., those referenced in the paper)
- You should have a look at the material on how to give a good presentation compiled by Alexander Koller
- The language for talks and discussions will be English
- We are planning to have 40-minute presentations; on some days we may decide to have 2 presentations. This will become clearer when we know how many students are attending
- Send me your slides (preferably in PDF) 4 days before the talk by 6 pm (the first 2 presenters can send me slides 2 days before the talk)
- If we keep the class on Friday, this means the deadline is Monday, 6 pm
- I will give my feedback 2 days before the seminar (on Wed)
- A short critical (!) essay reviewing one of the papers in the list
- One or two paragraphs presenting the essence of the paper
- Other parts highlighting both the strengths of the paper (what you like) and its shortcomings
- You need to submit 2 reviews. There will be up to 3 reviewers for each presentation.
- The review should be submitted (by email, in PDF) before the presentation of the paper in class. (The exception is additional reviews submitted for classes you missed: such a review should be submitted within 2 weeks of the corresponding class and before the end of the term.)
- No copy-pasting from the paper; it should be entirely in your own words.
- Length: 1 - 1.5 pages each
- Choose a sub-topic covered in class, usually closely related to the paper you presented (unless it happens that you are more interested in something else)
- Do additional reading -- this would normally require reading 6-8 additional papers. You can ask me, or do some searching yourself and then discuss your choice with me
- The paper may be either an insightful survey of the research on this topic or a presentation of some novel ideas. In both cases you need to agree on the topic with me.
- This should be written in the style of a research paper; the only difference is that most of the work you present here is not your own
- Your ideas, analysis, comparison
- It should be written in English
- Example structure (can be different)
- Introduction and motivation for the problem studied
- Detailed survey of work and methods
- Any ideas on improvement of the approaches
- Any alternative interpretation or analysis
- If your paper contains some original ideas, it is OK to have a shorter survey part, but if the survey is the main contribution -- then it should not be too shallow.
- Paper organization
- Technical correctness
- Style (written in research style without inappropriate speculations, correct citations, etc)
- Your ideas are meaningful and interesting
- Amount of reading is adequate
Length: 12 - 15 pages
Deadline: to be announced. I would recommend submitting it soon after your presentation, as it will probably be easier then.
Submit it in PDF to my email address
Topics (some changes possible)
Note: References to papers, dates, and speakers are provided in the Google Doc (a link was sent to attendees)
Past seminars (for information about future seminars see the Google Doc)
- Introduction to structured prediction: problems, settings, etc. (given by Ivan)
- Hidden Markov models (Ivan)
- Structured perceptron (Ivan)
- Local models: Maximum entropy Markov models
- Conditional random fields (sequence labeling / segmentation settings)
- SVM: binary, multilabel and structured settings (SVM-Struct)
- Maximum margin Markov networks (M3Ns)
- Combining learning and search: SEARN and predecessors
- Parsing: weighted context-free grammars (CFGs): generative vs discriminative training
- Parsing: transition-based vs global models (in dependency parsing context)
- Parsing: CFGs with latent annotation
- Learning with latent representations of the context
- Semi-supervised methods for structured prediction