Natural Language Processing
Course Information
Instructor: Bonan Min
Office hours: TBD
Class meets: Monday 6:30PM-9:00PM
Discussion Forum & announcements: Canvas
Homework Submissions: TBD
Course Description
An enormous amount of text (news articles, weblogs, tweets) is created every day. Natural language processing transforms text into presumably useful data structures, enabling applications such as real-time event tracking and question answering. In this course, we will study the mathematics and algorithms behind NLP to better understand how they work. We will cover a wide range of text analysis methods, including word-level (topic and sentiment analysis), syntactic (grammars and parsing), semantic (meanings of words and phrases), and discourse-level (pronoun resolution and text structure) methods. We will cover both rule-based systems and statistical models. We will implement several algorithms, applying what we learn in hands-on projects. We will come away with a deeper understanding of how text is processed by a computer.
This course will cover several levels of text analysis and understanding, including word- and phrase-level analysis (document retrieval and text classification), syntactic analysis (grammars and parsing), semantic analysis (word and sentence meaning), and discourse analysis (pronoun resolution and text structure). Students will learn to use these techniques to solve different NLP problems, including part-of-speech tagging, language modeling, sentiment analysis, information extraction, (visual) question answering, machine translation, and text generation. While the main technologies will be introduced, we will focus more on machine learning methods, especially deep learning, for approaching these problems. In recent years, machine learning and deep learning have achieved very high performance on these tasks and have become the major tools for solving NLP problems.
Tentative Schedule
Dates | Topics | Supplementary Resources | Assignments/Projects |
---|---|---|---|
2/1 | Intro (Slides) | SLP2 Chap 1 | |
2/8 | Bag-of-Words Models (Slides), Evaluation Metrics (Slides), and Machine Learning Basics (Slides) | SLP2 23.1, 22.2.2, 12; SLP3 6 | Assignment #1 |
2/16 | Part-of-Speech Tagging (slides) and Sequential Labeling (HMM, MEMM, CRF and RNN) (slides) | SLP3 8, 9; http://www.cs.columbia.edu/~mcollins/fb.pdf | |
2/22 | Language Modeling (slides); WordNet (slides) | SLP3 3, 7, 19 | |
3/1 | Machine Learning tutorials (MLP, CNN, RNN for text classification), Word Embeddings (slides) | SLP3 6 | Assignment #2 |
3/8 | Syntax, Constituency Parsing (slides), Dependency Parsing (slides) | SLP3 12, 13, 14, 15 | |
3/15 | Contextualized Word Embeddings (slides) | | |
3/22 | Information Extraction Overview and Named Entity Recognition (slides), Coreference Resolution (slides) | SLP3 17, 21 | |
3/29 | Relation Extraction (slides (1); slides (2)) | | Assignment #3 |
4/5 | Event Extraction (slides), revisit NLP pipeline (slides) | | |
4/12 | Entity Linking (slides) | | |
4/19 | Patriots' Day observed (University Holiday), no classes | | |
4/26 | Machine Translation (slides) | | |
5/3 | Neural Machine Translation (slides), Information Retrieval (slides) | | |
5/10 | Final project presentation | | Project due |
Textbooks and supplementary materials
The primary textbook is Speech and Language Processing, 2nd Edition (SLP2), by Daniel Jurafsky and James H. Martin. A few chapters of the draft 3rd edition (SLP3) are available online. Whenever they are available, we highly encourage you to read the draft chapters in SLP3, since they introduce newer methods that have become standard in NLP. You are also encouraged to browse the ACL Anthology for up-to-date papers on NLP.
An excellent book for deep learning is Deep Learning, by Ian Goodfellow, Yoshua Bengio and Aaron Courville.
Evaluation (tentative)
Grades will be determined by the following measures:
Homework (50%). Three assignments (10-20% each). Assignments may be written or programming assignments, involving implementation of and experimentation with solutions to natural language processing problems. These are individual projects.
Project proposal (15%). Students will need to submit a proposal at least 4 weeks before the final project is due. The intention of the proposal is two-fold: (1) demonstrating understanding of an NLP task/problem, and (2) describing your plans for the final project. We highly encourage you to submit this proposal early and discuss the feasibility of your proposed final project with the instructor. You can do this yourself or team up with your classmates. The team size should be between 1 and 3.
Final project (35%). This may involve developing a solution to an existing problem, or defining a new problem and developing a (proof-of-concept) solution. Working on the project early is highly recommended. Good projects can lead to paper submissions at conferences afterward. You will need to submit the code and a written report by the end of the term. The deadline will fall within the final exam period of the term.
- A short project presentation will also be scheduled for each team during the final week of the term.
Assignments
- Assignment 1: TBD
- Assignment 2: files
- Assignment 3: files
- Final project proposal: TBD
- Final project presentation: TBD