CS 780/880 Topics / Machine Learning for Sequences and Text

Instructor: Prof. Laura Dietz, Fall 2020
When: TBD
Where: TBD
Office Hours: TBD in Kingsbury Hall N215C, other times by appointment

See course catalog entry and MyCourses for the latest information.

Note

This course will be managed on MyCourses and Piazza.

Overview

The class covers machine learning methods to extract information from sequential data, such as human-written text and DNA sequences. Famous examples is to tag texts with parts-of-speech, extract relations expressed in text, or locate genes and gene variations in DNA sequences.

The class teaches how to derive and implement one’s own machine learning algorithm—it will not teach how to apply black-box machine learning implementations. The goal is to understand the working principles of the strongest machine learning methods for sequences we have today, such as Transformer-based Neural Networks. We will start with simple models that form the fundamentals such as Naive Bayes, Language Models, and Logistic Regression, as advanced models build on them. The class will teach how to design the ML model, derive the optimization criterion and take its derivative to implement a gradient descent optimization algorithm. Next we discuss more complex models such as conditional random fields, which require to solve to the “decoding problem”— here students will learn how to use an optimization package. Finally we cover neural networks for sequence data, especially convolutional neural networks, recurrent neural networks such as LSTMS, and transformer-based models such as BERT and XLNet. Here students will learn how to implement Neural Networks using a library such as pyTorch. As a cross-cutting topic, the class will teach fundamentals of text processing, such as tokenization and TF-IDF, as well as famous tasks in natural language processing, such as part-of-speech tagging, dependency parsing, and relation extraction.

While many ideas also apply to time series data, this will not be a major focus of this class.

Catalog Entry

This course covers basic and advanced machine learning algorithms for learning from sequential data like text or genes.

This is a synchronous lecture that requires to participate in bi-weekly homework, quizzes, and code presentations throughout the semester.

The course is independent of other Machine Learning and Data Science classes. However these be used to complement and deepen the student’s machine learning skills.

CS 750/850 “Machine Learning”: Offers a conceptual introduction to machine learning in general.
CS 757/857 aka Math 757/857 “Mathematical Optimization for Applications”: offers a deeper experience on optimization algorithms and their mathematical backgrounds.
CS 953 “Data Science for Knowledge Graphs and Text”: Offers a follow-up research/implementation intensive experience. (Pre-requirement for CS 953 are either this class or CS 853)
CS 753/853 “Information Retrieval”: Focuses on the software system for web search and mostly covers basic machine learning algorithms such as Naive Bayes, Language Models, TF-IDF, and learning-to-rank.
CS 7xx/8xx “Computer Vision”: Neural networks for extracting information from images, such as face detection, motion estimation, image segmentation.
CS 730/830 “Artificial Intelligence”: Covers a broader range of intelligent algorithms, such as agents, planning and control, logical reasoning.
Math 738/838 “Data Mining and Predictive Analysis”: Teaches the application of “black-box” machine learning tools using a workbench.
Data 820 “Programming for Data Science”: Teaches the application of “black-box” machine learning tools using a python workbench.
Data 822 “Data Mining and Predictive Modeling”: Teaches the application of “black-box” machine learning tools using a python workbench.
IT 640 “Data Science and Analytics”: Teaches the application of “black-box” machine learning tools using a workbench.

Courses marked with CS7xx/8xx have not yet received their permanent course number, but are currently listed as CS780/880 different sections denote different courses.

Prerequisits

Data Structures (CS 515) or permission of instructor. Ability to independently write programs in either Python.

For students interested in taking the follow-up course CS953 “Data science for Knowledge Graphs and Text”, either this course or CS853 will satisfy the prereqs.

Grading Policy

Final class project (50% of final grade): Students choose an application to develop a machine learning algorithm with data loading and evaluation on a topic of their choice.
Weekly quizzes (40% of final grade): Short 10 minute quizzes at the beginning of class.
Final exam (10% of final grade): Written exam to test knowledge gained throughout the semester.
Bi-weekly homework assignments (written and programming) are mandatory but graded pass-fail-excellent scale. A pass or excellent needs to be obtained on at least four (out of five) homework assignments in order to be eligible for participation in the final exam and submission of a final project report. Students with two or more excellents obtain an upgrade to the next higher letter grade (e.g., B- to B, or B+ to A-).
Code reviews on programming homework/project: If the code review is missed or student is unable to explain their code, the homework/project grade will be reduced to “F”.

The same grading policy applies to both students taking the course as CS 7xx and CS 8xx. Of course, expectations for students taking the course for graduate credits under CS 8xx are higher.

Late homework and project report submissions will generally be excluded. Any missed activity due to medical or families emergencies requires supporting documentation through the dean’s office or SAS etc.

Academic Integrity

The instructor is strongly committed to upholding the standards of academic integrity. These standards, at the minimum, require that students never present the work of others as their own. Any dishonest behavior, once discovered, will be penalized according to the University’s Student Code of Conduct.

Mutual Expectations

Students are expected to:

be present in class (physically and mentally),
ask at least one question every session,
present homework solutions,
do their own work and contribute significantly in team activities,
study and repeat necessary class materials independently.

The instructor is expected to:

make lecture notes available before the class,
return graded homework in a timely manner,
be available for questions regarding class material during class, online, and if necessary by appointment,
notify students that are in danger of not meeting the class goals early on,
provide ungraded test exams (quizzes) for students’ self-assessment.

Note that is not sufficient to just be present in class and submit homeworks. Obtaining an A requires that you study and review materials from lecture notes, assignments, and discussions with the help of the book. If stuck, please see the instructor.

Textbooks

No textbook is available. An earlier edition of the class can be found online: https://gitlab.cs.unh.edu/mlseq/lecture-materials/tree/master/slides

Schedule

Note that this schedule is preliminary and will possibly change as the course progresses. Chapter references are based on the book Introduction to Information Retrieval (IIS).

Lecture 01: Introduction
Lecture 02: Probability Theory
Lecture 03: Logistic Regression
Lecture 04: Naive Bayes
Lecture 05: Undirected Graphical Models
Lecture 06: Conditional Random Fields
Lecture 07: Neural Networks Introduction and Backpropagation
Lecture 08: MAP inference
Lecture 09: LTSMs
Lecture 10: Attention and Neural Turing Machines
Lecture 11: Transformers and BERT

An earlier edition of a similar course was taught at Mannheim University.

Important Dates

Quizzes: TBP, see “MyCourses” Final exam: TBD, see “MyCourses”

ABET Learning Outcome Classification

1. Analyze a complex computing problem and to apply principles of computing and other relevant disciplines to identify solutions.

Ability to derive a customized training algorithm for machine learning method, such as Logistic Regression with gradient descent and coordinate descent
Understanding the difference between different optimization algorithms, coordinate descent, gradient descent, and Wolfe conditions.
Ability to derive HMM-style decoding for conditional random fields
Knowing the basics of text processing, such as tokenization, TF-IDF, pre-trained word embeddings
Knowing important natural language processing tasks such as part-of-speech tagging, dependency parsing, coreference and relation extraction.

2. Design, implement, and evaluate a computing-based solution to meet a given set of computing requirements in the context of the program’s discipline.

Implement famous machine learning methods from scratch, such as Naive Bayes and Logistic Regression.
Integrate an optimization toolkit to in student’s training code for conditional random fields
Use a neural network toolkit to implement a custom sequence tagger with LSTMs and/or Transformers

6. Apply computer science theory and software development fundamentals to produce computing-based solutions.

Identification of a application domain for algorithms discussed in class
Implementation of a prototype for the application domain
Evaluation of prediction quality in the application domain
Release of the prototype with proper distribution management, installation and usage instructions.

ABET Curriculum Classification

4. Substantial coverage of a) algorithms and complexity, b) computer science theory, c) concepts of programming languages, and d) software development.

Covered by:

Algorithms for Machine Learning and NLP

Naive Bayes and Language Models
Logistic regression
Hidden Markov Model decoding
Conditional Random Fields
Neural networks
LSTMs and Transformer Models

Theory

Optimization algorithms with gradient descent
Formal probabilistic models with inference

Programming

Implementing gradient descent algorithms for logistic regression and Naive Bayes
Using optimization libraries to implement conditional random fields
Using neural network libraries for training

Class project on Domain-specific Application

Customizing and implementing a machine learning algorithm for a text problem of the student’s choice-
Using third-party libraries
Writing test code
Using git repositories as source code control
Writing installation and usage instructions for release management

Laura Dietz