CS 752/852

**Course Title:**

Foundations of Neural Networks

**Catalog Description**

Neural networks are a class of machine learning models that have recently revolutionized many applied machine learning domains such as natural language understanding, image/video processing, bioinformatics, and time series analysis. This course teaches students to develop new neural network architectures from scratch and to customize them. The course covers all necessary foundations of neural networks, including gradient descent optimization and vector calculus. Students will learn how to design models using idioms such as observed variables, latent variables, gate variables, and different functions, as well as a wide range of state-of-the-art architectures as design examples.

Prerequisite: CS 515

**Proposing Instructor**

Laura Dietz

**Overview**

An earlier edition of this course was taught in Fall 2020 as CS780 “Machine Learning for Sequences”.

The topic is neural networks (also known as deep learning), a class of machine learning models that have recently revolutionized many applied machine learning domains such as natural language understanding, image/video processing, bioinformatics, and time series analysis.

The class teaches how to derive and implement one’s own neural network model from scratch; the class will not focus on applying existing black-box implementations to data. The course will teach theoretical foundations of parameter learning with gradient descent, tensor-based computation, and the development of PyTorch models. The goal is to understand the working principles of the strongest machine learning methods for sequences we have today, such as Transformer-based neural networks and Graph Attention Networks.

The class will start with simple models that form the fundamentals, such as Naive Bayes, language models, and logistic regression, as advanced models build on them. The class will teach how to design the ML model, derive the optimization criterion, and take its derivative to implement a gradient descent optimization algorithm. Next, we discuss more complex models such as conditional random fields, which require solving the “decoding problem”; here students will learn how to use an optimization library. Finally, we cover neural networks for sequence data, especially convolutional neural networks, recurrent neural networks such as LSTMs, and transformer-based models such as BERT and GPT-3. Here students will learn how to implement neural networks using a library such as PyTorch. As a cross-cutting example, the class will teach fundamentals of natural language processing (NLP), such as part-of-speech tagging, named entity recognition, grammatical dependency parsing, and relation extraction. The course will touch on other application domains and permits students to select their project from any domain of their choice.
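As a small taste of this derive-then-implement workflow, the sketch below fits logistic regression on a hypothetical toy dataset using hand-derived gradients in plain Python. The data, learning rate, and helper names are illustrative only; in class, larger models will be built with PyTorch.

```python
import math

# Hypothetical toy 1-D dataset: inputs with binary labels, separable around x = 0.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, lr=0.5, steps=500):
    """Fit weight w and bias b by gradient descent on the negative log-likelihood."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Hand-derived gradient of the NLL: sum over examples of (p_i - y_i) * x_i,
        # and (p_i - y_i) * 1 for the bias.
        gw = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys))
        gb = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys))
        w -= lr * gw
        b -= lr * gb
    return w, b

w, b = train(xs, ys)
preds = [1 if sigmoid(w * x + b) > 0.5 else 0 for x in xs]
```

On this separable toy data, the learned weight is positive and the predictions match the labels, which is the kind of sanity check students will run on their own from-scratch implementations.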

The course is independent of other Machine Learning and Data Science classes. However, these can be used to complement and deepen the student’s machine learning skills.

In addition to CS 515 (Data Structures), which is a prerequisite for this class, students must also be able to independently write programs in Python. Knowledge of function differentiation and a basic understanding of vectors are helpful.

For students interested in taking the follow-up course CS 953 “Data Science for Knowledge Graphs and Text”, either this course or CS 853 will satisfy the prerequisites for CS 953.

**Related courses at UNH**


COMP 750 / DATA 750 - “Neural Networks”: Practical course on how to use existing neural network architectures. (The course is currently not offered.)

CS 750/850 “Machine Learning”: Offers a conceptual introduction to machine learning in general.

CS 757/857 aka Math 757/857 “Mathematical Optimization for Applications”: Offers a deeper experience with optimization algorithms and their mathematical backgrounds.

CS 953 “Data Science for Knowledge Graphs and Text”: Offers a follow-up research/implementation-intensive experience. (The prerequisite for CS 953 is either this class or CS 853.)

CS 753/853 “Information Retrieval”: Focuses on the software system for web search and mostly covers basic machine learning algorithms such as Naive Bayes, Language Models, TF-IDF, and learning-to-rank.

CS 755/855 “Computer Vision”: Neural networks for extracting information from images, such as face detection, motion estimation, image segmentation.

CS 730/830 “Artificial Intelligence”: Covers a broader range of intelligent algorithms, such as agents, planning and control, logical reasoning.

Math 738/838 “Data Mining and Predictive Analysis”: Teaches the application of “black-box” machine learning tools using a workbench.

Data 820 and Data 822 “Programming for Data Science”: Teaches the application of “black-box” machine learning tools using a Python workbench.

IT 630 “Data Science and Analytics”: Teaches the application of “black-box” machine learning tools using a workbench.

DATA 674 and DATA 675 - “Predictive and Prescriptive Analytics”: Focuses on basic machine learning methods and machine learning principles such as linear models, k-means, cross-validation, Monte Carlo methods, and time series.

**Attribute**

CS elective with a minimum passing grade of D-.

**Meeting Hours**

Two 1h 20min sessions per week plus occasional attendance of an office hour.

**Grading Policy**

Final class project (45% of final grade): Students develop a machine learning algorithm, including data loading and evaluation, for an application topic of their choice.

Final exam (10% of final grade): Written exam to test knowledge gained throughout the semester.

Bi-weekly written assignments (20% of final grade)

Bi-weekly programming assignments (25% of final grade)

Code reviews on programming homework/projects: If the code review is missed or the student is unable to explain their code, the homework/project grade will be reduced to “F”.

Late homework and project report submissions will generally not be accepted. Any missed activity due to medical or family emergencies requires supporting documentation through the Dean’s office, SAS, etc.

**Graduate Course Policy**

The course will be cross-listed as an undergraduate (CS752) and graduate (CS852) course. Students taking this course as graduate students will have to perform additional tasks on homework assignments and exams to obtain the same grade.

**Academic Integrity**

The instructor is strongly committed to upholding the standards of academic integrity. These standards, at the minimum, require that students never present the work of others as their own. Any dishonest behavior, once discovered, will be penalized according to the University’s Student Code of Conduct.

**Am I ready?**

Most of the advanced math will be taught along with the course. However, it will help if you know basic vector addition, basic probability, and analysis of functions (maxima, taking derivatives). In the week before the course starts, I will offer an ungraded “Am I ready?” quiz, which will help you identify weaknesses so you have time to learn before the class starts.

Below are some links to chapters in the D2L book to prepare for the course. If you enjoy following along in code, please select the PyTorch version, which we will be using in this course. Note that the book is often rather terse, so don’t fret if you have trouble following; consult alternative textbooks (see also the textbooks listed on the syllabus):

Basic linear algebra and vectors: vector dot product, multiplying a vector by a scalar, the L2 norm (length) of a vector, and how it differs from its dimensionality.
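As a quick self-check on these concepts, here is a plain-Python sketch with made-up example vectors (the D2L chapter does the same with tensors):

```python
import math

# Hypothetical example vectors for illustration.
u = [1.0, 2.0, 2.0]
v = [3.0, 0.0, 4.0]

# Dot product: sum of element-wise products.
dot = sum(a * b for a, b in zip(u, v))     # 1*3 + 2*0 + 2*4 = 11

# Multiplying a vector by a scalar scales every component.
half_u = [0.5 * a for a in u]

# L2 norm (length): square root of the dot product of a vector with itself.
norm_u = math.sqrt(sum(a * a for a in u))  # sqrt(1 + 4 + 4) = 3.0

# Dimensionality is just the number of components -- not the length.
dim_u = len(u)                             # 3
```

Note that `u` has length 3.0 and dimensionality 3 only by coincidence of these numbers; the two concepts are unrelated.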

https://d2l.ai/chapter_preliminaries/linear-algebra.html

Probability and random variables: Multinomial distributions (modelling die rolls) and Bernoulli distributions (modelling coin flips). Conditional distributions, joint distributions, and marginalization. Expectations. https://d2l.ai/chapter_preliminaries/probability.html
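The following plain-Python sketch illustrates joint distributions, marginalization, conditioning, and expectation on a hypothetical joint probability table (the numbers are made up purely for illustration):

```python
# Hypothetical joint distribution P(X, Y) of two dependent binary variables.
joint = {
    (0, 0): 0.3, (0, 1): 0.2,
    (1, 0): 0.1, (1, 1): 0.4,
}

# Marginalization: P(X = x) = sum over y of P(X = x, Y = y).
p_x0 = joint[(0, 0)] + joint[(0, 1)]   # 0.5
p_x1 = joint[(1, 0)] + joint[(1, 1)]   # 0.5

# Conditional distribution: P(Y = 1 | X = 0) = P(X = 0, Y = 1) / P(X = 0).
p_y1_given_x0 = joint[(0, 1)] / p_x0   # 0.4

# Expectation of X under its marginal distribution.
e_x = 0 * p_x0 + 1 * p_x1              # 0.5
```

If these three operations feel unfamiliar, the D2L probability chapter linked above covers them with worked examples.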

Calculus / derivatives of functions with one variable: Taking derivatives of polynomials (f(x) = x^2 - 3x + 10), sum rule, product rule.

https://d2l.ai/chapter_preliminaries/calculus.html

Python programming: Familiarize yourself with the Python programming language and NumPy operations, and work through the first steps of setting up PyTorch. Python: https://cscircles.cemc.uwaterloo.ca/

PyTorch: https://pytorch.org/tutorials/beginner/introyt/introyt1_tutorial.html

Installation of book examples: https://d2l.ai/chapter_installation/index.html

If your computer does not have a GPU, you can configure PyTorch in CPU-only mode.

**Mutual Expectations**

Students are expected to:

be present in class (physically and mentally),

ask at least one question every session,

present homework solutions,

do their own work and contribute significantly in team activities,

study and repeat necessary class materials independently.

The instructor is expected to:

make lecture notes available before the class,

return graded homework in a timely manner,

be available for questions regarding class material during class, online, and if necessary by appointment,

notify students that are in danger of not meeting the class goals early on,

provide ungraded test exams (quizzes) for students’ self-assessment.

Note that it is not sufficient to just be present in class and submit homework. Obtaining an A requires that you study and review materials from lecture notes, assignments, and discussions with the help of the book. If stuck, please see the instructor.

**Textbooks**

No textbook is mandated. We recommend consulting the following books, which are affordable or available online as free PDFs.

Easy materials:

**Book:** Dive into Deep Learning

**Intro:** Book about neural network foundations with plenty of examples in PyTorch (as well as other deep learning frameworks).

**Access:** https://d2l.ai/d2l-en-pytorch.pdf

**Book website:** https://d2l.ai/ Contains HTML view, matching Jupyter notebooks, and alternative learning materials.

**Book:** Linear Algebra and Optimization for Machine Learning. Springer. Charu Aggarwal.

**Intro:** This book discusses linear algebra and optimization from a machine learning perspective.

**Access:** https://unh.primo.exlibrisgroup.com/permalink/01USNH_UNH/121i3ml/alma991020117590505221

**Book:** Bayesian Statistics the Fun Way: Understanding Statistics and Probability with Star Wars, LEGO, and Rubber Ducks. Will Kurt.

**Intro:** The book explains concepts in probability theory that are needed for statistical machine learning.

**Access:** https://unh.primo.exlibrisgroup.com/permalink/01USNH_UNH/1g6a9m7/alma991024543640905221

**Book website:** https://nostarch.com/learnbayes

**Book:** Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. John Kruschke.

**Intro:** Bayesian statistics is an important tool for any data scientist. Many other ML books assume you have prior background in mathematics. I found this book very intuitive in its explanations, and I feel that I now understand Bayesian statistics better.

**Access:** https://unh.primo.exlibrisgroup.com/permalink/01USNH_UNH/121i3ml/alma991019092789705221

**Blog:** https://www.countbayesie.com

**Intro:** Excellent blog about various topics in statistics and machine learning.

Advanced texts:

**Book:** Linguistic Structure Prediction by Noah A. Smith

**Intro:** Explains NLP problems through machine learning theory.

**Access:** https://www.cs.cmu.edu/~nasmith/LSP/

**Book:** Deep Learning by Goodfellow

**Intro:** Focuses on deep neural networks.

**Access:** Free at https://www.deeplearningbook.org/

**Tentative Schedule**

Note that this schedule is preliminary and will possibly change as the course progresses.

Lecture 01: Introduction

Lecture 02: Coordinate Ascent

Lecture 03: Probability Theory

Lecture 04: Gradient Descent and Loss Functions

Lecture 05: Logistic Regression

Lecture 06: Application Domain: Natural Language Processing

Lecture 07: Conditional Random Fields (CRFs)

Lecture 08: Hidden Variables and Gates in CRFs

Lecture 09: Multi-Layer Perceptron

Lecture 10: Convolutional Neural Networks

Lecture 11: Word Embeddings

Lecture 12: Trained Matrix-Factorization

Lecture 13: Recurrent Neural Networks (RNNs)

Lecture 14: Long Short-Term Memory Networks (LSTMs)

Lecture 15: Attention Mechanisms

Lecture 16: Transformers and BERT

Lecture 17: Graph-Attention Networks

Lecture 18: Application Domain: Generating Natural Language

Lecture 19: Significance Testing

**ABET Learning Outcome Classification**

1. Analyze a complex computing problem and to apply principles of computing and other relevant disciplines to identify solutions.

Ability to design customized neural networks for applications.

Ability to derive a customized training algorithm for a machine learning method, such as Logistic Regression with gradient descent and coordinate descent.

Understanding the differences between optimization algorithms such as coordinate descent and gradient descent, and concepts such as the Wolfe conditions.

Understanding the strengths and weaknesses of different neural network architectures.

2. Design, implement, and evaluate a computing-based solution to meet a given set of computing requirements in the context of the program’s discipline.

Implement famous machine learning methods from scratch, such as Logistic Regression and Multi-Layer Perceptrons.

Integrate an optimization toolkit into the student’s training code for conditional random fields.

Use a neural network toolkit to implement a custom sequence tagger with LSTMs and/or Transformers

6. Apply Computer Science theory and software development fundamentals to produce computing-based solutions

Identification of an application domain for algorithms discussed in class

Implementation of a prototype for the application domain

Evaluation of prediction quality in the application domain

Release of the prototype with proper distribution management, installation and usage instructions.

**ABET Curriculum Classification**

4. Substantial Coverage of A) Algorithms and complexity, B) Computer Science theory C) Concepts of programming languages D) Software development

Algorithms for Machine Learning and NLP

Naive Bayes and Language Models

Logistic regression

Hidden Markov Model decoding

Conditional Random Fields

Neural networks

LSTMs and Transformer Models

Graph Attention Networks

Theory

Optimization algorithms with gradient descent

Formal probabilistic models with inference

Programming

Implementing gradient descent algorithms for logistic regression and Naive Bayes

Using optimization libraries to implement conditional random fields

Using neural network libraries for training

Class project on Domain-specific Application

Customizing and implementing a machine learning algorithm for a text problem of the student’s choice.

Using third-party libraries

Writing test code

Using git repositories as source code control

Writing installation and usage instructions for release management