Machine Learning Theory
Fall 2017, Wednesday 14:00 to 16:45 in UvA SP C0.110/D1.111.
Aim
Machine learning is one of the fastest-growing areas of science, with far-reaching applications. In this course we focus on the fundamental ideas, theoretical frameworks, and rich array of mathematical tools and techniques that power machine learning. The course covers the core paradigms and results in machine learning theory with a mix of probability and statistics, combinatorics, information theory, optimization and game theory.
During the course you will learn to
- Formalize learning problems in statistical and game-theoretic settings.
- Examine the statistical complexity of learning problems using the core notions of complexity.
- Analyze the statistical efficiency of learning algorithms.
- Master the design of learning strategies using proper regularization.
MasterMath Website
This course is offered as part of the MasterMath program. To participate for credit, sign up here. We also use the MasterMath ELO (electronic learning environment) for submitting homework and receiving grades.
Prerequisites
The course will be self-contained. We assume only the following basic background:
- Basic probability theory: (conditional) probability and expectation; discrete and continuous distributions.
- Basic linear algebra: finite-dimensional vector spaces; eigendecomposition.
- Basic calculus.
- No programming will be required.
Lecturers
The lectures will be recorded. See the MasterMath ELO for the link and password.
Mode
The grade will be composed as follows:
- 40%: weekly homework, to be handed in before the next lecture.
- 30%: mid-term exam.
- 30%: final exam.
You are strongly encouraged to solve and submit your weekly homework in teams; there is no limit on the team size. Exams are to be taken individually.
Schedule
When | What | Who
Wed 13 Sep | Intro to statistical and game-theoretic learning. Proper losses. Notes | Wouter
Wed 20 Sep | PAC learnability for finite hypothesis classes, realizable case. Chapters 1 and 2 in [1]. SVM + kernels. | Rianne
Wed 27 Sep | Agnostic PAC learning and learning via uniform convergence. Chapters 3 and 4 in [1]. | Rianne
Wed 4 Oct | No-Free-Lunch theorem. VC dimension. Fundamental theorem of PAC learning. Chapters 5-6.4 in [1]. | Rianne
Wed 11 Oct | Proof of the fundamental theorem of PAC learning. Sauer's lemma. Chapter 6.5 in [1]. Nonuniform learnability, SRM, other notions of learning. | Rianne
Wed 18 Oct | Full-information online learning (experts). Specialists. Notes | Wouter
Wed 25 Oct | Midterm exam (list of content) | TBD
Wed 1 Nov | Rademacher complexity & covering numbers. Chapters 26 and 27 in [1]. | Peter
Wed 8 Nov | Online convex optimization. Sections 3.1 and 3.3 in [2] and Notes. | Wouter
Wed 15 Nov | Bandits. UCB and EXP3. Chapters 2.2 and 3.1 in [3]. | Wouter
Wed 22 Nov | Non-stationary environments. Tracking and long-term memory using specialists. Notes | Wouter
Wed 29 Nov | Exp-concavity. Online Newton Step. Vovk mixability. Chapter 4 in [2], Notes. | Wouter
Wed 6 Dec | Rademacher complexity, continued; log-loss prediction; Minimum Description Length (MDL). | Peter
Wed 13 Dec | Boosting. AdaBoost. Chapter 10 in [1]. | Wouter
Wed 20 Dec | Tensor decompositions + wrap-up. | Wouter
Wed 24 Jan | Final exam (list of content). Location: IWO-geel. |
Exams
Midterm
- Intro to statistical and game-theoretic learning. Proper losses. (Handout)
- PAC learnability for finite hypothesis classes, realizable case (Chapters 1 and 2). Half-spaces, SVMs and kernels (only what is discussed in the lecture).
- Agnostic PAC learning and learning via uniform convergence (Chapters 3 and 4).
- No-Free-Lunch theorem. VC dimension. Fundamental theorem of PAC learning, its proof, and symmetrization. Sauer's lemma + proof. (Chapter 5.1 and Chapter 6, excluding 6.5.2; proofs as in the lecture - for similar proofs see Chapters 28.1 and 28.3.) Non-uniform learnability and SRM (Chapters 7.1 and 7.2).
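For quick reference while revising (this only restates a standard result covered in [1], possibly in slightly different notation): if a hypothesis class H has VC dimension d, Sauer's lemma bounds its growth function τ_H(m), the maximal number of distinct behaviours of H on m points, by
\[
  \tau_{\mathcal{H}}(m) \;\le\; \sum_{i=0}^{d} \binom{m}{i} \;\le\; \Bigl(\tfrac{e\,m}{d}\Bigr)^{d}
  \qquad \text{for all } m \ge d .
\]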
Final
The final exam will test you on the online learning component of the class plus Rademacher complexity. Below is the list of topics. The weekly homework prepares you for the exam; the solutions can be found on the MasterMath ELO.
If below we say 'you should read and understand a section', this means that:
- if in the exam we refer to a Lemma/Theorem/Formula in the section, we will always state it explicitly (e.g. on the cheat sheet), so you don't have to learn formulas by heart;
- however, you should understand all these Lemmas/Theorems/Formulas, since we may ask questions about them and we will not give further explanations of formulas.
Online Learning
If below we say "algorithm and regret bound", we might question you on the setting, its main algorithm and (do computations involving) the corresponding regret bound. If we add "analysis", you should be able to reprove the regret bound at the exam. (For orientation, minimal illustrative sketches of the Hedge, Online Gradient Descent and EXP3 updates appear right after this list.)
- Experts and specialists
  - Mix loss game, lower bound, Aggregating Algorithm regret bound and analysis.
  - Dot loss game, Hedge algorithm regret bound and analysis.
  - Mixable loss functions. Aggregating Algorithm regret bound and analysis.
  - Mix loss game with specialists. Specialists Aggregating Algorithm (SAA) regret bound and analysis.
- Online Convex Optimisation
  - Convex losses, Online Gradient Descent regret bound and analysis.
  - Strongly convex losses, Online Gradient Descent regret bound and analysis.
  - Exp-concave losses, Online Newton Step regret bound.
- Non-stationarity
  - Switching between experts. Fixed Share algorithm as a reduction to specialists. Regret bound and analysis.
  - Long-term memory. Mixing Past Posteriors algorithm as a reduction to specialists. High-level construction.
- Bandits
  - Stochastic bandit setting. UCB algorithm, regret bound.
  - Adversarial bandit setting. EXP3 algorithm, regret bound.
- Boosting
  - AdaBoost algorithm, iteration bound, analysis.
- Log-Loss Prediction (covered in Rademacher lectures)
  - Sections 9.1-9.4: read and understand completely.
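Although no programming is required for the course, a concrete sketch can help when revising the update rules. Below is a minimal Hedge (exponential weights) loop for the dot loss game with K experts and learning rate eta. This is an illustrative reconstruction, not the formulation from the lecture notes; the names (hedge, loss_matrix) are invented for this example.

import numpy as np

def hedge(loss_matrix, eta):
    """Hedge (exponential weights) for the dot loss game.

    loss_matrix: T x K array of expert losses in [0, 1].
    Returns the learner's cumulative dot loss and the cumulative loss of
    the best expert, so the regret can be read off. Illustrative only.
    """
    T, K = loss_matrix.shape
    weights = np.ones(K)                          # uniform prior over experts
    learner_loss = 0.0
    for t in range(T):
        p = weights / weights.sum()               # play the normalized weights
        learner_loss += p @ loss_matrix[t]        # dot loss of the mixture
        weights *= np.exp(-eta * loss_matrix[t])  # exponential weight update
    best_expert_loss = loss_matrix.sum(axis=0).min()
    return learner_loss, best_expert_loss

# With eta = sqrt(2 ln(K) / T), the regret is of order sqrt(T ln K).
rng = np.random.default_rng(0)
losses = rng.random((1000, 5))
print(hedge(losses, eta=np.sqrt(2 * np.log(5) / 1000)))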
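In the same spirit, here is a bare-bones projected Online Gradient Descent sketch for convex losses on a Euclidean ball. Again purely illustrative with invented names (online_gradient_descent, grad_fns); the step size eta_t = radius / (G * sqrt(t)) is the usual choice behind the O(radius * G * sqrt(T)) regret bound, up to constants.

import numpy as np

def online_gradient_descent(grad_fns, x0, radius, G):
    """Projected Online Gradient Descent on the Euclidean ball of given radius.

    grad_fns : list of callables; grad_fns[t](x) is the gradient of the
               convex loss revealed at round t.
    x0       : starting point inside the ball.
    G        : bound on the gradient norms (enters the step size).
    Illustrative sketch only.
    """
    x = np.array(x0, dtype=float)
    iterates = [x.copy()]
    for t, grad in enumerate(grad_fns, start=1):
        eta = radius / (G * np.sqrt(t))
        x = x - eta * grad(x)                  # gradient step
        norm = np.linalg.norm(x)
        if norm > radius:                      # project back onto the ball
            x *= radius / norm
        iterates.append(x.copy())
    return iterates

# Example: a sequence of quadratic losses f_t(x) = ||x - z_t||^2 / 2.
rng = np.random.default_rng(1)
targets = rng.uniform(-1, 1, size=(100, 3))
grads = [lambda x, z=z: x - z for z in targets]
print(online_gradient_descent(grads, x0=np.zeros(3), radius=2.0, G=4.0)[-1])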
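Finally, a minimal EXP3 sketch for the adversarial bandit setting: only the loss of the pulled arm is observed, and an importance-weighted loss estimate is fed into exponential weights. This is again an illustrative reconstruction with invented names (exp3, loss_fn), not the exact algorithm statement used in the lectures.

import numpy as np

def exp3(loss_fn, K, T, eta, seed=0):
    """EXP3 for the adversarial K-armed bandit with losses in [0, 1].

    loss_fn(t, arm) returns the loss of the pulled arm at round t; only
    that single loss is observed. Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    weights = np.ones(K)
    pulls = []
    for t in range(T):
        p = weights / weights.sum()
        arm = rng.choice(K, p=p)               # sample an arm from p
        loss = loss_fn(t, arm)
        estimate = np.zeros(K)
        estimate[arm] = loss / p[arm]          # unbiased importance-weighted loss estimate
        weights *= np.exp(-eta * estimate)
        weights /= weights.max()               # rescale for numerical stability (does not change p)
        pulls.append(arm)
    return pulls

# Toy run: arm 0 has smaller expected loss, so it should dominate the pulls.
env = np.random.default_rng(1)
pulls = exp3(lambda t, a: env.random() * (0.5 if a == 0 else 1.0),
             K=3, T=2000, eta=np.sqrt(2 * np.log(3) / (2000 * 3)))
print(np.bincount(pulls, minlength=3))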
Rademacher Complexity & Covering Numbers - Chapters 26 and 27
Proofs about this part of the course are sometimes difficult and we will not ask you to reproduce proofs from the book (except very simple ones). However, we do ask you to read and understand the proofs since that helps in getting the right insights for the questions we do ask - which are similar to (but not the same as!) what you were asked in the homework exercises.
- Sections 26.1-26.2: you should read & understand both sections completely, except the proof of Thm 26.5 and Lemma 26.3 (McDiarmid).
- Sections 26.3, 26.4 and 26.5 can be skipped.
- Section 27.1: read & understand completely. Section 27.2: read and understand the statement of Lemma 27.4 (bound on Rademacher complexity in terms of covering numbers); no need to read or understand the proof.
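As a reminder while reading Chapter 26 (this restates the standard definition; the notation in [1] may differ slightly): the empirical Rademacher complexity of a function class F on a sample S = (z_1, ..., z_m) is
\[
  \widehat{\mathcal{R}}_S(\mathcal{F})
  \;=\;
  \mathbb{E}_{\sigma}\!\left[\sup_{f \in \mathcal{F}} \frac{1}{m} \sum_{i=1}^{m} \sigma_i\, f(z_i)\right],
  \qquad \sigma_1,\dots,\sigma_m \ \text{i.i.d.\ uniform on } \{-1,+1\}.
\]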
We will not ask you about VC dimension or PAC learning. That was covered by the mid-term. We will also not ask you about the tensor decompositions bonus lecture.
Material
We will make use of the following sources: