Information-Theoretic Learning (ITL)
Leiden University, Spring Semester 2020
General Information
The URL of this webpage is
www.cwi.nl/~pdg/teaching/inflearn. Visit this page regularly for
changes, updates, etc.
This course is on an interesting but complicated subject. It is given
at the master's or advanced bachelor's level. Although the only
background knowledge required is elementary probability theory, the
course does require serious work by the student. The course load is 6
ECTS. Click here (studiegids) for a general course description.
Many thanks are due to Steven de Rooij (Leiden University) who prepared a
significant proportion of the exercises.
Lectures and Exercise Sessions
Because of the Corona Emergency, lectures take place remotely, via Blackboard and Kaltura Capture, each Tuesday from 14.15--16.00. The lectures
are immediately followed by a virtual
mini-exercise session held by Tyron Lardy.
Presumably, there will be no lectures on April 14 and May 5, but due to the Corona emergency, these plans may change. The last
official lecture is scheduled for May 19, and the final exam will be
held on Friday, June 5th, in the morning from 10:15 to 13:15. The exam will be sent to you by email and/or made available on Blackboard. It is an Open Book/Internet Exam, so you are free to look things up on the internet. However, you will be asked to subscribe to an honour code stating that you will not communicate with each other during the exam.
Homework Assignments
Weekly Homework: handed out at every lecture. The assignment is made
available on this webpage. Homework is obligatory and must be
turned in 24 hours before the beginning of the next lecture, i.e. six
days after the assignment was handed out: on Mondays, at 14:15. You
can turn in your homework digitally via blackboard or by e-mailing it
to Tyron (photos of handwritten homework are o.k.). Include your
name on the first page of the pdf you hand in. Include both the
number of the homework set as well as your name in the name of the pdf
file. For example: ITL_HW4_Lardy.pdf. After the lecture, there is
an (approximately) 30-minute homework session, during which the homework
will be explained and discussed by Tyron. Turning in homework in time
is required, see below. Homework solutions will be made available the evening or morning before the homework is discussed.
Credit
6 ECTS points.
Examination form
In order to pass the course, one must obtain
a sufficient grade (5.5 or higher) on
both of the following:
- An open-book written examination (to be held June 5th).
- Homework. Each student must hand in solutions to the homework
assignments by the weekly deadline stated above.
Discussing
the problems in groups is encouraged, but every participant must
write up her or his own answers. The final homework
grade will be determined as the average of the weekly grades.
The final grade will be determined as the average of the
two grades, with the homework counting 40% and the final exam counting 60%.
Literature
We will mainly use various chapters of the
following source: P. Grünwald. The Minimum Description Length
Principle, MIT Press, 2007. Some additional hand-outs will be made
available free of charge as we go. For the third week, this is Luckiness
and Regret in Minimum Description Length Inference, by Steven de
Rooij and Peter Grünwald, Handbook of the Philosophy of Science,
Volume 7: Philosophy of Statistics, 2011. This paper gives an overview
of the part of this course that will be concerned with the relation
between statistics, machine learning and data compression, as embodied
in MDL learning.
Course Schedule
Lecture contents are subject to change at any time for any reason.
A more precise schedule, with links to all exercises, will be
determined as we go.
- February 4: introduction
- General introduction: learning, regularity, data compression. Kolmogorov Complexity; deterministic vs. purely random vs. "stochastic" sequences.
- Literature: Chapter 1 up to Section 1.5.1.
- February 11: Preparatory Probability Theory and Statistics.
- Basics of Probability Theory
- Maximum Likelihood and Bayesian Inference; Bayes Predictive Distribution
- Literature: Chapter 2, Section 2.2, 2.5.2, Section 4.4, Example 8.1.
- First Set of Homework Exercises.
- February 18: data compression without probability
- Learning of context-free grammars from example sentences.
- Basics of Lossless Coding. Prefix Codes.
- Bernoulli distributions, maximum likelihood.
- Literature: Chapter 2, Section 2.1.
Chapter 3, Section 3.1, Handout, Section 1.
- Second Set of Homework Exercises
- February 25: Codes and Probabilities (the most important lecture!)
- The Kraft inequality. The most important insight of the class:
the correspondence between probability distributions and code length
functions. The information inequality, entropy, relative entropy
(Kullback-Leibler divergence). Shannon's coding theorem.
- Coding integers: probability vs. two-stage coding view.
- Literature: Chapter 3, Sections 3.2, 3.3 and 3.4
- Third Set of Homework Exercises
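The correspondence between prefix codes and probability distributions from this lecture can be checked numerically. The sketch below is my own illustration (not part of the course materials); the binary prefix code {0, 10, 110, 111} is just an example:

```python
import math

# Codeword lengths of the binary prefix code {0, 10, 110, 111}.
lengths = [1, 2, 3, 3]

# Kraft inequality: a binary prefix code with lengths l_1, ..., l_m
# exists if and only if sum_i 2^(-l_i) <= 1.
kraft_sum = sum(2 ** -l for l in lengths)
assert kraft_sum <= 1  # here the sum is exactly 1: a complete code

# Correspondence with probabilities: the code implicitly defines a
# distribution P(i) = 2^(-l_i); conversely, any distribution P yields
# idealized code lengths L(i) = -log2 P(i).
P = [2 ** -l for l in lengths]
ideal_lengths = [-math.log2(p) for p in P]
print(kraft_sum)      # 1.0
print(ideal_lengths)  # [1.0, 2.0, 3.0, 3.0]
```

Recovering the original lengths from P illustrates the two directions of the correspondence: code lengths define probabilities, and probabilities define (idealized) code lengths.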
- March 3rd:
- Coding with the help of the Bernoulli model, using index codes.
- Coding with the help of the Bernoulli model, using Shannon-Fano two-part codes.
- Coding with the help of the Bernoulli model, using Shannon-Fano Bayes mixture codes.
- Markov Models (Chains): Definition, Maximum Likelihood.
- Literature: Chapter 5 until 5.6.
- Fourth Set of Homework Exercises
- Solutions Fourth Set of Homework Exercises
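The relation between two-part and Bayes mixture coding of Bernoulli sequences can also be checked numerically. The following sketch is my own (function names are illustrative, not from the book); it uses the standard two-part code that first encodes the number of ones and then the index of the sequence among all sequences with that many ones:

```python
import math

def twopart_codelength(n, n1):
    # Two-stage code: log2(n+1) bits for the number of ones n1
    # (idealized, non-integer lengths allowed), then log2 C(n, n1)
    # bits for the index of x^n among sequences with n1 ones.
    return math.log2(n + 1) + math.log2(math.comb(n, n1))

def bayes_codelength(n, n1):
    # Bayes mixture code with a uniform prior on the Bernoulli
    # parameter theta: P(x^n) = n1! (n - n1)! / (n + 1)!
    p = math.factorial(n1) * math.factorial(n - n1) / math.factorial(n + 1)
    return -math.log2(p)

n, n1 = 20, 7
print(twopart_codelength(n, n1))
print(bayes_codelength(n, n1))
# The two idealized code lengths coincide, since the uniform-prior
# mixture probability equals 1 / ((n+1) * C(n, n1)).
```

The exact agreement of the two code lengths is a standard observation for the Bernoulli model, shown here only as an illustration of the two coding views.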
- March 10th and 17th: no lectures.
- March 24th: Universal Coding
- March 31st:
- NML vs. Bayes universal code for parametric models. Jeffreys
prior.
- Jeffreys' prior as a uniform prior on the space of distributions
equipped with the KL divergence.

- Literature: Chapter 6, Section 6.1 and 6.2; Chapter 7, 7.1 until 7.3.1; Chapter 8, 8.1 and 8.2
- Slides for Online Lecture
- Part 1 and Part 2 of on-line video.
- Sixth Set of Homework Exercises
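For the Bernoulli model, the NML normalizer from this lecture can be computed exactly for small n. The sketch below is my own (not from the book); it sums the maximized likelihood over all sequences, grouped by their number of ones:

```python
import math

def nml_complexity(n):
    # Parametric complexity COMP(n) of the Bernoulli model:
    # log2 of the NML normalizer sum over all x^n of
    # max_theta P(x^n | theta), where the maximum is attained
    # at the ML estimate theta-hat = n1 / n.
    total = 0.0
    for n1 in range(n + 1):
        theta = n1 / n
        # Python evaluates 0**0 as 1, which matches the convention here.
        ml = (theta ** n1) * ((1 - theta) ** (n - n1))
        total += math.comb(n, n1) * ml
    return math.log2(total)

# The complexity grows like (1/2) log2 n, as the theory predicts.
for n in (10, 100, 1000):
    print(n, nml_complexity(n))
```

Subtracting COMP(n) from the maximized log-likelihood gives the NML code length, so this quantity is exactly the regret term discussed in the lecture.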
- April 7: Simple Refined MDL, Prequential Plugin Codes
- Simple Refined MDL with its many interpretations
- Prequential Interpretation of Simple Refined MDL
- Prequential Plug-in Code
- NML regret, complexity as number of distinguishable distributions
- Literature: Chapter 9; Chapter 14, Section 14.1 and 14.2, esp. the box
on page 426.
- Part 1 and Part 2 of video recorded lecture.
- Slides for Online Lecture
- Seventh Set of Homework Exercises
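The prequential plug-in code can be illustrated for the Bernoulli model. This sketch is my own and assumes the Laplace ("add-one") estimator as the plug-in predictor; it accumulates the code length paid for each successive outcome:

```python
import math

def plugin_codelength(xs):
    # Prequential plug-in code: predict x_{i+1} with the Laplace
    # estimator (n1 + 1) / (i + 2) based on the first i outcomes,
    # and pay -log2 of the probability assigned to the outcome
    # that actually occurred.
    n1 = 0  # number of ones seen so far
    total = 0.0
    for i, x in enumerate(xs):
        p_one = (n1 + 1) / (i + 2)
        total += -math.log2(p_one if x == 1 else 1 - p_one)
        n1 += x
    return total

xs = [1, 0, 0, 1, 1, 1, 0, 1]
print(plugin_codelength(xs))
# By the chain rule, this accumulated code length equals the
# uniform-prior Bayes mixture code length -log2( n1! n0! / (n+1)! ).
```

The example shows the prequential interpretation concretely: a sequential prediction strategy defines a code, and for the Laplace rule that code coincides with the uniform-prior Bayes mixture code.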
- April 14: General Refined MDL, Prediction with MDL, Issues with Universal Codes/MDL
- April 21: Excursion: Safe Testing I.
- April 28: Excursion: Safe Testing II.
- May 5th: No Lecture! (Liberation Day)
- May 12th: Maximum Entropy
- May 19th: MaxEnt and MDL, Overview, Wrap Up
- Robustness Property of Exponential Families
- Maximum Entropy and Minimum Description Length. The zero-sum coding game.
- Back to Safe Testing: General Existence of Non-Trivial S-Values. The uniformly most powerful S-Value at a given sample size.
- Overview of whole course. What is it all good for?
- Literature: Chapter 19, 19.1-19.3, 19.5.
- Slides for Online Lecture
- Friday June 5th, 10.15-13.15: Exam (to be taken at home, hand-in via email). Example examination.