Information-Theoretic Learning (ITL)
Leiden University, Spring Semester 2019
All students are requested to register for the course via Blackboard (in addition to USIS).
Bachelor students who want to investigate the possibility of letting (the EC of) this course count towards their master's diploma are advised to contact the chairman of the Exam Committee (Ronald van Luijk, email@example.com) at their earliest convenience.
Lecturer: Prof. Dr. Peter Grünwald, Leiden University, Mathematical Institute, and Centrum Wiskunde & Informatica
Teaching assistant: Muriel Perez Ortiz, Leiden University, Mathematical Institute, and Centrum Wiskunde & Informatica
Contact: send email to:
The URL of this webpage is
www.cwi.nl/~pdg/teaching/inflearn. Visit this page regularly for
changes, updates, etc.
This course is on an interesting but complicated subject. It is given
at the master's or advanced bachelor's level. Although the only
background knowledge required is elementary probability theory, the
course does require serious work by the student. The course load is 6
ECTS. A general course description can be found in the studiegids (study guide).
Many thanks are due to Steven de Rooij (Leiden University) who prepared a
significant proportion of the exercises.
Lectures and Exercise Sessions
Lectures take place each Tuesday from 13.30--15.15 in room 412
of the Snellius Building, Niels Bohrweg 1, Leiden. The lectures
are immediately followed by a mini-exercise session held by Muriel Perez.
The first lecture will take place February 5th, 2019. There will be no
lectures on March 19, April 16 and April 23 (the break originally planned for March 12 has been swapped with March 19; see the schedule below). The last
official lecture is scheduled for May 21, and the final exam is
provisionally scheduled for Tuesday May 28th, also in room B412.
Weekly Homework: At every lecture on Tuesday except the first
there is a homework assignment. The assignment will also be made
available on this webpage. Homework is obligatory and must be
turned in at or before the beginning of the next lecture, i.e. one
week after the assignment was handed out. You can turn in your
homework digitally via Blackboard, or on paper (printed or handwritten) by
handing it to me or Muriel at the beginning of the lecture.
After the lecture, there is a homework session of approximately 30 minutes,
during which Muriel explains and discusses the homework. Turning in complete
written homework on time is required; see Examination form below.
Credit 6 ECTS points.
Examination form: In order to pass the course, one must obtain
a sufficient grade (6 or higher) on both of the following two components.
The final grade will be determined as the average of the two:
- An open-book written examination (to be held Tuesday May 28th).
- Homework. Each student must hand in solutions to the homework
assignments at the beginning of the lecture following the one in which the homework was handed out.
Discussing the problems in the group is encouraged, but every participant must
write down her or his answers on her or his own. The final homework
grade will be determined as an average of the weekly grades.
Literature We will mainly use various chapters of the
following source: P. Grünwald. The Minimum Description Length
Principle, MIT Press, 2007. Some additional hand-outs will be made
available free of charge as we go. For the second week, the hand-out is "Luckiness
and Regret in Minimum Description Length Inference" by Steven de
Rooij and Peter Grünwald, in: Handbook of the Philosophy of Science,
Volume 7: Philosophy of Statistics, 2011. This paper gives an overview
of the part of this course that will be concerned with the relation
between statistics, machine learning and data compression, as embodied
in MDL learning.
Lecture contents are subject to change at any time for any reason.
A more precise schedule, with links to all exercises, will be
determined as we go.
- February 5: introduction
- General introduction: learning, regularity, data compression. Kolmogorov complexity; deterministic vs. purely random vs. "stochastic" sequences.
- Literature: Chapter 1 up to Section 1.5.1.
- February 12: data compression without probability
- Learning of context-free grammars from example sentences.
- Basics of Lossless Coding. Prefix Codes.
- Bernoulli distributions, maximum likelihood.
- Literature: Chapter 2, Section 2.1; Chapter 3, Section 3.1; Handout, Section 1.
- First Set of Homework Exercises
- February 19: Codes and Probabilities (the most important lecture!)
- The Kraft inequality. The most important insight of the class:
the correspondence between probability distributions and code length
functions. The information inequality, entropy, relative entropy
(Kullback-Leibler divergence). Shannon's coding theorem.
- Coding integers: probability vs. two-stage coding view.
- Literature: Chapter 3, Sections 3.2, 3.3 and 3.4.
- Second Set of Homework Exercises
- February 26: Preparatory Statistics.
- Maximum Likelihood and Bayesian Inference; Bayes Predictive Distribution
- Literature: Chapter 2, Sections 2.2 and 2.5.2; Chapter 4, Section 4.4; Example 8.1 (!)
- March 5:
- Coding with the help of the Bernoulli model, using index codes.
- Coding with the help of the Bernoulli model, using Shannon-Fano two-part codes.
- Coding with the help of the Bernoulli model, using Shannon-Fano Bayes mixture codes.
- Markov Models (Chains): Definition, Maximum Likelihood.
- Literature: Chapter 5 until 5.6.
- March 12th: Universal Coding. Note: there is a lecture on March 12th, even though the official schedule on the web says there isn't!
- Now it really gets exciting!
- Regret, Minimax Regret, NML Universal Code for finite and
- Asymptotic expansion of KL divergence
- NML vs. Bayes universal code for parametric models. Jeffreys
prior Part I.
- Literature: Chapter 4, 4.1-4.3; Chapter 6, Section 6.1 and 6.2; Chapter 7, 7.1 and 7.2; Chapter 8, 8.1 and 8.2
- March 19th: No Lecture! (even though the official schedule on the web says that there is!)
- March 26th:
- NML vs. Bayes universal code for parametric models. Jeffreys
prior Part II.
- Jeffreys' prior as a uniform prior on the space of distributions
equipped with the KL divergence.
- Literature: Chapter 6, Sections 6.1 and 6.2; Chapter 7, 7.1 until 7.3.1; Chapter 8, 8.1 and 8.2
- April 2: Simple Refined MDL, Prequential Plugin Codes
- Simple Refined MDL with its many interpretations
- Prequential Interpretation of Simple Refined MDL
- Prequential Plug-in Code
- NML regret, complexity as number of distinguishable distributions
- Literature: Chapter 9; Chapter 14, Section 14.1 and 14.2, esp. the box
on page 426.
- April 9: General Refined MDL, Prediction with MDL, Issues with Universal Codes/MDL
- General Refined MDL
- MDL Prediction/Model Selection/Estimation/Mixed 1-part/2-part Codes
- p-value interpretation
- Issues: undefined NML or Jeffreys' prior, Horizon (In)Dependence
- Literature: Chapter 6, Section 6.4; Chapter 11, Section 11.4; Chapter 14, Sections 14.1, 14.2, 14.3
- April 16: No Lecture!
- April 23: No Lecture!
- April 30: Excursion: Sequential Prediction with General Loss Functions
- May 7th: Excursion, Part II.
- May 14th: Maximum Entropy
- Maximum Entropy Principle
- How to find MaxEnt distributions
- Exponential Families and Maximum Entropy
- Literature: Chapter 18, Section 18.1-18.4; Chapter 19, Section 19.5.1.
- May 21st: MaxEnt and MDL; Overview/Wrap Up
- Canonical and Mean-Value Parameterization
- Robustness Property of Exponential Families
- Maximum Entropy and Minimum Description Length. The zero-sum coding game.
- Literature: Chapter 19, 19.1-19.3, 19.5.
- May 28th, 14:00-17:00: Open-Book Examination in Room B2 of the Snellius Building. Example examination.
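Several lectures above (February 19 on the Kraft inequality and the correspondence between probabilities and code lengths, March 5 on coding with the Bernoulli model, March 12 and 26 on regret and on NML vs. Bayes universal codes) share the same bookkeeping: a code length is a negative log probability, and a valid prefix code must satisfy the Kraft inequality. The following is a minimal Python sketch of that bookkeeping for the Bernoulli model, written as an illustration for this page and not taken from the course materials. All function names are hypothetical, and the uniform prior in the Bayes mixture code is an assumption chosen for simplicity, not the Jeffreys prior treated in the lectures. Code lengths are idealized (non-integer) except in the Shannon-Fano variant, which rounds up to whole bits.

```python
from math import ceil, comb, log2


def ml_code_length(n1: int, n: int) -> float:
    """-log2 of the maximum likelihood probability of a binary sequence with n1 ones out of n.

    Uses theta_hat = n1 / n. For n >= 1 this is NOT a valid code: its Kraft sum exceeds 1.
    """
    n0 = n - n1
    bits = 0.0
    if n1 > 0:
        bits -= n1 * log2(n1 / n)
    if n0 > 0:
        bits -= n0 * log2(n0 / n)
    return bits


def bayes_uniform_code_length(n1: int, n: int) -> float:
    """Bayes mixture code with a uniform prior on theta (an illustrative assumption).

    The marginal probability is the integral of theta^n1 (1 - theta)^n0 over [0, 1],
    which equals 1 / ((n + 1) * C(n, n1)).
    """
    return log2((n + 1) * comb(n, n1))


def kraft_sum(code_length, n: int) -> float:
    """Sum of 2^(-L(x^n)) over all 2^n binary sequences of length n.

    All C(n, k) sequences with k ones get the same length, so we sum per k.
    Taking log2 of the binomial first keeps every term in floating-point range.
    """
    return sum(2 ** (log2(comb(n, k)) - code_length(k, n)) for k in range(n + 1))


def nml_complexity(n: int) -> float:
    """Parametric complexity: log2 of the Kraft sum of the ML 'code' (the minimax regret)."""
    return log2(kraft_sum(ml_code_length, n))


def nml_code_length(n1: int, n: int) -> float:
    """NML code length: ML code length plus the same regret for every sequence."""
    return ml_code_length(n1, n) + nml_complexity(n)


def shannon_fano(code_length):
    """Round an idealized code length up to whole bits; the Kraft sum can only go down."""
    return lambda n1, n: ceil(code_length(n1, n))


def regret(code_length, n1: int, n: int) -> float:
    """Extra bits compared with the (unachievable) ML code length."""
    return code_length(n1, n) - ml_code_length(n1, n)


if __name__ == "__main__":
    n, n1 = 20, 6
    codes = [
        ("ML (not a code)", ml_code_length),
        ("Bayes, uniform prior", bayes_uniform_code_length),
        ("Shannon-Fano Bayes", shannon_fano(bayes_uniform_code_length)),
        ("NML", nml_code_length),
    ]
    for name, L in codes:
        print(f"{name:22s} L = {L(n1, n):7.3f} bits, "
              f"regret = {regret(L, n1, n):6.3f}, Kraft sum = {kraft_sum(L, n):.4f}")
    worst_bayes = max(regret(bayes_uniform_code_length, k, n) for k in range(n + 1))
    print(f"worst-case regret: Bayes {worst_bayes:.3f} vs NML {nml_complexity(n):.3f} bits")
```

Running the script prints, for one sequence of length 20 with 6 ones, each code length, its regret relative to the maximum likelihood code length, and its Kraft sum. The NML code has the same regret on every sequence, equal to the log of the Kraft sum of the maximum likelihood "code"; for large n this regret should approach the asymptotic value (1/2) log2(n/(2π)) + log2 π bits for the Bernoulli model.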