Information-Theoretic Learning (ITL)
Leiden University, Spring Semester 2018
All students are requested to register for the course via blackboard (in addition to USIS).
Bachelor students who want to investigate the possibility of letting (the EC of) this course count towards their master's diploma, are advised to contact the chairman of the Exam Committee (Ronald van Luijk, firstname.lastname@example.org) at their earliest convenience.
|Lecturer||Prof. Dr. Peter Grünwald, Leiden
University, Mathematical Institute, and Centrum Wiskunde & Informatica
|Teaching assistant||Rianne de Heide,
Leiden University, Mathematical Institute, and Centrum Wiskunde & Informatica
|Contact: ||send email to:
The URL of this webpage is
www.cwi.nl/~pdg/teaching/inflearn. Visit this page regularly for
changes, updates, etc.
This course is on an interesting but complicated subject. It is given
at the master's or advanced bachelor's level. Although the only
background knowledge required is elementary probability theory, the
course does require serious work by the student. The course load is 6
ECTS. Click here (studiegids) for a general course description.
Many thanks are due to Steven de Rooij (Leiden University) who prepared a
significant proportion of the exercises.
Lectures and Exercise Sessions
Lectures take place each Tuesday from 13.30--15.15 in room 405
of the Snellius Building, Niels Bohrweg 1, Leiden. The lectures
are immediately followed by a mini-exercise session held by Rianne de Heide.
The first lecture will take place February 6th, 2018. There will be no
lectures on March 13, April 3 and April 17. The last
official lecture is scheduled for May 22, and the final exam is
provisionally scheduled for Monday May 28th (note: the official schedule says 'Tuesday May 28th', but that date does not exist!), 14.00-17.00, in room B02.
Weekly Homework: At every lecture on Tuesday except the first
there is a homework assignment. The assignment will also be made
available on this webpage. Homework is obligatory and must be
turned in at or before the beginning of the next lecture, i.e. one
week after the assignment was handed out. You can turn in your
homework digitally via blackboard or (printed or handwritten) via
teaching assistant Rianne de Heide's pigeon hole (postvakje) or by
handing it over to me or Rianne at the beginning of the lecture.
After the lecture, there is (approximately) 30 minutes homework
session, during which the homework will be explained and discussed by
Rianne. Turning in written complete homework in time is required, see
Credit 6 ECTS points.
Examination form In order to pass the course, one must obtain
a sufficient grade (6 or higher) on
both of the
The final grade will be determined as the average of the
- An open-book written examination (to be held Tuesday May 29th).
- Homework. Each student must hand in solutions to homework
assignments at the beginning of the lecture after the homework was handed out.
the problems in the group is encouraged, but every participant must
write down her or his answers on her or his own. The final homework
grade will be determined as an average of the weekly grades.
Literature We will mainly use various chapters of the
following source: P. Grünwald. The Minimum Description Length
Principle, MIT Press, 2007. Some additional hand-outs will be made
available free of charge as we go. For the second week, this is Luckiness
and Regret in Minimum Description Length Inference, by Steven de
Rooij and Peter Grünwald, Handbook of the Philosophy of Science,
Volume 7: Philosophy of Statistics, 2011. This paper gives an overview
of the part of this course that will be concerned with the relation
between statistics, machine learning and data compression, as embodied
in MDL learning.
Lecture contents are subject to change at any time for any reason.
A more precise schedule, with links to all exercises, will be
determined as we go.
- February 6: introduction
- General introduction: learning, regularity, data compression. Kolmogorov Complexity; deterministic vs. purely random vs. ``stochastic'' sequences.
- Literature: Chapter 1 up to Section 1.5.1.
- February 13: data compression without probability
- Learning of context-free grammars from example sentences.
- Basics of Lossless Coding. Prefix Codes.
- Bernoulli distributions, maximum likelihood.
- Literature: Chapter 2, Section 2.1.
Chapter 3, Section 3.1, Handout, Section 1.
- First Set of Homework Exercises
- February 20: Codes and Probabilities (the most important lecture!)
- The Kraft inequality. The most important insight of the class:
the correspondence between probability distributions and code length
functions. The information inequality, entropy, relative entropy
(Kullback-Leibler divergence). Shannon's coding theorem.
- Coding integers: probability vs. two-stage coding view.
- Literature: Chapter 3 (3.2,3.3,3.4)
- Second Set of Homework Exercises
- February 27: Preparatory Statistics.
- Maximum Likelihood and Bayesian Inference; Bayes Predictive Distribution
- Literature: Chapter 2, Section 2.2, 2.5.2, Section 4.4, Example 8.1. (!)
- Third Set of Homework Exercises
- March 6:
- Coding with the help of the Bernoulli model, using index codes.
- Coding with the help of the Bernoulli model, using Shannon-Fano two-part codes.
- Coding with the help of the Bernoulli model, using Shannon-Fano Bayes mixture codes.
- Markov Models (Chains): Definition, Maximum Likelihood.
- Literature: Chapter 5 until 5.6.
- Fourth Set of Homework Exercises
- March 13th: No Lecture!
- March 20th: Universal Coding
- Now it really gets exciting!
- Regret, Minimax Regret, NML Universal Code for finite and
- Asymptotic expansion of KL divergence
- NML vs. Bayes universal code for parametric models. Jeffreys
prior Part I.
- Literature: Chapter 4, 4.1-4.3; Chapter 6, Section 6.1 and 6.2; Chapter 7, 7.1 and 7.2; Chapter 8, 8.1 and 8.2
- Fifth Set of Homework Exercises
- March 27th:
- NML vs. Bayes universal code for parametric models. Jeffreys
prior Part II.
- Jeffreys' prior as a uniform prior on the space of distributions
equiped with the KL divergence.
- Literature: Chapter 6, Section 6.1 and 6.2; Chapter 7, 7.1 until 7.3.1; Chapter 8, 8.1 and 8.2 <
- Sixth Set of Homework Exercises
- April 3: No Lecture!
- April 10: Simple Refined MDL, Prequential Plugin Codes
- Simple Refined MDL with its many interpretations
- Prequential Interpretation of Simpe Refined MDL
- Prequential Plug-in Code
- NML regret, complexity as number of distinguishable distributions
- Literature: Chapter 9; Chapter 14, Section 14.1 and 14.2, esp. the box
on page 426.
- Seventh Set of Homework Exercises
- April 17: No Lecture!
- April 24: General Refined MDL, Prediction with MDL, Issues with Universal Codes/MDL
- General Refined MDL
- MDL Prediction/Model
Selection/Estimation/Mixed 1-part/2-part Codes
- p-value interpretation
- Issues: undefined NML or Jeffreys' prior, Horizon (In)Dependence
Chapter 6, Section 6.4; Chapter 11, Section 11.4; Chapter 14, Section 14.1, 14.2, 14.3
- Eighth Set of Homework Exercises
- May 1st: Excursion: Sequential Prediction with General Loss Functions
- May 8th: Excursion, Part II.
- May 15th: Maximum Entropy
- Maximum Entropy Principle
- How to find MaxEnt distributions
- Exponential Families and Maximum Entropy
- Literature: Chapter 18, Section 18.1-18.4; Chapter 19, Section 19.5.1.
- Eleventh Set of Homework Exercises
- May 22nd: MaxEnt and MDL ; Overview/Wrap Up
- Canonical and Mean-Value Parameterization
- Robustness Property of Exponential Families
- Maximum Entropy and Minimum Description Length. The zero-sum coding game.
- Literature: Chapter 19, 19.1-19.3, 19.5.
- MONDAY May 2814:00-17:00: Open-Book Examination in Room B2 of the Snellius building. Example examination.
Peter Grünwald’s home