Ming Li and Paul Vitanyi, An
Introduction to Kolmogorov
Complexity and Its Applications, 3rd edition, Springer,
2008. The 3rd edition has a couple of hundred pages more than the 2nd
edition, many corrections, etc., and is about $20 cheaper than the 2nd edition
on Amazon ($67.96).
The course material
will be mainly from our textbook and from papers provided
on this website. Students will complete weekly homework and hand it in to
the Kolmogorov complexity homework mailbox at the Euclides building by
17:00 the following Wednesday,
so that Wouter can discuss the corrected homework
in the following Friday session.
There will be a mid-term
written exam and a final exam.
This course is a
general course on Kolmogorov complexity and many of its applications.
Kolmogorov complexity is a modern theory of information and randomness.
The goal of this course is to teach students the basic concepts of
Kolmogorov complexity and how to use this tool in their own research.
Topics include: plain Kolmogorov complexity, randomness,
prefix Kolmogorov complexity, the incompressibility method,
information distance, applications in various fields ranging from
average-case analysis of algorithms to bioinformatics and from
document comparison to internet search,
the universal distribution (Solomonoff-Levin), Solomonoff prediction,
and minimum description length (MDL) hypothesis selection.
We will not be able to cover
the complete textbook; rather, we will focus on recent
developments and especially on applications of Kolmogorov
complexity.
Marking Scheme: Each student is expected to complete the weekly homework,
the mid-term exam, and the final exam. The exams may be replaced by
a final project, either theoretical or programming,
to be presented in class; we will see what is feasible.
Homework 30%; mid-term and final exam, or
final project and presentation, 60% (30% each); class attendance
(including guest lectures and student presentations) and
participation in discussions 10%.
Grading and Credits:
Credits: 10 EC.
There will be a mid-term exam in early April and a final
exam in June. If the audience is capable and willing, the exams
may be replaced by challenge tasks, possibly carried out by teams of two.
Course announcements and lecture notes will appear on this page.
You are responsible for reading it regularly (and downloading the
lecture notes before class).
Lectures:
Note: The lecture notes are constantly being improved, so it is a good idea
to download the most recent version before class. Since they are improved
after class as well, the version a few days after class is even better.
The description of homework and lectures is given with respect to both
the 2nd edition (1997) and the 3rd edition (2008) of the textbook,
retaining the references to the 2nd edition.
Lecture 1, week 6 (05-02): History, definition of Kolmogorov complexity, basics
(Chapters 1 and 2).
Lecture 1 notes, ppt file or
Lecture 1 notes, pdf file.
Homework: (both in Editions 2 and 3) 1.6.5, 1.6.6, 1.11.1, 1.11.2, 1.11.7,
2.1.1 (not 2.1.1.d), 2.1.2, 2.1.4.
Lecture 2, week 7 (12-02): Randomness. Definitions and Martin-Lof tests
(Sections 2.4, 2.5).
Lecture 2 notes, ppt file or
Lecture 2 notes, pdf file.
Homework: (both in Editions 2 and 3) 2.8.1.a, 2.8.2 (not 2.8.2.a), 2.5.1, 2.5.4, 2.6.1,
and prove the following statement: If a binary sequence x of length n has exactly n/2 0's, then x is not c-incompressible, for large enough n.
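Hint (one possible route, not part of the official problem statement): there are $\binom{n}{n/2}$ binary strings of length $n$ with exactly $n/2$ 0's, and by Stirling's formula $\log \binom{n}{n/2} = n - \frac{1}{2}\log n + O(1)$. An index into the lexicographic enumeration of this set therefore takes only $n - \frac{1}{2}\log n + O(1)$ bits, and $n$ can be recovered from the length of the (padded) index, so $C(x) \leq n - \frac{1}{2}\log n + O(1) < n - c$ for every fixed $c$ once $n$ is large enough.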
Lecture 3, week 8 (19-02): Shannon Information and Kolmogorov complexity:
Symmetry of Information (Section 2.8).
Lecture 3 notes, ppt file or
Lecture 3 notes, pdf file.
Homework: (both in Editions 2 and 3) 2.1.5, 2.1.10, 2.2.2 (in Edition 2 the $O(1)$ in 2.2.2.c
should be $O(\log\log n)$), 2.2.5, 2.7.1.
Lecture 4, week 9 (26-02): Information distance, normalized information distance, and applications.
(Section 8.3,
this paper,
this paper, and
this paper).
Lecture 4 notes, ppt file or
Lecture 4 notes, pdf file
and possibly
Lecture 4 QA notes as ppt file or
Lecture 4 QA notes as pdf file.
Homework: In Edition 3: 8.4.4.a, prove Theorem 8.3.6, 8.4.5, 8.4.6, 8.4.8.
In Edition 2 these exercises did not yet occur. Written out they are:
Problem 1. Show that the NID (normalized information distance)
$\max\{K(x|y),K(y|x)\} / \max\{K(x),K(y)\}$
is not computable.
Problem 2. Prove that the sum distance $E_4(x,y) = K(x|y)+K(y|x)$
is a metric. Prove that it is an admissible distance
and that it majorizes every admissible distance
$D(x,y)$ up to a factor of 2: $E_4(x,y) \leq 2D(x,y) + O(1)$.
(Similar to Theorem 8.3.2.)
Prove that its normalization,
defined by $e_4(x,y) = (K(x|y)+K(y|x))/K(x,y)$,
takes values in $[0,1]$. It is also a metric; if you have time and gusto, prove that.
Problem 3. Let $Z(x)$ denote the binary length
of the compressed version of $x$ using compressor Z.
A compressor Z is {\em normal}
if it is a real compressor, and satisfies the axioms
(identity) $Z(xx)=Z(x)$, and $Z(\lambda)=0$, where
$\lambda$ is the empty string,
(monotonicity) $Z(xy) \geq Z(x)$,
(symmetry) $Z(xy)=Z(yx)$, and
(distributivity) $Z(xy) + Z(z) \leq Z(xz)+Z(yz)$,
all (in)equalities up to an additive $O(\log n)$ term,
with $n$ the maximal binary length of a string
involved in the (in)equality concerned.
(a) Show that $E_Z(x,y) = Z(xy) - \min\{Z(x),Z(y)\}$,
with $Z$ a normal compressor,
is computable, satisfies
the density requirement $\sum_{y:y\neq x} 2^{-E_Z(x,y)} \leq 1$,
and satisfies the metric (in)equalities up to
additive $O(\log n)$ terms,
with $n$ the maximal binary length of a string
involved in the (in)equality concerned.
(b) Show that the distance $e_Z(x,y) = E_Z(x,y)/\max\{Z(x),Z(y)\}$,
with $Z$ a normal compressor,
is a normalized version of $E_Z(x,y)$ and has values in $[0,1]$. If you have time and gusto,
prove that it also satisfies the metric
(in)equalities up to
additive $O((\log n)/n)$ terms,
with $n$ the maximal binary length of a string
involved in the (in)equality concerned.
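For experimenting with (a) and (b), here is a minimal Python sketch that uses zlib as the compressor $Z$; note that zlib only approximately satisfies the normality axioms, that lengths are measured in bytes rather than bits, and that the test strings are invented for illustration:

  import zlib

  def Z(s: bytes) -> int:
      # Compressed length in bytes (the problem counts bits; for
      # experiments the byte count is a good enough proxy).
      return len(zlib.compress(s, 9))

  def E_Z(x: bytes, y: bytes) -> int:
      # E_Z(x,y) = Z(xy) - min{Z(x), Z(y)}, as in part (a).
      return Z(x + y) - min(Z(x), Z(y))

  def e_Z(x: bytes, y: bytes) -> float:
      # Normalized version e_Z(x,y) = E_Z(x,y) / max{Z(x), Z(y)}, as in part (b).
      return E_Z(x, y) / max(Z(x), Z(y))

  x = b"the quick brown fox jumps over the lazy dog " * 20
  y = b"the quick brown fox jumps over the lazy cat " * 20
  z = bytes(range(256)) * 4  # unrelated data

  print(e_Z(x, y))  # expect a small value: x and y share almost all structure
  print(e_Z(x, z))  # expect a value near 1: x and z share little structure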
Problem 4. Show that the NGD distance $e_G$
is computable, takes values primarily in $[0,1]$ but also
outside in pathological cases, and is not a metric: there
are $x \neq y$ with $e_G(x,y) = 0$, and the triangle inequality
$e_G(x,z) \leq e_G(x,y)+e_G(y,z)$ can be violated.
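For intuition about Problem 4, the following sketch evaluates the NGD formula $e_G(x,y) = (\max\{\log f(x),\log f(y)\} - \log f(x,y)) / (\log N - \min\{\log f(x),\log f(y)\})$, where $f$ gives page counts and $N$ is the total number of indexed pages; all counts below are invented for illustration, not real search-engine data:

  import math

  def ngd(fx: int, fy: int, fxy: int, N: int) -> float:
      # e_G(x,y) = (max{log f(x), log f(y)} - log f(x,y))
      #            / (log N - min{log f(x), log f(y)})
      lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
      return (max(lx, ly) - lxy) / (math.log(N) - min(lx, ly))

  N = 8_000_000_000  # hypothetical total number of indexed pages

  print(ngd(9_000_000, 8_000_000, 6_000_000, N))  # ordinary case: value in [0,1]
  print(ngd(1_000_000, 1_000_000, 1_000_000, N))  # terms that always co-occur: 0 although x != y
  print(ngd(1_000_000, 1_000_000, 1, N))          # pathological case: value above 1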
Lecture 5, week 10 (05-03): The incompressibility method and its applications in
average-case analysis and lower bounds (Chapter 6).
Lecture 5 notes, ppt file or
Lecture 5 notes, pdf file
Additional lecture 5 notes
by Tao Jiang, pdf file
Homework: (both in Editions 2 and 3) 6.8.2, 6.8.3, 6.8.7, 6.8.8.
Lecture 5.1, week 11 (12-03): The incompressibility method
continued
Lecture 5.1 notes, ppt file or
Lecture 5.1 notes, pdf file.
Homework:
Problem 1: Use Kolmogorov complexity to prove that a one-head two-way DFA cannot accept L = {w#w | w in {0,1}*}.
Problem 2: Using Kolmogorov complexity, obtain the average-case lower bound for binary search.
Problem 3: Using Kolmogorov complexity (directly, without using theorems from Chapter 6), prove that {0^n 1^m | m > n} is not regular.
Problem 4: Do Exercise 6.8.5 (both in Editions 2 and 3).
Problem 5: Shaker sort compares each adjacent pair of items in a list in turn, swapping them if necessary, and alternately passes through the list from the beginning to the end and then from the end to the beginning. It stops when a pass does no swaps. Using an incompressibility argument, prove a tight average-case lower bound for shaker sort (a Python sketch of the algorithm appears after Problem 7).
Problem 6: Do Exercise 6.3.1 (both in Editions 2 and 3), using Kolmogorov complexity.
Problem 7: Do Exercise 6.3.2 (both in Editions 2 and 3), using Kolmogorov complexity.
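Since the description of shaker sort in Problem 5 is brief, here is a minimal Python sketch of the algorithm (also known as cocktail sort); the swap count it returns is the quantity whose average the incompressibility argument is meant to bound:

  import random

  def shaker_sort(a: list) -> int:
      # Alternate forward and backward passes of adjacent-swap
      # comparisons; stop when a full pass makes no swap.
      swaps = 0
      lo, hi = 0, len(a) - 1
      swapped = True
      while swapped:
          swapped = False
          for i in range(lo, hi):  # forward pass, bubbles the max up to hi
              if a[i] > a[i + 1]:
                  a[i], a[i + 1] = a[i + 1], a[i]
                  swaps += 1
                  swapped = True
          hi -= 1
          if not swapped:
              break
          swapped = False
          for i in range(hi, lo, -1):  # backward pass, bubbles the min down to lo
              if a[i - 1] > a[i]:
                  a[i - 1], a[i] = a[i], a[i - 1]
                  swaps += 1
                  swapped = True
          lo += 1
      return swaps

  # Empirical average swap count over random permutations, to compare
  # against the bound you prove.
  n, trials = 100, 200
  avg = sum(shaker_sort(random.sample(range(n), n)) for _ in range(trials)) / trials
  print(avg)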
Mid-term exam, week 12 (19-03) (Ch. 1, especially 1.1, 1.5.1, 1.5.3, 1.9, 1.11;
Ch. 2: 2.1, 2.2, 2.3, 2.6 (Thms), 2.7; Ch. 6: 6.1, 6.3, 6.6, 6.8, 6.9; and 7.5;
plus the contents of ppt lectures 1 through 5.1, but the questions will primarily be about the mentioned sections).
Time: 16:00--19:00; Place: Room P016 Euclides.
Lecture 7, week 15 (09-04): Inductive Inference (Chapter 5)
Lecture 7 notes, ppt file or
Lecture 7 notes, pdf file.
Homework: (Edition 2) Prove the assertion of Example 5.2.4, and
do Exercises 5.3.1, 5.3.2, 5.3.5, 5.3.6.
Homework: (Edition 3) Prove the assertion of Example 5.2.6, and
do Exercises 5.2.10, 5.2.11, 5.2.12.a, 5.2.12.b (the first b).
Lecture 9, week 17 (23-04): Guest lecture by Peter Grunwald: MDL in statistical inference.
(Selection from
Chapter 5 of L\&V plus
Handout, pdf file.)
Chapter 1 has to be studied up to and including 1.4; 1.5, 1.6, and 1.7
are useful but not obligatory. Especially look at the boxed text in
1.5, 1.6, and 1.7. Chapter 2 has to be studied up to and including 2.6.1.
Homework in pdf. This is also for Lecture 10.
Week 19 (07-05):
No lecture.
Lecture 10, week 20 (14-05): Guest lecture by Peter Grunwald: Applications of MDL learning.
For the handout and homework see Lecture 9.
Final exam: week 22 (28-05), 14:00-17:00, Euclides room P0.16.
(Course 1 on page xii (ed. 2) plus new material, which equals Course 1 on page xv (ed. 3)
of the textbook Li-Vitanyi, plus material from
the ppt lectures, and also the MDL handout in pdf.
Primarily the material not covered on the mid-term,
although plain complexity should be known.)
Possible project topics:
Energy-time trade-off for information processing. This is a theoretical,
research-oriented project related to Lecture 4.
Alternative normalization of E(x,y) in Lecture 4.
Experimental project: compare information distance against
traditional distances for internet search, as discussed in the Lecture 5 notes.
Information distance applied to protein-protein relationships. This topic
was proposed by Brona Brejova and is probably suitable for
a bioinformatics student.
History and comparative study of information distance metrics.
(Talk to me, I will provide some papers.)
Simplify and improve average-case analysis of
Quicksort using Kolmogorov complexity (I will provide the papers).
Improve the Shellsort average-case lower bound (Open). This is very
difficult.
Incompressibility method: find a problem in your own research and give an
average-case or lower-bound analysis of it.
Please do not hesitate to discuss with me both finding and solving
your project problems. Please let me know what you wish to
work on. We generally prefer different people to work
on different projects, unless a project has a research component.
Announcements:
No announcements.
Maintained by Paul Vitanyi. These lecture notes,
PowerPoint presentations, and selection of papers
were developed in cooperation with Ming Li,
David R. Cheriton School of Computer Science,
University of Waterloo, Waterloo, Ontario N2L 3G1.
Email: mli at uwaterloo dot ca.