A standard PC file-compression program can tell the difference between
classical music, jazz and rock, all without playing a single note. This new-found
ability could help scholars identify the composers of music that until now
has remained anonymous.
The technique exploits the ability of off-the-shelf "zip" data-compression
software to do more than just squeeze PC files into manageable sizes. For
instance, various zip programs have already been used to detect the language
a piece of text is written in (New Scientist print edition, 15 December 2001).
To do this, you first take several long text files, each in a known language,
and compress them, noting the file size of each. You then append the unknown
file to each of the uncompressed, known files in turn, and compress them
again, noting the difference that adding the unknown file makes in each case.
The smaller the difference, the more likely the languages are to be the
same. That is because the zip program looks for duplicated sequences in the
text to shrink it without losing information.
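The language-guessing procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard-library zlib module as a stand-in for a "zip" program, and the function names (compressed_size, added_cost, guess_language) are invented here, not taken from any published code.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Size in bytes of the data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def added_cost(known: bytes, unknown: bytes) -> int:
    """Extra compressed bytes caused by appending the unknown text to a known one."""
    return compressed_size(known + unknown) - compressed_size(known)


def guess_language(references: Mapping[str, bytes], unknown: bytes) -> str:
    """Return the label of the reference text whose compressed size grows the least."""
    return min(references, key=lambda name: added_cost(references[name], unknown))
```

In practice the reference texts would need to be long enough to be representative of each language. Note also that zlib only looks back about 32 kilobytes for repeated sequences, so very long references add little beyond that window.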
Rudi Cilibrasi, Paul Vitányi and Ronald de Wolf of the Dutch National
Research Institute in Amsterdam wondered if such compression could also help
distinguish between musical genres. So they tried it out on digital files
of various pieces, including some from Beethoven, Miles Davis and Jimi Hendrix.
Rhythm and melody
They subtracted any data unrelated to the actual music, such as digital
ID tags, to create a data string representing only the rhythm and melody
of the tune. Using a program called Bzip2, they followed a procedure similar
to the one used for the text files, measuring how similar each piece was to every other.
Then they plotted the results in a way that produces a tree-shaped pattern,
in which similar pieces cluster together on the same branch.
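A rough sketch of the pairwise comparison step, using Python's standard-library bz2 module, might look like the following. The article does not give the exact formula the researchers used; the normalisation below (often called normalised compression distance) is an assumption, as are all the function names. Building and drawing the tree itself is left out, and the sketch only reports distances and the closest match for a given piece.

```python
import bz2
from itertools import permutations
from typing import Dict, Mapping, Tuple


def csize(data: bytes) -> int:
    """Size in bytes of the data after bzip2 compression."""
    return len(bz2.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Assumed similarity measure: small when y adds little once x has been seen."""
    cx, cy = csize(x), csize(y)
    return (csize(x + y) - min(cx, cy)) / max(cx, cy)


def distance_matrix(pieces: Mapping[str, bytes]) -> Dict[Tuple[str, str], float]:
    """Distance for every ordered pair of distinct pieces, keyed by their names."""
    return {(a, b): ncd(pieces[a], pieces[b]) for a, b in permutations(pieces, 2)}


def nearest_neighbour(pieces: Mapping[str, bytes], name: str) -> str:
    """Name of the piece that is closest to the named one."""
    others = (other for other in pieces if other != name)
    return min(others, key=lambda other: ncd(pieces[name], pieces[other]))
```

The input here is assumed to be the stripped-down byte string for each piece, with ID tags and other non-musical data already removed. A tree-drawing step would take the resulting distances as its input.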
In a test with 12 each of jazz, classical and rock pieces, the results
were fairly good. Ten of the jazz, nine of the rock and most of the classical
pieces ended up in three distinct branches of the tree.
When applied to 32 classical pieces, the technique clustered each composer
on a separate branch. Vitányi thinks the trick could help identify a plausible
composer for works of unknown origin, as long as the candidate has written several
known works for comparison. It could also help online music stores, for example
by classifying music files.
The technique's elegance lies in the fact that it is tone deaf. Rather
than looking for features such as common rhythms or harmonies, says Vitányi,
"it simply compresses the files obliviously."
"I would love a technique that can work out who wrote something just by
putting the notes on a page into a computer," says Jeremy Summerly of the
Royal Academy of Music in London, who tries to identify the composers of
unattributed fragments of 16th-century musical scores. The technique is promising,
he says, because it detects features of a piece that the composer does not
consciously think about, but which are actually their hallmark.
Summerly hopes to see what the technique makes of the second half of Mozart's
Requiem, completed by Franz Süssmayr after Mozart's death. The way it clusters
among other works by Mozart and Süssmayr might reveal how much original work