Workshop Metadata or not?

Attended by Katharina Schwarz
19.05.2004, Auberge du Bonheur, Tilburg.

First talk: M. van Mackelenbergh, independent consultant - Metadata, the art of adding signposts

One of Marcel's interests is to find a scientifically tested method for creating good metadata. According to his research and experience, it all comes down to the following four characteristics of metadata:

aboutness
only useful information
only relevant information for target group
clear and precise

He observed that generally there are not all that many target groups, usually just one or two, so it is possible to identify precisely the characteristics and requirements of these target groups.

Comment:
I got the impression that his talk did not say anything particularly new. It is widely recognized that these issues are important, but putting them into practice is the hard part. I disagree with his view that one should create only useful and relevant metadata, because it is not possible to foresee which metadata will be needed, when, and by whom. I was more convinced by the approach of the next speaker.

Recommended reading:
Women, Fire, and Dangerous Things, by George Lakoff

Second talk: Prof. C. Koster, Universiteit Nijmegen - Contextual and contentual metadata

Koster gave a talk in which he defended both a pro and a con view on metadata. He first compared metadata problems with thesaurus problems. Typical problems he listed include changes in terminology, keeping metadata up to date, adding new types of metadata to old information objects, and human error.

A solution he presented was to distinguish contextual from contentual metadata. The former can and should be created right away, because it cannot be done at a later point in time. The latter, in contrast, can be generated automatically at any time, provided the full text of the content is stored as well, preferably in ASCII. He described the following techniques:

automatic document classification - look for similarity between new document and example documents
term extraction
full-text mining
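
To make the first technique concrete, here is a minimal sketch of classifying a new document by its similarity to example documents. It is not Koster's method: the bag-of-words representation, the cosine similarity measure, the function names and the two example categories are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Turn a text into a bag of lowercase word counts (assumed representation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify(new_text, examples):
    """Return the label of the example document most similar to new_text."""
    new_vec = term_counts(new_text)
    return max(
        examples,
        key=lambda label: cosine_similarity(new_vec, term_counts(examples[label])),
    )

# Hypothetical example documents, one per category.
EXAMPLES = {
    "library": "catalogue thesaurus archive library records metadata",
    "sport": "football match goal player team league",
}

if __name__ == "__main__":
    print(classify("the team scored a goal in the match", EXAMPLES))
```

A real system would need many example documents per category and some weighting of common words, but the sketch shows why this kind of metadata can be generated later: it needs nothing except the stored full text.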

This talk was very entertaining: Koster looks like Santa Claus and acts like Gandalf.

Discussion

After the break the group was split in two. One half prepared arguments supporting the statement that humans will always be needed when creating metadata; the other half defended the statement that all metadata creation will at some point be automated. I was in the former group. Our arguments were that

machines cannot extract meaningful metadata from information objects in formats other than text, such as images, audio files and executables,
documents carry an explicit and an implicit meaning, and only a human can find the implicit meaning (compare the political message in Orwell's Animal Farm), and
human intervention is definitely needed when merging different metadata sets, because only humans can see where meanings overlap.

We had a proper discussion between the two groups, with feelings running high and many dirty rhetorical attempts to shatter the opponents' composure and reasoning! All in good spirit, of course. The opponents' favourite argument was that everything we brought forward as proof that automation is impossible only shows that it is not possible YET, but will be in the future...

It was a successful and fun workshop, at which I met some members of the WGI for the first time and learned about their backgrounds and interests in information science. There were also members of the DARE community, and I got a feeling for the many angles from which people are interested in the field of information storage, usage and sharing.