First DELOS International Summer School on Digital Library Technologies
ISDL 2001
9-13 July, Pisa, Italy
http://www.iei.pi.cnr.it/DELOS/delos2/SummerSchool/1stschool.htm

Lecturers:
----------
Carl Lagoze (Cornell University, USA)
Robert Wilensky (University of California, Berkeley, USA)
William Y. Arms (Cornell University, USA)
Norbert Fuhr (University of Dortmund, Germany)
Andreas Paepcke (Stanford University, USA)
Brewster Kahle (Internet Archive, USA)
Carol Peters (CNR-IEI, Italy)
Fabrizio Sebastiani (CNR-IEI, Italy)
Howard Wactlar (Carnegie Mellon University, USA)
Elizabeth Lyon (UKOLN, UK)

* Introduction to Mixed Media Digital Libraries *
Carl Lagoze

Nice talk giving a good overview of issues in Digital Libraries (DL).

- Digital Object Architecture
  Defines a DL as a collection of bits; the challenge is to make them
  accessible. One way of doing this is a Digital Object Model (FEDORA),
  which defines different views on digital data. These views are
  realized by so-called disseminators, which transform and filter the
  digital content into the desired form. For example, the 'same'
  digital data can be viewed as a book, a photo collection or a Dublin
  Core entity.

- Access management
  Managing access policies on digital data.

- Metadata frameworks
  Metadata is always biased, so provide metadata from different
  perspectives.
  Find a balance between costs and functionality.
  RDF is an instantiation of the Warwick Framework.
  SMIL is metadata for complex multimedia objects.
  see: http://sunspot.dstc.edu.au:8888/smil/search.html
       http://metadata.net/harmony/Publications.htm

- Exchanging structured data
  Protocols and standardization.

* Digital Libraries: New Models for scholarly Dissemination *
Robert Wilensky

The talk was more project-focused and less introductory. He talked (at
a high level) mainly about features of a browser system they
implemented, which can be extended by third parties so that 'good'
ideas have a chance of actually being used.
Examples of these ideas were dots along the scroll bar for every
matching keyword, and links that carry the first five words of the
linked document. If the link is broken, the linked page can still be
retrieved (with high probability) by searching for those five words in
a search engine. The browser also had extensive annotation features.
The underlying document model of the browser is called multivalent
documents. Any document can be expressed as a multivalent document. We
asked him about the multivalent model and he was very surprised we
didn't know about it.
see: http://http.cs.berkeley.edu/~phelps/Multivalent/
see: http://elib.cs.berkeley.edu/

The second part of the talk was about an image retrieval system called
blobworld.
see: http://elib.cs.berkeley.edu/photos/blobworld/
The (offline!) demo was very impressive. Too impressive I would say; I
didn't really believe it.

* Digital Library Architectures and Open Access to Digital Libraries *
William Y. Arms

This talk was about library topics such as the economics of open access
publishing, automatic indexing, quality control and long-term
preservation. I did not find it relevant to MM. The last part, though,
was a case study about semi-automatic generation of presentations, a
task that had mainly been done by designers. The system is able to
differentiate between multiple expertise levels.
see: http://www.siteforscience.org

* DL and IR: IR models and methods, metadata and evaluation *
Norbert Fuhr

Technical talk, but interesting. I found lots of similarities between
AI and IR techniques, both for knowledge representation and for
clustering/retrieval. I am not sure to what extent they are into AI
research. A question came up about the purpose of RDF Schema when we
already have XML Schema, which Fuhr could not answer satisfactorily
because he was not familiar enough with the XML Schema and RDF Schema
standards.

* Online Information Access from Hand held Devices *
Andreas Paepcke

Nice presentation from an educational perspective.
He let us discover problems in GUI design (for small devices) ourselves
by giving small groups the assignment to design a 'travel dictionary'.
He gave some design rules to get us started, which are common knowledge
among GUI people, I think. For example, the conceptual model of a GUI
should resemble real-life 'devices' as much as possible, to spare the
user extra learning effort. A remarkable rule that differentiates a PDA
GUI from a PC GUI: a PC GUI provides a lot of context for a certain
action (frames, main and sub-windows), so the user expects only small
changes in the GUI. A PDA GUI cannot provide such context, so an action
on a PDA should change the screen completely, to prevent the user from
thinking there is a context.
suggested book: "GUI Bloopers" by Jeff Johnson

* Public Access to Digital Materials *
Brewster Kahle

This talk was mainly about making large libraries available to the
public. He claimed that collecting data is very cheap (compared to
books etc.) and that we should just do it without worrying about
rights, responsibilities etc., because if we do worry about them first,
the data will be gone forever. Kahle is director of the Internet
Archive, which takes a snapshot of the 'complete' web from time to time
and makes it publicly available.

* Cross Language Information Retrieval *
Carol Peters

The talk gave an overview of issues in cross-language IR. She pointed
out that the web is becoming more multilingual and that cross-language
IR is important for exploiting it fully. Remarkably, 'the grand
challenge' of cross-language IR is about the same as our 'grand
challenge' (in MM generation), with a little more emphasis on language.
She showed some techniques and approaches for dealing with
cross-language IR and automatic translation of documents.

* Text Categorization and Information Filtering *
Fabrizio Sebastiani

This was basically a very short overview of some of the courses I had
during my AI studies. The main topic was machine learning and how it
can be applied to text categorization/classification tasks.
Techniques: probabilistic, neural nets, decision trees, nearest
neighbor, vector space models. He compared the advantages and
disadvantages of these techniques. Interesting talk, but I didn't learn
much new.

* Video DL's *
Howard Wactlar

Presented a system which automatically indexes news-broadcast videos.
Speech is converted to text and used to retrieve media clips. The
system also recognizes text within the picture (captions) and stores it
with the video; this is mainly useful for finding persons. The system
detects scenes, zooming etc. and is able to recognize faces and
'similar' images. It had some nice features, like searching for a topic
within a certain timespan, which resulted in a kind of documentary
about that subject. I found the system kind of impressive, but it is
very focused and tailored to news broadcasts (only CNN). I wonder what
will happen if another, non-news video library is used. Is it still
useful for people other than journalists?

* Libraries in the digital world *
Liz Lyon

The talk consisted of two parts. The first part was mainly a management
talk about network resources for schools in the UK. She talked a little
about combining resources, which sounded interesting, but did not
discuss the techniques used to actually make the combining work.
see: http://www.ukoln.ac.uk/distributed-systems/dner/arch/

The second part was about a system called PATRON, which helps students
in music and dance to study sheet music and dance scenes. She gave a
demo which looked/sounded nice, but again she didn't talk much about
details. (I think she adapted to the situation: last talk, nice weather
etc.) One remark: they used SMIL for synchronization but found out it
had its limitations, and I think they stopped using it. I don't know
what they used instead.
see: http://www.lib.surrey.ac.uk/Patron2/

Conclusion:

I think ISDL 2001 was very interesting. The intended audience was quite
broad, from librarians to computer scientists, and therefore most of
the talks were quite general and high-level.
Nevertheless, a somewhat 'complete' picture of DL's was sketched, which
turned out to be a very broad field. The issues which came back most
often, however, were:

- Information Retrieval
- Metadata/semantic web
- Policy management
- Quality assurance of digital publications

Besides the scientific content, the culinary content and the weather
content were absolutely great! For a detailed description, see the
'INS2 Leisure Report' directory :)
http://www.darmstadt.gmd.de/~labbate/Pisa/
http://clientes.netvisao.pt/fbento/memorias/1st_ISDL_DELOS_Summer_School_Pisa_Italia_2001_July_07_14/

joost