ISDL2001.html

First DELOS International Summer School on Digital Library Technologies
ISDL 2001
9-13 July, Pisa, Italy

http://www.iei.pi.cnr.it/DELOS/delos2/SummerSchool/1stschool.htm

Lecturers:
----------
Carl Lagoze (Cornell University, USA)
Robert Wilensky (University of California, Berkeley, USA)
William Y. Arms (Cornell University, USA)
Norbert Fuhr (University of Dortmund, Germany)
Andreas Paepcke (Stanford University, USA)
Brewster Kahle (Internet Archive, USA)
Carol Peters (CNR-IEI, Italy)
Fabrizio Sebastiani (CNR-IEI, Italy)
Howard Wactlar (Carnegie Mellon University, USA)
Elizabeth Lyon (UKOLN, UK)

* Introduction to Mixed Media Digital Libraries *
Carl Lagoze

Nice talk giving a good overview of issues in Digital Libraries
(DL).

- Digital Object Architecture
Defines a DL as a collection of bits; the challenge is to make them
accessible. One way of doing this is a Digital Object Model (FEDORA),
which defines different views on digital data. These views are
realized by so-called disseminators, which transform and filter the
digital content into the desired form. For example, the 'same' digital data can be
viewed as a book, a photo collection or a Dublin Core entity.

- Access management
managing access policies on digital data

- Meta data Frameworks
Meta data is always biased, so provide meta data from different
perspectives.
Find a balance between costs and functionality.
RDF is an instantiation of the Warwick Framework.
SMIL is meta data for complex multimedia objects.
see:

http://sunspot.dstc.edu.au:8888/smil/search.html

http://metadata.net/harmony/Publications.htm

- Exchanging structured data
protocols and standardization
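
To make the disseminator idea from the Digital Object Architecture item
a bit more concrete, here is a minimal sketch in Python. It is not the
FEDORA API; the class, the field names and the three view functions are
all made up for illustration. It only shows one stored object being
handed out in three different forms.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DigitalObject:
    """Raw content plus named disseminators (hypothetical model, not FEDORA's API)."""
    identifier: str
    pages: List[str]            # e.g. file names of scanned pages (assumed)
    metadata: Dict[str, str]
    disseminators: Dict[str, Callable[["DigitalObject"], object]] = field(
        default_factory=dict
    )

    def disseminate(self, view: str) -> object:
        # Look up the requested view and let it transform/filter the content.
        return self.disseminators[view](self)


def as_book(obj: DigitalObject) -> dict:
    # Book view: a title plus the pages in reading order.
    return {"title": obj.metadata.get("title", ""), "pages": list(obj.pages)}


def as_photo_collection(obj: DigitalObject) -> list:
    # Photo collection view: every page is an independent captioned image.
    return [
        {"caption": f"page {i}", "image": page}
        for i, page in enumerate(obj.pages, start=1)
    ]


def as_dublin_core(obj: DigitalObject) -> dict:
    # Dublin Core view: keep only a few descriptive elements.
    wanted = ("title", "creator", "date")
    return {f"dc:{key}": obj.metadata[key] for key in wanted if key in obj.metadata}


obj = DigitalObject(
    identifier="demo:1",
    pages=["p1.tiff", "p2.tiff"],
    metadata={"title": "Demo", "creator": "Anon", "shelf": "B12"},
    disseminators={
        "book": as_book,
        "photos": as_photo_collection,
        "dublin_core": as_dublin_core,
    },
)

for view in ("book", "photos", "dublin_core"):
    print(view, "->", obj.disseminate(view))
```

The point is that the stored bits never change; only the disseminator
that is asked for decides what the user gets to see.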

* Digital Libraries: New Models for scholarly Dissemination *
Robert Wilensky

This talk was more project focused and had less introductory
material. He talked (at a high level) mainly about features of a browser
system they implemented, which can be extended by third parties
so that 'good' ideas have a chance of actually being used. Examples
of these ideas: dots along the scroll bar for every
matching keyword, and links that carry the first five words of the
document they point to. If the link is broken, the linked page can
still be retrieved (with high probability) by searching for those five
words in a search engine. The browser also had extensive
annotation features. The underlying document model of the browser is
called multivalent documents; any document can be expressed as a
multivalent document. We asked him about the multivalent model and he
was very surprised we didn't know about it.
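
A rough sketch of the link idea as I understood it, in Python. The query
parameter name 'sig' and the helper functions are my own invention, and
the real system may well choose its words differently; this just takes
the first five words as described above.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def signature(text, n=5):
    """Return the first n words of the target document's text."""
    return re.findall(r"\w+", text)[:n]


def robust_link(url, target_text, n=5):
    """Append the signature to the URL as a query parameter ('sig' is made up)."""
    parts = urlsplit(url)
    extra = urlencode({"sig": " ".join(signature(target_text, n))})
    query = parts.query + "&" + extra if parts.query else extra
    return urlunsplit(parts._replace(query=query))


def fallback_query(link):
    """Recover the words so they can be fed to a search engine if the link is dead."""
    return parse_qs(urlsplit(link).query).get("sig", [""])[0]


link = robust_link(
    "http://example.org/paper.html",
    "Multivalent documents are a new model for digital documents",
)
print(link)
print(fallback_query(link))
```

A browser would first try the plain URL and only fall back to the
search-engine query when the page turns out to be gone.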

see: http://http.cs.berkeley.edu/~phelps/Multivalent/

see: http://elib.cs.berkeley.edu/

The second part of the talk was about an image retrieval system called Blobworld.
see: http://elib.cs.berkeley.edu/photos/blobworld/
The offline(!) demo was very impressive. Too impressive, I would say;
I didn't really believe it.

* Digital Library Architectures and Open Access to Digital Libraries *
William Y. Arms

This talk was about library topics such as the economics of open access
publishing, automatic indexing, quality control and long-term
preservation. I did not find it relevant to MM. The last part, though, was a
case study about semi-automatic generation of presentations, a task that had
mainly been done by designers. The system is able to differentiate between
multiple expertise levels.

see: http://www.siteforscience.org

* DL and IR: IR models and methods, metadata and evaluation *
Norbert Fuhr

Technical talk, interesting though. I found lots of similarities between
AI and IR techniques, both for knowledge representation and for
clustering/retrieval techniques. I am not sure to what extent
they are into AI research. A question came up about the purpose of RDF Schema when
we already have XML Schema, which Fuhr could not answer satisfactorily
because of a lack of knowledge about the XML Schema and RDF Schema
standards.

* Online Information Access from Hand held Devices *
Andreas Paepcke

Nice presentation from an educational perspective. He let us discover
problems in GUI design (for small devices) ourselves by giving
small groups the assignment to design a 'travel dictionary'. He gave some
design rules to get us started, which are common knowledge among
GUI people, I think. For example, the conceptual model of a GUI should
resemble real-life 'devices' as much as possible to spare the user extra
learning effort. A remarkable rule that differentiates
PDA GUIs from PC GUIs: a PC GUI provides a lot of context
for a certain action (frames, main and sub-windows), so the user expects only
small changes in the GUI. In a PDA GUI you cannot provide such context, so an
action on a PDA should change the screen completely to prevent
the user from thinking there is a context.
Suggested book: "GUI Bloopers" by Jeff Johnson

* Public Access to Digital Materials *
Brewster Kahle

This talk was mainly about making large libraries available to the
public. He claimed that collecting data is very cheap (compared to books
etc.) and that we should just do it without worrying about rights,
responsibilities etc., because if we do worry, the data will be gone
forever. Kahle is director of the Internet Archive, which preserves the
'complete' web from time to time and makes it publicly available.

* Cross Language Information Retrieval *
Carol Peters

The talk gave an overview of issues in cross-language IR. She pointed out that the
web is becoming more multilingual and that cross-language IR is important
for exploiting it fully. Remarkably, 'the grand challenge' of
cross-language IR is about the same as our 'grand challenge' (in MM
generation), with a little more emphasis on language though.
She showed some techniques and approaches for dealing with cross-language IR
and automatic translation of documents.

* Text Categorization and Information Filtering *
Fabrizio Sebastiani

This was basically a very short overview of some of the courses I had during my
AI studies. The main topic was machine learning and how we could apply it to
text categorization/classification tasks. Techniques: probabilistic methods,
neural nets, decision trees, nearest neighbor, vector space
models. He compared the advantages and disadvantages of these
techniques. Interesting talk, but I didn't learn much new.
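
As a reminder of how two of the listed techniques fit together, here is a
small Python sketch of a nearest-neighbor classifier on top of a vector
space model (bag of words with cosine similarity). It is not taken from
the talk; the training sentences, labels and function names are toy
assumptions.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: term -> raw frequency."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_classify(text, training, k=3):
    """Label a text by majority vote among its k most similar training texts."""
    query = vectorize(text)
    ranked = sorted(
        training,
        key=lambda example: cosine(query, vectorize(example[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set (made up for this sketch).
training = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("shares rally as market opens", "finance"),
    ("team wins football match", "sport"),
    ("striker scores in cup final", "sport"),
    ("coach praises team after match", "sport"),
]

print(knn_classify("market shares rise", training))
print(knn_classify("team match final", training))
```

Nearest neighbor needs no training step but compares every new text with
all stored examples, which is one of the trade-offs that came up in the
comparison.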

* Video DL's *
Howard Wactlar

Presented a system which automatically indexes news broadcast
videos. Speech is converted to text and used to retrieve
media clips. The system also recognizes text within the picture (captions)
and stores it with the video; this is mainly useful for finding
persons. The system detects scenes, zooming etc. and is able to
recognize faces and 'similar' images. It had some nice features, like
searching for a topic in a certain timespan, which resulted in a kind of
documentary about that subject. I found their system quite impressive,
but it is very focused and tailored to news broadcasts (only CNN). I
wonder what will happen if a non-news video library is used. Is
it still useful for people other than journalists?
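
The timespan search can be pictured roughly as follows. This is only my
own Python sketch of the idea, not how their system works internally; the
Clip record and its fields are assumptions, and the speech and caption
recognition are taken as already done.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Clip:
    broadcast: date   # broadcast date of the clip
    transcript: str   # text produced by speech recognition (assumed given)
    captions: str     # text recognized within the picture (assumed given)


def search(clips, term, start, end):
    """Return clips mentioning term between start and end, in broadcast order."""
    term = term.lower()
    hits = [
        clip
        for clip in clips
        if start <= clip.broadcast <= end
        and (term in clip.transcript.lower() or term in clip.captions.lower())
    ]
    return sorted(hits, key=lambda clip: clip.broadcast)


# Toy data, made up for this sketch.
clips = [
    Clip(date(2001, 7, 3), "talks on the election continue", "Rome"),
    Clip(date(2001, 6, 1), "weather report for the weekend", ""),
    Clip(date(2001, 7, 1), "first results are in", "ELECTION NIGHT"),
    Clip(date(2001, 8, 9), "election aftermath", ""),
]

for clip in search(clips, "election", date(2001, 7, 1), date(2001, 7, 31)):
    print(clip.broadcast, clip.transcript)
```

Playing the hits back in broadcast order is what gives the
'documentary' effect.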

* Libraries in the digital world *
Liz Lyon

The talk consisted of two parts. The first part was mainly a management talk
about network resources for schools in the UK. She talked a little about
combining resources, which sounded interesting, but did not discuss
the techniques used to actually make the combining work.

see: http://www.ukoln.ac.uk/distributed-systems/dner/arch/

The second part was about a system called PATRON, which helps students in music and
dance to study sheet music and dance scenes. She gave a demo which
looked/sounded nice, but again she didn't talk much about details. (I
think she adapted to the situation: last talk, nice weather etc.)
One remark: they used SMIL for synchronization but found out it had its
limitations, and I think they stopped using it. I don't know what they
used instead.

see: http://www.lib.surrey.ac.uk/Patron2/

Conclusion:

I think ISDL 2001 was very interesting. The intended audience was
quite broad, from librarians to computer scientists, and therefore most
of the talks were quite general and high-level. Nevertheless a
somewhat 'complete' picture of DLs was sketched, which turned out to
be a very broad field. The issues which came back most often were:
- Information Retrieval
- Meta-data/semantic web
- Policy management
- Quality assurance of digital publications

Besides the scientific content, the culinary content and the weather
content were absolutely great! For a detailed description see the 'INS2
Leisure Report' directory :)

http://www.darmstadt.gmd.de/~labbate/Pisa/

http://clientes.netvisao.pt/fbento/memorias/1st_ISDL_DELOS_Summer_School_Pisa_Italia_2001_July_07_14/

joost