1a) Project Title.

Intelligent Information Retrieval and Presentation
in Public Historical Multimedia Databases (I2RP)

1b) Acronym.

I2RP

1c) Principal Investigator.

Prof. dr. L.R.B. Schomaker, KI/RUG

2) Classification.

3) Composition of the Research Team.

Prof. dr. L.R.B. Schomaker, KI/RUG
Grote Kruisstraat 2/1
9712 TS Groningen

Prof. dr. H.J. van den Herik, IKAT/UM
Universiteit Maastricht
Department of Computer Science
P.O. Box 616
6200 MD Maastricht
43 38 83485

Prof. dr. G.A.M. Kempen, Theor. Psych./RUL
P.O. Box 9555
2300 RB Leiden

Dr. N. Taatgen, KI/RUG
Grote Kruisstraat 2/1
9712 TS Groningen

Ms. Dr. H.L. Hardman, CWI
Kruislaan 413
P.O. Box 94079
1090 GB Amsterdam
020 592 9333

Dr. J.R. van Ossenbruggen, CWI
Kruislaan 413
P.O. Box 94079
1090 GB Amsterdam
020 592 4141

Additionally, funding for six researchers is requested in total, distributed over the four institutes which are involved: KI/RUG (1), Psy/RUL (2), CWI (2), IKAT/UM (1). A more detailed overview is presented in the table in section 5 of this proposal.

4) Description of the Proposed Research.


Current advances in information technology have led to systems that are very advanced at the lower levels of processing. Although powerful functionality is present 'under the hood', the paradox is that more effort is expected from the user, who must supply a growing amount of input to parametrize the desired advanced functionality. Furthermore, many new information processing and rendering functions exist that were not available in the world of paper-based information processing and were unimaginable during the early stages of computer development. For these new functionalities, many new interaction approaches have been introduced. As a consequence, current computer software requires a level of user competence that is beginning to limit the effectiveness of information and communication technology.

A case in point concerns those multimedia applications which give the regular computer user access to historical information in large databases. The ARIA and Adlib databases, developed by the Rijksmuseum in Amsterdam, contain thousands of images and several hundred thousand textual database records concerning paintings and works of art. The problems in searching, accessing and utilizing the available multimedia information are huge. As an example, one cannot require the general user to specify his/her database-search query in a formal language such as SQL. Nor is it likely that a single design solution for a WWW page in HTML (e.g., a form) will suit all possible types of access to such a database. Given the presence of advanced database software and pattern-recognition tools, the challenge will be to translate the available technical functionality into a form which is convenient for the end user. Consequently, there are a number of research questions:

The proposed research will build on current international developments in Web-based agent technology and ontologies in the context of the "Semantic Web" (e.g., activities stimulated by the European 5th and 6th Framework Programmes, DARPA/DAML and W3C). While these developments focus on "under the hood" technologies, our research will focus on making these technologies available to the average user.

Within this general framework, a number of research perspectives can be identified. For each of these perspectives, subtasks are defined within the project at large.

Name        Problem Area                        Task Title                                                 Institutes
Optima      User-Input Support, User Modeling   A User-Agent for Object-based Image Search                 KI/RUG
Spreekbuis  Language Output                     Performance Grammar Workbench: a Dutch sentence generator  RUL
Cuypers     Presentation generation             Automatic user-centric hypermedia generation               CWI
GO          Knowledge visualization             Graphical Ontologies                                       IKAT/UM

An essential aspect of the proposed research is its focus on working systems. Current and potential user groups will be invited to annual workshops in which the results are demonstrated. Although the goal of system implementation is usually in conflict with the goal of scientific publication, our proposed approach is supported and safeguarded by the financial and organisational matching resources provided by the participating institutes.

Optima: Optimal Personalized Interface by Man-Imitating Agents

Current developments in software, the internet and consumer electronics are characterized by increases in functionality, but also by increases in the complexity of the user interface. In general, developers try to achieve a design that optimally fits the preferences and capabilities of the average user. The problem with this approach is that it is often impossible to define an average user, and that such a definition is sometimes useless anyway. An example is the situation in which an individual user uses only part of the functionality of an application intensively, and the rest not at all. Electronic encyclopediae, web portals and other online information sources all fit into this category. Users vary widely in their information needs, and also in the way they most comfortably search for information.

A better solution is to make the user interface adaptive, such that the interface becomes optimized for an individual user. More specifically, the interface should adapt itself to support the strategies, knowledge level and proficiencies of a particular user. The general goal of the project Optima is to design a methodology for building adaptive user interfaces, based on the metaphor of the intelligent agent. The basis for the agent will be the ACT-R architecture, which is both a theoretical model of human information processing and a simulation environment for human cognition. The agent acts as if it looks over the shoulder of the user, so that it goes through the same learning process. The agent acquires information based on the behavior of the user: the choices that are made, reaction times, errors, etc. The result of the learning process is an agent that exhibits characteristics of the user. The interface can use information from the agent to adapt itself to the individual user. An interesting secondary component of this research is that the individual models can be reused in a cluster analysis, to detect general tendencies in the population. This will show which parts of possible user knowledge are general, and which parts are individual. In the project proposed here, the Optima agent methodology will be developed in the context of a system to search images in large databases.
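
As a purely illustrative sketch of how an agent might track an individual user's behavior, the fragment below applies the base-level learning equation from ACT-R, B = ln(sum_j t_j^(-d)), to logged uses of interface functions. It is not the proposed Optima agent and does not use the ACT-R software: the class UsageModel, the function names and the use of activation to rank interface functions are assumptions made only for this example.

```python
import math


class UsageModel:
    """Hypothetical per-user model: remembers when each function was used."""

    def __init__(self, decay=0.5):
        # d = 0.5 is the commonly used decay value in ACT-R models.
        self.decay = decay
        self.uses = {}  # function name -> list of use times in seconds

    def observe(self, function, time):
        """Record that the user invoked `function` at `time`."""
        self.uses.setdefault(function, []).append(time)

    def activation(self, function, now):
        """Base-level activation: ln of the summed, decayed use traces."""
        ages = [now - t for t in self.uses.get(function, []) if t < now]
        if not ages:
            return float("-inf")  # never used before `now`
        return math.log(sum(age ** -self.decay for age in ages))

    def ranked(self, now):
        """Functions ordered from most to least active for this user."""
        return sorted(self.uses,
                      key=lambda f: self.activation(f, now),
                      reverse=True)


model = UsageModel()
model.observe("search_by_artist", 0)
model.observe("search_by_artist", 50)
model.observe("search_by_period", 90)
print(model.ranked(now=100))   # shortly after the last use, recency dominates
print(model.ranked(now=1000))  # much later, frequency of use dominates
```

An adaptive interface could, under these assumptions, offer the highest-ranked functions most prominently; the per-user activation values could also serve as input features for the cluster analysis mentioned above.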

For the research within the Optima framework and the realization of the actual user agent, KI/RUG will cooperate with CWI. The CWI contribution focuses on the aspects of adaptive information rendering (see subtask Cuypers).

Spreekbuis: Automatic Language and Speech Generation in Dutch

In the Dutch language area, little research is carried out on automatic language production. This contrasts sharply with the intensive research into the parsing, understanding and recognition of written and spoken Dutch by computer. This asymmetry may hamper the further development of full-fledged dialogue systems, i.e., information services that can be consulted orally or in writing in ordinary language and that also present their information in written or spoken language. Modern examples are Internet search engines that can interpret and answer questions in natural language (at least to a certain extent, and usually only in English), and voice services for mobile telephony (although these can still do little more than play back canned texts). The Spreekbuis project aims to restore the balance, so that the computer will eventually speak and write Dutch as well as it understands written and spoken Dutch.

From the broad range of possible research themes, we have made a selection aimed at maximal portability of the Dutch language generator to be developed. That is, the modules to be developed must be usable in as wide a range of applications as possible. This means that we will work on a portable system that can produce a large variety of spoken and written Dutch sentences and sentence constructions, starting from a logical-semantic specification of the sentence content and context. This system will comprise the following software modules:

Cuypers: A User-Centred Hypermedia Presentation Generator

The work of CWI will focus on the presentation aspects of personalized, media-centric hypermedia-interfaces.

The Cuypers proof-of-concept prototype, constructed in the first phase of ToKeN2000, currently focuses on the adaptation of hypermedia presentations to various end-user devices, for example a desktop computer, a hand-held device or a mobile phone. This device-driven prototype was developed to validate our constraint-driven approach to hypermedia presentation generation.

In the following phase of ToKeN2000, the device-driven approach will be integrated with a more user-centric approach, based on explicit user profile information. In order to adapt hypermedia presentations to an individual user's task and preferences, adequate user models need to be developed.

To be able to convey the results of a multimedia database query to a user effectively, the individual multimedia objects need to be related by placing them in the context of a unified hypermedia presentation. This process of enriching the database content requires a number of steps. First, research is needed into appropriate rhetorical and narrative structures to guide the overall flow of the presentation. Second, research is needed into the process of mapping the rhetorical and narrative structures onto hypermedia presentation patterns. This process is driven by high-level hypermedia design rules which also have to be developed. Finally, research is needed into the realization of these hypermedia patterns in terms of a concrete hypermedia presentation format driven by lower-level design rules and qualitative and quantitative presentation constraint processing.
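
The last step, realizing a presentation pattern under quantitative constraints, can be illustrated with a small sketch. It is not the Cuypers engine: the names MediaItem and choose_layout, the three layout labels and the pixel dimensions are hypothetical, and the rule "take the first layout whose size constraints are satisfied" is a deliberately simplified stand-in for real constraint processing.

```python
from dataclasses import dataclass


@dataclass
class MediaItem:
    name: str
    width: int   # pixels
    height: int  # pixels


def choose_layout(items, screen_width, screen_height):
    """Return the first layout whose size constraints hold on the device.

    Preference order (an assumption of this sketch): show everything next
    to each other, else stack vertically, else fall back to showing the
    items one after another in time.
    """
    if not items:
        raise ValueError("at least one media item is required")
    widths = [item.width for item in items]
    heights = [item.height for item in items]
    if sum(widths) <= screen_width and max(heights) <= screen_height:
        return "side_by_side"
    if max(widths) <= screen_width and sum(heights) <= screen_height:
        return "stacked"
    return "slideshow"


result = [MediaItem("painting", 400, 300), MediaItem("caption", 300, 100)]
print(choose_layout(result, 1024, 768))  # desktop-sized screen
print(choose_layout(result, 480, 640))   # hand-held-sized screen
print(choose_layout(result, 176, 208))   # small phone-sized screen
```

In a user-centric setting, the preference order itself could depend on the user profile and on the rhetorical relation between the items (e.g., a comparison favours a side-by-side pattern), which is where the higher-level design rules would come in.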

Collaboration with the Rijksmuseum will continue, with the aim of providing added value by generating adaptive user-centric hypermedia presentations as a personalized interface to both the museum's internal database (Adlib, intended for museum experts) and its external database (ARIA, intended for the general public).

To benefit directly from the state of the art in the relevant Web technology, the proposed research will capitalize on CWI's close links with W3C. Research aspects focusing on agents for personalized adaptation will be carried out in cooperation with KI/RUG in the context of the Optima project, while the cooperation with IKAT/UM will stress the role of ontologies in the agent-driven user interaction that characterizes both the Cuypers and the GO subtasks.

GO: Graphical Ontologies

The subtask GO will be performed by IKAT/UM.

In the previous phase of ToKeN2000, a metabrowser for information retrieval (IR) was developed, with a special focus on the presearch (i.e., the phase where the user has not yet started searching for documents but is searching for the relevant concepts). The idea is to present the user with a partial view of the thesaurus, which changes depending on filters chosen by the user. Currently, there is much interest in ontologies for use in Internet interoperability. Especially in digital libraries, ontologies are key when it comes to searching heterogeneous information databases. To enable the building, maintenance, and use of ontologies, various formalisms and tools are available. The proposed project aims at a generic tool for searching, accessing, and editing ontologies. It will be generic in the sense that it is independent of the representation formalism used. Starting points will include the domain ontology for the annotations of the Rijksmuseum's ARIA database and an ontology for describing user profiles.
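
The idea of a filter-dependent partial view can be sketched as follows. This is not the existing metabrowser: the miniature thesaurus, the facet labels and the function partial_view are invented for illustration, and the rule "show every term in a selected facet together with its broader terms" is only one plausible filtering policy.

```python
# Hypothetical miniature thesaurus: each term has one broader term and a facet.
THESAURUS = {
    "artwork":   {"broader": None,       "facet": "object"},
    "painting":  {"broader": "artwork",  "facet": "object"},
    "portrait":  {"broader": "painting", "facet": "genre"},
    "landscape": {"broader": "painting", "facet": "genre"},
    "etching":   {"broader": "artwork",  "facet": "technique"},
}


def partial_view(thesaurus, facets):
    """Return the terms visible when the user selects the given facets.

    A term is shown if its facet is selected; its chain of broader terms
    is shown as well, so that every visible term keeps its path to the root.
    """
    visible = set()
    for term, entry in thesaurus.items():
        if entry["facet"] in facets:
            current = term
            while current is not None and current not in visible:
                visible.add(current)
                current = thesaurus[current]["broader"]
    return sorted(visible)


print(partial_view(THESAURUS, {"genre"}))
print(partial_view(THESAURUS, {"technique"}))
```

Because the function only relies on a "broader" link and a facet label per term, the same view logic could in principle sit on top of different representation formalisms, which is the sense of "generic" intended above.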

5) Requested Budget

Position  Inst.    Backgr.  Name        Task Title  ftu, yrs  Amount    Supervisor
postdoc   KI/RUG   cog      -           Optima      1.0, 2    250 kfl   Taatgen
postdoc   RUL      cog      -           Spreekbuis  X, 2      250 kfl   Kempen
OIO       RUL      cog      -           Spreekbuis  X, 4      250 kfl   Kempen
postdoc   CWI      inf      -           Cuypers     X, 2      250 kfl   v. Ossenbruggen
OIO       CWI      inf      -           Cuypers     X, 4      250 kfl   v. Ossenbruggen
postdoc   IKAT/UM  inf      F. Wiesman  GO          1.0, 2    250 kfl   v.d. Herik
Total                                                         1500 kfl


6) Literature

