1a) Project Title.

Intelligent Information Retrieval and Presentation
in Public Historical Multimedia Databases (I²RP)

1b) Acronym.

I2RP

1c) Principal Investigator.

Prof. dr. L.R.B. Schomaker, KI/RUG

2) Classification.

a) At which application area is the proposed research aimed?

Education & Culture

b) Which research cluster(s) does the proposed research concern?
(User, Data, or Knowledge Enrichment and Knowledge Transfer)

User
Knowledge Transfer

c) Which companies and organizations benefit directly or indirectly from the proposed research? (Also give the name of the relevant member of the User Committee)

Customer	Representative on the User Committee
Rijksmuseum Amsterdam	Schoemaker

Further users: GlobalArtVillage, The Hague (J. Pieters).

3) Composition of the Research Team.

[For this question, you are requested to list all persons who are or will be directly involved in carrying out the proposed research, including the requested personnel support (vacancy or vacancies). As far as known, give these persons' names with initials, titles, their specialism and university or institute, as well as their address details. You are requested to supply this information in table form. You are requested to elaborate, in the accompanying text, on the proposed interdisciplinary collaboration.]

Name	Institute	Address	Telephone	E-mail
Prof. dr. L.R.B. Schomaker	KI/RUG	Grote Kruisstraat 2/1, 9712 TS Groningen	050-3637908	schomaker@ai.rug.nl
Prof. Dr. H.J. van den Herik	IKAT/UM	Universiteit Maastricht, Department of Computer Science, P.O. Box 616, 6200 MD Maastricht	43 38 83485	herik@cs.unimaas.nl
Prof. dr. G.A.M. Kempen	Theor. Psych./RUL	Postbus 9555, 2300 RB Leiden	071-5273834	Kempen@fsw.LeidenUniv.nl
Dr. N. Taatgen	KI/RUG	Grote Kruisstraat 2/1, 9712 TS Groningen	050-3636435	niels@ai.rug.nl
Mw. Dr. H.L. Hardman	CWI	Kruislaan 413, PO Box 94079, 1090 GB Amsterdam	020 592 9333	Lynda.Hardman@cwi.nl
Dr. J.R. van Ossenbruggen	CWI	Kruislaan 413, PO Box 94079, 1090 GB Amsterdam	020 592 4141	Jacco.van.Ossenbruggen@cwi.nl

Additionally, funding for six researchers is requested in total, distributed over the four institutes which are involved: KI/RUG (1), Psy/RUL (2), CWI (2), IKAT/UM (1). A more detailed overview is presented in the table in section 5 of this proposal.

4) Description of the Proposed Research.

[

The following aspects must be addressed:

scientific research question
intended research results
research method
scientific importance of the proposed research
fit within ToKeN2000
how the intended research results (software in particular) could be made accessible for wider use.

]

Current advances in information technology have led to a situation in which systems have become very advanced at the lower levels of processing. Although powerful functionality is present 'under the hood' of such systems, the paradox is that more effort is expected from the user, who must supply a growing amount of input to parametrize the advanced functionality. Furthermore, there are many new information processing and rendering functions that were not available in the world of paper-based information processing and that were unimaginable during the early stages of computer development. For these new functions, many new interaction approaches have been introduced. As a consequence, current computer software requires a level of user competence which is beginning to limit the effectiveness of information and communication technology.

A case in point concerns those multimedia applications which give the regular computer user access to historical information in large databases. The ARIA and Adlib databases, developed by the Rijksmuseum in Amsterdam, contain thousands of images and several hundred thousand textual database records concerning paintings and works of art. The problems in searching, accessing and utilizing the available multimedia information are huge. As an example, one cannot require the general user to specify his/her database-search query in a formal language such as SQL. Nor is it likely that a single design solution for a WWW page in HTML (e.g., a form) will suit all possible types of access to such a database. Given the presence of advanced database software and pattern-recognition tools, the challenge will be to translate the available technical functionality into a form which is convenient for the end user. Consequently, there are a number of research questions:

How can we translate a conceptual internal knowledge representation (containing, e.g., a specific item of knowledge on a topic in Art and History) into understandable language (Dutch), in a format which is adapted to the current user and the current context of usage?
How can we render hypermedia content in a way which is adapted to the prevailing constraints within the usage context and the actual system hardware which is being used?
How can we design system architectures which understand user behavior and which actively support the user and his/her typical (time-variant) interests and preferences?

The proposed research will build on current international developments in Web-based agent technology and ontologies in the context of the "Semantic Web" (e.g., activities stimulated by the European 5th and 6th Framework Programmes, DARPA/DAML and W3C). While these developments focus on "under the hood" technologies, our research will focus on making these technologies available to the average user.

Within this general framework, a number of research perspectives can be identified. For each of these perspectives, subtasks are defined within the project at large.

Name	Problem Area	Task Title	Institutes
Optima	User-Input Support, User Modeling	A User-Agent for Object-based Image Search	KI/RUG
Spreekbuis	Language Output	Performance Grammar Workbench: a Dutch sentence generator	RUL
Cuypers	Presentation generation	Automatic user-centric hypermedia generation	CWI
GO	Knowledge visualization	Graphical Ontologies	IKAT/UM

An essential aspect of the proposed research is its focus on working systems. User groups and potential user groups will be invited to participate in annual workshops, in which the results are demonstrated. Although the goal of system implementation is usually in conflict with the goal of scientific publication, our proposed approach is supported and safeguarded by means of the financial/organisational matching resources provided by the participating institutes.

Optima: Optimal Personalized Interface by Man-Imitating Agents

Current developments in software, the internet and consumer electronics are characterized by increases in functionality, but also by increases in the complexity of the user interface. In general, developers try to achieve a design that optimally fits the preferences and capabilities of the average user. The problem with this approach is that it is often impossible to define an average user, and that this definition is sometimes useless anyway. An example is the situation in which an individual user uses only part of the functionality of an application intensively, and the rest not at all. Electronic encyclopedias, web portals and other online information sources all fit into this category. Users vary wildly in their needs for information, and also in the way they search for information most comfortably.

A better solution is to make the user interface adaptive, such that the interface becomes optimized for an individual user. More specifically, the interface should adapt itself to support the strategies, knowledge level and proficiencies of a particular user. The general goal of the project Optima is to design a methodology for making adaptive user interfaces, based on the metaphor of the intelligent agent. The basis for the agent will be the ACT-R architecture, which is both a theoretical model of human information processing and a simulation environment for human cognition. The agent acts as if it looks over the shoulder of the user, so that it goes through the same learning process. The agent acquires information based on the behavior of the user: the choices that are made, reaction times, errors, etc. The result of the learning process is an agent that exhibits characteristics of the user. The interface can use information from the agent to adapt itself to the individual user. An interesting secondary component of this research is that the individual models can be reused in a cluster analysis, to detect general tendencies in the population. This will show which parts of possible user knowledge are general, and which parts are individual. In the project proposed here, the Optima agent methodology will be developed in the context of a system for searching images in large databases.
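As an illustration of how an agent could turn observed user choices into a ranking that the interface might use, the following minimal sketch applies the standard ACT-R base-level learning equation, B = ln(sum of t^-d over past uses), where t is the time since each use and d is the decay parameter. The proposal does not state which ACT-R mechanisms Optima will use, so the choice of this equation, the item names and the interaction log are all assumptions made for the example.

```python
import math

DECAY = 0.5  # conventional ACT-R default for the decay parameter d


def base_level_activation(use_times, now, decay=DECAY):
    """Return ln(sum((now - t) ** -decay)) over all past uses t < now."""
    ages = [now - t for t in use_times if t < now]
    if not ages:
        return float("-inf")  # never used: lowest possible activation
    return math.log(sum(age ** -decay for age in ages))


def rank_items(history, now):
    """Order interface items from most to least active for this user."""
    return sorted(
        history,
        key=lambda item: base_level_activation(history[item], now),
        reverse=True,
    )


# Hypothetical interaction log: item -> times (in seconds) at which
# the user selected it during a session.
history = {
    "search_by_painter": [10.0, 200.0, 950.0],
    "search_by_period": [15.0],
    "search_by_material": [],
}

ranking = rank_items(history, now=1000.0)
print(ranking)
```

Items that were used often and recently end up first, so an interface could, for instance, offer them more prominently. Reaction times and errors, which the agent is also meant to observe, are left out of this sketch.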

For the research within the Optima framework and the realization of the actual user agent, KI/RUG will cooperate with CWI. The CWI contribution focuses on the aspects of adaptive information rendering (see subtask Cuypers).

Spreekbuis: Automatic Language and Speech Generation in Dutch

In the Dutch language area, little research is being done on automatic language production. This contrasts sharply with the intensive research into the parsing, understanding and recognition of written and spoken Dutch by computer. This asymmetry may hamper the further development of full-fledged dialogue systems, i.e., information services that can be consulted orally or in writing in ordinary language and that also present their information in written or spoken language. Modern examples are Internet search engines that can interpret and answer questions in natural language (at least to a certain extent, and usually only in English), and speech services for mobile telephony (although these can still do little more than play back canned texts). The Spreekbuis project aims to restore the balance, so that the computer will eventually be able to speak and write Dutch as well as it can understand written and spoken Dutch.

From the broad range of possible research themes, we have made a selection aimed at maximum portability of the Dutch language generator to be developed. That is, the modules to be developed must be deployable in as wide a range of applications as possible. This means that we will work on a portable system that can produce a large variety of spoken and written Dutch sentences and sentence constructions, starting from a logical-semantic specification of sentence content and context. This system will comprise the following software modules:

Conceptualizer (in particular the logical-semantic representation formalism)
Grammatical Encoder (sentence construction and word formation module)
Lexicon (large on-line dictionary with, for each word, all required grammatical and semantic codes)
Intonator and Speech Synthesizer (highly intelligible, prosodically sound speech output)
Lexicon Builder (facility for tuning the language and speech modules to the requirements of a new application domain).

Cuypers: A User-Centred Hypermedia Presentation Generator

The work of CWI will focus on the presentation aspects of personalized, media-centric hypermedia-interfaces.

The Cuypers proof-of-concept prototype, constructed in the first phase of ToKeN2000, currently focuses on the adaptation of hypermedia presentations to various end-user devices, for example a desktop computer, a hand-held device or a mobile phone. This device-driven approach was developed to validate our constraint-driven approach to hypermedia presentation generation.

In the following phase of ToKeN2000, the device-driven approach will be integrated with a more user-centric approach, based on explicit user profile information. In order to adapt hypermedia presentations to an individual user's task and preferences, adequate user models need to be developed.

To be able to convey the results of a multimedia database query to a user effectively, the individual multimedia objects need to be related by placing them in the context of a unified hypermedia presentation. This process of enriching the database content requires a number of steps. First, research is needed into appropriate rhetorical and narrative structures to guide the overall flow of the presentation. Second, research is needed into the process of mapping the rhetorical and narrative structures onto hypermedia presentation patterns. This process is driven by high-level hypermedia design rules which also have to be developed. Finally, research is needed into the realization of these hypermedia patterns in terms of a concrete hypermedia presentation format driven by lower-level design rules and qualitative and quantitative presentation constraint processing.
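The idea of realizing a presentation pattern under quantitative constraints can be sketched as follows. This is a deliberately small illustration, not the Cuypers implementation: the three patterns, their preference order, the pixel-based constraints and all names (`Device`, `Image`, `choose_pattern`) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    width: int   # available display width in pixels
    height: int  # available display height in pixels


@dataclass(frozen=True)
class Image:
    title: str
    width: int
    height: int


def fits_side_by_side(images, device):
    # All images next to each other on one screen.
    return (sum(i.width for i in images) <= device.width
            and max(i.height for i in images) <= device.height)


def fits_stacked(images, device):
    # All images below each other on one screen.
    return (max(i.width for i in images) <= device.width
            and sum(i.height for i in images) <= device.height)


def fits_slideshow(images, device):
    # One image at a time; each must fit on its own.
    return all(i.width <= device.width and i.height <= device.height
               for i in images)


# Patterns in (assumed) order of preference.
PATTERNS = [
    ("side_by_side", fits_side_by_side),
    ("stacked", fits_stacked),
    ("slideshow", fits_slideshow),
]


def choose_pattern(images, device):
    """Return the first pattern whose constraints hold, or None."""
    if not images:
        return None
    for name, satisfied in PATTERNS:
        if satisfied(images, device):
            return name
    return None


paintings = [Image("A", 300, 200), Image("B", 300, 200)]
for device in (Device("desktop", 1024, 768),
               Device("hand-held", 320, 480),
               Device("phone", 320, 240)):
    print(device.name, choose_pattern(paintings, device))
```

The same two images are shown side by side on the large display, stacked on the hand-held and one at a time on the smallest screen. A fuller system would also need qualitative constraints (for example "caption below image") and a fallback, such as scaling, when no pattern fits.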

Collaboration with the Rijksmuseum will continue on providing added value by generating adaptive user-centric hypermedia presentations as a personalized interface to both the museum's internal database (the Adlib database, which is intended for museum experts) and its external database (ARIA, intended for the general public).

To benefit directly from the state of the art in the relevant Web technology, the proposed research will capitalize on CWI's close links with W3C. Research aspects focusing on agents for personalized adaptation will be carried out in cooperation with KI/RUG in the context of the Optima project, while the cooperation with IKAT/UM will stress the role of ontologies in the agent-driven user interaction that characterizes both the Cuypers and the GO subtasks.

GO: Graphical Ontologies

The subtask GO will be performed by IKAT/UM.

In the previous phase of ToKeN2000, a metabrowser for information retrieval (IR) was developed, with a special focus on the presearch (i.e., the phase where the user has not yet started searching for documents but is searching for the relevant concepts). The idea is to present the user with a partial view of the thesaurus, which changes depending on filters chosen by the user. Currently, there is much interest in ontologies for use in Internet interoperability. Especially in digital libraries, ontologies are key when it comes to searching heterogeneous information databases. To enable the building, maintenance, and use of ontologies, various formalisms and tools are available. The proposed project aims at a generic tool for searching, accessing, and editing ontologies. It will be generic in the sense that it is independent of the representation formalism used. Starting points will include the domain ontology for the annotations of the Rijksmuseum's ARIA database and an ontology for describing user profiles.
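A partial, filter-dependent view of a thesaurus can be sketched as a bounded breadth-first walk over broader/narrower relations. The thesaurus fragment below, the depth limit and the filter predicate are hypothetical and serve only to illustrate the idea; they are not taken from the metabrowser or from the ARIA annotations.

```python
from collections import deque

# Hypothetical thesaurus fragment: concept -> narrower concepts.
NARROWER = {
    "works of art": ["paintings", "sculpture"],
    "paintings": ["portraits", "landscapes", "still lifes"],
    "sculpture": ["busts"],
    "portraits": [],
    "landscapes": [],
    "still lifes": [],
    "busts": [],
}


def partial_view(narrower, focus, max_depth, keep=lambda concept: True):
    """Walk breadth-first from `focus` down to `max_depth` levels.

    Returns (concept, depth) pairs for concepts that pass the `keep`
    filter. A filtered-out concept is hidden, but its narrower
    concepts are still visited.
    """
    view = []
    seen = {focus}
    queue = deque([(focus, 0)])
    while queue:
        concept, depth = queue.popleft()
        if keep(concept):
            view.append((concept, depth))
        if depth < max_depth:
            for child in narrower.get(concept, []):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, depth + 1))
    return view


# One level below the focus concept, no filter.
overview = partial_view(NARROWER, "works of art", max_depth=1)
print(overview)
```

Changing the focus concept, the depth or the filter yields a different partial view without touching the thesaurus itself. Because the function only relies on a concept-to-children mapping, the same walk could in principle be applied to ontologies stored in different representation formalisms, provided each is first read into such a mapping.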

5) Requested Budget

Position	Inst.	Backgr.	Name	Task Title	ftu, yrs	Amount	Supervisor
postdoc	KI/RUG	cog		Optima	1.0, 2	250 kfl	Taatgen
postdoc	RUL	cog		Spreekbuis	X, 2	250 kfl	Kempen
OIO	RUL	cog		Spreekbuis	X, 4	250 kfl	Kempen
postdoc	CWI	inf		Cuypers	X, 2	250 kfl	v. Ossenbruggen
OIO	CWI	inf		Cuypers	X, 4	250 kfl	v. Ossenbruggen
postdoc	IKAT/UM	inf	F. Wiesman	GO	1.0, 2	250 kfl	v.d. Herik
Total						1500 kfl

General

Specify the requested staff positions, any additional travel budget, and equipment/software to be newly purchased (costs < kf 50) that is specifically needed for the project. Costs of equipment/software that should be counted as part of an institute's usual package of facilities are not subsidized.

The total request, including matching (salary costs, equipment credit and travel budget), may amount to at most Mf 1.5 for large projects and Mf 0.75 for small projects.

For each project, 25% of the total costs must be specified here as matching. The matching may (partly) consist of the deployment of permanent staff for the supervision of newly appointed OiOs and postdocs.

Subsidy amounts

For the salary costs of an OiO, a standard amount of kf 228 is used, for a period of four years.

For the salary costs of a postdoc, a standard amount of kf 100 per year is used. The level of the subsidy is adjusted to the salary grading of the candidate, with the rates of salary scale 11 as the maximum. For a postdoc, subsidy can be requested for at most two years.

For working visits and conference attendance at home and abroad, each newly appointed OiO or postdoc is (by default) granted a personal budget of f 7,350 for the entire project period. Postdocs with a one-year appointment receive half.

For the salary costs of other personnel (technicians, programmers), at most kf 90 per year can be requested. Other personnel can only be requested together with one or more OiOs or postdocs. The duration of the appointment is never longer than that of the OiO(s) or postdoc(s).

All subsidy amounts include a 35% surcharge for overhead.

The amounts stated are based on the prevailing collective labour agreements (CAOs). The final subsidy is adjusted to changes in these agreements.

6) Literature

Please give a list of the references used in the proposal. In addition, you are requested to list the five most important publications of the research team.

Pre-registrations of research proposals (exclusively as PS, PDF or Word file) can be submitted by e-mail to dr. S.C.M. Wigchert, e-mail: wigchert@nwo.nl.
