Job Openings

PhD Position - Interactive Information Retrieval in Semi-structured Data Sets

Traditionally, database research has focused on answering precise queries over homogeneously structured data. In contrast, information retrieval research has focused on answering imprecise queries over unstructured data. Because of the current ubiquity of semi-structured information, combinations of the two approaches are becoming increasingly relevant.

For example, given the query "in which European countries can I pay with Euros?", search engines on the Web traditionally aim to return a ranked list of pointers to the web pages with the highest "relevance", that is, loosely speaking, the pages most likely to satisfy the user's information need. More recently, systems have been striving to exploit the structure present in the data and the query to provide a more direct answer, e.g. a list of country names with pointers to the Wikipedia pages describing those countries.

One approach to improving the performance of such systems is to develop better retrieval algorithms, so that the resulting system directly provides better answers. High accuracy would, however, require the system to build up some "understanding" of the data, which is an extremely difficult problem, especially in an unconstrained domain such as the Web.

A potentially more viable approach is to solve the query interactively, so that the algorithms can benefit from user feedback on intermediate results and find the desired answer in multiple steps. The key difference is that the user refines the system's attempt at understanding the data while using the "guessed" meaning. The original problem of understanding the information need is turned into a process in which the system negotiates the solution strategy with the user, potentially making much better use of his or her capacity to understand the intermediate results than the system could on its own.

This leads to the following research questions:

  1. What characteristics of the query and the underlying data set determine which of the two approaches will lead the user to the desired result most efficiently?
  2. How can a system be designed that combines both approaches and unifies state-of-the-art retrieval algorithms with effective interactive search interfaces?
  3. How can the performance of such a system be evaluated on representative query sets and realistically large data sets?

The goals described above bridge the research carried out in the INS1 (Database Architectures and Information Access) and INS2 (Semantic Media Interfaces) research groups. More specifically, the project builds on Arjen de Vries's work on entity ranking, carried out in the INEX XML retrieval evaluation initiative. It also extends Jacco van Ossenbruggen's work on user interface design for interacting with large linked data sets, carried out in the context of the MultimediaN e-culture project.

The PhD student will help establish stronger working relations between the two groups. The ideal candidate has a strong technical background in information retrieval and an affinity for UI design. Experience with user testing and evaluation is desired but not required.

Further details and how to apply

The vacancy concerns a temporary research position for four years. The project is carried out in the Information Systems (INS) cluster, and brings together the research of Arjen P. de Vries (INS1) into entity ranking with the research of Jacco van Ossenbruggen (INS2) into user interface design for interacting with large linked data sets. The salary and terms of employment are in accordance with the "CAO-onderzoekinstellingen". Besides the salary, CWI offers attractive and flexible terms of employment, such as collective health insurance, a pension fund, and initial help with housing for foreigners.

Applications should include a curriculum vitae with a publication list and a statement of research qualifications and interests. Applications can be sent by email before November 7th, 2008 to Prof.dr.ir. Arjen P. de Vries (arjen.de.vries AT cwi DOT nl) or to Dr. Jacco van Ossenbruggen (jacco.van.ossenbruggen AT cwi DOT nl), who can also supply further information about this position.