Research Roadmap

Author: Raphaël Troncy

$Date: 2008/09/15 08:33:29 $
$Revision: 1.2 $

Motivation

The context of our research is the Web in its most general sense, that is, any information system that uses web technologies. The architecture of the Web exploits simple technologies which connect efficiently to enable an information space that is highly flexible and usable and which, most importantly, scales. Nowadays, the Web is an impressive platform that hosts ever more information, encompassing more languages, more media and more activities [BernersLee, 2006].

On the one hand, producing multimedia content today is easier than ever before. Digital photos, videos and music are captured, edited and produced every day. These documents can be easily uploaded, communicated and shared using dedicated media hosting web sites such as Flickr, Picasa and YouTube. Furthermore, social networking web sites such as Facebook and MySpace build online communities of people who share interests and activities. This results in an ever-growing amount of digital multimedia content that is difficult to process effectively, and thus to find, understand and reuse.

On the other hand, the proliferation of interconnectivity and interactivity of web-delivered content has led to the development and evolution of web-based communities and hosted services, such as social-networking sites, wikis, blogs, and folksonomies, often referred to by the term Web 2.0 [OReilly, 2005]. In particular, wikis are often used to create collaborative web sites and to provide Knowledge Management systems. The collaborative encyclopedia Wikipedia is one of the best-known wikis. It illustrates the phenomenon that people are willing to contribute their knowledge and expertise on particular subjects of interest in order to constitute an encyclopedia or compendium that gathers knowledge ^[1].

The Semantic Web provides languages and technologies for formally representing the semantics of information and services on the web. Recently rebranded as the Web of Data, it proposes to expose, share and connect data on the Web, expressed in the machine-readable RDF format, via dereferenceable URIs, following the so-called linked data principles. This results in an ever-growing amount of formal knowledge that can be used to power information systems. The following datasets are examples of compendia of knowledge built collaboratively by end-users and formalized using semantic web technologies ^[2]:

DBpedia, a community effort to extract structured information from Wikipedia
YAGO, a huge semantic knowledge base composed of over 1.7 million entities such as persons, organizations and cities, extracted from Wikipedia and using WordNet to structure the information
Freebase, an open shared database of the world's knowledge
Geonames, a geographical database
UMBEL (Upper Mapping and Binding Exchange Layer), a lightweight reference structure of 20,000 subject concept classes and their relationships derived from OpenCyc
Semantic CrunchBase, a free directory of technology companies, people, and investors
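
The idea of connecting data across such datasets can be illustrated with a minimal sketch. It assumes that each dataset is available as a plain list of (subject, predicate, object) triples and that equivalent resources are linked with owl:sameAs. The two datasets, all example.org URIs and the `describe` helper are hypothetical and only serve the illustration; they are not taken from any of the datasets listed above.

```python
from collections import defaultdict

# owl:sameAs is the standard predicate for stating that two URIs denote the same thing.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical triples published by two independent datasets (made-up URIs and values).
encyclopedia = [
    ("http://encyclopedia.example.org/resource/Berlin",
     "http://encyclopedia.example.org/property/label", "Berlin"),
    ("http://encyclopedia.example.org/resource/Berlin",
     SAME_AS, "http://gazetteer.example.org/place/42"),
]
gazetteer = [
    ("http://gazetteer.example.org/place/42",
     "http://gazetteer.example.org/property/countryCode", "DE"),
]


def describe(uri, *datasets):
    """Collect the property values known about `uri`, following owl:sameAs links."""
    triples = [triple for dataset in datasets for triple in dataset]

    # Treat owl:sameAs as a symmetric link between URIs.
    aliases = defaultdict(set)
    for s, p, o in triples:
        if p == SAME_AS:
            aliases[s].add(o)
            aliases[o].add(s)

    # Gather every URI reachable from `uri` through sameAs links.
    seen, todo = set(), [uri]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        todo.extend(aliases[current])

    # Merge the statements made about any of the equivalent URIs.
    description = defaultdict(set)
    for s, p, o in triples:
        if s in seen and p != SAME_AS:
            description[p].add(o)
    return dict(description)


merged = describe("http://encyclopedia.example.org/resource/Berlin",
                  encyclopedia, gazetteer)
for prop, values in sorted(merged.items()):
    print(prop, sorted(values))
```

In this sketch, the description of the resource combines statements from both datasets, although neither dataset holds all of them on its own. A real system would rather dereference the URIs and parse the returned RDF with a dedicated library.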

Finally, a growing number of devices have access to the Web, which is becoming ubiquitous thanks to mobile technologies. Users are also more experienced with the Web and have more diverse and often more complex information needs. Our assumption is that single sites rarely have the answer to these complex information needs. Therefore, users do not only search for particular documents that may contain the information they are looking for, but would like to be supported while gathering heterogeneous and distributed information.

Goal

Our goal is to support users' complex information needs, such as exploring large information spaces or gathering heterogeneous and distributed information. More precisely, our goal is to model knowledge in order to i) extract data that is currently locked into documents and ii) create meaningful connections between pieces of data.

The Web of documents locks data and does not allow information sharing except in the user’s head: it is the user who has to drive system and data integration, usually through "copy & paste". Open APIs facilitate mash-ups and leverage silos of content, but they actually only allow a shallow integration at the user interface level while the data is still kept in the silos. Our goal is to model and develop semantic mashups in order to improve data and system integration.

We see user interfaces as a proxy between the data and the users. Our goal is to study how the design and development of these interfaces influence the knowledge and data models that are rendered and, conversely, what constraints the data models put on the user interface level.

Research Problems

At the heart of our research is the problem of modeling knowledge. We have investigated how to model ontologies and especially how to structure taxonomies. We have proposed a three-step methodology (named ARCHONTE) and a tool (DOE) introducing a clear semantic commitment to normalize the meaning of the concepts and properties of an ontology [Bachimont, 2002].

Ontology Alignment: [Straccia, 2005], [Straccia, 2006]

Building knowledge models for supporting complex user information needs requires being able i) to describe the content of documents and ii) to represent and make use of the context surrounding the information. Semantic technologies can be applied at several stages. Upstream, knowledge of the context makes it possible to enhance multimedia content analysis. Downstream, knowledge of the user makes it possible to better support user interaction, providing recommendations and personalized access to multimedia content. In the middle, semantic technologies are used to integrate heterogeneous and distributed data, infer connections between pieces of data, select, rank and organize the information, etc.

Structure as context: The assumption that a single model and format could be used to represent both the structure and the content is wrong. The MPEG-7 approach built on this assumption proved to be inefficient. We have advocated a clear separation between the representation of the content and the structure as context [Troncy, 2003], [Troncy, 2004a], [Troncy, 2004b]. The Core Ontology for Multimedia (COMM) is a model that formalizes this separation. It provides the descriptors for structurally decomposing any multimedia document as well as the descriptors for representing its low-level features [Arndt, 2007]. Decomposing and representing the structure of a multimedia document amounts to defining and addressing fragments of multimedia documents and establishing relationships (logical, temporal, topological, etc.) between these fragments. Addressing any fragment of a multimedia document could be directly supported by the web architecture [Troncy, 2007]. That would ultimately reduce MPEG-7 to its most basic functionality: the description of low-level features for exchanging multimedia analysis results.
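
Addressing fragments through the web architecture can be sketched as follows. The example assumes, purely for illustration, that a temporal fragment is written as `t=start,end` (in seconds) and a spatial region as `xywh=x,y,width,height` in the fragment part of a URI. This syntax, the media.example.org URI and the `parse_fragment` and `overlaps` helpers are assumptions of the sketch, not something fixed by the text above.

```python
from urllib.parse import urlsplit, parse_qs


def parse_fragment(uri):
    """Split a media URI into the resource it identifies and the fragment it addresses.

    Assumed (hypothetical) syntax: '#t=start,end' and/or '#xywh=x,y,w,h'.
    """
    parts = urlsplit(uri)
    params = parse_qs(parts.fragment)
    result = {"resource": parts._replace(fragment="").geturl()}

    if "t" in params:
        # A missing start means the beginning, a missing end means "until the end".
        start, _, end = params["t"][0].partition(",")
        result["time"] = (float(start) if start else 0.0,
                          float(end) if end else None)

    if "xywh" in params:
        x, y, w, h = (int(value) for value in params["xywh"][0].split(","))
        result["region"] = (x, y, w, h)

    return result


def overlaps(a, b):
    """Temporal relationship: True when two (start, end) intervals share some duration."""
    a_end = float("inf") if a[1] is None else a[1]
    b_end = float("inf") if b[1] is None else b[1]
    return a[0] < b_end and b[0] < a_end


shot = parse_fragment("http://media.example.org/video.ogv#t=10,20&xywh=160,120,320,240")
tail = parse_fragment("http://media.example.org/video.ogv#t=15")
print(shot["resource"], shot["time"], shot["region"])
print(overlaps(shot["time"], tail["time"]))
```

With fragments identified this way, relationships between them (here a simple temporal overlap) can be computed from the URIs alone, independently of the format used to describe the content.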

Event as context: NewsML: repetition of the same error? [Troncy, 2008]

Metadata for the content: IPTC News Codes, Photo Metadata Standard, W3C Media Annotations WG

References

[Arndt, 2007]: Richard Arndt, Raphaël Troncy, Steffen Staab, Lynda Hardman and Miroslav Vacura: COMM: Designing a Well-Founded Multimedia Ontology for the Web. In 6th International Semantic Web Conference (ISWC'2007), vol. LNCS 4825, pages 30-43, Busan, Korea, November 11-15, 2007.
[Bachimont, 2002]: Bruno Bachimont, Antoine Isaac and Raphaël Troncy: Semantic Commitment for Designing Ontologies: A Proposal. In 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW'02), vol. LNAI 2473, pages 114-121, Sigüenza, Spain, October 1-4, 2002.
[BernersLee, 2006]: Tim Berners-Lee, Wendy Hall, James A. Hendler, Kieron O’Hara, Nigel Shadbolt and Daniel J. Weitzner: A Framework for Web Science. Foundations and Trends in Web Science, Vol. 1, No 1 (2006), 1–130.
[OReilly, 2005]: Tim O'Reilly: What Is Web 2.0. O'Reilly Network, 2005.
[Straccia, 2005]: Umberto Straccia and Raphaël Troncy: oMAP: Combining Classifiers for Aligning Automatically OWL Ontologies. In 6th International Conference on Web Information Systems Engineering (WISE'05), vol. LNCS 3806, pages 133-147, New York City, New York, USA, November 20-22, 2005.
[Straccia, 2006]: Umberto Straccia and Raphaël Troncy: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the oMAP Framework. In 3rd European Semantic Web Conference (ESWC'06), vol. LNCS 4011, pages 378-392, Budva, Montenegro, June 11-14, 2006.
[Troncy, 2003]: Raphaël Troncy: Integrating Structure and Semantics into Audio-visual Documents. In 2nd International Semantic Web Conference (ISWC'03), vol. LNCS 2870, pages 566-581, Sanibel Island, Florida, USA, October 20-23, 2003.
[Troncy, 2004a]: Raphaël Troncy: Formalization of Documentary Knowledge and Conceptual Knowledge With Ontologies: Applying to The Description of Audio-visual Documents. PhD Thesis, University Joseph Fourier-INPG, Grenoble, France, March 5, 2004.
[Troncy, 2004b]: Raphaël Troncy and Jean Carrive: A Reduced Yet Extensible Audio-Visual Description Language: How to Escape From the MPEG-7 Bottleneck. In 4th ACM Symposium on Document Engineering (DocEng'04), pages 87-89, Milwaukee, Wisconsin, USA, October 28-30, 2004.
[Troncy, 2007]: Raphaël Troncy, Lynda Hardman, Jacco van Ossenbruggen and Michael Hausenblas: Identifying Spatial and Temporal Media Fragments on the Web. In W3C Video on the Web Workshop, San Jose, California, USA and Brussels, Belgium, December 12-13, 2007.
[Troncy, 2008]: Raphaël Troncy: Bringing the IPTC News Architecture into the Semantic Web. In 7th International Semantic Web Conference (ISWC'08), vol. LNCS 5318, pages 483-498, Karlsruhe, Germany, October 26-30, 2008.

^[1] In 2006, Time Magazine chose the millions of anonymous contributors of user-generated content as Person of the Year, personified simply as You, praising the accelerating success of online collaboration and interaction by millions of users around the world.

^[2] The latest version of the linked data cloud is maintained by Richard Cyganiak at http://richard.cyganiak.de/2007/10/lod/.