The context of our research is the Web in its most general sense, that is, any information system that uses web technologies. The architecture of the Web exploits simple technologies which connect efficiently, to enable an information space that is highly flexible and usable, and which, most importantly, scales. Nowadays, the Web is an impressive platform that hosts ever more information, encompassing more languages, more media and more activities [BernersLee, 2006].
On the one hand, producing multimedia content today is easier than ever before. Digital photos, videos and music are captured, edited and produced every day. These documents can be easily uploaded, communicated and shared using dedicated media hosting web sites such as Flickr, Picasa and YouTube. Furthermore, social networking web sites such as Facebook and MySpace build online communities of people who share interests and activities. This results in an ever-growing amount of digital multimedia content that is difficult to process effectively, and thus to find, understand and reuse.
On the other hand, the proliferation of interconnectivity and interactivity of web-delivered content has led to the development and evolution of web-based communities and hosted services, such as social-networking sites, wikis, blogs, and folksonomies, often referred to by the term Web 2.0 [OReilly, 2005]. In particular, wikis are often used to create collaborative web sites and to provide Knowledge Management systems. The collaborative encyclopedia Wikipedia is one of the best-known wikis. It illustrates the phenomenon that people are willing to contribute their knowledge and expertise on particular subjects of interest in order to constitute an encyclopedia or compendium that gathers knowledge [1].
The Semantic Web provides languages and technologies for formally representing the semantics of information and services on the Web. Recently rebranded as the Web of Data, it proposes to expose, share and connect data on the Web, expressed in the machine-readable RDF format, via dereferenceable URIs, following the so-called linked data principles. This results in an ever-growing amount of formal knowledge that can be used to power information systems. The following datasets are examples of compendia of knowledge built collaboratively by end-users and formalized using semantic web technologies [2]:
Finally, a growing number of devices have access to the Web, which is becoming ubiquitous thanks to mobile technologies. Users are also more experienced with the Web and have more diverse and often more complex information needs. Our assumption is that a single site rarely has the answer to these complex information needs. Therefore, users do not only search for particular documents that may contain the information they are looking for, but would like to be supported while gathering heterogeneous and distributed information.
Our goal is to support users' complex information needs, such as exploring a large information space or gathering heterogeneous and distributed information. More precisely, our goal is to model knowledge in order to i) extract data that is currently locked into documents and ii) create meaningful connections between pieces of data.
The Web of documents locks data and does not allow information sharing except in the head of the user, who has to drive system and data integration ... usually through ''copy & paste''. Open APIs facilitate mash-ups and leverage silos of content, but they actually only allow a shallow integration at the user interface level while the data is still kept in the silos. Our goal is to model and develop semantic mashups in order to improve data and system integration.
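To make the contrast with interface-level mash-ups concrete, the following minimal sketch illustrates integration at the data level. It assumes two hypothetical sources that expose their records as subject-predicate-object triples and reuse the same URIs for the same things. All URIs, property names and the `merge_sources` and `follow` helpers are invented for the illustration and do not describe an existing system.

```python
from collections import defaultdict

# Hypothetical triples exposed by two independent silos (illustrative data).
PHOTO_SITE = [
    ("http://example.org/photo/42", "depicts", "http://example.org/place/paris"),
    ("http://example.org/photo/42", "creator", "http://example.org/user/alice"),
]
ENCYCLOPEDIA = [
    ("http://example.org/place/paris", "label", "Paris"),
    ("http://example.org/place/paris", "country", "http://example.org/place/france"),
]


def merge_sources(*sources):
    """Merge triples from several sources into one graph indexed by subject URI."""
    graph = defaultdict(set)
    for triples in sources:
        for subject, prop, obj in triples:
            graph[subject].add((prop, obj))
    return graph


def follow(graph, subject, *path):
    """Follow a chain of properties from `subject` and return the values reached."""
    frontier = {subject}
    for prop in path:
        frontier = {
            obj
            for node in frontier
            for (p, obj) in graph.get(node, ())
            if p == prop
        }
    return frontier


graph = merge_sources(PHOTO_SITE, ENCYCLOPEDIA)

# The photo site knows what the photo depicts, the encyclopedia knows the label:
# the answer only exists once the two sources are connected through a shared URI.
labels = follow(graph, "http://example.org/photo/42", "depicts", "label")
print(labels)  # {'Paris'}
```

The connection is made through the shared URI rather than by a user copying values between two interfaces, which is the kind of integration a semantic mashup aims at. A real system would also have to deal with sources that use different URIs for the same thing.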
We see user interfaces as a proxy between the data and the users. Our goal is to study how the design and development of these interfaces influence the knowledge and data models that are rendered and, conversely, what constraints the data models put on the user interface level.
At the heart of our research is the problem of modeling knowledge. We have investigated how to model ontologies and especially how to structure taxonomies. We have proposed a three-step methodology (named ARCHONTE) and a tool (DOE) that introduce a clear semantic commitment to normalize the meaning of the concepts and properties of an ontology [Bachimont, 2002].
Ontology Alignment: [Straccia, 2005], [Straccia, 2006]
Building knowledge models for supporting complex user information needs requires being able i) to describe the content of documents and ii) to represent and make use of the context surrounding the information. Semantic technologies can be applied at several stages. Upstream, knowledge of the context makes it possible to enhance multimedia content analysis. Downstream, knowledge of the user makes it possible to better support user interaction, providing recommendations and personalized access to multimedia content. In the middle, semantic technologies are used to integrate heterogeneous and distributed data, infer connections between pieces of data, select, rank and organize the information, etc.
Structure as context: The assumption that a single model and format could be used to represent both the structure and the content is wrong. The MPEG-7 myth proved to be inefficient. We have advocated a clear separation between the representation of the content and the structure as context [Troncy, 2003], [Troncy, 2004a], [Troncy, 2004b]. The Core Ontology for Multimedia (COMM) is a model that formalizes this separation. It provides the descriptors for structurally decomposing any multimedia document as well as the descriptors for representing its low-level features [Arndt, 2007]. Decomposing and representing the structure of a multimedia document amounts to defining and addressing fragments of multimedia documents and establishing relationships (logical, temporal, topological, etc.) between these fragments. Addressing any fragment of a multimedia document could be directly supported by the web architecture [Troncy, 2007]. That would ultimately reduce MPEG-7 to its most basic functionality: the description of low-level features for exchanging multimedia analysis results.
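As an illustration of what addressing fragments through the web architecture could look like, the sketch below assumes a hypothetical URI syntax in which a temporal fragment is written `#t=start,end`, in seconds. This syntax, the `TemporalFragment` class, the `parse_fragment` helper and the example URL are assumptions made for the example and are not taken from COMM or MPEG-7. The temporal relation it computes is deliberately coarse (before, after, overlaps) and is only meant to show how a relationship between two addressed fragments can be established.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urldefrag


@dataclass(frozen=True)
class TemporalFragment:
    media: str    # URI of the whole multimedia document
    start: float  # start of the fragment, in seconds
    end: float    # end of the fragment, in seconds

    def relation_to(self, other):
        """Return a coarse temporal relation between two fragments of one document."""
        if self.media != other.media:
            raise ValueError("fragments belong to different documents")
        if self.end <= other.start:
            return "before"
        if self.start >= other.end:
            return "after"
        return "overlaps"


def parse_fragment(uri):
    """Parse a URI of the assumed form <media>#t=<start>,<end>."""
    media, fragment = urldefrag(uri)
    values = parse_qs(fragment).get("t")
    if not values:
        raise ValueError(f"no temporal fragment in {uri!r}")
    start, end = (float(v) for v in values[0].split(","))
    if end <= start:
        raise ValueError("fragment end must be after its start")
    return TemporalFragment(media, start, end)


intro = parse_fragment("http://example.org/video.ogv#t=0,30")
interview = parse_fragment("http://example.org/video.ogv#t=20,60")
credits = parse_fragment("http://example.org/video.ogv#t=60,75")

print(intro.relation_to(interview))    # overlaps
print(interview.relation_to(credits))  # before
```

With such a scheme, the structural decomposition lives in ordinary URIs that any web client can exchange, while the description of what each fragment contains is kept in a separate model.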
Event as context: NewsML: repetition of the same error? [Troncy, 2008]
Metadata for the content: IPTC News Codes, Photo Metadata Standard, W3C Media Annotations WG
[1] In 2006, Time Magazine chose the millions of anonymous contributors of user-generated content as Person of the Year, personified simply as You, praising the accelerating success of online collaboration and interaction by millions of users around the world.
[2] The latest version of the linked data cloud is maintained by Richard Cyganiak at http://richard.cyganiak.de/2007/10/lod/.