Author: George Anadiotis



Date: 18/7/2006

A tag-based bootstrapping approach to multimedia metadata in large unstructured communities


The starting point for this blue note was the notion of emergent semantics, together with some thoughts (partially discussed with Frank and Raphael) on how this paradigm could be applied to multimedia modelling and annotation. The main principles of emergent semantics, introduced in [1] and briefly summarized here, make this approach a natural match for distributed, non-hierarchical environments.


There is also some existing work in this field that provided useful inspiration:


[2], [3]. The main idea of Santini, introduced in [2] and further elaborated in [3], is that image semantics is not (only) defined by static metadata, but also by its context and by user interaction.



[4] Grosky et al. build on Santini's idea by taking it from its original field of application (image databases) to the Semantic Web, and extending it from low-level descriptors to high-level concepts.



[5] Heylighen & Bollen elaborate further on the 'semantics via interaction' idea, and connect it with their own previously published work, which exploits user navigation to define context and meaning.



[6] Fokker, Pouwelse and Buntine present their approach to user-provided multimedia (mm) content and metadata: collaborative tagging of Wikipedia content using the Tribler platform. This may seem simple, but it could serve as a demonstrator of a far-reaching approach.



Santini's work is mostly based on low-level image features, but the same principle (interaction) could be applied to higher-level metadata that is not based on feature extraction, as demonstrated in [4]. This opens new possibilities: metadata that is not specific to images allows the same basic principle to be applied to virtually any type of multimedia document. However, this raises several important questions:

  1. What kind of metadata could that be?

  2. How could that metadata be obtained?

  3. What could the metadata be used for?



The answer to the 1st question ('what kind of metadata') will necessarily have to be pragmatic. One of the main reasons hindering the widespread adoption of the Semantic Web (SW) so far has been the lack of proper, standards-based metadata for web resources. This can be explained by considering, on the one hand, the high cost of adding metadata to resources and, on the other hand, the proliferation of different mm metadata formats: EXIF, XMP, MPEG-7 and NewsML, to mention just a few. While each format has its merits and serves a purpose, the lack of interoperability between them hinders the development of metadata-based solutions. This is where the SW standards can play a very significant role, by acting as a bridge between metadata expressed in various formats.
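As a rough illustration of this 'bridge' role, the sketch below maps a few format-specific fields onto shared Dublin Core properties, so that items described in different formats become comparable. "Artist" and "DateTimeOriginal" are EXIF tags and "dc:creator" and "xmp:CreateDate" are XMP properties, but the crosswalk table, the function name and the plain-tuple triple representation are hypothetical assumptions for illustration, not a published mapping.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical crosswalk: (source format, source field) -> shared property.
# The field names are real EXIF/XMP names; the mapping itself is an assumption.
CROSSWALK: Dict[Tuple[str, str], str] = {
    ("EXIF", "Artist"): "dc:creator",
    ("EXIF", "DateTimeOriginal"): "dc:date",
    ("XMP", "dc:creator"): "dc:creator",
    ("XMP", "xmp:CreateDate"): "dc:date",
}


def to_triples(item_uri: str, fmt: str, fields: Dict[str, str]) -> List[Triple]:
    """Translate format-specific fields into (subject, predicate, object) triples.

    Fields without a known mapping are skipped rather than guessed.
    """
    triples = []
    for name, value in fields.items():
        prop = CROSSWALK.get((fmt, name))
        if prop is not None:
            triples.append((item_uri, prop, value))
    return triples
```

A photo described in EXIF and another described in XMP would then both expose a `dc:creator` triple. A real bridge would also need to normalise values (EXIF dates, for instance, use a `YYYY:MM:DD HH:MM:SS` layout), which this sketch leaves out.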



The answer to the 2nd question ('how to obtain the metadata') will again have to be pragmatic. We have argued why using the SW metadata standards is important, but if this is to happen, average users will have to use them in their daily routine, whether creating documents, interacting with other members of their interest groups or online communities, or simply commenting on documents other than their own. This means that using them should be simple, and also that there should be a tangible benefit in it for the user. One solution could lie in tags and folksonomies, combined with an incentive mechanism.



Finally, there can be many answers to the 3rd question ('what can the metadata be used for'): retrieval, navigation, presentation and recommendation. An interesting application to recommendation can be found in [5].



The ever-growing success of social, tag-based resource sharing systems such as Delicious, Flickr, Technorati, Last.fm and Connotea shows that, in real life, tagging is a very viable solution for annotating mm items. Social resource sharing systems are web-based systems that allow users to upload their resources and to label them with arbitrary words, so-called tags [11]. This approach lowers the barrier to metadata annotation, since it requires minimal effort on behalf of annotators: there are no special tools or complex interfaces that users need to become familiar with, and no deep understanding of logic principles or formal semantics is required, just some standard technical expertise.
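The formal model used by Hotho et al. [11] treats a folksonomy as users, tags and resources linked by tag assignments. A minimal in-memory sketch of that model follows. The class and method names are hypothetical, and lowercasing tags is an assumed normalisation choice. The point is that each annotation is just a cheap (user, tag, resource) triple with no schema to learn, and that all users' tags for an item can be shown together.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Folksonomy:
    """Hypothetical in-memory store of tag assignments (user, tag, resource)."""
    assignments: Set[Tuple[str, str, str]] = field(default_factory=set)

    def tag(self, user: str, tag: str, resource: str) -> None:
        # Tags are free-form words; lowercase them (an assumption) so that
        # 'Rose' and 'rose' count as the same tag.
        self.assignments.add((user, tag.lower(), resource))

    def tags_of(self, resource: str) -> Set[str]:
        """All tags any user has attached to a resource."""
        return {t for (_, t, r) in self.assignments if r == resource}

    def resources_tagged(self, tag: str) -> Set[str]:
        """All resources carrying a given tag, whoever attached it."""
        return {r for (_, t, r) in self.assignments if t == tag.lower()}

    def personomy(self, user: str) -> Dict[str, Set[str]]:
        """One user's own tags, each mapped to the resources they tagged."""
        result: Dict[str, Set[str]] = defaultdict(set)
        for (u, t, r) in self.assignments:
            if u == user:
                result[t].add(r)
        return dict(result)
```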



This was clearly demonstrated at the hands-on session of the W3C Multimedia Semantics Incubator Group (MMSEM XG) [16]. For this session, members of the XG were given a limited amount of time within which to annotate any item of a given cross-modality mm collection, without any restriction on the method, the only requirement being that the annotation had to be SW-standards compliant. The 'contestants' in this experiment were the members of the XG: academics, all of whom are, to varying degrees, experts on the Semantic Web, its concepts, standards and best practices. The results at the end of the hands-on session were quite interesting:



  1. Some of the participants were not able to annotate even a single item, because producing RDF 'manually', without the aid of some sort of annotation tool, is not really viable. Those who did not have access to such a tool, or were not adequately familiar with one, could not complete the task within the given time.

  2. Some participants used specialized tools to annotate items. All such approaches involved either the creation or the acquisition of a formal ontology, so they required a high level of expertise as well as a considerable amount of time.

  3. In contrast, using a tag-based solution (Flickr, in this example) required no special tool or expertise and no formal ontology creation/acquisition, and produced a result within a few minutes.



Another interesting feature of the tagging approach, as the examples mentioned earlier demonstrate, is that it can be applied across domains and modalities: tags generally express high-level concepts, which makes them equally suitable for describing any sort of mm item. This is complementary to metadata based on low-level features, which are usually only suitable for a specific modality. There in fact seems to be a many-to-many correspondence between low-level features and high-level concepts [4], so having them coexist could be a way to alleviate the semantic gap [13].



In addition, having many annotators for the same document may create ambiguity, but it also offers a potentially broader perspective on the same topic; showing all users' annotations for a specific item is a very useful feature of social resource sharing systems.



The problem with this approach is that the metadata obtained lack any form of formal semantics: their biggest advantage, intuitiveness, can also be a great disadvantage, as the meaning of tags is not formally defined but relies on a common understanding of concepts between annotator(s) and user(s). A further drawback resulting from this is that there is currently no way to reuse tags across different systems. This could be addressed by adding some sort of formal semantics to unstructured tags; the challenge therefore lies in finding a way to express intuitive metadata, such as tags, formally and transparently, without putting this burden on the shoulders of end-users. Users should not have to deal with complex concepts and tools to be able to annotate documents, as currently seems to be the case. Machines, on the other hand, need formal semantics in order to make inferences and offer richer navigation, search and presentation of mm documents.



Although the unstructured tagging approach is generally seen as being in contrast with formal semantics, it can also be seen as complementary to it, allowing for both static annotation (i.e. the annotation provided by document creators) and dynamic annotation (i.e. the annotation that reflects the way documents are used by users and associated with other documents) [4, 8].



The unstructured tagging approach is suitable for large, non-hierarchical user communities, such as those served by the Web or peer-to-peer (p2p) systems, for three reasons [17]:

  1. An ontology, no matter how elaborate and well constructed, can only represent either the ontologist's view of the domain or, at best, the consensus reached among a limited number of ontologists/stakeholders. Naturally, different individuals will have different views of the domain, so a 'one size fits all' approach will not work in practice.

  2. The use of formal ontologies for annotation requires a certain level of knowledge both about ontological structures and about the specific ontology at hand. This level of knowledge cannot generally be assumed in large, loosely-coupled user communities, whose members have varying levels of domain expertise and technical competency.

  3. Even if consensus on a common ontology can be achieved, the ontology may not be able to keep pace with the fast rate of change of the targeted web resources, or with the change of user vocabularies in their applications.



A couple of important questions need to be asked here, and answering them may help some directions take shape: 'why do people use tags?' and 'how can tags aid in modeling and accessing mm documents?'



More often than not, people use tags as a sort of 'mental note'; their main use is to provide easy access to the tagged documents [7]. Although at first this may seem like a purely information retrieval (IR) issue, it can have some far-reaching consequences:

  1. As already mentioned, tags tend to represent high-level concepts; since people use tags intuitively, tagging resembles the 'free association' experiment [21], in which subjects are asked to name the concepts triggered by certain stimuli.

  2. Not surprisingly, URLs are already being used as tags in some cases [7]. The net effect is the creation of links between documents that their authors did not foresee. Tags of this kind are a special case since, in contrast to 'normal' tags, they provide links to documents outside the scope of a specific tag-based platform.

  3. Combined, the above 2 points make for a rather rich navigation model: each tagged document may be associated not only with other similarly tagged documents, but also directly with any document (via the aforementioned URL tagging), or with part of a document (taking fragment identifiers into account).

  4. The set of tags attached to a certain document can be seen as a coherence [8]. Adopting the bootstrapping paradigm for knowledge representation could therefore be one way of circumventing the semiotic dilemma faced when trying to apply linguistic models to express the meaning of mm documents: whilst visual media such as photography, painting and drawing have lines, colours, shadings, shapes, proportions and so on which are 'abstractable and combinatory', and which 'are just as capable of articulation, i.e. of complex combination, as words', they have no vocabulary of units with independent meanings [18]. So, a mm document's meaning cannot be expressed in linguistic/ontological terms, but rather through neighboring/overlapping coherences.
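The navigation model of points 2 and 3, with each document's tag set treated as a coherence whose overlap with other coherences defines a neighbourhood, could be sketched as below. Measuring overlap with the Jaccard coefficient, the 0.3 threshold, and recognising URL tags by an http(s) prefix are all assumptions made for illustration; [8] does not prescribe any of them.

```python
from typing import Dict, List, Set, Tuple


def is_url_tag(tag: str) -> bool:
    # Assumption: a tag is a URL tag if it looks like an http(s) URL.
    return tag.startswith(("http://", "https://"))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two tag sets (coherences), from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def neighbours(doc: str, tags: Dict[str, Set[str]],
               threshold: float = 0.3) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Return (similar documents ranked by overlap, direct URL links) for doc.

    URL tags are kept out of the overlap computation: they act as direct
    links, possibly to documents outside the tagging platform, and may carry
    fragment identifiers (e.g. '#intro') that point into part of a document.
    """
    own = {t for t in tags[doc] if not is_url_tag(t)}
    links = sorted(t for t in tags[doc] if is_url_tag(t))
    scored = []
    for other, other_tags in tags.items():
        if other == doc:
            continue
        score = jaccard(own, {t for t in other_tags if not is_url_tag(t)})
        if score >= threshold:
            scored.append((other, score))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored, links
```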



However, as already pointed out in [8], coherences are non-directional structures, and this is a shortcoming: non-directionality is less general than directionality, as it corresponds to symmetry, whereas relations in general may or may not be symmetric. So, in order to support richer semantics, some sort of directionality is needed; this is usually expressed in terms of directed links that denote the exact relationship between concepts (represented by tags, in our paradigm). Directionality does not necessarily have to be hierarchical, though; other relations such as 'same-as' or 'inverse-of' may be used as well.
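One way to picture the step from non-directional coherences to directed relations is a small store of typed links between tags. Some relation types are declared symmetric, and others are declared inverses of each other. The relation names ('broader', 'narrower', 'same-as', 'related') and the closure rules below are hypothetical labels loosely echoing the text, not a fixed vocabulary. 'broader' is read in the SKOS sense: ("rose", "broader", "flower") means flower is broader than rose.

```python
from typing import Dict, Set, Tuple

# Hypothetical relation vocabulary: which relations hold in both directions,
# and which relation pairs are inverses of each other.
SYMMETRIC = {"same-as", "related"}
INVERSE: Dict[str, str] = {"broader": "narrower", "narrower": "broader"}


class TagRelations:
    """Directed, typed links between tags, with symmetric/inverse closure."""

    def __init__(self) -> None:
        self.edges: Set[Tuple[str, str, str]] = set()

    def add(self, source: str, relation: str, target: str) -> None:
        self.edges.add((source, relation, target))
        if relation in SYMMETRIC:
            # Symmetric relations hold in both directions.
            self.edges.add((target, relation, source))
        elif relation in INVERSE:
            # A directed relation implies its inverse in the other direction.
            self.edges.add((target, INVERSE[relation], source))

    def related(self, tag: str, relation: str) -> Set[str]:
        """Tags reachable from `tag` via one link of the given type."""
        return {t for (s, r, t) in self.edges if s == tag and r == relation}
```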



But what can tags do to aid in modeling and accessing mm documents? As hinted by Langer [18], language (or text, its digital manifestation) is fundamentally different from the senses (or multimedia, their digital manifestation).



Language is crafted, and therefore structured, because it serves a purpose: to communicate effectively. At least, this is usually the case. Therefore, with the exception of language used in an artistic context (poetry, for example), sentences generally have a fixed structure and convey specific messages.



An image, or for that matter a moving image backed by sound, is different in the sense that, generally speaking, it serves no particular purpose: it is simply conceived. There are naturally documents created with a specific purpose, conveying a specific message and following more or less fixed structures; television programs and films are prime examples. But again, this has more to do with their structure than with the content itself.



Or, to put it differently, 'a rose is a rose is a rose': there is no intrinsic 'Aristotelian' meaning in it; everything depends on the context and on the interpretation given to it by the receiver. So, although formal semantics may be used to model domains that are structured and relatively limited, they may not be appropriate for capturing the 'meaning' of multimedia content in the general case, simply because there does not have to be one per se. Audiovisual documents can be considered externalized fragments of memory, which in fact seems to be very much in line with Frank Nack's view [9]. These may be fragments of our own memory/imagination (such as personal pictures) or of somebody else's memory/imagination (such as digitized museum exhibits) that eventually also become our own memories, after being 'consumed'.



This approach acknowledges that there can be no equivalent of iconic sentences and phonemes, and that the enumeration of objects identified in multimedia documents does not define their meaning. Since 'non-discursive' media are more complex and subtle than verbal language and are 'peculiarly well-suited to the expression of ideas that defy linguistic "projection"', we should not seek to impose linguistic models upon other media, since the laws that govern their articulation 'are altogether different from the laws of syntax that govern language'. Treating them in linguistic terms leads us to 'misconceive' them: they resist 'translation' [18]. Their meaning can instead be defined by each document's association with documents classified under similar concepts, or directly associated with certain other documents [1].



Bridging the two seemingly opposite ends of the annotation spectrum, unstructured tagging/folksonomies on one side and formal semantics/ontologies on the other, could be very beneficial for both worlds, combining ease of use with the ability for machine processing. Some first steps in that direction have been taken and are actually in use in production systems [7]: for example, tag bundles (hierarchies of tags), tag clouds (visualizations of a set of tags based on their frequency of use) and tag suggestions (showing commonly used tags for resources that have already been tagged by other users).
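Two of the mechanisms named above, tag clouds and tag suggestions, come down to frequency counting. A minimal sketch follows. Two choices in it are common assumptions rather than a description of any particular production system [7]: font sizes are scaled linearly between a hypothetical minimum and maximum, and suggestions are the tags other users most often attached to the same resource.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def tag_cloud(tag_uses: Iterable[str], min_px: int = 10,
              max_px: int = 32) -> Dict[str, int]:
    """Map each tag to a font size that grows with how often it is used.

    Linear scaling between min_px and max_px is an assumption; logarithmic
    scaling is another common choice.
    """
    counts = Counter(tag_uses)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {t: min_px + round((c - lo) * (max_px - min_px) / span)
            for t, c in counts.items()}


def suggest_tags(resource: str, assignments: Iterable[Tuple[str, str, str]],
                 user: str, k: int = 3) -> List[str]:
    """Suggest the k tags other users most often attached to this resource.

    assignments are (user, tag, resource) triples.
    """
    counts = Counter(t for (u, t, r) in assignments
                     if r == resource and u != user)
    return [t for t, _ in counts.most_common(k)]
```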



Approaches like these, combined with, e.g., rules mined from user navigation in the content space and some minimal user intervention, could eventually lead from a basic personal taxonomy (a 'personomy' [11]) to the extraction of an implicit, lightweight personal ontology. This in turn could be leveraged in the Emergent Semantics approach (pair-wise local interactions between parties that make use of different schemata/ontologies). Furthermore, such an ontology could be used to annotate content that is either provided by its 'owner' or simply encountered while browsing. This field of research is an emerging one, but some results have already been published in a WWW06 workshop.
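How a flat personomy might yield a lightweight hierarchy is an open question. One simple technique, swapped in here purely for illustration and not taken from the sources cited above, is co-occurrence subsumption: tag x is proposed as broader than tag y if most resources tagged y are also tagged x, but not vice versa. The thresholds (`p_min`, `min_support`) are arbitrary assumptions.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple


def subsumptions(personomy: Dict[str, Set[str]], p_min: float = 0.8,
                 min_support: int = 2) -> List[Tuple[str, str]]:
    """Return candidate (broader, narrower) tag pairs from co-occurrence.

    personomy maps each of one user's tags to the resources tagged with it.
    x is proposed as broader than y when P(x|y) >= p_min and P(y|x) < 1,
    i.e. y's resources are (mostly) also x's, but x covers more.
    """
    pairs = []
    for x, y in permutations(personomy, 2):
        rx, ry = personomy[x], personomy[y]
        if not rx or len(ry) < min_support:
            continue  # too little evidence to judge this pair
        both = len(rx & ry)
        if both / len(ry) >= p_min and both / len(rx) < 1:
            pairs.append((x, y))
    return sorted(pairs)
```

The resulting pairs are only candidates. Presenting them to the user for confirmation is one place where the 'minimal user intervention' mentioned above could come in.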



One emerging standard that seems to be a natural match for this setting is SKOS [15]. Its intended use is to enable the expression of thesauri, taxonomies and folksonomies using SW standards. There are in fact already some results [12] on how this conversion could be standardised. Applying similar automated methods to personomies could transparently produce their SKOS manifestation, which would enable existing platforms to integrate semantic metadata. A similar approach has been proposed, for example, for blogging platforms, enriching blogging to turn it into semantic blogging [10]. Although that application does not use SKOS, it demonstrates how SW metadata can be added transparently to enhance the user experience.
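To make the idea of a 'SKOS manifestation' of a personomy more concrete, the sketch below serialises tags and (broader, narrower) tag pairs as SKOS concepts in Turtle. `skos:Concept`, `skos:prefLabel` and `skos:broader` are SKOS Core terms [15]. The base URI, the slug-based rule for minting concept URIs and the function names are hypothetical. A real converter would more likely use an RDF library and handle escaping and language tags more carefully.

```python
import re
from typing import Dict, Iterable, List, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"


def slug(tag: str) -> str:
    # Hypothetical URI-minting rule: lowercase, runs of other chars -> '_'.
    return re.sub(r"[^a-z0-9]+", "_", tag.lower()).strip("_")


def personomy_to_skos(base: str, tags: Iterable[str],
                      broader: Iterable[Tuple[str, str]]) -> str:
    """Serialise tags and (broader, narrower) pairs as SKOS in Turtle."""
    lines: List[str] = [f"@prefix skos: <{SKOS}> .", f"@prefix t: <{base}> .", ""]
    parents: Dict[str, List[str]] = {}
    for b, n in broader:
        parents.setdefault(n, []).append(b)
    for tag in sorted(tags):
        # Escape backslashes and quotes so the label is a valid Turtle string.
        label = tag.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"t:{slug(tag)} a skos:Concept ;")
        for b in sorted(parents.get(tag, [])):
            lines.append(f"    skos:broader t:{slug(b)} ;")
        lines.append(f'    skos:prefLabel "{label}" .')
    return "\n".join(lines)
```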



This approach is definitely promising, as it tackles the high cost of SW metadata creation by making metadata easy to produce. It does not, however, address another fundamental issue: incentive. However easy adding metadata may be, users will be much more inclined to do so if there is an immediate reward for them. The p2p world may have something to offer in this area, as the 'free rider effect' [14] has been extensively studied there. It has been observed that in resource-sharing communities many users tend to simply reap the benefits of the resources others contribute, without contributing anything in return [20]. In a collaborative annotation approach like tagging, users may be less willing to tag, or to share their tags with others, if they know or expect that someone else will do it for them.



One interesting approach developed in the p2p world to deal with this issue is the design of mechanisms that give users incentives to contribute resources [22]. One of the best-known and simplest incentive strategies is the 'tit-for-tat' strategy adopted by BitTorrent: peers preferentially upload to those peers that upload to them in return, so contributing is rewarded with better service, while newcomers are given an initial chance to download through 'optimistic unchoking'. The same idea could be extended to metadata: if metadata are also treated as a contribution, people will have an incentive to annotate. Furthermore, by personalizing metadata and tracing them back to their origin, not only their quantity but also their quality could be assessed [11].
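Carried over to metadata, a reciprocity rule could let a peer serve annotations to others only while their own contributions keep pace with what they receive. The sketch below is a hypothetical scheme. The class name, the grace allowance and the required ratio are all assumptions. It follows the simple 'initial allowance, then give at least as much as you take' reading of tit-for-tat, not BitTorrent's actual choking algorithm.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Account:
    contributed: int = 0  # annotations (e.g. tag assignments) shared
    consumed: int = 0     # annotations received from others


class MetadataLedger:
    """Hypothetical reciprocity rule for exchanging metadata between peers."""

    def __init__(self, grace: int = 10, min_ratio: float = 1.0) -> None:
        self.grace = grace          # annotations a newcomer may take for free
        self.min_ratio = min_ratio  # required contributed/consumed ratio
        self.accounts: Dict[str, Account] = {}

    def _acct(self, peer: str) -> Account:
        return self.accounts.setdefault(peer, Account())

    def record_contribution(self, peer: str, n: int = 1) -> None:
        self._acct(peer).contributed += n

    def may_receive(self, peer: str, n: int = 1) -> bool:
        """True if, beyond the grace allowance, the peer has contributed
        at least min_ratio times what it would have consumed."""
        a = self._acct(peer)
        beyond_grace = max(0, a.consumed + n - self.grace)
        return a.contributed >= self.min_ratio * beyond_grace

    def serve(self, peer: str, n: int = 1) -> bool:
        """Hand out n annotations if allowed; record the consumption."""
        if self.may_receive(peer, n):
            self._acct(peer).consumed += n
            return True
        return False
```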



In that respect, a unified, user-centered platform that allows personalized browsing of content could very well serve this purpose. A very interesting approach towards this end is presented in [6]: its vision is to create a user-centered, unified platform that allows browsing existing content as well as publishing new content, both on the web and in the p2p space. In fact, it already focuses on the use of tags for the annotation of mm documents (which make up the largest portion of p2p traffic), and also builds on an incentive mechanism, so it seems like a natural match.



We have already mentioned the emergent approach to semantics, according to which the semantics of any item are not (only) defined by static metadata, but also by its context and by user interaction. One issue here, however, is privacy: users may be justifiably reluctant to have their navigation patterns, personal preferences etc. mined for purposes and by means outside their control. Avoiding this issue would mean changing the perspective from 'source-centered' to 'user-centered': interactions would not be mined by and for the source, but by and for the user, and would remain under the user's explicit control. Thus, while helping create the 'implicit personal ontology/profile' mentioned earlier, explicitly shared data could also be used to improve other users' experience, selectively, with access granted on a per-profile, per-group or per-user basis.
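A user-centered arrangement of this kind can be sketched as a profile held by its owner. Each facet of the profile is shared only with users or groups that were explicitly granted access. The facet names, the `group:` naming convention and the class itself are hypothetical. They illustrate per-user and per-group control, not the API of any specific platform.

```python
from dataclasses import dataclass, field
from typing import AbstractSet, Dict, Optional, Set


@dataclass
class PersonalProfile:
    """Hypothetical user-held profile: mined by and for its owner."""
    owner: str
    facets: Dict[str, dict] = field(default_factory=dict)      # e.g. 'tags'
    grants: Dict[str, Set[str]] = field(default_factory=dict)  # facet -> grantees

    def share(self, facet: str, grantee: str) -> None:
        """Explicitly grant a user or a group (e.g. 'group:photo-club')."""
        self.grants.setdefault(facet, set()).add(grantee)

    def revoke(self, facet: str, grantee: str) -> None:
        self.grants.get(facet, set()).discard(grantee)

    def read(self, facet: str, requester: str,
             groups: AbstractSet[str] = frozenset()) -> Optional[dict]:
        """Return the facet only to its owner or to an explicit grantee;
        anyone else gets None, as if the data did not exist."""
        allowed = self.grants.get(facet, set())
        if requester == self.owner or requester in allowed or allowed & groups:
            return self.facets.get(facet)
        return None
```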



In addition, the existing feedback/collaborative filtering mechanism in Tribler [6] could be nicely complemented by the approach described in [5]. That approach, again, is 'source-centered': it focuses on retrieving data from web sites, based on usage logs. It could also be applied in a unified, user-centered environment, where data would be mined on the fly as user navigation takes place.
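The sketch below suggests how usage-based learning might run on the user's side instead of from server logs. It strengthens links between documents visited in sequence and lets all link weights decay slightly at each step. It is a simplified Hebbian-style update loosely inspired by [5], not the exact algorithm of Heylighen and Bollen. The learning rate, decay factor and window size are hypothetical parameters.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class NavigationLearner:
    """Simplified Hebbian-style link learning from one user's navigation."""

    def __init__(self, rate: float = 1.0, decay: float = 0.99,
                 window: int = 3) -> None:
        self.rate = rate    # reward for a directly consecutive visit
        self.decay = decay  # multiplicative forgetting applied on every visit
        self.recent: Deque[str] = deque(maxlen=window)
        self.weights: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))

    def visit(self, doc: str) -> None:
        # Forget a little, so links that stop being used gradually fade.
        for links in self.weights.values():
            for target in links:
                links[target] *= self.decay
        # Reward links from recently visited documents to this one; documents
        # further back in the window receive a smaller share of the reward.
        for distance, prev in enumerate(reversed(self.recent), start=1):
            if prev != doc:
                self.weights[prev][doc] += self.rate / distance
        self.recent.append(doc)

    def recommend(self, doc: str, k: int = 3) -> List[str]:
        """The k most strongly linked documents from doc."""
        links = self.weights.get(doc, {})
        return sorted(links, key=lambda t: (-links[t], t))[:k]
```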



References:

[1] Aberer, K., Cudré-Mauroux, P., et al.: Emergent Semantics Principles and Issues.

[2] Staab, S., Santini, S., Nack, F., Steels, L., Maedche, A. (2002): Emergent Semantics. IEEE Intelligent Systems, 17(1), pp. 78-86.

[3] Santini, S., Gupta, A., Jain, R. (2001): Emergent Semantics Through Interaction in Image Databases. IEEE Transactions on Knowledge and Data Engineering, Vol. 13, pp. 337-351.

[4] Grosky, W. I., Sreenath, D. V., Fotouhi, F. (2002): Emergent Semantics and the Multimedia Semantic Web. SIGMOD Record, Vol. 31, pp. 54-58.

[5] Heylighen, F., Bollen, J. (2002): Hebbian Algorithms for a Digital Library Recommendation System. In Proceedings 2002 International Conference on Parallel Processing Workshops, IEEE Computer Society Press.

[6] Fokker, J., Pouwelse, J., Buntine, W.: Tag-Based Navigation for Peer-to-Peer Wikipedia. To appear in WWW06.

[7] Guy, M., Tonkin, E. (2006): Folksonomies: Tidying up Tags? D-Lib Magazine, 12(1), January 2006.

[8] Heylighen, F. (2001): Bootstrapping Knowledge Representations: from Entailment Meshes via Semantic Nets to Learning Webs. Kybernetes 30(5/6), pp. 691-722.

[9] Nack, F. (2005): You Must Remember This. IEEE MultiMedia, Vol. 12, No. 1, pp. 4-7.

[10] Bojars, U., Breslin, J., Moller, K. (2006): Using Semantics to Enhance the Blogging Experience. 3rd European Semantic Web Conference, 11-14 June 2006, Budva, Montenegro.

[11] Hotho, A., Jäschke, R., Schmitz, C., Stumme, G. (2006): Information Retrieval in Folksonomies: Search and Ranking. 3rd European Semantic Web Conference, 11-14 June 2006, Budva, Montenegro.

[12] van Assem, M., Malaisé, V., Miles, A., Schreiber, G. (2006): A Method to Convert Thesauri to SKOS. 3rd European Semantic Web Conference, 11-14 June 2006, Budva, Montenegro.

[13] Gudivada, V., Raghavan, V. V. (1995): Content-Based Image Retrieval Systems. IEEE Computer, 28(9), pp. 18-22.

[14] Adar, E., Huberman, B. (2000): Free Riding on Gnutella. Technical report, Xerox PARC, 10 Aug. 2000.

[15] SKOS Core: http://www.w3.org/2004/02/skos/core/

[16] W3C Multimedia Semantics Incubator Group: http://www.w3.org/2005/Incubator/mmsem/

[17] Wu, X., Zhang, L., Yu, Y. (2006): Exploring Social Annotations for the Semantic Web. In Proceedings of the 15th International Conference on World Wide Web (WWW '06), Edinburgh, Scotland, May 23-26, 2006. ACM Press, New York, NY, pp. 417-426.

[18] Langer, S. K. (1951): Philosophy in a New Key: A Study in the Symbolism of Reason, Rite and Art. New York: Mentor.

[19] Mika, P. (2005): Ontologies Are Us: A Unified Model of Social Networks and Semantics. In Proc. of the 4th International Semantic Web Conference (ISWC 2005).

[20] Shneidman, J., Parkes, D. (2003): Rationality and Self-Interest in Peer to Peer Networks. International Workshop on Peer-to-Peer Systems (IPTPS).

[21] Edinburgh Associative Thesaurus: http://www.eat.rl.ac.uk/

[22] Feigenbaum, J., Shenker, S. (2002): Distributed Algorithmic Mechanism Design: Recent Results and Future Directions. In Proceedings of the 6th International Workshop on Discrete Algorithms and Methods for Mobile Computing and Communications, ACM Press, New York, pp. 1-13.