In recent years we have seen a growing interest in the design and implementation of networked virtual information spaces, which are used for engineering and design, education and training, entertainment, and commerce. Despite the technical challenges of such systems, such as communication architecture, dynamic shared state, resource management, etc., it becomes apparent that the real challenge is to use the increasingly available amount of mainly audio-visual information in a more intelligent and purposeful way.
To achieve this, we need access to the semiotic and semantic levels of information that are hidden in the unified structure of a single image, video, audio, or tactile unit resulting from the composition of all its elements.
One approach to this problem is to develop schemata which, once instantiated, contain the required information in a structured way. However, if the content representation becomes part of a real-time, interactive environment, e.g. a set-top box based application, we have to ensure that the content representation structures are themselves streamable, as sketched below.
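To make the streamability requirement concrete, here is a minimal sketch of the sender side: an instantiated description is cut into self-contained fragments that a terminal could decode one at a time. All element names are hypothetical illustrations, not actual MPEG-7 description schemes.

```python
import xml.etree.ElementTree as ET

# A hypothetical instantiated content description; the element names
# are invented for illustration only.
doc = ET.fromstring(
    "<VideoDescription>"
    "<Segment start='0.0' end='12.4'>title sequence</Segment>"
    "<Segment start='12.4' end='97.1'>studio interview</Segment>"
    "</VideoDescription>"
)

def fragments(description):
    """Yield each child as a self-contained, independently parseable
    XML fragment, so the receiver never needs the whole document."""
    for child in description:
        yield ET.tostring(child, encoding="unicode")

for unit in fragments(doc):
    print(unit)  # each unit could be packetised and sent on its own
```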
Within MPEG-7 a number of ideas are being discussed to solve such problems, e.g. the development of annotation structures with the ability to change and grow, partial parsing, multiplexing of channels, etc.
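A partial-parsing strategy can be sketched on the receiver side with an event-based parser that hands each completed annotation element to the application as soon as its closing tag has arrived, instead of waiting for the complete document tree. The description below reuses the hypothetical element names from the previous sketch.

```python
import io
import xml.etree.ElementTree as ET

# The same hypothetical description, now consumed incrementally as a
# byte stream rather than as a finished document.
stream = io.BytesIO(
    b"<VideoDescription>"
    b"<Segment start='0.0' end='12.4'>title sequence</Segment>"
    b"<Segment start='12.4' end='97.1'>studio interview</Segment>"
    b"</VideoDescription>"
)

for event, elem in ET.iterparse(stream, events=("end",)):
    if elem.tag == "Segment":
        # Act on the annotation immediately; the rest of the document
        # may still be in transit.
        print(elem.get("start"), elem.get("end"), elem.text)
        elem.clear()  # free the subtree so memory stays bounded
```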
For the workshop we would like to report on these strategies and perhaps provoke a discussion about this and related problems, e.g. how to stream XML in general and how to connect XML and media above the level of a mere link from an XML document to an audio or video source.
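One possible reading of connecting XML and media above the level of a link is to anchor description elements to time intervals of the stream, so that the description can drive playback rather than merely point at a source. A hedged sketch follows; the locator attributes are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented locator syntax: a description element anchoring an
# annotation to a time interval of a media source, rather than
# linking to the source as an opaque whole.
elem = ET.fromstring(
    "<Segment src='rtsp://example.org/news.mpg' start='12.4' end='97.1'>"
    "studio interview"
    "</Segment>"
)

def to_cue(segment):
    """Turn a time-anchored description element into a playback cue:
    media URL plus seek offset and duration in seconds."""
    start = float(segment.get("start"))
    end = float(segment.get("end"))
    return segment.get("src"), start, end - start

url, offset, duration = to_cue(elem)
print(f"play {url} from {offset}s for {duration}s: {elem.text}")
```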