Lynda's view of the current state of our research
25 Jan 2001, edited 19 Feb 2001, edited 8 May 2001

This piece is meant to give an insight into where I think the group's research is and where it is going. It is meant to be broad and not go into any particular topic in detail. It is a snapshot of where we are, and it will be interesting in a year or so to see where we went.

The sections are based on the figure of 5 layers (left-hand side; the layers are processing steps) and 4 experts (right-hand side; each expert is a knowledge repository). Layers: communication means, communicative devices, qualitative constraints, quantitative constraints, final presentation. Experts: application, user, design, context. Background information can be found in the WWW10 paper, the HT 00 paper and Jacco's HT 01 Semantic Web submission (correction, rejection :-( ).

Discourse model
---------------

We need to understand what the discourse model layer is, both in terms of what the structures of the layer should be (e.g. is RST a good idea or not), and in terms of how it fits in with the annotated database items (see Application Expert).

We need a (currently manually created) connection between the domain-specific ontology and the communicative devices. (There is a parallel here with style sheets between an XML document and a final presentation, where the ontology is the "XML document", the communicative device is the "final presentation" and the rhetorical mapping is the "style sheet". One question is: can we ever expect the mapping from the domain-specific ontology to the communicative devices to be done automatically?)

The information about how to communicate the content is something like the pedagogical style in learning environments. It may be a domain-independent (i.e. not domain-specific) ontology (or several domain-independent ontologies), or it may not. If it isn't an ontology, then could it be an XML schema? I think the question I am asking is whether this "rhetorical mapping" layer should be information rich, and processable in its own right, or whether it should be somewhat lighter, such as a style sheet, and just describe a simple mapping. I think we need to try things out before we can really answer this.

An article by Boll et al. (see leesklub 7th February 2001) assumes a consistent storyline and replaces media items in a document with semantically similar ones (where "semantically similar" can also depend on the context). They do not take into account, nor even ask the question, whether the "discourse roles" of the substitute media items are equivalent.

I think the TUE group have an RMM notion, with database slices. We have instead a fragment of the ontology, and the "presentation rules" (is this the correct RMM term?) are not just hypertext pages but include space, time and links. I think they go from the domain-specific ontology directly to the presentation and miss out the rhetorical part.

Communicative devices
---------------------

The processing layer needs to select appropriate communicative devices, but a lot has to happen before we get that far. A lot of communication needs to happen between this process and the Design Expert (including the Network Expert) and the User Expert (including the Context Expert), and then a decision is made as to which communicative devices are going to be used. Here perhaps we need to know what the options are for the individual communicative devices and how to take into account the information in the experts.
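As a toy illustration of both points - the "style sheet" reading of the rhetorical mapping, and a device choice that consults the experts - here is a minimal sketch in Python. Every name in it is invented and nothing like this is implemented; it deliberately sits at the "simple mapping" end of the spectrum, where the information-rich alternative would replace the table with something processable in its own right.

    # A toy "rhetorical style sheet": RST-like discourse relations (invented
    # names) mapped to communicative devices, themselves placeholders for
    # combinations of links, spatial and temporal layout.
    RHETORICAL_MAPPING = {
        "elaboration": "spatial-side-by-side",  # show both items together
        "sequence":    "temporal-slideshow",    # present items one after another
        "contrast":    "spatial-side-by-side",
        "background":  "link-to-secondary",     # put the detail behind a link
    }

    def select_device(relation, design_expert, user_expert):
        """Pick a device for a discourse relation, letting the experts veto
        the default. That each expert reduces to an allows() test is pure
        invention for the sake of the sketch."""
        device = RHETORICAL_MAPPING.get(relation, "link-to-secondary")
        if not (design_expert.allows(device) and user_expert.allows(device)):
            device = "link-to-secondary"  # fall back to the weakest device
        return device

    class PermissiveExpert:
        """Stand-in expert that accepts any device, just to run the sketch."""
        def allows(self, device):
            return True

    print(select_device("sequence", PermissiveExpert(), PermissiveExpert()))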
We think we know what the communicative devices are - combinations of links, spatial and temporal layout. They seem suspiciously like hypermedia design patterns. Hopefully Susanne and I can do some investigation in this direction. Franca Garzotto is already involved with some work in this direction, and has colleagues in Lugano. She has already given me a pointer to a design pattern collection: http://www.designpattern.lu.unisi.ch

Research questions here: to what extent are existing design patterns suitable for our time-based world (link-based and spatial ones should already exist and be applicable)? Can we then encode them and have them usefully incorporated in the generation system? What sort of encoding? What sort of decision making is needed to select among them? Do we need to find other, more time-based ones (looking to the worlds of film, TV, or even radio)? Can we already take some existing things and add them to the software framework? (No programming time?) Should I first try to write some things down and then look at how to express them?

Sources for ideas on potential communicative devices are the Scott McCloud books and the Tufte book. I'd like to see if I can at least write down some ideas in English. (Maybe they won't have the status of patterns, since they are not common solutions to a problem, but it should be interesting trying to write them down.) (I am currently working on this.)

Qualitative (and quantitative) constraints
------------------------------------------

I'm not sure what to say about this, other than: read the WWW10 paper. Hopefully this is Joost's master's work, with joint supervision by Krzysztof Apt. Does he need other help? CWI? NL? International?

Generating final form presentation
----------------------------------

So we generate SMIL. We should also be able to generate WML, or something else that is different. We can also generate annotated SMIL, i.e. SMIL+RDF. MPEG-4 has also been suggested.

An example of where adding semantics to the generated presentation is useful is the following. When generating presentation-level links there are two sorts - one sort where the information is less related, and another sort where the items should have been on the screen together, but there was no space. This is where adding annotation to the generated document can be very useful for later processing - e.g. all the hub/authority algorithms need to know why a link was there, although they would need to understand the annotation (which can be looked up at the URL of the ontology, right?). There are (at least) two types of knowledge that can be included - the content-based ontology and the communicative means ontology (if it is an ontology). This then gets fairly close to the Boll et al. paper on including annotated media elements in a document. See Mike Uschold's paper on a categorisation of ontologies and what they can be used for (the KAW99 paper).

Application expert
------------------

This is where we get to the ontology work. There are 4 separate things: a domain-specific ontology; annotations from this ontology attached to the media items; a multimedia-specific ontology (video has scenes, scenes have sequences); and some way of finding relevant media items.

We need a domain-specific ontology (we don't want to create this ourselves). The media items in the database need to be annotated with terms from it. Again, we don't want to do this ourselves. It would also be preferable if the annotations were attached within the media, and not just to complete objects.
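To fix ideas, a minimal sketch of annotated media items and concept-based selection (Python; the items, terms and structure are all invented, and real annotations would presumably be RDF statements from the domain-specific ontology rather than bare sets of terms):

    # Toy annotated media database: each item carries concept terms from a
    # domain-specific ontology (the example terms are made up).
    MEDIA_DB = [
        {"id": "img-042", "type": "image", "concepts": {"lion", "savanna"}},
        {"id": "vid-007", "type": "video", "concepts": {"lion", "hunting"}},
        {"id": "txt-118", "type": "text",  "concepts": {"lion", "taxonomy"}},
    ]

    def find_relevant(query_concepts, media_db=MEDIA_DB):
        """Return items whose annotations overlap the user's query, which is
        itself phrased in terms of the domain-specific ontology. How items
        are actually found is hidden behind this interface - we only care
        what they are about."""
        return [item for item in media_db
                if item["concepts"] & set(query_concepts)]

    print(find_relevant({"hunting"}))  # just vid-007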
As well as the domain-specific ontology we may also need a multimedia ontology, describing the composition of the individual media. The annotated items are going to be used for information retrieval as well as for presentation generation. (A WWW10 article by Ronny Lempel and Aya Soffer, "PicASHOW: Pictorial Authority Search by Hyperlinks on the Web", shows that you don't need ontologies to do image retrieval, and that successful tools can be built on top of text-search engines using link analysis. I guess we don't mind - we need to know what the items are about to start with, but _how_ they are found we shouldn't have to care about.)

I think the user query should be defined in terms of the domain-specific ontology. How that is done may be through a sort of selection interface (I have an animal classification CD where the user can browse through the categories), or via a typed-in query, or some combination in the background of following the user's browsing. Having found the relevant images, texts, audio(?) and video(?), we want to combine them in a presentation. We need the domain-specific ontology.

More details on multimedia and ontologies are in the blue note [[cross-reference to be created]] "Relationship between ontologies and multimedia". How does the application expert relate to all the ontologies mentioned there? (There is a one-sheet talk for the VU which sort of shows the application expert and its relationship with the annotations and the MM DB: http://www.cwi.nl/~media/semantics/GenerationVU.pdf)

Question - do we need a multimedia-specific ontology? Or do we just think we do? Can we write a sensible argument either way? Note that Jane Hunter's paper at WWW10 is extremely relevant for this (leesklub 9 May 2001). She suggests combining RDF and XML Schema, using each for representing what it is best at.

User expert
-----------

I don't think we really want to research this, but we do need to get some user models from somewhere. Can Susanne help? We need to know what needs to be in the user model, and how it is used in the various layers of the generation process. I presume the variance will take place in the "upper" processing levels, such as using different communication means or communicative devices. It would be nice if some user characteristics applied more to one layer than another.

Design expert
-------------

The Design Expert needs to know about different users, different hardware and different network conditions. It also needs to be a repository for design information, such as different genres and styles. It may be the place to store the collection of communicative devices we should be developing. Perhaps they need to be stored here, since we may want to use different collections of communicative devices for different hardware, different users or different available network bandwidths.

Perhaps we also need a Network Expert - an interface to the network that can be used by different parts of the processing chain. For example, we may want to build large parts of the presentation knowing that there will be an approximately consistent bandwidth available, but at the final stage small variations are needed to send the video at the most appropriate bandwidth.

We need some sets of design rules, based on hypermedia design patterns, and more. Also layout techniques, and even film techniques. Since we don't have many years of time-based hypermedia experience, we'll just have to try to capture what we can find.
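For the possible Network Expert, a minimal sketch of the "one interface, consulted at different stages of the chain" idea (Python; the class, method names and numbers are all invented):

    # Toy Network Expert: the early stages ask a coarse question, the final
    # stage a fine-grained one, against the same measured state.
    class NetworkExpert:
        def __init__(self, measured_kbps):
            self.measured_kbps = measured_kbps

        def bandwidth_class(self):
            """Coarse answer for building the bulk of the presentation
            against an approximately consistent bandwidth."""
            return "low" if self.measured_kbps < 128 else "high"

        def pick_video_variant(self, variants_kbps):
            """Final-stage answer: the highest-bandwidth video variant that
            still fits what is currently measured."""
            fitting = [v for v in variants_kbps if v <= self.measured_kbps]
            return max(fitting) if fitting else min(variants_kbps)

    net = NetworkExpert(measured_kbps=300)
    print(net.bandwidth_class())                   # "high"
    print(net.pick_video_variant([64, 256, 512]))  # 256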
Perhaps I need to draw some diagrams of the information flow between the communicative device layer and the design and user experts (and the network expert?).

Context expert
--------------

I'm not too interested in this. It should be a history box; we will find out whether we need it or not, and once we know that, we can condense knowledge out of it. No doubt this will feed the development of the user's online persona... Similar to the User Expert - we want to use it but not build it. Fits in nicely with work going on at the TUE - Hong Ying and Alexandra Cristea starting in May.

-----

The work of the group

So where do people fit in here?

Michèle is looking at integrating multimedia and semantic markup formats. Not part of the processing per se, but a tool to be used within the process. He is also looking at integrating multiple MM information sources within the environment. Definitely part of the Application Expert.

FrankN is looking at trying to extract low-level characteristics of (video) data and then combine them into higher-level descriptions of the material. This is partly to do with annotating the media (Application Expert) and partly to do with the Design Expert (what are the things we can talk about in the Design Expert?). Stephane is interested in the consequences of such an approach for high-level authoring issues.

Jacco - RDF/XML Schema issues
Joost - qualitative constraints
Lloyd - XML fragments for MMM 01
Lynda - make design knowledge explicit
Saurabh - network adaptivity
Susanne - user profiles

-----

Miscellaneous notes

Don't forget that what we are doing (to some extent) is automating the stages of HDM.

The Optima system (Lambert Schomaker) follows the user's browsing and uses it to build up the user expert and context expert. Would be nice to see it working.

---***---