Patrick Schmitz, Ludicrum Enterprises
<cogit@ludicrum.org>
This draft describes the high-level design of a mechanism to use the explicit and implicit semantic relations among the elements in a set of multimedia query results to derive an appropriate presentation structure for automatically generated presentations. The lower layers of the Cuypers system are assumed as a target, and will continue the translation of the presentation pattern into an actual presentation expression in a target language such as SMIL, XHTML+SMIL or SVG. The approach builds on the layer model proposed by Stefano, and is inspired by work in the area of generated diagrams and visualization of semantic query results.
We first describe the background and related work that inspired our approach. We then summarize the relevant pieces of the Cuypers layer model, providing context for the modules we will address. We describe details of the modules of interest, and the approach we intend to take with them. Finally, we mention work we will not attempt in this version, but see as future directions for study.
I know that this is still pretty rough. Feedback on any really confusing bits is useful, as well as skepticism (and of course suggested strategies or omissions!). Bring your questions on Wednesday!
[Need a brief intro to generated presentations. There must be some stuff lying around from Cuypers papers...]
[Need some motherhood and apple pie about the emerging semantic web, and why we think it has potential in application to multimedia presentations. The stuff at the beginning of Susan/Joost/Jane's paper is great in this respect.]
In his thesis and several publications (and more recently in his work with iView.com?), Thomas Kamps pushed the idea of analyzing the set-mathematical qualities of the relations expressed in an ontology (or more generally in annotations that express relations about a set of objects). His goal was to automatically construct diagrams for a set of related objects/facts. While this is different from our goal of mapping appropriate discourse models/narrative structures from the semantics of a data set, I think we can leverage a number of his ideas to good ends.
He generalized relations with respect to the kind of set-related properties they had, including things like transitivity, reflexivity, inclusion, etc. He then used the basic properties of the relations for two important purposes:
We will likely not directly leverage this work, but rather take inspiration from it. We need to consider new relations for time and HFO's, but use the same idea: certain patterns and classes of relations can be manipulated mathematically, and can then be associated with certain discourse structures, visual semantics, and structural semantics in the presentation.
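To make the idea of characterizing relations by their set properties a bit more concrete, here is a minimal sketch that tests a relation (given as a set of pairs) for reflexivity, symmetry and transitivity. The relation names `part_of` and `same_room` and the element names are made up for illustration; this is not taken from Kamps' work, just the textbook definitions.

```python
def is_reflexive(elements, pairs):
    # Every element must be related to itself.
    return all((x, x) in pairs for x in elements)


def is_symmetric(pairs):
    # Whenever x is related to y, y must be related to x.
    return all((y, x) in pairs for (x, y) in pairs)


def is_transitive(pairs):
    # Whenever x-y and y-z are related, x-z must be related too.
    return all(
        (x, z) in pairs
        for (x, y) in pairs
        for (y2, z) in pairs
        if y == y2
    )


def classify(elements, pairs):
    """Summarize the set properties of one relation."""
    pairs = set(pairs)
    return {
        "reflexive": is_reflexive(elements, pairs),
        "symmetric": is_symmetric(pairs),
        "transitive": is_transitive(pairs),
    }


# Hypothetical relations over hypothetical media objects.
part_of = {("a", "b"), ("b", "c"), ("a", "c")}
same_room = {("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")}

part_of_props = classify({"a", "b", "c"}, part_of)
same_room_props = classify({"a", "b"}, same_room)
```

In this toy data `part_of` comes out as transitive only, which suggests an ordering or nesting, while `same_room` has all three properties, which suggests a grouping. That is the kind of distinction we would want to associate with different presentation structures.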
Current research and even a few commercial products are exploring ideas in visualizing results of queries on a semantic web. A good example of this can be found in the Spectacle system [ref?], which is developed as a product by aidministrator.nl. It analyses a set of query results using a very simple ontology (basically they just consider a simple tree of sub-class relationships). They look for groups and patterns in the query results, and then present these to the user with a novel UI. Their user is a knowledge worker, and so the UI is not of direct interest, but their analysis of the grouping patterns is directly useful, and "validates the market" as we in the product world like to say.
I have several other papers that are also looking at initial steps for visualization of semantic relations - initial in the sense that they do not assume extensive annotation or powerful inference engines. I have not gotten past the abstracts of most of them, but have simply been collecting the ones that look promising. I am assuming that we will learn more from these folks, and refine our ideas.
The paper we recently received from Susan (last name?), Joost and Jane Hunter describes an iterative approach to gathering additional data beyond an initial set of query results, to fill out the media for a presentation. They are also assuming Cuypers underneath and generating SMIL presentations. Their work is really concentrating on the trivial data mining that can be done on Dublin Core relationships, in a given database they are working with. I am not entirely convinced by the examples in the paper, but the basic idea echoes our ideas about deducing semantic information from annotations, and using this information to generate better presentations.
In their model, some query refinement is done by the user through an iterative process, although the system does do some as well. If the ideas were automated somewhat more, and extended to leverage the concept of media roles in the discourse model, I think it might work a good deal better. [I know I need to explain this - I will talk about it on Wednesday.] I think this is an idea worth pursuing.
[Add a proper intro to Cuypers with references to recent docs and experience.]
[Address the issue that as I describe and use these entities, they are not really layers but rather modules or simply functional units. This is not at odds with Stefano's model, but rather another view of the same ideas.]
Cuypers as it exists has a fairly solid model of the lower layers in the process of generating presentations. However, at this point it does not have a good model for how to derive and define a discourse model, or how to map from the discourse model down to the existing layers. Most importantly to us, it does not define a mechanism to leverage semantic annotations on the media objects. We have identified several key modules that will address this, and divide the problem into distinct areas. The revised model is presented in Figure 1. Note that the current Cuypers layers are lumped together for simplicity - details of how these lower layers function can be found in [ref?].
Layer/Module | | Inputs
---|---|---
Discourse | <=== | User Model
 | <=== | Ontology of
 | |
Semantic | <==> | Data Input (MMDB)
 | <=== | Domain Ontology
 | <=== | Multimedia meta-ontology
 | |
 | <=== | Ontology?/List?/Heuristics?
 | <=== | Compositional
 | <=== | Ontology of Presentation
 | |
Hypertext Formatting Objects | |
Our work will concentrate on the Semantic Analysis engine and the Presentation Pattern Expert. As part of this, we expect to define what is labeled as the Multimedia meta-ontology, as well as the configuration and annotation of the presentation patterns. The other modules and ontologies depicted in Figure 1 are assumed in our model and we will not describe them in any detail. In particular, the Discourse model and the Compositional Semantics expert are beyond the scope of our work, and yet are important inputs. [We need to figure out how to say enough about these to be clear what we expect from them - we just don't want to explain how they do their work]. In addition, the multimedia database, the associated query engine and the domain ontology for the data are assumed in this model. [Along the way, I expect one of our results will be some comments on how to annotate multimedia data objects to improve the results of our engine, but that is a ways off at this point.]
We do not want to repeat work done in the area of figuring out user preferences or the role the user has adopted (e.g. student, child, expert). We can assume that this info is available to us, including the important aspect of the degree of desired interaction. This determines a lot about the presentation we should generate, and is otherwise related to things like the client device environment (e.g., television is largely passive, kiosks tend to be mildly interactive, and a PC can be highly interactive).
In addition, the nature of the query often dictates (or indicates) a particular model - this is usually because the query UI has actually framed the query as a question to be answered, effectively giving the user a choice of discourse models.
Nevertheless, I think there is room for us to consider the discourse layer from the perspective of the semantic relations we find in the (simple keyword) query results.
This raises the question (for me): do we need to know the discourse model in order to effectively query for the objects? I.e. in order to adopt a certain tone, don't we need to prefer some content (hopefully annotated to fit that tone)? To address a child, is it sufficient to simply omit portions of a text, or should we get a different text altogether? To what extent should the discourse layer and the semantic analysis engine negotiate on the query?
The Semantic Analysis Engine (SAE) is responsible for assigning the roles defined in the discourse model to actual media data items. As part of this, it may query the MMDB for additional media items. Note that unlike the simple approach described by Susan/Joost/Jane, I think that we can only determine the need or appropriateness for additional media if we consider the roles of the initial media. The question then becomes for which roles we lack media given the discourse model at hand. Simply gathering media that may somehow be related is too simplistic, IMO.
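As a rough sketch of the "which roles lack media" question, assume for the moment that each media item already carries a list of roles it could play (how the SAE would actually derive these is exactly what is still open). The role names, item ids and dictionary layout below are all hypothetical.

```python
def assign_roles(discourse_roles, media_items):
    """Map each role in the discourse model to the media items that can fill it."""
    assignment = {role: [] for role in discourse_roles}
    for item in media_items:
        for role in item["roles"]:
            # Ignore roles the current discourse model does not ask for.
            if role in assignment:
                assignment[role].append(item["id"])
    return assignment


def missing_roles(assignment):
    """Roles for which the initial query results supplied no media."""
    return [role for role, items in assignment.items() if not items]


# Hypothetical discourse model and initial query results.
roles = ["introduction", "example", "comparison"]
results = [
    {"id": "img1", "roles": ["example"]},
    {"id": "txt1", "roles": ["introduction", "example"]},
]

assignment = assign_roles(roles, results)
to_query = missing_roles(assignment)
```

Here only the `comparison` role is left empty, so that is the one role for which a follow-up query to the MMDB would be justified, rather than gathering everything that is somehow related.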
The SAE will also consider the semantic relations defined between the query results and the original query context, as well as relations among the query results. Two classes of analysis are performed: semantic reasoning using a meta-ontology that we define, and simple data mining that simply considers grouping patterns among the query results.
For the semantic reasoning, the SAE takes as input the data objects returned from the multimedia database, and the meta-ontology. We describe it as a meta-ontology because we are abstracting the characteristics of the domain ontology relations to more general relations (or properties of relations) that we care about and can use in choosing a presentation pattern. Some of the meta-ontology relations will allow us to apply some of the techniques of Kamps et al., and other relations will facilitate data mining for specific kinds of relations that we can associate with central aspects of the presentation patterns. Clearly, the definition of the meta-ontology and what it expresses are a significant piece of the puzzle.
Defining a meta-ontology lets us concentrate on semantics that we associate with presentation patterns, rather than dealing with the semantics of each domain ontology. It allows us wide application on the one hand, since we can work with any domain ontology for which we can define a reasonable meta-ontology mapping. It also means that we can more or less arbitrarily define the semantics of the meta-ontology to fit what we want to be able to do with presentation generation. This is currently wide open, but I have a few ideas about this to start a discussion with, based upon the dreams I had of doing temporal data mining. Once the discussion gets moving, we should figure this out better. It will also be influenced by what we want to do on the lower levels (e.g. to choose a discourse structure) - we can attack from that side, thinking about what the decision points are for choosing one structure over another. If we find application of the ideas to other layers (instead or as well), like the visual semantics or even the HFO's, then the requirements there will inform us as well.
Initially at least (and perhaps always), the creation of a specific meta-ontology for the domain ontology will be done by hand. This is based upon the same kind of analysis that Kamps et al. take to characterize the set-mathematical properties of a given domain ontology. The meta-ontology is used to consider the relations defined on query results. The goals are to do the same kinds of data-reduction, collapsing of aligned relations, and perhaps some other simple data mining in the same spirit as what Susan, Joost and Jane described.
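One way to picture a hand-made meta-ontology mapping is as a simple lookup from domain relation names to the more general relations we care about. The sketch below lifts domain triples into meta-relations and, as a side effect, collapses domain relations that map to the same meta-relation. All relation names (`paintedBy`, `sculptedBy`, `creator`, and so on) and object ids are invented for the example; the real meta-ontology is still to be defined.

```python
from collections import defaultdict

# Hypothetical hand-written mapping: domain relation -> meta-relation.
META_MAPPING = {
    "paintedBy": "creator",
    "sculptedBy": "creator",
    "partOfSeries": "inclusion",
}


def lift(triples, mapping):
    """Re-express (subject, relation, object) triples in meta-ontology terms.

    Domain relations without a mapping are dropped; relations that share a
    meta-relation end up in the same set of (subject, object) pairs.
    """
    by_meta = defaultdict(set)
    for subj, rel, obj in triples:
        meta = mapping.get(rel)
        if meta is not None:
            by_meta[meta].add((subj, obj))
    return dict(by_meta)


# Hypothetical query results.
triples = [
    ("work1", "paintedBy", "artistA"),
    ("work2", "sculptedBy", "artistA"),
    ("work1", "hasInventoryNo", "123"),
]

lifted = lift(triples, META_MAPPING)
```

After lifting, both works are related to `artistA` by the single meta-relation `creator`, and the inventory number (which has no mapping) is ignored. The lifted pairs are then in a form where set-property tests in the spirit of Kamps could be applied.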
Looking ahead, I think that this is likely to expand in a number of different directions, including such things as relations that can be associated with abstract story-telling paradigms; if we can find patterns in the data that we can associate with central themes in narrative structures, then we can either refine or perhaps partially construct the discourse model to more closely fit the query results.
In addition to the analysis using the meta-ontology, we can also do some simpler analysis based upon grouping elements according to simple patterns in the domain ontology. Note that we do not need any understanding of the semantics of the domain ontology for this analysis - it is sufficient simply to find patterns, and to consider grouping things accordingly. If we combine this with the knowledge in our meta-ontology, we can take this a step further and refine the groups (e.g., to further collapse groups, and/or to determine which patterns are likely to be more significant).
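A minimal sketch of this semantics-free grouping, assuming each query result is available as a flat set of property/value annotations: items are grouped whenever they share the same value for the same property, with no knowledge of what the property means. The property names `p1`, `p2` and the item ids are placeholders.

```python
from collections import defaultdict


def group_by_shared_values(items, min_size=2):
    """Group item ids by every (property, value) pair they have in common."""
    groups = defaultdict(set)
    for item_id, props in items.items():
        for prop, value in props.items():
            groups[(prop, value)].add(item_id)
    # A pattern is only interesting if at least min_size items share it.
    return {key: ids for key, ids in groups.items() if len(ids) >= min_size}


# Hypothetical annotated query results.
items = {
    "m1": {"p1": "x", "p2": "y"},
    "m2": {"p1": "x", "p2": "z"},
    "m3": {"p1": "w", "p2": "z"},
}

patterns = group_by_shared_values(items)
```

With this data, `m1` and `m2` group on `p1 = x`, and `m2` and `m3` group on `p2 = z`. Deciding which of the two overlapping groups matters more is where the meta-ontology knowledge would come in.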
At the end of this process, we should have enough information about the media elements and how they should function in the discourse model to choose an appropriate presentation pattern.
The Presentation Pattern Expert (PPE) is responsible for taking the inputs from the discourse model, the Semantic Analysis Engine and the Compositional Semantics expert to choose the most appropriate presentation pattern from among the many possible choices.
[Add background on template-based versus constructive approaches to this kind of work. Some good stuff for this in Kamps' thesis, but need refs for MM domain.]
We want to define a hybrid mechanism that can combine some templates with a constructive engine. The templates are likely to act as "frames", and allow a graphic designer to more directly influence the overall look and style of generated presentations. The PPE then uses a constructive approach to generate the data-specific portion of the presentation within the frame context. The frame templates will include stylesheets that influence the entire presentation, beyond just the frame content defined in the templates.
The development of the mechanism will take several steps, beginning with a simplified model and proceeding towards the constructive model as we develop more understanding of the components (i.e. the constructive building blocks) we need, and how we must annotate them to support automated construction.
In the simplified model, we will describe some prototypical template structures based upon our experience. We will describe these in such a way as to facilitate choosing the most appropriate one given the discourse model and the information we derive from the semantic analysis. Much of the point of this simplified approach is to develop an understanding of what we need from the semantic analysis in order to reasonably choose a presentation pattern.
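To illustrate what "describing templates so as to facilitate choosing one" could look like, here is a deliberately naive sketch: each template is annotated with the interaction level it needs and the kinds of relations it presents well, and the choice is made by filtering on interaction and counting matches. The template names, the annotation fields and the scoring rule are all assumptions for discussion, not a proposal for the actual PPE.

```python
def score(template, context):
    """Return a match count, or None if the template cannot be used at all."""
    # Hard constraint: the user model/platform caps the degree of interaction.
    if template["interaction"] > context["max_interaction"]:
        return None
    # Soft preference: how many relation kinds found by the analysis it suits.
    return len(template["suits"] & context["features"])


def choose_template(templates, context):
    """Pick the usable template with the highest score (ties broken by name)."""
    scored = [(score(t, context), t["name"]) for t in templates]
    usable = [(s, name) for s, name in scored if s is not None]
    if not usable:
        return None
    return max(usable)[1]


# Hypothetical prototypical templates and analysis results.
templates = [
    {"name": "slideshow", "interaction": 0, "suits": {"ordering"}},
    {"name": "catalogue", "interaction": 2, "suits": {"grouping", "inclusion"}},
    {"name": "timeline", "interaction": 1, "suits": {"ordering", "grouping"}},
]
context = {"max_interaction": 1, "features": {"ordering", "grouping"}}

chosen = choose_template(templates, context)
```

In this example the catalogue is ruled out by the interaction limit and the timeline wins over the slideshow because it matches both relation kinds. The interesting work is in deciding which annotations the templates need, which is the point of starting with the simplified model.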
As we develop this understanding further, we can then begin to decompose these prototypical pattern examples into the smaller building blocks or core features that distinguish them and that can be associated with aspects of the data and/or the discourse model. At that point, we then have the basis to develop a constructive model for the presentation patterns. This in turn should lead to a wider variety of presentations that can be generated, and hopefully presentations that are more tailored to the discourse model and the data at hand. [This may sound like blue sky, but it is simply the analog of what Kamps did for diagrams.]
Whatever form the PPE takes (i.e. template based or constructive), the result must provide the structure for the presentation as a whole. The presentation pattern is more general and over-arching than the Hypermedia Formatting Objects, which concentrate on more local elements. As such, the HFO modules will provide requirements for the PPE as well.
The Presentation Pattern Expert will likely have to balance issues such as the degree of interaction, the overall flow of the discourse model, the way that the media elements are ordered (or grouped) by various relations, the kinds of relationships that exist among the query results, etc. The expert may also have to consider the expressiveness of the target language and the constraints defined by the user model and platform, in case certain presentation patterns require specific facilities (such as animation or specific modalities).
The tricky part about this layer is that we want to be specific enough to provide context for the HFO and succeeding layers/stages, without completely tying down things like the layout and temporal structure. My tendency would be to start with templates that do define the overall layout and temporal layout, and explore how to make these more abstract, less constraining, and more constructive.
It is my hope that we can incorporate the direct input of graphic (and other) artists at this stage, e.g. in the definition of the templates or framing graphics and style for the presentation. While some input (e.g. from experienced multimedia authors) may come in a more abstract form (e.g. related to cohesion, pacing, etc.), I envision this as expressed by the Compositional Semantics. In order to let graphic artists contribute directly and more naturally to a presentation generation system, I think that (as described above) the Presentation Patterns must be able to accommodate something like authored presentation templates (possibly with some set graphics), and a set of associated presentation stylesheets. These templates and stylesheets could be in the target language, or could be expressed using HFO's. The authors will either need an HFO authoring tool, or will work in a target language that would be translated to HFO's for wider application. This issue of how to integrate the target language constraints is another one that needs more thought.
There are really two projects here, one associated with each of the two modules described above. However, they are so interdependent that I think initially at least, we need to develop them together. Without an understanding of how the PPE will work and what requirements it has, it is very hard to know what the SAE really needs to do. And without the results of the SAE analysis, we do not really have a basis for decision-making in the PPE. Nevertheless, once we have a better idea of how the two modules must work together, ongoing work on the two should be able to proceed independently.
I am also interested in how we can accommodate what I have been calling non-narrative discourse models. This could involve one of a number of approaches (or something else altogether):