PRESENTATION GENERATION

Some ideas discussed by Mark and Geert-Jan

This paper reflects ideas from several discussions between Mark and Geert-Jan, and the writing style reflects exactly that type of communication: it is meant to store some thoughts and it is not meant to be a “finished paper”.

Presentation generation is the process of representing information to users in an effective, efficient and semi-automatic way, given a set of retrieved complex objects which are (parts of) structured documents. Usually, we start from a collection of structured data, sometimes also semi-structured data, and a presentation needs to be generated from a set of (complex) objects retrieved from those structured data or documents. In this report we assume the structure of these documents is specified in XML, as this is the de-facto standard for describing structured information on the Web.

Furthermore, it is assumed that most of the semantics of the documents is present, but that the interpretation of this information is not known a priori: additional knowledge (e.g. from the domain and the user) is needed to personalize and customize the information in a number of steps, so that it fits the information request of the user and the intentions of the authors of the retrieved information. Note that we separate the semantics and the interpretation: we assume that the structure in the retrieved data represents the semantics, but that the interpretation (how to deal with the semantics) is left to another part of the system. This opens up the possibility to pursue personalization and customization by adding “interpretation information” to the context of the retrieved data. By choosing different versions of the interpretation information, different kinds of “adaptation” are possible. In theory this difference may not be very explicit: the interpretation information could easily be included in the original data, certainly when both are formulated in XML. We assume here that the two kinds of information are placed in separate “files”.

The ultimate goal of presentation generation is to create a presentation that conveys the retrieved information to the user in the best possible way: the information presented to the user should match the cognitive state in the mind of the user during the interaction with the system. In reaching for this goal, “best” is determined not just in terms of the semantics, but certainly also in terms of the interpretation. This means that we are not just looking at a single collection of retrieved data, but at a “system” that also includes the interpretation information.

INTERNAL CONCEPT REPRESENTATION

A bag of concepts retrieved from an information/data retrieval request must first be analyzed and augmented with additional knowledge to make sure the user will understand the concepts in the best possible way. In the cognitive psychology literature a concept is defined as something that re-presents something to us. A distinction is made between external representations (in computer science often called modalities, divided into linguistic (text, speech) and graphical (2D, 3D, video) representations) and internal or mental representations, which can either be depictive (picture-like) or propositional (language-like), with no clear border between the two. In the early stages of presentation generation the retrieved concepts are mental representations with no external representation. The first task is to create a representation document that is the result of a transformation of the retrieved internal (machine) representation into a representation that is similar to the mental representation of the user that interacts with the system. The process might be split into a bottom-up part (depictive representation) and a top-down part (propositional representation), both biased by knowledge we have about the user. This has the advantage that we can form much richer knowledge structures than either alone, and it even seems to correspond with empirical evidence in cognitive psychology. In SRM terms this may be viewed as a combination of the control and content layer.

Put another way: first, we look at the internal, semantic issues that play a role in exposing the internal structure between the data elements included in the data; then, we look at the external issues that cover the actual presentation of the data elements to the user. We do not fully agree on a clear motive for splitting the internal “phase” into a bottom-up part and a top-down part, at least as far as this affects our work (meaning that this may be an issue inside the application and outside our general framework). Another point of debate is whether the internal concept representation is necessarily depictive.

INTERNAL CONCEPT REPRESENTATION [BOTTOM-UP]

The internal concept representation stage looks at the retrieved concepts and tries to identify relations between concepts already present, and it may create new concepts, combine existing ones or refine them based on additional knowledge about the domain and the intentions, beliefs and goals of the user. According to some theories it therefore needs to be a bottom-up approach. One can also say that the original query that specifies the retrieval, if there is one, can give a clue as to how to present the retrieved data. From the specification of the query, the system could then learn something about the specific goals of the user with these data, and use this knowledge in constructing the global aspects of the data collection, e.g. the global relationships. Surely, if you just consider the data without knowing what query was asked, then you cannot do anything other than go bottom-up. In our applications it could be possible to exploit the knowledge embedded in the “queries”.

As an example, suppose that an EPG user asks for a program at a certain time T, but also allows programs in adjacent time blocks; then it would be a good idea to construct the set of resulting programs in such a way that the preference or priority of the program at time T is obvious (just like in the NS Reisplanner, where the trains are given with the “most relevant” train in the center).

This bottom-up approach and the propositional top-down approach described later are likely to be interleaved: the top-down approach influences the concept formation at a lower level, and the concept formation itself influences the high-level structure that can be imposed by the top-down approach. As an example, consider the automatic generation of a presentation that shows you how to bake a cake. Task knowledge imposes a general high-level script or schema on how to prepare food, which needs to be filled in. How we merge the results is influenced by this choice (after all, the concepts must fit in this high-level structure to some degree), and the concept formation step (bottom-up) influences in turn how the script or schema is filled (not all the information needed might be present in the results).

Note that you could say that the query specification is more an “internal” issue, while the data elements belong to a more “external” issue. The motivation for this would be that the exploitation of the query specification asks for “interpretation information”, while the manipulation of the data elements will be based mainly on the structural semantic information (and not, or less, on the interpretation information).

The next subsections give some examples of the formation of concepts out of simpler ones. Basically, they are examples of “standard” operations (in queries). This could lead to some sort of language of available building blocks. Perhaps we can propose some sort of (meta-)language for the specification of queries. [Note that this language development is closely related to the design of object(-oriented) models or languages that some of us are familiar with. Not too strange, seeing the similarity between Web languages and OODB languages.]

Concept selection:

Using additional knowledge we could decide that we only want a brief overview, i.e. select only a subset of attributes for a concept. A simple but powerful operation.

Example 1:

 
e.g. Only the program name and the channel in the EPG case. 

 

The additional information here can come from different sources: it can be specified somewhere in the query, or it can be derived from the interpretation information. This is not an essential aspect. What is essential is the possibility of selecting parts of the data: “do not provide all elements of the data, but only provide elements E”. In this case I assume that E is a condition on tags, i.e. a set of tags, e.g. (Program name, Channel).
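As a sketch (using the program markup of Example 2 below; the exact element names are illustrative only), the selection “provide only elements E = (name, channel)” would mean that

     <program> 
       <name>Big Brother</name> 
       <channel>Veronica</channel> 
       <description>Real-life soap</description> 
       ... 
     </program> 

becomes:

     <program> 
       <name>Big Brother</name> 
       <channel>Veronica</channel> 
     </program> 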

 

One could think of a compromise-operation: “do not provide all elements of the data, but only provide elements E explicitly, and give ‘links’ to the others”.
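A minimal sketch of this compromise-operation, again with E = (name, channel) (the link element follows the style of Example 3 below; the URL is just a placeholder for the location of the full object):

     <program> 
       <name>Big Brother</name> 
       <channel>Veronica</channel> 
       <link href="http://..../FullObject.xml">other elements</link> 
     </program> 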

Concept merging:

If knowledge is available (or imposed on the concept formation by the propositional top-down approach described later) that tells us that a program very probably has an identifying name, we can use this knowledge to merge elements if they share the name tag and its value. Otherwise we would decide to treat them separately (i.e. not merge them).

Example 2:

 
e.g. Object a: <program> 
                 <name>Big Brother</name> 
                 <description>Real-life soap</description> 
                 ... 
               </program> 
     Object b: <program> 
                 <name>Big Brother</name> 
                 <channel>Veronica</channel> 
                 ... 
               </program> 
 
Could become (if there is knowledge that describes that programs that have 
name tags can be joined if their name tag values are identical): 
 
<program> 
     <name>Big Brother</name> 
     <channel>Veronica</channel> 
     <description>Real-life soap</description> 
</program> 
 

In general you have an operation that merges data elements on the basis of shared tags with the same value, much like a join in the relational context. This, just like the remarks made on concept selection, opens up the consideration that there are two possibilities: either the values of the data elements themselves are included, or there are links to them. For reasons of simplicity it would be a good idea to define this merge-operation as a composition of values, and then to add an additional (separate) operation that replaces concrete values by links to them. (Question: do we allow merging on any shared tags, or just on “identifying tags” (in the spirit of relational keys)?)
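A sketch of that separate operation, applied to the merged program above (the link markup follows Example 3 below; the URLs are placeholders for the locations of the original objects):

     <program> 
       <name>Big Brother</name> 
       <channel><link href="http://..../ObjectB.xml">value</link></channel> 
       <description><link href="http://..../ObjectA.xml">value</link></description> 
     </program> 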

Concept linking:

Another possibility is to create a hyperlink that establishes a relation between two concepts. This can be interesting for concepts that are conditionally related.

Example 3:

 
e.g. Object a: <program> 
                 <name>Big Brother</name> 
                 <description>Real-life soap</description> 
                 ... 
               </program> 
     Object b: <program> 
                 <name>Villa Muis</name> 
                 <description>Real-life soap</description> 
                 ... 
               </program> 
 
Could become (if there is knowledge that says that similar descriptions in other programs can be related to each other): 
 
<program> 
     <name>Big Brother</name> 
     <description> 
         Real-life soap 
         <link href="http://..../ObjectB.xml"> 
            See also 
         </link> 
     </description> 
     ...      
</program> 
 
or
 
<program> 
     <name>Big Brother</name> 
     <description> 
         Real-life soap 
         <related-programs>Villa Muis</related-programs> 
     </description> 
     ...      
</program>

Note that in the first option the link is generic: what the behavior of this link will be (new window, optional, highlighting, etc.) is postponed until later, when presentation/navigation knowledge is used to define the behavior of the relationship. However, this behavior appears to be more of an external aspect. The second option is therefore motivated by the observation that creating hyperlinks is an external aspect of the creation of relationships: by separating them, the creation of relationships covers the more “internal”, semantic aspects, while there can be several different ways of “implementing” these relationships.

 

Note that we need a clear definition of our concepts. We can see them as “objects” in the structured data, i.e. data elements with printable data values, with links to other elements and/or subelements (as in Complex Objects). We could also use the Resource concept from RDF (to be more standard), or the Element concept as used in RMM. What matters most here are the requirements this places on the (query) language that we are considering.

PROPOSITIONAL INTERNAL CONCEPT REPRESENTATION [TOP-DOWN]

Important in the early stages of presentation generation is a high-level structure, script or schema in which the formed concepts and their relations should fit. Propositional internal concept representation is therefore a top-down approach, as it tries to enforce a structure in which the depictive internal concept representation should fit. This stage addresses user tasks, intentions, goals and behavior (a representation spread out in time?). Basically it forms the conceptual framework or "story script" that needs to be communicated to the user. Obviously this story script must be extended or changed based on the knowledge the system has.

 

The story script can be seen as the rhetorical structure associated with the query. As described earlier, the query can “imply” a certain way of presenting the data: that is part of the rhetorical aspect. Our feeling is that we should concentrate on the (syntactic) facilitation of the rhetorical specification, not on the actual specification itself: that is something for the experts to do.

 

Example 4:

 
e.g. the task of fixing a thing could be represented as follows: 
 
<task> 
  <description>repair object</description> 
  <problem>...</problem> 
  <effect>...</effect> 
  <solution> 
    <constraint>...</constraint> 
    <operation>...</operation> 
    <validation>...</validation> 
    <help> 
      <name>...</name> 
      <address>....</address> 
    </help> 
  </solution> 
  <solution> 
    ... 
  </solution> 
</task>  
 
e.g. the task of presenting items (EPG): 
 
<task> 
  <description>overview information</description> 
  <item-list> 
     <item> 
     ... 
     </item>  
  </item-list> 
</task>  
 

Note that the inclusion of global relationships (for the internal aspect) involves many more temporal aspects than the relationships that solely play a role at the external presentation level. In terms of the rhetorical aspect the function of the relationships is quite different and at a higher level, which means that, for instance, the temporal relationships play a different role. Perhaps this requires us to think of some high-level temporal building blocks: now we just have things like “index” or “tour” that are based on the lower-level “link” (note that we sometimes use the link as a building block at the higher level too).

A question is whether the current “tag-oriented” way of describing the objects suits this use of higher-level relationships: there might be a need for a model that allows for Elements and Relationships; perhaps not all relationships can elegantly be described inside the elements.
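A sketch of what such a separate Relationship could look like, described outside the elements it connects (this relationship markup is purely an assumption, not taken from RDF or RMM; the URLs are placeholders):

     <relationship type="similar-description"> 
       <member href="http://..../ObjectA.xml"/> 
       <member href="http://..../ObjectB.xml"/> 
     </relationship> 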

EXTERNAL CONCEPT REPRESENTATION

Once a mental representation that is a combination of concept formation (depictive internal concept representation) and organization (propositional internal concept representation) for the specific user has been created in computer memory, this mental representation must be transformed into a multimedia or external representation, i.e. one that can be presented to the user. The mental representation must be transported to the user via one or multiple media, in the right combination of modalities. This time, external representation or presentation design knowledge is needed. The target here is to provide the user with a presentation that is easy to view (while the internal representation is aimed more at being easy to understand).

The subsequent sections make a distinction between modality and medium: we do not yet agree on the need for this distinction. One remark is that we are more interested in the inter-media or inter-modality aspects here (and leave the intra-media and intra-modality aspects to the experts in the specific domains).

Modality refers to a particular way or mechanism of encoding information for presentation to humans in a physically realized form [SRM]. Examples include 2D and 3D graphics, and written and spoken natural language. Multimodality refers to multiple modalities being used for encoding information. Examples include maps (graphics and written natural language) and video (graphics, images, music, spoken natural language). A medium is a channel for conveying encoded information for presentation to a specific user or group of users (output medium) and vice versa (input medium) [my definition for this report]. A medium can support multiple modalities, e.g. paper (conveys written natural language, graphics and images) and microphone (spoken natural language and audio).

For each internal concept of the previous stage an external representation consisting of one or more modalities must be chosen (multi-modal presentation design). Additional knowledge (user, domain, platform, resources) can be used to personalize this process further. This knowledge and the presentation created so far influence the media allocation (multimedia presentation design). The end result is a presentation design plan ready for realization.

MULTI-MODAL PRESENTATION DESIGN

The external concept representation begins with a set of retrieved objects for each concept formed during the internal concept representation stage. During external concept representation or presentation design, choices must be made about the modality and the objects that will become part of the final presentation plan. The modality choice must depend on the additional knowledge mentioned above (user, domain, platform, resources).

The result of this step is a presentation plan with, for each concept, an ordered set of objects to choose from. In the next step the media allocation, layout, navigation and timing information of the presentation is finalized.
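A sketch of such an ordered set per concept in the presentation plan, using the objects of Example 5 below (the concept and alternative elements are assumptions made for illustration):

     <concept name="Big Brother"> 
       <alternative rank="1" modality="video" 
                    content="http://www.dynamo.com/content/bb.mpg"/> 
       <alternative rank="2" modality="text" 
                    content="http://www.dynamo.com/content/bb.txt"/> 
     </concept> 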

 

Example 5:

 
e.g. Object a: <program type="text"> 
                 <name>Big Brother</name> 
                 <description>Real-life soap</description> 
                 <channel>Veronica</channel> 
                 <content>http://www.dynamo.com/content/bb.txt</content> 
               </program> 
 
     Object b: <program type="video"> 
                 <name>Big Brother</name> 
                 <description>Real-life soap</description> 
                 <channel>Veronica</channel> 
                 <content>http://www.dynamo.com/content/bb.mpg</content> 
               </program> 
 
Given presentation platform and network capabilities knowledge: 
 
Object "a" would be chosen for an EPG on a Remote Control  
(our RC cannot support the multimodal nature of the "video" type.)  
Object "b" would be preferred for display on a standard TV. 
Object "a" and "b" would both be chosen for an enhanced TV.  
 
Note that the presentation medium places constraints on modality selection here. 

 

Note that one item is not considered here, but perhaps it should be: the collection of objects as given. Now only the individual retrieved objects are considered, not the set of objects that is produced. In IR applications the result may be just a collection of objects, but in our target applications the structure of the result, e.g. a set of objects, could be “exploited” as well: the resulting data set builds one integrated presentation, and much of the semantics sits in the highest-level representation (in this case the set-construct). Attention should be paid to this part of the presentation design.
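As a sketch, the earlier EPG query around time T could deliver a set-level construct like the following (the result-set markup is an assumption; the URLs are placeholders), so that the priority of the program at time T survives into presentation design:

     <result-set order="by-relevance"> 
       <member priority="1" href="http://..../ProgramAtT.xml"/> 
       <member priority="2" href="http://..../ProgramBefore.xml"/> 
       <member priority="2" href="http://..../ProgramAfter.xml"/> 
     </result-set> 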

 

Another aspect is that the example basically shows how a data element can be adapted to a certain modality. In my opinion the most crucial (new) aspect here is the combination of the different modalities: how does one specify relationships between different modalities? Note that most modality issues are modality-specific, so we cannot really foresee all specification elements needed for the desired modalities. However, there will probably be a general way of relating parts of the specifications for the single modalities to each other.
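A sketch of such a general relating mechanism (the relates element is an assumption; it connects a part of the text specification to a part of the video specification without containing anything modality-specific):

     <relates type="describes"> 
       <part href="http://www.dynamo.com/content/bb.txt"/> 
       <part href="http://www.dynamo.com/content/bb.mpg#1034"/> 
     </relates> 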

 

MULTI-MEDIA PRESENTATION DESIGN

The presentation design resulting from the previous step needs to be tailored to a specific combination of media. Decisions about navigational links, layout and timing information in the presentation design can only be made once the selection of modalities has been made with the capabilities and resources of the environment in mind. Multi-media presentation design chooses the media for presentation and creates the final presentation design plan for a particular medium or combination of media, ready to be realized. Depending on the capabilities and resources of the medium, navigational links, layout and presentation timing information are added.

The reason for breaking presentation design up into two steps here is to separate general presentation design from its specification for a number of selected media. Navigational links, layout and timing seem media-specific (and not modality-specific!), as each medium supports only a set of modalities (e.g. layout in audio devices and presentation timing in newspapers make no sense).

PRESENTATION LINKS

Presentation links are the external representation of internal concept relations. Presentation links enable the user to get effective and efficient access to the external concept representations (objects). Each hyperlink in a web document usually refers to an external concept representation that might be of interest to the user given his current context (the piece of the web page shown in his browser). Much adaptivity can be added to a presentation if the behavior of the links is personalized using additional knowledge. Access restrictions, the level of user experience and related information can be modeled using links. It is important to make a distinction between knowledge that remains the same during the interaction with the user and the presentation, and knowledge that does not: access restrictions often remain the same for users during a session, while the level of user experience may change dynamically as the user reads and learns. This last category requires that some presentation links change during the interaction of the user with the system. Interaction thus requires that (parts of) the presentation sometimes need to be regenerated.
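A sketch of how such personalization knowledge could be attached to a link (the condition element and the user-model expression are assumptions; the idea is that the link is only generated, or behaves differently, depending on the current user model):

     <link href="http://..../Details.xml"> 
       <condition>user.experience &gt;= intermediate</condition> 
       More details 
     </link> 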

Example 6:

 
e.g. Hyperlinked table of contents to book chapters. 
 
e.g. Clicking on the text object starts the TV program (presentation link): 
 
     Object a: <program type="text"> 
                 <name>Big Brother</name> 
                 <description>Real-life soap</description> 
                 <channel>Veronica</channel> 
                 <content>http://www.dynamo.com/content/bb.txt</content> 
               </program> 
 
     Object b: <program type="video"> 
                 <name>Big Brother</name> 
                 <description>Real-life soap</description> 
                 <channel>Veronica</channel> 
                 <content>http://www.dynamo.com/content/bb.mpg</content> 
               </program> 
 
Could become (if there is knowledge that the medium supports MPEG!): 
 
<program> 
  <link href="http://www.dynamo.com/content/bb.mpg"> 
    <name>Big Brother</name> 
    <description>Real-life soap</description> 
    <channel>Veronica</channel> 
  </link> 
</program>  

More about interaction in the last section of this report.

 

Note that one could define the concept of a presentation link in a different way. (Maybe we should even reserve the name presentation link for links that are specifically targeted at enhancing the presentation, instead of representing semantic relationships.) The adaptivity issues discussed above really apply, at a higher level of abstraction, to all mechanisms representing relationships. Surely, in the context of the Web the link is the most prominent one, but a clear separation between the relationship and its platform-dependent representation better acknowledges where the adaptation takes place. In most cases the adaptation takes place at the relationship level, not at the level of the relationship’s visual representation.

This also better reflects the changes that take place during the user’s interaction with the system.

 

Another remark: the example about interaction (Example 6) shows that the management of the platform has not been elegantly separated from the actual data.

PRESENTATION LAYOUT

Media types that support spatial dimensions (i.e. with more than one spatial dimension) can present multiple objects to the user at the same time. Possible constraints (knowledge) placed on presentation layout design:

Example 7:

 
e.g. Annotated picture library, tables 
 
e.g. Program text and snapshot image of the program are placed next to each other. 
 
     Object a: <program type=text> 
                 <name>Big Brother</name> 
                 <description>Real-life soap</description> 
                 <channel>Veronica</channel> 
                 <content>http://www.dynamo.com/content/bb.txt</content>   
               </program> 
 
     Object b: <program type=video> 
                 <name>Big Brother</name> 
                <description>Real-life soap</description> 
                <channel>Veronica</channel> 
                <content>http://www.dynamo.com/content/bb.mpg</content>   
               </program>     
 
Could become: 
 
<program> 
  <table> 
    <layout-manager>horizontal_grid</layout-manager> 
    <item> 
      <cell>0</cell> 
      <resize>fit</resize> 
      <content>http://www.dynamo.com/content/bb.txt</content> 
    </item> 
    <item> 
      <cell>1</cell> 
      <resize>scale</resize> 
      <content>http://www.dynamo.com/content/bb.mpg#1034</content> 
    </item> 
  </table> 
</program>  
 
(I wrote down the links to save some space in the example above...) 

PRESENTATION TIMING

Media types that support a temporal dimension can support a presentation that changes through time. Constraints (knowledge) placed on the presentation timing:

Example 8:

 
e.g. Voice control systems, video, 3D simulations in VRML 
 
e.g. Customized advertising during video 
 
<program> 
  <table> 
    <layout-manager>horizontal_grid</layout-manager> 
    <item type="0001"> 
      <cell>0</cell> 
      <resize>fit</resize> 
      <content>http://www.dynamo.com/content/bb.txt</content> 
    </item> 
    <item type="0002"> 
      <cell>1</cell> 
      <resize>scale</resize> 
      <content>http://www.dynamo.com/content/bb.mpg#1034</content> 
    </item> 
  </table> 
  <seq> 
    <content type="0002"/> 
    <content type="0234" time="500" interrupt="1"/> 
    ... 
  </seq> 
</program> 
 
500 time slots after the first event starts, we show a commercial with id 0234. 

Media types that support more than one "presentation" dimension (links, spatial, temporal) allow for more sophisticated presentations, but this complexity comes at a price, because all the constraints must hold (to some degree) for presentation realization.

The basic question, however, is whether the spatial constraints should be specific to spatial aspects, or whether the temporal aspect should be dealt with in the same way. It is unclear whether the three aspects (linking, space and time) can be dealt with in a uniform manner, but the advantage would be a more general and robust technique.
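As a sketch of what a uniform notation over the three aspects could look like (this constraint markup is purely an assumption, reusing the type ids of Example 8):

     <constraint aspect="spatial" relation="next-to"> 
       <arg type="0001"/> 
       <arg type="0002"/> 
     </constraint> 
     <constraint aspect="temporal" relation="starts-after" delay="500"> 
       <arg type="0002"/> 
       <arg type="0234"/> 
     </constraint> 
     <constraint aspect="link" relation="refers-to"> 
       <arg type="0001"/> 
       <arg type="0234"/> 
     </constraint> 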

 

Note that the temporal aspect is different from the other two, linking and space, in one sense: adaptation is typically a dynamic process that changes the behavior over time. This means that handling the adaptation as a discrete process that interleaves with the interaction by the user does not quite apply to the temporal aspect. While the user is actually interacting with the system, the system may want to adapt: the traditional “interact-adapt-interact-adapt” sequence is not sufficient anymore.

 

Note another difference. Links are stored locally: a web of data elements is stored in terms of nodes with links included, and we do not explicitly describe that web as a composed object. In the seq-example, by contrast, you construct a presentation with data elements starting 500 slots apart. Perhaps we want a composed object here as well: a consistent approach might prove useful, perhaps in the combination of the dimensions.

 

Example 7 shows, as was addressed earlier (but not in detail), that it would be advantageous to have some neat addressing scheme. Now, the constraint that places the two data elements next to each other describes the two elements in a rather strange way: instead of pointing at the two elements, their content parts are filled in. Probably a more fundamental approach to deal with (the inclusion of or the reference to) data elements is necessary.
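A sketch of the two items of Example 7 with references instead of inlined content parts (the object element, the ref attribute and the object ids are assumptions):

     <item> 
       <cell>0</cell> 
       <resize>fit</resize> 
       <object ref="ObjectA"/> 
     </item> 
     <item> 
       <cell>1</cell> 
       <resize>scale</resize> 
       <object ref="ObjectB"/> 
     </item> 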

 

Multi-media presentation design, especially the parts concerning presentation layout and timing, and presentation realization (constraint checking) may need to work together in some sort of generate-and-test fashion if it is too hard to come up with a global optimum (all constraints satisfied) in a reasonable amount of time.

PRESENTATION REALIZATION

The presentation realization stage is an abstraction layer between presentation design and rendering that allows for constraint checking (on timing, layout and link structure) and presentation language independence (new presentation languages can easily be added without much rewriting). Constraint checking is the hard part; fortunately, sophisticated constraint checkers are available, and the previous stages in the presentation generation process have (hopefully) eliminated a lot of ambiguity (possible errors) from the presentation plan. The mapping to a presentation language like SMIL or XHTML should be relatively straightforward.

Note that it is debatable what kind of constraint checking is done in this layer. Basically, the constructed specification has to be verified somewhere, between design and rendering.

Example 9:

 
e.g. the name of a program will be shown in bold (supported by HTML) by default. 
 
  <name>Big Brother</name>  
 
becomes: 
 
  <b>Big Brother</b>  
 

PRESENTATION RENDERING

The last step is the rendering of the presentation to the user through one or more media. The presentation platform must support the presentation language of course and make sure the constraints are satisfied. Scheduling, presentation quality and network Quality of Service are important issues. Feedback from the user to the system must also be captured in the presentation device but is part of input analysis, which is not described here.

Example 10:

 
e.g. The user can change the default presentation behavior of his browser. 
 

In another definition, the rendering process is a technical process aimed at translating the specifications into real technical constructs that make something appear in the presentation. In that case, some of the “intelligence” and feedback used in the first definition should be transferred elsewhere.