Author: Joost
Date: Oct 26, 2004

The vicious triangle

This note is organized chronologically, which is not the best structure for conveying a message, so it is probably hard to follow completely. The aim is to give you some insight into what I am thinking about. The oldest contributions are at the top, the newest at the bottom. In general the most recent material is the most relevant, but reading the beginning is advised to understand the context. If time is precious, start reading at the scenarios marked by '**************'.

vicious-figure.pdf and vicious-table.pdf (found in the bluebook dir) are a figure and a table which illustrate some of the ideas.

Intro

In the past years we have talked a lot about "the vicious triangle" (or variants), which embodies the mutual dependencies between content, presentation structure and style. We used examples of trade-offs that need to be made to illustrate the difficulties of multimedia documents compared to text-based document structures (some of the trade-offs apply to text models as well). In our own work, however, we mostly avoided the triangle trade-offs, since we did not fully understand what exactly the trade-offs are and when these decisions need to be made in the generation process.

This blue note is an attempt to boil down the important factors of the triangle. We describe different scenarios in which we identify the trade-offs and the influence they have on the presentation. The overall goal is to establish an architecture/framework in which these trade-offs are made explicit and can be influenced by the author.

The Triangle

definitions

Content: The set of available media items.
Structure: The hierarchical discourse structure of the presentation (Structured Progression). [[lynda: I would like to leave discourse out and stick to SP (importance, grouping, ordering)]]
Style: Perceivable elements of a presentation defining aesthetics and semantics.

dependencies

Content - Content [[lynda: this is actually ambiguous]]
The choice of a media item influences the choice of other media items. For example: do not use the same item again within the same presentation.

Content - Structure
The choice of a media item influences the structure/discourse of a presentation.

Content - Style
The choice of a media item influences the style of a presentation.

Structure - Content
The structure/discourse of a presentation influences the choice of material used.

Structure - Structure
The choice of a discourse should remain constant/consistent. Structure here is Structured Progression, which embodies discourse elements like genre. Once the genre is established, it constrains the sub-genres (narrative units?) allowed. For example, if the genre is biography it can contain the narrative units private life and factual data. If, however, the genre is fairy tale, these sub-genres/narrative units are not really appropriate.

Structure - Style
The discourse/genre influences the style, e.g. a biography looks different from a fairy tale. If, because of layout constraints, a (sub) Presentation Structure cannot be fitted on one page but needs to be distributed over two pages, the fact that these two parts belong together needs to be conveyed using layout.

Style - Content
The style of a presentation requires media items not to conflict with this style.

Style - Structure
If the style of the presentation is fixed (e.g. hard-coded margins, borders, paddings), this influences the structure of the presentation if the content does not fit onto one screen.

Style - Style
Style should remain consistent.

note: I keep meaning discourse when I write structure. Is structure really discourse? Probably structure is more than just discourse. It is a structured progression. But is a structured progression then an abstraction of discourse?

After discussion with Lynda about our different understanding of what structure in the vicious triangle means, I (we?)
realized the triangle was more like a 3-dimensional space than a 2D space. The extra dimension is the abstraction of the presentation: one extreme is the final formatted presentation (SMIL file), the other extreme is the knowledge sources. Thus the goal of presentation generation is to transform abstract knowledge into a concrete presentation. Although the knowledge becomes less explicit towards the end, the concrete product still implicitly represents/encodes the knowledge contained in the original knowledge sources.

Structured Progression is (partly) constructed using discourse and domain knowledge; this information is implicitly stored in order, grouping and priorities. It is an abstraction needed to provide domain/discourse-independent rules for presentation formatting.

Plot - Presentation Structure - Structured Progression
The plot is domain knowledge structured towards presentation. Presentation Structure is an abstraction of document structure. Structured Progression is an instantiation of Presentation Structure.

note: Just Style, Content and Presentation Structure seems to be a bit simplistic. There is also a User who influences all of these categories. Similarly for the delivery context. Moreover, the terms Style, Content and Presentation are rather abstract, while in fact the trade-offs work at a more specific level. E.g. Presentation Structure contains (implicitly) domain structures and discourse structures. Discourse structures include plot but also genre and to some extent document structure.

2bdone: refine categories

Refinement of categories

The following list provides, for each of the categories Content, Structure and Style, the knowledge sources which influence the result for that category. Note that subcategories can occur in different categories. Typically they provide another viewpoint, though.

Presentation Structure
Discourse: Reflects the intended message of the author, organized in such a way that it logically makes sense.
Domain: Presentation Structure roughly defines the organization of content to convey a particular message. To decide what information needs to be presented together you might need information about the domain. Topia, in contrast, doesn't need this since the hierarchical grouping is determined by common attributes (independent of the domain). Disc knows about painters and artists and knows what is relevant information and how to structure it.
User: The structure of the presentation is tailored to the request of the user. First of all, if the system responds to a user's query, that query should be answered. Secondly, the presentation should know about the background knowledge of the user in order to serve both a domain specialist and a novice.
Document: Partly, the structure of a blob of information is determined by the inherent structure of the document and the output media. For example, a report has a particular document structure which is different from the structure of an essay.
Output media: Influences the structure of a presentation. Paper has no temporal dimension; material can be presented spatially only. Within a film, however, the temporal structure is dominant.
Device: The device influences what media can be used, and by doing so influences the presentation structure.

Content Selection:
Domain: To select a particular media item you need to know what you want and how it is described by its metadata.
Device: Selecting a media item only makes sense if it can be presented on the device you are using. Note that you might be able to adapt the media item to fit your requirement. Nevertheless, this can be seen as just a larger collection of media items to choose from.
User: A user might have a preference for particular modalities, or requirements which exclude modalities. E.g. do not show x-rays of a painting to a 12-year-old.
Discourse: A media item has a role within a presentation.
Certain media items are more suited to be used within an introduction, others provide detailed information.
Modality: The modalities available for presentation restrict the set of available media items.
Document: The document structure (and presentation media) influences the media choice, since it may not support the modality of the media item.
Genre: The genre influences the media choice. A biography/documentary document might use a painting of the subject. For a more formal document such as a CV a picture might be preferred.
Style: The colour scheme of a presentation should match the media content.

Style Sheet: Design/Layout:
User: Apply bright colours for children. Larger fonts for people with bad vision.
Content: The style is influenced by the content. Darker images work better on a dark background.
Device: Do not use colours on black-and-white screens.
Document Structure: A scientific paper/report has a formal/serious layout. A PowerPoint presentation typically is more colourful.

When talking about Style there are two kinds to distinguish. 1) There is style in the classical sense (stylesheet), encoding perceivable style elements of a content/document structure (including border, padding, colours etc.). 2) There is style/design/layout information in the "body" of the presentation. This is how media items are combined which are not already formatted by the document structure (e.g. a slideshow of images in a multimedia presentation). The first one is independent of the discourse (the discourse is implicitly in the document structure). The second one, however, is not. Every document is partly 1 and partly 2; the level of detail in the document structure (report, chapter, section, subsection) determines where the border between 1 and 2 lies.

again:

Style:
Layout: Part of Style is layout, which is the spatial/temporal position of the elements in a presentation.
Device: The layout should fit the display.
User: Not too many items on a screen.
Document Structure: If the document structure is a report, the layout typically has chapters, sections, subsections etc.
Discourse: If the document structure is relatively flat, discourse relations need to be expressed explicitly. For example, an image and a text explaining the image are placed next to each other and aligned to convey the relationship.
Content: Media items influence the style because of their content (black-and-white photograph - abstract art).

Style Properties: Includes colour schemes, border, padding and margin definitions, fonts etc.
Device: Don't use colours on black-and-white devices.
User: User preference for certain colours, or colour blindness.
Genre: Children's stories typically have bright colours. Thrillers are black.
Content: The colour scheme can be adapted to fit the content.

The vicious triangle identifies Presentation Structure (PS), which defines the structure of a presentation by order, grouping and priorities. From a discourse perspective the plot would resemble PS the most. The difference, however, is that within a plot domain relations exist, while in an SP they are mapped/transformed to order, grouping and priorities.

Structured Progression
Plot: A view upon a fabula. A fabula is a graph structure; the plot is hierarchically structured (a DAG if you like).
Fabula: (User-defined) subgraph of the World; contains all domain knowledge for the presentation.
Genre: Narrative units, story templates, biography, essay etc.
Document Structure: Report, letter, book, mm

Scenarios

So far we discussed examples bottom-up. This way we tried to identify the trade-off dependencies within the vicious triangle. To check whether the "model" is rich enough we now take a top-down approach. That is, we use "existing (practical)" trade-off scenarios and see how they fit the model we have described so far.

Colourblind User/User Preference - Corporate Colours (user - design)
Show N items - Screen limit (structure - device)
Content which represents the domain concept best - Content which is easier to access

note: The problem with trade-offs is that most of them happen under the surface and are not as easily explained and identified as the ones mentioned above. An example of such a trade-off is the choice of a document structure (such as report/paper/biography), which influences the structure of the material and the way it is presented. For more domain-specific document structures (paper/biography) the structure is mostly fixed a priori. It states the required parts of a document/presentation and as such is much like a template. The more domain-specific the template gets, the more it limits the scope in which it can be used. Besides providing a pre-fixed structure, a document structure has the advantage that a generic style sheet can be used, which makes style issues easier (scientific papers can all be formatted with one template). On the down side, the results will have a pre-cooked/unadapted and therefore 'boring' appearance.

So these trade-offs exist, but not all of them are as explicit as the examples above. A style-sheet-like approach in which an author makes these trade-offs explicit might therefore not really be feasible, because the consequences of a choice influence the presentation on multiple levels and an author cannot oversee this. Instead, a more high-level approach such as "strategies" might be better suited to control dependencies. This is also how formatting works in, for example, LaTeX, where an author can influence formatting by stating preferences (adding/removing badness), but it is the system which makes the final choice, trading off all requirements. A similar approach is advocated by SRM-IMMPS and Suzanne Loeber's MAO, which make use of experts which have an overview of the system as a whole.

How would this work for real?
Experts communicate with each other, therefore protocols need to be established. This is related to the previous dependency problem, since it needs to be clear what is communicated when. This requires linearizing the process, which already involves trade-offs to some extent...

Disc
----
World, User -> Fabula
Genre -> Plot/Discourse
Document Structure Media -> Layout
Style Props -> Style

Aria
----
User -> Plot/Discourse
Document Structure Media Content -> Layout
Style Props -> Style

Topia
-----
World, User -> Media Content
Genre -> Plot
Document Structure -> Layout
Style Props -> Style

Sample
----
World, User -> Fabula
Genre -> Plot/Discourse
Document Structure Media -> Layout
Style Props -> Style

todo:
- scenario: 4 concepts need to be compared, what are the possible trade-offs
- tailor image to fit scenario
- test table whether it still fits
- make links explicit (influences, subset, uses)
- identify knowledge bases/external knowledge

********************************
Leesclub scenario
********************************

The objective of this simple scenario is to identify some of the choices which a presentation engine needs to make. The choice made is mostly arbitrary and depends on the goal of your presentation.

One of the goals in the presentation is to compare 4 concepts. The ideal case, according to the rhetoric, would be that the four concepts can be compared simultaneously. If the concepts can be represented by images, this means 4 images are presented at once on the screen.

In a presentation about Rembrandt's work, his use of chiaroscuro is compared to the work of other chiaroscuro artists. The objective of this comparison is to get the viewer acquainted with the chiaroscuro technique. Suppose the situation is not optimal and the 4 selected images cannot be presented together at once because of insufficient screen space. To cope with this situation there are alternative ways the presentation can adapt.

Content
- Substitute 1 or more media items with smaller ones.
  Badness:
  - smaller media items lack detail
  - for comparing, images should be of similar quality.
- Scale images down
  Badness:
  - smaller media items lack detail
- Choose alternative representation medium (audio/text instead of images)
  Badness: Audio and text are serializations of content which might not be very well suited to comparing

Device
- Change to a device with a larger screen.
  Badness: Inconvenient for a user

Presentation Structure
- Do not show all images at once but use separate pages
  Badness: comparing is harder, especially for complex images

Plot
- Restructure the plot in such a way that the comparison is not necessary
  Badness:
  - expensive operation since intermediate results are typically no longer valid.

Discussion: Four images cannot be presented together. The suggested solutions happen at different stages of the generation process. The substitution, for example, happens when the 4 concepts get 'materialized'; in Cuypers this is when a PS gets transformed to an HFO. Changing the plot, in contrast, is done after the query has returned its results, which need to be structured according to a narrative. The choice of an alternative presentation medium might also influence the presentation structure, which needs to cope with time and synchronization.

--

The objective is to generate a presentation about Rembrandt's use of chiaroscuro for a user who clicked a link on the Rijksmuseum website. The content provider wants to make sure the user feels the presentation is part of the Rijksmuseum website. The corporate colours of the Rijksmuseum website are a light shade of brownish-green. The visitor is a young girl who likes bright colours. Since the presentation is about 17th century art, the graphics designer of the presentation wants to use dark colours with light accents to emphasize the use of chiaroscuro, which was important at that time.

Content
- Apply a filter to the content to match the design
  Badness:
  - Changing content is not advised since it might change the 'meaning' in an undesired way. Moreover, it might be disallowed for copyright reasons, and it is sometimes not an option if the content (as is the case here) is the topic of the presentation.
- Choose an alternative image which illustrates the concept in an appropriate way and meets the design criteria.
  Badness: not a realistic option since the available media content is limited.

Device
- Change to a device without colours.
  Badness: ignoring the problem

Plot
- The structure/genre of the presentation can be changed to be less serious, in which case a dark style is not appropriate.
  Badness: expensive

--

A user queries the content database of the Rijksmuseum for the terms chiaroscuro and Rembrandt. The database consists of digital representations, in multiple media types, of artifacts in the museum. The result set contains 4 images which need to be presented to the user. 3 images are self-portraits depicting Rembrandt; one is by a student of Rembrandt who used the chiaroscuro technique. The images just fit together on one screen and they agree in style.

Content
- Drop the image which does not match the narrative
  Badness:
  - result set is incomplete

Presentation Structure
- Acknowledge the 'domain' grouping and present 2 groups, one of 3 images and one of 1.
  Badness:
  - requires more space
  - aesthetically less pleasing because balance is lost.

Style
- Ignore grouping and present images together on one screen.
  Badness:
  - Grouping/structure is lost, which confuses the user who expects a relation.
- Present images on one screen but convey grouping by setting different style properties.
  Badness: A user might still be confused if it is not clear what the groupings mean.

The scenarios described above are relatively similar in the sense that they all describe a conflict which needs to be resolved.
The proposed solutions are sometimes a bit far-fetched; nevertheless the options and choices are valid. There exists no predefined strategy which would account for all cases. Moreover, the solutions typically work at different stages of the processing chain. A choice for a particular solution influences the generation process and might cause unforeseen problems. For example, a choice for a different media type influences the presentation structure and style properties. When making a choice one needs to know its consequences; because of that, the seemingly local choices described above in fact need to be seen within a wider scope. Finally, there is the possibility of getting into an infinite loop when there is a three-way dependency problem.

Theory vs Pragmatics.

As mentioned before, the options to resolve a conflict are not all that realistic. For example, changing to a different device or revising the whole plot structure is probably not advisable (although with film generation revising the plot is often the only option, since material is scarce). In general one can say that a completely automatic system which can cope with any situation is not realistic and not what we are after. There exist hardwired choices which limit the scope of possible adaptations. Cuypers, for example, uses depth-first backtracking: the last choice made is revised when a conflict arises; when that doesn't work, the choice before that is revised, etc. This might be considered an implementation decision, which in fact it is. Nevertheless, the architecture and data model of a system typically make these choices implicitly. When choosing an architecture one needs to understand the scope of the problems which it can resolve. The Cuypers architecture is based on depth-first backtracking; this typically works well for small-scale problems, since the whole search tree can be overseen and enumerated.
However, when the tree gets bigger, complexity issues arise, with the result that, for the sake of performance, some choices will never be revised. In Cuypers the content (PS) dominates the layout, that is, the layout always gets adapted and never the content. The underlying idea is that if every parent takes care of its children, we'll end up with a reasonable presentation. What we cannot do (easily) with the parent-child paradigm, however, is, for example, choose a colour scheme. A colour scheme is based on all content of the presentation and therefore is a top-level choice. The media content items, however, are leaf nodes, which means a matching colour scheme has to be propagated upwards to the root node. In the best case this is just rather inefficient, but in case two children cannot agree on a scheme things become more complicated.

Some of the choices in presentation generation require an overview of the whole process. Cuypers does not deal well with this situation because of its architecture and implementation. With the SRM the 'overview' problem is solved by experts. What the experts precisely are and do is mostly undefined, and where it is defined it is rather vague. This holds especially for the links between the experts and the generation process.

requirements:
- overview of the whole generation process, the choices and the consequences
- manageable (for example) by rules

------------------------

Towards a solution

warning: very early/undeveloped idea

metaphor: game

In logic there exists a discipline of proving/deducing/solving statements which uses game playing as a metaphor. There are two or more opponents who each have their own strategy. In the case of a given propositional statement which needs to be proven true or false, one is in favor and the other denies it. They take turns in attempting to prove their position by modifying the statements according to the rules of the game.
For example, if the formula contains an 'and' operator this can be exploited by the falsifier, since she only needs to prove one of the statements false to win. There exist a number of variations for different kinds of games (chance, hidden, strategy).

The metaphor of game playing might be usable for presentation generation. The vicious triangle can be seen as a game between three players who have different strategies to "win". The (abstract) presentation is the playfield/model/structure. Players act on the moves of their opponents. Since the players have different objectives, the playfield looks different to them. For example, the structure player has ordered his view according to presentation structure (grouping, order, priorities). The design player, however, views her field more in terms of categories: images, sections, paragraphs etc. The layout player sees physical pages (and substructures). The content player sees media items and combinations of media items.

When a player makes a move, for example the content player selects a media item to represent a domain concept, this influences the playfield; consequently the views of the design player and the layout player change (note that the view of the structure player does not change). The layout player finds out the image doesn't fit the screen and scales it down. The designer finds out the new image doesn't fit the colour scheme and applies a filter. Etc. etc.

From an implementation perspective this scenario needs a way of transforming the abstract presentation into an appropriate view for the respective players. Moreover, a change in this view propagates to the views of the other players. The players "watch" a view and get triggered if something changes in a way they do not desire.

problems:
- how to avoid cycles: maybe a referee
- the process needs to progress: maybe after a progression step the game is played. Then a next step is made, after which the game is played again, etc.
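
The game idea above, including the two open problems (cycles and progress), can be sketched in a few lines. This is only an illustration under assumptions, not a design: the names Playfield, layout_player, design_player, purist_player, play_game and generate are all hypothetical, the playfield is reduced to four attributes, and the "referee" is nothing more than a cap on the number of turns.

```python
from dataclasses import dataclass, field


@dataclass
class Playfield:
    """Hypothetical stand-in for the shared abstract presentation."""
    image_width: int = 800
    screen_width: int = 600
    image_tone: str = "bright"
    scheme_tone: str = "dark"
    log: list = field(default_factory=list)


def layout_player(pf):
    """Moves when the image does not fit the screen: scales it down."""
    if pf.image_width > pf.screen_width:
        pf.image_width = pf.screen_width
        pf.log.append("layout: scaled image down")
        return True
    return False


def design_player(pf):
    """Moves when the image clashes with the colour scheme: applies a filter."""
    if pf.image_tone != pf.scheme_tone:
        pf.image_tone = pf.scheme_tone
        pf.log.append("design: applied filter")
        return True
    return False


def purist_player(pf):
    """A content player who insists on the unfiltered image (causes a cycle)."""
    if pf.image_tone != "bright":
        pf.image_tone = "bright"
        pf.log.append("content: removed filter")
        return True
    return False


def play_game(pf, players, max_turns=10):
    """Referee: let players react until nobody moves, or give up after max_turns.

    Returns True if the playfield settled, False if the turn limit was hit.
    """
    for _ in range(max_turns):
        moved = [player(pf) for player in players]
        if not any(moved):
            return True
    return False


def select_image(pf):
    """A progression step: the content player picks a (large, bright) image."""
    pf.image_width, pf.image_tone = 800, "bright"


def generate(pf, steps, players):
    """Make one progression step, play the game, then take the next step."""
    for step in steps:
        step(pf)
        if not play_game(pf, players):
            raise RuntimeError("players could not agree after " + step.__name__)
```

With layout_player and design_player the game settles after one round of moves. With design_player and purist_player the two keep undoing each other, and the referee stops the game instead of looping forever. What should happen after the referee intervenes is exactly the open question of this section.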

architecture

implementation

conclusion:

Because of the different strategies involved, an architecture for presentation generation systems is not a linear system. The typical software engineering principle of divide and conquer might not be best suited to solve this problem. What are the important aspects of a presentation which we would like to adapt, and at what cost?

The notion of an overview of the process is advocated by the SRM and by Suzanne's MAO model. These models, however, are written more from a conceptual point of view than from an architectural one. Cuypers implemented part of the model. Because of the architecture of Cuypers, some parts of the models cannot be implemented. This bluebook note is a first investigation of what the dependencies between the different components are and what trade-offs are involved.

INS2 Scenarios

User Oriented
-------------
Lynda describes a system in which the user (= visitor) is central. The actions the system takes influence the Motivation, Ability and Opportunity (MAO) of a user. The system has rules which optimize MAO.

Katya describes a user as the author/designer of a presentation. The system supports the author in creating a presentation by providing suggestions. From that perspective the author is more like the user, except that the adaptations are interactive. The trade-offs become more explicit (different work-flows) since a user needs to be supported. Nevertheless, in SampLe the user takes the roles of designer, author and content provider, so some of the trade-offs are made in the head of the user.

Structure Oriented
------------------
Stefano describes a system which tries to convey an argument. The system has knowledge about the structure of an argument. Furthermore, it knows the discourse effects of editing. To convey an argument it uses this knowledge to find appropriate material.

Frank describes a scenario in which the conceptual structure of the presentation is known and should not be changed: form follows function.

Content/Media Oriented
--------------
Lloyd describes a scenario in which a user gets results from a query. These results form the basis of a process which generates structure around them.

Discussion with Jacco:
After reading the group's scenarios, Jacco and I discussed them. The main observation was the difference in processing models. Similar to the discussion about the role of media data in a discourse ontology (whether it was part of the plot or not), we found that there are different basic assumptions in generating presentations. Cuypers leaves the choice of which media to use almost until the last moment. Lloyd's Topia, in contrast, and to some extent Stefano's work, start with media objects. These are two conflicting views which are hard to unify in an architecture which emphasizes the processing chain.

We discussed a model-view-controller architecture which abstracts from an explicit processing chain and instead is based on events. There are a number of "agents" who get notified when a particular event occurs. For example, when a media item is added this triggers a design agent to judge whether it fits the style of the presentation. If not, it can adapt the style or remove the image again, both of which generate new events. These can trigger other agents to perform an action. This architecture appears to be more flexible and is closer to a real-life scenario in which a human author gradually adds/removes material and changes work-flow, as Katya pointed out in her scenario. Nevertheless, it also introduces the problem of interfacing: what are the atomic actions an agent makes, and what is the data structure it manipulates? A real-life model of a big table and a collection of media material which an author/designer arranges in such a way that a presentation is constructed is not representative, since half of the data structures are implicit in the author's head. In other words, the status of the presentation is not what is on the table.
Having said that, producing the presentation is the process of making the data structures in the author's head accessible by means of physical representations. The viewer perceives the representation (including structure, design etc.) and uses it to construct her mental representation to match the structure the author intends to convey. Hmm, this is sounding very much like semiotics, but the point is that the presentation is the materialized interface of communication. Generating such an interface automatically requires data structures which are only in our brain to be made explicit; a metaphor might therefore be hard to find.

We can, however, analyze how a human authors a presentation, and we'll see that it is not a linear process, but it is not completely non-linear either. We start from certain weighted assumptions which lay out the basic structure of the process. If we create a biography about Rembrandt we can start with a biography structure and find material which matches this structure; if, however, we create a presentation about Rembrandt we can start by looking for material and later decide that a biography will suit the data best. One dominates the other, which is a linear aspect.

If we look at the different generation systems we see that they all perform similar tasks, except that the order of processing is different:

Aria:
Structure Query Concept Instance

Disc: The user enters a query which returns concepts. These concepts are crawled according to a discourse scheme to find structure. The concepts are then represented by media.
Query Concept Structure Instance
Behavior: If the concept doesn't match the structure, change the structure. If the media doesn't fit the structure, change the media.

Noadster: The user enters a query which results in media items. The media items are annotated with concepts, which it uses to deduce structure.
Query Instance Structure

Topia: The user enters a query which returns media items.
The metadata of the media items is used to cluster the result.
Query Instance Concept Structure
Behavior: Media is fixed, adapt structure to media

Hera: Hera uses templates called slices which are domain independent. An author creates slices based on domain-dependent schema information. A user enters a query which returns media items/instances which fit a particular slice.
Structure Concept Query Instance

Backtracking or choosing design alternatives happens according to the dominance in the processing scheme.

The real question thus is whether we can create general components whose order of use can be altered. That is, can we create modules (like) Query, Concept, Structure and Instance which can be combined arbitrarily, or is the order of these modules tightly coupled with the contents of the respective module? In order to find this out we need to analyze the data structures used in the modules at the different stages of the process chain. If there is overlap in representation, this might suggest that there are levels of abstraction which might be unifiable.

Towards Scenario

The scenario needs to illustrate the need for an architecture in which a top-down and a bottom-up approach are combined, or at least can work together. In the latter case there needs to be support for the fact that these two need to be combined. Alternatively we can just have a TD approach, like Aria and Disc, *and* a BU approach like Topia.

Intuitively the combination is obvious: Topia (read: clustered media items) at some stage needs high-level structure too, since every presentation has a higher-level structure. In Topia this is realized by a template-like document structure: 1) a hierarchical index-like structure to give the detailed content context and 2) a detail window which gives detailed information about the currently selected topic. Disc, in contrast, manipulates the higher-level discourse structure (which is reflected in document structure).
At some stage however media items need to be included to reflect the higher-level concepts. The assumption here is that there is exactly one media item which matches the concept. Disc abstracts over the fact that there can be multiple media items, of different types, and that the media item possibly does not exist.

The difference between the TD and BU approaches is what gets changed/adapted. In Disc the discourse structure remains fixed, while the choice of media items and the formatting of these media items is flexible. Topia in contrast has a fixed media set of which the clustering (= structure) can be adapted. What we are looking for is a scenario where this trade-off matters. Both approaches switch from a TD->BU or BU->TD strategy. The question is: what do we gain if we have an architecture which can cope with both strategies?

A Disc scenario in which the structure gets adapted.
A Topia scenario which chooses the modality which fits best. The top level uses modality knowledge, e.g. use graphics for spatial information and natural language for temporal information (from srm premo).

Top-down Bottom-up

PS_Media & Modality and (2bdone & Delivery Context)

Modality grammar rules form the basis of dealing with media items. They present (create an HFO for) any sequence of one or more media items, because there exist rules for presenting audible media and for graphical media. In addition there are also rules which present combinations of audible and graphical items. Since all media items fit one of these categories, any sequence of media items can be presented. Of course the presentation of media items is a one-size-fits-all approach and therefore hardly reflects the underlying semantics of the media items. A set of media items can be shallowly structured by explicitly grouping them. We currently support "group" and "alternative".
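
The way modality rules cover any sequence of media items can be sketched roughly as follows. This is only a minimal illustration in Python, not the actual grammar rules: the class names MediaItem, Group and Alternative, the function names leaves and present, the dictionary standing in for an HFO, and the policy of always taking the first alternative are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Union

AUDIBLE = "audible"
GRAPHICAL = "graphical"


@dataclass
class MediaItem:
    name: str
    modality: str  # AUDIBLE or GRAPHICAL


@dataclass
class Group:
    members: List["Node"]  # shallow grouping: present all members


@dataclass
class Alternative:
    options: List["Node"]  # shallow grouping: present one of the options


Node = Union[MediaItem, Group, Alternative]


def leaves(node: Node) -> List[MediaItem]:
    """Flatten a node into the media items that would actually be presented."""
    if isinstance(node, MediaItem):
        return [node]
    if isinstance(node, Alternative):
        # Toy policy (assumption): always take the first alternative.
        return leaves(node.options[0])
    return [item for member in node.members for item in leaves(member)]


def present(node: Node) -> dict:
    """Pick a rule from the modalities found; the dict stands in for an HFO."""
    items = leaves(node)
    if not items:
        raise ValueError("need one or more media items")
    found = {item.modality for item in items}
    if found == {AUDIBLE}:
        rule = "audible"
    elif found == {GRAPHICAL}:
        rule = "graphical"
    elif found == {AUDIBLE, GRAPHICAL}:
        rule = "combined"
    else:
        raise ValueError("unknown modality in %r" % sorted(found))
    return {"rule": rule, "items": [item.name for item in items]}
```

Note that present looks only at modality: nothing in it knows which concept an item stands for or which rhetorical function it fulfills.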
Although modality grammar rules present any group of media items, within a presentation you often want to differentiate between media items based on the concepts they represent or the rhetorical function they fulfill. Modality grammar rules know nothing about these, which is why we need ps_media. PS_media deals with media on a higher level; it basically represents a domain concept independent of the media item it uses in the final presentation. Within ps_media we can still select media based on the preferences of the user or the device. Furthermore we can create media items if necessary (e.g. captions with images) or transform media to fit the context (e.g. text to audio).

yet another scenario (...attempt)
------------------------------

Within the grammar rules there is a function selectMedia which selects a media item (an HFO actually) from a set of alternatives. The alternatives are all valid from a technical point of view and will not make the constraint solving fail. The choice of which one to use is part of the vicious triangle. Cuypers used (some sort of) ccpp profiles to define the delivery context of the presentation. The assumption here was that there are different screen sizes which require adaptation in layout. However, the screen size can become so small, for example with mobile phones, that alternative media types are advisable. Typically the selection of a different media type influences the structure of the presentation, because media items typically cannot be substituted one for the other while keeping a semantically equivalent presentation. If we consider ccpp profiles for a mobile phone and a workstation web browser, the first one will preferably use text or audio while the latter will preferably use images and audio.

example: Currently the description text of the images does not always mention the term 'Chiaroscuro'. This however was the keyword which selected the image in the first place.
So if we are only allowed to use text, the images whose textual descriptions lack 'Chiaroscuro' should not be selected. In other words, the change to text-only changes the presentation structure.

needs more...

For the text-only version the Aria demo substitutes text for images. As a consequence the chiaroscuro elaboration text and the descriptions of the example paintings are shown in a slide show. It is unclear to the user what text to read, and the relationship between the texts is not clear either. For a textual presentation the structure of the presentation needs to change to make sense. In the case of the chiaroscuro presentation this means the elaboration text and the examples should each be presented as individual sections. The examples should be presented in subsections. There needs to be a transition/glue between sections 1 and 2.

Presentation structures were initially meant for this purpose. If you choose a different output medium, such as report instead of multimedia, the presentation structures would be report, section, subsection, whereas multimedia had the presentation structures presentation, scene, sub-scene. Basically they were just some sort of templates; for each different type of output modality (report/multimedia) you need such a template. The presentation structures themselves are very much alike in structure: Presentation vs. Report, Scene vs. Section, Sub-Scene vs. Subsection. These categories are presentation oriented: the presentation structure Scene knows how to present its children temporally, and the presentation structure Section knows how to present its children spatially, one below the other. This knowledge is encoded in the presentation structure itself, which has some drawbacks:
1) Every output medium needs its own presentation structures.
2) For different genres of presentations you need different structures.
3) If the output modality is multimedia but there is a preference for text, the presentation structure should be able to cope.
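
To make the first drawback concrete, here is a small sketch of what "knowledge encoded in the presentation structure itself" amounts to. It is a hypothetical illustration in Python, not the existing implementation; the present method and its string output are invented for the example.

```python
from typing import List


class Scene:
    """Multimedia structure: knows how to present its children temporally."""

    def __init__(self, children: List[str]) -> None:
        self.children = children

    def present(self) -> List[str]:
        # Design knowledge lives inside the structure: one time slot per child.
        return [f"t={slot}: {child}" for slot, child in enumerate(self.children)]


class Section:
    """Report structure: knows how to present its children spatially."""

    def __init__(self, children: List[str]) -> None:
        self.children = children

    def present(self) -> List[str]:
        # Same tree shape as Scene, but a different hard-coded arrangement:
        # one row per child, one below the other.
        return [f"row {row}: {child}" for row, child in enumerate(self.children)]
```

The two classes differ only in the arrangement they hard-code, so every further output medium would need yet another near-copy.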
What is needed:
Presentation structures should have no embedded design knowledge.
Presentation structures should be independent of the output medium.
Presentation structures should be independent of the input medium.
Presentation structures should be tree structures.

This sounds very much like a structured progression but there is more to it. A presentation structure should know at what level it occurs in the tree. Moreover it needs to have a function/purpose (maybe these can be combined?). I think there are at least three different types. There are top-level structures, which are mainly influenced by the document structure (and rhetoric, genre, discourse knowledge). Then there is the concept level, whose structures are atomic nodes in the discourse (but are not necessarily equivalent to media items). And there is a level in between, which currently is the most vague. These are functional units (maybe communicative devices?), typically with a rhetorical function such as compare or contrast. In the vicious triangle this assumes content and style are subordinate to structure/rhetoric. I think this is the most typical case; however, if we choose style as dominant then these structures have a function like happy or dark, formal, playful. Note that the top-level structure is kind of independent of this: whether you choose a rhetorically dominant presentation or a style-dominant presentation, you still need a document structure. The same holds for the lower-level/concept structures: independent of whether the presentation is rhetorically or style dominant, you always need to select and present media items.

What's next:
The textual example as it is didn't work. Theoretically we can fix it by adding another presentation structure for a text-only document structure (e.g. report). However this has scalability issues, as mentioned before, and we can't cope with a textual preference within a multimedia presentation.
My suggestion is to create a generally applicable presentation structure (much like it is at the moment) and make all design/layout decisions explicit outside the presentation structures (something like a style sheet). This can serve as a foundation for further extensions.
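
One possible reading of this suggestion, sketched in Python under stated assumptions: a single generic tree node carries only a function/purpose, and a separate table plays the role of the style sheet. The names PresentationStructure, STYLE_SHEETS and render, the table layout, and the choice to compute the tree level during traversal rather than store it in the node are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PresentationStructure:
    """Generic tree node: carries a function/purpose, no design knowledge."""
    function: str
    children: List["PresentationStructure"] = field(default_factory=list)


# Stand-in for a style sheet: per output medium, one (label, arrangement)
# entry per tree level. The values are illustrative assumptions.
STYLE_SHEETS: Dict[str, List[Tuple[str, str]]] = {
    "report": [("report", "spatial"), ("section", "spatial"),
               ("subsection", "spatial")],
    "multimedia": [("presentation", "temporal"), ("scene", "temporal"),
                   ("sub-scene", "temporal")],
}


def render(node: PresentationStructure,
           sheet: List[Tuple[str, str]],
           level: int = 0) -> List[str]:
    """Walk the tree; every design decision is looked up in the sheet."""
    # Levels deeper than the sheet reuse its last entry.
    label, arrangement = sheet[min(level, len(sheet) - 1)]
    lines = [f"{'  ' * level}{label} [{arrangement}] {node.function}"]
    for child in node.children:
        lines.extend(render(child, sheet, level + 1))
    return lines
```

With this split, the same tree can be rendered with STYLE_SHEETS["report"] or STYLE_SHEETS["multimedia"], and supporting a further output medium means adding a table entry rather than a new family of presentation structures.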