Patrick Schmitz, Ludicrum Enterprises
<cogit@ludicrum.org>
These are just some ideas that arose as I read Oscar's thesis. I have added additional notes as I read other papers, but his ideas about attention, and some omissions in the architecture, spurred my thinking in this area.
It seems critical to consider the function/use/author-intent of a piece of media (e.g., an image) in a presentation, and not just the fact of its media type, or even of its contents. Simply combining "image" as a class with other content is much less meaningful without the function/intentional-use included in the analysis.
If the work done to classify audio is moved onto a slightly different axis, I think it becomes much more meaningful, and is also general to all media types (not just audio). This axis is the intended level of user focus. Sticking with his three levels of distinction, we get content that is:
- focal: the intended center of user attention,
- ambient: accompanying or context-setting content that is not meant to hold attention, or
- alert/interrupt/guide: content meant to redirect or guide attention (alerts, menus, control buttons, guides, etc.).
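As a minimal sketch of this axis (the names are my own, not from the thesis), the focus level would be carried alongside the media type rather than replacing it:

    from dataclasses import dataclass
    from enum import Enum, auto

    class FocusLevel(Enum):
        """Intended level of user focus, orthogonal to media type."""
        FOCAL = auto()    # the intended center of user attention
        AMBIENT = auto()  # context-setting; not meant to hold attention
        ALERT = auto()    # alert/interrupt/guide content that redirects attention

    @dataclass
    class MediaItem:
        uri: str
        media_type: str    # e.g., "image", "audio", "text"
        focus: FocusLevel  # the function/intentional-use axis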
This of course raises the question: how do we determine/deduce/declare the function/intentional-use of a piece of media? When we are generating media, I would think it important for the rhetorical and discourse models to note the function of the media, so this should come for free, to some extent. Stated another way, this would be one of my design criteria or requirements for a model of rhetorical structure or discourse.
In an existing presentation, I suspect that some heuristics might be applied that could help with this. These might well be inspired by the classification scheme that Oscar describes. However, it makes more sense to define a model that handles the broader case (including the exceptions to his rules that he mentions). Once we have a model like this in place, it may make sense to use his simpler rules to approximate the proper model when the function/intentional-use is otherwise unknown.
The general assumption (or definition) is that static media has no intrinsic duration. It is usually modeled as having 0 duration or indefinite duration, such that other constraints or some explicit authoring directive controls the effective duration. However, in the context of generated presentations, in which media is selected to fit or function in some context, often in association with other media, I think that even static media can often be understood to have an implicit duration. I see two approaches to defining this implicit duration, using an image as an example.
Naturally, one of the cognitive factors for media will be the function or intentional use of the media. For focal media, rules like the above are most likely to be useful. However, ambient media will have an implicit (useful) duration determined by the associated focal content in the presentation. By the same token, alert/interrupt/guide content will either be short (usually constrained by some stylistic rule for consistency across the presentation), or it will be persistent (as with menus, control buttons, guides, etc.), in which case it is constrained only by the associated presentation context, as ambient media is.
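As one possible sketch of such rules (purely my own reading, building on the types above): focal media gets a content-based estimate of comprehension time, while ambient and persistent content inherit duration from the surrounding presentation context. All names and values here are illustrative assumptions:

    STYLE_ALERT_DURATION = 3.0  # assumed stylistic default for transient alerts, in seconds

    def estimate_comprehension_time(item: MediaItem) -> float:
        # Placeholder heuristic; real rules would weigh visual complexity,
        # embedded text, familiarity, and other cognitive factors.
        return 5.0

    def implicit_duration(item: MediaItem, context_duration: float | None,
                          transient: bool = True) -> float | None:
        """Resolve an implicit duration for a static media item.
        None means indefinite: constrained only by the presentation context."""
        if item.focus is FocusLevel.FOCAL:
            return estimate_comprehension_time(item)  # content-based estimate
        if item.focus is FocusLevel.AMBIENT:
            return context_duration                   # tied to the focal content
        # Alert/interrupt/guide: short stylistic default if transient,
        # otherwise persistent, like ambient media.
        return STYLE_ALERT_DURATION if transient else context_duration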
Animation is not really covered. It raises lots of issues w.r.t. attraction, attention, the classification of the media, etc. For the purposes of this discussion, animation basically breaks down into three general categories:
Audio is not considered as multichannel or positional. It should be modeled in 3-D as well: it has position (which leads to stereo balance when spatial processing is done), but it has no extent. Position is particularly important when animated, to enhance presence in a 2-D or 3-D navigation space. It has also been used to model human interruptions (whispering a reminder or a hint in one ear, by placing the audio immediately offscreen to one side).
Audio volume is not considered. Audio has volume, which can be used for layering in a way that has no direct equivalent in visual media. This can be independent of position (as an attribute of the media), or it can be modeled with spatialized audio (which can include motion animation), and/or it can be controlled by styling (e.g., keying off the function/intent of the audio in context). Text and images can be layered with opacity, but this is not as commonly used as volume is for audio.
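Continuing the sketch above, such layering could be expressed as style defaults keyed off the function/intent axis, independent of position (all values here are illustrative assumptions):

    # Hypothetical layering defaults keyed off function/intent: ambient audio
    # ducks under focal audio; ambient visuals fade back via opacity.
    LAYERING_STYLE = {
        FocusLevel.FOCAL:   {"volume": 1.0, "opacity": 1.0},
        FocusLevel.AMBIENT: {"volume": 0.3, "opacity": 0.5},
        FocusLevel.ALERT:   {"volume": 0.8, "opacity": 1.0},
    }

    def layering_for(item: MediaItem) -> dict:
        # Position/spatialization would be handled separately; these are the
        # position-independent attributes discussed here.
        return LAYERING_STYLE[item.focus]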
Images versus text - a whine. Images are inherently limited in the information they can convey (as are most media types). They are not necessarily richer than text - it depends on what must be communicated. This needs to be better accounted for. OTOH, if there is no good means of conveying an idea with an image, then an image is unlikely to exist, so the alternative will not present itself.
Open question on text-to-speech: Oscar describes some rules that govern how and when speech can be effective in concert with other media types. Should these rules be considered or extended to support compensation tools such as text-to-speech? Note that in particular for small display devices, text-to-speech may be an important strategy to compensate for the lack of screen real estate and poor text display. Similarly, highly passive modalities (e.g., watching TV) and non-visual modalities (e.g., working with an autoPC) will prefer speech forms. If there is only text, then text-to-speech will be an important tool. It could be modeled as orthogonal (as though it just exists and is "chosen" when media is chosen from the MMDB), but it may also be a compensation tool to deal with constraints much later/deeper in the process/model.
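To illustrate the second placement (a late compensation pass rather than an up-front selection), here is a minimal sketch building on the types above; the threshold and names are my assumptions, not anything from the thesis:

    def compensate_for_device(item: MediaItem, screen_area_px: int) -> MediaItem:
        """Hypothetical late-stage compensation: if the display is too small to
        render text legibly, substitute a text-to-speech rendering of it."""
        MIN_TEXT_AREA = 160 * 120  # assumed legibility threshold, in pixels
        if item.media_type == "text" and screen_area_px < MIN_TEXT_AREA:
            return MediaItem(uri=item.uri, media_type="speech", focus=item.focus)
        return item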
I think there is an important distinction between narrative content and non-narrative content [1]. My use of these terms is based upon folks I knew who were English and Comparative Literature majors in school, for whom narrative forms were the means of telling stories in the general sense, and included novels, textbooks, movies, poetry, etc. Non-narrative forms are things like query results that do not generally include a discourse model or equivalent.
The narrative forms lend themselves to a degree of authoring, even if only at some abstract level. Non-narrative forms can incorporate graphic design, and perhaps some intelligent ordering and grouping, but not really authoring as such; presentation generation for them has tended to be more automatic and simplistic, and less heuristic or analytical. As such, narrative forms also lend themselves to the application of cognitive modeling, whereas non-narrative forms will either not be as amenable to cognitive modeling, or will have a simpler, more general cognitive model associated with the application or content model as a whole, and not per presentation.
Nevertheless, in the area of semantic web queries and applications, it may be very interesting to develop approaches for synthesizing discourse, narrative, or cognitive models for non-narrative content. I would like to keep this in mind as we explore the application of semantic relations to the generation and translation of narrative/discourse/cognitive models.
[1] I am not sure if there is such a thing as non-narrative expository content (which would sit in between these two); I am pretty sure this can be ignored.