Rhetoric and MM

Rhetorics in Hypermedia

Key

Jim wrote
What Jim wrote that Lloyd disagrees with
Lloyd wrote
Lloyd reworded Jim, perhaps from another source
Lloyd's disagreement
Jacco wrote, reworded by Lloyd

this is not even a draft yet. it's not even an outline yet. it's a collection of all the notes that are on this topic.

Rhetorical Issues Unique to Hypermedia

The focus of this brainstorm session was on what is special about multimedia in terms of rhetoric. The reason for wondering this is that in this paper we need to say something new. Much has been established for using design in the other presentation forms to convey general concepts. Some has been said for using the other forms for conveying rhetorics. Just repeating these will not be new. And just throwing them directly into multimedia use will not be new.

Here are the three main points of the novelty of this topic, that multimedia wrt (with regards to) rhetorics is unique because of:

1) interaction
2) adaptivity
3) the unique harmony from the combination of devices from the other forms

1) has been pretty well explored in general, but it is still worth mentioning. Our novel discussion in this area will be interaction and rhetorics. Some of this has been discussed already in terms of linking structure and rhetorics.

2) is also well established. The HT00 submission discussed specifically conveying the same rhetorics through varying hypermedia adaptation, but with a focus on constraints. Adaptivity is unique to hypermedia over the other forms. It does have impact on rhetorics. One impact is that the intended rhetorics must be maintained. Another is that no unintended rhetorics can be introduced. What is distinct here about rhetorics over other meta-data? We need to focus not just on the uniqueness of multimedia over other forms, but the uniqueness of rhetorics over other meta-data. Is it that rhetorics is an aspect of the means of conveying rather than of the message content itself?

3) is harder to explain, but perhaps more novel and interesting to read in a paper. If you play an A note, and separately play a C, you play a note each time. If you play them together, you get two notes, but also a chord: a minor 3rd, a particular, recognizable kind of sound. If you add interaction to film/video, you get interactive film, of course, a somewhat lukewarm field of study. But technically, interactive video can be identical to hypermedia.

If you add multimedia to comics, away go many of the sound and noise bubbles. But if you turn off the noise in video and turn on closed captions, you don't get sound bubbles, and wouldn't want them. You'd want, and would get, text below the image. Add time to graphics and you get animation, and many of the rules change. Visual effects that work in graphics don't in animation -- they become too "noisy". Some visual effects only work in motion: the combination of graphics and time makes a unique harmony not existent in the separate media. Many animated Windows effects can be considered the combination of graphics, time and interaction, and form communication devices with effects not existing in the individual forms.

To structure the consideration of 3), we could itemize the combinations of devices from the different forms that make unique communicative devices not existing in the individual forms. Any new devices we come up with would be novel. With both the novel and previously existing devices, consideration of them wrt rhetorics in particular would also be, in many cases, novel. And the placing of existing use of combinatory devices for rhetorics within the context of the collection of these new ones would also be useful.

We also talked about his concept of "communication devices". If you have the rhetoric of a two-member sequence, such as "middle ages" and "renaissance", there are many ways to present it. One is with the text "comes after". Another is with a graphic icon, such as an arrow, that works independently of the placement of the terms -- in fact, it overrides the placement. These involve the use of a particular semantic symbol, an xcon, to convey the rhetoric. You can also use hypermedia structure, such as left-right and/or up-down spatial placement, sequential temporal playing, or the use of next/prev buttons. What term best combines xcons and structure here? Does "xcon" do, or should that be used for media items, if used at all? Jacco says that "communicative acts" applies only to media items, not to structure. Can the term "communication devices" work here? Is there a better term we should use?

Jim also mentioned referring to Stu Card's (sp?) work on animated graphics and how to avoid having misleading or extraneous distractions in them.

Discussion of RST

RST

scope is written monologue.
units are typically clauses.
an RST analysis is a functional analysis of the intended effect of a text on the reader.
relations among clauses

a relation is:
 constraint on N
 constraint on S
 constraint on combination of N and S
 intended effect

schema - patterns of span and relation:
 S N (e.g. circumstance)
 N N contrast
 N N+ joint
 S S N: a --motivation--> b <--enablement-- c
 N N+ (+ ordering) sequence

There are conventional canonical orderings, but they are not required by the theory.
some relations are causally related to the domain (cause), others to the presentation itself (justify).
if you can't find a nucleus, no RST is possible.
you can often delete the S and retain coherence, but deleting N makes the text incoherent.
Grosz and Sidner consider attention, but RST does not.

Overview Chart of RST


Cn: constraint on Nucleus
Cs: constraint on Satellite
Cc: constraint on combination
E:  intended effect
L:  locus of effect

R reader, W writer
N, S nucleus, satellite

<> the situation in CC.  For many of these the effect is just that R
understand or recognize whatever is in CC

----


Evidence
 Cn: reader does not (sufficiently) believe N
 Cs: reader believes S
 Cc: belief in S increases belief in N
 E: belief in N increased
 L: N

Concession
 Cn: W supports N
 Cs: W does not deny S
 Cc: W acknowledges potential or apparent incompatibility between N and S.
     W believes N and S are in fact compatible
     understanding the compatibility increases R's regard for N
 E:  increase R's regard for N
 L:  N and S


Elaboration
 Cn:
 Cs:
 Cc: S presents additional detail about the situation or some element of 
     subject matter presented in or inferentially accessible from N.
 E:  R recognizes <>.  R identifies the subject matter for which detail is
     presented.
 L: ns 

examples: set/member, abstract/instance, whole/part, process/step,
object/attribute, generalization/specific


Motivation
 Cn: presents action A where R is actor
 Cs:
 Cc: comprehending S increases R motivation to do A
 E:  R's desire to do A increased.
 L:  n 

Condition
 Cn:
 Cs:
 Cc:
 E:
 L:  

 ?? maybe S is the circumstance under which N may occur ??
 ?? but then what's circumstance ??

Evaluation
 Cn:
 Cs:
 Cc:
 E:
 L:  

Justify
 Cn:
 Cs:
 Cc:
 E: increase acceptance of W's right to present N
 L:  

Circumstance
 Cn: 
 Cs: S presents a situation not unrealized
 Cc: S sets a framework in the subject matter within which R is
     intended to interpret N
 E:  R recognize <>
 L: ns

 Example (p 14)
   N: Cleaning agents on the surface of the ECtype coating actually
      remove build-up from the head
   S: while lubricating it at the same time

 [I don't see how the example works.]


Background
 Cn: R can't understand N without S
 Cs: (R understands S)
 Cc: S increases ability of R to understand N
 E:  R's ability to comprehend N increases
 L:  N


Otherwise
 Cn:
 Cs:
 Cc: S is prevented by N
 E:
 L:  

Restatement 
 Cn:
 Cs:
 Cc: S restates N.  S and N of comparable bulk
 E: R recognizes S as a restatement of N
 L:  ns


Anti-thesis (subtype of Contrast)
 Cn:
 Cs:
 Cc:
 E:
 L:  

 ? S is the opposite of N

Solutionhood
 Cn:
 Cs: presents a problem
 Cc: N is a (partial) solution to S
 E: R recognizes that N is a (partial) solution to S
 L: ns 

The scope of "problem" includes questions, requests, conditions that
carry negative values.

Enablement - S enables R to do N
 Cn:
 Cs:
 Cc:
 E:
 L:  

[Volitional     ][Cause]   S causes N
[Non-Volitional ][Result]  S is result of N

Purpose
 Cn: presents an activity
 Cs: presents an unrealized situation
 Cc: S is realized through N
 E: R recognizes <>
 L:  ns

Interpretation - S relates N to some framework
 Cn:
 Cs:
 Cc:
 E:
 L:  

Summary - S restates N, size of S less than N
 Cn:
 Cs:
 Cc:
 E:
 L:  

Means
 Cn: presents an action
 Cs: none
 Cc: situation in S tends to make possible or likely the situation in N
 E:  R recognizes <>
 L:  N and S

Note: this is not found in the short list on p 18

--- multinuclear

Sequence
 Cc: a succession relation
 E: R recognizes <>


Contrast
 Cc: no more than two.  the situations are the same in many respects, differing in at least one, compared with respect to those differences
 E: R recognizes <>


Joint
 None.
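
For generation rather than analysis, the chart above would need some machine-readable form. A minimal sketch, assuming a hypothetical record type `RelationDef` whose fields mirror the chart's abbreviations (Cn, Cs, Cc, E, L); the field names and the helper `unfilled` are illustrative only, not part of RST:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RelationDef:
    """One row of the overview chart (hypothetical encoding)."""
    name: str
    cn: str = ""      # Cn: constraint on nucleus
    cs: str = ""      # Cs: constraint on satellite
    cc: str = ""      # Cc: constraint on combination
    effect: str = ""  # E: intended effect
    locus: str = ""   # L: locus of effect


# The Evidence entry, copied from the chart.
EVIDENCE = RelationDef(
    name="Evidence",
    cn="reader does not (sufficiently) believe N",
    cs="reader believes S",
    cc="belief in S increases belief in N",
    effect="belief in N increased",
    locus="N",
)


def unfilled(rel: RelationDef) -> List[str]:
    """Return the chart fields that are still blank for this relation."""
    fields = ("cn", "cs", "cc", "effect", "locus")
    return [f for f in fields if not getattr(rel, f)]
```

Calling `unfilled` on an entry such as Condition, which the chart leaves empty, would list all five fields, which makes the gaps in the notes easy to track.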

Introduction

RST has typically been used for analysis of existing text documents. Here we not only apply RST to non-text media and multimedia but also to the generation of new presentations, rather than the analysis of existing ones.

Determining Presentation Order from Rhetorics

RST does not fully state the order in which document components are to appear in presentations. The RST relation sequence is a clear exception, stating that the components in the relation have a particular order. Nucleus-satellite relations can imply an order, with either the nucleus or the satellite always appearing before the other. Another example arises when a document component is the nucleus in multiple nucleus-satellite relations: the order in which the satellites are presented in conjunction with the nucleus can be determined by the types of the relations. However, for the most part, non-rhetorical meta-data would be used to determine order not stated by the RST sequence relation.
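
A minimal sketch of how an order might be read off a rhetorical tree, under stated assumptions: the `Span` type, the `order` function and the `SATELLITE_FIRST` table are all hypothetical, and the choice of which relations put the satellite first is a convention invented here for illustration, not something RST requires. The leaf "background note" is a made-up placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed convention: relations whose satellite is shown before the nucleus.
SATELLITE_FIRST = {"background", "circumstance", "solutionhood"}


@dataclass
class Span:
    label: str                      # identifier of a leaf component
    relation: Optional[str] = None  # relation name; None marks a leaf
    nuclei: List["Span"] = field(default_factory=list)
    satellites: List["Span"] = field(default_factory=list)


def order(span: Span) -> List[str]:
    """Return leaf labels in an order derived from the rhetorical structure."""
    if span.relation is None:
        return [span.label]
    # Nuclei keep their listed order, which covers the sequence relation.
    nuclei = [leaf for n in span.nuclei for leaf in order(n)]
    satellites = [leaf for s in span.satellites for leaf in order(s)]
    if span.relation in SATELLITE_FIRST:
        return satellites + nuclei
    return nuclei + satellites


doc = Span("", "sequence", nuclei=[
    Span("middle ages"),
    Span("", "background",
         nuclei=[Span("renaissance")],
         satellites=[Span("background note")]),
])
print(order(doc))
```

Anything this sketch leaves undetermined, such as the order among several satellites of the same type, would still have to come from non-rhetorical meta-data.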

We might be able to recognize parallel constructions by finding parallelism in the RST structure. Reichenberger et al. hint at that as a means of recognizing a potential sidebar.

Reichenberger et al. used only nuclearity to construct trees; they ignore the details of the relation.

The Narrative Structure of Surprise

Related to rhetorical structure is narrative structure. One type of narrative structure is surprise. It might be used in a joke or riddle. The intended effect is that the reader not see the material accidentally or before sufficient time has been allowed for him/her to puzzle over it.

There are some universal presentation techniques for conveying surprise. For example, presentations can always ensure that the surprise components appear after their build-up and never simultaneously. This can be done by putting the surprise after the build-up in the time line or in navigation traversal.

One semi-formal specification of surprise is this: surprise S follows in some way from N. Present N first, and then, eventually, present S, but not right away and not by accident. The purpose of the utterance is to make the reader expend some effort trying to find S on her own, where this effort is supposed to be pleasurable in itself. One use is in jokes, where N is the build-up and S is the punchline. Another is with quizzes, where N is the question and S is the answer.

The surprise narrative construct has implications for spatial or temporal structure. For example, in ordinary paper media for a riddle you could either put the answer on the other side of a page, or at least a bit further down with some white space in between. The former is like a navigational link, the latter is a spatial constraint. In hypermedia one could also use a temporal constraint of delay.
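
The temporal variant can be written down as a simple check. This is only a sketch: the function name `satisfies_surprise` and the parameter `min_delay` are assumptions introduced here, and the check covers only the "not right away" part of the specification, not the "not by accident" part.

```python
def satisfies_surprise(n_start: float, s_start: float, min_delay: float) -> bool:
    """Check the temporal side of the surprise construct.

    n_start   -- time (seconds) at which the build-up N is first shown
    s_start   -- time (seconds) at which the surprise S is first shown
    min_delay -- assumed minimum puzzling time between the two
    """
    if min_delay <= 0:
        # A zero delay would allow N and S to appear simultaneously.
        raise ValueError("min_delay must be positive")
    return s_start >= n_start + min_delay
```

For example, `satisfies_surprise(0.0, 5.0, 3.0)` holds, while showing S at the same moment as N does not.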

The use of spatial structure works better if the reader recognizes that he or she is being shown something with the relation of surprise in it, and that the effect is being achieved by spatial separation and is willing to cooperate. If the reader didn't know this, he or she might quickly scan the page and see the answer too soon, which ruins the fun. So you could only use this constraint in cases where the genre allows the system to assume that the reader will recognize the situation.

Filtering the media before presentation can also enable the narrative impact of surprise. One example is that punchlines are also sometimes presented upside-down. The text making up the punchline is filtered before presentation to appear upside-down. This type of filtering for surprise is also sometimes used in Teletext, which is textual information encapsulated in the video-signal of most European TV broadcasts. Most European TV remote controls even have a special button to reveal this otherwise hidden "surprise text". It is used for riddles and jokes, but also for info that is supposed to be too technical for the average user.

The relationship between S and N, whatever it is, is more interesting than either N or S, which are merely examples of it. Thus if you want to teach logic, in particular say modus ponens, you have to provide examples of the major and minor premise and thesis, but nobody cares whether the student remembers these later. In particular they don't even have to be true. See also MMT notes on "relational propositions" section 7.1 (p37 and following)

grice and hm

Make sure each device does what it is supposed to, and nothing extra

Grice's maxims under Manner included:

avoid obscurity of expression
avoid ambiguity
be brief
be orderly
These can be applied to hypermedia as well. Add:
Structure your presentation so that the relations between parts are obvious.

In encoding automatic processing, it is easier to make presentation structures that are helpful than to avoid presentation structures that are hurtful. One difficulty is that it is technologically impossible to represent all the conveyed semantics of media objects, thus it is impossible to prevent adjacency of objects with semantics whose combination conveys undesired messages. However, some types of implicature can be prevented with constraints on the presentation structure.

Since speech and text are linear, the locus of attention is always fixed for a given point in the progression of the presentation. Hypermedia is different than speech and text because different components can be presented simultaneously. An important part of narrative in hypermedia is ensuring that the locus of attention is on the desired objects among those that are shown in parallel. Animated images in Web pages are often considered undesirable because they draw attention from other parts of the display.

I do see some connections to Manner of speech. If I were to speak to you and at the same time wave my hands wildly in the air and leap about from place to place, you might find it distracting in something of the same way an animated gif is distracting. Or if I were to use a "funny" voice (like the actors on Monty Python do sometimes) it would be distracting also. But since Hypermedia is a visual medium, and speech is not, I think there's still more to be said about Manner, as it applies to Hypermedia, beyond what has been said about Manner and speech. (But I also think much of this is probably understood by good graphic designers.)

Hypermedia Devices for Conveying RST

The different types of multi-nucleus RST relations have universal tendencies in how they are conveyed in final presentation. Presentations can generally distinguish the nuclei from their corresponding satellites. However, there are no universally applicable techniques for conveying a particular type of RST nucleus-satellite relation. There can be, of course, non-universal conventions that get established for conveying particular types of RST nucleus-satellite relations in a given document set or individual document.

Conveying RST with Spatial Structure

Much of how hypermedia spatial structure can convey rhetorics can be derived from how printed layout conveys rhetorics. The relationship between RST and printed layout has already been explored [Reichenberger 1995]. In comparing layout with hypermedia we get not just techniques that are the same between them but also a richer understanding of what aspects of conveying rhetorics are unique to hypermedia.

Printed text often distinguishes RST nuclei and satellites. Take for example a guidebook of Amsterdam architecture. Each page has five or six frames, laid out top to bottom. Each frame has a small black-and-white photograph of a facade and some descriptive text.

Where's the nucleus, where's the satellite? My intuition is that the picture is the satellite, but I can't explain why I think that.

Each photograph is a nucleus in an "elaborate" RST relation, with the descriptive text being its satellite.

In this example, the question arises of what the level of granularity of the photographs is. Since typical consideration of RST has been text-based, the usual granularity has been that of a clause of text. But how does this translate for multimedia? Each photograph could be considered as a whole. However, there may be components of a photograph that the reader can distinguish and perceive as part of an RST relation. An example of this that comes from printed text is that of arrows indicating portions of pictures. Another is that of text referring to identifiable portions of images, such as the word "rooftop".

Various multimedia languages have syntactic constructs for representing portions of media objects, including SMIL [SMIL 2000]. Syntax representing rhetoric meta-data could incorporate these or similar constructs for placing portions of media objects in RST relations.

RST between image and text

made up example: a page that has the following (invented) quote "never have the american people been freer, safer, happier" - J. Politicus and a photo of a homeless person.

This is an ironic juxtaposition. you can do it in a plain static graphic, there is nothing hyper about it. what's it saying? and how?

because it's a quote, and explicitly a quote, there's no inconsistency in belief. that is, the person J Politicus said (assuming the quote is accurately reported) X, and believes X, and the author (who did the juxtaposition) might believe not-X, but neither one is inconsistent.

suppose instead it had been just text: "J Politicus said quote never have the american people been freer, safer, happier unquote but there are in fact more homeless people in the usa than at any time since the civil war".

does something different happen because the layout follows, to some extent, the layout of a poster?

texts can lie. images can't. if images weren't either more informative or more believable we wouldn't use them.

suppose the poster is a juxtaposition of a photo of J Politicus smiling, and a quote "Homelessness rose in the USA in 1999", attributed to, say, some government agency. This reads as Politicus' reaction to the news, and we think of him as heartless, or at best ignorant. This isn't SAID though. But is it IMPLICATED?

does the text comment on the image, or the image on the text? both? How do you know? surely sometimes it's one, other times the other.

If this implicature were desired, it would be easy enough to generate a presentation conveying it. However, it would be much harder for an automatic process to avoid or detect and correct accidental occurrences of such implicature in presentations as they are generated.

rhet of film

In the movie 2001 there is a jump cut from a hurled bone to an orbiting space station. The cut matches the motion of the bone to the station. The jump cut tells the viewer "there is some relation here" but not what it is. Nothing tells you the relation between the two. The continuity of motion is the only clue that there is some relation, but it leaves out what the relation is. It poses a problem which is left to the viewer to solve. In fact I don't think there is anything in the movie itself that tells you. From extra-textual sources (in this case, the novelization) it happens to be a weapons platform.

Intercutting a sequence of unknown referent (time, space, extent) with another one of known location or extent is a way to convey same location, or extent (duration) or at least direction.

Suppose that you have a sequence of unrelated shots, all indoors so you can't tell where or when they occur.

Suppose you intercut this with another sequence of shots of the sun (e.g. sun up, morning, noon, afternoon, sunset) . The meaning is that the events of the sequence took place over a day. If you instead intercut it with a set of exterior shots of various locations in the city, you get another meaning.

The classic exterior/interior cut. is a degenerate case of this, I think.

Okay, so this means that any time we have a set of shots for which we want to convey the position (in some dimension), we can convey this by either

intercutting with shots that establish the position
putting a caption underneath the shot
putting a caption *into* the shot. This is an often-used device, too. The TV show X-Files is the current state of the art for this device.

In some cases, one can present snapshots of the sequences arranged in space in a way that also shows their position, e.g. overlaid onto a map of Amsterdam, with each shot placed in the location it occurs in.

This section discussed presentation devices for conveying information beyond that in the component media objects. Some of these devices contribute to the narrative structure of the presentation. Some of them can convey aspects of rhetorical structure.

implicature from layout

Implicature from Spatial Structure

suppose you did a query against an image database and the resulting display showed a big window with three thumbnails in one corner.

+----------------------------------+
| +--+   +---+  +---+              |
| |  |   |   |  |   |              |
| |  |   |   |  |   |              |
| +--+   +---+  +---+              |
|                                  |
|                                  |
|                                  |
|                                  |
|                                  |
|                                  |
+----------------------------------+

suppose instead the three thumbnails were shown zoomed up to fill all the available space

is there a difference
is it rhetorical
would presentational constraints affect the answer?

The uniformity of alignment implies sequentiality. In cultures using the Roman alphabet, order would be conveyed left-to-right, then top-to-bottom. In the figure above, after the alignment uniformity conveys a sequence, the empty space to the right and bottom implies that the displayed components are at the end of the sequence. If the images fill the display, the viewer may perceive that more images later in the sequence could appear in a subsequent display. These spatial structures can thus convey both a sequence and whether the displayed components end the sequence or the sequence continues.

Navigational structure can also be used to convey a sequence and whether the first or last components are shown in the current display. One common navigational technique is to use next and previous buttons that are ghosted out for the first and last displays. These navigational structures also control access of the sequence components in terms of the order imposed. A menubar providing access to any of the sequential displays by its number in the order is a navigational structure that further extends access based on rhetorical sequential structure. Temporal structure can be used as well. If each display is shown for a given period of time except the final display which is shown indefinitely, then the user can detect the final display from the temporal structure.
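
The navigational and temporal devices just described can be sketched as follows. Both function names (`nav_buttons`, `dwell_times`) are hypothetical, and the representation of an indefinitely shown display as `None` is an assumption made for this illustration.

```python
from typing import Dict, List, Optional


def nav_buttons(index: int, count: int) -> Dict[str, bool]:
    """Say which of the prev/next buttons are active for display `index`.

    Displays are numbered from 0. A value of False means the button is
    ghosted out, which signals the first or last display of the sequence.
    """
    if not 0 <= index < count:
        raise ValueError("index out of range")
    return {"prev": index > 0, "next": index < count - 1}


def dwell_times(count: int, seconds: float) -> List[Optional[float]]:
    """Give each display a fixed duration, except the last.

    None stands for "shown indefinitely", so the user can detect the
    final display from the temporal structure alone.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return [seconds] * (count - 1) + [None]
```

In a three-display sequence, the first display would have only "next" active and the last only "prev".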

We have discussed how these sequences can be conveyed with these hypermedia presentation structures in earlier work [HT00].

to me the first one says (or at least implies) "there are these three results and no more" and the bottom one does not. (The reasoning is something like: the top one clearly had room it could have used to show additional results. it did not show them so they are not there. this would be defeated by e.g. an icon of an animated hourglass indicating that the system is still working)

in terms of loose/strict the first one says that I know the cardinality (three) of the result set.

if two images are presented side by side with tops aligned

+----------------------+
| +--+   +---+         |
| |  |   |   |         |
| |  |   |   |         |
| +--+   +---+         |
|                      |
|                      |
|                      |
+----------------------+

does this convey any proposition aside from SEQUENCE? as opposed to the case where they are not aligned?

+----------------------+
| +--+                 |
| |  |   +---+         |
| |  |   |   |         |
| +--+   |   |         |
|        +---+         |
|                      |
|                      |
+----------------------+

Does it say they are comparable? Does the second one say anything about dominance (due to position, size, or quality (e.g. color depth, presence of animation))?

Lack of apparent spatial alignment can be used to convey a joint relation, which is a multi-nucleus relation in which there is no ordering of the components. The example we show is that of Amsterdam buildings built in the Louis XIV style -- they are all built in the same style, but their order is not significant. Temporal structure inevitably conveys a sense of ordering. Navigational structure can avoid conveying order by providing equal access to all components of a joint. Making the spatial structure conveying these choices random can convey a joint instead of a sequence relation. These techniques provide another possibility for how undesired implicature can occur. Perceived patterns in presentation structure that convey order can appear even when not intentionally constructed.
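
Because this kind of implicature depends on presentation structure alone, a generator could in principle test for it. A minimal sketch, assuming a hypothetical `Box` record for placed components and a pixel `tolerance` below which tops look aligned to a viewer; it checks only top alignment, as in the two-image figures above.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    """Placement of one component on the display (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def tops_aligned(boxes: Sequence[Box], tolerance: int = 0) -> bool:
    """Report whether all top edges fall within `tolerance` of each other.

    A True result for components in a joint relation would flag a layout
    that may accidentally read as a sequence.
    """
    if len(boxes) < 2:
        return False  # a single component cannot be aligned with anything
    tops = [b.y for b in boxes]
    return max(tops) - min(tops) <= tolerance
```

The coordinates used to call this are up to the layout engine; the sketch says nothing about how a viewer's perception of alignment should be calibrated.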

mirror symmetry is also considered to be the "same" layout for purposes of establishing parallelism.

Symmetry in the presentation of multiple document objects implies they are of the same type of nucleus-satellite relation [Reichenberger 1995]. It can be used as an intended device, and must be prevented as accidental implicature. This prevention can be ensured for the most part with presentation constraints because this type of implicature is conveyed with presentation structure alone and does not necessarily involve the semantics of the document objects involved.

When one has repeated material (page after page of these frames), is there some kind of "global nuclei" that can then be refined? I mean, when you first start to examine one frame in particular, and you have perhaps already seen many other frames, and you know that they are all about aspects of buildings, do you perhaps start with the endpoints of the relation loosely identified as "the photo" and "the text" and then as you read, refine the endpoints to more specific things such as "the columns flanking the door" (in the photo) and the text about those columns?

Another aspect of the repeating nature. What implications are warranted by the fact that the same layout is used in each frame, page after page? At the very least I think there are two:

all these things have something in common, they are of the same logical kind. They have a common schema
there is also a *set* of some kind introduced in the discourse. in this case it's the set of interesting buildings in amsterdam, or something. but this set is itself discussable as a thing, just as the buildings are. You can write something like "all the buildings we've seen up until now use only brick for construction, but after 1810, steel was also available and thus ..." and refer to the set.

Possibly there is also the implication that
there is an ordering to the set, that it is in fact a sequence.

Now I am approaching these questions from the standpoint of analysis, but I think once we get a good grip on them we'll be able to turn them around the other way and use them for generation/assembly as well.

at the very least it seems like one rule that follows from this might be

R1 if you want to present a set but the elements are of very different logical kinds then don't use the same layout for each one. rats: Reichenberger et al already said this!

You can imagine for example a "guidebook" that had six frames of, say, houses, and then in the seventh frame there's a close up of, for example, an Amsterdam "coffee" house (or a red-light district window) and the text discussed the features in the same kind of language as used for the house. This would I think be taken for a joke (and I bet this joke has been used before). So for sure, if deviating from a rule results in a presentation that's "funny" then you know it's a rule.

In the Amsterdam example, each page repeats the same pattern. As the user reads the book, he or she becomes familiar with the pattern and learns to associate the placement of an image or piece of text with a particular type of information, perhaps rhetorical.

Another pattern to which the user can become acclimated is taking similar semantics of repeated media objects as an indication that all media objects in that presentation group share those semantics. Once this happens, the next time the user sees a media object positioned in the presentation structure in the same way as the other media objects, the user may assume it has the same semantics, even if the user is not already aware that it does.

Furthermore, if the user perceives an order to these semantics, then he or she may assume the order is consistent even among media objects for which the semantics is not recognized. For example, if in a presented sequence of Amsterdam buildings the user recognized the dates of some of them and noticed they are in order, the user may assume all the buildings shown are sorted in chronological order.
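
A generator could check for this inference before presenting a sequence. The sketch below is an assumption-laden illustration: `suggests_chronology` is a hypothetical name, `None` stands for a building whose date the user is not expected to recognize, and the threshold of two recognized dates is arbitrary.

```python
from typing import Optional, Sequence


def suggests_chronology(years: Sequence[Optional[int]]) -> bool:
    """Report whether the recognizable dates appear in non-decreasing order.

    If they do, the user may assume the whole sequence is chronological,
    whether or not that was intended.
    """
    known = [y for y in years if y is not None]
    if len(known) < 2:
        return False  # too little evidence for the user to notice an order
    return all(a <= b for a, b in zip(known, known[1:]))
```

With invented years such as `[1620, None, 1680, None, 1750]` the check succeeds, so an unsorted building in one of the `None` slots would risk accidental implicature.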

These presentation devices are sources of both utility and of accidental implicature. They can be generated to convey certain rhetorical structures. They can also perhaps be avoided or detected and corrected when not desired.

In the EPG domain, all the objects in the database are the same logically, they are all movies. But imagine an EPG where some of the items were movies that would be shown once only, others were available as "movies on demand", and others were only available from the rental store. Users might be upset if all were presented in a way that made them seem to be the same.

Implicature from Temporal and Navigational Structure and from Content Selection and Filtering

rhet in existing hypermedia

semantic coherence (diegetic) is like pronominal coherence. e.g. in shot one a person throws a ball to the right, in shot two we just see the ball travelling in space to the right. There's some kind of indexing of space. the film maker's term is "axis of action", the line perpendicular to the camera

can we indicate a change of paragraph by showing scenes with different axes?

it would be very hard if not impossible to study rhetoric of text without understanding the semantics of the clauses. how then can we expect to do a rhetoric of film?

there's an implied connection between sequential shots. external shot followed by internal, means we're now in the building we saw first from the outside. interior shot of person at window, and exterior shot means we're seeing the view out the window. two shots, each showing a person talking, first one facing right the second facing left. they are talking.

is there a closed set of these?

is a photo an ELABORATION? a Background? is it intended to help R identify the building?

a text can refer to things you can't see in the photo. example from the amsterdam architecture text p 48 "Simple but well-proportioned spout-gable standing on a timber lower front. The secret church, also known as 'Onze Lieve Heer op Zolder' and dedicated to St. Nicolaas, is on the top floor".

sometimes the text discusses things in the photo but if you don't know the vocabulary (e.g. what's a pilaster) neither text nor photo tells you, although after seeing enough of these you can puzzle it out.

Is size a clue to NS? it can be. but it can be misleading too. suppose you have an animated video with picture in picture. the little embedded picture could be context for interpreting the larger (current) animation.

might be useful to distinguish these layers

domain semantics, e.g. relations such as (evidence-for A B) (proposition B is evidence for proposition A). (What happens with non-binary relations?) relations are not directional.
rhetoric/discourse. Assign NS structure, e.g. that B is the nucleus
rhetoric/communication. strategies for expressing discourse. example strategy: show S before N.
abstract constraint. (before B A). (show A before you show B), where BEFORE might mean spatially, temporally or navigationally
dimensional constraint (less-than (start-time A) (start-time B))
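
A rough sketch of how the five layers above might chain together, in Python. Everything named here is an assumption made for illustration: the class and function names, the tuple encoding where ("before", x, y) reads as "x is shown before y" (the notes do not fix an argument order), and the attribute chosen for each dimension. Only "start-time" comes from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainRelation:
    """Layer 1, domain semantics: an undirected relation between two items."""
    name: str
    first: str
    second: str


@dataclass(frozen=True)
class Nuclearity:
    """Layer 2, rhetoric/discourse: which item is nucleus, which is satellite."""
    relation: DomainRelation
    nucleus: str
    satellite: str


def assign_nuclearity(relation, nucleus):
    """Pick one member of the relation as nucleus; the other becomes the satellite."""
    if nucleus == relation.first:
        return Nuclearity(relation, nucleus, relation.second)
    if nucleus == relation.second:
        return Nuclearity(relation, nucleus, relation.first)
    raise ValueError(f"{nucleus!r} is not a member of {relation.name}")


def satellite_before_nucleus(ns):
    """Layer 3 strategy 'show S before N', emitted as a layer 4 abstract constraint.

    Assumed encoding: ("before", x, y) means x is shown before y.
    """
    return ("before", ns.satellite, ns.nucleus)


# Assumed attribute per presentation dimension (hypothetical names,
# apart from "start-time").
DIMENSION_ATTRIBUTE = {
    "time": "start-time",
    "space": "x-position",
    "navigation": "link-depth",
}


def to_dimensional(constraint, dimension):
    """Layer 5: bind an abstract 'before' constraint to one concrete dimension."""
    kind, earlier, later = constraint
    if kind != "before":
        raise ValueError(f"unsupported constraint: {kind}")
    attribute = DIMENSION_ATTRIBUTE[dimension]
    return ("less-than", (attribute, earlier), (attribute, later))
```

For example, with B as nucleus of an evidence-for relation between A and B, the strategy yields ("before", "A", "B"), and binding it to time gives ("less-than", ("start-time", "A"), ("start-time", "B")). The same abstract constraint could instead be bound to space or navigation, which is the point of keeping layers 4 and 5 apart.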

strategies: when reader has limited time (or processing ability, background knowledge) show N before S. Or maybe omit S. if user lacks the warrant that would allow him to understand how it is that S supports N, why bother presenting S? This is how executive summaries work.
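
The strategy choice above could be written down as a small decision function. This is only a sketch: the function name, the two boolean flags, and the order in which the rules are checked are assumptions, not something settled in these notes.

```python
def plan_presentation(nucleus, satellite, limited_time=False, has_warrant=True):
    """Return the items to present, in presentation order.

    Assumed rules, following the notes:
    - no warrant: the reader cannot see how S supports N, so omit S
    - limited time: show N first, so the main point survives if the reader stops
    - otherwise: use the example strategy 'show S before N'
    """
    if not has_warrant:
        return [nucleus]
    if limited_time:
        return [nucleus, satellite]
    return [satellite, nucleus]
```

An executive summary would correspond to calling this with limited_time=True for every nucleus-satellite pair, or with has_warrant=False where the supporting detail would not be understood anyway.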

how does pacing/rhythm affect rhetoric

are RST relations between texts or propositions? texts, I think. while you might concede that text conveys propositions, it's harder to say what propositions an image or video conveys. for one thing it's not verbal, and I have a hard time thinking of propositions that are not verbal.

does mere juxtaposition convey any proposition at all? how can you tell accidental juxtaposition from deliberate?

does a picture at least convey that "such and such a thing existed"? Does it perhaps convey a very very large number of propositions, most of them too detailed to matter? Are images like seeing or being in the world? In one quick glance around the room I take in hundreds or thousands of propositions about spatial relations, textures, consistencies. (e.g. book1 is next to book2)

consider focus pull in cinema. does this change nuclearity?

shot transition and conveying grouping (e.g. in film, a "dissolve" might signify that one is crossing a larger grouping boundary than a cut does.)

RST is about effects, but semantics is about truth.

The absence of an effect is itself noticeable and hence accountable-for.

We have had centuries to evolve cultural traditions for RS in text, but very little time for RS in HM. so it might not be surprising if there are no RS specific to HM. besides if RS deals with the underlying logic and it is extra (or supra-) linguistic (as Grosz and Sidner have it) then there's no reason to expect it to change in HM. can there be effects in HM not achievable in text? the means are different, but are the ends?

suppose all you know about two pieces is that there's a non-null RR between them. Does this generate any constraint at all, or must you always know more?

RST roles of an image

what roles could a picture P be in? Excluding icons, pictures don't convey requests?

relation	role
antithesis	?
background	both
cause	N
circumstance	both
concession	?
condition	both. consider picture of soldier. "he fights for you"
contrast	only if both are P
elaboration	S. pictures are specific; it's hard to see how one could be N to a text S
enable	S yes, N no: P can't be an action/request
evaluation	N. text: what an ugly picture
evidence	S
interpretation	N. text: this picture was shot in 1901
justify	S, N
motivation	S (advertising is good example)
otherwise	S. text: "if you drink and drive". Image: car wreck
purpose	hard to see how P could be a purpose
restatement	can a P restate a T? and how could you compare sizes?
result	S
solutionhood	both
summary	one P can summarize another
sequence
joint

a photo can be EVIDENCE for a proposition. (but a photo, being a specific instance, can only be an existence proof: it can falsify a negative but never establish a universal). It can be an ELABORATION ("The canal boats are painted a dreary shade of green"). it can show the RESULT of an action. It could be a RESTATEMENT (in an alternate medium) except the criterion of equality of size is hard to justify.

the whole picture can be a span and so can smaller sections

can there be RST within a picture? why not, if you allow for multiple spans. is it one picture or a composite/montage? for sure in video, a sequence of shots.

we need to be clear on the diff between propositions and RR between propositions.

what are the spans in HM. What are the nuclei?

RST of HM

7 feb 2000: simplest possible case one text, one picture. what RST is possible. can a picture be N, can it be S.

from RST analysis perspective, the relative opacity of image to automated analysis is no problem because RST analysis is done by humans anyway

consider advertisements. (Mann and Thompson analyzed them, why shouldn't we?) A problem is that some or much of the advertisement is semantically or pragmatically very tricky. eg the image is for shock value. How seriously can we take the "covert messages" of advertisements, e.g. sexual appeal? will analysts agree on the text and sub-text?

what about sound? well it's hard to see how music can even refer. One possible exception is the use of motif in Wagner. Some music sets a tone, as a means towards influencing the reader's attitude. The amsterdam demo has light cheerful amsterdam song background. compare with the often heard brisk efficient light modern funk type sounds. What about music that borrows or reworks well known material, e.g. muzak versions of Beatles.

motivated environmental sound seems to be like pictures. The image shows you what a thing looks like and the soundtrack what sound it makes.

other

Hypermedia communicative devices include

having two objects be adjacent in space, time or navigation - or having a set of objects appear sequentially in space, time or navigation
having a group of objects appear on the screen simultaneously
having a visual object appear when a spoken word or phrase is said
or having a set of objects appear grouped together, as in the case of multiple screen displays

Type 1 acts are used in Fiets along all three hypermedia presentation dimensions. Jim observed an act of type 2 being implied in the EPG screen dump used in the HT00 submission. In this screen dump, each movie was adult oriented -- no, not kinky, but with violence or other mature themes. If the next screen display was all Disney movies, the perception of a type 2 act would be reinforced, even if it was not intended.

Maja's Fiets time-to-time demo and WWW9 SMIL demo are good examples of type 3 communicative acts. Fiets displays keywords at different locations timed with the mentioning of those topics in the main audio speech. The WWW9 demo displays images related to spoken phrases when they are uttered in the main audio speech.

There have been publications specifically about communicative acts in hypermedia, right? Much of the SRM-IMMPSs stuff, for example. Can we get a list of acts from these? Would we be able to contribute something novel beyond these, such as how they apply to conveying rhetorics?

IMPLICATURE IN HYPERMEDIA

As stated in an earlier email, one type of implicature occurs when a communicative act is used that is not explicitly explained. If the user perceives a communicative act, a meaning for it will be sought. If the meaning is not clearly stated, the user may assume the author is flouting the quantity maxim. Why did the author want these to be associated but also not want to say why?

This problem is magnified with automatic generation because it is harder for computers than for humans to detect unintended communicative acts. But if some communicative acts for hypermedia are quantified, then the computer can at least search for all of those in the rendered presentation and make sure a communication for each was indeed intended. What the computer can't do so well is detect additional unintended communications for each act. This type of processing is probably AI-complete.

One example of this is if alphabetical order is used and incidentally groups items in categories beyond the alphabet. Jim gave the example of the first screen displays of a tour of Amsterdam presentation being all coffeeshops. This could be because "coffeeshop" comes before "museum" and "restaurant" in alphabetical order.
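
A minimal sketch of the "search the rendered presentation" idea, applied to the coffeeshop example. It assumes each item is a (name, category) pair, that items are sorted by name and cut into fixed-size screens, and that a screen whose items all share one category counts as a perceived grouping act. The function names and the sample data are made up for illustration.

```python
def paginate(items, page_size):
    """Cut an ordered list of items into consecutive screens of page_size items."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


def accidental_groupings(items, page_size):
    """Find screens that look like a deliberate grouping by category.

    items is a list of (name, category) pairs. They are ordered
    alphabetically by name, as the generator in the example would do.
    Returns (screen_index, category) for every screen with more than one
    item where all items share a single category.
    """
    ordered = sorted(items, key=lambda item: item[0].lower())
    flagged = []
    for index, screen in enumerate(paginate(ordered, page_size)):
        categories = {category for _, category in screen}
        if len(screen) > 1 and len(categories) == 1:
            flagged.append((index, categories.pop()))
    return flagged


# Hypothetical tour items; the names are placeholders, not real venues.
tour_items = [
    ("museum Gamma", "museum"),
    ("coffeeshop Beta", "coffeeshop"),
    ("restaurant Delta", "restaurant"),
    ("coffeeshop Alpha", "coffeeshop"),
]
```

With two items per screen, accidental_groupings(tour_items, 2) flags the first screen as an all-coffeeshop group. The generator can then check whether a grouping by category was actually intended there. What this cannot do is notice groupings along properties that are not in the metadata, which is the hard part mentioned above.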

COMMUNICATIVE ACTS IN HYPERMEDIA VS TEXT

Hypermedia communicative acts inherit communicative acts from text, of course. The communicative acts that appear in the text of a hypermedia presentation would also be communicative acts of that hypermedia presentation. This applies both to visual displayed text and audio speech in the hypermedia presentation.

As has been said, there is probably no difference between text and hypermedia in the Mann & Thompson rhetorics. (Or is there?) What is different, and well worth researching and publishing about, are the communicative acts and their use. Some of the communicative acts in hypermedia are different than for text, such as those in the dashes above.

Perhaps it is also true that the communicative acts that are the same between text and hypermedia are sometimes used differently in hypermedia. All the acts that are the same would appear in text: either the text that makes up the entirety of a text document, or the text that is integrated into a hypermedia presentation. A communicative act in the text of an integrated hypermedia presentation can be altered if the presentation of that act is associated with a document object in another part of the presentation using a hypermedia communicative act, such as those described above.

COMMUNICATIVE ACTS IN HYPERMEDIA VS OTHER PRESENTATION FORMS

Here is a list of non-hypermedia presentation forms roughly in increasing order of complexity

text
speech
text publication layout
comics
graphics
immersive extended spatial display (VR, data viz)
video/film

Hypermedia could be considered the integration of all of these. It would thus inherit communicative acts from all of them, and how they are used for conveying rhetorics. This integration would also bring about new types of communicative acts, and new ways of conveying rhetorics.

Text

Text is the linear progression of written words, perhaps with some degree of formatting. An example is the letter in the T&M leesklub paper. Communicative acts for text, and how these convey rhetorics, have been extensively researched and are well established. Perhaps there has been work on (semi-)automatically generating text from rhetorics as well.

Speech

Speech inherits acts from text, for the most part. It adds voice tone and pausing as communicative acts. Has there been work on how these apply to rhetorics? I believe Jim has mentioned that there has been. Speech doesn't inherit formatting from text, of course, but there are probably aural equivalents of all typical text formatting. Work has been done on generating speech from electronic text and meta-data, particularly for the sight-impaired. What's unique about how speech applies to hypermedia is that speech is continuous and temporal. How temporal occurrences in other aspects of the integrated hypermedia are synchronized with components of the continuous speech makes up communicative acts. Automatic generation could generate speech and synchronize its components with the overall hypermedia presentation.

Text publication layout

Text publication layout is how text and images appear in more complex formatting -- for example, that of a typical magazine publication. This includes references to figures, and the inclusion of sidebars, for nucleus-satellite relations. Where these figures and sidebars appear, and how the text is broken up between pages, can be communicative acts. Thomas Kamps has done work on this, even in the context of the SRM-IMMPSs, I believe. Perhaps this discussion is in terms of communicative acts. Has there been work on conveying rhetorics with layout?

Text publication layout applies well to hypermedia presentation layout. Hypermedia layout has some additional things, and is typically used differently, however. One example is that text publication layout is based on the flow of the integrated text along columns, whereas hypermedia layout is based more on images and captions. Wolfgang [Klaas?] has published on hypermedia presentation layout in the context of the SRM-IMMPSs.

Comics

When first mentioning comics, one has to excuse oneself for doing so. Comics have a bad rap, but wrongly so. Comics have been very influential in layout and graphics in general. There has been serious research regarding their use for, among other things, instruction. Jim has mentioned a Scott McCloud publication on the subject. I am emailing a friend of mine from the comics world who has mentioned seminal and influential comics research done decades ago. While much work has been done on communicative acts in comics, I don't think any has been done on rhetorics. In terms of authoring and generation, comics contribute some to spatial layout and to the design of individual graphic components.

Comics have a left-right, top-down flow of content arranged into boxes. The EPG and space-oriented displays of Fiets inherit from this. Sometimes these boxes in comics are of various sizes. There are rules and guidelines for the laying out of these boxes.

A publication in MM99 discussed the use of these rules for laying out keyframes of video, and how the different types of keyframes were communicated using particular patterns of this layout. This was explicitly comics-based. The system is called Manga, which is the Japanese word for comics.

Graphics

Much work has been done on graphic design and what could be called communicative acts with graphics. I don't think much, if any, has been done regarding rhetorics. Graphics can, of course, be generated. Thomas Kamps has done work with this.

Immersive Extended Spatial Display

Anton Eliën and Robert van Liere are both doing research in using this type of presentation to convey concepts. This conveyance of concepts could be called communicative acts. Robert's work is mostly in data visualization. The type of information conveyed is typically that of patterns within very large data collections, or complex 3D phenomena. This does not lend itself to the types of document information (hyperlink relations) we are interested in.

However, Robert is familiar with the related, but still fledgling, field of information visualization. This uses data visualization techniques to convey information typical of documents. One example is visualizing large scale document structure, such as a navigational chart of a Website. One could extrapolate this to conveying the overall rhetorical structure of a large document. But even so, this is probably of more use to authors than to users. Perhaps there is work in information visualization that presents issues more readily applicable to conveying rhetorics to users.

Anton is building a lab for extended spatial presentation (I just made these words up, they are not his). He is interested in not just recreating real-world walkthroughs, but in useful means of conveying document concepts that are not necessarily based on physical reality. Are there rhetorical techniques here?

Video and Film

We've already discussed this plenty, and have plenty more discussion and research coming. The most important thing introduced with film is the use of time. Speech has time, but film is also visual. Film could be everything hypermedia is, but without navigation. The generation issue with video is that we are not generating video. A system could, however, generate animations along the same principles.

strawman

The paper could provide a quick overview of rhetorics ala T&M, Grosz & Sidner, and Grice. T&M would provide the actual rhetorics, while the others convey the importance of having each act convey exactly the intended rhetorics. The T&M rhetorics would be used as structured and processed metadata in a running example in the paper.

The paper's main thrust would be presenting hypermedia communicative devices for conveying these rhetorics. (Notice the use of the conditional tense here. It is implicating that this is not decided, it is a strawman. Can conditional tense be used in generated text, along with the typical textual and verbal cues that make up communicative acts?) An overview of SRM-IMMPSs and other published multimedia acts can be given. Then we can discuss the use of these for conveying rhetorics, which we currently assume to be a novel contribution. Also given would be the communicative acts used in the other presentation forms. Any work on the use of each of these for conveying rhetorics in particular would be given. How each of these is inherited by and adapted for rhetorics conveyed with hypermedia would be discussed.

The important underlying and motivating themes are based on how important it is to understand the intended rhetoric structure and communicate it correctly. What applies to authoring also applies to generation, and then also to adaptation. Being able to author rhetorics for generation, and having systems capable of correctly processing these rhetorics into presentations, would speed up the authoring and distribution process. Automatic generation enables automatic adaptation. No matter how a hypermedia presentation is adapted, the rhetorics must remain correctly conveyed. The question remains how much we want to go beyond authoring and into generation and adaptation. Just discussing these topics as advice to human authors may be enough, and may later be easily extendible into generation and adaptation.

A running example is important. We could have a Fiets demo that uses a substantial selection (if not all) of the hypermedia communicative acts. It should also convey all of the M&T rhetorics. Perhaps it should even illustrate Grosz & Sidner and Grice issues. We could also throw in some engineering by Joost. We could use the Houdini constraint handler to illustrate widely varying adaptations of hypermedia presentations that still maintain the intended rhetorics. More simply, the running

And perhaps the paper could be called "Conveying Rhetorics with Hypermedia Communicative Acts".

links back

There's a distinction between conveying nucleus-satellite relations and multi-nuclear relations. One must be able to distinguish between a nucleus-satellite relation and a two-member multi-nuclear relation. How? If one is dominant, then it is the nucleus in a nucleus-satellite relation. If both have equal emphasis, then they are in a two-member multi-nuclear relation.

What sort of communicative acts make one dominant? That it is bigger in space? That it takes up more time? That it comes first in some sort of conveyed sequence, be it spatially, temporally or navigationally conveyed? The Manga publication in MM99 discussed some spatial means for doing this based on comics.

Presentation must also distinguish between the different multi-nuclear relations. We have discussed (and hopefully published) sequence, and how to convey it. But what if a set of objects share a joint relation (which I assume to mean they are in a group in which order, unlike in sequence, does *not* matter)? What communicative acts show that order among a group of objects does matter, and what acts show order does not matter? These are communicative acts that distinguish sequences from joints.

The Fiets time-to-space and space-to-space presentations show sequences spatially. So do comic strips. Spatial structure is aligned so that there are clear rows, and these rows are stacked on top of each other. In comics, the fact that items in a row all have their tops and bottoms aligned enforces this left-right linearity. The fact that such aligned rows are atop one another with aligned left and right boundaries enforces a top-down direction. Together this spatially conveys a sequence.

Perhaps when conveying a joint, you want to avoid spatial patterns that communicate sequentiality. Perhaps the way to communicate a joint is to have the objects visually distributed in the display at random, so that no row-column order is perceived.
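
The contrast between the two spatial layouts can be sketched as follows. The assumptions are mine: positions are top-left (x, y) coordinates, the comic-strip layout uses fixed-size cells, the joint layout is plain uniform scatter that ignores overlaps, and "row-like" is approximated by two items sharing a y coordinate. All function names are hypothetical.

```python
import random


def layout_sequence(count, columns, cell_width=100, cell_height=80):
    """Comic-strip layout: left-right within a row, rows stacked top-down."""
    return [
        ((index % columns) * cell_width, (index // columns) * cell_height)
        for index in range(count)
    ]


def layout_joint(count, width, height, seed=None):
    """Scatter layout for a joint: random positions inside the display.

    Overlap between items is not handled in this sketch.
    """
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(count)]


def has_aligned_rows(positions, tolerance=0.0):
    """Crude test for a perceivable row: do any two items share a y position?"""
    ys = sorted(y for _, y in positions)
    return any(later - earlier <= tolerance for earlier, later in zip(ys, ys[1:]))
```

A generator for joints could call layout_joint repeatedly and reject any result for which has_aligned_rows (with some tolerance) is true, since even a random scatter can land items in an accidental row. The same check would be needed for columns.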

Time is intrinsically a sequence -- a timeline inevitably communicates sequentiality. Perhaps there is no way to avoid having time convey a sequence. Perhaps the heuristic here is to avoid putting members of a joint in a temporal sequence -- they must be displayed simultaneously or accessed by link traversals whose starting points do not convey sequentiality.

A cascade of link traversals also conveys a sequence. A certain sequence of links can be made traversable only sequentially by having each link be only traversable from the node of its predecessor. This is a clear way of denoting a sequence with link structure.

Alternatively, there can be a central node with links to each sequence member. In this case, each node does not have to have a link to the next member, but can instead have a "back" link to the central node. Or each node could have both links. Whatever type of links each member node has, the central node should have the presentation of the links to all members also convey sequentiality, and avoid conveying jointness. This can be done by having the temporal structure of the central node make the links available one at a time along a timeline to convey the sequence. The sequence of links in the central node can also be conveyed spatially with the comic strip means described above. An example of this would be a menu bar along the bottom of the screen, as used in Fiets space-to-link mapping. This relates to the discussion in the HT00 submission in the section on conveying strict vs. loose sequences.

You want to avoid a strictly navigationally sequential link structure when conveying a joint. If each node has a link to another member node, you want each to have links to all other member nodes. The starting points for these traversals in a single node do not, of course, convey the details of the members of the joint, but their presentation may still convey joint or sequence. Since these starting points are in a single node, they are not distinguished navigationally. Since they convey a joint, we do not want to separate them temporally. Thus, the relation between starting points must be conveyed spatially. And thus, we want to avoid row-column or any visually perceivable order in their spatial layout -- these starting points should be distributed randomly in each node. If there is a central node for the joint, it should display the links spatially randomly as well.
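
The link structures from the last few paragraphs can be sketched as plain adjacency maps. This is an illustration under assumptions: nodes are unique strings, a link structure is a dict from node to the list of nodes reachable in one traversal, and the function names are invented here.

```python
def strict_sequence_links(members):
    """Chain: each member links only to its successor."""
    links = {member: [] for member in members}
    for current, following in zip(members, members[1:]):
        links[current].append(following)
    return links


def joint_links(members):
    """Joint: every member links to all other members, so no order is implied."""
    return {
        member: [other for other in members if other != member]
        for member in members
    }


def hub_links(hub, members):
    """Central node linking to every member, each with a 'back' link to the hub."""
    links = {hub: list(members)}
    for member in members:
        links[member] = [hub]
    return links


def is_strict_sequence(links, members):
    """True if the only traversal from each member leads to its successor."""
    for position, member in enumerate(members):
        expected = members[position + 1:position + 2]
        if links.get(member, []) != expected:
            return False
    return True
```

Note that hub_links is neutral between sequence and joint: the navigation structure is the same in both cases, so the difference has to be carried by how the hub presents its links, temporally or spatially, as discussed above.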

Can we have a similar exploration of the contrast multi-nuclear relation? I don't think there are structural means of conveying contrasts. We would probably have to use language or audio/visual symbols (icon and/or earcon = xcon? ;). I believe the same is true for the distinction between the different nucleus-satellite relations.

But to babble on, what is meant by the "lack of structural means"? We can have "Chomsky-esque" innately and universally human means that are in our genetic code and independent of education or convention. Temporal and linking sequence probably qualify. Jointedness conveyed with spatial randomness may as well.

But spatially conveyed sequence, on the other hand, must be learned. The left-right top-down ordering is Indo-European. Script of Semitic languages has a spatial ordering of right-left top-down. Isn't Chinese and/or Japanese iconography top-down left-right? I'd argue that we've learned spatial ordering from comics as well (much as some of us may want to deny having learned anything from comics ;). The fact that Japanese comics are left-right top-down while the writing is (I think) top-down left-right is evidence of this.

There are conventions for communication in media that go beyond language and have been made only recently, certainly only in this last century. Many of these are or will be discussed regarding the non-hypermedia presentation forms. To what degree do these use Chomsky-isms and what aspects of them are only established convention? Young Martin's understanding of music ending a video program is learned. Much graphic design uses adjacency, which is probably inherent, though if so only independently of direction (or is top-down inherent?). I think graphic designers have cleverly used inherent perception as much as possible. Certainly data visualization relies heavily on natural, inherent perception.

The point is (in case you were wondering) that "lack of structural means" means that there is certainly no inherent and probably no established way of communicating the desired concept. Which means we either need to use an established xcon, a non-structural, specific media object (for example, the word "after", either written or spoken, or an arrow), or we need to introduce a new xcon, or we need to introduce a new structural convention.

What is involved in introducing a new xcon or structural convention to convey a rhetoric? Maxim One: It should be based as much as possible on inherent perception and/or on established convention. Maxim Two: it should be used consistently in this presentation or series of presentations. These maxims must be established somewhere, certainly at least in text publishing and layout. If you are encoding a style, such as BBC or MTV, such convention should be used consistently for all presentations within that style.

And now for something completely different, though still on the topic of conveying rhetorics with hypermedia. We could take each other presentation form, and see what extensions follow when adding one hypermedia aspect to it at a time. What happens when you add time to comics? Or animation in particular to comics? Jim and I saw yesterday an example on the Web of adding interaction to comics. What happens when you animate graphic design? This has probably already been explored. Certainly adding interaction to text has been explored, and probably in the area of rhetorics. Maja can tell us about adding interaction to film. And what does this mean in terms of rhetorics?

References

Mann, Matthiessen and Thompson
Rhetorical Structure Theory and Text Analysis
USC ISI Research Report 89-242, Nov 1989

William C. Mann and Sandra A. Thompson
Rhetorical Structure Theory: A Theory of Text Organization
ISI/RS-87-190, June 1987