14-5-2003

These are notes written during the writing of the ISWC2003 paper. They
mainly focus on 'my' part, which was: given a Structured Progression,
what are the issues when generating a presentation (hfo tree)?

joost

 In a traditional document engineering task an author marks up the
 content of a document. By doing so, the semantic function of the
 content is made explicit. This can then be used by a style sheet to
 transform the document into a final-form format. An example is this
 paper, which was marked up in \latex, defining sections, subsections,
 bibliography etc. The marked-up document can (in several
 transformation steps) be turned into various output formats for
 different platforms, such as PostScript for paper and HTML for
 hypertext. Making semantics explicit (through document structure)
 therefore allows us to abstract from the final format of the
 document. Although \latex defines a rich environment in which
 semantic functions can be specified (list, emph), there is a point
 after which content is considered to be atomic: figures, for example,
 are black boxes of which \latex knows nothing except their dimensions
 (which are specified by the author). A figure can thus not be
 reformatted in case it doesn't fit on a page. To some extent the same
 holds for plain-text paragraphs, in which the relations between
 words\footnote{\latex does know about hyphenation of words.} and
 sentences are not made explicit. Rephrasing of long sentences, for
 example, is not possible. In general this is not really a problem,
 however, because in text the atomic units are small and adaptive
 enough to fit the layout requirements.

 In contrast, atomic units in multimedia are typically large (images,
 video, audio) compared to text. As a consequence document structure
 for multimedia is not as fine grained as its textual
 equivalent. Therefore conveying semantics in a multimedia document
 is, just as with the figure in \latex, the responsibility of the
 author, who understands the semantics of the atomic unit.

 Relationships between media items can be conveyed by design
 constructs such as alignment or similarly colored
 backgrounds. Grouping two items changes the message of the
 individual media items: for example, placing two equally sized images
 of Rembrandt and Saskia next to each other suggests a relationship
 between them. Grouping therefore requires knowledge about the content
 of the media items and, in addition, design structures which
 convey the relation between the media. Besides knowledge about the
 semantics of the media, their physical properties are also
 important: a text item and an image item are grouped differently than
 two images. Finally, an author has to make sure the document
 physically fits the output medium. This sometimes involves a complete
 restructuring of the presentation, including (re)selection of the
 content representing concepts.
 
 Concepts are represented by media items. The choice of which media
 item represents a concept best is far from trivial, even if we
 assume media items are annotated with domain concepts directly. The
 choice of a representing media item is influenced by platform
 constraints (no video on mobile phones), style (no happy colors in an
 obituary), document structure (no animation in a letter) and user
 profile (no Scientific American pictures for a 7-year-old). To some
 extent media items can be converted or transformed into media items
 which do not violate the constraints. Examples include
 conversion from text to audio in the case of small-screen devices, or
 displaying several key-frames of a video in a paper document.
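
 The selection step described above can be sketched as a filter with a
 conversion fallback. This is only an illustration: the names
 MediaItem, Context, CONVERSIONS and select_media are hypothetical, and
 all constraints (platform, style, document structure, user profile)
 are collapsed into a single set of forbidden media types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class MediaItem:
    concept: str      # domain concept the item is annotated with
    media_type: str   # e.g. "video", "image", "text", "audio"


@dataclass
class Context:
    # Media types ruled out by platform, style, document structure or user profile.
    forbidden_types: set = field(default_factory=set)


# Hypothetical conversions: source type -> type it can be transformed into
# (video -> key-frames, text -> audio, as in the examples above).
CONVERSIONS = {"video": "image", "text": "audio"}


def select_media(concept: str, items: list, ctx: Context) -> Optional[MediaItem]:
    """Return an item for `concept` that violates no constraint, converting if needed."""
    candidates = [m for m in items if m.concept == concept]
    # First pass: an item that is acceptable as it is.
    for item in candidates:
        if item.media_type not in ctx.forbidden_types:
            return item
    # Second pass: an item that becomes acceptable after conversion.
    for item in candidates:
        target = CONVERSIONS.get(item.media_type)
        if target is not None and target not in ctx.forbidden_types:
            return MediaItem(item.concept, target)
    return None
```

 With only a video of Rembrandt available and a paper context that
 forbids video and audio, the sketch falls back to an image
 (key-frames); for a concept without any media item it returns None.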

 When automatically generating presentations while abstracting over
 the final medium the presentation is played/viewed in, traditional
 document engineering prerequisites, such as the content of the
 presentation and the document structure, are unknown a
 priori. Still, the informative content used to produce, for example,
 both a multimedia presentation and a paper document about Rembrandt
 is very much alike. The structure of this shared content is called
 Structured Progression (SP); it defines grouping, ordering and
 priorities of narrative unit instances. It describes the main message
 an author wants to convey, and relates and organizes narrative units
 contributing relevant information to this message. In sum, SP
 structures the information content of the presentation while
 abstracting over document structure and media content, which are
 means to convey the message.
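
 A minimal sketch of such a structure, assuming an SP can be modelled
 as a tree of groups over narrative units. The class and function
 names are hypothetical, and the integer priorities are an assumption
 about how priorities might be encoded. Note that nothing in it refers
 to media items or to document structure.

```python
from dataclasses import dataclass, field


@dataclass
class NarrativeUnit:
    concept: str       # what the unit is about
    priority: int = 0  # higher means more important to the main message


@dataclass
class SPGroup:
    label: str
    children: list = field(default_factory=list)  # units or groups, in narrative order
    priority: int = 0


def units_in_order(node):
    """Yield narrative units depth-first, preserving the order given by the SP."""
    if isinstance(node, NarrativeUnit):
        yield node
    else:
        for child in node.children:
            yield from units_in_order(child)


def prune(node, min_priority):
    """Return a copy of the SP without nodes below `min_priority` (None if dropped)."""
    if node.priority < min_priority:
        return None
    if isinstance(node, NarrativeUnit):
        return node
    kept = [p for p in (prune(c, min_priority) for c in node.children) if p is not None]
    return SPGroup(node.label, kept, node.priority)
```

 Pruning by priority is one way the same SP could serve both a large
 and a small output medium before any media or document structure is
 chosen.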

 [[figure:

	bio	->	SP	-> DS.  (latex)	-> TeX	-> foe	-> paper
				! DS			-> fo

	bio	->	SP				-> hfo	-> mm
 ]]
 
 A document structure allows separation of presentation and
 content. However, if the atomic units of a document are too large,
 such as media items in a multimedia presentation, this separation is
 not realistic anymore. An automatic formatter has to know some of the
 semantic content of an atomic unit in order to present it
 properly. This requires domain knowledge about the media item (what
 is being represented) and domain knowledge about how relations are
 being represented. It also includes preventing the inference of
 unintended relationships, which can occur when media items are
 presented together. There exists a three-way dependency between the
 choice of media, the way they are grouped in order to convey
 semantics, and the document structure of the presentation. In
 addition, all of these are externally influenced by content,
 platform, user, and style.

[[ figure:

			[content, platform, user, style]
				   |
				 media

                               /        \
		
		document structure  -	grouping
			|		   |
[content, platform, user, style]	[content, platform, user, style]


]]

 [[examples

  media:
	content		media represents content
	platform	no video on paper
	user		medical images for a doctor 
	style		bright colors for children

  document structure:
	content		no PowerPoint love letter
	platform        no temporal dimension on paper
	user		interactive multimedia for children
	style		MTV style -> multimedia
			news	 -> paper

  grouping:
	content		align media to convey relation	
	platform	no slide-show on paper
	user		
	style		
 ]]

 
 These choices are not orthogonal. For example, a choice of document
 structure influences the choice of media and media groupings, and
 vice versa. In addition to the interdependencies between the
 categories mentioned above, each of them individually depends on
 \emph{content (semantics)}, \emph{user profile}, \emph{platform
 profile} and \emph{style} (see~\cite{ins:smartstyle} for more detail
 and examples).
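
 The non-orthogonality can be made concrete by enumerating
 combinations of the three choices and rejecting the ones that break a
 rule. This is a sketch under stated assumptions: the value lists are
 illustrative, the paper rules are taken from the examples in these
 notes, and the rule that a slide-show needs a temporal document
 structure is invented here only to show an interdependency.

```python
from itertools import product

MEDIA = ("video", "image", "text")
GROUPINGS = ("slide-show", "side-by-side")
STRUCTURES = ("temporal", "static")


def allowed(media, grouping, structure, platform):
    """Check one combination of choices against the platform and against each other."""
    if platform == "paper":
        # External influence: no video, no slide-show, no temporal dimension on paper.
        if media == "video" or grouping == "slide-show" or structure == "temporal":
            return False
    # Interdependency (assumed for illustration): a slide-show needs a temporal structure.
    if grouping == "slide-show" and structure != "temporal":
        return False
    return True


def valid_combinations(platform):
    """Enumerate every (media, grouping, structure) triple that passes all rules."""
    return [
        combo
        for combo in product(MEDIA, GROUPINGS, STRUCTURES)
        if allowed(*combo, platform)
    ]
```

 Brute-force enumeration only works for toy value sets like these; it
 is meant to show that fixing one choice shrinks the options for the
 other two, not to suggest a search strategy.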


Observations:

 * Is this why paper documents have more layers of document structure
 than multimedia presentations? Because there is no option of media
 grouping in textual documents?

 * Overflow works for groupings based on document structure while
 media groupings need fallible fo's.

 Semantics seem to be less important in document structure groupings
 than they are in media groupings. Two paragraphs/sections/scenes can
 be next to each other without the need to have deep knowledge about
 their content. Media groupings, in contrast, need to know exactly
 what the relationships are.
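
 The observation about overflow versus fallible fo's can be sketched
 as follows. Text that does not fit simply overflows to the next page,
 but a media grouping either fits or it does not, so the formatting
 object has to be able to fail and let an alternative be tried. The
 two layouts and all names below are hypothetical, not the actual hfo
 vocabulary.

```python
class LayoutFailure(Exception):
    """Raised when a formatting object cannot fit its children."""


def side_by_side(widths, heights, max_w, max_h):
    # Items next to each other: widths add up, the tallest item sets the height.
    w, h = sum(widths), max(heights)
    if w > max_w or h > max_h:
        raise LayoutFailure("side_by_side does not fit")
    return ("side_by_side", w, h)


def stacked(widths, heights, max_w, max_h):
    # Items above each other: heights add up, the widest item sets the width.
    w, h = max(widths), sum(heights)
    if w > max_w or h > max_h:
        raise LayoutFailure("stacked does not fit")
    return ("stacked", w, h)


def layout_group(widths, heights, max_w, max_h, alternatives=(side_by_side, stacked)):
    """Try each alternative in order of preference; fail only if all of them fail."""
    for alternative in alternatives:
        try:
            return alternative(widths, heights, max_w, max_h)
        except LayoutFailure:
            continue
    raise LayoutFailure("no alternative fits; regroup or reselect content")
```

 When every alternative fails, the failure propagates upwards, which
 is where regrouping or reselection of media would have to take over.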

  ]] 

 
 [[problem: a lot of dependencies, and a need to provide general
 methods (rules) for transforming SP into a presentation ]]


 [[1: media: presentation rules which use domain and design ontological
 knowledge to present concepts using atomic media items. -- mapping
 from concepts to media is done here, genre-specific media (picture in
 bio) is retrieved, uses media type ontology, adds redundant media
 such as voice-overs and captions for figures, needs knowledge about
 role of media]]

 [[2: mediagroup: presentation rules which use domain and design
 knowledge to present groups of media items. includes rules to combine
 different types (e.g. text-image, image-image). uses hfo ontology;
 needs different types of hfo then. probably needs domain knowledge
 too, and genre. Argh! this makes the rules very specific, which we do
 not really want.]]

 [[3: document structure (DS): rules which map SP-levels to DS-levels
 (chapter, section etc). SP reflects genre leveling. how to decide
 whether to make a sub-level in the DS or use media grouping? DS
 defines how to convey boundaries (fade effects in scene transition,
 bold headers in text) ]]
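
 One naive policy for the open question in step 3, given only as a
 sketch: use DS levels for as long as the document type has them, and
 fall back to media grouping for deeper SP levels. The level lists and
 the function name are assumptions; the boundary markers are the two
 examples from the note above.

```python
# Hypothetical DS levels per document type, outermost first.
DS_LEVELS = {
    "text": ["chapter", "section", "subsection"],
    "multimedia": ["scene"],
}

# How each document type conveys a boundary between units.
BOUNDARY = {"text": "bold header", "multimedia": "fade effect"}


def map_sp_level(sp_depth, doc_type):
    """Map an SP nesting depth (0 = top) to a DS level, or fall back to media grouping."""
    levels = DS_LEVELS[doc_type]
    if sp_depth < len(levels):
        return {"kind": "ds-level", "name": levels[sp_depth], "boundary": BOUNDARY[doc_type]}
    # No DS level left: the SP group has to be conveyed by grouping media items.
    return {"kind": "media-grouping", "name": None, "boundary": None}
```

 Under this policy a second-level SP group becomes a section in a text
 document but a media grouping in a multimedia presentation, which
 matches the observation that paper documents have more layers of
 document structure.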

 [[ Are 1, 2 really separate steps? They'd better be, otherwise there
 is not much room for generalizations.]]