Past research in human-computer interaction provides evidence that the use of
multiple output modalities makes systems more robust and efficient to use. A
quick and successful interaction is expected when, for instance, the system's
output is presented to the user via multimedia/hypermedia in which text and
graphics are merged, or by a conversational agent that combines the use of
speech and gesture. Such multimodal systems need sophisticated specifications
for combining the different modalities so that each piece of information is
presented in the most appropriate manner (i.e., the system should select the
most suitable modalities and modality combinations to convey the information
to the user).
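The selection step described above can be illustrated with a small rule-based sketch. Everything in it — the rule table, the predicates, and the modality labels — is invented for illustration and not taken from any of the systems discussed at the workshop:

```python
# Hypothetical rule table mapping information characteristics to modality
# combinations; every name and rule here is illustrative only.
RULES = [
    (lambda info: info.get("type") == "spatial", ["graphics", "text"]),
    (lambda info: info.get("urgent", False),     ["speech"]),
    (lambda info: True,                          ["text"]),  # fallback
]

def select_modalities(info):
    """Return the modality combination of the first matching rule."""
    for matches, modalities in RULES:
        if matches(info):
            return modalities

print(select_modalities({"type": "spatial"}))                # ['graphics', 'text']
print(select_modalities({"type": "event", "urgent": True}))  # ['speech']
```

Real systems replace such a flat rule list with richer knowledge sources (ontologies of modality properties, user models, presentation context), but the basic shape — conditions over the information to be presented, mapped to modality combinations — is the same.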
Work on multimodal output generation is spread across disciplines and
separate subfields such as multimodal natural language generation, the
generation of multimedia/hypermedia presentations, and research on output
modalities for conversational agents. Presentation of this work has been
scattered across various events. One objective of the MOG 2007 workshop is
therefore to bring this work together in a single workshop that presents the
state of the art, and identifies future research needs, in multimodal output
generation by focusing on two research questions.
The workshop was very well organized and very interesting. All the
presentations were well prepared, and many people participated in the
discussions.
Although the CfP for the workshop included general problems of multimodal
interaction, most of the papers addressed natural language processing and
virtual characters. All the talks were interesting, however, and some of them
are particularly relevant to INS2.
In the rest of the report, I give the list of authors and some notes on their
presentations. For each main author, I have provided a link to their home page,
together with a list of their general research interests.
|
Thursday January 25th 2007 |
Location: Meston MT02 |
9.00-9.25 |
Registration in the foyer of the Meston building |
9.25-9.30 |
Welcome |
9.30-10.15 |
Jon Oberlander
University of Edinburgh
|
Invited talk:
What Are You Looking At? A Personal View on Multimodal
Output Generation
- users prefer virtual characters to text, but not to pure voice: T < A
< S
- RUTH talking head, Festival2
|
10.15-10.45 |
Coffee/Tea break |
|
10.45-11.15 |
Erwin Marsi and Ferdi van
Rooden
Tilburg University
|
Expressing Uncertainty with a Talking Head in a Multimodal
Question-Answering System
- Main question: how to add cues of uncertainty to a
talking head when it is answering questions with or without certainty
- Problems with experiments and evaluations
|
11.15-11.45 |
Dr. Jan Peter de Ruiter
Max Planck Institute for Psycholinguistics, Nijmegen
- Durational Aspects of Turn-Taking in Spontaneous
Face-to-Face and Telephone Dialogues
|
Some Multimodal Signals in Humans
|
11.45-12.15 |
Markus Guhe
University of Edinburgh
- Cognitive processes underlying communication
- Processes and mechanisms with which communicative
intentions are expressed. Mainly language, but also encompasses other
modalities: gesture, facial expression, body posture, expression of the
affective state, non-linguistic vocal expression
- Building computational cognitive models to gain
scientific insights
|
Towards a Cognitive Model of Multimodal Output for Language Production
- where to put multimodal fission
|
12.15-13.15 |
Lunch break |
13.15-14.00 |
Harry Bunt
Tilburg University
|
Invited talk:
Towards Standardization in Semantic Annotation
- as undertaken by the ISO organisation
- giving standard for multimodal representation (ISO
group)
- recent workshop
- ISO, LIRICS projects, ACL SIGSEM Working Group...
- ACL-SIGSEM Working Group on the Representation of
Multimodal Semantic Information
- 2005, LIRICS Linguistics...
- "to prepare int. standards and guidelines for effective
language resource management in the multilingual information society"
- con: diversity of theoretical approaches
- con: limits researchers' freedom
- pro: reuse and integration of language resources from
different sources
- ISO semantic annotation
- registry of data categories: temporal information,
reference relations, semantic roles, dialog acts, discourse relations
- LIRICS (European), lirics.loria.fr
- temporal information, reference relations, semantic
roles, dialog acts, but NOT discourse relations
- no semantic annotation without a semantics
- trying to design a metamodel is a useful approach to
see differences/similarities among approaches
- Defining semantic roles
- approaches to semantic role: description model
(event/verb-dependent), semantic granularity (coarse, medium, fine)
- roles of frame-net
- semantic roles metamodels
- current work: test/validate in annotation experiments
- Data categories for reference annotation
- central notion is the "markable"
- additional relational and objectal relations (in
addition to lexical relations: synonymy, hyponymy...)
- punctual and extended events
- ISO-TimeML
- Dialog acts (favorite subject)
- dialogue, turns, sender, overhearer, addressees,
utterances, dialog act, semantic content, communicative function
- communicative functions stressed by dialog
- utterances have multiple functions ==>
multidimensionality
- DAMSL
- what is a dimension in dialog: an aspect of communication
that can be addressed independently of the other aspects
- feedback, turn-taking, time, contact, attention,
opening, closing
- general purpose functions (applicable to any dimension): inform,
question
- contact management, auto-feedback...
- pilot testing: for usability by multiple annotators
with little training
- segmentation in dimensions, not in dialog...
- dimension specific functions
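The multidimensionality point — one utterance carrying communicative functions in several dimensions at once — can be made concrete with a small data sketch. The record layout below is my own illustration; the dimension and function labels loosely follow the DAMSL/dimension style of the talk:

```python
# One utterance annotated with a communicative function per dimension.
# Layout and labels are illustrative, not an actual annotation-scheme format.
utterance = {
    "text": "Yes, and when does the next train leave?",
    "functions": {
        "auto-feedback": "positive",      # "Yes" acknowledges the previous turn
        "task":          "set-question",  # general-purpose function in the task dimension
        "turn-taking":   "turn-take",
    },
}

def dimensions(annotation):
    """All dimensions in which this utterance has a communicative function."""
    return sorted(annotation["functions"])

print(dimensions(utterance))  # ['auto-feedback', 'task', 'turn-taking']
```

This is why segmentation proceeds per dimension rather than once for the whole dialogue: each dimension can assign its own function to (a stretch of) the same utterance.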
|
15.00-15.30 |
Coffee/Tea break |
15.30-16.00 |
Yulia
Bachvarova,
Betsy
van Dijk and Anton
Nijholt
University of Twente
- Working on a PhD thesis which focuses on developing a
formal, computational model of how different modalities communicate within
a multimedia presentation. This model is used in the automatic
generation of multimedia presentations to provide the generation engine
with the knowledge base and algorithms required to assign the
appropriate modality combinations.
- ICIS/CHIM
project
- Bachvarova, Y. and Elouazizi, N. (2005).
Integrating Knowledge about Modalities
to a Multimedia Knowledge Representation Framework.
In Proceedings of the Second International Workshop on the Integration of
Knowledge, Semantics and Digital Media Technology, EWIMT 2005.
Published by the Institution of Electrical Engineers, IEE, London. Pp.
133-138.
- Bachvarova, Y. and Elouazizi, N. Conceptual Argument
for a Modality Ontology to Support Automatic Modality Assignment. In:
Proceedings of the Workshop on Multimodal Interaction for the
Visualization and Exploration of Scientific Data, International Conference
on Multimodal Interfaces (ICMI 05), Trento, Italy, October 4-6, 2005.
- Elouazizi, N. and Bachvarova, Y. On Cognitive
Relevance in Automatic Multimodal Systems. In Proceedings of the
Sixth IEEE International Symposium on Multimedia Software Engineering (ISMSE
’04) (Miami, Florida, USA, December 13-15, 2004). IEEE Computer
Society, Los Alamitos, California, 2004, 418-426.
- Floris Wiesman, Stefano Bocconi, Boban
Arsenijevic, Yulia Bachvarova, Nico Roos, and Lambert Schomaker.
Intelligent Information Retrieval and Presentation with Multimedia
Databases. In: Proceedings of the Fourth Dutch-Belgian Information
Retrieval Workshop (pages 52-56), Institute for Logic, Language and
Computation, University of Amsterdam, Amsterdam, Netherlands, December
8-9, 2003. Edited by A.P. de Vries.
|
Towards a Unified Knowledge-Based Approach To Modality Choice
- NOT PRESENT AT THE WORKSHOP
|
16.00-16.30 |
Željko Obrenovic,
Raphaël Troncy and
Lynda Hardman
Centrum voor Wiskunde en Informatica, Amsterdam |
Vocabularies for Describing Accessibility Issues in Multimodal User
Interfaces
|
16.30-17.00 |
Charles Callaway
University of Edinburgh
- Natural Language Generation research, as well as its
integration into cultural activities, learning environments, interfaces
for animated agents, and large-scale generation projects.
- Discourse planning, sentence planning, surface
realization, document planning, revision, multimodal explanation
generation, spatial expressions, pronominalization, discourse markers,
self-explaining documents and multilingual generation
- PhD Thesis:
- Narrative Prose
Generation
North Carolina State University, 284 pages, Raleigh, North
Carolina, April 2000.
|
Non-localized, Interactive Multimodal Direction Giving
|
17.00-18.00 |
Drinks |
19.00-? |
Burns Night |
Lemon Tree |
|
Friday January 26th 2007 |
Location: Meston MT02 |
9.00-9.45 |
Elisabeth André
University of Augsburg, Germany |
Invited talk: From Annotated Multimodal Corpora to Simulated
Human-like Behaviors
- Variants of Information Presentation Systems with
Virtual Characters
- TV style, role plays, face-to-face dialogs, multi-party
dialogs (future)
- how to acquire knowledge about multimodal human-human
communication (intuitively, not model), how to code such knowledge, how to
implement such behaviors in an embodied conversational agent
- data, models and ECA: analysis-by-observation,
analysis-by-synthesis ... synthesis by observation
- it is always good to start from models, and then implement
and test them
- capturing knowledge on human-like behaviors, motion
capturing, study of literature, video recordings
- use a corpus to derive typical behaviors
- use a corpus to compare human-human and human-agent
communication
- HUMAINE European network of excellence: emotions in
man-machine communication
- facial action coding system: automatic generation of
mimics based on MPEG-4 standards
- Ekman's 1992 model: four ways in which lying reveals itself:
micro expressions, masks, timing, asymmetry
- the effect appears only in situations which allow users to
fully concentrate on the agent's face
- Modeling politeness: people apply politeness norms when
talking to computers, and users feel better if the computer appears polite;
book by Brown and Levinson
- gesture classes: hand/arm movements, non-communicative
(adaptor), communicative (emblem, deictic, illustrative (iconic,
metaphoric))
- annotation of corpora
- Anvil annotation tool
- new scheme for annotating politeness gestures
- how to exploit the knowledge to control the behaviour
of an ECA: copy-synthesis approach, over-generate and filter, derivation of
rules
- cross-cultural aspects of politeness: American vs.
German students, statistically significant, but the slight differences were
mainly caused by problems in translation
|
9.45-10.15 |
Coffee/Tea break |
10.15-10.45 |
Mary Ellen
Foster
Technische Universität München
- Research interests:
- Multimodal generation
- Generation in dialogue systems
- Example-driven generation
- Variation in generation
- Practical implementations of all of the above
- Current projects:
- JAST
- Joint Action Science and Technology
- Previous projects:
- COMIC
- Conversational Multimodal Interaction with Computers
|
Issues for Corpus-based Multimodal Generation
- corpora in text generation, multimodal corpora
- a corpus provides guidance for human developers
- more direct use: design decision making, automated
evaluation (cross-validation)
- role of variations
- multimodal corpora: recorded, annotated collections of
human behavior, annotated on multiple layers with implicit and explicit
links between the layers
- uses of multimedia corpora: analysis, indexing and
retrieval, summarisation, generation
- non-verbal behaviours for ECAs, using human behavior to
decide how an agent should behave
- contextual information: characteristics of the speech
signal, information structure and affect, motion and error type, intended
prosody, syntactic structure, dialog history, user model
- representing context
- this could also be an interesting contribution to the
canonical processes of media production: canonical processes of
corpus-based multimodal generation
|
10.45-11.15 |
Dirk Heylen
University of Twente
- Generating Expressive Speech for Storytelling
Applications
|
Multimodal Backchannel Generation for Conversational Agents
- Sensitive Artificial Listener
- nose tracker device
- interest in listening heads
|
11.15-11.45 |
Paul Piwek
The Open University
- Research Theme: Dialogue and Natural Language
Generation
|
Modality Choice for Generation of Referring Acts: Pointing
versus Describing
- AIM: challenge two assumptions common in generation algorithms for
multimodal referring acts: that non-verbal means of referring are secondary to
verbal means, and that there is a single strategy for referring
- previous work: point when you cannot express it with words
- cost of pointing
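The idea of a cost of pointing can be sketched as a simple trade-off: point when pointing is cheaper than a full verbal description. The cost functions and weights below are invented for the sketch; the paper's actual model may differ:

```python
def referring_act(distance, distinguishing_words):
    """Choose a pointing gesture when it is cheaper than describing.

    Both cost functions are illustrative: precise pointing is assumed to
    get more costly with distance to the target, describing with the
    number of words needed to single the target out.
    """
    cost_point = 1.0 + 0.5 * distance
    cost_describe = 0.8 * distinguishing_words
    return "point" if cost_point < cost_describe else "describe"

print(referring_act(distance=1.0, distinguishing_words=5))  # point
print(referring_act(distance=4.0, distinguishing_words=2))  # describe
```

Under such a model, "point when you cannot express it with words" is just one corner of the space: pointing can also win when a verbal description would be long or ambiguous, even if it is possible.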
|
|
Adrian Bangerter and
Eric Chevalley
University of Neuchâtel
- Research interest:
-
Social interaction and practices in
selection and appraisal interviews
-
Coordination in collaborative work
-
Group processes and teamwork
-
Discourse and conversation
analysis of task-related communication
-
Interplay of language and non-verbal
communication (gesture)
-
Social representations, Diffusion of
ideas and social construction of knowledge
-
Life course research
|
Pointing and Describing in Referential Communication: When
Are Pointing Gestures Used to Communicate?
- study gestures used in communicative situations
- show audience design
- gestures may be functional both for speakers and for communication
- the question is not whether or not, but when, gestures communicate
- functions of pointing gestures
|
12.15-13.15 |
Lunch break |
13.15-13.45 |
John
Bateman and
Renate Henschel
University of Bremen
- Research
- Ontologies, particularly for natural language
- Discourse structure
- The presentation of information combining
presentation modalities: texts, pictures, graphics, layout, video and so
on.
- The automatic production of natural language texts
and discourse
- Various aspects of SFL, particularly on the
intersection of SFL and computational linguistic description.
|
Generating Text, Diagrams and Layout Appropriately According
to Genre
- GeM project
- communicative artifacts that adopt a page metaphor are
combining an increasing array of simultaneous modes
- systematic means for exploring kinds of means
- Twyman's classification of the combination of modes in
documents (pure linear, linear interrupted, list, linear branching,
matrix...)
- things organized about space
- Genre and multimodality
- Kress and van Leeuwen, Rob Waller, Martin (genre)
- Kress / van Leeuwen: a semiotic mode
- Waller's model of document design
- corpus basics, GeM annotation scheme (CSS3, XSL-FO)
- content structure, rhetorical structure, layout
structure, navigation structure, linguistic structure
- basic vocabulary of the page, layout units, ...
- xml, multilayered annotation ==> non-time based
annotation -> no tools for it!
- gradually transfer the implicit spatial information in
the visual image to explicit representational structure
- import from XSL:FO
- RST and Layout Structure often diverge => generally
layout consequences
- Xalan-J, XSLT, XSL-FO, FOP, pdf
- break conditions for a particular genre
- derive genre constraints
- problem of working in XSLT framework
- notion of the virtual canvas
|
13.45-14.15 |
Charlotte van Hooijdonk,
Emiel Krahmer,
Fons Maes
Tilburg University
Mariët Theune and
Wauter Bosma
University of Twente
- Cognitive processing and representation of hyperlinked
documents
- Information presentation in a multimodal environment
- Experience concerning Culture and Web Design
|
Towards Automatic Generation of Multimodal Answers to
Medical Questions: A cognitive engineering approach
- IMIX: Interactive Multimodal Information eXtraction
- IMOGEN
- answer modalities
- media allocation problem
- experiment: obtain a corpus of (multimodal) answers to
different types of medical questions
- decorational function of media (no informativity),
representational function (removing the medium does not alter informativity,
but its presence makes the answer clearer), additional function (removing it
alters the informativity)
- Cohen's kappa scores for agreement between annotators
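The agreement measure mentioned here, Cohen's kappa, corrects raw annotator agreement for chance agreement. A minimal sketch (the example labels are the media functions from the talk; the toy labellings are invented):

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two annotators labelling the same items."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # observed agreement: fraction of items with identical labels
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # chance agreement: product of each annotator's label frequencies
    labels = set(a) | set(b)
    p_chance = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_observed - p_chance) / (1 - p_chance)

# Two annotators assigning a media function to four answer fragments.
ann1 = ["representational", "representational", "decorational", "additional"]
ann2 = ["representational", "decorational", "decorational", "additional"]
print(round(cohen_kappa(ann1, ann2), 2))  # 0.64
```

Kappa is 1.0 for perfect agreement and 0 for agreement no better than chance, which is why it is preferred over raw percentage agreement when annotators work with a small, skewed label set like this one.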
|
14.15-14.45 |
Christopher Habel and
Cengiz Acartürk
University of Hamburg
Interests
-
Representation
of knowledge about space, time and events
-
Language
comprehension and language production
- Granularity
Projects
|
On Reciprocal Improvement in Multimodal Generation:
Co-reference by text and information graphics
- combining text & information graphics
- co-reference in multimodal documents
- improvement: the role of comprehension in producing
complex multimodal documents
- the conceptual and lexical bases of cross-modal
comprehension
- text: written language vs. speech, monological text vs.
dialogues
- figures: information graphics: line graphs, bar charts;
drawings, photographs
- tables: two dimensional text
- equations, formulas: part of text vs. separated from
text
- combining modalities is good for sensory substitution
- Pinker's model of graph comprehension
|
14.45-15.15 |
Somayajulu
Sripada and Feng Gao
University of Aberdeen |
Summarising Dive Computer Data: A case study in integrating
textual and graphical presentations of numerical data
- presenting numerical data in text and graphics
- quantitative information (QI), always presented using
graphical displays
- do the reduction, and then present the reduced data
- data summaries integrating TEXT and GRAPHICS could
relieve data overload
- Scuba Diving
- Dive Computer (DC), worn like a watch; records all the data about
the dive
|
15.15-15.30 |
Closing |
15.30-16.00 |
Coffee/Tea |