Past research in human-computer interaction provides evidence that the use of
multiple output modalities makes systems more robust and efficient to use. A
quick and successful interaction is expected when, for instance, the system's
output is presented to the user via multimedia/hypermedia in which text and
graphics are merged, or by a conversational agent that combines the use of
speech and gesture. Such multimodal systems need sophisticated specifications
for combining the different modalities so that each piece of information is
presented in the most appropriate manner (i.e., the system should select the
most suitable modalities and modality combinations to convey the information
to the user).
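The selection step described above can be illustrated with a small rule-based sketch. Everything in it — the rule table, the predicates, and the modality labels — is invented for illustration and not taken from any of the systems discussed at the workshop:

```python
# Hypothetical rule table mapping information characteristics to modality
# combinations; every name and rule here is illustrative only.
RULES = [
    (lambda info: info.get("type") == "spatial", ["graphics", "text"]),
    (lambda info: info.get("urgent", False),     ["speech"]),
    (lambda info: True,                          ["text"]),  # fallback
]

def select_modalities(info):
    """Return the modality combination of the first matching rule."""
    for matches, modalities in RULES:
        if matches(info):
            return modalities

print(select_modalities({"type": "spatial"}))                # ['graphics', 'text']
print(select_modalities({"type": "event", "urgent": True}))  # ['speech']
```

Real systems replace such a flat rule list with richer knowledge sources (ontologies of modality properties, user models, presentation context), but the basic shape — conditions over the information to be presented, mapped to modality combinations — is the same.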
Work on multimodal output generation is spread across disciplines and
separate subfields such as multimodal natural language generation, the
generation of multimedia/hypermedia presentations, and research on output
modalities for conversational agents. Presentation of this work has been
scattered across various events. One objective of the MOG 2007 workshop is
therefore to bring this work together in a single workshop that presents the
state of the art, and identifies future research needs, in multimodal output
generation by focusing on two research questions.
The workshop was very well organized and very interesting. All the
presentations were well prepared, and many people participated in the
discussions.
Although the CfP for the workshop included general problems of multimodal
interaction, most of the papers addressed natural language processing and
virtual characters. All the talks were interesting, however, and some of them
are particularly relevant to INS2.
In the rest of the report, I give the list of authors and some notes on their
presentations. For each main author, I have provided a link to their home page,
together with a list of their general research interests.
|
Thursday January 25th 2007 |
Location: Meston MT02 |
9.00-9.25 |
Registration in the foyer of the Meston building |
9.25-9.30 |
Welcome |
9.30-10.15 |
Jon Oberlander
University of Edinburgh
|
Invited talk:
What Are You Looking At? A Personal View on Multimodal
Output Generation
- users prefer virtual characters to text, but not to pure voice: T < A
< S
- RUTH talking head, Festival2
|
10.15-10.45 |
Coffee/Tea break |
|
10.45-11.15 |
Erwin Marsi and Ferdi van
Rooden
Tilburg University
|
Expressing Uncertainty with a Talking Head in a Multimodal
Question-Answering System
- Main question: how to add cues of uncertainty to a
talking head when it is answering questions with or without certainty
- Problems with experiments and evaluations
|
11.15-11.45 |
Dr. Jan Peter de Ruiter
Max Planck Institute for Psycholinguistics, Nijmegen
- Durational Aspects of Turn-Taking in Spontaneous
Face-to-Face and Telephone Dialogues
|
Some Multimodal Signals in Humans
|
11.45-12.15 |
Markus Guhe
University of Edinburgh
- Cognitive processes underlying communication
- Processes and mechanisms with which communicative
intentions are expressed. Mainly language, but also encompasses other
modalities: gesture, facial expression, body posture, expression of the
affective state, non-linguistic vocal expression
- Building computational cognitive models to gain
scientific insights
|
Towards a Cognitive Model of Multimodal Output for Language Production
- where to put multimodal fission
|
12.15-13.15 |
Lunch break |
13.15-14.00 |
Harry Bunt
Tilburg University
|
Invited talk:
Towards Standardization in Semantic Annotation
- as undertaken by the ISO organisation
- giving standard for multimodal representation (ISO
group)
- recent workshop
- ISO, LIRICS projects, ACL SIGSEM Working Group...
- ACL-SIGSEM Working Group on the Representation of
Multimodal Semantic Information
- 2005, LIRICS Linguistics...
- "to prepare int. standards and guidelines for effective
language resource management in the multilingual information society"
- con: diversity of theoretical approaches
- con: limits researchers' freedom
- pro: reuse and integration of language resources from
different sources
- ISO semantic annotation
- registry of data categories: temporal information,
reference relations, semantic roles, dialog acts, discourse relations
- LIRICS (European), lirics.loria.fr
- temporal information, reference relations, semantic
roles, dialog acts, but NOT discourse relations
- no semantic annotation without a semantics
- trying to design a metamodel is a useful approach to
see differences/similarities among approaches
- Defining semantic roles
- approaches to semantic role: description model
(event/verb-dependent), semantic granularity (coarse, medium, fine)
- roles of frame-net
- semantic roles metamodels
- current work: test/validate in annotation experiments
- Data categories for reference annotation
- central notion is the "markable"
- additional relational and objectal relations (in
addition to lexical relations: synonymy, hyponymy...)
- punctual and extended events
- ISO-TimeML
- Dialog acts (favorite subject)
- dialogue, turns, sender, overhearer, addressees,
utterances, dialog act, semantic content, communicative function
- communicative functions stressed by dialog
- utterances have multiple functions ==>
multidimensionality
- DAMSL
- what is a dimension in dialog: an aspect of communication
that can be addressed independently of the other aspects
- feedback, turn-taking, time, contact, attention,
opening, closing
- general purpose functions (applicable to any dimension): inform,
question
- contact management, auto-feedback...
- pilot testing: for usability by multiple annotators
with little training
- segmentation in dimensions, not in dialog...
- dimension specific functions
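The multidimensionality point — one utterance carrying communicative functions in several dimensions at once — can be made concrete with a small data sketch. The record layout below is my own illustration; the dimension and function labels loosely follow the DAMSL/dimension style of the talk:

```python
# One utterance annotated with a communicative function per dimension.
# Layout and labels are illustrative, not an actual annotation-scheme format.
utterance = {
    "text": "Yes, and when does the next train leave?",
    "functions": {
        "auto-feedback": "positive",      # "Yes" acknowledges the previous turn
        "task":          "set-question",  # general-purpose function in the task dimension
        "turn-taking":   "turn-take",
    },
}

def dimensions(annotation):
    """All dimensions in which this utterance has a communicative function."""
    return sorted(annotation["functions"])

print(dimensions(utterance))  # ['auto-feedback', 'task', 'turn-taking']
```

This is why segmentation proceeds per dimension rather than once for the whole dialogue: each dimension can assign its own function to (a stretch of) the same utterance.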
|
15.00-15.30 |
Coffee/Tea break |
15.30-16.00 |
Yulia
Bachvarova,
Betsy
van Dijk and Anton
Nijholt
University of Twente
- Working on a PhD thesis which focuses on developing a
formal, computational model of how different modalities communicate within
a multimedia presentation. This model is used in the automatic
generation of multimedia presentations to provide the generation engine
with the knowledge base and algorithms required to assign the
appropriate modality combinations.
- ICIS/CHIM
project
- Bachvarova, Y. and Elouazizi, N. (2005).
Integrating Knowledge about Modalities
to a Multimedia Knowledge Representation Framework.
In Proceedings of the Second International Workshop on the Integration of
Knowledge, Semantics and Digital Media Technology, EWIMT 2005.
Published by the Institution of Electrical Engineers, IEE, London. Pp.
133-138.
- Bachvarova, Y. and Elouazizi, N. Conceptual Argument
for a Modality Ontology to Support Automatic Modality Assignment. In:
Proceedings of the Workshop on Multimodal Interaction for the
Visualization and Exploration of Scientific Data, International Conference
on Multimodal Interfaces (ICMI 05), Trento, Italy, October 4-6, 2005.
- Elouazizi, N. and Bachvarova, Y. On Cognitive
Relevance in Automatic Multimodal Systems. In Proceedings of the
Sixth IEEE International Symposium on Multimedia Software Engineering (ISMSE
’04) (Miami, Florida, USA, December 13-15, 2004). IEEE Computer
Society, Los Alamitos, California, 2004, 418-426.
- Floris Wiesman, Stefano Bocconi, Boban
Arsenijevic, Yulia Bachvarova, Nico Roos, and Lambert Schomaker.
Intelligent Information Retrieval and Presentation with Multimedia
Databases. In: Proceedings of the Fourth Dutch-Belgian Information
Retrieval Workshop (pages 52-56), Institute for Logic, Language and
Computation, University of Amsterdam, Amsterdam, Netherlands, December
8-9, 2003. Edited by A.P. de Vries.
|
Towards a Unified Knowledge-Based Approach To Modality Choice
- NOT PRESENT AT THE WORKSHOP
|
16.00-16.30 |
Željko Obrenovic,
Raphaël Troncy and
Lynda Hardman
Centrum voor Wiskunde en Informatica, Amsterdam |
Vocabularies for Describing Accessibility Issues in Multimodal User
Interfaces
|
16.30-17.00 |
Charles Callaway
University of Edinburgh
- Natural Language Generation research, as well as its
integration into cultural activities, learning environments, interfaces
for animated agents, and large-scale generation projects.
- Discourse planning, sentence planning, surface
realization, document planning, revision, multimodal explanation
generation, spatial expressions, pronominalization, discourse markers,
self-explaining documents and multilingual generation
- PhD Thesis:
- Narrative Prose
Generation
North Carolina State University, 284 pages, Raleigh, North
Carolina, April 2000.
|
Non-localized, Interactive Multimodal Direction Giving
|
17.00-18.00 |
Drinks |
19.00-? |
Burns Night |
Lemon Tree |
|
Friday January 26th 2007 |
Location: Meston MT02 |
9.00-9.45 |
Elisabeth André
University of Augsburg, Germany |
Invited talk: From Annotated Multimodal Corpora to Simulated
Human-like Behaviors
- Variants of Information Presentation Systems with
Virtual Characters
- TV style, role plays, face-to-face dialogs, multi-party
dialogs (future)
- how to acquire knowledge about multimodal human-human
communication (intuitively, not model), how to code such knowledge, how to
implement such behaviors in an embodied conversational agent
- data, models and ECA: analysis-by-observation,
analysis-by-synthesis ... synthesis by observation
- it is always good to start from models, and then implement
and test them
- capturing knowledge on human-like behaviors, motion
capturing, study of literature, video recordings
- use a corpus to derive typical behaviors
- use a corpus to compare human-human and human-agent
communication
- HUMAINE European network of excellence: emotions in
man-machine communication
- facial action coding system: automatic generation of
mimics based on MPEG-4 standards
- Ekman's 1992 model: four ways in which lying reveals itself:
micro expressions, masks, timing, asymmetry
- the effect appears only in situations which allow users to
fully concentrate on the agent's face
- Modeling politeness: people apply politeness norms when
talking to computers, and users feel better if the computer appears polite;
book by Brown and Levinson
- gesture classes: hand/arm movements, non-communicative
(adaptor), communicative (emblem, deictic, illustrative (iconic,
metaphoric))
- annotation of corpora
- Anvil annotation tool
- new scheme for annotating politeness gestures
- how to exploit the knowledge to control the behaviour
of an ECA: copy-synthesis approach, over-generate and filter, derivation of
rules
- cross-cultural aspects of politeness: American vs.
German students, statistically significant, but the slight differences were
mainly caused by problems in translation
|
9.45-10.15 |
Coffee/Tea break |
10.15-10.45 |
Mary Ellen
Foster
Technische Universität München
- Research interests:
- Multimodal generation
- Generation in dialogue systems
- Example-driven generation
- Variation in generation
- Practical implementations of all of the above
- Current projects:
- JAST
- Joint Action Science and Technology
- Previous projects:
- COMIC
- Conversational Multimodal Interaction with Computers
|
Issues for Corpus-based Multimodal Generation
- corpora in text generation, multimodal corpora
- a corpus provides guidance for human developers
- more direct use: design decision making, automated
evaluation (cross-validation)
- role of variations
- multimodal corpora: recorded, annotated collections of
human behavior, annotated on multiple layers with implicit and explicit
links between the layers
- uses of multimedia corpora: analysis, indexing and
retrieval, summarisation, generation
- non-verbal behaviours for ECAs, using human behavior to
decide how an agent should behave
- contextual information: characteristics of the speech
signal, information structure and affect, motion and error type, intended
prosody, syntactic structure, dialog history, user model
- representing context
- this could also be an interesting contribution to the
canonical processes of media production: canonical processes of
corpus-based multimodal generation
|
10.45-11.15 |
Dirk Heylen
University of Twente
- Generating Expressive Speech for Storytelling
Applications
|
Multimodal Backchannel Generation for Conversational Agents
- Sensitive Artificial Listener
- nose tracker device
- interest in listening heads
|
11.15-11.45 |
Paul Piwek
The Open University
- Research Theme: Dialogue and Natural Language
Generation
|
Modality Choice for Generation of Referring Acts: Pointing
versus Describing
- AIM: challenge two assumptions common in generation algorithms for
multimodal referring acts: that non-verbal means of referring are secondary to
verbal means, and that there is a single strategy for referring
- previous work: point when you cannot express it with words
- cost of pointing
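The idea of a cost of pointing can be sketched as a simple trade-off: point when pointing is cheaper than a full verbal description. The cost functions and weights below are invented for the sketch; the paper's actual model may differ:

```python
def referring_act(distance, distinguishing_words):
    """Choose a pointing gesture when it is cheaper than describing.

    Both cost functions are illustrative: precise pointing is assumed to
    get more costly with distance to the target, describing with the
    number of words needed to single the target out.
    """
    cost_point = 1.0 + 0.5 * distance
    cost_describe = 0.8 * distinguishing_words
    return "point" if cost_point < cost_describe else "describe"

print(referring_act(distance=1.0, distinguishing_words=5))  # point
print(referring_act(distance=4.0, distinguishing_words=2))  # describe
```

Under such a model, "point when you cannot express it with words" is just one corner of the space: pointing can also win when a verbal description would be long or ambiguous, even if it is possible.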
|
|
Adrian Bangerter and
Eric Chevalley
University of Neuchâtel
- Research interest:
-
Social interaction and practices in
selection and appraisal interviews
-
Coordination in collaborative work
-
Group processes and teamwork
-
Discourse and conversation
analysis of task-related communication
-
Interplay of language and non-verbal
communication (gesture)
-
Social representations, Diffusion of
ideas and social construction of knowledge
-
Life course research
|
Pointing and Describing in Referential Communication: When
Are Pointing Gestures Used to Communicate?
- study gestures used in communicative situations
- show audience design
- gestures may be functional both for speakers and for communication
- the question is not whether or not, but when, gestures communicate
- functions of pointing gestures
|
12.15-13.15 |
Lunch break |
13.15-13.45 |
John
Bateman and
Renate Henschel
University of Bremen
- Research
- Ontologies, particularly for natural language
- Discourse structure
- The presentation of information combining
presentation modalities: texts, pictures, graphics, layout, video and so
on.
- The automatic production of natural language texts
and discourse
- Various aspects of SFL, particularly on the
intersection of SFL and computational linguistic description.
|
Generating Text, Diagrams and Layout Appropriately According
to Genre
- GeM project
- communicative artifacts that adopt a page metaphor are
combining an increasing array of simultaneous modes
- systematic means for exploring kinds of means
- Twyman's classification of the combination of modes in
documents (pure linear, linear interrupted, list, linear branching,
matrix...)
- things organized about space
- Genre and multimodality
- Kress and van Leeuwen, Rob Waller, Martin (genre)
- Kress / van Leeuwen: a semiotic mode
- Waller's model of document design
- corpus basics, GeM annotation scheme (CSS3, XSL-FO)
- content structure, rhetorical structure, layout
structure, navigation structure, linguistic structure
- basic vocabulary of the page, layout units, ...
- xml, multilayered annotation ==> non-time based
annotation -> no tools for it!
- gradually transfer the implicit spatial information in
the visual image to explicit representational structure
- import from XSL:FO
- RST and Layout Structure often diverge => generally
layout consequences
- Xalan-J, XSLT, XSL-FO, FOP, pdf
- break conditions for a particular genre
- derive genre constraints
- problem of working in XSLT framework
- notion of the virtual canvas
|
13.45-14.15 |
Charlotte van Hooijdonk,
Emiel Krahmer,
Fons Maes
Tilburg University
Mariët Theune and
Wauter Bosma
University of Twente
- Cognitive processing and representation of hyperlinked
documents
- Information presentation in a multimodal environment
- Experience concerning Culture and Web Design
|
Towards Automatic Generation of Multimodal Answers to
Medical Questions: A cognitive engineering approach
- IMIX: Interactive Multimodal Information eXtraction
- IMOGEN
- answer modalities
- media allocation problem
- experiment: obtain a corpus of (multimodal) answers to
different types of medical questions
- decorational function of media (no informativity),
representational function (removing the medium does not alter informativity,
but its presence makes the answer clearer), additional function (removing it
alters the informativity)
- Cohen's kappa scores for agreement between annotators
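The agreement measure mentioned here, Cohen's kappa, corrects raw annotator agreement for chance agreement. A minimal sketch (the example labels are the media functions from the talk; the toy labellings are invented):

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two annotators labelling the same items."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # observed agreement: fraction of items with identical labels
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # chance agreement: product of each annotator's label frequencies
    labels = set(a) | set(b)
    p_chance = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_observed - p_chance) / (1 - p_chance)

# Two annotators assigning a media function to four answer fragments.
ann1 = ["representational", "representational", "decorational", "additional"]
ann2 = ["representational", "decorational", "decorational", "additional"]
print(round(cohen_kappa(ann1, ann2), 2))  # 0.64
```

Kappa is 1.0 for perfect agreement and 0 for agreement no better than chance, which is why it is preferred over raw percentage agreement when annotators work with a small, skewed label set like this one.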
|
14.15-14.45 |
Christopher Habel and
Cengiz Acartürk
University of Hamburg
Interests
-
Representation
of knowledge about space, time and events
-
Language
comprehension and language production
- Granularity
Projects
|
On Reciprocal Improvement in Multimodal Generation:
Co-reference by text and information graphics
- combining text & information graphics
- co-reference in multimodal documents
- improvement: the role of comprehension in producing
complex multimodal documents
- the conceptual and lexical bases of cross-modal
comprehension
- text: written language vs. speech, monological text vs.
dialogues
- figures: information graphics: line graphs, bar charts;
drawings, photographs
- tables: two dimensional text
- equations, formulas: part of text vs. separated from
text
- combining modalities is good for sensory substitution
- Pinker's model of graph comprehension
|
14.45-15.15 |
Somayajulu
Sripada and Feng Gao
University of Aberdeen |
Summarising Dive Computer Data: A case study in integrating
textual and graphical presentations of numerical data
- presenting numerical data in text and graphics
- quantitative information (QI), always presented using
graphical displays
- do the reduction, and then present the reduced data
- data summaries integrating TEXT and GRAPHICS could
relieve data overload
- Scuba Diving
- Dive Computer (DC), worn like a watch; records all the data about
the dive
|
15.15-15.30 |
Closing |
15.30-16.00 |
Coffee/Tea |