Thesis Proposal

by Stefano Bocconi

Introduction

The research I will be doing in the multimedia group led by Lynda Hardman will be in the field of Intelligent MultiMedia Presentation Systems (IMMPS). This once-hot topic remains central to the group's activity and has led to the development of Cuypers, a multimedia presentation generation system, as a vehicle for researching and experimenting with multimedia presentation issues.

Cuypers' goal is to automate the process of creating a presentation from pre-existing media items. The problem that Cuypers tries to solve is that of breaking up the monolithic structure of a multimedia presentation, so that the presentation can adapt to the user's situation without human intervention.

A multimedia presentation is often a highly tailored product of a human creator: it fits the target user the creator has in mind, but it does not fit all the other users, who, in a deeply information-connected society, turn out to be the majority of the audience.

Factors that can compromise the effectiveness of a presentation range from the seemingly trivial, such as screen dimensions that differ from the design, low bandwidth, or the user's language, to the more complex, such as user preferences (colors, fonts), user expertise and user goals.

While text research has found solutions for many of the problems caused by the above-mentioned factors by determining the building elements of a text and their properties (e.g. headings, paragraphs, characters), multimedia is very often handled as a whole, so that the user is forced to take a multimedia presentation "as is" even if it is not suited for his/her situation.

We could then say that Cuypers tries to find the building blocks of a multimedia presentation and their properties, and to make explicit the rules a human creator would use to combine those building blocks in a "sensible" way (from the communication point of view).

Where we stand today

In my opinion Cuypers has made the most progress in flexibility with respect to layout and bandwidth constraints, while flexibility with respect to designer wishes, user goals and user preferences still needs to be achieved.

Motivation for the current research

As is often said, the amount of information available is steadily growing while its structure is gradually fading. Information once reached people through a single, authoritative channel, such as television or newspapers; with the internet, gathering information requires considerable and continuous action from the user, since the information may be hidden in some unknown digital data repository on the Web.

This is why a lot of interest has been focused on searching for the right information, and the current Semantic Web initiative can also be seen as providing (or should also result in) better support for information retrieval.

If we move one step further and assume that more or less pertinent information has been retrieved, the easiest strategy is to present the user with a list of (links to) information items, placing at the top of the list the information that best matches the user request.

If we want to present the different information items in a more natural way (so not with a list), we need some criteria to "tell our story". That is what motivates research into a discourse model that can guide the process of creating a structured presentation composed of the retrieved information.

Proposal

My research will be concerned with the problem of composing semantically annotated media items into an abstract presentation structure, where "abstract" means that the focus will be not on the layout of the presentation but on the way the media items must be organized to meet the communicative goal selected by the user.
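To make the idea of an abstract presentation structure more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `MediaItem` and `Node` classes, the `role` annotation, the `TEMPLATES` table and the `compose` function are hypothetical names, not part of Cuypers or of any existing discourse model. It only shows media items being grouped and ordered by discourse role for a chosen communicative goal, with no layout information involved.

```python
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    """A media item known only through its metadata (hypothetical schema)."""
    uri: str
    annotations: dict  # e.g. {"subject": "Rembrandt", "role": "example"}


@dataclass
class Node:
    """One node of the abstract presentation structure; carries no layout."""
    role: str
    items: list = field(default_factory=list)


# Hypothetical discourse templates: each communicative goal maps to an
# ordered list of discourse roles. A real discourse model would be richer.
TEMPLATES = {
    "introduce": ["definition", "example"],
    "compare": ["example", "contrast", "summary"],
}


def compose(goal, items):
    """Group items by discourse role, in the order the goal's template gives.

    Roles for which no item is available are skipped.
    """
    structure = []
    for role in TEMPLATES[goal]:
        matching = [it for it in items if it.annotations.get("role") == role]
        if matching:
            structure.append(Node(role=role, items=matching))
    return structure


items = [
    MediaItem("img1", {"role": "example"}),
    MediaItem("txt1", {"role": "definition"}),
    MediaItem("vid1", {"role": "contrast"}),
]
for node in compose("introduce", items):
    print(node.role, [it.uri for it in node.items])
```

With the same three items, calling `compose` with a different goal yields a different structure, which is the kind of dependency between discourse and data that the research questions ask about.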

Research Questions

Why do we need a discourse model?

This is the problem statement. As said above, searching for information is just the first step: how do we then present the information?

Did previous IMMPS use a discourse model? If not, why not?

This could give an overview of the history of IMMPS and of whether and how they used discourse.

To what extent can a model of discourse be captured? To what extent can a discourse model capture the presentation designer's intentions? How effective is it to use a discourse model to generate presentations? How do different discourse models affect the generated presentation?

The need for guidance in presenting the information alone does not motivate the choice of a discourse model as a solution. Is a discourse model an effective way of capturing a designer's intentions? Can we model a discourse to such an extent that it can be used by a system generating presentations? In what way is the final presentation dependent on the discourse?

Given a fixed source (fixed number of media items), is it possible to generate different presentations in terms of discourse model? Can the media items used in the presentation be completely independent from the discourse model?

This point raises the issue of whether we can couple discourse models and data sets without having to worry if they fit together. What requirements (if any) does the discourse set on the media items? What limitations (if any) do the media items set on the discourse?

Can a (theoretical) presentation generation system reason only in terms of a discourse model? How domain independent is this approach? To what extent does the nature of hypermedia influence the possible applicable discourse model?

To what extent are the conclusions we find limited to hypermedia presentations generated in the Museum domain?

Application fields

The immediate field of application is Cuypers, within the proposed architecture. I would also like to see to what extent the results and principles found in the Museum domain also apply to generating a presentation from annotated video segments, e.g. with Interview with America. This could provide insight into one of the research questions, namely how domain independent our conclusions are.

No go areas

No feature extraction from the media items: all the knowledge about an item is in its metadata
No media (text included) item generation
No database technology

Chapter outline of thesis

Chapter 1

Introduction

Problem statement/ Motivation

Research questions

Chapters Outline

Chapter 2 (Definition, Scope and Historical Background)

What do we mean by a discourse model?

This is important because it will give the boundaries within which the research will take place. In my opinion the discourse could include the research topics now described as Presentation Abstraction and Presentation Flow in Cuypers, but maybe some domain restrictions will be necessary.

Literature survey of discourse models

This will be an overview of the models that are available.

Literature survey of modelling tools and languages

This is about the possible implementations: what kinds of technologies we can use.

History of the (most important) IMMPS and whether they used discourse techniques, why, why not.

How the others did it.

Chapter 3 (How to use it, What discourse models to use, What technologies to use)

What a discourse model can do for us

The effectiveness of a discourse model as guidance in generating a presentation.

Models we consider suited for our scope/goal

Of the discourse models examined in chapter 2, the following ones are suited for our purpose because ....

Modelling tools and languages we consider suited for our scope/goal

Of the technologies examined in chapter 2, the following ones are suited for our purpose because ....

Examples (with Cuypers hopefully)

Example discourse and domain 1
Which discourse model?
Which method of incorporating it in the system?
It worked because, it didn't work because

Example discourse and domain 2
It worked because, it didn't work because

Chapter 4 (Generalisations vs Dependencies between discourse, data, domain)

Data and discourse

This chapter should make explicit all requirements the data items must satisfy so that we can use discourse techniques, and point out whether these requirements are general to all discourse models or specific to each one.

Examples are:

Requirements on each media item, e.g. richness of annotations (metadata)
Requirements on the relations between media items, such as the presence/absence of particular relations, or the "amount" of relations available (maybe a critical mass is needed to be able to use a discourse)
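
As an illustration of how such requirements could be checked mechanically, the following Python sketch tests a data set against two thresholds. The function name `check_requirements`, both thresholds and their default values are assumptions chosen for the example; what counts as "rich enough" annotation or as a critical mass of relations is exactly what this chapter would have to establish.

```python
def check_requirements(items, relations,
                       min_annotations=3, min_relations_per_item=1.0):
    """Check a data set against two hypothetical discourse requirements.

    items:     dict mapping a media item URI to its annotation dict
    relations: list of (source_uri, relation_name, target_uri) triples
    Both thresholds are illustrative assumptions, not established values.
    """
    # Requirement on each item: a minimum number of annotations.
    poorly_annotated = sorted(
        uri for uri, ann in items.items() if len(ann) < min_annotations
    )
    # Requirement on the set: a minimum average number of relations per item.
    density = len(relations) / len(items) if items else 0.0
    return {
        "poorly_annotated": poorly_annotated,
        "relation_density": density,
        "enough_relations": density >= min_relations_per_item,
    }


report = check_requirements(
    items={
        "a": {"subject": "portrait", "creator": "x", "date": "1650"},
        "b": {"subject": "sketch"},
    },
    relations=[("a", "depicts", "b")],
)
print(report)
```

A requirement that is general to all discourse models could be a fixed check of this kind, while a requirement specific to one model would need its own parameters or its own test.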

This should answer the questions: What requirements does the discourse set on the media items? What limitations do the media items set on the discourse?

Discourse and Domain

Can we generalize our conclusions beyond the Museum domain? How about video (IWA)? Can we still generate a presentation from video fragments using discourse techniques?

Discourse and Multimedia

How specific to Multimedia are our results? How does the nature (or what aspects) of Multimedia influence the discourse?

Conclusions

How the research questions were answered.

Future research directions

Literature

Some hints:

Multimedia

Maybury, Mark T. (1993). Intelligent Multimedia Interfaces.
Koegel Buford, John F. (1994). Multimedia Systems. Addison Wesley.
The International Journal on the Development and Application of Standards for Computers, Data Communications and Interfaces. Volume 18, Numbers 6 and 7, December 1997. Special Issue Intelligent Multimedia Presentation Systems.
Davis, Marc E. (1995). Media Streams: Representing Video for Retrieval and Repurposing. Ph.D. Thesis, Massachusetts Institute of Technology, February 1995.

Knowledge Representation:

Sowa, John F. (2000) - Knowledge Representation - Logical, Philosophical and Computational Foundations.

From Frank on Narrative:

Brooks, K. M. (1999). Metalinear Cinematic Narrative: Theory, Process, and Tool. MIT PhD Thesis. <http://ic.media.mit.edu/icSite/icpublications/Thesis/brooksPHD.html>
Black, J. B., & Bower, G. H. (1980). Story understanding as problem solving. Poetics, 9, 223 - 250.
Black, J. B., & Wilensky, R. (1979). An evaluation of story grammars. Cognitive Science, 3, 213 - 230.
Bordwell, D. (1989). Making Meaning - Inference and Rhetoric in the Interpretation of Cinema. Cambridge, Massachusetts: Harvard University Press.
Chatman, S. (1978). Story and Discourse: Narrative Structure in Fiction and Film. Ithaca, New York: Cornell University Press.
Lehnert, W. G. (1983). Plot Units: A Narrative Summarization Strategy. In W. G. Lehnert & M. H. Ringle (Eds.), Strategies for Natural Language Processing (pp. 375 - 412). Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Lehnert, W. G., Dyer, M. G., Johnson, P. N., Yang, C. J., & Harley, S. (1983). BORIS - An Experiment in In-Depth Understanding of Narratives. Artificial Intelligence, 20, 15 - 62.
Propp, V. W. (1968). Morphology of the Folktale. University of Texas Press.
Ricoeur, P. (1985). Time and Narrative. Chicago: The University of Chicago Press.
Schank, R. C., & Abelson, R. (1977). Scripts, Plans, Goals And Understanding. Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Schank, R. C., Kass, A., & Riesbeck, C. (1994). Inside Case-Based Explanation. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Wilensky, R. (1983b). Points: A Theory of the Structure of Stories in Memory. In W. G. Lehnert & M. H. Ringle (Eds.), Strategies for Natural Language Processing (pp. 345 - 376). Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Wilensky, R. (1983c). Story grammars versus story points. The Behavioral and Brain Sciences, 6(4), 579 - 623.
Wilensky, R. (1990). A Model for Planning in Complex Situations. In J. Allen, J. Hendler, & A. Tate (Eds.), Readings in Planning (pp. 263 - 274). San Mateo: Morgan Kaufmann Publishers.

Personal interests

In recent months I have seen that my interests are focused, in general, on the "abstract reasoning" field. Some examples are:

Discourse structure (how do I tell my story?)
Knowledge representation and Semantics (also web-enabled)
Reasoning on retrieved data semantics in order to compose a presentation
Principle of Compositional Semantics (a la Marcos)

These points coincide more or less with the above layers of the proposed architecture for Cuypers

In my opinion my research will involve the following steps:

Read about discourse and model its characteristics
Investigate which abstract elements play a role when presenting a topic to a user, and relate them to each other and to the user goal
Investigate in what Presentation Flow structures these abstract elements can be organized to be presented to the user.
Investigate the dependencies: from the data, from the domain and from Multimedia.

Of the above-mentioned steps, the second interests me most; were I to choose among them, that would be the one.

Required knowledge

Knowledge of Discourse and Narrative.
Knowledge of Knowledge Representation and Semantics.
General knowledge of Multimedia, in particular IMMPS.

Devil's advocate

Didn't we see all this before, given the fact that Discourse, knowledge representation and reasoning are old research topics?

Yup.
However, your contribution is to take the "woolly" semantics of discourse and narrow it down to something that can be applied computationally in the creation of multimedia presentations. You then get to use existing tools for implementing it. You aren't trying to create new discourse models (although I suspect that the work may creep into that terrain). You aren't trying to create new KR&R tools (see Frank van Harmelen :-), but you do want to find the most appropriate for your/our problem.

What is new in here?

Putting a discourse model in the system.
(You know about Eliza? The computer psychotherapist? "She" would fool people into thinking that she was listening and talking to them. There was no explicit discourse model.)
Also, discourse model for dynamic time-based media along with hyperlinks. (My thesis was _only_ about adding links to time-based media...)

What are the research questions?

Hey - this was your job...

"To what extent can a model of discourse be captured?"
"How do different discourse models affect the generated presentation?"
"To what extent do media items need to be annotated with attributes from the discourse model?"
"To what extent does the nature of hypermedia influence the "flow" of the discourse model?"

Have a look at Hongjing's research questions.
Somewhere in /ufs/lynda/tmp/Hongjing/main2.pdf (It's my scratch disk - you probably need to log in to mensa first?)
They are not world shattering, but clearly stated, and answered within the thesis.

Hasn't Frank done all this already? Hasn't Frank done everything already, but just won't tell us?

Of course he has done all this and will not tell us. It is thus our job to reveal it to the world...

Firstly, which discourse model are we going to use to start with?
(Everyone "disses" RST..) Is there an existing model that we can pluck from the shelf? (I doubt it.) What needs to be done to the models to improve them?

How am I going to find a real family-supporting job with this up-in-the-sky research?

Nae chance. But seriously. What would you want to do afterwards? The chances of doing a fun postdoc somewhere are fairly high (given research will still be paid for 3 years from now). Have you already come across groups you would be interested in? Europe? USA? Amsterdam?? Experience shows that getting to know people and the opportunities around is vital.