ACM Multimedia 2005

8-10 November 2005, Hilton Hotel, Singapore
Author: Lynda
CWI participants: Frank, Dick Bulterman (SEN5)
# participants: 320
s/ACMMM05/URL to ACM DL/

Overall impression

Tuesday

Opening keynote: Yutaka Tanaka (NHK, Japan Broadcasting Corporation), Future of Home Media

(In the work we are doing on modelling processes of multimedia, we discuss that archiving includes a process of selection. The value of the archive is increased by not storing everything. The value of this trip report is being increased by not including too much information on this keynote.)
Program metadata allows anytime viewing. (There was a video demo with a guy in a suit explaining interactive program selection to a well-dressed woman who stood and nodded the whole time. I don't often feel insulted.) The talk is very close to everything going on in Passepartout, but other than the use of the word "metadata" there was nothing new.
"NEXT": "To View, To Know, To Use"
They are using face recognition, speech analysis and NL processing to add metadata to video content. They use MPEG-7. Even worse, they talk about metadata and metadata production in the video - destined for the general public, I guess. Also a demo of two animated characters talking about and gesturing at images (of Vietnamese food). (At least the female figure had an equal role to the rabbit....) Dynamic 3D modelling. Animation of a Noh play. Presenter talking with 3D puppets. Morphovision: a 3D model of a house which then deforms. (I'm not sure what the goal is.) NL interface to talk to a TV program agent. Close to the work we are doing - ask the TV for certain information and it comes up with a list of potentially relevant programmes (IR related, I guess). Cute ultra-thin TV screen. New standard for TV images - 7000x4000 pixels, 16 times more pixels than HDTV. (The interesting part of the talk is perhaps that what was research 5-10 years ago is now being included in real-life television.)

Brave New Topics 1: Multimedia Challenges for Planetary Scale Applications

Brave new topics are about asking questions, not giving answers.

IrisNet: An Internet-Scale Architecture for Multimedia Sensors

Connect millions of sensors to the internet. Parking space finders. Where did I leave my umbrella/cat/child/aging parent? Avoid congestion in traffic/swimming pool/post office. Imagine it as one centralised database that you query and get back the answer you want. Two components: SAs (sensor feed processing) and OAs (distributed database). Distributed XML database; the user can pose a query. Data is organised as a logical hierarchy [he means taxonomy]. Partitioned among OA database nodes. The parking space finder is 500 lines of code.
This is also a good thing to include in our modelling special issue. It is at the capture end and touches on the problem of which ontology you are using to label the data.
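To make the "logical hierarchy" idea concrete, here is a minimal Python sketch of how a parking space finder could store sensor readings hierarchically and answer a user query. The element names, the schema and the query are my own illustration, not IrisNet's actual schema or API.

# Sensor readings stored as an XML hierarchy (city -> neighbourhood -> block -> space);
# in IrisNet this would be partitioned across OA nodes, here it is a single document.
import xml.etree.ElementTree as ET

SENSOR_FEED = """
<city name="Pittsburgh">
  <neighbourhood name="Oakland">
    <block id="B12">
      <space id="1" available="yes"/>
      <space id="2" available="no"/>
    </block>
    <block id="B13">
      <space id="7" available="yes"/>
    </block>
  </neighbourhood>
</city>
"""

root = ET.fromstring(SENSOR_FEED)

# "Find free parking spaces in Oakland" posed as an XPath-style query over the hierarchy.
oakland = root.find(".//neighbourhood[@name='Oakland']")
for space in oakland.findall(".//space[@available='yes']"):
    print("free space:", space.get("id"))

The point is just the shape of the data and the query; in the real system the query would be routed only to the database nodes owning the relevant subtree.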

The Multimedia Challenges Raised by Pervasive Games

"Can you see me now" players and online players. "Uncle Roy all around you". Find where Uncle Roy is within an hour. "Savannah" kids on school field being lions in the savannah.
One of the challenges is interacting in public. There are large numbers of public around the players. Move gamne from one city to another, need to build new model of city and make sure GPS works. Scale is also a problem.
Create "Hitchers" on continental scale. Pick them up and then drop at destination. Associate info with hitchers inside database.

PLASMA: A PLAnetary Scale Monitoring Architecture

Multimedia challenges of global sensor networks. Integration of high-level, large-scale sensor data. Everything displayed has to be consistent in terms of location and time. A suitable user interface is also needed. In some sense close to the goals of our part of n9c, but she is more interested.

Gates of Global Perception: Forensic Graphics for Evidence Presentation, D. Schofield

I can imagine that Stefano is particularly interested in this (c.f. Vox Populi).
Graphics have come into courtrooms, which brings problems with it. Courtrooms are working at a global level, not just nationally - trans-national courtrooms, multiple jurisdictions. Most cases are tried by peers. Evidence collection at crime scenes with mobile phones: who owns it, and where is the audit trail? Oral -> written -> visual and oral. In courtrooms the jury has gone from oral to visual, and they don't get the written material.
Seeing is believing. If people see an animation "what-if" then people will be inclined to believe it.
Eye witness testimony carries the greatest weight and is the most unreliable.
How are decision making abilities changed by the use of media? (If we get it wrong then people will be locked up/set free incorrectly.)

How do you explain how accurate computer-based facial recognition is? Or facial reconstruction?
How accurate is a simulation of what happened? Speed of motorbikes; smoke distribution.
A 12-minute 3D virtual environment of what happened at the Birmingham shooting. Contradictory evidence from hostile witnesses.

Advantages of digital media in court: persuasiveness, increased comprehension, efficiency, XXX.
Disadvantages: persuasiveness, XXX, XXX, XXX.
Mark Drummond: Eight keys to the art of persuasion. Picking at and licking your tie to stop people listening to the defence lawyer. (I am not kidding...)
They study film theory and emotive effects, since they are trying to remove them from the animations and virtual realities they create. Perception, understanding and learning of new media.

Demos

I need to look up the demos I saw.

Video demonstrations and visions

All videos are on the SIGMM web site.

How Speech/Text Alignment Benefits Web-based Learning

My Digital Photos: Where and When?

Continuation of the work on location-based search and retrieval of photographs. "Scraping" weather and location information. 12,000 photos.
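As a rough sketch of the kind of enrichment that "scraping" suggests, the following Python snippet attaches weather and place names to photo metadata and then filters on them. fetch_weather() and reverse_geocode() are hypothetical placeholders for whatever external sources get scraped; none of this is the authors' actual code.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Photo:
    path: str
    taken: datetime
    lat: float
    lon: float
    weather: str = ""
    place: str = ""

def fetch_weather(lat: float, lon: float, when: datetime) -> str:
    # Hypothetical: would scrape or query an archive of weather observations.
    return "sunny"

def reverse_geocode(lat: float, lon: float) -> str:
    # Hypothetical: would map coordinates to a human-readable place name.
    return "Singapore"

def enrich(photos: list[Photo]) -> None:
    # Add the "where and when" context to each photo's own metadata.
    for p in photos:
        p.weather = fetch_weather(p.lat, p.lon, p.taken)
        p.place = reverse_geocode(p.lat, p.lon)

def query(photos: list[Photo], place: str, weather: str) -> list[Photo]:
    # e.g. "sunny photos taken in Singapore"
    return [p for p in photos if p.place == place and p.weather == weather]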

Post-Bit: Embodied Video Contents on Tiny Stickies

Cute vision film of communicating and viewing video clips.

Natural Video Browsing

MMM2: Mobile Media Metadata for Media Sharing

Media Gallery TV — View and Shop your Photos on Interactive Digital Television

S. Thieme (CeWe Color AG), A. Scherp (Oldenburg R&D Institute for Computer Science Tools and Systems), M. Albrecht, S. Boll (University of Oldenburg)

Posters

Cooking Navi: Assistant for Daily Cooking in Kitchen

Our friends from NII: XXX Shin'ichi Satoh, plus some other authors. This is really cool. It is a bit cool because they are using video plus hyperlinks to the next step in a recipe to aid the cook in the kitchen. (Ideal for Stefano?) However, and this is the really cool bit, if you have 2 or more recipes that need to be ready at the same time, then the system merges the steps so you end up with everything ready at the same time. (You just need a wipeable screen mounted on the kitchen wall.) They have very little underlying logic (almost none to be honest), but it looks good and it works. They promised to send me/us the video film.
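The step-merging trick invites a guess at how it might work; below is a small Python sketch that schedules each recipe backwards from a shared finish time and interleaves the steps. This is my own reconstruction with invented example recipes, not Cooking Navi's actual algorithm, which the poster did not detail.

from typing import NamedTuple

class Step(NamedTuple):
    recipe: str
    action: str
    minutes: int

def merge(recipes: dict[str, list[tuple[str, int]]]) -> list[tuple[int, Step]]:
    # Shared finish time = duration of the longest recipe.
    finish = max(sum(m for _, m in steps) for steps in recipes.values())
    timeline = []
    for name, steps in recipes.items():
        # Start each recipe just late enough that it ends at `finish`.
        t = finish - sum(m for _, m in steps)
        for action, minutes in steps:
            timeline.append((t, Step(name, action, minutes)))
            t += minutes
    return sorted(timeline)  # interleave by start time

curry = [("chop vegetables", 10), ("simmer", 30), ("add roux", 5)]
rice = [("rinse rice", 5), ("cook rice", 25)]
for start, step in merge({"curry": curry, "rice": rice}):
    print(f"t+{start:2d} min: [{step.recipe}] {step.action} ({step.minutes} min)")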

Personal Media Sharing and Authoring on the Web

This seems relevant, but just as it was getting interesting (after all the content analysis) it sort of stopped.

Automatic Generating Detail-on-Demand Hypervideo Using MPEG-7 and SMIL

They don't cite us once (in fact, they don't cite anyone...).

Automatic Video Annotation using Ontologies Extended with Visual Information

I missed this, but saw XXXLaura chatting to the poster giver.

Wednesday

Panel: What is the state of our community?

The panel was a bit frustrating. It reminded me of the navel-gazing of the Hypertext conferences. Why won't industry listen to us? Where are our books? What is our core curriculum? Marc Davis made the sensible point that we need more than the technology to be multimedia, and have to include the point of view of the user and of those who have content/media. Do we need another (second-tier) conference? ICME has a 45% acceptance rate, of which 20% are oral presentations (as was Stefano's). Discussion of the reviewing process, double-blind reviewing, and that reviewers shouldn't just say that the author should cite their work. "This is one big psychoanalytical session... Inferiority complex." (Hmm, sounds familiar.)

Demos

LazyCut - Content-Aware Template-Based Video Authoring

Gave Xian-Sheng Hua Stefano's, Katya's and Joost's names so he can look at their work. They come from the content-analysis side, but say they have an "end-to-end system, which enables fast, flexible and personalized video authoring and sharing. LazyCut provides a semi-automatic video authoring and sharing system that significantly reduces users' efforts in video editing while preserving sufficient flexibility and personalization." Three of their 5 references are self-references, but nonetheless worth checking out for inclusion in literature lists: "Home Video Made Easy - Balancing Automation and Use Control", "AVE - Automated Home Video Editing", "Personal Media Authoring and Sharing on the Web".

Context-driven Smart Authoring of Multimedia Content with xSMART

"With our Context-aware Smart Multimedia Authoring Tool (xSMART) we developed a semi-automatic authoring tool that integrates the targeted user context into the different authoring steps and exploits this context to guide the author through the content authoring process." Sound familiar? This is Susanne Boll's work - she keeps track of our work, but seems to have missed Katya's work (maybe the deadlines crossed?).

Media Processing Workflow Design and Execution with ARIA

This sounds more Passepartout-like: "an Intelligent Stage to allow performers to have real-time control of the stage, by detecting and classifying the movement of the performers and responding by changing environmental elements to achieve spontaneous lighting and sound effects."

Thursday

Friday

To-Do List for when I get home