Multimedia 2004
10-16 October 2004, Columbia University, New York, USA
Author: Lynda
CWI participants: Frank, Stefano, Dick Bulterman (SEN5), Kees Blom (SEN5) and Patrick Schmitz (Ludicrum Enterprises*)
Number of participants: 400
I must confess I was very pleasantly surprised. There was a definite presence of people interested in media and (people-oriented) communication and also many interested in art aspects. "Knowledge-based multimedia" (my term, not theirs) is here to stay.
Groups that made an impression on me were Marc Davis (who managed to present his stuff four times - tutorial, Brave New Topics session, panel and video session); Hari Sundaram, David Birchfield and Preetha Appan (doing her masters) at ASU (Arizona State University); Chitra Dorai (IBM) and Svetha Venkatesh. These are all old buddies of Frank's and are still doing good work.
I am not completely happy about this report. It seems to contain too much "this is what I think" and not enough "this is what I saw". Perhaps I should have left out more of the stuff I attended that isn't really relevant, but on the other hand, it is interesting to get a feel for the breadth of the conference. Comments and additions welcome. Stefano and Frank - please add bits where you think there are useful things to say and put in disagreements where appropriate. For the brave, my raw notes are available at ~lynda/Trip-reports/acmmm-notes.txt
Frank gave his tutorial with Marc and Chitra. Everyone was very enthusiastic, including Patrick who also attended. Marc, Patrick, Frank and I had dinner together. That was useful to get Marc and Patrick hooked up. (I was just flying in.)
Patrick and I hooked up with his brother and met Haruko, who has a photo studio on (almost) downtown Broadway. I tried to capture it on the video camera, but am pretty sure I didn't succeed - the lens wasn't wide-angle enough.
I sat through the keynote by Gordon Bell. Apparently I should know who Gordon Bell is - he developed the VAX (which I at least remember!). Patrick worked near him when he was at Microsoft in California. I'm afraid I didn't take to all the graphs of increasing CPU speeds, network connections, memory etc. He did, however, mention the 1945 Vannevar Bush article as compulsory reading. He also had a nice list of issues for the community to work on: creating systems which handle the content of pictures, audio and video; capturing audio accurately and easily; segmenting video into useful clips (e.g. all the shots involving a particular character) rather than "mechanically detected" scenes. I'm not sure that he is aware of Marc's work from 10 years ago - although he was talking about machine processing of the video, not manual annotation.
The installation was a CAVE which allowed users to interact with different objects in the space. There was a sort of narrative context but I didn't get how the narrative combined with the user's interaction in the space.
Museums do experience design. They have to compete with other forms of entertainment. Her presentation didn't mirror her paper but complemented it. I videoed it; it gets a far larger proportion of tape time than it perhaps deserves, but there is some fun visual stuff. (I think minutes 4-29 on the tape.) A definite case of edutainment.
Creating artificial ecosystems/physics in virtual reality using a games engine. Section 5 is titled "Knowledge Engineering as Authoring" - not sure whether this would be of any use to Katya for SaMPLE.
Leverage spatio-temporal context to gather as much metadata as possible at the point of capture. For example, with GPS you know where you are; by polling Bluetooth presence in the room you know who is in the room. In general I liked this part of the message of the paper. My question, which I didn't manage to ask properly, is: given all the potential sources of information, how do you know which are relevant enough to go after?
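The capture-time idea can be sketched in a few lines. This is purely illustrative: the GPS fix, the device addresses and the device-to-person lookup table are all made-up stand-ins for real sensor reads, not anything from the paper.

```python
# Hypothetical sketch of capture-time metadata aggregation. The GPS fix and
# the Bluetooth scan results are hard-coded stand-ins for real sensor reads.
from datetime import datetime, timezone

# Assumed lookup table: Bluetooth device addresses -> known people (invented).
KNOWN_DEVICES = {
    "00:0a:95:9d:68:16": "Marc",
    "00:0a:95:9d:68:17": "Patrick",
}

def annotate_capture(timestamp, gps_fix, visible_devices):
    """Bundle spatio-temporal context into one metadata record at capture time."""
    return {
        "captured_at": timestamp.isoformat(),
        "location": gps_fix,                      # (lat, lon) from the GPS receiver
        "present": sorted(KNOWN_DEVICES[d]        # resolve scanned addresses to names
                          for d in visible_devices if d in KNOWN_DEVICES),
    }

record = annotate_capture(
    datetime(2004, 10, 13, 14, 30, tzinfo=timezone.utc),
    (40.8075, -73.9626),                          # roughly the Columbia campus
    ["00:0a:95:9d:68:16", "ff:ff:ff:ff:ff:ff"],   # one known, one unknown device
)
print(record["present"])  # ['Marc']
```

The sketch also makes my question concrete: the record only contains what the lookup table happens to know about, which is exactly the "which sources are relevant enough to go after?" problem.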
While Marc is a great presenter and I believe every word he says (while he is talking), I was left with the impression that he has a neat idea but that the hard problems are still to be overcome. Yes, one could aggregate information across time in the same space, and use a particular time for connecting pieces of info together. But I am unclear as to how he is going to do this.
Does he need to do manual ontology mapping for the pieces of information he is going to use, or does he want to use web services from some catalogue? In the latter case he needs to know how to find them and integrate them into his environment.
In some sense I feel he is taking the idea of collecting metadata at content capture time without really analysing what you might want or why. He has random examples of the kinds of things you might want to do - illustrated very nicely (as shown in the video - which you should watch).
I am being a bit harsh on Marc, since the idea of the Brave New Topics track is to present new fields to explore, not to report completed research work. (But he did get 8 pages in the proceedings...)
I'm not completely sure that I get this paper. (It is hard to follow Marc's dazzling presentation.) But it was a nice follow-on to Marc's work. You can stitch resources together to find pictures taken at a certain location and at a certain time using tables of sunrise/sunset times and, hey presto, you can find pictures of sunsets. I feel they are coming up with cute queries and then manually doing the "ontology joins" to get the right connections among information sources and metadata to find the stuff they want. They are not dynamically joining stuff to do what the user wants - but their examples are a good start. He also has a JCDL paper. [[**LYNDA TO FIND REF**]]
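The sunset example above amounts to a manual join between photo metadata and an external table, which can be made concrete with a toy sketch. All the data here is invented for illustration (the sunset time, the photo records, the 30-minute window); it only shows the shape of the join, not their system.

```python
# Toy version of the manually constructed "ontology join" described above:
# photos carry (place, time) metadata, a separate table gives sunset times
# per place, and joining the two yields candidate sunset pictures.
from datetime import datetime, timedelta

sunset_times = {  # place -> local sunset on 2004-10-13 (invented value)
    "New York": datetime(2004, 10, 13, 18, 19),
}

photos = [
    {"id": 1, "place": "New York", "taken": datetime(2004, 10, 13, 18, 25)},
    {"id": 2, "place": "New York", "taken": datetime(2004, 10, 13, 12, 0)},
]

def sunset_photos(photos, sunset_times, window=timedelta(minutes=30)):
    """Return photos taken within `window` of local sunset at their place."""
    return [p for p in photos
            if p["place"] in sunset_times
            and abs(p["taken"] - sunset_times[p["place"]]) <= window]

print([p["id"] for p in sunset_photos(photos, sunset_times)])  # [1]
```

Note that the "join key" (place plus time window) was chosen by hand for this one query, which is exactly my complaint: nothing here generalises dynamically to whatever the user wants next.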
This is not a very good new-research conference paper, but it is a nice description of the problems with semantic annotation of media. A very good leesklub (reading club) paper. (A nice summary of many of the things Frank has been telling us for years...) It seems to be doing "semantics scraping" from manually annotated images. If you are going to manually annotate, then use Guus's system. Maybe I'm missing something. [[Joost - do you have any comments about the paper?]] [[Lynda to send paper to Guus and FrankvH]]
One of my two regrets was that I didn't make enough time to see more of the art exhibits. These are the three I did see:
You need to sit still and watch the picture build up. I enjoyed the experience of forcing myself to sit and relax and watch. I wonder whether there was (or should have been) a real narrative, and was a bit frustrated that there was no obvious "having arrived at the end" of the piece. But perhaps that was part of the point...
A computer screen shows an image and the user gives (4 or 5 dimensional) input (position on mouse mat, pressure and angle of pen movement) to explore the different layers beneath the topmost image. My only feeling was to try to find out what the different layers held and how this related to my mouse manipulation - but I wasn't emotionally involved in any way. (This could say more about me of course :-) .)
Frank explained this to me on video - so see the video. (Textual summary - the computer responds to your whistling.) There are some interesting observations about whistling as a (human) language in the paper, but this doesn't seem to be carried over into the installation.
This is weird stuff! The author spends his life wearing a video camera and recording everything he sees. Unbelievably, his wife gave birth sometime last year. http://www.eyetap.org/ http://wearcam.org/ His system can also do real-time substitution of images in the real world by computer-generated images in his eye. His piece is mainly political, but the consequences and desirability of all this (drawing laser images on your retina - see cool glasses in figure 5 - for example) are fascinating. (It all reminds me of the guy in Snow Crash who physically lived in the driving seat of a truck but viewed the road through TV screens in the virtual world he perceived.) In some sense he is doing no more than the augmented reality people (for example helping surgeons with complex surgery by, for example, magnifying the images they see), but he is really into the political implications of recording those who record us. (Frank's comment was that these cameras were getting to the guy's sanity after all these years...)
http://Swarmart.com http://Swarm-Design.org All much more down to earth and good solid swarm design - making large groups of objects behave in swarm-like ways. It is very pretty, but I can't help thinking it belongs in SIGGRAPH.
Seemed to me to be a really neat paint box (sumi = ink, nagashi = floating) where you can make great watery pictures with your mouse where the system responds to pressure as well as movement. Patrick was totally underwhelmed and felt it was part of the currently available options in commercial tools.
S. Boll (University of Oldenburg)
S. Venkatesh (Curtin University)
T.-S. Chua (National University of Singapore)
R. Lienhart (University of Augsburg)
D. Bulterman (CWI)
M. Davis (University of California at Berkeley)
R. Jain (Georgia Institute of Technology)
I'm not sure what to say about the panel as a whole. I have a long list of factlets to do with the metadata question - add it by hand or do feature extraction. I wasn't impressed by Tat-Seng Chua's talk. I liked Rainer Lienhart's and Dick's talks - but I can't quite give you the essence of Dick's talk; it was based on his IEEE-MM paper "Moratorium on Metadata". Rainer said good stuff about context. Context isn't in the media you capture, it comes from "the outside". He was sceptical about involving the user in the metadata capture process because we have no time for it. (But I disagree, given the amount of effort media archives put into classifying their images/videos.) Marc mentioned the sensory gap as discussed in the Smeulders/Jain article of 2000 - which I guess I should at least look up. Ramesh talked about the field's problem of wanting to solve "X" (where X is, for example, understanding video content), then a grad student solves problem X'''''' (where each ' is a simplification step) and then the paper's claim is that X has been solved. I'm not sure whether he was pointing the finger at the feature analysis people, the metadata annotation people or everyone in general...
These are all on the official video page and definitely worth a quick watch. Text included for easier indexing and retrieval :-) For those that hadn't realised, Frank was the chair for this track and he and Katharina put a lot of hard work into this and the ACM MM video web site. [[NEED URL!]]
Desk manipulation of "pieces of paper". I wasn't terribly convinced. Is this MM (you could do everything you want with A4 pages) or a not very inspiring UI paper? The cute idea was using 2 projectors. The first projector projects a hi-res image of the central part of the desk and a second projector projects a lower-res whole desktop. (Two rectangles, one enclosed within the other.) The problem is in getting the edges right so your "pieces of paper" don't go fuzzy as they cross the boundary. The ergonomics problem is that you look down all the time, so I wonder about extended office use (unless you make the desktop not flat). My main question is - why restrict the interface to pieces of paper? Just use a "normal" desktop and manipulate windows and tools as normal.
This was very well done. Not only a nice example of the use of multimedia for a game for learning about the university, but a very good example of how to make a video of your work. I would have given them the video prize...
I was a bit weary of Marc's work by the time we got here. It was, however, a very professionally created video. An excellent piece of marketing. The idea remains cute - but it remains a cute idea and the real work behind it is difficult to see. (Mind you, if we could generate presentations like this :-) .)
I am a big fan of Alan Smeaton and have followed his IR work at a distance since my early hypertext days. However, the contrast with Marc's video was stark. This is how _not_ to read a cue card and how we Europeans can't compete with the presentations of the US (although Susanne's video was an excellent exception to my rule). I believe their work is interesting, but reckon that reading their original TRECVID paper would be a better way of understanding it. Still - it was nice to have them at the conference.
My understanding is that this is a system for setting up your multimedia meeting room to be able to give (complicated) presentations remotely. For example, to switch screens on and off, turn up volumes etc. Not really my cup of tea.
I was in the wrong mindset while watching this - I was looking for scientific content. But it was a video about an art installation. It is a very good video about an art installation. The video won the best video prize - which I thought was a bit of a shame since I prefer content over form, but they did have some good art content, just not typical ACM MM.
A new idea for a session - participants were invited to say their own piece. I think a few worked: useful ideas shared by "senior" people in the field (I hope mine was perceived to be in this category), ideas just thrown out for us to think about and give feedback on, and yet another advert for someone's PhD work (the least useful). I would do it again... XXXXX Someone came up from rii after my 1 minute, so that was worth my time.
Frank and I listened to Stefano's session. He presented very well (and some of it is on video - but I had forgotten to start recording at the beginning of his talk - ouch!), but his life has been shortened considerably by omitting to mention the name of the person who did related work on AUTEUR (having mentioned Marc Davis, Michael Mateas and Glorianna Davenport)... The questions were good too.
An interesting side topic is my first (conscious) use of RFID tags. (See SIGCHI-NL 2004 trip report for a political tirade about them.) We were wearing them so we could be identified when talking into the plenary microphone (the ID on the card was hooked up to a database of attendees). I saw it working for the main speakers, but didn't notice it working for anyone else. A bit of a shame really.
Susanne Boll was asking Patrick animatedly about Ludicrum Enterprises and he was explaining the goal of the company to bring irony and sarcasm to the Web. If it weren't for yours truly spluttering in her wine, she might have bought it...