The core of these projects is a "database" of images and metadata about those images from the Rijksmuseum. I propose that we limit our media to exactly this set, although I would not object to scanning additional images if need be. In particular, I am proposing that we not use video: none is available now, producing any would take too much effort, and above all it would be very difficult to match the quality of the images scanned by the museum. On the other hand, I have no objections at all to extending the metadata about the image set.
When I say "database" I intend no commitment about how the data is stored - it could be stored in a file, in an object store, or in a relational database. Whatever the storage is, it is below the level of abstraction of the project. We should probably use JDBC to insulate the project from the details of the storage system.
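To make the insulation concrete, here is a minimal sketch in Java of how the rest of the project might read the metadata through JDBC. The connection URL, table name (images) and column names (id, title, artist, year) are placeholders I made up for illustration, not the museum's real schema.

    import java.sql.*;
    import java.util.*;

    // Reads image metadata through JDBC so the rest of the project never sees
    // whether the storage is a file, an object store, or a relational database.
    public class ImageStore {
        private final String url;  // whatever JDBC URL the chosen storage provides

        public ImageStore(String jdbcUrl) { this.url = jdbcUrl; }

        // Returns one row of metadata per image, keyed by column name.
        // Table and column names here are illustrative placeholders.
        public List<Map<String, String>> loadAll() throws SQLException {
            List<Map<String, String>> rows = new ArrayList<>();
            try (Connection c = DriverManager.getConnection(url);
                 Statement s = c.createStatement();
                 ResultSet rs = s.executeQuery("SELECT id, title, artist, year FROM images")) {
                ResultSetMetaData md = rs.getMetaData();
                while (rs.next()) {
                    Map<String, String> row = new HashMap<>();
                    for (int i = 1; i <= md.getColumnCount(); i++) {
                        row.put(md.getColumnName(i), rs.getString(i));
                    }
                    rows.add(row);
                }
            }
            return rows;
        }
    }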
The overall goal is a system that automatically produces high-quality adaptable multimedia/hypermedia presentations. The domain will be the artworks collected in the Rijksmuseum.
project components:
Slideshow 1: Apply the associative chaining algorithm from Frames to the database. The output is a sequence of images selected from the database. The initial step here is to re-implement the Frames algorithms as simply and cleanly as possible. Note that we may need to be careful to avoid infringing on CSIRO intellectual property.
This step is mostly intended to enable the steps below, but it has two potentially immediate outputs. First, it will serve as a replication (hopefully positive) of the Frames work. (There is a long tradition in the natural sciences of validating research results by replication. This tradition is not often found in CS.) Second, it may lead to some interesting characterizations of the image metadata with respect to the Frames approach. What happens when one tries to apply Frames to a schema that was not designed with Frames in mind?
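To fix ideas about what a simple and clean re-implementation might look like, here is a rough Java sketch of associative chaining as I understand it: candidates are scored by a weighted count of metadata attributes whose (opaque) values equal those of the current image, and the chain is extended greedily. This is only my guess at the shape of the algorithm, not the Frames code itself; the attribute names and weights are invented.

    import java.util.*;

    // Rough sketch of associative chaining: score candidates by weighted
    // equality of metadata tokens against the current image, extend greedily.
    public class Chainer {
        // weights per metadata attribute, e.g. "artist" -> 2.0, "period" -> 1.0 (invented)
        private final Map<String, Double> weights;

        public Chainer(Map<String, Double> weights) { this.weights = weights; }

        // Metadata values are treated as opaque tokens, compared only for equality.
        double score(Map<String, String> a, Map<String, String> b) {
            double s = 0.0;
            for (Map.Entry<String, Double> w : weights.entrySet()) {
                String va = a.get(w.getKey());
                if (va != null && va.equals(b.get(w.getKey()))) s += w.getValue();
            }
            return s;
        }

        // Greedily extend a chain from a seed image, never revisiting an image.
        public List<Map<String, String>> chain(Map<String, String> seed,
                                               List<Map<String, String>> pool,
                                               int length) {
            List<Map<String, String>> result = new ArrayList<>();
            Set<Map<String, String>> used = new HashSet<>();
            Map<String, String> current = seed;
            result.add(current);
            used.add(current);
            for (int i = 1; i < length; i++) {
                Map<String, String> best = null;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (Map<String, String> cand : pool) {
                    if (used.contains(cand)) continue;
                    double s = score(current, cand);
                    if (s > bestScore) { bestScore = s; best = cand; }
                }
                if (best == null) break;
                result.add(best);
                used.add(best);
                current = best;
            }
            return result;
        }
    }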
Presentation 1: Given a set of images selected by the chaining algorithm, find an effective presentation for them. This work builds on the work on overflow and compensation strategies described in the HT 2000 paper. It should lead to new results in adaptability for different output devices. For example, on a device with a small screen (a palm-size device) one must use temporal or navigational linking, because there is no room for more than one image.
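As a very rough illustration of what device-dependent strategy selection might look like in code (the strategy names and the one-image threshold are mine, not those of the HT 2000 paper):

    // Hedged sketch: pick a presentation strategy from the screen's capacity.
    // Strategy names and the threshold are illustrative only.
    enum Strategy { SPATIAL_LAYOUT, TEMPORAL_SEQUENCE, NAVIGATIONAL_LINKS }

    class StrategyChooser {
        // If the screen cannot hold more than one image, fall back from a spatial
        // layout to a timed sequence or to user-followed links.
        static Strategy choose(int screenWidthPx, int imageWidthPx, boolean userDrivenBrowsing) {
            boolean roomForSeveral = screenWidthPx >= 2 * imageWidthPx;
            if (roomForSeveral) return Strategy.SPATIAL_LAYOUT;
            return userDrivenBrowsing ? Strategy.NAVIGATIONAL_LINKS : Strategy.TEMPORAL_SEQUENCE;
        }
    }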
There is potential for collaboration here if we can find a party interested in image processing algorithms for converting image data from large format, large gamut pictures (full size jpeg) to small format black and white images (suitable for a palm-size device). How little of an image must one show for it to remain recognizable? This is particularly interesting in the case of a dynamic in-museum guidebook, where all one needs to do is show which painting is intended. It is not necessary to attempt to reproduce the full glory of e.g. "Jeremiah's lament"; it suffices that the reader know which painting is being talked about. Even a "caricature" of the painting suffices, and indeed may be better than an attempt at realism.
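For what it is worth, the plumbing for the simplest version of this conversion (scale down, drop to grey-scale) is a few lines of standard Java; anything deserving the name "caricature" would be the collaborator's contribution.

    import java.awt.Graphics2D;
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import javax.imageio.ImageIO;

    // Scales a full-size JPEG down and reduces it to grey-scale for a small screen.
    public class Thumbnailer {
        public static BufferedImage toSmallGray(File jpeg, int targetWidth) throws IOException {
            BufferedImage src = ImageIO.read(jpeg);
            int targetHeight = src.getHeight() * targetWidth / src.getWidth();
            BufferedImage dst = new BufferedImage(targetWidth, targetHeight,
                                                  BufferedImage.TYPE_BYTE_GRAY);
            Graphics2D g = dst.createGraphics();
            // drawing onto a grey-scale image both rescales and discards colour
            g.drawImage(src, 0, 0, targetWidth, targetHeight, null);
            g.dispose();
            return dst;
        }
    }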
There is some room at this stage for use of rhetorical relations in presentation generation (but more comes later). For example, it might prove useful to try to convey the overall size of the presentation (the number of images in the set) as one way to help the reader stay oriented. But for the most part, at this stage there is little or no rhetorical knowledge available that would motivate further work.
There is also a potential thread here on graphical user interfaces. Given a limited screen size, if you have to show both an image and some text, how can you best share the screen? What is the difference between a presentation intended to be viewed in the museum, as a kind of dynamic guide-book, and one intended to be viewed at home?
user model and slide generation: Two threads here:
generation of framing material: The claim here is that at least some kinds of automatically generated sequences will be more easily comprehended if the basic slides (and accompanying texts) are augmented by automatically generated material (which could be text or graphics) that presents at least some simple rhetorical relations about that material. The clear and obvious example is a categorically-based presentation, that is, one where the weightings are set so as to cause the chaining algorithm to select all the material associated with one topic before moving on to the next. The hypothesis is that such presentations will be more easily understood if the system explicitly tells the user something about the structure of the resulting presentation, e.g. that for this particular presentation there are three topics within the category, and points out the boundaries from one topic to the next, i.e. "Here are the works of the students of Rembrandt. First, the works of van Verf... now the works of Piet de Schilder... finally the works of painter X."
We should be able to automatically generate texts (and perhaps graphical presentations) that show the order and size of sequences, and at least the part/whole relation. It is far from clear whether the metadata is rich enough to support other kinds of rhetorical relations (e.g. causation, interpretation).
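A minimal sketch of the kind of framing text I mean, assuming the sequence has already been segmented into labeled topics (the segmentation itself is not shown here):

    import java.util.List;
    import java.util.Map;

    // Generates framing text announcing how many topics a categorical
    // presentation contains and marking the boundary at each new topic.
    public class Framer {
        // topics maps a topic label (e.g. a painter's name) to its images, in order.
        public static String frame(String category, Map<String, List<String>> topics) {
            StringBuilder sb = new StringBuilder();
            sb.append("Here are ").append(category).append(", in ")
              .append(topics.size()).append(" groups. ");
            int i = 0;
            for (Map.Entry<String, List<String>> t : topics.entrySet()) {
                String opener = (i == 0) ? "First, "
                              : (i == topics.size() - 1) ? "Finally, " : "Now, ";
                sb.append(opener).append("the works of ").append(t.getKey())
                  .append(" (").append(t.getValue().size()).append(" images). ");
                i++;
            }
            return sb.toString();
        }
    }

Called with the category "the works of the students of Rembrandt" and topics van Verf, Piet de Schilder and painter X, this produces roughly the framing sentence quoted above.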
Note that this is the smallest possible step forward in the use of rhetoric that I can imagine. One can certainly imagine far richer rhetorical presentations (e.g. to convince the viewer that Jacob van Verf was the most unappreciated painter of the 16th century), but these seem more within the domain of AI.
explicit manipulation of chaining data structure: The chaining algorithm produces, as an internal result, a list of its decisions and the alternate choices it might have made. Would it be useful to present this data structure to the user, via a suitable graphical user interface, as something to be manipulated? This allows the user to explore the query space in a different way: what kinds of sequences would I have gotten if I had changed the value of this weight?
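What I have in mind for that data structure is roughly the following (the field names are mine, for illustration only):

    import java.util.List;

    // Sketch of the decision trace the chaining algorithm could expose for
    // direct manipulation in a GUI.
    public class ChainTrace {
        public static class Alternative {
            public final String imageId;
            public final double score;
            public Alternative(String imageId, double score) {
                this.imageId = imageId;
                this.score = score;
            }
        }
        public static class Step {
            public final String chosenImageId;         // the image actually selected
            public final List<Alternative> runnersUp;  // the choices not taken, with scores
            public final String decisiveAttribute;     // which weighted attribute tipped the decision
            public Step(String chosen, List<Alternative> runnersUp, String decisiveAttribute) {
                this.chosenImageId = chosen;
                this.runnersUp = runnersUp;
                this.decisiveAttribute = decisiveAttribute;
            }
        }
        public final List<Step> steps;                 // one entry per link in the chain
        public ChainTrace(List<Step> steps) { this.steps = steps; }
    }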
extensions to the chaining algorithm: The current algorithm makes comparisons only locally (on a clip-by-clip basis), treating the metadata as opaque tokens which may only be compared for equality. Some obvious extensions are:
Missing data: if a query misses, or an associational chain gets too low, is it sometimes helpful to generate an explanation of this for the user? What I am trying to get at is that sometimes the failure of a query to produce any results is itself an interesting thing to show, and it may require explanation. For example, suppose there is a ten-year gap in the record of paintings by painter X. Is there anything interesting to say about this gap?
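As a small illustration of detecting such a gap so the generator can decide whether it deserves a sentence (the ten-year threshold comes only from the example above):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Finds gaps of at least minGap years in the dated works of one painter and
    // phrases each gap as a candidate explanation for the user.
    public class GapFinder {
        public static List<String> findGaps(String painter, List<Integer> yearsOfWorks, int minGap) {
            List<Integer> years = new ArrayList<>(yearsOfWorks);
            Collections.sort(years);
            List<String> notes = new ArrayList<>();
            for (int i = 1; i < years.size(); i++) {
                if (years.get(i) - years.get(i - 1) >= minGap) {
                    notes.add("There are no works by " + painter + " in the collection between "
                              + years.get(i - 1) + " and " + years.get(i) + ".");
                }
            }
            return notes;
        }
    }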
It might also be interesting to analyze the texts that the museum staff has already entered, to see what (if any) rhetorical relations are present, and at what granularities. For example, it will be fairly hard to reuse text paragraphs in multiple contexts if the paragraphs have any single rhetorical (or, for that matter, pronominal) structure "built in".