October 24-26, 2001
CIDE 2001 (http://www.irit.fr/CIDE2001) is an easy-to-follow, single-track conference. The participants list reports 86 attendees. This was the 4th edition of a conference on electronic documents.
Even if most papers and presentations are in French, English is also accepted. For instance, Lambert Schomaker presented a tutorial in English.
Under the theme "Cognitive methods and techniques", the conference offered tutorials, invited talks, paper presentations and posters/demos. It is noticeable that the 3 short tutorials (one hour each) were integrated into the conference, at no extra cost.
After the conference, my feeling is that the electronic document community is multi-disciplinary, as in HCI and CSCW, with experimental psychology and artificial intelligence as its main backgrounds. Judging by the number of papers presented by PhD candidates, the field also seems to be renewing itself.
The trend is more or less to treat each modality (or medium) on its own. Most of the papers can be categorized along the following modalities: drawings (diagrams, conceptual maps), images (video, photo), text (paper and screen), and voice. Only one paper explicitly treated multimedia documents. A few papers were more difficult to classify: interaction with geographical databases, web-based training / library portals and, not least, my own paper on collaborative authoring.
Toulouse has a well-connected, human-scale airport. It is a warm city, famous for its pink hue, due to the small bricks used in its buildings, and also well known for the gastronomy of the French south-west. The conference was organised on the south campus, Paul Sabatier, at the host institution IRIT (the Toulouse counterpart of the IMAG research institute in Grenoble).
The organizers had wisely chosen a restaurant school for the conference lunches, which made for efficient socializing. The dinner on a boat cruising the Canal du Midi was also appreciated.
Lambert Schomaker, director of the AI lab at Groningen University, described existing prototypes of image search tools (QBIC from IBM, VisualSEEk, FourEyes from MIT). The main methods are:
full-image, template-based retrieval
feature-based
layout-based
grid-based
The last method is based on semi-annotation of small image parts: the image is segmented into blocks. When a block is annotated (for instance, "this is grass"), the computer propagates the annotation to all the blocks with a similar texture.
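The propagation step can be sketched as follows. The block size, the mean/variance texture feature and the distance threshold are all my own illustrative choices, not details from the talk:

```python
import statistics

def block_features(image, block, size=4):
    """Mean/stdev of pixel intensities in one square block -- a toy texture feature."""
    r, c = block
    pixels = [image[i][j]
              for i in range(r * size, (r + 1) * size)
              for j in range(c * size, (c + 1) * size)]
    return (statistics.mean(pixels), statistics.pstdev(pixels))

def propagate_annotation(image, labeled_block, label, threshold=5.0, size=4):
    """Copy a manual annotation to every block whose texture feature is close."""
    rows, cols = len(image) // size, len(image[0]) // size
    ref = block_features(image, labeled_block, size)
    labels = {}
    for r in range(rows):
        for c in range(cols):
            feat = block_features(image, (r, c), size)
            dist = ((feat[0] - ref[0]) ** 2 + (feat[1] - ref[1]) ** 2) ** 0.5
            if dist <= threshold:
                labels[(r, c)] = label
    return labels

# Synthetic 8 x 16 image: left half a uniform "grass" texture, right half brighter.
image = [[10] * 8 + [200] * 8 for _ in range(8)]
print(propagate_annotation(image, (0, 0), "grass"))
```

A real system would of course use richer texture descriptors (co-occurrence matrices, filter banks) instead of mean and variance.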
A variation is to ask the user to extract canonical objects from an image by drawing their boundaries (manual outlining, which can then serve for retrieving images with pattern matching). The user-defined outlines are used to train classifier algorithms (neural-network-based).
To measure the utility of image query interfaces, he proposes a cost/benefit method based on maximizing the value of the system response as a function of the effort spent on input.
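One possible reading of that cost/benefit idea, with entirely invented numbers (the talk gave no concrete formula): score each query style by the value of the response minus the input effort, and keep the style with the best net benefit.

```python
def net_benefit(value, effort):
    """Toy cost/benefit score: what the answer is worth minus what it cost to ask."""
    return value - effort

# Hypothetical query styles with made-up value/effort estimates.
queries = [
    {"name": "keyword only",        "value": 3.0, "effort": 1.0},
    {"name": "keyword + sketch",    "value": 6.0, "effort": 2.5},
    {"name": "full manual outline", "value": 7.0, "effort": 6.0},
]
best = max(queries, key=lambda q: net_benefit(q["value"], q["effort"]))
print(best["name"])  # → keyword + sketch
```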
For his own research he uses the now-famous Rijksmuseum database. There is a related website: http://www.openmind.org
A session with two papers on accessing information in geographical databases. In the first paper, the goal is to define a language for expressing requests through a web interface. The requests are natural-language sentences, simplified by the use of deictic expressions since the system also displays graphical samples.
In the second paper, textual requests are generated from the drawing of simple diagrams. Diagram features (such as inclusion of symbols, intersection of lines, etc.) carry special meaning. The goal is a bi-modal dialogue: simple diagrams to compose requests, and a natural-language response from the system to disambiguate the request if needed.
Then came a third paper, about cross-language information retrieval, but I did not understand the presentation (which was paused for a couple of minutes due to a computer problem).
The fourth paper was a general discussion of the information content of images and its application to building an interface. It was not very clear but, as I understand it, the goal was to allow naive users (not expert in image processing) to create image processing scripts through a visual interface. The presenter used a lot of meaningless expressions such as "le type d'intention métier est d'extraire des symboles" ("the business-intention type is to extract symbols"), "la galaxie qui tourne autour des outils XML" ("the galaxy revolving around XML tools") or "il va falloir lancer des expertises" ("we will have to commission expert assessments"). The user task was not obvious.
Presentation of techniques to record and measure the reading activity in real time. The most interesting part was the display of eye movements (saccades), recorded with a fast camera, overlaid on the document being read. It used an interesting symbolism: lines for eye movements, and circles with a radius proportional to the duration for pauses. I learned that when reading, the eye focuses on a 3-4 letter area, and it bounces off the margins.
I asked whether reading time is proportional to font size, but was told that other studies address this. The reading time (the time taken to understand a text) also differs from the time spent just moving the eyes over the text; I guess it is measured with other experimental protocols. Jacques Virbel pointed me to Hartley's work on that topic. His bibliography can be found at http://www.keele.ac.uk/depts/ps/Curpubswww1.htm
Just an idea: what if we displayed the text as a 3-4 character visible area moving over an otherwise dark screen page? Would we still be able to understand the text?
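A quick sketch of that idea as a purely textual simulation (a real experiment would need a graphical display; the window width and masking character are arbitrary choices):

```python
def moving_window(text, width=4):
    """Yield the text masked so that only `width` characters are visible at a time."""
    for start in range(0, len(text) - width + 1):
        yield ("·" * start
               + text[start:start + width]
               + "·" * (len(text) - start - width))

# Each printed frame shows the visible area sliding one character to the right.
for frame in moving_window("READING", width=4):
    print(frame)
```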
First, a presentation of an electronic book. The prototype is still rough (only a 320 x 200 screen resolution, an external micro-controller). Its main features are:
no microprocessor (just a low-cost micro-controller)
not a real computer, but it mimics a book
bi-stable screen that needs power only to change its state: the text remains displayed even when the device is switched off
the screen can be photocopied (which is not the case with classic LCD screens)
low energy consumption (6000 pages can be displayed with a 9V battery)
bitmap fonts
2 lateral tactile strips for interacting with the eBook (not finished at the time; only a few buttons on the left side)
Jean Caelen was skeptical, as this eBook does not support handwritten annotations the way paper books do, thus minimizing its advantages over paper.
This device should be marketed by a startup company (with a new 640 x 480 screen). LORIA (Nancy) and the LIR (Lyon) are interested.
The second paper, presented by Claudie Faure (who gave me a few guilder coins at the end of the conference), was about a controlled experiment conducted before the design of an enhanced structured-drawing application. The goal is to find algorithms that turn rough diagram drafts into clean drawings. An interesting result concerns the influence of the drawing surface's dimensions on the general shape of the drawings (users were asked to redraw a diagram in a very tall or a very thin screen area).
This talk made interesting references to Gestalt theory, concerning the influence of spatial frequency on the grouping of items in a drawing, and to the semiotics of diagrams. The main question asked was whether such a general semiotics exists (outside of a pragmatic context), or whether it is domain-dependent (for instance, in architectural drawings, architects see buildings in shapes and have special rules to interpret their meaning, which may differ from the interpretation of Petri-net diagrams in software engineering). A very often cited book on the subject is a 1967 book by Jacques Bertin; after a quick web search, it seems to have been translated into English as: Jacques Bertin. 1983. Semiology of Graphics: Diagrams, Networks, Maps. Trans. by W.J. Berg. Madison: University of Wisconsin Press.
The last article presented some methods to deduce text structures from their visible appearance.
I missed that session while preparing my own presentation with Christine Vanoirbeek.
A course about text filtering for generating summaries. The main methods are:
statistical -- most significant words, sentences, based on corpus analysis
linguistic -- analysis of text structures (intro, conclusion, etc.)
rhetorical -- application of theories such as RST
psycho-linguistic -- simulation of the human behaviour
These methods are used to fill in templates (or frames, in the AI style), which are then converted to text.
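A minimal sketch of the statistical method listed above, assuming a simple frequency-based sentence scoring (the stopword list and the scoring scheme are my own toy choices, not the tutorial's):

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "to", "and", "is", "in", "it", "that"}

def summarize(text, n_sentences=1):
    """Score each sentence by the corpus frequency of its content words,
    then keep the top-scoring sentences in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        content = [w for w in re.findall(r"[a-z']+", sentence.lower())
                   if w not in STOPWORDS]
        return sum(freq[w] for w in content) / max(len(content), 1)

    selected = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return " ".join(s for s in sentences if s in selected)

text = ("Documents can be summarized by extraction. "
        "Extraction keeps the most significant sentences. "
        "The weather was nice in Toulouse.")
print(summarize(text))  # → Extraction keeps the most significant sentences.
```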
Each poster was presented for 10 minutes in this special session. They were:
a web interface to access ethnographic recordings of rare languages. Using Java applets / JavaScript, the sound can be accessed by clicking on any text extract, or the text is read aloud with a karaoke-like display of the active sentence. I talked with the author, who told me he first tried to use SMIL. The system can be accessed online at http://lacito.archivage.vjf.cnrs.fr/index.html
Hidden Markov models to model texts and web sites; these can be used to build automatic indexes.
a biomedical library web portal (from Padua, Italy)
a learning application for biomedical training. An important feature for training is to provide areas where users can express themselves freely (chat, unconstrained text fields).
OCR technologies, algorithms and interfaces to handle millions of paper forms. It took 360 people one year to do the job. The system has since been exported to Hungary.
For Marc Gilloux, this type of application has a future even if more and more questionnaires can be put on the web. First, as costs fall, it will become more common to add questions to any commercial paper form (for instance, to survey consumers when they order something). Second, hundreds of tons of archives are still in paper form.
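The HMM poster gave no model details; as a reminder of the technique, here is a minimal forward algorithm over an invented two-state text model (the states, observations and probabilities are all made up for illustration):

```python
# Hypothetical model: hidden states are document parts, observations are word classes.
states = ["title", "body"]
start = {"title": 0.8, "body": 0.2}
trans = {"title": {"title": 0.3,  "body": 0.7},
         "body":  {"title": 0.05, "body": 0.95}}
emit = {"title": {"keyword": 0.7, "filler": 0.3},
        "body":  {"keyword": 0.2, "filler": 0.8}}

def forward(observations):
    """Forward algorithm: total probability of an observation sequence under the HMM."""
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][obs]
                 for s in states}
    return sum(alpha.values())

print(forward(["keyword", "filler", "filler"]))
```

For indexing, one would typically use the Viterbi variant instead, to recover the most likely state sequence rather than the total probability.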
Very general (boring?) presentation of the semantic web, characterized as the transition from machine-readable to machine-understandable.
I have just noted a URL: http://www.afia.polytechnique.fr/accueil/accueil.html
Too stressed by my forthcoming presentation, I missed the first paper.
I was questioned about versioning in my prototype, and at the end 5 people came to talk with me. One of them (a reviewer, I suspect) again suggested comparing my work with the commercial Acropolis tool from Xerox. The others had used or heard of collaborative authoring tools and wanted to share their experience.
The next paper was a controlled experiment comparing the efficiency of note taking between paper and computer hypertext. The main task was to prepare an argumentation on a given topic; a secondary task was to react to alarm signals (by pressing a button); and a third task was to verbalize the activity at the time of the alarm (taking notes, reading). The idea of the alarm task is to subtract the time of this activity from the total time spent on the task, to avoid inter-individual variations due to different levels of concentration (?). Someone asked whether this was not too complex a protocol, with many dependent variables (for instance, taking notes with a pen while reading on a screen or on paper is very different: in the screen case, the pen is shared with the mouse).
The last paper, a Brazilian presentation of a cultural CD-ROM, could have fit the COSIGN conference in Amsterdam, for those who were there.
See the section about poster oral presentations above.
Not a very clear tutorial: André Tricot chose to present a recent psychological experiment for which, obviously, there was no consensus on the analysis of the results.
The first paper proposed a notation to describe the visual architecture of a text and to link it with the text's structure. I guess it can be useful for extracting these structures from a visual representation of a text, and for agreeing on a vocabulary to discuss rhetorical devices.
The second paper described different ways to translate a written text into an oral form (with a synthesized voice). The author focused on rendering titles and lists, after previous work on typographical parameters such as font weight. For instance, he uses prosodic parameters (intonations), or he changes the text to introduce rhetorical devices that reveal the structure (such as "hello Dupond", which is read as "hello Dupond, I insist on Dupond"). Jean Caelen suggested using multiple voices (for instance, a male voice reading titles and a female voice reading paragraphs).
Then came a study of multimedia documents. The goal of this experiment was to understand the influence of animation when reading a document. In fact the study itself was not so multimedia: if I remember correctly, the author compared the effect of displaying text distributed inside a drawing (captions on the items, spatial juxtaposition) versus displaying text and drawing in separate places. The result is that the first version is better for understanding, but with many inter-individual variations. In that study the bimodality comes from the fact that text is treated as a symbolic representation, whereas the image is an analogical representation.
Finally, the last article compared hierarchical views (trees) with contextual maps (networks or graphs) for representing a domain (for instance, a course on social psychology). There is no difference for expert users, and novice users do better with hierarchies. He also studied how concepts are retrieved from memory depending on their representation (adding a third category: the list).
Forthcoming events:
October 2002: CIDE will merge with two other electronic document conferences (among which Electronic Publishing, I think) and will take place in Hammamet, Tunisia.
2003: a conference/workshop oriented toward young researchers.