author: Joost
date: 7-4-2005

This text explains the grammar rules used to present media items. The rules
use knowledge from two ontologies:

1) Media.rdfs, a schema for technical data about a media item (MIME type,
   width, height, etc.).
2) Modality.rdfs, an ontology that describes properties denoting
   uni-modalities. For example, the class "Static graphic images" is defined
   as everything that has the properties modality:nonLinguistic,
   modality:analogue, modality:nonArbitrary and modality:graphical.

Mapping.owl defines a mapping between Media.rdfs and Modality.rdfs. Thus, if
I know that the type of a media item is media:Image, I also know it has the
properties modality:nonLinguistic, modality:analogue and
modality:nonArbitrary.

The grammar rules use modality properties to decide how (multiple) media
items should be presented. For example, our media ontology defines Bitmap as
a subclass of Image, and Mapping.owl defines media:Image to be equivalent to
modality:'Static analogue graphics'. The class media:Image therefore has the
property modality:graphical. The grammar rules know to use hfo_atomic to
present media items which have the property modality:graphical, and to
present multiple graphical items using hfo_box.

Note that with similar rules for audio we can present any number of
arbitrary media items. However, every media item would then be presented as
either an audible or a graphical item. This is not desirable: we typically
want to present text differently from images. Therefore there is a rule
which recognizes text by its modality properties and presents it as an
hfo_text. Similar rules exist for image and speech. Basically, if there is
reason to present media items differently (not considering discourse, which
is assumed to be handled already), this should be expressed by rules that
differentiate on modality properties. Some of these rules are domain
dependent; for example, sometimes you want to differentiate between images
and diagrams, in other cases not.

Rules for combining media items work by assigning labels (non-terminals) to
media items. For example, we can define caption as a media item with the
properties modality:linguistic, modality:nonAnalogue, modality:nonArbitrary,
modality:static, modality:graphical and modality:label, and image in a
similar way. The rule captionizedimage knows how to present these media
items in combination, by placing the caption below the image (note that this
is a style decision). Finally, the input media items can be structured by
adding control structures like group and alternative.

A media item as it is currently represented in Prolog:

media(myid, [
    url:"http://www.cwi.nl/image.jpg",
    mime:"image/jpeg",
    type:"media:Bitmap",        % media ontology
    width:100, height:100,
    dc-title:"title"
])
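To make the type-to-property step concrete, below is a minimal sketch of how
the modality properties of such an item could be derived from its
media-ontology type through the mapping. The predicate names (maps_to/2,
subclass_of/2, has_modality/2) are hypothetical, not part of the actual
implementation; the facts simply restate the media:Image example above, with
types written as Prolog atoms.

:- use_module(library(lists)).   % member/2

% Hypothetical fact derived from Mapping.owl: media:Image is
% equivalent to modality:'Static analogue graphics'.
maps_to('media:Image', ['modality:nonLinguistic',
                        'modality:analogue',
                        'modality:nonArbitrary',
                        'modality:graphical']).

% Hypothetical subclass fact from Media.rdfs.
subclass_of('media:Bitmap', 'media:Image').

% has_modality(+Type, ?Property): Type has Property if Type, or one of
% its superclasses, maps onto a property list containing Property.
has_modality(Type, Property) :-
    maps_to(Type, Properties),
    member(Property, Properties).
has_modality(Type, Property) :-
    subclass_of(Type, Super),
    has_modality(Super, Property).

With these clauses, has_modality('media:Bitmap', 'modality:graphical')
succeeds, which is exactly the kind of test the image rule below relies on.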
%%% parse is the top level goal. It tries to rewrite the INPUT to the
%%% non-terminal hfo.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

parse(hfo,INPUT)

hfo --> multimodal | group | alternative   % | denotes or

group --> grp(MEDIA1..MEDIAN)              % 1..4 denotes 1 2 3 4
    return parse(hfo,MEDIA1..MEDIAN)       % start a (sub) parse

alternative --> alt(MEDIA1..MEDIAN)
    select(MEDIAX)                         % select one of MEDIA1..MEDIAN
    return parse(hfo,MEDIAX)               % start a (sub) parse

%%% User defined rules for text, image, captionizedimage, speech
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

graphical --> text | image | captionizedimage
audible   --> speech

captionizedimage --> caption image
    return hfo_ybox(captionizedimage)

text --> speech ***                        % *** denotes unimplemented
    convert(text,speech)                   % convert using the speech generator

speech --> MEDIA
    if modality:discourse,     % The type of MEDIA is defined in the media
                               % ontology. There is a mapping from the media
                               % ontology to the modality ontology, and the
                               % modality ontology defines modality
                               % properties. If MEDIA has all the specified
                               % modality properties the rule succeeds.
       modality:audible,
       modality:linguistic,
       modality:nonAnalogue,
       modality:nonArbitrary,
       modality:static
    return hfo_atomic(MEDIA)

image --> MEDIA
    if modality:nonLinguistic,
       modality:analogue,
       modality:nonArbitrary,
       modality:graphical
    return hfo_atomic(MEDIA)

text --> MEDIA
    if modality:discourse,
       modality:graphical,
       modality:linguistic,
       modality:nonAnalogue,
       modality:nonArbitrary,
       modality:static
    return hfo_text(MEDIA)

caption --> MEDIA
    if modality:linguistic,
       modality:nonAnalogue,
       modality:nonArbitrary,
       modality:static,
       modality:graphical,
       modality:label
    return hfo_text(MEDIA)

%%% Fall Back Rules
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%% Multimodal (composites of different types)

multimodal --> graphical [multimodal]      % [optional]
    return hfo_box(multimodal)

multimodal --> audible [multimodal]
    return hfo_tbox(multimodal)

%%% Composites of similar types %%%

% graphical is a sequence of atomicgraphical
graphical --> atomicgraphical [graphical]
    return hfo_box(graphical)

% audible is a sequence of atomicaudible
audible --> atomicaudible [audible]
    return hfo_tbox(audible)

%%% Terminals %%%

% MEDIA (terminal) has the property modality:graphical
atomicgraphical --> MEDIA
    if modality:graphical
    return hfo_atomic(MEDIA)

% MEDIA (terminal) has the property modality:audible
atomicaudible --> MEDIA
    if modality:audible
    return hfo_atomic(MEDIA)

% 2bdone: match MIME types with a CC/PP profile. Currently this is naively
% implemented in vicious.pl.
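For illustration, here is a minimal sketch of how one of the guarded rules
above (image) could be realized as a Prolog DCG over a list of media terms.
This is an assumed encoding, not the vicious.pl implementation; it reuses
the hypothetical has_modality/2 from the earlier sketch and assumes the
type attribute is an atom.

% Hypothetical DCG rule: consume one media term and succeed only if its
% type carries all the modality properties required for an image.
image(hfo_atomic(media(Id, Attributes))) -->
    [media(Id, Attributes)],
    { member(type:Type, Attributes),
      has_modality(Type, 'modality:nonLinguistic'),
      has_modality(Type, 'modality:analogue'),
      has_modality(Type, 'modality:nonArbitrary'),
      has_modality(Type, 'modality:graphical') }.

A query such as

    phrase(image(HFO), [media(myid, [type:'media:Bitmap', width:100, height:100])])

then binds HFO to hfo_atomic(media(myid, ...)), i.e. the hfo_atomic result
the grammar prescribes for a single graphical item.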