An architecture for Cuypers
Stefano Bocconi
I have had these ideas for some time now and I would really
like to implement them. The talk with V2 stimulated me to write them down
because I saw several similarities. Furthermore, we have to decide the role
that research directions like the Semantic Web and RDF knowledge representation
will have in our flagship system Cuypers (and give some work to Gaurav as well).
And let us not forget the upcoming collaboration on e-learning with the Telematica
institute.
The basic idea is a sort of hierarchical model where layers are more or less
well-defined and can be replaced as long as they provide the "layer functionality".
This is the basic idea of a lot of standards (like TCP/IP and the TCP/IP-inspired
OSI model from ISO). I think that the basic assumption we have to make in
adopting such a model is that you can proceed in an (almost entirely) top-down
manner from the content and the goal of the presentation to the presentation
output, which is an open issue that will be treated in the following. Having
made this assumption, and before exploring the layers, let's have a look
at the data input, i.e. what Cuypers can use to generate presentations.
Each layer maps the input from the previous layer to the layer's output using
explicit rules. Ideally those rules can be changed at each layer without
having to change the software structure (i.e. the code).
The comparison to the TCP/IP protocol stack is valid in the sense that every
layer can be provided by a different entity (a vendor in the marketplace) and
the architecture should still be able to generate a presentation.
The comparison is not valid in the sense that the layers presented here are
more on a conceptual level and do not need to correspond to modules in the
software implementation, whereas in the TCP/IP protocol stack there is no
need to make this distinction.
In the same philosophy, the mapping between a layer's input and its output
is specified as a rule-based transformation from the input domain to
the output domain. The implementation of such a transformation is not
important for the present architecture proposal.
Furthermore, since every layer is responsible for the
implementation of the mapping, every layer has the power not to implement
some of the directives dictated by the upper layer, or to implement them according
to a hierarchy defined by the layer itself.
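The layered, rule-driven pipeline described above could be sketched as follows. This is only a minimal illustration under assumptions: the layer names, the rule format (functions over a dictionary), and the example rules are all hypothetical, not a fixed design.

```python
# Minimal sketch of a rule-driven layer pipeline.
# Each layer transforms its input using explicit, replaceable rules,
# so the rules can change without changing the software structure.

class Layer:
    def __init__(self, name, rules):
        self.name = name
        self.rules = rules  # explicit rules, swappable per layer

    def transform(self, data):
        # Apply each rule in turn; a rule maps the layer's input
        # to (part of) the layer's output. A layer is free to
        # ignore or reprioritize directives from the layer above.
        for rule in self.rules:
            data = rule(data)
        return data

def run_pipeline(layers, data):
    # Top-down: from content and goal to presentation output.
    for layer in layers:
        data = layer.transform(data)
    return data

# Hypothetical rules for two of the layers described in this note.
discourse = Layer("Discourse", [lambda d: {**d, "goal": "learn"}])
semantic = Layer("Semantic", [lambda d: {**d, "relations": ["painted_by"]}])

result = run_pipeline([discourse, semantic], {"items": ["self-portrait"]})
```

Swapping one `Layer` for another implementation with the same input/output contract is exactly the "plug-in" idea discussed later.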
The Data input
In general two cases can be distinguished:
- We administer the data source, either directly or indirectly; this can happen if the data source is known a priori, i.e. it either comes from a database or from a known set of documents
- We do not administer the data source; this is the case of using the Semantic Web as a data repository, which is now not known a priori.
One of the differences between these cases is that in the first case
we have to deal with only one ontology, while in the second there can be
many ontologies.
Cuypers' starting point is an existing database, the ARIA database: this means
that we have to accept the database design as a fixed resource and hope it
is well suited for our plans. For us, a database is a collection of semantically
annotated data, where the semantics stem from the database ER diagram (the
Entities and their Relations) plus whatever (implicit) assumptions can be
made (and have been made by the database designer) about the data.
Since this is the biggest source of information that will make up a presentation,
the richer the semantics, the richer the presentation information can be. This
does not mean nicer pictures, but more information about what a user is being shown,
and more support for the decisions that guide the system to show a
particular content.
If the database is poorly annotated, that need not be a problem: for
experimental purposes we can augment a subset of the data with whatever
we want, even though we cannot annotate a complete database like the ARIA
one.
The Discourse layer
At this layer the goal of the presentation is set, together with the way
to achieve it. I can think of goals like "I want to learn", "Amuse me!",
"Prove to me that ...", "Shake my convictions!", "Show me the difference between
..." and the like. At the moment in Cuypers our user's implicit
goal is to learn, and that is more or less well supported by the structure
of the database because, as we said before, the database semantics have to support
our presentation and its goal.
An interesting question/issue (at least for me) is to provide a set of requirements
on the semantic structure of a database depending on the goal of the presentation:
something like, if the goal is to learn, the database semantics must include the
concepts of causal relation, ownership, authorship, etc.
An interesting application of this would be the (possibly automatic) evaluation
of the presentation goals a data source can support without actually creating
any presentation. The discourse choice (made by the system or by the user)
should drive the kind of interaction the user has with the system. For example,
the user query format could change, or the system could execute different
queries (with or without letting the user know) to get the semantics it needs
to make the presentation.
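The idea of evaluating which presentation goals a data source can support, without creating any presentation, could be sketched as a simple check of required semantic relations. The goal names and the relations each goal requires are illustrative assumptions, not an agreed-upon catalogue:

```python
# Hypothetical mapping from presentation goals to the semantic
# relations the database must provide to support them.
GOAL_REQUIREMENTS = {
    "learn": {"causal_relation", "ownership", "authorship"},
    "amuse": {"contrast"},
}

def supported_goals(db_relations):
    # A goal is supported when the data source offers all the
    # semantic relations that goal requires; no presentation
    # needs to be generated to find this out.
    available = set(db_relations)
    return [goal for goal, needed in GOAL_REQUIREMENTS.items()
            if needed <= available]

goals = supported_goals(["causal_relation", "ownership", "authorship"])
```

A real version would read the relations from the database's ER diagram or ontology rather than from a hand-written list.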
The Semantic layer
Having retrieved the annotated items from the database, and knowing
what kind of presentation they are for, the system can start organizing the
presentation. The goal should dictate how the items are annotated from
a semantic point of view, i.e. the metadata regarding their logical interactions
(painted by, example of, etc.); the semantics and the goal together should then
dictate a series of compositional relations to be used to display the objects.
The Visual Semantic Layer (new proposed name: Compositional Semantics Layer)
These compositional relations are inspired by Marcos' work on Communicative Devices.
He pointed out some principles that can dictate how the items are displayed
in a presentation.
Some of the principles can be:
- Salience: what is more important and what is less so
- Framing: related concepts are put in a frame, which could be spatial (e.g. a shared background) or temporal
- Cohesion: similar concepts get similar positions (e.g. a title always top-right)
- Integration: the objects are gathered in a way that facilitates their comprehension
- Rhythm: how densely, spatially or temporally, the items are placed in the presentation; are they placed at equal intervals or not?
An easy example: if you assign to some piece of text the semantic
role of "Title", then based on that role and on your presentation goal you
could decide to enforce cohesion on it and require a "Title" to always appear
in the same position.
This may be trivial, but once we make the semantics of all items explicit and
agree on the compositional principles, we can implement all sorts of mappings from
semantics to compositional semantics, possibly based on user expertise, cultural
background, presentation goal, etc.
There is still no complete agreement on what these compositional principles
should be, but there is agreement on the fact that principles conceptually
equivalent to the ones mentioned above can be used to generate a presentation.
Together with the specification of a principle, this layer can also state
how much that principle matters using a level (e.g. from 0 to 9).
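The mapping from semantic roles to weighted compositional principles could be sketched like this. The role names, the chosen principles, and the weights are hypothetical; only the "Title"/cohesion pairing and the 0-9 scale come from the text above:

```python
# Sketch: weighted compositional directives derived from semantic roles.
# Weights follow the 0-9 scale mentioned above (0 = irrelevant,
# 9 = matters most); the concrete numbers are illustrative.
def compositional_directives(role, goal):
    # The mapping may depend on the presentation goal (and, eventually,
    # on user expertise or cultural background).
    if role == "Title":
        # Enforce cohesion: a Title should always appear in the
        # same position; it is also fairly salient.
        return {"cohesion": 9, "salience": 7}
    if role == "Caption":
        # Keep a caption framed together with its item.
        return {"framing": 8}
    return {}

directives = compositional_directives("Title", goal="learn")
```

Because the mapping is an explicit rule set, swapping it out (e.g. per user profile) would not require changing the surrounding code, in line with the plug-in idea.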
The Communicative Device Layer (proposed name: Presentation Pattern Expert)
At this point we have the items with their logical and compositional relations,
and we can use Presentation Patterns to express them.
The example we all have in mind is the museum abstraction, where all the
pictorial items can be thought of as artifacts, titles possibly as labels
and texts as panels.
What I mean is that we could use the museum metaphor to represent our presentation,
and that would dictate our Presentation Patterns, which would be the citizens
of the metaphor (or virtual reality) we chose to represent the items.
That also means that Presentation Patterns are domain-dependent. An interesting
exercise here would be to think of other metaphors to express our content that
would use (partially) different Presentation Patterns, and still convey (more or less) the
same message.
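The museum abstraction above could be sketched as a swappable mapping from item types to Presentation Patterns. Only the museum pattern names (artifact, label, panel) come from the example; the alternative "newspaper" metaphor and all identifiers are assumptions made up for illustration:

```python
# Sketch: metaphors as swappable mappings from item types to
# Presentation Patterns. Changing the metaphor changes the
# patterns ("skins") but conveys more or less the same message.
MUSEUM = {"picture": "artifact", "title": "label", "text": "panel"}
NEWSPAPER = {"picture": "photo", "title": "headline", "text": "article"}

def apply_metaphor(items, metaphor):
    # items: (type, content) pairs coming from the layers above.
    # Returns each content tagged with its Presentation Pattern.
    return [(content, metaphor[kind]) for kind, content in items]

items = [("title", "Self-portrait"), ("picture", "rembrandt.jpg")]
museum_view = apply_metaphor(items, MUSEUM)
```

Swapping `MUSEUM` for `NEWSPAPER` leaves the content untouched, which is the content-style separation argued for later in this note.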
The Formatting Objects layer
At this level all properties of the Presentation Patterns are instantiated,
like position, dimension, color, font, etc. This is based on Presentation
Patterns and Compositional Semantic as defined by the layers above this one.
The Layout layer
In this layer the output format is chosen to encode the formatting from the layer above.
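The last two layers could be sketched together as follows. The property names, the default values, and the choice of plain HTML as output encoding are all assumptions for illustration; nothing here fixes the actual output format:

```python
# Sketch: the Formatting Objects layer instantiates concrete
# properties (position, font, ...) per Presentation Pattern; the
# Layout layer then encodes them in a chosen output format
# (plain HTML here, purely as an example).
def format_objects(patterns):
    # Assign illustrative position/font properties per pattern.
    defaults = {"label": {"font": "bold", "x": 0, "y": 0},
                "artifact": {"font": "normal", "x": 0, "y": 40}}
    return [(content, defaults[p]) for content, p in patterns]

def layout_html(formatted):
    # Encode the instantiated formatting into the output format.
    return "".join(
        f'<div style="font-weight:{props["font"]}">{content}</div>'
        for content, props in formatted)

html = layout_html(format_objects([("Self-portrait", "label")]))
```

Keeping the two steps separate means a different Layout layer (e.g. SMIL instead of HTML) can reuse the same formatting decisions.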
What is it good for?
The nice things about this structure, in my view, are the following:
- Every layer has explicit rules that can be changed in order to improve the overall result
- Every layer elaborates what it gets from the layer above; if we use
clearly defined interfaces, we open the way to a plug-in concept at all layer
levels: you can swap layer implementations to get different results at every
level. I see the biggest application of this in the CommDev layer, where you
can think of something like the "skins" many programs offer nowadays
(Windows Media Player, WinAmp). You change the metaphor (CommDev) and you
get the same functionality in another appearance. At every layer we get a
sort of content-style handling: the content is provided by the layer above
and the style is applied by the layer itself. In that way you can change
the style without changing the content.
- NO DIET EFFECT: the presentation gets fatter as we descend the layers,
with no loss of information: clients are fully empowered, if they wish,
to do something more intelligent with the items. It would be very nice
to design our own interface (or browser plug-in) able to interpret the data
and do sensible things with it (like V2_ is doing).
- Where do you want to branch today? At any point we can stream to clients, if they prefer to handle the lower layers themselves
At all levels the user model and the device characteristics can play a role, influencing the style of the layer.
Still, it is debatable whether this structure provides any real improvement,
and this issue very closely resembles the doubts regarding the Semantic Web.
I think that if we make explicit, at every layer, the rules governing the
presentation creation, we can discover interesting things simply because we
make explicit the knowledge that is already implicitly available. Another point
is that by using RDF/XML to encode whatever we do and creating an ontology
(or using an existing one), just like the Semantic Web we can step into the
richness of the web, and into its integration problems.
I also think that we could learn a lot trying to design our own user-interface to view the presentation.
All in all, I see a promising perspective.
Top-down always good?
This structure does not consider the fact that the layout could influence
the content in a way that was not foreseen by the system.
This is the issue that makes multimedia harder than text. The way I would
tackle this problem is, when such an undesired behavior shows up, to try
to pinpoint the layer that failed to consider or model it.
I doubt whether we can achieve any automatic feedback loop that informs the
system when it shows something it did not mean. In case our system cannot
be designed to avoid such situations, we still have some good examples for
a Computational Humor conference.
Open Issues
Does rhythm belong among the Compositional Semantics principles?
How does the data influence the Discourse used?
Comments
This is Oscar's comment:
Let's see:
>> A structure of Cuypers
>>
>> The basic idea is a sort of hierarchical model where layers are more or less well-defined and can be replaced
>> as long as they provide the "layer functionality". This is the basic idea of a lot of standards (like
>> TCP/IP and the TCP/IP-inspired OSI model from ISO). I think that the basic assumption we have to make in
>> adopting such a model is that you can proceed in a (almost entirely) top-down manner from the content and the goal of
>> the presentation to the presentation output, which is an open issue that will be treated in the following. Having
>> made this assumption, and before exploring the layers, let's have a look at the data input, i.e. what
>> Cuypers can use to generate presentations.
I think that this was the idea of the typical Cuypers architecture. Even now, the HFOs and the final output are almost
completely separated. The problem with a strict layering is that, as you said, you cannot consider (without backtracking)
dependencies from the bottom layers to the top layers. There is also the typical problem: any decision about the final format is a
constraint on the possible solution. With a strict layering, you have to do ALL the work in the layers before knowing
if the presentation is possible at all with all the constraints. The normal solution is to have different priorities for
the constraints, which make the constraint set more flexible while still specifying the perfect solution.
Also, a good layering needs a really good design of the interface of every
layer, or a very simple one like TCP/IP, where the main
service is SEND(THIS,@).
>> The Data input
I agree
>> The Discourse layer
>> At this layer the goal of the presentation is set, together with the way to achieve it.
What should be the specific output of this layer ? Which can be rephrased as: what is "the way to achieve it" ?
>> The Semantic layer
>> Having retrieved the annotated items from the database and knowing what kind of presentation they are for,
>> the system can start organizing the presentation. The goal should dictate the way the items are annotated
>> from a semantic point of view, i.e. the metadata regarding their logical interactions (painted by, example, etc.)
>> and the semantic and the goal should dictate a series of compositional relations to be used in order to display the objects.
I don't get this one.
The input is ...?
And the output is ...?
>> The Layout layer
I don't like the name, but that is only my opinion.
In general I like it, because it does a lot of work in the TOP layers of the system, which until now were the poor orphan layers
that nobody wanted to work with.
The important thing is to determine the output of each layer, which I guess you don't really know at this moment.
Just a recommendation: if you work on this, I think you should "disable" the bottom layers. The constraint solving that
deals with screen adaptation requires most of the resources and, IMHO, it is perhaps the least important part.