An architecture for Cuypers
Stefano Bocconi
I have had these ideas for some time now and I would really
like to implement them. The talk with V2 stimulated me to write them down
because I saw several similarities. Furthermore, we have to decide the role
that research directions like the Semantic Web and RDF knowledge representation
will have in our flagship system Cuypers (and give some work to Gaurav as well).
And let us not forget the upcoming collaboration on e-learning with the Telematica
institute.
The basic idea is a sort of hierarchical model where layers are more or less
well-defined and can be replaced as long as they provide the "layer functionality".
This is the basic idea of a lot of standards (like TCP/IP and the TCP/IP-inspired
OSI model from ISO). I think that the basic assumption we have to make in
adopting such a model is that you can proceed in an (almost entirely) top-down
manner from the content and the goal of the presentation to the presentation
output, which is an open issue that will be treated in the following. Having
made this assumption, and before exploring the layers, let's have a look
at the data input, i.e. what Cuypers can use to generate presentations.
Each layer maps the input from the previous layer to the layer's output using
explicit rules. Ideally those rules can be changed at each layer without
having to change the software structure (i.e. the code).
The comparison to the TCP/IP protocol stack is valid in the sense that every
layer can be provided by a different entity (a vendor in the marketplace) and
the architecture should still be able to generate a presentation.
The comparison is not valid in the sense that the layers presented here are
more on a conceptual level and do not need to correspond to modules in the
software implementation, whereas in the TCP/IP protocol stack there is no
need to make this distinction.
In the same philosophy, the mapping between a layer's input and its output
is specified as a rule-based transformation from the input domain to
the output domain. The implementation of such a transformation is not
important for the present architecture proposal.
Furthermore, since every layer is responsible for the
implementation of the mapping, every layer has the power not to implement
some of the directives dictated by the upper layer, or to implement them according
to a hierarchy defined by the layer itself.
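The layered, rule-driven pipeline described above could be sketched as follows. This is only a minimal illustration under assumptions: the layer names, the rule format (functions over a dictionary), and the example rules are all hypothetical, not a fixed design.

```python
# Minimal sketch of a rule-driven layer pipeline.
# Each layer transforms its input using explicit, replaceable rules,
# so the rules can change without changing the software structure.

class Layer:
    def __init__(self, name, rules):
        self.name = name
        self.rules = rules  # explicit rules, swappable per layer

    def transform(self, data):
        # Apply each rule in turn; a rule maps the layer's input
        # to (part of) the layer's output. A layer is free to
        # ignore or reprioritize directives from the layer above.
        for rule in self.rules:
            data = rule(data)
        return data

def run_pipeline(layers, data):
    # Top-down: from content and goal to presentation output.
    for layer in layers:
        data = layer.transform(data)
    return data

# Hypothetical rules for two of the layers described in this note.
discourse = Layer("Discourse", [lambda d: {**d, "goal": "learn"}])
semantic = Layer("Semantic", [lambda d: {**d, "relations": ["painted_by"]}])

result = run_pipeline([discourse, semantic], {"items": ["self-portrait"]})
```

Swapping one `Layer` for another implementation with the same input/output contract is exactly the "plug-in" idea discussed later.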
The Data input
In general two cases can be distinguished:
- We administer the data source, either directly or indirectly; this can happen if the data source is known a priori, i.e. it either comes from a database or from a known set of documents
- We do not administer the data source; this is the case of using the Semantic Web as a data repository, which is now not known a priori.
One of the differences between these cases is that in the first case
we have to deal with only one ontology, while in the second there can be
many ontologies.
Cuypers' starting point is an existing database, the ARIA database: this means
that we have to accept the database design as a fixed resource and hope it
is well suited for our plans. For us, a database is a collection of semantically
annotated data, where the semantics stem from the database ER diagram (the
Entities and their Relations) plus whatever (implicit) assumptions can be
made (and have been made by the database designer) about the data.
Since this is the biggest source of information that will make up a presentation,
the richer the semantics, the richer the presentation information can be. This
does not mean nicer pictures, but more information about what a user is being shown,
and more support for the decisions that guide the system to show a
particular content.
If the database is poorly annotated, that need not be a problem: for
experimental purposes we can augment a subset of the data with whatever
we want, even though we cannot annotate a complete database like the ARIA
one.
The Discourse layer
At this layer the goal of the presentation is set, together with the way
to achieve it. I can think of goals like "I want to learn", "Amuse me!",
"Prove to me that ...", "Shake my convictions!", "Show me the difference between
..." and the like. At the moment in Cuypers our user's implicit
goal is to learn, and that is more or less well supported by the structure
of the database because, as we said before, the database semantics have to support
our presentation and its goal.
An interesting question/issue (at least for me) is to provide a set of requirements
on the semantic structure of a database depending on the goal of the presentation:
something like, if the goal is to learn, the database semantics must include the
concepts of causal relation, ownership, authorship, etc.
An interesting application of this would be the (possibly automatic) evaluation
of the presentation goals a data source can support without actually creating
any presentation. The discourse choice (made by the system or by the user)
should drive the kind of interaction the user has with the system. For example,
the user query format could change, or the system could execute different
queries (with or without letting the user know) to get the semantics it needs
to make the presentation.
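The idea of evaluating which presentation goals a data source can support, without creating any presentation, could be sketched as a simple check of required semantic relations. The goal names and the relations each goal requires are illustrative assumptions, not an agreed-upon catalogue:

```python
# Hypothetical mapping from presentation goals to the semantic
# relations the database must provide to support them.
GOAL_REQUIREMENTS = {
    "learn": {"causal_relation", "ownership", "authorship"},
    "amuse": {"contrast"},
}

def supported_goals(db_relations):
    # A goal is supported when the data source offers all the
    # semantic relations that goal requires; no presentation
    # needs to be generated to find this out.
    available = set(db_relations)
    return [goal for goal, needed in GOAL_REQUIREMENTS.items()
            if needed <= available]

goals = supported_goals(["causal_relation", "ownership", "authorship"])
```

A real version would read the relations from the database's ER diagram or ontology rather than from a hand-written list.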
The Semantic layer
Having retrieved the annotated items from the database, and knowing
what kind of presentation they are for, the system can start organizing the
presentation. The goal should dictate how the items are annotated from
a semantic point of view, i.e. the metadata regarding their logical interactions
(painted by, example of, etc.); the semantics and the goal together should then
dictate a series of compositional relations to be used to display the objects.
The Visual Semantic Layer (new proposed name: Compositional Semantics Layer)
These compositional relations are inspired by Marcos' work on Communicative Devices.
He pointed out some principles that can dictate how the items are displayed
in a presentation.
Some of the principles can be:
- Salience: what is more important and what is less so
- Framing: related concepts are put in a frame, which could be spatial (e.g. a shared background) or temporal
- Cohesion: similar concepts get similar positions (e.g. a title always top-right)
- Integration: the objects are gathered in a way that facilitates their comprehension
- Rhythm: how densely, spatially or temporally, the items are placed in the presentation; are they placed at equal intervals or not?
An easy example: if you assign to some piece of text the semantic
role of "Title", then based on that role and on your presentation goal you
could decide to enforce cohesion on it and require a "Title" to always appear
in the same position.
This may be trivial, but once we make the semantics of all items explicit and
agree on the compositional principles, we can implement all sorts of mappings from
semantics to compositional semantics, possibly based on user expertise, cultural
background, presentation goal, etc.
There is still no complete agreement on what these compositional principles
should be, but there is agreement on the fact that principles conceptually
equivalent to the ones mentioned above can be used to generate a presentation.
Together with the specification of a principle, this layer can also state
how much that principle matters using a level (e.g. from 0 to 9).
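The mapping from semantic roles to weighted compositional principles could be sketched like this. The role names, the chosen principles, and the weights are hypothetical; only the "Title"/cohesion pairing and the 0-9 scale come from the text above:

```python
# Sketch: weighted compositional directives derived from semantic roles.
# Weights follow the 0-9 scale mentioned above (0 = irrelevant,
# 9 = matters most); the concrete numbers are illustrative.
def compositional_directives(role, goal):
    # The mapping may depend on the presentation goal (and, eventually,
    # on user expertise or cultural background).
    if role == "Title":
        # Enforce cohesion: a Title should always appear in the
        # same position; it is also fairly salient.
        return {"cohesion": 9, "salience": 7}
    if role == "Caption":
        # Keep a caption framed together with its item.
        return {"framing": 8}
    return {}

directives = compositional_directives("Title", goal="learn")
```

Because the mapping is an explicit rule set, swapping it out (e.g. per user profile) would not require changing the surrounding code, in line with the plug-in idea.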
The Communicative Device Layer (proposed name: Presentation Pattern Expert)
At this point we have the items with their logical and compositional relations,
and we can use Presentation Patterns to express them.
The example we all have in mind is the museum abstraction, where all the
pictorial items can be thought of as artifacts, titles possibly as labels
and texts as panels.
What I mean is that we could use the museum metaphor to represent our presentation,
and that would dictate our Presentation Patterns, which would be the citizens
of the metaphor (or virtual reality) we chose to represent the items.
That also means that Presentation Patterns are domain-dependent. An interesting
exercise here would be to think of other metaphors to express our content that
would use (partially) different Presentation Patterns, and still convey (more or less) the
same message.
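The museum abstraction above could be sketched as a swappable mapping from item types to Presentation Patterns. Only the museum pattern names (artifact, label, panel) come from the example; the alternative "newspaper" metaphor and all identifiers are assumptions made up for illustration:

```python
# Sketch: metaphors as swappable mappings from item types to
# Presentation Patterns. Changing the metaphor changes the
# patterns ("skins") but conveys more or less the same message.
MUSEUM = {"picture": "artifact", "title": "label", "text": "panel"}
NEWSPAPER = {"picture": "photo", "title": "headline", "text": "article"}

def apply_metaphor(items, metaphor):
    # items: (type, content) pairs coming from the layers above.
    # Returns each content tagged with its Presentation Pattern.
    return [(content, metaphor[kind]) for kind, content in items]

items = [("title", "Self-portrait"), ("picture", "rembrandt.jpg")]
museum_view = apply_metaphor(items, MUSEUM)
```

Swapping `MUSEUM` for `NEWSPAPER` leaves the content untouched, which is the content-style separation argued for later in this note.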
The Formatting Objects layer
At this level all properties of the Presentation Patterns are instantiated,
like position, dimension, color, font, etc. This is based on Presentation
Patterns and Compositional Semantic as defined by the layers above this one.
The Layout layer
In this layer the output format is chosen to encode the formatting from the layer above.
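The last two layers could be sketched together as follows. The property names, the default values, and the choice of plain HTML as output encoding are all assumptions for illustration; nothing here fixes the actual output format:

```python
# Sketch: the Formatting Objects layer instantiates concrete
# properties (position, font, ...) per Presentation Pattern; the
# Layout layer then encodes them in a chosen output format
# (plain HTML here, purely as an example).
def format_objects(patterns):
    # Assign illustrative position/font properties per pattern.
    defaults = {"label": {"font": "bold", "x": 0, "y": 0},
                "artifact": {"font": "normal", "x": 0, "y": 40}}
    return [(content, defaults[p]) for content, p in patterns]

def layout_html(formatted):
    # Encode the instantiated formatting into the output format.
    return "".join(
        f'<div style="font-weight:{props["font"]}">{content}</div>'
        for content, props in formatted)

html = layout_html(format_objects([("Self-portrait", "label")]))
```

Keeping the two steps separate means a different Layout layer (e.g. SMIL instead of HTML) can reuse the same formatting decisions.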
What is it good for?
The nice things about this structure, in my view, are the following:
- Every layer has explicit rules that can be changed in order to improve the overall result
- Every layer elaborates what it gets from the layer above; if we use
clearly defined interfaces, we open the way to a plug-in concept at all layer
levels: you can swap layer implementations to get different results at every
level. I see the biggest application of this in the CommDev layer, where you
can think of something like the "skins" many programs offer nowadays
(Windows Media Player, WinAmp). You change the metaphor (CommDev) and you
get the same functionality in another appearance. At every layer we get a
sort of content-style handling: the content is provided by the layer above
and the style is applied by the layer itself. In that way you can change
the style without changing the content.
- NO DIET EFFECT: the presentation gets fatter as we descend the layers,
with no loss of information: clients are fully empowered, if they wish,
to do something more intelligent with the items. It would be very nice
to design our own interface (or browser plug-in) able to interpret the data
and do sensible things with it (like V2_ is doing).
- Where do you want to branch today? At any point we can stream to clients, if they prefer to handle the lower layers themselves
At all levels the user model and the device characteristics can play a role, influencing the style of the layer.
Still, it is debatable whether this structure provides any real improvement,
and this issue very closely resembles the doubts regarding the Semantic Web.
I think that if we make explicit, at every layer, the rules governing the
presentation creation, we can discover interesting things simply because we
make explicit the knowledge that is already implicitly available. Another point
is that by using RDF/XML to encode whatever we do and creating an ontology
(or using an existing one), just like the Semantic Web we can step into the
richness of the web, and into its integration problems.
I also think that we could learn a lot trying to design our own user-interface to view the presentation.
All in all, I see a promising perspective.
Top-down always good?
This structure does not consider the fact that the layout could influence
the content in a way that was not foreseen by the system.
This is the issue that makes multimedia harder than text. The way I would
tackle this problem is, when such an undesired behavior shows up, to try
to pinpoint the layer that failed to consider or model it.
I doubt whether we can achieve any automatic feedback loop that informs the
system when it shows something it did not mean. In case our system cannot
be designed to avoid such situations, we still have some good examples for
a Computational Humor conference.
Open Issues
Does rhythm belong among the Compositional Semantics principles?
How does the data influence the Discourse used?
Comments
This is Oscar's comment:
Let's see:
>> A structure of Cuypers
>>
>> The basic idea is a sort of hierarchical model where layers are more or less well-defined and can be replaced
>> as long as they provide the "layer functionality". This is the basic idea of a lot of standards (like
>> TCP/IP and the TCP/IP-inspired OSI model from ISO). I think that the basic assumption we have to make in
>> adopting such a model is that you can proceed in a (almost entirely) top-down manner from the content and the goal of
>> the presentation to the presentation output, which is an open issue that will be treated in the following. Having
>> made this assumption, and before exploring the layers, let's have a look at the data input, i.e. what
>> Cuypers can use to generate presentations.
I think that this was the idea of the typical Cuypers architecture. Even now, the HFOs and the final output are almost
completely separated. The problem with a strict layering is that, as you said, you cannot consider (without backtracking)
dependencies from the bottom layers to the top layers. There is also the typical problem: any decision about the final format is a
constraint on the possible solution. With a strict layering, you have to do ALL the work in the layers before knowing
if the presentation is possible at all with all the constraints. The normal solution is to have different priorities for
the constraints, which make the constraint set more flexible while still specifying the perfect solution.
Also, a good layering needs a really good design of the interface of every
layer, or a very simple one like TCP/IP, where the main
service is SEND(THIS,@).
>> The Data input
I agree
>> The Discourse layer
>> At this layer the goal of the presentation is set, together with the way to achieve it.
What should be the specific output of this layer ? Which can be rephrased as: what is "the way to achieve it" ?
>> The Semantic layer
>> Having retrieved the annotated items from the database and knowing what kind of presentation they are for,
>> the system can start organizing the presentation. The goal should dictate the way the items are annotated
>> from a semantic point of view, i.e. the metadata regarding their logical interactions (painted by, example, etc.)
>> and the semantic and the goal should dictate a series of compositional relations to be used in order to display the objects.
I don't get this one.
The input is ...?
And the output is ...?
>> The Layout layer
I don't like the name, but that is only my opinion.
In general I like it, because it does a lot of work in the TOP layers of the system, which until now were the poor orphan layers
that nobody wanted to work with.
The important thing is to determine the output of each layer, which I guess you don't really know at this moment.
Just a recommendation: if you work on this, I think you should "disable" the bottom layers. The constraint solving that
deals with screen adaptation requires most of the resources and, IMHO, it is perhaps the least important part.