Towards Semantic Web Document Engineering
Jacco van Ossenbruggen
Centrum voor Wiskunde en Informatica (CWI), Amsterdam
Jacco.van.Ossenbruggen@cwi.nl
Abstract:
Web publishing systems have to take into account a plethora of
Web-enabled devices, user preferences and abilities. Technologies
generating these presentations will need to be explicitly aware of the
context in which the information is being presented. Semantic Web
technology can be a fundamental part of the solution to this problem
by explicitly modeling the knowledge needed to adapt presentations to
a specific delivery context. We propose the development of a
Smart Style layer which is able to use metadata to improve the
presentation of content to human users. We discuss different uses of
metadata and suggest extensions to current Web technology.
As the Web continues to grow not only in size but also in complexity,
the increasingly varied needs of its intended audience mark the
end of the "one size fits all" era. Delivery
contexts [1] can be characterized in terms of specific
user preferences and abilities, capabilities of the access device and
available network resources. Given this heterogeneity, any single
message needs to be adapted to a particular set of circumstances. As
a minimum requirement, the author's intended message needs to be
conveyed to the user given the constraints imposed by the access
device. In addition, the generated presentation should conform as
much as possible to the preferences of the user and the author
[2]. These two types of adaptation may lead to an
explosion of potential delivery contexts that current stylesheet
technology is unable to deal with.
Our prototype multimedia presentation generation system
Cuypers [3] generates multimedia presentations adapted to
the constraints of a specific delivery context. We claim that the
particular solutions deployed within Cuypers realize a level of
adaptivity that should become generally available on the Web. This
introduces new challenges since the solutions need to be embedded
within the current Web infrastructure. In this paper, we introduce
the concept of Smart Style: an intelligent presentation
adaptation layer for the Web that builds upon two fundamental
technologies:
- Web document engineering technology, including delivery formats such
as HTML [4], SMIL [5], SVG [6]
and XSL [7], and style and transformation languages such as
CSS [8] and XSLT [9].
- Semantic Web knowledge representation and metadata technology, including
RDF [10], RDF Schema [11],
DAML+OIL [12] and CC/PP [13].
Currently, Semantic Web technology is primarily deployed to improve
Web-based information gathering and brokerage. Our
vision is, however, that the Semantic Web infrastructure should also
play a key role in presenting information in the most appropriate way
to each individual reader. On the other hand, document engineering
technology is developing relatively independently from the Semantic Web.
We argue that device-independent Web content engineering requires a
large amount of knowledge that needs to be, and could be, made explicit by
employing Semantic Web technology. Our proposed Smart Style layer
would deploy Semantic Web technology to improve the presentation's
adaptation, aiming for an optimized design of the presentation that
suits the specific requirements of the user's delivery context.
Ingredients for a Smart Style Layer
To build a Smart Style layer on top of the existing Web
infrastructure, four ingredients are needed: ways of specifying
delivery contexts, support for content descriptions, and processing for
both delivery contexts and content descriptions.
Assuming that at least a
part of the adaptation will need to take place on the server, it is
essential to standardize the communication of delivery
contexts: clients need to be able to send the information in a way
that the server understands. A machine-readable description of a
delivery context that can be sent to the server is often called a
profile.
CC/PP [13] provides an RDF-based
framework for defining the vocabularies that are needed to define
profiles. In addition, it also provides a small vocabulary that can
be reused across different profiles. The WAP
Forum [14] provides a commonly agreed upon
mechanism to communicate the (technical) capabilities of mobile phones
to servers and proxies. The CC/PP framework, however, is sufficiently
flexible to allow the definition of profiles that focus on more
user-centered aspects of a delivery context, such as language
preference or media preference.
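To make the idea of a profile concrete, the following is a minimal sketch of how a server might use one once it has been received. It does not use the actual CC/PP RDF syntax: the dictionary layout, the component and attribute names, and the function choose_variant are all hypothetical and serve only to illustrate that a profile can combine technical and user-centered aspects.

```python
# Hypothetical, simplified stand-in for a profile sent by a client.
# Component and attribute names are illustrative, not CC/PP vocabulary.
profile = {
    "TerminalHardware": {"displayWidth": 320, "displayHeight": 200},
    "UserPreferences": {"language": "nl", "preferredMedia": ["text", "image"]},
}


def choose_variant(variants, profile):
    """Pick the first variant matching the user's language and media
    preferences; fall back to the first variant if none matches."""
    prefs = profile.get("UserPreferences", {})
    for variant in variants:
        language_ok = variant["language"] == prefs.get("language")
        medium_ok = variant["medium"] in prefs.get("preferredMedia", [])
        if language_ok and medium_ok:
            return variant
    return variants[0] if variants else None


variants = [
    {"language": "en", "medium": "video"},
    {"language": "nl", "medium": "text"},
]
selected = choose_variant(variants, profile)
```

With the profile above, the Dutch text variant is selected; a client that sends no user preferences simply receives the first variant.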
Clients need to be able to communicate delivery contexts, but in
itself this is insufficient. Many design decisions will also depend
on information that is only available at the server. Even when this
information is not intended to be published on the Web, having
commonly used and standardized solutions for describing and processing
it will greatly reduce the development effort needed to implement a
smart, adaptive Web site.
Intelligent adaptation systems will need some knowledge of the
function of the content they are adapting. To make this type of
knowledge explicit, appropriate use of metadata will be of key
importance. Within and outside W3C, a large amount of work on
metadata standardization is currently in progress, and in most of this
work RDF, RDF Schema and DAML+OIL (and the language being specified
within WebOnt) play a central role.
For example, suppose an online museum site has
developed an RDF Schema (footnote 1) for the metadata (footnote 2) used to
annotate its Web site.
Also suppose the site features an HTML page describing a work by the
painter Rembrandt van Rijn, focusing on the use of
chiaroscuro (the painting technique that uses strong contrasts
of light and dark). Figure 1 shows
an example fragment of the page.
Figure 1:
Example XHTML 1.0 fragment from a page about a Rembrandt painting.
<div id="allegory">
<h1>Musical Allegory</h1>
<img src="allegory.jpg" />
<p>This is hardly just an ordinary group of musicians.
The figures are too exotically dressed in oriental
...
</div>
From an XML/HTML markup perspective, all we know is that we have a fragment
with a first level heading, an image and a text paragraph. The
underlying semantics, however, could be explicitly added by the use of
RDF metadata, as shown in figure 2.
Figure 2:
RDF metadata of XHTML 1.0 fragment.
<museum:Painter rdf:ID="Rembrandt">
<museum:fname>Rembrandt</museum:fname>
<museum:lname>Harmenszoon van Rijn</museum:lname>
<museum:painted rdf:resource="#allegory" />
</museum:Painter>
<museum:Painting rdf:about="#allegory">
<museum:title>Musical Allegory</museum:title>
<museum:technique>Chiaroscuro</museum:technique>
</museum:Painting>
This explicitly states that our HTML fragment is an instance of a
class Painting, with a title property "Musical
Allegory", and that there is a Painter instance that has a
painted relation with this painting. The question
is: can we exploit the knowledge provided by the metadata to improve
our style sheets and other adaptation technology?
While the current focus of this type of Semantic Web technology is on
the use of metadata to achieve a more intelligent model for Web-based
information retrieval (e.g. improving search engines), the use of
metadata in our Cuypers system shows that there is also a huge
potential in applying this type of technology for improving the
adaptation and presentation process. Through the use of metadata to
make the intended semantics and function of the content explicit,
adaptation systems should be able to make informed decisions during
the design process. This requires an adaptation process that is also
able to take into account presentation-related metadata. Based
on our experience with Cuypers, we found that most metadata is geared
towards information retrieval rather than information
presentation. Presentation-related metadata provides information
about the properties of the content in the context of its presentation
to the user. Examples include information about the intended
audience (e.g. suitability for presentation to children), the role of
the content (e.g. suitability for a specific presentation role, as
introductory material or in-depth explanation), and the transformations
allowed (e.g. to what extent images may be scaled in terms of
minimum/maximum scaling and aspect ratios, or to what extent images
can be displayed in grayscale while still communicating the
intended message).
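As an illustration of how an adaptation engine might use such presentation-related metadata, consider the scaling constraints mentioned above. The sketch below is not taken from Cuypers: the class ImageMetadata, its field names and the function fit_width are hypothetical, and they only show how minimum and maximum scaling bounds could drive a design decision.

```python
from dataclasses import dataclass


@dataclass
class ImageMetadata:
    # Hypothetical presentation-related properties of an image.
    width: int          # intrinsic width in pixels
    height: int         # intrinsic height in pixels
    min_scale: float    # smallest scale that still conveys the message
    max_scale: float    # largest scale the author allows
    grayscale_ok: bool  # may the image be shown without color?


def fit_width(meta, available_width):
    """Return a scale factor that fits the image into available_width,
    capped at max_scale, or None if it would fall below min_scale."""
    scale = min(available_width / meta.width, meta.max_scale)
    if scale < meta.min_scale:
        return None
    return scale


allegory = ImageMetadata(width=800, height=600, min_scale=0.5,
                         max_scale=1.0, grayscale_ok=False)
```

A result of None signals that the image cannot be used in this delivery context, so the engine would have to choose an alternative, for instance a textual description.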
Assuming that the information upon which we base our design decisions
will be available from the Web through the use of standard Semantic
Web technologies such as CC/PP and RDF, the next ingredient needed for
building a Smart Style layer is a set of efficient tools
that can take this type of information into account
during the adaptation
process. A first step is to make current-generation
presentation-oriented Web technology interoperable with the
next-generation Semantic Web technology. For example, CSS stylesheets
are currently not able to take CC/PP profiles into account. CSS has,
however, a feature that is closely related to CC/PP, and allows the
specification of device dependent style rules: the
@media rule. Figure 3 shows an
example (footnote 3) of a stylesheet that uses bigger
fonts on computer screens than on paper printouts of the same
document.
Figure 3:
Device dependent style rules as already supported in CSS2.
@media print {
body { font-size: 10pt }
}
@media screen {
body { font-size: 12pt }
}
A first step towards a CSS syntax that allows more detailed queries is
suggested in [17]. In this syntax, queries to
specific device features are allowed. For example, the CSS media rule
for screen display above could be further refined by adding
constraints on the minimum width of the screen, as shown in
figure 4. Using such constraints, stylesheets
could take into account the information provided by profiles.
Figure 4:
Detailed media queries using a CSS3 extension (work in progress).
@media screen and (min-width: 640px) {
body { font-size: 14pt }
}
@media screen and (min-width: 800px) {
body { font-size: 16pt }
}
Even from this extended CSS syntax, however, it is still a long way to
fully CC/PP aware style engines. CC/PP features that will affect
style application include the ability to define new profile
vocabularies, inheritance mechanisms for specifying default values and
the description of the capabilities of transcoding proxies. Style
engines need to be able to deal with these features in order to take
full advantage of the information specified in CC/PP delivery
contexts.
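The interplay between media rules and profile values can be sketched as follows. This is a minimal illustration under our own assumptions, not a description of any existing style engine: the rule representation and the functions matches and font_size are hypothetical, and the merging of default and explicit values only approximates the default mechanism of CC/PP. The rules themselves mirror the print and screen rules of figures 3 and 4.

```python
def matches(rule, context):
    """True if the rule's medium and optional min-width constraint
    hold for the given delivery context."""
    if rule["medium"] != context.get("medium"):
        return False
    return context.get("width", 0) >= rule.get("min_width", 0)


def font_size(rules, context):
    """Return the font size of the last matching rule, mirroring the
    CSS convention that later rules override earlier ones."""
    size = None
    for rule in rules:
        if matches(rule, context):
            size = rule["font_size"]
    return size


rules = [
    {"medium": "print", "font_size": "10pt"},
    {"medium": "screen", "font_size": "12pt"},
    {"medium": "screen", "min_width": 640, "font_size": "14pt"},
    {"medium": "screen", "min_width": 800, "font_size": "16pt"},
]

# Hypothetical profile: default values overridden by explicit ones.
defaults = {"medium": "screen", "width": 320}
explicit = {"width": 700}
context = {**defaults, **explicit}
```

Here the explicit width of 700 pixels overrides the default of 320, so the 14pt rule applies; without the explicit value, only the plain screen rule would match.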
Note that the need to take CC/PP information into account also applies
to XSLT transformation engines. While the full details of how this
could affect future versions of XSLT are beyond the scope of this
paper, one could imagine an extension (footnote 4) of XSLT's mode concept in which
transformation rules are selected in a way similar to that of the
media rules in CSS. In such a hypothetical extension (see
figure 5) one could, for instance, define a rule for
creating a two column layout only if the output medium is print and
the paper is wider than 17cm.
Figure 5:
Device dependent rules by extending XSLT modes (tentative syntax).
<xsl:template match="body"
mode="print and (min-width: 17cm)">
...
<fo:region-body column-count="2"/>
...
</xsl:template>
In addition to taking information about delivery contexts into account,
stylesheets also need to take into account the semantic information
that is contained in the metadata associated with the content.
Currently, style selector mechanisms only match on the syntactic
properties of the underlying (XML) document hierarchy. This applies
both to the selector mechanism used by CSS and to the XPath [18] selectors used by XSLT.
In all examples above, the rules were intended to match on the
<body> element of an HTML document. Similar rules could be
written to match on the syntactic properties of metadata, i.e. on the
XML element and attribute names that are used to encode the RDF
statements of Figure 2.
With the current generation of CSS and XSLT engines, however, it is
not practical to match on the semantic properties of metadata: for
CSS and XSLT processors, RDF is just XML. As a result, it is very
hard to write, for
example, a rule that matches on all alternative XML serializations
that are allowed for RDF. A more serious problem, however, is that
it is impossible to write CSS or XSLT rules that make use of the
structural relations of RDF and RDF Schema, for instance a style rule
that applies to all objects that are instances of a specific RDFS
(sub)class. Neither is it possible to write rules for all objects
that have a certain DAML+OIL-defined ontological relation, etc.
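The serialization problem can be illustrated with a small sketch. The two documents below state the same fact, that #allegory is a Painting, once as a typed node element and once as an rdf:Description with an explicit rdf:type property. The museum namespace is the one assumed in our schema example; the function type_statements is hypothetical and covers only these two forms, not the full RDF/XML syntax.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MUSEUM = "http://www.museum.com/schema.rdf#"

# Serialization 1: typed node element.
TYPED_NODE = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:museum="{MUSEUM}">
  <museum:Painting rdf:about="#allegory"/>
</rdf:RDF>"""

# Serialization 2: rdf:Description with an explicit rdf:type.
DESCRIPTION = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:museum="{MUSEUM}">
  <rdf:Description rdf:about="#allegory">
    <rdf:type rdf:resource="{MUSEUM}Painting"/>
  </rdf:Description>
</rdf:RDF>"""


def syntactic_matches(xml_text):
    """Count top-level museum:Painting elements, as a purely
    syntactic selector would."""
    return len(ET.fromstring(xml_text).findall(f"{{{MUSEUM}}}Painting"))


def type_statements(xml_text):
    """Collect (resource, class URI) pairs from both serializations."""
    pairs = set()
    for node in ET.fromstring(xml_text):
        about = node.get(f"{{{RDF}}}about")
        if node.tag != f"{{{RDF}}}Description":
            # Typed node: the element name itself denotes the class.
            namespace, local = node.tag[1:].split("}")
            pairs.add((about, namespace + local))
        for prop in node.findall(f"{{{RDF}}}type"):
            pairs.add((about, prop.get(f"{{{RDF}}}resource")))
    return pairs
```

The syntactic selector finds the painting in the first document only, whereas both documents yield the same type statement once they are interpreted as RDF.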
Future Semantic Web-aware selector mechanisms could allow the
specification of style rules in terms of the RDF semantics expressed
in the metadata. This would extend the currently used CSS and XPath
selectors, which are based on the XML syntax encoding the semantics.
Consider the extended XSLT example rule in figure 6,
which uses the RDF-aware query language RQL [15] for its
selector, instead of XPath.
Figure 6:
Semantic matching of XSLT rules using RQL selectors (tentative syntax).
<xsl:template match=
"RQL(http://www.museum.com/schema.rdf#Artifact)">
...
</xsl:template>
It matches on all resources that are instances of (subclasses of) the
RDF class Artifact. Given the fact that our RDF Schema would
define Painting as a subclass of Artifact, the rule
would also match on the HTML fragment of Figure 1.
Such rules that employ the semantic relations defined in the metadata
are currently impossible to write in XSLT.
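What such a semantic selector would have to compute can be sketched in a few lines. The code below is neither RQL nor an XSLT extension: it is a hypothetical illustration in which the class hierarchy and the type assertions are plain dictionaries, each class has at most one superclass, and the hierarchy is assumed to be free of cycles. The Sculpture class is invented for the example.

```python
# Hypothetical extract of the museum schema: class -> direct superclass.
SUBCLASS_OF = {"Painting": "Artifact", "Sculpture": "Artifact"}

# Type assertions taken from the metadata of figure 2.
TYPES = {"#allegory": "Painting", "#Rembrandt": "Painter"}


def is_instance_of(resource, cls):
    """True if the resource's class is cls or a (transitive) subclass."""
    current = TYPES.get(resource)
    while current is not None:
        if current == cls:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def select(cls):
    """Return all resources that a rule selecting cls would match."""
    return sorted(r for r in TYPES if is_instance_of(r, cls))
```

A rule selecting Artifact thus matches #allegory, although the string "Artifact" occurs nowhere in the metadata of the page itself.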
This paper sketches the requirements for an ambitious goal: automatic
adaptation of dynamic text and multimedia content to the requirements
of an individual user's delivery context, while respecting the
integrity of the semantics of the content. If we reduce our ambition
level, however, and "only" aim to take processing context
information into account, this alone would still have major consequences.
To prevent CC/PP from becoming a stand-alone W3C recommendation that
can only be processed with proprietary tools, we need to clearly
define how other recommendations, including CSS, XSLT, XHTML, SMIL and
SVG operate in the context of CC/PP. From CC/PP-aware Web
transformations, another step is required towards Semantic Web-aware
transformations that also take metadata semantics into account. Given
the amount of knowledge that needs to be taken into account when
adapting Web resources, we need to integrate the document engineering
layers of the Web with the knowledge engineering layers of the
Semantic Web. This will require tools that can abstract from the
underlying XML syntax and operate directly on the semantics of
languages such as RDF, RDFS and DAML+OIL.
Realizing such a level of interoperability among W3C Recommendations
will be a huge effort. It should be clear that the examples given in
this paper serve only to illustrate the discussion, and should by no
means be regarded as readily applicable syntactical solutions to
achieve the required interoperability. Making the current Web
infrastructure interoperate seamlessly with the upcoming Semantic Web
will be a huge challenge and a long-term effort.
References
[1] W3C, "Device Independence Principles." Work in progress. W3C Working
Drafts are available at http://www.w3.org/TR, 18 September 2001.
Edited by Roger Gimson, co-edited by Shlomit Ritz Finkelstein,
Stéphane Maes and Lalitha Suryanarayana.
[2] D. Bulterman, L. Rutledge, L. Hardman, and J. van Ossenbruggen,
"Supporting Adaptive and Adaptable Hypermedia Presentation Semantics,"
in The 8th IFIP 2.6 Working Conference on Database Semantics (DS-8):
Semantic Issues in Multimedia Systems, (Rotorua, New Zealand,
5-8 January 1999), 1999.
[3] J. van Ossenbruggen, J. Geurts, F. Cornelissen, L. Rutledge, and
L. Hardman, "Towards Second and Third Generation Web-Based Multimedia,"
in The Tenth International World Wide Web Conference, (Hong Kong),
pp. 479-488, IW3C2, 1-5 May 2001.
[4] W3C, "XHTML 1.1 - Module-based XHTML." W3C Recommendations are
available at http://www.w3.org/TR/, 31 May 2001.
Edited by Murray Altheim and Shane McCarron.
[5] W3C, "Synchronized Multimedia Integration Language (SMIL 2.0)
Specification." W3C Recommendations are available at
http://www.w3.org/TR/, 7 August 2001.
Edited by Aaron Cohen.
[6] J. Ferraiolo, "Scalable Vector Graphics (SVG) 1.0 Specification."
W3C Recommendations are available at http://www.w3.org/TR/,
4 September 2001.
[7] W3C, "Extensible Stylesheet Language (XSL) Version 1.0." W3C
Recommendations are available at http://www.w3.org/TR/, 15 October 2001.
[8] B. Bos, H. W. Lie, C. Lilley, and I. Jacobs, "Cascading Style Sheets,
level 2 CSS2 Specification." W3C Recommendations are available at
http://www.w3.org/TR, 12 May 1998.
[9] J. Clark, "XSL Transformations (XSLT) Version 1.0." W3C
Recommendations are available at http://www.w3.org/TR/, 16 November 1999.
[10] W3C, "Resource Description Framework (RDF) Model and Syntax
Specification." W3C Recommendations are available at http://www.w3.org/TR,
22 February 1999.
Edited by Ora Lassila and Ralph R. Swick.
[11] W3C, "Resource Description Framework (RDF) Schema Specification
1.0." W3C Candidate Recommendations are available at http://www.w3.org/TR,
27 March 2000.
Edited by Dan Brickley and R.V. Guha.
[12] F. van Harmelen, P. F. Patel-Schneider, and I. Horrocks, "Reference
description of the DAML+OIL (March 2001) ontology markup language."
http://www.daml.org/2001/03/reference.html.
Contributors: Tim Berners-Lee, Dan Brickley, Dan Connolly, Mike Dean,
Stefan Decker, Pat Hayes, Jeff Heflin, Jim Hendler, Ora Lassila, Deb
McGuinness, Lynn Andrea Stein, ...
[13] W3C, "Composite Capability/Preference Profiles (CC/PP):
Structure and Vocabularies." Work in progress. W3C Working Drafts are
available at http://www.w3.org/TR, 15 March 2001.
Edited by Graham Klyne, Franklin Reynolds, Chris Woodrow and Hidetaka
Ohto.
[14] Wireless Application Group, "WAP-174: WAG UAPROF User Agent
Profile Specification," 1999.
[15] G. Karvounarakis, V. Christophides, D. Plexousakis, and S. Alexaki,
"Querying Community Web Portals."
http://www.ics.forth.gr/proj/isst/RDF/RQL/rql.html.
[16] J. van Ossenbruggen, L. Hardman, and L. Rutledge, "Hypermedia and the
Semantic Web: A research agenda," Tech. Rep. INS-R0105, CWI, 2001.
[17] H. W. Lie and T. Celik, "Media queries." Work in progress. W3C Working
Drafts are available at http://www.w3.org/TR, 17 March 2001.
[18] J. Clark and S. DeRose, "XML Path Language (XPath) Version 1.0."
W3C Recommendations are available at http://www.w3.org/TR/,
16 November 1999.
Footnotes
1. Museum schema example adapted from [15].
2. Metadata example adapted from [16].
3. Example taken from the CSS2 Specification [8].
4. We are not advocating a specific syntax, but are only claiming that
future XSLT transformations need to be able to take CC/PP-like
information into account.