XHTML Project Review

Steven Pemberton, CWI/W3C, Amsterdam

XHTML

XHTML 1.0: The minimum necessary to get HTML4 into XML. The first real opportunity for the HTML community to use XML.

XHTML 1.1: A basic clean-up of XHTML 1.0, getting rid of the historical flotsam. Added ruby. 'Pure' XML.

XHTML is the next HTML: maybe it was wrong renaming it, but there you go. At the time, people thought it would be good to emphasise the XML.

Early problems

Media type: The XML community was adamant about not allowing pure XML to be served as text/html. As a result, the community adopted the application/XXX+xml solution, which has some problems: for instance, you can't content negotiate for "anything as long as it is XML". XForms implementations have this problem now.

Namespace: LONG discussions about one namespace or three.

XHTML2

Many communities have come with requests to fix things in HTML/XHTML.

XHTML2 is an attempt to address those problems.

Examples of changes requested by communities

HTML Community in general: More structure, presentation in CSS

Accessibility: More structure, since using heuristics with the h1/h2 etc headings to try and work out the structure doesn't work. A method of classifying the purpose of an element. Better access key processing.

Device independence/mobile: extensible events

Internationalisation: the ability to mark up alt and title text.

Semantic web: better integration with RDF

For many of the problems a band-aid approach is no longer possible

Take for example event handling.

The device independence and mobile communities wanted to be able to add event handling for new events.

The onclick style in HTML is not extensible: each time you want to add a new event you have to produce a new version of the language.

Therefore, XML Events (proposed by the mobile community) is used in XHTML2 instead of the onclick style.
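
A minimal sketch of the difference, assuming the ev prefix is bound to the XML Events namespace (http://www.w3.org/2001/xml-events) on an enclosing element. The event name battery-low, the ids status and notify, and the showHelp function are hypothetical, chosen only for illustration:

```xml
<!-- HTML style: the event name is part of the attribute name, so every
     new event needs a new attribute in the language itself. -->
<a href="help.html" onclick="showHelp()">Help</a>

<!-- XML Events style: the event name is an attribute value, so a new
     event needs no change to the markup language. -->
<ev:listener event="battery-low" observer="status" handler="#notify"/>
```

Because the event is named in an attribute value rather than in an attribute name, a community can introduce a new event without waiting for a new version of the language.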

Ideally you want a unified design that solves the problems, and not a piecemeal solution to each separate problem. As it turned out, a unified approach to the problems meant that several solutions solved other problems into the bargain.

Backwards compatibility

It is a myth that new versions of HTML have been backwards compatible in the past. Each time new functionality has been added there has been no backwards compatibility. For instance, tables and forms both required new browsers.

In fact XHTML 1.0 was the only version of HTML that has been in any serious way backwards compatible.

However, XHTML 1.1 is largely a subset of XHTML2, and because of the way it has been designed, XHTML2 already works in many existing browsers. That's the power of XML! Just as with older versions of HTML, it is only the new functionality (XForms and XML Events) that doesn't work in old browsers.

Simple Example

XHTML2 is recognisably a family member:

<html xmlns="http://www.w3.org/2002/06/xhtml2/" xml:lang="en">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://example.org/">example.org</a>.</p>
  </body>
</html>

Structure

One of the biggest problems for non-sighted people with many HTML pages is working out what the structure is. Often the only clue is the level of heading used (h1, h2, etc.), and often the headings are not used correctly.

To address this, in XHTML2 you can now make the structure of your documents more explicit, with the <section> and <h> elements.

<section>
   <h>A heading</h>
   ...
   <section>
      <h>A lower-level heading</h>
      ...
   </section>
</section>

Structuring advantages

As usual, fixing something for accessibility gives everyone some advantages.

Advantages include:

h1-h6 are still available.

Images

You might be surprised to know that <img> was not in the original HTML.

<img> is actually badly designed:

... images

So what we have done is allowed the src attribute on any element. The image replaces the element content, and the element content is fallback. Essentially we have added fallback, moved the longdesc into the document, merged it with alt, and allowed it to be marked up all in one go.

<p src="map.gif">Walk down the steps from the platform,
   turn left, and walk on to the end of the street</p>

The <img> element is still available, but the alt text goes in the content (allowing it to be marked up):

<img src="w3c.png">W3C</img>
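
A sketch of how the fallback described above might nest, assuming that an element whose src cannot be loaded falls back to its content, and that this content may itself carry a src. The file names are hypothetical:

```xml
<!-- Try map.svg first; if it cannot be shown, try map.gif;
     if that fails too, show the marked-up text. -->
<p src="map.svg">
  <span src="map.gif">Walk down the steps from the platform,
  turn <em>left</em>, and walk on to the end of the street</span>
</p>
```

The innermost text plays the part that alt and longdesc play for <img> in HTML, but it can contain markup such as <em>.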

New elements

The WG has been bombarded with requests for new element types: <length>, <number>, <person>, <bibliographic-reference>, even <irony>.

How do you decide?

Then we realised that the 'role' attribute solved the problem, since the problem is not with structure, but with semantics.

role

The accessibility community needed a way to specify what a particular element was for.

Some examples: that a certain <div> was just a navbar, that another <div> was the main content, etc. So we introduced the 'role' attribute for this. You can now say:

<div role="navigation">...</div>
...
<div role="main">...</div>

but once we had that mechanism, it allowed us to add any semantics we wanted, layering it on top of the structure. For example:

<p role="note">...

but also

<span role="note">...
<table role="note">...

role values

role is in a way like class but with meaningful (semantic) values.

In fact, anyone can add their own role values, so that whole communities can agree on new semantics to overlay on to the content.

<span role="my:irony">
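
A minimal sketch of a community-defined role in context, assuming that a prefixed role value needs its prefix bound to a namespace in the usual XML way. The my prefix and the http://example.org/roles# URI are hypothetical:

```xml
<!-- The prefix "my" is bound to a namespace controlled by the community
     that defines the "irony" role. -->
<p xmlns:my="http://example.org/roles#">
  Of course, <span role="my:irony">nobody ever uses tables for layout</span>.
</p>
```

Software that does not know the my:irony role can ignore it and still process the paragraph as ordinary content.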

Apparently the mobile and device-independent communities (as well as accessibility) are very excited about the possibilities of using role.

In fact, you don't really need RSS anymore:

<h role="rss:title">...
<p role="rss:description">...

The big contentious issue: <hr/>

Half the world: get rid of <hr/>, it's only about presentation!

The other half: No! We use it all the time

Japan: Please, we need <vr/> too!

[Images: <hr> examples from James Joyce's Ulysses]

A consensus solution

These are all <hr>s!

<hr> is not presentational, but structural: a lightweight section separator. (I like to compare it with a comma).

The only thing wrong with <hr> is that it is not (necessarily) horizontal, and not (necessarily) a rule!

So in an attempt to achieve consensus, we decided to do away with all the confusion and rename <hr> to <separator>.

In this way, you get the <hr/> functionality, and hopefully we don't get the requests to remove it.
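
A minimal sketch of the renamed element in use, assuming <separator/> is an empty element placed where <hr/> would have gone. The surrounding content is illustrative only:

```xml
<section>
  <h>Chapter</h>
  <p>The first scene ends here.</p>
  <!-- A lightweight break between parts of a section; how it is drawn
       (a rule, a row of stars, extra space) is left to the style sheet. -->
  <separator/>
  <p>A new scene begins.</p>
</section>
```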

The danger of cherry-picking

Although many communities have produced the requirements for XHTML2, they often don't respect or understand the other communities' problems.

Several communities have said "if we could only add this one little bit of XHTML2 to XHTML 1.1, we'd be happy".

The risk is then you get 10 versions of XHTML 1.1: XHTML 1.1-accessible, XHTML 1.1-RDF, XHTML 1.1-DI, etc.

Or you can add all the solutions to XHTML 1.1, and then you get XHTML 2.

Some remaining sticky issues

MIME type

Namespace: half the world wants us to keep the same namespace, the other half wants us to change it.

Plus ça change!

XHTML2: "the one bright light"

"Simple functionality and common sense appear – at least temporarily – to have triumphed over byzantine theological imperatives."

"Is this a bright and shining star? I think so."

Some companies have already committed to adopting XHTML2, such as Vodafone and Time-Warner (even if it were never implemented in browsers, it would still be an excellent format for representing your information for filtering to other formats).

I have approached these companies to join the WG, but they have said "we don't have the resources available, but anyway, you are doing a great job, and the W3C process gives us sufficient input".

Conclusions

Current status: XHTML2 is ready for last call.

XHTML2 has been designed with huge amounts of input (I don't think we've reached the 10,000 issue-mail mark yet, but we are quite close, and each mail is usually several issues), and by a large community of people.

It tries to solve the extant HTML/XHTML1 problems in a unified, consistent manner; because of this, one solution often solves other problems into the bargain (for instance, the accessibility solutions mentioned above, or the RDF solution, which also solves the I18N problem of marking up title text).