XHTML, Usability and Accessibility

Steven Pemberton, CWI and W3C
Chair W3C HTML Working Group, and Forms Working Group

A Thought Experiment

What are the features of websites that you go back to regularly, that differentiate them from websites with the same purpose?

Differentiating features

Forrester did some research on this. Good content came top of the list, with usability second (the rest is noise: 14% and lower).

HTML is a mess!

Rather than being designed, HTML just grew, with different people adding whatever they fancied to it.

We now have three versions:

  1. Loose/transitional, which is just a garbage can of whatever anybody fancied adding to the language before it got transferred to the W3C, and in particular is full of accessibility problems.
  2. Frameset, which we as usability experts know is a walking, talking usability disaster.
  3. Strict, which is the closest we could manage at the time to the HTML we really wanted.
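For reference, a document declares which of the three versions it is written in through its DOCTYPE. As an illustration, these are the three HTML 4.01 declarations, one per version; a document uses exactly one of them, at the very top. XHTML 1.0 has the same three flavours, with its own identifiers.

```html
<!-- 1. Loose/transitional -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">

<!-- 2. Frameset -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
    "http://www.w3.org/TR/html4/frameset.dtd">

<!-- 3. Strict -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
```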

Usability and the web

It's worth repeating: Forrester Research demonstrated that usability is the second-most important property of a website after good content.

This means it is important that the markup language of the web supports that aim.

The New HTMLs

To this end, we are designing a new family of HTMLs, in XML.

It gives us the opportunity to try and clear up the mess.
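As a minimal sketch of what writing HTML in XML means in practice, here is a small XHTML 1.1 page. The page content is invented for illustration; what matters are the XML rules visible in it: lower-case names, quoted attribute values, every element closed, and empty elements written with a closing slash.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>An example page</title>
  </head>
  <body>
    <!-- Every <p> has its end tag; empty elements end in "/>" -->
    <p>First line<br />second line</p>
    <p><img src="pic.gif" alt="Me, en route for France" /></p>
  </body>
</html>
```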

The Design of Notations

There is an amusing platitude that goes "A camel is a horse designed by committee".

This is of course an insult to camels, which are perfectly designed for their environment. Just try putting a horse in a desert and see what happens.

Something that, as it happens, wasn't designed by a committee is the <img> tag in HTML. This element specifies an image for inclusion in a page, and has the form

   <img src="pic.gif"
       alt="Me, en route for France">

The (bad) design of <img>

This was badly designed in three ways:

  1. It wasn't backwards compatible: browsers that didn't know about <img> just showed nothing at that point.
  2. It didn't allow for any fallback apart from the 'alt' text. In other words, you can only use a type of image that all browsers understand (usually GIF or JPEG). This has seriously held back the acceptance of PNG images, which have far superior user-experience properties to either GIF or JPEG (lossless full-colour images with full alpha-channel transparency, for instance).
  3. The alternative text is just plain text: you can't mark it up in any way, for instance to put part of it in italics.

An alternative design

If <img> had been designed well, firstly it would be called <image> (contractions are OK, but there's no need to overdo it: why <img> when you have <blockquote>?), and secondly it would have a fallback possibility like this:

    <image source="pic.png">
        <image source="pic.gif">
            Me, <em>en route</em> for France
        </image>
    </image>

This would display a PNG graphic if the browser supported PNG, otherwise a GIF graphic, and if all else failed (or images were turned off) the marked-up text. Browsers that had never heard of the <image> tag would still display something sensible.

What <object> is about

This is why the <object> element was later added to HTML (by a committee), with exactly these properties, and able to handle things other than images, such as MPEG movies, into the bargain. It is also why you should be thinking about moving from <img> to <object>.
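Applied to the earlier example, the same nested fallback can be sketched with <object>. The file names are the hypothetical ones from the <image> example, and the type values are the standard MIME types for PNG and GIF. Browser support for <object> fallback has historically been uneven, so treat this as a pattern to test rather than a guarantee.

```html
<!-- Used if the browser can display PNG -->
<object data="pic.png" type="image/png">
    <!-- Otherwise try the GIF version -->
    <object data="pic.gif" type="image/gif">
        <!-- Last resort (or images turned off): marked-up text -->
        Me, <em>en route</em> for France
    </object>
</object>
```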

Frames

Another thing not designed by a committee, but just added to a product without consultation, was Frames.

These are infamous in the interaction community for their lack of user-friendliness, first and foremost for the way they broke the [back] button.

But I have tried to identify all the problems with frames, and this is what I have come up with:

The design flaws of Frames

  1. The [back] problem (which still exists, even if it is not as bad as it was originally).
  2. You can't bookmark a frameset in the combination of pages it holds.
  3. 'Page up' and 'page down' usually don't work properly: even if there's only one frame that is scrollable, you usually have to click in it before you can use the paging keys.
  4. If you do a 'reload', you often get a different result from the one you started with.
  5. There are security worries when people combine pages from different sites in one frameset.
  6. Occasionally you can get trapped in a frameset, or you can get nested framesets, and it can be really difficult to get out.
  7. Search engines only find the pages included in the frames, so search results lead to those pages without their intended navigation.
  8. Almost no one provides <noframes> content, which seriously weakens searches with search engines like Google.
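Providing <noframes> content costs very little. A minimal frameset might look like this, with nav.html and main.html as hypothetical page names; in the HTML 4.01 Frameset DTD the content of <noframes> is a <body>.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
    "http://www.w3.org/TR/html4/frameset.dtd">
<html>
  <head>
    <title>Example site</title>
  </head>
  <frameset cols="20%, 80%">
    <frame src="nav.html" name="nav" title="Navigation">
    <frame src="main.html" name="main" title="Main content">
    <noframes>
      <body>
        <!-- Shown by browsers without frames, and typically what
             a search engine indexes for the frameset page itself -->
        <p>Go to the <a href="nav.html">site navigation</a>
           or straight to the <a href="main.html">main content</a>.</p>
      </body>
    </noframes>
  </frameset>
</html>
```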

Frames? Huh! What are they good for?

In fact I can only find two compelling uses for frames:

  1. A search and display interface, where the results of some time-consuming search operation are shown in one frame, and clicking on those displays the resultant page in another frame.
  2. When script variables need to be kept across a series of pages.
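The first use relies on named frames and the target attribute: a link in one frame loads its page into another. A sketch, in which all the file and frame names are hypothetical:

```html
<!-- The frameset: results on the left, the chosen document on the right -->
<frameset cols="30%, 70%">
  <frame src="results.html" name="results" title="Search results">
  <frame src="welcome.html" name="display" title="Selected document">
</frameset>

<!-- Inside results.html: each hit opens in the "display" frame,
     so the expensive result list stays where it is -->
<a href="doc1.html" target="display">First result</a>
<a href="doc2.html" target="display">Second result</a>
```

Note that target is not part of HTML 4.01 Strict, so results.html would have to use the Transitional DTD.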

So is there hope?

All in all, a pretty damning charge sheet, which makes it surprising that Frames are so widely used.

What is also surprising is that if Frames had been designed in a slightly different way, most of the problems would disappear. If they had been designed not as a variant of HTML, but as a separate sort of document taking the content HTML documents as parameters, such as

    http://www.example.com/home.frm?nav.html;main.html;banner.html

many of the usability problems mentioned above would never have arisen (back, bookmark, page up/down, reload, security, trapped, search results, no frames).

Conclusion on Frames

My conclusion? That "Many eyes make all bugs shallow" also applies to user-interface design bugs, and so user-interface people should be involved in the design of notations.

Presentation

HTML was designed as a structure language, not a presentation language.

Now that we have CSS (96% or more of people use a browser that supports it), you shouldn't use <font> and similar presentational elements.

So we can remove loads of presentation-oriented elements from XHTML.
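The difference is easy to see side by side. In this sketch the text and the class name warning are invented for illustration:

```html
<!-- Presentational: the look is repeated at every use, and nothing
     says what the text actually is -->
<p><font face="Arial" color="red"><b>Warning:</b> unsaved changes
   will be lost.</font></p>

<!-- Structural: the markup says what the text is... -->
<p class="warning"><strong>Warning:</strong> unsaved changes
   will be lost.</p>

<!-- ...and one style rule, which belongs in the document head or an
     external style sheet, says how every warning looks -->
<style type="text/css">
  p.warning { font-family: Arial, sans-serif; color: red; }
</style>
```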

But:

Many blind people complain about <hr> being used for 'pseudo-structure': a line that looks like a section break but says nothing about the structure of the document.
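As an illustration, compare a rule that only looks like a division with grouping that is actually in the markup. The class name section is a hypothetical convention:

```html
<!-- Pseudo-structure: the rule only looks like the end of a section -->
<p>Bla bla bla</p>
<hr>
<p>Bla bla bla</p>

<!-- Real structure: the grouping is in the markup, and a dividing
     line can still be drawn with CSS, for example as a border -->
<div class="section">
  <p>Bla bla bla</p>
</div>
<div class="section">
  <p>Bla bla bla</p>
</div>
```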

Headings

Headings such as <h1>, <h2> etc. should be used in a structured way (an <h3> should only come after an <h2>, and so on).

Sight-impaired people in particular have great difficulty understanding pages that mix up the heading levels or, even worse, use <b> and <font> for headings.
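A sketch, with invented heading text: the first outline is the kind that causes trouble, the second is the same page with the levels used in order.

```html
<!-- Hard to follow: a level skipped, and bold text posing as a heading -->
<h1>Products</h1>
<h3>Prices</h3>
<p><b><font size="4">Delivery</font></b></p>

<!-- Easy to follow: each level sits directly under the one above -->
<h1>Products</h1>
<h2>Prices</h2>
<h2>Delivery</h2>
```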

A possible solution

One solution would be to abolish <h1>, <h2>, etc., and replace them with nested <section> elements and a single <h> element, whose level follows from how deeply it is nested:

    <section>
        <h>Chapter 1</h>
        <p>Bla bla bla</p>
        <section>
            <h>This is an h2</h>
            <p>Bla bla bla</p>
        </section>
    </section>

More semantic markup

The 'class' attribute allows you to add extra information about an element, but there are no standards for what the values of 'class' should be.

    <p class="warning">

Should there be new markup that makes clear what a piece of content means (a name, an address, a date, and so on)? And if so, where should we stop?
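To show the gap, here is how class tends to be pressed into service. The class names name and date, and the text, are hypothetical: nothing standardises these values, so a browser, screen reader or search engine cannot know what they mean. HTML's own <address> element, by contrast, carries a standard meaning, but only for contact information for a document.

```html
<!-- Meaning by private convention only: to any software, "name"
     and "date" are just strings -->
<p>Please reply to <span class="name">A. N. Other</span>
   by <span class="date">1 March</span>.</p>

<!-- A built-in element with standard meaning -->
<address>A. N. Other, Example Organisation</address>
```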

Conclusion

The design of notations needs HCI people to be involved.

XHTML 1.1, which has just come out, marks the first time HTML has got smaller on moving to a new version.

Currently we are designing the next, the 'real', version of XHTML.