The One Hundred Year Web

Steven Pemberton, CWI, Amsterdam

Cite as: Steven Pemberton, 2023, The One Hundred Year Web. In Proceedings of ACM Web Conference, ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3543873.3585578.

Abstract

The year 2023 marks the thirty-second anniversary of the World Wide Web being announced.

In the intervening years, the web has become an essential part of the fabric of society. Part of that is that huge amounts of information that used to be available (only) on paper are now available (only) electronically. One of the dangers of this is that owners of information often treat the data as ephemeral, and delete old information once it becomes out of date. As a result, society is at risk of losing large parts of its history.

So it is time to assess how we use the web, how it has been designed, and what we should do to ensure that in one hundred years' time (and beyond) we will still be able to access, and read, what we are now producing. We can still read 100-year-old books; that should not be any different for the web.

This paper takes a historical view of the web, starting from its early days: why it succeeded where similar systems emerging at the time did not, the things it did right, the mistakes that were made, and how it developed into the web we know today. It then asks to what extent the web meets the requirements of such an essential part of society's infrastructure, and what still needs to be done.

Keywords: World Wide Web, History, Design, Declarative principles, Markup, HTML, XML, XHTML, HTML5, Ephemera, Longevity, Data conservancy

The Web of the Long Now

New College Oxford Dining Hall in Victorian times

New College, Oxford, built 1379, has a dining hall with huge oak beams in the roof.

On discovering that the beams needed replacing, the college didn't know where to find new ones. So they asked the University forester.

"Which college are you from?" he asked.
"New College," they replied.
"Well, I've got your trees."

It turns out that around the time that New College was built, they planted new trees to be ready for when they would need them.

We don't see that sort of attitude much these days.

32

2023 is the 32nd anniversary of the World Wide Web being announced on 6 August 1991 by Tim Berners-Lee.

The internet had become open and international in November 1988 with the first European open node at the CWI Amsterdam.

The Original Web

The original web was not revolutionary; it just created the right combination of existing elements:

and possibly the most important one:

Another major property was that it was based on declarative principles.

The Declarative Web

Declarative techniques allow you to define what you are trying to achieve without having to say how it should be done.

<a href="talk.html" title="…" target="…" class="…">My Talk</a>

In a nutshell, the advantages are that it is:

Declarative Markup

The markup largely specified the role of the elements, rather than how they should appear.

For instance, an h1 was a top-level heading with no a priori requirement that it be displayed in any particular way, larger or in bold. It just stated its purpose.

Mistakes included hr (horizontal rule), and elements like b and i for bold and italic, which specify a visual property rather than a purpose, but most of the structure was purely declarative.

Advantages include machine and modality independence: you can just as easily 'display' such a document with a voice-reader as on a screen, without having to use heuristics to guess what is intended.
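As an illustration of machine independence, here is a minimal sketch that derives a document outline purely from heading roles, with no guessing from font sizes or weights. It uses Python's standard html.parser module; the class name OutlineParser, the function outline and the sample markup are assumptions made for this example, not part of any specification.

```python
from html.parser import HTMLParser

# Heading elements and the outline level each one declares.
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class OutlineParser(HTMLParser):
    """Collect (level, text) pairs for every heading in a document."""

    def __init__(self):
        super().__init__()
        self.outline = []    # finished headings
        self._level = None   # level of the heading being read, if any
        self._text = []      # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._level = HEADINGS[tag]
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline(html):
    """Return the list of (level, heading text) found in the markup."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


# Hypothetical sample document.
sample = """
<h1>The Web of the Long Now</h1>
<p>Some text.</p>
<h2>The <em>Declarative</em> Web</h2>
<p>More text.</p>
"""

for level, text in outline(sample):
    print("  " * (level - 1) + text)
```

The same list of headings could equally be handed to a voice-reader or a table-of-contents generator, since nothing in it depends on how the headings happen to be displayed.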

Style Sheets

Another advantage of declarative markup is that since display properties are not baked in to the language, you can use style sheets to control the display properties of a document, without altering the document itself.

In fact, one of the first activities of the newly-created W3C was to add style sheets as quickly as possible to undo the damage being done by the browser manufacturers, who were unilaterally adding visually-oriented elements to HTML, such as font and blink.

The result, CSS, is another example of a successful declarative approach.

Implementers as Designers

The original HTML surprisingly did not have facilities for embedding images into documents, so they were added by the implementers of the first really successful browser, Mosaic.

<img src="…">

This has two regrettable, related, disadvantages:

Mosaic also introduced the unfortunate blink and font elements as well as that excrescence the frameset, with its security and usability problems.

Better Images

A better design would have allowed the element to have content to be used in fallback cases:

<img src="cat.png">
   <img src="cat.jpg">
      A <em>cat</em>, sitting on a mat.
   </img>
</img>

The advantage of such a design to visually impaired users of the web should be obvious.

When PNG images were introduced on the web, their usage was held back for a long time because of the lack of such a mechanism.
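The fallback behaviour of such a nested design can be sketched as follows. This is only an illustration of the hypothetical markup shown above (no browser implements img this way): the function name choose_rendering and the idea of describing a browser by a set of supported file extensions are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The hypothetical nested-fallback markup from the text.
markup = """
<img src="cat.png">
   <img src="cat.jpg">
      A <em>cat</em>, sitting on a mat.
   </img>
</img>
"""


def choose_rendering(element, supported_types):
    """Pick the outermost image the browser can show, else the text content.

    Returns ("image", src) or ("text", fallback_text).
    """
    while True:
        src = element.get("src", "")
        extension = src.rsplit(".", 1)[-1].lower()
        if extension in supported_types:
            return ("image", src)
        inner = element.find("img")
        if inner is None:
            # No further image to try: use the element's content as text,
            # with runs of whitespace collapsed.
            text = " ".join("".join(element.itertext()).split())
            return ("text", text)
        element = inner


root = ET.fromstring(markup)
print(choose_rendering(root, {"png", "jpg"}))  # a browser that knows PNG
print(choose_rendering(root, {"jpg"}))         # an older browser without PNG
print(choose_rendering(root, set()))           # a text-only or voice browser
```

With this structure an author could have offered a new format such as PNG immediately, since browsers that did not support it would simply fall through to the next alternative.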

HTML 4

One of the early tasks of the nascent W3C was to try and undo the damage being inflicted on the web by the implementers.

By then there were two warring browsers both adding new things, often incompatible, and without consulting the community.

The W3C result was HTML4, a compromise between the different browsers, but with a clear development path.

Development

HTML4 → XML → XHTML → Modularisation → XHTML 1.1/Print/Basic → XHTML2

XHTML

XHTML+SVG+MathML

XML allowed the mixing of namespaces in a single document. This added enormous amounts of functionality.

The image is an example from 2002 of a single document (that ran in browsers already) combining XHTML, SVG and MathML.

HTML5: A New Web, by Programmers, for Programmers

HTML5 has taken a completely different path.

It has been driven entirely by implementers, with little reference to users, predicated on procedural methods, disregarding the fundamental design principles of the web, and eschewing modularity.

Essentially HTML has become a monolithic programming environment.

Design

One of the design principles quoted was "Pave the Cowpaths".

But the HTML5 design document got it wrong:

"When a practice is already widespread among authors, consider adopting it rather than forbidding it or inventing something new. Authors already use the <br/> syntax as opposed to <br> in HTML and there is no harm done by allowing that to be used."

This is not "Paving the cowpaths", which would be more like noticing that huge numbers of sites have a navigation drop-down, and supporting that natively.

Cow-paths

But even "Paving the cowpaths" is not necessarily a good design practice in itself.

Cows are not designers. Cowpaths are data. If you pave cowpaths, you are setting in stone the behaviours caused by the design decisions of the past.

Cowpaths tell you where the cows want to go, not how they want to get there. If they have to take a path round a swamp to get to the meadow, then maybe it would be a better idea to drain the swamp, or build a bridge over it, rather than paving the path they take round it.

Faulty Cowpath-based Design

As an example, take the rev attribute.

<link rel="next" href="chap2.html"/>
<link rev="prev" href="chap2.html"/>

rel and rev are complementary attributes; they are a pair, like +/-, up/down, left/right.

The HTML5 group decided that not enough people were using @rev, and so removed it.

This breaks backwards compatibility, and puts a fence before those who do need to use it.

This is doubly bad in the light of another of their design principles: "Support Existing Content".
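The pairing of rel and rev can be made concrete with a small sketch that turns link attributes into (subject, relation, object) statements. The function name link_statements, the dictionary representation of a link element and the file name chap1.html are assumptions made for this illustration.

```python
def link_statements(document, links):
    """Turn link attributes into (subject, relation, object) statements.

    `links` is a list of dicts holding the attributes of each link element
    found in `document`.
    """
    statements = []
    for attrs in links:
        target = attrs["href"]
        if "rel" in attrs:
            # rel: the relation runs from this document to the target.
            statements.append((document, attrs["rel"], target))
        if "rev" in attrs:
            # rev: the relation runs from the target back to this document.
            statements.append((target, attrs["rev"], document))
    return statements


# The two link elements from the example, assumed to appear in chap1.html.
links = [
    {"rel": "next", "href": "chap2.html"},
    {"rev": "prev", "href": "chap2.html"},
]

for statement in link_statements("chap1.html", links):
    print(statement)
```

The second statement has chap2.html as its subject. Without rev, it can only be made by editing chap2.html itself, which is not possible when the author does not control that document.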

Irritated by Colon Disease

For years, the wider community on the web had agreed to use a colon (:) to separate a name from the identification of the vocabulary it comes from.

But for some reason a new separator was developed for HTML5: the hyphen. For example:

<div role="searchbox"
     aria-labelledby="label" 
     aria-placeholder="MM-DD-YYYY">03-14-1879</div>

apparently re-inventing namespaces.

This also went against another of their design principles: Do not Reinvent the Wheel.
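The practical difference between the two separators can be seen with any namespace-aware XML parser. In the minimal sketch below, the vocabulary URI is a made-up placeholder and the helper vocabulary_of is a name chosen for this example; the parser is Python's standard xml.etree.ElementTree, which reports qualified names in the form {uri}local.

```python
import xml.etree.ElementTree as ET

# The vocabulary URI is a made-up placeholder for illustration only.
markup = """
<div xmlns:aria="http://example.org/vocab/aria#"
     aria:placeholder="MM-DD-YYYY"
     aria-placeholder="MM-DD-YYYY">03-14-1879</div>
"""


def vocabulary_of(attribute_name):
    """Split ElementTree's '{uri}local' form into (uri, local).

    The uri is None when the name carries no vocabulary information.
    """
    if attribute_name.startswith("{"):
        uri, local = attribute_name[1:].split("}", 1)
        return uri, local
    return None, attribute_name


element = ET.fromstring(markup)
for name in sorted(element.attrib):
    print(vocabulary_of(name))
```

The colon form is resolved by the parser to a vocabulary plus a local name. The hyphen form is just an opaque string, so every processor has to know the aria- convention separately in order to recognise it.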

Reinventing the Wheel

Despite not reinventing things being one of the design principles, that precept wasn't followed. As has been noted:

"The amount of “not invented here” mentality that [pervades] the modern HTML5 spec is odious. Accessibility in HTML5 isn’t being decided by experts."

Many groups had already produced solutions that HTML5 could have adopted, but the HTML5 group decided to reinvent them, usually with worse results, since its members were not experts in those areas.

Not Invented Here: Microdata

To take an example, consider RDFa. This came as the result of the question: How should you represent general metadata in HTML?

In 2008 the RDFa Recommendation was released, after more than 5 years of work.

A year later in 2009 the HTML5 group created Microdata out of the blue with no warning, discussion or consultation, clearly copied from RDFa (it used the same attributes), but different, and less capable.

Forward compatibility: Empty elements

One major improvement that XML introduced was a new notation for empty elements: <br/>.

This one simple change meant that you always knew when an element was empty.

Incomprehensibly, HTML5 dropped the requirement for this notation, meaning that a processor now has to know which elements are empty, and making it impossible to add new empty elements to HTML.
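A minimal sketch of the consequence, using only Python's standard library. The class NestingParser, the function text_runs, the deliberately short VOID_ELEMENTS list and the element name newthing are all assumptions made for this illustration; this is not a conforming HTML5 parser, only a demonstration of why a built-in list is needed.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements this processor has been told are always empty
# (a deliberately short, illustrative list).
VOID_ELEMENTS = {"br", "hr", "img", "link", "meta", "input"}


class NestingParser(HTMLParser):
    """Record each text run with the stack of elements open around it."""

    def __init__(self):
        super().__init__()
        self.open_elements = []
        self.runs = []

    def handle_starttag(self, tag, attrs):
        # Only the built-in list says "do not wait for an end tag".
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_elements:
            # Close everything up to and including the named element.
            while self.open_elements.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip():
            self.runs.append(("/".join(self.open_elements), data.strip()))


def text_runs(html):
    parser = NestingParser()
    parser.feed(html)
    parser.close()
    return parser.runs


# Without the XML notation: br is in the list, the new element is not,
# so the text after it is wrongly treated as its content.
print(text_runs("<p>one<br>two<newthing>three</p>"))

# With the XML notation: emptiness is visible in the markup itself,
# so an element the parser has never heard of is still handled correctly.
root = ET.fromstring("<p>one<br/>two<newthing/>three</p>")
print([(child.tag, len(child), child.tail) for child in root])
```

In the first case the text "three" ends up inside newthing, because the processor has no way of knowing the element was meant to be empty. In the second case both br and newthing are reported as childless, with the following text as their tail.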

Programming

Rather than being designed, HTML5 uses programming to solve its design problems, and Javascript as the basis of functionality.

The result is that standardisation has become compromised.

Frameworks

As a result of the programming-instead-of-design problem, frameworks have emerged.

Now we have some twenty-odd versions of HTML instead of just the one. They are all different.

If a framework dies, or changes its licensing, you have to rewrite your whole website! There is no standardisation any more!

The use of frameworks has created bloat, slowed the web, and limited accessibility:

To look at the web-page of one single tweet of 140 characters, you have to download just under a megabyte. It's 5200 lines of HTML before you even get to the five Javascript packages. The whole of James Joyce's Ulysses is only half as long again.

Complexity

Finally, HTML5 has become so complex that implementers have found it hard to implement.

This has led to an impoverishment of the browser landscape: several browser makers, even Microsoft, have given up trying and instead just put a new wrapper around Google's Chrome browser.

This is regrettable, giving a single player disproportionate power over the web, and risking turning the web into a monoculture.

Conclusion

A sustainable web needs Modularity, Extensibility, Accessibility, and Standardisation, based on Declarative Principles.

A 100-year web is needed because the web is now the way that information is distributed. The web pages that are being created now need to be readable in 100 years' time, just as 100-year-old books are still readable.

Requiring a web-page to depend on a particular 100-year-old implementation of Javascript and a framework which hasn't been supported for 70 years and of which the creators are all dead is not in any sense future-proof.

Future

The web started off as a simple, easy-to-use, easy-to-write-for infrastructure.

Programmers have remodelled HTML in their own image, and made it complicated, hard to implement, and hard to write for, excluding many potential creators.

Hopefully, in the not-too-distant future, the web community can come together again to try and undo the damage being inflicted on the web by the implementers, and bring it back to its declarative roots.

At least declarative markup is easier to keep alive because it is independent of implementation!

References

[ARIA] James Craig et al. (eds), 2014, Accessible Rich Internet Applications (WAI-ARIA) 1.0, W3C, https://www.w3.org/TR/wai-aria-1.0/

[Basic] Mark Baker et al. (eds), 2000, XHTML Basic, W3C, https://www.w3.org/TR/2000/REC-xhtml-basic-20001219/

[Brand] Stewart Brand, 1993, How Buildings Learn, Viking Press, ISBN 0140139966

[CSS] H.W. Lie et al. (eds), 1996, Cascading Style Sheets, level 1, W3C, https://www.w3.org/TR/REC-CSS1/

[CSSquirrel] Kyle Weems, 2009, Behold Leviathan, Confused, http://cssquirrel.com/blog/2009/08/03/behold-leviathan-confused/

[CWI] CWI, 2018, CWI celebrates 30 years of Open Internet in Europe, https://www.cwi.nl/news/2018/cwi-celebrates-30-year-of-open-internet-in-europe

[Design] Anne van Kesteren et al. (eds), 2007, HTML Design Principles, W3C, https://www.w3.org/TR/html-design-principles/

[Freinbichler] Marcel Freinbichler, 2018, Tweet, https://twitter.com/fr3ino/status/1000166112615714816

[History] Various, History of the World Wide Web, Wikipedia, https://en.wikipedia.org/wiki/History_of_the_World_Wide_Web

[HTML4] Dave Raggett et al. (eds), 1997, HTML 4.0 Specification, W3C, https://www.w3.org/TR/REC-html40-971218/

[HTML5] Anonymous, 2022, HTML5, WHATWG, https://html.spec.whatwg.org/multipage/

[JSSS] Wikipedia, JavaScript Style Sheets, https://en.wikipedia.org/wiki/JavaScript_Style_Sheets

[Lidwell] William Lidwell et al., 2010, Universal Principles of Design, Rockport Publishers, ISBN 1-59253-587-9

[M12N] Murray Altheim et al. (eds), 2001, Modularization of XHTML, W3C, https://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/

[Microdata] Ian Hickson (ed.), 2010, HTML Microdata, W3C, https://www.w3.org/TR/2010/WD-microdata-20100304/

[Norwich] Norwich Record Office, 2016, The Norwich Computer, 1957, https://norfolkrecordofficeblog.org/2016/04/29/the-norwich-computer-1957/

[Pemberton] Steven Pemberton, 2020, On the design of the URL, in Proc. Declarative Amsterdam 2020, Amsterdam, The Netherlands, https://declarative.amsterdam/article?doi=da.2020.pemberton.design

[Pistacchio] ‘pistacchio’, 2016, I’m a web developer and I’ve been stuck with the simplest app for the last 10 days, Medium, https://medium.com/@pistacchio/i-m-a-web-developer-and-i-ve-been-stuck-with-the-simplest-app-for-the-last-10-days-fb5c50917df#.i7o9ivu3x

[Print] Melinda Grant et al. (eds), 2006, XHTML-Print, W3C, https://www.w3.org/TR/2006/REC-xhtml-print-20060920/

[RDFa] Ben Adida et al. (eds), 2008, RDFa in XHTML: Syntax and Processing, W3C, http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/

[Weakley] Russ Weakley, 2015, Front End Frameworks - are they accessible? Slideshare, https://www.slideshare.net/maxdesign/front-end-frameworks-are-they-accessible

[XForms] John M. Boyer (ed.), 2009, XForms 1.1, W3C, https://www.w3.org/TR/xforms11/

[XML] Tim Bray et al. (eds), 1998, Extensible Markup Language (XML) 1.0, W3C, http://www.w3.org/TR/1998/REC-xml-19980210

[XHTML1] Steven Pemberton et al. (eds), 2000, XHTML™ 1.0: The Extensible HyperText Markup Language, W3C, http://www.w3.org/TR/2000/REC-xhtml1-20000126

[XHTML11] Murray Altheim et al. (eds), 2001, XHTML™ 1.1 - Module-based XHTML, W3C, https://www.w3.org/TR/2001/REC-xhtml11-20010531/

[XHTML2] Mark Birbeck et al. (eds), 2010, XHTML 2.0, W3C, https://www.w3.org/TR/2010/NOTE-xhtml2-20101216/

[XMS] 石川 雅康 (ISHIKAWA Masayasu), 2002, An XHTML + MathML + SVG Profile, W3C, https://www.w3.org/TR/XHTMLplusMathMLplusSVG/