Cite as: Steven Pemberton, 2023, The One Hundred Year Web. In Proceedings of ACM Web Conference, ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3543873.3585578.
The year 2023 marks the thirty-second anniversary of the World Wide Web being announced.
In the intervening years, the web has become an essential part of the fabric of society. One consequence is that huge amounts of information that used to be available (only) on paper are now available (only) electronically. One of the dangers of this is that owners of information often treat the data as ephemeral, and delete old information once it becomes out of date. As a result, society is at risk of losing large parts of its history.
So it is time to assess how we use the web, how it has been designed, and what we should do to ensure that in one hundred years' time (and beyond) we will still be able to access, and read, what we are now producing. We can still read 100-year-old books; that should not be any different for the web.
This paper takes a historical view of the web, discussing it from its early days: why it was successful compared with other similar systems emerging at the time, the things it did right, the mistakes that were made, how it has developed into the web we know today, to what extent it meets the requirements of such an essential part of society's infrastructure, and what still needs to be done.
Keywords: World Wide Web, History, Design, Declarative principles, Markup, HTML, XML, XHTML, HTML5, Ephemera, Longevity, Data conservancy
New College, Oxford, built in 1379, has a dining hall with huge oak beams in the roof.
When the beams needed replacing, the college didn't know where to go, so they asked the University forester.
"Which college are you from?" he asked.
"New College," they replied.
"Well, I've got your trees".
It turns out that around the time that New College was built, they planted new trees to be ready for when they would need them.
We don't see that sort of attitude much these days.
2023 is the 32nd anniversary of the World Wide Web being announced on 6 August 1991 by Tim Berners-Lee.
The internet had become open and international in November 1988, with the first European open node at CWI in Amsterdam [CWI].
The original web was not revolutionary; it just combined the right existing elements:
and possibly the most important one:
Another major property was that it was based on declarative principles.
Declarative techniques allow you to define what you are trying to achieve without having to say how it should be done.
<a href="talk.html" title="…" target="…" class="…">My Talk</a>
In a nutshell, the advantages are that it is:
The markup largely specified the role of the elements, rather than how they should appear.
For instance, an h1 was a top-level heading with no a priori requirement that it be displayed in any particular way, larger or in bold. It just stated its purpose.
Mistakes included hr (horizontal rule), and elements like b and i for bold and italic, which specify a visual property rather than a purpose, but most of the structure was purely declarative.
Advantages include machine and modality independence: you can just as easily 'display' such a document with a voice-reader as on a screen, without having to use heuristics to guess what is intended.
Another advantage of declarative markup is that, since display properties are not baked into the language, you can use style sheets to control the display properties of a document without altering the document itself.
In fact, one of the first activities of the newly created W3C was to add style sheets as quickly as possible, to undo the damage being done by the browser manufacturers, who were unilaterally adding visually-oriented elements such as font and blink to HTML.
The result, CSS, is another example of a successful declarative approach.
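The separation described above can be sketched in a few lines of Python. This is a hypothetical illustration, not how any browser or CSS engine works: the document records only element roles, and two separate mappings, standing in for a screen style sheet and a voice style sheet, decide the presentation. The names `Node`, `SCREEN_STYLE`, `SPEECH_STYLE` and `render` are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Node:
    """One element of a declarative document: its role and its text, nothing more."""
    role: str  # element name, e.g. "h1" (top-level heading) or "p" (paragraph)
    text: str

# The document states purpose only; it contains no presentation at all.
DOCUMENT: List[Node] = [
    Node("h1", "The One Hundred Year Web"),
    Node("p", "We can still read 100-year-old books."),
]

Style = Dict[str, Callable[[str], str]]

# Hypothetical "style sheets": role-to-presentation mappings kept outside the document.
SCREEN_STYLE: Style = {
    "h1": lambda text: text.upper(),  # crude stand-in for "large and bold"
    "p": lambda text: text,
}
SPEECH_STYLE: Style = {
    "h1": lambda text: f"Heading: {text}.",  # a voice reader announces the role
    "p": lambda text: text,
}

def render(document: List[Node], style: Style) -> List[str]:
    """Present every node through the chosen style; the document itself is never modified."""
    return [style[node.role](node.text) for node in document]

print(render(DOCUMENT, SCREEN_STYLE))
print(render(DOCUMENT, SPEECH_STYLE))
```

Swapping `SCREEN_STYLE` for `SPEECH_STYLE` changes how the heading is presented without touching `DOCUMENT`, which is exactly the property that presentational elements such as font and blink give up.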
The original HTML surprisingly did not have facilities for embedding images into documents, so they were added by the implementers of the first really successful browser, Mosaic.
<img src="…">
This has two regrettable, related, disadvantages:
Netscape Navigator, built by former Mosaic developers, also introduced the unfortunate blink and font elements, as well as that excrescence, the frameset, with its security and usability problems.
A better design would have allowed the element to have content, to be used as a fallback when the image cannot be displayed:
<img src="cat.png">
   <img src="cat.jpg">
      A <em>cat</em>, sitting on a mat.
   </img>
</img>
The advantage of such a design to visually impaired users of the web should be obvious.
When PNG images were introduced on the web, their usage was held back for a long time because of the lack of such a mechanism.
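The fallback behaviour proposed above can be sketched as follows. Nested img elements with content are not valid HTML, so this hypothetical Python sketch parses the cat example with the standard `xml.etree.ElementTree` module and walks the alternatives itself. The helper `resolve`, and the idea of judging support from the file extension, are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET
from typing import Set, Tuple

# The nested-img example from the text (hypothetical markup, not real HTML).
MARKUP = '<img src="cat.png"><img src="cat.jpg">A <em>cat</em>, sitting on a mat.</img></img>'

def resolve(element: ET.Element, supported: Set[str]) -> Tuple[str, str]:
    """Return ("image", src) for the first displayable image, else ("text", fallback)."""
    src = element.get("src", "")
    extension = src.rsplit(".", 1)[-1].lower()  # assumption: format judged by extension
    if extension in supported:
        return ("image", src)
    nested = element.find("img")  # the next, less preferred alternative
    if nested is not None:
        return resolve(nested, supported)
    # No image can be shown: fall back to the text content, including text inside <em>.
    return ("text", "".join(element.itertext()).strip())

root = ET.fromstring(MARKUP)
print(resolve(root, {"png", "jpg"}))  # a browser that supports PNG
print(resolve(root, {"jpg"}))         # an early browser without PNG support
print(resolve(root, set()))           # a voice reader, or images switched off
```

With PNG support the first image is used; without it the JPEG is chosen; and with no image support at all, the description "A cat, sitting on a mat." is what gets displayed or read aloud.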
One of the early tasks of the nascent W3C was to try to undo the damage being inflicted on the web by the implementers.
By then there were two warring browsers, both adding new and often incompatible features without consulting the community.
The W3C result was HTML4, a compromise between the different browsers, but with a clear development path.
HTML4 → XML → XHTML → Modularisation → XHTML 1.1/Print/Basic → XHTML2
XML allowed the mixing of namespaces in a single document. This added enormous amounts of functionality.
This image shows an example from 2002 of a single document, which already ran in browsers at the time, combining XHTML, SVG and MathML [XMS].
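A much smaller document in the same spirit can be processed with Python's standard `xml.etree.ElementTree`. The three namespace URIs are the ones W3C defines for XHTML, SVG and MathML; the document content and the helper `vocabularies` are hypothetical, and this is not the 2002 example shown in the image.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"
MATHML = "http://www.w3.org/1998/Math/MathML"

# One document mixing three vocabularies; each part switches namespace with xmlns.
DOC = f"""<html xmlns="{XHTML}">
  <body>
    <p>A circle:</p>
    <svg xmlns="{SVG}" width="20" height="20"><circle cx="10" cy="10" r="8"/></svg>
    <p>Its area:</p>
    <math xmlns="{MATHML}">
      <mi>A</mi><mo>=</mo><mi>π</mi><msup><mi>r</mi><mn>2</mn></msup>
    </math>
  </body>
</html>"""

def vocabularies(root: ET.Element) -> Dict[str, Set[str]]:
    """Group element names by the namespace (vocabulary) they come from."""
    found: Dict[str, Set[str]] = {}
    for element in root.iter():
        # ElementTree writes a qualified name as "{namespace-uri}local-name".
        uri, _, local = element.tag[1:].partition("}")
        found.setdefault(uri, set()).add(local)
    return found

for uri, names in vocabularies(ET.fromstring(DOC)).items():
    print(uri, sorted(names))
```

A generic XML processor needs no knowledge of SVG or MathML to see where each vocabulary starts and ends; software that understands one of them can pick out its own elements and leave the rest alone.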
HTML5 has taken a completely different path: driven entirely by implementers, with little reference to users, predicated on procedural methods, disregarding the fundamental design principles of the web, and eschewing modularity.
Essentially HTML has become a monolithic programming environment.
One of the design principles quoted was "Pave the Cowpaths".
But the HTML5 design document got it wrong:
"When a practice is already widespread among authors, consider adopting it rather than forbidding it or inventing something new. Authors already use the <br/> syntax as opposed to <br> in HTML and there is no harm done by allowing that to be used."
This is not "Paving the cowpaths", which would be more like noticing that huge numbers of sites have a navigation drop-down, and supporting that natively.
But even "Paving the cowpaths" is not necessarily a good design practice in itself.
Cows are not designers. Cowpaths are data. If you pave cowpaths, you are setting in stone the behaviours caused by the design decisions of the past.
Cowpaths tell you where the cows want to go, not how they want to get there. If they have to take a path round a swamp to get to the meadow, then maybe it would be a better idea to drain the swamp, or build a bridge over it, rather than paving the path they take round it.
As an example, take the rev attribute.
<link rel="next" href="chap2.html"/> <link rev="prev" href="chap2.html"/>
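rel and rev are complementary attributes: they are a pair, like +/-, up/down, left/right. In the example above, the two link elements say the same thing from opposite directions: chap2.html is the next document after this one, and this document is the previous one before chap2.html.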
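One way to see the pairing is to read each link as a statement with a subject, a relation and an object, roughly as RDFa does: rel makes the current document the subject, and rev makes it the object. The following Python sketch illustrates that reading; the name `chap1.html` for the current document, and the helpers `link_to_statement` and `as_next`, are hypothetical.

```python
from typing import Dict, NamedTuple

class Statement(NamedTuple):
    subject: str
    relation: str
    object: str

def link_to_statement(current: str, attrs: Dict[str, str]) -> Statement:
    """rel reads from the current document to href; rev reads from href back to it."""
    href = attrs["href"]
    if "rel" in attrs:
        return Statement(current, attrs["rel"], href)
    if "rev" in attrs:
        return Statement(href, attrs["rev"], current)
    raise ValueError("a link needs a rel or a rev attribute")

def as_next(statement: Statement) -> Statement:
    """Restate "B prev A" as the equivalent "A next B"; other statements pass through."""
    if statement.relation == "prev":
        return Statement(statement.object, "next", statement.subject)
    return statement

# The two link elements from the example, as seen from a hypothetical chap1.html.
forward = link_to_statement("chap1.html", {"rel": "next", "href": "chap2.html"})
backward = link_to_statement("chap1.html", {"rev": "prev", "href": "chap2.html"})
print(forward)
print(backward)
print(as_next(backward) == forward)
```

Under this reading, rev is what lets a document state a relationship in which it is the object rather than the subject.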
The HTML5 group decided that not enough people were using @rev, and so removed it.
This breaks backwards compatibility, and puts a fence in front of those who do need to use it.
This is doubly bad in the light of another of their design principles: "Support Existing Content".
For years, the wider community on the web had agreed to use a colon (:) to separate a name from the identification of the vocabulary it comes from.
But for some reason a new separator was developed for HTML5: the hyphen. For example:
<div role="searchbox" aria-labelledby="label" aria-placeholder="MM-DD-YYYY">03-14-1879</div>
This apparently reinvents namespaces, which goes against another of their design principles: "Do not Reinvent the Wheel".
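The practical difference between the two separators can be sketched in Python. With a colon, the prefix is marked explicitly and can be looked up in the declarations in scope; with a hyphen, a processor can only tell a vocabulary prefix from an ordinary hyphenated name, such as HTML's own http-equiv or accept-charset attributes, by consulting a hard-wired list. The functions and the prefix list are hypothetical; the `xml` namespace URI is the one fixed by the Namespaces in XML recommendation.

```python
from typing import Dict, Optional, Set, Tuple

# Colon convention: prefixes are declared (here a dict stands in for xmlns declarations).
DECLARED: Dict[str, str] = {"xml": "http://www.w3.org/XML/1998/namespace"}

def split_colon(name: str) -> Tuple[Optional[str], str]:
    """Return (vocabulary URI, local name); None means the host language's own attribute."""
    prefix, sep, local = name.partition(":")
    if not sep:
        return (None, name)
    return (DECLARED[prefix], local)  # an undeclared prefix raises KeyError

# Hyphen convention: nothing in the name says whether the hyphen marks a prefix,
# so the processor needs a list of "real" prefixes (assumption: maintained by hand).
HYPHEN_PREFIXES: Set[str] = {"aria"}

def split_hyphen(name: str) -> Tuple[Optional[str], str]:
    """Return (prefix, local name), guessing from a fixed list of known prefixes."""
    prefix, sep, local = name.partition("-")
    if sep and prefix in HYPHEN_PREFIXES:
        return (prefix, local)
    return (None, name)

for name in ["xml:lang", "aria-labelledby", "aria-placeholder", "http-equiv", "accept-charset"]:
    splitter = split_colon if ":" in name else split_hyphen
    print(name, splitter(name))
```

Introducing a new hyphenated vocabulary therefore means updating every processor's list, whereas a colon-prefixed name carries its own marker.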
More generally, HTML5 did not follow its own precept against reinventing things. As has been noted:
"The amount of “not invented here” mentality that [pervades] the modern HTML5 spec is odious. Accessibility in HTML5 isn’t being decided by experts."
Many groups had already produced solutions to problems that HTML5 needed to address, but HTML5 chose to reinvent them, usually with worse results, since its authors were not experts in those areas.
To take an example, consider RDFa. This came as the result of the question: How should you represent general metadata in HTML?
In 2008 the RDFa Recommendation was released, after more than 5 years of work.
A year later in 2009 the HTML5 group created Microdata out of the blue with no warning, discussion or consultation, clearly copied from RDFa (it used the same attributes), but different, and less capable.
One major improvement that XML introduced was a new notation for empty elements: <br/>.
This one simple change meant that you always knew when an element was empty.
Incomprehensibly, HTML5 dropped the requirement for this notation, meaning that a processor now has to know in advance which elements are empty, and making it impossible to add new empty elements to HTML.
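The difference can be sketched with Python's standard `html.parser.HTMLParser`, which reports a tag written as `<x/>` through `handle_startendtag` and a tag written as `<x>` through `handle_starttag`. The class `NaiveTreeBuilder` and the helper `parent_of` are hypothetical; the builder deliberately has no built-in list of empty elements unless one is passed in, to show what information a processor lacks without the notation.

```python
from html.parser import HTMLParser
from typing import Dict, FrozenSet, List, Optional, Tuple

class NaiveTreeBuilder(HTMLParser):
    """Tracks nesting from the tags alone; knows only the empty elements it is given."""

    def __init__(self, empty_elements: FrozenSet[str] = frozenset()) -> None:
        super().__init__()
        self.empty_elements = empty_elements
        self.stack: List[str] = ["(root)"]
        self.parent: Dict[str, str] = {}  # element name -> enclosing element name

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        self.parent[tag] = self.stack[-1]
        if tag not in self.empty_elements:
            self.stack.append(tag)  # without other knowledge, assume it has content

    def handle_startendtag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        self.parent[tag] = self.stack[-1]  # "<x/>": empty by notation, never opened

    def handle_endtag(self, tag: str) -> None:
        if tag in self.stack:
            while self.stack.pop() != tag:  # close anything left open inside it
                pass

def parent_of(markup: str, tag: str, empty_elements: FrozenSet[str] = frozenset()) -> str:
    builder = NaiveTreeBuilder(empty_elements)
    builder.feed(markup)
    builder.close()
    return builder.parent[tag]

print(parent_of("<p>one<br/>two<em>three</em></p>", "em"))       # XML notation
print(parent_of("<p>one<br>two<em>three</em></p>", "em"))        # no notation, no list
print(parent_of("<p>one<br>two<em>three</em></p>", "em", frozenset({"br"})))
print(parent_of("<p>one<newline/>two<em>three</em></p>", "em"))  # an invented element
```

Without the `/>` notation the builder wrongly nests em inside br unless it is told in advance that br is empty; with the notation, even an invented element is handled correctly. An HTML5 parser, by contrast, ignores the trailing slash on elements it does not already know to be void.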
Rather than being designed, HTML5 uses programming to solve its design problems, with JavaScript as the basis of its functionality.
The result is that standardisation has become compromised.
As a result of the programming-instead-of-design problem, frameworks have emerged.
Now we have some twenty-odd versions of HTML instead of just the one. They are all different.
If a framework dies, or changes its licensing, you have to rewrite your whole website! There is no standardisation any more!
The use of frameworks has created bloat, slowed the web, and limited accessibility:
To view the web page of one single tweet of 140 characters, you have to download just under a megabyte. That is 5200 lines of HTML before you even get to the five JavaScript packages. The whole of James Joyce's Ulysses is only half as long again.
Finally, HTML5 has become so complex that implementers have found it hard to implement.
This has impoverished the browser landscape: several browser makers, even Microsoft, have given up trying and instead just put a new wrapper around Chromium, the engine of Google's Chrome browser.
This is regrettable, giving a single player disproportionate power over the web and risking turning the web into a monoculture.
A sustainable web needs Modularity, Extensibility, Accessibility, and Standardisation, based on Declarative Principles.
A 100-year web is needed because the web is now the way that information is distributed. The web pages being created now need to be readable in 100 years' time, just as 100-year-old books are still readable.
Requiring a web page to depend on a particular 100-year-old implementation of JavaScript, and on a framework which hasn't been supported for 70 years and whose creators are all dead, is not in any sense future-proof.
The web started off as a simple, easy-to-use, easy-to-write-for infrastructure.
Programmers have remodelled HTML in their own image, and made it complicated, hard to implement, and hard to write for, excluding many potential creators.
Hopefully, in the not-too-distant future, the web community can come together again to try to undo the damage being inflicted on the web by the implementers, and bring it back to its declarative roots.
At least declarative markup is easier to keep alive because it is independent of implementation!
[ARIA] James Craig et al. (eds), 2014, Accessible Rich Internet Applications (WAI-ARIA) 1.0, W3C, https://www.w3.org/TR/wai-aria-1.0/
[Basic] Mark Baker et al. (eds), 2000, XHTML Basic, W3C, https://www.w3.org/TR/2000/REC-xhtml-basic-20001219/
[Brand] Stewart Brand, 1993, How Buildings Learn, Viking Press, ISBN 0140139966
[CSS] H.W. Lie et al. (eds), 1996, Cascading Style Sheets, level 1, W3C, https://www.w3.org/TR/REC-CSS1/
[CSSquirrel] Kyle Weems, 2009, Behold Leviathan, Confused, http://cssquirrel.com/blog/2009/08/03/behold-leviathan-confused/
[CWI] CWI, 2018, CWI celebrates 30 years of Open Internet in Europe, https://www.cwi.nl/news/2018/cwi-celebrates-30-year-of-open-internet-in-europe
[Design] Anne van Kesteren et al. (eds), 2007, HTML Design Principles, W3C, https://www.w3.org/TR/html-design-principles/
[Freinbichler] Marcel Freinbichler, 2018, Tweet, https://twitter.com/fr3ino/status/1000166112615714816
[History] Various, History of the World Wide Web, Wikipedia, https://en.wikipedia.org/wiki/History_of_the_World_Wide_Web
[HTML4] Dave Raggett et al. (eds), 1997, HTML 4.0 Specification, W3C, https://www.w3.org/TR/REC-html40-971218/
[HTML5] Anonymous, 2022, HTML5, WHATWG, https://html.spec.whatwg.org/multipage/
[JSSS] Wikipedia, JavaScript Style Sheets, https://en.wikipedia.org/wiki/JavaScript_Style_Sheets
[Lidwell] William Lidwell et al., 2010, Universal Principles of Design, Rockport Publishers, ISBN 1-59253-587-9
[M12N] Murray Altheim et al. (eds), 2001, Modularization of XHTML, W3C, https://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/
[Microdata] Ian Hickson (ed.), 2010, HTML Microdata, W3C, https://www.w3.org/TR/2010/WD-microdata-20100304/
[Norwich] Norfolk Record Office, 2016, The Norwich Computer, 1957, https://norfolkrecordofficeblog.org/2016/04/29/the-norwich-computer-1957/
[Pemberton] Steven Pemberton, 2020, On the design of the URL, in Proc. Declarative Amsterdam 2020, Amsterdam, The Netherlands, https://declarative.amsterdam/article?doi=da.2020.pemberton.design
[Pistacchio] ‘pistacchio’, 2016, I’m a web developer and I’ve been stuck with the simplest app for the last 10 days, Medium, https://medium.com/@pistacchio/i-m-a-web-developer-and-i-ve-been-stuck-with-the-simplest-app-for-the-last-10-days-fb5c50917df#.i7o9ivu3x
[Print] Melinda Grant et al. (eds), 2006, XHTML-Print, W3C, https://www.w3.org/TR/2006/REC-xhtml-print-20060920/
[RDFa] Ben Adida et al. (eds), 2008, RDFa in XHTML: Syntax and Processing, W3C, http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/
[Weakley] Russ Weakley, 2015, Front End Frameworks - are they accessible? Slideshare, https://www.slideshare.net/maxdesign/front-end-frameworks-are-they-accessible
[XForms] John M. Boyer (ed.), 2009, XForms 1.1, W3C, https://www.w3.org/TR/xforms11/
[XML] Tim Bray et al. (eds), 1998, Extensible Markup Language (XML) 1.0, W3C, http://www.w3.org/TR/1998/REC-xml-19980210
[XHTML1] Steven Pemberton et al. (eds), 2000, XHTML™ 1.0: The Extensible HyperText Markup Language, W3C, http://www.w3.org/TR/2000/REC-xhtml1-20000126
[XHTML11] Murray Altheim et al. (eds), 2001, XHTML™ 1.1 - Module-based XHTML, W3C, https://www.w3.org/TR/2001/REC-xhtml11-20010531/
[XHTML2] Mark Birbeck et al. (eds), 2010, XHTML 2.0, W3C, https://www.w3.org/TR/2010/NOTE-xhtml2-20101216/
[XMS] 石川 雅康 (ISHIKAWA Masayasu), 2002, An XHTML + MathML + SVG Profile, W3C, https://www.w3.org/TR/XHTMLplusMathMLplusSVG/