How people experience the web, and why we are doing it wrong

Steven Pemberton, CWI, Amsterdam

The Author

Contents

About me

Stephen Hawking

I went to the same school as Stephen Hawking.

University

Richard Grimsdale

My university tutor was Richard Grimsdale.

He built the first ever transistorised computer.

Turing

Alan Turing on UK £50 note

Grimsdale's tutor was Alan Turing (making me a grand-tutee of Turing).

Post-University

MU5

I (coincidentally) went on to work in the department in Manchester where Turing worked.

I worked on the MU5, the fifth computer in the line of computers that Turing had also worked on.

Amsterdam

A project meeting

Moving to The Netherlands, I co-designed the programming language that Python is based on.

Internet

Steven at a computer in the 1980s

I was the first user of the open internet in Europe, in November 1988, 35 years ago!

CWI set up the first European internet node, and then two spin-offs to build the internet out in Europe and the Netherlands.

Web

Steven with Tim Berners-Lee

I organised workshops at the first Web conference, at CERN, in 1994.

I co-designed HTML, CSS, XHTML, XForms, RDFa, and several others.

I still chair XForms and ixml.

Internet@35

The Windowless AMSIX building

Last year was the 35th anniversary of the open internet in Europe.

It started in November 1988 at the CWI, at the breathtaking speed of 64kb/s connecting all of Europe (= me + a handful of other researchers) with all of N. America.

Since then the speed has nearly doubled every year, so that the Science Park has the world's fastest internet node at over 12 Tb/s.

12 teraseconds is nearly 400,000 years. A huge number, and every second that many bits pass through that building.
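The arithmetic behind these two claims can be checked in a few lines. This is a minimal Python sketch whose only inputs are the figures quoted on the slides (64 kb/s in 1988, 12 Tb/s about 35 years later); the 365.25-day year is an assumption of the sketch.

```python
import math

# Figures quoted on the slides: 64 kb/s in November 1988, over 12 Tb/s some 35 years later.
START_BPS = 64e3
END_BPS = 12e12
YEARS = 35

growth = END_BPS / START_BPS            # total increase in speed
doublings = math.log2(growth)           # number of doublings that increase represents
annual_factor = growth ** (1 / YEARS)   # average growth factor per year

# 12 x 10^12 seconds expressed in years, assuming a 365.25-day year.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
tera_years = 12e12 / SECONDS_PER_YEAR

print(f"total growth x{growth:.3g} = {doublings:.1f} doublings, x{annual_factor:.2f} per year")
print(f"12e12 seconds is about {tera_years:,.0f} years")
```

Run as is, it reports about 27.5 doublings in 35 years (a factor of roughly 1.7 per year, hence "nearly doubled") and about 380,000 years for 12 teraseconds.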

1988

A CRT TV tube

(Image: Catalogo collezioni (in Italian), Museoscienza.org, Museo nazionale della scienza e della tecnologia Leonardo da Vinci, Milano, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=48928866)

New in 1988

Early Toshiba laptop

So what did you do?

Railway timetable

Gratitude

So we have a lot to be grateful to the internet for.

But have we reached peak internet, or is there more to come?

New Technologies

An image of an early car

Whenever a new technology is introduced, it imitates the old.

Early cars looked like "horseless carriages" because that is exactly what they were.

It took a long time for cars to evolve into what we now know.

Books

Scriptorium

Before the introduction of printing in 1450, all books were literally made by hand (manuscript, from the Latin manu scriptus: written by hand). This was a long, slow process, and very expensive.

Until the introduction of printing, books were rare, and very, very expensive, maybe something like the price of a small farm.

Only very rich people, and rich institutions, owned books.

In 1424 the University of Cambridge had one of the largest libraries in Europe: 122 books.

Book 1450

Printing in 1568

Gutenberg combined existing technologies (ink, paper, wine presses) and added movable type.

Early books

The first page of Gutenberg's bible

For the first 50 years, books looked just like manuscripts.

Why?

That was what was expected of a book at the time.

It was where the money was.

They didn't know any different!

Effects

The introduction of book printing had several effects:

Maps: printing presses in Europe in 1450, 1460, 1470, 1480, 1490 and 1500

Image source. Data source.

Information explosion

Before printing, producing a single copy of a book took several years. By 1500:

And bear in mind, you didn't just "set up a print shop". You had to:

It was a real revolution.

The real book

Newton's Principia Mathematica, 1687

After about 50 years, readable fonts and the features we now expect from a book emerged, and books became what we now think of as books.

Another Effect: Social Turmoil

Before printing, all information had been in the hands of the church (even universities were primarily religious institutions run by the church).

After printing, church and state instituted censorship to control information. Writers were killed or imprisoned for publishing things that weren't approved of.

Statue of Giordano Bruno in Rome, where he was killed

Amsterdam

Descartes statue

Consequently many thinkers relocated to get out of the reach of the church.

"The twin occurrences – that [Amsterdam] became a hub for scientists, and that it became the centre of publishing – fed one another, resulting in the astounding fact that, over the course of the 17th century, approximately one-third of all books published in the entire world were produced in Amsterdam" - Russell Shorto

Printing enabled the rise of Protestantism, and the Enlightenment is ascribed to the availability of books.

(Image: Yair Haklai, own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=6074035)

Information increase

Scientific journals

1665: the first two scientific journals, the French Journal des Sçavans and the British Philosophical Transactions.

From then on the number of scientific journals doubled every 15 years, right into the 20th century.
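Doubling every 15 years is a steady exponential rate. A short Python calculation shows what that rate means over longer spans: roughly tenfold every 50 years and about a hundredfold per century. It only restates the slide's own rate; it is not Price's data.

```python
import math

DOUBLING_PERIOD = 15  # years, as stated on the slide


def growth_factor(years: float) -> float:
    """Multiplicative growth over `years` if the count doubles every DOUBLING_PERIOD years."""
    return 2 ** (years / DOUBLING_PERIOD)


for span in (15, 50, 100):
    print(f"over {span} years: x{growth_factor(span):.1f}")

# Conversely: how long does a tenfold increase take at this rate?
years_for_tenfold = DOUBLING_PERIOD * math.log2(10)
print(f"tenfold growth takes {years_for_tenfold:.1f} years")
```

A tenfold increase takes just under 50 years at this rate, which is why "doubling every 15 years" and "ten times as many every half-century" describe the same growth.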

Even as late as the 1970s, if you had said "there has to come a new way of distributing information to support this growth", people would have thought you crazy; they would more likely have expected the growth to end.

(Source: Little Science, Big Science, Derek J. de Solla Price)

The Web

Tim Berners-Lee with his first browser

The coming of the internet in Europe in 1988 enabled the Web.

Tim Berners-Lee (and Robert Cailliau) created the Web at CERN: first server 1991.

They brought together existing technologies (Hypertext, the internet, MIME types) and created a cohesive whole.

The Web is now replacing books and many other things: telephone directories, yellow pages, encyclopaedias, train timetables and other reference works are already gone. Others will follow.

Books (as an artefact) will become a niche market. All information will eventually be internet-based.

Parallels

In many ways, the development of the web has echoed that of the book. It has:

Turmoil

Books created turmoil in society by creating new ways for information to be distributed, which disturbed the existing power structures.

We are now living through a similar information turmoil, since society has not yet worked out how to deal with these new sources of information.

Amsterdam

AMS-IX statistics for 2023, peaking at nearly 12 Tb/s

And weirdly, just as Amsterdam produced the largest number of books in the 17th century, it now has the fastest internet switch in the world, currently peaking at over 12 Tb/s.

BUT!

The web is still imitating the old.

It is still very much presentation-oriented, not information-oriented.

As an example, I recently had to:

And note that we still talk about Web "pages".

The web imitating the old

A receipt

Other examples are ticketing, contracts, and receipts.

These are typically all PDFs, with no machine-processable elements: a picture of a paper version of the thing.

The only thing that has happened is that the paper has been digitised away, and is sent to you electronically. Otherwise it is the same as it ever was.

Machine-readable information

Information can be used in two ways (at the same time): to communicate to people, and to communicate between machines.

For instance, there is a service, TripIt, to which you send tickets, hotel bookings, and so on. It assembles them and automatically creates an itinerary for you. Really handy: everything in one place.

But it has to know what the information is.

It has to try to work out what is in these documents in order to do something useful with them, and it often gets it wrong.

Example ticket

This is a ticket recently sent to me, as a PDF.

A train ticket as PDF

The Data

This is the essence of that PDF ticket:

document: ticket
type: train
supplier: Eurostar
reference: PCX4GZ
passenger: Steven Pemberton
train: 9114
leave: 2023-07-20T08:16:00+02:00
from: Amsterdam CS
to: St Pancras International
arrive: 2023-07-20T13:51:00+01:00
class: SP
coach: 3
seat: 21

Making this pretty for a human reader is a trivial task, and the technology already exists to do that.
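As a minimal Python sketch of how little work the human-readable side takes: the snippet parses the key: value lines shown above and renders them as an HTML table, leaving the look to a style sheet. The key: value layout is just how the slide presents the data, not a defined format, and the function names and the `ticket` class name are illustrative assumptions.

```python
from html import escape

# The ticket data from the slide, in its simple "key: value" form.
TICKET = """\
document: ticket
type: train
supplier: Eurostar
reference: PCX4GZ
passenger: Steven Pemberton
train: 9114
leave: 2023-07-20T08:16:00+02:00
from: Amsterdam CS
to: St Pancras International
arrive: 2023-07-20T13:51:00+01:00
class: SP
coach: 3
seat: 21
"""


def parse(text: str) -> dict[str, str]:
    """Split each 'key: value' line at the first colon only (the times contain colons)."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


def to_html(fields: dict[str, str]) -> str:
    """Render the fields as a plain HTML table; a style sheet can supply the rest."""
    rows = "\n".join(
        f"  <tr><th>{escape(k)}</th><td>{escape(v)}</td></tr>" for k, v in fields.items()
    )
    return f'<table class="ticket">\n{rows}\n</table>'


ticket = parse(TICKET)
print(to_html(ticket))
```

The same parsed fields could just as easily be rendered as plain text, a calendar entry, or a mobile boarding pass; the data stays the same and only the presentation changes.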

Automatically getting the information out of a PDF is not trivial. It is hard, and it is often got wrong.
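By contrast, once the data is structured, a program can use it directly. A small Python illustration using only the standard library: because the ticket's times are ISO 8601 timestamps with UTC offsets, the real journey time across the time-zone change is a single subtraction, whereas subtracting the printed clock times gives an answer an hour short.

```python
from datetime import datetime

# Departure and arrival exactly as they appear in the ticket data:
# ISO 8601 timestamps that carry their UTC offsets.
leave = datetime.fromisoformat("2023-07-20T08:16:00+02:00")
arrive = datetime.fromisoformat("2023-07-20T13:51:00+01:00")

# Offset-aware subtraction accounts for both offsets before subtracting.
journey = arrive - leave

# Ignoring the offsets (as a naive reader of the printed times might) is an hour out.
naive = arrive.replace(tzinfo=None) - leave.replace(tzinfo=None)

print("journey time:", journey)            # prints "journey time: 6:35:00"
print("naive clock difference:", naive)    # prints "naive clock difference: 5:35:00"
```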

Conferences imitating the old

Providing papers for a conference is often a choice between LaTeX (a pre-web technology) and Word!

There's a page limit!

There's a styleguide including how references should be visually displayed!

IT'S ALL ABOUT PAPER!

How it should be done

Conference papers are paragraphs, headings, diagrams, references.

The author shouldn't have to care how the conference wants the paper formatted. The information is there; let them format it as they will.

I write all my papers in (a rational version of) XHTML, supposing it to be the current format that has the best chance of longevity. Using HTML doesn't commit me to any particular presentation: style sheets let you change the look at will.
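Because a paper in XHTML carries its structure explicitly, software can pick out the parts it needs without guessing. A minimal Python sketch with the standard library's ElementTree: the paper is a hypothetical, cut-down example, and marking the reference list with `class="references"` is an assumed convention, not a standard.

```python
import xml.etree.ElementTree as ET

NS = {"h": "http://www.w3.org/1999/xhtml"}

# A hypothetical, cut-down paper in XHTML: structure only, no presentation.
PAPER = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>An example paper</title></head>
<body>
  <h1>An example paper</h1>
  <h2>Introduction</h2>
  <p>Text that cites <a href="#r1">[1]</a>.</p>
  <h2>Conclusion</h2>
  <p>More text.</p>
  <h2>References</h2>
  <ol class="references">
    <li id="r1">Placeholder reference entry.</li>
  </ol>
</body>
</html>"""

root = ET.fromstring(PAPER)

# With the structure explicit, a conference could extract the parts it needs
# and apply its own house style, instead of asking the author to do the formatting.
title = root.find("h:head/h:title", NS).text
headings = [h.text for h in root.findall(".//h:h2", NS)]
references = [li.text for li in root.findall(".//h:ol[@class='references']/h:li", NS)]

print(title)
for number, heading in enumerate(headings, start=1):
    print(f"{number}. {heading}")
for number, ref in enumerate(references, start=1):
    print(f"[{number}] {ref}")
```

Numbering the sections and the references here is a presentation decision made by the consumer of the document, not by the author.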

But I have to convert my HTML to Word in order to submit.

Which they then convert back to HTML and PDF!

When will we get the real web?

So eventually books went from pretending to be manuscripts to being proper books.

When can we expect the Web to stop pretending to be the old things, and start being what it really ought to be?

Why did it take 50 years for books to become their real selves?
This question has long troubled me.

My reluctant conclusion: the old generation of users and producers has to die before the new generation, who have never known the old way, can start asking why things are done in such a weird way, and start fixing them.

Future web

The requirements of the non-paper internet include:

Decentralisation

Facebook

How it could be

Distributed Facebook

How it should work

You have a (small) home server.

Everything you want to share, you put there.

You control who can access it.

You can then have a distributed Facebook, where only your friends can see your stuff, and you theirs.

You could put something online for sale, and several sites could list it.

You could put an event online that several services collect.

The data is yours, you control who sees it; you don't have to repeat the work; if the sites die, you've still got your data.
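The access-control part of this model can be sketched in a few lines. This is a hypothetical, in-memory Python illustration: the class and method names are invented, and it deliberately leaves out networking and authentication, so it simply trusts the `requester` name it is given. A real home server would serve these items over HTTP and would need to know reliably who is asking.

```python
from dataclasses import dataclass, field


@dataclass
class HomeServer:
    """Hypothetical in-memory sketch: the owner's items and who may read each one."""
    owner: str
    items: dict[str, str] = field(default_factory=dict)
    readers: dict[str, set[str]] = field(default_factory=dict)

    def publish(self, name: str, content: str, allowed: set[str]) -> None:
        """Put something on the server, readable only by the given people."""
        self.items[name] = content
        self.readers[name] = set(allowed)

    def grant(self, name: str, person: str) -> None:
        self.readers[name].add(person)

    def revoke(self, name: str, person: str) -> None:
        self.readers[name].discard(person)

    def read(self, name: str, requester: str) -> str:
        """Return the item if the requester is the owner or on its reader list."""
        if requester != self.owner and requester not in self.readers.get(name, set()):
            raise PermissionError(f"{requester} may not read {name!r}")
        return self.items[name]


server = HomeServer(owner="alice")
server.publish("holiday-photos", "photos from the beach", allowed={"bob"})
print(server.read("holiday-photos", "bob"))  # bob is a friend, so this succeeds
server.revoke("holiday-photos", "bob")
try:
    server.read("holiday-photos", "bob")
except PermissionError as err:
    print("refused:", err)
```

Services that aggregate items (a distributed social network, a sales site, an event listing) would then be just more requesters, each seeing only what the owner has granted them.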

A Server in Every Home

Everyone already has a browser.

A server is very small, and people don't have much data.

The web should be for reading and writing

Identity on the Internet

How do we prove our identity on the internet?

Mostly by passwords...

You log in to your computer, by whatever means. It therefore knows it is you.

But then you have to log in to a website to identify yourself, and then again, and again.

Transitivity

Your identity should really be passed down the line: your computer knows it is you. It can prove this to other computers without you having to log in.

There are techniques for doing this, and hopefully they will be widely adopted soon.
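One widely used family of such techniques is challenge-response: the website sends a fresh random challenge, and the user's computer answers with a value only the holder of a key could compute. The sketch below is a simplification in Python using HMAC from the standard library with a key shared at registration. Real schemes such as WebAuthn (passkeys) use public-key signatures instead, so the website never stores a secret that could be stolen. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Simplification: device and website share a secret key set up once at registration.
# Public-key schemes avoid this shared secret; HMAC is used here only for brevity.
shared_key = secrets.token_bytes(32)


def device_respond(key: bytes, challenge: bytes) -> bytes:
    """The user's computer proves it holds the key by computing an HMAC of the challenge."""
    return hmac.new(key, challenge, hashlib.sha256).digest()


def site_verify(key: bytes, challenge: bytes, response: bytes) -> bool:
    """The website recomputes the HMAC and compares in constant time."""
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


# A fresh random challenge for every login, so an old response cannot be replayed.
challenge = secrets.token_bytes(16)
response = device_respond(shared_key, challenge)
print(site_verify(shared_key, challenge, response))  # prints "True"
```

The user never types anything: once the computer knows who you are, it can answer such challenges on your behalf.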

No identity: Anonymity

The internet was originally built with no mechanisms for trust and identity built in.

It was for computer scientists to communicate with each other.

Anonymity was originally seen as a good thing...

It turns out that it is good for a small number of things (whistleblowing, avoiding abusers), but is bad for a lot more.

Traceable anonymity

We need a form of anonymity that protects whistleblowers and people hiding from abusers, but still lets you track down money launderers or abusers, or know when a post comes from a fake account.

Conclusion

We are still in the "looks like a manuscript" phase of the internet. We are still imitating the old ways.

The technologies are already there to do the right thing, but the real web can't emerge until the paper generation is dead.