Web n+1

Abstract

The Web has turned into a programming environment, turning its back on its earlier roots of simplicity and ease-of-use. And in the process many properties of the early web have been lost. This talk will examine some of the desirable properties of a future web, such as accessibility, usability, semantics, decentralisation, privacy, aggregation and even what to do about the password problem.

About me

Researcher at CWI in Amsterdam (first non-military internet site in Europe - 1988, whole of Europe connected to USA with 64kb link!)

Co-designed the programming language ABC, which was later used as the basis for Python.

Wrote part of GCC.

At the end of the 1980s built a system that you would now call a browser.

Organised 2 workshops at the first Web conference in 1994.

Chaired HTML WG for the best part of a decade.

Co-author of HTML4, CSS, XHTML, XForms, RDFa, etc.

Syntax and abstraction

HTML was originally designed as a pure structure language: It only described the structure of the document, and not how it should look.

This has some advantages, for instance

flexibility
reuse
machine independence
accessibility

Presentation

The browser-manufacturers at the time, not understanding this principle, started adding elements to influence presentation (such as <font> not to mention <blink>...).

This was one of the motivating factors behind creating W3C, and why CSS was the first W3C product.

Stylesheets

Style sheets abstract the idea of presentation out into a separate layer, and doing this adds a whole new layer of advantages:

Easier to write your documents, easier to change your documents
Flexible: easy to change the look of your documents, easy to change house style
Access to professional designs
Your documents are smaller: use less bandwidth, download faster (ESPN: 2TB/day saved)
Visible on more devices, visible to more people, accessible
Separation of concerns
Simpler HTML, less training
Cheaper to produce, easier to manage
Search engines find your stuff more easily

And you can still create great designs

Abstraction

But it still took a long time for the world to get it.

Getting style-sheets accepted took a lot of work and a long time.

It took a while for people to understand that you could separate the presentation from the content.

Separation of concerns makes content more manageable.

Markup

Problems with HTML include that it is

Monolithic
Non-extensible

If you had a programming language that didn't allow you to create functions and libraries you would be very upset, and yet we seem to accept this from HTML.

Markup abstraction

What is it that makes a <p> a paragraph?

Certainly not the combination of characters "<", "p", ">".

HTML elements and attributes reflect a small number of semantic properties that we could just as well abstract out:

para: p
para@link: a@href
image: img
image@source: img@src
image/content: img@alt

This could allow

<para link="document.pdf">
   Here is an image: 
   <image source="fig.jpg">
      Figure 1: The larch
   </image>
</para>

This would allow you to easily add new more meaningful elements, for instance <book> or <person> or <city>.

Semantic abstraction

But we're not only interested in what markup means as markup, but also as concepts.

Luckily we have RDF to supply meaning as concepts, that can be layered in a similar way, so that if we make an element <city>, it would be possible for a browser (or search engine) to know what that means.

<affiliation>
    <person>Steven Pemberton</person>
    <employer>CWI</employer>
    <city>Amsterdam</city>
    <country>The Netherlands</country>
</affiliation>

Invisible Markup

But actually, we don't even need to be tied down to the encoding of markup. Invisible XML frees you even from that. For instance

body {color: blue}

gives you

<css>
   <rule>
      <simple-selector name="body"/>
      <block>
         <property name="color" value="blue"/>
      </block>
   </rule>
</css>
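
To make the idea concrete, here is a minimal Python sketch that turns that one line of CSS into the XML shown above. It is only a stand-in: Invisible XML works from a grammar describing the input format, whereas this is a hand-written regular expression that handles nothing beyond simple "selector {name: value}" rules. The function name css_to_xml is ours.

```python
import re
import xml.etree.ElementTree as ET

# Matches one rule of the form: selector { declarations }
# (an assumption for illustration; real CSS is far richer than this).
RULE = re.compile(r"\s*([A-Za-z][\w-]*)\s*\{([^}]*)\}")


def css_to_xml(text):
    """Build the <css>/<rule>/<block>/<property> tree for very simple CSS."""
    css = ET.Element("css")
    for selector, body in RULE.findall(text):
        rule = ET.SubElement(css, "rule")
        ET.SubElement(rule, "simple-selector", {"name": selector})
        block = ET.SubElement(rule, "block")
        for declaration in body.split(";"):
            if ":" not in declaration:
                continue  # skip empty trailing pieces
            name, value = declaration.split(":", 1)
            ET.SubElement(block, "property",
                          {"name": name.strip(), "value": value.strip()})
    return css


if __name__ == "__main__":
    tree = css_to_xml("body {color: blue}")
    print(ET.tostring(tree, encoding="unicode"))
```

The point is that the document stays in its natural notation, and the markup is derived from it rather than typed by hand.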

Invisible Markup

a×(3+b)

gives

<expr>
   <prod>
      <letter>a</letter>
      <sum>
         <digit>3</digit>
         <letter>b</letter>
      </sum>
   </prod>
</expr>

Functionality

HTML5 has turned HTML into a programming environment.

However, the key term for describing the original web (and I would claim, its initial success) is the word "declarative".

A declarative definition is where you describe what you want, rather than how to get it: it describes the solution space, and not a recipe to get to one solution.

Declarative definitions are typically short, and easy to understand.

The first declarative definition

A classic example is when you learn in school that

The square root of a number n is the number r such that r × r = n

Simple
short
obvious
understandable.

This tells us how to recognise a square root, but not how to calculate one; but no problem, because we have machines to do that for us.

Procedural code

function f a: {
    x ← a
    x' ← (a + 1) ÷ 2
    epsilon ← 1.19209290e-07
    while abs(x − x') > epsilon × x: {
        x ← x'
        x' ← ((a ÷ x') + x') ÷ 2
    }
    return x'
}

What does it do? Under what conditions?
How does it do it? What is the theory behind it?
Is it correct? Can I prove it?
Under what conditions may I replace it, or a part of it with something else?
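
One way to start answering those questions is to run it. Below is a line-for-line transliteration into Python, checked against the declarative definition r × r = n. The Python rendering, the helper is_square_root and its tolerance are our additions, not part of the talk. The loop is the classic Newton (Heron) iteration for square roots, and as written it assumes a > 0.

```python
def is_square_root(r, n, tolerance=1e-9):
    """The declarative definition: r is a square root of n if r * r = n.

    The tolerance is an assumption needed for floating-point numbers.
    """
    return abs(r * r - n) <= tolerance * n


def f(a):
    """Transliteration of the procedure above; assumes a > 0."""
    x = a
    x_next = (a + 1) / 2
    epsilon = 1.19209290e-07
    # Stop once two successive estimates agree to within a relative epsilon.
    while abs(x - x_next) > epsilon * x:
        x = x_next
        x_next = ((a / x_next) + x_next) / 2
    return x_next


if __name__ == "__main__":
    for n in (0.25, 2.0, 9.0):
        print(n, f(n), is_square_root(f(n), n))
```

Note how the one-line definition serves as the specification that the ten-line procedure has to be tested against.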

Declarative Markup

The poster-child of HTML declarative markup is the hyperlink:

<a href="talk.html" title="My talk" target="_blank" class="overt">Web n+1</a>

This compactly encapsulates a lot of behaviour including:

what the link looks like
what happens when you hover over it,
activation in several different ways
what to do with the result,
hooks for presentation changes.

Doing this with programming would be a lot of work.

Example: A Procedural Clock

A clock in C, 4000+ lines

1000 lines, almost all of it administrative. Only 2 or 3 lines have anything to do with telling the time.

And this was the smallest example I could find. The largest was more than 4000 lines.

A Declarative Clock

type clock = (h, m, s)
displayed as 
   circled(combined(hhand; mhand; shand; decor))
   shand = line(slength) rotated (s × 6)
   mhand = line(mlength) rotated (m × 6)
   hhand = line(hlength) rotated (h × 30 + m ÷ 2)
   decor = ...
   slength = ...
   ...
clock c
c.s = system:seconds mod 60
c.m = (system:seconds div 60) mod 60
c.h = (system:seconds div 3600) mod 24
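
The arithmetic in that definition can be checked with a few lines of Python. This is a sketch of the formulas only: it is an ordinary function, not a declarative binding that keeps itself up to date, and the function name and the final "mod 360" normalisation of the hour hand are our assumptions.

```python
def hand_angles(seconds):
    """Angles in degrees, clockwise from 12, for a given seconds count."""
    s = seconds % 60
    m = (seconds // 60) % 60
    h = (seconds // 3600) % 24
    return {
        "shand": s * 6,                   # 360 degrees / 60 seconds
        "mhand": m * 6,                   # 360 degrees / 60 minutes
        "hhand": (h * 30 + m / 2) % 360,  # 30 degrees per hour plus drift
    }


if __name__ == "__main__":
    # 03:30:15 expressed as seconds since midnight
    print(hand_angles(3 * 3600 + 30 * 60 + 15))
```

Those three formulas are the part that tells the time; everything else in a procedural clock is administration.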

Example

A certain company makes BIG machines (walk in): user interface is very demanding — traditionally needed 5 years, 30 people.

With XForms this became: 1 year, 10 people.

Do the sums. Assume one person costs 100k a year. Then this has gone from a 15M cost to a 1M cost. They have saved 14 million! (And 4 years)

Example

The British National Health Service started a project for a health records system.

It involved 70 people
It cost £10M.
The hardware costs alone were £5 per patient.
It failed.

One person then created a system using XForms.

Hardware costs are 1p per patient
It runs on Raspberry Pis
It is now running in 5 NHS hospitals.

Identity

All those passwords!

It's all about identity.

Your computer knows it is you (you've used a password or whatever to get in).

Use public key cryptography at a low level to log you in.

Public Key Cryptography

Two matched keys: you can lock with either key, but if you lock with one, only the other can open it.

So everyone has two keys, one public and one private.

Identity: If I lock a message with my private key, you can open it with my public key and read it, and know it was really from me. (No more spam!)

Privacy: If you send me a message locked with my public key, you know that only I can open it to read it.

Secure messaging: if I send you a message locked with my private key, and your public key, then only you can read it, and you know it's really from me.

Public keys for passwords

You still need to register with sites, but instead of picking a password, you exchange public keys (or your browser does).

Then when you click on "log in", the site says (to your browser): decrypt this for me.

You know it's really them asking, and when your browser decrypts the message, they know it's really you.

And you're in, without typing in a password.
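
The login exchange can be sketched in Python with a toy key pair. Everything here is an assumption for illustration: the keys are the familiar textbook RSA example (far too small to be secure, and with no padding), the function names are ours, and only the "they know it's really you" half is shown. Proving the site's identity to you would need the site's own key pair as well.

```python
import secrets

# Toy textbook key pair built from the primes 61 and 53. NOT secure.
MODULUS = 3233
PUBLIC_KEY = (17, MODULUS)     # exchanged with the site at registration
PRIVATE_KEY = (2753, MODULUS)  # never leaves your browser


def apply_key(number, key):
    """Lock or unlock a number: the same operation with either key."""
    exponent, modulus = key
    return pow(number, exponent, modulus)


def site_make_challenge(user_public_key):
    """Site side: pick a random number and lock it with the user's public key."""
    nonce = secrets.randbelow(MODULUS - 2) + 2
    return nonce, apply_key(nonce, user_public_key)


def browser_answer(challenge, user_private_key):
    """Browser side: only the holder of the private key can unlock it."""
    return apply_key(challenge, user_private_key)


if __name__ == "__main__":
    nonce, challenge = site_make_challenge(PUBLIC_KEY)
    answer = browser_answer(challenge, PRIVATE_KEY)
    print("logged in" if answer == nonce else "rejected")
```

No secret is ever sent to the site, so there is nothing for the site to leak and nothing for you to remember.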

HTTP

Client-server
Simple to implement
Served us well for 25 years.

BUT

Central point of failure
Allows for DDoS Attacks
Allows governments to monitor, block and close sites
Popular sites have to use load sharing
Peaks of demand can easily crash sites or make them unavailable.

How could we do better?

Use existing technologies.

Peer-to-peer:

Harder to block
The more popular particular content becomes, the easier it is to find
Even if the originating site is offline, the content may still be available
Peaks of demand are automatically dealt with
Popular sites don't need wide pipes.

Magnet Links

Saying not where to get it, but what you want

Fall-back to single source for long-tail content.

magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C&as=http%3A%2F%2Fexample.com%2Fulysses.html
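
A short Python sketch shows why naming content by its hash works: whoever serves you the bytes, you can check them yourself. The function names are ours, and the sketch assumes the urn:sha1 form shown above (a Base32-encoded SHA-1 digest) with the "as" parameter as the single-source fall-back.

```python
import base64
import hashlib
from urllib.parse import parse_qs, quote


def sha1_urn(content):
    """Name a piece of content by the Base32 form of its SHA-1 digest."""
    digest = base64.b32encode(hashlib.sha1(content).digest()).decode("ascii")
    return f"urn:sha1:{digest}"


def magnet_for(content, fallback_url):
    """Build a magnet link: what you want, plus where to fall back to."""
    return f"magnet:?xt={sha1_urn(content)}&as={quote(fallback_url, safe='')}"


def parse_magnet(link):
    """Return (exact topic, fall-back URL or None) from a magnet link."""
    scheme, _, query = link.partition("?")
    if scheme != "magnet:":
        raise ValueError("not a magnet link")
    fields = parse_qs(query)
    return fields["xt"][0], fields.get("as", [None])[0]


def matches(content, exact_topic):
    """Check bytes received from any peer against the name asked for."""
    return sha1_urn(content) == exact_topic


if __name__ == "__main__":
    link = magnet_for(b"Stately, plump Buck Mulligan",
                      "http://example.com/ulysses.html")
    topic, fallback = parse_magnet(link)
    print(topic, fallback)
```

Because the name is derived from the content, it does not matter which peer the bytes came from.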

BitTorrent

If someone already has the document you are downloading in their cache, they can serve it to you.

If several people have it, they can share the task by sharing different parts.

You get it even faster.

Long-tail content

Personalised pages are a possible example of long-tail content.

But even these can benefit, since in very many cases a personalised page can be represented as a merge of the main content and the personalisation data (something XForms, for instance, is particularly good at).

HTTP: In Summary

Although you still need HTTP for long-tail and single-use content, replacing it elsewhere with peer-to-peer and magnet links makes the most of the web:

Harder to block and censor
Faster
Cheaper for web sites
More available
Resilient to peaks of demand
You can phase it in without disruption to the web.

Metcalfe's Law

Metcalfe proposes that the value of a network is proportional to the square of the number of nodes.

v(n)=n²

Simple maths shows that if you split a network into two, it halves the total value:

(n/2)² + (n/2)² = n²/4 + n²/4 = n²/2
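
The same sum generalises: splitting a network into k equal parts leaves only 1/k of the value. A minimal Python sketch, assuming equal-sized parts and taking the constant of proportionality as 1 (the function names are ours):

```python
def network_value(nodes):
    """Value of one network under the n-squared rule (constant taken as 1)."""
    return nodes ** 2


def partitioned_value(nodes, parts):
    """Total value when the same nodes are split into equal, separate networks."""
    return parts * network_value(nodes / parts)


if __name__ == "__main__":
    for parts in (1, 2, 4, 10):
        print(parts, partitioned_value(1000, parts))
```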

This is why it is good that there is only one email network, and bad that there are so many Instant Messenger networks. It is why it is good that there is only one World Wide Web.

Data in the cloud

The term Web 2.0 was invented by a book publisher (O'Reilly) as a term to build a series of conferences around.

It conceptualises the idea of Web sites that gain value by their users adding data to them, such as Wikipedia, Facebook, Flickr, ...

The dangers of Web 2.0

By putting a lot of work into a website, you commit yourself to it, and lock yourself into their data formats too.

This is similar to data lock-in with software: when you use a proprietary program you commit yourself and lock yourself in. Moving comes at great cost.

How do you decide?

As an example, if you commit to a particular photo-sharing website, you upload thousands of photos, tagging extensively, and then a better site comes along. What do you do?

How do you decide which social networking site to join? Do you join several and repeat the work? I am bombarded by emails from networking sites (LinkedIn, Dopplr, Plaxo, Facebook, MySpace, ...) telling me that someone wants to be my friend, or business contact.

How about genealogy sites? You choose one and spend months creating your family tree. The site then spots similar people in your tree on other trees, and suggests you get together. But suppose a really important tree is on another site?

And what if it dies? Or your account is deleted?

How about if the site you have chosen closes down: all your work is lost.

This happened with MP3.com for instance. And Stage6.

How about if your account gets closed down? There was someone whose Google account got hacked, and so the account got closed down. Four years of email lost, no calendar, no Orkut.

Here is someone whose Facebook account got closed. Why? Because he was trying to download all the email addresses of his friends into Outlook.

Walled gardens

These are all examples of Metcalfe's law.

Web 2.0 partitions the Web into a number of topical sub-Webs, and locks you in, thereby reducing the value of the network as a whole.

This is why you should have a Web Site

What should really happen is that you have a personal Website, with your photos, your family tree, your business details, and aggregators then turn this into added value by finding the links across the whole web.

So what do we need to realize this?

Firstly and principally, machine readable Web pages.

When an aggregator comes to your Website, it should be able to see that this page represents (a part of) your family tree, and so on.

Machine-readable Web Sites

One of the technologies that can make this happen has the catchy name of RDFa.

You could describe it as a CSS for meaning: it allows you to add a small layer of markup to your page that adds machine-readable semantics.

It allows you to say "This is a date", "This is a place", "This is a person", and uniquely identify them on your web page.

Advantages

If a page has machine-understandable semantics, you can do lots more with it.

Once a search engine can derive from the document that the text "the prime minister" means "Theresa May", then a search for "Theresa May" can find that page as well, even if it doesn't mention her by name, or a browser might offer additional information.
If the browser really knows that something is an address, it can offer to add it to your address book, or find it for you on a map.
If the browser really knows that something is an announcement for an event like a conference, and can identify the sub-parts, it can offer to add it to your agenda, find it on a map, locate hotels, look up flights, ...
Aggregators can create value by joining data. Don't give your data to them, let them come and get it.
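
As a sketch of "let them come and get it", here is how an aggregator might pull the marked-up facts out of a page using only the Python standard library. The page fragment is a made-up example using RDFa's vocab, typeof and property attributes with property names from schema.org. The collector is a deliberately naive illustration, not a conforming RDFa processor: it ignores nesting, prefixes and everything else the specification defines.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, written for this example.
PAGE = """
<p vocab="http://schema.org/" typeof="Person">
  <span property="name">Steven Pemberton</span> works at
  <span property="worksFor">CWI</span> in
  <span property="workLocation">Amsterdam</span>.
</p>
"""


class PropertyCollector(HTMLParser):
    """Collect the text inside every element that has a property attribute."""

    def __init__(self):
        super().__init__()
        self.found = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop is not None:
            self._current = prop
            self.found[prop] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.found[self._current] += data

    def handle_endtag(self, tag):
        self._current = None  # naive: assumes properties are not nested


def extract_properties(html):
    collector = PropertyCollector()
    collector.feed(html)
    return collector.found


if __name__ == "__main__":
    print(extract_properties(PAGE))
```

The page still reads normally to a person; the extra attributes are what let a machine see a person, an employer and a place.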
