Treating JSON as a Subset of XML

Steven Pemberton, CWI and W3C, Amsterdam

About me

Researcher at CWI in Amsterdam (first non-military internet site in Europe, in 1988; the whole of Europe was connected to the USA with a 64 kbit/s link!)

Co-designed the programming language ABC, which was later used as the basis for Python

At the end of the 1980s built a system that you would now call a browser.

Organised 2 workshops at the first Web conference in 1994

Chaired the first style and internationalization workshops at W3C.

Co-author of HTML4, CSS, XHTML, XML Events, XForms, RDFa, etc.

Forms co-chair at W3C

XForms

XForms originally designed as a replacement for HTML Forms.

The resultant design

Example

What this concretely means is that the data is physically separated from the controls in the form. The data is placed in the head of the document, and the controls bind to the data.

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
   <model xmlns="http://www.w3.org/2002/xforms">
      <instance>
         <data xmlns="">
            <year>2012</year>...
         </data>
      </instance>
   </model>
</head>
<body> ...

Controls and initial values

Controls in the body refer to values in the data instance(s) using XPath expressions:

<input ref="year">...
<input ref="event[1]/title/@language">...

The controls can be initialised by putting values in the data:

<data xmlns=""><year>2001</year>...</data>

but the data can also be initialised from external sources:

<instance src="http://www.example.org/events"/>

Constraints

Relationships between, and restrictions on, values can be specified in the model, allowing dependent values to be calculated automatically and data checking to be performed on the client rather than on the server.

<bind nodeset="year" constraint=". &gt; 1752"/>
<bind nodeset="state" required="../country = 'USA'"/>
<bind nodeset="age" calculate="../thisyear - ../birthdate/year"/>
<bind nodeset="birthdate" type="date"/>

Output

Values can be exposed in the document itself, using an output control:

The result for the year <output ref="year"/> is ...

Input and output


Intent-based Controls

Controls are intent-based: they express what the control should do rather than how it should look. So a control like this:

<select1 ref="colour">
   <label>Colour:</label>
   <item><label>red</label>
      <value>#ff0000</value></item>
   <item><label>green</label>
      <value>#00ff00</value></item>
   <item><label>blue</label>
      <value>#0000ff</value></item>
</select1>

can be represented in different ways depending purely on styling.

[Figure: the same control shown three times, each with different styling]

Controls are abstract

Here are three identical controls, just styled differently


Initial experience

Initial experience with XForms 1.0 revealed some shortcomings, particularly the use of fixed strings rather than (potentially) calculated values for such things as the submission URI.

This restricted what was possible with the language.

XForms 1.0 → 1.1

As a consequence, XForms 1.1 addressed these shortcomings

The resulting language turned out to be far more than a forms language: it is a declarative application language.

Since XForms has input, output, and a processing engine, XForms is Turing-complete, and much more than just forms is now possible with the language.

XForms 1.0 → 1.1

Experience: application production time can be reduced by an order of magnitude

One large project reported a reduction from 5 years with 30 programmers using traditional programming to 1 year with 10 programmers using XForms.

Examples


XForms 1.1 → XForms 2.0

Biggest changes: XPath 2.0, AVTs

Another change: accept data in formats other than XML

Data Opacity

JSON

An obvious data format widely in use on the web is JSON.

There are several mappings defined in both directions between XML and JSON, but largely because JSON can only represent a subset of what XML can represent, many of the mappings are cumbersome, and unnatural.

Example

To take just one example, under the JXON mapping the following XML:

<BOOKS>
  <BOOK id="1">
    <TITLE>My Favorite Book</TITLE>
    <PRICE>1.23</PRICE>
  </BOOK>
  <BOOK id="1a">
    <TITLE>XML for Dummies</TITLE>
    <PRICE>5.25</PRICE>
  </BOOK>
  <BOOK id="3">
    <TITLE>JSON for Dummies</TITLE>
    <PRICE>200.95</PRICE>
  </BOOK>
</BOOKS>

would be transformed into:

{
 "childNodes": [
  {
   "childNodes": [
    {
     "childNodes": ["My Favorite Book"],
     "tagName": "TITLE"
    },
    {
     "childNodes": [1.23],
     "tagName": "PRICE"
    }
   ],
   "id": 1,
   "tagName": "BOOK"
  },
  {
   "childNodes": [
    {
     "childNodes": ["XML for Dummies"],
     "tagName": "TITLE"
    },
    {
     "childNodes": [5.25],
     "tagName": "PRICE"
    }
   ],
   "id": "1a",
   "tagName": "BOOK"
  },
  {
   "childNodes": [
    {
     "childNodes": ["JSON for Dummies"],
     "tagName": "TITLE"
    },
    {
     "childNodes": [200.95],
     "tagName": "PRICE"
    }
   ],
   "id": 3,
   "tagName": "BOOK"
  }
 ],
 "tagName": "BOOKS"
}

JSON in XForms

During the design phase we went through several iterations

Key realisation: since the aim is only to address existing JSON stores, it is not necessary to be able to convert every possible XML representation into an equivalent JSON representation, only the reverse.

This reduces the task considerably, since it means several features of XML do not have to be addressed, such as namespaces, attributes, and mixed content.

Requirements

Some of the requirements for a mapping from JSON to XML for XForms included:

Opaque data

Ideally, an XForm processing JSON data shouldn't have to know which data format has been used; so that, for instance, data such as

{"company":"example.com", "locations":[{"city": "Amsterdam"},{"city": "London"}]}

with the right mapping could be selected with XPath selectors like

locations/city[1] 

In this way data could be loaded using content negotiation, and will work whether the data comes in as XML or JSON.

Transformation used

The basic mapping designed is rather simple. Since JSON has no attributes, all content can be represented in elements, and attributes are therefore free to be used to help with the mapping.

Since a JSON object can have several members at the top level, a root element, <json>, is used. JSON names become XML elements:

{"name": "XForms"}

becomes

<json><name>XForms</name></json>

Types

Strings are the default datatype. In order to allow the processor to distinguish between {"size": 30} and {"size": "30"} when serialising, other types are marked:

"age": 21

becomes

<age type="integer">21</age>

and

"registered": true

becomes:

<registered type="boolean">true</registered>

Nesting

Nested values are obvious:

"name": {"given": "Isaac", "family": "Newton"}

becomes

<name><given>Isaac</given><family>Newton</family></name>
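
The name, type and nesting rules above can be sketched in a few lines of Python. This is only an illustration, not the XForms implementation: the function names add_member and json_to_xml are made up for this sketch, and it covers only the value kinds shown so far (strings, integers, booleans and nested objects). Anything else raises an error rather than guessing a representation.

```python
import json
import xml.etree.ElementTree as ET


def add_member(parent, name, value):
    """Append one JSON member to parent as a child element (hypothetical helper)."""
    child = ET.SubElement(parent, name)
    if isinstance(value, dict):
        # Nested object: each member becomes a nested child element.
        for key, inner in value.items():
            add_member(child, key, inner)
    elif isinstance(value, bool):
        # bool is tested before int because bool is a subclass of int in Python.
        child.set("type", "boolean")
        child.text = "true" if value else "false"
    elif isinstance(value, int):
        child.set("type", "integer")
        child.text = str(value)
    elif isinstance(value, str):
        # Strings are the default type, so no marker is needed.
        child.text = value
    else:
        raise NotImplementedError(f"not covered by this sketch: {type(value).__name__}")
    return child


def json_to_xml(text):
    """Wrap the members of a top-level JSON object in a <json> root element."""
    root = ET.Element("json")
    for key, value in json.loads(text).items():
        add_member(root, key, value)
    return ET.tostring(root, encoding="unicode")


print(json_to_xml('{"name": {"given": "Isaac", "family": "Newton"}, "age": 21}'))
```

Because only non-string values carry a type attribute, a serialiser going the other way can tell 30 from "30" by checking for that attribute.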

Arrays

Arrays are marked specially:

"colour": ["red", "green", "blue"]

becomes

<colour starts="array">red</colour>
<colour>green</colour>
<colour>blue</colour>

This allows selectors like colour[3] to work, but also makes it possible to distinguish single-element arrays:

{"city": ["Amsterdam"]}

from

{"city": "Amsterdam"}

and empty arrays:

{"set": []}

from

{"set": ""}
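
As a rough sketch of the array rule, the Python below repeats the element once per array item and marks only the first one. The helper names are invented for illustration, and only strings and objects are handled as items. Empty arrays are deliberately left out, since their XML form is not shown here and the sketch does not guess one.

```python
import json
import xml.etree.ElementTree as ET


def add_value(parent, name, value):
    """Append one non-array JSON value as an element called name (hypothetical helper)."""
    element = ET.SubElement(parent, name)
    if isinstance(value, dict):
        for key, inner in value.items():
            add_member(element, key, inner)
    elif isinstance(value, str):
        element.text = value
    else:
        raise NotImplementedError(f"not covered by this sketch: {type(value).__name__}")
    return element


def add_member(parent, name, value):
    """Append a JSON member; an array becomes a run of sibling elements."""
    if not isinstance(value, list):
        add_value(parent, name, value)
        return
    if not value:
        # Assumption: the empty-array form is unknown here, so refuse instead of inventing one.
        raise NotImplementedError("empty arrays are not covered by this sketch")
    for index, item in enumerate(value):
        element = add_value(parent, name, item)
        if index == 0:
            # Only the first element of the run carries the array marker.
            element.set("starts", "array")


def json_to_xml(text):
    """Wrap the members of a top-level JSON object in a <json> root element."""
    root = ET.Element("json")
    for key, value in json.loads(text).items():
        add_member(root, key, value)
    return ET.tostring(root, encoding="unicode")


print(json_to_xml('{"city": ["Amsterdam"]}'))  # <json><city starts="array">Amsterdam</city></json>
print(json_to_xml('{"city": "Amsterdam"}'))    # <json><city>Amsterdam</city></json>
```

The two printed results differ only in the starts attribute, which is what lets a single-element array survive a round trip.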

Example

To take an example from the JSON site:

{"bindings": [
        {"ircEvent": "PRIVMSG",
         "method": "newURI",
         "regex": "^http://.*"},
        {"ircEvent": "PRIVMSG",
         "method": "deleteURI",
         "regex": "^delete.*"},
        {"ircEvent": "PRIVMSG",
         "method": "randomURI",
         "regex": "^random.*"}
    ]
}

would become

<json>
   <bindings starts="array">
      <ircEvent>PRIVMSG</ircEvent>
      <method>newURI</method>
      <regex>^http://.*</regex>
   </bindings>
   <bindings>
      <ircEvent>PRIVMSG</ircEvent>
      <method>deleteURI</method>
      <regex>^delete.*</regex>
   </bindings>
   <bindings>
      <ircEvent>PRIVMSG</ircEvent>
      <method>randomURI</method>
      <regex>^random.*</regex>
   </bindings>
</json>

and a JSON selector like

bindings[0].method

would become in XPath (JSON is 0-based, XPath 1-based):

bindings[1]/method
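
The index shift can be illustrated with a small Python helper. The function name is hypothetical, and it is a sketch for simple selectors only: dotted names and numeric indices, with no quoted keys or other selector syntax.

```python
import re


def json_selector_to_xpath(selector):
    """Turn a dotted, 0-based JSON selector into a slash-separated, 1-based XPath."""
    # Add one to every numeric index, since XPath positions start at 1.
    shifted = re.sub(r"\[(\d+)\]", lambda m: f"[{int(m.group(1)) + 1}]", selector)
    # JSON member access with "." corresponds to an XPath child step with "/".
    return shifted.replace(".", "/")


print(json_selector_to_xpath("bindings[0].method"))  # bindings[1]/method
```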

Special Cases

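There are a small number of special cases that have to be accounted for: empty names, names containing characters that are not legal in XML names, and values containing characters that XML does not allow.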

Dealing with special cases

Empty names and illegal name characters are easy to deal with: any character that is not allowed in an XML name is replaced with an underscore, and a name attribute is added to the element giving the original name. The empty name is replaced with a single underscore, and an empty name attribute is used.

For example:

"$": "$"

would be transcribed:

<_ name="$">$</_>
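
A minimal Python sketch of this renaming follows. It rests on a loud simplification: XML names are approximated with ASCII letters, digits, underscore, hyphen and full stop, whereas real XML names allow many more Unicode characters. Treating a disallowed first character (such as a leading digit) the same way is also an assumption of the sketch, and the function names are invented.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: ASCII-only approximation of the characters allowed in an XML name.
_BAD_START = re.compile(r"^[^A-Za-z_]")
_BAD_REST = re.compile(r"[^A-Za-z0-9_.\-]")


def xml_name_for(json_name):
    """Return (element name, original name or None if no renaming was needed)."""
    if json_name == "":
        # The empty name becomes a single underscore with an empty name attribute.
        return "_", ""
    safe = _BAD_REST.sub("_", json_name)
    safe = _BAD_START.sub("_", safe)
    return (safe, None) if safe == json_name else (safe, json_name)


def make_element(json_name, text):
    """Serialise one string member, adding a name attribute when the name was changed."""
    tag, original = xml_name_for(json_name)
    element = ET.Element(tag)
    if original is not None:
        element.set("name", original)
    element.text = text
    return ET.tostring(element, encoding="unicode")


print(make_element("$", "$"))  # <_ name="$">$</_>
```

Keeping the original name in an attribute means the JSON name can be restored exactly when the instance is serialised again.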

Characters

The third case, characters in values that XML cannot represent, is harder to deal with; an example is:

{"backspace": "\b"}

The backspace character is completely disallowed in XML 1.0 (even as a character reference such as &#x8;), so the only option is to leave such illegal characters encoded in JSON notation.
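
One way to picture this is the Python sketch below, which rewrites every character outside the XML 1.0 character range in its JSON escape form and leaves everything else alone. The function name is made up, and using json.dumps to produce the escape is just a convenient choice for this illustration.

```python
import json
import re

# Characters outside the XML 1.0 Char production: tab, LF, CR,
# U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF are the legal ones.
_ILLEGAL_IN_XML = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def keep_illegal_chars_as_json(text):
    """Replace each XML-illegal character with its JSON escape, e.g. backspace with \\b."""
    # json.dumps() yields a quoted JSON string; strip the surrounding quotes.
    return _ILLEGAL_IN_XML.sub(lambda m: json.dumps(m.group(0))[1:-1], text)


print(keep_illegal_chars_as_json("a\bc"))  # a\bc  (backslash and letter b, not a control character)
```

The sketch does not address how to tell such an escape apart from a string that genuinely contains a backslash followed by b; a real mapping would have to settle that.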

Implementation

Implementation of the mapping is relatively trivial:

An implementation normally receives a document of type application/xml (or similar) either during instance initialisation from an external resource or as the return value of a submission. At that point, if the media type of the resource is application/json, the resource can be parsed and transformed to an equivalent XML instance, as described above.

The media type can be recorded as an attribute of the root element, so that it can be reused if the instance is to be resubmitted as JSON.
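
The dispatch on media type might look roughly like the Python below. Everything here is illustrative: load_instance and flat_json_to_xml are invented names, the transform is a deliberately tiny stand-in that handles only flat objects with string values, and "mediatype" is a hypothetical attribute name, since no particular name is fixed above.

```python
import json
import xml.etree.ElementTree as ET


def flat_json_to_xml(text):
    """Tiny stand-in for the full mapping: flat objects with string values only."""
    root = ET.Element("json")
    for name, value in json.loads(text).items():
        if not isinstance(value, str):
            raise NotImplementedError("only string members are handled in this sketch")
        ET.SubElement(root, name).text = value
    return root


def load_instance(body, media_type):
    """Return the root element of an instance, whatever format the resource arrived in."""
    if media_type == "application/json":
        root = flat_json_to_xml(body)
        # Hypothetical attribute name; recorded so the instance can be resubmitted as JSON.
        root.set("mediatype", media_type)
        return root
    if media_type in ("application/xml", "text/xml"):
        return ET.fromstring(body)
    raise ValueError(f"unsupported media type: {media_type}")


instance = load_instance('{"name": "XForms"}', "application/json")
print(instance.find("name").text)  # XForms
```

The rest of the processor sees only an element tree, so controls and binds need no knowledge of the original format.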

Other formats

Clearly this method can be extended to other data formats such as vCard and iCalendar. For instance an iCalendar value such as

BEGIN:VCALENDAR
  METHOD:PUBLISH
  PRODID:-//Example/ExampleCalendarClient//EN
  VERSION:2.0
  BEGIN:VEVENT
    ORGANIZER:mailto:a@example.com
    DTSTART:19970701T200000Z
    DTSTAMP:19970611T190000Z
    SUMMARY:ST. PAUL SAINTS -VS- DULUTH-SUPERIOR DUKES
    UID:0981234-1234234-23@example.com
  END:VEVENT
END:VCALENDAR

can be transformed to

<VCALENDAR>
  <METHOD>PUBLISH</METHOD>
  <PRODID>-//Example/ExampleCalendarClient//EN</PRODID>
  <VERSION>2.0</VERSION>
  <VEVENT>
    <ORGANIZER>mailto:a@example.com</ORGANIZER>
    <DTSTART>19970701T200000Z</DTSTART>
    <DTSTAMP>19970611T190000Z</DTSTAMP>
    <SUMMARY>ST. PAUL SAINTS -VS- DULUTH-SUPERIOR DUKES</SUMMARY>
    <UID>0981234-1234234-23@example.com</UID>
  </VEVENT>
</VCALENDAR>
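
A transformation of this kind can be sketched in Python as follows. It is a simplification with an invented function name: BEGIN and END lines open and close elements, and every other line is split at its first colon into an element name and its text. Property parameters, line folding and the other details of the iCalendar format are ignored.

```python
import xml.etree.ElementTree as ET


def icalendar_to_xml(text):
    """Turn BEGIN/END blocks into nested elements and NAME:VALUE lines into children."""
    stack = []   # currently open elements, innermost last
    root = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split at the first colon only, so values such as mailto: URIs stay intact.
        name, _, value = line.partition(":")
        if name == "BEGIN":
            element = ET.SubElement(stack[-1], value) if stack else ET.Element(value)
            if root is None:
                root = element
            stack.append(element)
        elif name == "END":
            stack.pop()
        else:
            ET.SubElement(stack[-1], name).text = value
    return root


calendar = icalendar_to_xml(
    "BEGIN:VCALENDAR\nVERSION:2.0\nBEGIN:VEVENT\n"
    "ORGANIZER:mailto:a@example.com\nEND:VEVENT\nEND:VCALENDAR"
)
print(calendar.find("VEVENT/ORGANIZER").text)  # mailto:a@example.com
```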

Conclusions

Because there is no need to represent arbitrary XML in JSON, dealing with external JSON values in XForms becomes easy and natural; in most cases the XForm does not even expose the fact that the external data format is not XML. The approach can be extended to other formats and, thanks to the generality of XML, mostly without restriction.

Future XML: allow all Unicode please; and do something about character entities...

XForms resources

A tutorial: http://www.w3.org/MarkUp/Forms/2010/xforms11-for-html-authors/

For an overview of all features, elements and attributes of XForms 1.1, see the XForms 1.1 Quick Reference.

It's not easy reading, but the final arbiter in questions of doubt is the XForms 1.1 Specification.

XForms 2.0 Draft: http://www.w3.org/MarkUp/Forms/wiki/XForms_2.0

The implementation used for the examples in this talk is XSLTForms.