Roundtripping Invisible XML

Steven Pemberton, CWI, Amsterdam

Version: 2023-07-18



Keywords: World Wide Web, History, Design, Declarative principles, Markup, HTML, XML, XHTML, HTML5, Ephemera, Longevity, Data conservancy


ixml takes linear textual input, and converts it to structured XML output.

It does this by parsing the input using a grammar describing the format of the input document, and serialising the resultant parse tree as XML.

If that were all it did, then round-tripping the XML back to text would be trivial: it is simply a case of concatenating the text nodes of the XML, and you are done.
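The trivial case can be sketched in a few lines of Python. This is only an illustration: the function name `text_of` and the sample serialisation are assumptions made for the example, chosen so that nothing from the input has been deleted or moved into attributes.

```python
import xml.etree.ElementTree as ET


def text_of(xml_source: str) -> str:
    """Concatenate every text node of the document, in document order."""
    root = ET.fromstring(xml_source)
    # itertext() yields the text inside and between elements in document
    # order; attribute values are not included.
    return "".join(root.itertext())


# Hypothetical serialisation in which every input character survives as text.
xml_output = "<date><day>31</day>/<month>12</month>/<year>2021</year></date>"
print(text_of(xml_output))
```

Because attribute values are not text nodes, this approach already breaks down as soon as a nonterminal is serialised as an attribute.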

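However, ixml serialisation does more than that, which complicates the return trip: terminals can be marked as deleted, so that input characters never reach the output; nonterminals can be serialised as attributes, which moves their content out of document order; and text can be inserted that was never in the input.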

As hinted at in earlier papers on ixml [refs], round-tripping could be achieved by having a special-purpose general parser that attempts to recreate a parse tree that could have produced the serialisation, and then concatenating the resulting text nodes.



Terminal

a: ";". => -a: -"<a>", ";", -"</a>".
a: -";". => -a: -"<a/>", +";".
-a: ";". => -a: ";".
-a: -";". => -a: +";".
@a: ";". => -a: -" a='", ";", -"'".
@a: -";". => -a: -" a=''", +";".

Nonterminal element

a: b. => -a: -"<a>", b, -"</a>"; -"<a/>". {doesn't matter if b can't be empty}
a: -b. => -a: -"<a", raised-b-attributes, (-">", -b, -"</a>"; -"/>").
-a: b. => -a: b.
-a: -b. => -a: -b.
@a: b. => -a: -" a='", flattened-b, -"'".
@a: -b. => -a: -" a='", flattened-b, -"'".

Nonterminal attribute

a: @b. => -a: -"<a", @b, -"/>".
-a: @b. => -a: @b.
@a: @b. => -a: -" a='", flattened-b, -"'".

["abc"] => ["abc"]
-["abc"] => +"a"
-["abc"]* => +"a"

Flattened a

Raised a attributes

Dealing with whitespace

Dealing with deleted repetitions

Dealing with reordering

Is the permissive grammar

thing: -["[{("], expression, -["]})"]. =>
thing: +"[", expression, +"]".



Here is the ixml:

dates: (date, s)*.
date: day, -"/", month, -"/", year;
      year, -"-", month, -"-", day.
day: d, d?.
month: d, d?.
year: d, d, d, d.
-d: ["0"-"9"].
-s: -[#a; #d; " "]+.
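To make the effect of the marks in this grammar concrete, here is a rough hand-written approximation in Python of the serialisation an ixml processor would be expected to produce for it. It is not an ixml processor: the regular expression, the function name `serialise_dates` and the sample input are all assumptions made for the sketch, and the output is not pretty-printed.

```python
import re

# One date in either format, followed by the separator s (one or more of
# newline, carriage return or space), mirroring the rules above.
DATE = re.compile(
    r"(?:(?P<d1>[0-9][0-9]?)/(?P<m1>[0-9][0-9]?)/(?P<y1>[0-9]{4})"
    r"|(?P<y2>[0-9]{4})-(?P<m2>[0-9][0-9]?)-(?P<d2>[0-9][0-9]?))"
    r"[\n\r ]+"
)


def serialise_dates(text: str) -> str:
    """Approximate the XML that the dates grammar describes for `text`."""
    pos, dates = 0, []
    while pos < len(text):
        m = DATE.match(text, pos)
        if m is None:
            raise ValueError(f"no parse at offset {pos}")
        if m["d1"] is not None:
            fields = [("day", m["d1"]), ("month", m["m1"]), ("year", m["y1"])]
        else:
            fields = [("year", m["y2"]), ("month", m["m2"]), ("day", m["d2"])]
        # The "/" and "-" separators and the whitespace are marked as deleted
        # in the grammar, so they do not appear in the serialisation.
        body = "".join(f"<{name}>{value}</{name}>" for name, value in fields)
        dates.append(f"<date>{body}</date>")
        pos = m.end()
    return "<dates>" + "".join(dates) + "</dates>" if dates else "<dates/>"


print(serialise_dates("31/12/2021\n2021-12-31\n"))
```

The only trace of the original separators left in the output is the order of the child elements of each date, which is the information a round trip has to exploit.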

With input:


gives output:


Roundtrip ixml:

roundtrip: dates.
-dates: S, (-"<dates/>";
             -"<dates>", (date, s)*, S, -"</dates>"), S.
-date: S, (-"<date/>";
            -"<date>", (day, +"/", month, +"/", year;
                       year, +"-", month, +"-", day), S, -"</date>").
-day: S, (-"<day/>";
          -"<day>", (d, d?), -"</day>").
-month: S, (-"<month/>";
            -"<month>", (d, d?), -"</month>").
-year: S, (-"<year/>";
           -"<year>", (d, d, d, d), -"</year>").
-d: ["0"-"9"].
-s: +#a.
-S: -[" "; #a; #d]*.
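For comparison, the same reconstruction can be sketched directly in Python, without ixml. This is a hand-written equivalent for this one grammar only, not a general method: the function name `roundtrip_dates` and the sample XML are assumptions made for the example. Like the round-trip grammar, it ignores whitespace between elements, picks the separator from the order of the children, and emits a single newline after each date.

```python
import xml.etree.ElementTree as ET


def roundtrip_dates(xml_source: str) -> str:
    """Rebuild the textual form of a serialised <dates> document."""
    root = ET.fromstring(xml_source)
    if root.tag != "dates":
        raise ValueError("expected a <dates> element")
    lines = []
    for date in root:
        tags = [child.tag for child in date]
        values = [child.text or "" for child in date]
        # The deleted separator is recovered from the order of the children.
        if tags == ["day", "month", "year"]:
            separator = "/"
        elif tags == ["year", "month", "day"]:
            separator = "-"
        else:
            raise ValueError(f"unexpected children in <date>: {tags}")
        # Corresponds to the rule for s: one newline after each date.
        lines.append(separator.join(values) + "\n")
    return "".join(lines)


# Hypothetical pretty-printed serialisation used as input to the round trip.
sample = """<dates>
   <date>
      <day>31</day>
      <month>12</month>
      <year>2021</year>
   </date>
   <date>
      <year>2021</year>
      <month>12</month>
      <day>31</day>
   </date>
</dates>"""
print(roundtrip_dates(sample), end="")
```

Note that the result is a normalised form of the input rather than a byte-for-byte copy: whatever run of spaces, carriage returns or newlines originally separated the dates comes back as a single newline.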

gives output:


