Roundtripping Invisible XML

Steven Pemberton, CWI, Amsterdam

Version: 2023-07-18

Abstract

Keywords: World Wide Web, History, Design, Declarative principles, Markup, HTML, XML, XHTML, HTML5, Ephemera, Longevity, Data conservancy

Introduction

Invisible XML (ixml) takes linear textual input and converts it to structured XML output.

It does this by parsing the input using a grammar describing the format of the input document, and serialising the resultant parse tree as XML.

If that were all it did, then round-tripping the XML back to text would be trivial: it is simply a case of concatenating the text nodes of the XML, and you are done.
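As a minimal sketch of this trivial case, the following Python uses the standard library's xml.etree.ElementTree to concatenate the text nodes in document order. The function name and the sample document are hypothetical, and the sketch assumes that no input character was deleted, inserted, or moved into an attribute during serialisation.

```python
import xml.etree.ElementTree as ET


def concatenate_text_nodes(xml_text: str) -> str:
    """Return all character data of the document, in document order."""
    root = ET.fromstring(xml_text)
    # itertext() yields each element's text and tail in document order.
    return "".join(root.itertext())


# Hypothetical serialisation in which every input character survives as text.
sample = "<sum><number>1</number><op>+</op><number>2</number></sum>"
print(concatenate_text_nodes(sample))  # prints 1+2
```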

However, there are issues with regard to ixml serialisation:

As hinted at in earlier papers on ixml [refs], round-tripping could be achieved by having a special-purpose general parser that attempts to recreate a parse tree that could have produced the serialisation, and then concatenating the resulting text nodes.

Transformations

Terminal:

a: ";". => -a: -"<a>", ";", -"</a>".
a: -";". => -a: -"<a/>", +";".
-a: ";". => -a: ";".
-a: -";". => -a: +";".
@a: ";". => -a: -" a='", ";", -"'".
@a: -";". => -a: -" a=''", +";".
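The terminal cases can be read as a small rule generator. The Python below is one such reading, offered as a sketch rather than as part of ixml or of any ixml processor. The function names and parameters are hypothetical. It assumes that all serialisation markup is deleted on the way back, and that a deleted terminal is always re-inserted, including in the attribute case.

```python
def quote(text: str) -> str:
    """Write text as an ixml string literal (a double quote is doubled)."""
    return '"' + text.replace('"', '""') + '"'


def roundtrip_terminal_rule(name: str, rule_mark: str, literal: str,
                            deleted: bool) -> str:
    """Build the roundtrip rule for `name: literal.`

    rule_mark is "" (element), "-" (hidden) or "@" (attribute);
    deleted says whether the terminal was marked "-" in the original rule.
    """
    lit = quote(literal)
    # A deleted terminal is absent from the XML, so it must be inserted.
    terminal = ["+" + lit] if deleted else [lit]
    if rule_mark == "-":
        body = terminal  # no markup was produced for a hidden rule
    elif rule_mark == "@":
        if deleted:
            body = ["-" + quote(f" {name}=''")] + terminal
        else:
            body = ["-" + quote(f" {name}='")] + terminal + ["-" + quote("'")]
    else:
        if deleted:
            body = ["-" + quote(f"<{name}/>")] + terminal
        else:
            body = (["-" + quote(f"<{name}>")] + terminal
                    + ["-" + quote(f"</{name}>")])
    return f"-{name}: " + ", ".join(body) + "."


print(roundtrip_terminal_rule("a", "", ";", deleted=False))
print(roundtrip_terminal_rule("a", "", ";", deleted=True))
print(roundtrip_terminal_rule("a", "@", ";", deleted=False))
```

In every generated rule the nonterminal itself is hidden (`-a`), so that the roundtrip output contains only the recovered text.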

Nonterminal element

a: b. => -a: -"<a>", b, -"</a>"; -"<a/>". {doesn't matter if b can't be empty}
a: -b. => -a: -"<a", raised-b-attributes, (-">", -b, -"</a>"; -"/>").
-a: b. => -a: b.
-a: -b. => -a: -b.
@a: b. => -a: -" a='", flattened-b, -"'".
@a: -b. => -a: -" a='", flattened-b, -"'".

Nonterminal attribute

a: @b. => -a: -"<a", @b, -"/>".
-a: @b. => -a: @b.
@a: @b. => -a: -" a='", flattened-b, -"'".

["abc"] => ["abc"]
-["abc"] => +"a".
-["abc"]* => +"a"
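One reading of the character-set cases is that a deleted member of a set cannot be recovered from the XML, so the roundtrip grammar inserts a fixed representative instead. The Python sketch below illustrates that reading only. The function names are hypothetical, and taking the first member of the set as the representative is an assumption based on the `+"a"` in the rules above.

```python
def serialise_deleted_set(ch: str, members: str = "abc") -> str:
    """Mimic -["abc"]: accept one member of the set and emit nothing."""
    if len(ch) != 1 or ch not in members:
        raise ValueError(f"{ch!r} is not a single member of {members!r}")
    return ""


def roundtrip_deleted_set(members: str = "abc") -> str:
    """Mimic +"a": re-insert a fixed representative of the set."""
    return members[0]


# Every member serialises to the same (empty) text, so the roundtrip
# can only produce one canonical character, whatever the input was.
for ch in "abc":
    assert serialise_deleted_set(ch) == ""
    print(ch, "->", roundtrip_deleted_set())
```

Under this reading the roundtripped text is a valid input that yields the same XML, but it is not necessarily the original input.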

Flattened a

Raised a attributes

Dealing with whitespace

Dealing with deleted repetitions

Dealing with reordering

Is the permissive grammar

thing: -["[{("], expression, -["]})"]. =>
thing: +"[", expression, +"]".

ambiguous?

Example

Here is the ixml:

dates: (date, s)*.
date: day, -"/", month, -"/", year;
      year, -"-", month, -"-", day.
day: d, d?.
month: d, d?.
year: d, d, d, d.
-d: ["0"-"9"].
-s: -[#a; #d; " "]+.

With input:

31/12/1999
1999-12-31

gives output:

<dates>
   <date>
      <day>31</day>
      <month>12</month>
      <year>1999</year>
   </date>
   <date>
      <year>1999</year>
      <month>12</month>
      <day>31</day>
   </date>
</dates>
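The output above shows why concatenating text nodes is not enough for this grammar: the deleted "/" and "-" separators are absent from the XML. The following Python sketch makes this concrete using the standard library's xml.etree.ElementTree. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

DATES_XML = """<dates>
   <date>
      <day>31</day>
      <month>12</month>
      <year>1999</year>
   </date>
   <date>
      <year>1999</year>
      <month>12</month>
      <day>31</day>
   </date>
</dates>"""

root = ET.fromstring(DATES_XML)
# Concatenate all text nodes, then split on the indentation whitespace.
tokens = "".join(root.itertext()).split()
print(tokens)  # ['31', '12', '1999', '1999', '12', '31']
```

The digits survive, but nothing in the text nodes says which separator stood between them.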

Roundtrip ixml:

roundtrip: dates.
-dates: S, (-"<dates/>";
             -"<dates>", (date, s)*, S, -"</dates>"), S.
-date: S, (-"<date/>";
            -"<date>", (day, +"/", month, +"/", year;
                       year, +"-", month, +"-", day), S, -"</date>").
-day: S, (-"<day/>";
          -"<day>", (d, d?), -"</day>").
-month: S, (-"<month/>";
            -"<month>", (d, d?), -"</month>").
-year: S, (-"<year/>";
           -"<year>", (d, d, d, d), -"</year>").
-d: ["0"-"9"].
-s: +#a.
-S: -[" "; #a; #d]*.

Given the XML above as input, it gives the output:

<roundtrip>31/12/1999
1999-12-31
</roundtrip>
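For comparison, the effect of this roundtrip grammar can be imitated by hand for this one format. The Python below is only a sketch: it is written for the dates example and is not a general roundtripping method or a description of how an ixml processor works. The function name is hypothetical. As in the grammar, the order of the children selects the alternative of date, and hence the separator to re-insert.

```python
import xml.etree.ElementTree as ET

DATES_XML = """<dates>
   <date>
      <day>31</day>
      <month>12</month>
      <year>1999</year>
   </date>
   <date>
      <year>1999</year>
      <month>12</month>
      <day>31</day>
   </date>
</dates>"""


def roundtrip_dates(xml_text: str) -> str:
    """Rebuild the textual dates from the serialised XML."""
    root = ET.fromstring(xml_text)
    lines = []
    for date in root.findall("date"):
        tags = [child.tag for child in date]
        texts = [(child.text or "").strip() for child in date]
        # day-first dates used "/", year-first dates used "-".
        separator = "/" if tags[:1] == ["day"] else "-"
        # The deleted whitespace after each date comes back as one newline,
        # corresponding to the rule -s: +#a.
        lines.append(separator.join(texts) + "\n")
    return "".join(lines)


print(roundtrip_dates(DATES_XML), end="")
```

Note that the original whitespace between dates is normalised to a single newline, just as in the grammar version.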

Conclusion

References