Modular ixml

The author

Steven Pemberton, CWI, Amsterdam

Abstract

Most current ixml grammars are small. However, there are examples of large grammars, and it is likely that more large grammars will emerge as ixml usage increases.

To make large grammars more manageable, and to enable reuse, it would be useful to have a way to modularise them.

One of the requirements of modularisation for reuse in any notation is to have a method of specifying the contractual interface, such that it is possible for the producers of the modules to change their internal structure without breaking any existing usage of the module.

This paper describes a proposal for an ixml preprocessor that permits an ixml grammar to invoke other modules of ixml grammars, specifying their linkage. Rules whose names clash between modules are renamed, using ixml renaming so that the resultant XML serialisations remain the same; the result is a single ixml grammar with no rule-name clashes. The invoking grammar remains unchanged.

There is no change to the syntax or semantics of ixml proper.

Keywords: ixml, parsing, context-free grammars, XML, modularisation

ixml

Invisible XML (ixml) is a notation and process that uses context-free grammars to describe the format of textual documents.

This allows documents to be parsed into an abstract parse-tree, which can be processed in various ways, but principally serialised into an XML document, thus making the implicit structure of the textual document explicit in the XML.

Modularisation

Most current ixml grammars are small (the grammar for ixml itself for example is around 70 lines).

Large grammars may emerge containing subparts that are authored by different people.

For example, there is a grammar for XPath 4 of around 350 lines, which could be used by grammars for languages that use XPath 4.

The nice thing about general context-free grammars is that they can be combined, and remain general context-free, which makes modularisation feasible.

Requirements

The main problem to be solved: rule name clashes between modules.

Other requirements and desiderata:

Naming and renaming

Renaming is a new ixml feature agreed by the working group.

It is already present in several implementations.

It allows you to give a rule a serialisation name that differs from the rule's own name.

Example renaming

Consider a grammar that accepts both 31/12/1999 and 31 December 1999 forms of dates:

    date: numeric; textual.
-numeric: day, -"/", month, -"/", year.
-textual: day, -" "+, tmonth, -" "+, year.
     day: d, d?.
   month: d, d?.
    year: d, d, d, d.
  tmonth: -"January",  +"1";
          -"February", +"2";
          ...
          -"December", +"12".
      -d: ["0"-"9"].

Dates

While 31/12/1999 produces

<date>
   <day>31</day>
   <month>12</month>
   <year>1999</year>
</date>

31 December 1999 produces

<date>
   <day>31</day>
   <tmonth>12</tmonth>
   <year>1999</year>
</date>

The two differ only because the month in the textual form is recognised by a different rule, tmonth.

Using Renaming

Using renaming, you can specify that both have the same serialised name:

tmonth > month:
        -"January",  +"1";
        -"February", +"2";
        ...
        -"December", +"12".

tmonth is the rule name, month is the name used on serialisation.

The Structure of a Module

A module consists of a regular ixml grammar, preceded by specifications of which rules it uses from other modules and which of its own rules it shares for use by others.

+uses css from css.ixml
+uses iri, url, uri, urn from uri.ixml

It is possible to combine them:

+uses css from css.ixml; iri, url, uri, urn from uri.ixml

The source may also be given as a full IRI:

+uses iri from https://example.com/ixml/modules/iri.ixml

The specification of what can be used is similar:

+shares iri, url, uri, urn

A Grammar for Modularisation

There are two main choices for a grammar for these. The first literally recognises the structure as it is specified above:

   module: s, (uses; shares)*, ixml.
     uses: -"+uses", rs, from++(-";", s).
   shares: -"+shares", rs, entries.
     from: entries, rs, -"from", rs, source, s.
 -entries: share++(-",", s).
    share: @name, s.
  @source: iri.

using s, rs, name, and ixml from the ixml grammar, and presupposing a rule for iri.

Result

A specification like

+uses css from css.ixml; iri, url, uri, urn from uri.ixml

then produces

<uses>
   <from source='css.ixml'>
       <share name='css'/>
   </from>
   <from source='uri.ixml'>
      <share name='iri'/>
      <share name='url'/>
      <share name='uri'/>
      <share name='urn'/>
   </from>
</uses>

Alternative Grammar

   module: s, (multiuse; shares)*, ixml.
-multiuse: -"+uses", rs, uses++(-";", s).
   shares: -"+shares", rs, entries.
     uses: entries, rs, -"from", rs, from.
 -entries: share++(-",", s).
    share: @name, s.
    @from: iri, s.

where the resulting structure is then:

<uses from='css.ixml'>
   <share name='css'/>
</uses>
<uses from='uri.ixml'>
   <share name='iri'/>
   <share name='url'/>
   <share name='uri'/>
   <share name='urn'/>
</uses>
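
As a rough illustration of the structure above, the following Python sketch splits a uses specification into (source, names) pairs, one pair per uses element. It is not the proposed preprocessor: the function name parse_uses is hypothetical, and the sketch assumes that neither names nor sources contain ";" or ",", which the real grammar does not guarantee.

```python
import re


def parse_uses(spec):
    """Split '+uses a, b from x; c from y' into [(source, [names]), ...]."""
    body = spec.strip()
    if not body.startswith("+uses"):
        raise ValueError("not a uses specification")
    uses = []
    # Each ';'-separated clause has the shape 'name, name, ... from source'
    for clause in body[len("+uses"):].split(";"):
        parts = re.split(r"\s+from\s+", clause.strip(), maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"malformed clause: {clause!r}")
        names = [name.strip() for name in parts[0].split(",")]
        uses.append((parts[1].strip(), names))
    return uses


for source, names in parse_uses(
    "+uses css from css.ixml; iri, url, uri, urn from uri.ixml"
):
    print(source, names)
```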

Semantic Requirements

+uses css from css.ixml
+uses iri, url, uri, urn from uri.ixml
+shares model, control

Mutual References

Modules are allowed to invoke each other.

E.g. a programming language where declarations can include procedures, and procedures can include declarations.

Module for procedures:

+uses declaration from declaration.ixml
+shares procedure

Module for declarations:

+uses procedure from procedure.ixml
+shares declaration

This illustrates that a uses specification is different from, for instance, #include in C preprocessing, since uses only ensures that the module will be present in the final grammar.
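
One way to picture this is as a worklist that visits each module once, so that mutual references terminate. The sketch below is an assumption about how a preprocessor might do it, not part of the proposal; collect_modules, the uses_of table and the file name main.ixml are all hypothetical.

```python
def collect_modules(start, uses_of):
    """Return every module reachable from `start`, each exactly once.

    uses_of maps a module name to the modules named in its uses
    specifications (a stand-in for actually loading the files).
    """
    seen = []
    pending = [start]
    while pending:
        module = pending.pop()
        if module in seen:
            continue  # already collected, so cycles are harmless
        seen.append(module)
        pending.extend(uses_of.get(module, []))
    return seen


uses_of = {
    "main.ixml": ["procedure.ixml"],          # hypothetical invoking grammar
    "procedure.ixml": ["declaration.ixml"],
    "declaration.ixml": ["procedure.ixml"],   # mutual reference
}
print(collect_modules("main.ixml", uses_of))
```

Because a module is only ever added once, the order in which uses specifications are met does not affect which modules end up in the final grammar.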

Sole Ownership

A module can only share rules it defines; it is not permitted to share a rule from a different module like this:

+uses x, y from z.ixml
+shares x
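
A check for this restriction could be as small as the following Python sketch. The function name unowned_shares is hypothetical, and the rule a in the usage line is invented purely so that the module defines something.

```python
def unowned_shares(shares, defined_rules):
    """Return the shared names for which the module has no rule of its own."""
    return [name for name in shares if name not in defined_rules]


# The module above shares x but only uses it from z.ixml;
# assume for illustration that its own grammar defines a single rule, a.
print(unowned_shares(["x"], {"a"}))
```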

Defining modules using modules

We can now use modules to define modules:

+uses ixml, name, s, rs from ixml.ixml
+uses iri from iri.ixml
+shares module

   module: s, (multiuse; shares)*, ixml.
-multiuse: -"+uses", rs, uses++(-";", s).
   shares: -"+shares", rs, entries.
     uses: entries, rs, -"from", rs, from.
 -entries: share++(-",", s).
    share: @name, s.
    @from: iri, s.

Processing

The invoking module and all invoked modules are collected.

If any two contain the definition of a rule of the same name, one of the rules is renamed.

Rule renaming

A rule is renamed by generating a new unique name, different from all other rule names in the set of modules. An ixml renaming on the rule keeps its serialised name unchanged.

All applications of the old name, both in the module that defines the rule and in any other module that uses it, are replaced with the new name.

Once all naming conflicts are resolved, all invoked modules are appended to the invoking module, with the uses and shares specifications removed.
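
The steps above can be sketched in Python over rule names alone. This is a minimal illustration under stated assumptions, not the proposed implementation: modules are given as plain dictionaries rather than parsed grammars, new names are formed by appending underscores, and a definition that the invoking grammar refers to is the one that keeps its name. The names fresh_name, plan_renames and resolve are hypothetical, as is the module name main.

```python
def fresh_name(name, taken):
    """Append underscores until the name is unused (an assumed naming scheme)."""
    candidate = name + "_"
    while candidate in taken:
        candidate += "_"
    return candidate


def plan_renames(invoker, modules):
    """Return {(module, old_name): new_name} for each rule that must be renamed.

    modules maps a module name to {"rules": set of rule names defined there,
    "uses": {source module: set of rule names used from it}}.
    """
    taken = set().union(*(info["rules"] for info in modules.values()))

    # Definitions the invoking grammar refers to keep their names,
    # because the invoking grammar itself is never rewritten.
    protected = {(invoker, rule) for rule in modules[invoker]["rules"]}
    for source, names in modules[invoker]["uses"].items():
        protected |= {(source, name) for name in names}

    owners = {}
    for module, info in modules.items():
        for rule in sorted(info["rules"]):
            owners.setdefault(rule, []).append(module)

    renames = {}
    for rule, defining in owners.items():
        if len(defining) < 2:
            continue  # no clash
        keep = next((m for m in defining if (m, rule) in protected), defining[0])
        for module in defining:
            if module != keep:
                new_name = fresh_name(rule, taken)
                taken.add(new_name)
                renames[(module, rule)] = new_name
    return renames


def resolve(module, name, modules, renames):
    """Name to write for a reference to `name` occurring inside `module`."""
    if name in modules[module]["rules"]:
        return renames.get((module, name), name)
    for source, names in modules[module]["uses"].items():
        if name in names:
            return renames.get((source, name), name)
    return name


modules = {
    "main": {"rules": {"data", "identity", "id"}, "uses": {"expr.ixml": {"expr"}}},
    "expr.ixml": {"rules": {"expr", "id", "op"}, "uses": {}},
}
renames = plan_renames("main", modules)
print(renames)
print(resolve("expr.ixml", "id", modules, renames))
```

The sketch does not handle the case where the invoking grammar needs two clashing definitions at once; it simply keeps the first.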

Result

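These rules ensure that the invoking grammar is unchanged, that the result is a single ixml grammar with no rule-name clashes, and that the XML serialisation of any parsed input remains the same.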

Example 1

Imagine a language of identity statements of the style

total=price+tax+shipping
tax=price×10÷100
shipping=5

expressed using the definition of expr from another module:

+uses expr from expr.ixml
data: identity+.
identity: id, -"=", expr, -#a.
id: [L]+.

However the expr module has a clashing rule for id:

+shares expr
expr: id++op.
id: [L; Nd]+.
op: ["+-×÷"].

Processed

Since the invoking grammar never gets changed, the rule in the module gets renamed, resulting in the following complete grammar:

data: identity+.
identity: id, -"=", expr, -#a.
id: [L]+.

expr: id_++op.
id_>id: [L; Nd]+.
op: ["+-×÷"].

If the module's rule for id had instead been a renaming, for instance:

id>ident: [L; Nd]+.

then the renaming would have ended up as:

id_>ident: [L; Nd]+.
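
The effect on the left-hand side of a rule can be sketched as follows. This Python fragment is illustrative only: rename_header is a hypothetical helper, and its regular expression is a simplification of ixml's naming syntax (an optional mark, a name, and an optional renaming after ">").

```python
import re

# Simplified rule header: optional mark, rule name, optional '> alias'
HEADER = re.compile(r"^\s*([@^-]?)\s*([^\s>]+)\s*(?:>\s*(\S+))?\s*$")


def rename_header(header, new_name):
    """Give the rule a new name while keeping its serialised name."""
    match = HEADER.match(header)
    if match is None:
        raise ValueError(f"not a rule header: {header!r}")
    mark, name, alias = match.groups()
    # An existing renaming wins; otherwise the old rule name is serialised
    return f"{mark}{new_name}>{alias or name}"


print(rename_header("id", "id_"))        # id_>id
print(rename_header("id>ident", "id_"))  # id_>ident
```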

Example 2

Making the example slightly more complex, with rules like

result[1]=a1+b1+c1
result[2]=a2+b2+c2

using this grammar:

+uses expr from expr.ixml; identity from identity.ixml
rules: rule+.
rule: identity, -"=", expr, -#a.

Module expr.ixml

+shares expr
expr: operand++op.
operand: id; number.
id: [L], [L; Nd]*.
op: ["+-×÷"].
number: ["0"-"9"]+.

Module identity.ixml has a clash with both id and number:

+shares identity
identity: id; id, -"[", number, -"]".
id: [L]+.
number: digits, (".", digits)?.
-digits: [Nd]+.

The invoking grammar never changes:

rules: rule+.
rule: identity, -"=", expr, -#a.

In module expr.ixml nothing needs changing:

expr: operand++op.
operand: id; number.
id: [L], [L; Nd]*.
op: ["+-×÷"].
number: ["0"-"9"]+.

In identity.ixml both id and number are renamed:

identity: id_; id_, -"[", number_, -"]".
id_>id: [L]+.
number_>number: digits, (".", digits)?.
-digits: [Nd]+.

The rules allow either or both to be renamed in expr.ixml instead.

Example 3

The invoking grammar:

+uses id from ident.ixml; expr from expr.ixml
rules: rule+.
rule: id, -"=", expr.

Module ident.ixml

+shares id
id: [L]+.

Module expr.ixml

+uses id, number from id.ixml
+shares expr
expr: operand++op.
operand: id; number.
op: ["+-×÷"].

Module id.ixml

+shares id, number
id: [L], [L; Nd]*.
number: [Nd]+.

Here two different modules each share a rule called id, and each of those rules is used by a different module.

Result

The invoking grammar is never changed:

rules: rule+.
rule: id, -"=", expr.

and since the invoking grammar uses the id rule from module ident.ixml, that rule may not be renamed:

id: [L]+.

This means that the id rule in module id.ixml has to be renamed:

id_>id: [L], [L; Nd]*.
number: [Nd]+.

and the reference in module expr.ixml, which uses it, is replaced to match:

expr: operand++op.
operand: id_; number.
op: ["+-×÷"].

A Larger Example

Imagine you were defining a textual format for XForms:

Example XForm
style xform.css

model M
  instance data data.xml
  submission save put:data.xml replace:none 

input name "What is your name?"
submit "OK"

xform.ixml

This is going to need definitions for CSS, URIs, XPath, and a lot more. Then you might define a grammar like this (this is not a complete example).

+uses form from form.ixml
+uses content from content.ixml

xform>html: h, form, content.
@h>xmlns: +"http://www.w3.org/1999/xhtml".

form.ixml

+shares form
+uses css from css.ixml;
      model from model.ixml;
      iri from iri.ixml;
      s from xforms-basics.ixml
      
     form>head: title, styling?, model*.
         title: ~[" "; #a], ~[#a]+, -#a.
      -styling: -"style", s, (style; stylelink).
stylelink>link: csstype, cssrel, href.
         style: csstype, css.
 @csstype>type: +"text/css".
   @cssrel>rel: +"stylesheet".
         @href: -iri, s.

model.ixml

+shares model
+uses s, ref, xf from xforms-basics.ixml;
      id, name from xml.ixml;
      Action from action.ixml;
      iri from iri.ixml

         model: -"model", s, id, s, xf, -#a, 
                    s, (instance; bind; submission; Action)+.

      instance: -"instance", s, id, s, resource, s.
     @resource: -iri.

          bind: "bind", s, (id, s)?, ref, s, property*.
      property: type {; readonly; relevant; required; etc}.
          type: "type:", name, s.
  
    submission: -"submission", s, id, s, 
                (method, -":", resource, s)?, replace?.
       @method: "get"; "put".
      @replace: -"replace:", name, s.
{etc}

content.ixml

+shares content
+uses IDREF from xml.ixml;
      xf, ref, string, s from xforms-basics.ixml

     content>body: group.

            group: xf, control*.
         -control: input; submit {more}.

            input: -"input", s, ref, label.
            label: string.

           submit: -"submit", s, subid?, label?.
@subid>submission: -"submission:", IDREF, s.

Result

<html xmlns='http://www.w3.org/1999/xhtml'>
   <head>
      <title>Example XForm</title>
      <link type='text/css' rel='stylesheet' href='xform.css'/>
      <model id='M' xmlns='http://www.w3.org/2002/xforms'>
         <instance id='data' resource='data.xml'/>
         <submission id='save' method='put' resource='data.xml' replace='none'/>
      </model>
   </head>
   <body>
      <group xmlns='http://www.w3.org/2002/xforms'>
         <input ref='name'>
            <label>What is your name?</label>
         </input>
         <submit>
            <label>OK</label>
         </submit>
      </group>
   </body>
</html>

Conclusion

Modularisation can imitate scoping in a simple and direct manner through renaming.

A pre-processor can produce a complete ixml grammar that produces an identical serialisation of the parsed input.

No change in the syntax or semantics of ixml proper.