Most current ixml grammars are small. However, there are examples of large grammars, and more are likely to emerge as ixml usage increases.
To make large grammars more manageable, and to enable reuse, it would be useful to have a way to modularise them.
One of the requirements of modularisation for reuse in any notation is to have a method of specifying the contractual interface, such that it is possible for the producers of the modules to change their internal structure without breaking any existing usage of the module.
This paper proposes an ixml preprocessor that lets an ixml grammar invoke other ixml grammar modules and specify their linkage. Rules whose names clash across modules are renamed using ixml renaming, yielding a single ixml grammar with no rule-name clashes whose XML serialisations remain the same. The invoking grammar remains unchanged.
There is no change to the syntax or semantics of ixml proper.
Keywords: ixml, parsing, context-free grammars, XML, modularisation
Invisible XML (ixml) is a notation and process that uses context-free grammars to describe the format of textual documents.
This allows documents to be parsed into an abstract parse-tree, which can be processed in various ways, but principally serialised into an XML document, thus making the implicit structure of the textual document explicit in the XML.
Most current ixml grammars are small (the grammar for ixml itself for example is around 70 lines).
Large grammars may emerge containing subparts that are authored by different people.
E.g. there is a grammar for XPath 4 at around 350 lines which could be used by grammars for languages that use XPath 4.
The nice thing about general context-free grammars is that they can be combined, and remain general context-free, which makes modularisation feasible.
The main problem to be solved: rule name clashes between modules.
Other requirements and desiderata:
Renaming is a new ixml feature agreed by the working group, and already present in several implementations.
It lets you specify that a rule is serialised under a name different from its rule name.
Consider a grammar that accepts both 31/12/1999 and 31 December 1999 forms of dates:
    date: numeric; textual.
    -numeric: day, -"/", month, -"/", year.
    -textual: day, -" "+, tmonth, -" "+, year.
    day: d, d?.
    month: d, d?.
    year: d, d, d, d.
    tmonth: -"January", +"1";
            -"February", +"2";
            ...
            -"December", +"12".
    -d: ["0"-"9"].
While 31/12/1999 produces
    <date>
       <day>31</day>
       <month>12</month>
       <year>1999</year>
    </date>
31 December 1999 produces
    <date>
       <day>31</day>
       <tmonth>12</tmonth>
       <year>1999</year>
    </date>
where the difference arises because it is produced from a different input syntax.
Using renaming, you can specify that both have the same serialised name:
tmonth > month: -"January", +"1"; -"February", +"2"; ... -"December", +"12".
Here tmonth is the rule name, and month is the name used on serialisation.
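To make the distinction between rule name and serialised name concrete, the following Python sketch splits the head of a rule into the two. The helper name `rule_names` and the simplified pattern for names are assumptions made for illustration; real ixml names follow the richer XML name rules, which this pattern only approximates.

```python
import re

# Simplified head pattern (an assumption): optional mark, rule name,
# optional "> alias", then the colon that starts the rule body.
HEAD = re.compile(r"^\s*([@^-]?)\s*([\w.-]+)\s*(?:>\s*([\w.-]+))?\s*:")

def rule_names(rule):
    """Return (rule name, name used on serialisation) for one rule."""
    match = HEAD.match(rule)
    if match is None:
        raise ValueError("not a rule: " + rule)
    _mark, name, alias = match.groups()
    # Without a renaming, the rule is serialised under its own name.
    return name, alias or name

renamed = rule_names('tmonth > month: -"January", +"1".')
plain = rule_names("day: d, d?.")
```

With the date grammar above, `renamed` is the pair `("tmonth", "month")`, while `plain` repeats the rule name, since `day` has no renaming.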
A module consists of a regular ixml grammar, preceded by specifications of rules used from other modules and what is shared for use from this module.
    +uses css from css.ixml
    +uses iri, url, uri, urn from uri.ixml
It is possible to combine them:
+uses css from css.ixml; iri, url, uri, urn from uri.ixml
Also possible:
+uses iri from https://example.com/ixml/modules/iri.ixml
The specification of what can be used is similar:
+shares iri, url, uri, urn
There are two main choices for a grammar for these. The first literally recognises the structure as it is specified above:
    module: s, (uses; shares)*, ixml.
    uses: -"+uses", rs, from++(-";", s).
    shares: -"+shares", rs, entries.
    from: entries, rs, -"from", rs, source, s.
    -entries: share++(-",", s).
    share: @name, s.
    @source: iri.
using s, rs, name, and ixml from the ixml grammar, and presupposing a rule for iri.
A specification like
+uses css from css.ixml; iri, url, uri, urn from uri.ixml
then produces
    <uses>
       <from source='css.ixml'>
          <share name='css'/>
       </from>
       <from source='uri.ixml'>
          <share name='iri'/>
          <share name='url'/>
          <share name='uri'/>
          <share name='urn'/>
       </from>
    </uses>
The second normalises combined specifications, so that each source yields its own uses element:
    module: s, (multiuse; shares)*, ixml.
    -multiuse: -"+uses", rs, uses++(-";", s).
    shares: -"+shares", rs, entries.
    uses: entries, rs, -"from", rs, from.
    -entries: share++(-",", s).
    share: @name, s.
    @from: iri, s.
where the resulting structure is then:
    <uses from='css.ixml'>
       <share name='css'/>
    </uses>
    <uses from='uri.ixml'>
       <share name='iri'/>
       <share name='url'/>
       <share name='uri'/>
       <share name='urn'/>
    </uses>
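For readers who want to experiment without an ixml processor, the normalised structure can be approximated with plain string handling. The Python sketch below is only an illustration: the function name `parse_header` is hypothetical, it assumes one specification per line, and it assumes that locations contain no `;` and that names contain no `,`.

```python
def parse_header(text):
    """Collect the uses and shares specifications of a module.

    Returns (uses, shares): uses is a list of (source, [names]) pairs,
    one per source as in the normalised structure; shares is a list
    of shared names.
    """
    uses, shares = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("+uses"):
            # A combined specification holds several ";"-separated parts.
            for part in line[len("+uses"):].split(";"):
                names, _, source = part.partition(" from ")
                uses.append((source.strip(),
                             [name.strip() for name in names.split(",")]))
        elif line.startswith("+shares"):
            shares.extend(name.strip()
                          for name in line[len("+shares"):].split(","))
    return uses, shares

header = ("+uses css from css.ixml; iri, url, uri, urn from uri.ixml\n"
          "+shares model, control")
uses, shares = parse_header(header)
```

Each pair in `uses` corresponds to one uses element of the normalised form, with the source playing the role of the from attribute.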
    +uses css from css.ixml
    +uses iri, url, uri, urn from uri.ixml
    +shares model, control
The uses and shares specifications in a module must be unique.
Modules are allowed to invoke each other.
For example, consider a programming language in which declarations can include procedures, and procedures can include declarations.
Module for procedures:
    +uses declaration from declaration.ixml
    +shares procedure
Module for declarations:
    +uses procedure from procedure.ixml
    +shares declaration
This illustrates that a uses specification is different from, for instance, #include in C preprocessing, since uses only ensures that the module will be present in the final grammar.
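Because uses only records that a module must be present, mutually invoking modules can be collected with an ordinary worklist that visits each module once. The Python sketch below assumes the uses specifications have already been read into a dictionary; the function name `collect_modules` and the location `main.ixml` for the invoking grammar are hypothetical.

```python
def collect_modules(root, uses_of):
    """Return every module reachable from root, each exactly once.

    uses_of maps a module location to the locations named in its
    uses specifications.
    """
    seen = [root]
    queue = [root]
    while queue:
        current = queue.pop(0)
        for source in uses_of.get(current, []):
            if source not in seen:   # already collected: do not revisit
                seen.append(source)
                queue.append(source)
    return seen

uses_of = {
    "main.ixml": ["procedure.ixml"],
    "procedure.ixml": ["declaration.ixml"],
    "declaration.ixml": ["procedure.ixml"],   # mutual invocation
}
modules = collect_modules("main.ixml", uses_of)
```

The cycle between the two modules causes no repetition: each location appears once in `modules`, whereas a textual inclusion mechanism would recurse indefinitely.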
A module can only share rules it defines; it is not permitted to share a rule from a different module like this:
    +uses x, y from z.ixml
    +shares x
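A preprocessor could check this restriction before doing anything else. The Python sketch below is a minimal illustration that assumes the set of rule names defined by the module is already known; `illegal_shares` is a hypothetical name.

```python
def illegal_shares(defined, shares):
    """Return the shared names that the module does not define itself."""
    return [name for name in shares if name not in defined]

# The module above defines no rule x of its own, so sharing x is reported.
bad = illegal_shares(set(), ["x"])
ok = illegal_shares({"procedure"}, ["procedure"])
```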
We can now use modules to define modules:
    +uses ixml, name, s, rs from ixml.ixml
    +uses iri from iri.ixml
    +shares module
    module: s, (multiuse; shares)*, ixml.
    -multiuse: -"+uses", rs, uses++(-";", s).
    shares: -"+shares", rs, entries.
    uses: entries, rs, -"from", rs, from.
    -entries: share++(-",", s).
    share: @name, s.
    @from: iri, s.
The invoking module and all invoked modules are collected.
If any two contain the definition of a rule of the same name, one of the rules is renamed:
A rule is renamed by generating a new unique name, different from all other rule names in the set of modules:
If the rule already has an alias (name > alias), the rule is redefined with the new name and the existing alias (newname > alias).
Otherwise, the rule is redefined with the new name and the old name as its alias (newname > oldname).
All applications of the old name in the module grammar, and in any of the other modules that use that rule, are replaced with the new name.
Once all naming conflicts are resolved, all invoked modules are appended to the invoking module, with the uses and shares specifications removed.
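The final assembly step is simple enough to sketch in a few lines of Python. This is an illustration only: it assumes every uses and shares specification sits on a line of its own, and the names `strip_specifications` and `assemble` are hypothetical.

```python
def strip_specifications(module_text):
    """Drop the uses and shares lines, keeping the plain ixml rules."""
    kept = [line for line in module_text.splitlines()
            if not line.lstrip().startswith(("+uses", "+shares"))]
    return "\n".join(kept).strip()

def assemble(invoking, invoked):
    """Append the invoked modules to the invoking module."""
    parts = [strip_specifications(text) for text in [invoking, *invoked]]
    return "\n".join(parts) + "\n"

grammar = assemble(
    "+uses b from b.ixml\na: b+.\n",
    ["+shares b\nb: [L]+.\n"],
)
```

The result is an ordinary ixml grammar, so no ixml processor needs to know that modules were involved.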
What these rules ensure is that:
Imagine a language of identity statements of the style
    total=price+tax+shipping
    tax=price×10÷100
    shipping=5
expressed using the definition of expr from another module:
    +uses expr from expr.ixml
    data: identity+.
    identity: id, -"=", expr, -#a.
    id: [L]+.
However, the expr module has a clashing rule for id:
    +shares expr
    expr: id++op.
    id: [L; Nd]+.
    op: ["+-×÷"].
Since the invoking grammar never gets changed, the rule in the module gets renamed, resulting in the following complete grammar:
    data: identity+.
    identity: id, -"=", expr, -#a.
    id: [L]+.
    expr: id_++op.
    id_>id: [L; Nd]+.
    op: ["+-×÷"].
If the module's rule for id had instead been a renaming, for instance:
    id>ident: [L; Nd]+.
then the renaming would have ended up as:
    id_>ident: [L; Nd]+.
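The renaming applied in this example can be sketched in Python as follows. Several things here are assumptions made for illustration: a module is represented as a dictionary from rule name to an (alias, body) pair, new names are formed by appending underscores, and the tokeniser is a simplification that skips strings, character sets, comments and encoded characters without handling nested comments or a `]` inside a quoted set member.

```python
import re

# Simplified tokens: everything except names is skipped unchanged.
TOKEN = re.compile(r"""
    "[^"]*"                     # double-quoted string
  | '[^']*'                     # single-quoted string
  | \[[^\]]*\]                  # character set
  | \{[^}]*\}                   # comment (not nested)
  | \#[0-9a-fA-F]+              # encoded character such as #a
  | [^\W\d]\w*(?:[-.]\w+)*      # name
""", re.VERBOSE)

def fresh_name(name, taken):
    """Generate a name that differs from every name in taken."""
    candidate = name + "_"
    while candidate in taken:
        candidate += "_"
    return candidate

def rename_rule(rules, old, taken):
    """Rename rule old in one module; rules maps name -> (alias, body)."""
    new = fresh_name(old, taken)

    def swap(match):
        return new if match.group(0) == old else match.group(0)

    renamed = {}
    for name, (alias, body) in rules.items():
        body = TOKEN.sub(swap, body)       # replace applications of old
        if name == old:
            # Keep an existing alias, else serialise under the old name.
            renamed[new] = (alias or old, body)
        else:
            renamed[name] = (alias, body)
    return renamed

expr_module = {
    "expr": (None, "id++op"),
    "id": (None, "[L; Nd]+"),
    "op": (None, '["+-×÷"]'),
}
all_names = {"data", "identity", "id", "expr", "op"}
result = rename_rule(expr_module, "id", all_names)
```

Here `result` holds `id_` with alias `id`, and the body of `expr` becomes `id_++op`, matching the combined grammar shown above. Starting from `{"id": ("ident", "[L; Nd]+")}` instead keeps the alias `ident`.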
Making the example slightly more complex, with rules like
    result[1]=a1+b1+c1
    result[2]=a2+b2+c2
using this grammar:
    +uses expr from expr.ixml; identity from identity.ixml
    rules: rule+.
    rule: identity, -"=", expr, -#a.
Module expr.ixml:
    +shares expr
    expr: operand++op.
    operand: id; number.
    id: [L], [L; Nd]*.
    op: ["+-×÷"].
    number: ["0"-"9"]+.
Module identity.ixml has a clash with both id and number:
    +shares identity
    identity: id; id, -"[", number, -"]".
    id: [L]+.
    number: digits, (".", digits)?.
    -digits: [Nd]+.
The invoking grammar never changes:
    rules: rule+.
    rule: identity, -"=", expr, -#a.
In module expr.ixml nothing needs changing:
    expr: operand++op.
    operand: id; number.
    id: [L], [L; Nd]*.
    op: ["+-×÷"].
    number: ["0"-"9"]+.
In identity.ixml both id and number are renamed:
    identity: id_; id_, -"[", number_, -"]".
    id_>id: [L]+.
    number_>number: digits, (".", digits)?.
    -digits: [Nd]+.
The rules allow either or both to be renamed in expr.ixml instead.
The invoking grammar:
    +uses id from ident.ixml; expr from expr.ixml
    rules: rule+.
    rule: id, -"=", expr.
Module ident.ixml:
    +shares id
    id: [L]+.
Module expr.ixml:
    +uses id, number from id.ixml
    +shares expr
    expr: operand++op.
    operand: id; number.
    op: ["+-×÷"].
Module id.ixml:
    +shares id, number
    id: [L], [L; Nd]*.
    number: [Nd]+.
Here there are two rules called id, both shared and used by two different modules.
The invoking grammar is never changed:
    rules: rule+.
    rule: id, -"=", expr.
and since the id rule is used from module ident.ixml, the rule may not be renamed there:
    id: [L]+.
This means that the id rule in module id.ixml has to be renamed:
    id_>id: [L], [L; Nd]*.
    number: [Nd]+.
and in module expr.ixml that uses it:
    expr: operand++op.
    operand: id_; number.
    op: ["+-×÷"].
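The choice of which clashing rule to rename in this example can be sketched in Python. This is a reading of the examples rather than a definitive algorithm: it treats rules defined in the invoking grammar, and rules the invoking grammar uses directly, as protected, and when no clashing rule is protected it arbitrarily keeps the first one encountered. The function name `plan_renames` and the location `main.ixml` are hypothetical.

```python
def plan_renames(root, defines, root_uses):
    """Decide which (module, rule) pairs must be renamed.

    defines maps each module to the set of rule names it defines;
    root_uses maps each name the invoking grammar uses to its module.
    """
    protected = {(root, name) for name in defines[root]}
    protected |= {(module, name) for name, module in root_uses.items()}

    owners = {}
    for module, names in defines.items():
        for name in sorted(names):
            owners.setdefault(name, []).append(module)

    renames = []
    for name, modules in owners.items():
        if len(modules) < 2:
            continue                      # no clash for this name
        keep = [m for m in modules if (m, name) in protected]
        if len(keep) > 1:
            raise ValueError("clash visible in invoking grammar: " + name)
        keeper = keep[0] if keep else modules[0]   # arbitrary fallback
        renames += [(m, name) for m in modules if m != keeper]
    return renames

defines = {
    "main.ixml": {"rules", "rule"},
    "ident.ixml": {"id"},
    "expr.ixml": {"expr", "operand", "op"},
    "id.ixml": {"id", "number"},
}
root_uses = {"id": "ident.ixml", "expr": "expr.ixml"}
plan = plan_renames("main.ixml", defines, root_uses)
```

For the modules above, `plan` contains only the id rule of id.ixml, which is the rule renamed to id_ in the example.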
Imagine you were defining a textual format for XForms:
    Example XForm
    style xform.css
    model M
       instance data data.xml
       submission save put:data.xml replace:none
    input name "What is your name?"
    submit "OK"
This is going to need definitions for CSS, URIs, XPath, and a lot more. Then you might define a grammar like this (this is not a complete example).
    +uses form from form.ixml
    +uses content from content.ixml
    xform>html: h, form, content.
    @h>xmlns: +"http://www.w3.org/1999/xhtml".
Module form.ixml:
    +shares form
    +uses css from css.ixml; model from model.ixml; iri from iri.ixml; s from xforms-basics.ixml
    form>head: title, styling?, model*.
    title: ~[" "; #a], ~[#a]+, -#a.
    -styling: -"style", s, (style; stylelink).
    stylelink>link: csstype, cssrel, href.
    style: csstype, css.
    @csstype>type: +"text/css".
    @cssrel>rel: +"stylesheet".
    @href: -iri, s.
Module model.ixml:
    +shares model
    +uses s, ref, xf from xforms-basics.ixml; id, name from xml.ixml; Action from action.ixml; iri from iri.ixml
    model: -"model", s, id, s, xf, -#a, s, (instance; bind; submission; Action)+.
    instance: -"instance", s, id, s, resource, s.
    @resource: -iri.
    bind: "bind", s, (id, s)?, ref, s, property*.
    property: type {; readonly; relevant; required; etc}.
    type: "type:", name, s.
    submission: -"submission", s, id, s, (method, -":", resource, s)?, replace?.
    @method: "get"; "put".
    @replace: -"replace:", name, s.
    {etc}
Module content.ixml:
    +shares content
    +uses IDREF from xml.ixml; xf, ref, string, s from xforms-basics.ixml
    content>body: group.
    group: xf, control*.
    -control: input; submit {more}.
    input: -"input", s, ref, label.
    label: string.
    submit: -"submit", s, subid?, label?.
    @subid>submission: -"submission:", IDREF, s.
    <html xmlns='http://www.w3.org/1999/xhtml'>
       <head>
          <title>Example XForm</title>
          <link type='text/css' rel='stylesheet' href='xform.css'/>
          <model id='M' xmlns='http://www.w3.org/2002/xforms'>
             <instance id='data' resource='data.xml'/>
             <submission id='save' method='put' resource='data.xml' replace='none'/>
          </model>
       </head>
       <body>
          <group xmlns='http://www.w3.org/2002/xforms'>
             <input ref='name'>
                <label>What is your name?</label>
             </input>
             <submit>
                <label>OK</label>
             </submit>
          </group>
       </body>
    </html>
Modularisation can imitate scoping in a simple and direct manner through renaming.
A pre-processor can produce a complete ixml grammar that produces an identical serialisation of the parsed input.
No change in the syntax or semantics of ixml proper.