# Invisible Markup

## Abstractions

Numbers are abstractions: you can't point to the number three, just three bicycles, or three sheep, or three self-referential examples.

Three is what those bicycles and sheep and examples have in common.

## Representations

You can represent a number in different ways:

3, III, 0011, ㆔, ३, ፫, ૩, ੩, 〣, ೩, ៣, ໓, Ⅲ, ൩, ၃, ႓, trois, drie.

You can concretise numbers as a length, a weight, a speed, a temperature.

But in the end, they all represent the same three.

## Data is an abstraction too!

We are often obliged for different reasons to represent data in some way or another.

But in the end those representations are all of the same abstraction; there is no essential difference between the JSON

`{"temperature": {"scale": "C", "value": 21}}`

and an equivalent XML

`<temperature scale="C" value="21"/>`

or

```
<temperature>
  <scale>C</scale>
  <value>21</value>
</temperature>
```

or indeed

`temperature: 21°C`

since the underlying abstractions being represented are the same.
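As a rough illustration of that claim, the Python sketch below reads each of the four representations and reduces it to the same pair of values. The function names are hypothetical and the plain-text pattern is an assumption about what that notation allows; only the four inputs come from the text above.

```python
import json
import re
import xml.etree.ElementTree as ET


def from_json(text):
    """Read the JSON form: a nested object with scale and value."""
    data = json.loads(text)["temperature"]
    return (data["scale"], int(data["value"]))


def from_xml_attributes(text):
    """Read the XML form that carries scale and value as attributes."""
    elem = ET.fromstring(text)
    return (elem.get("scale"), int(elem.get("value")))


def from_xml_elements(text):
    """Read the XML form that carries scale and value as child elements."""
    elem = ET.fromstring(text)
    return (elem.findtext("scale"), int(elem.findtext("value")))


def from_plain_text(text):
    """Read the plain-text form (pattern assumed: digits, degree sign, one letter)."""
    match = re.fullmatch(r"temperature: ([0-9]+)°([A-Z])", text)
    if match is None:
        raise ValueError(f"unrecognised temperature: {text!r}")
    return (match.group(2), int(match.group(1)))


readings = [
    from_json('{"temperature": {"scale": "C", "value": 21}}'),
    from_xml_attributes('<temperature scale="C" value="21"/>'),
    from_xml_elements(
        "<temperature><scale>C</scale><value>21</value></temperature>"
    ),
    from_plain_text("temperature: 21°C"),
]
print(readings)
```

All four readers return the same tuple, which is the sense in which the representations share one abstraction.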

## What Invisible Markup Does

Takes a representation of data (typically with implicit structure).

Uses a description of the format of that data to recognise the data's structure.

Creates an internal representation of the data, now with the structure made explicit.

Which can be used for multiple purposes, including creating an external representation with explicit structure.

## (Simple) Example: Dates

`19 October 2022`

Describe the format:

```
date: day, " ", month, " ", year.
day: digit, digit?.
month: "January"; "February"; ...; "December".
year: digit, digit, digit, digit.
digit: ["0"-"9"].
```

Process the input with this description, and get:

```
<date>
  <day>
    <digit>1</digit>
    <digit>9</digit>
  </day>
  <month>October</month>
  <year>
    <digit>2</digit>
    <digit>0</digit>
    <digit>2</digit>
    <digit>2</digit>
  </year>
</date>
```

## (Simple) Example: Dates

`19 October 2022`

Describe the format, now marking `digit` with `-` so that its element is left out of the output:

```
date: day, " ", month, " ", year.
day: digit, digit?.
month: "January"; "February"; ...; "December".
year: digit, digit, digit, digit.
-digit: ["0"-"9"].
```

Process the input with this description, and get:

```
<date>
  <day>19</day>
  <month>October</month>
  <year>2022</year>
</date>
```
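To make the steps concrete (recognise the structure, build an internal representation, serialise it), here is a minimal Python sketch that mimics them for this one date format. It is a hand-written stand-in, not an ixml processor: the regular expression plays the role of the format description, `parse_date` is a hypothetical helper name, and the full list of month names is written out as an assumption where the grammar elides them.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the months elided with "..." in the grammar are the usual twelve.
MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]

# Plays the role of the format description: day, space, month, space, year.
DATE = re.compile(
    r"(?P<day>[0-9][0-9]?) (?P<month>" + "|".join(MONTHS) + r") (?P<year>[0-9]{4})"
)


def parse_date(text):
    """Recognise the implicit structure and return it as an element tree."""
    match = DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a date: {text!r}")
    date = ET.Element("date")
    for name in ("day", "month", "year"):
        ET.SubElement(date, name).text = match.group(name)
    return date


tree = parse_date("19 October 2022")              # internal representation
xml_text = ET.tostring(tree, encoding="unicode")  # external representation
print(xml_text)
```

The serialisation is unindented, but it carries the same elements as the output shown above.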

## (Simple) Example: Dates

`19/10/2022`

```
date: day, " ", month, " ", year;
      day, "/", nmonth, "/", year.
day: digit, digit?.
month: "January"; "February"; ...; "December".
nmonth: digit, digit?.
year: digit, digit, digit, digit.
-digit: ["0"-"9"].
```

Process the input with this description, and get:

```
<date>
  <day>19</day>/
  <nmonth>10</nmonth>/
  <year>2022</year>
</date>
```

## Several dates

`dates: date+.`

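Better, allowing optional spaces after each date:

`dates: (date, " "*)+.`

or, requiring a comma and a space between dates:

`dates: date++", ".`

for input such as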

`19/10/2022, 31 December 2022, 1/1/2023`
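The separated repetition `date++", "` means one or more dates with exactly `, ` between them. A rough Python analogue, assuming the two date formats from the earlier examples and using the hypothetical helper name `parse_dates`, could look like this:

```python
import re

# Assumption: the months elided with "..." in the grammar are the usual twelve.
MONTH_NAMES = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
WORDY = re.compile(r"([0-9][0-9]?) (" + MONTH_NAMES + r") ([0-9]{4})")
NUMERIC = re.compile(r"([0-9][0-9]?)/([0-9][0-9]?)/([0-9]{4})")


def parse_dates(text):
    """Parse one or more dates separated by exactly ', '."""
    dates = []
    for item in text.split(", "):
        match = WORDY.fullmatch(item) or NUMERIC.fullmatch(item)
        if match is None:
            raise ValueError(f"not a date: {item!r}")
        dates.append(match.groups())
    return dates


dates = parse_dates("19/10/2022, 31 December 2022, 1/1/2023")
print(dates)
```

Because the separator is part of the repetition rather than of `date`, a trailing or doubled separator is rejected instead of being silently accepted.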

## Attributes

```
date: day, " ", month, " ", year;
      day, "/", nmonth, "/", year.
@day: digit, digit?.
@month: "January"; "February"; ...; "December".
@nmonth: digit, digit?.
@year: digit, digit, digit, digit.
-digit: ["0"-"9"].
```

with input

`19/10/2022`

gives

`<date day="19" nmonth="10" year="2022">//</date>`

## Deleting terminals

```
date: day, -" ", month, -" ", year;
      day, -"/", nmonth, -"/", year.
@day: digit, digit?.
@month: "January"; "February"; ...; "December".
@nmonth: digit, digit?.
@year: digit, digit, digit, digit.
-digit: ["0"-"9"].
```

with input

`19/10/2022`

gives

`<date day="19" nmonth="10" year="2022"/>`
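The difference between keeping and deleting terminals can be imitated in a few lines of Python. This is only a sketch for the numeric date form: `serialize` and its `delete_terminals` flag are hypothetical names, and the regular expression stands in for the grammar.

```python
import re
import xml.etree.ElementTree as ET

# The separators are captured too, so that they can be kept or dropped.
NUMERIC_DATE = re.compile(r"([0-9][0-9]?)(/)([0-9][0-9]?)(/)([0-9]{4})")


def serialize(text, delete_terminals):
    """Serialise a numeric date with day, nmonth and year as attributes."""
    match = NUMERIC_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a numeric date: {text!r}")
    day, sep1, nmonth, sep2, year = match.groups()
    date = ET.Element("date", {"day": day, "nmonth": nmonth, "year": year})
    if not delete_terminals:
        # Unmarked terminals stay behind as the element's text content.
        date.text = sep1 + sep2
    return ET.tostring(date, encoding="unicode")


kept = serialize("19/10/2022", delete_terminals=False)
deleted = serialize("19/10/2022", delete_terminals=True)
print(kept)
print(deleted)
```

Python's ElementTree writes empty elements with a space before `/>`, so the second result differs from the ixml output above only in that whitespace.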

## Ambiguity

ixml reports ambiguous input.

With a grammar that accepts both world-style and US-style dates, with the month restricted to 1-12 and the day to 1-31:

```
date: us; world.
us: month, -"/", day, -"/", year.
world: day, -"/", month, -"/", year.
month: "0"?, ["1"-"9"];
       "10"; "11"; "12".
etc
```

the input `04/10/2021` would produce:

```
<!-- AMBIGUOUS
The input from line.pos 1.1 to 1.11 can be interpreted as 'date' in 2 different ways:
1: us[:1.11]
2: world[:1.11]
-->
<date ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS">
  <us>
    <month>04</month>
    <day>10</day>
    <year>2021</year>
  </us>
</date>
```
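Why this input is ambiguous can be checked with a small Python sketch that tries both readings. The month pattern follows the `month` rule above; the day and year patterns are assumptions, since the grammar elides those rules with "etc", and `interpretations` is a hypothetical helper name.

```python
import re

MONTH = "(?:0?[1-9]|1[0-2])"          # mirrors the month rule: 1-12
DAY = "(?:0?[1-9]|[12][0-9]|3[01])"   # assumed day rule: 1-31
YEAR = "[0-9]{4}"                     # assumed year rule: four digits

US = re.compile(f"{MONTH}/{DAY}/{YEAR}")
WORLD = re.compile(f"{DAY}/{MONTH}/{YEAR}")


def interpretations(text):
    """Return the names of every alternative of 'date' that matches text."""
    found = []
    if US.fullmatch(text):
        found.append("us")
    if WORLD.fullmatch(text):
        found.append("world")
    return found


print(interpretations("04/10/2021"))
print(interpretations("25/12/2021"))
print(interpretations("12/25/2021"))
```

Only inputs whose first two fields are both at most 12 match both alternatives; a day above 12 settles the question.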

## Real world example

The hardest part of getting an article into DocBook format (the format used by several conferences I go to) is getting the bibliography right.

The bibliography for a recent paper was produced with the help of ixml. For instance, the text

`[spec] Steven Pemberton (ed.), Invisible XML Specification, invisiblexml.org, 2022, https://invisiblexml.org/ixml-specification.html`

was processed by an ixml grammar whose top-level rules were

```
bibliography: biblioentry+.
biblioentry:
    abbrev, (author; editor), -", ",
    title, -", ",
    publisher, -", ", pubdate, -",",
    (artpagenums, -", ")?,
    (bibliomisc; biblioid)**-", ", -#a.
```

## Yielding

```
<biblioentry>
  <abbrev>spec</abbrev>
  <editor>
    <personname>
      <firstname>Steven</firstname>
      <surname>Pemberton</surname>
    </personname>
  </editor>
  <title>Invisible XML Specification</title>
  <publisher>invisiblexml.org</publisher>
  <pubdate>2022</pubdate>
  <bibliomisc>
  </bibliomisc>
</biblioentry>
```

## Processing step

This produces a structured parse tree, which can then be processed in a number of ways, such as serialization as XML.

## Structured ixml

The format description is drawn as a structured document.

However, it is normally supplied in textual form, which the ixml processor handles in exactly the same way as any other input, using a description of the ixml format itself.

This results in the structured version of the description.

## ixml in ixml

ixml is of course expressed in ixml:

`rule: (mark, s)?, name, s, -["=:"], s, -alts, -".".`

which comes out as XML:

```
<rule name='rule'>
  <alt>
    <option>
      <alts>
        <alt>
          <nonterminal name='mark'/>
          <nonterminal name='s'/>
        </alt>
      </alts>
    </option>
    <nonterminal name='name'/>
    <nonterminal name='s'/>
    <inclusion tmark='-'>
      <member string='=:'/>
    </inclusion>
    <nonterminal name='s'/>
    <nonterminal mark='-' name='alts'/>
    <literal tmark='-' string='.'/>
  </alt>
</rule>
```
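Once a grammar is in this structured form, ordinary XML tooling can inspect it. As a small sketch (not part of any ixml implementation), the Python below loads the rule shown above and lists the nonterminals it refers to, in document order, together with any mark:

```python
import xml.etree.ElementTree as ET

# The XML form of the rule, copied from the example above.
RULE_XML = """
<rule name='rule'>
  <alt>
    <option>
      <alts>
        <alt>
          <nonterminal name='mark'/>
          <nonterminal name='s'/>
        </alt>
      </alts>
    </option>
    <nonterminal name='name'/>
    <nonterminal name='s'/>
    <inclusion tmark='-'>
      <member string='=:'/>
    </inclusion>
    <nonterminal name='s'/>
    <nonterminal mark='-' name='alts'/>
    <literal tmark='-' string='.'/>
  </alt>
</rule>
"""

rule = ET.fromstring(RULE_XML)

# Collect (mark, name) for each nonterminal; an absent mark becomes "".
references = [
    (node.get("mark", ""), node.get("name"))
    for node in rule.iter("nonterminal")
]
print(rule.get("name"), references)
```

The same traversal would work on a whole grammar, for example to find rules that are referenced but never defined.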

## ixml

Version 1 was officially released in June 2022 on invisiblexml.org.

Currently 3 implementations running, 3 more in preparation.

For full details read the specification, or see the tutorial.