↑ Grammar Tools in ABC

Steven Pemberton
CWI, Amsterdam

When I have to work with grammars, I always use ABC to do it. Among the advantages are that you can do the work interactively, that you can very quickly build additional tools, and that you have the already powerful programming environment at your disposal.

What follows is a brief description of some of the tools I use, with an example. There is no description of ABC here: you can find a quick description of the language at http://www.cwi.nl/~steven/abc/, with information about ABC (there's a book), and how to get the implementations (they're free).

Some of what follows is also presented in the book, though at a more relaxed pace :-).

(For didactic reasons, what is presented here differs in detail from the distributed code.)

Grammars

The representation that I use is more or less a direct transcription of what a grammar is. I use a table whose keys are texts (i.e. strings) representing the nonterminals of the language, and whose items are sets of alternatives. Each alternative is a sequence of texts, representing terminals and nonterminals. So here is a how-to that displays a grammar in this form:

        HOW TO DISPLAY grammar:
           FOR name IN keys grammar:
              WRITE "`name`: " /
              FOR alt IN grammar[name]:
                 WRITE "    "
                 FOR symbol IN alt:
                    WRITE symbol, " "
                 WRITE /

and as example:

        >>> DISPLAY sentence
        ADJ:
            EMPTY
            clever
            shy
        BOY:
            John
            Kevin
        EMPTY:

        GIRL:
            Mary
            Susan
        OBJ:
            SUBJ
        SENT:
            SUBJ loves OBJ
        SUBJ:
            ADJ BOY
            ADJ GIRL

You can generate a random phrase from a grammar with the following:

        HOW TO GENERATE sym FROM grammar:
           SELECT:
              sym in keys grammar: \ Nonterminal
                 FOR new IN choice grammar[sym]:
                    GENERATE new FROM grammar
              ELSE: \ Terminal symbol
                 WRITE sym, " "

        >>> GENERATE "SENT" FROM sentence
        Susan loves clever John

Sets

Here are some necessary functions on sets. Set union:

        HOW TO RETURN set1 with set2: \ Union
           FOR x IN set2:
              IF x not.in set1:
                 INSERT x IN set1
           RETURN set1

Set difference:

        HOW TO RETURN set1 less set2: \ Difference
           FOR x IN set2:
              IF x in set1:
                 REMOVE x FROM set1
           RETURN set1

Here is a function that collects all symbols used in the rules of a grammar:

        HOW TO RETURN used grammar: \Collect all used symbols
           PUT {} IN all
           FOR rule IN grammar:
              FOR alt IN rule:
                 FOR sym IN alt:
                    IF sym not.in all:
                       INSERT sym IN all
           RETURN all

        >>> WRITE used sentence
        {"ADJ"; "BOY"; "EMPTY"; "GIRL"; "John"; "Kevin"; "Mary";
         "OBJ"; "SUBJ"; "Susan"; "clever"; "loves"; "shy"}

The terminals of the grammar are all the symbols less the nonterminals:

        >>> WRITE (used sentence) less keys sentence
        {"John"; "Kevin"; "Mary"; "Susan"; "clever"; "loves"; "shy"}

and the unused nonterminals (such as the root symbol) are the nonterminals less the used symbols:

        >>> WRITE (keys sentence) less used sentence
        {"SENT"}

For neater output, the function "listed" converts a set to a text:

        HOW TO RETURN listed set:
           PUT "" IN line
           FOR element IN set:
              PUT line ^ "`element` " IN line
           RETURN line

        >>> WRITE listed ((used sentence) less keys sentence)
        John Kevin Mary Susan clever loves shy

A useful set is the set of nonterminals that can generate empty. This is generated by repeatedly doing a pass over the rules that we don't know yet can generate empty, until we find no more:

        HOW TO RETURN empties grammar:
           PUT keys grammar, {} IN to.do, empties
           WHILE SOME name IN to.do HAS empty.rule:
              INSERT name IN empties
              REMOVE name FROM to.do
           RETURN empties
        empty.rule:
           REPORT SOME alt IN grammar[name] HAS empty.alt
        empty.alt:
           REPORT EACH sym IN alt HAS sym in empties

        >>> WRITE listed empties sentence
        ADJ EMPTY

Relations

Relations between symbols of the grammar are the essential element of the grammar tools. A relation is represented as a table whose keys are symbols, and whose items are sets of symbols.

For instance, if symbol b follows symbol a in some rule, "b" will be in the set for follows["a"], so you can say, for instance:

        IF "b" in follows["a"]: ....

Relations are sparse (i.e. a symbol is not in the keys of the relation if the set of elements is empty), so we use the following to access a relation:

        HOW TO RETURN relation for k: \relation[k] for sparse relations
           IF k in keys relation:
              RETURN relation[k]
           RETURN {}

To add an element to a relation, we use this:

        HOW TO ADD element TO relation FOR thing:
           IF thing not.in keys relation: \First time
              PUT {} IN relation[thing]
           IF element not.in relation[thing]:
              INSERT element IN relation[thing]

though you may prefer

        HOW TO ADD element TO relation FOR thing:
           PUT (relation for thing) with {element} IN relation[thing]

For instance:

        >>> ADD "b" TO follows FOR "a"

We'll display a relation with:

        HOW TO SHOW relation:
           FOR k IN keys relation:
              WRITE "`k`: ", listed relation[k] /

Here are some general functions on relations. The inverse:

        HOW TO RETURN inverse relation:
           PUT {} IN inv
           FOR k IN keys relation:
              FOR x IN relation[k]:
                 ADD k TO inv FOR x
           RETURN inv

The product of two relations (a P c iff a R1 b and b R2 c):

        HOW TO RETURN r1 prod r2: \product of relations
           PUT {} IN prod
           FOR c IN keys r2:
              FOR b IN r2[c]:
                 IF b in keys r1:
                    FOR a IN r1[b]:
                       ADD a TO prod FOR c
           RETURN prod

The closure:

        HOW TO RETURN closure r:
           FOR i IN keys r:
              FOR j IN keys r:
                 IF i in r[j]:
                    PUT r[i] with r[j] IN r[j]
           RETURN r

To make a relation reflexive, we use the following. Since relations are sparse, we also have to pass the set of symbols that it must be reflexive over:

        HOW TO RETURN symbols reflexive relation: \make the relation reflexive
           FOR sym IN symbols:
              ADD sym TO relation FOR sym
           RETURN relation

Some Examples of Relations

To collect the direct followers for each symbol, we walk along each alternative, collecting adjacent symbols. There is one catch: in a rule like:

        SENT: the ADJ PERSON

"the" and "ADJ" are adjacent, but if "ADJ" can generate empty, then so are "the" and "PERSON":

        HOW TO RETURN followers grammar:
           PUT {}, empties grammar IN foll, empty
           FOR rule IN grammar:
              FOR alt IN rule:
                 TREAT ALT
           RETURN foll
        TREAT ALT:
           FOR i IN {1..#alt-1}:
              PUT alt item i IN this
              TREAT PART
        TREAT PART:
           FOR j IN {i+1..#alt}:
              PUT alt item j IN next
              ADD next TO foll FOR this
              IF next not.in empty: QUIT

        >>> SHOW followers sentence
        ADJ: BOY GIRL
        SUBJ: loves
        loves: OBJ

To collect the direct starter symbols of each rule, you also have to deal with symbols that produce empty:

        HOW TO RETURN heads grammar:
           PUT {}, empties grammar IN heads, empty
           FOR name IN keys grammar:
              FOR alt IN grammar[name]:
                 TREAT ALT
           RETURN heads
        TREAT ALT:
           FOR i IN {1..#alt}:
              PUT alt item i IN head
              ADD head TO heads FOR name
              IF head not.in empty: QUIT

        >>> SHOW heads sentence
        ADJ: EMPTY clever shy
        BOY: John Kevin
        GIRL: Mary Susan
        OBJ: SUBJ
        SENT: SUBJ
        SUBJ: ADJ BOY GIRL

Similarly for the direct enders:

        HOW TO RETURN tails grammar:
           PUT {}, empties grammar IN tails, empty
           FOR name IN keys grammar:
              FOR alt IN grammar[name]:
                 TREAT ALT
           RETURN tails
        TREAT ALT:
           FOR i' IN {-#alt..-1}:
              PUT -i' IN i
              PUT alt item i IN tail
              ADD tail TO tails FOR name
              IF tail not.in empty: QUIT

The closure of the head relation represents all symbols that can start a rule, either directly or indirectly:

        >>> SHOW closure heads sentence
        ADJ: EMPTY clever shy
        BOY: John Kevin
        GIRL: Mary Susan
        OBJ: ADJ BOY EMPTY GIRL John Kevin Mary SUBJ Susan clever shy
        SENT: ADJ BOY EMPTY GIRL John Kevin Mary SUBJ Susan clever shy
        SUBJ: ADJ BOY EMPTY GIRL John Kevin Mary Susan clever shy

Symbol b may follow symbol a in a phrase if b follows a in an alternative, or if B follows A in an alternative and b is in heads*(B) and a is in tails*(A). This is expressed as the product

head* . follow . inverse(tail*).

Now we have enough to define a command that prints for each symbol in an alternative what may follow that symbol at that point:

        HOW TO SHOW LOCAL FOLLOWERS grammar:
           PUT (used grammar) with keys grammar IN symbols
           PUT symbols reflexive (closure heads grammar) IN head.star
           PUT symbols reflexive (closure tails grammar) IN tail.star
           PUT followers grammar IN follow
           PUT (head.star prod follow) prod (inverse tail.star) IN deep.follow
           FOR parent IN keys grammar:
              FOR alt IN grammar[parent]:
                 TREAT ALT
        TREAT ALT:
           ANNOUNCE ALT
           FOR i IN {1..#alt}:
              TREAT SYM
        TREAT SYM:
           PUT alt item i IN sym
           WRITE "    `sym`: ", listed local.follow /
        local.follow:
           PUT {} IN foll
           FOR j IN {i+1..#alt}:
              PUT alt item j IN next
              PUT foll with (head.star for next) IN foll
              IF next not.in empty:
                 RETURN foll
           RETURN foll with (deep.follow for parent)
        ANNOUNCE ALT:
           WRITE "`parent`: ", listed alt /

This prints each alternative separately, followed by each symbol of the alternative indented one to a line followed by the symbols that can follow it at that point.

For example:

        >>> SHOW LOCAL FOLLOWERS sentence
        ADJ: EMPTY
            EMPTY: BOY GIRL John Kevin Mary Susan
        ADJ: clever
            clever: BOY GIRL John Kevin Mary Susan
        ADJ: shy
            shy: BOY GIRL John Kevin Mary Susan
        BOY: John
            John: loves
        BOY: Kevin
            Kevin: loves
        EMPTY:
        GIRL: Mary
            Mary: loves
        GIRL: Susan
            Susan: loves
        OBJ: SUBJ
            SUBJ:
        SENT: SUBJ loves OBJ
            SUBJ: loves
            loves: ADJ BOY EMPTY GIRL John Kevin Mary OBJ SUBJ Susan clever shy
            OBJ:
        SUBJ: ADJ BOY
            ADJ: BOY John Kevin
            BOY: loves
        SUBJ: ADJ GIRL
            ADJ: GIRL Mary Susan
            GIRL: loves