Sdf1 to Sdf2 Conversion

The new Asf+Sdf Meta-Environment is based on a new version of Sdf (Syntax Definition Formalism). If you have an Asf+Sdf language definition developed with the old Asf+Sdf Meta-Environment, you will have to make some (simple) modifications. For details on Sdf2 we refer to the PhD thesis of Eelco Visser.

The main differences between Sdf1 and Sdf2 lie in the area of lexical disambiguation.

Lexical and Syntactic Modifications

Lexical Disambiguation

Sdf2 does not support automatic lexical disambiguation. Unless you take extra precautions, this may lead to many ambiguities and to performance penalties during parsing. There are a number of simple rules you should follow when converting to, or writing, an Sdf2 definition. The lexical disambiguation rules of Sdf1 include "prefer longest match per sort" and "prefer keywords or literals"; for more details on these lexical disambiguation rules we refer to the Sdf reference manual. We will discuss how the most crucial lexical disambiguation rules can be defined in Sdf2, but first we give an example of how layout is defined in Sdf2.

Example Definition of Layout

module Layout

exports
  lexical syntax
    [\ \t\n]         -> LAYOUT
    "%%" ~[\n]* "\n" -> LAYOUT
    "%" ~[\%\n]+ "%" -> LAYOUT
  context-free restrictions
    LAYOUT? -/- [\ \t\n\%]

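Besides the extra backslash before the space character in the character class, the main difference is the extra section "context-free restrictions". The purpose of this section is to force the parser to keep recognizing layout for as long as possible. The question mark after "LAYOUT" is a new feature of Sdf2 and denotes optional LAYOUT. The restriction states that optional layout may not be followed by any of the characters listed in the restriction, so the parser must consume all adjacent layout at once. This holds even if a nonterminal that recognizes the empty string occurs between two optional layouts. Without the restriction, the layout could be divided between the two positions in several ways, each giving a different parse.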
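To see why the restriction matters, the following Python sketch counts the ways a run of whitespace can be divided between two LAYOUT? positions separated by a nonterminal that recognizes the empty string. It is a toy model, not how the Sdf2 parser works internally. The grammar fragment x LAYOUT? E LAYOUT? y, the nonterminal E and the function layout_splits are hypothetical, and only the whitespace characters of the restriction are considered.

```python
# Toy model of  x LAYOUT? E LAYOUT? y  where the hypothetical
# nonterminal E recognizes the empty string.
LAYOUT_CHARS = set(" \t\n")  # whitespace part of the restriction


def layout_splits(gap: str, restricted: bool) -> list[tuple[str, str]]:
    """Return every way to divide the layout `gap` between the two
    LAYOUT? positions, as (first part, second part)."""
    following = gap + "y"  # the token y comes right after the gap
    splits = []
    for cut in range(len(gap) + 1):
        # With LAYOUT? -/- [\ \t\n] the first LAYOUT? may not be followed
        # by a layout character. E is empty, so the character after the
        # first LAYOUT? is following[cut]. The second LAYOUT? is always
        # followed by y, so it never violates the restriction.
        if restricted and following[cut] in LAYOUT_CHARS:
            continue
        splits.append((gap[:cut], gap[cut:]))
    return splits


print(len(layout_splits(" \t\n", restricted=False)))  # 4
print(len(layout_splits(" \t\n", restricted=True)))   # 1
```

Without the restriction, every cut point gives a distinct parse. With it, only the reading in which the first LAYOUT? takes all the whitespace survives.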

Note that the use of these new features may prove to be problematic in combination with compilation and interpretation.

Definition of "Prefer Longest Match per Sort"

The prefer longest match rule forces the scanner to keep recognizing a lexical token for as long as possible. Consider the following definition of identifiers:
exports
  sorts Id
  lexical syntax
    [a-zA-Z][a-zA-Z0-9\_]* -> Id
The string "Program" can be recognized in several ways, for instance as the single Id "Program" or as the two Ids "Pro" and "gram". Under the prefer longest match rule, only one interpretation remains.
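The number of readings grows quickly with the length of the string. The Python sketch below is a toy model and not part of the Meta-Environment. It enumerates every way to split a string into consecutive Ids when no disambiguation is applied. It assumes a hypothetical context in which identifiers may directly follow each other, for example a list of Ids. The regular expression transcribes the lexical syntax of Id; the function name segmentations is hypothetical.

```python
import re

# Python transcription of  [a-zA-Z][a-zA-Z0-9\_]* -> Id
ID = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")


def segmentations(text: str) -> list[list[str]]:
    """All ways to split `text` into consecutive Id tokens,
    with no longest-match rule applied."""
    if text == "":
        return [[]]
    result = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if ID.fullmatch(head):
            for tail in segmentations(text[end:]):
                result.append([head] + tail)
    return result


ways = segmentations("Program")
print(len(ways))                # 64
print(["Pro", "gram"] in ways)  # True
```

Because every letter of "Program" may start a new Id, each of the six positions between letters is an optional cut, which gives 2^6 = 64 readings.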

This longest match rule is enforced in Sdf2 as follows:

exports
  sorts Id
  lexical syntax
    [a-zA-Z][a-zA-Z0-9\_]* -> Id

  lexical restrictions
    Id -/- [a-zA-Z0-9\_]
The keyword "lexical" in front of "restrictions" is optional; the definition can also be written as follows:
exports
  sorts Id
  lexical syntax
    [a-zA-Z][a-zA-Z0-9\_]* -> Id

  context-free restrictions
    Id -/- [a-zA-Z0-9\_]
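The effect of the restriction Id -/- [a-zA-Z0-9\_] can be imitated by discarding any Id whose next character is in the restricted class. The following Python sketch applies that check while splitting "Program" into Ids. It is a simplified model that only illustrates the effect, not how the parser implements follow restrictions. The function id_splits is hypothetical.

```python
import re

ID = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")  # lexical syntax of Id
# Character class of the restriction  Id -/- [a-zA-Z0-9\_]
FOLLOW = re.compile(r"[a-zA-Z0-9_]")


def id_splits(text: str, restricted: bool) -> list[list[str]]:
    """Split `text` into consecutive Ids; with `restricted`, an Id may
    not be directly followed by a character from FOLLOW."""
    if text == "":
        return [[]]
    result = []
    for end in range(1, len(text) + 1):
        head, rest = text[:end], text[end:]
        if not ID.fullmatch(head):
            continue
        if restricted and FOLLOW.match(rest[:1]):
            continue  # follow restriction violated: the Id must go on
        for tail in id_splits(rest, restricted):
            result.append([head] + tail)
    return result


print(len(id_splits("Program", restricted=False)))  # 64
print(id_splits("Program", restricted=True))        # [['Program']]
```

A single restriction on the sort thus recovers the Sdf1 behaviour of keeping only the longest Id.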
When discussing the prefer keywords rule, we will come back to the prefer longest match rule.

Definition of "Prefer Keywords or Literals"

The prefer keywords or literals rule forces the scanner to treat a string that can be recognized both as a keyword and as another kind of lexical token (such as an identifier) as a keyword.

To enforce this in Sdf2 we have to use the "reject" mechanism. Suppose we have the following context-free grammar rule:

exports
  context-free syntax
    "if" Bool "then" Series "else" Series "fi" -> Statement
Then there is an overlap between these keywords and the "Id" definition above, since "if", "then", "else" and "fi" all match the lexical syntax of Id. In Sdf2 this problem is solved as follows:
exports
  context-free syntax
    "if" Bool "then" Series "else" Series "fi" -> Statement
  context-free syntax
    "if"   -> Id {reject}
    "then" -> Id {reject}
    "else" -> Id {reject}
    "fi"   -> Id {reject}
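A reject production removes exactly one string from a sort. As a result, "if" no longer counts as an Id, while a longer identifier such as "iffy" still does. The following Python sketch models this with a set of rejected strings. It only looks at single, already separated tokens, and the names KEYWORDS and readings are hypothetical.

```python
import re

ID = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")  # lexical syntax of Id
# Keywords of the Statement production; the same strings are excluded
# from Id by the productions  "if" -> Id {reject}  etc.
KEYWORDS = {"if", "then", "else", "fi"}


def readings(token: str, use_reject: bool = True) -> list[str]:
    """Categories a single, already separated token can be parsed as."""
    kinds = []
    if token in KEYWORDS:
        kinds.append("keyword")
    # A reject production removes exactly its own string from Id;
    # identifiers that merely start with a keyword are unaffected.
    if ID.fullmatch(token) and not (use_reject and token in KEYWORDS):
        kinds.append("Id")
    return kinds


print(readings("if", use_reject=False))  # ['keyword', 'Id']
print(readings("if"))                    # ['keyword']
print(readings("iffy"))                  # ['Id']
```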
To enforce that a string of characters like "ifIdentifier" is recognized as a single Id, rather than as the keyword "if" followed by the Id "Identifier", it may be wise to also add the following restrictions section:
  restrictions
    "if" "then" "else" "fi" -/- [a-zA-Z0-9\_]
This prevents the parser from stopping after "if" and forces it to go on recognizing the rest of the identifier.
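The interplay between the keyword restriction and the longest-match Id can be sketched in Python as follows. The sketch lists the possible readings of the first token of an input and models only the start of the input. It assumes that the reject productions and the Id restriction above are in place. The function first_token_readings is hypothetical.

```python
import re

ID = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")  # lexical syntax of Id
KEYWORDS = ("if", "then", "else", "fi")
FOLLOW = re.compile(r"[a-zA-Z0-9_]")


def first_token_readings(text: str, restricted: bool) -> list[tuple[str, str]]:
    """Possible (kind, token) choices for the first token of `text`."""
    options = []
    for kw in KEYWORDS:
        if text.startswith(kw):
            rest = text[len(kw):]
            # "if" "then" "else" "fi" -/- [a-zA-Z0-9\_]
            if restricted and FOLLOW.match(rest[:1]):
                continue
            options.append(("keyword", kw))
    # Greedy match = longest Id, as enforced by  Id -/- [a-zA-Z0-9\_];
    # keywords are never Ids because of the reject productions.
    m = ID.match(text)
    if m and m.group() not in KEYWORDS:
        options.append(("Id", m.group()))
    return options


print(first_token_readings("ifIdentifier", restricted=False))
# [('keyword', 'if'), ('Id', 'ifIdentifier')]
print(first_token_readings("ifIdentifier", restricted=True))
# [('Id', 'ifIdentifier')]
```

The restriction leaves a keyword followed by layout, as in "if x", untouched, because a space is not in the restricted class.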
Mark van den Brand
Last modified: Wed Aug 4 11:48:17 MET DST 1999