Pascal Implementation by Steven Pemberton and Martin Daniels

Chapter 2: Syntax Analysis

Unlike other aspects of the compiler, the syntax analysis parts are not very separable, since they are mixed up with calls to all other parts, such as semantic analysis.

However the method used is that commonly known as recursive descent. This will not be treated in great detail here - consult any book on compiler theory for details.

The method depends on writing a separate parsing procedure for each kind of syntactic structure, such as if statement, assignment statement, expression and so on, and each of these is only responsible for analysing its own kind of structure. If any structure contains another structure then the parsing procedure can call the procedure for this contained structure.

As an example, consider the procedure ifstatement [3185-98]. Eliminating all but the syntax analysis parts leaves

procedure ifstatement;
begin
    expression;
    if sy = thensy then insymbol
    else error(52);
    statement;
    if sy = elsesy then
    begin
        insymbol; statement
    end
end;

This relies on the existence of similar procedures for expression and statement.

When ifstatement is called, the initial if symbol has already been recognised [3457], so the first action of ifstatement is to call the procedure expression to parse the condition. A then symbol should follow, so this is tested for and skipped if present, and an error announced if absent. Procedure statement is called after this to parse the statement that should follow. The else part is optional, so if an else symbol is present it is skipped and statement called; otherwise nothing more is done.

Another simple example is whilestatement [3309-15]:

procedure whilestatement;
begin
    expression;
    if sy = dosy then insymbol else error(54);
    statement
end;

Of course constructs may contain their own constructs, for instance an if statement may contain another if statement:

and this is why the method is called recursive, since in this case ifstatement will call statement, which will again call ifstatement for the enclosed if statement, which will itself again call statement.

Error Recovery

Recursive descent works very well as long as the program being compiled is syntactically correct. However, as soon as there is a syntax error it risks getting completely out of step with the source program. For instance, if the then symbol should be misspelled

if a > max tehn max:=a

then error 52 will be reported, and statement will be called with tehn as its first symbol. In trying to parse this as an assignment statement, tehn will be flagged as an undeclared identifier, and the becomes symbol will be reported as missing after it, and so on with a hopeless cascade of error messages, all because of one error.

So, in an attempt to reduce these cascading messages, a method of error recovery has been used.

The method involves passing over to parsing procedures a set of symbols representing what might be called synchronising points. When a syntax error occurs, symbols may be skipped until one of the synchronising points is found, allowing the parser to get back into phase. An example of such synchronisation is at [3464-5] in procedure statement, where if the statement being compiled is not followed by an acceptable symbol, error 6 is reported and procedure skip is called to skip input symbols to a synchronising point.

So, returning to procedure ifstatement [3185] and now adding the error recovery parts:

procedure ifstatement;
begin
    expression(fsys + [thensy]);
    if sy = thensy then insymbol
    else error(52);
    statement(fsys + [elsesy]);
    if sy = elsesy then
    begin insymbol;
	statement(fsys)
    end
end;

Fsys [3187] is nonlocal to ifstatement, passed over to statement [2080], and containing the synchronising symbols for this statement. When expression is called, thensy is added to this set as an extra possible synchronising point. Similarly, when statement is called at [3190] elsesy is added to the set, since an else may follow it.

Fsys is set up initially in the call to the first compiling procedure programme for the main program [3998]. Its initial value is all those symbols that can uniquely start a declaration (blockbegsys [3817]) or a statement (statbegsys [3820]). Casesy is excluded since it may also appear as part of a record declaration, and so does not make a very good choice of synchronisation point. It is added back into the set at the call of body [3584] once the declarations have safely been compiled.

The procedure for a while statement, with added error recovery parts, is

procedure whilestatement;
begin
    expression(fsys + [dosy]);
    if sy = dosy then insymbol else error(54);
    statement(fsys)
end;

This method of error-recovery is formalised in Hartmann (1975) and Pemberton (1979).

Routine skip, lines [855-62]

This is the one routine that is uniquely to do with syntax analysis, for skipping symbols when recovering from a syntax error.

This routine is mainly self explanatory; the only thing to note is line [860]. The while loop on the previous line terminates when (sy in fsys) or eof(input). So, if not (sy in fsys) is true on [860], then eof(input) must be true. You may wish to contemplate why insymbol should be called in this situation.

Pascal Implementation by Steven Pemberton and Martin Daniels