Pascal Implementation by Steven Pemberton and Martin Daniels

Chapter 7: Compiling Statements

Lines: [3104-467]

Routines: Assignment, gotostatement, compoundstatement, ifstatement, casestatement, repeatstatement, whilestatement, forstatement, withstatement, statement.

There are nine kinds of statement: if, case, while, repeat, for, with, goto, compound, assignment, and procedure call. Procedure calls have already been dealt with in the last chapter.

Routine assignment, lines [3104-144]

There are three kinds of code generated for assignments. The simplest case is assigning a value to a direct simple variable (an actual variable that is not an array or record). Examples:

 a:=0

where a is a local variable

 LDCI 0   Load zero
 STRI 0 5 Store it in a

Where a is a global variable

 LDCI 0
 STOI 10

The next kind is when the variable is simple, but indirect (a pointer or formal variable that is not an array or record). Examples

 p^:=0

or

f:=0

where p or f is local:

LODA 0 5 Load the address of the variable
LDCI 0
STRI     Store indirect

or if p is global (f cannot be)

LDOA 9
LDCI 0
STRI

The final case is for arrays and records whether direct or indirect. Example

 r:=s

where r and s are local:

LDA 0 5   Load address of r
LDA 0 100 Load address of s
MOV 20    Move 20 locations from s to r

If either of the variables was global, LAO would be used instead of LDA.

[3106] The first identifier has already been read [3451] (its information is in fcp). Get the selector if any.

[3110-1] If indirect or not simple, load its address.

[3112] Save the attributes of the left hand side for use in the calls of store.

[3113] Get the right hand side.

[3114-6] If it was simple then load it, else load its address.

[3119-22] If the left hand side is real, and the right integer, float the integer.

[3123] Check the two sides for compatibility.

[3124-38] Generate the store instruction.

[3138] Only files may not be assigned.

Notes

  1. The test (gattr. typtr^.form > power) on [3110] would be better as

    (gattr.typtr^. form in [arrays, records, files])

    and the test (gattr.typtr^.form <=power) on [3115] would be better as

    (gattr.typtr^.form in [scalar, subrange, pointer, power])
  2. The zero in the CHK instruction [3132] allows the pointer to be nil. See also [2211] in selector.

Routine gotostatement, lines [3146-72]

The code for a goto statement is

 UJP label (Unconditional jump)

The compiler only accepts goto's within a routine.

[3152-4] The top elements of display may be entries for with statements, so find the top-most entry for a block.

[3155-67] Search the display for the required label, going down through the levels.

[3161] If the label is in the current block then generate the jump instruction. Otherwise it is a jump to a non-local label, which is not implemented.

Notes

  1. Instead of searching for the top-most block, level could be used, which points straight to it. See [3433].

  2. The problem with non-local jumps is at run-time in resetting the correct environment on the stack, by removing stack frames until the stack frame of the jumped-to routine is reached. To achieve this a level would have to be given to the UJP instruction so that UJP 0 would be equivalent to the current local jump, and UJP n would be a jump to a label, n textual levels out.

    To do this, replace [3162] with

    genujpxjp(57(*ujp*), level-ttop, labname).

    Then the interpreter must be changed to cope with this:

    req:= base(p);
    while mp <> req do
    begin
        (*unstack frame as in a RETP*)
        sp:=mp-1;
        ep:=store[mp+3].vm;
        mp:=store[mp+2].vm
    end;
    pc:=q
  3. When the compiler compiles itself, it generates 1,753 labels. However the majority of these (with the exception of those used in goto statements and routine calls) are used only once. If space in the assembler were at a premium, the treatment of labels could be revised to re-use the single-use ones. This would mean having two kinds of labels in the code - single use ones, and programmer defined ones

Routine compoundstatement, lines [3174-83]

This procedure should be self-evident.

Routine ifstatement, lines [3185-98]

The code for an if statement

if condition then statement1 else statement2

is

 [Code for condition]
 FJP L 6    Jump to label 6 if the condition was false
 [Code for statement1]
 UJP L 7
L 6
 [Code for statement2]
L 7

If there is no else part, then it is

 [Code for condition]
 FJP L 6
 [Code for statement1]
L 6

[3187] Compile the condition

[3188] Generate a label for the FJP, and generate the FJP.

The rest should be easy to understand.

Note

Lcix1 and lcix2 is a terrible choice of names.

Routine casestatement, lines [3200-89]

A case statement

case expression of
 2,4: statement1;
 3,6: statement2
end

has the following code

 [Code for expression]
 UJP L 8
L 10
 [Code for statement1]
 UJP L 9  Jump to end of case statement
L 11
 [Code for statement2]
 UJP L 9
L 8
 CHKI 2 6 Check the value is in range
 LDCI 2
 SBI      Subtract the lower bound
 XJP L l2 Indexed jump into the following table
L 12
 UJP L 10
 UJP L 11
 UJP L 10
 UJC
 UJP L 11
L 9

The reason for this tortuous-looking code is that the number of cases is not known until after all the cases have been compiled. So code is generated to jump over the code for the cases (UJP L 8 above), to where the value can be checked as in range (CHKI). Then the lower bound of the range is subtracted to leave a value in the range 0..m, which is used by the XJP to index a table of jump instructions. Values in this table which have not been used as case labels have a UJC -- case error -- entry.

[3202-7] Each case label (such as 2, 4, 6, and 3, in the above example) has associated with it an internal label (labels 1 and 2 above). These two values are stored in the cslab and csstart fields respectively of caseinfo, and these are chained together by the next field.

[3210] Compile the case expression.

[3213-5] Check that it is scalar and not real.

[3216] Coerce it to integer if necessary.

[3217] Generate the jump to the checking code.

[3219] Laddr will be the label at the end of the case statement, to which each case arm jumps (label 9 above).

[3220-58] This repeat gets one case arm.

[3221] Lcix1 is the label for this arm.

[3224-48] Get the case labels.

[3224] Get one label.

[3226] Check its type matches the expression.

[3227-44] The list of case labels is chained together in diminishing order of the values in cslab. Fstptr points to the highest valued one. For example:

Lpt2 and lpt1 are made to point to either side of where the new case label is to be inserted, and lpt3 points to the new label. For example, to insert 5:

Then the new label can be inserted:

[3232] Check this case label has not occurred before.

[3250] Output the label.

[3251] Compile the statement. The repeat loop is just syntax-error recovery.

[3253-4] If no error has occurred, generate the jump to the end of the case statement.

[3259] Output the label for the checking code (the destination of the jump generated at [3217]).

[3261] Save the maximum case label value.

[3263-6] Reverse the list so that it is now in diminishing order.

[3267] Save the minimum case label value.

[3268] Check that the jump table is not going to be too big (cixmax is declared on [1809]).

[3270-2] Generate the check, the subtraction, the indexed jump, and the label.

[3273-83] Output the jump table.

[3276-9] For values that were not included in the case labels, generate the UJC.

[3284] Output the final label.

Notes

  1. The code

    LDCI lmin
    SBI

    can be replaced by

     DECI lmin

    but there should be a check to prevent it being generated if lmin is zero.

  2. The label that is parameter to the XJP instruction always immediately follows the instruction, and so is redundant. This leaves the parameter free for a more worthwhile purpose -- to hold the length of the jump table so that the XJP instruction can check the value is in range, and the CHK can be eliminated. So the checking code and jump table for the earlier example can be:

    L 8
     DECI 2
     XJP 5
     UJP L 10
     UJP L 11
     UJP L 10
     UJC
     UJP L 11
    L 9

    The change to the interpreter for this is:

    if (store[sp]. vi < 0) or (store[sp].vi >=q) then errori('case error ')
    else pc: =pc+store[sp].vi

    Of course, if XJP was given two parameters like CHK, then

    DECI lmin
    XJP size

    could be replaced by

    XJP lmin size.
  3. It is pointless to form the case list in descending order, only to reverse it immediately after. Better to keep it in ascending order from the start, updating lmax as new labels are added.

  4. It is arguable whether the check for cixmax [3268] is needed.

    Presumably it is used to prevent the vast amount of useless code that would be generated for programs like the following.

    case i of
    1: i: =9999;
    9999: i:=1
    end

    (As an aside: how would you change the compiler to be able to accept the above statement?)

  5. Cixmax is declared in rather an odd place, and rather far away from its only use. Better either at the head of this procedure, or at the start of the compiler.

  6. As with array indexing, case statements for characters generate problem code for those who wish to assemble it on a machine with a different character set. Consider

    case ch of
    '0': s1;
    '1': s2
    end

    giving the following code:

     LODC 0 8
     ORDC
     UJP L 6
    L 8
     [Code for s1]
     UJP L 7
    L 9
     [Code for s2]
     UJP L 7
    L 6
     CHKI 48 49
     LDCI 48
     SBI
     XJP L 10
    L 10
     UJP L 8
     U3P L 9
    L 7

    As with array indexing, the 48 and 49 parameters to CHK and LDC are the integer values of '0' and '1' on the source machine. Refer to the notes to selector for some possible solutions.

Routine repeatstatement, lines [3291-307]

The code for a repeat statement

repeat
    statements
until condition

is

L 10
 [Code for statements]
 [Code for condition]
 FJP L 10

This is a fairly straightforward routine. Most of its complexity is due to syntactic error recovery in case of missed semicolons, by allowing a sequence of statements not separated by semicolons, where one statement should be.

Notes

  1. Note that [3294-6] is identical to [3299-301]. Calling this 'A', the structure of this procedure is

    A;
    while sy=semicolon do
    begin insymbol;
          A
    end;

    This is a typical example of a loop terminated in the middle, inexpressible in Pascal. One way of rewriting it to avoid the duplication of A is

    repeat
        A;
        finished:=sy<> semicolon;
        if not finished then insymbol
    until finished

    or (see the notes for procedure factor)

    repeat A
    until isnt(semicolon);

Routine whilestatement, lines [3309-15]

A very straightforward routine. The code for a while statement

while condition do statement

is

L 15
 [Code for condition]
 FJP L 16
 [Code for statement]
 UJP L 15
L 16

Routine forstatement, lines [3317-87]

The code for a for statement

for i:= E1 to E2 do statement

is

 [Code for E1]
 [Code to store E1 in i]
 [Code for E2]
 STRI temp temp is a temporary local variable to hold the upperbound
L 8
 [Code to load i]
 LODI temp
 LEQI      Test i <= temp
 FJP L 9   Finish if i > temp
 [Code for statement]
 [Code to load i]
 INCI 1    Increment i
 [Code to store in i]
 UJP L 8
L 9

The only difference for a downto loop is that LEQI is replaced by GEQI, and INCI by DECI.

[3321] As mentioned above, a temporary variable will be created for the duration of this statement. Save the value of lc so that it can be reset at the end to forget the temporary.

[3322-5] These are default attributes of the control variable in case it is missing.

[3326-33] Get the identifier and fill in its attributes.

[3334] The variable must not be a formal variable. (This ensures it is directly accessible).

[3336-9] Check its type.

[3345-9] Get the lower bound and check its type.

[3350] Generate code to load the lower bound and store it in the control variable.

[3356-60] Get the upper bound and check its type.

[3361-3] Load it and convert to integer if necessary.

[3364] Calculate the address of the temporary, allocating it above the other variables in this routine.

[3365] Generate code to store the upper bound in the temporary.

[3366] Generate and output the label.

[3367-9] Restore the attributes of the control variable, load it and convert to integer if necessary.

[3370] Load the temporary.

[3371-2] Allocate the space for the temporary, and update lcmax if necessary (this records the maximum size of the variables area).

[3373-4] Generate the LEQ or GEQ instruction.

[3379] Generate the end label, and the FJP to it.

[3380-1] Get the do symbol, and compile the statement.

[3382] Reload the control variable.

[3383-4] Increment or decrement as necessary.

[3385] Store the revised value in the control variable, generate the UJP to the top of the loop, and output the end label.

[3386] Reset lc to reclaim the space for the temporary variable.

Notes

  1. One of the aims of P-code is to be in some way 'ideal' for the translation of Pascal. With this is mind, the code generated for a for statement could be drastically improved.

    Rather than having separate instructions to manipulate the limits and the control variable, if these values are loaded on the stack, then only two instructions are needed to do the manipulations:

    for i:=EI to E2 do statement
    

    would give

     [Code to load the address of i]
     [Code to load E1]
     [Code to load E2]
     FOR 1 L 10
    L 11
     [Code for statement]
     NXT 1 L 11
    L 10

    For a downto use FOR 2 and NXT 2.

    This saves seven instructions and the temporary variable for each loop.

    The structure of the interpreter actions for these is

    (*FOR P Q*)
    if p=1 then test := E1 <= E2
    else test := E1 >= E2;
    if test then controlvariable:=E1
    else begin
        unstack the 3 values;
        pc:=q
    end
    
    (*NXTP Q*)
    if p=1 then test:= controlvariable < E2
    else test := controlvariable > E2;
    if test then begin
        if p=1 then controlvariable:=controlvariable + 1
        else controlvariable:=controlvariable - 1
        pc:=q
        end
    else unstack the 3 values.

    However, this solution does cause a problem with goto's leading out of a for statement, leaving the three values on the stack. Consider how to overcome this.

  2. Error 180 would be better than 155 on line [3334].

  3. The assignments to kind and access [3329, 3331] are redundant, having already been set [3323-4].

  4. The 1 of the calls to gen2 for LEQ and GEQ [3373-4] is unused.

  5. The use of the genujpxjp for the FJP is because genfjp tests that the last expression compiled was boolean.

  6. Lsp [3318] is never used.

Routine withstatement, lines [3390-429]

If the record variable of a with statement is directly accessible, no extra code is generated. For instance the code for

with r do f:=0

is no different to that for

 r.f:=0

However if the record is indirectly accessible, a temporary local variable is created to hold the record's address. Then all accesses to it are via this local.

Examples:

with r do f:=0

where r is direct, gives

LDCI 0
STRI 0 6

The code

 ir^.f:=0

gives

 
LODA 0 8 Load the address of the record.
INCA l   Increment it by the offset of f
LDCI 0
STOI     Store indirect

while

with ir^ do f:=0

gives

LODA 0 8 Load the address
STRA 0 9 Store it in the local
LODA 0 9 Access the address
INCA l   Increment to get f
LDCI 0
STDI     Store indirect

[3392] These two values will be used to reset top and lc later [3428].

[3393] A with statement can have a list of variables. This repeat deals with each variable.

[3394-7] Get the variable.

[3398-9] Check it is a record.

[3401-5] Put the details of the record on the display.

[3406-10] If the record is directly accessible, make the display entry as a constant record occurrence (crec).

[3412-19] If it is indirect, create the temporary local variable, generate code to load and store the address of the record in it, and mark the occurrence as a variable record (vrec).

[3427] Compile the statement.

[3428] Reclaim the used display entries, and the temporary variables.

Note

  1. Consistency with the way lc is treated would suggest that instead of lcnt1, a variable ltop be used, with [3392] as

    ltop := top;

    and [3428] as

    top:= ltop;

Routine statement, lines [3431-67]

[3432-45] Deal with a label before the statement.

[3433-41] It must be declared at this level, so search the local list of declared labels.

[3437] If it has already been used to label a statement, report an error.

[3438] Output the label and record that it has been defined.

[3446-7] Synchronise the input.

[3448] A statement may be empty, so see if it is.

[3450] Use the initial symbol to decide on the kind of statement.

[3451-4] An identifier may start an assignment or a procedure call, so the semantic attributes of the identifier are used to distinguish which it is.

[3464] The only symbols that may follow a statement are ';', end, else, and until.

Note

  1. An annoying aspect of many Pascal compilers is that if a procedure has not been declared, when it is called the compiler gives an error message and then, due to the test at [3452], tries to compile the call as an assignment statement, giving many more error messages. This is a problem with all semantics-directed parsing.

    However in this case there is no need to use semantic information.

    A call has the form

    identifier

    or

    identifier(parameters)

    whilst an assignment has the form

    identifier:=...
    identifier[indexes]:=...
    identifier.field:=...

    or

    identifier^ :=...

    Therefore the decision on which it is can be taken on the symbol after the identifier: if it is an open bracket or one of the statement follower symbols then it is a call. If it is a selectsys or a becomes symbol it is an assignment, otherwise skip, giving:

    ident: begin searchid([vars, field, func, proc], lcp);
                insymbol;
                if (sy in statfolsys) or (sy=lparent) then call(fsys, lcp)
                else if (sy in selectsys) or (sy=becomessy) then assignment(lcp)
                else begin error (6); skip(fsys+statfolsys) end
    
           end

    where

    statfolsys=[semicolon, endsy, elsesy, untilsy]

Pascal Implementation by Steven Pemberton and Martin Daniels