Pascal Implementation by Steven Pemberton and Martin Daniels
Routines: Assignment, gotostatement, compoundstatement, ifstatement, casestatement, repeatstatement, whilestatement, forstatement, withstatement, statement.
There are nine kinds of statement: if, case, while, repeat, for, with, goto, compound, assignment, and procedure call. Procedure calls have already been dealt with in the last chapter.
There are three kinds of code generated for assignments. The simplest case is assigning a value to a direct simple variable (an actual variable that is not an array or record). Examples:
a:=0
where a is a local variable
LDCI 0 Load zero STRI 0 5 Store it in a
Where a is a global variable
LDCI 0 STOI 10
The next kind is when the variable is simple, but indirect (a pointer or formal variable that is not an array or record). Examples
p^:=0
or
f:=0
where p or f is local:
LODA 0 5 Load the address of the variable LDCI 0 STRI Store indirect
or if p is global (f cannot be)
LDOA 9 LDCI 0 STRI
The final case is for arrays and records whether direct or indirect. Example
r:=s
where r and s are local:
LDA 0 5 Load address of r LDA 0 100 Load address of s MOV 20 Move 20 locations from s to r
If either of the variables was global, LAO
would be
used instead of LDA
.
[3106] The first identifier has already been read [3451] (its information is in fcp). Get the selector if any.
[3110-1] If indirect or not simple, load its address.
[3112] Save the attributes of the left hand side for use in the calls of store.
[3113] Get the right hand side.
[3114-6] If it was simple then load it, else load its address.
[3119-22] If the left hand side is real, and the right integer, float the integer.
[3123] Check the two sides for compatibility.
[3124-38] Generate the store instruction.
[3138] Only files may not be assigned.
The test (gattr. typtr^.form > power) on [3110] would be better as
(gattr.typtr^. form in [arrays, records, files])
and the test (gattr.typtr^.form <=power) on [3115] would be better as
(gattr.typtr^.form in [scalar, subrange, pointer, power])
The zero in the CHK
instruction [3132] allows the pointer to be
nil. See also [2211] in selector.
The code for a goto statement is
UJP label (Unconditional jump)
The compiler only accepts goto's within a routine.
[3152-4] The top elements of display may be entries for with statements, so find the top-most entry for a block.
[3155-67] Search the display for the required label, going down through the levels.
[3161] If the label is in the current block then generate the jump instruction. Otherwise it is a jump to a non-local label, which is not implemented.
Instead of searching for the top-most block, level could be used, which points straight to it. See [3433].
The problem with non-local jumps is at run-time in resetting the
correct environment on the stack, by removing stack frames until
the stack frame of the jumped-to routine is reached. To achieve
this a level would have to be given to the UJP
instruction so that
UJP 0
would be equivalent to the current local jump, and UJP n
would be a
jump to a label, n textual levels out.
To do this, replace [3162] with
genujpxjp(57(*ujp*), level-ttop, labname).
Then the interpreter must be changed to cope with this:
req:= base(p); while mp <> req do begin (*unstack frame as in a RETP*) sp:=mp-1; ep:=store[mp+3].vm; mp:=store[mp+2].vm end; pc:=q
When the compiler compiles itself, it generates 1,753 labels. However the majority of these (with the exception of those used in goto statements and routine calls) are used only once. If space in the assembler were at a premium, the treatment of labels could be revised to re-use the single-use ones. This would mean having two kinds of labels in the code - single use ones, and programmer defined ones
This procedure should be self-evident.
The code for an if statement
if condition then statement1 else statement2
is
[Code for condition] FJP L 6 Jump to label 6 if the condition was false [Code for statement1] UJP L 7 L 6 [Code for statement2] L 7
If there is no else part, then it is
[Code for condition] FJP L 6 [Code for statement1] L 6
[3187] Compile the condition
[3188] Generate a label for the FJP
, and generate the FJP
.
The rest should be easy to understand.
Lcix1 and lcix2 is a terrible choice of names.
A case statement
case expression of 2,4: statement1; 3,6: statement2 end
has the following code
[Code for expression] UJP L 8 L 10 [Code for statement1] UJP L 9 Jump to end of case statement L 11 [Code for statement2] UJP L 9 L 8 CHKI 2 6 Check the value is in range LDCI 2 SBI Subtract the lower bound XJP L l2 Indexed jump into the following table L 12 UJP L 10 UJP L 11 UJP L 10 UJC UJP L 11 L 9
The reason for this tortuous-looking code is that the number of
cases is not known until after all the cases have been compiled. So
code is generated to jump over the code for the cases (UJP L
8
above), to where the value can be checked as in range
(CHKI
). Then the lower bound of the range is subtracted to leave a
value in the range 0..m, which is used by the XJP
to index a table of
jump instructions. Values in this table which have not been used as
case labels have a UJC
-- case error -- entry.
[3202-7] Each case label (such as 2, 4, 6, and 3, in the above example) has associated with it an internal label (labels 1 and 2 above). These two values are stored in the cslab and csstart fields respectively of caseinfo, and these are chained together by the next field.
[3210] Compile the case expression.
[3213-5] Check that it is scalar and not real.
[3216] Coerce it to integer if necessary.
[3217] Generate the jump to the checking code.
[3219] Laddr will be the label at the end of the case statement, to which each case arm jumps (label 9 above).
[3220-58] This repeat gets one case arm.
[3221] Lcix1 is the label for this arm.
[3224-48] Get the case labels.
[3224] Get one label.
[3226] Check its type matches the expression.
[3227-44] The list of case labels is chained together in diminishing order of the values in cslab. Fstptr points to the highest valued one. For example:
Lpt2 and lpt1 are made to point to either side of where the new case label is to be inserted, and lpt3 points to the new label. For example, to insert 5:
Then the new label can be inserted:
[3232] Check this case label has not occurred before.
[3250] Output the label.
[3251] Compile the statement. The repeat loop is just syntax-error recovery.
[3253-4] If no error has occurred, generate the jump to the end of the case statement.
[3259] Output the label for the checking code (the destination of the jump generated at [3217]).
[3261] Save the maximum case label value.
[3263-6] Reverse the list so that it is now in diminishing order.
[3267] Save the minimum case label value.
[3268] Check that the jump table is not going to be too big (cixmax is declared on [1809]).
[3270-2] Generate the check, the subtraction, the indexed jump, and the label.
[3273-83] Output the jump table.
[3276-9] For values that were not included in the case labels,
generate the UJC
.
[3284] Output the final label.
The code
LDCI lmin SBI
can be replaced by
DECI lmin
but there should be a check to prevent it being generated if lmin is zero.
The label that is parameter to the XJP
instruction always immediately follows the instruction, and so is
redundant. This leaves the parameter free for a more worthwhile
purpose -- to hold the length of the jump table so that the XJP
instruction can check the value is in range, and the CHK
can be
eliminated. So the checking code and jump table for the earlier
example can be:
L 8 DECI 2 XJP 5 UJP L 10 UJP L 11 UJP L 10 UJC UJP L 11 L 9
The change to the interpreter for this is:
if (store[sp]. vi < 0) or (store[sp].vi >=q) then errori('case error ') else pc: =pc+store[sp].vi
Of course, if XJP
was given two parameters like CHK
, then
DECI lmin XJP size
could be replaced by
XJP lmin size.
It is pointless to form the case list in descending order, only to reverse it immediately after. Better to keep it in ascending order from the start, updating lmax as new labels are added.
It is arguable whether the check for cixmax [3268] is needed.
Presumably it is used to prevent the vast amount of useless code that would be generated for programs like the following.
case i of 1: i: =9999; 9999: i:=1 end
(As an aside: how would you change the compiler to be able to accept the above statement?)
Cixmax is declared in rather an odd place, and rather far away from its only use. Better either at the head of this procedure, or at the start of the compiler.
As with array indexing, case statements for characters generate problem code for those who wish to assemble it on a machine with a different character set. Consider
case ch of '0': s1; '1': s2 end
giving the following code:
LODC 0 8 ORDC UJP L 6 L 8 [Code for s1] UJP L 7 L 9 [Code for s2] UJP L 7 L 6 CHKI 48 49 LDCI 48 SBI XJP L 10 L 10 UJP L 8 U3P L 9 L 7
As with array indexing, the 48 and 49
parameters to CHK
and LDC
are the integer values of
'0' and '1' on the source
machine. Refer to the notes to selector for
some possible solutions.
The code for a repeat statement
repeat statements until condition
is
L 10 [Code for statements] [Code for condition] FJP L 10
This is a fairly straightforward routine. Most of its complexity is due to syntactic error recovery in case of missed semicolons, by allowing a sequence of statements not separated by semicolons, where one statement should be.
Note that [3294-6] is identical to [3299-301]. Calling this 'A', the structure of this procedure is
A; while sy=semicolon do begin insymbol; A end;
This is a typical example of a loop terminated in the middle, inexpressible in Pascal. One way of rewriting it to avoid the duplication of A is
repeat A; finished:=sy<> semicolon; if not finished then insymbol until finished
or (see the notes for procedure factor)
repeat A until isnt(semicolon);
A very straightforward routine. The code for a while statement
while condition do statement
is
L 15 [Code for condition] FJP L 16 [Code for statement] UJP L 15 L 16
The code for a for statement
for i:= E1 to E2 do statement
is
[Code for E1] [Code to store E1 in i] [Code for E2] STRI temp temp is a temporary local variable to hold the upperbound L 8 [Code to load i] LODI temp LEQI Test i <= temp FJP L 9 Finish if i > temp [Code for statement] [Code to load i] INCI 1 Increment i [Code to store in i] UJP L 8 L 9
The only difference for a downto loop is that LEQI is replaced by GEQI
, and INCI
by DECI
.
[3321] As mentioned above, a temporary variable will be created for the duration of this statement. Save the value of lc so that it can be reset at the end to forget the temporary.
[3322-5] These are default attributes of the control variable in case it is missing.
[3326-33] Get the identifier and fill in its attributes.
[3334] The variable must not be a formal variable. (This ensures it is directly accessible).
[3336-9] Check its type.
[3345-9] Get the lower bound and check its type.
[3350] Generate code to load the lower bound and store it in the control variable.
[3356-60] Get the upper bound and check its type.
[3361-3] Load it and convert to integer if necessary.
[3364] Calculate the address of the temporary, allocating it above the other variables in this routine.
[3365] Generate code to store the upper bound in the temporary.
[3366] Generate and output the label.
[3367-9] Restore the attributes of the control variable, load it and convert to integer if necessary.
[3370] Load the temporary.
[3371-2] Allocate the space for the temporary, and update lcmax if necessary (this records the maximum size of the variables area).
[3373-4] Generate the LEQ
or GEQ
instruction.
[3379] Generate the end label, and the FJP
to it.
[3380-1] Get the do symbol, and compile the statement.
[3382] Reload the control variable.
[3383-4] Increment or decrement as necessary.
[3385] Store the revised value in the control variable, generate the
UJP
to the top of the loop,
and output the end label.
[3386] Reset lc to reclaim the space for the temporary variable.
One of the aims of P-code is to be in some way 'ideal' for the translation of Pascal. With this is mind, the code generated for a for statement could be drastically improved.
Rather than having separate instructions to manipulate the limits and the control variable, if these values are loaded on the stack, then only two instructions are needed to do the manipulations:
for i:=EI to E2 do statement
would give
[Code to load the address of i] [Code to load E1] [Code to load E2] FOR 1 L 10 L 11 [Code for statement] NXT 1 L 11 L 10
For a downto use FOR 2
and NXT 2
.
This saves seven instructions and the temporary variable for each loop.
The structure of the interpreter actions for these is
(*FOR P Q*) if p=1 then test := E1 <= E2 else test := E1 >= E2; if test then controlvariable:=E1 else begin unstack the 3 values; pc:=q end (*NXTP Q*) if p=1 then test:= controlvariable < E2 else test := controlvariable > E2; if test then begin if p=1 then controlvariable:=controlvariable + 1 else controlvariable:=controlvariable - 1 pc:=q end else unstack the 3 values.
However, this solution does cause a problem with goto's leading out of a for statement, leaving the three values on the stack. Consider how to overcome this.
Error 180 would be better than 155 on line [3334].
The assignments to kind and access [3329, 3331] are redundant, having already been set [3323-4].
The 1 of the calls to gen2 for LEQ
and GEQ
[3373-4] is unused.
The use of the genujpxjp for the FJP
is because genfjp tests that
the last expression compiled was boolean.
Lsp [3318] is never used.
If the record variable of a with statement is directly accessible, no extra code is generated. For instance the code for
with r do f:=0
is no different to that for
r.f:=0
However if the record is indirectly accessible, a temporary local variable is created to hold the record's address. Then all accesses to it are via this local.
Examples:
with r do f:=0
where r is direct, gives
LDCI 0 STRI 0 6
The code
ir^.f:=0
gives
LODA 0 8 Load the address of the record. INCA l Increment it by the offset of f LDCI 0 STOI Store indirect
while
with ir^ do f:=0
gives
LODA 0 8 Load the address STRA 0 9 Store it in the local LODA 0 9 Access the address INCA l Increment to get f LDCI 0 STDI Store indirect
[3392] These two values will be used to reset top and lc later [3428].
[3393] A with statement can have a list of variables. This repeat deals with each variable.
[3394-7] Get the variable.
[3398-9] Check it is a record.
[3401-5] Put the details of the record on the display.
[3406-10] If the record is directly accessible, make the display entry as a constant record occurrence (crec).
[3412-19] If it is indirect, create the temporary local variable, generate code to load and store the address of the record in it, and mark the occurrence as a variable record (vrec).
[3427] Compile the statement.
[3428] Reclaim the used display entries, and the temporary variables.
Consistency with the way lc is treated would suggest that instead of lcnt1, a variable ltop be used, with [3392] as
ltop := top;
and [3428] as
top:= ltop;
[3432-45] Deal with a label before the statement.
[3433-41] It must be declared at this level, so search the local list of declared labels.
[3437] If it has already been used to label a statement, report an error.
[3438] Output the label and record that it has been defined.
[3446-7] Synchronise the input.
[3448] A statement may be empty, so see if it is.
[3450] Use the initial symbol to decide on the kind of statement.
[3451-4] An identifier may start an assignment or a procedure call, so the semantic attributes of the identifier are used to distinguish which it is.
[3464] The only symbols that may follow a statement are ';', end, else, and until.
An annoying aspect of many Pascal compilers is that if a procedure has not been declared, when it is called the compiler gives an error message and then, due to the test at [3452], tries to compile the call as an assignment statement, giving many more error messages. This is a problem with all semantics-directed parsing.
However in this case there is no need to use semantic information.
A call has the form
identifier
or
identifier(parameters)
whilst an assignment has the form
identifier:=... identifier[indexes]:=... identifier.field:=...
or
identifier^ :=...
Therefore the decision on which it is can be taken on the symbol after the identifier: if it is an open bracket or one of the statement follower symbols then it is a call. If it is a selectsys or a becomes symbol it is an assignment, otherwise skip, giving:
ident: begin searchid([vars, field, func, proc], lcp); insymbol; if (sy in statfolsys) or (sy=lparent) then call(fsys, lcp) else if (sy in selectsys) or (sy=becomessy) then assignment(lcp) else begin error (6); skip(fsys+statfolsys) end end
where
statfolsys=[semicolon, endsy, elsesy, untilsy]
Pascal Implementation by Steven Pemberton and Martin Daniels
Copyright © 1982, 2002 Steven Pemberton and Martin Daniels, all rights reserved.