Pascal Implementation by Steven Pemberton and Martin Daniels

Chapter 12: The Interpreter

Although the assembler and the interpreter are presented here as one program they are two distinct phases as far as the operation is concerned.

The assembler puts the P-code program in the array code and the associated constants in the store array. The interpreter executes the P-code instructions using the case statement starting on [687] with the help of some additional procedures between lines [438-542]. The first procedures pmd and errori are concerned with errors that may arise at run time. The function base is used for manipulating the linkage between stack frames. Procedure compare is used for string comparisons. Finally procedure callsp contains the routines to deal with the embedded standard P-code procedures and the special requirements of input and output. A detailed analysis of the interpreter now follows.

[438-49] procedure pmd (post-mortem dump).

This procedure is called when an error during run time has occurred.

[439] The main objective is to print out the contents of the stack and heap. The variable s holds the address of a location to be written out.

[441-9] procedure pt (print)

[442] prints the address and contents of one location. [443] tests for a peculiarity of the CDC machine which is able to store an integer value that is larger than is possible to print out. S and i are adjusted ready for the next value to be printed.

[452] This statement outputs some of the register values and the offending opcode value.

[456-7] These statements print the contents of the stack. The next two lines [458-9] print the contents of the heap.

The result of this procedure is a straightforward dump. It would be much more useful if all the values of the registers were printed, and the output formatted to show the structure of the stack frames.

[462-5] errori

This procedure has two functions, to print out the error message, (which will be an indication of the type of error that has occurred) which was passed as a parameter, and to call the post-mortem dump procedure. Control is transferred to label 1 at line [1018] to halt execution. (In the assembler in a similar situation [170] the non-standard halt procedure was used).

[467-74] Function base

The parameter, ld, which is passed to this function is the depth of nesting and is used to calculate an address to be returned as the value for the function. This function is used to follow the static links back to a textually enclosing block and to return a base address. The effect of this while statement is

for i := 1 to ld
do ad := store[ad + 1].vm

[476-85] Procedure compare

The purpose of this procedure is to compare two strings. The strings will be of equal length and the procedure returns true if the strings are the same. The addresses of strings are found on the top of the stack and assigned to i1 and i2, and their length is defined by the q operand of the P-code instruction being executed. The main body of the procedure [482-4] works as follows. While each successive pair of characters are equal, the test proceeds, as soon as a difference is detected b is set false and control returns to the calling procedure.

Notes

  1. [483] It is not clear why the strings are referenced with the tag field vi when it would be more appropriate to use vc.

  2. The procedure compare could be much improved as it only performs half the task of comparison e.g. for GEQ [829-30] the comparison is continued to test for greater than. The procedure could be reprogrammed as follows:

    type same = (lessthan, equal, greaterthan);
    
    function compare: same;
        .
        .
        .
    end; (* of compare *)

    the code for GEQ could then become:

    store[sp].vb := compare in [greaterthan, equal]

Routine callsp (call standard procedure), lines [487-664]

There are a number of standard procedures that are available to P-code programs. They appear in the source, for example, as follows:

CSP WRC

where the operand of the instruction specifies the required standard procedure.

The majority of these standard procedures deal with file handling, and to aid their interpretation there is a set of procedures between lines [491-542] which will be discussed before the detailed description of each standard procedure.

Each of these procedures performs, generally, a similar set of operations:

  1. The file address is found on the top of the stack
  2. An address is found in the next position of the stack
  3. An operation is performed peculiar to the procedure, for example get(f)
  4. The buffer variable is accessed/updated
  5. The stack pointer is updated.

The file addresses refer to one of four locations in the stack frame that is generated for the main block. The addresses are as follows:

5: input from the input file

6: output to the output file

7: input from the prd file

8: output to the prr file

The contents of each of the these addresses are used to hold the buffer variables.

Note

  1. The variables adptr and adelnt although declared are not used.

[491-7] readi (read integer)

An address is taken from the penultimate position on the stack and an integer is read into this address. The buffer variable location is updated, f^ contains the next character to be read and this is placed in the buffer variable location [495]. Line [496] sees the two addresses removed from the stack.

[499-505] readr (read real number)

This procedure is effectively the same as readi except that in [502] a real number is read in.

[507-15] readc (read a character)

A character is read and placed in the address found on the stack. [513] appears to be unneccessary as it overwrites the effect of line [512].

[517-27] writestr (write string)

For this procedure the stack contains two extra numbers which give an indication of the field width required (k) and the number of characters in the string (j). If k is greater than j then the difference of these two numbers gives the number of spaces that must be printed out before the string. [525] writes the characters of the string. The stack pointer is decremented by four.

[529-5] getfile

This procedure closely follows the general outline. Its function is to do a get on a file, that is, to advance the file window to the next component and update the buffer variable [533].

[536-41] putfile

This procedure performs the reverse function to getfile. i.e. it appends the value of the buffer variable to the file.

The next group of instructions deals with the execution of the standard procedures proper. [543] The case statement is driven by the value of the operand, q. In each case which involves file transfers there is a further case statement to separate the individual files - which are denoted by the numbers 5,6,7 and 8.

[545-50] GET

Store[sp].va defines the file on which the get is to be performed. This is only permissible on input files and therefore in cases 6 and 8 there is a call to procedure errori to print out the error message and halt the program.

[551-6] PUT

This is very similar to GET.

[557-60] RST

This instruction corresponds to release. The heap pointer gets set to the value on the stack. The comment is meaningless. The instruction name is not very helpful (suggesting reset perhaps?).

[561-72] RLN (readln)

This instruction performs a readln on the specified file and updates the buffer variable accordingly. RLN is applicable to input files only, hence the error conditions for the two output files.

Unfortunately there is an error in the code for the prd file, which refers to input instead of prd. It should read:

7: begin
       readln (prd);
       store[prdadr].vc : = prd ^
   end;

[573-80] NEW

The new procedure is used to allocate space on the heap. The stack contains an address and a value giving the space requirements. A test is made to ensure that the stack and heap will not overlap when the space has been allocated. Provided this check succeeds, NP is updated and this new value is returned in the address found on the stack, which will be used to access the new locations on the heap.

[581-8] WLN (write line)

The purpose of this procedure is to perform a writeln on the specified file.

[589-93] WRS (write string)

A string of characters is output to the selected file by calling the procedure writestr.

[595-602] ELN (end-of-line)

This procedure performs a test to determine if the end-of-line, during input, has occurred. A temporary boolean variable, line, is used. This value is left on the stack.

[603-32] WRI, WRR and WRC (write integer, write real and write character).

These three routines are identical excluding the tag fields. In each case there are three values on the stack, the top value is the file identifier, followed by the field width and the value to be written. Taking [605] as an example this statement writes an integer (as shown by the tag field vi) store[sp-2].vi, with a field width of store[sp-1].vi, onto the output file.

[633-50] RDI, RDR and RDC (read integer, read real and read character)

Similarly, these procedures only differ in the actual procedures that they call which have already been discussed [491-515].

[651-6] This section covers the six mathematical functions available, sin, cos, exp, ln, sqrt, and atan. In each case the top of the stack contains a real number. This is replaced by a real number which is the mathematical function of the original

[657-60] SAV (save)

This instruction is called as a result of the mark procedure. It is used to remember the heap pointer for later use in relinquishing space on the heap via the release procedure.

[661-2] This marks the end of the CSP case statement and the end of the callsp procedure.

Notes

  1. There is no real need to separate the standard procedures from the ordinary P-code instructions; the standard procedures could simply be parameterless instructions. This would save this second case statement [544] (and incidently the second linear search in the assembler [321].)

  2. There is a lack of consistency in the choice of which are standard procedures and which are instructions. For example, why has EOF been selected as an instruction?

  3. All the file handling operations have a case statement and the individual files are represented by the four numbers. In fact names have already been defined for these files and their associated addresses [34-7], that is inputadr, outputdr, prdadr and prradr. These should be used instead of numbers because in other implementations different numbers may be involved.

Main Section, lines [664-1017]

This is the start of the controlling segment for both the assembler and interpreter. The assembler is self-contained within one procedure, load, and this is called in line [666]. The interpreter is not separated in the same way, the bulk of the code appears in the following statements.

[665] rewrite(prr)

This statement is not included for the assembler/interpreter as neither use the prr file - it is included in case the user program that is to be intepreted uses it. Normally rewrite and reset would be included at the P-code level but neither are provided in this P-code implementation. These functions have therefore to be provided by the interpreter, hence [665] and line [670] which partially fulfill reset(prd). However these additions do not fully substitute the need for true reset/rewrite functions - for example, multiple passes of the prd file are not possible. It is a trivial job to add these functions to the compiler and the assembler/interpreter.

[668-70] Initialisation takes place in these lines. The registers are set up with the PC pointing at the first executable instruction and the stack frame registers ready to generate the first stack frames. SP is pointing to an empty stack and NP is pointing to an empty heap. EP is assigned the value five - but this value is arbitrary as it is changed by the first ENT instructions. [669-70] initialises the buffer variables for both input and prd files.

[671] The variable interpreting is set true and is only set false by the STP (stop) instruction [1003].

[673-1017] The remainder of the program is the body of the interpreter where the P-code machine is simulated. This machine follows traditional lines of a fetch-execute cycle. The fetch phase is between [676-682]: the primary objective is to determine the opcode of the next instruction together with its parameters p and q. The only peculiarity here is again the fact that the instructions have been packed two to a word, as in [419-24]. The PC is incremented in order to point at the next instruction in sequence.

Instruction Interpretation, lines [685-1012]

The interpretation, or execute phase, of the individual instructions now begins. The assembler has appointed unique opcodes to each instructions and also each instruction with a collection of different possible types e.g. SRO has the opcodes 3, 75, 76, 77, 78, 79 - one for each type of possible operand. The interpreter, however, in most cases is able to lump them together again as it assumes that all primitive types occupy one word. In other implementations it is likely that each case would have to be treated separately.

The case statement [685] acts like the instruction register of a processor and directs the execution according to the opcode of the instruction.

[687-91] LOD (load contents of address at level p)

p gives the depth of nesting and this is used by the function base to find a base address. q is used as an offset to this base to locate the contents required. This value is put onto the stack.

[693-7] LDO (load contents of base-level address)

This instruction represents a special case of LOD where the address required is known to be at the outermost level of nesting, that is in the main program, and therefore it is not necessary to search down the static links to find the location. q in this case is a simple offset to the first stack frame and it is from this location that a value is found and put on the stack.

[699-702] STR (store contents at address at level p)

This is the complementary instruction to LOD and operates in a similar manner.

[704-7] SRO (store at base-level address)

This is the complementary instruction to LDO.

[709-11] LDA (load level p address)

This instruction is similar to LOD except that the address is loaded onto the stack, rather than its contents.

[713-5] LAO (load base-level address)

The operand q is a base-level address and this is left on the stack. This is the base-level version of LDA.

[717-21] STO (store indirect)

In this instance the stack contains two entries, the uppermost is a value to be stored in the location whose address is in the next location.

[723-31] LDC (load constant)

Four different types of constant may be loaded with this instruction. The first is an integer (p=l) (the begin and end are not required here [725-6]), the second (p=6) is a character, the third (p=3) is a boolean value, otherwise (p=0) the nil value will be loaded on the stack. The nil value created is maxstr which is an address that cannot be generated by the user program.

[733-5] LCI (load constant indirect)

The value of q is a pointer to the table of constants, this address is left on the stack.

[737-41] IND (indexed fetch)

The top of the stack contains an address, and the value of q is an offset to the address. (The comment means that the compiler will compensate for a particular primitive type having more than one word per variable.) The offset and address produce a new address and the contents of this location are placed on the stack.

[743-4] INC (increment)

The top of the stack is incremented by the value of the operand, q.

[746-57] MST (mark stack)

The MST instruction is used to build a mark stack for a new stack frame and is part of calling a procedure. [750] sets up the static link, and the dynamic link is assigned in the next statement. [754] stores the value of EP i.e. the extremity of the current stack frame, so that the current procedure may be reinstated. The stack pointer is moved by five units i.e. the size of the mark stack. After the MST instruction, the parameters (if any) for the procedure will be loaded into the stack frame.

[759-63] CUP (call user procedure)

The first statement for this instruction is to set the MP to the beginning of the new stack frame that has been partially constructed by the previous MST instruction. p denotes the amount of space required for parameters and the number four relates to the size of the mark stack (for this implementation only). As p has been declared as 0..15 it means that the number of parameters is limited to 15 (though this is never checked). [761] the PC value is stored for the return address back to the calling procedure. [762] q is the address of the instruction of the procedure being called.

[765-73] ENT (enter procedure)

Two ENT instructions will be found at the start of every user procedure. The operands of these instructions define the length of the remaining segments in the stack frame. If p=l then the value of q denotes the extra space required for locally declared variables and SP is updated accordingly. The statement should be as follows:

sp := mp + q - 1;

This is because SP always points at the top item on the stack; that is, to push a value involves incrementing the stack pointer first. Without the additional -1 one location is always wasted. A check is made [767] to ensure that the stack frame does not encroach into the heap. If p=2 then q denotes the size of the local stack and EP is updated and again a check is made to ensure that the stack frame does not overlap the heap. Only this second check is necessary.

[775-82] RET (return from procedure or function)

The RET instruction provides the mechanism to return to the calling procedure and release the stack frame of the just-completed procedure. If returning from a function then the returned function value must be left on the stack [777], the five values of p allowing for the different function types. If p equals zero then a procedure return is being performed. The remaining three statements of this instruction revive the calling procedure's stack frame i.e. PC takes up the return address value, EP the old EP value and finally MP the old MP value or dynamic link.

[784] CSP (call standard procedure)

This is dealt with by a separate procedure which has already been described [487-662].

[786-90] IXA (indexed address)

This instruction is for array subscripts. Two values are on the stack, the uppermost is an integer and the next an address. [789] The address is modified by the product of the integer and q and this new address is left on the stack.

[792-877] Condition instructions.

The next six instructions are each treated in similar ways.

[792-804] EQU (test for equal)

The two values to be compared are on the top of the stack, except for the strings case where pointers to the strings are present. [794] Two integers are compared and the boolean result is pushed onto the stack. The other types are treated in a similar way, except for strings [800-2]. Here, the procedure compare returns in variable b a boolean giving the result of the equality test which in the case of EQU is in the correct form. With all the other conditional instructions this boolean value must be modified.

[806-l8] NEQ (test for not equals)

Similar to EQU.

[820-33] GEQ (test for greater than or equal)

An additional test is required for the string comparison. Procedure compare only returns a result indicating whether the strings are equal or not. The pointers i1, i2 and i are left by compare pointing to the place in the strings where a difference has occurred and these are used to test the two characters for GEQ. Note that if b = true, i1 and i2 point to garbage.

[835-77] GRT, LEQ, LES (tests for greater than, less than or equal, and less than) Similar to previous tests.

This concludes the conditional instructions.

Note

  1. The case labels for the conditional instructions are in a curious order.

[879] UJP (unconditional jump)

q specifies the address to where control is to be transferred.

[881-3] FJP (jump on false)

This instruction is used by if, while, repeat and for statements. If the boolean value on the top of the stack is false then a jump is made to the address held in q, otherwise the next instruction following the FJP is executed.

[885-888] XJP (indexed jump)

The address in q is modified by the integer on the top of the stack to index a table of jumps. This instruction is used for the implementation of the case statement, the integer being the selector.

[890-2] CHKA (check address)

This is a run time error checking instruction. It is used to ensure that pointers to the heap are correct. q will only take the values zero or one, in the first case the address can either be nil or a real address, while in the second case only a real address is allowed i.e. a nil pointer cannot be dereferenced.

[894-7] CHK (check value between bounds)

q is a pointer to the constants table where the upper and lower bounds are stored. The value on the stack is checked to ensure it lies between these bounds - if not an error message is printed and execution halted.

[899-3] EOF (end-of-file)

If the address on the stack is inputadr then a boolean is returned denoting whether end-of-file on the input file has occurred or not, otherwise an error condition is flagged. (An unnecessary begin end here). This is an unnecessary restriction which standard Pascal does not have, i.e. eof(prd) should be allowed.

[905-7] ADI (add integers)

Two integers on the top of the stack are added together and the result is left in their place.

[909-19] ADR, SBI, SBR (add reals, subtract integers, subtract reals)

Similar structure to ADI.

[921] SGS (create singleton set)

A set is created with a single element defined by the integer on the stack.

[923] FLT (float integer to real)

Converts the integer in the top stack location to a real number.

[925] FLO (float integer to real - lower stack position)

Same as FLT except it is performed on the element next to the top of the stack. Having two 'float' instructions allows operators with a real and an integer operand to be evaluated without putting any restriction on the order. Wherever FLO is used, the topmost stack element will already be a real number.

[927] TRC (truncate)

Truncates a real number and converts to an integer.

[929-41] The instructions in this group all involve a single statement and a simple transformation of the stack element. They are briefly:

NGI negate integer
NGR negate real
SQI square of an integer
SQR square of a real number
ABI absolute value of an integer
ABR absolute value of a real number
NOT logical inversion

[943-9] AND, IOR (logical and, logical inclusive or)

These two instructions perform a logical operation on the top two stack locations and leave the result on the top.

[951-61] DIF, INT, UNI (set operations - difference, intersection and union). Each of the set instructions is performed on the top two stack locations and the result is left on the top.

[963-6] INN (test set inclusion)

The uppermost element of the stack contains a set. The next element is an integer and this instruction tests to see if this integer is a member of the set; a boolean value is left on the stack.

[968-70] MOD (modulo)

The top value of the stack takes the modulo of the next integer on the stack.

[972] ODD (test for an integer)

A boolean true value is left on the stack if the integer tested is odd otherwise false will be left.

[974-88] MPI, MPR, DVI, DVR

Each of these instructions involves an arithmetic operation on the top two elements of the stack. They are as follows:

MPI multiply integers
MPR multiply real numbers
DVI divide integers
DVR divide real numbers

[990-4] MOV (move)

The stack contains two addresses, the top address points to the first of a number of storage locations that are to be moved. The second address on the stack is the start of the reception area for the moved locations. q defines the number of locations to be moved.

[996-998] LCA (load address of constant)

q is an address of a string constant, this is pushed onto the stack. This is the address version of LCI.

[1000-1] DEC (decrement)

The complementary instruction to INC. The integer on the top of the stack is decremented by the value of q.

[1003] STP (stop)

The terminating instruction. Apart from a run time error, this is the only way to exit from the interpreter.

[1005-10] ORD, CHR

These instructions are not propagated by the assembler.

[1012] UJC (padding instruction)

This instruction is used only to fill the table of jumps constructed for XJP instructions for non-existent selections. If an attempt is made to execute a UJC instruction then a run time error will result.

This completes the code for the interpreter.

Pascal Implementation by Steven Pemberton and Martin Daniels