\section{Lexical Scanning} 

The lexical analysis maps a string (hopefully constituting a valid
{\em dynamo}\/ program) to a list of tokens. For example, the
procedure {\tt lexer}\/ maps the \dyn\ program
\begin{verbatim} 
begin 
  either  x = 3 orelse y = 4; 
  x = 4 + y
end
\end{verbatim} 
to the following list of tokens: 
\begin{verbatim} 
[TokenBegin, TokenEither, TokenName "x", TokenEq, TokenInt 3, 
TokenOrelse, TokenName "y", TokenEq, TokenInt 4, TokenSC, 
TokenName "x", TokenEq, TokenInt 4, TokenPlus, TokenName "y", 
TokenEnd]
\end{verbatim}

To achieve this, the lexicalizer scans the input and puts in
appropriate tags.  {\tt Token} is the datatype for the result of
lexical analysis.

\begin{code} 

module L 

where

data Token 
      = TokenBegin       | TokenEnd      | TokenSkip 
      | TokenFail        | TokenDo       | TokenIterations
      | TokenFind        | TokenIn       | TokenWith 
      | TokenEither      | TokenOrelse   | TokenIf 
      | TokenThen        | TokenElse     | TokenTest 
      | TokenSome        | TokenDonot    | TokenInt Int  
      | TokenName String | TokenTrue     | TokenFalse    
      | TokenNeg         | TokenAnd      | TokenOr   
      | TokenEq          | TokenMore     | TokenLess   
      | TokenPlus        | TokenMinus    | TokenTimes 
      | TokenDiv         | TokenOB       | TokenCB 
      | TokenSC          | TokenOSB      | TokenCSB
      | TokenDot         | TokenComma    | TokenProgram
 deriving Show

\end{code} 

{\tt lexer}\/ is the main procedure for lexical tagging: 

\begin{code}

lexer :: String -> [Token]
lexer [] = []
lexer (c:cs) 
      | isSpace c = lexer cs
      | isAlpha c = lexName (c:cs)
      | isDigit c = lexNum (c:cs)
lexer ('=':cs) = TokenEq : lexer cs
lexer ('+':cs) = TokenPlus : lexer cs 
lexer ('-':cs) = TokenMinus : lexer cs
lexer ('*':cs) = TokenTimes : lexer cs
lexer ('/':cs) = TokenDiv : lexer cs
lexer ('(':cs) = TokenOB : lexer cs
lexer (')':cs) = TokenCB : lexer cs
lexer ('>':cs) = TokenMore : lexer cs 
lexer ('<':cs) = TokenLess : lexer cs 
lexer (';':cs) = TokenSC: lexer cs 
lexer ('[':cs) = TokenOSB: lexer cs 
lexer (']':cs) = TokenCSB: lexer cs 
lexer ('.':cs) = TokenDot: lexer cs 
lexer (',':cs) = TokenComma: lexer cs 
lexer (c:cs) | c == (chr 123) = lexComment cs 0

\end{code} 

\verb^(chr 123)^ is a representation of an opening curly bracket. 
Comments in \dyn\ programs start with an opening curly bracket
\verb^{^ and end with the next matching occurrence of a closing curly
  bracket \verb^}^. {\tt lexComment}\/ is called when an opening
comment bracket is read; it puts the comment level to 0 and skips
everything until another curly bracket is read. If the next curly
bracket is an opening bracket, the comment level is incremented. If it
is a closing bracket and we are at comment level $0$ the comment ends.
If it is a closing bracket at comment level greater than $0$ the
comment level is decremented. This allows comments to be nested.
E.g., the programming fragment
\begin{verbatim} 
  do n times { record that all men are bachelors } 
     begin m >> m0 = m0 + 1;  b[m] = 1 end; 
\end{verbatim} 
can be commented out as follows: 
\begin{verbatim} 
  { do n times { record that all men are bachelors } 
     begin m >> m0 = m0 + 1;  b[m] = 1 end;        }
\end{verbatim}

This is a very handy feature when debugging \dyn\ programs.  The
representations for opening and closing curly brackets that we use are
\verb^(chr 123)^ and \verb^(chr 125)^. Haskell allows to refer to
`$\{$' and `$\}$' directly, by means of \verb^{^ and \verb^}^.  We
prefer \verb^(chr 123)^ and \verb^(chr 125)^ because unlike the
brackets these representations do not interfere with the \LaTeX\ 
processing of the source file.\footnote{Recall that what you are
  reading now is the literate documentation of the \dyn\ 
  implementation, as it was typeset by \LaTeX.}

\begin{code} 

lexComment (c:cs) n | c == (chr 125) = case n <= 0  of 
                                       True  -> lexer cs
                                       False -> lexComment cs (n-1)
                    | c == (chr 123) = lexComment cs (n+1)
                    | otherwise      = lexComment cs n 

\end{code} 

{\tt lexNum}\/ recognizes integers and puts a marker
{\tt TokenInt} in front of them. 

\begin{code} 

lexNum cs = TokenInt (read num) : lexer rest
     where (num,rest) = span isDigit cs

\end{code}

{\tt lexName}\/ sorts out the identifiers from the reserved words of
the language. An identifier (a name of a program or variable) is
marked by putting a {\tt TokenName}\/ in front of it. Identifiers
start with an alphabetic character, but may contain digits as well.

\begin{code} 

lexName cs =
   case span isAlphaDigit cs of
      ("program",rest) -> TokenProgram : lexer rest
      ("begin",rest) -> TokenBegin : lexer rest
      ("end",rest) -> TokenEnd : lexer rest
      ("skip",rest) -> TokenSkip : lexer rest
      ("fail",rest) -> TokenFail : lexer rest
      ("do",rest)  -> TokenDo : lexer rest
      ("times",rest)  -> TokenIterations : lexer rest
      ("find", rest) -> TokenFind : lexer rest
      ("in", rest) -> TokenIn : lexer rest
      ("with", rest) -> TokenWith : lexer rest
      ("either",rest)  -> TokenEither : lexer rest
      ("orelse",rest)  -> TokenOrelse : lexer rest
      ("if",rest)  -> TokenIf : lexer rest
      ("then",rest)  -> TokenThen : lexer rest
      ("else",rest)  -> TokenElse : lexer rest
      ("test",rest)  -> TokenTest : lexer rest
      ("some",rest)  -> TokenSome : lexer rest
      ("donot", rest) -> TokenDonot : lexer rest 
      ("true",rest)  -> TokenTrue : lexer rest
      ("false",rest)  -> TokenFalse : lexer rest
      ("not",rest)  -> TokenNeg : lexer rest
      ("and",rest)  -> TokenAnd: lexer rest
      ("or",rest)  -> TokenOr : lexer rest
      (name,rest)   -> TokenName name : lexer rest
     where isAlphaDigit c = isAlpha c || isDigit c

\end{code} 

That's all there is to lexical analysis. 

\section{DataType Declarations for Syntax}

The parser has to know about a number of datatypes in order to
construct the appropriate parse trees.

The datatype declaration for variables distinguishes between 
single variables (\verb^x1^), indexed variables (\verb^x[5]^), and 
 doubly indexed variables (\verb^x[5][4]^):

\begin{code} 

data Var 
      = Svar String 
      | Ivar String Exp
      | Divar String Exp Exp
     deriving Show

\end{code} 

{\tt Exp} for numerical expressions (example: \verb^x + (y * z)^):

\begin{code} 

data Exp  
      = Plus Exp Term 
      | Minus Exp Term 
      | Term Term
  deriving Show

\end{code} 

{\tt Term} for numerical terms (example: \verb^y * z^):

\begin{code}

data Term 
      = Times Term Factor 
      | Div Term Factor 
      | Factor Factor
  deriving Show

\end{code} 

{\tt Factor} for numerical factors (example: \verb^y^):

\begin{code}

data Factor 
      = Int Int
      | Var Var 
      | Brack Exp
  deriving Show

\end{code} 

{\tt Bexp} for Boolean expressions (example: \verb^x >= 0^):

\begin{code}

data Bexp 
     = Bool Bool
     | Eq Exp Exp 
     | Gr Exp Exp 
     | Geq Exp Exp 
     | Less Exp Exp 
     | Leq Exp Exp 
     | Neq Exp Exp 
     | Neg Bexp 
     | Conj Bexp Bexp
     | Disj Bexp Bexp
  deriving Show

\end{code} 

{\tt Rng} for ranges (example: \verb^[1 .. 100]^):

\begin{code} 

type Rng = (Exp,Exp) 

\end{code}

{\tt Stmt} for program statements:

\begin{code}

data Stmt
    = Some Var 
    | Skip 
    | Fail
    | Iseq Exp Exp 
    | Put Var Var Exp
    | Test Bexp 
    | If Bexp Stmt Stmt
    | Either Stmt Stmt
    | Do Exp Stmt
    | Find Var Rng Stmt
    | Find1 Var Int Int Stmt
    | Not Stmt
    | Comp Stmts 
  deriving Show

\end{code} 

{\tt Stmts} for lists of program statements. 

\begin{code} 

type Stmts = [Stmt]

\end{code}