Lecture 8: Interpreters

Programming Languages Course
Aarne Ranta (aarne@chalmers.se)

Book: nothing

Plan

Interpretation vs. compilation

Operational semantics

From operational semantics to interpreter code

Evaluation and side effects

Tracing and debugging

From type checker code to interpreter code

Implementing interpreters in Haskell

Implementing interpreters in Java

Lab 3 summary

JVM interpretation

Interpretation vs. compilation

Interpretation: run the code statement by statement

Compilation: translate the code into another format

Example:

    int i = 1 ;
    int j = i + 2 ;
    printInt(i + j) ;

Interpretation goes from state to state:

    ()
    (i = 1)
    (i = 1, j = 3)
    output 4

Compilation generates another piece of code (e.g. JVM):

    bipush 1
    dup
    istore 0
    bipush 2
    iadd
    istore 1
    iload 0
    iload 1
    iadd
    invokestatic runtime/iprint(I)V

Implementing interpretation and compilation

Both techniques use syntax-directed translation on the type-checked syntax tree.

Both can be easily adapted from the type checker code.

Both have an environment with a function signature and a block-structured variable context.

But the environments contain different things...

... and the theoretical models are different.

Interpretation: operational semantics (or, equivalently, abstract interpretations)

Compilation: compilation schemes

Operational semantics

Inference rules similar to typing rules, with the form of judgement

    Env => Exp ⇩ Val

Read: "In the environment Env, expression Exp computes to value Val". This is called big-step semantics, as there can be many actual computation steps inside Exp ⇩ Val.

The environment has variables and their values, e.g. [x = 0, y = 7].

Examples: addition, variables, integer literals.

    Env => exp1 ⇩ val1   Env => exp2 ⇩ val2
    -------------------------------------------
        Env => exp1 + exp2 ⇩ val1 + val2
  
  
    --------------  x = v is in Env     -------------- i is an integer literal
    Env => x ⇩ v                      Env => i ⇩ i

Operational semantics for statements

Statements don't compute to values. Instead, they change the environment. The form of judgement is

    Env => Stm ⇩ Env'

We write Env[x = val] for updating Env with a new value val for x.

Example: assignments.

    Env => exp ⇩ val
    -------------------------------
    Env => x = exp ⇩ Env[x = val]

Operational semantics: while loops

Two rules are needed, depending on whether the condition is true:

    Env => exp ⇩ false
    ------------------------------
    Env => while (exp) stm ⇩ Env
  
    Env  => exp ⇩ true    
    Env  => stm ⇩ Env'   
    Env' => while (exp) stm ⇩ Env''
    ---------------------------------
    Env  => while (exp) stm ⇩ Env''

Expressions with side effects

A more complex form of judgement returns a value and a new environment:

    Env => Exp ⇩ (Val,Env')

For instance, if assignments are expressions, we have

    Env => exp ⇩ (val,Env')
    --------------------------------------
    Env => x = exp ⇩ (val,Env'[x = val])

Try this with x = (x = x + 1) + 1 in the initial environment [x = 0] !
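The exercise can be checked mechanically with a minimal Java sketch (Java 16+; the class `SideEffects` and its records are illustrative names, not course code). It implements the rules for literals, variables, addition and assignment-as-expression. Each call returns the pair (value, environment), and the left operand of `+` is evaluated before the right one, which receives the environment the left one produced.

```java
import java.util.HashMap;
import java.util.Map;

final class SideEffects {
    interface Exp {}
    record Lit(int value) implements Exp {}
    record Var(String name) implements Exp {}
    record Add(Exp left, Exp right) implements Exp {}
    record Assign(String name, Exp rhs) implements Exp {}

    // The pair (Val, Env') of the judgement Env => Exp evaluates to (Val, Env')
    record Result(int val, Map<String, Integer> env) {}

    static Result eval(Map<String, Integer> env, Exp e) {
        if (e instanceof Lit lit) {
            return new Result(lit.value(), env);           // no recursive call: Env unchanged
        } else if (e instanceof Var v) {
            return new Result(env.get(v.name()), env);     // x = v is in Env
        } else if (e instanceof Add add) {
            Result r1 = eval(env, add.left());             // Env  => exp1 gives (val1, Env1)
            Result r2 = eval(r1.env(), add.right());       // Env1 => exp2 gives (val2, Env2)
            return new Result(r1.val() + r2.val(), r2.env());
        } else if (e instanceof Assign asg) {
            Result r = eval(env, asg.rhs());               // Env => exp gives (val, Env')
            Map<String, Integer> updated = new HashMap<>(r.env());
            updated.put(asg.name(), r.val());              // Env'[x = val]
            return new Result(r.val(), updated);
        }
        throw new IllegalArgumentException("unknown expression: " + e);
    }

    // x = (x = x + 1) + 1 in the initial environment [x = 0]
    static Result example() {
        Exp inner = new Assign("x", new Add(new Var("x"), new Lit(1)));
        Exp outer = new Assign("x", new Add(inner, new Lit(1)));
        return eval(Map.of("x", 0), outer);
    }

    public static void main(String[] args) {
        Result r = example();
        System.out.println(r.val() + " " + r.env());   // prints: 2 {x=2}
    }
}
```

By hand: the inner assignment gives (1, [x = 1]), adding 1 gives (2, [x = 1]), and the outer assignment gives (2, [x = 2]).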

Semantic rules revisited

Principle: each recursive call of the interpreter may change Env.

    Env => exp1 ⇩ (val1,Env1)   Env1 => exp2 ⇩ (val2,Env2)
    ----------------------------------------------------------
        Env => exp1 + exp2 ⇩ (val1 + val2, Env2)

But if there are no recursive calls, Env usually remains constant.

    ------------------  x = v is in Env     ------------------ i is an integer literal
    Env => x ⇩ (v,Env)                      Env => i ⇩ (i,Env)

Exception: increments. Notice the difference between pre and post increments!

    ----------------------------------  x = v is in Env
    Env => x++ ⇩ (v,   Env[x = v+1])
  
    ----------------------------------  x = v is in Env
    Env => ++x ⇩ (v+1, Env[x = v+1])

From semantic rules to interpreter code

The basic idea is the same as in type checking: from the rule

        J₁ ...  J_n
      ---------------- C
      Env => exp ⇩ val

generate the code "upside down" - with some modifications,

    interpret (Env => exp) =
      val₁ := interpret J₁
      ...
      val_n := interpret J_n
      return f(val₁,...,val_n,C)

The rules that are the starting point are now the abstract interpretation rules (equivalent to big-step semantic rules).

First look at an example without side effects:

    Env => exp1 ⇩ val1   Env => exp2 ⇩ val2     interpret Env (exp1 + exp2) =
    -------------------------------------------       val1 := interpret Env exp1
    Env => exp1 + exp2 ⇩ val1 + val2                val2 := interpret Env exp2
                                                      return val1 + val2

Values

Interpreting an expression returns a value.

Therefore expression interpretation is also called evaluation.

Values are also called canonical expressions, or expressions in normal form.

Normal form: expression that cannot be evaluated further.

In lab 3, we will need four types of values:

integer values, e.g. -47
double values, e.g. 3.14159
boolean values, true and false
a void value, which need never be shown

Thus variables are not values, and complex expressions like 3 + 2 are not values.

Good approximation (works when values are finite, atomic objects): values are literals.

In the interpreter implementation, it is good to have a separate type Value with a constructor for each type of value.

Operations on values

Call by value evaluation strategy: when executing a function call, first evaluate the arguments and then call the function with the obtained values as parameters.

The call by value strategy is followed by many languages: C, C++, Java, ML - but not Haskell (which has "call by need" - lecture 11).

The simplest examples are arithmetic operations: the operands are evaluated first, e.g. in x + y, x and y are evaluated first. The values are then added by a plus operation defined for values.
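A minimal Java sketch (Java 16+) of the separate Value type recommended above, with one record per kind of Lab 3 value, and of a plus operation on values named after the `plusVal` used in the increment pseudocode. The record names are illustrative. Because of call by value, `plusVal` receives operands that are already values, so it only has to dispatch on their kind.

```java
final class Values {
    interface Value {}

    record IntValue(int i) implements Value {
        @Override public String toString() { return Integer.toString(i); }
    }
    record DoubleValue(double d) implements Value {
        @Override public String toString() { return Double.toString(d); }
    }
    record BoolValue(boolean b) implements Value {
        @Override public String toString() { return Boolean.toString(b); }
    }
    // The void value carries no data and is never printed.
    record VoidValue() implements Value {}

    // Addition on values. The operands are already evaluated (call by value);
    // the type checker guarantees that both have the same type.
    static Value plusVal(Value v1, Value v2) {
        if (v1 instanceof IntValue a && v2 instanceof IntValue b) {
            return new IntValue(a.i() + b.i());
        }
        if (v1 instanceof DoubleValue a && v2 instanceof DoubleValue b) {
            return new DoubleValue(a.d() + b.d());
        }
        throw new IllegalStateException("cannot add " + v1 + " and " + v2);
    }
}
```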

Environments for interpreters

The signature holds the interpretations of functions.

It maps each function name to a function (written in the implementation language!) which sends values of arguments to a return value.
In Lab 3, we do not necessarily need a signature: interpretation rules for each of the four built-in functions are enough.
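To illustrate such a signature, here is a Java sketch (Java 16+) that maps each of the four built-in function names to a Java function from a list of argument values to a return value. The class name `Signature`, the helper `call`, and the use of a `Scanner` for input are assumptions made for the example; passing the streams in as parameters makes the functions easy to test.

```java
import java.io.PrintStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.function.Function;

final class Signature {
    interface Value {}
    record IntValue(int i) implements Value {}
    record DoubleValue(double d) implements Value {}
    record VoidValue() implements Value {}

    // The four built-in functions, each a Java function on argument values.
    // Note: Scanner parses numbers using the default locale.
    static Map<String, Function<List<Value>, Value>> builtins(Scanner in, PrintStream out) {
        Map<String, Function<List<Value>, Value>> sig = new HashMap<>();
        sig.put("printInt", args -> {
            out.println(((IntValue) args.get(0)).i());
            return new VoidValue();
        });
        sig.put("printDouble", args -> {
            out.println(((DoubleValue) args.get(0)).d());
            return new VoidValue();
        });
        sig.put("readInt", args -> new IntValue(in.nextInt()));
        sig.put("readDouble", args -> new DoubleValue(in.nextDouble()));
        return sig;
    }

    // Call by value: the caller has already evaluated the arguments to values.
    static Value call(Map<String, Function<List<Value>, Value>> sig, String f, List<Value> args) {
        return sig.get(f).apply(args);
    }
}
```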

The context holds the values of the variables.

It has a block structure, just like the context in the type checker.
The values can be uninitialized, if a variable is only declared. Referring to such a variable gives a run-time error.

Thus the following methods are needed on the context

    Value lookupVar  (Env, Id)
    Env   updateVar  (Env, Id, Value)
    Env   newVar     (Env, Id)
    Env   newBlock   (Env)
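A mutable Java sketch of these operations (illustrative, not course code). The pseudocode returns a new `Env`; here the methods update one context object in place, which is the usual imperative equivalent. An uninitialized variable is stored as `null`, so looking it up raises the run-time error mentioned above. The method `exitBlock` is an extra, assumed operation that the interpreter needs when execution leaves a block.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Block-structured variable context: a stack of scopes, innermost first.
final class Context<V> {
    private final Deque<Map<String, V>> scopes = new ArrayDeque<>();

    Context() { newBlock(); }

    // newBlock: enter a new innermost scope
    void newBlock() { scopes.push(new HashMap<>()); }

    // Assumed extra operation: leave the innermost scope
    void exitBlock() { scopes.pop(); }

    // newVar: declare x in the innermost scope; null marks "uninitialized"
    void newVar(String x) { scopes.peek().put(x, null); }

    // lookupVar: the innermost declaration of x wins (shadowing)
    V lookupVar(String x) {
        for (Map<String, V> scope : scopes) {        // iterates innermost to outermost
            if (scope.containsKey(x)) {
                V v = scope.get(x);
                if (v == null) throw new RuntimeException("uninitialized variable " + x);
                return v;
            }
        }
        throw new RuntimeException("unknown variable " + x);
    }

    // updateVar: assign to the innermost declaration of x
    void updateVar(String x, V v) {
        for (Map<String, V> scope : scopes) {
            if (scope.containsKey(x)) {
                scope.put(x, v);
                return;
            }
        }
        throw new RuntimeException("unknown variable " + x);
    }
}
```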

Interpreting statements: execution

Interpreting statements is also called execution.

The interpreter defines an exec function that tells how the environment is changed by each statement.

    Env exec (Env, Stm)
  
    // declarations
    exec Env (typ var ;) =
      newVar Env var
   
    // expression statements
    exec Env (exp ;) =
      (Env2, val) := eval Env exp
      return Env2
  
    // while loops
    exec Env (while (exp) stm) =
      (Env2, val) := eval Env exp
      if val == true
        Env3 := exec Env2 stm
        exec Env3 (while (exp) stm)
      else
        return Env2
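The `exec` cases can be sketched in Java (Java 16+; all record and method names are illustrative). Assumptions: values are `Integer` or `Boolean`, the environment is a single mutable map updated in place (so neither `eval` nor `exec` returns it), and block scoping is left out. The while case uses a Java loop instead of the recursive call in the pseudocode: the meaning is the same, since the condition is re-evaluated before every iteration, but long-running loops do not grow the Java call stack.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Exec {
    interface Exp {}
    record Lit(int value) implements Exp {}
    record Var(String name) implements Exp {}
    record Assign(String name, Exp rhs) implements Exp {}
    record Add(Exp left, Exp right) implements Exp {}
    record Lt(Exp left, Exp right) implements Exp {}

    interface Stm {}
    record Decl(String name) implements Stm {}
    record ExpStm(Exp exp) implements Stm {}
    record Block(List<Stm> stms) implements Stm {}
    record While(Exp cond, Stm body) implements Stm {}

    static Object eval(Map<String, Object> env, Exp e) {
        if (e instanceof Lit lit) return lit.value();
        if (e instanceof Var v) return env.get(v.name());
        if (e instanceof Assign asg) {
            Object val = eval(env, asg.rhs());
            env.put(asg.name(), val);                 // side effect on the environment
            return val;
        }
        if (e instanceof Add add) return (Integer) eval(env, add.left()) + (Integer) eval(env, add.right());
        if (e instanceof Lt lt) return (Integer) eval(env, lt.left()) < (Integer) eval(env, lt.right());
        throw new IllegalArgumentException("unknown expression: " + e);
    }

    static void exec(Map<String, Object> env, Stm s) {
        if (s instanceof Decl d) {
            env.put(d.name(), null);                  // declared, not yet initialized
        } else if (s instanceof ExpStm es) {
            eval(env, es.exp());                      // keep the side effects, drop the value
        } else if (s instanceof Block b) {
            for (Stm st : b.stms()) exec(env, st);
        } else if (s instanceof While w) {
            while ((Boolean) eval(env, w.cond())) {   // condition true: run body, then loop again
                exec(env, w.body());
            }
        } else {
            throw new IllegalArgumentException("unknown statement: " + s);
        }
    }

    // int i; int s; i = 0; s = 0; while (i < 10) { i = i + 1; s = s + i; }
    static Map<String, Object> sumExample() {
        Map<String, Object> env = new HashMap<>();
        Stm prog = new Block(List.of(
            new Decl("i"), new Decl("s"),
            new ExpStm(new Assign("i", new Lit(0))),
            new ExpStm(new Assign("s", new Lit(0))),
            new While(new Lt(new Var("i"), new Lit(10)),
                new Block(List.of(
                    new ExpStm(new Assign("i", new Add(new Var("i"), new Lit(1)))),
                    new ExpStm(new Assign("s", new Add(new Var("s"), new Var("i"))))))))));
        exec(env, prog);
        return env;
    }
}
```

For the program in `sumExample`, the final environment maps `i` to 10 and `s` to 55.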

Interpreting expression: evaluation

Reminder: interpreting expressions is also called evaluation.

In Lab 3 (as in C++, C, and Java), expressions may have side effects, i.e. change the environment.

The interpreter defines an eval function that tells what value is returned and how the environment is changed by each expression.

    (Env, Val) eval (Env, Exp)
  
    // literals
    eval Env lit = 
      return (Env, lit)
   
    // variables
    eval Env var =
      val := lookupVar Env var
      return (Env,val)
  
    // assignments
    eval Env (var = exp) =
      (Env2,val) := eval Env exp
      Env3       := updateVar Env2 var val
      return (Env3,val)

Evaluation of increments

Following the C/C++ standard, we distinguish between preincrements ++i and postincrements i++.

    // preincrements
    eval Env (++ var) =
      val  := lookupVar Env var
      val2 := plusVal(val,1)
      Env2 := updateVar Env var val2
      return (Env2,val2)
  
    // postincrements
    eval Env (var ++) =
      val  := lookupVar Env var
      val2 := plusVal(val,1)
      Env2 := updateVar Env var val2
      return (Env2,val)
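A Java sketch of the two cases (illustrative names; integer values only, so `plusVal` becomes plain `+`). Each function returns the pair (new environment, value) and leaves the old environment unchanged, mirroring the pseudocode.

```java
import java.util.HashMap;
import java.util.Map;

final class Increments {
    record Result(Map<String, Integer> env, int val) {}

    // updateVar on a copy, so that the old environment stays intact
    private static Map<String, Integer> updateVar(Map<String, Integer> env, String x, int v) {
        Map<String, Integer> env2 = new HashMap<>(env);
        env2.put(x, v);
        return env2;
    }

    // ++x: store v+1 and return the new value
    static Result preIncr(Map<String, Integer> env, String x) {
        int val = env.get(x);
        int val2 = val + 1;
        return new Result(updateVar(env, x, val2), val2);
    }

    // x++: store v+1 but return the old value
    static Result postIncr(Map<String, Integer> env, String x) {
        int val = env.get(x);
        int val2 = val + 1;
        return new Result(updateVar(env, x, val2), val);
    }

    public static void main(String[] args) {
        Result r1 = postIncr(Map.of("i", 5), "i");   // printInt(i++)
        System.out.println(r1.val());                // 5
        Result r2 = preIncr(r1.env(), "i");          // printInt(++i)
        System.out.println(r2.val());                // 7
    }
}
```

Starting from `i = 5`, as in the comments of good09.cc, `i++` returns 5 and leaves `i = 6`; a following `++i` returns 7.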

Lazy logical operations

Conjunction,

    a && b

is evaluated lazily: first a is evaluated. If the result is true, also b is evaluated, and the value of b is returned. However, if a evaluates to false, then false is returned without evaluating b.

    eval Env (a && b) = 
      (Env2,val) := eval Env a
      if val == false
         return (Env2,val)
      else
         eval Env2 b

Disjunction,

    a || b

is also evaluated lazily: first a is evaluated. If the result is false, also b is evaluated, and the value of b is returned. However, if a evaluates to true, then true is returned without evaluating b.
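Both lazy operators in a Java sketch (Java 16+; illustrative names). To make laziness observable, the hypothetical `Probe` expression appends its name to a log when it is evaluated; the log stands in for side effects on the environment. Java's own `&&` and `||` are lazy too, but the cases are written out to match the description above.

```java
import java.util.List;

final class Lazy {
    interface Exp {}
    record Probe(String name, boolean value) implements Exp {}
    record And(Exp left, Exp right) implements Exp {}
    record Or(Exp left, Exp right) implements Exp {}

    static boolean eval(List<String> log, Exp e) {
        if (e instanceof Probe p) {
            log.add(p.name());                       // records that p was evaluated
            return p.value();
        }
        if (e instanceof And a) {
            if (!eval(log, a.left())) return false;  // a is false: b is not evaluated
            return eval(log, a.right());             // otherwise the value of b
        }
        if (e instanceof Or o) {
            if (eval(log, o.left())) return true;    // a is true: b is not evaluated
            return eval(log, o.right());             // otherwise the value of b
        }
        throw new IllegalArgumentException("unknown expression: " + e);
    }
}
```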

The top-level interpreter

To interpret the whole program,

initialize the environment with an empty context
interpret the body in this context

    exec (int main () { Stm₁ ... Stm_n }) =
      Env₁ := exec () Stm₁        // () is the empty environment
        ...
      exec Env_(n-1) Stm_n

Tracing and debugging

Nice thing about interpretation: the execution can be precisely traced statement by statement.

For instance, let each call of the execution function print a line showing the statement and a line showing the variable context.

This is useful for debugging programs.

But it is also useful for debugging your interpreter! Use the old trick of writing a generic tracing function

     trace Env stm =
       print Env ;   // comment out in final version
       print stm ;   //
       // return ;   // uncomment in final version

You can also make this dependent on a command-line flag, so that the call

    lab3 -debug file.cc

runs your program with tracing on.
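A possible Java sketch of flag-controlled tracing. The class `Tracing` and its methods are hypothetical, and the output format is an assumption modeled on the debugging run shown in the example below. The interpreter would call `traceStm` before executing a statement and `traceEnv` after it; when `debug` is false, both do nothing. Waiting for Enter after each statement is left out.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

final class Tracing {
    static boolean debug = false;   // false: run silently, as in the final version

    // Accepts "[-debug] <File>" and returns the file name.
    static String parseArgs(String[] args) {
        if (args.length == 2 && args[0].equals("-debug")) {
            debug = true;
            return args[1];
        }
        if (args.length == 1) {
            return args[0];
        }
        throw new IllegalArgumentException("usage: lab3 [-debug] <File>");
    }

    // Shows the variable context as e.g. "i=100", variables in alphabetical order.
    static String showEnv(Map<String, Integer> env) {
        StringJoiner sj = new StringJoiner(", ");
        new TreeMap<>(env).forEach((x, v) -> sj.add(x + "=" + v));
        return sj.toString();
    }

    // Called with the statement before it is executed ...
    static void traceStm(String stm) {
        if (debug) System.out.println(stm);
    }

    // ... and with the resulting context afterwards, followed by a blank line.
    static void traceEnv(Map<String, Integer> env) {
        if (debug) {
            System.out.println(showEnv(env));
            System.out.println();
        }
    }
}
```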

Add to this that the program stops after each statement to wait for an 'enter', and you have turned your interpreter into a debugger. (But this is not required in Lab 3.)

Example run of the interpreter

Source code in good09.cc:

  int main ()
  {
    int i = readInt() ; // 5
  
    printInt(i) ;   //5
    printInt(i++) ; //5
    printInt(i) ;   //6
    printInt(++i) ; //7
    printInt(i) ;   //7
  }

Running the interpreter

    bash$ ./lab3 good09.cc
    -- waits for user input, which is 100
    100
    100
    100
    101
    102
    102

Example run in debugging mode

After each statement, the program waits for an Enter. In addition, it waits for user input with read functions, as usual.

    bash$ ./lab3 -debug good09.cc
  
    [input] 100
    i=100
  
    printInt (i);
    [output] 100
    i=100
  
    printInt (i ++);
    [output] 100
    i=101
  
    printInt (i);
    [output] 101
    i=101
  
    printInt (++ i);
    [output] 102
    i=102
  
    printInt (i);
    [output] 102
    i=102

Interpreter in Haskell

You can copy the contents of laborations/lab3/haskell/:

    CPP.cf           -- grammar: from your Lab 2
    lab3.hs          -- main module
    Makefile         
    TypeChecker.hs   -- type checking module: from your Lab 2
    Interpreter.hs   -- interpreter module

You only have to modify Interpreter.hs.

But you can already compile them: just type

    make

and run the interpreter with

    ./lab3 <File>

The rest is "debugging the empty file"! You can also start from laborations/mini.

The Main module

You don't have to write this - but it shows how compiler phases are linked together.

    check :: String -> IO () 
    check s = case pProgram (myLexer s) of
                Bad err  -> do putStrLn "SYNTAX ERROR"
                               putStrLn err
                               exitFailure 
                Ok  tree -> case typecheck tree of
                              Bad err -> do putStrLn "TYPE ERROR"
                                            putStrLn err
                                            exitFailure 
                              Ok _    -> interpret tree

Notice that the interpreter runs in the IO monad and not the error monad.

Central types

Values

    data Value = Vint Integer | Vdouble Double | Vbool Bool | Vvoid
  
    printValue :: Value -> String

Environment type; use Map for easier update than lists.

    type Env = (Sig,[Context])
    type Sig = [(Id,[Value] -> IO Value)]
    type Context = M.Map Id Value

The signature is initialized with the primitive functions, as a list of type Sig:

    [ (Id "printInt",    \ [v] -> putStrLn (printValue v) >> return Vvoid)
    , (Id "printDouble", \ [v] -> putStrLn (printValue v) >> return Vvoid)
    , (Id "readInt",     \ []  -> getLine >>= return . Vint . read)
    , (Id "readDouble",  \ []  -> getLine >>= return . Vdouble . read)
    ]

The Interpreter module

The environment datatypes and operations.

Type signatures of the interpretation methods

    interpret :: Program -> IO ()
    exec      :: Env -> Stm -> IO Env
    eval      :: Env -> Exp -> IO (Env,Value)

Aggressivity

"Well-typed programs cannot go wrong!"
Therefore we no longer need to check certain things at run time, for instance that a while condition evaluates to a boolean:

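    exec env (SWhile exp stm) = do
      (env1, Vbool b) <- eval env exp   -- no check: the type checker guarantees a boolean
      if b
        then do env2 <- exec env1 stm
                exec env2 (SWhile exp stm)
        else return env1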

Interpreter in Java

You can copy the contents of laborations/lab3/java/:

    CPP.cf             -- grammar (from lab2)
    lab3               -- script running the interpreter
    lab3.java          -- main program
    Makefile
    TypeChecker.java   -- type checker class (from lab2)
    TypeException.java -- exceptions for type checking
    Interpreter.java   -- interpreter class

You only have to modify Interpreter.java.

You can already compile the files: just type

    make

and run the interpreter with

    ./lab3 <File>

Before make, you may have to set your class path so that it finds java_cup and JLex, as well as the current directory.

    export CLASSPATH=.:<path-to-JLex>:<path-to-CUP>:$CLASSPATH

The main module

You don't have to write this - but it shows how compiler phases are linked together.

    Yylex l = null;  // declared before try: the syntax error handler reports its position
    try {
        l = new Yylex(new FileReader(args[0]));
        parser p = new parser(l);
        CPP.Absyn.Program parse_tree = p.pProgram();
        new TypeChecker().typecheck(parse_tree);
        new Interpreter().interpret(parse_tree);
    } catch (TypeException e) {
        System.out.println("TYPE ERROR");
        System.err.println(e.toString());
        System.exit(1);
    } catch (RuntimeException e) {
        // System.out.println("RUNTIME ERROR");
        System.err.println(e.toString());
        System.exit(-1);
    } catch (IOException e) {
        System.err.println(e.toString());
        System.exit(1);
    } catch (Throwable e) {
        System.out.println("SYNTAX ERROR");
        System.out.println("At line " + String.valueOf(l.line_num())
                           + ", near \"" + l.buff() + "\" :");
        System.out.println("     " + e.getMessage());
        e.printStackTrace();
        System.exit(1);
    }

Complete code

This code is for a small language called Mini:

    -- Mini.cf
  
    Prog. Program ::= [Stm] ;
  
    terminator Stm "" ;
  
    SDecl.  Stm ::= Type Ident ";"  ;
    SAss.   Stm ::= Ident "=" Exp ";" ;
    SBlock. Stm ::= "{" [Stm] "}" ;
    SPrint. Stm ::= "print" Exp  ";" ;
  
    EVar.    Exp1 ::= Ident ;
    EInt.    Exp1 ::= Integer ;
    EDouble. Exp1 ::= Double ;
    EAdd.    Exp  ::= Exp "+" Exp1 ;
  
    coercions Exp 1 ;
  
    TInt.    Type ::= "int" ;
    TDouble. Type ::= "double" ;

The code is found in laborations/mini.

You can take it as a starting point for Java (or Haskell as well).

The Interpreter module

  import Mini.Absyn.*;
  
  import java.util.HashMap;
  import java.util.LinkedList;
  
  public class Interpreter {
  
      public void interpret(Program p) {
  	Prog prog = (Prog)p;
  	Env env = new Env();
  	for (Stm s : prog.liststm_) {
  	    execStm(s, env);
  	}
      }
  
      private static abstract class Value {
  	public boolean isInt() { return false; }
  	public Integer getInt() { 
  	    throw new RuntimeException(this + " is not an integer."); 
  	}
  	public Double getDouble() { 
  	    throw new RuntimeException(this + " is not a double."); 
  	}
  
  	public static class Undefined extends Value {
  	    public Undefined() {}
  	    public String toString() { return "undefined"; }
  	}
  	public static class IntValue extends Value {
  	    private Integer i;
  	    public IntValue(Integer i) { this.i = i; }
  	    public boolean isInt() { return true; }
  	    public Integer getInt() { return i; }
  	    public String toString() { return i.toString(); }
  	}
  	public static class DoubleValue extends Value {
  	    private Double d;
  	    public DoubleValue(Double d) { this.d = d; }
  	    public Double getDouble() { return d; }
  	    public String toString() { return d.toString(); }
  	}
      }
  
      private static class Env { 
  	private LinkedList<HashMap<String,Value>> scopes;
  
  	public Env() {
  	    scopes = new LinkedList<HashMap<String,Value>>();
  	    enterScope();
  	}
  
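  	public Value lookupVar(String x) {
  	    for (HashMap<String,Value> scope : scopes) {
  		Value v = scope.get(x);
  		// declared but never assigned: reading it is a run-time error
  		if (v instanceof Value.Undefined)
  		    throw new RuntimeException("Uninitialized variable " + x);
  		if (v != null)
  		    return v;
  	    }
  	    throw new RuntimeException("Unknown variable " + x + " in " + scopes);
  	}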
  
  	public void addVar(String x) {
  	    scopes.getFirst().put(x,new Value.Undefined());
  	}
  
  	public void setVar(String x, Value v) {
  	    for (HashMap<String,Value> scope : scopes) {
  		if (scope.containsKey(x)) {
  		    scope.put(x,v);
  		    return;
  		}
  	    }
  	}
  
  	public void enterScope() {
  	    scopes.addFirst(new HashMap<String,Value>());
  	}
  
  	public void leaveScope() {
  	    scopes.removeFirst();
  	}
      }
  
      private void execStm(Stm st, Env env) {
  	st.accept(new StmExecuter(), env);
      }
  
      private class StmExecuter implements Stm.Visitor<Object,Env> {
  	public Object visit(Mini.Absyn.SDecl p, Env env) {
  	    env.addVar(p.ident_);
  	    return null;
  	}
  
  	public Object visit(Mini.Absyn.SAss p, Env env) {
  	    env.setVar(p.ident_, evalExp(p.exp_, env));
  	    return null;
  	}
  
  	public Object visit(Mini.Absyn.SBlock p, Env env) {
  	    env.enterScope();
  	    for (Stm st : p.liststm_) {
  		execStm(st, env);
  	    }
  	    env.leaveScope();
  	    return null;
  	}
  
  	public Object visit(Mini.Absyn.SPrint p, Env env) {
  	    Value v = evalExp(p.exp_, env);
  	    System.err.println(v.toString());
  	    return null;
  	}
      }
  
      private Value evalExp(Exp e, Env env) {
  	return e.accept(new ExpEvaluator(), env);
      }
  
      private class ExpEvaluator implements Exp.Visitor<Value,Env> {
  
  	public Value visit(Mini.Absyn.EVar p, Env env) {
  	    return env.lookupVar(p.ident_);
  	}
  
  	public Value visit(Mini.Absyn.EInt p, Env env) {
  	    return new Value.IntValue(p.integer_);
  	}
  	public Value visit(Mini.Absyn.EDouble p, Env env) {
  	    return new Value.DoubleValue(p.double_);
  	}
  	public Value visit(Mini.Absyn.EAdd p, Env env) {
  	    Value v1 = p.exp_1.accept(this, env);
  	    Value v2 = p.exp_2.accept(this, env);
  	    if (v1.isInt()) {
  		return new Value.IntValue(v1.getInt() + v2.getInt());
  	    } else {
  		return new Value.DoubleValue(v1.getDouble() + v2.getDouble());
  	    }
  	}
  
      }
  
  }

The interpretation of print and read

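These functions are inlined directly in the Eval visitor, in the case for function call expressions:

    public static class Eval implements Exp.Visitor<EnvVal,Env> {
      public EnvVal visit(ECall p, Env env) {
        if (p.id_.equals("printInt")) {   // compare strings with equals, not ==
          // evaluate the single argument first (call by value), then print it
          EnvVal envval = p.listexp_.element().accept(this, env);
          System.err.println(envval.val.toString());
          // ... return the resulting environment with a void value
        }
        // ... the other built-in functions
      }
      // ...
    }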

Lab 3

We take a look at the lab 3 PM.

Java bytecode interpretation

Byte code, virtual machine code - simpler than high-level source code.

Example: JVM (Java Virtual Machine)

    bipush n  -- push byte constant n
    iadd      -- add two integers; pop the operands and push the result
    imul      -- multiply two integers; pop the operands and push the result
    istore x  -- pop the top value and store it in local variable address x
    iload x   -- push the value stored in local variable address x
    dup       -- duplicate the top of the stack
    invokestatic -- call a function with parameters from the top of the
                    stack, pop the parameters and push the value

Java is compiled to JVM (next lecture).

JVM is interpreted, or compiled to native machine code by JIT (Just In Time compilation).

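Most "interpreted languages" are actually compiled to byte code. A classic exception was Ruby before version 1.9, which interpreted the syntax tree directly; since 1.9, Ruby also compiles to byte code (YARV).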

Example use of invokestatic, generated from printInt(5):

    bipush 5
    invokestatic runtime/iprint(I)V

The type (I)V tells, for example, how many values to pop: one int argument (I), and a void result (V), so nothing is pushed back.

JVM interpreter

Environment: local variable storage and a stack holding values

Primitive actions:

store value in place #i in local storage
load value from place #i in local storage
push a value to the stack
pop a value from the stack

Example execution: 5 * (6 + 7)

    bipush 5 ; bipush 6 ; bipush 7 ; iadd ; imul
  
    --         --         --         --     --
     5          5          5          5     65
                6          6         13
                           7

Local variables in JVM

The compiler assigns local storage addresses to variables (lecture 11).

    int i ;            ; reserve address 0 for i       
    i = 9 ;            bipush 9
                       istore 0
    int j = i + 3 ;    ; reserve address 1 for j
                       iload 0
                       bipush 3
                       iadd
                       istore 1

Semantics of JVM

Naturally expressed using small-step semantics, i.e. each rule specifies one step of computation. The format of small-step rules is

    < Instruction , Env > ⇩ < Env' >

The environment has a storage V and a stack S. The rules work on instructions, executed one at a time.

    <bipush v, V-S>      ⇩ <V-S.v>
    <iadd,     V-S.v.w>  ⇩ <V-S.v+w>
    <imul,     V-S.v.w>  ⇩ <V-S.v*w>
    <iload i,  V-S>      ⇩ <V-S.V(i)>
    <istore i, V-S.v>    ⇩ <V(i:=v)-S>
    <pop,      V-S.v>    ⇩ <V-S>
    <dup,      V-S.v>    ⇩ <V-S.v.v>

Notation used:

    Notation     Explanation
    <c , V-S>    instruction c, with storage V and stack S
    S.v          stack with all values in S plus v on the top
    V(i)         the value at position i in storage V
    V(i:=v)      storage V with value v put into position i
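The rules translate almost line by line into a Java step function. This is a sketch under simplifying assumptions (Java 16+): an instruction is a record with an opcode name and one int argument, which is not the real class-file encoding; the storage V is an int array; the stack S is a `Deque` whose head is the top; and only the instructions in the rules are supported.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class JvmSteps {
    record Instr(String op, int arg) {}

    static Instr ins(String op, int arg) { return new Instr(op, arg); }
    static Instr ins(String op) { return new Instr(op, 0); }

    // One small step: executes instruction c on storage v and stack s.
    static void step(Instr c, int[] v, Deque<Integer> s) {
        switch (c.op()) {
            case "bipush" -> s.push(c.arg());                      // S becomes S.v
            case "iadd" -> { int top = s.pop(); int below = s.pop(); s.push(below + top); }
            case "imul" -> { int top = s.pop(); int below = s.pop(); s.push(below * top); }
            case "iload" -> s.push(v[c.arg()]);                    // push V(i)
            case "istore" -> v[c.arg()] = s.pop();                 // V(i:=v), v popped
            case "pop" -> s.pop();
            case "dup" -> s.push(s.peek());
            default -> throw new IllegalArgumentException("unknown instruction " + c.op());
        }
    }

    // Runs straight-line code (no jumps) and returns the final stack, top first.
    static Deque<Integer> run(List<Instr> code, int[] v) {
        Deque<Integer> s = new ArrayDeque<>();
        for (Instr c : code) step(c, v, s);
        return s;
    }
}
```

Running the code for 5 * (6 + 7) leaves 65 on the stack, and the local-variable example (bipush 9, istore 0, iload 0, bipush 3, iadd, istore 1) leaves 9 at address 0 and 12 at address 1.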

Dealing with jumps

The JVM does not always continue with the next instruction: there can be jumps.

Example (my first BASIC program):

    BEGIN:
      bipush 66 
      invokestatic runtime/iprint(I)V 
      goto BEGIN

To give semantics to the goto instruction, we have to add the code C to the environment.

We denote by C(p) the instruction at position p in C.

    <goto LABEL,  C-V-S>  ⇩  <C(LABEL), C-V-S>

Question: How do we now express rules for the "ordinary" instructions?

    <bipush v, C-V-S>      ⇩ <C(?), C-V-S.v>

Answer: we add a code pointer P to the environment:

    <goto LABEL,  P-C-V-S>   ⇩  <C(LABEL), LABEL-C-V-S>
    <bipush v,    P-C-V-S>   ⇩  <C(P+1),   (P+1)-C-V-S.v>
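A Java sketch of the P-C-V-S machine for the BASIC-style loop (illustrative, not the real JVM). Assumptions: labels are already resolved to code positions, the call `invokestatic runtime/iprint(I)V` is abbreviated to a pseudo-instruction `iprint`, and V is omitted because the example uses no local variables. Since the loop never terminates, `run` stops after `maxSteps` steps and returns the printed values.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class JvmJumps {
    record Instr(String op, int arg) {}

    static List<Integer> run(List<Instr> code, int maxSteps) {
        Deque<Integer> stack = new ArrayDeque<>();
        List<Integer> printed = new ArrayList<>();
        int p = 0;                                             // code pointer P
        for (int n = 0; n < maxSteps && p < code.size(); n++) {
            Instr c = code.get(p);                             // the instruction C(P)
            switch (c.op()) {
                case "bipush" -> { stack.push(c.arg()); p = p + 1; }        // next: C(P+1)
                case "iprint" -> { printed.add(stack.pop()); p = p + 1; }   // pops one int
                case "goto" -> p = c.arg();                                 // next: C(LABEL)
                default -> throw new IllegalArgumentException("unknown instruction " + c.op());
            }
        }
        return printed;
    }
}
```

With code [bipush 66, iprint, goto 0] and 9 steps, the loop body runs three times and `run` returns [66, 66, 66].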