Lecture 8: Interpreters
Programming Languages Course
Aarne Ranta (aarne@chalmers.se)


%!target:html

%!Encoding:utf-8

%!postproc(html): #NEW <!-- NEW -->
%!postproc(html): #HR <HR>
%!postproc(html): #sub1 <sub>1</sub>
%!postproc(html): #subn <sub>n</sub>
%!postproc(html): #subn1 <sub>n-1</sub>


Book: nothing


#NEW

==Plan==

Interpretation vs. compilation

Operational semantics

From operational semantics to interpreter code

Evaluation and side effects

Tracing and debugging

From type checker code to interpreter code

Implementing interpreters in Haskell

Implementing interpreters in Java

Lab 3 summary

JVM interpretation


#NEW

==Interpretation vs. compilation==

Interpretation: **run** the code statement by statement

Compilation: **translate** the code into another format

Example:
```
  int i = 1 ;
  int j = i + 2 ;
  printInt(i + j) ;
```
Interpretation goes from state to state:
```
  ()
  (i = 1)
  (i = 1, j = 3)
  output 4
```
Compilation generates another piece of code (e.g. JVM):
```
  bipush 1
  dup
  istore 0
  bipush 2
  iadd
  istore 1
  iload 0
  iload 1
  iadd
  invokestatic runtime/iprint(I)V
```


#NEW

==Implementing interpretation and compilation==

Both techniques use //syntax-directed translation// on a
//type-checked syntax tree//.

Both can be easily adapted from the type checker code.

Both have an environment with a function signature and
a block-structured variable context.

But the environments contain different things...

... and the theoretical models are different.

Interpretation: **operational semantics** (or, equivalently,
**abstract interpretations**)

Compilation: compilation schemes


#NEW

==Operational semantics==

Inference rules similar to typing rules, with the form of judgement
```
  Env => Exp ⇩ Val
```
Read: "In the environment Env, expression Exp computes to value Val".
This is called **big-step semantics**, as there can be many actual computation
steps inside ``Exp ⇩ Val``. 

The environment has variables and their values, e.g. ``[x = 0, y = 7]``.

Examples: addition, variables, integer literals.
```
  Env => exp1 ⇩ val1   Env => exp2 ⇩ val2
  -------------------------------------------
      Env => exp1 + exp2 ⇩ val1 + val2


  --------------  x = v is in Env     -------------- i is an integer literal
  Env => x ⇩ v                      Env => i ⇩ i
```
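The three rules above can be read as a recursive function, one case per rule. The following Java sketch illustrates this under some assumptions: the class names ``Exp``, ``Lit``, ``Var`` and ``Add`` are made up for this example (they are not the abstract syntax classes used in the labs), values are plain integers, and the environment is a ``Map`` from variable names to values.
```
import java.util.HashMap;
import java.util.Map;

// A sketch only: Exp, Lit, Var and Add are hypothetical names,
// not the abstract syntax classes used in the labs.
class BigStep {
    static abstract class Exp {
        // Env => this evaluates to the returned value
        abstract int eval(Map<String, Integer> env);
    }

    static final class Lit extends Exp {
        final int i;
        Lit(int i) { this.i = i; }
        // Env => i evaluates to i
        int eval(Map<String, Integer> env) { return i; }
    }

    static final class Var extends Exp {
        final String x;
        Var(String x) { this.x = x; }
        // Env => x evaluates to v, provided that x = v is in Env
        int eval(Map<String, Integer> env) {
            Integer v = env.get(x);
            if (v == null) throw new RuntimeException("Unknown variable " + x);
            return v;
        }
    }

    static final class Add extends Exp {
        final Exp exp1, exp2;
        Add(Exp exp1, Exp exp2) { this.exp1 = exp1; this.exp2 = exp2; }
        // the two premises of the rule become two recursive calls
        int eval(Map<String, Integer> env) {
            int val1 = exp1.eval(env);
            int val2 = exp2.eval(env);
            return val1 + val2;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> env = new HashMap<String, Integer>();
        env.put("x", 0);
        env.put("y", 7);
        // y + 3 in the environment [x = 0, y = 7]
        System.out.println(new Add(new Var("y"), new Lit(3)).eval(env));
    }
}
```
Run in the environment ``[x = 0, y = 7]``, the expression ``y + 3`` evaluates to 10.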


#NEW

==Operational semantics for statements==

Statements don't compute to values. Instead, they change the environment.
The form of judgement is
```
  Env => Stm ⇩ Env'
```
We write ``Env[x = val]`` for updating ``Env`` with
a new value ``val`` for ``x``.

Example: assignments. 
```
  Env => exp ⇩ val
  -------------------------------
  Env => x = exp ⇩ Env[x = val]
```

#NEW

==Operational semantics: while loops==

Two rules are needed, depending on whether the condition is true:
```
  Env => exp ⇩ false
  ------------------------------
  Env => while (exp) stm ⇩ Env

  Env  => exp ⇩ true    
  Env  => stm ⇩ Env'   
  Env' => while (exp) stm ⇩ Env''
  ---------------------------------
  Env  => while (exp) stm ⇩ Env''
```
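The two rules translate almost literally into a recursive function. The sketch below makes simplifying assumptions: the condition is represented by a Java ``Predicate`` on the environment and the loop body by a ``Function`` from environments to environments, instead of by syntax trees. The names ``execWhile`` and ``incrementI`` are made up for this example.
```
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

class WhileSemantics {
    // Env => while (exp) stm evaluates to Env''
    // cond stands in for evaluating exp to true or false in Env,
    // body stands in for executing stm in Env, giving Env'.
    static Map<String, Integer> execWhile(
            Predicate<Map<String, Integer>> cond,
            Function<Map<String, Integer>, Map<String, Integer>> body,
            Map<String, Integer> env) {
        if (!cond.test(env)) {
            return env;                               // first rule: Env is unchanged
        }
        Map<String, Integer> env1 = body.apply(env);  // second rule: execute stm once ...
        return execWhile(cond, body, env1);           // ... then run the loop again in Env'
    }

    // Hypothetical body for "i = i + 1 ;": returns an updated copy of the environment.
    static Map<String, Integer> incrementI(Map<String, Integer> env) {
        Map<String, Integer> env1 = new HashMap<String, Integer>(env);
        env1.put("i", env.get("i") + 1);
        return env1;
    }

    public static void main(String[] args) {
        Map<String, Integer> env = new HashMap<String, Integer>();
        env.put("i", 0);
        // while (i < 3) i = i + 1 ;
        System.out.println(execWhile(e -> e.get("i") < 3, WhileSemantics::incrementI, env));
    }
}
```
Starting from ``[i = 0]``, the loop ends in the environment ``[i = 3]``. The recursion mirrors the second rule; since Java does not optimize tail calls, an actual interpreter would more likely use a Java ``while`` loop here.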


#NEW

==Expressions with side effects==

More complex form of judgements: return a value //and// a new environment:
```
  Env => Exp ⇩ (Val,Env')
```
For instance, if assignments are expressions, we have
```
  Env => exp ⇩ (val,Env')
  --------------------------------------
  Env => x = exp ⇩ (val,Env'[x = val])
```
Try this with ``x = (x = x + 1) + 1`` in the initial environment ``[x = 0]`` !
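
One way to check the answer is a small evaluator in which every call returns both a value and a new environment. This is a sketch under assumptions: the classes ``Result``, ``Lit``, ``Var``, ``Add`` and ``Assign`` are made up for this example, values are integers, and environments are copied rather than updated in place, so that each ``Env'`` is a separate object as in the rules.
```
import java.util.HashMap;
import java.util.Map;

class SideEffects {
    // The right-hand side of a judgement: a value together with a new environment
    static final class Result {
        final int val;
        final Map<String, Integer> env;
        Result(int val, Map<String, Integer> env) { this.val = val; this.env = env; }
    }

    static abstract class Exp {
        abstract Result eval(Map<String, Integer> env);
    }

    static final class Lit extends Exp {
        final int i;
        Lit(int i) { this.i = i; }
        Result eval(Map<String, Integer> env) { return new Result(i, env); }
    }

    static final class Var extends Exp {
        final String x;
        Var(String x) { this.x = x; }
        // assumes that x is in env, which a type checker would guarantee
        Result eval(Map<String, Integer> env) { return new Result(env.get(x), env); }
    }

    static final class Add extends Exp {
        final Exp exp1, exp2;
        Add(Exp exp1, Exp exp2) { this.exp1 = exp1; this.exp2 = exp2; }
        Result eval(Map<String, Integer> env) {
            Result r1 = exp1.eval(env);      // evaluate exp1 in Env, giving Env1
            Result r2 = exp2.eval(r1.env);   // evaluate exp2 in Env1, giving Env2
            return new Result(r1.val + r2.val, r2.env);
        }
    }

    static final class Assign extends Exp {
        final String x;
        final Exp exp;
        Assign(String x, Exp exp) { this.x = x; this.exp = exp; }
        Result eval(Map<String, Integer> env) {
            Result r = exp.eval(env);        // evaluate exp in Env, giving (val, Env')
            Map<String, Integer> env2 = new HashMap<String, Integer>(r.env);
            env2.put(x, r.val);              // Env'[x = val]
            return new Result(r.val, env2);
        }
    }

    // The expression x = (x = x + 1) + 1
    static Exp example() {
        Exp inner = new Assign("x", new Add(new Var("x"), new Lit(1)));
        return new Assign("x", new Add(inner, new Lit(1)));
    }

    public static void main(String[] args) {
        Map<String, Integer> env = new HashMap<String, Integer>();
        env.put("x", 0);
        Result r = example().eval(env);
        System.out.println(r.val + " " + r.env);
    }
}
```
The inner assignment gives the value 1 in ``[x = 1]``, the addition gives 2, and the outer assignment gives the value 2 in the environment ``[x = 2]``.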


#NEW

==Semantic rules revisited==

Principle: each recursive call of the interpreter may change Env.
```
  Env => exp1 ⇩ (val1,Env1)   Env1 => exp2 ⇩ (val2,Env2)
  ----------------------------------------------------------
      Env => exp1 + exp2 ⇩ (val1 + val2, Env2)
```
But if there are no recursive calls, Env usually remains constant.
```
  ------------------  x = v is in Env     ------------------ i is an integer literal
  Env => x ⇩ (v,Env)                      Env => i ⇩ (i,Env)
```
Exception: increments. Notice the difference between pre and post increments!
```
  ----------------------------------  x = v is in Env
  Env => x++ ⇩ (v,   Env[x = v+1])

  ----------------------------------  x = v is in Env
  Env => ++x ⇩ (v+1, Env[x = v+1])
```


#NEW

==From semantic rules to interpreter code==

Basic idea is the same as in type checking: from rule
```
      J#sub1 ...  J#subn
    ---------------- C
    Env => exp ⇩ val
```
generate the code "upside down" - with some modifications,
```
  interpret (Env => exp) =
    val#sub1 := interpret J#sub1
    ...
    val#subn := interpret J#subn
    return f(val#sub1,...,val#subn,C)
```
The rules that are the starting point are now 
the abstract interpretation rules (equivalent to 
big-step semantic rules). 

First look at an example without side effects:
```
  Env => exp1 ⇩ val1   Env => exp2 ⇩ val2     interpret Env (exp1 + exp2) =
  -------------------------------------------       val1 := interpret Env exp1
  Env => exp1 + exp2 ⇩ val1 + val2                val2 := interpret Env exp2
                                                    return val1 + val2
```


#NEW

==Values==

Interpreting an expression returns a **value**.

Therefore expression interpretation is also called **evaluation**.

Values are also called **canonical expressions**, or expressions in 
**normal form**.

Normal form: expression that cannot be evaluated further.

In lab 3, we will need four types of values:
- integer values, e.g. -47
- double values, e.g. 3.14159
- boolean values, ``true`` and ``false``
- a void value, which need never be shown


Thus variables are not values, and complex expressions like ``3 + 2`` are not values.

Good approximation (works when values are finite, atomic objects): values are literals.

In the interpreter implementation, 
it is good to have a separate type ``Value`` with a constructor for each type of value.


#NEW

==Operations on values==

**Call by value evaluation strategy**: when executing a function call, first 
evaluate the arguments and then call the function with the obtained 
values as parameters.

The call by value strategy is followed by many languages: C, C++, Java, ML - but
not Haskell (which has "call by need" - lecture 11).

The simplest examples are arithmetic operations: the operands are evaluated
first, e.g. in ``x + y``, ``x`` and ``y`` are evaluated first. The values
are then added by a plus operation defined for values.


#NEW

==Environments for interpreters==

The **signature** holds the interpretations of functions.
- It maps each function name to a function (written in the implementation language!)
  which sends values of arguments to a return value.
- In Lab 3, we need not necessarily have a signature - we just need
  interpretation rules for each of the four built-in functions.


The **context** holds the values of the variables.
- It has a block structure, just like the context in the type checker.
- The values can be uninitialized, if a variable is only declared. Referring
  to such a variable gives a run-time error.
- Thus the following methods are needed on the context
```
  Value lookupVar  (Env, Id)
  Env   updateVar  (Env, Id, Value)
  Env   newVar     (Env, Id)
  Env   newBlock   (Env)
```


#NEW

==Interpreting statements: execution==

Interpreting statements is also called **execution**.

The interpreter defines an ``exec`` function that tells
how the environment is changed by each statement.
```
  Env exec (Env, Stm)

  // declarations
  exec Env (typ var ;) =
    newVar Env var
 
  // expression statements
  exec Env (exp ;) =
    (Env2, val) := eval Env exp
    return Env2

  // while loops
  exec Env (while (exp) stm) =
    (Env2, val) := eval Env exp
    if val == true
      Env3 := exec Env2 stm
      exec Env3 (while (exp) stm)
    else
      return Env2
```


#NEW

==Interpreting expressions: evaluation==

Reminder: interpreting expressions is also called evaluation.

In Lab 3 (as in C++, C, and Java), expressions may have
**side effects**, i.e. change the environment.

The interpreter defines an ``eval`` function that tells
what value is returned and 
how the environment is changed by each expression.
```
  (Env, Val) eval (Env, Exp)

  // literals
  eval Env lit = 
    return (Env, lit)
 
  // variables
  eval Env var =
    val := lookupVar Env var
    return (Env,val)

  // assignments
  eval Env (var = exp) =
    (Env2,val) := eval Env exp
    Env3       := updateVar Env2 var val
    return (Env3,val)
```


#NEW

==Evaluation of increments==

Following the C/C++ standard, we distinguish between **preincrements** ``++i``
and **postincrements** ``i++``.
```
  // preincrements
  eval Env (++ var) =
    val  := lookupVar Env var
    val2 := plusVal(val,1)
    Env2 := updateVar Env var val2
    return (Env2,val2)

  // postincrements
  eval Env (var ++) =
    val  := lookupVar Env var
    val2 := plusVal(val,1)
    Env2 := updateVar Env var val2
    return (Env2,val)
```


#NEW

==Lazy logical operations==

Conjunction,
```
  a && b
```
is evaluated //lazily//: first ``a`` is evaluated. If the
result is ``true``, also ``b`` is evaluated, and the
value of ``b`` is returned. However, if
``a`` evaluates to ``false``, then ``false``
is returned without evaluating ``b``.
```
  eval Env (a && b) = 
    (Env2,val) := eval Env a
    if val == false
       return (Env2,val)
    else
       eval Env2 b
```
Disjunction,
```
  a || b
```
is also evaluated lazily: first ``a`` is evaluated. If the
result is ``false``, also ``b`` is evaluated, and the
value of ``b`` is returned. However, if
``a`` evaluates to ``true``, then ``true``
is returned without evaluating ``b``.
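
The laziness of disjunction can be observed when the operands have side effects. The following Java sketch is an illustration under assumptions: the operands are passed as ``BooleanSupplier`` objects so that they are evaluated only on demand, and a counter (the hypothetical field ``evaluations``) plays the role of the side effect that the environment threading records in the interpreter.
```
import java.util.function.BooleanSupplier;

class LazyOr {
    static int evaluations = 0;   // counts how many operands have been evaluated

    // Hypothetical operand with a visible side effect
    static boolean operand(boolean value) {
        evaluations++;
        return value;
    }

    // a || b: b is evaluated only if a evaluates to false
    static boolean evalOr(BooleanSupplier a, BooleanSupplier b) {
        if (a.getAsBoolean()) {
            return true;            // b is not evaluated
        }
        return b.getAsBoolean();    // the value of b is the result
    }

    public static void main(String[] args) {
        boolean r1 = evalOr(() -> operand(true), () -> operand(false));
        System.out.println(r1 + ", operands evaluated: " + evaluations);
        evaluations = 0;
        boolean r2 = evalOr(() -> operand(false), () -> operand(true));
        System.out.println(r2 + ", operands evaluated: " + evaluations);
    }
}
```
In the first call only one operand is evaluated, in the second call both are.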


#NEW

==The top-level interpreter==

To interpret the whole program,
+ initialize the context with an empty context
+ interpret the body in this context


```
  exec (int main () { Stm#sub1 ... Stm#subn}) =
    Env#sub1 := exec () Stm#sub1
      ...
    exec Env#subn1 Stm#subn
```


#NEW

==Tracing and debugging==

Nice thing about interpretation: the execution can
be precisely traced statement by statement.

For instance: let each call of the execution function print a
line showing the statement and a line showing the variable 
context. 

This is useful for debugging programs.

But it is also useful for debugging your interpreter!
Use the old trick of writing a generic tracing function
```
   trace Env stm =
     print Env ;   // comment out in final version
     print stm ;   //
     // return ;   // uncomment in final version
```
You can also make this dependent on a command-line flag,
so that the call
```
  lab3 -debug file.cc
```
runs your program with tracing on.

Add to this that the program stops after each statement
to wait for an 'enter', and you have turned your interpreter
into a debugger. (But this is not required in Lab 3.)
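
A flag-dependent tracing function could look as follows in Java. This is only a sketch: the class ``Tracer`` and its methods are made up for this example, the statement is assumed to be already printed into a string, and the context is assumed to be a map from variable names to integer values.
```
import java.util.Map;
import java.util.TreeMap;

class Tracer {
    private final boolean debug;

    Tracer(boolean debug) { this.debug = debug; }

    // Tracing is switched on if "-debug" occurs among the command-line arguments.
    static Tracer fromArgs(String[] args) {
        for (String arg : args) {
            if (arg.equals("-debug")) return new Tracer(true);
        }
        return new Tracer(false);
    }

    // The trace text for one statement, or "" when tracing is off.
    String format(String stm, Map<String, Integer> env) {
        if (!debug) return "";
        StringBuilder sb = new StringBuilder(stm).append('\n');
        for (Map.Entry<String, Integer> entry : env.entrySet()) {
            sb.append(entry.getKey()).append('=').append(entry.getValue()).append('\n');
        }
        return sb.toString();
    }

    // To be called by the interpreter after each executed statement.
    void trace(String stm, Map<String, Integer> env) {
        System.err.print(format(stm, env));
    }

    public static void main(String[] args) {
        Map<String, Integer> env = new TreeMap<String, Integer>();
        env.put("i", 101);
        fromArgs(args).trace("printInt (i ++);", env);
    }
}
```
With this design nothing has to be commented out in the final version: without the flag, ``trace`` prints nothing. Writing the trace to standard error is a choice made for this sketch.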


#NEW

==Example run of the interpreter==

Source code in
[``good09.cc`` ../laborations/lab3/testsuite/good/good09.cc]:
```
int main ()
{
  int i = readInt() ; // 5

  printInt(i) ;   //5
  printInt(i++) ; //5
  printInt(i) ;   //6
  printInt(++i) ; //7
  printInt(i) ;   //7
}
```
Running the interpreter
```
  bash$ ./lab3 good09.cc
  -- waits for user input, which is 100
  100
  100
  100
  101
  102
  102
```


#NEW

==Example run in debugging mode==

After each statement, the program waits for an Enter.
In addition, it waits for user input with ``read`` functions,
as usual.
```
  bash$ ./lab3 -debug good09.cc

  [input] 100
  i=100

  printInt (i);
  [output] 100
  i=100

  printInt (i ++);
  [output] 100
  i=101

  printInt (i);
  [output] 101
  i=101

  printInt (++ i);
  [output] 102
  i=102

  printInt (i);
  [output] 102
  i=102
```


#NEW

==Interpreter in Haskell==

You can copy the contents of
[``laborations/lab3/haskell/`` ../laborations/lab3/haskell]:
```
  CPP.cf           -- grammar: from your Lab 2
  lab3.hs          -- main module
  Makefile         
  TypeChecker.hs   -- type checking module: from your Lab 2
  Interpreter.hs   -- interpreter module
```
You only have to modify ``Interpreter.hs``.

But you can already compile them: just type
```
  make
```
and run the interpreter with
```
  ./lab3 <File>
```
The rest is "debugging the empty file"!
You can also start from [``laborations/mini`` ../laborations/mini].


#NEW

===The Main module===

You don't have to write this - but it shows how compiler phases are linked
together.
```
  check :: String -> IO () 
  check s = case pProgram (myLexer s) of
              Bad err  -> do putStrLn "SYNTAX ERROR"
                             putStrLn err
                             exitFailure 
              Ok  tree -> case typecheck tree of
                            Bad err -> do putStrLn "TYPE ERROR"
                                          putStrLn err
                                          exitFailure 
                            Ok _    -> interpret tree
```
Notice that the interpreter runs in the IO monad and not the
error monad.


#NEW

==Central types==

Values
```
  data Value = Vint Integer | Vdouble Double | Vbool Bool | Vvoid

  printValue :: Value -> String
```
Environment type; use ``Map`` for easier update than lists.
```
  type Env = (Sig,[Context])
  type Sig = [(Id,[Value] -> IO Value)]
  type Context = M.Map Id Value
```
The signature is initialized by the primitive functions
```
  (Id "printInt",   \ [v] -> putStrLn (printValue v) >> return Vvoid)
  (Id "printDouble",\ [v] -> putStrLn (printValue v) >> return Vvoid)
  (Id "readInt",    \ []  -> getLine >>= return . Vint . read)
  (Id "readDouble", \ []  -> getLine >>= return . Vdouble . read)
```


#NEW

===The Interpreter module===

The environment datatypes and operations.

Type signatures of the interpretation methods
```
  interpret :: Program -> IO ()
  exec      :: Env -> Stm -> IO Env
  eval      :: Env -> Exp -> IO (Env,Value)
```
**Aggressivity**
- "Well-typed programs cannot go wrong!"
- therefore we no longer need to check certain things at run time.


```
  exec env (SWhile exp stm) = do
    (env1,Vbool b) <- eval env exp
    if b then ...
    ---
```


#NEW

==Interpreter in Java==

You can copy the contents of
[``laborations/lab3/java/`` ../laborations/lab3/java1.5]:
```
  CPP.cf             -- grammar (from lab2)
  lab3               -- script running the interpreter
  lab3.java          -- main program
  Makefile
  TypeChecker.java   -- type checker class (from lab2)
  TypeException.java -- exceptions for type checking
  Interpreter.java   -- interpreter class
```
You only have to modify ``Interpreter.java``.

You can already compile the files: just type
```
  make
```
and run the interpreter with
```
  ./lab3 <File>
```
Before ``make``, you may have to set your class path so that it finds
java_cup and JLex, as well as the current directory.
```
  export CLASSPATH=.:<path-to-JLex>:<path-to-CUP>:$CLASSPATH
```


#NEW

===The main module===

You don't have to write this - but it shows how compiler phases are linked
together.
```
		try {
			l = new Yylex(new FileReader(args[0]));
			parser p = new parser(l);
			CPP.Absyn.Program parse_tree = p.pProgram();
			new TypeChecker().typecheck(parse_tree);
			new Interpreter().interpret(parse_tree);

		} catch (TypeException e) {
			System.out.println("TYPE ERROR");
			System.err.println(e.toString());
			System.exit(1);
		} catch (RuntimeException e) {
		    //			System.out.println("RUNTIME ERROR");
			System.err.println(e.toString());
			System.exit(-1);
		} catch (IOException e) {
			System.err.println(e.toString());
			System.exit(1);
		} catch (Throwable e) {
			System.out.println("SYNTAX ERROR");
			System.out.println("At line " + String.valueOf(l.line_num()) 
					   + ", near \"" + l.buff() + "\" :");
			System.out.println("     " + e.getMessage());
			e.printStackTrace();
			System.exit(1);
		}
```


#NEW

==Complete code==

This code is for a small language called ``Mini``:
```
  -- Mini.cf

  Prog. Program ::= [Stm] ;

  terminator Stm "" ;

  SDecl.  Stm ::= Type Ident ";"  ;
  SAss.   Stm ::= Ident "=" Exp ";" ;
  SBlock. Stm ::= "{" [Stm] "}" ;
  SPrint. Stm ::= "print" Exp  ";" ;

  EVar.    Exp1 ::= Ident ;
  EInt.    Exp1 ::= Integer ;
  EDouble. Exp1 ::= Double ;
  EAdd.    Exp  ::= Exp "+" Exp1 ;

  coercions Exp 1 ;

  TInt.    Type ::= "int" ;
  TDouble. Type ::= "double" ;
```
The code is found in [``laborations/mini`` ../laborations/mini].

You can take it as a starting point for Java (or for Haskell).


#NEW

===The Interpreter module===

```
import Mini.Absyn.*;

import java.util.HashMap;
import java.util.LinkedList;

public class Interpreter {

    public void interpret(Program p) {
	Prog prog = (Prog)p;
	Env env = new Env();
	for (Stm s : prog.liststm_) {
	    execStm(s, env);
	}
    }

    private static abstract class Value {
	public boolean isInt() { return false; }
	public Integer getInt() { 
	    throw new RuntimeException(this + " is not an integer."); 
	}
	public Double getDouble() { 
	    throw new RuntimeException(this + " is not a double."); 
	}

	public static class Undefined extends Value {
	    public Undefined() {}
	    public String toString() { return "undefined"; }
	}
	public static class IntValue extends Value {
	    private Integer i;
	    public IntValue(Integer i) { this.i = i; }
	    public boolean isInt() { return true; }
	    public Integer getInt() { return i; }
	    public String toString() { return i.toString(); }
	}
	public static class DoubleValue extends Value {
	    private Double d;
	    public DoubleValue(Double d) { this.d = d; }
	    public Double getDouble() { return d; }
	    public String toString() { return d.toString(); }
	}
    }

    private static class Env { 
	private LinkedList<HashMap<String,Value>> scopes;

	public Env() {
	    scopes = new LinkedList<HashMap<String,Value>>();
	    enterScope();
	}

	public Value lookupVar(String x) {
	    for (HashMap<String,Value> scope : scopes) {
		Value v = scope.get(x);
		if (v != null)
		    return v;
	    }
	    throw new RuntimeException("Unknown variable " + x + " in " + scopes);
	}

	public void addVar(String x) {
	    scopes.getFirst().put(x,new Value.Undefined());
	}

	public void setVar(String x, Value v) {
	    for (HashMap<String,Value> scope : scopes) {
		if (scope.containsKey(x)) {
		    scope.put(x,v);
		    return;
		}
	    }
	}

	public void enterScope() {
	    scopes.addFirst(new HashMap<String,Value>());
	}

	public void leaveScope() {
	    scopes.removeFirst();
	}
    }

    private void execStm(Stm st, Env env) {
	st.accept(new StmExecuter(), env);
    }

    private class StmExecuter implements Stm.Visitor<Object,Env> {
	public Object visit(Mini.Absyn.SDecl p, Env env) {
	    env.addVar(p.ident_);
	    return null;
	}

	public Object visit(Mini.Absyn.SAss p, Env env) {
	    env.setVar(p.ident_, evalExp(p.exp_, env));
	    return null;
	}

	public Object visit(Mini.Absyn.SBlock p, Env env) {
	    env.enterScope();
	    for (Stm st : p.liststm_) {
		execStm(st, env);
	    }
	    env.leaveScope();
	    return null;
	}

	public Object visit(Mini.Absyn.SPrint p, Env env) {
	    Value v = evalExp(p.exp_, env);
	    System.err.println(v.toString());
	    return null;
	}
    }

    private Value evalExp(Exp e, Env env) {
	return e.accept(new ExpEvaluator(), env);
    }

    private class ExpEvaluator implements Exp.Visitor<Value,Env> {

	public Value visit(Mini.Absyn.EVar p, Env env) {
	    return env.lookupVar(p.ident_);
	}

	public Value visit(Mini.Absyn.EInt p, Env env) {
	    return new Value.IntValue(p.integer_);
	}
	public Value visit(Mini.Absyn.EDouble p, Env env) {
	    return new Value.DoubleValue(p.double_);
	}
	public Value visit(Mini.Absyn.EAdd p, Env env) {
	    Value v1 = p.exp_1.accept(this, env);
	    Value v2 = p.exp_2.accept(this, env);
	    if (v1.isInt()) {
		return new Value.IntValue(v1.getInt() + v2.getInt());
	    } else {
		return new Value.DoubleValue(v1.getDouble() + v2.getDouble());
	    }
	}

    }

}
```


#NEW

==The interpretation of print and read==

These functions are directly inlined in the ``Eval`` class visiting
expressions that call them.
```
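  public static class Eval implements Exp.Visitor<EnvVal,Env> {
    public EnvVal visit(ECall p, Env env) {
      // compare strings with equals(): == only tests object identity
      if (p.id_.equals("printInt")) {
         EnvVal envval = p.listexp_.element().accept(this, env);
         System.err.println(envval.val.toString()) ;
      }
      // ... the other built-in functions, and the result to return
    }
    // ...
  }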
```

#NEW

==Lab 3==

We take a look at the [lab 3 PM ../laborations/lab3/lab3.html].


#NEW

==Java bytecode interpretation==

Byte code, virtual machine code - simpler than high-level source code.

Example: JVM (Java Virtual Machine)
```
  bipush n  -- push byte constant n
  iadd      -- add two integers; pop the operands and push the result
  imul      -- multiply two integers; pop the operands and push the result
  istore x  -- pop the top of the stack and store it in local storage address x
  iload x   -- push the value at local storage address x to the stack
  dup       -- duplicate the top of the stack
  invokestatic -- call a function with parameters from the top of the
                  stack, pop the parameters and push the value
```
Java is compiled to JVM (next lecture).

JVM is interpreted, or compiled to native machine code by JIT
(Just In Time compilation).

Most "interpreted languages" are actually compiled to byte code.
Exception: Ruby up to version 1.8, which interprets the syntax tree directly.

Example use of ``invokestatic``, generated from ``printInt(5)``:
```
  bipush 5
  invokestatic runtime/iprint(I)V
```
The type ``(I)V`` tells e.g. how many values to pop.


#NEW

==JVM interpreter==

Environment: **local variable storage** and a **stack** holding values

Primitive actions: 
- **store** value in place #i in local storage
- **load** value from place #i in local storage
- **push** a value to the stack 
- **pop** a value from the stack


Example execution: ``5 * (6 + 7)``
```
  bipush 5 ; bipush 6 ; bipush 7 ; iadd ; imul

  --         --         --         --     --
   5          5          5          5     65
              6          6         13
                         7
```


#NEW

==Local variables in JVM==

The compiler assigns local storage addresses to variables (lecture 11).
```
  int i ;            ; reserve address 0 for i       
  i = 9 ;            bipush 9
                     istore 0
  int j = i + 3 ;    ; reserve address 1 for j
                     iload 0
                     bipush 3
                     iadd
                     istore 1
```


#NEW

==Semantics of JVM==

Naturally expressed using **small-step semantics**, i.e. each rule specifies one
step of computation. The format of small-step rules is
```
  < Instruction , Env > ⇩ < Env' >
```
The environment has a storage V and a stack S.
The rules work on instructions, executed one at a time.
```
  <bipush v, V-S>      ⇩ <V-S.v>
  <iadd,     V-S.v.w>  ⇩ <V-S.v+w>
  <imul,     V-S.v.w>  ⇩ <V-S.v*w>
  <iload i,  V-S>      ⇩ <V-S.V(i)>
  <istore i, V-S.v>    ⇩ <V(i:=v)-S>
  <pop,      V-S.v>    ⇩ <V-S>
  <dup,      V-S.v>    ⇩ <V-S.v.v>
```
Notation used:
  || Notation      | Explanation ||
  |  ``<c , V-S>`` | instruction ``c``, with storage ``V`` and stack ``S`` |
  |  ``S.v``       | stack with all values in ``S`` plus ``v`` on the top |
  |  ``V(i)``      | the value at position ``i`` in storage ``V`` |
  |  ``V(i:=v)``   | storage ``V`` with value ``v`` put into position ``i`` |
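
The rules can be turned into an interpreter with one case per instruction. The following Java sketch makes some assumptions: instructions are given as strings, the storage ``V`` is a ``Map`` from addresses to integers, the stack ``S`` is a ``Deque``, and the class name ``MiniJvm`` is made up for this example. It covers only the seven instructions of the rules above.
```
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class MiniJvm {
    final Map<Integer, Integer> storage = new HashMap<Integer, Integer>(); // V
    final Deque<Integer> stack = new ArrayDeque<Integer>();                // S

    // One step of computation: executes one instruction and updates V and S.
    void step(String instruction) {
        String[] parts = instruction.trim().split("\\s+");
        String op = parts[0];
        if (op.equals("bipush")) {
            stack.push(Integer.parseInt(parts[1]));
        } else if (op.equals("iadd")) {
            int w = stack.pop(), v = stack.pop();   // w is on top of v
            stack.push(v + w);
        } else if (op.equals("imul")) {
            int w = stack.pop(), v = stack.pop();
            stack.push(v * w);
        } else if (op.equals("iload")) {
            stack.push(storage.get(Integer.parseInt(parts[1])));   // push V(i)
        } else if (op.equals("istore")) {
            storage.put(Integer.parseInt(parts[1]), stack.pop());  // V(i:=v)
        } else if (op.equals("pop")) {
            stack.pop();
        } else if (op.equals("dup")) {
            stack.push(stack.peek());
        } else {
            throw new RuntimeException("Unknown instruction " + instruction);
        }
    }

    // Executes instructions separated by ';', one at a time.
    void run(String code) {
        for (String instruction : code.split(";")) {
            step(instruction);
        }
    }

    public static void main(String[] args) {
        MiniJvm m = new MiniJvm();
        m.run("bipush 5 ; bipush 6 ; bipush 7 ; iadd ; imul");
        System.out.println(m.stack.peek());
    }
}
```
The example in ``main`` is the code for ``5 * (6 + 7)`` and leaves 65 on the top of the stack.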


#NEW

==Dealing with jumps==

JVM does not always continue with the next instruction, but there can be jumps.

Example (my first BASIC program):
```
  BEGIN:
    bipush 66 
    invokestatic runtime/iprint(I)V 
    goto BEGIN
```
To give semantics to the ``goto`` instruction, we have to add the 
code ``C`` to the environment.

We denote by ``C(p)`` the instruction at position ``p`` in ``C``.
```
  <goto LABEL,  C-V-S>  ⇩  <C(LABEL), C-V-S>
```
Question: How do we now express rules for the "ordinary" instructions?
```
  <bipush v, C-V-S>      ⇩ <C(?), C-V-S.v>
```
Answer: we add a code pointer ``P`` to the environment:
```
  <goto LABEL,  P-C-V-S>   ⇩  <C(LABEL), LABEL-C-V-S>
  <bipush v,    P-C-V-S>   ⇩  <C(P+1),   (P+1)-C-V-S.v>
```
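An interpreter following these rules keeps the code pointer ``P`` as part of its state. The Java sketch below is an illustration under assumptions: the class name ``JumpMachine`` is made up, labels are resolved to code positions when the program is loaded, only ``bipush``, ``goto`` and ``invokestatic`` are covered, and ``invokestatic`` is simplified to collecting the popped value in an output list. Since the example program never terminates, ``run`` takes a maximum number of steps.
```
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JumpMachine {
    final List<String[]> code = new ArrayList<String[]>();               // C
    final Map<String, Integer> labels = new HashMap<String, Integer>();  // label positions in C
    final Deque<Integer> stack = new ArrayDeque<Integer>();              // S
    final List<Integer> output = new ArrayList<Integer>();               // printed values
    int pointer = 0;                                                      // P

    JumpMachine(String... lines) {
        for (String line : lines) {
            String s = line.trim();
            if (s.endsWith(":")) {
                // a label names the position of the next instruction
                labels.put(s.substring(0, s.length() - 1), code.size());
            } else {
                code.add(s.split("\\s+"));
            }
        }
    }

    // One step: execute the instruction C(P) and update P.
    void step() {
        String[] ins = code.get(pointer);
        if (ins[0].equals("goto")) {
            pointer = labels.get(ins[1]);       // P := LABEL
            return;
        }
        if (ins[0].equals("bipush")) {
            stack.push(Integer.parseInt(ins[1]));
        } else if (ins[0].equals("invokestatic")) {
            output.add(stack.pop());            // stands in for runtime/iprint(I)V
        } else {
            throw new RuntimeException("Unknown instruction " + ins[0]);
        }
        pointer = pointer + 1;                  // P := P+1
    }

    // Runs at most maxSteps steps; stops earlier if P runs past the end of C.
    void run(int maxSteps) {
        for (int i = 0; i < maxSteps && pointer < code.size(); i++) {
            step();
        }
    }

    public static void main(String[] args) {
        JumpMachine m = new JumpMachine(
            "BEGIN:", "bipush 66", "invokestatic runtime/iprint(I)V", "goto BEGIN");
        m.run(7);
        System.out.println(m.output);
    }
}
```
After seven steps the program above has gone through the loop twice and printed 66 twice; the seventh step is the third ``bipush 66``.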