Lecture 8: Interpreters
Programming Languages Course
Aarne Ranta (aarne@chalmers.se)
%!target:html
%!Encoding:utf-8
%!postproc(html): #NEW
%!postproc(html): #HR
%!postproc(html): #sub1 1
%!postproc(html): #subn n
%!postproc(html): #subn1 n-1
Book: nothing
#NEW
==Plan==
- Interpretation vs. compilation
- Operational semantics
- From operational semantics to interpreter code
- Evaluation and side effects
- Tracing and debugging
- From type checker code to interpreter code
- Implementing interpreters in Haskell
- Implementing interpreters in Java
- Lab 3 summary
- JVM interpretation
#NEW
==Interpretation vs. compilation==
Interpretation: **run** the code statement by statement
Compilation: **translate** the code into another format
Example:
```
int i = 1 ;
int j = i + 2 ;
printInt(i + j) ;
```
Interpretation goes from state to state:
```
()
(i = 1)
(i = 1, j = 3)
output 4
```
Compilation generates another piece of code (e.g. JVM):
```
bipush 1
dup
istore 0
bipush 2
iadd
istore 1
iload 0
iload 1
iadd
invokestatic runtime/iprint(I)V
```
#NEW
==Implementing interpretation and compilation==
Both techniques use //syntax-directed translation// on the
//type-checked syntax tree//.
Both can easily be adapted from the type checker code.
Both have an environment with a function signature and
a block-structured variable context.
But the environments contain different things...
... and the theoretical models are different.
Interpretation: **operational semantics** (or, equivalently,
**abstract interpretations**)
Compilation: compilation schemes
#NEW
==Operational semantics==
Inference rules similar to typing rules, with the form of judgement
```
Env => Exp ⇩ Val
```
Read: "In the environment Env, expression Exp computes to value Val".
This is called **big-step semantics**, as there can be many actual computation
steps inside ``Exp ⇩ Val``.
The environment has variables and their values, e.g. ``[x = 0, y = 7]``.
Examples: addition, variables, integer literals.
```
Env => exp1 ⇩ val1    Env => exp2 ⇩ val2
------------------------------------------
Env => exp1 + exp2 ⇩ val1 + val2

--------------  x = v is in Env
Env => x ⇩ v

--------------  i is an integer literal
Env => i ⇩ i
```
#NEW
==Operational semantics for statements==
Statements don't compute to values. Instead, they change the environment.
The form of judgement is
```
Env => Stm ⇩ Env'
```
We write ``Env[x = val]`` for updating ``Env`` with
a new value ``val`` for ``x``.
Example: assignments.
```
Env => exp ⇩ val
-------------------------------
Env => x = exp ⇩ Env[x = val]
```
#NEW
==Operational semantics: while loops==
Two rules are needed, depending on whether the condition is true:
```
Env => exp ⇩ false
------------------------------
Env => while (exp) stm ⇩ Env

Env => exp ⇩ true
Env => stm ⇩ Env'
Env' => while (exp) stm ⇩ Env''
---------------------------------
Env => while (exp) stm ⇩ Env''
```
#NEW
==Expressions with side effects==
More complex form of judgements: return a value //and// a new environment:
```
Env => Exp ⇩ (Val,Env')
```
For instance, if assignments are expressions, we have
```
Env => exp ⇩ (val,Env')
--------------------------------------
Env => x = exp ⇩ (val,Env'[x = val])
```
Try this with ``x = (x = x + 1) + 1`` in the initial environment ``[x = 0]``!
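To check the exercise mechanically, here is a minimal Java sketch of these rules. It assumes a tiny hypothetical AST (``Lit``, ``Var``, ``Add``, ``Assign``, not the Lab 3 one) and represents every environment as a fresh copy of a ``HashMap``, so that ``Env`` and ``Env'`` stay distinct objects. Addition threads the environment from left to right, as in the rule on the next slide:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mini-AST for expressions with side effects (not the Lab 3 AST).
abstract class Exp {}

class Lit extends Exp {
    final int n;
    Lit(int n) { this.n = n; }
}

class Var extends Exp {
    final String x;
    Var(String x) { this.x = x; }
}

class Add extends Exp {
    final Exp e1, e2;
    Add(Exp e1, Exp e2) { this.e1 = e1; this.e2 = e2; }
}

class Assign extends Exp {
    final String x;
    final Exp e;
    Assign(String x, Exp e) { this.x = x; this.e = e; }
}

// The pair (Val, Env') of the judgement Env => Exp evaluates to (Val, Env').
class Result {
    final int val;
    final Map<String, Integer> env;
    Result(int val, Map<String, Integer> env) { this.val = val; this.env = env; }
}

class SideEffectEval {
    // Env[x = v] as a fresh copy: the input environment is never mutated.
    static Map<String, Integer> update(Map<String, Integer> env, String x, int v) {
        Map<String, Integer> env2 = new HashMap<>(env);
        env2.put(x, v);
        return env2;
    }

    static Result eval(Map<String, Integer> env, Exp e) {
        if (e instanceof Lit) {
            return new Result(((Lit) e).n, env);               // Env unchanged
        } else if (e instanceof Var) {
            return new Result(env.get(((Var) e).x), env);       // Env unchanged
        } else if (e instanceof Add) {
            Add a = (Add) e;
            Result r1 = eval(env, a.e1);      // Env  => exp1 gives (val1, Env1)
            Result r2 = eval(r1.env, a.e2);   // Env1 => exp2 gives (val2, Env2)
            return new Result(r1.val + r2.val, r2.env);
        } else {
            Assign s = (Assign) e;
            Result r = eval(env, s.e);        // Env => exp gives (val, Env')
            return new Result(r.val, update(r.env, s.x, r.val));  // Env'[x = val]
        }
    }
}
```

Starting from ``[x = 0]``, the inner assignment yields ``(1, [x = 1])``, the sum yields ``(2, [x = 1])``, and the outer assignment gives the final result ``(2, [x = 2])``.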
#NEW
==Semantic rules revisited==
Principle: each recursive call of the interpreter may change Env.
```
Env => exp1 ⇩ (val1,Env1)    Env1 => exp2 ⇩ (val2,Env2)
----------------------------------------------------------
Env => exp1 + exp2 ⇩ (val1 + val2, Env2)
```
But if there are no recursive calls, Env usually remains constant.
```
--------------------  x = v is in Env
Env => x ⇩ (v,Env)

--------------------  i is an integer literal
Env => i ⇩ (i,Env)
```
Exception: increments. Notice the difference between pre- and post-increments!
```
----------------------------------  x = v is in Env
Env => x++ ⇩ (v, Env[x = v+1])

----------------------------------  x = v is in Env
Env => ++x ⇩ (v+1, Env[x = v+1])
```
#NEW
==From semantic rules to interpreter code==
The basic idea is the same as in type checking: from a rule
```
J#sub1 ... J#subn
---------------- C
Env => exp ⇩ val
```
generate the code "upside down" - with some modifications,
```
interpret (Env => exp) =
  val#sub1 := interpret J#sub1
  ...
  val#subn := interpret J#subn
  return f(val#sub1,...,val#subn,C)
```
The rules that are the starting point are now
the abstract interpretation rules (equivalent to
big-step semantic rules).
First look at an example without side effects:
```
Env => exp1 ⇩ val1    Env => exp2 ⇩ val2
------------------------------------------
Env => exp1 + exp2 ⇩ val1 + val2

interpret Env (exp1 + exp2) =
  val1 := interpret Env exp1
  val2 := interpret Env exp2
  return val1 + val2
```
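The same translation in Java, as a minimal sketch: the class names are hypothetical, values are plain ``int``s, and the environment is a ``Map`` that evaluation only reads, since there are no side effects yet.

```java
import java.util.Map;

// Hypothetical side-effect-free expression AST: literals, variables, addition.
abstract class PureExp {
    // Env => exp evaluates to val: evaluation only reads the environment.
    abstract int eval(Map<String, Integer> env);
}

class IntLit extends PureExp {
    final int i;
    IntLit(int i) { this.i = i; }
    int eval(Map<String, Integer> env) { return i; }   // Env => i gives i
}

class VarRef extends PureExp {
    final String x;
    VarRef(String x) { this.x = x; }
    int eval(Map<String, Integer> env) {               // side condition: x = v is in Env
        Integer v = env.get(x);
        if (v == null) throw new RuntimeException("unbound variable " + x);
        return v;
    }
}

class Plus extends PureExp {
    final PureExp e1, e2;
    Plus(PureExp e1, PureExp e2) { this.e1 = e1; this.e2 = e2; }
    int eval(Map<String, Integer> env) {
        int val1 = e1.eval(env);   // val1 := interpret Env exp1
        int val2 = e2.eval(env);   // val2 := interpret Env exp2
        return val1 + val2;        // return val1 + val2
    }
}
```

Here each AST class carries its own ``eval`` method; the Lab 3 Java code shown later dispatches with BNFC visitors instead, but the recursion has the same shape.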
#NEW
==Values==
Interpreting an expression returns a **value**.
Therefore expression interpretation is also called **evaluation**.
Values are also called **canonical expressions**, or expressions in
**normal form**.
Normal form: expression that cannot be evaluated further.
In Lab 3, we will need four types of values:
- integer values, e.g. -47
- double values, e.g. 3.14159
- boolean values, ``true`` and ``false``
- a void value, which need never be shown
Thus variables are not values, and complex expressions like ``3 + 2`` are not values.
A good approximation (which works when values are finite, atomic objects): values are literals.
In the interpreter implementation,
it is good to have a separate type ``Value`` with a constructor for each type of value.
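In Java, one possible rendering of "a constructor for each type of value" is a small class hierarchy. A minimal sketch with hypothetical class names (the Haskell counterpart is the ``Value`` datatype in the "Central types" section):

```java
// Hypothetical value hierarchy for Lab 3: one subclass per kind of value.
abstract class Value {
    static final class IntVal extends Value {
        final int v;
        IntVal(int v) { this.v = v; }
        public String toString() { return Integer.toString(v); }
    }

    static final class DoubleVal extends Value {
        final double v;
        DoubleVal(double v) { this.v = v; }
        public String toString() { return Double.toString(v); }
    }

    static final class BoolVal extends Value {
        final boolean v;
        BoolVal(boolean v) { this.v = v; }
        public String toString() { return Boolean.toString(v); }
    }

    // The void value carries no data, so one shared instance suffices.
    // A well-typed program never prints it; the string is only for debugging.
    static final class VoidVal extends Value {
        static final VoidVal INSTANCE = new VoidVal();
        private VoidVal() {}
        public String toString() { return "void"; }
    }
}
```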
#NEW
==Operations on values==
**Call by value evaluation strategy**: when executing a function call, first
evaluate the arguments and then call the function with the obtained
values as parameters.
The call by value strategy is followed by many languages: C, C++, Java, ML - but
not Haskell (which has "call by need" - lecture 11).
The simplest examples are arithmetic operations: the operands are evaluated
first, e.g. in ``x + y``, ``x`` and ``y`` are evaluated first. The values
are then added by a plus operation defined for values.
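A minimal sketch of such a plus operation on values. The value classes are hypothetical and repeated here so that the block is self-contained; it assumes that the type checker has already ensured both operands have the same type:

```java
// Minimal hypothetical value classes, repeated so the sketch is self-contained.
abstract class Val {}

final class IntV extends Val {
    final int v;
    IntV(int v) { this.v = v; }
}

final class DoubleV extends Val {
    final double v;
    DoubleV(double v) { this.v = v; }
}

final class ValueOps {
    // The plus operation defined on values. Under call by value it is only
    // called after both operand expressions have been evaluated.
    static Val plusVal(Val a, Val b) {
        if (a instanceof IntV && b instanceof IntV) {
            return new IntV(((IntV) a).v + ((IntV) b).v);
        }
        if (a instanceof DoubleV && b instanceof DoubleV) {
            return new DoubleV(((DoubleV) a).v + ((DoubleV) b).v);
        }
        // Assumption: mixed operands were already rejected by the type checker.
        throw new IllegalArgumentException("plusVal: operand types differ");
    }
}
```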
#NEW
==Environments for interpreters==
The **signature** holds the interpretations of functions.
- It maps each function name to a function (written in the implementation language!)
which sends values of arguments to a return value.
- In Lab 3, a signature is not strictly necessary: it is enough to have
interpretation rules for each of the four built-in functions.
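A minimal sketch of a signature in Java: a map from function names to Java functions from argument values to a return value. It assumes plain ``Integer`` values, uses ``null`` as a stand-in for the void value, and writes to a ``StringBuilder`` instead of ``System.out`` so the effect is easy to inspect. Only ``printInt`` and ``readInt`` are shown; the class and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.function.Function;

// Hypothetical signature: each function name maps to a Java function
// (written in the implementation language) from argument values to a value.
final class Signature {
    static Map<String, Function<List<Integer>, Integer>> builtins(Scanner in, StringBuilder out) {
        Map<String, Function<List<Integer>, Integer>> sig = new HashMap<>();
        // printInt: print the single argument; null stands for the void value
        sig.put("printInt", args -> { out.append(args.get(0)).append('\n'); return null; });
        // readInt: no arguments; read the next integer from the input
        sig.put("readInt", args -> in.nextInt());
        return sig;
    }
}
```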
The **context** holds the values of the variables.
- It has a block structure, just like the context in the type checker.
- A value can be uninitialized if the variable has only been declared. Referring
to such a variable gives a run-time error.
- Thus the following methods are needed on the context:
```
Value lookupVar (Env, Id)
Env updateVar (Env, Id, Value)
Env newVar (Env, Id)
Env newBlock (Env)
```
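The four operations can be sketched in Java as follows. Unlike the functional signatures above, this hypothetical ``Context`` class updates itself in place, stores plain ``Integer`` values with ``null`` marking an uninitialized variable, and adds an ``exitBlock`` operation (an assumption, not in the list above) for leaving a block:

```java
import java.util.HashMap;
import java.util.LinkedList;

// Sketch of a block-structured interpreter context (hypothetical class).
// A null value means "declared but not yet initialized".
final class Context {
    // Innermost block first, used like a stack.
    private final LinkedList<HashMap<String, Integer>> blocks = new LinkedList<>();

    Context() { newBlock(); }

    void newBlock() { blocks.addFirst(new HashMap<>()); }

    void exitBlock() { blocks.removeFirst(); }

    // Declare x in the innermost block, without a value.
    void newVar(String x) { blocks.getFirst().put(x, null); }

    // Assign to the innermost visible declaration of x.
    void updateVar(String x, int v) {
        for (HashMap<String, Integer> b : blocks) {
            if (b.containsKey(x)) { b.put(x, v); return; }
        }
        throw new RuntimeException("undeclared variable " + x);
    }

    // Look x up from the innermost block outwards.
    int lookupVar(String x) {
        for (HashMap<String, Integer> b : blocks) {
            if (b.containsKey(x)) {
                Integer v = b.get(x);
                if (v == null) throw new RuntimeException("uninitialized variable " + x);
                return v;
            }
        }
        throw new RuntimeException("undeclared variable " + x);
    }
}
```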
#NEW
==Interpreting statements: execution==
Interpreting statements is also called **execution**.
The interpreter defines an ``exec`` function that tells
how the environment is changed by each statement.
```
Env exec (Env, Stm)

// declarations
exec Env (typ var ;) =
  newVar Env var

// expression statements
exec Env (exp ;) =
  (Env2, val) := eval Env exp
  return Env2

// while loops
exec Env (while (exp) stm) =
  (Env2, val) := eval Env exp
  if val == true
    Env3 := exec Env2 stm
    exec Env3 (while (exp) stm)
  else
    return Env2
```
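A minimal Java sketch of the three cases, under simplifying assumptions: a hypothetical AST, a single mutable ``Map`` instead of returned environments, and no block structure. Expressions are functional interfaces that may update the map (side effects) and return a value:

```java
import java.util.Map;

// Hypothetical mini-AST, just big enough for the three exec cases.
// Expressions may update env (side effects) and return a value.
interface IntExp { int eval(Map<String, Integer> env); }
interface BoolExp { boolean eval(Map<String, Integer> env); }

abstract class Stm { abstract void exec(Map<String, Integer> env); }

// typ var ;   newVar: null marks "declared but uninitialized"
class Decl extends Stm {
    final String name;
    Decl(String name) { this.name = name; }
    void exec(Map<String, Integer> env) { env.put(name, null); }
}

// exp ;   evaluate for the side effects, discard the value
class ExpStm extends Stm {
    final IntExp exp;
    ExpStm(IntExp exp) { this.exp = exp; }
    void exec(Map<String, Integer> env) { exp.eval(env); }
}

// while (exp) stm
class While extends Stm {
    final BoolExp cond;
    final Stm body;
    While(BoolExp cond, Stm body) { this.cond = cond; this.body = body; }
    void exec(Map<String, Integer> env) {
        // The recursive call "exec Env3 (while (exp) stm)" becomes a loop.
        while (cond.eval(env)) {
            body.exec(env);
        }
    }
}
```

Turning the recursive call into a Java ``while`` loop computes the same final environment without growing the Java call stack on long-running loops.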
#NEW
==Interpreting expression: evaluation==
Reminder: interpreting expressions is also called evaluation.
In Lab 3 (as in C++, C, and Java), expressions may have
**side effects**, i.e. change the environment.
The interpreter defines an ``eval`` function that tells
what value is returned and
how the environment is changed by each expression.
```
(Env, Val) eval (Env, Exp)

// literals
eval Env lit =
  return (Env, lit)

// variables
eval Env var =
  val := lookupVar Env var
  return (Env, val)

// assignments
eval Env (var = exp) =
  (Env2, val) := eval Env exp
  Env3 := updateVar Env2 var val
  return (Env3, val)
```
#NEW
==Evaluation of increments==
Following the C/C++ standard, we distinguish between **preincrements** ``++i``
and **postincrements** ``i++``.
```
// preincrements
eval Env (++ var) =
  val := lookupVar Env var
  val2 := plusVal(val,1)
  Env2 := updateVar Env var val2
  return (Env2,val2)

// postincrements
eval Env (var ++) =
  val := lookupVar Env var
  val2 := plusVal(val,1)
  Env2 := updateVar Env var val2
  return (Env2,val)
```
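The two cases differ only in which value is returned. A minimal sketch with a hypothetical ``Increments`` class, ``int`` values, and an environment that is updated in place:

```java
import java.util.Map;

// Sketch of pre- and post-increment evaluation (hypothetical helper class).
// env is updated in place; it plays the role of Env2 in the pseudocode.
final class Increments {
    // ++var : update, then return the NEW value
    static int preIncr(Map<String, Integer> env, String var) {
        int val = lookup(env, var);
        int val2 = val + 1;          // plusVal(val, 1)
        env.put(var, val2);          // updateVar Env var val2
        return val2;
    }

    // var++ : update, but return the OLD value
    static int postIncr(Map<String, Integer> env, String var) {
        int val = lookup(env, var);
        int val2 = val + 1;
        env.put(var, val2);
        return val;
    }

    private static int lookup(Map<String, Integer> env, String var) {
        Integer v = env.get(var);
        if (v == null) throw new RuntimeException("uninitialized variable " + var);
        return v;
    }
}
```

Starting from ``i = 100``, ``postIncr`` returns 100 and leaves ``i = 101``; ``preIncr`` then returns 102 and leaves ``i = 102``, as in the example run of ``good09.cc`` below.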
#NEW
==Lazy logical operations==
Conjunction,
```
a && b
```
is evaluated //lazily//: first ``a`` is evaluated. If the
result is ``true``, also ``b`` is evaluated, and the
value of ``b`` is returned. However, if
``a`` evaluates to ``false``, then ``false``
is returned without evaluating ``b``.
```
eval Env (a && b) =
  (Env2,val) := eval Env a
  if val == false
    return (Env2,val)
  else
    eval Env2 b
```
Disjunction,
```
a || b
```
is also evaluated lazily: first ``a`` is evaluated. If the
result is ``false``, also ``b`` is evaluated, and the
value of ``b`` is returned. However, if
``a`` evaluates to ``true``, then ``true``
is returned without evaluating ``b``.
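Both operators can be sketched in Java with explicit control flow. The operands are represented as ``java.util.function.BooleanSupplier``s, an assumption that stands in for ``eval Env a``: calling ``getAsBoolean()`` performs the evaluation, including any side effects. Environment threading is left implicit, and the class name is hypothetical:

```java
import java.util.function.BooleanSupplier;

// Sketch of lazy (short-circuit) conjunction and disjunction.
final class Lazy {
    static boolean and(BooleanSupplier a, BooleanSupplier b) {
        if (!a.getAsBoolean()) {
            return false;            // b is never evaluated
        }
        return b.getAsBoolean();     // the result is the value of b
    }

    static boolean or(BooleanSupplier a, BooleanSupplier b) {
        if (a.getAsBoolean()) {
            return true;             // b is never evaluated
        }
        return b.getAsBoolean();     // the result is the value of b
    }
}
```

Java's own ``&&`` and ``||`` are already lazy; the explicit ``if`` only makes the evaluation order visible.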
#NEW
==The top-level interpreter==
To interpret the whole program,
+ initialize the context with an empty context
+ interpret the body in this context
```
exec (int main () { Stm#sub1 ... Stm#subn }) =
  Env#sub1 := exec () Stm#sub1
  ...
  exec Env#subn1 Stm#subn
```
#NEW
==Tracing and debugging==
A nice thing about interpretation: the execution can
be precisely traced statement by statement.
For instance, let each call of the execution function print
a line showing the statement and a line showing the variable
context.
This is useful for debugging programs.
But it is also useful for debugging your interpreter!
Use the old trick of writing a generic tracing function:
```
trace Env stm =
print Env ; // comment out in final version
print stm ; //
// return ; // uncomment in final version
```
You can also make this dependent on a command-line flag,
so that the call
```
lab3 -debug file.cc
```
runs your program with tracing on.
If, in addition, the program stops after each statement
and waits for the user to press Enter, you have turned your interpreter
into a debugger. (But this is not required in Lab 3.)
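A minimal sketch of a switchable tracer in Java. The class and its methods are hypothetical; an ``exec`` function would call ``trace(env, stm)`` before each statement. Here traces go to ``System.err`` (a design choice, so that they stay separable from program output) and are also recorded in a list for inspection:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a tracer controlled by a -debug flag (hypothetical class).
final class Tracer {
    private final boolean debug;
    private final List<String> log = new ArrayList<>();

    Tracer(boolean debug) { this.debug = debug; }

    // Print the variable context and the statement, if tracing is on.
    void trace(Object env, Object stm) {
        if (!debug) return;                // tracing off: do nothing
        String lines = env + "\n" + stm;
        log.add(lines);
        System.err.println(lines);
    }

    List<String> log() { return log; }

    // For a call like: lab3 -debug file.cc
    static Tracer fromArgs(String[] args) {
        return new Tracer(args.length > 0 && args[0].equals("-debug"));
    }
}
```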
#NEW
==Example run of the interpreter==
Source code in
[``good09.cc`` ../laborations/lab3/testsuite/good/good09.cc]:
```
int main ()
{
int i = readInt() ; // 5
printInt(i) ; //5
printInt(i++) ; //5
printInt(i) ; //6
printInt(++i) ; //7
printInt(i) ; //7
}
```
Running the interpreter
```
bash$ ./lab3 good09.cc
-- waits for user input, which is 100
100
100
100
101
102
102
```
#NEW
==Example run in debugging mode==
After each statement, the program waits for an Enter.
In addition, it waits for user input with ``read`` functions,
as usual.
```
bash$ ./lab3 -debug good09.cc
[input] 100
i=100
printInt (i);
[output] 100
i=100
printInt (i ++);
[output] 100
i=101
printInt (i);
[output] 101
i=101
printInt (++ i);
[output] 102
i=102
printInt (i);
[output] 102
i=102
```
#NEW
==Interpreter in Haskell==
You can copy the contents of
[``laborations/lab3/haskell/`` ../laborations/lab3/haskell]:
```
CPP.cf -- grammar: from your Lab 2
lab3.hs -- main module
Makefile
TypeChecker.hs -- type checking module: from your Lab 2
Interpreter.hs -- interpreter module
```
You only have to modify ``Interpreter.hs``.
But you can already compile them: just type
```
make
```
and run the interpreter with
```
./lab3
```
The rest is "debugging the empty file"!
You can also start from [``laborations/mini`` ../laborations/mini].
#NEW
===The Main module===
You don't have to write this - but it shows how compiler phases are linked
together.
```
check :: String -> IO ()
check s = case pProgram (myLexer s) of
  Bad err -> do
    putStrLn "SYNTAX ERROR"
    putStrLn err
    exitFailure
  Ok tree -> case typecheck tree of
    Bad err -> do
      putStrLn "TYPE ERROR"
      putStrLn err
      exitFailure
    Ok _ -> interpret tree
Notice that the interpreter runs in the IO monad and not the
error monad.
#NEW
==Central types==
Values
```
data Value = Vint Integer | Vdouble Double | Vbool Bool | Vvoid
printValue :: Value -> String
```
Environment type; use ``Map`` for easier update than lists.
```
type Env = (Sig,[Context])
type Sig = [(Id,[Value] -> IO Value)]
type Context = M.Map Id Value
```
The signature is initialized by the primitive functions
```
(Id "printInt", \ [v] -> putStrLn (printValue v) >> return Vvoid)
(Id "printDouble",\ [v] -> putStrLn (printValue v) >> return Vvoid)
(Id "readInt", \ [] -> getLine >>= return . Vint . read)
(Id "readDouble", \ [] -> getLine >>= return . Vdouble . read)
```
#NEW
===The Interpreter module===
The environment datatypes and operations.
Type signatures of the interpretation methods
```
interpret :: Program -> IO ()
exec :: Env -> Stm -> IO Env
eval :: Env -> Exp -> IO (Env,Value)
```
**Aggressivity**
- "Well-typed programs cannot go wrong!"
- therefore certain things no longer need to be checked at run time.
```
exec env (SWhile exp stm) = do
  (env1, Vbool b) <- eval env exp   -- cannot fail: exp was type-checked as bool
  if b then ...
```
#NEW
==Interpreter in Java==
You can copy the contents of
[``laborations/lab3/java/`` ../laborations/lab3/java1.5]:
```
CPP.cf -- grammar (from lab2)
lab3                 -- script running the interpreter
lab3.java -- main program
Makefile
TypeChecker.java -- type checker class (from lab2)
TypeException.java -- exceptions for type checking
Interpreter.java -- interpreter class
```
You only have to modify ``Interpreter.java``.
You can already compile the files: just type
```
make
```
and run the interpreter with
```
./lab3
```
Before ``make``, you may have to set your class path so that it finds
java_cup and JLex, as well as the current directory.
```
export CLASSPATH=.:::$CLASSPATH
```
#NEW
===The main module===
You don't have to write this - but it shows how compiler phases are linked
together.
```
try {
  l = new Yylex(new FileReader(args[0]));
  parser p = new parser(l);
  CPP.Absyn.Program parse_tree = p.pProgram();
  new TypeChecker().typecheck(parse_tree);
  new Interpreter().interpret(parse_tree);
} catch (TypeException e) {
  System.out.println("TYPE ERROR");
  System.err.println(e.toString());
  System.exit(1);
} catch (RuntimeException e) {
  // System.out.println("RUNTIME ERROR");
  System.err.println(e.toString());
  System.exit(-1);
} catch (IOException e) {
  System.err.println(e.toString());
  System.exit(1);
} catch (Throwable e) {
  System.out.println("SYNTAX ERROR");
  System.out.println("At line " + String.valueOf(l.line_num())
                     + ", near \"" + l.buff() + "\" :");
  System.out.println(" " + e.getMessage());
  e.printStackTrace();
  System.exit(1);
}
```
#NEW
==Complete code==
This code is for a small language called ``Mini``:
```
-- Mini.cf
Prog. Program ::= [Stm] ;
terminator Stm "" ;
SDecl. Stm ::= Type Ident ";" ;
SAss. Stm ::= Ident "=" Exp ";" ;
SBlock. Stm ::= "{" [Stm] "}" ;
SPrint. Stm ::= "print" Exp ";" ;
EVar. Exp1 ::= Ident ;
EInt. Exp1 ::= Integer ;
EDouble. Exp1 ::= Double ;
EAdd. Exp ::= Exp "+" Exp1 ;
coercions Exp 1 ;
TInt. Type ::= "int" ;
TDouble. Type ::= "double" ;
```
The code is found in [``laborations/mini`` ../laborations/mini].
You can take it as a starting point for Java (or for Haskell).
#NEW
===The Interpreter module===
```
import Mini.Absyn.*;
import java.util.HashMap;
import java.util.LinkedList;

public class Interpreter {

  public void interpret(Program p) {
    Prog prog = (Prog)p;
    Env env = new Env();
    for (Stm s : prog.liststm_) {
      execStm(s, env);
    }
  }

  private static abstract class Value {
    public boolean isInt() { return false; }
    public Integer getInt() {
      throw new RuntimeException(this + " is not an integer.");
    }
    public Double getDouble() {
      throw new RuntimeException(this + " is not a double.");
    }

    public static class Undefined extends Value {
      public Undefined() {}
      public String toString() { return "undefined"; }
    }

    public static class IntValue extends Value {
      private Integer i;
      public IntValue(Integer i) { this.i = i; }
      public boolean isInt() { return true; }
      public Integer getInt() { return i; }
      public String toString() { return i.toString(); }
    }

    public static class DoubleValue extends Value {
      private Double d;
      public DoubleValue(Double d) { this.d = d; }
      public Double getDouble() { return d; }
      public String toString() { return d.toString(); }
    }
  }

  private static class Env {
    private LinkedList<HashMap<String,Value>> scopes;

    public Env() {
      scopes = new LinkedList<HashMap<String,Value>>();
      enterScope();
    }

    public Value lookupVar(String x) {
      for (HashMap<String,Value> scope : scopes) {
        Value v = scope.get(x);
        if (v != null)
          return v;
      }
      throw new RuntimeException("Unknown variable " + x + " in " + scopes);
    }

    public void addVar(String x) {
      scopes.getFirst().put(x,new Value.Undefined());
    }

    public void setVar(String x, Value v) {
      for (HashMap<String,Value> scope : scopes) {
        if (scope.containsKey(x)) {
          scope.put(x,v);
          return;
        }
      }
    }

    public void enterScope() {
      scopes.addFirst(new HashMap<String,Value>());
    }

    public void leaveScope() {
      scopes.removeFirst();
    }
  }

  private void execStm(Stm st, Env env) {
    st.accept(new StmExecuter(), env);
  }
private class StmExecuter implements Stm.Visitor