Book: nothing
Interpretation vs. compilation
Operational semantics
From operational semantics to interpreter code
Evaluation and side effects
Tracing and debugging
From type checker code to interpreter code
Implementing interpreters in Haskell
Implementing interpreters in Java
Lab 3 summary
JVM interpretation
Interpretation: run the code statement by statement
Compilation: translate the code into another format
Example:
int i = 1 ; int j = i + 2 ; printInt(i + j) ;
Interpretation goes from state to state:
()
(i = 1)
(i = 1, j = 3)
output 4
Compilation generates another piece of code (e.g. JVM):
bipush 1
dup
istore 0
bipush 2
iadd
istore 1
iload 0
iload 1
iadd
invokestatic runtime/iprint(I)V
Both techniques use syntax-directed translation on a type-checked syntax tree.
Both can be easily adapted from the type checker code.
Both have an environment with a function signature and a block-structured variable context.
But the environments contain different things...
... and the theoretical models are different.
Interpretation: operational semantics (or, equivalently, abstract interpretations)
Compilation: compilation schemes
Inference rules similar to typing rules, with the form of judgement
Env => Exp ⇩ Val
Read: "In the environment Env, expression Exp computes to value Val".
This is called big-step semantics, as there can be many actual computation steps inside Exp ⇩ Val.
The environment has variables and their values, e.g. [x = 0, y = 7].
Examples: addition, variables, integer literals.
Env => exp1 ⇩ val1    Env => exp2 ⇩ val2
-------------------------------------------
Env => exp1 + exp2 ⇩ val1 + val2

-------------- x = v is in Env
Env => x ⇩ v

-------------- i is an integer literal
Env => i ⇩ i
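The three rules above can be read as a recursive evaluator. The following is a minimal Java sketch, assuming a hypothetical mini syntax tree (Lit, Var, Add are illustrative names, not the Lab 3 classes) and an environment that is just a map from variable names to integers.

```java
import java.util.HashMap;
import java.util.Map;

final class BigStepDemo {
    // Hypothetical mini syntax tree; these class names are illustrative only.
    interface Exp {
        int eval(Map<String, Integer> env);
    }

    static final class Lit implements Exp {
        final int value;
        Lit(int value) { this.value = value; }
        // Rule for literals: Env => i evaluates to i.
        public int eval(Map<String, Integer> env) { return value; }
    }

    static final class Var implements Exp {
        final String name;
        Var(String name) { this.name = name; }
        // Rule for variables: Env => x evaluates to v if x = v is in Env.
        public int eval(Map<String, Integer> env) {
            Integer v = env.get(name);
            if (v == null) throw new RuntimeException("Unknown variable " + name);
            return v;
        }
    }

    static final class Add implements Exp {
        final Exp left, right;
        Add(Exp left, Exp right) { this.left = left; this.right = right; }
        // Rule for addition: one recursive call per premise, then add the values.
        public int eval(Map<String, Integer> env) {
            int val1 = left.eval(env);
            int val2 = right.eval(env);
            return val1 + val2;
        }
    }

    // Evaluates x + (y + 3) in the environment [x = 0, y = 7].
    static int run() {
        Map<String, Integer> env = new HashMap<>();
        env.put("x", 0);
        env.put("y", 7);
        Exp exp = new Add(new Var("x"), new Add(new Var("y"), new Lit(3)));
        return exp.eval(env);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 10
    }
}
```

Each eval method corresponds to one rule: the premises become recursive calls and the conclusion becomes the returned value.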
Statements don't compute to values. Instead, they change the environment. The form of judgement is
Env => Stm ⇩ Env'
We write Env[x = val] for updating Env with a new value val for x.
Example: assignments.
Env => exp ⇩ val
-------------------------------
Env => x = exp ⇩ Env[x = val]
For while loops, two rules are needed, depending on whether the condition is true:
Env => exp ⇩ false
------------------------------
Env => while (exp) stm ⇩ Env

Env => exp ⇩ true    Env => stm ⇩ Env'    Env' => while (exp) stm ⇩ Env''
-------------------------------------------------------------------------
Env => while (exp) stm ⇩ Env''
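As a rough illustration of the assignment and while rules, here is a small Java sketch. It assumes a mutable map as the environment (so Env' is the same map after an update) and models expressions and statements as functions; all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class WhileDemo {
    // Expressions and statements are modelled as functions on a mutable environment.
    interface IntExp { int eval(Map<String, Integer> env); }
    interface BoolExp { boolean eval(Map<String, Integer> env); }
    interface Stm { void exec(Map<String, Integer> env); }

    // Assignment rule: evaluate exp to val, then update x to val.
    static Stm assign(String x, IntExp exp) {
        return env -> env.put(x, exp.eval(env));
    }

    static Stm whileLoop(BoolExp cond, Stm body) {
        return env -> execWhile(cond, body, env);
    }

    static void execWhile(BoolExp cond, Stm body, Map<String, Integer> env) {
        // First rule: the condition is false, so the environment is left unchanged.
        if (!cond.eval(env)) return;
        // Second rule: execute the body (giving Env'), then the whole loop again (giving Env'').
        body.exec(env);
        execWhile(cond, body, env);
    }

    // Runs: i = 0 ; while (i < 3) i = i + 1 ;  and returns the final value of i.
    static int run() {
        Map<String, Integer> env = new HashMap<>();
        assign("i", e -> 0).exec(env);
        whileLoop(e -> e.get("i") < 3, assign("i", e -> e.get("i") + 1)).exec(env);
        return env.get("i");
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 3
    }
}
```

The recursion in execWhile mirrors the second rule literally; a real interpreter would probably use a Java loop to avoid deep recursion on long-running programs.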
More complex form of judgements: return a value and a new environment:
Env => Exp ⇩ (Val,Env')
For instance, if assignments are expressions, we have
Env => exp ⇩ (val,Env')
--------------------------------------
Env => x = exp ⇩ (val,Env'[x = val])
Try this with x = (x = x + 1) + 1 in the initial environment [x = 0]!
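One way to check the exercise mechanically is the following Java sketch. It assumes environments are immutable maps that are copied on update, and that every evaluation returns a pair (Val, Env'); the helper names lit, var, add and assign are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

final class SideEffectDemo {
    // The pair (Val, Env') returned by every evaluation.
    static final class Result {
        final int val;
        final Map<String, Integer> env;
        Result(int val, Map<String, Integer> env) { this.val = val; this.env = env; }
    }

    interface Exp { Result eval(Map<String, Integer> env); }

    // Literals and variables leave the environment unchanged.
    static Exp lit(int i) { return env -> new Result(i, env); }

    static Exp var(String x) {
        return env -> {
            Integer v = env.get(x);
            if (v == null) throw new RuntimeException("Unknown variable " + x);
            return new Result(v, env);
        };
    }

    // Addition: exp2 is evaluated in the environment produced by exp1.
    static Exp add(Exp e1, Exp e2) {
        return env -> {
            Result r1 = e1.eval(env);
            Result r2 = e2.eval(r1.env);
            return new Result(r1.val + r2.val, r2.env);
        };
    }

    // Assignment: returns val and the environment Env'[x = val].
    static Exp assign(String x, Exp e) {
        return env -> {
            Result r = e.eval(env);
            Map<String, Integer> updated = new HashMap<>(r.env);
            updated.put(x, r.val);
            return new Result(r.val, updated);
        };
    }

    // Evaluates x = (x = x + 1) + 1 in the initial environment [x = 0].
    static Result run() {
        Map<String, Integer> env = new HashMap<>();
        env.put("x", 0);
        Exp e = assign("x", add(assign("x", add(var("x"), lit(1))), lit(1)));
        return e.eval(env);
    }

    public static void main(String[] args) {
        Result r = run();
        System.out.println(r.val + " " + r.env); // prints 2 {x=2}
    }
}
```

The inner assignment yields (1, [x = 1]), the addition yields (2, [x = 1]), and the outer assignment yields (2, [x = 2]).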
Principle: each recursive call of the interpreter may change Env.
Env => exp1 ⇩ (val1,Env1)    Env1 => exp2 ⇩ (val2,Env2)
----------------------------------------------------------
Env => exp1 + exp2 ⇩ (val1 + val2, Env2)
But if there are no recursive calls, Env usually remains constant.
-------------- x = v is in Env
Env => x ⇩ v

-------------- i is an integer literal
Env => i ⇩ i
Exception: increments. Notice the difference between pre and post increments!
---------------------------------- x = v is in Env
Env => x++ ⇩ (v, Env[x = v+1])

---------------------------------- x = v is in Env
Env => ++x ⇩ (v+1, Env[x = v+1])
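The difference between the two increment rules can be sketched in Java as follows, assuming a mutable map as the environment. The method names are illustrative and not taken from the lab code.

```java
import java.util.HashMap;
import java.util.Map;

final class IncrementDemo {
    static int lookupVar(Map<String, Integer> env, String x) {
        Integer v = env.get(x);
        if (v == null) throw new RuntimeException("Unknown variable " + x);
        return v;
    }

    // x++ : the result is the old value v; the environment becomes Env[x = v+1].
    static int postIncrement(Map<String, Integer> env, String x) {
        int v = lookupVar(env, x);
        env.put(x, v + 1);
        return v;
    }

    // ++x : the result is the new value v+1; the environment becomes Env[x = v+1].
    static int preIncrement(Map<String, Integer> env, String x) {
        int v = lookupVar(env, x);
        env.put(x, v + 1);
        return v + 1;
    }

    // Starting from [x = 5], returns { value of x++, x afterwards, value of ++x, x afterwards }.
    static int[] run() {
        Map<String, Integer> env = new HashMap<>();
        env.put("x", 5);
        int post = postIncrement(env, "x");
        int afterPost = lookupVar(env, "x");
        int pre = preIncrement(env, "x");
        int afterPre = lookupVar(env, "x");
        return new int[] { post, afterPost, pre, afterPre };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run())); // prints [5, 6, 7, 7]
    }
}
```

Both operations change the environment in the same way; only the returned value differs.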
The basic idea is the same as in type checking: from a rule
J1 ... Jn
---------------- C
Env => exp ⇩ val
generate the code "upside down" - with some modifications,
interpret (Env => exp) =
  val1 := interpret J1
  ...
  valn := interpret Jn
  return f(val1,...,valn,C)
The rules that are the starting point are now the abstract interpretation rules (equivalent to big-step semantic rules).
First look at an example without side effects:
Env => exp1 ⇩ val1    Env => exp2 ⇩ val2
-------------------------------------------
Env => exp1 + exp2 ⇩ val1 + val2

interpret Env (exp1 + exp2) =
  val1 := interpret Env exp1
  val2 := interpret Env exp2
  return val1 + val2
Interpreting an expression returns a value.
Therefore expression interpretation is also called evaluation.
Values are also called canonical expressions, or expressions in normal form.
Normal form: expression that cannot be evaluated further.
In lab 3, we will need four types of values: integers, doubles, booleans (true and false), and void.
Thus variables are not values, and complex expressions like 3 + 2 are not values.
Good approximation (works when values are finite, atomic objects): values are literals.
In the interpreter implementation, it is good to have a separate type Value with a constructor for each type of value.
Call by value evaluation strategy: when executing a function call, first evaluate the arguments and then call the function with the obtained values as parameters.
The call by value strategy is followed by many languages: C, C++, Java, ML - but not Haskell (which has "call by need" - lecture 11).
The simplest examples are arithmetic operations: the operands are evaluated first, e.g. in x + y, x and y are evaluated first. The values are then added by a plus operation defined for values.
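Java itself follows call by value, so the strategy can be observed directly. The sketch below uses hypothetical helper methods that log when an argument is evaluated and when the function body starts.

```java
import java.util.ArrayList;
import java.util.List;

final class CallByValueDemo {
    // Stands for an argument expression; it records that it has been evaluated.
    static int argument(List<String> log, String name, int value) {
        log.add("eval " + name);
        return value;
    }

    // The function only uses a, yet b has already been evaluated by the time it runs.
    static int first(List<String> log, int a, int b) {
        log.add("call first");
        return a;
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        first(log, argument(log, "a", 1), argument(log, "b", 2));
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [eval a, eval b, call first]
    }
}
```

Under call by need, the argument b would not be evaluated at all here, since the function never uses it.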
The signature holds the interpretations of functions.
The context holds the values of the variables.
Value lookupVar (Env, Id)
Env   updateVar (Env, Id, Value)
Env   newVar    (Env, Id)
Env   newBlock  (Env)
Interpreting statements is also called execution.
The interpreter defines an exec function that tells how the environment is changed by each statement.
Env exec (Env, Stm)

// declarations
exec Env (typ var ;) =
  newVar Env var

// expression statements
exec Env (exp ;) =
  (Env2, val) := eval Env exp
  return Env2

// while loops
exec Env (while (exp) stm) =
  (Env2, val) := eval Env exp
  if val == true
    Env3 := exec Env2 stm
    exec Env3 (while (exp) stm)
  else
    return Env2
Reminder: interpreting expressions is also called evaluation.
In Lab 3 (as in C++, C, and Java), expressions may have side effects, i.e. change the environment.
The interpreter defines an eval function that tells what value is returned and how the environment is changed by each expression.
(Env, Val) eval (Env, Exp)

// literals
eval Env lit =
  return (Env, lit)

// variables
eval Env var =
  val := lookupVar Env var
  return (Env,val)

// assignments
eval Env (var = exp) =
  (Env2,val) := eval Env exp
  Env3 := updateVar Env2 var val
  return (Env3,val)
Following the C/C++ standard, we distinguish between preincrements ++i and postincrements i++.
// preincrements
eval Env (++ var) =
  val := lookupVar Env var
  val2 := plusVal(val,1)
  Env2 := updateVar Env var val2
  return (Env2,val2)

// postincrements
eval Env (var ++) =
  val := lookupVar Env var
  val2 := plusVal(val,1)
  Env2 := updateVar Env var val2
  return (Env2,val)
Conjunction, a && b, is evaluated lazily: first a is evaluated. If the result is true, also b is evaluated, and the value of b is returned. However, if a evaluates to false, then false is returned without evaluating b.
eval Env (a && b) =
  (Env2,val) := eval Env a
  if val == false
    return (Env2,val)
  else
    eval Env2 b
Disjunction, a || b, is also evaluated lazily: first a is evaluated. If the result is false, also b is evaluated, and the value of b is returned. However, if a evaluates to true, then true is returned without evaluating b.
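Lazy evaluation of both operators can be sketched in Java as follows. The example assumes a hypothetical leaf expression that records its own evaluation in a log, with the log standing in for side effects on the environment.

```java
import java.util.ArrayList;
import java.util.List;

final class LazyBoolDemo {
    interface BoolExp { boolean eval(List<String> log); }

    // A hypothetical leaf expression that records when it is evaluated.
    static BoolExp leaf(String name, boolean value) {
        return log -> { log.add(name); return value; };
    }

    // a && b : b is evaluated only if a is true.
    static BoolExp and(BoolExp a, BoolExp b) {
        return log -> {
            if (!a.eval(log)) return false;
            return b.eval(log);
        };
    }

    // a || b : b is evaluated only if a is false.
    static BoolExp or(BoolExp a, BoolExp b) {
        return log -> {
            if (a.eval(log)) return true;
            return b.eval(log);
        };
    }

    // Returns the names of the leaves that were actually evaluated.
    static List<String> evaluated(BoolExp e) {
        List<String> log = new ArrayList<>();
        e.eval(log);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(evaluated(or(leaf("a", true), leaf("b", false))));  // prints [a]
        System.out.println(evaluated(and(leaf("a", true), leaf("b", false)))); // prints [a, b]
    }
}
```

If b contains a side effect such as i++, skipping it changes the final environment, which is why the interpreter has to implement this behaviour exactly.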
To interpret the whole program, execute the statements of main in order, starting from the empty environment:
exec (int main () { Stm1 ... Stmn }) =
  Env1 := exec () Stm1
  ...
  exec Env(n-1) Stmn
Nice thing about interpretation: the execution can be precisely traced statement by statement.
For instance: append to each execution function call a line showing the statement and a line showing the variable context.
This is useful for debugging programs.
But it is also useful for debugging your interpreter! Use the old trick of writing a generic tracing function
trace Env stm =
  print Env ;   // comment out in final version
  print stm ;   //
  // return ;   // uncomment in final version
You can also make this dependent on a command-line flag, so that the call
lab3 -debug file.cc
runs your program with tracing on.
Add to this that the program stops after each statement to wait for an 'enter', and you have turned your interpreter into a debugger. (But this is not required in Lab 3.)
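A flag-dependent tracing function could look roughly like this in Java. The class TraceDemo and its methods are hypothetical; only the -debug flag name comes from the call shown above, and the "wait for Enter" step is left out.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

final class TraceDemo {
    private final boolean debug;
    private final PrintStream out;

    TraceDemo(boolean debug, PrintStream out) {
        this.debug = debug;
        this.out = out;
    }

    // True if the -debug flag is among the command-line arguments.
    static boolean debugRequested(String[] args) {
        return Arrays.asList(args).contains("-debug");
    }

    // Call this once for every executed statement; it prints nothing unless tracing is on.
    void trace(Map<String, Integer> env, String stm) {
        if (!debug) return;
        out.println(env);
        out.println(stm);
    }

    public static void main(String[] args) {
        TraceDemo tracer = new TraceDemo(debugRequested(args), System.out);
        Map<String, Integer> env = new TreeMap<>();
        env.put("i", 100);
        tracer.trace(env, "printInt (i);");
    }
}
```

Keeping the check inside trace means the calls can stay in the interpreter permanently, instead of being commented out for the final version.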
Source code in good09.cc:
int main ()
{
  int i = readInt() ; // 5
  printInt(i) ;       // 5
  printInt(i++) ;     // 5
  printInt(i) ;       // 6
  printInt(++i) ;     // 7
  printInt(i) ;       // 7
}
Running the interpreter
bash$ ./lab3 good09.cc
-- waits for user input, which is 100
100
100
100
101
102
102
After each statement, the program waits for an Enter.
In addition, it waits for user input with read functions, as usual.
bash$ ./lab3 -debug good09.cc
[input] 100
i=100
printInt (i);
[output] 100
i=100
printInt (i ++);
[output] 100
i=101
printInt (i);
[output] 101
i=101
printInt (++ i);
[output] 102
i=102
printInt (i);
[output] 102
i=102
You can copy the contents of laborations/lab3/haskell/:
CPP.cf          -- grammar: from your Lab 2
lab3.hs         -- main module
Makefile
TypeChecker.hs  -- type checking module: from your Lab 2
Interpreter.hs  -- interpreter module
You only have to modify Interpreter.hs.
But you can already compile them: just type
make
and run the interpreter with
./lab3 <File>
The rest is "debugging the empty file"!
You can also start from laborations/mini.
You don't have to write this - but it shows how compiler phases are linked together.
check :: String -> IO ()
check s = case pProgram (myLexer s) of
  Bad err -> do
    putStrLn "SYNTAX ERROR"
    putStrLn err
    exitFailure
  Ok tree -> case typecheck tree of
    Bad err -> do
      putStrLn "TYPE ERROR"
      putStrLn err
      exitFailure
    Ok _ -> interpret tree
Notice that the interpreter runs in the IO monad and not the error monad.
Values
data Value = Vint Integer | Vdouble Double | Vbool Bool | Vvoid

printValue :: Value -> String
Environment type; use Map for easier update than lists.
type Env     = (Sig,[Context])
type Sig     = [(Id,[Value] -> IO Value)]
type Context = M.Map Id Value
The signature is initialized by the primitive functions
(Id "printInt",    \ [v] -> putStrLn (printValue v) >> return Vvoid)
(Id "printDouble", \ [v] -> putStrLn (printValue v) >> return Vvoid)
(Id "readInt",     \ []  -> getLine >>= return . Vint . read)
(Id "readDouble",  \ []  -> getLine >>= return . Vdouble . read)
The environment datatypes and operations.
Type signatures of the interpretation methods
interpret :: Program -> IO ()
exec      :: Env -> Stm -> IO Env
eval      :: Env -> Exp -> IO (Env,Value)
Aggressivity
exec env (SWhile exp stm) = do
  (env1,Vbool b) <- eval env exp
  if b then ...
You can copy the contents of laborations/lab3/java/:
CPP.cf              -- grammar (from lab2)
lab3                -- script running the type checker
lab3.java           -- main program
Makefile
TypeChecker.java    -- type checker class (from lab2)
TypeException.java  -- exceptions for type checking
Interpreter.java    -- interpreter class
You only have to modify Interpreter.java.
You can already compile the files: just type
make
and run the type checker with
./lab3 <File>
Before make, you may have to set your class path so that it finds java_cup and JLex, as well as the current directory.
export CLASSPATH=.:<path-to-JLex>:<path-to-CUP>:$CLASSPATH
You don't have to write this - but it shows how compiler phases are linked together.
try {
  l = new Yylex(new FileReader(args[0]));
  parser p = new parser(l);
  CPP.Absyn.Program parse_tree = p.pProgram();
  new TypeChecker().typecheck(parse_tree);
  new Interpreter().interpret(parse_tree);
} catch (TypeException e) {
  System.out.println("TYPE ERROR");
  System.err.println(e.toString());
  System.exit(1);
} catch (RuntimeException e) {
  // System.out.println("RUNTIME ERROR");
  System.err.println(e.toString());
  System.exit(-1);
} catch (IOException e) {
  System.err.println(e.toString());
  System.exit(1);
} catch (Throwable e) {
  System.out.println("SYNTAX ERROR");
  System.out.println("At line " + String.valueOf(l.line_num())
                     + ", near \"" + l.buff() + "\" :");
  System.out.println(" " + e.getMessage());
  e.printStackTrace();
  System.exit(1);
}
This code is for a small language called Mini:
-- Mini.cf
Prog.    Program ::= [Stm] ;
terminator Stm "" ;
SDecl.   Stm ::= Type Ident ";" ;
SAss.    Stm ::= Ident "=" Exp ";" ;
SBlock.  Stm ::= "{" [Stm] "}" ;
SPrint.  Stm ::= "print" Exp ";" ;
EVar.    Exp1 ::= Ident ;
EInt.    Exp1 ::= Integer ;
EDouble. Exp1 ::= Double ;
EAdd.    Exp ::= Exp "+" Exp1 ;
coercions Exp 1 ;
TInt.    Type ::= "int" ;
TDouble. Type ::= "double" ;
The code is found in laborations/mini.
You can take it as a starting point for Java (or Haskell as well).
import Mini.Absyn.*;
import java.util.HashMap;
import java.util.LinkedList;

public class Interpreter {

  public void interpret(Program p) {
    Prog prog = (Prog)p;
    Env env = new Env();
    for (Stm s : prog.liststm_) {
      execStm(s, env);
    }
  }

  private static abstract class Value {
    public boolean isInt() { return false; }
    public Integer getInt() {
      throw new RuntimeException(this + " is not an integer.");
    }
    public Double getDouble() {
      throw new RuntimeException(this + " is not a double.");
    }

    public static class Undefined extends Value {
      public Undefined() {}
      public String toString() { return "undefined"; }
    }

    public static class IntValue extends Value {
      private Integer i;
      public IntValue(Integer i) { this.i = i; }
      public boolean isInt() { return true; }
      public Integer getInt() { return i; }
      public String toString() { return i.toString(); }
    }

    public static class DoubleValue extends Value {
      private Double d;
      public DoubleValue(Double d) { this.d = d; }
      public Double getDouble() { return d; }
      public String toString() { return d.toString(); }
    }
  }

  private static class Env {
    private LinkedList<HashMap<String,Value>> scopes;

    public Env() {
      scopes = new LinkedList<HashMap<String,Value>>();
      enterScope();
    }

    public Value lookupVar(String x) {
      for (HashMap<String,Value> scope : scopes) {
        Value v = scope.get(x);
        if (v != null) return v;
      }
      throw new RuntimeException("Unknown variable " + x + " in " + scopes);
    }

    public void addVar(String x) {
      scopes.getFirst().put(x,new Value.Undefined());
    }

    public void setVar(String x, Value v) {
      for (HashMap<String,Value> scope : scopes) {
        if (scope.containsKey(x)) {
          scope.put(x,v);
          return;
        }
      }
    }

    public void enterScope() {
      scopes.addFirst(new HashMap<String,Value>());
    }

    public void leaveScope() {
      scopes.removeFirst();
    }
  }

  private void execStm(Stm st, Env env) {
    st.accept(new StmExecuter(), env);
  }

  private class StmExecuter implements Stm.Visitor<Object,Env> {
    public Object visit(Mini.Absyn.SDecl p, Env env) {
      env.addVar(p.ident_);
      return null;
    }

    public Object visit(Mini.Absyn.SAss p, Env env) {
      env.setVar(p.ident_, evalExp(p.exp_, env));
      return null;
    }

    public Object visit(Mini.Absyn.SBlock p, Env env) {
      env.enterScope();
      for (Stm st : p.liststm_) {
        execStm(st, env);
      }
      env.leaveScope();
      return null;
    }

    public Object visit(Mini.Absyn.SPrint p, Env env) {
      Value v = evalExp(p.exp_, env);
      System.err.println(v.toString());
      return null;
    }
  }

  private Value evalExp(Exp e, Env env) {
    return e.accept(new ExpEvaluator(), env);
  }

  private class ExpEvaluator implements Exp.Visitor<Value,Env> {
    public Value visit(Mini.Absyn.EVar p, Env env) {
      return env.lookupVar(p.ident_);
    }

    public Value visit(Mini.Absyn.EInt p, Env env) {
      return new Value.IntValue(p.integer_);
    }

    public Value visit(Mini.Absyn.EDouble p, Env env) {
      return new Value.DoubleValue(p.double_);
    }

    public Value visit(Mini.Absyn.EAdd p, Env env) {
      Value v1 = p.exp_1.accept(this, env);
      Value v2 = p.exp_2.accept(this, env);
      if (v1.isInt()) {
        return new Value.IntValue(v1.getInt() + v2.getInt());
      } else {
        return new Value.DoubleValue(v1.getDouble() + v2.getDouble());
      }
    }
  }
}
The primitive functions (printInt etc.) are inlined directly in the Eval class, in the visitor case for the expressions that call them.
public static class Eval implements Exp.Visitor<EnvVal,Env> {
  public EnvVal visit(ECall p, Env env) {
    // compare strings with equals: == would compare object references
    if (p.id_.equals("printInt")) {
      EnvVal envval = p.listexp_.element().accept(this, env);
      System.err.println(envval.val.toString());
    }
  }
  // ...
}
We take a look at the lab 3 PM.
Byte code, virtual machine code - simpler than high-level source code.
Example: JVM (Java Virtual Machine)
bipush n      -- push byte constant n
iadd          -- add two integers; pop the operands and push the result
imul          -- multiply two integers; pop the operands and push the result
istore x      -- pop the top of the stack and store it in storage address x
iload x       -- push the value at storage address x onto the stack
dup           -- duplicate the top of the stack
invokestatic  -- call a function with parameters from the top of the stack, pop the parameters and push the value
Java is compiled to JVM (next lecture).
JVM is interpreted, or compiled to native machine code by JIT (Just In Time compilation).
Most "interpreted languages" are actually compiled to byte code. Exception: Ruby before version 1.9.
Example use of invokestatic, generated from printInt(5):
bipush 5
invokestatic runtime/iprint(I)V
The type (I)V tells e.g. how many values to pop.
Environment: local variable storage and a stack holding values
Primitive actions:
Example execution: 5 * (6 + 7)
bipush 5 ; bipush 6 ; bipush 7 ; iadd ; imul
--         --         --         --     --
5          5          5          5      65
           6          6          13
                      7
The compiler assigns local storage addresses to variables (lecture 11).
int i ;            ; reserve address 0 for i
i = 9 ;            bipush 9
                   istore 0
int j = i + 3 ;    ; reserve address 1 for j
                   iload 0
                   bipush 3
                   iadd
                   istore 1
Naturally expressed using small-step semantics, i.e. each rule specifies one step of computation. The format of small-step rules is
< Instruction , Env > ⇩ < Env' >
The environment has a storage V and a stack S. The rules work on instructions, executed one at a time.
<bipush v, V-S>    ⇩ <V-S.v>
<iadd, V-S.v.w>    ⇩ <V-S.v+w>
<imul, V-S.v.w>    ⇩ <V-S.v*w>
<iload i, V-S>     ⇩ <V-S.V(i)>
<istore i, V-S.v>  ⇩ <V(i:=v)-S>
<pop, V-S.v>       ⇩ <V-S>
<dup, V-S.v>       ⇩ <V-S.v.v>
Notation used:
Notation | Explanation
---|---
<c , V-S> | instruction c, with storage V and stack S
S.v | stack with all values in S plus v on the top
V(i) | the value at position i in storage V
V(i:=v) | storage V with value v put into position i
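The small-step rules translate almost line by line into a step function. The Java sketch below is an illustration only: it assumes instructions are given as strings, a fixed-size int array as the storage V, and a Deque as the stack S; it is not how a real JVM represents code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class StackMachineDemo {
    final int[] storage = new int[16];               // V (the size 16 is an arbitrary assumption)
    final Deque<Integer> stack = new ArrayDeque<>(); // S, with the top of the stack at the head

    // Executes one instruction, written as "opcode" or "opcode argument".
    void step(String instruction) {
        String[] parts = instruction.trim().split("\\s+");
        switch (parts[0]) {
            case "bipush": stack.push(Integer.parseInt(parts[1])); break;
            case "iadd": { int w = stack.pop(); int v = stack.pop(); stack.push(v + w); break; }
            case "imul": { int w = stack.pop(); int v = stack.pop(); stack.push(v * w); break; }
            case "iload": stack.push(storage[Integer.parseInt(parts[1])]); break;
            case "istore": storage[Integer.parseInt(parts[1])] = stack.pop(); break;
            case "pop": stack.pop(); break;
            case "dup": stack.push(stack.peek()); break;
            default: throw new IllegalArgumentException("Unknown instruction " + instruction);
        }
    }

    // Runs the instructions in order and returns the value left on top of the stack.
    static int run(String... code) {
        StackMachineDemo machine = new StackMachineDemo();
        for (String instruction : code) machine.step(instruction);
        return machine.stack.peek();
    }

    public static void main(String[] args) {
        // 5 * (6 + 7)
        System.out.println(run("bipush 5", "bipush 6", "bipush 7", "iadd", "imul")); // prints 65
    }
}
```

Each case of the switch corresponds to one rule: the pops match the values listed after S on the left-hand side, and the pushes match the right-hand side.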
JVM does not always continue with the next instruction, but there can be jumps.
Example (my first BASIC program):
BEGIN:
  bipush 66
  invokestatic runtime/iprint(I)V
  goto BEGIN
To give semantics to the goto instruction, we have to add the code C to the environment.
We denote by C(p) the instruction at position p in C.
<goto LABEL, C-V-S> ⇩ <C(LABEL), C-V-S>
Question: How do we now express rules for the "ordinary" instructions?
<bipush v, C-V-S> ⇩ <C(?), C-V-S.v>
Answer: we add a code pointer P to the environment:
<goto LABEL, P-C-V-S> ⇩ <C(LABEL), LABEL-C-V-S>
<bipush v, P-C-V-S>   ⇩ <C(P+1), (P+1)-C-V-S.v>
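A machine with a code pointer can be sketched in Java as follows. The sketch makes several simplifying assumptions: labels are plain instruction positions, the storage V is omitted because the example does not use it, and pop stands in for the printing call of the BASIC-style example. Since that program never terminates, the sketch runs only a fixed number of steps.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class GotoMachineDemo {
    final String[] code;                             // C
    int pointer = 0;                                 // P
    final Deque<Integer> stack = new ArrayDeque<>(); // S

    GotoMachineDemo(String... code) { this.code = code; }

    // Executes the instruction C(P) and sets P to the position of the next instruction.
    void step() {
        String[] parts = code[pointer].trim().split("\\s+");
        switch (parts[0]) {
            case "goto":
                pointer = Integer.parseInt(parts[1]); // jump: P becomes the label
                break;
            case "bipush":
                stack.push(Integer.parseInt(parts[1]));
                pointer = pointer + 1;                // ordinary instruction: P becomes P+1
                break;
            case "pop":
                stack.pop();
                pointer = pointer + 1;
                break;
            default:
                throw new IllegalArgumentException("Unknown instruction " + code[pointer]);
        }
    }

    // Runs the given number of steps and returns the machine for inspection.
    static GotoMachineDemo runSteps(int steps, String... code) {
        GotoMachineDemo machine = new GotoMachineDemo(code);
        for (int i = 0; i < steps; i++) machine.step();
        return machine;
    }

    public static void main(String[] args) {
        GotoMachineDemo m = runSteps(7, "bipush 66", "pop", "goto 0");
        System.out.println(m.pointer + " " + m.stack); // prints 1 [66]
    }
}
```

After seven steps the loop has been completed twice and the third bipush has just been executed, so the code pointer is 1 and the stack holds a single 66.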