Lecture 8: Interpreters
Programming Languages Course
Aarne Ranta (aarne@chalmers.se)

%!target:html
%!Encoding:utf-8
%!postproc(html): #NEW
%!postproc(html): #HR
%!postproc(html): #sub1 1
%!postproc(html): #subn n
%!postproc(html): #subn1 n-1

Book: nothing

#NEW

==Plan==

Interpretation vs. compilation

Operational semantics

From operational semantics to interpreter code

Evaluation and side effects

Tracing and debugging

From type checker code to interpreter code

Implementing interpreters in Haskell

Implementing interpreters in Java

Lab 3 summary

JVM interpretation

#NEW

==Interpretation vs. compilation==

Interpretation: **run** the code statement by statement

Compilation: **translate** the code into another format

Example:
```
int i = 1 ;
int j = i + 2 ;
printInt(i + j) ;
```
Interpretation goes from state to state:
```
()
(i = 1)
(i = 1, j = 3)
output 4
```
Compilation generates another piece of code (e.g. JVM):
```
bipush 1
dup
istore 0
bipush 2
iadd
istore 1
iload 0
iload 1
iadd
invokestatic runtime/iprint(I)V
```

#NEW

==Implementing interpretation and compilation==

Both techniques use //syntax-directed translation// on a //type-checked syntax tree//.

Both can be easily adapted from the type checker code.

Both have an environment with a function signature and a block-structured variable context.

But the environments contain different things...

... and the theoretical models are different.

Interpretation: **operational semantics** (or, equivalently, **abstract interpretations**)

Compilation: compilation schemes

#NEW

==Operational semantics==

Inference rules similar to typing rules, with the form of judgement
```
Env => Exp ⇩ Val
```
Read: "In the environment Env, expression Exp computes to value Val".

This is called **big-step semantics**, as there can be many actual computation steps inside ``Exp ⇩ Val``.

The environment has variables and their values, e.g. ``[x = 0, y = 7]``.

Examples: addition, variables, integer literals.
```
Env => exp1 ⇩ val1    Env => exp2 ⇩ val2
-------------------------------------------
Env => exp1 + exp2 ⇩ val1 + val2


-------------- x = v is in Env      -------------- i is an integer literal
Env => x ⇩ v                        Env => i ⇩ i
```

#NEW

==Operational semantics for statements==

Statements don't compute to values. Instead, they change the environment.

The form of judgement is
```
Env => Stm ⇩ Env'
```
We write ``Env[x = val]`` for updating ``Env`` with a new value ``val`` for ``x``.

Example: assignments.
```
Env => exp ⇩ val
-------------------------------
Env => x = exp ⇩ Env[x = val]
```

#NEW

==Operational semantics: while loops==

Two rules are needed, depending on whether the condition is true:
```
Env => exp ⇩ false
------------------------------
Env => while (exp) stm ⇩ Env


Env => exp ⇩ true
Env => stm ⇩ Env'
Env' => while (exp) stm ⇩ Env''
---------------------------------
Env => while (exp) stm ⇩ Env''
```

#NEW

==Expressions with side effects==

More complex form of judgements: return a value //and// a new environment:
```
Env => Exp ⇩ (Val,Env')
```
For instance, if assignments are expressions, we have
```
Env => exp ⇩ (val,Env')
--------------------------------------
Env => x = exp ⇩ (val,Env'[x = val])
```
Try this with ``x = (x = x + 1) + 1`` in the initial environment ``[x = 0]`` !

#NEW

==Semantic rules revisited==

Principle: each recursive call of the interpreter may change Env.
```
Env => exp1 ⇩ (val1,Env1)    Env1 => exp2 ⇩ (val2,Env2)
----------------------------------------------------------
Env => exp1 + exp2 ⇩ (val1 + val2, Env2)
```
But if there are no recursive calls, Env usually remains constant: the rule returns the same Env it started from.
```
------------------- x = v is in Env      ------------------- i is an integer literal
Env => x ⇩ (v,Env)                       Env => i ⇩ (i,Env)
```
Exception: increments. Notice the difference between pre- and post-increments!
```
---------------------------------- x = v is in Env
Env => x++ ⇩ (v, Env[x = v+1])

---------------------------------- x = v is in Env
Env => ++x ⇩ (v+1, Env[x = v+1])
```

#NEW

==From semantic rules to interpreter code==

Basic idea is the same as in type checking: from rule
```
J#sub1 ... J#subn
---------------- C
Env => exp ⇩ val
```
generate the code "upside down" - with some modifications,
```
interpret (Env => exp) =
  val#sub1 := interpret J#sub1
  ...
  val#subn := interpret J#subn
  return f(val#sub1,...,val#subn,C)
```
The rules that are the starting point are now the abstract interpretation rules (equivalent to big-step semantic rules).

First look at an example without side effects:
```
Env => exp1 ⇩ val1    Env => exp2 ⇩ val2       interpret Env (exp1 + exp2) =
-------------------------------------------      val1 := interpret Env exp1
Env => exp1 + exp2 ⇩ val1 + val2                 val2 := interpret Env exp2
                                                 return val1 + val2
```

#NEW

==Values==

Interpreting an expression returns a **value**.

Therefore expression interpretation is also called **evaluation**.

Values are also called **canonical expressions**, or expressions in **normal form**.

Normal form: expression that cannot be evaluated further.

In lab 3, we will need four types of values:
- integer values, e.g. -47
- double values, e.g. 3.14159
- boolean values, ``true`` and ``false``
- a void value, which need never be shown


Thus variables are not values, and complex expressions like ``3 + 2`` are not values.

Good approximation (works when values are finite, atomic objects): values are literals.

In the interpreter implementation, it is good to have a separate type ``Value`` with a constructor for each type of value.

#NEW

==Operations on values==

**Call by value evaluation strategy**: when executing a function call, first evaluate the arguments and then call the function with the obtained values as parameters.
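The strategy just described can be illustrated with a minimal Java sketch. All names here (``CallByValueSketch``, ``Expr``, ``Lit``, ``Call``) are hypothetical and not part of the lab code, and values are simplified to plain integers. The point is only the order of work in ``eval``: every argument expression is reduced to a value before the function itself is applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of call by value; not the Lab 3 interpreter.
class CallByValueSketch {

    // An expression is either an integer literal or a function call.
    interface Expr { }

    static final class Lit implements Expr {
        final int value;
        Lit(int value) { this.value = value; }
    }

    static final class Call implements Expr {
        final Function<List<Integer>, Integer> function; // works on values only
        final List<Expr> arguments;                      // unevaluated arguments
        Call(Function<List<Integer>, Integer> function, List<Expr> arguments) {
            this.function = function;
            this.arguments = arguments;
        }
    }

    static int eval(Expr e) {
        if (e instanceof Lit) {
            return ((Lit) e).value;
        }
        Call call = (Call) e;
        // Step 1: evaluate all arguments, left to right, to values.
        List<Integer> values = new ArrayList<>();
        for (Expr arg : call.arguments) {
            values.add(eval(arg));
        }
        // Step 2: only now apply the function to the obtained values.
        return call.function.apply(values);
    }

    // Builds and evaluates times(plus(1, 2), 4).
    static int demo() {
        Function<List<Integer>, Integer> plus = vs -> vs.get(0) + vs.get(1);
        Function<List<Integer>, Integer> times = vs -> vs.get(0) * vs.get(1);
        Expr inner = new Call(plus, Arrays.<Expr>asList(new Lit(1), new Lit(2)));
        Expr outer = new Call(times, Arrays.<Expr>asList(inner, new Lit(4)));
        return eval(outer);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the functions in this sketch receive a list of values rather than expressions, they have no way of skipping or re-evaluating an argument; that restriction is what distinguishes call by value from lazier strategies.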
The call by value strategy is followed by many languages: C, C++, Java, ML - but not Haskell (which has "call by need" - lecture 11).

The simplest examples are arithmetic operations: the operands are evaluated first, e.g. in ``x + y``, ``x`` and ``y`` are evaluated first. The values are then added by a plus operation defined for values.

#NEW

==Environments for interpreters==

The **signature** holds the interpretations of functions.
- It maps each function name to a function (written in the implementation language!) which sends values of arguments to a return value.
- In Lab 3, we need not necessarily have a signature - we just need interpretation rules for each of the four built-in functions.


The **context** holds the values of the variables.
- It has a block structure, just like the context in the type checker.
- The values can be uninitialized, if a variable is only declared. Referring to such a variable gives a run-time error.
- Thus the following methods are needed on the context
```
Value lookupVar (Env, Id)
Env   updateVar (Env, Id, Value)
Env   newVar    (Env, Id)
Env   newBlock  (Env)
```

#NEW

==Interpreting statements: execution==

Interpreting statements is also called **execution**.

The interpreter defines an ``exec`` function that tells how the environment is changed by each statement.
```
Env exec (Env, Stm)

// declarations
exec Env (typ var ;) = newVar Env var

// expression statements
exec Env (exp ;) =
  (Env2, val) := eval Env exp
  return Env2

// while loops
exec Env (while (exp) stm) =
  (Env2, val) := eval Env exp
  if val == true
    Env3 := exec Env2 stm
    exec Env3 (while (exp) stm)
  else
    return Env2
```

#NEW

==Interpreting expressions: evaluation==

Reminder: interpreting expressions is also called evaluation.

In Lab 3 (as in C++, C, and Java), expressions may have **side effects**, i.e. change the environment.

The interpreter defines an ``eval`` function that tells what value is returned and how the environment is changed by each expression.
```
(Env, Val) eval (Env, Exp)

// literals
eval Env lit = return (Env, lit)

// variables
eval Env var =
  val := lookupVar Env var
  return (Env,val)

// assignments
eval Env (var = exp) =
  (Env2,val) := eval Env exp
  Env3 := updateVar Env2 var val
  return (Env3,val)
```

#NEW

==Evaluation of increments==

Following the C/C++ standard, we distinguish between **preincrements** ``++i`` and **postincrements** ``i++``.
```
// preincrements
eval Env (++ var) =
  val := lookupVar Env var
  val2 := plusVal(val,1)
  Env2 := updateVar Env var val2
  return (Env2,val2)

// postincrements
eval Env (var ++) =
  val := lookupVar Env var
  val2 := plusVal(val,1)
  Env2 := updateVar Env var val2
  return (Env2,val)
```

#NEW

==Lazy logical operations==

Conjunction,
```
a && b
```
is evaluated //lazily//: first ``a`` is evaluated. If the result is ``true``, ``b`` is also evaluated, and the value of ``b`` is returned. However, if ``a`` evaluates to ``false``, then ``false`` is returned without evaluating ``b``.
```
eval Env (a && b) =
  (Env2,val) := eval Env a
  if val == false
    return (Env2,val)
  else
    eval Env2 b
```
Disjunction,
```
a || b
```
is also evaluated lazily: first ``a`` is evaluated. If the result is ``false``, ``b`` is also evaluated, and the value of ``b`` is returned. However, if ``a`` evaluates to ``true``, then ``true`` is returned without evaluating ``b``.

#NEW

==The top-level interpreter==

To interpret the whole program,
+ initialize the context with an empty context
+ interpret the body in this context


```
exec (int main () { Stm#sub1 ... Stm#subn}) =
  Env#sub1 := exec .() Stm#sub1
  ...
  exec Env#subn1 Stm#subn
```

#NEW

==Tracing and debugging==

Nice thing about interpretation: the execution can be precisely traced statement by statement.

For instance: append to each execution function call a line showing the statement and a line showing the variable context.

This is useful for debugging programs.

But it is also useful for debugging your interpreter!
Use the old trick of writing a generic tracing function
```
trace Env stm =
  print Env ;  // comment out in final version
  print stm ;  //
  // return ; // uncomment in final version
```
You can also make this dependent on a command-line flag, so that the call
```
lab3 -debug file.cc
```
runs your program with tracing on.

Add to this that the program stops after each statement to wait for an 'enter', and you have turned your interpreter into a debugger. (But this is not required in Lab 3.)

#NEW

==Example run of the interpreter==

Source code in [``good09.cc`` ../laborations/lab3/testsuite/good/good09.cc]:
```
int main ()
{
  int i = readInt() ; // 5

  printInt(i) ;   //5
  printInt(i++) ; //5
  printInt(i) ;   //6
  printInt(++i) ; //7
  printInt(i) ;   //7
}
```
Running the interpreter
```
bash$ ./lab3 good09.cc
-- waits for user input, which is
100
100
100
101
102
102
```

#NEW

==Example run in debugging mode==

After each statement, the program waits for an Enter. In addition, it waits for user input with ``read`` functions, as usual.
```
bash$ ./lab3 -debug good09.cc
[input] 100
i=100
printInt (i);
[output] 100
i=100
printInt (i ++);
[output] 100
i=101
printInt (i);
[output] 101
i=101
printInt (++ i);
[output] 102
i=102
printInt (i);
[output] 102
i=102
```

#NEW

==Interpreter in Haskell==

You can copy the contents of [``laborations/lab3/haskell/`` ../laborations/lab3/haskell]:
```
CPP.cf          -- grammar: from your Lab 2
lab3.hs         -- main module
Makefile
TypeChecker.hs  -- type checking module: from your Lab 2
Interpreter.hs  -- interpreter module
```
You only have to modify ``Interpreter.hs``.

But you can already compile them: just type
```
make
```
and run the interpreter with
```
./lab3
```
The rest is "debugging the empty file"!

You can also start from [``laborations/mini`` ../laborations/mini].

#NEW

===The Main module===

You don't have to write this - but it shows how compiler phases are linked together.
```
check :: String -> IO ()
check s = case pProgram (myLexer s) of
  Bad err -> do
    putStrLn "SYNTAX ERROR"
    putStrLn err
    exitFailure
  Ok tree -> case typecheck tree of
    Bad err -> do
      putStrLn "TYPE ERROR"
      putStrLn err
      exitFailure
    Ok _ -> interpret tree
```
Notice that the interpreter runs in the IO monad and not the error monad.

#NEW

==Central types==

Values
```
data Value = Vint Integer | Vdouble Double | Vbool Bool | Vvoid

printValue :: Value -> String
```
Environment type; use ``Map`` for easier update than lists.
```
type Env = (Sig,[Context])
type Sig = [(Id,[Value] -> IO Value)]
type Context = M.Map Id Value
```
The signature is initialized by the primitive functions
```
(Id "printInt",   \ [v] -> putStrLn (printValue v) >> return Vvoid)
(Id "printDouble",\ [v] -> putStrLn (printValue v) >> return Vvoid)
(Id "readInt",    \ []  -> getLine >>= return . Vint . read)
(Id "readDouble", \ []  -> getLine >>= return . Vdouble . read)
```

#NEW

===The Interpreter module===

The environment datatypes and operations.

Type signatures of the interpretation methods
```
interpret :: Program -> IO ()

exec :: Env -> Stm -> IO Env

eval :: Env -> Exp -> IO (Env,Value)
```
**Aggressivity** - "Well-typed programs cannot go wrong!" - therefore we no longer need to check certain things at runtime. In the example below, the pattern ``Vbool b`` is matched without any fallback case, because the type checker has already guaranteed that the condition is boolean.
```
exec env (SWhile exp stm) = do
  (env1,Vbool b) <- eval env exp
  if b then ... ---
```

#NEW

==Interpreter in Java==

You can copy the contents of [``laborations/lab3/java/`` ../laborations/lab3/java1.5]:
```
CPP.cf              -- grammar (from lab2)
lab3                -- script running the type checker
lab3.java           -- main program
Makefile
TypeChecker.java    -- type checker class (from lab2)
TypeException.java  -- exceptions for type checking
Interpreter.java    -- interpreter class
```
You only have to modify ``Interpreter.java``.
You can already compile the files: just type
```
make
```
and run the type checker with
```
./lab3
```
Before ``make``, you may have to set your class path so that it finds java_cup and JLex, as well as the current directory.
```
export CLASSPATH=.:::$CLASSPATH
```

#NEW

===The main module===

You don't have to write this - but it shows how compiler phases are linked together.
```
try {
  l = new Yylex(new FileReader(args[0]));
  parser p = new parser(l);
  CPP.Absyn.Program parse_tree = p.pProgram();
  new TypeChecker().typecheck(parse_tree);
  new Interpreter().interpret(parse_tree);
} catch (TypeException e) {
  System.out.println("TYPE ERROR");
  System.err.println(e.toString());
  System.exit(1);
} catch (RuntimeException e) {
  // System.out.println("RUNTIME ERROR");
  System.err.println(e.toString());
  System.exit(-1);
} catch (IOException e) {
  System.err.println(e.toString());
  System.exit(1);
} catch (Throwable e) {
  System.out.println("SYNTAX ERROR");
  System.out.println("At line " + String.valueOf(l.line_num())
                     + ", near \"" + l.buff() + "\" :");
  System.out.println("     " + e.getMessage());
  e.printStackTrace();
  System.exit(1);
}
```

#NEW

==Complete code==

This code is for a small language called ``Mini``:
```
-- Mini.cf

Prog.    Program ::= [Stm] ;

terminator Stm "" ;

SDecl.   Stm ::= Type Ident ";" ;
SAss.    Stm ::= Ident "=" Exp ";" ;
SBlock.  Stm ::= "{" [Stm] "}" ;
SPrint.  Stm ::= "print" Exp ";" ;

EVar.    Exp1 ::= Ident ;
EInt.    Exp1 ::= Integer ;
EDouble. Exp1 ::= Double ;
EAdd.    Exp  ::= Exp "+" Exp1 ;

coercions Exp 1 ;

TInt.    Type ::= "int" ;
TDouble. Type ::= "double" ;
```
The code is found in [``laborations/mini`` ../laborations/mini].

You can take it as a starting point for Java (or Haskell as well).
#NEW

===The Interpreter module===

```
import Mini.Absyn.*;
import java.util.HashMap;
import java.util.LinkedList;

public class Interpreter {

  public void interpret(Program p) {
    Prog prog = (Prog)p;
    Env env = new Env();
    for (Stm s : prog.liststm_) {
      execStm(s, env);
    }
  }

  // Run-time values: one subclass per kind of value.
  private static abstract class Value {
    public boolean isInt() { return false; }
    public Integer getInt() {
      throw new RuntimeException(this + " is not an integer.");
    }
    public Double getDouble() {
      throw new RuntimeException(this + " is not a double.");
    }

    public static class Undefined extends Value {
      public Undefined() {}
      public String toString() { return "undefined"; }
    }
    public static class IntValue extends Value {
      private Integer i;
      public IntValue(Integer i) { this.i = i; }
      public boolean isInt() { return true; }
      public Integer getInt() { return i; }
      public String toString() { return i.toString(); }
    }
    public static class DoubleValue extends Value {
      private Double d;
      public DoubleValue(Double d) { this.d = d; }
      public Double getDouble() { return d; }
      public String toString() { return d.toString(); }
    }
  }

  // Block-structured variable context: innermost scope first.
  private static class Env {
    private LinkedList<HashMap<String,Value>> scopes;

    public Env() {
      scopes = new LinkedList<HashMap<String,Value>>();
      enterScope();
    }
    public Value lookupVar(String x) {
      for (HashMap<String,Value> scope : scopes) {
        Value v = scope.get(x);
        if (v != null) return v;
      }
      throw new RuntimeException("Unknown variable " + x + " in " + scopes);
    }
    public void addVar(String x) {
      scopes.getFirst().put(x,new Value.Undefined());
    }
    public void setVar(String x, Value v) {
      for (HashMap<String,Value> scope : scopes) {
        if (scope.containsKey(x)) {
          scope.put(x,v);
          return;
        }
      }
    }
    public void enterScope() {
      scopes.addFirst(new HashMap<String,Value>());
    }
    public void leaveScope() {
      scopes.removeFirst();
    }
  }

  private void execStm(Stm st, Env env) {
    st.accept(new StmExecuter(), env);
  }

  private class StmExecuter implements Stm.Visitor<Object,Env> {
    public Object visit(Mini.Absyn.SDecl p, Env env) {
      env.addVar(p.ident_);
      return null;
    }
    public Object visit(Mini.Absyn.SAss p, Env env) {
      env.setVar(p.ident_, evalExp(p.exp_, env));
      return null;
    }
    public Object visit(Mini.Absyn.SBlock p, Env env) {
      env.enterScope();
      for (Stm st : p.liststm_) {
        execStm(st, env);
      }
      env.leaveScope();
      return null;
    }
    public Object visit(Mini.Absyn.SPrint p, Env env) {
      Value v = evalExp(p.exp_, env);
      System.err.println(v.toString());
      return null;
    }
  }

  private Value evalExp(Exp e, Env env) {
    return e.accept(new ExpEvaluator(), env);
  }

  private class ExpEvaluator implements Exp.Visitor<Value,Env> {
    public Value visit(Mini.Absyn.EVar p, Env env) {
      return env.lookupVar(p.ident_);
    }
    public Value visit(Mini.Absyn.EInt p, Env env) {
      return new Value.IntValue(p.integer_);
    }
    public Value visit(Mini.Absyn.EDouble p, Env env) {
      return new Value.DoubleValue(p.double_);
    }
    public Value visit(Mini.Absyn.EAdd p, Env env) {
      Value v1 = p.exp_1.accept(this, env);
      Value v2 = p.exp_2.accept(this, env);
      if (v1.isInt()) {
        return new Value.IntValue(v1.getInt() + v2.getInt());
      } else {
        return new Value.DoubleValue(v1.getDouble() + v2.getDouble());
      }
    }
  }
}
```

#NEW

==The interpretation of print and read==

These functions are directly inlined in the ``Eval`` class visiting expressions that call them.
```
public static class Eval implements Exp.Visitor<EnvVal,Env> {
  public EnvVal visit(ECall p, Env env) {
    // Strings must be compared with equals(), not with ==
    if (p.id_.equals("printInt")) {
      EnvVal envval = p.listexp_.element().accept(this, env);
      System.err.println(envval.val.toString()) ;
    }
    // ... (the other functions and the returned EnvVal are omitted)
  }
  // ...
}
```

#NEW

==Lab 3==

We take a look at the [lab 3 PM ../laborations/lab3/lab3.html].

#NEW

==Java bytecode interpretation==

Byte code, virtual machine code - simpler than high-level source code.
Example: JVM (Java Virtual Machine)
```
bipush n      -- push byte constant n
iadd          -- add two integers; pop the operands and push the result
imul          -- multiply two integers; pop the operands and push the result
istore x      -- pop the top of the stack and store it in storage address x
iload x       -- push the value at storage address x onto the stack
dup           -- duplicate the top of the stack
invokestatic  -- call a function with parameters from the top of the stack,
                 pop the parameters and push the value
```
Java is compiled to JVM (next lecture).

JVM is interpreted, or compiled to native machine code by JIT (Just In Time compilation).

Most "interpreted languages" are actually compiled to byte code. Exception: Ruby (in its implementations before version 1.9).

Example use of ``invokestatic``, generated from ``printInt(5)``:
```
bipush 5
invokestatic runtime/iprint(I)V
```
The type ``(I)V`` tells e.g. how many values to pop.

#NEW

==JVM interpreter==

Environment: **local variable storage** and a **stack** holding values

Primitive actions:
- **store** value in place #i in local storage
- **load** value from place #i in local storage
- **push** a value to the stack
- **pop** a value from the stack


Example execution: ``5 * (6 + 7)``. Each column shows the stack after the instruction above it, top of the stack at the bottom.
```
bipush 5 ; bipush 6 ; bipush 7 ; iadd ; imul
--         --         --         --     --
5          5          5          5      65
           6          6          13
                      7
```

#NEW

==Local variables in JVM==

The compiler assigns local storage addresses to variables (lecture 11).
```
int i ;           ; reserve address 0 for i

i = 9 ;           bipush 9
                  istore 0

int j = i + 3 ;   ; reserve address 1 for j
                  iload 0
                  bipush 3
                  iadd
                  istore 1
```

#NEW

==Semantics of JVM==

Naturally expressed using **small-step semantics**, i.e. each rule specifies one step of computation.

The format of small-step rules is
```
< Instruction , Env > ⇩ < Env' >
```
The environment has a storage V and a stack S.

The rules work on instructions, executed one at a time.
```
```
Notation used:
|| Notation | Explanation ||
| ``<c,V,S>`` | instruction ``c``, with storage ``V`` and stack ``S`` |
| ``S.v`` | stack with all values in ``S`` plus ``v`` on the top |
| ``V(i)`` | the value at position ``i`` in storage ``V`` |
| ``V(i:=v)`` | storage ``V`` with value ``v`` put into position ``i`` |

#NEW

==Dealing with jumps==

JVM does not always continue with the next instruction: there can be jumps.

Example (my first BASIC program):
```
BEGIN:
  bipush 66
  invokestatic runtime/iprint(I)V
  goto BEGIN
```
To give semantics to the ``goto`` instruction, we have to add the code ``C`` to the environment.

We denote by ``C(p)`` the instruction at position ``p`` in ``C``.
```
```
Question: How do we now express rules for the "ordinary" instructions?
```
```
Answer: we add a code pointer ``P`` to the environment:
```
```
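One way to read the code-pointer idea is as an interpreter loop. The following Java sketch is only an illustration under stated assumptions: the class names (``JvmStepSketch``, ``Instr``, ``Machine``), the encoding of instructions as a name plus one integer argument, and the use of a numeric position instead of a label for ``goto`` are all hypothetical and not taken from the course code. The environment consists of the fixed code ``C``, the code pointer ``P``, the storage ``V`` and the stack ``S``; each call of ``step`` performs one small step on the instruction ``C(P)``.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a small-step JVM-like machine; not the course code.
class JvmStepSketch {

    // One instruction: an opcode name and an integer argument (0 if unused).
    static final class Instr {
        final String op;
        final int arg;
        Instr(String op, int arg) { this.op = op; this.arg = arg; }
        Instr(String op) { this(op, 0); }
    }

    static final class Machine {
        final Instr[] code;                               // C, never changes
        int pointer = 0;                                  // P
        final int[] storage;                              // V
        final Deque<Integer> stack = new ArrayDeque<>();  // S, top at the head

        Machine(Instr[] code, int storageSize) {
            this.code = code;
            this.storage = new int[storageSize];
        }

        // One small step: execute C(P), then update P, V and S.
        void step() {
            Instr c = code[pointer];
            switch (c.op) {
                case "bipush": stack.push(c.arg); pointer++; break;
                case "iadd": { int w = stack.pop(); int v = stack.pop();
                               stack.push(v + w); pointer++; break; }
                case "imul": { int w = stack.pop(); int v = stack.pop();
                               stack.push(v * w); pointer++; break; }
                case "istore": storage[c.arg] = stack.pop(); pointer++; break;
                case "iload":  stack.push(storage[c.arg]); pointer++; break;
                case "dup":    stack.push(stack.peek()); pointer++; break;
                // A jump is the only instruction that does not move P by one.
                case "goto":   pointer = c.arg; break;
                default: throw new IllegalArgumentException("unknown instruction " + c.op);
            }
        }

        // Repeat small steps until P points past the end of the code.
        void run() {
            while (pointer < code.length) step();
        }
    }

    // Computes 5 * (6 + 7), jumps over one instruction, stores the result at address 0.
    static int demo() {
        Instr[] code = {
            new Instr("bipush", 5),
            new Instr("bipush", 6),
            new Instr("bipush", 7),
            new Instr("iadd"),
            new Instr("imul"),
            new Instr("goto", 7),
            new Instr("bipush", 99),   // skipped by the goto
            new Instr("istore", 0),
        };
        Machine m = new Machine(code, 1);
        m.run();
        return m.storage[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this reading, the "ordinary" instructions all share the same treatment of the pointer (advance by one), which is why adding ``P`` to the environment lets jumps and ordinary instructions be described in a uniform way.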