Lecture 6: Syntax-Directed Translation
Programming Languages Course
Aarne Ranta (aarne@chalmers.se)

%!target:html
%!postproc(html): #NEW
%!postproc(html): #HR

Book: 5.1, 5.3, 2.7

#NEW

==Syntax-directed translation==

Functions that take abstract syntax trees as arguments.

A general technique in compiler phases after parsing:
- type checker: from trees to Booleans (or to type-annotated trees)
- interpreter: from trees to values
- optimizer: from trees to trees
- code generator: from trees to target code instructions


Implementable in different programming languages:
- pattern matching in Haskell
- visitors in Java and C++


Help from BNFC: skeleton files implementing the traversal of trees,
returning dummy values that can be hand-edited.

#NEW

==Example 1: interpreter of arithmetic expressions==

Grammar
```
Exp ::= Exp "+" Exp
      | Exp "*" Exp
      | Integer
```
Syntax-directed translation function returning the **value** of an expression
```
value (x + y) = value(x) + value(y)
value (x * y) = value(x) * value(y)
value (i)     = i
```
There is a case for each syntactic constructor.

Normally, the value of a tree is built from the values of its subtrees.

Example run:
```
value (4 + 5 * 6)
  = value(4) + value(5 * 6)
  = 4 + (value(5) * value(6))
  = 4 + (5 * 6)
  = 4 + 30
  = 34
```

#NEW

==Example 2: JVM code generator==

Grammar (the same as in the previous example)
```
Exp ::= Exp "+" Exp
      | Exp "*" Exp
      | Integer
```
Syntax-directed translation function returning JVM code
```
code (x + y) = code(x) \n code(y) \n iadd
code (x * y) = code(x) \n code(y) \n imul
code (i)     = bipush i
```
Example run
```
code (4 + 5 * 6)
  = code(4) \n code(5 * 6) \n iadd
  = bipush 4 \n code(5) \n code(6) \n imul \n iadd
  = bipush 4 \n bipush 5 \n bipush 6 \n imul \n iadd
```

#NEW

==Making translation rules precise==

Above, we have used a pseudocode notation that shows
concrete syntax instead of syntax trees.

We can make this more precise by using the abstract syntax constructors:
```
EAdd. Exp ::= Exp "+" Exp
EMul. Exp ::= Exp "*" Exp
EInt. Exp ::= Integer
```
Now we can write
```
value (EAdd x y) = value(x) + value(y)
value (EMul x y) = value(x) * value(y)
value (EInt i)   = i
```

#NEW

==Implementing translation rules in Haskell==

But what is the set of translation rules on the previous slide?

It is //exactly// a piece of Haskell code defining a function with the type signature
```
value :: Exp -> Integer
```
It is defined by **pattern matching** on the datatype
```
data Exp = EAdd Exp Exp | EMul Exp Exp | EInt Integer
```
Datatypes and pattern matching make Haskell well suited for writing compilers.

Now, how can we do this in Java?

#NEW

==Translation rules in Java: datatypes==

Recall the way datatypes are implemented in Java:
```
public abstract class Exp {}

public class EAdd extends Exp {
  public final Exp exp_1, exp_2;
}
public class EMul extends Exp {
  public final Exp exp_1, exp_2;
}
public class EInt extends Exp {
  public final Integer integer_;
}
```
The most obvious way to implement the ``value`` function is to add
it as a method of these classes.

#NEW

==Translation rules in Java: a new method==

Let us add the ``value()`` method to ``Exp`` and its subclasses.
```
public abstract class Exp {
  public abstract Integer value() ;
}
public class EAdd extends Exp {
  public final Exp exp_1, exp_2;
  public Integer value() {return exp_1.value() + exp_2.value() ;}
}
public class EMul extends Exp {
  public final Exp exp_1, exp_2;
  public Integer value() {return exp_1.value() * exp_2.value() ;}
}
public class EInt extends Exp {
  public final Integer integer_;
  public Integer value() {return integer_ ;}
}
```

#NEW

==Attribute grammars==

This is the method explained in Book, chapter 5.

Attribute grammars combine parsing rules with translations made as
semantic actions, which can be expressed in standard parser tools
(Yacc, Bison, CUP, Happy).

Previous example as attribute grammar
```
Exp ::= Exp_1 "+" Exp_2  {Exp.value = Exp_1.value + Exp_2.value}
Exp ::= Exp_1 "*" Exp_2  {Exp.value = Exp_1.value * Exp_2.value}
Exp ::= Integer          {Exp.value = Integer.intval}
```
In this course, we assume that semantic actions only build abstract syntax trees
(cf. Book 5.3.1). As attribute grammar:
```
Exp ::= Exp_1 "+" Exp_2  {Exp.tree = EAdd Exp_1.tree Exp_2.tree}
Exp ::= Exp_1 "*" Exp_2  {Exp.tree = EMul Exp_1.tree Exp_2.tree}
Exp ::= Integer          {Exp.tree = EInt Integer}
```

#NEW

==Translation rules in Java: another new method==

Let us now add the ``code()`` method to ``Exp`` and its subclasses.
It will print the code to standard output.

To save space, the ``value()`` method is not shown.
```
public abstract class Exp {
  public abstract void code() ;
}
public class EAdd extends Exp {
  public final Exp exp_1, exp_2;
  public void code() {
    exp_1.code() ;
    exp_2.code() ;
    System.out.println("iadd") ;
  }
}
public class EMul extends Exp {
  public final Exp exp_1, exp_2;
  public void code() {
    exp_1.code() ;
    exp_2.code() ;
    System.out.println("imul") ;
  }
}
public class EInt extends Exp {
  public final Integer integer_;
  public void code() {
    System.out.println("bipush " + integer_) ;
  }
}
```

#NEW

==How many translation methods?==

Which translation methods should there be in the datatype definitions?

Interpreter, code generator, type checker, pretty printer,...

Well-known problem with object-oriented programming:
- it is easy to add new data constructors (subclasses)...
- ... but it is difficult to define new functions


This is in contrast with functional programming (Haskell)
- it is easy to define new functions...
- ... but it is difficult to add new data constructors


However, the problem can be solved in Java by using **visitors**.

#NEW

==The visitor method==

A general way of defining functions over a class, with any return type ``R``
and a supplementary argument of any type ``A``.
```
public abstract class Exp {
  public abstract <R,A> R accept(Exp.Visitor<R,A> v, A arg);
  public interface Visitor <R,A> {
    public R visit(Arithm.Absyn.EAdd p, A arg);
    public R visit(Arithm.Absyn.EMul p, A arg);
    public R visit(Arithm.Absyn.EInt p, A arg);
  }
}
public class EAdd extends Exp {
  public final Exp exp_1, exp_2;
  public <R,A> R accept(Arithm.Absyn.Exp.Visitor<R,A> v, A arg) {
    return v.visit(this, arg);
  }
}
public class EInt extends Exp {
  public final Integer integer_;
  public <R,A> R accept(Arithm.Absyn.Exp.Visitor<R,A> v, A arg) {
    return v.visit(this, arg);
  }
}
```

#NEW

==Translation using the visitor method==

```
public class Interpreter {
  public Integer value(Exp e) {
    return e.accept(new Value(), null) ;
  }
  private class Value implements Exp.Visitor<Integer,Object> {
    public Integer visit (EAdd p, Object arg) {
      return value(p.exp_1) + value(p.exp_2) ;
    }
    public Integer visit (EMul p, Object arg) {
      return value(p.exp_1) * value(p.exp_2) ;
    }
    public Integer visit (EInt p, Object arg) {
      return p.integer_ ;
    }
  }
}
```

#NEW

==Translation with side effects==

Let us build a "debugger" that prints intermediate values while it evaluates
the expression.
```
value (x + y) =
  ret = value(x) + value(y)
  print(ret)
  return ret
value (x * y) =
  ret = value(x) * value(y)
  print(ret)
  return ret
value (i) =
  ret = i
  print(ret)
  return ret
```
Printing the value is a **side effect** of the interpreter.

The value itself is stored in the variable ``ret`` and returned.

Sequence of prints from example run:
```
value (4 + 5 * 6):
  4    -- ret = 4
  5    -- ret = 5
  6    -- ret = 6
  30   -- ret = 5 * 6
  34   -- ret = 4 + 30
```

#NEW

==Implementing side effects==

This is very simple in Java: just add a printing statement to the code
```
private class Value implements Exp.Visitor<Integer,Object> {
  public Integer visit (EAdd p, Object arg) {
    Integer ret = value(p.exp_1) + value(p.exp_2) ;
    System.out.println(ret) ;
    return ret ;
  }
  public Integer visit (EMul p, Object arg) {
    Integer ret = value(p.exp_1) * value(p.exp_2) ;
    System.out.println(ret) ;
    return ret ;
  }
  public Integer visit (EInt p, Object arg) {
    Integer ret = p.integer_ ;
    System.out.println(ret) ;
    return ret ;
  }
}
```
What about Haskell? Haskell code is **pure** and cannot have side effects.

Is Haskell so good for compilers after all?

#NEW

==Side effects in monads==

In Haskell, we have to change the value type to a **monad** in order to
get side effects. The IO monad is the most basic one.

Now we can write much like in Java:
```
value :: Exp -> IO Integer
value (EAdd x y) = do
  v1 <- value(x)
  v2 <- value(y)
  let ret = v1 + v2
  print ret
  return ret
value (EMul x y) = do
  v1 <- value(x)
  v2 <- value(y)
  let ret = v1 * v2
  print ret
  return ret
value (EInt i) = do
  let ret = i
  print ret
  return ret
```

#NEW

==A more interesting interpreter==

Language: sequence of assignments to integer variables.
```
Prog.  Program ::= [Ass]
AVar.  Ass ::= Ident "=" Exp ";"
EAdd.  Exp ::= Exp "+" Exp
EMul.  Exp ::= Exp "*" Exp
EInt.  Exp ::= Integer
EVar.  Exp ::= Ident
```
Interpreter: return the value of the last assignment.
```
x = 7 ;
y = x + 5 ;
y = x + y * 4 ;

returns: 55
```
Cf. Book, chapter 2.

#NEW

==Symbol tables==

The interpreter must maintain a **symbol table**, mapping variables to their values.

This is how it evolves:
```
x = 7 ;           -- x = 7
y = x + 5 ;       -- x = 7, y = 12
y = x + y * 4 ;   -- x = 7, y = 55
```
Symbol tables are needed in other compiler phases as well.

Before implementing them, let us summarize their main operations:
- **lookup**: return the value of a symbol; error if the symbol is not defined.
- **update**: change the value of a symbol; add the symbol if it is not defined.


Updating symbol tables is another side effect common in syntax-directed translation.

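
The two operations listed above can be sketched in Java as a thin wrapper
around a hash map. This is only an illustration under assumptions: the class
names ``SymbolTable`` and ``SymbolTableDemo``, the method names, and the use of
``RuntimeException`` for undefined symbols are hypothetical and not part of the
course code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical symbol table for integer variables (all names are illustrative).
class SymbolTable {
    private final Map<String, Integer> table = new HashMap<String, Integer>();

    // lookup: return the value of a symbol; error if the symbol is not defined.
    Integer lookup(String symbol) {
        Integer value = table.get(symbol);
        if (value == null)
            throw new RuntimeException("unknown variable " + symbol);
        return value;
    }

    // update: change the value of a symbol; add the symbol if it is not defined.
    void update(String symbol, Integer value) {
        table.put(symbol, value);
    }
}

class SymbolTableDemo {
    // Replays the three assignments of the running example by hand.
    static SymbolTable run() {
        SymbolTable t = new SymbolTable();
        t.update("x", 7);                                  // x = 7
        t.update("y", t.lookup("x") + 5);                  // x = 7, y = 12
        t.update("y", t.lookup("x") + t.lookup("y") * 4);  // x = 7, y = 55
        return t;
    }

    public static void main(String[] args) {
        System.out.println(run().lookup("y"));  // prints 55
    }
}
```

Throwing on an undefined symbol mirrors the "error if the symbol is not
defined" clause of **lookup**; a real interpreter would probably use a more
specific exception type.
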
#NEW

==Interpreter with symbol tables==

``interpret`` is defined for programs and assignments, returning nothing.

``value`` is defined for assignments and expressions, returning an integer.
```
interpret(s1 ... sn) =
  interpret(s1)
  ...
  interpret(sn)
  print(value(sn))

interpret(v = e) = update(v,value(e))

value(v = e)  = return(value(v))
value (x + y) = return(value(x) + value(y))
value (x * y) = return(value(x) * value(y))
value (i)     = return(i)
value (v)     = return(lookup(v))
```

#NEW

==Implementing symbol tables in Haskell==

Lists ``[(symbol,value)]``, or (better) ``Map symbol value``.
```
lookup :: Ord s => s -> Map s v -> Maybe v
insert :: Ord s => s -> v -> Map s v -> Map s v
```
In the interpreter, the symbol table can be sent as an argument.
```
interpretProgram :: Program -> Map Ident Integer -> Map Ident Integer
interpretProgram (Prog ss) table = interpretAsss ss table

interpretAsss :: [Ass] -> Map Ident Integer -> Map Ident Integer
interpretAsss []     table = table
interpretAsss (s:ss) table =
  let table' = interpretAss s table
  in  interpretAsss ss table'

interpretAss :: Ass -> Map Ident Integer -> Map Ident Integer
interpretAss (AVar v e) table = insert v (value e table) table

value :: Exp -> Map Ident Integer -> Integer
value (EAdd x y) table = value x table + value y table
-- EMul and EInt are handled analogously
value (EVar v) table = case lookup v table of
  Just i  -> i
  Nothing -> error ("unknown variable " ++ show v)
```

#NEW

==Implementing symbol tables in Java 1.5==

Hash maps ``Map<symbol,value>``, with methods
```
value get (symbol s)
value put (symbol s, value v)   // returns the previous value, or null
```
Example:
```
public static class Interpret implements Ass.Visitor<Object,Map<String,Integer>> {
  public Object visit(AVar p, Map<String,Integer> env) {
    Integer i = p.exp_.accept(new Value(), env) ;
    env.put(p.ident_, i) ;
    return null ;
  }
}
public static class Value implements Exp.Visitor<Integer,Map<String,Integer>> {
  public Integer visit(EVar p, Map<String,Integer> env) {
    Integer i = env.get(p.ident_);
    if (i != null) return i;
    else throw //...
  }
}
```
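
To see the pieces working together, here is a self-contained sketch of the
assignment interpreter, with the environment passed as the visitor argument.
It is hand-written rather than BNFC-generated, so several things are
assumptions: the constructors, the plain ``AVar`` class without its own
visitor, the class ``AssignInterpreter``, and the ``RuntimeException`` thrown
for unknown variables are all hypothetical choices made to keep the example
short.

```java
import java.util.HashMap;
import java.util.Map;

// Abstract syntax with a generic visitor (hand-written, BNFC-like naming).
abstract class Exp {
    abstract <R, A> R accept(Exp.Visitor<R, A> v, A arg);
    interface Visitor<R, A> {
        R visit(EAdd p, A arg);
        R visit(EMul p, A arg);
        R visit(EInt p, A arg);
        R visit(EVar p, A arg);
    }
}
class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp a, Exp b) { exp_1 = a; exp_2 = b; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}
class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp a, Exp b) { exp_1 = a; exp_2 = b; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}
class EInt extends Exp {
    final Integer integer_;
    EInt(Integer i) { integer_ = i; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}
class EVar extends Exp {
    final String ident_;
    EVar(String x) { ident_ = x; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

// Assignment "ident = exp ;" kept as a plain class (assumption: no Ass visitor).
class AVar {
    final String ident_;
    final Exp exp_;
    AVar(String x, Exp e) { ident_ = x; exp_ = e; }
}

// Expression evaluator: the symbol table is the visitor's extra argument.
class Value implements Exp.Visitor<Integer, Map<String, Integer>> {
    public Integer visit(EAdd p, Map<String, Integer> env) {
        return p.exp_1.accept(this, env) + p.exp_2.accept(this, env);
    }
    public Integer visit(EMul p, Map<String, Integer> env) {
        return p.exp_1.accept(this, env) * p.exp_2.accept(this, env);
    }
    public Integer visit(EInt p, Map<String, Integer> env) {
        return p.integer_;
    }
    public Integer visit(EVar p, Map<String, Integer> env) {
        Integer i = env.get(p.ident_);                      // lookup
        if (i == null)
            throw new RuntimeException("unknown variable " + p.ident_);
        return i;
    }
}

class AssignInterpreter {
    // Runs the assignments in order and returns the value of the last one.
    static Integer run(AVar... program) {
        Map<String, Integer> env = new HashMap<String, Integer>();
        Integer last = null;
        for (AVar a : program) {
            last = a.exp_.accept(new Value(), env);
            env.put(a.ident_, last);                        // update
        }
        return last;
    }

    // The running example: x = 7 ; y = x + 5 ; y = x + y * 4 ;
    static Integer example() {
        return run(
            new AVar("x", new EInt(7)),
            new AVar("y", new EAdd(new EVar("x"), new EInt(5))),
            new AVar("y", new EAdd(new EVar("x"),
                                   new EMul(new EVar("y"), new EInt(4)))));
    }

    public static void main(String[] args) {
        System.out.println(example());  // prints 55
    }
}
```

Passing the map as the argument ``A`` keeps ``Value`` stateless, so the same
visitor object can be reused for every subexpression.
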