Book: 5.1, 5.3, 2.7
Functions that take abstract syntax trees as arguments.
A general technique in compiler phases after parsing:
Implementable in different programming languages
Help from BNFC: Skeleton files implementing the traversal of trees, returning dummy values that can be hand-edited.
Grammar
Exp ::= Exp "+" Exp | Exp "*" Exp | Integer
Syntax-directed translation function returning the value of an expression
    value (x + y) = value(x) + value(y)
    value (x * y) = value(x) * value(y)
    value (i)     = i
There is a case for each syntactic constructor.
Normally, the value for a tree is built from the values for subtrees.
Example run:
    value (4 + 5 * 6)
      = value(4) + value(5 * 6)
      = 4 + (value(5) * value(6))
      = 4 + (5 * 6)
      = 4 + 30
      = 34
Grammar (the same as previous)
Exp ::= Exp "+" Exp | Exp "*" Exp | Integer
Syntax-directed translation function returning JVM code
    code (x + y) = code(x) \n code(y) \n iadd
    code (x * y) = code(x) \n code(y) \n imul
    code (i)     = bipush i
Example run
    code (4 + 5 * 6)
      = code(4) \n code(5 * 6) \n iadd
      = bipush 4 \n code(5) \n code(6) \n imul \n iadd
      = bipush 4 \n bipush 5 \n bipush 6 \n imul \n iadd
Above, we have used a pseudocode notation where we have shown concrete syntax instead of syntax trees.
We can make this more precise by using the abstract syntax constructors:
    EAdd. Exp ::= Exp "+" Exp
    EMul. Exp ::= Exp "*" Exp
    EInt. Exp ::= Integer
Now we can write
    value (EAdd x y) = value(x) + value(y)
    value (EMul x y) = value(x) * value(y)
    value (EInt i)   = i
But what is the set of translation rules on the previous slide?
It is exactly a piece of Haskell code defining a function with the type signature
value :: Exp -> Integer
It is defined by pattern matching on the datatype
data Exp = EAdd Exp Exp | EMul Exp Exp | EInt Integer
Datatypes and pattern matching make Haskell very usable in compilers.
Now, how can we do this in Java?
Recall the way datatypes are implemented in Java:
    // constructors omitted
    public abstract class Exp {}
    public class EAdd extends Exp {
      public final Exp exp_1, exp_2;
    }
    public class EMul extends Exp {
      public final Exp exp_1, exp_2;
    }
    public class EInt extends Exp {
      public final Integer integer_;
    }
The most obvious way to implement the value function is to add value as a method of the classes.
Let us add the value() method to Exp and its subclasses.
    public abstract class Exp {
      public abstract Integer value() ;
    }
    public class EAdd extends Exp {
      public final Exp exp_1, exp_2;
      public Integer value() { return exp_1.value() + exp_2.value() ; }
    }
    public class EMul extends Exp {
      public final Exp exp_1, exp_2;
      public Integer value() { return exp_1.value() * exp_2.value() ; }
    }
    public class EInt extends Exp {
      public final Integer integer_;
      public Integer value() { return integer_ ; }
    }
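To try the value() method on the running example, the classes need constructors, which the slides leave out. A minimal sketch, assuming hand-written constructors and a hypothetical ValueDemo class that builds the tree for 4 + 5 * 6:

```java
// Sketch only: constructors and ValueDemo are assumptions added for the demo.
abstract class Exp {
    abstract Integer value();
}

final class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    Integer value() { return exp_1.value() + exp_2.value(); }
}

final class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    Integer value() { return exp_1.value() * exp_2.value(); }
}

final class EInt extends Exp {
    final Integer integer_;
    EInt(Integer i) { integer_ = i; }
    Integer value() { return integer_; }
}

class ValueDemo {
    // The tree for 4 + 5 * 6, i.e. EAdd (EInt 4) (EMul (EInt 5) (EInt 6)).
    static Exp example() {
        return new EAdd(new EInt(4), new EMul(new EInt(5), new EInt(6)));
    }

    public static void main(String[] args) {
        System.out.println(example().value()); // prints 34
    }
}
```

Each subclass carries its own case of the translation function, mirroring the one-rule-per-constructor layout of the pseudocode.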
This is the method explained in Book, chapter 5.
Attribute grammars combine parsing rules with translations made as semantic actions, which can be expressed in standard parser tools (Yacc, Bison, CUP, Happy).
Previous example as attribute grammar
    Exp ::= Exp_1 "+" Exp_2   {Exp.value = Exp_1.value + Exp_2.value}
    Exp ::= Exp_1 "*" Exp_2   {Exp.value = Exp_1.value * Exp_2.value}
    Exp ::= Integer           {Exp.value = Integer.intval}
In this course, we assume that semantic actions only build abstract syntax trees (cf. Book 5.3.1). As attribute grammar:
    Exp ::= Exp_1 "+" Exp_2   {Exp.tree = EAdd Exp_1.tree Exp_2.tree}
    Exp ::= Exp_1 "*" Exp_2   {Exp.tree = EMul Exp_1.tree Exp_2.tree}
    Exp ::= Integer           {Exp.tree = EInt Integer}
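To make "semantic actions only build trees" concrete without a parser generator, here is a rough hand-written sketch. It is not what Yacc, CUP or BNFC produce: it is a small recursive-descent parser (a swapped-in technique) for an unambiguous variant of the grammar in which "*" binds tighter than "+". The class TreeBuilder, the constructors and the toString methods are assumptions for illustration.

```java
// AST classes; constructors and toString are added for the demo.
abstract class Exp {}

final class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    public String toString() { return "(EAdd " + exp_1 + " " + exp_2 + ")"; }
}

final class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    public String toString() { return "(EMul " + exp_1 + " " + exp_2 + ")"; }
}

final class EInt extends Exp {
    final Integer integer_;
    EInt(Integer i) { integer_ = i; }
    public String toString() { return "(EInt " + integer_ + ")"; }
}

// Hypothetical parser for:  Exp ::= Term ("+" Term)*   Term ::= Integer ("*" Integer)*
class TreeBuilder {
    private final String src;
    private int pos = 0;

    private TreeBuilder(String s) { src = s.replace(" ", ""); }

    static Exp parse(String s) {
        TreeBuilder p = new TreeBuilder(s);
        Exp tree = p.exp();
        if (p.pos != p.src.length())
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        return tree;
    }

    // Action for "+": {Exp.tree = EAdd Exp_1.tree Exp_2.tree}
    private Exp exp() {
        Exp tree = term();
        while (peek('+')) { pos++; tree = new EAdd(tree, term()); }
        return tree;
    }

    // Action for "*": {Exp.tree = EMul Exp_1.tree Exp_2.tree}
    private Exp term() {
        Exp tree = integer();
        while (peek('*')) { pos++; tree = new EMul(tree, integer()); }
        return tree;
    }

    // Action for an integer literal: {Exp.tree = EInt Integer}
    private Exp integer() {
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos)
            throw new IllegalArgumentException("integer expected at " + pos);
        return new EInt(Integer.valueOf(src.substring(start, pos)));
    }

    private boolean peek(char c) { return pos < src.length() && src.charAt(pos) == c; }

    public static void main(String[] args) {
        System.out.println(parse("4 + 5 * 6")); // prints (EAdd (EInt 4) (EMul (EInt 5) (EInt 6)))
    }
}
```

The parser does no evaluation or code generation; everything after parsing works on the returned tree.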
Let us now add the code() method to Exp and its subclasses.
It will print the code to standard output. To save space, the value() method is not shown.
    public abstract class Exp {
      public abstract void code() ;
    }
    public class EAdd extends Exp {
      public final Exp exp_1, exp_2;
      public void code() {
        exp_1.code() ;
        exp_2.code() ;
        System.out.println("iadd") ;
      }
    }
    public class EMul extends Exp {
      public final Exp exp_1, exp_2;
      public void code() {
        exp_1.code() ;
        exp_2.code() ;
        System.out.println("imul") ;
      }
    }
    public class EInt extends Exp {
      public final Integer integer_;
      public void code() {
        System.out.println("bipush " + integer_) ;
      }
    }
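A runnable variant of the code() method, as a sketch. Unlike the version above, it appends the instructions to a list instead of printing them, so the result can be inspected. The constructors and the CodeDemo class are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

abstract class Exp {
    // Appends the JVM instructions for this expression to out.
    abstract void code(List<String> out);
}

final class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    void code(List<String> out) { exp_1.code(out); exp_2.code(out); out.add("iadd"); }
}

final class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    void code(List<String> out) { exp_1.code(out); exp_2.code(out); out.add("imul"); }
}

final class EInt extends Exp {
    final Integer integer_;
    EInt(Integer i) { integer_ = i; }
    void code(List<String> out) { out.add("bipush " + integer_); }
}

class CodeDemo {
    // Hypothetical helper: collects the instructions for a whole expression.
    static List<String> compile(Exp e) {
        List<String> out = new ArrayList<>();
        e.code(out);
        return out;
    }

    public static void main(String[] args) {
        Exp e = new EAdd(new EInt(4), new EMul(new EInt(5), new EInt(6)));
        for (String instr : compile(e)) System.out.println(instr);
    }
}
```

Note that bipush takes a one-byte operand, so this translation is only valid for constants between -128 and 127.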
Which translation methods should there be in the datatype definitions?
Interpreter, code generator, type checker, pretty printer,...
Well-known problem with object-oriented programming:
This is in contrast with functional programming (Haskell)
However, the problem can be solved in Java by using visitors.
A general way of defining functions over a class, with any return type R and a supplementary argument of any type A.
    public abstract class Exp {
      public abstract <R,A> R accept(Exp.Visitor<R,A> v, A arg);
      public interface Visitor <R,A> {
        public R visit(Arithm.Absyn.EAdd p, A arg);
        public R visit(Arithm.Absyn.EMul p, A arg);
        public R visit(Arithm.Absyn.EInt p, A arg);
      }
    }
    public class EAdd extends Exp {
      public final Exp exp_1, exp_2;
      public <R,A> R accept(Arithm.Absyn.Exp.Visitor<R,A> v, A arg) {
        return v.visit(this, arg);
      }
    }
    public class EInt extends Exp {
      public final Integer integer_;
      public <R,A> R accept(Arithm.Absyn.Exp.Visitor<R,A> v, A arg) {
        return v.visit(this, arg);
      }
    }
    public class Interpreter {
      public Integer value(Exp e) {
        return e.accept(new Value(), null) ;
      }
      private class Value implements Exp.Visitor<Integer, Object> {
        public Integer visit (EAdd p, Object arg) {
          return value(p.exp_1) + value(p.exp_2) ;
        }
        public Integer visit (EMul p, Object arg) {
          return value(p.exp_1) * value(p.exp_2) ;
        }
        public Integer visit (EInt p, Object arg) {
          return p.integer_ ;
        }
      }
    }
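The point of visitors is that a new translation can be added without touching the syntax classes. A self-contained sketch with two visitors over the same classes: Value (an interpreter) and Code (a code generator that uses the extra argument A as an output list). The package prefix Arithm.Absyn is dropped, and the constructors, the Code visitor and VisitorDemo are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

abstract class Exp {
    abstract <R, A> R accept(Exp.Visitor<R, A> v, A arg);
    interface Visitor<R, A> {
        R visit(EAdd p, A arg);
        R visit(EMul p, A arg);
        R visit(EInt p, A arg);
    }
}

final class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

final class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

final class EInt extends Exp {
    final Integer integer_;
    EInt(Integer i) { integer_ = i; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

// Interpreter: R = Integer, the argument is unused.
class Value implements Exp.Visitor<Integer, Void> {
    public Integer visit(EAdd p, Void arg) { return p.exp_1.accept(this, arg) + p.exp_2.accept(this, arg); }
    public Integer visit(EMul p, Void arg) { return p.exp_1.accept(this, arg) * p.exp_2.accept(this, arg); }
    public Integer visit(EInt p, Void arg) { return p.integer_; }
}

// Code generator: nothing is returned, the argument collects instructions.
class Code implements Exp.Visitor<Void, List<String>> {
    public Void visit(EAdd p, List<String> out) {
        p.exp_1.accept(this, out); p.exp_2.accept(this, out); out.add("iadd"); return null;
    }
    public Void visit(EMul p, List<String> out) {
        p.exp_1.accept(this, out); p.exp_2.accept(this, out); out.add("imul"); return null;
    }
    public Void visit(EInt p, List<String> out) { out.add("bipush " + p.integer_); return null; }
}

class VisitorDemo {
    static Integer value(Exp e) { return e.accept(new Value(), null); }

    static List<String> code(Exp e) {
        List<String> out = new ArrayList<>();
        e.accept(new Code(), out);
        return out;
    }

    public static void main(String[] args) {
        Exp e = new EAdd(new EInt(4), new EMul(new EInt(5), new EInt(6)));
        System.out.println(value(e)); // prints 34
        System.out.println(code(e));  // prints [bipush 4, bipush 5, bipush 6, imul, iadd]
    }
}
```

Adding a type checker or pretty printer would mean one more visitor class; Exp, EAdd, EMul and EInt stay unchanged.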
Let us build a "debugger" that prints intermediate values while it evaluates the expression.
    value (x + y) =
      ret = value(x) + value(y)
      print(ret)
      return ret
    value (x * y) =
      ret = value(x) * value(y)
      print(ret)
      return ret
    value (i) =
      ret = i
      print(ret)
      return ret
Printing the value is a side effect of the interpreter.
The value itself is stored in the variable ret and returned.
Sequence of prints from example run:
    value (4 + 5 * 6):
      4    -- ret = 4
      5    -- ret = 5
      6    -- ret = 6
      30   -- ret = 5 * 6
      34   -- ret = 4 + 30
This is very simple in Java: just add a printing statement to the code.
    private class Value implements Exp.Visitor<Integer, Object> {
      public Integer visit (EAdd p, Object arg) {
        Integer ret = value(p.exp_1) + value(p.exp_2) ;
        System.out.println(ret) ;
        return ret ;
      }
      public Integer visit (EMul p, Object arg) {
        Integer ret = value(p.exp_1) * value(p.exp_2) ;
        System.out.println(ret) ;
        return ret ;
      }
      public Integer visit (EInt p, Object arg) {
        Integer ret = p.integer_ ;
        System.out.println(ret) ;
        return ret ;
      }
    }
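A self-contained sketch of the same "debugger". As a variation on the printing version, it records the intermediate values in a list passed as the visitor argument A, so the sequence can be checked. TracingValue, TraceDemo and the constructors are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

abstract class Exp {
    abstract <R, A> R accept(Exp.Visitor<R, A> v, A arg);
    interface Visitor<R, A> {
        R visit(EAdd p, A arg);
        R visit(EMul p, A arg);
        R visit(EInt p, A arg);
    }
}

final class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

final class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

final class EInt extends Exp {
    final Integer integer_;
    EInt(Integer i) { integer_ = i; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

// Evaluates the expression and appends every intermediate result to trace.
class TracingValue implements Exp.Visitor<Integer, List<Integer>> {
    public Integer visit(EAdd p, List<Integer> trace) {
        Integer ret = p.exp_1.accept(this, trace) + p.exp_2.accept(this, trace);
        trace.add(ret);
        return ret;
    }
    public Integer visit(EMul p, List<Integer> trace) {
        Integer ret = p.exp_1.accept(this, trace) * p.exp_2.accept(this, trace);
        trace.add(ret);
        return ret;
    }
    public Integer visit(EInt p, List<Integer> trace) {
        Integer ret = p.integer_;
        trace.add(ret);
        return ret;
    }
}

class TraceDemo {
    static List<Integer> trace(Exp e) {
        List<Integer> trace = new ArrayList<>();
        e.accept(new TracingValue(), trace);
        return trace;
    }

    public static void main(String[] args) {
        Exp e = new EAdd(new EInt(4), new EMul(new EInt(5), new EInt(6)));
        System.out.println(trace(e)); // prints [4, 5, 6, 30, 34]
    }
}
```

The order of the trace follows Java's left-to-right evaluation of operands: the left subtree is visited before the right one.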
What about Haskell? Haskell code is pure and cannot have side effects.
Is Haskell so good for compilers after all?
In Haskell, we have to change the value type to a monad in order to get side effects.
The IO monad is the most basic one.
Now we can write much like in Java:
    value :: Exp -> IO Integer
    value (EAdd x y) = do
      v1 <- value(x)
      v2 <- value(y)
      let ret = v1 + v2
      print ret
      return ret
    value (EMul x y) = do
      v1 <- value(x)
      v2 <- value(y)
      let ret = v1 * v2
      print ret
      return ret
    value (EInt i) = do
      let ret = i
      print ret
      return ret
Language: sequence of assignments to integer variables.
    Prog. Program ::= [Ass]
    AVar. Ass     ::= Ident "=" Exp ";"
    EAdd. Exp     ::= Exp "+" Exp
    EMul. Exp     ::= Exp "*" Exp
    EInt. Exp     ::= Integer
    EVar. Exp     ::= Ident
Interpreter: return the value of the last assignment.
    x = 7 ;
    y = x + 5 ;
    y = x + y * 4 ;
    returns: 55
Cf. Book, chapter 2.
The interpreter must maintain a symbol table, mapping variables to their values. This is how it evolves:
    x = 7 ;          -- x = 7
    y = x + 5 ;      -- x = 7, y = 12
    y = x + y * 4 ;  -- x = 7, y = 55
Symbol tables are needed in other compiler phases as well.
Before implementing them, let us summarize their main usages:
Updating symbol tables is another side effect common in syntax-directed translation.
interpret is defined for programs and assignments, returning nothing.
value is defined for assignments and expressions, returning an integer.
    interpret(s1 ... sn) =
      interpret(s1)
      ...
      interpret(sn)
      print(value(sn))
    interpret(v = e) = update(v,value(e))
    value(v = e)  = return(value(v))
    value (x + y) = return(value(x) + value(y))
    value (x * y) = return(value(x) * value(y))
    value (i)     = return(i)
    value (v)     = return(lookup(v))
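The rules above can be sketched directly as methods on the syntax classes, with the symbol table as a java.util.Map that is updated in place. Assumptions in this sketch: identifiers are plain strings, the program is a non-empty list of AVar objects, the result is returned rather than printed, and the class AssignDemo and the exception type are made up for the demo.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

abstract class Exp {
    abstract int value(Map<String, Integer> env);
}

final class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    int value(Map<String, Integer> env) { return exp_1.value(env) + exp_2.value(env); }
}

final class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    int value(Map<String, Integer> env) { return exp_1.value(env) * exp_2.value(env); }
}

final class EInt extends Exp {
    final int integer_;
    EInt(int i) { integer_ = i; }
    int value(Map<String, Integer> env) { return integer_; }
}

final class EVar extends Exp {
    final String ident_;
    EVar(String v) { ident_ = v; }
    // value(v) = lookup(v)
    int value(Map<String, Integer> env) {
        Integer i = env.get(ident_);
        if (i == null) throw new IllegalStateException("unknown variable " + ident_);
        return i;
    }
}

final class AVar {
    final String ident_;
    final Exp exp_;
    AVar(String v, Exp e) { ident_ = v; exp_ = e; }
    // interpret(v = e) = update(v, value(e))
    void interpret(Map<String, Integer> env) { env.put(ident_, exp_.value(env)); }
    // value(v = e) = value(v)
    int value(Map<String, Integer> env) { return new EVar(ident_).value(env); }
}

class AssignDemo {
    // interpret(s1 ... sn): run every assignment, then take the value of the last one.
    static int interpret(List<AVar> program) {
        Map<String, Integer> env = new HashMap<>();
        for (AVar s : program) s.interpret(env);
        return program.get(program.size() - 1).value(env);
    }

    // x = 7 ; y = x + 5 ; y = x + y * 4 ;
    static List<AVar> example() {
        return Arrays.asList(
            new AVar("x", new EInt(7)),
            new AVar("y", new EAdd(new EVar("x"), new EInt(5))),
            new AVar("y", new EAdd(new EVar("x"), new EMul(new EVar("y"), new EInt(4)))));
    }

    public static void main(String[] args) {
        System.out.println(interpret(example())); // prints 55
    }
}
```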
Lists [(symbol,value)], or (better) Map symbol value.
    lookup :: s -> Map s v -> Maybe v
    insert :: s -> v -> Map s v -> Map s v
In the interpreter, the symbol table can be sent as an argument.
    -- Program is treated here as the list of assignments [Ass].
    interpretProgram :: Program -> Map Ident Integer -> Map Ident Integer
    interpretProgram []     table = table
    interpretProgram (s:ss) table =
      let table' = interpretAss s table
      in  interpretProgram ss table'

    interpretAss :: Ass -> Map Ident Integer -> Map Ident Integer
    interpretAss (AVar v e) table = insert v (value e table) table

    value :: Exp -> Map Ident Integer -> Integer
    value (EAdd x y) table = value x table + value y table
    -- ...
    value (EVar v) table = case lookup v table of
      Just i  -> i
      Nothing -> error ("unknown variable " ++ show v)
Hash maps Map<symbol,value>, with methods
    value get (symbol s)
    void put (symbol s, value v)
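A small sketch of these two operations with java.util.HashMap, replaying the symbol table of the example program by hand. Identifiers are assumed to be strings, and SymbolTableDemo is a made-up name.

```java
import java.util.HashMap;
import java.util.Map;

class SymbolTableDemo {
    // Replays  x = 7 ; y = x + 5 ; y = x + y * 4 ;  directly on the table.
    static Map<String, Integer> run() {
        Map<String, Integer> env = new HashMap<>();
        env.put("x", 7);                                // x = 7
        env.put("y", env.get("x") + 5);                 // x = 7, y = 12
        env.put("y", env.get("x") + env.get("y") * 4);  // x = 7, y = 55 (old y overwritten)
        return env;
    }

    public static void main(String[] args) {
        Map<String, Integer> env = run();
        System.out.println(env.get("y")); // prints 55
        System.out.println(env.get("z")); // prints null: z was never assigned
    }
}
```

Two details of the real java.util.Map interface: get returns null for a missing key, which is what an interpreter has to check for, and put returns the previous value (or null) rather than void.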
Example:
    public static class Interpret implements Ass.Visitor<Object,Map<Ident,Integer>> {
      public Object visit(AVar p, Map<Ident,Integer> env) {
        Integer i = p.exp_.accept(new Value(), env) ;
        env.put(p.ident_, i) ;
        return null ;
      }
    }
    public static class Value implements Exp.Visitor<Integer,Map<Ident,Integer>> {
      public Integer visit(EVar p, Map<Ident,Integer> env) {
        Integer i = env.get(p.ident_) ;
        if (i != null) return i ;
        else throw //...
      }
    }
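For reference, a complete and runnable version of this visitor-based interpreter might look as follows. It is a sketch under assumptions: identifiers are strings rather than a generated Ident class, constructors are written by hand, the exception thrown for an unknown variable is an arbitrary choice, and EnvDemo is a hypothetical driver.

```java
import java.util.HashMap;
import java.util.Map;

abstract class Exp {
    abstract <R, A> R accept(Exp.Visitor<R, A> v, A arg);
    interface Visitor<R, A> {
        R visit(EAdd p, A arg);
        R visit(EMul p, A arg);
        R visit(EInt p, A arg);
        R visit(EVar p, A arg);
    }
}

final class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

final class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

final class EInt extends Exp {
    final Integer integer_;
    EInt(Integer i) { integer_ = i; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

final class EVar extends Exp {
    final String ident_;
    EVar(String v) { ident_ = v; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

abstract class Ass {
    abstract <R, A> R accept(Ass.Visitor<R, A> v, A arg);
    interface Visitor<R, A> { R visit(AVar p, A arg); }
}

final class AVar extends Ass {
    final String ident_;
    final Exp exp_;
    AVar(String v, Exp e) { ident_ = v; exp_ = e; }
    <R, A> R accept(Ass.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

// Expressions: the symbol table is the visitor argument and is only read.
class Value implements Exp.Visitor<Integer, Map<String, Integer>> {
    public Integer visit(EAdd p, Map<String, Integer> env) { return p.exp_1.accept(this, env) + p.exp_2.accept(this, env); }
    public Integer visit(EMul p, Map<String, Integer> env) { return p.exp_1.accept(this, env) * p.exp_2.accept(this, env); }
    public Integer visit(EInt p, Map<String, Integer> env) { return p.integer_; }
    public Integer visit(EVar p, Map<String, Integer> env) {
        Integer i = env.get(p.ident_);
        if (i != null) return i;
        throw new IllegalStateException("unknown variable " + p.ident_); // assumed error handling
    }
}

// Assignments: the symbol table is updated as a side effect.
class Interpret implements Ass.Visitor<Object, Map<String, Integer>> {
    public Object visit(AVar p, Map<String, Integer> env) {
        Integer i = p.exp_.accept(new Value(), env);
        env.put(p.ident_, i);
        return null;
    }
}

class EnvDemo {
    // Runs the assignments in order and returns the final symbol table.
    static Map<String, Integer> run(Ass... program) {
        Map<String, Integer> env = new HashMap<>();
        Interpret interpret = new Interpret();
        for (Ass s : program) s.accept(interpret, env);
        return env;
    }

    public static void main(String[] args) {
        Map<String, Integer> env = run(
            new AVar("x", new EInt(7)),
            new AVar("y", new EAdd(new EVar("x"), new EInt(5))),
            new AVar("y", new EAdd(new EVar("x"), new EMul(new EVar("y"), new EInt(4)))));
        System.out.println(env.get("y")); // prints 55
    }
}
```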