Lecture 6: Syntax-Directed Translation
Programming Languages Course
Aarne Ranta (aarne@chalmers.se)
%!target:html
%!postproc(html): #NEW
%!postproc(html): #HR
Book: 5.1, 5.3, 2.7
#NEW
==Syntax-directed translation==
Functions that take abstract syntax trees as arguments.
A general technique in compiler phases after parsing:
- type checker: from trees to Booleans (or to type-annotated trees)
- interpreter: from trees to values
- optimizer: from trees to trees
- code generator: from trees to target code instructions
Implementable in different programming languages
- pattern matching in Haskell
- visitors in Java and C++
Help from BNFC: skeleton files that implement the traversal
of trees and return dummy values, to be edited by hand.
#NEW
==Example 1: interpreter of arithmetic expressions==
Grammar
```
Exp ::= Exp "+" Exp | Exp "*" Exp | Integer
```
Syntax-directed translation function returning the **value** of
an expression
```
value (x + y) = value(x) + value(y)
value (x * y) = value(x) * value(y)
value (i) = i
```
There is a case for each syntactic constructor.
Normally, the value for a tree is built from the values for
subtrees.
Example run:
```
value (4 + 5 * 6)
= value(4) + value(5 * 6)
= 4 + (value(5) * value(6))
= 4 + (5 * 6)
= 4 + 30
= 34
```
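The three rules carry over almost line by line to Java. This is a minimal sketch that assumes hypothetical tree classes with constructors and uses one ``instanceof`` branch per constructor. The Java classes appear later in these slides. The tree of ``4 + 5 * 6`` is built by hand here instead of being parsed.

```java
// Sketch: value() as one static method with a branch per constructor.
// The tree classes below are hypothetical, minimal versions.
class ValueDemo {
    // value (x + y) = value(x) + value(y)
    // value (x * y) = value(x) * value(y)
    // value (i)     = i
    static int value(Exp e) {
        if (e instanceof EAdd) {
            EAdd p = (EAdd) e;
            return value(p.exp_1) + value(p.exp_2);
        } else if (e instanceof EMul) {
            EMul p = (EMul) e;
            return value(p.exp_1) * value(p.exp_2);
        } else if (e instanceof EInt) {
            return ((EInt) e).integer_;
        }
        throw new IllegalArgumentException("unknown expression");
    }

    public static void main(String[] args) {
        // The tree of 4 + 5 * 6: * binds tighter than +.
        Exp e = new EAdd(new EInt(4), new EMul(new EInt(5), new EInt(6)));
        System.out.println(value(e));  // prints 34
    }
}

abstract class Exp {}

class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
}

class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
}

class EInt extends Exp {
    final int integer_;
    EInt(int i) { integer_ = i; }
}
```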
#NEW
==Example 2: JVM code generator==
Grammar (the same as previous)
```
Exp ::= Exp "+" Exp | Exp "*" Exp | Integer
```
Syntax-directed translation function returning JVM code
```
code (x + y) = code(x) \n code(y) \n iadd
code (x * y) = code(x) \n code(y) \n imul
code (i) = bipush i
```
Example run
```
code (4 + 5 * 6)
= code(4) \n code(5 * 6) \n iadd
= bipush 4 \n code(5) \n code(6) \n imul \n iadd
= bipush 4 \n bipush 5 \n bipush 6 \n imul \n iadd
```
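The same recursion can be sketched in Java so that it collects the instructions in a list rather than a string with ``\n`` separators. The tree classes are again hypothetical. The sketch makes one deviation from the rules above, and labels it in the code. ``bipush`` only accepts a signed byte (-128 to 127), so larger constants fall back to ``sipush`` and ``ldc``, written in Jasmin-style assembly syntax (an assumption).

```java
import java.util.ArrayList;
import java.util.List;

class CodeGen {
    // code (x + y) = code(x), code(y), iadd   (and imul for *)
    static void code(Exp e, List<String> out) {
        if (e instanceof EAdd) {
            EAdd p = (EAdd) e;
            code(p.exp_1, out);
            code(p.exp_2, out);
            out.add("iadd");
        } else if (e instanceof EMul) {
            EMul p = (EMul) e;
            code(p.exp_1, out);
            code(p.exp_2, out);
            out.add("imul");
        } else if (e instanceof EInt) {
            out.add(push(((EInt) e).integer_));
        } else {
            throw new IllegalArgumentException("unknown expression");
        }
    }

    // Deviation from the slide: bipush takes a signed byte, sipush a signed
    // 16-bit value; any other int constant is loaded with ldc.
    static String push(int i) {
        if (i >= -128 && i <= 127) return "bipush " + i;
        if (i >= -32768 && i <= 32767) return "sipush " + i;
        return "ldc " + i;
    }

    static List<String> code(Exp e) {
        List<String> out = new ArrayList<>();
        code(e, out);
        return out;
    }

    public static void main(String[] args) {
        Exp e = new EAdd(new EInt(4), new EMul(new EInt(5), new EInt(6)));
        // prints bipush 4, bipush 5, bipush 6, imul, iadd on separate lines
        System.out.println(String.join("\n", code(e)));
    }
}

abstract class Exp {}

class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
}

class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
}

class EInt extends Exp {
    final int integer_;
    EInt(int i) { integer_ = i; }
}
```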
#NEW
==Making translation rules precise==
Above, we have used a pseudocode notation where we have shown
concrete syntax instead of syntax trees.
We can make this more precise by using the abstract syntax
constructors:
```
EAdd. Exp ::= Exp "+" Exp
EMul. Exp ::= Exp "*" Exp
EInt. Exp ::= Integer
```
Now we can write
```
value (EAdd x y) = value(x) + value(y)
value (EMul x y) = value(x) * value(y)
value (EInt i) = i
```
#NEW
==Implementing translation rules in Haskell==
But what is the set of translation rules on the previous slide?
It is //exactly// a piece of Haskell code defining a function
with the type signature
```
value :: Exp -> Integer
```
It is defined by **pattern matching** on the datatype
```
data Exp = EAdd Exp Exp | EMul Exp Exp | EInt Integer
```
Datatypes and pattern matching make Haskell well suited for writing
compilers.
Now, how can we do this in Java?
#NEW
==Translation rules in Java: datatypes==
Recall the way datatypes are implemented in Java:
```
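public abstract class Exp {}
public class EAdd extends Exp {
  public final Exp exp_1, exp_2;
  public EAdd(Exp p1, Exp p2) { exp_1 = p1; exp_2 = p2; }
}
public class EMul extends Exp {
  public final Exp exp_1, exp_2;
  public EMul(Exp p1, Exp p2) { exp_1 = p1; exp_2 = p2; }
}
public class EInt extends Exp {
  public final Integer integer_;
  public EInt(Integer p1) { integer_ = p1; }
}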
```
The most obvious way to implement the ``value`` function
is to add ``value`` as a method of each class.
#NEW
==Translation rules in Java: a new method==
Let us add the ``value()`` method to ``Exp`` and its subclasses.
```
public abstract class Exp {
  public abstract Integer value() ;
}
public class EAdd extends Exp {
  public final Exp exp_1, exp_2;
  public Integer value() {return exp_1.value() + exp_2.value() ;}
}
public class EMul extends Exp {
  public final Exp exp_1, exp_2;
  public Integer value() {return exp_1.value() * exp_2.value() ;}
}
public class EInt extends Exp {
  public final Integer integer_;
  public Integer value() {return integer_ ;}
}
```
#NEW
==Attribute grammars==
This is the method explained in Book, chapter 5.
Attribute grammars combine parsing rules with translations
made as semantic actions, which can be expressed in standard
parser tools (Yacc, Bison, CUP, Happy).
Previous example as attribute grammar
```
Exp ::= Exp_1 "+" Exp_2 {Exp.value = Exp_1.value + Exp_2.value}
Exp ::= Exp_1 "*" Exp_2 {Exp.value = Exp_1.value * Exp_2.value}
Exp ::= Integer {Exp.value = Integer.intval}
```
In this course, we assume that semantic actions only build
abstract syntax trees (cf. Book 5.3.1). As attribute grammar:
```
Exp ::= Exp_1 "+" Exp_2 {Exp.tree = EAdd Exp_1.tree Exp_2.tree}
Exp ::= Exp_1 "*" Exp_2 {Exp.tree = EMul Exp_1.tree Exp_2.tree}
Exp ::= Integer {Exp.tree = EInt Integer.intval}
```
#NEW
==Translation rules in Java: another new method==
Let us now add the ``code()`` method to ``Exp`` and its subclasses.
It will print the code to standard output. To save space,
the ``value()`` method is not shown.
```
public abstract class Exp {
  public abstract void code() ;
}
public class EAdd extends Exp {
  public final Exp exp_1, exp_2;
  public void code() {
    exp_1.code() ;
    exp_2.code() ;
    System.out.println("iadd") ;
  }
}
public class EMul extends Exp {
  public final Exp exp_1, exp_2;
  public void code() {
    exp_1.code() ;
    exp_2.code() ;
    System.out.println("imul") ;
  }
}
public class EInt extends Exp {
  public final Integer integer_;
  public void code() {
    System.out.println("bipush " + integer_) ;
  }
}
```
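Once constructors are added (the slides omit them), the two methods can be run together. In this self-contained sketch, ``code()`` appends to a ``StringBuilder`` rather than printing, so that its output can be inspected. Both that change and the ``MethodsDemo`` driver class are assumptions made for illustration. Each new function means editing all four classes again.

```java
class MethodsDemo {
    // Collects the output of code() as a string, one instruction per line.
    static String codeOf(Exp e) {
        StringBuilder out = new StringBuilder();
        e.code(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Exp e = new EAdd(new EInt(4), new EMul(new EInt(5), new EInt(6)));
        System.out.println(e.value());  // prints 34
        System.out.print(codeOf(e));
    }
}

abstract class Exp {
    abstract Integer value();
    abstract void code(StringBuilder out);
}

class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp p1, Exp p2) { exp_1 = p1; exp_2 = p2; }
    Integer value() { return exp_1.value() + exp_2.value(); }
    void code(StringBuilder out) {
        exp_1.code(out);
        exp_2.code(out);
        out.append("iadd\n");
    }
}

class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp p1, Exp p2) { exp_1 = p1; exp_2 = p2; }
    Integer value() { return exp_1.value() * exp_2.value(); }
    void code(StringBuilder out) {
        exp_1.code(out);
        exp_2.code(out);
        out.append("imul\n");
    }
}

class EInt extends Exp {
    final Integer integer_;
    EInt(Integer i) { integer_ = i; }
    Integer value() { return integer_; }
    void code(StringBuilder out) { out.append("bipush ").append(integer_).append("\n"); }
}
```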
#NEW
==How many translation methods?==
Which translation methods should there be in the datatype definitions?
Interpreter, code generator, type checker, pretty printer,...
Well-known problem with object-oriented programming:
- it is easy to add new data constructors (subclasses)...
- ... but it is difficult to define new functions
This is in contrast with functional programming (Haskell)
- it is easy to define new functions...
- ... but it is difficult to add new data constructors
In Java, new functions can nevertheless be added without editing the datatype classes, by using **visitors**.
#NEW
==The visitor method==
A general way of defining functions over a class hierarchy, with any return type ``R``
and a supplementary argument of any type ``A``, both given as type parameters.
```
public abstract class Exp {
  public abstract <R,A> R accept(Exp.Visitor<R,A> v, A arg);
  public interface Visitor <R,A> {
    public R visit(Arithm.Absyn.EAdd p, A arg);
    public R visit(Arithm.Absyn.EMul p, A arg);
    public R visit(Arithm.Absyn.EInt p, A arg);
  }
}
public class EAdd extends Exp {
  public final Exp exp_1, exp_2;
  public <R,A> R accept(Arithm.Absyn.Exp.Visitor<R,A> v, A arg) {
    return v.visit(this, arg);
  }
}
// EMul is analogous to EAdd
public class EInt extends Exp {
  public final Integer integer_;
  public <R,A> R accept(Arithm.Absyn.Exp.Visitor<R,A> v, A arg) {
    return v.visit(this, arg);
  }
}
```
#NEW
==Translation using the visitor method==
```
public class Interpreter {
  public Integer value(Exp e) {
    return e.accept(new Value(), null) ;
  }
  private class Value implements Exp.Visitor<Integer,Object> {
    public Integer visit (EAdd p, Object arg) {
      return value(p.exp_1) + value(p.exp_2) ;
    }
    public Integer visit (EMul p, Object arg) {
      return value(p.exp_1) * value(p.exp_2) ;
    }
    public Integer visit (EInt p, Object arg) {
      return p.integer_ ;
    }
  }
}
```
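Putting the two slides together gives a complete program. This sketch assumes the class names above without the ``Arithm.Absyn`` package, and adds constructors. The hypothetical ``CodeGenerator`` class shows the point of visitors: a second function is added as a new visitor, without touching ``Exp`` or its subclasses. There the supplementary argument ``A`` carries the output buffer, while ``Value`` leaves it unused.

```java
class VisitorDemo {
    public static void main(String[] args) {
        Exp e = new EAdd(new EInt(4), new EMul(new EInt(5), new EInt(6)));
        System.out.println(new Interpreter().value(e));  // prints 34
        System.out.print(new CodeGenerator().code(e));
    }
}

abstract class Exp {
    abstract <R, A> R accept(Exp.Visitor<R, A> v, A arg);

    interface Visitor<R, A> {
        R visit(EAdd p, A arg);
        R visit(EMul p, A arg);
        R visit(EInt p, A arg);
    }
}

class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp p1, Exp p2) { exp_1 = p1; exp_2 = p2; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp p1, Exp p2) { exp_1 = p1; exp_2 = p2; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

class EInt extends Exp {
    final Integer integer_;
    EInt(Integer p1) { integer_ = p1; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

// The interpreter from the slide: R = Integer, the argument A is unused.
class Interpreter {
    Integer value(Exp e) { return e.accept(new Value(), null); }

    private class Value implements Exp.Visitor<Integer, Object> {
        public Integer visit(EAdd p, Object arg) { return value(p.exp_1) + value(p.exp_2); }
        public Integer visit(EMul p, Object arg) { return value(p.exp_1) * value(p.exp_2); }
        public Integer visit(EInt p, Object arg) { return p.integer_; }
    }
}

// A new function, added without editing any Exp class:
// R = Void (no result), A = the StringBuilder that receives the code.
class CodeGenerator {
    String code(Exp e) {
        StringBuilder out = new StringBuilder();
        e.accept(new Code(), out);
        return out.toString();
    }

    private static class Code implements Exp.Visitor<Void, StringBuilder> {
        public Void visit(EAdd p, StringBuilder out) {
            p.exp_1.accept(this, out);
            p.exp_2.accept(this, out);
            out.append("iadd\n");
            return null;
        }
        public Void visit(EMul p, StringBuilder out) {
            p.exp_1.accept(this, out);
            p.exp_2.accept(this, out);
            out.append("imul\n");
            return null;
        }
        public Void visit(EInt p, StringBuilder out) {
            out.append("bipush ").append(p.integer_).append("\n");
            return null;
        }
    }
}
```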
#NEW
==Translation with side effects==
Let us build a "debugger" that prints intermediate values while
it evaluates the expression.
```
value (x + y) =
  ret = value(x) + value(y)
  print(ret)
  return ret
value (x * y) =
  ret = value(x) * value(y)
  print(ret)
  return ret
value (i) =
  ret = i
  print(ret)
  return ret
```
Printing the value is a **side effect** of the interpreter.
The value itself is stored in the variable ``ret`` and returned.
Sequence of prints from example run:
```
value (4 + 5 * 6):
4 -- ret = 4
5 -- ret = 5
6 -- ret = 6
30 -- ret = 5 * 6
34 -- ret = 4 + 30
```
#NEW
==Implementing side effects==
This is very simple in Java: just add a printing statement to each ``visit`` method of the interpreter.
```
private class Value implements Exp.Visitor<Integer,Object> {
  public Integer visit (EAdd p, Object arg) {
    Integer ret = value(p.exp_1) + value(p.exp_2) ;
    System.out.println(ret) ;
    return ret ;
  }
  public Integer visit (EMul p, Object arg) {
    Integer ret = value(p.exp_1) * value(p.exp_2) ;
    System.out.println(ret) ;
    return ret ;
  }
  public Integer visit (EInt p, Object arg) {
    Integer ret = p.integer_ ;
    System.out.println(ret) ;
    return ret ;
  }
}
```
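Side effects do not have to go to standard output. In this variant sketch, the ``Tracer`` and ``TraceDemo`` names are hypothetical. It records the intermediate values in a list passed as the visitor's supplementary argument ``A``, which makes the sequence from the example run easy to check. The ``Exp`` classes are repeated so that the block compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

class TraceDemo {
    // Evaluates e, recording each intermediate value in evaluation order.
    static List<Integer> trace(Exp e) {
        List<Integer> log = new ArrayList<>();
        e.accept(new Tracer(), log);
        return log;
    }

    public static void main(String[] args) {
        Exp e = new EAdd(new EInt(4), new EMul(new EInt(5), new EInt(6)));
        System.out.println(trace(e));  // prints [4, 5, 6, 30, 34]
    }
}

// R = Integer (the value), A = List<Integer> (the log written as a side effect).
class Tracer implements Exp.Visitor<Integer, List<Integer>> {
    public Integer visit(EAdd p, List<Integer> log) {
        // Java evaluates the left operand of + before the right one.
        Integer ret = p.exp_1.accept(this, log) + p.exp_2.accept(this, log);
        log.add(ret);
        return ret;
    }
    public Integer visit(EMul p, List<Integer> log) {
        Integer ret = p.exp_1.accept(this, log) * p.exp_2.accept(this, log);
        log.add(ret);
        return ret;
    }
    public Integer visit(EInt p, List<Integer> log) {
        Integer ret = p.integer_;
        log.add(ret);
        return ret;
    }
}

abstract class Exp {
    abstract <R, A> R accept(Exp.Visitor<R, A> v, A arg);

    interface Visitor<R, A> {
        R visit(EAdd p, A arg);
        R visit(EMul p, A arg);
        R visit(EInt p, A arg);
    }
}

class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp p1, Exp p2) { exp_1 = p1; exp_2 = p2; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp p1, Exp p2) { exp_1 = p1; exp_2 = p2; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}

class EInt extends Exp {
    final Integer integer_;
    EInt(Integer p1) { integer_ = p1; }
    <R, A> R accept(Exp.Visitor<R, A> v, A arg) { return v.visit(this, arg); }
}
```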
What about Haskell? Haskell code is **pure** and cannot have side
effects.
Is Haskell so good for compilers after all?
#NEW
==Side effects in monads==
In Haskell, we have to change the return type of ``value`` to a **monadic**
type in order to get side effects.
The IO monad, which allows input and output, is the most basic one.
Now we can write much as in Java:
```
value :: Exp -> IO Integer
value (EAdd x y) = do
  v1 <- value(x)
  v2 <- value(y)
  let ret = v1 + v2
  print ret
  return ret
value (EMul x y) = do
  v1 <- value(x)
  v2 <- value(y)
  let ret = v1 * v2
  print ret
  return ret
value (EInt i) = do
  let ret = i
  print ret
  return ret
```
#NEW
==A more interesting interpreter==
Language: sequence of assignments to integer variables.
```
Prog. Program ::= [Ass]
AVar. Ass ::= Ident "=" Exp ";"
EAdd. Exp ::= Exp "+" Exp
EMul. Exp ::= Exp "*" Exp
EInt. Exp ::= Integer
EId. Exp ::= Ident
```
Interpreter: return the value of the last assignment.
```
x = 7 ;
y = x + 5 ;
y = x + y * 4 ;
returns: 55
```
Cf. Book, chapter 2.
#NEW
==Symbol tables==
The interpreter must maintain a **symbol table**, mapping
variables to their values. This is how it evolves:
```
x = 7 ;          -- x = 7
y = x + 5 ;      -- x = 7, y = 12
y = x + y * 4 ;  -- x = 7, y = 55
```
Symbol tables are needed in other compiler phases as well.
Before implementing them, let us summarize their main usages:
- **lookup**: return the value of a symbol; error if the symbol is not
defined.
- **update**: change the value of a symbol; add the symbol if it is not
defined.
Updating symbol tables is another side effect common in
syntax-directed translation.
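The two operations can be sketched in Java on top of ``java.util.HashMap``. The ``SymbolTable`` class is hypothetical, and symbols are plain strings rather than BNFC identifiers. The ``example()`` method replays the three assignments above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical SymbolTable class; symbols are plain Strings here.
class SymbolTable {
    private final Map<String, Integer> values = new HashMap<>();

    // lookup: the value of a symbol; an error if it is not defined.
    int lookup(String x) {
        Integer v = values.get(x);
        if (v == null) {
            throw new IllegalStateException("unknown variable " + x);
        }
        return v;
    }

    // update: change the value of a symbol; add it if it is not defined.
    void update(String x, int v) {
        values.put(x, v);
    }

    // The table after the three assignments of the example program.
    static SymbolTable example() {
        SymbolTable t = new SymbolTable();
        t.update("x", 7);                                  // x = 7
        t.update("y", t.lookup("x") + 5);                  // x = 7, y = 12
        t.update("y", t.lookup("x") + t.lookup("y") * 4);  // x = 7, y = 55
        return t;
    }

    public static void main(String[] args) {
        SymbolTable t = example();
        System.out.println(t.lookup("x") + " " + t.lookup("y"));  // prints 7 55
    }
}
```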
#NEW
==Interpreter with symbol tables==
``interpret`` is defined for programs and assignments, returning nothing.
``value`` is defined for assignments and expressions, returning an integer.
```
interpret(s1 ... sn) =
  interpret(s1)
  ...
  interpret(sn)
  print(value(sn))
interpret(v = e) =
  update(v,value(e))
value(v = e) =
  return(value(v))
value (x + y) =
  return(value(x) + value(y))
value (x * y) =
  return(value(x) * value(y))
value (i) =
  return(i)
value (v) =
  return(lookup(v))
```
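A direct Java transcription of these rules might look like the following sketch. The tree classes, the ``AssignDemo`` driver and the use of a ``HashMap`` as the symbol table are all assumptions, and the program is built by hand rather than parsed. Running it on the example program returns 55.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AssignDemo {
    // interpret(s1 ... sn): run each assignment, then return value(sn).
    // Like the rules, this assumes the program is non-empty.
    static int interpret(List<AVar> prog) {
        Map<String, Integer> table = new HashMap<>();
        for (AVar s : prog) {
            interpret(s, table);
        }
        return value(prog.get(prog.size() - 1), table);
    }

    // interpret(v = e) = update(v, value(e))
    static void interpret(AVar s, Map<String, Integer> table) {
        table.put(s.ident_, value(s.exp_, table));
    }

    // value(v = e) = lookup(v)
    static int value(AVar s, Map<String, Integer> table) {
        return lookup(s.ident_, table);
    }

    static int value(Exp e, Map<String, Integer> table) {
        if (e instanceof EAdd) {
            EAdd p = (EAdd) e;
            return value(p.exp_1, table) + value(p.exp_2, table);
        } else if (e instanceof EMul) {
            EMul p = (EMul) e;
            return value(p.exp_1, table) * value(p.exp_2, table);
        } else if (e instanceof EInt) {
            return ((EInt) e).integer_;
        } else if (e instanceof EId) {
            return lookup(((EId) e).ident_, table);
        }
        throw new IllegalArgumentException("unknown expression");
    }

    static int lookup(String v, Map<String, Integer> table) {
        Integer i = table.get(v);
        if (i == null) throw new IllegalStateException("unknown variable " + v);
        return i;
    }

    // x = 7 ; y = x + 5 ; y = x + y * 4 ;
    static List<AVar> example() {
        return Arrays.asList(
            new AVar("x", new EInt(7)),
            new AVar("y", new EAdd(new EId("x"), new EInt(5))),
            new AVar("y", new EAdd(new EId("x"), new EMul(new EId("y"), new EInt(4)))));
    }

    public static void main(String[] args) {
        System.out.println(interpret(example()));  // prints 55
    }
}

class AVar {
    final String ident_;
    final Exp exp_;
    AVar(String v, Exp e) { ident_ = v; exp_ = e; }
}

abstract class Exp {}

class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
}

class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
}

class EInt extends Exp {
    final int integer_;
    EInt(int i) { integer_ = i; }
}

class EId extends Exp {
    final String ident_;
    EId(String v) { ident_ = v; }
}
```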
#NEW
==Implementing symbol tables in Haskell==
Lists ``[(symbol,value)]``, or (better) ``Map symbol value``.
```
lookup :: Ord s => s -> Map s v -> Maybe v
insert :: Ord s => s -> v -> Map s v -> Map s v
```
In the interpreter, the symbol table can be passed as an argument.
```
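import Prelude hiding (lookup)
import Data.Map (Map, lookup, insert)
-- Program, Ass, Exp, Ident: the datatypes generated by BNFC

interpretProgram :: Program -> Map Ident Integer -> Map Ident Integer
interpretProgram (Prog ss) table = interpretAsses ss table

interpretAsses :: [Ass] -> Map Ident Integer -> Map Ident Integer
interpretAsses [] table = table
interpretAsses (s:ss) table =
  let table' = interpretAss s table
  in  interpretAsses ss table'

interpretAss :: Ass -> Map Ident Integer -> Map Ident Integer
interpretAss (AVar v e) table =
  insert v (value e table) table

value :: Exp -> Map Ident Integer -> Integer
value (EAdd x y) table = value x table + value y table
value (EMul x y) table = value x table * value y table
value (EInt i)   _     = i
value (EId v)    table =
  case lookup v table of
    Just i  -> i
    Nothing -> error ("unknown variable " ++ show v)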
```
#NEW
==Implementing symbol tables in Java 1.5==
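Hash maps (``java.util.HashMap``, which implements the ``Map<K,V>`` interface), with methods
```
V get(Object key)       // the value of key, or null if it is absent
V put(K key, V value)   // add or change the value; returns the old value or null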
```
Example:
```
public static class Interpret implements Stm.Visitor