Lecture 6: Syntax-Directed Translation

Programming Languages Course
Aarne Ranta (aarne@chalmers.se)

Book: 5.1, 5.3, 2.7

Syntax-directed translation

Functions that take abstract syntax trees as arguments.

A general technique in compiler phases after parsing:

type checker: from trees to Booleans (or to type-annotated trees)
interpreter: from trees to values
optimizer: from trees to trees
code generator: from trees to target code instructions

Implementable in different programming languages

pattern matching in Haskell
visitors in Java and C++

Help from BNFC: Skeleton files implementing the traversal of trees, returning dummy values that can be hand-edited.

Example 1: interpreter of arithmetic expressions

Grammar

    Exp ::= Exp "+" Exp | Exp "*" Exp | Integer

Syntax-directed translation function returning the value of an expression

    value (x + y) = value(x) + value(y)
    value (x * y) = value(x) * value(y)
    value (i)     = i

There is a case for each syntactic constructor.

Normally, the value for a tree is built from the values for subtrees.

Example run:

    value (4 + 5 * 6)
    = value(4) + value(5 * 6)
    = 4 + (value(5) * value(6))
    = 4 + (5 * 6)
    = 4 + 30
    = 34

Example 2: JVM code generator

Grammar (the same as previous)

    Exp ::= Exp "+" Exp | Exp "*" Exp | Integer

Syntax-directed translation function returning JVM code

    code (x + y) = code(x) \n code(y) \n iadd
    code (x * y) = code(x) \n code(y) \n imul
    code (i)     = bipush i

Example run

    code (4 + 5 * 6)
    = code(4) \n code(5 * 6) \n iadd
    = bipush 4 \n code(5) \n code(6) \n imul \n iadd
    = bipush 4 \n bipush 5 \n bipush 6 \n imul \n iadd

Making translation rules precise

Above, we have used a pseudocode notation where we have shown concrete syntax instead of syntax trees.

We can make this more precise by using the abstract syntax constructors:

    EAdd. Exp ::= Exp "+" Exp 
    EMul. Exp ::= Exp "*" Exp 
    EInt. Exp ::= Integer

Now we can write

    value (EAdd x y) = value(x) + value(y)
    value (EMul x y) = value(x) * value(y)
    value (EInt i)   = i

Implementing translation rules in Haskell

But what is the set of translation rules on the previous slide?

It is exactly a piece of Haskell code defining a function with the type signature

    value :: Exp -> Integer

It is defined by pattern matching on the datatype

    data Exp = EAdd Exp Exp | EMul Exp Exp | EInt Integer

Datatypes and pattern matching make Haskell very usable in compilers.

Now, how can we do this in Java?

Translation rules in Java: datatypes

Recall the way datatypes are implemented in Java:

    public abstract class Exp {}
  
    public class EAdd extends Exp {
      public final Exp exp_1, exp_2;
    }  
  
    public class EMul extends Exp {
      public final Exp exp_1, exp_2;
    }  
  
    public class EInt extends Exp {
      public final Integer integer_;
    }

The most obvious way to implement the value function is to add value() as a method of these classes.

Translation rules in Java: a new method

Let us add the value() method to Exp and its subclasses.

    public abstract class Exp {
      public abstract Integer value() ;
    }
  
    public class EAdd extends Exp {
      public final Exp exp_1, exp_2;
      public Integer value() {return exp_1.value() + exp_2.value() ;}
    }  
  
    public class EMul extends Exp {
      public final Exp exp_1, exp_2;
      public Integer value() {return exp_1.value() * exp_2.value() ;}
    }  
  
    public class EInt extends Exp {
      public final Integer integer_;
      public Integer value() {return integer_ ;}
    }
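The classes above leave out constructors, so they do not compile as shown. A minimal runnable sketch of the same idea follows. The wrapper class ValueMethodDemo, the constructors, and the helper example() are assumptions added for illustration and are not part of the lecture code.

```java
// Sketch: value() as a method of each syntax tree class.
// ValueMethodDemo, the constructors and example() are hypothetical additions.
class ValueMethodDemo {

  static abstract class Exp {
    abstract Integer value();
  }

  static class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    Integer value() { return exp_1.value() + exp_2.value(); }
  }

  static class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    Integer value() { return exp_1.value() * exp_2.value(); }
  }

  static class EInt extends Exp {
    final Integer integer_;
    EInt(Integer i) { integer_ = i; }
    Integer value() { return integer_; }
  }

  // The tree for 4 + 5 * 6, with * binding tighter than +.
  static Exp example() {
    return new EAdd(new EInt(4), new EMul(new EInt(5), new EInt(6)));
  }

  public static void main(String[] args) {
    System.out.println(example().value()); // prints 34
  }
}
```

The result agrees with the example run of value (4 + 5 * 6) earlier in the lecture.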

Attribute grammars

This is the method explained in Book, chapter 5.

Attribute grammars combine parsing rules with translations made as semantic actions, which can be expressed in standard parser tools (Yacc, Bison, CUP, Happy).

Previous example as attribute grammar

    Exp ::= Exp_1 "+" Exp_2  {Exp.value = Exp_1.value + Exp_2.value}
    Exp ::= Exp_1 "*" Exp_2  {Exp.value = Exp_1.value * Exp_2.value}
    Exp ::= Integer          {Exp.value = Integer.intval}

In this course, we assume that semantic actions only build abstract syntax trees (cf. Book 5.3.1). As attribute grammar:

    Exp ::= Exp_1 "+" Exp_2  {Exp.tree = EAdd Exp_1.tree Exp_2.tree}
    Exp ::= Exp_1 "*" Exp_2  {Exp.tree = EMul Exp_1.tree Exp_2.tree}
    Exp ::= Integer          {Exp.tree = EInt Integer}

Translation rules in Java: another new method

Let us now add the code() method to Exp and its subclasses. It will print the code to standard output. To save space, the value() method is not shown.

    public abstract class Exp {
      public abstract void code() ;
    }
  
    public class EAdd extends Exp {
      public final Exp exp_1, exp_2;
      public void code() {
        exp_1.code() ;
        exp_2.code() ;
        System.out.println("iadd") ; 
      }
    }  
  
    public class EMul extends Exp {
      public final Exp exp_1, exp_2;
      public void code() {
        exp_1.code() ;
        exp_2.code() ;
        System.out.println("imul") ; 
      }
    }  
  
    public class EInt extends Exp {
      public final Integer integer_;
      public void code() {
        System.out.println("bipush " + integer_) ; 
      }
    }
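A runnable variant of the code() method is sketched below. It differs from the slide in two ways, both of which are assumptions made for illustration: the instructions are collected in a list instead of being printed directly, and the hypothetical helper push() chooses between bipush, sipush and ldc, since the JVM instruction bipush only accepts a signed byte (-128 to 127).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: code() as a method of each class, appending instructions to a list.
// CodeGenDemo, push(), compile() and example() are hypothetical names.
class CodeGenDemo {

  static abstract class Exp {
    // Appends the instructions for this expression to out.
    abstract void code(List<String> out);
  }

  static class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    void code(List<String> out) { exp_1.code(out); exp_2.code(out); out.add("iadd"); }
  }

  static class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    void code(List<String> out) { exp_1.code(out); exp_2.code(out); out.add("imul"); }
  }

  static class EInt extends Exp {
    final Integer integer_;
    EInt(Integer i) { integer_ = i; }
    void code(List<String> out) { out.add(push(integer_)); }
  }

  // bipush takes a signed byte, sipush a signed 16-bit value;
  // other int constants are loaded with ldc.
  static String push(int i) {
    if (i >= -128 && i <= 127) return "bipush " + i;
    if (i >= -32768 && i <= 32767) return "sipush " + i;
    return "ldc " + i;
  }

  static List<String> compile(Exp e) {
    List<String> out = new ArrayList<String>();
    e.code(out);
    return out;
  }

  // The tree for 4 + 5 * 6.
  static Exp example() {
    return new EAdd(new EInt(4), new EMul(new EInt(5), new EInt(6)));
  }

  public static void main(String[] args) {
    // Prints: bipush 4, bipush 5, bipush 6, imul, iadd (one per line).
    for (String instr : compile(example())) System.out.println(instr);
  }
}
```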

How many translation methods?

Which translation methods should there be in the datatype definitions?

Interpreter, code generator, type checker, pretty printer,...

Well-known problem with object-oriented programming:

it is easy to add new data constructors (subclasses)...
... but it is difficult to define new functions

This is in contrast with functional programming (Haskell)

it is easy to define new functions...
... but it is difficult to add new data constructors

However, the problem can be solved in Java by using visitors.

The visitor method

A general way of defining functions over a class, with any return type R and a supplementary argument of any type A.

    public abstract class Exp {
      public abstract <R,A> R accept(Exp.Visitor<R,A> v, A arg);
      public interface Visitor <R,A> {
        public R visit(Arithm.Absyn.EAdd p, A arg);
        public R visit(Arithm.Absyn.EMul p, A arg);
        public R visit(Arithm.Absyn.EInt p, A arg);
      }
    }
    public class EAdd extends Exp {
      public final Exp exp_1, exp_2;
      public <R,A> R accept(Arithm.Absyn.Exp.Visitor<R,A> v, A arg) { 
        return v.visit(this, arg); 
      }
    }
    public class EInt extends Exp {
      public final Integer integer_;
      public <R,A> R accept(Arithm.Absyn.Exp.Visitor<R,A> v, A arg) { 
        return v.visit(this, arg); 
      }
    }

Translation using the visitor method

  public class Interpreter {
    public Integer value(Exp e) {
      return e.accept(new Value() , null ) ;
    }
    private class Value implements Exp.Visitor<Integer, Object> {
      public Integer visit (EAdd p, Object arg) {
        return value(p.exp_1) + value(p.exp_2) ;
      }
      public Integer visit (EMul p, Object arg) {
        return value(p.exp_1) * value(p.exp_2) ;
      }
      public Integer visit (EInt p, Object arg) {
        return p.integer_ ;
      }
    }
  }
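To see how visitors make it easy to define new functions, here is a self-contained sketch with two visitors over the same classes: the interpreter from above and a fully parenthesizing printer. The wrapper class VisitorDemo, the constructors, the Show visitor and the helper methods are assumptions added for illustration. The Exp classes are written once and are not touched when the second function is added.

```java
// Sketch: two functions over Exp defined as visitors.
// VisitorDemo, Show, value(), show() and example() are hypothetical names.
class VisitorDemo {

  static abstract class Exp {
    abstract <R,A> R accept(Visitor<R,A> v, A arg);
    interface Visitor<R,A> {
      R visit(EAdd p, A arg);
      R visit(EMul p, A arg);
      R visit(EInt p, A arg);
    }
  }

  static class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    <R,A> R accept(Visitor<R,A> v, A arg) { return v.visit(this, arg); }
  }

  static class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    <R,A> R accept(Visitor<R,A> v, A arg) { return v.visit(this, arg); }
  }

  static class EInt extends Exp {
    final Integer integer_;
    EInt(Integer i) { integer_ = i; }
    <R,A> R accept(Visitor<R,A> v, A arg) { return v.visit(this, arg); }
  }

  // First function: the interpreter.
  static class Value implements Exp.Visitor<Integer, Object> {
    public Integer visit(EAdd p, Object arg) { return value(p.exp_1) + value(p.exp_2); }
    public Integer visit(EMul p, Object arg) { return value(p.exp_1) * value(p.exp_2); }
    public Integer visit(EInt p, Object arg) { return p.integer_; }
  }

  // Second function: a printer that parenthesizes every operation.
  static class Show implements Exp.Visitor<String, Object> {
    public String visit(EAdd p, Object arg) { return "(" + show(p.exp_1) + " + " + show(p.exp_2) + ")"; }
    public String visit(EMul p, Object arg) { return "(" + show(p.exp_1) + " * " + show(p.exp_2) + ")"; }
    public String visit(EInt p, Object arg) { return p.integer_.toString(); }
  }

  static Integer value(Exp e) { return e.accept(new Value(), null); }
  static String show(Exp e) { return e.accept(new Show(), null); }

  // The tree for 4 + 5 * 6.
  static Exp example() {
    return new EAdd(new EInt(4), new EMul(new EInt(5), new EInt(6)));
  }

  public static void main(String[] args) {
    System.out.println(show(example()));  // prints (4 + (5 * 6))
    System.out.println(value(example())); // prints 34
  }
}
```

Adding a new constructor, on the other hand, would require a new visit method in the interface and in every existing visitor.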

Translation with side effects

Let us build a "debugger" that prints intermediate values while it evaluates the expression.

    value (x + y) = 
      ret = value(x) + value(y)
      print(ret)
      return ret
    value (x * y) = 
      ret = value(x) * value(y)
      print(ret)
      return ret
    value (i) = 
      ret = i
      print(ret) 
      return ret

Printing the value is a side effect of the interpreter.

The value itself is stored in the variable ret and returned.

Sequence of prints from example run:

    value (4 + 5 * 6):
     4  -- ret = 4
     5  -- ret = 5
     6  -- ret = 6
    30  -- ret = 5 * 6
    34  -- ret = 4 + 30

Implementing side effects

This is very simple in Java: just add a printing statement to the code

    private class Value implements Exp.Visitor<Integer, Object> {
      public Integer visit (EAdd p, Object arg) {
        Integer ret = value(p.exp_1) + value(p.exp_2) ;
        System.out.println(ret) ;
        return ret ;
      }
      public Integer visit (EMul p, Object arg) {
        Integer ret = value(p.exp_1) * value(p.exp_2) ;
        System.out.println(ret) ;
        return ret ;
      }
      public Integer visit (EInt p, Object arg) {
        Integer ret = p.integer_ ;
        System.out.println(ret) ;
        return ret ;
      }
    }
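The side effect does not have to be printing. The following sketch is a variation, not the lecture's code: it uses the supplementary visitor argument to pass a list in which every intermediate value is recorded. The names TraceDemo, trace() and example(), and the constructors, are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: an interpreter whose side effect is appending to a log
// that is passed around as the visitor argument.
// TraceDemo, trace() and example() are hypothetical names.
class TraceDemo {

  static abstract class Exp {
    abstract <R,A> R accept(Visitor<R,A> v, A arg);
    interface Visitor<R,A> {
      R visit(EAdd p, A arg);
      R visit(EMul p, A arg);
      R visit(EInt p, A arg);
    }
  }

  static class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    <R,A> R accept(Visitor<R,A> v, A arg) { return v.visit(this, arg); }
  }

  static class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    <R,A> R accept(Visitor<R,A> v, A arg) { return v.visit(this, arg); }
  }

  static class EInt extends Exp {
    final Integer integer_;
    EInt(Integer i) { integer_ = i; }
    <R,A> R accept(Visitor<R,A> v, A arg) { return v.visit(this, arg); }
  }

  static class Value implements Exp.Visitor<Integer, List<Integer>> {
    public Integer visit(EAdd p, List<Integer> log) {
      // Java evaluates the left operand first, so exp_1 is logged before exp_2.
      Integer ret = p.exp_1.accept(this, log) + p.exp_2.accept(this, log);
      log.add(ret);
      return ret;
    }
    public Integer visit(EMul p, List<Integer> log) {
      Integer ret = p.exp_1.accept(this, log) * p.exp_2.accept(this, log);
      log.add(ret);
      return ret;
    }
    public Integer visit(EInt p, List<Integer> log) {
      Integer ret = p.integer_;
      log.add(ret);
      return ret;
    }
  }

  // Evaluates e and returns the sequence of intermediate values.
  static List<Integer> trace(Exp e) {
    List<Integer> log = new ArrayList<Integer>();
    e.accept(new Value(), log);
    return log;
  }

  // The tree for 4 + 5 * 6.
  static Exp example() {
    return new EAdd(new EInt(4), new EMul(new EInt(5), new EInt(6)));
  }

  public static void main(String[] args) {
    System.out.println(trace(example())); // prints [4, 5, 6, 30, 34]
  }
}
```

The recorded sequence is the same as the sequence of prints in the example run above.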

What about Haskell? Haskell code is pure and cannot have side effects.

Is Haskell so good for compilers after all?

Side effects in monads

In Haskell, we have to change the value type to a monad in order to get side effects.

The IO monad is the most basic one.

Now we can write much like in Java:

    value :: Exp -> IO Integer
    value (EAdd x y) = do
      v1 <- value(x)
      v2 <- value(y)
      let ret = v1 + v2
      print ret
      return ret
    value (EMul x y) = do
      v1 <- value(x)
      v2 <- value(y)
      let ret = v1 * v2
      print ret
      return ret
    value (EInt i) = do
      let ret = i
      print ret
      return ret

A more interesting interpreter

Language: sequence of assignments to integer variables.

    Prog. Program ::= [Ass]
  
    AVar. Ass ::= Ident "=" Exp ";"
  
    EAdd. Exp ::= Exp "+" Exp 
    EMul. Exp ::= Exp "*" Exp 
    EInt. Exp ::= Integer
    EVar. Exp ::= Ident

Interpreter: return the value of the last assignment.

    x = 7 ;
    y = x + 5 ;
    y = x + y * 4 ;
  
    returns: 55

Cf. Book, chapter 2.

Symbol tables

The interpreter must maintain a symbol table, mapping variables to their values. This is how it evolves:

    x = 7 ;            -- x = 7
    y = x + 5 ;        -- x = 7, y = 12
    y = x + y * 4 ;    -- x = 7, y = 55

Symbol tables are needed in other compiler phases as well.

Before implementing them, let us summarize their main operations:

lookup: return the value of a symbol; error if the symbol is not defined.
update: change the value of a symbol; add the symbol if it is not defined.

Updating symbol tables is another side effect common in syntax-directed translation.
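The two operations can be sketched as a small Java class. This is only an illustration under some assumptions: the class name SymbolTable, the use of String for symbols, and the choice of RuntimeException for the error case are not taken from the lecture.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a symbol table with the two operations lookup and update.
// The class name and the use of String keys are hypothetical choices.
class SymbolTable {
  private final Map<String, Integer> table = new HashMap<String, Integer>();

  // lookup: return the value of a symbol; error if the symbol is not defined.
  Integer lookup(String symbol) {
    Integer v = table.get(symbol);
    if (v == null) throw new RuntimeException("unknown variable " + symbol);
    return v;
  }

  // update: change the value of a symbol; add the symbol if it is not defined.
  void update(String symbol, Integer value) {
    table.put(symbol, value);
  }

  public static void main(String[] args) {
    SymbolTable t = new SymbolTable();
    t.update("x", 7);                                 // x = 7
    t.update("y", t.lookup("x") + 5);                 // x = 7, y = 12
    t.update("y", t.lookup("x") + t.lookup("y") * 4); // x = 7, y = 55
    System.out.println(t.lookup("y"));                // prints 55
  }
}
```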

Interpreter with symbol tables

interpret is defined for programs and assignments, returning nothing.

value is defined for assignments and expressions, returning an integer.

    interpret(s1 ... sn) =
      interpret(s1)
      ...
      interpret(sn)
      print(value(sn))
  
    interpret(v = e) =
      update(v,value(e))
  
    value(v = e) =
      return(value(v))
  
    value (x + y) = 
      return(value(x) + value(y))
  
    value (x * y) = 
      return(value(x) * value(y))
  
    value (i) = 
      return(i)
  
    value (v) = 
      return(lookup(v))

Implementing symbol tables in Haskell

Lists [(symbol,value)], or (better) Map symbol value from Data.Map.

    lookup :: Ord s => s -> Map s v -> Maybe v
    insert :: Ord s => s -> v -> Map s v -> Map s v

In the interpreter, the symbol table can be sent as an argument.

    interpretProgram :: Program -> Map Ident Integer -> Map Ident Integer
    interpretProgram (Prog []) table = table
    interpretProgram (Prog (s:ss)) table = 
      let table' = interpretAss s table
      in interpretProgram (Prog ss) table'
  
    interpretAss :: Ass -> Map Ident Integer -> Map Ident Integer
    interpretAss (AVar v e) table =
      insert v (value e table) table
  
    value :: Exp -> Map Ident Integer -> Integer
    value (EAdd x y) table =  
      value x table + value y table
    --
    value (EVar v) table = 
      case lookup v table of
        Just i -> i
        Nothing -> error ("unknown variable " ++ show v)

Implementing symbol tables in Java 1.5

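Hash maps (HashMap, implementing the interface Map<symbol,value>), with methods

    value get (symbol s)            // returns null if s has no value
    value put (symbol s, value v)   // returns the previous value of s, or null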

Example:

    public static class Interpret implements Ass.Visitor<Object,Map<Ident,Integer>> {
      public Object visit(AVar p, Map<Ident, Integer> env) {
        Integer i = p.exp_.accept(new Value(), env) ;
        env.put(p.ident_, i) ;
        return null ;
      }
    }
  
    public static class Value implements Exp.Visitor<Integer, Map<Ident,Integer>> {
  
      public Integer visit(EVar p, Map<Ident,Integer> env) {
        Integer i = env.get(p.ident_);
        if (i != null)
          return i;
        else
          throw //...
      }
    }
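Putting the pieces together, the following self-contained sketch interprets the example program from the slide "A more interesting interpreter". Several simplifications are assumptions made for illustration: identifiers are plain Strings, a program is a List of AVar objects, assignments are processed in a loop rather than by a second visitor, and an undefined variable raises a RuntimeException. The names AssignInterpreterDemo, interpret() and example() are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: interpreter for sequences of assignments, with a HashMap as symbol table.
class AssignInterpreterDemo {

  static abstract class Exp {
    abstract <R,A> R accept(Visitor<R,A> v, A arg);
    interface Visitor<R,A> {
      R visit(EAdd p, A arg);
      R visit(EMul p, A arg);
      R visit(EInt p, A arg);
      R visit(EVar p, A arg);
    }
  }

  static class EAdd extends Exp {
    final Exp exp_1, exp_2;
    EAdd(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    <R,A> R accept(Visitor<R,A> v, A arg) { return v.visit(this, arg); }
  }

  static class EMul extends Exp {
    final Exp exp_1, exp_2;
    EMul(Exp e1, Exp e2) { exp_1 = e1; exp_2 = e2; }
    <R,A> R accept(Visitor<R,A> v, A arg) { return v.visit(this, arg); }
  }

  static class EInt extends Exp {
    final Integer integer_;
    EInt(Integer i) { integer_ = i; }
    <R,A> R accept(Visitor<R,A> v, A arg) { return v.visit(this, arg); }
  }

  static class EVar extends Exp {
    final String ident_;
    EVar(String x) { ident_ = x; }
    <R,A> R accept(Visitor<R,A> v, A arg) { return v.visit(this, arg); }
  }

  // An assignment: ident_ = exp_ ;
  static class AVar {
    final String ident_;
    final Exp exp_;
    AVar(String x, Exp e) { ident_ = x; exp_ = e; }
  }

  static class Value implements Exp.Visitor<Integer, Map<String, Integer>> {
    public Integer visit(EAdd p, Map<String, Integer> env) {
      return p.exp_1.accept(this, env) + p.exp_2.accept(this, env);
    }
    public Integer visit(EMul p, Map<String, Integer> env) {
      return p.exp_1.accept(this, env) * p.exp_2.accept(this, env);
    }
    public Integer visit(EInt p, Map<String, Integer> env) {
      return p.integer_;
    }
    public Integer visit(EVar p, Map<String, Integer> env) {
      Integer i = env.get(p.ident_);   // lookup
      if (i == null) throw new RuntimeException("unknown variable " + p.ident_);
      return i;
    }
  }

  // Runs the assignments in order and returns the value of the last one
  // (null for an empty program).
  static Integer interpret(List<AVar> prog) {
    Map<String, Integer> env = new HashMap<String, Integer>();
    Integer last = null;
    for (AVar a : prog) {
      last = a.exp_.accept(new Value(), env);
      env.put(a.ident_, last);         // update
    }
    return last;
  }

  // x = 7 ; y = x + 5 ; y = x + y * 4 ;
  static List<AVar> example() {
    return Arrays.asList(
      new AVar("x", new EInt(7)),
      new AVar("y", new EAdd(new EVar("x"), new EInt(5))),
      new AVar("y", new EAdd(new EVar("x"), new EMul(new EVar("y"), new EInt(4)))));
  }

  public static void main(String[] args) {
    System.out.println(interpret(example())); // prints 55
  }
}
```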