Lecture 8: Implementing Type Checking
Programming Languages Course
Aarne Ranta (aarne@cs.chalmers.se)

%!target:html

%!postproc(html): #NEW <!-- NEW -->
%!postproc(html): #HR <HR>
%!postproc(html): #sub1 <sub>1</sub>
%!postproc(html): #subn <sub>n</sub>


Book: 6.3, 6.5


#NEW

==Plan==

From typing rules to type checking code.

Type checker implementation in Haskell.

Type checker implementation in Java.

The location of errors

Extra: scoping puzzles in real C++


#NEW

==From typing rules to type checking code==

Basic idea: from rule
```
   J#sub1 ...  J#subn
   ---------- C
       J
```
generate the code "upside down"
```
  check J =
    check J#sub1
    ...
    check J#subn
    check_condition C
```
Example:
```
  Env => exp1 : bool     Env => exp2 : bool     check Env => exp1 && exp2 : bool =
  -----------------------------------------       check Env => exp1 : bool
       Env => exp1 && exp2 : bool                 check Env => exp2 : bool
```


#NEW

==From typing rules to type checking code: more examples==

Judgements are easy: recursive calls to check.
```
  Env => exp : bool   Env => stm valid      check Env => while (exp) stm valid =
  ------------------------------------        check Env => exp : bool   
      Env => while (exp) stm valid            check Env => stm valid
```
Side conditions can be arbitrary code, so they require more thought.
```
  ---------------- var : typ is in Env      check Env => var : typ =
  Env => var : typ                            check_condition lookup(var,Env) == typ
```
It is ``lookup`` and similar side conditions that ultimately generate the error messages.
```
  lookup(var,Env) = message ("variable " var " not found") // if var is not in Env

  check_condition x == y = message ("expected " y " but found " x) // if not equal
```
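
As a concrete illustration, the variable rule and its side condition could be written in Java roughly as follows. This is only a sketch: ``RuleToCodeSketch``, ``TypeError``, ``checkCondition`` and the two-valued ``Type`` enum are names invented here, and the environment is simplified to a single map.

```
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch; all names here are hypothetical, not the lab's API.
class RuleToCodeSketch {
    enum Type { BOOL, INT }

    // Thrown by lookup and by failed side conditions.
    static class TypeError extends RuntimeException {
        TypeError(String message) { super(message); }
    }

    // Side condition "var : typ is in Env": fails if the variable is missing.
    static Type lookup(String name, Map<String, Type> env) {
        Type typ = env.get(name);
        if (typ == null) throw new TypeError("variable " + name + " not found");
        return typ;
    }

    // check_condition x == y
    static void checkCondition(Type found, Type expected) {
        if (found != expected)
            throw new TypeError("expected " + expected + " but found " + found);
    }

    // check Env => var : typ
    static void checkVar(Map<String, Type> env, String name, Type typ) {
        checkCondition(lookup(name, env), typ);
    }

    public static void main(String[] args) {
        Map<String, Type> env = new HashMap<>();
        env.put("b", Type.BOOL);
        checkVar(env, "b", Type.BOOL);          // succeeds silently
        try {
            checkVar(env, "b", Type.INT);       // side condition fails
        } catch (TypeError e) {
            System.out.println(e.getMessage()); // expected INT but found BOOL
        }
    }
}
```

The checking function itself contains no error text; both messages come from ``lookup`` and ``checkCondition``.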


#NEW

==The need for type inference==

There is a grammar rule saying that expressions can be used as statements:
```
  Stm ::= Exp ";"
```
How do we check that such statements are valid?
```
  Env => exp : ?
  ------------------
  Env => exp ; valid
```
The problem is that we have no type ``typ`` to check ``exp : typ``.

Solution 1: check ``exp`` with each of the four types
```
  check Env => exp ; valid =
    try each typ in  [bool,double,int,void]: 
      check exp : typ
```
This is inefficient, and does not scale up to infinitely many types.

Solution 2: do type inference with ``exp``. If it succeeds, the statement
is valid, because expressions of any type can be used as statements.


#NEW

==Type inference==

The general scheme is a rule where the conclusion has a type depending in
some way on the premises and the condition:
```
         J#sub1 ...  J#subn
  --------------------------------- C
  Env => exp : typ(J#sub1, ..., J#subn, C)
```
We should then use recursive calls of ``check`` and ``infer`` so that
- everything we need for constructing the type is inferred
- everything else is just checked


Often the type is independent of the premises (which still have to be checked of course!):
```
  Env => exp1 : bool      Env => exp2 : bool     infer Env (exp1 && exp2) =
  ------------------------------------------       check Env => exp1 : bool
       Env => exp1 && exp2 : bool                  check Env => exp2 : bool
                                                   return bool
```                                
It can also come from the condition:
```
      ---------------- var : typ is in Env       infer Env var =
      Env => var : typ                             return lookup(var,Env)
```


#NEW

==Type checking overloaded operations==

Arithmetic operations in most languages are **overloaded**.

This means that they apply to many types.

The general rule for ``+ - * /`` is: both operands have the same type as the value,
which must be ``int`` or ``double``.
```
  Env => exp1 : typ    Env => exp2 : typ
  -------------------------------------- typ is int or double
      Env => exp1 + exp2 : typ
```
What we do is infer the type of the first operand and check the second.
```
  infer Env (exp1 + exp2) =
    typ := infer Env exp1
    check_condition typ == int or typ == double
    check Env => exp2 : typ
    return typ
```
The comparison operators are overloaded in the same way, but
their result type is always ``bool``.
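
The "infer the first operand, check the second" recipe can be sketched in Java as below. The names ``OverloadSketch``, ``Lit``, ``Add``, ``Less`` and ``inferNumericBin`` are assumptions made for this example; ``Lit`` is a stand-in for any expression whose type is already known, so no environment is needed.

```
// Illustrative sketch; class and method names are hypothetical.
class OverloadSketch {
    enum Type { BOOL, INT, DOUBLE }

    static class TypeError extends RuntimeException {
        TypeError(String message) { super(message); }
    }

    interface Exp {}
    // Stand-in for any expression of a known type.
    static class Lit implements Exp {
        final Type type;
        Lit(Type type) { this.type = type; }
    }
    static class Add implements Exp {
        final Exp left, right;
        Add(Exp left, Exp right) { this.left = left; this.right = right; }
    }
    static class Less implements Exp {
        final Exp left, right;
        Less(Exp left, Exp right) { this.left = left; this.right = right; }
    }

    // Infer the first operand, require int or double, check the second.
    static Type inferNumericBin(Exp left, Exp right) {
        Type typ = infer(left);
        if (typ != Type.INT && typ != Type.DOUBLE)
            throw new TypeError("expected int or double but found " + typ);
        Type typ2 = infer(right);
        if (typ2 != typ)
            throw new TypeError("expected " + typ + " but found " + typ2);
        return typ;
    }

    static Type infer(Exp exp) {
        if (exp instanceof Lit) return ((Lit) exp).type;
        if (exp instanceof Add) {
            Add add = (Add) exp;
            return inferNumericBin(add.left, add.right); // value has operand type
        }
        if (exp instanceof Less) {
            Less less = (Less) exp;
            inferNumericBin(less.left, less.right);      // operands checked the same way
            return Type.BOOL;                            // but the result is bool
        }
        throw new TypeError("unknown expression");
    }

    public static void main(String[] args) {
        System.out.println(infer(new Add(new Lit(Type.INT), new Lit(Type.INT))));        // INT
        System.out.println(infer(new Less(new Lit(Type.DOUBLE), new Lit(Type.DOUBLE)))); // BOOL
    }
}
```

In this sketch mixing ``int`` and ``double`` operands is rejected, as the rule above demands equal operand types.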


#NEW

==Relating inference and checking==

Now we can check expression statements:
```
  check Env => exp ; valid =
    infer Env exp
```
If ``infer`` fails, we get any error message it generates.

If ``infer`` succeeds, we discard the type.


In the same way, we only need to write ``infer`` for expressions.
Then we define ``check`` uniformly,
```
  check Env => exp : typ =
    typ2 := infer Env exp
    check_condition typ2 == typ
```
The ``check_condition`` call usually returns a message on failure, e.g.
```
    TYPE ERROR
    type of exp: expected typ, inferred typ2
```
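
The relation between the two functions can be sketched in Java as follows: ``infer`` is written per expression form, ``check`` is defined once in terms of it, and an expression statement simply discards the inferred type. All names (``CheckViaInferSketch``, ``Var``, ``And``, ``checkExpStm``) and the map-based environment are assumptions for this example.

```
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch; class and method names are hypothetical.
class CheckViaInferSketch {
    enum Type { BOOL, INT }

    static class TypeError extends RuntimeException {
        TypeError(String message) { super(message); }
    }

    interface Exp {}
    static class Var implements Exp {
        final String name;
        Var(String name) { this.name = name; }
    }
    static class And implements Exp {
        final Exp left, right;
        And(Exp left, Exp right) { this.left = left; this.right = right; }
    }

    // Written once per expression form.
    static Type infer(Map<String, Type> env, Exp exp) {
        if (exp instanceof Var) {
            String name = ((Var) exp).name;
            Type typ = env.get(name);
            if (typ == null) throw new TypeError("variable " + name + " not found");
            return typ;
        }
        if (exp instanceof And) {
            And and = (And) exp;
            check(env, and.left, Type.BOOL);
            check(env, and.right, Type.BOOL);
            return Type.BOOL;
        }
        throw new TypeError("unknown expression");
    }

    // Defined uniformly: infer, then compare with the expected type.
    static void check(Map<String, Type> env, Exp exp, Type typ) {
        Type typ2 = infer(env, exp);
        if (typ2 != typ)
            throw new TypeError("expected " + typ + ", inferred " + typ2);
    }

    // check Env => exp ; valid  (the inferred type is discarded)
    static void checkExpStm(Map<String, Type> env, Exp exp) {
        infer(env, exp);
    }

    public static void main(String[] args) {
        Map<String, Type> env = new HashMap<>();
        env.put("b", Type.BOOL);
        env.put("n", Type.INT);
        checkExpStm(env, new Var("n"));         // any inferable type is accepted
        try {
            checkExpStm(env, new And(new Var("b"), new Var("n")));
        } catch (TypeError e) {
            System.out.println(e.getMessage()); // expected BOOL, inferred INT
        }
    }
}
```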


#NEW

==The top-level checkers==

To check the whole program,
+ collect the types of each function into the signature
+ check that function names are unique
+ check each function definition using the signature


To check a function definition
+ check that argument variables are unique
+ initialize the topmost context with the argument variables
+ check the body in this context
+ check that there is a ``return``, with an expression
  that has the expected return type of the function (or just 
  a ``return`` if the type is ``void``)


To check a sequence of statements
+ check the validity of the first statement and update the environment
  if appropriate
+ check the remaining sequence in the new environment
+ an empty sequence is always valid
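
The statement-sequence recipe and the uniqueness check could look like this in Java. This is a sketch under simplifying assumptions: the environment is one flat map, there are only two statement forms (``Decl`` and ``Use``, both invented here), and ``checkUnique`` works on plain strings.

```
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch; class and method names are hypothetical.
class SequenceSketch {
    enum Type { BOOL, INT }

    static class TypeError extends RuntimeException {
        TypeError(String message) { super(message); }
    }

    interface Stm {}
    // Declaration "type name;"
    static class Decl implements Stm {
        final Type type; final String name;
        Decl(Type type, String name) { this.type = type; this.name = name; }
    }
    // Expression statement "name;"
    static class Use implements Stm {
        final String name;
        Use(String name) { this.name = name; }
    }

    // Checks one statement and returns the possibly extended environment.
    static Map<String, Type> checkStm(Map<String, Type> env, Stm stm) {
        if (stm instanceof Decl) {
            Decl decl = (Decl) stm;
            if (env.containsKey(decl.name))
                throw new TypeError("variable " + decl.name + " already declared");
            Map<String, Type> env2 = new HashMap<>(env);
            env2.put(decl.name, decl.type);
            return env2;
        }
        Use use = (Use) stm;
        if (!env.containsKey(use.name))
            throw new TypeError("variable " + use.name + " not found");
        return env;
    }

    // First statement, then the rest in the new environment.
    static void checkStms(Map<String, Type> env, List<Stm> stms) {
        if (stms.isEmpty()) return;             // an empty sequence is always valid
        Map<String, Type> env2 = checkStm(env, stms.get(0));
        checkStms(env2, stms.subList(1, stms.size()));
    }

    // Used for function names and for argument variables.
    static void checkUnique(List<String> names) {
        Set<String> seen = new HashSet<>();
        for (String name : names)
            if (!seen.add(name)) throw new TypeError("duplicate name " + name);
    }

    public static void main(String[] args) {
        Map<String, Type> env = new HashMap<>();
        checkStms(env, Arrays.<Stm>asList(new Decl(Type.INT, "x"), new Use("x")));
        checkUnique(Arrays.asList("main", "f"));
        System.out.println("OK");
    }
}
```

Swapping the two statements in ``main`` makes the check fail, because ``x`` is then used in an environment that does not yet contain it.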


#NEW

==Type checker in Haskell==

You can copy the contents of
[``laborations/lab2/haskell/`` ../laborations/lab2/haskell]:
```
  CPP.cf           -- grammar
  lab2.hs          -- main module
  Makefile         
  TypeChecker.hs   -- type checking module
```
You only have to modify ``CPP.cf`` and ``TypeChecker.hs``.

But you can already compile them: just type
```
  make
```
and run the type checker with
```
  ./lab2 <File>
```
The rest is "debugging the empty file"!


#NEW

===The Main module===

You don't have to write this - just copy the file
[``laborations/lab2/haskell/lab2.hs`` ../laborations/lab2/haskell/lab2.hs].

This file shows how compiler phases are linked together.
```
  check :: String -> IO () 
  check s = case pProgram (myLexer s) of
              Bad err  -> do putStrLn "SYNTAX ERROR"
                             putStrLn err
                             exitFailure 
              Ok  tree -> case typecheck tree of
                            Bad err -> do putStrLn "TYPE ERROR"
                                          putStrLn err
                                          exitFailure 
                            Ok _    -> putStrLn "OK"
```
In other words: call the parser; if it succeeds, call the type checker.

Notice the use of the **error type**,
```
  data Err a = Ok a | Bad String
```
The value is either ``Ok`` of the expected type or ``Bad``
with an error message.


#NEW

===Using the Err type===

The ``Err`` type 
```
  data Err a = Ok a | Bad String
```
is a **monad** - a type of actions returning ``a`` but also doing
other things (in this case: exceptions).

Monad actions can be **sequence**d: if
```
  inferExp :: Env -> Exp -> Err Type 
```
then you can make several inferences one after the other by using ``do``
```
  do inferExp env exp1
     inferExp env exp2
```
You can **bind** variables returned from actions, and **return**
values.
```
  do typ1 <- inferExp env exp1
     typ2 <- inferExp env exp2
     return TBool
```
If you are only interested in side effects, use the dummy value type
``()`` (corresponds to ``void`` in C and Java).


#NEW

==Symbol tables==

Environment type
```
  type Env = (Sig,[Context])
  type Sig = [(Id,([Type],Type))]    -- or Map Id ([Type],Type)
  type Context = [(Id,Type)]         -- or Map Id Type
```
Auxiliary operations on the environment
```
  lookVar   :: Env -> Id -> Err Type
  lookFun   :: Env -> Id -> Err ([Type],Type)
  updateVar :: Env -> Id -> Type -> Err Env
  updateFun :: Env -> Id -> ([Type],Type) -> Err Env
  newBlock  :: Env -> Err Env
  emptyEnv  :: Env
```
Keep the datatypes abstract, i.e. use them only via these operations.
Then you can switch to another implementation if needed (more efficient,
more stuff in the environment).


#NEW

===The TypeChecker module===

The environment datatypes and operations.

Type signatures of the checking functions
```
  typecheck :: Program -> Err ()                -- required function in lab2
  checkDef  :: Env -> Def -> Err ()             -- check a function definition
  checkStms :: Env -> Type -> [Stm] -> Err ()
  checkStm  :: Env -> Type -> Stm -> Err Env
  checkExp  :: Env -> Type -> Exp -> Err ()
  inferExp  :: Env -> Exp  -> Err Type
```
Some other auxiliaries.
```
  checkUnique    :: (Ord a, Print a) => [a] -> Err ()
  checkCondition :: Bool -> Err ()
```


#NEW

===Some examples of checking===

```
  checkStm :: Env -> Type -> Stm -> Err Env
  checkStm env val x = case x of
    SExp exp  -> do
      inferExp env exp
      return env
    SDecl type' id  -> 
      updateVar env id type'   -- also check that id is not in context already
    SWhile exp stm  -> do
      checkExp env Type_bool exp
      checkStm env val stm
      return env               -- declarations in the body do not escape the loop

  checkExp :: Env -> Type -> Exp -> Err ()
  checkExp env typ exp = do
    typ2 <- inferExp env exp
    if (typ2 == typ) then
        return ()
      else
        fail $ "type of " ++ printTree exp -- ... 
```

#NEW

===Some examples of type inference===

```
  inferExp :: Env -> Exp -> Err Type
  inferExp env x = case x of
    ETrue      -> return Type_bool
    EInt n     -> return Type_int
    EId id     -> lookVar env id
    EPIncr exp -> inferNumeric env exp
    ETimes exp0 exp -> inferNumericBin env exp0 exp

  inferNumeric :: Env -> Exp -> Err Type
  inferNumeric env exp = do
    typ <- inferExp env exp
    if elem typ [Type_int, Type_double] then
        return typ
      else
        fail $ "type of expression " ++ printTree exp -- ...

  inferNumericBin :: Env -> Exp -> Exp -> Err Type
```


#NEW

==Type checker in Java==

You can copy the contents of
[``laborations/lab2/java/`` ../laborations/lab2/java1.5]:
```
  CPP.cf             -- grammar
  lab2               -- script running the type checker
  lab2.java          -- main program
  Makefile
  TypeChecker.java   -- type checker class
  TypeException.java -- exceptions for type checking
```
You only have to modify ``CPP.cf`` and ``TypeChecker.java``.

But you can already compile them: just type
```
  make
```
and run the type checker with
```
  ./lab2 <File>
```
The rest is "debugging the empty file"!

Before ``make``, you may have to set your class path so that it finds
java_cup and JLex, as well as the current directory.
```
  export CLASSPATH=.:<path-to-JLex>:<path-to-CUP>:$CLASSPATH
```


#NEW

===The Main module===

This is given in
[``laborations/lab2/java/lab2.java`` ../laborations/lab2/java1.5/lab2.java],
hence you don't have to write this.

It shows how compiler phases are linked together.
```
try {
	l = new Yylex(new FileReader(args[0]));
	parser p = new parser(l);
	CPP.Absyn.Program parse_tree = p.pProgram();
	new TypeChecker().typecheck(parse_tree);

} catch (TypeException e) {
	System.out.println("TYPE ERROR");
	System.err.println(e.toString());
	System.exit(1);
} catch (IOException e) {
	System.err.println(e.toString());
	System.exit(1);
} catch (Throwable e) {
	System.out.println("SYNTAX ERROR");
	System.out.println("At line " + String.valueOf(l.line_num()) 
			   + ", near \"" + l.buff() + "\" :");
	System.out.println("     " + e.getMessage());
	System.exit(1);
}
```


#NEW

==Symbol tables==

Environment types
```
  public static class FunType {
    public LinkedList<Type> args ;
    public Type val ;
  }

  public static class Env {
    public Map<String,FunType> signature ;
    public LinkedList<Map<String,Type>> contexts ;

    // instance methods: they read and update the fields above
    public Type lookVar(String id) { ... }
    public FunType lookFun(String id) { ... }
    public void updateVar(String id, Type ty) { ... }
    // ...
  }
```
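
One possible way to fill in the variable part of such an environment is sketched below. It is not the lab's reference solution: ``Type`` is replaced by a stand-in enum, ``TypeException`` is made unchecked for brevity, the fields are private, and ``newBlock`` and ``exitBlock`` are names assumed here for entering and leaving a scope.

```
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

// Illustrative sketch; not the lab's reference implementation.
class EnvSketch {
    // Stand-in for the BNFC-generated Type class.
    enum Type { INT, DOUBLE, BOOL, VOID }

    // Unchecked here for brevity (an assumption of this sketch).
    static class TypeException extends RuntimeException {
        TypeException(String message) { super(message); }
    }

    static class Env {
        // One map per block; the innermost block is first.
        private final LinkedList<Map<String, Type>> contexts =
            new LinkedList<Map<String, Type>>();

        Env() { newBlock(); }

        // Search from the innermost block outwards.
        Type lookVar(String id) {
            for (Map<String, Type> context : contexts) {
                Type typ = context.get(id);
                if (typ != null) return typ;
            }
            throw new TypeException("variable " + id + " not found");
        }

        // Declare in the innermost block; redeclaration there is an error.
        void updateVar(String id, Type ty) {
            Map<String, Type> top = contexts.getFirst();
            if (top.containsKey(id))
                throw new TypeException("variable " + id + " already declared in this block");
            top.put(id, ty);
        }

        void newBlock()  { contexts.addFirst(new HashMap<String, Type>()); }
        void exitBlock() { contexts.removeFirst(); }
    }

    public static void main(String[] args) {
        Env env = new Env();
        env.updateVar("x", Type.INT);
        env.newBlock();
        env.updateVar("x", Type.BOOL);          // shadows the outer x
        System.out.println(env.lookVar("x"));   // BOOL
        env.exitBlock();
        System.out.println(env.lookVar("x"));   // INT
    }
}
```

Because callers only use ``lookVar``, ``updateVar`` and the block operations, the list of maps could later be swapped for another representation without touching the checker.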


#NEW

===The TypeChecker class===

The environment datatypes and operations.

An enumeration of codes for types.
```
  public static enum TypeCode { CodeInt, CodeDouble, CodeBool, CodeVoid } ;
```
Notice that ``TypeCode`` is not the same as the class
``Type``, which is the syntactic category of source-language types.
We need ``TypeCode`` to be able to compare types for equality,
and this happens when we compare an expected type with an inferred type.

Type signatures of the checking methods
```
  public void typecheck(Program p) {
  }

  public static class CheckStm implements Stm.Visitor<Env,Env> {
    public Env visit(SDecl p, Env env) {
    }
    public Env visit(SExp p, Env env) {
    }
    // ...
  }

  public static class InferExp implements Exp.Visitor<Type,Env> {
    public Type visit(EInt p, Env env) {
    }
    public Type visit(EAdd p, Env env) {
    }
    // ...

  }
```

#NEW

===Some examples of checking===

```
public static class CheckStm implements Stm.Visitor<Env,Env> {

    public Env visit(SDecl p, Env env) {
      env.updateVar(p.id_,p.type_) ;
      return env ;
    }

   //...
  }
```

#NEW

===Some examples of type inference===

```
  public static class InferExp implements Exp.Visitor<Type,Env> {

    public Type visit(CPP.Absyn.EPlus p, Env env) {
      Type t1 = p.exp_1.accept(this, env);
      Type t2 = p.exp_2.accept(this, env);

      if (typeCode(t1) == TypeCode.CodeInt && typeCode(t2) == TypeCode.CodeInt)
        return TInt;
      else if (typeCode(t1) == TypeCode.CodeDouble && typeCode(t2) == TypeCode.CodeDouble)
        return TDouble;
      else
        throw new TypeException("Operands to + must be int or double.");
    }
    //...
  }
```
The function ``typeCode`` converts source language types to their codes:
```
  public static TypeCode typeCode (Type ty) ...
```
It can be implemented by using a visitor or the ``instanceof`` operator.
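
A minimal sketch of the ``instanceof`` variant follows. The classes ``Type_int``, ``Type_double``, ``Type_bool`` and ``Type_void`` are stand-ins defined here for the example; the real class names are generated by BNFC from the labels in your grammar.

```
// Illustrative sketch; the Type_* classes are stand-ins for generated ones.
class TypeCodeSketch {
    enum TypeCode { CodeInt, CodeDouble, CodeBool, CodeVoid }

    static abstract class Type {}
    static class Type_int extends Type {}
    static class Type_double extends Type {}
    static class Type_bool extends Type {}
    static class Type_void extends Type {}

    // Maps a syntax tree for a type to its code.
    static TypeCode typeCode(Type ty) {
        if (ty instanceof Type_int)    return TypeCode.CodeInt;
        if (ty instanceof Type_double) return TypeCode.CodeDouble;
        if (ty instanceof Type_bool)   return TypeCode.CodeBool;
        if (ty instanceof Type_void)   return TypeCode.CodeVoid;
        throw new IllegalArgumentException("unknown type");
    }

    public static void main(String[] args) {
        Type t1 = new Type_int();
        Type t2 = new Type_int();
        System.out.println(t1 == t2);                     // false: two distinct tree objects
        System.out.println(typeCode(t1) == typeCode(t2)); // true: same type code
    }
}
```

The ``main`` method shows why the codes are useful: two syntax trees for the same type are different objects, while their codes compare equal with ``==``.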


#NEW

===More help===

You don't need to debug completely empty files:
- for the grammars, you can pick rules from your Lab 1 (as indicated in Lab 2 PM)
- for the type checker, you can start from the "mini" implementation, in
  [``laborations/mini`` ../laborations/mini]


#NEW

==The location of errors==

BNFC abstract syntax does not normally record locations in the source file.

This means that the type checker (and any later phase) cannot report
where in the source an error occurred.

However, it is possible to record the position of any token type: use
```
  position token Id ...
```
instead of just ``token``. 

This works at the moment only in Haskell and C++ with STL, and is
therefore not required in the lab.

Haskell representation:
```
  newtype Id = Id ((Int,Int),String) -- line, column, ident
```
C++ representation:
```
  class Id : public Visitable {
  public:
    String string_;    // ident
    Integer integer_;  // line
    // ...
  };
```


#NEW

==Extra: scoping puzzles in real C++==

What does the following print, or is it even valid?
```
  if (int x = 2)
    int x = 4;

  std::cout << x ;

  while (int y = 2)
    int y = 4;

  std::cout << y ;

  while (int z = 2)
    {int z = 4;}

  std::cout << z ;

  while (int u = 2)
    {{int u = 4;}}

  std::cout << u ;
```
Can you initialize a variable by itself?
```
  int x = x;
```