Lecture 7: Type Checking
Programming Languages Course
Aarne Ranta (aarne@chalmers.se)

%!target:html

%!postproc(html): #NEW <!-- NEW -->
%!postproc(html): #HR <HR>
%!postproc(html): #sub1 <sub>1</sub>
%!postproc(html): #subn <sub>n</sub>


Book: 6.3, 6.5


#NEW

==The purpose of types==

To define what the program should do.
- e.g. read an array of integers and return a double


To guarantee that the program is meaningful.
- that it does not add a string to an integer
- that variables are declared before they are used


To document the programmer's intentions.
- better than comments, which are not checked by the compiler


To optimize the use of hardware.
- reserve the minimal amount of memory, but not more
- use the most appropriate machine instructions


#NEW

==What belongs to type checking==

Depending on the language, the type checker can prevent
- application of a function to the wrong number of arguments,
- application of integer functions to floats,
- use of undeclared variables in expressions,
- functions that do not return values,
- division by zero,
- array indices out of bounds,
- nonterminating recursion,
- sorting algorithms that don't sort...


Languages differ greatly in how strict their
static semantics is: none of the things above is
checked by all programming languages!

In general, the more static checking the compiler does,
the less manual debugging is needed.


#NEW

==Description formats for different compiler phases==

These formats are independent of implementation language.

- Lexer: regular expressions

- Parser: BNF grammars

- **Type checker: typing rules**

- Interpreter: operational semantic rules

- Code generator: compilation schemes


#NEW

==Typing judgements and rules==

Typing rules concern **judgements** of the form
```
   E => e : T
```
where //E// is an **environment**, which contains e.g. typings of identifiers. 
The judgement says
- in the environment //E//, expression //e// has type //T//


Judgements are used in **typing rules** of the form
```
   J1  J2  ...  Jn
   --------------- C
         J
```
(n >= 0) which says
- from the judgements //J1, J2, ..., Jn// you may conclude //J//, 
  if condition //C// holds.


The judgements above the line in a rule are the **premisses**. 

The judgement under the line is the **conclusion**. 

The condition ``C`` beside the line is a
**side condition**, typically not expressible as a judgement,
and therefore not a premiss.

The judgements are written in a formal language, whereas side conditions
can be written in natural language.


#NEW

==Examples of typing rules and derivation==

Typing rules for arithmetic expressions
```
  E => e1 : int     E => e2 : int    E => e1 : int     E => e2 : int
  -------------------------------    -------------------------------
       E => e1 + e2 : int                E => e1 * e2 : int


       ---------- x : T is in E      ------------ i is an integer literal
       E => x : T                    E => i : int
```

Derivation of judgement ``x : int, y : int => x + 12 * y : int``
```
                            x:int, y:int => 12 : int    x:int, y:int => y : int
                            ---------------------------------------------------  
    x:int, y:int => x : int        x:int, y:int => 12 * y : int
    -----------------------------------------------------------
                x:int, y:int => x + 12 * y : int
```
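
The four rules can be read almost directly as a recursive function. Below is a minimal Haskell sketch, assuming a hypothetical mini-language (the names ``Type``, ``Exp``, ``infer`` and ``intOp`` are illustrative, not part of the lab code): ``infer env e`` returns ``Just t`` exactly when ``E => e : t`` is derivable, and the environment is an association list.
```
-- Hypothetical types for a tiny language covering exactly the rules above.
-- TBool appears in no rule; it only lets us show a failing case.
data Type = TInt | TBool deriving (Eq, Show)

data Exp
  = EVar String   -- variable x
  | ELit Integer  -- integer literal i
  | EAdd Exp Exp  -- e1 + e2
  | EMul Exp Exp  -- e1 * e2
  deriving Show

-- The environment E: typings of identifiers.
type Env = [(String, Type)]

-- infer env e == Just t exactly when  E => e : t  is derivable.
infer :: Env -> Exp -> Maybe Type
infer env (EVar x)     = lookup x env  -- side condition: x : T is in E
infer _   (ELit _)     = Just TInt     -- side condition: i is an integer literal
infer env (EAdd e1 e2) = intOp env e1 e2
infer env (EMul e1 e2) = intOp env e1 e2

-- Shared shape of the + and * rules: both premisses must derive int.
intOp :: Env -> Exp -> Exp -> Maybe Type
intOp env e1 e2
  | infer env e1 == Just TInt && infer env e2 == Just TInt = Just TInt
  | otherwise                                              = Nothing
```
For example, ``infer [("x", TInt), ("y", TInt)] (EAdd (EVar "x") (EMul (ELit 12) (EVar "y")))`` evaluates to ``Just TInt``, following the derivation above; with ``y : bool`` in the environment it gives ``Nothing``.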


#NEW
==Signature vs. context==

We generalize the type checking context to an **environment** with
two parts:
- **signature**, which shows the types of functions
- **context**, which shows the types of variables.


In the course of type checking, the signature remains the same 
throughout a program module, whereas
the context changes all the time.


#NEW

==Function types==

No expression in the language of Lab 2 has **function types**, 
because functions are never returned as values or used as arguments.

However, the compiler needs internally 
a data structure for function types, to hold the
types of the parameters and the return type. E.g. for a function
```
  bool between (int x, double a, double b) {...}
```
we write
```
  between : (int, double, double) -> bool
```
to express this internal representation in typing rules.


#NEW

==Notation for signature and context==

Dividing the environment into a signature ``F`` and a context ``G``, we write
```
  F,G => J
```
Example: typing rule for a variable expression
```
  ------------- x : T is in G                
  F,G => x : T
```
Example: typing rule for one-place function application
```
  F,G => e : A
  --------------- f : (A) -> T is in F                
  F,G => f(e) : T
```
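
A sketch of these rules in Haskell, generalized to //n// arguments. The ``FunType`` record stands for the internal representation ``(A1,...,An) -> T`` of the previous slide; all names are hypothetical. Comparing the whole list of inferred argument types with the parameter list also catches a wrong number of arguments.
```
data Type = TInt | TDouble | TBool deriving (Eq, Show)

-- Internal representation of (A1,...,An) -> T: never the type of an expression.
data FunType = FunType [Type] Type deriving (Eq, Show)

data Exp = EVar String | EApp String [Exp] deriving Show

type Sig     = [(String, FunType)]  -- F: types of functions
type Context = [(String, Type)]     -- G: types of variables

infer :: Sig -> Context -> Exp -> Maybe Type
-- side condition: x : T is in G
infer _ ctx (EVar x) = lookup x ctx
-- side condition: f : (A1,...,An) -> T is in F; premisses: ei : Ai
infer sig ctx (EApp f args) = do
  FunType params result <- lookup f sig
  argTypes <- mapM (infer sig ctx) args
  -- comparing whole lists also rejects a wrong number of arguments
  if argTypes == params then Just result else Nothing
```
With ``between : (int, double, double) -> bool`` in the signature, a call with three correctly typed arguments gets type ``bool``, and any other call is rejected.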


#NEW

==The validity of statements==

Expressions have types, but statements do not.

However, statements are also checked in type checking.

We need a new judgement form, saying that a statement ``S`` is valid:
```
  F,G => S valid
```
Example: typing rule for an assignment
```
  F,G => e : T
  -------------------- x : T is in G                
  F,G => x = e ; valid
```
Example: typing rule for while loops
```
  F,G => e : bool   F,G => S valid
  --------------------------------
  F,G => while (e) S valid
```
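
Since statements have no type, a checker for the validity judgement can simply return a ``Bool``. A minimal sketch of the assignment and while rules, with the signature omitted and hypothetical constructor names; ``<`` is included only so that a ``bool`` condition can be written.
```
data Type = TInt | TBool deriving (Eq, Show)

data Exp = EVar String | EInt Integer | ETrue | ELt Exp Exp deriving Show

data Stm
  = SAssign String Exp  -- x = e ;
  | SWhile Exp Stm      -- while (e) S
  deriving Show

type Context = [(String, Type)]  -- G (the signature F is omitted here)

infer :: Context -> Exp -> Maybe Type
infer g (EVar x) = lookup x g
infer _ (EInt _) = Just TInt
infer _ ETrue    = Just TBool
infer g (ELt a b)
  | hasType g a TInt && hasType g b TInt = Just TBool
  | otherwise                            = Nothing

-- G => e : t
hasType :: Context -> Exp -> Type -> Bool
hasType g e t = infer g e == Just t

-- G => S valid: a statement gets no type, only a yes/no answer.
valid :: Context -> Stm -> Bool
valid g (SAssign x e) = case lookup x g of
  Just t  -> hasType g e t  -- side condition x : T is in G, premiss e : T
  Nothing -> False
valid g (SWhile e s) = hasType g e TBool && valid g s
```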


#NEW

==Variable binding and context==

Contexts can be **extended** with new variables. The notation we use is
```
  (G, x : T)
```
This corresponds to **variable binding** constructs, e.g. declarations.

Example: typing rule for a declaration; ``SS`` is a sequence of statements
```
  F,(G, x:t) => SS valid
  ------------------------- x not in G
  F,G => t x ; SS valid
```
The rule says: if ``SS`` is a valid sequence of statements in the context
``(G, x : t)``, and ``x`` is not yet declared in ``G``, then ``t x ; SS`` is valid in ``G``.


#NEW

==Example of variable binding and context==

We prove that ``int x ; x = x + 5 ;`` is valid in the empty context ``()``.
```
  x : int => x : int     x : int => 5 : int
  -----------------------------------------
        x : int => x + 5 : int
      --------------------------
      x : int => x = x + 5 ; valid
    -------------------------------
    () => int x ; x = x + 5 ; valid
```
The signature is omitted for simplicity.
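
Read as a program, the declaration rule says that a declaration extends the context used for the //rest// of the sequence. A minimal Haskell sketch under the same simplifications (no signature, hypothetical constructors), which accepts ``int x ; x = x + 5 ;`` in the empty context:
```
data Type = TInt | TBool deriving (Eq, Show)

data Exp = EVar String | EInt Integer | EAdd Exp Exp deriving Show

data Stm
  = SDecl Type String   -- t x ;
  | SAssign String Exp  -- x = e ;
  deriving Show

type Context = [(String, Type)]  -- newest binding first

infer :: Context -> Exp -> Maybe Type
infer g (EVar x) = lookup x g
infer _ (EInt _) = Just TInt
infer g (EAdd a b)
  | infer g a == Just TInt && infer g b == Just TInt = Just TInt
  | otherwise                                        = Nothing

-- G => SS valid
validSeq :: Context -> [Stm] -> Bool
validSeq _ [] = True                                -- the empty sequence is valid
validSeq g (SDecl t x : ss)
  | x `elem` map fst g = False                      -- side condition: x not in G
  | otherwise          = validSeq ((x, t) : g) ss   -- premiss in (G, x : t)
validSeq g (SAssign x e : ss) = case lookup x g of
  Just t  -> infer g e == Just t && validSeq g ss
  Nothing -> False
```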


#NEW

==Function definitions and applications==

The validity of a function definition.
```  
  F,(G, x#sub1:A#sub1,...,x#subn : A#subn) => SS valid
  --------------------------------------------- f not in F
  F,G => T f (A#sub1 x#sub1,...,A#subn x#subn) { SS } valid
```
More conditions:
- that ``x#sub1 ... x#subn`` are distinct
- that ``A#sub1 ... A#subn, T`` are types (can be guaranteed by syntax)
- that there is a ``return e`` with ``e : T`` (not always checked)


The typing rule for function applications is the following
```  
  F,G => f : (A#sub1,...,A#subn) -> T    F,G => e#sub1 : A#sub1,..., e#subn : A#subn
  ----------------------------------------------------------
                 F,G => f(e#sub1,...,e#subn) : T
```
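
A sketch of the definition rule in Haskell, under simplifying assumptions: hypothetical constructors, only declarations and ``return`` statements in bodies, and the signature omitted. The body is checked in the context extended with the parameters, and ``return e`` is accepted only if ``e`` has the declared return type. Declarations are checked against the whole context, which is a simplification of the block structure discussed below.
```
import Data.List (nub)

data Type = TInt | TDouble | TBool deriving (Eq, Show)

data Exp = EVar String | EInt Integer | EAdd Exp Exp deriving Show

-- Hypothetical statement forms: just enough for a function body.
data Stm = SDecl Type String | SReturn Exp deriving Show

-- T f (A1 x1, ..., An xn) { SS }
data Def = Def Type String [(Type, String)] [Stm] deriving Show

type Context = [(String, Type)]  -- newest binding first

infer :: Context -> Exp -> Maybe Type
infer g (EVar x) = lookup x g
infer _ (EInt _) = Just TInt
infer g (EAdd a b)
  | infer g a == Just TInt && infer g b == Just TInt = Just TInt
  | otherwise                                        = Nothing

-- G => T f (A1 x1, ..., An xn) { SS } valid  (signature omitted)
checkDef :: Context -> Def -> Bool
checkDef g (Def ret _ params body) =
  distinct (map snd params)                                   -- x1 ... xn are distinct
    && checkStms ret ([(x, a) | (a, x) <- params] ++ g) body  -- (G, x1:A1, ..., xn:An)
  where
    distinct xs = length xs == length (nub xs)

-- ret is the declared return type T, needed by return statements.
checkStms :: Type -> Context -> [Stm] -> Bool
checkStms _ _ [] = True
checkStms ret g (SDecl t x : ss) =
  x `notElem` map fst g && checkStms ret ((x, t) : g) ss
checkStms ret g (SReturn e : ss) =
  infer g e == Just ret && checkStms ret g ss                 -- return e needs e : T
```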


#NEW

==Block structure==

In C, C++, Java, Haskell, etc., variables on the same level must be distinct
(e.g. in function parameter lists).

However, variables in an **inner block** are no longer on the same level
and can hence **shadow** outer variables. (C, C++ and Haskell allow this;
Java rejects a local variable that redeclares one from an enclosing block.)
```
  {
    int x = 1 ;
    bool b ;
    x = x + 2 ;         // x : int
    {
      double x = 2.0 ;
      x = x + 1.0 ;     // x : double
      b = true ;        // b : bool
    }
    x = x + 5 ;         // x : int
    b = b && b ;        // b : bool
  }
```
Variables declared in a block are discarded at exit from the block.

There is no limit on the number of nested block levels.


#NEW

==Contexts for block structure==

Implementation 1: with markers
```
  x : int, b : bool, MARK, x : double
```
- entering a block: add a marker
- leaving a block: delete variables after last marker, and the marker itself
- update: after the last variable
- lookup: the latest occurrence of the variable


Implementation 2: with stacks of contexts (separated by ".", stack top is rightmost)
```
  (x : int, b : bool).(x : double)
```
- entering a block: push an empty context on the stack
- leaving a block: pop the topmost context
- update: add the new variable after the last variable in the topmost context
- lookup: search the contexts from the top down; the first occurrence found is the innermost one


Implementation 2 can be done with lists of lists: the top of the stack is the head
of the list.
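
Implementation 2 as a list of lists can be sketched in Haskell as follows. The function names are hypothetical; a real checker would also report which variable caused an error instead of just returning ``Nothing``.
```
import Data.Maybe (listToMaybe, mapMaybe)

data Type = TInt | TDouble | TBool deriving (Eq, Show)

-- Implementation 2 as a list of lists: the head is the top of the stack.
type Contexts = [[(String, Type)]]

-- Entering a block: push an empty context.
enterBlock :: Contexts -> Contexts
enterBlock gs = [] : gs

-- Leaving a block: pop the topmost context, discarding its variables.
exitBlock :: Contexts -> Contexts
exitBlock = drop 1

-- Update: add x : t to the topmost context, unless x is already declared there.
declare :: String -> Type -> Contexts -> Maybe Contexts
declare x t (g : gs)
  | x `elem` map fst g = Nothing
  | otherwise          = Just (((x, t) : g) : gs)
declare x t [] = Just [[(x, t)]]

-- Lookup: search from the top down, so the innermost binding wins.
lookupVar :: String -> Contexts -> Maybe Type
lookupVar x gs = listToMaybe (mapMaybe (lookup x) gs)
```
Replaying the block example: after declaring ``x : int`` and ``b : bool``, entering a block and declaring ``x : double``, ``lookupVar "x"`` gives ``Just TDouble`` while ``b`` is still visible; after ``exitBlock``, ``x`` is an ``int`` again.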


#NEW
==Typing rules for variables==

The rules are expressed as checking if a **list of statements** SS is valid.
Instead of a single context, we have a **stack of contexts** ``Gs.G`` where 
we denote by ``G`` the topmost context.

Declarations.
```
  Gs.(G,x:t) => SS valid
  -------------------------- x not in G
  Gs.G => t x ; SS valid
```
Assignments.
```
  Gs => e : t     Gs => SS valid
  ------------------------------- x : t in Gs  
  Gs => x = e ; SS valid
```
Blocks.
```
  Gs.() => SS valid      Gs => SSS valid
  -------------------------------------
  Gs => { SS } SSS valid
```
The typing rule for variable expressions is now
```
  -------------- x : T the closest entry for x in Gs.G
  Gs.G => x : T
```
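
These rules translate into a checker for statement lists over a stack of contexts. A minimal sketch with hypothetical constructors (expressions reduced to variables and literals); each equation cites the rule it implements.
```
import Data.Maybe (listToMaybe, mapMaybe)

data Type = TInt | TDouble | TBool deriving (Eq, Show)

data Exp = EVar String | EInt Integer | EDouble Double deriving Show

data Stm
  = SDecl Type String   -- t x ;
  | SAssign String Exp  -- x = e ;
  | SBlock [Stm]        -- { SS }
  deriving Show

-- Gs.G as a list of contexts; the head is the topmost context G.
type Contexts = [[(String, Type)]]

infer :: Contexts -> Exp -> Maybe Type
infer gs (EVar x)    = listToMaybe (mapMaybe (lookup x) gs)  -- closest entry for x
infer _  (EInt _)    = Just TInt
infer _  (EDouble _) = Just TDouble

-- valid gs ss  corresponds to  Gs.G => SS valid
valid :: Contexts -> [Stm] -> Bool
valid _ [] = True
valid [] ss = valid [[]] ss                       -- treat an empty stack as one empty context
valid (g : gs) (SDecl t x : ss) =
  x `notElem` map fst g                           -- side condition: x not in G
    && valid (((x, t) : g) : gs) ss               -- premiss: Gs.(G, x:t) => SS valid
valid gs (SAssign x e : ss) = case infer gs (EVar x) of
  Just t  -> infer gs e == Just t && valid gs ss  -- x : t in Gs, e : t, SS valid
  Nothing -> False
valid gs (SBlock ss : sss) =
  valid ([] : gs) ss && valid gs sss              -- Gs.() => SS valid, Gs => SSS valid
```
For example, ``valid [[]]`` accepts the sequence ``int x ; bool b ; { double x ; x = 2.0 ; } x = 5 ;`` but rejects an assignment to a variable after the block that declared it has been left.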


#NEW

==Type checking and type inference==

**Type checking**: given a judgement ``G => e : T``, find out whether it
can be derived by the typing rules. The **derivation** is a tree of rule
applications with the judgement as its bottom line (the root).

**Type inference**: given an expression ``e``,
find a type ``T`` in context ``G`` such that ``G => e : T``
can be derived by the typing rules.

For Java, C, and C++, we can mostly make do with just type checking,
because types are declared explicitly.

Haskell has type inference as well: if you don't give the type,
the compiler can usually find //the most general type//.


#NEW

==Different type checkers==

We can classify checkers in terms of what they return:
- A //rude checker//, which only says ``True`` or ``False``,
  and may even crash (for instance, when variable lookup
  just raises an ``error`` if the variable is not found).
- An //error-reporting checker//, which returns ``OK`` or a message
  saying where the error is.
- An //annotating checker//, which returns a syntax tree annotated 
  with more type information.


To build a compiler back end, we need the third.

In Lab 2, we build the second.


#NEW

==The passes of the type checker==

Pass 1:
- start with empty signature
- for each function ``f``, update the signature with ``f : T``


Pass 2:
- for each ``f``,  check the function body of ``f``
  with respect to the type ``T``


The expression checker consists of functions:
```
  check (Exp e, Type t)  returns void
  infer (Exp e)          returns Type
```
These functions are defined by mutual recursion, by cases
on the expression. 

We also need to check function definitions and sequences of statements.
```
  check (Def  d)   returns void
  check (Stms ss)  returns void
```
All functions use an environment (= signature and stack of contexts).

The method is syntax-directed translation.
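
The two passes can be sketched in Haskell as an error-reporting checker returning ``Either String ()``. The body language is cut down to ``return`` statements and all names are hypothetical. Because the signature is complete before pass 2 starts, a function may call another one defined later in the file.
```
import Control.Monad (forM_, unless)
import Data.List (nub)

data Type = TInt | TBool deriving (Eq, Show)

data Exp = EVar String | EInt Integer | ECall String [Exp] deriving Show

-- A hypothetical body language: each statement is  return e ;
data Stm = SReturn Exp deriving Show

-- T f (A1 x1, ..., An xn) { SS }
data Def = Def Type String [(Type, String)] [Stm] deriving Show

type Sig     = [(String, ([Type], Type))]
type Context = [(String, Type)]

-- Pass 1: start from the empty signature and add f : T for every function.
buildSig :: [Def] -> Either String Sig
buildSig defs = do
  let names = [f | Def _ f _ _ <- defs]
  unless (length names == length (nub names)) (Left "duplicate function name")
  return [(f, (map fst params, ret)) | Def ret f params _ <- defs]

-- Pass 2: check each body; calls inside it are looked up in the full signature.
checkDef :: Sig -> Def -> Either String ()
checkDef sig (Def ret f params body) =
  forM_ body $ \(SReturn e) -> do
    t <- infer sig [(x, a) | (a, x) <- params] e
    unless (t == ret) (Left ("wrong return type in " ++ f))

infer :: Sig -> Context -> Exp -> Either String Type
infer _ ctx (EVar x) = maybe (Left ("unknown variable " ++ x)) Right (lookup x ctx)
infer _ _ (EInt _) = Right TInt
infer sig ctx (ECall f args) = do
  (params, ret) <- maybe (Left ("unknown function " ++ f)) Right (lookup f sig)
  ts <- mapM (infer sig ctx) args
  unless (ts == params) (Left ("bad arguments to " ++ f))
  return ret

typecheck :: [Def] -> Either String ()
typecheck defs = do
  sig <- buildSig defs
  mapM_ (checkDef sig) defs
```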


#NEW

==Examples of type checking==

We show syntax-directed translation in pseudocode.
```
  infer x =              // variable x
    t := lookup(x)
    return t

  infer i =              // integer literal i
    return int

  infer f(a#sub1,..., a#subn) =
    T := lookup(f)
    if T = (A#sub1, ..., A#subn) -> B
      check a#sub1 : A#sub1
      ...
      check a#subn : A#subn
      return B
    else failure
```


#NEW

==From typing rules to type checking code==

Basic idea: from rule
```
   J#sub1 ...  J#subn
   ---------- C
       J
```
generate the code "upside down"
```
  check J =
    check J#sub1
    ...
    check J#subn
    check_condition C
```
Example:
```
  Env => exp1 : bool     Env => exp2 : bool     check Env => exp1 && exp2 : bool =
  -----------------------------------------       check Env => exp1 : bool
       Env => exp1 && exp2 : bool                 check Env => exp2 : bool
```


#NEW

==From typing rules to type checking code: more examples==

Judgements are easy: recursive calls to check.
```
  Env => exp : bool   Env => stm valid      check Env => while (exp) stm valid =
  ------------------------------------        check Env => exp : bool   
      Env => while (exp) stm valid            check Env => stm valid
```
Side conditions can require arbitrary code, so you have to think harder.
```
  ---------------- var : typ is in Env      check Env => var : typ =
  Env => var : typ                            check_condition lookup(var,Env) == typ
```
In the end, it is ``lookup`` and similar conditions that generate the error messages.
```
  lookup(var,Env) = message ("variable " var " not found") // if var is not in Env

  check_condition x == y = message ("expected " y " but found " x) // if not equal
```
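
In Haskell, the error messages can be carried by ``Either String``: ``Left`` holds the message, ``Right`` the result. A minimal sketch of ``lookup`` and ``check_condition`` with the messages above, plus the variable rule built from them; the function names are hypothetical.
```
data Type = TBool | TDouble | TInt | TVoid deriving (Eq, Show)

type Env = [(String, Type)]

-- lookup(var, Env): the error message comes from here when var is missing.
lookupVar :: String -> Env -> Either String Type
lookupVar var env = case lookup var env of
  Just typ -> Right typ
  Nothing  -> Left ("variable " ++ var ++ " not found")

-- check_condition found == expected, with the message from the slide.
checkEqual :: Type -> Type -> Either String ()
checkEqual found expected
  | found == expected = Right ()
  | otherwise = Left ("expected " ++ show expected ++ " but found " ++ show found)

-- The rule for variables: Env => var : typ, where var : typ is in Env.
checkVar :: Env -> String -> Type -> Either String ()
checkVar env var typ = do
  found <- lookupVar var env
  checkEqual found typ
```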


#NEW

==The need for type inference==

There is a grammar rule saying that expressions can be used as statements:
```
  Stm ::= Exp ";"
```
How do we check that such statements are valid?
```
  Env => exp : ?
  ------------------
  Env => exp ; valid
```
The problem is that we have no type ``typ`` to check ``exp : typ``.

Solution 1: check ``exp`` with each of the four types
```
  check Env => exp ; valid =
    try each typ in  [bool,double,int,void]: 
      check exp : typ
```
This is inefficient, and does not scale up to infinitely many types.

Solution 2: do type inference with ``exp``. If it succeeds, the statement
is valid, because expressions of any type can be used as statements.


#NEW

==Type inference==

The general scheme is a rule where the type in the conclusion depends in
some way on the premisses and the side condition:
```
         J#sub1 ...  J#subn
  --------------------------------- C
  Env => exp : typ(J#sub1, ..., J#subn, C)
```
We should then use recursive calls of ``check`` and ``infer`` so that
- everything we need for constructing the type is inferred
- everything else is just checked


Often the type is independent of the premisses (which still have to be checked of course!):
```
  Env => exp1 : bool      Env => exp2 : bool     infer Env (exp1 && exp2) =
  ------------------------------------------       check Env => exp1 : bool
       Env => exp1 && exp2 : bool                  check Env => exp2 : bool
                                                   return bool
```                                
It can also come from the condition:
```
      ---------------- var : typ is in Env       infer Env var =
      Env => var : typ                             return lookup(var,Env)
```
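
Both patterns in a hypothetical Haskell sketch using ``Either String`` for errors: for ``&&`` the returned type is fixed while the premisses are checked, and for a variable the type is whatever the lookup returns. Here ``check`` is written via ``infer``, so the two functions are mutually recursive.
```
data Type = TBool | TDouble | TInt | TVoid deriving (Eq, Show)

data Exp = EVar String | ETrue | EFalse | EAnd Exp Exp deriving Show

type Env = [(String, Type)]

infer :: Env -> Exp -> Either String Type
-- the type comes from the side condition: return lookup(var, Env)
infer env (EVar x) = maybe (Left ("variable " ++ x ++ " not found")) Right (lookup x env)
infer _ ETrue  = Right TBool
infer _ EFalse = Right TBool
-- the type is always bool, but both premisses must still be checked
infer env (EAnd e1 e2) = do
  check env e1 TBool
  check env e2 TBool
  return TBool

-- check by inferring and comparing; mutually recursive with infer
check :: Env -> Exp -> Type -> Either String ()
check env e t = do
  t' <- infer env e
  if t' == t then Right () else Left ("expected " ++ show t ++ " but found " ++ show t')
```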


#NEW

==Type checking overloaded operations==

Arithmetic operations in most languages are **overloaded**.

This means that they apply to many types.

The general rule for ``+ - * /`` is: both operands have the same type as the value,
which must be ``int`` or ``double``.
```
  Env => exp1 : typ    Env => exp2 : typ
  -------------------------------------- typ is int or double
      Env => exp1 + exp2 : typ
```
What we do is infer the type of the first operand and check the second.
```
  infer Env (exp1 + exp2) =
    typ := infer Env exp1
    check_condition typ == int or typ == double
    check Env => exp2 : typ
    return typ
```
Also the comparison operators are overloaded, but 
the return type is of course ``bool``.
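
A minimal sketch of the overloaded rules, assuming hypothetical constructors ``EPlus`` and ``ELt`` and that ``<`` is defined on ``int`` and ``double`` only. The shared helper infers the first operand and checks the second against it, as in the pseudocode; ``+`` returns the operand type and ``<`` returns ``bool``.
```
import Control.Monad (unless)

data Type = TBool | TDouble | TInt | TVoid deriving (Eq, Show)

data Exp
  = EInt Integer | EDouble Double | EVar String
  | EPlus Exp Exp  -- exp1 + exp2
  | ELt Exp Exp    -- exp1 < exp2
  deriving Show

type Env = [(String, Type)]

infer :: Env -> Exp -> Either String Type
infer _ (EInt _)    = Right TInt
infer _ (EDouble _) = Right TDouble
infer env (EVar x)  = maybe (Left ("variable " ++ x ++ " not found")) Right (lookup x env)
infer env (EPlus e1 e2) = numericOperands env e1 e2                 -- value has operand type
infer env (ELt e1 e2)   = numericOperands env e1 e2 >> Right TBool  -- value is bool

-- Infer the first operand, require int or double, check the second against it.
numericOperands :: Env -> Exp -> Exp -> Either String Type
numericOperands env e1 e2 = do
  typ <- infer env e1
  unless (typ `elem` [TInt, TDouble]) (Left ("not numeric: " ++ show typ))
  typ2 <- infer env e2
  unless (typ2 == typ) (Left ("expected " ++ show typ ++ " but found " ++ show typ2))
  return typ
```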


#NEW

==Relating inference and checking==

Now we can check expression statements:
```
  check Env => exp ; valid =
    infer Env exp
```
If ``infer`` fails, we get any error message it generates.

If ``infer`` succeeds, we discard the type.


In the same way, we only need to write ``infer`` for expressions.
Then we define ``check`` uniformly,
```
  check Env => exp : typ =
    typ2 := infer Env exp
    check_condition typ2 == typ
```
The ``check_condition`` call usually returns a message on failure, e.g.
```
    TYPE ERROR
    type of exp: expected typ, inferred typ2
```
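
The same idea in Haskell: ``check`` is written once in terms of ``infer``, and an expression statement is valid whenever ``infer`` succeeds. A sketch with hypothetical names, using ``Either String`` for errors and a single context instead of a stack.
```
import Control.Monad (unless, zipWithM_)

data Type = TBool | TDouble | TInt | TVoid deriving (Eq, Show)

data Exp = EInt Integer | EVar String | ECall String [Exp] deriving Show

data Stm = SExp Exp deriving Show  -- exp ;

type Sig = [(String, ([Type], Type))]
type Env = (Sig, [(String, Type)])  -- signature and one context

infer :: Env -> Exp -> Either String Type
infer _ (EInt _) = Right TInt
infer (_, ctx) (EVar x) = maybe (Left ("variable " ++ x ++ " not found")) Right (lookup x ctx)
infer env@(sig, _) (ECall f args) = do
  (params, ret) <- maybe (Left ("function " ++ f ++ " not found")) Right (lookup f sig)
  unless (length args == length params) (Left ("wrong number of arguments to " ++ f))
  zipWithM_ (check env) args params  -- each argument is checked against its parameter
  return ret

-- check, written once for all expressions, in terms of infer
check :: Env -> Exp -> Type -> Either String ()
check env e typ = do
  typ2 <- infer env e
  unless (typ2 == typ) $
    Left ("type of " ++ show e ++ ": expected " ++ show typ ++ ", inferred " ++ show typ2)

-- exp ; is valid when infer succeeds; the inferred type is discarded
checkStm :: Env -> Stm -> Either String Env
checkStm env (SExp e) = infer env e >> return env
```
For instance, with a hypothetical ``printInt : (int) -> void`` in the signature and ``n : int`` in the context, the statement ``printInt(n);`` is valid and its type ``void`` is simply discarded.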


#NEW

==The top-level checkers==

To check the whole program,
+ collect the types of each function into the signature
+ check that function names are unique
+ check each function definition using the signature


To check a function definition
+ check that argument variables are unique
+ initialize the topmost context with the argument variables
+ check the body in this context
+ check that there is a ``return``, with an expression
  that has the expected return type of the function (or just 
  a ``return`` if the type is ``void``)


To check a sequence of statements
+ check the validity of the first statement and update the environment
  if appropriate
+ check the remaining sequence in the new environment
+ an empty sequence is always valid


#NEW

==Type checker in Haskell==

You can copy the contents of
[``laborations/lab2/haskell/`` ../laborations/lab2/haskell]:
```
  CPP.cf           -- grammar
  lab2.hs          -- main module
  Makefile         
  TypeChecker.hs   -- type checking module
```
You only have to modify ``CPP.cf`` and ``TypeChecker.hs``.

But you can already compile them: just type
```
  make
```
and run the type checker with
```
  ./lab2 <File>
```
The rest is "debugging the empty file"!


#NEW

===The Main module===

You don't have to write this - just copy the file
[``laborations/lab2/haskell/lab2.hs`` ../laborations/lab2/haskell/lab2.hs].

This file shows how compiler phases are linked together.
```
  check :: String -> IO () 
  check s = case pProgram (myLexer s) of
              Bad err  -> do putStrLn "SYNTAX ERROR"
                             putStrLn err
                             exitFailure 
              Ok  tree -> case typecheck tree of
                            Bad err -> do putStrLn "TYPE ERROR"
                                          putStrLn err
                                          exitFailure 
                            Ok _    -> putStrLn "OK"
```
In other words: call the parser; if it succeeds, call the type checker.

Notice the use of the **error type**,
```
  data Err a = Ok a | Bad String
```
The value is either ``Ok`` of the expected type or ``Bad``
with an error message. 

The ``Err`` type is generated by BNFC. One could also use Haskell's standard
type ``Either String a``.


#NEW

===Using the Err type===

The ``Err`` type 
```
  data Err a = Ok a | Bad String
```
is a **monad** - a type of actions returning ``a`` but also doing
other things (in this case: exceptions).

Monad actions can be **sequence**d: if
```
  inferExp :: Env -> Exp -> Err Type 
```
then you can make several inferences one after the other by using ``do``
```
  do inferExp env exp1
     inferExp env exp2
```
You can **bind** variables returned from actions, and **return**
values.
```
  do typ1 <- inferExp env exp1
     typ2 <- inferExp env exp2
     return TBool
```
If you are only interested in side effects, use the dummy value type
``()`` (corresponds to ``void`` in C and Java).
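
For ``do`` notation to work, ``Err`` needs ``Functor``, ``Applicative`` and ``Monad`` instances (and ``MonadFail`` for ``fail``). BNFC generates what is needed for you; the following is a hand-written sketch of equivalent instances, not the generated file, together with a hypothetical ``lookVar`` showing that the first ``Bad`` aborts the rest of a ``do`` block.
```
data Err a = Ok a | Bad String deriving (Eq, Show)

instance Functor Err where
  fmap f (Ok a)  = Ok (f a)
  fmap _ (Bad s) = Bad s

instance Applicative Err where
  pure = Ok
  Ok f  <*> x = fmap f x
  Bad s <*> _ = Bad s

instance Monad Err where
  Ok a  >>= k = k a
  Bad s >>= _ = Bad s  -- the first failure aborts the rest of the do block

instance MonadFail Err where
  fail = Bad

data Type = TInt | TBool deriving (Eq, Show)

lookVar :: [(String, Type)] -> String -> Err Type
lookVar ctx x = maybe (Bad ("unknown variable " ++ x)) Ok (lookup x ctx)

-- Sequencing: the second lookup only runs if the first succeeds.
bothBool :: [(String, Type)] -> String -> String -> Err Type
bothBool ctx x y = do
  t1 <- lookVar ctx x
  t2 <- lookVar ctx y
  if t1 == TBool && t2 == TBool then return TBool else Bad "expected two booleans"
```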


#NEW

==Symbol tables==

Environment type
```
  type Env = (Sig,[Context])       -- signature and stack of contexts
  type Sig = [(Id,([Type],Type))]  -- or Map Id ([Type],Type)
  type Context = [(Id,Type)]       -- or Map Id Type
```
Auxiliary operations on the environment
```
  lookVar   :: Env -> Id -> Err Type
  lookFun   :: Env -> Id -> Err ([Type],Type)
  updateVar :: Env -> Id -> Type -> Err Env
  updateFun :: Env -> Id -> ([Type],Type) -> Err Env
  newBlock  :: Env -> Err Env
  emptyEnv  :: Env
```
Keep the datatypes abstract, i.e. use them only via these operations.
Then you can switch to another implementation if needed (more efficient,
more stuff in the environment).
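
One possible implementation of these operations with association lists, sketched under assumptions: ``Either String`` stands in for BNFC's ``Err`` (as noted earlier, one could use it instead), ``Id`` is simply a ``String`` here, and ``Type`` is a hypothetical four-constructor type. Only ``updateVar`` and ``updateFun`` check for duplicates; lookups search the innermost context first.
```
type Err = Either String  -- standing in for BNFC's Err

type Id = String
data Type = TBool | TDouble | TInt | TVoid deriving (Eq, Show)

type Env     = (Sig, [Context])  -- signature and stack of contexts
type Sig     = [(Id, ([Type], Type))]
type Context = [(Id, Type)]

emptyEnv :: Env
emptyEnv = ([], [[]])  -- no functions, one empty context

lookVar :: Env -> Id -> Err Type
lookVar (_, ctxs) x = case [t | ctx <- ctxs, Just t <- [lookup x ctx]] of
  t : _ -> Right t  -- innermost binding wins
  []    -> Left ("unknown variable " ++ x)

lookFun :: Env -> Id -> Err ([Type], Type)
lookFun (sig, _) f = maybe (Left ("unknown function " ++ f)) Right (lookup f sig)

-- Declare x in the topmost context only.
updateVar :: Env -> Id -> Type -> Err Env
updateVar (sig, ctx : ctxs) x t
  | x `elem` map fst ctx = Left ("variable " ++ x ++ " declared twice")
  | otherwise            = Right (sig, ((x, t) : ctx) : ctxs)
updateVar (_, []) _ _ = Left "no open block"

updateFun :: Env -> Id -> ([Type], Type) -> Err Env
updateFun (sig, ctxs) f ft
  | f `elem` map fst sig = Left ("function " ++ f ++ " defined twice")
  | otherwise            = Right ((f, ft) : sig, ctxs)

-- Entering a block pushes an empty context.
newBlock :: Env -> Err Env
newBlock (sig, ctxs) = Right (sig, [] : ctxs)
```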


#NEW

===The TypeCheck module===

The environment datatypes and operations.

Type signatures of the checking methods
```
  typecheck :: Program -> Err ()                -- required function in lab2
  checkDef  :: Env -> Def -> Err ()             -- check a function definition
  checkStms :: Env -> Type -> [Stm] -> Err ()
  checkStm  :: Env -> Type -> Stm -> Err Env
  checkExp  :: Env -> Type -> Exp -> Err ()
  inferExp  :: Env -> Exp  -> Err Type
```
Some other auxiliaries.
```
  checkUnique    :: (Ord a, Print a) => [a] -> Err ()
  checkCondition :: Bool -> Err ()
```


#NEW

===Some examples of checking===

```
  checkStm :: Env -> Type -> Stm -> Err Env
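  checkStm env val stm = case stm of   -- val: return type of the enclosing function
    SExp exp  -> do
      inferExp env exp          -- the inferred type is discarded
      return env
    SDecl type' x  ->
      updateVar env x type'     -- also check that x is not in context already
    SWhile exp body  -> do
      checkExp env Type_bool exp
      checkStm env val body     -- declarations in the body do not escape the loop
      return env
    -- ... other statement forms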

  checkExp :: Env -> Type -> Exp -> Err ()
  checkExp env typ exp = do
    typ2 <- inferExp env exp
    if (typ2 == typ) then
        return ()
      else
        fail $ "type of " ++ printTree exp -- ... 
```

#NEW

===Some examples of type inference===

```
  inferExp :: Env -> Exp -> Err Type
  inferExp env x = case x of
    ETrue      -> return Type_bool
    EInt n     -> return Type_int
    EId id     -> lookVar env id
    EPIncr exp -> inferNumeric env exp
    ETimes exp0 exp -> inferNumericBin env exp0 exp

  inferNumeric :: Env -> Exp -> Err Type
  inferNumeric env exp = do
    typ <- inferExp env exp
    if elem typ [Type_int, Type_double] then
        return typ
      else
        fail $ "type of expression " ++ printTree exp -- ...

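  inferNumericBin :: Env -> Exp -> Exp -> Err Type
  inferNumericBin env exp0 exp = do
    typ <- inferNumeric env exp0   -- the first operand must be int or double
    checkExp env typ exp           -- the second must have the same type
    return typ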
```


#NEW

==Type checker in Java==

You can copy the contents of
[``laborations/lab2/java/`` ../laborations/lab2/java1.5]:
```
  CPP.cf             -- grammar
  lab2               -- script running the type checker
  lab2.java          -- main program
  Makefile
  TypeChecker.java   -- type checker class
  TypeException.java -- exceptions for type checking
```
You only have to modify ``CPP.cf`` and ``TypeChecker.java``.

But you can already compile them: just type
```
  make
```
and run the type checker with
```
  ./lab2 <File>
```
The rest is "debugging the empty file"!

Before ``make``, you may have to set your class path so that it finds
java_cup and JLex, as well as the current directory.
```
  export CLASSPATH=.:<path-to-JLex>:<path-to-CUP>:$CLASSPATH
```


#NEW

===The Main module===

This is given in
[``laborations/lab2/java/lab2.java`` ../laborations/lab2/java1.5/lab2.java],
hence you don't have to write this.

It shows how compiler phases are linked together.
```
try {
	l = new Yylex(new FileReader(args[0]));
	parser p = new parser(l);
	CPP.Absyn.Program parse_tree = p.pProgram();
	new TypeChecker().typecheck(parse_tree);

} catch (TypeException e) {
	System.out.println("TYPE ERROR");
	System.err.println(e.toString());
	System.exit(1);
} catch (IOException e) {
	System.err.println(e.toString());
	System.exit(1);
} catch (Throwable e) {
	System.out.println("SYNTAX ERROR");
	System.out.println("At line " + String.valueOf(l.line_num()) 
			   + ", near \"" + l.buff() + "\" :");
	System.out.println("     " + e.getMessage());
	System.exit(1);
}
```


#NEW

==Symbol tables==

Environment types
```
  public static class FunType {
    public LinkedList<Type> args ;
    public Type val ;
  }

  public static class Env {
    public Map<String,FunType> signature ;
    public LinkedList<Map<String,Type>> contexts ;  // stack of contexts

    public Type lookVar(String id) { ... }
    public FunType lookFun(String id) { ... }
    public void updateVar(String id, Type ty) { ... }
    // ...
  }
```


#NEW

===The TypeCheck module===

The environment datatypes and operations.

An enumeration of codes for types.
```
  public static enum TypeCode { CodeInt, CodeDouble, CodeBool, CodeVoid } ;
```
Notice that ``TypeCode`` is not the same as the class
``Type``, which is the syntactic category of source-language types.
We need ``TypeCode`` to be able to compare types for equality,
and this happens when we compare an expected type with an inferred type.

Type signatures of the checking methods
```
  public void typecheck(Program p) {
  }

  public static class CheckStm implements Stm.Visitor<Env,Env> {
    public Env visit(SDecl p, Env env) {
    }
    public Env visit(SExp p, Env env) {
    }
    // ...
  }

  public static class InferExp implements Exp.Visitor<Type,Env> {
    public Type visit(EInt p, Env env) {
    }
    public Type visit(EAdd p, Env env) {
    }
    // ...

  }
```

#NEW

===Some examples of checking===

```
public static class CheckStm implements Stm.Visitor<Env,Env> {

    public Env visit(SDecl p, Env env) {
      env.updateVar(p.id_,p.type_) ;
      return env ;
    }

   //...
  }
```

#NEW

===Some examples of type inference===

```
  public static class InferExpType implements Exp.Visitor<Type,Env> {

    public Type visit(demo.Absyn.EPlus p, Env env) {
      Type t1 = p.exp_1.accept(this, env);
      Type t2 = p.exp_2.accept(this, env);

      if (typeCode(t1) == TypeCode.CodeInt && typeCode(t2) == TypeCode.CodeInt)
         return TInt;
      else
      if (typeCode(t1) == TypeCode.CodeDouble && typeCode(t2) == TypeCode.CodeDouble)
         return TDouble;
      else
        throw new TypeException("Operands to + must be int or double.");
      }
    //...
  }
```
The function ``typeCode`` converts source language types to their codes:
```
  public static TypeCode typeCode (Type ty) ...
```
It can be implemented by using a visitor or the ``instanceof`` operator.


#NEW

===More help===

You don't need to debug completely empty files:
- for the grammars, you can pick rules from your Lab 1 (as indicated in Lab 2 PM)
- for the type checker, you can start from the "mini" implementation, in
  [``laborations/mini`` ../laborations/mini]


#NEW

==Lab 2 overview==

We will read through the [Lab PM ../laborations/lab2/lab2.html]

Preparation and exercise: write typing rules for the constructs in the Lab PM.