Lecture 7: Type Checking

Programming Languages Course
Aarne Ranta (aarne@chalmers.se)

Book: 6.3, 6.5

The purpose of types

To define what the program should do.

e.g. read an array of integers and return a double

To guarantee that the program is meaningful.

that it does not add a string to an integer
that variables are declared before they are used

To document the programmer's intentions.

better than comments, which are not checked by the compiler

To optimize the use of hardware.

reserve as much memory as needed, but not more
use the most appropriate machine instructions

What belongs to type checking

Depending on language, the type checker can prevent

application of a function to the wrong number of arguments,
application of integer functions to floats,
use of undeclared variables in expressions,
functions that do not return values,
division by zero,
array indices out of bounds,
nonterminating recursion,
sorting algorithms that don't sort...

Languages differ greatly in how strict their static semantics is: none of the things above is checked by all programming languages!

In general, the more static checking the compiler does, the less manual debugging is needed.

Description formats for different compiler phases

These formats are independent of the implementation language.

Lexer: regular expressions
Parser: BNF grammars
Type checker: typing rules
Interpreter: operational semantic rules
Code generator: compilation schemes

Typing judgements and rules

Typing rules concern judgements of the form

     E => e : T

where E is an environment, which contains e.g. typings of identifiers. The judgement says

in the environment E, expression e has type T

Judgements are used in typing rules of the form

     J1  J2  ...  Jn
     --------------- C
           J

(n >= 0) which says

from the judgements J1, J2, ..., Jn you may conclude J, if condition C holds.

The judgements above the line in a rule are the premisses.

The judgement under the line is the conclusion.

The condition C beside the line is a side condition. It is typically not expressible as a judgement, and is therefore not a premiss.

The judgements are written in a formal language, whereas side conditions can be written in natural language.

Examples of typing rules and derivation

Typing rules for arithmetic expressions

    E => e1 : int     E => e2 : int    E => e1 : int     E => e2 : int
    -------------------------------    -------------------------------
         E => e1 + e2 : int                E => e1 * e2 : int
  
  
         ---------- x : T is in E      ------------ i is an integer literal
         E => x : T                    E => i : int

Derivation of judgement x : int, y : int => x + 12 * y : int

                              x:int, y:int => 12 : int    x:int, y:int => y : int
                              ---------------------------------------------------  
      x:int, y:int => x : int        x:int, y:int => 12 * y : int
      -----------------------------------------------------------
                  x:int, y:int => x + 12 * y : int
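The four rules can be read directly as a recursive Java function. The sketch below is a minimal illustration, assuming a hypothetical AST (Var, IntLit, Add, Mul) and Java 16 or later for records and instanceof patterns; each recursive call corresponds to one node of the derivation tree above.

```java
import java.util.Map;

// Hypothetical AST covering only the four rules above.
interface Exp {}
record Var(String name) implements Exp {}
record IntLit(int value) implements Exp {}
record Add(Exp left, Exp right) implements Exp {}
record Mul(Exp left, Exp right) implements Exp {}

enum Type { INT, DOUBLE, BOOL }

class ArithTyping {
    // Is the judgement E => e : t derivable with the four rules?
    static boolean derivable(Map<String, Type> env, Exp e, Type t) {
        if (e instanceof IntLit) {
            return t == Type.INT;                      // rule: i is an integer literal
        } else if (e instanceof Var v) {
            return env.get(v.name()) == t;             // side condition: x : T is in E
        } else if (e instanceof Add a) {               // premisses: e1 : int and e2 : int
            return t == Type.INT
                && derivable(env, a.left(), Type.INT)
                && derivable(env, a.right(), Type.INT);
        } else if (e instanceof Mul m) {
            return t == Type.INT
                && derivable(env, m.left(), Type.INT)
                && derivable(env, m.right(), Type.INT);
        }
        return false;                                  // no rule applies
    }
}
```

Calling derivable with the context x : int, y : int and the expression x + 12 * y follows the same tree as the derivation shown, and returns false if y is removed from the context.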

Signature vs. context

We generalize the type checking context to an environment with two parts:

signature, which shows the types of functions
context, which shows the types of variables.

In the course of type checking, the signature remains the same throughout a program module, whereas the context changes all the time.

Function types

No expression in the language of Lab 2 has function types, because functions are never returned as values or used as arguments.

However, the compiler internally needs a data structure for function types, holding the types of the parameters and the return type. E.g. for a function

    bool between (int x, double a, double b) {...}

we write

    between : (int, double, double) -> bool

to express this internal representation in typing rules.

Notation for signature and context

Dividing the environment into a signature F and a context G, we write

    F,G => J

Example: typing rule for a variable expression

    ------------- x : T is in G                
    F,G => x : T

Example: typing rule for one-place function application

    F,G => e : A
    --------------- f : (A) -> T is in F                
    F,G => f(e) : T
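One way to hold the two parts of the environment in Java is a pair of maps. This is a sketch with hypothetical names (FunType, Env, Call), covering only the variable rule and the one-place application rule.

```java
import java.util.List;
import java.util.Map;

enum Type { INT, DOUBLE, BOOL, VOID }

// Internal representation of a function type (A1, ..., An) -> T.
record FunType(List<Type> args, Type result) {}

interface Exp {}
record Var(String name) implements Exp {}
record Call(String fun, Exp arg) implements Exp {}   // one-place application f(e)

// The environment split into a signature F (functions) and a context G (variables).
record Env(Map<String, FunType> sig, Map<String, Type> ctx) {

    // Is F,G => e : t derivable?
    boolean derivable(Exp e, Type t) {
        if (e instanceof Var v) {
            return ctx.get(v.name()) == t;              // side condition: x : T is in G
        } else if (e instanceof Call c) {
            FunType ft = sig.get(c.fun());              // side condition: f : (A) -> T is in F
            return ft != null
                && ft.args().size() == 1
                && ft.result() == t
                && derivable(c.arg(), ft.args().get(0)); // premiss: F,G => e : A
        }
        return false;
    }
}
```

For example, the function between would be stored in the signature as new FunType(List.of(Type.INT, Type.DOUBLE, Type.DOUBLE), Type.BOOL).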

The validity of statements

Expressions have types, but statements do not.

However, statements are also checked during type checking.

We need a new judgement form, saying that a statement S is valid:

    F,G => S valid

Example: typing rule for an assignment

    F,G => e : T
    -------------------- x : T is in G                
    F,G => x = e ; valid

Example: typing rule for while loops

    F,G => e : bool   F,G => S valid
    --------------------------------
    F,G => while (e) S valid
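The validity judgement can be coded as a second boolean function next to the typing judgement. A minimal sketch, assuming hypothetical AST classes and an environment that contains only the context G (the signature is left out):

```java
import java.util.Map;

enum Type { INT, DOUBLE, BOOL }

interface Exp {}
record Var(String name) implements Exp {}
record IntLit(int value) implements Exp {}
record BoolLit(boolean value) implements Exp {}

interface Stm {}
record Assign(String var, Exp rhs) implements Stm {}   // x = e ;
record While(Exp cond, Stm body) implements Stm {}     // while (e) S

class StmChecker {
    final Map<String, Type> ctx;                       // the context G
    StmChecker(Map<String, Type> ctx) { this.ctx = ctx; }

    // G => e : t
    boolean hasType(Exp e, Type t) {
        if (e instanceof IntLit) return t == Type.INT;
        if (e instanceof BoolLit) return t == Type.BOOL;
        if (e instanceof Var v) return ctx.get(v.name()) == t;
        return false;
    }

    // G => S valid
    boolean valid(Stm s) {
        if (s instanceof Assign a) {
            Type t = ctx.get(a.var());                 // side condition: x : T is in G
            return t != null && hasType(a.rhs(), t);   // premiss: G => e : T
        }
        if (s instanceof While w) {
            return hasType(w.cond(), Type.BOOL)        // premiss: G => e : bool
                && valid(w.body());                    // premiss: G => S valid
        }
        return false;
    }
}
```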

Variable binding and context

Contexts can be extended with new variables. The notation we use is

    (G, x : T)

This corresponds to variable binding constructs, e.g. declarations.

Example: typing rule for a declaration; SS is a sequence of statements

    F,(G, x:t) => SS valid
    ------------------------- x not in G
    F,G => t x ; SS valid

The rule says: if SS is a valid sequence of statements in context (G, x : t), then t x ; SS is valid in G.

Example of variable binding and context

We prove that int x ; x = x + 5 ; is valid in the empty context ().

    x : int => x : int     x : int => 5 : int
    -----------------------------------------
          x : int => x + 5 : int
        --------------------------
        x : int => x = x + 5 ; valid
      -------------------------------
      () => int x ; x = x + 5 ; valid

The signature is omitted for simplicity.
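Threading the context through a statement sequence gives a checker for the declaration rule. In this sketch (hypothetical AST, only the int addition rule from the arithmetic rules), the assignment case also checks the rest of the sequence, in the style of the stack-based rules later in this lecture; extending the context makes a copy, so G itself is never changed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

enum Type { INT, DOUBLE, BOOL }

interface Exp {}
record Var(String name) implements Exp {}
record IntLit(int value) implements Exp {}
record Add(Exp left, Exp right) implements Exp {}

interface Stm {}
record Decl(Type type, String var) implements Stm {}   // t x ;
record Assign(String var, Exp rhs) implements Stm {}   // x = e ;

class SeqChecker {
    // G => e : t
    static boolean hasType(Map<String, Type> g, Exp e, Type t) {
        if (e instanceof IntLit) return t == Type.INT;
        if (e instanceof Var v) return g.get(v.name()) == t;
        if (e instanceof Add a)
            return t == Type.INT && hasType(g, a.left(), t) && hasType(g, a.right(), t);
        return false;
    }

    // G => SS valid
    static boolean valid(Map<String, Type> g, List<Stm> ss) {
        if (ss.isEmpty()) return true;                     // the empty sequence is valid
        Stm first = ss.get(0);
        List<Stm> rest = ss.subList(1, ss.size());
        if (first instanceof Decl d) {
            if (g.containsKey(d.var())) return false;      // side condition: x not in G
            Map<String, Type> extended = new HashMap<>(g); // (G, x : t) as a fresh copy
            extended.put(d.var(), d.type());
            return valid(extended, rest);
        }
        if (first instanceof Assign a) {
            Type t = g.get(a.var());                       // x : t is in G
            return t != null && hasType(g, a.rhs(), t) && valid(g, rest);
        }
        return false;
    }
}
```

The program int x ; x = x + 5 ; from the derivation above is accepted in the empty context, while x = 5 ; without a declaration is rejected.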

Function definitions and applications

The validity of a function definition.

    F,(G, x₁:A₁,...,x_n : A_n) => SS valid
    --------------------------------------------- f not in F
    F,G => T f (A₁ x₁,...,A_n x_n) { SS } valid

More conditions:

that x₁ ... x_n are distinct
that A₁ ... A_n, T are types (can be guaranteed by syntax)
that there is a return e with e : T (not always checked)

The typing rule for function applications is the following

    F,G => f : (A₁,...,A_n) -> T    F,G => e₁ : A₁,..., e_n : A_n
    ----------------------------------------------------------
                   F,G => f(e₁,...,e_n) : T
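The side conditions of the definition rule and the premisses of the application rule translate into small Java helpers. This is a sketch under assumed names (Param, FunType, Call, FunChecks); checking the body SS itself is left out.

```java
import java.util.*;

enum Type { INT, DOUBLE, BOOL, VOID }

record FunType(List<Type> args, Type result) {}
record Param(Type type, String name) {}

interface Exp {}
record Var(String name) implements Exp {}
record Call(String fun, List<Exp> args) implements Exp {}

class FunChecks {
    // Side condition of the definition rule: x1 ... xn are distinct.
    static boolean distinctParams(List<Param> params) {
        Set<String> seen = new HashSet<>();
        for (Param p : params) {
            if (!seen.add(p.name())) return false;   // add returns false on a duplicate
        }
        return true;
    }

    // The context in which the body is checked: (G, x1 : A1, ..., xn : An).
    static Map<String, Type> bodyContext(Map<String, Type> g, List<Param> params) {
        Map<String, Type> ctx = new HashMap<>(g);
        for (Param p : params) ctx.put(p.name(), p.type());
        return ctx;
    }

    // Is F,G => e : t derivable? Covers variables and calls f(e1, ..., en).
    static boolean hasType(Map<String, FunType> sig, Map<String, Type> ctx, Exp e, Type t) {
        if (e instanceof Var v) return ctx.get(v.name()) == t;
        if (e instanceof Call c) {
            FunType ft = sig.get(c.fun());
            if (ft == null || ft.result() != t || ft.args().size() != c.args().size()) return false;
            for (int i = 0; i < c.args().size(); i++) {
                if (!hasType(sig, ctx, c.args().get(i), ft.args().get(i))) return false; // ei : Ai
            }
            return true;
        }
        return false;
    }
}
```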

Block structure

In C, C++, Java, Haskell, etc., variables on the same level must be distinct (e.g. in function parameter lists).

However, variables in an inner block are not on the same level, so they can shadow outer variables:

    {
      int x = 1 ;
      bool b ;
      x = x + 2 ;         // x : int
      {
        double x = 2.0 ;
        x = x + 1.0 ;     // x : double
        b = true ;        // b : bool
      }
      x = x + 5 ;         // x : int
      b = b && b ;        // b : bool
    }

Variables declared in a block are discarded at exit from the block.

There is no limit on the number of nesting levels.

Contexts for block structure

Implementation 1: with markers

    x : int, b : bool, MARK, x : double

entering a block: add a marker
leaving a block: delete variables after last marker, and the marker itself
update: after the last variable
lookup: the latest occurrence of the variable

Implementation 2: with stacks of contexts (separated by ".", stack top is rightmost)

    (x : int, b : bool).(x : double)

entering a block: push an empty context on the stack
leaving a block: pop the topmost context
update: after the last variable in the topmost context
lookup: the deepest surrounding occurrence of the variable

Implementation 2 can be done with lists of lists: the top of the stack is the head of the list.
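Implementation 2 maps naturally onto a java.util.ArrayDeque of hash maps, whose iterator runs from the most recently pushed element downwards. The class below is a hypothetical sketch, not the lab's Env class.

```java
import java.util.*;

enum Type { INT, DOUBLE, BOOL }

// Implementation 2: a stack of contexts; the head of the deque is the top of the stack.
class ContextStack {
    private final Deque<Map<String, Type>> stack = new ArrayDeque<>();

    ContextStack() { enterBlock(); }                   // the outermost context

    void enterBlock() { stack.push(new HashMap<>()); } // push an empty context
    void leaveBlock() { stack.pop(); }                 // pop the topmost context

    // Update: add x to the topmost context; false if x is already declared on this level.
    boolean declare(String x, Type t) {
        return stack.peek().putIfAbsent(x, t) == null;
    }

    // Lookup: the closest surrounding occurrence of x, or empty if x is undeclared.
    Optional<Type> lookup(String x) {
        for (Map<String, Type> ctx : stack) {          // iterates from the top down
            Type t = ctx.get(x);
            if (t != null) return Optional.of(t);
        }
        return Optional.empty();
    }
}
```

Replaying the block example: declare x : int and b : bool, enter a block and declare x : double; lookup of x now gives double and lookup of b gives bool. After leaving the block, x is int again.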

Typing rules for variables

The rules are expressed as checking if a list of statements SS is valid. Instead of a single context, we have a stack of contexts Gs.G where we denote by G the topmost context.

Declarations.

    Gs.(G,x:t) => SS valid
    -------------------------- x not in G
    Gs.G => t x ; SS valid

Assignments.

    Gs => e : t     Gs => SS valid
    ------------------------------- x : t in Gs  
    Gs => x = e ; SS valid

Blocks.

    Gs.() => SS valid      Gs => SSS valid
    -------------------------------------
    Gs => { SS } SSS valid

The typing rule for variable expressions is now

    -------------- x : T the closest entry for x in Gs.G
    Gs.G => x : T
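The three statement rules can be checked with an immutable list of contexts whose first element is the topmost context, as suggested for Implementation 2. A minimal sketch with a hypothetical AST; note that the statements after a block are checked in the outer Gs, so variables declared inside the block do not leak out.

```java
import java.util.*;

enum Type { INT, DOUBLE, BOOL }

interface Exp {}
record Var(String name) implements Exp {}
record IntLit(int value) implements Exp {}
record DoubleLit(double value) implements Exp {}

interface Stm {}
record Decl(Type type, String var) implements Stm {}   // t x ;
record Assign(String var, Exp rhs) implements Stm {}   // x = e ;
record Block(List<Stm> body) implements Stm {}         // { SS }

class BlockChecker {
    // Gs is a list of contexts; index 0 is the topmost context G.
    static Optional<Type> lookup(List<Map<String, Type>> gs, String x) {
        for (Map<String, Type> g : gs)
            if (g.containsKey(x)) return Optional.of(g.get(x)); // closest entry wins
        return Optional.empty();
    }

    static boolean hasType(List<Map<String, Type>> gs, Exp e, Type t) {
        if (e instanceof IntLit) return t == Type.INT;
        if (e instanceof DoubleLit) return t == Type.DOUBLE;
        if (e instanceof Var v) return lookup(gs, v.name()).orElse(null) == t;
        return false;
    }

    // Gs => SS valid
    static boolean valid(List<Map<String, Type>> gs, List<Stm> ss) {
        if (ss.isEmpty()) return true;
        Stm s = ss.get(0);
        List<Stm> rest = ss.subList(1, ss.size());
        if (s instanceof Decl d) {
            Map<String, Type> top = gs.get(0);
            if (top.containsKey(d.var())) return false;          // x not in G
            Map<String, Type> newTop = new HashMap<>(top);       // (G, x : t)
            newTop.put(d.var(), d.type());
            List<Map<String, Type>> gs2 = new ArrayList<>(gs);
            gs2.set(0, newTop);
            return valid(gs2, rest);
        }
        if (s instanceof Assign a) {
            Optional<Type> t = lookup(gs, a.var());              // x : t in Gs
            return t.isPresent() && hasType(gs, a.rhs(), t.get()) && valid(gs, rest);
        }
        if (s instanceof Block b) {
            List<Map<String, Type>> inner = new ArrayList<>();
            inner.add(new HashMap<>());                          // Gs.() : a new empty context
            inner.addAll(gs);
            return valid(inner, b.body()) && valid(gs, rest);   // SSS is checked in the outer Gs
        }
        return false;
    }
}
```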

Type checking and type inference

Type checking: given a judgement G => e : T, find out whether it can be derived by the typing rules. The derivation is a tree of rule applications with the judgement as its last line.

Type inference: given an expression e, find a type T in context G such that G => e : T can be derived by the typing rules.

For Java, C, and C++, we can mostly make do with type checking alone, because types are declared explicitly.

Haskell has type inference as well: if you don't give the type, the compiler can usually find the most general type.

Different type checkers

We can classify checkers in terms of what they return:

A rude checker, which only says True or False, and may even crash (for instance, when variable lookup just raises an error if the variable is not found).
An error-reporting checker, which returns OK or a message saying where the error is.
An annotating checker, which returns a syntax tree annotated with more type information.

To build a compiler back end, we need the third.

In Lab 2, we build the second.

The passes of the type checker

Pass 1:

start with empty signature
for each function f, update signature with f : T

Pass 2:

for each f, check the function body of f with respect to the type T

The expression checker consists of functions:

    check (Exp e, Type t)  returns void
    infer (Exp e)          returns Type

These functions are defined by mutual recursion, by cases on the expression.

We also need to check function definitions and sequences of statements.

    check (Def  d)   returns void
    check (Stms ss)  returns void

All functions use an environment (= signature and stack of contexts).

The method is syntax-directed translation.

Examples of type checking

We show syntax-directed translation in pseudocode.

    infer x =              // variable x
      t := lookup(x)
      return t
  
    infer i =              // integer literal i
      return int
  
    infer f(a₁,..., a_n) =
      T := lookup(f)
      if T = (A₁, ..., A_n) -> B
        check a₁ : A₁
        ...
        check a_n : A_n
        return B
      else failure
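A direct Java rendering of this pseudocode, assuming a hypothetical AST and a TypeException class standing in for the lab's one. Here check is defined by inferring a type and comparing it with the expected one, which is one possible choice.

```java
import java.util.*;

enum Type { INT, DOUBLE, BOOL, VOID }

record FunType(List<Type> args, Type result) {}

interface Exp {}
record Var(String name) implements Exp {}
record IntLit(int value) implements Exp {}
record Call(String fun, List<Exp> args) implements Exp {}

// Stand-in for the lab's exception class.
class TypeException extends RuntimeException {
    TypeException(String msg) { super(msg); }
}

class Inferencer {
    final Map<String, FunType> sig;
    final Map<String, Type> ctx;
    Inferencer(Map<String, FunType> sig, Map<String, Type> ctx) { this.sig = sig; this.ctx = ctx; }

    Type infer(Exp e) {
        if (e instanceof Var v) {                       // infer x: look it up
            Type t = ctx.get(v.name());
            if (t == null) throw new TypeException("variable " + v.name() + " not found");
            return t;
        }
        if (e instanceof IntLit) return Type.INT;       // infer i: int
        if (e instanceof Call c) {                      // infer f(a1, ..., an)
            FunType ft = sig.get(c.fun());
            if (ft == null || ft.args().size() != c.args().size())
                throw new TypeException("no function " + c.fun() + " with " + c.args().size() + " arguments");
            for (int i = 0; i < c.args().size(); i++) check(c.args().get(i), ft.args().get(i));
            return ft.result();                         // return B
        }
        throw new TypeException("unknown expression " + e);
    }

    // check a : A, by mutual recursion with infer
    void check(Exp e, Type expected) {
        Type actual = infer(e);
        if (actual != expected) throw new TypeException("expected " + expected + " but found " + actual);
    }
}
```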

From typing rules to type checking code

Basic idea: from rule

     J₁ ...  J_n
     ---------- C
         J

generate the code "upside down"

    check J =
      check J₁
      ...
      check J_n
      check_condition C

Example:

    Env => exp1 : bool     Env => exp2 : bool     check Env => exp1 && exp2 : bool =
    -----------------------------------------       check Env => exp1 : bool
         Env => exp1 && exp2 : bool                 check Env => exp2 : bool

From typing rules to type checking code: more examples

Judgements are easy: recursive calls to check.

    Env => exp : bool   Env => stm valid      check Env => while (exp) stm valid =
    ------------------------------------        check Env => exp : bool   
        Env => while (exp) stm valid            check Env => stm valid

Side conditions can require arbitrary code, so you have to think harder.

    ---------------- var : typ is in Env      check Env => var : typ =
    Env => var : typ                            check_condition lookup(var,Env) == typ

It is lookup and such conditions that in the end generate the error messages.

    lookup(var,Env) = message ("variable " var " not found") // if var is not in Env
  
    check_condition x == y = message ("expected " y " but found " x) // if not equal
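The upside-down translation and the two sources of error messages can be combined in one sketch. The names lookup and checkCondition follow the pseudocode; the AST, the TypeException class, and the exact message texts are assumptions.

```java
import java.util.Map;

enum Type { INT, DOUBLE, BOOL, VOID }

interface Exp {}
record Var(String name) implements Exp {}
record And(Exp left, Exp right) implements Exp {}

class TypeException extends RuntimeException {
    TypeException(String msg) { super(msg); }
}

class RuleCode {
    // lookup(var, Env): the side condition "var : typ is in Env"
    static Type lookup(String var, Map<String, Type> env) {
        Type t = env.get(var);
        if (t == null) throw new TypeException("variable " + var + " not found");
        return t;
    }

    // check_condition found == expected
    static void checkCondition(Type found, Type expected) {
        if (found != expected)
            throw new TypeException("expected " + expected + " but found " + found);
    }

    // check Env => exp : typ, one case per typing rule
    static void check(Map<String, Type> env, Exp exp, Type typ) {
        if (exp instanceof Var v) {
            checkCondition(lookup(v.name(), env), typ);
        } else if (exp instanceof And a) {
            checkCondition(Type.BOOL, typ);     // the conclusion of the && rule has type bool
            check(env, a.left(), Type.BOOL);    // premiss Env => exp1 : bool
            check(env, a.right(), Type.BOOL);   // premiss Env => exp2 : bool
        } else {
            throw new TypeException("no rule for " + exp);
        }
    }
}
```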

The need for type inference

There is a grammar rule saying that expressions can be used as statements:

    Stm ::= Exp ";"

How do we check that such statements are valid?

    Env => exp : ?
    ------------------
    Env => exp ; valid

The problem is that we have no type typ to check exp : typ.

Solution 1: check exp with each of the four types

    check Env => exp ; valid =
      try each typ in  [bool,double,int,void]: 
        check exp : typ

This is inefficient, and does not scale up to infinitely many types.

Solution 2: do type inference with exp. If it succeeds, the statement is valid, because expressions of any type can be used as statements.

Type inference

The general scheme is a rule where the conclusion has a type depending in some way on the premises and the condition:

           J₁ ...  J_n
    --------------------------------- C
    Env => exp : typ(J₁, ..., J_n, C)

We should then use recursive calls of check and infer so that

everything we need for constructing the type is inferred
everything else is just checked

Often the type is independent of the premisses (which still have to be checked of course!):

    Env => exp1 : bool      Env => exp2 : bool     infer Env (exp1 && exp2) =
    ------------------------------------------       check Env => exp1 : bool
         Env => exp1 && exp2 : bool                  check Env => exp2 : bool
                                                     return bool

It can also come from the condition:

        ---------------- var : typ is in Env       infer Env var =
        Env => var : typ                             return lookup(var,Env)

Type checking overloaded operations

Arithmetic operations in most languages are overloaded.

This means that they apply to many types.

The general rule for + - * / is: both operands have the same type as the value, which must be int or double.

    Env => exp1 : typ    Env => exp2 : typ
    -------------------------------------- typ is int or double
        Env => exp1 + exp2 : typ

What we do is infer the type of the first operand and check the second.

    infer Env (exp1 + exp2) =
      typ := infer Env exp1
      check_condition typ == int or typ == double
      check Env => exp2 : typ
      return typ

The comparison operators are also overloaded, but their return type is always bool.
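A sketch of the infer-first, check-second strategy for + and for a comparison operator <, assuming hypothetical AST classes. As in the rule, there is no implicit conversion from int to double, so i + d is rejected.

```java
import java.util.Map;

enum Type { INT, DOUBLE, BOOL, VOID }

interface Exp {}
record Var(String name) implements Exp {}
record IntLit(int value) implements Exp {}
record DoubleLit(double value) implements Exp {}
record Plus(Exp left, Exp right) implements Exp {}
record Less(Exp left, Exp right) implements Exp {}

class TypeException extends RuntimeException {
    TypeException(String msg) { super(msg); }
}

class Overloading {
    static Type infer(Map<String, Type> env, Exp e) {
        if (e instanceof IntLit) return Type.INT;
        if (e instanceof DoubleLit) return Type.DOUBLE;
        if (e instanceof Var v) {
            Type t = env.get(v.name());
            if (t == null) throw new TypeException("variable " + v.name() + " not found");
            return t;
        }
        if (e instanceof Plus p)
            return inferNumericBin(env, p.left(), p.right());   // the value has the operand type
        if (e instanceof Less lt) {
            inferNumericBin(env, lt.left(), lt.right());         // same check on the operands
            return Type.BOOL;                                    // but the value is bool
        }
        throw new TypeException("no rule for " + e);
    }

    // Infer the first operand, require int or double, then check the second against it.
    static Type inferNumericBin(Map<String, Type> env, Exp e1, Exp e2) {
        Type typ = infer(env, e1);
        if (typ != Type.INT && typ != Type.DOUBLE)
            throw new TypeException("expected int or double but found " + typ);
        Type typ2 = infer(env, e2);
        if (typ2 != typ) throw new TypeException("expected " + typ + " but found " + typ2);
        return typ;
    }
}
```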

Relating inference and checking

Now we can check expression statements:

    check Env => exp ; valid =
      infer Env exp

If infer fails, we get any error message it generates.

If infer succeeds, we discard the type.

In the same way, we only need to write infer for expressions. Then we define check uniformly,

    check Env => exp : typ =
      typ2 := infer Env exp
      check_condition typ2 == typ

The check_condition call usually returns a message on failure, e.g.

      TYPE ERROR
      type of exp: expected typ, inferred typ2
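In Java, the uniform check and the rule for expression statements might look as follows. Here infer is cut down to literals and variables; the class names and the message format are assumptions modelled on the message above.

```java
import java.util.Map;

enum Type { INT, DOUBLE, BOOL, VOID }

interface Exp {}
record Var(String name) implements Exp {}
record IntLit(int value) implements Exp {}

interface Stm {}
record SExp(Exp exp) implements Stm {}    // exp ;

class TypeException extends RuntimeException {
    TypeException(String msg) { super(msg); }
}

class Uniform {
    static Type infer(Map<String, Type> env, Exp e) {
        if (e instanceof IntLit) return Type.INT;
        if (e instanceof Var v) {
            Type t = env.get(v.name());
            if (t == null) throw new TypeException("variable " + v.name() + " not found");
            return t;
        }
        throw new TypeException("no rule for " + e);
    }

    // check Env => exp : typ, written once for all expressions
    static void check(Map<String, Type> env, Exp exp, Type typ) {
        Type typ2 = infer(env, exp);
        if (typ2 != typ)
            throw new TypeException("type of " + exp + ": expected " + typ + ", inferred " + typ2);
    }

    // check Env => exp ; valid: infer the type and discard it
    static void checkStm(Map<String, Type> env, Stm s) {
        if (s instanceof SExp se) infer(env, se.exp());
        else throw new TypeException("no rule for " + s);
    }
}
```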

The top-level checkers

To check the whole program,

collect the types of each function into the signature
check that function names are unique
check each function definition using the signature

To check a function definition

check that argument variables are unique
initialize the topmost context with the argument variables
check the body in this context
check that there is a return, with an expression that has the expected return type of the function (or just a return if the type is void)

To check a sequence of statements

check the validity of the first statement and update the environment if appropriate
check the remaining sequence in the new environment
an empty sequence is always valid
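A compact sketch of the top-level checker, assuming a hypothetical AST with only declarations and return statements in function bodies. Pass 1 builds the whole signature before any body is checked, which is what lets g call f even though f is defined later. The return check is simplified: it requires at least one return of the right type, not a return on every path.

```java
import java.util.*;

enum Type { INT, DOUBLE, BOOL, VOID }

// Hypothetical AST: just enough for the example.
interface Exp {}
record Var(String name) implements Exp {}
record IntLit(int value) implements Exp {}
record Call(String fun, List<Exp> args) implements Exp {}

interface Stm {}
record SDecl(Type type, String var) implements Stm {}
record SReturn(Exp exp) implements Stm {}

record Param(Type type, String name) {}
record Def(Type result, String name, List<Param> params, List<Stm> body) {}
record FunType(List<Type> args, Type result) {}

class TypeException extends RuntimeException {
    TypeException(String msg) { super(msg); }
}

class ProgramChecker {
    static void typecheck(List<Def> program) {
        // Pass 1: collect the signature, checking that function names are unique.
        Map<String, FunType> sig = new HashMap<>();
        for (Def d : program) {
            List<Type> args = d.params().stream().map(Param::type).toList();
            if (sig.putIfAbsent(d.name(), new FunType(args, d.result())) != null)
                throw new TypeException("duplicate function " + d.name());
        }
        // Pass 2: check every definition against the complete signature.
        for (Def d : program) checkDef(sig, d);
    }

    static void checkDef(Map<String, FunType> sig, Def d) {
        // Argument variables must be unique; they form the initial context.
        Map<String, Type> ctx = new HashMap<>();
        for (Param p : d.params())
            if (ctx.putIfAbsent(p.name(), p.type()) != null)
                throw new TypeException("duplicate parameter " + p.name());
        boolean returned = false;
        for (Stm s : d.body()) {                        // the body, statement by statement
            if (s instanceof SDecl decl) {
                if (ctx.putIfAbsent(decl.var(), decl.type()) != null)
                    throw new TypeException("duplicate variable " + decl.var());
            } else if (s instanceof SReturn r) {
                Type t = infer(sig, ctx, r.exp());
                if (t != d.result())
                    throw new TypeException("return in " + d.name() + ": expected " + d.result() + " but found " + t);
                returned = true;
            }
        }
        if (!returned && d.result() != Type.VOID)
            throw new TypeException("missing return in " + d.name());
    }

    static Type infer(Map<String, FunType> sig, Map<String, Type> ctx, Exp e) {
        if (e instanceof IntLit) return Type.INT;
        if (e instanceof Var v && ctx.containsKey(v.name())) return ctx.get(v.name());
        if (e instanceof Call c && sig.containsKey(c.fun())) {
            FunType ft = sig.get(c.fun());
            if (ft.args().size() != c.args().size())
                throw new TypeException("wrong number of arguments to " + c.fun());
            for (int i = 0; i < ft.args().size(); i++)
                if (infer(sig, ctx, c.args().get(i)) != ft.args().get(i))
                    throw new TypeException("wrong argument type in call to " + c.fun());
            return ft.result();
        }
        throw new TypeException("cannot infer the type of " + e);
    }
}
```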

Type checker in Haskell

You can copy the contents of laborations/lab2/haskell/:

    CPP.cf           -- grammar
    lab2.hs          -- main module
    Makefile         
    TypeChecker.hs   -- type checking module

You only have to modify CPP.cf and TypeChecker.hs.

But you can already compile them: just type

    make

and run the type checker with

    ./lab2 <File>

The rest is "debugging the empty file"!

The Main module

You don't have to write this - just copy the file laborations/lab2/haskell/lab2.hs.

This file shows how compiler phases are linked together.

    check :: String -> IO () 
    check s = case pProgram (myLexer s) of
                Bad err  -> do putStrLn "SYNTAX ERROR"
                               putStrLn err
                               exitFailure 
                Ok  tree -> case typecheck tree of
                              Bad err -> do putStrLn "TYPE ERROR"
                                            putStrLn err
                                            exitFailure 
                              Ok _    -> putStrLn "OK"

In other words: call the parser; if it succeeds, call the type checker.

Notice the use of the error type,

    data Err a = Ok a | Bad String

The value is either Ok of the expected type or Bad with an error message.

The Err type is generated by BNFC. One could also use Haskell's standard type Either String a.

Using the Err type

The Err type

    data Err a = Ok a | Bad String

is a monad: a type of actions that return a value of type a but may also do other things (in this case: raise exceptions).

Monad actions can be sequenced: if

    inferExp :: Env -> Exp -> Err Type

then you can make several inferences one after the other by using do

    do inferExp env exp1
       inferExp env exp2

You can bind variables returned from actions, and return values.

    do typ1 <- inferExp env exp1
       typ2 <- inferExp env exp2
       return TBool

If you are only interested in side effects, use the dummy value type () (corresponds to void in C and Java).

Symbol tables

Environment type

    type Env = (Sig,[Context])       -- signature and stack of contexts
    type Sig = [(Id,([Type],Type))]  -- or Map Id ([Type],Type)
    type Context = [(Id,Type)]       -- or Map Id Type

Auxiliary operations on the environment

    lookVar   :: Env -> Id -> Err Type
    lookFun   :: Env -> Id -> Err ([Type],Type)
    updateVar :: Env -> Id -> Type -> Err Env
    updateFun :: Env -> Id -> ([Type],Type) -> Err Env
    newBlock  :: Env -> Err Env
    emptyEnv  :: Env

Keep the datatypes abstract, i.e. use them only via these operations. Then you can switch to another implementation if needed (more efficient, more stuff in the environment).

The TypeCheck module

The environment datatypes and operations.

Type signatures of the checking methods

    typecheck :: Program -> Err ()                -- required function in lab2
    checkDef  :: Env -> Def -> Err ()             -- check a function definition
    checkStms :: Env -> Type -> [Stm] -> Err ()
    checkStm  :: Env -> Type -> Stm -> Err Env
    checkExp  :: Env -> Type -> Exp -> Err ()
    inferExp  :: Env -> Exp  -> Err Type

Some other auxiliaries.

    checkUnique    :: (Ord a, Print a) => [a] -> Err ()
    checkCondition :: Bool -> Err ()

Some examples of checking

    checkStm :: Env -> Type -> Stm -> Err Env
    checkStm env val x = case x of
      SExp exp  -> do
        inferExp env exp
        return env
      SDecl type' id  ->
        updateVar env id type'   -- also check that id is not in the topmost context already
      SWhile exp stm  -> do
        checkExp env Type_bool exp
        checkStm env val stm
        return env               -- declarations in the loop body do not escape it
  
    checkExp :: Env -> Type -> Exp -> Err ()
    checkExp env typ exp = do
      typ2 <- inferExp env exp
      if (typ2 == typ) then
          return ()
        else
          fail $ "type of " ++ printTree exp -- ...

Some examples of type inference

    inferExp :: Env -> Exp -> Err Type
    inferExp env x = case x of
      ETrue      -> return Type_bool
      EInt n     -> return Type_int
      EId id     -> lookVar env id
      EPIncr exp -> inferNumeric env exp
      ETimes exp0 exp -> inferNumericBin env exp0 exp
  
    inferNumeric :: Env -> Exp -> Err Type
    inferNumeric env exp = do
      typ <- inferExp env exp
      if elem typ [Type_int, Type_double] then
          return typ
        else
          fail $ "type of expression " ++ printTree exp -- ...
  
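    inferNumericBin :: Env -> Exp -> Exp -> Err Type
    inferNumericBin env exp0 exp = do
      typ <- inferNumeric env exp0   -- infer the first operand; it must be int or double
      checkExp env typ exp           -- check the second operand against it
      return typ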

Type checker in Java

You can copy the contents of laborations/lab2/java/:

    CPP.cf             -- grammar
    lab2               -- script running the type checker
    lab2.java          -- main program
    Makefile
    TypeChecker.java   -- type checker class
    TypeException.java -- exceptions for type checking

You only have to modify CPP.cf and TypeChecker.java.

But you can already compile them: just type

    make

and run the type checker with

    ./lab2 <File>

The rest is "debugging the empty file"!

Before running make, you may have to set your class path so that it finds java_cup and JLex, as well as the current directory.

    export CLASSPATH=.:<path-to-JLex>:<path-to-CUP>:$CLASSPATH

The Main module

This is given in laborations/lab2/java/lab2.java, hence you don't have to write this.

It shows how compiler phases are linked together.

  Yylex l = null;
  try {
    l = new Yylex(new FileReader(args[0]));
    parser p = new parser(l);
    CPP.Absyn.Program parse_tree = p.pProgram();
    new TypeChecker().typecheck(parse_tree);

  } catch (TypeException e) {
    System.out.println("TYPE ERROR");
    System.err.println(e.toString());
    System.exit(1);
  } catch (IOException e) {
    System.err.println(e.toString());
    System.exit(1);
  } catch (Throwable e) {
    System.out.println("SYNTAX ERROR");
    System.out.println("At line " + String.valueOf(l.line_num())
                       + ", near \"" + l.buff() + "\" :");
    System.out.println("     " + e.getMessage());
    System.exit(1);
  }

Symbol tables

Environment types

    public static class FunType {
      public LinkedList<Type> args ;
      public Type val ;
    }

    public static class Env {
      public Map<String,FunType> signature ;
      public LinkedList<Map<String,Type>> contexts ;  // stack of contexts

      public Type lookVar(String id) { ... }
      public FunType lookFun(String id) { ... }
      public void updateVar(String id, Type ty) { ... }
      // ...
    }

The TypeCheck module

The environment datatypes and operations.

An enumeration of codes for types.

    public static enum TypeCode { CodeInt, CodeDouble, CodeBool, CodeVoid } ;

Notice that TypeCode is not the same as the class Type, which is the syntactic category of source-language types. We need TypeCode to be able to compare types for equality, and this happens when we compare an expected type with an inferred type.

Type signatures of the checking methods

    public void typecheck(Program p) {
    }
  
    public static class CheckStm implements Stm.Visitor<Env,Env> {
      public Env visit(SDecl p, Env env) {
      }
      public Env visit(SExp p, Env env) {
      }
      // ...
    }
  
    public static class InferExp implements Exp.Visitor<Type,Env> {
      public Type visit(EInt p, Env env) {
      }
      public Type visit(EAdd p, Env env) {
      }
      // ...
  
    }

Some examples of checking

  public static class CheckStm implements Stm.Visitor<Env,Env> {
  
      public Env visit(SDecl p, Env env) {
        env.updateVar(p.id_,p.type_) ;
        return env ;
      }
  
     //...
    }

Some examples of type inference

    public static class InferExpType implements Exp.Visitor<Type,Env> {
  
      public Type visit(demo.Absyn.EPlus p, Env env) {
        Type t1 = p.exp_1.accept(this, env);
        Type t2 = p.exp_2.accept(this, env);
  
        if (typeCode(t1) == TypeCode.CodeInt && typeCode(t2) == TypeCode.CodeInt)
           return TInt;
        else
        if (typeCode(t1) == TypeCode.CodeDouble && typeCode(t2) == TypeCode.CodeDouble)
           return TDouble;
        else
          throw new TypeException("Operands to + must both be int or both be double.");
      }
      //...
    }

The function typeCode converts source language types to their codes:

    public static TypeCode typeCode (Type ty) ...

It can be implemented by using a visitor or the instanceof operator.
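A sketch of the instanceof variant, assuming Type_int, Type_double, Type_bool and Type_void as the subclasses of Type (the constructor names used in the Haskell code above; the actual names depend on the labels in CPP.cf). The stand-in classes below are hypothetical, not the BNFC-generated ones.

```java
// Hypothetical stand-ins for the BNFC-generated classes of the category Type.
abstract class Type {}
class Type_int extends Type {}
class Type_double extends Type {}
class Type_bool extends Type {}
class Type_void extends Type {}

enum TypeCode { CodeInt, CodeDouble, CodeBool, CodeVoid }

class TypeCodes {
    // Map each syntactic type to its code, so that types can be compared with ==.
    static TypeCode typeCode(Type ty) {
        if (ty instanceof Type_int) return TypeCode.CodeInt;
        if (ty instanceof Type_double) return TypeCode.CodeDouble;
        if (ty instanceof Type_bool) return TypeCode.CodeBool;
        if (ty instanceof Type_void) return TypeCode.CodeVoid;
        throw new IllegalArgumentException("unknown type " + ty);
    }
}
```

Two separately constructed Type_int objects are distinct references, but their codes compare equal with ==, which is what the comparison of an expected and an inferred type needs.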

More help

You don't need to debug completely empty files:

for the grammars, you can pick rules from your Lab 1 (as indicated in Lab 2 PM)
for the type checker, you can start from the "mini" implementation, in laborations/mini

Lab 2 overview

We will read through the Lab PM

Preparation and exercise: write typing rules for Lab PM constructs.