Lecture 7: Type Checking
Programming Languages Course
Aarne Ranta (aarne@chalmers.se)
%!target:html
%!postproc(html): #NEW
%!postproc(html): #HR
%!postproc(html): #sub1 1
%!postproc(html): #subn n
Book: 6.3, 6.5
#NEW
==The purpose of types==
To define what the program should do.
- e.g. read an array of integers and return a double
To guarantee that the program is meaningful.
- that it does not add a string to an integer
- that variables are declared before they are used
To document the programmer's intentions.
- better than comments, which are not checked by the compiler
To optimize the use of hardware.
- reserve the minimal amount of memory, but not more
- use the most appropriate machine instructions
#NEW
==What belongs to type checking==
Depending on the language, the type checker can prevent
- application of a function to wrong number of arguments,
- application of integer functions to floats,
- use of undeclared variables in expressions,
- functions that do not return values,
- division by zero,
- array indices out of bounds,
- nonterminating recursion,
- sorting algorithms that don't sort...
Languages differ greatly in how strict their
static semantics is: none of the things above is
checked by all programming languages!
In general, the more static checking the compiler does,
the less manual debugging is needed.
#NEW
==Description formats for different compiler phases==
These formats are independent of implementation language.
- Lexer: regular expressions
- Parser: BNF grammars
- **Type checker: typing rules**
- Interpreter: operational semantic rules
- Code generator: compilation schemes
#NEW
==Typing judgements and rules==
Typing rules concern **judgements** of the form
```
E => e : T
```
where //E// is an **environment**, which contains e.g. typings of identifiers.
The judgement says
- in the environment //E//, expression //e// has type //T//
Judgements are used in **typing rules** of the form
```
J1 J2 ... Jn
--------------- C
J
```
(n >= 0) which says
- from the judgements //J1, J2, ..., Jn// you may conclude //J//,
if condition //C// holds.
The judgements above the line in a rule are the **premisses**.
The judgement under the line is the **conclusion**.
The condition ``C`` beside is a
**side condition**, typically not expressible as a judgement,
therefore not a premiss.
The judgements are written in a formal language, whereas side conditions
can be written in natural language.
#NEW
==Examples of typing rules and derivation==
Typing rules for arithmetic expressions
```
E => e1 : int    E => e2 : int
-------------------------------
E => e1 + e2 : int

E => e1 : int    E => e2 : int
-------------------------------
E => e1 * e2 : int

---------- x : T is in E
E => x : T

------------ i is an integer literal
E => i : int
```
Derivation of judgement ``x : int, y : int => x + 12 * y : int``
```
                          x:int, y:int => 12 : int    x:int, y:int => y : int
                          ---------------------------------------------------
x:int, y:int => x : int   x:int, y:int => 12 * y : int
-----------------------------------------------------------
x:int, y:int => x + 12 * y : int
```
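The four rules above can be read as a recursive function on expressions. The following is a minimal Java sketch, not the lab code: the class names (``ArithTyping``, ``Exp``, ``Var``, ``IntLit``, ``BinOp``) are made up for illustration, types are represented as plain strings, and the environment is a simple map from variable names to types.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: all class names here are assumptions.
class ArithTyping {
    // An expression can compute its type in a context (variable name -> type name).
    interface Exp {
        String typeIn(Map<String, String> ctx);
    }

    // Rule: E => x : T, provided x : T is in E.
    static final class Var implements Exp {
        final String name;
        Var(String name) { this.name = name; }
        public String typeIn(Map<String, String> ctx) {
            String t = ctx.get(name);
            if (t == null) throw new IllegalStateException("undeclared variable " + name);
            return t;
        }
    }

    // Rule: E => i : int, provided i is an integer literal.
    static final class IntLit implements Exp {
        final int value;
        IntLit(int value) { this.value = value; }
        public String typeIn(Map<String, String> ctx) { return "int"; }
    }

    // Rules for e1 + e2 and e1 * e2: both premisses must yield int.
    static final class BinOp implements Exp {
        final String op;
        final Exp left, right;
        BinOp(String op, Exp left, Exp right) {
            this.op = op; this.left = left; this.right = right;
        }
        public String typeIn(Map<String, String> ctx) {
            String t1 = left.typeIn(ctx);   // first premiss
            String t2 = right.typeIn(ctx);  // second premiss
            if (!t1.equals("int") || !t2.equals("int"))
                throw new IllegalStateException("operands of " + op + " must be int");
            return "int";
        }
    }

    public static void main(String[] args) {
        Map<String, String> ctx = new HashMap<>();
        ctx.put("x", "int");
        ctx.put("y", "int");
        // x + 12 * y, with * binding tighter than +
        Exp e = new BinOp("+", new Var("x"),
                          new BinOp("*", new IntLit(12), new Var("y")));
        System.out.println(e.typeIn(ctx)); // prints "int"
    }
}
```

The call tree of ``typeIn`` has the same shape as the derivation tree: each recursive call corresponds to one premiss.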
#NEW
==Signature vs. context==
We generalize the type checking context to an **environment** with
two parts:
- **signature**, which shows the types of functions
- **context**, which shows the types of variables.
In the course of type checking, the signature remains the same
throughout a program module, whereas
the context changes all the time.
#NEW
==Function types==
No expression in the language of Lab 2 has **function types**,
because functions are never returned as values or used as arguments.
However, the compiler needs internally
a data structure for function types, to hold the
types of the parameters and the return type. E.g. for a function
```
bool between (int x, double a, double b) {...}
```
we write
```
between : (int, double, double) -> bool
```
to express this internal representation in typing rules.
#NEW
==Notation for signature and context==
Dividing the environment into a signature ``F`` and a context ``G``, we write judgements as
```
F,G => J
```
Example: typing rule for a variable expression
```
------------- x : T is in G
F,G => x : T
```
Example: typing rule for one-place function application
```
F,G => e : A
--------------- f : (A) -> T is in F
F,G => f(e) : T
```
#NEW
==The validity of statements==
Expressions have types, but statements do not.
However, statements are also checked in type checking.
We need a new judgement form, saying that a statement ``S`` is valid:
```
F,G => S valid
```
Example: typing rule for an assignment
```
F,G => e : T
-------------------- x : T is in G
F,G => x = e ; valid
```
Example: typing rule for while loops
```
F,G => e : bool F,G => S valid
--------------------------------
F,G => while (e) S valid
```
#NEW
==Variable binding and context==
Contexts can be **extended** with new variables. The notation we use is
```
(G, x : T)
```
This corresponds to **variable binding** constructs, e.g. declarations.
Example: typing rule for a declaration; ``SS`` is a sequence of statements
```
F,(G, x:t) => SS valid
------------------------- x not in G
F,G => t x ; SS valid
```
The rule says: if ``SS`` is a valid sequence of statements in context
``(G, x : t)``, then ``t x ; SS`` is valid in ``G``.
#NEW
==Example of variable binding and context==
We prove that ``int x ; x = x + 5 ;`` is valid in the empty context ``()``.
```
x : int => x : int x : int => 5 : int
-----------------------------------------
x : int => x + 5 : int
--------------------------
x : int => x = x + 5 ; valid
-------------------------------
() => int x ; x = x + 5 ; valid
```
The signature is omitted for simplicity.
#NEW
==Function definitions and applications==
The validity of a function definition.
```
F,(G, x#sub1:A#sub1,...,x#subn : A#subn) => SS valid
--------------------------------------------- f not in F
F,G => T f (A#sub1 x#sub1,...,A#subn x#subn) { SS } valid
```
More conditions:
- that ``x#sub1 ... x#subn`` are distinct
- that ``A#sub1 ... A#subn, T`` are types (can be guaranteed by syntax)
- that there is a ``return e`` with ``e : T`` (not always checked)
The typing rule for function applications is the following
```
F,G => f : (A#sub1,...,A#subn) -> T F,G => e#sub1 : A#sub1,..., e#subn : A#subn
----------------------------------------------------------
F,G => f(e#sub1,...,e#subn) : T
```
#NEW
==Block structure==
In C, C++, Java, Haskell, etc, variables on the same level must be distinct
(e.g. in function parameter lists).
However, variables in an **inner block** are no longer on the same level
and can hence **overshadow** outer variables
```
{
  int x = 1 ;
  bool b ;
  x = x + 2 ;        // x : int
  {
    double x = 2.0 ;
    x = x + 1.0 ;    // x : double
    b = true ;       // b : bool
  }
  x = x + 5 ;        // x : int
  b = b && b ;       // b : bool
}
```
Variables declared in a block are discarded at exit from the block.
There is no limit on the depth of block nesting.
#NEW
==Contexts for block structure==
Implementation 1: with markers
```
x : int, b : bool, MARK, x : double
```
- entering a block: add a marker
- leaving a block: delete variables after last marker, and the marker itself
- update: after the last variable
- lookup: the latest occurrence of the variable
Implementation 2: with stacks of contexts (separated by ".", stack top is rightmost)
```
(x : int, b : bool).(x : double)
```
- entering a block: push an empty context on the stack
- leaving a block: pop the topmost context
- update: after the last variable in the topmost context
- lookup: the innermost enclosing occurrence of the variable, searching from the stack top downwards
Implementation 2 can be done with lists of lists: the top of the stack is the head
of the list.
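Implementation 2 can be sketched in Java with a ``Deque`` of maps. This is only one possible shape, under the assumption that types are plain strings; the class name ``ContextStack`` and its method names are invented for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of a stack of contexts; the names are assumptions.
class ContextStack {
    // The head of the deque is the topmost (innermost) context.
    private final Deque<Map<String, String>> stack = new ArrayDeque<>();

    ContextStack() {
        stack.push(new HashMap<>()); // context of the outermost block
    }

    // Entering a block: push an empty context.
    void enterBlock() { stack.push(new HashMap<>()); }

    // Leaving a block: pop the topmost context, discarding its variables.
    void leaveBlock() { stack.pop(); }

    // Update: add the variable to the topmost context only.
    void declare(String x, String type) {
        Map<String, String> top = stack.peek();
        if (top.containsKey(x))
            throw new IllegalStateException("variable " + x + " already declared in this block");
        top.put(x, type);
    }

    // Lookup: search from the innermost context outwards.
    String lookup(String x) {
        for (Map<String, String> ctx : stack) { // ArrayDeque iterates from head to tail
            String t = ctx.get(x);
            if (t != null) return t;
        }
        throw new IllegalStateException("variable " + x + " not found");
    }

    public static void main(String[] args) {
        ContextStack env = new ContextStack();
        env.declare("x", "int");
        env.declare("b", "bool");
        env.enterBlock();
        env.declare("x", "double");          // allowed: different level
        System.out.println(env.lookup("x")); // prints "double"
        System.out.println(env.lookup("b")); // prints "bool"
        env.leaveBlock();
        System.out.println(env.lookup("x")); // prints "int"
    }
}
```

The ``main`` method replays the declarations of the block structure example: the inner ``x`` hides the outer one until the inner block is left.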
#NEW
==Typing rules for variables==
The rules are expressed as checking if a **list of statements** SS is valid.
Instead of a single context, we have a **stack of contexts** ``Gs.G`` where
we denote by ``G`` the topmost context.
Declarations.
```
Gs.(G,x:t) => SS valid
-------------------------- x not in G
Gs.G => t x ; SS valid
```
Assignments.
```
Gs => e : t Gs => SS valid
------------------------------- x : t in Gs
Gs => x = e ; SS valid
```
Blocks.
```
Gs.() => SS valid Gs => SSS valid
-------------------------------------
Gs => { SS } SSS valid
```
The typing rule for variable expressions is now
```
-------------- x : T is the closest entry for x in Gs.G
Gs.G => x : T
```
#NEW
==Type checking and type inference==
**Type checking**: given a judgement ``G => e : T``, find out whether it
can be derived by the typing rules. The **derivation** is a tree of rule
applications with the judgement as the last line.
**Type inference**: given an expression ``e``,
find a type ``T`` in context ``G`` such that ``G => e : T``
can be derived by the typing rules.
For Java, C, and C++, we can mostly do with just type checking,
because types are marked explicitly.
Haskell has type inference as well: if you don't give the type,
the compiler can usually find //the most general type//.
#NEW
==Different type checkers==
We can classify checkers in terms of what they return:
- A //rude checker//, which only says ``True`` or ``False``,
and may even crash (for instance, when variable lookup
just gives an ``error`` if the variable is not found).
- An //error-reporting checker//, which returns ``OK`` or a message
saying where the error is.
- An //annotating checker//, which returns a syntax tree annotated
with more type information.
To build a compiler back end, we need the third.
In Lab 2, we build the second.
#NEW
==The passes of the type checker==
Pass 1:
- start with empty signature
- for each function ``f``, update signature with ``f : T``
Pass 2:
- for each ``f``, check the function body of ``f``
with respect to the type ``T``
The expression checker consists of functions:
```
check (Exp e, Type t) returns void
infer (Exp e) returns Type
```
These functions are defined by mutual recursion, by cases
on the expression.
We also need to check function definitions and sequences of statements.
```
check (Def d) returns void
check (Stms ss) returns void
```
All functions use an environment (= signature and stack of contexts).
The method is syntax-directed translation.
#NEW
==Examples of type checking==
We show syntax-directed translation in pseudocode.
```
infer x =                      // variable x
  t := lookup(x)
  return t

infer i =                      // integer literal i
  return int

infer f(a#sub1,..., a#subn) =
  T := lookup(f)
  if T = (A#sub1, ..., A#subn) -> B
    check a#sub1 : A#sub1
    ...
    check a#subn : A#subn
    return B
  else failure
```
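The function application case can be sketched in Java as follows. This is a simplified illustration, not the lab solution: ``CallTyping``, ``FunType`` and ``inferCall`` are assumed names, types are strings, and the argument types are passed in as already inferred, so each ``check`` from the pseudocode becomes a comparison of types.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; CallTyping, FunType and inferCall are assumed names.
class CallTyping {
    // Internal representation of a function type (A1,...,An) -> B.
    static final class FunType {
        final List<String> args;
        final String result;
        FunType(List<String> args, String result) {
            this.args = args;
            this.result = result;
        }
    }

    // Infers the type of f(a1,...,an), given the already inferred argument types.
    static String inferCall(Map<String, FunType> sig, String f, List<String> argTypes) {
        FunType t = sig.get(f); // T := lookup(f)
        if (t == null)
            throw new IllegalStateException("function " + f + " not found");
        if (t.args.size() != argTypes.size())
            throw new IllegalStateException(f + ": expected " + t.args.size()
                    + " arguments but found " + argTypes.size());
        for (int i = 0; i < argTypes.size(); i++) { // check a_i : A_i
            if (!t.args.get(i).equals(argTypes.get(i)))
                throw new IllegalStateException(f + ": argument " + (i + 1) + ": expected "
                        + t.args.get(i) + " but found " + argTypes.get(i));
        }
        return t.result; // return B
    }

    public static void main(String[] args) {
        Map<String, FunType> sig = new HashMap<>();
        sig.put("between", new FunType(Arrays.asList("int", "double", "double"), "bool"));
        System.out.println(
            inferCall(sig, "between", Arrays.asList("int", "double", "double"))); // prints "bool"
    }
}
```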
#NEW
==From typing rules to type checking code==
Basic idea: from rule
```
J#sub1 ... J#subn
---------- C
J
```
generate the code "upside down"
```
check J =
  check J#sub1
  ...
  check J#subn
  check_condition C
```
Example:
```
Env => exp1 : bool    Env => exp2 : bool
-----------------------------------------
Env => exp1 && exp2 : bool

check Env => exp1 && exp2 : bool =
  check Env => exp1 : bool
  check Env => exp2 : bool
```
#NEW
==From typing rules to type checking code: more examples==
Judgements are easy: recursive calls to check.
```
Env => exp : bool    Env => stm valid
------------------------------------
Env => while (exp) stm valid

check Env => while (exp) stm valid =
  check Env => exp : bool
  check Env => stm valid
```
Side conditions are unlimited code, so you have to think harder.
```
---------------- var : typ is in Env
Env => var : typ

check Env => var : typ =
  check_condition lookup(var,Env) == typ
```
It is ``lookup`` and such conditions that in the end generate the error messages.
```
lookup(var,Env) = message ("variable " var " not found") // if var is not in Env
check_condition x == y = message ("expected " y " but found " x) // if not equal
```
#NEW
==The need of type inference==
There is a grammar rule saying that expressions can be used as statements:
```
Stm ::= Exp ";"
```
How do we check that such statements are valid?
```
Env => exp : ?
------------------
Env => exp ; valid
```
The problem is that we have no type ``typ`` to check ``exp : typ``.
Solution 1: check ``exp`` with each of the four types
```
check Env => exp ; valid =
  try each typ in [bool,double,int,void]:
    check exp : typ
```
This is inefficient, and does not scale up to infinitely many types.
Solution 2: do type inference with ``exp``. If it succeeds, the statement
is valid - because expressions of any type can be used as statements.
#NEW
==Type inference==
The general scheme is a rule where the conclusion has a type depending in
some way on the premisses and the condition:
```
J#sub1 ... J#subn
--------------------------------- C
Env => exp : typ(J#sub1, ..., J#subn, C)
```
We should then use recursive calls of ``check`` and ``infer`` so that
- everything we need for constructing the type is inferred
- everything else is just checked
Often the type is independent of the premisses (which still have to be checked of course!):
```
Env => exp1 : bool    Env => exp2 : bool
------------------------------------------
Env => exp1 && exp2 : bool

infer Env (exp1 && exp2) =
  check Env => exp1 : bool
  check Env => exp2 : bool
  return bool
```
It can also come from the condition:
```
---------------- var : typ is in Env
Env => var : typ

infer Env var =
  return lookup(var,Env)
```
#NEW
==Type checking overloaded operations==
Arithmetic operations in most languages are **overloaded**.
This means that they apply to many types.
The general rule for ``+ - * /`` is: both operands have the same type as the value,
which must be ``int`` or ``double``.
```
Env => exp1 : typ Env => exp2 : typ
-------------------------------------- typ is int or double
Env => exp1 + exp2 : typ
```
What we do is infer the type of the first operand and check the second.
```
infer Env (exp1 + exp2) =
  typ := infer Env exp1
  check_condition typ == int or typ == double
  check Env => exp2 : typ
  return typ
```
Also the comparison operators are overloaded, but
the return type is of course ``bool``.
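The "infer the first operand, check the second" scheme might look as follows in Java. This is a hedged sketch with invented names (``OverloadTyping``, ``Exp``, ``Var``, ``Lit``, ``Add``) and string-valued types; it assumes, as the rule above does, that there are no implicit conversions between ``int`` and ``double``.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; all class names are assumptions.
class OverloadTyping {
    interface Exp {
        String infer(Map<String, String> ctx);
    }

    static final class Var implements Exp {
        final String name;
        Var(String name) { this.name = name; }
        public String infer(Map<String, String> ctx) {
            String t = ctx.get(name);
            if (t == null) throw new IllegalStateException("variable " + name + " not found");
            return t;
        }
    }

    // A literal that simply carries its own type, e.g. "int" or "double".
    static final class Lit implements Exp {
        final String type;
        Lit(String type) { this.type = type; }
        public String infer(Map<String, String> ctx) { return type; }
    }

    static final class Add implements Exp {
        final Exp left, right;
        Add(Exp left, Exp right) { this.left = left; this.right = right; }
        public String infer(Map<String, String> ctx) {
            String typ = left.infer(ctx);                    // infer the first operand
            if (!typ.equals("int") && !typ.equals("double")) // side condition
                throw new IllegalStateException("+ expects int or double but found " + typ);
            String typ2 = right.infer(ctx);                  // check the second against typ
            if (!typ2.equals(typ))
                throw new IllegalStateException("expected " + typ + " but found " + typ2);
            return typ;
        }
    }

    public static void main(String[] args) {
        Map<String, String> ctx = new HashMap<>();
        ctx.put("x", "double");
        System.out.println(new Add(new Var("x"), new Lit("double")).infer(ctx)); // prints "double"
        System.out.println(new Add(new Lit("int"), new Lit("int")).infer(ctx));  // prints "int"
    }
}
```

With this rule, ``x + 1`` is rejected when ``x : double``, because the second operand is checked against the type inferred for the first.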
#NEW
==Relating inference and checking==
Now we can check expression statements:
```
check Env => exp ; valid =
infer Env exp
```
If ``infer`` fails, we get any error message it generates.
If ``infer`` succeeds, we discard the type.
In the same way, we only need to write ``infer`` for expressions.
Then we define ``check`` uniformly,
```
check Env => exp : typ =
  typ2 := infer Env exp
  check_condition typ2 == typ
```
The ``check_condition`` call usually returns a message on failure, e.g.
```
TYPE ERROR
type of exp: expected typ, inferred typ2
```
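The uniform definition of ``check`` in terms of ``infer`` can be illustrated in Java. In this sketch a ``Supplier<String>`` stands in for the call ``infer Env exp``, and the nested ``TypeException`` is a stand-in defined here for the example, not the class from the lab files; ``CheckViaInfer`` and its method names are likewise assumptions.

```java
import java.util.function.Supplier;

// Illustrative sketch only; the names below are assumptions.
class CheckViaInfer {
    // Stand-in exception type carrying the error message.
    static final class TypeException extends RuntimeException {
        TypeException(String message) { super(message); }
    }

    // check is written once, for all expressions: infer, then compare.
    static void check(String expText, Supplier<String> infer, String expected) {
        String inferred = infer.get();
        if (!inferred.equals(expected))
            throw new TypeException("type of " + expText + ": expected " + expected
                    + ", inferred " + inferred);
    }

    // An expression statement is valid if inference succeeds; the type is discarded.
    static void checkExpStatement(Supplier<String> infer) {
        infer.get();
    }

    public static void main(String[] args) {
        check("1 + 2", () -> "int", "int"); // succeeds silently
        checkExpStatement(() -> "int");     // succeeds silently
        try {
            check("1 + 2", () -> "int", "bool");
        } catch (TypeException e) {
            System.out.println("TYPE ERROR");
            System.out.println(e.getMessage()); // type of 1 + 2: expected bool, inferred int
        }
    }
}
```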
#NEW
==The top-level checkers==
To check the whole program,
+ collect the types of each function into the signature
+ check that function names are unique
+ check each function definition using the signature
To check a function definition
+ check that argument variables are unique
+ initialize the topmost context with the argument variables
+ check the body in this context
+ check that there is a ``return``, with an expression
that has the expected return type of the function (or just
a ``return`` if the type is ``void``)
To check a sequence of statements
+ check the validity of the first statement and update the environment
if appropriate
+ check the remaining sequence in the new environment
+ an empty sequence is always valid
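The first steps of the top-level checkers (collecting the signature, and the two uniqueness checks) can be sketched in Java as follows. The names ``SignaturePass``, ``Def``, ``collect`` and ``checkParamsUnique`` are hypothetical, and function types are rendered as strings purely for readability.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; all names are assumptions.
class SignaturePass {
    // A function header: return type, name, and parameter types and names in order.
    static final class Def {
        final String returnType, name;
        final List<String> paramTypes, paramNames;
        Def(String returnType, String name, List<String> paramTypes, List<String> paramNames) {
            this.returnType = returnType;
            this.name = name;
            this.paramTypes = paramTypes;
            this.paramNames = paramNames;
        }
    }

    // Pass 1: record f : (A1,...,An) -> T for every definition; names must be unique.
    static Map<String, String> collect(List<Def> defs) {
        Map<String, String> sig = new LinkedHashMap<>();
        for (Def d : defs) {
            if (sig.containsKey(d.name))
                throw new IllegalStateException("function " + d.name + " defined twice");
            sig.put(d.name, "(" + String.join(",", d.paramTypes) + ") -> " + d.returnType);
        }
        return sig;
    }

    // First step of checking a definition: argument variables must be distinct.
    static void checkParamsUnique(Def d) {
        Set<String> seen = new HashSet<>();
        for (String x : d.paramNames)
            if (!seen.add(x)) // add returns false if x was already present
                throw new IllegalStateException("parameter " + x + " repeated in " + d.name);
    }

    public static void main(String[] args) {
        Def between = new Def("bool", "between",
                Arrays.asList("int", "double", "double"),
                Arrays.asList("x", "a", "b"));
        Map<String, String> sig = collect(Arrays.asList(between));
        checkParamsUnique(between);
        System.out.println("between : " + sig.get("between"));
        // prints "between : (int,double,double) -> bool"
    }
}
```

Because the signature is complete before any body is checked, a function body can call functions that are defined later in the file.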
#NEW
==Type checker in Haskell==
You can copy the contents of
[``laborations/lab2/haskell/`` ../laborations/lab2/haskell]:
```
CPP.cf -- grammar
lab2.hs -- main module
Makefile
TypeChecker.hs -- type checking module
```
You only have to modify ``CPP.cf`` and ``TypeChecker.hs``.
But you can already compile them: just type
```
make
```
and run the type checker with
```
./lab2
```
The rest is "debugging the empty file"!
#NEW
===The Main module===
You don't have to write this - just copy the file
[``laborations/lab2/haskell/lab2.hs`` ../laborations/lab2/haskell/lab2.hs].
This file shows how compiler phases are linked together.
```
check :: String -> IO ()
check s = case pProgram (myLexer s) of
  Bad err  -> do putStrLn "SYNTAX ERROR"
                 putStrLn err
                 exitFailure
  Ok  tree -> case typecheck tree of
                Bad err -> do putStrLn "TYPE ERROR"
                              putStrLn err
                              exitFailure
                Ok _    -> putStrLn "OK"
```
In other words: call the parser; if it succeeds, call the type checker.
Notice the use of the **error type**,
```
data Err a = Ok a | Bad String
```
The value is either ``Ok`` of the expected type or ``Bad``
with an error message.
The ``Err`` type is generated by BNFC. One could also use Haskell's standard
type ``Either String a``.
#NEW
===Using the Err type===
The ``Err`` type
```
data Err a = Ok a | Bad String
```
is a **monad** - a type of actions returning ``a`` but also doing
other things (in this case: exceptions).
Monad actions can be **sequence**d: if
```
inferExp :: Env -> Exp -> Err Type
```
then you can make several inferences one after the other by using ``do``
```
do inferExp env exp1
   inferExp env exp2
```
You can **bind** variables returned from actions, and **return**
values.
```
do typ1 <- inferExp env exp1
   typ2 <- inferExp env exp2
   return TBool
```
If you are only interested in side effects, use the dummy value type
``()`` (corresponds to ``void`` in C and Java).
#NEW
==Symbol tables==
Environment type
```
type Env = (Sig,[Context]) -- signature and stack of contexts
type Sig = [(Id,([Type],Type))] -- or Map Id ([Type],Type)
type Context = [(Id,Type)] -- or Map Id Type
```
Auxiliary operations on the environment
```
lookVar :: Env -> Id -> Err Type
lookFun :: Env -> Id -> Err ([Type],Type)
updateVar :: Env -> Id -> Type -> Err Env
updateFun :: Env -> Id -> ([Type],Type) -> Err Env
newBlock :: Env -> Err Env
emptyEnv :: Env
```
Keep the datatypes abstract, i.e. use them only via these operations.
Then you can switch to another implementation if needed (more efficient,
more stuff in the environment).
#NEW
===The TypeChecker module===
The environment datatypes and operations.
Type signatures of the checking methods
```
typecheck :: Program -> Err () -- required function in lab2
checkDef :: Env -> Def -> Err () -- check a function definition
checkStms :: Env -> Type -> [Stm] -> Err ()
checkStm :: Env -> Type -> Stm -> Err Env
checkExp :: Env -> Type -> Exp -> Err ()
inferExp :: Env -> Exp -> Err Type
```
Some other auxiliaries.
```
checkUnique :: (Ord a, Print a) => [a] -> Err ()
checkCondition :: Bool -> Err ()
```
#NEW
===Some examples of checking===
```
checkStm :: Env -> Type -> Stm -> Err Env
checkStm env val x = case x of
  SExp exp -> do
    inferExp env exp
    return env
  SDecl type' id ->
    updateVar env id type'  -- should also check that id is not in the topmost context already
  SWhile exp stm -> do
    checkExp env Type_bool exp
    checkStm env val stm

checkExp :: Env -> Type -> Exp -> Err ()
checkExp env typ exp = do
  typ2 <- inferExp env exp
  if (typ2 == typ) then
      return ()
    else
      fail $ "type of " ++ printTree exp -- ...
```
#NEW
===Some examples of type inference===
```
inferExp :: Env -> Exp -> Err Type
inferExp env x = case x of
  ETrue           -> return Type_bool
  EInt n          -> return Type_int
  EId id          -> lookVar env id
  EPIncr exp      -> inferNumeric env exp
  ETimes exp0 exp -> inferNumericBin env exp0 exp

inferNumeric :: Env -> Exp -> Err Type
inferNumeric env exp = do
  typ <- inferExp env exp
  if elem typ [Type_int, Type_double] then
      return typ
    else
      fail $ "type of expression " ++ printTree exp -- ...

inferNumericBin :: Env -> Exp -> Exp -> Err Type
```
#NEW
==Type checker in Java==
You can copy the contents of
[``laborations/lab2/java/`` ../laborations/lab2/java1.5]:
```
CPP.cf -- grammar
lab2 -- script running the type checker
lab2.java -- main program
Makefile
TypeChecker.java -- type checker class
TypeException.java -- exceptions for type checking
```
You only have to modify ``CPP.cf`` and ``TypeChecker.java``.
But you can already compile them: just type
```
make
```
and run the type checker with
```
./lab2
```
The rest is "debugging the empty file"!
Before ``make``, you may have to set your class path so that it finds
java_cup and JLex, as well as the current directory.
```
export CLASSPATH=.:::$CLASSPATH
```
#NEW
===The Main module===
This is given in
[``laborations/lab2/java/lab2.java`` ../laborations/lab2/java1.5/lab2.java],
hence you don't have to write this.
It shows how compiler phases are linked together.
```
Yylex l = null;  // declared before the try so that the catch block can report the position
try {
  l = new Yylex(new FileReader(args[0]));
  parser p = new parser(l);
  CPP.Absyn.Program parse_tree = p.pProgram();
  new TypeChecker().typecheck(parse_tree);
} catch (TypeException e) {
  System.out.println("TYPE ERROR");
  System.err.println(e.toString());
  System.exit(1);
} catch (IOException e) {
  System.err.println(e.toString());
  System.exit(1);
} catch (Throwable e) {
  System.out.println("SYNTAX ERROR");
  System.out.println("At line " + String.valueOf(l.line_num())
                     + ", near \"" + l.buff() + "\" :");
  System.out.println("     " + e.getMessage());
  System.exit(1);
}
```
#NEW
==Symbol tables==
Environment types
```
public static class FunType {
public LinkedList args ;
public Type val ;
}
public static class Env {
public Map signature ;
public LinkedList