Book: 6.3, 6.5
To define what the program should do.
To guarantee that the program is meaningful.
To document the programmer's intentions.
To optimize the use of hardware.
Depending on language, the type checker can prevent
Languages differ greatly in how strict their static semantics is: none of the things above is checked by all programming languages!
In general, the more static checking the compiler does, the less manual debugging is needed.
These formats are independent of implementation language.
Typing rules concern judgements of the form
E => e : T
where E is an environment, which contains e.g. typings of identifiers. The judgement says that the expression e has type T in the environment E.
Judgements are used in typing rules of the form
J1 J2 ... Jn
------------ C
     J
(n >= 0), which says that J can be derived if J1, ..., Jn can be derived and the condition C holds.
The judgements above the line in a rule are the premisses.
The judgement under the line is the conclusion.
The condition C beside the line is a side condition, typically not expressible as a judgement and therefore not a premiss.
Judgements are written in a formal language, whereas side conditions can be written in natural language.
Typing rules for arithmetic expressions
E => e1 : int    E => e2 : int        E => e1 : int    E => e2 : int
------------------------------        ------------------------------
      E => e1 + e2 : int                    E => e1 * e2 : int

---------- x : T is in E              ------------ i is an integer literal
E => x : T                            E => i : int
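To see how the four rules above become code, here is a small Java inference function. Everything in it is hypothetical and made up for this illustration: the mini AST (Var, IntLit, Add, Mul), the enum Ty and the class ArithChecker. The environment E is assumed to be a plain map from identifiers to types. Each branch corresponds to one rule: premisses become recursive calls and side conditions become tests. The sketch needs Java 16+ for records and instanceof patterns.

```java
import java.util.Map;

// Hypothetical mini AST for the arithmetic fragment.
enum Ty { INT, DOUBLE, BOOL }

interface Exp {}
record Var(String name) implements Exp {}
record IntLit(int value) implements Exp {}
record Add(Exp left, Exp right) implements Exp {}
record Mul(Exp left, Exp right) implements Exp {}

class ArithChecker {
    // Returns T such that E => e : T, or throws if no rule applies.
    static Ty infer(Map<String, Ty> env, Exp e) {
        if (e instanceof IntLit) {
            return Ty.INT;                              // i is an integer literal
        } else if (e instanceof Var v) {
            Ty t = env.get(v.name());                   // side condition: x : T is in E
            if (t == null) throw new IllegalStateException("unbound variable " + v.name());
            return t;
        } else if (e instanceof Add a) {
            expectInt(env, a.left());                   // premiss E => e1 : int
            expectInt(env, a.right());                  // premiss E => e2 : int
            return Ty.INT;
        } else if (e instanceof Mul m) {
            expectInt(env, m.left());
            expectInt(env, m.right());
            return Ty.INT;
        }
        throw new IllegalArgumentException("unknown expression " + e);
    }

    static void expectInt(Map<String, Ty> env, Exp e) {
        if (infer(env, e) != Ty.INT) throw new IllegalStateException("expected int: " + e);
    }
}
```

Calling infer on x + 12 * y, with x, y : int in the map, returns INT. This follows the same steps as the derivation below.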
Derivation of judgement x : int, y : int => x + 12 * y : int
                           x:int, y:int => 12 : int    x:int, y:int => y : int
                           ---------------------------------------------------
x:int, y:int => x : int               x:int, y:int => 12 * y : int
------------------------------------------------------------------------------
                     x:int, y:int => x + 12 * y : int
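We generalize the type checking context to an environment with two parts: the signature, which gives the types of functions, and the context, which gives the types of variables.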
In the course of type checking, the signature remains the same throughout a program module, whereas the context changes all the time.
No expression in the language of Lab 2 has function types, because functions are never returned as values or used as arguments.
However, the compiler internally needs a data structure for function types, to hold the types of the parameters and the return type. E.g. for a function
bool between (int x, double a, double b) {...}
we write
between : (int, double, double) -> bool
to express this internal representation in typing rules.
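Here is a minimal Java sketch of this internal representation. It assumes a hypothetical enum Ty of base types. The record FunType holds the parameter types and the return type, and its toString prints them in the notation of the typing rules. The name FunType also appears in the Java lab files further on, but this definition is only an illustration and not necessarily the same.

```java
import java.util.List;

// Hypothetical base types of the source language.
enum Ty { INT, DOUBLE, BOOL, VOID }

// Plays the role of (A1, ..., An) -> T: argument types plus value type.
record FunType(List<Ty> args, Ty val) {
    @Override
    public String toString() {
        // Render in the notation of the typing rules, e.g. (int, double) -> bool
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < args.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(args.get(i).name().toLowerCase());
        }
        return sb.append(") -> ").append(val.name().toLowerCase()).toString();
    }
}
```

For between, new FunType(List.of(Ty.INT, Ty.DOUBLE, Ty.DOUBLE), Ty.BOOL) prints as (int, double, double) -> bool.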
Dividing the environment into the signature F and the context G, we write judgements in the form
F,G => J
Example: typing rule for a variable expression
------------ x : T is in G
F,G => x : T
Example: typing rule for one-place function application
  F,G => e : A
--------------- f : (A) -> T is in F
F,G => f(e) : T
Expressions have types, but statements do not.
However, statements are also checked in type checking.
We need a new judgement form, saying that a statement S is valid:
F,G => S valid
Example: typing rule for an assignment
    F,G => e : T
-------------------- x : T is in G
F,G => x = e ; valid
Example: typing rule for while loops
F,G => e : bool    F,G => S valid
---------------------------------
    F,G => while (e) S valid
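The judgement F,G => S valid can be read as a Java method that throws when S is not valid. Here is a sketch for the two rules above. Everything in it is hypothetical: the AST (Assign, While and three expression forms), the enum Ty and the class StmChecker. The signature F is omitted because these two rules do not consult it, and G is a map from variables to types.

```java
import java.util.Map;

// Hypothetical mini AST.
enum Ty { INT, BOOL }

interface Exp {}
record Var(String name) implements Exp {}
record IntLit(int value) implements Exp {}
record BoolLit(boolean value) implements Exp {}

interface Stm {}
record Assign(String x, Exp e) implements Stm {}
record While(Exp cond, Stm body) implements Stm {}

class StmChecker {
    static Ty infer(Map<String, Ty> g, Exp e) {
        if (e instanceof IntLit) return Ty.INT;
        if (e instanceof BoolLit) return Ty.BOOL;
        if (e instanceof Var v && g.containsKey(v.name())) return g.get(v.name());
        throw new IllegalStateException("cannot type " + e);
    }

    // Returns normally if G => S valid can be derived, throws otherwise.
    static void checkValid(Map<String, Ty> g, Stm s) {
        if (s instanceof Assign a) {
            Ty t = g.get(a.x());                         // side condition: x : T is in G
            if (t == null) throw new IllegalStateException("undeclared " + a.x());
            if (infer(g, a.e()) != t)                    // premiss: G => e : T
                throw new IllegalStateException("wrong type in assignment to " + a.x());
        } else if (s instanceof While w) {
            if (infer(g, w.cond()) != Ty.BOOL)           // premiss: G => e : bool
                throw new IllegalStateException("condition must be bool");
            checkValid(g, w.body());                     // premiss: G => S valid
        } else {
            throw new IllegalArgumentException("unknown statement " + s);
        }
    }
}
```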
Contexts can be extended with new variables. The notation we use is
(G, x : T)
This corresponds to variable binding constructs, e.g. declarations.
Example: typing rule for a declaration; SS is a sequence of statements:
F,(G, x:t) => SS valid
---------------------- x not in G
F,G => t x ; SS valid
The rule says: if SS is a valid sequence of statements in the context (G, x : t), then t x ; SS is valid in G.
We prove that int x ; x = x + 5 ; is valid in the empty context ().
x : int => x : int    x : int => 5 : int
----------------------------------------
         x : int => x + 5 : int
      ----------------------------
      x : int => x = x + 5 ; valid
   ----------------------------------
    () => int x ; x = x + 5 ; valid
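The declaration rule and the derivation above can be mirrored by a checker for statement sequences. This Java sketch assumes a hypothetical AST with only declarations, assignments, variables, integer literals and +. As in the derivation, the signature is omitted. The context is extended by copying the map, so (G, x : t) is a new context and G itself is left unchanged. The class name SeqChecker is made up for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini AST.
enum Ty { INT, BOOL }

interface Exp {}
record Var(String name) implements Exp {}
record IntLit(int value) implements Exp {}
record Add(Exp left, Exp right) implements Exp {}

interface Stm {}
record Decl(Ty t, String x) implements Stm {}
record Assign(String x, Exp e) implements Stm {}

class SeqChecker {
    static Ty infer(Map<String, Ty> g, Exp e) {
        if (e instanceof IntLit) return Ty.INT;
        if (e instanceof Var v && g.containsKey(v.name())) return g.get(v.name());
        if (e instanceof Add a && infer(g, a.left()) == Ty.INT && infer(g, a.right()) == Ty.INT)
            return Ty.INT;
        throw new IllegalStateException("cannot type " + e);
    }

    // Checks G => SS valid, consuming the sequence from the front.
    static void checkSeq(Map<String, Ty> g, List<Stm> ss) {
        if (ss.isEmpty()) return;                        // the empty sequence is valid
        Stm s = ss.get(0);
        List<Stm> rest = ss.subList(1, ss.size());
        if (s instanceof Decl d) {
            if (g.containsKey(d.x()))                    // side condition: x not in G
                throw new IllegalStateException(d.x() + " already declared");
            Map<String, Ty> extended = new HashMap<>(g);
            extended.put(d.x(), d.t());                  // the context (G, x : t)
            checkSeq(extended, rest);
        } else if (s instanceof Assign a) {
            Ty t = g.get(a.x());
            if (t == null || infer(g, a.e()) != t)
                throw new IllegalStateException("bad assignment to " + a.x());
            checkSeq(g, rest);
        } else {
            throw new IllegalArgumentException("unknown statement " + s);
        }
    }
}
```

Checking [Decl(INT, "x"), Assign("x", x + 5)] in an empty map succeeds, just as the derivation does.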
The signature is omitted for simplicity.
The validity of a function definition.
    F,(G, x1:A1, ..., xn:An) => SS valid
-------------------------------------------- f not in F
F,G => T f (A1 x1, ..., An xn) { SS } valid
More conditions:
x1 ... xn are distinct
A1 ... An, T are types (can be guaranteed by syntax)
the body has return e with e : T (not always checked)
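The function definition rule and its extra conditions can be sketched in Java as follows. All names here are hypothetical. To keep the sketch short, a function body is assumed to consist only of return statements. That is enough to show the three checks: f not in F, distinct parameters, and e : T for every return e. Functions are assumed to be top-level, so G starts out empty.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical mini AST: bodies contain only return statements.
enum Ty { INT, DOUBLE, BOOL }

interface Exp {}
record Var(String name) implements Exp {}
record IntLit(int value) implements Exp {}

record Param(Ty type, String name) {}
record Return(Exp e) {}
record FunDef(Ty ret, String name, List<Param> params, List<Return> body) {}
record FunType(List<Ty> args, Ty val) {}

class DefChecker {
    static Ty infer(Map<String, Ty> g, Exp e) {
        if (e instanceof IntLit) return Ty.INT;
        if (e instanceof Var v && g.containsKey(v.name())) return g.get(v.name());
        throw new IllegalStateException("cannot type " + e);
    }

    // Checks the definition and, if it is valid, adds f : (A1,...,An) -> T to sig.
    static void checkDef(Map<String, FunType> sig, FunDef d) {
        if (sig.containsKey(d.name()))                   // f not in F
            throw new IllegalStateException("duplicate function " + d.name());
        Map<String, Ty> g = new HashMap<>();             // (G, x1:A1, ..., xn:An), G empty
        Set<String> seen = new HashSet<>();
        for (Param p : d.params()) {
            if (!seen.add(p.name()))                     // x1 ... xn are distinct
                throw new IllegalStateException("duplicate parameter " + p.name());
            g.put(p.name(), p.type());
        }
        for (Return r : d.body())                        // return e with e : T
            if (infer(g, r.e()) != d.ret())
                throw new IllegalStateException("wrong return type in " + d.name());
        sig.put(d.name(), new FunType(d.params().stream().map(Param::type).toList(), d.ret()));
    }
}
```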
The typing rule for function applications is the following:
F,G => e1 : A1   ...   F,G => en : An
------------------------------------- f : (A1,...,An) -> T is in F
       F,G => f(e1,...,en) : T
In C, C++, Java, Haskell, etc., variables on the same level must be distinct (e.g. in function parameter lists).
However, variables in an inner block are no longer on the same level and can hence shadow outer variables:
{
  int x = 1 ;
  bool b ;
  x = x + 2 ;        // x : int
  {
    double x = 2.0 ;
    x = x + 1.0 ;    // x : double
    b = true ;       // b : bool
  }
  x = x + 5 ;        // x : int
  b = b && b ;       // b : bool
}
Variables declared in a block are discarded at exit from the block.
There is no limit on the number of block levels.
Implementation 1: with markers
x : int, b : bool, MARK, x : double
Implementation 2: with stacks of contexts (separated by ".", stack top is rightmost)
(x : int, b : bool).(x : double)
Implementation 2 can be done with lists of lists: the top of the stack is the head of the list.
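Implementation 2 can be sketched in Java with a LinkedList of maps whose head is the top of the stack. The class BlockEnv and its method names are hypothetical. They are loosely modelled on the lookVar/updateVar/newBlock operations of the lab skeleton, and types are plain strings to keep the sketch short. Lookup searches from the innermost block outwards, so it finds the closest entry for a variable.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

class BlockEnv {
    // Head of the list = innermost block (top of the stack).
    private final LinkedList<Map<String, String>> contexts = new LinkedList<>();

    BlockEnv() { newBlock(); }                          // start with the outermost block

    void newBlock() { contexts.addFirst(new HashMap<>()); }

    void exitBlock() { contexts.removeFirst(); }        // discard the block's variables

    // Declare x in the topmost context; redeclaring on the same level is an error.
    void updateVar(String x, String type) {
        if (contexts.getFirst().containsKey(x))
            throw new IllegalStateException(x + " already declared in this block");
        contexts.getFirst().put(x, type);
    }

    // Return the closest entry for x, searching from the innermost block outwards.
    String lookVar(String x) {
        for (Map<String, String> ctx : contexts)
            if (ctx.containsKey(x)) return ctx.get(x);
        throw new IllegalStateException("variable " + x + " not found");
    }
}
```

Replaying the block example above: after the inner double x is declared, lookVar("x") gives double. After exitBlock(), it gives int again.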
The rules are expressed as checking if a list of statements SS is valid.
Instead of a single context, we have a stack of contexts Gs.G, where G denotes the topmost context.
Declarations.
Gs.(G, x:t) => SS valid
----------------------- x not in G
Gs.G => t x ; SS valid
Assignments.
Gs => e : t    Gs => SS valid
----------------------------- x : t in Gs
   Gs => x = e ; SS valid
Blocks.
Gs.() => SS valid    Gs => SSS valid
------------------------------------
      Gs => { SS } SSS valid
The typing rule for variable expressions is now
------------- x : T is the closest entry for x in Gs.G
Gs.G => x : T
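Here is a sketch of the declaration, assignment, block and variable rules with a mutable stack of contexts in Java. The AST is hypothetical: its only expressions are variables and literals of a given type, and the class name ScopeChecker is made up. A block pushes an empty context and pops it on exit, so the variables it declares are not visible in the rest of the sequence (SSS). Processing the sequence in a loop and adding each declaration to the top context has the same effect as checking SS in Gs.(G, x:t).

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical mini AST.
enum Ty { INT, DOUBLE, BOOL }

interface Exp {}
record Var(String name) implements Exp {}
record Lit(Ty type) implements Exp {}                   // a literal of the given type

interface Stm {}
record Decl(Ty t, String x) implements Stm {}
record Assign(String x, Exp e) implements Stm {}
record Block(List<Stm> body) implements Stm {}

class ScopeChecker {
    private final LinkedList<Map<String, Ty>> gs = new LinkedList<>(); // head = G

    ScopeChecker() { gs.push(new HashMap<>()); }        // the outermost context

    Ty lookup(String x) {                               // closest entry for x in Gs.G
        for (Map<String, Ty> g : gs)
            if (g.containsKey(x)) return g.get(x);
        throw new IllegalStateException("variable " + x + " not found");
    }

    Ty infer(Exp e) {
        if (e instanceof Lit l) return l.type();
        if (e instanceof Var v) return lookup(v.name());
        throw new IllegalArgumentException("unknown expression " + e);
    }

    void checkStms(List<Stm> ss) {
        for (Stm s : ss) {
            if (s instanceof Decl d) {
                if (gs.peek().containsKey(d.x()))       // x not in G (top context only)
                    throw new IllegalStateException(d.x() + " already declared");
                gs.peek().put(d.x(), d.t());
            } else if (s instanceof Assign a) {
                if (infer(a.e()) != lookup(a.x()))      // x : t in Gs, Gs => e : t
                    throw new IllegalStateException("bad assignment to " + a.x());
            } else if (s instanceof Block b) {
                gs.push(new HashMap<>());               // Gs.() => SS valid
                try { checkStms(b.body()); } finally { gs.pop(); }
            } else {
                throw new IllegalArgumentException("unknown statement " + s);
            }
        }
    }
}
```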
Type checking: given a judgement G => e : T, find out whether it can be derived by the typing rules. The derivation is a tree of rule applications with the judgement as the last line.
Type inference: given an expression e, find a type T in context G such that G => e : T can be derived by the typing rules.
For Java, C, and C++, we can mostly make do with just type checking, because types are marked explicitly.
Haskell has type inference as well: if you don't give the type, the compiler can usually find the most general type.
We can classify checkers in terms of what they return:
1. True or False; such a checker may even crash (for instance, when variable lookup just gives an error if the variable is not found).
2. OK or a message saying where the error is.
To build a compiler back end, we need the third.
In Lab 2, we build the second.
Pass 1: for each function definition f, update the signature with f : T, where T is the type of f.
Pass 2: for each function definition f, check the function body of f with respect to the type T.
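The two passes let a function call another one that is defined later in the file, and they make mutual recursion work. Here is a minimal Java sketch with a hypothetical AST: functions have no parameters, so the signature only records return types, and a body is just a list of calls. The class TwoPass is made up for this illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini AST: parameterless functions whose bodies are calls.
enum Ty { INT, BOOL, VOID }

record Call(String f) {}
record FunDef(Ty ret, String name, List<Call> body) {}

class TwoPass {
    static Map<String, Ty> typecheck(List<FunDef> program) {
        Map<String, Ty> sig = new HashMap<>();
        for (FunDef d : program) {                       // pass 1: add f : T to the signature
            if (sig.put(d.name(), d.ret()) != null)
                throw new IllegalStateException("duplicate function " + d.name());
        }
        for (FunDef d : program) {                       // pass 2: check each body
            for (Call c : d.body())
                if (!sig.containsKey(c.f()))
                    throw new IllegalStateException(d.name() + " calls unknown function " + c.f());
        }
        return sig;
    }
}
```

Doing it in a single pass would reject main below. When main is checked, helper is not yet in the signature.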
The expression checker consists of two functions:
check (Exp e, Type t) returns void
infer (Exp e) returns Type
These functions are defined by mutual recursion, by cases on the expression.
We also need to check function definitions and sequences of statements.
check (Def d) returns void
check (Stms ss) returns void
All functions use an environment (= signature and stack of contexts).
The method is syntax-directed translation.
We show syntax-directed translation in pseudocode.
infer x =                    // variable x
  t := lookup(x)
  return t

infer i =                    // integer literal i
  return int

infer f(a1, ..., an) =
  T := lookup(f)
  if T = (A1, ..., An) -> B
    check a1 : A1
    ...
    check an : An
    return B
  else failure
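Here is a direct Java rendering of the pseudocode above. The AST, the Infer class and the lookup helper are hypothetical. The environment is assumed to be split into a signature (functions) and a context (variables). One difference from the pseudocode: the sketch also checks that the number of arguments matches n.

```java
import java.util.List;
import java.util.Map;

// Hypothetical mini AST.
enum Ty { INT, DOUBLE, BOOL }

interface Exp {}
record Var(String name) implements Exp {}
record IntLit(int value) implements Exp {}
record Call(String f, List<Exp> args) implements Exp {}
record FunType(List<Ty> args, Ty val) {}

class Infer {
    final Map<String, FunType> sig;                      // signature
    final Map<String, Ty> ctx;                           // context

    Infer(Map<String, FunType> sig, Map<String, Ty> ctx) { this.sig = sig; this.ctx = ctx; }

    Ty infer(Exp e) {
        if (e instanceof Var v) return lookup(ctx, v.name());       // infer x
        if (e instanceof IntLit) return Ty.INT;                     // infer i
        if (e instanceof Call c) {                                  // infer f(a1, ..., an)
            FunType t = lookup(sig, c.f());
            if (t.args().size() != c.args().size())
                throw new IllegalStateException("wrong number of arguments to " + c.f());
            for (int i = 0; i < c.args().size(); i++)
                check(c.args().get(i), t.args().get(i));            // check ai : Ai
            return t.val();                                         // return B
        }
        throw new IllegalArgumentException("unknown expression " + e);
    }

    void check(Exp e, Ty expected) {
        Ty found = infer(e);
        if (found != expected)
            throw new IllegalStateException("expected " + expected + " but found " + found);
    }

    static <V> V lookup(Map<String, V> m, String key) {
        V v = m.get(key);
        if (v == null) throw new IllegalStateException(key + " not found");
        return v;
    }
}
```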
Basic idea: from the rule
J1 ... Jn
--------- C
    J
generate the code "upside down":
check J =
  check J1
  ...
  check Jn
  check_condition C
Example: the rule
Env => exp1 : bool    Env => exp2 : bool
----------------------------------------
       Env => exp1 && exp2 : bool
is translated to
check Env => exp1 && exp2 : bool =
  check Env => exp1 : bool
  check Env => exp2 : bool
Judgements are easy: recursive calls to check.
The rule
Env => exp : bool    Env => stm valid
-------------------------------------
    Env => while (exp) stm valid
is translated to
check Env => while (exp) stm valid =
  check Env => exp : bool
  check Env => stm valid
Side conditions are unlimited code, so you have to think harder.
The rule
---------------- var : typ is in Env
Env => var : typ
is translated to
check Env => var : typ =
  check_condition lookup(var, Env) == typ
It is lookup and such conditions that in the end generate the error messages.
lookup(var, Env) =
  message ("variable " var " not found")     // if var is not in Env

check_condition x == y =
  message ("expected " y " but found " x)    // if not equal
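In Java, a natural way to produce these messages is to throw an exception from lookup and check_condition. This sketch assumes a hypothetical unchecked TypeException (unchecked so that it can escape visitor methods) and represents types as strings. The class Conditions and its method names are made up for this illustration.

```java
import java.util.Map;

// Unchecked, so it can be thrown from methods that declare no exceptions.
class TypeException extends RuntimeException {
    TypeException(String msg) { super(msg); }
}

class Conditions {
    static String lookup(String var, Map<String, String> env) {
        String typ = env.get(var);
        if (typ == null) throw new TypeException("variable " + var + " not found");
        return typ;
    }

    static void checkCondition(String found, String expected) {
        if (!found.equals(expected))
            throw new TypeException("expected " + expected + " but found " + found);
    }

    // check Env => var : typ, generated from the variable rule
    static void checkVar(Map<String, String> env, String var, String typ) {
        checkCondition(lookup(var, env), typ);
    }
}
```

With env = {x : int}, checkVar(env, "x", "bool") fails with the message "expected bool but found int".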
There is a grammar rule saying that expressions can be used as statements:
Stm ::= Exp ";"
How do we check that such statements are valid?
  Env => exp : ?
------------------
Env => exp ; valid
The problem is that we have no type typ to check exp : typ.
Solution 1: check exp with each of the four types:
check Env => exp ; valid =
  try each typ in [bool, double, int, void]:
    check exp : typ
This is inefficient, and does not scale up to infinitely many types.
Solution 2: do type inference on exp. If it succeeds, the statement is valid, because expressions of any type can be used as statements.
The general scheme is a rule where the type in the conclusion depends in some way on the premisses and the condition:
            J1 ... Jn
-------------------------------- C
Env => exp : typ(J1, ..., Jn, C)
We should then use recursive calls of check
and infer
so that
Often the type is independent of the premisses (which still have to be checked of course!):
The rule
Env => exp1 : bool    Env => exp2 : bool
----------------------------------------
       Env => exp1 && exp2 : bool
is translated to
infer Env (exp1 && exp2) =
  check Env => exp1 : bool
  check Env => exp2 : bool
  return bool
It can also come from the condition:
The rule
---------------- var : typ is in Env
Env => var : typ
is translated to
infer Env var =
  return lookup(var, Env)
Arithmetic operations in most languages are overloaded.
This means that they apply to many types.
The general rule for + - * / is: both operands have the same type as the value, which must be int or double.
Env => exp1 : typ    Env => exp2 : typ
-------------------------------------- typ is int or double
       Env => exp1 + exp2 : typ
What we do is infer the type of the first operand and check the second.
infer Env (exp1 + exp2) =
  typ := infer Env exp1
  check_condition typ == int or typ == double
  check Env => exp2 : typ
  return typ
Also the comparison operators are overloaded, but the return type is of course bool.
Now we can check expression statements:
check Env => exp ; valid =
  infer Env exp
If infer fails, we get any error message it generates.
If infer succeeds, we discard the type.
In the same way, we only need to write infer for expressions.
Then we define check uniformly:
check Env => exp : typ =
  typ2 := infer Env exp
  check_condition typ2 == typ
The check_condition call usually returns a message at failure, e.g.
TYPE ERROR type of exp: expected typ, inferred typ2
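Here is a Java sketch of both definitions: check written uniformly in terms of infer, and expression statements checked by inferring a type and discarding it. The AST (with a show() method standing in for a pretty-printer), TypeException and the class Uniform are hypothetical. The environment is omitted because these two literals do not need it.

```java
// Hypothetical mini AST with a small pretty-printer.
enum Ty { INT, DOUBLE, BOOL, VOID }

class TypeException extends RuntimeException {
    TypeException(String msg) { super(msg); }
}

interface Exp { String show(); }
record IntLit(int value) implements Exp { public String show() { return Integer.toString(value); } }
record BoolLit(boolean value) implements Exp { public String show() { return Boolean.toString(value); } }

class Uniform {
    static Ty infer(Exp e) {
        if (e instanceof IntLit) return Ty.INT;
        if (e instanceof BoolLit) return Ty.BOOL;
        throw new TypeException("cannot infer type of " + e.show());
    }

    // check Env => exp : typ  =  typ2 := infer exp; check_condition typ2 == typ
    static void check(Exp e, Ty typ) {
        Ty typ2 = infer(e);
        if (typ2 != typ)
            throw new TypeException("type of " + e.show() + ": expected " + typ + ", inferred " + typ2);
    }

    // check Env => exp ; valid  =  infer exp, discarding the type
    static void checkExpStm(Exp e) {
        infer(e);
    }
}
```

check(new IntLit(3), Ty.BOOL) fails with the message "type of 3: expected BOOL, inferred INT".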
To check the whole program,
To check a function definition
return, with an expression that has the expected return type of the function (or just return if the type is void)
To check a sequence of statements
You can copy the contents of laborations/lab2/haskell/:
CPP.cf          -- grammar
lab2.hs         -- main module
Makefile
TypeChecker.hs  -- type checking module
You only have to modify CPP.cf and TypeChecker.hs.
But you can already compile them: just type
make
and run the type checker with
./lab2 <File>
The rest is "debugging the empty file"!
You don't have to write this: just copy the file laborations/lab2/haskell/lab2.hs.
This file shows how compiler phases are linked together.
check :: String -> IO ()
check s = case pProgram (myLexer s) of
  Bad err -> do
    putStrLn "SYNTAX ERROR"
    putStrLn err
    exitFailure
  Ok tree -> case typecheck tree of
    Bad err -> do
      putStrLn "TYPE ERROR"
      putStrLn err
      exitFailure
    Ok _ -> putStrLn "OK"
In other words: call the parser; if it succeeds, call the type checker.
Notice the use of the error type,
data Err a = Ok a | Bad String
The value is either Ok of the expected type or Bad with an error message.
The Err type is generated by BNFC. One could also use Haskell's standard
type Either String a.
The Err type
data Err a = Ok a | Bad String
is a monad: a type of actions that return a value of type a but also do other things (in this case: exceptions).
Monad actions can be sequenced: if
inferExp :: Env -> Exp -> Err Type
then you can make several inferences one after the other by using do:
do
  inferExp env exp1
  inferExp env exp2
You can bind variables returned from actions, and return values.
do
  typ1 <- inferExp env exp1
  typ2 <- inferExp env exp2
  return TBool
If you are only interested in side effects, use the dummy value type () (which corresponds to void in C and Java).
Environment type
type Env = (Sig, [Context])        -- signature and stack of contexts
type Sig = [(Id, ([Type], Type))]  -- or Map Id ([Type], Type)
type Context = [(Id, Type)]        -- or Map Id Type
Auxiliary operations on the environment
lookVar   :: Env -> Id -> Err Type
lookFun   :: Env -> Id -> Err ([Type], Type)
updateVar :: Env -> Id -> Type -> Err Env
updateFun :: Env -> Id -> ([Type], Type) -> Err Env
newBlock  :: Env -> Err Env
emptyEnv  :: Env
Keep the datatypes abstract, i.e. use them only via these operations. Then you can switch to another implementation if needed (more efficient, more stuff in the environment).
The environment datatypes and operations.
Type signatures of the checking methods
typecheck :: Program -> Err ()         -- required function in lab2
checkDef  :: Env -> Def -> Err ()      -- check a function definition
checkStms :: Env -> Type -> [Stm] -> Err ()
checkStm  :: Env -> Type -> Stm -> Err Env
checkExp  :: Env -> Type -> Exp -> Err ()
inferExp  :: Env -> Exp -> Err Type
Some other auxiliaries.
checkUnique    :: (Ord a, Print a) => [a] -> Err ()
checkCondition :: Bool -> Err ()
checkStm :: Env -> Type -> Stm -> Err Env
checkStm env val x = case x of
  SExp exp -> do
    inferExp env exp
    return env
  SDecl type' id ->
    updateVar env id type'   -- also check that id is not in the context already
  SWhile exp stm -> do
    checkExp env Type_bool exp
    checkStm env val stm
    return env

checkExp :: Env -> Type -> Exp -> Err ()
checkExp env typ exp = do
  typ2 <- inferExp env exp
  if typ2 == typ
    then return ()
    else fail $ "type of " ++ printTree exp  -- ...
inferExp :: Env -> Exp -> Err Type
inferExp env x = case x of
  ETrue           -> return Type_bool
  EInt n          -> return Type_int
  EId id          -> lookVar env id
  EPIncr exp      -> inferNumeric env exp
  ETimes exp0 exp -> inferNumericBin env exp0 exp

inferNumeric :: Env -> Exp -> Err Type
inferNumeric env exp = do
  typ <- inferExp env exp
  if elem typ [Type_int, Type_double]
    then return typ
    else fail $ "type of expression " ++ printTree exp  -- ...

inferNumericBin :: Env -> Exp -> Exp -> Err Type
You can copy the contents of laborations/lab2/java/:
CPP.cf              -- grammar
lab2                -- script running the type checker
lab2.java           -- main program
Makefile
TypeChecker.java    -- type checker class
TypeException.java  -- exceptions for type checking
You only have to modify CPP.cf and TypeChecker.java.
But you can already compile them: just type
make
and run the type checker with
./lab2 <File>
The rest is "debugging the empty file"!
Before make, you may have to set your class path so that it finds java_cup and JLex, as well as the current directory:
export CLASSPATH=.:<path-to-JLex>:<path-to-CUP>:$CLASSPATH
This is given in laborations/lab2/java/lab2.java, hence you don't have to write this.
It shows how compiler phases are linked together.
Yylex l = null;  // declared outside try so that the handlers can report the position
try {
  l = new Yylex(new FileReader(args[0]));
  parser p = new parser(l);
  CPP.Absyn.Program parse_tree = p.pProgram();
  new TypeChecker().typecheck(parse_tree);
} catch (TypeException e) {
  System.out.println("TYPE ERROR");
  System.err.println(e.toString());
  System.exit(1);
} catch (IOException e) {
  System.err.println(e.toString());
  System.exit(1);
} catch (Throwable e) {
  System.out.println("SYNTAX ERROR");
  System.out.println("At line " + String.valueOf(l.line_num())
                     + ", near \"" + l.buff() + "\" :");
  System.out.println("     " + e.getMessage());
  System.exit(1);
}
Environment types
public static class FunType {
  public LinkedList<Type> args;
  public Type val;
}

public static class Env {
  public Map<String,FunType> signature;
  public LinkedList<Map<String,Type>> contexts;  // stack of contexts

  // instance methods, since they use the fields above
  public Type lookVar(String id) { ... }
  public FunType lookFun(String id) { ... }
  public void updateVar(String id, Type ty) { ... }
  // ...
}
The environment datatypes and operations.
An enumeration of codes for types.
public static enum TypeCode { CodeInt, CodeDouble, CodeBool, CodeVoid }
Notice that TypeCode is not the same as the class Type, which is the syntactic category of source-language types.
We need TypeCode to be able to compare types for equality, and this happens when we compare an expected type with an inferred type.
Type signatures of the checking methods
public void typecheck(Program p) {
}

public static class CheckStm implements Stm.Visitor<Env,Env> {
  public Env visit(SDecl p, Env env) {
  }
  public Env visit(SExp p, Env env) {
  }
  // ...
}

public static class InferExp implements Exp.Visitor<Type,Env> {
  public Type visit(EInt p, Env env) {
  }
  public Type visit(EAdd p, Env env) {
  }
  // ...
}
public static class CheckStm implements Stm.Visitor<Env,Env> {
  public Env visit(SDecl p, Env env) {
    env.updateVar(p.id_, p.type_);
    return env;
  }
  // ...
}
public static class InferExpType implements Exp.Visitor<Type,Env> {
  public Type visit(demo.Absyn.EPlus p, Env env) {
    Type t1 = p.exp_1.accept(this, env);
    Type t2 = p.exp_2.accept(this, env);
    if (typeCode(t1) == TypeCode.CodeInt && typeCode(t2) == TypeCode.CodeInt)
      return TInt;
    else if (typeCode(t1) == TypeCode.CodeDouble && typeCode(t2) == TypeCode.CodeDouble)
      return TDouble;
    else
      throw new TypeException("Operands to + must be int or double.");
  }
  // ...
}
The function typeCode converts source-language types to their codes:
public static TypeCode typeCode (Type ty) ...
It can be implemented by using a visitor or the instanceof operator.
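Here is one possible implementation using instanceof. It assumes that the syntax classes look like the hypothetical stand-ins below. The real class names are generated by BNFC from the grammar in CPP.cf and may differ. With these stand-ins, two separately constructed Type_int objects are not == to each other, while their codes are. This illustrates why types are compared via their codes.

```java
enum TypeCode { CodeInt, CodeDouble, CodeBool, CodeVoid }

// Hypothetical stand-ins for the generated syntax classes.
abstract class Type {}
class Type_int extends Type {}
class Type_double extends Type {}
class Type_bool extends Type {}
class Type_void extends Type {}

class TypeCodes {
    static TypeCode typeCode(Type ty) {
        if (ty instanceof Type_int) return TypeCode.CodeInt;
        if (ty instanceof Type_double) return TypeCode.CodeDouble;
        if (ty instanceof Type_bool) return TypeCode.CodeBool;
        if (ty instanceof Type_void) return TypeCode.CodeVoid;
        throw new IllegalArgumentException("unknown type " + ty);
    }
}
```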
You don't need to debug completely empty files: laborations/mini
We will read through the Lab PM
Preparation and exercise: write typing rules for Lab PM constructs.