Lecture 7: Type Checking
Programming Languages Course
Aarne Ranta (aarne@chalmers.se)

%!target:html
%!postproc(html): #NEW
%!postproc(html): #HR
%!postproc(html): #sub1 1
%!postproc(html): #subn n

Book: 6.3, 6.5

#NEW

==The purpose of types==

To define what the program should do.
- e.g. read an array of integers and return a double


To guarantee that the program is meaningful.
- that it does not add a string to an integer
- that variables are declared before they are used


To document the programmer's intentions.
- better than comments, which are not checked by the compiler


To optimize the use of hardware.
- reserve the minimal amount of memory, but not more
- use the most appropriate machine instructions


#NEW

==What belongs to type checking==

Depending on language, the type checker can prevent
- application of a function to the wrong number of arguments,
- application of integer functions to floats,
- use of undeclared variables in expressions,
- functions that do not return values,
- division by zero,
- array indices out of bounds,
- nonterminating recursion,
- sorting algorithms that don't sort...


Languages differ greatly in how strict their static semantics is:
none of the things above is checked by all programming languages!

In general, the more static checking the compiler does,
the less need there is for manual debugging.

#NEW

==Description formats for different compiler phases==

These formats are independent of implementation language.
- Lexer: regular expressions
- Parser: BNF grammars
- **Type checker: typing rules**
- Interpreter: operational semantic rules
- Code generator: compilation schemes


#NEW

==Typing judgements and rules==

Typing rules concern **judgements** of the form
```
E => e : T
```
where //E// is an **environment**, which contains e.g. typings of identifiers.

The judgement says
- in the environment //E//, expression //e// has type //T//


Judgements are used in **typing rules** of the form
```
J1  J2  ...  Jn
---------------  C
       J
```
(n >= 0) which says
- from the judgements //J1, J2, ..., Jn// you may conclude //J//, if condition //C// holds.


The judgements above the line in a rule are the **premisses**.
The judgement under the line is the **conclusion**.

The condition ``C`` beside the line is a **side condition**, typically not expressible
as a judgement, and therefore not a premiss.

The judgements are written in a formal language, whereas side conditions
can be written in natural language.

#NEW

==Examples of typing rules and derivation==

Typing rules for arithmetic expressions
```
E => e1 : int   E => e2 : int      E => e1 : int   E => e2 : int
-------------------------------    -------------------------------
      E => e1 + e2 : int                 E => e1 * e2 : int

----------  x : T is in E          ------------  i is an integer literal
E => x : T                         E => i : int
```
Derivation of judgement ``x : int, y : int => x + 12 * y : int``
```
                          x:int, y:int => 12 : int   x:int, y:int => y : int
                          ---------------------------------------------------
x:int, y:int => x : int   x:int, y:int => 12 * y : int
-----------------------------------------------------------
x:int, y:int => x + 12 * y : int
```

#NEW

==Signature vs. context==

We generalize the type checking context to an **environment** with two parts:
- **signature**, which shows the types of functions
- **context**, which shows the types of variables.


In the course of type checking, the signature remains the same throughout a
program module, whereas the context changes all the time.

#NEW

==Function types==

No expression in the language of Lab 2 has **function types**,
because functions are never returned as values or used as arguments.

However, the compiler internally needs a data structure for function types,
to hold the types of the parameters and the return type.

E.g. for a function
```
bool between (int x, double a, double b) {...}
```
we write
```
between : (int, double, double) -> bool
```
to express this internal representation in typing rules.

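
As an illustration, such an internal representation could be sketched in Haskell as follows. The names ``Type``, ``FunType``, ``betweenType`` and ``showFunType`` are assumptions made for this sketch, not part of the Lab 2 code.

```haskell
import Data.List (intercalate)

-- Hypothetical representation of the four basic types (illustrative names).
data Type = TInt | TDouble | TBool | TVoid
  deriving (Eq, Show)

-- A function type: the parameter types and the return type.
data FunType = FunType [Type] Type
  deriving (Eq, Show)

-- between : (int, double, double) -> bool
betweenType :: FunType
betweenType = FunType [TInt, TDouble, TDouble] TBool

-- Print a basic type as it is written in the source language.
showType :: Type -> String
showType TInt    = "int"
showType TDouble = "double"
showType TBool   = "bool"
showType TVoid   = "void"

-- Print a function type in the notation used in the typing rules.
showFunType :: FunType -> String
showFunType (FunType args ret) =
  "(" ++ intercalate ", " (map showType args) ++ ") -> " ++ showType ret
```

With these definitions, ``showFunType betweenType`` gives the string ``(int, double, double) -> bool``.
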
#NEW

==Notation for signature and context==

Dividing the environment into signature ``F`` and context ``G``, judgements take the form
```
F,G => J
```
Example: typing rule for a variable expression
```
-------------  x : T is in G
F,G => x : T
```
Example: typing rule for one-place function application
```
  F,G => e : A
---------------  f : (A) -> T is in F
F,G => f(e) : T
```

#NEW

==The validity of statements==

Expressions have types, but statements do not.

However, statements are also checked in type checking.

We need a new judgement form, saying that a statement ``S`` is valid:
```
F,G => S valid
```
Example: typing rule for an assignment
```
    F,G => e : T
--------------------  x : T is in G
F,G => x = e ; valid
```
Example: typing rule for while loops
```
F,G => e : bool   F,G => S valid
--------------------------------
    F,G => while (e) S valid
```

#NEW

==Variable binding and context==

Contexts can be **extended** with new variables. The notation we use is
```
(G, x : T)
```
This corresponds to **variable binding** constructs, e.g. declarations.

Example: typing rule for a declaration; ``SS`` is a sequence of statements
```
 F,(G, x:t) => SS valid
-------------------------  x not in G
  F,G => t x ; SS valid
```
The rule says: if ``SS`` is a valid sequence of statements in context ``(G, x : t)``,
then ``t x ; SS`` is valid in ``G``.

#NEW

==Example of variable binding and context==

We prove that ``int x ; x = x + 5 ;`` is valid in the empty context ``()``.
```
x : int => x : int   x : int => 5 : int
-----------------------------------------
         x : int => x + 5 : int
      ----------------------------
      x : int => x = x + 5 ; valid
    -------------------------------
    () => int x ; x = x + 5 ; valid
```
The signature is omitted for simplicity.

#NEW

==Function definitions and applications==

The validity of a function definition.
```
F,(G, x#sub1:A#sub1,...,x#subn : A#subn) => SS valid
---------------------------------------------------------  f not in F
F,G => T f (A#sub1 x#sub1,...,A#subn x#subn) { SS } valid
```
More conditions:
- that ``x#sub1 ... x#subn`` are distinct
- that ``A#sub1 ... A#subn, T`` are types (can be guaranteed by syntax)
- that there is a ``return e`` with ``e : T`` (not always checked)


The typing rule for function applications is the following
```
F,G => f : (A#sub1,...,A#subn) -> T    F,G => e#sub1 : A#sub1,..., e#subn : A#subn
------------------------------------------------------------------------------
                      F,G => f(e#sub1,...,e#subn) : T
```

#NEW

==Block structure==

In C, C++, Java, Haskell, etc., variables on the same level must be distinct
(e.g. in function parameter lists).

However, variables in an **inner block** are no longer on the same level and can hence
**overshadow** outer variables:
```
{
  int x = 1 ;
  bool b ;
  x = x + 2 ;        // x : int
  {
    double x = 2.0 ;
    x = x + 1.0 ;    // x : double
    b = true ;       // b : bool
  }
  x = x + 5 ;        // x : int
  b = b && b ;       // b : bool
}
```
Variables declared in a block are discarded at exit from the block.

There is no limit on the number of block levels.

#NEW

==Contexts for block structure==

Implementation 1: with markers
```
x : int, b : bool, MARK, x : double
```
- entering a block: add a marker
- leaving a block: delete variables after last marker, and the marker itself
- update: after the last variable
- lookup: the latest occurrence of the variable


Implementation 2: with stacks of contexts (separated by ".", stack top is rightmost)
```
(x : int, b : bool).(x : double)
```
- entering a block: push an empty context on the stack
- leaving a block: pop the topmost context
- update: after the last variable in the topmost context
- lookup: the innermost surrounding occurrence of the variable


Implementation 2 can be done with lists of lists: the top of the stack is
the head of the list.

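
A minimal Haskell sketch of Implementation 2, assuming variables are plain strings and that failure is signalled with ``Maybe``. All names (``Contexts``, ``enterBlock``, ``leaveBlock``, ``declare``, ``lookupVar``) are hypothetical and chosen only for this illustration.

```haskell
import Data.Maybe (listToMaybe, mapMaybe)

-- Hypothetical types for the sketch.
data Type = TInt | TDouble | TBool
  deriving (Eq, Show)

type Context  = [(String, Type)]
type Contexts = [Context]        -- the head of the list is the innermost block

-- Entering a block: push an empty context.
enterBlock :: Contexts -> Contexts
enterBlock gs = [] : gs

-- Leaving a block: pop the topmost context.
leaveBlock :: Contexts -> Contexts
leaveBlock = drop 1

-- Update: add to the topmost context, failing if x is already declared there.
declare :: String -> Type -> Contexts -> Maybe Contexts
declare x t []       = Just [[(x, t)]]
declare x t (g : gs) = case lookup x g of
  Just _  -> Nothing
  Nothing -> Just (((x, t) : g) : gs)

-- Lookup: the innermost surrounding occurrence of the variable.
lookupVar :: String -> Contexts -> Maybe Type
lookupVar x gs = listToMaybe (mapMaybe (lookup x) gs)

-- The block example: x is a double inside the inner block and an int outside.
example :: Maybe (Maybe Type, Maybe Type)
example = do
  outer <- declare "x" TInt [[]] >>= declare "b" TBool
  inner <- declare "x" TDouble (enterBlock outer)
  return (lookupVar "x" inner, lookupVar "x" (leaveBlock inner))
```

Here ``example`` evaluates to ``Just (Just TDouble, Just TInt)``: the inner declaration hides the outer one until the block is left.
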
#NEW

==Typing rules for variables==

The rules are expressed as checking if a **list of statements** ``SS`` is valid.

Instead of a single context, we have a **stack of contexts** ``Gs.G``
where we denote by ``G`` the topmost context.

Declarations.
```
  Gs.(G,x:t) => SS valid
--------------------------  x not in G
  Gs.G => t x ; SS valid
```
Assignments.
```
Gs => e : t    Gs => SS valid
-------------------------------  x : t in Gs
    Gs => x = e ; SS valid
```
Blocks.
```
Gs.() => SS valid    Gs => SSS valid
-------------------------------------
       Gs => { SS } SSS valid
```
The typing rule for variable expressions is now
```
--------------  x : T is the closest entry for x in Gs.G
Gs.G => x : T
```

#NEW

==Type checking and type inference==

**Type checking**: given a judgement ``G => e : T``,
find out whether it can be derived by the typing rules.

The **derivation** is a tree of rule applications with the judgement as the last line.

**Type inference**: given an expression ``e``, find a type ``T`` in context ``G``
such that ``G => e : T`` can be derived by the typing rules.

For Java, C, and C++, we can mostly do with just type checking, because
types are marked explicitly.

Haskell has type inference as well: if you don't give the type,
the compiler can usually find //the most general type//.

#NEW

==Different type checkers==

We can classify checkers in terms of what they return:
- A //rude checker//, which only says ``True`` or ``False``, and may even crash
  (for instance, when variable lookup just gives an ``error`` if the variable is not found).
- An //error-reporting checker//, which returns ``OK`` or a message saying where the error is.
- An //annotating checker//, which returns a syntax tree annotated with more type information.


To build a compiler back end, we need the third. In Lab 2, we build the second.

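
The difference between the three kinds can be sketched on a toy expression language. This is only an illustration under stated assumptions: ``Exp``, ``AExp``, ``annotate``, ``report`` and ``rude`` are hypothetical names, errors use Haskell's standard ``Either String``, and this "rude" variant never crashes.

```haskell
import Data.Either (isRight)

data Type = TInt | TBool
  deriving (Eq, Show)

-- Toy abstract syntax (hypothetical, much smaller than the Lab 2 language).
data Exp = EInt Integer | ETrue | EAdd Exp Exp | EAnd Exp Exp
  deriving Show

-- Annotated syntax tree: every node carries its type.
data AExp = AExp Type Node
  deriving Show
data Node = NInt Integer | NTrue | NAdd AExp AExp | NAnd AExp AExp
  deriving Show

typeOf :: AExp -> Type
typeOf (AExp t _) = t

-- Annotating checker: returns the tree with type information, or a message.
annotate :: Exp -> Either String AExp
annotate (EInt n)   = Right (AExp TInt (NInt n))
annotate ETrue      = Right (AExp TBool NTrue)
annotate (EAdd a b) = binary TInt NAdd a b
annotate (EAnd a b) = binary TBool NAnd a b

-- Both operands must have type t, which is also the type of the result.
binary :: Type -> (AExp -> AExp -> Node) -> Exp -> Exp -> Either String AExp
binary t node a b = do
  a' <- operand t a
  b' <- operand t b
  return (AExp t (node a' b'))

operand :: Type -> Exp -> Either String AExp
operand t e = do
  e' <- annotate e
  if typeOf e' == t
    then return e'
    else Left ("expected " ++ show t ++ " but found "
               ++ show (typeOf e') ++ " in " ++ show e)

-- Error-reporting checker: OK or a message, the tree is discarded.
report :: Exp -> Either String ()
report e = annotate e >> return ()

-- Rude checker: only True or False.
rude :: Exp -> Bool
rude = isRight . annotate
```

The two weaker checkers are derived from the annotating one here only to keep the sketch short; in Lab 2 the error-reporting checker is written directly.
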
#NEW

==The passes of the type checker==

Pass 1:
- start with empty signature
- for each function ``f``, update signature with ``f : T``


Pass 2:
- for each ``f``, check the function body of ``f`` with respect to the type ``T``


The expression checker consists of functions:
```
check (Exp e, Type t) returns void
infer (Exp e) returns Type
```
These functions are defined by mutual recursion, by cases on the expression.

We also need to check function definitions and sequences of statements.
```
check (Def d) returns void
check (Stms ss) returns void
```
All functions use an environment (= signature and stack of contexts).

The method is syntax-directed translation.

#NEW

==Examples of type checking==

We show syntax-directed translation in pseudocode.
```
infer x =                        // variable x
  t := lookup(x)
  return t

infer i =                        // integer literal i
  return int

infer f(a#sub1,..., a#subn) =
  T := lookup(f)
  if T = (A#sub1, ..., A#subn) -> B
    check a#sub1 : A#sub1
    ...
    check a#subn : A#subn
    return B
  else
    failure
```

#NEW

==From typing rules to type checking code==

Basic idea: from rule
```
J#sub1 ... J#subn
-----------------  C
        J
```
generate the code "upside down"
```
check J =
  check J#sub1
  ...
  check J#subn
  check_condition C
```
Example:
```
Env => exp1 : bool   Env => exp2 : bool       check Env => exp1 && exp2 : bool =
-----------------------------------------       check Env => exp1 : bool
       Env => exp1 && exp2 : bool               check Env => exp2 : bool
```

#NEW

==From typing rules to type checking code: more examples==

Judgements are easy: recursive calls to check.
```
Env => exp : bool   Env => stm valid       check Env => while (exp) stm valid =
------------------------------------         check Env => exp : bool
    Env => while (exp) stm valid             check Env => stm valid
```
Side conditions can be arbitrary code, so you have to think harder.
```
----------------  var : typ is in Env      check Env => var : typ =
Env => var : typ                             check_condition lookup(var,Env) == typ
```
It is ``lookup`` and such conditions that in the end generate the error messages.
```
lookup(var,Env) =
  message ("variable " var " not found")     // if var is not in Env

check_condition x == y =
  message ("expected " y " but found " x)    // if not equal
```

#NEW

==The need of type inference==

There is a grammar rule saying that expressions can be used as statements:
```
Stm ::= Exp ";"
```
How do we check that such statements are valid?
```
  Env => exp : ?
------------------
Env => exp ; valid
```
The problem is that we have no type ``typ`` to check ``exp : typ``.

Solution 1: check ``exp`` with each of the four types
```
check Env => exp ; valid =
  try each typ in [bool,double,int,void]:
    check exp : typ
```
This is inefficient, and does not scale up to infinitely many types.

Solution 2: do type inference with ``exp``. If it succeeds, the statement is valid -
because expressions of any type can be used as statements.

#NEW

==Type inference==

The general scheme is a rule where the conclusion has a type depending in some way on the
premisses and the condition:
```
           J#sub1 ... J#subn
----------------------------------------  C
Env => exp : typ(J#sub1, ..., J#subn, C)
```
We should then use recursive calls of ``check`` and ``infer`` so that
- everything we need for constructing the type is inferred
- everything else is just checked


Often the type is independent of the premisses (which still have to be checked of course!):
```
Env => exp1 : bool   Env => exp2 : bool       infer Env (exp1 && exp2) =
------------------------------------------      check Env => exp1 : bool
       Env => exp1 && exp2 : bool               check Env => exp2 : bool
                                                return bool
```
It can also come from the condition:
```
----------------  var : typ is in Env      infer Env var =
Env => var : typ                             return lookup(var,Env)
```

#NEW

==Type checking overloaded operations==

Arithmetic operations in most languages are **overloaded**.
This means that they apply to many types.

The general rule for ``+ - * /`` is: both operands have the same type
as the value, which must be ``int`` or ``double``.
```
Env => exp1 : typ   Env => exp2 : typ
--------------------------------------  typ is int or double
       Env => exp1 + exp2 : typ
```
What we do is infer the type of the first operand and check the second.
```
infer Env (exp1 + exp2) =
  typ := infer Env exp1
  check_condition typ == int or typ == double
  check Env => exp2 : typ
  return typ
```
The comparison operators are also overloaded, but the return type is of course ``bool``.

#NEW

==Relating inference and checking==

Now we can check expression statements:
```
check Env => exp ; valid =
  infer Env exp
```
If ``infer`` fails, we get any error message it generates.

If ``infer`` succeeds, we discard the type.

In the same way, we only need to write ``infer`` for expressions.
Then we define ``check`` uniformly,
```
check Env => exp : typ =
  typ2 := infer Env exp
  check_condition typ2 == typ
```
The ``check_condition`` call usually returns a message at failure, e.g.
```
TYPE ERROR
type of exp: expected typ, inferred typ2
```

#NEW

==The top-level checkers==

To check the whole program,
+ collect the types of each function into the signature
+ check that function names are unique
+ check each function definition using the signature


To check a function definition
+ check that argument variables are unique
+ initialize the topmost context with the argument variables
+ check the body in this context
+ check that there is a ``return``, with an expression that has the expected
  return type of the function (or just a ``return`` if the type is ``void``)


To check a sequence of statements
+ check the validity of the first statement and update the environment if appropriate
+ check the remaining sequence in the new environment
+ an empty sequence is always valid


#NEW

==Type checker in Haskell==

You can copy the contents of
[``laborations/lab2/haskell/`` ../laborations/lab2/haskell]:
```
CPP.cf           -- grammar
lab2.hs          -- main module
Makefile
TypeChecker.hs   -- type checking module
```
You only have to modify ``CPP.cf`` and ``TypeChecker.hs``.
But you can already compile them: just type
```
make
```
and run the type checker with
```
./lab2
```
The rest is "debugging the empty file"!

#NEW

===The Main module===

You don't have to write this - just copy the file
[``laborations/lab2/haskell/lab2.hs`` ../laborations/lab2/haskell/lab2.hs].

This file shows how compiler phases are linked together.
```
check :: String -> IO ()
check s = case pProgram (myLexer s) of
  Bad err -> do
    putStrLn "SYNTAX ERROR"
    putStrLn err
    exitFailure
  Ok tree -> case typecheck tree of
    Bad err -> do
      putStrLn "TYPE ERROR"
      putStrLn err
      exitFailure
    Ok _ -> putStrLn "OK"
```
In other words: call the parser; if it succeeds, call the type checker.

Notice the use of the **error type**,
```
data Err a = Ok a | Bad String
```
The value is either ``Ok`` of the expected type or ``Bad`` with an error message.

The ``Err`` type is generated by BNFC. One could also use Haskell's
standard type ``Either String a``.

#NEW

===Using the Err type===

The ``Err`` type
```
data Err a = Ok a | Bad String
```
is a **monad** - a type of actions returning ``a`` but also doing other things
(in this case: exceptions).

Monad actions can be **sequence**d: if
```
inferExp :: Env -> Exp -> Err Type
```
then you can make several inferences one after the other by using ``do``
```
do inferExp env exp1
   inferExp env exp2
```
You can **bind** variables returned from actions, and **return** values.
```
do typ1 <- inferExp env exp1
   typ2 <- inferExp env exp2
   return Type_bool
```
If you are only interested in side effects, use the dummy value type ``()``
(corresponds to ``void`` in C and Java).

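
To make the monad concrete, here is a self-contained sketch that defines ``Err`` by hand together with the instances that ``do`` notation needs. In the lab these come from the BNFC-generated code; the toy ``Exp`` and ``Type`` below, and the omission of the environment argument, are simplifying assumptions for this sketch.

```haskell
data Err a = Ok a | Bad String
  deriving (Eq, Show)

instance Functor Err where
  fmap f (Ok a)  = Ok (f a)
  fmap _ (Bad s) = Bad s

instance Applicative Err where
  pure = Ok
  Ok f  <*> x = fmap f x
  Bad s <*> _ = Bad s

-- Sequencing: the first Bad stops the computation, like an exception.
instance Monad Err where
  Ok a  >>= f = f a
  Bad s >>= _ = Bad s

-- Toy syntax and types (hypothetical, not the BNFC-generated ones).
data Type = Type_int | Type_bool
  deriving (Eq, Show)
data Exp = EInt Integer | ETrue | EAnd Exp Exp
  deriving Show

-- Inference without an environment, since the toy language has no variables.
inferExp :: Exp -> Err Type
inferExp (EInt _)     = return Type_int
inferExp ETrue        = return Type_bool
inferExp (EAnd e1 e2) = do
  checkExp Type_bool e1
  checkExp Type_bool e2
  return Type_bool

-- An action used only for its side effect (failure), hence the type ().
checkExp :: Type -> Exp -> Err ()
checkExp typ e = do
  typ2 <- inferExp e
  if typ2 == typ
    then return ()
    else Bad ("expected " ++ show typ ++ " but found "
              ++ show typ2 ++ " in " ++ show e)
```

For instance, ``inferExp (EAnd ETrue ETrue)`` gives ``Ok Type_bool``, whereas an integer operand makes the whole ``do`` block return a ``Bad`` message.
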
#NEW

==Symbol tables==

Environment type
```
type Env     = (Sig,[Context])          -- signature and stack of contexts
type Sig     = [(Id,([Type],Type))]     -- or Map Id ([Type],Type)
type Context = [(Id,Type)]              -- or Map Id Type
```
Auxiliary operations on the environment
```
lookVar   :: Env -> Id -> Err Type
lookFun   :: Env -> Id -> Err ([Type],Type)
updateVar :: Env -> Id -> Type -> Err Env
updateFun :: Env -> Id -> ([Type],Type) -> Err Env
newBlock  :: Env -> Err Env
emptyEnv  :: Env
```
Keep the datatypes abstract, i.e. use them only via these operations.
Then you can switch to another implementation if needed
(more efficient, more stuff in the environment).

#NEW

===The TypeCheck module===

The environment datatypes and operations.

Type signatures of the checking methods
```
typecheck :: Program -> Err ()                   -- required function in lab2
checkDef  :: Env -> Def -> Err ()                -- check a function definition
checkStms :: Env -> Type -> [Stm] -> Err ()
checkStm  :: Env -> Type -> Stm -> Err Env
checkExp  :: Env -> Type -> Exp -> Err ()
inferExp  :: Env -> Exp -> Err Type
```
Some other auxiliaries.
```
checkUnique    :: (Ord a, Print a) => [a] -> Err ()
checkCondition :: Bool -> Err ()
```

#NEW

===Some examples of checking===

```
checkStm :: Env -> Type -> Stm -> Err Env
checkStm env val x = case x of
  SExp exp -> do
    inferExp env exp
    return env
  SDecl type' id ->
    updateVar env id type'   -- also check that id is not in context already
  SWhile exp stm -> do
    checkExp env Type_bool exp
    checkStm env val stm

checkExp :: Env -> Type -> Exp -> Err ()
checkExp env typ exp = do
  typ2 <- inferExp env exp
  if (typ2 == typ)
    then return ()
    else fail $ "type of " ++ printTree exp -- ...
```

#NEW

===Some examples of type inference===

```
inferExp :: Env -> Exp -> Err Type
inferExp env x = case x of
  ETrue           -> return Type_bool
  EInt n          -> return Type_int
  EId id          -> lookVar env id
  EPIncr exp      -> inferNumeric env exp
  ETimes exp0 exp -> inferNumericBin env exp0 exp

inferNumeric :: Env -> Exp -> Err Type
inferNumeric env exp = do
  typ <- inferExp env exp
  if elem typ [Type_int, Type_double]
    then return typ
    else fail $ "type of expression " ++ printTree exp -- ...

inferNumericBin :: Env -> Exp -> Exp -> Err Type
```

#NEW

==Type checker in Java==

You can copy the contents of
[``laborations/lab2/java/`` ../laborations/lab2/java1.5]:
```
CPP.cf               -- grammar
lab2                 -- script running the type checker
lab2.java            -- main program
Makefile
TypeChecker.java     -- type checker class
TypeException.java   -- exceptions for type checking
```
You only have to modify ``CPP.cf`` and ``TypeChecker.java``.

But you can already compile them: just type
```
make
```
and run the type checker with
```
./lab2
```
The rest is "debugging the empty file"!

Before ``make``, you may have to set your class path so that it finds java_cup and JLex,
as well as the current directory.
```
export CLASSPATH=.:::$CLASSPATH
```

#NEW

===The Main module===

This is given in
[``laborations/lab2/java/lab2.java`` ../laborations/lab2/java1.5/lab2.java],
hence you don't have to write this.

It shows how compiler phases are linked together.
```
try {
  l = new Yylex(new FileReader(args[0]));
  parser p = new parser(l);
  CPP.Absyn.Program parse_tree = p.pProgram();
  new TypeChecker().typecheck(parse_tree);
} catch (TypeException e) {
  System.out.println("TYPE ERROR");
  System.err.println(e.toString());
  System.exit(1);
} catch (IOException e) {
  System.err.println(e.toString());
  System.exit(1);
} catch (Throwable e) {
  System.out.println("SYNTAX ERROR");
  System.out.println("At line " + String.valueOf(l.line_num())
                     + ", near \"" + l.buff() + "\" :");
  System.out.println(" " + e.getMessage());
  System.exit(1);
}
```

#NEW

==Symbol tables==

Environment types
```
public static class FunType {
  public LinkedList<Type> args ;
  public Type val ;
}

public static class Env {
  public Map<String,FunType> signature ;
  public LinkedList<Map<String,Type>> contexts ;   // stack of contexts
  public Type lookVar(String id) { ...} ;
  public FunType lookFun(String id) { ...} ;
  public void updateVar (String id, Type ty) {...} ;
  // ...
}
```

#NEW

===The TypeCheck module===

The environment datatypes and operations.

An enumeration of codes for types.
```
public static enum TypeCode { CodeInt, CodeDouble, CodeBool, CodeVoid } ;
```
Notice that ``TypeCode`` is not the same as the class ``Type``, which is the
syntactic category of source-language types. We need ``TypeCode`` to be able to
compare types for equality, and this happens when we compare an expected type
with an inferred type.

Type signatures of the checking methods
```
public void typecheck(Program p) {
}
public static class CheckStm implements Stm.Visitor<Env,Env> {
  public Env visit(SDecl p, Env env) {
  }
  public Env visit(SExp p, Env env) {
  }
  // ...
}
public static class InferExp implements Exp.Visitor<Type,Env> {
  public Type visit(EInt p, Env env) {
  }
  public Type visit(EAdd p, Env env) {
  }
  // ...
}
```

#NEW

===Some examples of checking===

```
public static class CheckStm implements Stm.Visitor<Env,Env> {
  public Env visit(SDecl p, Env env) {
    env.updateVar(p.id_,p.type_) ;
    return env ;
  }
  //...
}
```

#NEW

===Some examples of type inference===

```
public static class InferExpType implements Exp.Visitor<Type,Env> {
  public Type visit(demo.Absyn.EPlus p, Env env) {
    Type t1 = p.exp_1.accept(this, env);
    Type t2 = p.exp_2.accept(this, env);
    if (typeCode(t1) == TypeCode.CodeInt && typeCode(t2) == TypeCode.CodeInt)
      return TInt;
    else if (typeCode(t1) == TypeCode.CodeDouble && typeCode(t2) == TypeCode.CodeDouble)
      return TDouble;
    else
      throw new TypeException("Operands to + must be int or double.");
  }
  //...
}
```
The function ``typeCode`` converts source language types to their codes:
```
public static TypeCode typeCode (Type ty) ...
```
It can be implemented by using a visitor or the ``instanceof`` operator.

#NEW

===More help===

You don't need to debug completely empty files:
- for the grammars, you can pick rules from your Lab 1 (as indicated in Lab 2 PM)
- for the type checker, you can start from the "mini" implementation, in
  [``laborations/mini`` ../laborations/mini]


#NEW

==Lab 2 overview==

We will read through the [Lab PM ../laborations/lab2/lab2.html]

Preparation and exercise: write typing rules for Lab PM constructs.