Lecture 8: Implementing Type Checking
Programming Languages Course
Aarne Ranta (aarne@cs.chalmers.se)

%!target:html
%!postproc(html): #NEW
%!postproc(html): #HR
%!postproc(html): #sub1 1
%!postproc(html): #subn n

Book: 6.3, 6.5

#NEW

==Plan==

From typing rules to type checking code.

Type checker implementation in Haskell.

Type checker implementation in Java.

The location of errors.

Extra: scoping puzzles in real C++.

#NEW

==From typing rules to type checking code==

Basic idea: from the rule
```
J#sub1 ... J#subn
----------------- C
        J
```
generate the code "upside down":
```
check J =
  check J#sub1
  ...
  check J#subn
  check_condition C
```
Example:
```
Env => exp1 : bool   Env => exp2 : bool      check Env => exp1 && exp2 : bool =
---------------------------------------        check Env => exp1 : bool
      Env => exp1 && exp2 : bool               check Env => exp2 : bool
```

#NEW

==From typing rules to type checking code: more examples==

Judgements are easy: they become recursive calls to ``check``.
```
Env => exp : bool   Env => stm valid      check Env => while (exp) stm valid =
------------------------------------        check Env => exp : bool
    Env => while (exp) stm valid            check Env => stm valid
```
Side conditions can be arbitrary code, so you have to think harder.
```
---------------- var : typ is in Env      check Env => var : typ =
Env => var : typ                            check_condition lookup(var,Env) == typ
```
It is ``lookup`` and such conditions that in the end generate the error messages.
```
lookup(var,Env)         = message ("variable " var " not found")     // if var is not in Env
check_condition x == y  = message ("expected " y " but found " x)   // if not equal
```

#NEW

==The need for type inference==

There is a grammar rule saying that expressions can be used as statements:
```
Stm ::= Exp ";"
```
How do we check that such statements are valid?
```
  Env => exp : ?
------------------
Env => exp ; valid
```
The problem is that we have no type ``typ`` against which to check ``exp : typ``.

Solution 1: check ``exp`` with each of the four types
```
check Env => exp ; valid =
  try each typ in [bool,double,int,void]:
    check exp : typ
```
This is inefficient, and does not scale up to infinitely many types.

Solution 2: do type inference with ``exp``.
If it succeeds, the statement is valid, because expressions of any type can be used as statements.

#NEW

==Type inference==

The general scheme is a rule where the type in the conclusion depends in some way on the premises and the condition:
```
J#sub1 ... J#subn
---------------------------------------- C
Env => exp : typ(J#sub1, ..., J#subn, C)
```
We should then use recursive calls of ``check`` and ``infer`` so that
- everything we need for constructing the type is inferred
- everything else is just checked


Often the type is independent of the premises (which still have to be checked, of course):
```
Env => exp1 : bool   Env => exp2 : bool      infer Env (exp1 && exp2) =
---------------------------------------        check Env => exp1 : bool
      Env => exp1 && exp2 : bool               check Env => exp2 : bool
                                               return bool
```
It can also come from the condition:
```
---------------- var : typ is in Env      infer Env var =
Env => var : typ                            return lookup(var,Env)
```

#NEW

==Type checking overloaded operations==

Arithmetic operations in most languages are **overloaded**. This means that they apply to several types.

The general rule for ``+ - * /`` is: both operands have the same type as the value, which must be ``int`` or ``double``.
```
Env => exp1 : typ   Env => exp2 : typ
------------------------------------- typ is int or double
      Env => exp1 + exp2 : typ
```
What we do is infer the type of the first operand and check the second against it.
```
infer Env (exp1 + exp2) =
  typ := infer Env exp1
  check_condition typ == int or typ == double
  check Env => exp2 : typ
  return typ
```
The comparison operators are overloaded too, but their return type is of course ``bool``.

#NEW

==Relating inference and checking==

Now we can check expression statements:
```
check Env => exp ; valid =
  infer Env exp
```
If ``infer`` fails, we get any error message it generates. If ``infer`` succeeds, we discard the type.

In the same way, we only need to write ``infer`` for expressions.
Then we define ``check`` uniformly,
```
check Env => exp : typ =
  typ2 := infer Env exp
  check_condition typ2 == typ
```
The ``check_condition`` call usually returns a message at failure, e.g.
```
TYPE ERROR
type of exp: expected typ, inferred typ2
```

#NEW

==The top-level checkers==

To check the whole program,
+ collect the type of each function into the signature
+ check that function names are unique
+ check each function definition using the signature


To check a function definition,
+ check that argument variables are unique
+ initialize the topmost context with the argument variables
+ check the body in this context
+ check that there is a ``return``, with an expression that has the expected return type of the function (or just a ``return`` if the type is ``void``)


To check a sequence of statements,
+ check the validity of the first statement and update the environment if appropriate
+ check the remaining sequence in the new environment
+ an empty sequence is always valid


#NEW

==Type checker in Haskell==

You can copy the contents of [``laborations/lab2/haskell/`` ../laborations/lab2/haskell]:
```
CPP.cf          -- grammar
lab2.hs         -- main module
Makefile
TypeChecker.hs  -- type checking module
```
You only have to modify ``CPP.cf`` and ``TypeChecker.hs``.

But you can already compile them: just type
```
make
```
and run the type checker with
```
./lab2
```
The rest is "debugging the empty file"!

#NEW

===The Main module===

You don't have to write this; just copy the file [``laborations/lab2/haskell/lab2.hs`` ../laborations/lab2/haskell/lab2.hs].

This file shows how compiler phases are linked together.
```
check :: String -> IO ()
check s = case pProgram (myLexer s) of
  Bad err -> do
    putStrLn "SYNTAX ERROR"
    putStrLn err
    exitFailure
  Ok tree -> case typecheck tree of
    Bad err -> do
      putStrLn "TYPE ERROR"
      putStrLn err
      exitFailure
    Ok _ -> putStrLn "OK"
```
In other words: call the parser; if it succeeds, call the type checker.
Notice the use of the **error type**,
```
data Err a = Ok a | Bad String
```
The value is either ``Ok`` of the expected type or ``Bad`` with an error message.

#NEW

===Using the Err type===

The ``Err`` type
```
data Err a = Ok a | Bad String
```
is a **monad**: a type of actions that return a value of type ``a`` but can also do other things (in this case, raise exceptions).

Monad actions can be **sequence**d: if
```
inferExp :: Env -> Exp -> Err Type
```
then you can make several inferences one after the other by using ``do``
```
do inferExp env exp1
   inferExp env exp2
```
You can **bind** variables to the values returned by actions, and **return** values.
```
do typ1 <- inferExp env exp1
   typ2 <- inferExp env exp2
   return TBool
```
If you are only interested in side effects, use the dummy value type ``()`` (corresponds to ``void`` in C and Java).

#NEW

==Symbol tables==

Environment type
```
type Env     = (Sig,[Context])
type Sig     = [(Id,([Type],Type))]  -- or Map Id ([Type],Type)
type Context = [(Id,Type)]           -- or Map Id Type
```
Auxiliary operations on the environment
```
lookVar   :: Env -> Id -> Err Type
lookFun   :: Env -> Id -> Err ([Type],Type)
updateVar :: Env -> Id -> Type -> Err Env
updateFun :: Env -> Id -> ([Type],Type) -> Err Env
newBlock  :: Env -> Err Env
emptyEnv  :: Env
```
Keep the datatypes abstract, i.e. use them only via these operations. Then you can switch to another implementation if needed (more efficient, more information in the environment).

#NEW

===The TypeCheck module===

The environment datatypes and operations.

Type signatures of the checking functions
```
typecheck :: Program -> Err ()              -- required function in lab2
checkDef  :: Env -> Def -> Err ()           -- check a function definition
checkStms :: Env -> Type -> [Stm] -> Err ()
checkStm  :: Env -> Type -> Stm -> Err Env
checkExp  :: Env -> Type -> Exp -> Err ()
inferExp  :: Env -> Exp -> Err Type
```
Some other auxiliaries.
```
checkUnique    :: (Ord a, Print a) => [a] -> Err ()
checkCondition :: Bool -> Err ()
```

#NEW

===Some examples of checking===

```
checkStm :: Env -> Type -> Stm -> Err Env
checkStm env val x = case x of
  SExp exp -> do
    inferExp env exp          -- any type will do; the result is discarded
    return env
  SDecl type' x ->
    updateVar env x type'     -- also check that x is not in context already
  SWhile exp stm -> do
    checkExp env Type_bool exp
    checkStm env val stm

checkExp :: Env -> Type -> Exp -> Err ()
checkExp env typ exp = do
  typ2 <- inferExp env exp
  if typ2 == typ
    then return ()
    else fail $ "type of " ++ printTree exp -- ...
```

#NEW

===Some examples of type inference===

```
inferExp :: Env -> Exp -> Err Type
inferExp env x = case x of
  ETrue           -> return Type_bool
  EInt n          -> return Type_int
  EId id          -> lookVar env id
  EPIncr exp      -> inferNumeric env exp
  ETimes exp0 exp -> inferNumericBin env exp0 exp

inferNumeric :: Env -> Exp -> Err Type
inferNumeric env exp = do
  typ <- inferExp env exp
  if elem typ [Type_int, Type_double]
    then return typ
    else fail $ "type of expression " ++ printTree exp -- ...

inferNumericBin :: Env -> Exp -> Exp -> Err Type
```

#NEW

==Type checker in Java==

You can copy the contents of [``laborations/lab2/java/`` ../laborations/lab2/java1.5]:
```
CPP.cf              -- grammar
lab2                -- script running the type checker
lab2.java           -- main program
Makefile
TypeChecker.java    -- type checker class
TypeException.java  -- exceptions for type checking
```
You only have to modify ``CPP.cf`` and ``TypeChecker.java``.

But you can already compile them: just type
```
make
```
and run the type checker with
```
./lab2
```
The rest is "debugging the empty file"!

Before ``make``, you may have to set your class path so that it finds java_cup and JLex, as well as the current directory.
```
export CLASSPATH=.:::$CLASSPATH
```

#NEW

===The Main module===

This is given in [``laborations/lab2/java/lab2.java`` ../laborations/lab2/java1.5/lab2.java], so you don't have to write it. It shows how compiler phases are linked together.
```
try {
  l = new Yylex(new FileReader(args[0]));
  parser p = new parser(l);
  CPP.Absyn.Program parse_tree = p.pProgram();
  new TypeChecker().typecheck(parse_tree);
} catch (TypeException e) {
  System.out.println("TYPE ERROR");
  System.err.println(e.toString());
  System.exit(1);
} catch (IOException e) {
  System.err.println(e.toString());
  System.exit(1);
} catch (Throwable e) {
  System.out.println("SYNTAX ERROR");
  System.out.println("At line " + String.valueOf(l.line_num()) + ", near \"" + l.buff() + "\" :");
  System.out.println("     " + e.getMessage());
  System.exit(1);
}
```

#NEW

==Symbol tables==

Environment types
```
public static class FunType {
  public LinkedList<Type> args ;
  public Type val ;
}

public static class Env {
  public Map<String,FunType> signature ;
  public LinkedList<Map<String,Type>> contexts ;

  public Type lookVar(String id) { ... }
  public FunType lookFun(String id) { ... }
  public void updateVar (String id, Type ty) { ... }
  // ...
}
```

#NEW

===The TypeCheck module===

The environment datatypes and operations.

An enumeration of codes for types.
```
public static enum TypeCode { CodeInt, CodeDouble, CodeBool, CodeVoid } ;
```
Notice that ``TypeCode`` is not the same as the class ``Type``, which is the syntactic category of source-language types. We need ``TypeCode`` to be able to compare types for equality, which happens whenever we compare an expected type with an inferred type.

Type signatures of the checking methods
```
public void typecheck(Program p) { }

public static class CheckStm implements Stm.Visitor<Env,Env> {
  public Env visit(SDecl p, Env env) { }
  public Env visit(SExp p, Env env) { }
  // ...
}

public static class InferExp implements Exp.Visitor<Type,Env> {
  public Type visit(EInt p, Env env) { }
  public Type visit(EAdd p, Env env) { }
  // ...
}
```

#NEW

===Some examples of checking===

```
public static class CheckStm implements Stm.Visitor<Env,Env> {
  public Env visit(SDecl p, Env env) {
    env.updateVar(p.id_,p.type_) ;
    return env ;
  }
  //...
}
```

#NEW

===Some examples of type inference===

```
public static class InferExpType implements Exp.Visitor<Type,Env> {
  public Type visit(demo.Absyn.EPlus p, Env env) {
    // infer both operands, then require them to agree on int or double
    Type t1 = p.exp_1.accept(this, env);
    Type t2 = p.exp_2.accept(this, env);
    if (typeCode(t1) == TypeCode.CodeInt && typeCode(t2) == TypeCode.CodeInt)
      return TInt;
    else if (typeCode(t1) == TypeCode.CodeDouble && typeCode(t2) == TypeCode.CodeDouble)
      return TDouble;
    else
      throw new TypeException("Operands to + must be int or double.");
  }
  //...
}
```
The function ``typeCode`` converts source language types to their codes:
```
public static TypeCode typeCode (Type ty) ...
```
It can be implemented by using a visitor or the ``instanceof`` operator.

#NEW

===More help===

You don't need to debug completely empty files:
- for the grammars, you can pick rules from your Lab 1 (as indicated in Lab 2 PM)
- for the type checker, you can start from the "mini" implementation, in [``laborations/mini`` ../laborations/mini]


#NEW

==The location of errors==

BNFC abstract syntax does not normally save the location in the source file. This means that the type checker (or any later phase) cannot report where in the source an error occurs.

However, it is possible to record the position of any token: use
```
position token Id ...
```
instead of just ``token``.

This works at the moment only in Haskell and C++ with STL, and is therefore not required in the lab.

Haskell representation:
```
newtype Id = Id ((Int,Int),String) -- line, column, ident
```
C++ representation:
```
class Id : public Visitable
{
public:
  String string_;    // ident
  Integer integer_;  // line
  //
};
```

#NEW

==Extra: scoping puzzles in real C++==

What does the following print, or is it correct at all?
```
if (int x = 2) int x = 4;
std::cout << x ;

while (int y = 2) int y = 4;
std::cout << y ;

while (int z = 2) {int z = 4;}
std::cout << z ;

while (int u = 2) {{int u = 4;}}
std::cout << u ;
```
Can you initialize a variable by itself?
```
int x = x;
```
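
The puzzles above hinge on which block a declaration belongs to. In the type checker, this is what the stack of contexts in the environment models. The following is a minimal sketch in Java, not the lab's actual code: the names ``ScopedEnv``, ``ScopeException``, ``ScopeDemo`` and the method ``exitBlock`` are hypothetical, types are plain strings, and only variables (no functions) are handled. Real C++ has additional rules, for instance about names declared in conditions, that this sketch does not try to capture.
```
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the lab's TypeException (unchecked, to keep the sketch short).
class ScopeException extends RuntimeException {
    ScopeException(String message) { super(message); }
}

// Hypothetical variable environment: a stack of contexts, innermost block first.
// Types are represented as plain strings in this sketch.
class ScopedEnv {
    private final Deque<Map<String, String>> contexts =
        new ArrayDeque<Map<String, String>>();

    ScopedEnv() { contexts.push(new HashMap<String, String>()); }  // topmost context

    void newBlock()  { contexts.push(new HashMap<String, String>()); }
    void exitBlock() { contexts.pop(); }

    // Declare a variable in the innermost block; redeclaring it there is an error.
    void updateVar(String id, String typ) {
        Map<String, String> top = contexts.peek();
        if (top.containsKey(id))
            throw new ScopeException("variable " + id + " already declared in this block");
        top.put(id, typ);
    }

    // Look a variable up, searching from the innermost block outwards.
    String lookVar(String id) {
        for (Map<String, String> context : contexts) {
            String typ = context.get(id);
            if (typ != null) return typ;
        }
        throw new ScopeException("variable " + id + " not found");
    }
}

class ScopeDemo {
    public static void main(String[] args) {
        ScopedEnv env = new ScopedEnv();
        env.updateVar("x", "int");
        env.newBlock();
        env.updateVar("x", "double");            // allowed: shadows the outer x
        System.out.println(env.lookVar("x"));    // prints double
        env.exitBlock();
        System.out.println(env.lookVar("x"));    // prints int
        try {
            env.updateVar("x", "bool");          // same block as the first x
        } catch (ScopeException e) {
            System.out.println("TYPE ERROR: " + e.getMessage());
        }
    }
}
```
Running ``ScopeDemo`` prints ``double``, then ``int``, then the error message for the redeclaration. Whether a redeclaration is accepted thus depends only on whether a new context was pushed in between, which is the question each puzzle asks about C++.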