Lecture 7: Type Systems
Programming Languages Course
Aarne Ranta (aarne@cs.chalmers.se)

%!target:html
%!postproc(html): #NEW
%!postproc(html): #HR
%!postproc(html): #sub1 1
%!postproc(html): #subn n

Book: 6.3, 6.5

#NEW

==The purpose of types==

To define what the program should do.
- e.g. read an array of integers and return a double


To guarantee that the program is meaningful.
- that it does not add a string to an integer
- that variables are declared before they are used


To document the programmer's intentions.
- better than comments, which are not checked by the compiler


To optimize the use of hardware.
- reserve the minimal amount of memory, but not more
- use the most appropriate machine instructions


#NEW

==What belongs to type checking==

Depending on language, the type checker can prevent
- application of a function to wrong number of arguments,
- application of integer functions to floats,
- use of undeclared variables in expressions,
- functions that do not return values,
- division by zero,
- array indices out of bounds,
- nonterminating recursion,
- sorting algorithms that don't sort...


Languages differ greatly in how strict their static semantics is: none of the things above is checked by all programming languages!

In general, the more static checking there is in the compiler, the less need there is for manual debugging.

#NEW

==Description formats for different compiler phases==

These formats are independent of implementation language.
- Lexer: regular expressions
- Parser: BNF grammars
- **Type checker: typing rules**
- Interpreter: operational semantic rules
- Code generator: compilation schemes


#NEW

==Typing judgements and rules==

Typing rules concern **judgements** of the form
```
E => e : T
```
where //E// is an **environment**, which contains e.g. typings of identifiers.

The judgement says
- in the environment //E//, expression //e// has type //T//


Judgements are used in **typing rules** of the form
```
J1 J2 ... Jn
---------------  C
      J
```
(n >= 0) which says
- from the judgements //J1, J2, ..., Jn// you may conclude //J//, if condition //C// holds.


The judgements above the line in a rule are the **premisses**. The judgement under the line is the **conclusion**. The condition beside the line is a **side condition**, typically not expressible as a premiss.

The judgements are written in a formal language, whereas side conditions can be written in natural language.

#NEW

==Examples of typing rules and derivation==

Typing rules for arithmetic expressions
```
E => e1 : int   E => e2 : int      E => e1 : int   E => e2 : int
-----------------------------      -----------------------------
     E => e1 + e2 : int                 E => e1 * e2 : int

----------  x : T is in E          ------------  i is an integer literal
E => x : T                         E => i : int
```
Derivation of judgement ``x : int, y : int => x + 12 * y : int``
```
                           x:int, y:int => 12 : int   x:int, y:int => y : int
                           --------------------------------------------------
x:int, y:int => x : int               x:int, y:int => 12 * y : int
------------------------------------------------------------------
                 x:int, y:int => x + 12 * y : int
```

#NEW

==Signature vs. context==

We generalize the type checking context to an **environment** with two parts:
- **signature**, which shows the types of functions
- **context**, which shows the types of variables.


In the course of type checking, the signature remains the same throughout a program module, whereas the context changes all the time.

#NEW

==Function types==

No expression in the language of Lab 2 has a **function type**, because functions are never returned as values or used as arguments.

However, the compiler internally needs a data structure for function types, to hold the types of the parameters and the return type. E.g. for a function
```
bool between (int x, double a, double b) {...}
```
we write
```
between : (int, double, double) -> bool
```
to express this internal representation in typing rules.
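The internal representation of function types can be sketched in a few lines. This is only an illustration in Python, not the lab's prescribed data structure: the class name ``FunType`` and the dictionaries ``signature`` and ``context`` are hypothetical names chosen here, and types are simply represented as strings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FunType:
    """Hypothetical internal representation of a type (A1,...,An) -> T."""
    params: tuple   # parameter types in order, e.g. ("int", "double", "double")
    result: str     # return type, e.g. "bool"

    def __str__(self):
        return "(" + ", ".join(self.params) + ") -> " + self.result

# The signature maps function names to function types.
# It stays the same while one program module is checked.
signature = {"between": FunType(("int", "double", "double"), "bool")}

# The context maps variable names to types.
# It changes as declarations are encountered.
context = {"x": "int", "a": "double", "b": "double"}

print("between :", signature["between"])
```

Keeping the two maps separate mirrors the split of the environment into a fixed signature and a changing context.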
#NEW

==Notation for signature and context==

Dividing the environment into signature ``F`` and context ``G``, we write
```
F,G => J
```
Example: typing rule for a variable expression
```
-------------  x : T is in G
F,G => x : T
```
Example: typing rule for one-place function application
```
F,G => e : A
---------------  f : (A) -> T is in F
F,G => f(e) : T
```

#NEW

==The validity of statements==

Expressions have types, but statements do not. However, statements are also checked in type checking.

We need a new judgement form, saying that a statement ``S`` is valid:
```
F,G => S valid
```
Example: typing rule for an assignment
```
F,G => e : T
--------------------  x : T is in G
F,G => x = e ; valid
```
Example: typing rule for while loops
```
F,G => e : bool   F,G => S valid
--------------------------------
    F,G => while (e) S valid
```

#NEW

==Variable binding and context==

Contexts can be **extended** with new variables. The notation we use is
```
(G, x : T)
```
This corresponds to **variable binding** constructs, e.g. declarations.

Example: typing rule for a declaration; ``SS`` is a sequence of statements
```
F,(G, x:t) => SS valid
-------------------------  x not in G
F,G => t x ; SS valid
```
The rule says: if ``SS`` is a valid sequence of statements in context ``(G, x : t)``, then ``t x ; SS`` is valid in ``G``.

#NEW

==Example of variable binding and context==

We prove that ``int x ; x = x + 5 ;`` is valid in the empty context ``()``.
```
x : int => x : int   x : int => 5 : int
---------------------------------------
        x : int => x + 5 : int
     ----------------------------
     x : int => x = x + 5 ; valid
    -------------------------------
    () => int x ; x = x + 5 ; valid
```
The signature is omitted for simplicity.

#NEW

==Function definitions and applications==

The validity of a function definition.
```
F,(G, x#sub1 : A#sub1,...,x#subn : A#subn) => SS valid
---------------------------------------------------------  f not in F
F,G => T f (A#sub1 x#sub1,...,A#subn x#subn) { SS } valid
```
More conditions:
- that ``x#sub1 ... x#subn`` are distinct
- that ``A#sub1 ... A#subn, T`` are types (can be guaranteed by syntax)
- that there is a ``return e`` with ``e : T`` (not always checked)


The typing rule for function applications is the following
```
F,G => f : (A#sub1,...,A#subn) -> T   F,G => e#sub1 : A#sub1,..., e#subn : A#subn
---------------------------------------------------------------------------------
                        F,G => f(e#sub1,...,e#subn) : T
```

#NEW

==Block structure==

In C, C++, Java, Haskell, etc, variables on the same level must be distinct (e.g. in function parameter lists).

However, variables in an **inner block** are no longer on the same level and can hence **overshadow** outer variables
```
{
  int x = 1 ;
  bool b ;
  x = x + 2 ;        // x : int
  {
    double x = 2.0 ;
    x = x + 1.0 ;    // x : double
    b = true ;       // b : bool
  }
  x = x + 5 ;        // x : int
  b = b && b ;       // b : bool
}
```
Variables declared in a block are discarded at exit from the block.

There is no limit on the number of block levels.

#NEW

==Contexts for block structure==

Implementation 1: with markers
```
x : int, b : bool, MARK, x : double
```
- entering a block: add a marker
- leaving a block: delete variables after last marker, and the marker itself
- update: add the new variable after the last variable
- lookup: the latest occurrence of the variable


Implementation 2: with stacks of contexts
```
(x : int, b : bool).(x : double)
```
- entering a block: push an empty context on the stack
- leaving a block: pop the topmost context
- update: add the new variable after the last variable in the topmost context
- lookup: the innermost surrounding occurrence of the variable (search from the top of the stack downwards)


Implementation 2 can be done with lists of lists: the top of the stack is the head of the list.

#NEW

==Typing rules for variables==

The rules are expressed as checking if a list of statements is valid.
Instead of a single context, we have a **stack of contexts** ``Gs.G`` where we denote by ``G`` the topmost context.
```
-------------  x : T is the closest entry for x in Gs.G
Gs.G => x : T

Gs.(G,x:t) => SS valid
--------------------------  x not in G
Gs.G => t x ; SS valid

Gs => e : t   Gs => SS valid
-------------------------------  x : t in Gs
Gs => x = e ; SS valid

Gs.() => SS valid   Gs => SSS valid
-------------------------------------
Gs => { SS } SSS valid
```

#NEW

==Type checking and type inference==

**Type checking**: given a judgement ``G => e : T``, find out whether it can be derived by the typing rules.

The **derivation** is a tree of rule applications with the judgement as the last line.

**Type inference**: given an expression ``e``, find a type ``T`` in context ``G`` such that ``G => e : T`` can be derived by the typing rules.

For Java, C, and C++, we can mostly make do with just type checking, because types are marked explicitly.

Haskell has type inference as well: if you don't give the type, the compiler can usually find //the most general type//.

#NEW

==Different type checkers==

We can classify checkers in terms of what they return:
- A //rude checker//, which only says ``True`` or ``False``, and may even crash (for instance, when variable lookup just gives an ``error`` if the variable is not found).
- An //error-reporting checker//, which returns ``OK`` or a message saying where the error is.
- An //annotating checker//, which returns a syntax tree annotated with more type information.


To build a compiler back end, and in Lab 3, we need the third. In Lab 2, we build the second.
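As an illustration, an error-reporting checker over a stack of contexts can be sketched as follows. This is a minimal sketch in Python, not the Lab 2 solution. Everything named here is an assumption made for the example: the tuple encoding of syntax trees, the names ``CheckError``, ``lookup``, ``infer``, ``check_stms`` and ``check_program``, and the extension of ``+`` to ``double`` operands (the rules above only cover ``int``). The stack is a Python list of dictionaries whose last element is the topmost context.

```python
class CheckError(Exception):
    """Raised when a statement or expression is ill-typed."""

def lookup(env, x):
    # Closest entry for x: search from the topmost context downwards.
    for ctx in reversed(env):
        if x in ctx:
            return ctx[x]
    raise CheckError(f"undeclared variable {x}")

def infer(env, e):
    tag = e[0]
    if tag == "var":
        return lookup(env, e[1])
    if tag in ("int", "double", "bool"):   # literals carry their type as tag
        return tag
    if tag == "add":
        t1, t2 = infer(env, e[1]), infer(env, e[2])
        if t1 != t2 or t1 not in ("int", "double"):
            raise CheckError(f"cannot add {t1} and {t2}")
        return t1
    raise CheckError(f"unknown expression {tag}")

def check_stms(env, stms):
    for s in stms:
        if s[0] == "decl":                 # t x ;
            _, t, x = s
            if x in env[-1]:               # side condition: x not in G
                raise CheckError(f"{x} already declared in this block")
            env[-1][x] = t
        elif s[0] == "assign":             # x = e ;
            _, x, e = s
            t, te = lookup(env, x), infer(env, e)
            if t != te:
                raise CheckError(f"cannot assign {te} to {x} : {t}")
        elif s[0] == "block":              # { SS }
            env.append({})                 # entering a block: push empty context
            try:
                check_stms(env, s[1])
            finally:
                env.pop()                  # leaving a block: pop topmost context
        else:
            raise CheckError(f"unknown statement {s[0]}")

def check_program(stms):
    """Return "OK" or an error message instead of crashing."""
    try:
        check_stms([{}], stms)
        return "OK"
    except CheckError as err:
        return f"type error: {err}"

good = [("decl", "int", "x"),
        ("assign", "x", ("add", ("var", "x"), ("int", 5))),
        ("block", [("decl", "double", "x"),
                   ("assign", "x", ("add", ("var", "x"), ("double", 1.0)))]),
        ("assign", "x", ("add", ("var", "x"), ("int", 5)))]
bad = [("decl", "int", "x"),
       ("block", [("decl", "double", "x")]),
       ("assign", "x", ("double", 2.0))]

print(check_program(good))
print(check_program(bad))
```

In ``bad``, the inner ``double x`` is discarded when the block is left, so the final assignment is checked against the outer ``x : int`` and rejected.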
#NEW

==The passes of the type checker==

Pass 1:
- start with empty signature
- for each function ``f``, update signature with ``f : T``


Pass 2:
- for each ``f``, check the function body of ``f`` with respect to the type ``T``


The expression checker consists of functions:
```
check (Exp e, Type t) returns void
infer (Exp e)         returns Type
```
These functions are defined by mutual recursion, by cases on the expression.

We also need to check function definitions and sequences of statements.
```
check (Def d)   returns void
check (Stms ss) returns void
```
All functions use an environment (= signature and stack of contexts).

The method is syntax-directed translation.

#NEW

==Examples of type checking==

We show syntax-directed translation in pseudocode.
```
infer x =                        // variable x
  t := lookup(x)
  return t

infer i =                        // integer literal i
  return int

infer f(a#sub1,..., a#subn) =
  T := lookup(f)
  if T = (A#sub1, ..., A#subn) -> B
    check a#sub1 : A#sub1
    ...
    check a#subn : A#subn
    return B
  else
    failure
```
We return to the translation rules and their implementation in the next lecture.

#NEW

==Supplementary topics in type checking==

Not in Lab 2; go through if there is time.
- type casts
- arithmetic conversions
- function overloading


Advanced type systems; forthcoming in lecture 12
- higher-order function types
- polymorphism
- algebraic data types
- dependent types


#NEW

===Type casts===

An expression of one type can be used where another type is expected.

**Promotions**: no information is lost. E.g. ``char`` to ``int``.

**Conversions**: information can be lost. E.g. ``int`` to ``double``, ``double`` to ``int``.

**Explicit cast**
```
int i = 3 ;
double d = (double)i ;  // C and Java syntax
double d = double(i) ;  // C++ syntax
```
**Implicit cast**
```
int i = 3 ;
double d = i ;
```

#NEW

===Arithmetic conversions===

(Not in Lab 2)

In binary expressions like
```
a + b
```
if the two operands ``a`` and ``b`` have different types, the idea is to convert the "less precise" to the "more precise" type.
Rules in C:
- if one of the operands is ``double``, convert the other to ``double``
- otherwise, convert ``bool`` to ``int``


#NEW

===Function overloading===

(Not in Lab 2)

Arithmetic operators can be seen as a special case of **overloaded functions**: the same name is used for many functions, each of which has a different type.

Ada and C++ permit function overloading, which is resolved by argument types.
```
void print (int i)                {...}   // _print__Fi
void print (double d)             {...}   // _print__Fd
void print (string s)             {...}   // _print__Fs
void print (string s, int indent) {...}   // _print__Fsi
```
The compiler **resolves** overloading and replaces the name by a unique one.
```
print(2)         // _print__Fi
double d = 1.0 ;
print(d + d)     // _print__Fd
```

#NEW

===Overloading resolution rules===

In the presence of implicit type casts, the rules have to find a best match by using the following principles in order of decreasing priority:
- exact match
- match using promotions
- match using conversions


Thus ``print(2)`` does //not// convert to ``double`` and use ``_print__Fd``.

Example of a conflict: two equally good matches, depending on which argument is considered.
```
int    pow (int, int) ;
double pow (double, double) ;

pow(2.0, 2)  // ambiguous!
```
Thus the C++ resolution rules for functions are different from operators after all.

#NEW

==Lab 2 overview==

We will read through the [Lab PM ../laborations/lab2/lab2.html].

Next lecture: how to organize and write the code, using syntax-directed translation.

Preparation and exercise: write typing rules for Lab PM constructs.
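Returning to the overloading resolution rules: the "best match" principle can be sketched as a ranking of each argument, followed by a search for the one candidate that is at least as good as every other candidate on every argument. This is a simplified illustration in Python, not the full C++ algorithm. The names ``rank``, ``resolve``, ``prints`` and ``pows`` and the two cast tables are hypothetical, the unique names ``pow_ii`` and ``pow_dd`` are made up for the example, and the classification of casts follows the lecture (``char`` to ``int`` is a promotion, ``int`` to ``double`` a conversion).

```python
# Assumed cast tables, following the lecture's classification.
PROMOTIONS = {("char", "int")}
CONVERSIONS = {("int", "double"), ("double", "int")}

def rank(actual, formal):
    """0 = exact match, 1 = promotion, 2 = conversion, None = no match."""
    if actual == formal:
        return 0
    if (actual, formal) in PROMOTIONS:
        return 1
    if (actual, formal) in CONVERSIONS:
        return 2
    return None

def resolve(overloads, arg_types):
    """overloads: list of (unique_name, param_types). Return the best unique name."""
    viable = []
    for name, params in overloads:
        if len(params) != len(arg_types):
            continue
        ranks = [rank(a, p) for a, p in zip(arg_types, params)]
        if None not in ranks:
            viable.append((name, ranks))
    if not viable:
        raise LookupError("no matching function")

    def at_least_as_good(r1, r2):
        return all(x <= y for x, y in zip(r1, r2))

    # A winner must be at least as good as every viable candidate on every argument.
    best = [name for name, r in viable
            if all(at_least_as_good(r, other) for _, other in viable)]
    if len(best) != 1:
        raise LookupError("ambiguous call")
    return best[0]

prints = [("_print__Fi", ("int",)),
          ("_print__Fd", ("double",)),
          ("_print__Fs", ("string",)),
          ("_print__Fsi", ("string", "int"))]
pows = [("pow_ii", ("int", "int")),
        ("pow_dd", ("double", "double"))]

print(resolve(prints, ("int",)))      # exact match beats conversion to double
print(resolve(prints, ("double",)))
try:
    resolve(pows, ("double", "int"))  # each candidate wins on one argument only
except LookupError as err:
    print(err)
```

For ``pow(2.0, 2)``, the ``int`` version is better on the second argument and the ``double`` version on the first, so neither dominates and the call is reported as ambiguous.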