Lecture 7: Type Systems
Programming Languages Course
Aarne Ranta (aarne@cs.chalmers.se)
%!target:html
%!postproc(html): #NEW
%!postproc(html): #HR
%!postproc(html): #sub1 1
%!postproc(html): #subn n
Book: 6.3, 6.5
#NEW
==The purpose of types==
To define what the program should do.
- e.g. read an array of integers and return a double
To guarantee that the program is meaningful.
- that it does not add a string to an integer
- that variables are declared before they are used
To document the programmer's intentions.
- better than comments, which are not checked by the compiler
To optimize the use of hardware.
- reserve exactly as much memory as needed, and no more
- use the most appropriate machine instructions
#NEW
==What belongs to type checking==
Depending on the language, the type checker can prevent
- application of a function to the wrong number of arguments,
- application of integer functions to floats,
- use of undeclared variables in expressions,
- functions that do not return values,
- division by zero,
- array indices out of bounds,
- nonterminating recursion,
- sorting algorithms that don't sort...
Languages differ greatly in how strict their
static semantics is: none of the things above is
checked by all programming languages!
In general, the more static checking the compiler performs,
the less manual debugging is needed.
#NEW
==Description formats for different compiler phases==
These formats are independent of the implementation language.
- Lexer: regular expressions
- Parser: BNF grammars
- **Type checker: typing rules**
- Interpreter: operational semantic rules
- Code generator: compilation schemes
#NEW
==Typing judgements and rules==
Typing rules concern **judgements** of the form
```
E => e : T
```
where //E// is an **environment**, which contains e.g. typings of identifiers.
The judgement says
- in the environment //E//, expression //e// has type //T//
Judgements are used in **typing rules** of the form
```
J1 J2 ... Jn
--------------- C
J
```
(n >= 0) which says
- from the judgements //J1, J2, ..., Jn// you may conclude //J//,
if condition //C// holds.
The judgements above the line in a rule are the **premisses**.
The judgement under the line is the **conclusion**.
The condition beside is a
**side condition**, typically not expressible as a premiss.
The judgements are written in a formal language, whereas side conditions
can be written in natural language.
#NEW
==Examples of typing rules and derivation==
Typing rules for arithmetic expressions
```
E => e1 : int   E => e2 : int        E => e1 : int   E => e2 : int
-----------------------------        -----------------------------
     E => e1 + e2 : int                   E => e1 * e2 : int

---------- x : T is in E             ------------ i is an integer literal
E => x : T                           E => i : int
```
Derivation of judgement ``x : int, y : int => x + 12 * y : int``
```
                          x:int, y:int => 12 : int   x:int, y:int => y : int
                          ---------------------------------------------------
x:int, y:int => x : int   x:int, y:int => 12 * y : int
------------------------------------------------------
x:int, y:int => x + 12 * y : int
```
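The four rules above can be read as a recursive function on expressions.
The following Python sketch is only an illustration: the tuple encoding of
expressions and the names ``infer`` and ``TypingError`` are assumptions made
here, not part of the lecture or the lab.
```
class TypingError(Exception):
    """Raised when no typing rule applies."""

def infer(env, exp):
    """Return the type T such that env => exp : T, or raise TypingError.

    Hypothetical encoding: ("int", 12), ("var", "x"),
    ("add", e1, e2), ("mul", e1, e2).
    """
    tag = exp[0]
    if tag == "int":                      # E => i : int
        return "int"
    if tag == "var":                      # E => x : T, if x : T is in E
        name = exp[1]
        if name not in env:
            raise TypingError(f"undeclared variable {name}")
        return env[name]
    if tag in ("add", "mul"):             # both premisses must give int
        for operand in exp[1:]:
            if infer(env, operand) != "int":
                raise TypingError("operand is not an int")
        return "int"
    raise TypingError(f"unknown expression {exp!r}")

env = {"x": "int", "y": "int"}
# x + 12 * y
exp = ("add", ("var", "x"), ("mul", ("int", 12), ("var", "y")))
print(infer(env, exp))                    # prints: int
```
Each recursive call corresponds to one premiss of the derivation tree.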
#NEW
==Signature vs. context==
We generalize the type checking context to an **environment** with
two parts:
- **signature**, which shows the types of functions
- **context**, which shows the types of variables.
In the course of type checking, the signature remains the same
throughout a program module, whereas
the context changes all the time.
#NEW
==Function types==
No expression in the language of Lab 2 has **function types**,
because functions are never returned as values or used as arguments.
However, the compiler needs internally
a data structure for function types, to hold the
types of the parameters and the return type. E.g. for a function
```
bool between (int x, double a, double b) {...}
```
we write
```
between : (int, double, double) -> bool
```
to express this internal representation in typing rules.
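One possible internal data structure is sketched below in Python. The class
``FunType`` and the helper ``fun_type`` are hypothetical names chosen for this
illustration; the lab is free to represent function types differently.
```
from dataclasses import dataclass

@dataclass(frozen=True)
class FunType:
    """Internal function type (A1, ..., An) -> T."""
    params: tuple   # parameter types A1, ..., An, in order
    result: str     # return type T

    def __str__(self):
        return f"({', '.join(self.params)}) -> {self.result}"

def fun_type(return_type, params):
    """Build a FunType from a definition header.

    params is a list of (type, name) pairs; the parameter names are
    dropped because only the types matter to callers.
    """
    return FunType(tuple(t for t, _ in params), return_type)

between = fun_type("bool", [("int", "x"), ("double", "a"), ("double", "b")])
print("between :", between)   # prints: between : (int, double, double) -> bool
```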
#NEW
==Notation for signature and context==
Dividing the environment into signature ``F`` and context ``G``, we write judgements in the form
```
F,G => J
```
Example: typing rule for a variable expression
```
------------- x : T is in G
F,G => x : T
```
Example: typing rule for one-place function application
```
F,G => e : A
--------------- f : (A) -> T is in F
F,G => f(e) : T
```
#NEW
==The validity of statements==
Expressions have types, but statements do not.
However, statements are also checked by the type checker.
We need a new judgement form, saying that a statement ``S`` is valid:
```
F,G => S valid
```
Example: typing rule for an assignment
```
F,G => e : T
------------------ x : T is in G
F,G => x = e ; valid
```
Example: typing rule for while loops
```
F,G => e : bool    F,G => S valid
--------------------------------
F,G => while (e) S valid
```
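Because statements have no type, a statement checker returns nothing and
signals invalidity by failing. The Python sketch below illustrates the two
rules above under some assumptions: the tuple encoding, the names
``check_stm`` and ``TypingError``, and the comparison ``<`` (used only to get
a ``bool`` condition) are all hypothetical. The signature is omitted.
```
class TypingError(Exception):
    """Raised when a statement or expression is not well typed."""

def infer(ctx, exp):
    """Tiny expression checker: variables, int literals, e1 < e2 on ints."""
    tag = exp[0]
    if tag == "var":
        if exp[1] not in ctx:
            raise TypingError(f"undeclared variable {exp[1]}")
        return ctx[exp[1]]
    if tag == "int":
        return "int"
    if tag == "lt":
        if infer(ctx, exp[1]) == "int" and infer(ctx, exp[2]) == "int":
            return "bool"
        raise TypingError("operands of < must be int")
    raise TypingError(f"unknown expression {exp!r}")

def check_stm(ctx, stm):
    """Return normally if ctx => stm valid; raise TypingError otherwise."""
    tag = stm[0]
    if tag == "assign":                  # x = e ;
        _, name, exp = stm
        if name not in ctx:
            raise TypingError(f"undeclared variable {name}")
        if infer(ctx, exp) != ctx[name]:
            raise TypingError(f"type mismatch in assignment to {name}")
    elif tag == "while":                 # while (e) S
        _, cond, body = stm
        if infer(ctx, cond) != "bool":
            raise TypingError("while condition must be bool")
        check_stm(ctx, body)
    else:
        raise TypingError(f"unknown statement {stm!r}")

ctx = {"x": "int", "n": "int"}
# while (x < n) x = n ;
loop = ("while", ("lt", ("var", "x"), ("var", "n")),
        ("assign", "x", ("var", "n")))
check_stm(ctx, loop)                     # valid: returns without raising
```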
#NEW
==Variable binding and context==
Contexts can be **extended** with new variables. The notation we use is
```
(G, x : T)
```
This corresponds to **variable binding** constructs, e.g. declarations.
Example: typing rule for a declaration; ``SS`` is a sequence of statements
```
F,(G, x:t) => SS valid
------------------------- x not in G
F,G => t x ; SS valid
```
The rule says: if ``SS`` is a valid sequence of statements in context
``(G, x : t)``, and ``x`` is not already in ``G``, then ``t x ; SS`` is valid in ``G``.
#NEW
==Example of variable binding and context==
We prove that ``int x ; x = x + 5 ;`` is valid in the empty context ``()``.
```
x : int => x : int    x : int => 5 : int
-----------------------------------------
x : int => x + 5 : int
--------------------------
x : int => x = x + 5 ; valid
-------------------------------
() => int x ; x = x + 5 ; valid
```
The signature is omitted for simplicity.
#NEW
==Function definitions and applications==
The validity of a function definition.
```
F,(G, x#sub1 : A#sub1,...,x#subn : A#subn) => SS valid
--------------------------------------------- f not in F
F,G => T f (A#sub1 x#sub1,...,A#subn x#subn) { SS } valid
```
More conditions:
- that ``x#sub1 ... x#subn`` are distinct
- that ``A#sub1 ... A#subn, T`` are types (can be guaranteed by syntax)
- that there is a ``return e`` with ``e : T`` (not always checked)
The typing rule for function applications is the following
```
F,G => f : (A#sub1,...,A#subn) -> T    F,G => e#sub1 : A#sub1,..., e#subn : A#subn
----------------------------------------------------------
F,G => f(e#sub1,...,e#subn) : T
```
#NEW
==Block structure==
In C, C++, Java, Haskell, etc., variables on the same level must be distinct
(e.g. in function parameter lists).
However, variables in an **inner block** are no longer on the same level
and can hence **shadow** (hide) outer variables:
```
{
  int x = 1 ;
  bool b ;
  x = x + 2 ;        // x : int
  {
    double x = 2.0 ;
    x = x + 1.0 ;    // x : double
    b = true ;       // b : bool
  }
  x = x + 5 ;        // x : int
  b = b && b ;       // b : bool
}
```
Variables declared in a block are discarded at exit from the block.
There is no limit on the number of block levels.
#NEW
==Contexts for block structure==
Implementation 1: with markers
```
x : int, b : bool, MARK, x : double
```
- entering a block: add a marker
- leaving a block: delete variables after last marker, and the marker itself
- update: after the last variable
- lookup: the latest occurrence of the variable
Implementation 2: with stacks of contexts
```
(x : int, b : bool).(x : double)
```
- entering a block: push an empty context on the stack
- leaving a block: pop the topmost context
- update: after the last variable in the topmost context
- lookup: the innermost surrounding occurrence of the variable (search from the top of the stack)
Implementation 2 can be done with lists of lists: the top of the stack is the head
of the list.
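A minimal Python sketch of Implementation 2 follows. The class ``Env`` and
its method names are hypothetical. One deviation from the text, made because
Python lists push and pop cheaply at the end: here the top of the stack is
the //last// element of the list, not the head.
```
class Env:
    """Stack of contexts; the last list element is the innermost block."""

    def __init__(self):
        self.contexts = [{}]             # context of the outermost block

    def enter_block(self):
        self.contexts.append({})         # push an empty context

    def leave_block(self):
        self.contexts.pop()              # discard the innermost variables

    def update(self, name, typ):
        top = self.contexts[-1]
        if name in top:                  # same level: names must be distinct
            raise KeyError(f"{name} already declared in this block")
        top[name] = typ

    def lookup(self, name):
        for ctx in reversed(self.contexts):   # innermost block first
            if name in ctx:
                return ctx[name]
        raise KeyError(f"undeclared variable {name}")

env = Env()
env.update("x", "int")
env.update("b", "bool")
env.enter_block()
env.update("x", "double")                # shadows the outer x
print(env.lookup("x"), env.lookup("b"))  # prints: double bool
env.leave_block()
print(env.lookup("x"))                   # prints: int
```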
#NEW
==Typing rules for variables==
The rules are expressed as checking if a list of statements is valid.
Instead of a single context, we have a **stack of contexts** ``Gs.G`` where
we denote by ``G`` the topmost context.
```
--------------  x : T the closest entry for x in Gs.G
Gs.G => x : T

Gs.(G,x:t) => SS valid
--------------------------  x not in G
Gs.G => t x ; SS valid

Gs => e : t    Gs => SS valid
-------------------------------  x : t in Gs
Gs => x = e ; SS valid

Gs.() => SS valid    Gs => SSS valid
-------------------------------------
Gs => { SS } SSS valid
```
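The rules above translate almost line by line into a recursive check on
statement sequences. The Python sketch below is an illustration under
assumptions: the tuple encoding and the names ``check_stms``, ``lookup``,
``infer`` and ``TypingError`` are hypothetical, and the stack is a list of
dictionaries with the topmost context last. The stack is never mutated, so
the variables of a block disappear automatically when its check returns.
```
class TypingError(Exception):
    """Raised when a statement sequence is not valid."""

def lookup(stack, name):
    """Closest entry for name, searching the innermost context first."""
    for ctx in reversed(stack):
        if name in ctx:
            return ctx[name]
    raise TypingError(f"undeclared variable {name}")

def infer(stack, exp):
    """Expressions: ("var", x), ("lit", type, value), ("add", e1, e2)."""
    tag = exp[0]
    if tag == "var":
        return lookup(stack, exp[1])
    if tag == "lit":
        return exp[1]
    if tag == "add":
        left, right = infer(stack, exp[1]), infer(stack, exp[2])
        if left == right and left in ("int", "double"):
            return left
        raise TypingError(f"cannot add {left} and {right}")
    raise TypingError(f"unknown expression {exp!r}")

def check_stms(stack, stms):
    """Return normally if stack => stms valid; raise TypingError otherwise."""
    if not stms:
        return                                   # empty sequence is valid
    first, rest = stms[0], stms[1:]
    tag = first[0]
    if tag == "decl":                            # t x ; SS
        _, typ, name = first
        if name in stack[-1]:
            raise TypingError(f"{name} already declared in this block")
        extended = stack[:-1] + [{**stack[-1], name: typ}]
        check_stms(extended, rest)
    elif tag == "assign":                        # x = e ; SS
        _, name, exp = first
        if infer(stack, exp) != lookup(stack, name):
            raise TypingError(f"type mismatch in assignment to {name}")
        check_stms(stack, rest)
    elif tag == "block":                         # { SS } SSS
        check_stms(stack + [{}], first[1])
        check_stms(stack, rest)
    else:
        raise TypingError(f"unknown statement {first!r}")

# int x ; x = 1 ; { double x ; x = x + 1.0 ; } x = x + 5 ;
program = [
    ("decl", "int", "x"),
    ("assign", "x", ("lit", "int", 1)),
    ("block", [
        ("decl", "double", "x"),
        ("assign", "x", ("add", ("var", "x"), ("lit", "double", 1.0))),
    ]),
    ("assign", "x", ("add", ("var", "x"), ("lit", "int", 5))),
]
check_stms([{}], program)                        # valid: returns without raising
```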
#NEW
==Type checking and type inference==
**Type checking**: given a judgement ``G => e : T``, find out whether it
can be derived by the typing rules. The **derivation** is a tree of rule
applications with the judgement as the last line.
**Type inference**: given an expression ``e``,
find a type ``T`` in context ``G`` such that ``G => e : T``
can be derived by the typing rules.
For Java, C, and C++, we can mostly do with just type checking,
because types are marked explicitly.
Haskell has type inference as well: if you don't give the type,
the compiler can usually find //the most general type//.
#NEW
==Different type checkers==
We can classify checkers in terms of what they return:
- A //rude checker//, which only says ``True`` or ``False``,
and may even crash (for instance, when variable lookup
just gives an ``error`` if the variable is not found).
- An //error-reporting checker//, which returns ``OK`` or a message
saying where the error is.
- An //annotating checker//, which returns a syntax tree annotated
with more type information.
To build a compiler back end, and in Lab 3, we need the third.
In Lab 2, we build the second.
#NEW
==The passes of the type checker==
Pass 1:
- start with empty signature
- for each function ``f``, update signature with ``f : T``
Pass 2:
- for each ``f``, check the function body of ``f``
with respect to the type ``T``
The expression checker consists of functions:
```
check (Exp e, Type t) returns void
infer (Exp e) returns Type
```
These functions are defined by mutual recursion, by cases
on the expression.
We also need to check function definitions and sequences of statements.
```
check (Def d) returns void
check (Stms ss) returns void
```
All functions use an environment (= signature and stack of contexts).
The method is syntax-directed translation.
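The point of having two passes is that a function body may call a function
defined later in the file. The Python sketch below shows only this
structure. As a loudly stated simplification, each function body is reduced
to the list of function names it calls; the dictionary encoding and the names
``build_signature`` and ``check_program`` are hypothetical.
```
class TypingError(Exception):
    """Raised when the program is rejected."""

def build_signature(defs):
    """Pass 1: collect f : (A1, ..., An) -> T for every definition."""
    signature = {}
    for d in defs:
        if d["name"] in signature:
            raise TypingError(f"function {d['name']} defined twice")
        param_types = tuple(t for t, _ in d["params"])
        signature[d["name"]] = (param_types, d["returns"])
    return signature

def check_program(defs):
    signature = build_signature(defs)            # pass 1
    for d in defs:                               # pass 2: check each body
        for callee in d["calls"]:                # simplified "body"
            if callee not in signature:
                raise TypingError(f"{d['name']} calls undefined {callee}")
    return signature

# even calls odd, which is defined only afterwards
program = [
    {"name": "even", "returns": "bool", "params": [("int", "n")], "calls": ["odd"]},
    {"name": "odd", "returns": "bool", "params": [("int", "n")], "calls": ["even"]},
]
print(check_program(program)["odd"])             # prints: (('int',), 'bool')
```
With a single pass, the call to ``odd`` inside ``even`` would be rejected
because ``odd`` would not yet be in the signature.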
#NEW
==Examples of type checking==
We show syntax-directed translation in pseudocode.
```
infer x =                      // variable x
  t := lookup(x)
  return t

infer i =                      // integer literal i
  return int

infer f(a#sub1,..., a#subn) =
  T := lookup(f)
  if T = (A#sub1, ..., A#subn) -> B
    check a#sub1 : A#sub1
    ...
    check a#subn : A#subn
    return B
  else failure
```
```
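For comparison, here is one way the pseudocode might look as runnable Python.
It is a sketch under assumptions: the tuple encoding of expressions, the
representation of a function type as a pair ``(parameter types, result)``,
and the names ``infer``, ``check`` and ``TypingError`` are all hypothetical.
It also shows the mutual recursion: ``infer`` calls ``check`` on the
arguments, and ``check`` calls ``infer`` and compares.
```
class TypingError(Exception):
    """Raised when an expression is not well typed."""

def infer(sig, ctx, exp):
    """Return the type of exp in signature sig and context ctx."""
    tag = exp[0]
    if tag == "var":                             # variable x
        if exp[1] not in ctx:
            raise TypingError(f"undeclared variable {exp[1]}")
        return ctx[exp[1]]
    if tag == "int":                             # integer literal i
        return "int"
    if tag == "call":                            # f(a1, ..., an)
        _, name, args = exp
        if name not in sig:
            raise TypingError(f"undefined function {name}")
        param_types, result = sig[name]
        if len(args) != len(param_types):
            raise TypingError(f"wrong number of arguments to {name}")
        for arg, expected in zip(args, param_types):
            check(sig, ctx, arg, expected)
        return result
    raise TypingError(f"unknown expression {exp!r}")

def check(sig, ctx, exp, expected):
    """Succeed silently if exp has type expected; raise otherwise."""
    actual = infer(sig, ctx, exp)
    if actual != expected:
        raise TypingError(f"expected {expected}, found {actual}")

sig = {"between": (("int", "double", "double"), "bool")}
ctx = {"i": "int", "lo": "double", "hi": "double"}
call = ("call", "between", [("var", "i"), ("var", "lo"), ("var", "hi")])
print(infer(sig, ctx, call))                     # prints: bool
```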
We return to the translation rules and their implementation in the
next lecture.
#NEW
==Supplementary topics in type checking==
Not in Lab 2; go through if there is time.
- type casts
- arithmetic conversions
- function overloading
Advanced type systems; forthcoming in lecture 12
- higher-order function types
- polymorphism
- algebraic data types
- dependent types
#NEW
===Type casts===
An expression of one type can be used where another type is expected.
**Promotions**: no information is lost. E.g. ``char`` to ``int``.
**Conversions**: information can be lost. E.g. ``int`` to ``double``,
``double`` to ``int``.
**Explicit cast**
```
int i = 3 ;
double d = (double)i ; // C and Java syntax
double d = double(i) ; // C++ syntax
```
**Implicit cast**
```
int i = 3 ;
double d = i ;
```
#NEW
===Arithmetic conversions===
(Not in Lab 2)
In binary expressions like
```
a + b
```
if the two operands ``a`` and ``b``
have different types, the idea is
to convert the "less precise" to the "more precise" type.
Rules in C (simplified):
- if one of the operands is ``double``, convert the other to ``double``
- otherwise, convert ``bool`` to ``int``
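
The two rules can be sketched as a small Python function that computes the
result type of ``a + b`` from the operand types. The function name
``arithmetic_type`` is hypothetical, and only the three types mentioned here
are modelled.
```
def arithmetic_type(left, right):
    """Result type of a binary arithmetic expression under the two rules."""
    operands = {left, right}
    if not operands <= {"bool", "int", "double"}:
        raise ValueError(f"not arithmetic types: {left}, {right}")
    if "double" in operands:      # rule 1: convert the other operand to double
        return "double"
    return "int"                  # rule 2: any bool operand is converted to int

print(arithmetic_type("int", "double"))   # prints: double
print(arithmetic_type("bool", "int"))     # prints: int
```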
#NEW
===Function overloading===
(Not in Lab 2)
Arithmetic operators can be seen as a special case of **overloaded functions**:
the same name is used for many functions, each of which has a different type.
Ada and C++ permit function overloading, which is resolved
by argument types.
```
void print (int i) {...} // _print__Fi
void print (double d) {...} // _print__Fd
void print (string s) {...} // _print__Fs
void print (string s, int indent) {...} // _print__Fsi
```
The compiler **resolves** overloading and replaces the name by a unique one.
```
print(2) // _print__Fi
double d = 1.0 ;
print(d + d) // _print__Fd
```
#NEW
===Overloading resolution rules===
In presence of implicit type casts, the rules have to find a best match
by using the following principles in order of decreasing priority:
- exact match
- match using promotions
- match using conversions
Thus ``print(2)`` does //not// convert to ``double`` and use
``_print__Fd``.
Example of a conflict: two equally good matches, depending on what
arguments are considered.
```
int pow (int, int) ;
double pow (double, double) ;
pow(2.0, 2) // ambiguous!
```
Thus the C++ resolution rules for functions differ from
those for operators after all.
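The priority order can be modelled by giving each argument a cost: 0 for an
exact match, 1 for a promotion, 2 for a conversion. The Python sketch below
is a simplified model of the principle, not the actual C++ algorithm; the
tables ``PROMOTIONS`` and ``CONVERSIONS`` are deliberately tiny, and the names
``cost``, ``better`` and ``resolve`` are hypothetical. A candidate wins only
if it is at least as good as every other candidate on every argument and
strictly better on at least one.
```
PROMOTIONS = {("char", "int"), ("bool", "int"), ("float", "double")}
CONVERSIONS = {("int", "double"), ("double", "int")}

def cost(actual, formal):
    """0 = exact match, 1 = promotion, 2 = conversion, None = no match."""
    if actual == formal:
        return 0
    if (actual, formal) in PROMOTIONS:
        return 1
    if (actual, formal) in CONVERSIONS:
        return 2
    return None

def better(a, b):
    """a is no worse than b on every argument and strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def resolve(candidates, arg_types):
    """candidates: list of parameter-type tuples; return the best match."""
    viable = []
    for params in candidates:
        if len(params) != len(arg_types):
            continue                             # wrong number of arguments
        costs = [cost(a, p) for a, p in zip(arg_types, params)]
        if None not in costs:
            viable.append((params, costs))
    if not viable:
        raise LookupError("no matching function")
    for i, (params, costs) in enumerate(viable):
        if all(better(costs, other)
               for j, (_, other) in enumerate(viable) if j != i):
            return params
    raise LookupError("ambiguous call")

PRINT = [("int",), ("double",), ("string",), ("string", "int")]
POW = [("int", "int"), ("double", "double")]

print(resolve(PRINT, ("int",)))                  # prints: ('int',)
try:
    resolve(POW, ("double", "int"))              # pow(2.0, 2)
except LookupError as err:
    print(err)                                   # prints: ambiguous call
```
For ``pow(2.0, 2)`` the cost lists are ``[2, 0]`` and ``[0, 2]``: neither is
at least as good as the other on both arguments, so the call is ambiguous.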
#NEW
==Lab 2 overview==
We will read through the [Lab PM ../laborations/lab2/lab2.html].
Next lecture: how to organize and write the code, using
syntax-directed translation.
Preparation and exercise: write typing rules for Lab PM
constructs.