Lecture 7: Type Systems
Programming Languages Course
Aarne Ranta (aarne@cs.chalmers.se)

%!target:html

%!postproc(html): #NEW <!-- NEW -->
%!postproc(html): #HR <HR>
%!postproc(html): #sub1 <sub>1</sub>
%!postproc(html): #subn <sub>n</sub>


Book: 6.3, 6.5


#NEW

==The purpose of types==

To define what the program should do.
- e.g. read an array of integers and return a double


To guarantee that the program is meaningful.
- that it does not add a string to an integer
- that variables are declared before they are used


To document the programmer's intentions.
- better than comments, which are not checked by the compiler


To optimize the use of hardware.
- reserve as much memory as needed, but not more
- use the most appropriate machine instructions


#NEW

==What belongs to type checking==

Depending on language, the type checker can prevent 
- application of a function to wrong number of arguments,
- application of integer functions to floats,
- use of undeclared variables in expressions,
- functions that do not return values,
- division by zero,
- array indices out of bounds,
- nonterminating recursion,
- sorting algorithms that don't sort...


Languages differ greatly in how strict their
static semantics is: none of the things above is
checked by all programming languages!

In general, the more static checking the compiler does,
the less manual debugging is needed.


#NEW

==Description formats for different compiler phases==

These formats are independent of the implementation language.

- Lexer: regular expressions

- Parser: BNF grammars

- **Type checker: typing rules**

- Interpreter: operational semantic rules

- Code generator: compilation schemes


#NEW

==Typing judgements and rules==

Typing rules concern **judgements** of the form
```
   E => e : T
```
where //E// is an **environment**, which contains e.g. typings of identifiers. 
The judgement says
- in the environment //E//, expression //e// has type //T//


Judgements are used in **typing rules** of the form
```
   J1  J2  ...  Jn
   --------------- C
         J
```
(n >= 0) which says
- from the judgements //J1, J2, ..., Jn// you may conclude //J//, 
  if condition //C// holds.


The judgements above the line in a rule are the **premisses**. 

The judgement under the line is the **conclusion**. 

The condition beside is a 
**side condition**, typically not expressible as a premiss.

The judgements are written in a formal language, whereas side conditions 
can be written in natural language.


#NEW

==Examples of typing rules and derivation==

Typing rules for arithmetic expressions
```
  E => e1 : int     E => e2 : int    E => e1 : int     E => e2 : int
  -------------------------------    -------------------------------
       E => e1 + e2 : int                E => e1 * e2 : int


       ---------- x : T is in E      ------------ i is an integer literal
       E => x : T                    E => i : int
```

Derivation of judgement ``x : int, y : int => x + 12 * y : int``
```
                            x:int, y:int => 12 : int    x:int, y:int => y : int
                            ---------------------------------------------------  
    x:int, y:int => x : int        x:int, y:int => 12 * y : int
    -----------------------------------------------------------
                x:int, y:int => x + 12 * y : int
```


#NEW
==Signature vs. context==

We generalize the type checking context to an **environment** with
two parts:
- **signature**, which shows the types of functions
- **context**, which shows the types of variables.


In the course of type checking, the signature remains the same 
throughout a program module, whereas
the context changes all the time.


#NEW

==Function types==

No expression in the language of Lab 2 has **function types**, 
because functions are never returned as values or used as arguments.

However, the compiler needs internally 
a data structure for function types, to hold the
types of the parameters and the return type. E.g. for a function
```
  bool between (int x, double a, double b) {...}
```
we write
```
  between : (int, double, double) -> bool
```
to express this internal representation in typing rules.
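
As a rough illustration, such an internal data structure could look as follows
in Python. The class name ``FunType`` and its field names are hypothetical,
chosen for this sketch only.
```
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class FunType:
    params: Tuple[str, ...]   # parameter types, in order
    result: str               # return type

    def __str__(self):
        return f"({', '.join(self.params)}) -> {self.result}"

# bool between (int x, double a, double b) {...}
between = FunType(("int", "double", "double"), "bool")
print(f"between : {between}")   # between : (int, double, double) -> bool
```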


#NEW

==Notation for signature and context==

Dividing the environment into signature ``F`` and context ``G``, we write judgements as
```
  F,G => J
```
Example: typing rule for a variable expression
```
  ------------- x : T is in G                
  F,G => x : T
```
Example: typing rule for one-place function application
```
  F,G => e : A
  --------------- f : (A) -> T is in F                
  F,G => f(e) : T
```
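
A minimal Python sketch of these two rules, assuming that the signature and
the context are both dictionaries. The expression encoding and the example
function ``isZero`` are hypothetical.
```
def infer(F, G, e):
    """F maps function names to (parameter types, result type);
    G maps variable names to types."""
    kind = e[0]
    if kind == "var":                      # x : T is in G
        _, x = e
        if x not in G:
            raise TypeError(f"undeclared variable {x}")
        return G[x]
    if kind == "app":                      # f : (A) -> T is in F
        _, f, arg = e
        if f not in F:
            raise TypeError(f"unknown function {f}")
        (a,), t = F[f]                     # one-place functions only
        found = infer(F, G, arg)           # premiss  F,G => e : A
        if found != a:
            raise TypeError(f"{f} expects {a}, found {found}")
        return t
    raise TypeError(f"unsupported expression {e!r}")

F = {"isZero": (("int",), "bool")}
G = {"n": "int"}
print(infer(F, G, ("app", "isZero", ("var", "n"))))   # prints bool
```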


#NEW

==The validity of statements==

Expressions have types, but statements do not.

However, statements are also checked by the type checker.

We need a new judgement form, saying that a statement ``S`` is valid:
```
  F,G => S valid
```
Example: typing rule for an assignment
```
  F,G => e : T
  ------------------ x : T is in G                
  F,G => x = e ; valid
```
Example: typing rule for while loops
```
  F,G => e : bool   F,G => S valid
  --------------------------------
  F,G => while (e) S valid
```
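
The two statement rules can be sketched in Python as a function that returns
``True`` exactly when the statement is valid. The signature is left out, the
tuple encoding of statements is an assumption, and the only boolean expressions
supported are comparisons with ``<``.
```
def infer(G, e):
    """Tiny expression checker: int literals, variables, and '<' on ints."""
    if isinstance(e, int):
        return "int"
    if isinstance(e, str):
        if e not in G:
            raise TypeError(f"undeclared variable {e}")
        return G[e]
    op, e1, e2 = e
    if op == "<" and infer(G, e1) == infer(G, e2) == "int":
        return "bool"
    raise TypeError(f"ill-typed expression {e!r}")

def valid(G, s):
    """Return True if  G => s valid  by the two rules above."""
    if s[0] == "assign":                   # x = e ;
        _, x, e = s
        return x in G and infer(G, e) == G[x]
    if s[0] == "while":                    # while (e) S
        _, e, body = s
        return infer(G, e) == "bool" and valid(G, body)
    return False

G = {"x": "int"}
print(valid(G, ("while", ("<", "x", 10), ("assign", "x", 5))))   # True
print(valid(G, ("while", "x", ("assign", "x", 5))))              # False
```
The second call fails because the condition ``x`` has type ``int``, not ``bool``.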


#NEW

==Variable binding and context==

Contexts can be **extended** with new variables. The notation we use is
```
  (G, x : T)
```
This corresponds to **variable binding** constructs, e.g. declarations.

Example: typing rule for a declaration; ``SS`` is a sequence of statements
```
  F,(G, x:t) => SS valid
  ------------------------- x not in G
  F,G => t x ; SS valid
```
The rule says: if ``SS`` is a valid sequence of statements in context
``(G, x : t)``, and ``x`` is not already in ``G``, then ``t x ; SS`` is valid in ``G``.


#NEW

==Example of variable binding and context==

We prove that ``int x ; x = x + 5 ;`` is valid in the empty context ``()``.
```
  x : int => x : int     x : int => 5 : int
  -----------------------------------------
        x : int => x + 5 : int
      --------------------------
      x : int => x = x + 5 ; valid
    -------------------------------
    () => int x ; x = x + 5 ; valid
```
The signature is omitted for simplicity.
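
The same proof can be replayed mechanically. The Python sketch below assumes
that statements are tuples and that the empty statement sequence is valid;
both are choices made for this illustration. As in the derivation, the
signature is omitted.
```
def infer(G, e):
    if isinstance(e, int):
        return "int"
    if isinstance(e, str):
        if e not in G:
            raise TypeError(f"undeclared variable {e}")
        return G[e]
    op, e1, e2 = e                         # only "+" on ints in this sketch
    if op == "+" and infer(G, e1) == "int" and infer(G, e2) == "int":
        return "int"
    raise TypeError(f"ill-typed expression {e!r}")

def check_stms(G, stms):
    """Raise TypeError unless the statement sequence is valid in G."""
    if not stms:
        return                             # assumed: the empty sequence is valid
    s, rest = stms[0], stms[1:]
    if s[0] == "decl":                     # t x ; SS
        _, t, x = s
        if x in G:                         # side condition: x not in G
            raise TypeError(f"variable {x} already declared")
        check_stms({**G, x: t}, rest)      # premiss in the extended context
    elif s[0] == "assign":                 # x = e ; SS
        _, x, e = s
        if x not in G or infer(G, e) != G[x]:
            raise TypeError(f"bad assignment to {x}")
        check_stms(G, rest)
    else:
        raise TypeError(f"unknown statement {s!r}")

# int x ; x = x + 5 ;   checked in the empty context
check_stms({}, [("decl", "int", "x"), ("assign", "x", ("+", "x", 5))])
print("valid")
```
Note that the extended context is passed only to the statements that follow the declaration.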


#NEW

==Function definitions and applications==

The validity of a function definition.
```  
  F,(G, x#sub1 : A#sub1,...,x#subn : A#subn) => SS valid
  --------------------------------------------- f not in F
  F,G => T f (A#sub1 x#sub1,...,A#subn x#subn) { SS } valid
```
More conditions:
- that ``x#sub1 ... x#subn`` are distinct
- that ``A#sub1 ... A#subn, T`` are types (can be guaranteed by syntax)
- that there is a ``return e`` with ``e : T`` (not always checked)


The typing rule for function applications is the following
```  
  F,G => f : (A#sub1,...,A#subn) -> T    F,G => e#sub1 : A#sub1,..., e#subn : A#subn
  ----------------------------------------------------------
                 F,G => f(e#sub1,...,e#subn) : T
```
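
A rough Python sketch of checking a function definition, including the extra
conditions listed above. It makes strong simplifying assumptions: the body is
a list of ``return`` statements only, the outer context is empty, and the
encoding of definitions as tuples is hypothetical.
```
def infer(G, e):
    if isinstance(e, int):
        return "int"
    if e not in G:
        raise TypeError(f"undeclared variable {e}")
    return G[e]

def check_def(F, d):
    """d = (T, f, params, body); params is a list of (type, name) pairs,
    body is a list of ("return", e) statements."""
    T, f, params, body = d
    if f in F:                                   # side condition: f not in F
        raise TypeError(f"function {f} defined twice")
    names = [x for _, x in params]
    if len(set(names)) != len(names):            # parameters must be distinct
        raise TypeError(f"duplicate parameter in {f}")
    G = {x: a for a, x in params}                # context x1 : A1, ..., xn : An
    returns = [s for s in body if s[0] == "return"]
    if not returns:
        raise TypeError(f"{f} has no return statement")
    for _, e in returns:                         # every return e needs e : T
        found = infer(G, e)
        if found != T:
            raise TypeError(f"{f} returns {found}, expected {T}")
    return {**F, f: (tuple(a for a, _ in params), T)}   # extended signature

# int second (bool b, int n) { return n ; }
F = check_def({}, ("int", "second", [("bool", "b"), ("int", "n")], [("return", "n")]))
print(F)
```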


#NEW

==Block structure==

In C, C++, Java, Haskell, etc., variables on the same level must be distinct
(e.g. in function parameter lists).

However, variables in an **inner block** are no longer on the same level
and can hence **overshadow** outer variables
```
  {
    int x = 1 ;
    bool b ;
    x = x + 2 ;         // x : int
    {
      double x = 2.0 ;
      x = x + 1.0 ;     // x : double
      b = true ;        // b : bool
    }
    x = x + 5 ;         // x : int
    b = b && b ;        // b : bool
  }
```
Variables declared in a block are discarded at exit from the block.

There is no limit on the number of block levels.


#NEW

==Contexts for block structure==

Implementation 1: with markers
```
  x : int, b : bool, MARK, x : double
```
- entering a block: add a marker
- leaving a block: delete variables after last marker, and the marker itself
- update: after the last variable
- lookup: the latest occurrence of the variable


Implementation 2: with stacks of contexts
```
  (x : int, b : bool).(x : double)
```
- entering a block: push an empty context on the stack
- leaving a block: pop the topmost context
- update: after the last variable in the topmost context
- lookup: the occurrence in the innermost surrounding block, searching from the top of the stack


Implementation 2 can be done with lists of lists: the top of the stack is the head
of the list.
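
Both implementations can be sketched in Python with the same four operations.
The class and method names below are hypothetical. The sketch assumes that the
outermost block has no marker, as in the example above, so leaving it is not
supported in the marker version.
```
MARK = object()                                # block marker for implementation 1

class MarkerContext:
    def __init__(self):
        self.entries = []                      # (name, type) pairs and MARK, oldest first
    def enter_block(self):
        self.entries.append(MARK)
    def leave_block(self):
        while self.entries.pop() is not MARK:  # drop variables, then the marker
            pass
    def update(self, x, t):
        self.entries.append((x, t))
    def lookup(self, x):
        for entry in reversed(self.entries):   # latest occurrence wins
            if entry is not MARK and entry[0] == x:
                return entry[1]
        raise KeyError(x)

class StackContext:
    def __init__(self):
        self.blocks = [{}]                     # head of the list = top of the stack
    def enter_block(self):
        self.blocks.insert(0, {})
    def leave_block(self):
        self.blocks.pop(0)
    def update(self, x, t):
        self.blocks[0][x] = t
    def lookup(self, x):
        for block in self.blocks:              # innermost block first
            if x in block:
                return block[x]
        raise KeyError(x)

def demo(ctx):
    ctx.update("x", "int")
    ctx.update("b", "bool")
    ctx.enter_block()
    ctx.update("x", "double")
    inside = (ctx.lookup("x"), ctx.lookup("b"))
    ctx.leave_block()
    return inside, ctx.lookup("x")

print(demo(MarkerContext()) == demo(StackContext()))   # True
```
Inside the inner block ``x`` is a ``double``, and after leaving it ``x`` is an ``int`` again, in both versions.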


#NEW
==Typing rules for variables==

The rules are expressed as checking if a list of statements is valid.
Instead of a single context, we have a **stack of contexts** ``Gs.G`` where 
we denote by ``G`` the topmost context.
```
  -------------- x : T is the closest entry for x in Gs.G
  Gs.G => x : T

  Gs.(G,x:t) => SS valid
  -------------------------- x not in G
  Gs.G => t x ; SS valid

  Gs => e : t     Gs => SS valid
  ------------------------------- x : t in Gs  
  Gs => x = e ; SS valid

  Gs.() => SS valid      Gs => SSS valid
  -------------------------------------
  Gs => { SS } SSS valid
```
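
The four rules can be sketched in Python as follows. The stack is a list of
dictionaries whose last element is the topmost context, mirroring the
``Gs.G`` notation. The statement encoding is an assumption of this sketch, and
expressions are limited to literals and variables.
```
def lookup(Gs, x):
    for G in reversed(Gs):                 # closest entry first
        if x in G:
            return G[x]
    raise TypeError(f"undeclared variable {x}")

def infer(Gs, e):
    if isinstance(e, float):
        return "double"
    if isinstance(e, int):
        return "int"
    return lookup(Gs, e)                   # e is a variable name

def check_stms(Gs, stms):
    if not stms:
        return
    s, rest = stms[0], stms[1:]
    if s[0] == "decl":                     # t x ; SS
        _, t, x = s
        if x in Gs[-1]:                    # x must be new in the topmost context only
            raise TypeError(f"variable {x} already declared in this block")
        check_stms(Gs[:-1] + [{**Gs[-1], x: t}], rest)
    elif s[0] == "assign":                 # x = e ; SS
        _, x, e = s
        if infer(Gs, e) != lookup(Gs, x):
            raise TypeError(f"bad assignment to {x}")
        check_stms(Gs, rest)
    elif s[0] == "block":                  # { SS } SSS
        check_stms(Gs + [{}], s[1])        # SS is checked in Gs.()
        check_stms(Gs, rest)               # SSS is checked in the unchanged Gs
    else:
        raise TypeError(f"unknown statement {s!r}")

program = [
    ("decl", "int", "x"),
    ("assign", "x", 1),
    ("block", [("decl", "double", "x"), ("assign", "x", 2.0)]),
    ("assign", "x", 5),
]
check_stms([{}], program)
print("valid")
```
Because the block is checked with a copy of the stack, its variables are discarded automatically when the check of the block returns.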


#NEW

==Type checking and type inference==

**Type checking**: given a judgement ``G => e : T``, find out whether it
can be derived by the typing rules. The **derivation** is a tree of rule 
applications with the judgement as the last line.

**Type inference**: given an expression ``e`` and a context ``G``,
find a type ``T`` such that ``G => e : T``
can be derived by the typing rules.

For Java, C, and C++, we can mostly make do with just type checking,
because types are marked explicitly.

Haskell has type inference as well: if you don't give the type,
the compiler can usually find //the most general type//.


#NEW

==Different type checkers==

We can classify checkers in terms of what they return:
- A //rude checker//, which only says ``True`` or ``False``, 
  and may even crash (for instance, when variable lookup
  just gives an ``error`` if the variable is not found).
- An //error-reporting checker//, which returns ``OK`` or a message
  saying where the error is.
- An //annotating checker//, which returns a syntax tree annotated 
  with more type information.


To build a compiler back end, and in Lab 3, we need the third.

In Lab 2, we build the second.
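
To make the difference concrete, here is a small Python sketch with all three
kinds of checker for a toy expression language. The function names and the
tuple encoding are hypothetical, and the error message says what went wrong
but, unlike a real checker, carries no source position.
```
def annotate(G, e):
    """Annotating checker: return the tree with a type attached to every node."""
    if isinstance(e, int):
        return ("int", e)
    if isinstance(e, str):
        if e not in G:
            raise TypeError(f"undeclared variable {e}")
        return (G[e], e)
    op, e1, e2 = e
    a1, a2 = annotate(G, e1), annotate(G, e2)
    if a1[0] != a2[0]:
        raise TypeError(f"operands of {op} have types {a1[0]} and {a2[0]}")
    return (a1[0], (op, a1, a2))

def report(G, e):
    """Error-reporting checker: 'OK' or a message."""
    try:
        annotate(G, e)
        return "OK"
    except TypeError as err:
        return f"type error: {err}"

def rude(G, e):
    """Rude checker: only True or False."""
    return report(G, e) == "OK"

G = {"x": "double"}
print(annotate(G, ("+", "x", "x")))
print(report(G, ("+", "x", 1)))
print(rude(G, ("+", "x", 1)))
```
The annotated tree is what a code generator would need, for instance to choose between integer and floating-point addition.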


#NEW

==The passes of the type checker==

Pass 1:
- start with empty signature
- for each function ``f``,  update signature with ``f : T``


Pass 2:
- for each ``f``,  check the function body of ``f``
  with respect to the type ``T``


The expression checker consists of functions:
```
  check (Exp e, Type t)  returns void
  infer (Exp e)          returns Type
```
These functions are defined by mutual recursion, by cases
on the expression. 

We also need to check function definitions and sequences of statements.
```
  check (Def  d)   returns void
  check (Stms ss)  returns void
```
All functions use an environment (= signature and stack of contexts).

The method is syntax-directed translation.
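
The two passes can be sketched in Python as follows. One reason for a separate
first pass is that a function can then be called before its definition. The
sketch assumes, for brevity, that a function body is a single returned
expression; the encodings and function names are hypothetical.
```
def build_signature(defs):
    """Pass 1: collect f : (A1,...,An) -> T for every definition."""
    F = {}
    for T, f, params, _body in defs:
        if f in F:
            raise TypeError(f"function {f} defined twice")
        F[f] = (tuple(a for a, _ in params), T)
    return F

def infer(F, G, e):
    if isinstance(e, int):
        return "int"
    if isinstance(e, str):
        if e not in G:
            raise TypeError(f"undeclared variable {e}")
        return G[e]
    f, args = e                              # function call (f, [arguments])
    if f not in F:
        raise TypeError(f"unknown function {f}")
    param_types, T = F[f]
    if tuple(infer(F, G, a) for a in args) != param_types:
        raise TypeError(f"bad arguments to {f}")
    return T

def check_program(defs):
    F = build_signature(defs)                # pass 1
    for T, f, params, body in defs:          # pass 2
        G = {x: a for a, x in params}
        if infer(F, G, body) != T:
            raise TypeError(f"{f} should return {T}")
    return F

# main calls helper, which is defined later; pass 1 makes this possible.
defs = [
    ("int", "main",   [],             ("helper", [7])),
    ("int", "helper", [("int", "n")], "n"),
]
print(check_program(defs))
```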


#NEW

==Examples of type checking==

We show syntax-directed translation in pseudocode.
```
  infer x =              // variable x
    t := lookup(x)
    return t

  infer i =              // integer literal i
    return int

  infer f(a#sub1,..., a#subn) =
    T := lookup(f)
    if T = (A#sub1, ..., A#subn) -> B
      check a#sub1 : A#sub1
      ...
      check a#subn : A#subn
      return B
    else failure
```
We return to the translation rules and their implementation in the
next lecture.
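
For comparison, the pseudocode can be turned into runnable Python along the
following lines. This is a sketch under assumptions: the environment is a
dictionary with the hypothetical keys ``signature`` and ``context``, and an
explicit check of the number of arguments is added.
```
def infer(env, e):
    kind = e[0]
    if kind == "var":                                  # infer x
        if e[1] not in env["context"]:
            raise TypeError(f"undeclared variable {e[1]}")
        return env["context"][e[1]]
    if kind == "int":                                  # infer i
        return "int"
    if kind == "call":                                 # infer f(a1,...,an)
        _, f, args = e
        if f not in env["signature"]:
            raise TypeError(f"unknown function {f}")
        params, result = env["signature"][f]
        if len(args) != len(params):
            raise TypeError(f"{f} expects {len(params)} arguments, got {len(args)}")
        for a, A in zip(args, params):
            check(env, a, A)                           # check ai : Ai
        return result
    raise TypeError(f"unsupported expression {e!r}")

def check(env, e, t):
    """Checking is inference followed by a comparison of types."""
    found = infer(env, e)
    if found != t:
        raise TypeError(f"expected {t}, found {found}")

env = {"signature": {"max": (("int", "int"), "int")}, "context": {"x": "int"}}
print(infer(env, ("call", "max", [("var", "x"), ("int", 3)])))   # prints int
```
Here ``infer`` calls ``check`` for the arguments and ``check`` calls ``infer``, which is the mutual recursion mentioned in the previous section.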

  
#NEW

==Supplementary topics in type checking==

Not in Lab 2; go through if there is time.
- type casts
- arithmetic conversions
- function overloading


Advanced type systems; forthcoming in lecture 12 
- higher-order function types
- polymorphism
- algebraic data types
- dependent types


#NEW

===Type casts===

An expression of one type can be used where another type is expected.

**Promotions**: no information is lost. E.g. ``char`` to ``int``.

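**Conversions**: information can be lost. E.g. ``double`` to ``int``
(the fractional part is dropped) and ``int`` to ``double`` (exact for
32-bit integers on typical platforms, but wider integers may be rounded).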

**Explicit cast**
```
  int i = 3 ;

  double d = (double)i ;  // C and Java syntax

  double d = double(i) ;  // C++ syntax
```
**Implicit cast**
```
  int i = 3 ;
 
  double d = i ;
```


#NEW

===Arithmetic conversions===

(Not in Lab 2)

In binary expressions like
```
  a + b
```
if the two operands ``a`` and ``b`` 
have different types, the idea is
to convert the "less precise" to the "more precise" type.

Rules in C:
- if one of the operands is ``double``, convert the other to ``double``
- otherwise, convert ``bool`` to ``int``
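

A type checker that supports these two simplified rules would compute the
result type and insert the casts into the syntax tree. The Python sketch below
shows one way to do this; the function names and the ``cast`` node are
hypothetical.
```
def arith_type(t1, t2):
    """Result type of a binary arithmetic expression, by the two rules above."""
    for t in (t1, t2):
        if t not in ("bool", "int", "double"):
            raise TypeError(f"{t} is not an arithmetic type")
    if "double" in (t1, t2):
        return "double"
    return "int"                           # bool operands are converted to int

def convert(e, t_from, t_to):
    """Wrap e in an explicit cast node if its type has to change."""
    return e if t_from == t_to else ("cast", t_to, e)

def elaborate_add(a, ta, b, tb):
    t = arith_type(ta, tb)
    return t, ("+", convert(a, ta, t), convert(b, tb, t))

print(elaborate_add("i", "int", "d", "double"))
# ('double', ('+', ('cast', 'double', 'i'), 'd'))
```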


#NEW

===Function overloading===

(Not in Lab 2)

Arithmetic operators can be seen as a special case of **overloaded functions**:
the same name is used for many functions, each of which has a different type.

Ada and C++ permit function overloading, which is resolved
by argument types.
```
  void print (int i)    {...}                // _print__Fi
  void print (double d) {...}                // _print__Fd
  void print (string s) {...}                // _print__Fs
  void print (string s, int indent) {...}    // _print__Fsi
```
The compiler **resolves** overloading and replaces the name by a unique one.
```
  print(2)          // _print__Fi

  double d = 1.0 ;
  print(d + d)      // _print__Fd
```


#NEW

===Overloading resolution rules===

In the presence of implicit type casts, the rules have to find a best match
by using the following principles in order of decreasing priority:
- exact match
- match using promotions
- match using conversions


Thus ``print(2)`` does //not// convert to ``double`` and use
``_print__Fd``.

Example of a conflict: two equally good matches, depending on what
arguments are considered.
```
  int    pow (int,    int) ;
  double pow (double, double) ;

  pow(2.0, 2)  // ambiguous!
```
Thus the C++ resolution rules for functions differ
from those for operators after all.
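
The priority order and the conflict can be reproduced with a small Python
sketch. It is a strong simplification of the actual C++ rules: the cast tables
are tiny illustrative subsets, and a candidate wins only if it is at least as
good as every other candidate on each argument and strictly better on one.
All names are hypothetical.
```
# Small illustrative tables; real C++ has many more cases.
PROMOTIONS = {("char", "int")}
CONVERSIONS = {("int", "double"), ("double", "int")}

def cost(arg, param):
    """0 = exact match, 1 = promotion, 2 = conversion, None = no implicit cast."""
    if arg == param:
        return 0
    if (arg, param) in PROMOTIONS:
        return 1
    if (arg, param) in CONVERSIONS:
        return 2
    return None

def better(c1, c2):
    """c1 is at least as good for every argument and strictly better for one."""
    return all(x <= y for x, y in zip(c1, c2)) and any(x < y for x, y in zip(c1, c2))

def resolve(candidates, args):
    viable = []
    for params in candidates:
        if len(params) != len(args):
            continue
        costs = [cost(a, p) for a, p in zip(args, params)]
        if None not in costs:
            viable.append((params, costs))
    if not viable:
        raise TypeError("no matching function")
    winners = [p for p, c in viable
               if all(better(c, d) for q, d in viable if q != p)]
    if len(winners) != 1:
        raise TypeError("ambiguous call")
    return winners[0]

prints = [("int",), ("double",), ("string",), ("string", "int")]
print(resolve(prints, ("int",)))          # the print(2) example: ('int',)
pows = [("int", "int"), ("double", "double")]
try:
    resolve(pows, ("double", "int"))      # the pow(2.0, 2) example
except TypeError as err:
    print(err)                            # ambiguous call
```
For ``pow(2.0, 2)`` each candidate is exact on one argument and needs a conversion on the other, so neither is better than the other.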


#NEW

==Lab 2 overview==

We will read through the [Lab PM ../laborations/lab2/lab2.html].

Next lecture: how to organize and write the code, using
syntax-directed translation.

Preparation and exercise: write typing rules for Lab PM
constructs.