Type checker for C/C++
Programming Languages Course, 2011, Laboration 2
Aarne Ranta (aarne (at) chalmers.se)

%!target:html

%!postproc(html): #NEW <!-- NEW -->


#NEW

=Summary=

The objective of this lab is to write a type checker for a fragment
of the C++ programming language. The type checker should return 
an "OK" at success, and report a type error at failure.

Before the lab can be submitted, the type checker has to pass some
tests, which are given on the course web page via links
later in this document. 

The recommended implementation is via a BNF
grammar processed by the BNF Converter (BNFC) tool. The syntax
tree created by the parser should then be processed further by 
a program using the skeleton generated by BNFC.

The fragment of C++ covered is smaller than in 
[Laboration 1 ../lab1/lab1.html], and does not really include any
C++ specific features not available in C. 


#NEW

The approximate size of the grammar is 50 rules, and the type checker
code should be 100-300 lines, depending on the programming language
used.

All BNFC supported languages can be used, but guidance is guaranteed
only for Haskell, Java 1.5, and C++.

The type system is partially characterized by formal rules in
[Lecture 7 ../../lectures/proglang-07.html].


#NEW

=Method=

First build the grammar by scaling down the grammar from Lab 1,
so that it is just big enough to meet the specification below and
parse the [test programs ./testsuite]. 
Having superfluous rules in the grammar
will produce more work in writing the type checking rules.

In the type checker itself, the recommended procedure is two passes:
+ build a symbol table with all function types
+ type check and annotate the code by using this symbol table


#NEW

=Types and annotations=

==Types==

Only the four built-in types
```
  int
  double
  bool
  void
```
are taken into account. Every expression has one of these types.


#NEW

Types of functions in the symbol table
can be represented in any way that stores their
argument and return types. For instance, the function header
```
  int f (double x, bool b)
```
can create a symbol table entry
```
  f |-> ([double, bool], int)
```
mapping the name ``f`` to a pair whose first component
is the list of argument types and the second component is the
return type.


#NEW

=Language specification=

This chapter is a reduced version of the language specification in 
Lab 1, only indicating those language structures that must be covered
in the type checker.

Like Lab 1, this specification differs in some places from the official C++
specification.


==Programs==

A program is a sequence of definitions.

A program may also contain comments and preprocessor directives,
which are just ignored by the parser (see below).

#NEW

==Definitions==

A function definition has a type, a name, an argument list, and a body.
Example:
```
  int foo(double x, int y)
  {
    return y + 9 ;
  }
```

(No variable and structure definitions.)

(No ``using`` definitions. No ``inline`` functions. No ``typedef``s.)


#NEW

===Typing rules===

The same function name may be used in at most one function definition.

All ``return`` statements in a function body must
return an expression whose type is the
return type of the function. 

You don't need to check that there actually is a ``return`` statement (you can do this
optionally).


#NEW

==Argument lists, declarations, and function bodies==

An argument list is a comma-separated list of argument declarations.
It is enclosed in parentheses ``(`` and ``)``.

An argument declaration has a type and an identifier, for instance
```
  int x
```
(There are no ``const`` specifiers, no initializations, 
no types without identifiers). 

Notice that argument declarations with multiple variables
(``int x, y``) are not included. A declaration that occurs
as a statement (as shown below), can also have multiple variables.
But it must have at least one variable.

A function body is a list of statements enclosed
in curly brackets ``{`` and ``}`` .

(No function definitions without bodies; but a body can be just ``{}``.)


#NEW

===Typing rules===

An argument list may only use each variable once.


#NEW

==Statements==

Any expression followed by a semicolon ``;`` can be used as a statement.

Any declaration followed by a semicolon ``;`` can be used as a statement.
Declarations have one of the following formats:
- type and one variable (as in function parameter lists):
```
  int i ;

```
- type and many variables
```
  int i, j ;

```
- type and one initialized variable
```
  int i = 6 ;
```
**Typing**. The initializing expression must have the declared type.


Statements returning an expression, for example
```
  return i + 9 ;
```
**Typing**. The type of the returned expression must be the same as
the return type of the function in which it occurs.

While loops, with an expression in parentheses followed by a statement, for
example:
```
  while (i < 10) ++i ;
```
**Typing**. The expression must have type **bool**.

(No do-while loops.)

(No for loops.)


#NEW

Conditionals: ``if`` with an expression in parentheses followed by
a statement, ``else``, and another statement. Examples:
```
  if (x > 0) return x ; else return y ;
```
**Typing**. The expression must have type **bool**.

(No else-less if statements.)

Blocks: any list of statements (including empty list) between 
curly brackets. For instance,
```
  { 
    int i = 2 ;
    {
    }
    i++ ;
  }
```

(No type definitions or structure definitions.)


#NEW

===Typing rules===

A variable may only be declared once on the same block level.

The parameters of a function have the same level as the
top-level block in the body.

#NEW

==Expressions==

The following table gives the expressions and their precedence levels.
Infix operators are assumed to be left-associative. The arguments in
a function call can be expressions of any level.
Otherwise, the subexpressions are assumed to be one precedence level above of 
the main expression.

**Note**. The table is not guaranteed to be exactly the same as in 
the C++ standard.

  || level   | expression forms          | explanation     | type   ||
  |  16      | literal                   | literal         | literal type
  |  16      | identifier                | variable        | declared type
  |  15      | ``f(e,...,e)``            | function call   | return type
  |  14      | ``v++``, ``v--``          | in/decrement    | (sugar)
  |  13      | ``++v``, ``--v``          | in/decrement    | (sugar)
  |  12      | ``e*e``, ``e/e``          | mult, div   | operand type (int or double)
  |  11      | ``e+e``, ``e-e``          | add, sub | operand type (int or double)
  |   9      | ``e<e``, ``e>e``, ``e>=e``, ``e<=e`` | comparison | bool
  |   8      | ``e==e``, ``e!=e``           | (in)equality | bool
  |   4      | ``e&&e``                 | conjunction | bool
  |   3      | ``e||e``                 | disjunction | bool
  |   2      | ``v=e``                  | assignment  | type of LHS


#NEW

===Typing rules===

Integer, double, and boolean literals have their usual types,

Variables have the type declared in the nearest enclosing block.
A variable must be declared before it is used in an expression.

The arguments of a function call must have types corresponding to
the argument types of the called function. The number of arguments
must be the same as in the function declaration (thus the C++
default argument rule is not applied). Notice that only
identifiers are used as functions.

This can be covered in the parser: increments and decrements only apply to variables.

Comparison and (in)equality apply to two operands of the same type,
which is ``int``, ``double``, or ``bool``.

Conjunction and disjunction apply to operands of type ``bool`` only.

Assignments have the same type as the left-hand-side variable.
Notice that only variables are used as left-hand-sides.

(There are no qualified constants or template instantiations.)

#NEW


==Types==

Types are just the following:
```
  bool
  double
  int
  void
```
(Hence no type constants, no reference, array, or pointer types.)


#NEW

==Literals==

We include integer literals and floating point literals.

There are also two boolean literals, ``true`` and ``false``.

(Notice that the names ``true`` and ``false`` were not specified
as literals in Lab 1, so you probably treated them as identifiers.)

(No string literals included.)


==Identifiers==

An identifier is a letter followed by a list of letters, digits, and underscores.


==Comments==

There are three kinds of comments.
- anything between tokens ``/*`` and ``*/``
- anything from token ``//`` to the end of a line or the file
- anything from token ``#`` to the end of a line or the file
  (preprocessor directive)


Comments cannot be nested.


#NEW

=Lab format=

==Input and output==

The type checker must be a program called ``lab2``, which is 
executed by the command
```
  lab2 <SourceFile>
```
and prints its output to the standard output. The output at success
must be just the line ``OK``.

The output at failure must be the line ``TYPE ERROR`` followed by
a message indicating the first error. For syntax errors,
the line must say ``SYNTAX ERROR``.

The easiest way to produce this format is to use the ready-made files in either of
- [Haskell package haskell/]
- [Java 1.5 package java1.5/]


See also [lecture 8 ../../lectures/proglang-08.html] for the use of this format.


#NEW

==Example of success==

Source file
```
  // file good.cc

  int factr (int n) 
  {
    if (n<2)
      return 1 ;
    else 
      return (n * factr(n-1)) ;
  }

  int main () 
  {
    int i = factr(7) ;
    return 0 ;
  }
```
Running the type checker
```
  % lab2 good.cc
  OK
```

#NEW

==Example of failure==

Source file
```
  // file bad.cc

  int f (double c)  
  {
    int n = 1 ;
    while(c) ++n ;
  }                   
```
Running the type checker
```
  % lab2 bad.cc
  TYPE ERROR

  condition c in while: expected bool, found double
```

#NEW

==Compiling the type checker==

The type checker is submitted as a tar package, which contains a ``Makefile``
such that typing
```
  make
```
produces the executable program ``lab2`` in a normal Unix environment.
The standard compilers and 
parser and lexer tools can be assumed to be a part of
this environment and should not be included in the tar package.

The easiest way to produce this behaviour is to use the ready-made files in either the
[Haskell package haskell/] or the [Java 1.5 package java1.5/].


#NEW

==A hint on file structure==

The simplest way of producing the type checker is to use the
BNFC-generated test file as a template. Just modify its main
function so that, instead of the parse tree, it prints the
type checker output specified above.

We recommend the use of either the
[Haskell package haskell/] or the [Java 1.5 package java1.5/].
In these packages, you only have to modify two file:
- ``CPP.cf``: the grammar
- ``TypeChecker.hs`` or ``TypeChecker.java``, depending on your implementation language


See [Lecture 7 ../../lectures/proglang-07.html] for how the type checker
module looks like.


#NEW

=Test programs=

Run the [test suite ./testsuite] before submitting the lab.


=Success criteria=

Your grammar must pass the [test suite testsuite/]
The test suite contains some correct programs, as well
as a number of programs which the type checker must reject
with adequate error messages.

The solution must be written in an easily readable and
maintainable way. This will moreover make its reuse in Lab 3
painless, and you will save a lot of work!


=Submission=

Submit your lab using [Fire https://fire.cs.chalmers.se:8009/cgi/Fire-proglang].
when Lab 2 is in place.

You should submit the minimal set of files needed to run 
the test program against your solution.

For Haskell solutions, this includes:

- ``CPP.cf`` (or whatever you called your grammar)
- ``Makefile`` (see above)
- ``lab2.hs`` (see above)
- ``TypeChecker.hs`` and any other Haskell modules that
  you have written.


For Java solutions, this includes:

- ``CPP.cf`` (or whatever you called your grammar)
- ``Makefile`` (see above)
- ``lab2`` (see above)
- ``lab2.java`` (see above)
- ``TypeException.java`` (see above)
- ``TypeChecker.java``
   and any other Java files that you have written.


If you have any problems getting the test program to run, 
or if you think that there is an error in the test suite,
contact the teachers via the course Google group.