Type checker for C/C++ (DRAFT) Programming Languages Course, 2007, Laboration 2 Aarne Ranta (aarne (at) cs.chalmers.se) %!target:html %!postproc(html): #NEW #NEW =Summary= The objective of this lab is to write a type checker for a fragment of the C++ programming language. The type checker should return a type-annotated program at success, and report a type error at failure. Before the lab can be submitted, the type checker has to pass some tests, which are given on the course web page via links later in this document. The recommended implementation is via a BNF grammar processed by the BNF Converter (BNFC) tool. The syntax tree created by the parser should then be processed further by a program using the skeleton generated by BNFC, and the annotated syntax tree should be printed out using the pretty printer also generated by the BNFC. The fragment of C++ covered is smaller than in [Laboration 1 ../lab1/lab1.html], and does not really include any C++ specific features not available in C. In two respects, the language is not a subset of C++: + functions can be mutually recursive + type names can be used before they are defined #NEW The approximate size of the grammar is 50 rules, and the type checker code should be 100-300 lines, depending on the programming language used. All BNFC supported languages can be used, but guidance is guaranteed only for Haskell, Java 1.5, and C++. The type system is partially characterized by formal rules in [Lecture 7 ../../lectures/proglang-07.html]. #NEW =Method= First build the grammar by scaling down the grammar from Lab 1, so that it is just big enough to meet the specification below and parse the test programs. Having superfluous rules in the grammar will produce more work in writing the type checking rules. You may want to have an intervening **desugaring** phase to reduce the number of abstract syntax rules. Alternatively, desugaring can be incorporated in the type checker. But the following desugaring is in any case expected to be done in the lab: - translate increments and decrements to assignments; for simplicity, treat postincrements (``i++``) and preincrements (``++i``) in the same way: ``i = i + 1`` - translate multiple-variable declarations to sequences of one-variable declarations: ``int x,y`` becomes ``int x ; int y``. In the type checker itself, the recommended procedure is two passes: + build a symbol table with all function types and type definitions + type check and annotate the code by using this symbol table #NEW =Types and annotations= ==Types== Only the four built-in types ``` int double bool void ``` are taken into account. Every expression has one of these types. Types defined by ``typedef`` must be computed to one of the four basic types before performing type checking. #NEW Types of functions in the symbol table can be represented in any way that stores their argument and return types. For instance, the function header ``` int f (double x, bool b) ``` can create a symbol table entry ``` f |-> ([double, bool], int) ``` mapping the name ``f`` to a pair whose first component is the list of argument types and the second component is the return type. #NEW ==Normalizing types== All types occurring in the code must be **normalized**, i.e. replaced by one of the four basic types in the code printed by the type checker. This concerns types occurring in - declarations - function headers - type definitions Thus we translate ``` typedef Number Limit ; typedef int Limit ; Number f (Limit c) int f (int c) { { Number n = 4 ; ---> int n = 4 ; } } typedef int Number ; typedef int Number ; ``` In this example, also notice that type definitions may refer to later type definitions. This follows the same logic as the mutual recursivity of functions. #NEW ==Annotations== Annotation means that you wrap certain expressions in a type-indicating identity function, whose C++ signature is ``` template T typed (T x) { return x ; } ``` You don't need to be able to deal with this function declaration in your type checker! But you do need to deal with expressions that call it, e.g. ``` typed(1 + 2) ``` #NEW The purpose of the type checker is to annotate with ``typed`` all those expressions of type ``T`` whose type will be needed at a later code generation phase. Since code generation has not been covered in the course yet, let us list here the cases that have to be annotated || Description | Example | Outcome || | variable expression | ``x`` | ``typed(x)`` | arithmetic operation ``+ - * / %`` | ``x + 2`` | ``typed(x + 2)`` | comparison ``== != < > >= <=`` | ``x < 2`` | ``typed(x) < 2`` | assignment | ``x = y`` | ``x = typed(y)`` | increment/decrement | ``d++`` | ``typed(d++)`` #NEW Why just these? In brief, machine languages like JVM have type-dependent instructions for variables, arithmetic, comparisons, and assignment. For instance, ``` x = x + y * z ``` generates either of ``` iload 1 dload 1 iload 2 dload 2 iload 3 dload 3 imul dmul iadd dadd istore 1 dstore 1 ``` depending on whether the type is ``int`` or ``double``, respectively. #NEW In order to guarantee the locality of code generation, annotations must be done recursively in all subterms. Thus the annotated form of the example is, provided are variables are ``double``, ``` x = typed( typed(x) + typed( typed(y) * typed(z))) ``` Notice that the outermost annotation is not repeated for the addition and for the declaration; in fact, multiple annotations must always be removed by using the rule ``` typed(typed(e)) ---> typed(e) ``` #NEW =Language specification= This chapter is a reduced version of the language specification in Lab 1, only indicating those language structures that must be covered in the type checker. Like Lab 1, this specification differs in some places from the official C++ specification. ==Programs== A program is a sequence of definitions. A program may also contain comments and preprocessor directives, which are just ignored by the parser (see below). #NEW ==Definitions== 1. A function definition has a type, a name, an argument list, and a body. Example: ``` int foo(double x, int y) { return y + 9 ; } ``` 2. Also ``typedef`` statements can be used as top-level definitions. (Thus no variable and structure definitions.) (No ``using`` definitions. No ``inline`` functions.) #NEW ===Typing rules=== The same name may be used in at most one function definition and in at most one type definition. All ``return`` statements in a function body must be return an expression whose type is the return type of the function. But you don't need to check that there actually is a ``return`` statement. #NEW ==Argument lists, declarations, and function bodies== An argument list is a comma-separated list of argument declarations. It is enclosed in parentheses ``(`` and ``)``. An argument declaration has always a type. This type is optionally followed by an identifier or an identifier and an initialization. (Thus no ``const`` specifiers). The following are examples of argument declarations. ``` int int x int x = 5 ``` Notice that argument declarations with multiple variables (``int x, y``) are not included. A declaration that occurs as a statement (as shown below), can also have multiple variables. But it must have at least one variable. A function body is a list of statements enclosed in curly brackets ``{`` and ``}`` . (Thus no empty bodies.) #NEW ===Typing rules=== An argument list may only use each variable once. The expression ``e`` in an initialization ``typ v = e`` must have the type ``typ``. #NEW ==Statements== Any expression followed by a semicolon ``;`` can be used as a statement. Any declaration followed by a semicolon ``;`` can be used as a statement. Declarations have the same form as argument declarations in functions, except that they have one or more variables. Statements returning an expression, for example ``` return i + 9 ; ``` **Typing**. The type of the returned expression must be the same as the return type of the function in which it occurs. While loops, with an expression in parentheses followed by a statement, for example: ``` while (i < 10) ++i ; ``` **Typing**. The expression must have type **bool**. (No do-while loops.) (No for loops.) #NEW Conditionals: ``if`` with an expression in parentheses followed by a statement and optionally by ``else`` and a statement. Examples: ``` if (x > 0) return x ; if (x > 0) return x ; else return y ; ``` **Typing**. The expression must have type **bool**. Blocks: any list of statement (including empty list) between curly brackets. For instance, ``` { int i = 2 ; { } i++ ; } ``` (Type definitions inside function bodies are not included.) (Structure definitions are not included.) #NEW ===Typing rules=== A variable may only be declared once on the same block level. The parameters of a function have the same level as the top-level block in the body. #NEW ==Expressions== The following table gives the expressions and their precedence levels. Infix operators are assumed to be left-associative. The arguments in a function call can be expressions of any level. Otherwise, the subexpressions are assumed to be one precedence level above of the main expression. **Note**. The table is not guaranteed to be exactly the same as in the C++ standard. || level | expression forms | explanation | type || | 16 | literal | literal | literal type | 16 | identifier | variable | declared type | 15 | ``f(e,...,e)`` | function call | return type | 14 | ``e++``, ``e--`` | in/decrement | (sugar) | 13 | ``++e``, ``--e`` | in/decrement | (sugar) | 12 | ``e*e``, ``e/e`` | mult, div | operand type (int or double) | 12 | ``e%e`` | rem | int | 11 | ``e+e``, ``e-e`` | add, sub | operand type (int or double) | 9 | ``ee``, ``e>=e``, ``e<=e`` | comparison | bool | 8 | ``e==e``, ``e!=e`` | (in)equality | bool | 4 | ``e&&e`` | conjunction | bool | 3 | ``e||e`` | disjunction | bool | 2 | ``v=e`` | assignment | type of LHS #NEW ===Typing rules=== Integer, double, and boolean literals have their usual types, Variables have the type declared in the nearest enclosing block. A variable must be declared before it is used in an expression. The arguments of a function call must have types corresponding to the argument types of the called function. The number of arguments must be the same as in the function declaration (thus the C++ default argument rule is not applied). The operands of remainder (``e % e``) must be integers. Comparison and (in)equality apply to two operands of the same type, which is ``int`` or ``double``. Conjunction and disjunction apply to operands of type ``bool`` only. Assignments have the same type as the left-hand-side variable. Notice that only variables are used as left-hand-sides. #NEW ==Qualified constants and template instantiations== (Qualified constants are no included.) The only template instantiations are in one-place applications of the form ``` ident < type > ( expression ) ``` where the only ``ident`` that needs to be handled is ``typed``, used for type annotations. #NEW ==Types== Types are either identifiers (defined in ``typedef``) or built-in types, of which we include the following: ``` bool double int void ``` #NEW ==Literals== We include integer literals and floating point literals. There are also two boolean literals, ``true`` and ``false``. (No string literals included.) ==Identifiers== An identifier is a letter followed by a list of letters, digits, and underscores. ==Comments== There are three kinds of comments. - anything between tokens ``/*`` and ``*/`` - anything from token ``//`` to the end of a line or the file - anything from token ``#`` to the end of a line or the file (preprocessor directive) Comments cannot be nested. #NEW =Lab format= ==Input and output== The type checker must be a program called ``lab2``, which is executed by the command ``` lab2 ``` and prints its output to the standard output. The output at success must be the line ``OK`` followed by the print-out of the type-annotated tree. The output at failure must be the line ``TYPE ERROR`` followed by a message indicating the first error. For syntax errors, the line must say ``SYNTAX ERROR``. #NEW ==Example of success== Source file ``` // file good.cc typedef Number Limit ; Number f (Limit c) { Number n = 4 + 5 ; ++n ; } typedef int Number ; ``` Running the type checker ``` % lab2 good.cc OK typedef int Limit ; int f (int c) { int n = typed(4 + 5) ; n = typed(typedef(n) + 1) ; } typedef int Number ; ``` #NEW ==Example of failure== Source file ``` // file bad.cc typedef Number Limit ; Number f (Limit c) { Number n = 1 ; while(n)++n ; } typedef int Number ; ``` Running the type checker ``` % lab2 bad.cc TYPE ERROR condition n in while: expected bool, found int ``` #NEW ==Compiling the type checker== The type checker is submitted in a tar package, which contains a ``Makefile`` such that typing ``` make ``` produces the executable program ``lab2`` in a normal Unix environment. The standard compilers and parser and lexer tools can be assumed to be a part of this environment and should not be included in the tar package. #NEW ==A hint on file structure== The simplest way of producing the type checker is to use the BNFC-generated test file as a template. Just modify its main function so that, instead of the parse tree, it prints the type checker output specified above. In addition to the BNFC-generated modules, you should thus only need - a type checker module (adapted from the skeleton) - a desugaring module (adapted from the skeleton) - a new main module (adapted from the test file) - a new ``Makefile`` (replacing ``make test`` with a command making ``lab2``) - optionally, a separate module defining the symbol table data structure - (in Haskell) optinally, a separate module defining the compiler state monad #NEW =Test programs= Forthcoming. =Success criteria= Your grammar must pass the test suite (forthcoming). The test suite contains some correct programs, as well as a number of programs which the type checker must reject with adequate error messages. =Submission= Submit your lab using [Fire https://fire.cs.chalmers.se:8014/cgi/Fire-progs], when Lab 2 is in place. If you have any problems getting the test program to run, or if you think that there is an error in the test suite, contact bringert@cs.chalmers.se.