The objective of this lab is to write a type checker and an interpreter for a fragment of the C++ programming language. The type checker should check the program and send it to the interpreter at success. The interpreter should run the program and correctly perform all its input and output actions. At type checking failure, a type error should be reported.
Before the lab can be submitted, the program has to pass some tests, which are given on the course web page via links later in this document.
The recommended implementation is via a BNF grammar processed by the BNF Converter (BNFC) tool. The syntax tree created by the parser should then be processed further by a program using the skeleton generated by BNFC.
The fragment of C++ covered is smaller than in Laboration 1, and does not really include any C++ specific features not available in C. You can use the grammar given here, also explained in the Lecture notes, Chapter 2.
The type checker and the interpreter code will be roughly 200-500 lines together, depending on the programming language used.
All BNFC supported languages can be used, but guidance is guaranteed only for Haskell and Java 1.5.
The type system and the interpreter are partially characterized by formal rules in in the course lecture notes, chapters 4 and 5.
In the type checker, the recommended procedure is two passes:
In the interpreter, do a similar thing:
main()
Only the four built-in types
int double bool void
are taken into account. Every expression has one of these types.
Types of functions in the symbol table can be represented in any way that stores their argument and return types. For instance, the function header
int f (double x, bool b)
can create a symbol table entry
f |-> ([double, bool], int)
mapping the name f
to a pair whose first component
is the list of argument types and the second component is the
return type.
A program is a sequence of definitions.
A program may also contain comments and preprocessor directives, which are just ignored by the parser (see below).
An interpretable program must have a function main
that takes no arguments
and returns an int
. But this need not be checked in the type checker.
A function definition has a type, a name, an argument list, and a body. Example:
int foo(double x, int y) { return y + 9 ; }
The language has four built-in functions dealing with input and output
void printInt(int x) // print an integer and a newline in standard output void printDouble(double x) // print a double and a newline in standard output int readInt() // read an integer from standard input double readDouble() // read a double from standard input
Typing rules
The same function name may be used in at most one function definition.
All return
statements in a function body must
return an expression whose type is the
return type of the function.
You don't need to check that there actually is a return
statement (you can do this
optionally).
An argument list is a comma-separated list of argument declarations.
It is enclosed in parentheses (
and )
.
An argument declaration has a type and an identifier, for instance
int x
Notice that argument declarations with multiple variables
(int x, y
) are not included. A declaration that occurs
as a statement (as shown below), can also have multiple variables.
But it must have at least one variable.
A function body is a list of statements enclosed
in curly brackets {
and }
.
Typing rules
An argument list may only use each variable once.
Any expression followed by a semicolon ;
can be used as a statement.
Any declaration followed by a semicolon ;
can be used as a statement.
Declarations have one of the following formats:
int i ;
int i, j ;
int i = 6 ;Typing. The initializing expression must have the declared type.
Statements returning an expression, for example
return i + 9 ;
Typing. The type of the returned expression must be the same as the return type of the function in which it occurs.
While loops, with an expression in parentheses followed by a statement, for example:
while (i < 10) ++i ;
Typing. The expression must have type bool.
Conditionals: if
with an expression in parentheses followed by
a statement, else
, and another statement. Examples:
if (x > 0) return x ; else return y ;
Typing. The expression must have type bool.
(No else-less if statements.)
Blocks: any list of statements (including empty list) between curly brackets. For instance,
{ int i = 2 ; { } i++ ; }
Typing rules
A variable may only be declared once on the same block level.
The parameters of a function have the same level as the top-level block in the body.
The following table gives the expressions and their precedence levels. Infix operators are assumed to be left-associative. The arguments in a function call can be expressions of any level. Otherwise, the subexpressions are assumed to be one precedence level above of the main expression.
Note. The table is not guaranteed to be exactly the same as in the C++ standard.
level | expression forms | explanation | type | |
---|---|---|---|---|
16 | literal | literal | literal type | |
16 | identifier | variable | declared type | |
15 | f(e,...,e) |
function call | return type | |
14 | v++ , v-- |
in/decrement | (sugar) | |
13 | ++v , --v |
in/decrement | (sugar) | |
12 | e*e , e/e |
mult, div | operand type (int or double) | |
11 | e+e , e-e |
add, sub | operand type (int or double) | |
9 | e<e , e>e , e>=e , e<=e |
comparison | bool | |
8 | e==e , e!=e |
(in)equality | bool | |
4 | e&&e |
conjunction | bool | |
3 | e||e |
disjunction | bool | |
2 | v=e |
assignment | type of LHS |
Typing rules
Integer, double, and boolean literals have their usual types,
Variables have the type declared in the nearest enclosing block. A variable must be declared before it is used in an expression.
The arguments of a function call must have types corresponding to the argument types of the called function. The number of arguments must be the same as in the function declaration (thus the C++ default argument rule is not applied). Notice that only identifiers are used as functions.
This can be covered in the parser: increments and decrements only apply to variables.
Comparison and (in)equality apply to two operands of the same type,
which is int
, double
, or bool
.
Conjunction and disjunction apply to operands of type bool
only.
Assignments have the same type as the left-hand-side variable. Notice that only variables are used as left-hand-sides.
(There are no qualified constants or template instantiations.)
We include integer literals and floating point literals.
There are also two boolean literals, true
and false
.
Notice that the names true
and false
were not specified
as literals in Lab 1, so you probably treated them as identifiers.
An identifier is a letter followed by a list of letters, digits, and underscores.
There are three kinds of comments.
/*
and */
//
to the end of a line or the file
#
to the end of a line or the file
(preprocessor directive)
Comments cannot be nested.
The top-level interpreter is run by by eveluating the expression main()
.
The return value is ignored.
There are four types of values:
true
and false
Instead of boolean values, you may use integers.
Then true
can be interpreted as 1 and false
as 0.
Values can be seen as a special case of expressions: as expressions that contain no variables and cannot be evaluated further. But it is recommended to have a separate datatype of values, in order to guarantee that evaluation always results in a value.
Thus the evaluation of an expression in an environment should always result in a value.
A program is a sequence of function definitions. Each function has a parameter list and a body, which is a sequence of statements.
The evaluation of a function call starts by evaluting the arguments and building an environment where the received values are assigned to the argument variables (a.k.a. parameters) of the function.
The statements in the body are then executed in
the order defined by their textual order as altered by while
loops and if
conditions.
The function returns a value, which is obtained from the return
statement. This statement can be assumed to be the last one in the
function body. If the return type is void
, no return statement is
required.
A declaration, e.g.
int i ;
adds a variable to the current environment. Its value is initialized if and only if the declaration includes an initializing expression, e.g.
int i = 9 ;
An expression statement, e.g.
i++ ;
is evaluated, and its value is ignored.
A block of statements, e.g.
{ int i = 3 ; i++ ; }
is interpreted in an environment where a new context is pushed on the context stack at entrance, and popped at exit.
A while
statement, e.g.
while (1 < 10){ i++ ; }
is interpreted so that the condition expression is first evaluated.
If the value is true
, the body is interpreted in the resulting
environment, and the while
statement is executed again.
If the value is false
, the statement after the while
statement
is interpreted.
An if-else
statement, e.g.
if (1 < 10) i++ ; else j++ ;
is interpreted so that the condition expression is first evaluated.
If the value is true
, the statement before else
is interpreted.
If the value is false
, the statement after else
is interpreted.
A return
statement is executed by evaluating its expression argument.
The value is returned to the caller of the function, and no more
statements in the function body are executed. You can assume that
the return
statement is always the last one in a function body.
The interpretation of an expression, also called evaluation, returns a value whose type is determined by the type of the expression.
A literal, e.g.
123 3.14 true
is not evaluated further but just converted to the corresponding value.
A variable, e.g.
x
is evaluated by looking up its value in the innermost context where it occurs. If the variable is not in the context, or has no value there, the interpreter terminates with an error message
uninitialized variable x
A function call, e.g.
foo(8,9,true)
is interpreted by first evaluating its arguments from left to right. The environment is then looked up to find out how the function is interpreted on the resulting values.
Calls of the four built-in functions can be hard-coded as special cases in the expression evaluation code.
A postincrement,
i++
has the same value as its body initially has
(here i
). The value of the variable
i
is then incremented by 1.
i--
correspondingly decrements i
by 1.
If i
is of type double
, 1.0 is used instead.
A preincrement,
++i
has the same value as i
plus 1. This incremented value
replaces the old value of i
. The decrement and double
variants are analogous.
The arithmetic operations addition, subtraction, multiplication, and division,
a + b a - b a * b a / b
are interpreted by evaluating their operands from left to right.
The resulting values are then added, subtracted, etc., by using
appropriate operations of the implementation language. We are not
picky about the precision chosen, but suggest for simplicity
that int
should be int
and double
should be double
.
Comparisons,
a < b a > b a >= b a <= b a == b a != b
are treated similarly to the arithmetic operations, using comparisons of the implementation language. The returned value is an integer.
Conjunction,
a && b
is evaluated lazily: first a
is evaluated. If the
result is true (1), also b
is evaluated, and the
value of b
is returned. However, if
a
evaluates to false (0), then false
is returned without evaluating b
.
Disjunction,
a || b
is also evaluated lazily: first a
is evaluated. If the
result is false (0), also b
is evaluated, and the
value of b
is returned. However, if
a
evaluates to true (1), then true
is returned without evaluating b
.
Assignment,
x = a
is evaluated by first evaluating a
. The resulting
value is returned, but also the context is changed by assigning
this value to the innermost occurrence of x
.
The interpreter must be a program called lab2
, which is
executed by the command
lab2 <SourceFile>
and prints its output to the standard output. The output at success must be just the output defined by the interpreter.
The output at failure is an interpreter error, or a
TYPE ERROR
or a
SYNTAX ERROR
,
depending on the phase at which the error occurs.
These error messages should also give some useful explanation, but we don't
specify what exactly its format.
The input can be read not only from user typing on the
terminal, but also from standard input redirected from
a file or by echo
. For instance,
./lab2 fibonacci.cc <test-input echo 20 | ./lab2 fibonacci.cc
The easiest way to produce the proper format is to use the ready-made files in either of
Source file
// file good.cc int main () { int i = readInt() ; //5 printInt(i) ; //5 printInt(i++) ; //5 printInt(i) ; //6 printInt(++i) ; //7 printInt(i) ; //7 }
Running the interpreter
% echo 3 | ./lab2 good.cc 3 3 4 5 5
Source file
// file bad.cc int f (double c) { int n = 1 ; while(c) ++n ; }
Running the type checker
% lab2 bad.cc TYPE ERROR condition c in while: expected bool, found double
Source file
// file bad.cc int main () { int i ; printInt(i) ; printInt(i++) ; printInt(i) ; printInt(++i) ; printInt(i) ; }
Running the interpreter
% lab2 bad.cc uninitialized variable x
Thus it is assumed that the type checker does not detect uninitialized variables.
The interpreter is submitted as a source file package that can be
compiled by typing make
.
The file names must match the ready-made files in either the
Haskell package or the Java 1.5 package.
The simplest solution is to copy the contents of these files
and replace the grammar, the type checker, and the interpreter
by your own files.
If you want to write the interpreter in another language, the
procedure is the same: send a tar package
and make sure the interpreter can be compiled in a normal Unix enviroment
by typing make
.
Run the testsuite before submitting the lab.
Your interpreter must pass the testsuite. The test suite contains both good and bad programs. The good programs must be executed correctly, whereas the bad ones must fail in appropriate ways in either the type checker or the interpreter.
The solution must be written in an easily readable and maintainable way. For instance, tailoring it for the programs in the test suite is not maintainable!
Submit your lab by using Fire, when Lab 2 is in place.
If you have any problems getting the test program to run, or if you think that there is an error in the test suite, contact the teachers via the course Google group.
If you have any problems getting the test program to run, or if you think that there is an error in the test suite, contact the teachers via the course Google group.
The easiest way to get started is to extend ready-made files in either of
One way to extend the files to the full language is to use the BNFC-generated skeleton as a template. You can also bootstrap them from the files in the mini interpreter.
In these packages, you only have to change three files:
CPP.cf
: the grammar; you can use this one
TypeChecker.hs
or TypeChecker.java
, depending on your implementation language
Interpreter.hs
or Interpreter.java
, depending on your implementation language