Interpreter for C/C++

Programming Languages Course, 2011, Laboration 3
Aarne Ranta (aarne (at) chalmers.se)



Summary

The objective of this lab is to write an interpreter for a fragment of the C++ programming language. The interpreter should run programs and correctly perform all their input and output actions.

Before the lab can be submitted, the interpreter has to pass some tests, which are given on the course web page via links later in this document.

The recommended implementation is via a BNF grammar processed by the BNF Converter (BNFC) tool. The syntax tree created by the parser is first type checked by using the type checker created in Laboration 2. The interpreter should then make another pass of the type-checked code.

The fragment of C++ covered is the same as in Laboration 2, except that the interpreter only needs to deal with programs consisting of one main function.

The approximate size of the grammar is 50 rules, and the interpreter code should be 100-300 lines, depending on the programming language used.

All BNFC supported languages can be used, but guidance is guaranteed only for Haskell, Java 1.5, and C++.

The semantics is partially characterized by formal rules in Lecture 8.

Method

Use the files in either of the directories

Copy your CPP.cf grammar and the TypeChecker module from Lab 2 to the same directory.

Edit the file Interpreter.hs or Interpreter.class till it implements a complete interpreter. One way of doing this is to copy the contents of TypeChecker and modify them - the interpreter will be structurally very similar to the type checker.

Language specification

The language is the same as in Laboration 2.

Also its type system is the same.

The type checker of Lab2 works on programs that contain any number of function definitions. The interpreter in Lab3 only needs to consider programs that have exactly one function, whose header is

    int main ()

The statements inside main may call the following four functions.

    void   printInt(int x)        // print an integer and a newline in standard output
    void   printDouble(double x)  // print a double and a newline in standard output
    int    readInt()              // read an integer from standard input
    double readDouble()           // read a double from standard input

The implementation of these function is a part of the interpreter. Their type checker can be implemented as a hard-coded signature in the type checker of lab3.

Values

There are four types of values:

Values can be seen as a special case of expressions: as expressions that contain no variables and cannot be evaluated further. But it is recommended to have a separate datatype of values, in order to guarantee that evaluation always results in a value.

Thus the evaluation of an expression in an environment should always result in a value.

Operational Semantics

Programs

A program is a sequence of statements, which are executed in the order defined by the textual order as altered by while loops and if conditions.

Statements

A declaration, e.g.

    int i ;

adds a variable to the current context. Its value is initialized if and only if the declaration includes an initializing expression, e.g.

    int i = 9 ;

An expression statement, e.g.

    i++ ;

is evaluated, and its value is ignored.

A block of statements, e.g.

    {
      int i = 3 ;
      i++ ;
    }

is interpreted in an environment where a new context is pushed on the context stack at entrance, and popped at exit.

A while statement, e.g.

    while (1 < 10){
      i++ ;
    }

is interpreted so that the condition expression is first evaluated. If the value is true, the body is interpreted in the resulting environment, and the while statement is executed again. If the value is false, the statement after the while statement is interpreted.

An if-else statement, e.g.

    if (1 < 10) i++ ; else j++ ;

is interpreted so that the condition expression is first evaluated. If the value is true, the statement before else is interpreted. If the value is false, the statement after else is interpreted.

A return statement can be interpreted as doing nothing. Alternatively, and more properly, it can be caused to terminate the program. (The test suite will not test what return does, because this paragraph was forgotten from the first version of the PM.)

Expressions

The interpretation of an expression, also called evaluation, returns a value whose type is determined by the type of the expression.

A literal, e.g.

    123
    3.14
    true

is not evaluated further but just converted to the corresponding value.

A variable, e.g.

    x

is evaluated by looking up its value in the innermost context where it occurs. If the variable is not in the context, or has no value there, the interpreter terminates with an error message

    uninitialized variable x

A function call, e.g.

    printInt(8 + 9)

is interpreted by first evaluating its arguments from left to right. The environment is then looked up to find out how the function is interpreted on the resulting values. Alternatively, since there are only four function calls, they can be hard-coded in the expression evaluation code.

A postincrement,

    i++

has the same value as its body initially has (here i). The value of the variable i is then incremented by 1. i-- correspondingly decrements i by 1. If i is of type double, 1.0 is used instead.

A preincrement,

    ++i

has the same value as i plus 1. This incremented value replaces the old value of i. The decrement and double variants are analogous.

The arithmetic operations addition, subtraction, multiplication, and division,

    a + b
    a - b
    a * b
    a / b

are interpreted by evaluating their operands from left to right. The resulting values are then added, subtracted, etc., by using appropriate operations of the implementation language. We are not picky about the precision chosen, but suggest for simplicity that int should be int and double should be double.

Comparisons,

    a <  b
    a >  b
    a >= b
    a <= b
    a == b
    a != b

are treated similarly to the arithmetic operations, using comparisons of the implementation language. The returned value must be boolean.

Conjunction,

    a && b

is evaluated lazily: first a is evaluated. If the result is true, also b is evaluated, and the value of b is returned. However, if a evaluates to false, then false is returned without evaluating b.

Disjunction,

    a || b

is also evaluated lazily: first a is evaluated. If the result is false, also b is evaluated, and the value of b is returned. However, if a evaluates to true, then true is returned without evaluating b.

Assignment,

    x = a

is evaluated by first evaluating a. The resulting value is returned, but also the context is changed by assigning this value to the innermost occurrence of x.

Lab format

Input and output

The interpreter must be a program called lab3, which is executed by the command

    lab3 <SourceFile>

and prints its output to the standard output. The output at success must be just the output defined by the interpreter.

The output at failure is an interpreter error, or a TYPE ERROR as in Lab 2, or a SYNTAX ERROR as in Lab 2.

The input can be read not only from user typing on the terminal, but also from standard input redirected from a file or by echo. For instance,

    ./lab3 fibonacci.cc <test-input
    echo 20 | ./lab3 fibonacci.cc

The easiest way to produce the proper format is to use the ready-made files in either of

Example of success

Source file

  // file good.cc
  
  int main () 
  {
    int i = readInt() ; //5
  
    printInt(i) ;   //5
    printInt(i++) ; //5
    printInt(i) ;   //6
    printInt(++i) ; //7
    printInt(i) ;   //7
  
  }

Running the interpreter

    % echo 3 | ./lab3 good.cc
    3
    3
    4
    5
    5  

Example of failure

Source file

  // file bad.cc
  
  int main () 
  {
    int i ;
  
    printInt(i) ;
    printInt(i++) ;
    printInt(i) ;
    printInt(++i) ;
    printInt(i) ;
  
  }

Running the interpreter

    % lab3 bad.cc
    uninitialized variable x

Thus it is assumed that the type checker does not detect uninitialized variables.

Compiling the interpreter

The interpreter is submitted as a source file package that can be compiled by typing make. The file names must match the ready-made files in either the Haskell package or the Java 1.5 package. The simplest solution is to copy the contents of these files and replace the grammar, the type checker, and the interpreter by your own files.

If you want to write the interpreter in another language, the procedure is the same: send a tar package and make sure the interpreter can be compiled in a normal Unix enviroment by typing make.

Test programs

Run the test suite before submitting the lab.

Success criteria

Your grammar must pass the test suite. The test suite contains only correct programs; the assumption is that the type checker has rejected rejected the incorrect programs.

The solution must be written in an easily readable and maintainable way.

Submission

Submit your lab by using Fire. when Lab 3 is in place.

If you have any problems getting the test program to run, or if you think that there is an error in the test suite, contact the teachers via the course Google group.