Lab 3: Code Generator for C/C++

Programming Language Technology, 2018

Back to main page of Programming Language Technology.

Summary

The objective of this assignment is to write a code generator from a fragment of the C++ programming language to JVM, Java Virtual Machine. The code generator should produce Java class files, which can be run in the Java bytecode interpreter so that they correctly perform all their input and output actions.

The code generator is partially characterized by compilation schemes in Chapter 6 of the PLT book. More JVM instructions are given in Appendix B of the book. These explanations follow Jasmin assembler; the code generator may emit Jasmin assembly code into text files, which are then processed further by Jasmin to create class files.

The recommended implementation is via a BNF grammar processed by the BNF Converter (BNFC) tool. The syntax tree created by the parser is first type checked by using the type checker created in Lab 2. The code generator should then make another pass of the type-checked code.

The approximate size of the grammar is 50 rules, and the code generator should be around 300-700 lines, depending on the programming language used. All BNFC supported languages can be used, but guidance is guaranteed only for Haskell and Java.

Before the work can be submitted, the program has to pass the test suite.

Restriction of the task

To pass the assignment, it is sufficient to treat the grammar of Lab 2 excluding doubles. Thus, operations involving type double need not be covered by the code generator.

However, for the ambitious student, we recommend to extend the code generator to also handle doubles (after it works for integers and the statements). The extension requires some extra work:

  1. The type checker needs to annotate the abstract syntax with types, such that the correct instruction (for int or double) can be picked by the code generator.
  2. As doubles are 64bit (whereas the other values have only 32bit), the environment has to record the size of local variables (single word or double word).

Method

The recommended procedure is two passes:

  1. build a symbol table that for every function gives its type in a form usable for JVM code generation;
  2. compile the program by generating a class file containing every source code function as a method.

You can copy your CPP.cf grammar and the TypeChecker module from Lab 2 to the same directory.

Language specification

The language is the same as in Lab 2, except the handling of uninitialized variables, and you can use the grammar file CPP.cf. Also its type system is the same.

Programs using uninitialized variables fall under undefined behaviour in this lab, meaning that any semantics for such programs are allowed.

There are four built-in functions:

    void   printInt(int x)        // print an integer and a newline in standard output
    void   printDouble(double x)  // print a double and a newline in standard output
    int    readInt()              // read an integer from standard input
    double readDouble()           // read a double from standard input

These functions can be defined in a separate runtime class, which can be obtained e.g. from writing these functions in Java and compiling to a class file. A ready-made Java file is here.

Bytecode verification

When running your compiled programs, the Java bytecode you have generated will be automatically sanity-checked by a Java bytecode verifier (which is part of the JVM you are using). The verifier will e.g. check that there are no operand stack overflows or underflows and that all local variable uses and stores are valid. If your code does not pass the verifier, a java.lang.VerifyError exception will be thrown (with an often useful error message).

Class structure

Boilerplate code, see the PLT book, Chapter 6, or the example below.

Functions

Methods in the class, main special, see Chapter 6, or example below.

Note that all returns must be explicit in Java bytecode, including returns from functions returning void. (The bytecode verifier will complain if it is possible for a function to end without returning.)

For non-void-returning functions you are allowed to assume that returns are always properly placed by the programmer who wrote the input source code. However, for void-returning functions you must consider the possibility that no explicit return statement was included in the program source code.

Statements and expressions

The semantics is the same as in Lab 2 (again, except for programs using uninitialized variables). In other words, running the generated classes in java produces the same behaviour as running the source code in the lab2 interpreter.

Example

File good03.cc

  int main ()
  {
    int arg = readInt() ;
    int ret = 1 ;
  
    int i = 1 ;
  
    while (i < arg + 1) {
      ret = i * ret ;
      ++i ;
    }
  
    printInt(ret) ;
    return 0 ;
  }

could compile to good03.j as follows:

  ;; Boilerplate: a wrapping class for cc code
  
  .class public good03
  .super java/lang/Object
  
  .method public <init>()V
    .limit locals 1
    .limit stack 1
  
    aload_0
    invokespecial java/lang/Object/<init>()V
    return
  
  .end method
  
  ;; The java-style main method calls the cc main
  
  .method public static main([Ljava/lang/String;)V
    .limit locals 1
    .limit stack 1
  
    invokestatic good03/main()I
    pop
    return
  
  .end method
  
  ;; Program
  
  .method public static main()I
    .limit locals 3
    .limit stack 3
  
    ;; int arg = readInt();
  
    invokestatic Runtime/readInt()I
    istore_0
  
    ;; int ret = 1;
  
    iconst_1
    istore_1
  
    ;; int i = 1;
  
    iconst_1
    istore_2
  
    ;; while (i < arg + 1)
  
    L0:            ;; // beginning of loop, check condition
    iload_2        ;; i
    iload_0
    iconst_1
    iadd           ;; arg + 1
    if_icmplt L2   ;; test i < arg + 1
    iconst_0
    goto L3
    L2:            ;; i < arg + 1 is true
    iconst_1
    L3:            ;; i < arg + 1 is false
    iconst_0
    if_icmpeq L1   ;; if last comparison was false, exit while loop
  
    ;; ret = i * ret
  
    iload_2
    iload_1
    imul
    istore_1
    iload_1
    pop
  
    ;; ++i
  
    iload_2
    iconst_1
    iadd
    istore_2  ;; // i = i + 1
    iload_2
    pop
  
    ;; // continue loop
    goto L0
  
    ;; printInt(ret)
  
    L1:
    iload_1
    invokestatic Runtime/printInt(I)V
    nop
  
    ;; return 0
  
    iconst_0
    ireturn
  
  .end method

(The comments are only for seeing the connection between .cc and .j).

Solution format

Solution templates

For Haskell and Java there are stubs that can be extended to the full solution. You may download one of the following tar archives, unpack, write the code, test, pack the archive again, and submit to fire.

These packages contain the grammar, stubs for type checker and compiler, and suitable makefiles. (You probably want to adapt your type checker from lab 2, rather than starting from scratch. But have a look at these stubs anyway, they contain code to invoke Jasmin.) The stubs match the requirements for the solution format as detailed in the following:

Input and output

The code generator must be a program called lab3, which is executed by the command

    lab3 <SourceFile>

and produces a class (.class) file. It may do this by first generating Jasmin assembly code (a .j file) and then calling Jasmin on this code. For help with building the Jasmin file, refer to the Jasmin instructions and Java bytecode instruction listings. Jasmin can be called by

    java -jar jasmin.jar <File>.j

or

    jasmin <File>.j

if you have installed Jasmin as a stand-alone program.

The generated class file should have the same name and be in the same directory as the original source file:

    lab3 ../a/b/c.cc

This should produce a class file ../a/b/c.class. This can be achieved using Jasmin's -d flag.

The output at failure is a code generator error, or TYPE ERROR or SYNTAX ERROR as in Assignment 2.

The input can be read not only from user typing on the terminal, but also from standard input redirected from a file or by echo. For instance,

    java fibonacci < test-input
    echo 20 | java fibonacci fibonacci.cc

Example of success

Source file:

  // file good.cc
  int main ()
  {
    int i = readInt() ; // 5
  
    printInt(i) ;       // 5
    printInt(i++) ;     // 5
    printInt(i) ;       // 6
    printInt(++i) ;     // 7
    printInt(i) ;       // 7
  
    return 0 ;
  }

Running the compiler:

  % lab3 good.cc # produces good.class
  % echo 5 | java good
  5
  5
  6
  7
  7

Compiling the code generator

The compiler is submitted as source files that can be compiled by typing make.

Test programs

If you have any problems getting the test program to run, or if you think that there is an error in the test suite, contact the teachers of the course via the mailing list.

We also provide a test suite for lab 3 (with the same test cases as for lab 2). By default, no tests containing doubles will be run; passing all these tests is the minimum required for passing this lab.

If you are ambitious, run the test suite with the --doubles flag to also include the doubles tests.

Run your compiler and the generated code with the test suite before submitting the assignment.

Download the entire test suite folder or the tarball and compile it using make. The test script can then be run as follows:

    ./progs-test-lab3 ../path/to/solution/

Jasmin tips

Jasmin is available on Sourceforge. Note that version 2.4 is included in the test suite and that there are also versions on e.g. Ubuntu (apt install jasmin-sable) and Mac (brew install jasmin).

Make sure your class names in Jasmin have simple names without slashes or dots. If the first line of your Jasmin file is

  .class public x/y/z.cc

then Jasmin will compile it to a file

  x/y/z/cc.class

regardless of what the name of the .j file is. The easiest option, and also what the test suite expects, is that your class name is just a string without any slashes or dots (in this example, just "z").

Important: Make sure that your compiler waits for the Jasmin conversion to finish before exiting. In other words, it shouldn't exit before the .class file has been written. For example when using Haskell, you can do the following:

  p <- runCommand <call jasmin here>
  waitForProcess p

or with at least version 1.2.0.0 (December 2013) of library process installed, simply:

  callCommand <call jasmin here>

Success criteria

Your program must be compatible with the test suite and meet the specification in this document in all respects.

The code you generate must pass the Java bytecode verification. Note: Passing the verifier check means no additional work beyond generating correct code (such as not popping empty stacks etc.).

The solution must be written in an easily readable and maintainable way. In particular, tailoring it for the programs in the test suite is not maintainable!

Submission