Lecture 9: Code Generation

Programming Languages Course
Aarne Ranta (aarne@chalmers.se)

Book: 6.2, 6.4, 6.6, 7.1, 7.4, 7.5, 7.6

Plan

The Semantic Gap

Compilation schemes

JVM code generation

Stack and heap memory, garbage collection

Back-end optimization

Native code generation for Intel x86

The goals of the lecture

JVM: to learn the basic techniques of compilation

Native code: to get an idea of how programs are made to work on "bare silicon"

The compiler is a translator

Reminder: a compiler translates code from one format to another - source code into target code.

Example: C++ source code.

    int i = 1 ;
    int j = i + 2 ;
    printInt(i + j) ;

Translated to JVM target code:

    bipush 1
    dup
    istore 0
    bipush 2
    iadd
    istore 1
    iload 0
    iload 1
    iadd
    invokestatic runtime/iprint(I)V

Semantic gap

Different kinds of constructs create a semantic gap between source and target code:

    high-level code        machine code
    ---------------        ------------
    statement              instructions
    expression             instructions
    variable               memory address, register
    value                  bit vector
    type                   memory layout
    control structure      jump
    function               subroutine
    tree structure         linear structure

Typically, one statement/expression translates to many instructions.

    x + 3   ==>  iload 0
                 bipush 3
                 iadd

Compilation schemes

Syntax-directed translation rules that generate target code from source code

    // integer addition  
    compile (exp1 + exp2) =
      compile exp1
      compile exp2
      emit iadd
  
    // integer multiplication
    compile (exp1 * exp2) =
      compile exp1
      compile exp2
      emit imul
  
    // integer literal, one signed byte
    compile i =
      emit (bipush i)

The rules go recursively to subtrees, and call emit to generate new instructions.
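As a concrete illustration, the schemes above could be written in Java roughly as follows. This is a sketch with hypothetical class names (Exp, EInt, EAdd, EMul, Compiler), not the course's actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical syntax tree classes for a tiny expression language.
abstract class Exp {}
class EInt extends Exp { final int value; EInt(int v) { value = v; } }
class EAdd extends Exp { final Exp e1, e2; EAdd(Exp a, Exp b) { e1 = a; e2 = b; } }
class EMul extends Exp { final Exp e1, e2; EMul(Exp a, Exp b) { e1 = a; e2 = b; } }

class Compiler {
    final List<String> code = new ArrayList<>();

    void emit(String instr) { code.add(instr); }

    // The scheme recurses into the subtrees and emits instructions
    // in postorder: operands first, then the operator.
    void compile(Exp exp) {
        if (exp instanceof EInt) {
            emit("bipush " + ((EInt) exp).value);
        } else if (exp instanceof EAdd) {
            compile(((EAdd) exp).e1);
            compile(((EAdd) exp).e2);
            emit("iadd");
        } else { // EMul
            compile(((EMul) exp).e1);
            compile(((EMul) exp).e2);
            emit("imul");
        }
    }
}
```

Compiling 1 + 2 with this sketch yields the instruction list bipush 1, bipush 2, iadd, as in the scheme.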

But how do we know that we should call iadd and not dadd (double addition)?

Type annotations

Code generation must know the types of arithmetic operations, variables, etc. The simplest way to guarantee this is to use an annotating type inference,

    Exp annotExp (Env env, Exp exp) =
      typ := inferExp env exp
      return typed<typ>(exp)

Then we can generate code as follows:

    // addition  
    compile typed<typ>(exp1 + exp2) =
      compile exp1
      compile exp2
      if typ == int
        emit iadd
      else
        emit dadd

An alternative is to compute the types of expressions in the code generator - but this duplicates the work done in the type checker.
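In code, the type-dependent instruction choice can be isolated in a small helper. A minimal sketch, assuming a two-type language (Typ and ArithInstr are hypothetical names):

```java
// Choosing the JVM instruction by the annotated type: int operations use
// i-instructions, double operations use d-instructions.
enum Typ { INT, DOUBLE }

class ArithInstr {
    static String add(Typ typ) { return typ == Typ.INT ? "iadd" : "dadd"; }
    static String mul(Typ typ) { return typ == Typ.INT ? "imul" : "dmul"; }
}
```

The compile scheme then only has to emit ArithInstr.add(typ) instead of branching on the type in every rule.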

Machine code constructs that depend on types

We will simplify the presentation by only considering integers.

Booleans are treated similarly to integers.

Variable addresses

The compiler needs an environment to keep the addresses of variables needed in load and store instructions.

The addresses are integers starting from 0; each declaration increments the next free address.

The environment has a block structure, because variables declared in different blocks are stored at different addresses.

    int i ;    // i -> 0
    int j ;    // j -> 1
    {
      int k ;  // k -> 2
      int j ;  // j -> 3
    }
    int m ;    // m -> 2

Exit from a block frees the addresses reserved in the block.
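The address environment described above can be sketched as a stack of scopes; a minimal version, with hypothetical names (AddrEnv, addVar, lookup):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// A block-structured address environment: a stack of scopes. The address
// counter grows on each declaration and is restored when a block is exited.
class AddrEnv {
    private final Deque<Map<String, Integer>> blocks = new ArrayDeque<>();
    private int nextAddr = 0;

    AddrEnv() { blocks.push(new HashMap<>()); }

    void addVar(String x) { blocks.peek().put(x, nextAddr++); }

    // The innermost declaration wins: ArrayDeque iterates head-first,
    // i.e. from the innermost block outwards.
    Integer lookup(String x) {
        for (Map<String, Integer> b : blocks)
            if (b.containsKey(x)) return b.get(x);
        return null;
    }

    void newBlock() { blocks.push(new HashMap<>()); }

    // Exiting a block frees the addresses reserved in it.
    void exitBlock() { nextAddr -= blocks.pop().size(); }
}
```

Replaying the example above: i gets 0, j gets 1; inside the block, k gets 2 and the shadowing j gets 3; after exiting, m reuses address 2.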

Compiling declarations and statements

We assume an environment where we can add variables and look up their addresses:

    // declarations
    compile (int x ;) =
      addVar x
      
    // variable expressions
    compile (x) =
      addr := lookup x
      emit (iload addr)
  
    // assignments
    compile (x = exp) =
      compile exp
      addr := lookup x
      emit (istore addr)
  
    // blocks
    compile ({ stms }) =
      for all stm in stms: 
        compile stm
      discard (number of declared variables in stms)

Labels and jumps

A label is an identifier marking a position in the code.

A jump is the instruction goto LABEL which makes the interpreter continue from that position, with the same stack as before.

In small step semantics:

    <goto LABEL, i-C-V-S>  -->  <C(LABEL), LABEL-C-V-S>

Example (my first BASIC program):

    BEGIN:
      bipush 66 
      invokestatic runtime/iprint(I)V 
      goto BEGIN

Notice: the jump goes backwards in the code, so this program loops forever, printing 66 over and over.

Compiling while loops

Idea:

                         TEST:
    while (exp)    ===>    evaluate exp
      stm                  if (exp==0) goto END
                           execute stm
                           goto TEST
                         END:

Compilation scheme:

    compile (while (exp) stm) =
      TEST := getLabel
      END  := getLabel
      emit TEST
      compile exp
      emit ifeq END
      compile stm
      emit goto TEST
      emit END

The code uses the conditional jump ifeq LABEL, with semantics

    <ifeq LABEL, i-C-V-S.0>  -->  <C(LABEL), LABEL-C-V-S> 
    <ifeq LABEL, i-C-V-S.v>  -->  <C(i+1),   (i+1)-C-V-S>   if v != 0

Notice that the jump can go to an earlier position in the code.
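The while scheme can be turned into running code; a sketch, assuming the condition and body have already been compiled to instruction lists (LoopCompiler and getLabel are hypothetical names; labels are printed in Jasmin style):

```java
import java.util.ArrayList;
import java.util.List;

// Compiling while: fresh labels come from a counter, so nested loops
// never clash.
class LoopCompiler {
    private int labelCount = 0;
    final List<String> code = new ArrayList<>();

    private String getLabel() { return "L" + labelCount++; }

    void emit(String instr) { code.add(instr); }

    void compileWhile(List<String> expCode, List<String> stmCode) {
        String test = getLabel();
        String end = getLabel();
        emit(test + ":");              // TEST:
        expCode.forEach(this::emit);   //   evaluate exp
        emit("ifeq " + end);           //   if exp == 0 goto END
        stmCode.forEach(this::emit);   //   execute stm
        emit("goto " + test);          //   goto TEST
        emit(end + ":");               // END:
    }
}
```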

Another way to compile while loops

                         goto TEST
    while (exp)    ===>  LOOP:  
      stm                  execute stm
                         TEST:
                           evaluate exp
                           if (exp != 0) goto LOOP

Compiling conditionals

Idea:

    if (exp)    ===>      evaluate exp
      stm1                if (exp==0) goto FALSE
    else                  execute stm1   
      stm2                goto TRUE
                        FALSE:
                          execute stm2
                        TRUE:

Compilation scheme:

    compile (if (exp) stm1 else stm2) =
      TRUE  := getLabel
      FALSE := getLabel
      compile exp
      emit ifeq FALSE
      compile stm1
      emit goto TRUE
      emit FALSE
      compile stm2
      emit TRUE

Compiling comparisons

There are no comparison operations in JVM.

There are no booleans.

If we want to get the value of exp1 < exp2, we execute code corresponding to

    if (exp1 < exp2) 1 ; else 0 ; 

We use the conditional jump if_icmplt LABEL, which compares the two elements at the top of the stack and jumps if the second-last is less than the last:

    <if_icmplt LABEL, i-C-V-S.a.b>  -->  <C(LABEL), LABEL-C-V-S>  if a < b
    <if_icmplt LABEL, i-C-V-S.a.b>  -->  <C(i+1),   (i+1)-C-V-S>  otherwise

We can use code that first pushes 1 on the stack; if the comparison fails, the 1 is popped and 0 is pushed instead.

      bipush 1
      exp1
      exp2
      if_icmplt TRUE
      pop
      bipush 0
    TRUE: 

Comparisons in while loop conditions

Putting together the compilation of comparisons and while loops gives terrible spaghetti code:

    while (x < 9) stm   ===>
  
    TEST:
      bipush 1
      iload 0
      bipush 9
      if_icmplt TRUE
      pop
      bipush 0
    TRUE: 
      ifeq END
      stm
      goto TEST
    END:

Couldn't we use the if_icmplt comparison directly in the while jump?

Optimizing labels and jumps

Yes - by using the negation of if_icmplt: recall that !(a < b) == (a >= b), which corresponds to if_icmpge.

    while (x < 9) stm   ===>
  
    TEST:
      iload 0
      bipush 9
      if_icmpge END
      stm
      goto TEST
    END:

The problem is: how can we get this code by using the compilation schemes?

Compositionality

Syntax-directed translation is compositional, if the value returned for a tree is a function of the values for its immediate subtrees:

    T (C a1 ... an) = ... T(a1) ... T(an) ...

In programming, this means that each constructor of the syntax tree gets a compilation rule of its own, which only inspects the top node and recursively calls the compiler on the immediate subtrees.

In Haskell, it would be easy to use noncompositional compilation schemes, by matching on deeper patterns:

    compile (SWhile (ELt exp1 exp2) stm) = ...

In Java, another visitor must be written to define what happens depending on the condition part of the while statement.

Another approach is back-end optimization of the generated code: run through the code and look for fragments that can be improved.

References and objects

An integer is one word, a double is two - but how big is a linked list? Or a TypeChecker?

How does JVM (or any machine) deal with objects whose size is not known?

Even worse: the size of objects may change at runtime (e.g. elements can be added to a list).

The stack can of course be arbitrarily large - but the memory needed by a variable must be known when it is allocated on the stack. This is before we know what values it will contain!

Solution: use indirect addressing. On the stack, store a reference to an object that itself is stored in the heap. The heap is a separate part of the memory, which is not ordered like a stack.

Stack vs. heap allocation

The reference gives the address of the object in the heap.

In the heap, the object can be stored discontinuously, divided among many places. Each part then includes the address of the next part; when the list ends, the address is null.

A linked list is a clear example.

                                             ..
                                        ---- addr to x3 
                                       |     x2         <------
                                       |     ..                |
                                       |     null              |
                            STACK       ---> x3                |
                            -----            ..                |
    [x1, x2, x3]            ..               addr to x2 -------
                            addr to x1 ----> x1 
                                             ..
                                             ----
                                             HEAP

Graphics convention: the stack grows downwards, the heap grows upwards.

How to free memory from the stack and the heap

Objects on the stack are removed automatically when the block in which they were declared is exited.

Objects on the heap are not removed in this way: only the references to them from the stack are.

Example: a function prime, which returns the nth prime by building a list of the n first primes,

    int prime(int n)
    {
      LinkedList<Integer> primes ;
      //...
    }

When returning from the function, how do you know which parts of the heap contain parts of primes and are no longer needed?

Garbage collection

In C (and, to a lesser extent, C++), heap-allocated memory should be freed by using the function free.

In Java, memory is freed automatically by garbage collection.

Garbage = values in memory no longer used by the program.

A garbage collection routine is a part of the interpreter.

It can be part of a compiler, too: in addition to the compiled native code, the compiler provides it as a part of a runtime system.

C and C++ compilers do not generally provide one. Haskell compilers do.

Mark-and-sweep garbage collection

Book 7.6.1

To understand how garbage collection works, let us consider a simple program that does it.

The algorithm consists of three functions: a mark function, which traverses everything reachable from the references on the stack (the roots) and marks it; a sweep function, which walks through the heap and frees every unmarked object; and the allocator, which triggers a collection when it runs out of memory.
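A toy version of mark and sweep, with the heap modeled as a map from addresses to the addresses each object points to (an assumed representation, only for illustration):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MarkAndSweep {
    // address -> addresses the object stored there points to
    final Map<Integer, List<Integer>> heap = new HashMap<>();

    // mark: collect every address reachable from the roots
    Set<Integer> mark(List<Integer> roots) {
        Set<Integer> marked = new HashSet<>();
        Deque<Integer> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            int addr = todo.pop();
            if (marked.add(addr))                     // not seen before
                todo.addAll(heap.getOrDefault(addr, List.of()));
        }
        return marked;
    }

    // sweep: free every object that was not marked
    void sweep(Set<Integer> marked) {
        heap.keySet().retainAll(marked);
    }
}
```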

Problems: the program pauses while the collector runs, and the freed memory can leave the heap fragmented.

There are more developed garbage collection algorithms that address these issues.

Modern garbage collectors are difficult to beat with hand-written memory management.

Peephole optimization

Improve the generated code by looking at chunks of e.g. 3 instructions (this is called the peephole window).

Reduce the number of instructions e.g. by constant folding,

    bipush 6
    bipush 9    ===>   bipush 15
    iadd

or change them to cheaper ones, e.g.

    bipush 15   ===>   bipush 15
    bipush 15          dup

Iteration gives the best effect:

    bipush 3
    bipush 6           bipush 3
    bipush 9    ===>   bipush 15  ===> bipush 45  
    iadd               imul
    imul
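A sketch of such a peephole optimizer, doing constant folding over a 3-instruction window and iterating until nothing changes (a toy version working on Jasmin-style strings; the class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

class Peephole {
    // one left-to-right pass over the code, folding bipush/bipush/op triples
    static List<String> pass(List<String> code) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < code.size()) {
            if (i + 2 < code.size()
                    && code.get(i).startsWith("bipush ")
                    && code.get(i + 1).startsWith("bipush ")
                    && (code.get(i + 2).equals("iadd") || code.get(i + 2).equals("imul"))) {
                int a = Integer.parseInt(code.get(i).substring(7));
                int b = Integer.parseInt(code.get(i + 1).substring(7));
                int c = code.get(i + 2).equals("iadd") ? a + b : a * b;
                out.add("bipush " + c);   // three instructions become one
                i += 3;
            } else {
                out.add(code.get(i));
                i++;
            }
        }
        return out;
    }

    // iterate to a fixpoint: folding may enable further folding
    static List<String> optimize(List<String> code) {
        List<String> next = pass(code);
        return next.equals(code) ? code : optimize(next);
    }
}
```

On the example above, the first pass yields bipush 3, bipush 15, imul, and the second pass folds that to bipush 45.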

Assembling JVM

The notation we have used in this lecture is from the Jasmin JVM assembler.

Things done by the assembler: resolving symbolic labels to numeric code positions, translating instruction names to binary opcodes, and generating the binary class file.

We have created a simple script jass, which reads the code for a main function and embeds it into a class called Main. Then it calls jasmin.

In Exercise 6, you can build a compiler by using the compilation schemes of this lecture. You can then jass the output of the compiler to produce a file Main.class. This file can be run with

    java Main

Compiling for Intel X86

The most common processor family in the world.

Originally the processor of the "IBM PC"; the family has a long history of backward-compatible generations (8086, 80286, 80386, 80486, Pentium, and today's 64-bit processors).

The machines are backward compatible with older code, but operating systems forbid certain things, such as accessing the memory of other processes.

Registers vs. memory

Registers are places for data inside the CPU.

Arithmetic operations must have their operands and return their values in registers.

All modern native code assembly languages use registers.

From assembler to machine language

Machine language is just bytes. For instance, in x86 we have an instruction

    00000011 11000011

This is usually given in hex code:

    03 C3

This can be generated from the assembly instruction

    add eax, ebx

In source code notation, this would be written

    eax = eax + ebx

The variables eax and ebx are names of integer registers.

More examples

Addition and subtraction: two-address instructions with two registers.

    add eax, ebx        ; eax = eax + ebx
    sub eax, ebx        ; eax = eax - ebx

Immediate values (integers) and memory addresses are also possible as source (but not as target):

    add eax, 5          ; eax = eax + 5
    add eax, [ebp - 8]  ; eax = eax + [ebp - 8]

The memory operand [ebp - 8] refers to the address stored in the register ebp, offset by -8.

Multiplication and division are more complicated, e.g.

    div a               ; eax = (edx:eax) / a ; edx = (edx:eax) % a

Move instructions can also have a memory address as target, but not as both source and target.

    mov [ebp - 12], eax

Labels and jumps are as in JVM, e.g.

    cmp  eax, ebx       ; compare eax with ebx
    jge  Label          ; if eax >= ebx, jump to Label

Example

Source code

    int i = 3 ;
    while(i < 17) {
      i = i + 2 ;
    }
    int j = i ;

Compiled code

      mov [ebp - 8], 3     ; i -> [ebp - 8], i = 3
      mov eax, [ebp - 8]   ; eax = i
    Test:
      mov ebx, 17          ; ebx = 17
      cmp eax, ebx         ; cmp i 17
      jge End
      add eax, 2           ; eax = i = i + 2   
      jmp Test
    End:
      mov [ebp - 12], eax  ; j -> [ebp - 12], j = i
      mov [ebp -  8], eax  ; save i in memory

Notice the optimization: i is kept in a register during the loop and written back to memory only at the end.

Compiling to native code

Compilation schemes are similar to JVM.

Typically not directly to native code, but to intermediate code with infinitely many virtual registers.

2-address code is similar to x86: one target and one source register. The scheme returns a register where the value is found.

    compile (exp1 + exp2) =
      r1 := compile exp1
      r2 := compile exp2
      emit (r1 += r2)
      return r1

But often the intermediate language is 3-address code, with two source registers.

    compile (exp1 + exp2) =
      r0 := newReg
      r1 := compile exp1
      r2 := compile exp2
      emit (r0 = r1 + r2)
      return r0

Register allocation

The step from infinitely many virtual registers to the limited number of a machine is called register allocation.

Main constraint: variables that are live at the same time cannot be kept in the same register.

Example: how many registers are needed for x, y, z?

    int x = 1 ;       // x     alive
    int y = 2 ;       // x,y   alive
    printInt(y) ;     // x,y   alive
    int z = 3 ;       // x,  z alive
    printInt(x + z) ; // x,  z alive

Answer: two, because y and z can be kept in the same register.
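A greedy sketch of this idea: two variables interfere if some live set contains both, and each variable gets the lowest register not taken by an interfering one (RegAlloc and the set-of-live-variables representation are assumptions, not a production allocator):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RegAlloc {
    // two variables interfere if they are live at the same program point
    static boolean interfere(List<Set<String>> liveSets, String x, String y) {
        for (Set<String> live : liveSets)
            if (live.contains(x) && live.contains(y)) return true;
        return false;
    }

    // assign each variable the lowest register free of interference
    static Map<String, Integer> allocate(List<Set<String>> liveSets, List<String> vars) {
        Map<String, Integer> reg = new LinkedHashMap<>();
        for (String v : vars) {
            int r = 0;
            boolean clash = true;
            while (clash) {
                clash = false;
                for (Map.Entry<String, Integer> e : reg.entrySet())
                    if (e.getValue() == r && interfere(liveSets, v, e.getKey())) {
                        clash = true;
                        r++;
                        break;
                    }
            }
            reg.put(v, r);
        }
        return reg;
    }
}
```

On the live sets of the example, x gets register 0 while y and z share register 1, matching the answer above.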

Assembling for Intel X86

Excellent free book: PC Assembly Language by Paul Carter.

The book uses NASM, the Netwide Assembler, also available through the book's web page.

The result of compilation is first assembled and then linked.

    compile Foo.cc                      -- generate assembler
  
    nasm -f elf Foo.asm                 -- assemble to object file
  
    gcc -o Foo driver.o Foo.o asm_io.o  -- link to executable

The linker combines object files and resolves names in them.

Dynamic linking: some names are only linked at runtime. This makes it possible to have library code in one location, instead of being copied to all applications.