Lecture 9: Code Generation
Programming Languages Course
Aarne Ranta (aarne@chalmers.se)

%!target:html
%!postproc(html): #NEW
%!postproc(html): #HR
%!postproc(html): #sub1 1
%!postproc(html): #subn1 n-1
%!postproc(html): #subn n

Book: 6.2, 6.4, 6.6, 7.1, 7.4, 7.5, 7.6

#NEW

==Plan==

The Semantic Gap

Compilation schemes

JVM code generation

Stack and heap memory, garbage collection

Back-end optimization

Native code generation for Intel x86

#NEW

==The goals of the lecture==

JVM: to learn the basic techniques of compilation
- even in practice: [Exercise 5 ../exercises/06-exx.html]

Native code: to get an idea of how programs are made to work on "bare silicon"
- in practice: the Compiler Construction course (Period 4)

#NEW

==The compiler is a translator==

Reminder: a compiler **translates** the code into another format - **source code** into **target code**.

Example: C++ source code.
```
int i = 1 ;
int j = i + 2 ;
printInt(i + j) ;
```
Translated to JVM target code:
```
bipush 1
dup
istore 0
bipush 2
iadd
istore 1
iload 0
iload 1
iadd
invokestatic runtime/iprint(I)V
```

#NEW

==Semantic gap==

Different kinds of constructions: **semantic gap** between source and target code.

|| high-level code   | machine code             ||
|  statement         | instructions             |
|  expression        | instructions             |
|  variable          | memory address, register |
|  value             | bit vector               |
|  type              | memory layout            |
|  control structure | jump                     |
|  function          | subroutine               |
|  tree structure    | linear structure         |

Typically, one statement/expression translates to many instructions.
```
x + 3   ==>   iload 0
              bipush 3
              iadd
```

#NEW

==Compilation schemes==

Syntax-directed translation rules that generate target code from source code
```
// integer addition
compile (exp1 + exp2) =
  compile exp1
  compile exp2
  emit iadd

// integer multiplication
compile (exp1 * exp2) =
  compile exp1
  compile exp2
  emit imul

// integer literal, one signed byte
compile i =
  emit (bipush i)
```
The rules go recursively to the subtrees, and call ``emit`` to generate new instructions.

But how do we know that we should call ``iadd`` and not ``dadd`` (double addition)?

#NEW

==Type annotations==

Code generation must know the types of arithmetic operations, variables, etc.

The simplest way to guarantee this is to use an annotating type inference:
```
Exp annotExp (Env env, Exp exp) =
  typ := inferExp env exp
  return typed(typ, exp)
```
Then we can generate code as follows:
```
// addition
compile typed(typ, exp1 + exp2) =
  compile exp1
  compile exp2
  if typ == int
    emit iadd
  else
    emit dadd
```
An alternative is to compute the types of expressions in the code generator - but this duplicates the work done in the type checker.

#NEW

==Machine code constructs that depend on types==

- Arithmetic operations: ``iadd, imul, isub, idiv`` vs. ``dadd, dmul, dsub, ddiv``.
- Numeric constants: ``bipush 7`` vs. ``ldc2_w 7.0``.
- Load and store instructions: ``iload, istore`` vs. ``dload, dstore``.
- Memory usage: integers take one slot on the stack, doubles take two.
- Comparisons, return statements, ...

We will simplify the presentation by only considering integers. Booleans are treated similarly to integers.

#NEW

==Variable addresses==

The compiler needs an environment to keep track of the addresses of variables, which are needed in load and store instructions.

The addresses are integers starting from ``0``. Each declaration reserves the next available address.

The environment has a block structure, because variables declared in different blocks are stored at different addresses.
```
int i ;     // i -> 0
int j ;     // j -> 1
{
  int k ;   // k -> 2
  int j ;   // j -> 3
}
int m ;     // m -> 2
```
Exit from a block frees the addresses reserved in the block.
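For concreteness, here is a minimal sketch in Haskell of one possible representation of such an environment. The names and the representation are made up for illustration, and only ``int`` variables (one slot each) are handled; ``exitBlock`` plays the role of the ``discard`` operation used in the schemes of the next section.
```
type Address = Int

-- A stack of blocks (innermost block first), each mapping variables to
-- addresses, together with the next free address.
data Env = Env { blocks :: [[(String, Address)]], nextAddr :: Address }

emptyEnv :: Env
emptyEnv = Env [[]] 0

-- Each declaration reserves the next free address (one slot per int variable).
addVar :: String -> Env -> Env
addVar x (Env (b : bs) a) = Env (((x, a) : b) : bs) (a + 1)

-- Innermost declarations are found first, so inner variables shadow outer ones.
lookupVar :: String -> Env -> Maybe Address
lookupVar x (Env bs _) = lookup x (concat bs)

newBlock :: Env -> Env
newBlock (Env bs a) = Env ([] : bs) a

-- Exiting a block discards its variables and frees their addresses.
exitBlock :: Env -> Env
exitBlock (Env (b : bs) a) = Env bs (a - length b)
```
Running the declarations of the example above through ``addVar``, ``newBlock`` and ``exitBlock`` gives exactly the addresses shown in the comments.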
#NEW

==Compiling declarations and statements==

We assume an environment where we can
- ``lookup`` - the address of a variable
- ``addVar`` - add a variable at the next available address
- ``discard`` - a number of variables at exit from a block

```
// declarations
compile (int x ;) =
  addVar x

// variable expressions
compile (x) =
  addr := lookup x
  emit (iload addr)

// assignments
compile (x = exp) =
  compile exp
  addr := lookup x
  emit (istore addr)

// blocks
compile ({ stms }) =
  for all stm in stms:
    compile stm
  discard (number of variables declared in stms)
```

#NEW

==Labels and jumps==

A **label** is an identifier marking a position in the code. A **jump** is the instruction ``goto LABEL``, which makes the interpreter continue from that position, with the same stack as before.

In small-step semantics (``P`` = program counter, ``V`` = variable store, ``S`` = stack, ``P(L)`` = the position of label ``L``):
```
goto L : <P, V, S>  -->  <P(L), V, S>
```
Example (my first BASIC program):
```
BEGIN:
  bipush 66
  invokestatic runtime/iprint(I)V
  goto BEGIN
```
Notice:
- indentation is just a convention, not obligatory
- labels are not instructions in the bytecode, just indices in the code array calculated by the assembler
- all labels in the code must be unique
- ``invokestatic`` calls a function (static method), indicating its class (``runtime``), name (``iprint``), and type (``(I)V``)

#NEW

==Compiling while loops==

Idea:
```
                   TEST:
while (exp) ===>     evaluate exp
  stm                if (exp == 0) goto END
                     execute stm
                     goto TEST
                   END:
```
Compilation scheme:
```
compile (while (exp) stm) =
  TEST := getLabel
  END  := getLabel
  emit TEST
  compile exp
  emit ifeq END
  compile stm
  emit goto TEST
  emit END
```
The code uses the **conditional jump** ``ifeq LABEL``, with the semantics
```
ifeq L : <P, V, S.0>  -->  <P(L), V, S>
ifeq L : <P, V, S.v>  -->  <P+1, V, S>    if v != 0
```
Notice that the jump can go to an earlier position in the code.

#NEW

==Another way to compile while loops==
```
                     goto TEST
while (exp) ===>   LOOP:
  stm                execute stm
                   TEST:
                     evaluate exp
                     if (exp != 0) goto LOOP
```

#NEW

==Compiling conditionals==

Idea:
```
if (exp)   ===>   evaluate exp
  stm1            if (exp == 0) goto FALSE
else              execute stm1
  stm2            goto TRUE
                FALSE:
                  execute stm2
                TRUE:
```
Compilation scheme:
```
compile (if (exp) stm1 else stm2) =
  TRUE  := getLabel
  FALSE := getLabel
  compile exp
  emit ifeq FALSE
  compile stm1
  emit goto TRUE
  emit FALSE
  compile stm2
  emit TRUE
```

#NEW

==Compiling comparisons==

JVM has no comparison operations that return a value, and no booleans.

If we want to get the value of ``exp1 < exp2``, we execute code corresponding to
```
if (exp1 < exp2) 1 ; else 0 ;
```
We use the conditional jump ``if_icmplt LABEL``, which compares the two topmost elements of the stack and jumps if the second-last is less than the last:
```
if_icmplt L : <P, V, S.a.b>  -->  <P(L), V, S>   if a < b
if_icmplt L : <P, V, S.a.b>  -->  <P+1, V, S>    otherwise
```
We can use code that first pushes 1 on the stack. This is overwritten by 0 if the comparison does not succeed.
```
   bipush 1
   exp1
   exp2
   if_icmplt TRUE
   pop
   bipush 0
TRUE:
```

#NEW

==Comparisons in while loop conditions==

Putting together the compilation of comparisons and ``while`` loops gives terrible spaghetti code:
```
while (x < 9) stm
  ===>
TEST:
   bipush 1
   iload 0
   bipush 9
   if_icmplt TRUE
   pop
   bipush 0
TRUE:
   ifeq END
   stm
   goto TEST
END:
```
Couldn't we use the ``if_icmplt`` comparison directly in the ``while`` jump?

#NEW

==Optimizing labels and jumps==

Yes - by using the negation of ``if_icmplt``: recall that ``!(a < b) == (a >= b)``.
```
while (x < 9) stm
  ===>
TEST:
   iload 0
   bipush 9
   if_icmpge END
   stm
   goto TEST
END:
```
The problem is: how can we get this code by using the compilation schemes?
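As a concrete illustration of the machinery the schemes above assume - ``emit`` collecting instructions and ``getLabel`` producing fresh labels - here is a minimal sketch in Haskell, together with one possible answer to the question: a scheme that matches on a deeper pattern, discussed in the next section. The names ``CodeGen``, ``St`` and ``Instr`` are made up for this sketch; ``SWhile`` and ``ELt`` are the constructor names used in the next section.
```
import Control.Monad.State

type Instr = String
type Label = String

data St = St { output :: [Instr], labelCount :: Int }

type CodeGen = State St

emit :: Instr -> CodeGen ()
emit i = modify (\st -> st { output = output st ++ [i] })

getLabel :: CodeGen Label
getLabel = do
  st <- get
  put st { labelCount = labelCount st + 1 }
  return ("L" ++ show (labelCount st))

-- A fragment of abstract syntax.
data Exp = ELt Exp Exp | EVar String | EInt Integer
data Stm = SWhile Exp Stm | SExp Exp

-- The compositional scheme for while loops, as given above.
compileStm :: Stm -> CodeGen ()
compileStm (SWhile cond body) = do
  test <- getLabel
  end  <- getLabel
  emit (test ++ ":")
  compileExp cond
  emit ("ifeq " ++ end)
  compileStm body
  emit ("goto " ++ test)
  emit (end ++ ":")
compileStm (SExp e) = compileExp e

-- A noncompositional variant: the deeper pattern compiles the comparison
-- and the jump together, producing the optimized code above.
compileStmOpt :: Stm -> CodeGen ()
compileStmOpt (SWhile (ELt exp1 exp2) body) = do
  test <- getLabel
  end  <- getLabel
  emit (test ++ ":")
  compileExp exp1
  compileExp exp2
  emit ("if_icmpge " ++ end)
  compileStmOpt body
  emit ("goto " ++ test)
  emit (end ++ ":")
compileStmOpt stm = compileStm stm

compileExp :: Exp -> CodeGen ()
compileExp (EInt i)    = emit ("bipush " ++ show i)
compileExp (EVar x)    = emit ("iload " ++ x)   -- address lookup omitted
compileExp (ELt e1 e2) = do                     -- the general comparison code
  true <- getLabel
  emit "bipush 1"
  compileExp e1
  compileExp e2
  emit ("if_icmplt " ++ true)
  emit "pop"
  emit "bipush 0"
  emit (true ++ ":")
```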
#NEW

==Compositionality==

Syntax-directed translation is **compositional** if the value returned for a tree is a function of the values for its immediate subtrees:
```
T (C a#sub1 ... a#subn) = ... T(a#sub1) ... T(a#subn) ...
```
In programming, this means that
- in Haskell, pattern matching does not need patterns deeper than 1
- in Java, one visitor definition per class and function is enough

In Haskell, it would be easy to use **noncompositional** compilation schemes, by deeper patterns:
```
compile (SWhile (ELt exp1 exp2) stm) = ...
```
In Java, another visitor must be written to define what can happen depending on the condition part of ``while``.

Another approach: **back-end optimization** of the generated code: run through the code and look for code fragments that can be improved.

#NEW

==References and objects==

An integer is one word, a double is two - but how big is a linked list? Or a ``TypeChecker``?

How does JVM (or any machine) deal with objects whose size is not known?

Even worse: the size of objects may change at runtime (e.g. elements can be added to a list).

The stack can of course be arbitrarily large - but the memory needed by a variable must be known when it is allocated on the stack. This is //before// we know what values it will contain!

Solution: use **indirect addressing**. On the stack, store a **reference** to an object that itself is stored in the **heap**. The heap is a separate part of the memory, which is not ordered like a stack.

#NEW

==Stack vs. heap allocation==

The reference gives the address of the object in the heap.

In the heap, the object can be stored discontinuously, divided into many parts. Each part then includes the address of the next part; when the list ends, the address is ``null``.

A linked list is a clear example.
```
  STACK                            HEAP

  ..
  [x1, x2, x3] : addr to x1 -----> x1
  ..                               addr to x2 -----> x2
                                                     addr to x3 -----> x3
                                                                       null
```
Graphics convention: the stack grows downwards, the heap grows upwards.

#NEW

==How to free memory from the stack and the heap==

Objects on the stack are removed automatically when they are no longer used:
- the arguments of functions and arithmetic operations are popped when the value is returned
- the local variables of a function are popped when the function returns

Objects on the heap are not removed in this way: only the references to them from the stack are.

Example: a function ``prime``, which returns the //n//th prime by building a list of the //n// first primes,
```
int prime(int n)
{
  LinkedList primes ;
  //...
}
```
When returning from the function, how do you know which parts of the heap contain parts of ``primes`` and are no longer needed?

#NEW

==Garbage collection==

In C (and, to a lesser extent, C++), heap-allocated memory should be freed by using the function ``free``.

In Java, memory is freed automatically by **garbage collection**.

Garbage = values in memory no longer used by the program.

A garbage collection routine is a part of the interpreter. It can be part of a compiler, too: in addition to the compiled native code, the compiler provides it as a part of a **runtime system**.

C and C++ compilers do not generally provide one. Haskell compilers do.

#NEW

==Mark-and-sweep garbage collection==

Book 7.6.1

To understand how garbage collection works, let us consider a simple program that does it.
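The following is a minimal sketch of such a program in Haskell, over a made-up heap representation: the heap is a map from addresses to blocks, each block carrying a mark bit and the addresses of the blocks it points to. A real collector works on raw memory, but the structure is the same.
```
import qualified Data.Map as M

type Addr  = Int
data Block = Block { marked :: Bool, pointers :: [Addr] }
type Heap  = M.Map Addr Block
type Stack = [Addr]               -- only the heap pointers found on the stack

-- roots: find the heap pointers in the stack
roots :: Stack -> [Addr]
roots = id                        -- here the stack already contains just pointers

-- mark: recursively mark all blocks reachable from the given addresses
mark :: [Addr] -> Heap -> Heap
mark []       heap = heap
mark (a : as) heap = case M.lookup a heap of
  Just b | not (marked b) ->
    mark (pointers b ++ as) (M.insert a (b { marked = True }) heap)
  _ -> mark as heap

-- sweep: free unmarked memory blocks and unmark the marked ones
sweep :: Heap -> Heap
sweep = M.map (\b -> b { marked = False }) . M.filter marked

gc :: Stack -> Heap -> Heap
gc stack heap = sweep (mark (roots stack) heap)
```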
The algorithm consists of three functions:
- **roots**: find the heap pointers in the stack
- **mark**: recursively mark as true all blocks that are addressed
- **sweep**: free unmarked memory blocks and unmark marked blocks

Problems:
- when and how often to run it?
- how to prevent the fragmentation of memory?

There are more developed garbage collection algorithms that address these issues. Modern garbage collectors are difficult to beat with hand-written memory management.

#NEW

==Peephole optimization==

Improve the generated code by looking at chunks of e.g. 3 instructions (this is called the peephole window).

Reduce the number of instructions e.g. by **constant folding**,
```
bipush 6
bipush 9   ===>   bipush 15
iadd
```
or change them to cheaper ones, e.g.
```
bipush 15   ===>   bipush 15
bipush 15          dup
```
Iteration gives the best effect:
```
bipush 3
bipush 6          bipush 3
bipush 9   ===>   bipush 15   ===>   bipush 45
iadd              imul
imul
```
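A sketch of such a pass in Haskell, with a made-up instruction type and only the rewrite rules shown above; the pass is iterated until nothing changes any more.
```
data Instr = Bipush Integer | Dup | IAdd | IMul
  deriving (Eq, Show)

-- One pass with a window of at most three instructions.
peephole :: [Instr] -> [Instr]
peephole (Bipush a : Bipush b : IAdd : is) = peephole (Bipush (a + b) : is)
peephole (Bipush a : Bipush b : IMul : is) = peephole (Bipush (a * b) : is)
peephole (Bipush a : Bipush b : is)
  | a == b                                 = peephole (Bipush a : Dup : is)
peephole (i : is)                          = i : peephole is
peephole []                                = []

-- Iterate to a fixpoint, so that one folding step can enable the next.
optimize :: [Instr] -> [Instr]
optimize is = let is' = peephole is in if is' == is then is else optimize is'

-- optimize [Bipush 3, Bipush 6, Bipush 9, IAdd, IMul]  ==  [Bipush 45]
```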
#NEW

==Assembling JVM==

The notation we have used in this lecture is from the [Jasmin http://jasmin.sourceforge.net/] JVM assembler.

Things done by the assembler:
- change **opcodes** and their arguments to bytes
- change labels to positions in the byte array
- put constants into a **global constant pool**

We have created a simple script [``jass`` ../exercises/jvm/jass], which reads the code for a ``main`` function and embeds it into a class called ``Main``. Then it calls ``jasmin``.

In Exercise 6, you can build a compiler by using the compilation schemes of this lecture. You can then ``jass`` the output of the compiler to produce a file ``Main.class``. This file can be run with
```
java Main
```

#NEW

==Compiling for Intel X86==

The most common processor family in the world. Originally used in the "IBM PC"; historically:
- 8088, 8086. Real mode: any program may access any memory address
- 80286 ("AT"). Main novelty: protected mode
- 80386. Main novelty: 32-bit registers
- 80486DX, Pentium, Pentium II, III, IV, ... Main novelty: integrated math coprocessor with eight 80-bit registers

The machines are backward compatible with older code, but operating systems will forbid certain things, such as accessing the memory of other processes.

#NEW

==Registers vs. memory==

Registers are places for data inside the CPU.
- + up to 10 times faster access than to main memory
- + fewer instructions needed
- - expensive; typically just 32 of them in a 32-bit CPU
- - difficult to use optimally
- - efficient cache memory is making them less central in compiler optimizations

Arithmetic operations must have their operands and return their values in registers.

All modern native code assembly languages use registers.

#NEW

==From assembler to machine language==

Machine language is just bytes. For instance, in x86 we have an instruction
```
00000011 11000011
```
This is usually given in hex code:
```
03 C3
```
This can be generated from the assembly instruction
```
add eax, ebx
```
In source code notation, this would be written
```
eax = eax + ebx
```
The variables ``eax`` and ``ebx`` are names of integer registers.

#NEW

===More examples===

Addition and subtraction: two-address instructions with two registers.
```
add eax, ebx        ; eax = eax + ebx
sub eax, ebx        ; eax = eax - ebx
```
Immediate values (integers) and memory addresses are also possible as source (but not as target):
```
add eax, 5          ; eax = eax + 5
add eax, [ebp - 8]  ; eax = eax + [ebp - 8]
```
The memory address ``[ebp - 8]`` is at an offset of -8 from the address stored in the register ``ebp``.

Multiplication and division are more complicated, e.g.
```
div a               ; eax = (edx:eax) / a
                    ; edx = (edx:eax) % a
```
Move instructions can also have a memory address as target, but not as both source and target.
```
mov [ebp - 12], eax
```
Labels and jumps are as in JVM, e.g.
```
cmp eax, ebx        ; compare eax with ebx
jge Label           ; if eax >= ebx, jump to Label
```

#NEW

===Example===

Source code
```
int i = 3 ;
while(i < 17)
  {
    i = i + 2 ;
  }
int j = i ;
```
Compiled code
```
      mov [ebp - 8], 3    ; i -> [ebp - 8], i = 3
      mov eax, [ebp - 8]  ; eax = i
Test:
      mov ebx, 17         ; ebx = 17
      cmp eax, ebx        ; compare i with 17
      jge End
      add eax, 2          ; eax = i = i + 2
      jmp Test
End:
      mov [ebp - 12], eax ; j -> [ebp - 12], j = i
      mov [ebp - 8], eax  ; save i in memory
```
Notice the optimization: during the loop, ``i`` is kept only in a register; it is saved back to memory at the end.

#NEW

===Compiling to native code===

Compilation schemes are similar to those for JVM. Typically one does not compile directly to native code, but to **intermediate code** with infinitely many **virtual registers**.

**2-address code** is similar to x86: one target and one source register. The scheme returns the register where the value is found.
```
compile (exp1 + exp2) =
  r1 := compile exp1
  r2 := compile exp2
  emit (r1 += r2)
  return r1
```
But often the intermediate language is **3-address code**, with two source registers and a separate target register.
```
compile (exp1 + exp2) =
  r0 := newReg
  r1 := compile exp1
  r2 := compile exp2
  emit (r0 = r1 + r2)
  return r0
```

#NEW

==Register allocation==

The step from infinitely many virtual registers to the limited number of registers in a real machine is called register allocation.

Main constraint: variables that are **live** at the same time cannot be kept in the same register.

Example: how many registers are needed for ``x, y, z``?
```
int x = 1 ;       // x alive
int y = 2 ;       // x, y alive
printInt(y) ;     // x, y alive
int z = 3 ;       // x, z alive
printInt(x + z) ; // x, z alive
```
Answer: two, because ``y`` and ``z`` can be kept in the same register.

#NEW

==Assembling for Intel X86==

Excellent free book: [A Tutorial on PC assembly http://www.drpaulcarter.com/pcasm/] by Paul Carter. The book uses NASM, the Netwide Assembler, which is also available through the book's web page.

The result of compilation is first assembled and then **linked**.
```
compile Foo.cc                       -- generate assembler
nasm -f elf Foo.asm                  -- assemble to object file
gcc -o Foo driver.o Foo.o asm_io.o   -- link to executable
```
The linker combines object files and resolves the names in them.

**Dynamic linking**: some names are only linked at runtime. This makes it possible to have library code in one location, instead of it being copied to all applications.
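Returning to the register allocation example above, here is a small sketch in Haskell of the liveness constraint and a greedy allocation. The representation is made up for illustration: each variable is listed with the lines on which it is live.
```
type Var = String
type LiveRange = (Var, [Int])     -- the lines on which the variable is live

-- The x, y, z example from the register allocation section.
example :: [LiveRange]
example = [("x", [1,2,3,4,5]), ("y", [2,3]), ("z", [4,5])]

-- Two variables interfere if they are live on a common line.
interfere :: LiveRange -> LiveRange -> Bool
interfere (_, ls1) (_, ls2) = any (`elem` ls2) ls1

-- Greedily give each variable the lowest register not taken by an
-- interfering variable that has already been allocated.
allocate :: [LiveRange] -> [(Var, Int)]
allocate = go []
  where
    go done []       = [(x, r) | ((x, _), r) <- done]
    go done (v : vs) =
      let used = [r | (w, r) <- done, interfere v w]
          reg  = head [r | r <- [0 ..], r `notElem` used]
      in  go (done ++ [(v, reg)]) vs

-- allocate example  ==  [("x",0),("y",1),("z",1)]   -- two registers suffice
```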