Lecture 9: Code Generation
Programming Languages Course
Aarne Ranta (aarne@chalmers.se)
%!target:html
%!postproc(html): #NEW
%!postproc(html): #HR
%!postproc(html): #sub1 1
%!postproc(html): #subn1 n-1
%!postproc(html): #subn n
Book: 6.2, 6.4, 6.6, 7.1, 7.4, 7.5, 7.6
#NEW
==Plan==
The Semantic Gap
Compilation schemes
JVM code generation
Stack and heap memory, garbage collection
Back-end optimization
Native code generation for Intel x86
#NEW
==The goals of the lecture==
JVM: to learn the basic techniques of compilation
- even in practice: [Exercise 5 ../exercises/06-exx.html]
Native code: to get an idea of how programs are made to work
on "bare silicon"
- in practice: the Compiler Construction course (Period 4)
#NEW
==The compiler is a translator==
Reminder: a compiler **translates** code from one format into another -
**source code** into **target code**.
Example: C++ source code.
```
int i = 1 ;
int j = i + 2 ;
printInt(i + j) ;
```
Translated to JVM target code:
```
bipush 1
dup
istore 0
bipush 2
iadd
istore 1
iload 0
iload 1
iadd
invokestatic runtime/iprint(I)V
```
#NEW
==Semantic gap==
Source and target code have different kinds of constructions: there is a
**semantic gap** between them.
|| high-level code | machine code ||
| statement | instructions |
| expression | instructions |
| variable | memory address, register |
| value | bit vector |
| type | memory layout |
| control structure | jump |
| function | subroutine |
| tree structure | linear structure |
Typically, one statement/expression translates to many instructions.
```
x + 3 ==> iload 0
bipush 3
iadd
```
#NEW
==Compilation schemes==
Syntax-directed translation rules that generate target code
from source code
```
// integer addition
compile (exp1 + exp2) =
  compile exp1
  compile exp2
  emit iadd

// integer multiplication
compile (exp1 * exp2) =
  compile exp1
  compile exp2
  emit imul

// integer literal, one signed byte
compile i =
  emit (bipush i)
```
The rules go recursively to subtrees, and call ``emit`` to
generate new instructions.
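For concreteness, this is how the schemes above might look in Haskell,
collecting the code in a writer monad (a sketch: the abstract syntax and
the names ``Gen`` and ``emit`` are assumptions, not the course's actual code):
```
import Control.Monad.Writer

data Exp = EInt Integer | EAdd Exp Exp | EMul Exp Exp

type Gen = Writer [String]   -- collects the emitted instructions

emit :: String -> Gen ()
emit i = tell [i]

compileExp :: Exp -> Gen ()
compileExp (EAdd e1 e2) = do compileExp e1; compileExp e2; emit "iadd"
compileExp (EMul e1 e2) = do compileExp e1; compileExp e2; emit "imul"
compileExp (EInt i)     = emit ("bipush " ++ show i)

-- execWriter (compileExp (EAdd (EInt 1) (EInt 2)))
--   ==>  ["bipush 1","bipush 2","iadd"]
```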
But how do we know that we should call ``iadd`` and not
``dadd`` (double addition)?
#NEW
==Type annotations==
Code generation must know the types of arithmetic operations,
variables, etc. The simplest way to guarantee this is to use
an annotating type inference,
```
Exp annotExp (Env env, Exp exp) =
  typ := inferExp env exp
  return typed(exp)
```
Then we can generate code as follows:
```
// addition
compile typed(exp1 + exp2) =
  compile exp1
  compile exp2
  if typ == int
    emit iadd
  else
    emit dadd
```
An alternative is to compute the types of expressions in the
code generator - but this duplicates the work done in the type
checker.
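As a sketch, type-directed instruction choice could look as follows in
Haskell (the ``ETyped`` representation and all names are invented for
illustration):
```
import Control.Monad.Writer

data Type = TInt | TDouble
data Exp  = EAdd Exp Exp | EInt Integer | ETyped Type Exp

type Gen = Writer [String]

emit :: String -> Gen ()
emit i = tell [i]

compileExp :: Exp -> Gen ()
compileExp (ETyped typ (EAdd e1 e2)) = do
  compileExp e1
  compileExp e2
  case typ of                    -- the annotation decides the instruction
    TInt    -> emit "iadd"
    TDouble -> emit "dadd"
compileExp (ETyped _ (EInt i)) = emit ("bipush " ++ show i)
compileExp _ = error "compileExp: expected a type-annotated tree"
```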
#NEW
==Machine code constructs that depend on types==
- Arithmetic operations:
``iadd, imul, isub, idiv`` vs. ``dadd, dmul, dsub, ddiv``.
- Numeric constants: ``bipush 7`` vs. ``ldc2_w 7.0``.
- Load and store instructions: ``iload, istore`` vs. ``dload, dstore``
- Memory usage: integers take one slot on the stack, doubles take two.
- Comparisons, return statements,...
We will simplify the presentation by only considering integers.
Booleans are treated similarly to integers.
#NEW
==Variable addresses==
The compiler needs an environment to keep the addresses of variables
needed in load and store instructions.
The addresses are integers starting from ``0``; each declaration increments the next free address.
The environment has a block structure, because variables
declared in different blocks are stored at different addresses.
```
int i ;     // i -> 0
int j ;     // j -> 1
{
  int k ;   // k -> 2
  int j ;   // j -> 3
}
int m ;     // m -> 2
```
Exit from a block frees the addresses reserved in the block.
#NEW
==Compiling declarations and statements==
We assume an environment where we can
- ``lookup`` - the address of a variable
- ``addVar`` - add a variable to the next available address
- ``discard`` - a number of variables at exit from a block
```
// declarations
compile (int x ;) =
  addVar x

// variable expressions
compile (x) =
  addr := lookup x
  emit (iload addr)

// assignments
compile (x = exp) =
  compile exp
  addr := lookup x
  emit (istore addr)

// blocks
compile ({ stms }) =
  for all stm in stms:
    compile stm
  discard (number of variables declared in stms)
```
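For instance, the environment could be sketched in Haskell as follows
(the representation is invented; ``lookupVar``, ``addVar`` and ``exitBlock``
correspond to ``lookup``, ``addVar`` and ``discard`` above):
```
import qualified Data.Map as M

data Env = Env
  { scopes   :: [M.Map String Int]   -- one address map per enclosing block
  , nextAddr :: Int                  -- next available address
  }

addVar :: String -> Env -> Env
addVar x (Env (s : outer) a) = Env (M.insert x a s : outer) (a + 1)
addVar _ env                 = env   -- no open block: nothing to do

lookupVar :: String -> Env -> Int    -- innermost declaration wins
lookupVar x (Env ss _) = head [a | s <- ss, Just a <- [M.lookup x s]]

newBlock :: Env -> Env
newBlock (Env ss a) = Env (M.empty : ss) a

exitBlock :: Env -> Env              -- discard: frees the block's addresses
exitBlock (Env (s : outer) a) = Env outer (a - M.size s)
exitBlock env                 = env
```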
#NEW
==Labels and jumps==
A **label** is an identifier marking a position in the code.
A **jump** is the instruction ``goto LABEL`` which makes the
interpreter continue from that position, with the same stack as before.
In small step semantics:
```
<goto L , S>  -->  <L , S>
```
Example (my first BASIC program):
```
BEGIN:
  bipush 66
  invokestatic runtime/iprint(I)V
  goto BEGIN
```
Notice:
- indentation is just a convention, not obligatory
- labels are not instructions in the bytecode, just indices in the code array
calculated by the assembler
- all labels in the code must be unique
- ``invokestatic`` calls a function (static method), indicating its
class (``runtime``), name (``iprint``), and type (``(I)V``)
#NEW
==Compiling while loops==
Idea:
```
while (exp)   ===>   TEST:
  stm                  evaluate exp
                       if (exp==0) goto END
                       execute stm
                       goto TEST
                     END:
```
Compilation scheme:
```
compile (while (exp) stm) =
  TEST := getLabel
  END  := getLabel
  emit TEST
  compile exp
  emit (ifeq END)
  compile stm
  emit (goto TEST)
  emit END
```
The code uses the **conditional jump** ``ifeq LABEL``, with semantics
```
<ifeq L , S.0>  -->  <L , S>
<ifeq L , S.v>  -->  <next instruction , S>   if v != 0
```
Notice that the jump can go to an earlier position in the code.
#NEW
==Another way to compile while loops==
```
while (exp)   ===>     goto TEST
  stm                LOOP:
                       execute stm
                     TEST:
                       evaluate exp
                       if (exp != 0) goto LOOP
```
This version executes only one jump per iteration (after the initial ``goto``), instead of two.
#NEW
==Compiling conditionals==
Idea:
```
if (exp)    ===>   evaluate exp
  stm1             if (exp==0) goto FALSE
else               execute stm1
  stm2             goto TRUE
                 FALSE:
                   execute stm2
                 TRUE:
```
Compilation scheme:
```
compile (if (exp) stm1 else stm2) =
  TRUE  := getLabel
  FALSE := getLabel
  compile exp
  emit (ifeq FALSE)
  compile stm1
  emit (goto TRUE)
  emit FALSE
  compile stm2
  emit TRUE
```
#NEW
==Compiling comparisons==
JVM has no comparison operations that return values, and no booleans.
If we want to get the value of ``exp1 < exp2``, we execute code corresponding to
```
if (exp1 < exp2) 1 ; else 0 ;
```
We use the conditional jump ``if_icmplt LABEL``, which compares the two
elements at the top of the stack and jumps if the second-last is less
than the last:
```
<if_icmplt L , S.a.b>  -->  <L , S>                  if a < b
<if_icmplt L , S.a.b>  -->  <next instruction , S>   otherwise
```
We can use code that first pushes 1 on the stack; the 1 is popped
and replaced by 0 if the comparison does not succeed.
```
bipush 1
exp1
exp2
if_icmplt TRUE
pop
bipush 0
TRUE:
```
#NEW
==Comparisons in while loop conditions==
Putting together the compilation of comparisons and ``while`` loops gives
terrible spaghetti code:
```
while (x < 9) stm ===>

TEST:
  bipush 1
  iload 0
  bipush 9
  if_icmplt TRUE
  pop
  bipush 0
TRUE:
  ifeq END
  stm
  goto TEST
END:
```
Couldn't we use the ``if_icmplt`` comparison directly in the ``while`` jump?
#NEW
==Optimizing labels and jumps==
Yes - by jumping on the negation of ``if_icmplt``, i.e. ``if_icmpge``:
recall that ``!(a < b) == (a >= b)``.
```
while (x < 9) stm ===>

TEST:
  iload 0
  bipush 9
  if_icmpge END
  stm
  goto TEST
END:
```
The problem is: how can we get this code by using the compilation schemes?
#NEW
==Compositionality==
Syntax-directed translation is **compositional** if the value returned
for a tree is a function of the values for its immediate subtrees:
```
T (C a#sub1 ... a#subn) = ... T(a#sub1) ... T(a#subn) ...
```
In programming, this means that
- in Haskell, pattern matching does not need patterns deeper than 1
- in Java, one visitor definition per class and function is enough
In Haskell, it would be easy to use **noncompositional** compilation schemes,
by using deeper patterns:
```
compile (SWhile (ELt exp1 exp2) stm) = ...
```
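Spelled out, such a noncompositional scheme might look as follows (a sketch
with invented names; the deep pattern emits the negated comparison jump
directly, as in the optimized ``while`` code above):
```
import Control.Monad.State

data Exp = ELt Exp Exp | EVar Int | EInt Integer
data Stm = SWhile Exp Stm | SBlock [Stm]

type Gen = State Int                 -- counter for fresh labels

newLabel :: Gen String
newLabel = do i <- get; put (i + 1); return ("L" ++ show i)

compileExp :: Exp -> Gen [String]
compileExp (EVar a) = return ["iload " ++ show a]
compileExp (EInt i) = return ["bipush " ++ show i]
compileExp (ELt e1 e2) = do          -- general case: leave 1 or 0 on the stack
  true <- newLabel
  c1 <- compileExp e1
  c2 <- compileExp e2
  return $ ["bipush 1"] ++ c1 ++ c2 ++
           ["if_icmplt " ++ true, "pop", "bipush 0", true ++ ":"]

compileStm :: Stm -> Gen [String]
-- deep pattern: a comparison in condition position becomes a negated jump
compileStm (SWhile (ELt e1 e2) stm) = do
  test <- newLabel
  end  <- newLabel
  c1   <- compileExp e1
  c2   <- compileExp e2
  body <- compileStm stm
  return $ [test ++ ":"] ++ c1 ++ c2 ++ ["if_icmpge " ++ end] ++
           body ++ ["goto " ++ test, end ++ ":"]
-- compositional fallback for other conditions
compileStm (SWhile e stm) = do
  test <- newLabel
  end  <- newLabel
  c    <- compileExp e
  body <- compileStm stm
  return $ [test ++ ":"] ++ c ++ ["ifeq " ++ end] ++
           body ++ ["goto " ++ test, end ++ ":"]
compileStm (SBlock stms) = concat <$> mapM compileStm stms

-- evalState (compileStm prog) 0 gives the instruction list
```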
In Java, another visitor must be written to define what can happen
depending on the condition part of ``while``.
Another approach: **back-end optimization** of the generated code:
run through the code and look for code fragments that can be improved.
#NEW
==References and objects==
An integer is one word, a double is two - but how big is a linked
list? Or a ``TypeChecker``?
How does JVM (or any machine) deal with objects whose
size is not known?
Even worse: the size of objects may change at runtime (e.g.
elements can be added to a list).
The stack can of course be arbitrarily large - but the
memory needed by a variable must be known when it is
allocated on the stack. This is //before// we know
what values it will contain!
Solution: use **indirect addressing**. On the stack, store
a **reference** to an object that itself is stored in the
**heap**. The heap is a separate part of the memory, which
is not ordered like a stack.
#NEW
==Stack vs. heap allocation==
The reference gives the address of the object in the heap.
In the heap, the object can be stored discontinuously, divided
into many parts. Each part then includes the address of the
next part; where the list ends, the address is ``null``.
A linked list is a clear example.
```
STACK                            HEAP
                                   ..
[x1, x2, x3] --addr to x1-->     | x1 | addr to x2 |
                                   ..
                                 | x2 | addr to x3 |
                                   ..
                                 | x3 | null |
                                   ..
```
Graphics convention: the stack grows downwards, the heap grows upwards.
#NEW
==How to free memory from the stack and the heap==
Objects on the stack are removed automatically when they are
no longer used:
- the arguments of functions and arithmetic operations are popped
when the value is returned
- the local variables of a function are popped when the function returns
Objects on the heap are not removed in this way: only the references
to them from the stack are.
Example: a function ``prime``, which returns the //n//th prime
by building a list of the //n// first primes:
```
int prime(int n)
{
LinkedList primes ;
//...
}
```
When returning from the function, how do you know which parts of
the heap contain parts of ``primes`` and are no longer needed?
#NEW
==Garbage collection==
In C (and, to a lesser extent, C++), heap-allocated memory must be
freed explicitly, by using the function ``free``.
In Java, memory is freed automatically by **garbage collection**.
Garbage = values in memory no longer used by the program.
A garbage collection routine is a part of the interpreter.
It can be part of a compiler, too: in addition to the compiled
native code, the compiler provides it as a part of a
**runtime system**.
C and C++ compilers do not generally provide one.
Haskell compilers do.
#NEW
==Mark-and-sweep garbage collection==
Book 7.6.1
To understand how garbage collection works, let us consider a
simple program that does it.
The algorithm consists of three functions:
- **roots**: find the heap pointers in the stack
- **mark**: recursively mark as true all blocks that are addressed
- **sweep**: free unmarked memory blocks and unmark marked blocks
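As a toy model, the three functions can be sketched in Haskell (the heap
representation and all names here are invented for illustration):
```
import qualified Data.IntMap.Strict as IM
import qualified Data.IntSet as IS

-- a heap block stores (among other data) the addresses it points to
newtype Block = Block { pointers :: [Int] }

type Heap = IM.IntMap Block

-- mark: recursively mark all blocks reachable from the roots
mark :: Heap -> [Int] -> IS.IntSet
mark heap = go IS.empty
  where
    go marked []       = marked
    go marked (a : as)
      | a `IS.member` marked = go marked as
      | otherwise = case IM.lookup a heap of
          Nothing -> go marked as
          Just b  -> go (IS.insert a marked) (pointers b ++ as)

-- sweep: free the unmarked blocks (the mark set is simply dropped,
-- which unmarks the survivors)
sweep :: Heap -> IS.IntSet -> Heap
sweep heap marked = IM.filterWithKey (\a _ -> a `IS.member` marked) heap

-- roots: the heap pointers found in the stack, given here as input
collect :: [Int] -> Heap -> Heap
collect roots heap = sweep heap (mark heap roots)
```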
Problems:
- when and how often to run it?
- how to prevent the fragmentation of memory?
There are more developed garbage collection algorithms that
address these issues.
Modern garbage collectors are difficult to beat with hand-written
memory management.
#NEW
==Peephole optimization==
Improve the generated code by looking at chunks of e.g. 3 instructions
(this is called the peephole window).
Reduce the number of instructions e.g. by **constant folding**,
```
bipush 6
bipush 9 ===> bipush 15
iadd
```
or change them to cheaper ones, e.g.
```
bipush 15          bipush 15
bipush 15   ===>   dup
```
Iteration gives the best effect:
```
bipush 3
bipush 6           bipush 3
bipush 9    ===>   bipush 15   ===>   bipush 45
iadd               imul
imul
```
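A sketch of such an iterated peephole pass in Haskell (the instruction type
and the rules are invented for illustration):
```
data Instr = Bipush Integer | Dup | Iadd | Imul
  deriving (Eq, Show)

-- one pass over the code, rewriting small windows of instructions
peephole :: [Instr] -> [Instr]
peephole (Bipush a : Bipush b : Iadd : is) = peephole (Bipush (a + b) : is) -- constant folding
peephole (Bipush a : Bipush b : Imul : is) = peephole (Bipush (a * b) : is) -- constant folding
peephole (Bipush a : Bipush b : is)
  | a == b                                 = peephole (Bipush a : Dup : is) -- cheaper instruction
peephole (i : is)                          = i : peephole is
peephole []                                = []

-- iterate until a fixpoint is reached
optimize :: [Instr] -> [Instr]
optimize is = let is' = peephole is in if is' == is then is else optimize is'

-- optimize [Bipush 3, Bipush 6, Bipush 9, Iadd, Imul]  ==>  [Bipush 45]
```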
#NEW
==Assembling JVM==
The notation we have used in this lecture is from the
[Jasmin http://jasmin.sourceforge.net/] JVM assembler.
Things done by the assembler:
- change **opcodes** and their arguments to bytes
- change labels to positions in the byte array
- put constants into a **global constant pool**
We have created a simple script [``jass`` ../exercises/jvm/jass],
which reads the code for a ``main`` function and
embeds it into a class called ``Main``. Then it calls ``jasmin``.
In Exercise 6, you can build a compiler by using the
compilation schemes of this lecture. You can then
``jass`` the output of the compiler to produce a file
``Main.class``. This file can be run with
```
java Main
```
#NEW
==Compiling for Intel X86==
The most common processor family in the world - originally the
processor of the "IBM PC". Historically:
- 8088, 8086. Real mode: any program may access any memory address
- 80286 ("AT"). Main novelty: protected mode
- 80386. Main novelty: 32-bit registers.
- 80486DX, Pentium, Pentium II, III, IV...
Main novelty: integrated math coprocessor with eight 80-bit registers
The machines are backward compatible with older code, but operating
systems will forbid certain things, such as accessing the memory
of other processes.
#NEW
==Registers vs. memory==
Registers are places for data inside the CPU.
- + up to 10 times faster access than to main memory
- + fewer instructions needed
- - expensive; typically just 32 of them in a 32-bit CPU
- - difficult to use optimally
- - efficient cache memory is making them less central in
compiler optimizations
Arithmetic operations typically take their operands from registers and
return their values in registers.
All modern native code assembly languages use registers.
#NEW
==From assembler to machine language==
Machine language is just bytes. For instance, in x86 we have
an instruction
```
00000011 11000011
```
This is usually given in hex code:
```
03 C3
```
This can be generated from the assembly instruction
```
add eax, ebx
```
In source code notation, this would be written
```
eax = eax + ebx
```
The variables ``eax`` and ``ebx`` are names of integer registers.
#NEW
===More examples===
Addition and subtraction: two-address instructions with two registers.
```
add eax, ebx ; eax = eax + ebx
sub eax, ebx ; eax = eax - ebx
```
Immediate values (integers) and memory addresses are also possible
as the source (an immediate can of course not be the target):
```
add eax, 5 ; eax = eax + 5
add eax, [ebp - 8] ; eax = eax + [ebp - 8]
```
The memory address ``[ebp - 8]`` is the address stored in the
register ``ebp``, minus 8.
Multiplication and division are more complicated, e.g.
```
div a    ; eax = (edx:eax) / a ; edx = (edx:eax) % a
```
Move instructions can also have a memory address as target,
but not as both source and target.
```
mov [ebp - 12], eax
```
Labels and jumps are as in JVM, e.g.
```
cmp eax, ebx ; compare eax with ebx
jge Label ; if eax >= ebx, jump to Label
```
#NEW
===Example===
Source code
```
int i = 3 ;
while(i < 17) {
  i = i + 2 ;
}
int j = i ;
```
Compiled code
```
    mov [ebp - 8], 3    ; i -> [ebp - 8], i = 3
    mov eax, [ebp - 8]  ; eax = i
Test:
    mov ebx, 17         ; ebx = 17
    cmp eax, ebx        ; cmp i 17
    jge End
    add eax, 2          ; eax = i = i + 2
    jmp Test
End:
    mov [ebp - 12], eax ; j -> [ebp - 12], j = i
    mov [ebp - 8], eax  ; save i in memory
```
Notice the optimization: during the loop, ``i`` is kept only in a register;
it is written back to memory at the end.
#NEW
===Compiling to native code===
Compilation schemes are similar to those for JVM.
Typically one compiles not directly to native code, but to
**intermediate code** with infinitely many
**virtual registers**.
**2-address code** is similar to x86: one target and one source register.
The scheme returns a register where the value is found.
```
compile (exp1 + exp2) =
  r1 := compile exp1
  r2 := compile exp2
  emit (r1 += r2)
  return r1
```
But often the intermediate language is
**3-address code**, with two source registers.
```
compile (exp1 + exp2) =
  r0 := newReg
  r1 := compile exp1
  r2 := compile exp2
  emit (r0 = r1 + r2)
  return r0
```
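As a sketch, the 3-address scheme can be written in Haskell with a counter
supplying the virtual registers (all names are assumptions):
```
import Control.Monad.State

data Exp = EInt Integer | EAdd Exp Exp

type Reg = Int                      -- infinitely many virtual registers
type Gen = State (Reg, [String])    -- next free register, emitted code

newReg :: Gen Reg
newReg = do (r, code) <- get; put (r + 1, code); return r

emit :: String -> Gen ()
emit i = modify (\(r, code) -> (r, code ++ [i]))

compileExp :: Exp -> Gen Reg        -- returns the register holding the value
compileExp (EInt i) = do
  r <- newReg
  emit ("r" ++ show r ++ " = " ++ show i)
  return r
compileExp (EAdd e1 e2) = do
  r0 <- newReg
  r1 <- compileExp e1
  r2 <- compileExp e2
  emit ("r" ++ show r0 ++ " = r" ++ show r1 ++ " + r" ++ show r2)
  return r0

-- snd (execState (compileExp (EAdd (EInt 1) (EInt 2))) (0, []))
--   ==>  ["r1 = 1","r2 = 2","r0 = r1 + r2"]
```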
#NEW
==Register allocation==
The step from infinitely many virtual registers to the
limited number of registers in a real machine is called **register allocation**.
Main constraint: variables that are **live** at the same
time cannot be kept in the same register.
Example: how many registers are needed for ``x, y, z``?
```
int x = 1 ;       // x alive
int y = 2 ;       // x, y alive
printInt(y) ;     // x, y alive
int z = 3 ;       // x, z alive
printInt(x + z) ; // x, z alive
```
Answer: two, because ``y`` and ``z`` can be kept in the same
register.
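The liveness comments above can be computed mechanically. A minimal Haskell
sketch for straight-line code (the statement representation is invented):
```
import Data.List (delete, nub)

data Stm = Def String [String]   -- defines a variable, using some others
         | Use [String]          -- only uses variables

-- live variables before each statement, computed backwards:
-- live = (live after \ defined) `union` used
liveness :: [Stm] -> [[String]]
liveness = foldr step [[]]
  where
    step (Def x us) (l : ls) = nub (us ++ delete x l) : l : ls
    step (Use us)   (l : ls) = nub (us ++ l) : l : ls
    step _          []       = []   -- unreachable: the accumulator is never empty

-- the example above:
example :: [Stm]
example =
  [ Def "x" []       -- int x = 1 ;
  , Def "y" []       -- int y = 2 ;
  , Use ["y"]        -- printInt(y) ;
  , Def "z" []       -- int z = 3 ;
  , Use ["x", "z"]   -- printInt(x + z) ;
  ]

-- liveness example ==> [[],["x"],["y","x"],["x"],["x","z"],[]]
-- At most two variables are live at once, so two registers suffice.
```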
#NEW
==Assembling for Intel X86==
Excellent free book:
[A Tutorial on PC assembly http://www.drpaulcarter.com/pcasm/]
by Paul Carter.
The book uses NASM, the Netwide Assembler, which is also available
through the book's web page.
The result of compilation is first assembled and then
**linked**.
```
compile Foo.cc -- generate assembler
nasm -f elf Foo.asm -- assemble to object file
gcc -o Foo driver.o Foo.o asm_io.o -- link to executable
```
The linker combines object files and resolves names in them.
**Dynamic linking**: some names are only linked at runtime.
This makes it possible to keep library code in one
location, instead of copying it into every application.