Book: 6.2, 6.4, 6.6, 7.1, 7.4, 7.5, 7.6
The Semantic Gap
Compilation schemes
JVM code generation
Stack and heap memory, garbage collection
Back-end optimization
Native code generation for Intel x86
JVM: to learn the basic techniques of compilation
Native code: to get an idea of how programs are made to work on "bare silicon"
Reminder: a compiler translates the code into another format - source code into target code.
Example: C++ source code.
int i = 1 ;
int j = i + 2 ;
printInt(i + j) ;
Translated to JVM target code:
bipush 1
dup
istore 0
bipush 2
iadd
istore 1
iload 0
iload 1
iadd
invokestatic runtime/iprint(I)V
Different kinds of constructions: semantic gap between source and target code.
| high-level code | machine code |
|---|---|
| statement | instructions |
| expression | instructions |
| variable | memory address, register |
| value | bit vector |
| type | memory layout |
| control structure | jump |
| function | subroutine |
| tree structure | linear structure |
Typically, one statement/expression translates to many instructions.
x + 3 ===>
  iload 0
  bipush 3
  iadd
Syntax-directed translation rules that generate target code from source code
// integer addition
compile (exp1 + exp2) =
  compile exp1
  compile exp2
  emit iadd

// integer multiplication
compile (exp1 * exp2) =
  compile exp1
  compile exp2
  emit imul

// integer literal, one signed byte
compile i =
  emit (bipush i)
The rules go recursively into the subtrees, and call emit to generate new instructions.
But how do we know that we should call iadd and not dadd (double addition)?
Code generation must know the types of arithmetic operations, variables, etc. The simplest way to guarantee this is to use an annotating type inference, which wraps every expression with its type:
Exp annotExp (Env env, Exp exp) =
  typ := inferExp env exp
  return typed<typ>(exp)
Then we can generate code as follows:
// addition
compile typed<typ>(exp1 + exp2) =
  compile exp1
  compile exp2
  if typ == int
    emit iadd
  else
    emit dadd
An alternative is to compute the types of expressions in the code generator, but this duplicates the work done in the type checker.
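The type-directed compilation scheme above can be sketched in Python. This is an illustrative sketch, not the course's actual code: the AST encoding as tuples and the function name `compile_exp` are invented here; the instruction names follow the lecture.

```python
# Hypothetical sketch of a type-annotated code generator. Every expression
# carries its type, so the compiler can choose iadd vs. dadd, etc.

def compile_exp(exp, code):
    """Recursively compile a typed expression tree, appending instructions."""
    kind = exp[0]
    if kind == "lit":                      # ("lit", typ, value)
        _, typ, value = exp
        code.append(f"bipush {value}" if typ == "int" else f"ldc2_w {value}")
    elif kind in ("+", "*"):               # (op, typ, exp1, exp2)
        op, typ, e1, e2 = exp
        compile_exp(e1, code)
        compile_exp(e2, code)
        prefix = "i" if typ == "int" else "d"
        code.append(prefix + ("add" if op == "+" else "mul"))
    return code

# 1 + 2 * 3, with every subexpression annotated with type int
ast = ("+", "int",
       ("lit", "int", 1),
       ("*", "int", ("lit", "int", 2), ("lit", "int", 3)))
print(compile_exp(ast, []))
# → ['bipush 1', 'bipush 2', 'bipush 3', 'imul', 'iadd']
```

Note how the recursion produces postfix order: operand code first, then the operator instruction, exactly as the stack machine needs it.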
The choice of instruction depends on the type:

- iadd, imul, isub, idiv vs. dadd, dmul, dsub, ddiv
- bipush 7 vs. ldc2_w 7.0
- iload, istore vs. dload, dstore
We will simplify the presentation by only considering integers.
Booleans are treated similarly to integers.
The compiler needs an environment to keep the addresses of variables needed in load and store instructions.
The addresses are integers starting from 0. Each declaration increments the next available address.
The environment has a block structure, because variables declared in different blocks are stored at different addresses.
int i ;     // i -> 0
int j ;     // j -> 1
{
  int k ;   // k -> 2
  int j ;   // j -> 3
}
int m ;     // m -> 2
Exit from a block frees the addresses reserved in the block.
We assume an environment with the operations

- lookup: find the address of a variable
- addVar: add a variable at the next available address
- discard: remove a number of variables at exit from a block
// declarations
compile (int x ;) =
  addVar x

// variable expressions
compile (x) =
  addr := lookup x
  emit (iload addr)

// assignments
compile (x = exp) =
  compile exp
  addr := lookup x
  emit (istore addr)

// blocks
compile ({ stms }) =
  for all stm in stms: compile stm
  discard (number of variables declared in stms)
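The block-structured environment can be sketched in Python. The class and method names here (`Env`, `add_var`, `enter_block`, `exit_block`) are illustrative stand-ins for the operations assumed above:

```python
# A minimal sketch of the compiler's variable environment with block structure.
# Inner scopes shadow outer ones; exiting a block frees its addresses.

class Env:
    def __init__(self):
        self.scopes = [{}]     # list of scopes, innermost last
        self.next_addr = 0     # next available address

    def add_var(self, x):
        self.scopes[-1][x] = self.next_addr
        self.next_addr += 1

    def lookup(self, x):
        for scope in reversed(self.scopes):   # innermost scope wins
            if x in scope:
                return scope[x]
        raise KeyError(x)

    def enter_block(self):
        self.scopes.append({})

    def exit_block(self):
        # free all addresses reserved in the block
        self.next_addr -= len(self.scopes.pop())

# Replaying the example above:
env = Env()
env.add_var("i"); env.add_var("j")    # i -> 0, j -> 1
env.enter_block()
env.add_var("k"); env.add_var("j")    # k -> 2, inner j -> 3
assert env.lookup("j") == 3           # inner j shadows the outer one
env.exit_block()
env.add_var("m")
print(env.lookup("m"))   # → 2, reusing the freed address
```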
A label is an identifier marking a position in the code. A jump is the instruction goto LABEL, which makes the interpreter continue from that position, with the same stack as before.
In small step semantics:
<goto LABEL, i-C-V-S> --> <C(LABEL), LABEL-C-V-S>
Example (my first BASIC program):
BEGIN:
  bipush 66
  invokestatic runtime/iprint(I)V
  goto BEGIN
Notice: invokestatic calls a function (static method), indicating its class (runtime), name (iprint), and type ((I)V).
Idea:
while (exp) stm ===>
TEST:
  evaluate exp
  if (exp == 0) goto END
  execute stm
  goto TEST
END:
Compilation scheme:
compile (while (exp) stm) =
  TEST := getLabel
  END := getLabel
  emit TEST
  compile exp
  emit (ifeq END)
  compile stm
  emit (goto TEST)
  emit END
The code uses the conditional jump ifeq LABEL, with the semantics

<ifeq LABEL, i-C-V-S.0> --> <C(LABEL), LABEL-C-V-S>
<ifeq LABEL, i-C-V-S.v> --> <C(i+1), (i+1)-C-V-S>   if v != 0
Notice that the jump can go to an earlier position in the code.
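The while scheme above can be sketched in Python. The label counter and the helper names (`get_label`, `compile_while`) are invented for illustration; the subtree compilers are passed in as stand-ins that emit placeholder instructions:

```python
# A sketch of the while-compilation scheme with fresh labels.

label_count = 0

def get_label(prefix):
    """Return a fresh, unique label."""
    global label_count
    label_count += 1
    return f"{prefix}{label_count}"

def compile_while(compile_exp, compile_stm, code):
    test, end = get_label("TEST"), get_label("END")
    code.append(test + ":")
    compile_exp(code)                 # leaves the condition value on the stack
    code.append(f"ifeq {end}")        # jump out of the loop if the value is 0
    compile_stm(code)
    code.append(f"goto {test}")       # back edge: jump to an earlier position
    code.append(end + ":")
    return code

# Placeholder subtree compilers just emit markers:
code = compile_while(lambda c: c.append("<exp>"),
                     lambda c: c.append("<stm>"), [])
print(code)
# → ['TEST1:', '<exp>', 'ifeq END2', '<stm>', 'goto TEST1', 'END2:']
```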
An alternative scheme tests the condition at the end of the loop:

while (exp) stm ===>
  goto TEST
LOOP:
  execute stm
TEST:
  evaluate exp
  if (exp != 0) goto LOOP
Idea:
if (exp) stm1 else stm2 ===>
  evaluate exp
  if (exp == 0) goto FALSE
  execute stm1
  goto TRUE
FALSE:
  execute stm2
TRUE:
Compilation scheme:
compile (if (exp) stm1 else stm2) =
  TRUE := getLabel
  FALSE := getLabel
  compile exp
  emit (ifeq FALSE)
  compile stm1
  emit (goto TRUE)
  emit FALSE
  compile stm2
  emit TRUE
The JVM has no booleans, and no comparison operations that return values. If we want to get the value of exp1 < exp2, we execute code corresponding to

if (exp1 < exp2) 1 ; else 0 ;
We use the conditional jump if_icmplt LABEL, which compares the two topmost elements of the stack and jumps if the second-to-last is less than the last:

<if_icmplt LABEL, i-C-V-S.a.b> --> <C(LABEL), LABEL-C-V-S>   if a < b
<if_icmplt LABEL, i-C-V-S.a.b> --> <C(i+1), (i+1)-C-V-S>     otherwise
We can use code that first pushes 1 on the stack. This is overwritten by 0 if the comparison does not succeed.
bipush 1
exp1
exp2
if_icmplt TRUE
pop
bipush 0
TRUE:
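To see that this pattern really computes 1 or 0, here is a toy interpreter for just the instructions involved (bipush, iload, pop, if_icmplt, and labels). The interpreter itself is an illustrative sketch, not part of the course material:

```python
# Toy stack-machine interpreter for the comparison pattern above.

def run(code, vars):
    """Execute a list of instructions; return the final stack."""
    stack, pc = [], 0
    # Pre-scan label positions ("NAME:" lines)
    labels = {ins[:-1]: i for i, ins in enumerate(code) if ins.endswith(":")}
    while pc < len(code):
        ins = code[pc]; pc += 1
        if ins.endswith(":"):          # labels are not executed
            continue
        op, *args = ins.split()
        if op == "bipush":
            stack.append(int(args[0]))
        elif op == "iload":
            stack.append(vars[int(args[0])])
        elif op == "pop":
            stack.pop()
        elif op == "if_icmplt":
            b, a = stack.pop(), stack.pop()
            if a < b:                  # second-to-last < last: jump
                pc = labels[args[0]]
    return stack

# Code for (x < 9), with x stored at address 0:
code = ["bipush 1", "iload 0", "bipush 9",
        "if_icmplt TRUE", "pop", "bipush 0", "TRUE:"]
print(run(code, {0: 3}), run(code, {0: 12}))   # → [1] [0]
```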
Putting together the compilation of comparisons and while
loops gives
terrible spaghetti code:
while (x < 9) stm ===>
TEST:
  bipush 1
  iload 0
  bipush 9
  if_icmplt TRUE
  pop
  bipush 0
TRUE:
  ifeq END
  stm
  goto TEST
END:
Couldn't we use the if_icmplt comparison directly in the while jump? Yes: we use the negation of if_icmplt, that is, if_icmpge. Recall that !(a < b) == (a >= b).
while (x < 9) stm ===>
TEST:
  iload 0
  bipush 9
  if_icmpge END
  stm
  goto TEST
END:
The problem is: how can we get this code by using the compilation schemes?
Syntax-directed translation is compositional, if the value returned for a tree is a function of the values for its immediate subtrees:
T (C a1 ... an) = ... T(a1) ... T(an) ...
In programming, this means that the translation of a tree uses only the translations of its immediate subtrees, without inspecting their inner structure.
In Haskell, it would be easy to use noncompositional compilation schemes, by deeper patterns:
compile (SWhile (ELt exp1 exp2) stm) = ...
In Java, another visitor must be written to define what happens depending on the condition part of while.
Another approach: back-end optimization of the generated code: run through the code and look for code fragments that can be improved.
An integer is one word, a double is two. But how big is a linked list? Or a TypeChecker?
How does JVM (or any machine) deal with objects whose size is not known?
Even worse: the size of objects may change at runtime (e.g. elements can be added to a list).
The stack can of course be arbitrarily large - but the memory needed by a variable must be known when it is allocated on the stack. This is before we know what values it will contain!
Solution: use indirect addressing. On the stack, store a reference to an object that itself is stored in the heap. The heap is a separate part of the memory, which is not ordered like a stack.
The reference gives the address of the object in the heap.
In the heap, the object can be stored discontinuously, divided into many places. Each part then includes the address of the next part; when the list ends, the address is null.
A linked list is a clear example.
STACK                        HEAP

addr to x1  ---------->  x1 | addr to x2
                         x2 | addr to x3
                         x3 | null

(the list [x1, x2, x3]; the parts may lie anywhere in the heap,
each part storing the address of the next)
Graphics convention: the stack grows downwards, the heap grows upwards.
Objects on the stack are removed automatically when they are no longer used:
Objects on the heap are not removed in this way: only the references to them from the stack are.
Example: a function prime, which returns the nth prime by building a list of the first n primes:

int prime(int n)
{
  LinkedList<int> primes ;
  //...
}
When returning from the function, how do we know which parts of the heap contain parts of primes and are no longer needed?
In C (and, to a lesser extent, C++), heap-allocated memory must be freed explicitly by calling the function free.
In Java, memory is freed automatically by garbage collection.
Garbage = values in memory no longer used by the program.
A garbage collection routine is a part of the interpreter.
It can be part of a compiler, too: in addition to the compiled native code, the compiler provides it as a part of a runtime system.
C and C++ compilers do not generally provide one. Haskell compilers do.
Book 7.6.1
To understand how garbage collection works, let us consider a simple program that does it.
The algorithm consists of three functions:
Problems:
There are more developed garbage collection algorithms that address these issues.
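A minimal mark-and-sweep sketch in Python may clarify the idea. This is not the book's program: the heap representation, object layout, and function names (`mark`, `sweep`) are all illustrative assumptions.

```python
# Hedged sketch of mark-and-sweep garbage collection.
# The heap is a dict from address to object; each object lists the
# addresses it references. Roots are the references held on the stack.

def mark(heap, addr, marked):
    """Follow references from addr and mark every reachable object."""
    if addr in marked:
        return
    marked.add(addr)
    for child in heap[addr]["refs"]:
        mark(heap, child, marked)

def sweep(heap, marked):
    """Free every unmarked object; return the freed addresses."""
    freed = [a for a in heap if a not in marked]
    for a in freed:
        del heap[a]
    return freed

# Three cells of a linked list (0 -> 1 -> 2) plus one unreachable cell (99).
# The stack holds a reference only to cell 0.
heap = {0: {"refs": [1]}, 1: {"refs": [2]}, 2: {"refs": []}, 99: {"refs": []}}
roots = [0]
marked = set()
for r in roots:
    mark(heap, r, marked)
print(sweep(heap, marked))   # → [99]: unreachable, hence garbage
```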
Modern garbage collectors are difficult to beat with hand-written memory management.
Improve the generated code by looking at chunks of e.g. 3 instructions (this is called the peephole window).
Reduce the number of instructions e.g. by constant folding,
bipush 6
bipush 9   ===>   bipush 15
iadd
or change them to cheaper ones, e.g.
bipush 15          bipush 15
bipush 15   ===>   dup
Iteration gives the best effect:
bipush 3
bipush 6           bipush 3
bipush 9    ===>   bipush 15   ===>   bipush 45
iadd               imul
imul
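The iterated constant folding above can be sketched as a peephole pass in Python. The function names and instruction encoding (strings in Jasmin notation) are illustrative:

```python
# A minimal peephole pass over a 3-instruction window, doing constant folding.

def fold_once(code):
    """Fold one bipush a; bipush b; iadd/imul pattern, if any."""
    for i in range(len(code) - 2):
        a, b, op = code[i], code[i + 1], code[i + 2]
        if a.startswith("bipush") and b.startswith("bipush") and op in ("iadd", "imul"):
            x, y = int(a.split()[1]), int(b.split()[1])
            result = x + y if op == "iadd" else x * y
            return code[:i] + [f"bipush {result}"] + code[i + 3:], True
    return code, False

def peephole(code):
    """Iterate folding until nothing more changes."""
    changed = True
    while changed:
        code, changed = fold_once(code)
    return code

code = ["bipush 3", "bipush 6", "bipush 9", "iadd", "imul"]
print(peephole(code))   # → ['bipush 45']
```

The iteration is what gives the best effect: folding 6 + 9 to 15 exposes the next opportunity, 3 * 15.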
The notation we have used in this lecture is from the Jasmin JVM assembler.
Things done by the assembler:
We have created a simple script jass, which reads the code for a main function, embeds it into a class called Main, and then calls jasmin.

In Exercise 6, you can build a compiler by using the compilation schemes of this lecture. You can then run jass on the output of the compiler to produce a file Main.class. This file can be run with

java Main
The most common processor family in the world.
Originally the "IBM PC", historically:
The machines are backward compatible with older code, but operating systems will forbid certain things, such as accessing the memory of other processes.
Registers are places for data inside the CPU.
Arithmetic operations must have their operands and return their values in registers.
All modern native code assembly languages use registers.
Machine language is just bytes. For instance, in x86 we have an instruction
00000011 11000011
This is usually given in hex code:
03 C3
This can be generated from the assembly instruction

add eax, ebx

In source code notation, this would be written

eax = eax + ebx

where eax and ebx are names of integer registers.
Addition and subtraction: two-address instructions with two registers.
add eax, ebx   ; eax = eax + ebx
sub eax, ebx   ; eax = eax - ebx
Immediate values (integers) and memory addresses are also possible as source (but not as target):
add eax, 5           ; eax = eax + 5
add eax, [ebp - 8]   ; eax = eax + [ebp - 8]
The memory address [ebp - 8] is the address stored in the register ebp, offset by -8.
Multiplication and division are more complicated, e.g.
div a   ; eax = (edx:eax) / a ; edx = (edx:eax) % a
Move instructions can also have a memory address as target, but not as both source and target.
mov [ebp - 12], eax
Labels and jumps are as in JVM, e.g.
cmp eax, ebx   ; compare eax with ebx
jge Label      ; if eax >= ebx, jump to Label
Source code
int i = 3 ;
while (i < 17) {
  i = i + 2 ;
}
int j = i ;
Compiled code
mov [ebp - 8], 3      ; i -> [ebp - 8], i = 3
mov eax, [ebp - 8]    ; eax = i
Test:
mov ebx, 17           ; ebx = 17
cmp eax, ebx          ; compare i with 17
jge End
add eax, 2            ; eax = i = i + 2
jmp Test
End:
mov [ebp - 12], eax   ; j -> [ebp - 12], j = i
mov [ebp - 8], eax    ; save i in memory
Notice the optimization: i is kept only in a register during the loop.
Compilation schemes are similar to JVM.
Typically not directly to native code, but to intermediate code with infinitely many virtual registers.
2-address code is similar to x86: one target and one source register. The scheme returns a register where the value is found.
compile (exp1 + exp2) =
  r1 := compile exp1
  r2 := compile exp2
  emit (r1 += r2)
  return r1
But often the intermediate language is 3-address code, with two source registers.
compile (exp1 + exp2) =
  r0 := newReg
  r1 := compile exp1
  r2 := compile exp2
  emit (r0 = r1 + r2)
  return r0
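The 3-address scheme can be sketched in Python. All names here (`new_reg`, `compile3`, the tuple AST) are illustrative assumptions, not the course's code:

```python
# Sketch of compiling expressions to 3-address intermediate code with
# unlimited virtual registers.

reg_count = 0

def new_reg():
    """Return a fresh virtual register name."""
    global reg_count
    reg_count += 1
    return f"r{reg_count}"

def compile3(exp, code):
    """Compile exp; return the register holding its value."""
    if isinstance(exp, int):           # integer literal
        r = new_reg()
        code.append(f"{r} = {exp}")
        return r
    op, e1, e2 = exp                   # (op, exp1, exp2)
    r1, r2 = compile3(e1, code), compile3(e2, code)
    r0 = new_reg()
    code.append(f"{r0} = {r1} {op} {r2}")
    return r0

code = []
compile3(("+", 1, ("*", 2, 3)), code)   # 1 + 2 * 3
print(code)
# → ['r1 = 1', 'r2 = 2', 'r3 = 3', 'r4 = r2 * r3', 'r5 = r1 + r4']
```

Register allocation then maps the unbounded r1, r2, ... down to the machine's registers.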
The step from infinitely many virtual registers to the limited number of a machine is called register allocation.
Main constraint: variables that are live at the same time cannot be kept in the same register.
Example: how many registers are needed for x, y, z?
int x = 1 ;       // x alive
int y = 2 ;       // x, y alive
printInt(y) ;     // x, y alive
int z = 3 ;       // x, z alive
printInt(x + z) ; // x, z alive
Answer: two, because y and z can be kept in the same register.
Excellent free book: PC Assembly Language by Paul Carter.
The book uses NASM, the Netwide Assembler, also available through the book's web page.
The result of compilation is first assembled and then linked.
compile Foo.cc                       -- generate assembler
nasm -f elf Foo.asm                  -- assemble to object file
gcc -o Foo driver.o Foo.o asm_io.o   -- link to executable
The linker combines object files and resolves names in them.
Dynamic linking: some names are only linked at runtime. This makes it possible to have library code in one location, instead of being copied to all applications.