|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | Example                                                                                                                                                                                                                                                                                                                                                                                       |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Compiler construction 2011                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | A motivating example                                                                                                                                                                                                                                                                                                                                                                          |
| Lecture 10<br>More on code optimization<br>• IR code optimizations revisited: loops<br>• Native code generation revisited<br>• Instruction selection<br>• Instruction scheduling                                                                                                                                                                                                                                                                                                                 | A simple Javalette function (in extension arrays1)<br>int sum (int [] a) {<br>int res=0;<br>for (int x : a)<br>res = res + x;<br>return res;<br>}<br>What code would you generate?                                                                                                                                                                                                            |
| CHAIMER                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |                                                                                                                                                                                                                                                                                                                                                                                               |
| Example<br>Possible naive LLVM code, part 1                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Example<br>Possible naive LLVM code, part 2                                                                                                                                                                                                                                                                                                                                                   |
| <pre>%arr = type { i32, [ 0 x i32 ] }* define i32 @sum(%arr %_pa) { entry: %a = alloca %arr     store %arr %_pa, %arr* %a     %_res_t0 = alloca i32     store i32 0, i32* %_res_t0     %_x_t1 = alloca i32     %t2 = loca %arr* %a     %t3 = getelementptr %arr %t2 , i32 0, i32 0     %t4 = load i32* %t3     %_index_t5 = alloca i32     store i32 0, i32* %_indexx_t5     br label %lab0 lab0: %t6 = load i32* %_indexx_t5     %t7 = icmp slt i32 %t6 , %t4     br i1 %t7 , label %lab2</pre> | <pre>lab1: %t8 = getelementptr %arr %t2 , i32 0, i32 1, i32 %t6     %t9 = load i32* %t8     store i32 %t9 , i32* %_x_t1     %t10 = load i32* %_x_t0     %t11 = load i32* %_x_t1     %t12 = add i32 %t10 , %t11     store i32 %t12 , i32* %_res_t0     %t13 = add i32 %t6 , 1     store i32 %t13 , i32* %_index_t5     br label %lab0 lab2: %t14 = load i32* %_res_t0     ret i32 %t14 }</pre> |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                                                                                                                                                                                                                                                                                                                                                                                               |

| mple                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                                                                                                                                                                                                                                                                  | Example                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <pre>maps filef opt -mem2reg fil</pre> | [ %t12, %lab1 ]<br>, [ %t13, %lab1 ]<br>i                                                                                                                                                                                                                                                                                        | <pre>After Dot -std-compile-opts<br/>define i32 @sum(Marr nocapture %_pa) nounrind readonly {<br/>entry: %15 = getelementptr %arr &amp;_pa, i32 0, i32 0<br/>%t71 = icen get 152 %t4, 0<br/>br i1 %t71, lokel %b0,nph, label %lab2<br/>bb.nph: %tmg = zert i32 %t4 to i64<br/>br label %lab1<br/>lab1: %indrar = phi 164 [0, %bb.nph ], [%indrar.nert, %lab1 ]<br/>%t72 = compile %lab1, %lab2 %t8<br/>%t12 = compile %lab2 %t8<br/>%t12 = icen get 22 %t8<br/>%t12 = icen get 122 %t8<br/>%t12 = icen get 122 %t8<br/>%t12 = add 132 %t8<br/>%t12 = add 132 %t8<br/>%t12 = add 132 %t8<br/>%t12 = add 132 %t8<br/>%t12 = idd 132 %t8</pre> |
| br label %lab0<br>ab2: ret 132 %_res_t0.0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | CHALMERS                                                                                                                                                                                                                                                                                                                         | br il %xitcond, label %lab2, label %lab1<br>lab2: %_res_t0.0.1cssa = phi i32 [ 0, %entry ], [ %ti2, %lab1 ]<br>ret i32 %_res_t0.0.1cssa<br>}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| enerated x86 assembly (with 11,<br>sum: push EDI<br>mov ECX, bWORD PTR [ESP + 12]<br>mov EXX, bWORD PTR [ECX]<br>test EXX, EDX<br>jg LBB0_2<br>xor EAX, EAX<br>jgp LBB0_4<br>LBD_2: xor ESI, ESI<br>add ECX, 4<br>add EX, 4<br>add EX, 4<br>add EX, 4<br>add EX, 4<br>add EX, 4<br>add EX, 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | <ul> <li>Comments         <ul> <li>No local vars; no stack<br/>frame handling.</li> <li>Uses callee save<br/>registers EDI and ESI;<br/>note save/restore.</li> <li>ECX holds address of<br/>current array elem;<br/>increased by 4 in each<br/>iteration.</li> <li>EDX counts nr of<br/>elems remaining.</li> </ul> </li> </ul> | Optimizations of loops In computationally demanding applications, most of the time is spent in executing (inner) loops. Thus, an optimizing compiler should focus its efforts in improving loop code. The first task is to identify loops in the code. In the source code, loops are easily identified, but how to recognize them in a low level IR code? A loop in a CFG is a subset of the nodes that <ul> <li>has a header node, which dominates all nodes in the loop.</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |

| IR optimization                                                                                                                                                                | IR optimization                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| $\label{eq:advector} \hline \begin{tabular}{ c c c c } \hline \begin{tabular}{l c c c } \hline \begin{tabular}{l c c } A simple example & & & & & & & & & & & & & & & & & & &$ | Induction variables         A basic induction variable is an (integer) variable which has a single definition in the loop body, which increases its value with a fixed (loop-invariant) amount.         Example: $n = n + 3$ A basic IV will assume values in arithmetic progression when the loop executes.         Given a basic IV we can find a collection of derived IV's, each of which has a single def of the form $m = a^*n+b$ ; where a and b are loop-invariant. The def can be extended to allow RHS of the form $a+k+b$ where also k is an already established derived IV.         Image: The loop might not execute at all, in which case k would not be evaluated. Better to perform loop inversion first.         If (n<100) { |
| def of derived IV by addition. n++;                                                                                                                                            | $ \begin{array}{c} k = 7*n + 3; \\ do \{ \\ a \ k = 7; \\ k = 7; \\ \} \ while \ (k < 703); \\ \end{array} $                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |

| Native code generation, revisited         Instruction selection for RISC machines generally simpler than for CISC machines.           O far, we have ignored some important concerns in code generation:         • The instruction set in real-world processors typically offer many different ways to achieve the same effect. Thus, when translating an IR program to native code we must do instruction selection, i.e. choose between available alternatives.         • The number of translation possibilities grow (combinatorially) as one considers larger chunks of IR code for translation.           • Often an instruction sequence contain independent parts that can be executed in arbitrary order. Different orders may take very different time; thus the code generator must do instruction scheduling.         • The number of translation possibilities grow (combinatorially) as one considers larger chunks of IR code for translation.           • Often an instruction sequence contain independent parts that can be executed in arbitrary order. Different orders may take very different time; thus the code generator must do instruction scheduling.         • The code can be seen as a pattern matching problem: The native instruction selection is the problem to cover the IR code by patterns.           • In LLVM, these tasks are done by the native code generator 11c and the         • Two approaches | optimization                                                                                                                                                                                                                                                                          |                                    | IR optimization                                                                                                                                                                                              |                                                                                                   |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------|
| int sum = 0;       for(i=0; i<100; i+1)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | Dne more example                                                                                                                                                                                                                                                                      |                                    | Loop unrolling                                                                                                                                                                                               |                                                                                                   |
| Added to approximate  Mater code generation, revisited  More complications So far, we have ignored some important concerns in code generation:     The instruction selection for RISC machines generally simpler than for CISC machines.     Instruction for RISC machines generally simpler than for CISC machines.     The number of translation possibilities grow (combinatorially) as one considers larger chunks of IR code for translation.     Pattern matching The IR code can be seen as pattern s; instruction selection is the problem.     The Code by patterns.     The automate of the code generator must do instruction scheduling. Both these task are complex and interact with register allocation.     In LLVM, these tasks are done by the native code generator 11c and the                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | Sample loop<br>int sum = 0;<br>for(i=0; i:1000; i++)<br>sum += a[i];<br>Strength reduction/IV techniques<br>%sum = 0<br>%off = 0<br>%addr = %addr.a<br>%end = add %addr.a,4000<br>L1: %a.i = load %addr<br>%sum = add %sum,%a.i<br>%addr = add %sum,%a.i<br>%addr = emp lt %addr,%end | this loop?<br>Naive assembler code | for (i=0; i<100; i++) for (i<br>a[i] = a[i] + x[i] a[i<br>a[i]<br>a[i]<br>a[i]<br>b<br>o In which ways is this an improveme<br>o What to do if upper bound is n?<br>o Is unrolling four steps the best choic | ] = a[i] + x[i]<br>+1] = a[i+1] + x[i+1]<br>+2] = a[i+2] + x[i+2]<br>+3] = a[i+3] + x[i+3]<br>nt? |
| Native code generation, revisited       Instruction selection         More complications       Further observations         So far, we have ignored some important concerns in code generation:       • Instruction selection for RISC machines generally simpler than for CISC machines.         • The instruction set in real-world processors typically offer many different ways to achieve the same effect. Thus, when translating an IR program to native code we must do <b>instruction selection</b> , i.e. choose between available alternatives.       • Instruction selection for RISC machines generally simpler than for CISC machines.         • Often an instruction sequence contain independent parts that can be executed in arbitrary order. Different orders may take very different instructions are seen as a pattern matching problem: The native instructions are seen as patterns; instruction selection is the problem to cover the IR code by patterns.         Both these tasks are done by the native code generator 11c and the       Two approaches                                                                                                                                                                                                                                                                                                                                                            |                                                                                                                                                                                                                                                                                       |                                    |                                                                                                                                                                                                              |                                                                                                   |
| More complications         So far, we have ignored some important concerns in code generation:         • The instruction set in real-world processors typically offer many different ways to achieve the same effect. Thus, when translating an IR program to native code we must do <b>instruction selection</b> , i.e. choose between available alternatives.         • Often an instruction sequence contain independent parts that can be executed in arbitrary order. Different orders may take very different time; thus the code generator must do <b>instruction scheduling</b> .         Both these task are complex and interact with <b>register allocation</b> .         In LLVM, these tasks are done by the native code generator 11c and the                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                                                                                                                                                                                                                                                                       | CHALMERS                           |                                                                                                                                                                                                              | CHALMERS                                                                                          |
| More complications <ul> <li>Instruction set in real-world processors typically offer many different ways to achieve the same effect. Thus, when translating an IR program to native code we must do <b>instruction setection</b>, i.e. choose between available alternatives.</li> <li>Other an instruction sequence contain independent parts that can be executed in arbitrary order. Different orders may take very different time; thus the code generator must do <b>instruction scheduling</b>.</li> </ul> <ul> <li>Instructions are seen as a pattern struction selection is the problem: The native instruction selection is the problem to cover the IR code by patterns.</li> </ul> <ul> <li>Pattern matching</li> <li>The Rode can be seen as a patterns; instruction selection is the problem to cover the IR code by patterns.</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | lifve code generation                                                                                                                                                                                                                                                                 | CHALMERS                           | Native code generation                                                                                                                                                                                       | CHALMERS                                                                                          |
| JIT compiler in 111.<br>• Tree pattern matching. Think of IR code as sequence.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |                                                                                                                                                                                                                                                                                       |                                    |                                                                                                                                                                                                              | GALMER                                                                                            |



| Native code generation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | Native code generation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |                                                                                                                                                                            |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Instruction scheduling, background                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | Instruction selection, example                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                                                                                                                                                                            |
| Simple-minded, old-fashioned view of processor<br>Fetch an instruction, decode it, fetch operands, perform operation, store<br>result. Then fetch next operation,<br>Modern processors<br>• Several instructions under execution concurrently.<br>• Memory system cause delays, with operations waiting for data.<br>• Similar problems for results from arithmetic operations, that may take<br>several cycles.<br>Consequence<br>Important to understand data dependencies and order instructions<br>advantageously. | $ \begin{array}{l} \mbox{Example (from Cooper)} \\ \mbox{w = w + 2 + x + y + z} \\ \mbox{Memory op takes 3 cycles, mult 2 cy} \\ \mbox{One instruction can be issued each} \\ \hline \\ \mbox{Schedule 1} \\ \mbox{r1 < ~ M [fp + 0w]} \\ \mbox{r1 < ~ r1 + r1} \\ \mbox{r2 < ~ M [fp + 0w]} \\ \mbox{r1 < ~ r1 + r2} \\ \mbox{r2 < ~ M [fp + 0y]} \\ \mbox{r1 < ~ r1 + r2} \\ \mbox{r2 < ~ M [fp + 0z]} \\ \mbox{r1 < ~ r1 + r2} \\ \mbox{r2 < ~ M [fp + 0z]} \\ \mbox{r1 < r1 + r2} \\ \mbox{r2 < ~ M [r] + r2} \\ r2 < ~$ | cycle, if data available.<br>Schedule 2<br>r1 <- M [fp + @u]<br>r2 <- M [fp + @z]<br>r1 <- r1 + r1<br>r1 <- r1 + r2<br>r2 <- M [fp + @z]<br>r1 <- r1 + r3<br>r1 <- r1 + r3 |
| advantageously.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | r1 <- r1 * r2<br>M [fp + @w] <- r1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | r1 <- r1 * r2<br>M [fp + @w] <- r1                                                                                                                                         |
| Table costs generation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | Native code generation<br>Summing up                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | urnamerts                                                                                                                                                                  |
| Comments <ul> <li>Problem is NP-complete for realistic architectures.</li> <li>Common technique is Ist scheduling: greedy algorithm for scheduling a basic block.</li> <li>Builds graph describing data dependencies between instructions and schedules instructions from ready list of instructions with available operands.</li> </ul>                                                                                                                                                                               | On optimization<br>We have only looked at a few of mar<br>Modern optimization techniques use<br>data structures.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | sophisticated algorithms and clever                                                                                                                                        |
| Interaction<br>Despite interaction between selection, scheduling and register allocation,<br>these are typically handled independently (and in this order).                                                                                                                                                                                                                                                                                                                                                            | Frameworks such as LLVM make it p<br>state-of-the-art techniques in your or                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |                                                                                                                                                                            |