1 00:00:01,000 --> 00:00:07,060 The syntax tree is a useful intermediate representation (IR) that is independent of both the source 2 00:00:07,060 --> 00:00:12,200 language and the target ISA. It contains information about the sequencing 3 00:00:12,200 --> 00:00:18,020 and grouping of operations that isn't apparent in individual machine language instructions. 4 00:00:18,020 --> 00:00:22,730 And it allows frontends for different source languages to share a common backend targeting 5 00:00:22,730 --> 00:00:27,619 a specific ISA. As we'll see, the backend processing can be 6 00:00:27,619 --> 00:00:33,730 split into two sub-phases. The first performs machine-independent optimizations 7 00:00:33,730 --> 00:00:40,370 on the IR. The optimized IR is then translated by the 8 00:00:40,370 --> 00:00:46,380 code generation phase into sequences of instructions for the target ISA. 9 00:00:46,380 --> 00:00:53,510 A common IR is to reorganize the syntax tree into what's called a control flow graph (CFG). 10 00:00:53,510 --> 00:00:58,809 Each node in the graph is a sequence of assignment and expression evaluations that ends with 11 00:00:58,809 --> 00:01:02,989 a branch. The nodes are called "basic blocks" and represent 12 00:01:02,989 --> 00:01:06,870 sequences of operations that are executed as a unit. 13 00:01:06,870 --> 00:01:11,470 Once the first operation in a basic block is performed, the remaining operations will 14 00:01:11,470 --> 00:01:16,000 also be performed without any other intervening operations. 15 00:01:16,000 --> 00:01:21,670 This knowledge lets us consider many optimizations, e.g., temporarily storing variable values 16 00:01:21,670 --> 00:01:27,240 in registers, that would be complicated if there was the possibility that other operations 17 00:01:27,240 --> 00:01:31,940 outside the block might also need to access the variable values while we were in the middle 18 00:01:31,940 --> 00:01:36,869 of this block. The edges of the graph indicate the branches 19 00:01:36,869 --> 00:01:44,170 that take us to another basic block. For example, here's the CFG for GCD. 20 00:01:44,170 --> 00:01:49,549 If a basic block ends with a conditional branch, there are two edges, labeled "T" and "F" leaving 21 00:01:49,549 --> 00:01:55,158 the block that indicate the next block to execute depending on the outcome of the test. 22 00:01:55,158 --> 00:02:00,360 Other blocks have only a single departing arrow, indicating that the block always transfers 23 00:02:00,360 --> 00:02:06,350 control to the block indicated by the arrow. Note that if we can arrive at a block from 24 00:02:06,350 --> 00:02:11,610 only a single predecessor block, then any knowledge we have about operations and variables 25 00:02:11,610 --> 00:02:16,820 from the predecessor block can be carried over to the destination block. 26 00:02:16,820 --> 00:02:23,489 For example, if the "if (x > y)" block has generated code to load the values of x and 27 00:02:23,489 --> 00:02:29,700 y into registers, both destination blocks can use that information and use the appropriate 28 00:02:29,700 --> 00:02:33,510 registers without having to generate their own LDs. 29 00:02:33,510 --> 00:02:38,950 But if a block has multiple predecessors, such optimizations are more constrained. 30 00:02:38,950 --> 00:02:43,680 We can only use knowledge that is common to *all* the predecessor blocks. 31 00:02:43,680 --> 00:02:49,329 The CFG looks a lot like the state transition diagram for a high-level FSM! 32 00:02:49,329 --> 00:02:56,269 We'll optimize the IR by performing multiple passes over the CFG. 33 00:02:56,269 --> 00:03:03,120 Each pass performs a specific, simple optimization. We'll repeatedly apply the simple optimizations 34 00:03:03,120 --> 00:03:08,849 in multiple passes, until we can't find any further optimizations to perform. 35 00:03:08,849 --> 00:03:15,340 Collectively, the simple optimizations can combine to achieve very complex optimizations. 36 00:03:15,340 --> 00:03:19,940 Here are some example optimizations: We can eliminate assignments to variables 37 00:03:19,940 --> 00:03:23,230 that are never used and basic blocks that are never reached. 38 00:03:23,230 --> 00:03:30,280 This is called "dead code elimination". In constant propagation, we identify variables 39 00:03:30,280 --> 00:03:34,810 that have a constant value and substitute that constant in place of references to the 40 00:03:34,810 --> 00:03:37,989 variable. We can compute the value of expressions that 41 00:03:37,989 --> 00:03:43,269 have constant operands. This is called "constant folding". 42 00:03:43,269 --> 00:03:48,260 To illustrate how these optimizations work, consider this slightly silly source program 43 00:03:48,260 --> 00:03:52,409 and its CFG. Note that we've broken down complicated expressions 44 00:03:52,409 --> 00:04:00,150 into simple binary operations, using temporary variable names (e.g, "_t1") to name the intermediate 45 00:04:00,150 --> 00:04:02,989 results. Let's get started! 46 00:04:02,989 --> 00:04:07,879 The dead code elimination pass can remove the assignment to Z in the first basic block 47 00:04:07,879 --> 00:04:13,000 since Z is reassigned in subsequent blocks and the intervening code makes no reference 48 00:04:13,000 --> 00:04:18,279 to Z. Next we look for variables with constant values. 49 00:04:18,279 --> 00:04:23,940 Here we find that X is assigned the value of 3 and is never re-assigned, so we can replace 50 00:04:23,940 --> 00:04:30,460 all references to X with the constant 3. Now perform constant folding [CLICK], evaluating 51 00:04:30,460 --> 00:04:36,680 any constant expressions. Here's the updated CFG, ready for another 52 00:04:36,680 --> 00:04:41,600 round of optimizations. First dead code elimination. 53 00:04:41,600 --> 00:04:47,970 Then constant propagation. And, finally, constant folding. 54 00:04:47,970 --> 00:04:53,190 So after two rounds of these simple operations, we've thinned out a number of assignments. 55 00:04:53,190 --> 00:04:56,580 On to round three! Dead code elimination. 56 00:04:56,580 --> 00:05:01,400 And here we can determine the outcome of a conditional branch, eliminating entire basic 57 00:05:01,400 --> 00:05:07,280 blocks from the IR, either because they're now empty or because they can no longer be 58 00:05:07,280 --> 00:05:13,970 reached. Wow, the IR is now considerably smaller. 59 00:05:13,970 --> 00:05:20,570 Next is another application of constant propagation. And then constant folding. 60 00:05:20,570 --> 00:05:26,380 Followed by more dead code elimination. The passes continue until we discover there 61 00:05:26,380 --> 00:05:31,200 are no further optimizations to perform, so we're done! 62 00:05:31,200 --> 00:05:35,700 Repeated applications of these simple transformations have transformed the original program into 63 00:05:35,700 --> 00:05:40,919 an equivalent program that computes the same final value for Z. 64 00:05:40,919 --> 00:05:46,669 We can do more optimizations by adding passes: eliminating redundant computation of common 65 00:05:46,669 --> 00:05:51,320 subexpressions, moving loop-independent calculations out of loops, 66 00:05:51,320 --> 00:05:57,100 unrolling short loops to perform the effect of, say, two iterations in a single loop execution, 67 00:05:57,100 --> 00:06:01,449 saving some of the cost of increment and test instructions. 68 00:06:01,449 --> 00:06:06,770 Optimizing compilers have a sophisticated set of optimizations they employ to make smaller 69 00:06:06,770 --> 00:06:11,370 and more efficient code. Okay, we're done with optimizations. 70 00:06:11,370 --> 00:06:15,170 Now it's time to generate instructions for the target ISA. 71 00:06:15,170 --> 00:06:19,650 First the code generator assigns each variable a dedicated register. 72 00:06:19,650 --> 00:06:23,810 If we have more variables than registers, some variables are stored in memory and we'll 73 00:06:23,810 --> 00:06:29,389 use LD and ST to access them as needed. But frequently-used variables will almost 74 00:06:29,389 --> 00:06:35,139 certainly live as much as possible in registers. Use our templates from before to translate 75 00:06:35,139 --> 00:06:39,569 each assignment and operation into one or more instructions. 76 00:06:39,569 --> 00:06:44,789 The emit the code for each block, adding the appropriate labels and branches. 77 00:06:44,789 --> 00:06:50,039 Reorder the basic block code to eliminate unconditional branches wherever possible. 78 00:06:50,039 --> 00:06:55,710 And finally perform any target-specific peephole optimizations. 79 00:06:55,710 --> 00:07:02,300 Here's the original CFG for the GCD code, along with the slightly optimized CFG. 80 00:07:02,300 --> 00:07:07,360 GCD isn't as trivial as the previous example, so we've only been able to do a bit of constant 81 00:07:07,360 --> 00:07:12,240 propagation and constant folding. Note that we can't propagate knowledge about 82 00:07:12,240 --> 00:07:17,690 variable values from the top basic block to the following "if" block since the "if" block 83 00:07:17,690 --> 00:07:22,690 has multiple predecessors. Here's how the code generator will process 84 00:07:22,690 --> 00:07:27,380 the optimized CFG. First, it dedicates registers to hold the 85 00:07:27,380 --> 00:07:32,080 values for x and y. Then, it emits the code for each of the basic 86 00:07:32,080 --> 00:07:35,520 blocks. Next, reorganize the order of the basic blocks 87 00:07:35,520 --> 00:07:39,580 to eliminate unconditional branches wherever possible. 88 00:07:39,580 --> 00:07:44,320 The resulting code is pretty good. There no obvious changes that a human programmer 89 00:07:44,320 --> 00:07:50,490 might make to make the code faster or smaller. Good job, compiler! 90 00:07:50,490 --> 00:07:55,430 Here are all the compilation steps shown in order, along with their input and output data 91 00:07:55,430 --> 00:07:59,039 structures. Collectively they transform the original source 92 00:07:59,039 --> 00:08:05,280 code into high-quality assembly code. The patient application of optimization passes 93 00:08:05,280 --> 00:08:11,330 often produces code that's more efficient than writing assembly language by hand. 94 00:08:11,330 --> 00:08:16,350 Nowadays, programmers are able to focus on getting the source code to achieve the desired 95 00:08:16,350 --> 00:08:22,039 functionality and leave the details of translation to instructions in the hands of the compiler.