Lecture 5: C to Assembly | Lecture Videos | Performance Engineering of Software Systems | Electrical Engineering and Computer Science

Flash and JavaScript are required for this feature.

Download the video from Internet Archive.

About this Video
Playlist
Transcript
Download this Video

Description: This lecture focuses on how C code is implemented in x86-64 assembly. Dr. Schardl reasons through the mapping from C code to assembly in two steps: C to LLVM IR and then LLVM IR to Assembly.

Instructor: Tao B. Schardl

Lecture 1: Introduction and...

Lecture 2: Bentley Rules fo...

Lecture 3: Bit Hacks

Lecture 4: Assembly Languag...

Now Playing

Lecture 5: C to Assembly

Lecture 6: Multicore Progra...

Lecture 7: Races and Parall...

Lecture 8: Analysis of Mult...

Lecture 9: What Compilers C...

Lecture 10: Measurement and...

Lecture 11: Storage Allocation

Lecture 12: Parallel Storag...

Lecture 13: The Cilk Runtim...

Lecture 14: Caching and Cac...

Lecture 15: Cache-Oblivious...

Lecture 16: Nondeterministi...

Lecture 17: Synchronization...

Lecture 18: Domain Specific...

Lecture 19: Leiserchess Cod...

Lecture 20: Speculative Par...

Lecture 21: Tuning a TSP Al...

Lecture 22: Graph Optimization

Lecture 23: High Performanc...

Download English-US transcript (PDF)

The following content is provided under a Creative Commons license. Your support will help MIT Open Courseware continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseware at ocw.mit.edu.

JULIAN SHUN: Good afternoon, everyone. So today, we have TB Schardl here. He's going to give us the lecture on C to assembly. So TB's a research scientist here at MIT working with Charles Leiserson. He also taught this class with me last year, and he got one of the best ratings ever for this class. So I'm really looking forward to his lecture.

TAO SCHARDL: All right, great. So thank you for the introduction, Julian. So I hear you just submitted the beta for project 1. Hopefully, that went pretty well. How many of you slept in the last 24 hours? OK, good. All right, so it went pretty well. That sounds great.

Yeah, so today, we're going to be talking about C to assembly. And this is really a continuation from the topic of last lecture, where you saw computer architecture, if I understand correctly. Is that right? You looked at computer architecture, x86-64 assembly, that sort of thing.

So how many of you walked away from that lecture thinking, oh yeah, x86-64 assembly, this is easy? This is totally intuitive. Everything makes perfect sense. There's no weirdness going on here whatsoever.

How many of you walked away not thinking that? Thinking that perhaps this is a little bit strange, this whole assembly language. Yeah, I'm really in the later cab. x86 is kind of a strange beast. There are things in there that make no sense. Quad word has 8 bytes. P stands for integer, that sort of thing.

So when we move on to the topic of seeing how C code gets translated into assembly, we're translating into something that's already pretty complicated. And the translation itself isn't going to be that straightforward. So we're going to have to find a way to work through that. And I'll outline the strategy that we'll be using in the start of this presentation.

But first, let's quickly review. Why do we care about looking at assembly? You should have seen this slide from the last lecture. But essentially, assembly is a more precise representation of the program than the C code itself. And if you look at the assembly, that can reveal details about the program that are not obvious when you just look at the C code directly.

There are implicit things going on in the C code, such as type cast or the usage of registers versus memory on the machine. And those can have performance implications. So it's valuable to take a look at the assembly code directly.

It can also reveal what the compiler did or did not do when it tried to optimize the program. For example, you may have written a division operation or a multiply operation. But somehow, the compiler figured out that it didn't really need to do a divide or multiply to implement that operation. It could implement it more quickly using simpler, faster operations, like addition and subtraction or shift. And you would be able to see that from looking at the assembly.

Bugs can also arise only at a low level. For example, there may be a bug in the program that only creates unexpected behavior when you optimize the code at 03. So that means, when you're debugging and with that OG or -01, you wouldn't see any unusual behaviors. But when you crank up the optimization level, suddenly, things start to fall apart. Because the C code itself didn't change, it can be hard to spot those bugs. Looking at the assembly can help out in that regard.

And when worse comes to worse, if you really want to make your code fast, it is possible to modify the assembly code by hand. One of my favorite uses of looking at the assembly, though, is actually reverse engineering. If you can read the assembly for some code, you can actually decipher what that program does, even when you only have access to the binary of that program, which is kind of a cool thing.

It takes some practice to read assembly at that level. One trick that some of us in Professor Leiserson's research group have used in the past to say figure out what Intel's Math Kernel Library is doing to multiply matrices.

Now, as I mentioned before, at the end of last lecture, you saw some computer architecture. And you saw the basics of x86-64 assembly, including all the stuff, like the instructions, the registers, the various data types, memory addressing modes, the RFLAGS registered with those condition codes, and that sort of thing. And today, we want to talk about how C code gets implemented in that assembly language.

OK, well, if we consider how C code becomes assembly and what that process actually looks like, we know that there is a compiler involved. And the compiler is a pretty sophisticated piece of software. And, frankly, the compiler has a lot of work to do in order to translate a C program into assembly.

For example, it has to choose what assembly instructions are going to be used to implement those C operations. It has to implement C conditionals and loops-- those if, then, elses and those for and why loops-- into jumps and branches. It has to choose registers and memory locations to store all of the data in the program.

It may have to move data among the registers and the memory locations in order to satisfy various data dependencies. It has to coordinate all the function calls that happen when subroutine A calls B and calls C, and then returns, and so on and so forth. And on top of that, these days, we expect our compiler to try really hard to make that code fast.

So that's a lot of work that the compiler has to do. And as a result, if we take a look at the assembly for any arbitrary piece of C code, the mapping from that C code to the assembly is not exactly obvious, which makes it hard to execute this particular lecture and hard to, in general, read the binary or the assembly for some program and figure out what's really going on.

So what we're going to do today to understand this translation process is we're going to take a look at how that compiler actually reasons about translating C code into assembly. Now this is not a compiler class. 6172 is not a class you take if you want to learn how to build a compiler. And you're not going to need to know everything about a compiler to follow today's lecture. But what we will see is just a little bit about how the compiler understands a program and, later on, how the compiler can translate that program into assembly code.

Now when a compiler compiles a program, it does so through a sequence of stages, which are illustrated on this slide. Starting from the C code, it first pre-processes that code, dealing with all the macros. And that produces a pre-process source. Then the compiler will translate that source code into an intermediate representation. For the client compiler that you're using, that intermediate representation is called LLVM IR. LLVM being the name of the underlying compiler, and IR being the creative name for the intermediate representation.

That LLVM IR is really a sort of pseudo-assembly. It's kind of like assembly, but as we'll see, it's actually a lot simpler than x86-64 assembly. And that's why we'll use it to understand this translation process.

Now it turns out that the compiler does a whole lot of work on that intermediate representation. We're not going to worry about that today. We'll just skip to the end of this pipeline when the compiler translates LLVM IR into assembly code.

Now the nice thing about taking a look at the LLVM IR is that If you're curious, you can actually follow along with the compiler. It is possible to ask clang to compile your code and give you the LLVM IR rather than the assembly. And the flags to do that are somewhat familiar.

Rather than passing the dash s flag, which, hopefully, you've already seen, that will translate C code directly into assembly. If you pass dash s dash omit LLVM, that will produce the LLVM IR. You can also ask clang to translate LLVM IR itself directly into assembly code, and that process is pretty straightforward. You just use the dash S flag once again.

So this is the outline of today's lecture. First, we're going to start with a simple primer on LLVM IR. I know that LLVM IR sounds like another language. Oh, gosh, we have to learn another language. But don't worry. This primer, I would say, is simpler than the x86-64 primer. Based on the slides, for x86-64, that primer was 20-some slides long. This primer is six slides, so maybe a little over a quarter.

Then we'll take a look at how the various constructs in the C programming language get translated into LLVM IR, including straight line code, C functions, conditionals-- in other words, if, then, else-- loops. And we'll conclude that section with just a brief mention of LLVM IR attributes.

And finally, we'll take a look at how LLVM IR gets translated into assembly. And for that, we'll have to focus on what's called the Linux x86-64 calling convention. And we'll conclude with a case study, where we see how this whole process works on a very simple code to compute Fibonacci numbers. Any questions so far?

All right, let's get started. Brief primer on LLVM IR-- so I've shown this in smaller font on some previous slides, but here is a snippet of LLVM IR code. In particular, this is one function within an LLVM IR file. And just from looking at this code, we can see a couple of the basic components of LLVM IR.

In LLVM IR, we have functions. That's how code is organized into these chunks-- chunks called functions. And within each function, the operations of the function are encoded within instructions. And each instruction shows up, at least on this slide, on a separate line.

Those functions operate on what are called LLVM IR registers. These are kind of like the variables. And each of those variables has some associated type. So the types are actually explicit within the IR. And we'll take a look at the types in more detail in a couple of slides.

So based on that high-level overview, we can do a little bit of a comparison between LLVM IR and assembly language. The first thing that we see is that it looks kind of similar to assembly, right? It still has a simple instruction format. There is some destination operand, which we are calling a register. And then there is an equal sign and then an op code, be it add, or call, or what have you, and then some list of source operations. That's roughly what each instruction looks like.

We can also see that the LLVM IR code, it'll turn out. The LLVM IR code adopts a similar structure to the assembly code itself. And control flow, once again, is implemented using conditional branches, as well as unconditional branches.

But one thing that we'll notice is that LLVM IR is simpler than assembly. It has a much smaller instruction set. And unlike assembly language, LLVM IR supports an infinite number of registers. If you can name it, it's a register.

So in that sense, LLVM's notion of registers is a lot closer to C's notion of variables. And when you read LLVM IR, and you see those registers, you should just think about C variables. There's no implicit RFLAGS register, and there no implicit condition codes going on. Everything is pretty explicit in terms of the LLVM.

There's no explicit stack pointer or frame pointer. There's a type system that's explicit in the IR itself. And it's C like in nature, and there are C-like functions for organizing the code overall.

So let's take a look at each of these components, starting with LLVM IR registers. This is basically LLVM's name for a variable. All of the data in LLVM IR is stored in these variables, which are called registers. And the syntax is a percent symbol followed by a name. So %0, %1, %2, that sort of thing.

And as I mentioned before, LLVM IR registers are a lot like c variables. LLVM supports an infinite number of these things, and each distinct register is just distinguished by its name. So %0 is different from %1, because they have different names.

Register names are also local to each LLVM IR function. And in this regard, they're also similar to C variables. If you wrote a C program with two functions, A and B, and each function had a local variable apple, those are two different apples. The apple in A is not the same thing as the apple in B. Similarly, if you had two different LLVM IR functions, and they both described some register five, those are two different variables. They're not automatically aliased.

So here's an example of an LLVM IR snippet. And what we've done here is just highlighted all of the registers. Some of them are being assigned, because they're on the left-hand side of an equal symbol. And some of them are being used as arguments when they show up on the right-hand side.

There is one catch, which we'll see later on, namely that the syntax for LLVM registers ends up being hijacked when LLVM needs to refer to different basic blocks. We haven't defined basic blocks yet. We'll see what that's all about in just a couple of slides. Everyone good so far?

So LLVM IR code is organized into instructions, and the syntax for these instructions is pretty straightforward. We have a register name on the left-hand side, then an equal symbol, and then and op code, followed by an operand list. For example, the top highlight instruction has register six equal to add of sum arguments. And we'll see a little bit more about those arguments later.

That's the syntax for when an instruction actually returns some value. So addition returns the sum of the two operands. Other instructions don't return a value, per se, not a value that you'd store in a local register. And so the syntax for those instructions is just an op code followed by a list of operands. Ironically, the return instruction that you'd find at the end of a function doesn't assign a particular register value. And of course, the operands can be either registers, or constants, or, as we'll see later on, they can identify basic blocks within the function.

The LLVM IR instruction set is smaller than that of x86. x86 contains hundreds of instructions when you start counting up all the vector instructions. And LLVM IR is far more modest in that regard. There's some instructions for data movements, including stack allocation, reading memory, writing memory, converting between types. Yeah, that's pretty much it.

There are some instructions for doing arithmetic or logic, including integer arithmetic, floating-point arithmetic, Boolean logic, binary logic, or address calculations. And then there are a couple of instructions to do control flow. There are unconditional branches or jumps, conditional branches or jumps, subroutines-- that's call or return-- and then there's this magical phi function, which we'll see more of later on in these slides.

Finally, as I mentioned before, everything in LLVM IR is explicitly typed. It's a strongly-typed language in that sense. And the type system looks something like this. For integers, whenever there's a variable of an integer type, you'll see an i followed by some number. And that number defines the number of bits in that integer.

So if you see a variable of type i64, that means it's a 64-bit integer. If you see a variable of type i1, that would be a 1-bit integer or, in other words, a Boolean value. There are also floating-point types, such as double and float.

There are pointer types, when you follow an integer or floating-point type with a star, much like in C, you can have a raise. And that uses a square bracket notation, where, within the square brackets, you'll have some number and then times and then some other type. Maybe it's a primitive type, like an integer or a floating-point. Maybe it's something more complicated.

You can have structs with an LLVM IR. And that uses squiggly brackets with types enumerated on the inside. You can have vector types, which uses angle brackets and otherwise adopts a similar syntax to the array type. Finally, you can occasionally see a variable, which looks like an ordinary register, except that its type is label. And that actually refers to a basic block.

Those are the basic components of LLVM IR. Any questions so far? Everything clear? Everything unclear?

STUDENT: What's the basic [INAUDIBLE]?

TAO SCHARDL: That should be unclear, and we'll talk about it. Yeah?

STUDENT: Is the vector notation there for the vectorization that's done, like the special register is used?

TAO SCHARDL: Is the vector notation used for the vector registers? In a sense, yes. The vector operations with an LLVM don't look like SEC or AVX, per se. They look more like ordinary operations, except those ordinary operations work on a vector type. So that's how the vector operations show up in LLVM IR. That make some sense? Cool. Anything else?

OK, that's the whole primer. That's pretty much all of the language that you're going to need to know, at least for this slide deck. We'll cover some of the details as we go along. Let's start translating C code into LLVM IR. Is that good?

All right, let's start with pretty much the simplest thing we can-- straight line C code. What do I mean by straight line C code? I mean that this is a blob of C code that contains no conditionals or loops. So it's just a whole sequence of operations. And that sequence of operations in C code turns into a sequence of operations in LLVM IR.

So in this example here, we have foo of n minus 1 plus bar of n minus 2. That is a sequence of operations. And it turns into the LLVM IR on the right. We can see how that happens.

There are a couple rules of thumb when reading straight line C code and interpreting it in the IR. Arguments to any operation are evaluated before the operation itself. So what do I mean by that? Well, in this case, we need to evaluate n minus 1 before we pass the results to foo. And what we see in the LLVM IR is that we have an addition operation that computes n minus 1. And then the result of that-- stored into register 4-- gets passed to the call instruction on the next line, which calls out to function foo. Sound good?

Similarly, we need to evaluate n minus 2 before passing its results to the function bar. And we see that sequence of instructions showing up next in the LLVM IR. And now, we actually need the return value-- oh, yeah? Question?

STUDENT: What is NSW?

TAO SCHARDL: NSW? Essentially, that is an attribute, which we'll talk about later. These are things that decorate the instructions, as well as the types, within LLVM IR, basically, as the compiler figures stuff out. So it helps the compiler along with analysis and optimization. Good?

So for the last operation here, we had to evaluate both foo and bar and get their return values before we could add them together. And so the very last operation in this sequence is the addition. That just takes us those return values and computes their sum.

Now all of that used primitive types, in particular, integers. But it's possible that your code uses aggregate types. By aggregating types, I mean, arrays or struts, that sort of thing. And aggregate types are harder to store within registers, typically speaking. And so they're typically stored within memory.

As a result, if you want to access something within an aggregate type, if you want to read some elements out of an array, that involves performing a memory access or, more precisely, computing some address into memory, and then loading or storing that address. So here, for example, we have an array A of seven integers. And we're going to access A sub x.

In LLVM IR, that turns into two instructions-- this getelementptr followed by a load. And in the getelementptr case, this computes an address into memory and stores the result of that address into a register, in this case, register 5. The next instruction, the load, takes the address stored in register 5 and simply loads that particular memory address, storing the result into another register, in this case, 6. Pretty simple.

When reading the getelementptr instruction, the basic syntax involves a pointer into memory followed by a sequence of indices. And all that getelementptr really does is it computes an address by taking that pointer and then adding on that sequence of indices. So in this case, we have a getelementptr instruction, which takes the address in register 2, and then adds onto it-- yeah, that's a pointer into memory-- and then it adds onto it to indices.

One is the literal value 0, and the other is the value stored in register 4. So that just computes the address, starting at 2 plus 0 plus whatever was in register 4. That's all for straight line code. Good so far? feel free to interrupt if you have questions. Cool.

Functions-- let's talk about C functions. So when there's a function in your C code, generally speaking, you'll have a function within the LLVM code as well. And similarly, when there's a return statement in the C code, you'll end up with a return statement in the LLVM IR.

So here, we have just the bare bones C code for this fib routine. That corresponds to this fib function within LLVM IR. And the function declaration itself looks pretty similar to what you would get in ordinary C.

The return statement is also similar. It may take an argument, if you're returning some value to the caller. In this case, for the fib routine, we're going to return a 64-bit integer. And so we see that this return statement returns the 64-bit integer stored in register 0, a lot like in C.

Functions can have parameters. And when you have a C function with a list of parameters, basically, in LLVM IR, you're going to end up with a similar looking function with the exact same list of parameters translated into LLVM IR. So here, we have this C code for the mm base routine. And we have the corresponding LLVM IR for an mm-based function. And what we see is we have a pointer to a double as the first parameter, followed by a 32-bit integer, followed by another pointer to a double, followed by another 32-bit integer, following another pointer to a double, and another 33-bit integer, and another 32-bit integer.

One implicit thing with an LLVM IR if you're looking at a function declaration or definition, the parameters are automatically named %0, %1, %2, so on and so forth. There's one unfortunate thing about LLVM IR. The registers are a lot like C functions, but unfortunately, that implies that when you're reading LLVM IR, it's a lot like reading the code from your teammate, who always insists on naming things with nondescript, single-letter variable names. Also, that teammate doesn't comment his code, or her code, or their code.

OK, so basic blocks-- when we look at the code within a function, that code gets partitioned into chunks, which are called basic blocks. A basic block has a property that's a sequence of instructions. In other words, it's a blob a straight line code, where control can only enter from the first instruction in that block. And it can only leave from the last instruction in that block.

So here we have the C code for this routine fib.c. We're going to see a lot of this routine fib.c, by the way. And we have the corresponding LLVM IR. And what we have in the C code, what the C code is telling us is that if n is less than 2, you want to do one thing. Otherwise, you want to do some complicated computation and then return that result.

And if we think about that. We've got this branch in our control flow. And what we'll end up with are three different blocks within the LLVM IR. So we end up with one block, which does the computation is n less than 2. And then we end up with another block that says, well, in one case, just go ahead and return something, in this case, the input to the function. In the other case, do some complicated calculations, some straight line code, and then return that result.

Now when we partition the code of a function into these basic blocks, we actually have connections between the basic blocks based on how control can move between the basic blocks. These control flow instructions, in particular, the branch instructions, as we'll see, induce edges among these basic blocks. Whenever there's a branch instruction that can specify, that control can leave this basic block and go to that other basic block, or that other basic block, or maybe one or the other, depending on how the result of some computation unfolded.

And so for the fib function that we saw before, we had those three basic blocks. And based on whether or not n was than 2, either we would execute the simple return statement, or we would execute the blob of straight line code shown on the left. So those are basic blocks and functions. Everyone still good so far? Any questions? Clear as mud?

Let's talk about conditionals. You've already seen one of these conditionals. That's given rise to these basic blocks and these control flow edges. So let's tease that apart a little bit further. When we have a C conditional-- in other words, an if-then-else statement or a switch statement, for that matter-- that gets translated into a conditional branch instruction, or BR, in the LLVM IR representation.

So what we saw before is that we have this if n less than 2 and this basic block with two outgoing edges. If we take a really close look at that first basic block, we can tease it apart and see what each operation does. So first, in order to do this conditional operation, we need to compute whether or not n is less than 2. We need to do a comparison between n and the literal value 2.

That comparison operation turns into an icmp instruction within the LLVM IR, an integer comparison in the LLVM IR. The result of that comparison then gets passed to a conditional branch as one of its arguments, and the conditional branch specifies a couple of things beyond that one argument.

In particular, that conditional branch takes out 1-bit integer-- that Boolean result-- as well as labels of two different basic blocks. So that Boolean value is called the predicate. And that's, in this case, a result of that comparison from before. And then the two basic blocks say where to go if the predicate is true or where to go if the predicate is false. The first label is the destination when it's true, second label destination when it's false-- pretty straightforward.

And if we decide to map this onto our control flow graph, which we were looking at before, we can identify the two branches coming out of our first basic block as either the true branch or the false branch based on whether or not you follow that edge when the predicate is true or you follow it when the predicate is false. Sound good? That should be straightforward. Let me know if it's not. Let me know if it's confusing.

Now it's also possible that you can have an unconditional branch in LLVM IR. You can just have a branch instruction with one operand, and that one operand specifies a basic block. There's no predicate. There is no true or false. It's just the one basic block. And what that instruction says is, when you get here, now, go to that other basic block.

This might seem kind of silly, right? Why wouldn't we just need to jump to another basic block? Why not just merge this code with the code in the subsequent basic block? Any thoughts?

STUDENT: For instance, in this case, other things might jump in.

TAO SCHARDL: Correct. Other things might go to that basic block. And in general, when we look at the structure that we get for any particular conditional in C, we end up with this sort of diamond shape. And in order to implement that diamond shape, we need these unconditional branches. So there's a good reason for them to be around.

And here, we just have an example of a slightly more complicated conditional that creates this diamond shape in our control flow graph. So lets tease this piece of code apart. In the first block, we're going to evaluate if some predicate-- and in this case, our predicate is x bitwise and 1.

And what we see in the first basic block is that we compute the bitwise and store that result, do a comparison between that result, and the value 1. That gives us a Boolean value, which is stored in register 3. And we branch conditionally on whether 3 is true or false.

In the case that it's true, we'll branch to block 4. And in block 4, that contains the code for the consequence, the then clause of the if, then, else. And in the call square, we just call function foo. And then we need to leave the conditional, so we'll just branch unconditionally.

The alternative, if x and 1 is zero, if it's false, then we will execute the function bar, but then also need to leave the conditional. And so we see in block 5, following the false branch that we call bar, then we'd just branch to block 6. And finally, in block 6, we return the result.

So we end up with this diamond pattern whenever we have a conditional, in general. We may delete certain basic blocks if the conditional in the code is particularly simple. But in general, it's going to be this kind of diamond-looking thing. Everyone good so far?

One last C construct-- loops. Unfortunately, this is the most complicated C construct when it comes to the LLVM IR. But things haven't been too bad so far. So yeah, let's walk into this with some confidence.

So the simple part is that what we will see is the C code for a loop translates into LLVM IR that, in the control flow graph representation, is a loop. So a loop in C is literally a loop in this graph representation, which is kind of nice. But to figure out what's really going on with these loops, let's first tease apart the components of a C loop. Because we have a couple of different pieces in an arbitrary C loop.

We have a loop body, which is what's executed on each iteration. And then we have some loop control, which manages all of the iterations of that loop. So in this case, we have a simple C loop, which multiplies each element of an input vector x by some scale over a and stores the result into y.

That body gets translated into a blob of straight line code. I won't step through all of the straight line code just now. There's plenty of it, and you'll be able to see the slides after this lecture. But that blob of straight line code corresponds to a loop body. And the rest of the code in the LLVM IR snippet corresponds to the loop control.

So we have the initial assignment of the induction variable. The comparison would be end of the loop and the increment operation at the end. All of that gets encoded in the stuff highlighted in yellow, that loop control part.

Now if we take a look at this code, there's one odd piece that we haven't really understood yet, and it's this phi instruction at the beginning. The phi instruction is weird, and it arises pretty commonly when you're dealing with loops. It basically is there to solve a problem with LLVM's representation of the code. So before we describe the phi instruction, let's actually take a look at the problem that this phi instruction tries to solve.

So let's first tease apart the loop to reveal the problem. The C loop produces this looping pattern in the control flow graph, literally, an edge that goes back to the beginning. If we look at the different basic blocks we have, we have one block at the beginning, which initializes the induction variable and sees if there are any iterations of the loop that need to be run.

If there aren't any iterations, then they'll branch directly to the end of loop. It will just skip the loop entirely. No need to try to execute any of that code. And in this case, it will simply return.

And then inside the loop block, we have these two incoming edges-- one from the entry point of the loop, where i has just been set to zero, and another where we're repeating the loop, where we've decided there's one more iteration to execute. And we're going to go back from the end of the loop to the beginning. And that back edge is what creates the loop structure in the control flow graph. Make sense? I at least see one nod over there. So that's encouraging.

OK, so if we take a look at the loop control, there are a couple of components to that loop control. There's the initialization of the induction variable. There is the condition, and there's the increment. Condition says when do you exit. Increment updates the value of the induction variable.

And we can translate each of these components from the C code for the loop control into the LLVM IR code for that loop. So the increment, we would expect to see some sort of addition where we add 1 to some register somewhere. And lo and behold, there is an add operation. So we'll call that the increment.

For the condition, we expect some comparison operation and a conditional branch based on that comparison. Look at that. Right after the increment, there's a compare and a conditional branch that we'll either take us back to the beginning of the loop or out of the loop entirely.

And we do see that there is some form of initialization. The initial value of this induction variable is 0. And we do see a 0 among this loop control code. It's kind of squirreled away in that weird notation there. And that weird notation is sitting next to the phi instruction.

What's not so clear here is where exactly is the induction variable. We had this single variable i in our C code. And what we're looking at in the LLVM IR are a whole bunch of different registers. We have a register that stores what we're claiming to be i plus 1, then we do this comparison and branch thing. And then we have this phi instruction that takes 0 or the result of the increment.

Where did i actually go? So the problem here is that i is really represented across all of those instructions. And that happens because the value of the induction variable changes as you execute the loop. The value of i is different on iteration 0 versus iteration 1 versus iteration 2 versus iteration 3 and so on and so forth. i is changing as you execute the loop.

And there's this funny invariant. Yeah, so if we try to map that induction variable to the LLVM IR, it kind of maps to all of these locations. It maps to various uses in the loop body. It maps, roughly speaking, to the return value of this field instruction, even though we're not sure what that's all about. But we can tell it maps to that, because we're going to increment that later on. And we're going to use that in a comparison. So it kind of maps all over the place.

And because it changes values with the increment operation, we're going to encounter-- so why does it change registers? Well, we have this property in LLVM that each instruction defines the value of a register, at most, once. So for any particular register with LLVM, we can identify a unique place in the code of the function that defines that register value.

This invariant is called the static single assignment invariant. And it seems a little bit weird, but it turns out to be an extremely powerful invariant within the compiler. It assists with a lot of the compiler analysis. And it also can help with reading the LLVM IR if you expect it.

So this is a nice invariant, but it poses a problem when we're dealing with induction variables, which change as the loop unfolds. And so what happens when control flow merges at the entry point of a loop, for example? How do we define what the induction variable is at that location? Because it could either be 0, if this is the first time through the loop, or whatever you lost incremented. And the solution to that problem is the phi instruction.

The phi instruction defines a register that says, depending on how you get to this location in the code, this register will have one of several different values. And the phi instruction simply lists what the value of that register will be, depending on which basic block you came from.

So in this particular code, the phi instruction says, if you came from block 6, which was the entry point of the loop, where you initially checked if there were any loop iterations to perform, if you come from that block, then this register 9 is going to adopt the value 0. If, however, you followed the back edge of the loop, then the register is going to adopt the value, in this case, 14. And 14, lo and behold, is the result of the incremental operation. And so this phi instruction says, either you're going to start from zero, or you're going to be i plus 1.

Just to note, the phi instruction is not a real instruction. It's really a solution to a problem with an LLVM. And when you translate this code into assembly, the phi instruction isn't going to map to any particular assembly instruction. It's really a representational trick. Does that make some sense? Any questions about that? Yeah?

STUDENT: Why is it called phi?

TAO SCHARDL: Why is it called phi? That's a great question. I actually don't know why they chose the name phi. I don't think they had a particular affinity for the Golden Ratio, but I'm not sure what the rationale was. I don't know if anyone else knows. Yeah? Google knows all, sort of. Yeah, so adopt the value 0 from block 6 or 14 from block 8.

So that's all of the basic components of C translated into LLVM IR. The last thing I want to leave you with in this section on LLVM IR is a discussion of these attributes. And we already saw one of these attributes before. It was this NSW thing attached the add instruction.

In general, these LLVM IR constructs might be decorated with these extra words and keywords. And those are the keywords I'm referring to as attributes. Those attributes convey a variety of information. So in this case, what we have here is C code that performs this memory calculation, which you might have seen from our previous lecture. And what we see in the corresponding LLVM IR is that there's some extra stuff tacked onto that load instruction where you load memory.

One of those pieces of extra information is this align 4. And what that align 4 attribute says is it describes the alignment of that read from memory. And so if subsequent stages of the compiler can employ that information, if they can optimize reads that are 4-byte aligned, then this attribute will say, this is a load that you can go ahead and optimize.

There are a bunch of places where attributes might come from. Some of them are derived directly from the source code. If you write a function that takes a parameter marked as const, or marked as restrict, then in the LLVM IR, you might see that the corresponding function parameter is marked as no alias, because the restricted keyword said this pointer can ever alias or the const keyword says, you're only ever going to read from this pointer. So this pointer is going to be marked read-only. So in that case, the source code itself-- the C code-- was the source of the information for those attributes.

There are some other attributes that occur simply because the compiler is smart, and it does some clever analysis. So in this case, the LLVM IR has a load operation that's 8-byte aligned. It was really analysis that figured out the alignment of that load operation. Good so far? Cool.

So let's summarize this part of the discussion with what we've seen about LLVM IR. LLVM IR is similar to assembly, but a lot simpler in many, many ways. All of the computed values are stored in registers. And, really, when you're reading LLVM IR, you can think of those registers a lot like ordinary C variables.

LLVM IR is a little bit funny in that it adopts a static, single assignment paradigm-- this invariant-- where each registered name, each variable is written by, at most, one instruction within the LLVM IR code. So if you're ever curious where %14 is defined within this function, just do a search for where %14 is on the left-hand side of an equals, and there you go.

We can model of function in LLVM IR as a control flow graph, whose nodes correspond to basic blocks-- these blobs of straight line code-- and whose edges do node control flow among those basic blocks. And compared to C, LLVM IR is pretty similar, except that all of these operations are explicit. The types are explicit everywhere.

The integer sizes are all apparent. You don't have to remember that int really means a 32-bit integer, and you need n-64 to be a 64-bit integer, or you need a long or anything. It's just i and then a bit width.

There no implicit operations at the LLVM IR level. All the typecasts are explicit. In some sense, LLVM IR is like assembly if assembly were more like c. And that's doubly a statement that would not have made sense 40 minutes ago.

All right, so you've seen how to translate C code into LLVM IR. There's one last step. We want to translate the LLVM IR into assembly. And it turns out that structurally speaking, LLVM IR is very similar to assembly. We can, more or less, map each line of LLVM IR to some sequence of lines in the final assembly code.

But there is some additional complexity. The compiler isn't done with its work yet when it's compiling C to LLVM IR to assembly. There are three main tasks that the compiler still has to perform in order to generate x86-64. First, it has to select the actual x86 assembly instructions that are going to implement these various LLVM IR operations.

It has to decide which general purpose registers are going to hold different values and which values need to be squirreled away into memory, because it just has no other choice. And it has to coordinate all of the function calls. And it's not just the function calls within this particular source file. It's also function calls between that source file, and other source files that you're compiling, and binary libraries that are just sitting on the system. But the compiler never really gets to touch.

It has to coordinate all of those calls. That's a bit complicated. That is going to be the reason for a lot of the remaining complexity. And that's what brings our discussion to the Linux x86-64 calling convention. This isn't a very fun convention. Don't worry. But nevertheless, it's useful.

So to talk about this convention, let's first take a look at how a program gets laid out in memory when you run it. So when a program executes, virtually memory gets organized into a whole bunch of different chunks which are called segments. There's a segment that corresponds to the stack that's actually located near the top of virtual memory, and it grows downwards. The stack grows down. Remember this.

There is a heap segment, which grows upwards from a middle location in memory. And those two dynamically-allocated segments live at the top of the virtual address space. There are then two additional segments-- the bss segment for uninitialized data and the data segment for initialized data. And finally, at the bottom of virtual address space, there's a tech segment. And that just stores the code of the program itself.

Now when you read assembly code directly, you'll see that the assembly code contains more than just some labels and some instructions. In fact, it's decorated with a whole bunch of other stuff. And these are called assembler directives, and these directives operate on different sections of the assembly code.

Some of those directives refer to the various segments of virtual memory. And those segment directives are used to organize the content of the assembly file. For example, the .text directive identifies some chunk of the assembly, which is really code and should be located in the text segment when the program is run. The .bss segment identifies stuff that lives in the assembler directive to identify stuff in the bss segment. The .data directive identify stuff in the data segment, so on and so forth.

There are also various storage directives that will store content of some variety directly into the current segment-- whatever was last identified by a segment directive. So if, at some point, there is a directive x colon dot space 20, that space directive says, allocate some amount of memory. And in this case, it says, allocate 20 bytes of memory. And we're going to label that location x.

The .long segment says, store a constant long integer value-- in this case, 172-- in this example, at location y. The asciz segment similarly stores a string at that particular location. So here, we're storing the string 6.172 at location z. There is an align directive that aligns the next content in the assembly file to an 8-byte boundary.

There are additional segments for the linker to obey, and those are the scope and linkage directives. For example, you might see .globl in front of a label. And that single is linker that that particular symbol should be visible to the other files that the linker touches. In this case, .globl fib makes fib visible to the other object files, and that allows this other object files to call or refer to this fib location.

Now, let's turn our attention to the segment at the top, the stack segment. This segment is used to store data and memory in order to manage function calls and returns. That's a nice high-level description, but what exactly ends up in the stack segment? Why do we need a stack? What data will end up going there? Can anyone tell me?

STUDENT: Local variables in function?

TAO SCHARDL: Local variables in function. Anything else? You already answered once. I may call on you again. Go ahead.

STUDENT: Function arguments?

TAO SCHARDL: Sorry?

STUDENT: Function arguments?

TAO SCHARDL: Function arguments-- very good. Anything else? I thought I saw a hand over here, but--

STUDENT: The return address?

TAO SCHARDL: The return address. Anything else? Yeah? There's one other important thing that gets stored on stack. Yeah?

STUDENT: The return value?

TAO SCHARDL: The return value-- actually, that one's interesting. It might be stored on the stack, but it might not be stored on the stack. Good guess, though. Yeah?

STUDENT: Intermediate results?

TAO SCHARDL: Intermediate results, in a manner of speaking, yes. There are more intermediate results than meets the eye when it comes to assembly or comparing it to C. But in particular, by intermediate results, let's say, register state.

There are only so many registers on the machine. And sometimes, that's not enough. And so the function may want to squirrel away some data that's in registers and stash it somewhere in order to read it back later. The stack is a very natural place to do it. That's the dedicated place to do it. So yeah, that's pretty much all the content of what ends up on the call stack as the program executes.

Now, here's the thing. There are a whole bunch of functions in the program. Some of them may have been defined in the source file that you're compiling right now. Some of them might be defined in other source files. Some of them might be defined in libraries that were compiled by someone else, possibly using a different compiler, with different flags, under different parameters, presumably, for this architecture-- at least, one hopes. But those libraries are completely out of your control.

And now, we have this problem. All those object files might define these functions. And those functions want to call each other, regardless of where those functions are necessarily defined. And so somehow, we need to coordinate all those function calls and make sure that if one function wants to use these registers, and this other function wants to use the same registers, those functions aren't going to interfere with each other. Or if they both want to read stack memory, they're not going to clobber each other's stacks.

So how do we deal with this coordination problem? At a high level, what's the high-level strategy we're going to adopt to deal with this coordination problem?

STUDENT: Put the values of the registers on the stack before you go into the function.

TAO SCHARDL: That will be part of it. But for the higher level strategy-- so that's a component of this higher level strategy. Yeah? Go ahead.

STUDENT: Calling convention?

TAO SCHARDL: Calling convention. You remembered the title of this section of the talk. Great. We're going to make sure that every single function, regardless of where it's defined, they all abide by the same calling convention. So it's a standard that all the functions will obey in order to make sure they all play nicely together.

So let's unpack the Linux x86-64 calling convention. Well, not the whole thing, because it's actually pretty complicated, but at least enough to understand the basics of what's going on. So a high level, this calling convention organizes the stack segment into frames, such that each function instantiation-- each time you call a function-- that instantiation gets a single frame all to itself.

And to manage all those stack frames, the calling convention is going to use these two pointers-- rbp and rsp, which you should've seen last time. rbp, the base pointer, will point to the top of the current stack frame. rsp will point to the bottom up the current stack frame. And remember, the stack grows.

Now when the code executes call-and-return instructions, those instructions are going to operate on the stack, these various stock pointers, as well as the instruction pointer, rip, in order to manage the return address of each function.

In particular, when a call instruction gets executed, in x86, that call instruction will push the current value of rip onto the stack, and that will be the return address. And then the call instruction will jump to its operand. It's operand being the address of some function in the program memory, or, at least, one hopes. Perhaps there was buffer overflow corruption of some kind, and your program is in dire straits. But presumably, it's the address of a function.

The return instruction complements the call, and it's going to undo the operations of that call instruction. It'll pop the return address off the stack and put that into rip. And that will cause the execution to return to the caller and resume execution from the statement right after the original call. So that's the high level of how the stack gets managed as well as the return address.

How about, how do we maintain registers across all those calls? Well, there's a bit of a problem. Because we might have two different functions that want to use the same registers. Some of this might be review, by the way, from 6004. If you have questions, just let me know.

So we have this problem, where two different functions, function A, which might call another function B. Those two functions might want to use the same registers. So who's responsible for making sure that if function B operates on the same registers as A, that when B is done, A doesn't end up with corrupted state in its registers?

Well, they're two different strategies that could be adopted. One is to have the caller save off the register state before invoking a call. But that has some downsides. The caller might waste work, saying, well, I have to save all of this register state in case the function I'm calling wants to use those registers. If the calling function doesn't use those registers, that was a bunch of wasted work.

So on the other side, you might say, well, let's just have the callee save all that registered state. But that could waste work if the callee is going to save off register state that the caller wasn't using. So if the callee says, well, I want to use all these registers. I don't know what the calling function used, so I'm just going to push everything on the stack, that could be a lot of wasted work.

So what does the x86 calling convention do, if you had to guess? Yeah?

STUDENT: [INAUDIBLE]

TAO SCHARDL: That's exactly right. It does a little bit of both. It specifies some of the registers as being callee-saved registers, and the rest of the registers are caller-saved registers. And so the caller will be responsible for saving some stuff. The callee will be responsible for saving other stuff. And if either of those functions doesn't need one of those registers, then it can avoid wasted work.

In x86-64, in this calling convention, turns out that the rbx, rbp, and r12 through r15 registers are all callee saved, and the rest of the registers are caller saved. In particular, the C linkage defined by this calling convention for all the registers looks something like this. And that identifies lots of stuff. It identifies a register for storing the return value, registers for storing a bunch of the arguments, caller-save registers, callee-saved registers, a register just for linking.

I don't expect you to memorize this in 12 seconds. And I think on any quiz-- well, I won't say what the course app will do on quizzes this year.

STUDENT: [INAUDIBLE] everyone.

TAO SCHARDL: Yeah, OK, well, there you go. So you'll have these slides later. You can practice memorizing them. Not sure on this slide. There are a couple other registers that are used for saving function arguments and return values. And, in particular, whenever you're passing floating point stuff around, the xmm register 0 through 7 are used to deal with those floating point values.

Cool. So we have strategies for maintaining the stack. We have strategies for maintaining register states. But we still have the situation where functions may want to use overlapping parts of stack memory. And so we need to coordinate how all those functions are going to use the stack memory itself.

This is a bit hard to describe. The cleanest way I know describe it is just to work through an example. So here's the setup. Let's imagine that we have some function A that is called of function B. And we're in the midst of executing function B, and now, function B is about to call some other function C.

As we mentioned before, B has a frame all to itself. And that frame contains a whole bunch of stuff. It contains arguments that A passed to B. It contains a return address. It contains a base pointer. It contains some local variables. And because B is about to call C, it's also going to contain some data for arguments that B will pass to C.

So that's our setup. We have one function ready to call another. Let's take a look at how this stack memory is organized first. So at the top, we have what's called a linkage block. And in this linkage block, this is the region of stack memory, where function B will access non-register arguments from its caller, function A.

It will access these by indexing off of the base pointer, rbp, using positive offsets. Again, the stack grows down. B will also have a block of stack space after the linkage block and return address and bass pointer. It will have a region of its frame for local variables, and it can access those local variables by indexing off of rbp in the negative direction. Stack grows down. If you don't have anything else, stack grows down.

Now B is about to call a function C, and we want to see how all of this unfolds. So before calling C, B is going to place non-register arguments for C on to a reserved linkage block in its own stack memory below its local variables. And it will access those by indexing rbp with negative offsets.

So those arguments from B to its callers will specify those to be arguments from B to C. And then what's going to happen? Then B is going to call C. And as we saw before, the call instruction saves off the return address onto the stack, and then it branches control to the entry point of function C.

When the function C starts, it's going to execute what's called the function prologue. And the function prologue consists of a couple of steps. First, it's going to save off the base pointer for B's stack frame. So it'll just squirrel away the value of rbp onto the stack. Then it's going to set rbp equal to rsp, because we're now entering a brand new frame for the invocation of C.

And then C can go ahead and allocate the space that it needs on the stack. This will be space that C needs for its own local variables, as well as space that C will use for any linkage blocks that it creates for the things that it calls.

Now there is one common optimization that the compiler will attempt to perform. If a function never needs to perform stack allocations, except to handle these function calls-- in other words, if the difference between rbp and rsp is a compile time constant, then the compiler might go ahead and just get rid of rbp and do all of the indexing based off the stack pointer rsp.

And the reason it'll do that is because, if it could get one more general purpose register out of our rbp, well, now, rpb is general purpose. And it has one extra register to use to do all of its calculations. Reading from a register takes some time. Reading from even L1 cache takes significantly more, I think, four times that amount. And so this is a common optimization that the compiler will want to perform.

Now, turns out that there's a lot more to the calling convention than just what's shown on these slides. We're not going to go through that today. If you'd like to have more details, there's a nice document-- the System V ABI-- that describes the whole calling convention. Any questions so far?

All right, so let's wrap all this up with a final case study, and let's take a look at how all these components fit together. When we're translating a simple C function to compute Fibonacci numbers all the way down to assembly. And as you've been describing this whole time, we're going to take this in two steps.

Let's describe our starting point, fib.c. This should be basically no surprise to you at this point. This is a C function fib, which computes the nth Fibonacci number in one of the worst computational ways possible, it turns out. But it computes the nth Fibonacci number f of n recursively using the formula f of n is equal to n when n is either 0 or 1. Or it computes f of n minus 1 and f of n minus 2 and takes their sum.

This is an exponential time algorithm to compute Fibonacci numbers. I would say, don't run this at home, except, invariably, you'll run this at home. There are much faster algorithms to compute Fibonacci numbers. But this is good enough for a didactic example. We're not really worried about how fast can we compute fib today.

Now the C code fib.c is even simpler than the recurrence implies. We're not even going to bother checking that the input value n is some non-negative value. What we're going to do is say, look, if n is less than 2, go ahead and return that value of n. Otherwise, do the recursive thing.

We've already seen this go a couple of times. Everyone good so far? Any questions on these three lines? Great.

All right, so let's translate fib.c into fib.ll. We've seen a lot of these pieces in lectures so far. And here, we've just rewritten fib.c a little bit to make drawing all the lines a little bit simpler. So here, we have the C code for fib.c.

The corresponding LLVM IR looks like this. And as we could guess from looking at the code for fib.c, we have this conditional and then two different things that might occur based on whether or not n is less than 2. And so we end up with three basic blocks within the LLVM IR.

The first basic block checks event is less than 2 and then branches based on that result. And we've seen how all that works previously. If n happens to be less than 2, then the consequent-- the true case of that branch-- ends up showing up at the end. And all it does is it returns the input value, which is stored in register 0.

Otherwise, it's going to do some straight line code to compute fib of n minus 1 and fib of n minus 2. It will take those return values, add them together, return that result. That's the end Fibonacci number. So that gets us from C code to LLVM IR. Questions about that? All right, fib n minus 1, fib n minus 2, add them, return it. We're good.

OK, so one last step. We want to compile LLVM IR all the way down to assembly. As I alluded to before, roughly speaking, the structure of the LLVM IR resembles the structure of the assembly code. There's just extra stuff in the assembly code. And so we're going to translate the LLVM IR, more or less, line by line into the assembly code and see where that extra stuff shows up.

So at the beginning, we have a function. We were defining a function fib. And in the assembly code, we make sure that fib is a globally accessible function using some assembler directives, the globlfib directive. We do an alignment to make sure that function lies in a nice location in the instruction memory, and then we declare the symbol fib, which just defines where this function lives in memory.

All right, let's take a look at this assembly. The next thing that we see here are these two instructions-- a push queue or rbp and a movq of rsp, rbp. Who can tell me what these do? Yes?

STUDENT: Push the base [INAUDIBLE] on the stack, then [INAUDIBLE].

TAO SCHARDL: Cool. Does that sound like a familiar thing we described earlier in this lecture?

STUDENT: the calling convention?

TAO SCHARDL: Yep, it's part of the calling convention. This is part of the function prologue. Save off rpb, and then set rbp equal to rsp. So we already have a couple extra instructions that weren't in the LLVM IR, but must be in the assembly in order to coordinate everyone.

OK, so now, we have these two instructions. We're now going to push a couple more registers onto the stack. So why does the assembly do this? Any guesses? Yeah?

STUDENT: Callee-saved registers?

TAO SCHARDL: Callee-saved registers-- yes, callee-saved registers. The fib routing, we're guessing, will want to use r14 rbx during this calculation. And so if there are interesting values in those registers, save them off onto the stack. Presumably, we'll restore them later.

Then we have this move instruction for rdi into rbx. This requires a little bit more arcane knowledge, but any guesses as to what this is for?

STUDENT: rdi is probably the argument to the function.

TAO SCHARDL: rdi is the argument to the function. Exactly. That's the arcane knowledge. So this is implicit from the assembly, which is why you either have to memorize that huge chart of GPR C linkage nonsense. But all this operation does is it takes whatever that argument was, and it's squirrels it away into the rbx register for some purpose that we'll find out about soon.

Then we have this instruction, and this corresponds to the highlighted instruction on the left, in case that gives any hints. What does this instruction do?

STUDENT: [INAUDIBLE].

TAO SCHARDL: Sorry.

STUDENT: It calculates whether n is small [INAUDIBLE].

TAO SCHARDL: Correct. It evaluates the predicate. It's just going to do a comparison between the value of n and the literal value of 2, comparing against 2.

So based on the result of that comparison, if you recall, last lecture, the results of a comparison will set some bits in this implicit EFLAGS flags register, or RFLAGS register. And based on the setting of those bits, the various conditional jumps that occur next in the code will have varying behavior. So in case the comparison results to false-- if n is, in fact, greater than or equal to 2-- then the next instruction is jge, will jump to the label LBB0 underscore 1. You can tell already that reading assembly is super-fun.

Now that's a conditional jump. And it's possible that the setting of bits in RFLAGS doesn't evaluate true for that condition code. And so it's possible that the code will just fall through pass this jge instruction and, instead, execute these operations. And these operations correspond to the true side of the LLVM IR branch operation. When n is less than 2, this will move n into rax, and then jumped to the label LBB03. Any guesses as to why it moves n into our rax? Yeah?

STUDENT: That's the return value.

TAO SCHARDL: That's a return value-- exactly. If it can return a value through registers, it will return it through rax. Very good. So now, we see this label LBBO1. That's the label, as we saw before, for the false side of the LLVM branch.

And the first thing in that label is this operation-- leaq minus 1 of rbx rdi. Any guesses as to what that's for? The corresponding LLVM IR is highlighted on the left, by the way.

The lea instruction means load-effective address. All lea does is an address calculation. But something that compilers really like to do is exploit the lea instruction to do simple integer arithmetic as long as that integer arithmetic fits with the things that lea can actually compute.

And so all this instruction is doing is adding negative 1 to rbx. And rbx, as we recall, stored the input value of n. And it will store the result into rdi. That's all that this instruction does. So it computes the negative 1, stores it into rbi.

How about this instruction? This one should be easier.

STUDENT: For the previous one, how did you get [INAUDIBLE]? I'm familiar with [INAUDIBLE] because [INAUDIBLE]. But is there no add immediate instruction in x86?

TAO SCHARDL: Is there no add immediate instruction? So you can do an add instruction in x86 and specify an immediate value. The advantage of this instruction is that you can specify a different destination operand. That's why compilers like to use it. More arcane knowledge. I don't blame you if this kind of thing turns you off from reading x86. It certainly turns me off from reading x86.

So this instruction should be a little bit easier. Guess as to why it does? Feel free to shout it out, because we're running a little short on time.

STUDENT: Calls a function.

TAO SCHARDL: Calls a function. What function?

STUDENT: Call fib.

TAO SCHARDL: Call fib, exactly. Great. Then we have this move operation, which moves rax into r14. Any guess as to why we do this? Say it.

STUDENT: Get the result of the call.

TAO SCHARDL: Get the result of the call. So rax is going to store the return value of that call. And we're just going to squirrel it away into r14. Question?

STUDENT: [INAUDIBLE]

TAO SCHARDL: Sorry.

STUDENT: It stores [INAUDIBLE]?

TAO SCHARDL: It'll actually store the whole return value from the previous call.

STUDENT: [INAUDIBLE]

TAO SCHARDL: It's part of that result. This will be a component in computing the return value for this call of fib. You're exactly right. But we need to save off this result, because we're going to do, as we see, another call to fib. And that's going to clobber rax. Make sense? Cool.

So rax stores the result of the function. Save it into r14. Great. Since we're running short of time, anyone want to tell me really quickly what these instructions do? Just a wild guess if you had to.

STUDENT: N minus 2

TAO SCHARDL: n minus 2. Compute n minus 2 by this addition operation. Stash it into rdi. And then you call fib on n minus 2. And that will return the results into rax, as we saw before. So now, we do this operation. Add r14 into rax. And this does what?

STUDENT: Ends our last function return to what was going off this one.

TAO SCHARDL: Exactly. So rax stores the result of the last function return. Add it into r14, which is where we stashed the result of fib of n minus 1. Cool.

Then we have a label for the true side of the branch. This is the last pop quiz question I'll ask. Pop quiz-- God, I didn't even intend that one. Why do we do these pop operations? In the front.

STUDENT: To restore the register before exiting the stack frame?

TAO SCHARDL: Restore the registers before exiting the stack frame-- exactly. In calling convention terms, that's called the function epilogue. And then finally, we return.

So that is how we get from C to assembly. This is just a summary slide of everything we covered today. We took the trip from C to assembly via LLVM IR. And we saw how we can represent things in a control flow graph as basic blocks connected by control flow edges. And then there's additional complexity when you get to the actual assembly, mostly to deal with this calling invention.

That's all I have for you today. Thanks for your time.

Free Downloads

Video

Internet Archive (MP4 - 182MB)

Caption

English-US (SRT)