Lecture 21: Tuning a TSP Algorithm | Lecture Videos | Performance Engineering of Software Systems | Electrical Engineering and Computer Science

Description: Jon Bentley, retired from Bell Labs Research, discusses the traveling salesperson problem. This class is a case study in implementing algorithms, recursive enumeration, algorithm engineering, and applying algorithms and data structures.

Instructor: Jon Bentley

The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

CHARLES LEISERSON: It is my great pleasure to welcome Jon Bentley, now retired from Bell Labs. Jon was my PhD thesis supervisor at Carnegie Mellon. I actually had two supervisors. The other one was HT Kung, who is now at Harvard. I guess people flee Carnegie Mellon like the plague or something.

So Jon, as you know because you've studied some of his work, is a pioneer in software performance engineering. And he's going to talk today about a particularly neat piece of algorithmic engineering that centers around the so-called traveling salesperson problem, which is an NP-hard problem. NP-complete problem in fact. And so, without further ado, Jon, why don't you tell us what you've got to say?

JON BENTLEY: As Charles mentioned, I want to talk with you-- I want to tell you a story about a cool problem. This is a problem that I first heard when I was a young nerd-- not much older than this little pile of nerds in front of me-- in high school, the traveling salesperson problem. Who here has heard of the TSP at some point?

I heard about this in high school, one of the things you read about it in the math books. And a few years later, I had a chance to work on it. In 1980, I was doing some consulting, and I said, well, what you need to do is solve a TSP. Then I went home and realized that all of the stuff that I learned about it was sort of relevant but it didn't solve the problem, so I started working on it then.

Our colleague, Christos Papadimitriou, who's been at Berkeley for a long time after being at a lot of other places, once told me the TSP is not a problem. It is an addiction. So I've been hooked for coming up on 40 years now. And I want to tell you one story about a really cool program I wrote. Because this is one of the-- I've been paid to be a computer programmer for coming up on 50 years, since I've been doing it for 48 years now. This is probably the most fun, the coolest program I've written over a couple day period.

I want to tell you a story. Start off with what recursive generation is. Then the TSP, what it is. Then I'll start with one program, and we'll make it faster and faster and faster. Again, I spend my whole life squeezing performance. This is the biggest squeeze ever. And then some principles behind that.

We'll start, though, with how do you enumerate all the elements in a set? If I want to count-- enumerate the guys between 1 and a hundred, I just count. That's no big deal. But how can I, for instance, enumerate all subsets of the set from the integers from 1 to 5? How many subsets are there of integers from 1 to 5?

AUDIENCE: 2 to the 5.

JON BENTLEY: Pardon me?

AUDIENCE: 2 to the 5.

JON BENTLEY: 2 to the 5, 32. But how do you say which ones they are? How do you go through and count them? Well, you have to decide how you represent it. You guys know all about set representations. We'll stick with bit vectors for the time being.

An iterative solution is you just count-- 0, 1, 2, 3, 4, 5, up to 31. That's pretty easy. But what does it mean to count? What does it mean to go from one integer to the next? How do you go from a given integer to the next one? What's the rule for that?

It's pretty easy, actually. You just scan over all the 0's, turning the-- you start at the right-hand side, the least significant digit, scan over all the 0's, turn it to 1. Oh, I lied to you. You scan over all the 1's, turning them to 0 until you get to the first 0. And then you turn that to a 1. So this one goes to 10. This one goes to 11. This one goes-- that one becomes 0, that one becomes 0. Then it becomes 00100.
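The counting rule just described can be sketched in C as follows. This is a minimal illustration, not code from the lecture: the names `next_subset` and `count_subsets`, and the choice to store the least significant bit in `bits[0]`, are assumptions made for the example.

```c
#include <assert.h>

/* Advance a bit vector to the next integer: starting at the least
   significant position, turn 1's into 0's until the first 0 is reached,
   then turn that 0 into a 1.
   Assumption of this sketch: bits[0] is the least significant bit.
   Returns 0 if the vector was all 1's and wrapped around to all 0's. */
int next_subset(int bits[], int n)
{
    int i = 0;
    assert(n >= 0);
    while (i < n && bits[i] == 1) {   /* scan over the 1's, clearing them */
        bits[i] = 0;
        i++;
    }
    if (i == n)
        return 0;                     /* wrapped around: every subset was seen */
    bits[i] = 1;                      /* the first 0 becomes a 1 */
    return 1;
}

/* Count how many subsets the iteration visits, starting from all 0's. */
int count_subsets(int n)
{
    int bits[32] = {0};
    int count = 1;                    /* the all-zero vector, i.e. the empty set */
    assert(n >= 0 && n <= 30);
    while (next_subset(bits, n))
        count++;
    return count;
}
```

Calling `count_subsets(5)` walks through all 32 bit vectors, matching the 2 to the 5 answer given above.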

So a pretty easy algorithm. You could do it that way. Just scan over all the 1's, turn them to 0's, take that first one and flip it around. But that doesn't generalize nicely. We're going to see a method that generalizes very nicely. This is a recursive solution to enumerate all 2 to the n subsets of a set of size n.

And the answer is this: to enumerate all sets of size m, just put a 0 at this end and enumerate all sets of size m minus 1. Then put a 1 at this end and enumerate all sets of size m minus 1 again. How many of these will there be? 2 to the m minus 1. How many of those? 2 to the m minus 1. What do they add up to? 2 to the m. But all of these have the 0 at that end, and all of those have the 1 at that end. Everyone see that recursive sketch and how that works?

Here's the example. You put the 0 at this end and you fill it out. You put the 1 at that end and you fill that out. If you do that, you notice that in fact we're just counting backwards-- 000, 001, 010, 3, 4, 5, 6, 7. That's the algorithm. And the cool thing is the code is really simple. I could probably write a program like that in most languages and get it correct.

So to generate all subsets of size m: if m equals 0, you visit the subset. You have a pointer going down the array. Otherwise, set the rightmost bit to 0, generate all subsets recursively, set it to 1, do it again recursively. That's a starting program. If you understand this, everything else is going to be pretty straightforward. If you don't, please speak up.

One thing that-- you people have suffered the tragedy of 14 or 15 or 16 years of educational system that has sort of beaten the creativity out of you and you're afraid to speak up. So even if something-- even if I'm up here spouting total bullshit, you'll ignore that fact and just sort of politely stare at me and nod. But this is important. I want you to understand this. If you don't understand this, speak now or forever hold it. Anyone have any questions? Please, please.

AUDIENCE: What does mean, [INAUDIBLE]?

JON BENTLEY: I'm sorry. Why did we set p to the--

AUDIENCE: [INAUDIBLE].

JON BENTLEY: So here, first I go out to the extreme rightmost and I set it to 0. Then I recursively fill those in. Then I change it from a 0 to a 1 there, and I fill all those in. So this is a program that will go through, and as it enumerates a subset, it will call the visit procedure a total of 2 to the m times, then it comes down to the bottom of the recursion. Thank you, great question. Any other questions about how this works? OK, we'll come back to this.
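A minimal C sketch of the recursive generator discussed here might look like the following. The slide itself is not in the transcript, so the names `all_subsets`, `visit`, `enumerate_subsets` and the `order` array are assumptions for illustration. The `visit` function here only records each subset as a number so the order can be inspected.

```c
#include <assert.h>

#define MAXN 10

static int p[MAXN];                /* bit vector: p[i] == 1 means element i is in the subset */
static int n;                      /* size of the underlying set */
static long order[1 << MAXN];      /* subsets in visiting order, read as binary numbers */
static long visited;               /* number of calls to visit() so far */

/* Called once per subset, at the bottom of the recursion. */
static void visit(void)
{
    long value = 0;
    for (int i = n - 1; i >= 0; i--)   /* p[n-1] is treated as the high bit */
        value = 2 * value + p[i];
    order[visited++] = value;
}

/* Enumerate all settings of p[0..m-1], leaving p[m..n-1] untouched. */
static void all_subsets(int m)
{
    if (m == 0) {
        visit();
    } else {
        p[m - 1] = 0;              /* first every subset without element m-1 */
        all_subsets(m - 1);
        p[m - 1] = 1;              /* then every subset with it */
        all_subsets(m - 1);
    }
}

/* Run the enumeration for a set of the given size; returns the visit count. */
long enumerate_subsets(int size)
{
    assert(size >= 0 && size <= MAXN);
    n = size;
    visited = 0;
    all_subsets(n);
    return visited;
}
```

With `size` equal to 5, `visit` is called 2 to the 5 times, and under the bit order assumed in this sketch the recorded values come out as 0, 1, 2, up to 31.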

The traveling salesperson problem. I apologize. I will really try to say the traveling salesperson problem, but I will slip because I was raised with this being the traveling salesman problem. No connotations, no intentionality there, just senility galloping along. It's a cool problem. Abraham Lincoln faced this very problem in the years 1847 to 1853 when he-- everyone here has heard of circuit courts?

Why do they call them circuit courts? Because the court used to go out and ride a circuit to go to a whole bunch of cities. Now people in the cities come to the court. But back in the day, in 1847 to 1853, Lincoln and all of his homies would hop on their horses-- a judge, defense lawyers, prosecutors-- and go around and ride the circuit here.

And so this is the actual route that they rode where they wanted to do this effectively. It would be really stupid to start here in Springfield and go over there, then to come back here, then to go over there back and forth. What they did was try to find a circuit that minimized the total amount of distance traveled. That is the traveling salesperson problem.

We're given a set of n things. It might be a general graph. These happen to be in the plane. But you really-- helicopter service was really bad in those days, so they didn't fly there from point to point. Whether they stayed on roads, what really matters here is the graph embedded in here. I'm going to speak at this. Everything I say will be true for a graph. It will be true for geometry. I'll be sloppy about that. We'll see interfaces, how you handle both, but just cut me some slack for that.

I have actually faced this myself when I worked on a system where we had a big mechanical plotter and we wanted to draw these beautiful maps where the maps would fill in dots. They happened to be precincts. Some of them were red, some of them were blue. And you wanted to draw all the red dots first and go around here. And, in fact, the plotter would draw this red dot, then that red dot, then this one, then that one. The plotter took an hour to draw the map.

I was consulted on this. Aha, you have a traveling salesperson problem. I went down. I reduced the length to about 1/10 of the original length. If it took an hour before, how long would it take now? Well, it took about half an hour. And the reason is that the plotter took about half of its time, about 30 minutes, moving around, and about 30 minutes just drawing the symbols. I didn't reduce the time drawing the symbols at all, but I reduced the time moving things around from about 30 minutes to about 3 minutes. That was still a big difference.

So I fixed it there. When I worked at Bell Labs, we had drills that would go around, laser drills, move around on printed circuit boards to drill holes. They wanted to move it that way.

I talked to people at General Motors at one point. On any day, they might have a thousand cars go through an assembly line. Some of the cars are red, some are white, some are green, some are yellow. You have to change the paint. Some of them have V6, some of them have V8. Some of them are two doors, some of them are four doors.

In what order do you want to build those cars? Well, every time you change one part of the car, you have to change the line. And what you want to do is examine, as it were, all n factorial permutations of putting the cars through and choose the one that involves the minimum amount of change. And the change from one car to another is a well-defined function. Everyone see how that weird TSP is in fact a TSP?

So all of these are cool problems. Furthermore, as a computer scientist, it's the prototypical problem. It's the E. coli of algorithmic problems. It was literally one of the first problems to be proven to be NP-hard. Held-Karp gave a dynamic programming algorithm for it. There are approximation algorithms for this. Kernighan-Lin have given heuristics. It's a really famous problem. It's worth studying.

But here is what really happened to me. Here's why I'm standing in front of you today talking about this. My friend Mike Shamos, in his 1978 PhD thesis on computational geometry, talked about a number of problems. One of them was the TSP. And he shows us and he gives an example of this tour. He says, here's a set of 16 points. Here's a tour through them. Here's a traveling salesperson tour through them.

And then he says in a footnote, in fact, I'm not sure if it's a really optimal tour. I applied a heuristic several times. I'm not positive it's the shortest tour. If you wrote a thesis, it would be sort of nice to know what's going on there. Can you solve a problem that was-- this tiny little 16-element problem, 16 points in the plane. Can you really figure out what the TSP is to that? At the time, my colleague, our colleague, a really smart guy, couldn't do it. It was computationally beyond the bounds for him.

Well, in 1997 I came back to this, and I really wondered is it possible now? Computers are a whole lot faster in the 20 years. We were talking about that earlier today. 20 years, computers got faster. A lot of things got better. Have things changed enough so I can write a quick little program to solve this? I don't know. We'll see.

I did that. I talked about it. I gave a talk at Lehigh University 20 years ago. They liked it. They incorporated it into an algorithms class. The same professor gave it time and time and time again. Eventually, he retired. They asked me to come over and give this talk to them.

I can't give a talk about 20-year-old material. Computer science doesn't work that way. So I coded things. I wanted to see how things changed in 20 years. So this talk is about a lot of things, but especially it's about how performance has changed in 40 years. So that's one of the reasons we were-- one of the things we were talking about earlier today.

I could give a bunch of titles for this talk. For you, the title I give is a sampler of performance engineering. It could be-- next week I'll give it at Lehigh. This is their final class in-- one of their final classes in algorithms and data structures. I'm going to try to tie everything they learn together.

It could be all these other things-- implementing algorithms, a lot of recursive generation, applying algorithms, really-- Charles is a fancy dancy academic. He's a professor at the Massachusetts Institute of Technology. I'm just a poor dumb computer programmer, but boy this is a fun program.

What it is not is a state-of-the-art TSP algorithm. People have studied the problem for well over a century. They have beautiful algorithms. I am not going to tell you about any of those algorithms for the simple reason that I don't know them. I could look them up in books, but I've never really lived the fancy state-of-the-art algorithms.

And I'm also going to just show you getting the answer. I could analyze it. I've analyzed much of these. If I had another hour or three, I could do the analysis. But I can't, so I'm just going to do the-- show you some anecdotal speeds without really the analysis.

Let's talk about some programs. A simple C program. MAXN is the maximum number of cities, and the int n is going to be the number of cities. I'm going to have a permutation vector, where if I have the tour going from city 1 to 7 to 3 to 4, it says 1734. The distance between cities is going to be given by a distance function. There is this distance d of i, j, the distance from city i to city j.

Here's the first algorithm. What I'm going to do is generate all n factorial permutations, look at them, and find the best one. It's not rocket science. The way I'm going to do this is a recursive function where I happened-- I could have done it from left to right. I am a C programmer. I always count down towards 0. So I'm going to count down that way, where all of these cities are already fixed. I'm going to permute these.

Here's the program. To search for m-- all of these have already been fixed. What I'm going to do is if m equals 1, then I check it. Otherwise, for i equals 0 up to m, for each value from 0 to m minus 1, I take the ith element. I swap it. swap 3, 7 takes the third and seventh positions and swaps them. I swap that to the final thing. I call it recursively. I then swap it back to leave it in exactly the same state I found it, and I continue.

So here it's going to generate, first, all nine permuta-- put all nine digits in the last position. Then for each one of those, I'll put all eight digits in the last position, and go on down. This is really interesting, important, and subtle. If you don't follow this part, it's going to be difficult. Are there any questions at all about this? Have I lied to you yet about this? You're honest enough to tell me if I have.

AUDIENCE: You're good.

JON BENTLEY: Thank you. Anyone else?

AUDIENCE: [INAUDIBLE].

JON BENTLEY: I'm sorry. Please.

AUDIENCE: Sorry. I'm not really understanding the part that's fixed and what you're permuting, and why is that hard to fix.

JON BENTLEY: So, so far, as I recur down with-- as m moves down, all these are fixed. So I'm going to fix these things, and then I'm going to take care of all these later. So, originally, I'm going to have this array be 0-- if I have a nine-city TSP, it will be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. And first I put 0 in the end and do the rest. Then I put 1 in the end, [INAUDIBLE] 9 in the end, and recur down.

But as the program is progressing, if you stop the program at any time and glance at the array, you can see which part is fixed, given the value of m, this parameter of the recursive function. So this is a way that I'm essentially building this tree where at the top of the tree the branching factor is 9. At each of those nine nodes, the branching factor is 8, then 7 and 6. It's going to be a big tree.

If n is 10, how big is that tree going to be? What's 10 factorial? Pardon me? When I was a nerd, we used to try to impress people of appropriate genders by going off saying things like 3628800. You can probably guess how effective that was. So 3.6 million. It's going to be a big tree. Any questions about that? Let's go.

When I check things, I just compute the sum there. I start off with the sum being the distance from p of 0 to p of n minus 1. Then I go through and add up all the pairwise things and save it. What does it mean to save it? If the sum is less than the minimum sum so far, I just copy those over, change the minsum. And to solve the whole thing, I do a search of size n.

This is a simple but powerful recursive program. You should all feel very comfortable with this. Is it correct? Does it work? Is it possible to write a program with about two dozen lines of code that works? Not the first time. But after you get rid of a few syntax errors, you check it. How do you make sure it works?

I start with n equals 3, and I put 3. Does it give me a tour? Well, it works. Think about it. For 3, 3 factorial, they're all the same tour. That part wasn't hard. 4, now that's interesting. That one works too. This program, in fact, can work.

Is it going to be a fast program? How long will it take if n equals 10? How many seconds? I'm sorry. What class have I stumbled into? Is this in fact Greek Art 303? How long will this take for n equals 10?

AUDIENCE: [INAUDIBLE].

JON BENTLEY: Pardon me?

AUDIENCE: 1 second.

JON BENTLEY: About a second. Pretty cool. For n equal 20, how long will it take? A lot longer. Technically speaking, it's going to take a boatload longer.

So what I'm going to do here is-- notice that there are n factorial permutations. You do n work at each, for a total of n times n factorial, on this fairly fast laptop from a few years ago. But now they're all about the same. At n equals 8, it took that. At n equals 9, what should be the ratio-- what would you expect to be the ratio between its time at 8 and time at 9?

Well, about a factor of 9, you'd hope. Is 0.5 times 9 about 0.34? Yes, close enough. Here, going down, for 10 it's 4 seconds, and for 11 it's 46 seconds. Yes, it's going up by a factor-- so here I've run all my examples. I ran out to 1 minute of CPU time. After that, I estimate. If this one takes 3/4 of a minute, 12 times that is 12 minutes-- 3/4 of that is 9 minutes. For 13, it's 2 hours.

How long should 14 take, ballpark? A day, ballpark. How long will 15 take if 14 takes a day?

AUDIENCE: Two weeks.

JON BENTLEY: Two weeks. How long will 16 take? Eight months. You get the idea. Are you going to go out to 20 for this one? No. Are you going to go out to 16 with this one? Can you just put this into a thesis right now? No. The problem is it's fast for really small values of n. As it becomes bigger-- how can you make the program faster?
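The back-of-the-envelope extrapolation used here (each extra city multiplies the time by roughly the new n) is easy to mechanize. The sketch below is illustrative only: `estimate_seconds` is a made-up helper, and it takes the 46-second measurement as the time for n equals 11, which is the reading the later estimates suggest.

```c
#include <assert.h>

/* Extrapolate a measured run time to a larger problem size, using the
   rule of thumb from the lecture: going from k-1 cities to k cities
   multiplies the time by about k.
   base_seconds is the time measured at base_n cities. */
double estimate_seconds(double base_seconds, int base_n, int n)
{
    double t = base_seconds;
    assert(base_n >= 1 && n >= base_n);
    for (int k = base_n + 1; k <= n; k++)
        t *= k;
    return t;
}
```

Starting from 46 seconds at n equals 11, this gives 552 seconds (about 9 minutes) at 12, 7176 seconds (about 2 hours) at 13, a bit over a day at 14, about 17 days at 15, and roughly 280 days at 16, the same ballpark as the figures quoted in the lecture.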

If you wanted to make this program faster, what would you do? What are some ideas? Give me some ideas, please. This is performance engineering. You should know this. Ideas for making it faster. Please.

AUDIENCE: You can start with arbitrary nodes. So if you take the tour, you can start anywhere, right?

JON BENTLEY: OK. So you're saying just choose one start and ignore that, ignore all the others. You don't need to take each random start. Fantastic. A factor of n. My friend in the gray T-shirt just got a factor of n. How else can you make it faster? What ideas do you have? Please.

AUDIENCE: You can start by the distance, and then reject things that were [INAUDIBLE].

JON BENTLEY: Be greedy. Follow the pig principle. If it feels good, do it. Do just local optimization. We'll get to that in a long time, but, boy, would that be a powerful technique. Other ideas, please?

AUDIENCE: Parallelize [INAUDIBLE].

JON BENTLEY: Ah. Parallelize. I would write that out, but the first thing I would have to do is remember how many R's and L's there are in various places. So I'll write that much. But we'll have a comment on that at the end. People tried that. Sir?

AUDIENCE: Clock the machine.

[STUDENTS LAUGH]

JON BENTLEY: Unlike you, Charles and I, at one point, attended a real engineering school at Carnegie Mellon, formerly known as CIT, Carnegie Institute of Technology. Charles, do you remember the cheer?

CHARLES LEISERSON: The cheer?

JON BENTLEY: The cheer.

CHARLES LEISERSON: I don't know how to cheer.

JON BENTLEY: 3.14159, tangent, secant, cosine, sine. Square root, cube root, log of e. Water-cooled slipstick, CIT. What's a water-cooled slipstick?

AUDIENCE: [INAUDIBLE].

JON BENTLEY: Pardon me?

AUDIENCE: [INAUDIBLE].

JON BENTLEY: It's a slide rule that you run so fast it has to be water-cooled. So if you can just overclock the machine, just spray it with a garden hose. And as long as it makes it over the finish line, you don't care if it dies when it collapses. So, sure, you can get faster machines. We'll talk about that.

How else can you make this faster? Other ideas? These are all great ideas. We'll try it. Let's see some ideas. Compiler optimizations. I just said gcc and I ran it. What should I have said instead? Instead of just gcc?

AUDIENCE: O3.

JON BENTLEY: O3. How much difference will that make? I used to know the answers to all these. [INAUDIBLE] turn on optimization, 10%. Sometimes, whoopee-freaking-do, 15%. Does turning on O3 still make it a 15% difference? We'll see. You could do that.

Faster hardware. I did this 20 years ago. I had all those numbers there. I'll show you some of those numbers. Modify the C code. We'll talk about all those options, but let's start with compiler optimizations.

With no options there-- how much faster will it be if I turn on optimization? This is a performance engineering class. You should know that thing. Does it matter at all? Is it going to be 15%? Is it going to not matter at all? How much will it matter to turn on optimization?

AUDIENCE: [INAUDIBLE] a lot.

JON BENTLEY: How much is a lot? I know this isn't the real engineering school of CIT, but pretend like this is kind of a semi-- one of the engineering schools. Give me a number for this.

AUDIENCE: More than 15%.

JON BENTLEY: More than 15% Do I hear more than 16%? I was surprised. If I enabled O3, it went from 4 seconds to 12-- I couldn't even time it here. It wasn't enough to time it here. 45 seconds to 1.6 seconds. I can get real times down there. I observed, ballpark here, about a factor of 25. Holy tamale.

On a Raspberry Pi, it was only a factor of 6, and on other machines it was somewhere between the two. Turning on optimization really matters. Enabling that really matters. From now on, I'm only going to show you full optimization. It's cheating not to. But just think about that, a factor of 25.

How else can I make it faster? Two machines. Back in the day, I happened to have some data laying around of running it on a Pentium Pro at 200 megahertz. Nowadays, I had this. How much faster will this machine be 20 years later? Again, pretend like you're at a real engineering school. What will it be? Please.

AUDIENCE: 20 times faster?

JON BENTLEY: 20 times faster? How did you get 20 times faster?

AUDIENCE: Well, the clock speed is 10 times faster.

JON BENTLEY: The clock speed about 10.

AUDIENCE: But I'm guessing that it has much better instructions.

JON BENTLEY: Here's what I found. On this machine, it went from a factor-- there is about a hundred-- these factors, I found, consistently were about, over the 20 years, about a factor of 150. From Moore's law, what would it be if you had 20 years if you doubled every two? That's 10 doublings. What is 2 to the 10th?

AUDIENCE: It's a thousand.

JON BENTLEY: A thousand. So Moore's law predicts a thousand. It's more than a factor of 20. I got a factor of 150 here, which is close to what Moore's law might predict, but there is some slowing down at the end. I'm not at all traumatized by this.

A speed-up of about a factor of 150, where does that come from? My guess is you get about a factor of 12 due to a faster clock speed, and another factor of 12 due to things like wider data paths. You don't have to try to cram everything into a 16-bit funnel. You have 64-bit data paths there. Deeper pipelines, more appropriate instruction sets, and compilers that exploit those instruction sets if O3 is enabled. If O3 is not enabled, sucks to be you. Questions about that? Let's go.

So we have constant factor improvements, external, modern machines, turn on optimization. But a factor of 150 and a factor of 25 is a lot. We were starting off with that. That is a good start. Back in the day, if you change things from doubles to floats, it got way faster. From floats, the answer was faster yet. Does that change make much difference nowadays? No. Exactly the same runtime.

One thing that does make a difference is-- this is the definition of the geometric distance. The distance from i to j is the square root of the sum of the squares of the differences. That's doing an array access, a subtraction, a multiplication, multiplication, two array accesses, subtraction, multiplication, addition, and a square root. That used to take a long time.

If I replace that with a table lookup by filling out this sort of table, the distance for algorithm 2 is just the distance array at i, j. That gave me a speedup factor of 2 and 1/2 or 3. Back in the day, that was a speedup factor of 25.

For you as performance engineers, you have all this intuition. Every piece of intuition you have, that I had, that was really appropriate 10 years ago is irrelevant now. You have to go back and get models to figure out how much each thing costs. But, still, it's another speedup factor of 3 just by replacing this arithmetic with a table lookup.

Algorithm 3. What we're going to do is choose the one we need to start with. So we'll start at city 1. We'll leave 9, if we have a 9-element problem, in that position, and just do a search of n minus 1. It doesn't matter where you start. You're going to go back to it, so you can just choose one to start with.
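A sketch of the fixed-start idea, assuming a distance table `d` that the caller fills in beforehand. The function name `solve_fixed_start` and the counter `tours_checked` are invented for this example. The only change to the search itself is the initial call, `search(n - 1)` instead of `search(n)`.

```c
#include <assert.h>
#include <float.h>

#define MAXN 16

static int n;                    /* number of cities */
static int p[MAXN];              /* current permutation */
static double d[MAXN][MAXN];     /* distance table, filled in by the caller */
static double minsum;            /* best tour length so far */
static long tours_checked;       /* how many complete tours were evaluated */

static void swap(int i, int j)
{
    int t = p[i]; p[i] = p[j]; p[j] = t;
}

/* Sum the n edges of the closed tour in p and keep the minimum. */
static void check(void)
{
    double sum = d[p[0]][p[n - 1]];
    for (int i = 1; i < n; i++)
        sum += d[p[i - 1]][p[i]];
    tours_checked++;
    if (sum < minsum)
        minsum = sum;
}

/* Positions m..n-1 are fixed; try every remaining city in position m-1. */
static void search(int m)
{
    if (m == 1) {
        check();
    } else {
        for (int i = 0; i < m; i++) {
            swap(i, m - 1);
            search(m - 1);
            swap(i, m - 1);
        }
    }
}

/* A tour is a cycle, so one city can be pinned in place. Here the city
   in p[n-1] never moves and only the other n-1 positions are permuted. */
double solve_fixed_start(int ncities)
{
    assert(ncities >= 2 && ncities <= MAXN);
    n = ncities;
    for (int i = 0; i < n; i++)
        p[i] = i;
    minsum = DBL_MAX;
    tours_checked = 0;
    search(n - 1);
    return minsum;
}
```

For 4 cities this evaluates 3 factorial, or 6, tours rather than 24, and finds the same minimum.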

Not a lot of code. Permutations are now that, distance at each. So now you've reduced n times n factorial to n factorial. Algorithm 4 is I'm computing the same sum each time. Is there a way to avoid computing the same darn sum each time? We'll carry that sum along with you.

Instead of recomputing the same thing over and over and over, start off with the sum being 0. The parameters are now m and the distance so far. Then you just add in these remaining pieces at each point, and you solve it that way. And there it's sort of a nice piece of mathematics. I wish I had the time to analyze it.

I did a spreadsheet where I said, what's the ratio of this? And it started off as 3, 3 and 1/2, 3.6, 3.65, 3.7-- 3.718281828. What does that mean if you see a constant 3.718281828? It's 1 plus e. And once I knew what the answer was, even I, in my mathematical frailty, was able to prove that it's 1 plus e times n factorial. I'm not giving you the proof, but it's very cool. You run across these things.
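The constant can be observed directly by counting distance evaluations in a search that carries the partial sum along. The code below is a sketch under assumptions: the exact shape of Algorithm 4 is not shown in the transcript, so `search`, `check`, `solve_carry_sum` and `calls_per_permutation` are illustrative names. The ratio is taken against the (n-1) factorial tours that remain once the start city is fixed.

```c
#include <assert.h>
#include <float.h>

#define MAXN 16

static int n;                    /* number of cities */
static int p[MAXN];              /* current permutation */
static double d[MAXN][MAXN];     /* distance table, filled in by the caller */
static double minsum;            /* best tour length so far */
static long dist_calls;          /* number of distance evaluations */

static double dist(int i, int j)
{
    dist_calls++;
    return d[i][j];
}

static void swap(int i, int j)
{
    int t = p[i]; p[i] = p[j]; p[j] = t;
}

/* sum already covers the path p[0..n-1]; add the closing edge. */
static void check(double sum)
{
    sum += dist(p[0], p[n - 1]);
    if (sum < minsum)
        minsum = sum;
}

/* Positions m..n-1 are fixed and sum is the length of the path through
   them, so each recursive step adds exactly one new edge. */
static void search(int m, double sum)
{
    if (m == 1) {
        check(sum + dist(p[0], p[1]));
    } else {
        for (int i = 0; i < m; i++) {
            swap(i, m - 1);
            search(m - 1, sum + dist(p[m - 1], p[m]));
            swap(i, m - 1);
        }
    }
}

/* Fixed start city in p[n-1], partial sums carried down the recursion. */
double solve_carry_sum(int ncities)
{
    assert(ncities >= 2 && ncities <= MAXN);
    n = ncities;
    for (int i = 0; i < n; i++)
        p[i] = i;
    minsum = DBL_MAX;
    dist_calls = 0;
    search(n - 1, 0.0);
    return minsum;
}

/* Distance evaluations divided by the (n-1)! tours examined. */
double calls_per_permutation(int ncities)
{
    double perms = 1.0;
    solve_carry_sum(ncities);
    for (int k = 2; k <= ncities - 1; k++)
        perms *= k;
    return dist_calls / perms;
}
```

In this sketch the ratio works out to 3 for 3 cities and 3.5 for 4 cities, and it approaches 1 plus e as n grows. The reason is that the count C(m) satisfies C(1) = 2 and C(m) = m(1 + C(m-1)), so C(m)/m! = 2 + 1/1! + 1/2! + ... + 1/(m-1)!.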

So here are the four algorithms so far. On an entirely different semi-fast machine, the runtime-- here the real clock times on this machine were 10, 11, 12, 13. Real times in bold are measured times. These other times are approximate estimates. And you can see now that for size 13, you go from taking a fraction of an hour to taking a third of a minute.

We've made some programs faster. That's pretty cool. We feel good about this. This is what we do. Any questions at all? We got to go faster. How do we go faster?

To say precisely, for all these experiments, I took one data set. And if I say that runtime for size 15, I take the first 15 elements of that data set. For 16, I take the first 16 elements. 17, and so on and so forth. It's not great science. I've done the experiments where I did it on lots of random data. The trends are the same. It smooths out some of the curves, but we'll see this. The times are for initial sequence of one random set. It's pretty robust.

But the problem has factorial growth. It started factorial. It's still factorial. What does that mean? Each factor of n allows us to increase the problem size by 1 in about the same time. Faster machine and all that, we can now push into the teens. What does that mean?

You can take Abraham Lincoln's problem, and they got a tour with this length. The optimal tour looks sort of the same on this side, but it's really different over here. Charles, what figure is that? I've mentioned yesterday that if you work on the traveling salesman, every instance you see turns into a Rorschach test.

CHARLES LEISERSON: The first one is a bunny hopping, and the second one is just the head of the bunny.

JON BENTLEY: The bunny head. Everyone see that? Those are in fact the correct answers. He is a psychologically sound human being. Does anyone else want to give their Rorschach answers? A free diagnosis. Absolutely no charge. I'll completely diagnose you. But the bunny hopping and the bunny head are in fact the correct answers for here. We'll see more later.

So Abraham Lincoln, you've solved his problem now. My friend Mike Shamos could solve his problem. Did he get the optimal tour?

Well, over here he got a big part of it. But over here it's really sort of a different character. It's a fairly different character. Is it far off? Yes, about a third of a percent off. So his approach was within a third of a percent.

I've always worked-- I spent much of my career working on approximate solutions to TSPs. Those are often good enough. This algorithm, you can prove-- that he applied-- is within 50%. In the real world, it got within a third of a percent. Wow. But now we can go out and we can solve the whole problem in 16 hours.

If you were writing the thesis and you happened to do this, would it be worthwhile now to sink 16 hours of CPU time into this? You're going to go away for a weekend and leave your machine running. At the time, Charles, when we had one big computer for 60 or 70 people in that department, could we have dreamt about using 16 hours for that? On the very border. If you made it a really mellow background process, it might finish in a week or three.

All of these things change. The computers get faster. They get more available. You can devote a machine to dump 16 hours down this. But can we make it faster yet? Can we ever analyze, say, all permutations of a deck of cards? How many permutations are there of a deck of cards if you take out those jokers? What's that number?

AUDIENCE: 15 zeros?

JON BENTLEY: 1 with 15 zeros after it? It's a big number, 2 to the-- 52 factorial. I want to teach you how big 52 factorial is. People say, that problem is growing exponentially. What does that mean? It's quick is what people usually mean by it.

In mathematics, it's some constant to the n for some defined time period n. Factorial growth-- is factorial growth exponential growth? Why not? Why isn't a factorial exponential?

AUDIENCE: It's more than exponential?

JON BENTLEY: It's more than exponential. It's super exponential. We'll talk about the details here. By Stirling's approximation, you have seen in other classes that log of n factorial is n log n minus n plus O of log n for the natural log. The log base 2 of n factorial is about n log n minus 1.386n. Where have you seen this number before? n log n minus 1--

In an algorithms class, you did a lower bound on a decision tree model of sorting. There were n factorial leaves to sort. A sort algorithm must take at least as much time. So that gives you that bound. And merge sort is n log n minus n, so you're really narrow.

Where else have you seen 1.386n? That's the runtime of quick sort. All these things are coming back together here, because it's the natural log of e-- I'm sorry-- the log base 2 of e. So n factorial is not 2 to the n. It's 2 to the n log n. It's about n to the n. It's faster than any exponential function.

How big is 52 factorial? You guessed 10 to the 15th? Was that--

AUDIENCE: Yes.

JON BENTLEY: OK. If we see here, it's going to be something like 2 to the n log n. n is 52. Log of 52 is about six. So that's 2 to the 300. But there's a minus n term. Maybe 2 to the 250. It's about 2 to the 225, which is 10 to the 67th. That's a big number. How big is it? Let me put it in everyday terms.

Try this after class. Set a timer to count down 52 factorial nanoseconds, 10 to the 67th. Stand on the equator-- watch out where you are-- and take one step forward every million years. Don't rush into this. I don't want you to get all hyper about this.

Eventually, when you circle the Earth once, take a drop of water from the Pacific Ocean, and keep on going. Be careful about this. But this is an experiment. You're nerds. It's OK.

When the Pacific Ocean is empty, at that point lay a sheet of paper down, refill the ocean, and carry on. Now keep on doing that. When your stack of paper reaches the Moon, check the timer. You're almost done. This is how big 52 factorial is. The age of the universe so far is about 10 to the 26th nanoseconds. 52 factorial is a long time.
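The size estimates above can be checked with a few lines of arithmetic. This is a minimal sketch in Python (the talk itself shows no code for this step); the function name `factorial_magnitude` and the figure of 13.8 billion years for the age of the universe are assumptions introduced here for illustration.

```python
import math

def factorial_magnitude(n):
    """Return (log2(n!), log10(n!)) computed from the exact integer n!."""
    f = math.factorial(n)
    return math.log2(f), math.log10(f)

log2_52, log10_52 = factorial_magnitude(52)

# The leading "n log n" term alone, before subtracting the linear term.
leading_term = 52 * math.log2(52)

# Assumed age of the universe: about 13.8 billion years, in nanoseconds.
age_of_universe_ns = 13.8e9 * 365.25 * 24 * 3600 * 1e9

print(f"52 * log2(52)       = {leading_term:.1f}")
print(f"log2(52!)           = {log2_52:.1f}")
print(f"log10(52!)          = {log10_52:.1f}")
print(f"log10(universe, ns) = {math.log10(age_of_universe_ns):.1f}")
```

The exact value sits between 2 to the 225 and 2 to the 226, and between 10 to the 67th and 10 to the 68th, which matches the ballpark given in the talk.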

Can we ever solve a problem if we look at all 52 factorial options? What do we have to do instead?

AUDIENCE: Quantum computing?

JON BENTLEY: Pardon me?

AUDIENCE: Quantum computing.

JON BENTLEY: Quantum computing. OK. That's great. And I have a really cool bridge across this river out here that I'll sell you after class. Let's talk about that. Is there a nice quantum approach to this problem? Maybe. Maybe you could actually phrase this as an optimization problem where you could maybe get some mileage out of that. But we'll see.

So one approach is quantum computing. What's another approach? What are we going to have to do to make our program surmount this obstacle? Please.

AUDIENCE: Limit the search space?

JON BENTLEY: Pardon me?

AUDIENCE: [INAUDIBLE].

JON BENTLEY: We're going to have to limit our search space. We're going to have to prune the search space. That's the idea. Let's try it. Here's a cool problem. I was at a ceremony a few weeks ago. A friend of mine said here's this cool problem that his daughter just brought home from high school. How do you solve it?

Find all permutations of the 10 integer-- the nine integers 1 through 9 such that each initial substring of length m is divisible by m. So the whole darn thing is divisible by 9. Is any permutation of integers 1 through 9 divisible by 9? Well, they all sum up to numbers divisible by 9. You work that. Is it divisible-- are the first eight characters divisible by 8?

But let's start with an easy one. If you were doing it for size 3, 321 works. Is 321 divisible by 3? Is 32 divisible by 2? Is 3 divisible by 1? Thinking, then, it works. Is 132 divisible by 3? Yes. Is 13 divisible by 2? [MAKES BUZZER SOUND] That doesn't work.

So we're going to try to solve this problem. My friend Greg Conti, a really great computer security guy, gave me this problem. How do you solve it? How would you solve this problem? If this high school kid says, here's a problem I brought home from school, how do I solve it? What would you do? Ideas? I'm sorry. Please.

AUDIENCE: Yes. You could write a program where the state could be [INAUDIBLE]. Or actually just like a subset [INAUDIBLE]. Then you iterate over [INAUDIBLE].

JON BENTLEY: Great. So there are two main approaches. One is write a program. So you can either think or you can compute. Who in this room enjoys writing programs? Who enjoys thinking? Oh, that's an easy call. What's the right approach here?

Well, the right answer is you think for a while. If you solve it in the first three minutes, don't write a program. If you spend much more than five minutes on it, let's write a program and see what we learn from the program. We'll go back and forth. Never think when you should compute, never compute when you should think. How do you know which one to do? Try each. See which one gets you further faster.

If you write a program for this, what are the basic structures you have to deal with? You have to deal with nine-digit strings that are also nine-digit numbers. What's a good language for dealing with that? What would you-- if you had to write a program to do this, what language would you choose? We'll see.

How do you generate all n factorial permutations of the string? Well, I hope you can see this. Here's the way that I chose to do it. I chose to have a recursive procedure search. And I'm going to have right be the part that's already fixed, left be the part that you're going to vary. I could've done it the other way, but I'll choose to do it this way.

I start with left equals that, right equals that. I end when the left is empty. So I have to recur down, just like we've been doing so far, but I'm going to do that with strings instead. And if I get to the call search of 56-- of 356 with 421978-- these are all fixed-- I'll take each one of these in turn, 3, 5, and 6, put it into here. So I'll call search of 56 with that, search of 36 with that, search of 35 with that. Everyone see how that works?

How long will the code be in your favorite language? Here's the code in my favorite language. Has anyone here ever used the AWK programming language, written by Aho, Weinberger, and Kernighan? They observed that naming a language after the initials of the authors shows a certain paucity of imagination. But it works.

So a function search of left, right, that, if left equals 0-- is null, I'll check it. Otherwise, what will I do here? The details don't matter. For i equals 1 up to the length of the left-hand side of the string, search the substring at the left starting at 1, going for i minus 1, concatenated with the substring at the left starting at i plus 1. And then take the substring in the middle, put it out in front of the right. Do that for all i values. Any questions about that? The details don't matter. It's not a big program.

If I do this, and at the end, for i equal 1 to length, if the substring of the right mod i is nonzero, then return. If it's not that, you print it out. If I run this program, how long, ballpark, will this program take for 9 factorial, ballpark? What was your answer before?

AUDIENCE: A second.

JON BENTLEY: A second. Great. Well, we'll recycle that. Reduce, reuse, recycle. We'll recycle your answers. If I call it originally with that string, it takes about 3 seconds. And it found that there was-- it searched all 9 factorial, 362880, 362,000 strings, and found only one string there that matches that. Whoops.

Are these divisible by 9? Well, they sum to a multiple of 9, sure. Is the string that ends in 72 divisible by 8? Yes, that works. 7, I'm not going to bother with. All the way down, is 38 divisible by 2? Is 381 divisible by 3? This one works. That's a pretty cool problem for a high school afternoon.
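The recursive search described above can be sketched as follows. The talk's version is written in AWK and is not reproduced in this transcript, so this Python rendering is an assumption about its shape: `left` holds the digits still to be placed, `right` holds the part already fixed, and the names `search` and `solve` are chosen here for illustration.

```python
def search(left, right, found):
    """Move each character of `left` in turn to the front of `right`."""
    if not left:
        # `right` is now a complete permutation: test every initial substring.
        if all(int(right[:m]) % m == 0 for m in range(1, len(right) + 1)):
            found.append(right)
        return
    for i in range(len(left)):
        # Drop character i from `left` and put it in front of `right`.
        search(left[:i] + left[i + 1:], left[i] + right, found)

def solve(digits="123456789"):
    """Return every permutation of `digits` whose length-m prefix is divisible by m."""
    found = []
    search(digits, "", found)
    return found

if __name__ == "__main__":
    print(solve())
```

For the nine digits this visits all 362,880 permutations and reports the single string 381654729, which is consistent with the checks in the talk: 38 is divisible by 2, 381 by 3, and the eight-digit prefix ends in 72.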

Is 3 seconds fast enough? Yes. The trade-off of thinking and programming. Write the darn program. You're done. It's cool. If you wanted to make it faster, how could you make it faster? That's what this course is all about. Always think about how you could make things faster. Please.

AUDIENCE: Well, if you just stop searching once you know one number isn't going to work.

JON BENTLEY: How early can you stop searching? That's great. So you could get constant factor speedups. Like don't check for divisibility by 1 at the end. You can change language, all that. But those are never going to matter.

The big win is going to come from pruning the search. How can you prune the search? Any winning string must have some properties of this string. What are some properties that that string has that you can check for early? Please.

AUDIENCE: The second from the left [INAUDIBLE] 2, 4, 6 or 8.

JON BENTLEY: The eighth position has to be a multiple of 2. Furthermore, if you really think about it, you can get more than that. It has to be divisible by 4. So an even number has to be in the eighth position. Anywhere else you're going to need an even number?

AUDIENCE: [INAUDIBLE].

JON BENTLEY: This position has to be even, that has to be even, that has to be even, that has to be even. In general, what's the general rule?

AUDIENCE: All the even positions [INAUDIBLE].

JON BENTLEY: Every even position has to contain an even number. There are four even numbers, there are five odd numbers. What other rule might you come up with?

AUDIENCE: The fifth position has to be 5.

JON BENTLEY: OK. Every odd position has to be an odd number. And, in particular, the fifth position has to be a 5. So those are a few rules. Even digits in even positions, odd digits in odd positions, digit 5 in position 5. Three simple rules. You can test those easily. The code is pretty straightforward.

Will that shrink the size of the search space much at all really? How big was the search space before? 9 for the first one. Now how big is the search space? For the first, if you just had the three rules-- evens going evens, odds in odds, and 5 in the middle-- how many choices do you have for the first one?

AUDIENCE: For the first, we have [INAUDIBLE].

JON BENTLEY: Four choices. For the second one, you have?

AUDIENCE: [INAUDIBLE].

JON BENTLEY: It can't be a 5. It has to be an odd number, not a 5. You have a 4. So it's 4 by 4 times 3 times 3 times 1 times 2 times 2 times 1 times-- everyone see that? We've reduced the size of the search space from a third of a million to half a thousand.

Isn't it going to be a lot of hassle to code that? I mean, is it going to take a major software development effort to code that? Well, yes, if you define that as a major software development effort.

If the parity of the string length is equal to the parity of the digit, then you can continue. If you don't have these things, you can't continue. Three lines of code allow you to do this. That's the story. Factorial grows quickly. You can never visit the entire search space. The key to speed is pruning the search. We're doing just a baby branch-and-bound, it's called.
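A sketch of the pruned search, under the same assumptions as before (Python instead of the talk's AWK, digits 1 through n, `left` as the part still to vary). Because `right` is built from the back, the digit being moved lands in position `len(left)`, so the parity test compares the digit with the current length of `left`. The `stats` dictionary is a hypothetical addition used only to count how many complete strings get checked.

```python
def pruned_search(left, right, stats):
    """Recursive permutation search that skips digits breaking the three rules."""
    if not left:
        stats["leaves"] += 1
        if all(int(right[:m]) % m == 0 for m in range(1, len(right) + 1)):
            stats["found"].append(right)
        return
    pos = len(left)  # 1-based position the next digit will occupy
    for i, ch in enumerate(left):
        digit = int(ch)
        if digit % 2 != pos % 2:
            continue  # even digits in even positions, odd digits in odd positions
        if (pos == 5) != (digit == 5):
            continue  # digit 5 goes in position 5 and nowhere else
        pruned_search(left[:i] + left[i + 1:], ch + right, stats)

def solve_pruned(digits="123456789"):
    """Run the pruned search; return the solutions and the number of leaves checked."""
    stats = {"leaves": 0, "found": []}
    pruned_search(digits, "", stats)
    return stats["found"], stats["leaves"]
```

Calling `solve_pruned()` checks 576 complete strings, the product 4 x 4 x 3 x 3 x 1 x 2 x 2 x 1 x 1 worked out above, instead of 362,880.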

Some fancy algorithms can be implemented in little code. That's our break. We've learned a couple of things. We're going to go back into the fray. Any questions about this diversion before we go back to the TSP? These are important lessons. We'll try to apply them now.

I got great advice yesterday from people about how to do this. And I seem to have skipped-- OK, here it is. I've got it. How do we prune our search? Here we had these conditions. How can we prune the search? How can I make the program faster? What's the way I can stop doing the search?

Simplest way, don't keep doing what doesn't work. If the sum that you have so far is greater than the minimum sum, by adding more to it, are you going to make it less? What can you do? You can stop the search right there.

Is the resulting algorithm going to be faster? Maybe. It's a trade-off. I'm doing more work, which takes some time, but I might be able to prune the search space. The question is, is this benefit worth this cost? What do you think?

Well, on the same machine, algorithm 4 at size 12 took 0.6 seconds. Now it's a factor of 60 faster, a factor of 40 faster, a factor of 100 faster. Just by-- if it doesn't work, if you've already screwed up, just don't keep what doesn't work. That makes the thing a whole lot faster. Everyone see that? That's the first big win.
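The "don't keep doing what doesn't work" test can be sketched like this. The talk's program is not shown in the transcript, so the structure below (a fixed starting city 0, a set of remaining cities, a one-element list holding the best length so far) is an assumption made for illustration.

```python
import math

def tsp_pruned(points):
    """Exhaustive TSP search that abandons a partial path once it is too long."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    best = [math.inf]  # length of the shortest complete tour found so far

    def search(last, remaining, total):
        if total > best[0]:
            return  # adding more edges can only make the sum larger
        if not remaining:
            # Close the tour by returning to the fixed starting city.
            best[0] = min(best[0], total + dist[last][0])
            return
        for city in remaining:
            search(city, remaining - {city}, total + dist[last][city])

    search(0, frozenset(range(1, n)), 0.0)
    return best[0]
```

For example, `tsp_pruned([(0, 0), (1, 1), (0, 1), (1, 0)])` finds the perimeter of the unit square.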

Can we do even better than that? Is there any way of stopping the search with more information other than, whoops, I've already gone too far? Please.

AUDIENCE: If the nodes you visited previously--

JON BENTLEY: Wait. Command voice. Speak loudly.

AUDIENCE: If the nodes you visited previously are the same, like the same subset but a different order than a search you've done before, then the answer [INAUDIBLE].

JON BENTLEY: That's a really powerful idea that Held and Karp used to reduce it from n factorial time to n squared 2 to the n time. We'll get to that. That's really powerful, but now we're looking for something not quite that sophisticated. But that's a great idea.
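For reference, the Held-Karp idea can be sketched as a dynamic program over (subset, last city) pairs. This is a textbook-style rendering, not code from the talk; the function name `held_karp` and the dictionary-based table are choices made here.

```python
import math

def held_karp(dist):
    """Shortest tour length by dynamic programming over subsets, O(n^2 * 2^n)."""
    n = len(dist)
    if n == 1:
        return 0.0
    # best[(mask, j)]: shortest path that starts at city 0, visits exactly the
    # cities whose bits are set in mask, and ends at city j (j is in mask).
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for mask in range(1, 1 << n):
        if mask & 1:
            continue  # city 0 is the fixed start and never appears in mask
        for j in range(1, n):
            if not (mask >> j) & 1:
                continue
            prev = mask & ~(1 << j)
            if prev == 0:
                continue  # single-city paths were filled in above
            best[(mask, j)] = min(
                best[(prev, k)] + dist[k][j]
                for k in range(1, n) if (prev >> k) & 1
            )
    full = (1 << n) - 2  # every city except city 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))
```

The order in which a subset was visited no longer matters: only the subset and the endpoint are stored, which is what the audience suggestion amounts to.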

Can I somehow prune the search if a sum plus a lower bound on the remaining cities is greater than the minimum sum? What kind of lower bound could I get? Well, I could compute a TSP path through them. That's really powerful. That will give me a really good bound, but it's really expensive to compute.

So I could-- if this is a city I've done so far, I could compute a TSP path to the rest, which might in this case look like this, and hook it up. That's going to be a really powerful heuristic, but it's going to be really expensive to compute. On the other hand, I could take just the distance between two random points. I'm going to choose this point and this point. I happened to get the diameter of the set.

And that's a lower bound. It's going to be at least that long. And it's really cheap to compute, but is it very effective? Nyah. So the first choice is effective but too expensive. The second point is really inexpensive but not very effective.

I could also compute the nearest neighbor of each city. From this city, if I just compute its nearest neighbor among here, so it's that. This one is that. That one has its own nearest neighbor. I could compute these distances. And that's pretty inexpensive to compute, and it's a pretty good lower bound. That would work.

Who here knows what a minimum spanning tree is? Good. What I'll do here is I'll take here a minimum spanning tree. On n cities, a tree is n minus 1 edges. This tree is n minus 1 edges. This is a spanning tree because it touches-- it connects all cities. And, furthermore, it's a minimum spanning tree, because, of all spanning trees, this one has the minimum total distance.

Now, the tour is going to be less-- or greater in distance than the minimum spanning tree. Why is that? If I get a tour of this, I can just knock off the longest edge. And that now becomes a spanning tree. So the minimum spanning tree is a pretty good bound, a lower bound. It's cheap to compute.

Who here has ever seen an algorithm for computing minimum spanning trees? Good, good. Some of you are awake in some of their classes. What are the odds of that? I mean, what an amazing coincidence.

So what we'll do is say now that a better lower bound is to add the minimum spanning tree of the remaining points. So I change this program to if sum plus the MST distance. And now I'm going to do a trick. I'm going to use word parallelism. I'm going to have the representation of the subset of the cities as a mask, a bit mask in which if the appropriate city is on, the bit is turned on. Otherwise, it's turned off.

And I just OR bits into it, and say if I compute the minimum spanning tree of this set, I can cut the search and return. And then I just compute the MST and bring this along with me, turning things off and on in the bit mask as I go down. Pretty straightforward. How much code will it cost to compute a minimum spanning tree? Ballpark? Yes.

AUDIENCE: 30 or 20 lines of code.

JON BENTLEY: About that many lines of code. This is the Prim-Dijkstra method. It takes quadratic time. For computing an MST of n points, it takes n squared time. It's quite simple. You can do it in E log log V time. But this is a simple code. It's pretty straightforward.
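A sketch of the quadratic Prim-style MST over a bit-mask subset, together with a search that uses it as a lower bound. The helper names (`distance_matrix`, `mst_length`, `tsp_mst`) are hypothetical, and one detail is an assumption rather than something stated in the talk: the bound here is taken over the unvisited cities plus the current city, since the rest of the tour must connect all of them.

```python
import math

def distance_matrix(points):
    """Precompute all pairwise Euclidean distances."""
    return [[math.dist(p, q) for q in points] for p in points]

def mst_length(dist, mask):
    """Total edge length of an MST over the cities whose bits are set in mask."""
    cities = [i for i in range(len(dist)) if (mask >> i) & 1]
    if len(cities) < 2:
        return 0.0
    # best[c]: cheapest edge linking city c to the tree grown so far.
    best = {c: dist[cities[0]][c] for c in cities[1:]}
    total = 0.0
    while best:
        c = min(best, key=best.get)  # closest city not yet in the tree
        total += best.pop(c)
        for other in best:
            if dist[c][other] < best[other]:
                best[other] = dist[c][other]
    return total

def tsp_mst(points):
    """Exhaustive search pruned by partial sum plus an MST lower bound."""
    n = len(points)
    dist = distance_matrix(points)
    best = [math.inf]

    def search(last, mask, total):
        # mask has a bit set for every city not yet visited
        if mask == 0:
            best[0] = min(best[0], total + dist[last][0])
            return
        if total + mst_length(dist, mask | (1 << last)) > best[0]:
            return  # even the cheapest way to connect the rest is too long
        for city in range(1, n):
            if (mask >> city) & 1:
                search(city, mask & ~(1 << city), total + dist[last][city])

    search(0, (1 << n) - 2, 0.0)
    return best[0]
```

Each call to `mst_length` costs time quadratic in the number of remaining cities, which is the cost the following discussion weighs against the extra pruning.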

Will this make the program run slower or faster? What would the argument be that it might run slower? Holy moly. At every node I'm computing an MST. That takes a long time and I will run slower. What's the argument to be that it might run faster? Yes, but I'm getting a much more powerful pruning. Is it worth it?

I should point out that I'm only showing the wins here to you. When I redid this myself, I went down a few wrong paths. I wish I would have documented them better now. But I might go back and see if I can find them. That would be a good thing.

But here it is. It used to take 17 seconds. Now it takes-- or 4 seconds. Now it takes 0. You like algorithms to take 0 seconds. You'd like to live in the rounding error. 4.40 to 0.2. Down here, this program is not only faster, it's a boatload faster.

And so now we can go out in this. And notice here that as you go out, the times usually get bigger, but they are bumpy, from 2.4 seconds to 0.7 seconds, to 1.8 seconds. It's because you're doing that one thing. It's just the matter of the geometry. The times that were originally really smooth now turn bumpy. I've done experiments where I do 10 different data sets, randomly drawing each one, and it's a nice smooth line. But I missed doing it here to be easy.

Before, we could go out to size 17. Now we can go out to size 30. Wow. How cool is that? That's pretty powerful. Can I make this-- please.

AUDIENCE: So is it possible that the [INAUDIBLE] is chosen in such a way that this thing doesn't actually prune any bad permutations?

JON BENTLEY: That's absolutely true. And I've tried this both on random point sets. I've tried it on distance matrices. I've tried on points where they're randomly distributed around the perimeter of a circle. And so this could be a lot of time. Almost always, it's pretty effective. Again, if I had more time, I'd talk about it. But in fact we're going to go until 3:45, Charles?

CHARLES LEISERSON: 3:55.

JON BENTLEY: 3:55? When the big hand is on the 11? Oh. Sucks to be you.

[STUDENTS LAUGH]

I profiled this bad boy, and it shows that most of the time is in building minimum spanning trees. Your fear that it might take a long time, it might make it slower, has a basis. That's where all the time is going. How can I reduce the time spent in building minimum spanning trees? As I search this-- please.

AUDIENCE: Maybe don't do it every time?

JON BENTLEY: I could do some incremental minimum spanning trees because they change a lot. And so there are several responses. One is whenever you're building something over again, rather than building it from scratch, see if you can do an incremental algorithm, where you just change one bit of the minimum spanning tree. If I just add one edge into the graph, always try an incremental algorithm. That's cool. That's one sophisticated approach.

What is one-- what was another pretty idiot simpler approach? Whenever you compute something over and over again, what can you do to reduce the time spent computing it?

AUDIENCE: Store it?

JON BENTLEY: Store it. Do I ever compute the same MST over and over again? I don't know. I think maybe it's worth a try. So what I'll do is return of caching. Store rather than recompute. Cache MST distances rather than computing them.

The code looks like this. The new mask is that. If the MST distance array is less than 0, initialize everything to 0. Here I'm just going to store them all in a table of size 2 to the n. I can do direct indexing. If it's less than 0, compute it, fill in the value. If sum plus that, return. Not much code.
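The lazily filled table can be sketched as below. The code on the slide is not in the transcript, so this is an assumption about its shape: a table with one slot per bit mask, a negative value meaning "not computed yet", and a hypothetical `compute` callback standing in for the MST routine.

```python
def make_lazy_table(compute, n):
    """Return a lookup function backed by a directly indexed table of size 2**n."""
    table = [-1.0] * (1 << n)  # a negative entry marks a value not yet computed

    def lookup(mask):
        if table[mask] < 0:
            table[mask] = compute(mask)  # fill the slot only on first use
        return table[mask]

    return lookup
```

In the search, `lookup(mask)` would replace the direct call that computes the MST length, so each distinct subset is computed at most once. The table itself still needs 2 to the n slots, which is the cost the next paragraph addresses.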

But do you really want to store-- to blast it out and to use a lazy-- I'm using lazy evaluation of this table here. Only when I need it do I fill in a value. That's not effective. Rather than storing all 2 to the n tables, what can I do instead? What's our favorite data structure for storing stuff?

Hash table. A cache via hash. So the key to happiness. You can write that down too. Store them in a hash table. If sum plus MST distance lookup-- oh, but I have to implement a hash table now.

How much code is that going to be? Ballpark? What does it cost to build a hash table? Roughly. Come on. Yes. About that many lines.

So just go down the hash table. If you find it, return it. Otherwise, make a new node, compute the distance, put it in there, fill in the values, and you're done. Is it going to be faster? Oh, we'll see. Who reads xkcd on a regular basis? The rest of you are bad, bad, bad people, and you should feel very guilty until you go to xkcd.com and start reading this on a regular basis.

I mean, like wow. This is two deep psychological insights in one lecture for no additional fee. Sir.

CHARLES LEISERSON: Were you resolving collisions by chaining them?

JON BENTLEY: Right, by chaining, yes.

CHARLES LEISERSON: Why bother? Why not just store the place value and keep a key to make sure that it's the value associated with the one that you want?

JON BENTLEY: That is a great question, and the answer is left as an exercise for the listener. We've got about 20 minutes, Charles.

CHARLES LEISERSON: Code, less code.

JON BENTLEY: It would be, yes. And it's well worth a try. All of these things are empirical questions. One thing that's really important to learn as a performance engineer is that your intuition is almost always wrong. Always try to experiment to see.
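Both cache designs from this exchange can be sketched side by side: the chained table described in the talk, and the simpler one-entry-per-slot variant raised in the question. The class names, the modulus hash, and the default table size are all assumptions made for illustration; the talk's own code is not in the transcript.

```python
class ChainedCache:
    """Hash table keyed by bit mask, resolving collisions by chaining."""

    def __init__(self, compute, size=1009):
        self.compute = compute
        self.buckets = [[] for _ in range(size)]

    def lookup(self, mask):
        chain = self.buckets[mask % len(self.buckets)]
        for key, value in chain:
            if key == mask:
                return value  # found it: no recomputation
        value = self.compute(mask)
        chain.append((mask, value))  # new node on this bucket's chain
        return value


class DirectMappedCache:
    """One entry per slot; a colliding key simply overwrites the old one."""

    def __init__(self, compute, size=1009):
        self.compute = compute
        self.keys = [None] * size
        self.values = [None] * size

    def lookup(self, mask):
        slot = mask % len(self.keys)
        if self.keys[slot] != mask:
            # Miss or collision: recompute and replace whatever was stored.
            self.keys[slot] = mask
            self.values[slot] = self.compute(mask)
        return self.values[slot]
```

The second version is less code and allocates nothing after construction, at the price of recomputing values that get evicted. Which one runs faster is, as noted above, an empirical question.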

It's a great question. When I get home, I'll actually-- when I leave here, I'm going to go up to try to climb Mount Monadnock. Who here has ever climbed Mount Monadnock? Yes. I finished climbing all 115 4,000-foot peaks in the Northeastern US last year. I've never climbed Monadnock. I'm really eager to give it a try tomorrow.

xkcd. Brute force n factorial. The Held-Karp dynamic programming algorithm uses the grown-up version of dynamic programming for n squared 2 to the n, but even better. Algorithm 6 looks like that if I cache the MSTs. Does it have to be faster? No. Is it faster? Oh, by about a factor of 15 there. By about a factor of 25 there, 26 there.

You can go out now much further, 6 and 8. So we've done that. Is there any other way to make this program faster? We've pruned the search like crazy. Any other way to make it faster? Please.

AUDIENCE: [INAUDIBLE].

JON BENTLEY: I forget what happens at 39. Let's see. At 39, it went over a minute. And, like I said, this thing goes up and down. I guess it just hit some weird bumps in the search space.

That's something else. The first algorithm is completely predictable. The other algorithms, you have to get more and more into analysis. And now the times go up and down. There is a trend. And, basically, I'm taking an exponent and I'm lowering-- I turned it from super exponential to exponential, and then I'm beating down on the exponent right now.

Can you make this run faster? What we're going to do is take this idea of a greedy search. I can have smarter searching. Better than a random order, I'm going to do a better starting tour. And what I'm going to do is always at each point sort the points to look at the nearest one to the current one first. Start with a random one. Then for the next one, always look at the nearest point first, then the second nearest, the third nearest, et cetera.

So I'll go in that order. That should make the search smarter, and that should guide me rather quickly to the initial starting tour. Rather than just a random tour, I'll have a good one that will give me a better prune of the search space. Will that make a difference? We'll have to include a sort. I'll get two birds with one modification.

By a really dumb insertion sort, which takes up that many lines of code, I'll visit the nearest city first, then others in order. If I do that, here it's a factor of 2, there it's a factor of 8, a factor of 4. But it seems to work. It gives you some-- as you go out especially. I can now go out further. I lied. I didn't stop my search at 60 seconds there. But I can now go up further, and it seems to be a lot faster.
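The nearest-first ordering can be sketched as follows. To keep the example short it is combined only with the simple partial-sum cutoff, not the MST bound or the cache, and it uses Python's built-in `sorted` where the talk uses an insertion sort. The node counter is a hypothetical addition for experimenting with how much the ordering helps.

```python
import math

def tsp_nearest_first(points):
    """Pruned exhaustive search that tries the nearest unvisited city first."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    best = [math.inf]
    nodes = [0]  # number of search nodes visited, for experiments

    def search(last, remaining, total):
        nodes[0] += 1
        if total > best[0]:
            return
        if not remaining:
            best[0] = min(best[0], total + dist[last][0])
            return
        # Visit candidates in order of increasing distance from the current
        # city, so the first complete tour found is a greedy one.
        for city in sorted(remaining, key=lambda c: dist[last][c]):
            search(city, remaining - {city}, total + dist[last][city])

    search(0, frozenset(range(1, n)), 0.0)
    return best[0], nodes[0]
```

The first tour reached is the nearest-neighbor tour, which usually gives a much tighter initial bound than an arbitrary order. How much that shrinks the node count depends on the instance.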

So in 1997, 20 years ago, I was really happy to get out to 30. The question now is, in 20 more years, how much bigger can I go? If I just depend on Moore's law alone, in 20 years a factor of a thousand. At 30, 30 times 31 times 32, that's a factor I can go up by Moore's law. With a [INAUDIBLE] algorithm, it would give me two more cities at this size in 20 years.
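The "two more cities" figure can be reproduced with a little arithmetic, assuming the running time grows like n factorial (the transcript is inaudible on which algorithm is meant, so that cost model is an assumption). Going from n to n + 1 cities multiplies the work by n + 1, so a speedup of 1,000 at n = 30 is used up after 31 x 32 = 992.

```python
def extra_cities(n, speedup):
    """How many cities beyond n a given speedup buys if time grows like n!."""
    extra = 0
    factor = 1  # growth in work from n cities to n + extra cities
    while factor * (n + extra + 1) <= speedup:
        extra += 1
        factor *= n + extra
    return extra

print(extra_cities(30, 1000))
```

With `n = 30` and a speedup of 1000 the loop stops after two steps, since a third city would need a further factor of 33.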

Can I get from 30 on to anything interesting by combining Moore's law, and compiler technology, and all the algorithms? How far can I go? Well, I was going to give a talk at Lehigh. So I could go out-- in under a minute, I could go to the 45-city tour. Charles answered this yesterday, so he is completely clear.

Rorschach test. Who's willing to go out-- what do you see there?

AUDIENCE: A puppy.

JON BENTLEY: Dancing doggy. That was my answer, dancing doggy. I like that a lot. That's the obvious answer. But Charles-- and this shows a profoundly profound mind. Professor Leiserson, what is this?

CHARLES LEISERSON: This is a dog doing his business [INAUDIBLE].

JON BENTLEY: OK. So any Freudians, you feel free to go to town on that one. 45-city tour, it's pretty cool. Dancing doggy. How far can it go? I got out to 45 in under a minute.

46-- I broke my rule of this-- I went over the minute boundary. This was my Thanksgiving 2016 cycle test. I was just going hog wild. I was willing to spend the-- I had to give a-- I was doing this Wednesday night. I had to give a lecture on Monday. A hundred hours of CPU time. How far can I go?

47. Yes. Yikes, factor of 5. When do I think? When do I run? Should I go back and [? work on it. ?] 52-- wouldn't it be sweet to be able to go out to 52 factorial? Wouldn't that be cool? 48-- that's not bad. That's looking pretty good there, actually.

Oh, ouch, ouch. That's going to take a-- so that's about 2 hours right there. But 50, whoo, edge of my seat. The turkey was smelling good, but 51. And can I get to 52? Will it make it? Will I have to go back to my-- whew. 3 hours and 7 minutes.

By combining all of these things, we're able to go out to something that is just obscene. 52 is obscenely huge. We're able to get out there by a combination of all of these things, of some really simple performance engineering techniques. If you're going to work on a real TSP, read the literature, study that. I hope we can come across some things that I've written about approximation algorithms.

But if you really need them, forget the approximation algorithms because they're too short. There's a huge literature. I haven't told you any of that. Everything that I've done here are things that you, as a person who has completed this class, should be able to do. All these things are well within your scope of practice, as we say. You will not be sued for malpractice.

How much code is the final thing? About that much. You build an MST. You had a hash table. Charles points out you could nuke three or four of those lines. You have the sort here. Altogether about 160 lines.

Where have we been? When we started, we could get out to 11. Store the distances. Out to 12. Fix the starting city. That was a big one. Accumulate distance along the way. These were all good. But then by pruning the search, we started making the big things. Add the MST, store the distances in a hash table, visit the cities in a greedy algorithm. Each one of these gave us more and more and more power as we went out there, till we're finally able to go out pretty far.

There are a lot of things you can do. Parallelism, faster machines, more code tuning, better hashing. That malloc is just begging to be removed. Better pruning, a better starting tour, better bounds. I can take the MST length plus the nearest-- that's why I do this MST-- plus the nearest neighbor to each of the ends. I can get that. Would that make a big difference? Empirical question. Easy to find out.

Can I move my pruning tests earlier? Better sorting. This is really cool. Can I maybe just sort once for each city to get that sorted list, then go through that, precompute and sort, and select the things in order? Is that going to be a win in this context? The main ideas here are caching, precomputing, storing this, avoiding the work. Can I change that n squared algorithm to just a linear time selection? All of these things are really fun to look at.

I've tried to tell you about incremental software development. I started off with around 30, 40 lines of code. It grew to 160. But altogether all the versions come to about 600 lines of code. You've now seen more than you need for one life about recursive generation. It's a surprisingly powerful technique if you ever need to use it. No excuses now. You're obligated to build it immediately.

Storing precomputed results, partial sums, early cut-offs. Algorithms and data structures. These are things that sounded fancy in your algorithms class, but you just pull them out when you need them. Vectors, strings, arrays and bit vectors, minimum spanning trees, hash tables, insertion sort. It's easy. It's a dozen lines of code here, two dozen lines of code there.

I believe that Charles may have mentioned earlier that I wrote a book in 1982 about code tuning. At the time, you did these in the smaller programs. Now compilers do all that for you. But these ideas-- some of these ideas still apply. Store precomputed results. Rather than [INAUDIBLE] elimination in an expression, you now put interpoint distances in a matrix or a table of MST lengths.

Lazy evaluation. You compute the n squared distances eagerly but only the MSTs that you need. Don't bother computing them all. That's essentially what Held and Karp does. Short-circuiting monotone functions, reordering tests, word parallelism. These are the things that you as performance engineers can do quite readily.

I had a lot of tools behind the scenes. I wish I could come back and give you another hour about how I really did this with the analysis and the tools that I used. I had a driver to make the experiments easy, a whole bunch of profilers. Where is the time really going here? What should I focus on? Cost models that allowed me to estimate those, how much does an MST cost. A spreadsheet was my lab notebook for graphs of performance, all sorts of curve fitting.

But these are the main things I wanted to tell you about. The big hand is getting about nine minutes away from the 11. Professor Leiserson, is there anything else that these fine, young semi-humanoids need to know about this material?

CHARLES LEISERSON: Does anybody happen to see any analogies with the current project 4? Maybe people could chat a little bit about where they see analogies [INAUDIBLE].

JON BENTLEY: I don't know it, but one of my first exposures to MIT was when I had Donovan as a software systems book, and it was dedicated to 6.51 graduate students. I saw that I thought, that bastard. I'm sure that the six students really worked hard on it, but to say that the seventh student worked only a little much more over halfway and then to be so precise, that's just cruel. What a son of a bitch that guy had to be.

So I don't know what project 4 is, but is it Leiserchess? Oh, great. I know what that is. So what things-- have you used any of these techniques? Did you ever prune searches? Did you ever store results? What did you do in project 4? You're delegating this. That's a natural leader right there.

AUDIENCE: We talked about search pruning-- we already have--

JON BENTLEY: Speak up so all of them can hear, please.

AUDIENCE: Commander voice. So we already have-- everybody in this room knows-- alpha-beta pruning. [INAUDIBLE] It's got search. I don't know how many teams are already working on search, but at least my team is working on changing the board representation first. So we haven't gotten into pruning search yet, but that's definitely on the horizon [INAUDIBLE].

JON BENTLEY: Is there anyone here from the state of California? I was born in California. When you hear alpha beta, apart from the search, what do you think of?

AUDIENCE: The grocery store.

JON BENTLEY: There's a grocery store there called Alpha Beta. And when Knuth wrote a paper on that topic, he went out and bought a box of Alpha Beta prunes that he had in his desk. So he was an expert in two senses on alpha beta pruning. So good. Other techniques? Please.

AUDIENCE: The hashing. There's one function [INAUDIBLE] takes a long time, and suggested maybe you could somehow keep track of the laser path with a hash table [INAUDIBLE].

JON BENTLEY: Great. Did you resolve collisions at all? Or did you just have one element there with a key? How did you address the problem that Charles mentioned of-- what kind of hashing did you use?

AUDIENCE: So we haven't used caching yet.

JON BENTLEY: Other techniques?

CHARLES LEISERSON: Yes. That's a classic example of the fastest way to compute is not to compute at all.

JON BENTLEY: In general, in life no problem is so big that it can't be run away from. These things about avoiding work and being lazy are certainly models for organizing your own life. The lazy evaluation really works in the real world. Other questions? Was that a question or a random obscene hand gesture?

AUDIENCE: [INAUDIBLE].

JON BENTLEY: Please.

AUDIENCE: [INAUDIBLE] state-of-the-art [INAUDIBLE]?

JON BENTLEY: Oh. That's a great question. I worked on this problem a lot in the early 1990s with my colleague David Johnson, who literally wrote the book on NP-completeness. An MIT PhD guy. We were really happy-- at the time, in a couple of hours of CPU time, we could solve 100,000-city problems to within a few percent. We were able to solve million-city problems in a day of CPU time to within a few percent.

And we were ecstatic. That was really big. So we could go out that big to within a few percent. If we worked really, really hard, we could get 10,000-city problems down to within half a percent. But if you want to go all the way, to have not only the optimal solution but a proof that it's optimal, for a while people bragged, we finally solved that problem. This will let you see about what was done. We solved the problem of all 48 state capitals.

So for a while that was the state of the art. And then that number has crept up over time. And now you can get exact solutions to some famous problems into the tens of thousands by using lots and lots of really clever searching, branching down with really clever lower bounds to guide it. And you at one point get a tour, and you can make that tour. But then you get a proof of a lower bound along with it to do that.
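The idea of branching with a lower bound to guide and prune the search can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method used for the large exact solutions mentioned above, which rely on far stronger bounds. Here the bound is the length of the partial tour plus the MST over the unvisited cities together with the current city and the start. That is a valid lower bound because the rest of any tour is a path that spans exactly those cities. The function names and the seven sample points are hypothetical.

```python
import itertools
import math

def mst_length(nodes, dist):
    """Prim's algorithm: total MST edge weight over the given city indices."""
    if len(nodes) < 2:
        return 0.0
    cheapest = {c: dist[nodes[0]][c] for c in nodes[1:]}
    total = 0.0
    while cheapest:
        nxt = min(cheapest, key=cheapest.get)
        total += cheapest.pop(nxt)
        for c in cheapest:
            cheapest[c] = min(cheapest[c], dist[nxt][c])
    return total

def branch_and_bound(dist):
    """Exact TSP by depth-first search, pruned with an MST lower bound."""
    n = len(dist)
    best = {"length": math.inf, "tour": None}

    def search(tour, length, remaining):
        if not remaining:
            total = length + dist[tour[-1]][tour[0]]  # close the tour
            if total < best["length"]:
                best["length"], best["tour"] = total, list(tour)
            return
        # Lower bound on any completion of this partial tour.
        nodes = sorted(set(remaining) | {tour[-1], tour[0]})
        if length + mst_length(nodes, dist) >= best["length"]:
            return  # prune: this branch cannot beat the best tour found so far
        # Try the nearest cities first so a good tour is found early.
        for city in sorted(remaining, key=lambda c: dist[tour[-1]][c]):
            tour.append(city)
            search(tour, length + dist[tour[-2]][city], remaining - {city})
            tour.pop()

    search([0], 0.0, frozenset(range(1, n)))
    return best["length"], best["tour"]

def brute_force_length(dist):
    """Cross-check: try every ordering of the cities after city 0."""
    n = len(dist)
    return min(
        sum(dist[t[k - 1]][t[k]] for k in range(n))
        for t in ((0,) + p for p in itertools.permutations(range(1, n)))
    )

# Hypothetical sample data.
points = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3), (2, 8), (7, 0)]
dist = [[math.dist(p, q) for q in points] for p in points]
length, tour = branch_and_bound(dist)
```

When the search finishes, every branch it skipped had a lower bound at least as large as the best tour, which is why the result comes with a proof of optimality and not just a good tour.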

CHARLES LEISERSON: Hey, old man, I want to let you know that there are actually now 50 states in the union.

JON BENTLEY: No. What time did this happen? You can tell that I am much, much, much older than Charles, and he never lets me hear the end of it. I trust that the rest of you-- this is like the third free deep psychology insight: be kind to old people. Ignore the example that the kid over there sets, and show some class and respect to me and my fellow geezers.

CHARLES LEISERSON: We were both born in 1953.

JON BENTLEY: But I was born in the good part of 1953. In particular, I was born before Her Majesty the Queen of England assumed the throne. Can you make the same claim?

CHARLES LEISERSON: I cannot make the same claim.

JON BENTLEY: I'm sorry. He can, but only because he's a sneaky bastard. Can you make it truthfully is the question that I should have asked. Other questions?

This class can be very important. Like I said, I spent the past almost half century as a working computer programmer. The thing I've done most is performance engineering. It's allowed me to do a number of really interesting things. I've been able to dabble in all sorts of computational systems, ranging from automated gerrymandering--

Every time you make a telephone call in this country, if it's, say, a call from inside an institution like a hospital or a university, it uses some code that I wrote, some of the performance things. If you make a long-distance call, it uses code that I wrote. If you've ever used something called Google internet search or maps, or stocks or anything else, that uses some algorithms I've done.

It's incredibly satisfying. It's been a very, very fulfilling way for me to spend a big chunk of my life. I am grateful. It's allowed me to make friends whom I've known for almost half a century, and who are wonderful, dear people. And it's been a great way for my life. I hope that performance engineering is as good to you as it has been to me. Anything else, professor?

CHARLES LEISERSON: Thank you very much, Jon.

JON BENTLEY: Thank you.

[STUDENTS APPLAUD]
