1 00:00:17,970 --> 00:00:20,400 PROFESSOR: I sent out a survey this morning 2 00:00:20,400 --> 00:00:22,400 about how the class is going, what 3 00:00:22,400 --> 00:00:23,970 you thought of the problem set. 4 00:00:23,970 --> 00:00:27,860 And I would appreciate if you provide me some feedback-- 5 00:00:27,860 --> 00:00:31,740 so things you like or don't like about the class or about 6 00:00:31,740 --> 00:00:34,364 the problem set that was just due last night. 7 00:00:34,364 --> 00:00:38,115 So I can try to adjust to make it more interesting and useful 8 00:00:38,115 --> 00:00:38,740 for all of you. 9 00:00:42,600 --> 00:00:45,200 Last time we talked about Szemerédi's graph regularity 10 00:00:45,200 --> 00:00:46,100 lemma. 11 00:00:46,100 --> 00:00:48,020 So the regularity lemma, as I mentioned, 12 00:00:48,020 --> 00:00:52,230 is an extremely powerful tool in modern combinatorics. 13 00:00:52,230 --> 00:00:55,380 And last time we saw the statement and the proof 14 00:00:55,380 --> 00:00:57,690 of this regularity lemma. 15 00:00:57,690 --> 00:01:01,290 Today, I want to show you how to apply the lemma 16 00:01:01,290 --> 00:01:04,709 for extremal applications. 17 00:01:04,709 --> 00:01:08,550 In particular, we'll see how to prove Roth's theorem that I 18 00:01:08,550 --> 00:01:10,530 mentioned in the very first lecture, 19 00:01:10,530 --> 00:01:14,580 about subsets of integers lacking three-term arithmetic 20 00:01:14,580 --> 00:01:17,340 progressions. 21 00:01:17,340 --> 00:01:21,360 First, let me remind you the regularity lemma. 22 00:01:21,360 --> 00:01:25,930 We're always working inside some graph, G. 
23 00:01:25,930 --> 00:01:32,870 We say that a pair of subsets of vertices 24 00:01:32,870 --> 00:01:42,130 is epsilon regular if the following holds-- 25 00:01:42,130 --> 00:01:57,660 for all subsets A of X, B of Y, neither too small, 26 00:01:57,660 --> 00:02:03,320 we have that the edge density between A and B 27 00:02:03,320 --> 00:02:08,340 is very similar to the edge density between the ambient 28 00:02:08,340 --> 00:02:11,437 sets X and Y. 29 00:02:11,437 --> 00:02:13,020 So we had this picture from last time. 30 00:02:19,040 --> 00:02:20,690 You have two sets. 31 00:02:20,690 --> 00:02:22,190 Now, they don't have to be disjoint. 32 00:02:22,190 --> 00:02:23,523 They could even be the same set. 33 00:02:23,523 --> 00:02:25,430 But for illustration purposes, it's 34 00:02:25,430 --> 00:02:27,650 easier to visualize what's going on if I draw them 35 00:02:27,650 --> 00:02:31,100 as disjoint subsets. 36 00:02:31,100 --> 00:02:33,320 So there is some edge density. 37 00:02:33,320 --> 00:02:36,020 And I say they're epsilon regular if they behave 38 00:02:36,020 --> 00:02:38,360 random-like in the following sense-- 39 00:02:38,360 --> 00:02:42,200 that the edges are somehow distributed in a fairly uniform 40 00:02:42,200 --> 00:02:46,790 way so that if I look at some smaller subsets 41 00:02:46,790 --> 00:02:50,810 A and B, but not too small, then the edge density 42 00:02:50,810 --> 00:02:56,170 between A and B is very similar to the ambient edge density. 43 00:02:56,170 --> 00:02:58,360 So at most epsilon difference. 44 00:02:58,360 --> 00:03:01,040 Now I need that A and B are not too small 45 00:03:01,040 --> 00:03:05,270 because if you allow yourself to take, for example, single vertices, 46 00:03:05,270 --> 00:03:09,620 you can easily get densities that are either 0 or 1. 47 00:03:09,620 --> 00:03:12,530 So then it's very hard to make any useful statement. 48 00:03:12,530 --> 00:03:16,480 So that's why these two conditions are needed. 
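For reference, the definition just described can be written out as below. This is only a sketch of the board notation, which the transcript does not capture: the symbols $e(A,B)$ and $d(A,B)$ are assumed names, and "not too small" is taken to mean the usual thresholds $|A| \ge \varepsilon|X|$ and $|B| \ge \varepsilon|Y|$.

```latex
% Assumed notation: e(A,B) = number of pairs (a,b) in A x B that are edges.
d(A,B) = \frac{e(A,B)}{|A|\,|B|}

% (X,Y) is epsilon-regular if, for all A \subseteq X and B \subseteq Y
% with |A| \ge \varepsilon |X| and |B| \ge \varepsilon |Y|,
\bigl| d(A,B) - d(X,Y) \bigr| \le \varepsilon .
```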
49 00:03:16,480 --> 00:03:20,590 And here, the edge density is defined 50 00:03:20,590 --> 00:03:25,210 to be the number of edges with one endpoint in A, 51 00:03:25,210 --> 00:03:31,160 one endpoint in B, divided by the product of the sizes of A 52 00:03:31,160 --> 00:03:40,220 and B. And we say that a partition of the vertex set 53 00:03:40,220 --> 00:03:57,300 of the graph is epsilon regular if, summing over all pairs 54 00:03:57,300 --> 00:04:10,390 i, j such that vi, vj is not epsilon regular, we sum up 55 00:04:10,390 --> 00:04:13,360 the product of these part sizes, and 56 00:04:13,360 --> 00:04:19,750 this sum is at most an epsilon fraction of the total number 57 00:04:19,750 --> 00:04:22,480 of pairs of vertices. 58 00:04:22,480 --> 00:04:25,210 And the way to think of this is that there are not too 59 00:04:25,210 --> 00:04:27,880 many irregular pairs. 60 00:04:27,880 --> 00:04:30,710 At least in the case when the partition is equitable. 61 00:04:30,710 --> 00:04:32,770 So we should really think about all 62 00:04:32,770 --> 00:04:35,260 of the parts having more or less the same size, and then it is saying 63 00:04:35,260 --> 00:04:38,276 that at most an epsilon fraction of the pairs are irregular. 64 00:04:42,940 --> 00:04:46,940 And the main theorem from last time was Szemerédi's regularity 65 00:04:46,940 --> 00:04:47,440 lemma. 66 00:05:01,560 --> 00:05:05,130 And the statement is that for every epsilon, 67 00:05:05,130 --> 00:05:07,850 there exists some M-- 68 00:05:07,850 --> 00:05:10,730 so M depends only on epsilon and not 69 00:05:10,730 --> 00:05:13,460 on the graph that we're about to see-- 70 00:05:13,460 --> 00:05:28,210 such that every graph has an epsilon regular partition 71 00:05:28,210 --> 00:05:32,780 into at most M parts. 72 00:05:36,230 --> 00:05:37,670 In particular, the number of parts 73 00:05:37,670 --> 00:05:40,220 does not depend on the graph. 74 00:05:40,220 --> 00:05:42,050 For every epsilon, there is some M. 
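The edge density and the epsilon-regular condition for a pair can be sketched in code as follows. This is a minimal illustration, not anything from the lecture: the function names, the adjacency-dictionary representation, and the size thresholds `eps * |X|` and `eps * |Y|` are assumptions, and the regularity check is brute force over all subsets, so it is usable only on tiny examples.

```python
from itertools import combinations


def make_graph(n, edges):
    """Build an undirected graph on vertices 0..n-1 as a dict of neighbor sets."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def edge_density(adj, A, B):
    """d(A, B): pairs (a, b) in A x B joined by an edge, divided by |A| * |B|."""
    edges = sum(1 for a in A for b in B if b in adj[a])
    return edges / (len(A) * len(B))


def nonempty_subsets(S):
    """Yield every nonempty subset of S as a tuple."""
    items = sorted(S)
    for r in range(1, len(items) + 1):
        yield from combinations(items, r)


def is_epsilon_regular(adj, X, Y, eps):
    """Brute-force check of the pair condition (exponential time)."""
    d_xy = edge_density(adj, X, Y)
    for A in nonempty_subsets(X):
        if len(A) < eps * len(X):  # A is "too small", skip it
            continue
        for B in nonempty_subsets(Y):
            if len(B) < eps * len(Y):  # B is "too small", skip it
                continue
            if abs(edge_density(adj, A, B) - d_xy) > eps:
                return False  # (A, B) witnesses irregularity
    return True


X, Y = {0, 1, 2, 3}, {4, 5, 6, 7}
complete = make_graph(8, [(x, y) for x in X for y in Y])
matching = make_graph(8, [(x, x + 4) for x in X])

print(edge_density(matching, X, Y))              # 0.25
print(is_epsilon_regular(complete, X, Y, 0.25))  # True
print(is_epsilon_regular(matching, X, Y, 0.25))  # False
```

The perfect matching fails because a single matched pair of vertices has density 1, far from the ambient density of one quarter; the complete bipartite graph passes because every sub-pair has density exactly 1.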
75 00:05:42,050 --> 00:05:43,760 And no matter how large the graph, 76 00:05:43,760 --> 00:05:45,950 there exists a bounded size partition 77 00:05:45,950 --> 00:05:49,130 that is epsilon regular. 78 00:05:49,130 --> 00:05:51,620 So the proof last time gave us a bound M 79 00:05:51,620 --> 00:05:54,550 that is quite large as a function of epsilon. 80 00:05:54,550 --> 00:05:56,960 So the last time we saw that this M 81 00:05:56,960 --> 00:06:06,230 was a tower of twos of height essentially polynomial in 1 82 00:06:06,230 --> 00:06:08,770 over epsilon. 83 00:06:08,770 --> 00:06:13,160 And I mentioned that you basically cannot improve this 84 00:06:13,160 --> 00:06:14,850 bound. 85 00:06:14,850 --> 00:06:17,410 So this bound is more or less the best possible 86 00:06:17,410 --> 00:06:20,050 up to maybe changing the 5. 87 00:06:20,050 --> 00:06:22,940 And so in some sense, the proof that we gave last time 88 00:06:22,940 --> 00:06:26,750 for Szemerédi's graph regularity lemma was the right proof. 89 00:06:26,750 --> 00:06:28,690 So that was the sequence of steps that 90 00:06:28,690 --> 00:06:30,315 were the right things to do. 91 00:06:30,315 --> 00:06:31,940 Even though they give a terrible bound, 92 00:06:31,940 --> 00:06:35,060 it's somehow the bound that should come out. 93 00:06:38,690 --> 00:06:40,600 What I want to talk about today is, 94 00:06:40,600 --> 00:06:44,270 what's a regularity partition good for? 95 00:06:44,270 --> 00:06:48,230 So we did all this work to get a regularity partition, 96 00:06:48,230 --> 00:06:50,210 and it has all of these nice definitions. 97 00:06:50,210 --> 00:06:51,850 But they are useful for something. 98 00:06:51,850 --> 00:06:55,130 So what is it useful for? 99 00:06:55,130 --> 00:06:57,050 And here is the intuition. 
100 00:06:59,820 --> 00:07:02,530 Remember at the beginning of last lecture 101 00:07:02,530 --> 00:07:06,070 I mentioned this informal statement of regularity lemma-- 102 00:07:06,070 --> 00:07:10,360 namely that there exists a partition of the graph 103 00:07:10,360 --> 00:07:13,960 so that most pairs look random-like. 104 00:07:21,290 --> 00:07:23,810 So what does random-like mean? 105 00:07:23,810 --> 00:07:26,710 So random-like, there is a specific definition. 106 00:07:26,710 --> 00:07:30,620 But the intuition is that in many aspects, especially when 107 00:07:30,620 --> 00:07:34,460 it comes to counting small patterns, 108 00:07:34,460 --> 00:07:37,220 the graph in the random-like setting 109 00:07:37,220 --> 00:07:41,090 looks very similar to what happens in a random graph-- 110 00:07:41,090 --> 00:07:42,920 in a genuine random graph. 111 00:07:42,920 --> 00:07:47,150 In particular, if you have three subsets-- 112 00:07:47,150 --> 00:07:49,670 x, y, and z-- 113 00:07:49,670 --> 00:07:59,000 and suppose that the three pairs are all epsilon regular, 114 00:07:59,000 --> 00:08:07,220 then you might be interested in the number of triangles 115 00:08:07,220 --> 00:08:09,830 with one vertex in each set. 116 00:08:13,720 --> 00:08:17,690 Now, if this were a genuine random tripartite 117 00:08:17,690 --> 00:08:20,900 graph with specified edge densities, 118 00:08:20,900 --> 00:08:23,240 then the number of triangles in such a random graph 119 00:08:23,240 --> 00:08:24,960 is pretty easy to calculate. 120 00:08:24,960 --> 00:08:28,380 You would expect that it is around 121 00:08:28,380 --> 00:08:34,520 the product of the sizes of these vertex sets multiplied 122 00:08:34,520 --> 00:08:37,070 by their edge densities. 123 00:08:42,120 --> 00:08:44,790 And what we will see is that in this case, 124 00:08:44,790 --> 00:08:49,200 in that of an epsilon regular setting, 125 00:08:49,200 --> 00:08:50,750 this is also a true statement. 
126 00:08:50,750 --> 00:08:53,220 It's a true, deterministic statement. 127 00:08:53,220 --> 00:08:56,410 That's one of the consequences of epsilon regularity. 128 00:08:56,410 --> 00:08:57,250 Yes, question? 129 00:08:57,250 --> 00:08:58,792 AUDIENCE: Why are we only multiplying 130 00:08:58,792 --> 00:09:02,293 the sizes [INAUDIBLE]? 131 00:09:02,293 --> 00:09:04,210 PROFESSOR: Asking, why are we only multiplying 132 00:09:04,210 --> 00:09:07,400 the sizes of x, y, and z? 133 00:09:07,400 --> 00:09:08,940 So you're asking-- 134 00:09:08,940 --> 00:09:09,440 OK. 135 00:09:09,440 --> 00:09:12,230 So we're trying to find out how many triangles are there 136 00:09:12,230 --> 00:09:16,790 with one vertex in x, one vertex in y, and one vertex in z. 137 00:09:16,790 --> 00:09:20,030 So if I put these vertices in there, one by one, 138 00:09:20,030 --> 00:09:22,400 then if this were a random graph, 139 00:09:22,400 --> 00:09:30,730 I expect that pair to be an edge with probability dxy and so on. 140 00:09:30,730 --> 00:09:33,900 So if all the edge densities were one half, then 141 00:09:33,900 --> 00:09:38,620 I expect one eighth of these triples to be actual triangles. 142 00:09:38,620 --> 00:09:41,710 And what we're saying is that in an epsilon regular setting, 143 00:09:41,710 --> 00:09:47,490 that is approximately a true statement. 144 00:09:47,490 --> 00:09:49,380 So let me formalize this intuition 145 00:09:49,380 --> 00:09:51,666 into an actual statement. 146 00:09:58,970 --> 00:10:01,980 And this type of statements are known as counting lemmas 147 00:10:01,980 --> 00:10:02,860 in literature. 148 00:10:02,860 --> 00:10:06,000 And in particular, let's look at the triangle counting lemma. 149 00:10:15,860 --> 00:10:17,340 In the triangle counting lemma-- 150 00:10:17,340 --> 00:10:19,500 so we're using the same picture over there-- 151 00:10:19,500 --> 00:10:27,150 I have three vertex subsets of some given graph. 
152 00:10:27,150 --> 00:10:28,740 Again, they don't have to be disjoint. 153 00:10:28,740 --> 00:10:30,570 They could overlap, but it's fine to think 154 00:10:30,570 --> 00:10:31,820 about that picture over there. 155 00:10:34,310 --> 00:10:41,655 And suppose that these three pairs of subsets-- 156 00:10:44,220 --> 00:10:47,890 so these three subsets-- they are mutually epsilon regular. 157 00:10:57,270 --> 00:11:01,740 Then, for abbreviation, let me write 158 00:11:01,740 --> 00:11:07,580 d sub xy to be the edge density between x and y, 159 00:11:07,580 --> 00:11:09,500 and so on for the other two pairs. 160 00:11:12,410 --> 00:11:15,800 The conclusion is that the number of triangles-- 161 00:11:19,810 --> 00:11:25,420 where I'm looking at triangles and only counting triangles 162 00:11:25,420 --> 00:11:29,940 with one specified vertex in x, one in y, and one in z-- 163 00:11:33,740 --> 00:11:37,010 is at least some quantity. 164 00:11:37,010 --> 00:11:39,950 So there is a small potential error 165 00:11:39,950 --> 00:11:44,930 loss, but otherwise it is the product that I mentioned earlier. 166 00:11:57,410 --> 00:12:00,890 So it is at least this quantity I mentioned earlier up 167 00:12:00,890 --> 00:12:04,220 to a potential small error because we're 168 00:12:04,220 --> 00:12:06,062 looking at epsilon regularity. 169 00:12:06,062 --> 00:12:07,520 So there could be some fluctuations 170 00:12:07,520 --> 00:12:10,110 in both directions. 171 00:12:10,110 --> 00:12:13,493 A similar statement is also true as an upper bound. 172 00:12:13,493 --> 00:12:15,160 But the lower bound will be more useful, 173 00:12:15,160 --> 00:12:17,290 so I will show you the proof of the lower bound. 174 00:12:17,290 --> 00:12:19,590 But you can figure out how to do the upper bound. 
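The quantity written on the board is not captured in the transcript. Pieced together from the factors named in the lecture (the 1 minus 2 epsilon factor asked about by the audience, and the "density minus epsilon" terms used in the proof), the triangle counting lemma can be sketched as below. The hypothesis that each density is at least $2\varepsilon$ is the one the lecturer adds partway through the proof.

```latex
% Hypotheses: (X,Y), (X,Z), (Y,Z) are each epsilon-regular,
% and d_{XY}, d_{XZ}, d_{YZ} \ge 2\varepsilon.
\#\bigl\{ (x,y,z) \in X \times Y \times Z : xyz \text{ is a triangle} \bigr\}
\;\ge\;
(1 - 2\varepsilon)\,(d_{XY} - \varepsilon)\,(d_{XZ} - \varepsilon)\,(d_{YZ} - \varepsilon)\,
|X|\,|Y|\,|Z| .
```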
175 00:12:19,590 --> 00:12:23,490 And later on we'll see a general proof of what happens, 176 00:12:23,490 --> 00:12:26,910 instead of triangles, if you have other subgraphs that you 177 00:12:26,910 --> 00:12:29,920 wish to count. 178 00:12:29,920 --> 00:12:30,940 So here's the intuition. 179 00:12:30,940 --> 00:12:32,840 So you have a random-like setting, 180 00:12:32,840 --> 00:12:34,570 and we'll formalize it in the setting 181 00:12:34,570 --> 00:12:36,655 of epsilon regular pairs. 182 00:12:36,655 --> 00:12:37,155 Yeah? 183 00:12:37,155 --> 00:12:40,800 AUDIENCE: Where does the 1 minus 2 epsilon come from? 184 00:12:40,800 --> 00:12:41,537 PROFESSOR: OK. 185 00:12:41,537 --> 00:12:43,870 The question is, where does 1 minus 2 epsilon come from? 186 00:12:43,870 --> 00:12:45,680 You'll see in the proof. 187 00:12:45,680 --> 00:12:47,980 But you should think of this as essentially 188 00:12:47,980 --> 00:12:49,020 a negligible factor. 189 00:12:51,580 --> 00:12:54,850 Any more questions? 190 00:12:54,850 --> 00:12:55,350 All right. 191 00:12:55,350 --> 00:12:57,410 So here's how this proof is going to go. 192 00:13:03,850 --> 00:13:10,880 Let's look at x and think about its relationship to y. 193 00:13:10,880 --> 00:13:12,820 It's epsilon regular. 194 00:13:12,820 --> 00:13:14,450 And I claim, as a result of them 195 00:13:14,450 --> 00:13:21,590 being epsilon regular, fewer than an epsilon fraction of x. 196 00:13:21,590 --> 00:13:30,900 So fewer than this many vertices in x have a very small number 197 00:13:30,900 --> 00:13:32,720 of neighbors in y. 198 00:13:42,270 --> 00:13:45,330 Because if this were not the case, 199 00:13:45,330 --> 00:13:49,740 then you can violate the condition 200 00:13:49,740 --> 00:13:51,480 of epsilon regularity. 201 00:13:51,480 --> 00:13:56,890 So if not, then let's look at this subset, which 202 00:13:56,890 --> 00:14:01,690 has size at least epsilon times the size of x. 
203 00:14:01,690 --> 00:14:12,250 And all of them have fewer than that number of neighbors to y. 204 00:14:12,250 --> 00:14:16,750 So these two sets-- 205 00:14:16,750 --> 00:14:21,480 so this set, x prime and y, would 206 00:14:21,480 --> 00:14:28,420 witness non-epsilon regularity. 207 00:14:28,420 --> 00:14:31,300 So you cannot have too many vertices with small degrees 208 00:14:31,300 --> 00:14:32,260 going to x-- 209 00:14:32,260 --> 00:14:34,400 going to y. 210 00:14:34,400 --> 00:14:34,900 OK. 211 00:14:34,900 --> 00:14:35,400 Great. 212 00:14:37,890 --> 00:14:45,260 And likewise, fewer than epsilon x vertices 213 00:14:45,260 --> 00:14:49,560 have a small number of neighbors to z. 214 00:15:01,705 --> 00:15:03,330 So what does the picture now look like? 215 00:15:08,570 --> 00:15:13,210 So you have this x and then these two other sets, 216 00:15:13,210 --> 00:15:24,680 y and z where I'm going to throw out a small proportion of x 217 00:15:24,680 --> 00:15:29,480 less than 2 epsilon fraction of x that have 218 00:15:29,480 --> 00:15:31,910 the wrong kinds of degrees. 219 00:15:31,910 --> 00:15:39,290 And everything else in here have lots of neighbors in both y 220 00:15:39,290 --> 00:15:40,115 and in z. 221 00:15:40,115 --> 00:15:50,120 And in particular, for all x up here it has lots of neighbors 222 00:15:50,120 --> 00:15:56,100 to y, lots of neighbors to z. 223 00:15:56,100 --> 00:15:57,080 How many? 224 00:15:57,080 --> 00:16:06,110 Well, we have at least d sub xy minus epsilon y neighbors to y 225 00:16:06,110 --> 00:16:13,080 and at least d sub xz minus epsilon times z neighbors to z. 226 00:16:22,390 --> 00:16:23,980 OK. 227 00:16:23,980 --> 00:16:29,314 So now I realize I'm missing a hypothesis in the counting 228 00:16:29,314 --> 00:16:29,814 lemma. 229 00:16:34,080 --> 00:16:41,580 Let me assume that none of these edge densities are too small. 230 00:16:47,980 --> 00:16:49,630 They're all at least 2 epsilon. 
231 00:16:56,410 --> 00:17:02,750 So now these guys are at least epsilon fractions of y and z. 232 00:17:06,260 --> 00:17:17,619 So I can apply the definition of epsilon regularity 233 00:17:17,619 --> 00:17:25,650 to the pair yz to deduce that there are lots of edges 234 00:17:25,650 --> 00:17:26,960 between these two sets. 235 00:17:29,650 --> 00:17:37,530 So over here, the number of edges 236 00:17:37,530 --> 00:17:57,970 is at least the products of the sizes multiplied by the edge 237 00:17:57,970 --> 00:17:59,680 density between them. 238 00:17:59,680 --> 00:18:02,260 And by the definition of epsilon regularity, 239 00:18:02,260 --> 00:18:05,470 the edge density between these two small or these two red sets 240 00:18:05,470 --> 00:18:10,090 is at least d of yz minus epsilon. 241 00:18:12,950 --> 00:18:14,890 So putting everything now together, 242 00:18:14,890 --> 00:18:22,980 we find that the total number of triangles, 243 00:18:22,980 --> 00:18:27,336 looking at all the possible places where x can go-- 244 00:18:27,336 --> 00:18:33,550 so at least 1 minus 2 epsilon times the size of x. 245 00:18:33,550 --> 00:18:38,710 And then multiply by this factor over here. 246 00:18:43,720 --> 00:18:47,910 And so we find the statement up there. 247 00:18:47,910 --> 00:18:50,940 So this calculation formalizes the intuition 248 00:18:50,940 --> 00:18:53,010 that if you have epsilon regular pairs, 249 00:18:53,010 --> 00:18:55,350 then they behave like random settings 250 00:18:55,350 --> 00:18:57,540 when it comes to counting small patterns-- namely 251 00:18:57,540 --> 00:18:58,332 that of a triangle. 252 00:19:03,280 --> 00:19:06,510 So what can we use this for? 253 00:19:06,510 --> 00:19:08,520 The next statement I want to show you 254 00:19:08,520 --> 00:19:11,470 is called a triangle removal lemma. 255 00:19:22,490 --> 00:19:24,830 So this is a somewhat innocuous looking 256 00:19:24,830 --> 00:19:28,760 statement that is surprisingly tricky to prove. 
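To make the lower bound from the triangle counting lemma concrete, here is a small numerical sketch. It is an illustration only: the function names are hypothetical, and the test graph is a complete tripartite graph, chosen because every pair of parts is trivially epsilon regular (all densities between nonempty subsets equal 1), so the lemma's hypotheses hold without needing a regularity check.

```python
def make_graph(n, edges):
    """Build an undirected graph on vertices 0..n-1 as a dict of neighbor sets."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def count_triangles(adj, X, Y, Z):
    """Count triples (x, y, z) in X x Y x Z whose three pairs are all edges."""
    return sum(
        1
        for x in X
        for y in adj[x] & Y            # y must be a neighbor of x inside Y
        for z in adj[x] & adj[y] & Z   # z must be a common neighbor inside Z
    )


def counting_lemma_lower_bound(eps, d_xy, d_xz, d_yz, size_x, size_y, size_z):
    """(1 - 2 eps)(d_xy - eps)(d_xz - eps)(d_yz - eps) |X| |Y| |Z|."""
    return (
        (1 - 2 * eps)
        * (d_xy - eps) * (d_xz - eps) * (d_yz - eps)
        * size_x * size_y * size_z
    )


X, Y, Z = {0, 1, 2}, {3, 4, 5}, {6, 7, 8}
pairs = [(a, b) for A, B in ((X, Y), (X, Z), (Y, Z)) for a in A for b in B]
graph = make_graph(9, pairs)

triangles = count_triangles(graph, X, Y, Z)
bound = counting_lemma_lower_bound(0.1, 1.0, 1.0, 1.0, len(X), len(Y), len(Z))
print(triangles, bound)  # the actual count should be at least the bound
```

Here every triple is a triangle, so the count is 27, comfortably above the bound of roughly 15.7 obtained with epsilon equal to 0.1.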
257 00:19:28,760 --> 00:19:32,960 And part of the development of this regularity lemma was 258 00:19:32,960 --> 00:19:36,830 to prove Szemerédi's-- to the triangle removal lemma. 259 00:19:36,830 --> 00:19:40,360 This was one of the early applications of the regularity 260 00:19:40,360 --> 00:19:41,650 lemma. 261 00:19:41,650 --> 00:19:50,220 So it's due to Ruzsa and Szemerédi back in the '70s. 262 00:19:53,170 --> 00:19:55,150 Here's the statement. 263 00:19:55,150 --> 00:19:59,800 For every epsilon there exists a delta, 264 00:19:59,800 --> 00:20:13,400 such that every graph of n vertices with a small number 265 00:20:13,400 --> 00:20:15,540 of triangles-- 266 00:20:15,540 --> 00:20:17,510 so a small number of triangles means 267 00:20:17,510 --> 00:20:21,170 a negligible fraction of all the possible triples of vertices 268 00:20:21,170 --> 00:20:22,580 are actual triangles. 269 00:20:22,580 --> 00:20:26,120 So fewer than delta n cubed triangles. 270 00:20:31,097 --> 00:20:33,430 So if you have a graph with a small number of triangles, 271 00:20:33,430 --> 00:20:37,690 the question is, can you make it triangle free by getting rid 272 00:20:37,690 --> 00:20:40,231 of a small number of edges? 273 00:20:40,231 --> 00:20:43,430 So actually, there was already a problem on the first homework 274 00:20:43,430 --> 00:20:47,300 set that is in that spirit. 275 00:20:47,300 --> 00:20:50,450 So if you compare what I'm doing here to the homework set, 276 00:20:50,450 --> 00:20:52,400 you'll see that there are different scales. 277 00:20:52,400 --> 00:20:57,110 So fewer than delta n cubed triangles 278 00:20:57,110 --> 00:21:12,740 can be made triangle free by removing epsilon n 279 00:21:12,740 --> 00:21:13,730 squared edges. 280 00:21:22,028 --> 00:21:23,820 So if you have a small number of triangles, 281 00:21:23,820 --> 00:21:25,320 you can get rid of all the triangles 282 00:21:25,320 --> 00:21:27,935 by removing a small number of edges. 
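Written out with its quantifiers, the statement just given reads as follows. This is a transcription of the spoken statement into symbols; "removing epsilon n squared edges" is read as "at most", which matches the edge count in the proof later in the lecture.

```latex
\forall\, \varepsilon > 0 \;\; \exists\, \delta > 0 \;\text{ such that every } n\text{-vertex graph with fewer than }
\delta n^{3} \text{ triangles}
\\
\text{can be made triangle-free by removing at most } \varepsilon n^{2} \text{ edges.}
```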
283 00:21:27,935 --> 00:21:29,310 If I put it that way, it actually 284 00:21:29,310 --> 00:21:31,230 sounds kind of trivial. 285 00:21:31,230 --> 00:21:33,150 You just get rid of all the triangles. 286 00:21:33,150 --> 00:21:36,550 But if you look at the scales it's not trivial at all, 287 00:21:36,550 --> 00:21:41,740 because there are only a subcubic number of triangles. 288 00:21:41,740 --> 00:21:45,150 So if you take out one edge from each triangle, 289 00:21:45,150 --> 00:21:48,425 maybe you got rid of a lot of edges. 290 00:21:48,425 --> 00:21:50,300 So this is a very innocent looking statement, 291 00:21:50,300 --> 00:21:53,720 but it's actually incredibly deep and tricky. 292 00:21:53,720 --> 00:21:56,060 Before jumping to the proof, let me first 293 00:21:56,060 --> 00:22:02,120 show you an equivalent reformulation of the statement 294 00:22:02,120 --> 00:22:04,640 that also helps you to think about what this statement is 295 00:22:04,640 --> 00:22:05,350 trying to say. 296 00:22:16,480 --> 00:22:19,930 So the triangle removal lemma can be equivalently stated 297 00:22:19,930 --> 00:22:28,720 as saying that every n vertex graph 298 00:22:28,720 --> 00:22:35,180 with a subcubic number of triangles-- 299 00:22:35,180 --> 00:22:38,320 so little o of n cubed triangles-- 300 00:22:42,780 --> 00:22:56,240 can be made triangle free by removing a subquadratic-- 301 00:22:56,240 --> 00:22:58,700 namely, little o of n squared-- 302 00:22:58,700 --> 00:22:59,460 number of edges. 
303 00:23:05,710 --> 00:23:07,140 So this is an equivalent statement 304 00:23:07,140 --> 00:23:09,460 to what I wrote above, although it actually 305 00:23:09,460 --> 00:23:12,550 takes some thought to figure out what this is even saying 306 00:23:12,550 --> 00:23:16,440 because everybody loves using asymptotic notation, 307 00:23:16,440 --> 00:23:18,920 but there is also ambiguity with, 308 00:23:18,920 --> 00:23:21,250 what do you mean by asymptotic notation, 309 00:23:21,250 --> 00:23:26,200 especially if it appears in the hypothesis of a claim? 310 00:23:26,200 --> 00:23:27,950 So what do you think this statement means? 311 00:23:27,950 --> 00:23:33,310 Can you write out more of a full form? 312 00:23:33,310 --> 00:23:35,770 I think of this as a lazy version 313 00:23:35,770 --> 00:23:38,090 of trying to say something. 314 00:23:38,090 --> 00:23:42,010 So what do you mean by having little o of n cubed triangles? 315 00:23:46,990 --> 00:23:47,986 Yes. 316 00:23:47,986 --> 00:23:49,978 AUDIENCE: The sequence of the graph. 317 00:23:49,978 --> 00:23:50,974 [INAUDIBLE] 318 00:23:54,545 --> 00:23:58,262 AUDIENCE: [INAUDIBLE] function has n and only n. 319 00:23:58,262 --> 00:23:59,236 [INAUDIBLE] 320 00:24:04,277 --> 00:24:04,860 PROFESSOR: OK. 321 00:24:04,860 --> 00:24:05,360 Great. 322 00:24:05,360 --> 00:24:07,710 So I have a sequence of graphs. 323 00:24:07,710 --> 00:24:10,207 And also, we can put some functions in. 324 00:24:10,207 --> 00:24:11,790 So I'll write down the statement here, 325 00:24:11,790 --> 00:24:12,957 but that's kind of the idea. 326 00:24:12,957 --> 00:24:14,820 We're looking at not just a single graph, 327 00:24:14,820 --> 00:24:17,340 but we're looking at a sequence. 328 00:24:17,340 --> 00:24:23,490 Another way to say this is that for every function 329 00:24:23,490 --> 00:24:28,390 fn, that is subcubic. 
330 00:24:28,390 --> 00:24:32,730 So for example, if f of n is n cubed divided by log n, 331 00:24:32,730 --> 00:24:39,060 there exists some function g, which is subquadratic, 332 00:24:39,060 --> 00:24:42,810 such that if you replace the first one by f of n 333 00:24:42,810 --> 00:24:46,800 and the second thing by g of n, then this is a true statement. 334 00:24:46,800 --> 00:24:48,940 And I'll leave it to you as an exercise 335 00:24:48,940 --> 00:24:52,230 in quantifier elimination, let's say, 336 00:24:52,230 --> 00:24:54,948 to explain why these two statements are 337 00:24:54,948 --> 00:24:55,990 equivalent to each other. 338 00:25:04,380 --> 00:25:09,750 I want to explain a recipe for applying Szemerédi's regularity 339 00:25:09,750 --> 00:25:10,250 lemma. 340 00:25:10,250 --> 00:25:12,680 How does one use the regularity lemma 341 00:25:12,680 --> 00:25:17,990 to prove, well, statements in graph theory? 342 00:25:17,990 --> 00:25:22,100 The most standard applications of the regularity lemma 343 00:25:22,100 --> 00:25:23,810 generally have the following steps. 344 00:25:26,990 --> 00:25:29,050 Let me call this a recipe. 345 00:25:29,050 --> 00:25:30,330 And we'll see it a few times. 346 00:25:46,510 --> 00:25:54,010 The first step is we apply Szemerédi's regularity lemma 347 00:25:54,010 --> 00:25:55,828 to obtain a partition. 348 00:26:08,290 --> 00:26:11,653 So let me call the first step partition. 349 00:26:16,860 --> 00:26:20,520 In the second step, we look at the partition that we obtained, 350 00:26:20,520 --> 00:26:22,410 and we clean it up. 351 00:26:22,410 --> 00:26:26,610 So in the partition, you have some irregular pairs that 352 00:26:26,610 --> 00:26:28,690 are undesirable to work with. 353 00:26:28,690 --> 00:26:31,160 And there are some other pairs that we'll see. 
354 00:26:31,160 --> 00:26:35,820 So in particular, if your pair involves 355 00:26:35,820 --> 00:26:39,300 edges that are fairly sparse or subsets 356 00:26:39,300 --> 00:26:41,825 of vertices that are fairly small, then 357 00:26:41,825 --> 00:26:43,200 maybe we don't want to touch them 358 00:26:43,200 --> 00:26:47,320 because they're kind of not so good to deal with. 359 00:26:47,320 --> 00:26:52,480 So we're going to clean the graph 360 00:26:52,480 --> 00:27:10,140 by removing edges in irregular pairs and low density pairs. 361 00:27:17,100 --> 00:27:19,910 And unless you're using the version of regularity lemma 362 00:27:19,910 --> 00:27:22,920 that allows you to have equitable parts, 363 00:27:22,920 --> 00:27:31,320 you also want to get rid of edges where one of the parts 364 00:27:31,320 --> 00:27:32,250 is too small. 365 00:27:43,210 --> 00:27:49,630 And the third step, I'll call this count. 366 00:27:49,630 --> 00:27:52,690 Once you've cleaned up the regularity partition, 367 00:27:52,690 --> 00:27:55,690 say, well, let's try to find some patterns. 368 00:27:55,690 --> 00:28:05,830 If you find one pattern in the cleaned graph-- 369 00:28:11,780 --> 00:28:23,450 and we can use the counting lemma to find lots of patterns. 370 00:28:23,450 --> 00:28:27,170 Here, for the purpose of triangle removal lemma 371 00:28:27,170 --> 00:28:29,270 and what we've been doing so far, 372 00:28:29,270 --> 00:28:32,510 pattern just means a triangle. 373 00:28:32,510 --> 00:28:35,110 So we're going to use the triangle counting lemma to find 374 00:28:35,110 --> 00:28:37,630 us lots of triangles. 375 00:28:37,630 --> 00:28:39,250 So we'll see the details in a bit. 376 00:28:39,250 --> 00:28:42,020 But if we run through the strategy-- 377 00:28:42,020 --> 00:28:43,540 you give me a graph. 378 00:28:43,540 --> 00:28:46,980 Let's say, starting from the triangle removal lemma, 379 00:28:46,980 --> 00:28:49,530 it has a small number of triangles. 
380 00:28:49,530 --> 00:28:53,000 You apply the partition, clean it up, 381 00:28:53,000 --> 00:28:55,140 and I claim this cleaning removes 382 00:28:55,140 --> 00:28:56,970 a small number of edges. 383 00:28:56,970 --> 00:29:00,690 And it should result in a triangle free graph 384 00:29:00,690 --> 00:29:04,770 because if it did not result in a triangle free graph, then 385 00:29:04,770 --> 00:29:06,506 there's some triangle. 386 00:29:06,506 --> 00:29:09,760 And from that triangle I can apply the triangle counting 387 00:29:09,760 --> 00:29:13,600 lemma to get lots of triangles. 388 00:29:13,600 --> 00:29:16,000 And that would violate the hypothesis 389 00:29:16,000 --> 00:29:18,820 of the triangle removal lemma. 390 00:29:18,820 --> 00:29:22,350 So that's how the proof is going to go. 391 00:29:22,350 --> 00:29:25,300 So I want to take a very quick break. 392 00:29:25,300 --> 00:29:26,970 And then when we come back, we'll 393 00:29:26,970 --> 00:29:31,174 see the details of how to apply the regularity lemma. 394 00:29:31,174 --> 00:29:32,800 Are there any questions so far? 395 00:29:37,730 --> 00:29:38,716 Yeah? 396 00:29:38,716 --> 00:29:41,190 AUDIENCE: So when we're removing edges 397 00:29:41,190 --> 00:29:44,830 in one of the [INAUDIBLE], is that too small? 398 00:29:44,830 --> 00:29:48,477 Can we do that for every vertex, or is it too small? 399 00:29:48,477 --> 00:29:50,060 PROFESSOR: So you're asking about what 400 00:29:50,060 --> 00:29:54,672 happens when we remove edges at vertex sets that are too small. 401 00:29:54,672 --> 00:29:56,380 You will see in the details of the proof. 402 00:29:56,380 --> 00:29:58,000 So hold on to that question for a bit. 403 00:30:00,560 --> 00:30:03,860 More questions? 404 00:30:03,860 --> 00:30:04,360 OK. 405 00:30:04,360 --> 00:30:07,517 So let's see the proof of the triangle removal lemma. 
406 00:30:18,470 --> 00:30:25,230 So the first step is to apply Szemerédi's regularity lemma 407 00:30:25,230 --> 00:30:27,820 and find a partition. 408 00:30:27,820 --> 00:30:34,380 So we'll find a partition that's epsilon over 4 regular. 409 00:30:45,000 --> 00:30:46,800 So here, epsilon is the same epsilon 410 00:30:46,800 --> 00:30:49,437 in the statement-- in the top statement-- of the triangle 411 00:30:49,437 --> 00:30:50,020 removal lemma. 412 00:30:52,610 --> 00:31:00,450 In the second step, let's clean the graph 413 00:31:00,450 --> 00:31:11,460 by removing all edges in-- 414 00:31:11,460 --> 00:31:14,445 so we are going to get rid of edges between-- 415 00:31:19,760 --> 00:31:20,460 OK. 416 00:31:20,460 --> 00:31:21,502 So let me do it this way. 417 00:31:21,502 --> 00:31:28,340 So all edges between the vi and the vj 418 00:31:28,340 --> 00:31:42,500 whenever vi and vj is not epsilon regular. 419 00:31:42,500 --> 00:31:45,876 Get rid of the edges between irregular parts. 420 00:31:45,876 --> 00:31:47,370 AUDIENCE: Epsilon over 4 regular. 421 00:31:47,370 --> 00:31:48,078 PROFESSOR: Sorry? 422 00:31:48,078 --> 00:31:49,633 AUDIENCE: Epsilon over 4 regular. 423 00:31:49,633 --> 00:31:51,050 PROFESSOR: Epsilon over 4 regular. 424 00:31:51,050 --> 00:31:51,550 Thank you. 425 00:31:57,240 --> 00:32:05,586 Also, between parts where the edge density is too small-- 426 00:32:08,778 --> 00:32:11,910 if the edge density is less than epsilon over 2, 427 00:32:11,910 --> 00:32:12,960 get rid of those edges. 428 00:32:15,790 --> 00:32:22,380 And if one of the two vertex sets 429 00:32:22,380 --> 00:32:25,500 has size too small-- and here, too small 430 00:32:25,500 --> 00:32:32,550 means epsilon over 4M times the size of n. 431 00:32:32,550 --> 00:32:37,630 So here-- OK. 432 00:32:37,630 --> 00:32:42,370 So let me use big M for the number of parts. 
433 00:32:42,370 --> 00:32:45,130 So that's the M that comes out of Szemerédi's regularity 434 00:32:45,130 --> 00:32:45,790 lemma. 435 00:32:45,790 --> 00:32:48,310 If you like, some of the vertex sets can be empty. 436 00:32:48,310 --> 00:32:49,855 It doesn't change the proof. 437 00:32:49,855 --> 00:32:55,280 And n is the number of vertices in the graph. 438 00:32:59,045 --> 00:33:00,920 And this step, you don't really need the step 439 00:33:00,920 --> 00:33:04,790 if your regular partition is equitable. 440 00:33:04,790 --> 00:33:08,030 So let's see how many vertices-- how many edges 441 00:33:08,030 --> 00:33:11,300 have we gotten rid of. 442 00:33:11,300 --> 00:33:16,300 We want to show that we're not deleting too many edges. 443 00:33:16,300 --> 00:33:18,040 In the first step-- 444 00:33:18,040 --> 00:33:24,440 so the number of deleted edges. 445 00:33:24,440 --> 00:33:29,240 In the first step, you see that the number of edges deleted 446 00:33:29,240 --> 00:33:39,980 is at most the sum of product of vi vj when you sum over ij such 447 00:33:39,980 --> 00:33:46,980 that this pair is not epsilon regular or epsilon 4 regular. 448 00:33:46,980 --> 00:33:50,540 Epsilon over 4 regular. 449 00:33:50,540 --> 00:33:53,780 By the definition of an epsilon regular partition, 450 00:33:53,780 --> 00:33:59,300 the sum here is at most epsilon over 4 times n squared. 451 00:34:04,480 --> 00:34:10,620 In the second step, I'm getting rid of low density pairs. 452 00:34:10,620 --> 00:34:12,900 By the virtue of them being low density, 453 00:34:12,900 --> 00:34:15,630 I'm not removing so many edges. 454 00:34:15,630 --> 00:34:20,560 So at most epsilon over 2 times n squared edges 455 00:34:20,560 --> 00:34:21,360 I'm getting rid of. 456 00:34:24,620 --> 00:34:33,260 In the third part, you see every time I take a very small piece, 457 00:34:33,260 --> 00:34:38,060 every vertex here is adjacent to at most n vertices. 
458 00:34:38,060 --> 00:34:42,909 So the number of such things, such edges 459 00:34:42,909 --> 00:34:45,429 I'm getting rid of in the last step 460 00:34:45,429 --> 00:34:51,360 is at most this number times the number of parts M, then times n. 461 00:34:51,360 --> 00:34:56,739 So it's at most epsilon over 4 times n squared. 462 00:35:00,540 --> 00:35:03,600 So here I'm telling you how many edges 463 00:35:03,600 --> 00:35:06,510 I've deleted in each step. 464 00:35:06,510 --> 00:35:10,710 And in total, putting them together, 465 00:35:10,710 --> 00:35:17,153 we see that we get rid of at most epsilon n squared edges 466 00:35:17,153 --> 00:35:17,820 from this graph. 467 00:35:24,350 --> 00:35:26,600 So that's the cleaning step. 468 00:35:26,600 --> 00:35:29,670 So we cleaned up the graph by getting rid of low density 469 00:35:29,670 --> 00:35:33,750 pairs, getting rid of irregular pairs, and small vertex sets. 470 00:35:38,930 --> 00:35:41,330 Now suppose, after this cleaning, 471 00:35:41,330 --> 00:35:44,810 some triangle still remains. 472 00:35:44,810 --> 00:35:46,530 So we're now onto the third step. 473 00:35:46,530 --> 00:35:54,320 So suppose some triangle remains. 474 00:36:00,430 --> 00:36:03,050 So where could this triangle sit? 475 00:36:06,130 --> 00:36:08,020 Has to be between three parts-- 476 00:36:08,020 --> 00:36:10,570 vi, vj, and vk. 477 00:36:14,130 --> 00:36:16,537 i, j, and k, they don't have to be distinct. 478 00:36:16,537 --> 00:36:18,870 So the argument will be OK if some of them are the same, 479 00:36:18,870 --> 00:36:22,660 but it's easier to draw if they're all different. 480 00:36:22,660 --> 00:36:24,560 So I have some triangle, like that.
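For reference, the three deletion counts from the cleaning step can be tallied as follows. This is a sketch of the board computation, not a transcription of it: $V_1,\dots,V_M$ denote the parts, $n$ the number of vertices, $d(V_i,V_j)$ the edge density, and sums run over ordered pairs $(i,j)$.

```latex
\begin{align*}
\text{irregular pairs:} \quad
  & \sum_{\substack{(i,j):\ (V_i,V_j)\ \text{not}\\ \varepsilon/4\text{-regular}}} |V_i|\,|V_j|
    \;\le\; \frac{\varepsilon}{4}\, n^2, \\
\text{low-density pairs:} \quad
  & \sum_{\substack{(i,j):\\ d(V_i,V_j) < \varepsilon/2}} d(V_i,V_j)\,|V_i|\,|V_j|
    \;\le\; \frac{\varepsilon}{2} \sum_{i,j} |V_i|\,|V_j|
    \;=\; \frac{\varepsilon}{2}\, n^2, \\
\text{small parts:} \quad
  & M \cdot \frac{\varepsilon n}{4M} \cdot n \;=\; \frac{\varepsilon}{4}\, n^2, \\
\text{total:} \quad
  & \frac{\varepsilon}{4}\, n^2 + \frac{\varepsilon}{2}\, n^2 + \frac{\varepsilon}{4}\, n^2
    \;=\; \varepsilon\, n^2 .
\end{align*}
```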
481 00:36:28,040 --> 00:36:32,270 Because these edges have not yet been deleted in the cleaning 482 00:36:32,270 --> 00:36:38,898 step, I know that the vertex sets are not too small, 483 00:36:38,898 --> 00:36:40,440 the edge densities are not too small, 484 00:36:40,440 --> 00:36:43,600 and they are all regular with each other. 485 00:36:43,600 --> 00:37:00,440 So here, each pair in vi, vj, vk is epsilon over 4 regular 486 00:37:00,440 --> 00:37:09,660 and has edge density at least epsilon over 2. 487 00:37:09,660 --> 00:37:15,490 And now we apply the triangle counting lemma, 488 00:37:15,490 --> 00:37:21,770 and we find that the number of triangles 489 00:37:21,770 --> 00:37:24,410 with one vertex in vi, one vertex in vj, 490 00:37:24,410 --> 00:37:31,030 one vertex in vk is at least this quantity here. 491 00:37:36,390 --> 00:37:39,070 So that's a correction factor. 492 00:37:39,070 --> 00:37:41,960 So 1 minus epsilon over 2. 493 00:37:41,960 --> 00:37:44,980 And then a bunch of densities-- so densities are not too small. 494 00:37:44,980 --> 00:37:48,240 So I have at least epsilon over 4, 495 00:37:48,240 --> 00:37:52,590 cubed, multiplied by the sizes of the vertex sets. 496 00:37:58,360 --> 00:37:59,725 Now I know that-- 497 00:37:59,725 --> 00:38:05,080 use the fact that these part sizes are not too small. 498 00:38:05,080 --> 00:38:15,460 So I have that. 499 00:38:21,120 --> 00:38:26,830 Just in case, if i, j, and k happen to be the same, 500 00:38:26,830 --> 00:38:28,660 or two of them happen to be the same, 501 00:38:28,660 --> 00:38:32,980 I might overcount the number of triangles a little bit. 502 00:38:32,980 --> 00:38:37,410 But at most, you overcount by a factor of 6. 503 00:38:37,410 --> 00:38:38,140 So that's OK. 504 00:38:38,140 --> 00:38:40,060 So if you're worried about that, put 505 00:38:40,060 --> 00:38:48,330 the 1 over 6 factor in, just in case i, j, k not distinct.
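The quantity on the board is not captured in the transcript, so the following is a reconstruction under an assumption: that the triangle counting lemma is used in its usual form, namely that for pairwise $\eta$-regular sets $X, Y, Z$ with all three densities at least $2\eta$, the number of triangles is at least $(1-2\eta)(d_{XY}-\eta)(d_{XZ}-\eta)(d_{YZ}-\eta)\,|X|\,|Y|\,|Z|$. Taking $\eta = \varepsilon/4$, densities at least $\varepsilon/2$, and part sizes at least $\varepsilon n/(4M)$ would give:

```latex
\begin{align*}
\#\{\text{triangles in } V_i \times V_j \times V_k\}
  &\;\ge\; \Bigl(1 - \frac{\varepsilon}{2}\Bigr)
           \Bigl(\frac{\varepsilon}{2} - \frac{\varepsilon}{4}\Bigr)^{3}
           |V_i|\,|V_j|\,|V_k| \\
  &\;\ge\; \Bigl(1 - \frac{\varepsilon}{2}\Bigr)
           \Bigl(\frac{\varepsilon}{4}\Bigr)^{3}
           \Bigl(\frac{\varepsilon n}{4M}\Bigr)^{3}.
\end{align*}
```

If $i, j, k$ are not all distinct, the same triangle may be counted up to $6$ times, which is where the optional factor of $1/6$ comes from.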
506 00:38:51,120 --> 00:38:55,790 Or if you like, in the cleaning step, you can-- 507 00:38:55,790 --> 00:38:58,010 if you apply the equitable version of the regularity 508 00:38:58,010 --> 00:39:00,682 lemma, you can also get rid of edges inside the parts. 509 00:39:00,682 --> 00:39:02,140 But there are many ways to do this. 510 00:39:02,140 --> 00:39:03,620 It's not an important step. 511 00:39:06,720 --> 00:39:12,560 Now, this quantity, let me set it to be delta. 512 00:39:12,560 --> 00:39:15,530 You see, delta is a function of epsilon 513 00:39:15,530 --> 00:39:17,390 because M is a function of epsilon. 514 00:39:21,330 --> 00:39:23,510 So now, looking back at the statement, 515 00:39:23,510 --> 00:39:27,530 you see for every epsilon there exists a delta, such 516 00:39:27,530 --> 00:39:38,160 that if your graph has fewer than delta n cubed triangles, 517 00:39:38,160 --> 00:39:43,050 then let me get rid of all those edges. 518 00:39:43,050 --> 00:39:47,060 I've gotten rid of fewer than epsilon n squared edges, 519 00:39:47,060 --> 00:39:50,310 and the remaining graph should be triangle free. 520 00:39:50,310 --> 00:39:52,340 Because if it were not triangle free, 521 00:39:52,340 --> 00:39:53,960 then I can find some triangle. 522 00:39:53,960 --> 00:39:57,790 And that will lead to a lot more triangles. 523 00:39:57,790 --> 00:40:00,020 So for example, if you set this as delta over 2, 524 00:40:00,020 --> 00:40:03,980 then this will give you 2 delta n cubed triangles. 525 00:40:03,980 --> 00:40:11,870 Therefore, it would contradict the hypothesis. 526 00:40:17,840 --> 00:40:22,380 And that finishes the proof of the triangle removal lemma, 527 00:40:22,380 --> 00:40:32,410 saying that thus the resulting graph is triangle free. 528 00:40:38,990 --> 00:40:41,930 So that's the proof of the triangle removal lemma. 529 00:40:41,930 --> 00:40:43,970 So let me recap. 
530 00:40:43,970 --> 00:40:48,250 We start with a graph, apply Szemerédi's regularity lemma, 531 00:40:48,250 --> 00:40:51,220 and clean up the regularity partition by getting rid of low 532 00:40:51,220 --> 00:40:54,550 density pairs, getting rid of irregular pairs, 533 00:40:54,550 --> 00:40:58,030 and getting rid of edges touching a very small vertex 534 00:40:58,030 --> 00:40:58,650 set. 535 00:40:58,650 --> 00:41:01,720 And I claim that the resulting graph, after cleaning up, 536 00:41:01,720 --> 00:41:03,680 should be triangle free. 537 00:41:03,680 --> 00:41:06,730 Because if it were not triangle free and I find some triangle, 538 00:41:06,730 --> 00:41:11,290 then I should be able to use that triple of vertex sets, 539 00:41:11,290 --> 00:41:13,360 combined with the triangle counting lemma, 540 00:41:13,360 --> 00:41:16,300 to produce a lot more triangles, and 541 00:41:16,300 --> 00:41:19,600 that would violate the hypothesis of the theorem. 542 00:41:23,177 --> 00:41:23,760 Any questions? 543 00:41:23,760 --> 00:41:24,480 Yeah. 544 00:41:24,480 --> 00:41:28,435 AUDIENCE: Where are you using that there exists a triangle? 545 00:41:28,435 --> 00:41:29,310 PROFESSOR: Ah, great. 546 00:41:29,310 --> 00:41:32,350 So question is, where am I using there exists a triangle? 547 00:41:32,350 --> 00:41:34,760 If there were no triangles, then we're done. 548 00:41:34,760 --> 00:41:37,630 So the purpose of the triangle-- the claim in the triangle 549 00:41:37,630 --> 00:41:40,450 removal lemma is that you can get rid of all triangles 550 00:41:40,450 --> 00:41:43,540 by removing at most epsilon n squared edges. 551 00:41:43,540 --> 00:41:47,270 AUDIENCE: So say we did that, and now-- 552 00:41:47,270 --> 00:41:51,198 why does this not prove that we still have triangles? 553 00:41:51,198 --> 00:41:52,990 PROFESSOR: Can you say your question again?
554 00:41:52,990 --> 00:41:56,590 AUDIENCE: So say we've removed everything by our cleaning 555 00:41:56,590 --> 00:41:59,200 step, and we've removed epsilon n squared edges, 556 00:41:59,200 --> 00:42:01,690 why does this logic not prove that we still 557 00:42:01,690 --> 00:42:04,415 have delta n cubed triangles. 558 00:42:07,317 --> 00:42:07,900 PROFESSOR: OK. 559 00:42:07,900 --> 00:42:10,390 So let me try to answer your question. 560 00:42:10,390 --> 00:42:13,900 So why does this proof show that you still 561 00:42:13,900 --> 00:42:16,450 have delta n cubed triangles? 562 00:42:16,450 --> 00:42:18,940 So I only set delta at the end. 563 00:42:18,940 --> 00:42:20,680 But of course, you can also set delta 564 00:42:20,680 --> 00:42:23,210 in the beginning of this proof. 565 00:42:23,210 --> 00:42:25,760 So I'm saying that you do the step. 566 00:42:25,760 --> 00:42:28,910 You get rid of epsilon n squared edges. 567 00:42:28,910 --> 00:42:31,030 And now I claim, after the step-- 568 00:42:31,030 --> 00:42:39,940 so I claim the remaining graph is triangle free. 569 00:42:43,430 --> 00:42:47,805 If it were not triangle free, then, well, 570 00:42:47,805 --> 00:42:49,698 it has some triangle. 571 00:42:49,698 --> 00:42:51,490 Then the triangle counting lemma would tell 572 00:42:51,490 --> 00:42:54,740 me there are lots of triangles. 573 00:42:54,740 --> 00:42:57,820 And that would contradict the hypothesis 574 00:42:57,820 --> 00:43:01,360 where we assume that this graph G has 575 00:43:01,360 --> 00:43:03,848 a small number of triangles. 576 00:43:03,848 --> 00:43:06,243 AUDIENCE: So if there is no triangle, 577 00:43:06,243 --> 00:43:11,991 then we've removed edges between vi, vj, or vi, vk, or vj, 578 00:43:11,991 --> 00:43:14,187 vk for any three i, j, k. 579 00:43:14,187 --> 00:43:15,270 PROFESSOR: That's correct. 
580 00:43:15,270 --> 00:43:19,710 So we're saying, if you do not have any triangles-- well, 581 00:43:19,710 --> 00:43:22,990 after the cleaning step, we have gotten rid 582 00:43:22,990 --> 00:43:27,152 of all the edges between the bad pairs. 583 00:43:27,152 --> 00:43:29,110 And I'm claiming that there is no configuration 584 00:43:29,110 --> 00:43:32,210 like this left. 585 00:43:32,210 --> 00:43:35,330 And this is the proof because if you have some configuration 586 00:43:35,330 --> 00:43:38,570 where you did not delete the edges between these three 587 00:43:38,570 --> 00:43:42,270 parts, then you should be able to get 588 00:43:42,270 --> 00:43:44,985 a lot more triangles from the triangle counting lemma. 589 00:43:48,560 --> 00:43:49,060 Yeah. 590 00:43:49,060 --> 00:43:50,435 AUDIENCE: What if there were lots 591 00:43:50,435 --> 00:43:53,968 of triangles inside each individual vi, vj, vk? 592 00:43:53,968 --> 00:43:55,510 PROFESSOR: You asked me, what happens 593 00:43:55,510 --> 00:43:58,583 if there were a lot of triangles inside each vi, vj, vk? 594 00:43:58,583 --> 00:43:59,250 So that is fine. 595 00:43:59,250 --> 00:44:01,230 If you find some triangle-- 596 00:44:01,230 --> 00:44:03,925 so this picture, i, j, or k, they 597 00:44:03,925 --> 00:44:05,050 do not have to be distinct. 598 00:44:07,830 --> 00:44:10,970 So the same proof works if i, j, and k, some of them 599 00:44:10,970 --> 00:44:13,440 are equal to each other. 600 00:44:13,440 --> 00:44:14,095 Yep. 601 00:44:14,095 --> 00:44:16,945 AUDIENCE: [INAUDIBLE] but, I don't really understand 602 00:44:16,945 --> 00:44:18,845 why-- isn't delta over 2 there? 603 00:44:21,420 --> 00:44:23,920 PROFESSOR: So you're asking, why did I put the delta over 2? 604 00:44:23,920 --> 00:44:26,800 Just because I put less than or equal to delta. 605 00:44:26,800 --> 00:44:28,420 If I put strictly less than delta, 606 00:44:28,420 --> 00:44:30,166 then I don't need a delta over 2. 
607 00:44:30,166 --> 00:44:34,087 AUDIENCE: [INAUDIBLE] delta over 2 or 2 delta. 608 00:44:34,087 --> 00:44:34,670 PROFESSOR: OK. 609 00:44:37,205 --> 00:44:38,080 Don't worry about it. 610 00:44:47,470 --> 00:44:47,970 Yes. 611 00:44:47,970 --> 00:44:49,830 AUDIENCE: Is there a way to generalize 612 00:44:49,830 --> 00:44:52,897 the triangle counting lemma to a general graph? 613 00:44:52,897 --> 00:44:53,480 PROFESSOR: OK. 614 00:44:53,480 --> 00:44:55,272 You're asking, is there a way to generalize 615 00:44:55,272 --> 00:44:57,240 the triangle counting lemma to a general graph? 616 00:44:57,240 --> 00:44:57,740 So yes. 617 00:44:57,740 --> 00:45:00,770 We will see that not today but I think next time. 618 00:45:04,560 --> 00:45:07,860 Any more questions? 619 00:45:07,860 --> 00:45:09,780 Great. 620 00:45:09,780 --> 00:45:14,600 So why do people care about the triangle removal lemma? 621 00:45:14,600 --> 00:45:19,880 So it's a nice, maybe somewhat unintuitive statement. 622 00:45:19,880 --> 00:45:22,170 But there was a very good reason why the statement was 623 00:45:22,170 --> 00:45:24,630 formulated, and it's because you can 624 00:45:24,630 --> 00:45:26,520 use it to prove Roth's theorem. 625 00:45:26,520 --> 00:45:27,960 So that's what I want to explain, 626 00:45:27,960 --> 00:45:30,990 how to connect this graph theoretic statement 627 00:45:30,990 --> 00:45:35,130 to a statement about three-term AP-- 628 00:45:35,130 --> 00:45:38,670 three AP-free subsets of the integers. 629 00:45:38,670 --> 00:45:41,940 This goes back to the very connection between graph theory 630 00:45:41,940 --> 00:45:44,790 and additive combinatorics that I highlighted 631 00:45:44,790 --> 00:45:47,570 in the first lecture. 
632 00:45:47,570 --> 00:45:53,900 First, let me state a corollary of the triangle removal lemma-- 633 00:45:53,900 --> 00:45:59,655 namely, that if you have an n vertex graph G, where-- 634 00:46:05,030 --> 00:46:15,280 so if G is n vertex, and every edge is in exactly one 635 00:46:15,280 --> 00:46:29,870 triangle, then the number of edges of G 636 00:46:29,870 --> 00:46:32,490 is little o of n squared. 637 00:46:36,500 --> 00:46:38,710 These are actually kind of strange graphs. 638 00:46:38,710 --> 00:46:40,630 Every edge is in exactly one triangle. 639 00:46:43,080 --> 00:46:43,580 OK. 640 00:46:47,340 --> 00:46:51,360 Well, the number of triangles in G-- 641 00:46:57,040 --> 00:46:59,560 every edge is in exactly one triangle. 642 00:46:59,560 --> 00:47:02,920 So the number of triangles in G is the number 643 00:47:02,920 --> 00:47:04,890 of edges divided by 3. 644 00:47:09,170 --> 00:47:16,050 The number of edges is at most n squared. 645 00:47:16,050 --> 00:47:20,780 So this quantity is at most quadratic order, 646 00:47:20,780 --> 00:47:24,140 which in particular is little o of n cubed. 647 00:47:27,240 --> 00:47:29,970 And thus the triangle removal lemma 648 00:47:29,970 --> 00:47:35,030 tells us that G can be made triangle 649 00:47:35,030 --> 00:47:47,550 free by removing little o of n squared edges. 650 00:47:55,130 --> 00:48:03,430 On the other hand, since every edge 651 00:48:03,430 --> 00:48:08,940 is in exactly one triangle, well, 652 00:48:08,940 --> 00:48:11,730 how many edges do you need to remove to get rid 653 00:48:11,730 --> 00:48:13,530 of all the triangles? 654 00:48:13,530 --> 00:48:17,025 Well, I need to remove at least a third of the edges. 655 00:48:17,025 --> 00:48:25,030 I need to remove at least a third of edges 656 00:48:25,030 --> 00:48:28,660 to make G triangle free.
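Written out, the two claims look as follows. This is a sketch in notation that is not from the lecture: $e(G)$ is the number of edges, $t(G)$ the number of triangles, and $r(G)$ the least number of edges whose removal makes $G$ triangle free.

```latex
\begin{align*}
\text{(1)}\quad & t(G) = \frac{e(G)}{3} \le \frac{n^2}{3} = o(n^3)
   \;\Longrightarrow\; r(G) = o(n^2)
   \quad\text{(triangle removal lemma)}, \\
\text{(2)}\quad & r(G) \ge t(G) = \frac{e(G)}{3}
   \quad\text{(each removed edge lies in, and so destroys, only one triangle)}.
\end{align*}
```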
657 00:48:34,110 --> 00:48:37,970 Putting these two claims together, 658 00:48:37,970 --> 00:48:40,240 we see that the number of edges of G 659 00:48:40,240 --> 00:48:42,630 must be little o of n squared. 660 00:48:48,410 --> 00:48:50,342 Any questions? 661 00:48:50,342 --> 00:48:53,272 AUDIENCE: Are there not more elementary ways to prove this? 662 00:48:53,272 --> 00:48:53,980 PROFESSOR: Great. 663 00:48:53,980 --> 00:48:58,170 Question is, are there not more elementary ways to prove this? 664 00:48:58,170 --> 00:49:00,140 Let me make some comments about that. 665 00:49:00,140 --> 00:49:12,000 So the short answer is, yes but not really. 666 00:49:12,000 --> 00:49:13,525 And really, the answer is no. 667 00:49:13,525 --> 00:49:16,200 [LAUGHTER] 668 00:49:16,200 --> 00:49:21,000 So you can ask, what about quantitative bounds? 669 00:49:21,000 --> 00:49:24,030 Because what is more elementary, what is less elementary 670 00:49:24,030 --> 00:49:25,320 is kind of subjective. 671 00:49:25,320 --> 00:49:27,223 But quantitative bounds, something 672 00:49:27,223 --> 00:49:28,140 that is very concrete. 673 00:49:28,140 --> 00:49:29,370 It's hard to argue. 674 00:49:29,370 --> 00:49:38,560 So if you look at the triangle removal lemma, you can ask, 675 00:49:38,560 --> 00:49:43,990 how is the dependence of delta on epsilon? 676 00:49:43,990 --> 00:49:46,940 So what does the proof give you? 677 00:49:46,940 --> 00:49:49,145 Where's the bottleneck? 678 00:49:49,145 --> 00:49:53,920 The bottleneck is always in the application 679 00:49:53,920 --> 00:49:58,450 of Szemerédi's regularity lemma-- namely in this M. 680 00:49:58,450 --> 00:50:01,060 So none of the other epsilons really matter. 681 00:50:01,060 --> 00:50:04,750 It's this M that kills you in terms of quantitative bounds. 682 00:50:04,750 --> 00:50:13,850 So in triangle removal lemma, this proof gives 1 over delta. 
683 00:50:13,850 --> 00:50:26,210 So you can take 1 over delta being a tower of twos of height 684 00:50:26,210 --> 00:50:34,180 at most polynomial in 1 over epsilon. 685 00:50:34,180 --> 00:50:36,550 So that is your different proof. 686 00:50:36,550 --> 00:50:44,100 Well, the best known bound due to Fox 687 00:50:44,100 --> 00:50:48,590 is that you can replace this height 688 00:50:48,590 --> 00:50:55,170 by a different height that is at most essentially 689 00:50:55,170 --> 00:50:59,040 logarithmic in 1 over epsilon. 690 00:50:59,040 --> 00:51:01,290 Still a tower of twos. 691 00:51:01,290 --> 00:51:03,150 So we've changed some really big number 692 00:51:03,150 --> 00:51:08,360 to another, but slightly smaller, really big number. 693 00:51:08,360 --> 00:51:10,270 So this is still an astronomical number 694 00:51:10,270 --> 00:51:13,850 for any reasonable epsilon. 695 00:51:13,850 --> 00:51:18,490 And in terms of that corollary, basically the only known proof 696 00:51:18,490 --> 00:51:21,270 goes through the triangle removal lemma. 697 00:51:21,270 --> 00:51:25,780 Currently, we do not know any other approach to this problem. 698 00:51:25,780 --> 00:51:29,530 And you'll see later on that, well, what's the best 699 00:51:29,530 --> 00:51:30,790 thing that we can hope for? 700 00:51:30,790 --> 00:51:35,200 So it is quite possible that there are other proofs that 701 00:51:35,200 --> 00:51:37,270 are yet to be found. 702 00:51:37,270 --> 00:51:39,590 So that's actually-- people believe this, 703 00:51:39,590 --> 00:51:41,710 that this is not the right proof, 704 00:51:41,710 --> 00:51:44,200 that maybe there's some other way to do this. 705 00:51:44,200 --> 00:51:47,980 And the best lower bound, which we'll see either later today 706 00:51:47,980 --> 00:52:02,000 or next time, shows that we cannot do better than 1 over 707 00:52:02,000 --> 00:52:11,080 delta being essentially just a little bit more than polynomial 708 00:52:11,080 --> 00:52:12,240 in 1 over epsilon.
709 00:52:12,240 --> 00:52:17,590 So 1 over epsilon raised to something that is 710 00:52:17,590 --> 00:52:20,300 logarithmic in 1 over epsilon. 711 00:52:23,210 --> 00:52:25,410 So you can think of this as very-- 712 00:52:25,410 --> 00:52:28,010 it's a little bit bigger than polynomial in 1 over epsilon 713 00:52:28,010 --> 00:52:31,590 but not that much bigger than polynomial in 1 over epsilon. 714 00:52:31,590 --> 00:52:33,590 So there is a very big gap in our knowledge 715 00:52:33,590 --> 00:52:38,030 on what is the right dependence between epsilon and delta 716 00:52:38,030 --> 00:52:40,640 in the triangle removal lemma. 717 00:52:40,640 --> 00:52:42,272 And that's one of the-- 718 00:52:42,272 --> 00:52:45,380 it's a major open problem in extremal combinatorics 719 00:52:45,380 --> 00:52:46,460 to close this gap. 720 00:52:50,570 --> 00:52:51,640 Other questions? 721 00:52:55,400 --> 00:52:57,000 All right. 722 00:52:57,000 --> 00:52:58,250 So let's prove Roth's theorem. 723 00:53:00,980 --> 00:53:06,710 So let me remind you that Roth's theorem, which 724 00:53:06,710 --> 00:53:09,260 we saw in the very first lecture, 725 00:53:09,260 --> 00:53:13,340 says that if you have a subset of 1 726 00:53:13,340 --> 00:53:19,620 through n that is free of three-term arithmetic 727 00:53:19,620 --> 00:53:27,320 progressions, then the size of the set must be sublinear. 728 00:53:33,060 --> 00:53:36,610 So what does this have to do with a triangle removal lemma? 729 00:53:36,610 --> 00:53:38,540 So if you remember the first lecture, 730 00:53:38,540 --> 00:53:40,540 maybe the connection shouldn't be so surprising.
731 00:53:40,540 --> 00:53:45,640 What we will do is we will set up a graph, 732 00:53:45,640 --> 00:53:48,640 starting from some arithmetic sets such 733 00:53:48,640 --> 00:53:51,700 that the graph encodes some arithmetic information-- 734 00:53:51,700 --> 00:53:56,950 in particular, the three-term APs in the set 735 00:53:56,950 --> 00:54:02,760 correspond to the triangles in the graph. 736 00:54:02,760 --> 00:54:05,550 So let's set up this graph. 737 00:54:05,550 --> 00:54:14,070 It will be helpful to view A not as a subset of the integers. 738 00:54:14,070 --> 00:54:15,750 It'll just be more convenient to view it 739 00:54:15,750 --> 00:54:17,250 as a subset of a cyclic group. 740 00:54:19,950 --> 00:54:23,530 Because I don't have to worry about edge cases so much when 741 00:54:23,530 --> 00:54:24,780 you're working in a cyclic group. 742 00:54:28,060 --> 00:54:30,800 Here I take M to be 2N plus 1. 743 00:54:30,800 --> 00:54:34,320 So having it odd makes my life a bit simpler. 744 00:54:34,320 --> 00:54:40,050 Then if A is a three AP free subset of 1 through N, 745 00:54:40,050 --> 00:54:43,680 then I claim that A now sitting inside this cyclic group 746 00:54:43,680 --> 00:54:45,650 is also three AP free. 747 00:54:49,580 --> 00:54:53,141 So it's a subset of Z mod M. 748 00:54:57,070 --> 00:55:01,080 And what we will do is that we will set up a certain graph. 749 00:55:04,620 --> 00:55:14,570 So we will set up a tripartite graph on parts X, Y, Z. 750 00:55:14,570 --> 00:55:18,130 And here, X, Y, and Z are each going to be 751 00:55:18,130 --> 00:55:23,130 M-element sets whose vertices are represented 752 00:55:23,130 --> 00:55:25,230 by elements of Z mod M. 753 00:55:28,090 --> 00:55:32,530 And I need to tell you what are the edges of this graph. 754 00:55:32,530 --> 00:55:34,250 So here are the edges. 755 00:55:34,250 --> 00:55:42,210 I'm putting an edge between vertex x in X and y in Y if and only 756 00:55:42,210 --> 00:55:51,740 if y minus x is an element of A.
757 00:55:51,740 --> 00:55:54,650 So it's a rule for how to put in the edges. 758 00:55:54,650 --> 00:55:58,550 And this is basically a Cayley graph, a bipartite variant 759 00:55:58,550 --> 00:56:01,590 of a Cayley graph. 760 00:56:01,590 --> 00:56:07,090 Likewise, I put an edge between x and z. 761 00:56:07,090 --> 00:56:10,820 So let me put x down here and y up there. 762 00:56:10,820 --> 00:56:18,750 So let me put in the edge between y and z if and only 763 00:56:18,750 --> 00:56:25,365 if z minus y is an element of A. 764 00:56:25,365 --> 00:56:31,450 And for the very last pair, it's similar but slightly different. 765 00:56:31,450 --> 00:56:37,900 I'm putting that edge if and only if z minus x divided by 2 766 00:56:37,900 --> 00:56:41,080 is an element of A. Because we're in an odd cyclic group 767 00:56:41,080 --> 00:56:42,070 I can divide by 2. 768 00:56:46,040 --> 00:56:47,100 So this is a graph. 769 00:56:47,100 --> 00:56:49,470 So starting with a set A I give you 770 00:56:49,470 --> 00:56:53,130 this rule for constructing this tripartite graph. 771 00:56:53,130 --> 00:56:55,410 And the question now is, what are 772 00:56:55,410 --> 00:56:57,150 the triangles in this graph? 773 00:57:00,090 --> 00:57:10,190 If the vertices x, y, z is a triangle, then these three 774 00:57:10,190 --> 00:57:14,720 numbers by definition, because of the edges-- 775 00:57:14,720 --> 00:57:17,750 because they're all edges in this graph, these three 776 00:57:17,750 --> 00:57:25,380 numbers, they all lie in A. 777 00:57:25,380 --> 00:57:27,760 But now notice that these three numbers, 778 00:57:27,760 --> 00:57:35,480 they form a three-term arithmetic progression 779 00:57:35,480 --> 00:57:41,350 because the middle element is the average of the two others. 780 00:57:41,350 --> 00:57:44,470 But we said that A is a set that is three AP free. 781 00:57:52,300 --> 00:57:55,100 Has no three-term arithmetic progression. 782 00:57:55,100 --> 00:57:59,000 So what must be the case? 
783 00:57:59,000 --> 00:58:02,670 So A is 3 AP free. 784 00:58:02,670 --> 00:58:06,420 But you can still have three APs using the same element 785 00:58:06,420 --> 00:58:07,010 three times. 786 00:58:09,760 --> 00:58:11,760 So all the three-term arithmetic progressions 787 00:58:11,760 --> 00:58:13,270 must be of that form. 788 00:58:13,270 --> 00:58:20,220 So these three numbers must then be equal to each other. 789 00:58:23,932 --> 00:58:27,540 And in particular, you see that if you select x and y, 790 00:58:27,540 --> 00:58:28,540 it determines z. 791 00:58:31,340 --> 00:58:35,650 This equality here is the same as saying that x, y, 792 00:58:35,650 --> 00:58:43,000 and z they themselves form a three AP in Z mod M. 793 00:58:48,280 --> 00:58:51,230 So this is precisely the description 794 00:58:51,230 --> 00:58:53,360 of all the triangles in the graph. 795 00:59:06,740 --> 00:59:09,340 So all the triangles in the graph G 796 00:59:09,340 --> 00:59:12,590 are precisely x, y, z, where x, y, and z 797 00:59:12,590 --> 00:59:15,820 form a three-term arithmetic progression. 798 00:59:15,820 --> 00:59:31,340 And in particular, every edge of G lies in exactly one triangle. 799 00:59:35,400 --> 00:59:36,570 You give me an edge-- 800 00:59:36,570 --> 00:59:37,890 for example, xy-- 801 00:59:37,890 --> 00:59:41,000 I complete it to a three AP, x, y, z. 802 00:59:41,000 --> 00:59:42,000 And that's the triangle. 803 00:59:42,000 --> 00:59:45,220 And that's the unique triangle that the edge sits in. 804 00:59:45,220 --> 00:59:48,720 And likewise, if you give me xz or yz, 805 00:59:48,720 --> 00:59:50,430 I can produce for you a unique triangle. 806 00:59:53,240 --> 00:59:55,140 So we have this graph. 807 00:59:55,140 --> 00:59:57,620 It has this property that every edge lies in exactly one 808 00:59:57,620 --> 00:59:59,690 triangle, so we can apply the corollary 809 00:59:59,690 --> 01:00:07,440 up there to deduce a bound on the total number of edges.
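The construction can be checked by brute force on a small example. The Python sketch below is illustrative only: the function names and the sample set {1, 2, 4, 5} inside 1 through 5 are assumptions made for this example and do not come from the lecture. It builds the three edge sets on X, Y, Z = Z mod M with M = 2N + 1, lists the triangles, and counts how many triangles each edge lies in.

```python
from collections import Counter
from itertools import product


def roth_graph(A, N):
    """Edge sets of the tripartite graph on X, Y, Z = Z/M, where M = 2N + 1.

    x ~ y iff y - x in A, y ~ z iff z - y in A, x ~ z iff (z - x)/2 in A,
    with all arithmetic done modulo M.
    """
    M = 2 * N + 1
    inv2 = N + 1  # inverse of 2 mod M, since 2 * (N + 1) = M + 1
    A_mod = {a % M for a in A}
    pairs = list(product(range(M), repeat=2))
    xy = {(x, y) for x, y in pairs if (y - x) % M in A_mod}
    yz = {(y, z) for y, z in pairs if (z - y) % M in A_mod}
    xz = {(x, z) for x, z in pairs if ((z - x) * inv2) % M in A_mod}
    return M, xy, yz, xz


def triangles(M, xy, yz, xz):
    """All triples (x, y, z) whose three pairs are edges."""
    return [(x, y, z) for (x, y) in xy for z in range(M)
            if (y, z) in yz and (x, z) in xz]


def triangles_per_edge(tris):
    """Count, for every edge, the number of triangles containing it."""
    counts = Counter()
    for x, y, z in tris:
        counts[("XY", x, y)] += 1
        counts[("YZ", y, z)] += 1
        counts[("XZ", x, z)] += 1
    return counts


if __name__ == "__main__":
    A, N = {1, 2, 4, 5}, 5  # a 3-AP-free subset of {1, ..., 5}
    M, xy, yz, xz = roth_graph(A, N)
    tris = triangles(M, xy, yz, xz)
    counts = triangles_per_edge(tris)
    print("edges:", len(xy) + len(yz) + len(xz), "triangles:", len(tris))
    print("max triangles on one edge:", max(counts.values()))
```

For a 3-AP-free set A, each of the three bipartite pieces has M times the size of A edges, and the triangles are exactly the triples (x, x + a, x + 2a) with a in A.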
810 01:00:07,440 --> 01:00:08,780 Well, how many edges are there? 811 01:00:13,820 --> 01:00:19,850 On one hand, we see that because it's a Cayley graph, each 812 01:00:19,850 --> 01:00:21,070 of the three parts-- 813 01:00:21,070 --> 01:00:22,730 there are three parts here. 814 01:00:22,730 --> 01:00:28,040 Each of the three parts, if I start with any vertex, 815 01:00:28,040 --> 01:00:35,240 I have size of A edges coming out of that vertex to the next part 816 01:00:35,240 --> 01:00:37,160 by the construction. 817 01:00:37,160 --> 01:00:40,280 On the other hand, by the corollary 818 01:00:40,280 --> 01:00:43,010 up there, the number of edges has 819 01:00:43,010 --> 01:00:45,910 to be little o of M squared. 820 01:00:50,830 --> 01:00:55,260 And because M is essentially twice N, 821 01:00:55,260 --> 01:01:04,950 we obtain that the size of A is little o of N. 822 01:01:04,950 --> 01:01:07,490 And that proves Roth's theorem. 823 01:01:07,490 --> 01:01:07,990 Yeah? 824 01:01:07,990 --> 01:01:09,698 AUDIENCE: Could you explain one more time 825 01:01:09,698 --> 01:01:12,257 why every edge is in exactly one triangle? 826 01:01:12,257 --> 01:01:12,840 PROFESSOR: OK. 827 01:01:12,840 --> 01:01:17,190 So the question is, why is every edge in exactly one triangle? 828 01:01:17,190 --> 01:01:18,960 So you know what all the edges are. 829 01:01:18,960 --> 01:01:23,010 So this is a description of what all the edges are. 830 01:01:23,010 --> 01:01:25,740 And what are all the triangles? 831 01:01:25,740 --> 01:01:30,240 Well, x, y, z is a triangle precisely when these three 832 01:01:30,240 --> 01:01:34,080 expressions all lie in A. But note that these three 833 01:01:34,080 --> 01:01:36,120 expressions, they form a three AP 834 01:01:36,120 --> 01:01:40,730 because the middle term is the average of the two others. 835 01:01:40,730 --> 01:01:43,640 So x, y, z is a triangle if and only 836 01:01:43,640 --> 01:01:45,050 if this equation is true.
837 01:01:47,660 --> 01:01:50,460 And this equation is true if and only if x, 838 01:01:50,460 --> 01:01:53,620 y, z form a three AP in Z mod M. 839 01:01:53,620 --> 01:01:56,560 So if you just read out this equation, I give you x and y. 840 01:01:56,560 --> 01:01:59,650 So what is z? 841 01:01:59,650 --> 01:02:07,300 So all the triangles in x, y, z are precisely given by three 842 01:02:07,300 --> 01:02:15,480 APs, where one of the differences y minus x is in A. 843 01:02:15,480 --> 01:02:15,980 OK. 844 01:02:15,980 --> 01:02:17,570 So I give you an edge. 845 01:02:17,570 --> 01:02:26,710 For example, xy, such that y minus x is in A. 846 01:02:26,710 --> 01:02:30,040 And I claim there's a unique z that completes 847 01:02:30,040 --> 01:02:31,060 this edge to a triangle. 848 01:02:35,100 --> 01:02:37,960 Well, it tells you what that z is. 849 01:02:37,960 --> 01:02:43,370 z has to be the element in Z mod M 850 01:02:43,370 --> 01:02:46,990 that completes x and y to a three AP. 851 01:02:46,990 --> 01:02:50,620 Namely, z is the solution to this equation. 852 01:02:50,620 --> 01:02:52,310 No other z can work. 853 01:02:52,310 --> 01:02:54,340 And you can check that z indeed works 854 01:02:54,340 --> 01:02:57,100 and that all the remaining pairs are edges. 855 01:03:00,060 --> 01:03:02,460 So it's something you can check. 856 01:03:05,547 --> 01:03:09,610 Any more questions? 857 01:03:09,610 --> 01:03:14,260 So starting with the set A that is three AP free, 858 01:03:14,260 --> 01:03:17,890 we set up this graph with a property 859 01:03:17,890 --> 01:03:22,140 that every edge lies in exactly one triangle. 860 01:03:22,140 --> 01:03:24,650 And the one triangle basically corresponds to the fact 861 01:03:24,650 --> 01:03:29,060 that you always have these trivial three APs repeating 862 01:03:29,060 --> 01:03:31,870 the same element three times.
863 01:03:31,870 --> 01:03:34,840 And then, by applying this corollary of the triangle 864 01:03:34,840 --> 01:03:38,200 removal lemma, we deduce that the number 865 01:03:38,200 --> 01:03:40,690 of edges in the graph must be subquadratic. 866 01:03:40,690 --> 01:03:43,960 So then the size of A must be sublinear. 867 01:03:43,960 --> 01:03:45,930 And that proves Roth's theorem. 868 01:03:50,030 --> 01:03:53,420 So we did quite a bit of work in proving this theorem-- 869 01:03:53,420 --> 01:03:57,470 Szemerédi's regularity lemma, counting lemma, removal lemma, 870 01:03:57,470 --> 01:03:59,030 and then we set up this graph. 871 01:03:59,030 --> 01:04:02,710 So it's not an easy theorem. 872 01:04:02,710 --> 01:04:04,730 Later in the course, we'll see a different proof 873 01:04:04,730 --> 01:04:08,060 of Roth's theorem that goes through Fourier analysis. 874 01:04:08,060 --> 01:04:10,190 That will look somewhat different, 875 01:04:10,190 --> 01:04:12,330 but it will have similar themes. 876 01:04:12,330 --> 01:04:14,480 So we'll also have this theme comparing 877 01:04:14,480 --> 01:04:18,770 structure and pseudorandomness, which comes up in the proof-- 878 01:04:18,770 --> 01:04:22,370 in the statement and proof of Szemerédi's graph regularity 879 01:04:22,370 --> 01:04:22,940 lemma. 880 01:04:22,940 --> 01:04:24,830 So there, it's really about understanding 881 01:04:24,830 --> 01:04:28,100 what is the structure of the graph in terms of decomposition 882 01:04:28,100 --> 01:04:30,920 into parts that look pseudorandom. 883 01:04:30,920 --> 01:04:32,006 Yeah. 884 01:04:32,006 --> 01:04:34,471 AUDIENCE: You called the graph the Cayley graph. 885 01:04:34,471 --> 01:04:35,622 Why? 886 01:04:35,622 --> 01:04:36,205 PROFESSOR: OK. 887 01:04:36,205 --> 01:04:38,700 So question is, why do I call this graph the Cayley graph? 
888 01:04:38,700 --> 01:04:41,130 So usually the Cayley graph refers to a graph 889 01:04:41,130 --> 01:04:45,120 where I give you a group, and I give you a subset of the group, 890 01:04:45,120 --> 01:04:49,830 and I connect two elements if, let's say, their difference 891 01:04:49,830 --> 01:04:52,260 lies in my subset. 892 01:04:52,260 --> 01:04:54,220 This basically has that form. 893 01:04:54,220 --> 01:04:56,910 So it's not exactly what people mean by a Cayley graph, 894 01:04:56,910 --> 01:04:58,140 but it has that spirit. 895 01:05:01,890 --> 01:05:03,128 Any more questions? 896 01:05:06,120 --> 01:05:06,940 OK. 897 01:05:06,940 --> 01:05:10,390 So earlier I talked about bounds for triangle removal lemma. 898 01:05:10,390 --> 01:05:13,630 So what about bounds for Roth's theorem? 899 01:05:13,630 --> 01:05:16,540 We do know somewhat better bounds for Roth's theorem 900 01:05:16,540 --> 01:05:19,270 compared to this proof. 901 01:05:19,270 --> 01:05:22,300 Somehow it's a nice proof, it's a nice graph-theoretic proof, 902 01:05:22,300 --> 01:05:24,530 but it doesn't give you very good bounds. 903 01:05:24,530 --> 01:05:28,730 It gives you bounds that decay very poorly as a function of n. 904 01:05:28,730 --> 01:05:31,490 Actually, what does it give you as a function of n? 905 01:05:31,490 --> 01:05:35,120 If you were to replace this little o by a function of n 906 01:05:35,120 --> 01:05:39,060 according to this proof, what would you get? 907 01:05:39,060 --> 01:05:41,690 I'm basically asking, what is the inverse 908 01:05:41,690 --> 01:05:44,480 of the function where you input some number 909 01:05:44,480 --> 01:05:47,770 and it gives you a tower of exponentials of height 910 01:05:47,770 --> 01:05:48,590 with that input? 911 01:05:54,070 --> 01:05:56,040 It's called a log star. 912 01:05:56,040 --> 01:05:57,930 So the log star-- 913 01:05:57,930 --> 01:06:04,230 so this is essentially N over the log star of N.
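The log star here is the iterated logarithm. A minimal Python sketch follows, under an assumed convention: base 2, counting how many times the logarithm is applied until the value is at most 1. The function name and the choice of base are illustrative; other conventions change the value by at most a small constant.

```python
import math


def log_star(n):
    """Iterated logarithm, base 2: how many times log2 must be applied
    to n before the result is at most 1."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)  # math.log2 also accepts very large Python ints
        count += 1
    return count


if __name__ == "__main__":
    for n in (2, 16, 65536, 2 ** 65536):
        print(log_star(n))
```

Even for the tower 2 ** 65536, a number with nearly twenty thousand decimal digits, the loop runs only five times, which is the sense in which N over log star of N is barely smaller than N.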
914 01:06:04,230 --> 01:06:06,410 So the log star basically is the number 915 01:06:06,410 --> 01:06:14,640 of times you have to take the logarithm to get you below 1. 916 01:06:14,640 --> 01:06:16,120 So that's the log star. 917 01:06:16,120 --> 01:06:20,000 And there's a saying that the log star, we 918 01:06:20,000 --> 01:06:22,000 know that it grows to infinity, but it has never 919 01:06:22,000 --> 01:06:23,350 been observed to do so. 920 01:06:23,350 --> 01:06:26,350 It's an extremely slowly growing function. 921 01:06:30,260 --> 01:06:33,164 Any more questions? 922 01:06:33,164 --> 01:06:39,260 So I want to-- 923 01:06:39,260 --> 01:06:42,470 so next time I want to show you a construction that 924 01:06:42,470 --> 01:06:47,300 gives you a-- 925 01:06:47,300 --> 01:06:49,190 so next time I will show you a construction 926 01:06:49,190 --> 01:06:56,300 that gives you a 3-AP-free subset A of 1 through N that is fairly large. 927 01:06:56,300 --> 01:06:58,670 So you might ask, OK, so you have this upper bound, 928 01:06:58,670 --> 01:07:00,250 but what should the truth be? 929 01:07:00,250 --> 01:07:03,472 And here's more or less the state of knowledge. 930 01:07:18,724 --> 01:07:26,640 So best bounds for Roth's theorem. 931 01:07:31,950 --> 01:07:41,040 Basically, the best upper bounds have the form N divided by basically 932 01:07:41,040 --> 01:07:44,820 log N raised to the power 1 plus little o of 1. 933 01:07:48,010 --> 01:07:50,810 The precise bounds are of the form N over log N, 934 01:07:50,810 --> 01:07:52,748 and then there's some extra log-log factors. 935 01:07:52,748 --> 01:07:54,040 But let's not worry about that. 936 01:07:57,000 --> 01:07:58,850 The best lower bounds-- 937 01:07:58,850 --> 01:08:00,210 so we'll see this next time.
938 01:08:00,210 --> 01:08:05,060 So there exist 3-AP-free subsets A of 1 through N such 939 01:08:05,060 --> 01:08:13,420 that the size of A is at least e to the-- 940 01:08:18,109 --> 01:08:24,710 so N times-- so first, let me say it's pretty close to-- 941 01:08:24,710 --> 01:08:29,270 the exponent is as close to 1 as you wish. 942 01:08:29,270 --> 01:08:32,649 So there exists an A such that the size of A 943 01:08:32,649 --> 01:08:34,210 is N to the 1 minus little o of 1. 944 01:08:34,210 --> 01:08:38,729 And already, this fact is an indication 945 01:08:38,729 --> 01:08:41,550 of the difficulty of the problem because if you 946 01:08:41,550 --> 01:08:44,340 could prove Roth's theorem through some fairly 947 01:08:44,340 --> 01:08:46,420 elementary techniques, like using 948 01:08:46,420 --> 01:08:49,229 Cauchy-Schwarz a bunch of times for instance, 949 01:08:49,229 --> 01:08:52,740 then experience tells us that you would probably 950 01:08:52,740 --> 01:08:58,050 expect some bound that's power saving, replacing 951 01:08:58,050 --> 01:09:00,840 the 1 by some smaller number. 952 01:09:00,840 --> 01:09:01,840 But that's not the case. 953 01:09:01,840 --> 01:09:03,382 And the fact that that's not the case 954 01:09:03,382 --> 01:09:05,080 is already an indication of the difficulty 955 01:09:05,080 --> 01:09:07,510 of this upper bound of Roth's theorem, 956 01:09:07,510 --> 01:09:09,279 even getting a little o. 957 01:09:09,279 --> 01:09:12,350 So you don't expect there to be simple proofs getting 958 01:09:12,350 --> 01:09:13,276 the little o. 959 01:09:16,020 --> 01:09:18,588 The bound that we'll see next time-- 960 01:09:18,588 --> 01:09:20,380 so we'll see a construction which gives you 961 01:09:20,380 --> 01:09:22,270 a bound that is of this form. 962 01:09:30,040 --> 01:09:33,189 So it's maybe a little bit hard to think 963 01:09:33,189 --> 01:09:36,380 about how quickly this function grows, 964 01:09:36,380 --> 01:09:38,200 but I'll let you think about it.
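The two bounds described here can be written side by side. This is a sketch rather than a transcription of the board, which the transcript does not capture: the notation $r_3(N)$ for the largest size of a 3-AP-free subset of $\{1, \dots, N\}$ and the constant $C > 0$ are assumptions, and the lower bound is written in the form given by Behrend's construction, which is presumably the construction promised for next time.

```latex
% Assumed notation: r_3(N) = max size of a 3-AP-free subset of {1, ..., N}; C > 0 a constant.
\[
  N e^{-C\sqrt{\log N}} \;\le\; r_3(N) \;\le\; \frac{N}{(\log N)^{1+o(1)}} .
\]
% Why the lower bound is N^{1-o(1)}: rewrite the exponential as a power of N.
\[
  N e^{-C\sqrt{\log N}}
  \;=\; N \cdot N^{-C/\sqrt{\log N}}
  \;=\; N^{\,1 - C/\sqrt{\log N}}
  \;=\; N^{1-o(1)} .
\]
```

The second line is the reason no power-saving bound of the form $N^{1-c}$ with a fixed $c > 0$ can hold: the exponent $1 - C/\sqrt{\log N}$ tends to 1 as $N$ grows.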
965 01:09:41,220 --> 01:09:44,250 Now, how does this-- 966 01:09:44,250 --> 01:09:46,490 so let's look at this corollary here. 967 01:09:50,740 --> 01:09:54,580 Can you see a way to construct a graph which 968 01:09:54,580 --> 01:09:59,135 has lots of edges, such that every edge lies in exactly one 969 01:09:59,135 --> 01:09:59,635 triangle? 970 01:10:13,490 --> 01:10:16,820 So we did this connection showing 971 01:10:16,820 --> 01:10:22,790 how to use this corollary to prove Roth's theorem. 972 01:10:22,790 --> 01:10:27,190 But you can run the same connection in reverse. 973 01:10:27,190 --> 01:10:41,180 So starting from this 3-AP-free set A, 974 01:10:41,180 --> 01:10:58,910 we can use that construction to build a graph 975 01:10:58,910 --> 01:11:11,150 on n vertices with essentially 976 01:11:11,150 --> 01:11:19,320 order of n times the size of A edges, such 977 01:11:19,320 --> 01:11:33,880 that every edge lies in exactly one triangle. 978 01:11:39,230 --> 01:11:42,850 So you run the same construction. 979 01:11:42,850 --> 01:11:44,770 And this is actually more or less 980 01:11:44,770 --> 01:11:49,030 the only way that we know how to construct such graphs that 981 01:11:49,030 --> 01:11:50,080 are fairly dense. 982 01:11:53,260 --> 01:11:55,480 So on one hand-- 983 01:11:55,480 --> 01:11:58,260 basically what I said earlier. 984 01:11:58,260 --> 01:12:00,080 On one hand, you have this upper bound, 985 01:12:00,080 --> 01:12:03,430 which is given by the proof using Szemerédi's regularity 986 01:12:03,430 --> 01:12:07,690 lemma that gives you a tower in the upper bound of 1 over 987 01:12:07,690 --> 01:12:09,640 delta. 988 01:12:09,640 --> 01:12:13,230 And if you use this construction here of a 3-AP-free 989 01:12:13,230 --> 01:12:17,330 set to construct the graph, you get this lower bound on delta, 990 01:12:17,330 --> 01:12:19,510 which is quasipolynomial. 991 01:12:22,370 --> 01:12:24,945 And that's more or less what we know.
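The passage from a 3-AP-free set to a graph in which every edge lies in exactly one triangle can be checked on a small example. The following is a minimal sketch, assuming the tripartite construction on three copies X, Y, Z of the integers mod M with M = 2N + 1: x is joined to y when y - x is in A, y to z when z - y is in A, and x to z when z - x is twice an element of A. The modulus M = 2N + 1 and all function names are choices made for this illustration and may differ from the details used earlier in the lecture.

```python
def is_3ap_free(A):
    """True if A contains no nontrivial three-term arithmetic progression."""
    S = set(A)
    return not any(a != b and (2 * b - a) in S for a in S for b in S)

def build_graph(A, N):
    """Tripartite graph on three copies X, Y, Z of Z/M, with M = 2N + 1 (assumed modulus).

    Edges: x~y if y - x in A, y~z if z - y in A, x~z if z - x in 2A (all mod M).
    """
    M = 2 * N + 1  # odd, so 2 is invertible mod M, and A has no wrap-around 3-APs
    xy = {(x, (x + a) % M) for x in range(M) for a in A}
    yz = {(y, (y + a) % M) for y in range(M) for a in A}
    xz = {(x, (x + 2 * a) % M) for x in range(M) for a in A}
    return M, xy, yz, xz

def triangles_per_edge(M, xy, yz, xz):
    """Map each edge that lies in at least one triangle to its triangle count."""
    counts = {}
    for (x, y) in xy:
        for z in range(M):
            if (y, z) in yz and (x, z) in xz:
                for edge in (("XY", x, y), ("YZ", y, z), ("XZ", x, z)):
                    counts[edge] = counts.get(edge, 0) + 1
    return counts

if __name__ == "__main__":
    A, N = [1, 2, 4, 5], 5  # a small 3-AP-free subset of {1, ..., 5}
    M, xy, yz, xz = build_graph(A, N)
    counts = triangles_per_edge(M, xy, yz, xz)
    total_edges = len(xy) + len(yz) + len(xz)
    print("vertices:", 3 * M, "edges:", total_edges)
    print("every edge in exactly one triangle:",
          len(counts) == total_edges and set(counts.values()) == {1})
```

In this sketch the graph has 3M vertices and 3M|A| edges, which matches the "order of n times the size of A" edge count from the lecture. Running the same check on a set that does contain a 3-AP, such as {1, 2, 3}, produces edges lying in more than one triangle.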
992 01:12:24,945 --> 01:12:27,320 And it's a major open problem to close the gap between these two bounds. 993 01:12:32,150 --> 01:12:35,890 Any more questions? 994 01:12:35,890 --> 01:12:39,420 So I want to give you a plan on what's coming up ahead. 995 01:12:39,420 --> 01:12:43,650 So today we saw one application of Szemerédi's regularity 996 01:12:43,650 --> 01:12:44,370 lemma-- 997 01:12:44,370 --> 01:12:47,220 namely, the triangle removal lemma, 998 01:12:47,220 --> 01:12:49,620 which has this application to Roth's theorem. 999 01:12:49,620 --> 01:12:52,410 So we've seen our first proof of Roth's theorem. 1000 01:12:52,410 --> 01:12:55,260 And next lecture, and the next couple of lectures, 1001 01:12:55,260 --> 01:12:59,610 I want to show you a few extensions and applications 1002 01:12:59,610 --> 01:13:01,790 of Szemerédi's regularity lemma. 1003 01:13:01,790 --> 01:13:03,810 So one of the questions today was, 1004 01:13:03,810 --> 01:13:05,700 we knew how to count the triangles, but what 1005 01:13:05,700 --> 01:13:06,615 about other graphs? 1006 01:13:06,615 --> 01:13:08,740 And as you can imagine, if you can count triangles, 1007 01:13:08,740 --> 01:13:11,430 then the other graphs should also 1008 01:13:11,430 --> 01:13:13,210 be doable using the same ideas. 1009 01:13:13,210 --> 01:13:14,460 And we'll do that. 1010 01:13:14,460 --> 01:13:17,640 So we'll see how to count other graphs. 1011 01:13:17,640 --> 01:13:19,290 And we'll give you a-- 1012 01:13:19,290 --> 01:13:24,090 well, I'll give you a proof of the Erdős-Stone-Simonovits 1013 01:13:24,090 --> 01:13:28,700 theorem that we did not prove in the first part of this course. 1014 01:13:28,700 --> 01:13:33,450 So it gives you an upper bound on the extremal number 1015 01:13:33,450 --> 01:13:39,990 of a graph H that depends only on the chromatic number of H. 1016 01:13:39,990 --> 01:13:41,480 So we'll do that.
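For reference, the Erdős-Stone-Simonovits theorem mentioned here is usually stated as follows. The notation is an assumption of this note: $\mathrm{ex}(n, H)$ denotes the extremal number, the maximum number of edges in an $n$-vertex graph with no copy of $H$, and $\chi(H)$ denotes the chromatic number of $H$.

```latex
% Assumed notation: ex(n, H) = extremal number of H, chi(H) = chromatic number of H.
\[
  \mathrm{ex}(n, H) \;=\; \left( 1 - \frac{1}{\chi(H) - 1} + o(1) \right) \frac{n^2}{2} .
\]
```

When $H$ is a triangle, $\chi(H) = 3$ and the main term is $n^2/4$. When $H$ is bipartite, $\chi(H) = 2$ and the statement only says $\mathrm{ex}(n, H) = o(n^2)$.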
1017 01:13:41,480 --> 01:13:44,090 And then I'll also mention, although not prove, 1018 01:13:44,090 --> 01:13:46,640 some extensions of the regularity 1019 01:13:46,640 --> 01:13:52,430 lemma to other settings, such as to hypergraphs. 1020 01:13:52,430 --> 01:13:54,680 And what that's useful for is that it 1021 01:13:54,680 --> 01:13:58,580 will allow us to deduce generalizations 1022 01:13:58,580 --> 01:14:04,830 of Roth's theorem to longer arithmetic progressions. 1023 01:14:04,830 --> 01:14:07,280 Proving Szemerédi's theorem. 1024 01:14:07,280 --> 01:14:12,060 So one way to deduce Szemerédi's theorem is to use a hypergraph 1025 01:14:12,060 --> 01:14:12,985 removal lemma-- 1026 01:14:12,985 --> 01:14:16,080 the hypergraph extension of the graph removal lemma, 1027 01:14:16,080 --> 01:14:19,100 the triangle removal lemma that we saw today. 1028 01:14:19,100 --> 01:14:22,830 It would also let us derive higher dimensional 1029 01:14:22,830 --> 01:14:27,790 generalizations of these theorems. 1030 01:14:27,790 --> 01:14:29,188 So it's a very powerful tool. 1031 01:14:29,188 --> 01:14:30,980 And actually, the hypergraph removal lemma, 1032 01:14:30,980 --> 01:14:32,920 as mentioned in the very first lecture, 1033 01:14:32,920 --> 01:14:37,340 it's a very difficult extension of the graph removal lemma. 1034 01:14:37,340 --> 01:14:39,630 And the hypergraph regularity lemma, 1035 01:14:39,630 --> 01:14:42,330 which can be used to prove the hypergraph removal lemma, 1036 01:14:42,330 --> 01:14:45,090 is a difficult extension of the graph regularity lemma. 1037 01:14:48,520 --> 01:14:51,180 So we'll see that in the next few lectures.