1 00:00:17,367 --> 00:00:18,950 YUFEI ZHAO: For the past few lectures, 2 00:00:18,950 --> 00:00:21,310 we've been discussing the structure of set addition, 3 00:00:21,310 --> 00:00:25,260 and which culminated in the proof of Freiman's theorem. 4 00:00:25,260 --> 00:00:28,270 So this was a pretty big and central result 5 00:00:28,270 --> 00:00:30,640 in additive combinatorics, which gives you 6 00:00:30,640 --> 00:00:35,500 a complete characterization of sets with small doubling. 7 00:00:35,500 --> 00:00:38,650 Today, I want to look at a somewhat different issue also 8 00:00:38,650 --> 00:00:40,780 related to sets of small doubling, 9 00:00:40,780 --> 00:00:44,860 but this time we want to have a somewhat different 10 00:00:44,860 --> 00:00:48,760 characterization of what does it mean for a set to have 11 00:00:48,760 --> 00:00:51,652 lots of additive structure. 12 00:00:51,652 --> 00:00:53,110 So in today's lecture, we're always 13 00:00:53,110 --> 00:00:55,968 going to be working in an Abelian group. 14 00:01:00,450 --> 00:01:02,990 Let me define the following quantity. 15 00:01:02,990 --> 00:01:09,450 Given sets A and B, we define the additive energy 16 00:01:09,450 --> 00:01:16,450 between A and B to be denoted by E of A and B. 17 00:01:16,450 --> 00:01:17,820 So A and B are subgroups. 18 00:01:17,820 --> 00:01:21,840 They're subsets of this arbitrary Abelian group. 19 00:01:21,840 --> 00:01:26,250 So E of A and B is defined to be the number of quadruples, a1, 20 00:01:26,250 --> 00:01:32,820 a2, b1, b2, where a1, a2 are elements of A, and b1, 21 00:01:32,820 --> 00:01:41,310 b2 are elements of B, such that a1 plus b1 22 00:01:41,310 --> 00:01:43,680 equals to a2 plus b2. 23 00:01:48,650 --> 00:01:52,610 So the additive energy is the number 24 00:01:52,610 --> 00:01:56,510 of quadruples of these elements where you 25 00:01:56,510 --> 00:01:59,510 have this additive relation. 
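[Editor's illustration] The definition above can be checked by brute force on small sets of integers. This is only a sketch: the function name `additive_energy` and the example sets are assumptions chosen for illustration, not something from the lecture.

```python
from itertools import product


def additive_energy(A, B):
    """Count quadruples (a1, a2, b1, b2) in A x A x B x B with a1 + b1 == a2 + b2.

    Brute force over all quadruples, so only suitable for small sets.
    """
    A, B = set(A), set(B)
    return sum(
        1
        for a1, a2, b1, b2 in product(A, A, B, B)
        if a1 + b1 == a2 + b2
    )


if __name__ == "__main__":
    interval = range(5)             # highly structured: many coincidences among sums
    spread_out = [1, 2, 4, 8, 16]   # powers of two: very few coincidences among sums
    print(additive_energy(interval, interval))
    print(additive_energy(spread_out, spread_out))
```

Comparing the two printed values for sets of the same size gives a feel for how the energy separates structured sets from unstructured ones.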
26 00:01:59,510 --> 00:02:02,000 And we would like to understand sets 27 00:02:02,000 --> 00:02:04,790 with large additive energy. 28 00:02:04,790 --> 00:02:07,460 So, intuitively, if you have lots of solutions 29 00:02:07,460 --> 00:02:09,680 to this equation in your sets, then the 30 00:02:09,680 --> 00:02:14,590 sets themselves should have lots of internal additive structure. 31 00:02:14,590 --> 00:02:17,900 So it's a different way of describing additive structure, 32 00:02:17,900 --> 00:02:19,400 and we'd like to understand how 33 00:02:19,400 --> 00:02:21,800 this way of describing additive structure 34 00:02:21,800 --> 00:02:26,510 relates to things we've seen before, namely small doubling. 35 00:02:30,480 --> 00:02:33,840 When you have not two sets but just one set-- 36 00:02:33,840 --> 00:02:37,170 slightly easier to think about-- 37 00:02:37,170 --> 00:02:42,690 we just write E of A, meaning E of A comma A. 38 00:02:42,690 --> 00:02:55,150 And these objects are analogous to 4 cycles in graph theory. 39 00:02:55,150 --> 00:02:59,020 Because if you think about this expression here in a Cayley 40 00:02:59,020 --> 00:03:02,650 graph, let's say over F2, then this 41 00:03:02,650 --> 00:03:05,110 is the description of a 4 cycle. 42 00:03:05,110 --> 00:03:07,240 You go around 4 steps, and you come 43 00:03:07,240 --> 00:03:09,460 back to where you started from. 44 00:03:09,460 --> 00:03:13,360 So these objects are the analogs of 4 cycles. 45 00:03:13,360 --> 00:03:16,720 And we already saw in our discussion of quasi-randomness, 46 00:03:16,720 --> 00:03:18,970 and also elsewhere, that 4 cycles 47 00:03:18,970 --> 00:03:22,030 play an important role in graph theory. 48 00:03:22,030 --> 00:03:24,160 And, likewise, these additive energies 49 00:03:24,160 --> 00:03:27,190 are going to play an important role in describing sets 50 00:03:27,190 --> 00:03:28,847 with additive structure. 51 00:03:33,320 --> 00:03:35,570 Consider the following quantity.
52 00:03:35,570 --> 00:03:42,530 We're going to let r sub A comma B of x be the number of ways 53 00:03:42,530 --> 00:03:45,410 to write x as a plus b. 54 00:03:49,710 --> 00:03:53,330 So x equals a plus b. 55 00:03:56,080 --> 00:03:59,750 So r sub A comma B of x is the number of ways 56 00:03:59,750 --> 00:04:04,040 I can write x as a plus b, where little a comes from big A, little b 57 00:04:04,040 --> 00:04:09,110 comes from big B. Then, reinterpreting the formula 58 00:04:09,110 --> 00:04:12,590 up there, we see that the additive energy between two 59 00:04:12,590 --> 00:04:18,758 sets A and B is simply the sum of the squares of r sub A 60 00:04:18,758 --> 00:04:26,630 comma B of x, as x ranges over all elements of the group, 61 00:04:26,630 --> 00:04:34,580 though we only need to take x in the sumset A plus B. 62 00:04:34,580 --> 00:04:38,780 So the basic question. When we discussed 63 00:04:38,780 --> 00:04:41,480 additive combinatorics, in the sense of sets 64 00:04:41,480 --> 00:04:44,750 of small doubling, we asked, 65 00:04:44,750 --> 00:04:49,910 if you have a set A of a certain size, how big can A plus A be? 66 00:04:49,910 --> 00:04:51,350 Here, let's ask the same. 67 00:04:51,350 --> 00:04:55,490 If I give you a set A of a certain size, how big or how small 68 00:04:55,490 --> 00:05:00,070 can the additive energy of the set be? 69 00:05:00,070 --> 00:05:02,440 What's the largest possible number 70 00:05:02,440 --> 00:05:03,520 of additive quadruples? 71 00:05:03,520 --> 00:05:07,760 What's the least possible number of additive quadruples? 72 00:05:07,760 --> 00:05:09,950 There are some trivial bounds, just 73 00:05:09,950 --> 00:05:12,550 like in the case of sumsets. 74 00:05:17,950 --> 00:05:19,290 So what are some trivial bounds?
75 00:05:23,660 --> 00:05:29,330 On one hand, by taking a1 equal to a2, and b1 equal to b2, 76 00:05:29,330 --> 00:05:33,620 we see that the energy is always at least the square 77 00:05:33,620 --> 00:05:37,970 of the size of A. On the other hand, 78 00:05:37,970 --> 00:05:40,300 if I fix three of the four elements, 79 00:05:40,300 --> 00:05:42,750 then the fourth element is determined. 80 00:05:42,750 --> 00:05:49,030 So the upper bound is the cube of the size of A. 81 00:05:49,030 --> 00:05:51,430 And you can convince yourself that, up 82 00:05:51,430 --> 00:05:54,340 to maybe constant factors, these 83 00:05:54,340 --> 00:05:58,300 are the best possible general upper and lower bounds. 84 00:05:58,300 --> 00:06:01,510 It's a similar situation with sumsets, where you have lower bound 85 00:06:01,510 --> 00:06:04,550 linear, upper bound quadratic. 86 00:06:04,550 --> 00:06:07,948 Which is the side with additive structure? 87 00:06:07,948 --> 00:06:10,250 So if you have lots of additive structure, 88 00:06:10,250 --> 00:06:12,650 you have high energy. 89 00:06:12,650 --> 00:06:16,930 So this range is when you have lots of additive structure. 90 00:06:16,930 --> 00:06:19,360 And we would like to understand, what can you 91 00:06:19,360 --> 00:06:23,290 say about a set with high additive energy? 92 00:06:27,460 --> 00:06:32,030 Well, what are some examples of sets with high additive energy? 93 00:06:32,030 --> 00:06:34,450 It turns out that if you have a set that 94 00:06:34,450 --> 00:06:39,640 has small doubling, then, automatically, 95 00:06:39,640 --> 00:06:42,010 it implies large additive energy. 96 00:06:49,030 --> 00:06:54,740 So, in particular, intervals, or GAPs, or a large subset 97 00:06:54,740 --> 00:06:57,480 of GAPs, or all these examples that we saw-- in fact, 98 00:06:57,480 --> 00:07:00,590 these are all the examples coming from Freiman's theorem. 99 00:07:00,590 --> 00:07:01,790 Also, in arbitrary groups, 100 00:07:01,790 --> 00:07:02,840 you can have subgroups.
101 00:07:02,840 --> 00:07:05,630 And so all of these examples have large additive energy. 102 00:07:08,490 --> 00:07:10,490 So let me-- I'll show you the proof in just a second. 103 00:07:10,490 --> 00:07:11,740 It's not hard. 104 00:07:11,740 --> 00:07:14,330 But the real question is, what about the converse? 105 00:07:14,330 --> 00:07:17,500 So can you say much in the reverse direction? 106 00:07:17,500 --> 00:07:20,670 But, first, let me show you this claim that small doubling 107 00:07:20,670 --> 00:07:23,690 implies large additive energy. 108 00:07:23,690 --> 00:07:28,310 Well, if you have small doubling, if A plus A has size 109 00:07:28,310 --> 00:07:33,940 at most k times the size of A, then 110 00:07:33,940 --> 00:07:37,030 it turns out the additive energy of A 111 00:07:37,030 --> 00:07:42,080 is nearly the maximum possible: 112 00:07:42,080 --> 00:07:46,370 it is at least the size of A cubed divided by k. 113 00:07:46,370 --> 00:07:49,480 So that's within a constant factor of the maximum. 114 00:07:49,480 --> 00:07:50,450 It's pretty large. 115 00:07:50,450 --> 00:07:54,900 If you have small doubling, then large additive energy. 116 00:07:54,900 --> 00:07:57,580 So let's see the proof. 117 00:07:57,580 --> 00:07:59,440 So you can often tell how hard a proof 118 00:07:59,440 --> 00:08:02,320 is by how simple the statement is, although that's not always 119 00:08:02,320 --> 00:08:05,080 the case, as we've seen with some of our theorems, 120 00:08:05,080 --> 00:08:07,660 like Plunnecke's inequality. 121 00:08:07,660 --> 00:08:09,990 But in this case, it turns out to be fairly simple. 122 00:08:09,990 --> 00:08:20,620 So we see that r sub A comma A is supported on A plus A. 123 00:08:20,620 --> 00:08:26,250 So we use Cauchy-Schwarz to write-- 124 00:08:26,250 --> 00:08:33,419 so, first, we write additive energy in terms of the sum 125 00:08:33,419 --> 00:08:35,460 of the squares of these r's.
126 00:08:38,260 --> 00:08:46,670 And now, by Cauchy-Schwarz, we find that you can lower bound 127 00:08:46,670 --> 00:08:50,510 the sum of the squared r's by the square of the sum of the r's. 128 00:08:50,510 --> 00:08:55,370 But now the key point here is that we take out 129 00:08:55,370 --> 00:08:59,090 this factor coming from Cauchy-Schwarz, which is only 130 00:08:59,090 --> 00:09:03,560 the size of A plus A. So if the support size is small, we gain in this step. 131 00:09:06,310 --> 00:09:11,110 But what is the sum of r's? 132 00:09:11,110 --> 00:09:13,210 I mean, r of x is just the number of ways 133 00:09:13,210 --> 00:09:16,320 to write x as little a1 plus 134 00:09:16,320 --> 00:09:17,800 little a2. 135 00:09:17,800 --> 00:09:23,500 So if I sum over all x, 136 00:09:23,500 --> 00:09:28,810 we're just counting ways of picking an ordered pair from A. 137 00:09:28,810 --> 00:09:34,210 So this last expression is equal to the size of A 138 00:09:34,210 --> 00:09:39,980 to the power 4 divided by the size of A plus A. And now we 139 00:09:39,980 --> 00:09:43,430 use that A has small doubling to conclude 140 00:09:43,430 --> 00:09:47,700 that the final quantity is at least the size of A cubed divided by k. 141 00:09:53,820 --> 00:09:58,512 So we see small doubling implies large additive energy. 142 00:09:58,512 --> 00:09:59,720 And this kind of makes sense. 143 00:09:59,720 --> 00:10:03,830 If your set doesn't expand, then there 144 00:10:03,830 --> 00:10:08,180 are many collisions of sums. 145 00:10:08,180 --> 00:10:11,110 And so you must have lots of solutions to that equation 146 00:10:11,110 --> 00:10:13,220 up there. 147 00:10:13,220 --> 00:10:15,230 But what about the converse? 148 00:10:15,230 --> 00:10:18,440 If I give you a set with large additive energy, 149 00:10:18,440 --> 00:10:21,170 must it necessarily have small doubling? 150 00:10:24,922 --> 00:10:27,015 Oh. 151 00:10:27,015 --> 00:10:28,140 Let me show you an example.
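[Editor's illustration] The Cauchy-Schwarz argument just given can be sanity-checked numerically: compute the representation counts r(x), take the energy as the sum of their squares, and compare it with the size of A to the fourth power divided by the size of A plus A. This is a minimal sketch on integer sets; the function names are assumptions made for this example.

```python
from collections import Counter


def representation_counts(A, B):
    """Map each x in A + B to r(x), the number of pairs (a, b) with a + b == x."""
    return Counter(a + b for a in set(A) for b in set(B))


def energy_from_counts(A, B):
    """Additive energy written as the sum over x of r(x) squared."""
    return sum(c * c for c in representation_counts(A, B).values())


def cauchy_schwarz_lower_bound(A):
    """The bound |A|^4 / |A + A| obtained from Cauchy-Schwarz on the support A + A."""
    A = set(A)
    support_size = len(representation_counts(A, A))  # this is |A + A|
    return len(A) ** 4 / support_size


if __name__ == "__main__":
    A = range(10)  # an interval, which has small doubling
    print(energy_from_counts(A, A), cauchy_schwarz_lower_bound(A))
```

For an interval the two printed numbers are of the same order, which matches the claim that small doubling forces the energy to be within a constant factor of the maximum.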
152 00:10:30,760 --> 00:10:38,320 So, well-- so a large additive energy, 153 00:10:38,320 --> 00:10:45,070 does it imply small doubling? 154 00:10:47,680 --> 00:10:50,730 So consider the following example, where 155 00:10:50,730 --> 00:10:53,610 you take a set A which is a combination, 156 00:10:53,610 --> 00:10:56,970 is a union of a set with small doubling 157 00:10:56,970 --> 00:11:03,874 plus a bunch of elements without additive structure. 158 00:11:10,170 --> 00:11:12,230 So I take a set with small doubling 159 00:11:12,230 --> 00:11:16,940 plus a bunch of elements without additive structure. 160 00:11:16,940 --> 00:11:21,190 Then it has large additive energy, just coming 161 00:11:21,190 --> 00:11:25,120 from this interval itself. 162 00:11:25,120 --> 00:11:31,990 So the energy of A is order N cubed. 163 00:11:31,990 --> 00:11:34,630 N is the number of elements. 164 00:11:34,630 --> 00:11:38,320 What about A plus A? 165 00:11:38,320 --> 00:11:41,650 Well, for A plus A, this part doesn't-- 166 00:11:41,650 --> 00:11:43,120 that's the part that contributes, 167 00:11:43,120 --> 00:11:48,270 or the part of this A without additive structure. 168 00:11:48,270 --> 00:11:52,250 And we see that the size of A plus A 169 00:11:52,250 --> 00:11:56,830 is quadratic in the size of A. 170 00:11:56,830 --> 00:12:00,470 So, unfortunately, the converse fails. 171 00:12:00,470 --> 00:12:05,480 So you can have sets that have large additive energy and also 172 00:12:05,480 --> 00:12:07,590 large doubling. 173 00:12:07,590 --> 00:12:11,100 But, you see, the reason why this has large additive energy 174 00:12:11,100 --> 00:12:14,110 is because there is a very highly structured additively 175 00:12:14,110 --> 00:12:15,990 structured piece of it. 176 00:12:15,990 --> 00:12:20,290 And, somehow, we want to forget about this extra garbage. 177 00:12:20,290 --> 00:12:24,790 And that's part of the reason why the converse is not true. 
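[Editor's illustration] The counterexample can be tried out concretely. In this sketch the structured piece is the interval 0 to N minus 1, and the "garbage" is a set of N large powers of two; that particular choice of unstructured elements is an assumption made here for illustration, since the lecture does not fix one.

```python
from collections import Counter


def energy(A):
    """Additive energy E(A), computed as the sum of squared representation counts."""
    r = Counter(a + b for a in A for b in A)
    return sum(c * c for c in r.values())


def sumset(A):
    """The sumset A + A."""
    return {a + b for a in A for b in A}


def interval_plus_garbage(N):
    """Return (interval, garbage, union) with N elements in each of the two pieces."""
    interval = set(range(N))
    # Distinct powers of two have pairwise distinct sums, so this piece has
    # no additive structure. (Hypothetical choice made for this example.)
    garbage = {2 ** (i + N) for i in range(N)}
    return interval, garbage, interval | garbage


if __name__ == "__main__":
    interval, garbage, A = interval_plus_garbage(20)
    print("size of A:", len(A))
    print("energy of A:", energy(A), "versus |A|^3 =", len(A) ** 3)
    print("size of A + A:", len(sumset(A)), "versus |A|^2 =", len(A) ** 2)
```

The energy is already of cubic order because of the interval alone, while the sumset has quadratically many elements because of the unstructured piece.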
178 00:12:24,790 --> 00:12:26,820 So we would like a statement that 179 00:12:26,820 --> 00:12:30,150 says that if you have large additive energy, then 180 00:12:30,150 --> 00:12:33,750 it must come from some highly structured piece that 181 00:12:33,750 --> 00:12:36,380 has small doubling. 182 00:12:36,380 --> 00:12:38,260 And that is true, and that's the content 183 00:12:38,260 --> 00:12:40,540 of the Balog-Szemeredi-Gowers theorem, which 184 00:12:40,540 --> 00:12:43,660 is the main topic today. 185 00:12:43,660 --> 00:12:51,890 So the Balog-Szemeredi-Gowers theorem says that if you have 186 00:12:51,890 --> 00:12:52,760 a set-- 187 00:12:52,760 --> 00:12:56,060 so we're working always in some arbitrary Abelian group. 188 00:12:56,060 --> 00:13:00,650 If you have a set with large energy, 189 00:13:00,650 --> 00:13:07,280 then there exists some subset A prime of A such 190 00:13:07,280 --> 00:13:13,460 that A prime is a fairly large proportion of A. 191 00:13:13,460 --> 00:13:18,260 And here, by large I mean up to polynomial changes in the error 192 00:13:18,260 --> 00:13:18,920 parameters. 193 00:13:22,010 --> 00:13:28,310 So this A prime is such that A prime has small doubling. 194 00:13:34,340 --> 00:13:36,760 If you have large additive energy, 195 00:13:36,760 --> 00:13:40,960 then I can pick out a large piece with small doubling 196 00:13:40,960 --> 00:13:44,200 constant, and I only lose a polynomial 197 00:13:44,200 --> 00:13:45,775 in the error factors. 198 00:13:48,200 --> 00:13:50,075 So that's the Balog-Szemeredi-Gowers theorem, 199 00:13:50,075 --> 00:13:56,420 and it describes this example up here. 200 00:13:56,420 --> 00:13:58,058 Any questions about the statement? 201 00:14:01,200 --> 00:14:04,970 So what I will actually show you is a slight variant, actually 202 00:14:04,970 --> 00:14:08,782 a more general statement, where, instead of having one set, 203 00:14:08,782 --> 00:14:09,990 we're going to have two sets.
204 00:14:12,720 --> 00:14:16,880 So here's Balog-Szemeredi-Gowers theorem version 205 00:14:16,880 --> 00:14:25,460 2, where now we have two sets. 206 00:14:25,460 --> 00:14:27,050 Again, A and B are-- 207 00:14:27,050 --> 00:14:28,640 I'm not going to write any-- 208 00:14:28,640 --> 00:14:29,690 I'm not going to write it in this lecture, 209 00:14:29,690 --> 00:14:32,210 but A and B are always subsets of some arbitrary Abelian 210 00:14:32,210 --> 00:14:32,710 group. 211 00:14:32,710 --> 00:14:34,790 So A and B both have size of, at most, 212 00:14:34,790 --> 00:14:40,325 n, and the energy between A and B is large. 213 00:14:44,670 --> 00:14:53,580 Then there exists a subset A prime of A, B prime of B such 214 00:14:53,580 --> 00:14:59,820 that both A prime and B prime are 215 00:14:59,820 --> 00:15:05,250 large fractions of their parent set, 216 00:15:05,250 --> 00:15:14,070 and such that A prime plus B prime is not 217 00:15:14,070 --> 00:15:15,790 too much bigger than n. 218 00:15:21,170 --> 00:15:24,150 It's not so obvious why the second version 219 00:15:24,150 --> 00:15:26,020 implies the first version. 220 00:15:26,020 --> 00:15:29,030 So you can say, well, take A and B to be the same. 221 00:15:29,030 --> 00:15:31,580 But then the conclusion gives you 222 00:15:31,580 --> 00:15:36,200 possibly two different subsets, A prime and B prime. 223 00:15:36,200 --> 00:15:39,980 But the first version, I only want one subset 224 00:15:39,980 --> 00:15:43,320 that has small doubling. 225 00:15:43,320 --> 00:15:45,270 So, fortunately, the second version 226 00:15:45,270 --> 00:15:47,782 does imply the first version. 227 00:15:47,782 --> 00:15:48,490 So let's see why. 
228 00:15:54,020 --> 00:15:58,610 The second version implies the first version because, if we-- 229 00:16:03,090 --> 00:16:06,350 so there's a tool that we introduced 230 00:16:06,350 --> 00:16:08,630 early on when we discussed Freiman's theorem, 231 00:16:08,630 --> 00:16:14,390 and this is the Ruzsa triangle inequality. 232 00:16:14,390 --> 00:16:16,410 So the spirit of the Ruzsa triangle inequality 233 00:16:16,410 --> 00:16:19,680 is it allows you to relate, to sort of go 234 00:16:19,680 --> 00:16:23,010 back and forth between different sumsets of different sets. 235 00:16:23,010 --> 00:16:31,250 So by the Ruzsa triangle inequality, if we apply the second version 236 00:16:31,250 --> 00:16:34,535 with A equal to B, then-- 237 00:16:37,750 --> 00:16:40,290 and we pick out this A prime and B prime, 238 00:16:40,290 --> 00:16:43,050 then we see that the size of A prime plus A prime 239 00:16:43,050 --> 00:16:54,370 is at most the size of A prime plus B prime, squared, over the size of B prime. 240 00:16:54,370 --> 00:16:56,290 Well, actually, this uses 241 00:16:56,290 --> 00:16:58,420 a slightly stronger version 242 00:16:58,420 --> 00:17:01,910 that we had to use the Plunnecke-Ruzsa key lemma 243 00:17:01,910 --> 00:17:02,650 to prove. 244 00:17:02,650 --> 00:17:04,089 But you can come up-- 245 00:17:04,089 --> 00:17:06,730 I mean, if you don't care about the precise loss 246 00:17:06,730 --> 00:17:09,040 in the polynomial factors, you can also 247 00:17:09,040 --> 00:17:10,780 use the basic Ruzsa triangle inequality 248 00:17:10,780 --> 00:17:13,270 to deduce a similar statement. 249 00:17:13,270 --> 00:17:14,920 This is easier to deduce. 250 00:17:14,920 --> 00:17:16,300 So you have that. 251 00:17:16,300 --> 00:17:19,990 And now, the second version tells you 252 00:17:19,990 --> 00:17:24,150 that the numerator is, at most, poly kn, 253 00:17:24,150 --> 00:17:30,340 and the denominator is at least 254 00:17:30,340 --> 00:17:33,170 n divided by poly k.
255 00:17:33,170 --> 00:17:40,440 Remember, over here, to get this hypothesis, we automatically 256 00:17:40,440 --> 00:17:45,830 have that the sizes of A and B are not 257 00:17:45,830 --> 00:17:47,580 too much smaller than n. 258 00:17:47,580 --> 00:17:50,700 Or else this cannot be true. 259 00:17:50,700 --> 00:17:58,200 So putting all these estimates together, we get that. 260 00:17:58,200 --> 00:18:02,705 So, about these two versions: 261 00:18:02,705 --> 00:18:04,080 the second version implies the first. 262 00:18:04,080 --> 00:18:06,240 The second one is stronger. 263 00:18:06,240 --> 00:18:08,292 The first one is slightly more useful. 264 00:18:08,292 --> 00:18:09,750 They're not necessarily equivalent, 265 00:18:09,750 --> 00:18:13,190 but the second one is stronger. 266 00:18:13,190 --> 00:18:16,370 Any questions? 267 00:18:16,370 --> 00:18:17,847 All right. 268 00:18:17,847 --> 00:18:19,680 So this is the Balog-Szemeredi-Gowers theorem. 269 00:18:19,680 --> 00:18:21,570 So the content of today's lecture 270 00:18:21,570 --> 00:18:24,930 is to show you how to prove this theorem. 271 00:18:24,930 --> 00:18:26,940 A remark about the naming of this theorem. 272 00:18:26,940 --> 00:18:29,100 So you might notice that these three letters do not 273 00:18:29,100 --> 00:18:31,738 come in alphabetical order. 274 00:18:31,738 --> 00:18:33,780 And the reason is that this theorem was initially 275 00:18:33,780 --> 00:18:37,380 proved by Balog and Szemeredi, but using 276 00:18:37,380 --> 00:18:40,890 a more involved method that didn't 277 00:18:40,890 --> 00:18:43,470 give polynomial-type bounds.
278 00:18:43,470 --> 00:18:47,190 And Gowers, in his proof of Szemeredi's theorem, 279 00:18:47,190 --> 00:18:49,530 his new proof of Szemeredi's theorem with good bounds, 280 00:18:49,530 --> 00:18:51,030 he required-- 281 00:18:51,030 --> 00:18:52,470 well, he looked into this theorem 282 00:18:52,470 --> 00:18:54,930 and gave a new proof that resulted 283 00:18:54,930 --> 00:18:56,990 in this polynomial type bounds. 284 00:18:56,990 --> 00:18:59,700 And it is that idea that we're going to see today. 285 00:19:09,567 --> 00:19:11,650 So this course is called graph theory and additive 286 00:19:11,650 --> 00:19:12,850 combinatorics. 287 00:19:12,850 --> 00:19:15,490 And the last two topics of this course-- 288 00:19:15,490 --> 00:19:17,680 today being Balog-Szemeredi-Gowers, 289 00:19:17,680 --> 00:19:20,740 and tomorrow we're going to see sum-product problem-- 290 00:19:20,740 --> 00:19:23,500 are both great examples of problems 291 00:19:23,500 --> 00:19:26,740 in additive combinatorics where tools from graph theory 292 00:19:26,740 --> 00:19:29,960 play an important role in their solutions. 293 00:19:29,960 --> 00:19:33,910 So it's a nice combination of the subject where we see 294 00:19:33,910 --> 00:19:36,350 both topics at the same time. 295 00:19:36,350 --> 00:19:39,190 So I want to show you the proof of Balog-Szemeredi-Gowers, 296 00:19:39,190 --> 00:19:41,890 and the proof goes via a graph analog. 297 00:19:41,890 --> 00:19:44,860 So I'm going to state for you a graphical version 298 00:19:44,860 --> 00:19:48,960 of the Balog-Szemeredi-Gowers theorem. 299 00:19:48,960 --> 00:19:50,520 And it goes like this. 
300 00:19:50,520 --> 00:20:02,860 If G is a bipartite graph between vertex sets A and B-- 301 00:20:02,860 --> 00:20:06,140 and here A and B are still subsets of the Abelian group-- 302 00:20:17,740 --> 00:20:24,910 we define this restricted sumset, A plus sub G of B, 303 00:20:24,910 --> 00:20:33,820 to be the set of sums where I'm only 304 00:20:33,820 --> 00:20:40,256 taking sums across edges in G. 305 00:20:44,540 --> 00:20:47,500 So, in particular, if G is the complete bipartite graph, 306 00:20:47,500 --> 00:20:50,100 then this is the usual sumset. 307 00:20:50,100 --> 00:20:54,390 But now I may allow G to be a subset 308 00:20:54,390 --> 00:20:56,220 of the complete bipartite graph. 309 00:20:56,220 --> 00:20:59,430 So we're only taking 310 00:20:59,430 --> 00:21:01,920 some of these sums but not all of them. 311 00:21:04,600 --> 00:21:12,940 The graphical version of Balog-Szemeredi-Gowers 312 00:21:12,940 --> 00:21:23,290 says that if A and B are subsets of an Abelian group, 313 00:21:23,290 --> 00:21:27,610 both having size, at most, n, and G 314 00:21:27,610 --> 00:21:35,770 is a bipartite graph between A and B, 315 00:21:35,770 --> 00:21:42,460 such that G has lots of edges, has at least n squared 316 00:21:42,460 --> 00:21:43,570 over k edges, 317 00:21:47,290 --> 00:21:56,090 and if the restricted sumset between A and B is small-- 318 00:21:56,090 --> 00:22:02,470 So here we're not looking at all the sums but a large fraction 319 00:22:02,470 --> 00:22:04,540 of the possible pairwise sums. 320 00:22:04,540 --> 00:22:07,040 If that sumset has small size, this 321 00:22:07,040 --> 00:22:10,210 is kind of like a restricted doubling constant.
322 00:22:10,210 --> 00:22:16,090 Then there exists A prime, subset 323 00:22:16,090 --> 00:22:26,530 of A, B prime, subset of B, with A prime and B prime 324 00:22:26,530 --> 00:22:32,830 both fairly large fractions of their parent set, 325 00:22:32,830 --> 00:22:36,970 and such that the unrestricted sumset between A prime and B 326 00:22:36,970 --> 00:22:40,270 prime is not too large. 327 00:22:48,020 --> 00:22:50,390 So let me say it again. 328 00:22:50,390 --> 00:22:52,760 So we have a fairly dense-- 329 00:22:52,760 --> 00:22:55,180 so a constant fraction edge density, 330 00:22:55,180 --> 00:22:59,480 a fairly dense bipartite graph between A and B. A and B 331 00:22:59,480 --> 00:23:02,660 are subsets of the Abelian group. 332 00:23:02,660 --> 00:23:08,700 Then-- and such that the restricted sumset is small. 333 00:23:08,700 --> 00:23:14,840 Then I can restrict A and B to subsets, fairly large subsets, 334 00:23:14,840 --> 00:23:19,070 so that the complete sumset between the subsets A prime 335 00:23:19,070 --> 00:23:21,170 and B prime is small. 336 00:23:26,180 --> 00:23:29,720 Let me show you why the graphical version of BSG 337 00:23:29,720 --> 00:23:33,270 implies the version of BSG I stated up there. 338 00:23:50,630 --> 00:23:54,511 But, so why do we care about this graphical version? 339 00:23:54,511 --> 00:23:59,530 Well, suppose we-- so we have all of these hypotheses. 340 00:23:59,530 --> 00:24:08,030 Let's write-- so we have all of those hypotheses up there. 341 00:24:08,030 --> 00:24:11,938 So let's write r to be r sub A comma 342 00:24:11,938 --> 00:24:16,930 B, so I don't have to carry the subscripts all around. 343 00:24:16,930 --> 00:24:17,920 What do you think-- 344 00:24:17,920 --> 00:24:20,760 so I start with A and B up there, 345 00:24:20,760 --> 00:24:24,660 and I need to construct that graph G. 346 00:24:24,660 --> 00:24:26,340 So what should we choose as our graph? 347 00:24:30,940 --> 00:24:34,460 Let's consider the popular sums. 
348 00:24:40,370 --> 00:24:44,900 So the popular sums, call them S, are going to be elements 349 00:24:44,900 --> 00:24:50,390 in the complete sumset, each of which 350 00:24:50,390 --> 00:24:54,530 is represented as a sum in many different ways. 351 00:25:02,760 --> 00:25:07,840 And we're going to take edges that correspond 352 00:25:07,840 --> 00:25:12,760 to these popular sums. 353 00:25:12,760 --> 00:25:26,670 So let's consider the bipartite graph G such 354 00:25:26,670 --> 00:25:39,770 that little a comma little b is an edge if and only if little a plus little b is a popular sum. 355 00:25:46,900 --> 00:25:50,390 So let's verify some of the hypotheses. 356 00:25:50,390 --> 00:25:53,110 So we're going to assume graph BSG, 357 00:25:53,110 --> 00:25:57,340 and let's verify the hypotheses of graph BSG. 358 00:25:57,340 --> 00:25:59,500 On one hand, because each element of S 359 00:25:59,500 --> 00:26:05,170 is a popular sum, if we consider its multiplicity, 360 00:26:05,170 --> 00:26:13,750 we find that the size of S multiplied by n over 2k is a lower 361 00:26:13,750 --> 00:26:19,245 bound on the size of A times the size of B. 362 00:26:19,245 --> 00:26:26,750 So if you think about all the different pairs in A and B, 363 00:26:26,750 --> 00:26:31,780 each sum here, each popular sum, accounts for at least this many pairs 364 00:26:31,780 --> 00:26:36,330 in this A cross B. 365 00:26:36,330 --> 00:26:41,370 So, as a result, because size of A and size of B 366 00:26:41,370 --> 00:26:44,880 are both, at most, n, we find that the size of S 367 00:26:44,880 --> 00:26:46,180 is, at most, 2kn. 368 00:26:49,382 --> 00:26:51,780 And if you think about what G is, 369 00:26:51,780 --> 00:27:00,840 then this implies also that the restricted sumset of A and B 370 00:27:00,840 --> 00:27:02,310 across this graph G-- 371 00:27:02,310 --> 00:27:04,080 which only uses the popular sums. 372 00:27:04,080 --> 00:27:10,718 So the restricted sumset is precisely the popular sums.
373 00:27:10,718 --> 00:27:13,660 So the restricted sumset is not too large. 374 00:27:18,930 --> 00:27:19,900 OK, good. 375 00:27:19,900 --> 00:27:24,020 So we got one of the conditions, that the restricted sumset 376 00:27:24,020 --> 00:27:25,910 is not too large. 377 00:27:25,910 --> 00:27:30,150 And now we want to show that this graph has lots of edges. 378 00:27:30,150 --> 00:27:31,360 It has lots of edges. 379 00:27:36,210 --> 00:27:39,120 And here's where we would need to use the hypothesis that, 380 00:27:39,120 --> 00:27:44,166 between A and B, originally there is large additive energy. 381 00:27:44,166 --> 00:27:49,980 And the point here is that these unpopular sums cannot 382 00:27:49,980 --> 00:27:55,140 contribute very much to the additive energy in total, 383 00:27:55,140 --> 00:27:58,240 because each one of them is unpopular. 384 00:27:58,240 --> 00:28:01,960 So the dominant contributions to the additive energy 385 00:28:01,960 --> 00:28:05,280 are going to come from the popular sums, 386 00:28:05,280 --> 00:28:08,910 and we're going to use that to show that G has lots of edges. 387 00:28:12,660 --> 00:28:16,980 So let's lower bound the number of edges of G by first showing 388 00:28:16,980 --> 00:28:18,820 that-- 389 00:28:18,820 --> 00:28:35,030 so we'll show that the unpopular sums contribute very little 390 00:28:35,030 --> 00:28:42,210 to the additive energy between A and B. Indeed, 391 00:28:42,210 --> 00:28:49,860 the sum of the squares of the r's, for x 392 00:28:49,860 --> 00:28:58,130 not among the popular sums, is upper bounded by-- 393 00:28:58,130 --> 00:29:01,170 well, I claim that it is upper bounded 394 00:29:01,170 --> 00:29:09,910 by the following quantity: n over 2k times n squared. 395 00:29:14,520 --> 00:29:19,470 Because I can take out one factor of r, 396 00:29:19,470 --> 00:29:24,180 upper bounded by this number, just by definition, 397 00:29:24,180 --> 00:29:27,820 and the sum of the r's is at most n squared.
398 00:29:32,540 --> 00:29:39,150 So you have this additive energy between A and B. 399 00:29:39,150 --> 00:29:41,190 I know that it is large by hypothesis. 400 00:29:45,940 --> 00:29:48,310 Whereas, I also know that I can write it 401 00:29:48,310 --> 00:29:52,570 as a sum of the squares of the r's, which 402 00:29:52,570 --> 00:30:00,550 I can break into the popular contributions 403 00:30:00,550 --> 00:30:02,530 and the unpopular contributions. 404 00:30:05,533 --> 00:30:06,950 And, hopefully, this should all be 405 00:30:06,950 --> 00:30:09,470 somewhat reminiscent of basically all these proofs 406 00:30:09,470 --> 00:30:11,200 that we did so far in this course, 407 00:30:11,200 --> 00:30:14,750 where we separate a sum into the dominant terms 408 00:30:14,750 --> 00:30:16,510 and the minor terms. 409 00:30:16,510 --> 00:30:20,010 This came up in Fourier analysis in particular. 410 00:30:20,010 --> 00:30:24,320 So we do this splitting, and we upper 411 00:30:24,320 --> 00:30:28,820 bound the unpopular contributions by the estimate 412 00:30:28,820 --> 00:30:29,890 from just now. 413 00:30:36,810 --> 00:30:40,800 So, as a result, bringing this small error term, 414 00:30:40,800 --> 00:30:44,610 it doesn't cancel much of the energy. 415 00:30:44,610 --> 00:30:52,350 So we still have a lower bound on the sum of the squares 416 00:30:52,350 --> 00:30:56,590 of the r's in the popular sums. 417 00:31:00,010 --> 00:31:04,240 But I can also give a fairly trivial upper bound to a single 418 00:31:04,240 --> 00:31:08,050 r, namely it cannot be bigger than n. 419 00:31:16,220 --> 00:31:23,860 And so the number of edges of G-- 420 00:31:23,860 --> 00:31:27,480 so what's the number of edges of G? 421 00:31:27,480 --> 00:31:28,260 Look at that. 422 00:31:28,260 --> 00:31:33,470 Each x here contributes rx many edges. 423 00:31:33,470 --> 00:31:36,750 So the number of edges of G is simply the sums of these rx's. 424 00:31:41,310 --> 00:31:42,690 Which is quite large. 
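[Editor's illustration] The reduction to the graph version can be traced on a small example. This sketch makes several assumptions that are not spelled out in the spoken lecture: n is taken as the larger of the two set sizes, k is chosen so that the energy equals n cubed over k exactly, and a sum counts as popular when r(x) is at least n over 2k. The function name and return shape are hypothetical.

```python
from collections import Counter


def popular_sum_graph(A, B):
    """Build the popular-sum graph on A x B and return (n, k, popular, edges)."""
    A, B = set(A), set(B)
    n = max(len(A), len(B))
    r = Counter(a + b for a in A for b in B)
    energy = sum(c * c for c in r.values())
    k = n ** 3 / energy              # assumption: E(A, B) = n^3 / k exactly
    threshold = n / (2 * k)          # assumption: popularity threshold n / (2k)
    popular = {x for x, c in r.items() if c >= threshold}
    # (a, b) is an edge exactly when a + b is a popular sum.
    edges = [(a, b) for a in A for b in B if a + b in popular]
    return n, k, popular, edges


if __name__ == "__main__":
    n, k, popular, edges = popular_sum_graph(range(12), range(12))
    restricted_sumset = {a + b for a, b in edges}
    print("restricted sumset equals popular sums:", restricted_sumset == popular)
    print("size of S:", len(popular), "bound 2kn:", 2 * k * n)
    print("edges:", len(edges), "bound n^2/(2k):", n * n / (2 * k))
```

On this example both conclusions of the argument can be read off: the restricted sumset is exactly the set of popular sums and has size at most 2kn, and the number of edges is at least n squared over 2k.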
425 00:31:49,740 --> 00:31:56,070 So the hypotheses of graph BSG are satisfied. 426 00:31:56,070 --> 00:31:59,850 And so we can use the conclusion of graph BSG, which 427 00:31:59,850 --> 00:32:02,730 is the conclusion that we're looking for in BSG. 428 00:32:11,520 --> 00:32:12,532 Any questions? 429 00:32:17,095 --> 00:32:17,595 Good. 430 00:32:17,595 --> 00:32:19,860 So the remaining task is to prove 431 00:32:19,860 --> 00:32:23,160 the graphical version of BSG. 432 00:32:23,160 --> 00:32:26,040 So let's take a quick break, and when 433 00:32:26,040 --> 00:32:30,030 we come back we'll focus on this theorem, 434 00:32:30,030 --> 00:32:35,140 and it has some nice graph theoretic arguments. 435 00:32:35,140 --> 00:32:37,430 OK, let's continue. 436 00:32:37,430 --> 00:32:42,230 We've reduced the proof of the Balog-Szemeredi-Gowers theorem 437 00:32:42,230 --> 00:32:44,540 to the following graphical result. 438 00:32:44,540 --> 00:32:46,170 Well, it's not just graphical, right? 439 00:32:46,170 --> 00:32:49,370 Still-- we're still inside some Abelian group, 440 00:32:49,370 --> 00:32:52,570 still looking at some set in some Abelian group, 441 00:32:52,570 --> 00:32:57,140 but, certainly, now it has a graph attached to it. 442 00:32:57,140 --> 00:33:01,410 Let me prove this theorem through several steps. 443 00:33:01,410 --> 00:33:04,700 First, something called a path of length 2 lemma. 444 00:33:15,502 --> 00:33:17,860 So the path of length 2 lemma, the statement 445 00:33:17,860 --> 00:33:21,340 is that you start with a graph G which 446 00:33:21,340 --> 00:33:27,130 is a bipartite graph between vertex sets A and B. 447 00:33:27,130 --> 00:33:29,050 And now A and B no longer need-- 448 00:33:29,050 --> 00:33:30,100 they're just sets. 449 00:33:30,100 --> 00:33:31,150 They're just vertex sets. 450 00:33:31,150 --> 00:33:34,550 We're not going to have sums.
451 00:33:34,550 --> 00:33:38,175 And the number of edges is at least a constant fraction 452 00:33:38,175 --> 00:33:39,175 of the maximum possible. 453 00:33:45,570 --> 00:33:48,620 Then the conclusion is that there 454 00:33:48,620 --> 00:33:55,385 exists some U, a subset of A, such that U is fairly large. 455 00:33:59,650 --> 00:34:10,199 And between most pairs of elements of U-- 456 00:34:10,199 --> 00:34:24,880 so between 1 minus epsilon fraction of pairs of U-- 457 00:34:24,880 --> 00:34:30,840 there are lots of common neighbors. 458 00:34:30,840 --> 00:34:36,650 So at least epsilon delta squared 459 00:34:36,650 --> 00:34:46,230 B over 2 common neighbors. 460 00:34:46,230 --> 00:34:58,550 So you start with this bipartite graph A and B. Lots of edges. 461 00:34:58,550 --> 00:35:01,520 And we would like to show that there 462 00:35:01,520 --> 00:35:08,840 exists a pretty large subset U such that between most pairs-- 463 00:35:08,840 --> 00:35:11,150 all but an epsilon fraction-- 464 00:35:11,150 --> 00:35:12,980 of ordered pairs-- they could be the same, 465 00:35:12,980 --> 00:35:15,350 but it doesn't really matter-- 466 00:35:15,350 --> 00:35:22,600 the number of paths of length 2 between these two vertices 467 00:35:22,600 --> 00:35:24,920 is quite large. 468 00:35:24,920 --> 00:35:28,430 So they have lots of common neighbors. 469 00:35:28,430 --> 00:35:30,440 Where have we seen something like this before? 470 00:35:30,440 --> 00:35:30,890 There's a question? 471 00:35:30,890 --> 00:35:32,694 AUDIENCE: Is there a [INAUDIBLE] epsilon? 472 00:35:32,694 --> 00:35:33,600 YUFEI ZHAO: Ah, yes. 473 00:35:33,600 --> 00:35:37,156 So for every epsilon and every delta. 474 00:35:37,156 --> 00:35:43,634 So let epsilon, delta be parameters. 475 00:35:48,080 --> 00:35:51,860 Where have we seen something like this before? 
476 00:35:51,860 --> 00:35:54,620 So in a bipartite graph with lots of edges, 477 00:35:54,620 --> 00:35:59,180 I want to find a large subset of one of the parts 478 00:35:59,180 --> 00:36:02,270 so that every pair of elements, or almost 479 00:36:02,270 --> 00:36:05,842 every pair of elements, has lots of common neighbors. 480 00:36:11,254 --> 00:36:12,238 Yes. 481 00:36:12,238 --> 00:36:13,570 AUDIENCE: [INAUDIBLE]. 482 00:36:13,570 --> 00:36:15,070 YUFEI ZHAO: Dependent random choice. 483 00:36:15,070 --> 00:36:17,150 So in the very first chapter of this course, 484 00:36:17,150 --> 00:36:19,340 when we did extremal graph theory 485 00:36:19,340 --> 00:36:21,530 forbidding bipartite subgraphs, there 486 00:36:21,530 --> 00:36:26,900 was a technique for proving upper bounds on the extremal number 487 00:36:26,900 --> 00:36:29,540 for bipartite graphs of bounded degree. 488 00:36:29,540 --> 00:36:32,810 And there we used something called dependent random choice 489 00:36:32,810 --> 00:36:35,860 that had a conclusion of a very similar flavor. 490 00:36:35,860 --> 00:36:39,848 So there, we had every pair-- so a fairly large, but not as 491 00:36:39,848 --> 00:36:41,390 large as this-- a fairly large subset 492 00:36:41,390 --> 00:36:45,640 where every pair of elements had lots of common neighbors. 493 00:36:45,640 --> 00:36:48,230 Every tuple, every k-tuple of vertices, 494 00:36:48,230 --> 00:36:50,330 had lots of common neighbors. 495 00:36:50,330 --> 00:36:51,470 So it's very similar. 496 00:36:51,470 --> 00:36:53,960 In fact, it's the same type of technique 497 00:36:53,960 --> 00:36:56,480 that we'll use to prove this lemma over here. 498 00:37:00,390 --> 00:37:05,030 So who remembers how dependent random choice goes? 499 00:37:05,030 --> 00:37:09,120 So the idea is that we are going to choose U 500 00:37:09,120 --> 00:37:11,200 not uniformly at random. 501 00:37:11,200 --> 00:37:12,872 So that's not going to work.
502 00:37:12,872 --> 00:37:15,860 We're going to choose it in a dependent random way. 503 00:37:15,860 --> 00:37:19,630 So I want elements of U to have lots of common neighbors, 504 00:37:19,630 --> 00:37:20,720 typically. 505 00:37:20,720 --> 00:37:24,950 So one way to guarantee this is to choose U to be 506 00:37:24,950 --> 00:37:28,550 a neighborhood from the right. 507 00:37:28,550 --> 00:37:33,640 So pick a random element in B and choose 508 00:37:33,640 --> 00:37:37,300 U to be its neighborhood. 509 00:37:37,300 --> 00:37:39,230 So let's do that. 510 00:37:39,230 --> 00:37:41,210 So we're going to use dependent random choice. 511 00:37:47,505 --> 00:37:49,380 See, everything in the course comes together. 512 00:37:56,000 --> 00:38:04,480 So let's pick v, an element of B, uniformly at random. 513 00:38:09,580 --> 00:38:15,440 And let U be the neighborhood of v. So, first of all, 514 00:38:15,440 --> 00:38:19,100 by linearity of expectations, the expected size of U 515 00:38:19,100 --> 00:38:27,600 is at least delta times the size of A. That's because the average degree 516 00:38:27,600 --> 00:38:32,472 from the right, from B, is at least delta times the size of A, just based 517 00:38:32,472 --> 00:38:33,430 on the number of edges. 518 00:38:36,550 --> 00:38:43,560 If you have two vertices a and a prime in A 519 00:38:43,560 --> 00:39:04,220 with a small number of common neighbors, then the size of-- 520 00:39:04,220 --> 00:39:07,020 so sorry. 521 00:39:07,020 --> 00:39:10,520 Let me-- I skipped ahead a bit. 522 00:39:10,520 --> 00:39:15,170 So if a and a prime have a small number of common neighbors, 523 00:39:15,170 --> 00:39:22,730 then the probability that a and a prime both lie in U 524 00:39:22,730 --> 00:39:25,580 should be quite small.
525 00:39:25,580 --> 00:39:28,670 Because if they both had-- 526 00:39:28,670 --> 00:39:33,040 if a and a prime have a small number of common neighbors, 527 00:39:33,040 --> 00:39:36,863 in order for a and a prime to be included in this U, 528 00:39:36,863 --> 00:39:37,780 you must have chosen-- 529 00:39:41,550 --> 00:39:44,920 so suppose this were their common neighborhood. 530 00:39:44,920 --> 00:39:49,550 Then in order that a and a prime be contained in U, 531 00:39:49,550 --> 00:39:54,470 you must have chosen this v to be inside the common neighborhood 532 00:39:54,470 --> 00:39:55,300 of a and a prime. 533 00:39:57,840 --> 00:39:59,940 Which is unlikely if a and a prime 534 00:39:59,940 --> 00:40:02,940 had a small number of common neighbors. 535 00:40:02,940 --> 00:40:08,600 So this probability is, at most, epsilon delta squared over 2. 536 00:40:12,460 --> 00:40:15,520 Just think about how U is constructed. 537 00:40:15,520 --> 00:40:26,740 So if we let x be the number of pairs a, a prime in U cross U 538 00:40:26,740 --> 00:40:34,800 with, at most, epsilon delta squared over 2 times the size of B 539 00:40:34,800 --> 00:40:42,900 common neighbors, then, by linearity of expectations, 540 00:40:42,900 --> 00:40:46,300 the expectation of x is-- 541 00:40:46,300 --> 00:40:54,420 well, by summing up all of these probabilities of a and a prime, 542 00:40:54,420 --> 00:40:56,890 both being in U-- 543 00:40:56,890 --> 00:41:02,010 so this is, at most, epsilon delta squared 544 00:41:02,010 --> 00:41:04,740 over 2 times size of A squared. 545 00:41:08,250 --> 00:41:12,030 So, typically, at least in expectation, 546 00:41:12,030 --> 00:41:16,260 you do not expect very many pairs of elements in U 547 00:41:16,260 --> 00:41:20,510 with few common neighbors. 548 00:41:20,510 --> 00:41:22,940 But we can also turn such an estimate 549 00:41:22,940 --> 00:41:24,830 into a specific instance.
550 00:41:28,015 --> 00:41:33,840 And the way to do this is to consider the quantity size of U 551 00:41:33,840 --> 00:41:39,280 squared minus x over epsilon. 552 00:41:39,280 --> 00:41:43,070 Well, first of all, we can lower bound the expectation of this quantity, 553 00:41:43,070 --> 00:41:47,630 because the second moment of the size of U 554 00:41:47,630 --> 00:41:53,030 is at least the square of its first moment. 555 00:41:53,030 --> 00:42:02,450 And we also know that the size of x in expectation 556 00:42:02,450 --> 00:42:04,830 is not very large. 557 00:42:04,830 --> 00:42:07,700 So the whole expression can be lower bounded 558 00:42:07,700 --> 00:42:15,910 by delta squared over 2 times the size of A squared. 559 00:42:25,630 --> 00:42:26,935 So this is epsilon, sorry. 560 00:42:30,120 --> 00:42:34,880 Therefore, there is some concrete instance 561 00:42:34,880 --> 00:42:39,110 of this randomness resulting in some specific U such 562 00:42:39,110 --> 00:42:41,180 that this inequality holds. 563 00:42:41,180 --> 00:42:54,330 So there exists some U such that this inequality holds. 564 00:42:54,330 --> 00:43:03,310 And, in particular, we find that the size of U is at least-- 565 00:43:03,310 --> 00:43:05,110 just forget about this minus term-- 566 00:43:05,110 --> 00:43:08,950 is at least the square root of that right-hand side. 567 00:43:08,950 --> 00:43:11,500 So, in particular, the size of U is at least 568 00:43:11,500 --> 00:43:14,415 delta over 2 times the size of A. 569 00:43:14,415 --> 00:43:18,307 And, just looking at the left-hand side, which 570 00:43:18,307 --> 00:43:20,890 must be a non-negative quantity because the right-hand side is 571 00:43:20,890 --> 00:43:26,800 non-negative, we find that x is, at most, an epsilon 572 00:43:26,800 --> 00:43:29,786 fraction of U squared. 573 00:43:34,480 --> 00:43:38,730 So putting these together, we arrive 574 00:43:38,730 --> 00:43:42,280 at the path of length 2 lemma. 575 00:43:42,280 --> 00:43:43,830 So let me go through it again.
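The proof just given is constructive enough to run on a small graph. Below is a minimal sketch, not the lecture's own code: the function name `dependent_random_choice` and the dictionary representation of the graph are hypothetical, delta is taken to be the exact edge density, and a pair counts as bad when it has strictly fewer than epsilon delta squared |B| / 2 common neighbors. Since the expectation of |U|^2 - X/epsilon is at least delta^2 |A|^2 / 2, some choice of v must reach that value, so the loop always finds one.

```python
import random
from fractions import Fraction
from itertools import product


def dependent_random_choice(adj, A, B, eps):
    """Find v in B whose neighborhood U has |U|^2 - X/eps >= delta^2 |A|^2 / 2.

    adj maps each vertex of B to its set of neighbors in A.
    X counts ordered pairs (a, a2) in U x U with fewer than
    eps * delta^2 * |B| / 2 common neighbors.
    Returns (U, X, delta). Exact rational arithmetic avoids rounding issues.
    """
    eps = Fraction(eps)
    edges = sum(len(adj[v]) for v in B)
    delta = Fraction(edges, len(A) * len(B))  # exact edge density
    threshold = eps * delta**2 * len(B) / 2
    target = delta**2 * len(A) ** 2 / 2
    # nbrs[a] = neighborhood of a on the B side
    nbrs = {a: {v for v in B if a in adj[v]} for a in A}
    for v in B:
        U = adj[v]
        X = sum(1 for a, a2 in product(U, U)
                if len(nbrs[a] & nbrs[a2]) < threshold)
        if len(U) ** 2 - X / eps >= target:
            return set(U), X, delta
    raise AssertionError("unreachable by the averaging argument")


# Example on a small random bipartite graph (fixed seed for repeatability).
rng = random.Random(0)
A, B = list(range(12)), list(range(8))
adj = {v: {a for a in A if rng.random() < 0.5} for v in B}
U, X, delta = dependent_random_choice(adj, A, B, Fraction(1, 4))
```

Scanning every v in B replaces the random choice by an exhaustive search, which is just the averaging step made explicit. The returned U then satisfies both conclusions of the lemma: |U| is at least delta |A| / 2, and X is at most epsilon |U|^2.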
576 00:43:43,830 --> 00:43:46,100 So this is the dependent random choice method, 577 00:43:46,100 --> 00:43:50,480 where we're going to-- we want to find this U, 578 00:43:50,480 --> 00:43:52,430 where most pairs of vertices in U 579 00:43:52,430 --> 00:43:55,920 have lots of common neighbors. 580 00:43:55,920 --> 00:43:58,640 So we start from the right side. 581 00:43:58,640 --> 00:44:02,480 We start from B, pick a uniform random vertex, which 582 00:44:02,480 --> 00:44:08,598 you call v, and let U be the neighborhood of v. 583 00:44:08,598 --> 00:44:11,140 And I claim that this U, typically, should 584 00:44:11,140 --> 00:44:13,170 have the desired property. 585 00:44:13,170 --> 00:44:18,160 And the reason is that, if you have a pair of vertices 586 00:44:18,160 --> 00:44:24,030 on the left that do not have many common neighbors, 587 00:44:24,030 --> 00:44:27,360 then I claim it is highly unlikely that these two 588 00:44:27,360 --> 00:44:33,360 vertices both appear in U. Because for them to both appear 589 00:44:33,360 --> 00:44:38,310 in U, your v must have been selected inside the common neighborhood 590 00:44:38,310 --> 00:44:42,390 of a and a prime, which is unlikely if a and a prime 591 00:44:42,390 --> 00:44:46,550 have few common neighbors. 592 00:44:46,550 --> 00:44:50,050 So, as a result, the expected number 593 00:44:50,050 --> 00:44:58,338 of pairs in U with a small number of common neighbors is small. 594 00:44:58,338 --> 00:45:00,130 And, already, that's a very good indication 595 00:45:00,130 --> 00:45:01,020 that we're on the right track. 596 00:45:01,020 --> 00:45:02,670 And, to finish things off, we look 597 00:45:02,670 --> 00:45:07,830 at this expression, which we can lower bound by convexity. 598 00:45:07,830 --> 00:45:10,890 And we know the size of U in expectation is large. 599 00:45:10,890 --> 00:45:13,470 And, also, the size of x, that we just saw, 600 00:45:13,470 --> 00:45:17,260 is small in expectation.
601 00:45:17,260 --> 00:45:19,890 So you have this inequality over here. 602 00:45:19,890 --> 00:45:21,690 And because there's an expectation, 603 00:45:21,690 --> 00:45:25,560 it implies that there's some specific instance such that, 604 00:45:25,560 --> 00:45:28,800 without the expectation, the inequality holds. 605 00:45:28,800 --> 00:45:30,570 So take that specific instance. 606 00:45:30,570 --> 00:45:34,740 We obtain some U such that this inequality is true, 607 00:45:34,740 --> 00:45:37,800 which simultaneously implies that U is large 608 00:45:37,800 --> 00:45:40,910 and x, the number of bad pairs, is small. 609 00:45:43,796 --> 00:45:47,850 So that was dependent random choice. 610 00:45:47,850 --> 00:45:48,953 Any questions? 611 00:45:51,731 --> 00:45:54,046 All right. 612 00:45:54,046 --> 00:45:56,250 So that was the path of length 2 lemma. 613 00:45:56,250 --> 00:45:57,900 So it tells us I can take a large set 614 00:45:57,900 --> 00:46:02,590 with lots of paths of length 2 between most pairs of vertices. 615 00:46:02,590 --> 00:46:07,888 Let's upgrade this lemma to a path of length 3 lemma. 616 00:46:18,850 --> 00:46:20,640 So, in the path of length 3 lemma, 617 00:46:20,640 --> 00:46:27,378 we start with a bipartite graph, as before, between A and B. 618 00:46:27,378 --> 00:46:33,970 So G is a bipartite graph between A and B. 619 00:46:33,970 --> 00:46:39,230 And, as before, we have a lot of edges between A and B. 620 00:46:39,230 --> 00:46:42,690 It's a delta fraction of all possible edges. 621 00:46:42,690 --> 00:46:50,840 Then the conclusion is that there exists A prime in A and B 622 00:46:50,840 --> 00:46:57,270 prime subset of B such that A prime and B 623 00:46:57,270 --> 00:47:01,560 prime are both large fractions of their parent set. 624 00:47:08,070 --> 00:47:15,820 And now, the-- and, furthermore, every pair 625 00:47:15,820 --> 00:47:24,390 between A prime and B prime is joined 626 00:47:24,390 --> 00:47:28,895 by many paths of length 3.
627 00:47:36,820 --> 00:47:39,300 So a path of length 3 means there's 3 edges. 628 00:47:42,610 --> 00:47:50,130 And, here, this eta is basically the original error term 629 00:47:50,130 --> 00:47:51,690 up to a polynomial change. 630 00:48:00,270 --> 00:48:05,020 So starting with this bipartite graph that's fairly dense, 631 00:48:05,020 --> 00:48:08,500 the lemma tells us that we can find 632 00:48:08,500 --> 00:48:13,870 some large A prime and large B prime so 633 00:48:13,870 --> 00:48:17,440 that between every vertex in A prime and every vertex in B 634 00:48:17,440 --> 00:48:21,960 prime, there are lots of paths of length 3 between them. 635 00:48:28,530 --> 00:48:29,263 Every time. 636 00:48:33,500 --> 00:48:37,215 So we should think about all of these constants as-- 637 00:48:37,215 --> 00:48:39,830 as long as you only make polynomial changes in the constants, 638 00:48:39,830 --> 00:48:42,290 we're happy. 639 00:48:42,290 --> 00:48:46,560 Here, eta is a polynomial change in the delta. 640 00:48:46,560 --> 00:48:49,130 There's a convention which I like which is not universal, 641 00:48:49,130 --> 00:48:51,860 but it's often followed, and I like this convention. 642 00:48:51,860 --> 00:48:53,930 The difference between the little c 643 00:48:53,930 --> 00:48:56,630 and the big C is that a little c is better 644 00:48:56,630 --> 00:48:59,440 if you make it smaller, and a big C is better-- 645 00:48:59,440 --> 00:49:02,420 I mean, it's better in the sense that if this 646 00:49:02,420 --> 00:49:04,970 is true for little c and big C, and you 647 00:49:04,970 --> 00:49:10,050 make little c smaller and big C bigger, then it is still true. 648 00:49:10,050 --> 00:49:12,570 So big C is a sufficiently large constant, 649 00:49:12,570 --> 00:49:15,436 and little c is a sufficiently small constant. 650 00:49:15,436 --> 00:49:16,422 Just a-- 651 00:49:30,740 --> 00:49:36,650 So let's see the path of length 3 lemma, see its proof.
652 00:49:36,650 --> 00:49:39,460 We're going to use the path of length 2 lemma, 653 00:49:39,460 --> 00:49:42,070 but we need a bit of preparation first. 654 00:49:42,070 --> 00:49:46,930 So the proof has some nice ideas, but it's also-- 655 00:49:46,930 --> 00:49:50,740 some parts of it are slightly tedious, so bear with me. 656 00:49:50,740 --> 00:49:58,690 So we're going to construct a chain of subsets A-- 657 00:49:58,690 --> 00:50:02,717 inside A. So A1, A2, A3. 658 00:50:02,717 --> 00:50:04,800 And this is just because there's a few cleaning up 659 00:50:04,800 --> 00:50:08,100 steps that need to be done. 660 00:50:08,100 --> 00:50:19,880 Let's call two vertices in A friendly 661 00:50:19,880 --> 00:50:23,980 if they have lots of common neighbors. 662 00:50:23,980 --> 00:50:25,430 And, precisely, we're going to say 663 00:50:25,430 --> 00:50:28,750 they're friendly if they have more than delta 664 00:50:28,750 --> 00:50:34,770 squared over 80 times the size of B common neighbors. 665 00:50:41,770 --> 00:50:46,590 Let me construct this sequence of subsets as follows. 666 00:50:46,590 --> 00:50:53,870 First, let A1 be all the vertices in A 667 00:50:53,870 --> 00:50:58,200 with degree not too small. 668 00:50:58,200 --> 00:51:02,500 So this is in preparation. 669 00:51:02,500 --> 00:51:05,950 So it will make our life quite a bit easier later on. 670 00:51:05,950 --> 00:51:09,100 Let's just trim all the really small degree vertices 671 00:51:09,100 --> 00:51:11,850 so that we don't have to think about them. 672 00:51:11,850 --> 00:51:15,870 So you trim all the small degree vertices. 673 00:51:15,870 --> 00:51:20,420 And think about how many edges you trim. 674 00:51:20,420 --> 00:51:25,700 You cannot trim so many edges, because each time you trim such 675 00:51:25,700 --> 00:51:30,100 a vertex, you only get rid of a small number of edges. 
676 00:51:30,100 --> 00:51:34,300 So, in the end, at least half of the original set of edges 677 00:51:34,300 --> 00:51:36,690 must remain. 678 00:51:36,690 --> 00:51:43,320 And, as a result, the size of A1 is at least 679 00:51:43,320 --> 00:51:50,590 a delta over 2 fraction of the original vertex set. 680 00:51:50,590 --> 00:51:53,050 Otherwise, you could not have contained half 681 00:51:53,050 --> 00:51:57,460 of the original set of edges. 682 00:51:57,460 --> 00:51:59,380 So this is the first trimming step. 683 00:52:02,180 --> 00:52:07,390 So we got rid of some edges, but we got rid of fewer than half 684 00:52:07,390 --> 00:52:10,940 of the original edges. 685 00:52:10,940 --> 00:52:15,720 And because now you have a minimum degree on A1, 686 00:52:15,720 --> 00:52:18,670 the number of edges between A1 and B 687 00:52:18,670 --> 00:52:22,660 is quite large, still quite large. 688 00:52:22,660 --> 00:52:27,040 So think about passing down to A1 now. 689 00:52:27,040 --> 00:52:31,530 In the second step, we are going to apply the path of length 2 690 00:52:31,530 --> 00:52:34,480 lemma to this A1. 691 00:52:34,480 --> 00:52:41,880 So A2 is going to be constructed from-- 692 00:52:41,880 --> 00:52:50,170 so using the path of length 2 lemma, 693 00:52:50,170 --> 00:52:56,620 specifically with parameter epsilon being delta over 10. 694 00:52:56,620 --> 00:52:59,440 Although, remember, now the density of the graph 695 00:52:59,440 --> 00:53:01,760 went from delta to delta over 2. 696 00:53:01,760 --> 00:53:04,245 Again, if you don't care about the specific numbers, 697 00:53:04,245 --> 00:53:05,620 they're all polynomials in delta. 698 00:53:05,620 --> 00:53:06,703 So don't worry about them. 699 00:53:06,703 --> 00:53:08,590 Everything's poly delta. 700 00:53:08,590 --> 00:53:11,860 So we're going to apply the path of length 2 lemma 701 00:53:11,860 --> 00:53:16,240 to find this subset A2. 
702 00:53:16,240 --> 00:53:25,660 And it has the property that A2 is quite large, 703 00:53:25,660 --> 00:53:45,320 and all but a small fraction of pairs in A2 are friendly. 704 00:53:54,580 --> 00:53:59,540 So we passed down to, first, trimming small degree vertices, 705 00:53:59,540 --> 00:54:02,120 and then passed down further to A2, 706 00:54:02,120 --> 00:54:06,020 where all but a small fraction of elements in A2, 707 00:54:06,020 --> 00:54:08,563 or all but a small fraction of the pairs 708 00:54:08,563 --> 00:54:10,230 are friendly to each other, meaning they 709 00:54:10,230 --> 00:54:11,770 have lots of common neighbors. 710 00:54:15,020 --> 00:54:16,870 And now let's look at the other side. 711 00:54:16,870 --> 00:54:21,610 Let's look at B. So we're in this situation 712 00:54:21,610 --> 00:54:24,520 now where you have-- 713 00:54:27,760 --> 00:54:31,390 so we're now in a situation where you've passed down 714 00:54:31,390 --> 00:54:42,630 to A2 and B, where, because of what we did initially, 715 00:54:42,630 --> 00:54:47,410 every vertex in here has large degree. 716 00:54:47,410 --> 00:54:53,830 So there's this minimum degree condition 717 00:54:53,830 --> 00:54:57,190 from every vertex on the left. 718 00:54:57,190 --> 00:55:00,250 So the average degree is still very high. 719 00:55:02,960 --> 00:55:07,020 As a result, the average degree from B 720 00:55:07,020 --> 00:55:09,280 is going to be quite high. 721 00:55:09,280 --> 00:55:13,720 So let's focus on the B side and pick out vertices in B 722 00:55:13,720 --> 00:55:16,570 that have high degree. 723 00:55:16,570 --> 00:55:23,390 So let B1 denote the vertices in B such 724 00:55:23,390 --> 00:55:30,530 that the degree from B to A2 is at least half 725 00:55:30,530 --> 00:55:33,390 of what you expect based on average degree. 726 00:55:37,850 --> 00:55:41,390 And, as before, the same logic as the A1 step. 727 00:55:41,390 --> 00:55:52,760 We see that B1 has large size, is a large fraction of B.
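The trimming used for A1, and again for B1, is the same operation: discard every vertex whose degree is below half the average. The sketch below assumes that threshold (the lecture states it explicitly only for B1), and the function name `trim_low_degree` is hypothetical. Removing such vertices deletes fewer than half of the edges, which is what forces the surviving set to be a large fraction of the original.

```python
def trim_low_degree(adj):
    """Keep the vertices whose degree is at least half the average degree.

    adj maps each vertex on one side to its set of neighbors on the other side.
    Returns the restriction of adj to the surviving vertices.
    """
    edges = sum(len(nb) for nb in adj.values())
    size = len(adj)
    # average degree is edges / size; keep deg >= edges / (2 * size),
    # written in integers as 2 * size * deg >= edges
    return {a: nb for a, nb in adj.items() if 2 * size * len(nb) >= edges}


# Example: 5 vertices on the left, 4 on the right, 12 edges in total.
adj_A = {0: {0, 1, 2, 3}, 1: {0, 1, 2, 3}, 2: {0, 1, 2}, 3: {0}, 4: set()}
kept = trim_low_degree(adj_A)
kept_edges = sum(len(nb) for nb in kept.values())
```

Here the average degree is 12/5, so vertices of degree 0 and 1 are discarded, and 11 of the 12 edges survive. In general at least half the edges survive, and since each surviving vertex has degree at most the size of the other side, the number of survivors is at least delta/2 times the original size.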
728 00:55:52,760 --> 00:55:57,410 And now we pass down to this B1 set. 729 00:56:04,214 --> 00:56:17,970 Now, finally, let's consider A3 to be vertices in A2 730 00:56:17,970 --> 00:56:21,490 where a is friendly. 731 00:56:21,490 --> 00:56:28,650 So vertices a in A2 such that a is friendly to at least 732 00:56:28,650 --> 00:56:31,210 a 1 minus delta over 5 733 00:56:31,210 --> 00:56:38,740 fraction of A2. 734 00:56:42,590 --> 00:56:50,430 So we saw that, in A2, most pairs of vertices are friendly. 735 00:56:50,430 --> 00:57:00,000 So most, meaning all but a delta over 10 fraction. 736 00:57:00,000 --> 00:57:05,770 So if we consider vertices which are 737 00:57:05,770 --> 00:57:10,740 unfriendly to many other vertices in A2, 738 00:57:10,740 --> 00:57:13,560 there aren't so many of them. 739 00:57:13,560 --> 00:57:16,440 If there were many of them, you couldn't have had that. 740 00:57:16,440 --> 00:57:18,850 So that's why I constructed this set 741 00:57:18,850 --> 00:57:23,540 A3 consisting of elements in A2 that 742 00:57:23,540 --> 00:57:27,110 are friendly to many elements. 743 00:57:27,110 --> 00:57:32,990 And the size of A3 is at least half of that of A2. 744 00:57:40,974 --> 00:57:50,510 So we have this A3 inside. 745 00:57:50,510 --> 00:57:51,696 All right. 746 00:57:51,696 --> 00:57:57,235 And now I claim that we can take A3 and B1 as our final sets, 747 00:57:57,235 --> 00:58:01,510 and that between every vertex in A3 and every vertex in B1, 748 00:58:01,510 --> 00:58:04,990 I claim there must be lots of paths of length 3. 749 00:58:04,990 --> 00:58:07,420 But, first, let's check their sizes. 750 00:58:07,420 --> 00:58:09,820 I mean, the sizes all should be OK, because we never 751 00:58:09,820 --> 00:58:11,960 lost too much at each step.
752 00:58:11,960 --> 00:58:13,740 If you only care about polynomial factors, 753 00:58:13,740 --> 00:58:15,990 well, you already see that we never lost anything more 754 00:58:15,990 --> 00:58:17,590 than a polynomial factor. 755 00:58:17,590 --> 00:58:20,510 But just to be precise, the size of A3 is at least-- 756 00:58:20,510 --> 00:58:24,320 so if you count up the factor lost at each step, 757 00:58:24,320 --> 00:58:29,230 so it's 1/2 delta over 4 delta over 2. 758 00:58:29,230 --> 00:58:34,480 So it's at least delta squared over 16 fraction 759 00:58:34,480 --> 00:58:36,870 of the original set A. 760 00:58:36,870 --> 00:58:44,320 And now, if we consider a comma b 761 00:58:44,320 --> 00:58:49,650 to be an arbitrary pair in A3 cross B1, 762 00:58:49,650 --> 00:58:53,140 I claim that there must be many paths. 763 00:58:53,140 --> 00:58:59,090 Because by using-- so what properties do we know? 764 00:58:59,090 --> 00:59:05,920 We know that b is adjacent to a large fraction. 765 00:59:05,920 --> 00:59:10,300 So here large means at least delta over 4-- so bounded 766 00:59:10,300 --> 00:59:16,080 below-- a large fraction of A2. 767 00:59:16,080 --> 00:59:16,580 Yes. 768 00:59:16,580 --> 00:59:17,302 So I apologize. 769 00:59:17,302 --> 00:59:19,260 When I say the word large, depending on context 770 00:59:19,260 --> 00:59:22,640 it can mean bigger than delta, or it could mean at least 1 771 00:59:22,640 --> 00:59:23,370 minus delta. 772 00:59:23,370 --> 00:59:25,380 So you look at what I write down. 773 00:59:25,380 --> 00:59:31,070 So b is adjacent to at least delta over 4 fraction of A2. 774 00:59:31,070 --> 00:59:39,300 At the same time, we know that a is friendly to at least 1 775 00:59:39,300 --> 00:59:43,940 minus delta over 5 fraction of A2. 776 00:59:49,000 --> 00:59:54,070 So these two sets, they must overlap by at least a delta 777 00:59:54,070 --> 00:59:55,060 over 20 fraction. 778 01:00:00,351 --> 01:00:05,260 So let's take a vertex b. 
779 01:00:05,260 --> 01:00:13,100 So you-- so it's adjacent to many vertices here. 780 01:00:13,100 --> 01:00:17,526 And if you look at a vertex in A, 781 01:00:17,526 --> 01:00:21,240 it's friendly to a large fraction. 782 01:00:21,240 --> 01:00:25,570 So, in particular, it's friendly to all these elements 783 01:00:25,570 --> 01:00:26,070 over here. 784 01:00:28,760 --> 01:00:34,840 So, to finish off, what does it mean for a-- 785 01:00:34,840 --> 01:00:37,600 this is-- this vertex is a. 786 01:00:37,600 --> 01:00:38,810 This vertex is b. 787 01:00:38,810 --> 01:00:40,970 What does it mean for a to be friendly to all 788 01:00:40,970 --> 01:00:42,830 of these shaded elements? 789 01:00:42,830 --> 01:00:47,510 It means that there are lots of paths from a 790 01:00:47,510 --> 01:00:51,930 to each of these elements. 791 01:00:51,930 --> 01:00:55,974 And then you can finish off the paths going back to b. 792 01:00:55,974 --> 01:00:56,750 Yes. 793 01:00:56,750 --> 01:01:00,092 AUDIENCE: The shaded stuff is allowed to be outside of A3? 794 01:01:00,092 --> 01:01:01,800 YUFEI ZHAO: No. the shaded-- the question 795 01:01:01,800 --> 01:01:04,040 is, is the shaded stuff allowed to be outside of A3? 796 01:01:04,040 --> 01:01:04,540 No. 797 01:01:04,540 --> 01:01:06,870 The shaded things are inside A3. 798 01:01:06,870 --> 01:01:10,900 So we're looking at intersections within A3. 799 01:01:14,550 --> 01:01:15,090 No, sorry. 800 01:01:17,055 --> 01:01:18,180 Actually, no, you're right. 801 01:01:18,180 --> 01:01:20,530 So the shaded things can be outside A3. 802 01:01:20,530 --> 01:01:22,450 So shaded things can be outside A3. 803 01:01:22,450 --> 01:01:23,130 I apologize. 804 01:01:23,130 --> 01:01:25,650 So everything now is in A2. 805 01:01:28,800 --> 01:01:35,100 So b is adjacent to a large fraction of A2. 806 01:01:35,100 --> 01:01:43,800 And a here is friendly to some part of the neighbors of b. 807 01:01:43,800 --> 01:01:48,920 So you can complete paths like that. 
808 01:01:52,750 --> 01:01:53,250 Yes. 809 01:01:53,250 --> 01:01:54,880 So only the starting and ending points 810 01:01:54,880 --> 01:01:56,410 have to be in A prime and B prime. 811 01:01:56,410 --> 01:01:58,970 Everything else, they can go outside of the A prime and B 812 01:01:58,970 --> 01:02:00,666 prime. 813 01:02:00,666 --> 01:02:03,970 Yes, thank you. 814 01:02:03,970 --> 01:02:22,516 So the number of paths from a to B to A2 back to b is-- 815 01:02:22,516 --> 01:02:24,960 let's see if I can stay within B1-- 816 01:02:24,960 --> 01:02:26,010 so is at least-- 817 01:02:34,110 --> 01:02:34,610 yes. 818 01:02:34,610 --> 01:02:36,530 So it's-- sorry. 819 01:02:36,530 --> 01:02:44,070 This is B. So it's at least delta over 20 times the size of A2 times 820 01:02:44,070 --> 01:02:49,680 delta squared over 80 times the size of B. So 821 01:02:49,680 --> 01:02:52,470 if you don't care about polynomial factors in delta, 822 01:02:52,470 --> 01:02:55,000 then you see that-- 823 01:02:58,080 --> 01:03:00,140 the point is there's a large fraction of-- 824 01:03:02,418 --> 01:03:03,460 there are a lot of paths. 825 01:03:03,460 --> 01:03:07,000 So there are a lot of paths between each little a and each 826 01:03:07,000 --> 01:03:09,670 little b by the construction we've done. 827 01:03:15,190 --> 01:03:16,630 So let me just do a recap. 828 01:03:16,630 --> 01:03:19,990 So there were quite a few details in this proof, 829 01:03:19,990 --> 01:03:22,300 and some of them have to do with cleaning up. 830 01:03:22,300 --> 01:03:24,640 Because it's not so nice to work with graphs 831 01:03:24,640 --> 01:03:26,850 that just have large average degree. 832 01:03:26,850 --> 01:03:28,480 It's much nicer to work with graphs 833 01:03:28,480 --> 01:03:29,920 with large minimum degree. 834 01:03:29,920 --> 01:03:33,580 So there are a couple of steps here to take care of vertices 835 01:03:33,580 --> 01:03:34,990 with small degrees.
836 01:03:34,990 --> 01:03:39,290 So we started with, between A and B, lots of edges. 837 01:03:39,290 --> 01:03:42,410 And we trim vertices from A with small degree. 838 01:03:42,410 --> 01:03:45,250 So we get A1. 839 01:03:45,250 --> 01:03:48,970 And then we apply the path of length 2 lemma to get A2. 840 01:03:48,970 --> 01:03:52,838 So inside A2, most pairs of vertices 841 01:03:52,838 --> 01:03:54,630 have lots of common neighbors, but not all. 842 01:03:57,510 --> 01:04:01,860 We then go back to B to get B1, which 843 01:04:01,860 --> 01:04:04,940 has large minimum degree to A2. 844 01:04:07,650 --> 01:04:12,030 And then A3 looks at vertices in A2 845 01:04:12,030 --> 01:04:16,490 with many friendly companions in A2. 846 01:04:20,200 --> 01:04:24,100 And A3 is large, and I claim that between every vertex in A3 847 01:04:24,100 --> 01:04:28,120 and every vertex in B1, you have many paths of length 3. 848 01:04:28,120 --> 01:04:32,340 Because if you start with a vertex in A3, 849 01:04:32,340 --> 01:04:35,430 it has many friendly companions. 850 01:04:35,430 --> 01:04:41,640 So many here means at least 1 minus delta over 5 fraction. 851 01:04:41,640 --> 01:04:49,470 Whereas every vertex in B1 has lots of neighbors in A2, 852 01:04:49,470 --> 01:04:53,430 where lots means at least delta over 4. 853 01:04:53,430 --> 01:04:56,610 So there's necessarily an overlap 854 01:04:56,610 --> 01:04:59,230 of at least delta over 20. 855 01:04:59,230 --> 01:05:01,750 And for that overlap, we can create 856 01:05:01,750 --> 01:05:07,510 lots of paths going through this overlap from a 857 01:05:07,510 --> 01:05:12,947 to b. Any questions? 858 01:05:16,220 --> 01:05:16,780 OK, great. 859 01:05:16,780 --> 01:05:21,940 So let's put everything together to prove the graphical version 860 01:05:21,940 --> 01:05:23,704 of Balog-Szemeredi-Gowers. 861 01:05:31,040 --> 01:05:32,730 So we'll prove the graphical version 862 01:05:32,730 --> 01:05:34,452 of Balog-Szemeredi-Gowers.
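The counting at the end of the path of length 3 proof fits in two lines. This is a sketch with symbols introduced only for illustration: N(b) is the neighborhood of b, F(a) is the set of vertices of A2 that are friendly to a, and tau |B| stands for the common-neighbor threshold in the definition of friendly, which is polynomial in delta.

```latex
|N(b) \cap F(a)| \;\ge\; \frac{\delta}{4}\,|A_2| - \frac{\delta}{5}\,|A_2| \;=\; \frac{\delta}{20}\,|A_2|,
\qquad
\#\{\text{paths of length 3 from } a \text{ to } b\} \;\ge\; \frac{\delta}{20}\,|A_2| \cdot \tau\,|B| .
```

The first bound holds because at most a delta/5 fraction of A2 is unfriendly to a, while b has at least a delta/4 fraction of A2 as neighbors. Since the size of A2 is itself at least a polynomial-in-delta fraction of the size of A, the path count is a polynomial-in-delta fraction of |A||B|.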
863 01:05:42,660 --> 01:05:46,920 So, first, note that the hypothesis 864 01:05:46,920 --> 01:05:49,450 of Balog-Szemeredi-Gowers already 865 01:05:49,450 --> 01:05:53,730 implies that the size of A and the size of B 866 01:05:53,730 --> 01:05:55,650 are not too small. 867 01:05:58,910 --> 01:06:03,287 Because, otherwise, you couldn't have had n squared over k edges 868 01:06:03,287 --> 01:06:03,870 to begin with. 869 01:06:08,060 --> 01:06:16,610 So by the path of length 3 lemma, 870 01:06:16,610 --> 01:06:20,375 there exists A prime in A and B prime 871 01:06:20,375 --> 01:06:24,980 in B with the following properties. 872 01:06:24,980 --> 01:06:29,240 That A prime has a large fraction of-- 873 01:06:32,306 --> 01:06:36,110 so A prime and B prime are both large in size. 874 01:06:40,460 --> 01:06:46,665 And for all vertices a in A prime 875 01:06:46,665 --> 01:06:54,090 and vertices b in B prime, there are 876 01:06:54,090 --> 01:06:59,040 lots of paths of length 3 between these vertices. 877 01:06:59,040 --> 01:07:05,010 So there are at least k 878 01:07:05,010 --> 01:07:10,950 to the minus big O1 times n squared 879 01:07:10,950 --> 01:07:18,050 pairs of intermediate vertices a1, 880 01:07:18,050 --> 01:07:37,760 b1 in A cross B, such that a b1 a1 b is a path in G. 881 01:07:37,760 --> 01:07:41,690 So let me draw the situation for you. 882 01:07:48,920 --> 01:08:00,210 So we have A and B. And so inside A and B, 883 01:08:00,210 --> 01:08:08,100 we have this fairly large A prime 884 01:08:08,100 --> 01:08:14,026 and B prime, such that for every little a 885 01:08:14,026 --> 01:08:23,470 and little b, there are many paths like that 886 01:08:23,470 --> 01:08:26,890 going through b1 and a1. 887 01:08:30,439 --> 01:08:44,270 So let me set x to be a plus b1, that sum, 888 01:08:44,270 --> 01:08:52,240 y to be a1 plus b1, and z to be a1 plus b.
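To make the three sums concrete, here is a minimal Python sketch on a toy example. It enumerates the pairs (a1, b1) for which a, b1, a1, b is a path, forms x, y, z as just defined, and checks that each lies in the restricted sumset and that x - y + z = a + b (the b1 and a1 terms cancel). The function names, the toy graph on the integers 0 to 5 (edges join u and v when u + v is even), and the choice a = 0, b = 2 are all assumptions made for illustration; the sketch also counts walks, so a1 = a or b1 = b is allowed.

```python
from itertools import product

def restricted_sumset(edges):
    """All sums u + v taken over edges (u, v) of the bipartite graph."""
    return {u + v for u, v in edges}

def three_step_representations(edges, a, b):
    """Triples (x, y, z) = (a + b1, a1 + b1, a1 + b), one for each pair
    (a1, b1) such that a - b1 - a1 - b is a walk of length 3 in the graph."""
    a_side = sorted({u for u, _ in edges})
    b_side = sorted({v for _, v in edges})
    reps = []
    for a1, b1 in product(a_side, b_side):
        if (a, b1) in edges and (a1, b1) in edges and (a1, b) in edges:
            reps.append((a + b1, a1 + b1, a1 + b))
    return reps

# Toy bipartite graph (hypothetical): A = B = {0, ..., 5}, edge when u + v is even.
A = B = range(6)
edges = {(u, v) for u in A for v in B if (u + v) % 2 == 0}

S = restricted_sumset(edges)
a, b = 0, 2
reps = three_step_representations(edges, a, b)

# Each of x, y, z is the sum across an edge, so it lies in the restricted sumset.
assert all(x in S and y in S and z in S for x, y, z in reps)
# Following the path: x - y + z = (a + b1) - (a1 + b1) + (a1 + b) = a + b.
assert all(x - y + z == a + b for x, y, z in reps)
# Different (a1, b1) give different triples, since b1 = x - a and a1 = z - b.
assert len(set(reps)) == len(reps)
print(len(reps), "representations of", a + b)
```

The distinctness check at the end is what lets the number of paths be read as a number of different ways of writing a + b.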
889 01:09:04,460 --> 01:09:21,380 So now notice that we can write this a plus b in at least k 890 01:09:21,380 --> 01:09:27,350 to the minus big O1 times n squared ways 891 01:09:27,350 --> 01:09:39,410 as x minus y plus z by following this path, where x, y, 892 01:09:39,410 --> 01:09:45,500 and z all lie in the restricted sumset, 893 01:09:45,500 --> 01:09:49,350 because that's how the restricted sumset is defined. 894 01:09:49,350 --> 01:09:54,229 So if you have an edge, then the sum of the elements 895 01:09:54,229 --> 01:09:57,400 across on the two ends, by definition, 896 01:09:57,400 --> 01:10:00,970 lies in the restricted sumset. 897 01:10:00,970 --> 01:10:02,900 So the path of length 3 lemma tells us 898 01:10:02,900 --> 01:10:06,320 that every pair a and b, their sum 899 01:10:06,320 --> 01:10:12,290 can be written in many different ways as this combination. 900 01:10:12,290 --> 01:10:21,360 As a result, we see that A prime plus B prime-- 901 01:10:21,360 --> 01:10:26,477 so this sum, if we consider sum along with its multiplicity-- 902 01:10:31,450 --> 01:10:36,760 so now we're really looking at all the different sums as well 903 01:10:36,760 --> 01:10:43,890 as ways of writing the sum as this combination-- 904 01:10:43,890 --> 01:10:58,060 we see that it is bounded above by the restricted sumset raised 905 01:10:58,060 --> 01:10:58,920 to the third power. 906 01:11:11,295 --> 01:11:13,740 Because each of these choices, x, y, and z, 907 01:11:13,740 --> 01:11:16,330 they come from the restricted sumset. 908 01:11:16,330 --> 01:11:19,150 But the hypothesis of Balog-Szemeredi-Gowers, 909 01:11:19,150 --> 01:11:21,880 the graphical version, is that the restricted sumset 910 01:11:21,880 --> 01:11:24,630 is small in size. 911 01:11:24,630 --> 01:11:32,430 So we can now upper bound the restricted sumset 912 01:11:32,430 --> 01:11:36,580 by, basically, the-- 913 01:11:36,580 --> 01:11:41,840 within a constant, within a factor of the maximum possible. 
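Written out, the counting step is a double count of triples (x, y, z) in the cube of the restricted sumset. The sketch below uses K for the transcript's k and writes A +_G B for the restricted sumset, and it assumes the hypothesis of the graphical version has the form |A +_G B| <= Kn; that form is taken from the "within a factor of the maximum possible" description rather than restated here, so treat it as an assumption.

```latex
\[
  K^{-O(1)} n^{2} \cdot |A' + B'|
  \;\le\; \#\bigl\{ (x, y, z) \in (A +_G B)^{3} \bigr\}
  \;=\; |A +_G B|^{3}
  \;\le\; (K n)^{3},
\]
\[
  \text{hence}\qquad |A' + B'| \;\le\; K^{O(1)}\, n .
\]
```

The first inequality holds because every element s of A' + B' is x - y + z for at least K^{-O(1)} n^2 triples, and a triple determines s, so the sets of triples belonging to different sums are disjoint.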
914 01:11:41,840 --> 01:11:47,640 And now we are done, because we have deduced 915 01:11:47,640 --> 01:11:52,770 that the complete sumset between A prime and B prime 916 01:11:52,770 --> 01:12:02,320 is, at most, a constant factor, with a change 917 01:12:02,320 --> 01:12:04,310 in the constant by a polynomial. 918 01:12:04,310 --> 01:12:06,795 So a constant factor more than the maximum possible. 919 01:12:11,000 --> 01:12:14,490 So it's at most k to the big O1, that is, poly k, times n. 920 01:12:17,070 --> 01:12:20,310 So that proves the graphical version 921 01:12:20,310 --> 01:12:22,890 of Balog-Szemeredi-Gowers. 922 01:12:22,890 --> 01:12:26,070 And because we showed earlier that the graphical version 923 01:12:26,070 --> 01:12:28,560 of Balog-Szemeredi-Gowers implies Balog-Szemeredi-Gowers, 924 01:12:28,560 --> 01:12:32,170 this shows the Balog-Szemeredi-Gowers theorem. 925 01:12:32,170 --> 01:12:35,880 So let me recap some of the ideas we saw today. 926 01:12:35,880 --> 01:12:38,260 And so the whole point of Balog-Szemeredi-Gowers 927 01:12:38,260 --> 01:12:42,070 and all of these related lemmas and theorems and variations 928 01:12:42,070 --> 01:12:47,050 is that you start with something that has 929 01:12:47,050 --> 01:12:49,600 a lot of additive structure. 930 01:12:49,600 --> 01:12:54,440 Well, after we passed down to graphs, just a lot of edges. 931 01:12:54,440 --> 01:12:57,700 So you start with a situation where 932 01:12:57,700 --> 01:13:02,325 you have kind of 1% goodness. 933 01:13:02,325 --> 01:13:03,700 And you want to show that you can 934 01:13:03,700 --> 01:13:07,960 restrict to fairly large subsets, 935 01:13:07,960 --> 01:13:10,610 so that you have perfection. 936 01:13:10,610 --> 01:13:14,510 So you have complete goodness between these two sets. 937 01:13:14,510 --> 01:13:17,240 And this is what's going on in both the graphical version 938 01:13:17,240 --> 01:13:18,920 and the additive version.
939 01:13:18,920 --> 01:13:21,560 So back to the path of length 3 lemma for graphs. 940 01:13:21,560 --> 01:13:25,820 So we were able to boost the path of length 2 lemma, which 941 01:13:25,820 --> 01:13:31,400 tells us something about 99% of the pairs having lots 942 01:13:31,400 --> 01:13:37,160 of common neighbors, to 100% of the pairs having 943 01:13:37,160 --> 01:13:41,010 lots of paths of length 3. 944 01:13:41,010 --> 01:13:43,380 And in the additive setting, we saw that 945 01:13:43,380 --> 01:13:47,880 by starting with a situation where the hypothesis is 946 01:13:47,880 --> 01:13:51,210 somewhat patchy, so like a 1% type hypothesis, 947 01:13:51,210 --> 01:13:54,750 we can pass down to fairly large sets: 948 01:13:54,750 --> 01:13:58,590 starting with just the restricted sumset 949 01:13:58,590 --> 01:14:01,110 being small, we can pass down to large sets 950 01:14:01,110 --> 01:14:03,540 where the complete sumset is small. 951 01:14:03,540 --> 01:14:05,730 And this is an important principle, that, often, 952 01:14:05,730 --> 01:14:11,210 when we have some typicality, by an appropriate argument-- 953 01:14:11,210 --> 01:14:14,300 and, here, it's not at all a trivial argument. 954 01:14:14,300 --> 01:14:15,890 So there's some cleverness involved, 955 01:14:15,890 --> 01:14:18,120 that by doing some kind of argument, 956 01:14:18,120 --> 01:14:21,950 we may be able to pass down to some fairly large set 957 01:14:21,950 --> 01:14:25,130 where it's not just typically good, but everything's 958 01:14:25,130 --> 01:14:27,020 perfectly good. 959 01:14:27,020 --> 01:14:31,820 That's the spirit here of the Balog-Szemeredi-Gowers theorem. 960 01:14:31,820 --> 01:14:35,010 So, next time, for the last lecture of this course, 961 01:14:35,010 --> 01:14:38,600 I will tell you about the sum-product problem, 962 01:14:38,600 --> 01:14:40,802 where 963 01:14:40,802 --> 01:14:44,500 there are also some very nice graph theoretic inputs.