YUFEI ZHAO: For the past couple of lectures, we've been talking about Roth's theorem. We saw a proof of Roth's theorem using Fourier analytic methods, and we saw basically the same proof in two different settings. Two lectures ago, we saw a proof in F3 to the n. And with basically the same strategy, but a bit more work, we were able to prove Roth's theorem with roughly comparable bounds over the integers.

Today, I want to show you a very different kind of proof of Roth's theorem in the finite field setting. First, let me remind you: the bound we saw last time for Roth's theorem in F3 to the n gave an upper bound on the maximum number of elements of a 3-AP-free set of the form 3 to the n over n. That proof wasn't too bad--we did it in one lecture. Then, with a lot more work--and people tried very, very hard to improve this--there was a paper that got it to just a little bit more. That was a lot of work, and it was something people found very exciting at the time.

And then, just a few years ago, there was a major breakthrough, a very surprising breakthrough. At that point, it wasn't even clear whether 3 should be the right base for this exponent; that was a big open problem. And then came a big breakthrough where the following bound was proved: exponentially less than the previous bound. This is what I want to talk about in the first part of today's lecture.

The history here is a bit interesting. Croot, Lev, and Pach uploaded a paper to the arXiv on May 5, 2016, where they showed not exactly this theorem, but a version in a slightly different setting: in the group Z mod 4 to the n instead of Z mod 3 to the n. That was already quite exciting--an exponential improvement in that setting--but it wasn't exactly obvious how to use their method to get F3 to the n. That was done about a week later.
Ellenberg and Gijswijt managed to modify the Croot-Lev-Pach technique to the F3 to the n setting, which is the one we've been interested in. There's a small difference between the two settings: namely, Z mod 4 to the n has elements of order 2, which makes things a bit easier to do there.

So this is the Croot-Lev-Pach method, as it's often called in the literature. We'll see that it's a very ingenious use of the so-called linear algebraic method in combinatorics--in this case, the polynomial method. And it works specifically in the finite field vector space setting. What we're talking about in this part of the lecture does not translate whatsoever--at least, nobody knows how to translate this technique--to the integer setting.

So how does it work? The presentation I'm going to give follows not the original paper--which is quite nice to read, by the way; it's only about four pages long and pleasant to read--but a slightly even nicer formulation on Terry Tao's blog. That's the one I'm presenting.

The idea is this. Suppose you have a subset A of F3 to the n that is 3-AP-free. Such a set also has a name, cap set, which is used in the literature in this specific setting where you have no three points on a line. In that case, we have the following identity, where delta sub a is the Dirac delta: delta sub a of x is 1 if x equals a, and 0 otherwise. The identity simply rewrites the fact that x, y, z form a 3-AP if and only if their sum equals 0 (in F3, minus 2 equals 1). And because A is 3-AP-free, the only 3-APs are the trivial ones, recorded on the right-hand side. So it is simply a rewording of the statement that A is 3-AP-free.

The idea now is that you have this expression up there, and I want to show that if A is very, very large, then I get a contradiction by considering some notion of rank.
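Written out--this is a reconstruction of the board identity, following the formulation on Tao's blog--it reads:

\[
\delta_0(x+y+z) \;=\; \sum_{a \in A} \delta_a(x)\,\delta_a(y)\,\delta_a(z) \qquad \text{for all } x, y, z \in A,
\]

since the only solutions to x + y + z = 0 with x, y, z in A are the trivial ones x = y = z.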
We will show that the left-hand side is, in some sense, low rank--well, I haven't told you what rank means yet--while the right-hand side is a high-rank object.

So what does rank mean? Recall from linear algebra: the classical notion of rank is for two-variable functions. You should think of such an F as a matrix over an arbitrary field F. Such a function, or the corresponding matrix, is called rank 1 if it is nonzero and can be written in the following form: F of (x, y) equals f of x times g of y, for some functions f and g of one variable each. In matrix language, this is a column vector times a row vector. So that's the meaning of rank 1. And the rank of F is defined to be the minimum number of rank 1 functions needed to write F as a sum, a linear combination. So this is rank 1, and if you add up r rank 1 functions, you get something of rank at most r. That's the basic definition of rank from linear algebra.

For three-variable functions, you can come up with other notions of rank. So what about three-variable functions--how do we define the rank of such a function? You might have seen such objects, as generalizations of matrices, called tensors. And tensors already have a natural notion of rank, called tensor rank. Just as F here is rank 1 if it decomposes like that, we say F has tensor rank 1 if the three-variable function decomposes as a product of one-variable functions.

Tensor rank, it turns out, is an important notion, and actually quite a mysterious one: there are a lot of important problems that boil down to us not really understanding how tensor rank behaves. And it turns out it is not the right notion to use for our problem. So we're going to use a different notion of rank.
Here, tensor rank 1 means decomposing the three-variable function into a product of three one-variable functions. Instead, I can define a different notion. We say that F has slice rank 1--this is a definition introduced in the context of this problem, although it's also quite a natural definition--if it is nonzero and has one of the following forms: a product of a one-variable function and a two-variable function, that is, one variable paired with the remaining two variables. The definition should be symmetric in the variables, so the other combinations are OK as well.

And, just as earlier, we define the slice rank of F to be the minimum number of slice rank 1 functions needed to write F as a sum. I can decompose F into a sum of slice rank 1 functions; what's the most efficient way to do so? That's the definition of slice rank. And, you see, you can make this definition for any number of variables: slice rank 1 means a decomposition into a product of two functions, where one function takes a single variable and the other function takes all the remaining variables. And, therefore, for two-variable functions, slice rank and rank are the same notion.

Any questions so far?

All right. So let's look at the function on the right. Think of it as a matrix--a tensor. What is it? Well, it's kind of like a diagonal matrix. That's what it is: a diagonal matrix. So what is the rank of a diagonal matrix--in this case, a diagonal function? You know from linear algebra that the rank of a diagonal matrix is the number of nonzero diagonal entries. Something similar is true for slice rank, although it's less obvious. It will require a proof.
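In symbols--reconstructing the board--the two notions for a three-variable function F are:

\[
\text{tensor rank 1:}\quad F(x,y,z) = f(x)\,g(y)\,h(z);
\]
\[
\text{slice rank 1:}\quad F(x,y,z) = f(x)\,g(y,z), \quad f(y)\,g(x,z), \quad \text{or} \quad f(z)\,g(x,y).
\]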
So suppose I have the three-variable function defined by the following formula: F of (x, y, z) is the sum over a in A of c sub a times delta sub a of x, delta sub a of y, delta sub a of z. In other words, it's a diagonal function whose diagonal entries are the c sub a's. So what is the slice rank of this F? In the matrix case, it would be the number of nonzero entries, and it's exactly the same here: the slice rank turns out to be the number of nonzero diagonal entries.

Let's see a proof. Go back to the definition of slice rank. One of the two directions--less than or equal to, or greater than or equal to--is easy. Which one? Well, the right-hand side is visibly a sum of that many slice rank 1 functions. So the less-than-or-equal-to direction is clear, just looking at the definition: I can write F explicitly as that many slice rank 1 functions.

The tricky part is greater than or equal to. For that direction, let's assume all the diagonal entries are nonzero. Why can we do this? If some c sub a is 0, then I remove a from the set, and doing so cannot increase the rank. A priori, the rank might go down if you get rid of an element. Because if you add an element, even though the function doesn't change on the original set, you have more space, more flexibility to work with. But, certainly, if you remove an element, the rank cannot go up.

Now, suppose the slice rank of F is strictly less than the size of A, with all the c sub a's nonzero. So suppose, for contradiction, that there is some different way to write the function F that uses fewer terms. What would such a sum look like? I would be able to write F as a sum of slice rank 1 terms of the three types, using the different combinations of the variables.
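Concretely--reconstructing the board, with indexing chosen to match what follows--the assumed decomposition is

\[
F(x,y,z) \;=\; \sum_{i=1}^{m_1} f_i(x)\,g_i(y,z) \;+\; \sum_{i=m_1+1}^{m} f_i(y)\,g_i(x,z) \;+\; \sum_{i=m+1}^{|A|-1} f_i(z)\,g_i(x,y),
\]

with |A| - 1 terms in total, of which the last group--the terms sliced in z--is indexed by i = m+1, ..., |A|-1.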
So suppose there were such a way to write this function F using fewer terms; I'll assume it uses exactly the size of A minus 1 terms, padding with zero functions if you like.

So now I claim that there exists a function h on the set A whose support--the support being the set of entries where the function takes nonzero values--has size bigger than m, such that the following sum is 0. So I claim that we can find a function h that is, so to speak, in the kernel of some of these f's. This is a linear algebraic statement. Yes?

AUDIENCE: What is h sub [INAUDIBLE]?

YUFEI ZHAO: Ah, sorry. It's just h. Thank you. It's a single function h such that this equation is true for all x.

AUDIENCE: [INAUDIBLE] h of x minus the sum of all [INAUDIBLE].

YUFEI ZHAO: You are right. So what do I want to say here? We want to find a function h such that the support of h has size at least m plus 1. And, you're right, this is not quite what I wrote--let me fix the condition.

AUDIENCE: [INAUDIBLE].

YUFEI ZHAO: I'm sorry?

AUDIENCE: [INAUDIBLE].

YUFEI ZHAO: No--there's no induction, because I'm in three variables, and I want to get rid of one of them. So let's see where we're going eventually, and then we'll figure out what happened up there. I would like, eventually, to consider the following sum.
Wait--that's not quite the sum I want to consider. So take that F up there, and let me consider, basically, the inner product of this function, viewed as a function of z, with h. So consider this inner product. What I want to say is this: take one of these f's--one of the f's from the last group--and look at the bilinear pairing of it with h. I want to show that this sum vanishes for all i between m plus 1 and the size of A minus 1. So I want that whole row to vanish when paired with h. That makes sense now. OK, good.

The fact that such a nonzero h exists is simply a matter of counting parameters; it's a linear algebraic statement. You have some number of degrees of freedom, and you have some number of constraints. The set of h satisfying all of these constraints is a linear subspace of dimension bigger than m: I have size-of-A many dimensions, there are size of A minus 1 minus m constraints, and each constraint can cut the dimension down by at most one. So there are a lot of possibilities for h.

Furthermore--and this is another linear algebraic statement--every subspace of dimension m plus 1 of functions on A contains a vector whose support has size at least m plus 1. I'll leave this as a linear algebra exercise. It's not entirely obvious, but it is true. Putting these two statements together--thinking of the coordinates of the vectors as indexed by the set A--we find that there is some vector h whose support is large enough.

So we've proved the claim. Let's go back to the lemma about the diagonal function having high rank, and take h from the claim.
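Assembled, the claim reads as follows (a reconstruction consistent with the counts above): there exists h from A to the field with

\[
|\mathrm{supp}(h)| \ge m+1 \qquad\text{and}\qquad \sum_{z \in A} f_i(z)\,h(z) = 0 \quad\text{for all } i = m+1, \dots, |A|-1.
\]

The |A| - 1 - m linear constraints cut out a subspace of dimension at least |A| - (|A| - 1 - m) = m + 1, which by the exercise contains a vector of support at least m + 1.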
Then let's consider this sum over here: the sum over z in A of F of (x, y, z) times h of z. On one hand, you can compute the sum using the right-hand side--it's like multiplying a diagonal matrix by a vector. What you get, following the formula on the right-hand side, is--let me rewrite this part--the sum over a of c sub a, h of a, delta sub a of x, delta sub a of y. Just looking at the formula from the right-hand side.

On the other hand, if you had the decomposition up there, doing this sum and noting the claim, we see that the third row is gone. What you would have is a sum of terms of the form f1 of x times g tilde 1 of y, where g tilde 1 is basically the inner product of g1, as a function of z, with h--and so on--together with the analogous terms with the roles of x and y swapped. So there exist some functions g tilde, coming from the g's up there, such that this identity holds.

But now we're in the world of two-variable functions: the left-hand side and right-hand side are both two-variable functions. And for two-variable functions, you understand the rank of a diagonal function. The left-hand side has more than m nonzero diagonal entries, because the number of nonzero diagonal entries is just the size of the support of h. Whereas the right-hand side has rank--now ordinary matrix rank--at most m. And that's a contradiction. Yes?

AUDIENCE: So can you show a similar statement where [INAUDIBLE]?

YUFEI ZHAO: Great. So we can show a similar statement for an arbitrary number of variables, by generalizing this proof and using induction on the number of variables. But we only need three variables for now.

Any questions? Just to recap: what we proved is a generalization of the statement that a diagonal matrix has rank equal to the number of nonzero diagonal entries.
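In formulas, the two evaluations of the same sum (again reconstructed) are

\[
\sum_{z\in A} F(x,y,z)\,h(z) \;=\; \sum_{a\in A} c_a\,h(a)\,\delta_a(x)\,\delta_a(y) \;=\; \sum_{i=1}^{m_1} f_i(x)\,\tilde g_i(y) \;+\; \sum_{i=m_1+1}^{m} f_i(y)\,\tilde g_i(x),
\]

where \(\tilde g_i\) is the pairing of \(g_i\) with h in the z variable. The middle expression is diagonal with |supp(h)| > m nonzero entries, while the right-hand side is a matrix of rank at most m.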
But the same fact is true for these three-variable functions with respect to slice rank. So this is intuitively obvious, but the execution is slightly tricky.

All right. So now we have that statement. Let's proceed to analyze the function coming from the relationship up there, from a set A that is 3-AP-free. Everything so far worked for a general A over a general field, but now let me think specifically about functions on the finite field vector space F3 to the n, taking values in F3. And the function in question is defined to be the left-hand side of that equation over there.

The claim is that the left-hand side has low rank: we claim that the slice rank of this function is at most 3M, where M is the sum of, essentially, a multinomial coefficient. We'll analyze this number in a second, but it is supposed to be small.

So we want to show that this function has small slice rank. Let's rewrite it explicitly as a sum of products, by expanding after putting it in a slightly different form. In F3--in characteristic 3--you have the equation delta 0 of w equals 1 minus w squared; you can check that it's true for w equal to 0, 1, or 2. So take that and plug it in over here. Now x, y, z are in F3 to the n, and, applying this identity coordinate-wise, you get a product.

Great. Now let's pretend we're expanding everything. This is a polynomial in 3n variables, and its degree is 2n. So if we expand, we get a bunch of monomials of the following form: powers of the x's, whose exponents I call i; powers of the y's, whose exponents I call j; and powers of the z's, whose exponents I call k. So I get a sum of monomials like that, where all of the i's, j's, and k's are either 0, 1, or 2.
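Written out (a reconstruction of the expansion on the board):

\[
\delta_0(x+y+z) \;=\; \prod_{t=1}^{n}\bigl(1-(x_t+y_t+z_t)^2\bigr) \;=\; \sum c_{i,j,k}\; x_1^{i_1}\cdots x_n^{i_n}\; y_1^{j_1}\cdots y_n^{j_n}\; z_1^{k_1}\cdots z_n^{k_n},
\]

a polynomial of total degree at most 2n in which every exponent i_t, j_t, k_t lies in {0, 1, 2}.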
So I get this big sum of monomials, and I want to show that it's possible to write this sum as a small number of functions, each a product where one of the factors involves only one of x, y, z. So what we can do is group the monomials, using the following observation: since each monomial has total degree at most 2n, by pigeonhole, at least one of the degree in x, the degree in y, or the degree in z is at most 2n over 3. So I group the monomials by whichever of x, y, z has the smallest degree.

The contribution to the slice rank from the monomials with x-degree at most 2n over 3 can be written in a form like that: f of x times g of (y, z), where f of x is a single monomial and g is the sum of whatever can accompany it. This g is a sum, but this f is a monomial. So the number of such terms is the number of monomials in x: the number of choices of i's summing to at most 2n over 3, with the individual i's coming from 0, 1, or 2. And that number is precisely M. So M counts the number of choices of 0's, 1's, and 2's--there are n of them--whose sum is at most 2n over 3.

So these are the contributions coming from monomials where the degree of x is at most 2n over 3. And, similarly, for the monomials with y-degree at most 2n over 3, and also those with z-degree at most 2n over 3. All the monomials fall into one of these three groups, and I add up the contributions to the slice rank.

AUDIENCE: Do we have a good idea as to how sharp this bound is?

YUFEI ZHAO: So the question is, do we have a good idea as to how sharp this bound is? That's a really good question. I don't know. Yes.

Great. So that finishes the proof of this lemma. So now we have these two lemmas. One of them tells me the slice rank of the right-hand side, which is the size of A.
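For the record, the count just described should be (reconstructing the board formula, with a coordinates of exponent 0, b of exponent 1, and c of exponent 2):

\[
M \;=\; \sum_{\substack{a+b+c=n \\ b+2c \,\le\, 2n/3}} \binom{n}{a,\,b,\,c},
\]

and the lemma says the slice rank of the left-hand side is at most 3M.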
Let's compare ranks--the slice rank. The left-hand side, we know, is at most this quantity, 3M. And the right-hand side is equal to the size of A. So we automatically get the bound: the size of A is at most 3M.

So now we want to know how big this number M is. This is a fairly standard problem, to estimate the growth of this function M, so let me show you how to do it; this is basically the universal method. Notice that if x is some real number strictly between 0 and 1, then I claim the following is true: M times x to the 2n over 3 is at most (1 plus x plus x squared) to the n. And this is because if you expand the right-hand side and keep track of which monomials occur, there are M of them that you can lower bound by this quantity here. This is kind of related to things in probability theory on large deviations, to Cramér's theorem. But that's what you can do.

This is true for every such value of x, so you pick the one that gives you the best bound: M is at most the infimum over x in (0, 1) of x to the minus 2n over 3 times (1 plus x plus x squared) to the n. And to show you a concrete bound, I just have to plug in some value. If I plug in, for example, x equal to 0.6, I already get the bound I claimed--roughly 2.756 to the n.

And it turns out this step here is not lossy: basically, up to a 1 plus little o of 1 factor in the exponent, this is the correct bound. That follows from general results in large deviation theory. And that finishes the proof. Alternatively, you can also estimate M using Stirling's formula. But this, I think, is cleaner.
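As a quick numerical sanity check--my own illustration, not part of the lecture, with made-up function names--one can compute M exactly for small n and compare it against the analytic bound at x = 0.6:

```python
from math import factorial

def M(n):
    # Count exponent patterns (i_1, ..., i_n) with each i_t in {0, 1, 2}
    # and i_1 + ... + i_n <= 2n/3.  Grouping by a zeros, b ones, c twos,
    # this is the sum of n!/(a! b! c!) over a + b + c = n, b + 2c <= 2n/3.
    total = 0
    for b in range(n + 1):
        for c in range(n - b + 1):
            if b + 2 * c <= 2 * n / 3:
                a = n - b - c
                total += factorial(n) // (factorial(a) * factorial(b) * factorial(c))
    return total

def upper_bound(n, x=0.6):
    # From M * x^(2n/3) <= (1 + x + x^2)^n for any 0 < x < 1.
    return x ** (-2 * n / 3) * (1 + x + x * x) ** n

for n in (9, 30, 90):
    m = M(n)
    print(n, m, round(upper_bound(n)), round(m ** (1 / n), 4))
# The last column, M^(1/n), creeps up toward roughly 2.755 as n grows,
# matching the claimed |A| <= 3M = O(2.756^n) bound.
```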
Great. Any questions? Yes.

AUDIENCE: [INAUDIBLE].

YUFEI ZHAO: Ah, OK. So why is this step true? If you expand the right-hand side, you see that it is lower bounded by the sum, over the same a, b, c as over here, of the multinomial coefficient times x to the b plus 2c. So, basically, I'm doing the multinomial expansion, except I toss out every term that is not part of the index set. And because b plus 2c is at most 2n over 3 and x is less than 1, I get at least M times x to the 2n over 3. OK?

AUDIENCE: Yes.

YUFEI ZHAO: Now I want to convey a sense of mystique about this proof. This is a really cool proof. Because you're seeing it in a lecture, maybe it went by very quickly. But when this proof came out, people were very shocked. They didn't expect that this problem would be solved using a method so unexpected. And this is part of the power of the algebraic method in combinatorics: we often end up with these short, surprising proofs that take a very long time to find but turn out to be very short--this was basically a four-page paper. When these methods work, they work beautifully; they work like magic. But it's hard to predict when they work.

And, also, these methods are somewhat fragile. Unlike the Fourier analytic method that we saw last time--that method is very analytic; it works in one situation, and you can play with it, massage it, make it work in a different situation--here we're using something very implicit, something very special about this setting with many variables. And if you try to tweak the problem just a little bit, the method seems to break down. So, in particular, it is open how to extend this method to other settings; it's not even clear what the results should be. For example, it's open to extend it to 4-APs: we do not know whether the maximum size of a 4-AP-free subset of F5 to the n is less than some constant times 4.99 to the n. That's very much open.

By the way, all of this 3-AP material I've done only in F3, but it works for 3-APs in any finite field. It is also open to extend the method to corners. You can define a notion of corners: previously, we saw corners in the integer grid.
If you replace the integers by some other group, you can define a notion of corners there. And it's not clear how to extend this method to corners. Also: is there some way to extend ideas from this method to the integers? It completely fails--it's not clear at all how you might make it work in a setting where you don't have this high dimensionality. I mean, the result would be different, because for the integers we know that there is no power saving, but maybe you could get some other bounds.

Any questions? OK, great. Let's take a break.

So, in the first part of today's lecture, I showed you a proof of Roth's theorem in F3 to the n that gave a much better bound than what we did with Fourier analysis. In the second part, I want to show you another proof--yet another proof of Roth's theorem in F3 to the n--and this time giving a much worse bound. But, of course, I do this for a reason. It will give you a new result: some more information about 3-APs in F3 to the n. But the more important reason is that in this course I try to make some connections between graph theory on one hand and additive combinatorics on the other hand. And, so far, we've seen some analogies. In the proof of Szemeredi's graph regularity lemma versus the Fourier analytic proof of Roth's theorem, there was this common theme of structure versus pseudorandomness. But the actual executions of the proofs are somewhat different. On one hand, in the regularity lemma, you have partitioning and energy increment. On the other hand, with Roth, you have density increment. You're not partitioning; you're zooming in. Take a set, find some structure, zoom in; find some structure, zoom in--you get a density increment. So it's similar, but differently executed.
So, in this second half, I want to show you how to do a different proof of Roth's theorem, one much more closely related to the regularity proof--one that has this energy increment element to it. And I show you this proof because it also gives a stronger consequence: namely, we'll get not just 3-APs but 3-APs with a popular difference.

So here's the result that we'll see today, proved by Ben Green: for every epsilon, there exists some n0 such that, for every n at least n0, every subset A of F3 to the n with density alpha has some nonzero y such that the number of 3-APs with common difference y is large.

So let's think about what's going on here. If I just give you a set A and ask you how many 3-APs there are, and compare it to what you get from random--random meaning A is a random set of the same density--the question is: can the number of 3-APs be less than the random count? And the answer is yes. For example, in the integers, you can have a Behrend-type construction that has no 3-APs at all; certainly, that's fewer 3-APs than random. And you can do similar things here. But what Green's theorem says is that there exists some popular common difference, such that the number of 3-APs in A with this common difference is at least as much as what you should expect in a random setting, up to a minus epsilon.

So this is the theorem. Let me say the intuition again: given an arbitrary set A, provided the dimension of the space is large enough, there exists some popular common difference, where popular means that the number of 3-APs with that common difference is at least roughly as many as random.
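In symbols, the statement being described (Green's popular difference theorem in F3 to the n, reconstructed from the discussion above):

\[
\forall\, \epsilon > 0 \;\exists\, n_0:\; n \ge n_0,\ A \subseteq \mathbb{F}_3^n,\ |A| = \alpha 3^n \;\Longrightarrow\; \exists\, y \ne 0:\; \#\{x : x,\, x+y,\, x+2y \in A\} \;\ge\; (\alpha^3 - \epsilon)\,3^n,
\]

where alpha cubed times 3 to the n is the count a random set of density alpha would give for a fixed common difference.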
In particular, this proves Roth's theorem, because you have at least some 3-APs. But it tells you more. It tells you there's some common difference that has a lot of 3-APs--even though, on average, if you just take a random y, this is false.

Any questions about the statement?

So Green developed an arithmetic analog of Szemeredi's graph regularity lemma in order to prove this theorem. Starting with Szemeredi's graph regularity lemma, he found a way to import that technique into the arithmetic setting, into F3 to the n. So I want to show you, roughly, how this is done. And just as Szemeredi's graph regularity lemma has unavoidable bounds of tower type, the same thing is true in the arithmetic setting: Green's proof shows that the theorem is true with n0 something like a tower of twos whose height is polynomial in 1 over epsilon--just like the regularity lemma for graphs.

So this was recently improved in a paper by Fox and Pham just a couple of years ago--and this is the proof that I will show you today--where you can take n0 to be slightly better, but still a tower: now a tower of height logarithmic in 1 over epsilon. So it's from a really, really big tower to a slightly less big tower. But, more importantly, they also showed that this is tight. You cannot do better: there exist sets A for which the theorem becomes false if you replace the big O in the tower height by a sufficiently small constant.

In many applications of the regularity lemma, the first proof, the one using regularity, gives a very poor bound, and subsequently there are other, better proofs that give non-tower-type bounds. But this is the first application we've seen where, it turns out, the regularity lemma gives the correct bound--you really need a tower-type bound. I mean, we know the regularity lemma itself needs tower-type bounds.
But it turns out this application also needs tower-type bounds. That's quite interesting. So, here, the use of regularity is really necessary, in this quantitative sense.

So let's see the proof. Let me first prove a slightly technical lemma about bounded increments. This corresponds to the statement that if you do energy increments, you cannot increase too many times--but in a slightly different form.

Suppose you have numbers alpha and epsilon bigger than 0, and a sequence a0, a1, a2, ... of numbers between 0 and 1 such that a0 is at least alpha cubed. Then there exists some k, at most log base 2 of 1 over epsilon, such that 2 a sub k minus a sub k plus 1 is at least alpha cubed minus epsilon. Don't worry about this particular form; we'll see shortly why we want something like that. The proof itself is very straightforward. Suppose the conclusion fails for k equals 0. Then a1 is bigger than 2 a0 minus alpha cubed plus epsilon, which--since a0 is at least alpha cubed--gives a lower bound on a1 of at least alpha cubed plus epsilon. And, likewise, if it fails for k equals 1, you get a lower bound on a2 that is 2 epsilon more. You keep iterating; you see the next increment is 4 epsilon, and so on--the increment doubles each time. So if the conclusion fails for more than this many iterations, you go above 1: a sub k is bigger than 1 once k is the ceiling of log base 2 of 1 over epsilon. And that is a contradiction to the hypothesis that the a's lie between 0 and 1.

So this is a small variation on the fact that you cannot increment too many times--each time, you go up by a bit. Here we save a little, because the number of iterations is now only logarithmic: the increment doubles each time.
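Stated cleanly (a reconstruction consistent with the argument just given):

\[
\textbf{Lemma.}\quad a_0, a_1, \ldots \in [0,1],\; a_0 \ge \alpha^3 \;\Longrightarrow\; \exists\, k \le \lceil \log_2(1/\epsilon) \rceil :\; 2a_k - a_{k+1} \ge \alpha^3 - \epsilon.
\]

If this failed for every such k, the excess \(e_k = a_k - \alpha^3\) would satisfy \(e_{k+1} > 2e_k + \epsilon\), hence \(e_k > (2^k - 1)\epsilon\), forcing \(a_k > 1\) as soon as \(2^k \ge 1/\epsilon + 1\).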
Now, if I give you a function f on F3 to the n, and U is a subspace--this notation means subspace--let me write f sub U for the function obtained by averaging f on each U-coset. You have some subspace; you partition your space into translates of that subspace, and you replace the value of f on each coset by its average on that coset. This is similar to what we did with graphons--stepping. You're averaging on each block.

So now let me prove something which is kind of like an arithmetic regularity lemma. This statement will be new to you, but it should look similar to some of the statements we've seen before in the course. And the statement is: for every epsilon, there exists some m, which is a function of epsilon--and, in fact, it will be bounded by a tower of height at most order log of 1 over epsilon--such that, for every function f on F3 to the n with values bounded between 0 and 1, there exist subspaces W and U, with W contained in U, where the codimension of W is at most m. You should think of these as the coarse partition and the fine partition in the graph regularity lemma, and the codimension corresponds to the number of pieces: 3 to the codimension is the number of cosets. So you have boundedly many parts, and you have two partitions.

And what I would like is for f to be pseudorandom after doing this partitioning, so to speak. This corresponds to the statement that, if I look at f minus f sub W, then the maximum Fourier coefficient is quite small--where quite small means at most epsilon over the size of U perp. And, also, there is this other condition, which tells you that the L3 norms of f sub U and f sub W are related in a particular way.
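A plausible rendering of the two conditions--the transcript leaves the precise inequalities on the board, so this is a reconstruction chosen to match the increment lemma and the application below:

\[
\max_{r}\,\bigl|\widehat{f - f_W}(r)\bigr| \;\le\; \frac{\epsilon}{|U^{\perp}|}, \qquad\qquad 2\,\|f_U\|_3^3 \;-\; \|f_W\|_3^3 \;\ge\; (\mathbb{E} f)^3 - \epsilon.
\]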
We haven't seen this last condition before. In fact, this inequality is quite ad hoc, tailored to the application to popular differences for 3-APs. But we have seen something similar, where this relationship is replaced by something that accounts for the difference between L2 norms. If you go back to your notes from when we discussed the regularity lemma in a more analytic fashion, we had exactly that. And when we discussed the strong regularity lemma, this condition roughly corresponds to the requirement that the edge densities in the fine partition and the coarse partition are roughly similar: when you do the further partitioning, you're not changing densities by very much.

So that's the arithmetic regularity lemma. And once you have the statement -- I think the hardest part is writing down the statement -- the proof itself is a follow-your-nose approach. You first define a sequence of epsilons: epsilon_0 is 1, and each epsilon_{k+1} is chosen much smaller, depending on epsilon_k and epsilon. Don't worry about the exact choice for now; you will see in a second why these numbers are chosen.

Let me write R_k for the set of r's -- these are characters -- such that the Fourier coefficient of f at r is at least epsilon_k in absolute value. The r's are supposed to identify how we're going to do the partitioning.

Now, the size of this R_k is bounded: I claim that R_k has size at most 1 over epsilon_k squared. That's because of Parseval's identity, which tells you that the sum of the squares of the Fourier coefficients equals the squared L2 norm of the function, which is at most 1. So the number of Fourier coefficients that exceed a given threshold cannot be too large.
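Concretely, the claim about the size of R_k is the standard Parseval count:
\[
\sum_{r \in \mathbb{F}_3^n} |\hat f(r)|^2 \;=\; \mathbb{E}_x\, |f(x)|^2 \;\le\; 1
\qquad\Longrightarrow\qquad
|R_k|\,\epsilon_k^2 \;\le\; 1
\quad\text{for}\quad
R_k = \{\, r : |\hat f(r)| \ge \epsilon_k \,\}.
\]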
Now let U_k be the subspace defined by taking the orthogonal complement of these r's, so U_k = R_k^⊥. And let's take alpha_k to be the L3 norm cubed of f_{U_k}, the function derived from averaging f along the cosets of U_k; so we are looking at the third moment of these coset densities.

To these alphas we can apply the bounded-increment lemma from earlier. In particular, alpha_0 is at least alpha cubed by convexity, where alpha is the density of f. So, by the previous lemma, there exists some k, at most log base 2 of 1 over epsilon, such that 2 alpha_k minus alpha_{k+1} is at least the density of f, cubed, minus epsilon.

So we find this k, and we have the desired bound from satisfying that inequality. This is the energy increment argument, basically the same argument as the one we did when we discussed the graph regularity lemma, but now presented in a slightly different form and a different order of logic. It's the same argument.

What we would still like to show is the pseudorandomness condition, about having small Fourier coefficients. So what's happening here with the Fourier coefficients? How is the Fourier coefficient of an averaged f related to that of the original f? That's what you want to understand up there. And it's not hard to analyze. If you average over a subspace -- U or W, either one -- then the Fourier coefficients of the averaged version are very much related to those of the original function. It turns out that if you take an r which is in the orthogonal complement of the subspace, then the Fourier coefficient doesn't change; and if r is not in the orthogonal complement, then the Fourier coefficient gets zeroed out. That's not too hard to check, and I urge you to think about it.
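Here is that check, with the same Fourier convention as above; averaging over U-cosets multiplies each Fourier coefficient by a character average over U:
\[
\widehat{f_U}(r)
\;=\; \mathbb{E}_x\, \mathbb{E}_{u \in U}\, f(x+u)\,\omega^{-r\cdot x}
\;=\; \hat f(r)\; \mathbb{E}_{u \in U}\, \omega^{\,r\cdot u}
\;=\;
\begin{cases}
\hat f(r) & \text{if } r \in U^\perp,\\
0 & \text{if } r \notin U^\perp,
\end{cases}
\]
since $\mathbb{E}_{u \in U}\,\omega^{\,r\cdot u}$ equals $1$ when $r \cdot u = 0$ for all $u \in U$, and vanishes otherwise.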
With that in mind, let's go back and verify the pseudorandomness condition. Consider the quantity measuring the largest Fourier coefficient of the difference between f and f_{U_{k+1}}. What U_{k+1} is doing is locating the possibly large Fourier coefficients and getting rid of them: we're zeroing out these large Fourier coefficients, so that the remaining Fourier coefficients are all quite small. Indeed, we chose the big R so that if your little r is not in big R, then the Fourier coefficient at r must be small; that's how we chose the big R. So we have that bound: every Fourier coefficient that survives in f minus f_{U_{k+1}} is less than epsilon_{k+1}. And by the definition of the epsilons, combined with the upper bound estimate on the size of R_k, this is at most epsilon over the size of U_k-perp. The point being, we have the pseudorandomness condition.
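In one chain: coefficients with r in $U_{k+1}^\perp$ cancel exactly, and any other r lies outside $R_{k+1}$. For the middle step one needs $\epsilon_{k+1}$ small enough; the precise definition was left on the board, and the choice $\epsilon_{k+1} = \epsilon \cdot 3^{-\lceil 1/\epsilon_k^2 \rceil}$ written below is an assumption, one choice that works:
\[
\max_r \bigl|\widehat{f - f_{U_{k+1}}}(r)\bigr|
\;\le\; \epsilon_{k+1}
\;=\; \epsilon\, 3^{-\lceil 1/\epsilon_k^2 \rceil}
\;\le\; \epsilon\, 3^{-|R_k|}
\;\le\; \frac{\epsilon}{|U_k^\perp|},
\]
using $|R_k| \le \lceil 1/\epsilon_k^2 \rceil$ and the fact that $U_k^\perp$ is spanned by $R_k$, so $|U_k^\perp| \le 3^{|R_k|}$.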
So now take W to be U_{k+1}, and U to be U_k, and then we have everything that we want.

Question, yes.

AUDIENCE: Why is the codimension of W small?

YUFEI ZHAO: The question is, why is the codimension of W small? So what is the codimension of W? We want to know that the codimension of W is bounded. Well, the codimension of any of these U_k's is at most the number of r's that produce it -- and 3 raised to that codimension is the number of cosets. And the size of each R_k is bounded. So if we pick m so that it uniformly bounds the sizes of the R_k's that arise, then we have a bound on the codimension. So that's important. We need to know that the codimension is small; otherwise, without the bound on the codimension, you could just take the zero subspace, and, trivially, everything would be true.

We have a regularity lemma, and what comes with a regularity lemma is a counting lemma. So let me write down the counting lemma, and I'll skip the proof.

The counting lemma tells you the following. Suppose f and g are both functions on F_3^n, and U is a subspace of F_3^n. The quantity I'm interested in is the density of 3-APs whose common difference lies in the particular subspace U. The claim is that this restricted 3-AP count is similar for f and g whenever f and g are close to each other in Fourier. Well, not quite: something like this we saw earlier, in the proof of Roth's theorem, when we don't restrict the common difference. It turns out that if you restrict the common difference, you lose a little bit -- a factor which is basically the size of U-perp. I won't prove that.
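Written out, with $\Lambda_3^U$ denoting the restricted 3-AP average, the statement takes the following shape; the explicit factor $3\,|U^\perp|$ is my reconstruction of "losing a factor of the size of U-perp", as the lecture did not pin down the constant:
\[
\Lambda_3^U(f) \;=\; \mathbb{E}_{\,x \in \mathbb{F}_3^n,\; d \in U\,} f(x)\, f(x+d)\, f(x+2d),
\qquad
\bigl|\Lambda_3^U(f) - \Lambda_3^U(g)\bigr|
\;\le\; 3\,|U^\perp|\, \max_r \bigl|\widehat{f - g}(r)\bigr|
\]
for all $f, g \colon \mathbb{F}_3^n \to [0,1]$.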
But now let me go on to the punch line. Suppose we start, again, with a function f on F_3^n taking values between 0 and 1, and I have subspaces W ≤ U. I claim that if I look at f averaged over W, and I consider the 3-AP count with common difference restricted to U, then this quantity is lower bounded by a difference of L3 norms: it is at least 2 times the L3 norm cubed of f_U, minus the L3 norm cubed of f_W.

So I claim this is true. And this is just some inequality. Of all the things that I did back in high school doing math competitions, the one skill I find most helpful now is being able to do inequalities. And I thought I would never see these three-variable inequalities again. But when Fox and Pham first showed me a somewhat different proof, an approach that didn't go through this specific inequality, I told them: hey, there's this thing I remember from high school. It's called Schur's inequality. I thought I would never see it again after high school, but apparently it's still useful.

So what Schur's inequality says -- this is one of those three-variable inequalities that you would know if you did math olympiads -- is that for non-negative real numbers a, b, c (there are versions for arbitrary reals, but let's stick with non-negative), a(a − b)(a − c) + b(b − a)(b − c) + c(c − a)(c − b) ≥ 0. So that's Schur's inequality.

Now look at the left-hand side of my claim. It can be written in the following way: it is the expectation, over x, y, z forming a 3-AP inside a common U-coset -- counting 3-APs with common difference restricted to U is the same as counting 3-APs within a U-coset -- of the product f_W(x) f_W(y) f_W(z). What I would like to do now is apply Schur's inequality with a, b, and c being these three numbers. The point is that you have the product abc on the left, and everything on the right involves only a subset of a, b, c, so those terms simplify. If I do this, then I lower bound this quantity by twice the expectation, over x and y in the same U-coset, of f_W(x) squared times f_W(y) -- maybe the pairs that come out are different ones, but they're all symmetric with respect to each other -- minus the term that corresponds to the sum of cubes, the expectation of f_W cubed. So this is a consequence of Schur's inequality applied with a, b, c like that.

But now, you see, I can analyze this expression even further. Because if I let y vary within the same U-coset, then the factor f_W(y) averages out over the U-coset. And W is contained in U, so averaging f_W over U-cosets gives f_U. So what we have is at least twice the expectation of f_W squared times f_U, minus the expectation of f_W cubed. And I can use convexity on f_W, within each U-coset, to get the claimed bound. So the last step is convexity.
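Here is the computation just described as one chain, with the expectations taken over x uniform and y uniform in x + U, and Schur's inequality applied with a = f_W(x), b = f_W(y), c = f_W(z) averaged over 3-APs (x, y, z) in a common U-coset:
\[
\Lambda_3^U(f_W)
\;\ge\; 2\,\mathbb{E}\bigl[f_W(x)^2 f_W(y)\bigr] - \mathbb{E}\bigl[f_W^3\bigr]
\;=\; 2\,\mathbb{E}\bigl[f_W^2\, f_U\bigr] - \|f_W\|_3^3
\;\ge\; 2\,\|f_U\|_3^3 - \|f_W\|_3^3.
\]
The middle equality holds because averaging $f_W(y)$ over $y \in x + U$ gives $f_U(x)$ (as $W \le U$), and the last step is Jensen within each $U$-coset: $\mathbb{E}[f_W^2 \mid \text{coset}] \ge f_U^2$.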
I'm running through this a little bit quickly because we're running out of time, but all of these steps are fairly simple once you observe that the first thing to do is apply Schur's inequality. And we're almost done.

From that lemma up there, I now claim that, for every epsilon, there exists some m, bounded by a tower of height logarithmic in 1 over epsilon, such that if f is a function on F_3^n taking values between 0 and 1, then there exists a subspace U of codimension at most m such that the 3-AP density of f with common difference restricted to U is at least the random bound -- the density of f, cubed -- minus epsilon.

Why is this true? We put everything together. Choose U and W as in the regularity lemma. By the counting lemma, the 3-AP density of f with common difference in U is at least the corresponding 3-AP density of f_W, minus a small error which we can control; this step is counting. Now we apply that inequality up there, the consequence of Schur. And, finally, we chose our U and W in the regularity lemma precisely so that this difference of L3 norms is controlled: it is at least the random bound minus epsilon. And that's it. Strictly speaking, you end up with 4 epsilon in place of epsilon, but we can rescale to change it back.

So we have the statement that there is a subspace of bounded codimension on which you have this popular difference result. It doesn't quite guarantee you a single popular common difference yet, because you don't want U to be just the zero subspace: I want a nonzero common difference. But at bounded codimension, if n is large enough, then the size of U is large enough, and then there exists some nonzero common difference that works. You pick some nonzero element of U; on average, this should work out just fine. I'll leave that detail to you.
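Putting the three displayed estimates together (the counting lemma with the pseudorandomness bound, then the Schur consequence, then the regularity condition on the L3 norms), the final chain, under the reconstructed constants above, reads:
\[
\Lambda_3^U(f)
\;\ge\; \Lambda_3^U(f_W) - 3\,|U^\perp| \cdot \frac{\epsilon}{|U^\perp|}
\;\ge\; 2\,\|f_U\|_3^3 - \|f_W\|_3^3 - 3\epsilon
\;\ge\; (\mathbb{E} f)^3 - 4\epsilon.
\]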
One more thing I want to mention: all of this machinery involving regularity and Fourier analysis, as with things we've done before, carries over to other settings -- to general Abelian groups, and also to the integers. And you may ask: we have this for 3-APs; what about longer arithmetic progressions? In the integers, it turns out Green's statement is also true with 3-AP replaced by 4-AP. That's a theorem of Green and Tao, involving higher-order Fourier analysis -- quadratic Fourier analysis. However, and rather surprisingly, while 4-APs are OK, for 5-APs and longer the statement is false. The corresponding statement about popular differences for 5-APs in the integers is false; there are counterexamples. So it's really a statement about 3-APs and 4-APs, and there are some magic cancellations that happen for 4-APs that make it true.

OK, great. So that's all for today.