YUFEI ZHAO: For the past couple of lectures, we've been talking about Roth's theorem. We saw a proof of Roth's theorem using Fourier analytic methods, and we saw basically the same proof in two different settings. Two lectures ago, we saw a proof in F3 to the n. And with basically the same strategy, but a bit more work, we were able to prove Roth's theorem with roughly comparable bounds over the integers.

Today, I want to show you a very different kind of proof of Roth's theorem in the finite field setting. First, let me remind you: the bound we saw last time for Roth's theorem in F3 to the n gave an upper bound on the maximum number of elements of a 3-AP-free set of the form 3 to the n over n. That proof wasn't too bad--we did it in one lecture. Then, with a lot more work--and people tried very, very hard to improve this--there was a paper that got it to just a little bit more. That was a lot of work, and it was something people found very exciting at the time.

And then, just a few years ago, there was a major breakthrough, a very surprising breakthrough. At that point, it wasn't even clear whether 3 should be the right base for this exponent; that was a big open problem. And then came a big breakthrough where the following bound was proved: exponentially less than the previous bound. This is what I want to talk about in the first part of today's lecture.

The history here is a bit interesting. Croot, Lev, and Pach uploaded a paper to the arXiv on May 5, 2016, where they showed not exactly this theorem, but a version in a slightly different setting: in the group Z mod 4 to the n instead of Z mod 3 to the n. That was already quite exciting--an exponential improvement in that setting--but it wasn't exactly obvious how to use their method to get F3 to the n. That was done about a week later.
Ellenberg and Gijswijt managed to modify the Croot-Lev-Pach technique to the F3 to the n setting, which is the one we've been interested in. There's a small difference between the two settings: namely, Z mod 4 to the n has elements of order 2, which makes things a bit easier to do there.

So this is the Croot-Lev-Pach method, as it's often called in the literature. We'll see that it's a very ingenious use of the so-called linear algebraic method in combinatorics--in this case, the polynomial method. And it works specifically in the finite field vector space setting. What we're talking about in this part of the lecture does not translate whatsoever--at least, nobody knows how to translate this technique--to the integer setting.

So how does it work? The presentation I'm going to give follows not the original paper--which is quite nice to read, by the way; it's only about four pages long and pleasant to read--but a slightly even nicer formulation on Terry Tao's blog. That's the one I'm presenting.

The idea is this. Suppose you have a subset A of F3 to the n that is 3-AP-free. Such a set also has a name, cap set, which is used in the literature in this specific setting where you have no three points on a line. In that case, we have the following identity, where delta sub a is the Dirac delta: delta sub a of x is 1 if x equals a, and 0 otherwise. The identity simply rewrites the fact that x, y, z form a 3-AP if and only if their sum equals 0 (in F3, minus 2 equals 1). And because A is 3-AP-free, the only 3-APs are the trivial ones, recorded on the right-hand side. So it is simply a rewording of the statement that A is 3-AP-free.

The idea now is that you have this expression up there, and I want to show that if A is very, very large, then I get a contradiction by considering some notion of rank.
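Written out--this is a reconstruction of the board identity, following the formulation on Tao's blog--it reads:

\[
\delta_0(x+y+z) \;=\; \sum_{a \in A} \delta_a(x)\,\delta_a(y)\,\delta_a(z) \qquad \text{for all } x, y, z \in A,
\]

since the only solutions to x + y + z = 0 with x, y, z in A are the trivial ones x = y = z.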
We will show that the left-hand side is, in some sense, low rank--well, I haven't told you what rank means yet--while the right-hand side is a high-rank object.

So what does rank mean? Recall from linear algebra: the classical notion of rank is for two-variable functions. You should think of such an F as a matrix over an arbitrary field F. Such a function, or the corresponding matrix, is called rank 1 if it is nonzero and can be written in the following form: F of (x, y) equals f of x times g of y, for some functions f and g of one variable each. In matrix language, this is a column vector times a row vector. So that's the meaning of rank 1. And the rank of F is defined to be the minimum number of rank 1 functions needed to write F as a sum, a linear combination. So this is rank 1, and if you add up r rank 1 functions, you get something of rank at most r. That's the basic definition of rank from linear algebra.

For three-variable functions, you can come up with other notions of rank. So what about three-variable functions--how do we define the rank of such a function? You might have seen such objects, as generalizations of matrices, called tensors. And tensors already have a natural notion of rank, called tensor rank. Just as F here is rank 1 if it decomposes like that, we say F has tensor rank 1 if the three-variable function decomposes as a product of one-variable functions.

Tensor rank, it turns out, is an important notion, and actually quite a mysterious one: there are a lot of important problems that boil down to us not really understanding how tensor rank behaves. And it turns out it is not the right notion to use for our problem. So we're going to use a different notion of rank.
Here, tensor rank 1 means decomposing the three-variable function into a product of three one-variable functions. Instead, I can define a different notion. We say that F has slice rank 1--this is a definition introduced in the context of this problem, although it's also quite a natural definition--if it is nonzero and has one of the following forms: a product of a one-variable function and a two-variable function, that is, one variable paired with the remaining two variables. The definition should be symmetric in the variables, so the other combinations are OK as well.

And, just as earlier, we define the slice rank of F to be the minimum number of slice rank 1 functions needed to write F as a sum. I can decompose F into a sum of slice rank 1 functions; what's the most efficient way to do so? That's the definition of slice rank. And, you see, you can make this definition for any number of variables: slice rank 1 means a decomposition into a product of two functions, where one function takes a single variable and the other function takes all the remaining variables. And, therefore, for two-variable functions, slice rank and rank are the same notion.

Any questions so far?

All right. So let's look at the function on the right. Think of it as a matrix--a tensor. What is it? Well, it's kind of like a diagonal matrix. That's what it is: a diagonal matrix. So what is the rank of a diagonal matrix--in this case, a diagonal function? You know from linear algebra that the rank of a diagonal matrix is the number of nonzero diagonal entries. Something similar is true for slice rank, although it's less obvious. It will require a proof.
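In symbols--reconstructing the board--the two notions for a three-variable function F are:

\[
\text{tensor rank 1:}\quad F(x,y,z) = f(x)\,g(y)\,h(z);
\]
\[
\text{slice rank 1:}\quad F(x,y,z) = f(x)\,g(y,z), \quad f(y)\,g(x,z), \quad \text{or} \quad f(z)\,g(x,y).
\]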
So suppose I have the three-variable function defined by the following formula: F of (x, y, z) is the sum over a in A of c sub a times delta sub a of x, delta sub a of y, delta sub a of z. In other words, it's a diagonal function whose diagonal entries are the c sub a's. So what is the slice rank of this F? In the matrix case, it would be the number of nonzero entries, and it's exactly the same here: the slice rank turns out to be the number of nonzero diagonal entries.

Let's see a proof. Go back to the definition of slice rank. One of the two directions--less than or equal to, or greater than or equal to--is easy. Which one? Well, the right-hand side is visibly a sum of that many slice rank 1 functions. So the less-than-or-equal-to direction is clear, just looking at the definition: I can write F explicitly as that many slice rank 1 functions.

The tricky part is greater than or equal to. For that direction, let's assume all the diagonal entries are nonzero. Why can we do this? If some c sub a is 0, then I remove a from the set, and doing so cannot increase the rank. A priori, the rank might go down if you get rid of an element. Because if you add an element, even though the function doesn't change on the original set, you have more space, more flexibility to work with. But, certainly, if you remove an element, the rank cannot go up.

Now, suppose the slice rank of F is strictly less than the size of A, with all the c sub a's nonzero. So suppose, for contradiction, that there is some different way to write the function F that uses fewer terms. What would such a sum look like? I would be able to write F as a sum of slice rank 1 terms of the three types, using the different combinations of the variables.
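Concretely--reconstructing the board, with indexing chosen to match what follows--the assumed decomposition is

\[
F(x,y,z) \;=\; \sum_{i=1}^{m_1} f_i(x)\,g_i(y,z) \;+\; \sum_{i=m_1+1}^{m} f_i(y)\,g_i(x,z) \;+\; \sum_{i=m+1}^{|A|-1} f_i(z)\,g_i(x,y),
\]

with |A| - 1 terms in total, of which the last group--the terms sliced in z--is indexed by i = m+1, ..., |A|-1.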
So suppose there were such a way to write this function F using fewer terms; I'll assume it uses exactly the size of A minus 1 terms, padding with zero functions if you like.

So now I claim that there exists a function h on the set A whose support--the support being the set of entries where the function takes nonzero values--has size bigger than m, such that the following sum is 0. So I claim that we can find a function h that is, so to speak, in the kernel of some of these f's. This is a linear algebraic statement. Yes?

AUDIENCE: What is h sub [INAUDIBLE]?

YUFEI ZHAO: Ah, sorry. It's just h. Thank you. It's a single function h such that this equation is true for all x.

AUDIENCE: [INAUDIBLE] h of x minus the sum of all [INAUDIBLE].

YUFEI ZHAO: You are right. So what do I want to say here? We want to find a function h such that the support of h has size at least m plus 1. And, you're right, this is not quite what I wrote--let me fix the condition.

AUDIENCE: [INAUDIBLE].

YUFEI ZHAO: I'm sorry?

AUDIENCE: [INAUDIBLE].

YUFEI ZHAO: No--there's no induction, because I'm in three variables, and I want to get rid of one of them. So let's see where we're going eventually, and then we'll figure out what happened up there. I would like, eventually, to consider the following sum.
Wait--that's not quite the sum I want to consider. So take that F up there, and let me consider, basically, the inner product of this function, viewed as a function of z, with h. So consider this inner product. What I want to say is this: take one of these f's--one of the f's from the last group--and look at the bilinear pairing of it with h. I want to show that this sum vanishes for all i between m plus 1 and the size of A minus 1. So I want that whole row to vanish when paired with h. That makes sense now. OK, good.

The fact that such a nonzero h exists is simply a matter of counting parameters; it's a linear algebraic statement. You have some number of degrees of freedom, and you have some number of constraints. The set of h satisfying all of these constraints is a linear subspace of dimension bigger than m: I have size-of-A many dimensions, there are size of A minus 1 minus m constraints, and each constraint can cut the dimension down by at most one. So there are a lot of possibilities for h.

Furthermore--and this is another linear algebraic statement--every subspace of dimension m plus 1 of functions on A contains a vector whose support has size at least m plus 1. I'll leave this as a linear algebra exercise. It's not entirely obvious, but it is true. Putting these two statements together--thinking of the coordinates of the vectors as indexed by the set A--we find that there is some vector h whose support is large enough.

So we've proved the claim. Let's go back to the lemma about the diagonal function having high rank, and take h from the claim.
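Assembled, the claim reads as follows (a reconstruction consistent with the counts above): there exists h from A to the field with

\[
|\mathrm{supp}(h)| \ge m+1 \qquad\text{and}\qquad \sum_{z \in A} f_i(z)\,h(z) = 0 \quad\text{for all } i = m+1, \dots, |A|-1.
\]

The |A| - 1 - m linear constraints cut out a subspace of dimension at least |A| - (|A| - 1 - m) = m + 1, which by the exercise contains a vector of support at least m + 1.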
Then let's consider this sum over here: the sum over z in A of F of (x, y, z) times h of z. On one hand, you can compute the sum using the right-hand side--it's like multiplying a diagonal matrix by a vector. What you get, following the formula on the right-hand side, is--let me rewrite this part--the sum over a of c sub a, h of a, delta sub a of x, delta sub a of y. Just looking at the formula from the right-hand side.

On the other hand, if you had the decomposition up there, doing this sum and noting the claim, we see that the third row is gone. What you would have is a sum of terms of the form f1 of x times g tilde 1 of y, where g tilde 1 is basically the inner product of g1, as a function of z, with h--and so on--together with the analogous terms with the roles of x and y swapped. So there exist some functions g tilde, coming from the g's up there, such that this identity holds.

But now we're in the world of two-variable functions: the left-hand side and right-hand side are both two-variable functions. And for two-variable functions, you understand the rank of a diagonal function. The left-hand side has more than m nonzero diagonal entries, because the number of nonzero diagonal entries is just the size of the support of h. Whereas the right-hand side has rank--now ordinary matrix rank--at most m. And that's a contradiction. Yes?

AUDIENCE: So can you show a similar statement where [INAUDIBLE]?

YUFEI ZHAO: Great. So we can show a similar statement for an arbitrary number of variables, by generalizing this proof and using induction on the number of variables. But we only need three variables for now.

Any questions? Just to recap: what we proved is a generalization of the statement that a diagonal matrix has rank equal to the number of nonzero diagonal entries.
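In formulas, the two evaluations of the same sum (again reconstructed) are

\[
\sum_{z\in A} F(x,y,z)\,h(z) \;=\; \sum_{a\in A} c_a\,h(a)\,\delta_a(x)\,\delta_a(y) \;=\; \sum_{i=1}^{m_1} f_i(x)\,\tilde g_i(y) \;+\; \sum_{i=m_1+1}^{m} f_i(y)\,\tilde g_i(x),
\]

where \(\tilde g_i\) is the pairing of \(g_i\) with h in the z variable. The middle expression is diagonal with |supp(h)| > m nonzero entries, while the right-hand side is a matrix of rank at most m.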
But the same fact is true for these three-variable functions with respect to slice rank. So this is intuitively obvious, but the execution is slightly tricky.

All right. So now we have that statement. Let's proceed to analyze the function coming from the relationship up there, from a set A that is 3-AP-free. Everything so far worked for a general A over a general field, but now let me think specifically about functions on the finite field vector space F3 to the n, taking values in F3. And the function in question is defined to be the left-hand side of that equation over there.

The claim is that the left-hand side has low rank: we claim that the slice rank of this function is at most 3M, where M is the sum of, essentially, a multinomial coefficient. We'll analyze this number in a second, but it is supposed to be small.

So we want to show that this function has small slice rank. Let's rewrite it explicitly as a sum of products, by expanding after putting it in a slightly different form. In F3--in characteristic 3--you have the equation delta 0 of w equals 1 minus w squared; you can check that it's true for w equal to 0, 1, or 2. So take that and plug it in over here. Now x, y, z are in F3 to the n, and, applying this identity coordinate-wise, you get a product.

Great. Now let's pretend we're expanding everything. This is a polynomial in 3n variables, and its degree is 2n. So if we expand, we get a bunch of monomials of the following form: powers of the x's, whose exponents I call i; powers of the y's, whose exponents I call j; and powers of the z's, whose exponents I call k. So I get a sum of monomials like that, where all of the i's, j's, and k's are either 0, 1, or 2.
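Written out (a reconstruction of the expansion on the board):

\[
\delta_0(x+y+z) \;=\; \prod_{t=1}^{n}\bigl(1-(x_t+y_t+z_t)^2\bigr) \;=\; \sum c_{i,j,k}\; x_1^{i_1}\cdots x_n^{i_n}\; y_1^{j_1}\cdots y_n^{j_n}\; z_1^{k_1}\cdots z_n^{k_n},
\]

a polynomial of total degree at most 2n in which every exponent i_t, j_t, k_t lies in {0, 1, 2}.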
So I get this big sum of monomials, and I want to show that it's possible to write this sum as a small number of functions, each a product where one of the factors involves only one of x, y, z. So what we can do is group the monomials, using the following observation: since each monomial has total degree at most 2n, by pigeonhole, at least one of the degree in x, the degree in y, or the degree in z is at most 2n over 3. So I group the monomials by whichever of x, y, z has the smallest degree.

The contribution to the slice rank from the monomials with x-degree at most 2n over 3 can be written in a form like that: f of x times g of (y, z), where f of x is a single monomial and g is the sum of whatever can accompany it. This g is a sum, but this f is a monomial. So the number of such terms is the number of monomials in x: the number of choices of i's summing to at most 2n over 3, with the individual i's coming from 0, 1, or 2. And that number is precisely M. So M counts the number of choices of 0's, 1's, and 2's--there are n of them--whose sum is at most 2n over 3.

So these are the contributions coming from monomials where the degree of x is at most 2n over 3. And, similarly, for the monomials with y-degree at most 2n over 3, and also those with z-degree at most 2n over 3. All the monomials fall into one of these three groups, and I add up the contributions to the slice rank.

AUDIENCE: Do we have a good idea as to how sharp this bound is?

YUFEI ZHAO: So the question is, do we have a good idea as to how sharp this bound is? That's a really good question. I don't know. Yes.

Great. So that finishes the proof of this lemma. So now we have these two lemmas. One of them tells me the slice rank of the right-hand side, which is the size of A.
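For the record, the count just described should be (reconstructing the board formula, with a coordinates of exponent 0, b of exponent 1, and c of exponent 2):

\[
M \;=\; \sum_{\substack{a+b+c=n \\ b+2c \,\le\, 2n/3}} \binom{n}{a,\,b,\,c},
\]

and the lemma says the slice rank of the left-hand side is at most 3M.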
Let's compare ranks--the slice rank. The left-hand side, we know, is at most this quantity, 3M. And the right-hand side is equal to the size of A. So we automatically get the bound: the size of A is at most 3M.

So now we want to know how big this number M is. This is a fairly standard problem, to estimate the growth of this function M, so let me show you how to do it; this is basically the universal method. Notice that if x is some real number strictly between 0 and 1, then I claim the following is true: M times x to the 2n over 3 is at most (1 plus x plus x squared) to the n. And this is because if you expand the right-hand side and keep track of which monomials occur, there are M of them that you can lower bound by this quantity here. This is kind of related to things in probability theory on large deviations, to Cramér's theorem. But that's what you can do.

This is true for every such value of x, so you pick the one that gives you the best bound: M is at most the infimum over x in (0, 1) of x to the minus 2n over 3 times (1 plus x plus x squared) to the n. And to show you a concrete bound, I just have to plug in some value. If I plug in, for example, x equal to 0.6, I already get the bound I claimed--roughly 2.756 to the n.

And it turns out this step here is not lossy: basically, up to a 1 plus little o of 1 factor in the exponent, this is the correct bound. That follows from general results in large deviation theory. And that finishes the proof. Alternatively, you can also estimate M using Stirling's formula. But this, I think, is cleaner.
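As a quick numerical sanity check--my own illustration, not part of the lecture, with made-up function names--one can compute M exactly for small n and compare it against the analytic bound at x = 0.6:

```python
from math import factorial

def M(n):
    # Count exponent patterns (i_1, ..., i_n) with each i_t in {0, 1, 2}
    # and i_1 + ... + i_n <= 2n/3.  Grouping by a zeros, b ones, c twos,
    # this is the sum of n!/(a! b! c!) over a + b + c = n, b + 2c <= 2n/3.
    total = 0
    for b in range(n + 1):
        for c in range(n - b + 1):
            if b + 2 * c <= 2 * n / 3:
                a = n - b - c
                total += factorial(n) // (factorial(a) * factorial(b) * factorial(c))
    return total

def upper_bound(n, x=0.6):
    # From M * x^(2n/3) <= (1 + x + x^2)^n for any 0 < x < 1.
    return x ** (-2 * n / 3) * (1 + x + x * x) ** n

for n in (9, 30, 90):
    m = M(n)
    print(n, m, round(upper_bound(n)), round(m ** (1 / n), 4))
# The last column, M^(1/n), creeps up toward roughly 2.755 as n grows,
# matching the claimed |A| <= 3M = O(2.756^n) bound.
```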
Great. Any questions? Yes.

AUDIENCE: [INAUDIBLE].

YUFEI ZHAO: Ah, OK. So why is this step true? If you expand the right-hand side, you see that it is lower bounded by the sum, over the same a, b, c as over here, of the multinomial coefficient times x to the b plus 2c. So, basically, I'm doing the multinomial expansion, except I toss out every term that is not part of the index set. And because b plus 2c is at most 2n over 3 and x is less than 1, I get at least M times x to the 2n over 3. OK?

AUDIENCE: Yes.

YUFEI ZHAO: Now I want to convey a sense of mystique about this proof. This is a really cool proof. Because you're seeing it in a lecture, maybe it went by very quickly. But when this proof came out, people were very shocked. They didn't expect that this problem would be solved using a method so unexpected. And this is part of the power of the algebraic method in combinatorics: we often end up with these short, surprising proofs that take a very long time to find but turn out to be very short--this was basically a four-page paper. When these methods work, they work beautifully; they work like magic. But it's hard to predict when they work.

And, also, these methods are somewhat fragile. Unlike the Fourier analytic method that we saw last time--that method is very analytic; it works in one situation, and you can play with it, massage it, make it work in a different situation--here we're using something very implicit, something very special about this setting with many variables. And if you try to tweak the problem just a little bit, the method seems to break down. So, in particular, it is open how to extend this method to other settings; it's not even clear what the results should be. For example, it's open to extend it to 4-APs: we do not know whether the maximum size of a 4-AP-free subset of F5 to the n is less than some constant times 4.99 to the n. That's very much open.

By the way, all of this 3-AP material I've done only in F3, but it works for 3-APs in any finite field. It is also open to extend the method to corners. You can define a notion of corners: previously, we saw corners in the integer grid.
If you replace the integers by some other group, you can define a notion of corners there. And it's not clear how to extend this method to corners. Also: is there some way to extend ideas from this method to the integers? It completely fails--it's not clear at all how you might make it work in a setting where you don't have this high dimensionality. I mean, the result would be different, because for the integers we know that there is no power saving, but maybe you could get some other bounds.

Any questions? OK, great. Let's take a break.

So, in the first part of today's lecture, I showed you a proof of Roth's theorem in F3 to the n that gave a much better bound than what we did with Fourier analysis. In the second part, I want to show you another proof--yet another proof of Roth's theorem in F3 to the n--and this time giving a much worse bound. But, of course, I do this for a reason. It will give you a new result: some more information about 3-APs in F3 to the n. But the more important reason is that in this course I try to make some connections between graph theory on one hand and additive combinatorics on the other hand. And, so far, we've seen some analogies. In the proof of Szemeredi's graph regularity lemma versus the Fourier analytic proof of Roth's theorem, there was this common theme of structure versus pseudorandomness. But the actual executions of the proofs are somewhat different. On one hand, in the regularity lemma, you have partitioning and energy increment. On the other hand, with Roth, you have density increment. You're not partitioning; you're zooming in. Take a set, find some structure, zoom in; find some structure, zoom in--you get a density increment. So it's similar, but differently executed.
So, in this second half, I want to show you how to do a different proof of Roth's theorem, one much more closely related to the regularity proof--one that has this energy increment element to it. And I show you this proof because it also gives a stronger consequence: namely, we'll get not just 3-APs but 3-APs with a popular difference.

So here's the result that we'll see today, proved by Ben Green: for every epsilon, there exists some n0 such that, for every n at least n0, every subset A of F3 to the n with density alpha has some nonzero y such that the number of 3-APs with common difference y is large.

So let's think about what's going on here. If I just give you a set A and ask you how many 3-APs there are, and compare it to what you get from random--random meaning A is a random set of the same density--the question is: can the number of 3-APs be less than the random count? And the answer is yes. For example, in the integers, you can have a Behrend-type construction that has no 3-APs at all; certainly, that's fewer 3-APs than random. And you can do similar things here. But what Green's theorem says is that there exists some popular common difference, such that the number of 3-APs in A with this common difference is at least as much as what you should expect in a random setting, up to a minus epsilon.

So this is the theorem. Let me say the intuition again: given an arbitrary set A, provided the dimension of the space is large enough, there exists some popular common difference, where popular means that the number of 3-APs with that common difference is at least roughly as many as random.
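In symbols, the statement being described (Green's popular difference theorem in F3 to the n, reconstructed from the discussion above):

\[
\forall\, \epsilon > 0 \;\exists\, n_0:\; n \ge n_0,\ A \subseteq \mathbb{F}_3^n,\ |A| = \alpha 3^n \;\Longrightarrow\; \exists\, y \ne 0:\; \#\{x : x,\, x+y,\, x+2y \in A\} \;\ge\; (\alpha^3 - \epsilon)\,3^n,
\]

where alpha cubed times 3 to the n is the count a random set of density alpha would give for a fixed common difference.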
In particular, this proves Roth's theorem, because you have at least some 3-APs. But it tells you more. It tells you there's some common difference that has a lot of 3-APs--even though, on average, if you just take a random y, this is false.

Any questions about the statement?

So Green developed an arithmetic analog of Szemeredi's graph regularity lemma in order to prove this theorem. Starting with Szemeredi's graph regularity lemma, he found a way to import that technique into the arithmetic setting, into F3 to the n. So I want to show you, roughly, how this is done. And just as Szemeredi's graph regularity lemma has unavoidable bounds of tower type, the same thing is true in the arithmetic setting: Green's proof shows that the theorem is true with n0 something like a tower of twos whose height is polynomial in 1 over epsilon--just like the regularity lemma for graphs.

So this was recently improved in a paper by Fox and Pham just a couple of years ago--and this is the proof that I will show you today--where you can take n0 to be slightly better, but still a tower: now a tower of height logarithmic in 1 over epsilon. So it's from a really, really big tower to a slightly less big tower. But, more importantly, they also showed that this is tight. You cannot do better: there exist sets A for which the theorem becomes false if you replace the big O in the tower height by a sufficiently small constant.

In many applications of the regularity lemma, the first proof, the one using regularity, gives a very poor bound, and subsequently there are other, better proofs that give non-tower-type bounds. But this is the first application we've seen where, it turns out, the regularity lemma gives the correct bound--you really need a tower-type bound. I mean, we know the regularity lemma itself needs tower-type bounds.
But it turns out this application also needs tower-type bounds. That's quite interesting. So, here, the use of regularity is really necessary, in this quantitative sense.

So let's see the proof. Let me first prove a slightly technical lemma about bounded increments. This corresponds to the statement that if you do energy increments, you cannot increase too many times--but in a slightly different form.

Suppose you have numbers alpha and epsilon bigger than 0, and a sequence a0, a1, a2, ... of numbers between 0 and 1 such that a0 is at least alpha cubed. Then there exists some k, at most log base 2 of 1 over epsilon, such that 2 a sub k minus a sub k plus 1 is at least alpha cubed minus epsilon. Don't worry about this particular form; we'll see shortly why we want something like that. The proof itself is very straightforward. Suppose the conclusion fails for k equals 0. Then a1 is bigger than 2 a0 minus alpha cubed plus epsilon, which--since a0 is at least alpha cubed--gives a lower bound on a1 of at least alpha cubed plus epsilon. And, likewise, if it fails for k equals 1, you get a lower bound on a2 that is 2 epsilon more. You keep iterating; you see the next increment is 4 epsilon, and so on--the increment doubles each time. So if the conclusion fails for more than this many iterations, you go above 1: a sub k is bigger than 1 once k is the ceiling of log base 2 of 1 over epsilon. And that is a contradiction to the hypothesis that the a's lie between 0 and 1.

So this is a small variation on the fact that you cannot increment too many times--each time, you go up by a bit. Here we save a little, because the number of iterations is now only logarithmic: the increment doubles each time.
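Stated cleanly (a reconstruction consistent with the argument just given):

\[
\textbf{Lemma.}\quad a_0, a_1, \ldots \in [0,1],\; a_0 \ge \alpha^3 \;\Longrightarrow\; \exists\, k \le \lceil \log_2(1/\epsilon) \rceil :\; 2a_k - a_{k+1} \ge \alpha^3 - \epsilon.
\]

If this failed for every such k, the excess \(e_k = a_k - \alpha^3\) would satisfy \(e_{k+1} > 2e_k + \epsilon\), hence \(e_k > (2^k - 1)\epsilon\), forcing \(a_k > 1\) as soon as \(2^k \ge 1/\epsilon + 1\).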
Now, if I give you a function f on F3 to the n, and U is a subspace--this notation means subspace--let me write f sub U for the function obtained by averaging f on each U-coset. You have some subspace; you partition your space into translates of that subspace, and you replace the value of f on each coset by its average on that coset. This is similar to what we did with graphons--stepping. You're averaging on each block.

So now let me prove something which is kind of like an arithmetic regularity lemma. This statement will be new to you, but it should look similar to some of the statements we've seen before in the course. And the statement is: for every epsilon, there exists some m, which is a function of epsilon--and, in fact, it will be bounded by a tower of height at most order log of 1 over epsilon--such that, for every function f on F3 to the n with values bounded between 0 and 1, there exist subspaces W and U, with W contained in U, where the codimension of W is at most m. You should think of these as the coarse partition and the fine partition in the graph regularity lemma, and the codimension corresponds to the number of pieces: 3 to the codimension is the number of cosets. So you have boundedly many parts, and you have two partitions.

And what I would like is for f to be pseudorandom after doing this partitioning, so to speak. This corresponds to the statement that, if I look at f minus f sub W, then the maximum Fourier coefficient is quite small--where quite small means at most epsilon over the size of U perp. And, also, there is this other condition, which tells you that the L3 norms of f sub U and f sub W are related in a particular way.
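A plausible rendering of the two conditions--the transcript leaves the precise inequalities on the board, so this is a reconstruction chosen to match the increment lemma and the application below:

\[
\max_{r}\,\bigl|\widehat{f - f_W}(r)\bigr| \;\le\; \frac{\epsilon}{|U^{\perp}|}, \qquad\qquad 2\,\|f_U\|_3^3 \;-\; \|f_W\|_3^3 \;\ge\; (\mathbb{E} f)^3 - \epsilon.
\]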
We haven't seen this last condition before. In fact, this inequality is quite ad hoc, tailored to the application to popular differences for 3-APs. But we have seen something similar, where this relationship is replaced by something that accounts for the difference between L2 norms. If you go back to your notes from when we discussed the regularity lemma in a more analytic fashion, we had exactly that. And when we discussed the strong regularity lemma, this condition roughly corresponds to the requirement that the edge densities in the fine partition and the coarse partition are roughly similar: when you do the further partitioning, you're not changing densities by very much.

So that's the arithmetic regularity lemma. And once you have the statement -- I think the hardest part is writing down the statement -- the proof itself is a follow-your-nose approach. You first define a sequence of epsilons: epsilon_0 is 1, and each epsilon_{k+1} is chosen much smaller, depending on epsilon_k and epsilon. Don't worry about the exact choice for now; you will see in a second why these numbers are chosen.

Let me write R_k for the set of r's -- these are characters -- such that the Fourier coefficient of f at r is at least epsilon_k in absolute value. The r's are supposed to identify how we're going to do the partitioning.

Now, the size of this R_k is bounded: I claim that R_k has size at most 1 over epsilon_k squared. That's because of Parseval's identity, which tells you that the sum of the squares of the Fourier coefficients equals the squared L2 norm of the function, which is at most 1. So the number of Fourier coefficients that exceed a given threshold cannot be too large.
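Concretely, the claim about the size of R_k is the standard Parseval count:
\[
\sum_{r \in \mathbb{F}_3^n} |\hat f(r)|^2 \;=\; \mathbb{E}_x\, |f(x)|^2 \;\le\; 1
\qquad\Longrightarrow\qquad
|R_k|\,\epsilon_k^2 \;\le\; 1
\quad\text{for}\quad
R_k = \{\, r : |\hat f(r)| \ge \epsilon_k \,\}.
\]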
Now let U_k be the subspace defined by taking the orthogonal complement of these r's, so U_k = R_k^⊥. And let's take alpha_k to be the L3 norm cubed of f_{U_k}, the function derived from averaging f along the cosets of U_k; so we are looking at the third moment of these coset densities.

To these alphas we can apply the bounded-increment lemma from earlier. In particular, alpha_0 is at least alpha cubed by convexity, where alpha is the density of f. So, by the previous lemma, there exists some k, at most log base 2 of 1 over epsilon, such that 2 alpha_k minus alpha_{k+1} is at least the density of f, cubed, minus epsilon.

So we find this k, and we have the desired bound from satisfying that inequality. This is the energy increment argument, basically the same argument as the one we did when we discussed the graph regularity lemma, but now presented in a slightly different form and a different order of logic. It's the same argument.

What we would still like to show is the pseudorandomness condition, about having small Fourier coefficients. So what's happening here with the Fourier coefficients? How is the Fourier coefficient of an averaged f related to that of the original f? That's what you want to understand up there. And it's not hard to analyze. If you average over a subspace -- U or W, either one -- then the Fourier coefficients of the averaged version are very much related to those of the original function. It turns out that if you take an r which is in the orthogonal complement of the subspace, then the Fourier coefficient doesn't change; and if r is not in the orthogonal complement, then the Fourier coefficient gets zeroed out. That's not too hard to check, and I urge you to think about it.
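Here is that check, with the same Fourier convention as above; averaging over U-cosets multiplies each Fourier coefficient by a character average over U:
\[
\widehat{f_U}(r)
\;=\; \mathbb{E}_x\, \mathbb{E}_{u \in U}\, f(x+u)\,\omega^{-r\cdot x}
\;=\; \hat f(r)\; \mathbb{E}_{u \in U}\, \omega^{\,r\cdot u}
\;=\;
\begin{cases}
\hat f(r) & \text{if } r \in U^\perp,\\
0 & \text{if } r \notin U^\perp,
\end{cases}
\]
since $\mathbb{E}_{u \in U}\,\omega^{\,r\cdot u}$ equals $1$ when $r \cdot u = 0$ for all $u \in U$, and vanishes otherwise.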
With that in mind, let's go back and verify the pseudorandomness condition. Consider the quantity measuring the largest Fourier coefficient of the difference between f and f_{U_{k+1}}. What U_{k+1} is doing is locating the possibly large Fourier coefficients and getting rid of them: we're zeroing out these large Fourier coefficients, so that the remaining Fourier coefficients are all quite small. Indeed, we chose the big R so that if your little r is not in big R, then the Fourier coefficient at r must be small; that's how we chose the big R. So we have that bound: every Fourier coefficient that survives in f minus f_{U_{k+1}} is less than epsilon_{k+1}. And by the definition of the epsilons, combined with the upper bound estimate on the size of R_k, this is at most epsilon over the size of U_k-perp. The point being, we have the pseudorandomness condition.
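In one chain: coefficients with r in $U_{k+1}^\perp$ cancel exactly, and any other r lies outside $R_{k+1}$. For the middle step one needs $\epsilon_{k+1}$ small enough; the precise definition was left on the board, and the choice $\epsilon_{k+1} = \epsilon \cdot 3^{-\lceil 1/\epsilon_k^2 \rceil}$ written below is an assumption, one choice that works:
\[
\max_r \bigl|\widehat{f - f_{U_{k+1}}}(r)\bigr|
\;\le\; \epsilon_{k+1}
\;=\; \epsilon\, 3^{-\lceil 1/\epsilon_k^2 \rceil}
\;\le\; \epsilon\, 3^{-|R_k|}
\;\le\; \frac{\epsilon}{|U_k^\perp|},
\]
using $|R_k| \le \lceil 1/\epsilon_k^2 \rceil$ and the fact that $U_k^\perp$ is spanned by $R_k$, so $|U_k^\perp| \le 3^{|R_k|}$.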
So now take W to be U_{k+1}, and U to be U_k, and then we have everything that we want.

Question, yes.

AUDIENCE: Why is the codimension of W small?

YUFEI ZHAO: The question is, why is the codimension of W small? So what is the codimension of W? We want to know that the codimension of W is bounded. Well, the codimension of any of these U_k's is at most the number of r's that produce it -- and 3 raised to that codimension is the number of cosets. And the size of each R_k is bounded. So if we pick m so that it uniformly bounds the sizes of the R_k's that arise, then we have a bound on the codimension. So that's important. We need to know that the codimension is small; otherwise, without the bound on the codimension, you could just take the zero subspace, and, trivially, everything would be true.

We have a regularity lemma, and what comes with a regularity lemma is a counting lemma. So let me write down the counting lemma, and I'll skip the proof.

The counting lemma tells you the following. Suppose f and g are both functions on F_3^n, and U is a subspace of F_3^n. The quantity I'm interested in is the density of 3-APs whose common difference lies in the particular subspace U. The claim is that this restricted 3-AP count is similar for f and g whenever f and g are close to each other in Fourier. Well, not quite: something like this we saw earlier, in the proof of Roth's theorem, when we don't restrict the common difference. It turns out that if you restrict the common difference, you lose a little bit -- a factor which is basically the size of U-perp. I won't prove that.
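Written out, with $\Lambda_3^U$ denoting the restricted 3-AP average, the statement takes the following shape; the explicit factor $3\,|U^\perp|$ is my reconstruction of "losing a factor of the size of U-perp", as the lecture did not pin down the constant:
\[
\Lambda_3^U(f) \;=\; \mathbb{E}_{\,x \in \mathbb{F}_3^n,\; d \in U\,} f(x)\, f(x+d)\, f(x+2d),
\qquad
\bigl|\Lambda_3^U(f) - \Lambda_3^U(g)\bigr|
\;\le\; 3\,|U^\perp|\, \max_r \bigl|\widehat{f - g}(r)\bigr|
\]
for all $f, g \colon \mathbb{F}_3^n \to [0,1]$.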
But now let me go on to the punch line. Suppose we start, again, with a function f on F_3^n taking values between 0 and 1, and I have subspaces W ≤ U. I claim that if I look at f averaged over W, and I consider the 3-AP count with common difference restricted to U, then this quantity is lower bounded by a difference of L3 norms: it is at least 2 times the L3 norm cubed of f_U, minus the L3 norm cubed of f_W.

So I claim this is true. And this is just some inequality. Of all the things that I did back in high school doing math competitions, the one skill I find most helpful now is being able to do inequalities. And I thought I would never see these three-variable inequalities again. But when Fox and Pham first showed me a somewhat different proof, an approach that didn't go through this specific inequality, I told them: hey, there's this thing I remember from high school. It's called Schur's inequality. I thought I would never see it again after high school, but apparently it's still useful.

So what Schur's inequality says -- this is one of those three-variable inequalities that you would know if you did math olympiads -- is that for non-negative real numbers a, b, c (there are versions for arbitrary reals, but let's stick with non-negative), a(a − b)(a − c) + b(b − a)(b − c) + c(c − a)(c − b) ≥ 0. So that's Schur's inequality.

Now look at the left-hand side of my claim. It can be written in the following way: it is the expectation, over x, y, z forming a 3-AP inside a common U-coset -- counting 3-APs with common difference restricted to U is the same as counting 3-APs within a U-coset -- of the product f_W(x) f_W(y) f_W(z). What I would like to do now is apply Schur's inequality with a, b, and c being these three numbers. The point is that you have the product abc on the left, and everything on the right involves only a subset of a, b, c, so those terms simplify. If I do this, then I lower bound this quantity by twice the expectation, over x and y in the same U-coset, of f_W(x) squared times f_W(y) -- maybe the pairs that come out are different ones, but they're all symmetric with respect to each other -- minus the term that corresponds to the sum of cubes, the expectation of f_W cubed. So this is a consequence of Schur's inequality applied with a, b, c like that.

But now, you see, I can analyze this expression even further. Because if I let y vary within the same U-coset, then the factor f_W(y) averages out over the U-coset. And W is contained in U, so averaging f_W over U-cosets gives f_U. So what we have is at least twice the expectation of f_W squared times f_U, minus the expectation of f_W cubed. And I can use convexity on f_W, within each U-coset, to get the claimed bound. So the last step is convexity.
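Here is the computation just described as one chain, with the expectations taken over x uniform and y uniform in x + U, and Schur's inequality applied with a = f_W(x), b = f_W(y), c = f_W(z) averaged over 3-APs (x, y, z) in a common U-coset:
\[
\Lambda_3^U(f_W)
\;\ge\; 2\,\mathbb{E}\bigl[f_W(x)^2 f_W(y)\bigr] - \mathbb{E}\bigl[f_W^3\bigr]
\;=\; 2\,\mathbb{E}\bigl[f_W^2\, f_U\bigr] - \|f_W\|_3^3
\;\ge\; 2\,\|f_U\|_3^3 - \|f_W\|_3^3.
\]
The middle equality holds because averaging $f_W(y)$ over $y \in x + U$ gives $f_U(x)$ (as $W \le U$), and the last step is Jensen within each $U$-coset: $\mathbb{E}[f_W^2 \mid \text{coset}] \ge f_U^2$.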
I'm running through this a little bit quickly because we're running out of time, but all of these steps are fairly simple once you observe that the first thing to do is apply Schur's inequality. And we're almost done.

From that lemma up there, I now claim that, for every epsilon, there exists some m, bounded by a tower of height logarithmic in 1 over epsilon, such that if f is a function on F_3^n taking values between 0 and 1, then there exists a subspace U of codimension at most m such that the 3-AP density of f with common difference restricted to U is at least the random bound -- the density of f, cubed -- minus epsilon.

Why is this true? We put everything together. Choose U and W as in the regularity lemma. By the counting lemma, the 3-AP density of f with common difference in U is at least the corresponding 3-AP density of f_W, minus a small error which we can control; this step is counting. Now we apply that inequality up there, the consequence of Schur. And, finally, we chose our U and W in the regularity lemma precisely so that this difference of L3 norms is controlled: it is at least the random bound minus epsilon. And that's it. Strictly speaking, you end up with 4 epsilon in place of epsilon, but we can rescale to change it back.

So we have the statement that there is a subspace of bounded codimension on which you have this popular difference result. It doesn't quite guarantee you a single popular common difference yet, because you don't want U to be just the zero subspace: I want a nonzero common difference. But at bounded codimension, if n is large enough, then the size of U is large enough, and then there exists some nonzero common difference that works. You pick some nonzero element of U; on average, this should work out just fine. I'll leave that detail to you.
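Putting the three displayed estimates together (the counting lemma with the pseudorandomness bound, then the Schur consequence, then the regularity condition on the L3 norms), the final chain, under the reconstructed constants above, reads:
\[
\Lambda_3^U(f)
\;\ge\; \Lambda_3^U(f_W) - 3\,|U^\perp| \cdot \frac{\epsilon}{|U^\perp|}
\;\ge\; 2\,\|f_U\|_3^3 - \|f_W\|_3^3 - 3\epsilon
\;\ge\; (\mathbb{E} f)^3 - 4\epsilon.
\]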
One more thing I want to mention: all of this machinery involving regularity and Fourier analysis, as with things we've done before, carries over to other settings -- to general Abelian groups, and also to the integers. And you may ask: we have this for 3-APs; what about longer arithmetic progressions? In the integers, it turns out Green's statement is also true with 3-AP replaced by 4-AP. That's a theorem of Green and Tao, involving higher-order Fourier analysis -- quadratic Fourier analysis. However, and rather surprisingly, while 4-APs are OK, for 5-APs and longer the statement is false. The corresponding statement about popular differences for 5-APs in the integers is false; there are counterexamples. So it's really a statement about 3-APs and 4-APs, and there are some magic cancellations that happen for 4-APs that make it true.

OK, great. So that's all for today.