[SQUEAKING]

PROFESSOR: Last time, we started discussing graph limits. And let me remind you of some of the notions and definitions that were involved.

One of the main objects in graph limits is that of a graphon, which is a symmetric, measurable function from the unit square to the unit interval. Here, symmetric means that W(x, y) = W(y, x).

We defined a notion of convergence for a sequence of graphons. And remember, the notion of convergence is that a sequence is convergent if the sequence of homomorphism densities converges as n goes to infinity for every fixed F, every fixed graph.

So this is how we define convergence. A sequence of graphs or graphons converges if all the homomorphism densities -- you should think of these as subgraph statistics -- if all of these statistics converge. We also say that a sequence converges to a particular limit if these homomorphism densities converge to the corresponding homomorphism density of the limit for every F.

OK. So this is how we define convergence. We also defined a notion of distance. And to do that, we first defined the cut norm to be the following quantity, defined by taking two subsets S and T, which are measurable -- everything from here on is going to be measurable -- and looking at the maximum possible deviation of the integral of this function on the box S × T. And here, you should think of W as taking real values, allowing both positive and negative values, because otherwise you should just take S and T to be the whole interval.

OK. And this definition was motivated by our discussion of discrepancy coming from quasirandomness.
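One plausible rendering of the definitions just recalled, using the notation t(F, W) for the homomorphism density of F in W from the previous lecture: a sequence of graphons W_n is convergent if t(F, W_n) converges as n goes to infinity for every fixed graph F, and the cut norm is

\[ \|W\|_\square \;=\; \sup_{S, T \subseteq [0,1]} \left| \int_{S \times T} W(x, y) \, dx \, dy \right|, \]

with the supremum taken over measurable sets S and T.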
Now, if I give you two graphs or graphons and ask you to compare them, you are allowed to permute the vertices in some sense, to find the best overlay. And that notion is captured in the definition of cut distance, which is defined to be the following quantity, where we take the infimum, over all possible measure-preserving bijections from the interval to itself, of the cut norm of the difference between these two graphons if I relabel one of them using this measure-preserving bijection. In symbols, δ_□(U, W) = inf_φ ‖U − W^φ‖_□, where W^φ(x, y) = W(φ(x), φ(y)). So think of this as permuting the vertices.

So these were the definitions that were involved last time. And at the end of the last lecture, I stated three main theorems of graph limit theory. I forgot to mention some of the history of this theory. There were a number of important papers that developed this very idea of graph limits -- which is actually somewhat novel, because if you think about all of combinatorics, we like to deal with discrete objects, and even the idea of taking a limit is rather new. This work is due to a number of people. In particular, László Lovász played a very important, central role in the development of this theory. And various people came to this theory from different perspectives -- some from more pure perspectives, and some from more applied perspectives. And this theory is now being used in more and more places, including statistics, machine learning, and so on. I'll explain where that comes up in just a little bit.

At the end of the last lecture, I stated three main theorems. And what I want to do today is develop some tools so that we can prove those theorems in the next lecture. OK. So I want to develop some tools. In particular, you'll see some of the things that we talked about in the chapter on Szemerédi's regularity lemma come up again in a slightly different language. Much of what I will say today should hopefully already be familiar to you, but you will see it again from the perspective of graph limits.

But first, before telling you about the tools, I want to give you some more examples. One of the ways that I motivated graph limits last time is the example of an Erdős-Rényi random graph, or a sequence of quasirandom graphs, converging to a constant.
The constant graphon is the limit. But what about generalizations of that construction, where the limit is not a constant? This leads to the idea of a W-random graph, which generalizes the Erdős-Rényi random graph. In Erdős-Rényi, every edge occurs with the same probability p, uniform throughout the graph. But what I want to do now is allow the edge probability to vary somewhat. OK.

So before giving you the more general definition, a special case of this is an important model of random graphs known as the stochastic block model. In particular, a two-block model consists of the following data, where I am looking at two types of vertices -- let's call them red and blue -- and the vertices are assigned colors at random -- for example, 50/50, but any other probability is fine. And now I put down the edges according to the colors of the two endpoints. Two red vertices are joined with edge probability p_rr. If I have a red and a blue, then I may have a different probability p_rb joining them, and likewise with blue-blue, p_bb. In other words, I can encode this probability information in a matrix, like that -- symmetric across the diagonal.

So this is a slightly more general version of an Erdős-Rényi random graph, where now I have potentially different types of vertices. And you can imagine these kinds of models are very important in applied mathematics for modeling certain situations -- such as, for example, if you have people with different political party affiliations: how likely are they to talk to each other? You can imagine some of these numbers might be bigger than others. And there's an important statistical problem: if I give you a graph, can you cluster or classify the vertices according to their types, if I do not show you in advance what the colors are but only show you the output graph?
So these are important statistical questions with lots of applications. This is an example with only two blocks, but of course you can have more than two blocks. And the graphon context tells us that we should not limit ourselves to just blocks. If I give you any graphon W, I can also construct a random graph.

So what I would like to do is consider the following construction -- let's call it the W-random graph, denoted G(n, W) -- where I form the graph using the following process. First, the vertex set is labeled 1 through n. I draw the vertex types by taking uniform random x_1 through x_n -- so uniform iid. You can think of them as the vertex colors, the vertex types. And I put an edge between i and j with probability exactly W(x_i, x_j), for all i < j independently.

That's the definition of a W-random graph. And the two-block stochastic model is a special case of this W-random graph, for the graphon corresponding to this red-blue picture here. So the generation process would be: I give you some x_1, x_2, x_3, and then I evaluate the value of this graphon at those points, and those are my edge probabilities. So what I described is a special case of this general W-random graph.

Any questions? So as before, an important statistical question is: if I show you the graph, can you tell me a good model for where this graph came from? That's one of the reasons why people in applied math might care about these types of constructions.
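Here is a minimal sketch of this sampling procedure in Python, assuming a two-block graphon encoded as a step function; the probabilities p_rr, p_rb, p_bb and all function names are illustrative, not from the lecture.

```python
import numpy as np

def sample_w_random_graph(w, n, rng=None):
    """Sample a W-random graph G(n, W): draw iid uniform types x_1..x_n,
    then include each edge {i, j} independently with probability W(x_i, x_j)."""
    rng = np.random.default_rng(rng)
    x = rng.uniform(size=n)                  # vertex types x_1, ..., x_n
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.uniform() < w(x[i], x[j]):
                adj[i, j] = adj[j, i] = 1
    return adj

def two_block_graphon(p_rr=0.8, p_rb=0.1, p_bb=0.6):
    """Step-function graphon for a 50/50 two-block stochastic block model."""
    def w(x, y):
        p = np.array([[p_rr, p_rb], [p_rb, p_bb]])
        return p[int(x >= 0.5), int(y >= 0.5)]
    return w

adj = sample_w_random_graph(two_block_graphon(), n=100, rng=0)
print(adj.sum() // 2, "edges")
```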
Let me talk about some theorems. I've told you that the sequence of Erdős-Rényi random graphs converges to the constant graphon p. So instead of taking a constant graphon p, now I start with a W-random graph. And you should expect, and it is indeed true, that this sequence converges to W as its limit.

So let W be a graphon. And for each n, let me draw the graph G_n using the W-random graph model, independently. Then, with probability 1, the sequence converges to the graphon W -- in the sense described above.

This statement tells us a couple of things: one, that W-random graphs converge to the limit W, as you should expect; and two, that every graphon W is the limit point of some sequence of graphs. This is something that we never quite explicitly stated before, so let me make this remark. In particular, every W is the limit of some sequence of graphs -- just like every real number, in analogy to what we said last time. Every real number is the limit of a sequence of rational numbers, through rational approximation. And this is some form of approximation of a graphon by a sequence of graphs.

OK. I'm not going to prove this theorem. The proof is not difficult. Using the definition of subgraph convergence, the proof uses what's known as Azuma's inequality. By an appropriate application of Azuma's inequality on the concentration of martingales, one can prove this theorem by showing that the F density in G_n is very close to the F density in W with high probability.

OK. Any questions so far?

So this is an important example, one of the motivations of graph limits. But now, let's get back to what I said earlier. I would like to develop a sequence of tools that will allow us to prove the main theorems stated at the end of the last lecture. And this will sound very familiar, because we're going to write down some lemmas that we did back in the chapter on Szemerédi's regularity lemma, but now in the language of graphons. So the first is a counting lemma.
The goal of the counting lemma is to show that if you have two graphons which are close to each other in the sense of cut distance, then their F densities are similar to each other. Here's the statement: if W and U are graphons and F is a graph, then the difference between the F density of W and the F density of U is no more than a constant -- the number of edges of F -- times the cut distance between U and W. In symbols, |t(F, W) − t(F, U)| ≤ e(F) δ_□(U, W).

Maybe some of you already see how to do this from our discussion of Szemerédi's regularity lemma. In any case, I want to rewrite the proof in the language of graphons. We did two proofs of the triangle counting lemma. One was hopefully more intuitive for you: you pick a typical vertex that has lots of neighbors on both sides, and therefore lots of edges between them. And then there was a second proof, which I said was a more analytic proof, where you swap out one edge at a time. That proof, I think, is technically easier to implement, especially for general H. But the first time you see it, you might not quite see what the calculation is about. So I want to do this exact same calculation again in the language of graphons. And hopefully, it should be clear this time.

So this is the same as the counting lemma over epsilon-regular pairs. It suffices to prove the inequality where the right-hand side is replaced not by the cut distance but by the cut norm. The reason is that once you have this second inequality, by taking an infimum over all measure-preserving bijections φ -- and notice that this change does not affect the F density -- you recover the first inequality.

I want to give you a small reformulation of the cut norm that will be useful for thinking about this counting lemma. Here's the reformulation -- namely, that I can redefine the cut norm. Here, W takes real values, so not necessarily non-negative.
So the cut norm we saw earlier is defined to be the supremum, over all measurable subsets S and T of the [0, 1] interval, of this integral in absolute value. But it turns out I can take this supremum over a slightly larger set of objects. Instead of looking only at measurable subsets of the interval, let me now look at measurable functions u and v from [0, 1] to [0, 1] -- and as always, everything is measurable -- and take the supremum of the following integral. Instead of integrating over a box, I am now integrating the expression u(x) v(y) W(x, y). I claim this gives the same value.

OK. So why is this true? Well, one of the directions is easy to see, because the right-hand side is strictly an enlargement of the left-hand side. By taking u to be the indicator function of S and v to be the indicator function of T, you see that the right-hand side includes the left-hand side in terms of what you are allowed to do.

But what about the other direction? For the other direction, the main thing is to notice that the integrand -- what's inside this integral -- is bilinear in the values of u and v. So in particular, the extrema of this integral, as you vary u and v, are attained with u and v taking values in the endpoints, 0 and 1.

It may be helpful to think about the discrete setting, where instead of this integral you have a matrix and two vectors multiplying it from the left and right, and you have to decide what the coordinates of those vectors are. It's a bilinear form. How do you maximize or minimize it? You push every entry to one of its two endpoints. You never lose by doing that.

OK, so think about it. This is not difficult once you see it the right way. But now we have the cut norm expressed not over sets, but over bounded functions.
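One plausible rendering of the two equivalent formulas on the board:

\[ \|W\|_\square = \sup_{S, T \subseteq [0,1]} \left| \int_{S \times T} W(x, y) \, dx \, dy \right| = \sup_{u, v \colon [0,1] \to [0,1]} \left| \int_{[0,1]^2} u(x) \, v(y) \, W(x, y) \, dx \, dy \right|, \]

where all sets and functions are measurable; one direction takes u = 1_S and v = 1_T, and the other uses bilinearity to push the extremizing u and v to {0, 1}-valued functions.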
And now I'm ready to prove the counting lemma. Instead of writing down the whole proof for general H, let me write down the calculation that illustrates this proof for triangles. The general proof is the same once you understand how this argument works.

The argument works by considering the difference between the two F densities -- the triangle density of W minus the triangle density of U. This is an integral, which I'll write out. We would like to show that this quantity is small if U and W are close in cut norm.

So let's write this integral as a telescoping sum, where the first term is obtained by replacing one factor -- by this, I mean replacing W(x, y) by W(x, y) − U(x, y). Then the second term of the telescoping sum -- so you see what happens: I change one factor at a time. And finally, I change the third factor. This is an identity: if you expand out all of these differences, everything intermediate cancels out. So it's a telescoping sum.

But now I want to show that each term is small. How can I show that each term is small? Look at this expression here. I claim that for a fixed value of z -- so imagine fixing z, and letting x and y vary in this integral -- it has the form up there. If you fix z, then you have a u and a v coming from the other two factors, and they are both bounded between 0 and 1. So for a fixed value of z, this is at most the cut norm of W − U. And if I let z vary, it is still bounded in absolute value by that quantity.

So each term is bounded by the cut norm of W − U in absolute value. Add all three of them together, and we find that the whole thing is bounded in absolute value by 3 times the cut norm of the difference. OK, and that finishes the proof of the counting lemma.
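One plausible reconstruction of the telescoping identity for triangles, in the notation of the lecture:

\begin{align*}
t(K_3, W) - t(K_3, U)
&= \int (W - U)(x, y) \, W(x, z) \, W(y, z) \, dx \, dy \, dz \\
&\quad + \int U(x, y) \, (W - U)(x, z) \, W(y, z) \, dx \, dy \, dz \\
&\quad + \int U(x, y) \, U(x, z) \, (W - U)(y, z) \, dx \, dy \, dz.
\end{align*}

For each fixed z, each term has the bilinear form above, with the other two factors playing the roles of u and v, so each is at most \|W - U\|_\square in absolute value, giving |t(K_3, W) - t(K_3, U)| \le 3 \|W - U\|_\square.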
That was for triangles. Of course, if you have a general H, you just have more terms: a longer telescoping sum, and you get the corresponding bound.

OK. So this is the counting lemma. And I claim that it's exactly the same proof as the second proof of the counting lemma that we did when we discussed Szemerédi's regularity lemma. Any questions? Yeah.

AUDIENCE: Why did it suffice to prove it over the [INAUDIBLE]?

PROFESSOR: OK. So let me answer that in a second. First, this should be H, not F. OK, so your question was: up there, why was it sufficient to prove this version instead of that version? Is that the question?

AUDIENCE: Yeah.

PROFESSOR: OK. Suppose I prove it for this version, so I know this is true. Now I take the infimum of both sides. So then this is true, right? Because it's true for every φ. But the left-hand side doesn't change, because the F density in a relabeling of the vertices is still the same quantity, whereas this one here is now that.

All right. So what we see as a corollary of this counting lemma is that if you have a Cauchy sequence with respect to the cut distance, then the sequence is automatically convergent. Recall the definition of convergence: convergence has to do with F densities converging. And if you have a Cauchy sequence, then the F densities converge. And also, a related but different statement: if you have a sequence W_n that converges to W in cut distance, then W_n converges to W in the sense defined by F densities.

So qualitatively, what the counting lemma says is that convergence in cut distance is a stronger notion than the convergence coming from subgraph densities. So this is one part of the regularity method, the counting lemma. Of course, the other part is the regularity lemma itself. So that's the next thing we'll do.
And it turns out that we actually don't need the full strength of the regularity lemma. We only need something called a weak regularity lemma.

What the weak regularity lemma says is -- I mean, you still have a partition of the vertices. Let me now state it for graphons. For a partition P of the vertex set -- so a partition of the [0, 1] interval -- and a symmetric, measurable function W -- I'm just going to omit the word "measurable" from now on; everything will be measurable, and all of these sets are also measurable -- I can define what's known as a stepping operator, which sends W to the object W_P obtained by averaging over the steps S_i × S_j, replacing the graphon by its average over each step. Precisely, I obtain a new graphon, a new symmetric, measurable function W_P, whose value at (x, y) is

W_P(x, y) = (1 / (λ(S_i) λ(S_j))) ∫_{S_i × S_j} W, whenever (x, y) lies in S_i × S_j,

where λ denotes Lebesgue measure.

Pictorially, what happens is that you look at your graphon. There's a partition of the vertex set, so to speak -- of the interval. It doesn't have to be a partition into intervals, but for illustration, suppose it looks like that. And what I do is take this W and replace it by a new graphon, a new symmetric, measurable function W_P, obtained by averaging: take each box, compute its average, and put that average into the box. So this is what W_P is supposed to be.

Just a few minor technicalities. If the denominator is equal to 0, let's ignore that set -- it has measure zero anyway. Everything will be treated up to measure zero, changing the function on measure-zero sets. So it doesn't really matter if you're not strictly allowed to do this division.

OK. So this operator plays an important role in the regularity lemma, because it's how we think about partitioning -- what happens to a graph under partitioning.
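A minimal sketch of the stepping operator for a graphon discretized as a matrix, assuming the partition is given as integer labels on the grid points; the names here are illustrative.

```python
import numpy as np

def step_graphon(w_grid, labels):
    """Apply the stepping operator: replace each block S_i x S_j of the
    discretized graphon by its average value over that block."""
    w_p = np.zeros_like(w_grid, dtype=float)
    parts = np.unique(labels)
    for a in parts:
        for b in parts:
            block = np.ix_(labels == a, labels == b)
            w_p[block] = w_grid[block].mean()   # average over the step
    return w_p

# Example: a 6x6 discretization with two parts {0,1,2} and {3,4,5}.
rng = np.random.default_rng(0)
w_grid = rng.uniform(size=(6, 6))
w_grid = (w_grid + w_grid.T) / 2               # make it symmetric
labels = np.array([0, 0, 0, 1, 1, 1])
print(step_graphon(w_grid, labels))
```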
It has several other names, if you look at it from slightly different perspectives. You can view it as a projection in the sense of Hilbert space: in the Hilbert space of functions on the unit square, the stepping operator is the projection onto the subspace of functions that are constant on each step. So that's one interpretation. Another interpretation is that this operation is a conditional expectation. If you know what a conditional expectation actually is in the sense of probability theory, then that's what happens here: if you view [0, 1] squared as a probability space, then we are taking the conditional expectation relative to the sigma-algebra generated by these steps.

So these are just a couple of ways of thinking about what's going on. They might be somewhat helpful later on if you're familiar with these notions. But if you're not, don't worry about it. Concretely, it's what happens up there.

OK. So now let me state the weak regularity lemma. The weak regularity lemma is attributed to Frieze and Kannan, although their work predates the language of graphons -- it's stated in the language of graphs, but it's the same proof. So let me state it for you both in terms of graphons and in terms of graphs.

What it says is that for every epsilon and every graphon W, there exists a partition P of the [0, 1] interval -- and now I tell you how many sets there are; it's a partition into not a tower-type number of parts, but only roughly an exponential number of parts, 4 to the 1 over epsilon squared measurable sets -- such that if we apply the stepping operator to this graphon, we obtain an approximation of the graphon in the cut norm.

So that's the statement of the weak regularity lemma. There exists a partition such that if you do this stepping, then you obtain an approximation.
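In symbols, one plausible rendering of the statement: for every \varepsilon > 0 and every graphon W, there exists a partition P of [0, 1] into at most 4^{1/\varepsilon^2} measurable sets such that

\[ \| W - W_P \|_\square \le \varepsilon. \]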
So I want you to think about what this has to do with the usual version of Szemerédi's regularity lemma that you've seen earlier. Hopefully, you should realize that, morally, they're the same type of statement. But more importantly, how are they different from each other?

And now let me state a version for graphs, which is similar but not exactly the same as what we just saw for graphons. For graphs, let me define: a partition P of the vertex set is called weakly epsilon-regular if the following is true -- whenever I look at two vertex subsets A and B of the vertex set of G, the number of edges between A and B is what you would predict based on the density information that comes out of this partition. Namely, if I sum over all pairs of parts of the partition, looking at how many vertices of A and of B lie in the corresponding parts, and multiply by the edge density between those parts, that's the predicted value based on the data that comes out of the partition. So this is the actual number of edges, and this is the predicted number of edges. And those two numbers should differ by at most epsilon n squared, where n is the number of vertices. In symbols: |e(A, B) − Σ_{i,j} d(V_i, V_j) |A ∩ V_i| |B ∩ V_j|| ≤ ε n².

So this is the definition of what it means for a partition to be weakly epsilon-regular. It's important to think about why this is weaker -- it's called weak, right? So why is it weaker than the notion of epsilon-regularity? Previously, in the statement of Szemerédi's regularity lemma, we had the notion of an epsilon-regular partition. And here we have this notion of weakly epsilon-regular. So why is this a lot weaker?

It is not saying that individual pairs of parts are epsilon-regular. And eventually, we're going to have this number of parts -- I'll state the theorem in a second -- so the sizes of the parts are much smaller than an epsilon fraction.
But what this weak notion of regularity says is that if you look at it globally -- not looking at specific pairs of parts, but looking at it globally -- then this partition is a good approximation of what's going on in the actual graph. OK, so it's worth thinking about -- it's really worth thinking about -- the difference between this weak notion and the usual notion.

But first, let me state this regularity lemma. The weak regularity lemma for graphs says that for every epsilon and every graph G, there exists a weakly epsilon-regular partition of the vertex set of G into at most 4 to the 1 over epsilon squared parts.

Now, you might wonder why Frieze and Kannan came up with this notion of regularity. It's a weaker result, if you don't care about the bounds, because an epsilon-regular partition is automatically weakly epsilon-regular -- maybe with small changes of epsilon, if you wish -- but basically, this is a weaker notion compared to what we had before. But of course, the advantage is that you have a much more reasonable number of parts. It's not a tower; it's just a single exponential. And this is important. Their motivation was a computer science and algorithms application. So I want to take a brief detour and mention why you might care about weakly epsilon-regular partitions.

In particular, the problem of interest is approximating something called a max cut. The max cut problem asks you to determine, given a graph G, the maximum, over all subsets S of vertices, of the number of edges between the set and its complement. That's called a cut. I give you a graph, and I want you to find the S that has as many edges across the cut as possible.

This is an important problem in computer science -- an extremely important problem. And the status of this problem is that it is known to be difficult to solve even within 1%. The best algorithm is due to Goemans and Williamson.
It's an important algorithm, one of the foundational algorithms in semidefinite programming -- the words "semidefinite programming" came up earlier in this course when we discussed Grothendieck's inequality. So they came up with an approximation algorithm -- here, I'm only talking about polynomial-time, so efficient, algorithms -- with approximation ratio around 0.878. So one can obtain a cut that is within basically 12% of the maximum.

However, it is known to be hard, in the sense of complexity theory, to approximate beyond the ratio 16/17, which is around 0.941. And there is an important conjecture in computer science called the unique games conjecture; if that conjecture were true, then it would be hard to approximate beyond the Goemans-Williamson ratio. So this indicates the status of this problem: it is difficult to do an epsilon approximation.

But if the graph I give you is dense -- "dense" meaning a quadratic number of edges, where n is the number of vertices -- then it turns out that regularity-type algorithms -- that theorem, combined with its algorithmic versions -- allow you to get polynomial-time approximation schemes. One can approximate the max cut up to an epsilon n squared additive error in polynomial time. So in particular, if I'm willing to lose 0.01 n squared, then there is an algorithm to approximate the size of the max cut.

And that algorithm basically comes from -- without giving you any details whatsoever -- first finding a regularity partition. The partition breaks the set of vertices into some number of pieces. And now I search over all possible ratios for dividing each piece. So there is a bounded number of parts. For each one of those, I decide: do I cut it up half-half? Do I cut it up 1/3, 2/3, and so on? And those numbers alone, because of this definition of weakly epsilon-regular -- once you know the intersection of A, and likewise its complement, with the individual parts -- then I basically know the number of edges. So I can approximate the size of the max cut using a weakly epsilon-regular partition.
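A minimal sketch of this search, assuming the partition data is given as part sizes and pairwise edge densities; the grid of fractions is an illustrative discretization, not the lecture's exact procedure.

```python
import itertools
import numpy as np

def approx_max_cut(part_sizes, density, steps=10):
    """Estimate the max cut from weak-regularity data: for each part V_i,
    try putting a fraction a_i of its vertices into S, and predict
    e(S, complement) ~ sum_{i,j} d(V_i, V_j) * a_i|V_i| * (1 - a_j)|V_j|.
    The search is feasible because the number of parts is bounded."""
    k = len(part_sizes)
    fractions = np.linspace(0.0, 1.0, steps + 1)
    best = 0.0
    for choice in itertools.product(fractions, repeat=k):
        a = np.array(choice)
        s_sizes = a * part_sizes           # vertices of each part put in S
        t_sizes = (1 - a) * part_sizes     # vertices left in the complement
        cut = s_sizes @ density @ t_sizes  # predicted number of cut edges
        best = max(best, cut)
    return best

# Toy example: two parts of 50 vertices each, sparse inside, dense across.
sizes = np.array([50.0, 50.0])
d = np.array([[0.1, 0.9],
              [0.9, 0.1]])
print(approx_max_cut(sizes, d))
```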
So that was the motivation for these weakly epsilon-regular partitions -- at least, the algorithmic application. OK. Any questions?

OK. So let's take a quick break. And then afterwards, I want to show you the proof of the weak regularity lemma.

All right. So let me start the proof of the weak regularity lemma. The proof is by an energy increment argument. Let's see what this energy increment argument looks like in the language of graphons. Energy now means L2, so it's an L2 energy increment.

The statement of this lemma is that if you have a graphon W and a partition P of the [0, 1] interval -- always into measurable pieces; I'm not even going to write it, it's always measurable pieces -- such that the cut norm of the difference between W and W averaged over the steps of P is bigger than epsilon -- so this is the notion of being not weakly epsilon-regular -- then there exists a refinement P′ of P, dividing each part of P into at most four parts, such that the squared L2 norm increases by more than epsilon squared under this refinement.

It should be familiar to you, because we had similar arguments in the proof of Szemerédi's regularity lemma. So let's see the proof. Because you have a violation of weak epsilon-regularity, there exist measurable subsets S and T of the [0, 1] interval such that the integral of W − W_P over S × T is more than epsilon in absolute value. So now let me take P′ to be the common refinement of P obtained by introducing S and T into this partition.
So throw S and T in, and break everything according to S and T. Each part becomes at most four subparts -- that's where the "at most four subparts" comes from.

I now need to show that I have an energy increment. To do this, let me first perform the following calculation. Remember, this symbol here is the inner product, obtained by multiplying the two functions and integrating over the entire box. I claim that the inner product of W with W_P equals the inner product of W_{P′} with W_P. What happens here is that W_P is constant on each part of P′, so when I do this inner product, I can replace W by its average over each part of P′ -- and that average is exactly W_{P′}. You end up with the same quantity. And likewise, both of these equal what happens if you do the stepping by P.

You also have that the inner product of W with the indicator 1_{S×T} is the same as that of W_{P′}, by the same reason, because S × T is a union of parts of P′.

OK. So let's see. With those observations, you find that the inner product of W_{P′} − W_P with W_P is 0 -- this follows from the first equality. So now let me draw you a right triangle. You have a right angle, because you have an inner product that is 0. The two legs are W_P and W_{P′} − W_P; add these two vectors, and you find the hypotenuse is W_{P′}. So by the Pythagorean theorem, the squared L2 norm of W_{P′} equals the sum of the squared L2 norms of the two legs of this right triangle.

On the other hand, consider this quantity here -- the L2 norm of W_{P′} − W_P. It is at least its inner product with 1_{S×T}, which you can derive in one of many ways -- for example, by the Cauchy-Schwarz inequality, or by going from L2 down to L1. So let's say by Cauchy-Schwarz.
But this quantity here -- the inner product with 1_{S×T} -- equals the inner product of W − W_P with 1_{S×T}, which we said was bigger than epsilon. So as a result, the final quantity, the squared L2 norm of the new refinement, increases from the previous one by more than epsilon squared.

OK. So this is the L2 energy increment argument. I claim it's basically the same argument as the one that we did for Szemerédi's regularity lemma. And I encourage you to go back and compare them, to see why they're the same.
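One plausible rendering of the chain of inequalities just carried out, writing \mathbf{1}_{S \times T} for the indicator of the box S \times T:

\[ \|W_{P'}\|_2^2 = \|W_P\|_2^2 + \|W_{P'} - W_P\|_2^2 \ge \|W_P\|_2^2 + \langle W_{P'} - W_P, \mathbf{1}_{S \times T} \rangle^2 = \|W_P\|_2^2 + \langle W - W_P, \mathbf{1}_{S \times T} \rangle^2 > \|W_P\|_2^2 + \varepsilon^2, \]

where the middle inequality is Cauchy-Schwarz, using \|\mathbf{1}_{S \times T}\|_2 \le 1.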
And since the final energy is always at most 1-- so it's always bounded between 0 and 1-- we must stop after at most 1 over epsilon squared steps. And if you calculate the number of parts-- each part is subdivided into at most four parts at each step-- that gives you the conclusion on the final number of parts. OK, so very similar to what we did before.

All right. So that concludes the discussion of the weak regularity lemma. So basically the same proof: a weaker conclusion, but better quantitative bounds.

The next thing, and the final thing I want to discuss today, is a new ingredient which we haven't seen before but that will play an important role in the proof of the compactness-- in particular, the proof of the existence of the limit. And this is something where I need to discuss martingales.

So a martingale is an important object in probability theory. And it's a random sequence. We'll look at discrete sequences, indexed by the non-negative integers. And a martingale is such a sequence where, if I'm interested in the expectation of the next term, even if you know all the previous terms-- so you have full knowledge of the sequence before time n, and you want to predict, in expectation, what the nth term is-- then you cannot do better than simply predicting the last term that you saw. So this is the definition of a martingale. Now, to do this formally, I need to talk about filtrations and whatnot in measure theory. But let me not do that.

OK, so this is how you should think about martingales, and here are a couple of important examples of martingales. So the first one comes from-- the reason why these things are called martingales is that there is a gambling strategy which is related to such a sequence. Let's say you consider a sequence of fair coin tosses. So here's what we're going to do. Suppose we consider a betting strategy, and x sub n is equal to your balance at time n.
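For concreteness, here is a minimal simulation sketch of this balance process-- illustrative only, not from the lecture-- checking empirically that knowing the history doesn't shift the expected next balance:

```python
import random

def balance_path(n_steps):
    """Balance x_0, ..., x_n after n fair win/lose-$1 bets, starting from $0."""
    balance, path = 0, [0]
    for _ in range(n_steps):
        balance += random.choice([+1, -1])  # each outcome with probability 1/2
        path.append(balance)
    return path

# Empirical check of the martingale property: among many realizations that
# share the same balance at some time n, the average balance at time n + 1
# should match that shared value.  (With $1 steps the balance at time 5 has
# odd parity, so we condition on a balance of $2 at time 4 instead.)
random.seed(0)
paths = [balance_path(5) for _ in range(200_000)]
next_values = [p[5] for p in paths if p[4] == 2]
print(sum(next_values) / len(next_values))  # close to 2
```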
And suppose that we're looking at a fair casino, where the expectation of every game is exactly 0. Then this is a martingale. So imagine you have a sequence of coin flips, and you win $1 for each head and lose $1 for each tail. Say that at time five, you have $2 in your pocket. Then at time five plus 1, you expect, on average, to still have that many dollars. It might go up. It might go down. But in expectation, it doesn't change.

Is there a question? OK. So they're asking, is there some independence condition required? And the answer is no. There's no independence condition that is required. The definition of a martingale is just that, even with complete knowledge of the sequence up to a certain point, the difference going forward is 0 in expectation.

OK, so here's another example of a martingale, which actually turns out to be more relevant to our use. Namely, think of x as some hidden random variable-- something that you have no idea about-- but one that you can estimate at time n based on the information available up to time n.

So for example, suppose you have no idea who is going to win the presidential election. And really, nobody has any idea. But as time proceeds, you make an educated guess based on all the information you have up to that point. And that information becomes a larger and larger set as time moves forward. Your prediction is going to be a random variable that goes up and down. And that will be a martingale, because-- think about how I predict today, given all the possibilities going forward: one of many things could happen. But if I knew that my prediction was going to shift upwards in expectation, then I shouldn't have predicted what I predict today. I should have predicted that higher value to begin with. OK. So this is another construction of martingales. So this also comes up.
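In symbols-- suppressing the measure-theoretic details of filtrations, as in lecture-- this prediction process is x sub n equals the conditional expectation of x given the information up to time n, and the tower property of conditional expectation is exactly what makes it a martingale:

```latex
% The prediction sequence x_n = E[x | F_n] is a martingale by the tower
% property of conditional expectation (F_n = information up to time n):
\mathbb{E}\big[x_{n+1} \,\big|\, \mathcal{F}_n\big]
 = \mathbb{E}\big[\, \mathbb{E}[x \mid \mathcal{F}_{n+1}] \,\big|\, \mathcal{F}_n \big]
 = \mathbb{E}[x \mid \mathcal{F}_n]
 = x_n .
```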
You could have other, more pure-mathematics-type examples. Suppose I want to know the chromatic number of a random graph, and I show you that graph one edge at a time. You can compute the conditional expectation of this graph statistic based on what you've seen up to time n. And that sequence will be a martingale.

An important property of a martingale, which is known as the martingale convergence theorem-- and that's what we'll need for the proof of the existence of the limit next time-- says that every bounded martingale-- so for example, suppose your martingale only takes values between 0 and 1-- every bounded martingale converges almost surely. You cannot have a bounded martingale that you expect to keep going up and down forever.

So I want to show you a proof of this fact. Let me just mention that the boundedness condition is a little bit stronger than what we actually need. From the proof, you'll see that you really only need the sequence to be bounded in L1. That's enough. And more generally, there is a condition called uniform integrability, which I won't explain.

All right. OK. So let me show you a proof of the martingale convergence theorem. And I'm going to be somewhat informal and somewhat cavalier, because I don't want to get into some of the fine details of probability theory. But if you have taken something like 18.675, probability theory, then you can fill in all those details.

So I like this proof, because it's kind of a proof by gambling. I want to tell you a story which should convince you that a martingale cannot keep going up and down. It must converge almost surely.

So suppose x sub n doesn't converge. OK, so this is why I say I'm going to be somewhat cavalier with probability theory. When I say this doesn't converge, I mean that a specific instance of the sequence-- some specific realization-- doesn't converge.
If it doesn't converge, then there exist a and b, both rational numbers between 0 and 1 with a less than b, such that the sequence crosses the interval a, b infinitely many times. So by crossing this interval, what I mean is the following. OK. So there's an important picture which will help a lot in understanding this theorem.

So imagine I have this time axis n, and I have the two levels a and b. So I have this martingale. Its realization curve will look something like that. So that's an instance of this martingale. And by crossing, I mean-- OK, so here's what I mean by crossing. I start below a and-- let me use a different color. So I start below a, and I go above b, and then wait until I come back below a. And I go above b. Wait until I come back. So it goes like that. Like that.

So I start below a until the first time I go above b, and then I stop that segment. So those are the upcrossings of this martingale. An upcrossing is when you start below a, and then you end up above b. So if the sequence doesn't converge, then there exist such a and b with infinitely many such crossings. So this is just a fact. It's not hard to see.

And what we'll show is that this doesn't happen except with probability 0. So we'll show that this occurs with probability 0. And because there are only countably many pairs of rational numbers-- and a countable union of probability-0 events still has probability 0-- we find that x sub n converges with probability 1.

So these are upcrossings. I didn't define them formally, but hopefully you understood from my picture and my description. And let me define u sub n to be the number of such upcrossings up to time n.

Now let me consider a betting strategy. Basically, I want to make money. And I want to make money by following these upcrossings. OK. So every time you give me a number-- so think of this as the stock market.
So it's a fair stock market, where you tell me the price, and I get to decide, do I want to buy, or do I want to sell? So consider the betting strategy where, at any time, we're going to hold either 0 or 1 share of the stock, which has these moving prices. And what we're going to do is, if x sub n is less than a-- less than the lower threshold-- then we're going to buy and hold-- meaning 1 share-- until the first time that the price goes above b, and then sell as soon as we first see the price go above b.

So this is the betting strategy. And it's something which you can implement. If you see a sequence of prices, you can implement this strategy. And you already hopefully see that on each upcrossing, you make money. Each upcrossing, you make money. And this is almost too good to be true. And in fact, we see that the total gain from this strategy-- so if you start with some balance, what you get at the end-- is at least the difference, b minus a, times the number of upcrossings, minus a bounded loss. You might start somewhere, you buy, and then you just lose everything. So there might be a cost from a final incomplete crossing. And that cost is bounded, because we are working with a bounded martingale. So suppose the martingale always stays between 0 and 1.

But on the other hand, there is a theorem about martingales, which is not hard to deduce from the definition, that no matter what the betting strategy is, the gain at any particular time must be 0 in expectation. So this is just a property of the martingale. So 0 equals the expected gain, which is at least b minus a times the expected number of upcrossings, minus 1-- the 1 accounting for the possible loss from that final incomplete crossing. And thus the expected number of upcrossings up to time n is at most 1 over b minus a.

Now, we let n go to infinity. And let u sub infinity be the total number of upcrossings. The u sub n's can never go down--
they're always weakly increasing. So by the monotone convergence theorem, the expectation of u sub n converges to the expectation of u sub infinity, the total number of upcrossings. So now, in particular, you know that the total number of upcrossings has expectation at most 1 over b minus a, which is finite, so it is finite with probability 1. So in particular, the probability that you have infinitely many crossings is 0. So with probability 0, you cross infinitely many times, which proves the claim over there and which concludes the proof of the claim that the martingale converges almost surely.

OK, so that proves the martingale convergence theorem. So next time, we'll combine everything that we did today to prove the three main theorems that we stated last time on graph limits.
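As a small sanity check on the upcrossing inequality used in this proof, here is a simulation sketch-- illustrative only, not from the lecture. It uses the fraction of red balls in a Pólya urn, a standard example of a martingale bounded between 0 and 1, counts the upcrossings of an interval (a, b), and compares the average count against the bound 1 over b minus a:

```python
import random

def polya_urn_path(n_draws):
    """Fraction of red balls in a Polya urn (start with 1 red, 1 blue;
    each draw returns the ball plus one more of the same color).
    This fraction is a classical bounded martingale with values in [0, 1]."""
    red, total, path = 1, 2, []
    for _ in range(n_draws):
        if random.random() < red / total:  # draw a red ball
            red += 1
        total += 1
        path.append(red / total)
    return path

def count_upcrossings(path, a, b):
    """Number of times the path goes from below a to above b."""
    count, below = 0, False
    for x in path:
        if x < a:
            below = True
        elif x > b and below:
            count += 1
            below = False
    return count

random.seed(0)
a, b = 0.45, 0.55
avg = sum(count_upcrossings(polya_urn_path(2000), a, b)
          for _ in range(10_000)) / 10_000
print(avg, "vs bound", 1 / (b - a))  # expected upcrossings is at most 10
```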