[SQUEAKING]

PROFESSOR: Last time, we started discussing graph limits. And let me remind you of some of the notions and definitions that were involved.

One of the main objects in graph limits is that of a graphon, which is a symmetric, measurable function from the unit square to the unit interval. Here, symmetric means that W(x, y) = W(y, x).

We defined a notion of convergence for a sequence of graphons. And remember, the notion of convergence is that a sequence is convergent if the sequence of homomorphism densities converges as n goes to infinity for every fixed F, every fixed graph.

So this is how we define convergence. A sequence of graphs or graphons converges if all the homomorphism densities -- you should think of these as subgraph statistics -- if all of these statistics converge. We also say that a sequence converges to a particular limit if these homomorphism densities converge to the corresponding homomorphism density of the limit for every F.

OK. So this is how we define convergence. We also defined a notion of distance. And to do that, we first defined the cut norm to be the following quantity, defined by taking two subsets S and T, which are measurable -- everything from here on is going to be measurable -- and looking at the maximum possible deviation of the integral of this function on the box S × T. And here, you should think of W as taking real values, allowing both positive and negative values, because otherwise you should just take S and T to be the whole interval.

OK. And this definition was motivated by our discussion of discrepancy coming from quasirandomness.
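One plausible rendering of the definitions just recalled, using the notation t(F, W) for the homomorphism density of F in W from the previous lecture: a sequence of graphons W_n is convergent if t(F, W_n) converges as n goes to infinity for every fixed graph F, and the cut norm is

\[ \|W\|_\square \;=\; \sup_{S, T \subseteq [0,1]} \left| \int_{S \times T} W(x, y) \, dx \, dy \right|, \]

with the supremum taken over measurable sets S and T.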
Now, if I give you two graphs or graphons and ask you to compare them, you are allowed to permute the vertices in some sense, to find the best overlay. And that notion is captured in the definition of cut distance, which is defined to be the following quantity, where we take the infimum, over all possible measure-preserving bijections from the interval to itself, of the cut norm of the difference between these two graphons if I relabel one of them using this measure-preserving bijection. In symbols, δ_□(U, W) = inf_φ ‖U − W^φ‖_□, where W^φ(x, y) = W(φ(x), φ(y)). So think of this as permuting the vertices.

So these were the definitions that were involved last time. And at the end of the last lecture, I stated three main theorems of graph limit theory. I forgot to mention some of the history of this theory. There were a number of important papers that developed this very idea of graph limits -- which is actually somewhat novel, because if you think about all of combinatorics, we like to deal with discrete objects, and even the idea of taking a limit is rather new. This work is due to a number of people. In particular, László Lovász played a very important, central role in the development of this theory. And various people came to this theory from different perspectives -- some from more pure perspectives, and some from more applied perspectives. And this theory is now being used in more and more places, including statistics, machine learning, and so on. I'll explain where that comes up in just a little bit.

At the end of the last lecture, I stated three main theorems. And what I want to do today is develop some tools so that we can prove those theorems in the next lecture. OK. So I want to develop some tools. In particular, you'll see some of the things that we talked about in the chapter on Szemerédi's regularity lemma come up again in a slightly different language. Much of what I will say today should hopefully already be familiar to you, but you will see it again from the perspective of graph limits.

But first, before telling you about the tools, I want to give you some more examples. One of the ways that I motivated graph limits last time is the example of an Erdős-Rényi random graph, or a sequence of quasirandom graphs, converging to a constant.
The constant graphon is the limit. But what about generalizations of that construction, where the limit is not a constant? This leads to the idea of a W-random graph, which generalizes the Erdős-Rényi random graph. In Erdős-Rényi, every edge occurs with the same probability p, uniform throughout the graph. But what I want to do now is allow the edge probability to vary somewhat. OK.

So before giving you the more general definition, a special case of this is an important model of random graphs known as the stochastic block model. In particular, a two-block model consists of the following data, where I am looking at two types of vertices -- let's call them red and blue -- and the vertices are assigned colors at random -- for example, 50/50, but any other probability is fine. And now I put down the edges according to the colors of the two endpoints. Two red vertices are joined with edge probability p_rr. If I have a red and a blue, then I may have a different probability p_rb joining them, and likewise with blue-blue, p_bb. In other words, I can encode this probability information in a matrix, like that -- symmetric across the diagonal.

So this is a slightly more general version of an Erdős-Rényi random graph, where now I have potentially different types of vertices. And you can imagine these kinds of models are very important in applied mathematics for modeling certain situations -- such as, for example, if you have people with different political party affiliations: how likely are they to talk to each other? You can imagine some of these numbers might be bigger than others. And there's an important statistical problem: if I give you a graph, can you cluster or classify the vertices according to their types, if I do not show you in advance what the colors are but only show you the output graph?
So these are important statistical questions with lots of applications. This is an example with only two blocks, but of course you can have more than two blocks. And the graphon context tells us that we should not limit ourselves to just blocks. If I give you any graphon W, I can also construct a random graph.

So what I would like to do is consider the following construction -- let's call it the W-random graph, denoted G(n, W) -- where I form the graph using the following process. First, the vertex set is labeled 1 through n. I draw the vertex types by taking uniform random x_1 through x_n -- so uniform iid. You can think of them as the vertex colors, the vertex types. And I put an edge between i and j with probability exactly W(x_i, x_j), for all i < j independently.

That's the definition of a W-random graph. And the two-block stochastic model is a special case of this W-random graph, for the graphon corresponding to this red-blue picture here. So the generation process would be: I give you some x_1, x_2, x_3, and then I evaluate the value of this graphon at those points, and those are my edge probabilities. So what I described is a special case of this general W-random graph.

Any questions? So as before, an important statistical question is: if I show you the graph, can you tell me a good model for where this graph came from? That's one of the reasons why people in applied math might care about these types of constructions.
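Here is a minimal sketch of this sampling procedure in Python, assuming a two-block graphon encoded as a step function; the probabilities p_rr, p_rb, p_bb and all function names are illustrative, not from the lecture.

```python
import numpy as np

def sample_w_random_graph(w, n, rng=None):
    """Sample a W-random graph G(n, W): draw iid uniform types x_1..x_n,
    then include each edge {i, j} independently with probability W(x_i, x_j)."""
    rng = np.random.default_rng(rng)
    x = rng.uniform(size=n)                  # vertex types x_1, ..., x_n
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.uniform() < w(x[i], x[j]):
                adj[i, j] = adj[j, i] = 1
    return adj

def two_block_graphon(p_rr=0.8, p_rb=0.1, p_bb=0.6):
    """Step-function graphon for a 50/50 two-block stochastic block model."""
    def w(x, y):
        p = np.array([[p_rr, p_rb], [p_rb, p_bb]])
        return p[int(x >= 0.5), int(y >= 0.5)]
    return w

adj = sample_w_random_graph(two_block_graphon(), n=100, rng=0)
print(adj.sum() // 2, "edges")
```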
Let me talk about some theorems. I've told you that the sequence of Erdős-Rényi random graphs converges to the constant graphon p. So instead of taking a constant graphon p, now I start with a W-random graph. And you should expect, and it is indeed true, that this sequence converges to W as its limit.

So let W be a graphon. And for each n, let me draw the graph G_n using the W-random graph model, independently. Then, with probability 1, the sequence converges to the graphon W -- in the sense described above.

This statement tells us a couple of things: one, that W-random graphs converge to the limit W, as you should expect; and two, that every graphon W is the limit point of some sequence of graphs. This is something that we never quite explicitly stated before, so let me make this remark. In particular, every W is the limit of some sequence of graphs -- just like every real number, in analogy to what we said last time. Every real number is the limit of a sequence of rational numbers, through rational approximation. And this is some form of approximation of a graphon by a sequence of graphs.

OK. I'm not going to prove this theorem. The proof is not difficult. Using the definition of subgraph convergence, the proof uses what's known as Azuma's inequality. By an appropriate application of Azuma's inequality on the concentration of martingales, one can prove this theorem by showing that the F density in G_n is very close to the F density in W with high probability.

OK. Any questions so far?

So this is an important example, one of the motivations of graph limits. But now, let's get back to what I said earlier. I would like to develop a sequence of tools that will allow us to prove the main theorems stated at the end of the last lecture. And this will sound very familiar, because we're going to write down some lemmas that we did back in the chapter on Szemerédi's regularity lemma, but now in the language of graphons. So the first is a counting lemma.
The goal of the counting lemma is to show that if you have two graphons which are close to each other in the sense of cut distance, then their F densities are similar to each other. Here's the statement: if W and U are graphons and F is a graph, then the difference between the F density of W and the F density of U is no more than a constant -- the number of edges of F -- times the cut distance between U and W. In symbols, |t(F, W) − t(F, U)| ≤ e(F) δ_□(U, W).

Maybe some of you already see how to do this from our discussion of Szemerédi's regularity lemma. In any case, I want to rewrite the proof in the language of graphons. We did two proofs of the triangle counting lemma. One was hopefully more intuitive for you: you pick a typical vertex that has lots of neighbors on both sides, and therefore lots of edges between them. And then there was a second proof, which I said was a more analytic proof, where you swap out one edge at a time. That proof, I think, is technically easier to implement, especially for general H. But the first time you see it, you might not quite see what the calculation is about. So I want to do this exact same calculation again in the language of graphons. And hopefully, it should be clear this time.

So this is the same as the counting lemma over epsilon-regular pairs. It suffices to prove the inequality where the right-hand side is replaced not by the cut distance but by the cut norm. The reason is that once you have this second inequality, by taking an infimum over all measure-preserving bijections φ -- and notice that this change does not affect the F density -- you recover the first inequality.

I want to give you a small reformulation of the cut norm that will be useful for thinking about this counting lemma. Here's the reformulation -- namely, that I can redefine the cut norm. Here, W takes real values, so not necessarily non-negative.
So the cut norm we saw earlier is defined to be the supremum, over all measurable subsets S and T of the [0, 1] interval, of this integral in absolute value. But it turns out I can take this supremum over a slightly larger set of objects. Instead of looking only at measurable subsets of the interval, let me now look at measurable functions u and v from [0, 1] to [0, 1] -- and as always, everything is measurable -- and take the supremum of the following integral. Instead of integrating over a box, I am now integrating the expression u(x) v(y) W(x, y). I claim this gives the same value.

OK. So why is this true? Well, one of the directions is easy to see, because the right-hand side is strictly an enlargement of the left-hand side. By taking u to be the indicator function of S and v to be the indicator function of T, you see that the right-hand side includes the left-hand side in terms of what you are allowed to do.

But what about the other direction? For the other direction, the main thing is to notice that the integrand -- what's inside this integral -- is bilinear in the values of u and v. So in particular, the extrema of this integral, as you vary u and v, are attained with u and v taking values in the endpoints, 0 and 1.

It may be helpful to think about the discrete setting, where instead of this integral you have a matrix and two vectors multiplying it from the left and right, and you have to decide what the coordinates of those vectors are. It's a bilinear form. How do you maximize or minimize it? You push every entry to one of its two endpoints. You never lose by doing that.

OK, so think about it. This is not difficult once you see it the right way. But now we have the cut norm expressed not over sets, but over bounded functions.
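One plausible rendering of the two equivalent formulas on the board:

\[ \|W\|_\square = \sup_{S, T \subseteq [0,1]} \left| \int_{S \times T} W(x, y) \, dx \, dy \right| = \sup_{u, v \colon [0,1] \to [0,1]} \left| \int_{[0,1]^2} u(x) \, v(y) \, W(x, y) \, dx \, dy \right|, \]

where all sets and functions are measurable; one direction takes u = 1_S and v = 1_T, and the other uses bilinearity to push the extremizing u and v to {0, 1}-valued functions.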
And now I'm ready to prove the counting lemma. Instead of writing down the whole proof for general H, let me write down the calculation that illustrates this proof for triangles. The general proof is the same once you understand how this argument works.

The argument works by considering the difference between the two F densities -- the triangle density of W minus the triangle density of U. This is an integral, which I'll write out. We would like to show that this quantity is small if U and W are close in cut norm.

So let's write this integral as a telescoping sum, where the first term is obtained by replacing one factor -- by this, I mean replacing W(x, y) by W(x, y) − U(x, y). Then the second term of the telescoping sum -- so you see what happens: I change one factor at a time. And finally, I change the third factor. This is an identity: if you expand out all of these differences, everything intermediate cancels out. So it's a telescoping sum.

But now I want to show that each term is small. How can I show that each term is small? Look at this expression here. I claim that for a fixed value of z -- so imagine fixing z, and letting x and y vary in this integral -- it has the form up there. If you fix z, then you have a u and a v coming from the other two factors, and they are both bounded between 0 and 1. So for a fixed value of z, this is at most the cut norm of W − U. And if I let z vary, it is still bounded in absolute value by that quantity.

So each term is bounded by the cut norm of W − U in absolute value. Add all three of them together, and we find that the whole thing is bounded in absolute value by 3 times the cut norm of the difference. OK, and that finishes the proof of the counting lemma.
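One plausible reconstruction of the telescoping identity for triangles, in the notation of the lecture:

\begin{align*}
t(K_3, W) - t(K_3, U)
&= \int (W - U)(x, y) \, W(x, z) \, W(y, z) \, dx \, dy \, dz \\
&\quad + \int U(x, y) \, (W - U)(x, z) \, W(y, z) \, dx \, dy \, dz \\
&\quad + \int U(x, y) \, U(x, z) \, (W - U)(y, z) \, dx \, dy \, dz.
\end{align*}

For each fixed z, each term has the bilinear form above, with the other two factors playing the roles of u and v, so each is at most \|W - U\|_\square in absolute value, giving |t(K_3, W) - t(K_3, U)| \le 3 \|W - U\|_\square.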
That was for triangles. Of course, if you have a general H, you just have more terms: a longer telescoping sum, and you get the corresponding bound.

OK. So this is the counting lemma. And I claim that it's exactly the same proof as the second proof of the counting lemma that we did when we discussed Szemerédi's regularity lemma. Any questions? Yeah.

AUDIENCE: Why did it suffice to prove it over the [INAUDIBLE]?

PROFESSOR: OK. So let me answer that in a second. First, this should be H, not F. OK, so your question was: up there, why was it sufficient to prove this version instead of that version? Is that the question?

AUDIENCE: Yeah.

PROFESSOR: OK. Suppose I prove it for this version, so I know this is true. Now I take the infimum of both sides. So then this is true, right? Because it's true for every φ. But the left-hand side doesn't change, because the F density in a relabeling of the vertices is still the same quantity, whereas this one here is now that.

All right. So what we see as a corollary of this counting lemma is that if you have a Cauchy sequence with respect to the cut distance, then the sequence is automatically convergent. Recall the definition of convergence: convergence has to do with F densities converging. And if you have a Cauchy sequence, then the F densities converge. And also, a related but different statement: if you have a sequence W_n that converges to W in cut distance, then W_n converges to W in the sense defined by F densities.

So qualitatively, what the counting lemma says is that convergence in cut distance is a stronger notion than the convergence coming from subgraph densities. So this is one part of the regularity method, the counting lemma. Of course, the other part is the regularity lemma itself. So that's the next thing we'll do.
And it turns out that we actually don't need the full strength of the regularity lemma. We only need something called a weak regularity lemma.

What the weak regularity lemma says is -- I mean, you still have a partition of the vertices. Let me now state it for graphons. For a partition P of the vertex set -- so a partition of the [0, 1] interval -- and a symmetric, measurable function W -- I'm just going to omit the word "measurable" from now on; everything will be measurable, and all of these sets are also measurable -- I can define what's known as a stepping operator, which sends W to the object W_P obtained by averaging over the steps S_i × S_j, replacing the graphon by its average over each step. Precisely, I obtain a new graphon, a new symmetric, measurable function W_P, whose value at (x, y) is

W_P(x, y) = (1 / (λ(S_i) λ(S_j))) ∫_{S_i × S_j} W, whenever (x, y) lies in S_i × S_j,

where λ denotes Lebesgue measure.

Pictorially, what happens is that you look at your graphon. There's a partition of the vertex set, so to speak -- of the interval. It doesn't have to be a partition into intervals, but for illustration, suppose it looks like that. And what I do is take this W and replace it by a new graphon, a new symmetric, measurable function W_P, obtained by averaging: take each box, compute its average, and put that average into the box. So this is what W_P is supposed to be.

Just a few minor technicalities. If the denominator is equal to 0, let's ignore that set -- it has measure zero anyway. Everything will be treated up to measure zero, changing the function on measure-zero sets. So it doesn't really matter if you're not strictly allowed to do this division.

OK. So this operator plays an important role in the regularity lemma, because it's how we think about partitioning -- what happens to a graph under partitioning.
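A minimal sketch of the stepping operator for a graphon discretized as a matrix, assuming the partition is given as integer labels on the grid points; the names here are illustrative.

```python
import numpy as np

def step_graphon(w_grid, labels):
    """Apply the stepping operator: replace each block S_i x S_j of the
    discretized graphon by its average value over that block."""
    w_p = np.zeros_like(w_grid, dtype=float)
    parts = np.unique(labels)
    for a in parts:
        for b in parts:
            block = np.ix_(labels == a, labels == b)
            w_p[block] = w_grid[block].mean()   # average over the step
    return w_p

# Example: a 6x6 discretization with two parts {0,1,2} and {3,4,5}.
rng = np.random.default_rng(0)
w_grid = rng.uniform(size=(6, 6))
w_grid = (w_grid + w_grid.T) / 2               # make it symmetric
labels = np.array([0, 0, 0, 1, 1, 1])
print(step_graphon(w_grid, labels))
```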
It has several other names, if you look at it from slightly different perspectives. You can view it as a projection in the sense of Hilbert space: in the Hilbert space of functions on the unit square, the stepping operator is the projection onto the subspace of functions that are constant on each step. So that's one interpretation. Another interpretation is that this operation is a conditional expectation. If you know what a conditional expectation actually is in the sense of probability theory, then that's what happens here: if you view [0, 1] squared as a probability space, then we are taking the conditional expectation relative to the sigma-algebra generated by these steps.

So these are just a couple of ways of thinking about what's going on. They might be somewhat helpful later on if you're familiar with these notions. But if you're not, don't worry about it. Concretely, it's what happens up there.

OK. So now let me state the weak regularity lemma. The weak regularity lemma is attributed to Frieze and Kannan, although their work predates the language of graphons -- it's stated in the language of graphs, but it's the same proof. So let me state it for you both in terms of graphons and in terms of graphs.

What it says is that for every epsilon and every graphon W, there exists a partition P of the [0, 1] interval -- and now I tell you how many sets there are; it's a partition into not a tower-type number of parts, but only roughly an exponential number of parts, 4 to the 1 over epsilon squared measurable sets -- such that if we apply the stepping operator to this graphon, we obtain an approximation of the graphon in the cut norm.

So that's the statement of the weak regularity lemma. There exists a partition such that if you do this stepping, then you obtain an approximation.
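In symbols, one plausible rendering of the statement: for every \varepsilon > 0 and every graphon W, there exists a partition P of [0, 1] into at most 4^{1/\varepsilon^2} measurable sets such that

\[ \| W - W_P \|_\square \le \varepsilon. \]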
So I want you to think about what this has to do with the usual version of Szemerédi's regularity lemma that you've seen earlier. Hopefully, you should realize that, morally, they're the same type of statement. But more importantly, how are they different from each other?

And now let me state a version for graphs, which is similar but not exactly the same as what we just saw for graphons. For graphs, let me define: a partition P of the vertex set is called weakly epsilon-regular if the following is true -- whenever I look at two vertex subsets A and B of the vertex set of G, the number of edges between A and B is what you would predict based on the density information that comes out of this partition. Namely, if I sum over all pairs of parts of the partition, looking at how many vertices of A and of B lie in the corresponding parts, and multiply by the edge density between those parts, that's the predicted value based on the data that comes out of the partition. So this is the actual number of edges, and this is the predicted number of edges. And those two numbers should differ by at most epsilon n squared, where n is the number of vertices. In symbols: |e(A, B) − Σ_{i,j} d(V_i, V_j) |A ∩ V_i| |B ∩ V_j|| ≤ ε n².

So this is the definition of what it means for a partition to be weakly epsilon-regular. It's important to think about why this is weaker -- it's called weak, right? So why is it weaker than the notion of epsilon-regularity? Previously, in the statement of Szemerédi's regularity lemma, we had the notion of an epsilon-regular partition. And here we have this notion of weakly epsilon-regular. So why is this a lot weaker?

It is not saying that individual pairs of parts are epsilon-regular. And eventually, we're going to have this number of parts -- I'll state the theorem in a second -- so the sizes of the parts are much smaller than an epsilon fraction.
But what this weak notion of regularity says is that if you look at it globally -- not looking at specific pairs of parts, but looking at it globally -- then this partition is a good approximation of what's going on in the actual graph. OK, so it's worth thinking about -- it's really worth thinking about -- the difference between this weak notion and the usual notion.

But first, let me state this regularity lemma. The weak regularity lemma for graphs says that for every epsilon and every graph G, there exists a weakly epsilon-regular partition of the vertex set of G into at most 4 to the 1 over epsilon squared parts.

Now, you might wonder why Frieze and Kannan came up with this notion of regularity. It's a weaker result, if you don't care about the bounds, because an epsilon-regular partition is automatically weakly epsilon-regular -- maybe with small changes of epsilon, if you wish -- but basically, this is a weaker notion compared to what we had before. But of course, the advantage is that you have a much more reasonable number of parts. It's not a tower; it's just a single exponential. And this is important. Their motivation was a computer science and algorithms application. So I want to take a brief detour and mention why you might care about weakly epsilon-regular partitions.

In particular, the problem of interest is approximating something called a max cut. The max cut problem asks you to determine, given a graph G, the maximum, over all subsets S of vertices, of the number of edges between the set and its complement. That's called a cut. I give you a graph, and I want you to find the S that has as many edges across the cut as possible.

This is an important problem in computer science -- an extremely important problem. And the status of this problem is that it is known to be difficult to solve even within 1%. The best algorithm is due to Goemans and Williamson.
It's an important algorithm, one of the foundational algorithms in semidefinite programming -- the words "semidefinite programming" came up earlier in this course when we discussed Grothendieck's inequality. So they came up with an approximation algorithm -- here, I'm only talking about polynomial-time, so efficient, algorithms -- with approximation ratio around 0.878. So one can obtain a cut that is within basically 12% of the maximum.

However, it is known to be hard, in the sense of complexity theory, to approximate beyond the ratio 16/17, which is around 0.941. And there is an important conjecture in computer science called the unique games conjecture; if that conjecture were true, then it would be hard to approximate beyond the Goemans-Williamson ratio. So this indicates the status of this problem: it is difficult to do an epsilon approximation.

But if the graph I give you is dense -- "dense" meaning a quadratic number of edges, where n is the number of vertices -- then it turns out that regularity-type algorithms -- that theorem, combined with its algorithmic versions -- allow you to get polynomial-time approximation schemes. One can approximate the max cut up to an epsilon n squared additive error in polynomial time. So in particular, if I'm willing to lose 0.01 n squared, then there is an algorithm to approximate the size of the max cut.

And that algorithm basically comes from -- without giving you any details whatsoever -- first finding a regularity partition. The partition breaks the set of vertices into some number of pieces. And now I search over all possible ratios for dividing each piece. So there is a bounded number of parts. For each one of those, I decide: do I cut it up half-half? Do I cut it up 1/3, 2/3, and so on? And those numbers alone, because of this definition of weakly epsilon-regular -- once you know the intersection of A, and likewise its complement, with the individual parts -- then I basically know the number of edges. So I can approximate the size of the max cut using a weakly epsilon-regular partition.
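A minimal sketch of this search, assuming the partition data is given as part sizes and pairwise edge densities; the grid of fractions is an illustrative discretization, not the lecture's exact procedure.

```python
import itertools
import numpy as np

def approx_max_cut(part_sizes, density, steps=10):
    """Estimate the max cut from weak-regularity data: for each part V_i,
    try putting a fraction a_i of its vertices into S, and predict
    e(S, complement) ~ sum_{i,j} d(V_i, V_j) * a_i|V_i| * (1 - a_j)|V_j|.
    The search is feasible because the number of parts is bounded."""
    k = len(part_sizes)
    fractions = np.linspace(0.0, 1.0, steps + 1)
    best = 0.0
    for choice in itertools.product(fractions, repeat=k):
        a = np.array(choice)
        s_sizes = a * part_sizes           # vertices of each part put in S
        t_sizes = (1 - a) * part_sizes     # vertices left in the complement
        cut = s_sizes @ density @ t_sizes  # predicted number of cut edges
        best = max(best, cut)
    return best

# Toy example: two parts of 50 vertices each, sparse inside, dense across.
sizes = np.array([50.0, 50.0])
d = np.array([[0.1, 0.9],
              [0.9, 0.1]])
print(approx_max_cut(sizes, d))
```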
So that was the motivation for these weakly epsilon-regular partitions -- at least, the algorithmic application. OK. Any questions?

OK. So let's take a quick break. And then afterwards, I want to show you the proof of the weak regularity lemma.

All right. So let me start the proof of the weak regularity lemma. The proof is by an energy increment argument. Let's see what this energy increment argument looks like in the language of graphons. Energy now means L2, so it's an L2 energy increment.

The statement of this lemma is that if you have a graphon W and a partition P of the [0, 1] interval -- always into measurable pieces; I'm not even going to write it, it's always measurable pieces -- such that the cut norm of the difference between W and W averaged over the steps of P is bigger than epsilon -- so this is the notion of being not weakly epsilon-regular -- then there exists a refinement P′ of P, dividing each part of P into at most four parts, such that the squared L2 norm increases by more than epsilon squared under this refinement.

It should be familiar to you, because we had similar arguments in the proof of Szemerédi's regularity lemma. So let's see the proof. Because you have a violation of weak epsilon-regularity, there exist measurable subsets S and T of the [0, 1] interval such that the integral of W − W_P over S × T is more than epsilon in absolute value. So now let me take P′ to be the common refinement of P obtained by introducing S and T into this partition.
So throw S and T in, and break everything according to S and T. Each part becomes at most four subparts -- that's where the "at most four subparts" comes from.

I now need to show that I have an energy increment. To do this, let me first perform the following calculation. Remember, this symbol here is the inner product, obtained by multiplying the two functions and integrating over the entire box. I claim that the inner product of W with W_P equals the inner product of W_{P′} with W_P. What happens here is that W_P is constant on each part of P′, so when I do this inner product, I can replace W by its average over each part of P′ -- and that average is exactly W_{P′}. You end up with the same quantity. And likewise, both of these equal what happens if you do the stepping by P.

You also have that the inner product of W with the indicator 1_{S×T} is the same as that of W_{P′}, by the same reason, because S × T is a union of parts of P′.

OK. So let's see. With those observations, you find that the inner product of W_{P′} − W_P with W_P is 0 -- this follows from the first equality. So now let me draw you a right triangle. You have a right angle, because you have an inner product that is 0. The two legs are W_P and W_{P′} − W_P; add these two vectors, and you find the hypotenuse is W_{P′}. So by the Pythagorean theorem, the squared L2 norm of W_{P′} equals the sum of the squared L2 norms of the two legs of this right triangle.

On the other hand, consider this quantity here -- the L2 norm of W_{P′} − W_P. It is at least its inner product with 1_{S×T}, which you can derive in one of many ways -- for example, by the Cauchy-Schwarz inequality, or by going from L2 down to L1. So let's say by Cauchy-Schwarz.
But this quantity here -- the inner product with 1_{S×T} -- equals the inner product of W − W_P with 1_{S×T}, which we said was bigger than epsilon. So as a result, the final quantity, the squared L2 norm of the new refinement, increases from the previous one by more than epsilon squared.

OK. So this is the L2 energy increment argument. I claim it's basically the same argument as the one that we did for Szemerédi's regularity lemma. And I encourage you to go back and compare them, to see why they're the same.
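One plausible rendering of the chain of inequalities just carried out, writing \mathbf{1}_{S \times T} for the indicator of the box S \times T:

\[ \|W_{P'}\|_2^2 = \|W_P\|_2^2 + \|W_{P'} - W_P\|_2^2 \ge \|W_P\|_2^2 + \langle W_{P'} - W_P, \mathbf{1}_{S \times T} \rangle^2 = \|W_P\|_2^2 + \langle W - W_P, \mathbf{1}_{S \times T} \rangle^2 > \|W_P\|_2^2 + \varepsilon^2, \]

where the middle inequality is Cauchy-Schwarz, using \|\mathbf{1}_{S \times T}\|_2 \le 1.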
And since the final energy is always at most 1-- so it's always bounded between 0 and 1-- we must stop after at most 1 over epsilon squared steps. And if you calculate the number of parts-- each part is subdivided into at most four parts at each step-- that gives you the conclusion on the final number of parts. OK, so very similar to what we did before.

All right. So that concludes the discussion of the weak regularity lemma. So basically the same proof: a weaker conclusion, but better quantitative bounds.

The next thing, and the final thing I want to discuss today, is a new ingredient which we haven't seen before but that will play an important role in the proof of the compactness-- in particular, the proof of the existence of the limit. And this is something where I need to discuss martingales.

So a martingale is an important object in probability theory. And it's a random sequence. We'll look at discrete sequences, indexed by the non-negative integers. And a martingale is such a sequence where, if I'm interested in the expectation of the next term, even if you know all the previous terms-- so you have full knowledge of the sequence before time n, and you want to predict, in expectation, what the nth term is-- then you cannot do better than simply predicting the last term that you saw. So this is the definition of a martingale. Now, to do this formally, I need to talk about filtrations and whatnot in measure theory. But let me not do that.

OK, so this is how you should think about martingales, and here are a couple of important examples of martingales. So the first one comes from-- the reason why these things are called martingales is that there is a gambling strategy which is related to such a sequence. Let's say you consider a sequence of fair coin tosses. So here's what we're going to do. Suppose we consider a betting strategy, and x sub n is equal to your balance at time n.
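For concreteness, here is a minimal simulation sketch of this balance process-- illustrative only, not from the lecture-- checking empirically that knowing the history doesn't shift the expected next balance:

```python
import random

def balance_path(n_steps):
    """Balance x_0, ..., x_n after n fair win/lose-$1 bets, starting from $0."""
    balance, path = 0, [0]
    for _ in range(n_steps):
        balance += random.choice([+1, -1])  # each outcome with probability 1/2
        path.append(balance)
    return path

# Empirical check of the martingale property: among many realizations that
# share the same balance at some time n, the average balance at time n + 1
# should match that shared value.  (With $1 steps the balance at time 5 has
# odd parity, so we condition on a balance of $2 at time 4 instead.)
random.seed(0)
paths = [balance_path(5) for _ in range(200_000)]
next_values = [p[5] for p in paths if p[4] == 2]
print(sum(next_values) / len(next_values))  # close to 2
```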
And suppose that we're looking at a fair casino, where the expectation of every game is exactly 0. Then this is a martingale. So imagine you have a sequence of coin flips, and you win $1 for each head and lose $1 for each tail. Say that at time five, you have $2 in your pocket. Then at time five plus 1, you expect, on average, to still have that many dollars. It might go up. It might go down. But in expectation, it doesn't change.

Is there a question? OK. So they're asking, is there some independence condition required? And the answer is no. There's no independence condition that is required. The definition of a martingale is just that, even with complete knowledge of the sequence up to a certain point, the difference going forward is 0 in expectation.

OK, so here's another example of a martingale, which actually turns out to be more relevant to our use. Namely, think of x as some hidden random variable-- something that you have no idea about-- but one that you can estimate at time n based on the information available up to time n.

So for example, suppose you have no idea who is going to win the presidential election. And really, nobody has any idea. But as time proceeds, you make an educated guess based on all the information you have up to that point. And that information becomes a larger and larger set as time moves forward. Your prediction is going to be a random variable that goes up and down. And that will be a martingale, because-- think about how I predict today, given all the possibilities going forward: one of many things could happen. But if I knew that my prediction was going to shift upwards in expectation, then I shouldn't have predicted what I predict today. I should have predicted that higher value to begin with. OK. So this is another construction of martingales. So this also comes up.
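In symbols-- suppressing the measure-theoretic details of filtrations, as in lecture-- this prediction process is x sub n equals the conditional expectation of x given the information up to time n, and the tower property of conditional expectation is exactly what makes it a martingale:

```latex
% The prediction sequence x_n = E[x | F_n] is a martingale by the tower
% property of conditional expectation (F_n = information up to time n):
\mathbb{E}\big[x_{n+1} \,\big|\, \mathcal{F}_n\big]
 = \mathbb{E}\big[\, \mathbb{E}[x \mid \mathcal{F}_{n+1}] \,\big|\, \mathcal{F}_n \big]
 = \mathbb{E}[x \mid \mathcal{F}_n]
 = x_n .
```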
You could have other, more pure-mathematics-type examples. Suppose I want to know the chromatic number of a random graph, and I show you that graph one edge at a time. You can compute the conditional expectation of this graph statistic based on what you've seen up to time n. And that sequence will be a martingale.

An important property of a martingale, which is known as the martingale convergence theorem-- and that's what we'll need for the proof of the existence of the limit next time-- says that every bounded martingale-- so for example, suppose your martingale only takes values between 0 and 1-- every bounded martingale converges almost surely. You cannot have a bounded martingale that you expect to keep going up and down forever.

So I want to show you a proof of this fact. Let me just mention that the boundedness condition is a little bit stronger than what we actually need. From the proof, you'll see that you really only need the sequence to be bounded in L1. That's enough. And more generally, there is a condition called uniform integrability, which I won't explain.

All right. OK. So let me show you a proof of the martingale convergence theorem. And I'm going to be somewhat informal and somewhat cavalier, because I don't want to get into some of the fine details of probability theory. But if you have taken something like 18.675, probability theory, then you can fill in all those details.

So I like this proof, because it's kind of a proof by gambling. I want to tell you a story which should convince you that a martingale cannot keep going up and down. It must converge almost surely.

So suppose x sub n doesn't converge. OK, so this is why I say I'm going to be somewhat cavalier with probability theory. When I say this doesn't converge, I mean that a specific instance of the sequence-- some specific realization-- doesn't converge.
If it doesn't converge, then there exist a and b, both rational numbers between 0 and 1 with a less than b, such that the sequence crosses the interval a, b infinitely many times. So by crossing this interval, what I mean is the following. OK. So there's an important picture which will help a lot in understanding this theorem.

So imagine I have this time axis n, and I have the two levels a and b. So I have this martingale. Its realization curve will look something like that. So that's an instance of this martingale. And by crossing, I mean-- OK, so here's what I mean by crossing. I start below a and-- let me use a different color. So I start below a, and I go above b, and then wait until I come back below a. And I go above b. Wait until I come back. So it goes like that. Like that.

So I start below a until the first time I go above b, and then I stop that segment. So those are the upcrossings of this martingale. An upcrossing is when you start below a, and then you end up above b. So if the sequence doesn't converge, then there exist such a and b with infinitely many such crossings. So this is just a fact. It's not hard to see.

And what we'll show is that this doesn't happen except with probability 0. So we'll show that this occurs with probability 0. And because there are only countably many pairs of rational numbers-- and a countable union of probability-0 events still has probability 0-- we find that x sub n converges with probability 1.

So these are upcrossings. I didn't define them formally, but hopefully you understood from my picture and my description. And let me define u sub n to be the number of such upcrossings up to time n.

Now let me consider a betting strategy. Basically, I want to make money. And I want to make money by following these upcrossings. OK. So every time you give me a number-- so think of this as the stock market.
So it's a fair stock market, where you tell me the price, and I get to decide, do I want to buy, or do I want to sell? So consider the betting strategy where, at any time, we're going to hold either 0 or 1 share of the stock, which has these moving prices. And what we're going to do is, if x sub n is less than a-- less than the lower threshold-- then we're going to buy and hold-- meaning 1 share-- until the first time that the price goes above b, and then sell as soon as we first see the price go above b.

So this is the betting strategy. And it's something which you can implement. If you see a sequence of prices, you can implement this strategy. And you already hopefully see that on each upcrossing, you make money. Each upcrossing, you make money. And this is almost too good to be true. And in fact, we see that the total gain from this strategy-- so if you start with some balance, what you get at the end-- is at least the difference, b minus a, times the number of upcrossings, minus a bounded loss. You might start somewhere, you buy, and then you just lose everything. So there might be a cost from a final incomplete crossing. And that cost is bounded, because we are working with a bounded martingale. So suppose the martingale always stays between 0 and 1.

But on the other hand, there is a theorem about martingales, which is not hard to deduce from the definition, that no matter what the betting strategy is, the gain at any particular time must be 0 in expectation. So this is just a property of the martingale. So 0 equals the expected gain, which is at least b minus a times the expected number of upcrossings, minus 1-- the 1 accounting for the possible loss from that final incomplete crossing. And thus the expected number of upcrossings up to time n is at most 1 over b minus a.

Now, we let n go to infinity. And let u sub infinity be the total number of upcrossings. The u sub n's can never go down--
they're always weakly increasing. So by the monotone convergence theorem, the expectation of u sub n converges to the expectation of u sub infinity, the total number of upcrossings. So now, in particular, you know that the total number of upcrossings has expectation at most 1 over b minus a, which is finite, so it is finite with probability 1. So in particular, the probability that you have infinitely many crossings is 0. So with probability 0, you cross infinitely many times, which proves the claim over there and which concludes the proof of the claim that the martingale converges almost surely.

OK, so that proves the martingale convergence theorem. So next time, we'll combine everything that we did today to prove the three main theorems that we stated last time on graph limits.
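As a small sanity check on the upcrossing inequality used in this proof, here is a simulation sketch-- illustrative only, not from the lecture. It uses the fraction of red balls in a Pólya urn, a standard example of a martingale bounded between 0 and 1, counts the upcrossings of an interval (a, b), and compares the average count against the bound 1 over b minus a:

```python
import random

def polya_urn_path(n_draws):
    """Fraction of red balls in a Polya urn (start with 1 red, 1 blue;
    each draw returns the ball plus one more of the same color).
    This fraction is a classical bounded martingale with values in [0, 1]."""
    red, total, path = 1, 2, []
    for _ in range(n_draws):
        if random.random() < red / total:  # draw a red ball
            red += 1
        total += 1
        path.append(red / total)
    return path

def count_upcrossings(path, a, b):
    """Number of times the path goes from below a to above b."""
    count, below = 0, False
    for x in path:
        if x < a:
            below = True
        elif x > b and below:
            count += 1
            below = False
    return count

random.seed(0)
a, b = 0.45, 0.55
avg = sum(count_upcrossings(polya_urn_path(2000), a, b)
          for _ in range(10_000)) / 10_000
print(avg, "vs bound", 1 / (b - a))  # expected upcrossings is at most 10
```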