1 00:00:01,550 --> 00:00:03,920 The following content is provided under a Creative 2 00:00:03,920 --> 00:00:05,310 Commons license. 3 00:00:05,310 --> 00:00:07,520 Your support will help MIT OpenCourseWare 4 00:00:07,520 --> 00:00:11,610 continue to offer high-quality educational resources for free. 5 00:00:11,610 --> 00:00:14,180 To make a donation or to view additional materials 6 00:00:14,180 --> 00:00:18,140 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:18,140 --> 00:00:19,026 at ocw.mit.edu. 8 00:00:22,830 --> 00:00:25,220 GILBERT STRANG: So I've mentioned 9 00:00:25,220 --> 00:00:27,890 randomized linear algebra a few times, 10 00:00:27,890 --> 00:00:33,720 and I thought, OK, I'm going to jump in and describe 11 00:00:33,720 --> 00:00:36,390 randomized matrix multiplication. 12 00:00:36,390 --> 00:00:40,710 It's a pretty cool idea, it seems to me. 13 00:00:40,710 --> 00:00:45,640 So this is a topic within randomized linear algebra. 14 00:00:45,640 --> 00:00:48,180 And when would we be doing any of this? 15 00:00:48,180 --> 00:00:53,320 It would be for matrices that are just really, really large. 16 00:00:53,320 --> 00:01:00,090 So we plan to sample the columns of A and sample 17 00:01:00,090 --> 00:01:03,210 the corresponding rows of B, so actually 18 00:01:03,210 --> 00:01:07,530 that when we decide on a column, we've also decided on a row. 19 00:01:07,530 --> 00:01:11,460 So we're taking those pieces, which 20 00:01:11,460 --> 00:01:15,210 do correctly add up to AB, but we're not 21 00:01:15,210 --> 00:01:17,040 going to take them all. 22 00:01:17,040 --> 00:01:20,490 We're going to take different ones, 23 00:01:20,490 --> 00:01:23,940 randomly sampled with probabilities-- 24 00:01:23,940 --> 00:01:26,700 we have to decide probabilities-- 25 00:01:26,700 --> 00:01:33,840 and then we'll add up our samples, 26 00:01:33,840 --> 00:01:38,520 and we hope that the result is close to AB. 27 00:01:38,520 --> 00:01:40,570 That's the idea. 
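The "pieces that add up to AB" are the column-times-row products. A minimal sketch, assuming NumPy and small random matrices chosen only for illustration (the shapes and the names `pieces` and `AB_from_pieces` are not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # columns a_j are A[:, j]
B = rng.standard_normal((5, 4))   # rows b_j^T are B[j, :]

# Each piece is (column j of A) times (row j of B), a rank-one matrix.
pieces = [np.outer(A[:, j], B[j, :]) for j in range(A.shape[1])]

# Adding all the pieces reproduces the ordinary product AB.
AB_from_pieces = sum(pieces)
```

Randomized multiplication keeps only a random subset of these pieces, rescaled so that the average still comes out as AB.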
28 00:01:40,570 --> 00:01:48,510 OK, so this lecture then, so I wrote these pages 29 00:01:48,510 --> 00:01:51,900 about six months ago. 30 00:01:51,900 --> 00:01:54,600 So I've been desperately trying to remember 31 00:01:54,600 --> 00:01:58,650 what I wrote, because it's not a subject I have ever spoken 32 00:01:58,650 --> 00:02:01,590 about before, but it's so neat. 33 00:02:01,590 --> 00:02:04,770 So here are some of the things that come into it. 34 00:02:04,770 --> 00:02:08,100 We have to decide on probabilities. 35 00:02:08,100 --> 00:02:11,540 Then we want to compute the mean. 36 00:02:11,540 --> 00:02:15,150 So it's our first day with some of these key ideas 37 00:02:15,150 --> 00:02:18,300 from statistics and probability. 38 00:02:18,300 --> 00:02:21,730 So we're going to take probabilities that add to 1. 39 00:02:21,730 --> 00:02:24,390 We're going to figure out what's the mean value 40 00:02:24,390 --> 00:02:28,230 of our random AB. 41 00:02:28,230 --> 00:02:34,170 We hope, and we will see, that the mean value of the random AB 42 00:02:34,170 --> 00:02:37,720 is the correct AB. 43 00:02:37,720 --> 00:02:41,380 But there will be a variance. 44 00:02:41,380 --> 00:02:43,090 Every sample won't be correct. 45 00:02:43,090 --> 00:02:45,760 In fact, no samples will be correct. 46 00:02:45,760 --> 00:02:49,840 Only when we add them up, on the average, they're correct, 47 00:02:49,840 --> 00:02:53,500 and we get the correct AB. 48 00:02:53,500 --> 00:02:55,620 So the mean will come out right. 49 00:02:55,620 --> 00:02:57,455 It will come out as AB. 50 00:02:57,455 --> 00:02:58,330 You'll see it happen. 51 00:03:02,065 --> 00:03:02,565 Correct. 52 00:03:06,380 --> 00:03:10,160 But there's a big variance, not zero. 53 00:03:13,430 --> 00:03:17,060 We'll be all over the place with our samples. 54 00:03:17,060 --> 00:03:20,390 They'll just average out right, but they'll be all over. 55 00:03:20,390 --> 00:03:24,020 No particular sample will be right at all. 
56 00:03:24,020 --> 00:03:27,860 So then we want to pick the best probabilities. 57 00:03:31,050 --> 00:03:35,990 So our job will be, once we see how the system works, 58 00:03:35,990 --> 00:03:38,930 we're going to assign probabilities. 59 00:03:38,930 --> 00:03:41,890 And we're going to choose the probabilities that 60 00:03:41,890 --> 00:03:44,360 minimize the variance. 61 00:03:44,360 --> 00:03:47,870 So this is a typical situation where 62 00:03:47,870 --> 00:03:50,780 the mean is pretty straightforward 63 00:03:50,780 --> 00:03:54,200 and does what you want, but having 64 00:03:54,200 --> 00:03:58,295 the correct mean does not mean you've got good answers at all. 65 00:03:58,295 --> 00:04:01,400 And the average of, you know, like 66 00:04:01,400 --> 00:04:05,630 minus 100 and 100 might be the correct answer is zero, 67 00:04:05,630 --> 00:04:07,750 but you're way off. 68 00:04:07,750 --> 00:04:11,893 And this measures how far you are. 69 00:04:11,893 --> 00:04:13,560 So I don't know if you know these words. 70 00:04:13,560 --> 00:04:19,070 It's unfortunate, but I guess 18.065 can't change it now, 71 00:04:19,070 --> 00:04:22,310 that the variance is written sigma squared. 72 00:04:22,310 --> 00:04:26,430 And we already have a good use for the Greek letter sigma, 73 00:04:26,430 --> 00:04:32,020 but today it has a different use for variance. 74 00:04:32,020 --> 00:04:36,460 And this-- Lagrange multipliers will come in near the end. 75 00:04:36,460 --> 00:04:39,930 So basically, let me do a practice 76 00:04:39,930 --> 00:04:43,770 example to recall what mean and variance 77 00:04:43,770 --> 00:04:45,360 and what those are about. 78 00:04:45,360 --> 00:04:49,680 So let me take a matrix that's 1 by 2. 79 00:04:49,680 --> 00:04:53,670 So my matrix is just going to be a b. 80 00:04:57,080 --> 00:05:05,240 OK, I'm going to sample that twice, 81 00:05:05,240 --> 00:05:08,160 and my rule for the two samples will be the same. 
82 00:05:08,160 --> 00:05:14,570 They will be identically distributed, totally identical. 83 00:05:14,570 --> 00:05:15,890 And what's my rule going to be? 84 00:05:15,890 --> 00:05:18,020 So this is like practice. 85 00:05:18,020 --> 00:05:20,810 My rule is going to be I take that column or that column 86 00:05:20,810 --> 00:05:22,370 with probabilities I have. 87 00:05:26,320 --> 00:05:27,235 And I do it twice. 88 00:05:30,200 --> 00:05:31,630 And I take the average. 89 00:05:31,630 --> 00:05:35,630 So the probabilities are going to be 90 00:05:35,630 --> 00:05:39,800 1/2, 1/2 for the two columns. 91 00:05:39,800 --> 00:05:43,175 And I'm going to do s equal to 2 samples. 92 00:05:47,840 --> 00:05:50,150 And I'm going to add-- 93 00:05:50,150 --> 00:05:52,530 I'll weight them with-- 94 00:05:52,530 --> 00:06:02,400 and I'll take the average of the two samples. 95 00:06:02,400 --> 00:06:02,900 OK. 96 00:06:05,560 --> 00:06:11,690 And that will be my randomized matrix. 97 00:06:11,690 --> 00:06:14,730 OK, so could we compute the mean for the-- 98 00:06:14,730 --> 00:06:18,900 so I've described a randomized sampling process. 99 00:06:18,900 --> 00:06:21,870 I've given you the probabilities, 1/2 and 1/2, 100 00:06:21,870 --> 00:06:24,210 the number of times I'm going to do it, 101 00:06:24,210 --> 00:06:26,610 and then I divide by that number of times. 102 00:06:26,610 --> 00:06:28,230 So this is really-- 103 00:06:28,230 --> 00:06:32,850 I have a 1 over s here, because I've got s of these. 104 00:06:32,850 --> 00:06:38,060 And now-- so what are the possibilities here? 105 00:06:38,060 --> 00:06:39,310 I want to find the mean. 106 00:06:39,310 --> 00:06:43,530 First of all, let's practice with the mean. 107 00:06:43,530 --> 00:06:46,740 OK, so here are two-- 108 00:06:46,740 --> 00:06:49,770 I could think of two different ways to compute the mean. 109 00:06:49,770 --> 00:06:52,080 Let me start with this one. 
110 00:06:52,080 --> 00:06:54,380 What is the mean, the average value-- 111 00:06:54,380 --> 00:06:56,550 mean means average value-- 112 00:06:56,550 --> 00:06:59,000 of the first sample? 113 00:06:59,000 --> 00:07:02,870 So the average value of the first sample, I would-- 114 00:07:02,870 --> 00:07:09,050 what is the mean, the general formula is you add up all 115 00:07:09,050 --> 00:07:15,960 the sample times its-- 116 00:07:15,960 --> 00:07:18,710 the possible samples times their probabilities. 117 00:07:21,670 --> 00:07:24,270 And in this case, the probabilities 118 00:07:24,270 --> 00:07:30,690 are 1/2 that the sample is a, 0 and 1/2 119 00:07:30,690 --> 00:07:32,430 that the sample is 0, b. 120 00:07:35,920 --> 00:07:38,680 So those are my two samples. 121 00:07:38,680 --> 00:07:45,850 And computing the mean of the total, I get-- 122 00:07:45,850 --> 00:07:50,710 that's the mean for each sample, and then I have to multiply, 123 00:07:50,710 --> 00:07:54,310 so let's put what I got here, 1/2 of a, b. 124 00:07:59,090 --> 00:08:02,140 That was the mean of each sample, 125 00:08:02,140 --> 00:08:05,650 because my probabilities were equal, 1/2 and 1/2. 126 00:08:05,650 --> 00:08:09,040 And now, I've got s equal 2 samples. 127 00:08:13,870 --> 00:08:24,480 So I multiply by 2, and I get a, b as the mean. 128 00:08:27,510 --> 00:08:28,675 Mean is correct. 129 00:08:36,844 --> 00:08:38,340 Good. 130 00:08:38,340 --> 00:08:40,530 We did the easy one, the mean. 131 00:08:40,530 --> 00:08:44,710 Now, practice with the variance, or else quit here. 132 00:08:44,710 --> 00:08:46,450 Maybe I should quit while I'm ahead. 133 00:08:46,450 --> 00:08:53,080 I've got the mean exactly right, but of course, the samples 134 00:08:53,080 --> 00:08:54,010 might not be right. 135 00:08:56,530 --> 00:08:58,690 So now for the variance. 136 00:08:58,690 --> 00:09:00,820 So what is variance? 137 00:09:00,820 --> 00:09:01,780 Do you remember that? 
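The mean computation for the practice matrix can be replayed with exact fractions. This is only a sketch of the board arithmetic: the values a = 3 and b = 7 are arbitrary stand-ins, and the names `mean_one` and `mean_total` are introduced here for illustration.

```python
from fractions import Fraction

a, b = Fraction(3), Fraction(7)          # arbitrary illustrative entries
s = 2                                    # number of samples
p = [Fraction(1, 2), Fraction(1, 2)]     # equal probabilities
outcomes = [(a, 0), (0, b)]              # a sample is either (a, 0) or (0, b)

# Mean of one sample: add up probability times outcome, entry by entry.
mean_one = tuple(sum(p[i] * outcomes[i][k] for i in range(2)) for k in range(2))

# The two samples are identically distributed, so their sum has s times that mean.
mean_total = tuple(s * m for m in mean_one)
```

Here `mean_one` is (a/2, b/2) and `mean_total` is (a, b), matching "Mean is correct."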
138 00:09:01,780 --> 00:09:04,960 There are actually two ways to compute variance. 139 00:09:04,960 --> 00:09:09,070 Let me just remember those over here and push that board up. 140 00:09:11,940 --> 00:09:13,740 So the variance sigma squared. 141 00:09:18,600 --> 00:09:20,280 Forgive me, if you're a statistician, 142 00:09:20,280 --> 00:09:25,170 this is like you were born knowing this. 143 00:09:25,170 --> 00:09:27,670 But the rest of us, we're not. 144 00:09:27,670 --> 00:09:33,850 So the variance is the sum-- 145 00:09:33,850 --> 00:09:38,080 one way to do it is add up the different probabilities 146 00:09:38,080 --> 00:09:45,190 of different things that could happen of output 147 00:09:45,190 --> 00:09:49,890 minus the mean squared. 148 00:09:49,890 --> 00:09:52,630 So it's the average-- 149 00:09:52,630 --> 00:10:02,030 it's the average distance squared from the mean. 150 00:10:04,860 --> 00:10:08,780 So it takes whatever output that came with output number 151 00:10:08,780 --> 00:10:13,430 i, minus the mean, which is the average output. 152 00:10:13,430 --> 00:10:16,510 I square those, and I get a number. 153 00:10:16,510 --> 00:10:19,350 And that sort of tells me how-- 154 00:10:19,350 --> 00:10:30,310 it tells me like in the famous Gaussian, 155 00:10:30,310 --> 00:10:32,410 if I had a Gaussian distribution here, 156 00:10:32,410 --> 00:10:34,840 I have a distribution of 1/2 and 1/2. 157 00:10:34,840 --> 00:10:38,170 So like that maybe even has a name, 158 00:10:38,170 --> 00:10:42,860 like binomial or something or Bernoulli or whatever. 159 00:10:42,860 --> 00:10:45,890 But here on this Gaussian that we all remember, 160 00:10:45,890 --> 00:10:53,180 can I mark what in that figure where the mean is? 161 00:10:53,180 --> 00:10:54,930 Right in the center. 162 00:10:54,930 --> 00:10:56,210 OK. 163 00:10:56,210 --> 00:10:58,600 Mean. 164 00:10:58,600 --> 00:11:00,910 And what is the variance? 
165 00:11:00,910 --> 00:11:04,960 Just to recall what everybody in the first time 166 00:11:04,960 --> 00:11:08,290 may even hear that word variance, 167 00:11:08,290 --> 00:11:11,950 what is the variance kind of measuring? 168 00:11:11,950 --> 00:11:15,040 You're summing squares, so whether you're 169 00:11:15,040 --> 00:11:16,930 on the right of the mean or the left 170 00:11:16,930 --> 00:11:20,080 of the mean, no difference, because you're squaring it. 171 00:11:20,080 --> 00:11:22,620 And it's the distance. 172 00:11:22,620 --> 00:11:28,090 The variance would be sort of like a typical width. 173 00:11:28,090 --> 00:11:29,860 Maybe I overdid it. 174 00:11:29,860 --> 00:11:33,250 But that would be a sort of typical sigma. 175 00:11:35,860 --> 00:11:40,270 I'm really just-- since the words statistics, mean, 176 00:11:40,270 --> 00:11:42,970 and variance haven't been mentioned in 18.065 177 00:11:42,970 --> 00:11:46,000 until today, I'm just kind of recalling. 178 00:11:46,000 --> 00:11:52,840 OK, so now I'm prepared to compute this example. 179 00:11:52,840 --> 00:11:57,640 OK, maybe I'll-- maybe I'll compute it over here. 180 00:11:57,640 --> 00:12:03,710 OK, so shall I compute the variance for each sample, 181 00:12:03,710 --> 00:12:07,090 and then I'll multiply by 2, because I have two samples. 182 00:12:07,090 --> 00:12:09,610 So what are they-- 183 00:12:09,610 --> 00:12:12,390 so this is the sigma squared sample. 184 00:12:16,070 --> 00:12:19,250 Obviously, I could write down all the possibilities. 185 00:12:19,250 --> 00:12:21,530 Yeah, let me just do the sigma. 186 00:12:21,530 --> 00:12:28,820 So the sample could either have picked out a, 0 or 0, b. 187 00:12:28,820 --> 00:12:30,900 And the probabilities were a 1/2. 188 00:12:30,900 --> 00:12:39,080 So I have 1/2 times the probability times the output. 189 00:12:39,080 --> 00:12:47,810 Let's say the output is a, 0 minus the mean, 190 00:12:47,810 --> 00:12:51,460 which was a over 2, b over 2. 
191 00:12:54,330 --> 00:12:56,800 And I want to square that. 192 00:12:56,800 --> 00:13:00,420 So that was one possibility when I picked a, 0, 193 00:13:00,420 --> 00:13:02,730 and the other one, which I'm also 194 00:13:02,730 --> 00:13:05,400 doing with probability 1/2, is in case 195 00:13:05,400 --> 00:13:08,280 I picked 0, b, what was-- 196 00:13:15,730 --> 00:13:19,980 you see, I'm not getting 0 for the variance, 197 00:13:19,980 --> 00:13:22,470 because I'm making an error every time. 198 00:13:22,470 --> 00:13:26,550 I'm never getting the correct a, 0 or the correct 0, 199 00:13:26,550 --> 00:13:31,110 b, because I'm always doing this one in the middle. 200 00:13:31,110 --> 00:13:36,600 Now, if I compute all that, I get a quantity, 201 00:13:36,600 --> 00:13:40,950 and maybe I'll just, to be on the safe side, 202 00:13:40,950 --> 00:13:42,280 ask your forgiveness. 203 00:13:42,280 --> 00:13:44,400 If I write the answer. 204 00:13:44,400 --> 00:13:47,280 And we could even try to get the answer, but-- 205 00:13:53,760 --> 00:13:57,550 so this is from two samples. 206 00:13:57,550 --> 00:13:58,810 So this is double that one. 207 00:14:06,530 --> 00:14:10,090 I guess I'm bold enough to try it. 208 00:14:10,090 --> 00:14:16,030 So a, 0, so that would be minus a over 2 and a b over 2. 209 00:14:16,030 --> 00:14:20,800 I think we got here 1/2 of-- 210 00:14:20,800 --> 00:14:33,380 I think-- looks to me like a over 2 squared plus b over-- 211 00:14:33,380 --> 00:14:39,862 I'm missing my plus or minus, but when I'm squaring them, 212 00:14:39,862 --> 00:14:41,320 that's the whole point of variance. 213 00:14:41,320 --> 00:14:42,590 Doesn't matter. 214 00:14:42,590 --> 00:14:45,340 And the b over 2. 215 00:14:45,340 --> 00:14:51,790 And here, I think I'm wrong by a over 2, 216 00:14:51,790 --> 00:14:57,160 and I'm wrong by b over 2 or minus a over 2. 217 00:14:57,160 --> 00:15:00,080 But when I square them again, doesn't matter. 
218 00:15:00,080 --> 00:15:05,770 So I think I get another 1/2 of a over 2 squared 219 00:15:05,770 --> 00:15:07,180 plus b over 2 squared. 220 00:15:13,700 --> 00:15:17,360 Forgive me for this simple computation, 221 00:15:17,360 --> 00:15:19,670 but just to practice. 222 00:15:19,670 --> 00:15:20,900 So what have I got? 223 00:15:20,900 --> 00:15:22,550 I've got a 1/2 of that and 1/2 of that. 224 00:15:22,550 --> 00:15:26,260 So that adds up to this thing a squared over 4 225 00:15:26,260 --> 00:15:33,790 plus b squared over 4, but then I'm doing two samples. 226 00:15:33,790 --> 00:15:36,230 I have to multiply by the number of samples. 227 00:15:36,230 --> 00:15:39,880 So I think so times 2 for two samples. 228 00:15:39,880 --> 00:15:41,950 I think I'm getting-- 229 00:15:41,950 --> 00:15:49,220 it was 1/4, but now it will be 1/2 of a squared plus b squared. 230 00:15:49,220 --> 00:15:51,360 Yeah, I didn't-- yeah, yeah. 231 00:15:53,870 --> 00:15:56,450 I think that's right, but forgive me 232 00:15:56,450 --> 00:15:58,790 while I just ask myself. 233 00:15:58,790 --> 00:15:59,290 Yeah. 234 00:16:03,990 --> 00:16:07,010 This will be-- actually, you already have these notes. 235 00:16:07,010 --> 00:16:08,870 This is section 2.4. 236 00:16:08,870 --> 00:16:12,550 So I think it's there on Stellar. 237 00:16:12,550 --> 00:16:16,210 So what's the point of this? 238 00:16:16,210 --> 00:16:20,680 First point was to like remember some of the steps that 239 00:16:20,680 --> 00:16:23,660 go into the variance. 240 00:16:23,660 --> 00:16:25,640 Oh, there's another formula for variance 241 00:16:25,640 --> 00:16:27,310 that I want to tell you. 242 00:16:27,310 --> 00:16:31,650 And the second point is to bring in a new idea. 243 00:16:34,290 --> 00:16:36,720 Suppose we want to make this-- 244 00:16:36,720 --> 00:16:39,890 suppose this variance is bigger than we want. 245 00:16:39,890 --> 00:16:44,540 Suppose, for example, that b is a lot bigger than a. 
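The variance arithmetic for the practice matrix, replayed with exact fractions. A sketch under the same assumptions as the board example (probabilities 1/2 and 1/2, s = 2); the values of a and b and the helper `squared_distance` are illustrative only.

```python
from fractions import Fraction

a, b = Fraction(3), Fraction(7)          # arbitrary illustrative entries
s = 2
p = [Fraction(1, 2), Fraction(1, 2)]
outcomes = [(a, 0), (0, b)]              # the two possible samples
mean = (a / 2, b / 2)                    # mean of one sample

def squared_distance(x, m):
    """Sum of squared entrywise differences between outcome x and mean m."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, m))

# Variance of one sample: probability times squared distance from the mean.
var_one = sum(pi * squared_distance(x, mean) for pi, x in zip(p, outcomes))

# Two independent, identically distributed samples: the variances add.
var_total = s * var_one
```

`var_one` comes out as (a² + b²)/4 and `var_total` as (a² + b²)/2, the sum of the squares rather than their product.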
246 00:16:44,540 --> 00:16:46,970 Suppose b is a lot bigger than a. 247 00:16:46,970 --> 00:16:49,130 Then what should we have done differently 248 00:16:49,130 --> 00:16:53,030 in this randomized linear algebra? 249 00:16:53,030 --> 00:17:01,940 If I'm trying to get this thing close, get close to that thing, 250 00:17:01,940 --> 00:17:04,849 and if b is a lot bigger than a, then what should 251 00:17:04,849 --> 00:17:06,460 I do differently? 252 00:17:09,280 --> 00:17:14,099 I don't know what b is exactly, but I have the information 253 00:17:14,099 --> 00:17:16,710 that it's bigger than a. 254 00:17:16,710 --> 00:17:18,810 Then I should increase the probability-- 255 00:17:18,810 --> 00:17:22,310 I shouldn't do half and half. 256 00:17:22,310 --> 00:17:28,440 So here was randomized sampling taking the average. 257 00:17:28,440 --> 00:17:32,070 My probabilities were a 1/2 and a 1/2. 258 00:17:36,560 --> 00:17:41,770 I believe that I could keep the mean correct. 259 00:17:41,770 --> 00:17:46,050 Of course, that's fundamental to get the mean right. 260 00:17:46,050 --> 00:17:49,860 And get a better answer, you get a smaller variance 261 00:17:49,860 --> 00:17:55,680 than that b squared over there by picking that thing 262 00:17:55,680 --> 00:17:57,210 with higher probability. 263 00:17:57,210 --> 00:18:02,270 So that's where the randomized-- 264 00:18:02,270 --> 00:18:07,390 it turns out to be called norm squared probability. 265 00:18:07,390 --> 00:18:16,310 The decision on what the probability should be goes-- 266 00:18:16,310 --> 00:18:19,010 it turns out to be the optimal one, 267 00:18:19,010 --> 00:18:22,470 goes with the square of the size. 268 00:18:22,470 --> 00:18:25,400 So if b is twice as big as a, and I 269 00:18:25,400 --> 00:18:29,540 want to get the variance down, then the probability-- 270 00:18:29,540 --> 00:18:32,630 I should use probabilities that are four times-- 271 00:18:35,750 --> 00:18:38,800 four times as often I will choose b than a. 
272 00:18:38,800 --> 00:18:42,594 That's going to be the conclusion at 2 o'clock, 273 00:18:42,594 --> 00:18:44,810 hopefully. 274 00:18:44,810 --> 00:18:48,090 OK, so that's one point. 275 00:18:48,090 --> 00:18:51,260 And just another little point while we 276 00:18:51,260 --> 00:18:57,080 are reviewing variance, this is the standard formula 277 00:18:57,080 --> 00:19:04,710 for the variance, sum of all the possible outcomes 278 00:19:04,710 --> 00:19:10,070 with their probabilities, the distance from the mean squared. 279 00:19:10,070 --> 00:19:14,090 Do you know a second formula, which is very close to this 280 00:19:14,090 --> 00:19:18,470 and very similar, and it comes from substituting 281 00:19:18,470 --> 00:19:22,790 the meaning of the mean, substituting what the mean is? 282 00:19:22,790 --> 00:19:28,150 So yeah, I just want to mention a second formula. 283 00:19:30,800 --> 00:19:34,270 And I don't know which one we'll actually use. 284 00:19:34,270 --> 00:19:41,110 But the second formula for the same quantity, sigma squared, 285 00:19:41,110 --> 00:19:48,870 is the sum of probabilities times output squared. 286 00:19:55,510 --> 00:19:58,660 So I haven't subtracted off the mean in this second formula. 287 00:19:58,660 --> 00:20:00,350 I have to do it now. 288 00:20:00,350 --> 00:20:02,290 And I'll do the mean-- 289 00:20:02,290 --> 00:20:05,440 I'll do the mean all at once, mean squared. 290 00:20:11,860 --> 00:20:16,150 Of course, the mean involves-- 291 00:20:16,150 --> 00:20:19,890 remember that the mean is the sum of the probability 292 00:20:19,890 --> 00:20:20,650 times the outcome. 
293 00:20:25,710 --> 00:20:29,910 And it's just playing with a little algebra 294 00:20:29,910 --> 00:20:32,760 to show that you can either-- you have a choice of whatever 295 00:20:32,760 --> 00:20:37,650 is more convenient, subtract the mean from each output 296 00:20:37,650 --> 00:20:42,120 or do all the outputs, but then you 297 00:20:42,120 --> 00:20:45,570 haven't accounted for the fact that you really 298 00:20:45,570 --> 00:20:48,390 want the distances from the mean, 299 00:20:48,390 --> 00:20:50,670 and then you subtract off the mean squared. 300 00:20:50,670 --> 00:20:56,150 Two ways to do it, two ways, equal ways to do it. 301 00:20:56,150 --> 00:21:04,810 Yeah, we will review the basic ideas of mean and variance 302 00:21:04,810 --> 00:21:09,750 in the section on probability. 303 00:21:09,750 --> 00:21:12,000 Here, yes, question? 304 00:21:12,000 --> 00:21:12,500 Yeah. 305 00:21:12,500 --> 00:21:15,690 AUDIENCE: Is the mean a part of [INAUDIBLE]? 306 00:21:15,690 --> 00:21:16,880 GILBERT STRANG: The mean? 307 00:21:16,880 --> 00:21:18,540 Oh, in here? 308 00:21:18,540 --> 00:21:19,260 That's separate. 309 00:21:19,260 --> 00:21:20,880 Yeah, that's the whole point. 310 00:21:20,880 --> 00:21:24,960 Yeah, so this was like, do this, and then subtract off 311 00:21:24,960 --> 00:21:26,490 the mean squared. 312 00:21:26,490 --> 00:21:32,620 Or keep the mean in every term, and do it that way. 313 00:21:32,620 --> 00:21:38,910 Yeah, you could verify that the two are the same. 314 00:21:38,910 --> 00:21:44,760 OK, so when we go now to the bigger question, 315 00:21:44,760 --> 00:21:48,110 I've forgotten which way I do it, but I'm free to choose. 
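The "little algebra" showing that the two variance formulas agree can be written out in a few lines. The symbols are assumptions for this sketch: outputs $x_i$ with probabilities $p_i$ adding to 1, and mean $m = \sum_i p_i x_i$.

```latex
\begin{aligned}
\sigma^2 &= \sum_i p_i \,(x_i - m)^2 \\
         &= \sum_i p_i x_i^2 \;-\; 2m \sum_i p_i x_i \;+\; m^2 \sum_i p_i \\
         &= \sum_i p_i x_i^2 \;-\; 2m \cdot m \;+\; m^2 \cdot 1 \\
         &= \sum_i p_i x_i^2 \;-\; m^2 .
\end{aligned}
```

The first line keeps the mean in every term; the last line does all the outputs first and subtracts the mean squared once at the end.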
316 00:21:48,110 --> 00:21:52,680 OK, is that like small sample reasonable, 317 00:21:52,680 --> 00:21:58,630 and you get the idea that if the-- 318 00:21:58,630 --> 00:22:04,390 if we know that if we look at our matrix, first of all, 319 00:22:04,390 --> 00:22:09,820 and find out which columns are large, large norm, and which 320 00:22:09,820 --> 00:22:14,620 columns are smaller, then that might be useful information 321 00:22:14,620 --> 00:22:19,240 to weight our probabilities to pick the larger one more often. 322 00:22:19,240 --> 00:22:21,030 OK. 323 00:22:21,030 --> 00:22:22,350 OK. 324 00:22:22,350 --> 00:22:26,010 In fact, let me just tell you what are the two possibilities 325 00:22:26,010 --> 00:22:27,660 there. 326 00:22:27,660 --> 00:22:32,490 One is what I just said, weight your probabilities 327 00:22:32,490 --> 00:22:36,330 by the square of the norm, this norm squared weighting 328 00:22:36,330 --> 00:22:42,730 that we'll see and take the columns as they come, 329 00:22:42,730 --> 00:22:45,540 but with higher probability on the big columns, 330 00:22:45,540 --> 00:22:53,140 or you could say another way would be mix the columns, 331 00:22:53,140 --> 00:22:57,060 so that they more or less have similar sizes, 332 00:22:57,060 --> 00:23:03,220 and then, keep track of what you've done, 333 00:23:03,220 --> 00:23:07,330 and then just the probabilities can all be equal. 334 00:23:07,330 --> 00:23:08,980 So that would be the other way. 335 00:23:08,980 --> 00:23:11,710 Take your matrix, mix it up, take 336 00:23:11,710 --> 00:23:15,930 combinations of the columns with random numbers. 337 00:23:15,930 --> 00:23:17,290 It's a random world here. 338 00:23:20,500 --> 00:23:25,565 Do a mixing, and then operate on the mixed matrix. 339 00:23:28,210 --> 00:23:31,420 OK, I'm going to do it the first way. 340 00:23:31,420 --> 00:23:35,560 I'm going to pick these probabilities to-- 341 00:23:35,560 --> 00:23:39,220 they'll turn out to be proportional to norm squared. 
342 00:23:39,220 --> 00:23:41,080 OK, ready for that? 343 00:23:41,080 --> 00:23:41,680 Here it comes. 344 00:23:45,990 --> 00:23:52,180 So let me bring that down. 345 00:23:54,690 --> 00:23:55,990 Yeah. 346 00:23:55,990 --> 00:23:57,150 OK. 347 00:23:57,150 --> 00:23:59,150 Actually, I could leave it up for now, 348 00:23:59,150 --> 00:24:03,990 because it told us what we're up to. 349 00:24:03,990 --> 00:24:04,490 OK. 350 00:24:07,550 --> 00:24:09,230 So what have I got? 351 00:24:09,230 --> 00:24:10,580 Let me just see if I can-- 352 00:24:13,290 --> 00:24:16,070 so we're multiplying A times B, and we're 353 00:24:16,070 --> 00:24:17,880 going to use these probabilities. 354 00:24:17,880 --> 00:24:26,540 Pj is going to be the length of that column times the length 355 00:24:26,540 --> 00:24:32,100 of that row, norm squared. 356 00:24:32,100 --> 00:24:36,750 Well, norm squared, if I was multiplying A by A transpose, 357 00:24:36,750 --> 00:24:38,400 then I really would be squaring. 358 00:24:38,400 --> 00:24:41,340 That would be the same as that. 359 00:24:41,340 --> 00:24:46,680 So I'm going to use the word norm squared or length squared. 360 00:24:46,680 --> 00:24:51,050 Also, here, where the two-- 361 00:24:51,050 --> 00:24:54,620 I'm not assuming that B is A transpose. 362 00:24:54,620 --> 00:24:57,510 OK, so the probabilities 363 00:24:57,510 --> 00:24:59,310 will be proportional to that. 364 00:24:59,310 --> 00:25:03,730 But now, those don't add up to 1, so how 365 00:25:03,730 --> 00:25:06,640 do I make the probabilities add up to 1? 366 00:25:06,640 --> 00:25:16,810 This is the probability of choosing column 367 00:25:16,810 --> 00:25:26,230 j of A times row j of B. 368 00:25:26,230 --> 00:25:28,870 That's what Pj refers to. 369 00:25:32,720 --> 00:25:37,070 OK, so what is my plan? 
370 00:25:37,070 --> 00:25:39,620 Oh, I have to make the probabilities add to 1, 371 00:25:39,620 --> 00:25:46,610 or I'm really breaking the fundamental law here. 372 00:25:46,610 --> 00:25:49,490 So if I have a bunch of probabilities, 373 00:25:49,490 --> 00:25:52,180 and I kind of know what I want, but they don't add up to 1, 374 00:25:52,180 --> 00:25:55,010 what do I do? 375 00:25:55,010 --> 00:25:56,930 Divide by their sum. 376 00:25:56,930 --> 00:25:58,790 Let me call c their sum. 377 00:25:58,790 --> 00:26:02,570 So the probability is going to be that over c, 378 00:26:02,570 --> 00:26:09,470 and c is going to be the sum of however many rows and columns. 379 00:26:09,470 --> 00:26:18,790 I guess maybe I had r in my picture of aj bj transpose. 380 00:26:18,790 --> 00:26:24,270 OK, so all I did was scale the probability so 381 00:26:24,270 --> 00:26:26,640 that they now add to 1. 382 00:26:26,640 --> 00:26:28,100 Good. 383 00:26:28,100 --> 00:26:31,820 OK, so now I'm ready to go to work. 384 00:26:31,820 --> 00:26:34,460 I'm ready to choose-- 385 00:26:34,460 --> 00:26:37,620 oh, yes, so here's my rule. 386 00:26:37,620 --> 00:26:45,860 I will choose column row j with this probability, 387 00:26:45,860 --> 00:26:47,930 but then I'm going to multiply it, 388 00:26:47,930 --> 00:26:50,990 and I'm free to do that if I want to. 389 00:26:50,990 --> 00:26:58,340 So my approximation, my approximate AB will be-- 390 00:26:58,340 --> 00:27:04,540 I'll take this, whichever comes out, 391 00:27:04,540 --> 00:27:09,970 I'll take the aj bj transpose that comes out. 392 00:27:09,970 --> 00:27:13,180 It comes out with probability Pj. 393 00:27:13,180 --> 00:27:16,000 But I'm going to divide this by-- 394 00:27:16,000 --> 00:27:18,384 and I think I'm, this is the right one-- 395 00:27:18,384 --> 00:27:22,390 s, the number of samples, times Pj. 396 00:27:22,390 --> 00:27:27,670 So I thought, at first, that's weird. 
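A minimal sketch of the probabilities just described: column length times row length, divided by their sum C. The function name `sampling_probabilities` and the small matrices are hypothetical, chosen only to illustrate, and the sketch assumes C is not zero.

```python
import numpy as np

def sampling_probabilities(A, B):
    """Return p_j = ||a_j|| * ||b_j|| / C for each column-row pair j."""
    col_norms = np.linalg.norm(A, axis=0)   # length of each column a_j of A
    row_norms = np.linalg.norm(B, axis=1)   # length of each row b_j^T of B
    weights = col_norms * row_norms
    C = weights.sum()                       # the normalizing sum (assumed nonzero)
    return weights / C

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])
B = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
p = sampling_probabilities(A, B)
```

When B is A transpose, the weight for pair j becomes the column length squared, which is where the name "norm squared" comes from.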
397 00:27:27,670 --> 00:27:33,480 Went to all the trouble to pick these Pj's, claiming 398 00:27:33,480 --> 00:27:34,950 that these are the good ones. 399 00:27:34,950 --> 00:27:39,060 So my claim to eventually prove at the end-- 400 00:27:39,060 --> 00:27:42,420 first, I'll have to understand how the sampling is done. 401 00:27:42,420 --> 00:27:44,340 That's like the most important. 402 00:27:44,340 --> 00:27:47,100 But then when I go to compute the mean, 403 00:27:47,100 --> 00:27:49,930 I'll get the correct mean, and when 404 00:27:49,930 --> 00:27:52,240 I go to compute the variance, I'll 405 00:27:52,240 --> 00:27:55,210 get some expression for the variance, 406 00:27:55,210 --> 00:28:02,640 and then the plan will be choose these Pj's to minimize 407 00:28:02,640 --> 00:28:06,030 that total variance. 408 00:28:06,030 --> 00:28:07,860 So this is what-- 409 00:28:07,860 --> 00:28:09,870 that's a typical sample. 410 00:28:09,870 --> 00:28:16,770 With probability Pj, pick that matrix, that 411 00:28:16,770 --> 00:28:19,600 rank 1 matrix. 412 00:28:19,600 --> 00:28:24,220 So then my approximate AB is the sum of all these 413 00:28:24,220 --> 00:28:26,050 over s samples. 414 00:28:30,800 --> 00:28:32,150 Are you with me? 415 00:28:32,150 --> 00:28:34,660 Let me just repeat. 416 00:28:34,660 --> 00:28:37,970 I'm trying to multiply AB. 417 00:28:37,970 --> 00:28:41,150 Each sample is just a single column times row. 418 00:28:41,150 --> 00:28:43,850 So it's way wrong, way wrong. 419 00:28:43,850 --> 00:28:46,960 It's just a tiny piece of AB. 420 00:28:46,960 --> 00:28:50,890 But I take that sample with probability Pj, 421 00:28:50,890 --> 00:28:55,180 and I divide it by s Pj, so that the Pj's cancel here. 422 00:29:00,740 --> 00:29:02,320 Oh, yes. 423 00:29:02,320 --> 00:29:04,640 OK, right. 424 00:29:04,640 --> 00:29:10,420 So I would like to see that the mean is correct. 425 00:29:10,420 --> 00:29:14,480 I would like to see that the mean is correct. 
426 00:29:14,480 --> 00:29:17,640 I'm going to compute the mean of my process. 427 00:29:17,640 --> 00:29:20,640 So like it's falling into my lap here. 428 00:29:20,640 --> 00:29:22,260 I made it that way. 429 00:29:22,260 --> 00:29:24,540 These Pj's cancel. 430 00:29:24,540 --> 00:29:26,460 I divided by s. 431 00:29:26,460 --> 00:29:30,690 So the mean of a typical sample will be-- 432 00:29:30,690 --> 00:29:41,910 so the mean of one sample is the probability 433 00:29:41,910 --> 00:29:43,800 of getting it times what I take. 434 00:29:43,800 --> 00:29:51,080 So it's just the sum of aj bj transpose over s. 435 00:29:51,080 --> 00:29:53,580 You're going to say, OK, you're wasting our time. 436 00:29:53,580 --> 00:29:55,440 But we got-- 437 00:29:55,440 --> 00:29:57,000 I would just want to show that I'm 438 00:29:57,000 --> 00:30:02,010 getting the correct mean out of this plan. 439 00:30:02,010 --> 00:30:05,370 So do you see that if that's a mean of one sample, 440 00:30:05,370 --> 00:30:08,730 so what's the mean of the sum of all the samples? 441 00:30:11,310 --> 00:30:14,910 Well, multiply by s, because it was the same mean. 442 00:30:14,910 --> 00:30:17,490 Every sample had the same mean, just 443 00:30:17,490 --> 00:30:21,960 as it did in our Little League practice example. 444 00:30:21,960 --> 00:30:24,090 So that's the mean of one sample. 445 00:30:24,090 --> 00:30:33,970 So the mean of all samples added together, multiplies this by s. 446 00:30:33,970 --> 00:30:36,630 The s's cancel, and I get AB. 447 00:30:43,590 --> 00:30:49,010 Remembering my-- however way I defined AB there, yeah. 448 00:30:49,010 --> 00:30:49,510 Yeah. 449 00:30:53,880 --> 00:30:58,990 All I'm saying here is that I did something reasonable 450 00:30:58,990 --> 00:31:05,440 in the sampling process, so that the mean came out right. 451 00:31:05,440 --> 00:31:08,575 And now is the hard part, the variance. 452 00:31:11,600 --> 00:31:13,130 What's the variance? 
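The sampling rule and the mean calculation can be sketched together. Everything named here (`randomized_matmul`, the shapes, s = 3, the seed) is an assumption made for illustration; only the rule "pick pair j with probability p_j and use a_j b_j^T / (s p_j)" comes from the lecture.

```python
import numpy as np

def randomized_matmul(A, B, s, p, rng):
    """Add up s sampled pieces a_j b_j^T / (s * p_j), with j drawn with probability p_j."""
    r = A.shape[1]
    picks = rng.choice(r, size=s, p=p)        # s independent draws of an index j
    approx = np.zeros((A.shape[0], B.shape[1]))
    for j in picks:
        approx += np.outer(A[:, j], B[j, :]) / (s * p[j])
    return approx

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 3))
weights = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
p = weights / weights.sum()
s = 3

# Exact mean of one sample: probability times the matrix that gets used.
# The p_j in front cancels the p_j in the denominator, leaving AB / s.
mean_one = sum(p[j] * np.outer(A[:, j], B[j, :]) / (s * p[j]) for j in range(6))
mean_total = s * mean_one                     # s samples with the same mean

estimate = randomized_matmul(A, B, s, p, rng) # one random draw, generally far from AB
```

The cancellation in `mean_one` does not depend on how the p_j were chosen, as long as they are positive and add to 1; the choice of p_j only matters for the variance.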
453 00:31:13,130 --> 00:31:15,650 OK, so what do I have to compute-- 454 00:31:15,650 --> 00:31:38,560 and I may-- it will depend on the p's, p1 to pr, I guess. 455 00:31:38,560 --> 00:31:42,970 We had r different rows, different column row 456 00:31:42,970 --> 00:31:49,390 pairs to choose, and we chose probabilities, these guys, 457 00:31:49,390 --> 00:31:52,030 that depended on this size. 458 00:31:52,030 --> 00:31:56,750 And now I'm going to compute the variance, 459 00:31:56,750 --> 00:32:03,900 and it won't be 0, because every sample is wrong. 460 00:32:03,900 --> 00:32:07,080 I'm never getting from a sample. 461 00:32:07,080 --> 00:32:11,130 A sample is just giving me a column times a row, a rank 1 462 00:32:11,130 --> 00:32:17,410 guy, and they averaged out to give the correct product. 463 00:32:17,410 --> 00:32:22,660 But each one is certainly wrong, because it's just a rank 1. 464 00:32:22,660 --> 00:32:26,940 So when I compute variance, I'm going to definitely not get 0, 465 00:32:26,940 --> 00:32:28,380 right? 466 00:32:28,380 --> 00:32:31,910 In other words, of course, when would the variance be 0? 467 00:32:34,530 --> 00:32:37,580 Yeah, if AB were rank 1, I guess I'd get it right every time. 468 00:32:37,580 --> 00:32:38,250 Thanks. 469 00:32:38,250 --> 00:32:40,560 That was a better answer than I had in mind. 470 00:32:40,560 --> 00:32:42,120 Yeah, yeah. 471 00:32:42,120 --> 00:32:46,780 The variance would only be 0 if every sample was right. 472 00:32:46,780 --> 00:32:49,930 And that would be true if the rank was 1, 473 00:32:49,930 --> 00:32:51,760 and there was only one thing to choose. 474 00:32:51,760 --> 00:32:55,780 But that's not the problem we want. 475 00:32:55,780 --> 00:32:57,600 OK, so the variance is there. 476 00:33:01,660 --> 00:33:07,650 My instinct is to tell you what this calculation produces, 477 00:33:07,650 --> 00:33:09,240 since you and I can read. 478 00:33:14,060 --> 00:33:17,690 Would you allow me to do that? 
479 00:33:17,690 --> 00:33:26,210 So here, the variance for a sample turned out 480 00:33:26,210 --> 00:33:29,670 to equal, so we will figure it out, 481 00:33:29,670 --> 00:33:33,290 turns out to equal the sum over-- 482 00:33:33,290 --> 00:33:42,980 as it was up there of the aj bj transpose, probably squared. 483 00:33:42,980 --> 00:33:46,450 Let me just check. 484 00:33:46,450 --> 00:33:48,470 Yes, squared. 485 00:33:48,470 --> 00:33:51,990 Yeah, why don't I help myself here? 486 00:33:51,990 --> 00:33:57,140 So these are squared because variances are squared. 487 00:33:57,140 --> 00:34:01,310 And then when I look to see what-- 488 00:34:01,310 --> 00:34:05,940 I think there is an s there, and there's a Pj, 489 00:34:05,940 --> 00:34:10,380 so why is there a Pj there, when it canceled here? 490 00:34:10,380 --> 00:34:15,000 So here, the Pj, when I multiply by that, canceled. 491 00:34:15,000 --> 00:34:19,030 Why doesn't it cancel over there? 492 00:34:19,030 --> 00:34:21,460 Because it's squared over there. 493 00:34:21,460 --> 00:34:23,500 Over there, this thing is squared. 494 00:34:23,500 --> 00:34:25,389 So it was Pj twice. 495 00:34:25,389 --> 00:34:28,170 Here, I have Pj, its probability once. 496 00:34:28,170 --> 00:34:33,250 So I've still got the Pj in the denominator, one factor of Pj 497 00:34:33,250 --> 00:34:35,350 in the denominator. 498 00:34:35,350 --> 00:34:37,600 And then-- so that is-- 499 00:34:37,600 --> 00:34:41,230 I guess what I'm doing is I'm computing 500 00:34:41,230 --> 00:34:44,610 the variance this way. 501 00:34:44,610 --> 00:34:48,830 So what I've computed now is this first bit, 502 00:34:48,830 --> 00:34:54,350 and then I said should subtract the mean squared. 503 00:34:54,350 --> 00:34:57,410 And this is for one sample. 504 00:34:57,410 --> 00:35:01,275 So the mean squared is-- 505 00:35:04,260 --> 00:35:10,440 I think it turns out to be 1 over s times AB squared 506 00:35:10,440 --> 00:35:12,830 in this Frobenius norm. 
507 00:35:12,830 --> 00:35:17,700 It's a squared plus b squared stuff that I saw before. 508 00:35:21,080 --> 00:35:25,460 OK, so this-- 509 00:35:25,460 --> 00:35:33,142 I've jumped a serious step to get from the sum-- 510 00:35:33,142 --> 00:35:35,110 the formula for the variance. 511 00:35:35,110 --> 00:35:40,730 I've plugged in this problem and got that. 512 00:35:40,730 --> 00:35:43,350 OK, and now I'm going to sample. 513 00:35:46,670 --> 00:35:51,760 Let's see where-- yeah. 514 00:35:51,760 --> 00:35:53,410 I would like to simplify this. 515 00:35:56,570 --> 00:35:58,010 I would like to simplify that. 516 00:36:01,710 --> 00:36:05,140 So I have to plug in the Pj's. 517 00:36:05,140 --> 00:36:08,380 OK, so I plug in for that Pj, 518 00:36:08,380 --> 00:36:13,930 and we decided what Pj was going to be here. 519 00:36:18,270 --> 00:36:23,380 OK, so when I plug that in in the denominator, 520 00:36:23,380 --> 00:36:27,670 it will cancel one of these. 521 00:36:27,670 --> 00:36:33,870 And I'll just have a sum of the aj bj norms. 522 00:36:33,870 --> 00:36:38,730 And that is C. 523 00:36:38,730 --> 00:36:42,920 So let me just say this again. 524 00:36:42,920 --> 00:36:46,620 It's something you can just check when you have a minute. 525 00:36:46,620 --> 00:36:53,600 When I plug in that value for Pj here, it cancels the squares 526 00:36:53,600 --> 00:36:55,800 and just leaves the first power. 527 00:36:55,800 --> 00:37:03,530 So then I'm adding up the first power, and I get C. 528 00:37:03,530 --> 00:37:08,800 But the Pj had a factor C in the denominator, 529 00:37:08,800 --> 00:37:12,110 and it's in the denominator over there, so that C is up there. 530 00:37:12,110 --> 00:37:19,190 So it's C squared coming here, a constant squared, 531 00:37:19,190 --> 00:37:20,640 minus the other term. 532 00:37:20,640 --> 00:37:22,580 There's a 1 over s. 533 00:37:22,580 --> 00:37:24,200 That will eventually go away.
534 00:37:24,200 --> 00:37:28,070 And this other term is 1 over s norm 535 00:37:28,070 --> 00:37:32,360 AB the Frobenius norm squared. 536 00:37:34,940 --> 00:37:37,550 Or maybe 1 over s's are-- 537 00:37:42,370 --> 00:37:50,800 so you're seeing-- and I apologize, a little bit messy 538 00:37:50,800 --> 00:37:52,720 bit of algebra. 539 00:37:52,720 --> 00:37:54,370 A little bit messy bit of algebra. 540 00:37:54,370 --> 00:37:57,910 But that's what we ended up with. 541 00:37:57,910 --> 00:38:00,880 And when we take s samples and combine them, 542 00:38:00,880 --> 00:38:05,530 that will cancel the s, and I think 543 00:38:05,530 --> 00:38:09,350 it'll knock that out when we combine the s samples. 544 00:38:09,350 --> 00:38:09,850 OK. 545 00:38:19,340 --> 00:38:21,660 OK. 546 00:38:21,660 --> 00:38:23,700 Now what? 547 00:38:28,480 --> 00:38:34,080 Now, we get to choose those probabilities. 548 00:38:34,080 --> 00:38:35,580 And how are we going to choose them? 549 00:38:38,720 --> 00:38:40,130 What will be the best choice? 550 00:38:40,130 --> 00:38:42,350 Here is the expression for the variance. 551 00:38:42,350 --> 00:38:43,500 Yeah, this is good. 552 00:38:43,500 --> 00:38:45,380 This is good. 553 00:38:45,380 --> 00:38:51,530 Stay with me for now, and you will be saying to yourself, 554 00:38:51,530 --> 00:38:55,160 there's some steps there that I didn't see fully, 555 00:38:55,160 --> 00:38:56,970 and I want to check. 556 00:38:56,970 --> 00:38:58,490 And I agree. 557 00:38:58,490 --> 00:39:03,870 But let me say that we get to that point, 558 00:39:03,870 --> 00:39:07,240 and this is a fixed number. 559 00:39:07,240 --> 00:39:11,340 So it's C that we would like to make small, 560 00:39:11,340 --> 00:39:14,340 and that's our final job. 561 00:39:14,340 --> 00:39:20,550 This was true for any choice of the probabilities P. Well, oh, 562 00:39:20,550 --> 00:39:21,790 yeah, sorry. 563 00:39:21,790 --> 00:39:24,610 Yeah, yeah.
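[Editor's sketch, not part of the lecture.] Since the variance algebra above goes by quickly, here is a small numerical check in plain Python. It assumes the total variance of the sum of s independent samples is (1/s) times (sum over j of ||a_j||^2 ||b_j||^2 / Pj, minus the Frobenius norm squared of AB), which is how the terms in the discussion combine under the usual mean-of-square-minus-square-of-mean computation. The placement of the 1/s factors is an assumption, since the lecture itself is unsure about them. The function names and example matrices are hypothetical.

```python
import math


def column(A, j):
    """Column j of A (A is a list of rows)."""
    return [row[j] for row in A]


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def matmul(A, B):
    """Exact product AB."""
    inner = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(len(B[0]))] for i in range(len(A))]


def frobenius_sq(M):
    """Frobenius norm squared: the sum of the squares of all entries."""
    return sum(x * x for row in M for x in row)


def pair_norms(A, B):
    """q_j = ||a_j|| ||b_j|| for each column-row pair."""
    return [norm(column(A, j)) * norm(B[j]) for j in range(len(B))]


def variance_formula(A, B, p, s):
    """Assumed closed form: (1/s) * (sum_j q_j^2 / p_j - ||AB||_F^2)."""
    q = pair_norms(A, B)
    first = sum(qj * qj / pj for qj, pj in zip(q, p))
    return (first - frobenius_sq(matmul(A, B))) / s


def variance_by_enumeration(A, B, p):
    """E ||X - AB||_F^2 for a single sample X = a_j b_j^T / p_j (the case s = 1)."""
    AB = matmul(A, B)
    total = 0.0
    for j, pj in enumerate(p):
        err = sum((A[i][j] * B[j][k] / pj - AB[i][k]) ** 2
                  for i in range(len(A)) for k in range(len(B[0])))
        total += pj * err
    return total


A = [[1.0, 2.0, 0.5], [3.0, -1.0, 2.0]]
B = [[2.0, 1.0], [0.0, 1.0], [-1.0, 4.0]]
q = pair_norms(A, B)
C = sum(q)
p_norm = [qj / C for qj in q]          # norm-based probabilities
p_unif = [1.0 / len(q)] * len(q)       # uniform probabilities, for comparison

print("formula, norm-based p :", variance_formula(A, B, p_norm, s=1))
print("enumeration, same p   :", variance_by_enumeration(A, B, p_norm))
print("C^2 - ||AB||_F^2      :", C * C - frobenius_sq(matmul(A, B)))
print("formula, uniform p    :", variance_formula(A, B, p_unif, s=1))
```

With the norm-based probabilities the first three printed numbers agree up to rounding, which matches the "C squared minus the other term" expression, and the uniform choice gives a variance that is at least as large.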
564 00:39:24,610 --> 00:39:27,490 So I want to-- 565 00:39:27,490 --> 00:39:30,392 this still had in it a probability. 566 00:39:34,410 --> 00:39:34,910 Yeah. 567 00:39:34,910 --> 00:39:36,420 What do I want to do? 568 00:39:36,420 --> 00:39:40,890 I want to show that that was the best choice, that this 569 00:39:40,890 --> 00:39:42,640 was the best choice. 570 00:39:42,640 --> 00:39:44,220 Yeah, yeah. 571 00:39:44,220 --> 00:39:45,840 I want to show that that's the best 572 00:39:45,840 --> 00:39:54,850 choice, that the choice of weights of probabilities, based 573 00:39:54,850 --> 00:39:59,440 on length of a times the length of b-- of course, it sounds 574 00:39:59,440 --> 00:40:00,880 reasonable, doesn't it? 575 00:40:00,880 --> 00:40:05,290 We want to-- for big columns and big rows, 576 00:40:05,290 --> 00:40:08,590 we want to have a higher probability to choose those. 577 00:40:08,590 --> 00:40:11,020 But is the probability proportional 578 00:40:11,020 --> 00:40:14,380 to the length of both, or should it 579 00:40:14,380 --> 00:40:17,800 be proportional to the 10th power or the square root 580 00:40:17,800 --> 00:40:18,760 or what? 581 00:40:18,760 --> 00:40:25,810 That's what our final step of optimizing the P. 582 00:40:25,810 --> 00:40:27,190 So this is the final step. 583 00:40:30,290 --> 00:40:40,900 Optimize the probabilities, P1 to P2, 584 00:40:40,900 --> 00:40:47,450 I guess, no, P1 to Pr, for the r rows, r columns of a and r rows 585 00:40:47,450 --> 00:40:50,630 of b, subject to-- 586 00:40:50,630 --> 00:40:53,030 they have to add up to 1. 587 00:40:53,030 --> 00:40:55,070 And what do I mean by optimize? 588 00:40:55,070 --> 00:40:56,200 I mean minimize. 589 00:40:59,050 --> 00:41:04,350 This optimize means minimizing this expression, 590 00:41:04,350 --> 00:41:11,603 C. So aj bj transpose. 591 00:41:18,710 --> 00:41:23,420 Where is-- over Pj. 592 00:41:23,420 --> 00:41:26,540 Oh yeah, wait a minute. 593 00:41:26,540 --> 00:41:28,848 Help. 
594 00:41:28,848 --> 00:41:30,780 Help. 595 00:41:30,780 --> 00:41:32,190 So let me just see. 596 00:41:35,494 --> 00:41:39,720 Yeah, my variance has got a Pj in it. 597 00:41:39,720 --> 00:41:41,170 Yeah, my variance-- sorry-- 598 00:41:41,170 --> 00:41:43,990 my variance-- oh, OK. 599 00:41:43,990 --> 00:41:44,995 This is my variance. 600 00:41:48,300 --> 00:41:54,300 This is the result if I make the right choice for the-- 601 00:41:54,300 --> 00:41:57,390 if I make this choice for the probabilities. 602 00:41:57,390 --> 00:42:00,820 But I'm backing up a minute. 603 00:42:00,820 --> 00:42:11,050 This is if-- this is with the optimal Pj's, then 604 00:42:11,050 --> 00:42:12,490 we got that answer. 605 00:42:12,490 --> 00:42:13,150 Great. 606 00:42:13,150 --> 00:42:14,560 That was our answer. 607 00:42:14,560 --> 00:42:18,790 But I'm backing up to this and saying, 608 00:42:18,790 --> 00:42:24,420 what are the optimal Pj's to make this variance small? 609 00:42:24,420 --> 00:42:29,900 So really, I'm just doing this. 610 00:42:29,900 --> 00:42:33,180 Let me write the problem simpler. 611 00:42:33,180 --> 00:42:44,370 Minimize with the sum of the P's equal 1, 612 00:42:44,370 --> 00:42:51,240 some quantity Qj squared over Pj. 613 00:42:51,240 --> 00:42:52,030 Yeah, that's it. 614 00:42:56,860 --> 00:43:01,240 How do you-- so these Qj's that I just introduced that letter 615 00:43:01,240 --> 00:43:03,310 for are the aj bj's. 616 00:43:06,000 --> 00:43:07,000 They're given. 617 00:43:07,000 --> 00:43:10,390 Maybe I'll just put back aj bj. 618 00:43:10,390 --> 00:43:16,330 So to repeat, this is the calculation of the variance 619 00:43:16,330 --> 00:43:18,940 for any choice of Pj's.
620 00:43:18,940 --> 00:43:22,480 This is what I get if I make the best choice, 621 00:43:22,480 --> 00:43:24,970 but over here, I'm going to show that it is the best 622 00:43:24,970 --> 00:43:29,230 choice, that it's the choice that makes this result as 623 00:43:29,230 --> 00:43:30,850 small as possible. 624 00:43:30,850 --> 00:43:34,930 So that's the Lagrange multiplier aspect. 625 00:43:34,930 --> 00:43:39,370 So the statistics has been done. 626 00:43:39,370 --> 00:43:41,630 I'm getting this answer. 627 00:43:41,630 --> 00:43:46,330 And instead of putting in some weird Q, 628 00:43:46,330 --> 00:43:49,546 let me put in what these are. 629 00:43:49,546 --> 00:43:50,370 They're whatever. 630 00:43:50,370 --> 00:43:51,495 They're a bunch of numbers. 631 00:43:54,490 --> 00:44:02,630 But I'm dividing by the Pj, and how do you find the best Pj? 632 00:44:02,630 --> 00:44:09,580 Do you know about that optimization question? 633 00:44:09,580 --> 00:44:12,200 They have to add to 1. 634 00:44:12,200 --> 00:44:15,310 And the Lagrange had the great idea. 635 00:44:15,310 --> 00:44:19,650 So this is maybe the first time we've used his idea. 636 00:44:19,650 --> 00:44:23,390 So do you remember what his idea is? 637 00:44:23,390 --> 00:44:28,970 He takes this constraint, and he builds it into the function. 638 00:44:28,970 --> 00:44:34,460 He multiplies it by some unknown mysterious number, often called 639 00:44:34,460 --> 00:44:37,280 lambda, but nothing to do with eigenvalues, 640 00:44:37,280 --> 00:44:42,050 of the constraints that the Pi's should add to 1. 641 00:44:42,050 --> 00:44:44,660 So he had 0. 642 00:44:44,660 --> 00:44:49,680 He had 0, but with a variable lambda. 643 00:44:49,680 --> 00:44:51,830 This is Lagrange's idea. 644 00:44:51,830 --> 00:44:55,160 So it's pretty neat that this problem-- 645 00:44:55,160 --> 00:44:56,960 I've left randomized sampling. 
646 00:44:56,960 --> 00:45:00,380 I've arrived at this final sub problem, 647 00:45:00,380 --> 00:45:03,530 optimizing the probabilities under the condition 648 00:45:03,530 --> 00:45:06,320 that they add to 1, and Lagrange's idea 649 00:45:06,320 --> 00:45:11,640 was build that equation into the function. 650 00:45:11,640 --> 00:45:15,200 Then you can take derivatives, but you also 651 00:45:15,200 --> 00:45:17,570 take derivatives with respect to lambda, 652 00:45:17,570 --> 00:45:20,570 because that's now an unknown. 653 00:45:20,570 --> 00:45:23,510 And you solve-- you set the derivatives to 0, 654 00:45:23,510 --> 00:45:24,470 and you get the answer. 655 00:45:24,470 --> 00:45:26,060 It's like a miracle. 656 00:45:28,720 --> 00:45:32,970 But if you've seen Lagrange, it's a confusing miracle. 657 00:45:32,970 --> 00:45:34,030 That's what it is. 658 00:45:34,030 --> 00:45:34,710 Yeah. 659 00:45:34,710 --> 00:45:35,680 OK. 660 00:45:35,680 --> 00:45:39,580 So if I take the derivatives with respect to the P's, set 661 00:45:39,580 --> 00:45:45,870 them to 0, I think I'm going to get the recommended P's. 662 00:45:48,540 --> 00:45:52,980 So I've computed the final answer with a recommended P's, 663 00:45:52,980 --> 00:45:55,860 but now I'm going to show that they really are recommended. 664 00:45:55,860 --> 00:45:59,910 So can you take the derivative of that with respect to P? 665 00:45:59,910 --> 00:46:05,280 Can I-- I'll just raise this a little, raise it a little more. 666 00:46:05,280 --> 00:46:07,170 OK, take the derivative with respect 667 00:46:07,170 --> 00:46:11,460 to P, each P, because I've got n unknowns there, 668 00:46:11,460 --> 00:46:14,670 or however many, maybe r unknowns. 669 00:46:14,670 --> 00:46:17,890 And I've got lambda, so I've got r plus 1 things. 670 00:46:17,890 --> 00:46:22,260 So what's the derivative with respect to P. OK, calculus. 671 00:46:22,260 --> 00:46:23,760 Take the derivative of that. 
672 00:46:23,760 --> 00:46:30,240 It's aj bj transpose over-- 673 00:46:30,240 --> 00:46:32,150 with a minus Pj squared, right? 674 00:46:34,690 --> 00:46:37,645 And the derivative of that with respect to Pj is? 675 00:46:40,740 --> 00:46:42,940 Minus lambda. 676 00:46:42,940 --> 00:46:46,450 So that derivative with respect to Pj is 0, 677 00:46:46,450 --> 00:46:52,360 and the derivative-- so this was a derivative with respect to Pj 678 00:46:52,360 --> 00:46:54,440 has to be 0. 679 00:46:54,440 --> 00:46:57,735 And then the derivative with respect to lambda-- 680 00:46:57,735 --> 00:47:03,020 the derivative with respect to lambda is that 681 00:47:03,020 --> 00:47:04,850 the sum of the P's, call them Pj's-- 682 00:47:04,850 --> 00:47:07,772 the sum of the Pj's equals 1. 683 00:47:10,880 --> 00:47:13,780 Lagrange confused the whole world, 684 00:47:13,780 --> 00:47:18,360 but he gave us a break that in the derivative with respect 685 00:47:18,360 --> 00:47:21,190 to lambda, it just brings back that constraint, 686 00:47:21,190 --> 00:47:23,440 because he just built it in with the factor of lambda, 687 00:47:23,440 --> 00:47:25,970 then he took the derivative, and it brought back 688 00:47:25,970 --> 00:47:27,160 that constraint. 689 00:47:27,160 --> 00:47:29,170 But this part is the beautiful part. 690 00:47:32,970 --> 00:47:35,392 Now, what do I learn from that? 691 00:47:39,170 --> 00:47:42,350 And sometimes this would be a plus. 692 00:47:42,350 --> 00:47:47,640 Why don't I make it a plus just to make my life easier? 693 00:47:47,640 --> 00:47:50,330 Lagrange is dead now, and he don't care anyway, 694 00:47:50,330 --> 00:47:52,450 whether it's plus or a minus. 695 00:47:52,450 --> 00:47:52,950 OK. 696 00:47:56,810 --> 00:47:58,850 So this is telling me this. 697 00:47:58,850 --> 00:48:02,030 So this is telling me what the multiplier is.
698 00:48:02,030 --> 00:48:03,110 He's telling me that-- 699 00:48:03,110 --> 00:48:11,750 this equation is telling me that the multiplier is aj bj 700 00:48:11,750 --> 00:48:15,090 transpose over Pj squared. 701 00:48:17,770 --> 00:48:22,840 Or put it another way, he's telling me that Pj squared is-- 702 00:48:26,070 --> 00:48:30,870 I guess, I'm hoping that after the pretty confusing steps 703 00:48:30,870 --> 00:48:35,400 that we took, this is a separate little bit of math, 704 00:48:35,400 --> 00:48:37,710 using the Lagrange multiplier idea, 705 00:48:37,710 --> 00:48:41,490 and I hope that your thought will be, 706 00:48:41,490 --> 00:48:43,450 boy, that was pretty simple. 707 00:48:43,450 --> 00:48:45,300 So I'm going to put the Pj squareds here 708 00:48:45,300 --> 00:48:46,256 and the lambda there. 709 00:48:49,470 --> 00:48:50,490 What does this tell me? 710 00:48:54,650 --> 00:48:57,800 I've taken the derivative with respect to the Pj's, and I got 711 00:48:57,800 --> 00:49:02,750 this equation for each j because I took the derivative, 712 00:49:02,750 --> 00:49:06,290 the partial derivative with respect to each of the Pj's. 713 00:49:06,290 --> 00:49:09,720 And it tells me that Pj squared-- 714 00:49:13,260 --> 00:49:14,040 wait a minute. 715 00:49:14,040 --> 00:49:15,630 What's the square in there for? 716 00:49:15,630 --> 00:49:17,070 Help. 717 00:49:17,070 --> 00:49:18,420 I've only got two minutes. 718 00:49:18,420 --> 00:49:24,580 And oh, they have to add to 1. 719 00:49:24,580 --> 00:49:27,280 Oh yeah, lambda is going to save us. 720 00:49:27,280 --> 00:49:30,310 Right, lambda is going to save us, 721 00:49:30,310 --> 00:49:34,360 because the total probabilities-- so Pj 722 00:49:34,360 --> 00:49:37,480 will be the square root of this stuff. 723 00:49:40,820 --> 00:49:44,230 And then I-- the number lambda, I haven't decided. 724 00:49:44,230 --> 00:49:46,970 Lagrange's multiplier, I haven't decided. 725 00:49:46,970 --> 00:49:48,300 So what is it? 
726 00:49:48,300 --> 00:49:51,660 It's the correct number to make this equal to 1. 727 00:49:51,660 --> 00:49:56,950 So that is the C. Oh god. 728 00:49:59,740 --> 00:50:02,450 Why have I got square root there? 729 00:50:02,450 --> 00:50:03,780 Shoot. 730 00:50:03,780 --> 00:50:04,194 AUDIENCE: I think you're supposed 731 00:50:04,194 --> 00:50:05,330 to start off with squares. 732 00:50:05,330 --> 00:50:07,455 GILBERT STRANG: I should have started with squares? 733 00:50:07,455 --> 00:50:08,870 AUDIENCE: [INAUDIBLE] 734 00:50:08,870 --> 00:50:11,200 GILBERT STRANG: So these should be squares? 735 00:50:11,200 --> 00:50:13,398 Ah, thank you. 736 00:50:13,398 --> 00:50:14,690 You could have told me earlier. 737 00:50:17,620 --> 00:50:19,210 When you see a professor in trouble, 738 00:50:19,210 --> 00:50:22,490 don't just let him hang there. 739 00:50:22,490 --> 00:50:24,260 OK, all right. 740 00:50:24,260 --> 00:50:29,400 OK, and this is aj bj transpose. 741 00:50:29,400 --> 00:50:31,910 So apart from the kerfuffle here, 742 00:50:31,910 --> 00:50:34,850 and the notes get it right, because I 743 00:50:34,850 --> 00:50:40,310 had time to think there, it turns out 744 00:50:40,310 --> 00:50:45,960 that this optimum gave the formula for the Pj's 745 00:50:45,960 --> 00:50:49,320 that I used earlier. 746 00:50:49,320 --> 00:50:51,350 So when I introduced this formula, 747 00:50:51,350 --> 00:50:55,130 I said, let's choose those probabilities, 748 00:50:55,130 --> 00:50:57,560 but then I came back at the very end 749 00:50:57,560 --> 00:51:00,620 and showed that they are the probabilities that 750 00:51:00,620 --> 00:51:02,720 minimize the variance. 751 00:51:02,720 --> 00:51:05,690 So that's like today's lecture. 
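[Editor's sketch, not part of the lecture.] The Lagrange multiplier step that got tangled at the board can be written out cleanly. This is a sketch under the assumption that the quantity being minimized is the sum of q_j^2 / p_j with q_j = ||a_j|| ||b_j^T||, with the squares in place as the audience suggested and with the plus sign on lambda that the lecture adopted. The symbols q_j and L are introduced here only for illustration.

```latex
% Minimize \sum_j q_j^2 / p_j subject to \sum_j p_j = 1, where q_j = \|a_j\|\,\|b_j^T\|.
\begin{align*}
L(p_1,\dots,p_r,\lambda)
  &= \sum_{j=1}^{r} \frac{q_j^{2}}{p_j}
     + \lambda\Bigl(\sum_{j=1}^{r} p_j - 1\Bigr), \\
\frac{\partial L}{\partial p_j}
  &= -\frac{q_j^{2}}{p_j^{2}} + \lambda = 0
  \quad\Longrightarrow\quad
  p_j = \frac{q_j}{\sqrt{\lambda}}, \\
\frac{\partial L}{\partial \lambda}
  &= \sum_{j=1}^{r} p_j - 1 = 0
  \quad\Longrightarrow\quad
  \sqrt{\lambda} = \sum_{j=1}^{r} q_j = C, \\
p_j &= \frac{\|a_j\|\,\|b_j^{T}\|}{C},
  \qquad
  \sum_{j=1}^{r} \frac{q_j^{2}}{p_j} = C \sum_{j=1}^{r} q_j = C^{2}.
\end{align*}
```

The square root is what turns the squared lengths back into first powers, so the optimal probabilities come out proportional to the length of column j of A times the length of row j of B, which is the formula used earlier in the lecture.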
752 00:51:05,690 --> 00:51:08,060 Can you just think a minute, but please 753 00:51:08,060 --> 00:51:12,870 do go back through the notes, because there 754 00:51:12,870 --> 00:51:15,800 are some messy steps in the variance 755 00:51:15,800 --> 00:51:20,120 there that I had to go by quickly. 756 00:51:20,120 --> 00:51:23,240 But you understand the principle, that we set up 757 00:51:23,240 --> 00:51:26,300 a randomized system. 758 00:51:26,300 --> 00:51:35,570 We choose probabilities, aiming to get the smallest variance. 759 00:51:35,570 --> 00:51:38,750 And it turns out that the good probabilities are bigger 760 00:51:38,750 --> 00:51:43,770 when the column is a larger column, so that to use this, 761 00:51:43,770 --> 00:51:45,890 you have to go through the matrix 762 00:51:45,890 --> 00:51:48,710 and find the length of the columns, 763 00:51:48,710 --> 00:51:52,290 because that's what's telling you the probabilities. 764 00:51:52,290 --> 00:51:55,070 So that's like a first pass through. 765 00:51:55,070 --> 00:51:57,380 Before you do the randomized sampling, 766 00:51:57,380 --> 00:51:59,720 you must decide on the probabilities, 767 00:51:59,720 --> 00:52:04,280 and they depend on the sizes of the different columns. 768 00:52:04,280 --> 00:52:08,630 Thank you for getting me through that. 769 00:52:08,630 --> 00:52:11,630 I'll come back to a little more about randomized things 770 00:52:11,630 --> 00:52:16,700 next time, and then later, not much later, but a little bit 771 00:52:16,700 --> 00:52:19,940 later, we'll be seeing probability much more 772 00:52:19,940 --> 00:52:23,086 seriously. OK, thank you.