The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

GILBERT STRANG: OK. So actually, I know people are working on projects, and you're not responsible for any new material in the lectures. Thank you for coming. But I do have something, an important topic, which is a revised version about the construction of neural nets, the basic structure that we're working with. That's on the open web as Section 7.1, Construction of Neural Nets.

Really, it's the construction of the learning function F. That's the function that you optimize by gradient descent or stochastic gradient descent, and you apply it to the training data to minimize the loss. So I'm just thinking about it in a more organized way, because I wrote that section before I knew anything more than how to spell "neural nets," but now I've thought about it more.

So the key point, maybe, compared to what I had in the past, is that I now think of this as a function of two sets of variables, x and v. The x are the weights, and the v are the feature vectors, the sample feature vectors. Those come from the training data: either one at a time, if we're doing stochastic gradient descent with mini-batch size 1; or B at a time, if we're doing mini-batches of size B; or the whole thing, a whole epoch at once, if we're doing full-scale gradient descent. So those are the feature vectors, and the x are the numbers in the linear steps, the weights. They're the matrices A_k that you multiply v by, and also the bias vectors b_k that you add on to shift the origin. OK. It's these x's that you optimize--those are what you optimize.

And what's the structure of the whole learning function, and how do you use it? What does a neural net look like?
So you take F of a first set of weights--the first set of weights would be A_1 and b_1, so that's the x part--and of the actual sample vector; the sample vectors are the v_0 in this iteration. And then you do the nonlinear step to each component, and that produces v_1. So I could write out what this is: the linear step gives A_1 v_0 + b_1. The two steps are these: the input is v_0, you take the linear step using the first weights A_1 and b_1, then you take the nonlinear step, and that gives you

v_1 = ReLU(A_1 v_0 + b_1).

That's really better than my line above, so I'll erase that line above. Yeah. So that produces v_1 from v_0 and the first weights. And then the next level inputs v_1, so I'll just call the input v_{k-1} and the output v_k, for k equal to 1 up to however many layers there are, L layers. So the input was v_0--this v is really v_0--and this is the neural net, with an input and an output at each layer. And v_L is the final output from the final layer.

So let's just do a picture here. Here is v_0, a sample vector--or if we're doing image processing, it's all the pixels in the data, in the training, from one sample; this is one training sample. Then you multiply by A_1, you add b_1, and you take ReLU of that vector, and that gives you v_1. And then you iterate, finally reaching v_L, the last layer. You don't do ReLU at the last layer, so it's just A_L v_{L-1} + b_L. You may not have a bias vector at that layer either, but you might, and that is finally the output.

So that picture is clearer for me than it was previously, to distinguish the weights from the samples. In the gradient descent algorithm, it's these x's that you're choosing. The v's are given by the training data; they're not part of the optimization. It's x, as in Chapter 6, where you're finding the optimal weights.
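[Editor's note: here is a minimal NumPy sketch of the forward structure just described--a linear step A_k v + b_k, with ReLU at every layer except the last. The layer widths and random weights are made-up placeholders, not values from the lecture.]

```python
import numpy as np

def relu(z):
    # nonlinear step, applied to each component
    return np.maximum(0.0, z)

def forward(v0, As, bs):
    """Learning function F(x, v0): x is the collection of weights (As, bs),
    v0 is one sample feature vector.  ReLU after every layer except the last."""
    v = v0
    for k, (A, b) in enumerate(zip(As, bs), start=1):
        z = A @ v + b                        # linear step with weights A_k, b_k
        v = z if k == len(As) else relu(z)   # no ReLU at the final layer
    return v

# Hypothetical sizes: 4 input features, one hidden layer of 6 neurons, 3 outputs
rng = np.random.default_rng(0)
sizes = [4, 6, 3]
As = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [rng.standard_normal(m) for m in sizes[1:]]
v0 = rng.standard_normal(4)                  # one training sample
print(forward(v0, As, bs))                   # the output v_L
```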
So this x really stands for all the weights that you compute, A_1, b_1 up to A_L, b_L--a collection of all the weights. And the important point for applications, for practice, is to realize that there are often more weights--more components in the weights--than there are components in the feature vectors, in the samples, in the v's. So often the size of x is greater than the size of the v's, which is an interesting and sort of unexpected situation. So I'll just write that: often, the x's--the weights--are underdetermined, because the number of x's exceeds, and often far exceeds, the number of v's. The x's are the numbers in the A's and b's, and the v's are the samples in the training set--the number of features of all the samples in the training set.

So I'll get that new Section 7.1 up, hopefully this week, on the open web, and I'll email you on Stellar. Is there more I should say about this? You see, I can draw the picture, but of course a hand-drawn picture is far inferior to a machine-drawn picture, an online picture--but let me just do it. So there is v, the training sample, with some components, and then they're multiplied. Now, here is going to be v_1, the first hidden layer, and that can have a different number of components, a different number of neurons. And each one comes from the v's--I'll keep going here, but you see the picture. So that describes a matrix A_1 that tells you what the weights are on those connections, and then there's a b_1 that's added--the bias vector is added to all of those--to get v_1. So v_1 is A_1 v_0 + b_1, followed by the ReLU, and then onwards. So this is the spot where drawing it by hand is clearly inferior to any other possible way to do it.
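[Editor's note: to make that under-determination count concrete, each layer contributes the entries of A_k plus the entries of b_k, so for a fully connected net with layer widths n_0, ..., n_L:]

```latex
\#\{\text{weights in } x\}
  \;=\; \sum_{k=1}^{L}\Bigl(\underbrace{n_k\,n_{k-1}}_{A_k} \;+\; \underbrace{n_k}_{b_k}\Bigr),
  \qquad n_0,\dots,n_L = \text{layer widths}.
```

[With made-up widths n_0 = 100, n_1 = 1000, n_2 = 10, that is 100,000 + 1,000 + 10,000 + 10 = 111,010 weights--already more than the 100,000 numbers in a hypothetical training set of 1,000 samples with 100 features each.]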
OK. So now, I haven't yet put the loss function into the picture. That's the function that you want to minimize. So what is the loss function?

So we're choosing x--that's all the A's and b's--to minimize the loss function L. OK. It's this part that Professor Sra's lecture was about. So he said, L is often a finite sum over all the samples of F. So what would that be? F(x, v_i)--this is the output, with weights in x, from sample number i. And if we're doing batch processing--that is, doing the whole batch at once--then we compute that for all i, and that's the computation that's ridiculously expensive; you go instead to stochastic gradient descent, and you just choose one of those, or b of those--a small number b, like 32 or 128--of these F's. But full-scale gradient descent chooses the weights x to minimize the loss over everything.

Now, I haven't got the loss here yet. The loss would involve F(x, v_i) minus the true result from sample i. I haven't got a good notation for that true result--I'm open to suggestions. So how do I want to write the error? If it was least squares, I would be squaring that, so it would be a sum of the errors squared over all the samples. Or if I'm doing stochastic gradient descent--I guess I'm still minimizing this, but the question is, do I use the whole function L at each iteration, or do I just pick one, or only b, of the samples to look at at iteration number k?

So this is L(x), then: I've added up over all the v's. So just to keep the notation straight, I have this function of x and the v's, and I find its output--this is what the neural net produces--and it's supposed to be close to the true result. We don't expect this error to be exactly 0, but it could be, because we have lots of weights to achieve that. So anyway, that would be the loss we minimize, and it would be squared for square loss. I guess I haven't really spoken about loss functions, so let me just put those here--these are the popular loss functions.
One would be the one we know best: square loss. And number two--I've never seen it used quite this directly--would be the l1 loss, maybe the sum of l1 norms of the errors. The square loss is the sum of the errors squared, in the l2 norm; the l1 loss would be the sum over i of the errors in the l1 norm. Well, that one comes into specific other problems, like LASSO and other important problems where you're minimizing an l1 norm, but not in deep learning. Now, three would be hinge loss. Probably some of you know better than I do the formula and the background behind hinge loss; this is for the minus 1, 1 classification problems, whereas square loss would be the appropriate one for regression--so that one is for regression. And then finally, the most important one for neural nets, is cross-entropy loss. This is for neural nets. So this is really the most used loss function in the setup that we are mostly thinking of, and I'll try to say more about that before the course ends.
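[Editor's note: the lecture names these losses without writing out the formulas. The sketch below uses what are, as an assumption, the standard forms--squared error, absolute error, hinge loss max(0, 1 - y*score) for labels in {-1, +1}, and softmax cross-entropy. All numbers are made up.]

```python
import numpy as np

def square_loss(F_out, true):            # 1. square loss (regression)
    return np.sum((F_out - true) ** 2)

def l1_loss(F_out, true):                # 2. l1 loss: sum of absolute errors
    return np.sum(np.abs(F_out - true))

def hinge_loss(score, label):            # 3. hinge loss, label in {-1, +1}
    # zero loss once label * score >= 1 (sample classified with a margin)
    return max(0.0, 1.0 - label * score)

def cross_entropy_loss(outputs, true_class):   # 4. cross-entropy (neural nets)
    # softmax turns the net's outputs into probabilities, then -log of the true one
    p = np.exp(outputs - np.max(outputs))
    p /= p.sum()
    return -np.log(p[true_class])

print(square_loss(np.array([0.9, 0.1]), np.array([1.0, 0.0])))   # 0.02
print(hinge_loss(0.3, +1))                                       # 0.7
print(cross_entropy_loss(np.array([2.0, 0.5, -1.0]), 0))         # about 0.24
```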
So is that-- I don't know. For me, I hadn't got this straight until rewriting that section, and it's now in better form, but comments are welcome. OK. So that just completes what I wanted to say, and you'll see the new section. Any comment on that before I go to a different topic entirely? OK. Oh, any questions before I go to this topic? I'll tell you what it is.

It's a short section in the book about distance matrices, and here is the question. We have a bunch of points in space, and what we know is the distances between the points--it's convenient to talk about squared distances here. OK. And how would we know those distances? Maybe by radar, or by any kind of measurement. They might be sensors which we've placed around, and we can measure the distances between them. And the question is, what are their positions? So that's the question. So let me talk a little bit about this question and then pause. Find positions in--well, in space, but we don't know ahead of time, maybe, whether the space is ordinary 3D space, or whether these are sensors in a plane, or whether we have to go to higher dimensions. I'll just put dimension d, and also, I'll just say, we're also finding that d.

And what are these positions? These are positions x_i so that the squared distance ||x_i - x_j||^2 is the given d_ij. So we're given the distances between them, and we want to find their positions. We know distances, and we want to find positions--that's the question. It's just a neat math question that is solved, and you'll see a complete solution. And it has lots of applications, and it's just a nice question. So it occupies a section of the book, but that section is only two pages long. It's just a straightforward solution to that question: given the distances, find the positions. Given the distances, find the x's. OK. So I'm going to speak about that.

I had a suggestion, a good suggestion, by email. Well, first, questions about the projects coming in? Projects are beginning to come in, and at least at the beginning--well, in all cases, beginning and end--I'll read them carefully. And as long as I can, I'll send back suggestions for a final rewrite. And as I said, a printout is great--you could leave it in the envelope outside my office--but of course, online is what everybody's doing. So those are just beginning to come in, and if we can get them in by a week from today, I'm really, really happy. Yeah, and just feel free to email me. I would email me about projects--not Jonathan, and not anonymously on Stellar. I think you'd probably do better just to ask me the question. That's fine, and I'll try to answer in a useful way. Yeah, and I'm always open to questions. So you could email me, like, how long should this project be? My tutor at Oxford said something about that, when you were writing essays.
That's the Oxford system--you write an essay--and he said, just start where it starts, and end when it finishes. So that's the idea: certainly not enormously long.

And then a question was raised--and I can ask whether you are interested in that--the question was, what courses after this one are natural to take to go forward? And I don't know how many of you are thinking of taking, or have time to take, other MIT courses in this area of deep learning, machine learning, optimization, all the topics we've had here. Anybody expecting to take more courses, just stick up a hand. Yeah--and you already know what MIT offers? So that was the question that came to me: what does MIT offer in this direction? And I haven't looked up the number of Professor Sra's course, S-R-A, in Course 6. It's 6.-something, a high number, and after his good lecture, I think that's got to be worthwhile. So I looked in Course 6. I didn't really find an Institute-wide list--maybe Course 6 feels that they are the Institute--but there are other courses around.

But I found, on the Operations Research Center site, the ORC--let me just put that there. This is just in case you would like to think about any of these things.

As I write that--so, I heard the lecture by Tim Berners-Lee. Did others hear that, a week or so ago? He created the web. So that's pretty amazing--it wasn't Al Gore, after all--and do you know his name? Well, he's now Sir Tim Berners-Lee. That double name makes you suspect that he's from England, and he is. So anyway, I was going to say, I hold him responsible for these excessive letters in the address, in the URL. I mean, he's made us all say W-W-W for years. Find some other way to say it--but it's not easy to say, I think. OK, whatever. So it's the OR Center at MIT, and then it's "academics" or something, and then it's something like "course offerings." That's approximately right.
And since they do applied optimization, under the heading of data analytics or statistics there's optimization, there's OR, operations research--other lists too, but a good list of courses from many departments: especially Course 6, Course 15, which is where the Operations Research Center is, Course 18, and there are others in Course 2 and elsewhere. Yeah.

Would somebody like to say what course you have in mind to take next, after this one? If you looked ahead to next year, any suggestions of what looks like a good course? I sat in once on 6.036, the really basic course, and you would want to go higher. OK. Maybe this is just to say, I'd be interested to know what you do next, what your experience is, or I'd be happy to give advice. But maybe my general advice is that that's a useful list of courses. OK?

Back to distance matrices. OK, so here's the problem. Yeah. OK, I'll probably have to erase that, but I'll leave it for a minute. OK. So we know these distances, and we want to find the x's, so let's call the distances d_ij. So we have a D matrix, and we want to find a position matrix--let me just see what notation. This is Section 4.9--previously 3.9, but Chapters 3 and 4 got switched--and maybe actually, yeah, I think it's 8 or 9 or 10 in that chapter now; other topics are trying to find their way in. OK. So that's the reference on the web, and I'll get these sections onto Stellar.

OK. So the question is, can we recover the positions from the distances? In fact, there's also a question: are there always positions for given distances? And I mentioned several applications. I've already spoken about wireless sensor networks, where you can measure travel times between the sensors, and that gives you the distances, and then you use this neat little bit of math to find the positions.
Well, of course, you can't find the positions uniquely. Clearly, you could apply any rigid motion to all the positions. If I have a set of positions--what am I going to call that? X. So I'll write here: I'm given the D matrix--that's the distances--and the job is to find the X matrix, which gives the positions. And what I'm just going to say--and you already saw it in your mind--is that if I have a set of positions, I could do a translation, and the distances wouldn't change, or I could do a rigid motion, a rigid rotation. So the positions are not unique, but I can come closer by saying, put the centroid at the origin, or something like that. That will take out the translations, at least. OK. So find the X matrix--that's the job.

OK, and I was going to say, before I start on that, the shapes of molecules are another application. Nuclear magnetic resonance gives distances, gives D, and then we find the positions X. And of course, there's noise in there, and sometimes missing entries. And machine learning could also be described this way: you're given a whole lot of points in space, feature vectors in a high-dimensional space. Actually, this is a big deal. You're given a whole lot of points in high-dimensional space, and those are related--they sort of come together naturally--so they tend to fit on a surface, a low-dimensional surface in high-dimensional space. And really, a lot of mathematics is devoted to finding that low-dimensional--that subspace, except it could be curved, so "subspace" is not the correct word. Really, a manifold, a curved manifold, is what a geometer would say: something that is smooth and close to all the points. And you could linearize it, you could flatten it out, and then you have a much reduced problem.
The dimension is reduced from the original dimension of the space where the points lie, with a lot of data, to the true dimension of the problem. If, of course, the points were all on a straight line, the true dimension of the problem would be 1. So we have to discover this--we also have to find that dimension d. OK, so how do we do it? So it's a classical problem; it just has a neat answer. OK.

All right, so let's recognize the connection between distances and positions. So d_ij is the squared distance between x_i and x_j, so that is

d_ij = ||x_i - x_j||^2 = x_i . x_i - x_i . x_j - x_j . x_i + x_j . x_j.

OK. Is that right? Yes. OK. So those are the d_ij's, and they are the entries in the matrix D. OK.

Well, these first entries, the x_i . x_i, depend only on i; they're the same for every j. So that part will produce a rank-one matrix, because its entries depend on the row number i but not on j, the column number--the columns are repeated. And this last part, the x_j . x_j, similarly produces something that depends only on j, only on the column number, so the rows are all the same--also a rank-one matrix, with all the rows repeated--because if I change i, nothing changes in that product. So really, it's the cross terms in the middle that produce most of the matrix, the significant part of the matrix. OK. So what do we do with those?

So let's see, did I give a name to the matrix that I'm looking for? I think in the notes I call it X. So I'm given D; find X. And what I'll actually find--you can see it coming here--is X transpose X, because what I'm given involves dot products of the x's. So I would like to discover, out of all this, what x_i dotted with x_j is--that'll be the correct dot product. Let's call that matrix G, for the dot product matrix, and then find X from G. So this is a nice argument. So what this identity tells me is some information about dot products.
So this is telling me something about the G matrix, the X transpose X matrix. And then once I know G, it's a separate step to find X. And of course, this is the point at which X is not unique: if I put a rotation Q into X, then I'll see a Q transpose Q, and it'll disappear. So I'm free to rotate the x's, because that doesn't change the dot products. So it's G that I want to know, and this term tells me something about G, and this term tells me something about G--and so does the middle one, but that's what I have to see. So what do those tell me? Let's see. Let me write down what I have here.

So let's introduce the vector d, with entries d_i equal to the inner product of x_i with x_i--that's the partial information we're getting from those terms. So is that OK? I'm introducing that notation, because this is now going to tell me what my D matrix is. So what is that? This is the diagonal--maybe it's just a vector, I should say. Yeah.

Yeah, so can I write down the equation that is fundamental here, and then we'll figure out what it means. So it's an equation for G, for the dot product matrix. OK, let me make space for that equation. I believe that we can get the dot product matrix, which I'm calling G, as minus 1/2 of the D matrix, plus 1/2 of the ones vector times d transpose, plus 1/2 of d times the ones vector transposed:

G = -1/2 (D - 1 d^T - d 1^T).

Let me make sure I have those rank-one pieces right. The piece 1 d^T is the column of ones, (1, 1, 1, 1), times d transpose--so it's column times row--and the piece d 1^T is also column times row, with the d as the column. OK, now let me look at that properly. Every row of 1 d^T is the same, so that piece is reflecting the term where the rows are repeated, the x_j . x_j term; and every column of d 1^T is the same, so that piece is reflecting the term where the columns are repeated, the x_i . x_i term.
The d is just the vector of those numbers--call its entries d_i, or d_j when it runs along a row--and here is the D matrix. So part of the D matrix is this bit and this bit, each giving a rank-one matrix. Now, it's the cross-term part that I have to understand, so while you're checking on that, let me look again at this. Yeah.

Let's just see where we are if this is true. If this is true, then I'm given the D matrix, and these dot products I can find. So in other words, this is the key equation, and it's going to come just from that simple identity, just from checking each term. This term we identified, that last term we identified, and now the whole thing is D--well, of course, it's D.

What that identity is really saying--if I just read it along and translate it into matrix language--is that the D matrix is this rank-one matrix, the ones vector (1, 1, 1, 1) times (d_1, ..., d_4) transpose, let's say, plus the other one, the d's times the ones vector transposed, which is the transpose of that first one, and then minus 2 of the cross-product matrix--minus 2 X transpose X:

D = 1 d^T + d 1^T - 2 X^T X.

That's where the 1/2 will come from. Sorry--cross products of the x's: I had one set of cross products, the x_i . x_j, and the x_j . x_i are the same, so I have minus 2 of them. So now I'm just rewriting that. When I rewrite that equation, solving for the cross-product matrix, I get the formula for G. Do you see that?
I put that cross-product term on one side, I put the d pieces over here with a minus sign, I divide by 2, and then that's the formula. So ultimately, this simple identity, just looked at term by term--because these pieces were so simple, just rank-one pieces, and these pieces were exactly what we want, the X transpose X pieces, the G--that equation told us G from D. All of this is known: what's known is D, and this rank-one piece, and this one. So now we have the equation:

X^T X = G = -1/2 (D - 1 d^T - d 1^T),

minus 1/2 of D minus those rank-ones. Sorry to make it look messy--I remember Raj Rao talking about this last spring, and the algebra got flustered then too. So we get it: we know X transpose X, that matrix.
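[Editor's note: a small NumPy check of that identity, on made-up points. Here the vector d of squared lengths is taken from the test positions purely to verify the formula; in the actual problem one would first pin down the translation--for example, by placing one point at the origin--so that d can be read off from D itself. That extra step is an assumption, not something covered in the lecture.]

```python
import numpy as np

rng = np.random.default_rng(1)
n, dim = 5, 3                               # made-up: 5 points in 3 dimensions
X = rng.standard_normal((dim, n))           # columns are the positions x_1, ..., x_n

G_true = X.T @ X                            # dot-product (Gram) matrix X^T X
d = np.diag(G_true)                         # d_i = x_i . x_i
D = d[:, None] + d[None, :] - 2 * G_true    # D_ij = ||x_i - x_j||^2

# Recover the Gram matrix from the distance matrix using the identity above
ones = np.ones(n)
G = -0.5 * (D - np.outer(ones, d) - np.outer(d, ones))

print(np.allclose(G, G_true))               # True: G = X^T X comes back from D
```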
Now, can we just do four minutes of linear algebra at the end today? Given X transpose X, find X--this is n by n. How would you do that? Could you do it? Would there be just one X? No. So if you had one X, you could multiply it by a rotation, by an orthogonal matrix, and you'd have another one. So this is finding X up to an orthogonal transformation--but how would you actually do it? What do we know about this matrix, X transpose X? It's symmetric, clearly, and what we especially know is that it is also?

AUDIENCE: Positive.

GILBERT STRANG: Positive semidefinite--so this is semidefinite. So I'm given a semidefinite matrix, and I want to find a square root, you could say. That given matrix is the X transpose X, and I want to find X. I think there are two leading candidates. There are many candidates, because if you find one X, then any QX is also OK--if I put a Q transpose Q in there, it's the identity. OK. So one way is to use the eigenvalues of X transpose X, and the other way would be to use elimination on X transpose X.

So if I use eigenvalues--if I find the eigenvalues of X transpose X--then I'm writing this symmetric, positive semidefinite matrix as Q Lambda Q transpose. Right? That's the fundamental, most important theorem in linear algebra, you could say: a symmetric positive semidefinite matrix has eigenvalues greater than or equal to 0, and eigenvectors that are orthogonal. So now, if I know that, what's a good X? Take X to be what? I've got the eigenvalues and eigenvectors of X transpose X, and I'm looking for an X that will work. And one idea is just to take the same eigenvectors and take the square roots of the eigenvalues:

X = Q sqrt(Lambda) Q^T.

That X is symmetric now--it's equal to X transpose--and that's a square root symbol, or Lambda to the 1/2, I could say. So when I multiply--X transpose X is just X squared here--when I square it, the Q transpose Q in the middle gives the identity, and the square root of Lambda times the square root of Lambda, those are diagonal matrices that give Lambda, and I get the right answer. So one way, in a few words: take the square roots of the eigenvalues and keep the eigenvectors. That's the eigenvalue construction, and it produces an X that is symmetric positive semidefinite. That might be what you want. It's a little work, because you're computing eigenvalues and eigenvectors to do it, but that's one choice.

Now, I believe that elimination would give us another choice. So elimination produces what factorization of this matrix? This is still our symmetric positive definite matrix. If you do elimination on that, you usually expect L, lower triangular, times D, the pivots--a different D now, the diagonal matrix of pivots, not the distance matrix--times U, the upper triangular. That's the usual result of elimination, LDU. I'm factoring out the pivots, so there are 1's on the diagonals of L and U. But now, if the matrix is actually symmetric, what's up?
We zipped by elimination, regarding that as a trivial bit of 18.06 linear algebra, but of course it's highly important. So what's the situation here, when the matrix is actually symmetric? I want the factorization to look symmetric. How do I make it look symmetric? The U gets replaced by L transpose. If I'm working with a positive definite matrix, then I get positive pivots, and the lower triangular and upper triangular factors are transposes of each other: L D L^T.

So now, what is then the X? It's just like before. I'll use L times the square root of D times L transpose. Is that right? Oh, wait a minute--what's up? No, that's not going to work, because in the middle I would get L transpose L, and that's not the identity; where I had Q transpose Q, it was good. No, sorry--let's get that totally erased. The X should just be

X = sqrt(D) L^T.

The X is now a triangular matrix: the square roots of the pivots, and the L transpose part. And now, when I do X transpose X, you see X transpose X coming out correctly: X transpose will be L times the square root of D, the square root of D times the square root of D gives the D, and then the L transpose at the end is right--so X transpose X is L D L transpose, the matrix we started with.

So this is called--do I try to write it here? This is my last word for today--the Cholesky factorization, named after a French guy--a French soldier, actually. So L D L transpose, with the square roots split off that way, is Cholesky, and that's easy to compute, much faster than the eigenvalue square root. But this square root is triangular, and the other square root is symmetric. Those are the two pieces of linear algebra for finding things: you reduce things to triangular form, or you connect them with symmetric matrices.
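[Editor's note: a small NumPy sketch of both square roots, with a made-up positive definite G = X^T X. For points that truly lie in a lower dimension, G is only semidefinite; the eigenvalue route still works there (keep the eigenvectors with positive eigenvalues), while plain Cholesky assumes positive definiteness--that caveat is the editor's, not the lecture's.]

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
X_true = rng.standard_normal((n, n))      # made-up full-rank positions
G = X_true.T @ X_true                     # the known matrix; task: given G, find some X

# Way 1: eigenvalues.  G = Q Lam Q^T  ->  X = Q sqrt(Lam) Q^T, the symmetric square root.
lam, Q = np.linalg.eigh(G)
X_eig = Q @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ Q.T
print(np.allclose(X_eig.T @ X_eig, G))    # True

# Way 2: elimination.  G = L D L^T with unit-diagonal L and positive pivots in D,
#        so X = sqrt(D) L^T is an upper-triangular square root (Cholesky).
C = np.linalg.cholesky(G)                 # lower triangular, G = C C^T, with C = L sqrt(D)
X_chol = C.T                              # = sqrt(D) L^T
print(np.allclose(X_chol.T @ X_chol, G))  # True

# Both are valid answers: X is only determined up to an orthogonal factor Q,
# since (Q X)^T (Q X) = X^T X.
```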
OK, thank you for your attention today. So today, we did the distance matrices, and this was the final step, to get the X. And also, most important, was to get the structure of a neural net straight, separating the v's, the sample vectors, from the x's, the weights.

OK, so Friday I've got one volunteer to talk about a project, and I'm desperately looking for more. Please just send me an email--it would be appreciated--or I'll send you an email, if necessary. OK, thanks.