The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation, or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

GILBERT STRANG: OK. Just as we're getting started, I thought I'd add a few words about a question that came up after class. In last time's discussion, you were given a distance matrix. You had the distance between x1 and x2, between x2 and x3, and between x1 and x3, and you wanted to find points that satisfied those distances. Well, we're going to fail on this example. If the distance from x1 to x2 is 1 and the distance from x2 to x3 is 1, then by the triangle inequality the distance from x1 to x3 cannot be more than 2. When we square it, it cannot be more than 4. And here the squared distance is 6. So what's going to happen? What goes wrong in that case? I hadn't commented on that, and I'm not sure that the paper I referenced does so.
So I had to do a little search back in the literature, because people couldn't have overlooked this problem. This is a case where the triangle inequality fails. And it's not going to help to go into 10 dimensions, because the triangle inequality doesn't change: it's still there in 10 dimensions, and we're still failing. So what happens? Well, what could happen? Do you remember the key equation? You'll have to remind me of it. We had an equation connecting D with the matrix of dot products.

So what is the matrix D for this problem? D is a 3 by 3 matrix of these distances squared. It was convenient to use distances squared, because that's what comes into the next steps. Of course, the distance from each x to itself is zero. The squared distance from x1 to x2 is 1, and from x2 to x3 is 1, but from x1 to x3 it is 6. OK. So that's the distance matrix. The job was to find x1, x2, and x3 to match those distances, and I'm just going to write down that we cannot find them. So what goes wrong?
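Written out, the example's squared-distance matrix (rows and columns ordered x1, x2, x3) and the failed triangle-inequality check are:

```latex
D = \begin{bmatrix} 0 & 1 & 6 \\ 1 & 0 & 1 \\ 6 & 1 & 0 \end{bmatrix},
\qquad
\|x_1 - x_3\| \le \|x_1 - x_2\| + \|x_2 - x_3\| = 1 + 1 = 2
\;\Longrightarrow\; \|x_1 - x_3\|^2 \le 4 < 6 = D_{13}.
```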
Well, there's only one thing that could go wrong. We connect this distance matrix D to the matrix X transpose X. You remember the position matrix; maybe I called it G. Gij is the dot product of xi with xj, so G is the matrix of dot products. And the great thing was that this matrix G comes directly from D. And of course, what do we know about this matrix of dot products? We know that it is positive semidefinite.

So what goes wrong? In a word, when we write out that equation and discover what G is, if the triangle inequality fails, we learn that G doesn't come out positive semidefinite. That's really all I want to say, and I could push through the example. G will not come out positive semidefinite for this D, because it can't. If it came out positive semidefinite, then we could find an X. If we had the G, then the final step, you remember, is to find an X. And we know that if G is positive semidefinite, there are multiple ways to find an X.
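Here is one of those ways, sketched with NumPy and assuming G is symmetric positive semidefinite. Take the eigendecomposition G = V diag(λ) V^T and set X = diag(√λ) V^T, so that X^T X = G. Cholesky is another option when G is strictly positive definite. The function name `points_from_gram` and the test points are illustrative choices, not taken from the lecture.

```python
import numpy as np

def points_from_gram(G, tol=1e-10):
    """Return X with X.T @ X equal to G (up to round-off); columns of X are points.

    Assumes G is symmetric positive semidefinite. If G has a clearly
    negative eigenvalue, no real X exists, so raise an error.
    """
    eigvals, V = np.linalg.eigh(G)           # G = V diag(eigvals) V^T
    if eigvals.min() < -tol:
        raise ValueError("G is not positive semidefinite: no points fit")
    eigvals = np.clip(eigvals, 0.0, None)    # drop tiny negative round-off
    return np.sqrt(eigvals)[:, None] * V.T   # X = diag(sqrt(eigvals)) V^T

# Gram matrix of three known points in the plane (the columns of X0)
X0 = np.array([[0.0, 1.0, 2.0],
               [0.0, 0.0, 1.0]])
G = X0.T @ X0
X = points_from_gram(G)
print(np.allclose(X.T @ X, G))  # True
```

The recovered columns can differ from X0 by a rotation or reflection, and X carries an extra, essentially zero, row. That is expected, because G fixes only the dot products, and therefore the distances, of the points.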
Positive semidefinite matrices are exactly what you get out of X transpose X. We can find an X given a G, and we can find G given an X. So it has to be that the matrix G that comes out of that equation will turn out not to be positive semidefinite.

So it's really quite nice. It's a beautiful little bit of mathematics. For three points like these, the triangle inequality is satisfied by these numbers if and only if the G matrix that comes out of this equation (which I haven't written) is positive semidefinite. If the triangle inequality is OK, we can find the points. If the triangle inequality is violated, like here, then the matrix G is not positive semidefinite. It has a negative eigenvalue, and we cannot find the points. I could recall the G equation, but it's coming to you in the two-page section on distance matrices. OK. I should have made that point: it's nice to have specific numbers.
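The lecture leaves the G equation unwritten. One standard form is classical multidimensional scaling. It is assumed here and is not necessarily the form used in the course notes. It puts the centroid of the points at the origin and gives G = -1/2 J D J, with the centering matrix J = I - (1/n) 1 1^T. The sketch below applies it to the lecture's D and to a variant with squared distance 4 from x1 to x3. That variant describes three equally spaced points on a line, where the triangle inequality holds with equality.

```python
import numpy as np

def gram_from_squared_distances(D):
    """Classical MDS: Gram matrix of centered points from squared distances D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return -0.5 * J @ D @ J

# The lecture's example: squared distances 1, 1, and 6 (triangle inequality fails)
D_bad = np.array([[0.0, 1.0, 6.0],
                  [1.0, 0.0, 1.0],
                  [6.0, 1.0, 0.0]])
# Variant with squared distance 4 from x1 to x3 (points equally spaced on a line)
D_ok = np.array([[0.0, 1.0, 4.0],
                 [1.0, 0.0, 1.0],
                 [4.0, 1.0, 0.0]])

for name, D in [("bad", D_bad), ("ok", D_ok)]:
    eigs = np.linalg.eigvalsh(gram_from_squared_distances(D))
    print(name, np.round(eigs, 4))
# bad: eigenvalues -1/3, 0, 3, so G is not positive semidefinite and no points exist
# ok:  eigenvalues 0, 0, 2, so G is positive semidefinite (points -1, 0, 1 on a line)
```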
And I could get the specific numbers for G, and we would see: no way, it's not positive semidefinite. OK. So that's just tidying up last time.

I have another small problem to talk about, and then a big question: whether deep learning actually works. I had an email from an expert last night which changed my view of the world about that question, as you can imagine. The change in my world was that I had thought the answer was yes, and I now think the answer is no. So that's rather a big issue for 18.065. But let's see about that later. OK.

Now, Procrustes' problem. The name Procrustes (it's included in the notes) comes from a Greek myth. Are you guys into Greek myths? So what was the story of Procrustes? He had a special bed, Procrustes' bed, of a certain length. And then he had visitors coming. Instead of adjusting the length of the bed to fit the visitor, Procrustes adjusted the length of the visitor to fit the bed.
So he either stretched the visitor or chopped off part of the visitor. Anyway, the Greeks liked this sort of thing. OK. So that's a Greek myth for 18.065.

So the whole idea of the Procrustes problem is to make something fit something else. Suppose I'm just in three dimensions and I have two vectors here, so I have a basis for a two-dimensional space. And over here I have a second set. Space scientists might have one computation of the positions of satellites. Of course, they wouldn't be off by as much as this figure shows. Then they have another computation using different coordinates. So the second set is partly rotated from the first picture, but it also has round-off errors and other errors between the two. So the question is, what's the best orthogonal transformation? Say this is a bunch of vectors, x1, x2, up to xn. And I want to modify them by an orthogonal matrix (or maybe I'd apply it on the other side; I think I do),
Q, to be as close as possible to the other set, y1, y2, up to yn. Let me say it again. I have two sets of vectors, and they're different, like those two sets. I'm looking for the orthogonal matrix that, as well as possible, takes one set into the other. Of course, if this was an orthonormal basis and that was an orthonormal basis, then we would be home free: we could get equality. We could take an orthonormal basis directly into an orthonormal basis with an orthogonal matrix Q. In other words, if X was an orthogonal matrix and Y was an orthogonal matrix, we would get the exact correct Q. But that's not the case, so we're looking for the best possible Q. So that's the problem: minimize over orthogonal matrices Q. And I just want to get my notation consistent here. OK.

I see that I'm starting with the y's and mapping them to the x's. So let me ask the question: what orthogonal matrix Q multiplies the y's to come as close as possible to the x's?
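In symbols, the question is the following. The Frobenius norm of a matrix A is the square root of the sum of its squared entries.

```latex
\min_{Q^T Q = I} \; \| Y Q - X \|_F^2,
\qquad
\| A \|_F^2 = \sum_{i,j} a_{ij}^2 .
```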
So over all orthogonal Q's, I want to minimize YQ minus X in the Frobenius norm, and I might as well square it. So we're into the Frobenius norm of a matrix. Remember it? This is a very convenient norm in data science for measuring the size of a matrix, and we have several possible formulas for it. Let me call the matrix A. What's one expression for the Frobenius norm squared in terms of the entries aij of the matrix? The Frobenius norm just treats the matrix like a long vector. So it's a11 squared plus a12 squared, all the way along the first row, then along the second row, and so on, down to ann squared. OK. It's the sum of all the squares, just treating the matrix like a long vector. OK. But that's an awkward expression to write down. So what other ways do we have to find the Frobenius norm of a matrix? Let's see. I could look at A transpose A. Is that right? A transpose A. So what's happening there? Remind me. Yeah, I would get all that.
I would get all these squares from the matrix A transpose times A. But, sorry, I've lost my thread here. Oh, and then I take the trace, of course. The first row of A transpose (the first column of A) times the first column of A gives me one set of squares: the squared length of that column. And the next row times the next column gives me the next set of squares, right? So I just want to look at the diagonal here, which means I'm looking at the trace. You remember, the trace of a matrix M is the sum down the diagonal, M11 + M22 + ... + Mnn. It's the diagonal sum. Everybody with me here now? So the diagonal of A transpose A gives me all of that. Or maybe I should be doing AA transpose. The point is, it doesn't matter: the trace of A transpose A and the trace of AA transpose both give the correct Frobenius norm squared. So traces are going to come into this little problem.
Now there's another formula for the Frobenius norm, even shorter (certainly shorter than the entry-by-entry sum), involving a sum of squares. What's that one? What's the other way to get the same answer? Look at the SVD, at the singular values: the Frobenius norm squared is also equal to the sum of the squares of all the singular values. So that's three nice expressions for the Frobenius norm. The nice ones involve A transpose A or AA transpose. And of course, that connects to the singular values, because what's the connection between the singular values and A transpose A or AA transpose? The singular values squared are the...

AUDIENCE: Eigenvalues.

GILBERT STRANG: Eigenvalues of A transpose A. And then when I add them up in the trace, I'm adding up the eigenvalues, and that gives me the Frobenius norm squared.
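Here is a quick numerical check that the three expressions agree. It uses NumPy, and the random matrix, its shape, and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))  # arbitrary rectangular test matrix

entrywise = np.sum(A**2)                 # a11^2 + a12^2 + ... over all entries
trace_ata = np.trace(A.T @ A)            # trace of A^T A
trace_aat = np.trace(A @ A.T)            # trace of A A^T
sigma_sq = np.sum(np.linalg.svd(A, compute_uv=False) ** 2)  # sum of sigma_i^2

print(np.allclose([trace_ata, trace_aat, sigma_sq], entrywise))  # True
print(np.isclose(np.linalg.norm(A, "fro") ** 2, entrywise))      # True
```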
So this tells us something important, which we can see in different ways. To solve this problem, we're going to need various facts. One of them is that QA in the Frobenius norm is the same as A in the Frobenius norm. Why is that? Why? Here I'm multiplying every column by the matrix Q. What happens to the length of a column when I multiply it by Q?

AUDIENCE: It doesn't change.

GILBERT STRANG: Doesn't change. So I could add up the squared lengths of all the columns. Here I wrote it in terms of rows, but I could have reordered that and got it in terms of columns. That's because the squared length of Q times any vector is the same as the squared length of the vector. Take these vectors to be the columns of A. So, column by column, multiplication by Q doesn't change the length. And then when I add up all the squared column lengths, I get the Frobenius norm squared.
And here's another way to say it. Let's make the connection between this fact, that Q didn't change the Frobenius norm, and the other fact, that the Frobenius norm is expressed in terms of the sigmas. So what does Q do to the sigmas? I want to see the answer to "why" in another way. If I have a matrix A with singular values and I multiply by Q, what happens to the singular values?

AUDIENCE: Don't change.

GILBERT STRANG: Don't change. Don't change. That's the key point about singular values. A has an SVD, U sigma V transpose. And QA will have the SVD (QU) sigma V transpose. So when I multiply by Q, all I change is the first orthogonal factor in the SVD. I didn't change the sigmas. They're still sitting there. And of course, I could also multiply A on the other side, by the same Q or a different Q. That would show up in the V transpose factor, would not change the sigmas, and therefore would not change the Frobenius norm. So these are important properties of the Frobenius norm.
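Here is a numerical sketch of the two facts just discussed. It uses NumPy, with random orthogonal matrices taken from a QR factorization, which is simply a convenient test choice. Multiplying A on the left by an orthogonal matrix keeps every column length. An orthogonal factor on either side keeps the singular values, and with them the Frobenius norm.

```python
import numpy as np

def singular_values(M):
    return np.linalg.svd(M, compute_uv=False)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
Q1, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # random 4x4 orthogonal matrix
Q2, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random 3x3 orthogonal matrix

# Column by column: Q1 does not change the length of any column of A
print(np.allclose(np.linalg.norm(Q1 @ A, axis=0), np.linalg.norm(A, axis=0)))  # True
# Orthogonal factors on either side leave the singular values unchanged
print(np.allclose(singular_values(Q1 @ A @ Q2), singular_values(A)))           # True
# ...and therefore leave the Frobenius norm unchanged
print(np.isclose(np.linalg.norm(Q1 @ A @ Q2, "fro"), np.linalg.norm(A, "fro")))  # True
```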
It looks messy to write down entry by entry, but it's much nicer in the trace forms and in the singular value form. OK. And we saw that it involves traces. So let me make a few observations about traces. We want to be able to play with traces, and that's something we really haven't done. Here's a fact: the trace of A transpose B is equal to the trace of B transpose A. Of course, if B is A, it's clear. And it's also equal to the trace of B A transpose. So you can make little changes in your matrix without changing the trace. Let's see why one of these is true. Why is that first statement true? How is that matrix related to this matrix?

AUDIENCE: [INAUDIBLE] transpose.

GILBERT STRANG: It's just a transpose. If I take the transpose of that matrix, I get this one. So what happens to the trace? I'm adding down the diagonal, and the transpose has no effect. So this is just the fact that the trace is not changed when you transpose a matrix, because the diagonal is not changed. Now what about this guy?
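Take A and B to be the same shape, m by n. Written in entries, all three traces reduce to the same double sum, which shows both identities at once:

```latex
\operatorname{tr}(A^T B) = \sum_{j=1}^{n} \sum_{i=1}^{m} a_{ij}\, b_{ij}
= \operatorname{tr}(B^T A)
= \operatorname{tr}(B A^T)
```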
I guess we're getting back to old-fashioned 18.065, remembering facts about linear algebra, because this is pure linear algebra. So what's this one about? It says that I can reverse the order of two matrices. I'm now looking at the connection between those two. To use different letters: the trace of CD equals the trace of DC. I can flip the order. That's all I've done here: I've reversed B with A transpose, just as I reversed C with D. So why is that true? Why is that true? Well, how shall we see the truth of that fact? These are really convenient facts, and they make a lot of people use the trace more often than we have in 18.065. I'm not a big user of arguments based on the trace, but these are identities that go a long way in many problems. So let's see why that's true. Any time you think about the trace, you've got two languages to use. You can use the eigenvalues, because the trace is the sum of the eigenvalues. Or you can use the diagonal entries, because it's the sum of the diagonal entries. Let's use eigenvalues.
How are the eigenvalues of CD related to the eigenvalues of DC? They're the same. If these matrices are rectangular, then one product might have some extra zero eigenvalues, because CD and DC have different shapes. But zeros are not going to affect the trace. So CD and DC have the same nonzero eigenvalues. OK. And so on. Yeah. OK.

Let me try to tell you the steps now to get the correct Q. And let me tell you the answer first. I'm realizing that, on that all-important question four (does deep learning actually work?), we're going to run out of time today, because we only have a few minutes left. I suggest we bring that question back up, because it's pretty important to a lot of people. I had lunch with Professor Edelman, and he said, you know, deep learning and neural nets have had a record amount of publicity and hype for a computational algorithm.
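The other language, diagonal entries, gives the identity directly: tr(CD) = Σ_i Σ_j c_ij d_ji = tr(DC). The NumPy sketch below checks this for rectangular C (2 by 4) and D (4 by 2), which are arbitrary random choices. It also checks the stronger eigenvalue statement through characteristic polynomials: det(tI - DC) = t^2 det(tI - CD). So DC has exactly the eigenvalues of CD plus two extra zeros.

```python
import numpy as np

rng = np.random.default_rng(2)
C = rng.standard_normal((2, 4))
D = rng.standard_normal((4, 2))

CD = C @ D  # 2 x 2
DC = D @ C  # 4 x 4, rank at most 2

# Same trace, even though the two products have different shapes
print(np.isclose(np.trace(CD), np.trace(DC)))  # True

# Characteristic polynomials (coefficients, highest power first):
# multiplying by t^2 appends two zero coefficients at the low end
char_cd = np.poly(CD)
char_dc = np.poly(DC)
print(np.allclose(char_dc, np.concatenate([char_cd, [0.0, 0.0]])))  # True
```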
But I've had people now tell me that if you create a network (using Alex's design, for example), the chances are that a typical first attempt won't be successful. The successful networks have been worked on and experimented with. A good structure has emerged, but it wasn't there at the start. So I think that's a topic for Monday. I'm really just realizing, from talking to people in the field, that it's by no means automatic. The structure may not be what you want, even if you put in a whole bunch of layers. OK.

So let me finish this argument today. Let me give you the answer. So what's the good Q? I have matrices Y and X. The idea is that I look at Y transpose X. That will be all the dot products of one set of vectors with the other set of vectors. That's a matrix, and I take its SVD: U sigma V transpose. So multiply Y transpose times X, the two bases that you're given.
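Here is why Y^T X is the matrix to look at. This is the standard first step of the argument, sketched here because the lecture defers the proof. Write the squared Frobenius norm as a trace. Use ||YQ||_F = ||Y||_F for orthogonal Q, and use tr(X^T Y Q) = tr(Q^T Y^T X), since a matrix and its transpose have the same trace. The objective then becomes the expression below. So minimizing over orthogonal Q is the same as maximizing tr(Q^T Y^T X).

```latex
\|YQ - X\|_F^2
= \operatorname{tr}\big((YQ - X)^T (YQ - X)\big)
= \|YQ\|_F^2 + \|X\|_F^2 - 2\,\operatorname{tr}(Q^T Y^T X)
= \|Y\|_F^2 + \|X\|_F^2 - 2\,\operatorname{tr}(Q^T Y^T X).
```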
Of course, if Y was the same as X, and it was an orthonormal basis, you'd have the identity, no questions. But in general, Y transpose X has an SVD. And we're looking for an orthogonal matrix. The best Q is... Da dun da duh. I mean, it's the right time for expressions of amazement. It is U V transpose. OK.

So that gives us the answer. We're given X and Y. We're looking for the best Q. And the answer comes in the simplest possible way: compute Y transpose X, compute its SVD, and use the orthogonal matrices from the SVD. Yeah. And I'm out of time, so the proof: it's [INAUDIBLE] line later. Either I'll just send you the section online, or we'll discuss it in class Monday. But I'm really planning to start Monday with question 4. Meanwhile, I'll ask a whole lot of people, everybody I can find, about that important question: does deep learning usually work? What can you do to make sure it works, or to give yourself a better chance to have it work? So that's up for Monday then.
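The recipe fits in a few lines of NumPy. This sketch assumes the points are stored as the rows of Y and X, so that Y @ Q applies Q to every point at once. The name `procrustes_q` and the synthetic test data (a random orthogonal matrix plus small noise) are illustrative assumptions.

```python
import numpy as np

def procrustes_q(Y, X):
    """Orthogonal Q minimizing ||Y Q - X||_F: SVD Y^T X = U S V^T, then Q = U V^T."""
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

rng = np.random.default_rng(3)
Y = rng.standard_normal((10, 3))                       # 10 points in R^3, one per row
Q_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # a random orthogonal matrix
X = Y @ Q_true + 0.01 * rng.standard_normal((10, 3))   # rotated copy plus small errors

Q = procrustes_q(Y, X)
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: Q is orthogonal
# Q fits at least as well as the orthogonal matrix that generated the data
print(np.linalg.norm(Y @ Q - X) <= np.linalg.norm(Y @ Q_true - X))  # True
```

This version allows reflections (det Q = -1), in keeping with "orthogonal" in the lecture. Restricting to rotations would need an extra sign adjustment, which is not covered here. SciPy provides a comparable routine, `scipy.linalg.orthogonal_procrustes`. Here is the standard reason U V^T wins, stated as a sketch. Minimizing ||YQ - X||_F^2 is equivalent to maximizing tr(Q^T Y^T X) = tr(Z Σ), where Z = V^T Q^T U is orthogonal. Every |z_ii| ≤ 1, so tr(Z Σ) ≤ σ1 + ... + σn, with equality at Z = I, that is, Q = U V^T.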
459 00:29:15,560 --> 00:29:17,110 Good.