1 00:00:01,550 --> 00:00:03,920 The following content is provided under a Creative 2 00:00:03,920 --> 00:00:05,310 Commons license. 3 00:00:05,310 --> 00:00:07,520 Your support will help MIT OpenCourseWare 4 00:00:07,520 --> 00:00:11,610 continue to offer high quality educational resources for free. 5 00:00:11,610 --> 00:00:14,180 To make a donation or to view additional materials 6 00:00:14,180 --> 00:00:18,140 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:18,140 --> 00:00:19,026 at ocw.mit.edu. 8 00:00:24,420 --> 00:00:26,490 PROFESSOR: Well, OK. 9 00:00:26,490 --> 00:00:30,330 So first important things about the course, 10 00:00:30,330 --> 00:00:31,890 plans for the course. 11 00:00:31,890 --> 00:00:38,970 And then today I'm going to move to the next section 12 00:00:38,970 --> 00:00:44,670 of the notes, section 2, or part 2, I should say. 13 00:00:44,670 --> 00:00:49,260 And actually I'll skip for the moment section 2-1 14 00:00:49,260 --> 00:00:57,180 and go to section 2-2, and all of chapter 2 15 00:00:57,180 --> 00:01:03,840 will come to you probably today or latest tomorrow. 16 00:01:03,840 --> 00:01:06,480 So that's where we're going next. 17 00:01:06,480 --> 00:01:09,060 I'm following the notes pretty carefully, 18 00:01:09,060 --> 00:01:15,180 except I'm going to skip the section on tensors until I 19 00:01:15,180 --> 00:01:18,640 learn more basically. 20 00:01:18,640 --> 00:01:19,140 Yeah. 21 00:01:19,140 --> 00:01:19,650 Yeah. 22 00:01:19,650 --> 00:01:21,510 I could say a little about tensors, 23 00:01:21,510 --> 00:01:29,190 but this flows naturally using the SVD. 24 00:01:29,190 --> 00:01:34,620 So it's just a terribly important problem, 25 00:01:34,620 --> 00:01:35,400 least squares. 26 00:01:35,400 --> 00:01:38,910 And of course, I know that you've seen one or two 27 00:01:38,910 --> 00:01:40,440 ways to do least squares. 28 00:01:40,440 --> 00:01:44,310 And really the whole subject comes together.
29 00:01:44,310 --> 00:01:48,750 Here I want to say something, before I send out 30 00:01:48,750 --> 00:01:52,620 a plan for looking ahead for the course as a whole. 31 00:01:55,750 --> 00:01:57,220 So there's no final exam. 32 00:01:57,220 --> 00:02:01,810 And I don't really see how to examine you, how to give tests. 33 00:02:01,810 --> 00:02:06,220 I could, of course, create tests 34 00:02:06,220 --> 00:02:08,229 about the linear algebra part. 35 00:02:08,229 --> 00:02:11,730 But I don't think it's-- 36 00:02:11,730 --> 00:02:14,440 it's not sort of the style of this course 37 00:02:14,440 --> 00:02:21,010 to expect you quickly to create a proof for something in class. 38 00:02:21,010 --> 00:02:23,530 So I think, and especially looking 39 00:02:23,530 --> 00:02:28,360 at what we're headed for, and moving quite steadily 40 00:02:28,360 --> 00:02:34,750 in that direction, is all the problems 41 00:02:34,750 --> 00:02:39,280 that this linear algebra is aimed at, right up to 42 00:02:39,280 --> 00:02:46,990 and including conjugate gradient descent and deep learning, 43 00:02:46,990 --> 00:02:58,270 the overwhelmingly important and lively, active research area. 44 00:02:58,270 --> 00:03:02,140 I couldn't do better than to keep the course going 45 00:03:02,140 --> 00:03:03,500 in that direction. 46 00:03:03,500 --> 00:03:07,060 So I think what I would ask you to do 47 00:03:07,060 --> 00:03:17,380 is late in sort of April, May, the regular homeworks I'll 48 00:03:17,380 --> 00:03:19,900 discontinue at a certain point. 49 00:03:19,900 --> 00:03:27,220 And then instead, I'll be asking and encouraging a project-- 50 00:03:27,220 --> 00:03:30,700 I don't know if that's the right word to be using-- 51 00:03:30,700 --> 00:03:36,150 in which you use what we've done.
52 00:03:36,150 --> 00:03:39,070 And I'll send out a message on Stellar 53 00:03:39,070 --> 00:03:42,820 listing five or six areas and only-- 54 00:03:42,820 --> 00:03:46,090 I mean, one of them is the machine learning, deep 55 00:03:46,090 --> 00:03:46,850 learning part. 56 00:03:46,850 --> 00:03:50,410 But there are all the other parts, things 57 00:03:50,410 --> 00:03:53,050 we are learning how to do. 58 00:03:53,050 --> 00:03:57,010 How to find sparse solutions, for example, 59 00:03:57,010 --> 00:03:59,400 or something about the pseudo inverse. 60 00:03:59,400 --> 00:04:00,700 All kinds of things. 61 00:04:00,700 --> 00:04:06,310 So that's my goal, is to give you 62 00:04:06,310 --> 00:04:11,800 something to do which uses the material that you've learned. 63 00:04:11,800 --> 00:04:15,010 And look, I'm not expecting a thesis. 64 00:04:15,010 --> 00:04:19,040 But it's a good chance. 65 00:04:19,040 --> 00:04:22,880 So it will be more than just, drag 66 00:04:22,880 --> 00:04:29,120 in some code for deep learning and some data matrix and do it. 67 00:04:29,120 --> 00:04:32,910 But we'll talk more as the time comes. 68 00:04:32,910 --> 00:04:35,420 So I just thought I'd say, before sending out 69 00:04:35,420 --> 00:04:36,920 the announcement, I would say it's 70 00:04:36,920 --> 00:04:47,180 coming, on a larger scale than the single one 71 00:04:47,180 --> 00:04:51,090 week homeworks have been heretofore. 72 00:04:51,090 --> 00:04:52,400 Any thoughts about that? 73 00:04:52,400 --> 00:04:56,310 I haven't given you details. 74 00:04:56,310 --> 00:05:01,520 So let me do that with a message, and then ask again. 75 00:05:01,520 --> 00:05:02,930 But I'm open to-- 76 00:05:02,930 --> 00:05:04,640 I hope you've understood-- 77 00:05:04,640 --> 00:05:08,270 I think you have-- that if you make suggestions, 78 00:05:08,270 --> 00:05:13,850 either directly to my email or on Piazza or whatever, 79 00:05:13,850 --> 00:05:15,890 they get paid attention to.
80 00:05:15,890 --> 00:05:17,450 OK. 81 00:05:17,450 --> 00:05:21,220 Shall I just go forward with least squares? 82 00:05:21,220 --> 00:05:22,900 So what's the least squares problem, 83 00:05:22,900 --> 00:05:28,470 and what are these four ways, each bringing-- 84 00:05:28,470 --> 00:05:31,150 so let me speak about the pseudo inverse first. 85 00:05:31,150 --> 00:05:33,770 OK, the pseudo inverse of a matrix. 86 00:05:33,770 --> 00:05:34,270 All right. 87 00:05:34,270 --> 00:05:34,770 Good. 88 00:05:39,620 --> 00:05:44,510 So we have a matrix A, m by n. 89 00:05:44,510 --> 00:05:48,680 And the pseudo inverse I'm going to call A plus. 90 00:05:48,680 --> 00:05:52,630 And it naturally is going to be n by m. 91 00:05:52,630 --> 00:05:55,370 I'm going to multiply those together. 92 00:05:55,370 --> 00:05:59,570 And I'm going to get as near to the identity as I can. 93 00:05:59,570 --> 00:06:01,970 That's the idea, of course, of the pseudo inverse. 94 00:06:01,970 --> 00:06:06,330 The word pseudo is in there, so no one's deceived. 95 00:06:06,330 --> 00:06:08,330 It's not an actual inverse. 96 00:06:08,330 --> 00:06:14,210 Oh, if the matrix is square and has an inverse, of course. 97 00:06:14,210 --> 00:06:22,130 Then if A inverse exists, which requires-- 98 00:06:22,130 --> 00:06:24,620 everybody remembers it requires the matrix 99 00:06:24,620 --> 00:06:29,360 to be square, because I mean inverse on both sides. 100 00:06:29,360 --> 00:06:34,910 And it requires rank n, full rank. 101 00:06:34,910 --> 00:06:36,710 Then the inverse will exist. 102 00:06:36,710 --> 00:06:38,000 You can check it. 103 00:06:38,000 --> 00:06:40,460 MATLAB would check it by computing 104 00:06:40,460 --> 00:06:44,540 the pivots in elimination and finding n pivots. 105 00:06:44,540 --> 00:06:49,130 So if A inverse exists, which means 106 00:06:49,130 --> 00:06:55,280 A times A inverse, and A inverse times A, both give I, 107 00:06:55,280 --> 00:07:01,640 then A plus is A inverse, of course.
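The claim that A plus equals A inverse whenever the inverse exists can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the 3 by 3 matrix is a made-up example (not from the lecture), chosen only because its determinant is nonzero.

```python
import numpy as np

# Hypothetical invertible 3 by 3 matrix, chosen for illustration (det = 18).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

A_inv = np.linalg.inv(A)    # the ordinary two-sided inverse
A_plus = np.linalg.pinv(A)  # the pseudo inverse, computed from the SVD

# For a square, full-rank matrix the two should agree up to rounding.
same = np.allclose(A_plus, A_inv)
left_ok = np.allclose(A_inv @ A, np.eye(3))
right_ok = np.allclose(A @ A_inv, np.eye(3))
print(same, left_ok, right_ok)
```

The comparison uses `np.allclose` rather than exact equality because the two routines take different numerical paths to the same matrix.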
108 00:07:04,290 --> 00:07:07,810 The pseudo inverse is the inverse when there is one. 109 00:07:07,810 --> 00:07:10,600 But I'm thinking about cases where 110 00:07:10,600 --> 00:07:13,920 either the matrix is rectangular, 111 00:07:13,920 --> 00:07:19,550 or it has zero eigenvalues. 112 00:07:19,550 --> 00:07:24,090 It could be square, but it has a null space, other than just 113 00:07:24,090 --> 00:07:25,840 the 0 vector. 114 00:07:25,840 --> 00:07:28,800 In other words, the columns are dependent. 115 00:07:28,800 --> 00:07:32,370 What can we do then about inverting it? 116 00:07:32,370 --> 00:07:34,350 We can't literally invert it. 117 00:07:34,350 --> 00:07:38,520 If A has a null space, then when I 118 00:07:38,520 --> 00:07:44,910 multiply by a vector x in that null space, Ax will be 0. 119 00:07:44,910 --> 00:07:48,720 And when I multiply by A inverse, still 0. 120 00:07:48,720 --> 00:07:50,850 That can't change the 0. 121 00:07:50,850 --> 00:07:56,560 So if there is an x in the null space, then this can't happen. 122 00:07:56,560 --> 00:07:59,000 So we just do the best we can. 123 00:07:59,000 --> 00:08:01,390 And that's what this pseudo inverse is. 124 00:08:01,390 --> 00:08:06,400 And so let me draw a picture of the picture you know of the row 125 00:08:06,400 --> 00:08:11,530 space and the null space. 126 00:08:11,530 --> 00:08:13,120 OK, and it's there, you see. 127 00:08:13,120 --> 00:08:14,950 There is a null space. 128 00:08:14,950 --> 00:08:19,210 And over here I have the column space and the null space 129 00:08:19,210 --> 00:08:20,880 of A transpose. 130 00:08:20,880 --> 00:08:21,850 OK. 131 00:08:21,850 --> 00:08:24,620 So this is the row space, of course. 132 00:08:24,620 --> 00:08:28,180 That's the column space of A transpose, 133 00:08:28,180 --> 00:08:32,030 and there is the column space of A. OK. 
134 00:08:32,030 --> 00:08:35,059 So which part of that picture is invertible, and which part 135 00:08:35,059 --> 00:08:37,250 of the picture is hopeless? 136 00:08:37,250 --> 00:08:39,710 The top part is invertible. 137 00:08:39,710 --> 00:08:45,470 This is the r-dimensional row space, r-dimensional column 138 00:08:45,470 --> 00:08:46,040 space. 139 00:08:46,040 --> 00:08:50,450 A takes a vector in here, zaps it into every-- 140 00:08:50,450 --> 00:08:54,710 you always end up in the column space. 141 00:08:54,710 --> 00:08:57,410 Here I take a vector in the row space-- 142 00:08:57,410 --> 00:09:02,210 say, x-- and it gets mapped to Ax. 143 00:09:02,210 --> 00:09:10,080 And between those two spaces, A is entirely invertible. 144 00:09:10,080 --> 00:09:12,260 You get separate vectors here, go 145 00:09:12,260 --> 00:09:15,230 to separate vectors in the column space, 146 00:09:15,230 --> 00:09:19,720 and the inverse just brings it back. 147 00:09:19,720 --> 00:09:23,770 So we know what the pseudo inverse should do. 148 00:09:23,770 --> 00:09:30,060 It will take A will go that way, and A plus, 149 00:09:30,060 --> 00:09:32,160 the pseudo inverse will be just-- 150 00:09:36,270 --> 00:09:39,810 on the top half of the picture, it'll give us A plus. 151 00:09:39,810 --> 00:09:46,800 We'll take Ax back to x in the top half. 152 00:09:46,800 --> 00:09:48,570 Now, what about here? 153 00:09:48,570 --> 00:09:51,210 That's where we have trouble, when we don't have-- 154 00:09:51,210 --> 00:09:53,340 that's what spoils our inverse. 155 00:09:53,340 --> 00:09:59,970 If there is a null space vector, then it goes where? 156 00:09:59,970 --> 00:10:03,940 When you multiply by A, this guy in the null space goes to 0. 157 00:10:07,050 --> 00:10:09,840 Usually along a straighter line than I've drawn. 158 00:10:09,840 --> 00:10:10,590 But it goes there. 159 00:10:10,590 --> 00:10:12,300 It gets to 0. 160 00:10:12,300 --> 00:10:15,780 So you can't raise it from the dead, so to speak. 
161 00:10:15,780 --> 00:10:18,950 You can't recover it when there's no A inverse. 162 00:10:18,950 --> 00:10:24,120 So we have to think, what shall A inverse do to this space 163 00:10:24,120 --> 00:10:26,910 here, where nobody's hitting it? 164 00:10:26,910 --> 00:10:34,830 So this would be the null space of A transpose. 165 00:10:34,830 --> 00:10:40,040 Because A-- sorry-- yeah, what should the pseudo inverse do? 166 00:10:40,040 --> 00:10:41,690 I said what should the inverse do? 167 00:10:41,690 --> 00:10:43,430 The inverse is helpless. 168 00:10:43,430 --> 00:10:46,610 But we have to define A plus. 169 00:10:46,610 --> 00:10:50,630 I've said what it should do on that guy, on the column space. 170 00:10:50,630 --> 00:10:52,520 It should take everything in the column space 171 00:10:52,520 --> 00:10:54,790 back where it came from. 172 00:10:54,790 --> 00:11:00,020 But what should it do on this orthogonal space, where-- 173 00:11:00,020 --> 00:11:03,460 yeah, just tell me, what do you think? 174 00:11:03,460 --> 00:11:06,750 If I have some vector here-- 175 00:11:06,750 --> 00:11:09,580 let's call it V r plus 1. 176 00:11:09,580 --> 00:11:11,410 That would be like-- 177 00:11:11,410 --> 00:11:24,430 so here I have a nice basis for the column space. 178 00:11:24,430 --> 00:11:30,400 I would use V's for the ones that come up in the SVD. 179 00:11:30,400 --> 00:11:34,090 They're orthogonal, and they come from orthogonal U's. 180 00:11:34,090 --> 00:11:36,040 So the top half is great. 181 00:11:36,040 --> 00:11:40,120 What shall I do with this stuff? 182 00:11:40,120 --> 00:11:43,780 I'm going to send that back by A plus. 183 00:11:43,780 --> 00:11:46,720 And what am I going to do with it? 184 00:11:46,720 --> 00:11:49,750 Send it to-- nowhere else could it go. 185 00:11:49,750 --> 00:11:51,430 0 is the right answer. 186 00:11:51,430 --> 00:11:53,470 All this stuff goes back to 0. 187 00:11:56,260 --> 00:12:00,090 I'm looking for a linear operator, a matrix. 
188 00:12:00,090 --> 00:12:02,440 And I have to think, once I've decided 189 00:12:02,440 --> 00:12:05,740 what to do with all those and what to do with all these, 190 00:12:05,740 --> 00:12:08,110 then I know what to do with any combination. 191 00:12:08,110 --> 00:12:09,640 So I've got it. 192 00:12:09,640 --> 00:12:10,540 I've got it. 193 00:12:10,540 --> 00:12:19,050 So the idea will be, this is true for x in the row space. 194 00:12:19,050 --> 00:12:25,735 For x in the row space, if x is in the row space, 195 00:12:25,735 --> 00:12:29,710 Ax is in the column space, and A plus just brings it back 196 00:12:29,710 --> 00:12:30,980 as it should. 197 00:12:30,980 --> 00:12:34,510 And in the case of an invertible matrix 198 00:12:34,510 --> 00:12:37,000 A, what happens to my picture? 199 00:12:37,000 --> 00:12:41,230 What is this picture looking like if A is actually a 6 200 00:12:41,230 --> 00:12:44,020 by 6 invertible matrix? 201 00:12:44,020 --> 00:12:47,200 In that case, what's in my picture 202 00:12:47,200 --> 00:12:50,800 and what is not in my picture? 203 00:12:50,800 --> 00:12:54,070 All this null space stuff isn't there. 204 00:12:54,070 --> 00:12:57,800 The null space is just the 0 vector. 205 00:12:57,800 --> 00:12:59,670 But all that I don't have to worry about. 206 00:12:59,670 --> 00:13:01,980 But in general, I do have to say. 207 00:13:01,980 --> 00:13:08,890 So the point is that A plus on the-- 208 00:13:08,890 --> 00:13:12,330 what am I calling this? 209 00:13:12,330 --> 00:13:14,980 It's the null space of A transpose, 210 00:13:14,980 --> 00:13:25,790 or whatever on V r plus 1 to Vn, all those 211 00:13:25,790 --> 00:13:32,150 vectors, the guys that are orthogonal to the column space. 212 00:13:32,150 --> 00:13:36,540 Then we have to say, what does A plus do to them? 213 00:13:36,540 --> 00:13:38,470 And the answer is, it takes them all to 0.
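The two rules just stated (A plus undoes A on the row space, and A plus sends everything orthogonal to the column space to 0) can be tried on a small singular matrix. This is only a sketch under stated assumptions: the rank-1 matrix and the three test vectors are invented for illustration, and the `rcond=1e-10` cutoff is an arbitrary choice so that rounding noise in the zero singular value is ignored.

```python
import numpy as np

# Hypothetical rank-1 matrix, 3 by 2: both columns are multiples of (1, 2, 2).
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [2.0, 4.0]])
A_plus = np.linalg.pinv(A, rcond=1e-10)  # n by m, here 2 by 3

x_row = np.array([1.0, 2.0])              # lies in the row space of A
x_null = np.array([2.0, -1.0])            # lies in the null space: A @ x_null = 0
y_left_null = np.array([2.0, -1.0, 0.0])  # orthogonal to the column space: A.T @ y = 0

back_to_x = A_plus @ (A @ x_row)    # top half of the picture: should return x_row
sent_to_zero = A_plus @ y_left_null # bottom half: should be the zero vector
lost = A @ x_null                   # A itself kills the null space vector

print(back_to_x, sent_to_zero, lost)
```

Any other vector in the input space is a combination of a row space part and a null space part, so linearity fixes what A plus times A does to it: it keeps the row space part and drops the rest.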
214 00:13:41,020 --> 00:13:45,580 So there is a picture using what I 215 00:13:45,580 --> 00:13:48,850 call the big picture of linear algebra, the four spaces. 216 00:13:48,850 --> 00:13:51,520 You see what A plus should do. 217 00:13:51,520 --> 00:13:55,610 Now, I need a little formula for it. 218 00:13:55,610 --> 00:13:58,970 I've got the plan for what it should be, 219 00:13:58,970 --> 00:14:01,310 and it's sort of the natural thing. 220 00:14:01,310 --> 00:14:06,470 So A plus A is, you could say it's a projection matrix. 221 00:14:06,470 --> 00:14:11,420 It's not the identity matrix because if x 222 00:14:11,420 --> 00:14:17,750 is in the null space, A plus A will take it to 0. 223 00:14:17,750 --> 00:14:18,830 So it's a projection. 224 00:14:18,830 --> 00:14:22,130 A plus A is the identity on the top half, 225 00:14:22,130 --> 00:14:23,840 and 0 on the bottom half. 226 00:14:23,840 --> 00:14:26,780 That's really what the matrix is. 227 00:14:26,780 --> 00:14:34,240 And now, I want a simple formula for it. 228 00:14:34,240 --> 00:14:37,700 And I guess my message here is, that if we're 229 00:14:37,700 --> 00:14:41,750 looking for a nice expression, start with the SVD. 230 00:14:41,750 --> 00:14:46,070 Because the SVD works for any matrix. 231 00:14:46,070 --> 00:14:49,160 And it writes it as an orthogonal matrix 232 00:14:49,160 --> 00:14:53,920 times a diagonal matrix times an orthogonal matrix. 233 00:14:53,920 --> 00:14:56,560 And now I want to invert it. 234 00:14:56,560 --> 00:15:00,460 Well, suppose A had an inverse. 235 00:15:00,460 --> 00:15:01,360 What would that be? 236 00:15:04,600 --> 00:15:13,870 This is if invertible, what would be the SVD of A inverse? 237 00:15:13,870 --> 00:15:18,080 What would be the singular value decomposition, if this is good? 238 00:15:18,080 --> 00:15:20,410 So when is this going to be good? 
239 00:15:20,410 --> 00:15:24,380 What would I have to know about that matrix sigma, 240 00:15:24,380 --> 00:15:26,680 that diagonal matrix in the middle, 241 00:15:26,680 --> 00:15:32,100 if this is truly an invertible matrix? 242 00:15:32,100 --> 00:15:32,600 Well, no. 243 00:15:32,600 --> 00:15:33,590 What's its name? 244 00:15:33,590 --> 00:15:35,270 Those are not eigenvalues. 245 00:15:35,270 --> 00:15:38,900 Well, their squares are eigenvalues of A transpose A. 246 00:15:38,900 --> 00:15:40,280 But they're singular values. 247 00:15:40,280 --> 00:15:41,850 Singular value, that's fine. 248 00:15:41,850 --> 00:15:44,300 So that's the singular value matrix. 249 00:15:44,300 --> 00:15:51,250 And what would be the situation if A had an inverse? 250 00:15:51,250 --> 00:15:53,050 There would be no 0's. 251 00:15:53,050 --> 00:15:54,880 All the singular values would be sitting 252 00:15:54,880 --> 00:15:57,220 there, sigma 1 to sigma n. 253 00:15:57,220 --> 00:15:59,650 What would be the shape of this sigma matrix? 254 00:15:59,650 --> 00:16:05,470 If I have an inverse, then it's got to be square n by n. 255 00:16:05,470 --> 00:16:09,280 So what's the shape of the sigma guy? 256 00:16:09,280 --> 00:16:12,220 Also square, n by n. 257 00:16:12,220 --> 00:16:16,010 So the invertible case would be-- 258 00:16:16,010 --> 00:16:18,110 and I'm going to erase this in a minute-- 259 00:16:18,110 --> 00:16:23,426 the invertible case would be when sigma is just that. 260 00:16:23,426 --> 00:16:25,800 That would be the invertible case. 261 00:16:25,800 --> 00:16:28,830 So let's see. 262 00:16:28,830 --> 00:16:30,660 Can you finish this formula? 263 00:16:30,660 --> 00:16:35,220 What would be the SVD of A inverse? 264 00:16:35,220 --> 00:16:38,370 So I'm given the SVD of A. I'm given the U 265 00:16:38,370 --> 00:16:43,080 and the sigma and the V transpose. 266 00:16:43,080 --> 00:16:44,640 What's the inverse of that?
267 00:16:44,640 --> 00:16:46,190 Yeah, I'm just really asking what's 268 00:16:46,190 --> 00:16:51,130 the inverse of that product of three matrices. 269 00:16:51,130 --> 00:16:52,930 What comes first here? 270 00:16:52,930 --> 00:16:56,230 V. The inverse of V transpose is V. 271 00:16:56,230 --> 00:17:00,380 That's because V is an orthogonal matrix. 272 00:17:00,380 --> 00:17:02,830 The inverse of sigma, just 1 over it, 273 00:17:02,830 --> 00:17:04,720 is just the sigma inverse. 274 00:17:04,720 --> 00:17:06,609 It's obvious what that means. 275 00:17:06,609 --> 00:17:09,020 And the inverse of U would go here. 276 00:17:09,020 --> 00:17:11,230 And that is U transpose. 277 00:17:11,230 --> 00:17:12,589 Great. 278 00:17:12,589 --> 00:17:13,089 OK. 279 00:17:13,089 --> 00:17:16,140 So this is if invertible. 280 00:17:16,140 --> 00:17:21,609 If invertible, we know what the SVD of A inverse is. 281 00:17:21,609 --> 00:17:28,560 It just takes the V's back to the U's, or the U's back 282 00:17:28,560 --> 00:17:29,830 to the V's, whichever. 283 00:17:29,830 --> 00:17:30,540 OK. 284 00:17:30,540 --> 00:17:31,170 OK. 285 00:17:31,170 --> 00:17:37,970 Now we've got to do it, if we're going to allow-- 286 00:17:37,970 --> 00:17:42,330 if we're going to get beyond this limit, this situation, 287 00:17:42,330 --> 00:17:46,860 allow the matrix sigma to be rectangular. 288 00:17:46,860 --> 00:17:51,750 Then let me just show you the idea here. 289 00:17:51,750 --> 00:17:57,510 So now I'm going to say, now sigma, in general, 290 00:17:57,510 --> 00:17:59,190 it's rectangular. 291 00:17:59,190 --> 00:18:04,460 It's got r nonzeros on the diagonal, but then it quits. 292 00:18:04,460 --> 00:18:09,610 So it's got a bunch of 0's that make it not invertible. 293 00:18:09,610 --> 00:18:14,020 But let's do our best and pseudo invert it. 294 00:18:14,020 --> 00:18:15,130 OK.
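Written out, the invertible case discussed above reads as follows. This is a sketch in standard notation, assuming A is n by n with every singular value positive; U, Σ, and V are the SVD factors named in the lecture, and the last line simply multiplies the factors back together as a check.

```latex
\begin{aligned}
A &= U \Sigma V^{T}, \qquad \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_n), \quad \sigma_i > 0, \\
A^{-1} &= (V^{T})^{-1}\, \Sigma^{-1}\, U^{-1} = V \Sigma^{-1} U^{T}, \qquad \Sigma^{-1} = \operatorname{diag}(1/\sigma_1, \dots, 1/\sigma_n), \\
A^{-1} A &= V \Sigma^{-1} U^{T} U \Sigma V^{T} = V \Sigma^{-1} \Sigma V^{T} = V V^{T} = I .
\end{aligned}
```

The order reverses because the inverse of a product is the product of the inverses in reverse order, and the orthogonal factors invert by transposing.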
295 00:18:15,130 --> 00:18:20,650 So now help me get started on a formula for using-- 296 00:18:20,650 --> 00:18:24,970 I want to write this A plus, which I described up there, 297 00:18:24,970 --> 00:18:27,880 in terms of the subspaces. 298 00:18:27,880 --> 00:18:33,700 Now I'm going to describe A plus in terms of U, sigma, and V, 299 00:18:33,700 --> 00:18:35,350 the SVD guys. 300 00:18:35,350 --> 00:18:36,010 OK. 301 00:18:36,010 --> 00:18:39,520 So what shall I start with here? 302 00:18:39,520 --> 00:18:41,410 Well, let me give a hint. 303 00:18:41,410 --> 00:18:43,360 That was a great start. 304 00:18:43,360 --> 00:18:47,620 My V is still an orthogonal matrix. 305 00:18:47,620 --> 00:18:50,170 V transpose is still an orthogonal matrix. 306 00:18:50,170 --> 00:18:52,780 I'll invert it. 307 00:18:52,780 --> 00:18:57,280 At the end, the U was no problem. 308 00:18:57,280 --> 00:19:00,170 All the problems are in sigma. 309 00:19:00,170 --> 00:19:04,640 And sigma, remember, sigma-- 310 00:19:04,640 --> 00:19:06,320 so it's rectangular. 311 00:19:06,320 --> 00:19:10,070 Maybe I'll make it wide, two wide. 312 00:19:10,070 --> 00:19:13,970 And maybe I'll only give it two non-zeros, and then 313 00:19:13,970 --> 00:19:16,260 all the rest. 314 00:19:16,260 --> 00:19:22,400 So the rank of my matrix A is 2, but the m and n 315 00:19:22,400 --> 00:19:23,990 are bigger than 2. 316 00:19:23,990 --> 00:19:26,660 It's just got two independent columns, 317 00:19:26,660 --> 00:19:30,980 and then it's just sort of totally singular. 318 00:19:30,980 --> 00:19:31,550 OK. 319 00:19:31,550 --> 00:19:35,630 So my question is, what am I going to put there? 320 00:19:35,630 --> 00:19:37,850 And I've described it one way, but now I'm 321 00:19:37,850 --> 00:19:39,500 going to describe it another way. 322 00:19:39,500 --> 00:19:42,940 Well, let me just say, what I'll put there 323 00:19:42,940 --> 00:19:46,390 is the pseudo inverse of sigma. 
324 00:19:46,390 --> 00:19:50,680 I can't put sigma inverse using that symbol, 325 00:19:50,680 --> 00:19:53,080 because there is no such thing. 326 00:19:53,080 --> 00:19:55,220 With this, I can't invert it. 327 00:19:55,220 --> 00:19:59,390 So that's the best I can do. 328 00:19:59,390 --> 00:20:03,330 So I'm almost done, but to finish, I have to tell you, 329 00:20:03,330 --> 00:20:06,880 what is this thing? 330 00:20:06,880 --> 00:20:09,250 So sigma plus. 331 00:20:09,250 --> 00:20:11,860 I'm now going to tell you sigma plus. 332 00:20:11,860 --> 00:20:15,490 And then that's what should sit there in the middle. 333 00:20:15,490 --> 00:20:18,400 So if sigma is this diagonal matrix 334 00:20:18,400 --> 00:20:22,690 which quits after two sigmas, what should sigma plus be? 335 00:20:22,690 --> 00:20:28,180 Well, first of all, it should be rectangular the other way. 336 00:20:28,180 --> 00:20:33,130 If this was m by n, n columns and m rows, 337 00:20:33,130 --> 00:20:39,070 now I want to have n rows and m columns. 338 00:20:39,070 --> 00:20:42,750 And yeah, here's the question. 339 00:20:42,750 --> 00:20:44,760 What's the best inverse you could come up 340 00:20:44,760 --> 00:20:47,270 with for that sigma? 341 00:20:47,270 --> 00:20:51,230 I mean, if somebody independent of 18.065, 342 00:20:51,230 --> 00:20:55,850 if somebody asks you, do your best to invert that matrix, 343 00:20:55,850 --> 00:21:00,320 I think we'd all agree it is, yeah. 344 00:21:00,320 --> 00:21:04,580 1 over sigma 1 would come there. 345 00:21:04,580 --> 00:21:08,180 And 1 over sigma 2, the nonzeros. 346 00:21:08,180 --> 00:21:10,800 And then? 347 00:21:10,800 --> 00:21:12,290 Zeros. 348 00:21:12,290 --> 00:21:15,630 Just the way up there, when we didn't know what to do, 349 00:21:15,630 --> 00:21:17,860 when there was nothing good to do. 350 00:21:17,860 --> 00:21:20,400 Zero was the right answer. 351 00:21:20,400 --> 00:21:23,310 So this is all zeros.
352 00:21:23,310 --> 00:21:25,470 Of course, it's rectangular the other way. 353 00:21:25,470 --> 00:21:28,200 But do you see that if I multiply 354 00:21:28,200 --> 00:21:31,260 sigma plus times sigma, if I multiply 355 00:21:31,260 --> 00:21:35,250 the pseudo inverse times the matrix, what do I 356 00:21:35,250 --> 00:21:39,170 get if I multiply that by that? 357 00:21:39,170 --> 00:21:41,440 What does that multiplication produce? 358 00:21:41,440 --> 00:21:42,950 Can you describe the-- 359 00:21:42,950 --> 00:21:46,100 well, or when you tell me what it looks like, 360 00:21:46,100 --> 00:21:47,170 I'll write it down. 361 00:21:47,170 --> 00:21:50,740 So what is sigma plus times sigma? 362 00:21:50,740 --> 00:21:54,250 If sigma is a diagonal, sigma plus is a diagonal, 363 00:21:54,250 --> 00:21:59,510 and they both quit after two guys. 364 00:21:59,510 --> 00:22:01,990 What do I have? 365 00:22:01,990 --> 00:22:03,300 One? 366 00:22:03,300 --> 00:22:07,480 Because sigma 1 times 1 over sigma 1 is a 1. 367 00:22:07,480 --> 00:22:11,260 And the other next guy is a 1. 368 00:22:11,260 --> 00:22:14,320 And the rest are all zeros. 369 00:22:14,320 --> 00:22:16,360 That's right. 370 00:22:16,360 --> 00:22:18,250 That's the best I could do. 371 00:22:18,250 --> 00:22:22,250 The rank was only two, so I couldn't get anywhere. 372 00:22:22,250 --> 00:22:26,020 So that tells you what sigma plus is. 373 00:22:26,020 --> 00:22:27,320 OK. 374 00:22:27,320 --> 00:22:30,210 So I described the pseudo inverse 375 00:22:30,210 --> 00:22:33,020 then with a picture of spaces, and then 376 00:22:33,020 --> 00:22:35,750 with a formula of matrices. 377 00:22:35,750 --> 00:22:39,960 And now I want to use it in least squares. 378 00:22:39,960 --> 00:22:43,820 So now I'm going to say what is the least squares problem. 379 00:22:43,820 --> 00:22:50,450 And the first way to solve it will be to involve-- 380 00:22:50,450 --> 00:22:52,550 A plus will give the solution. 
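The formula described above, A plus equals V times sigma plus times U transpose, can be assembled directly from NumPy's SVD. This is a sketch rather than a definitive implementation: the helper name `pinv_from_svd`, the relative tolerance `tol`, and the 3 by 4 rank-2 example matrix are all assumptions introduced for illustration.

```python
import numpy as np

def pinv_from_svd(A, tol=1e-10):
    """Hypothetical helper: build A_plus = V @ Sigma_plus @ U.T from the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    m, n = A.shape
    # Count singular values that are nonzero relative to the largest one.
    r = int(np.sum(s > tol * s[0])) if s.size and s[0] > 0 else 0
    # Sigma_plus is n by m: 1/sigma_i on the first r diagonal entries, zeros elsewhere.
    Sigma_plus = np.zeros((n, m))
    Sigma_plus[:r, :r] = np.diag(1.0 / s[:r])
    return Vt.T @ Sigma_plus @ U.T, Sigma_plus, r

# Hypothetical 3 by 4 matrix of rank 2: row 3 = row 1 + row 2.
A = np.array([[1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 3.0],
              [1.0, 1.0, 3.0, 4.0]])
A_plus, Sigma_plus, r = pinv_from_svd(A)

# Rebuild the m by n Sigma to look at the product Sigma_plus @ Sigma.
s = np.linalg.svd(A, compute_uv=False)
Sigma = np.zeros(A.shape)
Sigma[:r, :r] = np.diag(s[:r])
print(np.round(Sigma_plus @ Sigma, 10))  # ones in the first r diagonal slots, zeros after
```

The tolerance matters in floating point: a singular value that should be exactly zero usually comes out as a tiny nonzero number, and inverting it would put a huge entry into sigma plus.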
381 00:22:52,550 --> 00:22:53,050 OK. 382 00:22:53,050 --> 00:22:56,420 So what is the least squares problem? 383 00:22:56,420 --> 00:22:58,010 Let me put it here. 384 00:23:05,050 --> 00:23:09,190 OK, the least squares problem is simply, you have an equation, 385 00:23:09,190 --> 00:23:10,440 Ax equals b. 386 00:23:14,190 --> 00:23:16,730 But A is not invertible. 387 00:23:16,730 --> 00:23:19,190 So you can't solve it. 388 00:23:19,190 --> 00:23:21,290 Of course, for which-- 389 00:23:21,290 --> 00:23:24,740 yeah, you could solve it for certain b's. 390 00:23:24,740 --> 00:23:27,380 If b is in the column space of A, 391 00:23:27,380 --> 00:23:30,620 then just by the meaning of column space, 392 00:23:30,620 --> 00:23:32,270 this has a solution. 393 00:23:32,270 --> 00:23:36,440 The vectors in the column space are the guys that you can get. 394 00:23:36,440 --> 00:23:39,540 But the vectors in the orthogonal space you cannot 395 00:23:39,540 --> 00:23:40,040 get. 396 00:23:40,040 --> 00:23:42,230 All the rest of the vectors you cannot get. 397 00:23:42,230 --> 00:23:50,090 So suppose this is like so, but always A is m by n rank r. 398 00:23:54,340 --> 00:24:03,820 And then we get A inverse when m equals n equals r. 399 00:24:03,820 --> 00:24:06,190 That's the invertible case. 400 00:24:06,190 --> 00:24:07,730 OK. 401 00:24:07,730 --> 00:24:12,960 What do we do with a system of equations 402 00:24:12,960 --> 00:24:15,600 when we can't solve it? 403 00:24:15,600 --> 00:24:19,050 This is probably the main application in 18.06. 404 00:24:19,050 --> 00:24:25,560 So you've seen this problem before. 405 00:24:25,560 --> 00:24:29,180 What do we do if Ax equal b has no solution? 406 00:24:29,180 --> 00:24:33,210 So typically, b would be a vector of measurements, 407 00:24:33,210 --> 00:24:39,480 like we're tracking a satellite, and we get some measurements. 408 00:24:39,480 --> 00:24:44,310 But often we get too many measurements. 
409 00:24:44,310 --> 00:24:46,800 And of course, there's a little noise in them. 410 00:24:46,800 --> 00:24:50,790 And a little noise means that we can't solve the equations. 411 00:24:50,790 --> 00:24:54,900 That may be the case everybody knows 412 00:24:54,900 --> 00:25:02,340 is, where this equation is like expressing a straight line 413 00:25:02,340 --> 00:25:03,860 going through the data points. 414 00:25:03,860 --> 00:25:07,170 So the famous example of least squares 415 00:25:07,170 --> 00:25:21,140 is fit a straight line to the b's, to b1, b2. 416 00:25:21,140 --> 00:25:22,985 We've got m measurements. 417 00:25:26,100 --> 00:25:28,450 We've got m measurements. 418 00:25:28,450 --> 00:25:32,170 The physics or the mechanics of the problem 419 00:25:32,170 --> 00:25:34,000 is pretty well linear. 420 00:25:34,000 --> 00:25:36,650 But of course, there's noise. 421 00:25:36,650 --> 00:25:41,480 And a straight line only has two degrees of freedom. 422 00:25:41,480 --> 00:25:44,870 So we're going to have only two columns in our matrix. 423 00:25:44,870 --> 00:25:53,190 A will be only two columns, with many rows. 424 00:25:53,190 --> 00:25:54,900 Highly rectangular. 425 00:25:54,900 --> 00:25:56,490 So fit a straight line. 426 00:25:56,490 --> 00:26:02,400 Let me call that line Cx plus D. Say this is the x direction. 427 00:26:02,400 --> 00:26:05,880 This is the b's direction. 428 00:26:05,880 --> 00:26:08,910 And we've got a whole bunch of data points. 429 00:26:08,910 --> 00:26:10,720 And they're not on a line. 430 00:26:10,720 --> 00:26:11,730 Or they are on the line. 431 00:26:15,360 --> 00:26:18,780 Suppose those did lie on a line. 432 00:26:18,780 --> 00:26:22,610 What would that tell me about Ax equal b? 433 00:26:22,610 --> 00:26:25,540 I haven't said everything I need to, 434 00:26:25,540 --> 00:26:28,960 but maybe the insight is what I'm after here. 
435 00:26:28,960 --> 00:26:32,940 If my points are right on the line-- 436 00:26:32,940 --> 00:26:37,590 so there is a straight line through them-- 437 00:26:37,590 --> 00:26:39,830 the unknowns here-- so let me-- 438 00:26:39,830 --> 00:26:44,930 so Ax-- the unknowns here are C and D. 439 00:26:44,930 --> 00:26:48,845 And the right hand side is all my measurements. 440 00:26:51,820 --> 00:26:54,380 OK. 441 00:26:54,380 --> 00:26:57,920 Suppose-- without my drawing a picture-- 442 00:26:57,920 --> 00:27:01,300 suppose these points are on the line. 443 00:27:01,300 --> 00:27:05,000 Here are the different x's, the measurement times. 444 00:27:05,000 --> 00:27:06,680 Here are the different measurements. 445 00:27:09,250 --> 00:27:10,870 But if they're on a line, what does 446 00:27:10,870 --> 00:27:16,060 that tell me about my linear system, Ax equal b? 447 00:27:16,060 --> 00:27:19,800 It has a solution. 448 00:27:19,800 --> 00:27:22,740 Being on a line means everything's perfect. 449 00:27:22,740 --> 00:27:24,360 There is a solution. 450 00:27:24,360 --> 00:27:27,210 But will there usually be a solution? 451 00:27:27,210 --> 00:27:28,080 Certainly not. 452 00:27:28,080 --> 00:27:35,980 If I have only two parameters, two unknowns, two columns here, 453 00:27:35,980 --> 00:27:38,350 the rank is going to be two. 454 00:27:38,350 --> 00:27:44,440 And here I'm trying to hit any noisy set of measurements. 455 00:27:44,440 --> 00:27:47,800 So of course, in general the picture will look like that. 456 00:27:47,800 --> 00:27:50,500 And I'm going to look for the best C and D. 457 00:27:50,500 --> 00:28:01,840 So I'll call it Cx plus D. Yeah, right. 458 00:28:01,840 --> 00:28:03,250 Sorry. 459 00:28:03,250 --> 00:28:05,760 That's my line. 460 00:28:05,760 --> 00:28:07,570 So those are my equations. 461 00:28:10,220 --> 00:28:14,200 Sorry, I often write it C plus Dx.
462 00:28:14,200 --> 00:28:16,810 Do you mind if I put the constant term 463 00:28:16,810 --> 00:28:21,770 first in the highly difficult equation here 464 00:28:21,770 --> 00:28:22,780 for a straight line? 465 00:28:26,790 --> 00:28:29,340 So let me tell you what I'm-- 466 00:28:29,340 --> 00:28:32,565 so these are the points where you have a measurement-- 467 00:28:32,565 --> 00:28:35,820 x1, x2, up to xm. 468 00:28:35,820 --> 00:28:39,410 And these are the actual measurements, b1 up to bm, 469 00:28:39,410 --> 00:28:40,770 let's say. 470 00:28:40,770 --> 00:28:43,740 And then my equations are-- 471 00:28:43,740 --> 00:28:46,530 I just want to set up a matrix here. 472 00:28:46,530 --> 00:28:49,420 I just want to set up the matrix. 473 00:28:49,420 --> 00:28:55,160 So I want C to get multiplied by ones every time. 474 00:28:55,160 --> 00:29:02,010 And I want D to get multiplied by these x's-- x1, x2, x3, 475 00:29:02,010 --> 00:29:05,790 to xm, the measurement places. 476 00:29:05,790 --> 00:29:07,740 And those are the measurements. 477 00:29:07,740 --> 00:29:10,300 Anyway. 478 00:29:10,300 --> 00:29:13,165 And my problem is, this has no solution. 479 00:29:15,850 --> 00:29:17,770 So what do I do when there's no solution? 480 00:29:20,710 --> 00:29:24,810 Well, I'll do what Gauss did. 481 00:29:24,810 --> 00:29:28,570 He was a good mathematician, so I'll follow his advice. 482 00:29:28,570 --> 00:29:34,020 And I won't do it all semester, as you know. 483 00:29:34,020 --> 00:29:40,330 But Gauss's advice was, minimize-- 484 00:29:40,330 --> 00:29:43,200 I'll blame it on Gauss-- 485 00:29:43,200 --> 00:29:53,550 the distance between Ax and b squared, the L2 norm squared, 486 00:29:53,550 --> 00:30:02,030 which is just (Ax minus b) transpose times (Ax minus b). 487 00:30:02,030 --> 00:30:04,190 It's a quadratic. 488 00:30:04,190 --> 00:30:10,050 And minimizing it gives me a system of linear equations. 489 00:30:10,050 --> 00:30:12,330 So in the end, they will have a solution.
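Annotation (not spoken in the lecture): the matrix set up above can be sketched in NumPy. This is a minimal illustration, assuming five made-up measurement times and made-up noise values; the names `is_solvable` and `loss` are hypothetical helpers, not anything from the course notes. It shows that points exactly on a line give a solvable system, that noisy points do not, and what the quantity Gauss minimizes looks like.

```python
import numpy as np

# Hypothetical measurement times x_1..x_m (illustrative values, not from the lecture).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# Two columns only: ones (multiplying C) and the x's (multiplying D).
A = np.column_stack([np.ones_like(x), x])

# Case 1: measurements exactly on the line b = 1 + 2x.
b_exact = 1.0 + 2.0 * x

# Case 2: the same line with small made-up perturbations standing in for noise.
b_noisy = b_exact + np.array([0.1, -0.2, 0.15, -0.05, 0.1])


def is_solvable(A, b):
    """Ax = b is solvable exactly when appending b does not raise the rank of A."""
    augmented = np.column_stack([A, b])
    return np.linalg.matrix_rank(augmented) == np.linalg.matrix_rank(A)


def loss(A, b, v):
    """Squared L2 norm of the error: (Av - b) transpose times (Av - b)."""
    r = A @ v - b
    return float(r @ r)


solvable_exact = is_solvable(A, b_exact)   # points on a line: a solution exists
solvable_noisy = is_solvable(A, b_noisy)   # noisy points: no exact solution
```

With C = 1 and D = 2 the loss is zero for `b_exact`, while for `b_noisy` no choice of C and D can drive it to zero; least squares picks the pair that makes it smallest.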
490 00:30:12,330 --> 00:30:14,660 So that's the whole point of least squares. 491 00:30:14,660 --> 00:30:20,080 We have an unsolvable problem, not no solution. 492 00:30:20,080 --> 00:30:25,330 We follow Gauss's advice to get the best we can. 493 00:30:25,330 --> 00:30:28,840 And that does produce an answer. 494 00:30:28,840 --> 00:30:34,690 So this is-- if I multiply this out, it's x transpose, 495 00:30:34,690 --> 00:30:36,610 A transpose, Ax. 496 00:30:36,610 --> 00:30:39,830 That comes from the squared term. 497 00:30:39,830 --> 00:30:42,100 And then I have probably these-- 498 00:30:42,100 --> 00:30:48,150 actually, probably I'll get two of those, and then 499 00:30:48,150 --> 00:30:52,240 a constant term that has derivative 0 500 00:30:52,240 --> 00:30:54,100 so it doesn't enter. 501 00:30:54,100 --> 00:30:56,590 So this is what I'm minimizing. 502 00:30:56,590 --> 00:30:59,800 This is the loss function. 503 00:30:59,800 --> 00:31:01,540 And it leads to-- 504 00:31:01,540 --> 00:31:07,390 let's just jump to the key here. 505 00:31:07,390 --> 00:31:11,920 What equation do I get when I look for-- 506 00:31:11,920 --> 00:31:19,160 what equation is solved by the best x, the best x? 507 00:31:19,160 --> 00:31:23,610 The best x solves the famous-- 508 00:31:23,610 --> 00:31:28,860 this is regression in statistics, linear regression. 509 00:31:31,760 --> 00:31:36,240 It's one of the main computations in statistics, 510 00:31:36,240 --> 00:31:38,700 not of course just for straight line fits, 511 00:31:38,700 --> 00:31:42,930 but for any system Ax equal b. 512 00:31:42,930 --> 00:31:45,300 That will lead to-- 513 00:31:45,300 --> 00:31:48,660 this minimum will lead to a system of equations 514 00:31:48,660 --> 00:31:50,760 that I'm going to put a box around, 515 00:31:50,760 --> 00:31:53,340 because it's so fundamental. 516 00:31:53,340 --> 00:31:58,990 And are you willing to tell me what that equation is? 517 00:31:58,990 --> 00:31:59,750 Yes, thanks. 
518 00:31:59,750 --> 00:32:00,750 AUDIENCE: A transpose A. 519 00:32:00,750 --> 00:32:04,240 PROFESSOR: A transpose A is going to come from there-- 520 00:32:04,240 --> 00:32:06,270 you see it-- 521 00:32:06,270 --> 00:32:13,080 times the best x equals A transpose b. 522 00:32:17,980 --> 00:32:20,610 That gives the minimum. 523 00:32:20,610 --> 00:32:23,610 Let me forgo checking that. 524 00:32:23,610 --> 00:32:27,520 You see that the quadratic term has the matrix in it. 525 00:32:27,520 --> 00:32:29,920 So its derivative-- 526 00:32:29,920 --> 00:32:34,980 maybe the derivative of this is 2 A transpose Ax, 527 00:32:34,980 --> 00:32:38,100 and then the 2 cancels that 2. 528 00:32:38,100 --> 00:32:43,650 And this could also be written as x transpose A transpose b. 529 00:32:43,650 --> 00:32:47,970 So it's x transpose against A transpose b. 530 00:32:47,970 --> 00:32:48,960 That's linear. 531 00:32:48,960 --> 00:32:53,990 So when I take the derivative, it's that constant. 532 00:32:53,990 --> 00:32:55,490 That's pretty fast. 533 00:32:55,490 --> 00:33:04,090 18.06 would patiently derive that. 534 00:33:04,090 --> 00:33:08,640 But here, let me give you the picture that 535 00:33:08,640 --> 00:33:12,500 goes with it, the geometry. 536 00:33:12,500 --> 00:33:17,950 So we have the problem. 537 00:33:17,950 --> 00:33:19,920 No solution. 538 00:33:19,920 --> 00:33:24,920 We have Gauss's best answer. 539 00:33:24,920 --> 00:33:29,210 Minimize the 2 norm of the error. 540 00:33:29,210 --> 00:33:33,410 We have the conclusion, the matrix that we get in. 541 00:33:33,410 --> 00:33:36,170 And now I want to draw a picture that goes with it. 542 00:33:36,170 --> 00:33:37,010 OK. 543 00:33:37,010 --> 00:33:38,450 So here is a picture. 544 00:33:44,490 --> 00:33:48,630 I want to have a column space of A there in that picture. 545 00:33:48,630 --> 00:33:54,060 Of course, the 0 vector's in the column space of A. 546 00:33:54,060 --> 00:34:01,070 So this is all possible vectors Ax.
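Annotation (not spoken in the lecture): the derivation that is passed over quickly above can be written out in a few lines. This is a standard sketch, using \hat{x} for "the best x" and the gradient with respect to x; the notation is an assumption of this note rather than the board work.

```latex
\begin{aligned}
\|Ax-b\|^2 &= (Ax-b)^T(Ax-b) \\
           &= x^T A^T A x \;-\; 2\,x^T A^T b \;+\; b^T b, \\[4pt]
\nabla_x \|Ax-b\|^2 &= 2A^T A x \;-\; 2A^T b \;+\; 0, \\[4pt]
\nabla_x = 0 \quad &\Longrightarrow \quad A^T A\,\hat{x} = A^T b .
\end{aligned}
```

The two cross terms x^T A^T b and b^T A x are the same scalar, which is where "two of those" comes from; the constant b^T b has derivative 0 and drops out, and the factors of 2 cancel.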
547 00:34:05,560 --> 00:34:06,100 Right? 548 00:34:06,100 --> 00:34:11,630 You're never forgetting that the column space is all the Ax's. 549 00:34:11,630 --> 00:34:15,820 Now, I've got to put b in the picture. 550 00:34:15,820 --> 00:34:20,770 So where does this vector b-- so I'm trying to solve Ax 551 00:34:20,770 --> 00:34:23,350 equal b, but failing. 552 00:34:23,350 --> 00:34:28,570 So if I draw b in this picture, how do I draw b? 553 00:34:28,570 --> 00:34:29,830 Where do I put it? 554 00:34:29,830 --> 00:34:32,889 Shall I put it in the column space? 555 00:34:32,889 --> 00:34:34,120 No. 556 00:34:34,120 --> 00:34:37,389 The whole point is, it's not in the column space. 557 00:34:37,389 --> 00:34:39,580 It's not an Ax. 558 00:34:39,580 --> 00:34:43,000 It's out there somewhere, b. 559 00:34:43,000 --> 00:34:45,350 OK. 560 00:34:45,350 --> 00:34:47,960 And then what's the geometry that 561 00:34:47,960 --> 00:34:51,860 goes with least squares and the normal equations 562 00:34:51,860 --> 00:34:57,710 and Gauss's suggestion to minimize the error? 563 00:34:57,710 --> 00:35:04,460 Where will Ax be, the best Ax that I can do? 564 00:35:04,460 --> 00:35:12,350 So what Gauss has produced is an A here. 565 00:35:12,350 --> 00:35:14,390 You can't find an x. 566 00:35:14,390 --> 00:35:17,040 He'll do as best he can. 567 00:35:17,040 --> 00:35:20,550 And we're calling that guy x hat. 568 00:35:20,550 --> 00:35:24,690 And this is the algebra to find x hat. 569 00:35:24,690 --> 00:35:28,770 And now, where is the picture here? 570 00:35:28,770 --> 00:35:31,910 Where is this vector Ax hat, which 571 00:35:31,910 --> 00:35:35,750 is the best Ax we can get? 572 00:35:35,750 --> 00:35:39,320 So it has to be in the column space, 573 00:35:39,320 --> 00:35:41,420 because it's A times something. 574 00:35:41,420 --> 00:35:44,210 And where is it in the column space? 575 00:35:44,210 --> 00:35:48,330 It's the projection. 576 00:35:48,330 --> 00:35:50,640 That's Ax hat. 
577 00:35:50,640 --> 00:35:54,540 And here is the error, which you couldn't do anything about, 578 00:35:54,540 --> 00:35:56,040 b minus Ax hat. 579 00:35:58,970 --> 00:35:59,470 Yeah. 580 00:35:59,470 --> 00:36:02,170 So it's the projection, right. 581 00:36:02,170 --> 00:36:07,210 So all this is justifying the-- 582 00:36:07,210 --> 00:36:13,030 so we're in the second approach to least squares, 583 00:36:13,030 --> 00:36:15,940 solve the normal equations. 584 00:36:15,940 --> 00:36:18,730 Solve the normal equations. 585 00:36:18,730 --> 00:36:22,590 That would be the second approach to least squares. 586 00:36:22,590 --> 00:36:30,800 And most examples, if they're not very big or very difficult, 587 00:36:30,800 --> 00:36:33,550 you just create the matrix A transpose A, 588 00:36:33,550 --> 00:36:39,130 and you call MATLAB and solve that linear system. 589 00:36:39,130 --> 00:36:41,692 You create the matrix, you create the right hand side, 590 00:36:41,692 --> 00:36:42,400 and you solve it. 591 00:36:45,670 --> 00:36:51,330 So that's the ordinary run of the mill least squares problem. 592 00:36:51,330 --> 00:36:54,150 Just do it. 593 00:36:54,150 --> 00:36:56,620 So that's method two, just do it. 594 00:36:59,650 --> 00:37:02,590 What's method three? 595 00:37:02,590 --> 00:37:05,140 For the same-- we're talking about the same problem here, 596 00:37:05,140 --> 00:37:10,950 but now I'm thinking it may be a little more difficult. 597 00:37:10,950 --> 00:37:17,820 This matrix A transpose A might be nearly singular. 598 00:37:17,820 --> 00:37:21,200 Gauss is assuming that-- 599 00:37:21,200 --> 00:37:23,410 yeah, when did this work? 600 00:37:23,410 --> 00:37:24,890 When did this work? 601 00:37:24,890 --> 00:37:30,150 And it will continue to work in the next three-- 602 00:37:30,150 --> 00:37:38,110 this works, this is good, if assuming A 603 00:37:38,110 --> 00:37:41,295 has independent columns. 604 00:37:52,570 --> 00:37:54,810 Yeah, better just make clear. 
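Annotation (not spoken in the lecture): "method two, just do it" can be sketched as follows. The lecture mentions MATLAB; this sketch uses NumPy instead, with made-up data, and assumes A has independent columns. It forms A transpose A and A transpose b, solves for x hat, and then checks the geometry: the error b minus A x hat should be perpendicular to every column of A.

```python
import numpy as np

# Made-up measurement times and noisy measurements (illustrative only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
b = np.array([1.1, 2.8, 5.15, 6.95, 9.1])

# Columns: ones (for C) and x (for D), so the fitted line is C + D x.
A = np.column_stack([np.ones_like(x), x])

# Normal equations: (A^T A) x_hat = A^T b.
AtA = A.T @ A
Atb = A.T @ b
x_hat = np.linalg.solve(AtA, Atb)   # x_hat = [C, D]

# Projection of b onto the column space of A, and the leftover error.
p = A @ x_hat
e = b - p

# Independent cross-check using NumPy's own least-squares routine.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Here `A.T @ e` comes out as zero up to roundoff, which is the normal equations restated: the error is orthogonal to the column space.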
605 00:37:54,810 --> 00:37:58,320 I'm claiming that when A has-- 606 00:37:58,320 --> 00:38:00,570 so what's the reasoning? 607 00:38:00,570 --> 00:38:03,430 If A has independent columns-- 608 00:38:03,430 --> 00:38:06,550 but maybe not enough columns, like here-- 609 00:38:06,550 --> 00:38:07,840 it's only got two columns. 610 00:38:07,840 --> 00:38:10,750 It's obviously not going to be able to match any right hand 611 00:38:10,750 --> 00:38:11,410 side. 612 00:38:11,410 --> 00:38:13,570 But it's got independent columns. 613 00:38:13,570 --> 00:38:16,630 When A has independent columns, then what can I 614 00:38:16,630 --> 00:38:17,815 say about this matrix? 615 00:38:21,800 --> 00:38:23,870 It's invertible. 616 00:38:23,870 --> 00:38:25,500 Gauss's plan works. 617 00:38:25,500 --> 00:38:29,790 If A has independent columns, then this 618 00:38:29,790 --> 00:38:32,880 would be a linear algebra step. 619 00:38:32,880 --> 00:38:35,040 Then this will be invertible. 620 00:38:35,040 --> 00:38:37,230 You see the importance of that step. 621 00:38:37,230 --> 00:38:38,910 If A has independent columns, that 622 00:38:38,910 --> 00:38:41,550 means it has no null space. 623 00:38:41,550 --> 00:38:44,760 Only x equals 0 is in the null space. 624 00:38:44,760 --> 00:38:48,000 Two independent columns, but only two. 625 00:38:48,000 --> 00:38:52,140 So not enough to solve systems, but independent. 626 00:38:52,140 --> 00:38:53,560 Then you're OK. 627 00:38:53,560 --> 00:38:55,590 This matrix is invertible. 628 00:38:55,590 --> 00:38:57,120 You can do what Gauss tells you. 629 00:39:00,200 --> 00:39:03,530 But we're prepared now-- 630 00:39:03,530 --> 00:39:07,790 we have to think, OK. 631 00:39:07,790 --> 00:39:11,930 So what do I really want to do? 632 00:39:11,930 --> 00:39:19,410 I want to connect this Gauss's solution to the pseudo inverse. 633 00:39:19,410 --> 00:39:22,860 Because I'm claiming they both give the same result. 
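Annotation (not spoken in the lecture): the "linear algebra step" from independent columns to invertibility of A transpose A is short enough to write down. This is the standard argument, with N( ) denoting the null space and A taken to be m by n.

```latex
\begin{aligned}
A^T A x = 0 \;&\Longrightarrow\; x^T A^T A x = 0 \\
              &\Longrightarrow\; (Ax)^T (Ax) = \|Ax\|^2 = 0 \\
              &\Longrightarrow\; Ax = 0 \\
              &\Longrightarrow\; x = 0 \quad \text{(independent columns: } N(A) = \{0\}\text{)} .
\end{aligned}
```

So N(A^T A) = N(A) = {0}, and since A^T A is a square n by n matrix with only the zero vector in its null space, it is invertible.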
634 00:39:22,860 --> 00:39:29,065 The pseudo inverse will apply. 635 00:39:31,910 --> 00:39:35,790 But we have something-- 636 00:39:35,790 --> 00:39:37,280 A is not invertible. 637 00:39:37,280 --> 00:39:40,310 Just keep remembering this matrix. 638 00:39:40,310 --> 00:39:41,990 It's not invertible. 639 00:39:41,990 --> 00:39:47,850 But it has got independent columns. 640 00:39:47,850 --> 00:39:50,143 What am I saying there? 641 00:39:50,143 --> 00:39:51,435 Just going back to the picture. 642 00:39:56,030 --> 00:40:01,040 If A is a matrix with independent columns, 643 00:40:01,040 --> 00:40:03,370 what space disappears in this picture? 644 00:40:06,460 --> 00:40:08,690 The null space goes away. 645 00:40:08,690 --> 00:40:10,450 So the picture is simpler. 646 00:40:10,450 --> 00:40:16,160 But it's still the null space of A transpose. 647 00:40:16,160 --> 00:40:18,800 This is still pretty big, because I only 648 00:40:18,800 --> 00:40:21,530 had two columns and a whole lot of rows. 649 00:40:21,530 --> 00:40:24,530 And that's going to be reflected here. 650 00:40:24,530 --> 00:40:28,520 So what am I trying to say? 651 00:40:28,520 --> 00:40:31,250 I'm trying to say that this answer is the same 652 00:40:31,250 --> 00:40:33,740 as the pseudo inverse answer. 653 00:40:33,740 --> 00:40:36,590 We could possibly even check that point. 654 00:40:36,590 --> 00:40:38,210 Let me write it down first. 655 00:40:40,900 --> 00:40:52,160 I claim that the answer A plus b is 656 00:40:52,160 --> 00:40:58,870 the same as the answer coming from here, A transpose A, 657 00:40:58,870 --> 00:41:10,710 inverse A transpose b, when I guess the null space is 0, 658 00:41:10,710 --> 00:41:15,740 the rank is all of n, whatever you like to say. 659 00:41:15,740 --> 00:41:23,970 I believe that method one, this two within one quick formula-- 660 00:41:23,970 --> 00:41:32,970 so you remember that this was V sigma plus U transpose, right? 
661 00:41:32,970 --> 00:41:35,550 That's what A plus was. 662 00:41:35,550 --> 00:41:37,250 That this should agree with this. 663 00:41:43,750 --> 00:41:49,820 I believe those are the same when the null space isn't 664 00:41:49,820 --> 00:41:51,380 in the picture. 665 00:41:51,380 --> 00:41:54,920 So the fact that the null space is just a 0 vector 666 00:41:54,920 --> 00:41:58,700 means that this inverse does exist. 667 00:41:58,700 --> 00:42:01,400 So this inverse exists. 668 00:42:01,400 --> 00:42:07,970 But A A transpose is not invertible. 669 00:42:07,970 --> 00:42:09,410 Right? 670 00:42:09,410 --> 00:42:12,200 No inverse. 671 00:42:12,200 --> 00:42:18,170 Because A A transpose would be coming-- 672 00:42:18,170 --> 00:42:20,750 all this is the null space of A transpose. 673 00:42:20,750 --> 00:42:22,790 So A transpose is not invertible. 674 00:42:26,390 --> 00:42:30,950 But A transpose A is invertible. 675 00:42:30,950 --> 00:42:34,090 How would you check that? 676 00:42:34,090 --> 00:42:36,610 You see what I'm-- 677 00:42:36,610 --> 00:42:40,330 it's taken pretty much the whole hour 678 00:42:40,330 --> 00:42:46,930 to get a picture of the geometry of the pseudo inverse. 679 00:42:46,930 --> 00:42:50,790 So this is the pseudo inverse. 680 00:42:50,790 --> 00:42:57,090 And this is-- that matrix there, it's 681 00:42:57,090 --> 00:43:00,090 really doing its best to be the inverse. 682 00:43:00,090 --> 00:43:03,450 In fact, everybody here is just doing their best 683 00:43:03,450 --> 00:43:04,950 to be the inverse. 684 00:43:04,950 --> 00:43:07,680 Now, how well is this-- how close 685 00:43:07,680 --> 00:43:09,420 is that to being the inverse? 686 00:43:09,420 --> 00:43:11,820 Can I just ask you about that, and then I'll 687 00:43:11,820 --> 00:43:15,870 make this connection, and then we're out of time. 688 00:43:15,870 --> 00:43:18,600 How close is that to being the inverse of A? 689 00:43:21,600 --> 00:43:25,660 Suppose I multiply that by A.
What do I get? 690 00:43:25,660 --> 00:43:26,835 So just notice. 691 00:43:30,030 --> 00:43:38,110 If I multiply that by A, what do I get? 692 00:43:38,110 --> 00:43:40,170 I get, yeah? 693 00:43:40,170 --> 00:43:45,480 I get I. Terrific. 694 00:43:45,480 --> 00:43:47,340 But don't be deceived to thinking 695 00:43:47,340 --> 00:43:51,370 that this is the inverse of A. It worked on the left side, 696 00:43:51,370 --> 00:43:54,670 but it's not going to be good on the right hand side. 697 00:43:54,670 --> 00:44:06,640 So if I multiply A by this guy in that direction, 698 00:44:06,640 --> 00:44:09,730 I'll get as close to the identity as I can come, 699 00:44:09,730 --> 00:44:12,250 but I won't get the identity that way. 700 00:44:12,250 --> 00:44:14,455 So this is just a little box to say-- 701 00:44:17,090 --> 00:44:19,420 so what's the point I'm making? 702 00:44:19,420 --> 00:44:23,170 I'm claiming that this is the pseudo inverse. 703 00:44:23,170 --> 00:44:25,420 Whatever. 704 00:44:25,420 --> 00:44:26,880 Whatever these spaces. 705 00:44:26,880 --> 00:44:30,580 The rank could be tiny, just one. 706 00:44:30,580 --> 00:44:36,190 This works when the rank is n. 707 00:44:36,190 --> 00:44:38,440 I needed independent columns. 708 00:44:38,440 --> 00:44:40,240 So when the rank is n-- 709 00:44:40,240 --> 00:44:44,470 so this is rank equal n. 710 00:44:44,470 --> 00:44:46,930 That Gauss worked. 711 00:44:46,930 --> 00:44:48,290 Then I can get a-- 712 00:44:48,290 --> 00:44:51,500 then it's a one-sided inverse, but it's not 713 00:44:51,500 --> 00:44:52,730 a two-sided inverse. 714 00:44:52,730 --> 00:44:54,010 I can't do it. 715 00:44:54,010 --> 00:44:55,940 Look, my matrix there. 716 00:44:55,940 --> 00:45:01,160 I could find a one-sided inverse to get the 2 by 2 identity. 717 00:45:01,160 --> 00:45:04,640 But I could never multiply that by some matrix 718 00:45:04,640 --> 00:45:09,520 and get the n by n identity out of those two pathetic columns. 
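Annotation (not spoken in the lecture): the claims in this passage can be checked numerically. This is a small sketch with a made-up 5 by 2 matrix (ones and x's), assuming independent columns. It compares NumPy's pseudo inverse with the Gauss formula (A transpose A) inverse times A transpose, and then multiplies on each side of A to see the one-sided inverse.

```python
import numpy as np

# A tall matrix with two independent columns (made-up measurement times).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
A = np.column_stack([np.ones_like(x), x])   # m = 5 rows, n = 2 columns
m, n = A.shape

# Pseudo inverse from the SVD, as computed by NumPy.
A_plus = np.linalg.pinv(A)

# Gauss / normal-equations formula, valid because rank(A) = n.
gauss = np.linalg.inv(A.T @ A) @ A.T

same = np.allclose(A_plus, gauss)

# Left side: (n by m) times (m by n) gives the n by n identity.
left = A_plus @ A

# Right side: (m by n) times (n by m) is m by m but only rank n,
# so it cannot be the identity; it is the projection onto the column space.
right = A @ A_plus
```

In this sketch `left` is the 2 by 2 identity, while `right` is a symmetric 5 by 5 matrix satisfying `right @ right = right` with trace 2, that is, a projection of rank 2: as close to the identity as those two columns allow.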
719 00:45:09,520 --> 00:45:12,020 OK. 720 00:45:12,020 --> 00:45:14,990 Maybe you feel like just checking this. 721 00:45:14,990 --> 00:45:16,520 Just takes patience. 722 00:45:16,520 --> 00:45:18,200 What do I mean by checking it? 723 00:45:20,870 --> 00:45:28,940 I mean stick in the pseudo SVD. 724 00:45:28,940 --> 00:45:32,780 Just put it in the SVD and cancel like crazy. 725 00:45:32,780 --> 00:45:35,120 And I think that'll pop out. 726 00:45:35,120 --> 00:45:38,006 Do you believe me? 727 00:45:38,006 --> 00:45:40,380 Because it's going to be a little painful. 728 00:45:40,380 --> 00:45:45,707 3 U sigma V transpose, all transposed, and then something 729 00:45:45,707 --> 00:45:46,790 there and something there. 730 00:45:46,790 --> 00:45:50,690 I've got nine matrices multiplying away. 731 00:45:50,690 --> 00:45:52,400 But it's going to-- 732 00:45:52,400 --> 00:45:54,837 all sorts of things will produce the identity. 733 00:45:54,837 --> 00:45:56,420 And in the end, that's what I'll have. 734 00:45:59,510 --> 00:46:08,420 So this is a one-sided true inverse, where the SVD-- 735 00:46:08,420 --> 00:46:13,960 this fit formula is prepared to have neither side invertible. 736 00:46:13,960 --> 00:46:16,580 It's still-- we know what sigma plus means. 737 00:46:16,580 --> 00:46:18,740 Anyway. 738 00:46:18,740 --> 00:46:22,910 So under the assumption of independent columns, 739 00:46:22,910 --> 00:46:27,215 Gauss works and gives the same answer as the pseudo inverse. 740 00:46:29,950 --> 00:46:31,347 OK. 741 00:46:31,347 --> 00:46:31,930 Three minutes. 742 00:46:34,550 --> 00:46:40,380 That's hardly time, but this being MIT, 743 00:46:40,380 --> 00:46:43,110 I feel I should use it. 744 00:46:43,110 --> 00:46:44,340 Oh my god. 745 00:46:44,340 --> 00:46:45,075 Number three. 746 00:46:47,910 --> 00:46:49,850 So what's number three about? 
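Annotation (not spoken in the lecture): before method three begins, here is the "put it in the SVD and cancel" check written out. It is a sketch under the lecture's assumption that A is m by n with rank n, taking A = U \Sigma V^T with U and V orthogonal and \Sigma the m by n matrix holding \sigma_1, ..., \sigma_n > 0 on its diagonal.

```latex
\begin{aligned}
A^T A &= V \Sigma^T U^T U \Sigma V^T = V (\Sigma^T \Sigma) V^T,
\qquad \Sigma^T \Sigma = \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2), \\[4pt]
(A^T A)^{-1} &= V (\Sigma^T \Sigma)^{-1} V^T, \\[4pt]
(A^T A)^{-1} A^T &= V (\Sigma^T \Sigma)^{-1} V^T \, V \Sigma^T U^T
                  = V \big[ (\Sigma^T \Sigma)^{-1} \Sigma^T \big] U^T
                  = V \Sigma^{+} U^T = A^{+} .
\end{aligned}
```

The bracketed n by m matrix has 1/\sigma_i on its diagonal and zeros elsewhere, which is exactly \Sigma^{+} when all n singular values are nonzero. The cancellations use U^T U = I and V^T V = I.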
747 00:46:49,850 --> 00:46:57,430 Number three has the same requirement 748 00:46:57,430 --> 00:47:02,640 as number two, the same requirement of no null space. 749 00:47:02,640 --> 00:47:07,590 But it says, if I could get orthogonal columns first, 750 00:47:07,590 --> 00:47:11,490 then this problem would be easy. 751 00:47:11,490 --> 00:47:15,750 So everybody knows that Gram-Schmidt is a way-- 752 00:47:15,750 --> 00:47:22,780 boring way-- to get from these two columns 753 00:47:22,780 --> 00:47:26,300 to get two orthogonal columns. 754 00:47:26,300 --> 00:47:28,250 Actually, the whole idea of Gram-Schmidt 755 00:47:28,250 --> 00:47:29,900 is already there for 2 by 2. 756 00:47:29,900 --> 00:47:32,840 So I have two minutes, and we can do it. 757 00:47:32,840 --> 00:47:37,025 Let's do Gram-Schmidt on these two columns-- 758 00:47:42,500 --> 00:47:44,060 I don't want to use U and V-- 759 00:47:44,060 --> 00:47:46,970 column y and z. 760 00:47:46,970 --> 00:47:48,020 OK. 761 00:47:48,020 --> 00:47:50,570 Suppose I want to orthogonalize those guys. 762 00:47:50,570 --> 00:47:53,190 What's the Gram-Schmidt idea? 763 00:47:53,190 --> 00:47:54,210 I take y. 764 00:47:54,210 --> 00:47:56,160 It's perfectly good. 765 00:47:56,160 --> 00:47:58,710 No problem with y. 766 00:47:58,710 --> 00:48:03,160 There is the y vector, the all 1's. 767 00:48:03,160 --> 00:48:07,680 Then this guy is not orthogonal probably to that. 768 00:48:07,680 --> 00:48:11,970 It'll go off in this direction, with an angle 769 00:48:11,970 --> 00:48:14,160 that's not 90 degrees. 770 00:48:14,160 --> 00:48:16,170 So what do I do? 771 00:48:16,170 --> 00:48:19,680 I want to get orthogonal vectors. 772 00:48:19,680 --> 00:48:23,520 I'm OK with this first guy, but the second guy 773 00:48:23,520 --> 00:48:25,110 isn't orthogonal to the first. 774 00:48:25,110 --> 00:48:27,430 So what do I do? 
775 00:48:27,430 --> 00:48:30,070 How do I-- in this picture, how do I come up 776 00:48:30,070 --> 00:48:32,620 with a vector orthogonal to y? 777 00:48:35,280 --> 00:48:36,900 Project. 778 00:48:36,900 --> 00:48:40,630 I take this z, and I take its projection. 779 00:48:40,630 --> 00:48:43,830 So z has a little piece-- 780 00:48:43,830 --> 00:48:49,020 that z vector has a big piece already in the direction of y, 781 00:48:49,020 --> 00:48:52,080 which I don't want, and a piece orthogonal to it. 782 00:48:52,080 --> 00:48:53,970 That's my other piece. 783 00:48:53,970 --> 00:48:55,020 That's my other piece. 784 00:48:55,020 --> 00:48:56,830 So here's y. 785 00:48:56,830 --> 00:49:05,340 And here's the-- that is z minus projection, let me just say. 786 00:49:05,340 --> 00:49:06,560 Whatever. 787 00:49:06,560 --> 00:49:07,200 Yeah. 788 00:49:07,200 --> 00:49:09,300 I don't know if I even drew that picture right. 789 00:49:09,300 --> 00:49:10,290 Probably I didn't. 790 00:49:10,290 --> 00:49:11,040 Anyway. 791 00:49:11,040 --> 00:49:11,890 Whatever. 792 00:49:11,890 --> 00:49:17,100 The Gram-Schmidt idea is just orthogonalize 793 00:49:17,100 --> 00:49:18,730 in the natural way. 794 00:49:18,730 --> 00:49:21,660 I'll come back to that at the beginning of next time 795 00:49:21,660 --> 00:49:27,950 and say a word about the fourth way. 796 00:49:27,950 --> 00:49:32,330 So this least squares is not deep learning. 797 00:49:32,330 --> 00:49:36,350 It's what people did a century ago 798 00:49:36,350 --> 00:49:38,730 and continue to do for good reason. 799 00:49:38,730 --> 00:49:39,770 OK. 800 00:49:39,770 --> 00:49:43,100 And I'll send out that announcement about the class, 801 00:49:43,100 --> 00:49:44,840 and you know the homework, and you know 802 00:49:44,840 --> 00:49:48,000 the new due date is Friday. 803 00:49:48,000 --> 00:49:48,500 Good. 804 00:49:48,500 --> 00:49:50,089 Thank you.
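Annotation (not spoken in the lecture): the two-column Gram-Schmidt step described above can be sketched as follows, with y the all-ones column and z the column of x's, using made-up data. The last part, solving R x hat = Q transpose b, is an assumption about where method three is headed (the usual QR route); the lecture here only states the orthogonalization idea.

```python
import numpy as np

# Made-up measurement times and measurements (illustrative only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
b = np.array([1.1, 2.8, 5.15, 6.95, 9.1])

y = np.ones_like(x)   # first column: all ones, kept as is
z = x.copy()          # second column: not orthogonal to y in general

# Subtract from z its projection onto y; what remains is orthogonal to y.
z_perp = z - (y @ z) / (y @ y) * y

# Normalize to get orthonormal columns q1, q2.
q1 = y / np.linalg.norm(y)
q2 = z_perp / np.linalg.norm(z_perp)
Q = np.column_stack([q1, q2])

# Assumed follow-up (QR route): A = Q R with R upper triangular.
A = np.column_stack([y, z])
R = Q.T @ A

# With Q^T Q = I the normal equations reduce to R x_hat = Q^T b.
x_hat = np.linalg.solve(R, Q.T @ b)
```

Because the columns of Q are orthonormal, A transpose A never has to be formed, which is why orthogonal columns make the problem easy.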