The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: Let's go. So if you want to know the subject of today's class, it's Ax = b. I got started writing down different possibilities for Ax = b, and I got carried away. It just appears all over the place for different sizes, different ranks, different situations -- nearly singular, not nearly singular. And the question is, what do you do in each case?

So can I outline my little two pages of notes here, and then pick on one or two of these topics to develop today, and a little more on Friday about Gram-Schmidt? So I won't do much, if any, of Gram-Schmidt today, but I will do the others. So the problem is Ax = b. That problem has come from somewhere. We have to produce some kind of an answer, x. So I'm going from good to bad, or easy to difficult, in this list.
Well, except for number 0, which is an answer in all cases, using the pseudoinverse that I introduced last time. So that deals with zero eigenvalues and zero singular values by saying their inverse is also zero, which is kind of wild. So we'll come back to the meaning of the pseudoinverse.

But now I want to get real, here, about different situations. So number 1 is the good, normal case, when a person has a square matrix of reasonable size, reasonable condition -- a condition number -- oh, the condition number, I should call it sigma_1 over sigma_n. It's the ratio of the largest to the smallest singular value. And let's say that's within reason, not more than 1,000 or something. Then normal, ordinary elimination is going to work, and MATLAB -- the command that would produce the answer is just backslash. So this is the normal case.

Now, the cases that follow have problems of some kind, and I guess I'm hoping that this is a sort of useful dictionary of what to do, for you and me both. So we have this case here, where we have too many equations.
So that's a pretty normal case, and we'll think mostly of solving by least squares, which leads us to the normal equation. So this is standard -- happens all the time in statistics. And I'm thinking, in the reasonable case, that would be x hat, the solution. This matrix -- A transpose A -- would be invertible and of reasonable size. So backslash would still solve that problem. Backslash doesn't require a square matrix to give you an answer. So that's the good case, where the matrix is not too big, so it's not unreasonable to form A transpose A.

Now, here's the other extreme. What's exciting for us is this is the underdetermined case. I don't have enough equations, so I have to put something more in to get a specific answer. And what makes it exciting for us is that that's typical of deep learning. There are so many weights in a deep neural network that the weights would be the unknowns. Of course, it wouldn't necessarily be linear -- it wouldn't be linear -- but still the idea's the same: we have many solutions, and we have to pick one. Or we have to pick an algorithm, and then it will find one.
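Both cases can be checked numerically. Here is a small sketch in Python with NumPy (standing in for MATLAB's backslash; the matrices are my toy examples, not from the lecture): in the overdetermined case, the least squares answer agrees with the normal equation A transpose A x hat = A transpose b, and in the underdetermined case, NumPy's `lstsq` picks out the minimum-norm solution among the many exact solutions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined: m > n, full column rank -- least squares.
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
# Normal equation: A^T A x_hat = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
assert np.allclose(x_lstsq, x_hat)

# Underdetermined: m < n, many exact solutions -- lstsq returns
# the minimum-norm one.
A = rng.standard_normal((3, 8))
b = rng.standard_normal(3)
x_min = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(A @ x_min, b)        # it solves Ax = b exactly

# Adding any null-space vector z gives another solution, but a longer one.
z = np.linalg.svd(A)[2][-1]             # a unit vector with A z ~ 0
assert np.allclose(A @ z, 0)
assert np.linalg.norm(x_min) < np.linalg.norm(x_min + z)
```

The last assertion is the point: the minimum-norm solution is orthogonal to the null space, so every other solution is strictly longer.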
So we could pick the minimum norm solution, the shortest solution. That would be an L2 answer. Or we could go to L1. And the big question that, I think, might be settled in 2018 is: does deep learning, and the iteration from stochastic gradient descent that we'll see pretty soon -- does it go to the minimum L1? Does it pick out an L1 solution? That's really an exciting math question. For a long time, it was standard to say that these deep learning AI codes are fantastic, but what are they doing? We don't know all the interior, but we -- when I say we, I don't mean I. Other people are getting there, and I'm going to tell you as much as I can about it when we get there.

So those are pretty standard cases: m = n, m greater than n, m less than n, but not crazy. Now, the second board will have more difficult problems. Usually, because they're nearly singular in some way, the columns are nearly dependent. So that would be the columns in bad condition. You just picked a terrible basis, or nature did, or somehow you got a matrix A whose columns are virtually dependent -- almost linearly dependent.
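That symptom is easy to see numerically. A tiny sketch (my example, not the lecture's): two columns that are almost parallel give a condition number sigma_1 / sigma_n far beyond the "reasonable" range of about 1,000.

```python
import numpy as np

eps = 1e-8
# Second column is almost a copy of the first: nearly dependent columns.
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + eps]])

sigma = np.linalg.svd(A, compute_uv=False)
cond = sigma[0] / sigma[-1]       # condition number: largest / smallest
assert cond > 1e7                  # wildly ill-conditioned
assert np.isclose(cond, np.linalg.cond(A))
```

The determinant here is eps, so the small singular value is on the order of eps and the condition number blows up like 1/eps.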
The inverse matrix is really big, but it exists. Then that's when you go in and you fix the columns. You orthogonalize the columns. Instead of accepting the columns A1, A2, up to An of the given matrix, you go in and you find orthonormal vectors in that column space, an orthonormal basis Q1 to Qn. And the two are connected by Gram-Schmidt. And the famous matrix statement of Gram-Schmidt is A = QR: here are the columns of A, here are the columns of Q, and there's a triangular matrix R that connects the two. So that is the central topic of Gram-Schmidt, that idea of orthogonalizing. It just appears everywhere. It appears all over Course 6 in many, many situations with different names.

So that I'm sort of saving a little bit until next time, and let me tell you why. Because just the organization of Gram-Schmidt is interesting. So Gram-Schmidt, you could do the normal way. So that's what I teach in 18.06. Just take every column as it comes. Subtract off projections onto the previous stuff. Get it orthogonal to the previous guys. Normalize it to be a unit vector. Then you've got that column.
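Here is a minimal sketch of that column-at-a-time (classical) Gram-Schmidt, assuming A has independent columns: each column has its projections onto the previous q's subtracted off, then gets normalized, and the coefficients land in an upper triangular R with A = QR.

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt: columns of A -> orthonormal columns of Q,
    with upper triangular R so that A = Q R."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):                  # subtract projections on previous q's
            R[i, j] = Q[:, i] @ A[:, j]
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)         # normalize to a unit vector
        Q[:, j] = v / R[j, j]
    return Q, R

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
Q, R = gram_schmidt(A)
assert np.allclose(Q.T @ Q, np.eye(3))      # orthonormal columns
assert np.allclose(Q @ R, A)                # A = QR
assert np.allclose(R, np.triu(R))           # R is upper triangular
```

This is the "normal way" the lecture describes; the reordered, column-pivoting version is the topic promised for next time.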
Go on. So I say that again, and then I'll say it again two days from now. So Gram-Schmidt, the idea is you take the columns -- you say the second orthogonal vector, Q2, will be some combination of columns 1 and 2, orthogonal to the first. Lots to do.

And there's another order, which is really the better order to do Gram-Schmidt, and it allows you to do column pivoting. So this is my topic for next time, to see Gram-Schmidt more carefully. Column pivoting means the columns might not come in a good order, so you allow yourself to reorder them. We know that you have to do that for elimination. In elimination, it would be rows. So in elimination, we would have the matrix A, and we take the first row as the first pivot row, and then the second row, and then the third row. But if the pivot is too small, then reorder the rows. So it's row ordering that comes up in elimination. And MATLAB just systematically says, OK, that's the pivot that's coming up. The third pivot comes up out of the third row.
But MATLAB says: look down that whole third column for a better pivot, a bigger pivot. Switch to it with a row exchange. So there are lots of permutations then. You end up with something there that permutes the rows, and then that gets factored into LU. So I'm saying something about elimination that's just sort of a side comment: you would never do elimination without considering the possibility of row exchanges. And then this is Gram-Schmidt orthogonalization. So this is the LU world. Here is the QR world, and here it happens to be columns that you're permuting. So that's coming.

This is section 2.2, now. But there's more. 2.2 has quite a bit in it, including number 0, the pseudoinverse, and including some of these things. Actually, this will also be in 2.2. And maybe this is what I'm saying more about today. So I'll put a little star for today, here. What do you do? So this is a case where the matrix is nearly singular. You're in danger. Its inverse is going to be big -- unreasonably big.
And I wrote "inverse problems" there, because an inverse problem is a type of problem, an application, that you often need to solve, that engineering and science have to solve. So I'll just say a little more about that, but that's a typical application in which you're nearly singular. Your matrix isn't good enough to invert. Well, of course, you could always say, well, I'll just use the pseudoinverse, but numerically, that's like cheating. You've got to get in there and do something about it. So inverse problems would be examples.

Actually, as I write that, I think that would be a topic that I should add to the list of potential topics for a three-week project. Look up a book on inverse problems. So what do I mean by an inverse problem? I'll just finish this thought. What's an inverse problem? Typically, you know about a system -- say a network, an RLC network -- and you give it a voltage or current. You give it an input, and you find the output. You find out what current flows, what the voltages are. But inverse problems are: suppose you know the response to different voltages.
What was the network? You see the problem? Let me say it again. Discover what the network is from its outputs. So that turns out, typically, to be a problem that gives nearly singular matrices. That's a difficult problem. A lot of nearby networks would give virtually the same output. So you have a matrix that's nearly singular. It's got singular values very close to 0. What do you do then?

Well, the world of inverse problems thinks of adding a penalty term, some kind of a penalty term. When I minimize this thing just by itself, in the usual way, A transpose A has a giant inverse. The matrix A is badly conditioned. It takes vectors almost to 0. So that A transpose A has got a giant inverse, and you're at risk of losing everything to roundoff. So this is the solution. You could call it a cheap solution, but everybody uses it. So I won't put that word on the videotape. But that sort of resolves the problem -- well, it shifts the problem, anyway, to: what number? What should be the penalty? How much should you penalize it?
You see, by adding that, you're going to make it invertible. And if you make this bigger, and bigger, and bigger, it's more and more well-conditioned. It resolves the trouble, here. And today I'm going to do more with that. So with that, I'll stop there and pick it up after saying something about 6 and 7.

I hope this is helpful. It was helpful to me, certainly, to see all these possibilities and to write down what the symptom is. It's like being a linear equation doctor. You look for the symptoms, and then you propose something at CVS that works or doesn't work. But you do something about it.

So when the problem is too big -- up to now, the problems have not been giant, out of core. But now, when it's too big -- maybe it's still in core, but really big -- then this is in 2.1. So that's to come back to. The word I could have written in here, if I was just going to write one word, would be iteration. Iterative methods, meaning you take a step, like -- the conjugate gradient method is the hero of iterative methods.
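A bare-bones sketch of that hero, for a symmetric positive definite system (my toy matrix, not the lecture's): each step takes a new search direction, and the iterates close in on the solution of Ax = b without ever factoring A.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve Ax = b for symmetric positive definite A by conjugate gradients."""
    n = len(b)
    if max_iter is None:
        max_iter = 5 * n        # a few times n, to be safe in floating point
    x = np.zeros(n)
    r = b - A @ x               # residual
    p = r.copy()                # search direction
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x

rng = np.random.default_rng(2)
M = rng.standard_normal((20, 20))
A = M.T @ M + 20 * np.eye(20)   # symmetric positive definite, well conditioned
b = rng.standard_normal(20)
x = conjugate_gradient(A, b)
assert np.allclose(A @ x, b)
```

In exact arithmetic this finishes in at most n steps; in practice, on a well-conditioned matrix like this one, it gets "pretty close, pretty fast," which is exactly the selling point the lecture mentions.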
And then that name I erased is Krylov, and there are other names associated with iterative methods. So that's the section that we passed over just to get rolling, but we'll come back to it. So with that one, you never get the exact answer, but you get closer and closer. If the iterative method is successful, like conjugate gradients, you get pretty close, pretty fast. And then you say, OK, I'll take it.

And then finally, way too big -- like, nowhere. You're not in core. Your matrix -- you just have a giant, giant problem, which, of course, is happening these days. And then, one way to do it: you can't even look at the matrix A, much less A transpose. A transpose would be unthinkable. You couldn't do it in a year. So randomized linear algebra has popped up, and the idea there, which we'll see, is to use probability -- to sample the matrix and work with your samples. So if the matrix is way too big, but not too crazy, so to speak, then you could sample the columns and the rows, and get an answer from the sample.
See, if I sample the columns of a matrix, I'm getting -- so what does sampling mean? Let me just complete this -- say, add a little to this thought. Sample a matrix. So I have a giant matrix A. It might be sparse, of course. I didn't distinguish, over there, the sparse ones. That would be another thing. So if I just take random x's -- more than one, but not the full n dimensions -- those products Ax will give me random guys in the column space. And if the matrix is reasonable, it won't take too many to have a pretty reasonable idea of what that column space is like, along with the right-hand side.

So this world of randomized linear algebra has grown because it had to. And of course, any statement can never say for sure you're going to get the right answer, but using the inequalities of probability, you can often say that the chance of being way off is less than 1 in 2 to the 20th, or something. So the answer is, in reality, you get a good answer. That is the end of this chapter, 2.4. So this is all chapter 2, really. The iterative methods are in 2.1. Most of this is in 2.2.
Big is 2.3, and then really big is randomized, in 2.4. So now, where are we? You were going to let me know -- or not -- if this is useful to see. But you sort of see what real-life problems are. And of course, we're highly, especially interested in getting to the deep learning examples, which are underdetermined. When you're underdetermined, you've got many solutions, and the question is, which one is a good one? And in deep learning -- I just can't resist saying another word.

So there are many solutions. What to do? Well, you pick some algorithm, like steepest descent, which is going to find a solution. So you hope it's a good one. And what does a good one mean, versus a not-good one? They're all solutions. A good one means that when you apply it to the test data that you haven't yet seen, it gives good results on the test data. The solution has learned something from the training data, and it works on the test data. So that's the big question in deep learning.
How does it happen that, by doing gradient descent or whatever algorithm -- how does that algorithm bias the solution? It's called implicit bias. How does that algorithm bias the solution toward a solution that generalizes, that works on test data? And you can think of algorithms which would approach a solution that did not work on test data. So that's what you want to stay away from. You want the ones that work. So there are very deep math questions there, which are kind of new. They didn't arise until they did. And we'll try to say some of what's being understood.

Can I focus now, for probably the rest of today, on this case, when the matrix is nearly singular? So you could apply elimination, but it would give a poor result. So one solution is the SVD. I haven't even mentioned the SVD here as an algorithm, but of course, it is. The SVD gives you an answer. Boy, where should that have gone? Well, the space over here -- the SVD. So that produces -- you have A = U Sigma V transpose, and then A inverse is V Sigma inverse U transpose. So we're in the case, here.
We're talking about number 5: nearly singular, where Sigma has some very small singular values. Then Sigma inverse has some very big singular values. So you're really in wild territory here, with very big inverses. So that would be one way to do it. But this is a way to regularize the problem. So let's just pay attention to that.

So suppose I minimize the sum of A x minus b squared and delta squared times the size of x squared. And I'm going to use the L2 norm. It's going to be least squares with a penalty, so of course it's the L2 norm here, too.
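As a numerical sketch of that penalized minimization (my toy data; this is the standard ridge, or Tikhonov, setup): stacking the penalty into an ordinary least squares problem gives the same minimizer as the penalized normal equation, and at the minimizer the gradient of the objective vanishes.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
b = rng.standard_normal(6)
delta = 0.1

# Augmented least squares: stack A on top of delta*I, and b on top of 0.
A_star = np.vstack([A, delta * np.eye(4)])
b_star = np.concatenate([b, np.zeros(4)])
x = np.linalg.lstsq(A_star, b_star, rcond=None)[0]

# Same x from the penalized normal equation (A^T A + delta^2 I) x = A^T b.
x_ridge = np.linalg.solve(A.T @ A + delta**2 * np.eye(4), A.T @ b)
assert np.allclose(x, x_ridge)

# First-order optimality: the gradient of ||Ax - b||^2 + delta^2 ||x||^2,
# which is 2 A^T (Ax - b) + 2 delta^2 x, vanishes at the minimizer.
grad = A.T @ (A @ x - b) + delta**2 * x
assert np.allclose(grad, 0)
```

With delta > 0 the matrix A transpose A plus delta squared I is positive definite, so the solve never fails, however close to singular A is.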
Suppose I solve that for a delta -- for some delta. I have to choose a positive delta. And when I choose a positive delta, then I have a solvable problem. Even if this goes to 0, or A does crazy things, this is going to keep me away from singular. In fact, what equation does that lead to? So that's a least squares problem with an extra penalty term.

So it would come -- I suppose, let's see -- if I write the equations [A; delta I] x = [b; 0], maybe that gives the least squares equation -- the usual normal equation -- for this augmented system. Because what's the error here? This is the new big A -- A star, let's say -- times x equals the new b. So if I apply least squares to that, what do I do? I minimize the sum of squares. So least squares would minimize A x minus b squared -- that would be from the first components -- and delta squared x squared from the last components, which is exactly what we said we were doing. So in a way, this is the equation that the penalty method is solving.

And one question, naturally, is: what should delta be? Well, that question's beyond us today. It's a balance of what you can believe, and how much noise is in the system, and everything. That choice of delta -- what we could ask is a math question: what happens as delta goes to 0? So suppose I solve this problem. Let's see, I could write it differently. What would be the equation, here?
This part would give us the A transpose A, and then this part would give us just delta squared times the identity: (A transpose A + delta squared I) x = A transpose b, I think. Wouldn't that be it? So really, what I've written here is A star transpose A star. Least squares on the augmented system gives that equation. So all of those are equivalent. All of those would be equivalent statements of the penalized problem that you're solving.

And then the question is: as delta goes to 0, what happens? Of course, something. When delta goes to 0, you're falling off the cliff. Something quite different is suddenly going to happen there. Maybe we could even understand this question with a 1 by 1 matrix. I think this section starts with a 1 by 1. Suppose A is just a number. Maybe I'll just put that on this board, here. Suppose A is just a number. So what am I going to call that number? Just 1 by 1. Let me call it sigma, because it's certainly the leading singular value. So what's my equation that I'm solving?
441 00:30:10,130 --> 00:30:15,500 A transpose A would be sigma squared plus delta squared, 1 442 00:30:15,500 --> 00:30:18,350 by 1, x-- 443 00:30:18,350 --> 00:30:20,690 should I give some subscript here? 444 00:30:20,690 --> 00:30:23,960 I should, really, to do it right. 445 00:30:23,960 --> 00:30:26,750 This is the solution for a given delta. 446 00:30:32,150 --> 00:30:33,700 So that solution will exist. 447 00:30:33,700 --> 00:30:34,390 Fine. 448 00:30:34,390 --> 00:30:36,670 This matrix is certainly invertible. 449 00:30:36,670 --> 00:30:40,460 That's positive semidefinite, at least. 450 00:30:40,460 --> 00:30:42,320 That's positive semidefinite, and then what 451 00:30:42,320 --> 00:30:45,530 about delta squared I? 452 00:30:45,530 --> 00:30:49,160 It is positive definite, of course. 453 00:30:49,160 --> 00:30:52,680 It's just the identity with a factor. 454 00:30:52,680 --> 00:30:55,370 So this is a positive definite matrix. 455 00:30:55,370 --> 00:30:57,470 I certainly have a solution. 456 00:30:57,470 --> 00:31:01,500 And let me keep going on this 1 by 1 case. 457 00:31:01,500 --> 00:31:03,180 This would be A transpose. 458 00:31:03,180 --> 00:31:04,700 A is just a sigma. 459 00:31:04,700 --> 00:31:06,890 I think it's just sigma b. 460 00:31:11,710 --> 00:31:17,890 So A is 1 by 1, and there are two cases, here-- 461 00:31:17,890 --> 00:31:25,230 Sigma bigger than 0, or sigma equals 0. 462 00:31:25,230 --> 00:31:28,290 And in either case, I just want to know what's the limit. 463 00:31:28,290 --> 00:31:32,310 So the answer x-- 464 00:31:32,310 --> 00:31:34,490 let me just take the right hand side. 465 00:31:34,490 --> 00:31:35,370 Well, that's fine. 466 00:31:39,140 --> 00:31:42,810 Am I computing OK? 
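The equivalence just set up — ordinary least squares on the augmented "A star" system versus the normal equation with the delta squared penalty — can be checked numerically. A minimal sketch in NumPy; the random A, b, and the value of delta are made up for illustration, not from the lecture:

```python
import numpy as np

# Penalized least squares two ways, on a made-up example.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))      # a small tall matrix
b = rng.standard_normal(6)
delta = 0.5
n = A.shape[1]

# Route 1: plain least squares on the augmented system
#   [A; delta*I] x = [b; 0]   (the "A star", "b star" system).
A_star = np.vstack([A, delta * np.eye(n)])
b_star = np.concatenate([b, np.zeros(n)])
x_aug, *_ = np.linalg.lstsq(A_star, b_star, rcond=None)

# Route 2: the normal equation (A^T A + delta^2 I) x = A^T b.
x_pen = np.linalg.solve(A.T @ A + delta**2 * np.eye(n), A.T @ b)

print(np.allclose(x_aug, x_pen))     # the two routes agree
```

Minimizing A x minus b squared plus delta squared x squared is exactly least squares on the stacked system, which is why the two computations match.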
467 00:31:42,810 --> 00:31:47,580 Using the penalized thing on a 1 by 1 problem, which you could 468 00:31:47,580 --> 00:31:50,910 say is a little bit small-- 469 00:31:50,910 --> 00:32:00,620 so solving this equation or equivalently minimizing this, 470 00:32:00,620 --> 00:32:03,151 so here, I'm finding the minimum of-- 471 00:32:07,590 --> 00:32:14,030 A was sigma, so it's sigma x minus b squared plus delta squared x squared. 472 00:32:18,890 --> 00:32:20,620 You see it's just 1 by 1? 473 00:32:20,620 --> 00:32:21,400 Just a number. 474 00:32:21,400 --> 00:32:25,480 And I'm hoping that calculus will agree with linear algebra 475 00:32:25,480 --> 00:32:29,060 here, that if I find the minimum of this-- 476 00:32:29,060 --> 00:32:31,030 so let me write it out. 477 00:32:31,030 --> 00:32:36,170 Sigma squared x squared and delta squared x squared, 478 00:32:36,170 --> 00:32:42,820 and then minus 2 sigma xb, and then plus b squared. 479 00:32:42,820 --> 00:32:46,060 And now, I'm going to find the minimum, which means 480 00:32:46,060 --> 00:32:48,490 I'd set the derivative to 0. 481 00:32:48,490 --> 00:32:51,430 So I get 2 sigma squared and 2 delta squared. 482 00:32:51,430 --> 00:32:55,780 I get a 2 here, and this gives me 483 00:32:55,780 --> 00:32:57,850 the x derivative as 2 sigma b. 484 00:32:57,850 --> 00:33:00,490 So I get a 2 there, and I'm OK. 485 00:33:00,490 --> 00:33:06,610 I just cancel both 2s, and that's the equation. 486 00:33:06,610 --> 00:33:09,840 So I can solve that equation. 487 00:33:09,840 --> 00:33:19,110 x is sigma over sigma squared plus delta squared, times b. 488 00:33:19,110 --> 00:33:22,260 So it's really that quantity. 489 00:33:22,260 --> 00:33:25,230 I want to let delta go to 0. 490 00:33:28,850 --> 00:33:31,960 So again, what am I doing here? 491 00:33:31,960 --> 00:33:34,710 I'm taking a 1 by 1 example just to see 492 00:33:34,710 --> 00:33:42,840 what happens in the limit as delta goes to 0. 493 00:33:42,840 --> 00:33:45,600 What happens? 
494 00:33:45,600 --> 00:33:48,300 So I just have to look at that. 495 00:33:48,300 --> 00:33:54,130 What is the limit of that thing in a circle, as delta 496 00:33:54,130 --> 00:33:55,150 goes to 0? 497 00:33:55,150 --> 00:33:58,090 So I'm finding out for a 1 by 1 problem what 498 00:33:58,090 --> 00:34:04,390 a penalized least squares problem, ridge regression, 499 00:34:04,390 --> 00:34:05,860 all over the place-- 500 00:34:05,860 --> 00:34:07,630 what happens? 501 00:34:07,630 --> 00:34:12,690 So what happens to that number as delta goes to 0? 502 00:34:15,400 --> 00:34:17,659 1 over sigma. 503 00:34:17,659 --> 00:34:21,670 So now, let delta go to 0. 504 00:34:21,670 --> 00:34:27,159 So that approaches 1 over sigma, because delta disappears. 505 00:34:27,159 --> 00:34:29,570 Sigma over sigma squared, 1 over sigma. 506 00:34:29,570 --> 00:34:34,590 So it approaches the inverse, but what's 507 00:34:34,590 --> 00:34:37,100 the other possibility, here? 508 00:34:37,100 --> 00:34:41,380 The other possibility is that sigma is 0. 509 00:34:41,380 --> 00:34:44,719 I didn't say whether this matrix, this 1 by 1 matrix, 510 00:34:44,719 --> 00:34:46,909 was invertible or not. 511 00:34:46,909 --> 00:34:53,500 If sigma is not 0, then I go to 1 over sigma. 512 00:34:53,500 --> 00:34:57,330 If sigma is really small, it will take a while. 513 00:34:57,330 --> 00:35:00,930 Delta will have to get small, small, small, even compared 514 00:35:00,930 --> 00:35:04,230 to sigma, until finally, that term goes away, 515 00:35:04,230 --> 00:35:06,000 and I just have 1 over sigma. 516 00:35:06,000 --> 00:35:09,390 But what if sigma is 0? 517 00:35:09,390 --> 00:35:14,410 Sorry to get excited about 0. 518 00:35:14,410 --> 00:35:16,970 Who would get excited about 0? 519 00:35:16,970 --> 00:35:20,840 So this is the case when this is 1 over sigma, 520 00:35:20,840 --> 00:35:23,000 if sigma is positive. 521 00:35:23,000 --> 00:35:25,325 And what does it approach if sigma is 0? 
522 00:35:28,080 --> 00:35:29,770 0! 523 00:35:29,770 --> 00:35:32,400 Because this is 0, the whole problem 524 00:35:32,400 --> 00:35:34,810 just disappeared, here. 525 00:35:34,810 --> 00:35:37,400 The sigma was 0. 526 00:35:37,400 --> 00:35:39,570 Here is a sigma. 527 00:35:39,570 --> 00:35:48,430 So anyway, if sigma is 0, then I'm getting 0 all the time. 528 00:35:48,430 --> 00:35:50,400 But I have a decent problem, because the delta 529 00:35:50,400 --> 00:35:51,940 squared is there. 530 00:35:51,940 --> 00:35:53,920 I have a decent problem until the last minute. 531 00:35:53,920 --> 00:35:55,090 My problem falls apart. 532 00:35:55,090 --> 00:35:58,660 Delta goes to 0, and I have a 0 equals 0 problem. 533 00:35:58,660 --> 00:35:59,440 I'm lost. 534 00:35:59,440 --> 00:36:03,370 But the point is the penalty kept me positive. 535 00:36:03,370 --> 00:36:07,300 It kept me with this delta squared term 536 00:36:07,300 --> 00:36:10,890 until the last critical moment. 537 00:36:10,890 --> 00:36:14,260 It kept me positive even if that was 0. 538 00:36:14,260 --> 00:36:19,600 If that is 0, and this is 0, I still have something here. 539 00:36:19,600 --> 00:36:22,030 I still have a problem to solve. 540 00:36:22,030 --> 00:36:24,430 And what's the limit then? 541 00:36:24,430 --> 00:36:29,050 So 1 over sigma if sigma is positive. 542 00:36:29,050 --> 00:36:32,720 And what's the answer if sigma is not positive? 543 00:36:32,720 --> 00:36:34,690 It's 0. 544 00:36:34,690 --> 00:36:36,700 Just tell me. 545 00:36:36,700 --> 00:36:38,260 I'm getting 0. 546 00:36:38,260 --> 00:36:40,750 I get 0 all the way, and I get 0 in the limit. 547 00:36:47,080 --> 00:36:53,450 And now, let me just ask, what have I got here? 548 00:36:53,450 --> 00:36:59,870 What is this sudden bifurcation? 549 00:36:59,870 --> 00:37:02,000 Do I recognize this? 
550 00:37:02,000 --> 00:37:06,710 The inverse in the limit as delta goes to 0 551 00:37:06,710 --> 00:37:11,120 is either 1 over sigma, if that makes sense, 552 00:37:11,120 --> 00:37:14,090 or it's 0, which is not like 1 over sigma. 553 00:37:14,090 --> 00:37:16,970 1 over sigma-- as sigma goes to 0, 554 00:37:16,970 --> 00:37:19,130 this thing is getting bigger and bigger. 555 00:37:19,130 --> 00:37:22,850 But at sigma equals 0, it's 0. 556 00:37:22,850 --> 00:37:27,230 You see, that's a really strange kind of a limit. 557 00:37:30,560 --> 00:37:32,810 Now, it would be over there. 558 00:37:32,810 --> 00:37:38,910 What have I found here, in this limit? 559 00:37:38,910 --> 00:37:40,950 Say it again, because that was exactly right. 560 00:37:40,950 --> 00:37:43,230 The pseudo inverse. 561 00:37:43,230 --> 00:37:49,290 So this system-- choose delta greater than 0, 562 00:37:49,290 --> 00:37:51,810 then delta going to 0. 563 00:37:51,810 --> 00:37:55,710 The solution goes to the pseudo inverse. 564 00:38:00,360 --> 00:38:01,680 That's the key fact. 565 00:38:05,240 --> 00:38:07,980 When delta is really, really small, 566 00:38:07,980 --> 00:38:12,440 then this behaves in a pretty crazy way. 567 00:38:12,440 --> 00:38:18,770 If delta is really, really small, then sigma is bigger, 568 00:38:18,770 --> 00:38:20,300 or it's 0. 569 00:38:20,300 --> 00:38:22,140 If it's bigger, you go this way. 570 00:38:22,140 --> 00:38:23,560 If it's 0, you go that way. 571 00:38:27,850 --> 00:38:32,972 So that's the message, and this is penalized 572 00:38:39,240 --> 00:38:45,550 least squares. As the penalty gets smaller and smaller, 573 00:38:45,550 --> 00:38:50,070 it approaches the correct answer, the always correct answer, 574 00:38:50,070 --> 00:38:54,600 with that sudden split between 0 and not 0 575 00:38:54,600 --> 00:39:01,020 that we associate with the pseudo inverse. 
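The sudden split in that limit is easy to watch numerically. A 1 by 1 sketch only; the numbers sigma = 3, b = 2, and the shrinking deltas are made up for illustration:

```python
# x(delta) = sigma*b / (sigma^2 + delta^2): watch the limit as delta -> 0.
b = 2.0

sigma = 3.0                          # case 1: sigma > 0
for delta in [1e-1, 1e-4, 1e-8]:
    x = sigma * b / (sigma**2 + delta**2)
print(abs(x - b / sigma) < 1e-12)    # True: x approaches (1/sigma) * b

sigma = 0.0                          # case 2: sigma = 0
x = sigma * b / (sigma**2 + 1e-8**2)
print(x == 0.0)                      # True: x is 0 for every delta, so the limit is 0
```

For positive sigma the answer tends to 1 over sigma times b; for sigma equal to 0 it is 0 all the way, which is exactly the bifurcation described above.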
576 00:39:01,020 --> 00:39:04,340 Of course, in a practical case, you're 577 00:39:04,340 --> 00:39:09,860 trying to find the resistances and inductances in a circuit 578 00:39:09,860 --> 00:39:16,460 by trying the circuit, and looking at the output b, 579 00:39:16,460 --> 00:39:18,890 and figuring out what input. 580 00:39:21,810 --> 00:39:29,100 So the unknown x is the unknown system parameters. 581 00:39:29,100 --> 00:39:33,780 Not the voltage and current, but the resistance, and inductance, 582 00:39:33,780 --> 00:39:34,953 and capacitance. 583 00:39:42,251 --> 00:39:46,200 I've only proved that in the 1 by 1 case. 584 00:39:46,200 --> 00:39:49,790 You may say that's not much of a proof. 585 00:39:49,790 --> 00:39:56,840 In the 1 by 1 case, we can see it happen in front of our eyes. 586 00:39:56,840 --> 00:40:01,820 So really, a step I haven't taken here 587 00:40:01,820 --> 00:40:07,320 is to complete that to any matrix A. 588 00:40:07,320 --> 00:40:10,080 So that's the statement, then. 589 00:40:10,080 --> 00:40:11,342 That's the statement. 590 00:40:20,500 --> 00:40:21,610 So that's the statement. 591 00:40:21,610 --> 00:40:29,820 For any matrix A, this matrix, A transpose A plus delta 592 00:40:29,820 --> 00:40:35,010 squared I, inverse, times A transpose-- 593 00:40:35,010 --> 00:40:37,455 that's the solution matrix to our problem. 594 00:40:40,700 --> 00:40:42,440 That's what I wrote down up there. 595 00:40:42,440 --> 00:40:45,560 I take the inverse and pop it over there. 596 00:40:45,560 --> 00:40:54,020 That approaches A plus, the pseudo inverse. 597 00:40:59,480 --> 00:41:02,380 And that's what we just checked for 1 by 1. 598 00:41:02,380 --> 00:41:07,220 For 1 by 1, this was sigma over sigma 599 00:41:07,220 --> 00:41:09,200 squared plus delta squared. 600 00:41:09,200 --> 00:41:18,100 And it went either to 1 over sigma or to 0. 601 00:41:18,100 --> 00:41:20,370 It split in the limit. 602 00:41:20,370 --> 00:41:23,460 It shows that limits can be delicate. 
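For a general matrix the statement can at least be tested, even without the proof. A sketch with a deliberately rank-deficient A (made up here), where A transpose A by itself is singular but the penalized version is invertible:

```python
import numpy as np

# (A^T A + delta^2 I)^{-1} A^T  should approach  A^+  as delta -> 0.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])           # rank 1, so A^T A is singular

n = A.shape[1]
delta = 1e-3                          # small, but not yet at the cliff edge
approx = np.linalg.solve(A.T @ A + delta**2 * np.eye(n), A.T)

# Compare with the pseudo inverse, which NumPy computes from the SVD.
print(np.allclose(approx, np.linalg.pinv(A), atol=1e-6))   # True
```

With delta exactly 0 the solve would fail; the delta squared term keeps the problem positive definite until the limit is taken.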
603 00:41:23,460 --> 00:41:26,550 The limit-- as delta goes to 0, this thing 604 00:41:26,550 --> 00:41:28,920 is suddenly discontinuous. 605 00:41:28,920 --> 00:41:31,200 It's this number that is growing, 606 00:41:31,200 --> 00:41:35,100 and then suddenly, at 0, it falls back to 0. 607 00:41:35,100 --> 00:41:38,460 Anyway, that would be the statement. 608 00:41:38,460 --> 00:41:42,810 Actually, statisticians discovered the pseudo inverse 609 00:41:42,810 --> 00:41:49,620 independently of the linear algebra history of it, 610 00:41:49,620 --> 00:41:54,330 because statisticians did exactly that. 611 00:41:54,330 --> 00:41:58,620 To regularize the problem, they introduced a penalty 612 00:41:58,620 --> 00:42:01,390 and worked with this matrix. 613 00:42:01,390 --> 00:42:07,980 So statisticians were the first to think 614 00:42:07,980 --> 00:42:13,642 of that as a natural thing to do in a practical case-- 615 00:42:13,642 --> 00:42:14,225 add a penalty. 616 00:42:19,020 --> 00:42:23,130 So this is adding a penalty, but remember 617 00:42:23,130 --> 00:42:30,730 that we stayed with L2 norms, staying with L2, least squares. 618 00:42:38,000 --> 00:42:41,090 We could ask, what happens? 619 00:42:41,090 --> 00:42:45,050 Suppose the penalty is the L1 norm. 620 00:42:48,200 --> 00:42:50,840 I'm not up to do this today. 621 00:42:50,840 --> 00:42:52,850 Suppose I minimize that. 622 00:42:52,850 --> 00:43:01,666 Maybe I'll do L2, but I'll do the penalty guy in the L1 norm. 623 00:43:07,850 --> 00:43:11,000 I'm certainly not an expert on that. 624 00:43:11,000 --> 00:43:15,290 Or you could even think just that power. 625 00:43:15,290 --> 00:43:18,680 So that would have a name. 626 00:43:18,680 --> 00:43:21,590 A statistician invented this. 627 00:43:21,590 --> 00:43:27,100 It's called the Lasso in the L1 norm, and it's a big deal. 628 00:43:27,100 --> 00:43:36,600 Statisticians like the L1 norm, because it 629 00:43:36,600 --> 00:43:38,010 gives sparse solutions. 
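The 1 by 1 version of the lasso can be written down in closed form, and it shows where the sparsity comes from. This soft-thresholding formula is the standard textbook solution, sketched here as an aside rather than taken from the lecture; the numbers are made up:

```python
import numpy as np

# Minimize (sigma*x - b)^2 + delta*|x| for a single unknown x.
# Setting the derivative to 0 on each side of x = 0 gives a
# soft threshold: shrink toward 0, and stop at 0 exactly.
def lasso_1by1(sigma, b, delta):
    t = sigma * b                             # unpenalized right-hand side
    return np.sign(t) * max(abs(t) - delta / 2.0, 0.0) / sigma**2

print(lasso_1by1(1.0, 2.0, 0.5))   # 1.75 -- shrunk a little, still nonzero
print(lasso_1by1(1.0, 0.2, 0.5))   # 0.0  -- small coefficients go exactly to 0
```

The delta squared x squared penalty shrinks every component a little; the |x| penalty pushes small components exactly to zero, which is the sparsity statisticians like.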
630 00:43:38,010 --> 00:43:41,700 It gives more genuine solutions without a whole lot 631 00:43:41,700 --> 00:43:46,320 of little components in the answer. 632 00:43:46,320 --> 00:43:48,660 So this was an important step. 633 00:43:52,780 --> 00:43:56,695 Let me just say again where we are in that big list. 634 00:44:00,910 --> 00:44:04,970 The two important ones that I haven't done yet 635 00:44:04,970 --> 00:44:08,900 are these iterative methods in 2.1. 636 00:44:08,900 --> 00:44:12,590 So that's like conventional linear algebra, 637 00:44:12,590 --> 00:44:15,530 just how to deal with a big matrix, 638 00:44:15,530 --> 00:44:17,390 maybe with some special structure. 639 00:44:17,390 --> 00:44:21,740 That's what numerical linear algebra is all about. 640 00:44:21,740 --> 00:44:27,790 And then Gram-Schmidt with or without pivoting, 641 00:44:27,790 --> 00:44:32,410 which is a workhorse of numerical computing, 642 00:44:32,410 --> 00:44:37,350 and I think I better save that for next time. 643 00:44:37,350 --> 00:44:42,870 So this is the one I picked for this time. 644 00:44:42,870 --> 00:44:47,010 And we saw what happened in L2. 645 00:44:47,010 --> 00:44:49,290 Well, we saw it for 1 by 1. 646 00:44:49,290 --> 00:44:56,740 Would you want to extend to prove this for any A, 647 00:44:56,740 --> 00:45:00,210 going beyond 1 by 1? 648 00:45:00,210 --> 00:45:05,735 How would you prove such a thing for any A? 649 00:45:05,735 --> 00:45:10,710 I guess I'm not going to do it. 650 00:45:10,710 --> 00:45:17,930 It's too painful, but how would you do it? 651 00:45:17,930 --> 00:45:20,330 You would use the SVD. 652 00:45:20,330 --> 00:45:23,780 If you want to prove something about matrices, about 653 00:45:23,780 --> 00:45:28,230 any matrix, the SVD is the best thing 654 00:45:28,230 --> 00:45:30,210 you could have-- the best tool you could have. 655 00:45:30,210 --> 00:45:34,810 I can write this in terms of the SVD. 
656 00:45:34,810 --> 00:45:40,830 I just plug in A equals whatever the SVD tells 657 00:45:40,830 --> 00:45:41,850 me to put in there. 658 00:45:41,850 --> 00:45:46,600 U sigma V transpose. 659 00:45:46,600 --> 00:45:50,800 Plug it in there, simplify it using the fact 660 00:45:50,800 --> 00:45:54,370 that these are orthogonal. 661 00:45:54,370 --> 00:45:58,000 If I have any good luck, I'll get an identity 662 00:45:58,000 --> 00:46:01,110 somewhere from there and an identity somewhere from there. 663 00:46:04,400 --> 00:46:05,900 And it will all simplify. 664 00:46:05,900 --> 00:46:09,480 It will all diagonalize. 665 00:46:09,480 --> 00:46:13,710 What the SVD really does is turn my messy problem 666 00:46:13,710 --> 00:46:17,310 into a problem about the diagonal matrix, sigma, 667 00:46:17,310 --> 00:46:18,180 in the middle. 668 00:46:18,180 --> 00:46:20,330 So I might as well put sigma in the middle. 669 00:46:20,330 --> 00:46:21,420 Yeah, why not? 670 00:46:21,420 --> 00:46:23,627 Before we give up on it-- 671 00:46:26,970 --> 00:46:32,340 a special case of that, but really, the genuine case 672 00:46:32,340 --> 00:46:34,350 would be when A is sigma. 673 00:46:34,350 --> 00:46:41,580 Sigma transpose sigma plus delta squared I inverse times 674 00:46:41,580 --> 00:46:49,820 sigma transpose approaches the pseudo inverse, sigma plus. 675 00:46:49,820 --> 00:46:52,540 And the point is the matrix sigma here is diagonal. 676 00:46:55,840 --> 00:46:59,920 Oh, I'm practically there, actually. 677 00:46:59,920 --> 00:47:06,390 Why am I close to being able to read this off? 678 00:47:06,390 --> 00:47:08,580 Well, everything is diagonal here. 679 00:47:08,580 --> 00:47:10,320 Diagonal, diagonal, diagonal. 680 00:47:13,340 --> 00:47:16,085 And what's happening on those diagonal entries? 
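That separation can be stated as an identity and checked directly: plugging A = U Sigma V transpose into the penalized solution matrix moves U and V transpose to the outside, leaving the diagonal problem in the middle. A sketch with a made-up A:

```python
import numpy as np

# Identity behind the proof sketch:
#   (A^T A + d^2 I)^{-1} A^T  =  V (S^T S + d^2 I)^{-1} S^T U^T,
# so the penalized inverse acts one singular value at a time.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
delta = 0.3
m, n = A.shape

U, s, Vt = np.linalg.svd(A)          # full SVD: U is 5x5, Vt is 3x3
S = np.zeros((m, n))
S[:n, :n] = np.diag(s)               # rectangular diagonal Sigma

lhs = np.linalg.solve(A.T @ A + delta**2 * np.eye(n), A.T)
rhs = Vt.T @ np.linalg.solve(S.T @ S + delta**2 * np.eye(n), S.T) @ U.T
print(np.allclose(lhs, rhs))         # True: U and V carry no real work
```

Each diagonal entry of the middle factor is sigma over sigma squared plus delta squared, which is exactly the 1 by 1 case.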
681 00:47:20,690 --> 00:47:25,330 So you had to take my word that when I plugged in the SVD, 682 00:47:25,330 --> 00:47:30,050 the U and the V got separated out to the far left 683 00:47:30,050 --> 00:47:31,220 and the far right. 684 00:47:31,220 --> 00:47:35,940 And it was that that stayed in the middle. 685 00:47:35,940 --> 00:47:38,920 So really, this is the heart of it. 686 00:47:38,920 --> 00:47:46,570 And say, well, that's a diagonal matrix. 687 00:47:46,570 --> 00:47:52,010 So I'm just looking at what happens on each diagonal entry, 688 00:47:52,010 --> 00:47:55,650 and which problem is that? 689 00:47:55,650 --> 00:47:59,520 The question of what's happening on a typical diagonal entry 690 00:47:59,520 --> 00:48:05,020 of this thing is what question? 691 00:48:05,020 --> 00:48:07,880 The 1 by 1 case! 692 00:48:07,880 --> 00:48:11,660 The 1 by 1, because each entry in the diagonal 693 00:48:11,660 --> 00:48:15,980 is not even noticing the others. 694 00:48:15,980 --> 00:48:19,940 So that's the logic, and it would be in the notes. 695 00:48:19,940 --> 00:48:27,450 Prove it first for 1 by 1, then secondly for a diagonal sigma, 696 00:48:27,450 --> 00:48:33,750 and finally for any A, using the SVD, with the U 697 00:48:33,750 --> 00:48:37,560 and V transpose getting out of the way 698 00:48:37,560 --> 00:48:39,720 and bringing us back to here. 699 00:48:39,720 --> 00:48:45,750 So that's the theory, but really, I 700 00:48:45,750 --> 00:48:50,910 guess I'm thinking that by far the most important message 701 00:48:50,910 --> 00:48:57,840 in today's lecture is in this list of different types 702 00:48:57,840 --> 00:49:01,830 of problems that appear and different ways 703 00:49:01,830 --> 00:49:03,630 to work with them. 704 00:49:03,630 --> 00:49:08,250 And we haven't done Gram-Schmidt, 705 00:49:08,250 --> 00:49:10,920 and we haven't done iteration. 
706 00:49:10,920 --> 00:49:16,470 So this chapter is a survey of-- 707 00:49:16,470 --> 00:49:20,160 well, more than a survey of what numerical linear algebra 708 00:49:20,160 --> 00:49:20,760 is about. 709 00:49:20,760 --> 00:49:22,370 And I haven't done random, yet. 710 00:49:22,370 --> 00:49:23,640 Sorry, that's coming, too. 711 00:49:26,240 --> 00:49:29,120 So three pieces are still to come, 712 00:49:29,120 --> 00:49:35,252 but let's take the last two minutes off and call it a day.