The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

GILBERT STRANG: OK, let me start one minute early. So this being MIT, I just came from a terrific faculty member, Andrew Lo in the Sloan School, and I have to tell you what he told us. And then I had to leave before he could explain why it's true, but this is like an amazing fact which I don't want to forget, so here you go. Everything will be on that board. So it's an observation about us, or other people -- maybe not us.

So suppose you have a biased coin. Maybe the people playing this game don't know, but it's 75% likely to produce heads, 25% likely to produce tails. And then the player has to guess, for one flip after another, heads or tails, and you get $1 if you're right, you pay $1 if you're wrong. So you just want to get as many right choices as possible from this coin flip that continues. So what should you do?
Well, what I hope we would do is, we would not know what the probabilities were, so we would guess maybe heads the first time, tails the second time, heads the third time, and so on. But the actual result would be mostly heads, so we would learn at some point that -- maybe not quite as soon as that. We would eventually learn that we should keep guessing heads, right? And that would be our optimal strategy, to guess heads all the time.

But what do people actually do? They start like this, the same way, and then they're beginning to learn that heads is more common. So maybe they do more heads than tails, but sometimes tails is right, and then after a little while, they maybe see that it's -- yeah. Well, maybe they're not counting, they're just operating like ordinary people. And what do ordinary people actually do in the long run? You would think guess heads every time, right? But they don't. In the long run, people -- and maybe animals and whatever -- guess heads three quarters of the time and tails one quarter of the time. Isn't that unbelievable?
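[The gap between the two strategies is easy to check with expected values. A quick sketch, not part of the lecture, assuming the $1 / -$1 payoff just described:]

```python
# Expected winnings per flip, for a coin with P(heads) = 0.75 and the
# +$1 / -$1 payoff described above.
p = 0.75

# Strategy 1: always guess heads (the optimal strategy).
win_always_heads = p - (1 - p)                 # 0.5 dollars per flip

# Strategy 2: "probability matching" -- guess heads 75% of the time,
# tails 25% of the time, independently of the coin.
p_correct = p * p + (1 - p) * (1 - p)          # 0.625
win_matching = p_correct - (1 - p_correct)     # 0.25 dollars per flip

print(win_always_heads, win_matching)          # 0.5 0.25
```

So the matching strategy gives up half the expected winnings, which is what makes the observed behavior so surprising.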
They're guessing tails a quarter of the time when the odds are never changing. Anyway, that's something that economists and other people have to explain, and if I had been able to stay another hour, I could tell you about the explanation. Oh, I see I've written that on a board that I have no way to bury, so it's going to be there, and it's not the subject of 18.065, but it's kind of amazing. All right, so there's good math problems everywhere. OK. Can I just leave you with what I know, and if I learn more, I'll come back to that question. OK.

Please turn attention this way, right? Norms. A few words on norms -- that should be a word in your language. And so you should know what it means, and you should know a few of the important norms. Again, a norm is a way to measure the size of a vector, or the size of a matrix, or the size of a tensor, whatever we have. Or a function. Very important. We might ask for the norm of a function like sine x. From 0 to pi, what would be the size of that function?
Well, if it was 2 sine x, the size would be twice as much, so the norm should reflect that.

So yesterday -- or Wednesday -- I told you about p equal to 2, 1, and actually infinity, and then I'm going to put in the 0 norm with a question mark, because you'll see that it has a problem. But let me just recall from last time. So p equal to 2 is the usual square root of the sum of squares -- the usual length of a vector. p equal to 1 is this very important norm, so I would call that the l1 norm, and we'll see a lot of that. I mentioned that it plays a very significant part now in compressed sensing. It really was a bombshell in signal processing to discover -- and in other fields, too -- that some things really work best in the l1 norm. The maximum norm has a natural part to play, and we'll see that, or its matrix analog.

So I didn't mention the l0 norm. All this lp business. So the lp norm, for any p, is: you take the absolute values of the components to the pth power, add them up -- up here, p was 2 -- and you take the pth root. So maybe I should write it to the power 1/p.
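[The lp formula just stated is a few lines of code. A sketch, not part of the lecture, which also previews the point made next about large p picking out the biggest component:]

```python
import numpy as np

v = np.array([3.0, -4.0])

def lp_norm(v, p):
    # The lp norm: (sum of |v_i|^p) to the power 1/p.
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

print(lp_norm(v, 1))       # 7.0 -- sum of absolute values
print(lp_norm(v, 2))       # 5.0 -- the usual Euclidean length

# As p grows, whichever component is biggest takes over: the max norm.
print(lp_norm(v, 100))     # very close to 4.0
print(np.max(np.abs(v)))   # 4.0
```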
Then that way, taking pth powers and pth roots, we do get that the norm of 2v has a factor 2 compared to the norm of v. So for p equal to 2, you see it -- we've got it right there. For p equal to 1, you see it here, because it's just the sum of the absolute values. For p equal to infinity, if I move p up and up and up, it will pick out -- as I increase p, whichever component is biggest is going to just take over, and that's why you get the max norm.

Now the zero norm, where I'm using that word improperly, as you'll see. So what is the zero norm? So let me write it down. It's the number of non-zero components. It's the thing that you'd like to know about in questions of sparsity. Is there just one non-zero component? Are there 11? Are there 101? You might want to minimize that, because sparse vectors and sparse matrices are much faster to compute with. You've got good stuff. But now I claim that's not a norm -- the number of non-zero components -- because how does the zero norm of 2v compare with the zero norm of v?
It would be the same. 2v has the same number of non-zeros as v. So it violates the rule for a norm. So with these norms, and all the p's in between -- actually, the math papers are full of "let p be between 1 and infinity," because that's the range where you do have a proper norm, as we will see.

I think the good thing to do with these norms is to have a picture in your mind. The geometry of a norm is good. So the picture I'm going to suggest is: plot all the vectors, let's say in 2D. So two-dimensional space, R2. I want to plot the vectors that have norm of v equal to 1 in these different norms. So let me ask you -- here's 2D space, R2, and now I want to plot all the vectors whose ordinary l2 length equals 1. So what does that picture look like? I just think a picture is really worth something. It's a circle, thanks. It's a circle. This circle has the equation, of course, v1 squared plus v2 squared equal to 1. So I would call that the unit ball for the norm, and here it's a circle.
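[The homogeneity failure just described can be verified directly. A small sketch, not in the lecture:]

```python
import numpy as np

def zero_norm(v):
    # The "zero norm": the number of non-zero components.
    return int(np.count_nonzero(v))

v = np.array([1.0, 0.0, -2.0, 0.0])

# A true norm must scale: norm of 2v equals 2 times norm of v.
# The zero norm does not -- scaling changes nothing:
print(zero_norm(v), zero_norm(2 * v))   # 2 2
```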
OK, now here comes something more interesting. What about the l1 norm, though? So again, tell me how to plot all the points that have |v1| plus |v2| equal to 1. What's the boundary going to look like now? Let's see. Well, I can put down a certain number of points: there up at 1, and there at 1, and there at minus 1, and there at minus 1. That would reflect the vector (1, 0), and this would reflect the vector (0, minus 1). So yeah. OK. So those are four points, easy to plot. Easy to see the l1 norm. But what's the rest of the boundary here? It's a diamond, good. It's a diamond. We have something linear set equal to 1. Up here in the positive quadrant, it's just v1 plus v2 equal to 1, and the graph of that is a straight line. So all these guys -- this is all the points with |v1| plus |v2| equal to 1. And over here and over here and over here. So the unit ball in the l1 norm is a diamond. And that's a very important picture.
It reflects in a very simple way something important about the l1 norm, and the reason it's just exploded in importance. Let me continue, though. What about the max norm -- v max, or v infinity, equal to 1? So again, let me plot these guys, and these four points are certainly going to be in it again, because plus or minus i and plus or minus j are good friends. What's the rest of the boundary look like now? Now this means the max of the |v_i|'s equal to 1. So what are the rest of the points? You see, it does take a little thought, but then you get it and you don't forget it.

OK, so what's up? I'm looking. So suppose the maximum is v1. I think it's going to look like that, out to (1, 0) and up to (0, 1). And up here, the vector would be something like (0.4, 1), so the maximum would be 1. Is that OK? So what you really see, as you change this number p: you start with p equal to 1, where you have a diamond, and it kind of swells out to be a circle at p equal to 2, and then it kind of keeps swelling to be a square at p equal to infinity.
That's an interesting thing. And yeah. Now, what's the problem with the zero norm? This is the number of non-zeros. OK, let me draw it. Where are the points with one non-zero? So I'm plotting the unit ball. Where are the vectors in this picture that have one non-zero? Not zero non-zeros -- so the origin is not included. So what do I have? I'm not allowed the vector (1/3, 2/3), because that has two non-zeros. So where are the points with only one non-zero? Yeah, on the axes, yeah. That tells you. So it can be there and there -- oops, without that guy at the origin. And of course those just keep going out. So it totally violates the rules.

So maybe the point that I should make about these figures -- so, like, what's happening when I go down to zero? And really, that figure should be at the other end, right? Oh no, shoot. This guy's in the middle. This is a badly drawn figure. l2 is kind of the center guy.
l1 is at one end, l infinity is at the other end, and this one has gone off the end at the left there. Yeah, what's happened here: as p goes down towards zero, none of these will be OK. These balls, these sets, will lose weight. So they'll always have these four points in, but they'll be like this, and then like this, and then finally in the unacceptable limit -- but none of those is any good either. This one was for p equal to 1/2, let's say. That's p equal to 1/2, and that's not a good norm.

Yeah. So there's a property of the circle, the diamond, and the square -- a nice math property of those three sets -- that is not possessed by this one. As this thing loses weight, I lose the property. And then of course it's totally lost over there. Do you know what that property would be? It's what? Concave, convex? Convex, I would say. Convex. This is a true norm when the unit ball is convex. Well, maybe for the ball I'm taking all the v's with norm less than or equal to 1. Yeah, so I'm allowing the insides of these shapes. So this is not a convex set.
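[The convexity failure for p below 1 is easy to check numerically. A sketch, mine rather than the lecture's, for p equal to 1/2:]

```python
def p_half(v):
    # For p = 1/2: (sum of |v_i|^(1/2))^2.  Not a true norm.
    return sum(abs(x) ** 0.5 for x in v) ** 2

u, w, mid = (1.0, 0.0), (0.0, 1.0), (0.5, 0.5)

# u and w sit on the unit "ball", but their midpoint lies outside it,
# so the ball is not convex -- and the triangle inequality fails too:
print(p_half(u), p_half(w))   # 1.0 1.0
print(p_half(mid))            # about 2: bigger than 1, so outside the ball
print(p_half((1.0, 1.0)))     # 4.0, exceeding p_half(u) + p_half(w) = 2.0
```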
That set, which I should maybe draw -- so not convex would be this one, like so. And that reflects the fact that the rules for a norm are broken: the triangle inequality is probably broken, and other stuff, yeah. I think that's sort of worth remembering.

And then one more norm that's natural to think about. So S, as in the Piazza question -- S does always represent a symmetric matrix in 18.065. And now my norm is going to be -- I'm going to call it the S norm. So actually, it's a positive definite symmetric matrix. S is a positive definite symmetric matrix. And what do I do? I'll take v transpose S v. OK, what's our word for that? The energy. That's the energy in the vector v. And I'll take the square root, so that I now have the right scaling if I double v, from v to 2v. Then I get a 2 here and a 2 here, and when I take the square root, I get an overall 2, and that's what I want. I want the norm to grow linearly with the 2 or 3 or whatever I multiply by. But what is the shape of this thing?
So what is the shape of -- let me put it on this board. I'm going to get a picture like that. So what is the shape of v transpose S v equal to 1, or less than or equal to 1? This S is symmetric positive definite -- people use those three letters, SPD, to tell us. I'm claiming that we get a bunch of norms. When do we get the l2 norm? What matrix S would give us the l2 norm? The identity, certainly. Now, what's going to happen if I use some different matrix S? This circle is going to change shape. I might have a different norm, depending on S. And a typical case would be S equal to the diagonal matrix with entries 2 and 3, say. That's a positive definite symmetric matrix. And now I would be drawing the graph of 2 v1 squared plus 3 v2 squared -- that would be the energy, right? -- equal to 1. And I just want you to tell me what shape that is. So that's a perfectly good norm. You could check all its properties; they all come out easily. But I get a new picture -- a new norm that's kind of adjustable. You could say it's a weighted norm.
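[The S norm with the 2, 3 example can be checked in a few lines. A sketch, not from the lecture, confirming the scaling and where the unit ball crosses the axes:]

```python
import numpy as np

S = np.array([[2.0, 0.0],
              [0.0, 3.0]])   # positive definite symmetric: the 2, 3 example

def s_norm(v):
    # The energy norm: square root of v^T S v.
    return float(np.sqrt(v @ S @ v))

v = np.array([1.0, 1.0])

# Scaling works: the square root turns the factor 4 back into a 2.
print(s_norm(2 * v), 2 * s_norm(v))

# The unit ball 2 v1^2 + 3 v2^2 = 1 crosses the axes at 1/sqrt(2) and
# 1/sqrt(3): the larger weight 3 means you can't go as far that way.
print(s_norm(np.array([1 / np.sqrt(2), 0.0])))
print(s_norm(np.array([0.0, 1 / np.sqrt(3)])))
```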
Weights mean that you have picked some numbers appropriate to the particular problem. Well, suppose those numbers are 2 and 3. What shape is the unit ball in this S norm? It's an ellipse, right. It's an ellipse. And I guess the larger number, 3, will mean you can't go as far as with the smaller number, 2. I think it would probably be an ellipse like this, and the axis lengths of the ellipse would have something to do with the 2 and the 3.

OK, so now you know really all the vector norms that are sort of naturally used. These come up in a natural way. As we said, the identity matrix brings us back to the 2 norm, so these are all sort of variations on the 2 norm. And these lp norms are variations as p runs from 1 up to 2 and on to infinity, and is not allowed to go below 1. OK, that's norms.

And then maybe you can actually see from this picture -- here is a, like, somewhat hokey idea of why it is that minimizing in this norm -- so what do I mean by that? Here would be a typical problem.
Minimize, subject to Ax equal to b, the l1 norm of x -- sorry, I'm using x now instead of v. So that would be an important problem. Actually, it has a name. People have spent a lot of time thinking of a fast way to solve it. It's almost like least squares. What would make it more like least squares would be to change that 1 to a 2. Yeah.

Can I just sort of sketch, without making a big argument here, the difference between p equal to 1 or 2 here? Yeah, I'll just draw a picture. Now I'll erase this ellipse, but you won't forget. OK. So this is our problem. With l1, it has a famous name: basis pursuit. Well, famous to people who work in optimization. For l2, it has an important name, too. Well, it's sort of like least squares: ridge regression. This is like a beautiful model problem.

Among all solutions to Ax equal to b -- suppose this is just one equation, like c1 x1 plus c2 x2 equals some right side, b. So the constraint says that the vectors x have to be on a line. Suppose that's the graph of that line.
So among all these x's, which one -- oh, I'm realizing what I'm going to say is going to be smart. I mean, it's going to be nice. Not going to be difficult. Let's do the one we know best, l2. So here's a picture of the line. Let me make it a little more tilted, so you -- yeah, like 2, 3. OK. This is the xy plane. Here's x1, here's x2. Here are the points that satisfy my condition. Which point on that line minimizes -- has the smallest l2 norm? Which point on the line has the smallest l2 norm? Yeah, you're drawing the right figure with your hands. The smallest l2 norm -- l2, remember, is just how far out you go. It's circular here, so it doesn't matter what direction; they're all giving the same l2 norm, it's just how far. So we're looking for the closest point on the line, because we don't want to go any further. We want to go a minimum distance with -- I'm doing l2 now. So where is the point at minimum distance? Yeah, just show me again once more, with hands or whatever. It'll be that. I didn't want 45 degree angles there.
I'm going to erase it again and really -- this time, I'm going to get angles that are not 45 degrees. All right, brilliant. Got it. OK, that's my line. OK, and what's the nearest point in the l2 norm? Here's the winner in l2, right? The nearest point. Everybody sees that picture? So that's a basic picture for minimizing something with a constraint, which is the fundamental problem of optimization, of neural nets, of everything, really. Of life. Well, I'm getting philosophical. But the question always is -- and maybe it's true in life, too -- which norm are you using?

OK, now that was the minimum in l2. That's the shortest distance, where distance means what we usually think of it as meaning. But now, let's go for the l1 norm. Which point on the line has the smallest l1 norm? So now I'm going to add the two components. So if this is some point (a, 0), and this is some point (0, b) right there -- so those two points are obviously important. And that point, we could figure out the formula for, because we know what the geometry is.
But I've just put those two points in. So did I get (0, b)? Yeah, that's a zero. So let me just ask you the question: what point on that line has the smallest l1 norm? Which has the smallest l1 norm? Somebody said it. Just say it a little louder, so that you're on tape forever.

AUDIENCE: (0, b).

GILBERT STRANG: (0, b), this point. That's the winner. This is the l1 winner, and this was the l2 winner. And notice what I said earlier -- and I didn't see it coming, but now I realize this is a figure to put in the notes -- the winner has the most zeros. It's the sparsest vector. Well, out of two components, it didn't have much freedom, but it has a zero component. It's on the axes. It's the things on the axes that have the smallest number of non-zero components. So yeah, this is the picture in two dimensions. So I'm in 2D. And you can see that the winner has a zero component, yeah. And that's a fact that extends into higher dimensions too, and that makes the l1 norm special, as I've said. Yeah. Is there more to say about that example?
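[The 2D example can be reproduced numerically. The sketch below is not from the lecture; it picks a hypothetical line 3 x1 + 4 x2 = 1 and searches along it, so the specific numbers are illustrative only:]

```python
import numpy as np

# One equation c1*x1 + c2*x2 = b: the solutions x lie on a line.
c = np.array([3.0, 4.0])   # hypothetical coefficients
b = 1.0

# l2 winner: the closest point on the line to the origin, c * b / ||c||^2.
x_l2 = c * b / (c @ c)                             # (0.12, 0.16), both nonzero

# l1 winner: search along the line x(t) = x_l2 + t*d numerically.
d = np.array([-c[1], c[0]]) / np.linalg.norm(c)    # direction of the line
ts = np.linspace(-1.0, 1.0, 200001)
points = x_l2[None, :] + ts[:, None] * d[None, :]
x_l1 = points[np.argmin(np.abs(points).sum(axis=1))]

print(x_l2)   # both components nonzero
print(x_l1)   # approximately (0, 0.25): on an axis, the sparse winner
```

The l1 winner lands on the x2 axis (the point (0, b) of the lecture's picture), with a strictly smaller l1 norm (0.25) than the l2 winner's (0.28).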
462 00:30:00,130 --> 00:30:07,040 For a simple 2D question, that really makes the point 463 00:30:07,040 --> 00:30:09,200 that the l1 winner is there. 464 00:30:09,200 --> 00:30:10,070 It's not further. 465 00:30:10,070 --> 00:30:12,040 You don't go further up the line, right? 466 00:30:12,040 --> 00:30:17,530 Because that's bad in all ways. 467 00:30:17,530 --> 00:30:20,420 When you go up further, you're adding 468 00:30:20,420 --> 00:30:23,960 some non-zero first component and you're 469 00:30:23,960 --> 00:30:27,000 increasing the non-zero second component, 470 00:30:27,000 --> 00:30:29,840 so that's a bad idea. 471 00:30:29,840 --> 00:30:30,870 That's a bad idea. 472 00:30:30,870 --> 00:30:32,480 This is the winner. 473 00:30:32,480 --> 00:30:38,570 And in a way, here's the picture. 474 00:30:38,570 --> 00:30:40,490 Oh yeah. 475 00:30:40,490 --> 00:30:43,340 I should prepare these lectures, but this one's 476 00:30:43,340 --> 00:30:45,050 coming out all right anyway. 477 00:30:45,050 --> 00:30:49,940 So the picture there is that the growing l1 ball hits 478 00:30:49,940 --> 00:30:52,470 at that point. 479 00:30:52,470 --> 00:30:53,640 And what is it? 480 00:30:53,640 --> 00:30:54,540 Can you see that? 481 00:30:54,540 --> 00:30:57,840 So that star is outside the circle. 482 00:30:57,840 --> 00:31:07,260 This is the l1 winner and that's the blow-up of the l1 ball 483 00:31:07,260 --> 00:31:08,730 until it hits. 484 00:31:08,730 --> 00:31:13,980 That's the point where the l1 ball hits. 485 00:31:13,980 --> 00:31:16,260 Do you see it? 486 00:31:16,260 --> 00:31:20,970 Just give it a little thought, that another geometric way 487 00:31:20,970 --> 00:31:23,640 to see the answer to this problem is, 488 00:31:23,640 --> 00:31:26,080 you start at the origin and you blow up 489 00:31:26,080 --> 00:31:30,450 the norm until you get a point on the line that 490 00:31:30,450 --> 00:31:32,280 satisfies your constraint.
491 00:31:32,280 --> 00:31:36,870 And because you were blowing up the norm, when it hits first, 492 00:31:36,870 --> 00:31:39,350 that's the smallest blow-up possible. 493 00:31:39,350 --> 00:31:42,970 That's the guy that minimizes. 494 00:31:42,970 --> 00:31:44,820 Yeah, so just think about that picture 495 00:31:44,820 --> 00:31:49,760 and I'll draw it better somewhere, too. 496 00:31:49,760 --> 00:31:52,810 Well that's vector norms. 497 00:31:55,520 --> 00:31:59,420 And then I introduce some matrix norms, and let me just 498 00:31:59,420 --> 00:32:00,740 say a word about those. 499 00:32:05,460 --> 00:32:08,410 OK, a word about matrix norms. 500 00:32:08,410 --> 00:32:16,720 So the matrix norms were the-- 501 00:32:16,720 --> 00:32:25,050 so now I have a matrix A and I want to define those same three 502 00:32:25,050 --> 00:32:27,870 norms again for a matrix. 503 00:32:27,870 --> 00:32:34,340 And this was the 2 norm, and what 504 00:32:34,340 --> 00:32:36,660 was the 2 norm of a matrix? 505 00:32:36,660 --> 00:32:45,710 Well it was sigma 1, it turned out to be. 506 00:32:45,710 --> 00:32:47,900 So that doesn't define it. 507 00:32:47,900 --> 00:32:49,220 Or we could define it. 508 00:32:49,220 --> 00:32:51,800 Just say, OK, the largest singular value 509 00:32:51,800 --> 00:32:54,320 is the 2 norm of the matrix. 510 00:32:54,320 --> 00:32:56,350 But actually, it comes from somewhere. 511 00:32:56,350 --> 00:33:04,670 So I want to speak about this one first, the 2 norm. 512 00:33:04,670 --> 00:33:09,070 So it's the 2 norm of a matrix, and one way 513 00:33:09,070 --> 00:33:19,500 to see the 2 norm of a matrix is to connect it 514 00:33:19,500 --> 00:33:21,150 to the 2 norm of vectors. 515 00:33:23,790 --> 00:33:26,220 I'd like to connect the 2 norm of matrices 516 00:33:26,220 --> 00:33:29,400 to the 2 norm of vectors. 517 00:33:29,400 --> 00:33:32,400 And how shall I do that? 
518 00:33:32,400 --> 00:33:39,020 I think I'm going to look at the 2 norm of Ax 519 00:33:39,020 --> 00:33:43,420 over the 2 norm of x. 520 00:33:43,420 --> 00:33:49,370 So in a way, to me, that ratio is like the blow-up factor. 521 00:33:49,370 --> 00:33:51,880 If A was seven times the identity, 522 00:33:51,880 --> 00:33:53,690 to take an easy case-- 523 00:33:53,690 --> 00:33:55,880 if A is seven times the identity, 524 00:33:55,880 --> 00:33:57,140 what will that ratio be? 525 00:34:00,380 --> 00:34:01,760 Say it, yeah. 526 00:34:01,760 --> 00:34:03,340 Seven. 527 00:34:03,340 --> 00:34:09,580 If A is 7I, this will be 7x and this will be x, and norms, 528 00:34:09,580 --> 00:34:14,662 the factor seven comes out, so that ratio will be seven. 529 00:34:14,662 --> 00:34:16,600 OK. 530 00:34:16,600 --> 00:34:19,989 For me, the norm is-- 531 00:34:19,989 --> 00:34:21,174 that's the blow-up factor. 532 00:34:23,995 --> 00:34:29,130 So here's the idea of a matrix norm. 533 00:34:29,130 --> 00:34:31,159 Now I'm doing matrix. 534 00:34:31,159 --> 00:34:34,998 Matrix norm from vector norm. 535 00:34:38,830 --> 00:34:42,770 And the answer will be the maximum blow-up. 536 00:34:46,210 --> 00:34:48,770 The maximum of this ratio. 537 00:34:48,770 --> 00:34:50,909 I call that ratio the blow-up factor. 538 00:34:50,909 --> 00:34:53,210 That's just a made-up name. 539 00:34:53,210 --> 00:34:57,731 The maximum over all x. 540 00:34:57,731 --> 00:34:58,760 All of x. 541 00:34:58,760 --> 00:35:02,590 I look to see which vector gets blown up the most 542 00:35:02,590 --> 00:35:09,670 and that is the norm of the matrix. 543 00:35:09,670 --> 00:35:12,460 I've settled on norms of vectors. 544 00:35:12,460 --> 00:35:15,520 That's done upstairs there. 545 00:35:15,520 --> 00:35:18,580 Now I'm looking at norms of matrices. 546 00:35:18,580 --> 00:35:22,660 And this is one way to get a good norm of a matrix that 547 00:35:22,660 --> 00:35:24,760 kind of comes from the 2 norm.
548 00:35:24,760 --> 00:35:27,520 So there would be other norms for matrices coming 549 00:35:27,520 --> 00:35:31,300 from other vector norms, and those, we haven't seen, 550 00:35:31,300 --> 00:35:35,600 but the 2 norm is a very important one. 551 00:35:35,600 --> 00:35:40,030 So what is the maximum value of this? 552 00:35:40,030 --> 00:35:43,433 Of that ratio for a matrix A? 553 00:35:43,433 --> 00:35:47,890 The claim is that it's sigma 1. 554 00:35:47,890 --> 00:35:49,880 I'll just put a big equality there. 555 00:35:53,090 --> 00:35:58,372 Now, can we see, why is sigma 1 the answer to this problem? 556 00:36:03,200 --> 00:36:05,400 I can see a couple of ways to think about that 557 00:36:05,400 --> 00:36:07,220 but that's a very important fact. 558 00:36:07,220 --> 00:36:14,850 In fact, this is a way to discover what sigma 1 is 559 00:36:14,850 --> 00:36:16,650 without all the other sigmas. 560 00:36:16,650 --> 00:36:19,860 If I look for the x that has the biggest blow-up factor-- 561 00:36:19,860 --> 00:36:22,260 and by the way, which x will it be? 562 00:36:22,260 --> 00:36:27,630 Which x will win the max competition here and be sigma 563 00:36:27,630 --> 00:36:30,300 1 times as large as-- 564 00:36:30,300 --> 00:36:34,920 the ratio will be sigma 1. 565 00:36:34,920 --> 00:36:36,090 That will be sigma 1. 566 00:36:36,090 --> 00:36:39,540 When is this thing sigma 1 times as large as that? 567 00:36:39,540 --> 00:36:42,220 For which x? 568 00:36:42,220 --> 00:36:45,120 Not for an eigenvector. 569 00:36:45,120 --> 00:36:50,070 If x was an eigenvector, what would that ratio be? 570 00:36:50,070 --> 00:36:50,570 Lambda. 571 00:36:53,200 --> 00:36:56,680 But if A is not a symmetric matrix, 572 00:36:56,680 --> 00:37:03,170 maybe the eigenvectors don't tell you the exact way they go. 573 00:37:03,170 --> 00:37:06,090 So what vector would you now guess? 574 00:37:06,090 --> 00:37:10,460 It's not an eigenvector, it is a singular vector. 
575 00:37:10,460 --> 00:37:14,180 And which singular vector is it probably going to be? 576 00:37:14,180 --> 00:37:16,140 v1. 577 00:37:16,140 --> 00:37:17,550 Yeah, v1 makes sense. 578 00:37:17,550 --> 00:37:18,370 Winner. 579 00:37:18,370 --> 00:37:21,240 So the winner of this competition 580 00:37:21,240 --> 00:37:29,468 is x equals v1, the first right singular vector. 581 00:37:32,960 --> 00:37:34,700 And we better be able to check that. 582 00:37:34,700 --> 00:37:42,410 So again, this maximization problem, the answer 583 00:37:42,410 --> 00:37:46,440 is in terms of the singular vector. 584 00:37:46,440 --> 00:37:49,590 So that's a way to find this first singular vector 585 00:37:49,590 --> 00:37:52,130 without finding them all. 586 00:37:52,130 --> 00:37:55,850 And let's just plug in the first singular vector 587 00:37:55,850 --> 00:38:02,600 and see that the ratio is sigma 1. 588 00:38:02,600 --> 00:38:04,890 So now let me plug it in. 589 00:38:04,890 --> 00:38:06,290 So what do I have? 590 00:38:06,290 --> 00:38:12,910 I want Av1 over length of v1. 591 00:38:12,910 --> 00:38:14,750 OK. 592 00:38:14,750 --> 00:38:17,990 And I'm hoping to get that answer. 593 00:38:17,990 --> 00:38:20,450 Well what's the denominator here? 594 00:38:20,450 --> 00:38:23,835 The length of v1 is one. 595 00:38:23,835 --> 00:38:25,690 So no big deal there. 596 00:38:25,690 --> 00:38:27,350 That's one. 597 00:38:27,350 --> 00:38:29,605 What's the length of the top one? 598 00:38:32,530 --> 00:38:34,360 Now what is Av1? 599 00:38:34,360 --> 00:38:40,060 If v1 is the first right singular vector, then Av1 600 00:38:40,060 --> 00:38:45,558 is sigma 1 times u1. 601 00:38:45,558 --> 00:38:52,560 Remember, the singular vector relations were Av equals sigma u. 602 00:38:52,560 --> 00:38:58,740 Avk equals sigma k uk. 603 00:38:58,740 --> 00:38:59,890 You remember that. 604 00:38:59,890 --> 00:39:01,940 So they're not eigenvectors. 605 00:39:01,940 --> 00:39:03,120 They're singular vectors.
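That claim is easy to check numerically. Here is a minimal sketch with numpy (the random 3-by-3 matrix and the sampling of random directions are illustrative choices, not from the lecture): the blow-up ratio at v1 equals sigma 1, and no sampled direction beats it.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

U, S, Vt = np.linalg.svd(A)
sigma1 = S[0]
v1 = Vt[0]  # first right singular vector (the rows of Vt are the v's)

# Blow-up factor at v1: A v1 = sigma1 u1, and v1, u1 are unit vectors
ratio_at_v1 = np.linalg.norm(A @ v1) / np.linalg.norm(v1)
print(np.isclose(ratio_at_v1, sigma1))  # True

# No random direction gets blown up more than sigma1
ratios = [np.linalg.norm(A @ x) / np.linalg.norm(x)
          for x in rng.standard_normal((1000, 3))]
print(max(ratios) <= sigma1 + 1e-12)  # True
```

So maximizing the ratio really does pick out v1 and return sigma 1, with no need to compute the other singular values.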
606 00:39:03,120 --> 00:39:13,110 So the length of Av1 is the length of sigma 1 u1, and it's divided by 1. 607 00:39:13,110 --> 00:39:19,086 And of course, u1 is also a unit vector, so I just get sigma 1. 608 00:39:19,086 --> 00:39:19,586 OK. 609 00:39:23,250 --> 00:39:25,290 So that's another way to say that you 610 00:39:25,290 --> 00:39:30,420 can find sigma 1 by solving this maximum problem. 611 00:39:30,420 --> 00:39:33,030 And you get that sigma 1. 612 00:39:33,030 --> 00:39:35,200 OK. 613 00:39:35,200 --> 00:39:38,960 And I could get other matrix norms 614 00:39:38,960 --> 00:39:44,770 by maximizing that blow-up factor in that vector norm. 615 00:39:44,770 --> 00:39:50,750 I won't do that now, just to keep control of what we've got. 616 00:39:50,750 --> 00:39:55,970 Now what was the next matrix norm that came in last time? 617 00:39:55,970 --> 00:40:01,070 Very, very important one for deep learning and neural nets. 618 00:40:01,070 --> 00:40:04,680 Somehow it's a little simpler than this guy. 619 00:40:04,680 --> 00:40:07,860 And what was that matrix norm? 620 00:40:07,860 --> 00:40:12,490 What letter? Whose name goes here? 621 00:40:12,490 --> 00:40:13,960 Frobenius. 622 00:40:13,960 --> 00:40:17,200 So capital F for Frobenius. 623 00:40:17,200 --> 00:40:19,060 And what was that? 624 00:40:19,060 --> 00:40:23,410 That was the square root of the sum of all the-- 625 00:40:23,410 --> 00:40:33,670 add up all the aij squares, all over the matrix, 626 00:40:33,670 --> 00:40:36,600 and then take the square root. 627 00:40:36,600 --> 00:40:40,160 And then somebody asked a good question after class 628 00:40:40,160 --> 00:40:44,690 on Wednesday, what has that got to do with the sigmas? 629 00:40:44,690 --> 00:40:52,040 Because my point was that these norms are the guys that 630 00:40:52,040 --> 00:40:56,930 go with the sigmas, that have nice formulas for the sigmas, 631 00:40:56,930 --> 00:40:58,220 and here it is.
632 00:40:58,220 --> 00:41:01,665 It's the square root of the sum of the squares of all 633 00:41:01,665 --> 00:41:02,165 the sigmas. 634 00:41:07,130 --> 00:41:09,890 So let me write Frobenius again. 635 00:41:14,810 --> 00:41:20,450 But this notation with an F is now pretty standard, 636 00:41:20,450 --> 00:41:25,280 and we should be able to see why that number is 637 00:41:25,280 --> 00:41:26,701 the same as that number. 638 00:41:34,940 --> 00:41:35,440 Yeah. 639 00:41:39,420 --> 00:41:41,408 I could give you a reason or I could put it 640 00:41:41,408 --> 00:41:42,200 on the problem set. 641 00:41:46,330 --> 00:41:48,510 Yeah, I think that's better on the problem 642 00:41:48,510 --> 00:41:51,570 set, because first of all, I get off the hook 643 00:41:51,570 --> 00:41:56,910 right away, and secondly, this connection between-- 644 00:41:56,910 --> 00:42:00,820 in Frobenius, that's a beautiful fact about Frobenius norm 645 00:42:00,820 --> 00:42:03,415 that you add up all the sigma squares-- 646 00:42:03,415 --> 00:42:09,630 instead of the m times n entry squares of the filled matrix. 647 00:42:09,630 --> 00:42:12,990 So another way to say it is, we haven't written down 648 00:42:12,990 --> 00:42:16,604 the SVD today, A equals U sigma V transposed. 649 00:42:20,960 --> 00:42:26,530 And the point is that, for the Frobenius norm-- 650 00:42:26,530 --> 00:42:29,300 actually, for all these norms-- 651 00:42:29,300 --> 00:42:30,790 I can change U. 652 00:42:30,790 --> 00:42:35,240 It doesn't change the norm, so I can make U the identity. 653 00:42:35,240 --> 00:42:38,070 U, as we all know, is an orthogonal matrix, 654 00:42:38,070 --> 00:42:41,220 and what I'm saying is, orthogonal matrix U 655 00:42:41,220 --> 00:42:43,980 doesn't change any of these particular norms. 656 00:42:43,980 --> 00:42:46,530 So suppose it was the identity. 657 00:42:46,530 --> 00:42:47,580 Same here. 658 00:42:47,580 --> 00:42:50,580 That could be the identity without changing the norm.
659 00:42:50,580 --> 00:42:55,110 So we're down to the norm of Frobenius. 660 00:42:55,110 --> 00:42:58,480 So what's the Frobenius norm of that guy? 661 00:43:01,180 --> 00:43:06,440 What's the Frobenius norm of that diagonal matrix? 662 00:43:06,440 --> 00:43:08,660 Well you're supposed to add up the squares 663 00:43:08,660 --> 00:43:13,410 of all the numbers in the matrix and what do you get? 664 00:43:13,410 --> 00:43:15,970 You get that, right? 665 00:43:15,970 --> 00:43:18,690 So that's why this is the same as this 666 00:43:18,690 --> 00:43:22,360 because the orthogonal guy there and the orthogonal guy there 667 00:43:22,360 --> 00:43:24,220 make no difference in the norm. 668 00:43:24,220 --> 00:43:27,630 But that takes checking, right? 669 00:43:27,630 --> 00:43:28,880 Yeah. 670 00:43:28,880 --> 00:43:30,820 But that's another way to see why 671 00:43:30,820 --> 00:43:32,810 the Frobenius norm gives this. 672 00:43:32,810 --> 00:43:35,330 And then finally, this was the nuclear norm. 673 00:43:38,410 --> 00:43:41,350 And actually, just before my lunch 674 00:43:41,350 --> 00:43:43,440 lecture on the subject of probability-- 675 00:43:43,440 --> 00:43:47,350 I've had a learning morning. 676 00:43:47,350 --> 00:43:52,030 The lunch lecture was about this crazy way that humans behave. 677 00:43:52,030 --> 00:43:58,000 Not us but other humans. 678 00:43:58,000 --> 00:44:02,310 Other actual-- well, no, I don't want to say that. 679 00:44:02,310 --> 00:44:06,090 Take that out of the tape. 680 00:44:06,090 --> 00:44:06,970 Yeah, OK. 681 00:44:06,970 --> 00:44:09,370 Anyway, that was that lecture, but before that 682 00:44:09,370 --> 00:44:16,240 was a lecture for an hour plus about deep learning by somebody 683 00:44:16,240 --> 00:44:19,600 who really, really has begun to understand 684 00:44:19,600 --> 00:44:21,820 what is happening inside. 
685 00:44:21,820 --> 00:44:25,510 How does that gradient descent optimization 686 00:44:25,510 --> 00:44:31,660 algorithm pick out-- what does it pick out as the thing 687 00:44:31,660 --> 00:44:33,550 it learns? 688 00:44:33,550 --> 00:44:38,200 This is going to be our goal in this course. 689 00:44:38,200 --> 00:44:39,550 We're not there yet. 690 00:44:39,550 --> 00:44:43,870 But his conjecture is that-- 691 00:44:43,870 --> 00:44:45,290 yeah, so it's a conjecture. 692 00:44:45,290 --> 00:44:46,390 He doesn't have a proof. 693 00:44:46,390 --> 00:44:49,720 He's got proofs of some nice cases 694 00:44:49,720 --> 00:44:52,720 where things commute but he hasn't got the whole thing yet, 695 00:44:52,720 --> 00:44:55,840 but it's pretty terrific work. 696 00:44:55,840 --> 00:45:03,430 So this was Professor Srebro who's in Chicago. 697 00:45:03,430 --> 00:45:05,610 So he just announced his conjecture, 698 00:45:05,610 --> 00:45:11,700 and his conjecture is that, in a model case, the deep learning 699 00:45:11,700 --> 00:45:14,910 that we'll learn about with the gradient descent 700 00:45:14,910 --> 00:45:19,210 that we'll learn about to find the best weights-- 701 00:45:19,210 --> 00:45:24,880 the point is that, in a typical deep learning 702 00:45:24,880 --> 00:45:30,050 problem these days, there are many more weights than samples 703 00:45:30,050 --> 00:45:34,170 and so there are a lot of possible minima. 704 00:45:34,170 --> 00:45:37,230 Many different weights give the same minimum loss 705 00:45:37,230 --> 00:45:40,050 because there are so many weights. 706 00:45:40,050 --> 00:45:44,010 The problem has, like, too many variables, 707 00:45:44,010 --> 00:45:46,440 but it turns out to be a very, very good thing. 708 00:45:46,440 --> 00:45:48,310 That's part of the success.
709 00:45:48,310 --> 00:45:54,780 And he believes that in a model situation, 710 00:45:54,780 --> 00:46:00,090 that optimization by gradient descent 711 00:46:00,090 --> 00:46:06,970 picks out the weights that minimize the nuclear norm. 712 00:46:06,970 --> 00:46:10,650 So this would be a norm of a lot of weights. 713 00:46:10,650 --> 00:46:15,120 And he thinks that's where the system goes. 714 00:46:15,120 --> 00:46:16,155 We'll see this. 715 00:46:16,155 --> 00:46:18,510 This comes up in compressed sensing, 716 00:46:18,510 --> 00:46:21,420 as I mentioned last time. 717 00:46:21,420 --> 00:46:26,840 But now I have to remember what was the definition. 718 00:46:26,840 --> 00:46:30,580 Do you remember what the nuclear norm was? 719 00:46:30,580 --> 00:46:35,060 He often used a little star instead of an N. 720 00:46:35,060 --> 00:46:37,020 I'll put that in the notes. 721 00:46:37,020 --> 00:46:39,790 Other people call it the trace norm. 722 00:46:39,790 --> 00:46:47,730 But I think this N kind of gives it a notation you can remember. 723 00:46:47,730 --> 00:46:49,733 So let's call it the nuclear norm. 724 00:46:49,733 --> 00:46:51,150 Do you remember what that one was? 725 00:46:54,000 --> 00:46:56,570 Yeah, somebody's saying it right. 726 00:46:56,570 --> 00:46:58,070 Add the sigmas, yeah. 727 00:46:58,070 --> 00:47:05,620 Just the sum of the sigmas, like the l1 norm, in a way. 728 00:47:05,620 --> 00:47:07,950 So that's the idea, is that this is 729 00:47:07,950 --> 00:47:14,210 the natural sort of l1 type of norm for matrices. 730 00:47:14,210 --> 00:47:17,353 It's the l1 norm for that sigma vector. 731 00:47:17,353 --> 00:47:19,270 This would be the l2 norm of the sigma vector. 732 00:47:19,270 --> 00:47:21,870 That would be the l infinity norm. 733 00:47:21,870 --> 00:47:28,010 Notice that the vector numbers, infinity, 2, and 1, get 734 00:47:28,010 --> 00:47:35,300 changed around when you look at the matrix guy.
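So all three matrix norms can be read off from the singular values. Here is a minimal numpy check (the particular 2-by-2 matrix is an illustrative choice, not from the lecture): the 2 norm, Frobenius norm, and nuclear norm are the l-infinity, l2, and l1 norms of the sigma vector.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
sigmas = np.linalg.svd(A, compute_uv=False)

two_norm = sigmas.max()                   # l-infinity norm of the sigma vector
frobenius = np.sqrt((sigmas ** 2).sum())  # l2 norm of the sigma vector
nuclear = sigmas.sum()                    # l1 norm of the sigma vector

# numpy's built-in matrix norms agree on all three
print(np.isclose(two_norm, np.linalg.norm(A, 2)))        # True
print(np.isclose(frobenius, np.linalg.norm(A, 'fro')))   # True
print(np.isclose(nuclear, np.linalg.norm(A, 'nuc')))     # True
```

For this A, the Frobenius value also matches the entry formula directly: 9 + 0 + 16 + 25 = 50, so the norm is the square root of 50, the same as the sum of the sigma squares.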
735 00:47:35,300 --> 00:47:42,460 So that's an exciting idea and it remains to be proved. 736 00:47:42,460 --> 00:47:45,170 And expert people are experimenting to see, 737 00:47:45,170 --> 00:47:47,080 is it true? 738 00:47:47,080 --> 00:47:47,910 Yeah. 739 00:47:47,910 --> 00:47:50,820 So that's a big thing for their future. 740 00:47:50,820 --> 00:47:51,780 Yes. 741 00:47:51,780 --> 00:47:55,560 OK, so today, we've talked about norms 742 00:47:55,560 --> 00:47:59,730 and this section of the notes will be all about norms. 743 00:48:02,540 --> 00:48:09,930 We've taken a big leap into a comment about deep learning 744 00:48:09,930 --> 00:48:14,880 and this is what I want to say the most. 745 00:48:14,880 --> 00:48:18,120 And I say it to every class I teach 746 00:48:18,120 --> 00:48:22,050 near the start of the semester. 747 00:48:22,050 --> 00:48:26,410 My feeling is that my job is to teach you things, 748 00:48:26,410 --> 00:48:30,880 or to join with you in learning things, as happened today. 749 00:48:30,880 --> 00:48:32,260 It's not to grade you. 750 00:48:32,260 --> 00:48:37,810 I don't spend any time losing sleep-- you know, 751 00:48:37,810 --> 00:48:42,550 should that person take a one-point or epsilon penalty 752 00:48:42,550 --> 00:48:47,461 for turning it in four minutes late? 753 00:48:47,461 --> 00:48:49,390 To Hell with that, right? 754 00:48:49,390 --> 00:48:52,780 We've got a lot to do here. 755 00:48:52,780 --> 00:48:55,150 So anyway, we'll get on with the job. 756 00:48:55,150 --> 00:49:00,760 So homework three coming up, and you'll 757 00:49:00,760 --> 00:49:02,950 be using the notes that are already 758 00:49:02,950 --> 00:49:07,410 posted on Stellar for those sections eight and nine 759 00:49:07,410 --> 00:49:09,130 and so on. 760 00:49:09,130 --> 00:49:11,270 And we'll keep going on Monday. 761 00:49:11,270 --> 00:49:14,580 OK, see you on Monday and have a great weekend.