1 00:00:22,290 --> 00:00:26,050 GILBERT STRANG: So let me use the mic 2 00:00:26,050 --> 00:00:31,000 to introduce Alex Townsend, who taught here at MIT-- 3 00:00:31,000 --> 00:00:35,570 taught Linear Algebra 18.06 very successfully. 4 00:00:35,570 --> 00:00:38,980 And then now he's at Cornell on the faculty, 5 00:00:38,980 --> 00:00:40,870 still teaching very successfully. 6 00:00:40,870 --> 00:00:43,690 And he was invited here yesterday 7 00:00:43,690 --> 00:00:47,260 for a big event over in Engineering. 8 00:00:47,260 --> 00:00:54,820 And he agreed to give a talk about a section of the book-- 9 00:00:54,820 --> 00:00:57,520 section 4.3-- 10 00:00:57,520 --> 00:01:00,790 which, if you look at it, you'll see is all about his work. 11 00:01:00,790 --> 00:01:04,500 And now you get to hear from the creator himself. 12 00:01:04,500 --> 00:01:05,000 OK. 13 00:01:10,450 --> 00:01:11,200 ALEX TOWNSEND: OK. 14 00:01:11,200 --> 00:01:11,830 Thanks. 15 00:01:11,830 --> 00:01:12,710 Thank you, Gil. 16 00:01:12,710 --> 00:01:14,442 Thank you for inviting me here. 17 00:01:14,442 --> 00:01:15,990 I hope you're enjoying the course. 18 00:01:15,990 --> 00:01:19,990 Today I want to tell you a little about why 19 00:01:19,990 --> 00:01:24,440 there are so many matrices that are low rank in the world. 20 00:01:24,440 --> 00:01:26,480 So as computational mathematicians-- 21 00:01:26,480 --> 00:01:30,700 Gil and myself-- we come across low-rank matrices all the time. 22 00:01:30,700 --> 00:01:36,650 And we started wondering, as a community, why? 23 00:01:36,650 --> 00:01:41,290 What is it about the problems that we are looking at? 24 00:01:41,290 --> 00:01:44,320 What makes low-rank matrices appear? 25 00:01:44,320 --> 00:01:46,690 And today I want to give you that story-- 26 00:01:46,690 --> 00:01:49,510 or at least an overview of that story. 27 00:01:49,510 --> 00:01:57,600 So for this class, x is going to be an n by n real matrix. 28 00:01:57,600 --> 00:01:59,680 So nice and square. 
29 00:01:59,680 --> 00:02:02,410 And you already know, or are very comfortable with, 30 00:02:02,410 --> 00:02:05,500 the singular values of a matrix. 31 00:02:05,500 --> 00:02:09,220 So the singular values of a matrix, as you know, 32 00:02:09,220 --> 00:02:15,360 are a sequence of numbers that are monotonically 33 00:02:15,360 --> 00:02:21,030 non-increasing that tell us all kinds of things 34 00:02:21,030 --> 00:02:22,290 about the matrix x. 35 00:02:26,250 --> 00:02:30,090 For example, the number of nonzero singular values 36 00:02:30,090 --> 00:02:33,840 tell us the rank of the matrix x. 37 00:02:33,840 --> 00:02:37,830 And they also, you probably know, tell us how well a matrix 38 00:02:37,830 --> 00:02:42,600 x can be approximated by a low-rank matrix. 39 00:02:42,600 --> 00:02:45,900 So let me just write two facts down that you already 40 00:02:45,900 --> 00:02:47,740 are familiar with. 41 00:02:47,740 --> 00:02:50,940 So here's a fact-- 42 00:02:50,940 --> 00:02:57,210 that, if I look at the number of non-zero singular values in x-- 43 00:02:57,210 --> 00:03:01,105 so I'm imagining there's going to be k non-zero singular 44 00:03:01,105 --> 00:03:01,605 values-- 45 00:03:06,600 --> 00:03:09,480 then we can say a few things about x. 46 00:03:09,480 --> 00:03:16,350 For example, the rank of x, as we know, is k-- 47 00:03:16,350 --> 00:03:19,980 the number of non-zero singular values. 48 00:03:19,980 --> 00:03:25,770 But we also know from the SVD that we can decompose x 49 00:03:25,770 --> 00:03:29,670 into a sum of rank 1 matrices-- 50 00:03:29,670 --> 00:03:33,120 in fact, the sum of k of them. 51 00:03:33,120 --> 00:03:37,770 So because x is rank k, we can write down 52 00:03:37,770 --> 00:03:45,150 a low-rank representation for x, and it involves k terms, 53 00:03:45,150 --> 00:03:47,830 like this. 54 00:03:47,830 --> 00:03:52,350 Each one of these vectors here is a column vector. 
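Written out, the rank-k decomposition described above can be sketched as follows. The notation is an assumption, since the board is not part of the transcript: the sigma_j are the nonzero singular values and u_j, v_j the corresponding singular vectors, and each sigma_j can be absorbed into its vectors to give the form u_1 v_1^T + ... + u_k v_k^T.

```latex
X \;=\; \sum_{j=1}^{k} \sigma_j\, u_j v_j^{T},
\qquad \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k > 0,
\qquad \operatorname{rank}(X) = \dim \operatorname{col}(X) = \dim \operatorname{row}(X) = k .
```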
55 00:03:52,350 --> 00:03:57,390 So if I draw this pictorially, this guy 56 00:03:57,390 --> 00:03:58,800 looks like this, right? 57 00:03:58,800 --> 00:04:01,240 And we have k of them. 58 00:04:01,240 --> 00:04:06,390 So because x is rank k, we can write x as a sum of k 59 00:04:06,390 --> 00:04:08,580 rank 1 matrices. 60 00:04:08,580 --> 00:04:11,880 And we also have an initial fact that we already know-- 61 00:04:11,880 --> 00:04:16,740 that the dimension of the column space of x 62 00:04:16,740 --> 00:04:20,730 is equal to k, and the same with the row space. 63 00:04:20,730 --> 00:04:27,620 So the column space of x equals the row space of x-- 64 00:04:27,620 --> 00:04:35,190 the dimension-- and they all equal k. 65 00:04:35,190 --> 00:04:37,650 And so there are three facts we can determine 66 00:04:37,650 --> 00:04:44,100 from looking at this sequence of singular values of a matrix x. 67 00:04:44,100 --> 00:04:47,460 Of course, the singular value sequence is unique. 68 00:04:47,460 --> 00:04:50,880 X defines its own singular values. 69 00:04:54,270 --> 00:04:58,520 What we're interested in here is, what makes x? 70 00:04:58,520 --> 00:05:00,780 What are the properties of x that make 71 00:05:00,780 --> 00:05:02,970 sure that the singular values have a lot 72 00:05:02,970 --> 00:05:05,250 of zeros in that sequence? 73 00:05:05,250 --> 00:05:09,750 Can we try to understand what kind of x makes that happen? 74 00:05:12,960 --> 00:05:16,530 And we really like matrices that have a lot of zeros 75 00:05:16,530 --> 00:05:18,630 here, for the following reason-- 76 00:05:22,530 --> 00:05:30,330 we say x is low rank if the following holds, right? 77 00:05:30,330 --> 00:05:33,120 Because if we wanted to send x to our friend-- 78 00:05:33,120 --> 00:05:36,810 we're imagining x as a picture where each entry 79 00:05:36,810 --> 00:05:40,170 is a pixel of that image. 
80 00:05:40,170 --> 00:05:43,710 If that matrix-- that image-- was low rank, 81 00:05:43,710 --> 00:05:48,870 we could send the picture to our friend in two ways. 82 00:05:48,870 --> 00:05:53,310 We could send every single entry of x. 83 00:05:53,310 --> 00:05:55,140 And for us to do that, we would have 84 00:05:55,140 --> 00:05:58,410 to send n squared pieces of information, 85 00:05:58,410 --> 00:06:01,010 because we'd have to send every entry. 86 00:06:01,010 --> 00:06:03,300 But if x is sufficiently low rank, 87 00:06:03,300 --> 00:06:07,250 we could also send our friend the vectors-- 88 00:06:07,250 --> 00:06:12,240 u1, v1, up to uk, vk. 89 00:06:12,240 --> 00:06:15,570 And how many pieces of data would we 90 00:06:15,570 --> 00:06:18,240 have to send our friend to get x to them 91 00:06:18,240 --> 00:06:20,730 if we sent in the low-rank form? 92 00:06:20,730 --> 00:06:27,390 Well, there are 2n numbers here. 93 00:06:27,390 --> 00:06:28,530 There's k of them. 94 00:06:28,530 --> 00:06:33,750 So we'd have to send 2kn numbers. 95 00:06:33,750 --> 00:06:36,210 And we strictly say a matrix is low 96 00:06:36,210 --> 00:06:41,700 rank if it's more efficient to send x to our friend 97 00:06:41,700 --> 00:06:46,650 in low-rank form than in full-rank form. 98 00:06:46,650 --> 00:06:49,800 So this, of course, by a little calculation, 99 00:06:49,800 --> 00:06:53,880 just shows us that, provided the rank is 100 00:06:53,880 --> 00:06:56,640 less than half the size of the matrix, 101 00:06:56,640 --> 00:07:00,450 we are calling the matrix low rank. 102 00:07:00,450 --> 00:07:05,960 Now, often, in practice, we demand more. 103 00:07:09,920 --> 00:07:15,260 We demand that k is much smaller than this number, 104 00:07:15,260 --> 00:07:19,850 so that it's far more efficient to send our friend the matrix x 105 00:07:19,850 --> 00:07:23,750 in low-rank form than in full-rank form. 
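The storage comparison above can be sketched in a few lines of Python. The function names are hypothetical and chosen only for illustration; the counts 2kn and n squared come straight from the argument in the talk.

```python
def lowrank_storage(n: int, k: int) -> int:
    """Numbers sent in low-rank form: k pairs (u_j, v_j), each vector of length n."""
    return 2 * k * n


def full_storage(n: int) -> int:
    """Numbers sent when every entry of an n-by-n matrix is transmitted."""
    return n * n


def is_low_rank(n: int, k: int) -> bool:
    """Strict definition from the talk: low-rank form is cheaper, i.e. 2kn < n^2."""
    return lowrank_storage(n, k) < full_storage(n)


n = 1000
for k in (1, 10, 499, 500):
    print(k, lowrank_storage(n, k), full_storage(n), is_low_rank(n, k))
```

The inequality 2kn < n^2 simplifies to k < n/2, which is the "less than half the size of the matrix" condition.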
106 00:07:23,750 --> 00:07:26,810 So the colloquial use of the word low rank 107 00:07:26,810 --> 00:07:29,540 is kind of this situation. 108 00:07:29,540 --> 00:07:31,480 But this is the strict definition of it. 109 00:07:34,010 --> 00:07:39,600 So what do low-rank matrices look like? 110 00:07:39,600 --> 00:07:43,580 And to do that, I have some pictures for you. 111 00:07:43,580 --> 00:07:44,990 I have some flags-- 112 00:07:44,990 --> 00:07:47,990 the world flags. 113 00:07:47,990 --> 00:07:50,430 So these are all matrices x-- 114 00:07:50,430 --> 00:07:54,690 rectangular in these examples, because flags happen to not be square. 115 00:07:54,690 --> 00:07:56,610 I hope you can all see this. 116 00:07:56,610 --> 00:08:01,380 But the top row here are all matrices 117 00:08:01,380 --> 00:08:04,360 that are extremely low rank. 118 00:08:04,360 --> 00:08:06,610 For example, the Austria flag-- 119 00:08:06,610 --> 00:08:08,350 if you want to send that to your friend, 120 00:08:08,350 --> 00:08:11,020 that matrix is of rank 1. 121 00:08:11,020 --> 00:08:14,740 So all you have to do is send your friend two vectors. 122 00:08:14,740 --> 00:08:18,250 You have to tell your friend the column space and the row space. 123 00:08:18,250 --> 00:08:21,190 And both of them have dimension one. 124 00:08:21,190 --> 00:08:24,790 For the English flag, you need to send them two column 125 00:08:24,790 --> 00:08:27,640 vectors and two row vectors-- 126 00:08:27,640 --> 00:08:31,900 u1, v1, u2 and v2. 127 00:08:31,900 --> 00:08:35,440 And as we go down this row, they get slowly fuller and fuller 128 00:08:35,440 --> 00:08:36,289 rank. 129 00:08:36,289 --> 00:08:38,440 So the Japanese flag, for example, 130 00:08:38,440 --> 00:08:43,059 is low rank but not that small. 131 00:08:43,059 --> 00:08:45,880 The Scottish flag is essentially full rank. 
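As a rough check of the first two flags, here is a minimal sketch assuming NumPy is available. The flag sizes, the 0/1 encoding of the two colours, and the helper names are all assumptions made for illustration, not taken from the slides.

```python
import numpy as np


def austria_like(rows: int = 60, cols: int = 90) -> np.ndarray:
    """Three horizontal stripes: every column is the same vector."""
    stripe = np.ones(rows)
    stripe[rows // 3 : 2 * rows // 3] = 0.0  # middle stripe in the other colour
    return np.outer(stripe, np.ones(cols))


def england_like(rows: int = 60, cols: int = 90, half_width: int = 5) -> np.ndarray:
    """A centred cross (value 1) on a plain background (value 0)."""
    flag = np.zeros((rows, cols))
    flag[rows // 2 - half_width : rows // 2 + half_width, :] = 1.0
    flag[:, cols // 2 - half_width : cols // 2 + half_width] = 1.0
    return flag


austria_rank = np.linalg.matrix_rank(austria_like())
england_rank = np.linalg.matrix_rank(england_like())
print(austria_rank, england_rank)
```

The striped flag is an outer product of two vectors, and the cross has only two distinct kinds of rows, which is why two column vectors and two row vectors are enough to describe it.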
132 00:08:45,880 --> 00:08:50,350 So it's very inefficient to send your friend the Scottish flag 133 00:08:50,350 --> 00:08:51,340 in low-rank form. 134 00:08:51,340 --> 00:08:55,190 You're better off sending almost every single entry. 135 00:08:55,190 --> 00:08:58,430 So what do low-rank matrices look like? 136 00:09:11,360 --> 00:09:15,250 Well, if the matrix is extremely low rank, 137 00:09:15,250 --> 00:09:18,340 like rank 1, then when you look at that matrix-- 138 00:09:18,340 --> 00:09:19,900 like here, like the flag-- 139 00:09:19,900 --> 00:09:25,420 it's highly aligned with the coordinates-- 140 00:09:25,420 --> 00:09:27,440 with the rows and columns. 141 00:09:27,440 --> 00:09:33,590 So if it's rank 1, the matrix is highly aligned-- 142 00:09:33,590 --> 00:09:34,780 like the Austria flag. 143 00:09:41,050 --> 00:09:43,690 And of course, as we add in more and more rank here, 144 00:09:43,690 --> 00:09:46,300 the situation gets a bit blurry. 145 00:09:46,300 --> 00:09:49,450 For example, once we get into the medium rank situation, 146 00:09:49,450 --> 00:09:51,550 which is a circle, it's very hard 147 00:09:51,550 --> 00:09:56,800 to see that the circle is actually, in fact, low rank. 148 00:09:56,800 --> 00:09:58,960 But what I'm going to do is try to understand 149 00:09:58,960 --> 00:10:03,750 why the Scottish flag, or diagonal patterns, 150 00:10:03,750 --> 00:10:07,990 are a particularly bad example for low rank. 151 00:10:07,990 --> 00:10:12,040 So I'm going to take the triangular flag 152 00:10:12,040 --> 00:10:15,980 to examine that more carefully. 153 00:10:15,980 --> 00:10:19,600 So the triangular flag looks like-- 154 00:10:19,600 --> 00:10:27,110 I'll take a square matrix and I'll color in the bottom half. 155 00:10:27,110 --> 00:10:31,055 So this matrix is the matrix of ones below the diagonal. 
156 00:10:38,830 --> 00:10:41,560 And I'm interested in this matrix and, in particular, 157 00:10:41,560 --> 00:10:43,360 its singular values, to try to understand 158 00:10:43,360 --> 00:10:47,740 why diagonal patterns are not particularly 159 00:10:47,740 --> 00:10:51,760 useful for low-rank compression. 160 00:10:51,760 --> 00:10:57,340 And this matrix of all ones has a really nice property that, 161 00:10:57,340 --> 00:11:01,750 if I take its inverse, it looks a lot like-- 162 00:11:01,750 --> 00:11:04,850 getting close to Gil's favorite matrix. 163 00:11:04,850 --> 00:11:08,550 So if I take the inverse of this matrix-- 164 00:11:08,550 --> 00:11:12,100 it has an inverse because it's got ones on the diagonal-- 165 00:11:12,100 --> 00:11:21,220 then its inverse is the following matrix, 166 00:11:21,220 --> 00:11:24,220 which people familiar with finite difference schemes 167 00:11:24,220 --> 00:11:27,400 will notice the familiarity between that 168 00:11:27,400 --> 00:11:32,770 and the first order finite difference approximation. 169 00:11:32,770 --> 00:11:35,500 In particular, if I go a bit further and times 170 00:11:35,500 --> 00:11:39,130 two of these together, and do this, 171 00:11:39,130 --> 00:11:45,370 then this is essentially Gil's favorite matrix, 172 00:11:45,370 --> 00:11:49,355 except one entry happens to be different-- 173 00:11:52,540 --> 00:11:55,570 ends up being this matrix, which is 174 00:11:55,570 --> 00:11:59,800 very close to the second order, central, finite difference 175 00:11:59,800 --> 00:12:01,210 matrix. 176 00:12:01,210 --> 00:12:02,950 And people have very well studied 177 00:12:02,950 --> 00:12:06,430 that matrix and know its eigenvalues, 178 00:12:06,430 --> 00:12:07,930 its singular values-- 179 00:12:07,930 --> 00:12:10,300 they know everything about that matrix. 
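The two matrix identities described above can be checked numerically. This is a small sketch assuming NumPy; the size n = 6 and the variable names are arbitrary choices, and the matrix is taken to have ones on and below the diagonal, as the remark about ones on the diagonal suggests.

```python
import numpy as np

n = 6
X = np.tril(np.ones((n, n)))            # ones on and below the diagonal
X_inv = np.linalg.inv(X)

# First-order difference matrix: 1 on the diagonal, -1 on the first subdiagonal.
D = np.eye(n) - np.eye(n, k=-1)

# Multiplying two of these together: X^{-1} X^{-T} = (X^T X)^{-1}.
S = X_inv @ X_inv.T

# Second-difference matrix (-1, 2, -1), except that the (0, 0) entry is 1.
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
K[0, 0] = 1.0

print(np.allclose(X_inv, D), np.allclose(S, K))
```

Because the eigenvalues of S are the reciprocals of the squared singular values of X, knowing the spectrum of the second-difference matrix gives the singular values of the triangular flag.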
180 00:12:10,300 --> 00:12:12,310 And you'll remember that if we know 181 00:12:12,310 --> 00:12:18,010 the eigenvalues of a matrix, like x transpose x, 182 00:12:18,010 --> 00:12:21,610 we know the singular values of x. 183 00:12:21,610 --> 00:12:25,810 So this allows us to show, by the fact 184 00:12:25,810 --> 00:12:32,590 that we know that, that the singular values of this matrix 185 00:12:32,590 --> 00:12:34,990 are not very amenable to low rank. 186 00:12:34,990 --> 00:12:39,400 They're all non-zero, and they don't even decay. 187 00:12:39,400 --> 00:12:41,920 So I'm getting this from-- 188 00:12:44,520 --> 00:12:47,430 I rang up Gil, and Gil tells me these numbers. 189 00:12:54,670 --> 00:12:57,300 That allows us to work out exactly what the singular 190 00:12:57,300 --> 00:12:59,760 values of this matrix are, from the connection 191 00:12:59,760 --> 00:13:02,130 to finite differences. 192 00:13:02,130 --> 00:13:04,260 And so we can understand why this is not 193 00:13:04,260 --> 00:13:06,700 good by looking at the singular values. 194 00:13:06,700 --> 00:13:10,410 So the first singular value of x from this expression 195 00:13:10,410 --> 00:13:15,960 is going to be approximately 2n over pi. 196 00:13:15,960 --> 00:13:19,110 And from this expression, again, for the last guy-- 197 00:13:19,110 --> 00:13:23,260 the last singular value of x is going 198 00:13:23,260 --> 00:13:26,430 to be approximately a half. 199 00:13:26,430 --> 00:13:28,890 So these singular values are all large. 200 00:13:28,890 --> 00:13:31,110 They're not getting close to zero. 201 00:13:31,110 --> 00:13:37,070 If I plotted these singular values on a graph-- 202 00:13:37,070 --> 00:13:40,680 so here's the first singular value, the second, 203 00:13:40,680 --> 00:13:43,040 and the n-th-- 204 00:13:43,040 --> 00:13:45,210 then what would the graph look like? 205 00:13:45,210 --> 00:13:47,885 Well, plot these numbers. 
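The two estimates quoted above, roughly 2n/pi for the largest singular value and roughly 1/2 for the smallest, can be compared with computed values. This is a sketch assuming NumPy, with n = 200 chosen arbitrarily.

```python
import numpy as np

n = 200
X = np.tril(np.ones((n, n)))                  # the triangular flag
sigma = np.linalg.svd(X, compute_uv=False)    # singular values, largest first

sigma_max, sigma_min = sigma[0], sigma[-1]
print("largest :", sigma_max, " estimate 2n/pi:", 2 * n / np.pi)
print("smallest:", sigma_min, " estimate 1/2")
print("ratio smallest/largest:", sigma_min / sigma_max)
```

No singular value drops below about one half, so truncating the SVD never gives a small error for this matrix.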
206 00:13:51,330 --> 00:13:54,190 Divide by this guy so that they all 207 00:13:54,190 --> 00:13:59,800 are bounded between 1 and 0 because of the normalization, 208 00:13:59,800 --> 00:14:03,190 because I divided by sigma 1 of x. 209 00:14:03,190 --> 00:14:05,230 And so we can plot them, and they will 210 00:14:05,230 --> 00:14:11,520 look like this kind of thing. 211 00:14:15,150 --> 00:14:16,830 This number happens to be here where 212 00:14:16,830 --> 00:14:21,780 they come to be pi over 4n, which 213 00:14:21,780 --> 00:14:27,540 is me dividing this number by this number, approximately. 214 00:14:27,540 --> 00:14:32,071 So triangular patterns are extremely bad for low rank. 215 00:14:32,071 --> 00:14:35,040 We need things-- or we at least intuitively think 216 00:14:35,040 --> 00:14:39,120 that we need things-- aligned with the rows and columns, 217 00:14:39,120 --> 00:14:45,570 but the circle case happens to also be low rank. 218 00:14:45,570 --> 00:14:49,410 And so what happened to the Japanese flag? 219 00:14:54,300 --> 00:14:59,970 Why is the Japanese flag convenient for low rank? 220 00:14:59,970 --> 00:15:03,420 Well it's the fact that it's a circle, 221 00:15:03,420 --> 00:15:06,340 and there's lots of symmetry in a circle. 222 00:15:06,340 --> 00:15:13,740 So if I try to look at the rank of a circle, the Japanese flag, 223 00:15:13,740 --> 00:15:21,870 then I can bound this rank by decomposing the Japanese flag 224 00:15:21,870 --> 00:15:24,130 into two things. 225 00:15:24,130 --> 00:15:29,460 So this is going to be less than or equal to the rank of sum 226 00:15:29,460 --> 00:15:33,480 of two matrices, and I'll do it so that the decomposition works 227 00:15:33,480 --> 00:15:34,050 out. 228 00:15:34,050 --> 00:15:35,430 I have the circle. 229 00:15:35,430 --> 00:15:40,252 I'm going to cut out a rank one piece that lives 230 00:15:40,252 --> 00:15:41,460 in the middle of this circle. 231 00:15:47,440 --> 00:15:47,940 OK? 
232 00:15:47,940 --> 00:15:51,360 And I'm going to cut out a square from the interior 233 00:15:51,360 --> 00:15:54,650 of that circle. 234 00:15:54,650 --> 00:15:55,490 OK? 235 00:15:55,490 --> 00:15:58,110 And I can figure out-- of course the rank is just bounded 236 00:15:58,110 --> 00:16:00,630 by the sum of those two ranks. 237 00:16:00,630 --> 00:16:04,320 This guy is bounded by rank one because it's highly 238 00:16:04,320 --> 00:16:05,370 aligned with the grid. 239 00:16:09,570 --> 00:16:11,800 So this guy is bounded by rank one. 240 00:16:11,800 --> 00:16:23,360 So this thing here plus 1. 241 00:16:26,010 --> 00:16:29,280 And now I have to try to understand 242 00:16:29,280 --> 00:16:32,820 the rank of this piece. 243 00:16:32,820 --> 00:16:35,910 Now this piece has lots of symmetry. 244 00:16:35,910 --> 00:16:39,690 For example, we know that the rank of that matrix 245 00:16:39,690 --> 00:16:43,320 is the dimension of the column space 246 00:16:43,320 --> 00:16:46,360 and the dimension of the row space. 247 00:16:46,360 --> 00:16:49,650 So when we look at this matrix, because of symmetry, 248 00:16:49,650 --> 00:16:55,230 if I divide this matrix in half along the columns, 249 00:16:55,230 --> 00:16:58,320 all the columns on the left appear on the right. 250 00:16:58,320 --> 00:17:01,920 So for example, the rank of this matrix 251 00:17:01,920 --> 00:17:04,589 is the same as the rank of that matrix 252 00:17:04,589 --> 00:17:07,880 because I didn't change the column space. 253 00:17:07,880 --> 00:17:08,430 OK? 254 00:17:08,430 --> 00:17:13,650 Now I go again and divide along the rows, 255 00:17:13,650 --> 00:17:17,819 and now the row dimension of this matrix 256 00:17:17,819 --> 00:17:20,880 is the same as the top half, because as I wipe out those, 257 00:17:20,880 --> 00:17:23,130 I didn't change the dimension of the row space 258 00:17:23,130 --> 00:17:26,079 because the rows are the same top-bottom. 
259 00:17:26,079 --> 00:17:30,650 And so this becomes the rank of that tiny little matrix there. 260 00:17:30,650 --> 00:17:35,840 And because it's small, it won't have too large a rank. 261 00:17:35,840 --> 00:17:42,978 So this is definitely less than-- if I divide that up, 262 00:17:42,978 --> 00:17:50,330 a little guy here looks like that plus the other guy that 263 00:17:50,330 --> 00:18:00,450 looks like that plus 1. 264 00:18:00,450 --> 00:18:07,900 And so of course the row space of this matrix cannot be very 265 00:18:07,900 --> 00:18:10,030 high because this is a very thin matrix. 266 00:18:10,030 --> 00:18:13,800 There's lots of zeros in that matrix, only a few ones. 267 00:18:13,800 --> 00:18:15,520 And so you can go along and do a bit 268 00:18:15,520 --> 00:18:19,570 of trig to try to figure out how many rows are 269 00:18:19,570 --> 00:18:22,420 non-zero in this matrix. 270 00:18:22,420 --> 00:18:25,660 And a bit of trig tells you-- 271 00:18:25,660 --> 00:18:29,630 well it depends on the radius of this original circle. 272 00:18:29,630 --> 00:18:34,120 So if I make the original radius r of this Japanese flag, 273 00:18:34,120 --> 00:18:38,110 then the bound that you end up getting will be, 274 00:18:38,110 --> 00:18:43,710 for this matrix, r 1 minus square root 2 over 2 275 00:18:43,710 --> 00:18:44,570 for this guy. 276 00:18:44,570 --> 00:18:46,020 That's a bit of trig. 277 00:18:46,020 --> 00:18:48,220 I've got to make sure that's an integer. 278 00:18:48,220 --> 00:18:52,160 And then again, here it's the same but for the column space. 279 00:18:52,160 --> 00:18:53,430 So this is me just doing trig. 280 00:18:56,140 --> 00:18:56,640 OK? 281 00:18:56,640 --> 00:18:57,870 And that's bound on the rank. 282 00:18:57,870 --> 00:18:59,650 It happens to be extremely good. 
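The bound from the symmetry argument can be compared with a computed rank. The sketch below assumes NumPy and a particular discretisation that is not from the talk: a (2r+1) by (2r+1) grid of 0/1 pixels with the circle centred on the middle pixel, and the bound rounded up to an integer.

```python
import math

import numpy as np


def disk_flag(r: int) -> np.ndarray:
    """(2r+1) x (2r+1) matrix with 1 inside a centred circle of radius r, 0 outside."""
    offsets = np.arange(-r, r + 1)
    a, b = np.meshgrid(offsets, offsets, indexing="ij")
    return (a**2 + b**2 <= r**2).astype(float)


def rank_bound(r: int) -> int:
    """Bound from the argument above: 2 * ceil(r * (1 - sqrt(2)/2)) + 1."""
    return 2 * math.ceil(r * (1 - math.sqrt(2) / 2)) + 1


r = 50
flag = disk_flag(r)
rank = np.linalg.matrix_rank(flag)
print("size:", flag.shape[0], " rank:", rank, " bound:", rank_bound(r))
```

The inscribed square contributes rank one, and the leftover pieces contribute at most the number of rows in one cap plus the number of columns in one cap, because the top and bottom rows (and the left and right columns) repeat.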
283 00:18:59,650 --> 00:19:03,540 And if you work out what that rank is and try to look back, 284 00:19:03,540 --> 00:19:05,170 you will find it's extremely efficient 285 00:19:05,170 --> 00:19:10,513 to send the Japanese flag to your friend in low rank form, 286 00:19:10,513 --> 00:19:12,680 because it's not full rank because these numbers are 287 00:19:12,680 --> 00:19:13,760 so small. 288 00:19:13,760 --> 00:19:21,080 So this comes out to be, like, approximately 1/2 r plus 1. 289 00:19:21,080 --> 00:19:23,750 So much smaller than what you would expect, 290 00:19:23,750 --> 00:19:27,350 because remember, a circle is almost the opposite 291 00:19:27,350 --> 00:19:32,860 of aligned with the grid, but yet, it's still low rank. 292 00:19:32,860 --> 00:19:35,050 OK. 293 00:19:35,050 --> 00:19:39,350 Now most matrices that we come up 294 00:19:39,350 --> 00:19:44,600 with in computational math are not exactly of finite rank. 295 00:19:44,600 --> 00:19:48,860 They are of numerical rank. 296 00:19:48,860 --> 00:19:51,390 And so I'll just define that. 297 00:19:51,390 --> 00:19:58,060 So the numerical rank of a matrix 298 00:19:58,060 --> 00:20:01,190 is very similar to the rank, except we allow ourselves 299 00:20:01,190 --> 00:20:04,310 a little bit of wiggle room when we define it, 300 00:20:04,310 --> 00:20:09,110 and so that amount of wiggle room will be a parameter 301 00:20:09,110 --> 00:20:12,140 called epsilon. 302 00:20:12,140 --> 00:20:13,010 That's a tolerance. 303 00:20:13,010 --> 00:20:16,436 I'm thinking of epsilon as a tolerance. 304 00:20:16,436 --> 00:20:21,110 That's the amount of wiggle room I'm going to give myself. 305 00:20:21,110 --> 00:20:22,140 OK. 306 00:20:22,140 --> 00:20:27,240 And we say that the numerical rank-- 307 00:20:27,240 --> 00:20:31,350 I'll put an epsilon there to denote numerical rank-- 308 00:20:31,350 --> 00:20:34,170 is k. 
309 00:20:34,170 --> 00:20:37,650 k is the index of the last singular value 310 00:20:37,650 --> 00:20:39,120 above epsilon. 311 00:20:39,120 --> 00:20:42,370 In the following sense, I'm copying the definition above 312 00:20:42,370 --> 00:20:45,090 but with epsilons instead of zeros. 313 00:20:45,090 --> 00:20:52,960 The k plus 1 singular value is less than epsilon, relatively, 314 00:20:52,960 --> 00:20:56,220 and the kth one is not below. 315 00:20:56,220 --> 00:21:00,880 So k plus 1 is the first singular value below epsilon 316 00:21:00,880 --> 00:21:03,130 in this relative sense. 317 00:21:03,130 --> 00:21:10,480 So of course the numerical rank with epsilon equal to 0, if that was defined, 318 00:21:10,480 --> 00:21:13,480 is the same as the rank of x. 319 00:21:13,480 --> 00:21:14,290 OK? 320 00:21:14,290 --> 00:21:17,540 So this is just allowing ourselves some wiggle room. 321 00:21:17,540 --> 00:21:20,870 But this is actually what we're more interested in, in practice. 322 00:21:20,870 --> 00:21:21,370 All right? 323 00:21:21,370 --> 00:21:23,560 I don't want to necessarily send my friend 324 00:21:23,560 --> 00:21:26,145 the flag to exact precision. 325 00:21:26,145 --> 00:21:27,520 I would actually be happy to send 326 00:21:27,520 --> 00:21:31,550 my friend the flag up to 16 digits of precision, 327 00:21:31,550 --> 00:21:32,273 for example. 328 00:21:32,273 --> 00:21:34,690 They're not going to tell the difference between those two 329 00:21:34,690 --> 00:21:35,830 flags. 330 00:21:35,830 --> 00:21:39,550 And if I can get away with compressing the matrix 331 00:21:39,550 --> 00:21:42,220 a lot more once I have a little bit of wiggle room, 332 00:21:42,220 --> 00:21:44,270 that would be a good thing. 333 00:21:44,270 --> 00:21:58,250 So we know from Eckart and Young 334 00:21:58,250 --> 00:22:02,540 that the singular values tell us how well we can approximate 335 00:22:02,540 --> 00:22:05,900 x by a low-rank matrix. 
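The definition of numerical rank given above can be written as a short function. This is a sketch assuming NumPy; the function name and the diagonal test matrix are illustrative choices, and the threshold is taken relative to the largest singular value, matching the "relative sense" in the talk.

```python
import numpy as np


def numerical_rank(X: np.ndarray, eps: float) -> int:
    """Number of singular values sigma_j with sigma_j > eps * sigma_1."""
    sigma = np.linalg.svd(X, compute_uv=False)
    if sigma[0] == 0.0:
        return 0  # the zero matrix has rank 0 for every tolerance
    return int(np.sum(sigma > eps * sigma[0]))


# A full-rank matrix whose singular values are 1, 1e-3 and 1e-9.
A = np.diag([1.0, 1e-3, 1e-9])
for eps in (0.0, 1e-6, 1e-2):
    print(eps, numerical_rank(A, eps))
```

With eps equal to zero the function counts the nonzero singular values, so it reduces to the ordinary rank; larger tolerances can only lower the count.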
336 00:22:05,900 --> 00:22:13,460 In particular, we know that the k plus 1 singular value of x 337 00:22:13,460 --> 00:22:18,350 tells us how well x can be approximated by a rank k 338 00:22:18,350 --> 00:22:19,390 matrix. 339 00:22:19,390 --> 00:22:20,570 OK? 340 00:22:20,570 --> 00:22:26,180 For example, when the rank was exactly k, the sigma k plus 1 341 00:22:26,180 --> 00:22:29,570 was 0, and then this came out to be 0 342 00:22:29,570 --> 00:22:33,260 and we found that x was exactly a rank k matrix. 343 00:22:33,260 --> 00:22:36,170 Here, because we have the wiggle room, the epsilon, 344 00:22:36,170 --> 00:22:39,180 we get an approximation, not an exact. 345 00:22:39,180 --> 00:22:44,330 So this is telling us how well we can approximate 346 00:22:44,330 --> 00:22:47,330 x by a rank k matrix. 347 00:22:50,030 --> 00:22:51,290 OK? 348 00:22:51,290 --> 00:22:54,170 That's what the singular values are telling us. 349 00:22:54,170 --> 00:22:59,480 And so this allows us to try our best to compress matrices 350 00:22:59,480 --> 00:23:03,470 but use low-rank approximation rather 351 00:23:03,470 --> 00:23:05,315 than doing things exactly. 352 00:23:07,930 --> 00:23:11,100 And of course, on a computer, when we're using floating point 353 00:23:11,100 --> 00:23:16,350 arithmetic, or on a computer because we always round numbers 354 00:23:16,350 --> 00:23:21,450 to the nearest 16-digit number, if epsilon was 16 digits, 355 00:23:21,450 --> 00:23:23,310 your computer wouldn't be able to tell 356 00:23:23,310 --> 00:23:29,640 the difference between x or x the rank k 357 00:23:29,640 --> 00:23:35,040 approximation if this number satisfied this expression. 358 00:23:35,040 --> 00:23:38,190 Your computer would think of x and xk as the same matrix 359 00:23:38,190 --> 00:23:42,240 because it would inevitably round both 360 00:23:42,240 --> 00:23:45,410 to epsilon, within epsilon. 361 00:23:45,410 --> 00:23:46,150 OK. 
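The Eckart and Young statement can be checked on a small example: truncating the SVD after k terms leaves an error, measured in the matrix 2-norm, equal to the k plus 1 singular value. This is a sketch assuming NumPy; the random test matrix, the seed, and k = 3 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8))

U, sigma, Vt = np.linalg.svd(X)

k = 3
# Keep the first k terms of the SVD: sum of sigma_j * u_j * v_j^T for j < k.
X_k = (U[:, :k] * sigma[:k]) @ Vt[:k, :]

error = np.linalg.norm(X - X_k, 2)   # spectral (2-)norm of the residual
print(error, sigma[k])               # sigma[k] is the (k+1)-th singular value
```

If sigma[k] is below eps times sigma[0], the truncated matrix X_k agrees with X to within the chosen tolerance, which is the sense in which a computer cannot tell them apart.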
362 00:23:46,150 --> 00:23:49,400 So what kind of matrices are numerically of low rank? 363 00:24:03,130 --> 00:24:08,620 Of course all low-rank matrices are numerically of low rank 364 00:24:08,620 --> 00:24:16,410 because the wiggle room can only help you, 365 00:24:16,410 --> 00:24:19,230 but it's far more than that. 366 00:24:19,230 --> 00:24:21,060 There are many full-rank matrices-- 367 00:24:21,060 --> 00:24:24,570 matrices that don't have any singular values that are zero-- 368 00:24:24,570 --> 00:24:27,700 but the singular values decay rapidly to zero. 369 00:24:27,700 --> 00:24:32,370 Those are full-rank matrices with low numerical rank because 370 00:24:32,370 --> 00:24:33,780 of the wiggle room. 371 00:24:33,780 --> 00:24:38,880 So for example, here is the classic matrix 372 00:24:38,880 --> 00:24:43,140 that fits this regime. 373 00:24:43,140 --> 00:24:45,570 If I give you this, this is called the Hilbert matrix. 374 00:24:51,070 --> 00:24:53,200 This is a matrix that happens to have 375 00:24:53,200 --> 00:24:57,860 extremely low numerical rank but it's actually 376 00:24:57,860 --> 00:25:05,560 full rank, which means that I can approximate H by a rank k 377 00:25:05,560 --> 00:25:08,620 matrix, where k is quite small, very well, 378 00:25:08,620 --> 00:25:10,880 provided you give me some wiggle room, 379 00:25:10,880 --> 00:25:13,750 but it's not a low-rank matrix in the sense 380 00:25:13,750 --> 00:25:16,300 that if epsilon was zero here, you didn't allow me 381 00:25:16,300 --> 00:25:18,220 the wiggle room, all the singular values 382 00:25:18,220 --> 00:25:20,500 of this matrix are positive. 383 00:25:20,500 --> 00:25:28,920 So it's of low numerical rank but it's not a low-rank matrix. 384 00:25:28,920 --> 00:25:32,550 The other classical example which 385 00:25:32,550 --> 00:25:35,370 motivated a lot of the research in this area 386 00:25:35,370 --> 00:25:37,780 was the Vandermonde matrix. 
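For the Hilbert matrix described above, the rapid decay of the singular values is easy to observe. This is a sketch assuming NumPy, using the standard definition H[j, k] = 1 / (j + k - 1) with indices starting at 1; the size n = 100 and the tolerance 1e-10 are arbitrary choices.

```python
import numpy as np

n = 100
j = np.arange(1, n + 1)
H = 1.0 / (j[:, None] + j[None, :] - 1)       # Hilbert matrix, 1-based indices

sigma = np.linalg.svd(H, compute_uv=False)    # singular values, largest first

eps = 1e-10
num_rank = int(np.sum(sigma > eps * sigma[0]))
print("n =", n, " numerical rank at eps = 1e-10:", num_rank)
print("first few normalised singular values:", sigma[:5] / sigma[0])
```

In exact arithmetic every singular value of H is positive, but floating-point computation cannot resolve the smallest ones, so only the numerical rank is reported here.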
387 00:25:37,780 --> 00:25:39,285 So here is the Vandermonde matrix. 388 00:25:48,370 --> 00:25:50,580 An n by n version of it. 389 00:25:50,580 --> 00:25:52,110 Think of the xi's as real. 390 00:25:55,838 --> 00:25:57,578 And this is Vandermonde. 391 00:26:02,060 --> 00:26:03,680 This is the matrix that comes up when 392 00:26:03,680 --> 00:26:08,450 you try to do polynomial interpolation at real points. 393 00:26:08,450 --> 00:26:13,820 This is an extremely bad matrix to deal with because it's 394 00:26:13,820 --> 00:26:17,090 numerically low rank, and often, you actually 395 00:26:17,090 --> 00:26:21,050 want to solve a linear system with this matrix. 396 00:26:21,050 --> 00:26:24,530 And numerical low rank implies that it's extremely hard 397 00:26:24,530 --> 00:26:32,120 to invert, so numerical low rank is not always good for you. 398 00:26:32,120 --> 00:26:33,020 OK? 399 00:26:33,020 --> 00:26:42,620 Often, we want the inverse, which exists, 400 00:26:42,620 --> 00:26:56,030 but it's difficult because V has low numerical rank. 401 00:27:03,700 --> 00:27:04,300 OK. 402 00:27:04,300 --> 00:27:06,280 So people have been trying to understand 403 00:27:06,280 --> 00:27:09,400 why these matrices are numerically 404 00:27:09,400 --> 00:27:12,220 of low rank for a number of years, 405 00:27:12,220 --> 00:27:16,450 and the classic reason why there are 406 00:27:16,450 --> 00:27:21,040 so many low-rank matrices is because the world is smooth, 407 00:27:21,040 --> 00:27:22,240 as people say. 408 00:27:22,240 --> 00:27:25,630 They say, the world is smooth. 409 00:27:25,630 --> 00:27:32,570 That's why matrices are of numerical low rank. 410 00:27:32,570 --> 00:27:38,710 And to illustrate that point, I will do an example. 411 00:27:38,710 --> 00:27:41,140 So this is classically understood 412 00:27:41,140 --> 00:27:50,150 by a man called Reade in 1983, and this 413 00:27:50,150 --> 00:27:51,740 is what his reason was. 
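The difficulty with the Vandermonde matrix can be seen numerically. This is a sketch assuming NumPy; the choice of 30 equally spaced points in [-1, 1] is an assumption made for illustration, since the talk only says the points are real.

```python
import numpy as np

n = 30
x = np.linspace(-1.0, 1.0, n)          # real interpolation points (assumed)
V = np.vander(x, increasing=True)      # V[i, j] = x[i] ** j

sigma = np.linalg.svd(V, compute_uv=False)

eps = 1e-10
num_rank = int(np.sum(sigma > eps * sigma[0]))
cond = sigma[0] / sigma[-1]            # 2-norm condition number estimate

print("n =", n, " numerical rank at eps = 1e-10:", num_rank)
print("condition number estimate:", cond)
```

A numerical rank below n goes together with a very large condition number, which is why solving a linear system with V is hard even though the inverse exists.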
414 00:27:51,740 --> 00:27:54,090 I have a picture of John Reade. 415 00:27:54,090 --> 00:27:56,780 He's not very famous, so I try to make 416 00:27:56,780 --> 00:28:00,492 sure his picture gets around. 417 00:28:00,492 --> 00:28:01,450 He's playing the piano. 418 00:28:01,450 --> 00:28:04,400 It's, like, one of the only pictures I could find of him. 419 00:28:04,400 --> 00:28:06,830 So what is in this reason? 420 00:28:06,830 --> 00:28:08,520 Why do people say this? 421 00:28:08,520 --> 00:28:12,830 Well here's an example that illustrates it. 422 00:28:12,830 --> 00:28:19,930 If I take a polynomial in two variables and I-- 423 00:28:19,930 --> 00:28:23,050 for example, this is a polynomial of two variables-- 424 00:28:23,050 --> 00:28:27,340 and my x matrix comes from sampling 425 00:28:27,340 --> 00:28:30,580 that polynomial at the integers-- 426 00:28:30,580 --> 00:28:38,940 for example, this matrix-- 427 00:28:38,940 --> 00:28:41,730 then that matrix happens to be of low rank-- 428 00:28:44,940 --> 00:28:50,250 mathematically of low rank, with epsilon equals zero. 429 00:28:50,250 --> 00:28:50,790 Why is that? 430 00:28:50,790 --> 00:28:54,480 Well if I write down x in terms of matrices, 431 00:28:54,480 --> 00:28:56,220 you could easily see it. 432 00:28:56,220 --> 00:29:00,120 So this is made up of a matrix of all ones 433 00:29:00,120 --> 00:29:11,160 plus a matrix of j-- so that's 1, 2, up to n, 1, 2, up to n, 434 00:29:11,160 --> 00:29:12,750 because every entry of that matrix 435 00:29:12,750 --> 00:29:15,500 just depends on the row index. 436 00:29:15,500 --> 00:29:18,730 And then this guy depends on both j and k. 437 00:29:18,730 --> 00:29:21,330 So this is a multiplication table, right? 438 00:29:21,330 --> 00:29:31,635 So this is 1, 2, up to n; 2, 4, up to 2n; down to n, 2n, up to n squared. 439 00:29:31,635 --> 00:29:34,050 OK. 440 00:29:34,050 --> 00:29:38,130 Clearly, the matrix of all ones is a rank one matrix. 
441 00:29:42,260 --> 00:29:43,560 The same with this guy. 
442 00:29:43,560 --> 00:29:47,220 The column space is just of dimension one. 443 00:29:47,220 --> 00:29:51,960 And the last guy also happens to be of rank one 444 00:29:51,960 --> 00:29:58,365 because I can write this matrix in rank one form, which 445 00:29:58,365 --> 00:30:03,250 is a column vector times a row vector. 446 00:30:03,250 --> 00:30:04,090 OK. 447 00:30:04,090 --> 00:30:07,280 So this matrix X is of rank three. 448 00:30:13,710 --> 00:30:16,620 I guess at most rank three is what I've actually shown. 449 00:30:16,620 --> 00:30:17,120 OK. 450 00:30:20,000 --> 00:30:23,480 Now of course this hasn't got to numerical low rank yet, 451 00:30:23,480 --> 00:30:24,820 so let's get ourselves there. 452 00:30:28,690 --> 00:30:32,160 So Reade knew this, and he said to himself, OK, 453 00:30:32,160 --> 00:30:35,590 well if I can approximate-- 454 00:30:35,590 --> 00:30:38,800 if X is actually coming from sampling a function, 455 00:30:38,800 --> 00:30:41,890 and I approximate that function by a polynomial, 456 00:30:41,890 --> 00:30:45,670 then I'm going to get myself a low-rank approximation 457 00:30:45,670 --> 00:30:48,920 and get a bound on the numerical rank. 458 00:30:48,920 --> 00:30:56,620 So in general, if I give you a polynomial of two variables, 459 00:30:56,620 --> 00:30:58,630 which can be written down-- 460 00:30:58,630 --> 00:31:04,000 it's degree m in both x and y. 461 00:31:04,000 --> 00:31:07,375 Let's just keep these indexes away from the matrix index. 462 00:31:10,360 --> 00:31:14,170 I give you such a polynomial, and I go away 463 00:31:14,170 --> 00:31:22,150 and I sample it and make a matrix X, then X, 464 00:31:22,150 --> 00:31:24,220 by looking at each term individually like I 465 00:31:24,220 --> 00:31:30,520 did there, will have low rank mathematically, 466 00:31:30,520 --> 00:31:31,830 with epsilon equals zero.
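As a concrete check of the sampled-polynomial example, here is a minimal sketch. The polynomial itself is on the board and not in the transcript, so p(j, k) = 1 + j + jk is an assumption chosen to match the three matrices described (all ones, the row index j, and the multiplication table jk). The helper names `exact_rank` and `sample` are hypothetical, and the rank is computed exactly with rational arithmetic, so this corresponds to epsilon equals zero.

```python
from fractions import Fraction


def exact_rank(rows):
    """Rank of a matrix via Gaussian elimination over exact rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate the entries below the pivot.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def sample(p, n):
    """Sample p(j, k) at the integers j, k = 1..n to build an n-by-n matrix."""
    return [[p(j, k) for k in range(1, n + 1)] for j in range(1, n + 1)]


n = 8
# Assumed polynomial: all-ones term + row-index term + multiplication table.
X = sample(lambda j, k: 1 + j + j * k, n)
rank_X = exact_rank(X)
print(rank_X)
```

The term-by-term argument only promises rank at most three. For this assumed polynomial the last two terms share the same column vector j, so the computed rank can come out smaller than three, which is consistent with the "at most" in the argument.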
467 00:31:31,830 --> 00:31:35,590 This will have, at most, m squared rank, 468 00:31:35,590 --> 00:31:39,160 and if m is 3 or 4 or 10, it possibly 469 00:31:39,160 --> 00:31:43,570 could be low because this X could be a large matrix. 470 00:31:43,570 --> 00:31:44,320 OK. 471 00:31:44,320 --> 00:31:47,020 So what Reade did for the Hilbert matrix was say, 472 00:31:47,020 --> 00:31:49,270 OK, well look at that guy. 473 00:31:49,270 --> 00:31:52,193 That guy looks like it's sampling a function. 474 00:31:52,193 --> 00:31:53,860 It looks like it's sampling the function 475 00:31:53,860 --> 00:31:57,170 1 over x plus y minus 1. 476 00:31:57,170 --> 00:32:02,650 So he said to himself, well, that X, 477 00:32:02,650 --> 00:32:07,540 if I look at the Hilbert matrix, then that 478 00:32:07,540 --> 00:32:09,600 is sampling a function. 479 00:32:09,600 --> 00:32:13,270 It happens to not be a polynomial. 480 00:32:13,270 --> 00:32:16,480 It happens to be this function. 481 00:32:16,480 --> 00:32:20,950 But that's OK because sampling polynomials at integers 482 00:32:20,950 --> 00:32:22,970 gives me low rank exactly. 483 00:32:22,970 --> 00:32:27,670 Maybe smooth functions, functions like this, 484 00:32:27,670 --> 00:32:29,770 can be well approximated by polynomials 485 00:32:29,770 --> 00:32:32,680 and therefore give low numerical rank. 486 00:32:32,680 --> 00:32:34,900 And that's what he did in this case. 487 00:32:34,900 --> 00:32:42,790 So he tried to find a p, a polynomial approximation to f. 488 00:32:42,790 --> 00:32:45,610 In particular, he looked at exactly this kind 489 00:32:45,610 --> 00:32:46,994 of approximation. 490 00:32:50,870 --> 00:32:54,320 So he has some numbers here so that things 491 00:32:54,320 --> 00:32:55,805 get dissolved later. 492 00:32:55,805 --> 00:33:01,220 And he tried to find a p that did this kind of approximation. 493 00:33:01,220 --> 00:33:03,200 So this approximates f.
494 00:33:08,560 --> 00:33:14,080 And then he would develop a low-rank approximation to X 495 00:33:14,080 --> 00:33:16,780 by sampling p. 496 00:33:16,780 --> 00:33:26,260 So he would say, OK, well if I let Y be a sampling of p, then 497 00:33:26,260 --> 00:33:29,620 from the fact that p is a good approximation to f, 498 00:33:29,620 --> 00:33:34,590 Y is a good approximation to X. And so this has finite rank. 499 00:33:38,198 --> 00:33:43,100 He wrote down that this must hold. 500 00:33:46,160 --> 00:33:49,500 And the epsilon comes out here because these factors 501 00:33:49,500 --> 00:33:51,390 were chosen just right. 502 00:33:51,390 --> 00:33:54,910 The divide by n was chosen so that the epsilon came out just 503 00:33:54,910 --> 00:33:55,960 there. 504 00:33:55,960 --> 00:33:56,460 OK? 505 00:33:56,460 --> 00:33:59,340 So, for many years, that was kind of the canonical reason 506 00:33:59,340 --> 00:34:01,600 that people would give, that, well, 507 00:34:01,600 --> 00:34:05,920 if the matrix X is sampled from a smooth function, 508 00:34:05,920 --> 00:34:10,440 then we can approximate our function by a polynomial 509 00:34:10,440 --> 00:34:15,060 and get polynomial rank approximations. 510 00:34:15,060 --> 00:34:18,929 And therefore, the matrix X will be of low numerical rank. 511 00:34:22,310 --> 00:34:26,630 There's an issue with this reasoning, 512 00:34:26,630 --> 00:34:28,489 especially for the Hilbert matrix, 513 00:34:28,489 --> 00:34:31,710 that it doesn't actually work that well. 514 00:34:31,710 --> 00:34:38,710 So for example, if I take the 1,000 by 1,000 Hilbert matrix 515 00:34:38,710 --> 00:34:41,072 and I look at its rank-- 516 00:34:41,072 --> 00:34:45,040 OK, well I've already told you this is full rank. 517 00:34:45,040 --> 00:34:46,650 You'll get 1,000. 518 00:34:46,650 --> 00:34:50,929 All the singular values are positive.
519 00:34:50,929 --> 00:34:55,980 If I look at the numerical rank of this 1,000 520 00:34:55,980 --> 00:35:00,480 by 1,000 Hilbert matrix and I compute it, I compute the SVD 521 00:35:00,480 --> 00:35:06,600 and I look at how many singular values are above epsilon, where epsilon is 10 522 00:35:06,600 --> 00:35:10,620 to the minus 15, I get 28. So that means I can 523 00:35:10,620 --> 00:35:13,650 approximate the 1,000 by 1,000 Hilbert matrix 524 00:35:13,650 --> 00:35:18,750 by a rank 28 matrix and only give up 525 00:35:18,750 --> 00:35:24,450 15-- there will be 15 exact digits, which is a huge amount. 526 00:35:24,450 --> 00:35:27,030 So this is what we get in practice, 527 00:35:27,030 --> 00:35:42,670 but Reade's argument here shows that the rank of this matrix, 528 00:35:42,670 --> 00:35:45,980 the numerical rank, is at most 719. 529 00:35:49,220 --> 00:35:53,210 So it doesn't do a very good job on the Hilbert matrix 530 00:35:53,210 --> 00:35:57,220 for bounding the rank, right? 531 00:35:57,220 --> 00:36:00,100 So Reade comes along, takes this function. 532 00:36:00,100 --> 00:36:02,520 He tries to find a polynomial that does this, where 533 00:36:02,520 --> 00:36:04,480 epsilon is 10 to the minus 15. 534 00:36:04,480 --> 00:36:07,240 He finds that the number of terms 535 00:36:07,240 --> 00:36:13,570 that he needs in this expression here is around 719, 536 00:36:13,570 --> 00:36:16,580 and therefore, that's the rank that he gets. 537 00:36:16,580 --> 00:36:19,300 The bound on the numerical rank. 538 00:36:19,300 --> 00:36:25,120 The trouble is that 719 tells us that this is not 539 00:36:25,120 --> 00:36:27,820 of low numerical rank, but we know 540 00:36:27,820 --> 00:36:32,450 it is, so it's an unsatisfactory reason. 541 00:36:32,450 --> 00:36:36,690 So there have been several people trying 542 00:36:36,690 --> 00:36:39,900 to come up with more appropriate reasons that 543 00:36:39,900 --> 00:36:44,190 explain the 28 here.
544 00:36:44,190 --> 00:36:50,220 And so one reason that I've started to use 545 00:36:50,220 --> 00:36:52,710 is another slightly different way 546 00:36:52,710 --> 00:36:57,990 of looking at things, which is to say the world is Sylvester. 547 00:37:03,690 --> 00:37:10,420 Now Sylvester, what does that mean? 548 00:37:10,420 --> 00:37:13,010 What does the word "Sylvester" mean in this case? 549 00:37:13,010 --> 00:37:14,950 It means that the matrices satisfy 550 00:37:14,950 --> 00:37:20,080 a certain type of equation called the Sylvester equation, 551 00:37:20,080 --> 00:37:25,420 and so the reason is really, many of these matrices 552 00:37:25,420 --> 00:37:30,220 satisfy a Sylvester equation, and that takes the form AX minus XB equals C-- 553 00:37:36,270 --> 00:37:44,190 for some A, B, and C. 554 00:37:44,190 --> 00:37:44,690 OK. 555 00:37:44,690 --> 00:37:46,580 So X is your matrix of interest. 556 00:37:46,580 --> 00:37:50,690 You want to show X is of numerical low rank. 557 00:37:50,690 --> 00:37:54,770 And the task at hand is to find an A, B, and C so 558 00:37:54,770 --> 00:37:58,880 that X satisfies that equation. 559 00:37:58,880 --> 00:38:00,030 OK. 560 00:38:00,030 --> 00:38:05,450 For example, the two matrices I've had on the board 561 00:38:05,450 --> 00:38:09,360 satisfy a Sylvester equation-- 562 00:38:09,360 --> 00:38:10,780 a Sylvester matrix equation. 563 00:38:10,780 --> 00:38:14,490 There is an A, a B, and a C for which they do this. 564 00:38:14,490 --> 00:38:17,677 For example, remember the Hilbert matrix, 565 00:38:17,677 --> 00:38:20,010 which we have there still, but I'll write it down again. 566 00:38:24,710 --> 00:38:26,770 Has these entries. 567 00:38:26,770 --> 00:38:28,770 So all we need to do is to try to figure out 568 00:38:28,770 --> 00:38:32,760 an A, a B, and then a C so that we can make 569 00:38:32,760 --> 00:38:34,230 it fit a Sylvester equation. 570 00:38:34,230 --> 00:38:36,820 There are many different ways of doing this.
571 00:38:36,820 --> 00:38:41,050 The one that I like is the following, 572 00:38:41,050 --> 00:38:45,660 where if I put 1/2 here and 3/2 here, 573 00:38:45,660 --> 00:38:51,245 all the way down to n minus 1/2, times this matrix-- 574 00:38:53,850 --> 00:38:59,080 so this is timesing the top row of this matrix by 1/2 575 00:38:59,080 --> 00:39:02,050 and then the next by 3/2 and then 5/2. 576 00:39:02,050 --> 00:39:05,620 So we're basically timesing the jk entry of this matrix 577 00:39:05,620 --> 00:39:07,435 by j minus 1/2. 578 00:39:10,510 --> 00:39:12,040 And then I do something on the right 579 00:39:12,040 --> 00:39:14,582 here, which I'm allowed to do because I've got the B freedom, 580 00:39:14,582 --> 00:39:19,280 and I choose this to be the same up to a minus sign. 581 00:39:23,680 --> 00:39:26,830 Then when you think about this, what is it doing? 582 00:39:26,830 --> 00:39:30,460 It's timesing the jk entry-- 583 00:39:30,460 --> 00:39:33,370 this is-- by j minus 1/2. 584 00:39:33,370 --> 00:39:34,840 That's what this is doing. 585 00:39:34,840 --> 00:39:37,570 And what this is doing is timesing the jk entry 586 00:39:37,570 --> 00:39:40,420 by k minus 1/2. 587 00:39:40,420 --> 00:39:44,590 So this is, in total, timesing the jk entry 588 00:39:44,590 --> 00:39:49,700 by j plus k minus 1/2 minus 1/2, which is minus 1, 589 00:39:49,700 --> 00:39:54,810 so this is timesing the jk entry by j plus k minus 1. 590 00:39:54,810 --> 00:39:57,690 So it knocks out the denominator. 591 00:39:57,690 --> 00:40:02,260 And what we get from this equation is a bunch of ones. 592 00:40:11,050 --> 00:40:14,080 So in this case, A and B are diagonal, 593 00:40:14,080 --> 00:40:17,250 and C is the matrix of all ones. 594 00:40:17,250 --> 00:40:17,920 OK? 595 00:40:17,920 --> 00:40:20,590 We can also do this for Vandermonde. 596 00:40:20,590 --> 00:40:25,960 So Vandermonde, you'll remember, looks like this.
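Going back to the Hilbert computation just described, the following is a small sketch that checks it exactly with rational arithmetic. It assumes the Sylvester equation has the form AX - XB = C, with A = diag(1/2, 3/2, ..., n - 1/2) and B = -A, as the spoken description suggests; the variable names are hypothetical and n = 6 is an arbitrary small size.

```python
from fractions import Fraction

n = 6
half = Fraction(1, 2)

# Hilbert matrix: entry (j, k) is 1 / (j + k - 1) for j, k = 1..n.
H = [[Fraction(1, j + k - 1) for k in range(1, n + 1)] for j in range(1, n + 1)]

# Diagonals of A and B (assumed choice: B is A with a minus sign).
a = [j - half for j in range(1, n + 1)]
b = [-(k - half) for k in range(1, n + 1)]

# For diagonal A and B: (A H)[j][k] = a[j] * H[j][k], (H B)[j][k] = H[j][k] * b[k].
C = [[a[j] * H[j][k] - H[j][k] * b[k] for k in range(n)] for j in range(n)]

# Each entry is (j - 1/2 + k - 1/2) / (j + k - 1), which should be exactly 1.
all_ones = all(entry == 1 for row in C for entry in row)
print(all_ones)
```

Because A and B are diagonal, their eigenvalues are just the lists `a` and `b`: the first is entirely positive and the second entirely negative.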
597 00:40:30,970 --> 00:40:35,610 And then over here, we have this guy, 598 00:40:35,610 --> 00:40:40,910 the matrix that appears with polynomial interpolation. 599 00:40:40,910 --> 00:40:41,410 OK. 600 00:40:41,410 --> 00:40:44,530 So if I think about this, I could also 601 00:40:44,530 --> 00:40:50,200 come up with an A, B, and C, and for example, here's 602 00:40:50,200 --> 00:40:52,180 one that works. 603 00:40:52,180 --> 00:40:55,360 I can stick the x's on the diagonal. 604 00:40:59,740 --> 00:41:03,850 So if you imagine what that matrix on the left is doing, 605 00:41:03,850 --> 00:41:08,090 it's timesing each column by the vector x. 606 00:41:08,090 --> 00:41:08,590 OK? 607 00:41:08,590 --> 00:41:12,940 So the first column of this matrix becomes x, the vector x. 608 00:41:12,940 --> 00:41:16,630 The second becomes the vector x squared, 609 00:41:16,630 --> 00:41:18,670 where squared is done entry-wise. 610 00:41:18,670 --> 00:41:21,140 And then the third column is now x cubed, 611 00:41:21,140 --> 00:41:24,420 and when we get to the last, it's x to the n. 612 00:41:24,420 --> 00:41:24,920 OK? 613 00:41:24,920 --> 00:41:30,080 So that's like, multiply each column by the vector x. 614 00:41:30,080 --> 00:41:32,480 So if I want to try to come up with a matrix-- 615 00:41:32,480 --> 00:41:36,680 so that what's left is of low rank, like of this form. 616 00:41:36,680 --> 00:41:40,520 What I can do is shift the columns. 617 00:41:40,520 --> 00:41:43,580 So I've noticed that this product here, 618 00:41:43,580 --> 00:41:46,688 this diagonal matrix, has made the first column x. 619 00:41:46,688 --> 00:41:48,230 So if I want to kill off that column, 620 00:41:48,230 --> 00:41:52,187 I can take the second column and permute it to the first column. 621 00:41:52,187 --> 00:41:54,020 I could take the third column and permute it 622 00:41:54,020 --> 00:41:56,810 to the second, the last column and permute it 623 00:41:56,810 --> 00:41:58,910 to the penultimate column here.
624 00:41:58,910 --> 00:42:00,590 And that will actually kill off a lot 625 00:42:00,590 --> 00:42:03,710 of what I've created in this matrix right here. 626 00:42:03,710 --> 00:42:05,300 So let me write that down. 627 00:42:05,300 --> 00:42:08,510 This is a circular shift matrix. 628 00:42:08,510 --> 00:42:10,280 This does that permutation. 629 00:42:17,350 --> 00:42:18,393 I've put a minus 1 there. 630 00:42:18,393 --> 00:42:19,810 I could have put any number there. 631 00:42:19,810 --> 00:42:22,440 It doesn't make any difference. 632 00:42:22,440 --> 00:42:25,120 But this is the one that works out extremely nicely. 633 00:42:25,120 --> 00:42:29,170 Now this zeros out lots of things because of the way 634 00:42:29,170 --> 00:42:32,005 I've done the multiplication by x and the circular shift 635 00:42:32,005 --> 00:42:33,790 of the columns. 636 00:42:33,790 --> 00:42:39,010 And so the first column is zero because this first column is x, 637 00:42:39,010 --> 00:42:43,240 this first column is x, so I've got x minus x. 638 00:42:43,240 --> 00:42:47,170 This column was x squared minus x squared, so I got zero, 639 00:42:47,170 --> 00:42:51,430 and I just keep going along until that last column. 640 00:42:51,430 --> 00:42:54,040 That last column is a problem because the last column 641 00:42:54,040 --> 00:42:57,280 of this guy is x to the n, whereas I 642 00:42:57,280 --> 00:43:02,240 don't have x to the n in V, so there are some numbers here. 643 00:43:02,240 --> 00:43:02,740 OK. 644 00:43:05,940 --> 00:43:08,550 You'll notice that C in both cases 645 00:43:08,550 --> 00:43:11,070 happens to be a low-rank matrix. 646 00:43:11,070 --> 00:43:14,661 In these cases, it happens to be of rank one.
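The Vandermonde construction can be checked the same way. This is a sketch under assumptions, since the matrices are on the board rather than in the transcript: V has columns 1, x, x squared, up to x to the n minus 1; A is the diagonal matrix of the x's; and the shift matrix (called `Q` here, a hypothetical name) has ones on the subdiagonal and the minus 1 in the top-right corner. The sample points are arbitrary.

```python
from fractions import Fraction

# Arbitrary distinct real sample points (an illustrative choice).
x = [Fraction(v) for v in (-2, -1, 1, 3, 5)]
n = len(x)

# Vandermonde matrix: row j is 1, x_j, x_j^2, ..., x_j^(n-1).
V = [[xj ** k for k in range(n)] for xj in x]

# Assumed shift matrix Q: V @ Q moves column k+1 of V into column k,
# and puts minus the first column of V into the last column.
Q = [[0] * n for _ in range(n)]
for k in range(n - 1):
    Q[k + 1][k] = 1
Q[0][n - 1] = -1

# D_x V scales row j by x_j, so its columns are x, x^2, ..., x^n.
DV = [[x[j] * V[j][k] for k in range(n)] for j in range(n)]
VQ = [[sum(V[j][m] * Q[m][k] for m in range(n)) for k in range(n)] for j in range(n)]

# C = D_x V - V Q.
C = [[DV[j][k] - VQ[j][k] for k in range(n)] for j in range(n)]

leading_columns_zero = all(C[j][k] == 0 for j in range(n) for k in range(n - 1))
last_column = [C[j][n - 1] for j in range(n)]
print(leading_columns_zero, last_column)
```

With this sign convention only the last column of C survives, holding x_j to the n plus 1 in row j, so C has rank at most one. A different corner entry would change those numbers but not the fact that only one column is nonzero.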
647 00:43:14,661 --> 00:43:19,290 And so people were wondering, maybe it's 648 00:43:19,290 --> 00:43:23,040 something to do with satisfying these kind of equations that 649 00:43:23,040 --> 00:43:27,180 makes these matrices that appear in practice 650 00:43:27,180 --> 00:43:29,850 numerically of low rank. 651 00:43:29,850 --> 00:43:33,630 And after a lot of work in this area, 652 00:43:33,630 --> 00:43:37,740 people have come up with a bound that 653 00:43:37,740 --> 00:43:42,360 demonstrates that these kind of equations 654 00:43:42,360 --> 00:43:46,470 are key to understanding numerical low rank. 655 00:43:46,470 --> 00:44:04,750 So if X satisfies a Sylvester equation, like this, and A 656 00:44:04,750 --> 00:44:06,993 is normal, B is normal-- 657 00:44:06,993 --> 00:44:08,410 I don't really want to concentrate 658 00:44:08,410 --> 00:44:12,590 on those two conditions. 659 00:44:12,590 --> 00:44:17,230 It's a little bit academic. 660 00:44:17,230 --> 00:44:21,430 Then-- people have found a bound on the singular 661 00:44:21,430 --> 00:44:24,010 values of any matrix that satisfies 662 00:44:24,010 --> 00:44:27,970 this kind of expression, and they 663 00:44:27,970 --> 00:44:30,460 found this following bound. 664 00:44:41,829 --> 00:44:49,240 OK, so here, the rank of C is r. 665 00:44:49,240 --> 00:44:50,180 So that goes there. 666 00:44:50,180 --> 00:44:52,360 So in our cases, the two examples we have, 667 00:44:52,360 --> 00:44:55,990 r is 1, so we can forget about r. 668 00:44:55,990 --> 00:45:02,930 This nasty guy here is called the Zolotarev number. 669 00:45:07,010 --> 00:45:12,814 E is a set that contains the eigenvalues of A, 670 00:45:12,814 --> 00:45:23,270 and F is a set that contains the eigenvalues of B. OK. 671 00:45:23,270 --> 00:45:26,480 Now it looks like we have gained absolutely nothing 672 00:45:26,480 --> 00:45:30,260 by this bound, because I've just told you singular values 673 00:45:30,260 --> 00:45:32,540 are bound by Zolotarev numbers. 
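The formula written on the board at this point is not captured in the transcript. As a hedged reconstruction, based on how this type of bound is usually stated in the literature rather than on anything in the transcript itself, it has roughly the following shape, where the symbols \sigma_j for the singular values, \phi for a rational function, and \mathcal{R}_{k,k} for the rational functions of degree at most k over k are notation assumed here:

```latex
% Assumed form: A, B normal, AX - XB = C with rank(C) = r,
% E contains the eigenvalues of A, F contains the eigenvalues of B.
\sigma_{1+rk}(X) \;\le\; Z_k(E,F)\,\sigma_1(X),
\qquad
Z_k(E,F) \;=\; \inf_{\phi \in \mathcal{R}_{k,k}}
\frac{\sup_{z \in E} |\phi(z)|}{\inf_{z \in F} |\phi(z)|}.
```

Read this way, with r equal to 1 the singular values of X decay at least as fast as the Zolotarev numbers Z_k(E, F), and those are small when some rational function can be tiny on E while staying large on F.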
674 00:45:32,540 --> 00:45:35,450 That doesn't mean anything to anyone. 675 00:45:35,450 --> 00:45:38,960 It means a little bit to me but not that much. 676 00:45:38,960 --> 00:45:42,020 So the key to this bound-- 677 00:45:42,020 --> 00:45:43,970 the reason this is useful-- 678 00:45:43,970 --> 00:45:49,220 is that so many people have worked out what these Zolotarev 679 00:45:49,220 --> 00:45:52,190 numbers actually mean. 680 00:45:52,190 --> 00:45:52,700 OK? 681 00:45:52,700 --> 00:45:57,620 So these are two key people that worked out 682 00:45:57,620 --> 00:45:59,360 what this bound means. 683 00:45:59,360 --> 00:46:02,600 And we have gained a lot because people 684 00:46:02,600 --> 00:46:04,880 have been studying this number. 685 00:46:04,880 --> 00:46:06,740 This is, like, a number that people 686 00:46:06,740 --> 00:46:11,600 cared about from 1870 onwards to the present day, 687 00:46:11,600 --> 00:46:14,870 and people have studied this number extremely well. 688 00:46:14,870 --> 00:46:17,510 So we've gained something by turning it 689 00:46:17,510 --> 00:46:21,290 into a more abstract problem that people have thought 690 00:46:21,290 --> 00:46:23,990 about previously, and now we can go 691 00:46:23,990 --> 00:46:26,450 to the literature on Zolotarev numbers, 692 00:46:26,450 --> 00:46:30,980 whatever they are, and discover this whole literature of work 693 00:46:30,980 --> 00:46:32,900 on this Zolotarev number. 694 00:46:32,900 --> 00:46:34,433 And the key part-- 695 00:46:34,433 --> 00:46:35,600 I'll just tell you the key-- 696 00:46:41,240 --> 00:46:44,690 is that the sets E and F are separated. 697 00:46:53,960 --> 00:46:57,905 So for example, in the Hilbert matrix, the eigenvalues of A 698 00:46:57,905 --> 00:46:59,135 can be read off the diagonal. 699 00:47:05,540 --> 00:47:06,155 What are they? 700 00:47:06,155 --> 00:47:13,550 They are between 1/2 and n minus 1/2.
701 00:47:13,550 --> 00:47:20,540 And the eigenvalues of B lie in the set from minus n plus 1/2 702 00:47:20,540 --> 00:47:23,524 to minus 1/2. 703 00:47:23,524 --> 00:47:27,610 And the key reason why the Hilbert matrix 704 00:47:27,610 --> 00:47:30,610 is of low numerical rank is the fact 705 00:47:30,610 --> 00:47:33,470 that these two sets are separated, 706 00:47:33,470 --> 00:47:36,580 and that makes this Zolotarev number get small extremely 707 00:47:36,580 --> 00:47:38,950 quickly with k. 708 00:47:38,950 --> 00:47:41,080 Now you might wonder why there is a question 709 00:47:41,080 --> 00:47:44,980 mark on Penzl's name. 710 00:47:44,980 --> 00:47:49,690 There is an unofficial curse that's 711 00:47:49,690 --> 00:47:51,430 been going on for a while. 712 00:47:51,430 --> 00:47:54,730 Both these men died while working on the Zolotarev 713 00:47:54,730 --> 00:47:55,870 problem. 714 00:47:55,870 --> 00:48:00,270 They both died at the age of 31. 715 00:48:00,270 --> 00:48:03,990 One died by being hit by a train, Zolotarev. 716 00:48:03,990 --> 00:48:08,660 It's unclear whether he was suicidal or it was accidental. 717 00:48:08,660 --> 00:48:13,440 Penzl died at the age of 31 in the Canadian mountains 718 00:48:13,440 --> 00:48:16,030 in an avalanche. 719 00:48:16,030 --> 00:48:21,130 I am currently not yet 31 but going to be 31 very soon, 720 00:48:21,130 --> 00:48:23,830 and I'm scared that I may join this list. 721 00:48:27,860 --> 00:48:29,020 OK. 722 00:48:29,020 --> 00:48:32,770 But for the Hilbert matrix, what you get from this analysis, 723 00:48:32,770 --> 00:48:36,520 based on these two people's work, 724 00:48:36,520 --> 00:48:39,700 is a bound on the numerical rank. 725 00:48:39,700 --> 00:48:43,150 And the rank that you get is, let's say, 726 00:48:43,150 --> 00:48:45,820 a world record bound.
727 00:48:45,820 --> 00:48:56,260 For the Hilbert matrix, it is 34, which is not quite 28, not yet, 728 00:48:56,260 --> 00:49:02,860 but it's far more descriptive of 28 than 719. 729 00:49:02,860 --> 00:49:07,420 And so this technique of bounding singular values 730 00:49:07,420 --> 00:49:11,530 by using these Zolotarev numbers is starting to gain popularity 731 00:49:11,530 --> 00:49:15,940 because we can finally answer to ourselves why there are so 732 00:49:15,940 --> 00:49:20,420 many low-rank matrices that appear in computational math. 733 00:49:20,420 --> 00:49:27,440 And it's all based on two 31-year-olds who died. 734 00:49:27,440 --> 00:49:30,710 And so if you ever wonder when you're 735 00:49:30,710 --> 00:49:33,830 doing computational science when a low rank appears 736 00:49:33,830 --> 00:49:36,710 and the smoothness argument does not work for you, 737 00:49:36,710 --> 00:49:40,990 you might like to think about Zolotarev and the curse. 738 00:49:40,990 --> 00:49:42,721 OK, thank you very much. 739 00:49:42,721 --> 00:49:44,605 [APPLAUSE] 740 00:49:47,431 --> 00:49:49,790 GILBERT STRANG: Thank you [INAUDIBLE] Excellent. 741 00:49:49,790 --> 00:49:51,880 ALEX TOWNSEND: How does it work now? 742 00:49:51,880 --> 00:49:53,680 GILBERT STRANG: We're good. 743 00:49:53,680 --> 00:49:54,180 Yeah. 744 00:49:54,180 --> 00:49:55,930 ALEX TOWNSEND: I'm happy to take questions 745 00:49:55,930 --> 00:49:57,920 if we have a minute, if you have any questions. 746 00:49:57,920 --> 00:49:59,545 GILBERT STRANG: How near to 31 are you? 747 00:50:02,085 --> 00:50:03,960 ALEX TOWNSEND: [INAUDIBLE] I get a spotlight. 748 00:50:03,960 --> 00:50:05,490 I'm 31 in December. 749 00:50:05,490 --> 00:50:06,330 GILBERT STRANG: Wow. 750 00:50:06,330 --> 00:50:07,170 OK. 751 00:50:07,170 --> 00:50:10,920 ALEX TOWNSEND: So they died at the age of 31, so you know, 752 00:50:10,920 --> 00:50:14,640 next year is the scary year for me.
753 00:50:14,640 --> 00:50:16,550 So I'm not driving anywhere. 754 00:50:16,550 --> 00:50:20,898 I'm not leaving my house until I become 32. 755 00:50:20,898 --> 00:50:22,690 GILBERT STRANG: Well, thank you [INAUDIBLE] 756 00:50:22,690 --> 00:50:24,190 ALEX TOWNSEND: OK, thanks. 757 00:50:24,190 --> 00:50:26,640 [APPLAUSE]