1
00:00:22,290 --> 00:00:26,050
GILBERT STRANG: So
let me use the mic

2
00:00:26,050 --> 00:00:31,000
to introduce Alex Townsend,
who taught here at MIT--

3
00:00:31,000 --> 00:00:35,570
taught Linear Algebra
18.06 very successfully.

4
00:00:35,570 --> 00:00:38,980
And then now he's at
Cornell on the faculty,

5
00:00:38,980 --> 00:00:40,870
still teaching
very successfully.

6
00:00:40,870 --> 00:00:43,690
And he was invited
here yesterday

7
00:00:43,690 --> 00:00:47,260
for a big event
over in Engineering.

8
00:00:47,260 --> 00:00:54,820
And he agreed to give a talk
about a section of the book--

9
00:00:54,820 --> 00:00:57,520
section 4.3--

10
00:00:57,520 --> 00:01:00,790
which, if you look at it, you'll
see is all about his work.

11
00:01:00,790 --> 00:01:04,500
And now you get to hear
from the creator himself.

12
00:01:04,500 --> 00:01:05,000
OK.

13
00:01:10,450 --> 00:01:11,200
ALEX TOWNSEND: OK.

14
00:01:11,200 --> 00:01:11,830
Thanks.

15
00:01:11,830 --> 00:01:12,710
Thank you, Gil.

16
00:01:12,710 --> 00:01:14,442
Thank you for inviting me here.

17
00:01:14,442 --> 00:01:15,990
I hope you're
enjoying the course.

18
00:01:15,990 --> 00:01:19,990
Today I want to tell
you a little about why

19
00:01:19,990 --> 00:01:24,440
there are so many matrices that
are low rank in the world.

20
00:01:24,440 --> 00:01:26,480
So as computational
mathematicians--

21
00:01:26,480 --> 00:01:30,700
Gil and myself-- we come across
low-rank matrices all the time.

22
00:01:30,700 --> 00:01:36,650
And we started wondering,
as a community, why?

23
00:01:36,650 --> 00:01:41,290
What is it about the problems
that we are looking at?

24
00:01:41,290 --> 00:01:44,320
What makes low-rank
matrices appear?

25
00:01:44,320 --> 00:01:46,690
And today I want to
give you that story--

26
00:01:46,690 --> 00:01:49,510
or at least an
overview of that story.

27
00:01:49,510 --> 00:01:57,600
So for this class, x is going
to be an n by n real matrix.

28
00:01:57,600 --> 00:01:59,680
So nice and square.

29
00:01:59,680 --> 00:02:02,410
And you already know, or
are very comfortable with,

30
00:02:02,410 --> 00:02:05,500
the singular values of a matrix.

31
00:02:05,500 --> 00:02:09,220
So the singular values
of a matrix, as you know,

32
00:02:09,220 --> 00:02:15,360
are a sequence of numbers
that are monotonically

33
00:02:15,360 --> 00:02:21,030
non-increasing that tell
us all kinds of things

34
00:02:21,030 --> 00:02:22,290
about the matrix x.

35
00:02:26,250 --> 00:02:30,090
For example, the number
of nonzero singular values

36
00:02:30,090 --> 00:02:33,840
tells us the rank
of the matrix x.

37
00:02:33,840 --> 00:02:37,830
And they also, you probably
know, tell us how well a matrix

38
00:02:37,830 --> 00:02:42,600
x can be approximated
by a low-rank matrix.

39
00:02:42,600 --> 00:02:45,900
So let me just write two
facts down that you already

40
00:02:45,900 --> 00:02:47,740
are familiar with.

41
00:02:47,740 --> 00:02:50,940
So here's a fact--

42
00:02:50,940 --> 00:02:57,210
that, if I look at the number of
non-zero singular values in x--

43
00:02:57,210 --> 00:03:01,105
so I'm imagining there's going
to be k non-zero singular

44
00:03:01,105 --> 00:03:01,605
values--

45
00:03:06,600 --> 00:03:09,480
then we can say a
few things about x.

46
00:03:09,480 --> 00:03:16,350
For example, the rank
of x, as we know, is k--

47
00:03:16,350 --> 00:03:19,980
the number of non-zero
singular values.

48
00:03:19,980 --> 00:03:25,770
But we also know from the
SVD that we can decompose x

49
00:03:25,770 --> 00:03:29,670
into a sum of rank 1 matrices--

50
00:03:29,670 --> 00:03:33,120
in fact, the sum of k of them.

51
00:03:33,120 --> 00:03:37,770
So because x is rank
k, we can write down

52
00:03:37,770 --> 00:03:45,150
a low-rank representation for
x, and it involves k terms,

53
00:03:45,150 --> 00:03:47,830
like this.

54
00:03:47,830 --> 00:03:52,350
Each one of these vectors
here is a column vector.

55
00:03:52,350 --> 00:03:57,390
So if I draw this
pictorially, this guy

56
00:03:57,390 --> 00:03:58,800
looks like this, right?

57
00:03:58,800 --> 00:04:01,240
And we have k of them.

58
00:04:01,240 --> 00:04:06,390
So because x is rank k, we
can write x as a sum of k

59
00:04:06,390 --> 00:04:08,580
rank 1 matrices.
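
A minimal NumPy sketch of the facts above, assuming a hypothetical random test matrix that is rank 3 by construction. Counting the singular values above a small relative threshold recovers k, and summing k terms sigma_i u_i v_i^T rebuilds x. The threshold 1e-10 is an assumption, chosen because singular values that are zero in exact arithmetic come out as roundoff on a computer.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3

# Hypothetical test matrix, rank k by construction:
# an n-by-k factor times a k-by-n factor.
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))

U, s, Vt = np.linalg.svd(X)          # s is sorted in non-increasing order

# Exact zeros show up as roundoff, so count values above a relative threshold.
num_nonzero = int(np.sum(s > 1e-10 * s[0]))

# Rebuild X as a sum of num_nonzero rank-1 matrices sigma_i * u_i v_i^T.
X_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(num_nonzero))

print(num_nonzero)                   # 3
print(np.allclose(X, X_rebuilt))     # True
```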

60
00:04:08,580 --> 00:04:11,880
And we also have an initial
fact that we already know--

61
00:04:11,880 --> 00:04:16,740
that the dimension of
the column space of x

62
00:04:16,740 --> 00:04:20,730
is equal to k, and the
same with the row space.

63
00:04:20,730 --> 00:04:27,620
So the dimension of the column space of x
equals that of the row space of x,

64
00:04:27,620 --> 00:04:35,190
and they both
equal k.

65
00:04:35,190 --> 00:04:37,650
And so there are three
facts we can determine

66
00:04:37,650 --> 00:04:44,100
from looking at this sequence of
singular values of a matrix x.

67
00:04:44,100 --> 00:04:47,460
Of course, the singular
value sequence is unique.

68
00:04:47,460 --> 00:04:50,880
X defines its own
singular values.

69
00:04:54,270 --> 00:04:58,520
What we're interested in
here is, what makes x?

70
00:04:58,520 --> 00:05:00,780
What are the properties
of x that make

71
00:05:00,780 --> 00:05:02,970
sure that the singular
values have a lot

72
00:05:02,970 --> 00:05:05,250
of zeros in that sequence?

73
00:05:05,250 --> 00:05:09,750
Can we try to understand what
kind of x makes that happen?

74
00:05:12,960 --> 00:05:16,530
And we really like matrices
that have a lot of zeros

75
00:05:16,530 --> 00:05:18,630
here, for the following reason--

76
00:05:22,530 --> 00:05:30,330
we say x is low rank if
the following holds, right?

77
00:05:30,330 --> 00:05:33,120
Because if we wanted to
send x to our friend--

78
00:05:33,120 --> 00:05:36,810
we're imagining x as a
picture where each entry

79
00:05:36,810 --> 00:05:40,170
is a pixel of that image.

80
00:05:40,170 --> 00:05:43,710
If that matrix-- that
image-- was low rank,

81
00:05:43,710 --> 00:05:48,870
we could send the picture
to our friend in two ways.

82
00:05:48,870 --> 00:05:53,310
We could send every
single entry of x.

83
00:05:53,310 --> 00:05:55,140
And for us to do
that, we would have

84
00:05:55,140 --> 00:05:58,410
to send n squared
pieces of information,

85
00:05:58,410 --> 00:06:01,010
because we'd have
to send every entry.

86
00:06:01,010 --> 00:06:03,300
But if x is
sufficiently low rank,

87
00:06:03,300 --> 00:06:07,250
we could also send our
friend the vectors--

88
00:06:07,250 --> 00:06:12,240
u1, v1, up to uk, vk.

89
00:06:12,240 --> 00:06:15,570
And how many pieces
of data would we

90
00:06:15,570 --> 00:06:18,240
have to send our
friend to get x to them

91
00:06:18,240 --> 00:06:20,730
if we sent in the low-rank form?

92
00:06:20,730 --> 00:06:27,390
Well, there are n numbers
here and n here: 2n per term.

93
00:06:27,390 --> 00:06:28,530
There's k of them.

94
00:06:28,530 --> 00:06:33,750
So we'd have to
send 2kn numbers.

95
00:06:33,750 --> 00:06:36,210
And we strictly
say a matrix is low

96
00:06:36,210 --> 00:06:41,700
rank if it's more efficient
to send x to our friend

97
00:06:41,700 --> 00:06:46,650
in low-rank form than
in full-rank form.

98
00:06:46,650 --> 00:06:49,800
So this, of course, by
a little calculation,

99
00:06:49,800 --> 00:06:53,880
just shows us that,
provided the rank is

100
00:06:53,880 --> 00:06:56,640
less than half the
size of the matrix,

101
00:06:56,640 --> 00:07:00,450
we are calling the
matrix low rank.
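
The storage count above can be written as a small sketch. It assumes, as the lecture's count of 2n numbers per term does, that each singular value is folded into its vectors u_i or v_i; the function names are hypothetical.

```python
def full_storage(n: int) -> int:
    """Numbers needed to send every entry of an n-by-n matrix."""
    return n * n


def low_rank_storage(n: int, k: int) -> int:
    """Numbers needed to send k pairs of length-n vectors u_i, v_i."""
    return 2 * k * n


def is_low_rank(n: int, k: int) -> bool:
    """Strict sense from the lecture: the low-rank form is cheaper to send."""
    return low_rank_storage(n, k) < full_storage(n)


n = 1000
print(is_low_rank(n, 499))  # True:  2*499*1000 = 998000 < 1000000
print(is_low_rank(n, 500))  # False: 2*500*1000 = 1000000 is not smaller
```

The comparison 2kn < n^2 reduces to k < n/2, which is the "less than half the size of the matrix" rule.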

102
00:07:00,450 --> 00:07:05,960
Now, often, in practice,
we demand more.

103
00:07:09,920 --> 00:07:15,260
We demand that k is much
smaller than this number,

104
00:07:15,260 --> 00:07:19,850
so that it's far more efficient
to send our friend the matrix x

105
00:07:19,850 --> 00:07:23,750
in low-rank form than
in full-rank form.

106
00:07:23,750 --> 00:07:26,810
So the colloquial use
of the word low rank

107
00:07:26,810 --> 00:07:29,540
is kind of this situation.

108
00:07:29,540 --> 00:07:31,480
But this is the strict
definition of it.

109
00:07:34,010 --> 00:07:39,600
So what do low-rank
matrices look like?

110
00:07:39,600 --> 00:07:43,580
And to do that, I have
some pictures for you.

111
00:07:43,580 --> 00:07:44,990
I have some flags--

112
00:07:44,990 --> 00:07:47,990
the world flags.

113
00:07:47,990 --> 00:07:50,430
So these are all matrices x--

114
00:07:50,430 --> 00:07:54,690
these examples-- although the
flags happen not to be square.

115
00:07:54,690 --> 00:07:56,610
I hope you can all see this.

116
00:07:56,610 --> 00:08:01,380
But the top row here
are all matrices

117
00:08:01,380 --> 00:08:04,360
that are extremely low rank.

118
00:08:04,360 --> 00:08:06,610
For example, the Austrian flag--

119
00:08:06,610 --> 00:08:08,350
if you want to send
that to your friend,

120
00:08:08,350 --> 00:08:11,020
that matrix is of rank 1.

121
00:08:11,020 --> 00:08:14,740
So all you have to do is
send your friend two vectors.

122
00:08:14,740 --> 00:08:18,250
You have to tell your friend the
column space and the row space.

123
00:08:18,250 --> 00:08:21,190
And both of those
have dimension one.

124
00:08:21,190 --> 00:08:24,790
For the English flag, you
need to send them two column

125
00:08:24,790 --> 00:08:27,640
vectors and two row vectors--

126
00:08:27,640 --> 00:08:31,900
u1, v1, u2 and v2.
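
To check the two flag ranks numerically, here is a sketch that encodes one colour channel as 0/1 entries. The pixel sizes and bar widths are hypothetical, and only the band structure of the real flags is kept.

```python
import numpy as np

h, w = 60, 90  # hypothetical flag size in pixels (flags are not square)

# Austrian flag: red / white / red horizontal bands, with red = 1 and white = 0.
austria = np.zeros((h, w))
austria[: h // 3, :] = 1
austria[2 * h // 3 :, :] = 1

# A cross like the flag of England: one horizontal bar and one vertical bar.
cross = np.zeros((h, w))
cross[h // 2 - 5 : h // 2 + 5, :] = 1
cross[:, w // 2 - 5 : w // 2 + 5] = 1

print(np.linalg.matrix_rank(austria))  # 1: every row is all ones or all zeros
print(np.linalg.matrix_rank(cross))    # 2: rows are all ones or the vertical bar
```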

127
00:08:31,900 --> 00:08:35,440
And as we go down this row, they
get slowly fuller and fuller

128
00:08:35,440 --> 00:08:36,289
rank.

129
00:08:36,289 --> 00:08:38,440
So the Japanese
flag, for example,

130
00:08:38,440 --> 00:08:43,059
is low rank but not that small.

131
00:08:43,059 --> 00:08:45,880
The Scottish flag is
essentially full rank.

132
00:08:45,880 --> 00:08:50,350
So it's very inefficient to send
your friend the Scottish flag

133
00:08:50,350 --> 00:08:51,340
in low-rank form.

134
00:08:51,340 --> 00:08:55,190
You're better off sending
almost every single entry.

135
00:08:55,190 --> 00:08:58,430
So what do low-rank
matrices look like?

136
00:09:11,360 --> 00:09:15,250
Well, if the matrix
is extremely low rank,

137
00:09:15,250 --> 00:09:18,340
like rank 1, then when
you look at that matrix--

138
00:09:18,340 --> 00:09:19,900
like here, like the flag--

139
00:09:19,900 --> 00:09:25,420
it's highly aligned
with the coordinates--

140
00:09:25,420 --> 00:09:27,440
with the rows and columns.

141
00:09:27,440 --> 00:09:33,590
So if it's rank 1, the
matrix is highly aligned--

142
00:09:33,590 --> 00:09:34,780
like the Austrian flag.

143
00:09:41,050 --> 00:09:43,690
And of course, as we add
in more and more rank here,

144
00:09:43,690 --> 00:09:46,300
the situation gets a bit blurry.

145
00:09:46,300 --> 00:09:49,450
For example, once we get into
the medium rank situation,

146
00:09:49,450 --> 00:09:51,550
which is a circle,
it's very hard

147
00:09:51,550 --> 00:09:56,800
to see that the circle is
actually, in fact, low rank.

148
00:09:56,800 --> 00:09:58,960
But what I'm going to
do is try to understand

149
00:09:58,960 --> 00:10:03,750
why the Scottish flag,
or diagonal patterns in general,

150
00:10:03,750 --> 00:10:07,990
are a particularly bad
example for low rank.

151
00:10:07,990 --> 00:10:12,040
So I'm going to take
the triangular flag

152
00:10:12,040 --> 00:10:15,980
to examine that more carefully.

153
00:10:15,980 --> 00:10:19,600
So the triangular
flag looks like--

154
00:10:19,600 --> 00:10:27,110
I'll take a square matrix and
I'll color in the bottom half.

155
00:10:27,110 --> 00:10:31,055
So this matrix is the matrix
of ones on and below the diagonal.

156
00:10:38,830 --> 00:10:41,560
And I'm interested in this
matrix and, in particular,

157
00:10:41,560 --> 00:10:43,360
its singular values,
to try to understand

158
00:10:43,360 --> 00:10:47,740
why diagonal patterns
are not particularly

159
00:10:47,740 --> 00:10:51,760
useful for low-rank compression.

160
00:10:51,760 --> 00:10:57,340
And this matrix of ones has
a really nice property that,

161
00:10:57,340 --> 00:11:01,750
if I take its inverse,
it looks a lot like--

162
00:11:01,750 --> 00:11:04,850
getting close to
Gil's favorite matrix.

163
00:11:04,850 --> 00:11:08,550
So if I take the
inverse of this matrix--

164
00:11:08,550 --> 00:11:12,100
it has an inverse because it's
got ones on the diagonal--

165
00:11:12,100 --> 00:11:21,220
then its inverse is
the following matrix,

166
00:11:21,220 --> 00:11:24,220
in which people familiar with
finite difference schemes

167
00:11:24,220 --> 00:11:27,400
will recognize
the familiar

168
00:11:27,400 --> 00:11:32,770
first-order finite
difference approximation.

169
00:11:32,770 --> 00:11:35,500
In particular, if I go
a bit further and multiply

170
00:11:35,500 --> 00:11:39,130
two of these
together, like this,

171
00:11:39,130 --> 00:11:45,370
then this is essentially
Gil's favorite matrix,

172
00:11:45,370 --> 00:11:49,355
except one entry happens
to be different--

173
00:11:52,540 --> 00:11:55,570
it ends up being this
matrix, which is

174
00:11:55,570 --> 00:11:59,800
very close to the second order,
central, finite difference

175
00:11:59,800 --> 00:12:01,210
matrix.
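
A small NumPy check of these claims. It assumes the product on the board is D^T D, where D is the inverse of the triangular matrix; the alternative product D D^T differs from the 2, -1 pattern in the first corner entry instead of the last.

```python
import numpy as np

n = 6
L = np.tril(np.ones((n, n)))      # ones on and below the diagonal
D = np.linalg.inv(L)              # first-order difference matrix

# Expected inverse: 1 on the diagonal, -1 on the first subdiagonal.
D_expected = np.eye(n) - np.eye(n, k=-1)
print(np.allclose(D, D_expected))  # True

# D^T D is the second-difference matrix tridiag(-1, 2, -1),
# except that its last diagonal entry is 1 instead of 2.
K = D.T @ D
print(np.round(K).astype(int))
```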

176
00:12:01,210 --> 00:12:02,950
And people have
studied that matrix

177
00:12:02,950 --> 00:12:06,430
very well and know
its eigenvalues,

178
00:12:06,430 --> 00:12:07,930
its singular values--

179
00:12:07,930 --> 00:12:10,300
they know everything
about that matrix.

180
00:12:10,300 --> 00:12:12,310
And you'll remember
that if we know

181
00:12:12,310 --> 00:12:18,010
the eigenvalues of a
matrix, like x transpose x,

182
00:12:18,010 --> 00:12:21,610
we know the singular
values of x.

183
00:12:21,610 --> 00:12:25,810
So, because we know those
eigenvalues, we can show

184
00:12:25,810 --> 00:12:32,590
that the
singular values of this matrix

185
00:12:32,590 --> 00:12:34,990
are not very
amenable to low rank.

186
00:12:34,990 --> 00:12:39,400
They're all non-zero, and
they don't even decay.

187
00:12:39,400 --> 00:12:41,920
So I'm getting this from--

188
00:12:44,520 --> 00:12:47,430
I rang up Gil, and Gil
tells me these numbers.

189
00:12:54,670 --> 00:12:57,300
That allows us to work out
exactly what the singular

190
00:12:57,300 --> 00:12:59,760
values of this matrix
are, from the connection

191
00:12:59,760 --> 00:13:02,130
to finite differences.

192
00:13:02,130 --> 00:13:04,260
And so we can understand
why this is not

193
00:13:04,260 --> 00:13:06,700
good by looking at
the singular values.

194
00:13:06,700 --> 00:13:10,410
So the first singular value
of x from this expression

195
00:13:10,410 --> 00:13:15,960
is going to be
approximately 2n over pi.

196
00:13:15,960 --> 00:13:19,110
And from this expression,
again, for the last guy--

197
00:13:19,110 --> 00:13:23,260
the last singular
value of x is going

198
00:13:23,260 --> 00:13:26,430
to be approximately a half.
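
For reference, the eigenvalues of D^T D are 4 sin^2((2j - 1) pi / (4n + 2)) for j = 1, ..., n. Taking reciprocal square roots gives a closed form for the singular values of the triangular matrix of ones, sigma_j = 1 / (2 sin((2j - 1) pi / (4n + 2))). The sketch below compares that formula with a direct SVD. The choice n = 200 is arbitrary.

```python
import numpy as np

n = 200
L = np.tril(np.ones((n, n)))
s = np.linalg.svd(L, compute_uv=False)     # non-increasing order

j = np.arange(1, n + 1)
s_formula = 1.0 / (2.0 * np.sin((2 * j - 1) * np.pi / (4 * n + 2)))  # decreasing in j

print(np.allclose(s, s_formula))          # True
print(s[0], 2 * n / np.pi)                # both roughly 127
print(s[-1])                              # close to 0.5
```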

199
00:13:26,430 --> 00:13:28,890
So these singular
values are all large.

200
00:13:28,890 --> 00:13:31,110
They're not getting
close to zero.

201
00:13:31,110 --> 00:13:37,070
If I plotted these singular
values on a graph--

202
00:13:37,070 --> 00:13:40,680
so here's the first
singular value, the second,

203
00:13:40,680 --> 00:13:43,040
and the n-th--

204
00:13:43,040 --> 00:13:45,210
then what would the
graph look like?

205
00:13:45,210 --> 00:13:47,885
Well, plot these numbers.

206
00:13:51,330 --> 00:13:54,190
Divide by this guy
so that they all

207
00:13:54,190 --> 00:13:59,800
are bounded between 1 and 0
because of the normalization,

208
00:13:59,800 --> 00:14:03,190
because I divided
by sigma 1 of x.

209
00:14:03,190 --> 00:14:05,230
And so we can plot
them, and they will

210
00:14:05,230 --> 00:14:11,520
look like this kind of thing.

211
00:14:15,150 --> 00:14:16,830
This last number here
comes out to be

212
00:14:16,830 --> 00:14:21,780
approximately
pi over 4n, which

213
00:14:21,780 --> 00:14:27,540
is me dividing this number by
this number.

214
00:14:27,540 --> 00:14:32,071
So triangular patterns are
extremely bad for low rank.
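
The normalized plot can be reproduced with the sketch below. It divides by sigma_1 and compares the smallest normalized singular value with the pi/(4n) estimate. The sizes are arbitrary. Because no normalized value falls much below pi/(4n), a tolerance such as 1e-8 discards nothing, so the numerical rank stays at n.

```python
import numpy as np


def normalized_singular_values(n: int) -> np.ndarray:
    """Singular values of the n-by-n lower-triangular ones matrix, divided by sigma_1."""
    L = np.tril(np.ones((n, n)))
    s = np.linalg.svd(L, compute_uv=False)
    return s / s[0]


for n in (50, 200, 800):
    s_hat = normalized_singular_values(n)
    # Smallest normalized value next to the pi / (4n) estimate.
    print(n, s_hat[-1], np.pi / (4 * n))
```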

215
00:14:32,071 --> 00:14:35,040
We need things-- or we at
least intuitively think

216
00:14:35,040 --> 00:14:39,120
that we need things-- aligned
with the rows and columns,

217
00:14:39,120 --> 00:14:45,570
but the circle case happens
to also be low rank.

218
00:14:45,570 --> 00:14:49,410
And so what happened
to the Japanese flag?

219
00:14:54,300 --> 00:14:59,970
Why is the Japanese flag
convenient for low rank?

220
00:14:59,970 --> 00:15:03,420
Well it's the fact
that it's a circle,

221
00:15:03,420 --> 00:15:06,340
and there's lots of
symmetry in a circle.

222
00:15:06,340 --> 00:15:13,740
So if I try to look at the rank
of a circle, the Japanese flag,

223
00:15:13,740 --> 00:15:21,870
then I can bound this rank by
decomposing the Japanese flag

224
00:15:21,870 --> 00:15:24,130
into two things.

225
00:15:24,130 --> 00:15:29,460
So this is going to be less
than or equal to the rank of the sum

226
00:15:29,460 --> 00:15:33,480
of two matrices, and I'll do it
so that the decomposition works

227
00:15:33,480 --> 00:15:34,050
out.

228
00:15:34,050 --> 00:15:35,430
I have the circle.

229
00:15:35,430 --> 00:15:40,252
I'm going to cut out a
rank one piece that lives

230
00:15:40,252 --> 00:15:41,460
in the middle of this circle.

231
00:15:47,440 --> 00:15:47,940
OK?

232
00:15:47,940 --> 00:15:51,360
And I'm going to cut out
a square from the interior

233
00:15:51,360 --> 00:15:54,650
of that circle.

234
00:15:54,650 --> 00:15:55,490
OK?

235
00:15:55,490 --> 00:15:58,110
And I can figure out-- of
course the rank is just bounded

236
00:15:58,110 --> 00:16:00,630
by the sum of those two ranks.

237
00:16:00,630 --> 00:16:04,320
This guy is bounded by rank
one because it's highly

238
00:16:04,320 --> 00:16:05,370
aligned with the grid.

239
00:16:09,570 --> 00:16:11,800
So this guy is
bounded by rank one.

240
00:16:11,800 --> 00:16:23,360
So this thing here plus 1.

241
00:16:26,010 --> 00:16:29,280
And now I have to
try to understand

242
00:16:29,280 --> 00:16:32,820
the rank of this piece.

243
00:16:32,820 --> 00:16:35,910
Now this piece has
lots of symmetry.

244
00:16:35,910 --> 00:16:39,690
For example, we know that
the rank of that matrix

245
00:16:39,690 --> 00:16:43,320
is the dimension
of the column space

246
00:16:43,320 --> 00:16:46,360
and the dimension
of the row space.

247
00:16:46,360 --> 00:16:49,650
So when we look at this
matrix, because of symmetry,

248
00:16:49,650 --> 00:16:55,230
if I divide this matrix
in half along the columns,

249
00:16:55,230 --> 00:16:58,320
all the columns on the
left appear on the right.

250
00:16:58,320 --> 00:17:01,920
So for example, the
rank of this matrix

251
00:17:01,920 --> 00:17:04,589
is the same as the
rank of that matrix

252
00:17:04,589 --> 00:17:07,880
because I didn't change
the column space.

253
00:17:07,880 --> 00:17:08,430
OK?

254
00:17:08,430 --> 00:17:13,650
Now I go again and
divide along the rows,

255
00:17:13,650 --> 00:17:17,819
and now the row
dimension of this matrix

256
00:17:17,819 --> 00:17:20,880
is the same as the top half,
because as I wipe out those,

257
00:17:20,880 --> 00:17:23,130
I didn't change the
dimension of the row space

258
00:17:23,130 --> 00:17:26,079
because the rows are
the same top-bottom.

259
00:17:26,079 --> 00:17:30,650
And so this becomes the rank of
that tiny little matrix there.

260
00:17:30,650 --> 00:17:35,840
And because it's small, it
won't have too large a rank.

261
00:17:35,840 --> 00:17:42,978
So this is definitely less
than-- if I divide that up,

262
00:17:42,978 --> 00:17:50,330
a little guy here looks like
that plus the other guy that

263
00:17:50,330 --> 00:18:00,450
looks like that plus 1.

264
00:18:00,450 --> 00:18:07,900
And so of course the dimension of the row space
of this matrix cannot be very

265
00:18:07,900 --> 00:18:10,030
high because this is
a very thin matrix.

266
00:18:10,030 --> 00:18:13,800
There's lots of zeros in
that matrix, only a few ones.

267
00:18:13,800 --> 00:18:15,520
And so you can go
along and do a bit

268
00:18:15,520 --> 00:18:19,570
of trig to try to figure
out how many rows are

269
00:18:19,570 --> 00:18:22,420
non-zero in this matrix.

270
00:18:22,420 --> 00:18:25,660
And a bit of trig tells you--

271
00:18:25,660 --> 00:18:29,630
well it depends on the radius
of this original circle.

272
00:18:29,630 --> 00:18:34,120
So if I make the original
radius r of this Japanese flag,

273
00:18:34,120 --> 00:18:38,110
then the bound that you
end up getting will be,

274
00:18:38,110 --> 00:18:43,710
for this matrix, r times
(1 minus square root 2 over 2)

275
00:18:43,710 --> 00:18:44,570
for this guy.

276
00:18:44,570 --> 00:18:46,020
That's a bit of trig.

277
00:18:46,020 --> 00:18:48,220
I've got to make sure
that's an integer.

278
00:18:48,220 --> 00:18:52,160
And then again, here it's the
same but for the column space.

279
00:18:52,160 --> 00:18:53,430
So this is me just doing trig.

280
00:18:56,140 --> 00:18:56,640
OK?

281
00:18:56,640 --> 00:18:57,870
And that's a bound on the rank.

282
00:18:57,870 --> 00:18:59,650
It happens to be extremely good.

283
00:18:59,650 --> 00:19:03,540
And if you work out what that
rank is and try to look back,

284
00:19:03,540 --> 00:19:05,170
you will find it's
extremely efficient

285
00:19:05,170 --> 00:19:10,513
to send the Japanese flag to
your friend in low rank form,

286
00:19:10,513 --> 00:19:12,680
because it's not full rank
because these numbers are

287
00:19:12,680 --> 00:19:13,760
so small.

288
00:19:13,760 --> 00:19:21,080
So this comes out to be, like,
approximately 1/2 r plus 1.

289
00:19:21,080 --> 00:19:23,750
So much smaller than
what you would expect,

290
00:19:23,750 --> 00:19:27,350
because remember, a circle is
almost the opposite

291
00:19:27,350 --> 00:19:32,860
of being aligned with the grid, and
yet it's still low rank.

292
00:19:32,860 --> 00:19:35,050
OK.
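
Here is a sketch of the circle argument on a pixel grid. It assumes a hypothetical radius r = 50 and a 2r-by-2r matrix with pixel centres placed symmetrically about 0. The inscribed square is an all-ones block, so it has rank 1. By the symmetry argument, the rest reduces to one quadrant, where a thin strip of rows and a thin strip of columns each contribute at most `strip` to the rank. `strip` is the discrete count behind r times (1 minus square root 2 over 2).

```python
import numpy as np

r = 50                                   # hypothetical radius in pixels
c = np.arange(2 * r) + 0.5 - r           # pixel-centre coordinates, symmetric about 0
X, Y = np.meshgrid(c, c)
disk = (X**2 + Y**2 <= r**2).astype(float)   # 1 inside the circle, 0 outside

# Inscribed square of half-width r / sqrt(2): an outer product, so rank 1.
half = r / np.sqrt(2)
inside = np.abs(c) <= half
square = np.outer(inside, inside).astype(float)
remainder = disk - square                # the four thin caps around the square

# Rows lying above the square in one half; the same count applies to columns.
strip = int(np.sum(c < -half))
bound = 2 * strip + 1

print(np.linalg.matrix_rank(disk), bound, 2 * r)   # rank, bound, matrix size
```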

293
00:19:35,050 --> 00:19:39,350
Now most matrices
that we come up

294
00:19:39,350 --> 00:19:44,600
with in computational math are
not exactly of low rank.

295
00:19:44,600 --> 00:19:48,860
They are of low numerical rank.

296
00:19:48,860 --> 00:19:51,390
And so I'll just define that.

297
00:19:51,390 --> 00:19:58,060
So the numerical
rank of a matrix

298
00:19:58,060 --> 00:20:01,190
is very similar to the rank,
except we allow ourselves

299
00:20:01,190 --> 00:20:04,310
a little bit of wiggle
room when we define it,

300
00:20:04,310 --> 00:20:09,110
and so that amount of wiggle
room will be a parameter

301
00:20:09,110 --> 00:20:12,140
called epsilon.

302
00:20:12,140 --> 00:20:13,010
That's a tolerance.

303
00:20:13,010 --> 00:20:16,436
I'm thinking of
epsilon as a tolerance.

304
00:20:16,436 --> 00:20:21,110
That's the amount of wiggle
room I'm going to give myself.

305
00:20:21,110 --> 00:20:22,140
OK.

306
00:20:22,140 --> 00:20:27,240
And we say that the
numerical rank--

307
00:20:27,240 --> 00:20:31,350
I'll put an epsilon there
to denote numerical rank--

308
00:20:31,350 --> 00:20:34,170
is k.

309
00:20:34,170 --> 00:20:37,650
k is the index of the last
singular value that is

310
00:20:37,650 --> 00:20:39,120
above epsilon.

311
00:20:39,120 --> 00:20:42,370
In the following sense, I'm
copying the definition above

312
00:20:42,370 --> 00:20:45,090
but with epsilons
instead of zeros.

313
00:20:45,090 --> 00:20:52,960
That is, sigma k plus 1 is
less than epsilon relative to sigma 1,

314
00:20:52,960 --> 00:20:56,220
and sigma k is not.

315
00:20:56,220 --> 00:21:00,880
So k plus 1 is the first
singular value below epsilon

316
00:21:00,880 --> 00:21:03,130
in this relative sense.
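
The definition of numerical rank can be written as a short function. This is a sketch. The name `numerical_rank` is hypothetical, and the boundary case sigma_{k+1} equal to epsilon times sigma_1 is counted as "below", which is one reasonable reading of the definition.

```python
import numpy as np


def numerical_rank(X: np.ndarray, eps: float) -> int:
    """Return k with sigma_{k+1} <= eps * sigma_1 < sigma_k (relative tolerance).

    With eps = 0 this counts the nonzero singular values, i.e. the ordinary
    rank in exact arithmetic.
    """
    s = np.linalg.svd(X, compute_uv=False)   # non-increasing order
    if s[0] == 0:
        return 0
    return int(np.sum(s > eps * s[0]))


A = np.diag([1.0, 1e-3, 1e-9])
print(numerical_rank(A, 0.0))    # 3
print(numerical_rank(A, 1e-6))   # 2
print(numerical_rank(A, 1e-2))   # 1
```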

317
00:21:03,130 --> 00:21:10,480
So of course the rank with epsilon
equal to 0, if that were defined,

318
00:21:10,480 --> 00:21:13,480
is the same as the rank of x.

319
00:21:13,480 --> 00:21:14,290
OK?

320
00:21:14,290 --> 00:21:17,540
So this is just allowing
ourselves some wiggle room.

321
00:21:17,540 --> 00:21:20,870
But this is actually what we're
more interested in, in practice.

322
00:21:20,870 --> 00:21:21,370
All right?

323
00:21:21,370 --> 00:21:23,560
I don't want to
necessarily send my friend

324
00:21:23,560 --> 00:21:26,145
the flag to exact precision.

325
00:21:26,145 --> 00:21:27,520
I would actually
be happy to send

326
00:21:27,520 --> 00:21:31,550
my friend the flag up to
16 digits of precision,

327
00:21:31,550 --> 00:21:32,273
for example.

328
00:21:32,273 --> 00:21:34,690
They're not going to tell the
difference between those two

329
00:21:34,690 --> 00:21:35,830
flags.

330
00:21:35,830 --> 00:21:39,550
And if I can get away with
compressing the matrix

331
00:21:39,550 --> 00:21:42,220
a lot more once I have a
little bit of wiggle room,

332
00:21:42,220 --> 00:21:44,270
that would be a good thing.

333
00:21:44,270 --> 00:21:58,250
So we know from the
Eckart-Young theorem

334
00:21:58,250 --> 00:22:02,540
that the singular values tell
us how well we can approximate

335
00:22:02,540 --> 00:22:05,900
x by a low-rank matrix.

336
00:22:05,900 --> 00:22:13,460
In particular, we know that the
(k plus 1)st singular value of x

337
00:22:13,460 --> 00:22:18,350
tells us how well x can be
approximated by a rank k

338
00:22:18,350 --> 00:22:19,390
matrix.

339
00:22:19,390 --> 00:22:20,570
OK?

340
00:22:20,570 --> 00:22:26,180
For example, when the rank was
exactly k, the sigma k plus 1

341
00:22:26,180 --> 00:22:29,570
was 0, and then this
came out to be 0

342
00:22:29,570 --> 00:22:33,260
and we found that x was
exactly a rank k matrix.

343
00:22:33,260 --> 00:22:36,170
Here, because we have the
wiggle room, the epsilon,

344
00:22:36,170 --> 00:22:39,180
we get an approximation,
not an exact representation.

345
00:22:39,180 --> 00:22:44,330
So this is telling us how
well we can approximate

346
00:22:44,330 --> 00:22:47,330
x by a rank k matrix.
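
A numerical illustration of the Eckart-Young statement in the spectral norm. It uses a hypothetical random 50-by-50 matrix: the truncated SVD X_k is the best rank-k approximation, and its error is exactly sigma_{k+1}. Note that NumPy indexes from zero, so `s[k]` is sigma_{k+1}.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 10
X = rng.standard_normal((n, n))          # hypothetical test matrix

U, s, Vt = np.linalg.svd(X)
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # truncated SVD with k terms

err = np.linalg.norm(X - X_k, 2)         # spectral (operator 2-) norm
print(err, s[k])                         # equal up to roundoff
```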

347
00:22:50,030 --> 00:22:51,290
OK?

348
00:22:51,290 --> 00:22:54,170
That's what the singular
values are telling us.

349
00:22:54,170 --> 00:22:59,480
And so this allows us to try
our best to compress matrices

350
00:22:59,480 --> 00:23:03,470
using low-rank
approximation rather

351
00:23:03,470 --> 00:23:05,315
than doing things exactly.

352
00:23:07,930 --> 00:23:11,100
And of course, on a computer,
when we're using floating point

353
00:23:11,100 --> 00:23:16,350
arithmetic, we always
round numbers

354
00:23:16,350 --> 00:23:21,450
to about 16 digits. So,
if epsilon was about 10 to the minus 16,

355
00:23:21,450 --> 00:23:23,310
your computer wouldn't
be able to tell

356
00:23:23,310 --> 00:23:29,640
the difference between
x and its rank k

357
00:23:29,640 --> 00:23:35,040
approximation if this number
satisfied this expression.

358
00:23:35,040 --> 00:23:38,190
Your computer would think of
x and xk as the same matrix

359
00:23:38,190 --> 00:23:42,240
because it would
inevitably round both

360
00:23:42,240 --> 00:23:45,410
to within epsilon.

361
00:23:45,410 --> 00:23:46,150
OK.

362
00:23:46,150 --> 00:23:49,400
So what kind of matrices
are numerically of low rank?

363
00:24:03,130 --> 00:24:08,620
Of course all low-rank matrices
are numerically of low rank

364
00:24:08,620 --> 00:24:16,410
because the wiggle
room can only help you

365
00:24:16,410 --> 00:24:19,230
but it's far more than that.

366
00:24:19,230 --> 00:24:21,060
There are many
full-rank matrices--

367
00:24:21,060 --> 00:24:24,570
matrices that don't have any
singular values that are zero--

368
00:24:24,570 --> 00:24:27,700
but the singular values
decay rapidly to zero.

369
00:24:27,700 --> 00:24:32,370
Those are full-rank matrices
with low numerical rank because

370
00:24:32,370 --> 00:24:33,780
of the wiggle room.

371
00:24:33,780 --> 00:24:38,880
So for example, here
is the classic matrix

372
00:24:38,880 --> 00:24:43,140
that fits this regime.

373
00:24:43,140 --> 00:24:45,570
If I give you this, this is
called the Hilbert matrix.

374
00:24:51,070 --> 00:24:53,200
This is a matrix
that happens to have

375
00:24:53,200 --> 00:24:57,860
extremely low numerical
rank but it's actually

376
00:24:57,860 --> 00:25:05,560
full rank, which means that I
can approximate H very well by a rank k

377
00:25:05,560 --> 00:25:08,620
matrix where k is
quite small,

378
00:25:08,620 --> 00:25:10,880
provided you give
me some wiggle room,

379
00:25:10,880 --> 00:25:13,750
but it's not a low-rank
matrix in the sense

380
00:25:13,750 --> 00:25:16,300
that if epsilon was zero
here, you didn't allow me

381
00:25:16,300 --> 00:25:18,220
the wiggle room, all
the singular values

382
00:25:18,220 --> 00:25:20,500
of this matrix are positive.

383
00:25:20,500 --> 00:25:28,920
So it's of low numerical rank
but it's not a low-rank matrix.
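
The sketch below shows the Hilbert matrix's low numerical rank, using the relative-tolerance definition from earlier. It builds the matrix directly with NumPy (`scipy.linalg.hilbert` would also work). The size n = 100 and the three tolerances are arbitrary choices. In exact arithmetic the matrix is positive definite and therefore full rank, but only a small number of singular values survive each tolerance.

```python
import numpy as np


def hilbert(n: int) -> np.ndarray:
    """Hilbert matrix, H[j, k] = 1 / (j + k + 1) with zero-based indices."""
    idx = np.arange(n)
    return 1.0 / (idx[:, None] + idx[None, :] + 1)


def numerical_rank(X: np.ndarray, eps: float) -> int:
    """Number of singular values above eps * sigma_1."""
    s = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(s > eps * s[0]))


n = 100
H = hilbert(n)
for eps in (1e-4, 1e-8, 1e-12):
    print(eps, numerical_rank(H, eps))   # small compared with n = 100
```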

384
00:25:28,920 --> 00:25:32,550
The other classical
example which

385
00:25:32,550 --> 00:25:35,370
motivated a lot of the
research in this area

386
00:25:35,370 --> 00:25:37,780
was the Vandermonde matrix.

387
00:25:37,780 --> 00:25:39,285
So here is the
Vandermonde matrix.

388
00:25:48,370 --> 00:25:50,580
An n by n version of it.

389
00:25:50,580 --> 00:25:52,110
Think of the xi's as real.

390
00:25:55,838 --> 00:25:57,578
And this is Vandermonde.

391
00:26:02,060 --> 00:26:03,680
This is the matrix
that comes up when

392
00:26:03,680 --> 00:26:08,450
you try to do polynomial
interpolation at real points.

393
00:26:08,450 --> 00:26:13,820
This is an extremely bad matrix
to deal with because it's

394
00:26:13,820 --> 00:26:17,090
numerically low rank,
and often, you actually

395
00:26:17,090 --> 00:26:21,050
want to solve a linear
system with this matrix.

396
00:26:21,050 --> 00:26:24,530
And numerical low rank implies
that it's extremely hard

397
00:26:24,530 --> 00:26:32,120
to invert, so numerical low
rank is not always good for you.

398
00:26:32,120 --> 00:26:33,020
OK?

399
00:26:33,020 --> 00:26:42,620
Often, we want the
inverse, which exists,

400
00:26:42,620 --> 00:26:56,030
but it's difficult because
V has low numerical rank.
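
The same measurement applied to a real Vandermonde matrix. The 50 equally spaced nodes on [0, 1] are a hypothetical choice, and `np.vander(..., increasing=True)` gives rows 1, x_i, x_i^2, and so on. The numerical rank at epsilon = 1e-10 falls short of n. The large condition number shows why solving a system with V loses many digits.

```python
import numpy as np

n = 50
x = np.linspace(0.0, 1.0, n)               # hypothetical real nodes
V = np.vander(x, increasing=True)          # V[i, j] = x[i] ** j

s = np.linalg.svd(V, compute_uv=False)
rank_eps = int(np.sum(s > 1e-10 * s[0]))   # numerical rank with eps = 1e-10
print(rank_eps, "of", n)
print(np.linalg.cond(V))                   # huge: solving V c = y loses many digits
```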

401
00:27:03,700 --> 00:27:04,300
OK.

402
00:27:04,300 --> 00:27:06,280
So people have been
trying to understand

403
00:27:06,280 --> 00:27:09,400
why these matrices
are numerically

404
00:27:09,400 --> 00:27:12,220
of low rank for a
number of years,

405
00:27:12,220 --> 00:27:16,450
and the classic
reason why there are

406
00:27:16,450 --> 00:27:21,040
so many low-rank matrices is
because the world is smooth,

407
00:27:21,040 --> 00:27:22,240
as people say.

408
00:27:22,240 --> 00:27:25,630
They say, the world is smooth.

409
00:27:25,630 --> 00:27:32,570
That's why matrices are
of numerical low rank.

410
00:27:32,570 --> 00:27:38,710
And to illustrate that
point, I will do an example.

411
00:27:38,710 --> 00:27:41,140
So this is
classically understood

412
00:27:41,140 --> 00:27:50,150
by a man called Reade
in 1983, and this

413
00:27:50,150 --> 00:27:51,740
is what his reason was.

414
00:27:51,740 --> 00:27:54,090
I have a picture of John Reade.

415
00:27:54,090 --> 00:27:56,780
He's not very famous,
so I try to make

416
00:27:56,780 --> 00:28:00,492
sure his picture gets around.

417
00:28:00,492 --> 00:28:01,450
He's playing the piano.

418
00:28:01,450 --> 00:28:04,400
It's, like, one of the only
pictures I could find of him.

419
00:28:04,400 --> 00:28:06,830
So what is the reasoning here?

420
00:28:06,830 --> 00:28:08,520
Why do people say this?

421
00:28:08,520 --> 00:28:12,830
Well here's an example
that illustrates it.

422
00:28:12,830 --> 00:28:19,930
If I take a polynomial
in two variables and I--

423
00:28:19,930 --> 00:28:23,050
for example, this is a
polynomial of two variables--

424
00:28:23,050 --> 00:28:27,340
and my x matrix
comes from sampling

425
00:28:27,340 --> 00:28:30,580
that polynomial at integers--

426
00:28:30,580 --> 00:28:38,940
for example, this matrix--

427
00:28:38,940 --> 00:28:41,730
then that matrix happens
to be of low rank--

428
00:28:44,940 --> 00:28:50,250
mathematically of low rank,
with epsilon equals zero.

429
00:28:50,250 --> 00:28:50,790
Why is that?

430
00:28:50,790 --> 00:28:54,480
Well if I write down X
in terms of matrices,

431
00:28:54,480 --> 00:28:56,220
you could easily see it.

432
00:28:56,220 --> 00:29:00,120
So this is made up of
a matrix of all ones

433
00:29:00,120 --> 00:29:11,160
plus a matrix of j-- so that's
1, 2, up to n, 1, 2, up to n,

434
00:29:11,160 --> 00:29:12,750
because every entry
of that matrix

435
00:29:12,750 --> 00:29:15,500
just depends on the row index.

436
00:29:15,500 --> 00:29:18,730
And then this guy
depends on both j and k.

437
00:29:18,730 --> 00:29:21,330
So this is a multiplication
table, right?

438
00:29:21,330 --> 00:29:31,635
So this is 1, 2, up to n, then 2, 4, up
to 2n, down to n, 2n, up to n squared.

439
00:29:31,635 --> 00:29:34,050
OK.

440
00:29:34,050 --> 00:29:38,130
Clearly, the matrix of all
ones is a rank one matrix.

441
00:29:42,260 --> 00:29:43,560
The same with this guy.

442
00:29:43,560 --> 00:29:47,220
The column space is
just of dimension one.

443
00:29:47,220 --> 00:29:51,960
And the last guy also
happens to be of rank one

444
00:29:51,960 --> 00:29:58,365
because I can write this
matrix in rank one form, which

445
00:29:58,365 --> 00:30:03,250
is a column vector
times a row vector.

446
00:30:03,250 --> 00:30:04,090
OK.

447
00:30:04,090 --> 00:30:07,280
So this matrix X
is of rank three.

448
00:30:13,710 --> 00:30:16,620
I guess at most rank three
is what I've actually shown.
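To make the rank count concrete, here is a small Python sketch. It assumes the polynomial on the board is f(x, y) = 1 + x + xy, which is what the three pieces just described add up to (all ones, a matrix that depends only on the row index j, and the multiplication table jk). The helper names `exact_rank` and `sample` are made up for this illustration; the rank is computed exactly with rational arithmetic, so no epsilon is involved.

```python
from fractions import Fraction


def exact_rank(rows):
    """Rank of a matrix (list of lists) by Gaussian elimination over the rationals."""
    mat = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(mat[0])):
        # Look for a nonzero pivot in this column among the rows not yet used.
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col] != 0), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        # Eliminate the entries below the pivot.
        for r in range(rank + 1, len(mat)):
            factor = mat[r][col] / mat[rank][col]
            mat[r] = [a - factor * b for a, b in zip(mat[r], mat[rank])]
        rank += 1
    return rank


def sample(f, n):
    """X[j][k] = f(j, k) for integer samples j, k = 1..n."""
    return [[f(j, k) for k in range(1, n + 1)] for j in range(1, n + 1)]


# Assumed board polynomial: f(x, y) = 1 + x + x*y.
X = sample(lambda x, y: 1 + x + x * y, 50)
print(exact_rank(X))  # 2
```

It prints 2, within the upper bound of three: the row-index piece and the multiplication table share the column vector j, so 1 + j + jk = 1·1 + j·(1 + k) is a sum of only two rank-one terms.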

449
00:30:16,620 --> 00:30:17,120
OK.

450
00:30:20,000 --> 00:30:23,480
Now of course this hasn't got
to numerical low rank yet,

451
00:30:23,480 --> 00:30:24,820
so let's get ourselves there.

452
00:30:28,690 --> 00:30:32,160
So Reade knew this, and
he said to himself, OK,

453
00:30:32,160 --> 00:30:35,590
well if I can approximate--

454
00:30:35,590 --> 00:30:38,800
if X is actually coming
from sampling a function,

455
00:30:38,800 --> 00:30:41,890
and I approximate that
function by a polynomial,

456
00:30:41,890 --> 00:30:45,670
then I'm going to get myself
a low-rank approximation

457
00:30:45,670 --> 00:30:48,920
and get a bound on
the numerical rank.

458
00:30:48,920 --> 00:30:56,620
So in general, if I give you
a polynomial of two variables,

459
00:30:56,620 --> 00:30:58,630
which can be written down--

460
00:30:58,630 --> 00:31:04,000
it has degree less than m in both x and y.

461
00:31:04,000 --> 00:31:07,375
Let's just keep these indexes
away from the matrix index.

462
00:31:10,360 --> 00:31:14,170
I give you such a
polynomial, and I go away

463
00:31:14,170 --> 00:31:22,150
and I sample it and
make a matrix X, then X,

464
00:31:22,150 --> 00:31:24,220
by looking at each term
individually like I

465
00:31:24,220 --> 00:31:30,520
did there, will have
low rank mathematically,

466
00:31:30,520 --> 00:31:31,830
with epsilon equals zero.

467
00:31:31,830 --> 00:31:35,590
This will have, at
most, m squared rank,

468
00:31:35,590 --> 00:31:39,160
and if m is 3 or 4
or 10, it possibly

469
00:31:39,160 --> 00:31:43,570
could be low because this
X could be a large matrix.
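The same exact-rank check works for a general polynomial. A sketch, assuming degree less than m in each variable (so m² terms, one rank-one matrix per term) and arbitrary integer coefficients chosen only for illustration; `sampled_polynomial` is a hypothetical helper.

```python
from fractions import Fraction


def exact_rank(rows):
    """Rank of a matrix (list of lists) by Gaussian elimination over the rationals."""
    mat = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(mat[0])):
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col] != 0), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        for r in range(rank + 1, len(mat)):
            factor = mat[r][col] / mat[rank][col]
            mat[r] = [a - factor * b for a, b in zip(mat[r], mat[rank])]
        rank += 1
    return rank


def sampled_polynomial(coeffs, n):
    """X[j][k] = sum over s, t of coeffs[s][t] * j**s * k**t, for j, k = 1..n."""
    return [[sum(c * j ** s * k ** t
                 for s, row in enumerate(coeffs) for t, c in enumerate(row))
             for k in range(1, n + 1)] for j in range(1, n + 1)]


ranks = {}
for m in range(1, 6):
    # Hypothetical integer coefficients; degree < m in each variable, so m*m terms.
    coeffs = [[(s + 2 * t) % 5 - 2 for t in range(m)] for s in range(m)]
    ranks[m] = exact_rank(sampled_polynomial(coeffs, 40))
print(ranks)
```

Each printed rank is at most m, even smaller than the term count m², because grouping the terms by powers of x writes the sample matrix as a sum of m rank-one pieces.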

470
00:31:43,570 --> 00:31:44,320
OK.

471
00:31:44,320 --> 00:31:47,020
So what Reade did for the
Hilbert matrix was to say,

472
00:31:47,020 --> 00:31:49,270
OK, well look at that guy.

473
00:31:49,270 --> 00:31:52,193
That guy looks like it's
sampling a function.

474
00:31:52,193 --> 00:31:53,860
It looks like it's
sampling the function

475
00:31:53,860 --> 00:31:57,170
1 over x plus y minus 1.

476
00:31:57,170 --> 00:32:02,650
So he said to
himself, well, that X,

477
00:32:02,650 --> 00:32:07,540
if I look at the Hilbert
matrix, then that

478
00:32:07,540 --> 00:32:09,600
is sampling a function.

479
00:32:09,600 --> 00:32:13,270
It happens to not
be a polynomial.

480
00:32:13,270 --> 00:32:16,480
It happens to be this function.

481
00:32:16,480 --> 00:32:20,950
But that's OK because sampling
polynomials at integers

482
00:32:20,950 --> 00:32:22,970
gives me low rank exactly.

483
00:32:22,970 --> 00:32:27,670
Maybe smooth functions,
functions like this,

484
00:32:27,670 --> 00:32:29,770
can be well approximated
by polynomials,

485
00:32:29,770 --> 00:32:32,680
and therefore their samples have
low numerical rank.

486
00:32:32,680 --> 00:32:34,900
And that's what he
did in this case.

487
00:32:34,900 --> 00:32:42,790
So he tried to find a p, a
polynomial approximation to f.

488
00:32:42,790 --> 00:32:45,610
In particular, he looked
at exactly this kind

489
00:32:45,610 --> 00:32:46,994
of approximation.

490
00:32:50,870 --> 00:32:54,320
So he has some numbers
here so that things

491
00:32:54,320 --> 00:32:55,805
get dissolved later.

492
00:32:55,805 --> 00:33:01,220
And he tried to find a p that
did this kind of approximation.

493
00:33:01,220 --> 00:33:03,200
So this approximates f.

494
00:33:08,560 --> 00:33:14,080
And then he would develop a
low-rank approximation to X

495
00:33:14,080 --> 00:33:16,780
by sampling p.

496
00:33:16,780 --> 00:33:26,260
So he would say, OK, well if I
let Y be a sampling of p, then

497
00:33:26,260 --> 00:33:29,620
from the fact that p is a
good approximation to f,

498
00:33:29,620 --> 00:33:34,590
Y is a good approximation to
X. And Y has finite rank.

499
00:33:38,198 --> 00:33:43,100
He wrote down that
this must hold.

500
00:33:46,160 --> 00:33:49,500
And the epsilon comes out
here because these factors

501
00:33:49,500 --> 00:33:51,390
were chosen just right.

502
00:33:51,390 --> 00:33:54,910
The divide by n was chosen so
that the epsilon came out just

503
00:33:54,910 --> 00:33:55,960
there.

504
00:33:55,960 --> 00:33:56,460
OK?

505
00:33:56,460 --> 00:33:59,340
So, for many years, that was
kind of the canonical reason

506
00:33:59,340 --> 00:34:01,600
that people would
give, that, well,

507
00:34:01,600 --> 00:34:05,920
if the matrix X is sampled
from a smooth function,

508
00:34:05,920 --> 00:34:10,440
then we can approximate our
function by a polynomial

509
00:34:10,440 --> 00:34:15,060
and get polynomial
rank approximations.

510
00:34:15,060 --> 00:34:18,929
And therefore, the matrix X
will be of low numerical rank.

511
00:34:22,310 --> 00:34:26,630
There's an issue
with this reasoning,

512
00:34:26,630 --> 00:34:28,489
especially for the
Hilbert matrix,

513
00:34:28,489 --> 00:34:31,710
that it doesn't
actually work that well.

514
00:34:31,710 --> 00:34:38,710
So for example, if I take the
1,000 by 1,000 Hilbert matrix

515
00:34:38,710 --> 00:34:41,072
and I look at its rank--

516
00:34:41,072 --> 00:34:45,040
OK, well I've already told
you this is full rank.

517
00:34:45,040 --> 00:34:46,650
You'll get 1,000.

518
00:34:46,650 --> 00:34:50,929
All the singular
values are positive.

519
00:34:50,929 --> 00:34:55,980
If I look at the numerical
rank of this 1,000

520
00:34:55,980 --> 00:35:00,480
by 1,000 Hilbert matrix and I
compute it, I compute the SVD

521
00:35:00,480 --> 00:35:06,600
and I look at how many are above
epsilon where epsilon is 10

522
00:35:06,600 --> 00:35:10,620
to the minus 15, I get 28,
so that means I can

523
00:35:10,620 --> 00:35:13,650
approximate the 1,000
by 1,000 Hilbert matrix

524
00:35:13,650 --> 00:35:18,750
by a rank 28 matrix
and still keep

525
00:35:18,750 --> 00:35:24,450
about 15 correct
digits. That is a huge compression.

526
00:35:24,450 --> 00:35:27,030
So this is what we
get in practice,

527
00:35:27,030 --> 00:35:42,670
but Reade's argument here shows
that the rank of this matrix,

528
00:35:42,670 --> 00:35:45,980
the numerical rank, is at most 719.

529
00:35:49,220 --> 00:35:53,210
So it doesn't do a very good
job on the Hilbert matrix

530
00:35:53,210 --> 00:35:57,220
for bounding the rank, right?

531
00:35:57,220 --> 00:36:00,100
So Reade comes along,
takes this function.

532
00:36:00,100 --> 00:36:02,520
He tries to find a polynomial
that does this, where

533
00:36:02,520 --> 00:36:04,480
epsilon is 10 to the minus 15.

534
00:36:04,480 --> 00:36:07,240
He finds that the
number of terms

535
00:36:07,240 --> 00:36:13,570
that he needs in this
expression here is around 719,

536
00:36:13,570 --> 00:36:16,580
and therefore, that's
the rank that he gets.

537
00:36:16,580 --> 00:36:19,300
The bound on the numerical rank.

538
00:36:19,300 --> 00:36:25,120
The trouble is that 719
tells us that this is not

539
00:36:25,120 --> 00:36:27,820
of low numerical
rank, but we know

540
00:36:27,820 --> 00:36:32,450
it is, so it's an
unsatisfactory reason.

541
00:36:32,450 --> 00:36:36,690
So there have been
several people trying

542
00:36:36,690 --> 00:36:39,900
to come up with more
appropriate reasons that

543
00:36:39,900 --> 00:36:44,190
explain the 28 here.

544
00:36:44,190 --> 00:36:50,220
And so one reason that
I've started to use

545
00:36:50,220 --> 00:36:52,710
is another slightly
different way

546
00:36:52,710 --> 00:36:57,990
of looking at things, which is
to say the world is Sylvester.

547
00:37:03,690 --> 00:37:10,420
Now Sylvester, what
does that mean?

548
00:37:10,420 --> 00:37:13,010
What does the word
"Sylvester" mean in this case?

549
00:37:13,010 --> 00:37:14,950
It means that the
matrices satisfy

550
00:37:14,950 --> 00:37:20,080
a certain type of equation
called the Sylvester equation,

551
00:37:20,080 --> 00:37:25,420
and so the reason is really,
many of these matrices

552
00:37:25,420 --> 00:37:30,220
satisfy a Sylvester equation,
and that takes the form--

553
00:37:36,270 --> 00:37:44,190
AX minus XB equals C, for some A, B, and C.

554
00:37:44,190 --> 00:37:44,690
OK.

555
00:37:44,690 --> 00:37:46,580
So X is your matrix of interest.

556
00:37:46,580 --> 00:37:50,690
You want to show X is
of numerical low rank.

557
00:37:50,690 --> 00:37:54,770
And the task at hand is
to find an A, B, and C so

558
00:37:54,770 --> 00:37:58,880
that X satisfies that equation.

559
00:37:58,880 --> 00:38:00,030
OK.

560
00:38:00,030 --> 00:38:05,450
For example, the two matrices
I've had on the board

561
00:38:05,450 --> 00:38:09,360
satisfy a Sylvester equation--

562
00:38:09,360 --> 00:38:10,780
a Sylvester matrix equation.

563
00:38:10,780 --> 00:38:14,490
There is an A, a B, and a
C for which they do this.

564
00:38:14,490 --> 00:38:17,677
For example, remember
the Hilbert matrix,

565
00:38:17,677 --> 00:38:20,010
which we have there still,
but I'll write it down again.

566
00:38:24,710 --> 00:38:26,770
It has entries 1 over j plus k minus 1.

567
00:38:26,770 --> 00:38:28,770
So all we need to do
is to try to figure out

568
00:38:28,770 --> 00:38:32,760
an A, a B, and then a
C so that we can make

569
00:38:32,760 --> 00:38:34,230
it fit a Sylvester equation.

570
00:38:34,230 --> 00:38:36,820
There's many different
ways of doing this.

571
00:38:36,820 --> 00:38:41,050
The one that I like
is the following,

572
00:38:41,050 --> 00:38:45,660
where if I put 1/2
here and 3/2 here,

573
00:38:45,660 --> 00:38:51,245
all the way down to n minus
1/2, times this matrix--

574
00:38:53,850 --> 00:38:59,080
so this is timesing the
top row of this matrix by 1/2

575
00:38:59,080 --> 00:39:02,050
and the next rows by 3/2 and then 5/2.

576
00:39:02,050 --> 00:39:05,620
So we're basically timesing
each entry of this matrix

577
00:39:05,620 --> 00:39:07,435
by j minus 1/2.

578
00:39:10,510 --> 00:39:12,040
And then I do
something on the right

579
00:39:12,040 --> 00:39:14,582
here, which I'm allowed to do
because I've got the B freedom,

580
00:39:14,582 --> 00:39:19,280
and I choose this to be the
same up to a minus sign.

581
00:39:23,680 --> 00:39:26,830
Then when you think about
this, what is it doing?

582
00:39:26,830 --> 00:39:30,460
It's timesing the jk entry--

583
00:39:30,460 --> 00:39:33,370
this is-- by j minus 1/2.

584
00:39:33,370 --> 00:39:34,840
That's what this is doing.

585
00:39:34,840 --> 00:39:37,570
And what this is doing
is timesing the jk entry

586
00:39:37,570 --> 00:39:40,420
by k minus 1/2.

587
00:39:40,420 --> 00:39:44,590
So this is, in total,
timesing the jk entry

588
00:39:44,590 --> 00:39:49,700
by j plus k minus 1/2 minus
1/2, which is minus 1,

589
00:39:49,700 --> 00:39:54,810
so this is timesing the jk
entry by j plus k minus 1.

590
00:39:54,810 --> 00:39:57,690
So it knocks out
the denominator.

591
00:39:57,690 --> 00:40:02,260
And what we get from this
equation is a bunch of ones.

592
00:40:11,050 --> 00:40:14,080
So in this case, A
and B are diagonal,

593
00:40:14,080 --> 00:40:17,250
and C is the matrix of all ones.
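The Hilbert-matrix choice can be checked entry by entry. A minimal sketch, taking the Sylvester equation in the form AX - XB = C with A = diag(1/2, 3/2, ..., n - 1/2) and B = -A as described; exact fractions avoid any rounding, so the test for an all-ones C is exact. The function names are made up for this illustration.

```python
from fractions import Fraction


def hilbert(n):
    """H[j][k] = 1 / (j + k - 1) with 1-based indices j, k."""
    return [[Fraction(1, j + k - 1) for k in range(1, n + 1)] for j in range(1, n + 1)]


def sylvester_residual(n):
    """Return C = A H - H B with A = diag(j - 1/2) and B = -A.

    Because A and B are diagonal, (A H)[j][k] = a_j * H[j][k]
    and (H B)[j][k] = H[j][k] * b_k, so no full matrix product is needed.
    """
    H = hilbert(n)
    a = [Fraction(2 * j - 1, 2) for j in range(1, n + 1)]  # 1/2, 3/2, ..., n - 1/2
    b = [-x for x in a]                                    # -1/2, ..., -(n - 1/2)
    return [[a[j] * H[j][k] - H[j][k] * b[k] for k in range(n)] for j in range(n)]


C = sylvester_residual(6)
print(all(entry == 1 for row in C for entry in row))  # True: C is the all-ones matrix
```

Each entry works out to (j - 1/2 + k - 1/2) / (j + k - 1) = 1, which is the cancellation of the denominator described above.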

594
00:40:17,250 --> 00:40:17,920
OK?

595
00:40:17,920 --> 00:40:20,590
We can also do this
for Vandermonde.

596
00:40:20,590 --> 00:40:25,960
So Vandermonde, you'll
remember, looks like this.

597
00:40:30,970 --> 00:40:35,610
And then over here,
we have this guy,

598
00:40:35,610 --> 00:40:40,910
the matrix that appears with
polynomial interpolation.

599
00:40:40,910 --> 00:40:41,410
OK.

600
00:40:41,410 --> 00:40:44,530
So if I think about
this, I could also

601
00:40:44,530 --> 00:40:50,200
come up with an A, B, and
C, and for example, here's

602
00:40:50,200 --> 00:40:52,180
one that works.

603
00:40:52,180 --> 00:40:55,360
I can stick the x's
on the diagonal.

604
00:40:59,740 --> 00:41:03,850
So if you imagine what that
matrix on the left is doing,

605
00:41:03,850 --> 00:41:08,090
it's timesing each
column by the vector x.

606
00:41:08,090 --> 00:41:08,590
OK?

607
00:41:08,590 --> 00:41:12,940
So the first column of this
matrix becomes x, the vector x.

608
00:41:12,940 --> 00:41:16,630
The second becomes
the vector x squared,

609
00:41:16,630 --> 00:41:18,670
where squared is
done entry-wise.

610
00:41:18,670 --> 00:41:21,140
And then the third
column is now x cubed,

611
00:41:21,140 --> 00:41:24,420
and when we get to the
last, it's x to the n.

612
00:41:24,420 --> 00:41:24,920
OK?

613
00:41:24,920 --> 00:41:30,080
So that's like, multiply
each column by the vector x.

614
00:41:30,080 --> 00:41:32,480
So if I want to try to
come up with a matrix--

615
00:41:32,480 --> 00:41:36,680
so that what's left is of low
rank, like this form.

616
00:41:36,680 --> 00:41:40,520
What I can do is
shift the columns.

617
00:41:40,520 --> 00:41:43,580
So I've noticed that
this product here,

618
00:41:43,580 --> 00:41:46,688
this diagonal matrix, has
made the first column x.

619
00:41:46,688 --> 00:41:48,230
So if I want to kill
off that column,

620
00:41:48,230 --> 00:41:52,187
I can take the second column and
permute it to the first column.

621
00:41:52,187 --> 00:41:54,020
I could take the third
column and permute it

622
00:41:54,020 --> 00:41:56,810
to the second, the last
column and permute it

623
00:41:56,810 --> 00:41:58,910
to the penultimate column here.

624
00:41:58,910 --> 00:42:00,590
And that will actually
kill off a lot

625
00:42:00,590 --> 00:42:03,710
of what I've created in
this matrix right here.

626
00:42:03,710 --> 00:42:05,300
So let me write that down.

627
00:42:05,300 --> 00:42:08,510
This is a circshift matrix.

628
00:42:08,510 --> 00:42:10,280
This does that permutation.

629
00:42:17,350 --> 00:42:18,393
I've put a minus 1 there.

630
00:42:18,393 --> 00:42:19,810
I could have put
any number there.

631
00:42:19,810 --> 00:42:22,440
It doesn't make any difference.

632
00:42:22,440 --> 00:42:25,120
But this is the one that
works out extremely nicely.

633
00:42:25,120 --> 00:42:29,170
Now this zeros out lots of
things because of the way

634
00:42:29,170 --> 00:42:32,005
I've done the multiplication
by x and the circshift

635
00:42:32,005 --> 00:42:33,790
of the columns.

636
00:42:33,790 --> 00:42:39,010
And so the first column is zero
because this first column is x,

637
00:42:39,010 --> 00:42:43,240
this first column is x,
so I've got x minus x.

638
00:42:43,240 --> 00:42:47,170
This column was x squared
minus x squared, so I got zero,

639
00:42:47,170 --> 00:42:51,430
and I just keep going along
until that last column.

640
00:42:51,430 --> 00:42:54,040
That last column is a problem
because the last column

641
00:42:54,040 --> 00:42:57,280
of this guy is x
to the n, whereas I

642
00:42:57,280 --> 00:43:02,240
don't have x to the n in V, so
there are some numbers here.
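The Vandermonde construction can be verified the same way. A sketch, assuming V has columns 1, x, ..., x^(n-1), A = diag(x), and B is the shift matrix that moves each column of V one place to the left with -1 wrapping into the last column; the sample nodes in `xs` are arbitrary and hypothetical, as are the helper names.

```python
from fractions import Fraction


def matmul(P, R):
    """Plain matrix product of two lists of lists."""
    return [[sum(P[i][t] * R[t][j] for t in range(len(R))) for j in range(len(R[0]))]
            for i in range(len(P))]


def vandermonde(xs):
    """V[j][k] = xs[j] ** k for k = 0..n-1, i.e. columns 1, x, x^2, ..., x^(n-1)."""
    n = len(xs)
    return [[Fraction(x) ** k for k in range(n)] for x in xs]


def shift_matrix(n, corner=-1):
    """Ones on the subdiagonal and `corner` in the top-right entry.

    V times Q moves column k+1 of V into column k and puts corner * (column 0) last.
    """
    Q = [[Fraction(0)] * n for _ in range(n)]
    for k in range(n - 1):
        Q[k + 1][k] = Fraction(1)
    Q[0][n - 1] = Fraction(corner)
    return Q


xs = [Fraction(1, 3), Fraction(1, 2), 2, 3, 5]  # hypothetical distinct nodes
n = len(xs)
V = vandermonde(xs)
A = [[Fraction(xs[i]) if i == j else Fraction(0) for j in range(n)] for i in range(n)]
AV, VB = matmul(A, V), matmul(V, shift_matrix(n))
C = [[AV[j][k] - VB[j][k] for k in range(n)] for j in range(n)]

# All columns except the last cancel, so C has rank at most one.
print(all(C[j][k] == 0 for j in range(n) for k in range(n - 1)))  # True
print(all(C[j][n - 1] == Fraction(xs[j]) ** n + 1 for j in range(n)))  # True
```

With the -1 in the corner, the last column of C is x^n + 1. Any other corner value c gives x^n - c instead; either way C is confined to a single column, which is why that choice does not change the rank bound.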

643
00:43:02,240 --> 00:43:02,740
OK.

644
00:43:05,940 --> 00:43:08,550
You'll notice that
C in both cases

645
00:43:08,550 --> 00:43:11,070
happens to be a low-rank matrix.

646
00:43:11,070 --> 00:43:14,661
In these cases, it
happens to be of rank one.

647
00:43:14,661 --> 00:43:19,290
And so people were
wondering, maybe it's

648
00:43:19,290 --> 00:43:23,040
something to do with satisfying
these kinds of equations that

649
00:43:23,040 --> 00:43:27,180
makes these matrices
that appear in practice

650
00:43:27,180 --> 00:43:29,850
numerically of low rank.

651
00:43:29,850 --> 00:43:33,630
And after a lot of
work in this area,

652
00:43:33,630 --> 00:43:37,740
people have come up
with a bound that

653
00:43:37,740 --> 00:43:42,360
demonstrates that
these kinds of equations

654
00:43:42,360 --> 00:43:46,470
are key to understanding
numerical low rank.

655
00:43:46,470 --> 00:44:04,750
So if X satisfies a Sylvester
equation, like this, and A

656
00:44:04,750 --> 00:44:06,993
is normal, B is normal--

657
00:44:06,993 --> 00:44:08,410
I don't really
want to concentrate

658
00:44:08,410 --> 00:44:12,590
on those two conditions.

659
00:44:12,590 --> 00:44:17,230
It's a little bit academic.

660
00:44:17,230 --> 00:44:21,430
Then-- people have found
a bound on the singular

661
00:44:21,430 --> 00:44:24,010
values of any matrix
that satisfies

662
00:44:24,010 --> 00:44:27,970
this kind of
expression, and they

663
00:44:27,970 --> 00:44:30,460
found the following bound.
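The bound itself was written on the board and is not in the transcript. As it is usually stated (a reconstruction, so treat the exact indexing as an assumption), with r the rank of C, A and B normal, E containing the eigenvalues of A, F containing the eigenvalues of B, and R_{k,k} the rational functions of type (k, k):

```latex
\sigma_{1+rk}(X) \le Z_k(E,F)\,\|X\|_2,
\qquad
Z_k(E,F) = \inf_{s \in \mathcal{R}_{k,k}}
  \frac{\sup_{z \in E} |s(z)|}{\inf_{z \in F} |s(z)|}.
```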

664
00:44:41,829 --> 00:44:49,240
OK, so here, the rank of C is r.

665
00:44:49,240 --> 00:44:50,180
So that goes there.

666
00:44:50,180 --> 00:44:52,360
So in our cases, the
two examples we have,

667
00:44:52,360 --> 00:44:55,990
r is 1, so we can
forget about r.

668
00:44:55,990 --> 00:45:02,930
This nasty guy here is
called the Zolotarev number.

669
00:45:07,010 --> 00:45:12,814
E is a set that contains
the eigenvalues of A,

670
00:45:12,814 --> 00:45:23,270
and F is a set that contains
the eigenvalues of B. OK.

671
00:45:23,270 --> 00:45:26,480
Now it looks like we have
gained absolutely nothing

672
00:45:26,480 --> 00:45:30,260
by this bound, because I've
just told you singular values

673
00:45:30,260 --> 00:45:32,540
are bounded by Zolotarev numbers.

674
00:45:32,540 --> 00:45:35,450
That doesn't mean
anything to anyone.

675
00:45:35,450 --> 00:45:38,960
It means a little bit
to me but not that much.

676
00:45:38,960 --> 00:45:42,020
So the key to this bound--

677
00:45:42,020 --> 00:45:43,970
the reason this is useful--

678
00:45:43,970 --> 00:45:49,220
is that so many people have
worked out what these Zolotarev

679
00:45:49,220 --> 00:45:52,190
numbers actually mean.

680
00:45:52,190 --> 00:45:52,700
OK?

681
00:45:52,700 --> 00:45:57,620
So these are two key
people that worked out

682
00:45:57,620 --> 00:45:59,360
what this bound means.

683
00:45:59,360 --> 00:46:02,600
And we have gained
a lot because people

684
00:46:02,600 --> 00:46:04,880
have been studying this number.

685
00:46:04,880 --> 00:46:06,740
This is, like, a
number that people

686
00:46:06,740 --> 00:46:11,600
cared about from 1870
onwards to the present day,

687
00:46:11,600 --> 00:46:14,870
and people have studied
this number extremely well.

688
00:46:14,870 --> 00:46:17,510
So we've gained
something by turning it

689
00:46:17,510 --> 00:46:21,290
into a more abstract problem
that people have thought

690
00:46:21,290 --> 00:46:23,990
about previously,
and now we can go

691
00:46:23,990 --> 00:46:26,450
to the literature on
Zolotarev numbers,

692
00:46:26,450 --> 00:46:30,980
whatever they are, and discover
this whole literature of work

693
00:46:30,980 --> 00:46:32,900
on this Zolotarev number.

694
00:46:32,900 --> 00:46:34,433
And the key part--

695
00:46:34,433 --> 00:46:35,600
I'll just tell you the key--

696
00:46:41,240 --> 00:46:44,690
is that the sets E
and F are separated.

697
00:46:53,960 --> 00:46:57,905
So for example, in the Hilbert
matrix, the eigenvalues of A

698
00:46:57,905 --> 00:46:59,135
can be read off the diagonal.

699
00:47:05,540 --> 00:47:06,155
What are they?

700
00:47:06,155 --> 00:47:13,550
They are between
1/2 and n minus 1/2.

701
00:47:13,550 --> 00:47:20,540
And the eigenvalues of B
lie in the interval from minus

702
00:47:20,540 --> 00:47:23,524
n plus 1/2 to minus 1/2.

703
00:47:23,524 --> 00:47:27,610
And the key reason
why the Hilbert matrix

704
00:47:27,610 --> 00:47:30,610
is of low numerical
rank is the fact

705
00:47:30,610 --> 00:47:33,470
that these two
sets are separated,

706
00:47:33,470 --> 00:47:36,580
and that makes this Zolotarev
number get small extremely

707
00:47:36,580 --> 00:47:38,950
quickly with k.

708
00:47:38,950 --> 00:47:41,080
Now you might wonder
why there is a question

709
00:47:41,080 --> 00:47:44,980
mark on Penzl's name.

710
00:47:44,980 --> 00:47:49,690
There is an unofficial
curse that's

711
00:47:49,690 --> 00:47:51,430
been going on for a while.

712
00:47:51,430 --> 00:47:54,730
Both these men died while
working on the Zolotarev

713
00:47:54,730 --> 00:47:55,870
problem.

714
00:47:55,870 --> 00:48:00,270
They both died at the age of 31.

715
00:48:00,270 --> 00:48:03,990
One died by being hit
by a train, Zolotarev.

716
00:48:03,990 --> 00:48:08,660
It's unclear whether he was
suicidal or it was accidental.

717
00:48:08,660 --> 00:48:13,440
Penzl died at the age of 31
in the Canadian mountains

718
00:48:13,440 --> 00:48:16,030
in an avalanche.

719
00:48:16,030 --> 00:48:21,130
I am currently not yet 31
but going to be 31 very soon,

720
00:48:21,130 --> 00:48:23,830
and I'm scared that
I may join this list.

721
00:48:27,860 --> 00:48:29,020
OK.

722
00:48:29,020 --> 00:48:32,770
But for the Hilbert matrix,
what you get from this analysis,

723
00:48:32,770 --> 00:48:36,520
based on these
two people's work,

724
00:48:36,520 --> 00:48:39,700
is a bound on the
numerical rank.

725
00:48:39,700 --> 00:48:43,150
And the rank that you
get is, let's say,

726
00:48:43,150 --> 00:48:45,820
a world record bound.

727
00:48:45,820 --> 00:48:56,260
For the Hilbert matrix, it is 34,
which is not quite 28, not yet,

728
00:48:56,260 --> 00:49:02,860
but it's far more
descriptive of 28 than 719.
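To see how the separation of E and F turns into a number, here is a rough sketch. It assumes the explicit estimate Z_k([-b, -a], [a, b]) <= 4 exp(-pi^2 k / log(4b/a)) for two real intervals with 0 < a < b. That estimate is taken as an assumption here; it is the kind of closed form the Zolotarev literature provides. The sketch plugs in the Hilbert intervals [1/2, n - 1/2] and [-(n - 1/2), -1/2] with r = 1 and returns the smallest k for which the bound drops below epsilon, an upper bound on the numerical rank. The function names are hypothetical.

```python
import math


def zolotarev_interval_bound(k, a, b):
    """Assumed upper bound on Z_k([-b, -a], [a, b]) for 0 < a < b."""
    return 4.0 * math.exp(-math.pi ** 2 * k / math.log(4.0 * b / a))


def hilbert_rank_bound(n, eps):
    """Smallest k with sigma_{1+k}(H_n) <= eps * sigma_1(H_n) guaranteed by the bound (r = 1)."""
    a, b = 0.5, n - 0.5  # eigenvalues of A are 1/2, ..., n - 1/2, and B = -A
    k = 0
    while zolotarev_interval_bound(k, a, b) > eps:
        k += 1
    return min(k, n)


print(hilbert_rank_bound(1000, 1e-15))  # 33
```

Under these assumptions the estimate is 33 for n = 1,000 and epsilon = 10^-15. That is in the same range as the 34 quoted in the talk (the exact integer depends on the constants and conventions used) and far closer to the observed 28 than 719 is. Because the count grows only like log n, the bound stays small as the Hilbert matrix gets larger.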

729
00:49:02,860 --> 00:49:07,420
And so this technique of
bounding singular values

730
00:49:07,420 --> 00:49:11,530
by using these Zolotarev numbers
is starting to gain popularity

731
00:49:11,530 --> 00:49:15,940
because we can finally answer
to ourselves why there are so

732
00:49:15,940 --> 00:49:20,420
many low-rank matrices that
appear in computational math.

733
00:49:20,420 --> 00:49:27,440
And it's all based on two
31-year-olds that died.

734
00:49:27,440 --> 00:49:30,710
And so if you ever
wonder when you're

735
00:49:30,710 --> 00:49:33,830
doing computational science
why low rank appears

736
00:49:33,830 --> 00:49:36,710
and the smoothness argument
does not work for you,

737
00:49:36,710 --> 00:49:40,990
you might like to think about
Zolotarev and the curse.

738
00:49:40,990 --> 00:49:42,721
OK, thank you very much.

739
00:49:42,721 --> 00:49:44,605
[APPLAUSE]

740
00:49:47,431 --> 00:49:49,790
GILBERT STRANG: Thank you
[INAUDIBLE] Excellent.

741
00:49:49,790 --> 00:49:51,880
ALEX TOWNSEND: How
does it work now?

742
00:49:51,880 --> 00:49:53,680
GILBERT STRANG: We're good.

743
00:49:53,680 --> 00:49:54,180
Yeah.

744
00:49:54,180 --> 00:49:55,930
ALEX TOWNSEND: I'm
happy to take questions

745
00:49:55,930 --> 00:49:57,920
if we have a minute, if
you have any questions.

746
00:49:57,920 --> 00:49:59,545
GILBERT STRANG: How
near to 31 are you?

747
00:50:02,085 --> 00:50:03,960
ALEX TOWNSEND: [INAUDIBLE]
I get a spotlight.

748
00:50:03,960 --> 00:50:05,490
I'm 31 in December.

749
00:50:05,490 --> 00:50:06,330
GILBERT STRANG: Wow.

750
00:50:06,330 --> 00:50:07,170
OK.

751
00:50:07,170 --> 00:50:10,920
ALEX TOWNSEND: So they died
at the age of 31, so you know,

752
00:50:10,920 --> 00:50:14,640
next year is the
scary year for me.

753
00:50:14,640 --> 00:50:16,550
So I'm not driving anywhere.

754
00:50:16,550 --> 00:50:20,898
I'm not leaving my
house until I become 32.

755
00:50:20,898 --> 00:50:22,690
GILBERT STRANG: Well,
thank you [INAUDIBLE]

756
00:50:22,690 --> 00:50:24,190
ALEX TOWNSEND: OK, thanks.

757
00:50:24,190 --> 00:50:26,640
[APPLAUSE]