The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

GILBERT STRANG: OK, let me start one minute early. So this being MIT, I just came from a terrific faculty member, Andrew Lo in the Sloan School, and I have to tell you what he told us. And then I had to leave before he could explain why it's true, but this is like an amazing fact which I don't want to forget, so here you go. Everything will be on that board. So it's an observation about us, or other people -- maybe not us.

So suppose you have a biased coin. Maybe the people playing this game don't know, but it's 75% likely to produce heads, 25% likely to produce tails. And then the player has to guess, for one flip after another, heads or tails, and you get $1 if you're right, you pay $1 if you're wrong. So you just want to get as many right choices as possible from this coin flip that continues. So what should you do?
Well, what I hope we would do is, we would not know what the probabilities were, so we would guess maybe heads the first time, tails the second time, heads the third time, and so on. But the actual result would be mostly heads, so we would learn at some point that -- maybe not quite as soon as that. We would eventually learn that we should keep guessing heads, right? And that would be our optimal strategy, to guess heads all the time.

But what do people actually do? They start like this, the same way, and then they're beginning to learn that heads is more common. So maybe they do more heads than tails, but sometimes tails is right, and then after a little while, they maybe see that it's -- yeah. Well, maybe they're not counting, they're just operating like ordinary people. And what do ordinary people actually do in the long run? You would think guess heads every time, right? But they don't. In the long run, people -- and maybe animals and whatever -- guess heads three quarters of the time and tails one quarter of the time. Isn't that unbelievable?
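[The gap between the two strategies is easy to check with expected values. A quick sketch, not part of the lecture, assuming the $1 / -$1 payoff just described:]

```python
# Expected winnings per flip, for a coin with P(heads) = 0.75 and the
# +$1 / -$1 payoff described above.
p = 0.75

# Strategy 1: always guess heads (the optimal strategy).
win_always_heads = p - (1 - p)                 # 0.5 dollars per flip

# Strategy 2: "probability matching" -- guess heads 75% of the time,
# tails 25% of the time, independently of the coin.
p_correct = p * p + (1 - p) * (1 - p)          # 0.625
win_matching = p_correct - (1 - p_correct)     # 0.25 dollars per flip

print(win_always_heads, win_matching)          # 0.5 0.25
```

So the matching strategy gives up half the expected winnings, which is what makes the observed behavior so surprising.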
They're guessing tails a quarter of the time when the odds are never changing. Anyway, that's something that economists and other people have to explain, and if I had been able to stay another hour, I could tell you about the explanation. Oh, I see I've written that on a board that I have no way to bury, so it's going to be there, and it's not the subject of 18.065, but it's kind of amazing. All right, so there's good math problems everywhere. OK. Can I just leave you with what I know, and if I learn more, I'll come back to that question. OK.

Please turn attention this way, right? Norms. A few words on norms -- that should be a word in your language. And so you should know what it means, and you should know a few of the important norms. Again, a norm is a way to measure the size of a vector, or the size of a matrix, or the size of a tensor, whatever we have. Or a function. Very important. We might ask for the norm of a function like sine x. From 0 to pi, what would be the size of that function?
Well, if it was 2 sine x, the size would be twice as much, so the norm should reflect that.

So yesterday -- or Wednesday -- I told you about p equal to 2, 1, and actually infinity, and then I'm going to put in the 0 norm with a question mark, because you'll see that it has a problem. But let me just recall from last time. So p equal to 2 is the usual square root of the sum of squares -- the usual length of a vector. p equal to 1 is this very important norm, so I would call that the l1 norm, and we'll see a lot of that. I mentioned that it plays a very significant part now in compressed sensing. It really was a bombshell in signal processing to discover -- and in other fields, too -- that some things really work best in the l1 norm. The maximum norm has a natural part to play, and we'll see that, or its matrix analog.

So I didn't mention the l0 norm. All this lp business. So the lp norm, for any p, is: you take the absolute values of the components to the pth power, add them up -- up here, p was 2 -- and you take the pth root. So maybe I should write it to the power 1/p.
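[The lp formula just stated is a few lines of code. A sketch, not part of the lecture, which also previews the point made next about large p picking out the biggest component:]

```python
import numpy as np

v = np.array([3.0, -4.0])

def lp_norm(v, p):
    # The lp norm: (sum of |v_i|^p) to the power 1/p.
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

print(lp_norm(v, 1))       # 7.0 -- sum of absolute values
print(lp_norm(v, 2))       # 5.0 -- the usual Euclidean length

# As p grows, whichever component is biggest takes over: the max norm.
print(lp_norm(v, 100))     # very close to 4.0
print(np.max(np.abs(v)))   # 4.0
```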
Then that way, taking pth powers and pth roots, we do get that the norm of 2v has a factor 2 compared to the norm of v. So for p equal to 2, you see it -- we've got it right there. For p equal to 1, you see it here, because it's just the sum of the absolute values. For p equal to infinity, if I move p up and up and up, it will pick out -- as I increase p, whichever component is biggest is going to just take over, and that's why you get the max norm.

Now the zero norm, where I'm using that word improperly, as you'll see. So what is the zero norm? So let me write it down. It's the number of non-zero components. It's the thing that you'd like to know about in questions of sparsity. Is there just one non-zero component? Are there 11? Are there 101? You might want to minimize that, because sparse vectors and sparse matrices are much faster to compute with. You've got good stuff. But now I claim that's not a norm -- the number of non-zero components -- because how does the zero norm of 2v compare with the zero norm of v?
It would be the same. 2v has the same number of non-zeros as v. So it violates the rule for a norm. So with these norms, and all the p's in between -- actually, the math papers are full of "let p be between 1 and infinity," because that's the range where you do have a proper norm, as we will see.

I think the good thing to do with these norms is to have a picture in your mind. The geometry of a norm is good. So the picture I'm going to suggest is: plot all the vectors, let's say in 2D. So two-dimensional space, R2. I want to plot the vectors that have norm of v equal to 1 in these different norms. So let me ask you -- here's 2D space, R2, and now I want to plot all the vectors whose ordinary l2 length equals 1. So what does that picture look like? I just think a picture is really worth something. It's a circle, thanks. It's a circle. This circle has the equation, of course, v1 squared plus v2 squared equal to 1. So I would call that the unit ball for the norm, and here it's a circle.
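[The homogeneity failure just described can be verified directly. A small sketch, not in the lecture:]

```python
import numpy as np

def zero_norm(v):
    # The "zero norm": the number of non-zero components.
    return int(np.count_nonzero(v))

v = np.array([1.0, 0.0, -2.0, 0.0])

# A true norm must scale: norm of 2v equals 2 times norm of v.
# The zero norm does not -- scaling changes nothing:
print(zero_norm(v), zero_norm(2 * v))   # 2 2
```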
OK, now here comes something more interesting. What about the l1 norm, though? So again, tell me how to plot all the points that have |v1| plus |v2| equal to 1. What's the boundary going to look like now? Let's see. Well, I can put down a certain number of points: there up at 1, and there at 1, and there at minus 1, and there at minus 1. That would reflect the vector (1, 0), and this would reflect the vector (0, minus 1). So yeah. OK. So those are four points, easy to plot. Easy to see the l1 norm. But what's the rest of the boundary here? It's a diamond, good. It's a diamond. We have something linear set equal to 1. Up here in the positive quadrant, it's just v1 plus v2 equal to 1, and the graph of that is a straight line. So all these guys -- this is all the points with |v1| plus |v2| equal to 1. And over here and over here and over here. So the unit ball in the l1 norm is a diamond. And that's a very important picture.
It reflects in a very simple way something important about the l1 norm, and the reason it's just exploded in importance. Let me continue, though. What about the max norm -- v max, or v infinity, equal to 1? So again, let me plot these guys, and these four points are certainly going to be in it again, because plus or minus i and plus or minus j are good friends. What's the rest of the boundary look like now? Now this means the max of the |v_i|'s equal to 1. So what are the rest of the points? You see, it does take a little thought, but then you get it and you don't forget it.

OK, so what's up? I'm looking. So suppose the maximum is v1. I think it's going to look like that, out to (1, 0) and up to (0, 1). And up here, the vector would be something like (0.4, 1), so the maximum would be 1. Is that OK? So what you really see, as you change this number p: you start with p equal to 1, where you have a diamond, and it kind of swells out to be a circle at p equal to 2, and then it kind of keeps swelling to be a square at p equal to infinity.
That's an interesting thing. And yeah. Now, what's the problem with the zero norm? This is the number of non-zeros. OK, let me draw it. Where are the points with one non-zero? So I'm plotting the unit ball. Where are the vectors in this picture that have one non-zero? Not zero non-zeros -- so the origin is not included. So what do I have? I'm not allowed the vector (1/3, 2/3), because that has two non-zeros. So where are the points with only one non-zero? Yeah, on the axes, yeah. That tells you. So it can be there and there -- oops, without that guy at the origin. And of course those just keep going out. So it totally violates the rules.

So maybe the point that I should make about these figures -- so, like, what's happening when I go down to zero? And really, that figure should be at the other end, right? Oh no, shoot. This guy's in the middle. This is a badly drawn figure. l2 is kind of the center guy.
l1 is at one end, l infinity is at the other end, and this one has gone off the end at the left there. Yeah, what's happened here: as p goes down towards zero, none of these will be OK. These balls, these sets, will lose weight. So they'll always have these four points in, but they'll be like this, and then like this, and then finally in the unacceptable limit -- but none of those is any good either. This one was for p equal to 1/2, let's say. That's p equal to 1/2, and that's not a good norm.

Yeah. So there's a property of the circle, the diamond, and the square -- a nice math property of those three sets -- that is not possessed by this one. As this thing loses weight, I lose the property. And then of course it's totally lost over there. Do you know what that property would be? It's what? Concave, convex? Convex, I would say. Convex. This is a true norm when the unit ball is convex. Well, maybe for the ball I'm taking all the v's with norm less than or equal to 1. Yeah, so I'm allowing the insides of these shapes. So this is not a convex set.
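[The convexity failure for p below 1 is easy to check numerically. A sketch, mine rather than the lecture's, for p equal to 1/2:]

```python
def p_half(v):
    # For p = 1/2: (sum of |v_i|^(1/2))^2.  Not a true norm.
    return sum(abs(x) ** 0.5 for x in v) ** 2

u, w, mid = (1.0, 0.0), (0.0, 1.0), (0.5, 0.5)

# u and w sit on the unit "ball", but their midpoint lies outside it,
# so the ball is not convex -- and the triangle inequality fails too:
print(p_half(u), p_half(w))   # 1.0 1.0
print(p_half(mid))            # about 2: bigger than 1, so outside the ball
print(p_half((1.0, 1.0)))     # 4.0, exceeding p_half(u) + p_half(w) = 2.0
```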
That set, which I should maybe draw -- so not convex would be this one, like so. And that reflects the fact that the rules for a norm are broken: the triangle inequality is probably broken, and other stuff, yeah. I think that's sort of worth remembering.

And then one more norm that's natural to think about. So S, as in the Piazza question -- S does always represent a symmetric matrix in 18.065. And now my norm is going to be -- I'm going to call it the S norm. So actually, it's a positive definite symmetric matrix. S is a positive definite symmetric matrix. And what do I do? I'll take v transpose S v. OK, what's our word for that? The energy. That's the energy in the vector v. And I'll take the square root, so that I now have the right scaling if I double v, from v to 2v. Then I get a 2 here and a 2 here, and when I take the square root, I get an overall 2, and that's what I want. I want the norm to grow linearly with the 2 or 3 or whatever I multiply by. But what is the shape of this thing?
So what is the shape of -- let me put it on this board. I'm going to get a picture like that. So what is the shape of v transpose S v equal to 1, or less than or equal to 1? This S is symmetric positive definite -- people use those three letters, SPD, to tell us. I'm claiming that we get a bunch of norms. When do we get the l2 norm? What matrix S would give us the l2 norm? The identity, certainly. Now, what's going to happen if I use some different matrix S? This circle is going to change shape. I might have a different norm, depending on S. And a typical case would be S equal to the diagonal matrix with entries 2 and 3, say. That's a positive definite symmetric matrix. And now I would be drawing the graph of 2 v1 squared plus 3 v2 squared -- that would be the energy, right? -- equal to 1. And I just want you to tell me what shape that is. So that's a perfectly good norm. You could check all its properties; they all come out easily. But I get a new picture -- a new norm that's kind of adjustable. You could say it's a weighted norm.
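[The S norm with the 2, 3 example can be checked in a few lines. A sketch, not from the lecture, confirming the scaling and where the unit ball crosses the axes:]

```python
import numpy as np

S = np.array([[2.0, 0.0],
              [0.0, 3.0]])   # positive definite symmetric: the 2, 3 example

def s_norm(v):
    # The energy norm: square root of v^T S v.
    return float(np.sqrt(v @ S @ v))

v = np.array([1.0, 1.0])

# Scaling works: the square root turns the factor 4 back into a 2.
print(s_norm(2 * v), 2 * s_norm(v))

# The unit ball 2 v1^2 + 3 v2^2 = 1 crosses the axes at 1/sqrt(2) and
# 1/sqrt(3): the larger weight 3 means you can't go as far that way.
print(s_norm(np.array([1 / np.sqrt(2), 0.0])))
print(s_norm(np.array([0.0, 1 / np.sqrt(3)])))
```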
Weights mean that you have picked some numbers appropriate to the particular problem. Well, suppose those numbers are 2 and 3. What shape is the unit ball in this S norm? It's an ellipse, right. It's an ellipse. And I guess the larger number, 3, will mean you can't go as far as with the smaller number, 2. I think it would probably be an ellipse like this, and the axis lengths of the ellipse would have something to do with the 2 and the 3.

OK, so now you know really all the vector norms that are sort of naturally used. These come up in a natural way. As we said, the identity matrix brings us back to the 2 norm, so these are all sort of variations on the 2 norm. And these lp norms are variations as p runs from 1 up to 2 and on to infinity, and is not allowed to go below 1. OK, that's norms.

And then maybe you can actually see from this picture -- here is a, like, somewhat hokey idea of why it is that minimizing in this norm -- so what do I mean by that? Here would be a typical problem.
Minimize, subject to Ax equal to b, the l1 norm of x -- sorry, I'm using x now instead of v. So that would be an important problem. Actually, it has a name. People have spent a lot of time thinking of a fast way to solve it. It's almost like least squares. What would make it more like least squares would be to change that 1 to a 2. Yeah.

Can I just sort of sketch, without making a big argument here, the difference between p equal to 1 or 2 here? Yeah, I'll just draw a picture. Now I'll erase this ellipse, but you won't forget. OK. So this is our problem. With l1, it has a famous name: basis pursuit. Well, famous to people who work in optimization. For l2, it has an important name, too. Well, it's sort of like least squares: ridge regression. This is like a beautiful model problem.

Among all solutions to Ax equal to b -- suppose this is just one equation, like c1 x1 plus c2 x2 equals some right side, b. So the constraint says that the vectors x have to be on a line. Suppose that's the graph of that line.
So among all these x's, which one -- oh, I'm realizing what I'm going to say is going to be smart. I mean, it's going to be nice. Not going to be difficult. Let's do the one we know best, l2. So here's a picture of the line. Let me make it a little more tilted, so you -- yeah, like 2, 3. OK. This is the xy plane. Here's x1, here's x2. Here are the points that satisfy my condition. Which point on that line minimizes -- has the smallest l2 norm? Which point on the line has the smallest l2 norm? Yeah, you're drawing the right figure with your hands. The smallest l2 norm -- l2, remember, is just how far out you go. It's circular here, so it doesn't matter what direction; they're all giving the same l2 norm, it's just how far. So we're looking for the closest point on the line, because we don't want to go any further. We want to go a minimum distance with -- I'm doing l2 now. So where is the point at minimum distance? Yeah, just show me again once more, with hands or whatever. It'll be that. I didn't want 45 degree angles there.
I'm going to erase it again and really -- this time, I'm going to get angles that are not 45 degrees. All right, brilliant. Got it. OK, that's my line. OK, and what's the nearest point in the l2 norm? Here's the winner in l2, right? The nearest point. Everybody sees that picture? So that's a basic picture for minimizing something with a constraint, which is the fundamental problem of optimization, of neural nets, of everything, really. Of life. Well, I'm getting philosophical. But the question always is -- and maybe it's true in life, too -- which norm are you using?

OK, now that was the minimum in l2. That's the shortest distance, where distance means what we usually think of it as meaning. But now, let's go for the l1 norm. Which point on the line has the smallest l1 norm? So now I'm going to add the two components. So if this is some point (a, 0), and this is some point (0, b) right there -- so those two points are obviously important. And that point, we could figure out the formula for, because we know what the geometry is.
But I've just put those two points in. So did I get (0, b)? Yeah, that's a zero. So let me just ask you the question: what point on that line has the smallest l1 norm? Which has the smallest l1 norm? Somebody said it. Just say it a little louder, so that you're on tape forever.

AUDIENCE: (0, b).

GILBERT STRANG: (0, b), this point. That's the winner. This is the l1 winner, and this was the l2 winner. And notice what I said earlier -- and I didn't see it coming, but now I realize this is a figure to put in the notes -- the winner has the most zeros. It's the sparsest vector. Well, out of two components, it didn't have much freedom, but it has a zero component. It's on the axes. It's the things on the axes that have the smallest number of non-zero components. So yeah, this is the picture in two dimensions. So I'm in 2D. And you can see that the winner has a zero component, yeah. And that's a fact that extends into higher dimensions too, and that makes the l1 norm special, as I've said. Yeah. Is there more to say about that example?
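[The 2D example can be reproduced numerically. The sketch below is not from the lecture; it picks a hypothetical line 3 x1 + 4 x2 = 1 and searches along it, so the specific numbers are illustrative only:]

```python
import numpy as np

# One equation c1*x1 + c2*x2 = b: the solutions x lie on a line.
c = np.array([3.0, 4.0])   # hypothetical coefficients
b = 1.0

# l2 winner: the closest point on the line to the origin, c * b / ||c||^2.
x_l2 = c * b / (c @ c)                             # (0.12, 0.16), both nonzero

# l1 winner: search along the line x(t) = x_l2 + t*d numerically.
d = np.array([-c[1], c[0]]) / np.linalg.norm(c)    # direction of the line
ts = np.linspace(-1.0, 1.0, 200001)
points = x_l2[None, :] + ts[:, None] * d[None, :]
x_l1 = points[np.argmin(np.abs(points).sum(axis=1))]

print(x_l2)   # both components nonzero
print(x_l1)   # approximately (0, 0.25): on an axis, the sparse winner
```

The l1 winner lands on the x2 axis (the point (0, b) of the lecture's picture), with a strictly smaller l1 norm (0.25) than the l2 winner's (0.28).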
462 00:30:00,130 --> 00:30:07,040 For a simple 2D question, that really makes the point 463 00:30:07,040 --> 00:30:09,200 that the l1 winner is there. 464 00:30:09,200 --> 00:30:10,070 It's not further. 465 00:30:10,070 --> 00:30:12,040 You don't go further up the line, right? 466 00:30:12,040 --> 00:30:17,530 Because that's bad in all ways. 467 00:30:17,530 --> 00:30:20,420 When you go up further, you're adding 468 00:30:20,420 --> 00:30:23,960 some non-zero first component and you're 469 00:30:23,960 --> 00:30:27,000 increasing the non-zero second component, 470 00:30:27,000 --> 00:30:29,840 so that's a bad idea. 471 00:30:29,840 --> 00:30:30,870 That's a bad idea. 472 00:30:30,870 --> 00:30:32,480 This is the winner. 473 00:30:32,480 --> 00:30:38,570 And in a way, here's the picture. 474 00:30:38,570 --> 00:30:40,490 Oh yeah. 475 00:30:40,490 --> 00:30:43,340 I should prepare these lectures, but this one's 476 00:30:43,340 --> 00:30:45,050 coming out all right anyway. 477 00:30:45,050 --> 00:30:49,940 So the picture there is that the growing l1 ball hits 478 00:30:49,940 --> 00:30:52,470 at that point. 479 00:30:52,470 --> 00:30:53,640 And what is it? 480 00:30:53,640 --> 00:30:54,540 Can you see that? 481 00:30:54,540 --> 00:30:57,840 So that star is outside the circle. 482 00:30:57,840 --> 00:31:07,260 This is the l1 winner and that's the blow-up of the l1 ball 483 00:31:07,260 --> 00:31:08,730 until it hits. 484 00:31:08,730 --> 00:31:13,980 That's the point where the l1 ball hits. 485 00:31:13,980 --> 00:31:16,260 Do you see it? 486 00:31:16,260 --> 00:31:20,970 Just give it a little thought, that another geometric way 487 00:31:20,970 --> 00:31:23,640 to see the answer to this problem is, 488 00:31:23,640 --> 00:31:26,080 you start at the origin and you blow up 489 00:31:26,080 --> 00:31:30,450 the norm until you get a point on the line that 490 00:31:30,450 --> 00:31:32,280 satisfies your constraint.
491 00:31:32,280 --> 00:31:36,870 And because you were blowing up the norm, when it hits first, 492 00:31:36,870 --> 00:31:39,350 that's the smallest blow-up possible. 493 00:31:39,350 --> 00:31:42,970 That's the guy that minimizes. 494 00:31:42,970 --> 00:31:44,820 Yeah, so just think about that picture 495 00:31:44,820 --> 00:31:49,760 and I'll draw it better somewhere, too. 496 00:31:49,760 --> 00:31:52,810 Well that's vector norms. 497 00:31:55,520 --> 00:31:59,420 And then I introduce some matrix norms, and let me just 498 00:31:59,420 --> 00:32:00,740 say a word about those. 499 00:32:05,460 --> 00:32:08,410 OK, a word about matrix norms. 500 00:32:08,410 --> 00:32:16,720 So the matrix norms were the-- 501 00:32:16,720 --> 00:32:25,050 so now I have a matrix A and I want to define those same three 502 00:32:25,050 --> 00:32:27,870 norms again for a matrix. 503 00:32:27,870 --> 00:32:34,340 And this was the 2 norm, and what 504 00:32:34,340 --> 00:32:36,660 was the 2 norm of a matrix? 505 00:32:36,660 --> 00:32:45,710 Well it was sigma 1, it turned out to be. 506 00:32:45,710 --> 00:32:47,900 So that doesn't define it. 507 00:32:47,900 --> 00:32:49,220 Or we could define it. 508 00:32:49,220 --> 00:32:51,800 Just say, OK, the largest singular value 509 00:32:51,800 --> 00:32:54,320 is the 2 norm of the matrix. 510 00:32:54,320 --> 00:32:56,350 But actually, it comes from somewhere. 511 00:32:56,350 --> 00:33:04,670 So I want to speak about this one first, the 2 norm. 512 00:33:04,670 --> 00:33:09,070 So it's the 2 norm of a matrix, and one way 513 00:33:09,070 --> 00:33:19,500 to see the 2 norm of a matrix is to connect it 514 00:33:19,500 --> 00:33:21,150 to the 2 norm of vectors. 515 00:33:23,790 --> 00:33:26,220 I'd like to connect the 2 norm of matrices 516 00:33:26,220 --> 00:33:29,400 to the 2 norm of vectors. 517 00:33:29,400 --> 00:33:32,400 And how shall I do that? 
518 00:33:32,400 --> 00:33:39,020 I think I'm going to look at the 2 norm of Ax 519 00:33:39,020 --> 00:33:43,420 over the 2 norm of x. 520 00:33:43,420 --> 00:33:49,370 So in a way, to me, that ratio is like the blow-up factor. 521 00:33:49,370 --> 00:33:51,880 If A was seven times the identity, 522 00:33:51,880 --> 00:33:53,690 to take an easy case-- 523 00:33:53,690 --> 00:33:55,880 if A is seven times the identity, 524 00:33:55,880 --> 00:33:57,140 what will that ratio be? 525 00:34:00,380 --> 00:34:01,760 Say it, yeah. 526 00:34:01,760 --> 00:34:03,340 Seven. 527 00:34:03,340 --> 00:34:09,580 If A is 7I, this will be 7x and this will be x, and norms, 528 00:34:09,580 --> 00:34:14,662 the factor seven comes out, so that ratio will be seven. 529 00:34:14,662 --> 00:34:16,600 OK. 530 00:34:16,600 --> 00:34:19,989 For me, the norm is-- 531 00:34:19,989 --> 00:34:21,174 that's the blow-up factor. 532 00:34:23,995 --> 00:34:29,130 So here's the idea of a matrix norm. 533 00:34:29,130 --> 00:34:31,159 Now I'm doing matrix. 534 00:34:31,159 --> 00:34:34,998 Matrix norm from vector norm. 535 00:34:38,830 --> 00:34:42,770 And the answer will be the maximum blow-up. 536 00:34:46,210 --> 00:34:48,770 The maximum of this ratio. 537 00:34:48,770 --> 00:34:50,909 I call that ratio the blow-up factor. 538 00:34:50,909 --> 00:34:53,210 That's just a made-up name. 539 00:34:53,210 --> 00:34:57,731 The maximum over all x. 540 00:34:57,731 --> 00:34:58,760 All of x. 541 00:34:58,760 --> 00:35:02,590 I look to see which vector gets blown up the most 542 00:35:02,590 --> 00:35:09,670 and that is the norm of the matrix. 543 00:35:09,670 --> 00:35:12,460 I've settled on norms of vectors. 544 00:35:12,460 --> 00:35:15,520 That's done upstairs there. 545 00:35:15,520 --> 00:35:18,580 Now I'm looking at norms of matrices. 546 00:35:18,580 --> 00:35:22,660 And this is one way to get a good norm of a matrix that 547 00:35:22,660 --> 00:35:24,760 kind of comes from the 2 norm.
548 00:35:24,760 --> 00:35:27,520 So there would be other norms for matrices coming 549 00:35:27,520 --> 00:35:31,300 from other vector norms, and those, we haven't seen, 550 00:35:31,300 --> 00:35:35,600 but the 2 norm is a very important one. 551 00:35:35,600 --> 00:35:40,030 So what is the maximum value of this? 552 00:35:40,030 --> 00:35:43,433 Of that ratio for a matrix A? 553 00:35:43,433 --> 00:35:47,890 The claim is that it's sigma 1. 554 00:35:47,890 --> 00:35:49,880 I'll just put a big equality there. 555 00:35:53,090 --> 00:35:58,372 Now, can we see, why is sigma 1 the answer to this problem? 556 00:36:03,200 --> 00:36:05,400 I can see a couple of ways to think about that 557 00:36:05,400 --> 00:36:07,220 but that's a very important fact. 558 00:36:07,220 --> 00:36:14,850 In fact, this is a way to discover what sigma 1 is 559 00:36:14,850 --> 00:36:16,650 without all the other sigmas. 560 00:36:16,650 --> 00:36:19,860 If I look for the x that has the biggest blow-up factor-- 561 00:36:19,860 --> 00:36:22,260 and by the way, which x will it be? 562 00:36:22,260 --> 00:36:27,630 Which x will win the max competition here and be sigma 563 00:36:27,630 --> 00:36:30,300 1 times as large as-- 564 00:36:30,300 --> 00:36:34,920 the ratio will be sigma 1. 565 00:36:34,920 --> 00:36:36,090 That will be sigma 1. 566 00:36:36,090 --> 00:36:39,540 When is this thing sigma 1 times as large as that? 567 00:36:39,540 --> 00:36:42,220 For which x? 568 00:36:42,220 --> 00:36:45,120 Not for an eigenvector. 569 00:36:45,120 --> 00:36:50,070 If x was an eigenvector, what would that ratio be? 570 00:36:50,070 --> 00:36:50,570 Lambda. 571 00:36:53,200 --> 00:36:56,680 But if A is not a symmetric matrix, 572 00:36:56,680 --> 00:37:03,170 maybe the eigenvectors don't tell you the exact way they go. 573 00:37:03,170 --> 00:37:06,090 So what vector would you now guess? 574 00:37:06,090 --> 00:37:10,460 It's not an eigenvector, it is a singular vector. 
575 00:37:10,460 --> 00:37:14,180 And which singular vector is it probably going to be? 576 00:37:14,180 --> 00:37:16,140 v1. 577 00:37:16,140 --> 00:37:17,550 Yeah, v1 makes sense. 578 00:37:17,550 --> 00:37:18,370 Winner. 579 00:37:18,370 --> 00:37:21,240 So the winner of this competition 580 00:37:21,240 --> 00:37:29,468 is x equals v1, the first right singular vector. 581 00:37:32,960 --> 00:37:34,700 And we better be able to check that. 582 00:37:34,700 --> 00:37:42,410 So again, this maximization problem, the answer 583 00:37:42,410 --> 00:37:46,440 is in terms of the singular vector. 584 00:37:46,440 --> 00:37:49,590 So that's a way to find this first singular vector 585 00:37:49,590 --> 00:37:52,130 without finding them all. 586 00:37:52,130 --> 00:37:55,850 And let's just plug in the first singular vector 587 00:37:55,850 --> 00:38:02,600 and see that the ratio is sigma 1. 588 00:38:02,600 --> 00:38:04,890 So now let me plug it in. 589 00:38:04,890 --> 00:38:06,290 So what do I have? 590 00:38:06,290 --> 00:38:12,910 I want Av1 over length of v1. 591 00:38:12,910 --> 00:38:14,750 OK. 592 00:38:14,750 --> 00:38:17,990 And I'm hoping to get that answer. 593 00:38:17,990 --> 00:38:20,450 Well what's the denominator here? 594 00:38:20,450 --> 00:38:23,835 The length of v1 is one. 595 00:38:23,835 --> 00:38:25,690 So no big deal there. 596 00:38:25,690 --> 00:38:27,350 That's one. 597 00:38:27,350 --> 00:38:29,605 What's the length of the top one? 598 00:38:32,530 --> 00:38:34,360 Now what is Av1? 599 00:38:34,360 --> 00:38:40,060 If v1 is the first right singular vector, then Av1 600 00:38:40,060 --> 00:38:45,558 is sigma 1 times u1. 601 00:38:45,558 --> 00:38:52,560 Remember, the singular vector relations were Av equals sigma u. 602 00:38:52,560 --> 00:38:58,740 Avk equals sigma k uk. 603 00:38:58,740 --> 00:38:59,890 You remember that. 604 00:38:59,890 --> 00:39:01,940 So they're not eigenvectors. 605 00:39:01,940 --> 00:39:03,120 They're singular vectors.
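That claim is easy to check numerically. Here is a minimal sketch with numpy (the random 3-by-3 matrix and the sampling of random directions are illustrative choices, not from the lecture): the blow-up ratio at v1 equals sigma 1, and no sampled direction beats it.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

U, S, Vt = np.linalg.svd(A)
sigma1 = S[0]
v1 = Vt[0]  # first right singular vector (the rows of Vt are the v's)

# Blow-up factor at v1: A v1 = sigma1 u1, and v1, u1 are unit vectors
ratio_at_v1 = np.linalg.norm(A @ v1) / np.linalg.norm(v1)
print(np.isclose(ratio_at_v1, sigma1))  # True

# No random direction gets blown up more than sigma1
ratios = [np.linalg.norm(A @ x) / np.linalg.norm(x)
          for x in rng.standard_normal((1000, 3))]
print(max(ratios) <= sigma1 + 1e-12)  # True
```

So maximizing the ratio really does pick out v1 and return sigma 1, with no need to compute the other singular values.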
606 00:39:03,120 --> 00:39:13,110 So the length of Av1 is the length of sigma 1 u1, and it's divided by 1. 607 00:39:13,110 --> 00:39:19,086 And of course, u1 is also a unit vector, so I just get sigma 1. 608 00:39:19,086 --> 00:39:19,586 OK. 609 00:39:23,250 --> 00:39:25,290 So that's another way to say that you 610 00:39:25,290 --> 00:39:30,420 can find sigma 1 by solving this maximum problem. 611 00:39:30,420 --> 00:39:33,030 And you get that sigma 1. 612 00:39:33,030 --> 00:39:35,200 OK. 613 00:39:35,200 --> 00:39:38,960 And I could get other matrix norms 614 00:39:38,960 --> 00:39:44,770 by maximizing that blow-up factor in that vector norm. 615 00:39:44,770 --> 00:39:50,750 I won't do that now, just to keep control of what we've got. 616 00:39:50,750 --> 00:39:55,970 Now what was the next matrix norm that came in last time? 617 00:39:55,970 --> 00:40:01,070 Very, very important one for deep learning and neural nets. 618 00:40:01,070 --> 00:40:04,680 Somehow it's a little simpler than this guy. 619 00:40:04,680 --> 00:40:07,860 And what was that matrix norm? 620 00:40:07,860 --> 00:40:12,490 What letter? Whose name goes here? 621 00:40:12,490 --> 00:40:13,960 Frobenius. 622 00:40:13,960 --> 00:40:17,200 So capital F for Frobenius. 623 00:40:17,200 --> 00:40:19,060 And what was that? 624 00:40:19,060 --> 00:40:23,410 That was the square root of the sum of all the-- 625 00:40:23,410 --> 00:40:33,670 add up all the aij squares, all over the matrix, 626 00:40:33,670 --> 00:40:36,600 and then take the square root. 627 00:40:36,600 --> 00:40:40,160 And then somebody asked a good question after class 628 00:40:40,160 --> 00:40:44,690 on Wednesday, what has that got to do with the sigmas? 629 00:40:44,690 --> 00:40:52,040 Because my point was that these norms are the guys that 630 00:40:52,040 --> 00:40:56,930 go with the sigmas, that have nice formulas for the sigmas, 631 00:40:56,930 --> 00:40:58,220 and here it is.
632 00:40:58,220 --> 00:41:01,665 It's the square root of the sum of the squares of all 633 00:41:01,665 --> 00:41:02,165 the sigmas. 634 00:41:07,130 --> 00:41:09,890 So let me write Frobenius again. 635 00:41:14,810 --> 00:41:20,450 But this notation with an F is now pretty standard, 636 00:41:20,450 --> 00:41:25,280 and we should be able to see why that number is 637 00:41:25,280 --> 00:41:26,701 the same as that number. 638 00:41:34,940 --> 00:41:35,440 Yeah. 639 00:41:39,420 --> 00:41:41,408 I could give you a reason or I could put it 640 00:41:41,408 --> 00:41:42,200 on the problem set. 641 00:41:46,330 --> 00:41:48,510 Yeah, I think that's better on the problem 642 00:41:48,510 --> 00:41:51,570 set, because first of all, I get off the hook 643 00:41:51,570 --> 00:41:56,910 right away, and secondly, this connection between-- 644 00:41:56,910 --> 00:42:00,820 in Frobenius, that's a beautiful fact about Frobenius norm 645 00:42:00,820 --> 00:42:03,415 that you add up all the sigma squares-- 646 00:42:03,415 --> 00:42:09,630 instead of the m times n entry squares of the filled matrix. 647 00:42:09,630 --> 00:42:12,990 So another way to say it is, we haven't written down 648 00:42:12,990 --> 00:42:16,604 the SVD today, A equals U sigma V transposed. 649 00:42:20,960 --> 00:42:26,530 And the point is that, for the Frobenius norm-- 650 00:42:26,530 --> 00:42:29,300 actually, for all these norms-- 651 00:42:29,300 --> 00:42:30,790 I can change U. 652 00:42:30,790 --> 00:42:35,240 It doesn't change the norm, so I can make U the identity. 653 00:42:35,240 --> 00:42:38,070 U, as we all know, is an orthogonal matrix, 654 00:42:38,070 --> 00:42:41,220 and what I'm saying is, orthogonal matrix U 655 00:42:41,220 --> 00:42:43,980 doesn't change any of these particular norms. 656 00:42:43,980 --> 00:42:46,530 So suppose it was the identity. 657 00:42:46,530 --> 00:42:47,580 Same here. 658 00:42:47,580 --> 00:42:50,580 That could be the identity without changing the norm.
659 00:42:50,580 --> 00:42:55,110 So we're down to the norm of Frobenius. 660 00:42:55,110 --> 00:42:58,480 So what's the Frobenius norm of that guy? 661 00:43:01,180 --> 00:43:06,440 What's the Frobenius norm of that diagonal matrix? 662 00:43:06,440 --> 00:43:08,660 Well you're supposed to add up the squares 663 00:43:08,660 --> 00:43:13,410 of all the numbers in the matrix and what do you get? 664 00:43:13,410 --> 00:43:15,970 You get that, right? 665 00:43:15,970 --> 00:43:18,690 So that's why this is the same as this 666 00:43:18,690 --> 00:43:22,360 because the orthogonal guy there and the orthogonal guy there 667 00:43:22,360 --> 00:43:24,220 make no difference in the norm. 668 00:43:24,220 --> 00:43:27,630 But that takes checking, right? 669 00:43:27,630 --> 00:43:28,880 Yeah. 670 00:43:28,880 --> 00:43:30,820 But that's another way to see why 671 00:43:30,820 --> 00:43:32,810 the Frobenius norm gives this. 672 00:43:32,810 --> 00:43:35,330 And then finally, this was the nuclear norm. 673 00:43:38,410 --> 00:43:41,350 And actually, just before my lunch 674 00:43:41,350 --> 00:43:43,440 lecture on the subject of probability-- 675 00:43:43,440 --> 00:43:47,350 I've had a learning morning. 676 00:43:47,350 --> 00:43:52,030 The lunch lecture was about this crazy way that humans behave. 677 00:43:52,030 --> 00:43:58,000 Not us but other humans. 678 00:43:58,000 --> 00:44:02,310 Other actual-- well, no, I don't want to say that. 679 00:44:02,310 --> 00:44:06,090 Take that out of the tape. 680 00:44:06,090 --> 00:44:06,970 Yeah, OK. 681 00:44:06,970 --> 00:44:09,370 Anyway, that was that lecture, but before that 682 00:44:09,370 --> 00:44:16,240 was a lecture for an hour plus about deep learning by somebody 683 00:44:16,240 --> 00:44:19,600 who really, really has begun to understand 684 00:44:19,600 --> 00:44:21,820 what is happening inside. 
685 00:44:21,820 --> 00:44:25,510 How does that gradient descent optimization 686 00:44:25,510 --> 00:44:31,660 algorithm pick out-- what does it pick out as the thing 687 00:44:31,660 --> 00:44:33,550 it learns? 688 00:44:33,550 --> 00:44:38,200 This is going to be our goal in this course. 689 00:44:38,200 --> 00:44:39,550 We're not there yet. 690 00:44:39,550 --> 00:44:43,870 But his conjecture is that-- 691 00:44:43,870 --> 00:44:45,290 yeah, so it's a conjecture. 692 00:44:45,290 --> 00:44:46,390 He doesn't have a proof. 693 00:44:46,390 --> 00:44:49,720 He's got proofs of some nice cases 694 00:44:49,720 --> 00:44:52,720 where things commute but he hasn't got the whole thing yet, 695 00:44:52,720 --> 00:44:55,840 but it's pretty terrific work. 696 00:44:55,840 --> 00:45:03,430 So this was Professor Srebro who's in Chicago. 697 00:45:03,430 --> 00:45:05,610 So he just announced his conjecture, 698 00:45:05,610 --> 00:45:11,700 and his conjecture is that, in a model case, the deep learning 699 00:45:11,700 --> 00:45:14,910 that we'll learn about with the gradient descent 700 00:45:14,910 --> 00:45:19,210 that we'll learn about to find the best weights-- 701 00:45:19,210 --> 00:45:24,880 the point is that, in a typical deep learning 702 00:45:24,880 --> 00:45:30,050 problem these days, there are many more weights than samples 703 00:45:30,050 --> 00:45:34,170 and so there are a lot of possible minima. 704 00:45:34,170 --> 00:45:37,230 Many different weights give the same minimum loss 705 00:45:37,230 --> 00:45:40,050 because there are so many weights. 706 00:45:40,050 --> 00:45:44,010 The problem has, like, too many variables, 707 00:45:44,010 --> 00:45:46,440 but it turns out to be a very, very good thing. 708 00:45:46,440 --> 00:45:48,310 That's part of the success.
709 00:45:48,310 --> 00:45:54,780 And he believes that in a model situation, 710 00:45:54,780 --> 00:46:00,090 that optimization by gradient descent 711 00:46:00,090 --> 00:46:06,970 picks out the weights that minimize the nuclear norm. 712 00:46:06,970 --> 00:46:10,650 So this would be a norm of a lot of weights. 713 00:46:10,650 --> 00:46:15,120 And he thinks that's where the system goes. 714 00:46:15,120 --> 00:46:16,155 We'll see this. 715 00:46:16,155 --> 00:46:18,510 This comes up in compressed sensing, 716 00:46:18,510 --> 00:46:21,420 as I mentioned last time. 717 00:46:21,420 --> 00:46:26,840 But now I have to remember what was the definition. 718 00:46:26,840 --> 00:46:30,580 Do you remember what the nuclear norm was? 719 00:46:30,580 --> 00:46:35,060 He often used a little star instead of an N. 720 00:46:35,060 --> 00:46:37,020 I'll put that in the notes. 721 00:46:37,020 --> 00:46:39,790 Other people call it the trace norm. 722 00:46:39,790 --> 00:46:47,730 But I think this N kind of gives it a notation you can remember. 723 00:46:47,730 --> 00:46:49,733 So let's call it the nuclear norm. 724 00:46:49,733 --> 00:46:51,150 Do you remember what that one was? 725 00:46:54,000 --> 00:46:56,570 Yeah, somebody's saying it right. 726 00:46:56,570 --> 00:46:58,070 Add the sigmas, yeah. 727 00:46:58,070 --> 00:47:05,620 Just the sum of the sigmas, like the l1 norm, in a way. 728 00:47:05,620 --> 00:47:07,950 So that's the idea, is that this is 729 00:47:07,950 --> 00:47:14,210 the natural sort of l1 type of norm for matrices. 730 00:47:14,210 --> 00:47:17,353 It's the l1 norm for that sigma vector. 731 00:47:17,353 --> 00:47:19,270 This would be the l2 norm of the sigma vector. 732 00:47:19,270 --> 00:47:21,870 That would be the l infinity norm. 733 00:47:21,870 --> 00:47:28,010 Notice that the vector numbers, infinity, 2, and 1, get 734 00:47:28,010 --> 00:47:35,300 changed around when you look at the matrix guy.
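So all three matrix norms can be read off from the singular values. Here is a minimal numpy check (the particular 2-by-2 matrix is an illustrative choice, not from the lecture): the 2 norm, Frobenius norm, and nuclear norm are the l-infinity, l2, and l1 norms of the sigma vector.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
sigmas = np.linalg.svd(A, compute_uv=False)

two_norm = sigmas.max()                   # l-infinity norm of the sigma vector
frobenius = np.sqrt((sigmas ** 2).sum())  # l2 norm of the sigma vector
nuclear = sigmas.sum()                    # l1 norm of the sigma vector

# numpy's built-in matrix norms agree on all three
print(np.isclose(two_norm, np.linalg.norm(A, 2)))        # True
print(np.isclose(frobenius, np.linalg.norm(A, 'fro')))   # True
print(np.isclose(nuclear, np.linalg.norm(A, 'nuc')))     # True
```

For this A, the Frobenius value also matches the entry formula directly: 9 + 0 + 16 + 25 = 50, so the norm is the square root of 50, the same as the sum of the sigma squares.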
735 00:47:35,300 --> 00:47:42,460 So that's an exciting idea and it remains to be proved. 736 00:47:42,460 --> 00:47:45,170 And expert people are experimenting to see, 737 00:47:45,170 --> 00:47:47,080 is it true? 738 00:47:47,080 --> 00:47:47,910 Yeah. 739 00:47:47,910 --> 00:47:50,820 So that's a big thing for their future. 740 00:47:50,820 --> 00:47:51,780 Yes. 741 00:47:51,780 --> 00:47:55,560 OK, so today, we've talked about norms 742 00:47:55,560 --> 00:47:59,730 and this section of the notes will be all about norms. 743 00:48:02,540 --> 00:48:09,930 We've taken a big leap into a comment about deep learning 744 00:48:09,930 --> 00:48:14,880 and this is what I want to say the most. 745 00:48:14,880 --> 00:48:18,120 And I say it to every class I teach 746 00:48:18,120 --> 00:48:22,050 near the start of the semester. 747 00:48:22,050 --> 00:48:26,410 My feeling is that my job is to teach you things, 748 00:48:26,410 --> 00:48:30,880 or to join with you in learning things, as happened today. 749 00:48:30,880 --> 00:48:32,260 It's not to grade you. 750 00:48:32,260 --> 00:48:37,810 I don't spend any time losing sleep-- you know, 751 00:48:37,810 --> 00:48:42,550 should that person take a one-point or epsilon penalty 752 00:48:42,550 --> 00:48:47,461 for turning it in four minutes late? 753 00:48:47,461 --> 00:48:49,390 To Hell with that, right? 754 00:48:49,390 --> 00:48:52,780 We've got a lot to do here. 755 00:48:52,780 --> 00:48:55,150 So anyway, we'll get on with the job. 756 00:48:55,150 --> 00:49:00,760 So homework three coming up, and you'll 757 00:49:00,760 --> 00:49:02,950 be using the notes that are already 758 00:49:02,950 --> 00:49:07,410 posted on Stellar for those sections eight and nine 759 00:49:07,410 --> 00:49:09,130 and so on. 760 00:49:09,130 --> 00:49:11,270 And we'll keep going on Monday. 761 00:49:11,270 --> 00:49:14,580 OK, see you on Monday and have a great weekend.