1
00:00:01,550 --> 00:00:03,920
The following content is
provided under a Creative

2
00:00:03,920 --> 00:00:05,310
Commons license.

3
00:00:05,310 --> 00:00:07,520
Your support will help
MIT OpenCourseWare

4
00:00:07,520 --> 00:00:11,610
continue to offer high quality
educational resources for free.

5
00:00:11,610 --> 00:00:14,180
To make a donation or to
view additional materials

6
00:00:14,180 --> 00:00:18,140
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:18,140 --> 00:00:19,026
at ocw.mit.edu.

8
00:00:24,420 --> 00:00:26,490
PROFESSOR: Well, OK.

9
00:00:26,490 --> 00:00:30,330
So first important
things about the course,

10
00:00:30,330 --> 00:00:31,890
plans for the course.

11
00:00:31,890 --> 00:00:38,970
And then today I'm going
to move to the next section

12
00:00:38,970 --> 00:00:44,670
of the notes, section 2,
or part 2, I should say.

13
00:00:44,670 --> 00:00:49,260
And actually I'll skip, for
the moment, section 2-1

14
00:00:49,260 --> 00:00:57,180
and go to section 2-2,
and all of chapter 2

15
00:00:57,180 --> 00:01:03,840
will come to you probably
today or latest tomorrow.

16
00:01:03,840 --> 00:01:06,480
So that's where
we're going next.

17
00:01:06,480 --> 00:01:09,060
I'm following the
notes pretty carefully,

18
00:01:09,060 --> 00:01:15,180
except I'm going to skip the
section on tensors until I

19
00:01:15,180 --> 00:01:18,640
learn more basically.

20
00:01:18,640 --> 00:01:19,140
Yeah.

21
00:01:19,140 --> 00:01:19,650
Yeah.

22
00:01:19,650 --> 00:01:21,510
I could say a little
about tensors,

23
00:01:21,510 --> 00:01:29,190
but this flows
naturally using the SVD.

24
00:01:29,190 --> 00:01:34,620
So it's just a terribly
important problem,

25
00:01:34,620 --> 00:01:35,400
least squares.

26
00:01:35,400 --> 00:01:38,910
And of course, I know that
you've seen one or two

27
00:01:38,910 --> 00:01:40,440
ways to do least squares.

28
00:01:40,440 --> 00:01:44,310
And really the whole
subject comes together.

29
00:01:44,310 --> 00:01:48,750
Here I want to say
something, before I send out

30
00:01:48,750 --> 00:01:52,620
a plan for looking ahead
for the course as a whole.

31
00:01:55,750 --> 00:01:57,220
So there's no final exam.

32
00:01:57,220 --> 00:02:01,810
And I don't really see how to
examine you, how to give tests.

33
00:02:01,810 --> 00:02:06,220
I could, of course,
create tests

34
00:02:06,220 --> 00:02:08,229
about the linear algebra part.

35
00:02:08,229 --> 00:02:11,730
But I don't think it's--

36
00:02:11,730 --> 00:02:14,440
it's not sort of the
style of this course

37
00:02:14,440 --> 00:02:21,010
to expect you quickly to create
a proof for something in class.

38
00:02:21,010 --> 00:02:23,530
So I think, and
especially looking

39
00:02:23,530 --> 00:02:28,360
at what we're headed for,
and moving quite steadily

40
00:02:28,360 --> 00:02:34,750
in that direction,
is all the problems

41
00:02:34,750 --> 00:02:39,280
that this linear algebra
is aimed at, right up to

42
00:02:39,280 --> 00:02:46,990
and including conjugate gradient
descent and deep learning,

43
00:02:46,990 --> 00:02:58,270
the overwhelmingly important and
lively, active research area.

44
00:02:58,270 --> 00:03:02,140
I couldn't do better than
to keep the course going

45
00:03:02,140 --> 00:03:03,500
in that direction.

46
00:03:03,500 --> 00:03:07,060
So I think what I
would ask you to do

47
00:03:07,060 --> 00:03:17,380
is late in sort of April, May,
the regular homeworks I'll

48
00:03:17,380 --> 00:03:19,900
discontinue at a certain point.

49
00:03:19,900 --> 00:03:27,220
And then instead, I'll be asking
and encouraging a project--

50
00:03:27,220 --> 00:03:30,700
I don't know if that's the
right word to be using--

51
00:03:30,700 --> 00:03:36,150
in which you use
what we've done.

52
00:03:36,150 --> 00:03:39,070
And I'll send out a
message on Stellar

53
00:03:39,070 --> 00:03:42,820
listing five or six
areas and only--

54
00:03:42,820 --> 00:03:46,090
I mean, one of them is
the machine learning, deep

55
00:03:46,090 --> 00:03:46,850
learning part.

56
00:03:46,850 --> 00:03:50,410
But there are all the
other parts, things

57
00:03:50,410 --> 00:03:53,050
we are learning how to do.

58
00:03:53,050 --> 00:03:57,010
How to find sparse
solutions, for example,

59
00:03:57,010 --> 00:03:59,400
or something about
the pseudo inverse.

60
00:03:59,400 --> 00:04:00,700
All kinds of things.

61
00:04:00,700 --> 00:04:06,310
So that's my goal,
is to give you

62
00:04:06,310 --> 00:04:11,800
something to do which uses the
material that you've learned.

63
00:04:11,800 --> 00:04:15,010
And look, I'm not
expecting a thesis.

64
00:04:15,010 --> 00:04:19,040
But it's a good chance.

65
00:04:19,040 --> 00:04:22,880
So it will be more
than just, drag

66
00:04:22,880 --> 00:04:29,120
in some code for deep learning
and some data matrix and do it.

67
00:04:29,120 --> 00:04:32,910
But we'll talk more
as the time comes.

68
00:04:32,910 --> 00:04:35,420
So I just thought I'd
say, before sending out

69
00:04:35,420 --> 00:04:36,920
the announcement,
I would say it's

70
00:04:36,920 --> 00:04:47,180
coming: something on a
larger scale than the single

71
00:04:47,180 --> 00:04:51,090
one-week homeworks
we've had before.

72
00:04:51,090 --> 00:04:52,400
Any thoughts about that?

73
00:04:52,400 --> 00:04:56,310
I haven't given you details.

74
00:04:56,310 --> 00:05:01,520
So let me do that with a
message, and then ask again.

75
00:05:01,520 --> 00:05:02,930
But I'm open to--

76
00:05:02,930 --> 00:05:04,640
I hope you've understood--

77
00:05:04,640 --> 00:05:08,270
I think you have-- that
if you make suggestions,

78
00:05:08,270 --> 00:05:13,850
either directly to my email
or on Piazza or whatever,

79
00:05:13,850 --> 00:05:15,890
they get paid attention to.

80
00:05:15,890 --> 00:05:17,450
OK.

81
00:05:17,450 --> 00:05:21,220
Shall I just go forward
with least squares?

82
00:05:21,220 --> 00:05:22,900
So what's the least
squares problem,

83
00:05:22,900 --> 00:05:28,470
and what are these four
ways, each bringing--

84
00:05:28,470 --> 00:05:31,150
so let me speak about
the pseudo inverse first.

85
00:05:31,150 --> 00:05:33,770
OK, the pseudo
inverse of a matrix.

86
00:05:33,770 --> 00:05:34,270
All right.

87
00:05:34,270 --> 00:05:34,770
Good.

88
00:05:39,620 --> 00:05:44,510
So we have a matrix A, m by n.

89
00:05:44,510 --> 00:05:48,680
And the pseudo inverse
I'm going to call A plus.

90
00:05:48,680 --> 00:05:52,630
And it naturally is
going to be n by m.

91
00:05:52,630 --> 00:05:55,370
I'm going to multiply
those together.

92
00:05:55,370 --> 00:05:59,570
And I'm going to get as near
to the identity as I can.

93
00:05:59,570 --> 00:06:01,970
That's the idea, of course,
of the pseudo inverse.

94
00:06:01,970 --> 00:06:06,330
The word pseudo is in
there, so no one's deceived.

95
00:06:06,330 --> 00:06:08,330
It's not an actual inverse.

96
00:06:08,330 --> 00:06:14,210
Oh, if the matrix is square
and has an inverse, of course.

97
00:06:14,210 --> 00:06:22,130
Then if A inverse
exists, which requires--

98
00:06:22,130 --> 00:06:24,620
everybody remembers
it requires the matrix

99
00:06:24,620 --> 00:06:29,360
to be square, because I
mean inverse on both sides.

100
00:06:29,360 --> 00:06:34,910
And it requires
rank n, full rank.

101
00:06:34,910 --> 00:06:36,710
Then the inverse will exist.

102
00:06:36,710 --> 00:06:38,000
You can check it.

103
00:06:38,000 --> 00:06:40,460
MATLAB would check
it by computing

104
00:06:40,460 --> 00:06:44,540
the pivots in elimination
and finding n pivots.

105
00:06:44,540 --> 00:06:49,130
So if A inverse
exists, which means

106
00:06:49,130 --> 00:06:55,280
A times A inverse, and A
inverse times A, both give I,

107
00:06:55,280 --> 00:07:01,640
then A plus is A
inverse, of course.

108
00:07:04,290 --> 00:07:07,810
The pseudo inverse is the
inverse when there is one.
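
A quick numerical check of this claim, as a minimal sketch: the lecture mentions MATLAB, but this example assumes Python with NumPy, and the 3 by 3 matrix is an arbitrary invertible example, not one from the course. `np.linalg.pinv` computes the pseudo inverse from the SVD.

```python
import numpy as np

# A square matrix with full rank n = 3, so A^{-1} exists (det = 18).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

A_inv = np.linalg.inv(A)
A_plus = np.linalg.pinv(A)   # pseudo inverse, computed via the SVD

# When a true inverse exists, the pseudo inverse coincides with it,
# and it is an inverse on both sides.
print(np.allclose(A_plus, A_inv))          # True
print(np.allclose(A @ A_plus, np.eye(3)))  # True
print(np.allclose(A_plus @ A, np.eye(3)))  # True
```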

109
00:07:07,810 --> 00:07:10,600
But I'm thinking
about cases where

110
00:07:10,600 --> 00:07:13,920
either the matrix
is rectangular,

111
00:07:13,920 --> 00:07:19,550
or it has zero eigenvalues.

112
00:07:19,550 --> 00:07:24,090
It could be square, but it has
a null space, other than just

113
00:07:24,090 --> 00:07:25,840
the 0 vector.

114
00:07:25,840 --> 00:07:28,800
In other words, the
columns are dependent.

115
00:07:28,800 --> 00:07:32,370
What can we do then
about inverting it?

116
00:07:32,370 --> 00:07:34,350
We can't literally invert it.

117
00:07:34,350 --> 00:07:38,520
If A has a null
space, then when I

118
00:07:38,520 --> 00:07:44,910
multiply by a vector x in
that null space, Ax will be 0.

119
00:07:44,910 --> 00:07:48,720
And when I multiply
by A inverse, still 0.

120
00:07:48,720 --> 00:07:50,850
That can't change the 0.

121
00:07:50,850 --> 00:07:56,560
So if there is an x in the null
space, then this can't happen.

122
00:07:56,560 --> 00:07:59,000
So we just do the best we can.

123
00:07:59,000 --> 00:08:01,390
And that's what this
pseudo inverse is.

124
00:08:01,390 --> 00:08:06,400
And so let me draw a picture of
the picture you know of the row

125
00:08:06,400 --> 00:08:11,530
space and the null space.

126
00:08:11,530 --> 00:08:13,120
OK, and it's there, you see.

127
00:08:13,120 --> 00:08:14,950
There is a null space.

128
00:08:14,950 --> 00:08:19,210
And over here I have the
column space and the null space

129
00:08:19,210 --> 00:08:20,880
of A transpose.

130
00:08:20,880 --> 00:08:21,850
OK.

131
00:08:21,850 --> 00:08:24,620
So this is the row
space, of course.

132
00:08:24,620 --> 00:08:28,180
That's the column
space of A transpose,

133
00:08:28,180 --> 00:08:32,030
and there is the
column space of A. OK.

134
00:08:32,030 --> 00:08:35,059
So which part of that picture
is invertible, and which part

135
00:08:35,059 --> 00:08:37,250
of the picture is hopeless?

136
00:08:37,250 --> 00:08:39,710
The top part is invertible.

137
00:08:39,710 --> 00:08:45,470
This is the r-dimensional row
space, r-dimensional column

138
00:08:45,470 --> 00:08:46,040
space.
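
One way to see the four spaces numerically is through the SVD, which comes up later in the lecture. This is a sketch under assumptions: NumPy, a hypothetical 4 by 3 matrix of rank 2 (column 3 = column 1 + column 2), and a cutoff of 1e-10 for deciding which singular values count as zero.

```python
import numpy as np

# Hypothetical 4 by 3 example of rank 2: column 3 = column 1 + column 2.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])
m, n = A.shape
U, s, Vt = np.linalg.svd(A)          # full SVD: U is m x m, Vt is n x n
r = int(np.sum(s > 1e-10))           # numerical rank (assumed cutoff)
V = Vt.T

row_space    = V[:, :r]   # v_1 .. v_r        (r-dimensional)
null_space   = V[:, r:]   # v_{r+1} .. v_n    (n - r dimensional)
column_space = U[:, :r]   # u_1 .. u_r        (r-dimensional)
left_null    = U[:, r:]   # u_{r+1} .. u_m    (null space of A^T)

print(r, n - r, m - r)                       # 2 1 2
print(np.allclose(A @ null_space, 0))        # True
print(np.allclose(A.T @ left_null, 0))       # True
# Between row space and column space, A sends each v_i to sigma_i u_i.
print(np.allclose(A @ row_space, column_space * s[:r]))  # True
```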

139
00:08:46,040 --> 00:08:50,450
A takes a vector in here,
zaps it into every--

140
00:08:50,450 --> 00:08:54,710
you always end up
in the column space.

141
00:08:54,710 --> 00:08:57,410
Here I take a vector
in the row space--

142
00:08:57,410 --> 00:09:02,210
say, x-- and it
gets mapped to Ax.

143
00:09:02,210 --> 00:09:10,080
And between those two spaces,
A is entirely invertible.

144
00:09:10,080 --> 00:09:12,260
You get separate
vectors here, go

145
00:09:12,260 --> 00:09:15,230
to separate vectors
in the column space,

146
00:09:15,230 --> 00:09:19,720
and the inverse
just brings it back.

147
00:09:19,720 --> 00:09:23,770
So we know what the
pseudo inverse should do.

148
00:09:23,770 --> 00:09:30,060
A will go
that way, and A plus,

149
00:09:30,060 --> 00:09:32,160
the pseudo inverse
will be just--

150
00:09:36,270 --> 00:09:39,810
on the top half of the
picture, it'll give us A plus.

151
00:09:39,810 --> 00:09:46,800
We'll take Ax back
to x in the top half.

152
00:09:46,800 --> 00:09:48,570
Now, what about here?

153
00:09:48,570 --> 00:09:51,210
That's where we have
trouble, when we don't have--

154
00:09:51,210 --> 00:09:53,340
that's what spoils our inverse.

155
00:09:53,340 --> 00:09:59,970
If there is a null space
vector, then it goes where?

156
00:09:59,970 --> 00:10:03,940
When you multiply by A, this
guy in the null space goes to 0.

157
00:10:07,050 --> 00:10:09,840
Usually along a straighter
line than I've drawn.

158
00:10:09,840 --> 00:10:10,590
But it goes there.

159
00:10:10,590 --> 00:10:12,300
It gets to 0.

160
00:10:12,300 --> 00:10:15,780
So you can't raise it from
the dead, so to speak.

161
00:10:15,780 --> 00:10:18,950
You can't recover it when
there's no A inverse.

162
00:10:18,950 --> 00:10:24,120
So we have to think, what shall
A inverse do to this space

163
00:10:24,120 --> 00:10:26,910
here, where nobody's hitting it?

164
00:10:26,910 --> 00:10:34,830
So this would be the null
space of A transpose.

165
00:10:34,830 --> 00:10:40,040
Because A-- sorry-- yeah, what
should the pseudo inverse do?

166
00:10:40,040 --> 00:10:41,690
I said what should
the inverse do?

167
00:10:41,690 --> 00:10:43,430
The inverse is helpless.

168
00:10:43,430 --> 00:10:46,610
But we have to define A plus.

169
00:10:46,610 --> 00:10:50,630
I've said what it should do on
that guy, on the column space.

170
00:10:50,630 --> 00:10:52,520
It should take everything
in the column space

171
00:10:52,520 --> 00:10:54,790
back where it came from.

172
00:10:54,790 --> 00:11:00,020
But what should it do on this
orthogonal space, where--

173
00:11:00,020 --> 00:11:03,460
yeah, just tell me,
what do you think?

174
00:11:03,460 --> 00:11:06,750
If I have some vector here--

175
00:11:06,750 --> 00:11:09,580
let's call it u r plus 1.

176
00:11:09,580 --> 00:11:11,410
That would be like--

177
00:11:11,410 --> 00:11:24,430
so here I have a nice
basis for the column space.

178
00:11:24,430 --> 00:11:30,400
I would use u's for the ones
that come up in the SVD.

179
00:11:30,400 --> 00:11:34,090
They're orthogonal, and they
come from orthogonal v's.

180
00:11:34,090 --> 00:11:36,040
So the top half is great.

181
00:11:36,040 --> 00:11:40,120
What shall I do with this stuff?

182
00:11:40,120 --> 00:11:43,780
I'm going to send
that back by A plus.

183
00:11:43,780 --> 00:11:46,720
And what am I going
to do with it?

184
00:11:46,720 --> 00:11:49,750
Send it to-- nowhere
else could it go.

185
00:11:49,750 --> 00:11:51,430
0 is the right answer.

186
00:11:51,430 --> 00:11:53,470
All this stuff goes back to 0.

187
00:11:56,260 --> 00:12:00,090
I'm looking for a linear
operator, a matrix.

188
00:12:00,090 --> 00:12:02,440
And I have to think,
once I've decided

189
00:12:02,440 --> 00:12:05,740
what to do with all those and
what to do with all these,

190
00:12:05,740 --> 00:12:08,110
then I know what to do
with any combination.

191
00:12:08,110 --> 00:12:09,640
So I've got it.

192
00:12:09,640 --> 00:12:10,540
I've got it.

193
00:12:10,540 --> 00:12:19,050
So the idea will be, this is
true for x in the row space.

194
00:12:19,050 --> 00:12:25,735
For x in the row space,
if x is in the row space,

195
00:12:25,735 --> 00:12:29,710
Ax is in the column space, and
A inverse just brings it back

196
00:12:29,710 --> 00:12:30,980
as it should.

197
00:12:30,980 --> 00:12:34,510
And in the case of
an invertible matrix

198
00:12:34,510 --> 00:12:37,000
A, what happens to my picture?

199
00:12:37,000 --> 00:12:41,230
What is this picture looking
like if A is actually a 6

200
00:12:41,230 --> 00:12:44,020
by 6 invertible matrix?

201
00:12:44,020 --> 00:12:47,200
In that case,
what's in my picture

202
00:12:47,200 --> 00:12:50,800
and what is not in my picture?

203
00:12:50,800 --> 00:12:54,070
All this null space
stuff isn't there.

204
00:12:54,070 --> 00:12:57,800
And the null space is
just a 0 vector.

205
00:12:57,800 --> 00:12:59,670
But all that I don't
have to worry about.

206
00:12:59,670 --> 00:13:01,980
But in general,
I do have to say.

207
00:13:01,980 --> 00:13:08,890
So the point is
that A plus on the--

208
00:13:08,890 --> 00:13:12,330
what am I calling this?

209
00:13:12,330 --> 00:13:14,980
It's the null space
of A transpose,

210
00:13:14,980 --> 00:13:25,790
or whatever, on u r
plus 1 to u m, all those

211
00:11:25,790 --> 00:11:32,150
vectors, the guys that are
orthogonal to the column space.

212
00:13:32,150 --> 00:13:36,540
Then we have to say, what
does A plus do to them?

213
00:13:36,540 --> 00:13:38,470
And the answer is, it
takes them all to 0.
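
Both rules can be checked on a small example. A minimal sketch, assuming NumPy and a hypothetical rank-2 matrix: A plus should bring Ax back to x when x is in the row space, and should send every vector orthogonal to the column space (u r plus 1 through u m) to 0. The `rcond=1e-10` cutoff is an assumption so that the roundoff-sized third singular value is treated as zero.

```python
import numpy as np

# Hypothetical 4 by 3 matrix of rank 2 (column 3 = column 1 + column 2).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])
A_plus = np.linalg.pinv(A, rcond=1e-10)

U, s, Vt = np.linalg.svd(A)          # full SVD
r = int(np.sum(s > 1e-10))           # rank 2

# x in the row space: a combination of v_1 and v_2.
x = Vt[:r].T @ np.array([2.0, -1.0])
print(np.allclose(A_plus @ (A @ x), x))   # True: A plus brings Ax back to x

# Vectors in the null space of A^T (u_{r+1}, ..., u_m) go to 0.
u_perp = U[:, r:]
print(np.allclose(A_plus @ u_perp, 0))    # True
```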

214
00:13:41,020 --> 00:13:45,580
So there is a
picture using what I

215
00:13:45,580 --> 00:13:48,850
call the big picture of linear
algebra, the four spaces.

216
00:13:48,850 --> 00:13:51,520
You see what A plus should do.

217
00:13:51,520 --> 00:13:55,610
Now, I need a little
formula for it.

218
00:13:55,610 --> 00:13:58,970
I've got the plan for
what it should be,

219
00:13:58,970 --> 00:14:01,310
and it's sort of
the natural thing.

220
00:14:01,310 --> 00:14:06,470
So A plus A is, you could
say it's a projection matrix.

221
00:14:06,470 --> 00:14:11,420
It's not the identity
matrix because if x

222
00:14:11,420 --> 00:14:17,750
is in the null space, A
plus A will take it to 0.

223
00:14:17,750 --> 00:14:18,830
So it's a projection.

224
00:14:18,830 --> 00:14:22,130
A plus A is the identity
on the top half,

225
00:14:22,130 --> 00:14:23,840
and 0 on the bottom half.

226
00:14:23,840 --> 00:14:26,780
That's really what
the matrix is.
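
A small sketch of that projection, assuming NumPy and a hypothetical 2 by 3 matrix of rank 1 (row 2 is twice row 1). The checks confirm that P = A plus times A satisfies P squared = P, is symmetric, acts as the identity on the row space, and sends the null space to 0.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])       # rank 1: row 2 = 2 * row 1
A_plus = np.linalg.pinv(A, rcond=1e-10)
P = A_plus @ A                        # 3 by 3

print(np.allclose(P @ P, P))          # True: a projection
print(np.allclose(P, P.T))            # True: an orthogonal projection

x_null = np.array([2.0, -1.0, 0.0])   # A @ x_null = 0
print(np.allclose(P @ x_null, 0))     # True: bottom half goes to 0

x_row = np.array([1.0, 2.0, 3.0])     # a vector in the row space
print(np.allclose(P @ x_row, x_row))  # True: identity on the top half
```

The product in the other order, A times A plus, is likewise the projection onto the column space.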

227
00:14:26,780 --> 00:14:34,240
And now, I want a
simple formula for it.

228
00:14:34,240 --> 00:14:37,700
And I guess my message
here is, that if we're

229
00:14:37,700 --> 00:14:41,750
looking for a nice expression,
start with the SVD.

230
00:14:41,750 --> 00:14:46,070
Because the SVD
works for any matrix.

231
00:14:46,070 --> 00:14:49,160
And it writes it as
an orthogonal matrix

232
00:14:49,160 --> 00:14:53,920
times a diagonal matrix
times an orthogonal matrix.

233
00:14:53,920 --> 00:14:56,560
And now I want to invert it.

234
00:14:56,560 --> 00:15:00,460
Well, suppose A had an inverse.

235
00:15:00,460 --> 00:15:01,360
What would that be?

236
00:15:04,600 --> 00:15:13,870
This is if invertible, what
would be the SVD of A inverse?

237
00:15:13,870 --> 00:15:18,080
What would be the singular value
decomposition, if this is good?

238
00:15:18,080 --> 00:15:20,410
So when is this
going to be good?

239
00:15:20,410 --> 00:15:24,380
What would I have to know
about that matrix sigma,

240
00:15:24,380 --> 00:15:26,680
that diagonal matrix
in the middle,

241
00:15:26,680 --> 00:15:32,100
if this is truly an
invertible matrix?

242
00:15:32,100 --> 00:15:32,600
Well, no.

243
00:15:32,600 --> 00:15:33,590
What's its name?

244
00:15:33,590 --> 00:15:35,270
Those are not eigenvalues.

245
00:15:35,270 --> 00:15:38,900
Well, their squares are
eigenvalues of A transpose A.

246
00:15:38,900 --> 00:15:40,280
But they're singular values.

247
00:15:40,280 --> 00:15:41,850
Singular value, that's fine.
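
To make the distinction concrete, here is a minimal sketch assuming NumPy and a hypothetical 2 by 2 example: the singular values of A are the square roots of the eigenvalues of A transpose A, and they differ from the eigenvalues of A itself.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
s = np.linalg.svd(A, compute_uv=False)   # singular values, largest first
eig = np.linalg.eigvalsh(A.T @ A)[::-1]  # eigenvalues of A^T A, largest first

print(np.allclose(s**2, eig))            # True: sigma^2 = eigenvalues of A^T A
# A^T A = [[25, 20], [20, 25]] has eigenvalues 45 and 5,
# so the singular values are sqrt(45) and sqrt(5).
# The eigenvalues of the triangular matrix A itself are 3 and 5.
print(np.sort(np.linalg.eigvals(A).real))
```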

248
00:15:41,850 --> 00:15:44,300
So that's the
singular value matrix.

249
00:15:44,300 --> 00:15:51,250
And what would be the
situation if A had an inverse?

250
00:15:51,250 --> 00:15:53,050
There would be no 0's.

251
00:15:53,050 --> 00:15:54,880
All the singular
values would be sitting

252
00:15:54,880 --> 00:15:57,220
there, sigma 1 to sigma n.

253
00:15:57,220 --> 00:15:59,650
What would be the shape
of this sigma matrix?

254
00:15:59,650 --> 00:16:05,470
If I have an inverse, then
it's got to be square n by n.

255
00:16:05,470 --> 00:16:09,280
So what's the shape
of the sigma guy?

256
00:16:09,280 --> 00:16:12,220
Also square, n by n.

257
00:16:12,220 --> 00:16:16,010
So the invertible
case would be--

258
00:16:16,010 --> 00:16:18,110
and I'm going to erase
this in a minute--

259
00:16:18,110 --> 00:16:23,426
the invertible case would
be when sigma is just that.

260
00:16:23,426 --> 00:16:25,800
That would be the
invertible case.

261
00:16:25,800 --> 00:16:28,830
So let's see.

262
00:16:28,830 --> 00:16:30,660
Can you finish this formula?

263
00:16:30,660 --> 00:16:35,220
What would be the
SVD of A inverse?

264
00:16:35,220 --> 00:16:38,370
So I'm given the SVD
of A. I'm given the U

265
00:16:38,370 --> 00:16:43,080
and the sigma
and the V transpose.

266
00:16:43,080 --> 00:16:44,640
What's the inverse of that?

267
00:16:44,640 --> 00:16:46,190
Yeah, I'm just
really asking what's

268
00:16:46,190 --> 00:16:51,130
the inverse of that
product of three matrices.

269
00:16:51,130 --> 00:16:52,930
What comes first here?

270
00:16:52,930 --> 00:16:56,230
V. The inverse of
V transpose is V.

271
00:16:56,230 --> 00:17:00,380
That's because V is
an orthogonal matrix.

272
00:17:00,380 --> 00:17:02,830
The inverse of sigma,
just 1 over it,

273
00:17:02,830 --> 00:17:04,720
is just the sigma inverse.

274
00:17:04,720 --> 00:17:06,609
It's obvious what that means.

275
00:17:06,609 --> 00:17:09,020
And the inverse of
U would go here.

276
00:17:09,020 --> 00:17:11,230
And that is U transpose.
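
The invertible-case formula, A inverse = V times sigma inverse times U transpose, can be verified directly. A minimal sketch assuming NumPy and a hypothetical invertible 2 by 2 matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
U, s, Vt = np.linalg.svd(A)      # A = U @ diag(s) @ Vt

# Reverse the order of the three factors and invert each one:
# (U Sigma V^T)^{-1} = V Sigma^{-1} U^T, because U and V are orthogonal.
A_inv_svd = Vt.T @ np.diag(1.0 / s) @ U.T

print(np.allclose(A_inv_svd, np.linalg.inv(A)))   # True
```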

277
00:17:11,230 --> 00:17:12,589
Great.

278
00:17:12,589 --> 00:17:13,089
OK.

279
00:17:13,089 --> 00:17:16,140
So this is if invertible.

280
00:17:16,140 --> 00:17:21,609
If invertible, we know what
the SVD of A inverse is.

281
00:17:21,609 --> 00:17:28,560
It just takes the V's back
to the U's, or the U's back

282
00:17:28,560 --> 00:17:29,830
to the V's, whichever.

283
00:17:29,830 --> 00:17:30,540
OK.

284
00:17:30,540 --> 00:17:31,170
OK.

285
00:17:31,170 --> 00:17:37,970
Now we've got to do it,
if we're going to allow--

286
00:17:37,970 --> 00:17:42,330
if we're going to get beyond
this limit, this situation,

287
00:17:42,330 --> 00:17:46,860
allow the matrix sigma
to be rectangular.

288
00:17:46,860 --> 00:17:51,750
Then let me just show
you the idea here.

289
00:17:51,750 --> 00:17:57,510
So now I'm going to say,
now sigma, in general,

290
00:17:57,510 --> 00:17:59,190
it's rectangular.

291
00:17:59,190 --> 00:18:04,460
It's got r nonzeros on the
diagonal, but then it quits.

292
00:18:04,460 --> 00:18:09,610
So it's got a bunch of 0's
that make it not invertible.

293
00:18:09,610 --> 00:18:14,020
But let's do our best
and pseudo invert it.

294
00:18:14,020 --> 00:18:15,130
OK.

295
00:18:15,130 --> 00:18:20,650
So now help me get started
on a formula for using--

296
00:18:20,650 --> 00:18:24,970
I want to write this A plus,
which I described up there,

297
00:18:24,970 --> 00:18:27,880
in terms of the subspaces.

298
00:18:27,880 --> 00:18:33,700
Now I'm going to describe A plus
in terms of U, sigma, and V,

299
00:18:33,700 --> 00:18:35,350
the SVD guys.

300
00:18:35,350 --> 00:18:36,010
OK.

301
00:18:36,010 --> 00:18:39,520
So what shall I start with here?

302
00:18:39,520 --> 00:18:41,410
Well, let me give a hint.

303
00:18:41,410 --> 00:18:43,360
That was a great start.

304
00:18:43,360 --> 00:18:47,620
My V is still an
orthogonal matrix.

305
00:18:47,620 --> 00:18:50,170
V transpose is still
an orthogonal matrix.

306
00:18:50,170 --> 00:18:52,780
I'll invert it.

307
00:18:52,780 --> 00:18:57,280
At the end, the
U was no problem.

308
00:18:57,280 --> 00:19:00,170
All the problems are in sigma.

309
00:19:00,170 --> 00:19:04,640
And sigma, remember, sigma--

310
00:19:04,640 --> 00:19:06,320
so it's rectangular.

311
00:19:06,320 --> 00:19:10,070
Maybe I'll make
it wide, two wide.

312
00:19:10,070 --> 00:19:13,970
And maybe I'll only give
it two non-zeros, and then

313
00:19:13,970 --> 00:19:16,260
all the rest.

314
00:19:16,260 --> 00:19:22,400
So the rank of my matrix
A is 2, but the m and n

315
00:19:22,400 --> 00:19:23,990
are bigger than 2.

316
00:19:23,990 --> 00:19:26,660
It's just got two
independent columns,

317
00:19:26,660 --> 00:19:30,980
and then it's just sort
of totally singular.

318
00:19:30,980 --> 00:19:31,550
OK.

319
00:19:31,550 --> 00:19:35,630
So my question is, what
am I going to put there?

320
00:19:35,630 --> 00:19:37,850
And I've described it
one way, but now I'm

321
00:19:37,850 --> 00:19:39,500
going to describe
it another way.

322
00:19:39,500 --> 00:19:42,940
Well, let me just say,
what I'll put there

323
00:19:42,940 --> 00:19:46,390
is the pseudo inverse of sigma.

324
00:19:46,390 --> 00:19:50,680
I can't put sigma inverse
using that symbol,

325
00:19:50,680 --> 00:19:53,080
because there is no such thing.

326
00:19:53,080 --> 00:19:55,220
With this, I can't invert it.

327
00:19:55,220 --> 00:19:59,390
So that's the best I can do.

328
00:19:59,390 --> 00:20:03,330
So I'm almost done, but to
finish, I have to tell you,

329
00:20:03,330 --> 00:20:06,880
what is this thing?

330
00:20:06,880 --> 00:20:09,250
So sigma plus.

331
00:20:09,250 --> 00:20:11,860
I'm now going to
tell you sigma plus.

332
00:20:11,860 --> 00:20:15,490
And then that's what should
sit there in the middle.

333
00:20:15,490 --> 00:20:18,400
So if sigma is this
diagonal matrix

334
00:20:18,400 --> 00:20:22,690
which quits after two sigmas,
what should sigma plus be?

335
00:20:22,690 --> 00:20:28,180
Well, first of all, it should
be rectangular the other way.

336
00:20:28,180 --> 00:20:33,130
If this was m by n,
n columns and m rows,

337
00:20:33,130 --> 00:20:39,070
now I want to have n
rows and m columns.

338
00:20:39,070 --> 00:20:42,750
And yeah, here's the question.

339
00:20:42,750 --> 00:20:44,760
What's the best inverse
you could come up

340
00:20:44,760 --> 00:20:47,270
with for that sigma?

341
00:20:47,270 --> 00:20:51,230
I mean, if somebody
independent of 18.065,

342
00:20:51,230 --> 00:20:55,850
if somebody asks you, do your
best to invert that matrix,

343
00:20:55,850 --> 00:21:00,320
I think we'd all
agree it is, yeah.

344
00:21:00,320 --> 00:21:04,580
One over the sigma
1 would come there.

345
00:21:04,580 --> 00:21:08,180
And 1 over sigma
2, the nonzeros.

346
00:21:08,180 --> 00:21:10,800
And then?

347
00:21:10,800 --> 00:21:12,290
Zeros.

348
00:21:12,290 --> 00:21:15,630
Just the way up there, when
we didn't know what to do,

349
00:21:15,630 --> 00:21:17,860
when there was
nothing good to do.

350
00:21:17,860 --> 00:21:20,400
Zero was the right answer.

351
00:21:20,400 --> 00:21:23,310
So this is all zeros.

352
00:21:23,310 --> 00:21:25,470
Of course, it's
rectangular the other way.

353
00:21:25,470 --> 00:21:28,200
But do you see
that if I multiply

354
00:21:28,200 --> 00:21:31,260
sigma plus times
sigma, if I multiply

355
00:21:31,260 --> 00:21:35,250
the pseudo inverse times
the matrix, what do I

356
00:21:35,250 --> 00:21:39,170
get if I multiply that by that?

357
00:21:39,170 --> 00:21:41,440
What does that
multiplication produce?

358
00:21:41,440 --> 00:21:42,950
Can you describe the--

359
00:21:42,950 --> 00:21:46,100
well, or when you tell
me what it looks like,

360
00:21:46,100 --> 00:21:47,170
I'll write it down.

361
00:21:47,170 --> 00:21:50,740
So what is sigma
plus times sigma?

362
00:21:50,740 --> 00:21:54,250
If sigma is a diagonal,
sigma plus is a diagonal,

363
00:21:54,250 --> 00:21:59,510
and they both quit
after two guys.

364
00:21:59,510 --> 00:22:01,990
What do I have?

365
00:22:01,990 --> 00:22:03,300
One?

366
00:22:03,300 --> 00:22:07,480
Because sigma 1 times
1 over sigma 1 is a 1.

367
00:22:07,480 --> 00:22:11,260
And the next guy is a 1.

368
00:22:11,260 --> 00:22:14,320
And the rest are all zeros.

369
00:22:14,320 --> 00:22:16,360
That's right.

370
00:22:16,360 --> 00:22:18,250
That's the best I could do.

371
00:22:18,250 --> 00:22:22,250
The rank was only two, so
I couldn't get anywhere.

372
00:22:22,250 --> 00:22:26,020
So that tells you
what sigma plus is.
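
The same construction in code, as a sketch assuming NumPy and hypothetical values: sigma is 4 by 5 with two nonzero singular values, 3 and 2. Sigma plus is 5 by 4, holds 1/3 and 1/2 where sigma had 3 and 2, and is zero elsewhere.

```python
import numpy as np

m, n = 4, 5                      # Sigma is m by n, rank 2
sigma = np.array([3.0, 2.0])     # hypothetical nonzero singular values
Sigma = np.zeros((m, n))
Sigma[0, 0], Sigma[1, 1] = sigma

# Sigma plus is n by m: invert the nonzero sigmas, keep the zeros as zeros.
Sigma_plus = np.zeros((n, m))
Sigma_plus[0, 0], Sigma_plus[1, 1] = 1.0 / sigma

# Sigma plus times Sigma: ones in the first two diagonal positions, zeros elsewhere.
print(np.allclose(Sigma_plus @ Sigma, np.diag([1.0, 1.0, 0.0, 0.0, 0.0])))  # True
print(np.allclose(Sigma_plus, np.linalg.pinv(Sigma)))                       # True
```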

373
00:22:26,020 --> 00:22:27,320
OK.

374
00:22:27,320 --> 00:22:30,210
So I described
the pseudo inverse

375
00:22:30,210 --> 00:22:33,020
then with a picture
of spaces, and then

376
00:22:33,020 --> 00:22:35,750
with a formula of matrices.
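
Putting the pieces together, A plus = V times sigma plus times U transpose. Below is a sketch of that formula, not a production routine: `pseudo_inverse` is a hypothetical helper name, the relative cutoff `tol` for treating a singular value as zero is an assumption, and the result is compared against NumPy's `np.linalg.pinv`.

```python
import numpy as np

def pseudo_inverse(A, tol=1e-10):
    """Hypothetical helper: A+ = V Sigma+ U^T from the full SVD of A."""
    U, s, Vt = np.linalg.svd(A)          # U: m x m, s: min(m, n) values, Vt: n x n
    m, n = A.shape
    Sigma_plus = np.zeros((n, m))        # rectangular the other way
    for i, sigma in enumerate(s):
        if sigma > tol * s[0]:           # invert only the nonzero sigmas
            Sigma_plus[i, i] = 1.0 / sigma
    return Vt.T @ Sigma_plus @ U.T

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])          # hypothetical 4 by 3 example, rank 2
A_plus = pseudo_inverse(A)

print(A_plus.shape)                                          # (3, 4)
print(np.allclose(A_plus, np.linalg.pinv(A, rcond=1e-10)))   # True
```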

377
00:22:35,750 --> 00:22:39,960
And now I want to use
it in least squares.

378
00:22:39,960 --> 00:22:43,820
So now I'm going to say what
is the least squares problem.

379
00:22:43,820 --> 00:22:50,450
And the first way to solve
it will be to involve--

380
00:22:50,450 --> 00:22:52,550
A plus will give the solution.
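
A minimal sketch of that first approach, assuming NumPy and a hypothetical right-hand side b that is not in the column space of A. Here x plus = A plus times b is compared with `np.linalg.lstsq`, which also returns the minimum-norm least squares solution. The error b minus A x plus comes out orthogonal to the column space.

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])          # 4 by 3, rank 2: no true inverse
b = np.array([1.0, 2.0, 0.0, 1.0])       # hypothetical data, not in C(A)

x_plus = np.linalg.pinv(A, rcond=1e-10) @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=1e-10)

print(np.allclose(x_plus, x_lstsq))      # True: same minimum-norm solution
residual = b - A @ x_plus
print(np.allclose(A.T @ residual, 0))    # True: error is orthogonal to C(A)
```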

381
00:22:52,550 --> 00:22:53,050
OK.

382
00:22:53,050 --> 00:22:56,420
So what is the least
squares problem?

383
00:22:56,420 --> 00:22:58,010
Let me put it here.

384
00:23:05,050 --> 00:23:09,190
OK, the least squares problem
is simply, you have an equation,

385
00:23:09,190 --> 00:23:10,440
Ax equals b.

386
00:23:14,190 --> 00:23:16,730
But A is not invertible.

387
00:23:16,730 --> 00:23:19,190
So you can't solve it.

388
00:23:19,190 --> 00:23:21,290
Of course, for which--

389
00:23:21,290 --> 00:23:24,740
yeah, you could solve
it for certain b's.

390
00:23:24,740 --> 00:23:27,380
If b is in the
column space of A,

391
00:23:27,380 --> 00:23:30,620
then just by the
meaning of column space,

392
00:23:30,620 --> 00:23:32,270
this has a solution.

393
00:23:32,270 --> 00:23:36,440
The vectors in the column space
are the guys that you can get.

394
00:23:36,440 --> 00:23:39,540
But the vectors in the
orthogonal space you cannot

395
00:23:39,540 --> 00:23:40,040
get.

396
00:23:40,040 --> 00:23:42,230
All the rest of the
vectors you cannot get.

397
00:23:42,230 --> 00:23:50,090
So suppose this is like so,
but always A is m by n rank r.

398
00:23:54,340 --> 00:24:03,820
And then we get A inverse
when m equals n equals r.

399
00:24:03,820 --> 00:24:06,190
That's the invertible case.

400
00:24:06,190 --> 00:24:07,730
OK.

401
00:24:07,730 --> 00:24:12,960
What do we do with a
system of equations

402
00:24:12,960 --> 00:24:15,600
when we can't solve it?

403
00:24:15,600 --> 00:24:19,050
This is probably the main
application in 18.06.

404
00:24:19,050 --> 00:24:25,560
So you've seen this
problem before.

405
00:24:25,560 --> 00:24:29,180
What do we do if Ax
equal b has no solution?

406
00:24:29,180 --> 00:24:33,210
So typically, b would be
a vector of measurements,

407
00:24:33,210 --> 00:24:39,480
like we're tracking a satellite,
and we get some measurements.

408
00:24:39,480 --> 00:24:44,310
But often we get too
many measurements.

409
00:24:44,310 --> 00:24:46,800
And of course, there's
a little noise in them.

410
00:24:46,800 --> 00:24:50,790
And a little noise means that
we can't solve the equations.

411
00:24:50,790 --> 00:24:54,900
The case everybody
knows best

412
00:24:54,900 --> 00:25:02,340
is where this equation
expresses a straight line

413
00:25:02,340 --> 00:25:03,860
going through the data points.

414
00:25:03,860 --> 00:25:07,170
So the famous example
of least squares

415
00:25:07,170 --> 00:25:21,140
is fit a straight line
to the b's, to b1, b2.

416
00:25:21,140 --> 00:25:22,985
We've got m measurements.

417
00:25:26,100 --> 00:25:28,450
We've got m measurements.

418
00:25:28,450 --> 00:25:32,170
The physics or the
mechanics of the problem

419
00:25:32,170 --> 00:25:34,000
is pretty well linear.

420
00:25:34,000 --> 00:25:36,650
But of course, there's noise.

421
00:25:36,650 --> 00:25:41,480
And a straight line only
has two degrees of freedom.

422
00:25:41,480 --> 00:25:44,870
So we're going to have only
two columns in our matrix.

423
00:25:44,870 --> 00:25:53,190
A will be only two
columns, with many rows.

424
00:25:53,190 --> 00:25:54,900
Highly rectangular.

425
00:25:54,900 --> 00:25:56,490
So fit a straight line.

426
00:25:56,490 --> 00:26:02,400
Let me call that line Cx plus
D. Say this is the x direction.

427
00:26:02,400 --> 00:26:05,880
This is the b's direction.

428
00:26:05,880 --> 00:26:08,910
And we've got a whole
bunch of data points.

429
00:26:08,910 --> 00:26:10,720
And they're not on a line.

430
00:26:10,720 --> 00:26:11,730
Or they are on the line.

431
00:26:15,360 --> 00:26:18,780
Suppose those did lie on a line.

432
00:26:18,780 --> 00:26:22,610
What would that tell
me about Ax equal b?

433
00:26:22,610 --> 00:26:25,540
I haven't said
everything I need to,

434
00:26:25,540 --> 00:26:28,960
but maybe the insight
is what I'm after here.

435
00:26:28,960 --> 00:26:32,940
If my points are
right on the line--

436
00:26:32,940 --> 00:26:37,590
so there is a straight
line through them--

437
00:26:37,590 --> 00:26:39,830
the unknowns here-- so let me--

438
00:26:39,830 --> 00:26:44,930
so Ax-- the unknowns
here are C and D.

439
00:26:44,930 --> 00:26:48,845
And the right hand side
is all my measurements.

440
00:26:51,820 --> 00:26:54,380
OK.

441
00:26:54,380 --> 00:26:57,920
Suppose-- without my
drawing a picture--

442
00:26:57,920 --> 00:27:01,300
suppose these points
are on the line.

443
00:27:01,300 --> 00:27:05,000
Here's the different x's,
the measurement times.

444
00:27:05,000 --> 00:27:06,680
Here are the different
measurements.

445
00:27:09,250 --> 00:27:10,870
But if they're on
a line, what does

446
00:27:10,870 --> 00:27:16,060
that tell me about my
linear system, Ax equal b?

447
00:27:16,060 --> 00:27:19,800
It has a solution.

448
00:27:19,800 --> 00:27:22,740
Being on a line means
everything's perfect.

449
00:27:22,740 --> 00:27:24,360
There is a solution.

450
00:27:24,360 --> 00:27:27,210
But will there
usually be a solution?

451
00:27:27,210 --> 00:27:28,080
Certainly not.

452
00:27:28,080 --> 00:27:35,980
If I have only two parameters,
two unknowns, two columns here,

453
00:27:35,980 --> 00:27:38,350
the rank is going to be two.

454
00:27:38,350 --> 00:27:44,440
And here I'm trying to hit
any noisy set of measurements.

455
00:27:44,440 --> 00:27:47,800
So of course, in general the
picture will look like that.

456
00:27:47,800 --> 00:27:50,500
And I'm going to look
for the best C and D.

457
00:27:50,500 --> 00:28:01,840
So I'll call it Cx
plus D. Yeah, right.

458
00:28:01,840 --> 00:28:03,250
Sorry.

459
00:28:03,250 --> 00:28:05,760
That's my line.

460
00:28:05,760 --> 00:28:07,570
So those are my equations.

461
00:28:10,220 --> 00:28:14,200
Sorry, I often
write it C plus Dx.

462
00:28:14,200 --> 00:28:16,810
Do you mind if I put
the constant term

463
00:28:16,810 --> 00:28:21,770
first in the highly
difficult equation here

464
00:28:21,770 --> 00:28:22,780
for a straight line?

465
00:28:26,790 --> 00:28:29,340
So let me tell you what I'm--

466
00:28:29,340 --> 00:28:32,565
so these are the points where
you have a measurement--

467
00:28:32,565 --> 00:28:35,820
x1, x2, up to xm.

468
00:28:35,820 --> 00:28:39,410
And these are the actual
measurements, b1 up to bm,

469
00:28:39,410 --> 00:28:40,770
let's say.

470
00:28:40,770 --> 00:28:43,740
And then my equations are--

471
00:28:43,740 --> 00:28:46,530
I just want to set
up a matrix here.

472
00:28:46,530 --> 00:28:49,420
I just want to
set up the matrix.

473
00:28:49,420 --> 00:28:55,160
So I want C to get multiplied
by ones every time.

474
00:28:55,160 --> 00:29:02,010
And I want D to get multiplied
by these x's-- x1, x2, x3,

475
00:29:02,010 --> 00:29:05,790
to xm, the measurement places.

476
00:29:05,790 --> 00:29:07,740
And those are the measurements.

477
00:29:07,740 --> 00:29:10,300
Anyway.

478
00:29:10,300 --> 00:29:13,165
And my problem is,
this has no solution.
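
A minimal NumPy sketch of the setup just described. It assumes a small hypothetical data set (measurement places x = 0, 1, 2 with measurements b = 6, 0, 0); the names `x`, `b`, `A` are illustrative only. The column of ones multiplies C and the column of x's multiplies D. A rank check then suggests why Ax = b has no solution: appending b as an extra column raises the rank, so b is not a combination of the columns of A.

```python
import numpy as np

# Hypothetical data (illustration only): measurement places and measurements
x = np.array([0.0, 1.0, 2.0])
b = np.array([6.0, 0.0, 0.0])

# Model b ≈ C + D x: C multiplies a column of ones, D multiplies the x's
A = np.column_stack([np.ones_like(x), x])  # shape (m, 2): two columns, many rows

# If b were in the column space, appending it as a column would not raise the rank
rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
print(A.shape, rank_A, rank_Ab)  # (3, 2) 2 3 -> Ax = b has no solution
```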

479
00:29:15,850 --> 00:29:17,770
So what do I do when
there's no solution?

480
00:29:20,710 --> 00:29:24,810
Well, I'll do what Gauss did.

481
00:29:24,810 --> 00:29:28,570
He was a good mathematician,
so I'll follow his advice.

482
00:29:28,570 --> 00:29:34,020
And I won't do it all
semester, as you know.

483
00:29:34,020 --> 00:29:40,330
But Gauss's advice
was, minimize--

484
00:29:40,330 --> 00:29:43,200
I'll blame it on Gauss--

485
00:29:43,200 --> 00:29:53,550
the distance between Ax and b
squared, the L2 norm squared,

486
00:29:53,550 --> 00:30:02,030
which is just (Ax minus b)
transpose (Ax minus b).

487
00:30:02,030 --> 00:30:04,190
It's a quadratic.

488
00:30:04,190 --> 00:30:10,050
And minimizing it gives me a
system of linear equations.

489
00:30:10,050 --> 00:30:12,330
So in the end, they
will have a solution.

490
00:30:12,330 --> 00:30:14,660
So that's the whole
point of least squares.

491
00:30:14,660 --> 00:30:20,080
We have an unsolvable
problem, no solution.

492
00:30:20,080 --> 00:30:25,330
We follow Gauss's advice
to get the best we can.

493
00:30:25,330 --> 00:30:28,840
And that does produce an answer.

494
00:30:28,840 --> 00:30:34,690
So this is-- if I multiply
this out, it's x transpose,

495
00:30:34,690 --> 00:30:36,610
A transpose, Ax.

496
00:30:36,610 --> 00:30:39,830
That comes from
the squared term.

497
00:30:39,830 --> 00:30:42,100
And then I have probably these--

498
00:30:42,100 --> 00:30:48,150
actually, probably I'll
get two of those, and then

499
00:30:48,150 --> 00:30:52,240
a constant term that
has derivative 0

500
00:30:52,240 --> 00:30:54,100
so it doesn't enter.
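
A quick numerical check of that expansion, assuming NumPy and the hypothetical data x = 0, 1, 2, b = 6, 0, 0 for a line C + Dx. The loss (Ax − b)ᵀ(Ax − b) should equal xᵀAᵀAx − 2xᵀAᵀb + bᵀb: the "two of those" cross terms combine into −2xᵀAᵀb, and bᵀb is the constant that drops out when we differentiate. The trial point `x_try` is arbitrary.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # columns: ones, x's (hypothetical)
b = np.array([6.0, 0.0, 0.0])                       # hypothetical measurements

def loss(x):
    """Squared L2 error ||Ax - b||^2 = (Ax - b)^T (Ax - b)."""
    r = A @ x - b
    return r @ r

def loss_expanded(x):
    """The same quadratic multiplied out: x^T A^T A x - 2 x^T A^T b + b^T b."""
    return x @ (A.T @ A) @ x - 2 * x @ (A.T @ b) + b @ b

x_try = np.array([1.0, 2.0])  # arbitrary trial point
print(loss(x_try), loss_expanded(x_try))  # 59.0 59.0
```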

501
00:30:54,100 --> 00:30:56,590
So this is what I'm minimizing.

502
00:30:56,590 --> 00:30:59,800
This is the loss function.

503
00:30:59,800 --> 00:31:01,540
And it leads to--

504
00:31:01,540 --> 00:31:07,390
let's just jump to the key here.

505
00:31:07,390 --> 00:31:11,920
What equation do I
get when I look for--

506
00:31:11,920 --> 00:31:19,160
what equation is solved
by the best x, the best x?

507
00:31:19,160 --> 00:31:23,610
The best x solves the famous--

508
00:31:23,610 --> 00:31:28,860
this is regression in
statistics, linear regression.

509
00:31:31,760 --> 00:31:36,240
It's one of the main
computations in statistics,

510
00:31:36,240 --> 00:31:38,700
not of course just for
straight line fits,

511
00:31:38,700 --> 00:31:42,930
but for any system Ax equal b.

512
00:31:42,930 --> 00:31:45,300
That will lead to--

513
00:31:45,300 --> 00:31:48,660
this minimum will lead
to a system of equations

514
00:31:48,660 --> 00:31:50,760
that I'm going to
put a box around,

515
00:31:50,760 --> 00:31:53,340
because it's so fundamental.

516
00:31:53,340 --> 00:31:58,990
And are you willing to tell
me what that equation is?

517
00:31:58,990 --> 00:31:59,750
Yes, thanks.

518
00:31:59,750 --> 00:32:00,750
AUDIENCE: A transpose A.

519
00:32:00,750 --> 00:32:04,240
PROFESSOR: A transpose A is
going to come from there--

520
00:32:04,240 --> 00:32:06,270
you see it--

521
00:32:06,270 --> 00:32:13,080
times the best x
equals A transpose b.
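
A sketch of solving the boxed normal equations AᵀA x̂ = Aᵀb with NumPy, assuming hypothetical data x = 0, 1, 2 and b = 6, 0, 0 for the line C + Dx. For these numbers AᵀA is the 2 by 2 matrix [[3, 3], [3, 5]] and Aᵀb is [6, 0], so the best line works out to C = 5, D = −3.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # columns: ones, x's (hypothetical)
b = np.array([6.0, 0.0, 0.0])                       # hypothetical measurements

# Normal equations: (A^T A) x_hat = A^T b
AtA = A.T @ A   # [[3, 3], [3, 5]]
Atb = A.T @ b   # [6, 0]
x_hat = np.linalg.solve(AtA, Atb)
C, D = x_hat
print(C, D)  # approximately 5.0 and -3.0: best line b ≈ 5 - 3x
```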

522
00:32:17,980 --> 00:32:20,610
That gives the minimum.

523
00:32:20,610 --> 00:32:23,610
Let me forego checking that.

524
00:32:23,610 --> 00:32:27,520
You see that the quadratic
term has the matrix in it.

525
00:32:27,520 --> 00:32:29,920
So its derivative--

526
00:32:29,920 --> 00:32:34,980
Maybe the derivative of
this is 2 A transpose Ax,

527
00:32:34,980 --> 00:32:38,100
and then the 2 cancels that 2.

528
00:32:38,100 --> 00:32:43,650
And this could also be written
as x transpose A transpose b.

529
00:32:43,650 --> 00:32:47,970
So it's x transpose
against A transpose b.

530
00:32:47,970 --> 00:32:48,960
That's linear.

531
00:32:48,960 --> 00:32:53,990
So when I take the derivative,
it's that constant.
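
To make the derivative step concrete, here is a hedged NumPy check, assuming hypothetical data x = 0, 1, 2 and b = 6, 0, 0. The gradient of ||Ax − b||² is 2AᵀAx − 2Aᵀb; a central-difference approximation (the helper `numerical_grad` is illustrative, not a library function) should match it, and the gradient vanishes exactly when AᵀA x = Aᵀb.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

def loss(x):
    r = A @ x - b
    return r @ r

def grad(x):
    # Derivative of x^T A^T A x - 2 x^T A^T b + b^T b; the constant b^T b contributes nothing
    return 2 * A.T @ A @ x - 2 * A.T @ b

def numerical_grad(f, x, h=1e-6):
    # Central differences, one coordinate at a time
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (f(x + step) - f(x - step)) / (2 * h)
    return g

x_try = np.array([1.0, 2.0])  # arbitrary trial point
print(grad(x_try), numerical_grad(loss, x_try))  # both approximately [6, 26]

# At the least squares solution the gradient is zero: that is A^T A x = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(grad(x_hat))  # essentially [0, 0]
```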

532
00:32:53,990 --> 00:32:55,490
That's pretty fast.

533
00:32:55,490 --> 00:33:04,090
18.06 would patiently
derive that.

534
00:33:04,090 --> 00:33:08,640
But here, let me give
you the picture that

535
00:33:08,640 --> 00:33:12,500
goes with it, the geometry.

536
00:33:12,500 --> 00:33:17,950
So we have the problem.

537
00:33:17,950 --> 00:33:19,920
No solution.

538
00:33:19,920 --> 00:33:24,920
We have Gauss's best answer.

539
00:33:24,920 --> 00:33:29,210
Minimize the 2
norm of the error.

540
00:33:29,210 --> 00:33:33,410
We have the conclusion,
the matrix equation that we get.

541
00:33:33,410 --> 00:33:36,170
And now I want to draw a
picture that goes with it.

542
00:33:36,170 --> 00:33:37,010
OK.

543
00:33:37,010 --> 00:33:38,450
So here is a picture.

544
00:33:44,490 --> 00:33:48,630
I want to have the column space
of A there in that picture.

545
00:33:48,630 --> 00:33:54,060
Of course, the 0 vector's
in the column space of A.

546
00:33:54,060 --> 00:34:01,070
So this is all
possible vectors Ax.

547
00:34:05,560 --> 00:34:06,100
Right?

548
00:34:06,100 --> 00:34:11,630
You're never forgetting that the
column space is all the Ax's.

549
00:34:11,630 --> 00:34:15,820
Now, I've got to put
b in the picture.

550
00:34:15,820 --> 00:34:20,770
So where does this vector
b-- so I'm trying to solve Ax

551
00:34:20,770 --> 00:34:23,350
equal b, but failing.

552
00:34:23,350 --> 00:34:28,570
So if I draw b in this
picture, how do I draw b?

553
00:34:28,570 --> 00:34:29,830
Where do I put it?

554
00:34:29,830 --> 00:34:32,889
Shall I put it in
the column space?

555
00:34:32,889 --> 00:34:34,120
No.

556
00:34:34,120 --> 00:34:37,389
The whole point is, it's
not in the column space.

557
00:34:37,389 --> 00:34:39,580
It's not an Ax.

558
00:34:39,580 --> 00:34:43,000
It's out there somewhere, b.

559
00:34:43,000 --> 00:34:45,350
OK.

560
00:34:45,350 --> 00:34:47,960
And then what's
the geometry that

561
00:34:47,960 --> 00:34:51,860
goes with least squares
and the normal equations

562
00:34:51,860 --> 00:34:57,710
and Gauss's suggestion
to minimize the error?

563
00:34:57,710 --> 00:35:04,460
Where will Ax be, the
best Ax that I can do?

564
00:35:04,460 --> 00:35:12,350
So what Gauss has
produced is an x here.

565
00:35:12,350 --> 00:35:14,390
You can't find an exact x.

566
00:35:14,390 --> 00:35:17,040
He'll do the best he can.

567
00:35:17,040 --> 00:35:20,550
And we're calling
that guy x hat.

568
00:35:20,550 --> 00:35:24,690
And this is the
algebra to find x hat.

569
00:35:24,690 --> 00:35:28,770
And now, where is
the picture here?

570
00:35:28,770 --> 00:35:31,910
Where is this
vector Ax hat, which

571
00:35:31,910 --> 00:35:35,750
is the best Ax we can get?

572
00:35:35,750 --> 00:35:39,320
So it has to be in
the column space,

573
00:35:39,320 --> 00:35:41,420
because it's A times something.

574
00:35:41,420 --> 00:35:44,210
And where is it in
the column space?

575
00:35:44,210 --> 00:35:48,330
It's the projection.

576
00:35:48,330 --> 00:35:50,640
That's Ax hat.

577
00:35:50,640 --> 00:35:54,540
And here is the error, which
you couldn't do anything about,

578
00:35:54,540 --> 00:35:56,040
b minus Ax hat.
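
The picture can be checked numerically. A minimal NumPy sketch, assuming hypothetical data x = 0, 1, 2 and b = 6, 0, 0: Ax̂ is the projection p of b onto the column space, and the error e = b − Ax̂ is perpendicular to every column, which is exactly the statement Aᵀe = 0, i.e. the normal equations. The projection matrix P = A(AᵀA)⁻¹Aᵀ gives the same p.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)
p = A @ x_hat   # A x_hat: the projection of b onto the column space of A
e = b - p       # the error we couldn't do anything about

print(p, e)      # approximately [5, 2, -1] and [1, -2, 1]
print(A.T @ e)   # approximately [0, 0]: e is perpendicular to both columns

# The same projection through the projection matrix P = A (A^T A)^{-1} A^T
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(P @ b, p), np.allclose(P @ P, P))  # True True
```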

579
00:35:58,970 --> 00:35:59,470
Yeah.

580
00:35:59,470 --> 00:36:02,170
So it's the projection, right.

581
00:36:02,170 --> 00:36:07,210
So all this is justifying the--

582
00:36:07,210 --> 00:36:13,030
so we're in the second
approach to least squares,

583
00:36:13,030 --> 00:36:15,940
solve the normal equations.

584
00:36:15,940 --> 00:36:18,730
Solve the normal equations.

585
00:36:18,730 --> 00:36:22,590
That would be the second
approach to least squares.

586
00:36:22,590 --> 00:36:30,800
And most examples, if they're
not very big or very difficult,

587
00:36:30,800 --> 00:36:33,550
you just create the
matrix A transpose A,

588
00:36:33,550 --> 00:36:39,130
and you call MATLAB and
solve that linear system.

589
00:36:39,130 --> 00:36:41,692
You create the matrix, you
create the right hand side,

590
00:36:41,692 --> 00:36:42,400
and you solve it.
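
A NumPy analogue of "create the matrix, create the right hand side, and solve", standing in for the MATLAB call. The data are synthetic (the line 2 + 0.5x plus Gaussian noise, fixed seed), chosen only for illustration. As a cross-check it also calls `np.linalg.lstsq`, NumPy's least squares routine, which does not form AᵀA; with independent columns the two answers should agree.

```python
import numpy as np

# Synthetic noisy measurements around a line (illustration only, not real data)
rng = np.random.default_rng(0)
m = 50
x = np.linspace(0.0, 10.0, m)
b = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=m)

A = np.column_stack([np.ones(m), x])  # m-by-2, independent columns

# Method two, "just do it": create A^T A and A^T b, then solve the 2-by-2 system
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library least squares solver, which works with A directly
x_lstsq, residual, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)

print(x_normal, x_lstsq)  # both close to the underlying [2.0, 0.5]
```

For small, well-conditioned problems like this one, forming AᵀA is cheap and accurate enough; the nearly singular case is where the difference between the two routes starts to matter.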

591
00:36:45,670 --> 00:36:51,330
So that's the ordinary,
run-of-the-mill least squares problem.

592
00:36:51,330 --> 00:36:54,150
Just do it.

593
00:36:54,150 --> 00:36:56,620
So that's method
two, just do it.

594
00:36:59,650 --> 00:37:02,590
What's method three?

595
00:37:02,590 --> 00:37:05,140
For the same-- we're talking
about the same problem here,

596
00:37:05,140 --> 00:37:10,950
but now I'm thinking it may
be a little more difficult.

597
00:37:10,950 --> 00:37:17,820
This matrix A transpose A
might be nearly singular.

598
00:37:17,820 --> 00:37:21,200
Gauss is assuming that--

599
00:37:21,200 --> 00:37:23,410
yeah, when did this work?

600
00:37:23,410 --> 00:37:24,890
When did this work?

601
00:37:24,890 --> 00:37:30,150
And it will continue to
work in the next three--

602
00:37:30,150 --> 00:37:38,110
this works, this is
good, assuming A

603
00:37:38,110 --> 00:37:41,295
has independent columns.

604
00:37:52,570 --> 00:37:54,810
Yeah, better just make clear.

605
00:37:54,810 --> 00:37:58,320
I'm claiming that when A has--

606
00:37:58,320 --> 00:38:00,570
so what's the reasoning?

607
00:38:00,570 --> 00:38:03,430
If A has independent columns--

608
00:38:03,430 --> 00:38:06,550
but maybe not enough
columns, like here--

609
00:38:06,550 --> 00:38:07,840
it's only got two columns.

610
00:38:07,840 --> 00:38:10,750
It's obviously not going to be
able to match every right-hand

611
00:38:10,750 --> 00:38:11,410
side.

612
00:38:11,410 --> 00:38:13,570
But it's got
independent columns.

613
00:38:13,570 --> 00:38:16,630
When A has independent
columns, then what can I

614
00:38:16,630 --> 00:38:17,815
say about this matrix?

615
00:38:21,800 --> 00:38:23,870
It's invertible.

616
00:38:23,870 --> 00:38:25,500
Gauss's plan works.

617
00:38:25,500 --> 00:38:29,790
If A has independent
columns, then this

618
00:38:29,790 --> 00:38:32,880
would be a linear algebra step.

619
00:38:32,880 --> 00:38:35,040
Then this will be invertible.

620
00:38:35,040 --> 00:38:37,230
You see the importance
of that step.

621
00:38:37,230 --> 00:38:38,910
If A has independent
columns, that

622
00:38:38,910 --> 00:38:41,550
means it has no null space.

623
00:38:41,550 --> 00:38:44,760
Only x equals 0 is
in the null space.

624
00:38:44,760 --> 00:38:48,000
Two independent
columns, but only two.

625
00:38:48,000 --> 00:38:52,140
So not enough to solve
systems, but independent.

626
00:38:52,140 --> 00:38:53,560
Then you're OK.

627
00:38:53,560 --> 00:38:55,590
This matrix is invertible.

628
00:38:55,590 --> 00:38:57,120
You can do what Gauss tells you.
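
One way to see the claim: if AᵀAx = 0, then xᵀAᵀAx = ||Ax||² = 0, so Ax = 0, and independent columns force x = 0; hence AᵀA is invertible. A NumPy sketch with two hypothetical matrices, one with independent columns and one whose second column is twice the first. The last line hints at why "nearly singular" matters: forming AᵀA squares the (2-norm) condition number.

```python
import numpy as np

# Independent columns (hypothetical): ones and the x's 0, 1, 2
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
# Dependent columns (hypothetical): the second column is 2 times the first
A_dep = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])

for name, M in [("independent", A), ("dependent", A_dep)]:
    G = M.T @ M
    print(name, "rank of A^T A:", np.linalg.matrix_rank(G))
# independent -> 2 (invertible), dependent -> 1 (singular)

# Forming A^T A squares the condition number of A
print(np.linalg.cond(A) ** 2, np.linalg.cond(A.T @ A))
```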

629
00:39:00,200 --> 00:39:03,530
But we're prepared now--

630
00:39:03,530 --> 00:39:07,790
we have to think, OK.

631
00:39:07,790 --> 00:39:11,930
So what do I really want to do?

632
00:39:11,930 --> 00:39:19,410
I want to connect this Gauss's
solution to the pseudo inverse.

633
00:39:19,410 --> 00:39:22,860
Because I'm claiming they
both give the same result.

634
00:39:22,860 --> 00:39:29,065
The pseudo inverse will apply.

635
00:39:31,910 --> 00:39:35,790
But we have something--

636
00:39:35,790 --> 00:39:37,280
A is not invertible.

637
00:39:37,280 --> 00:39:40,310
Just keep remembering
this matrix.

638
00:39:40,310 --> 00:39:41,990
It's not invertible.

639
00:39:41,990 --> 00:39:47,850
But it has got
independent columns.

640
00:39:47,850 --> 00:39:50,143
What am I saying there?

641
00:39:50,143 --> 00:39:51,435
Just going back to the picture.

642
00:39:56,030 --> 00:40:01,040
If A is a matrix with
independent columns,

643
00:40:01,040 --> 00:40:03,370
what space disappears
in this picture?

644
00:40:06,460 --> 00:40:08,690
The null space goes away.

645
00:40:08,690 --> 00:40:10,450
So the picture is simpler.

646
00:40:10,450 --> 00:40:16,160
But it's still the null
space of A transpose.

647
00:40:16,160 --> 00:40:18,800
This is still pretty
big, because I only

648
00:40:18,800 --> 00:40:21,530
had two columns and
a whole lot of rows.

649
00:40:21,530 --> 00:40:24,530
And that's going to
be reflected here.

650
00:40:24,530 --> 00:40:28,520
So what am I trying to say?

651
00:40:28,520 --> 00:40:31,250
I'm trying to say that
this answer is the same

652
00:40:31,250 --> 00:40:33,740
as the pseudo inverse answer.

653
00:40:33,740 --> 00:40:36,590
We could possibly
even check that point.

654
00:40:36,590 --> 00:40:38,210
Let me write it down first.

655
00:40:40,900 --> 00:40:52,160
I claim that the
answer A plus b is

656
00:40:52,160 --> 00:40:58,870
the same as the answer coming
from here, A transpose A,

657
00:40:58,870 --> 00:41:10,710
inverse A transpose b, when
I guess the null space is 0,

658
00:41:10,710 --> 00:41:15,740
the rank is all of n,
whatever you like to say.

659
00:41:15,740 --> 00:41:23,970
I believe that method one, the
pseudo inverse, written as one quick formula--

660
00:41:23,970 --> 00:41:32,970
so you remember that this was V
sigma plus U transpose, right?

661
00:41:32,970 --> 00:41:35,550
That's what A plus was.

662
00:41:35,550 --> 00:41:37,250
That this should
agree with this.

663
00:41:43,750 --> 00:41:49,820
I believe those are the same
when the null space isn't

664
00:41:49,820 --> 00:41:51,380
in the picture.
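
A hedged NumPy check of the claim, assuming the hypothetical 3 by 2 matrix with columns of ones and x = 0, 1, 2 (rank n = 2). Building A⁺ = VΣ⁺Uᵀ from the reduced SVD, where every singular value is nonzero so Σ⁺ is just diag(1/σ), should give the same matrix as Gauss's (AᵀA)⁻¹Aᵀ, and both should match `np.linalg.pinv`.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # rank n = 2
b = np.array([6.0, 0.0, 0.0])                       # hypothetical measurements

# Reduced SVD: A = U diag(s) V^T with U (m x n), s (n,), Vt (n x n)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A^+ = V Sigma^+ U^T; with rank n every singular value is nonzero, so Sigma^+ = diag(1/s)
A_plus_svd = Vt.T @ np.diag(1.0 / s) @ U.T

# Gauss's answer written as a matrix: (A^T A)^{-1} A^T
A_plus_gauss = np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(A_plus_svd, A_plus_gauss))       # True
print(np.allclose(A_plus_svd, np.linalg.pinv(A)))  # True
print(A_plus_svd @ b)                              # approximately [5, -3]
```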

665
00:41:51,380 --> 00:41:54,920
So the fact that the null
space is just a 0 vector

666
00:41:54,920 --> 00:41:58,700
means that this
inverse does exist.

667
00:41:58,700 --> 00:42:01,400
So this inverse exists.

668
00:42:01,400 --> 00:42:07,970
But A A transpose
is not invertible.

669
00:42:07,970 --> 00:42:09,410
Right?

670
00:42:09,410 --> 00:42:12,200
No inverse.

671
00:42:12,200 --> 00:42:18,170
Because A A transpose
would be coming--

672
00:42:18,170 --> 00:42:20,750
all this is the null
space of A transpose.

673
00:42:20,750 --> 00:42:22,790
So A A transpose is
not invertible.

674
00:42:26,390 --> 00:42:30,950
But A transpose A is invertible.
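
A small NumPy sketch of the contrast, assuming the hypothetical 3 by 2 matrix with columns of ones and x = 0, 1, 2. AᵀA is n by n with rank n, so it is invertible; A Aᵀ is m by m but still only rank n, so it is singular. The vector `w` (chosen by hand for this A) lies in the null space of Aᵀ, and therefore also in the null space of A Aᵀ.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # m = 3 rows, n = 2 independent columns
m, n = A.shape

AtA = A.T @ A   # n-by-n
AAt = A @ A.T   # m-by-m

print(np.linalg.matrix_rank(AtA), n)   # 2 2 -> A^T A is invertible
print(np.linalg.matrix_rank(AAt), m)   # 2 3 -> A A^T is singular

# A vector in the null space of A^T is also in the null space of A A^T
w = np.array([1.0, -2.0, 1.0])  # A^T w = 0 for this particular A
print(A.T @ w, AAt @ w)         # both are zero vectors
```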

675
00:42:30,950 --> 00:42:34,090
How would you check that?

676
00:42:34,090 --> 00:42:36,610
You see what I'm--

677
00:42:36,610 --> 00:42:40,330
it's taken pretty
much the whole hour

678
00:42:40,330 --> 00:42:46,930
to get a picture of the
geometry of the pseudo inverse.

679
00:42:46,930 --> 00:42:50,790
So this is the pseudo inverse.

680
00:42:50,790 --> 00:42:57,090
And this is-- that
matrix there, it's

681
00:42:57,090 --> 00:43:00,090
really doing its best
to be the inverse.

682
00:43:00,090 --> 00:43:03,450
In fact, everybody here
is just doing their best

683
00:43:03,450 --> 00:43:04,950
to be the inverse.

684
00:43:04,950 --> 00:43:07,680
Now, how well is
this-- how close

685
00:43:07,680 --> 00:43:09,420
is that to being the inverse?

686
00:43:09,420 --> 00:43:11,820
Can I just ask you about
that, and then I'll

687
00:43:11,820 --> 00:43:15,870
make this connection, and
then we're out of time.

688
00:43:15,870 --> 00:43:18,600
How close is that to
being the inverse of A?

689
00:43:21,600 --> 00:43:25,660
Suppose I multiply that
by A. What do I get?

690
00:43:25,660 --> 00:43:26,835
So just notice.

691
00:43:30,030 --> 00:43:38,110
If I multiply that
by A, what do I get?

692
00:43:38,110 --> 00:43:40,170
I get, yeah?

693
00:43:40,170 --> 00:43:45,480
I get I. Terrific.

694
00:43:45,480 --> 00:43:47,340
But don't be
deceived into thinking

695
00:43:47,340 --> 00:43:51,370
that this is the inverse of
A. It worked on the left side,

696
00:43:51,370 --> 00:43:54,670
but it's not going to be
good on the right hand side.

697
00:43:54,670 --> 00:44:06,640
So if I multiply A by this
guy in that direction,

698
00:44:06,640 --> 00:44:09,730
I'll get as close to the
identity as I can come,

699
00:44:09,730 --> 00:44:12,250
but I won't get the
identity that way.
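
A NumPy sketch of the one-sided inverse, assuming the hypothetical 3 by 2 matrix with columns of ones and x = 0, 1, 2. Multiplying (AᵀA)⁻¹Aᵀ on the left of A gives the 2 by 2 identity, but on the right it gives a 3 by 3 matrix of rank 2, which cannot be the identity. That right-hand product is as close as it can come: it is the projection onto the column space (symmetric, and equal to its own square).

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
A_plus = np.linalg.inv(A.T @ A) @ A.T   # left inverse when A has independent columns

left = A_plus @ A    # 2-by-2
right = A @ A_plus   # 3-by-3

print(np.allclose(left, np.eye(2)))    # True: a one-sided inverse
print(np.allclose(right, np.eye(3)))   # False: rank 2 can't give the 3-by-3 identity
# The right-hand product is the projection onto the column space of A
print(np.allclose(right @ right, right), np.allclose(right, right.T))  # True True
```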

700
00:44:12,250 --> 00:44:14,455
So this is just a
little box to say--

701
00:44:17,090 --> 00:44:19,420
so what's the point I'm making?

702
00:44:19,420 --> 00:44:23,170
I'm claiming that this
is the pseudo inverse.

703
00:44:23,170 --> 00:44:25,420
Whatever.

704
00:44:25,420 --> 00:44:26,880
Whatever these spaces.

705
00:44:26,880 --> 00:44:30,580
The rank could be
tiny, just one.

706
00:44:30,580 --> 00:44:36,190
This works when the rank is n.

707
00:44:36,190 --> 00:44:38,440
I needed independent columns.

708
00:44:38,440 --> 00:44:40,240
So when the rank is n--

709
00:44:40,240 --> 00:44:44,470
so this is rank equal n.

710
00:44:44,470 --> 00:44:46,930
That's when Gauss works.

711
00:44:46,930 --> 00:44:48,290
Then I can get a--

712
00:44:48,290 --> 00:44:51,500
then it's a one-sided
inverse, but it's not

713
00:44:51,500 --> 00:44:52,730
a two-sided inverse.

714
00:44:52,730 --> 00:44:54,010
I can't do it.

715
00:44:54,010 --> 00:44:55,940
Look, my matrix there.

716
00:44:55,940 --> 00:45:01,160
I could find a one-sided inverse
to get the 2 by 2 identity.

717
00:45:01,160 --> 00:45:04,640
But I could never multiply
that by some matrix

718
00:45:04,640 --> 00:45:09,520
and get the m by m identity out
of those two pathetic columns.

719
00:45:09,520 --> 00:45:12,020
OK.

720
00:45:12,020 --> 00:45:14,990
Maybe you feel like
just checking this.

721
00:45:14,990 --> 00:45:16,520
Just takes patience.

722
00:45:16,520 --> 00:45:18,200
What do I mean by checking it?

723
00:45:20,870 --> 00:45:28,940
I mean stick in the SVD.

724
00:45:28,940 --> 00:45:32,780
Just put it in the SVD
and cancel like crazy.

725
00:45:32,780 --> 00:45:35,120
And I think that'll pop out.

726
00:45:35,120 --> 00:45:38,006
Do you believe me?

727
00:45:38,006 --> 00:45:40,380
Because it's going to
be a little painful.

728
00:45:40,380 --> 00:45:45,707
So U sigma V transpose, all
transposed, and then something

729
00:45:45,707 --> 00:45:46,790
there and something there.

730
00:45:46,790 --> 00:45:50,690
I've got nine matrices
multiplying away.

731
00:45:50,690 --> 00:45:52,400
But it's going to--

732
00:45:52,400 --> 00:45:54,837
all sorts of things will
produce the identity.

733
00:45:54,837 --> 00:45:56,420
And in the end,
that's what I'll have.

734
00:45:59,510 --> 00:46:08,420
So this is a one-sided true
inverse, where the SVD--

735
00:46:08,420 --> 00:46:13,960
this formula is prepared to
have neither side invertible.

736
00:46:13,960 --> 00:46:16,580
It's still-- we know
what sigma plus means.

737
00:46:16,580 --> 00:46:18,740
Anyway.

738
00:46:18,740 --> 00:46:22,910
So under the assumption
of independent columns,

739
00:46:22,910 --> 00:46:27,215
Gauss works and gives the same
answer as the pseudo inverse.

740
00:46:29,950 --> 00:46:31,347
OK.

741
00:46:31,347 --> 00:46:31,930
Three minutes.

742
00:46:34,550 --> 00:46:40,380
That's hardly time,
but this being MIT,

743
00:46:40,380 --> 00:46:43,110
I feel I should use it.

744
00:46:43,110 --> 00:46:44,340
Oh my god.

745
00:46:44,340 --> 00:46:45,075
Number three.

746
00:46:47,910 --> 00:46:49,850
So what's number three about?

747
00:46:49,850 --> 00:46:57,430
Number three has
the same requirement

748
00:46:57,430 --> 00:47:02,640
as number two, the same
requirement of no null space.

749
00:47:02,640 --> 00:47:07,590
But it says, if I could get
orthogonal columns first,

750
00:47:07,590 --> 00:47:11,490
then this problem would be easy.
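
Why orthogonal columns make it easy: if A = QR with orthonormal columns in Q, then AᵀA = RᵀR and Aᵀb = RᵀQᵀb, so the normal equations reduce to the triangular system R x̂ = Qᵀb. A minimal NumPy sketch, assuming hypothetical data x = 0, 1, 2 and b = 6, 0, 0. Note that `np.linalg.qr` computes the factorization through LAPACK with Householder reflections, not classical Gram-Schmidt, though the resulting Q and R play the same role.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # columns: ones, x's (hypothetical)
b = np.array([6.0, 0.0, 0.0])                       # hypothetical measurements

# Reduced QR: Q is 3-by-2 with orthonormal columns, R is 2-by-2 upper triangular
Q, R = np.linalg.qr(A)

# With orthonormal columns, Q^T Q = I, so A^T A x = A^T b becomes R x = Q^T b
x_hat = np.linalg.solve(R, Q.T @ b)

print(np.allclose(Q.T @ Q, np.eye(2)))  # True
print(x_hat)                             # approximately [5, -3]
```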

751
00:47:11,490 --> 00:47:15,750
So everybody knows that
Gram-Schmidt is a way--

752
00:47:15,750 --> 00:47:22,780
boring way-- to get
from these two columns

753
00:47:22,780 --> 00:47:26,300
to get two orthogonal columns.

754
00:47:26,300 --> 00:47:28,250
Actually, the whole
idea of Gram-Schmidt

755
00:47:28,250 --> 00:47:29,900
is already there for 2 by 2.

756
00:47:29,900 --> 00:47:32,840
So I have two minutes,
and we can do it.

757
00:47:32,840 --> 00:47:37,025
Let's do Gram-Schmidt
on these two columns--

758
00:47:42,500 --> 00:47:44,060
I don't want to use U and V--

759
00:47:44,060 --> 00:47:46,970
column y and z.

760
00:47:46,970 --> 00:47:48,020
OK.

761
00:47:48,020 --> 00:47:50,570
Suppose I want to
orthogonalize those guys.

762
00:47:50,570 --> 00:47:53,190
What's the Gram-Schmidt idea?

763
00:47:53,190 --> 00:47:54,210
I take y.

764
00:47:54,210 --> 00:47:56,160
It's perfectly good.

765
00:47:56,160 --> 00:47:58,710
No problem with y.

766
00:47:58,710 --> 00:48:03,160
There is the y
vector, the all 1's.

767
00:48:03,160 --> 00:48:07,680
Then this guy is probably
not orthogonal to that.

768
00:48:07,680 --> 00:48:11,970
It'll go off in this
direction, with an angle

769
00:48:11,970 --> 00:48:14,160
that's not 90 degrees.

770
00:48:14,160 --> 00:48:16,170
So what do I do?

771
00:48:16,170 --> 00:48:19,680
I want to get
orthogonal vectors.

772
00:48:19,680 --> 00:48:23,520
I'm OK with this first
guy, but the second guy

773
00:48:23,520 --> 00:48:25,110
isn't orthogonal to the first.

774
00:48:25,110 --> 00:48:27,430
So what do I do?

775
00:48:27,430 --> 00:48:30,070
How do I-- in this
picture, how do I come up

776
00:48:30,070 --> 00:48:32,620
with a vector orthogonal to y?

777
00:48:35,280 --> 00:48:36,900
Project.

778
00:48:36,900 --> 00:48:40,630
I take this z, and I
take its projection.

779
00:48:40,630 --> 00:48:43,830
So z has a little piece--

780
00:48:43,830 --> 00:48:49,020
that z vector has a big piece
already in the direction of y,

781
00:48:49,020 --> 00:48:52,080
which I don't want, and
a piece orthogonal to it.

782
00:48:52,080 --> 00:48:53,970
That's my other piece.

783
00:48:53,970 --> 00:48:55,020
That's my other piece.

784
00:48:55,020 --> 00:48:56,830
So here's y.

785
00:48:56,830 --> 00:49:05,340
And here's the-- that is z minus
projection, let me just say.

786
00:49:05,340 --> 00:49:06,560
Whatever.

787
00:49:06,560 --> 00:49:07,200
Yeah.

788
00:49:07,200 --> 00:49:09,300
I don't know if I even
drew that picture right.

789
00:49:09,300 --> 00:49:10,290
Probably I didn't.

790
00:49:10,290 --> 00:49:11,040
Anyway.

791
00:49:11,040 --> 00:49:11,890
Whatever.

792
00:49:11,890 --> 00:49:17,100
The Gram-Schmidt idea
is just orthogonalize

793
00:49:17,100 --> 00:49:18,730
in the natural way.
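
The Gram-Schmidt step for the two columns, sketched in NumPy under the assumption that the measurement places are the hypothetical x = 0, 1, 2. Keep y (the all-ones column) and subtract from z its projection onto y. Because y is all ones, that projection is just the mean of the x's times y, so here Gram-Schmidt amounts to centering the x's. Normalizing gives orthonormal columns.

```python
import numpy as np

# Columns of the straight-line matrix (hypothetical measurement places 0, 1, 2)
y = np.array([1.0, 1.0, 1.0])   # the all-ones column
z = np.array([0.0, 1.0, 2.0])   # the measurement places

# Keep y; subtract from z its projection onto y
proj = (y @ z) / (y @ y) * y     # the piece of z already in the direction of y
z_perp = z - proj                # the piece orthogonal to y

print(z_perp, y @ z_perp)        # z_perp is [-1, 0, 1] and is orthogonal to y

# Normalize both to get orthonormal columns q1, q2
q1 = y / np.linalg.norm(y)
q2 = z_perp / np.linalg.norm(z_perp)
```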

794
00:49:18,730 --> 00:49:21,660
I'll come back to that at
the beginning of next time

795
00:49:21,660 --> 00:49:27,950
and say a word about
the fourth way.

796
00:49:27,950 --> 00:49:32,330
So this least squares
is not deep learning.

797
00:49:32,330 --> 00:49:36,350
It's what people
did a century ago

798
00:49:36,350 --> 00:49:38,730
and continue to do
for good reason.

799
00:49:38,730 --> 00:49:39,770
OK.

800
00:49:39,770 --> 00:49:43,100
And I'll send out that
announcement about the class,

801
00:49:43,100 --> 00:49:44,840
and you know the
homework, and you know

802
00:49:44,840 --> 00:49:48,000
the new due date is Friday.

803
00:49:48,000 --> 00:49:48,500
Good.

804
00:49:48,500 --> 00:49:50,089
Thank you.