The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

GILBERT STRANG: OK. So actually, I know people are working on projects, and you're not responsible for any new material in the lectures. Thank you for coming. But I do have something, an important topic, which is a revised version about the construction of neural nets, the basic structure that we're working with. That's on the open web as Section 7.1, Construction of Neural Nets.

Really, it's the construction of the learning function F. That's the function that you optimize by gradient descent or stochastic gradient descent, and you apply it to the training data to minimize the loss. So I'm just thinking about it in a more organized way, because I wrote that section before I knew anything more than how to spell "neural nets," but now I've thought about it more.

So the key point, maybe, compared to what I had in the past, is that I now think of this as a function of two sets of variables, x and v. The x are the weights, and the v are the feature vectors, the sample feature vectors. Those come from the training data: either one at a time, if we're doing stochastic gradient descent with mini-batch size 1; or B at a time, if we're doing mini-batches of size B; or the whole thing, a whole epoch at once, if we're doing full-scale gradient descent. So those are the feature vectors, and the x are the numbers in the linear steps, the weights. They're the matrices A_k that you multiply v by, and also the bias vectors b_k that you add on to shift the origin. OK. It's these x's that you optimize--those are what you optimize.

And what's the structure of the whole learning function, and how do you use it? What does a neural net look like?
So you take F of a first set of weights--the first set of weights would be A_1 and b_1, so that's the x part--and of the actual sample vector; the sample vectors are the v_0 in this iteration. And then you do the nonlinear step to each component, and that produces v_1. So I could write out what this is: the linear step gives A_1 v_0 + b_1. The two steps are these: the input is v_0, you take the linear step using the first weights A_1 and b_1, then you take the nonlinear step, and that gives you

v_1 = ReLU(A_1 v_0 + b_1).

That's really better than my line above, so I'll erase that line above. Yeah. So that produces v_1 from v_0 and the first weights. And then the next level inputs v_1, so I'll just call the input v_{k-1} and the output v_k, for k equal to 1 up to however many layers there are, L layers. So the input was v_0--this v is really v_0--and this is the neural net, with an input and an output at each layer. And v_L is the final output from the final layer.

So let's just do a picture here. Here is v_0, a sample vector--or if we're doing image processing, it's all the pixels in the data, in the training, from one sample; this is one training sample. Then you multiply by A_1, you add b_1, and you take ReLU of that vector, and that gives you v_1. And then you iterate, finally reaching v_L, the last layer. You don't do ReLU at the last layer, so it's just A_L v_{L-1} + b_L. You may not have a bias vector at that layer either, but you might, and that is finally the output.

So that picture is clearer for me than it was previously, to distinguish the weights from the samples. In the gradient descent algorithm, it's these x's that you're choosing. The v's are given by the training data; they're not part of the optimization. It's x, as in Chapter 6, where you're finding the optimal weights.
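[Editor's note: here is a minimal NumPy sketch of the forward structure just described--a linear step A_k v + b_k, with ReLU at every layer except the last. The layer widths and random weights are made-up placeholders, not values from the lecture.]

```python
import numpy as np

def relu(z):
    # nonlinear step, applied to each component
    return np.maximum(0.0, z)

def forward(v0, As, bs):
    """Learning function F(x, v0): x is the collection of weights (As, bs),
    v0 is one sample feature vector.  ReLU after every layer except the last."""
    v = v0
    for k, (A, b) in enumerate(zip(As, bs), start=1):
        z = A @ v + b                        # linear step with weights A_k, b_k
        v = z if k == len(As) else relu(z)   # no ReLU at the final layer
    return v

# Hypothetical sizes: 4 input features, one hidden layer of 6 neurons, 3 outputs
rng = np.random.default_rng(0)
sizes = [4, 6, 3]
As = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [rng.standard_normal(m) for m in sizes[1:]]
v0 = rng.standard_normal(4)                  # one training sample
print(forward(v0, As, bs))                   # the output v_L
```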
So this x really stands for all the weights that you compute, A_1, b_1 up to A_L, b_L--a collection of all the weights. And the important point for applications, for practice, is to realize that there are often more weights--more components in the weights--than there are components in the feature vectors, in the samples, in the v's. So often the size of x is greater than the size of the v's, which is an interesting and sort of unexpected situation. So I'll just write that: often, the x's--the weights--are underdetermined, because the number of x's exceeds, and often far exceeds, the number of v's. The x's are the numbers in the A's and b's, and the v's are the samples in the training set--the number of features of all the samples in the training set.

So I'll get that new Section 7.1 up, hopefully this week, on the open web, and I'll email you on Stellar. Is there more I should say about this? You see, I can draw the picture, but of course a hand-drawn picture is far inferior to a machine-drawn picture, an online picture--but let me just do it. So there is v, the training sample, with some components, and then they're multiplied. Now, here is going to be v_1, the first hidden layer, and that can have a different number of components, a different number of neurons. And each one comes from the v's--I'll keep going here, but you see the picture. So that describes a matrix A_1 that tells you what the weights are on those connections, and then there's a b_1 that's added--the bias vector is added to all of those--to get v_1. So v_1 is A_1 v_0 + b_1, followed by the ReLU, and then onwards. So this is the spot where drawing it by hand is clearly inferior to any other possible way to do it.
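[Editor's note: to make that under-determination count concrete, each layer contributes the entries of A_k plus the entries of b_k, so for a fully connected net with layer widths n_0, ..., n_L:]

```latex
\#\{\text{weights in } x\}
  \;=\; \sum_{k=1}^{L}\Bigl(\underbrace{n_k\,n_{k-1}}_{A_k} \;+\; \underbrace{n_k}_{b_k}\Bigr),
  \qquad n_0,\dots,n_L = \text{layer widths}.
```

[With made-up widths n_0 = 100, n_1 = 1000, n_2 = 10, that is 100,000 + 1,000 + 10,000 + 10 = 111,010 weights--already more than the 100,000 numbers in a hypothetical training set of 1,000 samples with 100 features each.]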
OK. So now, I haven't yet put the loss function into the picture. That's the function that you want to minimize. So what is the loss function?

So we're choosing x--that's all the A's and b's--to minimize the loss function L. OK. It's this part that Professor Sra's lecture was about. So he said, L is often a finite sum over all the samples of F. So what would that be? F(x, v_i)--this is the output, with weights in x, from sample number i. And if we're doing batch processing--that is, doing the whole batch at once--then we compute that for all i, and that's the computation that's ridiculously expensive; you go instead to stochastic gradient descent, and you just choose one of those, or b of those--a small number b, like 32 or 128--of these F's. But full-scale gradient descent chooses the weights x to minimize the loss over everything.

Now, I haven't got the loss here yet. The loss would involve F(x, v_i) minus the true result from sample i. I haven't got a good notation for that true result--I'm open to suggestions. So how do I want to write the error? If it was least squares, I would be squaring that, so it would be a sum of the errors squared over all the samples. Or if I'm doing stochastic gradient descent--I guess I'm still minimizing this, but the question is, do I use the whole function L at each iteration, or do I just pick one, or only b, of the samples to look at at iteration number k?

So this is L(x), then: I've added up over all the v's. So just to keep the notation straight, I have this function of x and the v's, and I find its output--this is what the neural net produces--and it's supposed to be close to the true result. We don't expect this error to be exactly 0, but it could be, because we have lots of weights to achieve that. So anyway, that would be the loss we minimize, and it would be squared for square loss. I guess I haven't really spoken about loss functions, so let me just put those here--these are the popular loss functions.
One would be the one we know best: square loss. And number two--I've never seen it used quite this directly--would be the l1 loss, maybe the sum of l1 norms of the errors. The square loss is the sum of the errors squared, in the l2 norm; the l1 loss would be the sum over i of the errors in the l1 norm. Well, that one comes into specific other problems, like LASSO and other important problems where you're minimizing an l1 norm, but not in deep learning. Now, three would be hinge loss. Probably some of you know better than I do the formula and the background behind hinge loss; this is for the minus 1, 1 classification problems, whereas square loss would be the appropriate one for regression--so that one is for regression. And then finally, the most important one for neural nets, is cross-entropy loss. This is for neural nets. So this is really the most used loss function in the setup that we are mostly thinking of, and I'll try to say more about that before the course ends.
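[Editor's note: the lecture names these losses without writing out the formulas. The sketch below uses what are, as an assumption, the standard forms--squared error, absolute error, hinge loss max(0, 1 - y*score) for labels in {-1, +1}, and softmax cross-entropy. All numbers are made up.]

```python
import numpy as np

def square_loss(F_out, true):            # 1. square loss (regression)
    return np.sum((F_out - true) ** 2)

def l1_loss(F_out, true):                # 2. l1 loss: sum of absolute errors
    return np.sum(np.abs(F_out - true))

def hinge_loss(score, label):            # 3. hinge loss, label in {-1, +1}
    # zero loss once label * score >= 1 (sample classified with a margin)
    return max(0.0, 1.0 - label * score)

def cross_entropy_loss(outputs, true_class):   # 4. cross-entropy (neural nets)
    # softmax turns the net's outputs into probabilities, then -log of the true one
    p = np.exp(outputs - np.max(outputs))
    p /= p.sum()
    return -np.log(p[true_class])

print(square_loss(np.array([0.9, 0.1]), np.array([1.0, 0.0])))   # 0.02
print(hinge_loss(0.3, +1))                                       # 0.7
print(cross_entropy_loss(np.array([2.0, 0.5, -1.0]), 0))         # about 0.24
```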
So is that-- I don't know. For me, I hadn't got this straight until rewriting that section, and it's now in better form, but comments are welcome. OK. So that just completes what I wanted to say, and you'll see the new section. Any comment on that before I go to a different topic entirely? OK. Oh, any questions before I go to this topic? I'll tell you what it is.

It's a short section in the book about distance matrices, and here is the question. We have a bunch of points in space, and what we know is the distances between the points--it's convenient to talk about squared distances here. OK. And how would we know those distances? Maybe by radar, or by any kind of measurement. They might be sensors which we've placed around, and we can measure the distances between them. And the question is, what are their positions? So that's the question. So let me talk a little bit about this question and then pause. Find positions in--well, in space, but we don't know ahead of time, maybe, whether the space is ordinary 3D space, or whether these are sensors in a plane, or whether we have to go to higher dimensions. I'll just put dimension d, and also, I'll just say, we're also finding that d.

And what are these positions? These are positions x_i so that the squared distance ||x_i - x_j||^2 is the given d_ij. So we're given the distances between them, and we want to find their positions. We know distances, and we want to find positions--that's the question. It's just a neat math question that is solved, and you'll see a complete solution. And it has lots of applications, and it's just a nice question. So it occupies a section of the book, but that section is only two pages long. It's just a straightforward solution to that question: given the distances, find the positions. Given the distances, find the x's. OK. So I'm going to speak about that.

I had a suggestion, a good suggestion, by email. Well, first, questions about the projects coming in? Projects are beginning to come in, and at least at the beginning--well, in all cases, beginning and end--I'll read them carefully. And as long as I can, I'll send back suggestions for a final rewrite. And as I said, a printout is great--you could leave it in the envelope outside my office--but of course, online is what everybody's doing. So those are just beginning to come in, and if we can get them in by a week from today, I'm really, really happy. Yeah, and just feel free to email me. I would email me about projects--not Jonathan, and not anonymously on Stellar. I think you'd probably do better just to ask me the question. That's fine, and I'll try to answer in a useful way. Yeah, and I'm always open to questions. So you could email me, like, how long should this project be? My tutor at Oxford said something about that, when you were writing essays.
That's the Oxford system--you write an essay--and he said, just start where it starts, and end when it finishes. So that's the idea: certainly not enormously long.

And then a question was raised--and I can ask whether you are interested in that--the question was, what courses after this one are natural to take to go forward? And I don't know how many of you are thinking of taking, or have time to take, other MIT courses in this area of deep learning, machine learning, optimization, all the topics we've had here. Anybody expecting to take more courses, just stick up a hand. Yeah--and you already know what MIT offers? So that was the question that came to me: what does MIT offer in this direction? And I haven't looked up the number of Professor Sra's course, S-R-A, in Course 6. It's 6.-something, a high number, and after his good lecture, I think that's got to be worthwhile. So I looked in Course 6. I didn't really find an Institute-wide list--maybe Course 6 feels that they are the Institute--but there are other courses around.

But I found, on the Operations Research Center site, the ORC--let me just put that there. This is just in case you would like to think about any of these things.

As I write that--so, I heard the lecture by Tim Berners-Lee. Did others hear that, a week or so ago? He created the web. So that's pretty amazing--it wasn't Al Gore, after all--and do you know his name? Well, he's now Sir Tim Berners-Lee. That double name makes you suspect that he's from England, and he is. So anyway, I was going to say, I hold him responsible for these excessive letters in the address, in the URL. I mean, he's made us all say W-W-W for years. Find some other way to say it--but it's not easy to say, I think. OK, whatever. So it's the OR Center at MIT, and then it's "academics" or something, and then it's something like "course offerings." That's approximately right.
And since they do applied optimization, under the heading of data analytics or statistics there's optimization, there's OR, operations research--other lists too, but a good list of courses from many departments: especially Course 6, Course 15, which is where the Operations Research Center is, Course 18, and there are others in Course 2 and elsewhere. Yeah.

Would somebody like to say what course you have in mind to take next, after this one? If you looked ahead to next year, any suggestions of what looks like a good course? I sat in once on 6.036, the really basic course, and you would want to go higher. OK. Maybe this is just to say, I'd be interested to know what you do next, what your experience is, or I'd be happy to give advice. But maybe my general advice is that that's a useful list of courses. OK?

Back to distance matrices. OK, so here's the problem. Yeah. OK, I'll probably have to erase that, but I'll leave it for a minute. OK. So we know these distances, and we want to find the x's, so let's call the distances d_ij. So we have a D matrix, and we want to find a position matrix--let me just see what notation. This is Section 4.9--previously 3.9, but Chapters 3 and 4 got switched--and maybe actually, yeah, I think it's 8 or 9 or 10 in that chapter now; other topics are trying to find their way in. OK. So that's the reference on the web, and I'll get these sections onto Stellar.

OK. So the question is, can we recover the positions from the distances? In fact, there's also a question: are there always positions for given distances? And I mentioned several applications. I've already spoken about wireless sensor networks, where you can measure travel times between the sensors, and that gives you the distances, and then you use this neat little bit of math to find the positions.
Well, of course, you can't find the positions uniquely. Clearly, you could apply any rigid motion to all the positions. If I have a set of positions--what am I going to call that? X. So I'll write here: I'm given the D matrix--that's the distances--and the job is to find the X matrix, which gives the positions. And what I'm just going to say--and you already saw it in your mind--is that if I have a set of positions, I could do a translation, and the distances wouldn't change, or I could do a rigid motion, a rigid rotation. So the positions are not unique, but I can come closer by saying, put the centroid at the origin, or something like that. That will take out the translations, at least. OK. So find the X matrix--that's the job.

OK, and I was going to say, before I start on that, the shapes of molecules are another application. Nuclear magnetic resonance gives distances, gives D, and then we find the positions X. And of course, there's noise in there, and sometimes missing entries. And machine learning could also be described this way: you're given a whole lot of points in space, feature vectors in a high-dimensional space. Actually, this is a big deal. You're given a whole lot of points in high-dimensional space, and those are related--they sort of come together naturally--so they tend to fit on a surface, a low-dimensional surface in high-dimensional space. And really, a lot of mathematics is devoted to finding that low-dimensional--that subspace, except it could be curved, so "subspace" is not the correct word. Really, a manifold, a curved manifold, is what a geometer would say: something that is smooth and close to all the points. And you could linearize it, you could flatten it out, and then you have a much reduced problem.
The dimension is reduced from the original dimension of the space where the points lie, with a lot of data, to the true dimension of the problem. If, of course, the points were all on a straight line, the true dimension of the problem would be 1. So we have to discover this--we also have to find that dimension d. OK, so how do we do it? So it's a classical problem; it just has a neat answer. OK.

All right, so let's recognize the connection between distances and positions. So d_ij is the squared distance between x_i and x_j, so that is

d_ij = ||x_i - x_j||^2 = x_i . x_i - x_i . x_j - x_j . x_i + x_j . x_j.

OK. Is that right? Yes. OK. So those are the d_ij's, and they are the entries in the matrix D. OK.

Well, these first entries, the x_i . x_i, depend only on i; they're the same for every j. So that part will produce a rank-one matrix, because its entries depend on the row number i but not on j, the column number--the columns are repeated. And this last part, the x_j . x_j, similarly produces something that depends only on j, only on the column number, so the rows are all the same--also a rank-one matrix, with all the rows repeated--because if I change i, nothing changes in that product. So really, it's the cross terms in the middle that produce most of the matrix, the significant part of the matrix. OK. So what do we do with those?

So let's see, did I give a name to the matrix that I'm looking for? I think in the notes I call it X. So I'm given D; find X. And what I'll actually find--you can see it coming here--is X transpose X, because what I'm given involves dot products of the x's. So I would like to discover, out of all this, what x_i dotted with x_j is--that'll be the correct dot product. Let's call that matrix G, for the dot product matrix, and then find X from G. So this is a nice argument. So what this identity tells me is some information about dot products.
So this is telling me something about the G matrix, the X transpose X matrix. And then once I know G, it's a separate step to find X. And of course, this is the point at which X is not unique: if I put a rotation Q into X, then I'll see a Q transpose Q, and it'll disappear. So I'm free to rotate the x's, because that doesn't change the dot products. So it's G that I want to know, and this term tells me something about G, and this term tells me something about G--and so does the middle one, but that's what I have to see. So what do those tell me? Let's see. Let me write down what I have here.

So let's introduce the vector d, with entries d_i equal to the inner product of x_i with x_i--that's the partial information we're getting from those terms. So is that OK? I'm introducing that notation, because this is now going to tell me what my D matrix is. So what is that? This is the diagonal--maybe it's just a vector, I should say. Yeah.

Yeah, so can I write down the equation that is fundamental here, and then we'll figure out what it means. So it's an equation for G, for the dot product matrix. OK, let me make space for that equation. I believe that we can get the dot product matrix, which I'm calling G, as minus 1/2 of the D matrix, plus 1/2 of the ones vector times d transpose, plus 1/2 of d times the ones vector transposed:

G = -1/2 (D - 1 d^T - d 1^T).

Let me make sure I have those rank-one pieces right. The piece 1 d^T is the column of ones, (1, 1, 1, 1), times d transpose--so it's column times row--and the piece d 1^T is also column times row, with the d as the column. OK, now let me look at that properly. Every row of 1 d^T is the same, so that piece is reflecting the term where the rows are repeated, the x_j . x_j term; and every column of d 1^T is the same, so that piece is reflecting the term where the columns are repeated, the x_i . x_i term.
The d is just the vector of those numbers--call its entries d_i, or d_j when it runs along a row--and here is the D matrix. So part of the D matrix is this bit and this bit, each giving a rank-one matrix. Now, it's the cross-term part that I have to understand, so while you're checking on that, let me look again at this. Yeah.

Let's just see where we are if this is true. If this is true, then I'm given the D matrix, and these dot products I can find. So in other words, this is the key equation, and it's going to come just from that simple identity, just from checking each term. This term we identified, that last term we identified, and now the whole thing is D--well, of course, it's D.

What that identity is really saying--if I just read it along and translate it into matrix language--is that the D matrix is this rank-one matrix, the ones vector (1, 1, 1, 1) times (d_1, ..., d_4) transpose, let's say, plus the other one, the d's times the ones vector transposed, which is the transpose of that first one, and then minus 2 of the cross-product matrix--minus 2 X transpose X:

D = 1 d^T + d 1^T - 2 X^T X.

That's where the 1/2 will come from. Sorry--cross products of the x's: I had one set of cross products, the x_i . x_j, and the x_j . x_i are the same, so I have minus 2 of them. So now I'm just rewriting that. When I rewrite that equation, solving for the cross-product matrix, I get the formula for G. Do you see that?
I put that cross-product term on one side, I put the d pieces over here with a minus sign, I divide by 2, and then that's the formula. So ultimately, this simple identity, just looked at term by term--because these pieces were so simple, just rank-one pieces, and these pieces were exactly what we want, the X transpose X pieces, the G--that equation told us G from D. All of this is known: what's known is D, and this rank-one piece, and this one. So now we have the equation:

X^T X = G = -1/2 (D - 1 d^T - d 1^T),

minus 1/2 of D minus those rank-ones. Sorry to make it look messy--I remember Raj Rao talking about this last spring, and the algebra got flustered then too. So we get it: we know X transpose X, that matrix.
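[Editor's note: a small NumPy check of that identity, on made-up points. Here the vector d of squared lengths is taken from the test positions purely to verify the formula; in the actual problem one would first pin down the translation--for example, by placing one point at the origin--so that d can be read off from D itself. That extra step is an assumption, not something covered in the lecture.]

```python
import numpy as np

rng = np.random.default_rng(1)
n, dim = 5, 3                               # made-up: 5 points in 3 dimensions
X = rng.standard_normal((dim, n))           # columns are the positions x_1, ..., x_n

G_true = X.T @ X                            # dot-product (Gram) matrix X^T X
d = np.diag(G_true)                         # d_i = x_i . x_i
D = d[:, None] + d[None, :] - 2 * G_true    # D_ij = ||x_i - x_j||^2

# Recover the Gram matrix from the distance matrix using the identity above
ones = np.ones(n)
G = -0.5 * (D - np.outer(ones, d) - np.outer(d, ones))

print(np.allclose(G, G_true))               # True: G = X^T X comes back from D
```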
Now, can we just do four minutes of linear algebra at the end today? Given X transpose X, find X--this is n by n. How would you do that? Could you do it? Would there be just one X? No. So if you had one X, you could multiply it by a rotation, by an orthogonal matrix, and you'd have another one. So this is finding X up to an orthogonal transformation--but how would you actually do it? What do we know about this matrix, X transpose X? It's symmetric, clearly, and what we especially know is that it is also?

AUDIENCE: Positive.

GILBERT STRANG: Positive semidefinite--so this is semidefinite. So I'm given a semidefinite matrix, and I want to find a square root, you could say. That given matrix is the X transpose X, and I want to find X. I think there are two leading candidates. There are many candidates, because if you find one X, then any QX is also OK--if I put a Q transpose Q in there, it's the identity. OK. So one way is to use the eigenvalues of X transpose X, and the other way would be to use elimination on X transpose X.

So if I use eigenvalues--if I find the eigenvalues of X transpose X--then I'm writing this symmetric, positive semidefinite matrix as Q Lambda Q transpose. Right? That's the fundamental, most important theorem in linear algebra, you could say: a symmetric positive semidefinite matrix has eigenvalues greater than or equal to 0, and eigenvectors that are orthogonal. So now, if I know that, what's a good X? Take X to be what? I've got the eigenvalues and eigenvectors of X transpose X, and I'm looking for an X that will work. And one idea is just to take the same eigenvectors and take the square roots of the eigenvalues:

X = Q sqrt(Lambda) Q^T.

That X is symmetric now--it's equal to X transpose--and that's a square root symbol, or Lambda to the 1/2, I could say. So when I multiply--X transpose X is just X squared here--when I square it, the Q transpose Q in the middle gives the identity, and the square root of Lambda times the square root of Lambda, those are diagonal matrices that give Lambda, and I get the right answer. So one way, in a few words: take the square roots of the eigenvalues and keep the eigenvectors. That's the eigenvalue construction, and it produces an X that is symmetric positive semidefinite. That might be what you want. It's a little work, because you're computing eigenvalues and eigenvectors to do it, but that's one choice.

Now, I believe that elimination would give us another choice. So elimination produces what factorization of this matrix? This is still our symmetric positive definite matrix. If you do elimination on that, you usually expect L, lower triangular, times D, the pivots--a different D now, the diagonal matrix of pivots, not the distance matrix--times U, the upper triangular. That's the usual result of elimination, LDU. I'm factoring out the pivots, so there are 1's on the diagonals of L and U. But now, if the matrix is actually symmetric, what's up?
We zipped by elimination, regarding that as a trivial bit of 18.06 linear algebra, but of course it's highly important. So what's the situation here, when the matrix is actually symmetric? I want the factorization to look symmetric. How do I make it look symmetric? The U gets replaced by L transpose. If I'm working with a positive definite matrix, then I get positive pivots, and the lower triangular and upper triangular factors are transposes of each other: L D L^T.

So now, what is then the X? It's just like before. I'll use L times the square root of D times L transpose. Is that right? Oh, wait a minute--what's up? No, that's not going to work, because in the middle I would get L transpose L, and that's not the identity; where I had Q transpose Q, it was good. No, sorry--let's get that totally erased. The X should just be

X = sqrt(D) L^T.

The X is now a triangular matrix: the square roots of the pivots, and the L transpose part. And now, when I do X transpose X, you see X transpose X coming out correctly: X transpose will be L times the square root of D, the square root of D times the square root of D gives the D, and then the L transpose at the end is right--so X transpose X is L D L transpose, the matrix we started with.

So this is called--do I try to write it here? This is my last word for today--the Cholesky factorization, named after a French guy--a French soldier, actually. So L D L transpose, with the square roots split off that way, is Cholesky, and that's easy to compute, much faster than the eigenvalue square root. But this square root is triangular, and the other square root is symmetric. Those are the two pieces of linear algebra for finding things: you reduce things to triangular form, or you connect them with symmetric matrices.
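[Editor's note: a small NumPy sketch of both square roots, with a made-up positive definite G = X^T X. For points that truly lie in a lower dimension, G is only semidefinite; the eigenvalue route still works there (keep the eigenvectors with positive eigenvalues), while plain Cholesky assumes positive definiteness--that caveat is the editor's, not the lecture's.]

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
X_true = rng.standard_normal((n, n))      # made-up full-rank positions
G = X_true.T @ X_true                     # the known matrix; task: given G, find some X

# Way 1: eigenvalues.  G = Q Lam Q^T  ->  X = Q sqrt(Lam) Q^T, the symmetric square root.
lam, Q = np.linalg.eigh(G)
X_eig = Q @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ Q.T
print(np.allclose(X_eig.T @ X_eig, G))    # True

# Way 2: elimination.  G = L D L^T with unit-diagonal L and positive pivots in D,
#        so X = sqrt(D) L^T is an upper-triangular square root (Cholesky).
C = np.linalg.cholesky(G)                 # lower triangular, G = C C^T, with C = L sqrt(D)
X_chol = C.T                              # = sqrt(D) L^T
print(np.allclose(X_chol.T @ X_chol, G))  # True

# Both are valid answers: X is only determined up to an orthogonal factor Q,
# since (Q X)^T (Q X) = X^T X.
```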
OK, thank you for your attention today. So today, we did the distance matrices, and this was the final step, to get the X. And also, most important, was to get the structure of a neural net straight, separating the v's, the sample vectors, from the x's, the weights.

OK, so Friday I've got one volunteer to talk about a project, and I'm desperately looking for more. Please just send me an email--it would be appreciated--or I'll send you an email, if necessary. OK, thanks.