1
00:00:01,550 --> 00:00:03,920
The following content is
provided under a Creative

2
00:00:03,920 --> 00:00:05,310
Commons license.

3
00:00:05,310 --> 00:00:07,520
Your support will help
MIT OpenCourseWare

4
00:00:07,520 --> 00:00:11,610
continue to offer high quality
educational resources for free.

5
00:00:11,610 --> 00:00:14,180
To make a donation, or to
view additional materials

6
00:00:14,180 --> 00:00:18,140
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:18,140 --> 00:00:19,026
at ocw.mit.edu.

8
00:00:22,218 --> 00:00:23,010
GILBERT STRANG: OK.

9
00:00:23,010 --> 00:00:29,070
Just as we're getting
started, I thought

10
00:00:29,070 --> 00:00:33,450
I'd add a few words
about a question that

11
00:00:33,450 --> 00:00:35,550
came up after class.

12
00:00:35,550 --> 00:00:40,710
Suppose, in that
discussion last time,

13
00:00:40,710 --> 00:00:44,430
where you were given three--

14
00:00:44,430 --> 00:00:47,220
you were given a
distance matrix--

15
00:00:47,220 --> 00:00:51,420
you were given the
distance between x1 and x2,

16
00:00:51,420 --> 00:00:54,630
between x2 and x3,
and between x1 and x3,

17
00:00:54,630 --> 00:01:00,330
and you wanted to find
points that satisfied that.

18
00:01:00,330 --> 00:01:04,260
Well, we're going to
fail on this example,

19
00:01:04,260 --> 00:01:09,150
because if the distance here
is 1, the distance here is 1,

20
00:01:09,150 --> 00:01:14,850
then by the triangle inequality,
the distance from x1 to x3

21
00:01:14,850 --> 00:01:16,980
could not be more than 2.

22
00:01:16,980 --> 00:01:19,890
And when we square it, it
could not be more than 4.

23
00:01:19,890 --> 00:01:21,690
And here it's 6.

24
00:01:21,690 --> 00:01:22,800
So what's going to happen?

25
00:01:22,800 --> 00:01:26,060
What goes wrong in that case?

26
00:01:26,060 --> 00:01:26,560
Yeah.

27
00:01:26,560 --> 00:01:30,660
I hadn't commented on
that, and I'm not sure

28
00:01:30,660 --> 00:01:38,460
that the paper that
I referenced does so.

29
00:01:38,460 --> 00:01:41,760
So I had to do a little
search back in the literature,

30
00:01:41,760 --> 00:01:45,430
because people couldn't
overlook this problem.

31
00:01:45,430 --> 00:01:52,230
So this is the triangle
inequality fails.

32
00:01:57,340 --> 00:02:01,670
And it's not going to help
to go into 10 dimensions,

33
00:02:01,670 --> 00:02:07,700
because the triangle
inequalities doesn't change.

34
00:02:07,700 --> 00:02:09,900
And it's still there
in 10 dimensions.

35
00:02:09,900 --> 00:02:12,170
And we're still failing.

36
00:02:12,170 --> 00:02:13,160
So what happens?

37
00:02:13,160 --> 00:02:15,410
Well, what could happen?

38
00:02:15,410 --> 00:02:16,250
Do you remember?

39
00:02:16,250 --> 00:02:20,360
And you'll have to remind
me, the key equation.

40
00:02:20,360 --> 00:02:29,680
You remember, we had an
equation connecting the--

41
00:02:29,680 --> 00:02:33,500
so what is the matrix
D for this problem?

42
00:02:33,500 --> 00:02:39,410
So D is-- this is
a 3 by 3 matrix

43
00:02:39,410 --> 00:02:41,630
with these distances squared.

44
00:02:41,630 --> 00:02:44,150
And it was convenient
to use distances

45
00:02:44,150 --> 00:02:49,340
squared, because that's what
comes into the next steps.

46
00:02:49,340 --> 00:02:54,450
So of course, the distance
from each x to itself is zero.

47
00:02:54,450 --> 00:02:58,970
The distance from x
distance squared was that.

48
00:02:58,970 --> 00:03:00,980
This one was that.

49
00:03:00,980 --> 00:03:03,630
But this one is 6.

50
00:03:03,630 --> 00:03:06,260
OK.

51
00:03:06,260 --> 00:03:08,570
So that's the distance matrix.

52
00:03:08,570 --> 00:03:11,030
And we would like to find--

53
00:03:11,030 --> 00:03:13,430
the job was to find--

54
00:03:13,430 --> 00:03:20,360
and I'm just going to write
down, we cannot find x1, x2,

55
00:03:20,360 --> 00:03:26,760
and x3 to match those distances.

56
00:03:26,760 --> 00:03:27,650
So what goes wrong?

57
00:03:27,650 --> 00:03:30,320
Well, there's only one
thing that could go wrong.

58
00:03:30,320 --> 00:03:35,720
When you connect this distance
matrix D to the matrix X

59
00:03:35,720 --> 00:03:38,780
transpose X-- you remember
the position matrix--

60
00:03:38,780 --> 00:03:42,210
maybe I called it G?

61
00:03:42,210 --> 00:03:51,720
This is giving-- so Gij is
the dot product of xi with xj.

62
00:03:54,540 --> 00:03:57,270
Make that into a j.

63
00:03:57,270 --> 00:03:57,960
Thank you.

64
00:04:03,180 --> 00:04:08,340
So Gij is the matrix
of dot product.

65
00:04:08,340 --> 00:04:15,070
And the great thing was that we
can discover what that matrix--

66
00:04:15,070 --> 00:04:18,320
that matrix G comes
directly from D--

67
00:04:18,320 --> 00:04:20,820
comes directly from
D. And of course,

68
00:04:20,820 --> 00:04:23,950
what do we know about this
matrix of cross products?

69
00:04:23,950 --> 00:04:27,779
We know that is
positive semidefinite.

70
00:04:33,530 --> 00:04:35,270
So what goes wrong?

71
00:04:35,270 --> 00:04:40,250
Well, just in a word, when
we write out that equation

72
00:04:40,250 --> 00:04:45,690
and discover what G is, if
the triangle inequality fails,

73
00:04:45,690 --> 00:04:51,650
we learn that G doesn't
come out positive definite.

74
00:04:51,650 --> 00:04:54,760
That's really all I want to say.

75
00:04:54,760 --> 00:04:57,670
And I could push
through the example.

76
00:04:57,670 --> 00:05:00,820
G will not come out
positive definite if D--

77
00:05:00,820 --> 00:05:05,110
if that's D because it can't.

78
00:05:05,110 --> 00:05:07,420
If it came out
positive definite,

79
00:05:07,420 --> 00:05:11,350
then we could find an X.
So if we had the G, then

80
00:05:11,350 --> 00:05:14,140
the final step, you
remember, is to find

81
00:05:14,140 --> 00:05:19,360
an X. Well we know that if
G is positive semidefinite,

82
00:05:19,360 --> 00:05:24,970
there are multiple
ways to find an X.

83
00:05:24,970 --> 00:05:29,320
This is positive semidefinite
matrices is what you get out

84
00:05:29,320 --> 00:05:31,420
of X transpose X's.

85
00:05:31,420 --> 00:05:36,160
And we can find an x given a
G. We can find G given an x.

86
00:05:36,160 --> 00:05:48,390
So it has to be that
this won't be true--

87
00:05:48,390 --> 00:05:51,580
that the matrix G that
comes out of that equation

88
00:05:51,580 --> 00:05:54,370
will turn out not to
be positive definite.

89
00:05:54,370 --> 00:05:57,350
So it's really quite nice.

90
00:05:57,350 --> 00:05:59,290
It's a beautiful little
bit of mathematics,

91
00:05:59,290 --> 00:06:04,270
that if, and only if,
the triangle inequality

92
00:06:04,270 --> 00:06:08,140
is satisfied by these numbers--

93
00:06:08,140 --> 00:06:09,730
if and only if--

94
00:06:09,730 --> 00:06:14,300
then the matrix
in the D matrix--

95
00:06:14,300 --> 00:06:17,300
then the G matrix that
comes out of this equation--

96
00:06:17,300 --> 00:06:18,730
which I haven't written--

97
00:06:18,730 --> 00:06:21,360
is positive semidefinite.

98
00:06:21,360 --> 00:06:26,160
If the triangle inequality is
OK, we can find the points.

99
00:06:26,160 --> 00:06:30,600
If the triangle inequality
is violated-- like here--

100
00:06:30,600 --> 00:06:34,260
then the matrix G is not
positive semidefinite,

101
00:06:34,260 --> 00:06:38,070
has negative eigenvalues,
and we cannot find the point.

102
00:06:38,070 --> 00:06:39,060
Yeah.

103
00:06:39,060 --> 00:06:42,120
I could recall
the G matrix but--

104
00:06:42,120 --> 00:06:54,000
the G equation, but it's coming
to you in the two page section

105
00:06:54,000 --> 00:06:59,250
that does distance matrices.

106
00:06:59,250 --> 00:07:00,480
OK.

107
00:07:00,480 --> 00:07:06,740
That's just-- I should
have made a point.

108
00:07:06,740 --> 00:07:09,000
It's nice to have
specific numbers.

109
00:07:09,000 --> 00:07:11,820
And I could get the
specific numbers for G,

110
00:07:11,820 --> 00:07:13,380
and we would see, no way.

111
00:07:13,380 --> 00:07:14,960
It's not positive definite.

112
00:07:14,960 --> 00:07:15,840
OK.

113
00:07:15,840 --> 00:07:18,990
So that's just
tidying up last time.

114
00:07:18,990 --> 00:07:23,670
I have another small
problem to talk about,

115
00:07:23,670 --> 00:07:29,610
and then a big question of
whether deep learning actually

116
00:07:29,610 --> 00:07:30,510
works.

117
00:07:30,510 --> 00:07:33,600
I had an email from
an expert last night,

118
00:07:33,600 --> 00:07:39,930
which changed my view of the
world about that question,

119
00:07:39,930 --> 00:07:43,390
as you can imagine.

120
00:07:43,390 --> 00:07:49,170
The change in my world was, I
had thought the answer was yes,

121
00:07:49,170 --> 00:07:52,140
and I now think
the answer is no.

122
00:07:52,140 --> 00:07:56,820
So that's like rather
a big issue for 18.065.

123
00:07:56,820 --> 00:07:59,095
But we'll-- let's
see about that later.

124
00:07:59,095 --> 00:07:59,595
OK.

125
00:08:03,750 --> 00:08:06,380
Now Procrustes' problem.

126
00:08:06,380 --> 00:08:10,920
So Procrustes-- and it's
included in the notes--

127
00:08:10,920 --> 00:08:12,930
that name comes
from a Greek myth.

128
00:08:12,930 --> 00:08:16,800
Are you guys into Greek myths?

129
00:08:16,800 --> 00:08:20,102
So what was the
story of Procrustes?

130
00:08:23,280 --> 00:08:30,540
Was it Procrustes who
adjusted the length of his--

131
00:08:30,540 --> 00:08:34,179
so he had a special bed.

132
00:08:34,179 --> 00:08:36,850
Procrustes' bed--
certain length.

133
00:08:36,850 --> 00:08:38,950
And then, he had
visitors coming.

134
00:08:38,950 --> 00:08:42,730
And instead of adjusting
the length of the bed

135
00:08:42,730 --> 00:08:47,380
to fit the visitor,
Procrustes adjusted the length

136
00:08:47,380 --> 00:08:50,350
of the visitor to fit the bed.

137
00:08:50,350 --> 00:08:55,330
So either stretched the
visitor or chopped off

138
00:08:55,330 --> 00:08:56,290
part of the visitor.

139
00:08:56,290 --> 00:09:01,380
So anyway-- the Greeks
like this sort of thing.

140
00:09:01,380 --> 00:09:02,050
OK.

141
00:09:02,050 --> 00:09:06,970
So anyway, that's a
Greek myth for 18.065.

142
00:09:06,970 --> 00:09:07,750
OK.

143
00:09:07,750 --> 00:09:12,340
So the whole idea, the
Procrustes problem,

144
00:09:12,340 --> 00:09:15,860
is to make something
fit something else.

145
00:09:18,500 --> 00:09:20,165
So the two things are--

146
00:09:25,410 --> 00:09:27,360
so suppose I'm just
in three dimensions

147
00:09:27,360 --> 00:09:30,140
and I have two vectors here.

148
00:09:30,140 --> 00:09:33,780
So I have a basis for a
two dimensional space.

149
00:09:33,780 --> 00:09:34,905
And over here I have--

150
00:09:37,680 --> 00:09:44,880
people-- space scientists
might have one computation

151
00:09:44,880 --> 00:09:49,150
of the positions of satellites.

152
00:09:49,150 --> 00:09:51,480
Then, of course, they
wouldn't be off by as much

153
00:09:51,480 --> 00:09:53,070
as this figure shows.

154
00:09:53,070 --> 00:09:55,830
But then they have
another computation

155
00:09:55,830 --> 00:09:58,170
using different coordinates.

156
00:09:58,170 --> 00:10:03,400
So it partly rotated
from this picture,

157
00:10:03,400 --> 00:10:07,230
but also it's partly got
round off errors and error

158
00:10:07,230 --> 00:10:09,000
in it between the two.

159
00:10:09,000 --> 00:10:14,160
So the question is, what's the
best orthogonal transformation?

160
00:10:14,160 --> 00:10:22,100
So this is a bunch of vectors,
x1, x2, to xn, let's say.

161
00:10:22,100 --> 00:10:26,460
And I want to modify them
by an orthogonal matrix--

162
00:10:26,460 --> 00:10:28,260
maybe I'd do it
on the other side.

163
00:10:28,260 --> 00:10:29,010
I think I do.

164
00:10:29,010 --> 00:10:29,510
Yeah.

165
00:10:33,970 --> 00:10:41,340
Q, to be as close as possible
to this other set, y1,

166
00:10:41,340 --> 00:10:44,640
y2 up to yn.

167
00:10:44,640 --> 00:10:47,330
So let me just say it again.

168
00:10:47,330 --> 00:10:50,100
I have two sets of vectors.

169
00:10:50,100 --> 00:10:53,430
And I'm looking, and they're
different-- like those two

170
00:10:53,430 --> 00:10:54,390
sets.

171
00:10:54,390 --> 00:10:58,060
And I'm looking for the
orthogonality matrix

172
00:10:58,060 --> 00:11:02,490
that, as well as possible,
takes this set into this one.

173
00:11:02,490 --> 00:11:06,150
Of course, if this was
an orthogonal basis,

174
00:11:06,150 --> 00:11:08,760
and this was an
orthogonal basis, then

175
00:11:08,760 --> 00:11:11,910
we would be home free.

176
00:11:11,910 --> 00:11:13,590
Q-- we could get equality.

177
00:11:13,590 --> 00:11:16,800
We could take an
orthogonal basis directly

178
00:11:16,800 --> 00:11:21,390
into an orthogonal basis
with a orthogonal matrix Q.

179
00:11:21,390 --> 00:11:24,240
In other words, if x was
an orthogonal matrix,

180
00:11:24,240 --> 00:11:26,640
and y was an
orthogonal matrix, we

181
00:11:26,640 --> 00:11:34,010
would get the exact correct
Q. But that's not the case.

182
00:11:34,010 --> 00:11:35,900
So we're looking for
the best possible.

183
00:11:35,900 --> 00:11:38,010
So that's the problem there--

184
00:11:38,010 --> 00:11:46,610
minimize over
orthogonal matrix--

185
00:11:46,610 --> 00:11:50,870
matrices Q. And I just
want to get my notation

186
00:11:50,870 --> 00:11:52,220
to be consistent here.

187
00:11:54,950 --> 00:11:55,450
OK.

188
00:11:59,670 --> 00:12:04,980
So I've-- I see that starting
with the y's and mapping them

189
00:12:04,980 --> 00:12:07,200
to x's--

190
00:12:07,200 --> 00:12:09,790
so let me ask the question.

191
00:12:09,790 --> 00:12:15,810
What orthogonal matrix Q
multiplies the y's to come

192
00:12:15,810 --> 00:12:18,270
as close as possible to the x's?

193
00:12:18,270 --> 00:12:22,740
So over all
orthogonal Q's I want

194
00:12:22,740 --> 00:12:30,420
to minimize YQ minus X
in the Frobenius norm.

195
00:12:30,420 --> 00:12:33,100
And I might as well square it.

196
00:12:33,100 --> 00:12:37,560
So Frobenius-- we're
into the Frobenius norm.

197
00:12:37,560 --> 00:12:40,120
Remember the-- of a matrix?

198
00:12:45,850 --> 00:12:50,930
This is a very convenient
norm in data science,

199
00:12:50,930 --> 00:12:52,700
to measure the size of a matrix.

200
00:12:52,700 --> 00:12:56,030
And we have several
possible formulas for it.

201
00:12:56,030 --> 00:13:02,900
So let me call the matrix A.
And the Frobenius norm squared--

202
00:13:02,900 --> 00:13:05,570
so what's one
expression, in terms

203
00:13:05,570 --> 00:13:08,350
of the entries of the matrix--

204
00:13:08,350 --> 00:13:11,470
the numbers Aij in the matrix?

205
00:13:11,470 --> 00:13:16,370
The Frobenius norm just
treats it like a long vector.

206
00:13:16,370 --> 00:13:20,300
So it's a11 squared,
plus a12 squared,

207
00:13:20,300 --> 00:13:27,440
of all the way along the
first plus second row, just--

208
00:13:32,800 --> 00:13:35,460
I'll say nn squared.

209
00:13:35,460 --> 00:13:36,650
OK.

210
00:13:36,650 --> 00:13:39,730
Sum of all the squares--

211
00:13:39,730 --> 00:13:42,370
just treating it
like a long vector.

212
00:13:42,370 --> 00:13:43,450
OK.

213
00:13:43,450 --> 00:13:47,860
This-- but that's a awkward
expression to write down.

214
00:13:47,860 --> 00:13:51,900
So what other ways
do we have to find

215
00:13:51,900 --> 00:13:56,485
the Frobenius norm of a matrix?

216
00:13:59,690 --> 00:14:01,030
Let's see.

217
00:14:01,030 --> 00:14:08,560
I could look at this as A
transpose A. Is that right?

218
00:14:08,560 --> 00:14:13,870
A transpose A. So what
what's happening there?

219
00:14:13,870 --> 00:14:19,690
Remind me what-- yeah.

220
00:14:19,690 --> 00:14:21,400
I would get all that.

221
00:14:21,400 --> 00:14:28,750
I would get all these by taking
the matrix A transpose times

222
00:14:28,750 --> 00:14:31,200
A. But what--

223
00:14:31,200 --> 00:14:31,700
sorry.

224
00:14:31,700 --> 00:14:38,110
I'm not-- I haven't--

225
00:14:38,110 --> 00:14:44,060
I've lost my thread
of talk here.

226
00:14:44,060 --> 00:14:47,960
So here's-- oh, and then I
take the trace, of course.

227
00:14:47,960 --> 00:14:51,320
So that first row--

228
00:14:51,320 --> 00:14:58,250
first column of A times
that one will give me the--

229
00:14:58,250 --> 00:15:00,650
one set of squares.

230
00:15:00,650 --> 00:15:03,950
And then, that one times
the other, and the next one,

231
00:15:03,950 --> 00:15:07,310
will give me the next
set of squares, right?

232
00:15:07,310 --> 00:15:08,870
So this is going to--

233
00:15:08,870 --> 00:15:10,490
if I look at the trace--

234
00:15:10,490 --> 00:15:13,350
so now, let me.

235
00:15:13,350 --> 00:15:17,090
So I just want to look
at the diagonal here.

236
00:15:17,090 --> 00:15:20,240
So it's the trace.

237
00:15:20,240 --> 00:15:23,950
You remember, the
trace of a matrix--

238
00:15:23,950 --> 00:15:30,220
of a matrix M is the sum
down the diagonal M11,

239
00:15:30,220 --> 00:15:34,430
M22, down to Mnn.

240
00:15:34,430 --> 00:15:38,800
It's the diagonal sum.

241
00:15:44,300 --> 00:15:47,880
And-- everybody
with me here now?

242
00:15:47,880 --> 00:15:51,600
So that term on the diagonal--

243
00:15:51,600 --> 00:15:54,860
A transpose A--
gives me all of that.

244
00:15:54,860 --> 00:16:00,020
Then-- or maybe I should
be doing AA transpose.

245
00:16:00,020 --> 00:16:02,520
The point is, it doesn't matter.

246
00:16:02,520 --> 00:16:06,700
Or the trace of AA transpose.

247
00:16:06,700 --> 00:16:12,020
That would be-- those would
both give the correct Frobenius

248
00:16:12,020 --> 00:16:12,665
norm squared.

249
00:16:15,740 --> 00:16:20,010
So traces are going to come
into this little problem.

250
00:16:20,010 --> 00:16:22,560
Now there's another formula
for the Frobenius norm--

251
00:16:22,560 --> 00:16:24,180
even shorter--

252
00:16:24,180 --> 00:16:26,940
well, certainly
shorter than this one--

253
00:16:26,940 --> 00:16:28,800
involving a sum of squares.

254
00:16:28,800 --> 00:16:31,380
And what's that one?

255
00:16:31,380 --> 00:16:36,170
What's the other way
to get the same answer?

256
00:16:36,170 --> 00:16:37,760
If I look at the SVD--

257
00:16:37,760 --> 00:16:40,070
look at singular values.

258
00:16:40,070 --> 00:16:46,040
I think that this is also
equal to the sum square of all

259
00:16:46,040 --> 00:16:46,895
the singular values.

260
00:16:52,850 --> 00:16:59,140
So it's three nice expressions
for the Frobenius norm.

261
00:16:59,140 --> 00:17:03,190
The nice ones involve A
transpose A, or AA transpose.

262
00:17:03,190 --> 00:17:06,089
And of course, that connects
to the singular values,

263
00:17:06,089 --> 00:17:09,390
because what are-- what's the
connection between singular

264
00:17:09,390 --> 00:17:10,800
values and those--

265
00:17:10,800 --> 00:17:12,089
and these guys--

266
00:17:12,089 --> 00:17:15,069
A transpose A, or AA transpose?

267
00:17:15,069 --> 00:17:19,140
The singular values are the--

268
00:17:19,140 --> 00:17:23,606
or the singular values
squared are the--

269
00:17:23,606 --> 00:17:24,560
AUDIENCE: Eigenvalues.

270
00:17:24,560 --> 00:17:27,020
GILBERT STRANG: Eigenvalues
of A transpose A.

271
00:17:27,020 --> 00:17:30,110
And then when I
add up the trace,

272
00:17:30,110 --> 00:17:34,370
I'm adding up the
eigenvalues and that's the--

273
00:17:34,370 --> 00:17:39,080
that gives me the
Frobenius norm squared.

274
00:17:39,080 --> 00:17:39,940
So this is a--

275
00:17:43,490 --> 00:17:46,370
that tells us
something important,

276
00:17:46,370 --> 00:17:49,870
which we can see in
different ways, that the--

277
00:17:49,870 --> 00:17:52,640
so to solve this
problem, we're going

278
00:17:52,640 --> 00:17:59,410
to need various facts, like
the QA in the Frobenius norm

279
00:17:59,410 --> 00:18:03,200
is the same as A in
the Frobenius norm.

280
00:18:03,200 --> 00:18:05,480
Why is that?

281
00:18:05,480 --> 00:18:05,980
Why?

282
00:18:12,440 --> 00:18:15,290
So here I'm multiplying
every column

283
00:18:15,290 --> 00:18:18,840
by the matrix Q. What happens
to the length of the column

284
00:18:18,840 --> 00:18:20,210
when I multiply it by q?

285
00:18:20,210 --> 00:18:21,377
AUDIENCE: It doesn't change.

286
00:18:21,377 --> 00:18:22,700
GILBERT STRANG: Doesn't change.

287
00:18:22,700 --> 00:18:26,450
So I could add up the length
of the columns all squared.

288
00:18:26,450 --> 00:18:29,130
Here I wrote it
in terms of rows.

289
00:18:29,130 --> 00:18:33,530
But I could have reordered that,
and got it in terms of columns.

290
00:18:33,530 --> 00:18:44,420
That's because the length of
Q times any vector squared

291
00:18:44,420 --> 00:18:48,890
is the same as the
vector squared.

292
00:18:48,890 --> 00:18:56,600
And these-- take these
to be the columns of A.

293
00:18:56,600 --> 00:19:01,500
So for column by column,
the multiplication by Q

294
00:19:01,500 --> 00:19:03,480
doesn't change the length.

295
00:19:03,480 --> 00:19:07,800
And then when I add up
all the columns squared,

296
00:19:07,800 --> 00:19:10,860
I get the Frobenius
norm squared.

297
00:19:10,860 --> 00:19:13,440
And another way to say it--

298
00:19:13,440 --> 00:19:17,070
let's make that connection
between this fact--

299
00:19:17,070 --> 00:19:20,460
that Q didn't change
the Frobenius norm--

300
00:19:20,460 --> 00:19:24,240
and this fact, that the
Frobenius norm is expressed

301
00:19:24,240 --> 00:19:26,580
in terms of the sigmas.

302
00:19:26,580 --> 00:19:30,520
So what does Q do to the sigmas?

303
00:19:30,520 --> 00:19:35,260
I want to see in another
way the answer to why.

304
00:19:35,260 --> 00:19:38,590
So if I have a matrix
A with singular values,

305
00:19:38,590 --> 00:19:40,000
I multiply by Q--

306
00:19:40,000 --> 00:19:42,832
what happens to the
singular values?

307
00:19:42,832 --> 00:19:43,790
AUDIENCE: Don't change.

308
00:19:43,790 --> 00:19:45,140
GILBERT STRANG: Don't change.

309
00:19:45,140 --> 00:19:46,010
Don't change.

310
00:19:46,010 --> 00:19:49,760
That's the key point
about singular values.

311
00:19:49,760 --> 00:19:57,530
If I multiply-- so A has a
SVD, U sigma V transpose.

312
00:19:57,530 --> 00:20:04,800
And QA will have the SVD
QU sigma V transpose.

313
00:20:04,800 --> 00:20:07,160
So all I've changed
when I multiply by Q--

314
00:20:07,160 --> 00:20:10,670
all I changed was
the first factor--

315
00:20:10,670 --> 00:20:14,930
the first orthogonal
factor in the SVD.

316
00:20:14,930 --> 00:20:17,190
I didn't change the sigmas.

317
00:20:17,190 --> 00:20:19,130
They're still sitting there.

318
00:20:19,130 --> 00:20:22,220
So-- and of course, I could
do also A on the other side--

319
00:20:22,220 --> 00:20:25,970
different Q. Same Q or a
different Q on the other side

320
00:20:25,970 --> 00:20:29,360
would show up here, and
would not change the sigmas,

321
00:20:29,360 --> 00:20:32,420
and therefore would not
change the Frobenius norm.

322
00:20:32,420 --> 00:20:36,590
So these are
important properties

323
00:20:36,590 --> 00:20:38,480
of this Frobenius norm.

324
00:20:38,480 --> 00:20:44,240
It's a-- it looks messy to
write down in that form,

325
00:20:44,240 --> 00:20:49,480
but it's much nicer in these
forms and in that form.

326
00:20:49,480 --> 00:20:50,860
OK.

327
00:20:50,860 --> 00:20:53,310
So now, if I can just--

328
00:20:53,310 --> 00:20:58,370
then we saw that
it involves traces.

329
00:20:58,370 --> 00:21:02,345
So let me make a few
observations about traces.

330
00:21:09,180 --> 00:21:15,090
So I'll just-- we want to
be able to play with traces,

331
00:21:15,090 --> 00:21:18,310
and that's something
we really haven't done.

332
00:21:18,310 --> 00:21:24,450
Here's a fact-- that the
trace of A transpose B

333
00:21:24,450 --> 00:21:33,210
is equal to the trace
of B transpose A.

334
00:21:33,210 --> 00:21:38,700
Of course, if B
is A, it's clear,

335
00:21:38,700 --> 00:21:45,120
and it's equal to the
trace of BA transpose.

336
00:21:49,000 --> 00:21:55,340
So even do little
changes in your matrix

337
00:21:55,340 --> 00:21:57,530
without changing the trace.

338
00:21:57,530 --> 00:21:59,900
Let's see why one
of these is true.

339
00:21:59,900 --> 00:22:02,420
Why is that first
statement true?

340
00:22:09,130 --> 00:22:13,166
How is that matrix
related to this matrix?

341
00:22:13,166 --> 00:22:14,630
AUDIENCE: [INAUDIBLE] transpose.

342
00:22:14,630 --> 00:22:16,940
GILBERT STRANG: It's
just a transpose.

343
00:22:16,940 --> 00:22:19,730
If I take the transpose of
that matrix, I get that.

344
00:22:19,730 --> 00:22:22,120
So what happens to the trace?

345
00:22:22,120 --> 00:22:23,760
I'm adding down the diagonal.

346
00:22:23,760 --> 00:22:26,300
The transpose has no effect.

347
00:22:26,300 --> 00:22:32,840
Clearly, this is just a fact
that the trace doesn't change--

348
00:22:32,840 --> 00:22:35,780
is not changed when
you transpose a matrix,

349
00:22:35,780 --> 00:22:38,060
because the diagonal
is not changed.

350
00:22:38,060 --> 00:22:41,060
Now what about this guy?

351
00:22:41,060 --> 00:22:44,210
I guess we're getting back
to old fashioned 18.065,

352
00:22:44,210 --> 00:22:47,390
remembering facts
about linear algebra,

353
00:22:47,390 --> 00:22:49,580
because this is a
pure linear algebra.

354
00:22:49,580 --> 00:22:51,170
So what's this one about?

355
00:22:51,170 --> 00:22:55,570
This says that I can reverse
the order of two matrices.

356
00:22:55,570 --> 00:23:00,510
So I'm now looking at the
connection between those two.

357
00:23:00,510 --> 00:23:05,870
And so let me just-- to
use different letters--

358
00:23:05,870 --> 00:23:11,560
CD equals the trace of DC.

359
00:23:11,560 --> 00:23:14,540
I can flip the order.

360
00:23:14,540 --> 00:23:16,130
That's all I've done here is.

361
00:23:16,130 --> 00:23:18,980
I've reversed B
with A transpose.

362
00:23:18,980 --> 00:23:23,450
I reversed C with D.
So why is that true?

363
00:23:23,450 --> 00:23:24,800
Why is that true?

364
00:23:28,030 --> 00:23:33,100
Well, how shall we see
the truth of that fact?

365
00:23:33,100 --> 00:23:35,560
So these are really
convenient facts,

366
00:23:35,560 --> 00:23:40,180
that make a lot of people use
the trace more often than we

367
00:23:40,180 --> 00:23:41,770
have in 18.065.

368
00:23:41,770 --> 00:23:45,910
I'm not a big user of
arguments based on trace,

369
00:23:45,910 --> 00:23:52,430
but these are identities that go
a long way with many problems.

370
00:23:52,430 --> 00:23:55,850
So let's see why that's true.

371
00:23:55,850 --> 00:23:57,640
Any time you think
about trace, you've

372
00:23:57,640 --> 00:24:01,090
got two languages to use.

373
00:24:01,090 --> 00:24:02,680
You can use the eigenvalues.

374
00:24:02,680 --> 00:24:05,320
It's the sum of the eigenvalues.

375
00:24:05,320 --> 00:24:07,480
Or you can use the
diagonal entries,

376
00:24:07,480 --> 00:24:09,760
because it's the sum of
the diagonal entries.

377
00:24:09,760 --> 00:24:11,980
Let's use eigenvalues.

378
00:24:11,980 --> 00:24:14,740
How are the eigenvalues
of CD related

379
00:24:14,740 --> 00:24:17,550
to the eigenvalues of DC?

380
00:24:17,550 --> 00:24:19,800
They're the same.

381
00:24:19,800 --> 00:24:21,720
If these matrices
are rectangular,

382
00:24:21,720 --> 00:24:24,240
then there might be some
extra zero eigenvalues,

383
00:24:24,240 --> 00:24:26,880
because they would
have different shapes.

384
00:24:26,880 --> 00:24:30,060
But zeros are not going
to affect the trace.

385
00:24:30,060 --> 00:24:35,590
So this is the same
nonzero eigenvalues.

386
00:24:42,020 --> 00:24:43,730
OK.

387
00:24:43,730 --> 00:24:44,990
And so on.

388
00:24:44,990 --> 00:24:45,960
Yeah.

389
00:24:45,960 --> 00:24:46,910
OK.

390
00:24:46,910 --> 00:24:54,930
Let me just-- let me try
to tell you the steps

391
00:24:54,930 --> 00:24:59,770
now to get the correct Q. And
let me tell you the answer

392
00:24:59,770 --> 00:25:00,270
first.

393
00:25:04,410 --> 00:25:11,240
And I'm realizing that all
important question four--

394
00:25:11,240 --> 00:25:14,030
does deep learning
actually work?

395
00:25:14,030 --> 00:25:15,740
We're going to run
out of time today,

396
00:25:15,740 --> 00:25:18,170
because we only have
a few minutes left.

397
00:25:18,170 --> 00:25:21,320
I suggest we bring
that question back up,

398
00:25:21,320 --> 00:25:26,870
because it's pretty
important to a lot of people.

399
00:25:26,870 --> 00:25:32,270
There's-- I had lunch with
Professor Edelman, and he said,

400
00:25:32,270 --> 00:25:37,160
you know, deep learning and
neural nets have had a record

401
00:25:37,160 --> 00:25:44,390
amount of publicity and hype
for sort of computational

402
00:25:44,390 --> 00:25:45,830
algorithm.

403
00:25:45,830 --> 00:25:49,950
And-- but I had--

404
00:25:49,950 --> 00:25:55,860
I've had people now tell
me that typical first--

405
00:25:55,860 --> 00:25:59,300
if you create a network--

406
00:25:59,300 --> 00:26:04,690
using Alex's design,
for example--

407
00:26:04,690 --> 00:26:09,590
the chances are that it
won't be successful--

408
00:26:09,590 --> 00:26:15,620
that the successful networks
have been worked on,

409
00:26:15,620 --> 00:26:17,310
and experimented with.

410
00:26:17,310 --> 00:26:22,220
And a good structure has
emerged, but didn't--

411
00:26:22,220 --> 00:26:23,520
wasn't there at the start.

412
00:26:23,520 --> 00:26:27,350
So I think that's
a topic for Monday.

413
00:26:27,350 --> 00:26:31,370
And I'm really just
realizing, from talking

414
00:26:31,370 --> 00:26:37,190
to people in the field, that
it's by no means automatic.

415
00:26:37,190 --> 00:26:42,840
That structure-- even if you
put in a whole bunch of layers--

416
00:26:42,840 --> 00:26:45,350
it may not be what you want.

417
00:26:45,350 --> 00:26:46,160
OK.

418
00:26:46,160 --> 00:26:50,790
So I'm-- let me finish
this argument today.

419
00:26:50,790 --> 00:26:52,310
Let me give you the answer.

420
00:26:52,310 --> 00:26:54,140
So what's the good Q?

421
00:26:54,140 --> 00:27:04,140
I have matrices Y and X. And
the idea is that I take it--

422
00:27:04,140 --> 00:27:07,310
I look at Y transpose
X. So that'll

423
00:27:07,310 --> 00:27:10,370
be all the dot products
of one set of vectors

424
00:27:10,370 --> 00:27:11,630
or the other set of vectors.

425
00:27:11,630 --> 00:27:12,860
That's a matrix.

426
00:27:12,860 --> 00:27:14,470
And I do its SVD--

427
00:27:14,470 --> 00:27:16,520
U sigma V transpose.

428
00:27:19,900 --> 00:27:21,450
So multiply this.

429
00:27:23,990 --> 00:27:27,840
Multiply Y-- the two
bases that you're given.

430
00:27:27,840 --> 00:27:30,650
Of course, if Y
was the same as X--

431
00:27:30,650 --> 00:27:32,750
if it was an orthogonal basis--

432
00:27:32,750 --> 00:27:36,050
you'd have the
identity, no questions.

433
00:27:36,050 --> 00:27:37,970
But generally, we have--

434
00:27:37,970 --> 00:27:39,980
it has an SVD.

435
00:27:39,980 --> 00:27:44,840
And we're looking for
a orthogonal matrix

436
00:27:44,840 --> 00:27:49,055
of the best Q is--

437
00:27:51,670 --> 00:27:52,510
Da dun da duh.

438
00:27:52,510 --> 00:28:01,830
I mean, it's the right time
for expressions of amazement.

439
00:28:01,830 --> 00:28:03,690
It is UV transpose.

440
00:28:07,380 --> 00:28:08,130
OK.

441
00:28:08,130 --> 00:28:12,270
So that gives us the answer.

442
00:28:12,270 --> 00:28:17,140
We're given X and Y. We're
looking for the best Q.

443
00:28:17,140 --> 00:28:21,280
And the answer comes in
the simplest possible way.

444
00:28:21,280 --> 00:28:24,880
Compute Y transpose
X. Compute its SVD,

445
00:28:24,880 --> 00:28:29,340
and use the orthogonal
matrices from the SVD.

446
00:28:29,340 --> 00:28:29,840
Yeah.

447
00:28:29,840 --> 00:28:33,590
And I'm out of time so proof--

448
00:28:36,240 --> 00:28:41,520
it's [INAUDIBLE] line later--

449
00:28:41,520 --> 00:28:45,180
either to just send
you the section online,

450
00:28:45,180 --> 00:28:47,430
or to discuss it
in class Monday.

451
00:28:47,430 --> 00:28:53,730
But I'm really planning Monday
to start with question 4.

452
00:28:53,730 --> 00:28:56,910
And meanwhile to ask a
whole lot of people--

453
00:28:56,910 --> 00:29:00,810
everybody I can find--

454
00:29:00,810 --> 00:29:04,560
about that important
question, is--

455
00:29:04,560 --> 00:29:06,550
does deep learning usually work?

456
00:29:06,550 --> 00:29:09,375
How-- what can you do
to make sure it works,

457
00:29:09,375 --> 00:29:13,080
or give yourself a better
chance to have it work?

458
00:29:13,080 --> 00:29:15,560
So let's-- that's
up for Monday then.

459
00:29:15,560 --> 00:29:17,110
Good.