MICHALE FEE: So for the next few lectures, we're going to be looking at developing methods of studying the computational properties of networks of neurons. This is the outline for the next few lectures. Today we are going to introduce a method of studying networks called a rate model, where we basically replace spike trains with firing rates in order to develop simple mathematical descriptions of neural networks. And we're going to start by applying that technique to the problem of studying feed-forward neural networks. And we'll introduce the idea of perceptrons as a method of developing networks that can classify their inputs.

Then in the next lecture, we're going to turn largely to describing mathematical tools based on matrix operations and the idea of basis sets. Matrix operations are very important for studying neural networks. But they're also a fundamental tool for analyzing data and doing things like reducing the dimensionality of high-dimensional data sets, including methods such as principal components analysis. So it's a very powerful set of methods that applies both to studying the brain and to analyzing the data that we get when we study the brain.

And then finally we'll turn to a few lectures that focus on recurrent neural networks. These are networks where the neurons connect to each other densely in a recurrent way, meaning a neuron will connect to another neuron, and that neuron will connect back to the first neuron. Networks that have that property have very interesting computational abilities. And we're going to study that in the context of line attractors, short-term memory, and Hopfield networks.

So for today, the plan is to develop the rate model. We're going to show how we can build receptive fields with feed-forward networks that we've described with the rate model. We're going to take a little detour and describe vector notation and vector algebra, which is very important for these models, and also for building up to the matrix methods that we'll talk about in the next lecture.
Again, we'll talk about neural networks for classification and introduce the idea of a perceptron. So that's for today.

So I've already talked about most of this. Why is it that we want to develop a simplified mathematical model of neurons that we can study analytically? Well, the reason is that we can really develop our intuition about how networks work. And that intuition applies not just to the very simplified mathematical model that we're developing, but also applies more broadly to real networks with real neurons that actually generate spikes and interact with each other by the more complex biophysical mechanisms that are going on in the brain. A good example of this is how we simplified the detailed spiking neurons of the Hodgkin-Huxley model and approximated that as an integrate-and-fire model, which simplifies things enough to develop an intuition, but captures a lot of the important properties of real neural circuits.

All right, so let's start by developing the basic idea of a rate model. Let's start with two neurons. We have an input neuron and an output neuron. The input neuron has some firing rate given by u, and the output neuron has some firing rate given by v. So we're going to essentially ignore the times of the spikes and describe the inputs and outputs of this network just with firing rates. You can think of the rate as just having units of spikes per second, for example. Those neurons, the input neuron and the output neuron, are connected to each other by a synapse. And we're going to replace all of the complex structure of synapses-- vesicle release, neurotransmitter receptors, long-term depression and paired-spike facilitation and depression-- all that stuff we're just going to ignore. And we're going to replace that synapse with a synaptic weight w.
Just to give you the simplest intuition of how a rate model works, there are models where we can just treat the firing rate of the output neuron, for example, as linear in its input. And we can simplify this even to the point where we can describe the firing rate of the output neuron as the synaptic weight w times the firing rate of the input neuron. So that's just to give you a flavor of where we're heading. And I'm going to justify how and why we can do this. And then we're going to build this up from the case of one input neuron and one output neuron to the case where we can have many input neurons and many output neurons.

So how do we justify going from spikes to firing rates? Remember that the response of a real output neuron to a single spike at its input is some change in the postsynaptic conductance that follows an input spike. And in our model of a synapse, we described that the input spike produces a transient increase in the synaptic conductance. We modeled that synaptic conductance as a simple step increase in the conductance followed by an exponential decay, as the neurotransmitter gradually unbinds from the neurotransmitter receptors. So we have a transient change in the synaptic conductance that's just a maximum conductance times an exponential decay.

Now remember that we can write down the postsynaptic current that results from this synaptic input as the synaptic conductance times V minus E_synapse, the synaptic reversal potential. In moving forward in this model, we're not going to worry about synaptic saturation. So we're just going to imagine that the synaptic current is just proportional to the synaptic conductance. So now we can write the conductance as just some weight times a kernel of unit area. What we've done here is we've taken the synaptic current and written it as a constant, a synaptic weight, times an exponentially decaying kernel of area 1.
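[As a concrete aside, here is a minimal MATLAB sketch of that unit-area exponential kernel; the time step dt and time constant tau are assumed values, not from the lecture:]

```matlab
% Unit-area exponential kernel: k(t) = (1/tau)*exp(-t/tau) for t >= 0.
dt  = 0.001;                 % time step, seconds (assumed)
tau = 0.020;                 % synaptic time constant, 20 ms (assumed)
t   = 0:dt:0.2;              % kernel support, long enough for the decay
k   = (1/tau) * exp(-t/tau);

% The kernel should integrate to approximately 1.
disp(sum(k)*dt)
```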
So now if we have a train of spikes at the input instead of a single spike, we can write down that spike train as a sum of delta functions, where the spike times are t sub i. And if you want to plot the synaptic current as a function of time, you would just take that spike train input and do what with that linear kernel? We would convolve it, right? So we would take that spike train, convolve it with that little exponential kernel. And that would give us the synaptic current that results from that spike train.

So let's think for a moment about what this quantity is right here. What is k, this little kernel that has a step and then an exponential decay? What do you get when you convolve that kind of smooth kernel with this spike train here? What does that look like? We did that at one point in class, when we were talking about how you would estimate something from a spike train. What is that? What is that quantity right there? It's sort of a smoothed version of a spike train, which is how you would calculate what, Habiba?

AUDIENCE: Is it a window for the spike train?

MICHALE FEE: Yeah. It's windowed, but what is it that you are calculating when you take a spike train and you convolve it with some smooth window?

AUDIENCE: A low-pass version?

MICHALE FEE: It's like a low-pass version of the spike train. And remember, in the lecture on firing rates, we talked about how that's a good way to get a time-dependent estimate of the firing rate of a neuron. We take the spike train and just convolve it with a smooth window. And if the area of that smooth window is 1, then what we're doing is estimating the firing rate of the neuron as a function of time. Does that make sense? Yes?

AUDIENCE: So k is just a kernel?

MICHALE FEE: k is just a smooth kernel that happens to have this exponential shape.
AUDIENCE: Is it like [INAUDIBLE]

MICHALE FEE: Well, that's our model for how a synapse-- basically, what I'm saying is that when you take a spike train and put it through a synapse, what comes out the other end is a smoothed version of the spike train.

AUDIENCE: OK.

MICHALE FEE: That's all this is saying.

AUDIENCE: OK. [INAUDIBLE] they have this area or quantity?

MICHALE FEE: Yep. You remember that if k has an area of 1, then when you convolve that kernel with the spike train, you get a number that has units of spikes per second. And that quantity is an estimate of the local firing rate of the neuron. Does that make sense?

So basically, we can take this spike train, and by convolving it with a smooth window, we can estimate the number of spikes per second in that window. So what do we have here? We have that the current is just a constant times an estimate of the firing rate at that time. If k is a smooth kernel with an area normalized to 1, then this quantity is just an estimate of the firing rate.

So let's take a look at that. Here I have just made a sample spike train with a bunch of spikes that look like they're increasing in firing rate and then decreasing in firing rate. If we take that spike train and convolve it with this kernel, you can see that you get this sort of broad bump that is higher in the middle, where the firing rate is higher, and lower at the edges, where the firing rate is lower.

So the point is that you can take a spike train and put it into a neuron. The response of the neuron is a smooth, low-pass version of the rate of this input spike train. And so you can think about writing down the input to this neuron as a weight times the firing rate of the input. So that was a way of writing down the current input to this output neuron from the input neuron. Now what is the firing rate of the output neuron in response to that current injection?
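[Before turning to that question, a minimal MATLAB sketch of the smoothing step just demonstrated; the rate profile and spike times are invented for illustration:]

```matlab
% Toy spike train whose underlying rate rises and then falls.
dt = 0.001;  T = 1.0;  t = 0:dt:T-dt;
rate   = 50 * sin(pi*t/T).^2;              % underlying rate, spikes/s
spikes = double(rand(size(t)) < rate*dt);  % 0/1 spike train, Poisson-like

% Smooth the spike train with the unit-area exponential kernel.
tau = 0.020;  tk = 0:dt:0.2;
k = (1/tau) * exp(-tk/tau);
r_est = conv(spikes, k);
r_est = r_est(1:numel(t));                 % keep the causal part

plot(t, rate, t, r_est);                   % the estimate tracks the rate
legend('true rate', 'smoothed estimate');
```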
So that's what we're going to ask next. You can remember that when we talked about the integrate-and-fire model, we saw that neurons, in the approximation of large inputs, have a firing rate as a function of current that looks like this. It's zero for inputs below the threshold current: for input currents that aren't large enough to drive the neuron to threshold, the neuron doesn't spike at all. And then above that threshold, the neuron fires approximately linearly at higher input currents.

So the way that we think about this is that the input neuron is spiking at some rate. It goes through a synapse. That synapse smooths the input and produces some current in the postsynaptic neuron that's approximately proportional to the firing rate of the input neuron. And the output neuron has some output firing rate that's some function of the input current.

So we can write down the firing rate of our output neuron, v. It's just equal to some function of the input current, which is just some function of w times the firing rate of the input neuron. And that right there is the basic equation of the rate model. The output firing rate is some function of a weight times the firing rate of the input neuron.

And everything else about the rate model is just that different rate models have different numbers of input neurons, where we have more than one contribution to the input current. They can have many output neurons. They can have different FI curves for the output neurons-- some of them are non-linear like this, some of them are linear. And we're going to come back and talk about the function of different FI curves and why different FI curves are useful. Any questions about this? That's the basic idea.

All right, good. So let's take one particularly simple version of the rate model called a linear rate model. The linear rate model has a particular FI curve.
That FI curve says that the firing rate of the neuron is linear in the input current. Now why is this a really weird model of a neuron? What's fundamentally non-biological about this?

AUDIENCE: Negative firing rate.

MICHALE FEE: I'm hearing a bunch of right answers at the same time.

AUDIENCE: Negative firing rate.

MICHALE FEE: This neuron is allowed to fire at a negative firing rate if the input current is negative. That's a pretty crazy thing to do. Why do you think we would want to do that?

AUDIENCE: [INAUDIBLE]?

MICHALE FEE: Well, no, actually we do. So you can have inhibitory inputs that produce outward currents that hyperpolarize the neuron. Any thoughts about that?

It turns out that as soon as your output neurons have this kind of FI curve, a linear FI curve, the math becomes super simple. You can write down very complex networks of neurons with a bunch of linear differential equations. And it becomes very easy to write down the solution for how a network behaves as a function of its inputs. And we're going to spend a lot of time working with network models that have linear FI curves, because you can develop a lot of intuition about how networks behave by using models like this. As soon as you have non-linear models, you can't solve the behavior of the network analytically. You have to do everything on the computer. And it becomes very hard to derive general solutions for how things behave. So we're going to use this linear model a lot.

In this case again, for the case of this two-neuron network where we have one output neuron that receives a synaptic input from an input neuron, the firing rate of the output neuron is just w, the synaptic weight, times the firing rate of the input neuron. And we're going to come back to non-linear neurons, because that non-linearity actually does really important things. And we're going to talk about what that does.
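[A small MATLAB sketch contrasting the two FI curves just discussed; the gain and threshold values are arbitrary:]

```matlab
% Two FI curves: linear, and threshold-linear (rectified).
I = -2:0.01:5;                            % input current (arbitrary units)
gain = 1;                                 % assumed gain
I_thresh = 1;                             % assumed threshold current

v_linear = gain * I;                      % linear model: can go negative
v_rect   = gain * max(I - I_thresh, 0);   % zero below threshold

plot(I, v_linear, I, v_rect);
xlabel('input current I'); ylabel('firing rate v');
legend('linear', 'threshold-linear');
```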
So now let's look at the case where our output neuron has not just one input but many inputs from a bunch of input neurons. Here we have what we call an input layer-- a layer of neurons in the input layer. Each one of those neurons has a firing rate: u1, u2, u3, u4, u5. Each of those neurons sends a synapse onto our output neuron, and each one of those synapses has a synaptic weight. This weight is w1, and then w2, w3, w4, and w5.

Now you can see that the total input current to this output neuron is just going to be a sum of the inputs from each of the input neurons. So the total synaptic current into this neuron is w1 times u1, plus w2 times u2, plus w3 times u3, plus all the rest. So the response of our linear neuron-- the firing rate of our linear neuron-- is just a sum over all of those inputs. Again, in this case, we're going to say that the total input current to this neuron is the sum over these contributions. But then, because this is a linear neuron, the firing rate is just equal to that current input. Does that make sense?

So you can see that this description of the firing rate of the output neuron is a sum over all of those contributions. It turns out that this can actually be written in a much more compact way in vector notation. What does that look like? Does anyone know in vector notation what that looks like?

AUDIENCE: Dot product.

MICHALE FEE: That's a dot product. That's right. So in general, it's much easier to write these responses in vector notation. And so I'm just going to walk you through some basics of vector notation, for those of you who might need a few minutes of reminder. Actually, before we get to the vector notation, I just want to describe how we can use a simple network like this to build a receptive field.
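[As a quick aside first, a minimal MATLAB sketch of that weighted sum for a five-input network; the rates and weights are invented:]

```matlab
u = [10; 20; 5; 0; 15];                % input firing rates u1..u5 (invented)
w = [0.5; -0.2; 1.0; 0.3; -0.5];       % synaptic weights w1..w5 (invented)

% Total input as an explicit sum over the input neurons ...
v = 0;
for i = 1:numel(u)
    v = v + w(i) * u(i);
end

% ... which is exactly the dot product about to be introduced.
v_dot = w' * u;                        % same value as v
```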
So you remember that when we were talking about receptive fields of neurons, we described how a neuron can have a maximal response to a particular pattern of input. Let's say we have a neuron that's sensitive to visual inputs. And as a function of one dimension-- let's say along the retina-- this neuron has a big response if light comes in the central field, and some inhibitory response if light comes in outside of that central lobe.

Well, it turns out that a very simple way to build neurons that have receptive fields like this is to have an input layer that projects to the neuron with this receptive field, with a pattern of synaptic weights that corresponds to that pattern in the receptive field. So let's say these input neurons are neurons in the retina-- let's say retinal ganglion cells-- and this neuron is in the thalamus. We can build a thalamic neuron that has a center-surround receptive field like this by having, let's say, this central neuron make a strong, positive, excitatory synaptic weight onto our output neuron. You can see that if you have light here that corresponds to this neuron having a high firing rate, that neuron is very effective at driving the output neuron. And so the output neuron has a positive component of its receptive field right there in the middle. Now if this neuron here, which is in this part of the retina, has a negative weight onto the output neuron, then light coming in here, driving this neuron, will inhibit the output neuron.

So if you have a pattern of weights that looks like this-- 0, minus 1, 2, minus 1, 0-- then this neuron will have a receptive field that looks like that as a function of its inputs. So that's a one-dimensional example. And you can see that you write down the output here as a weighted sum of each one of those inputs. This also works for two-dimensional receptive fields.
For example, if we have input from the retina that looks like this, where we have-- I guess this was excitatory here in the center and inhibitory around-- you can make a neuron that has a two-dimensional receptive field like this by having inputs to this neuron from all of those different regions of the visual field, with different weights corresponding to positive in the center. So neurons in the center have positive synaptic weights onto the output neuron, and neurons around the edges have negative synaptic weights. So we can build any receptive field we want into a neuron by just putting in the right set of synaptic weights. Yes?

AUDIENCE: So would you rule out [INAUDIBLE]

MICHALE FEE: So in real life-- I assume you mean in the brain?

AUDIENCE: Yeah.

MICHALE FEE: So in the brain, we don't really know how these weights are built. One idea is that there are rules that control the development of these circuits-- let's say, connections of bipolar cells in the retina to retinal ganglion cells-- that control how these weights are determined to be positive or negative. Negative weights are implemented by bipolar cells connected to amacrine cells, which are inhibitory, and which then connect to the retinal ganglion cells. So there's a whole circuit that gets built in the retina that controls whether these weights are positive or negative. And those can be programmed by genetic developmental programs. They can also be controlled by experience with visual stimuli. So there's a lot we don't understand about how these weights are controlled or set up or programmed. But the way we think about how the receptive fields of these neurons emerge is by controlling the weights of those synaptic inputs. That's the message here-- that receptive fields emerge from the pattern of weights from an input layer onto an output layer.

AUDIENCE: [INAUDIBLE] how many [INAUDIBLE]

MICHALE FEE: If you're going to build a model, let's say, of the retina--
So it just depends on how realistic you want it to be. If you wanted to make a model of a retinal ganglion cell, you could try to build a model that has as many bipolar neurons as are actually in the receptive field of that retinal ganglion cell. Or you could make a simplified model that only has 10 or 100 neurons. It depends on what you want to study. All right, any other questions?

And again, even for these more complex models, you can still write down a simple rate model formulation of the firing rate of the output neuron. It's just a weighted sum of the input firing rates. Each neuron in the input layer fires at some rate and has a weight w. To get the contribution of a given neuron to the firing rate of the output neuron, you just take that input firing rate times the synaptic weight, and then add that up for all the input layer neurons.

So as I said, we've been describing the response of our linear neuron as this weighted sum. And that's a little bit cumbersome to carry around. So we're going to start using vector notation and matrix notation to describe networks. It's just much more compact. So we're going to take a little detour and talk about vectors.

A vector is just a collection of numbers. The number of numbers is called the dimensionality of the vector. If a vector has only two numbers, then we can just plot that vector in a plane. So for a 2D vector, if that vector has two components, x1 and x2, then we can plot that vector in the space of x1 and x2, with the origin at zero. In this case, the vector has two vector components, or elements, x1 and x2. And in two dimensions we describe that space as R2, the space of two real numbers. We can write down that vector as a row vector, x equals (x1, x2), or we can write it as a column vector, with x1 and x2 organized on top of each other.

Vector sums are very simple. If you have two vectors, x and y, you can write down the sum of x and y as x plus y. That's called the resultant.
x plus y can be written like this in column vector notation. You can see that the sum of x and y is just an element-by-element sum of the vector elements. It's called element-by-element addition.

Now let's look at vector products. There are multiple ways of taking the product of two vectors. There's an element-by-element product, an inner product, and an outer product, which we'll cover in later lectures. And there's also something called the cross product that's very common in physics. But I have not yet seen an application of a cross product to neuroscience. If anybody can find one of those, I'll give extra credit.

The element-by-element product is called a Hadamard product. So x times y is just the element-by-element product of the elements in the two vectors. In Matlab, you compute that element-by-element product by x dot star y.

The inner product, or dot product, looks like this. If we have two column vectors, the dot product of x and y is the sum of the element-by-element products. So x dot y is just x1 times y1, plus x2 times y2, and so on, plus xn times yn. And that's the sum that we saw earlier in our feed-forward network. OK. So notice that the dot product is a scalar. It's a single number. It's no longer a vector.

Dot products have some nice properties. They're commutative: x dot y is equal to y dot x. They're distributive: a vector w dotted into the sum of two vectors is just the sum of the two separate dot products, so w dot (x plus y) is just w dot x plus w dot y. And the dot product is also linear: (a x) dot y is equal to a times the quantity (x dot y). So if you have vectors x and y dotted into each other, and you make one of those vectors twice as long, then the dot product is just twice as big.

A little bit more about inner products. We can also write down the inner product in matrix notation. So x dot y is a matrix product of a row vector
and a column vector. You remember how to multiply two matrices: you multiply the elements of each row times the elements of each column. So you can see that this, in matrix notation, is just the dot product of those two vectors. In matrix notation, this is a 1 by n matrix times an n by 1 matrix-- 1 row by n columns, times n rows by 1 column. And that is equal to a 1 by 1 matrix, which is just a scalar.

All right, in Matlab, let me just show you how to write down these components. In this case, x is a column vector, a 3 by 1 column vector, and y is a 3 by 1 column vector. You can create those vectors like this. And z is x transpose times y. And so that's how you can write down the dot product of two vectors.

What is the dot product of a vector with itself? It's the square magnitude of the vector. So the square root of x dot x is just the norm, or magnitude, of the vector. And you can think of this as being analogous to the Pythagorean theorem: the length of the vector is the square root of the sum of the squares of its components.

So a unit vector is a vector that has length 1. A unit vector by definition has a magnitude of 1, which means its dot product with itself is 1. We can turn any vector into a unit vector by just taking that vector and dividing by its norm. I'm going to always use this notation with the little caret symbol to represent a unit vector. So if you see a vector with that little hat on it, that means it's a unit vector. You can express any vector as the product of a scalar-- a length-- times a unit vector in that direction.

We can find the projection, or component, of any vector in the direction of a unit vector as follows. If we have a unit vector x, we can find the projection of a vector y onto that unit vector x. How do we do that? We just find the normal projection of that vector. That distance right there is called the scalar projection of y onto x.
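[Expanding slightly on the Matlab example just described, a sketch of these operations; the vector values are arbitrary:]

```matlab
x = [1; 2; 3];        % a 3 by 1 column vector
y = [4; 5; 6];        % another 3 by 1 column vector

z = x' * y;           % dot (inner) product: 1*4 + 2*5 + 3*6 = 32
h = x .* y;           % element-by-element (Hadamard) product: [4; 10; 18]

nx    = sqrt(x' * x); % norm of x: the square root of x dot x
x_hat = x / nx;       % unit vector: x_hat' * x_hat equals 1
```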
If you write down the length of the vector y-- the norm of the vector y-- and the angle between y and x, then the dot product y dot x is just equal to the magnitude of y times the cosine of the angle between the two vectors. Just simple trigonometry.

We can also define what's called the vector projection of y onto x as follows. We just draw that same picture. We can find the projection of y onto x and write that as a vector. And that's just the scalar projection of y onto x times a unit vector in the x direction. So x actually is a unit vector in this example. So the vector projection of y onto x is just defined as (y dot x) times x. Any questions about that? I'm guessing most of you have seen all of this stuff already. But we're going to be using these things a lot, so I just want to make sure that we're all on the same page. And that's just a scalar times a unit vector.

Let me just give you a little bit of intuition about dot products here. A dot product is related to the cosine of the angle between two vectors, as we talked about before. The dot product is just the magnitude of x times the magnitude of y times the cosine of the angle between them. So the cosine of the angle between two vectors is just the dot product divided by the product of the magnitudes of the two vectors. And if x and y are unit vectors, the cosine of the angle between them is just the dot product of the unit vectors. So again, if x and y are unit vectors, then that dot product is just the cosine of the angle.

Orthogonality. Two vectors are orthogonal-- perpendicular-- if and only if their dot product is 0. So if we have two vectors x and y, they are orthogonal if the angle between them is 90 degrees. x dot y is just proportional to the cosine of the angle, and the cosine of 90 degrees is zero. So if two vectors are orthogonal, then their dot product will be zero. And if their dot product is zero, then they're orthogonal to each other.
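[A MATLAB sketch of the projection and orthogonality definitions above; the vectors are arbitrary:]

```matlab
x = [1; 0];  y = [3; 4];                    % arbitrary 2-D column vectors
x_hat = x / norm(x);                        % unit vector along x

s      = y' * x_hat;                        % scalar projection of y onto x
y_proj = s * x_hat;                         % vector projection of y onto x

cos_theta = (x' * y) / (norm(x) * norm(y)); % cosine of the angle between them

% Orthogonality: perpendicular vectors have dot product zero.
a = [1; 1];  b = [1; -1];
disp(a' * b)                                % prints 0
```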
And using the notation we just developed, the vector projection of y along x is the zero vector if those two vectors are orthogonal.

There is an intuition that one can develop in terms of the relation between the dot product and correlation. The dot product is related to the statistical correlation between the elements of those two vectors. So if you have vectors x and y, you can write down the cosine of the angle between those two vectors, again, as x dot y over the product of the norms. And if you write that out as sums, you can see that this is just the sum of the element-by-element products-- that's the dot product-- divided by the norm of x and the norm of y. And if you have taken a statistics class, you will recognize that as just the Pearson correlation of a set of numbers x and a set of numbers y. So the dot product is closely related to the correlation between two sets of numbers.

One other thing that I want to point out, coming back to the idea of using this feed-forward network as a way of building a receptive field: you can see that the response of a neuron in this model is just the dot product of the stimulus vector u-- the vector of input firing rates that represents the stimulus-- with the weight vector w. So the firing rate of the output neuron is just w dot u.

So you can see what this means: the firing rate of the output neuron will be high if there is a high degree of overlap between the pattern of the input and the pattern of synaptic weights from the input layer to the output neuron. We can see that w dot u is big when w and u are parallel-- highly correlated-- which means a neuron fires a lot when the stimulus matches the pattern of those synaptic weights.

Now, you can see that for a given amount of power in the stimulus-- the power is just the square magnitude of u-- the stimulus that has the best overlap with the receptive field, where the cosine of that angle is 1, produces the largest response.
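[A MATLAB sketch of that point, reusing the center-surround weights from the earlier one-dimensional example: for stimuli of equal power, the one parallel to the weight vector produces the largest response:]

```matlab
w = [0; -1; 2; -1; 0];             % weight vector (the receptive field)

% Two stimuli with the same power (same norm).
u_matched = w / norm(w);           % parallel to the weights
u_other   = [1; 0; 0; 0; 1];
u_other   = u_other / norm(u_other);   % same norm, different pattern

v_matched = w' * u_matched         % largest response for this power
v_other   = w' * u_other           % smaller response
```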
And so we now actually have a definition of the optimal stimulus of a neuron in terms of the pattern of synaptic weights. In other words, the optimal stimulus is one that's essentially proportional to the weight vector. Any questions so far?

All right, so now let's turn to the question of how we use neural networks to do some interesting computation. Classification is a very important computation that neural networks do in the brain, and also in the application of neural networks to technology.

So what does classification mean? How does the brain-- how does a neural circuit-- decide how to respond to a particular input? Let's say we see something that looks like we might eat it. How do the neural circuits in our brain decide whether that thing that we're seeing is something edible, or something that will make us sick, based on past experience? If we see something that looks like an animal or a dog, how do we know whether that's a friendly puppy or a wolf? These are classification problems.

And feed-forward circuits actually can be very good at classification. In fact, recent advances in training neural networks have resulted in feed-forward neural networks that approach human performance in their ability to make decisions like this.

All right. So basically, a feed-forward circuit that does classification typically has an input layer-- a bunch of inputs that represent the sensory stimulus-- and a bunch of output neurons that represent different categorizations of that input stimulus. So you can have a retinal input here, going to other layers of a network. And then at the end of that, you can have a neuron that starts firing when that input was a dog, or another neuron that starts firing when that input was a cat, or something else.

Now in general, classification networks that have just one input layer and one output layer can't do this problem.
You can't take a visual input and have connections to another layer of neurons such that one neuron just lights up when the picture the network is seeing is a dog, and another neuron lights up when it's a cat. Generally, there are many layers of neurons in between. But today, we're going to talk about a very simplified version of the classification problem and build up to the sorts of networks that can actually do those more complex problems.

So I just want to point out that our brains are obviously very good at recognizing things. We do this all the time. There are hundreds of objects in every visual scene, and we're able to recognize every one of those objects. But it turns out that there are individual neurons-- so in this case, I alluded to the idea that there are individual neurons in this network that light up when the sensory input is a dog, or light up when the input is an elephant. And it turns out that that's actually true in the brain.

There have recently been studies where it's been possible to record in parts of the human brain in patients that are undergoing brain surgery for the treatment of epilepsy or tumors or things like that, where you have to go in and find parts of the brain that are defective and parts of the brain that are healthy. That way, when you do a surgery, you can be very careful to operate only on the damaged parts of the brain and not impact parts of the brain that are healthy. So there are cases now, more and more commonly, where neuroscientists can work with neurosurgeons to actually record from neurons in the brain in these patients in preparation for surgery.

And so it's been possible to record from neurons in the brain. This was a study from Itzhak Fried's lab at UCLA. And this shows a recording in the right anterior hippocampus. What this lab did was to find neurons-- these were electrodes implanted in the brain-- and then they basically take these patients, show them thousands of pictures, and look at how their brains respond to different visual inputs.
782 00:46:40,132 --> 00:46:42,090 So let me just show you what you're looking at. 783 00:46:42,090 --> 00:46:46,320 These are just different pictures of celebrities. 784 00:46:46,320 --> 00:46:54,795 There's Luke Skywalker, Mother Teresa, and some others. 785 00:46:57,508 --> 00:46:59,550 This paper is getting old enough that you may not 786 00:46:59,550 --> 00:47:01,860 recognize most of these people. 787 00:47:01,860 --> 00:47:05,640 But if you record from neurons in the brain, 788 00:47:05,640 --> 00:47:06,950 you can see that-- 789 00:47:06,950 --> 00:47:10,245 so what do you see here? 790 00:47:10,245 --> 00:47:11,120 I think that's Oprah. 791 00:47:11,120 --> 00:47:14,820 The image is flashed up on the screen for about a second. 792 00:47:14,820 --> 00:47:17,430 You record this neuron spiking. 793 00:47:17,430 --> 00:47:19,900 Here you see a couple spikes. 794 00:47:19,900 --> 00:47:22,230 Here's when the image was actually presented. 795 00:47:22,230 --> 00:47:24,300 And here's where the image was turned off. 796 00:47:24,300 --> 00:47:26,053 You can see different trials. 797 00:47:26,053 --> 00:47:27,720 So this neuron actually had a little bit 798 00:47:27,720 --> 00:47:29,550 of a response right there, shortly 799 00:47:29,550 --> 00:47:33,130 after the stimulus was turned on. 800 00:47:33,130 --> 00:47:38,260 But you can see there's not that much response in these neurons. 801 00:47:38,260 --> 00:47:41,800 But when they flashed a different stimulus-- 802 00:47:41,800 --> 00:47:45,060 anybody know who that is? 803 00:47:45,060 --> 00:47:46,020 That's Halle Berry. 804 00:47:48,660 --> 00:47:51,480 Look at this neuron. 805 00:47:51,480 --> 00:47:52,890 Every time you show this picture, 806 00:47:52,890 --> 00:47:57,430 that neuron fires off a couple spikes very precisely. 807 00:47:57,430 --> 00:47:59,730 If you look at the histograms-- these 808 00:47:59,730 --> 00:48:04,200 are histograms underneath, showing the response as a function of time 809 00:48:04,200 --> 00:48:06,090 relative to the onset of the stimulus-- 810 00:48:06,090 --> 00:48:08,670 you can see that this neuron very reliably spikes. 811 00:48:08,670 --> 00:48:11,010 There's a different picture of Halle Berry. 812 00:48:11,010 --> 00:48:12,300 Neuron spikes. 813 00:48:12,300 --> 00:48:14,490 Different picture, neuron spikes. 814 00:48:14,490 --> 00:48:16,450 Another picture, neuron spikes. 815 00:48:19,890 --> 00:48:25,040 A line drawing of Halle Berry, the neuron spikes. 816 00:48:25,040 --> 00:48:30,380 Catwoman, the neuron spikes. 817 00:48:30,380 --> 00:48:33,760 The text "Halle Berry," the neuron spikes. 818 00:48:38,903 --> 00:48:39,445 It's amazing. 819 00:48:43,430 --> 00:48:48,710 So this group got a lot of press for this 820 00:48:48,710 --> 00:48:54,050 because they also found Jennifer Aniston neurons. 821 00:48:54,050 --> 00:48:57,490 They found neurons for other celebrities. 822 00:48:57,490 --> 00:49:00,570 Is this like some celebrity part of the brain? 823 00:49:00,570 --> 00:49:02,300 No, it's actually a part of the brain 824 00:49:02,300 --> 00:49:05,600 where you have neurons that have very sparse responses 825 00:49:05,600 --> 00:49:08,180 to a wide range of things. 826 00:49:08,180 --> 00:49:14,240 But they're extremely specific to particular people 827 00:49:14,240 --> 00:49:18,510 or categories or objects. 828 00:49:18,510 --> 00:49:24,860 And it actually is consistent with this old notion of what's 829 00:49:24,860 --> 00:49:26,960 called the grandmother cell.
830 00:49:26,960 --> 00:49:30,680 So back before people were able to record 831 00:49:30,680 --> 00:49:34,190 in the human brain like this, there was speculation 832 00:49:34,190 --> 00:49:36,260 that there might be neurons in the brain that 833 00:49:36,260 --> 00:49:38,870 are so specific for particular things, 834 00:49:38,870 --> 00:49:41,120 that there might be one neuron in your brain 835 00:49:41,120 --> 00:49:44,240 that responds when you see your grandmother. 836 00:49:44,240 --> 00:49:47,240 And so it turns out it's actually true. 837 00:49:47,240 --> 00:49:48,740 There are neurons in your brain that 838 00:49:48,740 --> 00:49:55,920 respond very specifically to particular concepts or people 839 00:49:55,920 --> 00:49:58,360 or things. 840 00:49:58,360 --> 00:50:04,170 So the question of how these kinds of neurons 841 00:50:04,170 --> 00:50:07,595 acquire their responses is really cool and interesting. 842 00:50:12,010 --> 00:50:18,490 So that leads us to the idea of perceptrons. 843 00:50:18,490 --> 00:50:23,140 A perceptron is the simplest notion of how you can have 844 00:50:23,140 --> 00:50:27,700 a neuron that detects a particular thing-- 845 00:50:27,700 --> 00:50:31,410 that responds when it sees it 846 00:50:31,410 --> 00:50:33,190 and doesn't respond when it doesn't. 847 00:50:35,720 --> 00:50:41,150 So let's start with the simplest notion of a perceptron. 848 00:50:41,150 --> 00:50:44,530 So how do we make a neuron that fires when it sees something-- 849 00:50:44,530 --> 00:50:47,020 let's say a dog-- 850 00:50:47,020 --> 00:50:49,080 and doesn't fire when there is no dog? 851 00:50:53,330 --> 00:50:55,800 So in order to think about this a little bit more, 852 00:50:55,800 --> 00:51:00,320 we can begin thinking about this in the case 853 00:51:00,320 --> 00:51:04,550 where we have a single input neuron and a single output 854 00:51:04,550 --> 00:51:05,460 neuron. 855 00:51:05,460 --> 00:51:08,750 So if we have a single input neuron, then what comes in 856 00:51:08,750 --> 00:51:09,650 has to be-- 857 00:51:09,650 --> 00:51:10,970 it can't be an image, right? 858 00:51:10,970 --> 00:51:13,280 An image is a high dimensional thing that 859 00:51:13,280 --> 00:51:17,570 has many thousands of pixels. 860 00:51:17,570 --> 00:51:22,550 So you can't write that down as a simple model 861 00:51:22,550 --> 00:51:25,560 with a single input neuron and a single output neuron. 862 00:51:25,560 --> 00:51:27,770 So you need to do this classification problem 863 00:51:27,770 --> 00:51:28,750 in one dimension. 864 00:51:28,750 --> 00:51:30,440 So we can imagine that we have an input 865 00:51:30,440 --> 00:51:37,165 neuron that comes from, let's say, some set of numbers-- 866 00:51:37,165 --> 00:51:39,140 I'll make up a story here-- some set 867 00:51:39,140 --> 00:51:43,670 of neurons that measure the dogginess of an input. 868 00:51:43,670 --> 00:51:47,030 So let's say that we have a single input that 869 00:51:47,030 --> 00:51:51,320 fires like crazy when it sees this cute little guy here. 870 00:51:51,320 --> 00:51:56,210 And fires at a negative rate when 871 00:51:56,210 --> 00:52:00,630 it sees that thing, which doesn't look much like a dog. 872 00:52:00,630 --> 00:52:05,960 So we have a single input that's a measure of dogginess. 873 00:52:05,960 --> 00:52:09,475 And now let's say that we take this dogginess detector 874 00:52:09,475 --> 00:52:10,850 and we point it around the world.
875 00:52:10,850 --> 00:52:13,590 And we walk around outside with our dogginess detector 876 00:52:13,590 --> 00:52:16,700 and we make a bunch of measurements. 877 00:52:16,700 --> 00:52:19,042 So we're going to see something that looks like this. 878 00:52:19,042 --> 00:52:20,750 We're going to see a lot of measurements, 879 00:52:20,750 --> 00:52:22,880 a lot of observations down here that 880 00:52:22,880 --> 00:52:24,860 are close to zero dogginess. 881 00:52:24,860 --> 00:52:27,560 And we're going to see a bump of things 882 00:52:27,560 --> 00:52:29,300 up here that correspond to dogs. 883 00:52:29,300 --> 00:52:31,877 Whenever we point our dogginess detector at a dog, 884 00:52:31,877 --> 00:52:33,710 it's going to give us a measurement up here. 885 00:52:33,710 --> 00:52:35,690 And we're going to get a bunch of those. 886 00:52:35,690 --> 00:52:38,450 And those things correspond to dogs. 887 00:52:38,450 --> 00:52:41,270 So we need to build a network that 888 00:52:41,270 --> 00:52:44,900 fires when the input is up here and doesn't fire 889 00:52:44,900 --> 00:52:46,150 when the input is down there. 890 00:52:48,770 --> 00:52:51,890 So how do we do that? 891 00:52:51,890 --> 00:52:56,090 So the central feature of classification 892 00:52:56,090 --> 00:53:00,470 is this notion of binariness, of decision-making-- 893 00:53:00,470 --> 00:53:04,100 the neuron fires when you see a dog and doesn't 894 00:53:04,100 --> 00:53:06,380 fire when you don't see a dog. 895 00:53:06,380 --> 00:53:09,170 So there exists a classification boundary 896 00:53:09,170 --> 00:53:10,490 in this stimulus space. 897 00:53:10,490 --> 00:53:13,550 You can imagine that there's some point along this 898 00:53:13,550 --> 00:53:17,480 dimension above which you'll say that the input is a dog, 899 00:53:17,480 --> 00:53:20,600 and below which you'll say that it isn't. 900 00:53:20,600 --> 00:53:24,410 And we can imagine that that classification boundary 901 00:53:24,410 --> 00:53:25,470 is right here. 902 00:53:25,470 --> 00:53:27,060 It's a particular number. 903 00:53:27,060 --> 00:53:30,560 It's a particular value of our dogginess detector, 904 00:53:30,560 --> 00:53:32,460 above which we're going to call it a dog, 905 00:53:32,460 --> 00:53:37,430 and below which we're going to call it something else. 906 00:53:37,430 --> 00:53:43,440 How do we make this neuron respond by firing 907 00:53:43,440 --> 00:53:46,620 when there's a dog and not firing when there's no dog? 908 00:53:46,620 --> 00:53:48,060 Can we use a linear neuron? 909 00:53:51,700 --> 00:53:54,340 Can we use one of our linear neurons 910 00:53:54,340 --> 00:53:57,430 that we just talked about before? 911 00:53:57,430 --> 00:54:01,760 We can't do that because a linear neuron will always fire 912 00:54:01,760 --> 00:54:04,450 more the bigger the input is. 913 00:54:04,450 --> 00:54:07,300 And it will fire just a little if the dogginess is near 0. 914 00:54:07,300 --> 00:54:09,220 And it will even fire negatively 915 00:54:09,220 --> 00:54:11,690 if the dogginess input is negative. 916 00:54:11,690 --> 00:54:14,350 So a linear neuron is terrible for actually 917 00:54:14,350 --> 00:54:16,090 making any decisions. 918 00:54:16,090 --> 00:54:21,670 Linear neurons always go, ah, well, maybe that's a dog. 919 00:54:21,670 --> 00:54:22,750 Not really. 920 00:54:22,750 --> 00:54:25,270 There are no decisions. 921 00:54:25,270 --> 00:54:27,220 So in order to have a decision, we 922 00:54:27,220 --> 00:54:30,670 need to have a particular kind of neuron.
923 00:54:30,670 --> 00:54:36,310 And that kind of neuron uses something very natural 924 00:54:36,310 --> 00:54:40,100 in biophysics: the spike threshold of neurons. 925 00:54:40,100 --> 00:54:45,580 Neurons only fire when the input is above some threshold, 926 00:54:45,580 --> 00:54:46,180 generally. 927 00:54:46,180 --> 00:54:48,013 There are neurons that are tonically active. 928 00:54:48,013 --> 00:54:49,570 But let's not worry about those. 929 00:54:49,570 --> 00:54:51,970 So many neurons only fire when the input 930 00:54:51,970 --> 00:54:53,900 is above some threshold. 931 00:54:53,900 --> 00:54:56,770 So for decision-making and classification, 932 00:54:56,770 --> 00:55:03,520 a commonly used kind of neuron takes this idea to an extreme. 933 00:55:03,520 --> 00:55:06,760 So for perceptrons, we're going to use a simplified 934 00:55:06,760 --> 00:55:09,010 model of a neuron that's particularly 935 00:55:09,010 --> 00:55:10,570 good at making decisions. 936 00:55:10,570 --> 00:55:13,830 There are no ifs, ands, or buts about it. 937 00:55:13,830 --> 00:55:16,540 It's either off or on. 938 00:55:16,540 --> 00:55:19,170 It's called a binary unit. 939 00:55:19,170 --> 00:55:23,010 And a binary unit uses what's called a step 940 00:55:23,010 --> 00:55:26,610 function for its f-I curve. 941 00:55:26,610 --> 00:55:29,170 That step function is 0-- 942 00:55:29,170 --> 00:55:33,240 the output is 0 if the input is zero or below. 943 00:55:33,240 --> 00:55:39,090 And the output is 1 if the input is above 0. 944 00:55:41,960 --> 00:55:45,500 We can use that step function to create 945 00:55:45,500 --> 00:55:48,410 a neuron that responds when the input is 946 00:55:48,410 --> 00:55:51,620 above any threshold we want. 947 00:55:51,620 --> 00:55:58,430 So we can write down the output firing rate as a step function 948 00:55:58,430 --> 00:56:03,050 of a quantity that's 949 00:56:03,050 --> 00:56:07,700 given by w times u, the synaptic weight times the input firing 950 00:56:07,700 --> 00:56:10,430 rate, minus that threshold. 951 00:56:10,430 --> 00:56:13,040 So you can see if w times u, which 952 00:56:13,040 --> 00:56:17,860 is the input synaptic current, if that synaptic current is 953 00:56:17,860 --> 00:56:25,080 above theta, then the argument of this function 954 00:56:25,080 --> 00:56:28,200 is greater than 0 and the neuron spikes. 955 00:56:28,200 --> 00:56:31,970 If this argument is negative, then the neuron doesn't spike. 956 00:56:31,970 --> 00:56:37,950 So by changing theta, we can put that decision boundary anywhere 957 00:56:37,950 --> 00:56:39,475 we want. 958 00:56:39,475 --> 00:56:40,350 Does that make sense? 959 00:56:49,440 --> 00:56:54,480 Usually the way we do this is we pick a theta. 960 00:56:54,480 --> 00:56:57,870 We say our neuron has a theta of 1. 961 00:56:57,870 --> 00:57:00,690 And then we do everything else-- 962 00:57:00,690 --> 00:57:02,730 everything else we're going to do 963 00:57:02,730 --> 00:57:05,230 with this network-- with that theta. 964 00:57:05,230 --> 00:57:07,830 So what I'm going to talk about today are just two cases: 965 00:57:07,830 --> 00:57:11,270 where theta is a fixed number that's non-zero, 966 00:57:11,270 --> 00:57:14,085 or where theta is a fixed number that is equal to 0. 967 00:57:14,085 --> 00:57:15,960 So we're going to talk about those two cases. 968 00:57:19,050 --> 00:57:21,120 So the neuron fires when the input w times u 969 00:57:21,120 --> 00:57:22,660 is greater than theta, and it doesn't fire when it's less.
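To make this binary unit concrete, here is a minimal sketch in Python (illustrative code added for this written version, not from the lecture; it assumes NumPy, and the names step and binary_unit are mine):

    import numpy as np

    def step(x):
        # Heaviside step function: output is 0 if the input is 0 or below, 1 if above 0
        return np.where(x > 0, 1.0, 0.0)

    def binary_unit(u, w, theta):
        # Output firing rate v = step(w*u - theta): the unit fires (outputs 1)
        # only when the synaptic input w*u exceeds the threshold theta
        return step(w * u - theta)

    # With w = 1 and theta = 1, the unit fires exactly when u > theta/w = 1
    u = np.array([0.2, 0.8, 1.5, 3.0])       # input firing rates
    print(binary_unit(u, w=1.0, theta=1.0))  # [0. 0. 1. 1.]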
970 00:57:22,660 --> 00:57:25,540 So now the output neuron fires whenever 971 00:57:25,540 --> 00:57:29,410 the input neuron has a firing rate 972 00:57:29,410 --> 00:57:33,920 greater than this decision boundary. 973 00:57:33,920 --> 00:57:37,720 So the decision boundary, the u threshold, 974 00:57:37,720 --> 00:57:39,970 is equal to theta divided by w. 975 00:57:39,970 --> 00:57:41,020 Does that make sense? 976 00:57:41,020 --> 00:57:44,680 That is, the neuron fires when 977 00:57:44,680 --> 00:57:47,290 u is greater than theta divided by w. 978 00:57:51,030 --> 00:57:55,510 So the way we learn, the way this network learns 979 00:57:55,510 --> 00:58:01,270 to fire when that u is above this classification boundary 980 00:58:01,270 --> 00:58:03,940 is simply by changing the weight. 981 00:58:03,940 --> 00:58:05,180 Does that make sense? 982 00:58:05,180 --> 00:58:08,380 So we're going to learn the weight such 983 00:58:08,380 --> 00:58:12,130 that this network fires whenever the input says there's a dog. 984 00:58:12,130 --> 00:58:14,790 And it doesn't fire whenever the input says there's no dog. 985 00:58:18,250 --> 00:58:22,650 So let's see what happens when w is really small. 986 00:58:22,650 --> 00:58:25,390 If w is really small, then what happens 987 00:58:25,390 --> 00:58:28,360 is all of these-- remember, this is the input. 988 00:58:28,360 --> 00:58:32,170 That's the dogginess detector. 989 00:58:32,170 --> 00:58:34,810 If w is really small, then all these inputs 990 00:58:34,810 --> 00:58:40,780 get collapsed to a small input current into our output neuron. 991 00:58:40,780 --> 00:58:43,780 Does that make sense? 992 00:58:43,780 --> 00:58:47,520 So all those different inputs, dogs and non-dogs, 993 00:58:47,520 --> 00:58:50,490 get multiplied by a small number. 994 00:58:50,490 --> 00:58:53,790 And all those inputs are close to 0. 995 00:58:53,790 --> 00:58:55,860 And if all those inputs are close to 0, 996 00:58:55,860 --> 00:59:00,320 they're all below the threshold for making this neuron spike. 997 00:59:00,320 --> 00:59:04,610 So this network is not good for detecting dogs 998 00:59:04,610 --> 00:59:08,390 because it never fires, whether the input 999 00:59:08,390 --> 00:59:11,420 is a dog or a non-dog. 1000 00:59:11,420 --> 00:59:14,570 Now what happens if w is too big? 1001 00:59:14,570 --> 00:59:21,710 If w is really big, then this range of dogginess values 1002 00:59:21,710 --> 00:59:24,910 gets multiplied by a big number. 1003 00:59:24,910 --> 00:59:31,355 And you can see that a bunch of non-dogs make the neuron fire. 1004 00:59:31,355 --> 00:59:32,230 Does that make sense? 1005 00:59:32,230 --> 00:59:36,450 So now this one fires for dogs plus doggie-ish 1006 00:59:36,450 --> 00:59:38,350 looking things, which, I don't know, 1007 00:59:38,350 --> 00:59:40,478 maybe it'll fire when it sees a cat. 1008 00:59:40,478 --> 00:59:41,145 That's terrible. 1009 00:59:43,860 --> 00:59:49,590 So you have to choose w to make this classification network 1010 00:59:49,590 --> 00:59:51,180 function properly. 1011 00:59:51,180 --> 00:59:52,260 Does that make sense? 1012 00:59:52,260 --> 00:59:56,630 And if you choose w just right, then 1013 00:59:56,630 --> 01:00:00,360 that classification boundary lands 1014 01:00:00,360 --> 01:00:03,220 right on the threshold of the neuron. 1015 01:00:03,220 --> 01:00:07,300 And now the neuron spikes whenever there is a dog, and it doesn't spike when there's not a dog.
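Here is the same kind of sketch for this point about choosing w (again illustrative, with made-up dogginess values): with theta fixed at 1, the decision boundary sits at u = theta/w, so a w that's too small misses every dog, a w that's too big fires for doggie-ish non-dogs, and an intermediate w separates the two clusters.

    import numpy as np

    def step(x):
        return np.where(x > 0, 1.0, 0.0)

    # made-up dogginess measurements: non-dogs near 0, dogs around 2
    inputs = np.array([0.1, 0.3, 0.5,    # non-dogs
                       1.8, 2.0, 2.4])   # dogs
    theta = 1.0

    for w in [0.2, 1.0, 5.0]:
        v = step(w * inputs - theta)     # boundary at u = theta/w
        print(f"w={w}: boundary at u={theta / w:.1f}, outputs={v}")
    # w=0.2: boundary at u=5.0 -> never fires, misses all the dogs
    # w=1.0: boundary at u=1.0 -> fires for the dogs only
    # w=5.0: boundary at u=0.2 -> fires for doggie-ish non-dogs too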
1016 01:00:07,300 --> 01:00:10,510 So what's the message here? 1017 01:00:10,510 --> 01:00:12,590 The message is we can have a neuron 1018 01:00:12,590 --> 01:00:16,450 that has this binary threshold. 1019 01:00:16,450 --> 01:00:21,310 And what we can do is, simply by changing the weight, 1020 01:00:21,310 --> 01:00:25,510 we can make that threshold land anywhere 1021 01:00:25,510 --> 01:00:29,350 in this space of inputs. 1022 01:00:29,350 --> 01:00:30,700 And we can actually use 1023 01:00:34,820 --> 01:00:38,370 the error to set the weight. 1024 01:00:38,370 --> 01:00:41,030 So let's say that we made errors here. 1025 01:00:41,030 --> 01:00:46,280 We classified dogs as non-dogs because the neuron didn't fire. 1026 01:00:46,280 --> 01:00:50,190 You can see that this was the case when w was too small. 1027 01:00:50,190 --> 01:00:54,290 So if you classify dogs as non-dogs, 1028 01:00:54,290 --> 01:00:57,300 then you need to make w bigger. 1029 01:00:57,300 --> 01:01:01,100 And if you classify non-dogs as dogs, 1030 01:01:01,100 --> 01:01:03,930 you need to make w smaller. 1031 01:01:03,930 --> 01:01:08,310 And by measuring what kind of errors you make, 1032 01:01:08,310 --> 01:01:14,460 you can actually fix the weights to get to the right answer. 1033 01:01:14,460 --> 01:01:17,480 So this is a method called supervised 1034 01:01:17,480 --> 01:01:22,380 learning, where you set w randomly. 1035 01:01:22,380 --> 01:01:24,210 You take a guess. 1036 01:01:24,210 --> 01:01:26,940 And then you look at the mistakes you make. 1037 01:01:26,940 --> 01:01:31,450 And you use those mistakes to fix w. 1038 01:01:31,450 --> 01:01:37,310 In other words, you just look at the world 1039 01:01:37,310 --> 01:01:40,460 and you say, oh, that's a dog. 1040 01:01:40,460 --> 01:01:42,350 And then your mom says, no, that's not 1041 01:01:42,350 --> 01:01:44,970 a dog, that's something else. 1042 01:01:44,970 --> 01:01:46,265 And you adjust your weights. 1043 01:01:48,790 --> 01:01:50,520 In that example, you're going to 1044 01:01:50,520 --> 01:01:52,470 make that w smaller. 1045 01:01:52,470 --> 01:01:54,720 In another case, you'll make the other kind of mistake 1046 01:01:54,720 --> 01:01:55,860 and you'll fix the weights the other way. 1047 01:01:59,010 --> 01:02:01,650 So this is called a perceptron. 1048 01:02:01,650 --> 01:02:04,590 And the way you learn the weights in a perceptron is you 1049 01:02:04,590 --> 01:02:08,680 just classify things, figure out what kind of mistake 1050 01:02:08,680 --> 01:02:11,840 you made, and use that to adjust the weights. 1051 01:02:11,840 --> 01:02:15,790 So that's the basic idea of a perceptron and perceptron 1052 01:02:15,790 --> 01:02:16,690 learning. 1053 01:02:16,690 --> 01:02:19,480 And there's a lot of mathematical formalism 1054 01:02:19,480 --> 01:02:21,760 that goes into how that learning happens. 1055 01:02:21,760 --> 01:02:26,170 And we're going to get to that in more 1056 01:02:26,170 --> 01:02:29,420 detail in the next lecture. 1057 01:02:29,420 --> 01:02:33,340 But before we do that, I want to go beyond 1058 01:02:33,340 --> 01:02:34,630 the one-dimensional case. 1059 01:02:34,630 --> 01:02:37,300 So here we had a one-dimensional network that was just 1060 01:02:37,300 --> 01:02:40,630 operating on dogginess. 1061 01:02:40,630 --> 01:02:43,030 And then we have a single neuron that 1062 01:02:43,030 --> 01:02:46,170 says, was that a dog or not.
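Before moving beyond one dimension, here is a minimal sketch of that error-driven weight update (my own illustrative code, with an arbitrary learning rate; the formal perceptron learning rule comes in the next lecture). If the unit called a dog a non-dog, w gets bigger; if it called a non-dog a dog, w gets smaller.

    import numpy as np

    def step(x):
        return np.where(x > 0, 1.0, 0.0)

    theta = 1.0
    w = 0.1    # initial guess for the weight (too small here)
    lr = 0.2   # learning rate, an arbitrary choice

    # (dogginess input, correct label): 1 means dog, 0 means non-dog
    data = [(0.3, 0), (2.0, 1), (0.5, 0), (1.8, 1)]

    for epoch in range(20):
        for u, target in data:
            v = step(w * u - theta)
            # if we said non-dog for a dog (target - v = +1), make w bigger;
            # if we said dog for a non-dog (target - v = -1), make w smaller
            w += lr * (target - v) * u

    print(w)  # ends up with the boundary theta/w between the two clusters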
1063 01:02:46,170 --> 01:02:49,290 But in general, you're not classifying things 1064 01:02:49,290 --> 01:02:51,420 based on one input. 1065 01:02:51,420 --> 01:02:54,840 For example, when you have to identify a dog, 1066 01:02:54,840 --> 01:02:59,350 you have a whole image of something. 1067 01:02:59,350 --> 01:03:01,970 And you have to classify it based on that image. 1068 01:03:01,970 --> 01:03:03,790 So let's go from the one-dimensional case 1069 01:03:03,790 --> 01:03:05,080 to a two-dimensional case. 1070 01:03:05,080 --> 01:03:08,260 So the classification isn't done in one dimension, 1071 01:03:08,260 --> 01:03:11,650 but it's based on many different features. 1072 01:03:11,650 --> 01:03:15,720 So let's say that we have two features, furriness 1073 01:03:15,720 --> 01:03:17,500 and bad breath. 1074 01:03:17,500 --> 01:03:21,160 That dog doesn't really look like it has bad breath, 1075 01:03:21,160 --> 01:03:24,790 but mine does. 1076 01:03:24,790 --> 01:03:27,550 So you can have two different features, furriness 1077 01:03:27,550 --> 01:03:28,220 and bad breath. 1078 01:03:28,220 --> 01:03:32,190 And dogs are generally, let's say, up here. 1079 01:03:32,190 --> 01:03:34,990 Now you can have other animals. 1080 01:03:34,990 --> 01:03:37,960 This guy is definitely not furry. 1081 01:03:37,960 --> 01:03:39,480 So he's down here somewhere. 1082 01:03:39,480 --> 01:03:41,830 And you can have this guy up here. 1083 01:03:41,830 --> 01:03:44,700 He's definitely furry. 1084 01:03:44,700 --> 01:03:47,310 So you have these two dimensions and a bunch 1085 01:03:47,310 --> 01:03:49,800 of observations in those 1086 01:03:49,800 --> 01:03:51,920 two dimensions. 1087 01:03:51,920 --> 01:03:55,470 And you can see that, in this case, 1088 01:03:55,470 --> 01:04:01,590 you can't actually apply that one-dimensional decision-making 1089 01:04:01,590 --> 01:04:06,670 circuit to discriminate dogs from these other animals. 1090 01:04:06,670 --> 01:04:07,630 Why is that? 1091 01:04:07,630 --> 01:04:11,950 Because if I apply my one-dimensional perceptron 1092 01:04:11,950 --> 01:04:14,050 to this problem, you can see that I 1093 01:04:14,050 --> 01:04:18,600 could put a boundary here and it will 1094 01:04:18,600 --> 01:04:23,480 misclassify some of these non-furry animals as dogs. 1095 01:04:23,480 --> 01:04:26,130 Or I could put my classifier here 1096 01:04:26,130 --> 01:04:30,420 and it will misclassify some of these cats as dogs. 1097 01:04:30,420 --> 01:04:35,110 So how would I separate dogs from these other animals 1098 01:04:35,110 --> 01:04:37,710 if I had this two-dimensional space? 1099 01:04:37,710 --> 01:04:39,770 What would I do? 1100 01:04:39,770 --> 01:04:42,160 How would I put in a classification boundary? 1101 01:04:42,160 --> 01:04:47,270 If this doesn't work and this doesn't work, what would I do? 1102 01:04:47,270 --> 01:04:50,090 You could put a boundary right there. 1103 01:04:50,090 --> 01:04:51,890 So in this little toy problem, that 1104 01:04:51,890 --> 01:04:55,100 would perfectly separate dogs from all these non-dogs. 1105 01:04:58,390 --> 01:05:01,000 So how do we do that? 1106 01:05:01,000 --> 01:05:08,290 Well, what we want is some way of projecting these inputs 1107 01:05:08,290 --> 01:05:12,310 onto some other direction so that we 1108 01:05:12,310 --> 01:05:16,470 can put a classification boundary right there. 1109 01:05:16,470 --> 01:05:20,010 And it turns out there's a very simple network that does that.
1110 01:05:20,010 --> 01:05:21,423 It looks like this. 1111 01:05:21,423 --> 01:05:25,620 We take each one of those detectors, a furriness detector 1112 01:05:25,620 --> 01:05:31,470 and a bad breath detector, and we have those two inputs. 1113 01:05:31,470 --> 01:05:34,500 We have those inputs synapse onto our output neuron 1114 01:05:34,500 --> 01:05:37,800 with some weight w1 and some weight w2, 1115 01:05:37,800 --> 01:05:41,200 and we calculate the firing rate of this neuron. 1116 01:05:41,200 --> 01:05:46,080 Now we have this problem of how to place this decision 1117 01:05:46,080 --> 01:05:48,730 boundary correctly. 1118 01:05:48,730 --> 01:05:49,800 What's the answer? 1119 01:05:49,800 --> 01:05:51,645 Well, in the one-dimensional example, 1120 01:05:51,645 --> 01:05:52,770 what is it that we learned? 1121 01:05:57,400 --> 01:05:59,690 What was it that we were actually changing? 1122 01:05:59,690 --> 01:06:02,830 We were taking guesses. 1123 01:06:02,830 --> 01:06:05,530 And if we were right or wrong, we did what? 1124 01:06:05,530 --> 01:06:08,170 We changed the weight. 1125 01:06:08,170 --> 01:06:09,880 And that's exactly what we do here. 1126 01:06:09,880 --> 01:06:14,320 We're going to learn to change these weights to put 1127 01:06:14,320 --> 01:06:16,150 that boundary in the right place. 1128 01:06:18,830 --> 01:06:21,260 If we just take a random guess for these weights, 1129 01:06:21,260 --> 01:06:25,580 that line is just going to be at some random position. 1130 01:06:25,580 --> 01:06:28,160 But we can learn to place that line exactly 1131 01:06:28,160 --> 01:06:32,480 in the right place to separate dogs from non-dogs. 1132 01:06:32,480 --> 01:06:34,040 So let's just think a little bit more 1133 01:06:34,040 --> 01:06:39,020 about how that decision boundary looks 1134 01:06:39,020 --> 01:06:41,030 as a function of the weight. 1135 01:06:41,030 --> 01:06:43,640 So let's look at this case where we have two inputs. 1136 01:06:43,640 --> 01:06:51,700 So now you can see that the input to this neuron is w.u. 1137 01:06:51,700 --> 01:06:56,950 So now if we use our binary neuron with a threshold, 1138 01:06:56,950 --> 01:07:00,370 we can see that the firing rate of this output neuron 1139 01:07:00,370 --> 01:07:05,610 is this step function acting 1140 01:07:05,610 --> 01:07:08,550 on this input, w.u minus theta. 1141 01:07:12,750 --> 01:07:14,270 So now what does that look like? 1142 01:07:14,270 --> 01:07:16,160 The decision boundary happens when 1143 01:07:16,160 --> 01:07:20,100 this quantity is equal to 0. 1144 01:07:20,100 --> 01:07:22,620 When this input is greater than 0, the neuron fires. 1145 01:07:22,620 --> 01:07:25,470 When this input is less than 0, it doesn't fire. 1146 01:07:25,470 --> 01:07:28,135 So what does that look like? 1147 01:07:28,135 --> 01:07:29,760 So you can see the decision boundary is 1148 01:07:29,760 --> 01:07:32,180 when w.u minus theta equals 0. 1149 01:07:32,180 --> 01:07:33,840 Does anyone know what that is? 1150 01:07:37,688 --> 01:07:40,420 Remember, u is our input space. 1151 01:07:40,420 --> 01:07:43,600 That's what we're asking, where is this decision 1152 01:07:43,600 --> 01:07:45,790 boundary in the input space. 1153 01:07:45,790 --> 01:07:48,610 w is some weights that are fixed right now, 1154 01:07:48,610 --> 01:07:51,860 but we're gradually going to change them later. 1155 01:07:51,860 --> 01:07:55,290 So what is that an equation for? 1156 01:07:55,290 --> 01:07:57,930 It's a line.
1157 01:07:57,930 --> 01:07:59,310 That's an equation for a line. 1158 01:07:59,310 --> 01:08:07,880 If u is our input, you can see w.u equals theta. 1159 01:08:07,880 --> 01:08:11,520 That's an equation for a line in the space of u. 1160 01:08:11,520 --> 01:08:14,750 The slope and position of that line 1161 01:08:14,750 --> 01:08:20,109 are controlled by the weights w and the threshold theta. 1162 01:08:20,109 --> 01:08:26,140 So you can see this is w1 u1 plus w2 u2 equals theta. 1163 01:08:26,140 --> 01:08:30,880 In the space of u1 and u2, that's just a line. 1164 01:08:30,880 --> 01:08:34,600 So let's look at the case where theta equals 0. 1165 01:08:34,600 --> 01:08:39,410 You can see that if you have this input space, u1 and u2, 1166 01:08:39,410 --> 01:08:44,890 if you take a particular input u and dot it into w-- 1167 01:08:44,890 --> 01:08:48,720 so let's just pick a w in some random direction-- 1168 01:08:48,720 --> 01:08:52,270 the neuron fires when the projection of u along w 1169 01:08:52,270 --> 01:08:52,970 is positive. 1170 01:08:52,970 --> 01:08:56,439 So you can see here, the projection of u along w 1171 01:08:56,439 --> 01:09:00,020 is positive. 1172 01:09:00,020 --> 01:09:02,870 So in this case, for this u, the neuron will fire. 1173 01:09:05,800 --> 01:09:11,350 So any u that has a positive projection along w 1174 01:09:11,350 --> 01:09:14,260 will make the neuron spike. 1175 01:09:14,260 --> 01:09:17,500 So you can see that all of these inputs 1176 01:09:17,500 --> 01:09:20,390 will make the neuron spike. 1177 01:09:20,390 --> 01:09:24,423 All of these inputs will make the neuron not spike. 1178 01:09:24,423 --> 01:09:26,300 Does that make sense? 1179 01:09:26,300 --> 01:09:30,430 So you can see that the decision boundary, this boundary 1180 01:09:30,430 --> 01:09:32,560 between the inputs that make the neuron 1181 01:09:32,560 --> 01:09:37,390 spike and the inputs that don't make the neuron spike, 1182 01:09:37,390 --> 01:09:42,245 is a line that's orthogonal to w. 1183 01:09:42,245 --> 01:09:43,120 Does that make sense? 1184 01:09:47,490 --> 01:09:50,930 Because you can see that any u, 1185 01:09:50,930 --> 01:09:54,050 any input, along this line 1186 01:09:54,050 --> 01:09:55,900 will be orthogonal to w 1187 01:09:55,900 --> 01:09:57,560 and will have zero projection. 1188 01:09:57,560 --> 01:10:02,840 And that's going to correspond to that decision boundary. 1189 01:10:06,940 --> 01:10:12,230 So let's just look at a couple of cases. 1190 01:10:12,230 --> 01:10:17,920 So here's a set of points that correspond to our non-dogs. 1191 01:10:17,920 --> 01:10:20,570 Here's a set of points that correspond to our dogs. 1192 01:10:20,570 --> 01:10:23,740 You can see that if you have a w in this direction, that 1193 01:10:23,740 --> 01:10:26,500 produces a decision boundary that nicely separates 1194 01:10:26,500 --> 01:10:28,980 the dogs from the non-dogs. 1195 01:10:28,980 --> 01:10:32,590 So what is that w? That w is (1, 0). 1196 01:10:32,590 --> 01:10:36,790 And we're going to consider the case where theta is 0. 1197 01:10:36,790 --> 01:10:38,140 Let's look at this case here. 1198 01:10:38,140 --> 01:10:40,210 So you can see that here are all the dogs. 1199 01:10:40,210 --> 01:10:41,590 Here are all the non-dogs. 1200 01:10:41,590 --> 01:10:44,080 You can see that if you drew a line in this direction, 1201 01:10:44,080 --> 01:10:47,380 that would be a good decision boundary 1202 01:10:47,380 --> 01:10:49,330 for that classification problem.
1203 01:10:49,330 --> 01:10:51,940 You can see that a w corresponding 1204 01:10:51,940 --> 01:10:56,200 to solving that problem is (1, -1), 1205 01:10:56,200 --> 01:10:57,420 and theta equals 0. 1206 01:11:02,990 --> 01:11:06,430 Let's look at the case where theta is not 0. 1207 01:11:06,430 --> 01:11:09,730 So here we have w.u minus theta. 1208 01:11:09,730 --> 01:11:13,660 When theta is not 0, then the decision boundary is w.u 1209 01:11:13,660 --> 01:11:15,760 equals some non-zero theta. 1210 01:11:15,760 --> 01:11:16,870 That's also a line. 1211 01:11:16,870 --> 01:11:20,150 It's an equation for a line. 1212 01:11:20,150 --> 01:11:22,240 When theta is 0, that decision boundary 1213 01:11:22,240 --> 01:11:23,980 goes through the origin. 1214 01:11:23,980 --> 01:11:26,310 When theta is not 0, the decision boundary 1215 01:11:26,310 --> 01:11:29,050 is offset from the origin. 1216 01:11:29,050 --> 01:11:31,840 So we saw that when theta is 0, 1217 01:11:31,840 --> 01:11:33,850 that network only 1218 01:11:33,850 --> 01:11:37,900 works if the decision boundary goes through the origin. 1219 01:11:37,900 --> 01:11:40,570 In general, though, we can put the decision boundary anywhere 1220 01:11:40,570 --> 01:11:44,740 we want by having this non-zero theta. 1221 01:11:44,740 --> 01:11:46,170 So here's an example. 1222 01:11:46,170 --> 01:11:48,740 Here are a set of points that are the dogs. 1223 01:11:48,740 --> 01:11:52,390 Here are a set of points that are the non-dogs. 1224 01:11:52,390 --> 01:11:55,030 If we wanted to design a network that 1225 01:11:55,030 --> 01:11:57,640 separates the dogs from the non-dogs, 1226 01:11:57,640 --> 01:12:00,820 we could just draw a line that cleanly separates 1227 01:12:00,820 --> 01:12:03,880 the green from the red dots. 1228 01:12:03,880 --> 01:12:07,120 And now we can calculate w that gives us 1229 01:12:07,120 --> 01:12:08,740 that decision boundary. 1230 01:12:08,740 --> 01:12:10,460 How do we do that? 1231 01:12:10,460 --> 01:12:13,360 So the decision boundary is w.u minus theta equals 0. 1232 01:12:13,360 --> 01:12:15,250 Let's say that we want to calculate 1233 01:12:15,250 --> 01:12:17,730 this weight vector w1 and w2. 1234 01:12:17,730 --> 01:12:22,470 And let's just say that our neuron has a threshold of 1. 1235 01:12:22,470 --> 01:12:25,080 So we can see that we have two points on the decision 1236 01:12:25,080 --> 01:12:26,250 boundary. 1237 01:12:26,250 --> 01:12:30,660 We have one point here, (a, 0), right there. 1238 01:12:30,660 --> 01:12:34,380 We have another point here, (0, b). 1239 01:12:34,380 --> 01:12:36,960 And we can calculate the decision boundary 1240 01:12:36,960 --> 01:12:43,200 using u_a.w equals theta and u_b.w equals theta. 1241 01:12:43,200 --> 01:12:50,040 That's two equations in two unknowns, w1 and w2. 1242 01:12:50,040 --> 01:12:57,090 So if I gave you a set of points and I said calculate a weight 1243 01:12:57,090 --> 01:13:02,040 for this perceptron that will separate one set of points 1244 01:13:02,040 --> 01:13:06,370 from another set of points, and I give you 1245 01:13:06,370 --> 01:13:10,200 a theta for the output neuron, all you have to do 1246 01:13:10,200 --> 01:13:12,960 is draw a line that separates them, 1247 01:13:12,960 --> 01:13:15,690 and then solve those two equations to get 1248 01:13:15,690 --> 01:13:20,632 w1 and w2 for that network. 1249 01:13:20,632 --> 01:13:24,070 It's very easy to do this in two dimensions.
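As a worked version of that calculation (illustrative code; the intercepts a and b are made-up numbers): with theta = 1 and the boundary crossing the axes at (a, 0) and (0, b), the two equations u_a.w = theta and u_b.w = theta give w1 = theta/a and w2 = theta/b.

    import numpy as np

    theta = 1.0
    a, b = 2.0, 4.0   # where the chosen decision boundary crosses the two axes

    # two equations in two unknowns: [[a, 0], [0, b]] @ [w1, w2] = [theta, theta]
    U = np.array([[a, 0.0],
                  [0.0, b]])
    w = np.linalg.solve(U, np.array([theta, theta]))
    print(w)  # [0.5  0.25], i.e. w1 = theta/a and w2 = theta/b

    # check: the neuron fires (w.u > theta) only for points above the boundary
    print(np.dot(w, [3.0, 3.0]) > theta)  # True: classified as a dog
    print(np.dot(w, [0.5, 0.5]) > theta)  # False: classified as a non-dog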
1250 01:13:24,070 --> 01:13:26,920 You can just draw a line and calculate 1251 01:13:26,920 --> 01:13:32,260 the w that corresponds to that decision boundary. 1252 01:13:32,260 --> 01:13:34,290 Any questions about that? 1253 01:13:34,290 --> 01:13:36,860 And if you have questions, 1254 01:13:36,860 --> 01:13:40,120 you should ask, because that's going to be 1255 01:13:40,120 --> 01:13:41,450 the kind of problem you ought to be able to solve. 1256 01:13:47,000 --> 01:13:50,450 So you can see in two dimensions you can just look at the data, 1257 01:13:50,450 --> 01:13:52,760 decide where's the decision boundary, draw a line, 1258 01:13:52,760 --> 01:13:55,790 and calculate the weights w. 1259 01:13:55,790 --> 01:13:59,150 But in higher dimensions, it's a really hard problem. 1260 01:13:59,150 --> 01:14:03,380 In high dimensions, first of all, 1261 01:14:03,380 --> 01:14:07,190 remember that you've got things like images. 1262 01:14:07,190 --> 01:14:10,460 Each pixel in that image is a different dimension 1263 01:14:10,460 --> 01:14:13,400 in the classification problem. 1264 01:14:13,400 --> 01:14:17,750 So how do you write down a set of weights? 1265 01:14:17,750 --> 01:14:22,430 So imagine that's an image, and that's an image. 1266 01:14:22,430 --> 01:14:24,560 And you want to find a set of weights 1267 01:14:24,560 --> 01:14:27,530 so that this neuron fires when you have the dog, 1268 01:14:27,530 --> 01:14:30,170 but doesn't fire when you have the cat. 1269 01:14:30,170 --> 01:14:32,110 That's a really hard problem. 1270 01:14:32,110 --> 01:14:34,130 You can't look at those things and decide 1271 01:14:34,130 --> 01:14:35,600 what that w should be. 1272 01:14:38,950 --> 01:14:48,740 So there's a way of taking inputs and taking the answer, 1273 01:14:48,740 --> 01:14:52,020 like a 1 for a dog and a 0 for non-dogs, 1274 01:14:52,020 --> 01:14:55,440 and actually finding a set of weights that will properly 1275 01:14:55,440 --> 01:14:57,510 classify those inputs. 1276 01:14:57,510 --> 01:15:00,300 And that's called the perceptron learning rule. 1277 01:15:00,300 --> 01:15:05,580 And we're going to talk about that in the next lecture. 1278 01:15:05,580 --> 01:15:08,070 So that's what we did today. 1279 01:15:08,070 --> 01:15:10,170 And we're going to continue working 1280 01:15:10,170 --> 01:15:14,880 on developing methods for understanding 1281 01:15:14,880 --> 01:15:17,930 neural networks next time.