1
00:00:17,888 --> 00:00:18,555
[AUDIO PLAYBACK]

2
00:00:18,555 --> 00:00:19,472
- Good morning, class.

3
00:00:19,472 --> 00:00:20,190
[END PLAYBACK]

4
00:00:20,190 --> 00:00:22,300
MICHALE FEE: Hey, let's
go ahead and get started.

5
00:00:22,300 --> 00:00:25,150
So we're going to finish
spectral analysis today.

6
00:00:25,150 --> 00:00:32,369
So we are going to learn how to
make a graphical representation

7
00:00:32,369 --> 00:00:36,780
like this of the spectral and
temporal structure of time

8
00:00:36,780 --> 00:00:39,840
series, or in this case,
a speech signal recorded

9
00:00:39,840 --> 00:00:42,060
on a microphone.

10
00:00:42,060 --> 00:00:44,370
Well, actually let
me just tell you

11
00:00:44,370 --> 00:00:46,580
exactly what it is that
we're looking at here.

12
00:00:46,580 --> 00:00:52,080
So this is a spectrogram that
displays the amount of power

13
00:00:52,080 --> 00:00:55,830
in this signal as
a function of time

14
00:00:55,830 --> 00:00:57,430
and as a function of frequency.

15
00:00:57,430 --> 00:01:02,160
So you remember we've been
learning how to construct

16
00:01:02,160 --> 00:01:04,500
the spectrum of a signal.

17
00:01:04,500 --> 00:01:06,930
And today, we're
going to learn how

18
00:01:06,930 --> 00:01:11,600
to construct a representation
like this, called

19
00:01:11,600 --> 00:01:15,880
a spectrogram, that shows how
that spectrum varies over time.

20
00:01:15,880 --> 00:01:17,610
So as you recall,
we have learned

21
00:01:17,610 --> 00:01:21,970
how to compute the Fourier
transform of a signal.

22
00:01:21,970 --> 00:01:26,010
This is one of the signals
that we actually started with.

23
00:01:26,010 --> 00:01:29,820
So if you compute the Fourier
transform of this square wave,

24
00:01:29,820 --> 00:01:32,710
you can see that in
the frequency domain,

25
00:01:32,710 --> 00:01:34,830
now we plot the amount
of, essentially,

26
00:01:34,830 --> 00:01:39,090
the components of this signal
at different frequencies.

27
00:01:39,090 --> 00:01:41,190
So the Fourier transform
of this square wave

28
00:01:41,190 --> 00:01:43,350
has a number of peaks.

29
00:01:43,350 --> 00:01:49,350
Each of these peaks corresponds
to a cosine contribution

30
00:01:49,350 --> 00:01:54,020
to this time series, OK?

31
00:01:54,020 --> 00:01:54,520
All right.

32
00:01:54,520 --> 00:01:56,740
And we also
discussed how you can

33
00:01:56,740 --> 00:01:59,890
compute the power
spectrum of a signal

34
00:01:59,890 --> 00:02:01,520
from the Fourier transform.

35
00:02:01,520 --> 00:02:04,510
So here what I've done
is I've taken the Fourier

36
00:02:04,510 --> 00:02:07,120
transform of this square wave.

37
00:02:07,120 --> 00:02:09,699
And now we take the
square magnitude

38
00:02:09,699 --> 00:02:11,830
of each of these
values, and we just

39
00:02:11,830 --> 00:02:14,380
plot the spectrum,
the square magnitude

40
00:02:14,380 --> 00:02:16,720
of just the positive
frequency components.

41
00:02:16,720 --> 00:02:21,640
For real-valued functions, the
power spectrum is symmetric.

42
00:02:21,640 --> 00:02:23,620
The power in each of these--

43
00:02:23,620 --> 00:02:28,366
at each of these frequencies
in the positive half of the--

44
00:02:28,366 --> 00:02:31,390
for the positive frequencies
is exactly the same

45
00:02:31,390 --> 00:02:34,100
as the power in the
negative frequencies.

46
00:02:34,100 --> 00:02:37,060
So if we plot the power
spectrum of that square wave,

47
00:02:37,060 --> 00:02:39,430
we can see that there
are multiple peaks

48
00:02:39,430 --> 00:02:41,620
at regular intervals.

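Here is a minimal sketch of the power spectrum described above, in Python using only the standard library. It uses a deliberately naive DFT (not an FFT) so that nothing is hidden. It samples a square wave, with an arbitrarily chosen 64 samples and a 16-sample period, and checks two points from the lecture. First, the power sits in peaks at regular intervals (odd multiples of the fundamental). Second, for a real-valued signal the power at each positive frequency equals the power at the matching negative frequency (bin N - k).

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform, for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def power_spectrum(x):
    """Power spectrum: square magnitude of each Fourier coefficient."""
    return [abs(c) ** 2 for c in dft(x)]


# Square wave: 64 samples, period 16 samples (+1 for 8 samples, then -1 for 8)
N, PERIOD = 64, 16
square = [1.0 if (t % PERIOD) < PERIOD // 2 else -1.0 for t in range(N)]
P = power_spectrum(square)

# Bins with non-negligible power: odd multiples of the fundamental bin N/PERIOD = 4.
# Bins above N/2 correspond to the negative frequencies.
peaks = [k for k, p in enumerate(P) if p > 1e-6]
print(peaks)

# Real-valued signal: power at bin k equals power at bin N - k
symmetric = all(math.isclose(P[k], P[N - k], rel_tol=1e-9, abs_tol=1e-9)
                for k in range(1, N))
print(symmetric)
```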
49
00:02:41,620 --> 00:02:46,000
Now the problem with plotting
power spectra on a linear scale

50
00:02:46,000 --> 00:02:48,520
here is that you often
have contributions--

51
00:02:48,520 --> 00:02:51,910
important contributions to
signals that actually have

52
00:02:51,910 --> 00:02:55,180
a very small amount of
power when you plot them

53
00:02:55,180 --> 00:02:56,120
on a linear scale.

54
00:02:56,120 --> 00:02:58,180
And so you can barely see them.

55
00:02:58,180 --> 00:03:00,580
You can barely see
those contributions

56
00:03:00,580 --> 00:03:04,070
at these frequencies
here on a linear scale.

57
00:03:04,070 --> 00:03:05,740
But if you plot
this on a log scale,

58
00:03:05,740 --> 00:03:09,010
you can see the spectrum
much more easily.

59
00:03:09,010 --> 00:03:10,570
So for example,
what we've done here

60
00:03:10,570 --> 00:03:13,510
is we've plotted the square
magnitude of the Fourier

61
00:03:13,510 --> 00:03:19,840
transform, taken the log
base 10 of that spectrum--

62
00:03:19,840 --> 00:03:23,600
spectrum is the square magnitude
of the Fourier transform.

63
00:03:23,600 --> 00:03:28,520
And now we can take the log
base 10 of the power spectrum

64
00:03:28,520 --> 00:03:32,830
to get the power
in units of bels,

65
00:03:32,830 --> 00:03:37,090
and multiply that by 10 to get
the power in units of decibels,

66
00:03:37,090 --> 00:03:37,840
OK?

67
00:03:37,840 --> 00:03:43,210
So each tick mark here
of size 10 corresponds

68
00:03:43,210 --> 00:03:46,540
to one order of
magnitude in power, OK?

69
00:03:46,540 --> 00:03:51,400
So this peak here is about 10
dB lower than that peak there,

70
00:03:51,400 --> 00:03:56,960
and that corresponds to about
a factor of 10 lower in power.

71
00:03:56,960 --> 00:03:58,250
OK, any questions about that?

72
00:03:58,250 --> 00:04:01,420
I want to be able-- want
you to understand what

73
00:04:01,420 --> 00:04:03,130
these units of decibels are.

74
00:04:03,130 --> 00:04:05,200
They're going to be on the test.

75
00:04:05,200 --> 00:04:06,040
OK.

76
00:04:06,040 --> 00:04:07,030
Questions about that?

77
00:04:07,030 --> 00:04:08,925
You want to just
ask me right now?

78
00:04:08,925 --> 00:04:10,380
OK.

79
00:04:10,380 --> 00:04:12,930
Remember this.

80
00:04:12,930 --> 00:04:14,050
OK.

81
00:04:14,050 --> 00:04:18,050
And keep in mind that
the power in a signal

82
00:04:18,050 --> 00:04:21,649
is proportional to the
square of the amplitude, OK?

83
00:04:21,649 --> 00:04:26,120
So if I tell you that a signal
has 10 times as much amplitude,

84
00:04:26,120 --> 00:04:28,960
it's going to have 100
times as much power.

85
00:04:28,960 --> 00:04:35,500
100 times as much
power is 10 squared, which is 2

86
00:04:35,500 --> 00:04:38,083
bels, which is 20 decibels.

87
00:04:38,083 --> 00:04:39,710
Does that make sense?

88
00:04:39,710 --> 00:04:40,680
OK.

89
00:04:40,680 --> 00:04:41,180
All right.

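The decibel arithmetic above can be written as a small Python sketch. The function names are illustrative and do not come from the course code. Bels are the base-10 log of a power ratio, and decibels are 10 times that. Because power goes as amplitude squared, an amplitude ratio converts with a factor of 20.

```python
import math


def power_ratio_to_bels(ratio):
    """Power ratio expressed in bels: log base 10 of the ratio."""
    return math.log10(ratio)


def power_ratio_to_db(ratio):
    """Power ratio in decibels: 10 * log10(ratio)."""
    return 10 * power_ratio_to_bels(ratio)


def amplitude_ratio_to_db(ratio):
    """Power goes as amplitude squared, so 10*log10(a**2) = 20*log10(a)."""
    return power_ratio_to_db(ratio ** 2)


print(power_ratio_to_db(10))       # one order of magnitude in power
print(power_ratio_to_bels(100))    # 100x the power, in bels
print(amplitude_ratio_to_db(10))   # 10x the amplitude = 100x the power
print(power_ratio_to_db(0.1))      # a peak 10x weaker in power
```

So a factor of 10 in power is 10 dB. A factor of 10 in amplitude is 100 times the power, or 20 dB.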
90
00:04:41,180 --> 00:04:44,960
So we also talked about
some Fourier transforms

91
00:04:44,960 --> 00:04:46,530
of different kinds of functions.

92
00:04:46,530 --> 00:04:49,280
So this is the Fourier
transform of a square pulse.

93
00:04:49,280 --> 00:04:50,910
So here I showed
you a square pulse

94
00:04:50,910 --> 00:04:53,460
that has a width of
100 milliseconds.

95
00:04:53,460 --> 00:04:56,300
The Fourier transform
is this sinc function,

96
00:04:56,300 --> 00:05:00,180
and for a square pulse of
width 100 milliseconds,

97
00:05:00,180 --> 00:05:02,120
the sinc function has a half--

98
00:05:02,120 --> 00:05:07,400
sorry-- has a full width
at half height of 12 hertz.

99
00:05:07,400 --> 00:05:11,330
If we have a square pulse
that's five times as long,

100
00:05:11,330 --> 00:05:14,990
500 milliseconds long,
the Fourier transform

101
00:05:14,990 --> 00:05:17,150
is the sinc function
again, but it

102
00:05:17,150 --> 00:05:21,560
has a width of this central
lobe here of 2.4 hertz.

103
00:05:21,560 --> 00:05:25,010
So you can see that the longer
the pulse, the shorter--

104
00:05:25,010 --> 00:05:28,320
the narrower the structure
in the frequency domain.

105
00:05:28,320 --> 00:05:30,530
Up here, if we look
at the Fourier transform

106
00:05:30,530 --> 00:05:34,640
of a square pulse that's
25 milliseconds long,

107
00:05:34,640 --> 00:05:37,460
then the Fourier transform
is again a sinc function,

108
00:05:37,460 --> 00:05:40,310
and the width of that
central lobe is 48 hertz.

109
00:05:40,310 --> 00:05:43,370
So you can see that the
width in the time domain

110
00:05:43,370 --> 00:05:46,830
and the width in the frequency
domain are inversely related.

111
00:05:46,830 --> 00:05:49,370
So the product of the
width in the time domain

112
00:05:49,370 --> 00:05:52,760
and the width in the
frequency domain is constant.

113
00:05:52,760 --> 00:05:56,820
And that constant is called
the time-bandwidth product

114
00:05:56,820 --> 00:05:58,360
of that signal.

115
00:05:58,360 --> 00:06:02,610
The time-bandwidth product
of this square pulse

116
00:06:02,610 --> 00:06:05,170
and sinc function is constant.

117
00:06:05,170 --> 00:06:06,420
It's independent of the width.

118
00:06:06,420 --> 00:06:09,980
The time-bandwidth product
depends only on the shape.

119
00:06:09,980 --> 00:06:16,084
It's a characteristic
of that functional form.

120
00:06:16,084 --> 00:06:18,270
OK.

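A quick numerical check of the widths quoted above follows. It assumes that they refer to the full width at half height of the (amplitude) sinc function. For a pulse of width T, that transform is T * sinc(f T). The sketch finds the half-height point of the central lobe by bisection. It then shows that width times bandwidth comes out to the same number for every pulse length.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def half_height_x(tol=1e-12):
    """Bisection for the x in (0, 1) where the central lobe falls to sinc(x) = 0.5."""
    lo, hi = 0.0, 1.0          # sinc decreases from 1 to 0 on this interval
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sinc(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def fwhm_hz(width_s):
    """Full width at half height of T*sinc(f*T), the transform of a square
    pulse of width T seconds; half height is reached at f = +/- x_half / T."""
    return 2 * half_height_x() / width_s


for width in (0.025, 0.1, 0.5):
    bw = fwhm_hz(width)
    print(f"{width * 1000:.0f} ms pulse: FWHM {bw:.2f} Hz, time x bandwidth {width * bw:.3f}")
```

For 25, 100 and 500 ms pulses, this gives roughly 48, 12 and 2.4 Hz, matching the lecture. The time-bandwidth product is about 1.2 in every case.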
121
00:06:18,270 --> 00:06:21,110
Now we also talked about
the convolution theorem,

122
00:06:21,110 --> 00:06:25,430
which relates multiplying signals
in the time domain

123
00:06:25,430 --> 00:06:27,840
to convolving their Fourier transforms
in the frequency domain.

124
00:06:27,840 --> 00:06:31,910
So for example, if we have a
square pulse in time multiplied

125
00:06:31,910 --> 00:06:35,780
by a cosine function in time
to get a windowed cosine--

126
00:06:35,780 --> 00:06:37,670
so this function
is zero everywhere

127
00:06:37,670 --> 00:06:41,760
except it's cosine
within this window--

128
00:06:41,760 --> 00:06:43,700
we can compute the
Fourier transform

129
00:06:43,700 --> 00:06:49,020
of this windowed cosine function
by convolving the Fourier

130
00:06:49,020 --> 00:06:51,210
transform of the square
pulse with the Fourier

131
00:06:51,210 --> 00:06:54,272
transform of the cosine
function, like this.

132
00:06:54,272 --> 00:06:56,105
So the Fourier transform
of the square pulse

133
00:06:56,105 --> 00:06:57,980
is, again, this sinc function.

134
00:06:57,980 --> 00:07:00,860
The Fourier transform
of the cosine

135
00:07:00,860 --> 00:07:03,720
consists of these two delta functions.

136
00:07:03,720 --> 00:07:06,680
Now if we convolve the sinc
function with those two delta

137
00:07:06,680 --> 00:07:10,160
functions, we get a copy
of that sinc function

138
00:07:10,160 --> 00:07:13,560
at the location of each
of those delta functions.

139
00:07:13,560 --> 00:07:16,828
And that is the
Fourier transform, OK?

140
00:07:16,828 --> 00:07:17,870
Any questions about that?

141
00:07:17,870 --> 00:07:21,770
Just a quick review of things
we've been talking about.

142
00:07:21,770 --> 00:07:24,220
All right.

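The convolution theorem can be checked numerically with the discrete Fourier transform. In that setting it reads DFT(x*y)[k] = (1/N) sum over m of X[m] Y[(k - m) mod N]. This sketch builds a square-windowed cosine, with sizes and frequencies chosen arbitrarily. It then compares the transform of the windowed cosine with the circular convolution of the two separate transforms. Finally, it confirms that the result is a copy of the window's transform centered on each of the cosine's two delta functions.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform, for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


N = 32
window = [1.0 if 8 <= t < 24 else 0.0 for t in range(N)]       # square pulse
cosine = [math.cos(2 * math.pi * 5 * t / N) for t in range(N)]  # 5 cycles in N samples
windowed = [w * c for w, c in zip(window, cosine)]              # windowed cosine

W, C = dft(window), dft(cosine)
direct = dft(windowed)

# Circular convolution of the two transforms, scaled by 1/N
via_conv = [sum(W[m] * C[(k - m) % N] for m in range(N)) / N for k in range(N)]
max_err = max(abs(a - b) for a, b in zip(direct, via_conv))

# C is non-zero only at bins 5 and N - 5 (the two delta functions, each N/2),
# so the result is half of W shifted to +5 plus half of W shifted to -5.
copies = [(W[(k - 5) % N] + W[(k + 5) % N]) / 2 for k in range(N)]
print(max_err, max(abs(a - b) for a, b in zip(direct, copies)))
```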
143
00:07:24,220 --> 00:07:26,920
So we can look at this
Fourier transform here.

144
00:07:26,920 --> 00:07:28,870
We can look at
the power spectrum

145
00:07:28,870 --> 00:07:33,560
of this windowed cosine
function, like this.

146
00:07:33,560 --> 00:07:35,740
So there's the windowed
cosine function.

147
00:07:35,740 --> 00:07:38,410
The power spectrum--
the power spectrum

148
00:07:38,410 --> 00:07:42,310
plotted on a linear scale
is just the square magnitude

149
00:07:42,310 --> 00:07:44,440
of what I've plotted here.

150
00:07:44,440 --> 00:07:47,870
And we're just going to plot
the positive frequencies.

151
00:07:47,870 --> 00:07:50,230
That's what the power
spectrum of that signal

152
00:07:50,230 --> 00:07:52,210
looks like on a log scale.

153
00:07:52,210 --> 00:07:54,730
So you can see
that it has a peak

154
00:07:54,730 --> 00:07:57,310
at 20 hertz, which
was the frequency

155
00:07:57,310 --> 00:07:59,550
of the cosine function.

156
00:07:59,550 --> 00:08:01,790
You see some little
wiggles out here.

157
00:08:01,790 --> 00:08:03,400
But if you look on
a log scale, you

158
00:08:03,400 --> 00:08:05,830
can see that those
wiggles off to the side

159
00:08:05,830 --> 00:08:07,940
are actually quite significant.

160
00:08:07,940 --> 00:08:12,370
The first side lobe there
has a power that's about 1/20

161
00:08:12,370 --> 00:08:13,840
of the central peak, about 13 dB down.

162
00:08:13,840 --> 00:08:16,368
That may not matter,
sometimes, when

163
00:08:16,368 --> 00:08:18,160
you're looking at the
spectrum of a signal,

164
00:08:18,160 --> 00:08:21,640
but sometimes it will matter
because those side lobes

165
00:08:21,640 --> 00:08:23,560
there will interfere.

166
00:08:23,560 --> 00:08:28,900
They'll mask the spectrum of
other components of this signal

167
00:08:28,900 --> 00:08:32,600
that you may be interested in.

168
00:08:32,600 --> 00:08:35,470
We also talked about
how this spectrum

169
00:08:35,470 --> 00:08:39,620
depends on the function that you
multiply your cosine by here.

170
00:08:39,620 --> 00:08:42,549
So for example, if
you take a cosine

171
00:08:42,549 --> 00:08:47,020
and you multiply it by
a Gaussian, the power spectrum

172
00:08:47,020 --> 00:08:49,450
has the shape of
a Gaussian, it's

173
00:08:49,450 --> 00:08:53,080
a Gaussian that has
a peak at 20 hertz.

174
00:08:53,080 --> 00:08:55,450
And if you look at the--

175
00:08:55,450 --> 00:08:57,310
if you look at that
spectrum on a log scale,

176
00:08:57,310 --> 00:09:00,580
you can see that it loses it--

177
00:09:00,580 --> 00:09:03,710
you've lost all of these
high-frequency wiggles,

178
00:09:03,710 --> 00:09:04,450
up here.

179
00:09:04,450 --> 00:09:08,110
All of those wiggles
come from the sharp edge

180
00:09:08,110 --> 00:09:12,040
of this square pulse
windowing function, OK?

181
00:09:12,040 --> 00:09:15,430
So the shape of the
spectrum that you get

182
00:09:15,430 --> 00:09:19,180
depends a lot on how
you window the function

183
00:09:19,180 --> 00:09:21,630
that you're looking at.

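To see how much the window shape matters on a log scale, this sketch compares a square window with a Gaussian window. The window length of 64 samples and the Gaussian width of M/8 are arbitrary choices. The code evaluates the power spectrum of each window directly, in decibels relative to the peak. Multiplying by a cosine would only shift this shape to the cosine's frequency. For an ideal square window, the first side lobe sits about 13 dB below the main peak and the side lobes decay slowly. The Gaussian's leakage far from the peak is many orders of magnitude smaller.

```python
import cmath
import math


def dtft_power(w, f):
    """|W(f)|^2 for a finite window w at frequency f (cycles per sample)."""
    return abs(sum(x * cmath.exp(-2j * math.pi * f * n)
                   for n, x in enumerate(w))) ** 2


def rel_db(w, f):
    """Power at frequency f relative to the peak at f = 0, in dB."""
    return 10 * math.log10(dtft_power(w, f) / dtft_power(w, 0.0))


M = 64                      # window length in samples (arbitrary)
SIGMA = M / 8               # Gaussian width in samples (arbitrary)
rect = [1.0] * M
gauss = [math.exp(-0.5 * ((n - (M - 1) / 2) / SIGMA) ** 2) for n in range(M)]

# The square window's first side lobe lies between its first two nulls, 1/M and 2/M
grid = [(1 + i / 200) / M for i in range(1, 200)]
first_sidelobe_db = max(rel_db(rect, f) for f in grid)

# Leakage far from the main lobe (10.5 frequency bins away)
far = 10.5 / M
rect_far_db = rel_db(rect, far)
gauss_far_db = rel_db(gauss, far)

print(f"square window, first side lobe: {first_sidelobe_db:.1f} dB")
print(f"10.5 bins out: square {rect_far_db:.1f} dB, Gaussian {gauss_far_db:.1f} dB")
```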
184
00:09:21,630 --> 00:09:23,560
Questions about that?

185
00:09:23,560 --> 00:09:25,360
Right, again, more review.

186
00:09:25,360 --> 00:09:25,860
OK.

187
00:09:25,860 --> 00:09:29,110
So we talked about estimating
the spectrum of a signal.

188
00:09:29,110 --> 00:09:32,260
If you have many different
measurements of some signal,

189
00:09:32,260 --> 00:09:35,380
you can actually just compute
the spectrum of each one.

190
00:09:35,380 --> 00:09:38,590
This little hat here means
an estimate of the spectrum.

191
00:09:38,590 --> 00:09:41,380
You compute some estimate
of the spectrum of each

192
00:09:41,380 --> 00:09:45,080
of those trials,
samples of your data,

193
00:09:45,080 --> 00:09:48,154
and you can just average
all of those together.

194
00:09:48,154 --> 00:09:52,590
OK, now if you have a continuous
signal, you can also--

195
00:09:52,590 --> 00:09:54,450
you could estimate
the spectrum just

196
00:09:54,450 --> 00:09:58,380
by taking the Fourier
transform of a long recording

197
00:09:58,380 --> 00:09:59,370
of your signal.

198
00:09:59,370 --> 00:10:03,210
But it's much better to break
your signal into small pieces,

199
00:10:03,210 --> 00:10:06,630
compute a spectral estimate of
each one of those small pieces

200
00:10:06,630 --> 00:10:08,440
and average those together.

201
00:10:08,440 --> 00:10:12,570
Now how do you construct a
small sample of a signal?

202
00:10:12,570 --> 00:10:15,480
If you have a continuous
signal, how do you

203
00:10:15,480 --> 00:10:17,400
take a small sample of it?

204
00:10:17,400 --> 00:10:19,350
Well, you can think
about that as taking

205
00:10:19,350 --> 00:10:21,780
your continuous signal
and multiplying it

206
00:10:21,780 --> 00:10:23,760
by a square window.

207
00:10:23,760 --> 00:10:26,550
Setting everything outside
that window to zero

208
00:10:26,550 --> 00:10:28,993
and just keeping the part
that's in that window.

209
00:10:28,993 --> 00:10:30,660
And you know that
when you take a signal

210
00:10:30,660 --> 00:10:34,200
and you multiply it by a square
window, what have you done?

211
00:10:34,200 --> 00:10:38,250
You've convolved the spectrum
of this original signal

212
00:10:38,250 --> 00:10:41,130
with the spectrum of
this square pulse.

213
00:10:41,130 --> 00:10:43,290
And that spectrum
of the square pulse

214
00:10:43,290 --> 00:10:45,900
is really a nasty
looking thing, right?

215
00:10:45,900 --> 00:10:49,590
It is what we call the
Dirichlet kernel, which is just

216
00:10:49,590 --> 00:10:51,660
the power spectrum
of a square pulse

217
00:10:51,660 --> 00:10:53,820
that we just talked about, OK?

218
00:10:53,820 --> 00:10:55,680
So that's called the
Dirichlet kernel.

219
00:10:55,680 --> 00:11:03,240
And using a square pulse to
select out a sample of data

220
00:11:03,240 --> 00:11:07,680
introduces two errors into
your spectral estimate. The first is

221
00:11:07,680 --> 00:11:09,330
narrowband bias:

222
00:11:09,330 --> 00:11:14,190
it broadens your estimate of
the spectrum of, let's say,

223
00:11:14,190 --> 00:11:17,220
sinusoidal or periodic
components in your signal.

224
00:11:17,220 --> 00:11:20,980
The second is broadband bias:
it introduces these side lobes.

225
00:11:20,980 --> 00:11:23,650
So the way we solve
that problem is

226
00:11:23,650 --> 00:11:26,200
we break our signal
into little pieces,

227
00:11:26,200 --> 00:11:28,240
multiply each of
those little pieces

228
00:11:28,240 --> 00:11:31,000
by a smoother windowing
function, something

229
00:11:31,000 --> 00:11:33,280
that isn't a square
pulse, multiply it

230
00:11:33,280 --> 00:11:36,700
by something that maybe
looks like a Gaussian, or a--

231
00:11:36,700 --> 00:11:41,700
or half of a cosine function.

232
00:11:41,700 --> 00:11:46,620
That gives us what we call
tapered segments of our data.

233
00:11:46,620 --> 00:11:50,280
We can estimate the spectrum
of those tapered pieces

234
00:11:50,280 --> 00:11:52,802
and average those together, OK?

235
00:11:52,802 --> 00:11:54,440
Any questions?

236
00:11:54,440 --> 00:11:56,300
OK, again, that's a review.

237
00:11:56,300 --> 00:11:58,310
And I showed you briefly
what happens if we

238
00:11:58,310 --> 00:12:00,230
take a little piece of signal.

239
00:12:00,230 --> 00:12:03,500
The blue is white
noise with a little bit

240
00:12:03,500 --> 00:12:09,090
of this periodic sine
function added to it.

241
00:12:09,090 --> 00:12:11,570
And if you run
that analysis, you

242
00:12:11,570 --> 00:12:14,960
can see that there is a large
component of the spectrum

243
00:12:14,960 --> 00:12:16,880
that's due to the white noise.

244
00:12:16,880 --> 00:12:19,250
That's this broadband
component here.

245
00:12:19,250 --> 00:12:22,700
And that sinusoidal
component there gives you

246
00:12:22,700 --> 00:12:26,920
this peak in the spectrum, OK?

247
00:12:26,920 --> 00:12:28,260
And there's a--

248
00:12:28,260 --> 00:12:30,820
I've posted-- or Daniel's
posted a function

249
00:12:30,820 --> 00:12:35,350
called wspec.m that implements
this spectral estimate

250
00:12:35,350 --> 00:12:36,220
like this.

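The code of wspec.m isn't shown here, so the following is not a reimplementation of it. It is only a minimal sketch of the recipe described above: cut the signal into pieces, taper each piece, take the power spectrum of each, and average. It assumes non-overlapping segments and a half-cycle sine taper, which is one reading of "half of a cosine." The test signal (white noise plus a sinusoid at bin 8) mirrors the example in the lecture, but all of its parameters are made up.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform, for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def tapered_average_spectrum(signal, seg_len):
    """Split the signal into non-overlapping segments, taper each with a
    half-cycle sine window, and average the power spectra (bins 0..seg_len//2)."""
    taper = [math.sin(math.pi * (n + 0.5) / seg_len) for n in range(seg_len)]
    n_segs = len(signal) // seg_len
    avg = [0.0] * (seg_len // 2 + 1)
    for s in range(n_segs):
        piece = signal[s * seg_len:(s + 1) * seg_len]
        spec = dft([w * x for w, x in zip(taper, piece)])
        for k in range(len(avg)):
            avg[k] += abs(spec[k]) ** 2 / n_segs
    return avg


random.seed(0)
SEG = 64
# White noise plus a unit-amplitude sinusoid at 8 cycles per segment
sig = [random.gauss(0.0, 1.0) + math.sin(2 * math.pi * 8 * t / SEG)
       for t in range(16 * SEG)]
S = tapered_average_spectrum(sig, SEG)
peak_bin = max(range(len(S)), key=S.__getitem__)
print(peak_bin)
```

Averaging the 16 tapered pieces makes the broadband noise floor much smoother than a single piece would be, so the sinusoidal peak stands out clearly.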
251
00:12:36,220 --> 00:12:41,560
So now today, we're going to
turn to estimating time-varying

252
00:12:41,560 --> 00:12:44,500
signals, estimating the spectrum
of time-varying signals.

253
00:12:44,500 --> 00:12:48,240
So this is a microphone
recording of a speech signal.

254
00:12:48,240 --> 00:12:50,103
Let me see if I can play that.

255
00:12:50,103 --> 00:12:50,770
[AUDIO PLAYBACK]

256
00:12:50,770 --> 00:12:51,435
- Hello.

257
00:12:51,435 --> 00:12:51,850
[END PLAYBACK]

258
00:12:51,850 --> 00:12:52,808
MICHALE FEE: All right.

259
00:12:52,808 --> 00:12:56,510
So that's just me saying
hello in a robotic voice.

260
00:12:56,510 --> 00:12:57,010
OK.

261
00:12:57,010 --> 00:13:00,490
So that is the signal.

262
00:13:00,490 --> 00:13:02,890
That's basically
voltage recorded

263
00:13:02,890 --> 00:13:04,900
on the output of a microphone.

264
00:13:04,900 --> 00:13:07,180
It's got some interesting
structure in it, right?

265
00:13:07,180 --> 00:13:13,000
So first these
little pulses here,

266
00:13:13,000 --> 00:13:15,470
you see this kind
of periodic pulse.

267
00:13:15,470 --> 00:13:16,840
Those are called glottal pulses.

268
00:13:16,840 --> 00:13:19,330
Does anyone know what those are?

269
00:13:19,330 --> 00:13:22,745
What produces those?

270
00:13:22,745 --> 00:13:24,236
No?

271
00:13:24,236 --> 00:13:28,580
OK, so when you
speak a voiced sound,

272
00:13:28,580 --> 00:13:30,170
your vocal cords are vibrating.

273
00:13:30,170 --> 00:13:35,090
You have two pieces
of flexible tissue

274
00:13:35,090 --> 00:13:38,300
that are close to each
other in your trachea.

275
00:13:38,300 --> 00:13:40,880
As air flows up
through your trachea,

276
00:13:40,880 --> 00:13:46,250
the air pressure builds up and
pushes the glottal folds apart.

277
00:13:46,250 --> 00:13:51,050
Air begins to flow rapidly
through that open space.

278
00:13:51,050 --> 00:13:54,500
At that point, the velocity of
the air flowing

279
00:13:54,500 --> 00:13:57,110
through the constriction is
higher than the velocity of air

280
00:13:57,110 --> 00:13:59,180
anywhere else in the
trachea because it's flowing

281
00:13:59,180 --> 00:14:01,160
through a tiny little space.

282
00:14:01,160 --> 00:14:04,520
At high velocities,
at constrictions

283
00:14:04,520 --> 00:14:06,800
where you have a
high fluid flow,

284
00:14:06,800 --> 00:14:09,140
the pressure actually drops.

285
00:14:09,140 --> 00:14:14,660
And that pulls the vocal
folds back together again.

286
00:14:14,660 --> 00:14:17,750
When they snap together,
all the airflow stops,

287
00:14:17,750 --> 00:14:22,080
and you have a pulse of negative
pressure above the glottis,

288
00:14:22,080 --> 00:14:22,580
right?

289
00:14:22,580 --> 00:14:25,320
Imagine you have
airflow coming up.

290
00:14:25,320 --> 00:14:27,260
And all of a sudden,
you pinch it off.

291
00:14:27,260 --> 00:14:30,410
There's a sudden drop in the
pressure as that mass of air

292
00:14:30,410 --> 00:14:33,650
keeps flowing up, but there's
nothing more coming up below.

293
00:14:33,650 --> 00:14:36,380
So you get a sharp
drop in the pressure.

294
00:14:36,380 --> 00:14:38,750
Then the air pressure
builds up again.

295
00:14:38,750 --> 00:14:41,810
The glottal folds open,
velocity increases,

296
00:14:41,810 --> 00:14:43,410
and they snap shut again.

297
00:14:43,410 --> 00:14:46,460
And so that's what happens
as you're talking, OK?

298
00:14:46,460 --> 00:14:51,740
And so that periodic
signal right there, those

299
00:14:51,740 --> 00:14:54,200
pulses in pressure--
the microphone is

300
00:14:54,200 --> 00:14:55,980
recording pressure, remember.

301
00:14:55,980 --> 00:15:00,200
So those pulses are due
to your glottis snapping

302
00:15:00,200 --> 00:15:06,290
shut each time it
closes during the cycle,

303
00:15:06,290 --> 00:15:08,390
during that oscillatory cycle.

304
00:15:08,390 --> 00:15:11,060
The period of that--
those glottal pulses

305
00:15:11,060 --> 00:15:16,550
is about 10 milliseconds in
men and about 5 milliseconds

306
00:15:16,550 --> 00:15:17,420
in women.

307
00:15:17,420 --> 00:15:17,930
OK.

308
00:15:17,930 --> 00:15:21,440
But you can see there's
a lot of other structure,

309
00:15:21,440 --> 00:15:23,780
changes in this signal
that go on through time.

310
00:15:23,780 --> 00:15:26,240
But let's start by just
looking at the spectrum

311
00:15:26,240 --> 00:15:27,560
of that whole signal.

312
00:15:27,560 --> 00:15:29,670
Now what might we expect?

313
00:15:29,670 --> 00:15:35,090
So if you have periodic pulses
at 10 millisecond period,

314
00:15:35,090 --> 00:15:38,110
what should the
spectrum look like?

315
00:15:38,110 --> 00:15:43,240
If you have a train of pulses,
let's say delta functions

316
00:15:43,240 --> 00:15:46,810
with 10 millisecond period, what
would the spectrum of that look

317
00:15:46,810 --> 00:15:49,375
like?

318
00:15:49,375 --> 00:15:52,270
Anybody remember what the
spectrum of a train of pulses

319
00:15:52,270 --> 00:15:52,770
looks like?

320
00:15:57,910 --> 00:15:58,630
Almost, yes.

321
00:15:58,630 --> 00:15:59,513
There would be.

322
00:15:59,513 --> 00:16:01,180
But there would be
other things as well.

323
00:16:04,230 --> 00:16:06,510
What would a signal
look like that just

324
00:16:06,510 --> 00:16:08,470
has a peak at 100 hertz?

325
00:16:08,470 --> 00:16:09,350
What is that?

326
00:16:11,990 --> 00:16:14,390
Has one peak at 100 hertz?

327
00:16:14,390 --> 00:16:18,740
Or let's say [INAUDIBLE]
Fourier transform would have

328
00:16:18,740 --> 00:16:22,630
a peak at 100 and at minus 100.

329
00:16:22,630 --> 00:16:24,040
That's just a cosine.

330
00:16:24,040 --> 00:16:27,070
That's not a train of pulses.

331
00:16:27,070 --> 00:16:29,350
What's the Fourier transform
of a train of pulses?

332
00:16:29,350 --> 00:16:31,900
Those of you who are
concentrating on this right

333
00:16:31,900 --> 00:16:34,130
now are going to be really
glad on the midterm.

334
00:16:34,130 --> 00:16:37,840
What's the Fourier transform
of a train of pulses?

335
00:16:37,840 --> 00:16:39,480
OK, let's go back to
here because there

336
00:16:39,480 --> 00:16:43,020
was a bit of a hint here at
the beginning of lecture.

337
00:16:43,020 --> 00:16:47,820
What's the Fourier
transform of a square wave?

338
00:16:47,820 --> 00:16:51,103
Any idea what happens if we
make these pulses narrower

339
00:16:51,103 --> 00:16:51,645
and narrower?

340
00:16:56,530 --> 00:16:59,010
The pulses get more
and more narrow,

341
00:16:59,010 --> 00:17:01,065
these peaks get
bigger and bigger.

342
00:17:01,065 --> 00:17:05,050
And as we go to a train
of delta functions,

343
00:17:05,050 --> 00:17:07,480
you find that the Fourier
transform of a train

344
00:17:07,480 --> 00:17:09,510
of delta functions
in time is just

345
00:17:09,510 --> 00:17:13,349
a train of delta
functions in frequency.

346
00:17:13,349 --> 00:17:17,210
The spacing between the
peaks in frequency is just 1

347
00:17:17,210 --> 00:17:21,130
over the spacing between
the peaks in time, right?

348
00:17:21,130 --> 00:17:22,720
Make sure you know that.

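Here is a numerical check of that rule. It assumes a sampling rate of 1 kHz, so that one sample is 1 ms. A train of unit pulses every 10 ms has a transform that is non-zero only at multiples of 100 Hz, which is 1 over the 10 ms spacing.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform, for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


FS = 1000      # assumed sampling rate, Hz (1 sample = 1 ms)
N = 100        # 100 ms of signal
PERIOD = 10    # one pulse every 10 samples = 10 ms

pulses = [1.0 if t % PERIOD == 0 else 0.0 for t in range(N)]
X = dft(pulses)

# Frequencies up to the Nyquist frequency (FS/2) where the transform is non-zero
peak_freqs_hz = [k * FS / N for k in range(N // 2 + 1) if abs(X[k]) > 1e-6]
print(peak_freqs_hz)
```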
349
00:17:27,410 --> 00:17:30,530
OK, so now let's go back
to our speech signal.

350
00:17:34,967 --> 00:17:37,942
These are almost
like delta functions.

351
00:17:37,942 --> 00:17:40,150
Maybe not quite, but for
now, let's pretend they are.

352
00:17:40,150 --> 00:17:42,820
If those are a train of
delta functions spaced

353
00:17:42,820 --> 00:17:45,940
at 10 milliseconds, what is our
spectrum going to look like?

354
00:17:53,240 --> 00:17:55,500
I just said it.

355
00:17:55,500 --> 00:17:58,750
What is it going to look like?

356
00:17:58,750 --> 00:17:59,340
Yep.

357
00:17:59,340 --> 00:18:00,748
Spaced by?

358
00:18:00,748 --> 00:18:02,103
AUDIENCE: One.

359
00:18:02,103 --> 00:18:03,020
MICHALE FEE: Which is?

360
00:18:05,560 --> 00:18:06,530
100 hertz.

361
00:18:06,530 --> 00:18:07,120
Good.

362
00:18:07,120 --> 00:18:10,550
So here's the spectrum
of that speech signal.

363
00:18:10,550 --> 00:18:13,088
What do you see?

364
00:18:13,088 --> 00:18:19,150
You see a train of delta
functions separated

365
00:18:19,150 --> 00:18:22,820
by about 100 hertz, right?

366
00:18:22,820 --> 00:18:26,210
That's a kilohertz, that's
500 hertz, that's 100 hertz.

367
00:18:26,210 --> 00:18:29,960
So you get a train of
delta functions separated

368
00:18:29,960 --> 00:18:31,820
by 100 hertz, OK?

369
00:18:31,820 --> 00:18:34,922
That's called a harmonic stack.

370
00:18:34,922 --> 00:18:35,422
OK.

371
00:18:35,422 --> 00:18:38,410
And the spectrum
of a speech signal

372
00:18:38,410 --> 00:18:43,000
has a harmonic stack
because the signal has

373
00:18:43,000 --> 00:18:48,540
these short little pulses
of pressure in it.

374
00:18:48,540 --> 00:18:50,670
OK, what are these bumps here?

375
00:18:50,670 --> 00:18:53,460
Why is there a bump here, a
bump here, and a bump here?

376
00:18:53,460 --> 00:18:54,580
Does anyone know that?

377
00:19:01,984 --> 00:19:07,830
What is it that shapes
the sound as you speak?

378
00:19:07,830 --> 00:19:11,442
That makes an "ooh" sound
different from an "ahh?"

379
00:19:11,442 --> 00:19:18,180
[INAUDIBLE] This is hello.

380
00:19:18,180 --> 00:19:20,280
Sorry, I'm having
trouble with my pointer.

381
00:19:23,058 --> 00:19:24,735
That's hello.

382
00:19:28,598 --> 00:19:32,864
What is it that makes all
things sound different?

383
00:19:32,864 --> 00:19:35,500
So the sound, those
pulses, are made down

384
00:19:35,500 --> 00:19:37,300
here in your vocal tract.

385
00:19:37,300 --> 00:19:41,620
As those pulses propagate
up from your glottis

386
00:19:41,620 --> 00:19:46,570
to your lips, they [AUDIO OUT]
filter, which is your mouth.

387
00:19:46,570 --> 00:19:48,370
And the shape
of that filter

388
00:19:48,370 --> 00:19:52,270
is controlled by the
closure of your lips,

389
00:19:52,270 --> 00:19:58,180
by where your tongue is, where
different parts of your tongue

390
00:19:58,180 --> 00:20:03,040
are closing the
opening in your mouth.

391
00:20:03,040 --> 00:20:08,820
And all of those things produce
filters that have peaks.

392
00:20:08,820 --> 00:20:13,050
And the vocal filter
has three main peaks

393
00:20:13,050 --> 00:20:15,700
that move around as you move
the shape of your mouth.

394
00:20:15,700 --> 00:20:19,740
And those are
called formants, OK?

395
00:20:19,740 --> 00:20:20,240
OK.

396
00:20:20,240 --> 00:20:23,850
Now you can see that
this temporal structure,

397
00:20:23,850 --> 00:20:26,650
this spectral structure
isn't constant in time.

398
00:20:26,650 --> 00:20:29,440
It changes-- right--
throughout this word.

399
00:20:29,440 --> 00:20:33,380
So what we can do is we
can take that signal,

400
00:20:33,380 --> 00:20:37,180
and we can compute the spectrum
of little parts of it, OK?

401
00:20:37,180 --> 00:20:41,600
So we can take that signal
and multiply it by a window

402
00:20:41,600 --> 00:20:46,430
here, a taper here, and get
a little sample of the speech

403
00:20:46,430 --> 00:20:48,320
signal and calculate
the spectrum of it

404
00:20:48,320 --> 00:20:50,970
just by Fourier
transforming, OK?

405
00:20:50,970 --> 00:20:52,080
We can do the same thing.

406
00:20:52,080 --> 00:20:55,470
Shift it over a little bit
and compute the spectrum

407
00:20:55,470 --> 00:20:56,670
of that signal, all right?

408
00:20:56,670 --> 00:21:00,210
So we're going to take a
little piece of the signal that

409
00:21:00,210 --> 00:21:04,600
has width in time, capital T--

410
00:21:04,600 --> 00:21:06,920
OK-- that's the
width of the window.

411
00:21:06,920 --> 00:21:09,420
We're going to multiply it by
a taper, compute the spectrum.

412
00:21:09,420 --> 00:21:10,920
And we're going to
shift that window

413
00:21:10,920 --> 00:21:13,545
by a smaller amount,
delta t, so that you

414
00:21:13,545 --> 00:21:15,090
have overlapping windows.

415
00:21:15,090 --> 00:21:17,020
Compute the spectrum
of each one,

416
00:21:17,020 --> 00:21:21,111
and then stack all of
those up next to each other.

417
00:21:21,111 --> 00:21:25,840
So now you've got a spectrum
that's a function of time

418
00:21:25,840 --> 00:21:29,640
and frequency, OK?

419
00:21:29,640 --> 00:21:34,830
So each column is the
spectrum of one little piece

420
00:21:34,830 --> 00:21:37,700
of the sound at
one moment in time.

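The procedure above can be sketched in a few lines of Python: a window of length T, a taper, a shift by a smaller step so that windows overlap, and one spectrum per column. The window length, step, sine taper and two-tone test signal are all illustrative assumptions. Real tools use an FFT and often a different taper.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform, for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def spectrogram(signal, win_len, step):
    """Slide a window of win_len samples along the signal in hops of `step`
    samples (step < win_len gives overlapping windows), taper each piece,
    and return one power-spectrum column (bins 0..win_len//2) per window."""
    taper = [math.sin(math.pi * (n + 0.5) / win_len) for n in range(win_len)]
    columns = []
    for start in range(0, len(signal) - win_len + 1, step):
        piece = signal[start:start + win_len]
        spec = dft([w * x for w, x in zip(taper, piece)])
        columns.append([abs(c) ** 2 for c in spec[:win_len // 2 + 1]])
    return columns  # columns[i][k]: power at time step i, frequency bin k


# Test signal: a tone at bin 4 for the first 256 samples, then a tone at bin 12
WIN, STEP = 32, 8
sig = [math.cos(2 * math.pi * (4 if t < 256 else 12) * t / WIN) for t in range(512)]
cols = spectrogram(sig, WIN, STEP)
peak_bins = [max(range(len(c)), key=c.__getitem__) for c in cols]
print(peak_bins[0], peak_bins[-1])
```

Each entry of peak_bins is the strongest frequency bin in one column. The peak therefore moves from bin 4 to bin 12 as the tone changes halfway through, and columns that straddle the change contain both tones.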
421
00:21:37,700 --> 00:21:39,450
Does that make sense?

422
00:21:39,450 --> 00:21:39,950
OK.

423
00:21:39,950 --> 00:21:43,960
And that's where this
spectrogram comes from.

424
00:21:43,960 --> 00:21:45,340
Here in this
spectrogram, you can

425
00:21:45,340 --> 00:21:50,380
see these horizontal striations
are the harmonic stack

426
00:21:50,380 --> 00:21:52,450
produced by the glottal pulse.

427
00:21:52,450 --> 00:21:55,180
This is a really
key way that people

428
00:21:55,180 --> 00:21:59,620
study the mechanisms of
sound production and speech,

429
00:21:59,620 --> 00:22:04,590
and animal vocalizations,
and all kinds of signals,

430
00:22:04,590 --> 00:22:06,004
more generally, OK?

431
00:22:06,004 --> 00:22:09,210
All right, any
questions about that?

432
00:22:09,210 --> 00:22:09,940
All right.

433
00:22:09,940 --> 00:22:12,550
Now what's really cool
is that you can actually

434
00:22:12,550 --> 00:22:15,190
focus on different
things in a signal, OK?

435
00:22:15,190 --> 00:22:18,550
So for example, if I
compute the spectrogram

436
00:22:18,550 --> 00:22:21,760
where that little
window that I'm choosing

437
00:22:21,760 --> 00:22:25,990
is really long, then I
have high resolution

438
00:22:25,990 --> 00:22:28,720
in frequency,
and the spectrogram

439
00:22:28,720 --> 00:22:30,240
looks like this.

440
00:22:30,240 --> 00:22:33,730
But if I compute
the spectrogram

441
00:22:33,730 --> 00:22:37,870
with little windows in
time that are very short,

442
00:22:37,870 --> 00:22:41,630
then my frequency
resolution is very poor,

443
00:22:41,630 --> 00:22:43,910
but the temporal
resolution is very high.

444
00:22:43,910 --> 00:22:46,300
And now you can
see the spectrum.

445
00:22:46,300 --> 00:22:48,520
You can see these
vertical striations.

446
00:22:48,520 --> 00:22:50,410
Those vertical
striations correspond

447
00:22:50,410 --> 00:22:52,810
to individual glottal pulses.

448
00:22:52,810 --> 00:22:57,700
And we can basically see the
spectrum of each pulse coming

449
00:22:57,700 --> 00:23:00,940
through the vocal tract.

450
00:23:00,940 --> 00:23:02,110
Pretty cool, right?

451
00:23:02,110 --> 00:23:05,200
So how you compute the
spectrum depends on

452
00:23:05,200 --> 00:23:07,390
what you're
actually interested in.

453
00:23:07,390 --> 00:23:11,950
If you want to focus
on the glottal pulses,

454
00:23:11,950 --> 00:23:13,450
for example, the
pitch of the speech,

455
00:23:13,450 --> 00:23:15,580
you look with a long time window.

456
00:23:15,580 --> 00:23:18,030
If you want to focus
on the formants,

457
00:23:18,030 --> 00:23:20,770
here you can see the
formants very nicely,

458
00:23:20,770 --> 00:23:22,420
you would use a short time window.
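Here is a back-of-the-envelope check on this trade-off. The FFT of a window T seconds long has frequency bins spaced 1/T apart, so the window length sets how finely the spectrogram can separate nearby frequencies. The helper below is hypothetical. The 50 ms and 8 ms values are the window lengths used for the two spectrograms discussed later in this lecture.
```python
def bin_spacing_hz(T):
    """Frequency spacing (Hz) of FFT bins for a window that is T seconds long."""
    return 1.0 / T
long_window = bin_spacing_hz(0.050)   # 50 ms window: fine frequency grid, coarse in time
short_window = bin_spacing_hz(0.008)  # 8 ms window: coarse frequency grid, fine in time
print(f"50 ms -> {long_window:.0f} Hz bins, 8 ms -> {short_window:.0f} Hz bins")
```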

459
00:23:25,210 --> 00:23:28,380
Any questions?

460
00:23:28,380 --> 00:23:34,950
So now I'm going to talk more
about the kinds of tapers

461
00:23:34,950 --> 00:23:41,690
that you use to get the best
possible spectral estimate.

462
00:23:41,690 --> 00:23:45,940
So a perfect taper, in
a sense, would give you

463
00:23:45,940 --> 00:23:47,860
perfect temporal resolution.

464
00:23:51,470 --> 00:23:54,560
It would give you really
fine temporal resolution.

465
00:23:54,560 --> 00:23:58,370
And it would give you really
fine frequency resolution,

466
00:23:58,370 --> 00:24:03,050
but because there is a
fundamental limit on the time

467
00:24:03,050 --> 00:24:08,270
bandwidth product, you can't
measure frequency infinitely

468
00:24:08,270 --> 00:24:12,410
well with an infinitely
short sample of a signal.

469
00:24:12,410 --> 00:24:14,420
Imagine you have a
sine wave, and you

470
00:24:14,420 --> 00:24:17,550
took like two samples
of a sine wave.

471
00:24:17,550 --> 00:24:20,270
It would be really hard to
figure out the frequency,

472
00:24:20,270 --> 00:24:23,270
whereas if you have many, many,
many samples of a sine wave,

473
00:24:23,270 --> 00:24:24,720
you can figure
out the frequency.

474
00:24:24,720 --> 00:24:26,870
So there's a
fundamental limit there.

475
00:24:26,870 --> 00:24:29,180
So there's no such thing
as a perfect taper.

476
00:24:29,180 --> 00:24:32,270
If I want to take a
sample of my signal

477
00:24:32,270 --> 00:24:37,850
in time, if I have a sample
that's limited in time,

478
00:24:37,850 --> 00:24:41,610
if it goes from one
time to another time

479
00:24:41,610 --> 00:24:44,830
and zero outside of
that, then in frequency,

480
00:24:44,830 --> 00:24:47,130
it's spread out to infinity.

481
00:24:47,130 --> 00:24:51,480
And so all we can do
is choose the trade-off.

482
00:24:51,480 --> 00:24:54,390
We can either have
things look worse in time

483
00:24:54,390 --> 00:24:57,180
and better in frequency or
better in time and worse

484
00:24:57,180 --> 00:24:58,150
in frequency.

485
00:25:01,170 --> 00:25:04,170
So the other problem is
that when we taper a signal,

486
00:25:04,170 --> 00:25:06,570
we're throwing away
data here at the edges.

487
00:25:06,570 --> 00:25:10,860
But if you take a square window
and you keep all the data

488
00:25:10,860 --> 00:25:13,020
within that square
window, well, you've

489
00:25:13,020 --> 00:25:15,390
got all the data in that window.

490
00:25:15,390 --> 00:25:17,520
But as soon as you
taper it, you're

491
00:25:17,520 --> 00:25:19,480
throwing away
stuff at the edges.

492
00:25:19,480 --> 00:25:21,570
So you taper it
to make it smooth

493
00:25:21,570 --> 00:25:25,740
and improve the spectrum,
the spectral estimate,

494
00:25:25,740 --> 00:25:28,050
but you're throwing away data.
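To see what the taper buys in exchange for down-weighting the edge data, this Python sketch compares how much power leaks far away from a pure tone under a square (rectangular) window and under a smooth Hann taper. The sampling rate, the tone frequency, and the 20 Hz "far away" threshold are arbitrary choices for illustration.
```python
import numpy as np
fs = 1000                                   # assumed sampling rate (Hz)
n = 500                                     # 0.5 s window
t = np.arange(n) / fs
f0 = 101.3                                  # tone between FFT bins, which exaggerates leakage
x = np.sin(2 * np.pi * f0 * t)
def far_leakage(window, guard_hz=20.0):
    """Fraction of spectral power more than guard_hz away from the tone frequency."""
    n_fft = 8 * n                           # zero-padding gives a finely sampled spectrum
    P = np.abs(np.fft.rfft(window * x, n=n_fft)) ** 2
    f = np.fft.rfftfreq(n_fft, d=1 / fs)
    far = np.abs(f - f0) > guard_hz
    return P[far].sum() / P.sum()
rect_leak = far_leakage(np.ones(n))         # keep every sample at full weight
hann_leak = far_leakage(np.hanning(n))      # down-weight the edges
```
The Hann result comes out much smaller. That is the improvement in the spectral estimate that the taper trades against the down-weighted edge samples.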

495
00:25:28,050 --> 00:25:33,110
So you can actually
compute the optimal taper.

496
00:25:33,110 --> 00:25:35,060
Here's how you do that.

497
00:25:35,060 --> 00:25:37,110
What we're going to
do is we're going

498
00:25:37,110 --> 00:25:40,020
to think of this
as what's called

499
00:25:40,020 --> 00:25:41,820
the spectral
concentration problem.

500
00:25:41,820 --> 00:25:45,750
We're going to
find a function w.

501
00:25:45,750 --> 00:25:49,170
This is a taper
that is limited

502
00:25:49,170 --> 00:25:54,870
in time from some
minus T/2 to plus T/2.

503
00:25:54,870 --> 00:25:57,020
So it's 0 outside of that.

504
00:25:57,020 --> 00:25:59,580
It concentrates
the maximum amount

505
00:25:59,580 --> 00:26:02,310
of energy in its
Fourier transform,

506
00:26:02,310 --> 00:26:08,580
in its power spectrum within
a window that has width 2W.

507
00:26:08,580 --> 00:26:11,968
So W is this [INAUDIBLE].

508
00:26:11,968 --> 00:26:13,560
Does that make sense?

509
00:26:13,560 --> 00:26:15,950
We're going to find
a function w that

510
00:26:15,950 --> 00:26:21,140
concentrates as much energy
as possible in that square window.

511
00:26:24,804 --> 00:26:27,700
And of course, that's
going to have the result

512
00:26:27,700 --> 00:26:29,590
that the energy
in the side lobes

513
00:26:29,590 --> 00:26:34,030
is going to be as
small as possible.

514
00:26:34,030 --> 00:26:35,930
And there are many
different optimizations

515
00:26:35,930 --> 00:26:38,150
you can do in principle.

516
00:26:38,150 --> 00:26:41,750
But this particular optimization
is about getting as much

517
00:26:41,750 --> 00:26:44,750
of the power as possible
into a central lobe.

518
00:26:44,750 --> 00:26:46,460
Here's this function of time.

519
00:26:46,460 --> 00:26:49,850
We simply calculate the
Fourier transform of w.

520
00:26:49,850 --> 00:26:52,920
We call that U of f.

521
00:26:52,920 --> 00:26:55,770
And now we just write
down a parameter that

522
00:26:55,770 --> 00:26:58,440
says how much of that Fourier--

523
00:26:58,440 --> 00:27:03,750
how much of the power in U
is in the window from minus

524
00:27:03,750 --> 00:27:08,160
W to W compared to how
much power there is in U

525
00:27:08,160 --> 00:27:12,480
overall, over all frequencies?

526
00:27:12,480 --> 00:27:17,200
So if lambda is 1,
then all of the power

527
00:27:17,200 --> 00:27:19,680
is between minus W and W.
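Written out, the concentration parameter just described looks like this. This is a sketch in the lecture's notation: a taper w(t) that is zero outside [-T/2, T/2], its Fourier transform U(f), and a frequency band of half-width W.
```latex
U(f) = \int_{-T/2}^{T/2} w(t)\, e^{-i 2\pi f t}\, dt,
\qquad
\lambda = \frac{\int_{-W}^{W} \lvert U(f) \rvert^{2}\, df}{\int_{-\infty}^{\infty} \lvert U(f) \rvert^{2}\, df},
\qquad
0 \le \lambda \le 1
```
Maximizing lambda over all such time-limited w(t) is the spectral concentration problem. A value of lambda = 1 would mean that no power at all falls outside the band from -W to W.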

528
00:27:19,680 --> 00:27:21,250
Does that make sense?

529
00:27:21,250 --> 00:27:26,560
So you can actually solve this
optimization problem, maximize

530
00:27:26,560 --> 00:27:30,340
lambda, and what you find
is that there's not just one

531
00:27:30,340 --> 00:27:37,490
function that gives very good
concentration of the power

532
00:27:37,490 --> 00:27:38,120
into this band.

533
00:27:38,120 --> 00:27:40,360
There's actually a
family of functions.

534
00:27:40,360 --> 00:27:43,600
There's actually k
of these functions,

535
00:27:43,600 --> 00:27:48,730
where k is twice the
bandwidth times the duration

536
00:27:48,730 --> 00:27:50,005
of the window minus 1.

537
00:27:50,005 --> 00:27:52,810
So there's a
family of k functions

538
00:27:52,810 --> 00:27:54,970
called Slepian functions
for which lambda

539
00:27:54,970 --> 00:27:56,080
is very close to 1.

540
00:27:58,870 --> 00:28:02,540
They're also called discrete prolate
spheroidal sequence functions,

541
00:28:02,540 --> 00:28:03,770
dpss.

542
00:28:03,770 --> 00:28:05,760
And that's the command
that Matlab uses

543
00:28:05,760 --> 00:28:08,970
to find those functions: dpss.

544
00:28:08,970 --> 00:28:10,420
Here's what they look like.

545
00:28:10,420 --> 00:28:15,000
So these are five functions
that give lambda close

546
00:28:15,000 --> 00:28:23,360
to 1 for a particular
bandwidth and a particular time

547
00:28:23,360 --> 00:28:23,860
window.
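For readers working in Python rather than Matlab, SciPy offers a counterpart, `scipy.signal.windows.dpss`, which can also return each taper's concentration ratio lambda. This is a minimal sketch, assuming SciPy is installed. The window length and time-bandwidth product are illustrative values, and the taper count follows the lecture's k = 2p - 1 rule.
```python
import numpy as np
from scipy.signal.windows import dpss
M = 400                  # samples per window (e.g., 50 ms at an assumed 8 kHz rate)
p = 3.0                  # time-bandwidth product (SciPy calls it NW)
k = int(2 * p - 1)       # number of tapers, per the k = 2p - 1 rule
tapers, ratios = dpss(M, p, Kmax=k, return_ratios=True)
# tapers has shape (k, M): one Slepian function per row.
# ratios holds each taper's concentration lambda; values near 1 mean well-concentrated tapers.
for i, lam in enumerate(ratios, start=1):
    print(f"taper {i}: lambda = {lam:.6f}")
```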

548
00:28:23,860 --> 00:28:26,830
The n equals 1 function
is a single peak.

549
00:28:26,830 --> 00:28:30,190
It looks a lot like a Gaussian,
but it's not a Gaussian.

550
00:28:30,190 --> 00:28:33,550
What's fundamentally different
between this function

551
00:28:33,550 --> 00:28:35,120
and a Gaussian?

552
00:28:35,120 --> 00:28:40,070
This function goes to 0
outside that time window,

553
00:28:40,070 --> 00:28:42,530
whereas a Gaussian
goes on forever.

554
00:28:42,530 --> 00:28:47,290
The second Slepian
in this family

555
00:28:47,290 --> 00:28:51,255
has a peak, a positive
peak in the left half,

556
00:28:51,255 --> 00:28:55,630
a negative peak in the right.

557
00:28:55,630 --> 00:28:59,300
The third one has positive,
negative, positive,

558
00:28:59,300 --> 00:29:01,100
and then goes to 0.

559
00:29:01,100 --> 00:29:04,960
And the higher order functions
just have more wiggles.

560
00:29:04,960 --> 00:29:09,290
They all have the property
that they go to 0 at the edges.

561
00:29:09,290 --> 00:29:11,260
And the other
interesting property

562
00:29:11,260 --> 00:29:14,350
is that these functions are all
orthogonal to each other.

563
00:29:14,350 --> 00:29:18,730
That means if you multiply this
function times that function

564
00:29:18,730 --> 00:29:21,160
and integrate, you get 0.

565
00:29:21,160 --> 00:29:24,490
Multiply any two of these
functions and integrate

566
00:29:24,490 --> 00:29:27,980
over the window
minus T/2 to plus T/2

567
00:29:27,980 --> 00:29:29,940
the integral is 0.

568
00:29:29,940 --> 00:29:33,720
What that means is that
the spectral estimates you

569
00:29:33,720 --> 00:29:37,680
get by windowing your data
with each of these functions

570
00:29:37,680 --> 00:29:41,280
separately are
statistically independent.
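Here is a quick numerical check of the orthogonality property. It is again a sketch that assumes SciPy's `dpss`, which normalizes each taper to unit energy when `Kmax` is given. In discrete time, the integral over the window becomes a sum over samples, so the matrix of pairwise products should be close to the identity matrix.
```python
import numpy as np
from scipy.signal.windows import dpss
tapers = dpss(400, 3.0, Kmax=5)   # five Slepian tapers, each 400 samples long
gram = tapers @ tapers.T          # entry (i, j) = sum over t of w_i(t) * w_j(t)
off_diag = gram - np.diag(np.diag(gram))
print("largest off-diagonal product:", np.abs(off_diag).max())
```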

571
00:29:41,280 --> 00:29:43,860
You actually have multiple
different estimates

572
00:29:43,860 --> 00:29:45,757
of the spectrum from
the same little piece

573
00:29:45,757 --> 00:29:48,270
of [AUDIO OUT] The
other cool thing is

574
00:29:48,270 --> 00:29:51,710
that remember the problem
with windowing our [AUDIO OUT]

575
00:29:51,710 --> 00:29:54,900
with one peak like this
is we were throwing away

576
00:29:54,900 --> 00:29:55,970
data at the edges.

577
00:29:55,970 --> 00:29:59,940
Well, notice that the higher
order Slepian functions

578
00:29:59,940 --> 00:30:01,690
have big peaks at the edges.

579
00:30:01,690 --> 00:30:04,290
And so they are actually
measuring the spectrum

580
00:30:04,290 --> 00:30:08,860
of the parts of the signal that
are at the edge of the window.

581
00:30:08,860 --> 00:30:12,700
Now notice that those functions
start crashing into the edges.

582
00:30:12,700 --> 00:30:16,900
So you start getting sharp,
sharp edges out here,

583
00:30:16,900 --> 00:30:21,690
which is why the higher order
functions have worse ripples

584
00:30:21,690 --> 00:30:24,990
outside that central lobe.

585
00:30:24,990 --> 00:30:26,160
Any questions about that?

586
00:30:30,860 --> 00:30:33,490
It's a lot [AUDIO OUT]
Just remember

587
00:30:33,490 --> 00:30:37,390
that for a given width
of the window in time

588
00:30:37,390 --> 00:30:41,740
and width in frequency, there
are multiple of these functions

589
00:30:41,740 --> 00:30:47,350
that put the maximum amount
of power in this window 2W.

590
00:30:54,310 --> 00:30:55,660
So that's great.

591
00:30:55,660 --> 00:30:56,480
So good question.

592
00:30:56,480 --> 00:31:00,260
What would you do if you're
trying to measure something

593
00:31:00,260 --> 00:31:02,620
and you measure it
five different times,

594
00:31:02,620 --> 00:31:05,410
how would you get an estimate
of what the actual number is?

595
00:31:08,990 --> 00:31:12,740
How would you get an error bar
on how good your estimate is?

596
00:31:17,320 --> 00:31:19,960
Standard deviation of your
estimates, right.

597
00:31:19,960 --> 00:31:21,550
And that's exactly what you do.

598
00:31:21,550 --> 00:31:23,980
So not only can you
get a good estimate

599
00:31:23,980 --> 00:31:27,550
of the average spectrum by
averaging all of these things

600
00:31:27,550 --> 00:31:29,950
together, but you can
actually get an error bar.

601
00:31:29,950 --> 00:31:31,060
And that's really cool.
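Here is the same idea in miniature. Given several independent measurements of one quantity, the mean is the estimate and the spread gives the error bar. The five numbers below are made up purely for illustration.
```python
import numpy as np
measurements = np.array([9.8, 10.1, 10.0, 9.7, 10.4])   # hypothetical repeat measurements
estimate = measurements.mean()                      # best single estimate
sd = measurements.std(ddof=1)                       # sample standard deviation
sem = sd / np.sqrt(len(measurements))               # standard error of the mean
print(f"{estimate:.2f} +/- {sem:.2f}")
```
With multiple tapers, each taper plays the role of one measurement, and the quantity being measured is the power at each frequency.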

602
00:31:34,740 --> 00:31:38,080
So here's the
procedure that you use.

603
00:31:38,080 --> 00:31:41,700
And this is what's in that
little function W spec.m.

604
00:31:44,500 --> 00:31:48,220
So you select a time window
of a particular width.

605
00:31:48,220 --> 00:31:49,810
How do you know
what width to choose?

606
00:31:55,630 --> 00:31:56,750
That's part of it.

607
00:31:56,750 --> 00:31:59,380
The other thing is if your
signal is changing rapidly

608
00:31:59,380 --> 00:32:02,050
in time and you actually
care about that change,

609
00:32:02,050 --> 00:32:02,910
you should choose--

610
00:32:02,910 --> 00:32:05,800
you're more interested
in temporal resolution.

611
00:32:05,800 --> 00:32:07,740
If your signal is
really constant,

612
00:32:07,740 --> 00:32:09,147
like it doesn't
change very fast,

613
00:32:09,147 --> 00:32:10,480
then you can use bigger windows.

614
00:32:13,210 --> 00:32:15,617
So we're going to
choose a time width.

615
00:32:15,617 --> 00:32:17,200
Then what you're
going to do is you're

616
00:32:17,200 --> 00:32:19,630
going to select this
parameter p, which is just

617
00:32:19,630 --> 00:32:22,720
the time-bandwidth product.
And if you've already

618
00:32:22,720 --> 00:32:25,870
chosen T, what you're doing
is you're just choosing

619
00:32:25,870 --> 00:32:27,634
the frequency resolution.

620
00:32:30,590 --> 00:32:35,090
Once you compute
p and you know T,

621
00:32:35,090 --> 00:32:38,940
you just stuff those numbers
into this Matlab function

622
00:32:38,940 --> 00:32:46,280
dpss, which sends back to you
this set of functions here.

623
00:32:46,280 --> 00:32:50,540
It sends you back k of those
functions, where, once you've

624
00:32:50,540 --> 00:32:54,130
chosen p, k is just 2p minus 1.

625
00:32:54,130 --> 00:32:55,130
And then what do you do?

626
00:32:55,130 --> 00:32:57,740
You just take your
little snippet of data.

627
00:32:57,740 --> 00:33:01,520
You multiply it by the first
taper, compute the

628
00:33:01,520 --> 00:33:04,810
Fourier transform, and then the spectrum.

629
00:33:04,810 --> 00:33:07,460
And then take your
little piece of data,

630
00:33:07,460 --> 00:33:10,720
multiply it by the second one,
compute the Fourier transform,

631
00:33:10,720 --> 00:33:11,950
and the power spectrum.

632
00:33:11,950 --> 00:33:15,520
And then you're just
going to average.

633
00:33:15,520 --> 00:33:21,190
Inside the square magnitude
is the data in the window,

634
00:33:21,190 --> 00:33:22,360
your piece of data.

635
00:33:22,360 --> 00:33:26,170
You Fourier transform it, square
magnitude, and then average

636
00:33:26,170 --> 00:33:28,720
all those spectra together.

637
00:33:28,720 --> 00:33:30,970
So this is the Fourier
transform right here.

638
00:33:30,970 --> 00:33:32,740
This sum is the
Fourier transform of that.

639
00:33:32,740 --> 00:33:35,830
We square magnitude that to
get the spectral estimate

640
00:33:35,830 --> 00:33:37,810
of that particular sample.

641
00:33:37,810 --> 00:33:40,840
Then we're going to average
that spectrum together

642
00:33:40,840 --> 00:33:46,330
for all the different
windowing, or tapering, functions.

643
00:33:46,330 --> 00:33:49,780
Now you've got multiple
spectral estimates.

644
00:33:49,780 --> 00:33:52,960
You're going to average them
together to get the mean.

645
00:33:52,960 --> 00:33:56,020
And you can also
look at the variance

646
00:33:56,020 --> 00:33:57,799
to get the standard deviation.
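Putting the procedure together, here is a minimal multitaper sketch in Python for one snippet of data. It assumes SciPy's `dpss` for the Slepian tapers. The function name `multitaper_spectrum`, the test signal, and the choice p = 3 are illustrative. The spectra are left unnormalized because only relative power matters here.
```python
import numpy as np
from scipy.signal.windows import dpss
def multitaper_spectrum(x, fs, p):
    """Mean and standard deviation of the per-taper power spectra of snippet x.
    fs: sampling rate (Hz); p: time-bandwidth product; uses k = 2p - 1 tapers."""
    n = len(x)
    k = int(round(2 * p - 1))
    tapers = dpss(n, p, Kmax=k)                           # (k, n) Slepian tapers
    # One estimate per taper: multiply, Fourier transform, square magnitude.
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    return freqs, spectra.mean(axis=0), spectra.std(axis=0, ddof=1)
fs = 8000                                                 # assumed sampling rate (Hz)
t = np.arange(400) / fs                                   # one 50 ms snippet
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 1000 * t) + 0.1 * rng.standard_normal(t.size)
freqs, S_mean, S_sd = multitaper_spectrum(x, fs, p=3)
```
Applied window by window along a long recording, the successive `S_mean` vectors form the columns of a multitaper spectrogram, and `S_sd` gives an error bar at each frequency.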

647
00:34:04,160 --> 00:34:04,985
Questions?

648
00:34:04,985 --> 00:34:05,780
Let's stop there.

649
00:34:05,780 --> 00:34:07,880
That was
a lot of stuff.

650
00:34:07,880 --> 00:34:14,230
Let's take a breath and
[INAUDIBLE] to see whether we

651
00:34:14,230 --> 00:34:20,086
[AUDIO OUT] Questions?

652
00:34:27,394 --> 00:34:27,894
No.

653
00:34:34,560 --> 00:34:35,960
Don't worry about it.

654
00:34:35,960 --> 00:34:40,670
This is a representation
of the Fourier transform.

655
00:34:40,670 --> 00:34:44,050
You sum over all
the time samples.

656
00:34:44,050 --> 00:34:48,260
You'll just do this as a
fast Fourier transform.

657
00:34:48,260 --> 00:34:52,030
So you'll take the
data, multiply it

658
00:34:52,030 --> 00:34:58,150
by this taper function,
which is the Slepian function,

659
00:34:58,150 --> 00:35:00,220
and then do the
Fourier transform, take

660
00:35:00,220 --> 00:35:02,350
the square magnitude.

661
00:35:02,350 --> 00:35:06,010
We just want to make sure that
we've got the basic idea.

662
00:35:06,010 --> 00:35:09,730
So you've got a
long piece of data.

663
00:35:09,730 --> 00:35:17,390
You're going to choose some
time window, capital T. You're

664
00:35:17,390 --> 00:35:24,940
going to choose a bandwidth W or
this time-bandwidth product p.

665
00:35:24,940 --> 00:35:29,010
Send T and p to
this dpss function.

666
00:35:29,010 --> 00:35:34,410
It will send you back a bunch
of these dpss functions that

667
00:35:34,410 --> 00:35:35,867
fit in that window.

668
00:35:35,867 --> 00:35:37,700
Now you're going to
take your piece of data.

669
00:35:37,700 --> 00:35:40,440
You're going to break it into
little windows of that length,

670
00:35:40,440 --> 00:35:43,860
multiply them by each one
of the Slepian functions.

671
00:35:43,860 --> 00:35:47,570
Do the Fourier transform of
each one of those products.

672
00:35:47,570 --> 00:35:49,670
Don't average those directly.

673
00:35:49,670 --> 00:35:52,250
Take the square magnitude of
each one to get the spectrum,

674
00:35:52,250 --> 00:35:55,410
and then average all
those spectra together.

675
00:35:55,410 --> 00:36:00,880
So now, what does
p do? p chooses the

676
00:36:00,880 --> 00:36:06,630
bandwidth of the Slepian
function in that window.

677
00:36:06,630 --> 00:36:11,260
So if you have a window
that's 100 milliseconds wide--

678
00:36:11,260 --> 00:36:14,280
so we're going to take
our data and break it

679
00:36:14,280 --> 00:36:16,830
into little pieces
that are 100 milliseconds long.

680
00:36:16,830 --> 00:36:21,580
It goes from minus 50
to plus 50 milliseconds.

681
00:36:21,580 --> 00:36:28,290
If you choose a window that
has a narrow bandwidth,

682
00:36:28,290 --> 00:36:32,170
a small p, then the
bandwidth is narrow.

683
00:36:32,170 --> 00:36:35,200
The bandwidth is narrow,
because the function is wide,

684
00:36:35,200 --> 00:36:37,060
or you can choose
a large bandwidth.

685
00:36:37,060 --> 00:36:38,320
What does that mean?

686
00:36:38,320 --> 00:36:42,560
It's a narrower
function in time.

687
00:36:42,560 --> 00:36:46,650
Now if p is 5, you have
a broader bandwidth.

688
00:36:46,650 --> 00:36:49,340
And that means that the
window, the tapering function,

689
00:36:49,340 --> 00:36:53,020
is narrower in time.

690
00:36:53,020 --> 00:36:55,990
Look at the Fourier
transform of each of these two

691
00:36:55,990 --> 00:36:57,280
different tapering functions.

692
00:36:57,280 --> 00:37:00,970
You can see that
if p equals 1.5,

693
00:37:00,970 --> 00:37:03,250
the tapering function is broad.

694
00:37:03,250 --> 00:37:11,370
But its Fourier transform,
the kernel in frequency space,

695
00:37:11,370 --> 00:37:13,460
is narrower.

696
00:37:13,460 --> 00:37:18,330
Take the p equals 5 function,
a broader bandwidth,

697
00:37:18,330 --> 00:37:22,530
it's narrower in time
and broader in frequency.
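One way to see this trade-off numerically is to measure the RMS duration of the first Slepian taper for the two values of p mentioned here. This sketch assumes SciPy's `dpss`. The 800-sample window (100 ms at an assumed 8 kHz rate) and the `rms_width` helper are illustrative. The half-bandwidth is computed as W = p/T, which follows from p = TW.
```python
import numpy as np
from scipy.signal.windows import dpss
M = 800                                   # 100 ms window at an assumed 8 kHz sampling rate
T = 0.100                                 # window length in seconds
center = np.arange(M) - (M - 1) / 2       # sample offsets from the window center
def rms_width(p):
    """RMS time width (in samples) of the first Slepian taper for time-bandwidth product p."""
    w = dpss(M, p)                        # with Kmax omitted, SciPy returns a single 1-D taper
    energy = w ** 2 / np.sum(w ** 2)      # treat w^2 as a distribution over time
    return np.sqrt(np.sum(energy * center ** 2))
for p in (1.5, 5.0):
    print(f"p = {p}: half-bandwidth W = p/T = {p / T:.0f} Hz, RMS width = {rms_width(p):.0f} samples")
```
A larger p gives a wider frequency band and a taper that is narrower in time.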

698
00:37:22,530 --> 00:37:25,650
Does that make sense?

699
00:37:25,650 --> 00:37:30,290
For a given window length,
p just

700
00:37:30,290 --> 00:37:34,610
tells you how many
different samples

701
00:37:34,610 --> 00:37:36,710
we're going to take
within that time window.

702
00:37:39,780 --> 00:37:41,610
no

703
00:37:41,610 --> 00:37:44,950
So let me just go back to
this example right here.

704
00:37:44,950 --> 00:37:48,120
So I took this speech signal
that I just showed you

705
00:37:48,120 --> 00:37:50,500
that was recorded
on the microphone.

706
00:37:50,500 --> 00:37:53,170
I chose a time window
of 50 milliseconds.

707
00:37:53,170 --> 00:37:54,910
So I broke the
speech signal down

708
00:37:54,910 --> 00:37:57,280
into little 50
millisecond chunks.

709
00:37:57,280 --> 00:38:00,610
I chose a bandwidth of 60 hertz.

710
00:38:00,610 --> 00:38:06,430
That corresponds to p
equals 1.5 and k equals 2.

711
00:38:06,430 --> 00:38:09,010
That gives me back a bunch
of these little functions.

712
00:38:09,010 --> 00:38:12,090
And I computed this spectrogram.

713
00:38:12,090 --> 00:38:15,250
For this spectrogram, I chose
a shorter time window, eight

714
00:38:15,250 --> 00:38:21,130
milliseconds, and chose a
bandwidth of 375 hertz, which

715
00:38:21,130 --> 00:38:24,550
also corresponds to p
equals 1.5 and k equals 2.

716
00:38:24,550 --> 00:38:29,130
And if you compute
the spectrogram

717
00:38:29,130 --> 00:38:33,210
with those parameters, you
get this example right here.

718
00:38:33,210 --> 00:38:36,840
So in this case,
I kept the same p,

719
00:38:36,840 --> 00:38:43,190
the same time-bandwidth product,
but I made the time shorter.
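The arithmetic behind these two settings can be checked directly. This sketch assumes that each quoted bandwidth (60 Hz, 375 Hz) is the full band 2W, so that p = T·W. That reading is what makes both cases come out to p = 1.5 and k = 2, as stated. The helper name is hypothetical.
```python
def taper_params(T, full_bandwidth_hz):
    """Return (p, k) for a window of T seconds and full bandwidth 2W in Hz.
    Assumes p = T * W and k = 2p - 1, as in the lecture."""
    W = full_bandwidth_hz / 2
    p = T * W
    return p, 2 * p - 1
print(taper_params(0.050, 60))    # 50 ms window, 60 Hz band
print(taper_params(0.008, 375))   # 8 ms window, 375 Hz band
```
Shrinking T while scaling the bandwidth up by the same factor keeps p, and therefore the number of tapers, unchanged.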

720
00:38:43,190 --> 00:38:45,690
So the best way to do this,
when you're actually doing this,

721
00:38:45,690 --> 00:38:48,050
practically, is just
to take a signal

722
00:38:48,050 --> 00:38:51,040
and try some of these different
things [INAUDIBLE]

723
00:38:51,040 --> 00:38:52,600
That's really the
best way to do it.

724
00:38:52,600 --> 00:38:58,900
You can't-- I don't recommend
trying to think through

725
00:38:58,900 --> 00:39:02,543
beforehand too much exactly what
it's going to look like if you

726
00:39:02,543 --> 00:39:04,960
choose these different values,
when it's easier just to try

727
00:39:04,960 --> 00:39:06,793
different things and
see what it looks like.

728
00:39:09,440 --> 00:39:09,940
Yes.

729
00:39:09,940 --> 00:39:14,020
AUDIENCE: What are we looking for?

730
00:39:14,020 --> 00:39:16,300
MICHALE FEE: Well, it
depends on what you're

731
00:39:16,300 --> 00:39:18,460
trying to get out of the data.

732
00:39:18,460 --> 00:39:21,130
If you want to
visualize formants,

733
00:39:21,130 --> 00:39:24,220
you can see that the
formants are much clearer.

734
00:39:24,220 --> 00:39:29,440
These different windows give you
a different view on the data.

735
00:39:29,440 --> 00:39:31,300
So just look through
different windows

736
00:39:31,300 --> 00:39:34,492
and see what looks
interesting in the results.

737
00:39:34,492 --> 00:39:35,700
That's the best way to do it.

738
00:39:41,210 --> 00:39:43,190
So I just want to
say one more word

739
00:39:43,190 --> 00:39:48,180
about this
time-bandwidth product.

740
00:39:48,180 --> 00:39:55,320
So the time-bandwidth product of
any function is at least 1.

741
00:39:55,320 --> 00:40:00,260
So you can make time shorter,
but then the bandwidth is worse.

742
00:40:00,260 --> 00:40:01,960
The way that you
can think about this

743
00:40:01,960 --> 00:40:05,620
is that you're sort of
looking at your data

744
00:40:05,620 --> 00:40:12,550
through a window in
time and frequency.

745
00:40:12,550 --> 00:40:16,390
What you want is to look with
infinitely fine resolution

746
00:40:16,390 --> 00:40:19,540
in both time and
frequency, but really you

747
00:40:19,540 --> 00:40:23,200
can't have infinite time
and frequency resolution.

748
00:40:23,200 --> 00:40:28,540
You're going to be smearing
your view of the data

749
00:40:28,540 --> 00:40:32,650
with something that
has a minimum area,

750
00:40:32,650 --> 00:40:37,300
the time-bandwidth product which
has a minimum size of 1.

751
00:40:40,080 --> 00:40:45,740
You can either make time small
and stretch the bandwidth out,

752
00:40:45,740 --> 00:40:49,220
or you can stretch out time
and make the bandwidth narrower

753
00:40:49,220 --> 00:40:52,730
or make time short and
make the bandwidth long.

754
00:40:52,730 --> 00:40:56,150
But you can't squeeze
both, because of

755
00:40:56,150 --> 00:41:00,410
this fundamental limit on
the time-bandwidth product.

756
00:41:00,410 --> 00:41:02,120
This all depends on
how you're measuring

757
00:41:02,120 --> 00:41:04,190
the time and the bandwidth.

758
00:41:04,190 --> 00:41:06,620
These are kind of
funny shaped functions.

759
00:41:06,620 --> 00:41:09,860
So there are different
ways of measuring

760
00:41:09,860 --> 00:41:11,420
what the bandwidth
of a signal is

761
00:41:11,420 --> 00:41:14,300
or what the time
width of a signal is.

762
00:41:14,300 --> 00:41:19,090
Now, the windows that you're
looking through in time and frequency

763
00:41:19,090 --> 00:41:23,340
are limited by this smallest
time-bandwidth product.

764
00:41:23,340 --> 00:41:25,810
So notice that if the
time-bandwidth product is

765
00:41:25,810 --> 00:41:30,540
small, close to 1,
the number of tapers

766
00:41:30,540 --> 00:41:35,100
you get in this dpss, this
family of functions you get

767
00:41:35,100 --> 00:41:36,190
is just 1.

768
00:41:36,190 --> 00:41:40,790
If p is 1, then k is
2p minus 1, which is 1.

769
00:41:40,790 --> 00:41:43,350
So you only get one window.

770
00:41:43,350 --> 00:41:47,390
You only get one
estimate of the spectrum,

771
00:41:47,390 --> 00:41:51,760
but you can also choose
to look at your data

772
00:41:51,760 --> 00:41:59,170
with worse, [AUDIO OUT] that
have a worse time frequency

773
00:41:59,170 --> 00:42:00,800
time-bandwidth product.

774
00:42:00,800 --> 00:42:02,240
Why would you do that?

775
00:42:02,240 --> 00:42:04,270
Why would you ever
look at your data

776
00:42:04,270 --> 00:42:07,120
with functions that have a
worse time-bandwidth product?

777
00:42:11,850 --> 00:42:17,160
Well, notice that if the
time-bandwidth product is 2,

778
00:42:17,160 --> 00:42:19,440
how many functions do you have?

779
00:42:19,440 --> 00:42:23,030
Why does that matter?
Because now you

780
00:42:23,030 --> 00:42:27,020
have three independent estimates
of what that spectrum is.

781
00:42:30,320 --> 00:42:33,830
So sometimes you
would gladly choose

782
00:42:33,830 --> 00:42:37,910
to have a worse resolution
in time and frequency,

783
00:42:37,910 --> 00:42:40,520
because you've got more
independent estimates, which is

784
00:42:40,520 --> 00:42:41,340
better.
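A small simulation can illustrate why more independent estimates help. For white noise the true spectrum is flat, so any bumpiness across frequency in the estimate is estimation noise. The sketch below compares the averaged spectrum from k = 2 tapers with the one from k = 7 tapers. It assumes SciPy's `dpss`, and the 512-sample window, the random seed, and the choice of p = 1.5 versus p = 4 are arbitrary.
```python
import numpy as np
from scipy.signal.windows import dpss
rng = np.random.default_rng(1)
M = 512
x = rng.standard_normal(M)                 # white noise: true spectrum is flat
def averaged_spectrum(p):
    """Average of the k = 2p - 1 single-taper power spectra of x."""
    k = int(round(2 * p - 1))
    tapers = dpss(M, p, Kmax=k)
    return (np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2).mean(axis=0)
def relative_spread(S):
    """Standard deviation across frequency divided by the mean; lower means a smoother estimate."""
    return S.std() / S.mean()
spread_k2 = relative_spread(averaged_spectrum(1.5))   # 2 tapers, finer resolution
spread_k7 = relative_spread(averaged_spectrum(4.0))   # 7 tapers, coarser resolution
print(f"k = 2: {spread_k2:.2f}, k = 7: {spread_k7:.2f}")
```
The k = 7 estimate is noticeably smoother. The price is wider frequency smearing, because W is larger.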

785
00:42:41,340 --> 00:42:44,090
So sometimes your signal
might be changing very slowly.

786
00:42:44,090 --> 00:42:47,200
And then you can use a big
time-bandwidth product.

787
00:42:47,200 --> 00:42:49,780
It doesn't matter.

788
00:42:49,780 --> 00:42:52,870
Sometimes your signal is
changing very rapidly in time.

789
00:42:52,870 --> 00:42:57,680
And so you want to keep the
time-bandwidth product small.

790
00:42:57,680 --> 00:43:03,490
Does that begin to [INAUDIBLE]
bigger time-bandwidth products

791
00:43:03,490 --> 00:43:08,000
and now you get even more
independent estimates.

792
00:43:08,000 --> 00:43:15,320
Most typically, you choose p's
that start at 1.5 and go up in multiples

793
00:43:15,320 --> 00:43:18,805
of 0.5, because then you have
an integer number of tapers k.

794
00:43:22,362 --> 00:43:26,780
But usually, you choose
p equals 1.5 or higher

795
00:43:26,780 --> 00:43:29,210
in multiples of 0.5.

796
00:43:29,210 --> 00:43:32,590
If you really care about
temporal resolution

797
00:43:32,590 --> 00:43:36,490
and frequency resolution,
you want that box

798
00:43:36,490 --> 00:43:38,110
that's smearing
out your spectrum

799
00:43:38,110 --> 00:43:40,210
to be as small as possible.

800
00:43:40,210 --> 00:43:44,680
Small as possible means
it has an area of 1.

801
00:43:44,680 --> 00:43:48,850
That's the minimum
area it can have,

802
00:43:48,850 --> 00:43:51,720
but that only gives
you one taper.

803
00:43:51,720 --> 00:43:55,800
But if you really care about
both temporal and frequency--

804
00:43:55,800 --> 00:43:58,498
time and frequency
resolution, then that's

805
00:43:58,498 --> 00:43:59,790
the trade-off you have to make.

806
00:44:02,830 --> 00:44:07,890
Or you can smear out more in
time, maybe more in frequency,

807
00:44:07,890 --> 00:44:10,710
and you can choose a bigger
time-bandwidth product,

808
00:44:10,710 --> 00:44:13,218
in which case you get
more tapers and a better

809
00:44:13,218 --> 00:44:14,260
estimate of the spectrum.

810
00:44:21,970 --> 00:44:27,600
So this is state-of-the-art
spectral estimation.

811
00:44:27,600 --> 00:44:31,610
It doesn't get better than this.

812
00:44:31,610 --> 00:44:37,690
If you do it like this, you're
doing it the best possible way,

813
00:44:37,690 --> 00:44:38,880
which takes a bit of digesting.

814
00:44:49,810 --> 00:44:53,950
So let's spend the
rest of the lecture

815
00:44:53,950 --> 00:44:56,260
today talking about filtering.

816
00:44:56,260 --> 00:44:59,935
So Matlab has a bunch of really
powerful filtering tools.

817
00:45:01,610 --> 00:45:03,700
So here's an example
of the kind of thing

818
00:45:03,700 --> 00:45:04,910
where we use filtering.

819
00:45:04,910 --> 00:45:09,770
So this is a [INAUDIBLE] finch
song recorded in the lab.

820
00:45:20,704 --> 00:45:22,890
OK, so now I want
you to just listen--

821
00:45:22,890 --> 00:45:24,690
so you were probably
listening to the song,

822
00:45:24,690 --> 00:45:27,690
but now listen at
very low frequencies.

823
00:45:30,880 --> 00:45:31,810
Tell me what you hear.

824
00:45:31,810 --> 00:45:33,490
Listen at very low frequen--

825
00:45:37,098 --> 00:45:37,765
[AUDIO PLAYBACK]

826
00:45:37,765 --> 00:45:39,665
[FINCH CHIRPING]

827
00:45:39,665 --> 00:45:40,540
[END PLAYBACK]

828
00:45:40,540 --> 00:45:41,040
[HUMS]

829
00:45:42,380 --> 00:45:46,100
--background, that's hum
from the building's air

830
00:45:46,100 --> 00:45:48,220
conditioners, air handling.

831
00:45:48,220 --> 00:45:58,690
It all makes this low rumbling,
which adds a lot of noise

832
00:45:58,690 --> 00:46:01,060
to the signal that can
make it hard to see

833
00:46:01,060 --> 00:46:06,486
where the syllables are in
the [AUDIO OUT] time series.

834
00:46:06,486 --> 00:46:09,227
Here are the
syllables right here.

835
00:46:09,227 --> 00:46:10,560
And that's the background noise.

836
00:46:10,560 --> 00:46:13,170
But the background noise
is at very low frequencies.

837
00:46:13,170 --> 00:46:16,100
So sometimes you want to
just filter stuff like that

838
00:46:16,100 --> 00:46:19,670
away because we don't care
about the air conditioner.

839
00:46:19,670 --> 00:46:21,750
We care about the bird's song.

840
00:46:21,750 --> 00:46:25,940
OK, so we can get rid of that by
applying-- what kind of filter

841
00:46:25,940 --> 00:46:28,400
would we apply to
this signal to get rid

842
00:46:28,400 --> 00:46:31,968
of these low frequencies?

843
00:46:31,968 --> 00:46:32,468
[INAUDIBLE]

844
00:46:32,468 --> 00:46:33,430
AUDIENCE: A high pass.

845
00:46:33,430 --> 00:46:35,440
MICHALE FEE: A high-pass
filter, very good.

846
00:46:35,440 --> 00:46:38,110
OK, so let's put a high-pass
filter on this.

847
00:46:38,110 --> 00:46:39,820
Now, previously
we've

848
00:46:39,820 --> 00:46:44,560
talked about using convolution
to carry out a high-pass

849
00:46:44,560 --> 00:46:45,400
filtering function.

850
00:46:45,400 --> 00:46:48,010
But Matlab has all these
very powerful tools.

851
00:46:48,010 --> 00:46:50,410
So I wanted to show you
what those look like

852
00:46:50,410 --> 00:46:52,490
and how to use them.

853
00:46:52,490 --> 00:46:55,150
OK, so this is a
little piece of code

854
00:46:55,150 --> 00:46:58,070
that implements a high-pass
filter on that signal.

855
00:46:58,070 --> 00:47:00,310
Now, you can see that all
of that low frequency stuff

856
00:47:00,310 --> 00:47:04,630
is [AUDIO OUT] You have a
nice clean, silent background.

857
00:47:04,630 --> 00:47:06,430
And now you can
see the syllables

858
00:47:06,430 --> 00:47:08,210
on top of that background.

859
00:47:08,210 --> 00:47:09,290
Here's the spectrogram.

860
00:47:09,290 --> 00:47:12,160
You can see that all of that
low frequency stuff is gone.

861
00:47:12,160 --> 00:47:16,910
And this is a little
bit of sample code here.

862
00:47:16,910 --> 00:47:18,970
I just want to point
out a few things.

863
00:47:18,970 --> 00:47:21,760
You give it the Nyquist
frequency, which is just

864
00:47:21,760 --> 00:47:23,490
the sampling rate divided by 2.

865
00:47:23,490 --> 00:47:25,990
I'll explain later
what that means.

866
00:47:25,990 --> 00:47:27,730
You set a cutoff frequency.

867
00:47:27,730 --> 00:47:35,660
So you tell it to cut
off below 500 hertz.

868
00:47:35,660 --> 00:47:40,440
You put the cutoff and
Nyquist frequency together,

869
00:47:40,440 --> 00:47:42,300
you get a ratio of
those two that's

870
00:47:42,300 --> 00:47:45,480
basically the fraction
of the spectral width

871
00:47:45,480 --> 00:47:47,010
that you're going to cut off.

872
00:47:47,010 --> 00:47:50,880
And then you tell it to
give you the parameters

873
00:47:50,880 --> 00:47:52,710
for a Butterworth filter.

874
00:47:52,710 --> 00:47:55,670
It's just one of the kinds
of filters that you use.

875
00:47:55,670 --> 00:47:59,190
Tell it whether it's a
high-pass or low-pass.

876
00:47:59,190 --> 00:48:04,200
Send that filter, those filter
parameters, to this function

877
00:48:04,200 --> 00:48:08,955
called filtfilt. You give it
these two parameters, B and A,

878
00:48:08,955 --> 00:48:11,340
and your data vector.

879
00:48:11,340 --> 00:48:14,610
And when you run that, that's what
the result looks like, OK?
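For readers working outside Matlab, here is a minimal sketch of the same high-pass step in Python, assuming SciPy's `scipy.signal.butter` and `scipy.signal.filtfilt` as stand-ins for Matlab's `butter` and `filtfilt`. Only the 500 Hz cutoff comes from the slide; the 20 kHz sampling rate, the filter order of 4, and the synthetic signal (a 60 Hz hum plus a 3 kHz tone standing in for the song) are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 20000.0                  # assumed sampling rate (Hz)
nyquist = fs / 2              # Nyquist frequency = sampling rate / 2
cutoff = 500.0                # cut off everything below 500 Hz
order = 4                     # assumed filter order

t = np.arange(int(2 * fs)) / fs               # 2 s of synthetic data
hum = np.sin(2 * np.pi * 60 * t)              # low-frequency "air conditioner" hum
song = 0.2 * np.sin(2 * np.pi * 3000 * t)     # high-frequency stand-in for the song
x = hum + song

# Cutoff as a fraction of the Nyquist frequency, then Butterworth coefficients B, A
b, a = butter(order, cutoff / nyquist, btype="highpass")
y = filtfilt(b, a, x)         # filter forward, then backward

def tone_amplitude(sig, f):
    """Amplitude of the f-Hz component, measured on the middle half of sig."""
    mid = slice(len(sig) // 4, 3 * len(sig) // 4)
    tt = np.arange(len(sig))[mid] / fs
    return 2 * abs(np.mean(sig[mid] * np.exp(-2j * np.pi * f * tt)))

print(tone_amplitude(y, 60), tone_amplitude(y, 3000))  # hum near 0, tone near 0.2
```

The amplitudes are measured away from the ends of the record, where any filter start-up transient would sit.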

880
00:48:14,610 --> 00:48:17,924
Let me play that again
for you after filtering.

881
00:48:28,310 --> 00:48:32,905
All that low
frequency hum is gone.

882
00:48:32,905 --> 00:48:37,790
All right, so here's
an example of what--

883
00:48:37,790 --> 00:48:40,060
I mean, we would never
actually do this in the lab.

884
00:48:40,060 --> 00:48:43,160
But this is what it would look
like if you wanted to emphasize

885
00:48:43,160 --> 00:48:44,330
that low frequency stuff.

886
00:48:44,330 --> 00:48:48,740
Let's say that you're the air
conditioner technician who

887
00:48:48,740 --> 00:48:50,270
comes and wants to
figure out what's

888
00:48:50,270 --> 00:48:51,562
wrong with the air conditioner.

889
00:48:51,562 --> 00:48:54,290
And it turns out that the way
it sounds really is helpful.

890
00:48:54,290 --> 00:48:57,440
So you now do a low-pass filter.

891
00:48:57,440 --> 00:48:59,570
And you're going to keep
the low frequency part.

892
00:48:59,570 --> 00:49:02,690
Because all those annoying
birds are making it hard

893
00:49:02,690 --> 00:49:05,310
for you to hear what's wrong
with the air conditioner.

894
00:49:05,310 --> 00:49:07,355
OK, so here's--

895
00:49:15,991 --> 00:49:17,530
Didn't quite get
rid of the birds.

896
00:49:17,530 --> 00:49:21,900
But now you can hear the low
frequency stuff much better.

897
00:49:21,900 --> 00:49:26,190
OK, all right, so now we
just did that by, again,

898
00:49:26,190 --> 00:49:27,290
giving it the Nyquist.

899
00:49:27,290 --> 00:49:30,540
The cutoff, we're going
to cut off above 2,000,

900
00:49:30,540 --> 00:49:33,210
pass below 2,000.

901
00:49:33,210 --> 00:49:37,350
We're going to tell it to
use a Butterworth filter, now

902
00:49:37,350 --> 00:49:38,610
low-pass.

903
00:49:38,610 --> 00:49:42,030
And again, we just pass it
the parameters and the data.

904
00:49:42,030 --> 00:49:43,800
And it sends us back
the filtered data.
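The low-pass version differs only in the cutoff and the filter type. This sketch makes the same assumptions as a SciPy stand-in for the Matlab calls (20 kHz sampling rate, order 4, synthetic signal). The 2,000 Hz cutoff is taken from the lecture, and a 5 kHz tone plays the "annoying birds".

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 20000.0                  # assumed sampling rate (Hz)
nyquist = fs / 2
cutoff = 2000.0               # pass below 2 kHz, cut off above

t = np.arange(int(2 * fs)) / fs
hum = np.sin(2 * np.pi * 60 * t)              # what the technician wants to hear
birds = 0.2 * np.sin(2 * np.pi * 5000 * t)    # stand-in for the birds, above the cutoff
x = hum + birds

b, a = butter(4, cutoff / nyquist, btype="lowpass")  # assumed order 4
y = filtfilt(b, a, x)         # away from the edges, y should closely track hum
```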

905
00:49:48,390 --> 00:49:51,450
OK, you can also do a band-pass.

906
00:49:51,450 --> 00:49:56,480
OK, so a band-pass does a
high-pass and a low-pass

907
00:49:56,480 --> 00:49:57,540
together.

908
00:49:57,540 --> 00:50:00,530
Now you're filtering out
everything above some number

909
00:50:00,530 --> 00:50:02,180
and below some number.

910
00:50:02,180 --> 00:50:04,800
And here we give it a
cutoff with two numbers.

911
00:50:04,800 --> 00:50:07,040
So it's going to
cut off everything

912
00:50:07,040 --> 00:50:11,030
below 4 kilohertz and
everything above 5 kilohertz.

913
00:50:11,030 --> 00:50:15,380
Again, we use the
Butterworth filter.

914
00:50:15,380 --> 00:50:18,235
You leave off the 'high' or 'low'
tag to get a band-pass filter.
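A SciPy sketch of the 4 to 5 kHz band-pass follows, under the same assumed sampling rate and order as above. One difference from Matlab is worth noting: SciPy's `butter` defaults to low-pass even when given two cutoffs, so the band type is named explicitly here. The test tones (4.5 kHz inside the band, 2 kHz and 8 kHz outside it) are illustrative.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 20000.0                        # assumed sampling rate (Hz)
nyquist = fs / 2
band = np.array([4000.0, 5000.0])   # keep 4-5 kHz, cut off everything else

t = np.arange(int(2 * fs)) / fs
inside = np.sin(2 * np.pi * 4500 * t)                                  # in the band
outside = np.sin(2 * np.pi * 2000 * t) + np.sin(2 * np.pi * 8000 * t)  # out of band
x = inside + outside

# SciPy defaults to low-pass, so the band type is given explicitly
b, a = butter(4, band / nyquist, btype="bandpass")   # assumed order 4
y = filtfilt(b, a, x)
```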

915
00:50:18,235 --> 00:50:19,610
And here's what
that sounds like.

916
00:50:24,787 --> 00:50:25,454
[AUDIO PLAYBACK]

917
00:50:25,454 --> 00:50:32,272
[BIRDS CHIRPING]

918
00:50:32,272 --> 00:50:34,707
[END PLAYBACK]

919
00:50:34,707 --> 00:50:37,100
And that's a band-pass filter.

920
00:50:37,100 --> 00:50:38,090
Questions?

921
00:50:47,990 --> 00:50:50,480
Yeah, so there are
many different ways

922
00:50:50,480 --> 00:50:53,240
to do this kind of filtering.

923
00:50:53,240 --> 00:50:56,570
Daniel, do you know how filtfilt
actually implements this?

924
00:50:56,570 --> 00:51:00,350
Because Matlab has a bunch of
different filtering functions.

925
00:51:00,350 --> 00:51:02,928
And this is just one of them.

926
00:51:02,928 --> 00:51:09,200
[INAUDIBLE] how it's actually
implemented [AUDIO OUT]

927
00:51:09,200 --> 00:51:11,000
Right, so there's
a filter function,

928
00:51:11,000 --> 00:51:14,750
which actually does a
convolution in one direction.

929
00:51:14,750 --> 00:51:18,080
And filtfilt does the
convolution in one direction

930
00:51:18,080 --> 00:51:20,500
and then in the other direction.

931
00:51:20,500 --> 00:51:22,960
And what that does
is [AUDIO OUT]

932
00:51:22,960 --> 00:51:27,140
output centered with respect to
the input, centered [AUDIO OUT]
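The difference between one pass and two passes is easy to see on a smooth pulse. In this sketch, SciPy's `lfilter` stands in for Matlab's single-pass `filter` and `filtfilt` for `filtfilt`. The fourth-order low-pass at 0.1 times Nyquist and the Gaussian test pulse are arbitrary choices made for illustration.

```python
import numpy as np
from scipy.signal import butter, lfilter, filtfilt

n = np.arange(400)
pulse = np.exp(-0.5 * ((n - 200) / 10.0) ** 2)   # smooth bump centered at sample 200

b, a = butter(4, 0.1)            # assumed: 4th-order low-pass at 0.1 x Nyquist
one_way = lfilter(b, a, pulse)   # single forward pass: the output is delayed
two_way = filtfilt(b, a, pulse)  # forward, then backward: the delays cancel

print(np.argmax(one_way), np.argmax(two_way))   # first peak lands later; second stays near 200
```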

933
00:51:27,140 --> 00:51:29,460
Anyway, there are
different ways of doing it.

934
00:51:29,460 --> 00:51:32,780
And the nice thing about--

935
00:51:32,780 --> 00:51:33,280
yeah?

936
00:51:33,280 --> 00:51:39,468
AUDIENCE: [INAUDIBLE]

937
00:51:39,468 --> 00:51:44,210
MICHALE FEE: Well,
for the bird data,

938
00:51:44,210 --> 00:51:46,838
it doesn't necessarily
make all that much sense,

939
00:51:46,838 --> 00:51:47,880
right, on the face of it?

940
00:51:47,880 --> 00:51:49,710
But there are
applications where there's

941
00:51:49,710 --> 00:51:52,400
some signal at that
particular band.

942
00:51:52,400 --> 00:51:55,070
So for example, let's say
you had a speech signal

943
00:51:55,070 --> 00:51:57,500
and you wanted to find out
when the formants cross

944
00:51:57,500 --> 00:51:58,790
a certain frequency.

945
00:51:58,790 --> 00:52:03,990
Let's say you wanted to find
out if somebody could learn

946
00:52:03,990 --> 00:52:09,700
to speak [AUDIO OUT] if you
blocked one of their formants

947
00:52:09,700 --> 00:52:12,340
whenever it comes through
a particular frequency.

948
00:52:12,340 --> 00:52:15,070
OK, so let's say I
have my second formant

949
00:52:15,070 --> 00:52:17,170
and every time it
crosses 2 kilohertz

950
00:52:17,170 --> 00:52:18,370
I play a burst of noise.

951
00:52:18,370 --> 00:52:21,850
And I ask, can I
understand if I've knocked

952
00:52:21,850 --> 00:52:24,165
out that particular formant?

953
00:52:24,165 --> 00:52:25,790
I don't know why
you'd want to do that.

954
00:52:25,790 --> 00:52:27,880
But maybe it's fun, right?

955
00:52:27,880 --> 00:52:29,720
So I don't know, it
might be kind of cool.

956
00:52:29,720 --> 00:52:31,637
So then you would run a
band-pass filter right

957
00:52:31,637 --> 00:52:34,240
over the 2-kilohertz band.

958
00:52:34,240 --> 00:52:37,180
And now, you'd get a big signal
whenever that formant passed

959
00:52:37,180 --> 00:52:39,037
through that band, right?

960
00:52:39,037 --> 00:52:40,870
And then you would send
that to an amplifier

961
00:52:40,870 --> 00:52:44,430
and play a noise burst
into the person's ear.
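The detection half of that thought experiment (band-pass around 2 kHz, then flag moments of high energy in the band) could be sketched as follows. Everything here is hypothetical: the helper name `band_trigger`, the 1.8 to 2.2 kHz band, the 10 ms frames, the threshold, and the toy input that switches from 1 kHz to 2 kHz halfway through. Driving the amplifier and the noise burst is not shown.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def band_trigger(x, fs, lo, hi, frame_s=0.01, thresh=0.1, order=4):
    """Hypothetical helper: flag frames with strong energy between lo and hi Hz."""
    nyquist = fs / 2
    b, a = butter(order, [lo / nyquist, hi / nyquist], btype="bandpass")
    y = filtfilt(b, a, x)                         # band-passed signal
    frame = int(frame_s * fs)                     # samples per analysis frame
    n_frames = len(y) // frame
    frames = y[: n_frames * frame].reshape(n_frames, frame)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))   # band energy per frame
    return rms > thresh                           # True = "play the noise burst"

fs = 20000.0                                      # assumed sampling rate (Hz)
t = np.arange(int(fs)) / fs                       # 1 s test signal
# Toy stand-in for a moving formant: 1 kHz in the first half, 2 kHz in the second
x = np.where(t < 0.5, np.sin(2 * np.pi * 1000 * t), np.sin(2 * np.pi * 2000 * t))
trigger = band_trigger(x, fs, 1800.0, 2200.0)
```

Note that `filtfilt` looks ahead in time, so a real-time version of this experiment would need a causal, single-pass filter instead.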

962
00:52:44,430 --> 00:52:47,390
All right, we do things
like that with birds

963
00:52:47,390 --> 00:52:51,650
to find out if they can learn
to shift the pitch of their song

964
00:52:51,650 --> 00:52:53,510
in response to errors.

965
00:52:53,510 --> 00:52:56,919
OK, so yes, they can.

966
00:52:56,919 --> 00:52:57,419
Yes--

967
00:52:57,419 --> 00:53:00,320
AUDIENCE: [INAUDIBLE]

968
00:53:00,320 --> 00:53:02,300
MICHALE FEE: Formants
are the peaks

969
00:53:02,300 --> 00:53:05,780
in the filter that's
formed by your vocal tract

970
00:53:05,780 --> 00:53:11,720
by the [AUDIO OUT] air channel
from your glottis to your lips.

971
00:53:17,090 --> 00:53:24,190
The location of those peaks
changes [INAUDIBLE] So ahh,

972
00:53:24,190 --> 00:53:26,260
ooh, the difference
between those

973
00:53:26,260 --> 00:53:30,880
is just the location
of those formant peaks.

974
00:53:30,880 --> 00:53:34,135
All those things just have
formants at different locations.

975
00:53:34,135 --> 00:53:46,342
AUDIENCE: [INAUDIBLE]

976
00:53:46,342 --> 00:53:49,860
MICHALE FEE: So explain
a little bit more what

977
00:53:49,860 --> 00:53:52,230
you mean by analog interference.

978
00:53:52,230 --> 00:53:55,429
AUDIENCE: [INAUDIBLE]

979
00:53:55,429 --> 00:53:58,540
MICHALE FEE: Oh,
OK, like 60 hertz.

980
00:53:58,540 --> 00:54:00,220
OK, so that's a great question.

981
00:54:00,220 --> 00:54:02,940
So let's say that you're
doing an experiment.

982
00:54:02,940 --> 00:54:05,460
And you [AUDIO OUT]
contamination

983
00:54:05,460 --> 00:54:10,410
of your signal by 60 hertz
noise from the outlet.

984
00:54:10,410 --> 00:54:13,350
OK, it's really better to
spend the time to figure out

985
00:54:13,350 --> 00:54:15,390
how to get rid of that noise.

986
00:54:15,390 --> 00:54:21,500
But let's say that you
[INAUDIBLE] advisor your data

987
00:54:21,500 --> 00:54:24,500
[AUDIO OUT] quite figured out
how to get rid of the 60 hertz

988
00:54:24,500 --> 00:54:29,070
yet [AUDIO OUT] How would you
get rid of the 60 hertz from

989
00:54:29,070 --> 00:54:30,160
your signal?

990
00:54:30,160 --> 00:54:35,380
You could make what's called
a band-stop filter where

991
00:54:35,380 --> 00:54:40,800
you suppress frequencies
within a particular band.

992
00:54:40,800 --> 00:54:43,610
Put that band-stop
filter at 60 hertz.

993
00:54:43,610 --> 00:54:45,870
The thing is, it's
very hard to make

994
00:54:45,870 --> 00:54:48,350
a very narrow band-stop filter.

995
00:54:48,350 --> 00:54:50,700
So we learned this
in the last lecture.

996
00:54:50,700 --> 00:55:00,440
How would you get rid of a
particular [AUDIO OUT] Yeah,

997
00:55:00,440 --> 00:55:02,660
so take the Fourier
transform of your signal,

998
00:55:02,660 --> 00:55:07,560
that 60 hertz [AUDIO OUT] one
particular value of the Fourier

999
00:55:07,560 --> 00:55:08,100
transform.

1000
00:55:08,100 --> 00:55:14,588
And you can just
set that [AUDIO OUT]

1001
00:55:14,588 --> 00:55:18,290
Because the filtering
in that case

1002
00:55:18,290 --> 00:55:20,930
would be knocking
down a whole band

1003
00:55:20,930 --> 00:55:26,040
of [AUDIO OUT] frequencies.
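A minimal NumPy sketch of the Fourier-domain approach: transform, zero the single coefficient at 60 Hz, and transform back. The 1 kHz sampling rate, the 1 s record, and the 10 Hz "clean" signal are assumptions chosen so that 60 Hz falls exactly on one frequency sample.

```python
import numpy as np

fs = 1000.0                       # assumed sampling rate (Hz)
t = np.arange(int(fs)) / fs       # 1 s of data, so frequency samples are 1 Hz apart
clean = np.sin(2 * np.pi * 10 * t)
x = clean + 0.5 * np.sin(2 * np.pi * 60 * t)   # add 60 Hz "line noise"

X = np.fft.rfft(x)                # spectrum at 0, 1, 2, ... Hz
k = int(round(60 * len(x) / fs))  # index of the 60 Hz coefficient
X[k] = 0                          # knock out just that one frequency
y = np.fft.irfft(X, n=len(x))     # back to the time domain

print(np.allclose(y, clean))      # True
```

This works cleanly only because the record holds a whole number of 60 Hz cycles. Otherwise the hum leaks into neighboring frequency samples, and zeroing a single coefficient will not remove all of it.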

1004
00:55:26,040 --> 00:55:32,322
AUDIENCE: [INAUDIBLE]

1005
00:55:32,322 --> 00:55:37,172
MICHALE FEE: Well, it's
just that with filtfilt,

1006
00:55:37,172 --> 00:55:39,630
like I said, there are many
different ways of doing things.

1007
00:55:39,630 --> 00:55:42,960
filtfilt won't do that for you.

1008
00:55:42,960 --> 00:55:46,980
But once you know this stuff
that we've been learning,

1009
00:55:46,980 --> 00:55:49,200
you can go in and do stuff.

1010
00:55:49,200 --> 00:55:51,780
You don't have to have some
Matlab function to do it.

1011
00:55:51,780 --> 00:55:54,060
You just know how it
all works and you just

1012
00:55:54,060 --> 00:55:55,140
write a program to do it.

1013
00:55:55,140 --> 00:55:59,015
OK, that's pretty cool, right?

1014
00:55:59,015 --> 00:56:01,060
All right-- oh, and here's
the band-stop filter.

1015
00:56:04,525 --> 00:56:12,270
[INAUDIBLE] that
tag there, 'stop', OK?
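For comparison with the Fourier-domain approach, here is a band-stop sketch in SciPy, where `btype="bandstop"` plays the role of Matlab's `'stop'` tag. The 1 kHz sampling rate, the 55 to 65 Hz stop band, the order of 2, and the 200 Hz "signal" are all illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000.0                           # assumed sampling rate (Hz)
nyquist = fs / 2
stop_band = np.array([55.0, 65.0])    # assumed band around the 60 Hz line noise

t = np.arange(int(2 * fs)) / fs
signal = np.sin(2 * np.pi * 200 * t)  # the part we want to keep
line_noise = np.sin(2 * np.pi * 60 * t)
x = signal + line_noise

# Matlab's 'stop' tag corresponds to btype="bandstop" in SciPy
b, a = butter(2, stop_band / nyquist, btype="bandstop")  # assumed order 2
y = filtfilt(b, a, x)
```

Narrowing the stop band makes the filter ring longer near its edges, which is one practical face of the difficulty of building very narrow band-stop filters.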

1016
00:56:12,270 --> 00:56:13,410
OK, let's keep going.

1017
00:56:13,410 --> 00:56:16,490
Oh, and there's a tool
here that's part of Matlab.

1018
00:56:16,490 --> 00:56:20,820
It's called the Filter
Visualization Tool, fvtool.

1019
00:56:20,820 --> 00:56:25,260
You just run this and you
can select different kinds

1020
00:56:25,260 --> 00:56:29,820
of filters that have
different kinds of roll-off

1021
00:56:29,820 --> 00:56:32,430
in frequency, that have
different properties

1022
00:56:32,430 --> 00:56:33,870
in the time domain.

1023
00:56:33,870 --> 00:56:35,310
It's kind of fun to play with.

1024
00:56:35,310 --> 00:56:38,250
If you have to do
filtering on some signal,

1025
00:56:38,250 --> 00:56:39,420
just play around with this.

1026
00:56:39,420 --> 00:56:41,910
Because there are a bunch of
different kinds of filters that

1027
00:56:41,910 --> 00:56:47,130
have different weird names
like Butterworth and Chebyshev

1028
00:56:47,130 --> 00:56:50,900
and a bunch of other things
that have different properties.

1029
00:56:50,900 --> 00:56:52,970
But you can actually just
play around with this

1030
00:56:52,970 --> 00:56:57,170
and design your own filter
to meet your own [AUDIO OUT]

1031
00:56:57,170 --> 00:57:00,930
OK, so I want to end by spending
a little bit of time talking

1032
00:57:00,930 --> 00:57:03,480
about some really cool things
about the Fourier transform

1033
00:57:03,480 --> 00:57:06,780
and talk about the
Nyquist-Shannon theorem.

1034
00:57:06,780 --> 00:57:08,700
This is really kind
of mind-boggling.

1035
00:57:08,700 --> 00:57:10,110
It's pretty cool.

1036
00:57:10,110 --> 00:57:15,120
So all right, so remember
that when you take the Fourier

1037
00:57:15,120 --> 00:57:17,700
transform-- the fast Fourier
transform of something--

1038
00:57:17,700 --> 00:57:20,390
[INAUDIBLE] take the
Fourier transform

1039
00:57:20,390 --> 00:57:25,400
of something analytically,
the Fourier transform

1040
00:57:25,400 --> 00:57:27,860
is defined continuously.

1041
00:57:27,860 --> 00:57:33,980
At every value of f,
there's a Fourier transform.

1042
00:57:33,980 --> 00:57:36,320
But when we do fast
Fourier transforms,

1043
00:57:36,320 --> 00:57:41,360
we've discretized time and we've
discretized frequency, right?

1044
00:57:41,360 --> 00:57:43,520
So when we take the
fast Fourier transform,

1045
00:57:43,520 --> 00:57:48,130
we get an answer back where we
have a value of the Fourier

1046
00:57:48,130 --> 00:57:50,900
transform at a bunch of
discrete frequencies.

1047
00:57:50,900 --> 00:57:53,450
So frequency is discretized.

1048
00:57:53,450 --> 00:57:55,822
And we have frequencies,
little samples

1049
00:57:55,822 --> 00:57:57,530
of the spectrum at
different frequencies,

1050
00:57:57,530 --> 00:58:00,747
that are separated
by a little delta f.

1051
00:58:00,747 --> 00:58:06,760
[INAUDIBLE] What does that mean?

1052
00:58:06,760 --> 00:58:12,490
Remember when we were
doing a Fourier series?

1053
00:58:12,490 --> 00:58:18,220
What was it that we had to have
to write down a Fourier series

1054
00:58:18,220 --> 00:58:20,200
where we can write
down an approximation

1055
00:58:20,200 --> 00:58:25,690
to a function as a sum of
sine waves at multiples

1056
00:58:25,690 --> 00:58:28,380
of a common frequency?

1057
00:58:28,380 --> 00:58:30,985
What was it about
the signal in time

1058
00:58:30,985 --> 00:58:32,110
that allowed us to do that?

1059
00:58:40,660 --> 00:58:41,630
It's periodic.

1060
00:58:41,630 --> 00:58:45,240
We could only do that if
the signal is periodic.

1061
00:58:45,240 --> 00:58:52,700
So when we write down our fast
Fourier transform of a signal,

1062
00:58:52,700 --> 00:58:55,910
it's discretized in
time and frequency.

1063
00:58:55,910 --> 00:59:00,310
What that means is that
it's periodic in time.

1064
00:59:00,310 --> 00:59:06,360
So when we pass a signal that
we've sampled of some duration

1065
00:59:06,360 --> 00:59:09,360
and the fast Fourier transform
algorithm passes back

1066
00:59:09,360 --> 00:59:13,410
a spectrum that's discretized in
frequency, what that means is

1067
00:59:13,410 --> 00:59:15,750
that you can think
about that signal

1068
00:59:15,750 --> 00:59:19,280
as being periodic in time, OK?

1069
00:59:19,280 --> 00:59:24,160
Now, when you discretize
the signal in time,

1070
00:59:24,160 --> 00:59:27,720
you've taken samples
of that signal

1071
00:59:27,720 --> 00:59:31,830
in time separated by delta t.

1072
00:59:31,830 --> 00:59:35,680
What does that tell
you about the spectrum?

1073
00:59:35,680 --> 00:59:38,360
So when we pass the
FFT algorithm

1074
00:59:38,360 --> 00:59:41,580
a signal
that's discretized in time,

1075
00:59:41,580 --> 00:59:48,480
it passes us back this
thing here, right,

1076
00:59:48,480 --> 00:59:50,460
with positive frequencies
in the first half

1077
00:59:50,460 --> 00:59:53,430
of the vector, the negative
frequencies in the second half.

1078
00:59:53,430 --> 00:59:55,650
It's really a piece--

1079
00:59:55,650 --> 00:59:58,125
it's one period of
a periodic spectrum.

1080
01:00:01,749 --> 01:00:05,100
[AUDIO OUT] right?

1081
01:00:05,100 --> 01:00:08,160
Mathematically, if our signal
is discretized in time,

1082
01:00:08,160 --> 01:00:10,570
it means the
spectrum is periodic.

1083
01:00:10,570 --> 01:00:14,830
And the FFT algorithm is
passing back one period.

1084
01:00:14,830 --> 01:00:18,575
And then there's a circular
shift to get this thing.

1085
01:00:18,575 --> 01:00:21,920
Does that make sense?

1086
01:00:21,920 --> 01:00:25,990
OK, now, because these
are real functions,

1087
01:00:25,990 --> 01:00:30,460
this piece here is exactly
equal to that piece.

1088
01:00:30,460 --> 01:00:33,000
It's symmetric.

1089
01:00:33,000 --> 01:00:37,650
The magnitude of the
spectrum is symmetric.
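The layout the FFT hands back, and the symmetry for real signals, can be checked directly in NumPy. The 8-sample record at 8 Hz is an arbitrary choice that makes the frequency spacing exactly 1 Hz.

```python
import numpy as np

fs = 8.0                      # sampling rate (Hz), chosen for round numbers
n = 8                         # number of samples, so delta f = fs / n = 1 Hz
freqs = np.fft.fftfreq(n, d=1 / fs)
# FFT order: DC and positive frequencies first, then the negative frequencies
print(freqs)                  # 0, 1, 2, 3, -4, -3, -2, -1 Hz
# The circular shift puts the negative frequencies on the left
print(np.fft.fftshift(freqs)) # -4, -3, -2, -1, 0, 1, 2, 3 Hz

x = np.random.default_rng(0).standard_normal(n)   # any real-valued signal
X = np.fft.fft(x)
# For real x, X[n-k] is the complex conjugate of X[k], so |X| is symmetric
mirror = np.abs(X[(-np.arange(n)) % n])
print(np.allclose(np.abs(X), mirror))  # True
```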

1090
01:00:37,650 --> 01:00:40,140
So what does that mean?

1091
01:00:40,140 --> 01:00:43,010
What that means, if our
signal has some bandwidth--

1092
01:00:43,010 --> 01:00:49,990
if the highest frequency is
less than some bandwidth B--

1093
01:00:49,990 --> 01:00:53,050
if the sampling
rate is high enough,

1094
01:00:53,050 --> 01:00:57,330
then you can see that the
frequency components here

1095
01:00:57,330 --> 01:01:01,680
don't interact with the
frequency components here.

1096
01:01:01,680 --> 01:01:03,180
You can see that
they're separated.

1097
01:01:06,150 --> 01:01:07,590
OK, one more thing.

1098
01:01:07,590 --> 01:01:15,090
The period of the spectrum is 1
over delta t, [INAUDIBLE] which

1099
01:01:15,090 --> 01:01:17,200
is equal to the sampling rate.
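The periodicity can be verified numerically by evaluating the spectrum of a sampled signal at an arbitrary frequency f and again at f plus the sampling rate. The helper `dtft`, the 100 Hz rate, and the random test signal are illustrative.

```python
import numpy as np

def dtft(x, f, fs):
    """Spectrum of samples x (spacing 1/fs), evaluated at an arbitrary frequency f."""
    n = np.arange(len(x))
    return np.sum(x * np.exp(-2j * np.pi * f * n / fs))

fs = 100.0                                    # illustrative sampling rate (Hz)
x = np.random.default_rng(1).standard_normal(64)
for f in (3.0, 17.5, 42.0):
    # Shifting by the sampling rate lands on an identical copy of the spectrum,
    # because exp(-2j*pi*n) = 1 for every integer n
    assert np.isclose(dtft(x, f, fs), dtft(x, f + fs, fs))
```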

1100
01:01:17,200 --> 01:01:20,700
So when we have a signal
that's discretized in time,

1101
01:01:20,700 --> 01:01:24,480
the spectrum is periodic and
there are multiple copies

1102
01:01:24,480 --> 01:01:27,120
of that spectrum,
of this spectrum,

1103
01:01:27,120 --> 01:01:31,820
at intervals of
[AUDIO OUT] rate.

1104
01:01:31,820 --> 01:01:35,230
OK, so if the sampling
rate is high enough,

1105
01:01:35,230 --> 01:01:38,130
then the positive
frequencies are well

1106
01:01:38,130 --> 01:01:40,890
separated from the
negative frequencies

1107
01:01:40,890 --> 01:01:47,220
if the sampling rate is higher
than twice the bandwidth

1108
01:01:47,220 --> 01:01:55,360
[AUDIO OUT] If I sample
the signal at a slower

1109
01:01:55,360 --> 01:01:57,490
and slower rate but
it's the same signal,

1110
01:01:57,490 --> 01:02:02,250
you can see at some point
that negative frequencies are

1111
01:02:02,250 --> 01:02:05,270
going to start crashing into
the positive frequencies.

1112
01:02:05,270 --> 01:02:09,140
So you can see that you
don't run into this problem

1113
01:02:09,140 --> 01:02:13,640
as long as the sampling rate is
greater than twice the highest

1114
01:02:13,640 --> 01:02:15,272
frequency [INAUDIBLE]
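A small illustration of what goes wrong below that rate: sampled at 100 Hz (Nyquist frequency 50 Hz), a 70 Hz sine produces exactly the same samples as a sign-flipped 30 Hz sine, so its spectrum shows a peak at 30 Hz. The frequencies are illustrative choices.

```python
import numpy as np

fs = 100.0
t = np.arange(100) / fs
x70 = np.sin(2 * np.pi * 70 * t)   # above the 50 Hz Nyquist frequency
x30 = np.sin(2 * np.pi * 30 * t)

# The 70 Hz samples are indistinguishable from a sign-flipped 30 Hz sine
print(np.allclose(x70, -x30))      # True

spectrum = np.abs(np.fft.rfft(x70))
freqs = np.fft.rfftfreq(len(x70), d=1 / fs)
print(freqs[np.argmax(spectrum)])  # peak appears at 30 Hz, not 70 Hz
```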

1115
01:02:15,272 --> 01:02:17,552
So what?

1116
01:02:17,552 --> 01:02:20,530
So who cares?

1117
01:02:20,530 --> 01:02:22,040
What's so bad about this?

1118
01:02:22,040 --> 01:02:27,680
Well, it turns out
that if you sample

1119
01:02:27,680 --> 01:02:30,955
at a frequency higher than
twice the bandwidth, the highest

1120
01:02:30,955 --> 01:02:32,330
frequency in the
signal, then you

1121
01:02:32,330 --> 01:02:33,570
can do something really cool.

1122
01:02:36,636 --> 01:02:39,800
You can perfectly reconstruct
the original signal

1123
01:02:39,800 --> 01:02:45,320
even though you've sampled
it only discretely.

1124
01:02:45,320 --> 01:02:47,960
Put an arbitrary signal in.

1125
01:02:47,960 --> 01:02:49,580
You can sample it discretely.

1126
01:02:49,580 --> 01:02:52,160
And as long as you've sampled
it at more than twice the highest

1127
01:02:52,160 --> 01:02:53,960
frequency in the
original signal,

1128
01:02:53,960 --> 01:02:58,190
you can perfectly reconstruct
the original signal.

1129
01:02:58,190 --> 01:02:59,660
Back to this.

1130
01:02:59,660 --> 01:03:03,080
Here's our discretely
sampled signal.

1131
01:03:03,080 --> 01:03:04,820
There is the spectrum.

1132
01:03:04,820 --> 01:03:05,690
It's periodic.

1133
01:03:05,690 --> 01:03:07,820
Let's say that the
sampling rate is

1134
01:03:07,820 --> 01:03:11,420
more than twice the bandwidth.

1135
01:03:11,420 --> 01:03:13,670
How would I reconstruct
the original signal?

1136
01:03:26,981 --> 01:03:31,410
But remember that the
convolution theorem

1137
01:03:31,410 --> 01:03:36,760
says that by multiplying
in the frequency domain,

1138
01:03:36,760 --> 01:03:40,550
I'm convolving in
the time domain, OK?

1139
01:03:40,550 --> 01:03:45,670
So remember that
this piece right here

1140
01:03:45,670 --> 01:03:50,970
was the spectrum of the
original signal, right?

1141
01:03:50,970 --> 01:03:55,370
As I sampled it in time,
I added these [AUDIO OUT]

1142
01:03:55,370 --> 01:03:58,910
copies at intervals
of the sampling rate.

1143
01:03:58,910 --> 01:04:02,810
If I want to get the
original signal back,

1144
01:04:02,810 --> 01:04:05,720
I can just put a square
window around this,

1145
01:04:05,720 --> 01:04:10,115
keep that, and throw
away all the others.

1146
01:04:10,115 --> 01:04:12,030
[INAUDIBLE]

1147
01:04:12,030 --> 01:04:15,180
By sampling regularly, I've
just added these other copies.

1148
01:04:15,180 --> 01:04:18,360
But they're far enough away
that I can just throw them away.

1149
01:04:18,360 --> 01:04:21,610
I can set them to zero.

1150
01:04:21,610 --> 01:04:25,720
Now, when I put a square
window in the frequency domain,

1151
01:04:25,720 --> 01:04:29,650
what am I doing in
the time domain?

1152
01:04:29,650 --> 01:04:31,960
Multiply by a square
window in frequency,

1153
01:04:31,960 --> 01:04:35,210
what am I doing in time?

1154
01:04:35,210 --> 01:04:35,710
[INAUDIBLE]

1155
01:04:35,710 --> 01:04:40,870
So basically what I do is I
take the original signal sampled

1156
01:04:40,870 --> 01:04:41,890
regularly in time.

1157
01:04:41,890 --> 01:04:44,055
And I just convolve
it with what?

1158
01:04:44,055 --> 01:04:46,150
What's the Fourier
transform of a square pulse?

1159
01:04:50,450 --> 01:04:55,030
[INAUDIBLE] If I could just
convolve the time domain

1160
01:04:55,030 --> 01:04:56,537
[INAUDIBLE] with
a kernel, that's

1161
01:04:56,537 --> 01:04:58,370
the Fourier transform
of that square window.

1162
01:04:58,370 --> 01:05:00,080
It's just the sinc function.

1163
01:05:00,080 --> 01:05:03,640
And when I do that, I get
back the original function.
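The time-domain version of that reconstruction (convolving the samples with a sinc kernel) can be sketched as follows. The test signal (3 Hz and 7 Hz components), the 20 Hz sampling rate, and the 40 s record are assumptions. Because the record is finite, the reconstruction is only approximate, and it is evaluated in the middle of the record.

```python
import numpy as np

def sinc_reconstruct(samples, fs, t):
    """Rebuild x(t) from samples taken at times n/fs by summing shifted sinc kernels."""
    n = np.arange(len(samples))
    # np.sinc(u) = sin(pi*u)/(pi*u): equal to 1 at u = 0 and 0 at other integers
    kernels = np.sinc(fs * t[:, None] - n[None, :])
    return kernels @ samples

def x_true(t):
    # Band-limited test signal: highest frequency 7 Hz
    return np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)

fs = 20.0                                   # more than 2 x 7 Hz
samples = x_true(np.arange(801) / fs)       # 40 s of samples
t_new = 20.0 + np.array([0.013, 0.27, 0.49])    # off-grid times, mid-record
print(sinc_reconstruct(samples, fs, t_new) - x_true(t_new))  # small residuals
```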

1164
01:05:03,640 --> 01:05:05,320
But it's actually easier to do.

1165
01:05:05,320 --> 01:05:07,240
Rather than convolving
with a sinc function,

1166
01:05:07,240 --> 01:05:10,500
it's easier just to multiply
in the frequency domain.

1167
01:05:10,500 --> 01:05:14,920
So I can basically get
back my sampled function

1168
01:05:14,920 --> 01:05:19,540
at arbitrarily fine [AUDIO OUT]

1169
01:05:19,540 --> 01:05:21,160
Here's how you actually do that.

1170
01:05:21,160 --> 01:05:24,180
That process is
called zero-padding.

1171
01:05:24,180 --> 01:05:27,710
OK, so what you can do is you
can take a function, Fourier

1172
01:05:27,710 --> 01:05:30,470
transform it, get the spectrum.

1173
01:05:30,470 --> 01:05:32,105
And what the Fourier
transform hands

1174
01:05:32,105 --> 01:05:36,130
us back is just this piece
right here [INAUDIBLE]

1175
01:05:36,130 --> 01:05:41,960
But what I can do is I can just
move those other peaks away.

1176
01:05:41,960 --> 01:05:44,080
So that's what my
FFT sends back to me.

1177
01:05:44,080 --> 01:05:45,640
Now what am I going
to do, I'm just

1178
01:05:45,640 --> 01:05:52,830
going to push that away and
add zeros in the middle.

1179
01:05:52,830 --> 01:06:01,440
Now, inverse Fourier transform,
and [INAUDIBLE] So the sampling

1180
01:06:01,440 --> 01:06:04,320
rate is just the number
of frequency samples

1181
01:06:04,320 --> 01:06:05,580
I have times delta f.

1182
01:06:05,580 --> 01:06:08,400
And here I'm just adding a
bunch of frequency samples

1183
01:06:08,400 --> 01:06:09,780
that are zero.

1184
01:06:09,780 --> 01:06:16,020
And my new delta t is just going
to be 1 over that new sampling

1185
01:06:16,020 --> 01:06:16,530
rate.
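The bookkeeping in these cues can be written out as follows. Here N is the original number of samples and M > N is the padded number of frequency samples; both symbols are introduced only for this illustration.

```latex
\Delta f = \frac{1}{N\,\Delta t}, \qquad f_s = \frac{1}{\Delta t} = N\,\Delta f
% zero-padding the spectrum to M > N samples keeps \Delta f fixed:
f_s' = M\,\Delta f, \qquad \Delta t' = \frac{1}{f_s'} = \frac{N}{M}\,\Delta t
```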

1186
01:06:16,530 --> 01:06:17,500
Here's an example.

1187
01:06:17,500 --> 01:06:19,860
This is a little bit
of code that does it.

1188
01:06:19,860 --> 01:06:23,010
Here I've taken a
sine wave at 20 hertz.

1189
01:06:23,010 --> 01:06:26,070
You can see the
50-millisecond period,

1190
01:06:26,070 --> 01:06:30,370
sampled four times per cycle.

1191
01:06:30,370 --> 01:06:32,560
I just run this little
zero-padding algorithm.

1192
01:06:32,560 --> 01:06:38,161
And you can see that it
sends me back these red dots.

1193
01:06:38,161 --> 01:06:42,690
[INAUDIBLE] have more completely
reconstructed the sine wave

1194
01:06:42,690 --> 01:06:43,710
that I sampled.
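Here is a minimal NumPy sketch of that zero-padding procedure, set up to mirror the 20 Hz example (four samples per cycle, which implies an assumed 80 Hz sampling rate, over 1 s). The lecture's own code is not reproduced in the transcript. The helper name `fft_upsample` and the choice to split the Nyquist coefficient between the two halves (which keeps the output real) are this sketch's own.

```python
import numpy as np

def fft_upsample(x, factor):
    """Upsample a real, periodic, band-limited sequence by zero-padding its spectrum."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    n = len(x)
    m = n * factor
    X = np.fft.fft(x)
    Xp = np.zeros(m, dtype=complex)
    half = n // 2
    if n % 2 == 0:
        Xp[:half] = X[:half]                # DC and positive frequencies
        Xp[m - half + 1:] = X[half + 1:]    # negative frequencies at the far end
        Xp[half] = X[half] / 2              # split the Nyquist coefficient
        Xp[m - half] = X[half] / 2
    else:
        Xp[:half + 1] = X[:half + 1]
        Xp[m - half:] = X[half + 1:]
    # Zeros now fill the middle; ifft divides by m, the original transform by n
    return np.real(np.fft.ifft(Xp)) * factor

fs = 80.0                                   # 20 Hz sampled four times per cycle
t = np.arange(int(fs)) / fs                 # 1 s of samples
x = np.sin(2 * np.pi * 20 * t)
y = fft_upsample(x, 8)                      # new delta t = 1 / (8 * fs)
t_fine = np.arange(len(y)) / (8 * fs)
```

Every eighth output sample reproduces an original sample, and the samples in between follow the underlying 20 Hz sine.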

1195
01:06:43,710 --> 01:06:47,400
OK, but you can do
that with any function

1196
01:06:47,400 --> 01:06:52,150
as long as the highest frequency
in your original signal

1197
01:06:52,150 --> 01:06:56,730
is less than [AUDIO OUT]
half the sampling rate.

1198
01:07:00,090 --> 01:07:04,420
[INAUDIBLE]

1199
01:07:04,420 --> 01:07:07,360
So zero-padding, so
what I showed you here

1200
01:07:07,360 --> 01:07:11,280
is that zero-padding
in the frequency domain

1201
01:07:11,280 --> 01:07:16,290
gives you higher
sampling, faster sampling,

1202
01:07:16,290 --> 01:07:17,130
in the time domain.

1203
01:07:17,130 --> 01:07:21,230
OK, and you can also
do the same thing.

1204
01:07:21,230 --> 01:07:24,480
You can also zero-pad
in the time domain

1205
01:07:24,480 --> 01:07:28,060
to give finer spacing
in the frequency domain.

1206
01:07:28,060 --> 01:07:30,800
FFT samples will
be closer together

1207
01:07:30,800 --> 01:07:32,340
in the frequency domain.

1208
01:07:32,340 --> 01:07:34,100
OK, so here's how you do that.

1209
01:07:34,100 --> 01:07:36,830
So you take a little
piece of data.

1210
01:07:36,830 --> 01:07:40,220
You multiply it by
your DPSS taper.

1211
01:07:40,220 --> 01:07:43,840
And then just add
a bunch of zeros.

1212
01:07:43,840 --> 01:07:45,720
And then take the
Fourier transform

1213
01:07:45,720 --> 01:07:49,310
of that longer piece with
all those zeros added to it.

1214
01:07:49,310 --> 01:07:52,030
And when you do that, what
you're going to get back

1215
01:07:52,030 --> 01:07:57,550
is an FFT that has the
samples in frequency

1216
01:07:57,550 --> 01:07:58,690
more finely spaced.

1217
01:07:58,690 --> 01:08:01,060
Your delta f is
going to be smaller.

1218
01:08:01,060 --> 01:08:03,430
Now, that doesn't [AUDIO OUT]
frequency resolution.

1219
01:08:03,430 --> 01:08:06,905
There's no magic getting
around the minimum time

1220
01:08:06,905 --> 01:08:08,090
[INAUDIBLE] product.

1221
01:08:08,090 --> 01:08:08,590
OK?

1222
01:08:08,590 --> 01:08:12,460
But you have more
samples in frequency.
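A sketch of the time-domain version: taper a short segment, append zeros, and take the FFT. It assumes SciPy's `scipy.signal.windows.dpss` for the taper (with an assumed NW of 2) and uses the `n` argument of `numpy.fft.rfft`, which pads the input with zeros. The 1 kHz rate, 100-sample segment, and 123 Hz tone are illustrative.

```python
import numpy as np
from scipy.signal.windows import dpss

fs = 1000.0                          # assumed sampling rate (Hz)
n = 100                              # 100 ms data segment
t = np.arange(n) / fs
segment = np.sin(2 * np.pi * 123.0 * t)   # tone that falls between 10 Hz bins

taper = dpss(n, NW=2)                # single DPSS taper (assumed NW = 2)
tapered = segment * taper

def peak_frequency(nfft):
    # rfft(..., n=nfft) appends zeros to reach nfft samples before transforming
    spectrum = np.abs(np.fft.rfft(tapered, n=nfft))
    freqs = np.fft.rfftfreq(nfft, d=1 / fs)
    return freqs[np.argmax(spectrum)], freqs[1] - freqs[0]

print(peak_frequency(n))       # delta f = 10 Hz: peak is stuck on the 10 Hz grid
print(peak_frequency(4096))    # delta f about 0.24 Hz: finer grid, same peak width
```

The padded spectrum is more finely sampled, but its peak is just as wide, which is the point about resolution made in the lecture.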

1223
01:08:12,460 --> 01:08:14,400
All right, any questions?

1224
01:08:17,930 --> 01:08:21,680
[INAUDIBLE] We're going to be
starting a new topic next time.

1225
01:08:21,680 --> 01:08:24,789
We're done with
spectral analysis.