The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

GILBERT STRANG: OK, so basically probability ideas today, because that's a part of the subject, part of deep learning as we get there. And it's probably a good topic for the day before spring break, because lots of you will have seen, of course you will have seen, the sample mean, the average of the data. And you'll know about the expected mean. Let me complete that.

What's the expected mean? This is the expectation of the value x, where we get x1 with probability P1, along to xn with probability Pn. So we just want to say, what's our average outcome? And we weight the outcomes by their probabilities. So it's E[x] = P1 x1 + ... + Pn xn. That's the expected value of x.

Are you comfortable with that symbol E? Because that's like everywhere. It gives a handy shorthand.
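As a quick sketch (not part of the lecture), the weighted sum E[x] = P1 x1 + ... + Pn xn is just a few lines of Python; the probabilities and outputs below are made-up numbers for illustration:

```python
# Expected value E[x] = sum of P_i * x_i.
# Illustrative numbers, not from the lecture.
probs = [0.2, 0.5, 0.3]    # P1..Pn, must sum to 1
outputs = [1.0, 2.0, 4.0]  # x1..xn

mean = sum(p * x for p, x in zip(probs, outputs))
print(mean)  # 0.2*1 + 0.5*2 + 0.3*4 = 2.4
```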
For example, the variance is the expected value of what? The variance is an expected value, based on these probabilities, of the square of the distance from the mean. Everybody remembers it involves a square, and it's the distance from the mean. Let me call the mean m, just to have a smaller letter. So it's the distance from the mean, x minus m. The variance is the expected value, the average value, of (x - m) squared.

And in general, of course, this covariance matrix I could express with that E notation. But let me just stretch it as far as, what would be the expected value of any function f of x? Well, we've got n possible outputs, x1 to xn. We look at f(x1) up to f(xn), and we weight those by the probabilities that they happen. So, let me make just a little corner of the board for this letter E. This would be the probability P1 times the value f(x1). That's the contribution from the x1 possibility. And now we include them all, up to the output f(xn) with probability Pn. And if that f(x) is (x - m) squared, then we get what we expect, the variance. Let me write that out.
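The rule E[f(x)] = P1 f(x1) + ... + Pn f(xn) can be sketched as one small helper (illustrative code, not from the lecture); the mean and the variance both come out of the same function:

```python
# E[f(x)] = sum of P_i * f(x_i): weight each output's f-value by its probability.
def expect(f, probs, outputs):
    return sum(p * f(x) for p, x in zip(probs, outputs))

# Illustrative distribution, not from the lecture.
probs = [0.2, 0.5, 0.3]
outputs = [1.0, 2.0, 4.0]

m = expect(lambda x: x, probs, outputs)               # mean: f(x) = x
var = expect(lambda x: (x - m) ** 2, probs, outputs)  # variance: f(x) = (x - m)^2
print(m, var)  # 2.4 and 1.24 (up to rounding)
```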
So I just want to keep going with variance. It's a sum: the first probability times the first output minus the mean, squared, plus ... plus the last probability times the last output minus the mean, squared. So sigma squared is P1 (x1 - m)^2 + ... + Pn (xn - m)^2.

And everybody should know a second expression, a second way to do that sum. If I just write out those squares and combine them a little differently, I get a second expression which is really useful, often a little faster to compute. So can I just do that? So that's P1 times x1 squared minus 2 x1 m1 plus m1 squared. And then the same thing at the end, Pn times xn squared minus 2 xn m plus m squared. Good with that?

AUDIENCE: On that m--

GILBERT STRANG: Sorry?

AUDIENCE: On that m--

GILBERT STRANG: Plus n, oh, sorry. No, I mean, am I--

AUDIENCE: So for P1 it's x1 squared minus--

GILBERT STRANG: Oh, it's just an m. Correct. Thank you. Thank you. Just an m. Good. OK. Can we take that sum? So I get P1 x1 squared up to Pn xn squared if I take these, the first guys.
So I've accounted for this term and this one. Now I'll take the minus 2m terms: minus 2m times P1 x1, along to minus 2m times Pn xn. I'm just writing it all out, and I'm going to recombine it. So now I have P1 m squared plus P2 m squared, up to Pn m squared. So what do I have from the P1 m squared all the way up to the Pn m squared? Are you with me? m squared is in every term, so I'm going to have an m squared. And what's it multiplied by? P1 here, P2 here, Pn here. I add those up and I get?

AUDIENCE: 1.

GILBERT STRANG: 1. So that's it. OK, now I'll just simplify this thing. So this first piece is really the expected value of what? What am I seeing in this term?

AUDIENCE: x squared.

GILBERT STRANG: The expected value of x squared, right. Different from the expected value of (x - m) squared, of course. This is just the first term from here. But now, what do I get for this second term? Well, the point is that m comes out. So this is minus an m and a 2. And what do I have left? I've used up the m. I've used up the 2.
P1 x1 plus ... plus Pn xn, what's that? Everybody should just pay attention to this. Trivial, I mean, we're just doing high school algebra here. But P1 x1 up to Pn xn is m, the mean. So I have another m, giving minus 2 m squared there. And I have a plus m squared from the last group. So you see that there is another expression: the variance is the expected value of x squared, minus m squared. It's just algebra. That is the same as this: sigma squared = E[(x - m)^2] = E[x^2] - m^2. So if you happen to have a handy way to compute the expected value of x squared, you would just subtract m squared, and you'd have the same as this. Yeah, it's just algebra.

OK, let's go a little deeper with something here, if I can find it. There are two great inequalities in statistics. And the first one is due to Markov. I don't know if you know Markov's inequality. It comes out easily, in fact almost too easily. I'm kind of happy to discuss it. And now I've jumped to Section 5 of the book. So I'll need to post Section 5, which is probability and statistics, and you'll see this Markov inequality there. It just involves this stuff.
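The identity sigma^2 = E[x^2] - m^2 that the algebra just produced is easy to confirm numerically; a sketch with made-up probabilities, not from the lecture:

```python
# Verify sigma^2 = E[(x - m)^2] = E[x^2] - m^2 on an arbitrary distribution.
probs = [0.2, 0.5, 0.3]    # illustrative numbers, not from the lecture
outputs = [1.0, 2.0, 4.0]

m = sum(p * x for p, x in zip(probs, outputs))
var_direct = sum(p * (x - m) ** 2 for p, x in zip(probs, outputs))
var_shortcut = sum(p * x ** 2 for p, x in zip(probs, outputs)) - m ** 2
print(var_direct, var_shortcut)  # both 1.24, up to floating-point rounding
```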
So that's why I'll go do it now. Markov's inequality. He was a great Russian mathematician, oh, probably about 1900. And we will see Markov chains and Markov processes; that's beautiful linear algebra. But this little inequality is not matrices. It's just playing with these probabilities. And it applies to non-negative events. So shall I say it applies when all the x's, all the outputs, are greater than or equal to zero. I'm going to use that fact. So it doesn't apply to something like a Gaussian, because there the outputs go all the way from minus infinity to infinity. It does apply to a lot of important ones and simple ones.

I'll give you the proof for this finite probability case. And there will be a similar proof, a similar discussion, everywhere here for continuous probability. So what does Markov say? Let me be sure I get it right, because I'm not a pro at this. It's natural to want to estimate the probability that x is greater than or equal to some number a, to get some idea of the probability of x being greater than or equal to a. So what do we know?
This is certainly a number between 0 and 1. That number is going to get smaller as a increases, because we're asking for more. If I take a to be, say, twice the mean, can I estimate what that probability could be? And that's what Markov has done. He says the probability of that is at most -- let's see, I used to have an eraser -- at most, yes, got it: the probability that x is greater than or equal to a is less than or equal to the mean, x bar is another way to write the mean, divided by a. So this is the mean over a, or it's the expected value of x over a: P(x >= a) <= E[x]/a. We could see any of those notations. OK.

And as we expect, as a increases, this number goes down; the probability of exceeding a goes down. So that's a pretty simple estimate for this probability, just in terms of the number a, which has to come in because it's part of the question, and the mean of x. So let me take an example, a equals 3.
I want to show something about the probability of x being greater than or equal to 3. Yeah, OK. We don't have many facts to work with, so if we write those down, we should see the reason. I know that the mean is E[x]. So let's see, for example, let's take the mean to be 1. So I'm going to imagine that the mean is 1, and I'm asking for the chance that x will be greater than or equal to 3. And I'll get an estimate of 1/3. So I'm trying to show that the probability of x >= 3 is less than or equal to the mean, which I'm saying is 1, over a, which is 3. So it is less than or equal to 1/3. Now, why is that true? That's what I have to show. I think that if I write down what I know, I'll see it.

Let me just raise the board a little so that I have room to write. So what do I know? I know the definition of the mean. So I know that x1 P1 plus x2 P2 plus x3 P3, and, allow me to get carried away here, up to x5 P5, say, is what? What I've written down there is the mean, and I'm assuming that to be 1.
So this is the fact that I know: that sum is 1. And what is it that I want to prove? I want to know the probability of being greater than or equal to 3. So what's the probability that the result will be greater than or equal to 3? It's P3 plus P4 plus P5. These are the probabilities of the different ways that I might be greater than or equal to 3. And I'm claiming that that's less than or equal to 1/3.

What I liked about this elementary approach is that I've stated these facts, these probability assumptions and conclusions, directly in terms of numbers. So I just want to show that if this is true, then that's true. Let's see, I'm sorry, I even took a more special case. I'm taking the case where x1 is 1, x2 is 2, x3 is 3, x4 is 4, and x5 is 5. So that satisfies my condition on the outputs: 1, 2, 3, 4, and 5 are all greater than or equal to 0, and Markov only applies when they're all greater than or equal to 0. So I'm just imagining the special case where the possible outputs are 1, 2, 3, 4, 5. Their probabilities are P1, P2, P3, P4, P5. The mean is 1.
And what I want to show is that the probability of being greater than or equal to 3 is less than or equal to 1/3. Can you put together these two? Given this, we want to conclude that. Let me just step back a minute. So what do we know? We know the first line, the mean, and we want to prove the second. We know one more thing. Well, we know the probabilities add to 1, and we know they're all greater than or equal to 0. So let me put those facts in here too. We know that P1 + P2 + P3 + P4 + P5 = 1. That we know. And we also know that all the P's are greater than or equal to 0. OK.

So here we go. My idea is that the mean equation contains 3 times (P3 + P4 + P5). So this I'll write as P1, plus 2 P2, plus 3 times (P3 + P4 + P5), I'm just picking out three of those guys, plus I have one more P4 to account for and two more P5's, equals 1. Good? Now, this grouped piece is what I'm trying to prove something about. So that is here.
I'm trying to prove that this thing is -- what am I trying to prove about that number? Sorry, I'm talking a lot, but now I've really come to the point. What is Markov telling me about that number? That's--

AUDIENCE: Less than or equal to 1.

GILBERT STRANG: That is less than or equal to?

AUDIENCE: 1.

GILBERT STRANG: Thanks. OK, I'm trying to prove that this is less than or equal to 1. That's what Markov tells me. But suppose it was greater than 1. Do you see the problem? Do you see why it can't be greater than 1? Because why?

AUDIENCE: All the other terms--

GILBERT STRANG: All the other terms are greater than or equal to 0. Probabilities are greater than or equal to 0. These are all greater than or equal to 0, and the total adds to 1. So this piece has to be less than or equal to 1. That's right. That's it. So a lot of talking there. Simple idea. And you'll see exactly this example written down in the notes.
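The same argument works for any non-negative outputs, not just 1 through 5. Here is a quick numerical check of Markov's bound P(x >= a) <= E[x]/a over many random distributions (a sketch, not from the lecture):

```python
import random

random.seed(0)
outputs = [0, 1, 2, 3, 4, 5]  # non-negative, as Markov requires
a = 3

for _ in range(1000):
    # Random probabilities P_i >= 0 that sum to 1.
    w = [random.random() for _ in outputs]
    total = sum(w)
    probs = [wi / total for wi in w]

    mean = sum(p * x for p, x in zip(probs, outputs))
    tail = sum(p for p, x in zip(probs, outputs) if x >= a)

    assert tail <= mean / a + 1e-12  # Markov's inequality holds every time
```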
And then you'll see a more conventional proof of Markov's inequality, by taking simple inequality steps. But they're somehow more mysterious. For me, this was explicit. OK, so that's Markov.

Chebyshev is the other great Russian probabilist of the time, and he gets his inequality. So there are the two. There's Markov's inequality; let me write down again what it was. Here was Markov's inequality and Markov's assumption. Chebyshev doesn't make that assumption. So now, no assumption that the outputs are greater than or equal to 0. That doesn't come in. Now, what is Chebyshev trying to estimate? OK, let's move to Chebyshev. And that's the last guy.

So Chebyshev was interested in the probability that x minus the mean, m, can I use that for the mean, is in absolute value greater than or equal to a: the probability of being a distance a away from the mean. So again, as a increases, I'm asking more, I'm asking it to be further away from the mean, and the probability will drop. And then the question is, can we estimate this? So this is a different estimate.
336 00:23:34,590 --> 00:23:38,300 But it's similar question. 337 00:23:38,300 --> 00:23:41,690 And what Chebyshev's answer for this? 338 00:23:41,690 --> 00:23:44,240 So this is the probability of this. 339 00:23:44,240 --> 00:23:47,550 I have to put off big-- 340 00:23:47,550 --> 00:23:50,240 that's all one mouthful-- 341 00:23:50,240 --> 00:24:00,170 the probability that this x minus m is greater equal to a. 342 00:24:00,170 --> 00:24:02,650 And again, we're going to have is less than 343 00:24:02,650 --> 00:24:09,280 or equal to sigma squared now comes in over a squared. 344 00:24:14,560 --> 00:24:16,360 So that's Chebyshev. 345 00:24:16,360 --> 00:24:22,210 And I just take time today to do these two 346 00:24:22,210 --> 00:24:27,720 because they involve analysis. 347 00:24:27,720 --> 00:24:30,880 They're basic tools. 348 00:24:30,880 --> 00:24:32,380 They're sort of the first thing you 349 00:24:32,380 --> 00:24:35,560 think of if you're trying to estimate a probability. 350 00:24:35,560 --> 00:24:38,800 Does it fit Markov? 351 00:24:38,800 --> 00:24:41,060 And Markov only applies-- 352 00:24:41,060 --> 00:24:42,490 so I'll put only applies-- 353 00:24:45,130 --> 00:24:47,720 when the x's are all greater or equal 0. 354 00:24:47,720 --> 00:24:50,470 Here, does it fit Chebyshev? 355 00:24:50,470 --> 00:24:52,790 And now we're taking absolute values. 356 00:24:52,790 --> 00:24:56,230 So we're not concerned about the size of x. 357 00:24:56,230 --> 00:24:58,240 And we're taking a distance from m. 358 00:24:58,240 --> 00:25:02,200 So we're obviously in the world of variances. 359 00:25:02,200 --> 00:25:05,210 We're distances from m. 360 00:25:05,210 --> 00:25:10,870 And the proof of Chebyshev comes directly from Markov. 
So I'm going to apply Markov, so it's a good thing that Markov came first, to, now let me just say this right, to a new variable, let me call it y. This will be a new output, and it will be (x - m) squared. Of course, with the same probabilities. So yi is (xi - m) squared, xi minus the mean, squared, with the same probabilities Pi.

So I'm just going to take the y's here instead of the x's, and then apply Markov. So what is x bar? If I want to apply Markov, I have to figure out the mean of x; over here, I have to figure out the mean of y. What is the mean of y? The mean value is the sum of probabilities times y's. You're supposed to recognize it. This is the sum of the probabilities Pi times my y's, which are (xi - m) squared. So the mean for this y thing that I've brought in has that formula. And we recognize what that quantity is. That is? That's sigma squared, sigma squared for the original x's. So that's great. The mean of y is the old sigma squared.
Those are exclamation marks. Do you see that now Chebyshev is looking like Markov? Over here will be the, now I want the expected value of y over the, let's see, yes, so the expected y is going to be that sigma squared. And now, what do I have to divide by? I want to know the probability of this thing being bigger than a, but now I'm looking at the y's. So if |x - m| is greater than or equal to a, then y, which is (xi - m) squared, is greater than or equal to a squared. So my a over here for x is now turning, in this problem where I'm looking at the probability of the squared quantity, into a squared: P(y >= a^2) <= E[y]/a^2 = sigma^2/a^2.

So that's Markov applied to y. Here is Markov applied to x, and x had to be greater than or equal to 0. So over here, Chebyshev took a y which was greater than or equal to 0, just applied Markov, and recognized that the mean of his variable, (x - m) squared, was exactly sigma squared. And it fell out. So again, here is a very simple proof for Markov, and then everybody agrees that Chebyshev follows right away from Markov. So those are two basic inequalities.
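Chebyshev's bound can be checked numerically the same way; the outputs may now be negative, since the proof only needs y = (x - m)^2 >= 0 (again a sketch with made-up distributions, not from the lecture):

```python
import random

random.seed(1)
outputs = [-3.0, -1.0, 0.0, 2.0, 5.0]  # signs don't matter for Chebyshev
a = 2.0

for _ in range(1000):
    # Random probabilities P_i >= 0 that sum to 1.
    w = [random.random() for _ in outputs]
    total = sum(w)
    probs = [wi / total for wi in w]

    m = sum(p * x for p, x in zip(probs, outputs))
    sigma2 = sum(p * (x - m) ** 2 for p, x in zip(probs, outputs))
    tail = sum(p for p, x in zip(probs, outputs) if abs(x - m) >= a)

    assert tail <= sigma2 / a ** 2 + 1e-12  # Chebyshev's inequality holds
```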
Now, the other topic that I wanted to deal with is covariance, the covariance matrix. You have to get comfortable with what the covariance is. So, covariance, the covariance matrix, and it will be m by m when I have m experiments at once. Let me take m equal to 2. You'll see everything for m equal to 2. So we're expecting to get a 2 by 2 matrix. And what are we starting with? We're doing two experiments at once. So we have two outputs, an x and a y. The x's are the outputs from the x experiment; the y's are the outputs from a second experiment.

We're flipping two coins. So let's take that example, two coins. Coin 1 gives 0 or 1, each with probability 1/2. Coin 2 gives 0 or 1, each with probability 1/2. So they're fair coins. But what I haven't said is, is there a connection between the outputs? This one is the x; this one is the y. If I glue the coins together, then the two outputs are the same. I think for me this is a model question that brings out the main point of covariance.
If I flip two coins separately, quite independently, then I don't know more about y from knowing x. If I know the answer to one flip, it doesn't tell me anything about the second if they're independent, uncorrelated. But if the two coins are glued together, then heads will come up for both coins. I'll only have two possibilities: it'll be heads heads or tails tails. Let me write down those two different scenarios.

So, unglued. I never expected to write that word in a math class: unglued. And what am I going to write down? I'm going to write down a matrix with heads and tails for coin 1, and heads and tails for coin 2. So one possibility is coin 1 gets heads and coin 2 gets heads. What's the probability of that? This is the unglued case. So I'm going to create a little probability matrix of joint probabilities. That's really the key word that I'm discussing here: joint probability. So let's complete that matrix. I have unglued coins, independent coins. I flip them both.
470 00:34:02,180 --> 00:34:06,150 What are the chances of getting heads on both? 471 00:34:06,150 --> 00:34:08,070 1/4. 472 00:34:08,070 --> 00:34:11,949 What are the chances of-- what do I put in here? 473 00:34:11,949 --> 00:34:14,580 This means heads on the first coin 474 00:34:14,580 --> 00:34:16,230 and tails on the second coin. 475 00:34:16,230 --> 00:34:18,690 And the probability of that is? 476 00:34:18,690 --> 00:34:20,670 1/4. 477 00:34:20,670 --> 00:34:25,090 And 1/4 here and 1/4 here. 478 00:34:25,090 --> 00:34:28,870 So I've got four possibilities, which I put into a 2 479 00:34:28,870 --> 00:34:32,080 by 2 matrix, instead of a long vector. 480 00:34:32,080 --> 00:34:36,250 My four possibilities are heads heads, heads tails, 481 00:34:36,250 --> 00:34:38,350 tails heads, and tails tails. 482 00:34:38,350 --> 00:34:39,909 And they have equal probability-- 483 00:34:39,909 --> 00:34:40,900 1/4. 484 00:34:40,900 --> 00:34:54,409 But now, if they're glued, heads and tails on the first coin, 485 00:34:54,409 --> 00:34:59,330 heads and tails on the second coin, now what do I 486 00:34:59,330 --> 00:35:01,340 put in there? 487 00:35:01,340 --> 00:35:03,350 So the two coins are glued together. 488 00:35:03,350 --> 00:35:05,890 What is the chance that they both come up heads? 489 00:35:10,160 --> 00:35:12,365 1/2. 490 00:35:12,365 --> 00:35:16,490 Because if one comes up heads, the other one is glued to it. 491 00:35:16,490 --> 00:35:18,140 It will also. 492 00:35:18,140 --> 00:35:22,760 The probability of heads tails, heads on one, tails 493 00:35:22,760 --> 00:35:24,375 on the other, is of course? 494 00:35:24,375 --> 00:35:25,000 AUDIENCE: Zero. 495 00:35:25,000 --> 00:35:26,510 GILBERT STRANG: Zero, thanks. 496 00:35:26,510 --> 00:35:27,940 And here, zero. 497 00:35:27,940 --> 00:35:29,870 And here, 1/2.
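The two joint-probability matrices just filled in on the board can be written out numerically. A minimal sketch in Python with NumPy (the array names here are illustrative, not from the lecture):

```python
import numpy as np

# Rows index coin 1 (heads, tails); columns index coin 2 (heads, tails).

# Unglued, independent fair coins: every pair of outcomes has probability 1/4.
P_unglued = np.array([[0.25, 0.25],
                      [0.25, 0.25]])

# Glued coins: only heads-heads and tails-tails occur, each with probability 1/2.
P_glued = np.array([[0.5, 0.0],
                    [0.0, 0.5]])

# Each matrix lists all the joint probabilities, so its entries sum to 1.
print(P_unglued.sum(), P_glued.sum())  # 1.0 1.0
```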
498 00:35:29,870 --> 00:35:36,890 So what I've created are those two setups, 499 00:35:36,890 --> 00:35:40,610 two different scenarios of unglued and glued. 500 00:35:43,340 --> 00:35:50,060 But each experimental setup has its matrix 501 00:35:50,060 --> 00:35:53,630 of joint probabilities. 502 00:35:53,630 --> 00:35:56,740 The point is that there are four numbers here, 503 00:35:56,740 --> 00:35:58,400 four numbers. 504 00:35:58,400 --> 00:36:01,250 We have all possibilities. 505 00:36:01,250 --> 00:36:08,420 We have any possible x and at the same time any possible y. 506 00:36:08,420 --> 00:36:12,620 So suppose we were running three experiments. 507 00:36:19,020 --> 00:36:22,830 So what would be the situation if I 508 00:36:22,830 --> 00:36:27,180 were running three experiments with three 509 00:36:27,180 --> 00:36:30,510 independent, fair coins. 510 00:36:30,510 --> 00:36:33,210 I'd be in this unglued picture. 511 00:36:33,210 --> 00:36:39,170 But I would have three different experiments that I'm running. 512 00:36:39,170 --> 00:36:44,540 Then what would I be looking at? 513 00:36:44,540 --> 00:36:47,300 The whole idea is to see what 514 00:36:47,300 --> 00:36:51,590 this joint probability looks like. 515 00:36:51,590 --> 00:36:56,570 So suppose I have three coins unglued. 516 00:37:01,170 --> 00:37:06,240 Then I want to know, say, the probability of getting heads 517 00:37:06,240 --> 00:37:09,660 on the first, heads on the second, heads on the third 518 00:37:09,660 --> 00:37:10,500 will be what? 519 00:37:10,500 --> 00:37:13,430 Just give me that number. 520 00:37:13,430 --> 00:37:16,190 What will be the probability that all three of them 521 00:37:16,190 --> 00:37:17,917 independently come up heads? 522 00:37:17,917 --> 00:37:18,500 AUDIENCE: 1/8. 523 00:37:18,500 --> 00:37:20,540 GILBERT STRANG: 1/8, OK. 524 00:37:20,540 --> 00:37:23,480 But now my question is-- 525 00:37:23,480 --> 00:37:27,280 so what do I have?
526 00:37:27,280 --> 00:37:29,890 Then I have the probability of, 527 00:37:29,890 --> 00:37:33,540 say, tails heads heads. 528 00:37:33,540 --> 00:37:36,480 I've got three indices here and eventually 529 00:37:36,480 --> 00:37:39,750 down at the end the probability of tails tails tails. 530 00:37:39,750 --> 00:37:43,530 Everybody sees that the numbers are going to be 1/8. 531 00:37:43,530 --> 00:37:47,120 But where do those fit in? 532 00:37:49,913 --> 00:37:51,580 They don't fit in a matrix, because I've 533 00:37:51,580 --> 00:37:56,440 got three indices here. 534 00:37:56,440 --> 00:37:59,710 So I guess what we're seeing, I sort of realized today, 535 00:37:59,710 --> 00:38:02,570 we're seeing for the first time a tensor. 536 00:38:02,570 --> 00:38:09,860 A tensor is a three-way structure, 537 00:38:09,860 --> 00:38:12,830 three-way matrix you could say. 538 00:38:12,830 --> 00:38:23,760 So I guess, instead of a square like that, 539 00:38:23,760 --> 00:38:28,110 an ordinary matrix, I have to think of a cube, right? 540 00:38:28,110 --> 00:38:36,800 I have a cube with two rows, two columns, and two whatevers. 541 00:38:39,610 --> 00:38:43,160 Layers, somebody might say layers for that. 542 00:38:43,160 --> 00:38:47,300 You see that the matrix has become 543 00:38:47,300 --> 00:38:58,000 a three-way thing, a tensor. 544 00:38:58,000 --> 00:39:01,330 And the entries in that tensor-- 545 00:39:01,330 --> 00:39:02,950 so it's 2 by 2 by 2. 546 00:39:06,580 --> 00:39:09,480 But instead of m by n for a matrix, 547 00:39:09,480 --> 00:39:12,880 I have to give you the number of rows, the number of columns, 548 00:39:12,880 --> 00:39:17,200 and the number of layers going into the board. 549 00:39:17,200 --> 00:39:20,310 So rows going one way-- 550 00:39:20,310 --> 00:39:26,110 you know, columns, rows, and then layers going in deep. 551 00:39:26,110 --> 00:39:29,800 So it will have eight entries.
552 00:39:29,800 --> 00:39:35,140 And, of course, in this simple case 553 00:39:35,140 --> 00:39:44,020 each will be 1/8 in that unglued, totally independent 554 00:39:44,020 --> 00:39:44,990 way. 555 00:39:44,990 --> 00:39:50,110 But then you can imagine some dependence. 556 00:39:50,110 --> 00:39:53,380 So what would happen if I glued coins 1 and 3? 557 00:39:56,860 --> 00:40:00,130 I would still have a tensor, still have a 2 by 2 558 00:40:00,130 --> 00:40:06,090 by 2 tensor of all the possibilities. 559 00:40:06,090 --> 00:40:08,580 But some of those are going to have probability 560 00:40:08,580 --> 00:40:10,620 zero, the joint probability. 561 00:40:10,620 --> 00:40:16,500 If I've glued coin 1 to coin 3, then the probability of jointly 562 00:40:16,500 --> 00:40:22,530 seeing heads on 1, whatever on 2, tails on 3 563 00:40:22,530 --> 00:40:25,280 will be 0, right? 564 00:40:25,280 --> 00:40:29,830 Because that can't happen if I've glued coins 1 and 3. 565 00:40:29,830 --> 00:40:31,520 So I'll have eight entries in here. 566 00:40:34,610 --> 00:40:36,215 This is the unglued case. 567 00:40:39,260 --> 00:40:42,310 And then I could have a case where two coins are glued. 568 00:40:45,690 --> 00:40:53,715 And as I say, I think I'd have 1/4 in four places. 569 00:40:59,030 --> 00:41:04,890 And then if I had any spare glue, I glue all three coins. 570 00:41:07,800 --> 00:41:12,740 I flip that stuck-together thing, 571 00:41:12,740 --> 00:41:16,040 and I never get heads tails heads. 572 00:41:16,040 --> 00:41:18,530 The only possibilities I get are heads 573 00:41:18,530 --> 00:41:21,080 heads heads and tails tails tails, 574 00:41:21,080 --> 00:41:22,730 because they're glued together. 575 00:41:22,730 --> 00:41:27,870 So what would be the situation for three coins glued? 576 00:41:31,540 --> 00:41:36,170 What will be the entries in the matrix 577 00:41:36,170 --> 00:41:37,650 of joint probabilities?
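Those three-coin joint probabilities live in a 2 by 2 by 2 tensor rather than a matrix. A small NumPy sketch of the unglued case and the coins-1-and-3-glued case (the names and index convention are chosen here for illustration):

```python
import numpy as np

# Axes 0, 1, 2 index coins 1, 2, 3; index 0 means heads, 1 means tails.

# All three coins independent: each of the 8 outcomes has probability 1/8.
T_unglued = np.full((2, 2, 2), 1/8)

# Coins 1 and 3 glued, coin 2 free: only outcomes with coin 1 equal to coin 3
# survive, giving 1/4 in four places and 0 in the other four.
T_glue13 = np.zeros((2, 2, 2))
for i in range(2):        # coin 1, which forces coin 3
    for j in range(2):    # coin 2 is free
        T_glue13[i, j, i] = 1/4

# Heads on 1, anything on 2, tails on 3 is impossible once 1 and 3 are glued.
print(T_glue13[0, 0, 1])  # 0.0
```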
578 00:41:37,650 --> 00:41:39,230 What will be the joint probability? 579 00:41:39,230 --> 00:41:43,310 So the probability of heads heads heads, of seeing heads 580 00:41:43,310 --> 00:41:45,250 from all three will be? 581 00:41:45,250 --> 00:41:46,190 AUDIENCE: 1/2. 582 00:41:46,190 --> 00:41:47,420 GILBERT STRANG: 1/2. 583 00:41:47,420 --> 00:41:47,920 1/2. 584 00:41:47,920 --> 00:41:53,020 Because I'm flipping this heavy mix of three coins together, 585 00:41:53,020 --> 00:41:56,330 I get 1/2 twice. 586 00:42:01,070 --> 00:42:06,170 Actually, this is a good introduction to tensors 587 00:42:06,170 --> 00:42:10,430 in a way, because the first step in understanding tensors 588 00:42:10,430 --> 00:42:15,250 is to think of three-way matrices, 589 00:42:15,250 --> 00:42:18,090 think of three-way things. 590 00:42:18,090 --> 00:42:20,440 We just haven't done that. 591 00:42:20,440 --> 00:42:22,010 And now we have to do it. 592 00:42:22,010 --> 00:42:26,920 And, of course, four-way or n-way tensors 593 00:42:26,920 --> 00:42:30,840 are now understood. 594 00:42:30,840 --> 00:42:37,080 So these are tensors with very special, simple entries. 595 00:42:37,080 --> 00:42:44,120 So now, I have to say, what is the covariance matrix? 596 00:42:44,120 --> 00:42:46,340 What's the covariance matrix? 597 00:42:46,340 --> 00:42:49,700 So now I'm ready for that, and I'll put it here. 598 00:42:49,700 --> 00:42:52,330 That's the final topic for today. 599 00:42:55,920 --> 00:42:57,525 So the covariance matrix. 600 00:43:04,330 --> 00:43:08,485 So I'm saying matrix, because I'm just going to have-- 601 00:43:08,485 --> 00:43:10,560 yeah, the covariance matrix. 602 00:43:10,560 --> 00:43:20,200 Yeah, it's going to be 2 by 2 for two coins. 603 00:43:20,200 --> 00:43:22,420 So it is a matrix. 604 00:43:22,420 --> 00:43:27,340 For three coins, it will be 3 by 3. 605 00:43:27,340 --> 00:43:30,580 But it is a matrix. 606 00:43:30,580 --> 00:43:32,508 So how is it defined?
607 00:43:32,508 --> 00:43:33,800 And what am I going to call it? 608 00:43:33,800 --> 00:43:41,160 I think I'll call it V, because really the key idea is variance. 609 00:43:41,160 --> 00:43:45,480 Covariance is telling us that we're also 610 00:43:45,480 --> 00:43:55,010 interested in the joint outcome, an x and a y. 611 00:43:55,010 --> 00:43:59,560 So it's a variance. 612 00:43:59,560 --> 00:44:05,525 So I'm going to add up over all plausible i and j-- 613 00:44:09,570 --> 00:44:13,580 sorry, all possible outcomes. 614 00:44:13,580 --> 00:44:14,990 Yeah, that's not right. 615 00:44:17,700 --> 00:44:24,070 All possible xi and yj. 616 00:44:24,070 --> 00:44:28,350 So I'm running these two experiments at the same time. 617 00:44:28,350 --> 00:44:31,200 From experiment 1, the output's an x. 618 00:44:31,200 --> 00:44:34,200 From experiment 2, the output's a y. 619 00:44:34,200 --> 00:44:36,835 Then what is Pij? 620 00:44:40,530 --> 00:44:44,610 What does that symbol mean? 621 00:44:44,610 --> 00:44:49,140 That's the guy in our 2 by 2 matrix, 622 00:44:49,140 --> 00:44:54,660 like that one or that one, depending on the gluing 623 00:44:54,660 --> 00:44:56,480 or not gluing. 624 00:44:56,480 --> 00:44:59,880 So Pij, let me say what it is. 625 00:44:59,880 --> 00:45:08,100 This is the probability that x is xi 626 00:45:08,100 --> 00:45:10,800 and that the second output y is yj. 627 00:45:14,320 --> 00:45:17,440 Let me give you a second example to keep in mind. 628 00:45:17,440 --> 00:45:21,150 Suppose I'm looking at age and height. 629 00:45:21,150 --> 00:45:28,380 So suppose x is the age of the sample, the person. 630 00:45:28,380 --> 00:45:29,750 And y is the height. 631 00:45:34,430 --> 00:45:39,470 I want to know what fraction have a certain age 632 00:45:39,470 --> 00:45:41,230 and a certain height. 633 00:45:41,230 --> 00:45:44,300 I'm looking at every pair, age and height. 634 00:45:44,300 --> 00:45:47,690 Age 11, height 4 feet.
635 00:45:47,690 --> 00:45:49,760 Age 12, height 5 feet. 636 00:45:49,760 --> 00:45:52,940 Age 11, height 5 feet. 637 00:45:52,940 --> 00:45:55,060 Each combination. 638 00:45:55,060 --> 00:45:59,870 So Pij is the probability that these will both happen, both. 639 00:46:07,570 --> 00:46:10,470 I'm going to add more to it here. 640 00:46:10,470 --> 00:46:14,290 But that joint probability is really important. 641 00:46:14,290 --> 00:46:16,900 So I'm going to ask you more about that. 642 00:46:16,900 --> 00:46:22,220 Suppose that I take these Pijs and that I add up 643 00:46:22,220 --> 00:46:27,010 P1j plus P2j plus P3j. 644 00:46:27,010 --> 00:46:32,630 In other words, I sum the Pijs over i. 645 00:46:32,630 --> 00:46:40,225 So I'm looking down a column, column j, of my matrix. 646 00:46:43,450 --> 00:46:46,300 So let me ask the question. 647 00:46:46,300 --> 00:46:47,980 Maybe I have to put it somewhere else. 648 00:46:56,110 --> 00:47:02,565 What's the meaning of the sum of Pij over the i's? 649 00:47:05,800 --> 00:47:10,320 What does that quantity look like? 650 00:47:10,320 --> 00:47:12,160 So that's a probability. 651 00:47:12,160 --> 00:47:14,880 Pij is the probability of getting 652 00:47:14,880 --> 00:47:17,340 a certain i and a certain j. 653 00:47:17,340 --> 00:47:20,490 But now I'm including all the i's. 654 00:47:20,490 --> 00:47:22,658 So what am I seeing there? 655 00:47:22,658 --> 00:47:23,200 AUDIENCE: Pj. 656 00:47:23,200 --> 00:47:24,090 GILBERT STRANG: Pj. 657 00:47:24,090 --> 00:47:24,660 Thanks. 658 00:47:24,660 --> 00:47:25,380 Pj. 659 00:47:25,380 --> 00:47:27,375 This is Pj. 660 00:47:30,840 --> 00:47:35,240 That's the probability of seeing j in the second guy, 661 00:47:35,240 --> 00:47:36,710 because I had to see something. 662 00:47:36,710 --> 00:47:39,810 If I see j in the second one, I'm 663 00:47:39,810 --> 00:47:42,180 allowed to see anything here in the first one. 664 00:47:42,180 --> 00:47:43,770 But I'm adding those all up.
665 00:47:43,770 --> 00:47:45,920 So that's the point. 666 00:47:45,920 --> 00:47:49,020 Those would be called the marginals. 667 00:47:49,020 --> 00:47:55,560 In my matrices, I would be adding up along a row 668 00:47:55,560 --> 00:47:58,260 or adding up down a column. 669 00:47:58,260 --> 00:48:06,390 Those are called the marginals of the joint probabilities. 670 00:48:06,390 --> 00:48:10,010 So the marginals would be the individual probabilities, 671 00:48:10,010 --> 00:48:15,860 Pi and Pj, in the case of two experiments going on 672 00:48:15,860 --> 00:48:17,210 at the same time. 673 00:48:17,210 --> 00:48:20,360 Yeah, these are just new ideas. 674 00:48:20,360 --> 00:48:23,120 Everything today has been sort of straightforward. 675 00:48:23,120 --> 00:48:24,980 But it's different. 676 00:48:24,980 --> 00:48:28,790 OK, now, I'm going to complete the definition 677 00:48:28,790 --> 00:48:31,230 of this covariance matrix. 678 00:48:31,230 --> 00:48:32,300 So it's going to be-- 679 00:48:34,795 --> 00:48:35,795 I want to have a square. 680 00:48:38,380 --> 00:48:41,020 So it's going to be-- 681 00:48:41,020 --> 00:48:47,590 and it should be the distance between x and the mean of x. 682 00:48:47,590 --> 00:48:50,620 Mean 1 I could call it or mean of x. 683 00:48:50,620 --> 00:48:58,080 And y, the distance from the mean of y times-- 684 00:48:58,080 --> 00:49:01,020 so it's going to be column times row-- 685 00:49:01,020 --> 00:49:07,440 the same x minus the mean of x, y minus the mean of y. 686 00:49:10,680 --> 00:49:11,180 With subscripts, xi. 687 00:49:17,255 --> 00:49:17,755 And yj. 688 00:49:23,090 --> 00:49:25,940 Can you look at this formula? 689 00:49:25,940 --> 00:49:35,470 So this is with two experiments, two coins, two experiments. 690 00:49:35,470 --> 00:49:37,000 I get a 2 by 2 matrix. 691 00:49:37,000 --> 00:49:37,875 Everybody sees that. 692 00:49:37,875 --> 00:49:41,450 A column times a row, 2 by 2 matrix.
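Summing the joint-probability matrix along a row or down a column gives those marginals. A quick check on the glued-coins matrix (a sketch; the variable names are chosen here, not the lecture's notation):

```python
import numpy as np

P_glued = np.array([[0.5, 0.0],
                    [0.0, 0.5]])   # rows: coin 1; columns: coin 2

P_i = P_glued.sum(axis=1)   # sum over j along each row: marginal for coin 1
P_j = P_glued.sum(axis=0)   # sum over i down each column: marginal for coin 2

# Even glued, each coin on its own is still a fair coin.
print(P_i, P_j)  # [0.5 0.5] [0.5 0.5]
```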
693 00:49:41,450 --> 00:49:47,150 And let's just see what would be the 1, 1 entry in that matrix. 694 00:49:49,670 --> 00:49:52,690 This is the covariance matrix. 695 00:49:52,690 --> 00:49:55,470 So what is the 1, 1 entry in that matrix? 696 00:49:55,470 --> 00:49:57,600 So the 1, 1 entry is coming from that 697 00:49:57,600 --> 00:50:03,600 times that, which is that thing squared, times all the Pijs. 698 00:50:03,600 --> 00:50:04,310 Add them up. 699 00:50:04,310 --> 00:50:07,500 What do you think I get for that? 700 00:50:07,500 --> 00:50:12,870 I get the variance of the x experiment, 701 00:50:12,870 --> 00:50:18,000 the usual variance of the x experiment, so V-- 702 00:50:18,000 --> 00:50:22,470 I have to tell you what V is now. 703 00:50:22,470 --> 00:50:26,940 V, this is V. This is a 2 by 2 matrix. 704 00:50:26,940 --> 00:50:30,700 Up here, I get the variance of the x experiment. 705 00:50:30,700 --> 00:50:33,450 What do I get down here? 706 00:50:33,450 --> 00:50:36,240 The variance of the y experiment by itself. 707 00:50:36,240 --> 00:50:40,680 Because it's y's times y's, it gives me that 2, 2 entry. 708 00:50:40,680 --> 00:50:47,440 So this is just sigma y squared. 709 00:50:47,440 --> 00:50:53,430 But the novelty is the 1, 2 entry, and it will be symmetric. 710 00:50:53,430 --> 00:50:55,260 So it's a symmetric matrix. 711 00:50:55,260 --> 00:50:57,760 V equals V transpose. 712 00:50:57,760 --> 00:50:59,700 I just have to see here. 713 00:50:59,700 --> 00:51:07,890 So I have Pij times the distance of this guy times this guy. 714 00:51:07,890 --> 00:51:11,665 That's what's going to show up in the 1, 2 position. 715 00:51:14,230 --> 00:51:15,450 It'll be in row 1. 716 00:51:15,450 --> 00:51:18,420 It'll be in column 2. 717 00:51:18,420 --> 00:51:21,490 It's the distances. 718 00:51:21,490 --> 00:51:25,710 So what will it be in the case of unglued coins, 719 00:51:25,710 --> 00:51:28,720 independent coins? 720 00:51:28,720 --> 00:51:29,730 Zero.
721 00:51:29,730 --> 00:51:31,470 I mean, it just feels like 0. 722 00:51:31,470 --> 00:51:33,250 I haven't done the computation. 723 00:51:33,250 --> 00:51:37,500 But I know that when I have independent experiments, then 724 00:51:37,500 --> 00:51:42,420 this covariance, which everybody would write as sigma xy-- 725 00:51:42,420 --> 00:51:43,530 and it's the same here. 726 00:51:43,530 --> 00:51:47,340 It's symmetric, sigma yx if you like. 727 00:51:47,340 --> 00:51:49,980 So those subscripts are telling me 728 00:51:49,980 --> 00:51:56,420 that the sum of the P's, joint probabilities, 729 00:51:56,420 --> 00:51:59,180 times the distance of x from its mean, 730 00:51:59,180 --> 00:52:02,090 the distance of y from its mean, added up 731 00:52:02,090 --> 00:52:04,620 over all the possibilities. 732 00:52:04,620 --> 00:52:10,280 So in the case of unglued coins, the case of independent ones, 733 00:52:10,280 --> 00:52:14,390 those are 0. 734 00:52:14,390 --> 00:52:17,030 Maybe worth just writing that out. 735 00:52:17,030 --> 00:52:19,010 You would get 0. 736 00:52:19,010 --> 00:52:21,920 So you have a diagonal matrix. 737 00:52:21,920 --> 00:52:25,250 The diagonal matrix is just separate variances, 738 00:52:25,250 --> 00:52:27,920 because the two experiments 739 00:52:27,920 --> 00:52:28,630 are independent. 740 00:52:28,630 --> 00:52:32,000 So all the information you can really expect 741 00:52:32,000 --> 00:52:35,330 is sigma x squared and sigma y squared. 742 00:52:35,330 --> 00:52:42,410 But if the two coins are glued together, then what? 743 00:52:42,410 --> 00:52:44,840 If the two coins are glued together-- 744 00:52:44,840 --> 00:52:47,390 well, let me just say because time is up. 745 00:52:47,390 --> 00:52:49,850 This matrix will be singular. 746 00:52:49,850 --> 00:52:53,570 If the two coins were glued together, 747 00:52:53,570 --> 00:52:56,290 the determinant would be 0 here.
748 00:52:56,290 --> 00:53:02,390 The sigma xy in the glued case-- 749 00:53:02,390 --> 00:53:06,470 squared-- would be the same as sigma x squared times sigma 750 00:53:06,470 --> 00:53:07,130 y squared. 751 00:53:10,360 --> 00:53:14,110 Actually, we're probably getting all these 1/4s. 752 00:53:14,110 --> 00:53:16,120 And that would make sense. 753 00:53:20,660 --> 00:53:23,590 I'll just end with this statement. 754 00:53:23,590 --> 00:53:30,130 This matrix is positive semidefinite always. 755 00:53:30,130 --> 00:53:31,900 Positive semidefinite always. 756 00:53:31,900 --> 00:53:35,510 Because it's column times row, we 757 00:53:35,510 --> 00:53:37,730 know that's positive semidefinite. 758 00:53:37,730 --> 00:53:41,270 And it's multiplied by numbers greater than or equal to 0. 759 00:53:41,270 --> 00:53:45,710 So it's a combination of rank 1 positive semidefinite matrices. 760 00:53:45,710 --> 00:53:47,720 So it's positive semidefinite. 761 00:53:47,720 --> 00:53:50,420 Or positive definite. 762 00:53:50,420 --> 00:53:54,350 It's certainly positive definite in the independent case 763 00:53:54,350 --> 00:53:57,290 when it's diagonal. 764 00:53:57,290 --> 00:54:00,860 And in the totally dependent case, when the coins are completely 765 00:54:00,860 --> 00:54:05,450 stuck together, that will be the semidefinite case 766 00:54:05,450 --> 00:54:10,990 when these entries would all be the same actually. 767 00:54:10,990 --> 00:54:16,270 So that's a first look at covariance matrices. 768 00:54:16,270 --> 00:54:17,610 It brought in tensors. 769 00:54:17,610 --> 00:54:19,910 It brought in joint probabilities. 770 00:54:19,910 --> 00:54:22,550 It brought in column times row. 771 00:54:22,550 --> 00:54:24,730 It kept symmetry. 772 00:54:24,730 --> 00:54:28,850 And we recognized positive definite or positive 773 00:54:28,850 --> 00:54:29,870 semidefinite.
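The covariance matrices for the two scenarios can be computed straight from the definition, summing Pij times a column of deviations times its row. A sketch under the convention heads = 1, tails = 0 (the function name and setup here are chosen for illustration, not from the lecture):

```python
import numpy as np

def covariance_matrix(P, x, y):
    # V = sum over i, j of Pij * (column of deviations) times (row of deviations).
    Pi, Pj = P.sum(axis=1), P.sum(axis=0)     # marginals
    mx, my = Pi @ x, Pj @ y                   # means of the two experiments
    V = np.zeros((2, 2))
    for i in range(len(x)):
        for j in range(len(y)):
            d = np.array([x[i] - mx, y[j] - my])
            V += P[i, j] * np.outer(d, d)     # rank-1 piece with weight Pij >= 0
    return V

x = y = np.array([1.0, 0.0])                  # heads = 1, tails = 0

V_unglued = covariance_matrix(np.full((2, 2), 0.25), x, y)
V_glued = covariance_matrix(np.array([[0.5, 0.0], [0.0, 0.5]]), x, y)

# Unglued: diagonal, sigma_xy = 0.  Glued: every entry 1/4, determinant 0.
print(V_unglued)
print(V_glued, np.linalg.det(V_glued))
```

The glued matrix comes out with all four entries equal to 1/4, singular and only semidefinite, which is the borderline case the lecture ends on.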
774 00:54:29,870 --> 00:54:32,900 So in between, for coins that were partly glued, 775 00:54:32,900 --> 00:54:35,960 partly independent, but not completely independent 776 00:54:35,960 --> 00:54:39,440 experiments, this number would be smaller. 777 00:54:42,630 --> 00:54:45,765 This wouldn't be 0, but it would be smaller than these numbers. 778 00:54:49,160 --> 00:54:51,510 I've run four minutes over. 779 00:54:51,510 --> 00:54:53,970 You're very kind to stay. 780 00:54:53,970 --> 00:54:56,100 So have a wonderful break. 781 00:54:56,100 --> 00:54:58,420 And I'll see you a week from Monday. 782 00:54:58,420 --> 00:54:58,920 Good. 783 00:54:58,920 --> 00:55:00,238 Thanks.