1 00:00:00,780 --> 00:00:03,520 Conditional probability is an absolutely basic idea 2 00:00:03,520 --> 00:00:05,130 that we use all the time. 3 00:00:05,130 --> 00:00:08,250 It's the probability that some event occurs, 4 00:00:08,250 --> 00:00:11,160 given certain information about it. 5 00:00:11,160 --> 00:00:14,190 For example, an insurance company 6 00:00:14,190 --> 00:00:17,390 wants to know, what's the probability that you'll 7 00:00:17,390 --> 00:00:20,580 live for the next 10 years given your medical history? 8 00:00:20,580 --> 00:00:22,850 Or a typical investor wants to know, 9 00:00:22,850 --> 00:00:25,740 what's the probability that this stock is going to rise 10 00:00:25,740 --> 00:00:29,315 given its stock price gyrations for the past month? 11 00:00:29,315 --> 00:00:30,940 There are people who actually think you 12 00:00:30,940 --> 00:00:33,200 could do that, the chartists. 13 00:00:33,200 --> 00:00:35,770 That, not knowing anything about the nature of the company, 14 00:00:35,770 --> 00:00:40,280 or the business that the stock is part of, 15 00:00:40,280 --> 00:00:44,160 that just by watching the price gyration you can make a better 16 00:00:44,160 --> 00:00:46,490 guess on what the stock will do tomorrow 17 00:00:46,490 --> 00:00:47,770 than you could otherwise. 18 00:00:50,200 --> 00:00:53,610 Another good example is for a system engineer. 19 00:00:53,610 --> 00:00:56,170 What's the probability that the system is going to overload, 20 00:00:56,170 --> 00:00:58,837 given the recent history of the rate at which requests 21 00:00:58,837 --> 00:00:59,670 have been coming in? 22 00:00:59,670 --> 00:01:02,930 And finally, as a joke that I like to think about, 23 00:01:02,930 --> 00:01:04,819 is, what's the probability that you're 24 00:01:04,819 --> 00:01:06,920 a cat owner, given that you're sitting 25 00:01:06,920 --> 00:01:10,145 in the cat section of the Angell Memorial Veterinary Hospital? 26 00:01:11,170 --> 00:01:12,010 OK. 27 00:01:12,010 --> 00:01:16,866 So let's look concretely at a very simple example 28 00:01:16,866 --> 00:01:18,240 of conditional probability that's 29 00:01:18,240 --> 00:01:21,110 meant to be illustrative, where we look at a die, 30 00:01:21,110 --> 00:01:22,481 and rolling a fair die. 31 00:01:22,910 --> 00:01:23,410 OK. 32 00:01:23,410 --> 00:01:26,790 Now if I'm thinking about an ordinary fair die, 33 00:01:26,790 --> 00:01:29,170 I've got 6 outcomes that are equally likely. 34 00:01:29,170 --> 00:01:31,700 The outcomes are one, two, three, four, five, six. 35 00:01:31,700 --> 00:01:34,200 And if I ask, what's the probability that in one roll, 36 00:01:34,200 --> 00:01:35,670 I roll a one? 37 00:01:35,670 --> 00:01:39,310 Well it's going to be the number of outcomes involving 38 00:01:39,310 --> 00:01:43,830 my rolling a one, divided by the total number of outcomes. 39 00:01:43,830 --> 00:01:45,060 It's one-sixth. 40 00:01:45,060 --> 00:01:48,559 The probability of any given face of a six-sided fair die 41 00:01:48,559 --> 00:01:49,100 is one-sixth. 42 00:01:50,000 --> 00:01:53,580 But suppose I give you some additional information. 43 00:01:53,580 --> 00:01:57,110 Knowledge about the roll can change 44 00:01:57,110 --> 00:01:59,250 the judgment of probabilities. 45 00:01:59,250 --> 00:02:02,860 Suppose that I tell you that I rolled an odd number. 46 00:02:02,860 --> 00:02:06,390 And now I want to know what's the probability that I 47 00:02:06,390 --> 00:02:08,180 rolled a one? 48 00:02:08,180 --> 00:02:12,250 And the answer will now be that, given that it's an odd number, 49 00:02:12,250 --> 00:02:15,290 the only possibilities are one, three, and five. 50 00:02:15,290 --> 00:02:20,335 And so the probability has changed to one-third. 51 00:02:20,335 --> 00:02:22,210 That should be a straightforward enough idea. 52 00:02:23,746 --> 00:02:25,620 One way to understand conditional probability 53 00:02:25,620 --> 00:02:28,770 is as a kind of experiment, where first you 54 00:02:28,770 --> 00:02:32,840 try to roll an odd number and then you 55 00:02:32,840 --> 00:02:35,680 do you decide what final roll you're going to make. 56 00:02:35,680 --> 00:02:38,260 Let's look at that tree, if we were describing it that way. 57 00:02:38,760 --> 00:02:42,170 So the first branch of the tree that we'll 58 00:02:42,170 --> 00:02:45,350 use to build a probability space is 59 00:02:45,350 --> 00:02:49,255 to say, OK, among the six possible outcomes, what 60 00:02:49,255 --> 00:02:52,280 are the chances that we rolled an odd number? 61 00:02:52,280 --> 00:02:54,610 Well, it's 50/50, because there are three of each. 62 00:02:54,610 --> 00:02:57,680 So there's one-half chance that yes, you rolled an odd number. 63 00:02:57,680 --> 00:03:00,450 In which case those are the possible outcomes. 64 00:03:00,450 --> 00:03:04,130 Or a one-half chance that no, you didn't roll an odd number, 65 00:03:04,130 --> 00:03:06,450 and the possible outcomes then are two, four, and six. 66 00:03:06,970 --> 00:03:10,310 Now, once you're here with one, three, and five, 67 00:03:10,310 --> 00:03:12,550 let's ask whether you rolled a one. 68 00:03:12,550 --> 00:03:14,490 The probability that you did roll a one, 69 00:03:14,490 --> 00:03:16,570 we've already agreed is one-third. 70 00:03:16,570 --> 00:03:19,000 It's equally likely to be any one of those three outcomes. 71 00:03:19,000 --> 00:03:22,720 Which means that it's two-thirds that you wind up rolling 72 00:03:22,720 --> 00:03:24,240 either a three or a five. 73 00:03:24,240 --> 00:03:27,140 And likewise, here, the probability 74 00:03:27,140 --> 00:03:29,737 if you didn't roll an odd number, that 75 00:03:29,737 --> 00:03:32,070 is, you rolled an even number, the probability that next 76 00:03:32,070 --> 00:03:35,750 you'll roll a one is zero. 77 00:03:35,750 --> 00:03:38,930 Or that you won't roll a one is probability one. 78 00:03:39,990 --> 00:03:42,310 So this is a kind of standard way 79 00:03:42,310 --> 00:03:46,390 that we have of trying to build up a set of probabilities 80 00:03:46,390 --> 00:03:47,800 for outcomes. 81 00:03:47,800 --> 00:03:50,350 And if we look at this tree, well, first of all, 82 00:03:50,350 --> 00:03:52,180 we can use it to assign some probabilities. 83 00:03:52,180 --> 00:03:55,150 Because the probability of your rolling 84 00:03:55,150 --> 00:03:57,990 a one is one-sixth, as it should be. 85 00:03:57,990 --> 00:04:00,060 It's one-half times one-third, which 86 00:04:00,060 --> 00:04:02,814 is the usual way we would calculate the probability 87 00:04:02,814 --> 00:04:03,480 of this outcome. 88 00:04:04,000 --> 00:04:06,780 By the way, we could calculate the probability of the outcome 89 00:04:06,780 --> 00:04:07,800 being three or five. 90 00:04:07,800 --> 00:04:10,810 It would be one-half times two-thirds or one-third. 91 00:04:10,810 --> 00:04:16,820 And finally, the probability of rolling an even number 92 00:04:16,820 --> 00:04:17,960 would just be one-half. 93 00:04:17,960 --> 00:04:19,223 One-half times one. 94 00:04:19,723 --> 00:04:21,510 Now, what's going on here? 95 00:04:21,510 --> 00:04:24,695 Well, if you look at this number, one-third, 96 00:04:24,695 --> 00:04:28,240 it is what we said was the probability of a one, 97 00:04:28,240 --> 00:04:31,640 given that you rolled an odd number. 98 00:04:31,640 --> 00:04:34,140 So that's where this label came from. 99 00:04:34,140 --> 00:04:36,490 Likewise, this number, two-thirds, 100 00:04:36,490 --> 00:04:39,700 is the probability that you didn't roll a one, given 101 00:04:39,700 --> 00:04:41,620 that you rolled an odd number. 102 00:04:41,620 --> 00:04:45,000 And finally, this number is the probability 103 00:04:45,000 --> 00:04:47,100 that you didn't roll a one, given 104 00:04:47,100 --> 00:04:48,450 that you rolled an even number. 105 00:04:48,450 --> 00:04:49,380 And it's certain. 106 00:04:50,400 --> 00:04:52,590 Let's do another example to get this idea across. 107 00:04:52,590 --> 00:04:55,520 Let's go back to Monty Hall, which we've seen before. 108 00:04:55,520 --> 00:04:57,470 Remember how we have the probability 109 00:04:57,470 --> 00:05:00,710 labels on these branches, which we figured out. 110 00:05:01,210 --> 00:05:04,730 So if we look at this number, one-third, what is it? 111 00:05:04,730 --> 00:05:08,590 Well this is where the prize is at Location One 112 00:05:08,590 --> 00:05:11,260 and the contestant has picked Door One. 113 00:05:11,260 --> 00:05:13,260 And that one-third, we figured out 114 00:05:13,260 --> 00:05:15,050 that once the prize is at Door One, 115 00:05:15,050 --> 00:05:19,330 in fact, wherever the prize is, the probability 116 00:05:19,330 --> 00:05:21,420 that the contestant will pick One is one-third. 117 00:05:21,420 --> 00:05:23,900 This number, one-third, is the probability 118 00:05:23,900 --> 00:05:25,830 the contestant will pick One given 119 00:05:25,830 --> 00:05:27,940 that the prize is at Door One. 120 00:05:27,940 --> 00:05:28,440 Yeah? 121 00:05:29,110 --> 00:05:30,550 Here's another third. 122 00:05:30,550 --> 00:05:33,650 This is, similarly, the probability that the contestant 123 00:05:33,650 --> 00:05:39,300 will pick Door Two given that the prize is at Door Three. 124 00:05:39,300 --> 00:05:40,577 It's symmetric to this one. 125 00:05:40,577 --> 00:05:42,410 But here's something a little bit different. 126 00:05:42,410 --> 00:05:43,520 Here's one-half. 127 00:05:43,520 --> 00:05:47,170 This is the probability that Door Three 128 00:05:47,170 --> 00:05:51,860 will be opened by Carol given that the prize is 129 00:05:51,860 --> 00:05:55,381 at One-- that's that branch-- and the contestant picked One. 130 00:05:55,880 --> 00:05:59,080 And when the prize is at One and the contestant picks One, 131 00:05:59,080 --> 00:06:01,870 Carol, we said in our model, is equally likely 132 00:06:01,870 --> 00:06:06,000 to open the two possible doors that have goats 133 00:06:06,000 --> 00:06:07,190 that she's able to open. 134 00:06:07,689 --> 00:06:11,570 And so that's one-half, is this conditional probability. 135 00:06:11,570 --> 00:06:13,470 The probability that she'll open Door Three 136 00:06:13,470 --> 00:06:17,300 given that we're in this location in the tree. 137 00:06:17,300 --> 00:06:19,930 Given that the prize is at One and pick is at One. 138 00:06:20,430 --> 00:06:22,690 So the point is simply that we were reasoning 139 00:06:22,690 --> 00:06:26,810 about conditional probability in the very way we began defining 140 00:06:26,810 --> 00:06:29,480 the tree model that we were using to define probability 141 00:06:29,480 --> 00:06:31,270 spaces in the first place. 142 00:06:31,270 --> 00:06:33,970 We were implicitly using conditional probabilities 143 00:06:33,970 --> 00:06:38,060 to label the probabilities that left each vertex of the tree. 144 00:06:39,497 --> 00:06:41,580 And in fact, formally speaking, what we were using 145 00:06:41,580 --> 00:06:42,540 was the product rule. 146 00:06:42,540 --> 00:06:47,870 Which is that, the probability that an A event occurs and a B 147 00:06:47,870 --> 00:06:50,730 event occurs is simply the probability 148 00:06:50,730 --> 00:06:53,480 that the A event-- that's the first branch of the tree-- 149 00:06:53,480 --> 00:06:59,090 times the probability of B given A. That's 150 00:06:59,090 --> 00:07:01,420 the fundamental rule of conditional probabilities. 151 00:07:01,420 --> 00:07:04,950 That's the product rule, and it's something to be memorized. 152 00:07:05,820 --> 00:07:10,060 In fact, this product rule is not a corollary. 153 00:07:10,060 --> 00:07:14,320 It's really the definition of conditional probability. 154 00:07:14,320 --> 00:07:15,939 So all of the previous discussion 155 00:07:15,939 --> 00:07:17,730 was motivation of the following definition. 156 00:07:18,460 --> 00:07:21,510 If A and B are events in a probability space, 157 00:07:21,510 --> 00:07:25,340 the probability of B, given A, is 158 00:07:25,340 --> 00:07:29,790 defined to be the probability that A and B occur-- 159 00:07:29,790 --> 00:07:33,460 that is, A intersection B-- relative to the probability 160 00:07:33,460 --> 00:07:34,010 of A. 161 00:07:35,060 --> 00:07:36,730 So that's the formal definition. 162 00:07:36,730 --> 00:07:40,000 So this formal definition justifies the product rule 163 00:07:40,000 --> 00:07:40,774 by definition. 164 00:07:40,774 --> 00:07:43,315 Because you just multiply both sides by the probability of A. 165 00:07:43,315 --> 00:07:46,080 And you get probability of A times the probability of B, 166 00:07:46,080 --> 00:07:48,320 given A is the probability of the intersection. 167 00:07:48,320 --> 00:07:50,330 Notice that implicit in this definition 168 00:07:50,330 --> 00:07:52,540 is, the probability of A better not be zero. 169 00:07:52,540 --> 00:07:54,600 So you can't condition on an event 170 00:07:54,600 --> 00:07:55,910 that has zero probability. 171 00:07:55,910 --> 00:07:58,950 Probability of B given A is only defined if probability of A 172 00:07:58,950 --> 00:07:59,520 is positive. 173 00:08:01,580 --> 00:08:03,960 OK. 174 00:08:03,960 --> 00:08:06,450 If you have a tree that's of depth three, 175 00:08:06,450 --> 00:08:09,920 then you need a product rule for three consecutive choices. 176 00:08:09,920 --> 00:08:12,160 And it generalizes in a straightforward way. 177 00:08:12,660 --> 00:08:16,322 Namely, the probability of A and B and C-- the first branch is A 178 00:08:16,322 --> 00:08:18,280 and the second branch is B and the third branch 179 00:08:18,280 --> 00:08:22,630 is C-- is the probability that you do A on the first branch, 180 00:08:22,630 --> 00:08:24,300 times the probability that you do 181 00:08:24,300 --> 00:08:26,620 B on the second branch, given that you did 182 00:08:26,620 --> 00:08:29,600 A on the first branch, and times the probability 183 00:08:29,600 --> 00:08:31,440 that you do C on the third branch, 184 00:08:31,440 --> 00:08:34,730 given that you did A on the first and B on the second. 185 00:08:34,730 --> 00:08:37,350 And this product rule for three could, in fact, 186 00:08:37,350 --> 00:08:41,409 be proved simply by substitution, using the product 187 00:08:41,409 --> 00:08:43,470 rule for two twice. 188 00:08:43,470 --> 00:08:46,760 And of course it generalizes to any finite number of sets. 189 00:08:51,350 --> 00:08:53,830 Another useful way to think about probability, 190 00:08:53,830 --> 00:08:59,280 that may be a more intuitive than the idea of choosing 191 00:08:59,280 --> 00:09:03,180 to whether or not to roll odd and then choosing 192 00:09:03,180 --> 00:09:05,672 whether or not to roll a one, usually what you think of is, 193 00:09:05,672 --> 00:09:07,130 you rolled the dice and then you're 194 00:09:07,130 --> 00:09:10,390 giving me some information about what that roll was. 195 00:09:10,390 --> 00:09:14,370 I don't think about the odds of rolling odd or not. 196 00:09:14,370 --> 00:09:16,890 I just tell you it's odd, and now 197 00:09:16,890 --> 00:09:20,630 tell me what is the probability that, among those odd outcomes, 198 00:09:20,630 --> 00:09:21,380 it was one. 199 00:09:22,050 --> 00:09:24,100 So a way to formalize that is, you 200 00:09:24,100 --> 00:09:26,295 could think of conditioning on an event A 201 00:09:26,295 --> 00:09:30,570 as defining a new probability function on the sample space. 202 00:09:30,570 --> 00:09:32,880 Once you're given that A occurred, 203 00:09:32,880 --> 00:09:36,202 I can now think that all the probabilities of the sample 204 00:09:36,202 --> 00:09:37,410 of the outcomes have changed. 205 00:09:37,970 --> 00:09:40,830 So I'll define a new probability measure 206 00:09:40,830 --> 00:09:44,300 relative to A, where all the outcomes that are not in A 207 00:09:44,300 --> 00:09:46,080 are going to be assigned probability zero. 208 00:09:46,080 --> 00:09:48,640 Because they can't happen given that A occurred. 209 00:09:48,640 --> 00:09:54,600 And all of the probabilities of outcomes of points in A, 210 00:09:54,600 --> 00:09:58,060 they get their probability raised in proportion to A. 211 00:09:58,060 --> 00:10:00,711 Because now A is going to be whole probability space. 212 00:10:01,075 --> 00:10:02,950 Let's be a little bit more formal about that, 213 00:10:02,950 --> 00:10:03,920 to be precise. 214 00:10:03,920 --> 00:10:06,550 We're going to define a new probability function. 215 00:10:06,550 --> 00:10:09,800 Probability sub A, on the same sample space, 216 00:10:09,800 --> 00:10:11,610 where the probability of an outcome 217 00:10:11,610 --> 00:10:14,610 is zero if the outcome is not in A, 218 00:10:14,610 --> 00:10:19,290 and its old probability relativized to the probability 219 00:10:19,290 --> 00:10:21,630 of A, if omega is in A. 220 00:10:21,630 --> 00:10:24,360 So that's the definition of the probability 221 00:10:24,360 --> 00:10:26,610 with respect to A of omega. 222 00:10:26,610 --> 00:10:29,840 It's a new probability measure on the same sample space. 223 00:10:29,840 --> 00:10:32,810 So to verify that this new thing is a probability space, 224 00:10:32,810 --> 00:10:36,350 you have to verify that the sum of the outcome probabilities 225 00:10:36,350 --> 00:10:37,130 is one. 226 00:10:37,130 --> 00:10:38,850 And that's a little exercise that I 227 00:10:38,850 --> 00:10:41,600 would encourage you to stop now and work out a piece of paper. 228 00:10:41,600 --> 00:10:44,307 Because it's trivial, but it's worth 229 00:10:44,307 --> 00:10:46,015 checking that you follow the definitions. 230 00:10:46,515 --> 00:10:49,710 The claim is simply that this new measure, probability 231 00:10:49,710 --> 00:10:50,410 sub A-- 232 00:10:50,410 --> 00:11:27,120 [NO SOUND] 233 00:11:27,120 --> 00:11:30,620 --will satisfy all of the rules of probability. 234 00:11:30,620 --> 00:11:32,350 Because it is a probability measure. 235 00:11:32,850 --> 00:11:36,030 So for example, I have the difference rule restated 236 00:11:36,030 --> 00:11:37,280 for conditional probabilities. 237 00:11:37,280 --> 00:11:41,300 Given that probability sub A is a probability measure, 238 00:11:41,300 --> 00:11:43,170 it satisfies the difference rule. 239 00:11:43,170 --> 00:11:46,140 Which means when I translate it into a conditional probability 240 00:11:46,140 --> 00:11:50,480 statement, I get that the probability of B minus C, 241 00:11:50,480 --> 00:11:53,360 given A, is equal to the probability of B, 242 00:11:53,360 --> 00:11:58,060 given A, minus the probability of B intersection C, given A. 243 00:11:58,560 --> 00:12:00,890 It's exactly the same as the standard difference rule, 244 00:12:00,890 --> 00:12:05,120 except that I have it made everything conditioned on A. 245 00:12:05,120 --> 00:12:09,160 And so we automatically get all of these rules 246 00:12:09,160 --> 00:12:12,430 for conditional probability that we had holding for probability. 247 00:12:12,430 --> 00:12:13,630 Which will be helpful. 248 00:12:13,630 --> 00:12:16,760 We won't have to think about proving them again.