1 00:00:01,010 --> 00:00:03,190 ALBERT MEYER: Today's topic is random variables. 2 00:00:03,190 --> 00:00:05,950 Random variables are an absolutely fundamental concept 3 00:00:05,950 --> 00:00:06,950 in probability theory. 4 00:00:06,950 --> 00:00:09,750 But before we get into officially defining them, 5 00:00:09,750 --> 00:00:11,270 let's start off with an example that 6 00:00:11,270 --> 00:00:15,156 in fact, is a game because that's a fun way to start. 7 00:00:15,156 --> 00:00:17,030 So we're going to play the bigger number game 8 00:00:17,030 --> 00:00:18,480 and here's how it works. 9 00:00:18,480 --> 00:00:22,410 There are two teams, and Team 1 has 10 00:00:22,410 --> 00:00:27,020 the task of picking two different integers between 0 11 00:00:27,020 --> 00:00:29,819 and 7 inclusive, and they write one integer 12 00:00:29,819 --> 00:00:31,610 on one piece of paper and the other integer 13 00:00:31,610 --> 00:00:33,410 on the other piece of paper. 14 00:00:33,410 --> 00:00:35,470 They turn the two pieces of paper 15 00:00:35,470 --> 00:00:37,820 face down so the numbers are not visible, 16 00:00:37,820 --> 00:00:41,480 and the other team then sees these two pieces of paper 17 00:00:41,480 --> 00:00:44,430 whose other side has different numbers written 18 00:00:44,430 --> 00:00:46,350 on them sitting on the table. 19 00:00:46,350 --> 00:00:50,752 What Team 2 then does is picks one of the pieces of paper 20 00:00:50,752 --> 00:00:53,490 and turns it over and looks at the number on it. 21 00:00:53,490 --> 00:00:55,350 And then, based on what that number 22 00:00:55,350 --> 00:00:58,420 is, they make a decision, stick with the number 23 00:00:58,420 --> 00:01:02,480 they have or switch to the other unknown number on the face 24 00:01:02,480 --> 00:01:03,780 down piece of paper. 25 00:01:03,780 --> 00:01:05,560 And that'll be their final number. 
26 00:01:05,560 --> 00:01:08,400 And the game is that Team 2 wins if they wind up 27 00:01:08,400 --> 00:01:10,160 with the larger number. 28 00:01:10,160 --> 00:01:12,285 So they're going to look at the number on the paper 29 00:01:12,285 --> 00:01:13,701 that they expose and they're going 30 00:01:13,701 --> 00:01:15,090 to try to decide whether it looks 31 00:01:15,090 --> 00:01:17,182 like a big number or a little number. 32 00:01:17,182 --> 00:01:19,390 If it looks like a big number, they'll stick with it. 33 00:01:19,390 --> 00:01:21,150 If it looks like a little number, 34 00:01:21,150 --> 00:01:25,260 they'll switch to the other one that they hope is larger. 35 00:01:25,260 --> 00:01:29,269 So which team do you think has an advantage here? 36 00:01:29,269 --> 00:01:31,060 Of course, if you've read the notes, you know. 37 00:01:31,060 --> 00:01:33,290 But if you haven't been exposed to this before, 38 00:01:33,290 --> 00:01:35,580 it's not really so obvious. 39 00:01:35,580 --> 00:01:38,330 And what we encourage, and what we 40 00:01:38,330 --> 00:01:41,270 used to do when we ran this in real time in classes, 41 00:01:41,270 --> 00:01:44,370 is that we would have students in teams split 42 00:01:44,370 --> 00:01:46,160 their team in half, one would be Team 1 43 00:01:46,160 --> 00:01:47,868 and the other would be Team 2, and they'd 44 00:01:47,868 --> 00:01:50,730 play the game a few times to see if they could figure out 45 00:01:50,730 --> 00:01:53,100 who had the advantage. 46 00:01:53,100 --> 00:01:54,990 And if you have the opportunity, this 47 00:01:54,990 --> 00:01:57,470 might be a good moment to stop this video 48 00:01:57,470 --> 00:02:00,410 and try playing the game with some friends if they're around. 49 00:02:00,410 --> 00:02:04,170 Otherwise, let's just proceed and see how it all works. 50 00:02:04,170 --> 00:02:09,770 So this is the strategy Team 2 is going to adopt.
51 00:02:09,770 --> 00:02:12,070 They're going to take this idea about big and small 52 00:02:12,070 --> 00:02:16,640 that I mentioned and act on it in a methodical way. 53 00:02:16,640 --> 00:02:20,190 So they're going to pick a paper to expose, giving 54 00:02:20,190 --> 00:02:22,390 each paper equal probability. 55 00:02:22,390 --> 00:02:24,490 So that guarantees that they have 56 00:02:24,490 --> 00:02:28,250 a 50/50 chance of picking the big number and a 50/50 chance 57 00:02:28,250 --> 00:02:29,470 of picking the little number. 58 00:02:29,470 --> 00:02:34,529 Whatever ingenuity Team 1 tried to do on which piece of paper 59 00:02:34,529 --> 00:02:36,320 was on the left and which was on the right, 60 00:02:36,320 --> 00:02:39,970 it doesn't really matter if Team 2 simply picks a piece of paper 61 00:02:39,970 --> 00:02:40,580 at random. 62 00:02:40,580 --> 00:02:42,510 There's no way that Team 1 can try 63 00:02:42,510 --> 00:02:46,080 to fake out Team 2 on where they put the number. 64 00:02:46,080 --> 00:02:47,600 OK. 65 00:02:47,600 --> 00:02:49,950 The next step is that Team 2 is going 66 00:02:49,950 --> 00:02:51,910 to decide whether the number that they can see, 67 00:02:51,910 --> 00:02:54,190 the exposed number, is small. 68 00:02:54,190 --> 00:02:55,620 And if so, would they switch? 69 00:02:55,620 --> 00:02:56,810 And otherwise they stick. 70 00:02:56,810 --> 00:03:01,110 So that is, they're going to define some threshold Z where 71 00:03:01,110 --> 00:03:03,410 being less than or equal to Z means small, 72 00:03:03,410 --> 00:03:05,960 and being greater than Z means large. 73 00:03:05,960 --> 00:03:08,680 The question is, how do they choose Z? 74 00:03:08,680 --> 00:03:10,660 Well, a naive thing to do would be 75 00:03:10,660 --> 00:03:14,870 to choose Z to be in the middle of the interval from 0 to 7. 76 00:03:14,870 --> 00:03:19,120 Let's say, you choose Z equals 3. 
77 00:03:19,120 --> 00:03:25,590 So there would be four numbers less than or equal to Z 78 00:03:25,590 --> 00:03:28,320 and four numbers greater than Z. But of course, 79 00:03:28,320 --> 00:03:32,410 as soon as Team 1 knew that that was your Z, what would they do? 80 00:03:32,410 --> 00:03:36,440 Well, they would make sure that both numbers were 81 00:03:36,440 --> 00:03:39,260 on the same side of Z. If your Z was 3, 82 00:03:39,260 --> 00:03:43,650 they would always choose their numbers to be, say, 0 and 1. 83 00:03:43,650 --> 00:03:46,690 And that way, when you were switching, 84 00:03:46,690 --> 00:03:49,090 your Z would tell you that you had a small number, 85 00:03:49,090 --> 00:03:50,710 you should switch to the other one. 86 00:03:50,710 --> 00:03:53,540 And you'd only have a 50/50 chance of winning. 87 00:03:53,540 --> 00:03:57,750 So if you fixed that value of Z, Team 1 88 00:03:57,750 --> 00:04:01,020 has a way of ensuring that you have no advantage. 89 00:04:01,020 --> 00:04:03,630 You can only win with probability 50/50. 90 00:04:03,630 --> 00:04:05,530 And that's true no matter what Z you take. 91 00:04:05,530 --> 00:04:08,217 If Team 1 knew what your Z was, they 92 00:04:08,217 --> 00:04:09,800 would just make sure to pick their two 93 00:04:09,800 --> 00:04:12,880 numbers on the same side of your Z. 94 00:04:12,880 --> 00:04:15,840 And then your Z wouldn't really tell you anything. 95 00:04:15,840 --> 00:04:18,709 You'd switch or stick in both cases, 96 00:04:18,709 --> 00:04:20,510 and you'd only have a 50/50 chance 97 00:04:20,510 --> 00:04:21,930 of picking the right number. 98 00:04:21,930 --> 00:04:25,190 So what you do-- and this is where probability comes in-- 99 00:04:25,190 --> 00:04:28,450 is you pick Z in a way that can't be predicted or made 100 00:04:28,450 --> 00:04:30,040 use of by Team 1.
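The weakness of a fixed threshold can be checked with a few lines of Python. This is only a sketch of the rule as described so far (switch when the exposed number is at most Z, otherwise stick); the function names `team2_wins` and `win_probability_fixed_z` are made up for illustration and are not from the lecture.

```python
from fractions import Fraction

def team2_wins(low, high, z, exposed_low):
    """Return True if Team 2 ends up holding the larger number.

    exposed_low is True when Team 2 turned over the paper holding `low`.
    Rule assumed from the lecture: switch if the exposed number is <= z.
    """
    exposed, hidden = (low, high) if exposed_low else (high, low)
    final = hidden if exposed <= z else exposed
    return final == high

def win_probability_fixed_z(low, high, z):
    """Exact win probability for a fixed z when the paper is picked 50/50."""
    wins = sum(team2_wins(low, high, z, e) for e in (True, False))
    return Fraction(wins, 2)

# Team 1 knows z = 3 and hides both numbers on the same side of it.
print(win_probability_fixed_z(0, 1, 3))
# For comparison, a pair that the threshold happens to separate.
print(win_probability_fixed_z(2, 5, 3))
```

With both numbers on the same side of the threshold, the rule gives the same action whichever paper is exposed, so the only randomness left is the 50/50 choice of paper.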
101 00:04:30,040 --> 00:04:35,230 You pick Z at random, to be any number from 0 to 7, 102 00:04:35,230 --> 00:04:36,800 including 0 but not including 7. 103 00:04:36,800 --> 00:04:40,080 That is, your number is either 0, 1, 2, up through 6. 104 00:04:40,080 --> 00:04:43,130 And being less than or equal to Z means small, 105 00:04:43,130 --> 00:04:46,460 and being greater than Z means large. 106 00:04:46,460 --> 00:04:48,670 And when you see a small number, you'll switch 107 00:04:48,670 --> 00:04:52,320 and when you see a large number, you'll stick. 108 00:04:52,320 --> 00:04:55,310 But what's going to be large and what's going to be small 109 00:04:55,310 --> 00:04:59,920 is going to vary each time you play the game, depending 110 00:04:59,920 --> 00:05:05,510 on what the random number, Z, comes out to be. 111 00:05:05,510 --> 00:05:08,420 So let's analyze your probability if you're Team 2. 112 00:05:08,420 --> 00:05:11,726 What's the probability that you're going to win now? 113 00:05:11,726 --> 00:05:16,399 Well, let's suppose that Team 1 picks these two numbers. 114 00:05:16,399 --> 00:05:17,940 We don't know what they are, but they 115 00:05:17,940 --> 00:05:20,451 have to pick a low number that's less than a high number. 116 00:05:20,451 --> 00:05:22,200 So these two numbers are at least 1 apart, 117 00:05:22,200 --> 00:05:24,890 they can't have the same number on both pieces of paper. 118 00:05:24,890 --> 00:05:29,127 Otherwise, it's clear that you are not 119 00:05:29,127 --> 00:05:30,960 going to be able to pick the large one, that 120 00:05:30,960 --> 00:05:31,780 would be cheating. 121 00:05:31,780 --> 00:05:33,322 OK, so there's two different numbers. 122 00:05:33,322 --> 00:05:35,196 So one of them has to be less than the other. 123 00:05:35,196 --> 00:05:37,340 We don't know how much less, might be a lot less, 124 00:05:37,340 --> 00:05:39,990 might be only one less, but low is less than high.
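Before working through the analysis, the randomized strategy can be tried out by simulation. The following is a rough sketch, not part of the lecture: `play_round` and `estimate_win_rate` are hypothetical names, and the trial count and seed are arbitrary choices. It draws Z uniformly from 0 through 6, exposes one of the two papers with equal probability, and applies the switch-if-small rule.

```python
import random

def play_round(low, high, rng):
    """Play one round of Team 2's randomized strategy; True means Team 2 wins."""
    z = rng.randrange(7)  # threshold Z, uniform on 0, 1, ..., 6
    # Expose one of the two papers with equal probability.
    exposed, hidden = rng.choice([(low, high), (high, low)])
    # Switch when the exposed number looks small (<= z), otherwise stick.
    final = hidden if exposed <= z else exposed
    return final == high

def estimate_win_rate(low, high, trials=200_000, seed=0):
    """Estimate Team 2's win rate against a fixed pair chosen by Team 1."""
    rng = random.Random(seed)
    wins = sum(play_round(low, high, rng) for _ in range(trials))
    return wins / trials

# Adjacent numbers are the hardest pair for Team 2 to tell apart.
print(estimate_win_rate(3, 4))
```

Calling `estimate_win_rate` with different pairs gives an empirical feel for how the gap between the two numbers affects Team 2's chances.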
125 00:05:39,990 --> 00:05:43,730 OK, now we can consider three cases of what 126 00:05:43,730 --> 00:05:45,930 happens with your strategy. 127 00:05:45,930 --> 00:05:48,730 The most interesting case is the middle case. 128 00:05:48,730 --> 00:05:53,100 That is, when your Z, which was chosen at random, 129 00:05:53,100 --> 00:05:56,150 happens to fall in the interval between low and high. 130 00:05:56,150 --> 00:05:59,240 That is, your Z is strictly less than high and greater than 131 00:05:59,240 --> 00:06:00,630 or equal to low. 132 00:06:00,630 --> 00:06:04,510 And then in that case, your Z is really guiding you correctly 133 00:06:04,510 --> 00:06:05,560 on what to do. 134 00:06:05,560 --> 00:06:08,350 If you turn over the low card, then it's 135 00:06:08,350 --> 00:06:10,790 going to look low because it's less than or equal to Z 136 00:06:10,790 --> 00:06:12,880 so you'll switch to the high card and win. 137 00:06:12,880 --> 00:06:15,690 If you turn over the high card, it's 138 00:06:15,690 --> 00:06:17,820 going to be greater than Z so it'll look high 139 00:06:17,820 --> 00:06:19,910 and you'll know to stick with it. 140 00:06:19,910 --> 00:06:23,290 So in this case, you're guaranteed to win. 141 00:06:23,290 --> 00:06:25,150 If you were lucky enough to guess 142 00:06:25,150 --> 00:06:28,390 the right threshold between low and high, you're going to win. 143 00:06:28,390 --> 00:06:31,190 And so the probability that you win, 144 00:06:31,190 --> 00:06:33,850 given the middle case occurs, is 1. 145 00:06:33,850 --> 00:06:35,280 Now, what about the middle case? 146 00:06:35,280 --> 00:06:37,070 How often does that happen? 147 00:06:37,070 --> 00:06:41,310 Well, the difference between low and high is at least 1, 148 00:06:41,310 --> 00:06:44,650 so there's guaranteed to be 1 chance in 7 149 00:06:44,650 --> 00:06:48,110 that your Z is going to fall between them. 
150 00:06:48,110 --> 00:06:51,550 And it could be more if low and high are further apart, 151 00:06:51,550 --> 00:06:53,530 but as long as they're at least one apart, 152 00:06:53,530 --> 00:06:57,200 there's at least a 1/7 chance that you're going to fall in between them. 153 00:06:57,200 --> 00:06:58,670 OK. 154 00:06:58,670 --> 00:07:02,520 Now, in case H, that's the case where 155 00:07:02,520 --> 00:07:04,920 Z happens to be chosen greater than 156 00:07:04,920 --> 00:07:07,490 or equal to the high number that Team 1 chose. 157 00:07:07,490 --> 00:07:10,480 In other words, Z is at least as big as both numbers that Team 1 158 00:07:10,480 --> 00:07:12,890 chose and put on the pieces of paper. 159 00:07:12,890 --> 00:07:16,100 Well, in that case, Z just isn't telling you anything. 160 00:07:16,100 --> 00:07:18,480 So what's going to happen is that both numbers are going 161 00:07:18,480 --> 00:07:21,812 to look high to you-- sorry-- both numbers 162 00:07:21,812 --> 00:07:24,270 are going to look low to you because they're both less than 163 00:07:24,270 --> 00:07:26,570 or equal to Z. So you'll switch. 164 00:07:26,570 --> 00:07:30,520 And that means that you'll win, if and only 165 00:07:30,520 --> 00:07:34,810 if, you happen to turn the low card over first. 166 00:07:34,810 --> 00:07:36,540 Well that was 50/50. 167 00:07:36,540 --> 00:07:39,570 So the probability that you win, given 168 00:07:39,570 --> 00:07:45,390 that Z-- both cards are on the low side of Z, 169 00:07:45,390 --> 00:07:46,920 you'll win half the time. 170 00:07:46,920 --> 00:07:50,150 And symmetrically, if Z is less than the low card, that 171 00:07:50,150 --> 00:07:52,900 is, Z is less than both cards chosen by Team 1, 172 00:07:52,900 --> 00:07:56,400 then they're both going to look high, and so you'll stick. 173 00:07:56,400 --> 00:07:59,677 And that means that you'll stick, you'll win, if and only 174 00:07:59,677 --> 00:08:01,510 if, you happen to have picked the high card.
175 00:08:01,510 --> 00:08:03,320 There's a 50/50 chance of that. 176 00:08:03,320 --> 00:08:09,130 So again, in this case that Z makes both cards look high, 177 00:08:09,130 --> 00:08:14,900 or Z itself is low, Team 2, you win with probability 1/2. 178 00:08:14,900 --> 00:08:19,010 Well, that's great because now we can apply total probability. 179 00:08:19,010 --> 00:08:26,510 And what total probability tells us is that the probability that Team 2 wins 180 00:08:26,510 --> 00:08:29,120 is the probability that they win given case 181 00:08:29,120 --> 00:08:32,280 M times the probability of M plus the probability 182 00:08:32,280 --> 00:08:34,960 that they win given not the middle case 183 00:08:34,960 --> 00:08:37,860 times the probability of not the middle case. 184 00:08:37,860 --> 00:08:39,580 But we figured out what these were. 185 00:08:39,580 --> 00:08:41,870 Well, at least inequalities on them, 186 00:08:41,870 --> 00:08:46,010 because there's probability 1 that you'll 187 00:08:46,010 --> 00:08:47,990 win at least 1/7 of the time. 188 00:08:47,990 --> 00:08:50,880 And there's probability 1/2 that you'll 189 00:08:50,880 --> 00:08:54,900 win the rest of the time, at most the other 6/7 of the time. 190 00:08:54,900 --> 00:08:57,590 You're going to win at least 4/7 of the time. 191 00:08:57,590 --> 00:09:02,050 The probability that you win playing your strategy is at least 4/7. 192 00:09:02,050 --> 00:09:04,030 It's better than 50/50. 193 00:09:04,030 --> 00:09:06,050 You have an advantage. 194 00:09:06,050 --> 00:09:09,430 And whether that was a priori obvious or not, I don't know. 195 00:09:09,430 --> 00:09:11,270 But I think it's kind of cool. 196 00:09:11,270 --> 00:09:14,770 OK, you win with probability at least 4/7. 197 00:09:14,770 --> 00:09:17,540 Now, Team 2 has the advantage. 198 00:09:17,540 --> 00:09:19,430 And the important thing to understand 199 00:09:19,430 --> 00:09:22,280 is it does not matter what Team 1 does.
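The total probability argument can be written out as a short worked calculation. The notation here is introduced only for this sketch and is not the lecture's: M is the middle case, and p stands for Pr[M], which equals (high - low)/7 because Z takes each of the values low, ..., high - 1 with probability 1/7.

```latex
\begin{align*}
\Pr[\text{win}]
  &= \Pr[\text{win} \mid M]\,\Pr[M]
   + \Pr[\text{win} \mid \overline{M}]\,\Pr[\overline{M}] \\
  &= 1 \cdot p + \tfrac{1}{2}\,(1 - p)
   = \tfrac{1}{2} + \tfrac{p}{2},
  \qquad p = \frac{\text{high} - \text{low}}{7} \ge \frac{1}{7}, \\
\Pr[\text{win}]
  &\ge \tfrac{1}{2} + \tfrac{1}{14} = \tfrac{4}{7}.
\end{align*}
```

Equality holds exactly when the two numbers are adjacent, and at the other extreme the pair 0 and 7 gives p = 1, so Team 2 wins with certainty.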
200 00:09:22,280 --> 00:09:26,000 No matter how smart Team 1 is, Team 2 201 00:09:26,000 --> 00:09:27,780 has gotten control of the situation 202 00:09:27,780 --> 00:09:30,530 because they picked-- which piece of paper 203 00:09:30,530 --> 00:09:33,210 they picked at random 50/50. 204 00:09:33,210 --> 00:09:36,000 So it doesn't matter what strategy Team 1 used 205 00:09:36,000 --> 00:09:37,700 on where they placed the numbers. 206 00:09:37,700 --> 00:09:41,570 And they chose Z randomly, so again, it 207 00:09:41,570 --> 00:09:44,310 doesn't matter what numbers Team 1 chose. 208 00:09:44,310 --> 00:09:49,540 Team 2 is still going to have their 1/7 chance of coming out 209 00:09:49,540 --> 00:09:53,980 ahead, which is enough to tip the balance in their favor. 210 00:09:53,980 --> 00:09:56,990 It's interesting that symmetrically, Team 1 also 211 00:09:56,990 --> 00:09:59,390 has a random strategy that they can use, 212 00:09:59,390 --> 00:10:04,070 which guarantees that no matter what Team 2 does, Team 2 213 00:10:04,070 --> 00:10:06,950 wins with probability at most 4/7. 214 00:10:06,950 --> 00:10:10,130 So Team 1 can force the probability 215 00:10:10,130 --> 00:10:15,800 that Team 2 wins to be at most 4/7, and Team 2 can force it to be at least 4/7. 216 00:10:15,800 --> 00:10:19,740 So if they both play optimally, it's going to stay at 4/7. 217 00:10:19,740 --> 00:10:22,780 And that's again, true no matter what Team 2 does, 218 00:10:22,780 --> 00:10:25,930 Team 1 can put this upper bound of 4/7 on it. 219 00:10:25,930 --> 00:10:28,310 So essentially we can say that the value 220 00:10:28,310 --> 00:10:31,170 of this game, the probability that Team 2 wins 221 00:10:31,170 --> 00:10:34,650 when both teams play optimally, is 4/7. 222 00:10:34,650 --> 00:10:38,220 OK, now what has this game got to do with anything, 223 00:10:38,220 --> 00:10:40,600 with our general topic of random variables? 224 00:10:40,600 --> 00:10:42,970 Well, we'll be formal in a moment.
225 00:10:42,970 --> 00:10:44,890 But informally, a random variable 226 00:10:44,890 --> 00:10:49,460 is simply a number that's produced by a random process. 227 00:10:49,460 --> 00:10:52,000 And just to give an example before we come up 228 00:10:52,000 --> 00:10:55,530 with a formal definition, the threshold variable Z 229 00:10:55,530 --> 00:11:01,930 was a thing that took a value from 0 to 6 inclusive, 230 00:11:01,930 --> 00:11:03,600 each with probability 1/7. 231 00:11:03,600 --> 00:11:07,970 So it was producing a number by a random process, 232 00:11:07,970 --> 00:11:11,880 that chose a number at random with equal probability. 233 00:11:11,880 --> 00:11:22,000 If Team 2 plays properly, picking at random which 234 00:11:22,000 --> 00:11:25,770 piece of paper to expose, then the number of the exposed card, 235 00:11:25,770 --> 00:11:29,760 or more precisely, whether the exposed card is high or low, 236 00:11:29,760 --> 00:11:33,000 will also be a random variable. 237 00:11:33,000 --> 00:11:37,740 And if Team 1 plays optimally, the number on the exposed card 238 00:11:37,740 --> 00:11:40,320 is going to be a random variable. 239 00:11:40,320 --> 00:11:42,350 That is, Team 1 in their optimal strategy that 240 00:11:42,350 --> 00:11:46,310 puts an upper bound of 4/7 is in fact, going to choose the two 241 00:11:46,310 --> 00:11:47,230 numbers randomly. 242 00:11:47,230 --> 00:11:49,040 So the exposed card is going to wind up 243 00:11:49,040 --> 00:11:51,930 being another random variable, a number produced 244 00:11:51,930 --> 00:11:53,270 by the random process. 245 00:11:53,270 --> 00:11:55,800 And likewise, the number of the larger card, 246 00:11:55,800 --> 00:12:00,070 if Team 1 picks its larger and smaller cards randomly, 247 00:12:00,070 --> 00:12:02,560 is going to be another example of a number produced 248 00:12:02,560 --> 00:12:04,256 by a random process. 249 00:12:04,256 --> 00:12:06,130 And likewise, the number of the smaller card.
250 00:12:06,130 --> 00:12:07,230 So that's enough examples. 251 00:12:07,230 --> 00:12:09,030 This little game has a whole bunch 252 00:12:09,030 --> 00:12:10,940 of random variables appearing in it. 253 00:12:10,940 --> 00:12:13,480 And in the next segment, we will look 254 00:12:13,480 --> 00:12:15,810 again officially, what is the definition 255 00:12:15,810 --> 00:12:18,030 of a random variable?