In the next section we're going to start our discussion of how to actually engineer the bit encodings we'll use in our circuitry, but first we'll need a way to evaluate the efficacy of an encoding.

The entropy of a random variable is the average amount of information received when learning the value of the random variable. The mathematician's name for "average" is "expected value"; that's what the capital E means. We compute the average in the obvious way: we take the weighted sum, where the amount of information received when learning of a particular choice i -- that's the log-base-2 of 1/p_i -- is weighted by the probability of that choice actually happening. Written out, H(X) is the sum over all choices i of p_i times log2(1/p_i).

Here's an example. We have a random variable that can take on one of four values: A, B, C or D. The probabilities of each choice are shown in the table, along with the associated information content. Now we'll compute the entropy using the probabilities and information content. So we have the probability of A (1/3) times its associated information content (1.58 bits), plus the probability of B times its associated information content, and so on. The result is 1.626 bits. This is telling us that a clever encoding scheme should be able to do better than simply encoding each symbol using 2 bits to represent which of the four possible values is next. Food for thought! We'll discuss this further in the third section of this chapter.

So, what is the entropy telling us? Suppose we have a sequence of data describing a sequence of values of the random variable X. If, on the average, we use fewer than H(X) bits to transmit each piece of information in the sequence, we will not be sending enough information to resolve the uncertainty about the values. In other words, the entropy is a lower bound on the number of bits we need to transmit. Sending fewer than this number of bits wouldn't be good if the goal was to unambiguously describe the sequence of values -- we'd have failed at our job! On the other hand, if we send, on the average, more than H(X) bits to describe the sequence of values, we will not be making the most effective use of our resources, since the same information might have been represented with fewer bits. This is okay, but perhaps with some insights we could do better. Finally, if we send, on the average, exactly H(X) bits, then we'd have the perfect encoding.
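As a quick sanity check on that arithmetic, here's a minimal Python sketch that recomputes the entropy of the four-value example. Only A's probability (1/3) and the final answer (1.626 bits) are given above; the probabilities assumed below for B, C, and D are illustrative values chosen to be consistent with that answer.

```python
from math import log2

# Assumed distribution for the four-value example: only p(A) = 1/3 is given
# above; the values for B, C, and D are illustrative assumptions consistent
# with the stated entropy of 1.626 bits.
p = {"A": 1/3, "B": 1/2, "C": 1/12, "D": 1/12}

# Information content of each choice: I(x) = log2(1/p(x)) bits.
info = {x: log2(1 / px) for x, px in p.items()}

# Entropy H(X): the expected information content, i.e. the sum of p(x) * I(x).
H = sum(px * info[x] for x, px in p.items())

for x in p:
    print(f"p({x}) = {p[x]:.4f}   I({x}) = {info[x]:.3f} bits")
print(f"H(X) = {H:.3f} bits")  # about 1.626 bits, below the 2 bits/symbol of a fixed-length code
```

Running this reproduces the 1.58 bits of information for A and the 1.626-bit entropy, comfortably below the 2 bits per symbol a fixed-length encoding would spend.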
Alas, a perfect encoding is, as always, a tough goal, so most of the time we'll have to settle for getting close. In the final set of exercises for this section, try computing the entropy for various scenarios.
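The exercise scenarios themselves aren't reproduced here, but as a warm-up, the same calculation can be applied to a few simple, made-up distributions to see how the entropy behaves at the extremes.

```python
from math import log2

def entropy(probs):
    """Entropy in bits: H(X) = sum of p * log2(1/p) over the outcomes with p > 0."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Illustrative scenarios (assumed for this sketch, not the actual exercises):
scenarios = {
    "four equally likely values": [1/4, 1/4, 1/4, 1/4],  # exactly 2 bits
    "fair coin flip":             [1/2, 1/2],            # exactly 1 bit
    "biased coin (p = 0.9)":      [0.9, 0.1],            # about 0.47 bits
    "a certain outcome":          [1.0],                 # 0 bits -- nothing left to learn
}

for name, probs in scenarios.items():
    print(f"{name}: H = {entropy(probs):.3f} bits")
```

When every value is equally likely, the entropy matches the fixed-length cost exactly; the more skewed the distribution, the further the entropy drops below that cost, and that gap is what a clever encoding can exploit.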