1 00:00:00,500 --> 00:00:03,370 Given a set of symbols and their probabilities, 2 00:00:03,370 --> 00:00:06,960 Huffman’s Algorithm tells us how to construct an optimal 3 00:00:06,960 --> 00:00:08,750 variable-length encoding. 4 00:00:08,750 --> 00:00:11,570 By “optimal” we mean that, assuming we’re encoding each 5 00:00:11,570 --> 00:00:15,000 symbol one-at-a-time, no other variable-length code will have 6 00:00:15,000 --> 00:00:17,370 a shorter expected length. 7 00:00:17,370 --> 00:00:20,000 The algorithm builds the binary tree for the encoding 8 00:00:20,000 --> 00:00:21,340 from the bottom up. 9 00:00:21,340 --> 00:00:23,830 Start by choosing the two symbols with the smallest 10 00:00:23,830 --> 00:00:25,550 probability (which means they have 11 00:00:25,550 --> 00:00:28,010 highest information content and should have the longest 12 00:00:28,010 --> 00:00:29,750 encoding). 13 00:00:29,750 --> 00:00:31,440 If anywhere along the way, two symbols 14 00:00:31,440 --> 00:00:35,010 have the same probability, simply choose one arbitrarily. 15 00:00:35,010 --> 00:00:37,340 In our running example, the two symbols with the lowest 16 00:00:37,340 --> 00:00:40,310 probability are C and D. 17 00:00:40,310 --> 00:00:42,580 Combine the symbols as a binary subtree, 18 00:00:42,580 --> 00:00:44,770 with one branch labeled “0” and the other “1”. 19 00:00:44,770 --> 00:00:48,530 It doesn’t matter which labels go with which branch. 20 00:00:48,530 --> 00:00:51,050 Remove C and D from our list of symbols, 21 00:00:51,050 --> 00:00:54,840 and replace them with the newly constructed subtree, whose root 22 00:00:54,840 --> 00:00:57,660 has the associated probability 1/6, 23 00:00:57,660 --> 00:01:01,640 the sum of the probabilities of its two branches. 24 00:01:01,640 --> 00:01:04,620 Now continue, at each step choosing the two symbols 25 00:01:04,620 --> 00:01:07,050 and/or subtrees with the lowest probabilities, 26 00:01:07,050 --> 00:01:10,620 combining the choices into a new subtree. 27 00:01:10,620 --> 00:01:12,650 At this point in our example, the symbol A 28 00:01:12,650 --> 00:01:16,880 has the probability 1/3, the symbol B the probability 1/2 29 00:01:16,880 --> 00:01:19,610 and the C/D subtree probability 1/6. 30 00:01:19,610 --> 00:01:24,290 So we’ll combine A with the C/D subtree. 31 00:01:24,290 --> 00:01:26,880 On the final step we only have two choices left: 32 00:01:26,880 --> 00:01:29,810 B and the A/C/D subtree, which we 33 00:01:29,810 --> 00:01:32,350 combine in a new subtree, whose root then becomes 34 00:01:32,350 --> 00:01:33,910 the root of the tree representing 35 00:01:33,910 --> 00:01:36,440 the optimal variable-length code. 36 00:01:36,440 --> 00:01:40,060 Happily, this is the code we’ve been using all along! 37 00:01:40,060 --> 00:01:42,730 As mentioned above, we can produce a number of different 38 00:01:42,730 --> 00:01:45,890 variable-length codes by swapping the “0” and “1” labels 39 00:01:45,890 --> 00:01:48,290 on any of the subtree branches. 40 00:01:48,290 --> 00:01:51,120 But all those encodings would have the same expected length, 41 00:01:51,120 --> 00:01:53,770 which is determined by the distance of each symbol 42 00:01:53,770 --> 00:01:56,640 from the root of the tree, not the labels along the path 43 00:01:56,640 --> 00:01:58,560 from root to leaf. 44 00:01:58,560 --> 00:02:00,250 So all these different encodings are 45 00:02:00,250 --> 00:02:03,110 equivalent in terms of their efficiency. 46 00:02:03,110 --> 00:02:05,450 Now try your hand at using Huffman’s Algorithm 47 00:02:05,450 --> 00:02:08,960 to construct optimal variable-length encodings.