# 5.3. Huffman’s Algorithm

In 

# 1. Contents

  • Idea
    • Put the rarest characters at the bottom of the tree.
  • A greedy approach
    • Repeat the following until only one tree is left:
      1. Start from a set of single node trees.
      2. Pick up two trees u and v with the lowest frequencies.
      3. Merge them by adding a root node w where the frequency of the new node is the sum of those of u and v.
      4. Replace u and v by w. image

# 2. Implementation and Time Complexity

  • Implementation issues
    • How can you manage a dynamic set to which the following operations occur frequently:
      • Delete the elements with the highest priority from the list.
      • Insert an element with some priority into the list.
      • The answer is to use Priority Queue.
    • The priority queue can be implemented in many ways. Which one would you use? | Representation | Insertion | Deletion | | --------------------- | ------------ | ------------ | | Unordered array | O(1) | O(n) | | Unordered linked list | O(1) | O(n) | | Sorted array | O(n) | O(1) | | Sorted linked list | O(n) | O(1) | | Heap | O(\log n) | O(\log n) |
  • ✓ The answer is to use the priority queue based on (min) heap.
  • O (n\log n) time
    • insert the n single-node trees : O(n)
    • \log n + \log (n-1) + ... + \log(2) = \log n! = O (n \log n)
    typedef struct _node
    {
        char symbol;
        int freq;
        struct _node left;
        struct _node right;
    } NODE;
    NODE u, v, w;
    
    for (i = 1; i <= n; i++) // O(n)
    {
        /* insert the n single-node trees */
    }
    for (i = 1; i <= n - 1; i++)
    {
        u = PQ_delete();
        v = PQ_delete();
        w = make_a_new_node();
        w->left = u;
        w->right = v;
        w->freq = u->freq + v->freq;
        PQ_insert(w);
    }
    w = PQ_delete();
    /* w points to the optimal tree. */
    image image

# 3. Correctness of the Huffman’s Algorithm

  • siblings, branch
  • (Proof by mathematical induction)
    • If the set of trees obtained in the i-th step are branches in a binary tree corresponding to an optimal code, then the set of trees obtained in the $(i+1)$st step are also branches in a binary tree corresponding to an optimal code.
    • Optimal binary tree image
    • i-th step image
    • \rightarrow (i+1)-th step image
    • (Base step) When k = 0, each tree is trivially a branch of an optimal tree.
      image
    • (Induction step) Suppose that the proposition is true when k = i, that S is the set of trees that exist after the $i$th step, and that T is the corresponding optimal tree. Let u and v be the root of the trees combined in the $(i+1)$st step. image
    • Case 1: If u and v are siblings in T, we are done. image
    • Case 2: Otherwise, assume that u is at a level in T at least as low as v, and that w is the u’s sibling in T.
      • The branch in T with root w is one of the trees in S or contains one of those trees as a subtree.
        • why?
      • Therefore, freq(w) ≥ freq(v) and depth(w) ≥ depth(v) in T
      • If we create a new tree T’ by swapping the two branches with root v and w, then bits(T’) = bits(T) + (depth(w) – depth(v))(freq(v) – freq(w)) ≤ bits(T).
      • Since bits(T) ≤ bits(T’), T’ is also optimal. Hence, the proposition also holds when k = i+1.
        image image
    • What happens if all the steps are done?