# 5.3. Huffman’s Algorithm

# 1. Contents

Idea
- Put the rarest characters at the bottom of the tree.
A greedy approach
- Repeat the following until only one tree is left:
  1. Start from a set of single node trees.
  2. Pick up two trees u and v with the lowest frequencies.
  3. Merge them by adding a root node w where the frequency of the new node is the sum of those of u and v.
  4. Replace u and v by w.

# 2. Implementation and Time Complexity

Implementation issues
- How can you manage a dynamic set to which the following operations occur frequently:
  - Delete the elements with the highest priority from the list.
  - Insert an element with some priority into the list.
  - The answer is to use Priority Queue.
- The priority queue can be implemented in many ways. Which one would you use? | Representation | Insertion | Deletion | | --------------------- | ------------ | ------------ | | Unordered array | O(1) | O(n) | | Unordered linked list | O(1) | O(n) | | Sorted array | O(n) | O(1) | | Sorted linked list | O(n) | O(1) | | Heap | O(\log n) | O(\log n) |
✓ The answer is to use the priority queue based on (min) heap.

→ O (n\log n) time

insert the n single-node trees : O(n)
\log n + \log (n-1) + ... + \log(2) = \log n! = O (n \log n)

typedef struct _node
{
    char symbol;
    int freq;
    struct _node left;
    struct _node right;
} NODE;
NODE u, v, w;

for (i = 1; i <= n; i++) // O(n)
{
    /* insert the n single-node trees */
}
for (i = 1; i <= n - 1; i++)
{
    u = PQ_delete();
    v = PQ_delete();
    w = make_a_new_node();
    w->left = u;
    w->right = v;
    w->freq = u->freq + v->freq;
    PQ_insert(w);
}
w = PQ_delete();
/* w points to the optimal tree. */

# 3. Correctness of the Huffman’s Algorithm

siblings, branch
(Proof by mathematical induction)
- If the set of trees obtained in the i-th step are branches in a binary tree corresponding to an optimal code, then the set of trees obtained in the $(i+1)$st step are also branches in a binary tree corresponding to an optimal code.
- Optimal binary tree
- i-th step
- \rightarrow (i+1)-th step
- (Base step) When k = 0, each tree is trivially a branch of an optimal tree.
- (Induction step) Suppose that the proposition is true when k = i, that S is the set of trees that exist after the $i$th step, and that T is the corresponding optimal tree. Let u and v be the root of the trees combined in the $(i+1)$st step.
- Case 1: If u and v are siblings in T, we are done.
- Case 2: Otherwise, assume that u is at a level in T at least as low as v, and that w is the u’s sibling in T.
  - The branch in T with root w is one of the trees in S or contains one of those trees as a subtree.
    - why?
  - Therefore, freq(w) ≥ freq(v) and depth(w) ≥ depth(v) in T
  - If we create a new tree T’ by swapping the two branches with root v and w, then bits(T’) = bits(T) + (depth(w) – depth(v))(freq(v) – freq(w)) ≤ bits(T).
  - Since bits(T) ≤ bits(T’), T’ is also optimal. Hence, the proposition also holds when k = i+1.
- What happens if all the steps are done?

alg