PlanetMath (more info)
 Math for the people, by the people.
Encyclopedia | Requests | Forums | Docs | Wiki | Random | RSS  
Login
create new user
name:
pass:
forget your password?
Main Menu
Huffman's algorithm (Algorithm)

Huffman's algorithm is a method for building an extended binary tree with a minimum weighted path length from a set of given weights. Initially construct a forest of singleton trees, one associated with each weight. If there are at least two trees, choose the two trees with the least weight associated with their roots and replace them with a new tree, constructed by creating a root node whose weight is the sum of the weights of the roots of the two trees removed, and setting the two trees just removed as this new node's children. This process is repeated until the forest consists of one tree.

Pseudocode


\begin{program} \mathrm{Huffman}(W, n) \text{{\bf Input}: A list $W$ of $n$ (\... ...T$ to $F$} \OD \mathrm{Huffman} \gets \text{ tree stored in } F \end{program}

Example

Let us work through an example for the set of weights $ \{ 1, 2, 3, 3, 4 \}$. In these intermediate steps we will display the temporary weights assigned by the algorithm in the circled nodes. Initially our forest is

\includegraphics{tree.5}

During the first step, the two trees with weights 1 and 2 are merged, to create a new tree with a root of weight 3.

\includegraphics{tree.6}

We now have three trees with weights of 3 at their roots. It does not matter which two we choose to merge. Let us choose the tree we just created and one of the singleton nodes of weight 3.

\includegraphics{tree.7}

Now our two minimum trees are the two singleton nodes of weights 3 and 4. We will combine these to form a new tree of weight 7.

\includegraphics{tree.8}

Finally we merge our last two remaining trees.

\includegraphics{tree.9}

The result is an extended binary tree of minimum weighted path length 29. In the following diagram each circled node is marked with its weighted path length.

\includegraphics{tree.10}

Analysis

Each iteration of Huffman's algorithm reduces the size of the problem by 1, and so there are exactly $ n$ iterations. The $ i$th iteration consists of locating the two minimum values in a list of length $ n - i + 1$. This is a linear operation, and so Huffman's algorithm clearly has a time complexity of $ \mathcal{O}(n^2)$.

However, it would be faster to sort the weights initially, and then maintain two lists. The first list consists of weights that have not yet been combined, and the second list consists of trees that have been formed by combining weights. This initial ordering is obtained at a cost of $ \mathcal{O}(n\log n)$. Obtaining the minimum two trees at each step then consists of two comparisons (compare the heads of the two lists, and then compare the larger to the item after the smaller). The ordering of the second list can be maintained cheaply by using a binary search to insert new elements. Since at step $ i$ there are $ i-1$ elements in the second list, $ \mathcal{O}(\log i)$ comparisons are needed for insertion. Over the entire duration of the algorithm the cost of keeping this list sorted is $ \mathcal{O}(n\log n)$. Therefore the overall time complexity of Huffman's algorithm is $ \mathcal{O}(n\log n)$.

In terms of space complexity, the algorithm constructs a complete binary tree with exactly $ n$ leaves. Therefore the output can only have at most $ 2n-1$ nodes. Thus Huffman's algorithm requires linear space.



"Huffman's algorithm" is owned by mps. [ full author list (2) | owner history (1) ]
(view preamble)

View style:

See Also: minimum weighted path length, Huffman coding


Cross-references: linear space, complete binary tree, terms, entire, binary search, ordering, time complexity, operation, length, size, iteration, weighted path length, positive, children, sum, node, roots, trees, singleton, forest, weights, minimum weighted path length, extended binary tree
There are 2 references to this entry.

This is version 5 of Huffman's algorithm, born on 2002-03-08, modified 2006-08-12.
Object id is 2780, canonical name is HuffmansAlgorithm.
Accessed 34868 times total.

Classification:
AMS MSC68P30 (Computer science :: Theory of data :: Coding and information theory )

Pending Errata and Addenda
None.
[ View all 4 ]
Discussion
Style: Expand: Order:
forum policy

No messages.

Interact
rate | post | correct | update request | add example | add (any)