A linear-time algorithm for constructing Huffman codes from a list of symbols
ordered by probability.
This method first appeared in an obscure conference that I have now lost
the reference to. Andre Trudel, a PhD student at Waterloo, independently
discovered it several years ago but we didn't bother publishing it because of
the obscure prior reference. I suppose this was a mistake and I'll generate
a tech. report about it. If anyone knows the reference, I'd appreciate
hearing about it.
In any event, here is our statement of the algorithm (copyright 1995,
Cormack & Trudel):
Input: list of in ascending order by priority
Intermediate data structure: queue of
Algorithm:
while input remains or sizeof(queue) > 1 loop
select pair of elements from input and/or queue with least probability
(remove these elements from their respective data structures)
build new treenode with the two above elements as children, and
with probability equal to the sum of the children's
add new treenode to queue
end loop
the single element in the queue is the root of the Huffman tree
Key observations:
- tree nodes are generated in increasing order of probability, therefore
the queue, although implemented as a simple linear queue with constant-
time operations, is a priority queue
- selecting the pair of elements involves inspecting both the input
and the queue. This requires constant time.
Permission is granted to reproduce this text provided the author's name
and address are cited:
Gordon Cormack
Department of Computer Science
University of Waterloo
Waterloo, Ontario N2L 3G1, Canada
gvcormac@plg.uwaterloo.ca
http://plg.uwaterloo.ca/~gvcormac
From cs.mu.OZ.AU!alistair Tue Apr 4 19:50:10 1995
Received: from mulga.cs.mu.OZ.AU ([128.250.1.22]) by plg.uwaterloo.ca with SMTP id <23548>; Tue, 4 Apr 1995 19:50:03 -0400
Received: from muloora.cs.mu.OZ.AU by mulga.cs.mu.OZ.AU with SMTP (5.83--+1.3.1+0.50); id AA23353
Wed, 5 Apr 1995 09:49:19 +1000 (from alistair@cs.mu.OZ.AU)
Message-Id: <9504042349.23353@mulga.cs.mu.OZ.AU>
To: gvcormac@plg.uwaterloo.ca
Subject: Linear time Huffman codes
Date: Tue, 4 Apr 1995 19:49:18 -0400
From: Alistair Moffat
Status: RO
Gordon,
Someone told me that you were interested in linear-time generation
of Huffman codes for sorted input probability distributions.
As far as I know, the earliest description of this idea is in
@inproceedings{lee76,
author = "J. Van {Leeuwen}",
title = "On the construction of {H}uffman trees",
booktitle = "Proc. 3rd International Conference on Automata,
Languages, and Programming",
address = "Edinburgh University",
pages = "382-410",
year = 1976,
}
Recently, Jyrki Katajainen (Copenhagen University) and I showed how
this method can be implemented in-place using the n words of the input
vector, and O(1) extra space. That is, we take an n-item array A
containing sorted symbol frequencies, use O(1) extra space and O(n)
time, and overwrite A[i] by the length in bits of the codeword for
symbol i. (This codeword length can then be easily turned into the
actual codeword in a further O(n)-time pass; for all of our work it is
more useful to generate the lengths so that we can calculate a
canonical code rather than risk our luck with the codewords produced by
Huffmans method, and so we tend to regard the job as being done when we
have the lengths). We have also been studying the problem of
space-efficient generation of optimal length-limited Huffman codes, and
presented a paper on the generation of these two types of prefix codes
last week at the Data Compression Conference in Utah.
If you are interested, I can send some more information through the post.
Cheers,...
Alistair Moffat, alistair@cs.mu.oz.au
Department of Computer Science, The University of Melbourne
Parkville, Victoria 3052, Australia.