Arithmetic Coding - An Alternate Explanation
Imagine that you want to encode as many characters as possible in an
integer.
It is easy to represent 1 character as an integer in the range 0..m.
But if m >= n, there is "leftover space" in the integer.
We can use the leftover space to code additional characters. We do
this by using a range of values to represent the first character. That is
*any* value in the range a..b (0 <= a <= b <= m) will represent a
character c. The range itself is used to code the next character. The
leftover space is used to code the next character, and so on.
Giving an equal amount of leftover space to each character (usually a
power of 2) is a traditional fixed-radix number system, where each
character is a digit.
If for each character, the amount of leftover space is proportional to
the probability of that character, the resulting integer is an optimal
code for the sequence of characters. Arithmetic coding amounts to
splitting up a large integer one character at a time so that the leftover
space is proportional the probability of the characters transmitted.
Due to quantization, it is not possible to split the space exactly in
proportion to the probability. To achieve a truly optimal code (that
minimizes this quantization error) requires a two-pass algorithm along
the lines of:
(a) compute the range of integer needed to represent ALL
characters with no space left over
(b) apportion the range according to probability
However, with fixed-precision arithmetic, one can get very close to
optimal by starting with a fixed-precision representation and extending
it by padding on the right when the "leftover space" becomes too small.
It turns out that you need not worry excessively about the quantization
error. Several multiply-free arithmetic codes have been invented that
approximate the probability only within a factor of 2, with
negligible degradation in compression.
I posted a C implementation of arithmetic coding (available from
msg.uwaterloo.ca:pub/dmc/arith*) that follows this explanation pretty
closely.