While Huffman or Shannon codes achieve compression of data by using
the underlying distribution to assign variable length codewords, in
most applications, we do not know the underlying distribution. In
such situations, the source is usually modeled as an unknown
distribution in a collection of distributions, the
collection of i.i.d., markov, context tree sources, depending on the
context of the problem.
The question is: how much penalty is incurred if we use one (albeit
carefully chosen) encoder for the whole class instead of a
separate one for each distribution? This penalty is measured by the
redundancy of encoding.
However, Kieffer showed that universal compression of even i.i.d.
sources over infinite alphabets incurs infinite
redundancy [2]. (We analyze this problem in more detail
in [3].)