Some practical encoding/decoding questions
- Key People:
- Claude Shannon
- Harry Nyquist
- David Blackwell
- On the Web:
- Princeton University - Some background on information theory (PDF) (Jan. 02, 2025)
To be useful, each encoding must have a unique decoding. Consider the encoding shown in the table A less useful encoding. While every message can be encoded using this scheme, some will have duplicate encodings. For example, both the message AA and the message C will have the encoding 00. Thus, when the decoder receives 00, it will have no obvious way of telling whether AA or C was the intended message. For this reason, the encoding shown in this table would have to be characterized as “less useful.”
M | → | S |
---|---|---|
A | 0 | |
B | 1 | |
C | 00 | |
D | 11 |
Encodings that produce a different signal for each distinct message are called “uniquely decipherable.” Most real applications require uniquely decipherable codes.
Another useful property is ease of decoding. For example, using the first encoding scheme, a received signal can be divided into groups of two characters and then decoded using the table Encoding 1 of M using S. Thus, to decode the signal 11010111001010, first divide it into 11 01 01 11 00 10 10, which indicates that DBBDACC was the original message.
The scheme shown in the table Encoding 2 of M using S must be deciphered using a different technique because strings of different length are used to represent characters from M. The technique here is to read one digit at a time until a matching character is found in this table. For example, suppose that the string 01001100111010 is received. Reading this string from the left, the first 0 matches the character A. Thus, the string 1001100111010 now remains. Because 1, the next digit, does not match any entry in the table, the next digit must now be appended. This two-digit combination, 10, matches the character B.
The table Decoding a message encoded with encoding 2 shows each unique stage of decoding this string. While the second encoding might appear to be complicated to decode, it is logically simple and easy to automate. The same technique can also be used to decode the first encoding.
decoded so far | remainder of signal string |
---|---|
01001100111010 | |
A | 1001100111010 |
A B | 01100111010 |
A B A | 1100111010 |
A B A C | 0111010 |
A B A C A | 111010 |
A B A C A D | 010 |
A B A C A D A | 10 |
A B A C A D A B |
The table An impractical encoding, on the other hand, shows a code that involves some complications in its decoding. The encoding here has the virtue of being uniquely decipherable, but, to understand what makes it “impractical,” consider the following strings: 011111111111111 and 01111111111111. The first is an encoding of CDDDD and the second of BDDDD. Unfortunately, to decide whether the first character is a B or a C requires viewing the entire string and then working back. Having to wait for the entire signal to arrive before any part of the message can be decoded can lead to significant delays.
M | → | S |
---|---|---|
A | 0 | |
B | 01 | |
C | 011 | |
D | 111 |
In contrast, the encodings in both the table Encoding 1 of M using S and the table Encoding 2 of M using S allow messages to be decoded as the signal is received, or “on the fly.” The table Another comparison of encoding from M to S compares the first two encodings.
character | number of cases | length of encoding 1 | length of encoding 2 |
---|---|---|---|
A | 30 | 60 | 30 |
B | 30 | 60 | 60 |
C | 30 | 60 | 90 |
D | 30 | 60 | 90 |
Totals | 120 | 240 | 270 |
Encodings that can be decoded on the fly are called prefix codes or instantaneous codes. Prefix codes are always uniquely decipherable, and they are generally preferable to nonprefix codes for communication because of their simplicity. Additionally, it has been shown that there must always exist a prefix code whose transmission rate is as good as that of any uniquely decipherable code, and, when the probability distribution of characters in the message is known, further improvements in the transmission rate can be achieved by the use of variable-length codes, such as the encoding used in the table Encoding 2 of M using S. (Huffman codes, invented by the American D.A. Huffman in 1952, produce the minimum average code length among all uniquely decipherable variable-length codes for a given symbol set and a given probability distribution.)