Encoding

Encoding is a means of converting data from one format to another, for example in order to transmit it, store it, or compress it. Encoding might also be used to describe a data structure or format, for example a file format. Algorithms can encode and decode this data without any sort of key.

Encoding is not Encryption!

As long as someone can determine the rules that were applied to the original data, they can easily reverse the encoding without any special knowledge, like passwords or secret keys. For this reason, encoding should never be used in a situation where the security and confidentiality of data is important.

Binary

Binary encoding uses only two basic symbols, which can be represented by any two distinct values. They might be an ON and OFF state, a clockwise or counter-clockwise spin, or simply the numbers 1 and 0.
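As a quick sketch, Python's built-in bin() and int() functions illustrate how the same value moves between decimal and binary representations:

```python
# bin() converts an integer to its binary string representation.
n = 13
print(bin(n))                # 0b1101

# int() with base 2 parses a binary string back into an integer.
assert int("1101", 2) == 13
```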

Hexadecimal

A long string of 0s and 1s can be difficult to read. Representing binary-encoded data with hexadecimal, or base-16, numbers makes it considerably easier to work with. The highest number we can fit into a single byte is 0b11111111, 0xff, or decimal 255.
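In Python, the built-in bin() and hex() functions show the same byte in all three representations:

```python
n = 255                       # the highest value a single byte can hold
print(bin(n))                 # 0b11111111
print(hex(n))                 # 0xff

# All three notations name the same number.
assert 0b11111111 == 0xff == 255

# Format specifiers print the digits without the 0b/0x prefixes:
# 8 binary digits, 2 hexadecimal digits.
print(f"{n:08b} {n:02x}")     # 11111111 ff
```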

EXAMPLE

Decimal  Binary   Hexadecimal
0        0b0      0x0
1        0b1      0x1
2        0b10     0x2
3        0b11     0x3
4        0b100    0x4
5        0b101    0x5
6        0b110    0x6
7        0b111    0x7
8        0b1000   0x8
9        0b1001   0x9
10       0b1010   0xa
11       0b1011   0xb
12       0b1100   0xc
13       0b1101   0xd
14       0b1110   0xe
15       0b1111   0xf

American Standard Code for Information Interchange

American Standard Code for Information Interchange (ASCII) is a type of encoding used to store and process both printable and non-printable characters. In ASCII every character is represented with a 7-bit binary number, a string of seven 0s or 1s. ASCII contains encoding for all the alphanumeric characters and symbols on a modern keyboard, as well as encoding for things like TABs, Line Feeds, and even Backspaces.
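Python's built-in ord() and chr() functions expose these character codes directly; a small sketch:

```python
# ord() returns a character's code; chr() reverses the mapping.
print(ord("A"))    # 65
print(chr(65))     # A

# ASCII uses 7 bits, so every ASCII code fits in the range 0-127.
assert all(ord(c) < 128 for c in "Hello, World!")

# Non-printable control characters such as TAB and Line Feed
# have ASCII codes too.
print(ord("\t"), ord("\n"))    # 9 10
```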

Unicode and Unicode Transformation Format

Unicode is a standard that provides a number, or unique code point, for each character. Another way to say this is that each character is mapped to a unique value.

Unicode includes code points for the characters of the familiar Latin alphabet, for example, U+0041 for the Latin uppercase letter “A”. There are also Unicode code points for each character in, for example, the Cyrillic, Thai, and Hangul alphabets. In total, the Unicode code space has room for over a million (exactly 1,112,064) code points for visible and non-visible characters.
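In Python, ord() and chr() work with these code points directly; a brief sketch:

```python
# U+0041 is the code point for the Latin uppercase letter "A".
assert ord("A") == 0x41

# Code points exist for other alphabets too, e.g. Cyrillic and Thai.
print(f"U+{ord('Я'):04X}")    # U+042F (Cyrillic capital letter Ya)
print(chr(0x0E01))            # the Thai character Ko Kai
```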

Unicode Transformation Format (UTF) is a way to encode these Unicode code points as bytes. The most common forms of UTF are UTF-8, which uses 8-bit (1-byte) code units, and UTF-16, which uses 16-bit (2-byte) code units.

NOTE

UTF-8 was designed to be backward compatible with ASCII.
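Both points can be sketched in Python: the same code points encode to different byte sequences under UTF-8 and UTF-16, while ASCII text is byte-for-byte unchanged under UTF-8:

```python
# An ASCII character: one byte in UTF-8, two bytes in UTF-16.
print("A".encode("utf-8"))       # b'A'
print("A".encode("utf-16-be"))   # b'\x00A'

# UTF-8 is backward compatible with ASCII: the bytes are identical.
assert "Hi".encode("utf-8") == "Hi".encode("ascii")

# Characters beyond ASCII need multiple UTF-8 code units.
print("é".encode("utf-8"))       # b'\xc3\xa9' (two bytes)
print(len("€".encode("utf-8")))  # 3
```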

Base64

Base64 encoding allows us to transfer binary data over channels that can only represent text data. It essentially converts any binary data into an encoded sequence of printable characters, allowing us to transfer that data over virtually any channel and protocol.

Base64 gets its name from its use of 64 characters, one for each 6-bit value from 0 to 63:

0-25: A to Z
26-51: a to z
52-61: 0 to 9
62: +
63: /

A 65th character, =, may also appear in the encoded output, but only at the end of a string, where it serves as padding.

Base64 works by:

  1. Splitting the binary input into blocks of three bytes each.
  2. Converting every three-byte block into four Base64 characters.
  3. 3x8 = 24 bits of input thus produce 4x6 = 24 bits of output.
  4. When the input bit length is not divisible by six, zero bits are appended to the end of the input to pad it, so that it becomes divisible.

Base64 output will contain:

  • One = character if the last block of input was only two bytes (without the added zeros).
  • Two = characters if the last block of input was only one byte.
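Python's standard base64 module demonstrates the block and padding rules above:

```python
import base64

# Three input bytes -> four output characters, no padding needed.
print(base64.b64encode(b"Man"))   # b'TWFu'

# Two input bytes in the last block -> one '=' of padding.
print(base64.b64encode(b"Ma"))    # b'TWE='

# One input byte in the last block -> two '=' of padding.
print(base64.b64encode(b"M"))     # b'TQ=='

# Decoding reverses the process without any key.
assert base64.b64decode(b"TWFu") == b"Man"
```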
