Encoding
Encoding is a means of converting data. Data may be converted into another format in order to transmit it, store it, or compress it. Encoding might also be used to describe a data structure or format, for example a file format. Algorithms can encode and decode this data without any sort of key.
Encoding is not Encryption!
As long as someone can determine the rules that were applied to the original data, they can easily reverse the encoding without any special knowledge, like passwords or secret keys. For this reason, encoding should never be used in a situation where the security and confidentiality of data is important.
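To illustrate the point, the sketch below hex-encodes a short string and then reverses it. The sample value and variable names are made up for the example, and hex is just one convenient encoding; the thing to notice is that no password or key appears anywhere in the decoding step.

```python
# Hypothetical sample data; any text would behave the same way.
secret = "hunter2"

# Encode: text -> ASCII bytes -> hexadecimal digits.
encoded = secret.encode("ascii").hex()

# Decode: anyone who recognizes hex can reverse it, no key required.
decoded = bytes.fromhex(encoded).decode("ascii")

print(encoded)  # 68756e74657232
print(decoded)  # hunter2
```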
Binary
Binary encoding consists of only two basic components and can be represented by any two values. They might be an ON and OFF state, a clockwise or counter-clockwise spin, or simply the numbers 1 and 0.
Hexadecimal
It can be difficult to read a long string of 0s and 1s. Representing binary-encoded data with hexadecimal, or base-16, numbers makes it easier to read. Each hexadecimal digit stands for exactly four bits, so a single byte is always written as two hexadecimal digits. The highest number we can fit into a single byte is 0b11111111, 0xff, or decimal 255.
EXAMPLE
Decimal  Binary  Hexadecimal
0        0b0     0x0
1        0b1     0x1
2        0b10    0x2
3        0b11    0x3
4        0b100   0x4
5        0b101   0x5
6        0b110   0x6
7        0b111   0x7
8        0b1000  0x8
9        0b1001  0x9
10       0b1010  0xa
11       0b1011  0xb
12       0b1100  0xc
13       0b1101  0xd
14       0b1110  0xe
15       0b1111  0xf
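The same conversions can be reproduced with Python's built-in bin() and hex() functions. This is only a small sketch; the helper name to_row is made up for the example.

```python
def to_row(n):
    """Return a number as (decimal, binary, hexadecimal) strings."""
    return (str(n), bin(n), hex(n))

# Print the values 0 through 15 in all three notations.
for n in range(16):
    dec, b, h = to_row(n)
    print(f"{dec:<8} {b:<7} {h}")

# A byte holds eight bits, so its largest value is eight 1s.
largest_byte = 0b11111111
print(largest_byte, hex(largest_byte))  # 255 0xff
```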
American Standard Code for Information Interchange
American Standard Code for Information Interchange (ASCII) is a type of encoding used to store and process both printable and non-printable characters. In ASCII every character is represented with a 7-bit binary number, a string of seven 0s and 1s, which gives 128 possible values (0 through 127). ASCII contains encodings for the alphanumeric characters and symbols on a standard US keyboard, as well as encodings for things like TABs, Line Feeds, and even Backspaces.
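As a quick sketch, Python's ord() and chr() built-ins convert between a character and its numeric code. The helper name ascii_bits is an assumption for this example; it prints the 7-bit pattern for a printable letter and for two non-printable control characters.

```python
def ascii_bits(ch):
    """Return the 7-bit binary string for a single ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    return format(code, "07b")

print(ascii_bits("A"))   # 1000001 (decimal 65)
print(ascii_bits("\t"))  # 0001001 (TAB, decimal 9)
print(ascii_bits("\n"))  # 0001010 (Line Feed, decimal 10)
print(chr(0b1000001))    # A
```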
Unicode and Unicode Transformation Format
Unicode is a standard that provides a number, or unique code point, for each character. Another way to say this is that each character is mapped to a unique value.
Unicode includes numbers and characters from the familiar Latin alphabet, for example, U+0041 for the Latin uppercase letter “A”. There are also Unicode numbers for each character in, for example, the Cyrillic, Thai, and Hangul scripts. In total, Unicode has room for over a million (1,112,064) valid code points, of which only a fraction are currently assigned to visible and non-visible characters.
Unicode Transformation Format (UTF) is a way to encode these Unicode mappings. The most common forms of UTF are UTF-8, which uses 8-bit (1-byte) units, and UTF-16, which uses 16-bit (2-byte) units. Both are variable-width: UTF-8 needs one to four units per code point, and UTF-16 needs one or two.
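The sketch below looks up the code point of a few sample characters and encodes each one as UTF-8 and UTF-16. The helper name describe and the sample characters are choices made for this example. The big-endian codec "utf-16-be" is used so that no byte order mark is added to the output.

```python
def describe(ch):
    """Return (code point label, UTF-8 bytes, UTF-16 big-endian bytes)."""
    return (f"U+{ord(ch):04X}", ch.encode("utf-8"), ch.encode("utf-16-be"))

# Latin, Cyrillic, Thai, Hangul, and one character outside the 16-bit range.
for ch in ["A", "Я", "ก", "한", "😀"]:
    label, utf8, utf16 = describe(ch)
    print(ch, label,
          f"UTF-8: {len(utf8)} byte(s)",
          f"UTF-16: {len(utf16)} byte(s)")
```

The byte counts grow with the code point value, which shows that neither encoding uses a fixed number of bytes per character.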
NOTE
UTF-8 was designed to be backward compatible with ASCII.
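One way to see this compatibility is to encode the same text with both codecs and compare the bytes. This is a minimal sketch with a made-up helper name, same_bytes, and sample strings chosen for illustration.

```python
def same_bytes(text):
    """Return True if the text encodes to identical bytes in ASCII and UTF-8."""
    return text.encode("ascii") == text.encode("utf-8")

# Pure ASCII text produces exactly the same bytes under both encodings.
print(same_bytes("Hello, world!"))  # True

# Text outside the ASCII range cannot be encoded as ASCII at all.
try:
    "café".encode("ascii")
except UnicodeEncodeError:
    print("'café' is not representable in ASCII")
```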
Base64
Base64 encoding allows us to transfer binary data over channels that can only represent text data. It essentially converts any binary data into an encoded sequence of printable characters, allowing us to transfer that data over virtually any channel and protocol.
Base64 gets its name from its use of 64 characters: the uppercase letters A-Z, the lowercase letters a-z, the digits 0-9, and the symbols + and /.
The = character might be used in the visual representation of this encoding as well, but only at the end of a string for padding.
Base64 works by:
- Converting every three bytes of binary data into four Base64 characters.
- Each three-byte sequence is called a block.
- 3x8 bits of input produce 4x6 bits of output, and each 6-bit group maps to one Base64 character.
- When the last block is shorter than three bytes, its bits cannot be split evenly into 6-bit groups, so we add zeroes at the end of the input to pad it until they can.
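The steps above can be sketched in a few lines of Python. This is an illustrative implementation only, not how any particular library does it: the function name b64encode_sketch is made up, and ALPHABET is assumed to be the standard Base64 character set (A-Z, a-z, 0-9, +, /). In practice the standard library's base64 module should be used instead.

```python
import base64
import string

# Assumed standard alphabet: 26 + 26 + 10 + 2 = 64 characters.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64encode_sketch(data):
    """Encode bytes as Base64 text, one three-byte block at a time."""
    out = []
    for i in range(0, len(data), 3):
        block = data[i:i + 3]
        missing = 3 - len(block)          # 0, 1, or 2 bytes short
        block += b"\x00" * missing        # pad a short final block with zeros
        n = int.from_bytes(block, "big")  # the block as one 24-bit number
        # Split the 24 bits into four 6-bit groups and look each one up.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters that came only from padding are written as '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)

print(b64encode_sketch(b"Man"))  # TWFu
# Cross-check against the standard library.
print(base64.b64encode(b"Man").decode("ascii"))  # TWFu
```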
Base64 output will contain:
- One = character if the last block of input was only two bytes (without the added zeros).
- Two = characters if the last block of input was only one byte.
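The padding rules can be checked with Python's standard base64 module. The inputs below are arbitrary sample bytes, and the helper name padding_count is made up for the example.

```python
import base64

def padding_count(data):
    """Return the Base64 text for data and the number of '=' characters in it."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded, encoded.count("=")

print(padding_count(b"abc"))  # ('YWJj', 0)  full three-byte block
print(padding_count(b"ab"))   # ('YWI=', 1)  last block has two bytes
print(padding_count(b"a"))    # ('YQ==', 2)  last block has one byte
```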