Counting and Character Encoding

ASCII

American Standard Code for Information Interchange – character encoding standard for electronic communication.

Developed in 1963, ASCII is a 7-bit system: each character is represented by a 7-bit binary number. This allows for 2^7 = 128 characters (codes 0 to 127).

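The upper case “A” is 1000001 (decimal 65) and “B” is 1000010 (66). The lower case letters start at “a” = 1100001 (97). Seven bits were enough for the English letters, digits, punctuation and control codes. On 8-bit hardware the spare eighth bit was often used as a parity bit for error checking. The all-zero code is the NUL control character, which some systems (such as C strings) use to mark the end of text.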
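The codes above can be checked with Python's built-in ord() and format(). This is a minimal sketch; the helper name ascii_bits is an illustrative choice, not a standard function.

```python
def ascii_bits(ch: str) -> str:
    """Return the 7-bit binary ASCII code of a single character (illustrative helper)."""
    code = ord(ch)  # for ASCII characters, ord() returns the ASCII code
    if code > 127:
        raise ValueError(f"{ch!r} is outside the 7-bit ASCII range")
    return format(code, "07b")  # zero-padded to 7 binary digits

print(2 ** 7)  # 128 distinct 7-bit codes
for ch in "ABa":
    print(ch, ord(ch), ascii_bits(ch))
```

Note that “a” (97) and “A” (65) differ by exactly 32, a single bit, so switching case in ASCII is a one-bit change.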

Because it was limited to 128 characters, the original version of ASCII became a problem once computers needed to represent other languages and symbols.

 

Unicode

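A standard, maintained by the Unicode Consortium, that assigns a unique number (a code point, written as U+ followed by hexadecimal digits, e.g. U+0041 for “A”) to every character and symbol in the world’s writing systems. It now covers well over 100,000 characters, and its first 128 code points are identical to ASCII.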
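Python strings are sequences of Unicode code points, so ord() returns a character's code point directly. A small sketch, where code_point is an illustrative helper name:

```python
def code_point(ch: str) -> str:
    """Format a character's Unicode code point in U+XXXX notation (at least 4 hex digits)."""
    return f"U+{ord(ch):04X}"

for ch in ["A", "é", "€", "中"]:
    print(code_point(ch), ord(ch))

# The first 128 code points coincide with ASCII, so code point 0x41 is still "A".
print(chr(0x41))
```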

 

UTF-8

As the World Wide Web became prevalent, a new way of representing characters as bytes was needed, one that could support the over 100,000 Unicode characters. UTF-8 encodes each Unicode character in one to four bytes, and it is the most widely used character encoding today.

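If a byte starts with the bit “0”, it is a single-byte character identical to the old 7-bit ASCII character. If it starts with “1”, it is part of a multi-byte character: a leading byte starting with “110”, “1110” or “11110” announces a 2-, 3- or 4-byte character, and each following continuation byte starts with “10”.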
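The byte patterns described above can be made concrete with a hand-rolled encoder. This is an illustrative sketch only (utf8_encode is a hypothetical helper; real code should simply call str.encode("utf-8")), and it checks itself against Python's built-in encoder.

```python
def utf8_encode(ch: str) -> bytes:
    """Encode one character as UTF-8 by hand.

    Assumes ch is a single character and not a lone surrogate (U+D800 to U+DFFF),
    which UTF-8 does not allow.
    """
    cp = ord(ch)
    if cp < 0x80:                        # 1 byte:  0xxxxxxx (plain ASCII)
        return bytes([cp])
    if cp < 0x800:                       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),     # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for ch in ["A", "é", "€", "\U0001F600"]:
    encoded = utf8_encode(ch)
    assert encoded == ch.encode("utf-8")  # agrees with Python's built-in encoder
    print(f"U+{ord(ch):04X}", " ".join(format(b, "08b") for b in encoded))
```

For example, “é” (U+00E9) becomes 11000011 10101001: the leading “110” marks a 2-byte character and the continuation byte starts with “10”.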

 

Base2 / Binary

A number system using only the digits 0 and 1 (bits). A byte is a group of bits; with the de facto standard of 8 bits per byte, each byte can represent 2^8 = 256 values, from 0 to 255.
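A short sketch of converting between a byte value and its 8-bit binary form; to_byte_bits is an illustrative helper name, while format() and int(..., 2) are Python built-ins.

```python
def to_byte_bits(n: int) -> str:
    """Return the 8-bit binary representation of a value that fits in one byte."""
    if not 0 <= n <= 255:
        raise ValueError("a byte holds values 0-255 only")
    return format(n, "08b")

print(2 ** 8)               # 256 distinct values per byte
print(to_byte_bits(0))      # 00000000
print(to_byte_bits(65))     # 01000001
print(to_byte_bits(255))    # 11111111
print(int("11111111", 2))   # 255: parsing a binary string back to an integer

# Place values: each bit contributes 2**position, so 01000001 = 64 + 1 = 65.
print(sum(int(bit) << (7 - i) for i, bit in enumerate("01000001")))  # 65
```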

 

Base16 / Hexadecimal

A base-16 numeral system using the characters 0–9 and A–F (or a–f) for the values 0 to 15. Each hexadecimal digit represents exactly 4 bits, so one 8-bit byte is written as two hex digits (00 to FF).
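Because each hex digit covers exactly 4 bits, a byte can be converted by splitting it into its upper and lower halves. A minimal sketch; byte_to_hex and HEX_DIGITS are illustrative names, while int(..., 16) and bytes.hex() are Python built-ins.

```python
HEX_DIGITS = "0123456789ABCDEF"

def byte_to_hex(n: int) -> str:
    """Write one byte (0-255) as two hex digits: one for each 4-bit half."""
    if not 0 <= n <= 255:
        raise ValueError("a byte holds values 0-255 only")
    high, low = n >> 4, n & 0x0F   # upper 4 bits and lower 4 bits
    return HEX_DIGITS[high] + HEX_DIGITS[low]

for n in (10, 65, 255):
    print(n, format(n, "08b"), byte_to_hex(n))

print(int("FF", 16))   # 255: parsing hex back to an integer
print(b"And".hex())    # 416e64: the built-in bytes.hex() uses lowercase digits
```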

 

Base64

A base-64 numeral/character system. (Humans count in Base10, using the digits 0 to 9, whereas computers use Base2, or binary.) It uses 64 ASCII characters, each of which represents a 6-bit value (0 to 63). The exact characters can vary between systems, but the standard alphabet is A–Z, a–z, 0–9 and two extra characters, “+” and “/”. The “/” (and “+”) can cause problems in URLs, so web-based systems often substitute “-” and “_” for these two characters.

Value  Char    Value  Char    Value  Char    Value  Char
0      A       16     Q       32     g       48     w
1      B       17     R       33     h       49     x
2      C       18     S       34     i       50     y
3      D       19     T       35     j       51     z
4      E       20     U       36     k       52     0
5      F       21     V       37     l       53     1
6      G       22     W       38     m       54     2
7      H       23     X       39     n       55     3
8      I       24     Y       40     o       56     4
9      J       25     Z       41     p       57     5
10     K       26     a       42     q       58     6
11     L       27     b       43     r       59     7
12     M       28     c       44     s       60     8
13     N       29     d       45     t       61     9
14     O       30     e       46     u       62     +
15     P       31     f       47     v       63     /
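The table can be reproduced in a few lines. This is a sketch, assuming the standard alphabet and its URL-safe variant; STANDARD and URL_SAFE are illustrative names, while base64.b64encode and base64.urlsafe_b64encode are from Python's standard library. The two input bytes are chosen so that the output hits indices 62 and 63.

```python
import base64
import string

# Standard Base64 alphabet in index order: A-Z (0-25), a-z (26-51), 0-9 (52-61), "+" (62), "/" (63).
STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
# The URL- and filename-safe variant swaps the last two characters for "-" and "_".
URL_SAFE = STANDARD[:62] + "-_"

assert len(STANDARD) == 64
print(STANDARD[16], STANDARD[26], STANDARD[52], STANDARD[63])   # Q a 0 /
print(STANDARD.index("w"))                                       # 48

data = bytes([0xFB, 0xFF])             # bits 111110 111111 1111(00) -> indices 62, 63, 60
print(base64.b64encode(data))          # b'+/8='
print(base64.urlsafe_b64encode(data))  # b'-_8='
```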

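Base64 is generally used to encode binary data as printable text. For example, to encode the word “And”, each letter is first translated to its decimal ASCII value: A = 65, n = 110, d = 100. We then take the 8-bit binary representation of each (01000001 01101110 01100100) and split the resulting 24 bits into 6-bit groups (6 bits = 64 possible values, 0 to 63): 010000 010110 111001 100100, which are 16, 22, 57 and 36. Looking these up in the table, the Base64 encoding of “And” is the following 4 characters: QW5k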
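The 6-bit grouping for “And” can be reproduced step by step. This sketch only handles input whose length is a multiple of 3 bytes, so no padding is involved; ALPHABET is an illustrative name for the table's characters in index order.

```python
word = "And"
codes = [ord(c) for c in word]                             # decimal ASCII values
bits = "".join(format(c, "08b") for c in codes)            # 3 bytes -> 24 bits
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]   # four 6-bit groups
indices = [int(g, 2) for g in groups]                      # each group is a value 0-63

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
encoded = "".join(ALPHABET[i] for i in indices)            # look each value up in the table

print(codes)     # [65, 110, 100]
print(groups)    # ['010000', '010110', '111001', '100100']
print(indices)   # [16, 22, 57, 36]
print(encoded)   # QW5k
```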

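For the letter “g” (ASCII 103, binary 01100111), the 8 bits are padded with zero bits up to 12 bits (011001 110000), giving the values 25 and 48, i.e. “Z” and “w”. Because Base64 output always comes in groups of 4 characters, we pad the remaining 2 unused characters with “=”, so the Base64 encoding is the following 4 characters: Zw==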
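Putting the grouping and padding rules together gives a complete, if minimal, encoder. This is an illustrative sketch (b64_encode is a hypothetical name, not the standard library function); it processes 3 bytes at a time, pads a short final chunk with zero bits, and replaces characters made only of padding with “=”. Each result is checked against Python's base64.b64encode.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_encode(data: bytes) -> str:
    """Minimal Base64 encoder: 3 bytes (24 bits) -> 4 characters, '=' for padding."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)                # 0, 1 or 2 bytes short of a full group
        n = int.from_bytes(chunk, "big")        # pack up to 3 bytes into one integer
        n <<= 8 * missing                       # pad with zero bits to 24 bits
        # Extract four 6-bit values, most significant first.
        sextets = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        chars = [ALPHABET[s] for s in sextets]
        if missing:
            # Characters made only of padding bits become "=".
            chars[4 - missing:] = ["="] * missing
        out.append("".join(chars))
    return "".join(out)

for text in ["And", "g", "An"]:
    encoded = b64_encode(text.encode("ascii"))
    assert encoded == base64.b64encode(text.encode("ascii")).decode("ascii")
    print(text, encoded)
```

One missing byte produces a single “=” (for example “An” encodes to QW4=), and two missing bytes produce “==”, as with “g”.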