
#utf8 character encoding in #HTML template

An HTML template usually contains a meta tag whose charset attribute is set to UTF-8. It tells the browser which character encoding to use when decoding the page. But what is UTF-8?


<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
  <title>Document</title>
</head>
<body>
  
</body>
</html>

Unicode Homepage


History - starts with ASCII

ASCII - American Standard Code for Information Interchange
It is a character encoding standard that lets machines exchange text electronically.

ASCII is a 7-bit code. Each character that is keyed in is converted into 7 binary digits and sent over the wire.
Seven bits give 128 values, 0 through 127.
The first 32 (0 through 31) are control codes.
For example:
A - 65: 10 00001
B - 66: 10 00010

a - 97: 11 00001
b - 98: 11 00010

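Ignoring the first two binary digits, A and a are both 00001 (that is, 1), and B and b are both 2. The remaining five bits give the letter's position in the alphabet, which makes English letters easy to identify, and upper and lower case differ in a single bit.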
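
As a minimal sketch of the bit patterns above, the following Python snippet prints the code, the 7-bit pattern (split 2 + 5 as in the examples) and the value of the low five bits. The helper names ascii_bits and alphabet_position are made up for this illustration.

```python
def ascii_bits(ch: str) -> str:
    """Return the 7-bit binary form of an ASCII character, split 2 + 5."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    bits = format(code, "07b")
    return f"{bits[:2]} {bits[2:]}"


def alphabet_position(ch: str) -> int:
    """Keep only the low five bits: 1 for A/a, 2 for B/b, and so on."""
    return ord(ch) & 0b11111


for ch in "ABab":
    print(ch, ord(ch), ascii_bits(ch), alphabet_position(ch))
```

For A this prints the code 65, the pattern 10 00001 and the position 1; for a it prints 97, 11 00001 and again 1.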

ASCII table from http://www.asciitable.com/
Other countries created their own encodings. Japan, for example, created multibyte encodings that can include Kanji characters.
This made machines incompatible with each other. If your computer can't decode a message with the encoding the sender used, the message ends up garbled.
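
To see what garbled looks like, here is a small Python sketch. It assumes the sender uses Shift_JIS (one of the Japanese multibyte encodings) and the receiver wrongly assumes Latin-1; both choices are just examples picked for this illustration.

```python
text = "日本語"

# Bytes as a machine using Shift_JIS would put them on the wire.
sent = text.encode("shift_jis")

# A receiver that assumes Latin-1 turns every byte into some character,
# but not the characters the sender meant.
garbled = sent.decode("latin-1")

# A receiver that knows the right encoding recovers the text.
correct = sent.decode("shift_jis")

print(repr(garbled))
print(correct)
```

The bytes are identical in both cases; only the receiver's assumption about the encoding differs.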

Unicode - UTF8

The Unicode Consortium set out to define one standard that covers all the characters in the world.
If we used 32 bits to encode each ASCII-encodable character directly, the binary number would carry a long prefix of zeros. Each English character would take 4 times the space of an 8-bit byte.

Problems that were solved:
1) Get rid of the leading zeroes for English characters and the rest of the ASCII set
2) Handle old computer systems that interpret 8 zeroes in a row as NULL and stop reading (end of string)
3) Stay backward-compatible, so that machines that understand only basic ASCII can still read ASCII text sent as Unicode

A character that needs more than one octet starts with a header octet that tells how many octets the character occupies: the number of leading 1s in the header is the number of octets in the sequence.

For example:
1) The first octet starts with 110. Two 1s mean two octets: a header octet and one continuation octet. Each continuation octet starts with 10. The x's are filled with the bits of the number that represents the character being transmitted.
110xxxxx 10xxxxxx

2) The first octet starts with 1110. Three 1s mean three octets: a header octet and two continuation octets. As before, each continuation octet starts with 10 and the x's carry the bits of the character's number.
1110xxxx 10xxxxxx 10xxxxxx
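
The two layouts above can be sketched in Python. The function utf8_encode below is a hand-written illustration, not a library function; it covers only the one-, two- and three-octet cases described here and does no validation beyond the range check. Its output is compared with Python's built-in encoder.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a code point below 0x10000 using the layouts shown above."""
    if code_point < 0x80:
        # Plain ASCII: a single octet of the form 0xxxxxxx.
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([
            0b11000000 | (code_point >> 6),
            0b10000000 | (code_point & 0b111111),
        ])
    if code_point < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0b11100000 | (code_point >> 12),
            0b10000000 | ((code_point >> 6) & 0b111111),
            0b10000000 | (code_point & 0b111111),
        ])
    raise ValueError("this sketch stops at three octets")


def as_bits(data: bytes) -> str:
    """Show each octet as eight binary digits."""
    return " ".join(format(b, "08b") for b in data)


for ch in "Aé€":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # matches the built-in encoder
    print(ch, as_bits(encoded))
```

For é (number 233) the output is 11000011 10101001: the header 110, five bits of the number, then the continuation marker 10 and the remaining six bits.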

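This UTF-8 system cleverly solves the 3 problems mentioned above. An ASCII character still takes a single octet that starts with 0, so it carries no zero padding and an ASCII-only machine can read it unchanged. Every octet of a multi-octet character starts with 1, so none of them is ever 8 zeroes in a row.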
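
As a rough check of these properties, the Python sketch below uses the built-in utf-8 codec. The helper names and sample strings are made up for this illustration.

```python
def same_as_ascii(text: str) -> bool:
    """ASCII-only text should produce identical bytes in ASCII and UTF-8."""
    return text.encode("utf-8") == text.encode("ascii")


def has_zero_octet(text: str) -> bool:
    """True if any octet of the UTF-8 form is 8 zeroes in a row."""
    return 0 in text.encode("utf-8")


def all_high_bits_set(ch: str) -> bool:
    """True if every octet of the character's UTF-8 form starts with 1."""
    return all(b & 0b10000000 for b in ch.encode("utf-8"))


print(same_as_ascii("Hello"))          # True: no padding, backward-compatible
print(has_zero_octet("héllo 日本語"))   # False: nothing looks like NULL
print(all_high_bits_set("é"), all_high_bits_set("日"))  # True True
```

The only way to get a zero octet is to encode the NULL character itself.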

Source: Computerphile on YouTube

Thanks for reading!

Jun