HTML Character Set
The standard character sets are designed so that a single 8-bit byte stores a single character, because the 8-bit byte is the basic storage unit on virtually all computers. Since there are 8 bits in a byte, each byte can represent 2^8 = 256 possible characters.
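The arithmetic behind the 256 figure can be checked with a short Python sketch. The variable names are illustrative only and are not part of any HTML or ISO specification.

```python
# Number of distinct values that fit in one 8-bit byte.
BITS_PER_BYTE = 8
values_per_byte = 2 ** BITS_PER_BYTE

# A bytes object stores integers from 0 to 255, so one of each gives 256 values.
all_byte_values = bytes(range(values_per_byte))

print(values_per_byte)       # 256
print(len(all_byte_values))  # 256
print(all_byte_values[-1])   # 255, the largest value a byte can hold
```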
ISO Latin-1 character set
A character set is useful only when it can be understood by every system that reads it. The International Organization for Standardization (ISO) has specified several character sets that fit within 8-bit characters. For World Wide Web applications, the traditional default set of printable characters is ISO Latin-1, which is also called the ISO 8859-1 character set.
The first 128 characters of ISO 8859-1 are identical to US-ASCII, which is also known as the ISO 646 character set. US-ASCII is a 7-bit character set, so it has only 128 characters. The first 32 of these (codes 0 to 31) are special control characters used for communication lines or for controlling printing devices. These 32 characters are not printable. Some of these characters are shown in the table below:
| Character | Description | Decimal code |
|---|---|---|
| LF | New line or line feed (also called NL) | 10 |
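The relationship between US-ASCII and ISO 8859-1 can be illustrated with Python's built-in codecs. This is a minimal sketch: the variable names are made up for the example, and `str.isprintable()` is used here only as a convenient stand-in for "printable character".

```python
# Byte values 0 to 127, the range covered by US-ASCII.
ascii_range = bytes(range(128))

# Decoding the same bytes as US-ASCII or as ISO 8859-1 gives identical text.
same_text = ascii_range.decode("ascii") == ascii_range.decode("latin-1")

# Codes 0 to 31 are control characters; none of them is printable.
control_chars = [chr(code) for code in range(32)]
none_printable = not any(ch.isprintable() for ch in control_chars)

# The line feed (LF) character has decimal code 10.
line_feed_code = ord("\n")

print(same_text)       # True
print(none_printable)  # True
print(line_feed_code)  # 10
```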
URL Character Encoding
Any 8-bit character can be represented in a URL by an indirect reference, that is, by encoding it. An ISO Latin-1 character can therefore be represented by the following special character sequence:
%xx
In the above syntax, xx is the hexadecimal (hex) code of the character to be used, and % indicates the start of the encoding.
As an example, the encoding for the character é is %E9.
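The %E9 example can be reproduced with Python's standard `urllib.parse` module. This is a sketch that assumes the character is encoded as ISO Latin-1, as in the text; the UTF-8 result is included only for comparison, since UTF-8 encodes the same character as two bytes.

```python
from urllib.parse import quote, unquote

e_acute = "\u00e9"  # the character é

# Percent-encode the character using its ISO Latin-1 byte value.
encoded_latin1 = quote(e_acute, encoding="latin-1")

# Decode the sequence back to the original character.
decoded = unquote("%E9", encoding="latin-1")

# The two hex digits after % are the character's code in hexadecimal.
hex_code = format(ord(e_acute), "02X")

# For comparison: the same character percent-encoded as UTF-8.
encoded_utf8 = quote(e_acute, encoding="utf-8")

print(encoded_latin1)  # %E9
print(hex_code)        # E9
print(encoded_utf8)    # %C3%A9
```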
The charset attribute
In an HTML5 document, the default character encoding is UTF-8. UTF-8 is an encoding of Unicode, which covers almost all characters and symbols.
The charset attribute is used with the meta element. It tells the browser which character encoding is used in the page. Assigning the value UTF-8 to the charset attribute therefore selects an encoding that supports the characters and symbols of Unicode. This is demonstrated as follows:
<meta charset="UTF-8">
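To show how a program might read this declaration, the sketch below uses Python's standard `html.parser` module. The `CharsetFinder` class and the sample page are hypothetical, written only for this example; they do not reproduce the encoding-detection steps a real browser follows.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Hypothetical helper: records the first charset attribute on a meta tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "meta" and self.charset is None:
            for name, value in attrs:
                if name == "charset":
                    self.charset = value


# Sample page invented for this example.
page = '<!DOCTYPE html><html><head><meta charset="UTF-8"><title>Demo</title></head></html>'

finder = CharsetFinder()
finder.feed(page)
print(finder.charset)  # UTF-8
```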
For characters 0 to 127, UTF-8 is identical to ASCII.
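The overlap between UTF-8 and ASCII can be verified with a few lines of Python. This is a minimal sketch with illustrative variable names. The character é is included to show where the two encodings diverge: outside the ASCII range, UTF-8 uses more than one byte per character.

```python
# Byte values 0 to 127 decode to the same text in ASCII and in UTF-8.
ascii_bytes = bytes(range(128))
identical = ascii_bytes.decode("utf-8") == ascii_bytes.decode("ascii")

# An ASCII letter is a single byte in UTF-8.
letter_a_utf8 = "A".encode("utf-8")

# é lies outside ASCII: one byte in ISO Latin-1, two bytes in UTF-8.
e_acute_latin1 = "\u00e9".encode("latin-1")
e_acute_utf8 = "\u00e9".encode("utf-8")

print(identical)            # True
print(letter_a_utf8)        # b'A'
print(len(e_acute_latin1))  # 1
print(len(e_acute_utf8))    # 2
```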