HTML Character Set

A computer character set is a mapping between binary codes and a set of characters, such as letters, digits, and symbols.

The standard character sets are designed so that a single 8-bit byte stores a single character, because the byte is the basic storage unit on virtually all computers. Since there are 8 bits in a byte, each byte can take 2^8 = 256 different values and can therefore represent up to 256 characters.

ISO Latin-1 character set

A character set is useful only when both the sender and the receiver understand it. The International Organization for Standardization (ISO) has specified several character sets that fit inside 8-bit characters. For World Wide Web applications, the traditional default set of printable characters is ISO Latin-1, which is also called the ISO 8859-1 character set.
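
As a rough illustration of the one-byte-per-character idea, the following Python sketch encodes a short string as ISO 8859-1. The sample word and the variable names are chosen here only for illustration; "latin-1" is Python's codec name for ISO 8859-1.

```python
# A sample word containing one non-ASCII character (e with acute accent, U+00E9).
text = "caf\u00e9"

# Encode as ISO 8859-1: every character becomes exactly one byte.
latin1_bytes = text.encode("latin-1")
print(len(text), len(latin1_bytes))   # 4 4
print(hex(latin1_bytes[-1]))          # 0xe9

# All 256 possible byte values decode to some character in ISO 8859-1.
all_chars = bytes(range(256)).decode("latin-1")
print(len(all_chars))                 # 256
```

Characters outside the 256 defined by ISO 8859-1 cannot be encoded this way; in Python the attempt raises UnicodeEncodeError.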

US-ASCII

The first 128 characters of ISO 8859-1 are identical to US-ASCII, which corresponds to the US variant of the ISO 646 character set. US-ASCII is a 7-bit character set because it has only 128 characters. Of these, 33 are control characters (codes 0 to 31 and code 127) that are used for controlling communication lines or printing devices. These control characters are not printable. Some of them, together with the space character, are shown in the table below:

Character code   Meaning                                   Decimal
NUL              Null character                            0
BS               Backspace                                 8
HT               Tab                                       9
LF               New line or line feed (also called NL)    10
CR               Carriage return                           13
SP               Space character                           32
DEL              Delete                                    127
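
The codes in the table can be checked with a short Python sketch. It also confirms that codes 0 to 127 mean the same thing in US-ASCII and ISO 8859-1, and counts the non-printable codes in that range (codes 0 to 31 plus DEL at 127). The dictionary and variable names are illustrative only.

```python
# Decimal codes from the table above, keyed by their abbreviations.
table_codes = {"NUL": 0, "BS": 8, "HT": 9, "LF": 10, "CR": 13, "SP": 32, "DEL": 127}

# Python string escapes for the same characters.
escapes = {"\0": "NUL", "\b": "BS", "\t": "HT", "\n": "LF",
           "\r": "CR", " ": "SP", "\x7f": "DEL"}
for char, name in escapes.items():
    assert ord(char) == table_codes[name]

# Codes 0..127 decode to the same characters under ASCII and ISO 8859-1.
first_128 = bytes(range(128))
same_in_both = first_128.decode("ascii") == first_128.decode("latin-1")

# Collect the codes in 0..127 that Python does not consider printable.
control_codes = [code for code in range(128) if not chr(code).isprintable()]
print(same_in_both, len(control_codes))   # True 33
```

The space character (code 32) is not counted among the control characters because it is treated as printable.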

URL Character Encoding

Any 8-bit character can be represented in a URL indirectly, by encoding it. An ISO Latin-1 character can therefore be written as the following special character sequence:

%xx

In the above syntax, % indicates the start of the encoding and xx is the two-digit hexadecimal (hex) code of the character to be used.

EXAMPLE

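As an example, the character é has the hex code E9 in ISO Latin-1, so its encoding is %E9. (A URL that uses UTF-8 instead encodes é as the two bytes %C3%A9.)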
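
The same encoding can be reproduced with Python's standard urllib.parse module. This is a minimal sketch, not the only way to do it: the encoding argument selects which character set supplies the byte values, and the variable names are illustrative.

```python
from urllib.parse import quote, unquote

char = "\u00e9"  # the character e with acute accent

# Percent-encode using the ISO 8859-1 byte value of the character.
latin1_encoded = quote(char, encoding="latin-1")
print(latin1_encoded)   # %E9

# quote() defaults to UTF-8, where the character takes two bytes.
utf8_encoded = quote(char)
print(utf8_encoded)     # %C3%A9

# Decoding reverses the process when the same character set is named.
decoded = unquote("%E9", encoding="latin-1")
print(decoded == char)  # True
```

Decoding has to use the same character set that was used for encoding, otherwise the bytes are misread.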

The charset attribute

In an HTML5 document the default character encoding is UTF-8. UTF-8 is an encoding of Unicode, a character set that covers the characters and symbols of practically all writing systems.

The charset attribute is used with the meta element to tell the browser which character encoding the page uses. Setting the charset attribute to UTF-8 declares an encoding that supports all Unicode characters and symbols. This is demonstrated as follows:

EXAMPLE

<meta charset="UTF-8">

For the characters with codes 0 to 127, UTF-8 is identical to ASCII.
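
The relationship between UTF-8 and ASCII can be checked with a short Python sketch. The sample strings and variable names are illustrative only.

```python
# Text made only of ASCII characters produces identical bytes in both encodings.
ascii_text = "HTML"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
print(same_bytes)             # True

# Every code from 0 to 127 is a single, unchanged byte in UTF-8.
first_128 = "".join(chr(code) for code in range(128))
utf8_first_128 = first_128.encode("utf-8")
print(len(utf8_first_128))    # 128

# Characters beyond code 127 need more than one byte in UTF-8.
e_acute_bytes = "\u00e9".encode("utf-8")   # e with acute accent
euro_bytes = "\u20ac".encode("utf-8")      # euro sign
print(len(e_acute_bytes), len(euro_bytes)) # 2 3
```

This is why a page containing only ASCII characters is displayed the same way whether it is read as ASCII or as UTF-8.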