HTML URL Encode

A Uniform Resource Locator (URL) specifies an internet resource using a single line of printable ASCII characters. URLs can reference resources over all the major internet protocols, including FTP, HTTP, and HTTPS.

The URL is the basic tool of the World Wide Web. URLs are used in HTML documents to reference hypertext links. A URL contains the following information:

  1. The protocol, for example HTTP
  2. The domain name of the site on which the server is running
  3. The username and password, if the server requires them, given together with the domain name
  4. The port number of the server; if no port number is given, the default port of the indicated protocol is assumed
  5. The location of the resource

EXAMPLE

As an example consider the link below:

https://www.xyz.com/hypertext/www/abc/qwe.html

It references the file qwe.html in the directory /hypertext/www/abc, accessible at the server www.xyz.com using the HTTPS protocol.
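The parts of a URL can be pulled apart programmatically. The following is a minimal sketch using Python's standard urllib.parse module; the helper name url_parts and the second, FTP-style URL are assumptions made up for illustration and do not come from the example above.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Split a URL into the pieces listed above (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "username": parts.username,   # None when the URL has no username
        "password": parts.password,   # None when the URL has no password
        "domain": parts.hostname,
        "port": parts.port,           # None when the URL gives no port
        "location": parts.path,
    }

example = url_parts("https://www.xyz.com/hypertext/www/abc/qwe.html")
print(example["protocol"], example["domain"], example["location"])

# A made-up URL that also carries the optional username, password, and port.
full = url_parts("ftp://user:secret@ftp.xyz.com:2121/pub/readme.txt")
print(full["username"], full["password"], full["port"])
```

When the port is reported as None, the client falls back to the default port of the protocol, as described in item 4 of the list.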

Allowed characters in URLs

Every URL must be written using only printable ASCII characters; it cannot be written using the ISO Latin-1 character set. This restriction lets a URL be sent safely by electronic mail, because many electronic mail programs mishandle characters outside the printable ASCII range.

In a URL, non-ASCII characters are represented using character encoding. HTML character and entity references cannot be used in a URL.

URL Character Encoding

Any 8-bit character can be represented in a URL by an indirect reference, that is, by encoding. An ISO Latin-1 character can therefore be represented by a special character sequence of the following form:

%xx

In the above syntax, % indicates the start of the encoding and xx is the two-digit hexadecimal (hex) code of the character.

EXAMPLE

The character é has the ISO Latin-1 code E9 (hex), so its encoding is %E9.
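As a small sketch of the %xx rule, the snippet below uses quote and unquote from Python's standard urllib.parse module. The variable names are illustrative only. Note that the %E9 result depends on choosing ISO Latin-1 as the encoding; with UTF-8, which quote uses by default, the same character becomes two encoded bytes.

```python
from urllib.parse import quote, unquote

# Encode using the ISO Latin-1 code of the character, as in the text.
latin1_form = quote("é", encoding="latin-1")
print(latin1_form)   # %E9

# The same character encoded from its UTF-8 bytes (Python's default).
utf8_form = quote("é", encoding="utf-8")
print(utf8_form)     # %C3%A9

# Building the sequence by hand: '%' followed by the two-digit hex code.
manual_form = "%{:02X}".format(ord("é"))
print(manual_form)   # %E9

# Decoding must use the same character set that was used for encoding.
decoded = unquote(latin1_form, encoding="latin-1")
print(decoded)       # é
```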

Characters that are not allowed in URL

Some characters are not allowed in a URL and can be used only in encoded form, because they have special meanings in the text that surrounds the URL.

EXAMPLE

An HTML document uses the double quotation mark (") to delimit the URL in a hypertext anchor.

Therefore, a quotation mark is not allowed inside the URL. A space character is also not allowed, because many programs treat a space as a break between two strings. To demonstrate, consider the following example:

A file is saved as Network Information. The name contains a single space, which must be encoded in a URL as follows:

Network%20Information
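The same encoding can be produced with Python's standard urllib.parse.quote function. This is a minimal sketch; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

filename = "Network Information"

# quote() replaces the space with its percent-encoded form.
encoded = quote(filename)
print(encoded)            # Network%20Information

# unquote() reverses the encoding and restores the original name.
restored = unquote(encoded)
print(restored)           # Network Information
```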

The following ASCII characters are not allowed in a URL:

Character   Hex code    Character   Hex code
TAB         09          SPACE       20
"           22          <           3C
>           3E          [           5B
\           5C          ]           5D
^           5E          `           60
{           7B          |           7C
}           7D          ~           7E
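Each hex code in the table is simply the ASCII code of the character written in hexadecimal. The sketch below rebuilds the encodings from that rule and applies them to a string; the names UNSAFE, percent_encode, and encode_unsafe are hypothetical helpers written for this illustration, and the approach assumes single-byte (ASCII) characters only.

```python
# The characters from the table above.
UNSAFE = ["\t", " ", '"', "<", ">", "[", "\\", "]", "^", "`", "{", "|", "}", "~"]

def percent_encode(ch):
    """Return '%' plus the two-digit hex ASCII code of one character."""
    return "%{:02X}".format(ord(ch))

# Translation table: character code -> encoded form.
ENCODING_TABLE = {ord(ch): percent_encode(ch) for ch in UNSAFE}

def encode_unsafe(text):
    """Replace every character listed in the table with its encoded form."""
    return text.translate(ENCODING_TABLE)

for ch in UNSAFE:
    print(repr(ch), percent_encode(ch))

print(encode_unsafe('my file <draft>.txt'))   # my%20file%20%3Cdraft%3E.txt
```

This helper deliberately leaves % itself alone, so a string that already contains a literal percent sign needs that character encoded first.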

Special Characters

Some characters have special meanings in a URL. The best example is %, which denotes the start of a character encoding. The forward slash (/) also has a special meaning: it indicates a change in hierarchy, such as a directory change. These special characters must be encoded if you want them to appear literally in the URL.

EXAMPLE

For example, if you want to include the following string:

John%Ibraham

in a URL, you should encode the percent sign because it has a special meaning. The string therefore becomes:

John%25Ibraham

where %25 is the encoding for the percent character. If you want the effect of a special character, you should not encode it. The most common special characters used in URLs are as follows:

Character   Description
%           The escape character; it introduces an encoded character.
#           Separates the URL of a resource from a fragment identifier for that resource.
/           Indicates hierarchical structure.
?           Introduces a query string.

Other special characters in a URL are colon (:), semicolon (;), at (@), equals (=), and ampersand (&).
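The difference between keeping a special character's effect and encoding it literally can be sketched with Python's standard urllib.parse module. The sample values other than John%Ibraham (the path a/b and the query value a&b=c) are invented for illustration.

```python
from urllib.parse import quote, urlencode

# A literal percent sign must be encoded as %25.
name = quote("John%Ibraham")
print(name)               # John%25Ibraham

# By default quote() leaves '/' alone, so it keeps its hierarchical meaning.
as_path = quote("a/b")
print(as_path)            # a/b

# With safe="" the slash is treated as data and is encoded too.
as_data = quote("a/b", safe="")
print(as_data)            # a%2Fb

# urlencode() builds a query string; '&' and '=' inside values are encoded
# so they are not mistaken for the separators between parameters.
query = urlencode({"name": "John%Ibraham", "q": "a&b=c"})
print(query)              # name=John%25Ibraham&q=a%26b%3Dc
```

The choice of which characters to leave unencoded depends on where in the URL the text is placed, which is why quote exposes the safe parameter.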