HTML URL Encode
The URL is the basic tool of the World Wide Web. URLs are used in HTML documents to reference the targets of hypertext links. A URL contains the following information:
- The protocol, for example HTTP
- The domain name of the site on which the server is running
- The user name and password, if required, given together with the domain name
- The port number of the server; if no port number is given, the default port of the indicated protocol is assumed
- The location of the resource
As an example, consider the link below:
http://www.xyz.com/hypertext/www/abc/qwe.html
It references the file qwe.html in the directory /hypertext/www/abc, accessible at the server www.xyz.com using the HTTP protocol.
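The parts of a URL listed above can be pulled apart programmatically. The following is a minimal sketch using Python's standard urllib.parse module; the user name, password, and port 8080 are hypothetical values added only so that every component is visible, and the default_ports table is an illustrative assumption rather than a complete list.

```python
from urllib.parse import urlsplit

# Hypothetical URL: "user", "secret" and port 8080 are made up for illustration.
url = "http://user:secret@www.xyz.com:8080/hypertext/www/abc/qwe.html"
parts = urlsplit(url)

print(parts.scheme)    # the protocol
print(parts.username)  # user name, if present
print(parts.password)  # password, if present
print(parts.hostname)  # domain name of the server
print(parts.port)      # explicit port number, or None if omitted
print(parts.path)      # location of the resource

# When no port is written, the protocol's default is assumed.
# This small table is an assumption covering only two protocols.
default_ports = {"http": 80, "https": 443}
plain = urlsplit("http://www.xyz.com/hypertext/www/abc/qwe.html")
effective_port = plain.port or default_ports[plain.scheme]
print(effective_port)
```

Note that urlsplit only splits the string; it does not check that the server or the resource exists.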
Allowed characters in URLs
Every URL must be written using printable ASCII characters; characters from the ISO Latin-1 character set cannot be written directly. This restriction allows a URL to be sent by electronic mail, because many electronic mail programs mishandle non-ASCII characters.
In a URL, non-ASCII characters are represented using character encoding. HTML character and entity references cannot be used for this purpose.
URL Character Encoding
Any 8-bit character can be represented in a URL by an indirect reference, that is, by encoding. An ISO Latin-1 character can therefore be represented by a special character sequence as follows:
%xx
In the above syntax, % indicates the start of the encoding and xx is the hexadecimal (hex) code of the character to be used.
For example, the ISO Latin-1 encoding for the character é is %E9.
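As a quick check of the %E9 example, here is a small sketch using quote and unquote from Python's standard urllib.parse module. The choice of Python is an assumption of this example. Note that quote uses UTF-8 unless told otherwise, so the ISO Latin-1 result has to be requested explicitly.

```python
from urllib.parse import quote, unquote

e_acute = "\u00e9"  # the character é

# Encode using the ISO Latin-1 code of the character (one byte, 0xE9).
latin1_encoded = quote(e_acute, encoding="latin-1")
print(latin1_encoded)  # %E9

# The library default is UTF-8, which uses two bytes for the same character.
utf8_encoded = quote(e_acute)
print(utf8_encoded)  # %C3%A9

# Decoding must use the same character set that was used for encoding.
decoded = unquote("%E9", encoding="latin-1")
print(decoded == e_acute)  # True
```

The two results differ only because the byte representation of é differs between the two character sets; the %xx mechanism itself is the same.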
Characters that are not allowed in a URL
Some characters are not allowed in a URL and can be used only in encoded form, because they have special meanings in the text that surrounds the URL.
For example, an HTML document uses the double quotation mark (") to delimit the URL in a hypertext anchor, so a quotation mark is not allowed inside the URL. A space character is also not allowed, because many programs treat a space as a break between two strings. To demonstrate, consider the following example:
A file named Network Information contains a single space. This space must be encoded in the URL as follows:
Network%20Information
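The space and quotation mark cases can be reproduced with a short sketch, again assuming Python's standard urllib.parse module. The variable names are illustrative only.

```python
from urllib.parse import quote, unquote

filename = "Network Information"

# The single space is replaced by %20 (hex 20 is the ASCII code for space).
encoded_name = quote(filename)
print(encoded_name)  # Network%20Information

# A double quotation mark is encoded the same way (hex 22).
encoded_quote = quote('"')
print(encoded_quote)  # %22

# Decoding restores the original file name.
restored = unquote(encoded_name)
print(restored)  # Network Information
```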
The following table lists the ASCII characters that are not allowed in a URL.
|Character||Hex code||Character||Hex code|
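The hex code of an ASCII character is simply its character code written in base 16. The sketch below computes it for a few characters; hex_code is a hypothetical helper name, and the four sample characters are picked for illustration rather than taken from the table.

```python
def hex_code(ch: str) -> str:
    """Return the two-digit uppercase hex code of a single ASCII character."""
    code = ord(ch)
    if len(ch) != 1 or code > 0x7F:
        raise ValueError("expected a single ASCII character")
    return f"{code:02X}"

# Sample characters chosen for illustration.
for ch in [" ", '"', "<", ">"]:
    print(repr(ch), hex_code(ch), "%" + hex_code(ch))
```

Prefixing the hex code with % gives the encoded form that is written in the URL.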
Some characters have special meanings in a URL. The percent sign (%) is the best example: it denotes the start of a character encoding. The forward slash (/) also has a special meaning: it indicates a change in hierarchy, such as a change of directory. These special characters must be encoded if you want them to appear literally in the URL.
For example, if you want to include a string that contains a literal percent sign in a URL, you should encode the percent sign because it has a special meaning. The percent sign is therefore written as %25, which is the encoding for the percent character. If you want the effect of the special character, you should not encode it. The most common special characters used in a URL are as follows:
|%||This is the escape character for character encoding, used in all URLs.|
|#||This is used to separate the URL of a resource from the fragment identifier for that resource.|
|/||This is used to indicate hierarchical structures.|
|?||This is used to indicate a query string.|
Other special characters in a URL are colon (:), semicolon (;), at (@), equals (=), and ampersand (&).
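The difference between using a special character for its effect and encoding it so that it appears literally can be sketched as follows, assuming Python's standard urllib.parse module. The sample string, the query parameter name q, and the /search path are hypothetical.

```python
from urllib.parse import quote, unquote, urlencode

# Hypothetical text containing a literal percent sign and a literal slash.
segment = "50%/50 split"

# safe="" asks quote() to encode "/" too, so it appears as a literal character.
as_literal = quote(segment, safe="")
print(as_literal)  # 50%25%2F50%20split

# By default "/" is left alone and keeps its hierarchical meaning.
as_hierarchy = quote(segment)
print(as_hierarchy)  # 50%25/50%20split

# In a query string, "&" and "=" inside a value must be encoded,
# otherwise they would be read as separators.
query = urlencode({"q": "a&b=c"})
url = "http://www.xyz.com/search?" + query
print(url)  # http://www.xyz.com/search?q=a%26b%3Dc

# Decoding the fully encoded form gives back the original text.
print(unquote(as_literal) == segment)  # True
```

In both calls the percent sign becomes %25, since a bare % would otherwise be read as the start of an encoding.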