Overview

URL stands for “Uniform Resource Locator,” commonly known as a web address. It represents the internet address of various resources. Here’s a typical URL:

https://www.example.com/path/index.html

Resources can be understood as any files accessible via the internet, such as web pages, images, audio, video, JavaScript scripts, and more. Knowing their URLs is essential to access them on the internet.
Any resource accessible through the internet must have a corresponding URL. While one URL corresponds to one resource, a single resource may have multiple URLs.
URLs are fundamental to the internet. The internet is “interconnected” precisely because web pages can include “links” to other URLs, so users can jump from one URL to another and visit different websites with a single click.

Components of a URL

URLs consist of multiple parts. Here’s a complex URL example (actual URLs are usually simpler):

https://www.example.com:80/path/to/myfile.html?key1=value1&key2=value2#anchor

Let’s break down the parts of this URL:

Protocol

The protocol (or scheme) is the method the browser uses to request resources from the server. In our example, it’s the https:// part, indicating the use of the HTTPS protocol.
The internet supports multiple protocols, so a URL must specify which one it uses. If you omit the protocol and enter www.example.com in the browser’s address bar, the browser fills one in for you: traditionally http://www.example.com, although many current browsers now try HTTPS first. HTTPS is the encrypted version of HTTP, and for security reasons more and more websites are adopting it.
HTTP and HTTPS protocol names are followed by a colon and two forward slashes (://). Other protocols may differ; for example, the email address protocol mailto: is followed by only a colon, as in mailto:foo@example.com.

Host

The host is the name of the website or server where the resource is located, also known as the domain name. In our example, the host is www.example.com.
Some hosts don’t have domain names and only use IP addresses (e.g., 192.168.2.15). This is common in local area networks.

Port

A single host may serve multiple websites, distinguished by their ports. A “port” is an integer that tells the server which specific website the visitor wants to access. The default port is 80 for HTTP and 443 for HTTPS; if the port is omitted, the browser connects to the default port for the protocol.
The port follows the domain name, separated by a colon, like www.example.com:80.
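The default-port rule can be sketched with Python's standard urllib.parse module. The DEFAULT_PORTS table and the effective_port helper are names made up for this illustration, and the table covers only the two protocols discussed here:

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical lookup table for this sketch: only HTTP and HTTPS are covered.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url: str) -> Optional[int]:
    """Return the explicit port if the URL has one, else the protocol's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("http://www.example.com/"))      # 80
print(effective_port("https://www.example.com/"))     # 443
print(effective_port("https://www.example.com:80/"))  # 80 (explicit port wins)
```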

Path

The path represents the resource’s location on the website. For example, /path/index.html points to the index.html file in the /path subdirectory of the website.
In the early days of the internet, paths represented actual physical locations. Now, servers can simulate these locations, so paths are often virtual.
Paths may include only directories without filenames (e.g., /foo/), and the trailing slash can even be omitted. In such cases, servers typically default to the index.html file in that directory (equivalent to requesting /foo/index.html), but this depends on server settings.

Query Parameters

Query parameters provide additional information to the server. They appear after the path, separated by a question mark (?). In our example, it’s ?key1=value1&key2=value2.
A URL can carry one query parameter or several. Each parameter is a key-value pair, with the key and value connected by an equals sign (=). For example, key1=value1 is a key-value pair where key1 is the key and value1 is the value.
Multiple parameters are separated by ampersands (&), like key1=value1&key2=value2.
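As a small sketch of how these key-value pairs can be read and written programmatically, here is the example query string handled with Python's standard urllib.parse module (the variable names are illustrative only):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

url = "https://www.example.com:80/path/to/myfile.html?key1=value1&key2=value2#anchor"

# Everything between "?" and "#" is the query string.
query = urlsplit(url).query
print(query)    # key1=value1&key2=value2

# Split on "&" and "=" into a list of (key, value) pairs.
pairs = parse_qsl(query)
print(pairs)    # [('key1', 'value1'), ('key2', 'value2')]

# Join the pairs back into a query string.
rebuilt = urlencode(pairs)
print(rebuilt)  # key1=value1&key2=value2
```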

Anchor

An anchor (or fragment identifier) is a location marker within a web page. It’s denoted by a hash symbol (#) followed by the anchor name, placed at the end of the URL (e.g., #anchor). After loading the page, the browser automatically scrolls to the anchor’s location.
Anchor names are typically defined by the id attribute of HTML elements, which we’ll cover in the “Element Attributes” chapter.
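To tie the parts of this section together, the following sketch splits the example URL into its components with Python's standard urllib.parse module. The dictionary keys simply reuse this section's headings; the library itself calls the parts scheme, hostname, port, path, query, and fragment.

```python
from urllib.parse import urlsplit

url = "https://www.example.com:80/path/to/myfile.html?key1=value1&key2=value2#anchor"
parts = urlsplit(url)

# Map the library's attribute names onto the terms used in this section.
components = {
    "protocol": parts.scheme,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query parameters": parts.query,
    "anchor": parts.fragment,
}

for name, value in components.items():
    print(f"{name}: {value}")
```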

URL Characters

URLs can only use the following characters:

  • 26 English letters (uppercase and lowercase)
  • 10 Arabic numerals
  • Hyphen (-)
  • Period (.)
  • Underscore (_)
  • Tilde (~)

Additionally, 18 characters are reserved for specific uses in URLs. For example, the question mark (?) can only appear at the beginning of query parameters. If these reserved characters need to be used elsewhere in a URL, they must be escaped.
URL character escaping is done by prepending a percent sign (%) to the character’s hexadecimal ASCII code. Here are the 18 reserved characters and their escaped forms:

  • ! → %21
  • # → %23
  • $ → %24
  • & → %26
  • ' → %27
  • ( → %28
  • ) → %29
  • * → %2A
  • + → %2B
  • , → %2C
  • / → %2F
  • : → %3A
  • ; → %3B
  • = → %3D
  • ? → %3F
  • @ → %40
  • [ → %5B
  • ] → %5D

For example, if a web page’s file name is foo?bar.html (containing a question mark), its URL should be written as foo%3Fbar.html.
While legal URL characters can also be escaped, it’s not recommended. For instance, the letter ‘a’ could be written as %61, but it’s unnecessary.
Note that a space is escaped as %20. This is crucial for filenames containing spaces.
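The escaping rule can be checked with a short sketch using quote() from Python's standard urllib.parse module. Passing safe="" is a choice made for this illustration so that every reserved character, including the slash, is escaped; the RESERVED string is simply the 18 characters listed above.

```python
from urllib.parse import quote, unquote

# The 18 reserved characters listed above.
RESERVED = "!#$&'()*+,/:;=?@[]"

# safe="" tells quote() not to exempt any reserved character from escaping.
for ch in RESERVED:
    print(ch, quote(ch, safe=""))

print(quote("foo?bar.html", safe=""))  # foo%3Fbar.html
print(quote("my file.html", safe=""))  # my%20file.html
print(quote("a", safe=""))             # a (legal characters are left alone)
print(unquote("%61"))                  # a
```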

Characters that are neither legal URL characters nor reserved characters (such as Chinese characters) theoretically don’t need manual escaping and can be written directly in URLs. For example, www.example.com/中国.html. Browsers will automatically escape these characters when sending the request to the server. The escaping method uses the hexadecimal UTF-8 encoding of these characters. Each two digits are treated as a group, and each group is prefixed with a percent sign (%).

For instance, the UTF-8 hexadecimal encoding of the Chinese character 中 is e4b8ad. Grouping every two digits and adding percent signs gives %e4%b8%ad, so wherever 中 appears in a URL, it is sent as %e4%b8%ad. The URL www.example.com/中国.html is therefore sent as:

www.example.com/%e4%b8%ad%e5%9b%bd.html

In this example, 中 is escaped as %e4%b8%ad, and 国 is escaped as %e5%9b%bd.
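The byte-by-byte procedure can be reproduced in a few lines of Python. The percent_encode_utf8 helper is a hypothetical name written for this sketch; the standard library's quote() does the same job but emits uppercase hex digits, which are equivalent in a percent escape.

```python
from urllib.parse import quote, unquote

def percent_encode_utf8(text: str) -> str:
    """Escape every UTF-8 byte of text as % followed by two hex digits."""
    return "".join(f"%{byte:02x}" for byte in text.encode("utf-8"))

print("中".encode("utf-8").hex())          # e4b8ad
print(percent_encode_utf8("中国"))         # %e4%b8%ad%e5%9b%bd
print(quote("中国.html"))                  # %E4%B8%AD%E5%9B%BD.html
print(unquote("%e4%b8%ad%e5%9b%bd.html"))  # 中国.html
```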

Absolute vs Relative URLs

URLs come in two types: absolute and relative.
Absolute URLs contain complete information to locate a resource, including protocol, host, path, etc. All previous examples were absolute URLs.
Relative URLs don’t include full location information and must be combined with the current page’s location to determine the resource’s location. For example, if the current page is https://www.example.com/path/index.html and it contains a resource with URL a.html, this is a relative URL. The browser assumes a.html is in the same subdirectory as the current page, resulting in the absolute URL https://www.example.com/path/a.html.
Relative URLs starting with a slash (/) indicate the website’s root directory. Otherwise, they’re calculated from the current directory. For instance, /foo/bar.html refers to the foo subdirectory in the root, while foo/bar.html refers to the foo subdirectory in the current directory.
URLs can also use two special shortcuts:

  • . : current directory (e.g., ./a.html for a.html in the current directory)
  • .. : parent directory (e.g., ../a.html for a.html in the parent directory)

These shortcuts can be chained, like ../../ to indicate two directories up.
Absolute URLs can also use these shortcuts, e.g., www.example.com/./index.html is equivalent to www.example.com/index.html. In this case, the dot (.) represents the current directory of the root directory, which is essentially the root directory itself.
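The resolution rules in this section can be tried out with urljoin() from Python's standard urllib.parse module, which combines a base URL with a relative one. The second page URL (with the /a/b/ directories) is made up for this sketch to show chained shortcuts.

```python
from urllib.parse import urljoin

page = "https://www.example.com/path/index.html"

# No leading slash: resolved from the current page's directory.
print(urljoin(page, "a.html"))         # https://www.example.com/path/a.html
print(urljoin(page, "foo/bar.html"))   # https://www.example.com/path/foo/bar.html

# Leading slash: resolved from the website's root directory.
print(urljoin(page, "/foo/bar.html"))  # https://www.example.com/foo/bar.html

# The . and .. shortcuts.
print(urljoin(page, "./a.html"))       # https://www.example.com/path/a.html
print(urljoin(page, "../a.html"))      # https://www.example.com/a.html

# Chained shortcuts, from a hypothetical page two directories deep.
deep_page = "https://www.example.com/a/b/c.html"
print(urljoin(deep_page, "../../d.html"))  # https://www.example.com/d.html
```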

<base>

The <base> tag specifies the base URL for all relative URLs within a web page. Only one <base> tag can be used per page, and it must be placed inside the <head> section. It’s a self-closing tag, meaning it doesn’t have a closing tag. Here’s an example:

<head>
<base href="https://www.example.com/files/" target="_blank">
</head>

The href attribute of the <base> tag provides the base URL for calculations, while the target attribute specifies how links should be opened (see the “Links” chapter for more details). With the base URL set to https://www.example.com/files/, a relative URL like foo.html would be converted to the absolute URL https://www.example.com/files/foo.html.
Note that the <base> tag must have at least one of the href or target attributes:

<base href="http://foo.com/app/">
<base target="_blank">

Once set, the <base> tag affects the entire web page. To change the behavior of a specific link, you must use an absolute URL instead of a relative one. Pay special attention to anchor links: they are also calculated relative to the <base> URL rather than the current page’s URL, so a link to #anchor may lead away from the current page.
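The effect of a base URL, including the anchor-link pitfall, can be approximated with urljoin() from Python's standard urllib.parse module. This is only a sketch of the calculation, not of how a browser is implemented, and the page_url and other.example.org values are invented for the illustration.

```python
from urllib.parse import urljoin

# Hypothetical location of the page that contains the <base> tag.
page_url = "https://www.example.com/docs/page.html"
base_href = "https://www.example.com/files/"

# Relative URLs are resolved against the base, not against the page.
print(urljoin(base_href, "foo.html"))  # https://www.example.com/files/foo.html

# Anchor links: without <base> they stay on the current page...
print(urljoin(page_url, "#anchor"))    # https://www.example.com/docs/page.html#anchor
# ...but with <base> they point at the base URL instead.
print(urljoin(base_href, "#anchor"))   # https://www.example.com/files/#anchor

# Absolute URLs are unaffected by the base.
print(urljoin(base_href, "https://other.example.org/x.html"))
```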

Link to original article:
https://wangdoc.com/html/url