HyperText Transfer Protocol

HTTP is the protocol of the web. It specifies rules for web clients to retrieve content from web servers. HTTP most commonly uses port 80. Traditionally, web browsers used port 80 by default when the URL did not specify a port. More recently, browsers have begun trying HTTPS first when the typed address does not specify a scheme; HTTPS, the encrypted version of HTTP, uses port 443 by default. HTTP is based on requests generated by a client and responses generated by a server, which enables flexible and efficient communication.
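The port a client connects to depends on whether the URL names one explicitly and, if it does not, on the scheme. A minimal sketch using Python's standard urllib.parse module; the helper name effective_port and the DEFAULT_PORTS table are illustrative assumptions, not part of any library:

```python
from urllib.parse import urlsplit

# Default ports for the two web schemes (assumption: only http/https are handled).
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port a client would connect to for `url` (hypothetical helper)."""
    parts = urlsplit(url)
    # An explicit port in the URL always wins.
    if parts.port is not None:
        return parts.port
    # Otherwise fall back to the scheme's default port.
    return DEFAULT_PORTS[parts.scheme.lower()]


print(effective_port("http://www.example.com/"))       # 80
print(effective_port("https://www.example.com/"))      # 443
print(effective_port("http://www.example.com:8080/"))  # 8080
```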

When we visit a page in a web browser, the browser sends an HTTP request to the web server. The server sends back an HTTP response that contains the page's HTML, which describes the content of the webpage. The browser then parses the HTML, requests any additional resources it references (such as images, JavaScript files, or CSS files), and renders the final page for the user.
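The browser builds the request for us, but an HTTP/1.1 request is just text: a request line, headers (Host is mandatory in HTTP/1.1), and an empty line, each line ending in CRLF. A minimal sketch that serializes a GET request like the one in the example below; build_get_request is a hypothetical helper, and it handles only body-less GET requests:

```python
from typing import Dict, Optional


def build_get_request(host: str, path: str, headers: Optional[Dict[str, str]] = None) -> bytes:
    """Serialize a minimal HTTP/1.1 GET request (hypothetical helper)."""
    # Request line (method, target, version), then the mandatory Host header.
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends in CRLF, and an empty line ends the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_get_request(
    "www.example.com",
    "/users?language=english",
    {"Connection": "close", "Accept-Language": "en-US,en;q=0.9"},
)
print(request.decode("ascii"))
```

These bytes are what would be written to a TCP connection to port 80; actually sending them is left out here.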

Example

Request:

GET /users?language=english HTTP/1.1
Host: www.example.com
Connection: close
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9

Response:

HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Thu, 17 Aug 2023 17:47:30 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 2739

<!DOCTYPE html>
<html>
    <body>
        <h1>Users</h1>
        <ul>
          <li>Alice</li>
          <li>Bob</li>
        </ul>
    </body>
</html>
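A response has the same shape as a request: a status line, headers, an empty line, then the body. A minimal sketch of splitting a raw response into those parts; parse_response is a hypothetical helper that ignores chunked transfer encoding and keeps only the last value of a repeated header, and the sample response is a shortened version of the one above with a matching Content-Length:

```python
from typing import Dict, Tuple


def parse_response(raw: bytes) -> Tuple[int, str, Dict[str, str], bytes]:
    """Split a raw HTTP/1.1 response into status code, reason, headers and body."""
    # The header section ends at the first empty line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    # ISO-8859-1 maps every byte to a character, so header decoding never fails.
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line, e.g. "HTTP/1.1 200 OK"; the reason phrase may be missing.
    version, status, *rest = lines[0].split(" ", 2)
    reason = rest[0] if rest else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body


page = b"<h1>Users</h1>"
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: nginx/1.14.0 (Ubuntu)\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: " + str(len(page)).encode("ascii") + b"\r\n"
    b"\r\n" + page
)

status, reason, headers, body = parse_response(raw)
print(status, reason)         # 200 OK
print(headers["server"])      # nginx/1.14.0 (Ubuntu)
print(body.decode("utf-8"))   # <h1>Users</h1>
```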

Methods

Method   Description
GET      Requests a representation of a resource; this is how a client (like a web browser) asks a web server for information.
POST     Submits data for the server to process, typically causing a state change (for example, submitting a form).
PUT      Creates or replaces the target resource with the content supplied by the client.
DELETE   Removes the requested resource.
OPTIONS  Returns the communication options allowed by the server, including the methods it accepts.
HEAD     Same as GET, but the server returns only the status line and headers, without the response body.
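Python's standard urllib.request module illustrates how the method is chosen for a request. In this sketch, constructing a Request object sends nothing; it only describes the request. With no explicit method, get_method() returns GET when there is no body and POST when there is one. The URL paths and form data are hypothetical:

```python
from urllib.request import Request

url = "http://www.example.com/users"

get_req = Request(url)                                 # no data -> GET
post_req = Request(url, data=b"name=Carol")            # has data -> POST
head_req = Request(url, method="HEAD")                 # explicit method
delete_req = Request(url + "/2", method="DELETE")      # hypothetical resource path

for req in (get_req, post_req, head_req, delete_req):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to urllib.request.urlopen() would actually send it.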
Note

It is entirely up to the server to decide how to interpret and handle a request. For example, data can be sent to a server in a GET request (in the query string), which some phishing pages use to fly under the radar. Because of this, do not rely on the method alone; verify what the request actually carries and how the server handles it, for example through packet inspection.
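A minimal sketch of data travelling in a GET request: the values are placed in the query string, and the receiving side reads them back out of the URL. The form fields and login path are hypothetical; urlencode, urlsplit and parse_qs are standard urllib.parse functions:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical form data that a page sends with GET instead of POST.
form = {"user": "alice", "password": "hunter2"}
url = "http://site.com/login?" + urlencode(form)
print(url)  # http://site.com/login?user=alice&password=hunter2

# On the receiving side, the "data" is simply the query string.
received = parse_qs(urlsplit(url).query)
print(received)  # {'user': ['alice'], 'password': ['hunter2']}
```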

Responses

Status Code  Meaning
100 to 199   Informational
200 to 299   Success
300 to 399   Redirection
400 to 499   Client Error
500 to 599   Server Error
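Because the class of a status code is determined by its first digit, classifying one is a single integer division. A minimal sketch; status_class is a hypothetical helper, and it also covers the 1xx (Informational) range:

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Map an HTTP status code to its class using its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


print(status_class(200))  # Success
print(status_class(401))  # Client Error
```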
Note

Note that a page's status code is chosen by the application's developer. For example, a request from an unauthenticated user for a document that requires authentication should usually receive a 401 Unauthorized response. However, the developer can program the application to respond with 200 OK instead. Because of this, do not rely on the status code alone when determining whether a user has access to a given resource.
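One way to avoid trusting the status code alone is to also inspect the body. A minimal heuristic sketch under stated assumptions: LOGIN_MARKER is a hypothetical string that, in some example application, appears only on its login page, and treating redirects as "denied" is likewise an assumption about that application:

```python
# Hypothetical marker: in this example application, only the login page contains it.
LOGIN_MARKER = '<form action="/login"'


def looks_accessible(status: int, body: str) -> bool:
    """Heuristic access check that does not trust the status code alone."""
    if status in (401, 403):
        return False  # the server explicitly refused access
    if 300 <= status < 400:
        return False  # assumption: redirects here lead to the login page
    # A 200 OK is not proof of access; the app may serve its login page with it.
    return LOGIN_MARKER not in body


print(looks_accessible(200, "<h1>Users</h1>"))                               # True
print(looks_accessible(200, '<form action="/login" method="post"></form>'))  # False
```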

URL Encoding

Web browsers interpret text within a URL in one of two ways:

  1. As raw text.
  2. As special characters that perform a particular function within the URL.

These special characters are:

Character  Description
?          The query string identifier, which starts a query string. Example: http://site.com?search
=          Separates a parameter from its value. Example: language=english
&          Separates parameter/value pairs, appending another pair. Example: language=english&color=blue
/          Indicates a hierarchical directory structure or HTTP route. Example: http://site.com/folder/page.txt
.          Part of a directory structure. A single period (.) represents the current directory and two periods (..) represent the parent directory.
:          Separates the scheme (protocol) from the rest of the URL, and the host from the port. Example: http://site.com:8080
%          Introduces a URL-encoded (percent-encoded) character. Example: %20
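Python's urllib.parse.urlsplit splits a URL at exactly these special characters. In the sketch below, the URL is hypothetical, built from the examples in the table. posixpath.normpath is used to show how "." and ".." segments collapse; it is a close approximation of, not identical to, the dot-segment removal rules for URLs:

```python
import posixpath
from urllib.parse import parse_qs, urlsplit

url = "http://site.com:8080/folder/./sub/../page.txt?language=english&color=blue"
parts = urlsplit(url)

print(parts.scheme)           # http  (before the first ":")
print(parts.hostname)         # site.com
print(parts.port)             # 8080  (after the ":" following the host)
print(parts.path)             # /folder/./sub/../page.txt
print(parts.query)            # language=english&color=blue  (after "?")
print(parse_qs(parts.query))  # {'language': ['english'], 'color': ['blue']}

# "." stays in the current directory; ".." moves up one level.
print(posixpath.normpath(parts.path))  # /folder/page.txt
```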

But what happens if we want to use a literal question mark as part of a URL? To use any of these reserved characters literally in a URL, we need to URL encode them.

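URL encoding (also called percent-encoding) works by replacing each reserved or otherwise disallowed character with a "%" sign followed by the two-digit hexadecimal value of each of its bytes. For ASCII characters this is simply the ASCII code; non-ASCII characters are first encoded as UTF-8, so ñ becomes %C3%B1. For example, the space in "URL encoding" is 0x20 in ASCII, so the string becomes URL%20encoding.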

In Python, we can do this with the urllib.parse module from the standard library:

$ python -q
>>> import urllib.parse
>>> urllib.parse.quote('/El Niño/', safe='')
'%2FEl%20Ni%C3%B1o%2F'
>>> exit()
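To show what quote() does with safe='', here is a hand-rolled version: every UTF-8 byte outside the "unreserved" set (letters, digits, and - . _ ~) becomes %XX. percent_encode is a hypothetical helper written for illustration; quote and unquote are the real standard-library functions, used to cross-check and decode:

```python
from urllib.parse import quote, unquote

# Unreserved characters never need encoding.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Encode every non-unreserved byte of text's UTF-8 form as %XX."""
    out = []
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if char in UNRESERVED:
            out.append(char)
        else:
            out.append(f"%{byte:02X}")  # two uppercase hex digits per byte
    return "".join(out)


encoded = percent_encode("/El Niño/")
print(encoded)                                 # %2FEl%20Ni%C3%B1o%2F
print(encoded == quote("/El Niño/", safe=""))  # True
print(unquote(encoded))                        # /El Niño/
```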

Headers

Request Headers

Header           Description
User-Agent       Identifies the client software (browser, version, operating system) to the server. Some servers only allow specific User-Agents to access their resources. However, since we can intercept and edit requests with a proxy, we can often trick the server and bypass this restriction.
Origin           Tells the server the origin (scheme, host, and port) from which a request was initiated.
Accept-Language  Tells the server which natural language(s) the client prefers in the response. This header can help a system administrator or defender learn more about the potential location of an attacker.
Accept-Encoding  Tells the server which encodings the client understands. In particular, Accept-Encoding lets the client and server negotiate which compression algorithm (for example, gzip) is used to transmit data.
Forwarded        Added by proxies to pass along information that would otherwise be lost when a request goes through a proxy, such as the original client's IP address and protocol. Because it can expose the client's IP address, it is privacy-sensitive.
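Accept-Language and Accept-Encoding (and Accept in the example request) carry comma-separated lists in which each item may have a q weight between 0 and 1; an item without one has weight 1. A minimal sketch of reading such a list in preference order; parse_quality_list is a hypothetical helper, and it does not treat q=0 ("not acceptable") specially:

```python
from typing import List, Tuple


def parse_quality_list(value: str) -> List[Tuple[str, float]]:
    """Parse e.g. 'en-US,en;q=0.9' into (item, q) pairs, most preferred first."""
    items = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        # The item comes first; parameters such as q=0.9 follow after ";".
        item, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # no explicit q-value means weight 1
        for param in params:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                q = float(val)
        items.append((item, q))
    # sorted() is stable, so items with equal weight keep their original order.
    return sorted(items, key=lambda pair: pair[1], reverse=True)


print(parse_quality_list("en-US,en;q=0.9"))  # [('en-US', 1.0), ('en', 0.9)]
print(parse_quality_list("gzip, deflate"))   # [('gzip', 1.0), ('deflate', 1.0)]
```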

Response Headers

Header                     Description
Server                     Often reveals the name (and sometimes the version) of the web server software, as in nginx/1.14.0 (Ubuntu) in the example above.
Proxy-Authenticate         Sent by a proxy (with a 407 Proxy Authentication Required status) to tell the client which authentication scheme to use to access a resource behind the proxy.
Proxy-Authorization        Strictly a request header: the client's answer to Proxy-Authenticate, carrying its credentials for the proxy. With the Basic scheme, the credentials are only base64-encoded, not encrypted.
Strict-Transport-Security  Also known as HSTS; tells a client that it should only access the server via HTTPS, the encrypted version of HTTP.
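Base64 is an encoding, not encryption, so anyone who captures a Basic credentials value can reverse it. A minimal sketch using Python's base64 module; basic_credentials and decode_basic are hypothetical helpers, and "Aladdin" / "open sesame" is the well-known sample pair from the Basic authentication specification:

```python
import base64
from typing import Tuple


def basic_credentials(user: str, password: str) -> str:
    """Build a Basic-scheme credentials value, e.g. for Proxy-Authorization."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic(value: str) -> Tuple[str, str]:
    """Recover the user and password from a Basic credentials value."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not Basic credentials")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header = basic_credentials("Aladdin", "open sesame")
print(header)                # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(decode_basic(header))  # ('Aladdin', 'open sesame')
```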

X-Headers

Several headers start with X-, as in X-Requested-With. The X- prefix was historically used to mark non-standard HTTP headers, and such headers often reveal interesting information about the software used by the web application.

For example, the X-Requested-With header usually indicates an Ajax request, and the X-amz-cf-id header indicates that the application is served through Amazon CloudFront.
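When reviewing a response, X- headers can be picked out with a case-insensitive prefix match, since header names are case-insensitive. A minimal sketch; the header list and its values are hypothetical, and nonstandard_headers is an illustrative helper:

```python
# Response headers as (name, value) pairs; all values here are hypothetical.
response_headers = [
    ("Content-Type", "text/html; charset=utf-8"),
    ("X-Powered-By", "Express"),
    ("X-Amz-Cf-Id", "EXAMPLE-ID"),
    ("Server", "nginx/1.14.0 (Ubuntu)"),
]


def nonstandard_headers(headers):
    """Return the X- headers; names are compared case-insensitively."""
    return [(name, value) for name, value in headers if name.lower().startswith("x-")]


for name, value in nonstandard_headers(response_headers):
    print(f"{name}: {value}")
```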

Cookies

HTTP cookies are a common way to maintain state throughout a series of HTTP requests for a particular user. Most web applications use cookies as part of their authentication framework, as HTTP by itself is stateless. Data from one HTTP request does not influence the next request sent by the same user.

The site could ask the user for their username and password on every request, but that would be tedious. Instead, the server can give the browser a token to send with every request. This authentication flow starts when a user submits a username and password to the server. The server validates the credentials and creates a session identified by a random token. The token is returned to the user as a cookie, set through the Set-Cookie header of the HTTP response to the login request. The browser saves this cookie and includes it in the Cookie header of all subsequent requests to the site. When the user navigates to a new page, the server reads the cookie value from the request and looks up the session. Using this information, the server knows which user sent the request.
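The login flow described above can be sketched on the server side with Python's secrets and http.cookies modules. This is a minimal sketch under stated assumptions: USERS, SESSIONS, login and user_for_request are hypothetical, the stores live in memory, and passwords are compared in plain text only to keep the sketch short (real applications store and compare password hashes):

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory stores; a real application would use a database.
USERS = {"alice": "correct horse battery staple"}
SESSIONS = {}  # session token -> username


def login(username, password):
    """Validate credentials; return a Set-Cookie header value, or None."""
    if USERS.get(username) != password:  # sketch only: compare hashes in practice
        return None
    token = secrets.token_hex(16)  # 128 random bits as 32 hex characters
    SESSIONS[token] = username
    cookie = SimpleCookie()
    cookie["session"] = token
    cookie["session"]["httponly"] = True  # hide the cookie from page scripts
    return cookie["session"].OutputString()


def user_for_request(cookie_header):
    """Look up which user sent a request, given its Cookie header."""
    morsel = SimpleCookie(cookie_header).get("session")
    return SESSIONS.get(morsel.value) if morsel else None


set_cookie = login("alice", "correct horse battery staple")
print("Set-Cookie:", set_cookie)
# The browser sends back only the name=value pair in its Cookie header.
cookie_header = set_cookie.split(";")[0]
print(user_for_request(cookie_header))  # alice
print(user_for_request("session=3"))    # None
```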

Warning

Since browsers store cookies locally, we can easily modify their values. If an application uses cookies insecurely, we might be able to manipulate the cookie values and make the application do something unintended.

In secure implementations, cookie values are randomly generated and serve only as lookup keys for server-side data. Insecure implementations may use predictable values instead. For example, if we log in to an application and get a cookie with a session identifier of 3, we can assume that sessions with identifiers 1 and 2 exist. We would have a much harder time guessing another session identifier if ours were b9edc541a87be432175defdf08aec342.


Relevant Note(s): Network Protocols