HTTP: Requests and responses
Recap Materials
You should know all of this, but I have added it for reference.
In the previous article we introduced HTTP, the protocol used for the majority of communication between clients and servers on the web. In this article we examine how a client requests data from a server using HTTP, and the format of the response the server replies with.
HTTP Methods
The HTTP methods and headers described here are the most commonly used parts of the HTTP specification, and should cover the majority of the situations you encounter. For those interested, the full HTTP specification (available to download at the bottom of the page) gives detailed information on how HTTP requests are structured, and the valid parameters we can use.
The diagram below (from step 1.3) gives an overview of the interaction between the client and the server when requesting a webpage.
This request-response flow is used for almost all web-based communication, whether browsing a static website, searching Google, or getting data on dynamic websites using AJAX and REST (Representational State Transfer).
HTTP Requests
HTTP requests form the basis for all communication between devices on the web. The client will request a given resource (or page) using a URL. The request should contain all the data the server requires to properly serve the response.
Request Methods
The web would be a pretty boring place if we could only request single static pages. Request methods allow us to interact with the server in different ways, and do more than just request a page.
- GET requests are used to request content and can send small amounts of data.
- POST requests are used to send larger amounts of data to the server (they can also retrieve a page).
- HEAD requests return just the HTTP headers, without the response body.
Other request methods such as PUT, DELETE and OPTIONS are also implemented, but we will not use them frequently here (although REST-based APIs can provide a good source of amusement and information disclosure).
HTTP Request Headers
Computers are stupid: unless they are given a specific set of information, they don't know how to respond, so we need to formalise the request.
A standard HTTP request will contain:
- HTTP Method -- what operation the client wants to perform
- Resource -- what the Client is trying to access
- Host -- which site on the server we are connecting to (this may not be needed depending on the server and HTTP Version)
- Headers -- other information for the request, such as browser version
- HTTP Body -- (optional) content that is being sent to the server
Some of this information is optional, or added by the browser or the tool we are using to talk to the server. For example, the user-agent header will be completed by the browser that we are using.
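To make the list above concrete, here is a minimal sketch that assembles those pieces into the text of an HTTP/1.1 request. The helper name `build_request` and the `example-client/0.1` user agent are assumptions made up for illustration; they are not part of any library.

```python
def build_request(method, resource, host, headers=None, body=""):
    """Assemble a raw HTTP/1.1 request as text (illustrative helper)."""
    # Request line (method, resource, protocol version), then the Host header.
    lines = [f"{method} {resource} HTTP/1.1", f"Host: {host}"]
    # Any further headers, one "Name: value" pair per line.
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # Tell the server how many bytes of body to expect.
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # A blank line separates the headers from the (optional) body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


request = build_request(
    "GET", "/get", "httpbin.org", {"User-Agent": "example-client/0.1"}
)
print(request)
```

Printing the result shows the request line first, then the headers, then the blank line that ends the header section.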
Examining Request Headers
Let's examine some request headers to see the types of data that are sent. We will need to request a page from a website, so we will use httpbin, which is designed for testing the data sent in HTTP requests.
All of the major request methods (GET, POST, HEAD etc) will send the same core set of request headers, with the different functionality encapsulated in either optional headers, or inside the request body.
Request headers sent by a browser can be viewed without any plugins, using:
- Chrome developer tools (Network tab, click on the file to see its request headers)
- Firefox developer tools (Network tab, click the file)
If we visit http://httpbin.org/get/ and examine the request headers in the browser we get this:
Host: httpbin.org
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
The only mandatory line is:
- Host -- the host name of the server we wish to talk to. This is used if there are multiple websites on the same server, and is mandatory when making HTTP 1.1 and above requests.
Note
While the Host header is not mandatory for HTTP 1.0, we can consider that version of the protocol obsolete.
Optional, but interesting lines are:
- Referer -- tells the server where the request came from (this is not a typo; it was misspelled in the original HTTP specification and has become standard since). It is not present in the example above.
- User-Agent -- provides information about the browser making the request. More history here, as most browsers will include the Mozilla string (thank Netscape for that; see Further reading)
- Accept -- tells the server the types of data the client will accept in the response
- Cookie -- isn't shown here, but it is used to send cookie values back to the server
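As a rough sketch of how a tool might read these lines, the snippet below splits the request headers shown above into a dictionary. The helper name `parse_headers` is an assumption for illustration. Header names are lower-cased because HTTP treats them as case-insensitive.

```python
RAW_HEADERS = """\
Host: httpbin.org
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
"""


def parse_headers(raw):
    """Turn "Name: value" lines into a dict keyed by lower-cased name."""
    headers = {}
    for line in raw.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only: values may contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


headers = parse_headers(RAW_HEADERS)
print(headers["host"])

# Accept is a comma-separated list of media types; drop the ";q=" weights.
accepted = [item.split(";")[0] for item in headers["accept"].split(",")]
print(accepted)
```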
Request Parameters
Request parameters allow us to send information to the server that can be used to change how the server will respond to the request. I like to think of request parameters as variables in an Object-Oriented (OO) language.
Note
Without a way of sending these parameters to the server the web would be a very different place.
Dynamic content would be impossible, with the content instead being static (like the pages in a book), so you could only find information if you had the specific link. Quite how we would find these pages would be interesting, as we wouldn't be able to ask Google (or Google would be like the old yellow pages and madness would ensue).
Consider the following HTML log-in form:
<form>
  <div class="form-group">
    <label for="username">Username</label>
    <input type="text" id="username" name="username">
  </div>
  <div class="form-group">
    <label for="password">Password</label>
    <input type="password" id="password" name="password">
  </div>
  <button type="submit">Log in</button>
</form>
When we submit the form, the values in the username and password boxes are sent to the server as parameters. The server can then act on the data that has been sent, and respond appropriately.
Using an OO analogy to examine what could happen on the server:
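def login(username, password):
    # Compare the submitted parameters against the expected credentials.
    # (The hard-coded values are for illustration only.)
    if username == "dang" and password == "sekret":
        return "Login Successful"
    else:
        return "Login Fails"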
So our login function takes two parameters and, depending on what is supplied, either returns a success or fail message.
To bring this back to HTTP, this means we need to find a way of sending the parameters to the server by encoding them into the request. Depending on the request type the method of encoding this data changes.
Passing parameters to server using requests
GET requests
The parameters of a GET request are encoded as part of the URL (do a Google search and look at the URL for a good example of this). The part appended to the URL containing the parameters is known as the query string.
The query string is formed of a question mark (?), followed by parameter=value pairs. Multiple parameters are delimited by an ampersand (&).
So, submitting the form above with a GET request, the URL including the query string would be:
http://127.0.0.1/login.html?username=dang&password=sekret
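A short sketch using Python's standard `urllib.parse` module shows how such a query string can be built and then decoded again, much as a server would. The parameter values are the ones used in the example above; the final line uses a made-up parameter `q` to show that special characters are percent-encoded.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Encode the form fields as a query string and append it to the URL.
params = {"username": "dang", "password": "sekret"}
query_string = urlencode(params)
url = "http://127.0.0.1/login.html?" + query_string
print(url)

# Going the other way: pull the parameters back out of the URL.
received = parse_qs(urlsplit(url).query)
print(received)

# Characters with a special meaning (&, =, space) are escaped.
print(urlencode({"q": "a&b=c d"}))
```

Note that `parse_qs` returns a list for each parameter, because the same name may appear more than once in a query string.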
The benefits of using GET requests and the query string are:
- They are good for transferring small amounts of data
- They are easy to debug (we can see the parameters in the URL)
- They can be bookmarked
However, we also need to consider how GET requests work:
- Requests can be cached
- Requests can appear in the browser history
  - A link between the favicon and the request may also be cached (leading to possible information disclosure even in 'safe' browsing mode)
- There is a practical limit on the length of the URL, imposed by browsers and servers
- Sensitive data will be transmitted as part of the URL
POST Requests
The parameters of a POST request are encoded inside the request body. This means that the parameters are only visible by inspecting the request itself, rather than as part of the URL. This makes it slightly harder to modify the request, as we cannot just manipulate the URL, and would need to use a tool to add the parameters to the request body. We will discuss manually sending requests to the server in the next step.
The following code block shows a POST request made to httpbin. The core request headers are the same as in the GET request above. However, there is a new entry, Body, that contains the encoded request parameters. The body is not itself a header: in the raw request it follows the headers, separated from them by a blank line.
Host: httpbin.org
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Body: username=dang&password=foobar
POST requests:
- Can be harder to debug as we need to examine the request body
- Only the page remains in the browser history
- Are not cached by default
- Have no restriction on data length in the protocol itself (although servers usually impose their own limit)
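The sketch below builds (but does not send) a POST request with Python's standard `urllib.request` module, to show that the same parameter encoding is used and that it ends up in the body rather than the URL. The `http://httpbin.org/post` endpoint and the explicit Content-Type header are assumptions for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# The parameters are encoded exactly as in a query string...
body = urlencode({"username": "dang", "password": "foobar"}).encode("ascii")

# ...but are attached to the request as its body, not appended to the URL.
req = Request(
    "http://httpbin.org/post",
    data=body,
    method="POST",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(req.get_method())
print(req.full_url)
print(req.data)
```

Passing `req` to `urllib.request.urlopen` would actually send it. That step is left out here so the example runs without network access.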
Note
While the parameters are 'hidden' in the request body, do not rely on POST requests to be secure. It is trivial to view the request, either using the browser developer tools, packet sniffing or third-party tools such as Burp Suite.
HTTP Responses
Each request will be served with a response. Usually this will be an HTML-based web page; however, REST APIs may return data in formats such as JavaScript Object Notation (JSON). We can look at a response outside of the browser using tools like Netcat (or Python). For example, grabbing the CUEH website gives the following response (truncated for brevity):
HTTP/1.1 200 OK
Date: Mon, 06 Mar 2017 13:01:22 GMT
Server: Apache/2.4.7 (Ubuntu)
Last-Modified: Tue, 10 May 2016 15:48:20 GMT
ETag: "2446-5327edab5ac1c"
Accept-Ranges: bytes
Content-Length: 9286
Vary: Accept-Encoding
Content-Type: text/html
<!DOCTYPE html>
<html lang="en">
... (HTML DATA) ...
The first part of the response is the header, which contains information such as:
- Status line (HTTP/1.1 200 OK) -- defines how to handle the response:
  - HTTP version
  - HTTP status code
  - Textual version of status code
- Server -- gives us information on the server itself
- Last-Modified -- date the page was last modified
- ETag -- an identifier for this version of the page, often a hash (used for caching)
- Content-Type -- the type of data that is being returned in the body
- Content-Length -- the amount of data (in bytes) being returned in the response body
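A minimal sketch of how a raw response could be split into these parts is shown below. The helper name `parse_response` is an assumption for illustration, and the sample text is a shortened copy of the response above. In a raw response, a blank line marks the end of the headers.

```python
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache/2.4.7 (Ubuntu)\r\n"
    "Content-Length: 9286\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    '<!DOCTYPE html>\n<html lang="en">\n'
)


def parse_response(raw):
    """Split a raw response into version, code, reason, headers and body."""
    # The first blank line separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # Status line: HTTP version, numeric status code, textual reason.
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


version, code, reason, headers, body = parse_response(RAW_RESPONSE)
print(version, code, reason)
print(headers["server"])
```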
This actually leaks quite a bit of information. For example, the type (and version) of the server that is responding, and a good idea of when the page was created or modified -- more stuff that can be included in your recon phase.
Note
This is one of the times that having a good understanding of the underlying protocols is good. Sometimes having a good think about the data returned can be really useful, and applying a bit of cunning can help provide more information for the recon phase, or save you time. For example, the server type can be useful for finding exploits; last modified can tell you if the page has been changed recently, or what types of technology to expect.
HTTP Response Status Codes
The first line of an HTTP response contains its status code. Response codes let us know the status of the request, and give some information on how to handle the response.
Response codes are broken into categories (the hundreds digit of the code), with the specific code within each category given by the tens and units digits.
The core response code categories, and common response codes, include:
- 1xx Informational
- 2xx Success
  - 200 OK -- the page was returned without error
- 3xx Redirection
  - 303 See Other -- the response can be found at a different address, and the client should automatically redirect to it
- 4xx Client Error
  - 400 Bad Request -- the server could not understand the request
  - 401 Unauthorized -- there has been some issue authenticating with the server
  - 403 Forbidden -- the server understood the request but refuses to fulfil it
  - 404 Not Found -- no such page
- 5xx Server Error
  - 500 Internal Server Error -- the server hit an unexpected error while handling the request
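Because the category is simply the hundreds digit, it can be worked out with integer division. The sketch below does this, and uses Python's standard `http.HTTPStatus` enum to look up the textual reason phrase. The `describe` helper and the `CATEGORIES` table are assumptions for illustration.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code):
    """Return the code, its reason phrase and its category as text."""
    # The hundreds digit selects the category.
    category = CATEGORIES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not a code known to Python's standard library.
        phrase = "Unrecognised code"
    return f"{code} {phrase} ({category})"


for code in (200, 303, 401, 403, 404, 500):
    print(describe(code))
```

Even a code the client has never seen can be handled sensibly this way, since its category still says whether the request succeeded, was redirected, or failed.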
HTTP Response body
The response body will include the data being returned by the web server. This will be either the HTML representing the page itself or, in the case of a REST service, some representation of the data (usually in JSON, but it can be XML).
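A client usually decides how to treat the body by looking at the Content-Type header. The following is a hedged sketch of that decision, not a complete implementation. The `decode_body` helper and the sample JSON payload are made up for illustration.

```python
import json


def decode_body(content_type, body):
    """Decode a response body according to its Content-Type header."""
    # Ignore any parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        # JSON becomes native Python data (dicts, lists, strings...).
        return json.loads(body)
    # Anything else (for example text/html) is returned unchanged.
    return body


data = decode_body(
    "application/json; charset=utf-8", '{"args": {"username": "dang"}}'
)
print(data["args"]["username"])

page = decode_body("text/html", "<!DOCTYPE html><html></html>")
print(page)
```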
Summary
In this article we have discussed methods of sending and receiving data using HTTP.
HTTP requests are used by a client to request information from a service. To allow the service to respond dynamically to a request, we can also send parameters to the server.
The server returns this information through an HTTP response. If successful, this will return the data from the server to the client.
In the next article we will examine how we can make requests to the server, using command line tools, and through a program.
Further reading
- History of the user agent string: https://webaim.org/blog/user-agent-string-history/