HTTP - the basics
Recap Materials
You should know all of this, but if you need a recap...
HTTP (Hypertext Transfer Protocol) provides the basis for communication across the web. Understanding how HTTP transfers data between devices, and how that data can be manipulated, lets us shape what passes between machines on the web and influence the behaviour of remote devices.
So why is this important? Public-facing web services are likely to be the first point of attack against a remote system. Being able to interact with these services, potentially in ways the designers didn't intend, helps us find potential flaws.
What is HTTP?
HTTP is a text-based client-server communication protocol that defines how web servers and clients should behave in response to commands:
- A Client sends a request to a remote service
- The Server responds to this request
Example
Bob (the client) would like to get a file from Alice (the server):
- Bob asks for (requests) the file: 'do you have a copy of page X?'
- Alice answers (responds) with the file: 'yes, here you go'
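Because HTTP is text based, the exchange between Bob and Alice can be written out and picked apart by hand. The sketch below is a minimal illustration in Python: the request and response bytes are made-up examples (the host `alice.example` and the page content are assumptions), and `parse_response` is a hypothetical helper that only handles simple, well-formed responses.

```python
# What Bob sends: a request line, headers, then a blank line.
REQUEST = (
    b"GET /page-x.html HTTP/1.1\r\n"
    b"Host: alice.example\r\n"
    b"\r\n"
)

# What Alice sends back: a status line, headers, a blank line, then the body.
RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<h1>Hi X</h1>"
)


def parse_response(raw: bytes):
    """Split a raw HTTP response into (status code, reason, headers, body)."""
    # The first blank line separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


code, reason, headers, body = parse_response(RESPONSE)
print(code, reason, headers["content-type"], body)
```

A real client would also need to cope with chunked bodies, repeated headers and malformed input; this sketch ignores all of that.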
The client tells the server which file it wants through a URL (Uniform Resource Locator[1]). The URL represents the address of the file that you wish to access. When you type an address into your web browser, the URL is encapsulated in an HTTP request and forwarded to the server for processing.
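As a rough illustration of how a URL ends up inside a request, the following Python sketch splits a URL with the standard library and builds the text of a simple GET request from the pieces. `build_get_request` is a hypothetical helper and the URL is an example; real browsers add many more headers.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Turn a URL into the text of a minimal HTTP/1.1 GET request."""
    parts = urlsplit(url)
    # The path (plus any query string) goes on the request line.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",  # the host part of the URL goes in a header
        "Connection: close",
        "",  # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


print(build_get_request("http://example.com/index.html?page=2"))
```

Note that the scheme and host never appear on the request line itself: the scheme and port decide where the connection is opened, and the host is carried in the `Host` header.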
Stateless Communication
One important feature to consider with HTTP is that it is a stateless protocol. This means that each request to a server is independent of any other requests that have been made.
This is important as it affects the way we deal with our content. If we wish to remember who a user is, then we need some kind of session management (more on this later). Session handling can often be exploited to make the server behave differently than expected.
Important
This part really is important. Statelessness makes the whole distributed web thing sane, but adds a lot of other security considerations.
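To make statelessness concrete, here is a minimal Python sketch of a server that can only recognise a returning visitor if the client hands back a session cookie. The `handle` function, the in-memory `sessions` store and the cookie name `session` are all illustrative assumptions, not how any particular framework does it.

```python
import secrets
from http.cookies import SimpleCookie

# Server-side memory: session id -> number of visits (hypothetical store).
sessions: dict[str, int] = {}


def handle(request_headers: dict) -> tuple[dict, str]:
    """Handle one request; returns (response headers, response body)."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    session_id = cookie["session"].value if "session" in cookie else None

    if session_id in sessions:
        # The client proved who it is by sending back a known session id.
        sessions[session_id] += 1
        return {}, f"visit {sessions[session_id]}"

    # No (or unknown) cookie: HTTP itself gives us no way to link this
    # request to any earlier one, so we start a fresh session.
    session_id = secrets.token_hex(16)
    sessions[session_id] = 1
    return {"Set-Cookie": f"session={session_id}"}, "visit 1"


# Two requests without a cookie look like two different strangers.
print(handle({})[1])
headers, body = handle({})
print(body)
# Sending the cookie back is what lets the server "remember" us.
print(handle({"Cookie": headers["Set-Cookie"]})[1])
```

Everything the server knows about the user hangs off a value the client controls, which is why session handling gets so much attention later on.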
HTTP Ports and Versions
While you may still find servers operating HTTP/1.0, the most common version is HTTP/1.1. This adds support for persistent connections (not to be confused with stateless communication). Where before each HTTP request required a new connection to the server (for example: open socket, make request, close socket), HTTP/1.1 and later allow multiple transactions to share a single connection. This reduces latency by removing the overhead of opening a new socket for each request.
The newest versions are HTTP/2 and HTTP/3 (HTTP/3 was standardised as RFC 9114 in 2022). However, the main differences between 1.1 and 2/3 are optimisations to the way communications are transported; requests and responses still mean the same thing. What this means is that for the purposes of web hacking here, these later versions make little difference.
By default HTTP operates on port 80. Other common ports for HTTP services include 8080 and 8000; services discovered on these ports during the recon phase are worth investigating. HTTPS (secure HTTP) operates on port 443, and sometimes on 8443.
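Connection reuse can be observed with nothing but the Python standard library. The sketch below starts a throwaway local server (the `EchoHandler` class and `fetch_twice_on_one_connection` function are illustrative names, not part of any course tooling) and sends two requests over one `http.client.HTTPConnection`. The server reports the client's source port each time; if both requests arrive from the same port, they travelled over the same TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    # Speaking HTTP/1.1 lets the handler keep the connection open.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Report the TCP source port the request arrived from.
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_twice_on_one_connection() -> list[str]:
    """Make two GET requests over a single connection to a local server."""
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    seen_ports = []
    try:
        for path in ("/first", "/second"):
            conn.request("GET", path)
            response = conn.getresponse()
            # The body must be read fully before the connection is reused.
            seen_ports.append(response.read().decode())
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return seen_ports


ports = fetch_twice_on_one_connection()
print("same connection reused:", ports[0] == ports[1])
```

The two requests are still independent as far as HTTP is concerned; only the underlying socket is shared.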
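A quick way to see which of these ports answer is simply to try connecting to each one. This is a minimal Python sketch, assuming a host you are authorised to test; `is_port_open` and `open_web_ports` are hypothetical helpers, and a dedicated scanner will do a far better job in practice.

```python
import socket

# The ports mentioned above; an open one may or may not actually speak HTTP.
COMMON_WEB_PORTS = (80, 443, 8000, 8080, 8443)


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, and so on.
        return False


def open_web_ports(host: str, ports=COMMON_WEB_PORTS) -> list[int]:
    """Return the subset of ports that accept a TCP connection."""
    return [port for port in ports if is_port_open(host, port)]


print(open_web_ports("127.0.0.1"))
```

An open port only tells us something is listening; sending a request is still needed to confirm it is a web service.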
Encryption
HTTP doesn't come with encryption by default[6], so HTTPS is currently an "extension" to the protocol: the same HTTP, sent over an encrypted TLS connection. When it comes to attacking a site, HTTPS doesn't really make that much of a difference, so we won't go into detail.
While HTTPS doesn't make the site itself any "safer" from attack, it does stop us from eavesdropping on the communications[7].
Tip
The TLS (SSL) certificate can also be an interesting place to look as part of the recon process. The Subject Alternative Name (SAN) field can give us some indication of subdomains.
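As a sketch of how that might look in Python, the standard `ssl` module returns the peer certificate as a dictionary whose `subjectAltName` entry is a sequence of `(type, value)` pairs. The helper names below and the sample certificate (with its `example.com` hostnames) are assumptions made for illustration; `fetch_certificate` needs network access and is not called here.

```python
import socket
import ssl


def fetch_certificate(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect over TLS and return the server certificate as a dict."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def san_dns_names(cert: dict) -> list[str]:
    """Pull the DNS names out of the certificate's subjectAltName field."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


# A made-up certificate in the shape getpeercert() returns.
sample_cert = {
    "subject": ((("commonName", "example.com"),),),
    "subjectAltName": (
        ("DNS", "example.com"),
        ("DNS", "staging.example.com"),
        ("DNS", "vpn.example.com"),
        ("IP Address", "192.0.2.10"),
    ),
}

print(san_dns_names(sample_cert))
```

Against a live target the call would be `san_dns_names(fetch_certificate("some.host"))`, where the hostname is whatever is in scope.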
User Agents
Devices and programs that connect to a server using HTTP are called user agents. While web browsers are the most common, many other services communicate over HTTP; for example, Google's web indexing crawler, Internet of Things (IoT) devices, and anything else that makes use of web content.
An HTTP server may modify its response depending on the user agent that connects to it, for example sending a different version of the page if it detects you are on a mobile browser[5]. The client identifies itself through the User-Agent request header, which it is free to set to anything.
This means that we can modify the behaviour of the site by presenting a different user agent.
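Both halves of this can be sketched in a few lines of Python. `choose_variant` is a deliberately naive, hypothetical version of the kind of check a server might make, and `request_as` builds (but does not send) a request that claims to be whatever client we like. The user agent string and URL are examples only.

```python
import urllib.request

# An example mobile-looking user agent string (illustrative, not exhaustive).
MOBILE_UA = "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148"


def choose_variant(user_agent: str) -> str:
    """Server side: pick a page variant from the User-Agent header."""
    ua = user_agent.lower()
    if "googlebot" in ua:
        return "crawler"
    if "mobile" in ua or "android" in ua:
        return "mobile"
    return "desktop"


def request_as(url: str, user_agent: str) -> urllib.request.Request:
    """Client side: prepare a request that presents the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = request_as("http://example.com/", MOBILE_UA)
# urllib stores header names capitalised, hence "User-agent" here.
print(choose_variant(req.get_header("User-agent")))
```

Passing the prepared request to `urllib.request.urlopen` would send it; the server has no way to check that the header is truthful.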
Summary
Here we have had a brief introduction to HTTP: the request and response model, statelessness, versions and ports, encryption, and user agents.
[1] May also be known by pedantic[2] people as URIs (Uniform Resource Identifiers).
[2] And wrong, if we are talking about web addresses. URI[3] is the generic superset of URL. If it's good enough for Berners-Lee[4] then it's good enough for us.
[3] And if they want to get that pedantic about it, they really should be calling them IRIs, as we do more than just ASCII now.
[4] Tim, obviously. Though his brother Mike is also really interesting, doing lots of good stuff around carbon footprints. Mike's books are also much better coffee table material than RFC 1738.
[5] For example, getting you to install an Cra^H^H^H Amazing App.
[6] Despite Mozilla etc. trying to make it happen, and GCHQ "warning" us about how terrible things like DNS over HTTPS might be (Reddit).
[7] Dodgy local network proxy server shenanigans aside.