
Mapping Input and Output

In our previous step we looked at how we can map the website contents. This allows us to build a picture of the site functionality, and identify areas that are worth further investigation.

The next stage of the Audit is to examine how Input/Output (IO) is handled, and to examine any areas where the user can control or manipulate site contents. As the majority of exploits depend on us interacting with the server in a way it doesn't expect, this element of mapping is important in helping us to identify possible attack vectors.

As before, we will set out a general process; how we approach the problem will depend on the nature of the test. If we have access to the source code, then the way data gets processed should be clear to us. However, if we are performing a black box style pentest, we will have to guess or infer how data is being processed.

Types of useful information

From the mapping stage we will have identified:

  • Pages where content is dynamically generated
  • Pages with user IO (forms etc.)
  • Areas where JavaScript is used to manipulate data
  • Session Cookies and other client side storage

Examining forms and other user input

Forms are still the primary way of interacting with an application.

Here, we take a close look at each of the forms used in the site, and make a note of:

  • Request Methods
  • URLs the form submits to
  • Form Parameters
  • Any Hidden Form Fields

Again, this gives us an idea of both the entrypoints that we can send data to, and the types of parameters that these entrypoints might accept.

The other thing we need to consider when examining the forms is what they do. Looking at how the application reacts to form input can give us some insight into the functionality there.

  • A login form is going to give us some indication of the authentication system.
  • A search form is likely to interact with a database of some kind.
  • Text entry for things like writing messages shows us where we could control information displayed on the site.

Following this mapping process we should have an idea of

  • Entrypoints for data. By this I mean the URLs that we can send requests containing data to.
  • Parameters for each entrypoint, giving us a list of possible input values we can try in our tests.
  • Form functionality: what each form in the application controls and does.
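The note-taking above can be sketched in code. Assuming we have the raw HTML of a form, a few regular expressions are enough to pull out the method, target URL, and parameters. A real audit would use a proper DOM parser or the browser inspector; the login form below is entirely hypothetical.

```javascript
// Sketch: extract the request method, target URL, and parameters
// (including hidden fields) from a form's HTML. Regexes are used for
// brevity only; they are not a robust way to parse real-world HTML.
function mapForm(html) {
  const method = (html.match(/method=["']?(\w+)/i) || [, "GET"])[1].toUpperCase();
  const action = (html.match(/action=["']?([^"'\s>]+)/i) || [, ""])[1];
  const inputs = [...html.matchAll(/<input[^>]*>/gi)].map(tag => ({
    name: (tag[0].match(/name=["']?([^"'\s>]+)/i) || [, ""])[1],
    hidden: /type=["']?hidden/i.test(tag[0]),
  }));
  return { method, action, inputs };
}

// Hypothetical login form, not taken from any real site.
const form = `<form action="/login" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="csrf_token" value="abc123">
</form>`;

console.log(mapForm(form));
```

Hidden fields like the `csrf_token` above are worth singling out, as they carry data the developer did not expect the user to change.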

Examining dynamically generated content

Sites may use a single page to display different content. Here data is sent to the server which then selects and shows the relevant content. This kind of dynamic content is interesting as it usually means there is a database involved somewhere in the process.

For example, a shopping site might have a products page, where the product shown is based on a user choice. Or the functionality offered may differ based on the parameters given.

The method of selecting the data may differ depending on the application structure.

Classic URL Approach

Here the item to be shown is encapsulated in the request data. For example using GET requests the selected item can be part of the query string.

  • www.shopper.com/products?id=1234 To show the Product with the ID 1234
  • www.shopper.com/users?id=4242 To show the user with the ID 4242

More complicated functionality could be encapsulated with multiple parameters. For example, searching the product pages for electronics made by Arduino:

  • www.shopper.com/search?category=electronics&brand=arduino
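The standard URL API makes it easy to enumerate the parameters in classic-style URLs; each key/value pair is a candidate input for later testing. (A scheme has been added below, as the `URL` constructor requires one.)

```javascript
// Enumerate the query-string parameters of a classic-style dynamic URL.
// Each key is a potential input we can tamper with in later tests.
const searchUrl = new URL("https://www.shopper.com/search?category=electronics&brand=arduino");
for (const [key, value] of searchUrl.searchParams) {
  console.log(`parameter: ${key} = ${value}`);
}
```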

REST Style Approach

With a REST style approach the URL itself has meaning; a common approach is to use <item>/<id> as part of the URL.

The requests shown above could be represented in REST with:

  • www.shopper.com/products/1234
  • www.shopper.com/users/4242

There are no "rules" for REST style URLs, so more complex requests can be harder to pull apart, and we may have to infer the structure of the URL. For example, our search for Arduino above could look like either of the below:

  • www.shopper.com/products/electronics/arduino
  • www.shopper.com/products/search/electronics/arduino
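With REST style URLs the parameters live in the path itself, so the first step is just to split it into segments. Which segment is a resource name and which is a value has to be inferred, as noted above.

```javascript
// Split a REST style URL into its path segments, each of which may be
// a resource name or a parameter value; telling them apart is guesswork.
function pathSegments(urlString) {
  return new URL(urlString).pathname.split("/").filter(Boolean);
}

console.log(pathSegments("https://www.shopper.com/products/electronics/arduino"));
// ["products", "electronics", "arduino"]
```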

Dynamically generated content can be interesting as it may show us where a database is being used. This not only lets us infer database structure, but also gives us a potential entry point for attacks such as SQL injection.

For example, our product URL of www.shopper.com/products?id=1234 could translate to a query on the server of SELECT * FROM Products WHERE id=1234.

More complex queries in the URL may also help us to identify table structure. For example, the URL www.shopper.com/products?type=electronics&manufacturer=arduino means that the Products table might[2] contain a type and manufacturer field.
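In a black box test we cannot see the server code, but we can sketch how it *might* build its query if it concatenates the URL parameters straight into SQL, which is the assumption behind SQL injection. The table and column names here are inferred from the URL, not confirmed.

```javascript
// A guess at the server-side query construction. If the server really
// does build SQL by string concatenation like this, user input flows
// directly into the query, which is what SQL injection exploits.
function guessedQuery(params) {
  const where = Object.entries(params)
    .map(([column, value]) => `${column}='${value}'`)
    .join(" AND ");
  return `SELECT * FROM Products WHERE ${where}`;
}

console.log(guessedQuery({ type: "electronics", manufacturer: "arduino" }));
// SELECT * FROM Products WHERE type='electronics' AND manufacturer='arduino'

// An unexpected value shows why this construction is dangerous:
console.log(guessedQuery({ type: "' OR '1'='1" }));
```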

As well as databases, the URLs generated for dynamic content can also give us some idea of application functionality. In this case we need to take the URL parameters apart and try to infer the behaviour.

Example

In the course of mapping a forum application we come across the following URL.

www.forum.com/posts.php?userid=123&postid=4242&edit=false

We can identify several components from the URL that is provided.

  • The technology used is PHP (identified by the .php extension)
  • The endpoint that the data is sent to (posts.php)
  • Two possible database parameters: userid and postid

The interesting part here is the edit parameter; modifying this to true may allow us to submit data, or unlock functionality that allows us to edit the page.

Similarly, we may also be able to modify behaviour with REST style URLs:

www.forum.com/posts/view/1234

Here changing "view" to "edit" might give us access to new functionality.
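A quick way to explore this is to generate candidate URLs by swapping the action segment. These are guesses to try by hand, nothing more; the action names below are hypothetical.

```javascript
// Generate candidate URLs for probing by replacing the "view" segment
// with other plausible actions. Whether any of them exist is exactly
// what the test is trying to find out.
function restCandidates(urlString, actions) {
  return actions.map(action => urlString.replace("/view/", `/${action}/`));
}

console.log(restCandidates("https://www.forum.com/posts/view/1234",
                           ["edit", "delete", "admin"]));
```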

Looking at JavaScript

The JavaScript used in the application can also give us insight into possible endpoints for data, and information on API tokens used to access backend functionality.

Example

Consider the following[1] piece of JavaScript used for authentication:

```javascript
// Reconstructed example: the original listing is missing, so this is a
// plausible stand-in for the kind of client-side check described below.
function checkLogin(username, password) {
  if (username === "admin" && password === "letmein") {
    loginSuccess();   // access is granted entirely on the client side
  } else {
    loginFailure();
  }
}
```

While it's unlikely anyone would actually do authorisation client side like this, examining the code and understanding its structure gives us a way to bypass the login functionality, by just calling the loginSuccess() function.

Other things we might want to look for in JavaScript functions are variables representing API keys etc.[4] These can come in handy if we need to make calls to the API as part of our exploit. For example, Google Maps API keys have this form:

<script async defer src="https://maps.googleapis.com/maps/api/js?key=YOUR_API_KEY&callback=initMap"></script>
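Pulling a key like this out of a page or script file is a one-line regular expression. The key value below is the placeholder from the snippet above, not a real credential.

```javascript
// Extract an API key from an embedded script URL. The [?&]key= pattern
// matches the key parameter wherever it sits in the query string.
const tag = '<script async defer src="https://maps.googleapis.com/maps/api/js?key=YOUR_API_KEY&callback=initMap"></script>';
const keyMatch = tag.match(/[?&]key=([^&"']+)/);
console.log(keyMatch && keyMatch[1]); // YOUR_API_KEY
```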

With JavaScript driven content, as well as looking through the code itself, it is useful to examine the network logs in the Inspector tool. This can give us an idea of endpoints that are used to populate the data on the page.

You may also want to filter the data shown; XHR requests are requests used to update a page's content without refreshing it, and are commonly used by JS APIs to populate a page.

Example

Looking at the network graph for an Aula page load gives us some idea of how the page is built and the data gathered.

[Image: network log for an Aula page load]

We can identify endpoints for things like authorisation, analytics and populating the feed.

Looking closer at the request for the feed we can start to infer more information about how the remote API works.

https://apiv2.coventry.aula.education/posts/feed?space=Mi9WMNtynL&until=2021-08-19T08:36:21.826Z

  • apiv2.coventry.aula.education is a REST style API.
  • The /posts part of the URL refers to fetching individual posts from the API.
  • The space parameter refers to the ID of the "space" (module) we want to fetch data for. (We can confirm this by checking the URL for the current space and seeing if it matches.)
  • until is a bit strange, as there is no functionality on the page to limit this. Given it was the time the request was made, we can assume that it is used for pagination or similar.
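The same URL API from earlier breaks the observed request down into these pieces: host, path segments, and query parameters.

```javascript
// Dissect the observed feed request into host, path, and parameters.
const feed = new URL("https://apiv2.coventry.aula.education/posts/feed?space=Mi9WMNtynL&until=2021-08-19T08:36:21.826Z");
console.log(feed.hostname);                            // apiv2.coventry.aula.education
console.log(feed.pathname.split("/").filter(Boolean)); // posts, feed
console.log(feed.searchParams.get("space"));           // Mi9WMNtynL
console.log(feed.searchParams.get("until"));           // the timestamp
```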

As part of our further investigations, we might see if we can identify other spaces; we may then be able to get access to posts that we shouldn't be allowed to see.

Note

When doing this, I noticed an interesting thing around the timestamps. While they are stored in UTC (which is good), I made the request at ~10:30 BST (which is 09:30 UTC), yet the until timestamp in the request is around 08:36 UTC.

I have been having a problem with Windows not adjusting the timezone for me, so the clock on my PC runs an hour slow until I force an update. However, the timing is 2 hours off. This suggests that there is some logical error in the way timestamps are processed.[3]

I assume the problem is something like:

  • Windows says I am in BST (UTC +1 hour), but the clock is incorrect, and doesn't actually match the real time.
  • Aula requests a UTC time: it asks Windows what the time is, then adjusts it by an hour.
  • The result is a reported time 2 hours behind.
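That hypothesis can be checked with simple arithmetic (hours since midnight, so 10:30 BST is 10.5):

```javascript
// The suspected double adjustment, worked through step by step.
const realLocalHour = 10.5;   // actual local time of the request (BST)
const clockError = 1;         // the Windows clock runs an hour slow
const bstOffset = 1;          // BST is UTC+1, so the app subtracts an hour
const reportedClock = realLocalHour - clockError;  // 09:30 shown on screen
const appTimestamp = reportedClock - bstOffset;    // 08:30 sent as "UTC"
console.log(realLocalHour - appTimestamp);         // 2 hours behind local time
```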

Looking at session cookies and other local storage

Lastly, we want to take a look at the cookies and any other local storage used.

Session cookies can help us identify the types of technologies used for example:

  • PHPSESSID: PHP
  • ASPSESSIONID: Microsoft IIS Server
  • ASP.NET_SessionId: ASP.NET on Microsoft IIS

Additionally, there may be extra information on the session types, or user abilities, stored in the cookies. For example, modifying an admin = false cookie may allow you to change your user rights.
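Mechanically, this means parsing the Cookie header into key/value pairs and flipping the interesting flag. The cookie names below are hypothetical, and whether the server honours the change is exactly what we are testing.

```javascript
// Parse a Cookie header into key/value pairs so individual values can
// be inspected and modified before resubmitting the request.
function parseCookies(header) {
  return Object.fromEntries(
    header.split(";").map(pair => pair.trim().split("=")));
}

const cookies = parseCookies("PHPSESSID=abc123; admin=false");
cookies.admin = "true"; // resubmit and see whether the server trusts it
console.log(cookies);
```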

Looking at User Supplied Data

The final thing we want to examine is where user supplied data ends up being displayed, and how it gets processed, both when it is sent to the server and when it is displayed back to the user.

While there are some obvious areas for user input being displayed (like the feed in Aula), user supplied data can also be shown in other, less obvious places:

  • Usernames / Tag lines
  • Image Captions
  • Uploaded File names etc.

It's good to note the locations where data gets shown, and also to form some idea of how it gets processed. Using HTML as input can give an idea of any input encoding used to sanitise the data.

This kind of data can be used in XSS or SSTI style attacks, where the user is able to inject extra content into the page. We will deal with the specifics of each of these attacks later.

Summary

By this point we will have the bulk of the recon done for the "visible" parts of the site. We should have identified areas of interest that we can explore in the exploitation phase of the audit.

  • Session Cookies and other local storage used to hold information on the clients browser.
  • List of all pages in the site that can accept user input.
  • Areas where the user can supply input
    • Parameters used for this input.
    • Endpoints for submitting input.
  • Some idea of how input is used and displayed back to the user.

  1. Horribly contrived, and probably incorrect, but it's an understandable example.

  2. Some developers might obfuscate the database columns or the parameters. However, people are on the whole logical; having a clear link between the parameters sent and the database rows is common.

  3. Though whether it is Windows or Aula that is causing the problem is anyone's guess; I blame Windows.

  4. Recently I worked with a student on dumping a list of users from Aula using the API, and a key found stashed in a file. I should be able to do a public writeup soon.
