Security Audit: Reconnaissance

The success of any security audit depends on the initial information gathering (or reconnaissance) stage. Here we discover as much information as we can about an application, and identify areas that need further investigation and testing.

In this article we will examine the stages of reconnaissance for web applications, and build a general strategy for information gathering.

Reconnaissance?

It seems a bit weird to use reconnaissance when we talk about a code audit, as I tend to think of it more in terms of a pen-test of an unknown system.

However, it's slightly less awkward than calling it information gathering, and it maps across to other modules / online tutorials.

What kind of things do we need to know?

First, let's look at the things that can affect the security of a web site. This gives us an idea of the elements we are looking for during our audit process:

  • Web Technologies Used
  • Authentication
  • Authorization
  • Session Management
  • Cryptography
  • Data / Input Validation
  • Data / Output format
  • Error Handling
  • Logging / Auditing

Fingerprinting the Server

The types of technologies used to implement the site will have an effect on our attack methodology.

Static sites that are built in pure HTML are unlikely to have vulnerabilities, as they usually just serve information. The attack payloads for sites developed in PHP, Python (such as Flask) or JavaScript (Node.js based) are going to be different.

One way of getting this information is through port scanning (see 5063CEM). Nmap uses a banner-grabbing technique to try to determine the web server in use. Other tools like Nikto, or OSINT through Netcraft (more on these next week), could also be used.

Manual Banner Grabbing.

We may also be able to manually identify the server in use through examining the response headers, either in the browser, or by sending a request manually using something like curl (note the use of the -I flag to send just a HEAD request).

For example the Response headers from NGINX can look like this:

└──╼ $curl -I https://www.nginx.com
HTTP/2 200 
date: Mon, 16 Aug 2021 15:52:07 GMT
content-type: text/html; charset=UTF-8
vary: Accept-Encoding
x-gateway-request-id: b92127f643ebd5715518723fcdebbe72
set-cookie: geoip=GB; Max-Age=604800
x-pingback: https://www.nginx.com/xmlrpc.php
cache-control: max-age=300
link: <https://www.nginx.com/wp-json/>; rel="https://api.w.org/"
link: <https://www.nginx.com/>; rel=shortlink
content-security-policy: frame-ancestors 'self'
x-gateway-cache-key: 1628936245.043|standard|https|www.nginx.com||/
x-gateway-cache-status: HIT
x-gateway-skip-cache: 0
cf-cache-status: DYNAMIC
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
server: cloudflare-nginx
cf-ray: 67fbd3f75ffe0752-LHR

or by examining an Apache-based site in the browser:

Apache Server Header

The Server field of a response may include the web server type.

Different versions of the server may return different information. For example, older versions of Apache (such as the one on CUEH) provide the version number. On the other hand, some security-hardened servers may manipulate the headers to remove things like server information.
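If we want to script this check across several hosts, a minimal Python sketch (assuming the third-party requests library is installed; the target URLs are just examples) does much the same job as the curl command above: send a HEAD request and print any identifying headers.

    # Minimal banner-grabbing sketch using the third-party "requests" library.
    # Only point this at hosts you are allowed to test.
    import requests

    def grab_banner(url: str) -> None:
        # HEAD asks for the headers only, much like `curl -I`
        response = requests.head(url, timeout=10, allow_redirects=True)
        print(url)
        for header in ("Server", "X-Powered-By"):
            print(f"  {header}: {response.headers.get(header, '<not disclosed>')}")

    if __name__ == "__main__":
        for target in ("https://www.nginx.com", "https://www.apache.org"):
            grab_banner(target)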

Sending a Bad Request

The default error page often gives the version number of the server. If the server is still using the default error page, we may be able to get the server version by sending a bad request.

For example, this great 404 I found gives us not only the server information, but also the extra modules installed:

A 'Nice' 404
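To try this yourself, request a path that almost certainly does not exist and look at the body of the error page as well as the headers. A rough Python sketch (the host and path below are placeholders; substitute a target you are allowed to test):

    # Trigger the default error page by requesting a path that should not exist.
    # The host and path below are placeholders for illustration.
    import requests

    url = "http://target.example/this-page-should-not-exist-12345"
    response = requests.get(url, timeout=10)

    print("Status:", response.status_code)
    print("Server header:", response.headers.get("Server", "<not disclosed>"))
    # Default Apache / nginx error pages often include a version string in the body.
    print(response.text[:500])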

Hint

Obviously, when it comes to securing your site, we should bear this in mind (a quick header check sketch follows the list below).

  • Configure the server to not include the version in the headers
  • Remove standard 404s that leak information about the server type.
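As a quick sanity check on a site you manage, a rough Python sketch like the one below (the regex is only a heuristic, and the URL is a placeholder) flags headers that still carry an obvious version number:

    # Rough heuristic check for version numbers leaking in response headers.
    import re
    import requests

    def check_version_leak(url: str) -> None:
        response = requests.get(url, timeout=10)
        for header in ("Server", "X-Powered-By", "X-AspNet-Version"):
            value = response.headers.get(header)
            if value and re.search(r"\d+\.\d+", value):
                print(f"[!] {header} leaks a version: {value}")
            elif value:
                print(f"[ok] {header} present, no obvious version: {value}")

    if __name__ == "__main__":
        check_version_leak("https://your-site.example")  # placeholder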

Easy task

Take a look at the response headers for cueh.coventry.ac.uk. What is the server type and version?

Identifying Technologies used

We can also get an idea of the technologies used to build a site by looking at other elements. This can give us version numbers to check against CVE databases, or a general idea of where other site functionality may be.

Some of this information may come from our mapping phase (see below) when we examine the source of the site.

File Extensions

The extension of a file might give us an idea of what technology is being used.

For example:

  • .php, .phpx indicates that PHP is being used
  • .asp, .aspx lets us know that ASP.NET is being used (it's also a good indication of a Windows IIS server)
  • .jsp indicates Java Server Pages

Meta Data

The <head> section of a site may also contain <meta> tags that give us an idea of the types of technology used.

For example <meta name="generator" content="WordPress 3.9.2" /> gives us an indication the site is based on WordPress, and gives us a version number. These documents have <meta name="generator" content="mkdocs-1.2.1, mkdocs-material-7.1.7">, showing the version numbers of MkDocs and the Material theme used.
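A small Python sketch using the standard-library html.parser (it will not catch every variant, but works for straightforward pages) can pull out any generator meta tags:

    # Extract <meta name="generator" ...> tags using the standard-library parser.
    from html.parser import HTMLParser
    import requests

    class GeneratorFinder(HTMLParser):
        def __init__(self):
            super().__init__()
            self.generators = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta" and (attrs.get("name") or "").lower() == "generator":
                self.generators.append(attrs.get("content", ""))

    if __name__ == "__main__":
        # Placeholder target; substitute a site you are allowed to test.
        html = requests.get("https://target.example/", timeout=10).text
        finder = GeneratorFinder()
        finder.feed(html)
        print(finder.generators)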

Comments

Comments can also help us identify the content generator and version number, for example copyright strings or similar:

<!-- Powered By XXX -->

Mapping the Site

Having identified the technologies, the next stage in our recon process is mapping the site.

Here we examine the content and functionality in order to understand what the application does, how it behaves, and to identify any of the areas where there may be flaws.

The approach we take to mapping the site will depend on the amount of information available to us.
If we have access to the source code, then the inner workings of any server-side aspects should be clear to us. If we are doing a pen-test without the source, then we may need to infer this behaviour. In a black-box test some of the functionality is easy to identify, but often it is hidden, meaning we need to guess what is happening in the background.

Note

We are going to focus on the web service specific elements here. While an attack on a site may depend on things like server configuration or the version of other back-end technology, that is a better fit for the Practical Pen Testing module.

Identifying specific issues may also rely on detailed knowledge of the problem, which we may not be able to exploit at the current time1. However, identifying these possible areas, and then getting further help with (or doing research into) whether they are vulnerable, is an important step.

Manually Mapping the Site Structure.

The next stage will help us start to identify content, and site functionality.

Our basic approach would be to examine the application from the landing page (index or whatever). We then visit every link, and take a look at pages containing functionality (for example login pages). We keep repeating this process until we have built a map of the entire site and its functionality.

We do this process in two ways:

  • Looking at the page visually (i.e. eyeballing the rendered page itself)
  • Looking at the source code (i.e. looking at the raw HTML source)

Note

Depending on the size of the site, we can take a fully manual approach to this. In many CTFs (and the exercises in this module2), the vulnerable content should be reasonably easy to find.
On a larger RealWorld™ site, you may want to use tools like the sitemap function of Burp to help manage the load.

Let's take a look at the index page for the app we are going to use in the labs.

Visual Recon

For the visual recon stage we want to look at the rendered web page and identify

  • Any Links from this page (this can give us an idea of site structure)
  • Places the user can interact with the site (i.e. search or dropdowns)
  • Places the user can send data to the site.
  • Any "Dynamic" content, where the information displayed is based on some user specified parameter.

Example

Let's do the visual recon stage for the example site.

TODO

Looking at the page, we can see that there are several interesting elements here:

  • Navigation Bar, with links to other site pages
  • Login Dialog

We also have a little bit of information disclosure: the footer of the page tells us the site author and the version; this may come in handy for any OSINT we perform.

Checking the Request and Response

It can also be worth checking the request and response headers to see what information is being sent and received; a small sketch for dumping cookies follows the list below. Things to look for here are:

  • Cookies being set in the response
  • Any API keys that are sent as part of a request
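A quick way to dump what the server is setting, sketched in Python with the requests library (the URL is a placeholder):

    # Inspect the cookies set by the server on the first response.
    import requests

    session = requests.Session()
    response = session.get("https://target.example/", timeout=10)  # placeholder URL

    # The raw Set-Cookie header(s) as reported by requests (joined if there are several)
    print("Set-Cookie:", response.headers.get("Set-Cookie"))

    # Cookies now held by the session, with flags worth noting
    for cookie in session.cookies:
        print(f"{cookie.name} = {cookie.value}")
        print(f"  secure={cookie.secure}, domain={cookie.domain}, path={cookie.path}")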

Source Code Recon

We also review the page using the View Source function (and the server source code if it is available).

Again we are looking to identify any elements that may be of use later in the test; we can also confirm things like the details of the links identified by eyeballing the site.

The source code can also give us information on some of the technologies used in the site. For example we will be able to see details of JavaScript libraries used, and possibly any web frameworks.

  • Look through comments, sometimes we can discover information about "hidden" pages, or other useful stuff
  • Identify links to other pages
  • Identify areas of user input (such as forms)
  • Identify any Frameworks used for development
  • Identify any JavaScript or client side services used

We will also want to dig into the source of any JavaScript files we find related to site functionality.

We can safely4 ignore the sources for well-known frameworks (like jQuery), but it is worth making a note of version numbers, as it gives some idea of the technologies powering the site.
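To speed up this pass, a Python sketch along these lines (standard-library parser plus the requests library; it only sees the raw HTML, so it will miss anything generated client side) pulls the links, script sources, and comments out of a single page:

    # Pull links, script sources, and HTML comments out of a page's raw source.
    from html.parser import HTMLParser
    import requests

    class SourceRecon(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links, self.scripts, self.comments = [], [], []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "a" and attrs.get("href"):
                self.links.append(attrs["href"])
            if tag == "script" and attrs.get("src"):
                self.scripts.append(attrs["src"])

        def handle_comment(self, data):
            self.comments.append(data.strip())

    if __name__ == "__main__":
        html = requests.get("https://target.example/", timeout=10).text  # placeholder
        recon = SourceRecon()
        recon.feed(html)
        print("Links:   ", recon.links)
        print("Scripts: ", recon.scripts)
        print("Comments:", recon.comments)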

Example

Looking at the source code for the example site, we see:

TODO

This allows us to confirm the links we identified when we Eyeballed the site.

We also learn:

  • The Bootstrap framework is used for Look and Feel
  • There is a Commented "Hidden" link to an Admin Page
  • Details of the request sent in the form used for Login.

Moving on to the "About" page

Having identified the interesting elements for the page, we should be starting to build an idea of the site structure and functionality. We move on to the next page that we have identified from looking at the links (about.html), and repeat the process. We keep doing this until we have looked at every page we have discovered.

TODO

Example

Looking at the About page we see quite a bit of repetition:

  • The Navigation Links are the same as on the Index page
  • Same use of Bootstrap etc. for the look and feel

We do, however, have some differences:

  • The Login option from the page has been removed.
  • We have a dropdown to select some information on the page.

Once we are satisfied we have identified all the interesting pieces of information we can move on to the next page, and repeat the process until we have mapped all of the pages in the application.

Checking the differences when Authenticated.

Many sites will require authentication to get access to all of the content.

Once we authenticate with the site we can continue the mapping process, looking at any new content that we have identified (a comparison sketch follows the list below):

  • Make a note of changes to functionality on pages we have already visited
  • Visit new links that are available to authenticated users
  • Check Cookie and / or Session details used for authentication.
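One way to keep track of the differences is to fetch the same pages with and without an authenticated session and compare what comes back. The sketch below assumes a form-based login at /login with username and password fields; the base URL, paths, field names, and credentials are all made up for illustration.

    # Compare pages anonymously and with an authenticated session.
    # The base URL, login endpoint, form fields, and credentials are hypothetical.
    import requests

    BASE = "https://target.example"

    anonymous = requests.Session()
    authenticated = requests.Session()
    authenticated.post(f"{BASE}/login",
                       data={"username": "tester", "password": "password123"},
                       timeout=10)

    for path in ("/", "/about.html", "/admin"):
        anon = anonymous.get(BASE + path, timeout=10)
        auth = authenticated.get(BASE + path, timeout=10)
        marker = "DIFFERS" if anon.text != auth.text else "same"
        print(f"{path}: anon={anon.status_code} auth={auth.status_code} content={marker}")

    # Also worth noting: what session cookie did the login hand us?
    print("Session cookies:", authenticated.cookies.get_dict())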

Tip

When it comes to dealing with pages where authentication or interaction is needed, different people have different approaches.

I like to build the recon up in stages:

  • First, I focus on the pages that don't require authentication, then move on to pages that do.
  • For the first pass, I tend to ignore forms or other ways of interacting with the site itself, then go back to them once I have made my first pass.

Mapping Single Page Applications

"Modern" web design makes heavy use of Single Page Applications, where a majority of the page is generated client side.

This means that the source visible through "View Source" will be limited to whatever scripts are used to build the page.

For example, let's take a look at the landing page for OWASP Juice Shop3:

Juice Shop Landing Page

If we then compare the content to the page source, we can see that the raw HTML is missing most of the page:

    <!--
    ~ Copyright (c) 2014-2021 Bjoern Kimminich.
    ~ SPDX-License-Identifier: MIT
    -->

    <!doctype html>
    <html lang="en">
    <head>
        <meta charset="utf-8">
        <title>OWASP Juice Shop</title>
        <meta name="description" content="Probably the most modern and sophisticated insecure web application">
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <link id="favicon" rel="icon" type="image/x-icon" href="assets/public/favicon_js.ico">
        <link rel="stylesheet" type="text/css" href="//cdnjs.cloudflare.com/ajax/libs/cookieconsent2/3.1.0/cookieconsent.min.css"/>
        <script src="//cdnjs.cloudflare.com/ajax/libs/cookieconsent2/3.1.0/cookieconsent.min.js"></script>
        <script src="//cdnjs.cloudflare.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script>
        <script>
            window.addEventListener("load", function(){
            window.cookieconsent.initialise({
                "palette": {
            "popup": { "background": "#546e7a", "text": "#ffffff" },
          "button": { "background": "#558b2f", "text": "#ffffff" }
            },
            "theme": "classic",
            "position": "bottom-right",
            "content": { "message": "This website uses fruit cookies to ensure you get the juiciest tracking experience.", "dismiss": "Me want it!", "link": "But me wait!", "href": "https://www.youtube.com/watch?v=9PnbKL3wuH4" }
            })});
        </script>
        <link rel="stylesheet" href="styles.css">
    </head>
    <body class="mat-app-background bluegrey-lightgreen-theme">
    <app-root></app-root>
    <script src="runtime-es2018.js" type="module"></script><script src="runtime-es5.js" nomodule defer></script><script src="polyfills-es5.js" nomodule defer></script><script src="polyfills-es2018.js" type="module"></script><script src="vendor-es2018.js" type="module"></script><script src="vendor-es5.js" nomodule defer></script><script src="main-es2018.js" type="module"></script><script src="main-es5.js" nomodule defer></script></body>
    </html>

While we can infer some of the technologies used (runtime-es2018.js could indicate that Angular is in use, as it matches the default naming scheme), until the page is parsed we don't get to see the content, so we don't get access to links etc.

Spiders and tools like Burp's sitemap will have the same issue, as they usually deal with the raw HTML. Unless they run the JavaScript, they are not going to list the unvisited links either. Again, we will have to manually navigate the pages to build up our sitemap.

In this case the Inspect Element tool inside the browser can show us the generated source.
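If we want the generated source programmatically rather than through the browser's developer tools, one option is to drive a headless browser. Below is a sketch using Playwright (an assumption on my part: Selenium or any other headless browser would do the job just as well).

    # Fetch the rendered DOM of a single-page application with a headless browser.
    # Requires: pip install playwright && playwright install chromium
    from playwright.sync_api import sync_playwright

    def rendered_source(url: str) -> str:
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")  # wait for client-side rendering
            html = page.content()
            browser.close()
        return html

    if __name__ == "__main__":
        # A local Juice Shop instance is assumed here; adjust the URL to suit.
        print(rendered_source("http://localhost:3000/")[:1000])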

Automated Spidering

Some tools such as Burp, OWASP ZAP, and SkipFish can try to reduce the workload when mapping by automating the process for us using a web spider (or web crawler). Like all automated tools, we should not rely solely on them, but instead make use of both automation and manual analysis to make sure we get all of the information.

They follow a similar process to our manual audit: identifying links and other interesting information on a page, then moving on to the next page in the list. The functionality offered by the tools can differ, and we have the ability to set different levels of scan.

For example, we could ask SkipFish to try brute-forcing URL parameters, or to submit data to forms to infer details and try to find issues. However, a high level of automation can also have problems.

Possible Issues with Automated Spidering

We need some caution if we are using a fully automated tool to map the site; a cautious spider sketch follows the list below.

  • Depending on the URL structure or parameters, the spider may not find all elements of a site. Another issue can occur with REST-like URLs, where the spider will attempt to enumerate the database, or get caught in an infinite loop.

    For example: If we had an online store with a REST style interface of /product/<id> where changing the ID changes the product that is displayed.

    With Manual Recon it will be obvious that the pages have the same functionality, but the content comes from elsewhere. An automated spider may see each page as a new item and try to visit /product/1, /product/2 ... /product/N

  • There may also be links that change an application's state. For example: if the site requires authorisation to view pages, the crawling may be cancelled when the spider visits the logout page.

  • There may be "dangerous" functionality that can change the data stored within an application. For example: an edit URL that changes site content. If the spider starts submitting random strings to this page, it may change (or, even worse if it's a live site, remove) content.
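A minimal spider that tries to avoid those pitfalls might look like the Python sketch below: it stays on one host, only sends GET requests, skips anything that looks like a logout or delete URL, and caps the number of pages it will visit. This is an illustration of the idea, not a replacement for Burp or ZAP, and the start URL is a placeholder.

    # A cautious breadth-first spider: same host only, GET requests only,
    # skips logout-style links, and caps how many pages it will visit.
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
    import requests

    MAX_PAGES = 50
    SKIP_WORDS = ("logout", "signout", "delete")  # state-changing / dangerous links

    class LinkExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "a" and attrs.get("href"):
                self.links.append(attrs["href"])

    def spider(start_url: str) -> set:
        host = urlparse(start_url).netloc
        seen, queue = set(), deque([start_url])
        while queue and len(seen) < MAX_PAGES:
            url = queue.popleft()
            if url in seen or any(word in url.lower() for word in SKIP_WORDS):
                continue
            seen.add(url)
            try:
                response = requests.get(url, timeout=10)
            except requests.RequestException:
                continue
            extractor = LinkExtractor()
            extractor.feed(response.text)
            for href in extractor.links:
                absolute = urljoin(url, href).split("#")[0]
                if urlparse(absolute).netloc == host:
                    queue.append(absolute)
        return seen

    if __name__ == "__main__":
        for page in sorted(spider("https://target.example/")):  # placeholder
            print(page)

The SKIP_WORDS list and the MAX_PAGES cap are crude, but they address the logout and enumeration problems described above; anything involving forms is still better done by hand.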

Summary

In this article we have looked at a generalised recon process. We have started to identify elements of the site that can be of interest to attackers, and areas that could be manipulated.

Once we have finished the mapping process we should have

  • Some idea of the technologies used in the site
  • A List of all the pages in the website
  • A list of all the areas where we can supply user input
  • An Idea of how any JavaScript elements are used for client side interaction.
  • Any cookies or client side storage that the site may use.

Note

We will run through the mapping process in the lab tasks for this week. As part of it we will also make use of the built in mapping tools in Burp Suite, to see how they can help us identify functionality.

In the next article we will look at identifying information flows, and strategies for mapping the logic of a site.


  1. For example, we are going to find some potential XSS vulnerabilities here, but I don't expect you to actually exploit them. Knowing that unsanitised input can be bad is more than enough.

  2. Except the ones where I want you to hunt for hidden pages. 

  3. A purposefully vulnerable single-page web application built in JavaScript using "modern" design principles. It is a really nice way to play with vulnerabilities. You can grab a copy from the Juice Shop Project Page.

  4. Except in the case of that one HTB machine where the info for the foothold was hidden in a custom JQuery. Always check the Hashes. 
