Google Hacking
In the previous topic we discussed search engines and how they gather information. In this article we will take a closer look at how search results are returned, and examine the advanced search terms we can use to quickly find the information we need.
Searching Basics: How data is returned.
When we search for something using a search engine, the terms we use are compared against the index and relevant pages are returned. These are ordered by some "relevance" metric. As with the indexing techniques, the exact method used to determine relevance is proprietary. While second-guessing the "page rank" mechanism may be of interest in search engine optimisation, I feel it has little relevance to our reconnaissance process. However, there are some things we need to consider with the search results.
- There is evidence that your previous search history will affect results. Searching as a logged-in user and searching without logging in will often return different results. This is due to the search engines collecting data on your preferences and trying to return results that you will find relevant. For example, I make use of the LaTeX typesetting language for writing papers and other materials, so Google tends to favour results about the typesetting language, rather than the synthetic rubber, when I search.
- Different combinations and orderings of search terms will present different results, so "Coventry University ethical hacking" may return a different set of results to "Ethical Hacking University Coventry".
- Google tends to be aware of suffixes and variations on words (for example possessives, so both CUEH and CUEH's will be returned, or alternative spellings such as Color and Colour).
- Pay attention to the first couple of results returned. These are often affiliated or paid links based on a set of terms, and may not be relevant to the reconnaissance process.
Now let's look at some of the basic search modifiers:
The Basic Search. AND
The basic search is based on AND. When we enter a set of search terms, the results are pages that contain all of the words. For example, Coventry University ethical hacking will search for pages that contain the words "Coventry AND University AND Ethical AND Hacking".
Searching for Phrases
If we wish to search for pages containing the exact phrase Coventry University Ethical Hacking, we need to enclose the search terms in quotation marks: "Coventry University Ethical Hacking"
You can see this reduces the number of results returned from >150k to about 100 items. Also note that Google is aware of suffixes, so variations such as Coventry University and Coventry Universities will still be returned.
Searching for Alternative words
We can use the OR modifier to search for alternative words. For example, let's search for all pages related to "Coventry University" that contain the words "Ethical" OR "Hacking":
Coventry university ethical OR hacking
This time we get hits for the CU Ethics approval page, as well as information related to the Ethical Hacking course.
Removing words from the search results
Sometimes we may get many results from a specific site, or featuring a term that we are not interested in. We can remove specific search terms by prefixing a word with a minus symbol.
For example, if we search for "Ethical Hacking", a lot of the results are from training organisations offering certification. We can remove these matches by using the search term Ethical Hacking -training
NOTE: This will remove them from the main set of results. However, you may still get advertising and paid links at the top of the search results.
Summary of Basic Search Terms
We can start to build up a much more detailed search using these basic operators, looking for pages that contain combinations of words or alternative words, and removing certain terms from our search.
A quick reference for the basic search terms is below.
Operator | Search Results |
---|---|
Foo Bar | pages containing both "Foo" AND "Bar" |
"Foo Bar" | pages containing the phrase "Foo Bar" |
Foo Bar OR Baz | pages containing "Foo" AND the words "Bar" OR "Baz" |
Foo Bar -Baz | pages containing both "Foo" and "Bar" but not "Baz" |
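The table above can be turned into a small helper that assembles a search string from its parts. This is only a sketch: the function name `build_basic_query` and its parameters are made up for illustration, and it does nothing more than join strings in the way the table describes.

```python
def build_basic_query(all_of=(), phrase=None, any_of=(), exclude=()):
    """Combine the basic operators from the table into one search string."""
    parts = list(all_of)                          # plain terms are implicitly ANDed
    if phrase:
        parts.append(f'"{phrase}"')               # quotation marks ask for the exact phrase
        
    if any_of:
        parts.append(" OR ".join(any_of))         # alternatives, written with an upper-case OR
    parts.extend(f"-{word}" for word in exclude)  # a minus prefix removes a term
    return " ".join(parts)


print(build_basic_query(all_of=["Coventry", "university"], any_of=["ethical", "hacking"]))
# Coventry university ethical OR hacking
print(build_basic_query(phrase="Ethical Hacking", exclude=["training"]))
# "Ethical Hacking" -training
```

Keeping each operator as a separate argument makes it easy to try variations of one search (for example, dropping an excluded word) without retyping the whole query.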
Advanced Filtering
While the basic search operators allow us to modify the terms we are looking for, we also have control over the types of data returned, and where it comes from.
Searching a specific site
We can limit the search results to a specific website (or domain) using the site: operator.
For example, to search for pages on hacking within the coventry.ac.uk domain we can use site:coventry.ac.uk hacking
Looking for filetypes.
We can also search for specific filetypes (for example PDF files) using the filetype: operator. This can be useful when trying to locate documents that may contain interesting data (e.g. Excel documents may contain lists of people). We can also use the ext: operator to search for files with a given extension.
For example, let's look for all PDF documents related to lectures on the Coventry website. Searching for
site:coventry.ac.uk lecture filetype:pdf
returns a load of information on the university policy for recording lectures.
Page Titles and URLs
Sometimes it is useful to search based on the page title (the bit displayed in the tab for a page). This can help us to filter out content where the search term appears in the text of a page (for example the word "Login").
For example, if we search for site:coventry.ac.uk login we get a lot of false positives, where the login element is a link or similar. We can filter the results down to show only pages with login in the title using:
site:coventry.ac.uk intitle:login
Sometimes searching for a specific term in a URL may also be useful. This can often give us the same sort of results as the page title search. For this, use the inurl: modifier.
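The filtering operators in this section all share the same name:value shape, so they are easy to compose in code. The sketch below is an illustration rather than a tool: `build_filtered_query` and `search_url` are hypothetical helper names, and the assumption that the query is passed to Google in a `q` parameter of `https://www.google.com/search` is mine, not something the article states.

```python
from urllib.parse import urlencode


def build_filtered_query(terms="", **filters):
    """Prefix keyword filters (site, filetype, ext, intitle, inurl) to the search terms."""
    parts = [f"{name}:{value}" for name, value in filters.items()]
    if terms:
        parts.append(terms)
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """URL-encode the query so it can be pasted into a browser (assumed URL layout)."""
    return f"{base}?{urlencode({'q': query})}"


query = build_filtered_query("lecture", site="coventry.ac.uk", filetype="pdf")
print(query)
# site:coventry.ac.uk filetype:pdf lecture
print(search_url(build_filtered_query(site="coventry.ac.uk", intitle="login")))
# https://www.google.com/search?q=site%3Acoventry.ac.uk+intitle%3Alogin
```

The helper only builds strings. Sending large numbers of automated queries to a search engine is a separate matter and may be against its terms of service.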
Getting Sneaky: Passive Recon With Google
Now we have introduced the more useful search terms, let's see how we could use them as part of our passive recon process.
Important
While all the information we are going to be looking for is technically in the public domain, this is another one of the grey areas we cover. You need to get used to considering the ethics of Google hacking, and only use the techniques on sites you have permission to access.
Remember that using search engines to find this info relies on the pages being indexed. Depending on a site's settings (for example, the robots.txt file, or how much requires authentication) these approaches may not yield a huge amount of information.
In addition, if you actually visit the site in question (rather than just using the search results) it is not purely passive reconnaissance. Remember that connecting to an organisation's servers will leave a trail, and so you need to consider the rules of engagement for the test.
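To illustrate the robots.txt point above, the sketch below uses Python's standard `urllib.robotparser` module to check which paths a crawler that honours the file would skip. The robots.txt content, the `example.org` host and the `crawlable` helper are all invented for illustration. Note that fetching a real robots.txt means connecting to the target's server, so that step would not be passive.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backups/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from a string, no network access


def crawlable(path, agent="Googlebot"):
    """True if a crawler obeying the rules above would be allowed to fetch the path."""
    return parser.can_fetch(agent, f"https://example.org{path}")


for path in ["/courses/", "/admin/login", "/backups/site.bak"]:
    print(path, crawlable(path))
```

Paths that are disallowed here are less likely to show up in search results, which is one reason a dork can come back empty even when the content exists.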
What kinds of information do we want to find?
Let's consider the types of information that we might want to discover:
- Subdomains for an organisation
- Login pages
- Documents, including backup and temporary files
- Well known administrative tools used on the site
- Testing and Debug pages (such as directory listings)
Using Google to discover subdomains
We can use DNS record information, and sites like Netcraft, to discover information about subdomains associated with a site. We can also make use of Google to find this information. The logic here is to make use of the site: operator to identify all pages associated with the site, then remove pages linked to the main www domain.
Our initial cut of the search term then becomes
site:coventry.ac.uk -www.coventry.ac.uk
This gives us a good hit rate, but we may need to filter out duplicate items.
One benefit of subdomains is that they help us to identify independent parts of the organisation that may not be running on the main infrastructure. There may be a server associated with each of these groups that is not under the full control of IT services. In this case, the security of the site may be lower than the rest of the organisation, and give you an entry point.
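Filtering out the duplicate items mentioned above is easy to script once the result URLs have been copied out of the search page. The following is a minimal sketch under that assumption: `unique_subdomains` is a hypothetical helper, and the URLs are invented placeholders on `example.org` rather than real results.

```python
from urllib.parse import urlsplit


def unique_subdomains(urls, domain, skip=("www",)):
    """Return the sorted, de-duplicated hostnames under `domain`, minus the main site."""
    hosts = set()
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if not host.endswith("." + domain):
            continue  # another organisation, or the bare domain itself
        label = host[: -len(domain) - 1]  # the part in front of ".domain"
        if label not in skip:
            hosts.add(host)
    return sorted(hosts)


# Placeholder result URLs, made up for illustration.
results = [
    "https://www.example.org/study/",
    "https://blog.example.org/2020/01/post",
    "https://blog.example.org/about",
    "http://dev.example.org/login",
    "https://example.com/mentions/example.org",
]
print(unique_subdomains(results, "example.org"))
# ['blog.example.org', 'dev.example.org']
```

Comparing on the parsed hostname, rather than searching the whole URL for the domain, avoids counting pages on other sites that merely mention the target.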
Searching for Login Pages
Identifying login pages can also be useful, as this may allow us to see potential entry points when we are planning a web-based penetration test. Just like before, we can look for pages with the term "login" in the title.
site:coventry.ac.uk intitle:login
We could combine this with the subdomain search used earlier, and attempt to find login pages associated with subdomains of interest that are outside the control of core IT services.
site:coventry.ac.uk -www.coventry.ac.uk intitle:login
Document Types
In the previous article we introduced the filetype: modifier, which allows us to search for specific documents. While searching all PDFs on a site can be a bit hit and miss, we can use these searches to help us find out about the technologies used.
For example, we could identify server side technologies by searching for php or asp documents.
site:coventry.ac.uk -www.coventry.ac.uk filetype:php
site:coventry.ac.uk -www.coventry.ac.uk filetype:asp
You may also have some luck finding backup files (a common method is to suffix the filename with .bak) or temporary files created by a text editor. If server-side technologies are used, this can help you discover the program logic.
... ext:log
... ext:bak
... ext:sql
Additionally, configuration files may also be of interest.
... ext:config
... ext:conf
... ext:cfg
... ext:ini
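Typing one query per extension quickly becomes tedious, so the list can be generated instead. This is a sketch only: the grouping labels and the `extension_queries` helper are hypothetical, and combining each extension with a site: filter is my assumption about how the elided searches above would be completed.

```python
# Extensions taken from the lists above; the group labels are illustrative.
INTERESTING_EXTENSIONS = {
    "backups and logs": ["log", "bak", "sql"],
    "configuration": ["config", "conf", "cfg", "ini"],
}


def extension_queries(site, groups=INTERESTING_EXTENSIONS):
    """Yield (group label, query) pairs, one query per file extension."""
    for label, extensions in groups.items():
        for ext in extensions:
            yield label, f"site:{site} ext:{ext}"


for label, query in extension_queries("example.org"):
    print(f"{label:18} {query}")
```

Keeping the extensions in a dictionary means new file types can be added in one place as you come across them.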
Directory Listings and other admin tools.
Sometimes a server is misconfigured to allow directory listings. We can look for Apache-based servers with this problem using the term Index Of:
site:coventry.ac.uk Index Of
Another interesting search is to look for third party modules (such as blogs) that may give us an entry point. Different technologies have different "signatures" -- for example, WordPress and other CMS tend to have "Powered by" in the page footer. Other indications of WordPress are "wp-*" lines in the URL.
site:coventry.ac.uk "powered by"
site:coventry.ac.uk inurl:wp
Other nice examples of third party tools include:
- phpMyAdmin: inurl:phpmyadmin
- webmail: intitle:webmail
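The URL-based signatures above can also be applied after the fact, to sort a list of result URLs by the tool they appear to belong to. The sketch below assumes the URLs have already been collected. The `SIGNATURES` patterns and the `tag_urls` helper are illustrative guesses based on the "wp-" and "phpmyadmin" hints, and the URLs are invented.

```python
import re

# Illustrative URL signatures; real sites may rename or hide these paths.
SIGNATURES = {
    "WordPress": re.compile(r"/wp-[a-z]+", re.IGNORECASE),
    "phpMyAdmin": re.compile(r"phpmyadmin", re.IGNORECASE),
}


def tag_urls(urls):
    """Group URLs by the third party tool whose signature they match."""
    tagged = {}
    for url in urls:
        for tool, pattern in SIGNATURES.items():
            if pattern.search(url):
                tagged.setdefault(tool, []).append(url)
    return tagged


# Placeholder URLs, made up for illustration.
found = tag_urls([
    "https://blog.example.org/wp-login.php",
    "https://blog.example.org/wp-content/uploads/report.pdf",
    "https://db.example.org/phpMyAdmin/index.php",
    "https://www.example.org/about",
])
for tool, urls in found.items():
    print(tool, len(urls))
# WordPress 2
# phpMyAdmin 1
```

A match is only a hint that the tool is present. Title-based signatures such as "Powered by" need the page content and are not covered by this sketch.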
Google Dorks
In the previous article we discussed using Google to discover information about an organisation. By making use of the advanced search functionality it was possible to discover documents and parts of the website that may not be intended for public use.
Working out the search terms to use to find this information can be difficult. Fortunately, the guys at Offensive Security keep a database of useful Google hacking search terms.
The Google Hacking Database can be found at https://www.exploit-db.com/google-hacking-database. Have a look through the searches, and see how they map to the search terms we have looked at during the week.
Many of these searches (called Google Dorks) exploit Security Through Obscurity. The owner of a shiny new webcam sees that the URL is an IP like 10.10.15.24 and doesn't believe that anyone will find the site. However, as we have seen, if one element of the site is public, then Google will find it.
Warning
While the information disclosed by Google dorks is publicly available, we need to make an ethical judgement before using it. Watching giraffes on a zoo's unsecured webcam may be a laugh, but other situations are not so clear cut. Legal Warning: If you do decide to go down the rabbit hole, bear in mind the sites you are looking at. If there is a login page, you may be in violation of the Computer Misuse Act (even if they are stupid enough to keep the default password). Again, using these techniques with permission is fine as long as it is part of the terms of engagement.
Summary
In this article we have looked at advanced Google search operators. These allow us to refine our searches to help find information we need, and other "interesting" documents that are available on a web page. In the next article we will look at examples of using these search terms to find hidden documents and other useful information.
We have also discussed using Google to discover information about a website and an organisation's employees. While I have presented some suggestions for finding information, this is not an exhaustive list. Thinking about the types of data you wish to find, and the way it may be displayed on a web page, will help with the reconnaissance process.
Shodan
Shodan is another awesome search engine. We won't cover Shodan here, as you need a licence, but it's a tool well worth investigating.
Shodan calls itself "The search engine for the internet of things" and allows you to search not only for web pages, but for services and ports on remote devices. This means that you can use Shodan to search for open Telnet ports (for the obligatory webcam backdoor), RDP servers, the Jesus Port, or other fun things.
If you have played with Shodan and have found anything cool, share it in the Aula.