Google Dorking is all about pushing Google Search to its limits by using advanced search operators to tell Google exactly what you want. Many people view it as a hacking technique for finding unprotected sensitive information about a company, but I try to view it more as the hacker way of thinking, because I use Google Dorks for far more than security research.
I first realized the power of a Google Dork when I was looking for the Cisco AnyConnect software used to connect to Cisco VPNs. However, Cisco didn’t offer the software as a public download, and googling “Cisco AnyConnect” just led to websites talking about it. I looked through Google’s advanced search operators and noticed I could search the titles of web pages. Open directory listings generated by web servers such as Apache are titled “Index of /path”, so I had an idea that if I searched for open directory listings, my results would be just files. I changed my search query to intitle:index.of cisco anyconnect and suddenly found loads of results. I downloaded the executable, generated an MD5 hash of the file, and compared it with the hash on Cisco’s site. Once I found a file that matched, I knew it hadn’t been tampered with and was safe to run.
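The hash comparison in that story is easy to script. Below is a minimal Python sketch, assuming you have the downloaded file and the MD5 value the vendor publishes; md5_of_file and matches_published_hash are hypothetical helper names, not part of any tool.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks so a large
    installer never has to fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_md5: str) -> bool:
    """Compare the file's digest with the vendor-published one, ignoring
    case and stray whitespace picked up when copying the hash."""
    return md5_of_file(path) == published_md5.strip().lower()
```

On Linux, md5sum on the same file prints the same digest, so either approach works for a one-off check.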
At the time I didn’t know this was called Google Dorking; it wasn’t until I came across the Google Hacking for Penetration Testers Black Hat presentation that I realized the full power of Google.
Using advanced search operators lets me find almost anything I want on the web. Most people know about queries like site:hackthebox.com ext:pdf, which shows all the PDFs hosted on a domain. The documents it turns up can often be run through exiftool to extract metadata, revealing potential usernames, dates, and the software used to create them. However, many people don’t think about pointing it at cloud storage, with queries like site:drive.google.com hackthebox. Thankfully, that search doesn’t return much. Google isn’t crawling drive.google.com itself; instead, it finds documents whose links were posted on the public internet. My favorite Google Dorks are:
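Once a site:... ext:pdf search has turned up documents, the exiftool step can be scripted. This is a sketch assuming the exiftool binary is installed and on PATH and that you have already downloaded the files; the helper names and the choice of tags are illustrative assumptions, and which tags exist varies by file type.

```python
import json
import subprocess

# Tags that commonly leak people and tooling; which ones exist depends on
# the file type (PDF vs. Office), so absent tags are simply skipped.
INTERESTING_TAGS = ("Author", "Creator", "LastModifiedBy", "Producer",
                    "CreateDate", "ModifyDate")


def run_exiftool(paths: list) -> str:
    """Run exiftool with -json over downloaded files and return its stdout.
    Assumes exiftool is on PATH; check=True raises CalledProcessError if
    exiftool exits with an error status."""
    result = subprocess.run(["exiftool", "-json", *paths],
                            capture_output=True, text=True, check=True)
    return result.stdout


def summarize_metadata(exiftool_json: str) -> dict:
    """Map each SourceFile to the subset of INTERESTING_TAGS it contains."""
    summary = {}
    for record in json.loads(exiftool_json):
        summary[record["SourceFile"]] = {
            tag: str(record[tag]) for tag in INTERESTING_TAGS if tag in record
        }
    return summary


def candidate_usernames(summary: dict) -> set:
    """Collect author / last-editor values as possible usernames."""
    names = set()
    for tags in summary.values():
        for tag in ("Author", "LastModifiedBy"):
            if tags.get(tag):
                names.add(tags[tag])
    return names
```

Chained together, candidate_usernames(summarize_metadata(run_exiftool(["report.pdf"]))) would list the people-like fields from a hypothetical report.pdf. Parsing is kept separate from the subprocess call so the same logic works on JSON saved from an earlier exiftool run.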
The site: and inurl: operators examine the URL. I use site: most often, because many websites have poor built-in search. For example, if you search for ippsec with Reddit’s built-in search and then run the Google search site:reddit.com ippsec, you will likely get completely different results.
The inurl: operator is valuable when you want a phrase to appear in the URL but don’t care which website it is on. I often use it to measure the impact a web exploit may have. For example, if a vulnerability came out in a WordPress plugin, I would find the filename of a file the plugin uses and search inurl:file_used_by_plugin.php to see how many websites could be affected. I would then build a list of websites that offer bug bounties or take part in programs like Synack and check whether any of those sites appear in the results.
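Checking the inurl: results against a bug bounty list is a set-membership problem once both sides are normalized to hostnames. A minimal sketch, assuming you have already collected the result URLs and the in-scope domains as plain lists; hostname and in_scope_hits are hypothetical helpers, and treating subdomains as in scope is an assumption that each program’s real scope may contradict.

```python
from urllib.parse import urlparse


def hostname(url: str) -> str:
    """Normalize a URL or bare domain to a lowercase host without 'www.'."""
    if "://" not in url:
        url = "http://" + url  # urlparse only finds the host after a scheme
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def in_scope_hits(dork_results: list, bounty_domains: list) -> list:
    """Return the hosts from the dork results that fall under a bug bounty
    domain, counting subdomains (blog.example.com is under example.com)."""
    scopes = {hostname(d) for d in bounty_domains}
    hits = set()
    for url in dork_results:
        host = hostname(url)
        if any(host == s or host.endswith("." + s) for s in scopes):
            hits.add(host)
    return sorted(hits)
```

Matching on the "." + domain suffix rather than a plain endswith keeps a host like notexample.com from being counted under example.com.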
In any Google query, putting a hyphen in front of a term subtracts it from the results. This is extremely useful for removing sites or portions of URLs from the results. Using -site:website.com makes sure that website does not appear in your search results. The hyphen also works with inurl:, letting you remove portions of a website from the results.
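When a query accumulates many exclusions, generating it avoids typos. A small sketch of a query builder; build_dork and its keyword arguments are hypothetical, and it only assembles the text you would paste into Google.

```python
def build_dork(*terms: str, site: str = "", exclude_sites=(), exclude_inurl=()) -> str:
    """Join search terms and operators; every exclusion gets a leading hyphen."""
    parts = list(terms)
    if site:
        parts.append(f"site:{site}")
    # A hyphen directly before an operator subtracts its matches from the results.
    parts += [f"-site:{s}" for s in exclude_sites]
    parts += [f"-inurl:{p}" for p in exclude_inurl]
    return " ".join(parts)
```

For example, build_dork(site="example.com", exclude_sites=("www.example.com",)) yields site:example.com -site:www.example.com, a common way to push the main site out of the way so other subdomains surface.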
Google keeps records of when pages were first seen or last modified, and the before: and after: operators are a great way to narrow a search using them. There have been plenty of times when a recent headline kept filling up my Google search results; adding before:<date> is a great way to eliminate that. Also, when an exploit comes out, I typically use after:<date> to find the latest proofs of concept. If the exploit came out seven days ago, I may set the after date to two days ago to find the newest proofs of concept, which tend to be the more advanced ones, whereas the exploits released immediately after disclosure tend to be just denial-of-service attacks.
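Working out “two days ago” by hand is error-prone around month boundaries, so here is a sketch that computes the date. It assumes Google’s before:/after: operators accept dates in YYYY-MM-DD form; dated_dork is a hypothetical helper and CVE-XXXX-XXXXX is a placeholder, not a real identifier.

```python
from datetime import date, timedelta
from typing import Optional


def dated_dork(query: str, today: date, after_days_ago: Optional[int] = None,
               before: Optional[date] = None) -> str:
    """Append after:/before: operators using ISO (YYYY-MM-DD) dates."""
    parts = [query]
    if after_days_ago is not None:
        # timedelta handles month and leap-year boundaries for us.
        parts.append(f"after:{(today - timedelta(days=after_days_ago)).isoformat()}")
    if before is not None:
        parts.append(f"before:{before.isoformat()}")
    return " ".join(parts)
```

Calling dated_dork("CVE-XXXX-XXXXX poc", date(2024, 3, 1), after_days_ago=2) produces CVE-XXXX-XXXXX poc after:2024-02-28; in practice you would pass date.today().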
Websites will often present different information to search engines for Search Engine Optimization (SEO) reasons. Using cache:url lets you view the page the website returned to Google. This is useful when the website hides information behind a login. There is a common misconception about this feature: I’ve seen many people consider it “passive recon,” meaning that if you viewed a website through the Google cache, the website would have no idea you went there. This is not true, because the Google cache often won’t rewrite links to resources that load automatically when you view the cached result, so your browser will still make web requests to the target web server.
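To see why a cached view is not passive, you can list the resources in a saved copy that a browser would fetch on its own and check which ones point back at the target. This is a rough sketch using Python’s html.parser; the tag list is an assumption and not exhaustive (CSS url(), srcset, and scripts can trigger more requests), and resolving relative URLs against the original page URL assumes the cached copy is served with a base pointing at the original site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tag -> attribute holding a URL the browser requests without a click.
AUTO_LOAD = {"img": "src", "script": "src", "iframe": "src", "source": "src",
             "video": "src", "audio": "src", "link": "href"}


class AutoLoadFinder(HTMLParser):
    """Collect absolute URLs of resources a browser would request on its own."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link":
            # Only stylesheets and icons load automatically; rel=canonical does not.
            rel = set((attrs.get("rel") or "").lower().split())
            if not rel & {"stylesheet", "icon"}:
                return
        attr = AUTO_LOAD.get(tag)
        if attr and attrs.get(attr):
            self.urls.append(urljoin(self.base_url, attrs[attr]))


def requests_to_target(cached_html: str, target_host: str, base_url: str) -> list:
    """List auto-loaded resource URLs that would hit target_host directly."""
    finder = AutoLoadFinder(base_url)
    finder.feed(cached_html)
    return [u for u in finder.urls if urlparse(u).hostname == target_host]
```

Any URL this returns is a request your browser would send straight to the target’s web server while you read the “cached” page.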
If you navigate to google.com/advanced_search, you’ll be presented with a page that helps you craft a Google Dork and displays the syntax to perform the search. The most interesting setting is the ability to change the region results are displayed for. Google serves results it thinks are of interest to you, and one of the main deciding factors is where it thinks you are located. For example, if I search for “Google”, the first result is google.com; if I change the regional settings to the UK, it becomes google.co.uk. This can be useful for mapping out the countries where an organization has infrastructure. Not all Google Dorks are for OSINT, though; this is equally useful for searching news from foreign countries. Because I am in the US, if I search for news related to the UK, Google will still give priority to US websites covering UK news. If I perform the same search with the regional settings changed to the UK, it will favor local UK websites over US-based ones.
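If you want to script region-specific searches rather than click through the advanced search page, the query has to end up in a properly encoded URL. The sketch below builds one; the cr=countryUK style parameter is an assumption based on what the advanced search region option appears to generate, so compare it with the URL that page actually produces before relying on it.

```python
from typing import Optional
from urllib.parse import urlencode


def google_search_url(query: str, region: Optional[str] = None) -> str:
    """Build a google.com search URL for a dork, optionally restricted by region."""
    params = {"q": query}
    if region:
        # Assumption: mirrors the cr=country<CODE> parameter (e.g. countryUK)
        # that the region option on google.com/advanced_search adds.
        params["cr"] = "country" + region.upper()
    # urlencode percent-encodes operators (":" -> %3A) and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode(params)
```

For instance, google_search_url("Google", region="uk") gives https://www.google.com/search?q=Google&cr=countryUK.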
My favorite demo of Google Dorking is with LinkedIn, as I use it on almost every engagement to increase the number of employee names I harvest in the OSINT stage. When you use LinkedIn to view a company’s employees and don’t have any connection to a person, LinkedIn will only show you their avatar, job title, and location. You don’t get a link to that person’s actual profile.
However, if you make a Google search like site:linkedin.com jobtitle companyname, Google will often find the LinkedIn profile for you, which reveals the person’s name! Also, remember that LinkedIn is a global company. If the company whose employees I’m searching for is in a different country from mine, I will often change my regional settings to match that company’s country.
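The LinkedIn technique scales by generating one query per job title seen on the company page and then turning profile URLs from the results into names. A sketch under several assumptions: restricting site: to linkedin.com/in to target profile pages, quoting titles and company names, and reading names from the profile URL slug are all choices of this example, and slugs are user-editable, so treat the output as leads rather than facts.

```python
import re
from urllib.parse import urlparse


def linkedin_dorks(company: str, job_titles: list, profiles_only: bool = True) -> list:
    """One query per job title; quotes keep multi-word titles and names together."""
    site = "linkedin.com/in" if profiles_only else "linkedin.com"
    return [f'site:{site} "{title}" "{company}"' for title in job_titles]


def name_from_profile_url(url: str):
    """Guess a name from a profile slug such as /in/jane-doe-1a2b3c/.
    Returns None for URLs that are not /in/ profile pages."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "in":
        return None
    words = parts[1].split("-")
    # Assumption: a trailing token containing digits is a disambiguation suffix.
    if len(words) > 1 and re.search(r"\d", words[-1]):
        words = words[:-1]
    return " ".join(w.capitalize() for w in words)
```

Because only the URL path is inspected, results from regional hosts such as uk.linkedin.com are handled the same way.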
There is a lot more to Google Dorking than what I covered here, and by far the best resource to go to is Exploit-DB’s Google Hacking Database. I’m sure once you start reading the Google Dorks themselves, you’ll get a lot of ideas for dorks that would be useful to you. If not, search for various cloud providers like Google Drive, OneDrive, Dropbox, etc., and see how you can use Google to find documents stored on them.
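One way to start on the cloud-provider idea is to search several storage hosts at once with Google’s OR operator. A sketch assuming the host list below; providers also serve sharing links from other hostnames, so treat CLOUD_SITES as a starting point you will need to extend.

```python
# Assumed starting list of hosts that serve shared cloud files.
CLOUD_SITES = ("drive.google.com", "docs.google.com", "onedrive.live.com", "dropbox.com")


def cloud_storage_dork(keyword: str, sites=CLOUD_SITES) -> str:
    """OR together one site: operator per provider and require the keyword."""
    site_clause = " OR ".join(f"site:{s}" for s in sites)
    return f'({site_clause}) "{keyword}"'


def per_site_dorks(keyword: str, sites=CLOUD_SITES) -> list:
    """One query per provider, for reviewing each provider's results separately."""
    return [f'site:{s} "{keyword}"' for s in sites]
```

Applied to hackthebox with just Drive and Dropbox, the combined form is (site:drive.google.com OR site:dropbox.com) "hackthebox".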
The Pentest-Tools Google Hacking page is a great way to get started with Google Dorking. I really like this site because it offers 14 different types of Google Dorks, such as searches for documents, log files, or configuration files, and then opens the Google search in a new page, guiding you through some of the common Google Dorks. Eventually, though, you should depend less on this tool, as working through a checklist can limit creativity.
There are a lot of automated tools that will run a large number of Google Dorks for you and write the results to text files so you can grep through them. This is beneficial because the most time-consuming part of Google Dorking is clicking through all the result pages. If you just want a list of all PDFs on a site, it will take a while to write each URL down 10 at a time (the default number of results per page). Thankfully, programs can do that clicking for us! These tools change pretty often, so it’s hard to say which is best; try a site:github.com Google Dork with after: set to a date six months ago to look for the most up-to-date tools. That said, one I like using is this dork scanner.
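Whatever tool you pick, its output usually needs light post-processing. A sketch assuming the tool writes one URL per line to a text file (results.txt below is a hypothetical name) and possibly some log noise; the helpers deduplicate the URLs, filter by file extension while ignoring query strings, and group what is left by host.

```python
from collections import defaultdict
from urllib.parse import urlparse


def load_urls(lines) -> list:
    """Keep http(s) lines only, deduplicated, in first-seen order."""
    seen, urls = set(), []
    for line in lines:
        url = line.strip()
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def by_extension(urls: list, ext: str) -> list:
    """Filter on the URL path, so ?dl=1 style query strings don't hide a match."""
    suffix = "." + ext.lower().lstrip(".")
    return [u for u in urls if urlparse(u).path.lower().endswith(suffix)]


def group_by_host(urls: list) -> dict:
    """Bucket URLs by hostname to see which servers hold the most files."""
    groups = defaultdict(list)
    for u in urls:
        groups[urlparse(u).hostname].append(u)
    return dict(groups)
```

Typical use would be opening results.txt and passing the file object to load_urls, then calling by_extension on the result with "pdf" to get the PDF list without clicking through a single results page.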
Many OSINT skills are really just investigative skills. Even if you don’t intend to focus on OSINT, you would be surprised at how much knowing the basics can help with general research. Google Dorking is a great example of this, but to learn even more, check out our OSINT: Corporate Recon Academy course.