Scan Discovery
Also known as Scan Spider
HawkScan's navigation of your scanned web application is fully configurable. To find meaningful vulnerabilities, HawkScan tries to discover the parts of your site, intercepting the HTTP request and response payloads as it navigates your web application.
This process is called Scan Discovery and is configured under the hawk.spider section of the stackhawk.yml file.
stackhawk.yml
hawk:
  spider:
    maxDurationMinutes: 5 # Maximum allowed time, in minutes, for all enabled spiders to crawl your web application.
    seedPaths: [] # List of paths to add directly to the site tree.
    base: true # Basic spider that parses HTML responses and follows the URLs it finds. Enabled by default.
    ajax: false # More complex spider that follows dynamic links and buttons in your application.
    custom: {} # Bring your own developer tools and use the generated web traffic to discover your application.
These mechanisms are best suited for discovering running web applications that serve Content-Type: text/html, including server-side rendered and MVC-shaped web applications. While HawkScan tries to scan a running website deterministically and consistently, the results of the Scan Discovery phase can be more variable for larger web applications with more links and changing content.
For more consistent and protocol-constrained REST API scanning, provide an API definition such as an OpenAPI specification instead of relying on Scan Discovery mechanisms. HawkScan also supports scanning GraphQL and SOAP APIs.
maxDurationMinutes
Multiple spiders can be enabled for a scan; however, fully navigating your web application can take a long time if the application is sufficiently large. This setting limits the total time all configured spiders may spend crawling. The default is 5 minutes. Larger web applications may need more time to scan in pre-production, whereas a shorter feedback loop is better when scanning in development.
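As a sketch, a larger pre-production application can be given a longer crawl budget; the 30-minute value below is only illustrative:

hawk:
  spider:
    maxDurationMinutes: 30 # Illustrative: allow the enabled spiders up to 30 minutes on a larger pre-production app.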
seedPaths
Explicitly adds routes to the site tree. HawkScan visits the host URL and any routes added here directly during the scan. These paths are used as additional starting points for crawling your application. This parameter is useful for defining routes that are not readily crawlable from the root of your application host, for example a hidden page like /admin.
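A minimal sketch; the /admin and /internal/reports paths are hypothetical examples of routes that are not linked from your root page:

hawk:
  spider:
    seedPaths:
      - "/admin"            # Hypothetical hidden page not linked from the root.
      - "/internal/reports" # Hypothetical route only reachable by direct navigation.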
NOTE: This configuration is NOT a replacement for an API definition and provides no benefit to pure REST APIs.
base Spider
The base spider is the basic web crawler for discovering your application's routes, and it is appropriate for most traditional web applications. It reaches new pages by finding URLs in Content-Type: text/html responses and crawling those paths breadth-first until it has reached all feasible pages.
Toggle its operation with true or false.
NOTE: This feature is enabled by default.
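If your routes come entirely from an API definition such as an OpenAPI specification, you can disable the base spider; a minimal sketch:

hawk:
  spider:
    base: false # Skip HTML crawling when routes are supplied by an API definition instead.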
ajax Spider
The ajax spider is a more complex web crawler designed to discover new pages in more dynamic websites and Single Page Applications. This spider leverages Selenium to follow an unscripted process of clicking any buttons and links it encounters.
Toggle its operation with true or false.
You can additionally configure which browser to use with the spider.ajaxBrowser setting. Options include FIREFOX_HEADLESS (the default), FIREFOX, CHROME_HEADLESS, and CHROME.
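A sketch enabling the ajax spider with headless Chrome; any of the browser options above would work here:

hawk:
  spider:
    ajax: true
    ajaxBrowser: CHROME_HEADLESS # Requires Chrome to be installed when running HawkScan via the CLI.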
NOTE: To use the spider.ajax option with the CLI, you must have Firefox or Chrome installed and set spider.ajaxBrowser appropriately. This spider is not available in the arm64 HawkScan Docker image.
custom Scan Discovery
Software developers who are skillful and successful with HawkScan tend to use other application testing tools as well. These tools may generate web traffic and support proxying that traffic into other software. You can reuse these capabilities with HawkScan to check the tested application's web traffic for vulnerabilities.
Toggle its operation by specifying a custom.command to be run.
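A minimal sketch; the command value is hypothetical and stands in for whatever tool generates traffic against your application:

hawk:
  spider:
    custom:
      command: "./run-integration-tests.sh" # Hypothetical script that drives web traffic through the target application.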
See the Custom Scan Discovery page for more details.