A generic spider for security testing. The built-in spiders in common scanning tools simply crawl every link found in the HTML. That may look comprehensive, but for security testing it is both incomplete and inefficient. Many web pages generate their content dynamically with JavaScript or AJAX, so the URLs inside that content cannot be collected with a plain HTTP library such as requests. In addition, many pages are rendered from the same framework template (product listings, video playback pages, and so on); apart from the content, their logic is identical. For security testing there is no need to crawl and test those duplicate pages.
Our spider uses Selenium with headless Chrome to simulate a real browser environment and extracts URLs from both static and dynamic content. A Bloom filter combined with URL path similarity is used to deduplicate pages.
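To make the idea concrete, here is a minimal sketch (not the project's actual code) of that approach: render a page with Selenium and headless Chrome so links injected by JS/AJAX are visible, then filter the collected links through a Bloom filter keyed on a normalized path. It assumes the selenium and pybloom-live packages plus a local chromedriver; normalize_path is a made-up helper that approximates the path-similarity check by collapsing numeric path segments:

import re
from urllib.parse import urlparse

from pybloom_live import BloomFilter
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

def normalize_path(url):
    # Collapse numeric path segments so that /video/123 and /video/456
    # map to the same key; a crude stand-in for URL path similarity.
    parsed = urlparse(url)
    path = re.sub(r"/\d+(?=/|$)", "/{n}", parsed.path)
    return parsed.netloc + path

opts = Options()
opts.add_argument("--headless")
driver = webdriver.Chrome(options=opts)
driver.get("https://www.example.com")

# BloomFilter(capacity, error_rate) from pybloom-live: a memory-cheap
# membership test with a tunable false-positive rate.
seen = BloomFilter(capacity=100000, error_rate=0.001)
urls = []
for a in driver.find_elements(By.TAG_NAME, "a"):
    href = a.get_attribute("href")
    if not href:
        continue
    key = normalize_path(href)
    if key not in seen:  # probably a new page pattern
        seen.add(key)
        urls.append(href)

driver.quit()
print(urls)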
Once all the dependencies are installed, you can start the spider like this:
start_spider('https://www.example.com', domain=REG_DOMAIN, deep=0, delay_level=3, method='get', data=None, cookie='a=1;b=3')
surl start url: "https://www.baidu.com/?a=1&b=2" / the scheme is required
domain domain scope [default = *]
deep recursion depth [default = 0]
delay_level crawl delay level [default = 0] / range: 0-5
method get or post, in lowercase [default = get]
data POST parameters [default = None]
cookie you can copy Chrome's cookie header field directly and it will be parsed for you (see the sketch after this list) [default = None]
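Since the cookie parameter accepts a raw header string copied from Chrome, parsing it mostly means splitting on ';' and '='. A hypothetical parse_cookie helper (an illustration, not necessarily the project's internal parser) might look like this:

def parse_cookie(cookie_str):
    # Turn a raw Chrome "Cookie:" header value like "a=1;b=3" into
    # a list of dicts that Selenium's add_cookie() accepts.
    cookies = []
    for pair in cookie_str.split(";"):
        pair = pair.strip()
        if not pair or "=" not in pair:
            continue
        name, _, value = pair.partition("=")
        cookies.append({"name": name.strip(), "value": value.strip()})
    return cookies

# parse_cookie("a=1;b=3") -> [{'name': 'a', 'value': '1'}, {'name': 'b', 'value': '3'}]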
In addition to these parameters, there are a few other settings you should review and adjust to your environment:
MAX_RECURSION_DEPTH the maximum recursion depth relative to the root node
url_bloom the BloomFilter used for URL deduplication; controls its accuracy
DB_CONNECT_STRING your database connection string
DEFAULT_THREAD_NUMBER the number of crawler threads
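For orientation, a hypothetical settings module containing these fields might look as follows. The names are taken from the list above, but every value and the database URL format are assumptions to adapt to your setup:

# settings.py -- illustrative values only; the field names come from the
# list above, but the concrete values and DB URL format are assumptions.
from pybloom_live import BloomFilter

MAX_RECURSION_DEPTH = 5        # maximum depth relative to the root node
url_bloom = BloomFilter(capacity=1000000, error_rate=0.001)  # dedup accuracy
DB_CONNECT_STRING = 'mysql+pymysql://user:password@127.0.0.1:3306/spider?charset=utf8'
DEFAULT_THREAD_NUMBER = 10     # number of crawler threads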