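SecretScraper is a highly configurable web crawler. It collects links from a target website and uses regular expressions to extract sensitive data from the crawled pages.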
Requires Python >= 3.9.
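- Web crawling: extracts links through the DOM hierarchy and regular expressions
- Domain whitelist and blacklist
- Multiple targets, with target URLs read from a file
- Local file scanning
- Extensive customization: headers, proxy, timeout, cookies, crawl depth, following redirects, and more
- Built-in regular expressions for finding sensitive information
- Flexible configuration in YAML format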
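The first feature, extracting links through both the DOM and regular expressions, can be illustrated with the Python standard library. The sketch below is an assumption for illustration, not SecretScraper's actual code: it walks the parsed tags with html.parser to collect href, src and action attributes, then runs a simplified version of the quoted-URL pattern from the configuration over the raw text to catch absolute URLs hidden in inline scripts. LinkCollector and extract_links are hypothetical names.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Simplified, hypothetical variant of the first urlFind pattern:
# an absolute http(s) URL wrapped in quotes or backticks.
QUOTED_URL = re.compile(
    r"[\"'`]\s{0,6}(https?:[-a-zA-Z0-9()@:%_+.~#?&/={}]{2,250}?)\s{0,6}[\"'`]"
)


class LinkCollector(HTMLParser):
    """DOM side: collect link-bearing attribute values from every start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                self.links.append(value)


def extract_links(base_url: str, html: str) -> list[str]:
    """Combine DOM and regex results, resolve them against base_url, drop duplicates."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    candidates = parser.links + QUOTED_URL.findall(html)
    seen: set[str] = set()
    result: list[str] = []
    for link in candidates:
        absolute = urljoin(base_url, link)  # relative paths become absolute URLs
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

The regex pass matters because URLs inside JavaScript strings never appear as tag attributes, so a DOM walk alone would miss them.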
Install:
pip install secretscraper
Upgrade:
pip install --upgrade secretscraper
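Scan a single target:
secretscraper -u https://xxxxxxx.com/
Scan multiple targets listed in a file (here named urls), one URL per line:
secretscraper -f urls
Example contents of urls:
http://xxxxxxx.com/1
http://xxxxxxx.com/2
http://xxxxxxx.com/3
http://xxxxxxx.com/4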
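The file passed to -f lists one target per line. Below is a minimal Python sketch of reading such a file. It assumes that surrounding whitespace and blank lines are ignored and that duplicates are dropped; this behavior is not confirmed for SecretScraper itself, and parse_targets is a hypothetical helper.

```python
def parse_targets(text: str) -> list[str]:
    """Return target URLs from line-separated text, in order, without blanks or duplicates."""
    targets: list[str] = []
    seen: set[str] = set()
    for line in text.splitlines():
        url = line.strip()  # tolerate stray spaces and Windows line endings
        if url and url not in seen:
            seen.add(url)
            targets.append(url)
    return targets
```

To read the urls file from the example, call parse_targets(pathlib.Path("urls").read_text(encoding="utf-8")).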
> secretscraper --help
Usage: secretscraper [OPTIONS]

  Main commands

Options:
  -V, --version                Show version and exit.
  --debug                      Enable debug.
  -a, --ua TEXT                Set User-Agent
  -c, --cookie TEXT            Set cookie
  -d, --allow-domains TEXT     Domain white list, wildcard(*) is supported,
                               separated by commas, e.g. *.example.com,
                               example*
  -D, --disallow-domains TEXT  Domain black list, wildcard(*) is supported,
                               separated by commas, e.g. *.example.com,
                               example*
  -f, --url-file FILE          Target urls file, separated by line break
  -i, --config FILE            Set config file, defaults to settings.yml
  -m, --mode [1|2]             Set crawl mode, 1(normal) for max_depth=1,
                               2(thorough) for max_depth=2, default 1
  --max-page INTEGER           Max page number to crawl, default 100000
  --max-depth INTEGER          Max depth to crawl, default 1
  -o, --outfile FILE           Output result to specified file in csv format
  -s, --status TEXT            Filter response status to display, seperated by
                               commas, e.g. 200,300-400
  -x, --proxy TEXT             Set proxy, e.g. http://127.0.0.1:8080,
                               socks5://127.0.0.1:7890
  -H, --hide-regex             Hide regex search result
  -F, --follow-redirects       Follow redirects
  -u, --url TEXT               Target url
  --detail                     Show detailed result
  --validate                   Validate the status of found urls
  -l, --local PATH             Local file or directory, scan local
                               file/directory recursively
  --help                       Show this message and exit.
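The -d/-D and -s options take comma-separated lists. The sketch below shows one plausible way to interpret them: wildcard patterns matched against the URL's host name with fnmatch, and status specs such as 200,300-400 expanded into inclusive ranges. It is an illustration only. The helper names are hypothetical, and the rule that the blacklist takes precedence over the whitelist is an assumption, not documented SecretScraper behavior.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def split_csv(value: str) -> list[str]:
    """Split a comma-separated option value, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def domain_allowed(url: str, allow: list[str], deny: list[str]) -> bool:
    """Apply wildcard white/black lists to the URL's host (assumption: black list wins)."""
    host = urlsplit(url).hostname or ""
    if any(fnmatchcase(host, pattern) for pattern in deny):
        return False
    # An empty white list lets through every host that is not black-listed.
    return not allow or any(fnmatchcase(host, pattern) for pattern in allow)


def parse_status_filter(spec: str) -> list[tuple[int, int]]:
    """Turn a spec like '200,300-400' into inclusive (low, high) ranges."""
    ranges = []
    for item in split_csv(spec):
        low, _, high = item.partition("-")  # '200' -> ('200', '', '')
        ranges.append((int(low), int(high or low)))
    return ranges


def status_allowed(status: int, ranges: list[tuple[int, int]]) -> bool:
    """True if the status code falls inside any of the ranges."""
    return any(low <= status <= high for low, high in ranges)
```

Note that *.example.com does not match the bare host example.com under fnmatch rules, which is why the help text lists example* as a separate pattern.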
The built-in configuration is shown below. You can supply your own configuration with -i settings.yml.
verbose: false
debug: false
loglevel: critical
logpath: log
handler_type: re
proxy: "" # http://127.0.0.1:7890
max_depth: 1 # 0 for no limit
max_page_num: 1000 # 0 for no limit
timeout: 5
follow_redirects: true
workers_num: 1000
headers:
  Accept: "*/*"
  Cookie: ""
  User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36 SE 2.X MetaSr 1.0
urlFind:
  - "[\"'‘“`]\\s{0,6}(https{0,1}:[-a-zA-Z0-9()@:%_\\+.~#?&//={}]{2,250}?)\\s{0,6}[\"'‘“`]"
  - "=\\s{0,6}(https{0,1}:[-a-zA-Z0-9()@:%_\\+.~#?&//={}]{2,250})"
  - "[\"'‘“`]\\s{0,6}([#,.]{0,2}/[-a-zA-Z0-9()@:%_\\+.~#?&//={}]{2,250}?)\\s{0,6}[\"'‘“`]"
  - "\"([-a-zA-Z0-9()@:%_\\+.~#?&//={}]+?[/]{1}[-a-zA-Z0-9()@:%_\\+.~#?&//={}]+?)\""
  - "href\\s{0,6}=\\s{0,6}[\"'‘“`]{0,1}\\s{0,6}([-a-zA-Z0-9()@:%_\\+.~#?&//={}]{2,250})|action\\s{0,6}=\\s{0,6}[\"'‘“`]{0,1}\\s{0,6}([-a-zA-Z0-9()@:%_\\+.~#?&//={}]{2,250})"
jsFind:
  - (https{0,1}:[-a-zA-Z0-9()@:%_+.~#?&//=]{2,100}?[-a-zA-Z0-9()@:%_+.~#?&//=]{3}[.]js)
  - '["''‘“`]\s{0,6}(/{0,1}[-a-zA-Z0-9()@:%_+.~#?&//=]{2,100}?[-a-zA-Z0-9()@:%_+.~#?&//=]{3}[.]js)'
  - =\s{0,6}[",',’,”]{0,1}\s{0,6}(/{0,1}[-a-zA-Z0-9()@:%_+.~#?&//=]{2,100}?[-a-zA-Z0-9()@:%_+.~#?&//=]{3}[.]js)
dangerousPath:
  - logout
  - update
  - remove
  - insert
  - delete
rules:
  - name: Swagger
    regex: \b[\w/]+?((swagger-ui.html)|("swagger":)|(Swagger UI)|(swaggerUi)|(swaggerVersion))\b
    loaded: true
  - name: ID Card
    regex: \b((\d{8}(0\d|10|11|12)([0-2]\d|30|31)\d{3})|(\d{6}(18|19|20)\d{2}(0[1-9]|10|11|12)([0-2]\d|30|31)\d{3}(\d|X|x)))\b
    loaded: true
  - name: Phone
    regex: "['\"](1(3([0-35-9]\\d|4[1-8])|4[14-9]\\d|5([\\d]\\d|7[1-79])|66\\d|7[2-35-8]\\d|8\\d{2}|9[89]\\d)\\d{7})['\"]"
    loaded: true
  - name: JS Map
    regex: \b([\w/]+?.js.map)
    loaded: true
  - name: URL as a Value
    regex: (\b\w+?=(https?)(://|%3a%2f%2f))
    loaded: false
  - name: Email
    regex: "['\"]([\\w]+(?:\\.[\\w]+)*@(?:[\\w](?:[\\w-]*[\\w])?\\.)+[\\w](?:[\\w-]*[\\w])?)['\"]"
    loaded: true
  - name: Internal IP
    regex: '[^0-9]((127.0.0.1)|(10.\d{1,3}.\d{1,3}.\d{1,3})|(172.((1[6-9])|(2\d)|(3[01])).\d{1,3}.\d{1,3})|(192.168.\d{1,3}.\d{1,3}))'
    loaded: true
  - name: Cloud Key
    regex: \b((accesskeyid)|(accesskeysecret)|\b(LTAI[a-z0-9]{12,20}))\b
    loaded: true
  - name: Shiro
    regex: (=deleteMe|rememberMe=)
    loaded: true
  - name: Suspicious API Key
    regex: "[\"'][0-9a-zA-Z]{32}['\"]"
    loaded: true
  - name: Jwt
    regex: "['\"](ey[A-Za-z0-9_-]{10,}\\.[A-Za-z0-9._-]{10,}|ey[A-Za-z0-9_\\/+-]{10,}\\.[A-Za-z0-9._\\/+-]{10,})['\"]"
    loaded: true
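Each entry under rules pairs a name with a regular expression and a loaded switch. The following is a minimal sketch of how such rules might be applied with Python's re module: only loaded rules are compiled, and every match is reported together with its rule name. The three rules are adapted from the configuration above as raw strings. The dangerousPath check assumes the keywords are used to flag (for example, skip) URLs whose path contains them, which this article does not spell out. compile_rules, scan and is_dangerous are hypothetical names.

```python
import re
from urllib.parse import urlsplit

# Rules adapted from the YAML configuration, written as Python data.
RULES = [
    {"name": "Cloud Key",
     "regex": r"\b((accesskeyid)|(accesskeysecret)|\b(LTAI[a-z0-9]{12,20}))\b",
     "loaded": True},
    {"name": "Shiro", "regex": r"(=deleteMe|rememberMe=)", "loaded": True},
    {"name": "URL as a Value",
     "regex": r"(\b\w+?=(https?)(://|%3a%2f%2f))",
     "loaded": False},
]
DANGEROUS_PATH = ["logout", "update", "remove", "insert", "delete"]


def compile_rules(rules):
    """Compile only the rules whose 'loaded' flag is true."""
    return [(rule["name"], re.compile(rule["regex"])) for rule in rules if rule["loaded"]]


def scan(text, compiled_rules):
    """Return (rule name, matched text) pairs for every match in text."""
    findings = []
    for name, pattern in compiled_rules:
        for match in pattern.finditer(text):
            findings.append((name, match.group(0)))
    return findings


def is_dangerous(url, keywords=DANGEROUS_PATH):
    """True if the URL path (query string excluded) contains a dangerous keyword."""
    path = urlsplit(url).path.lower()
    return any(keyword in path for keyword in keywords)
```

Because "URL as a Value" is marked loaded: false, the next=https://... fragment in a scanned page is ignored unless that flag is switched on.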
Project repository:
https://github.com/PadishahIII/SecretScraper
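Originally published on the WeChat official account 安全小圈 as "敏感信息扫描工具推荐 SecretScraper" (Recommended sensitive-information scanning tool: SecretScraper).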
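Disclaimer: The programs (methods) described in this article may be offensive in nature and are provided solely for security research and teaching. Readers who use this information for any other purpose bear full legal and joint liability; this site assumes no legal or joint liability. If you have questions, please contact us by email (a corporate or otherwise valid address is recommended so the message is not filtered; contact details are on the home page).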