LSpider 一个为被动扫描器定制的前端爬虫

Overview

LSpider

LSpider - 一个为被动扫描器定制的前端爬虫

什么是LSpider?

一款为被动扫描器而生的前端爬虫~

由Chrome Headless、LSpider主控、Mysql数据库、RabbitMQ、被动扫描器5部分组合而成。

(1) 建立在Chrome Headless基础上,将模拟点击和触发事件作为核心原理,通过设置代理将流量导出到被动扫描器。

(2) 通过内置任务+子域名api来进行发散式的爬取,目的经可能的触发对应目标域的流量。

(3) 通过RabbitMQ来进行任务管理,支持大量线程同时任务。

(4) 智能填充表单,提交表单等。

(5) 通过一些方式智能判断登录框,并反馈给使用者,使用者可以通过添加cookie的方式来完成登录。

(6) 定制了相应的Webhook接口,以供Webhook统计发送到微信。

(7) 内置了Hackerone、bugcrowd爬虫,提供账号的情况下可以一键获取某个目标的所有范围。

为什么选择LSpider?

LSpider是专门为被动扫描器定制的爬虫,许多功能都是为被动扫描器而服务的。

建立在RabbitMQ的任务管理系统相当稳定,可以长期在无人监管的情况下进行发散式的爬取。

LSpider的最佳实践是什么?

服务器1(2c4g以上): Nginx + Mysql + Mysql管理界面(phpmyadmin)

将被动扫描器的输出位置设置为web路径下,这样可以通过Web同时管理结果以及任务。

LSpider部署5线程以上,设置代理连接被动扫描器(被动扫描器可以设置专门的漏扫代理)

服务器2(非必要,但如果部署在服务器1,那么就需要更好的配置):RabbitMQ

还有什么问题?

LSpider从设计之初是为了配合像xray这种被动扫描器而诞生的,但可惜的是,在工具发展的过程中,深刻认识到爬虫是无法和被动扫描器拆分开来的。

强行将应该在被动扫描器实现的功能在爬虫端实现简直是舍本逐末,所以我们发起了另一个被动扫描器项目,如果有机会,后续还会开源出来给大家。

设计思路?

为被动扫描器量身打造一款爬虫-LSpider

Usage

安装&使用

你可以通过下面的命令来测试是否安装成功

python3 manage.py SpiderCoreBackendStart --test

值得注意的是,以下脚本可能会涉及到项目路径影响,使用前请修改相应的配置

启动LSpider webhook(默认端口2062)

./lspider_webhook.sh

启动LSpider

./lspider_start.sh

完全关闭LSpider

./lspider_stop.sh

启动被动扫描器

./xray.sh

一些关键的配置

配置说明

如何配置扫描任务 以及 其他的配置相关

其中包含了如何配置扫描任务、鉴权信息、webhook。

值得注意的是,文中提到的Cookie配置,格式为浏览器请求包复制即可。

如何配置扫描任务 以及 其他的配置相关

使用内置的hackerone、bugcrowd爬虫获取目标

使用hackerone爬虫,你需要首先配置好hackerone账号

 python3 .\manage.py HackeroneSpider {appname}

同理,bugcrowd使用

 python3 .\manage.py BugcrowdSpider {appname}

404StarLink

LSpider 是 404Team 星链计划中的一环,如果对LSpider有任何疑问又或是想要找小伙伴交流,可以参考星链计划的加群方式。

Comments
  • 使用遇到了问题

    使用遇到了问题

    [WARNING] [Thread-5] [00:33:08] [LReq.py:115] [LReq] something error, Traceback (most recent call last): File "/home/ubuntuvm/LSpider/utils/LReq.py", line 75, in get return method(url, args) File "/home/ubuntuvm/LSpider/utils/LReq.py", line 179, in getRespByChrome return self.cs.get_resp(url, cookies) File "/home/ubuntuvm/LSpider/core/chromeheadless.py", line 134, in get_resp self.add_cookie(cookies) File "/home/ubuntuvm/LSpider/core/chromeheadless.py", line 192, in add_cookie value = cookie.split('=')[1].strip() IndexError: list index out of range

    [WARNING] [Thread-5] [00:33:08] [htmlparser.py:86] [AST] something error, Traceback (most recent call last): File "/home/ubuntuvm/LSpider/core/htmlparser.py", line 42, in html_parser soup = BeautifulSoup(content, "html.parser") File "/usr/local/lib/python3.8/dist-packages/bs4/init.py", line 310, in init elif len(markup) <= 256 and ( TypeError: object of type 'bool' has no len()

    报这个错误 不知道怎么解决

    opened by 294517102 3
  • pika.exceptions.AMQPConnectionError 错误

    pika.exceptions.AMQPConnectionError 错误

    运行lspider_start.sh 提示pika.exceptions.AMQPConnectionError

    ubuntu20,python3.8,RabbitMQ 3.9.10,Erlang 24.1.7 http://ip:2062可访问,http://ip:15672可访问,且新建Virtual Hosts为lyspider。 lspider与rabbitmq位于一机,且rabbitmq使用docker,命令如下: docker run -d --hostname rabbit --name some-rabbit -p 15672:15672 rabbitmq:3-management

    image

    设置如下 image

    报错截图如下: image

    哪怕账号密码乱打然后使用docker logs rabbit-log都看不到任何相关报错,怀疑是IP/端口问题,但怎么看都不像是有问题的样子。

    没接触过RABBITMQ和相关模块,折磨一天百度谷歌无果,特此发问,感谢回复!

    opened by KagamigawaMeguri 2
  • AttributeError: 'ChromeDriver' object has no attribute 'driver'

    AttributeError: 'ChromeDriver' object has no attribute 'driver'

    第一次运行时正常,但是后面每次运行都报 root@tomato:/home/tomato/LSpider-1.0.0.1# python3 manage.py SpiderCoreBackendStart --test [INFO] [MainThread] [08:48:14] [SpiderCoreBackendStart.py:35] [Spider] start test spider. [INFO] [MainThread] [08:48:14] [rabbitmqhandler.py:39] [Monitor][INIT][Rabbitmq] New Rabbitmq link to 127.0.0.1 [INFO] [MainThread] [08:48:14] [rabbitmqhandler.py:36] [Monitor][INIT] Rabbitmq init success... [INFO] [MainThread] [08:48:14] [chromeheadless.py:100] [Chrome Headless] Proxy 127.0.0.1:7777 init [ERROR] [MainThread] [08:48:15] [chromeheadless.py:45] [Chrome Headless] ChromeDriver load error. [ERROR] [MainThread] [08:48:15] [SpiderCoreBackendStart.py:47] [Spider] something error, Traceback (most recent call last): File "/home/tomato/LSpider-1.0.0.1/core/chromeheadless.py", line 38, in init self.init_object() File "/home/tomato/LSpider-1.0.0.1/core/chromeheadless.py", line 119, in init_object desired_capabilities=desired_capabilities) File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/chrome/webdriver.py", line 81, in init desired_capabilities=desired_capabilities) File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 157, in init self.start_session(capabilities, browser_profile) File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session response = self.execute(Command.NEW_SESSION, parameters) File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute self.error_handler.check_response(response) File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response raise exception_class(message, screen, stacktrace) selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally. (unknown error: DevToolsActivePort file doesn't exist) (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "/home/tomato/LSpider-1.0.0.1/web/spider/management/commands/SpiderCoreBackendStart.py", line 40, in handle spidercore = SpiderCore(test_target_list) File "/home/tomato/LSpider-1.0.0.1/web/spider/controller/spider.py", line 239, in init self.req = LReq(is_chrome=True) File "/home/tomato/LSpider-1.0.0.1/utils/LReq.py", line 37, in init self.cs = ChromeDriver() File "/home/tomato/LSpider-1.0.0.1/core/chromeheadless.py", line 46, in init exit(0) File "/usr/lib/python3.6/_sitebuiltins.py", line 26, in call raise SystemExit(code) SystemExit: 0

    Exception ignored in: <bound method ChromeDriver.del of <core.chromeheadless.ChromeDriver object at 0x7f1bb6c546d8>> Traceback (most recent call last): File "/home/tomato/LSpider-1.0.0.1/core/chromeheadless.py", line 591, in del self.close_driver() File "/home/tomato/LSpider-1.0.0.1/core/chromeheadless.py", line 586, in close_driver self.driver.quit() AttributeError: 'ChromeDriver' object has no attribute 'driver'

    opened by LuckyT0mat0 2
  • Docker rabbitmq传入环境变量的特性已弃用

    Docker rabbitmq传入环境变量的特性已弃用

    rabbitmq不停报错重启,docker-compose报错信息:

    rabbitmq | error: RABBITMQ_DEFAULT_PASS is set but deprecated rabbitmq | error: RABBITMQ_DEFAULT_USER is set but deprecated rabbitmq | error: RABBITMQ_DEFAULT_VHOST is set but deprecated rabbitmq | error: deprecated environment variables detected

    图片

    官方镜像仓库描述,3.9开始确实停用了这个特性。 图片

    我在docker-compose.yml修改,指定版本3.8。看起来能解决问题。 或者作者按新版推荐的写配置文件方式改一下,嘻嘻 rabbitmq: image: rabbitmq:3.8 container_name: rabbitmq hostname: rabbitmq restart: always

    opened by go1f 0
  • docker搭建后,在lspider的docker环境中执行,如下报错,请大佬告知一下,什么原因

    docker搭建后,在lspider的docker环境中执行,如下报错,请大佬告知一下,什么原因

    /opt/LSpider # python3 manage.py SpiderCoreBackendStart --test
    [INFO] [MainThread] [03:55:17] [SpiderCoreBackendStart.py:35] [Spider] start test spider.
    [INFO] [MainThread] [03:55:17] [rabbitmqhandler.py:39] [Monitor][INIT][Rabbitmq] New Rabbitmq link to rabbitmq
    [INFO] [MainThread] [03:55:17] [rabbitmqhandler.py:36] [Monitor][INIT] Rabbitmq init success...
    [INFO] [MainThread] [03:55:17] [chromeheadless.py:100] [Chrome Headless] Proxy 127.0.0.1:7777 init
    [ERROR] [MainThread] [03:55:17] [chromeheadless.py:45] [Chrome Headless] ChromeDriver load error.
    [ERROR] [MainThread] [03:55:17] [SpiderCoreBackendStart.py:47] [Spider] something error, Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/common/service.py", line 76, in start
        stdin=PIPE)
      File "/usr/local/lib/python3.7/subprocess.py", line 800, in __init__
        restore_signals, start_new_session)
      File "/usr/local/lib/python3.7/subprocess.py", line 1551, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: '/opt/LSpider/bin/chromedriver': '/opt/LSpider/bin/chromedriver'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/LSpider/core/chromeheadless.py", line 38, in __init__
        self.init_object()
      File "/opt/LSpider/core/chromeheadless.py", line 119, in init_object
        desired_capabilities=desired_capabilities)
      File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
        self.service.start()
      File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/common/service.py", line 83, in start
        os.path.basename(self.path), self.start_error_message)
    selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
    
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/LSpider/web/spider/management/commands/SpiderCoreBackendStart.py", line 40, in handle
        spidercore = SpiderCore(test_target_list)
      File "/opt/LSpider/web/spider/controller/spider.py", line 239, in __init__
        self.req = LReq(is_chrome=True)
      File "/opt/LSpider/utils/LReq.py", line 37, in __init__
        self.cs = ChromeDriver()
      File "/opt/LSpider/core/chromeheadless.py", line 46, in __init__
        exit(0)
      File "/usr/local/lib/python3.7/_sitebuiltins.py", line 26, in __call__
        raise SystemExit(code)
    SystemExit: 0
    
    Exception ignored in: <function ChromeDriver.__del__ at 0x7f91f2b63680>
    Traceback (most recent call last):
      File "/opt/LSpider/core/chromeheadless.py", line 591, in __del__
        self.close_driver()
      File "/opt/LSpider/core/chromeheadless.py", line 586, in close_driver
        self.driver.quit()
    AttributeError: 'ChromeDriver' object has no attribute 'driver'
    
    opened by uunnsec 3
Releases(1.0.2)
Owner
Knownsec, Inc.
Knownsec, Inc.