a high-performance, lightweight and human friendly serving engine for scrapy

Overview

scrapy-x (X)

a distributed, scalable and lightweight environment for deploying and running scrapy spiders/projects with no-hassle on commodity hardware, also it is compatible with scrapyd /schedule.json and /daemonstatus.json.

Installation

$ pip install -U git+git://github.com/speakol-ads/scrapy-x.git

Usage

let's assume that you have a project called TestCrawler

  • cd to TestCrawler
  • run scrapy x
  • that is all!

Default Settings

it utilizes your default project settings.py file

# whether to enable debug mode or not
X_DEBUG = True

# the default queue name that the system will use
# actually it will be used as a prefix for its internal
# queues, currently there is only one queue called `X_QUEUE_NAME + '.BACKLOG'`
# which holds all jobs that should be crawled.
X_QUEUE_NAME = 'SCRAPY_X_QUEUE'

# the queue workers
# by default it uses the cpu cores count
# try to adjust it based on your resources & needs
X_QUEUE_WORKERS_COUNT = os.cpu_count()

# the webserver workers count
# the workers count required from uvicorn to spwan
# defaults to the available cpu count
# try to adjust it based on your resources & needs
X_SERVER_WORKERS_COUNT = os.cpu_count()

# the port the http server should listen on
X_SERVER_LISTEN_PORT = 6800

# the host used by the http server to listen on
X_SERVER_LISTEN_HOST = '0.0.0.0'

# whether to enable access log or not
X_ENABLE_ACCESS_LOG = True

# redis host
X_REDIS_HOST = 'localhost'

# redis port
X_REDIS_PORT = 6379

# redis db
X_REDIS_DB = 0

# redis password
X_REDIS_PASSWORD = ''

# the maximum allowed wait time for a running task
# it will be killed after that time.
X_TASK_TIMEOUT = 25

Available Endpoints

as well scrapyd core endpoints like (schedule.json, daemonstatus.json), you have the following too:

GET /

returns some info about the engine like the available spiders and backlog queue length

GET|POST /run/{spider_name}

execute the specified spider in {spider_name} and wait for it to return its result, P.S: any query param and json post data will be passed to the spider as argument -a key=value

GET|POST /enqueue/{spider_name}

adding the specified spider in {spider_name} to the backlog to be executed later, P.S: any query param and json post data will be used as spider argument

Technologies Used

Author

I'm Mohamed, a software engineer who enjoys writing code in his free time, I'm speaking python, php, go, rust and js

My Similar Projects

P.S: star the project if you liked it ^_^

You might also like...
Scraping news from Ucsal portal with Scrapy.

NewsScraping Esse é um projeto de raspagem das últimas noticias, de 2021, do portal da universidade Ucsal http://noosfero.ucsal.br/institucional Tecno

Fundamentus scrapy

Fundamentus_scrapy Baixa informacões que os outros scrapys do fundamentus não realizam. Para iniciar (python main.py), sera criado um arquivo chamado

Crawler do site Fundamentus.com com o uso do framework scrapy, tanto da aba detalhada como a de resumo.

Crawler do site Fundamentus.com com o uso do framework scrapy, tanto da aba detalhada como a de resumo. (Todas as infomações)

Scrapy-based cyber security news finder

Cyber-Security-News-Scraper Scrapy-based cyber security news finder Goal To keep up to date on the constant barrage of information within the field of

Searching info from Google using Python Scrapy
Searching info from Google using Python Scrapy

Python-Search-Engine-Scrapy || Python-爬虫-索引/利用爬虫获取谷歌信息**/ Searching info from Google using Python Scrapy /* 利用 PYTHON 爬虫获取天气信息,以及城市信息和资料**/ translatio

Amazon scraper using scrapy, a python framework for crawling websites.

#Amazon-web-scraper This is a python program, which use scrapy python framework to crawl all pages of the product and scrap products data. This progra

This is a web scraper, using Python framework Scrapy, built to extract data  from the Deals of the Day section on Mercado Livre website.
This is a web scraper, using Python framework Scrapy, built to extract data from the Deals of the Day section on Mercado Livre website.

Deals of the Day This is a web scraper, using the Python framework Scrapy, built to extract data such as price and product name from the Deals of the

Iptvcrawl - A scrapy project for crawl IPTV playlist

iptvcrawl a scrapy project for crawl IPTV playlist. Dependency Python3 pip insta

Amazon web scraping using Scrapy Framework

Amazon-web-scraping-using-Scrapy-Framework Scrapy Scrapy is an application framework for crawling web sites and extracting structured data which can b

Owner
Speakol Ads
Speakol Ads
a Scrapy spider that utilizes Postgres as a DB, Squid as a proxy server, Redis for de-duplication and Splash to render JavaScript. All in a microservices architecture utilizing Docker and Docker Compose

This is George's Scraping Project To get started cd into the theZoo file and run: chmod +x script.sh then: ./script.sh This will spin up a Postgres co

George Reyes 7 Nov 27, 2022
Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

Gerapy Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Documentation Documentation

Gerapy 2.9k Jan 3, 2023
Scrapy uses Request and Response objects for crawling web sites.

Requests and Responses¶ Scrapy uses Request and Response objects for crawling web sites. Typically, Request objects are generated in the spiders and p

Md Rashidul Islam 1 Nov 3, 2021
This Spider/Bot is developed using Python and based on Scrapy Framework to Fetch some items information from Amazon

- Hello, This Project Contains Amazon Web-bot. - I've developed this bot for fething some items information on Amazon. - Scrapy Framework in Python is

Khaled Tofailieh 4 Feb 13, 2022
Bigdata - This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster

Scrapy Cluster This Scrapy project uses Redis and Kafka to create a distributed

Hanh Pham Van 0 Jan 6, 2022
A scrapy pipeline that provides an easy way to store files and images using various folder structures.

scrapy-folder-tree This is a scrapy pipeline that provides an easy way to store files and images using various folder structures. Supported folder str

Panagiotis Simakis 7 Oct 23, 2022
Visual scraping for Scrapy

Portia Portia is a tool that allows you to visually scrape websites without any programming knowledge required. With Portia you can annotate a web pag

Scrapinghub 8.7k Jan 5, 2023
An experiment to deploy a serverless infrastructure for a scrapy project.

Serverless Scrapy project This project aims to evaluate the feasibility of an architecture based on serverless technology for a web crawler using scra

José Ferraz Neto 5 Jul 8, 2022
Snowflake database loading utility with Scrapy integration

Snowflake Stage Exporter Snowflake database loading utility with Scrapy integration. Meant for streaming ingestion of JSON serializable objects into S

Oleg T. 0 Dec 6, 2021
download NCERT books using scrapy

download_ncert_books download NCERT books using scrapy Downloading Books: You can either use the spider by cloning this repo and following the instruc

null 1 Dec 2, 2022