A dead-simple crawler that fetches book information from Douban.

Overview

Introduction

A dead-simple crawler that fetches book information from Douban.

Prerequisites

  • Python 3
  • Install dependencies from requirements.txt
  • (Optional) Install Anaconda to manage the environment

Usage

On Local Machine

  1. Run get_tags to fetch all the trending tags.
     # This generates a file named tags.csv
     python app.py get_tags -o /your-output-dir
  2. Run crawl_books to crawl the books for the tags collected in the previous step.
     python app.py crawl_books -i /some-where/tags.csv -o /your-output-dir

You can also create tags.csv yourself instead of running get_tags. If you do, make sure each tag you specify actually returns books on Douban.
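A minimal sketch of writing tags.csv by hand is shown below. The layout that crawl_books expects is not documented here, so the one-tag-per-row format is an assumption, as are the helper name write_tags_csv and the sample tags. Compare against a file produced by get_tags before relying on it.

```python
import csv
from pathlib import Path


def write_tags_csv(tags, output_dir):
    """Write one tag per row to <output_dir>/tags.csv and return the path.

    Assumption: crawl_books reads one tag per row. This is not confirmed
    by the README; check a file generated by get_tags first.
    """
    output_path = Path(output_dir) / "tags.csv"
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings itself.
    with output_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for tag in tags:
            cleaned = tag.strip()
            if cleaned:  # skip blank entries
                writer.writerow([cleaned])
    return output_path
```

For example, write_tags_csv(["小说", "历史"], "/your-output-dir") would produce a file you can then pass to crawl_books with -i.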

Docker Compose

You'll need to install Docker and the docker-compose command before proceeding.

docker-compose build --no-cache && docker-compose up -d --force-recreate

You can omit --force-recreate to keep the existing container when neither the configuration nor the image has changed.

License

MIT © mogita


Comments
  • Feat: postgres

    This PR integrates PostgreSQL to handle data storage and progress resuming. The proxy pool is also now optional and can be disabled with the environment variable WITHOUT_PROXY=yes.

    opened by mogita 1
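    As a rough illustration of how an environment variable such as WITHOUT_PROXY can switch the proxy pool off, consider the sketch below. The project's actual check is not shown in this document, so the function name proxy_enabled and the case-insensitive comparison are assumptions.

```python
import os


def proxy_enabled(environ=None):
    """Return False when WITHOUT_PROXY is set to "yes", True otherwise.

    Hypothetical helper: the crawler's real logic may differ.
    """
    env = os.environ if environ is None else environ
    # Only the value "yes" (ignoring case and surrounding spaces) disables proxies.
    return env.get("WITHOUT_PROXY", "").strip().lower() != "yes"
```

    Passing a plain dict instead of os.environ keeps the check easy to test, for example proxy_enabled({"WITHOUT_PROXY": "yes"}).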
  • Feat: proxy pool override

    This PR customizes the proxy_pool component to fetch proxies from a specified source under custom conditions. I'm using Zhima HTTP Proxy as the provider for now.

    I also rewrote the HTTP functions so that proxy handling lives in one central component.

    I used xpinyin to avoid malformed filenames when Chinese characters are present.

    opened by mogita 1
Owner
Yun Wang