191 Python Scraping-websites Libraries

A quick username checker to see if a username is available on a list of assorted websites.

4 Jan 4, 2022

CPF and CNPJ lookup at the Receita Federal with web scraping

Repository containing Python scripts that look up CPF and CNPJ numbers directly on the Receita Federal website.

5 Nov 29, 2021

A Python module to bypass Cloudflare's anti-bot page.

cloudflare-scrape A simple Python module to bypass Cloudflare's anti-bot page (also known as "I'm Under Attack Mode", or IUAM), implemented with Reque

3k Jan 4, 2023

Scraping Bot for the Covid19 vaccination website of the Canton of Zurich, Switzerland.

Hi 👋 , I'm David A passionate developer from France. 🌱 I’m currently learning Kotlin, ReactJS and Kubernetes 👨‍💻 All of my projects are available

1 Nov 14, 2021

Source code for web scraping with the Python library Selenium.

1 Nov 13, 2021

CiteURL is an extensible tool that parses legal citations and makes links to websites where you can read the cited language for free.

CiteURL is an extensible tool that parses legal citations and makes links to websites where you can read the cited language for free. It can be used t

15 Dec 27, 2022

Implementation of the bachelor's thesis "Real-time stock predictions with deep learning and news scraping".

Real-time stock predictions with deep learning and news scraping This repository contains a partial implementation of my bachelor's thesis "Real-time

0 Feb 9, 2022

Scraping comments from the political section of the popular Nigerian blog Nairaland and saving them in a CSV file.

Scraping_Nairaland This project scraped comments from the political section of the popular Nigerian blog www.nairaland.com using the Python BeautifulSoup

1 Nov 14, 2021

4CAT: Capture and Analysis Toolkit

4CAT: Capture and Analysis Toolkit 4CAT is a research tool that can be used to analyse and process data from online social platforms. Its goal is to m

147 Dec 20, 2022

Scraping weather data using Python to receive umbrella reminders

A Python package which scrapes weather data from google and sends umbrella reminders to specified email at specified time daily.

1 Aug 23, 2022

Web Scraping Practica With Python

Web-Scraping-Practica Members: Guillem Vidal Pallarols, Lídia Bandrés Solé. Files: This document is the first one we find. Next we find a

2 Nov 8, 2021

Facebook Group Scraping Using Beautiful Soup & Selenium

Extract Facebook group posts that are related to a specific topic and write them to a .json file.

14 Aug 12, 2022

Sqli-Scanner is a python3 script written to scan websites for SQL injection vulnerabilities

Sqli-Scanner is a python3 script written to scan websites for SQL injection vulnerabilities Features 1 Scan one website 2 Scan multiple websites Insta

9 Dec 30, 2022

A project that automatically sends you a Medium article on a topic of your choosing to your email address daily.

Daily Article from Medium ✏️ About A project that automatically sends you a Medium article on a topic of your choosing to your email address daily. No

2 Apr 27, 2022

PyJPBoatRace: Python-based Japanese boatrace tools 🚤

pyjpboatrace 🚤 provides you with useful tools for data analysis and auto-betting for boatrace.

5 Oct 29, 2022

Scraping web pages to get data

Scraping Data Get public data and save it in a database. This project uses Python. How to run the project: 1 - Clone the repository 2 - Install beautifulsoup4

2 Nov 1, 2021

Web-Scraping using Selenium Master

Web-Scraping using Selenium Why is Selenium needed? Some websites don't like to be scraped and in that case you need to disguise your webscrapi

1 Oct 26, 2021

A one place destination to check whatever is trending on the top social and news websites at present.

UpTrend A one place destination to check whatever is trending on the top social and news websites at present. Explore the docs » View Demo · Report Bu

10 Oct 3, 2021

Example of scraping a paginated API endpoint and dumping the data into a DB

Provider API Scraper Example Example of scraping a paginated API endpoint and dumping the data into a DB. Prerequisites Python = 3.9 Pipenv Setup # i

1 Oct 20, 2021
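
The pattern this entry names (walk a paginated endpoint page by page, then write the rows to a database) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the repository's actual code. The page shape (`items`, `next_page`), the `providers` table, and the helper names `iter_items`, `dump_to_db`, and `fake_fetch_page` are all assumptions made for the example, and the in-memory `fake_fetch_page` stands in for a real HTTP call.

```python
import sqlite3
from typing import Callable, Iterable, Iterator, Optional

# Assumed (hypothetical) page shape: {"items": [...], "next_page": int or None}


def iter_items(fetch_page: Callable[[int], dict], start: int = 1) -> Iterator[dict]:
    """Yield every item from a paginated source, following next_page until it is None."""
    page_number: Optional[int] = start
    while page_number is not None:
        page = fetch_page(page_number)
        yield from page["items"]
        page_number = page.get("next_page")


def dump_to_db(conn: sqlite3.Connection, items: Iterable[dict]) -> int:
    """Store items in a hypothetical 'providers' table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS providers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    rows = [(item["id"], item["name"]) for item in items]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO providers (id, name) VALUES (?, ?)", rows
        )
    return len(rows)


def fake_fetch_page(page_number: int) -> dict:
    """Stand-in for an HTTP request; serves two pages of made-up data."""
    data = {
        1: [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}],
        2: [{"id": 3, "name": "gamma"}],
    }
    next_page = page_number + 1 if page_number + 1 in data else None
    return {"items": data[page_number], "next_page": next_page}


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    stored = dump_to_db(connection, iter_items(fake_fetch_page))
    print(f"stored {stored} rows")
```

Passing the fetch function in as a parameter keeps the pagination loop independent of any particular HTTP client, so the same loop can be exercised offline with a stub like the one above.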

PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different languages

PyMultiDictionary PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different

19 Dec 26, 2022

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

18 Nov 28, 2022

Username reconnaissance tool that checks the availability of a specified username on over 200 websites.

Username reconnaissance tool that checks the availability of a specified username on over 200 websites. Installation & Usage Clone from GitHub: $ git c

20 Oct 30, 2022

Automation script written in Python for subdomain enumeration, web scraping, and finding usernames

12 Nov 22, 2022

Better GitHub statistics images for your profile, with stats from private and public repos

2k Dec 30, 2022

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)

3.5k Dec 30, 2022

Scraping news from Ucsal portal with Scrapy.

NewsScraping This is a project that scrapes the latest news, from 2021, from the Ucsal university portal http://noosfero.ucsal.br/institucional Tecno

0 Sep 30, 2021

A Python package that scrapes Google News article data while remaining undetected by Google.

A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper)

6 Aug 10, 2022

Using Selenium with Python to scrape popular YouTube tech channels.

Web Scraping Popular YouTube Tech Channels with Selenium Data Mining, Data Wrangling, and Exploratory Data Analysis About the Data Web scrapi

0 Aug 18, 2021

The core packages of security analyzer web crawler

Security Analyzer 🐍 A large-scale web crawler (also usable as a vulnerability scanner) to get an overview of the security of Moroccan sites Cu

10 Jul 3, 2022

The open-source web scrapers that feed the Los Angeles Times California coronavirus tracker.

The open-source web scrapers that feed the Los Angeles Times' California coronavirus tracker. Processed data ready for analysis is available at datade

Los Angeles Times Data and Graphics Department

51 Dec 14, 2022

Current Antarctic large iceberg positions derived from ASCAT and OSCAT-2

Iceberg Locations Antarctic large iceberg positions derived from ASCAT and OSCAT-2. All data collected here are from the NASA SCP website Overview Thi

5 Jul 27, 2022

A tool for scraping and organizing data from NewsBank API searches

nbscraper Overview This simple tool automates the process of copying, pasting, and organizing data from NewsBank API searches. Currently, nbscrape onl

0 Jun 17, 2021

A web scraping pipeline project that retrieves TV and movie data from two sources, then transforms and stores data in a MySQL database.

New to Streaming Scraper An in-progress web scraping project built with Python, R, and SQL. The scraped data are movie and TV show information. The go

1 Mar 28, 2022

Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool written in Python.

543 Jan 3, 2023

PyQuery-based scraping micro-framework.

demiurge PyQuery-based scraping micro-framework. Supports Python 2.x and 3.x. Documentation: http://demiurge.readthedocs.org Installing demiurge $ pip

109 Jul 20, 2022

NASA APOD Discord Bot - Fetches information from NASA APOD site.

4 Apr 23, 2022

Python script to scrape members from a selected Telegram group.

A Python script to scrape all the members of a Telegram group and save them in a CSV file. Registering: Go to this link https://core.telegram.org/api/obtain

7 Dec 1, 2022

A script that will warn you, by opening a new browser tab, when there is new content on your favourite websites.

web check A script that will warn you, by opening a new browser tab, when there is new content on your favourite websites. What it does The script wi

52 Mar 15, 2022

A Web Scraping Program.

Web Scraping AUTHOR: Saurabh G. MTech Information Security, IIT Jammu. If you find this repository useful, I would appreciate it if you Star it and Fork

2 Dec 14, 2022

Ross Virtual Assistant is a programme which can play Music, search Wikipedia, open Websites and much more.

Ross-Virtual-Assistant Ross Virtual Assistant is a programme which can play Music, search Wikipedia, open Websites and much more. Installation Downloa

4 Nov 8, 2021

Django-registration (redux) provides user registration functionality for Django websites.

Description: Django-registration provides user registration functionality for Django websites. maintainers: Macropin, DiCato, and joshblum contributor

920 Jan 8, 2023

Automated network configuration backups using Github actions and git-scraping

Network Config Scraper This repository demonstrates the use of Github Actions and git-scraping to build an automated backup solution for network confi

19 Dec 14, 2022

Docker containerized Python Flask API that uses selenium to scrape and interact with websites

0 Jan 22, 2022

Django API that scrapes and provides the latest news from the city of Carlos Casares in a semantic format (RDF).

"Casares News" API API that scrapes and provides the latest news from the city of Carlos Casares in a semantic format (RDF). Usage Consume the articles

6 May 12, 2022

Amazon Scraper: A command-line tool for scraping Amazon product data

Amazon Product Scraper: 2021 Description A command-line tool for scraping Amazon product data to CSV or JSON format(s). Requirements Python 3 pip3 Ins

49 Nov 15, 2021

A web scraper built with Beautiful Soup that fetches Udemy course information and converts it to a JSON, CSV, or XML file

Udemy Scraper A Web Scraper built with beautiful soup, that fetches udemy course information. Installation Virtual Environment Firstly, it is recommen

15 May 17, 2022

Scraping Thailand COVID-19 data from the DDC's tableau dashboard

Scraping COVID-19 data from DDC Dashboard Scraping Thailand COVID-19 data from the DDC's tableau dashboard. Data is updated at 07:30 and 08:00 daily.

5 Jan 4, 2022

Scraping and analysis of leetcode-compensations page.

Leetcode compensations report Scraping and analysis of leetcode-compensations page.

96 Jan 1, 2023

A Python library to utilize AWS API Gateway's large IP pool as a proxy to generate pseudo-infinite IPs for web scraping and brute forcing.

929 Jan 1, 2023

Command line program to download documents from web portals.

command line document download made easy Highlights list available documents in json format or download them filter documents using string matching re

16 Dec 26, 2022

Download images from forum threads

Forum Image Scraper Downloads images from forum threads Only works with forums which don't require a login to view and have an incremental paginatio

9 Nov 16, 2022

mlscraper: Scrape data from HTML pages automatically with Machine Learning

🤖 Scrape data from HTML websites automatically with Machine Learning

798 Dec 29, 2022

Cryptocurrency scraping

SCRYPTO What? Cryptocurrency scraping (at the moment, only the Bitcoin and Ethereum cryptocurrencies are supported) How? A Python script is running

15 Sep 1, 2022

Autotype on websites that have copy-paste disabled like Moodle, HackerEarth contest etc.

Autotype A quick and small python script that helps you autotype on websites that have copy paste disabled like Moodle, HackerEarth contests etc as it

32 Nov 3, 2022

A Python-based web scraping bot for Windows that downloads all ACCEPTED submissions of any user on Codeforces

CODEFORCES DOWNLOADER This is a Python-based web scraping bot for Windows that downloads all ACCEPTED submissions of any user on Codeforces Requirements

6 Dec 29, 2022

Create multiple CloudFront (CF) entries for multiple websites

AWS-CloudFront Problem: Deploy multiple CloudFront for account with multiple domains. Functionality: Running this script in loop and deploy CloudFront

5 Nov 18, 2022

Minimal set of tools to conduct stealthy scraping.

Stealthy Scraping Tools Do not use puppeteer and playwright for scraping. Explanation. We only use the CDP to obtain the page source and to get the ab

88 Jan 4, 2023

This little tool is to calculate a MurmurHash value of a favicon to hunt phishing websites on the Shodan platform.

MurMurHash This little tool is to calculate a MurmurHash value of a favicon to hunt phishing websites on the Shodan platform. What is MurMurHash? Murm

87 Dec 31, 2022

Campsite Reservation Finder

yellowstone-camping UPDATE: yellowstone-camping is being expanded and renamed to camply. The updated tool now interfaces with the Recreation.gov API a

233 Jan 8, 2023

Campsite Reservation Cancellation Finder (Yellowstone National Park)

yellowstone-camping yellowstone-camping is a Campsite Reservation Cancellation Finder for Yellowstone National Park. This simple Python application wi

7 Aug 5, 2022

A repository with scraping code and soccer dataset from understat.com.

UNDERSTAT - SHOTS DATASET As many people interested in soccer analytics know, Understat is an amazing source of information. They provide Expected Goa

48 Jan 3, 2023

FireDM is an open-source Python Internet Download Manager with a multi-connection, high-speed engine. It downloads general files and videos from YouTube and many other streaming websites.

Open-source Python Internet Download Manager with a multi-connection, high-speed engine, based on Python, LibCurl, and youtube_dl https://github.com/firedm/FireDM

1.6k Apr 12, 2022

Bulk Downloader for Reddit

saveddit is a bulk media downloader for reddit pip3 install saveddit Setting up authorization Register an application with Reddit Write down your clie

136 Jan 3, 2023

Polyglot Machine Learning example for scraping similar news articles.

Polyglot Machine Learning example for scraping similar news articles In this example, we will see how we can work with Machine Learning applications w

15 Mar 28, 2022

ZeroNet - Decentralized websites using Bitcoin crypto and BitTorrent network

ZeroNet Decentralized websites using Bitcoin crypto and the BitTorrent network - https://zeronet.io / onion Why? We believe in open, free, and uncenso

17.8k Jan 3, 2023

A tool for exporting Telegram group chats into static websites, preserving chat history like mailing list archives.

tg-archive is a tool for exporting Telegram group chats into static websites, preserving chat history like mailing list archives. Preview The @fossuni

400 Dec 27, 2022

Powerful Telegram Members Scraping and Adding Toolkit

🔥 Genisys V2.1 Powerful Telegram Members Scraping and Adding Toolkit 🔻 Features 🔺 ADDS IN BULK[by user id, not by username] Scrapes and adds to pub

16 Mar 1, 2022

A declarative website generator designed for high-quality websites, with a focus on easy maintenance and localization.

Grow Grow is a declarative tool for rapidly building, launching, and maintaining high-quality static HTML. Easy installation Jinja template engine Con

385 Dec 3, 2022

Finds Jobs on LinkedIn using web-scraping

Find Jobs on LinkedIn 📔 This program finds jobs by scraping on LinkedIn 👨‍💻 Relies on User Input. Accepts: Country, City, State 📑 Data about jobs

44 Dec 27, 2022

dirmaker is a simple, opinionated static site generator for quickly publishing directory websites.

dirmaker is a simple, opinionated static site generator for publishing directory websites (e.g. Indic.page, env.wiki). It takes entries from a YAML file and generates a categorised, paginated directory website.

40 Nov 20, 2022

Ella is a CMS based on Python web framework Django with a main focus on high-traffic news websites and Internet magazines.

Ella CMS Ella is an open-source CMS based on the Django framework, designed for flexibility. It is composed of several modules: Ella core is the main module

295 Oct 16, 2022

Securely and anonymously share files, host websites, and chat with friends using the Tor network

OnionShare OnionShare is an open source tool that lets you securely and anonymously share files, host websites, and chat with friends using the Tor ne

5.4k Jan 2, 2023

A pure-python HTML screen-scraping library

Scrapely Scrapely is a library for extracting structured data from HTML pages. Given some example web pages and the data to be extracted, scrapely con

1.8k Dec 31, 2022

Transistor, a Python web scraping framework for intelligent use cases.

Web data collection and storage for intelligent use cases. transistor About The web is full of data. Transistor is a web scraping framework for collec

212 Nov 5, 2022

🥫 The simple, fast, and modern web scraping library

About gazpacho is a simple, fast, and modern web scraping library. The library is stable, actively maintained, and installed with zero dependencies. I

692 Dec 22, 2022

Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)

trafilatura: Web scraping tool for text discovery and retrieval Description Trafilatura is a Python package and command-line tool which seamlessly dow

704 Jan 6, 2023

Async Python 3.6+ web scraping micro-framework based on asyncio

Ruia 🕸️ Async Python 3.6+ web scraping micro-framework based on asyncio. ⚡ Write less, run faster. Overview Ruia is an async web scraping micro-frame

1.6k Jan 1, 2023

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

AutoScraper: A Smart, Automatic, Fast and Lightweight Web Scraper for Python This project is made for automatic web scraping to make scraping easy. It

4.8k Jan 4, 2023

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

Parsel Parsel is a BSD-licensed Python library to extract and remove data from HTML and XML using XPath and CSS selectors, optionally combined with re

859 Dec 29, 2022

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par

Computational Linguistics Research Group

8.4k Jan 8, 2023

Scrapy, a fast high-level web crawling & scraping framework for Python.

Scrapy Overview Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pag

45.5k Jan 7, 2023

Python binding to Modest engine (fast HTML5 parser with CSS selectors).

A fast HTML5 parser with CSS selectors using Modest engine. Installation From PyPI using pip: pip install selectolax Development version from github:

710 Jan 4, 2023

Pythonic HTML Parsing for Humans™

Requests-HTML: HTML Parsing for Humans™ This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible. When us

12.9k Jan 1, 2023

Official python API for Phish.AI public and private API to detect zero-day phishing websites

phish-ai-api Summary Official python API for Phish.AI public and private API to detect zero-day phishing websites How it Works (TLDR) Essentially we h

168 May 17, 2022

Download song lyrics and metadata from Genius.com 🎶🎤

LyricsGenius: a Python client for the Genius.com API lyricsgenius provides a simple interface to the song, artist, and lyrics data stored on Genius.co

738 Jan 4, 2023
