191 Repositories
Python scraping-websites Libraries
A quick username checker to see if a username is available on a list of assorted websites.
A quick username checker to see if a username is available on a list of assorted websites.
Consulta de CPF e CNPJ na Receita Federal com Web-Scraping
Repositório contendo scripts Python que realizam a consulta de CPF e CNPJ diretamente no site da Receita Federal.
A Python module to bypass Cloudflare's anti-bot page.
cloudflare-scrape A simple Python module to bypass Cloudflare's anti-bot page (also known as "I'm Under Attack Mode", or IUAM), implemented with Reque
Scraping Bot for the Covid19 vaccination website of the Canton of Zurich, Switzerland.
Hi 👋 , I'm David A passionate developer from France. 🌱 I’m currently learning Kotlin, ReactJS and Kubernetes 👨💻 All of my projects are available
Here I provide the source code for doing web scraping using the python library, it is Selenium.
Here I provide the source code for doing web scraping using the python library, it is Selenium.
CiteURL is an extensible tool that parses legal citations and makes links to websites where you can read the cited language for free.
CiteURL is an extensible tool that parses legal citations and makes links to websites where you can read the cited language for free. It can be used t
Implementation of the bachelor's thesis "Real-time stock predictions with deep learning and news scraping".
Real-time stock predictions with deep learning and news scraping This repository contains a partial implementation of my bachelor's thesis "Real-time
Scraping comments from the political section of popular Nigerian blog (Nairaland), and saving in a CSV file.
Scraping_Nairaland This project scraped comments from the political section of popular Nigerian blog www.nairaland.com using the Python BeautifulSoup
4CAT: Capture and Analysis Toolkit
4CAT: Capture and Analysis Toolkit 4CAT is a research tool that can be used to analyse and process data from online social platforms. Its goal is to m
Scraping weather data using Python to receive umbrella reminders
A Python package which scrapes weather data from google and sends umbrella reminders to specified email at specified time daily.
Web Scraping Practica With Python
Web-Scraping-Practica Integrants: Guillem Vidal Pallarols. Lídia Bandrés Solé Fitxers: Aquest document és el primer que trobem. A continuació trobem u
Facebook Group Scraping Using Beautiful Soup & Selenium
Extract Facebook group posts that are related to a specific topic and write them to a .json file.
Sqli-Scanner is a python3 script written to scan websites for SQL injection vulnerabilities
Sqli-Scanner is a python3 script written to scan websites for SQL injection vulnerabilities Features 1 Scan one website 2 Scan multiple websites Insta
A project that automatically sends you a Medium article on a topic of your choosing to your email address daily.
Daily Article from Medium ✏️ About A project that automatically sends you a Medium article on a topic of your choosing to your email address daily. No
PyJPBoatRace: Python-based Japanese boatrace tools 🚤
pyjpboatrace :speedboat: provides you with useful tools for data analysis and auto-betting for boatrace.
Scraping web pages to get data
Scraping Data Get public data and save in database This is project use Python How to run a project 1 - Clone the repository 2 - Install beautifulsoup4
Web-Scraping using Selenium Master
Web-Scraping using Selenium What is the need of Selenium? Some websites don't like to be scrapped and in that case you need to disguise your webscrapi
A one place destination to check whatever is trending on the top social and news websites at present.
UpTrend A one place destination to check whatever is trending on the top social and news websites at present. Explore the docs » View Demo · Report Bu
Example of scraping a paginated API endpoint and dumping the data into a DB
Provider API Scraper Example Example of scraping a paginated API endpoint and dumping the data into a DB. Pre-requisits Python = 3.9 Pipenv Setup # i
PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different languages
PyMultiDictionary PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Username reconnaisance tool that checks the availability of a specified username on over 200 websites.
Username reconnaisance tool that checks the availability of a specified username on over 200 websites. Installation & Usage Clone from Github: $ git c
Subdomain enumeration,Web scraping and finding usernames automation script written in python
Subdomain enumeration,Web scraping and finding usernames automation script written in python
Better GitHub statistics images for your profile, with stats from private and public repos
Better GitHub statistics images for your profile, with stats from private and public repos
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva/ Datadadome / CloudFlare IUAM)
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva/ Datadadome / CloudFlare IUAM)
Scraping news from Ucsal portal with Scrapy.
NewsScraping Esse é um projeto de raspagem das últimas noticias, de 2021, do portal da universidade Ucsal http://noosfero.ucsal.br/institucional Tecno
A Python package that scrapes Google News article data while remaining undetected by Google.
A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper)
Using Selenium with Python to Web Scrap Popular Youtube Tech Channels.
Web Scrapping Popular Youtube Tech Channels with Selenium Data Mining, Data Wrangling, and Exploratory Data Analysis About the Data Web scrapi
The core packages of security analyzer web crawler
Security Analyzer 🐍 A large scale web crawler (considered also as vulnerability scanner tool) to take an overview about security of Moroccan sites Cu
The open-source web scrapers that feed the Los Angeles Times California coronavirus tracker.
The open-source web scrapers that feed the Los Angeles Times' California coronavirus tracker. Processed data ready for analysis is available at datade
Current Antarctic large iceberg positions derived from ASCAT and OSCAT-2
Iceberg Locations Antarctic large iceberg positions derived from ASCAT and OSCAT-2. All data collected here are from the NASA SCP website Overview Thi
A tool for scraping and organizing data from NewsBank API searches
nbscraper Overview This simple tool automates the process of copying, pasting, and organizing data from NewsBank API searches. Curerntly, nbscrape onl
A web scraping pipeline project that retrieves TV and movie data from two sources, then transforms and stores data in a MySQL database.
New to Streaming Scraper An in-progress web scraping project built with Python, R, and SQL. The scraped data are movie and TV show information. The go
Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool written in Python.
Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool written in Python.
PyQuery-based scraping micro-framework.
demiurge PyQuery-based scraping micro-framework. Supports Python 2.x and 3.x. Documentation: http://demiurge.readthedocs.org Installing demiurge $ pip
NASA APOD Discord Bot - Fetches information from NASA APOD site.
NASA APOD Discord Bot - Fetches information from NASA APOD site.
Python SCript to scrape members from a selected Telegram group.
A python script to scrape all the members in a telegram group anad save in a CSV file. REGESTRING Go to this link https://core.telegram.org/api/obtain
A script that will warn you, by opening a new browser tab, when there are new content in your favourite websites.
web check A script that will warn you, by opening a new browser tab, when there are new content in your favourite websites. What it does The script wi
A Web Scraping Program.
Web Scraping AUTHOR: Saurabh G. MTech Information Security, IIT Jammu. If you find this repository useful. I would appreciate if you Star it and Fork
Ross Virtual Assistant is a programme which can play Music, search Wikipedia, open Websites and much more.
Ross-Virtual-Assistant Ross Virtual Assistant is a programme which can play Music, search Wikipedia, open Websites and much more. Installation Downloa
Django-registration (redux) provides user registration functionality for Django websites.
Description: Django-registration provides user registration functionality for Django websites. maintainers: Macropin, DiCato, and joshblum contributor
Automated network configuration backups using Github actions and git-scraping
Network Config Scraper This repository demonstrates the use of Github Actions and git-scraping to build an automated backup solution for network confi
Docker containerized Python Flask API that uses selenium to scrape and interact with websites
Docker containerized Python Flask API that uses selenium to scrape and interact with websites
Django API that scrapes and provides the last news of the city of Carlos Casares by semantic way (RDF format).
"Casares News" API Api that scrapes and provides the last news of the city of Carlos Casares by semantic way (RDF format). Usage Consume the articles
Amazon Scraper: A command-line tool for scraping Amazon product data
Amazon Product Scraper: 2021 Description A command-line tool for scraping Amazon product data to CSV or JSON format(s). Requirements Python 3 pip3 Ins
A Web Scraper built with beautiful soup, that fetches udemy course information. Get udemy course information and convert it to json, csv or xml file
Udemy Scraper A Web Scraper built with beautiful soup, that fetches udemy course information. Installation Virtual Environment Firstly, it is recommen
Scraping Thailand COVID-19 data from the DDC's tableau dashboard
Scraping COVID-19 data from DDC Dashboard Scraping Thailand COVID-19 data from the DDC's tableau dashboard. Data is updated at 07:30 and 08:00 daily.
Scraping and analysis of leetcode-compensations page.
Leetcode compensations report Scraping and analysis of leetcode-compensations page.
A Python library to utilize AWS API Gateway's large IP pool as a proxy to generate pseudo-infinite IPs for web scraping and brute forcing.
A Python library to utilize AWS API Gateway's large IP pool as a proxy to generate pseudo-infinite IPs for web scraping and brute forcing.
Command line program to download documents from web portals.
command line document download made easy Highlights list available documents in json format or download them filter documents using string matching re
Download images from forum threads
Forum Image Scraper Downloads images from forum threads Only works with forums which doesn't require a login to view and have an incremental paginatio
mlscraper: Scrape data from HTML pages automatically with Machine Learning
🤖 Scrape data from HTML websites automatically with Machine Learning
crypto currency scraping
SCRYPTO What ? Crypto currencies scraping (At the moment, only bitcoin and ethereum crypto currencies are supported) How ? A python script is running
Autotype on websites that have copy-paste disabled like Moodle, HackerEarth contest etc.
Autotype A quick and small python script that helps you autotype on websites that have copy paste disabled like Moodle, HackerEarth contests etc as it
This is a python based web scraping bot for windows to download all ACCEPTED submissions of any user on Codeforces
CODEFORCES DOWNLOADER This is a python based web scraping bot for windows to download all ACCEPTED submissions of any user on Codeforces Requirements
Create Multiple CF entry for multiple websites
AWS-CloudFront Problem: Deploy multiple CloudFront for account with multiple domains. Functionality: Running this script in loop and deploy CloudFront
Minimal set of tools to conduct stealthy scraping.
Stealthy Scraping Tools Do not use puppeteer and playwright for scraping. Explanation. We only use the CDP to obtain the page source and to get the ab
This little tool is to calculate a MurmurHash value of a favicon to hunt phishing websites on the Shodan platform.
MurMurHash This little tool is to calculate a MurmurHash value of a favicon to hunt phishing websites on the Shodan platform. What is MurMurHash? Murm
Campsite Reservation Finder
yellowstone-camping UPDATE: yellowstone-camping is being expanded and renamed to camply. The updated tool now interfaces with the Recreation.gov API a
Campsite Reservation Cancellation Finder (Yellowstone National Park)
yellowstone-camping yellowstone-camping is a Campsite Reservation Cancellation Finder for Yellowstone National Park. This simple Python application wi
A repository with scraping code and soccer dataset from understat.com.
UNDERSTAT - SHOTS DATASET As many people interested in soccer analytics know, Understat is an amazing source of information. They provide Expected Goa
FireDM is a python open source (Internet Download Manager) with multi-connections, high speed engine, it downloads general files and videos from youtube and tons of other streaming websites .
python open source (Internet Download Manager) with multi-connections, high speed engine, based on python, LibCurl, and youtube_dl https://github.com/firedm/FireDM
Bulk Downloader for Reddit
saveddit is a bulk media downloader for reddit pip3 install saveddit Setting up authorization Register an application with Reddit Write down your clie
Polyglot Machine Learning example for scraping similar news articles.
Polyglot Machine Learning example for scraping similar news articles In this example, we will see how we can work with Machine Learning applications w
ZeroNet - Decentralized websites using Bitcoin crypto and BitTorrent network
ZeroNet Decentralized websites using Bitcoin crypto and the BitTorrent network - https://zeronet.io / onion Why? We believe in open, free, and uncenso
A tool for exporting Telegram group chats into static websites, preserving chat history like mailing list archives.
tg-archive is a tool for exporting Telegram group chats into static websites, preserving chat history like mailing list archives. Preview The @fossuni
Powerful Telegram Members Scraping and Adding Toolkit
🔥 Genisys V2.1 Powerful Telegram Members Scraping and Adding Toolkit 🔻 Features 🔺 ADDS IN BULK[by user id, not by username] Scrapes and adds to pub
A declarative website generator designed for high-quality websites, with a focus on easy maintenance and localization.
Grow Grow is a declarative tool for rapidly building, launching, and maintaining high-quality static HTML. Easy installation Jinja template engine Con
Finds Jobs on LinkedIn using web-scraping
Find Jobs on LinkedIn 📔 This program finds jobs by scraping on LinkedIn 👨💻 Relies on User Input. Accepts: Country, City, State 📑 Data about jobs
dirmaker is a simple, opinionated static site generator for quickly publishing directory websites.
dirmaker is a simple, opinionated static site generator for publishing directory websites (eg: Indic.page, env.wiki It takes entries from a YAML file and generates a categorised, paginated directory website.
Ella is a CMS based on Python web framework Django with a main focus on high-traffic news websites and Internet magazines.
Ella CMS Ella is opensource CMS based on Django framework, designed for flexibility. It is composed from several modules: Ella core is the main module
Securely and anonymously share files, host websites, and chat with friends using the Tor network
OnionShare OnionShare is an open source tool that lets you securely and anonymously share files, host websites, and chat with friends using the Tor ne
A pure-python HTML screen-scraping library
Scrapely Scrapely is a library for extracting structured data from HTML pages. Given some example web pages and the data to be extracted, scrapely con
Transistor, a Python web scraping framework for intelligent use cases.
Web data collection and storage for intelligent use cases. transistor About The web is full of data. Transistor is a web scraping framework for collec
🥫 The simple, fast, and modern web scraping library
About gazpacho is a simple, fast, and modern web scraping library. The library is stable, actively maintained, and installed with zero dependencies. I
Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)
trafilatura: Web scraping tool for text discovery and retrieval Description Trafilatura is a Python package and command-line tool which seamlessly dow
Async Python 3.6+ web scraping micro-framework based on asyncio
Ruia 🕸️ Async Python 3.6+ web scraping micro-framework based on asyncio. ⚡ Write less, run faster. Overview Ruia is an async web scraping micro-frame
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
AutoScraper: A Smart, Automatic, Fast and Lightweight Web Scraper for Python This project is made for automatic web scraping to make scraping easy. It
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
Parsel Parsel is a BSD-licensed Python library to extract and remove data from HTML and XML using XPath and CSS selectors, optionally combined with re
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par
Scrapy, a fast high-level web crawling & scraping framework for Python.
Scrapy Overview Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pag
Python binding to Modest engine (fast HTML5 parser with CSS selectors).
A fast HTML5 parser with CSS selectors using Modest engine. Installation From PyPI using pip: pip install selectolax Development version from github:
Pythonic HTML Parsing for Humans™
Requests-HTML: HTML Parsing for Humans™ This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible. When us
Official python API for Phish.AI public and private API to detect zero-day phishing websites
phish-ai-api Summary Official python API for Phish.AI public and private API to detect zero-day phishing websites How it Works (TLDR) Essentially we h
Download song lyrics and metadata from Genius.com 🎶🎤
LyricsGenius: a Python client for the Genius.com API lyricsgenius provides a simple interface to the song, artist, and lyrics data stored on Genius.co
Pythonic HTML Parsing for Humans™
Requests-HTML: HTML Parsing for Humans™ This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible. When us
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par
Visual scraping for Scrapy
Portia Portia is a tool that allows you to visually scrape websites without any programming knowledge required. With Portia you can annotate a web pag
A Python library for automating interaction with websites.
Home page https://mechanicalsoup.readthedocs.io/ Overview A Python library for automating interaction with websites. MechanicalSoup automatically stor
Web Scraping Framework
Grab Framework Documentation Installation $ pip install -U grab See details about installing Grab on different platforms here http://docs.grablib.