466 Repositories
Python pdf-scraper-with-ocr Libraries
OCR Post Correction for Endangered Language Texts
📌 Coming soon: an update to the software including features from our paper on semi-supervised OCR post-correction, to be published in the Transaction
Small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface
Small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface
Python-based tools for document analysis and OCR
ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so
Tool which allow you to detect and translate text.
Text detection and recognition This repository contains tool which allow to detect region with text and translate it one by one. Description Two pretr
Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.
hocr-tools About About the code Installation System-wide with pip System-wide from source virtualenv Available Programs hocr-check -- check the hOCR f
A simplistic scraper made to download tons of random screenshots made by people.
printStealer 1.1 What is this tool? This tool is developed to show the insecurity of the screenshot utility called prnt sc. It is a site that stores s
A multithreaded tool for searching and downloading images from popular search engines. It is straightforward to set up and run!
🕳️ CygnusX1 Code by Trong-Dat Ngo. Overviews 🕳️ CygnusX1 is a multithreaded tool 🛠️ , used to search and download images from popular search engine
Generate a preview image for a PDF.
PDF ➡️ Preview A simple tool to save me time on Illustrator. Generates a preview image for a PDF file. Useful for sneak peeks to academic publications
Scrape plants scientific name information from Agroforestry Species Switchboard 2.0.
Agroforestry Species Switchboard 2.0 Scraper Scrape plants scientific name information from Species Switchboard 2.0. Requirements python = 3.10 (you
Program that locks/unlocks pdf files🐍
🐍 📄 PDFtools 📄 🐍 Programa que bloqueia/desbloqueia arquivos pdf Requisitos • Como usar • Capturas de Tela 🚨 Aviso 🚨 Altere os caminhos referente
Classical OCR DCNN reproduction based on PaddlePaddle framework.
Paddle-SVHN Classical OCR DCNN reproduction based on PaddlePaddle framework. This project reproduces Multi-digit Number Recognition from Street View I
Converting Html files to pdf using python script, pdfkit module and wkhtmltopdf.
Html-to-pdf-pdfkit-wkhtml- This repository has code for converting local html files and online html resources into pdf. It is an python script which u
Proxy scraper. Format: IP | PORT | COUNTRY | TYPE
proxy scraper 🔎 Installation: git clone https://github.com/ebankoff/proxy_scraper Required pip libraries (pip install library name): lxml beautifulso
this is simple program, that converts pdf file to png
author: a5892731 last update:2021-11-01 version: 1.1 resources: -https://pypi.org/project/pdf2image/ -https://github.com/oschwartz10612/poppler-window
Convert markdown to HTML using the GitHub API and some additional tweaks with Python.
Convert markdown to HTML using the GitHub API and some additional tweaks with Python. Comes with full formula support and image compression.
Scrapy-based cyber security news finder
Cyber-Security-News-Scraper Scrapy-based cyber security news finder Goal To keep up to date on the constant barrage of information within the field of
An Automated udemy coupons scraper which scrapes coupons and autopost the result in blogspot post
Autoscraper-n-blogger An Automated udemy coupons scraper which scrapes coupons and autopost the result in blogspot post and notifies via Telegram bot
This is a implementation of CRAFT OCR method
This is a implementation of CRAFT OCR method
a simple ehentai downloader with jpg 2 pdf
Simple_Ehentai_DownLoader a simple ehentai downloader with jpg 2 pdf 中文介绍 Environment python3.8 How to use before you start,there are some tips. the q
Web scrapping
Project Setup Table of Contents Project Setup Table of Contents Run project locally Install Requirements Run script Run project locally Install Requir
This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.
pdf-scraper-with-ocr With this tool I am aiming to facilitate the work of those who need to scrape PDFs either by hand or using tools that doesn't imp
Dirty, ugly, and hopefully useful OCR of Facebook Papers docs released by Gizmodo
Quick and Dirty OCR of Facebook Papers Gizmodo has been working through the Facebook Papers and releasing the docs that they process and review. As lu
New World Market Scraper
Bean Seller A New Worlds market scraper. Deployment This must be installed on Windows as it uses the Windows api to do its stuff Install Prerequisites
Generate custom detailed survey paper with topic clustered sections and proper citations, from just a single query in just under 30 mins !!
Auto-Research A no-code utility to generate a detailed well-cited survey with topic clustered sections (draft paper format) and other interesting arti
A Vietnamese personal card OCR website built with Django.
Django VietCardOCR Installation Creation of virtual environments is done by executing the command venv: python -m venv venv That will create a new fol
Table automatically extraction from PDF Document
PDF Table Extractor Table automatically extraction from PDF Document Our Icon 📌 Name : PDF Table Extractor 📌 Authors : Minku Koo Jiyong Park 📌 Deve
Convolutional Recurrent Neural Network (CRNN) for image-based sequence recognition.
Convolutional Recurrent Neural Network This software implements the Convolutional Recurrent Neural Network (CRNN), a combination of CNN, RNN and CTC l
IGLS - Instagram Like Scraper CLI tool
IGLS - Instagram Like Scraper It's a web scraping command line tool based on python and selenium. Description This is a trial tool for learning purpos
Key information extraction from invoice document with Graph Convolution Network
Key Information Extraction from Scanned Invoices Key information extraction from invoice document with Graph Convolution Network Related blog post fro
Convert lecture videos to slides in one line. Takes an input of a directory containing your lecture videos and outputs a directory containing .PDF files containing the slides of each lecture.
Convert lecture videos to slides in one line. Takes an input of a directory containing your lecture videos and outputs a directory containing .PDF files containing the slides of each lecture.
Github scraper app is used to scrape data for a specific user profile created using streamlit and BeautifulSoup python packages
Github Scraper Github scraper app is used to scrape data for a specific user profile. Github scraper app gets a github profile name and check whether
Machine Leaning applied to denoise images to improve OCR Accuracy
Machine Learning to Denoise Images for Better OCR Accuracy This project is an adaptation of this tutorial and used only for learning purposes: https:/
Telegram tools
Telegram-Tools Telegram tools. Explanation English | 中文 Features Export group memebrs Add users to the group Send message to users Setup API Open http
Example of scraping a paginated API endpoint and dumping the data into a DB
Provider API Scraper Example Example of scraping a paginated API endpoint and dumping the data into a DB. Pre-requisits Python = 3.9 Pipenv Setup # i
A bot that extract text from images using the Tesseract OCR.
Text from image (OCR) @ocr_text_bot A simple bot to extract text from images. Usage What do I need? A AWS key configured locally, see here. NodeJS. I
Use Youdao OCR API to covert your clipboard image to text.
Alfred Clipboard OCR 注:本仓库基于 oott123/alfred-clipboard-ocr 的逻辑用 Python 重写,换用了有道 AI 的 API,准确率更高,有效防止百度导致隐私泄露等问题,并且有道 AI 初始提供的 50 元体验金对于其资费而言个人用户基本可以永久使用
Simply scrape / download all the media from an fansly account.
Simply scrape / download all the media from an fansly account. Providing updates as long as its continuously gaining popularity, so hit the ⭐ button!
An helper library to scrape data from TikTok in one line, using the Influencer Hunters APIs.
TikTok Scraper An utility library to scrape data from TikTok hassle-free Go to the website » View Demo · Report Bug · Request Feature About The Projec
Dailyiptvlist.com Scraper With Python
Dailyiptvlist.com scraper Info Made in python Linux only script Script requires to have wget installed Running script Clone repository with: git clone
A simple pdf size compressing telegram robot witten in python.
Pdf Compressor Telegram Bot ##About : A simple pdf size compressing telegram robot witten in python. Mostly useful for digital documentation. Deploy t
Usando o Amazon Textract como OCR para Extração de Dados no DynamoDB
dio-live-textract2 Repositório de código para o live coding do dia 05/10/2021 sobre extração de dados estruturados e gravação em banco de dados a part
Easily report Instagram pages and close the page
Program Features - 📌 Delete target post on Instagram. - 📌 Delete Media Target post on Instagram - 📌 Complete deletion of the target account on Inst
Arxiv2Kindle is a simple script written in python that converts LaTeX source downloaded from Arxiv and recompiles it to better fit a Kindle or other similar reading devices.
Arxiv2Kindle is a simple script written in python that converts LaTeX source downloaded from Arxiv and recompiles it to better fit a read
⬇️ Telegram Bot to download TikTok videos without watermark in a snap with Inline mode support.
⬇️ Tokmate - Telegram Bot to download TikTok videos ⛲ Features Superfast and supports all type of TikTok links Download any TikTok videos without mate
Convert Lecture Videos to PDF
Convert Lecture Videos to PDF Description Want to go through lecture videos faster without missing any information? Wish you can read the lecture vide
A Happy and lightweight Python Package that Provides an API to search for articles on Google News and returns a JSON response.
A Happy and lightweight Python Package that Provides an API to search for articles on Google News and returns a JSON response.
Ground truth data for the Optical Character Recognition of Historical Classical Commentaries.
OCR Ground Truth for Historical Commentaries The dataset OCR ground truth for historical commentaries (GT4HistComment) was created from the public dom
An automated, headless YouTube Watcher and Scraper
Searches YouTube, queries recommended videos and watches them. All fully automated and anonymised through the Tor network. The project consists of two independently usable components, the YouTube automation written in Python and the dockerized Tor Browser.
A Python package that scrapes Google News article data while remaining undetected by Google.
A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper)
Console application for downloading images from Reddit in Python
RedditImageScraper Console application for downloading images from Reddit in Python Introduction This short Python script was created for the mass-dow
The open-source web scrapers that feed the Los Angeles Times California coronavirus tracker.
The open-source web scrapers that feed the Los Angeles Times' California coronavirus tracker. Processed data ready for analysis is available at datade
A tool for scraping and organizing data from NewsBank API searches
nbscraper Overview This simple tool automates the process of copying, pasting, and organizing data from NewsBank API searches. Curerntly, nbscrape onl
A tool to easily scrape youtube data using the Google API
YouTube data scraper To easily scrape any data from the youtube homepage, a youtube channel/user, search results, playlists, and a single video itself
A social networking service scraper in Python
snscrape snscrape is a scraper for social networking services (SNS). It scrapes things like user profiles, hashtags, or searches and returns the disco
Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool written in Python.
Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool written in Python.
a micro OCR network with 0.07mb params.
MicroOCR a micro OCR network with 0.07mb params. Layer (type) Output Shape Param # Conv2d-1 [-1, 64, 8,
Programa que viabiliza a OCR (Optical Character Reading - leitura óptica de caracteres) de um PDF.
Este programa tem o intuito de ser um modificador de arquivos PDF. Os arquivos PDFs podem ser 3: PDFs verdadeiros - em que podem ser selecionados o ti
Nekopoi scraper using python3
Features Scrap from url Todo [+] Search by genre [+] Search by query [+] Scrap from homepage Example # Hentai Scraper from nekopoi import Hent
Simple python tool created for downloading PDF.
PDFdownloader Usage Open PDF in full-screen mode Run scan.exe Enter how many pages you want to scan Focus PDF After scanning is done, run merge.exe En
It is a image ocr tool using the Tesseract-OCR engine with the pytesseract package and has a GUI.
OCR-Tool It is a image ocr tool made in Python using the Tesseract-OCR engine with the pytesseract package and has a GUI. This is my second ever pytho
simple http & https proxy scraper and checker
simple http & https proxy scraper and checker
An utility library to scrape data from TikTok, Instagram, Twitch, Youtube, Twitter or Reddit in one line!
Social Media Scraper An utility library to scrape data from TikTok, Instagram, Twitch, Youtube, Twitter or Reddit in one line! Go to the website » Vie
This book will take you on an exploratory journey through the PDF format, and the borb Python library.
This book will take you on an exploratory journey through the PDF format, and the borb Python library.
Docbarcodes extracts 1D and 2D barcodes from scanned PDF documents or images. It can be used to automate extraction and processing of all kind of documents.
Intro Barcodes are being used in many documents or forms to enable machine reading capabilities and reduce manual processing effort. Simple 1D barcode
This app will let you continuously scrape certain parts of LeasePlan and extract data of cars becoming available for lease.
LeasePlan - Scraper This app will let you continuously scrape certain parts of LeasePlan and extract data of cars becoming available for lease. It has
METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL)
Nautilus-OCR The National Library of Luxembourg (BnL) started its first initiative in digitizing newspapers, with layout recognition and OCR on articl
Fetch Flipkart product details including name, price, MRP and Stock details in general as well as specific to a pincode
Fetch Flipkart product details including name, price, MRP and Stock details in general as well as specific to a pincode
Kusonime scraper using python3
Features Scrap from url Scrap from recommendation Search by query Todo [+] Search by genre Example # Get download url from kusonime import Scrap
Twitter Scraper
Twitter's API is annoying to work with, and has lots of limitations — luckily their frontend (JavaScript) has it's own API, which I reverse–engineered. No API rate limits. No restrictions. Extremely fast.
Website desenvolvido em Django para gerenciamento e upload de arquivos (.pdf).
Website para Gerenciamento de Arquivos Features Esta é uma aplicação full stack web construída para desenvolver habilidades com o framework Django. O
distfit - Probability density fitting
Python package for probability density function fitting of univariate distributions of non-censored data
An application which enables the users to perform simple yet intriguing PDF operations
AstutePDF A repository containing the GUI for an application which enables the users to perform simple yet intriguing PDF operations. These include, M
🔍 Messages Searcher is make for search custom message in all channels in guild and dm.
🔍 Messages Searcher is make for search custom message in all channels in guild and dm.
I can help you convert your images to pdf file.
IMAGE TO PDF CONVERTER BOT Configs TOKEN - Get bot token from @BotFather API_ID - From my.telegram.org API_HASH - From my.telegram.org Deploy to Herok
Create Fast and easy image datasets using reddit
Reddit-Image-Scraper Reddit Reddit is an American Social news aggregation, web content rating, and discussion website. Reddit has been devided by topi
Parser manager for parsing DOC, DOCX, PDF or HTML files
Parser manager Description Parser gets PDF, DOC, DOCX or HTML file via API and saves parsed data to the database. Implemented in Ruby 3.0.1 using Acti
docTR by Mindee (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
docTR by Mindee (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
Convert emails without attachments to pdf and send as email
Email to PDF to email This script will check an imap folder for unread emails. Any unread email that does not have an attachment will be converted to
A simple python web scraper.
Dissec A simple python web scraper. It gets a website and its contents and parses them with the help of bs4. Installation To install the requirements,
Gera um PDF, logo depois de você responder um questionário simples, e envia para o e-mail que você informar.
PDF generator and send it for your email Criador: Francisco Robson de O. Dutra Filho Repositório criado no dia 18/09/2021 Instagram: @robsondutra_ Sob
x-ray is a Python library for finding bad redactions in PDF documents.
A tool to detect whether a PDF has a bad redaction
This is a simple python script to collect sub-domains from hackertarget API
Domain-Scraper 🌐 This is a simple python script to collect sub-domains from hackertarget API Note : This is tool is limited to 20 Queries / day with
A Screen Translator/OCR Translator made by using Python and Tesseract, the user interface are made using Tkinter. All code written in python.
About An OCR translator tool. Made by me by utilizing Tesseract, compiled to .exe using pyinstaller. I made this program to learn more about python. I
Python lib for Simple PDF text extraction
Python lib for Simple PDF text extraction
A simple proxy scraper that utilizes the requests module in python.
Proxy Scraper A simple proxy scraper that utilizes the requests module in python. Usage Depending on your python installation your commands may vary.
WeasyPrint is a smart solution helping web developers to create PDF documents.
WeasyPrint is a smart solution helping web developers to create PDF documents. It turns simple HTML pages into gorgeous statistical reports, invoices, tickets…
Generate text line images for training deep learning OCR model (e.g. CRNN)
Generate text line images for training deep learning OCR model (e.g. CRNN)
borb is a library for reading, creating and manipulating PDF files in python.
borb is a library for reading, creating and manipulating PDF files in python.
🤖 Threaded Scraper to get discord servers from disboard.org written in python3
Disboard-Scraper Threaded Scraper to get discord servers from disboard.org written in python3. Setup. One thread / tag If you whant to look for multip
Fetch McDonald invoices from mailbox and merge them to one PDF file.
concatenate Fetch McDonald invoices from mailbox and merge them to one PDF file. Description This script will fetch all McDonald invoice pdfs from a p
Python script that split PDF files.
Automatic PDF Splitter This script can create new single-page PDFs files from multipaged PDFs. Requirements Python 3.0+ # Debian distros sudo apt-get
Merge multiple PDF files into one.
PDF Merger Merge multiple PDF files into one. Usage % python pdf_merger.py -h usage: pdf_merger.py [-h] [-o OUTPUT] [-f [FILES ...]] optional argumen
Images to PDF Telegram Bot
ilovepdf Convert Images to PDF Bot This bot will helps you to create pdf's from your images [without leaving telegram] 😉 By Default: your pdf fil
Generate a bunch of malicious pdf files with phone-home functionality. Can be used with Burp Collaborator
Malicious PDF Generator ☠️ Generate ten different malicious pdf files with phone-home functionality. Can be used with Burp Collaborator. Used for pene
Scrape all the media from an OnlyFans account - Updated regularly
Scrape all the media from an OnlyFans account - Updated regularly
mescrappy - Python + Selenium Youtube scraper
mescrappy - Python + Selenium Youtube scraper Youtube Sraping With Python (Selenium) Table of Contents About The Project Built With Getting Started In
一个多语言支持、易使用的 OCR 项目。An easy-to-use OCR project with multilingual support.
AgentOCR 简介 AgentOCR 是一个基于 PaddleOCR 和 ONNXRuntime 项目开发的一个使用简单、调用方便的 OCR 项目 本项目目前包含 Python Package 【AgentOCR】 和 OCR 标注软件 【AgentOCRLabeling】 使用指南 Pytho
Youtube videos and channels scraper python wrapper!
YouTubeCrawle Wrapper for python Why This wrapper? This is wrapper is not limited to videos only it can scrape both channel and videos seperately ;D
pix2tex: Using a ViT to convert images of equations into LaTeX code.
The goal of this project is to create a learning based system that takes an image of a math formula and returns corresponding LaTeX code.
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.