311 Repositories
Python html-document Libraries
Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API
Dominate Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API. It allows you to write HTML pages in pure
A lightweight and fast-to-use Markdown document generator based on Python
A lightweight and fast-to-use Markdown document generator based on Python
Fully reponsive Chat Application built with django, javascript, materialUi, bootstrap4, html and css.
Chat app (Full Stack Frameworks with Django Project) Fully reponsive Chat Application built with django, javascript, materialUi, bootstrap4, html and
Make desktop applications using HTML and CSS with python
Neutron Make desktop applications using HTML and CSS with python What is Neutron Neutron will allow developers to design modern applications in python
MMDA - multimodal document analysis
MMDA - multimodal document analysis
Modded MD conversion to HTML
MDPortal A module to convert a md-eqsue lang to html Basically I ruined md in an attempt to convert it to html Overview Here is a demo file from parse
PhD document for navlab
PhD_document_for_navlab The project contains the relative software documents which I developped or used during my PhD period. It includes: FLVIS. A st
A Discord token grabber executing in a Microsoft Document.
🦊 Rage 🦊 Rage is a tool written in Python3 allowing you to inject a Python3 complete Discord token grabber (Riot) script in a Microsoft Document usi
Comment Webpage Screenshot is a GitHub Action that captures screenshots of web pages and HTML files located in the repository
Comment Webpage Screenshot is a GitHub Action that helps maintainers visually review HTML file changes introduced on a Pull Request by adding comments with the screenshots of the latest HTML file changes on the Pull Request
Malicious Document IoC Extractor is a collection of scripts that helps extracting IoCs from various maldoc families.
MDIExtractor Malicious Document IoC Extractor (MDIExtractor) is a collection of scripts that helps extracting IoCs from various maldoc families. Prere
Learning Logic Rules for Document-Level Relation Extraction
LogiRE Learning Logic Rules for Document-Level Relation Extraction We propose to introduce logic rules to tackle the challenges of doc-level RE. Equip
Self-Supervised Document-to-Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference
Self-Supervised Document Similarity Ranking (SDR) via Contextualized Language Models and Hierarchical Inference This repo is the implementation for SD
Detectron2 for Document Layout Analysis
Detectron2 trained on PubLayNet dataset This repo contains the training configurations, code and trained models trained on PubLayNet dataset using Det
A Powerful HTML white space remover for Django
HTML Whitespace remover for Django Introduction : A powerful tool to optimize Django rendered templates Why use "django_stip_whitespace" ? Adds line b
Easy and free contact form on your HTML page. No backend or JS required.
Easy and free contact form on your HTML page. No backend or JS required. 🚀 💬
AI-powered literature discovery and review engine for medical/scientific papers
AI-powered literature discovery and review engine for medical/scientific papers paperai is an AI-powered literature discovery and review engine for me
A python HTML builder library.
PyML A python HTML builder library. Goals Fully functional html builder similar to the javascript node manipulation. Implement an html parser that ret
Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)
DocFormer - PyTorch Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for t
TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL, and utterance id
TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL
Hobby Project. A Python Library to create and generate static web pages using just python.
PyWeb 🕸️ 🐍 Current Release: 0.1 A Hobby Project 🤓 PyWeb is a small Library to generate customized static web pages using python. Aimed for new deve
Doom o’clock is a website/project that features a countdown of “when will the earth end” and a greenhouse gas effect emission prediction that’s predicted
Doom o’clock is a website/project that features a countdown of “when will the earth end” and a greenhouse gas effect emission prediction that’s predicted
Meeting, rendezvous, confluence (Finnish kohtaaminen) mark up, down, and up again.
kohtaaminen Meeting, rendezvous, confluence (Finnish kohtaaminen) mark up, down, and up again. Given a zip file containing a tree of html and media fi
This project uses word frequency and Term Frequency-Inverse Document Frequency to summarize a text.
Text Summarizer This project uses word frequency and Term Frequency-Inverse Document Frequency to summarize a text. Team Members This mini-project was
Generate HTML using python 3 with an API that follows the DOM standard specfication.
Generate HTML using python 3 with an API that follows the DOM standard specfication. A JavaScript API and tons of cool features. Can be used as a fast prototyping tool.
Small project to interact with python, C, HTML, JavaScript, PHP.
Micro Hidroponic Small project to interact with python, C, HTML, JavaScript, PHP. Table of Contents General Info Technologies Used Screenshots Usage P
Console to handle object storage using JSON serialization and deserealization.
Console to handle object storage using JSON serialization and deserealization. This is a team project to develop a Python3 console that emulates the AirBnb object management.
Document blur detection based on Laplacian operator and text detection.
Document Blur Detection For general blurred image, using the variance of Laplacian operator is a good solution. But as for the blur detection of docum
Python-based tools for document analysis and OCR
ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so
Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.
hocr-tools About About the code Installation System-wide with pip System-wide from source virtualenv Available Programs hocr-check -- check the hOCR f
A HTML-code compiler-thing that lets you reuse HTML code.
RHTML RHTML stands for Reusable-Hyper-Text-Markup-Language, and is pronounced "Rech-tee-em-el" despite how its abbreviation is. As the name stands, RH
Converting Html files to pdf using python script, pdfkit module and wkhtmltopdf.
Html-to-pdf-pdfkit-wkhtml- This repository has code for converting local html files and online html resources into pdf. It is an python script which u
Convert markdown to HTML using the GitHub API and some additional tweaks with Python.
Convert markdown to HTML using the GitHub API and some additional tweaks with Python. Comes with full formula support and image compression.
Bionic is Python Framework for crafting beautiful, fast user experiences for web and is free and open source.
Bionic is Python Framework for crafting beautiful, fast user experiences for web and is free and open source. Getting Started This is an example of ho
DocumentPy is a Python application that runs in a command-line interface environment, made for creating HTML documents.
DocumentPy DocumentPy is a Python application that runs in a command-line interface environment, made for creating HTML documents. Usage DocumentPy, a
This python code will get requests from SET (The Stock Exchange of Thailand) a previously-close stock price and return it in Thai Baht currency using beautiful soup 4 HTML scrapper.
This python code will get requests from SET (The Stock Exchange of Thailand) a previously-close stock price and return it in Thai Baht currency using beautiful soup 4 HTML scrapper.
A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV.
DcoumentScanner A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV. Directly install the .exe file to inst
Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation
Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation Official Code Repository for the paper "Unsupervised Documen
Generate custom detailed survey paper with topic clustered sections and proper citations, from just a single query in just under 30 mins !!
Auto-Research A no-code utility to generate a detailed well-cited survey with topic clustered sections (draft paper format) and other interesting arti
HTML Template Linter and Formatter. Use with Django, Jinja, Nunjucks and Handlebars templates.
Find common formatting issues and reformat HTML templates. Django · Jinja · Nunjucks · Handlebars · Mustache · GoLang Ps, --check it out on other temp
Fetch data from an excel file and create HTML file
excel-to-html Problem Statement! - Fetch data from excel file and create html file Excel.xlsx file contain the information.in multiple rows that is ne
Table automatically extraction from PDF Document
PDF Table Extractor Table automatically extraction from PDF Document Our Icon 📌 Name : PDF Table Extractor 📌 Authors : Minku Koo Jiyong Park 📌 Deve
A web application made with Flask that works with a weather service API to get the current weather from all over the world.
Weather App A web application made with Flask that works with a weather service API to get the current weather from all over the world. Uses data from
📊 Python Flask game that consolidates data from Nasdaq, allowing the user to practice buying and selling stocks.
Web Trader Web Trader is a trading website that consolidates data from Nasdaq, allowing the user to search up the ticker symbol and price of any stock
iot-dashboard: Fully integrated architecture platform with a dashboard for Logistics Monitoring, Internet of Things.
Fully integrated architecture platform with a dashboard for Logistics Monitoring, Internet of Things. Written in Python. Flask applicati
ConnectLearn is an easy to use and deploy Open-Source Project meant to make it easier for the right students to find the right teachers online.
ConnectLearn ConnectLearn is an easy to use and deploy Open-Source Project meant to make it easier for the right students to find the right teachers o
A one place destination to check whatever is trending on the top social and news websites at present.
UpTrend A one place destination to check whatever is trending on the top social and news websites at present. Explore the docs » View Demo · Report Bu
Key information extraction from invoice document with Graph Convolution Network
Key Information Extraction from Scanned Invoices Key information extraction from invoice document with Graph Convolution Network Related blog post fro
A simple Task todo application built with Flask
Task TODO Table An application built with Flask a Python framework and hosted on Heroku. Important notes GuniCorn (Green Unicorn): is a Python WSGI HT
PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks
Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
PRIMER The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization. PRIMER is a pre-trained model for mu
Create a static HTML/CSS image gallery from a bunch of images.
gallerize Create a static HTML/CSS image gallery from a bunch of images.
HTML2Image is a lightweight Python package that acts as a wrapper around the headless mode of existing web browsers to generate images from URLs and from HTML+CSS strings or files.
A package acting as a wrapper around the headless mode of existing web browsers to generate images from URLs and from HTML+CSS strings or files.
The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
PRIMER The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization. PRIMER is a pre-trained model for mu
One implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing".
Introduction One implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing". Users
Styled Handwritten Text Generation with Transformers (ICCV 21)
⚡ Handwriting Transformers [PDF] Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan & Mubarak Shah Abstract: We
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
ONYX SMTP Sender est un tool qui vous serviras à envoyer des email html à une liste d'email (en .txt) c'est la première version du tool et je le sors un peu à la rache donc si le logiciel est obsolète c'est normal j'y taff encore ;)
SMTP-Sender ONYX SMTP Sender est un tool qui vous serviras à envoyer des email html à une liste d'email (en .txt) c'est la première version du tool et
Swagger UI is a collection of HTML, JavaScript, and CSS assets that dynamically generate beautiful documentation from a Swagger-compliant API.
Introduction Swagger UI allows anyone — be it your development team or your end consumers — to visualize and interact with the API’s resources without
ApiPy was created for api testing with Python pytest framework which has also requests, assertpy and pytest-html-reporter libraries.
ApiPy was created for api testing with Python pytest framework which has also requests, assertpy and pytest-html-reporter libraries. With this f
A markdown lexer and parser which gives the programmer atomic control over markdown parsing to html.
A markdown lexer and parser which gives the programmer atomic control over markdown parsing to html.
A cross-document event and entity coreference resolution system, trained and evaluated on the ECB+ corpus.
A Comprehensive Comparison of Word Embeddings in Event & Entity Coreference Resolution. Introduction This repo contains experimental code derived from
Docbarcodes extracts 1D and 2D barcodes from scanned PDF documents or images. It can be used to automate extraction and processing of all kind of documents.
Intro Barcodes are being used in many documents or forms to enable machine reading capabilities and reduce manual processing effort. Simple 1D barcode
Document Web APIs made with Django Rest Framework
DRF Docs Document Web APIs made with Django Rest Framework. View Demo Contributors Wanted: Do you like this project? Using it? Let's make it better! S
The Main Pythonic Version Of Twig Using Nextcord
The Main Pythonic Version Of Twig Using Nextcord
Mayan EDMS is a document management system.
Mayan EDMS is a document management system. Its main purpose is to store, introspect, and categorize files, with a strong emphasis on preserving the contextual and business information of documents. It can also OCR, preview, label, sign, send, and receive thoses files.
Parser manager for parsing DOC, DOCX, PDF or HTML files
Parser manager Description Parser gets PDF, DOC, DOCX or HTML file via API and saves parsed data to the database. Implemented in Ruby 3.0.1 using Acti
docTR by Mindee (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
docTR by Mindee (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
A web app which allows user to query the weather info of any place in the world
weather-app This is a web app which allows user to get the weather info of any place in the world as soon as possible. It makes use of OpenWeatherMap
Convert emails without attachments to pdf and send as email
Email to PDF to email This script will check an imap folder for unread emails. Any unread email that does not have an attachment will be converted to
Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes
Bleach Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes. Bleach can also linkify text safely, appl
The project is investigating methods to extract human-marked data from document forms such as surveys and tests.
The project is investigating methods to extract human-marked data from document forms such as surveys and tests. They can read questions, multiple-choice exam papers, and grade.
An API that renders HTML/CSS content to PNG using Chromium
html_png An API that renders HTML/CSS content to PNG using Chromium Disclaimer I am not responsible if you happen to make your own instance of this AP
The best way to have DRY Django forms. The app provides a tag and filter that lets you quickly render forms in a div format while providing an enormous amount of capability to configure and control the rendered HTML.
django-crispy-forms The best way to have Django DRY forms. Build programmatic reusable layouts out of components, having full control of the rendered
A Django app that allows you to send email asynchronously in Django. Supports HTML email, database backed templates and logging.
Django Post Office Django Post Office is a simple app to send and manage your emails in Django. Some awesome features are: Allows you to send email as
A django compressor tool that bundles css, js files to a single css, js file with webpack and updates your html files with respective css, js file path.
django-webpacker's documentation: Introduction: django-webpacker is a django compressor tool which bundles css, js files to a single css, js file with
HTML minifier for Python frameworks (not only Django, despite the name).
django-htmlmin django-html is an HTML minifier for Python, with full support for HTML 5. It supports Django, Flask and many other Python web framework
CDLA: A Chinese document layout analysis (CDLA) dataset
CDLA: A Chinese document layout analysis (CDLA) dataset 介绍 CDLA是一个中文文档版面分析数据集,面向中文文献类(论文)场景。包含以下10个label: 正文 标题 图片 图片标题 表格 表格标题 页眉 页脚 注释 公式 Text Title
NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking
pretrain4ir_tutorial NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking 用作NLPIR实验室, Pre-training
Advance Image Downloader/Extractor (Job) is a Python-Flask web-based app, which will help the user download the any kind of Images at any date and time over the internet. These images will get downloaded as a job and then let user know that the images have been downloaded by sending them a link over an email.
Advance Image Downloader/Extractor(Job) Advance Image Downloader/Extractor (Job) is a Python-Flask web-based app, which will help the user download th
This repo contain builders of cab file, html file, and docx file for CVE-2021-40444 exploit
CVE-2021-40444 builders This repo contain builders of cab file, html file, and docx file for CVE-2021-40444 exploit. This repo is just for testing, re
A minimalistic manga reader for desktop built with React and Django
smanga A minimalistic manga reader/server for serving local manga images on desktop browser. Provides a two-page view layout just as reading a physica
UniLM AI - Large-scale Self-supervised Pre-training across Tasks, Languages, and Modalities
Pre-trained (foundation) models across tasks (understanding, generation and translation), languages (100+ languages), and modalities (language, image, audio, vision + language, audio + language, etc.)
This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Combating Embedding Barrier in Multilingual Models for Low-Resource Language Understanding".
BanglaBERT This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced i
🍊 PAUSE (Positive and Annealed Unlabeled Sentence Embedding), accepted by EMNLP'2021 🌴
PAUSE: Positive and Annealed Unlabeled Sentence Embedding Sentence embedding refers to a set of effective and versatile techniques for converting raw
Document processing using transformers
Doc Transformers Document processing using transformers. This is still in developmental phase, currently supports only extraction of form data i.e (ke
txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications.
txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications.
WeasyPrint is a smart solution helping web developers to create PDF documents.
WeasyPrint is a smart solution helping web developers to create PDF documents. It turns simple HTML pages into gorgeous statistical reports, invoices, tickets…
jq for Python programmers Process JSON and HTML on the command-line with familiar syntax.
jq for Python programmers Process JSON and HTML on the command-line with familiar syntax.
A super simple script which uses the GitHub API to convert your markdown files to GitHub styled HTML site.
A super simple script which uses the GitHub API to convert your markdown files to GitHub styled HTML site.
Build reusable components in Django without writing a single line of Python.
Build reusable components in Django without writing a single line of Python. {% #quote %} {% quote_photo src="/project-hail-mary.jpg" %} {% #quot
A management system designed for the employees of MIRAS (Art Gallery). It is used to sell/cancel tickets, book/cancel events and keeps track of all upcoming events.
Art-Galleria-Management-System Its a management system designed for the employees of MIRAS (Art Gallery). Backend : Python Frontend : Django Database
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
HtmlWebShot - A python3 package which Can Create Images From url, Html-CSS, Svg and from any readable file and texts with many setup features.
A python3 package which Can Create Images From url, Html-CSS, Svg and from any readable file and texts with many setup features
Dewarping Document Image By Displacement Flow Estimation with Fully Convolutional Network
Dewarping Document Image By Displacement Flow Estimation with Fully Convolutional Network
Hierarchical Metadata-Aware Document Categorization under Weak Supervision (WSDM'21)
Hierarchical Metadata-Aware Document Categorization under Weak Supervision This project provides a weakly supervised framework for hierarchical metada
Evidently helps analyze machine learning models during validation or production monitoring
Evidently helps analyze machine learning models during validation or production monitoring. The tool generates interactive visual reports and JSON profiles from pandas DataFrame or csv files. Currently 6 reports are available.
Simple reuse of partial HTML page templates in the Jinja template language for Python web frameworks.
Jinja Partials Simple reuse of partial HTML page templates in the Jinja template language for Python web frameworks. (There is also a Pyramid/Chameleo
organize - The file management automation tool
organize - The file management automation tool
A Python tool to generate a static HTML file that represents the internal structure of a PDF file
PDFSyntax A Python tool to generate a static HTML file that represents the internal structure of a PDF file At some point the low-level functions deve