311 Repositories
Python html-document Libraries
[ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links
LinkBERT: A Knowledgeable Language Model Pretrained with Document Links This repo provides the model, code & data of our paper: LinkBERT: Pretraining
This is a beginner-friendly repo to make a collection of some unique and awesome projects. Everyone in the community can benefit & get inspired by the amazing projects present over here.
Awesome-Projects-Collection Quality over Quantity :) What to do? Add some unique and amazing projects as per your favourite tech stack for the communi
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:
Multi-Type-TD-TSR Source Code of our Paper: Multi-Type-TD-TSR Extracting Tables from Document Images using a Multi-stage Pipeline for
Make your first PR. A beginner friendly repository made specifically for open source beginners. Add any program under any language (it can be anything from a simple program to a complex data structure algorithm). Happy coding...
Hacktober Fest 2021 Upload Different Types of Programs in any Language Use this project to make your first contribution to an open source project on G
⚓ Eurybia monitors model drift over time and secures model deployment with data validation
🔍 Overview Eurybia is a Python library which aims to help in: Detecting data drift and model drift Valida
Source code for "A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction" @ NAACL 2022
TSAR Source code for NAACL 2022 paper: A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction. 🔥 Introduction We focus on extra
Text classification is one of the popular tasks in NLP that allows a program to classify free-text documents based on pre-defined classes.
Deep-Learning-for-Text-Document-Classification Text classification is one of the popular tasks in NLP that allows a program to classify free-text docu
Browse JSON API in a HTML interface.
Falcon API Browse This project provides a middleware for Falcon Web Framework that will render the response in an HTML form for documentation purpose.
Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval
BiDR Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval. Requirements torch==
Django-Text-to-HTML-converter - The simple Text to HTML Converter using Django framework
Django-Text-to-HTML-converter This is the simple Text to HTML Converter using Dj
Pgn2tex - Scripts to convert pgn files to latex document. Useful to build books or pdf from pgn studies
Pgn2Latex (WIP) A simple script to make pdf from pgn files and studies. It's sti
Searches a document for hash tags. Supports multiple natural languages. Works in various contexts.
ht-getter Searches a document for hash tags. Supports multiple natural languages. Works in various contexts. This package uses a non-regex approach an
An interactive document scanner built in Python using OpenCV
The scanner takes a poorly scanned image, finds the corners of the document, applies the perspective transformation to get a top-down view of the document, sharpens the image, and applies an adaptive color threshold to clean up the image.
Price Prediction model is used to develop an LSTM model to predict the future market price of Bitcoin and Ethereum.
File-based TF-IDF: Calculates keywords in a document, using a word corpus.
File-based TF-IDF Calculates keywords in a document, using a word corpus. Why? Because I found myself with hundreds of plain text files, with no way t
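The keyword calculation described in this entry can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the repository's implementation: the names `tokenize` and `tfidf_keywords`, the simple regex tokenizer, and the smoothed IDF formula are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_keywords(document, corpus, top_n=5):
    """Rank the words of `document` by TF-IDF against `corpus` (a list of texts)."""
    doc_tokens = tokenize(document)
    if not doc_tokens:
        return []
    term_counts = Counter(doc_tokens)
    corpus_sets = [set(tokenize(text)) for text in corpus]
    n_docs = len(corpus_sets)
    scores = {}
    for word, count in term_counts.items():
        # Document frequency: in how many corpus texts the word occurs.
        df = sum(1 for words in corpus_sets if word in words)
        # Smoothed IDF (an assumption) so unseen words do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = (count / len(doc_tokens)) * idf
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Words that are frequent in the document but rare in the corpus rise to the top, which is what makes them useful as keywords.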
Split given PDF document into 4 page groups and convert them to booklet format
PUTO: PDF to Booklet converter Split given PDF document into 4 page groups and convert them to booklet format. It creates a PDF like shown below: Fir
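The page reordering behind a booklet layout can be sketched without any PDF library. The ordering below is an assumption, not taken from the repository: it treats each group of 4 pages as one folded sheet with pages 4 and 1 on the front and pages 2 and 3 on the back. The function name `booklet_order` is hypothetical.

```python
GROUP = 4  # pages per folded sheet, as in the "4 page groups" above


def booklet_order(n_pages):
    """Return 1-based page numbers in print order; None marks a blank padding page."""
    pages = list(range(1, n_pages + 1))
    # Pad with blanks so the page count is a multiple of the group size.
    pages += [None] * (-len(pages) % GROUP)
    order = []
    for start in range(0, len(pages), GROUP):
        first, second, third, fourth = pages[start:start + GROUP]
        # Assumed sheet layout: front side (fourth, first), back side (second, third).
        order += [fourth, first, second, third]
    return order
```

The resulting list could then be passed to a PDF library of choice to emit the pages in that order.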
Word document generator with python
In this study, real world data is anonymized. The content is completely different, but the structure is the same. It was a script I prepared for the backend of a work using UiPath.
A Microsoft Azure Web App project named Covid 19 Predictor using Machine learning Model
A Microsoft Azure Web App project named Covid 19 Predictor using a Machine Learning model (Random Forest Classifier) that helps the user identify whether someone is showing positive Covid symptoms by inputting values such as oxygen level, breath rate, age, and vaccination status, with the help of a Kaggle database.
An app that allows you to add recipes from the dashboard, made using Django, jQuery, JScript and HTML.
An app that allows you to add recipes from the dashboard. Visitors can then filter by category, and each ingredient has its own page listing its related recipes.
DocEnTr: An end-to-end document image enhancement transformer
DocEnTR Description Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. This model is implemented on to
This repository serves as a place to document a toy attempt on how to create a generative text model in Catalan, based on GPT-2
GPT-2 Catalan playground and scripts to train a GPT-2 model either from scratch or from another pretrained model.
A deep learning framework for historical document image analysis
DIVA-DAF Description A deep learning framework for historical document image analysis. How to run Install dependencies # clone project git clone https
Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer.
DocEnTR Description Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. This model is implemented on to
Lektor-html-pretify - Lektor plugin to prettify the HTML DOM using Beautiful Soup
html-pretify Lektor plugin to prettify the HTML DOM using Beautiful Soup. How doe
CS50 pset9: Using flask API to create a web application to exchange stocks' shares.
C$50 Finance In this guide we want to implement a website via which users can “register”, “login” “buy” and “sell” stocks, like below: Background If y
WebScraper - A script that prints out a list of all EXTERNAL references in the HTML response to an HTTP/S request
Project A: WebScraper A script that prints out a list of all EXTERNAL references
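Listing external references can be sketched with the standard library's HTML parser. This is a minimal illustration, not the repository's script: the names `LinkCollector` and `external_references` are hypothetical, the HTML is assumed to have been fetched already, and "external" is assumed to mean an `href` or `src` whose host differs from the host of the requested URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes from any tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def external_references(html, base_url):
    """Return absolute http(s) URLs in `html` whose host differs from `base_url`'s."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc.lower()
    external = []
    for link in collector.links:
        # Resolve relative and protocol-relative links against the page URL.
        absolute = urljoin(base_url, link)
        parsed = urlparse(absolute)
        is_web = parsed.scheme in ("http", "https")
        if is_web and parsed.netloc.lower() != base_host and absolute not in external:
            external.append(absolute)
    return external
```

Non-web schemes such as `mailto:` are skipped, and each external URL is reported once in document order.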
Basic-html-scraper - A complete how to of web scraping with Python for beginners
basic-html-scraper Code from YT Video This video includes a complete how to of w
Bootstraparse is a personal project started with a specific goal in mind: creating static html pages for direct display from a markdown-like file
A simple document management REST based API for collaboratively interacting with documents
documan_api A simple document management REST based API for collaboratively interacting with documents.
The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.
Good news! Our new work exhibits state-of-the-art performances on DocUNet benchmark dataset: DocScanner: Robust Document Image Rectification with Prog
RedisJSON - a JSON data type for Redis
RedisJSON is a Redis module that implements ECMA-404 The JSON Data Interchange Standard as a native data type. It allows storing, updating and fetching JSON values from Redis keys (documents).
Document manipulation detection with python
image manipulation detection task: -- tianchi function image segmentation salie
A very simple document database
DockieDb A simple in-memory document database. Installation Build the Wheel Fork or clone this repository and run python setup.py bdist_wheel in the r
A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population
DeepKE is a knowledge extraction toolkit supporting low-resource and document-level scenarios for entity, relation and attribute extraction. We provide comprehensive documents, Google Colab tutorials, and online demo for beginners.
Application that converts markdown to html.
Markdown-Engine An application that converts markdown to html. Installation Using the package manager [pip] pip install -r requirements.txt Usage Run
PyTorch code for JEREX: Joint Entity-Level Relation Extractor
JEREX: "Joint Entity-Level Relation Extractor" PyTorch code for JEREX: "Joint Entity-Level Relation Extractor". For a description of the model and exp
Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document corpus.
Lbl2Vec Lbl2Vec is an algorithm for unsupervised document classification and unsupervised document retrieval. It automatically generates jointly embed
Shelf DB is a tiny document database for Python to store documents or JSON-like data
Shelf DB Introduction Shelf DB is a tiny document database for Python to store documents or JSON-like data. Get it $ pip install shelfdb shelfquery S
This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' published at ECIR'22.
Paragraph Aggregation Retrieval Model (PARM) for Dense Document-to-Document Retrieval This repository contains the code for the paper PARM: A Paragrap
This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents".
Introduction This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents". If
This project takes a special TXT file as input, divides its content into a list of HTML objects, and then creates an HTML file from them.
Import an entity definition document into SQLite3. Manage the entity. Also, create a "Create Table SQL file".
EntityDocumentMaker Version 1.00 After importing the entity definition (Excel file), store the data in sqlite3.
Python utility library for compositing PDF documents with reportlab.
pdfdoc-py Python utility library for compositing PDF documents with reportlab. Installation The pdfdoc-py package can be installed directly from the s
Getdp-project - A Django-built web app that generates a personalized banner of events to come
getdp-project https://get-my-dp.herokuapp.com/ A Django-built web app that gener
NS-Defacer: an auto HTML injector. In other words, it's an auto defacer to deface a lot of websites in less time
Overview NS-Defacer is an auto HTML injector. In other words, it's an auto defacer
Htmdf - HTML to PDF with support for variables using FastAPI.
htmdf Converts HTML to PDF with support for variables using FastAPI. Installation Clone this repository. git clone https://github.com/ShreehariVaasish
A supercharged version of paperless: scan, index and archive all your physical documents
Paperless-ng Paperless is an application by Daniel Quinn and contributors that indexes your scanned documents and allows you to easily sear
JupyterNotebook - C/C++, JavaScript, HTML, LaTeX, and shell scripts in Jupyter Notebook; also run them on a remote computer
JupyterNotebook Read, write and execute C, C++, JavaScript, shell scripts, HTML, and LaTeX in Jupyter Notebook, and also execute them on a remote computer R
WyPyPlus is a minimal wiki in 42 lines of Python code.
🍦 WyPyPlus: A personal wiki in 42 lines of code 🍦 WyPyPlus (pronounced "whippy plus") is a minimalist wiki server in 42 lines of code based on wypy
A markdown extension for converting Leiden+ epigraphic text to TEI XML/HTML
LeidenMark $ pip install leidenmark A Python Markdown extension for converting Leiden+ epigraphic text to TEI XML/HTML. Inspired by the Brill plain te
An html wrapper for python
MessySoup What is it? MessySoup is a Python wrapper for HTML elements. While still a ways away, the main goal is to be able to build a website straigh
Use minify-html, the extremely fast HTML + JS + CSS minifier, with Django.
django-minify-html Use minify-html, the extremely fast HTML + JS + CSS minifier, with Django. Requirements Python 3.8 to 3.10 supported. Django 2.2 to
An Instagram bot that can mass text users, receive and read a text, and store it somewhere with user details.
Instagram Bot 🤖 July 14, 2021 Overview 👍 A multifunctional automated Instagram bot that can mass text users, receive and read a message and store
Download clips from youtube videos with a few clicks and a GUI!
YouClip v2.0.0 Table Of Contents: What Is YouClip Installation Usage Stuff To Fix Changelog What Is YouClip? ! IMPORTANT: The source files are a total
A python-based static site generator for setting up a CV/Resume site
ezcv A python-based static site generator for setting up a CV/Resume site Table of Contents What does ezcv do? Features & Roadmap Why should I use ezc
Detection And Breaking With Python
Detection And Breaking
A fully-featured e-commerce application powered by Django
kobbyshop - Django Ecommerce App A fully featured e-commerce application powered by Django. Sections Project Description Features Technology Setup Scr
Fast HTML/XML template engine for Python
Overview Chameleon is an HTML/XML template engine for Python. It uses the page templates language. You can use it in any Python web application with j
Find thumbnails and original images from URL or HTML file.
Haul Find thumbnails and original images from URL or HTML file. Demo Hauler on Heroku Installation on Ubuntu $ sudo apt-get install build-essential py
ElasticSearch ODM (Object Document Mapper) for Python - pip install esengine
esengine - The Elasticsearch Object Document Mapper esengine is an ODM (Object Document Mapper); it maps Python classes into Elasticsearch index/doc_t
inscriptis -- HTML to text conversion library, command line client and Web service
inscriptis -- HTML to text conversion library, command line client and Web service A python based HTML to text conversion library, command line client
Zen-Knit is a formal (PDF) and informal (HTML) report generator for data analysts and data scientists who want to use Python.
About Zen-Knit: Zen-Knit is a formal (PDF) and informal (HTML) report generator for data analysts and data scientists who want to use Python. Inspired fro
Fast and robust date extraction from web pages, with Python or on the command-line
Find original and updated publication dates of any web page. From the command-line or within Python, all the steps needed from web page download to HTML parsing, scraping, and text analysis are included.
Essential Document Generator
Essential Document Generator Dead Simple Document Generation Whether it's testing database performance or a new web interface, we've all needed a dead
A document format conversion service based on Pandoc.
reformed Document format conversion service based on Pandoc. Usage The API specification for the Reformed server is as follows: GET /api/v1/formats: L
Grimoire is a Python library for creating interactive fiction as hyperlinked html.
Grimoire Grimoire is a Python library for creating interactive fiction as hyperlinked html. Installation pip install grimoire-if Usage Check out the
A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations
Python document object mapper (load python object from JSON and vice-versa)
lupin is a Python JSON object mapper lupin is meant to help in serializing python objects to JSON and unserializing JSON data to python objects. Insta
Flask app (HTML + CSS + AJAX) for adding employees and the place where each employee works: a plant or a salon
Manage your employees! With all employee information stored in one place, you no longer have to sift through hordes of spreadsheets to manually searc
Convert text with ANSI color codes to HTML or to LaTeX.
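The conversion from ANSI color codes to HTML can be sketched as follows. This is a minimal illustration and not the listed library's API: the function name `ansi_to_html`, the color names in `COLORS`, and the decision to handle only the eight basic foreground codes (30-37) plus reset (0) are assumptions made for the example.

```python
import html
import re

# Assumed CSS names for the eight basic SGR foreground colors (codes 30-37).
COLORS = {30: "black", 31: "red", 32: "green", 33: "olive",
          34: "blue", 35: "purple", 36: "teal", 37: "silver"}

# Matches SGR escape sequences such as "\x1b[31m" or "\x1b[0m".
SGR = re.compile(r"\x1b\[([0-9;]*)m")


def ansi_to_html(text):
    """Convert basic ANSI foreground colors to <span> tags, escaping the rest."""
    out = []
    open_span = False
    pos = 0
    for match in SGR.finditer(text):
        out.append(html.escape(text[pos:match.start()]))
        pos = match.end()
        # An empty parameter list ("\x1b[m") means reset, like code 0.
        codes = [int(c) for c in match.group(1).split(";") if c] or [0]
        for code in codes:
            if code == 0 or code in COLORS:
                # A reset or a new color closes the current span first.
                if open_span:
                    out.append("</span>")
                    open_span = False
            if code in COLORS:
                out.append(f'<span style="color:{COLORS[code]}">')
                open_span = True
    out.append(html.escape(text[pos:]))
    if open_span:
        out.append("</span>")
    return "".join(out)
```

Codes outside this small set (bold, background colors, 256-color sequences) are dropped from the output rather than rendered.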
An HTML interface for finetuning the sync map output from aeneas
finetuneas 3.0 finetuneas is a simple HTML interface for fine tuning sync maps output by aeneas Version 3.0 Easier adjusting time: following cells wil
Free casino website. Made just for learning / fun
Website Casino Free casino website. Made just for learning / fun. Uses Jinja2 (HTML), Flask, JavaScript, etc. Dice game Preview
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Hiring We are hiring at all levels (including FTE researchers and interns)! If you are interested in working with us on NLP and large-scale pre-traine
Longformer: The Long-Document Transformer
Longformer Longformer and LongformerEncoderDecoder (LED) are pretrained transformer models for long documents. ***** New December 1st, 2020: Longforme
An extremely configurable markdown reverser for Python3.
🔄 Unmarkd A markdown reverser. Unmarkd is a BeautifulSoup-powered Markdown reverser written in Python and for Python. Why This is created as a StackS
We'll be using HTML, CSS and JavaScript for the frontend
We'll be using HTML, CSS and JavaScript for the frontend. Nothing to install in specific. Open your text-editor and start coding a beautiful front-end.
This "I P L Team Project" is developed by Prasanta Kumar Mohanty using Python with Django web framework, HTML & CSS.
I-P-L-Team-Project This "I P L Team Project" is developed by Prasanta Kumar Mohanty using Python with Django web framework, HTML & CSS. Screenshots HO
A toolkit for document-level event extraction, containing some SOTA model implementations
❤️ A Toolkit for Document-level Event Extraction with & without Triggers Hi, there 👋 . Thanks for your stay in this repo. This project aims at buildi
[AAAI 2022] Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification
Sparse Structure Learning via Graph Neural Networks for inductive document classification Make graph dataset create co-occurrence graph for datasets.
A toolkit for document-level event extraction, containing some SOTA model implementations
Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker Source code for ACL-IJCNLP 2021 Long paper: Document-le
Generate modern Python clients from OpenAPI
openapi-python-client Generate modern Python clients from OpenAPI 3.x documents. This generator does not support OpenAPI 2.x FKA Swagger. If you need
Simple integration of Flask and WTForms, including CSRF, file upload and Recaptcha integration.
Flask-WTF Simple integration of Flask and WTForms, including CSRF, file upload, and reCAPTCHA. Links Documentation: https://flask-wtf.readthedocs.io/
🦎 A NeoVim plugin for highlighting visual selections like in a normal document editor!
🦎 HighStr.nvim A NeoVim plugin for highlighting visual selections like in a normal document editor! Demo TL;DR HighStr.nvim is a NeoVim plugin writte
Extract data from a wide range of Internet sources into a pandas DataFrame.
pandas-datareader Up to date remote data access for pandas, works for multiple versions of pandas. Installation Install using pip pip install pandas-d
FollowSpot is a comprehensive audition tracking fullstack web application for entertainment industry professionals.
FollowSpot is a comprehensive audition tracking fullstack web application for entertainment industry professionals. This app allows users to store information/media for all of their auditions while also compiling data and displaying statistics to help track progress.
apysc is a Python frontend library for creating HTML and JS files, with an ActionScript 3 (AS3)-like interface.
apysc apysc is a Python frontend library for creating HTML and JS files, with an ActionScript 3 (AS3)-like interface. Notes: Currently developing and
Flask html response minifier
Flask-HTMLmin Minify flask text/html mime type responses. Just add MINIFY_HTML = True to your deployment config to minify HTML and text responses of y
Bangla handwritten document digitization
Bangla handwritten document digitization This repo addresses the problem of digitizing handwritten documents in Bangla. Documents have definite fields
A Python module and command-line utility for converting .ANS format ANSI art to HTML
ansipants A Python module and command-line utility for converting .ANS format ANSI art to HTML. Installation pip install ansipants Command-line usage
The repo for reproducing Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study
ECIR Reproducibility Paper: Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study This code corresponds to the reproducibility
Python script for converting obsidian md-file to html (recursively adds all link/images)
ObsidianToHtmlConverter I made a small python script for converting obsidian md-file to static (local) html (recursively adds all link/images) I made
TensorFlow implementation of the paper "Hierarchical Attention Networks for Document Classification"
Hierarchical Attention Networks for Document Classification This is an implementation of the paper Hierarchical Attention Networks for Document Classi
DUE: End-to-End Document Understanding Benchmark
This is the repository that provides tools to download data, reproduce the baseline results, and run the evaluation. What can you achieve with this guide Based
Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.
sklearn-evaluation Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking, and Jupyter notebook analysis. Suppo
The first public repository that provides a free BUBT website scraping API script on GitHub.
BUBT WEBSITE SCRAPPING SCRIPT I think this is the first public repository that provides a free BUBT website scraping API script on GitHub. When I was do
Python script to tabulate data formats like json, csv, html, etc
pyT PyT is a command-line tool as well as a library for visualising various data formats like: JSON HTML Table CSV XML, etc. Features Print table o
Pytorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Perspective"
Graph Neural Topic Model (GNTM) This is the pytorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Persp
Table (Finnish Taulukko) glued together to transform into hands-free living.
taulukko Table (Finnish Taulukko) glued together to transform into hands-free living. Installation Preferred way to install is as usual (for testing o
Small wrapper around 3dmol.js and html2canvas for creating self-contained HTML files that display a 3D molecular representation.
Description Small wrapper around 3dmol.js and html2canvas for creating self-contained HTML files that display a 3D molecular representation. Double cl