385 Repositories
Python html-extraction Libraries
PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks
Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
A Partition Filter Network for Joint Entity and Relation Extraction EMNLP 2021
EMNLP 2021 - A Partition Filter Network for Joint Entity and Relation Extraction
Create a static HTML/CSS image gallery from a bunch of images.
gallerize Create a static HTML/CSS image gallery from a bunch of images.
HTML2Image is a lightweight Python package that acts as a wrapper around the headless mode of existing web browsers to generate images from URLs and from HTML+CSS strings or files.
A package acting as a wrapper around the headless mode of existing web browsers to generate images from URLs and from HTML+CSS strings or files.
Aspect-Sentiment-Multiple-Opinion Triplet Extraction (NLPCC 2021)
The code and data for the paper "Aspect-Sentiment-Multiple-Opinion Triplet Extraction" Requirements Python 3.6.8 torch==1.2.0 pytorch-transformers==1.
Open & Efficient for Framework for Aspect-based Sentiment Analysis
PyABSA - Open & Efficient for Framework for Aspect-based Sentiment Analysis Fast & Low Memory requirement & Enhanced implementation of Local Context F
The PyTorch re-implement of a 3D CNN Tracker to extract coronary artery centerlines with state-of-the-art (SOTA) performance. (paper: 'Coronary artery centerline extraction in cardiac CT angiography using a CNN-based orientation classifier')
The PyTorch re-implement of a 3D CNN Tracker to extract coronary artery centerlines with state-of-the-art (SOTA) performance. (paper: 'Coronary artery centerline extraction in cardiac CT angiography using a CNN-based orientation classifier')
Kimimaro: Skeletonize Densely Labeled Images
Kimimaro: Skeletonize Densely Labeled Images # Produce SWC files from volumetric images. kimimaro forge labels.npy --progress # writes to ./kimimaro_o
Acoustic mosquito detection code with Bayesian Neural Networks
HumBugDB Acoustic mosquito detection with Bayesian Neural Networks. Extract audio or features from our large-scale dataset on Zenodo. This repository
A Dataset for Direct Quotation Extraction and Attribution in News Articles.
DirectQuote - A Dataset for Direct Quotation Extraction and Attribution in News Articles DirectQuote is a corpus containing 19,760 paragraphs and 10,3
ONYX SMTP Sender est un tool qui vous serviras à envoyer des email html à une liste d'email (en .txt) c'est la première version du tool et je le sors un peu à la rache donc si le logiciel est obsolète c'est normal j'y taff encore ;)
SMTP-Sender ONYX SMTP Sender est un tool qui vous serviras à envoyer des email html à une liste d'email (en .txt) c'est la première version du tool et
Swagger UI is a collection of HTML, JavaScript, and CSS assets that dynamically generate beautiful documentation from a Swagger-compliant API.
Introduction Swagger UI allows anyone — be it your development team or your end consumers — to visualize and interact with the API’s resources without
ApiPy was created for api testing with Python pytest framework which has also requests, assertpy and pytest-html-reporter libraries.
ApiPy was created for api testing with Python pytest framework which has also requests, assertpy and pytest-html-reporter libraries. With this f
L-SpEx: Localized Target Speaker Extraction
L-SpEx: Localized Target Speaker Extraction The data configuration and simulation of L-SpEx. The code scripts will be released in the future. Data Gen
A markdown lexer and parser which gives the programmer atomic control over markdown parsing to html.
A markdown lexer and parser which gives the programmer atomic control over markdown parsing to html.
👁️ Tool for Data Extraction and Web Requests.
httpmapper 👁️ Project • Technologies • Installation • How it works • License Project 🚧 For educational purposes. This is a project that I developed,
Docbarcodes extracts 1D and 2D barcodes from scanned PDF documents or images. It can be used to automate extraction and processing of all kind of documents.
Intro Barcodes are being used in many documents or forms to enable machine reading capabilities and reduce manual processing effort. Simple 1D barcode
An easy-to-use Python module that helps you to extract the BERT embeddings for a large text dataset (Bengali/English) efficiently.
An easy-to-use Python module that helps you to extract the BERT embeddings for a large text dataset (Bengali/English) efficiently.
The Main Pythonic Version Of Twig Using Nextcord
The Main Pythonic Version Of Twig Using Nextcord
Code for "Generating Disentangled Arguments with Prompts: a Simple Event Extraction Framework that Works"
GDAP The code of paper "Code for "Generating Disentangled Arguments with Prompts: a Simple Event Extraction Framework that Works"" Event Datasets Prep
Code repo for EMNLP21 paper "Zero-Shot Information Extraction as a Unified Text-to-Triple Translation"
Zero-Shot Information Extraction as a Unified Text-to-Triple Translation Source code repo for paper Zero-Shot Information Extraction as a Unified Text
🔥🔥High-Performance Face Recognition Library on PaddlePaddle & PyTorch🔥🔥
face.evoLVe: High-Performance Face Recognition Library based on PaddlePaddle & PyTorch Evolve to be more comprehensive, effective and efficient for fa
A PyTorch implementation of EfficientNet and EfficientNetV2 (coming soon!)
EfficientNet PyTorch Quickstart Install with pip install efficientnet_pytorch and load a pretrained EfficientNet with: from efficientnet_pytorch impor
Sign Language is detected in realtime using video sequences. Our approach involves MediaPipe Holistic for keypoints extraction and LSTM Model for prediction.
RealTime Sign Language Detection using Action Recognition Approach Real-Time Sign Language is commonly predicted using models whose architecture consi
Pytorch Feature Map Extractor
MapExtrackt Convolutional Neural Networks Are Beautiful We all take our eyes for granted, we glance at an object for an instant and our brains can ide
Parser manager for parsing DOC, DOCX, PDF or HTML files
Parser manager Description Parser gets PDF, DOC, DOCX or HTML file via API and saves parsed data to the database. Implemented in Ruby 3.0.1 using Acti
A web app which allows user to query the weather info of any place in the world
weather-app This is a web app which allows user to get the weather info of any place in the world as soon as possible. It makes use of OpenWeatherMap
Convert emails without attachments to pdf and send as email
Email to PDF to email This script will check an imap folder for unread emails. Any unread email that does not have an attachment will be converted to
Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes
Bleach Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes. Bleach can also linkify text safely, appl
An API that renders HTML/CSS content to PNG using Chromium
html_png An API that renders HTML/CSS content to PNG using Chromium Disclaimer I am not responsible if you happen to make your own instance of this AP
The best way to have DRY Django forms. The app provides a tag and filter that lets you quickly render forms in a div format while providing an enormous amount of capability to configure and control the rendered HTML.
django-crispy-forms The best way to have Django DRY forms. Build programmatic reusable layouts out of components, having full control of the rendered
AnnIE - Annotation Platform, tool for open information extraction annotations using text files.
AnnIE - Annotation Platform, tool for open information extraction annotations using text files.
A Django app that allows you to send email asynchronously in Django. Supports HTML email, database backed templates and logging.
Django Post Office Django Post Office is a simple app to send and manage your emails in Django. Some awesome features are: Allows you to send email as
Source code for "Pack Together: Entity and Relation Extraction with Levitated Marker"
PL-Marker Source code for Pack Together: Entity and Relation Extraction with Levitated Marker. Quick links Overview Setup Install Dependencies Data Pr
A django compressor tool that bundles css, js files to a single css, js file with webpack and updates your html files with respective css, js file path.
django-webpacker's documentation: Introduction: django-webpacker is a django compressor tool which bundles css, js files to a single css, js file with
HTML minifier for Python frameworks (not only Django, despite the name).
django-htmlmin django-html is an HTML minifier for Python, with full support for HTML 5. It supports Django, Flask and many other Python web framework
Advance Image Downloader/Extractor (Job) is a Python-Flask web-based app, which will help the user download the any kind of Images at any date and time over the internet. These images will get downloaded as a job and then let user know that the images have been downloaded by sending them a link over an email.
Advance Image Downloader/Extractor(Job) Advance Image Downloader/Extractor (Job) is a Python-Flask web-based app, which will help the user download th
This repo contain builders of cab file, html file, and docx file for CVE-2021-40444 exploit
CVE-2021-40444 builders This repo contain builders of cab file, html file, and docx file for CVE-2021-40444 exploit. This repo is just for testing, re
A minimalistic manga reader for desktop built with React and Django
smanga A minimalistic manga reader/server for serving local manga images on desktop browser. Provides a two-page view layout just as reading a physica
Python Library to Extract youtube video Tags without Youtube API
YoutubeTags Python Library to Extract youtube video Tags without Youtube API Installation pip install YoutubeTags Example import YoutubeTags from Yout
Python lib for Simple PDF text extraction
Python lib for Simple PDF text extraction
WeasyPrint is a smart solution helping web developers to create PDF documents.
WeasyPrint is a smart solution helping web developers to create PDF documents. It turns simple HTML pages into gorgeous statistical reports, invoices, tickets…
jq for Python programmers Process JSON and HTML on the command-line with familiar syntax.
jq for Python programmers Process JSON and HTML on the command-line with familiar syntax.
The (Official) PyTorch Implementation of the paper "Deep Extraction of Manga Structural Lines"
MangaLineExtraction_PyTorch The (Official) PyTorch Implementation of the paper "Deep Extraction of Manga Structural Lines" Usage model_torch.py [sourc
Goblyn is a Python tool focused to enumeration and capture of website files metadata.
Goblyn Metadata Enumeration What's Goblyn? Goblyn is a tool focused to enumeration and capture of website files metadata. How it works? Goblyn will se
A simple machine learning package to cluster keywords in higher-level groups.
Simple Keyword Clusterer A simple machine learning package to cluster keywords in higher-level groups. Example: "Senior Frontend Engineer" -- "Fronte
A super simple script which uses the GitHub API to convert your markdown files to GitHub styled HTML site.
A super simple script which uses the GitHub API to convert your markdown files to GitHub styled HTML site.
Build reusable components in Django without writing a single line of Python.
Build reusable components in Django without writing a single line of Python. {% #quote %} {% quote_photo src="/project-hail-mary.jpg" %} {% #quot
Easy compression and extraction for any compression or archival format.
Tzar: Tar, Zip, Anything Really Easy compression and extraction for any compression or archival format. Usage/Examples tzar compress large-dir compres
A management system designed for the employees of MIRAS (Art Gallery). It is used to sell/cancel tickets, book/cancel events and keeps track of all upcoming events.
Art-Galleria-Management-System Its a management system designed for the employees of MIRAS (Art Gallery). Backend : Python Frontend : Django Database
Extract knowledge from raw text
Extract knowledge from raw text This repository is a nearly copy-paste of "From Text to Knowledge: The Information Extraction Pipeline" with some cosm
Py_extract is a simple, light-weight python library to handle some extraction tasks using less lines of code
py_extract Py_extract is a simple, light-weight python library to handle some extraction tasks using less lines of code. Still in Development Stage! I
Code for technical report "An Improved Baseline for Sentence-level Relation Extraction".
RE_improved_baseline Code for technical report "An Improved Baseline for Sentence-level Relation Extraction". Requirements torch = 1.8.1 transformers
HtmlWebShot - A python3 package which Can Create Images From url, Html-CSS, Svg and from any readable file and texts with many setup features.
A python3 package which Can Create Images From url, Html-CSS, Svg and from any readable file and texts with many setup features
open-information-extraction-system, build open-knowledge-graph(SPO, subject-predicate-object) by pyltp(version==3.4.0)
中文开放信息抽取系统, open-information-extraction-system, build open-knowledge-graph(SPO, subject-predicate-object) by pyltp(version==3.4.0)
An implementation for `Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction`
Text2Event An implementation for Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction Please contact Yaojie Lu (@
A project for developing transformer-based models for clinical relation extraction
Clinical Relation Extration with Transformers Aim This package is developed for researchers easily to use state-of-the-art transformers models for ext
Evidently helps analyze machine learning models during validation or production monitoring
Evidently helps analyze machine learning models during validation or production monitoring. The tool generates interactive visual reports and JSON profiles from pandas DataFrame or csv files. Currently 6 reports are available.
Kaggle Tweet Sentiment Extraction Competition: 1st place solution (Dark of the Moon team)
Kaggle Tweet Sentiment Extraction Competition: 1st place solution (Dark of the Moon team)
Simple reuse of partial HTML page templates in the Jinja template language for Python web frameworks.
Jinja Partials Simple reuse of partial HTML page templates in the Jinja template language for Python web frameworks. (There is also a Pyramid/Chameleo
A Python tool to generate a static HTML file that represents the internal structure of a PDF file
PDFSyntax A Python tool to generate a static HTML file that represents the internal structure of a PDF file At some point the low-level functions deve
Kaldi-compatible feature extraction with PyTorch, supporting CUDA, batch processing, chunk processing, and autograd
Kaldi-compatible feature extraction with PyTorch, supporting CUDA, batch processing, chunk processing, and autograd
Source code for "UniRE: A Unified Label Space for Entity Relation Extraction.", ACL2021.
UniRE Source code for "UniRE: A Unified Label Space for Entity Relation Extraction.", ACL2021. Requirements python: 3.7.6 pytorch: 1.8.1 transformers:
A PyTorch-based open-source framework that provides methods for improving the weakly annotated data and allows researchers to efficiently develop and compare their own methods.
Knodle (Knowledge-supervised Deep Learning Framework) - a new framework for weak supervision with neural networks. It provides a modularization for se
mlscraper: Scrape data from HTML pages automatically with Machine Learning
🤖 Scrape data from HTML websites automatically with Machine Learning
[ACL 20] Probing Linguistic Features of Sentence-level Representations in Neural Relation Extraction
REval Table of Contents Introduction Overview Requirements Installation Probing Usage Citation License 🎓 Introduction REval is a simple framework for
An Efficient Implementation of Analytic Mesh Algorithm for 3D Iso-surface Extraction from Neural Networks
AnalyticMesh Analytic Marching is an exact meshing solution from neural networks. Compared to standard methods, it completely avoids geometric and top
Implementation for our AAAI2021 paper (Entity Structure Within and Throughout: Modeling Mention Dependencies for Document-Level Relation Extraction).
SSAN Introduction This is the pytorch implementation of the SSAN model (see our AAAI2021 paper: Entity Structure Within and Throughout: Modeling Menti
Sierra is a lightweight Python framework for building and integrating web applications
A lightweight Python framework for building and Integrating Web Applications. Sierra is a Python3 library for building and integrating web applications with HTML and CSS using simple enough syntax. You can develop your web applications with Python, taking advantage of its functionalities and integrating them to the fullest.
flexible time-series processing & feature extraction
tsflex is a toolkit for flexible time-series processing & feature extraction, making few assumptions about input data. Useful links Documentation Exam
✈️ HTML Template engine for python. Supports XSS preventation and many more!
Htmotor HTML Template Engine for Python! Installation: Open your terminal and type pip install htmotor.
spafe: Simplified Python Audio-Features Extraction
spafe aims to simplify features extractions from mono audio files. The library can extract of the following features: BFCC, LFCC, LPC, LPCC, MFCC, IMFCC, MSRCC, NGCC, PNCC, PSRCC, PLP, RPLP, Frequency-stats etc. It also provides various filterbank modules (Mel, Bark and Gammatone filterbanks) and other spectral statistics.
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security. ART provides tools that enable developers and researchers to defend and evaluate Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference. ART supports all popular machine learning frameworks (TensorFlow, Keras, PyTorch, MXNet, scikit-learn, XGBoost, LightGBM, CatBoost, GPy, etc.), all data types (images, tables, audio, video, etc.) and machine learning tasks (classification, object detection, speech recognition, generation, certification, etc.).
This repository contains the code for our fast polygonal building extraction from overhead images pipeline.
Polygonal Building Segmentation by Frame Field Learning We add a frame field output to an image segmentation neural network to improve segmentation qu
ddgr is a cmdline utility to search DuckDuckGo (html version) from the terminal
ddgr is a cmdline utility to search DuckDuckGo (html version) from the terminal. While googler is extremely popular among cmdline users, in many forums the need of a similar utility for privacy-aware DuckDuckGo came up. DuckDuckGo Bangs are super-cool too! So here's ddgr for you!
A pre-attack hacker tool which aims to find out sensitives comments in HTML comment tag and to help on reconnaissance process
Find Out in Comment Find sensetive comment out in HTML ⚈ About This is a pre-attack hacker tool that searches for sensitives words in HTML comments ta
Code for the CVPR 2021 paper: Understanding Failures of Deep Networks via Robust Feature Extraction
Welcome to Barlow Barlow is a tool for identifying the failure modes for a given neural network. To achieve this, Barlow first creates a group of imag
The coda and data for "Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach" (ACL '21)
We propose a hierarchical core-fringe learning framework to measure fine-grained domain relevance of terms – the degree that a term is relevant to a broad (e.g., computer science) or narrow (e.g., deep learning) domain.
T-LOAM: Truncated Least Squares Lidar-only Odometry and Mapping in Real-Time
T-LOAM: Truncated Least Squares Lidar-only Odometry and Mapping in Real-Time The first Lidar-only odometry framework with high performance based on tr
Training data extraction on GPT-2
Training data extraction from GPT-2 This repository contains code for extracting training data from GPT-2, following the approach outlined in the foll
Source code for paper "Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling", AAAI 2021
ATLOP Code for AAAI 2021 paper Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling. If you make use of this co
Datasets of Automatic Keyphrase Extraction
This repository contains 20 annotated datasets of Automatic Keyphrase Extraction made available by the research community. Following are the datasets and the original papers that proposed them. If you know more datasets, and want to contribute, please, notify me.
Cross-media Structured Common Space for Multimedia Event Extraction (ACL2020)
Cross-media Structured Common Space for Multimedia Event Extraction Table of Contents Overview Requirements Data Quickstart Citation Overview The code
Deep Image Search is an AI-based image search engine that includes deep transfor learning features Extraction and tree-based vectorized search.
Deep Image Search - AI-Based Image Search Engine Deep Image Search is an AI-based image search engine that includes deep transfer learning features Ex
Code for paper "Document-Level Argument Extraction by Conditional Generation". NAACL 21'
Argument Extraction by Generation Code for paper "Document-Level Argument Extraction by Conditional Generation". NAACL 21' Dependencies pytorch=1.6 tr
A Feature Rich Modular Malware Configuration Extraction Utility for MalDuck
Malware Configuration Extractor A Malware Configuration Extraction Tool and Modules for MalDuck This project is FREE as in FREE 🍺 , use it commercial
Pytorch reimplement of the paper "A Novel Cascade Binary Tagging Framework for Relational Triple Extraction" ACL2020. The original code is written in keras.
CasRel-pytorch-reimplement Pytorch reimplement of the paper "A Novel Cascade Binary Tagging Framework for Relational Triple Extraction" ACL2020. The o
Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs
Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs This is an implemetation of the paper Few-shot Relation Extraction via Baye
Simple HTML and PDF document generator for Python - with built-in support for popular data analysis and plotting libraries.
Esparto is a simple HTML and PDF document generator for Python. Its primary use is for generating shareable single page reports with content from popular analytics and data science libraries.
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
The PyTorch-Kaldi Speech Recognition Toolkit PyTorch-Kaldi is an open-source repository for developing state-of-the-art DNN/HMM speech recognition sys
Tweak the form field rendering in templates, not in python-level form definitions. CSS classes and HTML attributes can be altered.
django-widget-tweaks Tweak the form field rendering in templates, not in python-level form definitions. Altering CSS classes and HTML attributes is su
The best way to have DRY Django forms. The app provides a tag and filter that lets you quickly render forms in a div format while providing an enormous amount of capability to configure and control the rendered HTML.
django-crispy-forms The best way to have Django DRY forms. Build programmatic reusable layouts out of components, having full control of the rendered
Code for the paper "Relation of the Relations: A New Formalization of the Relation Extraction Problem"
This repo contains the code for the EMNLP 2020 paper "Relation of the Relations: A New Paradigm of the Relation Extraction Problem" (Jin et al., 2020)
SpikeX - SpaCy Pipes for Knowledge Extraction
SpikeX is a collection of pipes ready to be plugged in a spaCy pipeline. It aims to help in building knowledge extraction tools with almost-zero effort.
OpenMMLab Text Detection, Recognition and Understanding Toolbox
Introduction English | 简体中文 MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the correspondi
PlenOctree Extraction algorithm
PlenOctrees_NeRF-SH This is an implementation of the Paper PlenOctrees for Real-time Rendering of Neural Radiance Fields. Not only the code provides t
Tools for the extraction of OpenStreetMap street network data
OSMnet Tools for the extraction of OpenStreetMap (OSM) street network data. Intended to be used in tandem with Pandana and UrbanAccess libraries to ex
git《Joint Entity and Relation Extraction with Set Prediction Networks》(2020) GitHub:
Joint Entity and Relation Extraction with Set Prediction Networks Source code for Joint Entity and Relation Extraction with Set Prediction Networks. W
Automatic extraction of relevant features from time series:
tsfresh This repository contains the TSFRESH python package. The abbreviation stands for "Time Series Feature extraction based on scalable hypothesis
🏆 A ranked list of awesome python libraries for web development. Updated weekly.
Best-of Web Development with Python 🏆 A ranked list of awesome python libraries for web development. Updated weekly. This curated list contains 540 a