292 Repositories
Python csv-parsing Libraries
ICLR 2021: Pre-Training for Context Representation in Conversational Semantic Parsing
SCoRe: Pre-Training for Context Representation in Conversational Semantic Parsing This repository contains code for the ICLR 2021 paper "SCoRE: Pre-Tr
Python code to generate and store certificates automatically , using names from a csv file
WOC-certificate-generator Python code to generate and store certificates automatically , using names from a csv file IMPORTANT In order to make the co
Multiple paper open-source codes of the Microsoft Research Asia DKI group
📫 Paper Code Collection (MSRA DKI Group) This repo hosts multiple open-source codes of the Microsoft Research Asia DKI Group. You could find the corr
📊📈 Serves up Pandas dataframes via the Django REST Framework for use in client-side (i.e. d3.js) visualizations and offline analysis (e.g. Excel)
📊📈 Serves up Pandas dataframes via the Django REST Framework for use in client-side (i.e. d3.js) visualizations and offline analysis (e.g. Excel)
Parser manager for parsing DOC, DOCX, PDF or HTML files
Parser manager Description Parser gets PDF, DOC, DOCX or HTML file via API and saves parsed data to the database. Implemented in Ruby 3.0.1 using Acti
Intent parsing and slot filling in PyTorch with seq2seq + attention
PyTorch Seq2Seq Intent Parsing Reframing intent parsing as a human - machine translation task. Work in progress successor to torch-seq2seq-intent-pars
PICARD - Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models
This is the official implementation of the following paper: Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau. PICARD - Parsing Incrementally for Con
A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any other format
RITA DSL This is a language, loosely based on language Apache UIMA RUTA, focused on writing manual language rules, which compiles into either spaCy co
Neural network models for joint POS tagging and dependency parsing (CoNLL 2017-2018)
Neural Network Models for Joint POS Tagging and Dependency Parsing Implementations of joint models for POS tagging and dependency parsing, as describe
This repo in the implementation of EMNLP'21 paper "SPARQLing Database Queries from Intermediate Question Decompositions" by Irina Saparina, Anton Osokin
SPARQLing Database Queries from Intermediate Question Decompositions This repo is the implementation of the following paper: SPARQLing Database Querie
📊📈 Serves up Pandas dataframes via the Django REST Framework for use in client-side (i.e. d3.js) visualizations and offline analysis (e.g. Excel)
Django REST Pandas Django REST Framework + pandas = A Model-driven Visualization API Django REST Pandas (DRP) provides a simple way to generate and se
Produces a summary CSV report of an Amber Electric customer's energy consumption and cost data.
Amber Electric Usage Summary This is a command line tool that produces a summary CSV report of an Amber Electric customer's energy consumption and cos
Fully reproducible, Dockerized, step-by-step, tutorial on how to mock a "real-time" Kafka data stream from a timestamped csv file. Detailed blog post published on Towards Data Science.
time-series-kafka-demo Mock stream producer for time series data using Kafka. I walk through this tutorial and others here on GitHub and on my Medium
PIZZA - a task-oriented semantic parsing dataset
The PIZZA dataset continues the exploration of task-oriented parsing by introducing a new dataset for parsing pizza and drink orders, whose semantics cannot be captured by flat slots and intents.
A modern simfile parsing & editing library for Python 3
A modern simfile parsing & editing library for Python 3
Code and checkpoints for training the transformer-based Table QA models introduced in the paper TAPAS: Weakly Supervised Table Parsing via Pre-training.
End-to-end neural table-text understanding models.
This is an official implementation for the WTW Dataset in "Parsing Table Structures in the Wild " on table detection and table structure recognition.
WTW-Dataset This is an official implementation for the WTW Dataset in "Parsing Table Structures in the Wild " on ICCV 2021. Here, you can download the
Implementation of fast algorithms for Maximum Spanning Tree (MST) parsing that includes fast ArcMax+Reweighting+Tarjan algorithm for single-root dependency parsing.
Fast MST Algorithm Implementation of fast algorithms for (Maximum Spanning Tree) MST parsing that includes fast ArcMax+Reweighting+Tarjan algorithm fo
This is a tool to make easier brawl stars modding using csv manipulation
Brawler Maker : Modding Tool for Brawl Stars This is a tool to make easier brawl stars modding using csv manipulation if you want to support me, just
TAPEX: Table Pre-training via Learning a Neural SQL Executor
TAPEX: Table Pre-training via Learning a Neural SQL Executor The official repository which contains the code and pre-trained models for our paper TAPE
The official implementation of CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing
CSGStumpNet The official implementation of CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing Paper | Project page
Intent parsing and slot filling in PyTorch with seq2seq + attention
PyTorch Seq2Seq Intent Parsing Reframing intent parsing as a human - machine translation task. Work in progress successor to torch-seq2seq-intent-pars
A pytorch implementation of the CVPR2021 paper "VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild"
VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild A pytorch implementation of the CVPR2021 paper "VSPW: A Large-scale Dataset for Video
Pytorch implementation for Semantic Segmentation/Scene Parsing on MIT ADE20K dataset
Semantic Segmentation on MIT ADE20K dataset in PyTorch This is a PyTorch implementation of semantic segmentation models on MIT ADE20K scene parsing da
Download history data from binance and save to dataframe or csv file
Binance history data downloader Download history data from binance and save to dataframe or csv file
Convert your subscriptions csv file into a valid json for Newpipe!
Newpipe-CSV-Fixer Convert your Google subscriptions CSV file into a valid JSON for Newpipe! Thanks to nikcorg for sharing how to convert the CSV into
A Web Scraper built with beautiful soup, that fetches udemy course information. Get udemy course information and convert it to json, csv or xml file
Udemy Scraper A Web Scraper built with beautiful soup, that fetches udemy course information. Installation Virtual Environment Firstly, it is recommen
自然言語で書かれた時間情報表現を抽出/規格化するルールベースの解析器
ja-timex 自然言語で書かれた時間情報表現を抽出/規格化するルールベースの解析器 概要 ja-timex は、現代日本語で書かれた自然文に含まれる時間情報表現を抽出しTIMEX3と呼ばれるアノテーション仕様に変換することで、プログラムが利用できるような形に規格化するルールベースの解析器です。
2nd solution of ICDAR 2021 Competition on Scientific Literature Parsing, Task B.
TableMASTER-mmocr Contents About The Project Method Description Dependency Getting Started Prerequisites Installation Usage Data preprocess Train Infe
Tomli is a Python library for parsing TOML
Tomli A lil' TOML parser Table of Contents generated with mdformat-toc Intro Installation Usage Parse a TOML string Parse a TOML file Handle invalid T
Pytorch implementation of “Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement”
Graph-to-Graph Transformers Self-attention models, such as Transformer, have been hugely successful in a wide range of natural language processing (NL
Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data
SEDE SEDE (Stack Exchange Data Explorer) is new dataset for Text-to-SQL tasks with more than 12,000 SQL queries and their natural language description
Tomli is a Python library for parsing TOML. Tomli is fully compatible with TOML v1.0.0.
Tomli A lil' TOML parser Table of Contents generated with mdformat-toc Intro Installation Usage Parse a TOML string Parse a TOML file Handle invalid T
Summarize LSF job properties by parsing log files.
Summarize LSF job properties by parsing log files of workflows executed by Snakemake.
A versatile token stream for handwritten parsers.
Writing recursive-descent parsers by hand can be quite elegant but it's often a bit more verbose than expected, especially when it comes to handling indentation and reporting proper syntax errors. This package provides a powerful general-purpose token stream that addresses these issues and more.
App and Python library for parsing, writing, and validation of the STAND013 file format.
python-stand013 python-stand013 is a Python app and library for parsing, writing, and validation of the STAND013 file format. Features The following i
(AAAI 2021) Progressive One-shot Human Parsing
End-to-end One-shot Human Parsing This is the official repository for our two papers: Progressive One-shot Human Parsing (AAAI 2021) End-to-end One-sh
Vehicle Identification Speed Detection (VISD) extracts vehicle information like License Plate number, Manufacturer and colour from a video and provides this data in the form of a CSV file
Vehicle Identification Speed Detection (VISD) extracts vehicle information like License Plate number, Manufacturer and colour from a video and provides this data in the form of a CSV file. VISD can also perform vehicle speed detection on a video. All these features of VSID are provided to the user using a Web Application which is created using Flask
(AAAI2020)Grapy-ML: Graph Pyramid Mutual Learning for Cross-dataset Human Parsing
Grapy-ML: Graph Pyramid Mutual Learning for Cross-dataset Human Parsing This repository contains pytorch source code for AAAI2020 oral paper: Grapy-ML
simdjson : Parsing gigabytes of JSON per second
JSON is everywhere on the Internet. Servers spend a *lot* of time parsing it. We need a fresh approach. The simdjson library uses commonly available SIMD instructions and microparallel algorithms to parse JSON 4x faster than RapidJSON and 25x faster than JSON for Modern C++.
:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)
R²SQL The PyTorch implementation of paper Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing. (AAAI 2021) Requirement
Ethereum ETL lets you convert blockchain data into convenient formats like CSVs and relational databases.
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions.
Django application and library for importing and exporting data with admin integration.
django-import-export django-import-export is a Django application and library for importing and exporting data with included admin integration. Featur
a tool that compiles a csv of all h1 program stats
h1stats - h1 Program Stats Scraper This python3 script will call out to HackerOne's graphql API and scrape all currently active programs for informati
The official PyTorch implementation of the paper: *Xili Dai, Xiaojun Yuan, Haigang Gong, Yi Ma. "Fully Convolutional Line Parsing." *.
F-Clip — Fully Convolutional Line Parsing This repository contains the official PyTorch implementation of the paper: *Xili Dai, Xiaojun Yuan, Haigang
QueraToCSV is a simple python CLI project to convert the Quera results file into CSV files.
Quera is an Iranian Learning management system (LMS) that has an online judge for programming languages. Some Iranian universities use it to automate the evaluation of programming assignments.
Python bindings to libpostal for fast international address parsing/normalization
pypostal These are the official Python bindings to https://github.com/openvenues/libpostal, a fast statistical parser/normalizer for street addresses
Exports saved posts and comments on Reddit to a csv file.
reddit-saved-to-csv Exports saved posts and comments on Reddit to a csv file. Columns: ID, Name, Subreddit, Type, URL, NoSFW ID: Starts from 1 and inc
Release of SPLASH: Dataset for semantic parse correction with natural language feedback in the context of text-to-SQL parsing
SPLASH: Semantic Parsing with Language Assistance from Humans SPLASH is dataset for the task of semantic parse correction with natural language feedba
SIEM Logstash parsing for more than hundred technologies
LogIndexer Pipeline Logstash Parsing Configurations for Elastisearch SIEM and OpenDistro for Elasticsearch SIEM Why this project exists The overhead o
Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.
Table of Contents Overview Requirements Demo Modules Overview This python package contains modules to help with finding and extracting tabular data fr
cysimdjson - Very fast Python JSON parsing library
Fast JSON parsing library for Python, 7-12 times faster than standard Python JSON parser.
An open source multi-tool for exploring and publishing data
Datasette An open source multi-tool for exploring and publishing data Datasette is a tool for exploring and publishing data. It helps people take data
Parse Robinhood 1099 Tax Document from PDF into CSV
Robinhood 1099 Parser This project converts Robinhood Securities 1099 tax document from PDF to CSV file. This tool will be helpful for those who need
Convert tables stored as images to an usable .csv file
Convert an image of numbers to a .csv file This Python program aims to convert images of array numbers to corresponding .csv files. It uses OpenCV for
Yet Another Compiler Visualizer
yacv: Yet Another Compiler Visualizer yacv is a tool for visualizing various aspects of typical LL(1) and LR parsers. Check out demo on YouTube to see
A Python toolkit for processing tabular data
meza: A Python toolkit for processing tabular data Index Introduction | Requirements | Motivation | Hello World | Usage | Interoperability | Installat
Datetimes for Humans™
Maya: Datetimes for Humans™ Datetimes are very frustrating to work with in Python, especially when dealing with different locales on different systems
ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.
Collaborate This is a web application for managing and building stories based on tips solicited from the public. This project is meant to be easy to s
ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.
Collaborate This is a web application for managing and building stories based on tips solicited from the public. This project is meant to be easy to s
Python library for creating PEG parsers
PyParsing -- A Python Parsing Module Introduction The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the t
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
fastNLP fastNLP是一款轻量级的自然语言处理(NLP)工具包,目标是快速实现NLP任务以及构建复杂模型。 fastNLP具有如下的特性: 统一的Tabular式数据容器,简化数据预处理过程; 内置多种数据集的Loader和Pipe,省去预处理代码; 各种方便的NLP工具,例如Embedd
GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training
GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training Code and model from our AAAI 2021 paper
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
fastNLP fastNLP是一款轻量级的自然语言处理(NLP)工具包,目标是快速实现NLP任务以及构建复杂模型。 fastNLP具有如下的特性: 统一的Tabular式数据容器,简化数据预处理过程; 内置多种数据集的Loader和Pipe,省去预处理代码; 各种方便的NLP工具,例如Embedd
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
AWS Data Wrangler Pandas on AWS Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretMana
Data parsing and validation using Python type hints
pydantic Data validation and settings management using Python type hinting. Fast and extensible, pydantic plays nicely with your linters/IDE/brain. De
Django application and library for importing and exporting data with admin integration.
django-import-export django-import-export is a Django application and library for importing and exporting data with included admin integration. Featur
A flask extension using pyexcel to read, manipulate and write data in different excel formats: csv, ods, xls, xlsx and xlsm.
Flask-Excel - Let you focus on data, instead of file formats Support the project If your company has embedded pyexcel and its components into a revenu
UDdup - URLs Deduplication Tool
UDdup - URLs Deduplication Tool The tool gets a list of URLs, and removes "duplicate" pages in the sense of URL patterns that are probably repetitive
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Trankit: A Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing Trankit is a light-weight Transformer-based Pyth
Pythonic HTML Parsing for Humans™
Requests-HTML: HTML Parsing for Humans™ This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible. When us
PhoNLP: A BERT-based multi-task learning toolkit for part-of-speech tagging, named entity recognition and dependency parsing
PhoNLP is a multi-task learning model for joint part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results, outperforming a single-task learning approach that fine-tunes the pre-trained Vietnamese language model PhoBERT for each task independently.
:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)
R²SQL The PyTorch implementation of paper Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing. (AAAI 2021) Requirement
A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.
Python User Agents user_agents is a Python library that provides an easy way to identify/detect devices like mobile phones, tablets and their capabili
A simple Python module for parsing human names into their individual components
Name Parser A simple Python (3.2+ & 2.6+) module for parsing human names into their individual components. hn.title hn.first hn.middle hn.last hn.suff
Python library for creating PEG parsers
PyParsing -- A Python Parsing Module Introduction The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the t
A friendly library for parsing HTTP request arguments, with built-in support for popular web frameworks, including Flask, Django, Bottle, Tornado, Pyramid, webapp2, Falcon, and aiohttp.
webargs Homepage: https://webargs.readthedocs.io/ webargs is a Python library for parsing and validating HTTP request objects, with built-in support f
🌐 URL parsing and manipulation made easy.
furl is a small Python library that makes parsing and manipulating URLs easy. Python's standard urllib and urlparse modules provide a number of URL re
Pythonic HTML Parsing for Humans™
Requests-HTML: HTML Parsing for Humans™ This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible. When us
A suite of utilities for converting to and working with CSV, the king of tabular file formats.
csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats. It is inspired by pdftk, GDAL and th
Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.
Tablib: format-agnostic tabular dataset library _____ ______ ___________ ______ __ /_______ ____ /_ ___ /___(_)___ /_ _ __/_ __ `/__ _
Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files
pyexcel - Let you focus on data, instead of file formats Support the project If your company has embedded pyexcel and its components into a revenue ge
Yet Another Sequence Encoder - Encode sequences to vector of vector in python !
Yase Yet Another Sequence Encoder - encode sequences to vector of vectors in python ! Why Yase ? Yase enable you to encode any sequence which can be r
Standards-compliant library for parsing and serializing HTML documents and fragments in Python
html5lib html5lib is a pure-python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all majo
Python email address and Mime parsing library
Flanker - email address and MIME parsing for Python Flanker is an open source parsing library written in Python by the Mailgun Team. Flanker currently
Parsing ELF and DWARF in Python
pyelftools pyelftools is a pure-Python library for parsing and analyzing ELF files and DWARF debugging information. See the User's guide for more deta
Datetimes for Humans™
Maya: Datetimes for Humans™ Datetimes are very frustrating to work with in Python, especially when dealing with different locales on different systems
Useful extensions to the standard Python datetime features
dateutil - powerful extensions to datetime The dateutil module provides powerful extensions to the standard datetime module, available in Python. Inst
A Python 3 library for parsing human-written times and dates
Chronyk A small Python 3 library containing some handy tools for handling time, especially when it comes to interfacing with those pesky humans. Featu
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
AWS Data Wrangler Pandas on AWS Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretMana
PathPicker accepts a wide range of input -- output from git commands, grep results, searches -- pretty much anything.After parsing the input, PathPicker presents you with a nice UI to select which files you're interested in. After that you can open them in your favorite editor or execute arbitrary commands.
PathPicker Facebook PathPicker is a simple command line tool that solves the perpetual problem of selecting files out of bash output. PathPicker will:
Data parsing and validation using Python type hints
pydantic Data validation and settings management using Python type hinting. Fast and extensible, pydantic plays nicely with your linters/IDE/brain. De