153 Repositories
Python parallel-corpus Libraries
Parallel and High-Fidelity Text-to-Lip Generation; AAAI 2022 ; Official code
Parallel and High-Fidelity Text-to-Lip Generation This repository is the official PyTorch implementation of our AAAI-2022 paper, in which we propose P
A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.
Poisson Image Editing - A Parallel Implementation Jiayi Weng (jiayiwen), Zixu Chen (zixuc) Poisson Image Editing is a technique that can fuse two imag
Finally, some decent sample sentences
tts-dataset-prompts This repository aims to be a decent set of sentences for people looking to clone their own voices (e.g. using Tacotron 2). Each se
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
English | 简体中文 Easy Parallel Library Overview Easy Parallel Library (EPL) is a general and efficient library for distributed model training. Usability
Code for "Parallel Instance Query Network for Named Entity Recognition", accepted at ACL 2022.
README Code for Two-stage Identifier: "Parallel Instance Query Network for Named Entity Recognition", accepted at ACL 2022. For details of the model a
A framework for GPU based high-performance medical image processing and visualization
FAST is an open-source cross-platform framework with the main goal of making it easier to do high-performance processing and visualization of medical images on heterogeneous systems utilizing both multi-core CPUs and GPUs. To achieve this, FAST use modern C++, OpenCL and OpenGL.
File-based TF-IDF: Calculates keywords in a document, using a word corpus.
File-based TF-IDF Calculates keywords in a document, using a word corpus. Why? Because I found myself with hundreds of plain text files, with no way t
Understanding Hyperdimensional Computing for Parallel Single-Pass Learning
Understanding Hyperdimensional Computing for Parallel Single-Pass Learning Authors: Tao Yu* Yichi Zhang* Zhiru Zhang Christopher De Sa *: Equal Contri
ReCoin - Restoring our environment and businesses in parallel
Shashank Ojha, Sabrina Button, Abdellah Ghassel, Joshua Gonzales "Reduce Reuse R
Gold standard corpus annotated with verb-preverb connections for Hungarian.
Hungarian Preverb Corpus A gold standard corpus manually annotated with verb-preverb connections for Hungarian. corpus The corpus consist of the follo
OpenAi's gym environment wrapper to vectorize them with Ray
Ray Vector Environment Wrapper You would like to use Ray to vectorize your environment but you don't want to use RLLib ? You came to the right place !
PaRT: Parallel Learning for Robust and Transparent AI
PaRT: Parallel Learning for Robust and Transparent AI This repository contains the code for PaRT, an algorithm for training a base network on multiple
A notebook that shows how to import the IITB English-Hindi Parallel Corpus from the HuggingFace datasets repository
We provide a notebook that shows how to import the IITB English-Hindi Parallel Corpus from the HuggingFace datasets repository. The notebook also shows how to segment the corpus using BPE tokenization which can be used to train an English-Hindi MT System.
EasyRequests is a minimalistic HTTP-Request Library that wraps aiohttp and asyncio in a small package that allows for sequential, parallel or even single requests
EasyRequests EasyRequests is a minimalistic HTTP-Request Library that wraps aiohttp and asyncio in a small package that allows for sequential, paralle
Annotating the Tweebank Corpus on Named Entity Recognition and Building NLP Models for Social Media Analysis
TweebankNLP This repo contains the new Tweebank-NER dataset and off-the-shelf Twitter-Stanza pipeline for state-of-the-art Tweet NLP, as described in
Compute execution plan: A DAG representation of work that you want to get done. Individual nodes of the DAG could be simple python or shell tasks or complex deeply nested parallel branches or embedded DAGs themselves.
Hello from magnus Magnus provides four capabilities for data teams: Compute execution plan: A DAG representation of work that you want to get done. In
DaCe is a parallel programming framework that takes code in Python/NumPy and other programming languages
aCe - Data-Centric Parallel Programming Decoupling domain science from performance optimization. DaCe is a parallel programming framework that takes c
Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning
Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning. It features an imperative, define-by-run style user API.
ParaMonte is a serial/parallel library of Monte Carlo routines for sampling mathematical objective functions of arbitrary-dimensions
ParaMonte is a serial/parallel library of Monte Carlo routines for sampling mathematical objective functions of arbitrary-dimensions, in particular, the posterior distributions of Bayesian models in data science, Machine Learning, and scientific inference, with the design goal of unifying the automation (of Monte Carlo simulations), user-friendliness (of the library), accessibility (from multiple programming environments), high-performance (at runtime), and scalability (across many parallel processors).
nextPARS, a novel Illumina-based implementation of in-vitro parallel probing of RNA structures.
nextPARS, a novel Illumina-based implementation of in-vitro parallel probing of RNA structures. Here you will find the scripts necessary to produce th
CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss
CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss This is official implement of "
ParaGen is a PyTorch deep learning framework for parallel sequence generation
ParaGen is a PyTorch deep learning framework for parallel sequence generation. Apart from sequence generation, ParaGen also enhances various NLP tasks, including sequence-level classification, extraction and generation.
CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus
CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus CVSS is a massively multilingual-to-English speech-to-speech translation corpus, co
Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document corpus.
Lbl2Vec Lbl2Vec is an algorithm for unsupervised document classification and unsupervised document retrieval. It automatically generates jointly embed
CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus
CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus CVSS is a massively multilingual-to-English speech-to-speech translation corpus, co
A Python script that wraps the gitleaks tool to enable scanning of multiple repositories in parallel
mpgitleaks A Python script that wraps the gitleaks tool to enable scanning of multiple repositories in parallel. The motivation behind writing this sc
Tutorial on scikit-learn and IPython for parallel machine learning
Parallel Machine Learning with scikit-learn and IPython Video recording of this tutorial given at PyCon in 2013. The tutorial material has been rearra
End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)
PDVC Official implementation for End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021) [paper] [valse论文速递(Chinese)] This repo supports:
A library to make concurrent selenium tests that automatically download and setup webdrivers
AutoParaSelenium A library to make parallel selenium tests that automatically download and setup webdrivers Usage Installation pip install autoparasel
YACLC - Yet Another Chinese Learner Corpus
汉语学习者文本多维标注数据集YACLC V1.0 中文 | English 汉语学习者文本多维标注数据集(Yet Another Chinese Learner
DeepSpeech - Easy-to-use Speech Toolkit including SOTA ASR pipeline, influential TTS with text frontend and End-to-End Speech Simultaneous Translation.
(简体中文|English) Quick Start | Documents | Models List PaddleSpeech is an open-source toolkit on PaddlePaddle platform for a variety of critical tasks i
For use with an 8-bit parallel TFT touchscreen using micropython
ILI9341-parallel-TFT-driver-for-micropython For use with an 8-bit parallel TFT touchscreen using micropython. Many thanks to prenticedavid and his MCU
rosny is a lightweight library for building concurrent systems.
rosny is a lightweight library for building concurrent systems. Installation Tested on: Linux Python = 3.6 From pip: pip install rosny From source: p
About Python's multithreading and GIL
About Python's multithreading and GIL
Ping IP addresses and domains in parallel to find the accessible and inaccessible ones.
🚀 IPpy Parallel testing of IP addresses and domains in python. Reads IP addresses and domains from a CSV file and gives two lists of accessible and i
Execute shell command lines in parallel on Slurm, S(on) of Grid Engine (SGE), PBS/Torque clusters
qbatch Execute shell command lines in parallel on Slurm, S(on) of Grid Engine (SGE), PBS/Torque clusters qbatch is a tool for executing commands in pa
Optimizaciones incrementales al problema N-Body con el fin de evaluar y comparar las prestaciones de los traductores de Python en el ámbito de HPC.
Python HPC Optimizaciones incrementales de N-Body (all-pairs) con el fin de evaluar y comparar las prestaciones de los traductores de Python en el ámb
Pyeventbus: a publish/subscribe event bus
pyeventbus pyeventbus is a publish/subscribe event bus for Python 2.7. simplifies the communication between python classes decouples event senders and
NumPy aware dynamic Python compiler using LLVM
Numba A Just-In-Time Compiler for Numerical Functions in Python Numba is an open source, NumPy-aware optimizing compiler for Python sponsored by Anaco
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch
Parallel WaveGAN implementation with Pytorch This repository provides UNOFFICIAL pytorch implementations of the following models: Parallel WaveGAN Mel
Crawl BookCorpus
These are scripts to reproduce BookCorpus by yourself.
Jug: A Task-Based Parallelization Framework
Jug: A Task-Based Parallelization Framework Jug allows you to write code that is broken up into tasks and run different tasks on different processors.
A parallel branch-and-bound engine for Python.
pybnb A parallel branch-and-bound engine for Python. This software is copyright (c) by Gabriel A. Hackebeil (gabe.hacke
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
About The ROOT system provides a set of OO frameworks with all the functionality needed to handle and analyze large amounts of data in a very efficien
Model parallel transformers in JAX and Haiku
Table of contents Mesh Transformer JAX Updates Pretrained Models GPT-J-6B Links Acknowledgments License Model Details Zero-Shot Evaluations Architectu
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
GPT Neo 🎉 1T or bust my dudes 🎉 An implementation of model & data parallel GPT3-like models using the mesh-tensorflow library. If you're just here t
The versatile ocean simulator, in pure Python, powered by JAX.
Veros is the versatile ocean simulator -- it aims to be a powerful tool that makes high-performance ocean modeling approachable and fun. Because Veros
A proof-of-concept implementation of a parallel-decodable PNG format
mtpng A parallelized PNG encoder in Rust by Brion Vibber [email protected] Background Compressing PNG files is a relatively slow operation at large imag
Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts
DataSelection-NMT Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts Quick update: The paper got accepted o
Python supercharged for the fastai library
Welcome to fastcore Python goodies to make your coding faster, easier, and more maintainable Python is a powerful, dynamic language. Rather than bake
Parallel TTS web demo based on Flask + Vue (Vuetify).
Parallel TTS web demo based on Flask + Vue (Vuetify).
This is a project of data parallel that running on NLP tasks.
This is a project of data parallel that running on NLP tasks.
Massively parallel Monte Carlo diffusion MR simulator written in Python.
Disimpy Disimpy is a Python package for generating simulated diffusion-weighted MR signals that can be useful in the development and validation of dat
Chatbot in 200 lines of code using TensorLayer
Seq2Seq Chatbot This is a 200 lines implementation of Twitter/Cornell-Movie Chatbot, please read the following references before you read the code: Pr
Adaptive: parallel active learning of mathematical functions
adaptive Adaptive: parallel active learning of mathematical functions. adaptive is an open-source Python library designed to make adaptive parallel fu
Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration
Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration This is the official repository for the EMNLP 2021 long pa
DANeS is an open-source E-newspaper dataset by collaboration between DATASET JSC (dataset.vn) and AIV Group (aivgroup.vn)
DANeS - Open-source E-newspaper dataset Source: Technology vector created by macrovector - www.freepik.com. DANeS is an open-source E-newspaper datase
Decoding the Protein-ligand Interactions Using Parallel Graph Neural Networks
Decoding the Protein-ligand Interactions Using Parallel Graph Neural Networks Requirements python 0.10+ rdkit 2020.03.3.0 biopython 1.78 openbabel 2.4
RL algorithm PPO and IRL algorithm AIRL written with Tensorflow.
RL algorithm PPO and IRL algorithm AIRL written with Tensorflow. They have a parallel sampling feature in order to increase computation speed (especially in high-performance computing (HPC)).
Parallel Latent Tree-Induction for Faster Sequence Encoding
FastTrees This repository contains the experimental code supporting the FastTrees paper by Bill Pung. Software Requirements Python 3.6, NLTK and PyTor
An open-source Kazakh named entity recognition dataset (KazNERD), annotation guidelines, and baseline NER models.
Kazakh Named Entity Recognition This repository contains an open-source Kazakh named entity recognition dataset (KazNERD), named entity annotation gui
A simple and efficient tool to parallelize Pandas operations on all available CPUs
Pandaral·lel Without parallelization With parallelization Installation $ pip install pandarallel [--upgrade] [--user] Requirements On Windows, Pandara
AWS documentation corpus for zero-shot open-book question answering.
aws-documentation We present the AWS documentation corpus, an open-book QA dataset, which contains 25,175 documents along with 100 matched questions a
Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes
Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized C
An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.
An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models. Hyperactive: is very easy to lear
Python implementation of an automatic parallel parking system in a virtual environment, including path planning, path tracking, and parallel parking
Automatic Parallel Parking: Path Planning, Path Tracking & Control This repository contains a python implementation of an automatic parallel parking s
Backtesting an algorithmic trading strategy using Machine Learning and Sentiment Analysis.
Trading Tesla with Machine Learning and Sentiment Analysis An interactive program to train a Random Forest Classifier to predict Tesla daily prices us
Taichi is a parallel programming language for high-performance numerical computations.
Taichi is a parallel programming language for high-performance numerical computations.
Video Corpus Moment Retrieval with Contrastive Learning (SIGIR 2021)
Video Corpus Moment Retrieval with Contrastive Learning PyTorch implementation for the paper "Video Corpus Moment Retrieval with Contrastive Learning"
TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL, and utterance id
TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL
A Python module for parallel optimization of expensive black-box functions
blackbox: A Python module for parallel optimization of expensive black-box functions What is this? A minimalistic and easy-to-use Python module that e
A hyperparameter optimization framework
Optuna: A hyperparameter optimization framework Website | Docs | Install Guide | Tutorial Optuna is an automatic hyperparameter optimization software
SimplEx - Explaining Latent Representations with a Corpus of Examples
SimplEx - Explaining Latent Representations with a Corpus of Examples Code Author: Jonathan Crabbé ([email protected]) This repository contains the imp
Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training
ColossalAI An integrated large-scale model training system with efficient parallelization techniques. arXiv: Colossal-AI: A Unified Deep Learning Syst
Resilient Adaptive Parallel sImulator for griD (rapid)
Rapid is an open-source software library that implements a novel “parallel-in-time” (Parareal) algorithm and semi-analytical solutions for co-simulation of integrated transmission and distribution systems.
A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python :rocket:
A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python :rocket:
Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training
ColossalAI An integrated large-scale model training system with efficient parallelization techniques Installation PyPI pip install colossalai Install
PASTRIE: A Corpus of Prepositions Annotated with Supersense Tags in Reddit International English
PASTRIE Official release of the corpus described in the paper: Michael Kranzlein, Emma Manning, Siyao Peng, Shira Wein, Aryaman Arora, and Nathan Schn
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Cross-Speaker-Emotion-Transfer - PyTorch Implementation PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Conditio
Run functions in parallel easily, with their results typed correctly!
typesafe_parmap pip install pip install typesafe-parmap Run functions in parallel safely with typesafe parmap! GitHub: https://github.com/thejaminato
DaCeML - Machine learning powered by data-centric parallel programming.
Data-centric machine learning powered by DaCe
impy is an all-in-one image analysis library, equipped with parallel processing, GPU support, GUI based tools and so on.
impy is All You Need in Image Analysis impy is an all-in-one image analysis library, equipped with parallel processing, GPU support, GUI based tools a
squid-dl is a massively parallel yt-dlp-based YouTube downloader.
squid-dl squid-dl is a massively parallel yt-dlp-based YouTube downloader. Installation Run the setup.py, which will install squid-dl and its two depe
Code and data for ACL2021 paper Cross-Lingual Abstractive Summarization with Limited Parallel Resources.
Multi-Task Framework for Cross-Lingual Abstractive Summarization (MCLAS) The code for ACL2021 paper Cross-Lingual Abstractive Summarization with Limit
A Python framework for developing parallelized Computational Fluid Dynamics software to solve the hyperbolic 2D Euler equations on distributed, multi-block structured grids.
pyHype: Computational Fluid Dynamics in Python pyHype is a Python framework for developing parallelized Computational Fluid Dynamics software to solve
CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss
CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss This is official implement of "
A cross-document event and entity coreference resolution system, trained and evaluated on the ECB+ corpus.
A Comprehensive Comparison of Word Embeddings in Event & Entity Coreference Resolution. Introduction This repo contains experimental code derived from
Open-Source Python CLI package for copying DynamoDB tables and items in parallel batch processing + query natural & Global Secondary Indexes (GSIs)
Python Command-Line Interface Package to copy Dynamodb data in parallel batch processing + query natural & Global Secondary Indexes (GSIs).
[2021 MultiMedia] CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval
CONQUER: Contexutal Query-aware Ranking for Video Corpus Moment Retreival PyTorch implementation of CONQUER: Contexutal Query-aware Ranking for Video
[ICCV 2021] Official Tensorflow Implementation for "Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions"
KPAC: Kernel-Sharing Parallel Atrous Convolutional block This repository contains the official Tensorflow implementation of the following paper: Singl
NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking
pretrain4ir_tutorial NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking 用作NLPIR实验室, Pre-training
SinglepassTextCluster, an TextCluster tools based on Singlepass cluster algorithm that use tfidf vector and doc2vec,which can be used for individual real-time corpus cluster task。基于single-pass算法思想的自动文本聚类小组件,内置tfidf和doc2vec两种文本向量方法,可自动输出聚类数目、类簇文档集合和簇类大小,用于自有实时数据的聚类任务。
项目的背景 SinglepassTextCluster, an TextCluster tool based on Singlepass cluster algorithm that use tfidf vector and doc2vec,which can be used for individ
tox-gh is a tox plugin which helps running tox on GitHub Actions with multiple different Python versions on multiple workers in parallel
tox-gh is a tox plugin which helps running tox on GitHub Actions with multiple different Python versions on multiple workers in parallel. This project is inspired by tox-travis.
A scalable implementation of WobblyStitcher for 3D microscopy images
WobblyStitcher Introduction A scalable implementation of WobblyStitcher Dependencies $ python -m pip install numpy scikit-image Visualization ImageJ
Official code of our work, AVATAR: A Parallel Corpus for Java-Python Program Translation.
AVATAR Official code of our work, AVATAR: A Parallel Corpus for Java-Python Program Translation. AVATAR stands for jAVA-pyThon progrAm tRanslation. AV
Ray-based parallel data preprocessing for NLP and ML.
Wrangl Ray-based parallel data preprocessing for NLP and ML. pip install wrangl # for latest pip install git+https://github.com/vzhong/wrangl See exa
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion Yinghao Aaron Li, Ali Zare, Nima Mesgarani We pres
In-place Parallel Super Scalar Samplesort (IPS⁴o)
In-place Parallel Super Scalar Samplesort (IPS⁴o) This is the implementation of the algorithm IPS⁴o presented in the paper Engineering In-place (Share
This repository contains the code for "Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP".
Self-Diagnosis and Self-Debiasing This repository contains the source code for Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based
Code for the paper "JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design"
JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design This repository contains code for the paper: JA