3094 Python Swahili-nlp-data Libraries

Projects that implement various aspects of Data Engineering.

DATAWAREHOUSE ON AWS The purpose of this project is to build a data warehouse to accommodate activity data from active users of a music streaming applicatio

2 Oct 14, 2021

Tribuo - A Java machine learning library

Tribuo - A Java prediction library (v4.1) Tribuo is a machine learning library in Java that provides multi-class classification, regression, clusterin

1.1k Dec 28, 2022

Persian Kaldi profile for Rhasspy built from open speech data

Persian Kaldi Profile A Rhasspy profile for Persian (fa). Installation Get started by first installing Vosk: # Create virtual environment python3 -m v

12 Aug 8, 2022

Fancy data functions that will make your life as a data scientist easier.

WhiteBox Utilities Toolkit: Tools to make your life easier Fancy data functions that will make your life as a data scientist easier. Installing To ins

3 Oct 3, 2022

Exploratory Data Analysis for Employee Retention Dataset

Exploratory Data Analysis for Employee Retention Dataset Employee turn-over is a very costly problem for companies. The cost of replacing an employee

2 Oct 1, 2021

Train 🤗-transformers models with Poutyne.

poutyne-transformers Train 🤗-transformers models with Poutyne. Installation pip install poutyne-transformers Example import torch from transformers

2 Dec 18, 2022

Graphsignal is a machine learning model monitoring platform.

Graphsignal is a machine learning model monitoring platform. It helps ML engineers, MLOps teams and data scientists to quickly address issues with data and models as well as proactively analyze model performance and availability.

143 Dec 5, 2022

Data Efficient Decision Making

197 Jan 6, 2023

Google's Meena transformer chatbot implementation

Here's my attempt at recreating Meena, a state-of-the-art chatbot developed by Google Research and described in the paper Towards a Human-like Open-Domain Chatbot.

94 Dec 25, 2022

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English ⚖️ 🏆 🧑‍🎓 👩‍⚖️ Dataset Summary Inspired by the recent widespread use of th

95 Dec 8, 2022

Code for CodeT5: a new code-aware pre-trained encoder-decoder model.

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation This is the official PyTorch implementation

564 Jan 8, 2023

A repository for learning Python programming in Indonesian

Python This repository contains a collection of various examples of data structures, algorithms, and mathematical computations implemented using

111 Dec 19, 2022

Azua - build AI algorithms to aid efficient decision-making with minimum data requirements.

Project Azua 0. Overview Many modern AI algorithms are known to be data-hungry, whereas human decision-making is much more efficient. The human can re

197 Jan 6, 2023

Eland is a Python Elasticsearch client for exploring and analyzing data in Elasticsearch with a familiar Pandas-compatible API.

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

463 Dec 30, 2022

Angora is a mutation-based fuzzer. The main goal of Angora is to increase branch coverage by solving path constraints without symbolic execution.

Angora Angora is a mutation-based, coverage-guided fuzzer. The main goal of Angora is to increase branch coverage by solving path constraints without s

833 Jan 7, 2023

A short visualisation script for PyVideo data

PyVideo Speakers A CLI that visualises repeat speakers from events listed in https://github.com/pyvideo/data Not terribly efficient, but you know. Ins

3 Nov 24, 2021

A Python library for choreographing your machine learning research.

270 Jan 6, 2023

Tablexplore is an application for data analysis and plotting built in Python using the PySide2/Qt toolkit.

81 Dec 26, 2022

wyscoutapi is an extremely basic API client for the Wyscout API (v2 & v3) for Python

wyscoutapi wyscoutapi is an extremely basic API client for the Wyscout API (v2 & v3). Usage Install with pip install wyscoutapi. To connect to the Wys

11 Nov 22, 2022

Signature remover is an NLP-based solution which removes email signatures from the rest of the text.

Signature Remover Signature remover is an NLP-based solution which removes email signatures from the rest of the text. It helps to enhance data conten

8 Jan 6, 2023
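The core task here, cutting an email body at the point where the signature starts, can be illustrated with a simple rule-based heuristic. This is a swapped-in technique, not the project's NLP model: the SIGN_OFF patterns and the strip_signature function are hypothetical and only catch common closings such as "Best regards," or the "-- " delimiter line.

```python
import re

# Hypothetical sign-off patterns; the project itself uses an NLP model, not a fixed list.
SIGN_OFF = re.compile(
    r"^\s*(--\s*|best( regards)?,?|regards,?|thanks,?|thank you,?|cheers,?|sent from my \w+)\s*$",
    re.IGNORECASE,
)


def strip_signature(email_body):
    """Drop everything from the first line that looks like a sign-off."""
    lines = email_body.splitlines()
    for i, line in enumerate(lines):
        if SIGN_OFF.match(line):
            # Keep only the lines before the sign-off, minus trailing blank lines.
            return "\n".join(lines[:i]).rstrip()
    return email_body.rstrip()


email = "Hi Sam,\nThe report is attached.\n\nBest regards,\nJane Doe\nACME Corp"
print(strip_signature(email))
```

A learned model earns its keep exactly where fixed patterns like these fail, for example signatures that have no sign-off line at all.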

Tickergram is a Telegram bot to look up quotes, charts, general market sentiment and more.

25 Nov 26, 2022

Write maintainable, production-ready pipelines using Jupyter or your favorite text editor. Develop locally, deploy to the cloud. ☁️

2.9k Jan 6, 2023

Extract data from ThousandEyes REST API and visualize it on your customized Grafana Dashboard.

ThousandEyes Grafana Dashboard Extract data from the ThousandEyes REST API and visualize it on your customized Grafana Dashboard. Deploy Grafana, Infl

16 Nov 26, 2022

Efficient Python Tricks and Tools for Data Scientists

Why efficient Python? Because using Python more efficiently will make your code more readable and run more efficiently.

944 Dec 28, 2022

Training RNNs as Fast as CNNs

News SRU++, a new SRU variant, is released. [tech report] [blog] The experimental code and SRU++ implementation are available on the dev branch which

2.1k Jan 1, 2023

ULMFiT for Genomic Sequence Data

Genomic ULMFiT This is an implementation of ULMFiT for genomics classification using Pytorch and Fastai. The model architecture used is based on the A

276 Dec 12, 2022

This repository collects code for data structures and algorithms, plus LeetCode, HackerRank, etc. solutions in different languages

DSA-Code-Snippet This repository collects code for data structures and algorithms, plus LeetCode, HackerRank, etc. solutions in different languages Cont

3 Oct 22, 2021

This repository is a compilation of important Data Structures and Algorithms based on Python.

Python DSA 🐍 This repository is a compilation of important Data Structures and Algorithms based on Python. Please make separate folders for different

27 Oct 29, 2022

Vectorizers for a range of different data types

Tutte Institute for Mathematics and Computing

69 Dec 29, 2022

Beyond Paragraphs: NLP for Long Sequences

338 Dec 2, 2022

Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks

Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks, which modifies the input text with a textual template and directly uses PLMs to conduct pre-trained tasks. This library provides a standard, flexible and extensible framework to deploy the prompt-learning pipeline. OpenPrompt supports loading PLMs directly from huggingface transformers. In the future, we will also support PLMs implemented by other libraries.

2.3k Jan 8, 2023
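The template-plus-label-words idea described above can be shown without any model. In this minimal sketch, TEMPLATE, VERBALIZER and toy_mask_scores are all hypothetical: toy_mask_scores stands in for a pre-trained masked language model's predictions at the [MASK] slot, and nothing here uses OpenPrompt's own classes.

```python
# Hypothetical template and verbalizer (label -> label words) for sentiment.
TEMPLATE = "{text} It was [MASK]."
VERBALIZER = {
    "positive": ["great", "good"],
    "negative": ["terrible", "bad"],
}


def wrap(text, template=TEMPLATE):
    """Turn raw input into a cloze-style prompt with one [MASK] slot."""
    return template.format(text=text)


def classify(text, mask_word_scores):
    """Score each label by the average [MASK] score of its label words."""
    scores = mask_word_scores(wrap(text))
    label_scores = {
        label: sum(scores.get(word, 0.0) for word in words) / len(words)
        for label, words in VERBALIZER.items()
    }
    return max(label_scores, key=label_scores.get)


def toy_mask_scores(prompt):
    # Hypothetical stand-in for a masked LM's word scores at the [MASK] position.
    if "love" in prompt:
        return {"great": 0.6, "good": 0.3, "bad": 0.05, "terrible": 0.05}
    return {"great": 0.05, "good": 0.1, "bad": 0.4, "terrible": 0.45}


print(wrap("I love this movie."))
print(classify("I love this movie.", toy_mask_scores))
```

The point of the design is that the downstream task is rephrased as the model's own pre-training task (filling a mask), so no new classification head is needed.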

deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

63 Oct 17, 2022

Google and Stanford University released a new pre-trained model called ELECTRA

Google and Stanford University released a new pre-trained model called ELECTRA, which has a much more compact model size and relatively competitive performance compared to BERT and its variants. For further accelerating the research of the Chinese pre-trained model, the Joint Laboratory of HIT and iFLYTEK Research (HFL) has released the Chinese ELECTRA models based on the official code of ELECTRA. ELECTRA-small could reach similar or even higher scores on several NLP tasks with only 1/10 of the parameters of BERT and its variants.

1.2k Dec 30, 2022

Tools for collecting social media data around focal events

Social Media Focal Events The focalevents codebase provides tools for organizing data collected around focal events on social media. It is often diffi

80 Nov 28, 2022

A Chinese to English Neural Model Translation Project

ZH-EN NMT Chinese to English Neural Machine Translation This project is inspired by Stanford's CS224N NMT Project Dataset used in this project: News C

29 Nov 26, 2022

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.

730 Jan 9, 2023

Paradigm Shift in NLP - "Paradigm Shift in Natural Language Processing".

Paradigm Shift in NLP Welcome to the webpage for "Paradigm Shift in Natural Language Processing". Some resources of the paper are constantly maintaine

41 Dec 30, 2022

Elementary is an open-source data reliability framework for modern data teams. The first module of the framework is data lineage.

Data lineage made simple, reliable, and automated. Effortlessly track the flow of data, understand dependencies and analyze impact. Features Visualiza

898 Jan 9, 2023

[ICCV 2021] Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation

EPCDepth EPCDepth is a self-supervised monocular depth estimation model whose supervision comes from the other image in a stereo pair. Details ar

110 Dec 23, 2022

A curated (most recent) list of resources for Learning with Noisy Labels

321 Jan 9, 2023

PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application in real time.

PyNeuro PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application

45 Dec 30, 2022

This YOLOv5-based model detects people and different types of land vehicles and displays their density on a fitted map, according to their coordinates and detected labels.

This YOLOv5-based model detects people and different types of land vehicles and displays their density on a fitted map, according to their

8 May 22, 2022

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

878 Dec 30, 2022

AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications.

AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy machine learning and deep learning models on tabular data.

55 Dec 27, 2022

🔥🔥High-Performance Face Recognition Library on PaddlePaddle & PyTorch🔥🔥

face.evoLVe: High-Performance Face Recognition Library based on PaddlePaddle & PyTorch Evolve to be more comprehensive, effective and efficient for fa

3.1k Jan 2, 2023

A (PyTorch) imbalanced dataset sampler for oversampling low-frequency classes and undersampling high-frequency ones.

Imbalanced Dataset Sampler Introduction In many machine learning applications, we often come across datasets where some types of data may be seen more

2k Jan 8, 2023
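The idea behind such a sampler, drawing rare classes more often and common ones less often, can be sketched with the standard library alone. This minimal sketch assumes weights inversely proportional to class frequency; inverse_frequency_weights and balanced_sample are hypothetical helpers, not the library's API, and the real sampler plugs into a PyTorch DataLoader instead.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each sample by 1 / (size of its class), so every class sums to 1."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def balanced_sample(labels, num_samples, seed=0):
    """Draw sample indices with replacement; each class is equally likely per draw."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)


labels = ["cat"] * 90 + ["dog"] * 10
indices = balanced_sample(labels, num_samples=1000)
drawn = Counter(labels[i] for i in indices)
print(drawn)  # both classes should land near 500 draws
```

Because sampling is with replacement, the ten "dog" samples are repeated many times (oversampling) while most "cat" samples are seen less often (undersampling).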

Pytorch implementations of various Deep NLP models in cs-224n (Stanford Univ)

DeepNLP-models-Pytorch Pytorch implementations of various Deep NLP models in cs-224n (Stanford Univ: NLP with Deep Learning) This is not for Pytorch be

2.9k Dec 24, 2022

An IPython Notebook tutorial on deep learning for natural language processing, including structure prediction.

Table of Contents: Introduction to Torch's Tensor Library Computation Graphs and Automatic Differentiation Deep Learning Building Blocks: Affine maps,

1.8k Jan 4, 2023

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 200 universities.

D2L.ai: Interactive Deep Learning Book with Multi-Framework Code, Math, and Discussions Book website | STAT 157 Course at UC Berkeley | Latest version

16k Jan 3, 2023

Using / reproducing ACD from the paper "Hierarchical interpretations for neural network predictions" 🧠 (ICLR 2019)

Hierarchical neural-net interpretations (ACD) 🧠 Produces hierarchical interpretations for a single prediction made by a pytorch neural network. Offic

111 Jan 3, 2023

Implementation of linear CorEx and temporal CorEx.

Correlation Explanation Methods Official implementation of linear correlation explanation (linear CorEx) and temporal correlation explanation (T-CorEx

34 Nov 15, 2022

The source code and data of the paper "Instance-wise Graph-based Framework for Multivariate Time Series Forecasting".

IGMTF The source code and data of the paper "Instance-wise Graph-based Framework for Multivariate Time Series Forecasting". Requirements The framework

24 Dec 5, 2022

WRENCH: Weak supeRvision bENCHmark

🔧 What is it? Wrench is a benchmark platform containing diverse weak supervision tasks. It also provides a common and easy framework for development

176 Dec 28, 2022

Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021

16 Jul 16, 2022

Pro Football Reference Game Data Webscraper

Pro Football Reference Game Data Webscraper Code Copyright Yeetzsche This is a simple Pro Football Reference Webscraper that can either collect all ga

6 Dec 21, 2022

Lingvo is a framework for building neural networks in Tensorflow, particularly sequence models.

2.7k Jan 5, 2023

Tools that help data scientists and ML engineers train and deploy ML models.

Domino Research This repo contains projects under active development by the Domino R&D team. We build tools that help Data Scientists and ML engineers

73 Oct 17, 2022

Code and data from the paper BERT Got a Date: Introducing Transformers to Temporal Tagging

BERT Got a Date: Introducing Transformers to Temporal Tagging Satya Almasian*, Dennis Aumiller*, and Michael Gertz Heidelberg University Contact us vi

54 Dec 4, 2022

Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark environment.

pyspark-anonymizer Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark envir

6 Jun 30, 2022
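To make "masking data with dict rules" concrete, here is a plain-Python sketch over a single record. The rule schema (column, method, keep) and apply_rules are hypothetical and do not reproduce pyspark-anonymizer's own rule format; in a PySpark job each rule would instead become a column expression, for example pyspark.sql.functions.sha2 in place of hashlib for hashing.

```python
import hashlib

# Hypothetical rule schema (not the library's): one dict per column to protect.
RULES = [
    {"column": "email", "method": "hash"},
    {"column": "phone", "method": "mask_keep_last", "keep": 4},
    {"column": "name", "method": "drop"},
]


def apply_rules(record, rules):
    """Return a masked copy of record; the input dict is left unchanged."""
    out = dict(record)
    for rule in rules:
        column = rule["column"]
        if column not in out:
            continue  # rules for absent columns are ignored
        method = rule["method"]
        if method == "hash":
            # Irreversible but deterministic, so hashed values can still be joined on.
            out[column] = hashlib.sha256(str(out[column]).encode("utf-8")).hexdigest()
        elif method == "mask_keep_last":
            value = str(out[column])
            keep = rule["keep"]
            visible = value[-keep:] if keep > 0 else ""
            out[column] = "*" * (len(value) - len(visible)) + visible
        elif method == "drop":
            del out[column]
        else:
            raise ValueError(f"unknown masking method: {method}")
    return out


customer = {"name": "Ada", "email": "ada@example.com", "phone": "5551234567", "city": "London"}
print(apply_rules(customer, RULES))
```

Keeping the rules as data rather than code is what lets them be loaded from a JSON string at runtime.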

A pytorch implementation of the ACL 2019 paper "Simple and Effective Text Matching with Richer Alignment Features".

RE2 This is a pytorch implementation of the ACL 2019 paper "Simple and Effective Text Matching with Richer Alignment Features". The original Tensorflo

287 Dec 21, 2022

Universal Adversarial Triggers for Attacking and Analyzing NLP (EMNLP 2019)

Universal Adversarial Triggers for Attacking and Analyzing NLP This is the official code for the EMNLP 2019 paper, Universal Adversarial Triggers for

248 Dec 17, 2022

Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation

GPT2-Pytorch with Text-Generator Better Language Models and Their Implications Our model, called GPT-2 (a successor to GPT), was trained simply to pre

775 Jan 8, 2023

🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrained mo

77.2k Jan 2, 2023

PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset

PyTorch Large-Scale Language Model A Large-Scale PyTorch Language Model trained on the 1-Billion Word (LM1B) / (GBW) dataset Latest Results 39.98 Perp

114 Nov 4, 2022

Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

This is a fork of Fairseq(-py) with implementations of the following models: Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Se

490 Dec 15, 2022

Meli Data Challenge 2021 - First Place Solution

My solution for the Meli Data Challenge 2021

23 Mar 9, 2022

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

740 Dec 24, 2022

The PASS dataset: pretrained models and how to get the data - PASS: Pictures without humAns for Self-Supervised Pretraining

249 Dec 22, 2022

An esoteric data type built entirely of NaNs.

NaNsAreNumbers An esoteric data type built entirely of NaNs. Installation pip install nans_are_numbers Explanation A floating point number is just co

72 Jan 1, 2023

Labelling platform for text using distant supervision

With DataQA, you can label unstructured text documents using rule-based distant supervision.

245 Aug 5, 2022
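Rule-based distant supervision means writing cheap labelling rules and letting their votes stand in for hand labels. This is a minimal sketch of that general idea, not DataQA's interface: the three keyword rules, the "refund"/"shipping" labels and weak_label are hypothetical, and the rules abstain (return None) when they do not fire.

```python
import re
from collections import Counter

ABSTAIN = None


# Hypothetical labelling rules: each returns a label or ABSTAIN for one document.
def rule_refund_words(text):
    return "refund" if re.search(r"\b(refund|money back)\b", text, re.IGNORECASE) else ABSTAIN


def rule_shipping_words(text):
    pattern = r"\b(delivery|delivered|shipping|tracking)\b"
    return "shipping" if re.search(pattern, text, re.IGNORECASE) else ABSTAIN


def rule_returns(text):
    return "refund" if re.search(r"\breturn(ed|ing)?\b", text, re.IGNORECASE) else ABSTAIN


RULES = [rule_refund_words, rule_shipping_words, rule_returns]


def weak_label(text, rules=RULES):
    """Majority vote over the rules that fire; ABSTAIN if none do."""
    votes = [label for label in (rule(text) for rule in rules) if label is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


docs = ["I want my money back", "Where is my tracking number?", "Hello there"]
print([weak_label(d) for d in docs])
```

Documents that no rule covers stay unlabelled, which is where a human labeller (or a model trained on the weak labels) takes over.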

A PyTorch implementation of the Transformer model in "Attention is All You Need".

Attention is all you need: A Pytorch Implementation This is a PyTorch implementation of the Transformer model in "Attention is All You Need" (Ashish V

7.1k Jan 4, 2023
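The building block of that model is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The sketch below writes the formula with plain lists so each step is visible; it is not code from the repository, which uses PyTorch tensors, and a full Transformer adds multi-head projections and masking on top of it.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V over lists of row vectors."""
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs = []
    for q in queries:
        # One score per key: dot product with the query, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output row = attention-weighted sum of the value rows.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)])
    return outputs


Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(scaled_dot_product_attention(Q, K, V))
```

The query matches the first key more closely, so the output leans toward the first value row; dividing by sqrt(d_k) keeps the scores from growing with the vector dimension and saturating the softmax.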

Implementation of SqueezeNet in PyTorch; pretrained models on CIFAR-10 data to come

Pytorch Squeeznet Pytorch implementation of Squeezenet model as described in https://arxiv.org/abs/1602.07360 on cifar-10 Data. The definition of Sque

86 Oct 28, 2022

Recurrent Variational Autoencoder that generates sequential data implemented with pytorch

Pytorch Recurrent Variational Autoencoder Model: This is the implementation of Samuel Bowman's Generating Sentences from a Continuous Space with Kim's

347 Nov 14, 2022

Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx

Anchored CorEx: Hierarchical Topic Modeling with Minimal Domain Knowledge Correlation Explanation (CorEx) is a topic model that yields rich topics tha

592 Dec 18, 2022

A Domain Specific Language (DSL) for building language patterns. These can later be compiled into spaCy patterns, pure regex, or any other format

RITA DSL This is a language, loosely based on the Apache UIMA RUTA language, focused on writing manual language rules, which compiles into either spaCy co

60 Sep 26, 2022

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

2.9k Dec 31, 2022

Accurately generate all possible forms of an English word, e.g. "election" → "elect", "electoral", "electorate", etc.

Accurately generate all possible forms of an English word Word forms can accurately generate all possible forms of an English word. It can conjugate v

570 Dec 31, 2022

The tool to make NLP datasets ready to use

chazutsu is the dataset downloader for NLP. import chazutsu r = chazutsu.data

243 Dec 29, 2022

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP

TextAttack 🐙 Generating adversarial examples for NLP models. About TextAttack

2.2k Jan 3, 2023

Faker is a Python package that generates fake data for you.

Faker is a Python package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill-in yo

15.2k Jan 1, 2023

Model Agnostic Confidence Estimator (MACEST) - A Python library for calibrating Machine Learning models' confidence scores

95 Dec 28, 2022

PySpark Cheat Sheet - learn PySpark and develop apps faster

This cheat sheet will help you learn PySpark and write PySpark apps faster. Everything in here is fully functional PySpark code you can run or adapt to your programs.

168 Jan 1, 2023

A2T: Towards Improving Adversarial Training of NLP Models (EMNLP 2021 Findings)

A2T: Towards Improving Adversarial Training of NLP Models This is the source code for the EMNLP 2021 (Findings) paper "Towards Improving Adversarial T

17 Oct 15, 2022

💛 Code and Dataset for our EMNLP 2021 paper: "Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes"

Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes Official PyTorch implementation and EmoCause evaluatio

50 Dec 21, 2022

TorchGeo is a PyTorch domain library, similar to torchvision, that provides datasets, transforms, samplers, and pre-trained models specific to geospatial data.

1.3k Dec 30, 2022

The project is investigating methods to extract human-marked data from document forms such as surveys and tests.

The project is investigating methods to extract human-marked data from document forms such as surveys and tests. The methods can read questions and multiple-choice exam papers, and grade them.

5 Mar 27, 2022

Python reader for Linked Data in HDF5 files

Linked Data are becoming more popular for user-created metadata in HDF5 files.

8 May 17, 2022

Natural Language Processing with transformers

We want to create a repo to illustrate the usage of transformers in Chinese.

763 Dec 27, 2022

Django package to log request values such as device, IP address, user CPU time, system CPU time, number of queries, SQL time, number of cache calls, and missing and set data cache calls for a particular URL, with a basic UI.

django-web-profiler's documentation: Introduction: django-web-profiler is a Django profiling tool which logs, stores debug toolbar statistics and also

77 Oct 29, 2022

Data from popular CS:GO website hltv.org

Welcome to hltv-data 👋 🎮 Data from popular CS:GO website hltv.org Install pip install hltv-data Usage The public methods can be reached using HLTVCl

28 Dec 23, 2022

Obsidian tools - a Python package for analysing an Obsidian.md vault

obsidiantools is a Python package for getting structured metadata about your Obsidian.md notes and analysing your vault.

153 Jan 4, 2023

A demo of a data science project using Kedro

iris Overview This is your new Kedro project, which was generated using Kedro 0.17.4. Take a look at the Kedro documentation to get started. Rules and

14 Oct 14, 2022

IndoBERTweet is the first large-scale pretrained model for Indonesian Twitter. Published at EMNLP 2021 (main conference)

IndoBERTweet 🐦 🇮🇩 1. Paper Fajri Koto, Jey Han Lau, and Timothy Baldwin. IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effe

40 Nov 30, 2022

Code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

28 Nov 10, 2022

django-dashing is a customisable, modular dashboard application framework for Django to visualize interesting data about your project. Inspired by the Dashing dashboard framework

django-dashing django-dashing is a customisable, modular dashboard application framework for Django to visualize interesting data about your project.

703 Dec 22, 2022
