3093 Repositories
Python nlp-data Libraries
As we all know the BGMI Loot Crate comes with so many resources for the gamers, this ML Crate will be the hub of various ML projects which will be the resources for the ML enthusiasts! Open Source Program: SWOC 2021 and JWOC 2022.
Machine Learning Loot Crate 💻 🧰 🔴 Welcome contributors! As we all know the BGMI Loot Crate comes with so many resources for the gamers, this ML Cra
Learn Data Science with focus on adding value with the most efficient tech stack.
DataScienceWithPython Get started with Data Science with Python An engaging journey to become a Data Scientist with Python TL;DR Download all Jupyter
Package towards building Explainable Forecasting and Nowcasting Models with State-of-the-art Deep Neural Networks and Dynamic Factor Model on Time Series data sets with single line of code. Also, provides utilify facility for time-series signal similarities matching, and removing noise from timeseries signals.
DeepXF: Explainable Forecasting and Nowcasting with State-of-the-art Deep Neural Networks and Dynamic Factor Model Also, verify TS signal similarities
An end-to-end framework for mixed-integer optimization with data-driven learned constraints.
OptiCL OptiCL is an end-to-end framework for mixed-integer optimization (MIO) with data-driven learned constraints. We address a problem setting in wh
Data Inspector is an open-source python library that brings 15++ types of different functions to make EDA, data cleaning easier.
Data Inspector Data Inspector is an open-source python library that brings 15 types of different functions to make EDA, data cleaning easier. Author:
🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.
In recent years, the dense retrievers based on pre-trained language models have achieved remarkable progress. To facilitate more developers using cutt
SciFive: a text-text transformer model for biomedical literature
SciFive SciFive provided a Text-Text framework for biomedical language and natural language in NLP. Under the T5's framework and desrbibed in the pape
Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)
anlp21 Course materials for "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley) Syllabus: http://people.ischool.berkeley.edu/~dba
Multilingual Emotion classification using BERT (fine-tuning). Published at the WASSA workshop (ACL2022).
XLM-EMO: Multilingual Emotion Prediction in Social Media Text Abstract Detecting emotion in text allows social and computational scientists to study h
Official code for "Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes", CVPR2022
[CVPR 2022] Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes Dongkwon Jin, Wonhui Park, Seong-Gyun Jeong, Heeyeon Kwon, and Cha
Official Implementation of DE-DETR and DELA-DETR in "Towards Data-Efficient Detection Transformers"
DE-DETRs By Wen Wang, Jing Zhang, Yang Cao, Yongliang Shen, and Dacheng Tao This repository is an official implementation of DE-DETR and DELA-DETR in
StocksMA is a package to facilitate access to financial and economic data of Moroccan stocks.
Creating easier access to the Moroccan stock market data What is StocksMA ? StocksMA is a package to facilitate access to financial and economic data
ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost
Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost LOVE is accpeted by ACL22 main conference as a long pape
Official Implementation of DE-CondDETR and DELA-CondDETR in "Towards Data-Efficient Detection Transformers"
DE-DETRs By Wen Wang, Jing Zhang, Yang Cao, Yongliang Shen, and Dacheng Tao This repository is an official implementation of DE-CondDETR and DELA-Cond
Commonality in Natural Images Rescues GANs: Pretraining GANs with Generic and Privacy-free Synthetic Data - Official PyTorch Implementation (CVPR 2022)
Commonality in Natural Images Rescues GANs: Pretraining GANs with Generic and Privacy-free Synthetic Data (CVPR 2022) Potentials of primitive shapes f
FAST Aiming at the problems of cumbersome steps and slow download speed of GNSS data
FAST Aiming at the problems of cumbersome steps and slow download speed of GNSS data, a relatively complete set of integrated multi-source data download terminal software fast is developed. The software contains most of the data sources required in the process of GNSS scientific research and learning. The way of parallel download greatly improves the efficiency of download.
Code and data for ImageCoDe, a contextual vison-and-language benchmark
ImageCoDe This repository contains code and data for ImageCoDe: Image Retrieval from Contextual Descriptions. Data All collected descriptions for the
Implementation of some unbalanced loss like focal_loss, dice_loss, DSC Loss, GHM Loss et.al
Implementation of some unbalanced loss for NLP task like focal_loss, dice_loss, DSC Loss, GHM Loss et.al Summary Here is a loss implementation reposit
Pull sensitive data from users on windows including discord tokens and chrome data.
⭐ For a 🍪 Pegasus Pull sensitive data from users on windows including discord tokens and chrome data. Features 🟩 Discord tokens 🟩 Geolocation data
level2-data-annotation_cv-level2-cv-15 created by GitHub Classroom
[AI Tech 3기 Level2 P Stage] 글자 검출 대회 팀원 소개 김규리_T3016 박정현_T3094 석진혁_T3109 손정균_T3111 이현진_T3174 임종현_T3182 Overview OCR (Optimal Character Recognition) 기술
Code for the Findings of NAACL 2022(Long Paper): AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks
AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks arXiv link: upcoming To be published in Findings of NA
dbt (data build tool) adapter for Oracle Autonomous Database
dbt-oracle version 1.0.0 dbt (data build tool) adapter for the Oracle database. dbt "adapters" are responsible for adapting dbt's functionality to a g
DeepGNN is a framework for training machine learning models on large scale graph data.
DeepGNN Overview DeepGNN is a framework for training machine learning models on large scale graph data. DeepGNN contains all the necessary features in
Fluxos de captura e subida de dados no datalake da Prefeitura do Rio de Janeiro.
Pipelines Este repositório contém fluxos de captura e subida de dados no datalake da Prefeitura do Rio de Janeiro. O repositório é gerido pelo Escritó
Visualization of COVID-19 Omicron wave data in Seoul, Osaka, Tokyo, Hong Kong and Shanghai. 首尔、大阪、东京、香港、上海由新冠病毒 Omicron 变异株引起的本轮疫情数据可视化分析。
COVID-19 in East Asian Megacities This repository holds original Python code for processing and visualization COVID-19 data in East Asian megacities a
Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration 🚃
This repository provides a library for efficient training of masked language models (MLM), built with fairseq. We fork fairseq to give researchers mor
Addon and nodes for working with structural biology and molecular data in Blender.
Molecular Nodes 🧬 🔬 💻 Buy Me a Coffee to Keep Development Going! Join a Community of Blender SciVis People! What is Molecular Nodes? Molecular Node
Helping data scientists better understand their datasets and models in text classification. With love from ServiceNow.
Azimuth, an open-source dataset and error analysis tool for text classification, with love from ServiceNow. Overview Azimuth is an open source applica
This library is helpful when creating accounts, it has everything you need for this
AccountGeneratorHelper Library to facilitate accounts generation. Unofficial API for temp email services. Receive SMS from free services. Parsing and
A Flask Sentiment Analysis API, with visual implementation
The Sentiment Analysis Api was created using python flask module,it allows users to parse a text or sentence throught the (?text) arguement, then view the sentiment analysis of that sentence. It can be implementable into a web application.
It connects to Telegram's API. It generates JSON files containing channel's data, including channel's information and posts.
It connects to Telegram's API. It generates JSON files containing channel's data, including channel's information and posts. You can search for a specific channel, or a set of channels provided in a text file (one channel per line.)
Adansons Base is a data management tool that organizes metadata of unstructured data and creates and organizes datasets.
Adansons Base is a data management tool that organizes metadata of unstructured data and creates and organizes datasets. It makes dataset creation more effective and helps find essential insights from training results and improves AI performance.
Data visualization app for H&M competition in kaggle
handm_data_visualize_app Data visualization app by streamlit for H&M competition in kaggle. competition page: https://www.kaggle.com/competitions/h-an
Snakemake worflow to process and filter long read data from Oxford Nanopore Technologies.
Nanopore-Workflow Snakemake workflow to process and filter long read data from Oxford Nanopore Technologies. It is designed to compare whole human gen
This repository contains the data and code for the paper "Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Process Priors" (SPNLP@ACL2022)
GP-VAE This repository provides datasets and code for preprocessing, training and testing models for the paper: Diverse Text Generation via Variationa
Text classification is one of the popular tasks in NLP that allows a program to classify free-text documents based on pre-defined classes.
Deep-Learning-for-Text-Document-Classification Text classification is one of the popular tasks in NLP that allows a program to classify free-text docu
Official repository of the paper Privacy-friendly Synthetic Data for the Development of Face Morphing Attack Detectors
SMDD-Synthetic-Face-Morphing-Attack-Detection-Development-dataset Official repository of the paper Privacy-friendly Synthetic Data for the Development
scAR (single-cell Ambient Remover) is a package for data denoising in single-cell omics.
scAR scAR (single cell Ambient Remover) is a package for denoising multiple single cell omics data. It can be used for multiple tasks, such as, sgRNA
My implementation of Safaricom Machine Learning Codility test. The code has bugs, logical I guess I made errors and any correction will be appreciated.
Safaricom_Codility Machine Learning 2022 The test entails two questions. Question 1 was on Machine Learning. Question 2 was on SQL I ran out of time.
This is the data scrapped of all the pitches made up potential startup's to established bussiness tycoons of India with all the details of Investments made, equity share, Name of investor etc.
SharkTankInvestor This is the data scrapped of all the pitches made up potential startup's to established bussiness tycoons of India with all the deta
Train CNNs for the fruits360 data set in NTOU CS「Machine Vision」class.
CNNs fruits360 Train CNNs for the fruits360 data set in NTOU CS「Machine Vision」class. CNN on a pretrained model Build a CNN on a pretrained model, Res
This is a Text Data Analysis Project Involving (YouTube Case Study).
Text_Data_Analysis This is a Text Data Analysis Project Involving (YouTube Case Study). Problem Statement = Sentiment Analysis. Package1: There are m
Under the hood working of transformers, fine-tuning GPT-3 models, DeBERTa, vision models, and the start of Metaverse, using a variety of NLP platforms: Hugging Face, OpenAI API, Trax, and AllenNLP
Transformers-for-NLP-2nd-Edition @copyright 2022, Packt Publishing, Denis Rothman Contact me for any question you have on LinkedIn Get the book on Ama
Implementaion of our ACL 2022 paper Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation
Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation This is the implementaion of our paper: Bridging the
DataAnalysis: Some data analysis projects in charles_pikachu
DataAnalysis DataAnalysis: Some data analysis projects in charles_pikachu You can star this repository to keep track of the project if it's helpful fo
An algorithmic trading bot that learns and adapts to new data and evolving markets using Financial Python Programming and Machine Learning.
ALgorithmic_Trading_with_ML An algorithmic trading bot that learns and adapts to new data and evolving markets using Financial Python Programming and
Semantic Data Management - Property Graphs 📈
SDM - Lab 1 @ UPC 👨🏻💻 Table of contents Introduction Property Graph Dataset 1. Introduction This repo is all about what we have done in SDM lab 1
Code for the Open Data Day 2022 publicbodies.org Nepal data scraping activities.
Open Data Day Publicbodies.org Nepal We've gathered on Saturday, 5th March 2022 with Open Knowledge Nepal in order to try and automate the collection
Arcpy Tool developed for ArcMap 10.x that checks DVOF points against TDS data and creates an output feature class as well as a check database.
DVOF_check_tool Arcpy Tool developed for ArcMap 10.x that checks DVOF points against TDS data and creates an output feature class as well as a check d
Scrapping malaysianpaygap & Extracting data from the Instagram posts
Scrapping malaysianpaygap & Extracting data from the posts Recently @malaysianpaygap has gotten quite famous as a platform that enables workers throug
This is a general repo that helps you develop fast/effective NLP classifiers using Huggingface
NLP Classifier Introduction This project trains a bert model on any NLP classifcation model. And uses the model in make predictions on new data using
A simple Streamlit App to classify swahili news into different categories.
Swahili News Classifier Streamlit App A simple app to classify swahili news into different categories. Installation Install all streamlit requirements
Linking data between GBIF, Biodiverse, and Open Tree of Life
GBIF-biodiverse-OpenTree Linking data between GBIF, Biodiverse, and Open Tree of Life The python scripts will rely on opentree and Dendropy. To set up
A Python module to encrypt and decrypt data with AES-128 CFB mode.
cryptocfb A Python module to encrypt and decrypt data with AES-128 CFB mode. This module supports 8/64/128-bit CFB mode. It can encrypt and decrypt la
LeetComp - Background tasks powering the static content at LeetComp
LeetComp Analysing compensations mentioned on the Leetcode forums (https://kuuts
Data Structures and Algorithms Python - Practice data structures and algorithms in python with few small projects
Data Structures and Algorithms All the essential resources and template code nee
WebScraping - Scrapes Job website for python developer jobs and exports the data to a csv file
WebScraping Web scraping Pyton program that scrapes Job website for python devel
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
ZEROGEN This repository contains the code for our paper “ZeroGen: Efficient Zero
Hl classification bc - A Network-Based High-Level Data Classification Algorithm Using Betweenness Centrality
A Network-Based High-Level Data Classification Algorithm Using Betweenness Centr
Rootski - Full codebase for rootski.io (without the data)
📣 Welcome to the Rootski codebase! This is the codebase for the application run
Transformers-regression - Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates
Regression Free Model Update Code for the paper: Regression Bugs Are In Your Mod
SimCTG - A Contrastive Framework for Neural Text Generation
A Contrastive Framework for Neural Text Generation Authors: Yixuan Su, Tian Lan,
Job-Recommend-Competition - Vectorwise Interpretable Attentions for Multimodal Tabular Data
SiD - Simple Deep Model Vectorwise Interpretable Attentions for Multimodal Tabul
Comp445 project - Data Communications & Computer Networks
COMP-445 Data Communications & Computer Networks Change Python version in Conda
Faune proche - Retrieval of Faune-France data near a google maps location
faune_proche Récupération des données de Faune-France près d'un lieu google maps
LotteryBuyPredictionWebApp - Lottery Purchase Prediction Model
Lottery Purchase Prediction Model Objective and Goal Predict the lottery type th
Text-Summarization-using-NLP - Text Summarization using NLP to fetch BBC News Article and summarize its text and also it includes custom article Summarization
Text-Summarization-using-NLP Text Summarization using NLP to fetch BBC News Arti
Gesture recognition on Event Data
Event based Gesture Recognition Gesture recognition on Event Data usually involv
VHub - An API that permits uploading of vulnerability datasets and return of the serialized data
VHub - An API that permits uploading of vulnerability datasets and return of the serialized data
Explore-bikeshare-data - GitHub project as part of the Programming for Data Science with Python Nanodegree from Udacity
Date created February 10, 2022 Project Title Explore US Bikeshare Data Descripti
Using PyTorch Perform intent classification using three different models to see which one is better for this task
Using PyTorch Perform intent classification using three different models to see which one is better for this task
OpenDelta - An Open-Source Framework for Paramter Efficient Tuning.
OpenDelta is a toolkit for parameter efficient methods (we dub it as delta tuning), by which users could flexibly assign (or add) a small amount parameters to update while keeping the most paramters frozen. By using OpenDelta, users could easily implement prefix-tuning, adapters, Lora, or any other types of delta tuning with preferred PTMs.
An arduino/ESP project that can play back G-Force data previously recorded
An arduino/ESP project that can play back G-Force data previously recorded
The LaTeX and Python code for generating the paper, experiments' results and visualizations reported in each paper is available (whenever possible) in the paper's directory
This repository contains the software implementation of most algorithms used or developed in my research. The LaTeX and Python code for generating the
Skforecast is a python library that eases using scikit-learn regressors as multi-step forecasters
Skforecast is a python library that eases using scikit-learn regressors as multi-step forecasters. It also works with any regressor compatible with the scikit-learn API (pipelines, CatBoost, LightGBM, XGBoost, Ranger...).
Code and Datasets from the paper "Self-supervised contrastive learning for volcanic unrest detection from InSAR data"
Code and Datasets from the paper "Self-supervised contrastive learning for volcanic unrest detection from InSAR data" You can download the pretrained
CZU-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and 10 wearable inertial sensors
CZU-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and 10 wearable inertial sensors In order to facilitate the res
VISNOTATE: An Opensource tool for Gaze-based Annotation of WSI Data
VISNOTATE: An Opensource tool for Gaze-based Annotation of WSI Data Introduction Requirements Installation and Setup Supported Hardware and Software R
Data science project for exploratory analysis on the kcse grades dataset (Kamilimu Data Science Track)
Kcse-Data-Analysis Data science project for exploratory analysis on the kcse grades dataset (Kamilimu Data Science Track) Findings The performance of
Vaex library for Big Data Analytics of an Airline dataset
Vaex-Big-Data-Analytics-for-Airline-data A Python notebook (ipynb) created in Jupyter Notebook, which utilizes the Vaex library for Big Data Analytics
Data science/Analysis Health Care Portfolio
Health-Care-DS-Projects Data Science/Analysis Health Care Portfolio Consists Of 3 Projects: Mexico Covid-19 project, analyze the patient medical histo
Automatic game data translator for RPGMaker-MV
RPGMaker-MV Translator 🕹️ 🎮 Use AI to translate all the dialogs and texts of your RPGMaker automatically. 👊 You worked hard to make your game, now
This repository contnains sample problems with test cases using Cormen-Lib
Cormen Lib Sample Problems Description This repository contnains sample problems with test cases using Cormen-Lib. These problems were made for the pu
Implementation of ToeplitzLDA for spatiotemporal stationary time series data.
Code for the ToeplitzLDA classifier proposed in here. The classifier conforms sklearn and can be used as a drop-in replacement for other LDA classifiers. For in-depth usage refer to the learning from label proportions (LLP) example or the example script.
LightGBM + Optuna: no brainer
AutoLGBM LightGBM + Optuna: no brainer auto train lightgbm directly from CSV files auto tune lightgbm using optuna auto serve best lightgbm model usin
An Optical Character Recognition system using Pytesseract/Extracting data from Blood Pressure Reports.
Optical_Character_Recognition An Optical Character Recognition system using Pytesseract/Extracting data from Blood Pressure Reports. As an IOT/Compute
This repository has a implementations of data augmentation for NLP for Japanese.
daaja This repository has a implementations of data augmentation for NLP for Japanese: EDA: Easy Data Augmentation Techniques for Boosting Performance
MoCap-Solver: A Neural Solver for Optical Motion Capture Data
MoCap-Solver is a data-driven-based robust marker denoising method, which takes raw mocap markers as input and outputs corresponding clean markers and skeleton motions.
Image-to-image regression with uncertainty quantification in PyTorch
Image-to-image regression with uncertainty quantification in PyTorch. Take any dataset and train a model to regress images to images with rigorous, distribution-free uncertainty quantification.
Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets
Crowd-Kit: Computational Quality Control for Crowdsourcing Documentation Crowd-Kit is a powerful Python library that implements commonly-used aggregat
Supplementary Data for Evolving Reinforcement Learning Algorithms
evolvingrl Supplementary Data for Evolving Reinforcement Learning Algorithms This dataset contains 1000 loss graphs from two experiments: 500 unique g
Course materials for Fall 2021 "CIS6930 Topics in Computing for Data Science" at New College of Florida
Fall 2021 CIS6930 Topics in Computing for Data Science This repository hosts course materials used for a 13-week course "CIS6930 Topics in Computing f
A repo for materials relating to the tutorial of CS-332 NLP
CS-332-NLP A repo for materials relating to the tutorial of CS-332 NLP Contents Tutorial 1: Introduction Corpus Regular expression Tokenization Tutori
Customizing Visual Styles in Plotly
Customizing Visual Styles in Plotly Code for a workshop originally developed for an Unconference session during the Outlier Conference hosted by Data
What are the best Systems? New Perspectives on NLP Benchmarking
What are the best Systems? New Perspectives on NLP Benchmarking In Machine Learning, a benchmark refers to an ensemble of datasets associated with one
Geospatial data-science analysis on reasons behind delay in Grab ride-share services
Grab x Pulis Detailed analysis done to investigate possible reasons for delay in Grab services for NUS Data Analytics Competition 2022, to be found in
metedraw is a project mainly for data visualization projects of Atmospheric Science, Marine Science, Environmental Science or other majors
It is mainly for data visualization projects of Atmospheric Science, Marine Science, Environmental Science or other majors.
OpenStats is a library built on top of streamlit that extracts data from the Github API and shows the main KPIs
Open Stats Discover and share the KPIs of your OpenSource project. OpenStats is a library built on top of streamlit that extracts data from the Github
A research of IT labor market based especially on hh.ru. Salaries, rate of technologies and etc.
hh_ru_research Проект реализован в учебных целях анализа рынка труда, в особенности по hh.ru Input data В качестве входных данных используются сериали
To build a regression model to predict the concrete compressive strength based on the different features in the training data.
Cement-Strength-Prediction Problem Statement To build a regression model to predict the concrete compressive strength based on the different features