3094 Python Swahili-nlp-data Libraries

Discord webhooks for alerting crypto currency price changes & historical data.

Crypto-Discord Discord Webhooks for alerting crypto currency price changes & historical data. Create virtual environment and install requirements. $ s

1 Sep 2, 2022

Driver Analysis with Factors and Forests: An Automated Data Science Tool using Python

Driver Analysis with Factors and Forests: An Automated Data Science Tool using Python 📊

2 May 26, 2022

Realtime data read and write without page refresh using Ajax in Django.

Realtime read-write with AJAX Hey,this is the basic implementation type of ajax realtime read write from the database. where you can insert or view re

3 Dec 13, 2022

In this workshop we will be exploring NLP state of the art transformers, with SOTA models like T5 and BERT, then build a model using HugginFace transformers framework.

Transformers are all you need In this workshop we will be exploring NLP state of the art transformers, with SOTA models like T5 and BERT, then build a

8 Apr 13, 2022

Robust and blazing fast open-redirect vulnerability scanner with ability of recursevely crawling all of web-forms, entry points, or links with data.

After Golismero project got dead there is no more any up to date open-source tool that can collect links with parametrs and web-forms and then test th

34 Aug 25, 2022

Repository for the Demo of using DVC with PyCaret & MLOps (DVC Office Hours - 20th Jan, 2022)

Using DVC with PyCaret & FastAPI (Demo) This repo contains all the resources for my demo explaining how to use DVC along with other interesting tools

6 Jul 22, 2022

The mitosheet package, trymito.io, and other public Mito code.

Mito Monorepo Mito is a spreadsheet that lives inside your JupyterLab notebooks. It allows you to edit Pandas dataframes like an Excel file, and gener

1.4k Dec 31, 2022

Implemented Exploratory Data Analysis (EDA) using Python.Built a dashboard in Tableau and found that 45.87% of People suffer from heart disease.

Heart_Disease_Diagnostic_Analysis Objective 🎯 The aim of this project is to use the given data and perform ETL and data analysis to infer key metrics

4 Jan 28, 2022

RuCLIP-SB (Russian Contrastive Language–Image Pretraining SWIN-BERT) is a multimodal model for obtaining images and text similarities and rearranging captions and pictures. Unlike other versions of the model we use BERT for text encoder and SWIN transformer for image encoder.

ruCLIP-SB RuCLIP-SB (Russian Contrastive Language–Image Pretraining SWIN-BERT) is a multimodal model for obtaining images and text similarities and re

5 Apr 13, 2022

Explaining Deep Neural Networks - A comparison of different CAM methods based on an insect data set

Explaining Deep Neural Networks - A comparison of different CAM methods based on an insect data set This is the repository for the Deep Learning proje

3 Feb 6, 2022

CMSC320 - Introduction to Data Science - Fall 2021

CMSC320 - Introduction to Data Science - Fall 2021 Instructors: Elias Jonatan Gonzalez and José Manuel Calderón Trilla Lectures: MW 3:30-4:45 & 5:00-6

6 Sep 12, 2022

clustering moroccan stocks time series data using k-means with dtw (dynamic time warping)

Moroccan Stocks Clustering Context Hey! we don't always have to forecast time series am I right ? We use k-means to cluster about 70 moroccan stock pr

7 Oct 18, 2022

This repository contains project created during the Data Challenge module at London School of Hygiene & Tropical Medicine

LSHTM_RCS This repository contains project created during the Data Challenge module at London School of Hygiene & Tropical Medicine (LSHTM) in collabo

3 Jan 30, 2022

Data Engineering ZoomCamp

Data Engineering ZoomCamp I'm partaking in a Data Engineering Bootcamp / Zoomcamp and will be tracking my progress here. I can't promise these notes w

61 Jan 6, 2023

Processed, version controlled history of Minecraft's generated data and assets

mcmeta Processed, version controlled history of Minecraft's generated data and assets Repository structure Each of the following branches has a commit

75 Dec 28, 2022

A web app builds using streamlit API with python backend to analyze and pick insides from multiple data formats.

Data-Analysis-Web-App Data Analysis Web App can analysis data in multiple formates(csv, txt, xls, xlsx, ods, odt) and gives shows you the analysis in

19 Dec 9, 2022

This repository provides the official code for GeNER (an automated dataset Generation framework for NER).

GeNER This repository provides the official code for GeNER (an automated dataset Generation framework for NER). Overview of GeNER GeNER allows you to

50 Nov 30, 2022

TIANCHI Purchase Redemption Forecast Challenge

4 Aug 26, 2022

NLP applications using deep learning.

NLP-Natural-Language-Processing NLP applications using deep learning like text generation etc. 1- Poetry Generation: Using a collection of Irish Poem

1 Jan 27, 2022

WikiPron - a command-line tool and Python API for mining multilingual pronunciation data from Wiktionary

WikiPron WikiPron is a command-line tool and Python API for mining multilingual pronunciation data from Wiktionary, as well as a database of pronuncia

213 Jan 1, 2023

Import some key/value data to Prometheus custom-built Node Exporter in Python

About the app In one particilar project, i had to import some key/value data to Prometheus. So i have decided to create my custom-built Node Exporter

1 May 19, 2022

Machine Learning e Data Science com Python

Machine Learning e Data Science com Python Arquivos do curso de Data Science e Machine Learning com Python na Udemy, cliqe aqui para acessá-lo. O prin

1 Jan 27, 2022

Clean and reusable data-sciency notebooks.

KPACUBO KPACUBO is a set Jupyter notebooks focused on the best practices in both software development and data science, namely, code reuse, explicit d

1 Jan 28, 2022

This repository contains the raw data and a python notebook to ingest historical A&E attendance data and then use a simple Prophet model to predict the number of A&E attendances in England if the COVID-19 pandemic had not happened

ae_attendances_modelling This repository contains the raw data and a python notebook to ingest historical A&E attendance data and then use a simple Pr

2 Mar 29, 2022

FAIR Enough Metrics is an API for various FAIR Metrics Tests, written in python

☑️ FAIR Enough metrics for research FAIR Enough Metrics is an API for various FAIR Metrics Tests, written in python, conforming to the specifications

3 Jul 6, 2022

This repository contains the implementation of the HealthGen model, a generative model to synthesize realistic EHR time series data with missingness

HealthGen: Conditional EHR Time Series Generation This repository contains the implementation of the HealthGen model, a generative model to synthesize

0 Jan 20, 2022

A PyTorch implementation for our paper "Dual Contrastive Learning: Text Classification via Label-Aware Data Augmentation".

Dual-Contrastive-Learning A PyTorch implementation for our paper "Dual Contrastive Learning: Text Classification via Label-Aware Data Augmentation". Y

85 Dec 26, 2022

PyTorch implementation for the paper Visual Representation Learning with Self-Supervised Attention for Low-Label High-Data Regime

Visual Representation Learning with Self-Supervised Attention for Low-Label High-Data Regime Created by Prarthana Bhattacharyya. Disclaimer: This is n

5 Nov 8, 2022

Leaf: Multiple-Choice Question Generation

Leaf: Multiple-Choice Question Generation Easy to use and understand multiple-choice question generation algorithm using T5 Transformers. The applicat

62 Dec 20, 2022

coldcuts is an R package to automatically generate and plot segmentation drawings in R

coldcuts coldcuts is an R package that allows you to draw and plot automatically segmentations from 3D voxel arrays. The name is inspired by one of It

2 Sep 3, 2022

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques This repository is derived from the NMTGMinor

1 Sep 7, 2022

Semantic code search implementation using Tensorflow framework and the source code data from the CodeSearchNet project

Semantic Code Search Semantic code search implementation using Tensorflow framework and the source code data from the CodeSearchNet project. The model

24 Nov 29, 2022

Code for the paper titled "Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages"

Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages Code for the paper titled "Prabhupadavani: A Code-mixed Speech Translation Data

12 Dec 1, 2022

A classification model capable of accurately predicting the price of secondhand cars

The purpose of this project is create a classification model capable of accurately predicting the price of secondhand cars. The data used for model building is open source and has been added to this repository. Most packages used are usually pre-installed in most developed environments and tools like collab, jupyter, etc. This can be useful for people looking to enhance the way the code their predicitve models and efficient ways to deal with tabular data!

2 Sep 13, 2022

Analyzing Earth Observation (EO) data is complex and solutions often require custom tailored algorithms.

eo-grow Earth observation framework for scaled-up processing in Python. Analyzing Earth Observation (EO) data is complex and solutions often require c

18 Dec 23, 2022

Predict the spans of toxic posts that were responsible for the toxic label of the posts

toxic-spans-detection An attempt at the SemEval 2021 Task 5: Toxic Spans Detection. The Toxic Spans Detection task of SemEval2021 required participant

3 Jul 24, 2022

Statistical & Probabilistic Analysis of Store Sales, University Survey, & Manufacturing data

Statistical_Modelling Statistical & Probabilistic Analysis of Store Sales, University Survey, & Manufacturing data Statistical Methods for Decision Ma

1 Jan 27, 2022

Melanoma Skin Cancer Detection using Convolutional Neural Networks and Transfer Learning🕵🏻‍♂️

This is a Kaggle competition in which we have to identify if the given lesion image is malignant or not for Melanoma which is a type of skin cancer.

1 Jan 27, 2022

Final Project for Practical Python Programming and Algorithms for Data Analysis

Final Project for Practical Python Programming and Algorithms for Data Analysis (PHW2781L, Summer 2020) Redlining, Race-Exclusive Deed Restriction Lan

1 Jan 27, 2022

Fetch fund data from avanza.se using Python and some web scraping with bs4

Py(A)vanza Fetch fund data from avanza.se using Python and some web scraping with bs4. The default way is to display the data in the terminal, apply -

1 Jan 27, 2022

Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included.

open(LARGE) Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included. Motivation Oftentimes, especi

2 Jan 30, 2022

This library provides an abstraction to perform Model Versioning using Weight & Biases.

Description This library provides an abstraction to perform Model Versioning using Weight & Biases. Features Version a new trained model Promote a mod

2 Jan 28, 2022

Python-geoarrow - Storing geometry data in Apache Arrow format

geoarrow Storing geometry data in Apache Arrow format Installation $ pip install

11 Mar 3, 2022

Scikit-event-correlation - Event Correlation and Forecasting over High Dimensional Streaming Sensor Data algorithms

scikit-event-correlation Event Correlation and Changing Detection Algorithm Theo

5 Oct 30, 2022

This Repository is an up-to-date version of Harvard nlp's Legacy code and a Refactoring of the jupyter notebook version as a shell script version.

17 Sep 25, 2022

Graviti-python-sdk - Graviti Data Platform Python SDK

Graviti Python SDK Graviti Python SDK is a python library to access Graviti Data

13 Dec 15, 2022

DomainMonitor is a web project that has a RESTful API to get a domain's subdomains and whois data.

2 Feb 5, 2022

Twitter bot that uses NLP models to summarize news articles referenced in a user's twitter timeline

Twitter-News-Summarizer Twitter bot that uses NLP models to summarize news articles referenced in a user's twitter timeline 1.) Extracts all tweets fr

1 Jan 27, 2022

OptiPLANT is a cloud-based based system that empowers professional and non-professional data scientists to build high-quality predictive models

OptiPLANT OptiPLANT is a cloud-based based system that empowers professional and non-professional data scientists to build high-quality predictive mod

1 Jan 26, 2022

This is a NLP based project to extract effective date of the contract from their text files.

Date-Extraction-from-Contracts This is a NLP based project to extract effective date of the contract from their text files. Problem statement This is

1 Jan 26, 2022

Dynamic vae - Dynamic VAE algorithm is used for anomaly detection of battery data

Dynamic VAE frame Automatic feature extraction can be achieved by probability di

10 Oct 7, 2022

This repo provides the source code & data of our paper "GreaseLM: Graph REASoning Enhanced Language Models"

GreaseLM: Graph REASoning Enhanced Language Models This repo provides the source code & data of our paper "GreaseLM: Graph REASoning Enhanced Language

137 Jan 2, 2023

Annotating the Tweebank Corpus on Named Entity Recognition and Building NLP Models for Social Media Analysis

TweebankNLP This repo contains the new Tweebank-NER dataset and off-the-shelf Twitter-Stanza pipeline for state-of-the-art Tweet NLP, as described in

42 Jan 26, 2022

Clustering is a popular approach to detect patterns in unlabeled data

Visual Clustering Clustering is a popular approach to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a data

24 Nov 11, 2022

SpyQL - SQL with Python in the middle

SpyQL SQL with Python in the middle Concept SpyQL is a query language that combines: the simplicity and structure of SQL with the power and readabilit

853 Dec 30, 2022

Data-sets from the survey and analysis

bachelor-thesis "Umfragewerte.xlsx" contains the orginal survey results. "umfrage_alle.csv" contains the survey results but one participant is cancele

1 Jan 26, 2022

Compute execution plan: A DAG representation of work that you want to get done. Individual nodes of the DAG could be simple python or shell tasks or complex deeply nested parallel branches or embedded DAGs themselves.

Hello from magnus Magnus provides four capabilities for data teams: Compute execution plan: A DAG representation of work that you want to get done. In

12 Feb 8, 2022

Es-schema - Common Data Schemas for Elasticsearch

Common Data Schemas for Elasticsearch The Common Data Schema for Elasticsearch i

2 Jan 25, 2022

Data analysis and visualisation projects from a range of individual projects and applications

Python-Data-Analysis-and-Visualisation-Projects Data analysis and visualisation projects from a range of individual projects and applications. Python

1 Jan 25, 2022

Predicting Auction Sale Price using the kaggle bulldozer auction sales data: Modeling with Ensembles vs Neural Network

Predicting Auction Sale Price using the kaggle bulldozer auction sales data: Modeling with Ensembles vs Neural Network The performances of tree ensemb

2 Sep 13, 2022

A program made in PYTHON🐍 that automatically performs data insertions into a POSTGRES database 🐘 , using as base a .CSV file 📁 , useful in mass data insertions

A program made in PYTHON🐍 that automatically performs data insertions into a POSTGRES database 🐘 , using as base a .CSV file 📁 , useful in mass data insertions.

1 Oct 17, 2022

Labelbox is the fastest way to annotate data to build and ship artificial intelligence applications

Labelbox Labelbox is the fastest way to annotate data to build and ship artificial intelligence applications. Use this github repository to help you s

1.7k Dec 29, 2022

Toward Model Interpretability in Medical NLP

Toward Model Interpretability in Medical NLP LING380: Topics in Computational Linguistics Final Project James Cross ([email protected]) and Daniel Kim

1 Mar 4, 2022

Wikipedia-Utils: Preprocessing Wikipedia Texts for NLP

Wikipedia-Utils: Preprocessing Wikipedia Texts for NLP This repository maintains some utility scripts for retrieving and preprocessing Wikipedia text

44 Oct 19, 2022

BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions

BERTopic BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable

3.6k Jan 7, 2023

FairLens is an open source Python library for automatically discovering bias and measuring fairness in data

FairLens FairLens is an open source Python library for automatically discovering bias and measuring fairness in data. The package can be used to quick

69 Dec 15, 2022

This is a curated list of medical data for machine learning

Medical Data for Machine Learning This is a curated list of medical data for machine learning. This list is provided for informational purposes only,

5.4k Dec 26, 2022

Segment axon and myelin from microscopy data using deep learning

Segment axon and myelin from microscopy data using deep learning. Written in Python. Using the TensorFlow framework. Based on a convolutional neural network architecture. Pixels are classified as either axon, myelin or background.

103 Nov 29, 2022

Equipped customers with insights about their EVs Hourly energy consumption and helped predict future charging behavior using LSTM model

Equipped customers with insights about their EVs Hourly energy consumption and helped predict future charging behavior using LSTM model. Designed sample dashboard with insights and recommendation for customers.

2 Apr 7, 2022

A Celery application to collect data, download media and extract information from social media APIs

Project IBEX A Celery application to collect data, download media and extract information from social media APIs. Requirements You must have a Redis D

4 Dec 15, 2022

Data processing with Pandas.

Processing-data-with-python This is a simple example showing how to use Pandas to create a dataframe and the processing data with python. The jupyter

1 Jan 23, 2022

Nobel Data Analysis

Nobel_Data_Analysis This project is for analyzing a set of data about people who have won the Nobel Prize in different fields and different countries

1 Jan 24, 2022

Investigating EV charging data

Investigating EV charging data Introduction: Got an opportunity to work with a home monitoring technology company over the last 6 months whose goal wa

2 Apr 7, 2022

Analyze the Gravitational wave data stored at LIGO/VIRGO observatories

Gravitational-Wave-Analysis This project showcases how to analyze the Gravitational wave data stored at LIGO/VIRGO observatories, using Python program

1 Jan 23, 2022

This python script allows you to manipulate the audience data from Sl.ido surveys

Slido-Automated-VoteBot This python script allows you to manipulate the audience data from Sl.ido surveys Since Slido blocks interference from automat

1 Jan 24, 2022

Stanford CoreNLP provides a set of natural language analysis tools written in Java

Stanford CoreNLP Stanford CoreNLP provides a set of natural language analysis tools written in Java. It can take raw human language text input and giv

8.8k Jan 7, 2023

Cormen-Lib - An academic tool for data structures and algorithms courses

The Cormen-lib module is an insular data structures and algorithms library based on the Thomas H. Cormen's Introduction to Algorithms Third Edition. This library was made specifically for administering and grading assignments related to data structure and algorithms in computer science.

12 Aug 18, 2022

Collapse by Conditioning: Training Class-conditional GANs with Limited Data

Collapse by Conditioning: Training Class-conditional GANs with Limited Data Moha

33 Dec 6, 2022

GoogleFormSpammer - A simple CLI script to spam Google Forms used by Crypto Wallet scammers to collect stolen data

14 Dec 17, 2022

Implement the Perspective open source code in preparation for data visualization

Task Overview | Installation Instructions | Link to Module 2 Introduction Experience Technology at JP Morgan Chase Try out what real work is like in t

1 Jan 23, 2022

PostQF is a user-friendly Postfix queue data filter which operates on data produced by postqueue -j.

11 Nov 24, 2022

Synthetic data need to preserve the statistical properties of real data in terms of their individual behavior and (inter-)dependences

Synthetic data need to preserve the statistical properties of real data in terms of their individual behavior and (inter-)dependences. Copula and functional Principle Component Analysis (fPCA) are statistical models that allow these properties to be simulated (Joe 2014). As such, copula generated data have shown potential to improve the generalization of machine learning (ML) emulators (Meyer et al. 2021) or anonymize real-data datasets (Patki et al. 2016).

32 Dec 20, 2022

An easy-to-use feature store

A feature store is a data storage system for data science and machine-learning. It can store raw data and also transformed features, which can be fed straight into an ML model or training script.

48 Dec 9, 2022

Learn Basic to advanced level Data visualisation techniques from this Repository

Data visualisation Hey, You can learn Basic to advanced level Data visualisation techniques from this Repository. Data visualization is the graphic re

16 Jan 3, 2023

Create charts with Python in a very similar way to creating charts using Chart.js

Create charts with Python in a very similar way to creating charts using Chart.js. The charts created are fully configurable, interactive and modular and are displayed directly in the output of the the cells of your jupyter notebook environment.

68 Dec 8, 2022

visualize_ML is a python package made to visualize some of the steps involved while dealing with a Machine Learning problem

visualize_ML visualize_ML is a python package made to visualize some of the steps involved while dealing with a Machine Learning problem. It is build

164 Dec 12, 2022

Equibles Stocks API for Python

Equibles Stocks API for Python Requirements. Python 2.7 and 3.4+ Installation & Usage pip install If the python package is hosted on Github, you can i

3 Apr 15, 2022

A server and client for passing data between computercraft computers/turtles across dimensions or even servers.

ccserver A server and client for passing data between computercraft computers/turtles across dimensions or even servers. pastebin get zUnE5N0v client

1 Jan 22, 2022

Feature engineering and machine learning: together at last

Feature engineering and machine learning: together at last! Lambdo is a workflow engine which significantly simplifies data analysis by unifying featu

14 Sep 15, 2022

INFO-H515 - Big Data Scalable Analytics

INFO-H515 - Big Data Scalable Analytics Jacopo De Stefani, Giovanni Buroni, Théo Verhelst and Gianluca Bontempi - Machine Learning Group Exercise clas

58 Dec 11, 2022

This module is used to create Convolutional AutoEncoders for Variational Data Assimilation

VarDACAE This module is used to create Convolutional AutoEncoders for Variational Data Assimilation. A user can define, create and train an AE for Dat

23 Dec 16, 2022

Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining

**Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining.** S

4k Dec 30, 2022

Perform sentiment analysis on textual data that people generally post on websites like social networks and movie review sites.

Sentiment Analyzer The goal of this project is to perform sentiment analysis on textual data that people generally post on websites like social networ

53 Mar 1, 2022

Japanese NLP Library

Japanese NLP Library Back to Home Contents 1 Requirements 1.1 Links 1.2 Install 1.3 History 2 Libraries and Modules 2.1 Tokenize jTokenize.py 2.2 Cabo

144 Dec 27, 2022

Dive into Machine Learning

Dive into Machine Learning Hi there! You might find this guide helpful if: You know Python or you're learning it 🐍 You're new to Machine Learning You

11.1k Jan 3, 2023

TensorDebugger (TDB) is a visual debugger for deep learning. It extends TensorFlow with breakpoints + real-time visualization of the data flowing through the computational graph

TensorDebugger (TDB) is a visual debugger for deep learning. It extends TensorFlow (Google's Deep Learning framework) with breakpoints + real-time visualization of the data flowing through the computational graph.

1.4k Dec 15, 2022

Python for Data Analysis, 2nd Edition

Python for Data Analysis, 2nd Edition Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media Buy

18.6k Jan 8, 2023

INF42 - Topological Data Analysis

TDA INF421(Conception et analyse d'algorithmes) Projet : Topological Data Analysis SphereMin Etant donné un nuage des points, ce programme contient de

2 Jan 7, 2022

Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Data Scientist Learning Plan Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

27 Nov 1, 2022

sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code

sequitur sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code. It implements three differ

305 Dec 21, 2022

Python Swahili-nlp-data Resources

Python swahili-nlp-data Libraries

Discord webhooks for alerting crypto currency price changes & historical data.

Driver Analysis with Factors and Forests: An Automated Data Science Tool using Python

Realtime data read and write without page refresh using Ajax in Django.

In this workshop we will be exploring NLP state of the art transformers, with SOTA models like T5 and BERT, then build a model using HugginFace transformers framework.

Robust and blazing fast open-redirect vulnerability scanner with ability of recursevely crawling all of web-forms, entry points, or links with data.

Repository for the Demo of using DVC with PyCaret & MLOps (DVC Office Hours - 20th Jan, 2022)

The mitosheet package, trymito.io, and other public Mito code.

Implemented Exploratory Data Analysis (EDA) using Python.Built a dashboard in Tableau and found that 45.87% of People suffer from heart disease.

RuCLIP-SB (Russian Contrastive Language–Image Pretraining SWIN-BERT) is a multimodal model for obtaining images and text similarities and rearranging captions and pictures. Unlike other versions of the model we use BERT for text encoder and SWIN transformer for image encoder.

Explaining Deep Neural Networks - A comparison of different CAM methods based on an insect data set

CMSC320 - Introduction to Data Science - Fall 2021

clustering moroccan stocks time series data using k-means with dtw (dynamic time warping)

This repository contains project created during the Data Challenge module at London School of Hygiene & Tropical Medicine

Data Engineering ZoomCamp

Processed, version controlled history of Minecraft's generated data and assets

A web app builds using streamlit API with python backend to analyze and pick insides from multiple data formats.

This repository provides the official code for GeNER (an automated dataset Generation framework for NER).

TIANCHI Purchase Redemption Forecast Challenge

NLP applications using deep learning.

WikiPron - a command-line tool and Python API for mining multilingual pronunciation data from Wiktionary

Import some key/value data to Prometheus custom-built Node Exporter in Python

Machine Learning e Data Science com Python

Clean and reusable data-sciency notebooks.

This repository contains the raw data and a python notebook to ingest historical A&E attendance data and then use a simple Prophet model to predict the number of A&E attendances in England if the COVID-19 pandemic had not happened

FAIR Enough Metrics is an API for various FAIR Metrics Tests, written in python

This repository contains the implementation of the HealthGen model, a generative model to synthesize realistic EHR time series data with missingness

A PyTorch implementation for our paper "Dual Contrastive Learning: Text Classification via Label-Aware Data Augmentation".

PyTorch implementation for the paper Visual Representation Learning with Self-Supervised Attention for Low-Label High-Data Regime

Leaf: Multiple-Choice Question Generation

coldcuts is an R package to automatically generate and plot segmentation drawings in R

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Semantic code search implementation using Tensorflow framework and the source code data from the CodeSearchNet project

Code for the paper titled "Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages"

A classification model capable of accurately predicting the price of secondhand cars

Analyzing Earth Observation (EO) data is complex and solutions often require custom tailored algorithms.

Predict the spans of toxic posts that were responsible for the toxic label of the posts

Statistical & Probabilistic Analysis of Store Sales, University Survey, & Manufacturing data

Melanoma Skin Cancer Detection using Convolutional Neural Networks and Transfer Learning🕵🏻‍♂️

Final Project for Practical Python Programming and Algorithms for Data Analysis

Fetch fund data from avanza.se using Python and some web scraping with bs4

Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included.

This library provides an abstraction to perform Model Versioning using Weight & Biases.

Python-geoarrow - Storing geometry data in Apache Arrow format

Scikit-event-correlation - Event Correlation and Forecasting over High Dimensional Streaming Sensor Data algorithms

This Repository is an up-to-date version of Harvard nlp's Legacy code and a Refactoring of the jupyter notebook version as a shell script version.

Graviti-python-sdk - Graviti Data Platform Python SDK

DomainMonitor is a web project that has a RESTful API to get a domain's subdomains and whois data.

Twitter bot that uses NLP models to summarize news articles referenced in a user's twitter timeline

OptiPLANT is a cloud-based based system that empowers professional and non-professional data scientists to build high-quality predictive models

This is a NLP based project to extract effective date of the contract from their text files.

Dynamic vae - Dynamic VAE algorithm is used for anomaly detection of battery data

This repo provides the source code & data of our paper "GreaseLM: Graph REASoning Enhanced Language Models"

Annotating the Tweebank Corpus on Named Entity Recognition and Building NLP Models for Social Media Analysis

Clustering is a popular approach to detect patterns in unlabeled data

SpyQL - SQL with Python in the middle

Data-sets from the survey and analysis

Compute execution plan: A DAG representation of work that you want to get done. Individual nodes of the DAG could be simple python or shell tasks or complex deeply nested parallel branches or embedded DAGs themselves.

Es-schema - Common Data Schemas for Elasticsearch

Data analysis and visualisation projects from a range of individual projects and applications

Predicting Auction Sale Price using the kaggle bulldozer auction sales data: Modeling with Ensembles vs Neural Network

A program made in PYTHON🐍 that automatically performs data insertions into a POSTGRES database 🐘 , using as base a .CSV file 📁 , useful in mass data insertions

Labelbox is the fastest way to annotate data to build and ship artificial intelligence applications

Toward Model Interpretability in Medical NLP

Wikipedia-Utils: Preprocessing Wikipedia Texts for NLP

BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions

FairLens is an open source Python library for automatically discovering bias and measuring fairness in data

This is a curated list of medical data for machine learning

Segment axon and myelin from microscopy data using deep learning

Equipped customers with insights about their EVs Hourly energy consumption and helped predict future charging behavior using LSTM model

A Celery application to collect data, download media and extract information from social media APIs

Data processing with Pandas.

Nobel Data Analysis

Investigating EV charging data

Analyze the Gravitational wave data stored at LIGO/VIRGO observatories

This python script allows you to manipulate the audience data from Sl.ido surveys

Stanford CoreNLP provides a set of natural language analysis tools written in Java

Cormen-Lib - An academic tool for data structures and algorithms courses

Collapse by Conditioning: Training Class-conditional GANs with Limited Data