4183 Repositories
Python text-data-analysis Libraries
An open source Python library for automated feature engineering
"One of the holy grails of machine learning is to automate more and more of the feature engineering process." ― Pedro Domingos, A Few Useful Things to
An intuitive library to extract features from time series
Time Series Feature Extraction Library Intuitive time series feature extraction This repository hosts the TSFEL - Time Series Feature Extraction Libra
The Turing Change Point Detection Benchmark: An Extensive Benchmark Evaluation of Change Point Detection Algorithms on real-world data
Turing Change Point Detection Benchmark Welcome to the repository for the Turing Change Point Detection Benchmark, a benchmark evaluation of change po
Technical Analysis library in pandas for backtesting algotrading and quantitative analysis
bta-lib - A pandas-based Technical Analysis Library bta-lib is a pandas-based technical analysis library and part of the backtrader family. Links Main P
Highly comparative time-series analysis
〰️ hctsa 〰️ : highly comparative time-series analysis hctsa is a software package for running highly comparative time-series analysis using Matlab (fu
Python binding for Khiva library.
Khiva-Python This is the Khiva Python binding; it allows the usage of Kh
Timeseries analysis for neuroscience data
Nitime: timeseries analysis for neuroscience data
A Python library for unevenly-spaced time series analysis
traces A Python library for unevenly-spaced time series analysis. Why? Taking measurements at irregular intervals is common, but most tools are primar
This is a Python wrapper for TA-LIB based on Cython instead of SWIG.
TA-Lib This is a Python wrapper for TA-LIB based on Cython instead of SWIG. From the homepage: TA-Lib is widely used by trading software developers re
Python package for downloading ECMWF reanalysis data and converting it into a time series format.
ecmwf_models Readers and converters for data from the ECMWF reanalysis models. Written in Python. Works great in combination with pytesmo. Citation If
Speech to text streamlit app
Speech to text Streamlit-app! 👄 This speech to text recognition is powered by t
A way of looking at COVID-19 data that I haven't seen before.
Visualizing Omicron: COVID-19 Deaths vs. Cases Click here for other countries. Data is from Our World in Data/Johns Hopkins University. About this pro
Software Engineer Salary Prediction
Based on 2021 Stack Overflow data, this machine learning web application predicts a software engineer's salary from years of experience, level of education, and country of work.
Analyzed visa applicant data to build a predictive model that facilitates the visa approval process.
Analyzed visa applicant data, built a predictive model to facilitate the visa approval process, and, based on the factors that most significantly influence visa status, recommended suitable applicant profiles for which a visa should be certified or denied.
Medical appointments No-Show classifier
Medical Appointments No-shows Why do 20% of patients miss their scheduled appointments? A person makes a doctor appointment, receives all the instruct
Files for the online course on statistics for data science and machine learning.
Statistics for Data Science and Machine Learning Files for the online course on statistics for data science and machine learning
A general and strong 3D object detection codebase that supports more methods, datasets and tools (debugging, recording and analysis).
ALLINONE-Det ALLINONE-Det is a general and strong 3D object detection codebase built on OpenPCDet, which supports more methods, datasets and tools (de
Prometheus Exporter for data scraped from datenplattform.darmstadt.de
darmstadt-opendata-exporter Scrapes data from https://datenplattform.darmstadt.de and presents it in the Prometheus Exposition format. Pull requests w
Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort
Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi.
Spchcat Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi. Description spchcat is a command-line tool that read
PyTorch EO aims to make Deep Learning for Earth Observation data easy and accessible to real-world cases and research alike.
Pytorch EO Deep Learning for Earth Observation applications and research. 🚧 This project is in early development, so bugs and breaking changes are ex
Full-Stack application that visualizes amusement park safety.
Amusement Park Ride Safety Analysis Project Proposal We have chosen to look into amusement park data to explore ride safety relationships visually, in
Finite Element Analysis
FElupe - Finite Element Analysis FElupe is a Python 3.6+ finite element analysis package focussing on the formulation and numerical solution of nonlin
Datasets, tools, and benchmarks for representation learning of code.
The CodeSearchNet challenge has been concluded We would like to thank all participants for their submissions and we hope that this challenge provided
General Assembly's 2015 Data Science course in Washington, DC
DAT8 Course Repository Course materials for General Assembly's Data Science course in Washington, DC (8/18/15 - 10/29/15). Instructor: Kevin Markham (
IPython notebook presentations for getting started with basic programming, statistics and machine learning techniques
Data Science 45-min Intros Every week*, our data science team @Gnip (aka @TwitterBoulder) gets together for about 50 minutes to learn something. While
A middle-to-high level algorithm book designed with coding interviews at heart!
Hands-on Algorithmic Problem Solving A one-stop coding interview prep book! About this book In short, this is a middle-to-high level algorithm book de
A Collection of Cheatsheets, Books, Questions, and Portfolio For DS/ML Interview Prep
Here are the sections: Data Science Cheatsheets Data Science EBooks Data Science Question Bank Data Science Case Studies Data Science Portfolio Data J
A site that displays up to date COVID-19 stats, powered by fastpages.
https://covid19dashboards.com This project was built with fastpages Background This project showcases how you can use fastpages to create a static das
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Spark Python Notebooks This is a collection of IPython notebook/Jupyter notebooks intended to train the reader on different Apache Spark concepts, fro
🤖 ⚡ scikit-learn tips
🤖 ⚡ scikit-learn tips New tips are posted on LinkedIn, Twitter, and Facebook. 👉 Sign up to receive 2 video tips by email every week! 👈 List of all
Introduction to Statistics and Basics of Mathematics for Data Science - The Hacker's Way
HackerMath for Machine Learning “Study hard what interests you the most in the most undisciplined, irreverent and original manner possible.” ― Richard
List of papers, code and experiments using deep learning for time series forecasting
Deep Learning Time Series Forecasting List of state-of-the-art papers focused on deep learning, plus resources, code and experiments using deep learning f
Jupyter Notebook tutorials on solving real-world problems with Machine Learning & Deep Learning using PyTorch
Jupyter Notebook tutorials on solving real-world problems with Machine Learning & Deep Learning using PyTorch. Topics: Face detection with Detectron 2, Time Series anomaly detection with LSTM Autoencoders, Object Detection with YOLO v5, Build your first Neural Network, Time Series forecasting for Coronavirus daily cases, Sentiment Analysis with BERT.
A Practitioner's Guide to Natural Language Processing
Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, Text Analytics with Python published by Apress/Springer.
Koç University deep learning framework.
Knet Knet (pronounced "kay-net") is the Koç University deep learning framework implemented in Julia by Deniz Yuret and collaborators. It supports GPU
A powerful and user-friendly binary analysis platform!
angr angr is a platform-agnostic binary analysis framework. It is brought to you by the Computer Security Lab at UC Santa Barbara, SEFCOM at Arizona S
Anomaly detection related books, papers, videos, and toolboxes
Anomaly Detection Learning Resources Outlier Detection (also known as Anomaly Detection) is an exciting yet challenging field, which aims to identify
Dshell is a network forensic analysis framework.
Dshell An extensible network forensic analysis framework. Enables rapid development of plugins to support the dissection of network packet captures. K
A list of NLP (Natural Language Processing) tutorials
NLP Tutorial A list of NLP (Natural Language Processing) tutorials built on PyTorch. Table of Contents A step-by-step tutorial on how to implement and
My solution to the book A Collection of Data Science Take-Home Challenges
DS-Take-Home Solution to the book "A Collection of Data Science Take-Home Challenges". Note: Please don't contact me for the dataset. This repository
Beancount: Double-Entry Accounting from Text Files.
beancount: Double-Entry Accounting from Text Files
Free and open source qualitative research tool
Taguette A spin on the phrase "tag it!", Taguette is a free and open source qualitative research tool that allows users to: Import PDFs, Word Docs (.d
Cleaned test data list of DukeMTMC-reID, ICCV 2021
Cleaned DukeMTMC-reID Cleaned data list of DukeMTMC-reID released with our paper accepted by ICCV 2021: Learning Instance-level Spatial-Temporal Patte
Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data
Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data This is the official PyTorch implementation of the SeCo paper: @articl
Resco: A simple Python package that reports the effect of deep residual learning
resco Description resco is a simple Python package that reports the effect of dee
A curated list of docker-compose files prepared for testing data engineering tools, databases and open source libraries.
data-services A repository for storing various Data Engineering docker-compose files in one place. How to use it ? Set the required settings in .env f
Customer service request analysis is a practical problem that analysts commonly face, and this project is one take on it. It is a beginner-to-intermediate level project. The repository contains the source code, a README file, the dataset, an image, and a license file.
Customer Service Requests Analysis Project 1 DESCRIPTION Background of Problem Statement : NYC 311's mission is to provide the public with quick and e
Analysis of the daily word game "Wordle"
Wordle Analysis of the daily word game "Wordle" https://www.powerlanguage.co.uk/wordle/ Description Wordle is a daily word game in which a player attemp
A Python package to convert and transfer Excel sheet data to a MySQL database table
Graphical Password Authentication System.
Graphical Password Authentication System. This system increases the security of a website. It is divided into four layers of protection, each distinct from the others. This not only strengthens protection but also aims to prevent automated (non-human) logins through attacks such as brute force.
BERN2: an advanced neural biomedical named entity recognition and normalization tool
BERN2 We present BERN2 (Advanced Biomedical Entity Recognition and Normalization
Lol qq parser - A League of Legends parser for QQ data
lol_qq_parser A League of Legends parser for QQ data Sources This package relies
Python bindings for Basler's VisualApplets TCL script generation
About visualapplets.py The Basler AG company provides a TCL scripting engine to automatize the creation of VisualApplets designs (a former Silicon Sof
Maiden & Spell community player ranking based on tournament data.
MnSRank Maiden & Spell community player ranking based on tournament data. Why? 2021 just ended and this seemed like a cool idea. Elo doesn't work well
A Telegram crawler to search groups and channels automatically and collect any type of data from them.
Introduction This is a crawler I wrote in Python using the APIs of Telethon months ago. This tool was not intended to be publicly available for a numb
NewpaperNews-API - JSON news data with Python
NewsAPI API Documentation BASE_URL = "https://saurav.tech/NewsAPI/" top_headline
A part of HyRiver software stack for handling geospatial data manipulations
PyNHD: Navigate and subset NHDPlus (MR and HR) using web services. Py3DEP: Access topographic data through National Map's 3DEP
PyExplainer: A Local Rule-Based Model-Agnostic Technique (Explainable AI)
PyExplainer PyExplainer is a local rule-based model-agnostic technique for generating explanations (i.e., why a commit is predicted as defective) of J
A module that collects sampled data transmitted over WebSocket via Wi-Fi or a serial port and saves it to a CSV file.
Uproot is a library for reading and writing ROOT files in pure Python and NumPy.
Uproot is a library for reading and writing ROOT files in pure Python and NumPy. Unlike the standard C++ ROOT implementation, Uproot is only an I/O li
Estimation and analysis of NNR-conformation-conditional and global probabilities in peptide or protein fragments
NNR and global probabilities estimation and analysis in peptides or protein fragments This module calculates global and NNR conformation dependent pro
Simulate genealogical trees and genomic sequence data using population genetic models
msprime msprime is a population genetics simulator based on tskit. Msprime can simulate random ancestral histories for a sample of individuals (consis
PyMove is a Python library to simplify queries and visualization of trajectories and other spatial-temporal data
Use PyMove and go much further
Python Client for Algorithmia Algorithms and Data API
Algorithmia Common Library (python) Python client library for accessing the Algorithmia API For API documentation, see the PythonDocs Algorithm Develo
Constructing maps of intellectual influence from publication data
Influencemap Project @ ANU Influence in the academic communities has been an area of interest for researchers. This can be seen in the popularity of a
Python client for using Prefect Cloud with Saturn Cloud
prefect-saturn prefect-saturn is a Python package that makes it easy to run Prefect Cloud flows on a Dask cluster with Saturn Cloud. For a detailed tu
A Python Package For System Identification Using NARMAX Models
SysIdentPy is a Python module for System Identification using NARMAX models built on top of numpy and is distributed under the 3-Clause BSD license. N
BigDL - Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems
Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems.
Text and code for the forthcoming second edition of Think Bayes, by Allen Downey.
Think Bayes 2 by Allen B. Downey The HTML version of this book is here. Think Bayes is an introduction to Bayesian statistics using computational meth
Algorithms written in different programming languages
Data Structures and Algorithms Clean example implementations of data structures and algorithms written in different languages. List of implementations
Quilt is a self-organizing data hub for S3
Quilt is a self-organizing data hub Python Quick start, tutorials If you have Python and an S3 bucket, you're ready to create versioned datasets with
Imaging, analysis, and simulation software for radio interferometry
ehtim (eht-imaging) Python modules for simulating and manipulating VLBI data and producing images with regularized maximum likelihood methods. This ve
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
Knowledge Repo The Knowledge Repo project is focused on facilitating the sharing of knowledge between data scientists and other technical roles using
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text. Diff: Compare two blocks o
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python
Scalene: a high-performance CPU, GPU and memory profiler for Python by Emery Berger, Sam Stern, and Juan Altmayer Pizzorno. Scalene community Slack Ab
Data on COVID-19 (coronavirus) cases, deaths, hospitalizations, tests • All countries • Updated daily by Our World in Data
COVID-19 Dataset by Our World in Data Find our data on COVID-19 and its documentation in public/data. Documentation Data: complete COVID-19 dataset Da
A distributed crawler for Weibo, built with Celery and Requests.
Chinese version of GPT2 training code, using BERT tokenizer.
GPT2-Chinese Description Chinese version of GPT2 training code, using BERT tokenizer or BPE tokenizer. It is based on the extremely awesome repository
A system for processing and analyzing large volumes of data using data science techniques
A system for processing and analyzing large volumes of data using data science techniques. All scripts, charts, and reports from all the act
A set of tools that help visualize and annotate images for semantic/instance segmentation tasks
Data Framework for Semantic/Instance Segmentation A set of tools that help visualize, transform and annotate images for semantic/in
DUQ is a python package for working with physical Dimensions, Units, and Quantities.
Deep learning anomaly detection with Bluetooth sensor data
Final-year project: building models for offline anomaly detection using travel-time data collected from Bluetooth sensors along a route.
A Blender tool that converts face-corner Object Data Attributes to UVs or Vertex Colors.
Blender_ObjectDataAttributesConvertTool A Blender tool that converts face-corner Object Data Attributes to UVs or Vertex Colors. D
An analysis tool for Python that blurs the line between testing and type systems.
CrossHair An analysis tool for Python that blurs the line between testing and type systems. THE LATEST NEWS: Check out the new crosshair cover command
This project deals with a simplified version of a more general problem of Aspect Based Sentiment Analysis.
Aspect_Based_Sentiment_Extraction Created on: 5th Jan, 2022. This project deals with an important field of Natural Language Processing - Aspect Based
An interactive DNN model deployed on the web that predicts a patient's chance of heart failure, with a reported accuracy of 98%
Heart Failure Predictor About A dense neural network model built with TensorFlow and deployed behind a web UI that predicts whether the patient is healthy or has c
NLP techniques such as named entity recognition, sentiment analysis, topic modeling, and text classification in Python to predict the sentiment and rating of drugs from user reviews.
This file contains the following documents submitted for Baruch CIS9665 group 9 fall 2021. 1. Dataset: drug_reviews.csv 2. Python code for text classi
Data cleaning, missing-value handling, and EDA are used in this project
Lending Club Case Study Project Brief Solving this assignment will give you an idea about how real business problems are solved using EDA. In this cas
schemasheets - structuring your data using spreadsheets
schemasheets - structuring your data using spreadsheets Create a data dictionary / schema for your data using simple spreadsheets - no coding required
Data and analysis code for an MS on SK VOC genomes phenotyping/neutralisation assays
Description Summary of phylogenomic methods and analyses used in "Immunogenicity of convalescent and vaccinated sera against clinical isolates of ance
Match SafeGraph POIs with Data collected through a cultural resource survey in Washington DC.
Match SafeGraph POI data with Cultural Resource Places in Washington DC Match SafeGraph POIs with Data collected through a cultural resource survey in
Textual: a TUI (Text User Interface) framework for Python inspired by modern web development
Textual Textual is a TUI (Text User Interface) framework for Python inspired by
ARRU seismic backprojection - Earthquake waveform detection and P/S arrivals picking on continuous data using ARRU phase picker
ARRU_seismic_backprojection Earthquake waveform detection and P/S arrivals picki
Go from graph data to a secure and interactive visual graph app in 15 minutes. Batteries-included self-hosting of graph data apps with Streamlit, Graphistry, RAPIDS, and more!
✔️ Linux ✔️ OS X ❌ Windows (#39) Welcome to graph-app-kit Turn your graph data into a secure and interactive visual graph app in 15 minutes! Why This
Crowd sourced training data for Rasa NLU models
NLU Training Data Crowd-sourced training data for the development and testing of Rasa NLU models. If you're interested in grabbing some data feel free
End-to-end text summarization and question-answer generation using Flask.
Help-Me-Read A web application created with Flask + BootStrap + HuggingFace 🤗 to generate summaries and question-answer pairs from the given input text. It uses
A tool for monitoring the risk of health system collapse in Brazilian municipalities due to Covid-19.
FarolCovid 🚦 A tool for monitoring the risk of health system collapse in Brazilian municipalities due to Covid-19. Monitoring tool & simulati
Easy genetic ancestry predictions in Python
ezancestry Easily visualize your direct-to-consumer genetics next to 2500+ samples from the 1000 genomes project. Evaluate the performance of a custom
Load, explore and analyse data from Scotland and the rest of the world related to Covid-19.
Streamlit Examples This is my first attempt with Streamlit. It is a free, open-source, Python-based and easy-to-use framework to build and deplo