3240 Repositories
Python functional-data-analysis Libraries
This is a project of data parallel that running on NLP tasks.
This is a project of data parallel that running on NLP tasks.
Hashformers is a framework for hashtag segmentation with transformers.
Hashtag segmentation is the task of automatically inserting the missing spaces between the words in a hashtag. Hashformers applies Transformer models
DANet for Tabular data classification/ regression.
Deep Abstract Networks A PyTorch code implemented for the submission DANets: Deep Abstract Networks for Tabular Data Classification and Regression. Do
Package for extracting emotions from social media text. Tailored for financial data.
EmTract: Extracting Emotions from Social Media Text Tailored for Financial Contexts EmTract is a tool that extracts emotions from social media text. I
BrainGNN - A deep learning model for data-driven discovery of functional connectivity
A deep learning model for data-driven discovery of functional connectivity https://doi.org/10.3390/a14030075 Usman Mahmood, Zengin Fu, Vince D. Calhou
Multiview 3D object detection on MultiviewC dataset through moft3d.
Voxelized 3D Feature Aggregation for Multiview Detection [arXiv] Multiview 3D object detection on MultiviewC dataset through VFA. Introduction We prop
Graph WaveNet apdapted for brain connectivity analysis.
Graph WaveNet for brain network analysis This is the implementation of the Graph WaveNet model used in our manuscript: S. Wein , A. Schüller, A. M. To
Official PyTorch implementation of the ICRA 2021 paper: Adversarial Differentiable Data Augmentation for Autonomous Systems.
Adversarial Differentiable Data Augmentation This repository provides the official PyTorch implementation of the ICRA 2021 paper: Adversarial Differen
Projeto job insights - Projeto avaliativo da Trybe do Bloco 32: Introdução à Python
Termos e acordos Ao iniciar este projeto, você concorda com as diretrizes do Código de Ética e Conduta e do Manual da Pessoa Estudante da Trybe. Boas
A data driven app for bicycle hiring in London(UK)
bicycle_hiring_app_deployed A data driven app for bicycle hiring in London(UK). It predicts expected number of bicycle hire in London. It asks users t
Script to create an animated data visualisation for categorical timeseries data - GIF choropleth map with annotations.
choropleth_ldn Simple script to create a chloropleth map of London with categorical timeseries data. The script in main.py creates a gif of the most f
Simple Python project using Opencv and datetime package to recognise faces and log attendance data in a csv file.
Attendance-System-based-on-Facial-recognition-Attendance-data-stored-in-csv-file- Simple Python project using Opencv and datetime package to recognise
IDA scripts for hypervisor (Hyper-v) analysis and reverse engineering automation
Re-Scripts IA32-VMX-Helper (IDA-Script) IA32-MSR-Decoder (IDA-Script) IA32 VMX Helper It's an IDA script (Updated IA32 MSR Decoder) which helps you to
Extracts data from the database for a graph-node and stores it in parquet files
subgraph-extractor Extracts data from the database for a graph-node and stores it in parquet files Installation For developing, it's recommended to us
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
deepface Deepface is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for python. It is a hybrid
Downloads data from OSM API and uploads it to the mapping sandbox.
OpenStreetMap To Sandbox This is a script to download data from OSM API and upload it to the mapping sandbox. Note that it clears all data in the sand
A simple and lightweight genetic algorithm for optimization of any machine learning model
geneticml This package contains a simple and lightweight genetic algorithm for optimization of any machine learning model. Installation Use pip to ins
Python Multilingual Ucrel Semantic Analysis System
PymUSAS Python Multilingual Ucrel Semantic Analysis System, it currently is a rule based token level semantic tagger which can be added to any spaCy p
Functions to analyze Cell-ID single-cell cytometry data using python language.
PyCellID (building...) Functions to analyze Cell-ID single-cell cytometry data using python language. Dependecies for this project. attrs(=21.1.0) fo
Cash in on Expressed Barcode Tags (EBTs) from NGS Sequencing Data with Python
Cash in on Expressed Barcode Tags (EBTs) from NGS Sequencing Data with Python Cashier is a tool developed by Russell Durrett for the analysis and extr
Modular analysis tools for neurophysiology data
Neuroanalysis Modular and interactive tools for analysis of neurophysiology data, with emphasis on patch-clamp electrophysiology. Functions for runnin
Data pipelines for both TensorFlow and PyTorch!
rapidnlp-datasets Data pipelines for both TensorFlow and PyTorch ! If you want to load public datasets, try: tensorflow/datasets huggingface/datasets
TensorFlow implementation of the paper "Hierarchical Attention Networks for Document Classification"
Hierarchical Attention Networks for Document Classification This is an implementation of the paper Hierarchical Attention Networks for Document Classi
Implementation of the ivis algorithm as described in the paper Structure-preserving visualisation of high dimensional single-cell datasets.
Implementation of the ivis algorithm as described in the paper Structure-preserving visualisation of high dimensional single-cell datasets.
Tugas kelompok Struktur Data
Binary-Tree Tugas kelompok Struktur Data Silahkan jika ingin mengubah tipe data pada operasi binary tree *Boleh juga semua program kelompok bisa disat
This contains timezone mapping information for when preprocessed from the geonames data
when-data This contains timezone mapping information for when preprocessed from the geonames data. It exists in a separate repository so that one does
A variant caller for the GBA gene using WGS data
Gauchian: WGS-based GBA variant caller Gauchian is a targeted variant caller for the GBA gene based on a whole-genome sequencing (WGS) BAM file. Gauch
A python script for extracting/removing exif data from images by @AbirHasan2005
Image-Exif A Python script for extracting exif metadata from images. How to use? Using this script you can extract exif data from image and save in .c
Wordlist attacks on Bitwarden data.json files
BitwardenDecryptBrute This is a slightly modified version of BitwardenDecrypt. In addition to the decryption this version can do wordlist attacks for
RoadMap and preparation material for Machine Learning and Data Science - From beginner to expert.
ML-and-DataScience-preparation This repository has the goal to create a learning and preparation roadMap for Machine Learning Engineers and Data Scien
Data Visualizer Web-Application
Viz-It Data Visualizer Web-Application If I ask you where most of the data wrangler looses their time ? It is Data Overview and EDA. Presenting "Viz-I
Reading streams of Twitter data, save them to Kafka, then process with Kafka Stream API and Spark Streaming
Using Streaming Twitter Data with Kafka and Spark Reading streams of Twitter data, publishing them to Kafka topic, process message using Kafka Stream
A little but useful tool to explore OCR data extracted with `pytesseract` and `opencv`
Screenshot OCR Tool Extracting data from screen time screenshots in iOS and Android. We are exploring 3 options: Simple OCR with no text position usin
This collection of tools that makes it easy to secure and/or obfuscate messages, files, and data.
Scrambler App This collection of tools that makes it easy to secure and/or obfuscate messages, files, and data. It leverages encryption tools such as
A set of Python scripts and notebooks to help administer and configure Workforce projects.
Workforce Scripts A set of Python scripts and notebooks to help administer and configure Workforce projects. Notebooks Several example Jupyter noteboo
Repo with data from local elections in Portugal from 2009 to 2013
autarquicas - local elections in Portugal Repo with data from local elections in Portugal from 2009 to 2013 Objective To provide, to all, raw data fro
Deep Learning applied to Integral data analysis
DeepIntegralCompton Deep Learning applied to Integral data analysis Module installation Move to the root directory of the project and execute : pip in
Binance Kline Data With Python
Binance Kline Data by seunghan(gingerthorp) reference https://github.com/binance/binance-public-data/ All intervals are supported: 1m, 3m, 5m, 15m, 30
My Analysis of the VC4 Assembly Code from the RPI4
My Analysis of the VC4 Assembly Code from the RPI4
Scripts for BGC analysis in large MAGs and results of their application to soil metagenomes within Chernevaya Taiga RSF-funded project
Scripts for BGC analysis in large MAGs and results of their application to soil metagenomes within Chernevaya Taiga RSF-funded project
HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.
HTSeq DEVS: https://github.com/htseq/htseq DOCS: https://htseq.readthedocs.io A Python library to facilitate programmatic analysis of data from high-t
Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.
sklearn-evaluation Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking, and Jupyter notebook analysis. Suppo
A package, and script, to perform imaging transcriptomics on a neuroimaging scan.
Imaging Transcriptomics Imaging transcriptomics is a methodology that allows to identify patterns of correlation between gene expression and some prop
Script to generate a massive volume of data in sql, csv, json or xml format
DataGenerator Made with Python Open for pull requests 1. Dependencies To install required dependencies run pip install -r requirements.txt 2. Executi
Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 130+ Indicators
Pandas TA - A Technical Analysis Library in Python 3 Pandas Technical Analysis (Pandas TA) is an easy to use library that leverages the Pandas package
This is collection of Managementsystem programs: Hospital Management, Student Managemen, etc
Contribute in this repository and help other students with their assignment by adding python scripts for various management system programs.
The functions we created are included in a script. The necessary parts for pre-processing were taken. Analysis complete.
Feature-Engineering The functions we created are included in a script. The necessary parts for pre-processing were taken. Analysis complete. Business
Official Pytorch implementation for 2021 ICCV paper "Learning Motion Priors for 4D Human Body Capture in 3D Scenes" and trained models / data
Learning Motion Priors for 4D Human Body Capture in 3D Scenes (LEMO) Official Pytorch implementation for 2021 ICCV (oral) paper "Learning Motion Prior
This directory gathers the tools developed by the Data Sourcing Working Group
BigScience Data Sourcing Code This directory gathers the tools developed by the Data Sourcing Working Group First Sourcing Sprint: October 2021 The co
An audnexus client, providing rich author and audiobook data to Plex via it's legacy plugin agent system.
Audnexus.bundle An audnex.us client, providing rich author and audiobook data to Plex via it's legacy plugin agent system. 📝 Table of Contents About
Repository, with small useful and functional applications
Repositorio,com pequenos aplicativos uteis e funcionais A ideia e ir deselvolvendo pequenos aplicativos funcionais e adicionar a este repositorio List
An assignment from my grad-level data mining course demonstrating some experience with NLP/neural networks/Pytorch
NLP-Pytorch-Assignment An assignment from my grad-level data mining course (before I started personal projects) demonstrating some experience with NLP
Demonstrate a Dataflow pipeline that saves data from an API into BigQuery table
Overview dataflow-mvp provides a basic example pipeline that pulls data from an API and writes it to a BigQuery table using GCP's Dataflow (i.e., Apac
A simple and efficient computing package for Genshin Impact gacha analysis
GGanalysisLite计算包 这个版本的计算包追求计算速度,而GGanalysis包有着更多计算功能。 GGanalysisLite包通过卷积计算分布列,通过FFT和快速幂加速卷积计算。 测试玩家得到的排名值rank的数学意义是:与抽了同样数量五星的其他玩家相比,测试玩家花费的抽数大于等于比例
A Python library for loading data from a SpaceX Starlink satellite.
Starlink Python A Python library for loading data from a SpaceX Starlink satellite. The goal is to be a simple interface for Starlink. It builds upon
Slientruss3d : Python for stable truss analysis tool
slientruss3d : Python for stable truss analysis tool Desciption slientruss3d is a python package which can solve the resistances, internal forces and
Visualize large time-series data in plotly
plotly_resampler enables visualizing large sequential data by adding resampling functionality to Plotly figures. In this Plotly-Resampler demo over 11
Rainbow DQN implementation accompanying the paper "Fast and Data-Efficient Training of Rainbow" which reaches 205.7 median HNS after 10M frames. 🌈
Rainbow 🌈 An implementation of Rainbow DQN which reaches a median HNS of 205.7 after only 10M frames (the original Rainbow from Hessel et al. 2017 re
Code for EMNLP 2021 paper: "Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training"
SCAPT-ABSA Code for EMNLP2021 paper: "Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training" Overvie
Randomisation-based inference in Python based on data resampling and permutation.
Randomisation-based inference in Python based on data resampling and permutation.
Python code for working with NFL play by play data.
nfl_data_py nfl_data_py is a Python library for interacting with NFL data sourced from nflfastR, nfldata, dynastyprocess, and Draft Scout. Includes im
voice assistant made with python that search for covid19 data(like total cases, deaths and etc) in a specific country
covid19-voice-assistant voice assistant made with python that search for covid19 data(like total cases, deaths and etc) in a specific country installi
Analysiscsv.py for extracting analysis and exporting as CSV
wcc_analysis Lichess page documentation: https://lichess.org/page/world-championships Each WCC has a study, studies are fetched using: https://lichess
Simple API for UCI Machine Learning Dataset Repository (search, download, analyze)
A simple API for working with University of California, Irvine (UCI) Machine Learning (ML) repository Table of Contents Introduction About Page of the
An Integrated Experimental Platform for time series data anomaly detection.
Curve Sorry to tell contributors and users. We decided to archive the project temporarily due to the employee work plan of collaborators. There are no
CLIP (Contrastive Language–Image Pre-training) trained on Indonesian data
CLIP-Indonesian CLIP (Radford et al., 2021) is a multimodal model that can connect images and text by training a vision encoder and a text encoder joi
Python script to tabulate data formats like json, csv, html, etc
pyT PyT is a a command line tool and as well a library for visualising various data formats like: JSON HTML Table CSV XML, etc. Features Print table o
MS in Data Science capstone project. Studying attacks on autonomous vehicles.
Surveying Attack Models for CAVs Guide to Installing CARLA and Collecting Data Our project focuses on surveying attack models for Connveced Autonomous
Pipeline for training LSA models using Scikit-Learn.
Latent Semantic Analysis Pipeline for training LSA models using Scikit-Learn. Usage Instead of writing custom code for latent semantic analysis, you j
GDSC UIET KUK 📍 , welcomes you all to this amazing event where you will be introduced to the world of coding 💻 .
GDSC UIET KUK 📍 , welcomes you all to this amazing event where you will be introduced to the world of coding 💻 .
SentAugment is a data augmentation technique for semi-supervised learning in NLP.
SentAugment SentAugment is a data augmentation technique for semi-supervised learning in NLP. It uses state-of-the-art sentence embeddings to structur
Creating predictive checklists from data using integer programming.
Learning Optimal Predictive Checklists A Python package to learn simple predictive checklists from data subject to customizable constraints. For more
Code and Data for the paper: Molecular Contrastive Learning with Chemical Element Knowledge Graph [AAAI 2022]
Knowledge-enhanced Contrastive Learning (KCL) Molecular Contrastive Learning with Chemical Element Knowledge Graph [ AAAI 2022 ]. We construct a Chemi
A simple algorithm for extracting tree height in sparse scene from point cloud data.
TREE HEIGHT EXTRACTION IN SPARSE SCENES BASED ON UAV REMOTE SENSING This is the offical python implementation of the paper "Tree Height Extraction in
My notes on Data structure and Algos in golang implementation and python
My notes on DS and Algo Table of Contents Arrays LinkedList Trees Types of trees: Tree/Graph Traversal Algorithms Heap Priorty Queue Trie Graphs Graph
Minimal pure Python library for working with little-endian list representation of bit strings.
bitlist Minimal Python library for working with bit vectors natively. Purpose This library allows programmers to work with a native representation of
Google, Facebook, Amazon, Microsoft, Netflix tech interview questions
Algorithm and Data Structures Interview Questions HackerRank | Practice, Tutorials & Interview Preparation Solutions This repository consists of solut
Learn to code in any language. If
Learn to Code It is an intiiative undertaken by Student Ambassadors Club, Jamshoro for students who are absolute begineers in programming and want to
Multiple Imputation with Random Forests in Python
miceforest: Fast, Memory Efficient Imputation with lightgbm Fast, memory efficient Multiple Imputation by Chained Equations (MICE) with lightgbm. The
Python ts2vg package provides high-performance algorithm implementations to build visibility graphs from time series data.
ts2vg: Time series to visibility graphs The Python ts2vg package provides high-performance algorithm implementations to build visibility graphs from t
Very useful and necessary functions that simplify working with data
Additional-function-for-pandas Very useful and necessary functions that simplify working with data random_fill_nan(module_name, nan) - Replaces all sp
Student Enrollment Analysis System
SEAS Student Enrollment Analysis System Steps to start working: create a user name "seas", host name: local, password: seas, mark all checkbox - go C
A tool to nowcast quarterly data with monthly indicators: US consumption example
MIDAS_Nowcaster A tool to nowcast quarterly data with monthly indicators: US consumption example Pulls data directly from FRED from a list of codes -
Data Exfiltration without ever making a connection. Using TCP header space.
TCPwned PoC toy code to exfiltrate data without ever making a TCP connection. This will never show up in firewall logs, much less, actually be monitor
Upgini : data search library for your machine learning pipelines
Automated data search library for your machine learning pipelines → find & deliver relevant external data & features to boost ML accuracy :chart_with_upwards_trend:
A light weight data augmentation tool for training CNNs and Viola Jones detectors
hey-daug A light weight data augmentation tool for training CNNs and Viola Jones detectors (Haar Cascades). This tool inflates your data by up to six
Open source annotation tool for machine learning practitioners.
doccano doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequ
Tools for curating biomedical training data for large-scale language modeling
Tools for curating biomedical training data for large-scale language modeling
BERT Attention Analysis
BERT Attention Analysis This repository contains code for What Does BERT Look At? An Analysis of BERT's Attention. It includes code for getting attent
Norm-based Analysis of Transformer
Norm-based Analysis of Transformer Implementations for 2 papers introducing to analyze Transformers using vector norms: Kobayashi+'20 Attention is Not
The repository for the paper "When Do You Need Billions of Words of Pretraining Data?"
pretraining-learning-curves This is the repository for the paper When Do You Need Billions of Words of Pretraining Data? Edge Probing We use jiant1 fo
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
DeepSpeed+Megatron trained the world's most powerful language model: MT-530B DeepSpeed is hiring, come join us! DeepSpeed is a deep learning optimizat
Information Gain Filtration (IGF) is a method for filtering domain-specific data during language model finetuning. IGF shows significant improvements over baseline fine-tuning without data filtration.
Information Gain Filtration Information Gain Filtration (IGF) is a method for filtering domain-specific data during language model finetuning. IGF sho
The code is the training example of AAAI2022 Security AI Challenger Program Phase 8: Data Centric Robot Learning on ML models.
Example code of [Tianchi AAAI2022 Security AI Challenger Program Phase 8]
DANeS is an open-source E-newspaper dataset by collaboration between DATASET JSC (dataset.vn) and AIV Group (aivgroup.vn)
DANeS - Open-source E-newspaper dataset Source: Technology vector created by macrovector - www.freepik.com. DANeS is an open-source E-newspaper datase
[KBS] Aspect-based sentiment analysis via affective knowledge enhanced graph convolutional networks
#Sentic GCN Introduction This repository was used in our paper: Aspect-Based Sentiment Analysis via Affective Knowledge Enhanced Graph Convolutional N
clustimage is a python package for unsupervised clustering of images.
clustimage The aim of clustimage is to detect natural groups or clusters of images. Image recognition is a computer vision task for identifying and ve
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Website • Docs • Twitter • Join Slack Community What is Label Studio? Label Studio is an open source data labeling tool. It lets you label data types
Data preprocessing rosetta parser for python
datapreprocessing_rosetta_parser I've never done any NLP or text data processing before, so I wanted to use this hackathon as a learning opportunity,
Async boto3 with Autogenerated Data Classes
awspydk Async boto3 with Autogenerated JIT Data Classes Motivation This library is forked from an internal project that works with a lot of backend AW