266 Repositories
Python distributed-crawler Libraries
A library for building and serving multi-node distributed faiss indices.
About Distributed faiss index service. A lightweight library that lets you work with FAISS indexes which don't fit into a single server memory. It fol
A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way :chestnut:
Squirrel Core Share, load, and transform data in a collaborative, flexible, and efficient way What is Squirrel? Squirrel is a Python library that enab
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
English | 简体中文 Easy Parallel Library Overview Easy Parallel Library (EPL) is a general and efficient library for distributed model training. Usability
Sky Computing: Accelerating Geo-distributed Computing in Federated Learning
Sky Computing Introduction Sky Computing is a load-balanced framework for federated learning model parallelism. It adaptively allocate model layers to
Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021)
Distributed Deep Learning in Open Collaborations This repository contains the code for the NeurIPS 2021 paper "Distributed Deep Learning in Open Colla
Dude is a very simple framework for writing web scrapers using Python decorators
Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, was to easily build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax.
Programmers-quest - Programmer's Quest! An open source MMO built on top of the Panda3D game engine and Astron server
Programmer's Quest! Programmer's Quest! The open source Python 3 2D MMORPG showc
OpenFed: A Comprehensive and Versatile Open-Source Federated Learning Framework
OpenFed: A Comprehensive and Versatile Open-Source Federated Learning Framework Introduction OpenFed is a foundational library for federated learning
Locally Differentially Private Distributed Deep Learning via Knowledge Distillation (LDP-DL)
Locally Differentially Private Distributed Deep Learning via Knowledge Distillation (LDP-DL) A preprint version of our paper: Link here This is a samp
🕷 Phone Crawler with multi-thread functionality
Phone Crawler: Phone Crawler with multi-thread functionality Disclaimer: I'm not responsible for any illegal/misuse actions, this program was made for
This repo has the source code for the crawler and data crawled from auto-data.net
This repo contains the source code for crawler and crawled data of cars specifications from autodata. The data has roughly 45k cars
Python project that aims to discover CDP neighbors and map their Layer-2 topology within a shareable medium like Visio or Draw.io.
Python project that aims to discover CDP neighbors and map their Layer-2 topology within a shareable medium like Visio or Draw.io.
Python script for crawling ResearchGate.net papers✨⭐️📎
ResearchGate Crawler Python script for crawling ResearchGate.net papers About the script This code start crawling process by urls in start.txt and giv
A new version of the CIDACS-RL linkage tool suitable to a cluster computing environment.
Fully Distributed CIDACS-RL The CIDACS-RL is a brazillian record linkage tool suitable to integrate large amount of data with high accuracy. However,
BaseCls BaseCls 是一个基于 MegEngine 的预训练模型库,帮助大家挑选或训练出更适合自己科研或者业务的模型结构
BaseCls BaseCls 是一个基于 MegEngine 的预训练模型库,帮助大家挑选或训练出更适合自己科研或者业务的模型结构。 文档地址:https://basecls.readthedocs.io 安装 安装环境 BaseCls 需要 Python = 3.6。 BaseCls 依赖 M
FedML: A Research Library and Benchmark for Federated Machine Learning
FedML: A Research Library and Benchmark for Federated Machine Learning 📄 https://arxiv.org/abs/2007.13518 News 2021-02-01 (Award): #NeurIPS 2020# Fed
Near-Optimal Sparse Allreduce for Distributed Deep Learning (published in PPoPP'22)
Near-Optimal Sparse Allreduce for Distributed Deep Learning (published in PPoPP'22) Ok-Topk is a scheme for distributed training with sparse gradients
Meme-videos - Scrapes memes and turn them into a video compilations
Meme Videos Scrapes memes from reddit using praw and request and then converts t
Amazon web scraping using Scrapy Framework
Amazon-web-scraping-using-Scrapy-Framework Scrapy Scrapy is an application framework for crawling web sites and extracting structured data which can b
Orchestrating Distributed Materials Acceleration Platform Tutorial
Orchestrating Distributed Materials Acceleration Platform Tutorial This tutorial for orchestrating distributed materials acceleration platform was pre
ShadowClone allows you to distribute your long running tasks dynamically across thousands of serverless functions and gives you the results within seconds where it would have taken hours to complete
ShadowClone allows you to distribute your long running tasks dynamically across thousands of serverless functions and gives you the results within seconds where it would have taken hours to complete
Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning
Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning. It features an imperative, define-by-run style user API.
Python script that reads Aliexpress offers urls from a Excel filename (.csv) and post then in a Telegram channel using a bot
Aliexpress to telegram post Python script that reads Aliexpress offers urls from a Excel filename (.csv) and post then in a Telegram channel using a b
GNES enables large-scale index and semantic search for text-to-text, image-to-image, video-to-video and any-to-any content form
GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural network.
Circuit Training: An open-source framework for generating chip floor plans with distributed deep reinforcement learning
Circuit Training: An open-source framework for generating chip floor plans with distributed deep reinforcement learning. Circuit Training is an open-s
Collection of machine learning related notebooks to share.
ML_Notebooks Collection of machine learning related notebooks to share. Notebooks GAN_distributed_training.ipynb In this Notebook, TensorFlow's tutori
Video Games Web Scraper is a project that crawls websites and APIs and extracts video game related data from their pages.
Video Games Web Scraper Video Games Web Scraper is a project that crawls websites and APIs and extracts video game related data from their pages. This
This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents".
Introduction This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents". If
This is a simple website crawler which asks for a website link from the user to crawl and find specific data from the given website address.
This is a simple website crawler which asks for a website link from the user to crawl and find specific data from the given website address.
A lightweight python module for building event driven distributed systems
Eventify A lightweight python module for building event driven distributed systems. Installation pip install eventify Problem Developers need a easy a
An API-first distributed deployment system of deep learning models using timeseries data to analyze and predict systems behaviour
Gordo Building thousands of models with timeseries data to monitor systems. Table of content About Examples Install Uninstall Developer manual How to
Check bookings for TUM libraries.
TUM Library Checker Only for educational purposes This repository contains a crawler to save bookings for TUM libraries in a CSV file. Sample data fro
A web crawler for recording posts in "sina weibo"
Web Crawler for "sina weibo" A web crawler for recording posts in "sina weibo" Introduction This script helps collect attributes of posts in "sina wei
Distributed deep learning on Hadoop and Spark clusters.
Note: we're lovingly marking this project as Archived since we're no longer supporting it. You are welcome to read the code and fork your own version
A Telegram crawler to search groups and channels automatically and collect any type of data from them.
Introduction This is a crawler I wrote in Python using the APIs of Telethon months ago. This tool was not intended to be publicly available for a numb
Develop and deploy applications with the Ionburst Cloud Python SDK.
Ionburst SDK for Python The Ionburst SDK for Python enables developers to easily integrate with Ionburst Cloud, building in ultra-secure and private o
BigDL - Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems
Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems.
High available distributed ip proxy pool, powerd by Scrapy and Redis
高可用IP代理池 README | 中文文档 本项目所采集的IP资源都来自互联网,愿景是为大型爬虫项目提供一个高可用低延迟的高匿IP代理池。 项目亮点 代理来源丰富 代理抓取提取精准 代理校验严格合理 监控完备,鲁棒性强 架构灵活,便于扩展 各个组件分布式部署 快速开始 注意,代码请在release
A distributed crawler for weibo, building with celery and requests.
A distributed crawler for weibo, building with celery and requests.
RLMeta is a light-weight flexible framework for Distributed Reinforcement Learning Research.
RLMeta rlmeta - a flexible lightweight research framework for Distributed Reinforcement Learning based on PyTorch and moolib Installation To build fro
Build a small, 3 domain internet using Github pages and Wikipedia and construct a crawler to crawl, render, and index.
TechSEO Crawler Build a small, 3 domain internet using Github pages and Wikipedia and construct a crawler to crawl, render, and index. Play with the r
Distributed Grid Descent: an algorithm for hyperparameter tuning guided by Bayesian inference, designed to run on multiple processes and potentially many machines with no central point of control
Distributed Grid Descent: an algorithm for hyperparameter tuning guided by Bayesian inference, designed to run on multiple processes and potentially many machines with no central point of control.
Distributed-systems-algos - Distributed Systems Algorithms For Python
Distributed Systems Algorithms ISIS algorithm In an asynchronous system that kee
Bigdata - This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster
Scrapy Cluster This Scrapy project uses Redis and Kafka to create a distributed
This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster
This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
An distributed automation framework.
Automation Kit Repository Welcome to the Automation Kit repository! Note: This package is progressing quickly but is not yet ready for full production
Multilingual word vectors in 78 languages
Aligning the fastText vectors of 78 languages Facebook recently open-sourced word vectors in 89 languages. However these vectors are monolingual; mean
Code for a real-time distributed cooperative slam(RDC-SLAM) system for ROS compatible platforms.
RDC-SLAM This repository contains code for a real-time distributed cooperative slam(RDC-SLAM) system for ROS compatible platforms. The system takes in
Instrument asyncio Python for distributed tracing with AWS X-Ray.
xraysink (aka xray-asyncio) Extra AWS X-Ray instrumentation to use distributed tracing with asyncio Python libraries that are not (yet) supported by t
Amazon scraper using scrapy, a python framework for crawling websites.
#Amazon-web-scraper This is a python program, which use scrapy python framework to crawl all pages of the product and scrap products data. This progra
An off-line judger supporting distributed problem repositories
Thaw 中文 | English Thaw is an off-line judger supporting distributed problem repositories. Everyone can use Thaw release problems with license on GitHu
Create crawler get some new products with maximum discount in banimode website
crawler-banimode create crawler and get some new products with maximum discount in banimode website. این پروژه کوچک جهت یادگیری و کار با ابزار سلنیوم
Python Actor concurrency library
Thespian Actor Library This library provides the framework of an Actor model for use by applications implementing Actors. Thespian Site with Documenta
Screen scraping and web crawling framework
Pomp Pomp is a screen scraping and web crawling framework. Pomp is inspired by and similar to Scrapy, but has a simpler implementation that lacks the
Python Testing Crawler 🐍 🩺 🕷️ A crawler for automated functional testing of a web application
Python Testing Crawler 🐍 🩺 🕷️ A crawler for automated functional testing of a web application Crawling a server-side-rendered web application is a
coURLan: Clean, filter, normalize, and sample URLs
coURLan: Clean, filter, normalize, and sample URLs Why coURLan? “Given that the bandwidth for conducting crawls is neither infinite nor free, it is be
Free and Open, Distributed, RESTful Search Engine
Elasticsearch Elasticsearch is the distributed, RESTful search and analytics engine at the heart of the Elastic Stack. You can use Elasticsearch to st
Distributed behavioral experiments
Autopilot Docs Paper Forum Hardware Autopilot is a Python framework for performing complex, hardware-intensive behavioral experiments with swarms of n
A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
Introduction This is a Python package available on PyPI for NVIDIA-maintained utilities to streamline mixed precision and distributed training in Pyto
A pythonic interface to high-throughput virtual screening software
pyscreener A pythonic interface to high-throughput virtual screening software Overview This repository contains the source of pyscreener, both a libra
Crawl BookCorpus
These are scripts to reproduce BookCorpus by yourself.
A pure-Python KSUID implementation
Svix - Webhooks as a service Svix-KSUID This library is inspired by Segment's KSUID implementation: https://github.com/segmentio/ksuid What is a ksuid
Clearly see and debug your celery cluster in real time!
Clearly see and debug your celery cluster in real time! Do you use celery, and monitor your tasks with flower? You'll probably like Clearly! 👍 Clearl
Deep Distributed Control of Port-Hamiltonian Systems
De(e)pendable Distributed Control of Port-Hamiltonian Systems (DeepDisCoPH) This repository is associated to the paper [1] and it contains: The full p
This is a web crawler that works on employ email data by gmane.org and visualizes it in different ways.
crawler_to_visual_gmane Analyzing an EMAIL Archive from gmane and vizualizing the data using the D3 JavaScript library. This is a set of tools that al
A full pipeline AutoML tool for tabular data
HyperGBM Doc | 中文 We Are Hiring! Dear folks,we are offering challenging opportunities located in Beijing for both professionals and students who are k
A distributed block-based data storage and compute engine
Nebula is an extremely-fast end-to-end interactive big data analytics solution. Nebula is designed as a high-performance columnar data storage and tabular OLAP engine.
PaperRobot: a paper crawler that can quickly download numerous papers, facilitating paper studying and management
PaperRobot PaperRobot 是一个论文抓取工具,可以快速批量下载大量论文,方便后期进行持续的论文管理与学习。 PaperRobot通过多个接口抓取论文,目前抓取成功率维持在90%以上。通过配置Config文件,可以抓取任意计算机领域相关会议的论文。 Installation Down
A small distributed download manager to help bypass device-specific bandwidth limitations.
Distributed Download Manager A small distributed download manager to help bypass device-specific bandwidth limitations. Architecture The download mana
Efficient and Scalable Physics-Informed Deep Learning and Scientific Machine Learning on top of Tensorflow for multi-worker distributed computing
Notice: Support for Python 3.6 will be dropped in v.0.2.1, please plan accordingly! Efficient and Scalable Physics-Informed Deep Learning Collocation-
The versatile ocean simulator, in pure Python, powered by JAX.
Veros is the versatile ocean simulator -- it aims to be a powerful tool that makes high-performance ocean modeling approachable and fun. Because Veros
An Amazon Product Scraper built using scapy module of python
Amazon Product Scraper This is an Amazon Product Scraper built using scapy module of python Features it scrape various things Product Title Product Im
A high-performance distributed deep learning system targeting large-scale and automated distributed training.
HETU Documentation | Examples Hetu is a high-performance distributed deep learning system targeting trillions of parameters DL model training, develop
Distributed algorithms, reimplemented for fun and practice
Distributed Algorithms Playground for reimplementing and experimenting with algorithms for distributed computing. Usage Running the code for Ring-AllR
ProxyBroker is an open source tool that asynchronously finds public proxies from multiple sources and concurrently checks them
ProxyBroker is an open source tool that asynchronously finds public proxies from multiple sources and concurrently checks them. Features F
A dead simple crawler to get books information from Douban.
Introduction A dead simple crawler to get books information from Douban. Pre-requesites Python 3 Install dependencies from requirements.txt (Optional)
A dead simple crawler to get books information from Douban.
Introduction A dead simple crawler to get books information from Douban. Pre-requesites Python 3 Install dependencies from requirements.txt (Optional)
Reproducible Data Science at Scale!
Pachyderm: The Data Foundation for Machine Learning Pachyderm provides the data layer that allows machine learning teams to productionize and scale th
Version three of the Accounting Project. You can now connect multiple computers together who are on the same IP (I'm sure I could set it up so it would work on different IP's) and add to a distributed ledger verified by blockchain.
Accounting_Cycle_V3 As I talked about in the second iteration of the accoutning project, I was going to add networking capabilities to the project. Re
Federated Deep Reinforcement Learning for the Distributed Control of NextG Wireless Networks.
FDRL-PC-Dyspan Federated Deep Reinforcement Learning for the Distributed Control of NextG Wireless Networks. This repository contains the entire code
Pretrained Cost Model for Distributed Constraint Optimization Problems
Pretrained Cost Model for Distributed Constraint Optimization Problems Requirements PyTorch 1.9.0 PyTorch Geometric 1.7.1 Directory structure baseline
FedTorch is an open-source Python package for distributed and federated training of machine learning models using PyTorch distributed API
FedTorch is a generic repository for benchmarking different federated and distributed learning algorithms using PyTorch Distributed API.
A High-Performance Distributed Library for Large-Scale Bundle Adjustment
MegBA: A High-Performance and Distributed Library for Large-Scale Bundle Adjustment This repo contains an official implementation of MegBA. MegBA is a
Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition
Wav2Vec2 STT Python Beta Software Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 mode
A Docker image for plotting and farming the Chia™ cryptocurrency on one computer or across many.
An easy-to-use WebUI for crypto plotting and farming. Offers Plotman, MadMax, Chiadog, Bladebit, Farmr, and Forktools in a Docker container. Supports Chia, Cactus, Chives, Flax, Flora, HDDCoin, Maize, N-Chain, Staicoin, and Stor among others.
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
DeepSpeed+Megatron trained the world's most powerful language model: MT-530B DeepSpeed is hiring, come join us! DeepSpeed is a deep learning optimizat
Distributed, blockchain based hashtables middleware for deduplication of file uploads to the cloud
distributed-blockchain-based-secure-file-dedupe Searching is Distributed, Block and Access List for each upload is unique and it is stored in a single
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models This repository is the official implementation of the fol
Demo code for "Logs in distributed systems" webinar
Hexlet Logs Demo Пререквизиты docker-compose python3 Учетка в DataDog Базовое понимание, что такое логи (можно почитать гайд
About Solve CTF offline disconnection problem - based on python3's small crawler
About Solve CTF offline disconnection problem - based on python3's small crawler, support keyword search and local map bed establishment, currently support Jianshu, xianzhi,anquanke,freebuf,seebug
Rottentomatoes, Goodreads and IMDB sites crawler. Semantic Web final project.
Crawler Rottentomatoes, Goodreads and IMDB sites crawler. Crawler written by beautifulsoup, selenium and lxml to gather books and films information an
Simple and Distributed Machine Learning
Synapse Machine Learning SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. Sy
High performance distributed framework for training deep learning recommendation models based on PyTorch.
PERSIA (Parallel rEcommendation tRaining System with hybrId Acceleration) is developed by AI platform@Kuaishou Technology, collaborating with ETH. It
SPTAG: A library for fast approximate nearest neighbor search
SPTAG: A library for fast approximate nearest neighbor search SPTAG SPTAG (Space Partition Tree And Graph) is a library for large scale vector approxi
Problem statements on System Design and Software Architecture as part of Arpit's System Design Masterclass
Problem statements on System Design and Software Architecture as part of Arpit's System Design Masterclass
The pure and clear PyTorch Distributed Training Framework.
The pure and clear PyTorch Distributed Training Framework. Introduction Requirements and Usage Dependency Dataset Basic Usage Slurm Cluster Usage Base
Reinforcement learning library(framework) designed for PyTorch, implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...
Automatic, Readable, Reusable, Extendable Machin is a reinforcement library designed for pytorch. Build status Platform Status Linux Windows Supported
Management of exclusive GPU access for distributed machine learning workloads
TensorHive is an open source tool for managing computing resources used by multiple users across distributed hosts. It focuses on granting
A distributed deep learning framework that supports flexible parallelization strategies.
FlexFlow FlexFlow is a deep learning framework that accelerates distributed DNN training by automatically searching for efficient parallelization stra