266 Repositories
Python distributed-crawler Libraries
Building house price data pipelines with Apache Beam and Spark on GCP
This project contains the process from building a web crawler to extract the raw data of house price to create ETL pipelines using Google Could Platform services.
A tool to determine optimal projects for Gridcoin crunchers. Maximize your magnitude!
FindTheMag FindTheMag helps optimize your BOINC client for Gridcoin mining. You can group BOINC projects into two groups: "preferred" projects and "mi
Official PyTorch implementation for "Low Precision Decentralized Distributed Training with Heterogenous Data"
Low Precision Decentralized Training with Heterogenous Data Official PyTorch implementation for "Low Precision Decentralized Distributed Training with
Google Maps crawler using Selenium
Google Maps Crawler using Selenium Built as part of the Antifragile Dev Project Selenium crawler that browses Google Maps as a regular user and stores
High performance distributed framework for training deep learning recommendation models based on PyTorch.
High performance distributed framework for training deep learning recommendation models based on PyTorch.
A distributed, plug-n-play algorithm for multi-robot applications with a priori non-computable objective functions
A distributed, plug-n-play algorithm for multi-robot applications with a priori non-computable objective functions Kapoutsis, A.C., Chatzichristofis,
Aiorq is a distributed task queue with asyncio and redis
Aiorq is a distributed task queue with asyncio and redis, which rewrite from arq to make improvement and include web interface.
Python script to check if there is any differences in responses of an application when the request comes from a search engine's crawler.
crawlersuseragents This Python script can be used to check if there is any differences in responses of an application when the request comes from a se
AdaNet is a lightweight TensorFlow-based framework for automatically learning high-quality models with minimal expert intervention
AdaNet is a lightweight TensorFlow-based framework for automatically learning high-quality models with minimal expert intervention. AdaNet buil
An open source AutoML toolkit for automate machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
NNI Doc | 简体中文 NNI (Neural Network Intelligence) is a lightweight but powerful toolkit to help users automate Feature Engineering, Neural Architecture
A hyperparameter optimization framework
Optuna: A hyperparameter optimization framework Website | Docs | Install Guide | Tutorial Optuna is an automatic hyperparameter optimization software
Code for the paper "Attention Approximates Sparse Distributed Memory"
Attention Approximates Sparse Distributed Memory - Codebase This is all of the code used to run analyses in the paper "Attention Approximates Sparse D
A Pixiv web crawler module
Pixiv-spider A Pixiv spider module WARNING It's an unfinished work, browsing the code carefully before using it. Features 0004 - Readme.md updated, co
The Python agent for Apache SkyWalking
SkyWalking Python Agent SkyWalking-Python: The Python Agent for Apache SkyWalking, which provides the native tracing abilities for Python project. Sky
System Design Assignments as part of Arpit's System Design Masterclass
System Design Assignments The repository contains a set of problem statements around Software Architecture and System Design as conducted by Arpit's S
Unified Distributed Execution
Unified Distributed Execution The framework supports multiple execution backends: Ray, Dask, MPI and MultiProcessing. To run tests you need to install
Vertical Federated Principal Component Analysis and Its Kernel Extension on Feature-wise Distributed Data based on Pytorch Framework
VFedPCA+VFedAKPCA This is the official source code for the Paper: Vertical Federated Principal Component Analysis and Its Kernel Extension on Feature-
Qtas(Quite a Storage)is an experimental distributed storage system developed by Q-team in BJFU Advanced Computer Network sources.
Qtas(Quite a Storage)is a experimental distributed storage system developed by Q-team in BJFU Advanced Computer Network sources.
A multithreaded tool for searching and downloading images from popular search engines. It is straightforward to set up and run!
🕳️ CygnusX1 Code by Trong-Dat Ngo. Overviews 🕳️ CygnusX1 is a multithreaded tool 🛠️ , used to search and download images from popular search engine
Qtas(Quite a Storage)is an experimental distributed storage system developed by Q-team in BJFU Advanced Computer Network sources.
Qtas(Quite a Storage)is a experimental distributed storage system developed by Q-team in BJFU Advanced Computer Network sources.
Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation
DistMIS Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation. DistriMIS Distributing Deep Learning Hyperparameter Tuning
Generalized and Efficient Blackbox Optimization System.
OpenBox Doc | OpenBox中文文档 OpenBox: Generalized and Efficient Blackbox Optimization System OpenBox is an efficient and generalized blackbox optimizatio
Revealing and Protecting Labels in Distributed Training
Revealing and Protecting Labels in Distributed Training
Migration of Edge-based Distributed Federated Learning
FedFly: Towards Migration in Edge-based Distributed Federated Learning About the research Due to mobility, a device participating in Federated Learnin
Fast and Easy-to-use Distributed Graph Learning for PyTorch Geometric
Fast and Easy-to-use Distributed Graph Learning for PyTorch Geometric
OneFlow is a performance-centered and open-source deep learning framework.
OneFlow OneFlow is a performance-centered and open-source deep learning framework. Latest News Version 0.5.0 is out! First class support for eager exe
Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training
ColossalAI An integrated large-scale model training system with efficient parallelization techniques. arXiv: Colossal-AI: A Unified Deep Learning Syst
Crawler do site Fundamentus.com com o uso do framework scrapy, tanto da aba detalhada como a de resumo.
Crawler do site Fundamentus.com com o uso do framework scrapy, tanto da aba detalhada como a de resumo. (Todas as infomações)
Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training
ColossalAI An integrated large-scale model training system with efficient parallelization techniques Installation PyPI pip install colossalai Install
Distributed Synchronization for Python
Distributed Synchronization for Python Tutti is a nearly drop-in replacement for python's built-in synchronization primitives that lets you fearlessly
A fantasy life simulator and role-playing game hybrid distributed as CLI, written in Python 3.
Life is Fantasy Epic (LIFE) A fantasy life simulator and role-playing game hybrid distributed as CLI, written in Python 3. This repository will be pro
PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more
PyTorch Image Models Sponsors What's New Introduction Models Features Results Getting Started (Documentation) Train, Validation, Inference Scripts Awe
Deep Web Miner Python | Spyder Crawler
Webcrawler written in Python. This crawler does dig in till the 3 level of inside addressed and mine the respective data accordingly
eyes is a Public Opinion Mining System focusing on taiwanese forums such as PTT, Dcard.
eyes is a Public Opinion Mining System focusing on taiwanese forums such as PTT, Dcard. Features 🔥 Article monitor: helps you capture the trend at a
Hapi is a Python library for building Conceptual Distributed Model using HBV96 lumped model & Muskingum routing method
Current build status All platforms: Current release info Name Downloads Version Platforms Hapi - Hydrological library for Python Hapi is an open-sourc
A Python framework for developing parallelized Computational Fluid Dynamics software to solve the hyperbolic 2D Euler equations on distributed, multi-block structured grids.
pyHype: Computational Fluid Dynamics in Python pyHype is a Python framework for developing parallelized Computational Fluid Dynamics software to solve
Pytorch implementation for DFN: Distributed Feedback Network for Single-Image Deraining.
DFN:Distributed Feedback Network for Single-Image Deraining Abstract Recently, deep convolutional neural networks have achieved great success for sing
A web crawler script that crawls the target website and lists its links
A web crawler script that crawls the target website and lists its links || A web crawler script that lists links by scanning the target website.
A crawler of doubamovie
豆瓣电影 A crawler of doubamovie 一个小小的入门级scrapy框架的应用,选取豆瓣电影对排行榜前1000的电影数据进行爬取。 spider.py start_requests方法为scrapy的方法,我们对它进行重写。 def start_requests(self):
Crawler job that scrapes comments from social media posts and saves them in a S3 bucket.
Toxicity comments crawler Crawler job that scrapes comments from social media posts and saves them in a S3 bucket. Twitter Tweets and replies are scra
Audio media crawler for lbry.
Audio media crawler for lbry. Requirements Python 3.8 Poetry 1.1.7 Elasticsearch 7.14.0 Lbry-sdk 0.99.0 Development This project uses poetry as a depe
A Python package that scrapes Google News article data while remaining undetected by Google.
A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper)
Console application for downloading images from Reddit in Python
RedditImageScraper Console application for downloading images from Reddit in Python Introduction This short Python script was created for the mass-dow
👁️ Tool for Data Extraction and Web Requests.
httpmapper 👁️ Project • Technologies • Installation • How it works • License Project 🚧 For educational purposes. This is a project that I developed,
Crawler in Python 3.7, 3.8. 3.9. Pypy3
Description Python Crawler written Python 3. (Supports major Python releases Python3.6, Python3.7 and Python 3.8) Installation and Use Setup VirtualEn
The core packages of security analyzer web crawler
Security Analyzer 🐍 A large scale web crawler (considered also as vulnerability scanner tool) to take an overview about security of Moroccan sites Cu
This app will let you continuously scrape certain parts of LeasePlan and extract data of cars becoming available for lease.
LeasePlan - Scraper This app will let you continuously scrape certain parts of LeasePlan and extract data of cars becoming available for lease. It has
Experiments for distributed optimization algorithms
Network-Distributed Algorithm Experiments -- This repository contains a set of optimization algorithms and objective functions, and all code needed to
输入Google Hacking语句,自动调用Chrome浏览器爬取结果
Google-Hacking-Crawler 该脚本可输入Google Hacking语句,自动调用Chrome浏览器爬取结果 环境配置 python -m pip install -r requirements.txt 下载Chrome浏览器
Unofficial PyTorch implementation of DeepMind's Perceiver IO with PyTorch Lightning scripts for distributed training
Unofficial PyTorch implementation of DeepMind's Perceiver IO with PyTorch Lightning scripts for distributed training
Pritunl is a distributed enterprise vpn server built using the OpenVPN protocol.
Pritunl is a distributed enterprise vpn server built using the OpenVPN protocol.
A lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices, distributed modes, mixed-precision, and PyTorch extensions.
A lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices, distributed modes, mixed-precision, and PyTorch extensions.
Lingvo is a framework for building neural networks in Tensorflow, particularly sequence models.
Lingvo is a framework for building neural networks in Tensorflow, particularly sequence models.
Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286
Pytorch-DPPO Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286 Using PPO with clip loss (from https
A low-code tool that generates python crawler code based on curl or url
KKBA Intruoduction A low-code tool that generates python crawler code based on curl or url Requirement Python = 3.6 Install pip install kkba Usage Co
Python script to download all images/webms of a 4chan thread
Python3 script to continuously download all images/webms of multiple 4chan thread simultaneously - without installation
Python implementation of the IPv8 layer provide authenticated communication with privacy
Python implementation of the IPv8 layer provide authenticated communication with privacy
Pytorch implementation of Distributed Proximal Policy Optimization
Pytorch-DPPO Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286 Using PPO with clip loss (from https
A toolkit to automatically crawl the paper list and download paper pdfs of ACL Ahthology.
ACL-Anthology-Crawler A toolkit to automatically crawl the paper list and download paper pdfs of ACL Anthology
Pixiv 爬虫,使用 Python 实现。支持批量下载、上传到图床。
用 Python 实现的 Pixiv 爬虫,支持批量下载和上传。 随机图片 API: https://loliapi.ml/ Deploy Github Action 集成部署 建议使用本方法部署,相较于本地部署,无需搭建环境,全程在线上完成。并且使用国外服务器下载、上传,网络更加通畅。 Fork
A Web Scraper built with beautiful soup, that fetches udemy course information. Get udemy course information and convert it to json, csv or xml file
Udemy Scraper A Web Scraper built with beautiful soup, that fetches udemy course information. Installation Virtual Environment Firstly, it is recommen
一款利用Python来自动获取QQ音乐上某个歌手所有歌曲歌词的爬虫软件
QQ音乐歌词爬虫 一款利用Python来自动获取QQ音乐上某个歌手所有歌曲歌词的爬虫软件,默认去除了所有演唱会(Live)版本的歌曲。 使用方法 直接运行python run.py即可,然后输入你想获取的歌手名字,然后静静等待片刻。 output目录下保存生成的歌词和歌名文件。以周杰伦为例,会生成两
HyperPose is a library for building high-performance custom pose estimation applications.
HyperPose is a library for building high-performance custom pose estimation applications.
XGBoost-Ray is a distributed backend for XGBoost, built on top of distributed computing framework Ray.
XGBoost-Ray is a distributed backend for XGBoost, built on top of distributed computing framework Ray.
WAGMA-SGD is a decentralized asynchronous SGD for distributed deep learning training based on model averaging.
WAGMA-SGD is a decentralized asynchronous SGD based on wait-avoiding group model averaging. The synchronization is relaxed by making the collectives externally-triggerable, namely, a collective can be initiated without requiring that all the processes enter it. It partially reduces the data within non-overlapping groups of process, improving the parallel scalability.
Bagua is a flexible and performant distributed training algorithm development framework.
Bagua is a flexible and performant distributed training algorithm development framework.
mlscraper: Scrape data from HTML pages automatically with Machine Learning
🤖 Scrape data from HTML websites automatically with Machine Learning
Distributed DataLoader For Pytorch Based On Ray
Dpex——用户无感知分布式数据预处理组件 一、前言 随着GPU与CPU的算力差距越来越大以及模型训练时的预处理Pipeline变得越来越复杂,CPU部分的数据预处理已经逐渐成为了模型训练的瓶颈所在,这导致单机的GPU配置的提升并不能带来期望的线性加速。预处理性能瓶颈的本质在于每个GPU能够使用的C
This repository contains all the code and materials distributed in the 2021 Q-Programming Summer of Qode.
Q-Programming Summer of Qode This repository contains all the code and materials distributed in the Q-Programming Summer of Qode. If you want to creat
Secure Distributed Training at Scale
Secure Distributed Training at Scale This repository contains the implementation of experiments from the paper "Secure Distributed Training at Scale"
DistML is a Ray extension library to support large-scale distributed ML training on heterogeneous multi-node multi-GPU clusters
DistML is a Ray extension library to support large-scale distributed ML training on heterogeneous multi-node multi-GPU clusters
A parallel framework for population-based multi-agent reinforcement learning.
MALib: A parallel framework for population-based multi-agent reinforcement learning MALib is a parallel framework of population-based learning nested
A simple multi-threaded distributed SSH brute-forcing tool written in Python.
OrbitalDump A simple multi-threaded distributed SSH brute-forcing tool written in Python. How it Works When the script is executed without the --proxi
dask-sql is a distributed SQL query engine in python using Dask
dask-sql is a distributed SQL query engine in Python. It allows you to query and transform your data using a mixture of common SQL operations and Python code and also scale up the calculation easily if you need it.
discovering subdomains, hidden paths, extracting unique links
python-website-crawler discovering subdomains, hidden paths, extracting unique links pip install -r requirements.txt discover subdomain: You can give
A general-purpose multi-agent training framework.
MALib A general-purpose multi-agent training framework. Installation step1: build environment conda create -n malib python==3.7 -y conda activate mali
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
GenSen Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning Sandeep Subramanian, Adam Trischler, Yoshua B
Ray provides a simple, universal API for building distributed applications.
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
Distributed Task Queue (development branch)
Version: 5.1.0b1 (singularity) Web: https://docs.celeryproject.org/en/stable/index.html Download: https://pypi.org/project/celery/ Source: https://git
A multiprocessing distributed task queue for Django
A multiprocessing distributed task queue for Django Features Multiprocessing worker pool Asynchronous tasks Scheduled, cron and repeated tasks Signed
A Research-oriented Federated Learning Library and Benchmark Platform for Graph Neural Networks. Accepted to ICLR'2021 - DPML and MLSys'21 - GNNSys workshops.
FedGraphNN: A Federated Learning System and Benchmark for Graph Neural Networks A Research-oriented Federated Learning Library and Benchmark Platform
Automatically detect changes made to the official Telegram sites.
🕷 Telegram Web Crawler This project is developed to automatically detect changes made to the official Telegram sites. This is necessary for anticipat
Simple, fast, and parallelized symbolic regression in Python/Julia via regularized evolution and simulated annealing
Parallelized symbolic regression built on Julia, and interfaced by Python. Uses regularized evolution, simulated annealing, and gradient-free optimization.
Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation
SimplePose Code and pre-trained models for our paper, “Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation”, a
A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
Introduction This repository holds NVIDIA-maintained utilities to streamline mixed precision and distributed training in Pytorch. Some of the code her
🎛 Distributed machine learning made simple.
🎛 lazycluster Distributed machine learning made simple. Use your preferred distributed ML framework like a lazy engineer. Getting Started • Highlight
Distributed Evolutionary Algorithms in Python
DEAP DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data stru
Distributed scikit-learn meta-estimators in PySpark
sk-dist: Distributed scikit-learn meta-estimators in PySpark What is it? sk-dist is a Python package for machine learning built on top of scikit-learn
Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
Hivemind: decentralized deep learning in PyTorch Hivemind is a PyTorch library to train large neural networks across the Internet. Its intended usage
Distributed Computing for AI Made Simple
Project Home Blog Documents Paper Media Coverage Join Fiber users email list [email protected] Fiber Distributed Computing for AI Made Simp
A high performance and generic framework for distributed DNN training
BytePS BytePS is a high performance and general distributed training framework. It supports TensorFlow, Keras, PyTorch, and MXNet, and can run on eith
a distributed deep learning platform
Apache SINGA Distributed deep learning system http://singa.apache.org Quick Start Installation Examples Issues JIRA tickets Code Analysis: Mailing Lis
Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
A unified Data Analytics and AI platform for distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray What is Analytics Zoo? Analytics Zo
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective. 10x Larger Models 10x Faster Trainin
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Petastorm Contents Petastorm Installation Generating a dataset Plain Python API Tensorflow API Pytorch API Spark Dataset Converter API Analyzing petas
Distributed Deep learning with Keras & Spark
Elephas: Distributed Deep Learning with Keras & Spark Elephas is an extension of Keras, which allows you to run distributed deep learning models at sc
BigDL: Distributed Deep Learning Framework for Apache Spark
BigDL: Distributed Deep Learning on Apache Spark What is BigDL? BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can w
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Horovod Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make dis
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
Ray provides a simple, universal API for building distributed applications. Ray is packaged with the following libraries for accelerating machine lear