24 Repositories
Python clusters Libraries
BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions
BERTopic BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable
Static Features Classifier - A static features classifier for Point-Cloud clusters using an Attention-RNN model
Static Features Classifier This is a static features classifier for Point-Cloud
Distributed deep learning on Hadoop and Spark clusters.
Note: we're lovingly marking this project as Archived since we're no longer supporting it. You are welcome to read the code and fork your own version
The full training script for Enformer (TensorFlow Sonnet) on TPU clusters
Enformer TPU training script (wip) The full training script for Enformer (TensorFlow Sonnet) on TPU clusters, in an effort to migrate the model to pyt
Joint Unsupervised Learning (JULE) of Deep Representations and Image Clusters.
Joint Unsupervised Learning (JULE) of Deep Representations and Image Clusters. Overview This project is a Torch implementation for our CVPR 2016 paper
Project Faros is a reference implementation of Red Hat OpenShift 4 on small footprint, bare-metal clusters.
Project Faros Project Faros is a reference implementation of Red Hat OpenShift 4 on small footprint, bare-metal clusters. The project includes referen
Execute shell command lines in parallel on Slurm, S(on) of Grid Engine (SGE), PBS/Torque clusters
qbatch Execute shell command lines in parallel on Slurm, S(on) of Grid Engine (SGE), PBS/Torque clusters qbatch is a tool for executing commands in pa
Predicting Baseball Metric Clusters: Clustering Application in Python Using scikit-learn
Clustering Clustering Application in Python Using scikit-learn This repository contains the prediction of baseball metric clusters using MLB Statcast
Automatic tool focused on deriving metallicities of open clusters
metalcode Automatic tool focused on deriving metallicities of open clusters. Based on the method described in Pöhnl & Paunzen (2010, https://ui.adsabs
Tool for synchronizing ClickHouse clusters
clicksync Tool for synchronizing ClickHouse clusters: works only with partitioned MergeTree tables; can sync clusters with different node number; uses in
The Fundamental Clustering Problems Suite (FCPS) summarizes 54 state-of-the-art clustering algorithms, common cluster challenges and estimations of the number of clusters as well as the testing for cluster tendency.
FCPS Fundamental Clustering Problems Suite The package provides over sixty state-of-the-art clustering algorithms for unsupervised machine learning pu
Buckshot++ is a new algorithm that finds highly stable clusters efficiently.
Buckshot++: An Outlier-Resistant and Scalable Clustering Algorithm. (Inspired by the Buckshot Algorithm.) Here, we introduce a new algorithm, which we
The AKS cluster provisioner provisions AKS clusters :-)
Overview The AKS cluster provisioner provisions AKS clusters :-) It uses the Azure CLI to configure VNet and subnets before creating the cluster itsel
sysctl/sysfs settings on the fly for a Kubernetes cluster. No restarts are required for clusters and nodes.
SysBindings Daemon A small toolkit for controlling the sysctl/sysfs bindings on a Kubernetes cluster on the fly and without unnecessary restarts of cluster o
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
TensorFlowOnSpark TensorFlowOnSpark brings scalable deep learning to Apache Hadoop and Apache Spark clusters. By combining salient features from the T
CLabel is a terminal-based cluster labeling tool that allows you to explore text data interactively and label clusters based on reviewing that data.
CLabel is a terminal-based cluster labeling tool that allows you to explore text data interactively and label clusters based on reviewing that
Joint parameterization and fitting of stroke clusters
StrokeStrip: Joint Parameterization and Fitting of Stroke Clusters Dave Pagurek van Mossel1, Chenxi Liu1, Nicholas Vining1,2, Mikhail Bessmeltsev3, Al
DistML is a Ray extension library to support large-scale distributed ML training on heterogeneous multi-node multi-GPU clusters
A very lightweight monitoring system for Raspberry Pi clusters running Kubernetes.
OMNI A very lightweight monitoring system for Raspberry Pi clusters running Kubernetes. Why? When I finished my Kubernetes cluster using a few Raspber
Massively parallel self-organizing maps: accelerate training on multicore CPUs, GPUs, and clusters
Somoclu Somoclu is a massively parallel implementation of self-organizing maps. It exploits multicore CPUs, it is able to rely on MPI for distributing
Python 3.6+ toolbox for submitting jobs to Slurm
Submit it! What is submitit? Submitit is a lightweight tool for submitting Python functions for computation within a Slurm cluster. It basically wraps
Flower is a web based tool for monitoring and administrating Celery clusters.
Real-time monitor and web admin for Celery distributed task queue