A very lightweight monitoring system for Raspberry Pi clusters running Kubernetes.


OMNI


Why?

When I finished building my Kubernetes cluster out of a few Raspberry Pis, the first thing I wanted to do was install Prometheus + Grafana for monitoring, and so I did. But once I had all of it working, I found a few drawbacks:

  • The Prometheus exporter pods use a lot of RAM
  • The Prometheus exporter pods use a considerable amount of CPU
  • Prometheus gathers way too much data that I don't really need.
  • The node where the main Prometheus pod is installed receives all of the information and saves it in its own database, constantly performing a lot of writes to the SD card. SD cards tend to die under constant write load.

Last but not least, I like to learn how these things work.

Advantages

Omni has (what I consider) some advantages over the regular Prometheus + Grafana combo:

  • It uses almost no RAM (13 MB)
  • It uses almost no CPU
  • It gathers only the information I need
  • All of the information is sent to an InfluxDB instance that can live outside of the cluster. This means that no information is persisted on the Pis, which extends the lifetime of their SD cards.
  • InfluxDB acts as the database and the graph dashboard at the same time, so there is no need to also install Grafana (although you could if you wanted to).
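To make the InfluxDB side a bit more concrete, here is a minimal sketch of how a single reading could be encoded in InfluxDB's line protocol, the text format InfluxDB accepts for writes. This is not Omni's actual code: the measurement name node_stats, the host tag and the field names are made up for illustration, and only the standard library is used.

```python
def escape_key(value):
    """Escape commas, equals signs and spaces in tag keys, tag values and field keys."""
    return str(value).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def format_field(value):
    """Render a field value using line protocol typing rules."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Anything else is sent as a double-quoted string.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = measurement.replace(",", "\\,").replace(" ", "\\ ")
    for key in sorted(tags):  # sorted tag keys keep the output stable
        head += f",{escape_key(key)}={escape_key(tags[key])}"
    body = ",".join(f"{escape_key(k)}={format_field(v)}" for k, v in fields.items())
    return f"{head} {body} {timestamp_ns}"
```

For example, calling to_line_protocol("node_stats", {"host": "pi-node-1"}, {"cpu_percent": 12.5, "mem_used_mb": 13}, 1600000000000000000) produces the single line node_stats,host=pi-node-1 cpu_percent=12.5,mem_used_mb=13i 1600000000000000000. One short line per report is all that has to leave the node, so nothing needs to be stored locally.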

Prerequisites

For Omni to work, you'll need to have a couple of things running first.

InfluxDB

It's a time series database (just like Prometheus) that comes with nice charts and a good UI overall.

One of the goals of this project is to avoid constant writing to the SD cards, so you have a few options for the placement of the database:

  1. Use InfluxDB's online service (there is even a free tier https://www.influxdata.com/influxdb-pricing/)
  2. Run an InfluxDB instance on a server outside the Pi cluster (this is what I'm doing right now)
  3. If you have better storage in your cluster (like M.2, SSD, etc.) and don't have the SD card limitation, run InfluxDB in the same cluster.

Libraries

You'll need to have the libseccomp2 library installed on each of your nodes to avoid a Python error:

Fatal Python Error: pyinit_main: can't initialize time

(more info here)

You can install it in either of two ways (only one is needed):

  • Ansible: all nodes at the same time

    Edit the file ansible-playbook-libs.yaml in this repo, add your hosts and run:

    ansible-playbook ansible-playbook-libs.yaml
  • SSH: one by one

    Connect into each of your nodes and run:

    wget http://ftp.us.debian.org/debian/pool/main/libs/libseccomp/libseccomp2_2.5.1-1_armhf.deb
    sudo dpkg -i libseccomp2_2.5.1-1_armhf.deb

Once you have it, everything should work ok.
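Note that the package above is the armhf (32-bit) build. On a 64-bit OS, dpkg rejects it with "package architecture (armhf) does not match system (arm64)", and the arm64 build is needed instead. The sketch below shows one way to pick the right file name from the machine type that Python reports. The mapping table and the function name libseccomp_deb_name are assumptions for illustration and not part of this repo, and the version is simply the one used above, which may no longer be available on the mirror.

```python
import platform

# Assumed mapping from platform.machine() values to Debian architecture names.
DEBIAN_ARCH = {
    "aarch64": "arm64",  # 64-bit ARM Linux
    "arm64": "arm64",
    "armv7l": "armhf",   # 32-bit ARM Linux
}


def libseccomp_deb_name(machine=None, version="2.5.1-1"):
    """Return the libseccomp2 .deb file name for the given (or current) machine type."""
    machine = machine or platform.machine()
    try:
        arch = DEBIAN_ARCH[machine]
    except KeyError:
        raise ValueError(f"unsupported machine type: {machine!r}") from None
    return f"libseccomp2_{version}_{arch}.deb"
```

Calling libseccomp_deb_name() with no arguments on a node uses that node's own machine type. Running dpkg --print-architecture on the node is another way to see which build it expects.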

Installation

Before deploying Omni you'll have to specify the attributes of your InfluxDB instance.

  1. Open omni-install.yaml and fill the variables with your InfluxDB instance information.

    NOTE: The attribute OMNI_DATA_RATE_SECONDS sets the number of seconds between the data reports sent to the InfluxDB server.

  2. Deploy Omni with kubectl apply -f omni-install.yaml and check that everything is running as expected:

kubectl get all -n omni-system

And you are done! 🎉
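To illustrate what OMNI_DATA_RATE_SECONDS controls, here is a rough sketch of a reporting loop driven by that variable. It is only a guess at the general shape, not Omni's real implementation: the functions read_data_rate and run_reporter, the collect and send callbacks, and the 5 second fallback are all hypothetical.

```python
import os
import time


def read_data_rate(environ=os.environ, default=5.0):
    """Return the reporting interval in seconds from OMNI_DATA_RATE_SECONDS.

    Falls back to `default` (an assumed value) when the variable is
    missing, not a number, or not positive.
    """
    raw = environ.get("OMNI_DATA_RATE_SECONDS", "")
    try:
        seconds = float(raw)
    except ValueError:
        return default
    return seconds if seconds > 0 else default


def run_reporter(collect, send, interval, cycles, sleep=time.sleep):
    """Send `cycles` reports, pausing `interval` seconds between them.

    `collect` gathers one reading and `send` ships it to the database.
    Both are placeholders supplied by the caller.
    """
    for i in range(cycles):
        send(collect())
        if i < cycles - 1:  # no pause needed after the last report
            sleep(interval)
```

A larger value means fewer writes and less network traffic, at the cost of coarser charts. Passing sleep as a parameter keeps the loop easy to test without actually waiting.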

Contributions

Pull requests with improvements and new features are more than welcome.

Comments
  • Is this system compatible with VictoriaMetrics?

    I know that it is common practice to write data into InfluxDB using the Influx protocol. I also know that InfluxDB has issues with high RAM consumption on the device it runs on. Is this system compatible with VictoriaMetrics? It can be run as a simple binary file and, as far as I know, it requires less memory and CPU.

    opened by denisgolius 4
  • armhf vs arm64

    Hey, This is really cool - thanks for all the hard work :smile:

    Just thought I'd raise this, as it was a small issue I hit.

    My hardware:

    • 1 x Raspberry Pi 4 (8 GB)
    • 2 x Raspberry Pi 3B+ (1 GB)
    • 7 x Raspberry Pi Compute Module 3+ (1 GB)

    All running Ubuntu Server 20.04 LTS 64-bit.

    On trying to install using your Ansible playbook, I got this error across all nodes:

    fatal: [node2]: FAILED! => {"changed": true, "cmd": ["dpkg", "-i", "libseccomp2_2.5.1-1_armhf.deb"], "delta": "0:00:00.237499", "end": "2021-06-18 14:07:28.893279", "msg": "non-zero return code", "rc": 1, "start": "2021-06-18 14:07:28.655780", "stderr": "dpkg: error processing archive libseccomp2_2.5.1-1_armhf.deb (--install):\n package architecture (armhf) does not match system (arm64)\nErrors were encountered while processing:\n libseccomp2_2.5.1-1_armhf.deb", "stderr_lines": ["dpkg: error processing archive libseccomp2_2.5.1-1_armhf.deb (--install):", " package architecture (armhf) does not match system (arm64)", "Errors were encountered while processing:", " libseccomp2_2.5.1-1_armhf.deb"], "stdout": "", "stdout_lines": []}

    I just switched the URL to point to an arm64 binary instead and all was fine. I was going to make the change and open a pull request, but I appreciate you may want to keep the current armhf deb in place for a number of reasons! If not, let me know and I'll get a pull request opened!

    For anyone who may come across this issue in future, my fix was updating the ansible-playbook-libs.yaml file to the following:

      - name: Download libseccomp2.deb
        command: "wget http://ftp.us.debian.org/debian/pool/main/libs/libseccomp/libseccomp2_2.5.1-1_arm64.deb"
        become: yes
    
      - name: Install libseccomp2.deb
        command: "dpkg -i libseccomp2_2.5.1-1_arm64.deb"
        become: yes
    

    Thanks again!

    opened by DrewKnowles1 3
  • Dashboard

    Hello, first of all thank you so much. This repo is great for a small start on monitoring Raspberry Pis.

    I couldn't find the InfluxDB dashboard shown in the README anywhere in the repo. Am I missing something, or did you simply not add the dashboard to the project?

    opened by unalkalkan 2
  • Pod specific statistics - probably a feature request

    Hi,

    I really like the lightweight nature of Omni. It was simple to install as well. Like you, I tried to run Prometheus on my 3-Pi k3s cluster, and while it would run for a day or so, it would soon crash no matter how many resources I gave it.

    Would it be possible to add pod-specific statistics to Omni? I'd like to be able to make sure that individual pods don't run out of disk space, and also see how much CPU they consume over time. I do have the Kubernetes dashboard installed, but I'd like the flexibility of setting up alerts etc.

    Thanks!

    P.S. It was awesome hooking up Omni to the Influx free cloud service. I had no idea that existed, and it's great for my small use case!

    opened by alarys 1
  • 404 on libseccomp2 library wget url

    The library URL http://ftp.us.debian.org/debian/pool/main/libs/libseccomp/libseccomp2_2.5.1-1_armhf.deb used by the wget command in both the README and the Ansible playbook now returns 404.

    I am figuring out the update and will open a PR, but if anyone else gets this error, I am poking around on the packages webpage here to find the updated URL.

    For now, I have 64-bit Ubuntu 21.10 on my RPi 4 cluster, so the latest version was installed anyway and all worked great. Thanks again for this project.

    opened by tonysurma 1
  • Omni not passing token?

    This is great. I did EXACTLY what you did: I set up Prometheus and Grafana and nearly killed my 4-node RPi cluster. When I set everything up, I ended up seeing this error in the omni pod logs:

      return self.request("POST", url,
      File "/usr/local/lib/python3.8/site-packages/influxdb_client/rest.py", line 250, in request
      raise ApiException(http_resp=r)
      influxdb_client.rest.ApiException: (401)
      Reason: Unauthorized
    

    There is a corresponding error in influxdb:

      ts=2021-10-13T17:10:18.772914Z lvl=info msg=Unauthorized log_id=0XAaFye0000 error="token required"
    

    I've triple-checked, and tried a couple of different kinds of tokens. Have you seen any common reasons for this? Thanks in advance for any help, if I can end up helping here I'll do so but I'm pretty rusty.

    opened by spanko 2
Owner
Matias Godoy
Jack of all trades, master of none