Flexible and scalable monitoring framework

Overview

Presentation of the Shinken project

Welcome to the Shinken project.


Shinken is a modern, Nagios compatible monitoring framework, written in Python. Its main goal is to give users a flexible architecture for their monitoring system that is designed to scale to large environments.

Shinken is backwards-compatible with the Nagios configuration standard and plugins. It works on any operating system and architecture that supports Python, which includes Windows, GNU/Linux and FreeBSD.

Requirements

See the Documentation

There are mandatory and conditional requirements for the installation methods which are described below.

Installing Shinken

See the Documentation

Update

Launch:

python setup.py install --update

It will only update the Shinken lib and scripts; it won't touch your current configuration.

Running

Shinken is installed with init.d scripts, enables them at boot time, and starts them right after the install process ends. Depending on your Linux distro, you only need to run:

chkconfig --add shinken
chkconfig shinken on

or:

update-rc.d shinken defaults 20

Where is the configuration?

The configuration is in the directory /etc/shinken.

Where are the logs?

Logs are in /var/log/shinken (what did you expect?)

I found a bug; how do I launch the daemons in debug mode?

You only need to launch:

/etc/init.d/shinken -d start

Debug logs will be written to the log directory (/var/log/shinken).

I switched from Nagios, do I need to change my existing Nagios configuration?

No, there is no need to change the existing configuration - unless you want to add some new hosts and services. Once you are comfortable with Shinken you can start to use its unique and powerful features.

Learn more about how to use and configure Shinken

Jump to the Shinken documentation.

If you find a bug

Bugs are tracked in the issue list on GitHub. Always search for existing issues before filing a new one (use the search field at the top of the page). When filing a new bug, please remember to include:

  • A helpful title - use descriptive keywords in the title and body so others can find your bug (avoiding duplicates).
  • Steps to reproduce the problem, with actual vs. expected results
  • Shinken version (or if you're pulling directly from the Git repo, your current commit SHA - use git rev-parse HEAD)
  • OS version
  • If the problem happens with specific code, link to test files (gist.github.com is a great place to upload code).
  • Screenshots are very helpful if you're seeing an error message or a UI display problem. (Just drag an image into the issue description field to include it).
Comments
  • NrpeBooster Stale checks

    Reference: http://www.shinken-monitoring.org/forum/index.php/topic,502.0.html

    Poller and scheduler debug logs are in the thread.

    draustus writes:

    Shinken 1.0.1 installation with Multisite frontend, using NRPE on hosts, and the NrpeBooster module for the poller. I've noticed the NRPE checks are polled once after the Shinken service is restarted, but then never again till the next restart. I can watch the "time of the next scheduled service check" field value climb endlessly from 0 seconds, when that value should start at 3-5 minutes and shrink, as it does on non-NRPE checks.

    Also, past a certain window of time after a service restart, actively forcing a re-check via the web interface will result in a timeout, though running a check using the check_nrpe Nagios plugin will succeed. I'm not sure how much time must pass before it hits this communication breakdown, but I can try periodically testing to make a guess.

    Commenting out the "modules NrpeBooster" line of shinken-specific.cfg seems to stop Shinken from checking NRPE at all, and I'm not seeing anything in the logs either way, whether debugging is enabled or not, as to why checks aren't happening.

    This looks to be a bug with the NrpeBooster module; commenting out that line in shinken-specific.cfg and the "module_type nrpe_poller" line in commands.cfg, and falling back on the Nagios plugin check_nrpe, has made the checks succeed and decrement the time-till-next-check counter properly.

    When the time till next check field is working properly, it says something like 'in 45 seconds', but when it's not working it just says '45 seconds', for instance. Maybe that will help in tracking things down.

    Also, when this is happening it looks like forced re-checks start timing out 15 minutes after a shinken restart.

    Poller 
    opened by xkilian 63
  • Services disappearing from thruk on config change and arbiter restart

    We're having a strange issue on our current production Shinken servers. When restarting the arbiter to take into account a configuration modification, some services might not show up on Thruk. If I restart the arbiter a second time, all is well.

    I managed to double-check this: in arbiterd.log, I always get the right number of services ("[1435131526] INFO: [Shinken] Checked 1238 services"), and if I dump the Shinken configuration as a JSON file and parse it myself, I also get the right service count each time.

    It seems only Thruk/Livestatus has issues.

    Version information: Shinken 2.4, Thruk 1.88-4, latest livestatus module.

    What information can I provide to help? Note: I'm not afraid of getting my hands in the code a bit.

    opened by mobarre 51
  • Broker/LiveStatus + Thruk eat memory for breakfast

    Trying to load the "Event Log" (or "Trends" and the likes) in Thruk results in shinken-brokerd progressively exhausting the system memory and swap. Total size of the livestatus.db + archives files is 3.4GB, system has 4GB RAM and 1GB swap. Left on its own, the broker will eventually get killed by oom-killer. It looks like all data from the livestatus files are loaded in memory before doing any filtering. This cannot possibly work on production systems. Releases prior to 1.0 did not show this behavior.

    Livestatus Feature 
    opened by ghost 46
  • acknowledged satellite's service sometimes doubles

    Hi,

    I am using Shinken 1.2.0 with its livestatus module and Thruk 1.5.0. There are 2 additional realms / satellites, each of them with its own broker, reactionner, scheduler and poller. The realms are not members of the default realm; I have 3 independent realms. Most of the checks are done by these 2 satellites; the Shinken host with the arbiter is mainly used for configuration and for checking the satellites.

    Now, some critical services were reported by the satellites, so I acknowledged them and wrote a comment. Up to this point everything works fine. Then I do some configuration (for example adding a new host) and reload the arbiter. Sometimes some of the acknowledged services suddenly appear twice in Thruk. I suspected a Thruk issue, so I used telnet to connect via livestatus; it also reports the acknowledged service twice.

    The only difference between these 2 services (which should be one) is the last/next check time. One service seems to be OK and gets checked every X minutes as defined with the check_interval. The defunct service doesn't change any of its values: its last check time is about the time of the arbiter reload, and its next check time is the time of the arbiter reload plus a few minutes. This defunct service only disappears if I restart Shinken on the satellite to which it belongs.

    I already searched for similar issues but could not find any; I also looked at the logs but can't see anything that differs from normal behaviour. At the moment I am trying to reproduce this issue to provide screenshots or a more detailed description.

    Bug 
    opened by Finn10111 36
  • Debian 8: many connection errors to shinken daemons

    I often get many connection errors with a fresh install on Debian 8. It looks like #1643, but I applied the fix proposed by @naparuba here: https://github.com/naparuba/shinken/commit/e8d929c5aa8fb0e9e8b524919997715640335485

    These errors mainly concern the poller and reactionner, but sometimes the scheduler.

    [1434302669] INFO: [broker-master] Connection problem to the scheduler scheduler-master: Connection error 1 to http://localhost:7768/ : The requested URL returned error: 404 Not Found
    
    ...
    
    [1434302791] WARNING: [broker-master] Connection problem to the poller poller-master: Connection error 1 to http://localhost:7771/ : Operation timed out after 120004 milliseconds with 0 bytes received
    [1434302794] WARNING: [broker-master] Connection problem to the reactionner reactionner-master: Connection error 1 to http://localhost:7769/ : Operation timed out after 3000 milliseconds with 0 bytes received
    [1434302797] WARNING: [broker-master] Connection problem to the receiver receiver-master: Connection error 1 to http://localhost:7773/ : Operation timed out after 3001 milliseconds with 0 bytes received
    [1434302798] INFO: [broker-master] Connection OK to the scheduler scheduler-master
    [1434302798] INFO: [broker-master] Connection OK to the poller poller-master
    [1434302798] INFO: [broker-master] Connection OK to the reactionner reactionner-master
    [1434302798] INFO: [broker-master] Connection OK to the receiver receiver-master
    
    
    opened by mohierf 35
  • Worldmap plugin for Web UI

    Hi,

    I played with the Worldmap plugin in the WebUI and made some modifications in the source code, but there is a problem with the map's controls and infowindow rendering in the WebUI.

    Please see the attached screen captures. How it should be: captureok

    How it is: the controls are almost not visible and the rendering of the infowindow is incomplete. captureko

    Best regards, Fred.

    PS: Sorry, I am quite a newbie with tools like GitHub and I do not know how I should upload my source code... please explain and I will try! I do not dare test by myself because I do not want to delete or break existing files...

    opened by mohierf 35
  • Enh: Memory leak hunting

    This patch introduces several features to protect services from memory leaks in big environments, including queue throttling options, a memory watchdog and graceful restart.

    Queue throttling

    In big environments where pollers or reactionners manage many concurrent actions, a synchronization effect may appear where a single poller or reactionner gets all the scheduled actions at once. The result is a single machine drawing almost all the workload, the others doing almost nothing.

    This may lead to a chain reaction, described below (let's illustrate it with a poller; it's the same for reactionners).

    • A poller gets more actions than it can manage.
    • Under high load, the concurrency creates contention on the internal queues.
    • When contention lowers (because actions get executed), big result lists are returned to the scheduler.
    • The scheduler has to deserialize many objects at once to integrate them. As the Python interpreter does not correctly free the used memory, this may generate a memory leak: objects are deserialized into memory, processed, then deleted, but the used memory is not really freed by the interpreter.
    • Each integrated result generates a new brok, which creates huge brok lists.
    • Since the broker fetches all the broks at once, it has to deserialize huge brok lists, with the same effect as on the scheduler.

    The higher the load, the worse the effect.

    In addition, if results take too long to come back, the actions get rescheduled, which aggravates the phenomenon. This may lead to machine crashes due to memory exhaustion.

    To prevent this situation, throttling options have been added to control the number of checks a poller or reactionner may get, the number of results it may return, and the number of broks a broker may get. They are described below.

    Poller and reactionner options

    • max_q_size: defines the maximum number of actions (slots) a poller (or reactionner) may hold.
    • q_factor: same option as the previous one, but the number of available slots is calculated from the formula workers * processes_by_worker * q_factor, so it may automatically adapt to the number of CPU cores.
    • results_batch: defines the maximum number of results a poller or reactionner may return at once.
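    The q_factor sizing rule described above can be written out explicitly (the numbers below are illustrative, not Shinken defaults):

```python
def available_slots(workers, processes_by_worker, q_factor):
    """Slots a poller/reactionner offers when sized via q_factor:
    workers * processes_by_worker * q_factor (sketch of the stated formula)."""
    return workers * processes_by_worker * q_factor

# e.g. 4 worker processes, 256 actions per worker, q_factor of 2:
print(available_slots(4, 256, 2))  # -> 2048
```

Sizing from q_factor rather than a fixed max_q_size lets the queue depth scale automatically with the number of CPU cores the daemon uses.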

    Broker options

    • broks_batch: defines the maximum number of broks a broker may get at once.

    Graceful restart

    Another memory leak appears when a new configuration is sent from the arbiter. As the configuration is made of many objects linked by circular references, they are not properly garbage collected. The only way to release this memory is to restart the service.

    This patch introduces a graceful restart. If enabled, when a new configuration is received by a daemon, it forks a new daemon and sends it the received configuration on stdin, then exits to let the newly spawned instance replace it. This avoids having to wait for a new configuration after a classical restart.

    Graceful restart can be enabled by setting graceful_enabled to 1 in the daemon configuration.
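    The fork-and-handoff idea can be sketched like this (a toy model with hypothetical names, not Shinken's actual daemon code): the parent pipes the serialized configuration to a fresh interpreter's stdin, then steps aside:

```python
import sys
import subprocess

def graceful_restart(config_blob):
    """Toy sketch: spawn a fresh interpreter, hand it the serialized
    configuration on stdin, then let it replace the current process.
    (Hypothetical helper, not Shinken's implementation.)"""
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; cfg = sys.stdin.buffer.read(); "
         "sys.stdout.write('child got %d bytes' % len(cfg))"],
        stdin=subprocess.PIPE)
    child.stdin.write(config_blob)
    child.stdin.close()
    # A real daemon would exit here and let the child take over;
    # we wait for the child so the sketch is self-contained.
    return child.wait()
```

Because the child starts from a clean interpreter, none of the old circularly-referenced configuration objects survive the handoff.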

    Memory watchdog

    To protect machines from crashing because of memory exhaustion when the previous options are unset or badly set, a memory watchdog may be activated with the harakiri_threshold option in the services configuration. It automatically restarts the services when their memory usage goes above the threshold.

    Actions priority

    The patch also introduces priorities for checks, event handlers and notifications. A priority attribute may be added to commands. When daemons ask the scheduler for actions, it sends the actions with the lowest priority value first. This guarantees that important actions get executed first.
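    Lowest-value-first dispatch can be sketched with a heap (hypothetical names, not Shinken's internal scheduler API):

```python
import heapq

class ActionQueue:
    """Toy priority queue: actions with the lowest priority value
    are handed out first; ties keep FIFO order."""
    def __init__(self):
        self._heap = []
        self._seq = 0  # insertion counter used as a tie-breaker

    def push(self, action, priority):
        heapq.heappush(self._heap, (priority, self._seq, action))
        self._seq += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

q = ActionQueue()
q.push("notification", priority=10)
q.push("host_check", priority=1)
q.push("event_handler", priority=5)
print([q.pop() for _ in range(3)])  # lowest priority value first
```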

    Note: by default, the options above are not set, and the services operate as usual.

    Note 2: this patch allows managing a slightly bigger configuration on a single scheduler. It is no longer necessary to spawn many schedulers to spread the objects.

    Note 3: some of the internal data structures had to be changed to guarantee execution order.

    opened by geektophe 34
  • Arbiter doesn't write logs in its initial phase

    Hi,

    I am not able to debug any of the arbiter modules that I have tested (mod-mysql-import and dummy_arbiter). It's as if the logger class (from shinken.log import logger) weren't doing anything. Also, if I write a syntax error in the module, it isn't caught by the arbiter; I am completely blind about anything inside the module.

    I restart shinken with -d and check the "-debug" logs, but I can't see anything.

    In order to debug the modules, I had to write the following function and use it instead of logger.*. It's the only way to see what's happening inside the module.

    def log2file(log):
        with open('/tmp/shinken-module-log', 'a+') as fd:
            fd.write(log + "\n")
    

    Am I doing something wrong? Is there any kind of stdout/stderr redirection that I am missing?

    opened by dgilm 33
  • Timeperiod exclusions don't work properly with a single weekday exclusion

    Here is a simple example of the bug.

    Occurring with these two timeperiod definitions (in timeperiod.cfg):

    define timeperiod {
        timeperiod_name             my_check_period
        alias                       my_check_period
        2013-03-01 - 2020-03-01     00:00-24:00
        exclude                     tuesday
    }
    
    define timeperiod {
        timeperiod_name             tuesday
        alias                       tuesday
        tuesday                     00:00-24:00
    }
    

    With those two definitions, it was Tuesday 26/03/2013 and the next check was scheduled for Wednesday 03/04/2013 00:00:01 instead of Wednesday 27/03/2013 00:00:01.

    So there was a 1-week gap from the supposed-to-be-good next check time.

    I made it work by adding all the "void" weekdays in the tuesday timeperiod, as follows:

    define timeperiod {
        timeperiod_name             new_tuesday
        alias                       new_tuesday
        monday                      00:00-00:00
        tuesday                     00:00-24:00
        wednesday                   00:00-00:00
        thursday                    00:00-00:00
        friday                      00:00-00:00
        saturday                    00:00-00:00
        sunday                      00:00-00:00
    }
    

    Hope it helps!
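    The expected behaviour can be modelled with a toy sketch (plain datetime stepping, nothing like Shinken's real timeperiod resolution): starting inside an excluded Tuesday, the next valid time should be Wednesday 00:00, not a week later:

```python
from datetime import datetime, timedelta

def next_valid_time(start, excluded_weekday):
    """Walk forward hour by hour until we leave the excluded weekday
    (toy model of 'period minus exclusion'; real timeperiods are
    resolved from ranges, not by stepping)."""
    t = start
    while t.weekday() == excluded_weekday:
        t += timedelta(hours=1)
    return t

# Tuesday 2013-03-26 10:00 with tuesday (weekday == 1) excluded:
print(next_valid_time(datetime(2013, 3, 26, 10, 0), 1))  # 2013-03-27 00:00:00
```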

    Bug 
    opened by BusyBusinessCat 33
  • Shinken won't restart Arbiter

    Hello,

    I have a problem with my shinken-arbiter; I don't know why, but it won't restart correctly. Here is the log output:

    [1430388835] INFO: [Shinken] I am the master Arbiter: arbiter-master
    [1430388835] INFO: [Shinken] My own modules: import-glpi,ws-arbiter
    [1430388835] INFO: [Shinken] Modules directory: /var/lib/shinken/modules
    [1430388835] INFO: [Shinken] Modules directory: /var/lib/shinken/modules
    [1430388835] INFO: [Shinken] [GLPI Arbiter] Get a Simple GLPI arbiter for plugin import-glpi
    [1430388835] INFO: [Shinken] [WS_Arbiter] Configuration done, host: 192.168.1.23(7760), username: anonymous)
    [1430388835] INFO: [Shinken] Trying to init module: import-glpi
    [1430388835] INFO: [Shinken] [GLPI Arbiter] I open the GLPI connection to http://localhost/glpi/plugins/webservices/xmlrpc.php
    [1430388835] INFO: [Shinken] [GLPI Arbiter] Connection opened
    [1430388835] INFO: [Shinken] [GLPI Arbiter] Authentication in progress
    [1430388835] INFO: [Shinken] [GLPI Arbiter] Authenticated, session : 0khjnhftp53r9mae5g8gcc3ta1
    [1430388835] INFO: [Shinken] I correctly loaded the modules: [import-glpi,ws-arbiter]
    [1430388836] INFO: [Shinken] [GLPI Arbiter] Returning all data to Arbiter
    [1430388836] CRITICAL: [Shinken] I got an unrecoverable error. I have to exit.
    [1430388836] CRITICAL: [Shinken] You can get help at https://github.com/naparuba/shinken
    [1430388836] CRITICAL: [Shinken] If you think this is a bug, create a new ticket including details mentioned in the README
    [1430388836] CRITICAL: [Shinken] Back trace of the error:
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/shinken/daemons/arbiterdaemon.py", line 594, in main
        self.load_config_file()
      File "/usr/local/lib/python2.7/dist-packages/shinken/daemons/arbiterdaemon.py", line 365, in load_config_file
        self.conf.create_objects(raw_objects)
      File "/usr/local/lib/python2.7/dist-packages/shinken/objects/config.py", line 717, in create_objects
        self.create_objects_for_type(raw_objects, t)
      File "/usr/local/lib/python2.7/dist-packages/shinken/objects/config.py", line 736, in create_objects_for_type
        o = cls(obj_cfg)
      File "/usr/local/lib/python2.7/dist-packages/shinken/objects/item.py", line 96, in __init__
        val = self.properties[key].pythonize(params[key])
      File "/usr/local/lib/python2.7/dist-packages/shinken/property.py", line 182, in pythonize
        return to_int(val)
      File "/usr/local/lib/python2.7/dist-packages/shinken/util.py", line 234, in to_int
        return int(float(val))
    ValueError: could not convert string to float:
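    The last frame of the traceback pinpoints the failure: some integer property arrived empty (note the blank value after "could not convert string to float:"), and int(float("")) raises. A minimal reproduction of that conversion:

```python
def to_int(val):
    # Same conversion as the to_int in shinken/util.py shown in the trace.
    return int(float(val))

# An empty property value reproduces the reported crash:
try:
    to_int("")
except ValueError as exp:
    print("ValueError:", exp)
```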

    That's strange, because it used to restart without problems. Now it does not want to talk to GLPI and pick up the newly created services. I think the problem is in Shinken, which is why I post here.

    Here is my config: Shinken 2.2, Debian 7.

    If you want more information, I can provide it.

    Thanks for the help.

    opened by algorys 32
  • Migration from Mootools to jQuery & Twitter Bootstrap

    1 Login

    • CSS Grid migration -> done
    • iPhone / iPad password effect -> done

    2 Eltdetail

    • CSS Grid migration -> done
    • All Action Buttons -> done -- JavaScript -> done -- CSS Style -> done
    • Tab menu -> done
    • 'SOLVE THIS' highlight box, pulsate effect -> done
    • Switch Buttons -> done
    • jQuery Tooltip -> done
    • Growl-like notification (meow) -> done
    • Gesture canvas -> done

    3 Impact

    • CSS Grid migration
    • js migration -> done

    4 System

    • CSS migration
    • System Overview -- jQuery Tooltip -> done
    • Log

    5 Problems

    • CSS Grid migration -> done (even on IE!)
    • Pagination -> done
    • js migration -> done
    • get back perfometers and graphs from master -> done

    6 Dashboard

    • Drag n Drop -> done

    98 Global

    • state image style -> done
    • global menu -> done
    • global search form -> done
    • personal welcomer -> done
    • refresh does not update if backend is not ready -> done
    • more icons: Add Font Awesome -> done

    99 Style Guidelines

    opened by Frescha 29
  • Upload Shinken 2.4.4 to PyPI

    Hi,

    I saw that version 2.4.4 of Shinken was recently released, and it's exciting news after more than 6 years without any update.

    However, version 2.4.4 is not yet available on PyPI: https://pypi.org/project/Shinken/#history Thus, it is impossible to upgrade through pip.

    Would it be possible to upload version 2.4.4 to PyPI?

    Thank you

    opened by couloum 0
  • generic service template without host_name/service_description definition

    Link to this patch https://github.com/naparuba/shinken/commit/d680640dcca5aadf25bef613b1c337be6df89a90

    I upgraded to shinken 2.4.4 and have an issue loading the arbiter:

    [1668706565] CRITICAL: [Shinken] Back trace of the error: Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/Shinken-2.4.4-py2.7.egg/shinken/daemons/arbiterdaemon.py", line 633, in main
        self.load_config_file()
      File "/usr/local/lib/python2.7/dist-packages/Shinken-2.4.4-py2.7.egg/shinken/daemons/arbiterdaemon.py", line 361, in load_config_file
        self.conf.create_objects(raw_objects)
      File "/usr/local/lib/python2.7/dist-packages/Shinken-2.4.4-py2.7.egg/shinken/objects/config.py", line 1117, in create_objects
        self.create_objects_for_type(raw_objects, t)
      File "/usr/local/lib/python2.7/dist-packages/Shinken-2.4.4-py2.7.egg/shinken/objects/config.py", line 1141, in create_objects_for_type
        setattr(self, prop, clss(lst, initial_index, self.conflict_policy))
      File "/usr/local/lib/python2.7/dist-packages/Shinken-2.4.4-py2.7.egg/shinken/objects/item.py", line 741, in __init__
        self.add_items(items, index_items)
      File "/usr/local/lib/python2.7/dist-packages/Shinken-2.4.4-py2.7.egg/shinken/objects/item.py", line 763, in add_items
        self.add_template(i)
      File "/usr/local/lib/python2.7/dist-packages/Shinken-2.4.4-py2.7.egg/shinken/objects/service.py", line 1359, in add_template
        tpl = self.index_template(tpl)
      File "/usr/local/lib/python2.7/dist-packages/Shinken-2.4.4-py2.7.egg/shinken/objects/item.py", line 853, in index_template
        tpl = self.manage_conflict(tpl, name)
      File "/usr/local/lib/python2.7/dist-packages/Shinken-2.4.4-py2.7.egg/shinken/objects/item.py", line 808, in manage_conflict
        objname = "%s/%s" % (item.host_name, item.service_description)
    AttributeError: host_name
    

    If a service template definition has no host_name or service_description, the arbiter crashes at startup.

    I used a try/except to print the item: it was a generic service template without host_name/service_description set, like:

    define service {
            name                    my_service
            alias                   my-service
            use                     generic-service
            resultmodulations       unknown_is_critical
            contact_groups          admins
            check_interval          8
            retry_interval          5
            max_check_attempts      3
            register                0
    }
    

    For me this behaviour is useful; the arbiter should register this service object in this case of inheritance definition, as it did in 2.4.3.
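    A defensive fix in the spirit of the report would guard the attribute access instead of assuming both fields exist (a sketch with a stand-in class, not the actual upstream patch):

```python
def conflict_label(item):
    """Build the conflict-report name even when a service template
    omits host_name/service_description (sketch of a defensive fix,
    not Shinken's actual manage_conflict code)."""
    host = getattr(item, "host_name", "")
    desc = getattr(item, "service_description", "")
    return "%s/%s" % (host, desc)

class Tpl(object):
    """Minimal stand-in for a register-0 generic service template."""
    name = "my_service"

print(conflict_label(Tpl()))  # -> "/" instead of raising AttributeError
```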

    opened by edefaria 1
  • Install failing

    python3 ./setup.py install
      File "./setup.py", line 124
        except KeyError, exp:
                       ^
    SyntaxError: invalid syntax
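    The setup.py of this release uses Python 2-only syntax: `except KeyError, exp:` is rejected by Python 3 at parse time. The form accepted by both Python 2.6+ and Python 3 is `except ... as ...` (a generic illustration, not a patch to setup.py):

```python
def safe_lookup(d, key):
    # Python 2-only form: `except KeyError, exp:` -- a SyntaxError on Python 3.
    # The `as` form below is valid on both Python 2.6+ and Python 3.
    try:
        return d[key]
    except KeyError as exp:
        return "missing: %s" % exp

print(safe_lookup({"a": 1}, "a"))   # -> 1
print(safe_lookup({}, "b"))         # -> missing: 'b'
```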

    opened by gunner4361 0
  • use nodeset to concat host

    Hi all,

    I would like to incorporate the nodeset Python module from ClusterShell, provided by CEA HPC, into Shinken: https://github.com/cea-hpc/clustershell

    (cluster)-[root@admin ~]$ yum info clustershell
    Loaded plugins: auto-update-debuginfo, etckeeper, fastestmirror, langpacks, versionlock
    Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
    Loading mirror speeds from cached hostfile
     * epel: mirror.in2p3.fr
     * epel-debuginfo: mirror.in2p3.fr
    Installed Packages
    Name        : clustershell
    Arch        : noarch
    Version     : 1.8.4
    Release     : 1.el7
    Size        : 324 k
    Repo        : installed
    From repo   : Local-EPEL-7
    Summary     : Python framework for efficient cluster administration
    URL         : http://cea-hpc.github.io/clustershell/
    License     : LGPLv2+
    Description : ClusterShell is a set of tools and a Python library to execute commands
                : on cluster nodes in parallel depending on selected engine and worker
                : mechanisms. Advanced node sets and node groups handling methods are provided
                : to ease and improve the daily administration of large compute clusters or
                : server farms. Command line utilities like clush, clubak and nodeset (or
                : cluset) allow traditional shell scripts to take benefit of the features
                : offered by the library.
    

    This would allow us to translate something like this:

    define hostgroup {
      hostgroup_name  node
      members         node0,node1,node2,node3,node4,node5,node6,node7,node8,node9,node10
    }
    

    into something like this:

    define hostgroup {
      hostgroup_name  node
      members         node[0-10]
    }
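    The core of what this would need, expanding a bracketed member list like node[0-10] back into individual host names, can be sketched with the stdlib alone (a toy expander, far less capable than ClusterShell's real NodeSet):

```python
import re

def expand_nodeset(spec):
    """Expand a bracketed range like 'node[0-10]' into member names
    (minimal stdlib sketch of what ClusterShell's NodeSet provides)."""
    m = re.fullmatch(r"(\w+)\[(\d+)-(\d+)\]", spec)
    if not m:
        return [spec]  # plain name, nothing to expand
    prefix, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
    return ["%s%d" % (prefix, i) for i in range(lo, hi + 1)]

print(",".join(expand_nodeset("node[0-10]")))
```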
    

    Can someone help me? :)

    opened by garadar 1
  • End of Shinken?

    Good evening. I have tried in vain to install Shinken, but I ran into many problems, as often mentioned on the forum. In short, it does not work...

    • Python 3 is not supported
    • pycurl does not install into the right directory, not to mention which version to use
    • setuptools: I have some doubts about its installation
    • and I won't even mention the conditional requirements. I am on Debian and can manage, but I am far from a specialist in this area. So my question is rather simple: rather than persisting with this installation, can you tell me whether this package is obsolete?
    • if yes, do you have alternative, similar applications?
    • if no, do you have a description of a Shinken installation that works (other than the one described in the GitHub "read me")? Thank you for your reply. François
    opened by FrancoisT44 4
Owner
Gabès Jean
CEO at Shinken Solutions, Shinken & OpsBro project leader, Python and C lover, father of 3 ♥