Diamond is a python daemon that collects system metrics and publishes them to Graphite (and others). It is capable of collecting cpu, memory, network, i/o, load and disk metrics. Additionally, it features an API for implementing custom collectors for gathering metrics from almost any source.

Last update: Jan 5, 2023


Overview


Getting Started

Steps to getting started:

Read the documentation
Install via pip install diamond. The releases on GitHub are not recommended for use. Use pypi-install diamond on Debian/Ubuntu systems with python-stdeb installed to build packages.
Copy the diamond.conf.example file to diamond.conf.
Optional: Run diamond-setup to help set collectors in diamond.conf.
Modify diamond.conf for your needs.
Run diamond with one of: diamond or initctl start diamond or /etc/init.d/diamond restart.

Success Stories

Diamond has successfully been deployed to a cluster of 1000 machines pushing 3 million points per minute.
Diamond is deployed on Fabric's infrastructure, polling hundreds of metric sources and pushing millions of points per minute.
Have a story? Please share!

Repos

Historically, Diamond was a Brightcove project hosted at BrightcoveOS. However, none of the active developers are Brightcove employees, so development has moved to python-diamond. We request that any new pull requests and issues be cut against python-diamond. We will keep BrightcoveOS updated and still honor issues/tickets cut on that repo.

Diamond Related Projects

Related Projects

Contact

IRC #python-diamond on freenode. Webchat
Mailing List [email protected] - Email the list and you will automatically subscribe. Archive

Comments

Removing dev/proc/sys mount restriction from DiskSpaceCollector

We're trying to collect data about a tmpfs mount at /dev and under /sys and diskspace.py will skip right over those mount points. There's no comment as to why this is done, unlike the previous checks, and given issue#262 I wouldn't be surprised if the original purpose has been lost. We aren't seeing any issue when we manually do the heavy lifting:

>>> import os
>>> def test(string):
...     stat = os.stat(string)
...     major = os.major(stat.st_dev)
...     minor = os.minor(stat.st_dev)
...     print stat
...     print major
...     print minor
...
>>> test('/dev')
posix.stat_result(st_mode=16877, st_ino=1025, st_dev=5L, st_nlink=14, st_uid=0, st_gid=0, st_size=4280, st_atime=1456860792, st_mtime=1460050361, st_ctime=1460050361)
0
5
>>> test('/run')
posix.stat_result(st_mode=16877, st_ino=1212, st_dev=16L, st_nlink=20, st_uid=0, st_gid=0, st_size=700, st_atime=1456868938, st_mtime=1460151861, st_ctime=1460151861)
0
16

This check should be removed if it doesn't have a purpose. If it does then there should at least be a comment or abstraction to explain why this restriction is in place.

To give some context, these are our tmpfs mount locations that would be great to monitor:

$ sudo cat /proc/mounts | grep tmpfs
udev /dev devtmpfs rw,relatime,size=24650316k,nr_inodes=6162579,mode=755 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=4932236k,mode=755 0 0
none /sys/fs/cgroup tmpfs rw,relatime,size=4k,mode=755 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
none /run/user tmpfs rw,nosuid,nodev,noexec,relatime,size=102400k,mode=755 0 0
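
As a rough illustration of the "heavy lifting" described above, the sketch below picks tmpfs-style mounts out of /proc/mounts text and reads their size with os.statvfs. It is written for Python 3 and is not taken from diskspace.py; the function names and the extra ext4 sample line are assumptions made for the example.

```python
import os


def parse_mounts(text, fs_types=("tmpfs", "devtmpfs")):
    """Return (device, mount_point, fs_type) tuples for the wanted filesystem types."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = fields[:3]
        if fs_type in fs_types:
            mounts.append((device, mount_point, fs_type))
    return mounts


def usage(mount_point):
    """Return (total_bytes, available_bytes) for a mount point via statvfs."""
    st = os.statvfs(mount_point)
    return st.f_blocks * st.f_frsize, st.f_bavail * st.f_frsize


# Two lines from the listing above plus one hypothetical ext4 line
# that the filter is expected to skip.
SAMPLE = """\
udev /dev devtmpfs rw,relatime,size=24650316k,nr_inodes=6162579,mode=755 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=4932236k,mode=755 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""

if __name__ == "__main__":
    for device, mount_point, fs_type in parse_mounts(SAMPLE):
        print(device, mount_point, fs_type)
```

On a Linux host, usage('/run') can be called on each selected mount point; no restriction on /dev, /proc or /sys paths is applied here.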

type: enhancement category: collector collector: diskspace

opened by JScott 31

Add regex bean matching, and regex key search/replace to normalize names

This patch adds two more config items to JolokiaCollector.conf:

mbeansre: works like mbeans: but matches with regular expressions
rewrite: allows rewrite pairs that rename collected keys before the data is passed to the handler
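
The two ideas in this patch can be sketched roughly as follows. This is not the code from the pull request: select_mbeans and rewrite_key are hypothetical helper names, and the bean names and rewrite pairs are made up for illustration.

```python
import re


def select_mbeans(names, patterns):
    """Keep the bean names that match at least one regular expression."""
    compiled = [re.compile(p) for p in patterns]
    return [n for n in names if any(c.search(n) for c in compiled)]


def rewrite_key(key, rewrites):
    """Apply (pattern, replacement) pairs in order to normalize a metric key."""
    for pattern, replacement in rewrites:
        key = re.sub(pattern, replacement, key)
    return key


if __name__ == "__main__":
    beans = ["java.lang:type=Memory", "java.lang:type=Threading"]
    print(select_mbeans(beans, [r"^java\.lang:type=Mem"]))
    print(rewrite_key("java.lang.type_Memory.HeapMemoryUsage.used",
                      [(r"^java\.lang\.type_", "jvm."),
                       (r"HeapMemoryUsage", "heap")]))
```

Applying the rewrite pairs in order means a later pair sees the result of the earlier ones, which is one reasonable reading of "rewrite pairs" but remains an assumption.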

opened by sbrynen 23

python-2.4 is not working

The docs claim that python-2.4 is supported (e.g. on RHEL5), but diamond fails with:

ERROR: Failed to set UID/GID. 'module' object has no attribute 'initgroups'

/usr/bin/diamond:204: os.initgroups(pwd.getpwuid(uid).pw_name, gid)
type: bug

opened by bhepple 18

Update InfluxDBHandler for post InfluxDB 0.9

Fix for ticket #297 where InfluxDBHandler was formatting metrics based on an older version of influxdb-python

Maintains support for InfluxDB version 0.8 by way of an additional 'influxdb_version' attribute in the config.

Reformats measurement schema to be more useful for InfluxDB 0.9's removal of merge and joins.

Adds several test cases for the handler.

type: enhancement category: handler needs: rebase handler: influxdb

opened by cj-dimaggio 17

Kafka Collector error with urllib2

Hi, I'm trying to connect the Kafka collector to Kafka 0.8.2.

[2015-07-17 11:44:56,298] [MainThread] ''
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/diamond/collector.py", line 472, in _run
    self.collect()
  File "/usr/share/diamond/collectors/kafkastat/kafkastat.py", line 163, in collect
    match = self.get_mbeans(pattern)
  File "/usr/share/diamond/collectors/kafkastat/kafkastat.py", line 84, in get_mbeans
    mbeans = self._get('/serverbydomain', query_args)
  File "/usr/share/diamond/collectors/kafkastat/kafkastat.py", line 70, in _get
    response = urllib2.urlopen(url)
  File "/usr/lib64/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib64/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 1216, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.7/urllib2.py", line 1189, in do_open
    r = h.getresponse(buffering=True)
  File "/usr/lib64/python2.7/httplib.py", line 1045, in getresponse
    response.begin()
  File "/usr/lib64/python2.7/httplib.py", line 409, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.7/httplib.py", line 373, in _read_status
    raise BadStatusLine(line)
BadStatusLine: ''

Any idea?
status: fix-provided collector: kafka

opened by maauso 17

Current release "has problems".

I had heard good things about Diamond, and I'm not that happy with collectd, so I thought I'd give it a look.

First problem, there are absolutely no docs. No problem I thought, I'll write a GettingStarted and submit a pullreq for it. Except that I could never get started myself.

I notice there is a "debian" subdirectory, so I try a debuild, and find that the 4.0 release produces a debian file named "3.1.0". Submitted a PR.

Then I start up Diamond after copying the example config over to diamond.conf and I'm getting a Traceback that the handlers list object has no split(',') method. Ooook, so that's supposed to be a string instead? I quote it in the config file, but then I'm running into more problems.

I finally decide to grab the latest code and see if that worked without quotes on the handlers line. It does and at least I have it writing to the archive handler.

Buuut... The Influxdb handler doesn't seem to be doing anything. No logs I can see to say why, just silently isn't doing anything.

Recommendations:

You probably at least need to tag a new release, fix the debian/changelog file, and put out some sort of docs.
type: bug category: collector status: fix-provided handler: influxdb

opened by linsomniac 15

Diamond hemorrhages memory when Graphite server is inaccessible

I recently ran into a situation on one of my hosts where Diamond 4.0 series was consuming over 3 GB of memory after being unable to connect to Graphite for a couple of hours -- RES in htop appears to grow by 2-3 MB per minute. I'm not sure if this is an issue with unbounded write buffer growth or leaked objects in connection handling, but it presents a serious threat to system stability.
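
If the cause is unbounded write buffer growth (one of the two possibilities raised above, not a confirmed diagnosis), a bounded buffer that drops the oldest metrics is one way to cap memory. The sketch below is a generic illustration and not Diamond's handler code; BoundedMetricBuffer is a hypothetical name.

```python
from collections import deque


class BoundedMetricBuffer:
    """Hold at most max_size pending metric lines, dropping the oldest first."""

    def __init__(self, max_size):
        self._queue = deque(maxlen=max_size)
        self.dropped = 0  # how many lines were discarded because the buffer was full

    def append(self, line):
        if len(self._queue) == self._queue.maxlen:
            self.dropped += 1  # deque(maxlen=...) evicts the oldest entry on append
        self._queue.append(line)

    def drain(self):
        """Return all pending lines and empty the buffer (call after reconnecting)."""
        lines = list(self._queue)
        self._queue.clear()
        return lines


if __name__ == "__main__":
    buf = BoundedMetricBuffer(max_size=3)
    for i in range(5):
        buf.append("servers.host.cpu.total.user %d 1460050361" % i)
    print(buf.dropped, len(buf.drain()))
```

Dropping the oldest data trades completeness for stability; whether that trade is acceptable depends on the deployment.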
type: bug

opened by jgoldschrafe 15

Add Basic/Shield auth to elasticsearch collector

The current implementation of the Elasticsearch collector cannot authenticate against the Shield plugin for Elasticsearch, so I've added optional parameters to authenticate with. I'm using this version of the collector in prod right now and it works just fine.
type: enhancement category: collector collector: elasticsearch

opened by okushchenko 14

TSDB basic authorization, gzip, batch, prefix

Added some features to the TSDB Handler:

basic authorization: simple header with user and password for firewall protection
gzip: added gzip support to compress metrics
batch: added support to send metrics in batch
prefix: you can add a prefix to all your metrics, like diamond.myhostname.cpu.cpu_count

All these features can be disabled by leaving them undefined or by setting a value lower than 1.

Sending metrics in batch can give you quite a nice performance boost and lower your CPU load. Compressing will reduce the buffer size you need to configure on the TSDB end.
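
To make the batch and gzip ideas concrete, here is a minimal Python 3 sketch of how metrics could be grouped and compressed before being posted. It is not the code from this pull request: build_batches and encode_batch are hypothetical helpers, and the JSON data point layout is only an assumption about what the TSDB endpoint accepts.

```python
import gzip
import json


def build_batches(metrics, batch_size):
    """Split a list of data points into chunks of at most batch_size items."""
    if batch_size < 1:
        batch_size = 1  # treat "disabled" as one metric per request
    return [metrics[i:i + batch_size] for i in range(0, len(metrics), batch_size)]


def encode_batch(batch, compress=True):
    """Serialize one batch to JSON and optionally gzip it; return (body, headers)."""
    body = json.dumps(batch).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if compress:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return body, headers


if __name__ == "__main__":
    points = [
        {"metric": "diamond.myhostname.cpu.cpu_count", "timestamp": 1460050361 + i,
         "value": 8, "tags": {"host": "myhostname"}}
        for i in range(5)
    ]
    for batch in build_batches(points, batch_size=2):
        body, headers = encode_batch(batch)
        print(len(batch), len(body), headers)
```

The actual HTTP request (and the basic authorization header mentioned above) is left out so the example stays runnable without a server.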

On the negative side, this will break the recently added tests, and I was not able to recreate them. I would need some help, as there is currently no handler (using urllib2) with tests. But I didn't want to keep these changes to myself. This graph shows you how long it took to send a metric to the db.
type: enhancement handler: tsdb

opened by Grotax 13

Allow precision to be set in nginx collector

This patch allows us to set precision in the Nginx collector's config file. E.g.:

enabled = True
precision = 2
req_port = 9080

Changing the precision from 0 to 2 resulted in this change on a testing system when viewing the data in Grafana:

Note that the number of requests/sec hasn't changed, just the precision config value.
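
The effect of the precision setting can be illustrated with plain fixed-point formatting. This is only a sketch of why a low request rate looks different at precision 0 and precision 2; format_rate is a hypothetical helper, not the collector's code.

```python
def format_rate(value, precision=0):
    """Render a metric value with a fixed number of decimal places."""
    return "%.*f" % (precision, value)


if __name__ == "__main__":
    # The same 0.4 requests/sec rendered with both settings.
    for precision in (0, 2):
        print(precision, format_rate(0.4, precision))
```

With precision 0, any rate below 0.5 is rendered as 0, so small values disappear from the graph even though the traffic is unchanged.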

When you hover over a datapoint in Grafana:

Some point in time before the patch:

After the patch and changing the config value (with requests coming in at the same rate):

type: enhancement category: collector collector: nginx

opened by scottcunningham 13

SNMP collector not working

[root@graphite collectors]# cat SNMPInterfaceCollector.conf
enabled = True
path_suffix = ""
retries = 3
measure_collector_time = False
byte_unit = byte
timeout = 5

path = interface
interval = 60

[devices]
[fw01]]
host = 192.168.1.1
port = 161
community = public

Here are the logs:

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/diamond/utils/scheduler.py", line 70, in collector_process
    collector._run()
  File "/usr/lib/python2.6/site-packages/diamond/collector.py", line 472, in _run
    self.collect()
  File "/usr/lib/python2.6/site-packages/diamond/collector.py", line 366, in collect
    raise NotImplementedError()
NotImplementedError
[2015-07-08 13:56:56,318] [MainThread] Collector failed!
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/diamond/utils/scheduler.py", line 70, in collector_process
    collector._run()
  File "/usr/lib/python2.6/site-packages/diamond/collector.py", line 472, in _run
    self.collect()
  File "/usr/lib/python2.6/site-packages/diamond/collector.py", line 366, in collect
    raise NotImplementedError()
NotImplementedError
[2015-07-08 13:56:58,332] [MainThread] Collector failed!
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/diamond/utils/scheduler.py", line 70, in collector_process
    collector._run()
  File "/usr/lib/python2.6/site-packages/diamond/collector.py", line 472, in _run
    self.collect()
  File "/usr/lib/python2.6/site-packages/diamond/collector.py", line 366, in collect
    raise NotImplementedError()
NotImplementedError
type: bug category: collector status: fix-provided

opened by harishva 13

fix counter metric spike using statsD handler

The current implementation in the statsD handler may work fine for the general use case where a counter metric is sent from the application directly to statsD (push). In our prod env, however, we have another stats application that pulls metrics from multiple applications and sends them to statsD, similar to how Prometheus works (pull). This causes a spike in the first data point every time the stats application starts, because the automatically calculated difference is wrong if the actual application has already been running for a long time.

The fix adds an option for the caller to manually override the counter value using the value attribute, while the internal update to the old value map still uses the raw value attribute.
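
The spike and the proposed override can be sketched as below. This is an illustration of the idea only, not the handler's code: CounterTracker and its override parameter are hypothetical, and it assumes that a counter with no previous sample is reported at its full raw value.

```python
class CounterTracker:
    """Turn monotonically growing raw counters into per-interval deltas."""

    def __init__(self):
        self._last_raw = {}  # metric name -> last raw counter value seen

    def delta(self, name, raw_value, override=None):
        previous = self._last_raw.get(name)
        # Always remember the raw total, even when the caller overrides the value.
        self._last_raw[name] = raw_value
        if override is not None:
            return override
        if previous is None:
            # Assumed behavior: with no history the whole total is sent,
            # which shows up as a spike for a long-running application.
            return raw_value
        return raw_value - previous


if __name__ == "__main__":
    tracker = CounterTracker()
    print(tracker.delta("requests", 1000000))              # spike on first sample
    print(tracker.delta("requests", 1000010))              # normal delta afterwards

    fixed = CounterTracker()
    print(fixed.delta("requests", 1000000, override=0))    # caller suppresses the spike
    print(fixed.delta("requests", 1000010))
```

Because the raw value is stored regardless of the override, the second sample produces the correct difference in both cases.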
opened by cdong8812 0

[wip] Python 3

It felt kinda weird that there was absolutely nothing in this repo about how to run Diamond using Python 3. Like many others I recently upgraded to Ubuntu 20.04 which does not ship with a fully formed python 2.7 environment so I gave a stab at making it work with Python 3.

This is very much a WIP but I got it running fully for our setup.

How to install for you:

pip3 install distro
pip3 install git+https://github.com/feederco/Diamond.git@d8429765009fdf115c85aca09a5f2c1b570f8078

Since the platform module's distribution lookup that setup.py relied on (platform.dist() / platform.linux_distribution()) was removed in Python 3.8, setup.py now uses distro, a pip module that needs to be installed before running pip install diamond.

Todo:

Tests still not ported

So far only tested in production with MySQL collector and hostedgraphite and archive handler

This branch is all-out p3k, so we'd need to figure out how to interop these two languages, or tbh just make a fork, or keep python2 support in a branch, because python 2 seems pretty ded

References #396
type: bug type: py3k

opened by erkie 4

Unable to load second handler metrics

Hi Team,

I have one handler in test-agent.conf. When I try to add another handler, [[TestHandler2]], the metrics are not loading. Can someone help me with this?

[handlers]

# daemon logging handler(s)
keys = rotated_file

# Defaults options for all Handlers
[[default]]

[[TestHandler]]

# abc URL to post the metrics
url = abac

# abc Datasource api key
api_key = *******

opened by rbellary-vi 0

diamond fails to run at boot with "Name or service not known"

On CentOS7, using diamond-4.0.515 with the included systemd service file, diamond always fails to run at boot but starts fine by hand later.

The error in the log is:


[2020-10-23 16:07:50,198] [MainThread] Unhandled exception: [Errno -2] Name or service not known
[2020-10-23 16:07:50,199] [MainThread] traceback: Traceback (most recent call last):
  File "/usr/local/diamond/bin/diamond", line 298, in main
    server.run()
  File "/usr/local/diamond/lib/python2.7/site-packages/diamond/server.py", line 108, in run
    self.handlers = load_handlers(self.config, handlers)
  File "/usr/local/diamond/lib/python2.7/site-packages/diamond/utils/classes.py", line 89, in load_handlers
    h = cls(handler_config)
  File "/usr/local/diamond/lib/python2.7/site-packages/diamond/handler/stats_d.py", line 66, in __init__
    self._connect()
  File "/usr/local/diamond/lib/python2.7/site-packages/diamond/handler/stats_d.py", line 161, in _connect
    port=self.port
  File "/usr/lib/python2.7/site-packages/statsd/client.py", line 139, in __init__
    host, port, fam, socket.SOCK_DGRAM)[0]
gaierror: [Errno -2] Name or service not known

I tried adding the following to [Unit] in /etc/systemd/system/diamond.service but it made no difference

After=network.target

I then changed that to the following and it did work:

After=network.target remote-fs.target nss-lookup.target
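
The unit ordering above is the reporter's fix. A complementary, application-side approach would be to retry name resolution for a short while instead of failing on the first gaierror. The Python 3 sketch below shows that idea in isolation; it is not part of Diamond or the statsd client, and resolve_with_retry and its parameters are hypothetical.

```python
import socket
import time


def resolve_with_retry(host, port, attempts=5, delay=2.0,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve host:port, retrying while DNS is not yet available at boot."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return resolver(host, port, socket.AF_UNSPEC, socket.SOCK_DGRAM)
        except socket.gaierror as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # wait before the next lookup
    raise last_error


if __name__ == "__main__":
    # "localhost" normally resolves without network access.
    print(resolve_with_retry("localhost", 8125, attempts=2, delay=0.1)[0])
```

The resolver and sleep arguments exist only so the retry loop can be exercised without real DNS failures.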

needs: patch

opened by paulraines68 2

Problem with GlusterFS

Hi.

I just fought with DiskSpaceCollector because it didn't report the disk usage for a fuse.glusterfs mounted directory.

The first problem (trivial) was that in the default config "gluster" is given instead of "fuse.gluster" . No problem, corrected.

But even after correction the metrics were silently dropped. I found that the if at diskspace.py:153 assumes the "device" starts with a '/'. Too bad, usually mountpoints for gluster use the format srv1[,srv2]:volume_name

It's possible to use /srv1[,srv2]:volume_name (with a leading '/') but I think it's quite uncommon. I made that if a no-op (adding as first condition "(1==1) or" ), but I think it could be better to consider the GlusterFS case and/or give a meaningful message stating why that mountpoint is getting discarded.
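
One way to "consider the GlusterFS case" is to accept devices that either start with '/' or look like a host list followed by a volume name. The sketch below illustrates such a check; it is not the condition at diskspace.py:153, and is_supported_device and the regular expression are assumptions.

```python
import re

# Matches "srv1:volume_name" or "srv1,srv2:volume_name" (no leading slash).
_REMOTE_VOLUME = re.compile(r"^[^/:\s]+:\S+$")


def is_supported_device(device):
    """Accept local device paths and host[,host]:volume style network devices."""
    return device.startswith("/") or bool(_REMOTE_VOLUME.match(device))


if __name__ == "__main__":
    for device in ("/dev/sda1", "srv1,srv2:volume_name", "tmpfs"):
        print(device, is_supported_device(device))
```

A log message for rejected devices would address the other half of the request, since the metrics would no longer be dropped silently.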

HIH

opened by NdK73 1

Owner

GitHub http://diamond.readthedocs.org/
