A cron monitoring tool written in Python & Django

Healthchecks

Last update: Jan 2, 2023

Related tags

DevOps Tools devops django cron monitoring ops cron-jobs

Overview

Healthchecks

Healthchecks is a cron job monitoring service. It listens for HTTP requests and email messages ("pings") from your cron jobs and scheduled tasks ("checks"). When a ping does not arrive on time, Healthchecks sends out alerts.

Healthchecks comes with a web dashboard, an API, 25+ integrations for delivering notifications, monthly email reports, WebAuthn 2FA support, and team management features: projects, team members, and read-only access.

The building blocks are:

Python 3.6+
Django 3
PostgreSQL or MySQL

Healthchecks is licensed under the BSD 3-clause license.

Healthchecks is available as a hosted service at https://healthchecks.io/.

Setting Up for Development

To set up a Healthchecks development environment:

Install dependencies (Debian/Ubuntu):

  $ sudo apt-get update
  $ sudo apt-get install -y gcc python3-dev python3-venv libpq-dev

Prepare a directory for the project code and virtualenv. Feel free to use a different location:
```
  $ mkdir -p ~/webapps
  $ cd ~/webapps
```

Prepare a virtual environment (the virtualenv comes with pip, which we will use shortly to install requirements):

  $ python3 -m venv hc-venv
  $ source hc-venv/bin/activate
  $ pip3 install wheel # make sure wheel is installed in the venv

Check out project code:

  $ git clone https://github.com/healthchecks/healthchecks.git

Install requirements (Django, ...) into virtualenv:

  $ pip install -r healthchecks/requirements.txt

Create database tables and a superuser account:
```
  $ cd ~/webapps/healthchecks
  $ ./manage.py migrate
  $ ./manage.py createsuperuser
```
With the default configuration, Healthchecks stores data in a SQLite file hc.sqlite in the checkout directory (~/webapps/healthchecks).

To use PostgreSQL or MySQL, see the Database Configuration section below.

Run tests:
```
  $ ./manage.py test
```
Run development server:
```
  $ ./manage.py runserver
```

The site should now be running at http://localhost:8000. To access the Django administration site, log in as a superuser, then visit http://localhost:8000/admin/.

Configuration

Healthchecks reads configuration from environment variables.

Full list of configuration parameters.
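
As a rough illustration of what reading configuration from environment variables can look like, the sketch below parses a few settings with defaults and type conversion. The helper names (`env_bool`, `env_int`, `load_settings`) and the default values are assumptions made for this example; they are not taken from the Healthchecks source.

```python
import os


def env_bool(name: str, default: str) -> bool:
    """Interpret an environment variable as a boolean ("True" or "False")."""
    return os.getenv(name, default).strip().lower() == "true"


def env_int(name: str, default: str) -> int:
    """Interpret an environment variable as an integer."""
    return int(os.getenv(name, default))


def load_settings() -> dict:
    """Collect a few settings named in this README.

    The defaults below are illustrative assumptions only.
    """
    return {
        "DEBUG": env_bool("DEBUG", "True"),
        "EMAIL_HOST": os.getenv("EMAIL_HOST", ""),
        "EMAIL_PORT": env_int("EMAIL_PORT", "587"),
        "EMAIL_USE_TLS": env_bool("EMAIL_USE_TLS", "True"),
    }
```

Keeping the parsing in small helpers means a mistyped value such as `EMAIL_PORT=abc` fails loudly at startup rather than later at send time.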

Accessing Administration Panel

Healthchecks comes with Django's administration panel, where you can manually view and modify user accounts, projects, checks, integrations, etc. To access it:

if you haven't already, create a superuser account: ./manage.py createsuperuser
log into the site using superuser credentials
in the top navigation, "Account" dropdown, select "Site Administration"

Sending Emails

Healthchecks must be able to send email messages, so it can send out login links and alerts to users. Specify your SMTP credentials using the following environment variables:

EMAIL_HOST = "your-smtp-server-here.com"
EMAIL_PORT = 587
EMAIL_HOST_USER = "smtp-username"
EMAIL_HOST_PASSWORD = "smtp-password"
EMAIL_USE_TLS = True

For more information, have a look at Django documentation, Sending Email section.

Receiving Emails

Healthchecks comes with an smtpd management command, which starts up an SMTP listener service. With the command running, you can ping your checks by sending email messages to [email protected] email addresses.

Start the SMTP listener on port 2525:

$ ./manage.py smtpd --port 2525

Send a test email:

$ curl --url 'smtp://127.0.0.1:2525' \
    --mail-from '[email protected]' \
    --mail-rcpt '[email protected]' \
    -F '='
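
The same test can be sketched in Python with the standard library's `smtplib`, assuming the listener from the previous step is running on 127.0.0.1:2525. The function names, the subject line, and the addresses in the usage comment are placeholders invented for this example; substitute the actual ping address of your check.

```python
import smtplib
from email.message import EmailMessage


def build_ping_email(sender: str, recipient: str) -> EmailMessage:
    """Build a minimal email message to use as a ping."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "ping"  # arbitrary subject chosen for this sketch
    msg.set_content("ping")
    return msg


def send_ping(msg: EmailMessage, host: str = "127.0.0.1", port: int = 2525) -> None:
    """Deliver the message to a locally running SMTP listener."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


# Usage (placeholder addresses, requires the listener to be running):
#   send_ping(build_ping_email("sender@example.org", "check-address@example.org"))
```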

Sending Status Notifications

Healthchecks comes with a sendalerts management command, which continuously polls the database for any checks changing state and sends out notifications as needed. Within an activated virtualenv, you can manually run the sendalerts command like so:

$ ./manage.py sendalerts

In a production setup, you will want to run this command from a process manager like supervisor or systemd.

Database Cleanup

With time and use the Healthchecks database will grow in size. You may decide to prune old data: inactive user accounts, old checks not assigned to users, records of outgoing email messages and records of received pings. There are separate Django management commands for each task:

Remove old records from the api_ping table. For each check, keep the 100 most recent pings:
```
$ ./manage.py prunepings
```
Remove old records of sent notifications. For each check, remove notifications that are older than the oldest stored ping for the same check:
```
$ ./manage.py prunenotifications
```
Remove user accounts that match either of these conditions:
- Account was created more than 6 months ago, and the user has never logged in. This can happen when a user enters an invalid email address when signing up.
- Last login was more than 6 months ago, and the account has no checks. Assume the user doesn't intend to use the account any more and would probably want it removed.
```
$ ./manage.py pruneusers
```
Remove old records from the api_tokenbucket table. The TokenBucket model is used for rate-limiting login attempts and similar operations. Any records older than one day can be safely removed.
```
$ ./manage.py prunetokenbucket
```
Remove old records from the api_flip table. The Flip objects are used to track status changes of checks, and to calculate downtime statistics month by month. Flip objects from more than 3 months ago are not used and can be safely removed.
```
$ ./manage.py pruneflips
```

When you first try these commands on your data, it is a good idea to test them on a copy of your database, not on the live database right away. In a production setup, you should also have regular, automated database backups set up.
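
If you want to run all of the pruning commands in one go (for example, from a scheduled job), a small wrapper along these lines may help. This is a sketch: the function names, the default `./manage.py` path, and the order of the commands are assumptions for the example, and the script must be run from the checkout directory inside the activated virtualenv.

```python
import subprocess
import sys
from typing import List

# Management commands named in this section.
PRUNE_COMMANDS = [
    "prunepings",
    "prunenotifications",
    "pruneusers",
    "prunetokenbucket",
    "pruneflips",
]


def build_prune_argv(manage_py: str = "./manage.py") -> List[List[str]]:
    """Return one argument list per pruning command."""
    return [[sys.executable, manage_py, name] for name in PRUNE_COMMANDS]


def run_prune_commands(manage_py: str = "./manage.py") -> None:
    """Run each command in turn and stop at the first failure."""
    for argv in build_prune_argv(manage_py):
        subprocess.run(argv, check=True)
```

Using `check=True` makes a failing command raise `subprocess.CalledProcessError`, so later commands do not run against a database that is in an unexpected state.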

Two-factor Authentication

Healthchecks optionally supports two-factor authentication using the WebAuthn standard. To enable WebAuthn support, set the RP_ID (relying party identifier) setting to a non-null value. Set its value to your site's domain without scheme and without port. For example, if your site runs on https://my-hc.example.org, set RP_ID to my-hc.example.org.
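
To make the "without scheme and without port" rule concrete, here is a minimal sketch that derives such a value from a site URL using the standard library. The function name `rp_id_from_site_root` is hypothetical and exists only for this illustration.

```python
from urllib.parse import urlsplit


def rp_id_from_site_root(site_root: str) -> str:
    """Return the host part of a URL, without scheme and without port."""
    hostname = urlsplit(site_root).hostname
    if hostname is None:
        raise ValueError(f"cannot extract a hostname from {site_root!r}")
    return hostname


print(rp_id_from_site_root("https://my-hc.example.org"))  # my-hc.example.org
```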

Note that WebAuthn requires HTTPS, even if running on localhost. To test WebAuthn locally with a self-signed certificate, you can use the runsslserver command from the django-sslserver package.

External Authentication

Healthchecks supports external authentication by means of HTTP headers set by reverse proxies or the WSGI server. This allows you to integrate it into your existing authentication system (e.g., LDAP or OAuth) via an authenticating proxy. When this option is enabled, Healthchecks will trust the header's value implicitly, so it is very important to ensure that attackers cannot set the value themselves (and thus impersonate any user). How to do this varies by your chosen proxy, but generally involves configuring it to strip out headers that normalize to the same name as the chosen identity header.

To enable this feature, set the REMOTE_USER_HEADER value to a header you wish to authenticate with. HTTP headers will be prefixed with HTTP_ and have any dashes converted to underscores. Headers without that prefix can be set by the WSGI server itself only, which is more secure.

When REMOTE_USER_HEADER is set, Healthchecks will:

assume the header contains user's email address
look up and automatically log in the user with a matching email address
automatically create a user account if it does not exist
disable the default authentication methods (login link to email, password)
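
The header-name normalization mentioned above (an `HTTP_` prefix, with dashes converted to underscores) can be sketched as follows. The sketch also shows why a proxy needs to strip look-alike headers: several spellings of a client-supplied header can map to the same key. The header name `X-Remote-User` and both function names are hypothetical examples, not Healthchecks settings or APIs.

```python
from typing import List


def wsgi_environ_key(header_name: str) -> str:
    """Map an HTTP header name to the key a WSGI server exposes it under."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def colliding_headers(identity_header: str, incoming: List[str]) -> List[str]:
    """Return incoming header names that normalize to the identity header's key."""
    target = wsgi_environ_key(identity_header)
    return [name for name in incoming if wsgi_environ_key(name) == target]


print(wsgi_environ_key("X-Remote-User"))  # HTTP_X_REMOTE_USER
```

With this mapping, the value you would put in REMOTE_USER_HEADER for the hypothetical `X-Remote-User` header is `HTTP_X_REMOTE_USER`.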

Integrations

Slack

To enable the Slack "self-service" integration, you will need to create a "Slack App".

To do so:

Create a new Slack app on https://api.slack.com/apps/
Add at least one scope in the permissions section to be able to deploy the app in your workspace (for example, incoming-webhook under Bot Token Scopes at https://api.slack.com/apps/APP_ID/oauth?).
Add a redirect URL in the format SITE_ROOT/integrations/add_slack_btn/. For example, if your SITE_ROOT is https://my-hc.example.org then the redirect URL would be https://my-hc.example.org/integrations/add_slack_btn/.
Look up your Slack app's Client ID and Client Secret at https://api.slack.com/apps/APP_ID/general?. Put them in SLACK_CLIENT_ID and SLACK_CLIENT_SECRET environment variables.

Discord

To enable Discord integration, you will need to:

Register a new application on https://discordapp.com/developers/applications/me
Add a redirect URI to your Discord application. The URI format is SITE_ROOT/integrations/add_discord/. For example, if you are running a development server on localhost:8000 then the redirect URI would be http://localhost:8000/integrations/add_discord/
Look up your Discord app's Client ID and Client Secret. Put them in DISCORD_CLIENT_ID and DISCORD_CLIENT_SECRET environment variables.
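
The redirect URL formats given for Slack and Discord both append a fixed path to SITE_ROOT. A small sketch of that composition, with a hypothetical helper name and constants, is shown here mainly to illustrate avoiding a doubled slash when SITE_ROOT ends with "/":

```python
# Paths taken from the Slack and Discord instructions in this README.
SLACK_REDIRECT_PATH = "/integrations/add_slack_btn/"
DISCORD_REDIRECT_PATH = "/integrations/add_discord/"


def integration_redirect_url(site_root: str, path: str) -> str:
    """Join SITE_ROOT and an integration path without doubling the slash."""
    return site_root.rstrip("/") + path


print(integration_redirect_url("https://my-hc.example.org", SLACK_REDIRECT_PATH))
print(integration_redirect_url("http://localhost:8000/", DISCORD_REDIRECT_PATH))
```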

Pushover

Pushover integration works by creating an application on Pushover.net which is then subscribed to by Healthchecks users. The registration workflow is as follows:

On Healthchecks, the user adds a "Pushover" integration to a project
Healthchecks redirects user's browser to a Pushover.net subscription page
User approves adding the Healthchecks subscription to their Pushover account
Pushover.net HTTP redirects back to Healthchecks with a subscription token
Healthchecks saves the subscription token and uses it for sending Pushover notifications

To enable the Pushover integration, you will need to:

Register a new application on Pushover via https://pushover.net/apps/build.
Within the Pushover 'application' configuration, enable subscriptions. Make sure the subscription type is set to "URL". Also make sure the redirect URL is configured to point back to the root of the Healthchecks instance (e.g., http://healthchecks.example.com/).
Put the Pushover application API Token and the Pushover subscription URL in PUSHOVER_API_TOKEN and PUSHOVER_SUBSCRIPTION_URL environment variables. The Pushover subscription URL should look similar to https://pushover.net/subscribe/yourAppName-randomAlphaNumericData.

Signal

Healthchecks uses signal-cli to send Signal notifications. Healthchecks interacts with signal-cli over DBus.

To enable the Signal integration:

Set up and configure signal-cli to listen on the DBus system bus (instructions). Make sure you can send test messages from the command line, using the dbus-send example given in the signal-cli instructions.
Set the SIGNAL_CLI_ENABLED environment variable to True.

Telegram

Create a Telegram bot by talking to the BotFather. Set the bot's name, description, user picture, and add a "/start" command.
After creating the bot you will have the bot's name and token. Put them in TELEGRAM_BOT_NAME and TELEGRAM_TOKEN environment variables.
Run the settelegramwebhook management command. This command tells Telegram where to forward channel messages by invoking Telegram's setWebhook API call:
```
$ ./manage.py settelegramwebhook
Done, Telegram's webhook set to: https://my-monitoring-project.com/integrations/telegram/bot/
```

For this to work, your SITE_ROOT needs to be correct and use the "https://" scheme.

Apprise

To enable Apprise integration, you will need to:

ensure you have apprise installed in your local environment:

pip install apprise

enable the apprise functionality by setting the APPRISE_ENABLED environment variable.

Shell Commands

The "Shell Commands" integration runs user-defined local shell commands when checks go up or down. This integration is disabled by default, and can be enabled by setting the SHELL_ENABLED environment variable to True.

Note: be careful when using the "Shell Commands" integration, and only enable it when you fully trust the users of your Healthchecks instance. The commands will be executed by the manage.py sendalerts process, and will run with the same system permissions as the sendalerts process.

Matrix

To enable the Matrix integration you will need to:

Register a bot user (for posting notifications) in your preferred homeserver.
Use the Login API call to retrieve the bot user's access token. You can run it as shown in the documentation, using curl in a command shell.
Set the MATRIX_ environment variables. Example:

MATRIX_HOMESERVER=https://matrix.org
MATRIX_USER_ID=@mychecks:matrix.org
MATRIX_ACCESS_TOKEN=[a long string of characters returned by the login call]
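
As an alternative to curl, the login call can be sketched in Python with the standard library. The endpoint path and request body below are assumptions based on the password login flow of the Matrix client-server API; verify them against the documentation for your homeserver before relying on this. The function names are hypothetical.

```python
import json
import urllib.request


def build_login_request(homeserver: str, user: str, password: str) -> urllib.request.Request:
    """Prepare a password login request (assumed Matrix client-server API shape)."""
    body = {
        "type": "m.login.password",
        "identifier": {"type": "m.id.user", "user": user},
        "password": password,
    }
    return urllib.request.Request(
        homeserver.rstrip("/") + "/_matrix/client/v3/login",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_access_token(request: urllib.request.Request) -> str:
    """Send the login request and return the access_token field of the reply."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]


# Usage (performs a real network call):
#   token = fetch_access_token(build_login_request("https://matrix.org", "mychecks", "..."))
```

The returned token is the value to put in MATRIX_ACCESS_TOKEN.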

Running in Production

Here is a non-exhaustive list of pointers and things to check before launching a Healthchecks instance in production.

Environment variables, settings.py and local_settings.py.
- DEBUG. Make sure it is set to False.
- ALLOWED_HOSTS. Make sure it contains the correct domain name you want to use.
- Server Errors. When DEBUG=False, Django will not show detailed error pages, and will not print exception tracebacks to standard output. To receive exception tracebacks in email, review and edit the ADMINS and SERVER_EMAIL settings. Another good option for receiving exception tracebacks is to use Sentry.
Management commands that need to be run during each deployment.
- This project uses Django Compressor to combine the CSS and JS files. It is configured for offline compression – run the manage.py compress command whenever files in the /static/ directory change.
- This project uses Django's staticfiles app. Run the manage.py collectstatic command whenever files in the /static/ directory change. This command collects all the static files inside the static-collected directory. Configure your web server to serve files from this directory under the /static/ prefix.
- Database migration should be run after each update to make sure the database schemas are up to date. You can do that with ./manage.py migrate.
Processes that need to be running constantly.
- manage.py runserver is intended for development only. Do not use it in production; instead, consider using uWSGI or gunicorn.
- Make sure the manage.py sendalerts command is running and can survive server restarts. On modern Linux systems, a good option is to define a systemd service for it.
General
- Make sure the database is secured well and is getting backed up regularly
- Make sure the TLS certificates are secured well and are getting refreshed regularly
- Have monitoring in place to be sure the Healthchecks instance itself is operational (is accepting pings, is sending out alerts, is not running out of resources).
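
The first two items of the checklist (DEBUG and ALLOWED_HOSTS) lend themselves to an automated pre-launch check. The sketch below is one possible shape for it; the function name, the treatment of an unset DEBUG as unsafe, and the comma-separated format of ALLOWED_HOSTS are all assumptions made for this example.

```python
from typing import List, Mapping


def production_warnings(env: Mapping[str, str]) -> List[str]:
    """Return human-readable warnings for a mapping of environment variables."""
    warnings = []

    # Assumption: an unset DEBUG counts as unsafe, so it must be set explicitly.
    if env.get("DEBUG", "").strip().lower() != "false":
        warnings.append("DEBUG should be set to False")

    # Assumption: ALLOWED_HOSTS is a comma-separated list of host names.
    hosts = [h.strip() for h in env.get("ALLOWED_HOSTS", "").split(",") if h.strip()]
    if not hosts or "*" in hosts:
        warnings.append("ALLOWED_HOSTS should list the domain name(s) you serve")

    return warnings
```

Calling `production_warnings(dict(os.environ))` from a deployment script and failing on a non-empty result is one way to use it.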

Comments

OpenID Connect support

It would be nice if Healthchecks supported OAuth/OpenID Connect or similar for on-premise deployments. That way a user could be redirected to a central IAM platform (AWS's, Active Directory's Federated Services, Keycloak; there are a number of similar providers and tools) that they are likely already logged in to.
feature

opened by nogweii 23
Healthcheck with Started and OK pings goes down if OK hasn't come at one grace period after Started

I have a check with a 1 week period and 6 hour grace time. It's a very long borgmatic check operation. As you can see below, it Started at 5am, and at 11am (6 hours later) the check went down because the grace period had elapsed.

This doesn't seem like accurate behavior to me. The operation just isn't done, that doesn't mean it's down.

If I had a check without a Started ping, it would just compare to the OK times, and only if the OK came > 1 period + 1 grace period after the previous OK would it go down.

Sorry if my description is confusing... I can try to clarify further.

opened by kaysond 19
Add http header auth
I wanted to self-host healthchecks and integrate it with my central authentication system (see #185), so rather than develop something specific to my needs, I added support for HTTP header-based authentication. This way, people can integrate whatever auth system they want (LDAP, mTLS, SAML, OAuth, whatever) at the reverse proxy level and remove the need for healthchecks to care about the implementation details.

I added two new settings (with corresponding environment variables):

REMOTE_USER_HEADER — set this to the header you wish to authenticate with. HTTP headers will be prefixed with HTTP_ and have any dashes converted to underscores. Headers without that prefix can be set by the WSGI server itself only, which is more secure.

REMOTE_USER_HEADER_TYPE — If set to EMAIL, the specified header will be treated as the user's email. If set to ID, the specified header will be set to the user's UUID. Any other value (including empty, the default) disables header-based authentication.
opened by Phyxius 19
Adding Content-Type header to Webhook integrations

Adding Content-Type header to Webhook integrations to work correctly with services like IFTTT Maker Webhooks which require a specific content type, like application/json.

If the content type is not provided, the post data is not parsed by IFTTT, which prevents the use of variables like $NAME and $STATUS.

opened by someposer 18
Added duration to ping details

This is useful on devices that have a small screen, since the duration cannot be seen in the main view. With this, one can see the duration in the ping's details.

opened by seidnerj 17
Monitoring execution time of script

It would be interesting to monitor the execution time of a script, for example:

curl https://hchk.io/UUID/start when the script starts
curl https://hchk.io/UUID/end when the script ends

And an option to alert on a user-defined maximum execution time, with the same ranges as in checks.

Kind regards

opened by ZUNbado 15
Running in production on a subdomain /app/ possible?

Hello,

I have this fantastic app running in a venv, with mod_wsgi setup on Apache, and I need to run it as a subdomain to my main URL, in my case I mapped it to myURL/hc/ and what I am noticing is that several pages don't direct themselves to /hc/ and rather revert to myURL of my existing site.

For instance if I click on an individual "check" it takes me to myURL/checks/ee19e9f1-f727-4a8e-96fd-601cc26d85f6/details/ instead of myURL/hc/check/UUID, but not every page is like this, projects for example works as intended myURL/hc/projects/8195d2d5-af87-4552-a3cd-34886363e7b8/checks/

This is my first shot at turning up a Django app in production and not in debug mode so just wondering what I am missing or if the app is not designed to be used in this way? Please let me know if I can post any other relevant info to troubleshooting. Thank you for taking the time to create such a cool project, looking forward to using it.

opened by netopsengineer 14

Problem with telegram

I have Healthchecks up and running on my website. Everything is fine, you have done a lot of work, and the product is great, but I have some problems with setting up the Telegram integration. I have added TELEGRAM_TOKEN and TELEGRAM_BOT_NAME to my settings.py. I have set the webhook URL with

healthchecks@56373e540a6c:/healthchecks$ python3 ./manage.py settelegramwebhook                                                                                
Done, Telegram's webhook set to: https://healthcheck.servonline.xyz/integrations/telegram/bot/

When I query Telegram's getWebhookInfo, I get the following error:

{"ok":true,"result":{"url":"https://healthcheck.servonline.xyz/integrations/telegram/bot/","has_custom_certificate":false,"pending_update_count":2,"last_error_date":1515392356,"last_error_message":"Wrong response from the webhook: 400 Bad Request","max_connections":40}}

What am I doing wrong?

opened by retraut 14

Bug: Deprecated function call breaking `manage.py collectstatic --noinput`

It seems like Django deprecated admin_static, which is causing the following exception when running the manage.py collectstatic --noinput command.

Is there a way to resolve this exception??

Some posts suggest uninstalling and reinstalling django which I tried with no luck.

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/django/template/utils.py", line 66, in __getitem__
    return self._engines[alias]
KeyError: 'django'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/django/template/backends/django.py", line 121, in get_package_libraries
    module = import_module(entry[1])
  File "/usr/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/lib/python3.7/site-packages/django/contrib/admin/templatetags/admin_static.py", line 5, in <module>
    from django.utils.deprecation import RemovedInDjango30Warning
ImportError: cannot import name 'RemovedInDjango30Warning' from 'django.utils.deprecation' (/usr/lib/python3.7/site-packages/django/utils/deprecation.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/lib/python3.7/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/usr/lib/python3.7/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/lib/python3.7/site-packages/django/core/management/base.py", line 328, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/lib/python3.7/site-packages/django/core/management/base.py", line 369, in execute
    output = self.handle(*args, **options)
  File "/usr/lib/python3.7/site-packages/compressor/management/commands/compress.py", line 277, in handle
    self.handle_inner(**options)
  File "/usr/lib/python3.7/site-packages/compressor/management/commands/compress.py", line 300, in handle_inner
    offline_manifest, block_count, results = self.compress(engine, extensions, verbosity, follow_links, log)
  File "/usr/lib/python3.7/site-packages/compressor/management/commands/compress.py", line 100, in compress
    if not self.get_loaders():
  File "/usr/lib/python3.7/site-packages/compressor/management/commands/compress.py", line 50, in get_loaders
    for e in engines.all():
  File "/usr/lib/python3.7/site-packages/django/template/utils.py", line 90, in all
    return [self[alias] for alias in self]
  File "/usr/lib/python3.7/site-packages/django/template/utils.py", line 90, in <listcomp>
    return [self[alias] for alias in self]
  File "/usr/lib/python3.7/site-packages/django/template/utils.py", line 81, in __getitem__
    engine = engine_cls(params)
  File "/usr/lib/python3.7/site-packages/django/template/backends/django.py", line 25, in __init__
    options['libraries'] = self.get_templatetag_libraries(libraries)
  File "/usr/lib/python3.7/site-packages/django/template/backends/django.py", line 43, in get_templatetag_libraries
    libraries = get_installed_libraries()
  File "/usr/lib/python3.7/site-packages/django/template/backends/django.py", line 108, in get_installed_libraries
    for name in get_package_libraries(pkg):
  File "/usr/lib/python3.7/site-packages/django/template/backends/django.py", line 125, in get_package_libraries
    "trying to load '%s': %s" % (entry[1], e)
django.template.library.InvalidTemplateLibrary: Invalid template library specified. ImportError raised when trying to load 'django.contrib.admin.templatetags.admin_static': cannot import name 'RemovedInDjango30Warning' from 'django.utils.deprecation' (/usr/lib/python3.7/site-packages/django/utils/deprecation.py)

Reference: https://django.readthedocs.io/en/2.2.x/releases/2.1.html#id2

opened by gganeshan 13

Late vs Started

If I use /start to signal the start of a job my badge shows up as "Late" even though it started well within the allowed time frame.

At the start of my long-running PS script I am calling this: Invoke-RestMethod https://hc-ping.com/<id>/start

If it succeeds, I'm calling this: Invoke-RestMethod https://hc-ping.com/<id>

If it fails, I'm adding /fail.

Can the badge show "Running"? Am I doing something wrong?

opened by ScottBeeson 13
Ability to alert when check failed X times

We have a healthcheck that fails about once or twice a month for unrelated reasons. It would be cool if we could configure the healthcheck to require X amount of consecutive failures to send an alert, so we don't get pinged for unrelated reasons.
feature

opened by caleb15 12
improvement of Zulip integration docs
I've had issues setting up a Zulip integration. Sending a notification to a stream worked fine but sending it to a private user didn't work:

Could not send a test notification. Received status code 400 with a message: "Invalid email '...'".

The problem was a setting in Zulip's permission settings: "Who can access user email addresses" was set to "Administrators and Moderators" which excluded regular members from accessing a user's email address. However, the HC bot is set up to be a regular member. Because of that, the integration couldn't send the notification to the given user.

It might be helpful to add this to the page where you set up the Zulip integration.
opened by tiltX 0
Possible permission error with opening /opt/healthchecks/docker/uwsgi.ini after building image and launching the container

On some (particularly security-conscious?) machines, after the repository is cloned, files and directories in the local copy can end up with 700/600 permissions. In my case it was a Synology system with "0077" umask forcing the permissions to be like this:

ll healthchecks/
total 268
drwx------ 1 root root 264 Dec 26 11:51 .
drwx------ 1 root root 24 Dec 26 11:50 ..
-rw------- 1 root root 24634 Dec 26 11:51 CHANGELOG.md
-rw------- 1 root root 2462 Dec 26 11:51 CONTRIBUTING.md
drwx------ 1 root root 116 Dec 26 11:52 docker
drwx------ 1 root root 138 Dec 26 11:51 .git
drwx------ 1 root root 40 Dec 26 11:51 .github
-rw------- 1 root root 81 Dec 26 11:51 .gitignore
drwx------ 1 root root 190 Dec 26 11:51 hc
-rw------- 1 root root 1507 Dec 26 11:51 LICENSE
drwx------ 1 root root 16 Dec 26 11:51 locale
-rwx------ 1 root root 468 Dec 26 11:51 manage.py
-rw------- 1 root root 20283 Dec 26 11:51 README.md
-rw------- 1 root root 175 Dec 26 11:51 requirements.txt
-rw------- 1 root root 401408 Dec 26 11:51 search.db
drwx------ 1 root root 26 Dec 26 11:51 static
drwx------ 1 root root 82 Dec 26 11:51 stuff
drwx------ 1 root root 214 Dec 26 11:51 templates

As a consequence, when you build a docker image on such a system, the container directory "/opt/healthchecks/docker/" ends up with the same permissions, meaning it is not accessible by the "hc" user in the container. So the startup command "uwsgi /opt/healthchecks/docker/uwsgi.ini" fails and the container never starts.

For a solution, the owner of the directory in the container can simply be changed to the "hc" user in the Dockerfile, for example:

RUN \
    rm -f /opt/healthchecks/hc/local_settings.py && \
    DEBUG=False SECRET_KEY=build-key ./manage.py collectstatic --noinput && \
    DEBUG=False SECRET_KEY=build-key ./manage.py compress && \
    chown -R hc /opt/healthchecks

I tried the above while building my own image and it seems to be ok. The container works and the file system looks like this:

hc@b7f3a717696d:/opt/healthchecks$ ls -l
total 488
-rw------- 1 hc root 24634 Dec 26 10:51 CHANGELOG.md
-rw------- 1 hc root 2462 Dec 26 10:51 CONTRIBUTING.md
-rw------- 1 hc root 1507 Dec 26 10:51 LICENSE
-rw------- 1 hc root 20283 Dec 26 10:51 README.md
drwx------ 1 hc root 4096 Dec 26 10:52 docker
drwx------ 1 hc root 4096 Dec 26 10:52 hc
drwx------ 1 hc root 4096 Dec 26 10:51 locale
-rwx------ 1 hc root 468 Dec 26 10:51 manage.py
-rw------- 1 hc root 175 Dec 26 10:51 requirements.txt
-rw------- 1 hc root 401408 Dec 26 10:51 search.db
drwx------ 1 hc root 4096 Dec 26 10:51 static
drwxr-xr-x 8 hc root 4096 Dec 26 10:52 static-collected
drwx------ 1 hc root 4096 Dec 26 10:51 stuff
drwx------ 1 hc root 4096 Dec 26 10:51 templates

I don't believe there should be other issues with having "hc" own the healthchecks directory; in fact, it does seem to make logical sense. If you agree, the fix here should be fairly straightforward and might avoid some unexpected headaches in the future ;)

opened by Klionheart 1
Add filtering option for classifying email messages as "start" events

I see we can set filter rules for email pings to indicate a success or failure.

Similar to HTTP requests, we would like to create an email filter rule for the "start" and "success" to measure time performance and other reasons.

opened by c0nfus3d 0
Feature request: average execution time

I would like to add an idea, that would be helpful for me: In the detail page of a check an option to display the average execution time would be very helpful. Maybe even a graph that displays the execution time? Also a minimum or maximum value could be helpful. This would help especially when adding new features to a script to see if it is not consuming too much time. Thanks for your consideration and your work!

opened by BlackScreen 0
Feature Request: Allow to notify on each failed check
There are cases where one wants to be notified with every failed trigger of a check. This is per job/task and not globally where there is already a setting to re-notify daily/weekly....

Story:

Mail Monitoring for Keywords

Check is currently "up"

Check fails by triggered keyword, goes "down", followed by a triggered notification

Check fails again by triggered keyword, but nothing happens since the check is already "down"

Workaround: Have a "success" trigger every minute to switch the status back, so that a notification is sent once the check goes down again by a failure keyword or a failed check.

Idea: One wants to be able to be notified every time that a check fails. That might be multiple failed checks / errors again and again. There might be checks that throw an error over and over again. Then the status never changes, and its seriousness cannot be detected through notifications, while logging into the dashboard might not be required at that time.

e.g. with email keyword filtering it doesn't really work if one is looking for keywords and to be notified every time a keyword was found.

Possible solution: Add a simple option that allows to re-notify every time that a failure was logged/triggered. It may also make sense to add a limiter to how often it may trigger in a certain period of time.

Another solution may be to extend the email filtering to allow simple notification whenever a keyword was found. Not "down" or "up" but "warning" or "match found".

This should be per task/job and not globally (not sure if this would serve any purpose globally).

Thanks in advance!
opened by DerDanilo 3

Releases (v2.5)

v2.5 (Dec 14, 2022)
Improvements

Upgrade to fido2 1.1.0 and simplify hc.lib.webauthn

Add handling for ipv4address:port values in the X-Forwarded-For header (#714)

Add a form for submitting Signal CAPTCHA solutions

Add Duration field in the Ping Details dialog (#720)

Update Mattermost setup instructions

Add support for specifying a run ID via a "rid" query parameter (#722)

Add last ping body in Slack notifications (#735)

Add ntfy integration (#728)

Add ".txt" suffix to the filename when downloading ping body (#738)

Add API support for fetching ping bodies (#737)

Change "Settings - Email Reports" page to allow manual timezone selection

Bug Fixes

Fix the most recent ping lookup in the "Ping Details" dialog

Fix binary data handling in the hc.front.views.ping_body view

Fix downtime summaries in weekly reports (#736)

Fix week, month boundary calculation to use user's timezone

v2.4.1 (Oct 18, 2022)
Bug Fixes

Fix the GHA workflow for building arm/v7 docker image

v2.4(Oct 18, 2022)
Improvements

Add support for EMAIL_USE_SSL environment variable (#685)

Switch from requests to pycurl

Implement documentation search

Add date filters in the Log page

Upgrade to cronsim 2.3

Add support for the $BODY placeholder in webhook payloads (#708)

Implement the "Clear Events" function

Add support for custom topics in Zulip notifications (#583)

Bug Fixes

Fix the handling of TooManyRedirects exceptions

Fix MySQL 8 support in the Docker image (upgrade from buster to bullseye) (#717)
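
To make the $BODY placeholder entry above more concrete, here is a minimal sketch of placeholder substitution in a webhook payload. It uses Python's `string.Template` rather than Healthchecks' own code, and the `render_payload` helper and the sample values are hypothetical.

```python
from string import Template


def render_payload(template, values):
    """Fill $NAME-style placeholders; unknown placeholders are left untouched."""
    return Template(template).safe_substitute(values)


payload = render_payload(
    "tags=$TAGS body=$BODY extra=$UNKNOWN",
    {"TAGS": "prod db", "BODY": "disk full"},
)
```

`safe_substitute` is used so that a payload containing an unrecognized placeholder is passed through instead of raising an error.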

v2.3(Aug 5, 2022)
Improvements

Update Dockerfile to start SMTP listener (#668)

Implement the "Add Check" dialog

Include last ping type in Slack, Mattermost, Discord notifications

Upgrade to cron-descriptor 1.2.30

Add "Filter by keywords in the message body" feature (#653)

Upgrade to HiDPI screenshots in the documentation

Add support for the $JSON placeholder in webhook payloads

Add ping endpoints for "log" events

Add the "Badges" page in docs

Add support for multiple recipients in incoming email (#669)

Upgrade to fido2 1.0.0, requests 2.28.1, segno 1.5.2

Implement auto-refresh and running indicator in the My Projects page (#681)

Upgrade to Django 4.1 and django-compressor 4.1

Add API support for resuming paused checks (#687)

Bug Fixes

Fix the display of ignored pings with non-zero exit status

Fix a race condition in the "Change Email" flow

Fix grouping and sorting in the text version of the report/nag emails (#679)

Fix the update_timeout and pause views to create flips (for downtime bookkeeping)

Fix the checks list to preserve selected filters when adding/updating checks (#684)

Fix duration calculation to skip "log" and "ign" events

v2.2.1(Jun 13, 2022)
Improvements

Improve the text version of the alert email template

Bug Fixes

Fix the version number displayed in the footer

v2.2(Jun 13, 2022)
Improvements

Add address verification step in the "Change Email" flow

Reduce logging output from sendalerts and sendreports management commands (#656)

Add Ctrl+C handler in sendalerts and sendreports management commands

Add notes in docs about configuring uWSGI via UWSGI_ env vars (#656)

Implement login link expiration (login links will now expire in 1 hour)

Add Gotify integration (#270)

Add API support for reading/writing the subject and subject_fail fields (#659)

Add "Disabled" priority for Pushover notifications (#663)

Bug Fixes

Update hc.front.views.channels to handle empty strings in settings (#635)

Add logic to handle ContentDecodingError exceptions

v2.1(May 10, 2022)
Improvements

Add logic to alert ADMINS when Signal transport hits a CAPTCHA challenge

Implement the "started" progress spinner in the details pages

Add "hc_check_started" metric in the Prometheus metrics endpoint (#630)

Add a management command for submitting Signal rate limit challenges

Upgrade to django-compressor 4.0

Update the C# snippet

Increase max displayed duration from 24h to 72h (#644)

Add "Ping-Body-Limit" response header in ping API responses

Bug Fixes

Fix unwanted localization in badge SVG generation (#629)

Update email template to handle not yet uploaded ping bodies

Add small delay in transports.Email.notify to allow ping body to upload

Fix prunenotifications to handle checks with missing pings (#636)

Fix "Send Test Notification" for integrations that only send "up" notifications

v2.0.1(Mar 18, 2022)
Bug Fixes

Fix the GHA workflow for building arm/v7 docker image

v2.0(Mar 18, 2022)
This release contains a backwards-incompatible change to the Signal integration (hence the major version number bump). Healthchecks uses signal-cli to deliver Signal notifications. In the past versions, Healthchecks interfaced with signal-cli over DBus. Starting from this version, Healthchecks interfaces with signal-cli using JSON RPC. Please see README for details on how to set this up.

Improvements

Update Telegram integration to treat "group chat was deleted" as permanent error

Update email bounce handler to mark email channels as disabled (#446)

Update Signal integration to use JSON RPC over UNIX socket

Update the "Add TOTP" form to display plaintext TOTP secret (#602)

Improve PagerDuty notifications

Add Ping.body_raw field for storing body as bytes

Add support for storing ping bodies in S3-compatible object storage (#609)

Add a "Download Original" link in the "Ping Details" dialog

Bug Fixes

Fix unwanted special character escaping in notification messages (#606)

Fix JS error after copying a code snippet

Make email non-editable in the "Invite Member" dialog when team limit reached

Fix Telegram bot to handle TransportError exceptions

Fix Signal integration to handle UNREGISTERED_FAILURE errors

Fix unwanted localization of period and grace values in data- attributes (#617)

Fix Mattermost integration to treat 404 as a transient error (#613)

v1.25.0(Jan 7, 2022)
Improvements

Implement Pushover emergency alert cancellation when check goes up

Add "The following checks are also down" section in Telegram notifications

Add "The following checks are also down" section in Signal notifications

Upgrade to django-compressor 3.0

Add support for Telegram channels (#592)

Implement Telegram group to supergroup migration (#132)

Update the Slack integration to not retry when Slack returns 404

Refactor transport classes to raise exceptions on delivery problems

Add Channel.disabled field, for disabling integrations on permanent errors

Upgrade to Django 4

Bump the min. Python version from 3.6 to 3.8 (as required by Django 4)

Bug Fixes

Fix report templates to not show the "started" status (show UP or DOWN instead)

Update Dockerfile to avoid running "pip wheel" more than once (#594)

v1.24.1(Nov 10, 2021)
Bug Fixes

Fix Dockerfile for arm/v7 - install all dependencies from piwheels

v1.24.0(Nov 10, 2021)
Improvements

Switch from croniter to cronsim

Change outgoing webhook timeout to 10s, but cap the total time to 20s

Implement automatic api_ping and api_notification pruning (#556)

Update Dockerfile to install apprise (#581)

Improve period and grace controls, allow up to 365 day periods (#281)

Add SIGTERM handling in sendalerts and sendreports

Remove the "welcome" landing page, direct users to the sign in form instead

Bug Fixes

Fix hc.api.views.ping to handle non-utf8 data in request body (#574)

Fix a crash when hc.api.views.pause receives a single integer in request body
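
The webhook timeout change in this release (10 s per request, 20 s in total) can be sketched as a retry loop with a shared time budget. This is an illustration of the idea only, not the project's transport code: `deliver`, `send`, and `max_attempts` are hypothetical names.

```python
import time

ATTEMPT_TIMEOUT = 10  # seconds allowed for a single attempt
TOTAL_BUDGET = 20     # seconds allowed across all attempts


def deliver(send, clock=time.monotonic, max_attempts=3):
    """Call send(timeout) until it returns True or the time budget runs out.

    `send` is a hypothetical callable that performs one delivery attempt
    with the given timeout and returns True on success.
    """
    deadline = clock() + TOTAL_BUDGET
    for _ in range(max_attempts):
        remaining = deadline - clock()
        if remaining <= 0:
            break  # total budget exhausted, stop retrying
        # Never let one attempt run past the overall deadline.
        if send(min(ATTEMPT_TIMEOUT, remaining)):
            return True
    return False
```

Passing the clock in as a parameter keeps the loop easy to test without real waiting.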

v1.23.1(Oct 13, 2021)
Bug Fixes

Fix missing uwsgi dependencies in arm/v7 Docker image

v1.23.0(Oct 13, 2021)
Improvements

Add /api/v1/badges/ endpoint (#552)

Add ability to edit existing email, Signal, SMS, WhatsApp integrations

Add new ping URL format: /{ping_key}/{slug} (#491)

Reduce Docker image size by using slim base image and multi-stage Dockerfile

Upgrade to Bootstrap 3.4.1

Upgrade to jQuery 3.6.0

Bug Fixes

Add handling for non-latin-1 characters in webhook headers

Fix dark mode bug in selectpicker widgets

Fix a crash during login when user's profile does not exist (#77)

Drop API support for GET, DELETE requests with a request body

Add missing @csrf_exempt annotations in API views

Fix the ping handler to reject status codes > 255

Add 'schemaVersion' field in the shields.io endpoint (#566)

v1.22.0(Aug 6, 2021)
Improvements

Use multicolor channel icons for better appearance in the dark mode

Add SITE_LOGO_URL setting (#323)

Add admin action to log in as any user

Add a "Manager" role (#484)

Add support for 2FA using TOTP (#354)

Add Whitenoise (#548)

Bug Fixes

Fix dark mode styling issues in Cron Syntax Cheatsheet

Fix a 403 when transferring a project to a read-only team member

Security: fix allow_redirect function to reject absolute URLs

v1.21.0(Jul 2, 2021)
Improvements

Increase "Success / Failure Keywords" field lengths to 200

Django 3.2.4

Improve the handling of unknown email addresses in the Sign In form

Add support for "... is UP" SMS notifications

Add an option for weekly reports (in addition to monthly)

Implement PagerDuty Simple Install Flow, remove PD Connect

Implement dark mode

Bug Fixes

Fix off-by-one-month error in monthly reports, downtime columns (#539)

v1.20.0(Apr 22, 2021)
Improvements

Django 3.2

Rename VictorOps -> Splunk On-Call

Implement email body decoding in the "Ping Details" dialog

Add a "Subject" field in the "Ping Details" dialog

Improve HTML email display in the "Ping Details" dialog

Add a link to check's details page in Slack notifications

Replace details_url with cloaked_url in email and chat notifications

In the "My Projects" page, show projects with failing checks first

Bug Fixes

Fix downtime summary to handle months when the check didn't exist yet (#472)

Relax cron expression validation: accept all expressions that croniter accepts

Fix sendalerts to clear Profile.next_nag_date if all checks up

Fix the pause action to clear Profile.next_nag_date if all checks up

Fix the "Email Reports" screen to clear Profile.next_nag_date if all checks up

Fix the month boundary calculation in monthly reports (#497)

v1.19.0(Feb 3, 2021)
Improvements

Add tighter parameter checks in hc.front.views.serve_doc

Update OpsGenie instructions (#450)

Update the email notification template to include more check and last ping details

Improve the crontab snippet in the "Check Details" page (#465)

Add Signal integration (#428)

Change Zulip onboarding, ask for the zuliprc file (#202)

Add a section in Docs about running self-hosted instances

Add experimental Dockerfile and docker-compose.yml

Add rate limiting for Pushover notifications (6 notifications / user / minute)

Add support for disabling specific integration types (#471)

Bug Fixes

Fix unwanted HTML escaping in SMS and WhatsApp notifications

Fix a crash when adding an integration for an empty Trello account

Change icon CSS class prefix to 'ic-' to work around Fanboy's filter list

v1.18.0(Dec 9, 2020)
Improvements

Add a tooltip to the 'confirmation link' label (#436)

Update API to allow specifying channels by names (#440)

When saving a phone number, remove any invisible unicode characters

Update the read-only dashboard's CSS for better mobile support (#442)

Reduce the number of SQL queries used in the "Get Checks" API call

Add support for script's exit status in ping URLs (#429)

Improve phone number sanitization: remove spaces and hyphens

Change the "Test Integration" behavior for webhooks: don't retry failed requests

Add retries to the email sending logic

Require confirmation codes (sent to email) before sensitive actions

Implement WebAuthn two-factor authentication

Implement badge mode (up/down vs up/late/down) selector (#282)

Add Ping.exitstatus field, store client's reported exit status values (#455)

Implement header-based authentication (#457)

Add a "Lost password?" link with instructions in the Sign In page

Bug Fixes

Fix db field overflow when copying a check with a long name
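
The exit status support in ping URLs from this release can be sketched as follows. The sketch assumes the convention that status 0 signals success and any other status signals failure, and it applies the 0..255 range that the v1.23.0 notes mention; `ping_kind`, `exit_status_url`, and the example base URL are hypothetical.

```python
def ping_kind(exit_status):
    """Map a script's exit status to a ping type (assumed convention)."""
    if not 0 <= exit_status <= 255:
        raise ValueError("exit status must be in the range 0..255")
    return "success" if exit_status == 0 else "fail"


def exit_status_url(base_url, exit_status):
    """Append the exit status to a check's ping URL."""
    ping_kind(exit_status)  # validate before building the URL
    return f"{base_url}/{exit_status}"


url = exit_status_url("https://hc.example.org/ping/your-uuid-here", 1)
```

In a shell wrapper this corresponds to appending the exit status of the monitored command to the ping URL, so one request reports both completion and outcome.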

v1.17.0(Oct 14, 2020)
Improvements

Django 3.1

Handle status callbacks from Twilio, show delivery failures in Integrations

Removing unused /api/v1/notifications/{uuid}/bounce endpoint

Less verbose output in the senddeletionnotices command

Host a read-only dashboard (from github.com/healthchecks/dashboard/)

LINE Notify integration (#412)

Read-only team members

API support for setting the allowed HTTP methods for making ping requests

Bug Fixes

Handle excessively long email addresses in the signup form

Handle excessively long email addresses in the team member invite form

Don't allow duplicate team memberships

When copying a check, copy all fields from the "Filtering Rules" dialog (#417)

Fix missing Resume button (#421)

When decoding inbound emails, decode encoded headers (#420)

Escape markdown in MS Teams notifications (#426)

Set the "title" and "summary" fields in MS Teams notifications (#435)

v1.16.0(Aug 4, 2020)
Improvements

Paused ping handling can be controlled via API (#376)

Add "Get a list of check's logged pings" API call (#371)

The /api/v1/checks/ endpoint now accepts either UUID or unique_key (#370)

Added /api/v1/checks/uuid/flips/ endpoint (#349)

In the cron expression dialog, show a human-friendly version of the expression

Indicate a started check with a progress spinner under status icon (#338)

Added "Docs > Reliability Tips" page

Spike.sh integration (#402)

Updated Discord integration to use discord.com instead of discordapp.com

Add "Failure Keyword" filtering for inbound emails (#396)

Add support for multiple, comma-separated keywords (#396)

New integration: phone calls (#403)

Bug Fixes

Removing Pager Team integration, project appears to be discontinued

Sending a test notification updates Channel.last_error (#391)

Handle HTTP 429 responses from Matrix server when joining a Matrix room
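
The keyword filtering entries in this release (failure keywords, multiple comma-separated keywords) can be illustrated with a rough sketch. It assumes case-sensitive substring matching on the email subject and that failure keywords take precedence over success keywords; both are assumptions, and `split_keywords` and `classify_subject` are hypothetical helpers, not the project's code.

```python
def split_keywords(raw):
    """Split a comma-separated keyword string, dropping empty entries."""
    return [kw.strip() for kw in raw.split(",") if kw.strip()]


def classify_subject(subject, success_kw="", failure_kw=""):
    """Return "fail", "success", or "ign" (ignored) for an inbound email."""
    if any(kw in subject for kw in split_keywords(failure_kw)):
        return "fail"
    if any(kw in subject for kw in split_keywords(success_kw)):
        return "success"
    return "ign"


result = classify_subject(
    "Backup FAILED: disk full",
    success_kw="OK, SUCCESS",
    failure_kw="ERROR, FAILED",
)
```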

v1.15.0(Jun 4, 2020)
Improvements

Rate limiting for Telegram notifications (10 notifications per chat per minute)

Use Slack V2 OAuth flow

Users can edit their existing webhook integrations (#176)

Add a "Transfer Ownership" feature in Project Settings

In checks list, the pause button asks for confirmation (#356)

Added /api/v1/metrics/ endpoint, useful for monitoring the service itself

Added "When paused, ignore pings" option in the Filtering Rules dialog (#369)

Bug Fixes

"Get a single check" API call now supports read-only API keys (#346)

Don't escape HTML in the subject line of notification emails

Don't let users clone checks if the account is at check limit
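
The Telegram rate limit from this release (10 notifications per chat per minute) can be sketched with a sliding-window limiter. This is one possible technique, chosen for brevity, and not necessarily how Healthchecks implements it; the class and its parameters are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds for each key."""

    def __init__(self, limit=10, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent = defaultdict(deque)  # key -> timestamps of allowed events

    def allow(self, key):
        now = self.clock()
        sent = self._sent[key]
        # Forget events that have fallen out of the window.
        while sent and now - sent[0] >= self.window:
            sent.popleft()
        if len(sent) >= self.limit:
            return False
        sent.append(now)
        return True


limiter = SlidingWindowLimiter()
can_send = limiter.allow("chat-42")  # True for the first 10 calls per minute
```

Keying the limiter by chat ID means a noisy check in one chat cannot use up the allowance of another.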

v1.14.0(Mar 23, 2020)
Improvements

Improved UI to invite users from account's other projects (#258)

Experimental Prometheus metrics endpoint (#300)

Don't store user's current project in DB, put it explicitly in page URLs (#336)

API reference in Markdown

Use Selectize.js for entering tags (#324)

Zulip integration (#202)

OpsGenie integration returns more detailed error messages

Telegram integration returns more detailed error messages

Added the "Get a single check" API call (#337)

Display project name in Slack notifications (#342)

Bug Fixes

The "render_docs" command checks if markdown and pygments are installed (#329)

The team size limit is applied to the n. of distinct users across all projects (#332)

API: don't let SuspiciousOperation bubble up when validating channel ids

API security: check channel ownership when setting check's channels

API: update check's "alert_after" field when changing schedule

API: validate channel identifiers before creating/updating a check (#335)

Fix redirect after login when adding Telegram integration

v1.13.0(Feb 13, 2020)
Improvements

Show a red "!" in project's top navigation if any integration is not working

createsuperuser management command requires a unique email address (#318)

For superusers, show "Site Administration" in top navigation, note in README (#317)

Make Ping.body size limit configurable (#301)

Show sub-second durations with higher precision, 2 digits after decimal point (#321)

Replace the gear icon with three horizontal dots icon (#322)

Add a Pause button in the checks list (#312)

Documentation in Markdown

Added an example of capturing and submitting log output (#315)

The sendalerts command measures dwell time and reports it over statsd protocol

Django 3.0.3

Show a warning in top navigation if the project has no integrations (#327)

Bug Fixes

Increase the allowable length of Matrix room alias to 100 (#320)

Make sure Check.last_ping and Ping.created timestamps match exactly

Don't trigger "down" notifications when changing schedule interactively in web UI

Fix sendalerts crash loop when encountering a bad cron schedule

Stricter cron validation, reject schedules like "At midnight of February 31"

In hc.front.views.ping_details, if a ping does not exist, return a friendly message

v1.12.0(Jan 2, 2020)
Improvements

Django 3.0

"Filtering Rules" dialog, an option to require HTTP POST (#297)

Show Healthchecks version in Django admin header (#306)

Added JSON endpoint for Shields.io (#304)

senddeletionnotices command skips profiles with recent last_active_date

The "Update Check" API call can update check's description (#311)

Bug Fixes

Don't set CSRF cookie on first visit. Signup is exempt from CSRF protection

Fix List-Unsubscribe email header value: add angle brackets

Unsubscribe links serve a form, and require HTTP POST to actually unsubscribe

For webhook integration, validate each header line separately

Fix "Send Test Notification" for webhooks that only fire on checks going up

Don't allow adding webhook integrations with both URLs blank

Don't allow adding email integrations with both "up" and "down" unchecked
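
The List-Unsubscribe fix in this release concerns the header format: the URL is expected to be wrapped in angle brackets. A minimal sketch with the standard library's `email` package follows; the helper name and the example URL are hypothetical.

```python
from email.message import EmailMessage


def add_unsubscribe_header(msg, url):
    """Set List-Unsubscribe with the URL enclosed in angle brackets."""
    msg["List-Unsubscribe"] = f"<{url}>"
    return msg


msg = add_unsubscribe_header(
    EmailMessage(), "https://hc.example.org/unsubscribe/abc"
)
```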

v1.11.0(Nov 22, 2019)
Improvements

In monthly reports, no downtime stats for the current month (month has just started)

Add Microsoft Teams integration (#135)

Add Profile.last_active_date field for more accurate inactive user detection

Add "Shell Commands" integration (#302)

PagerDuty integration works with or without PD_VENDOR_KEY (#303)

Bug Fixes

On mobile, "My Checks" page, always show the gear (Details) button (#286)

Make log events fit better on mobile screens

v1.10.0(Oct 21, 2019)
Improvements

Add the "Last Duration" field in the "My Checks" page (#257)

Add "last_duration" attribute to the Check API resource (#257)

Upgrade to psycopg2 2.8.3

Add Go usage example

Send monthly reports on 1st of every month, not randomly during the month

Signup form sets the "auto-login" cookie to avoid an extra click during first login

Autofocus the email field in the signup form, and submit on enter key

Add support for OpsGenie EU region (#294)

Update OpsGenie logo and setup illustrations

Add a "Create a Copy" function for cloning checks (#288)

Send email notification when monthly SMS sending limit is reached (#292)

Bug Fixes

Prevent double-clicking the submit button in signup form

Upgrade to Django 2.2.6 – fixes sqlite migrations (#284)

v1.9.0(Sep 3, 2019)
Improvements

Show the number of downtimes and total downtime minutes in monthly reports (#104)

Show the number of downtimes and total downtime minutes in "Check Details" page

Add the pruneflips management command

Add Mattermost integration (#276)

Three choices in timezone switcher (UTC / check's timezone / browser's timezone) (#278)

After adding a new check redirect to the "Check Details" page

Bug Fixes

Fix javascript code to construct correct URLs when running from a subdirectory (#273)

Don't show the "Sign Up" link in the login page if registration is closed (#280)
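
The downtime figures added to reports in this release (number of downtimes and total downtime) can be derived from a chronological list of status changes. The sketch below assumes a simple `(timestamp, new_status)` representation, which is a hypothetical data shape rather than the project's actual model.

```python
from datetime import datetime, timedelta


def downtime_summary(flips, period_end):
    """Return (number of downtimes, total downtime) for a reporting period.

    `flips` is a chronological list of (timestamp, new_status) tuples,
    where new_status is "up" or "down".
    """
    count = 0
    total = timedelta()
    down_since = None
    for when, status in flips:
        if status == "down" and down_since is None:
            down_since = when
            count += 1
        elif status == "up" and down_since is not None:
            total += when - down_since
            down_since = None
    # A downtime still in progress is counted up to the end of the period.
    if down_since is not None:
        total += period_end - down_since
    return count, total


start = datetime(2019, 9, 1)
flips = [
    (start + timedelta(minutes=10), "down"),
    (start + timedelta(minutes=25), "up"),
    (start + timedelta(minutes=50), "down"),
]
count, total = downtime_summary(flips, period_end=start + timedelta(minutes=60))
```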

v1.8.0(Jul 8, 2019)
Improvements

Add the prunetokenbucket management command

Show check counts in JSON "badges" (#251)

Webhooks support HTTP PUT (#249)

Webhooks can use different req. bodies and headers for "up" and "down" events. (#249)

Show check's code instead of full URL on 992px - 1200px wide screens. (#253)

Add WhatsApp integration (uses Twilio same as the SMS integration)

Webhooks support the $TAGS placeholder

Don't include ping URLs in API responses when the read-only key is used

Bug Fixes

Fix badges for tags containing special characters (#240, #237)

Fix the "Integrations" page for when the user has no active project

Prevent email clients from opening the one-time login links (#255)

Fix prunepings and prunepingsslow, they got broken when adding Projects (#264)

v1.7.0(May 2, 2019)
Improvements

Add the EMAIL_USE_VERIFICATION configuration setting (#232)

Show "Badges" and "Settings" in top navigation (#234)

Upgrade to Django 2.2

Can configure the email integration to only report the "down" events (#231)

Add "Test!" function in the Integrations page (#207)

Rate limiting for the log in attempts

Password strength meter and length check in the "Set Password" form

Show the Description section even if the description is missing. (#246)

Include the description in email alerts. (#247)
