Posted 06 Jan 2021 in Linux

A lightweight service health check tool written in Bash

cdzombak/lightweight-healthcheck is an easy-to-use (and deploy) service health check tool, written entirely in Bash.

Part of the Project Announcements series.

For websites like a2mi.social and other apps and services I run, I need to know when there's a problem. But for personal & crowdfunded projects with only a handful of users (or just one), services like Pingdom are expensive overkill.

So I wrote lightweight-healthcheck, a Bash script which I run periodically (typically every minute or two) via cron. When the service it's checking goes down, I get an email and text message. When it recovers, I get another email & text.

It has some useful features, above & beyond a trivial one-off script one might write to achieve minimum viable monitoring:

Written in Bash & trivially deployable on Linux or macOS.
Email & SMS notifications, including date/time and the hostname that sent the alert.
Mail/SMS alert rate limiting, to avoid blowing through your Twilio/Mailgun quota.
Customizable delay between first detecting a down condition and sending an alert.
Logging of down/alert/clear events. (Logs are automatically written to ~/.lightweight-healthcheck for the user running the checks.)
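
To make the alert delay and rate limiting concrete, here is a minimal sketch of how that decision could be made. It is a simplified illustration rather than an excerpt from the script: the `should_alert` function and the `ALERT_DELAY` and `ALERT_INTERVAL` variables are hypothetical names, and the default values are arbitrary.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: decide whether to send an alert right now.
# Names and defaults are assumptions, not the script's real settings.

ALERT_DELAY=${ALERT_DELAY:-120}         # assumed: seconds down before the first alert
ALERT_INTERVAL=${ALERT_INTERVAL:-3600}  # assumed: minimum seconds between alerts

# should_alert NOW DOWN_SINCE LAST_ALERT
# All arguments are Unix timestamps; LAST_ALERT is 0 if no alert was sent yet.
# Returns 0 (true) when an alert should go out.
should_alert() {
  local now=$1 down_since=$2 last_alert=$3

  # Still inside the grace period: the outage may be a brief blip.
  if (( now - down_since < ALERT_DELAY )); then
    return 1
  fi

  # An alert went out recently: stay quiet to protect the mail/SMS quota.
  if (( last_alert > 0 && now - last_alert < ALERT_INTERVAL )); then
    return 1
  fi

  return 0
}
```

In a cron-driven script, `NOW` could come from `date +%s`, with the other two timestamps kept in small state files between runs.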

SMS delivery is achieved with Twilio. For emails, the script uses the mailx tool, and your server will need to be configured correctly to deliver mail. I personally use Mailgun to ensure mail from my servers reaches me reliably. You can set this up for your Linux servers using this guide.
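
For a sense of what the two delivery paths involve, here is a simplified sketch rather than an excerpt from the script. The function names and the `ALERT_EMAIL`, `ALERT_SMS_TO`, and `TWILIO_*` variables are placeholders invented for this example; the SMS call assumes Twilio's standard Messages REST endpoint with HTTP basic auth.

```bash
#!/usr/bin/env bash
# Simplified sketch of the two notification paths; all names are hypothetical.

# Build the alert text, including the time and the host that sent it.
alert_body() {
  local message=$1
  printf '%s\nTime: %s\nHost: %s\n' "$message" "$(date)" "$HOSTNAME"
}

# Send an email with mailx; requires working outbound mail on the server.
send_email() {
  local subject=$1 message=$2
  alert_body "$message" | mailx -s "$subject" "$ALERT_EMAIL"
}

# Send an SMS through Twilio's Messages REST endpoint.
send_sms() {
  local message=$1
  curl -s --connect-timeout 5 --max-time 15 -X POST \
    "https://api.twilio.com/2010-04-01/Accounts/${TWILIO_ACCOUNT_SID}/Messages.json" \
    -u "${TWILIO_ACCOUNT_SID}:${TWILIO_AUTH_TOKEN}" \
    --data-urlencode "From=${TWILIO_FROM}" \
    --data-urlencode "To=${ALERT_SMS_TO}" \
    --data-urlencode "Body=$(alert_body "$message")" \
    > /dev/null
}
```

A caller would then run something like `send_email "Service DOWN" "health check failed"` followed by `send_sms "health check failed"`.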

To deploy the script, you just make a copy, customize the variables at the top, and write a check in Bash for your service.

Some tips on writing checks with curl:

Pass -s to silence its output.
Set --connect-timeout to a small number (I usually start with 5 seconds, and tweak it from that as needed).
See stackoverflow.com/a/42873372 for notes on curl's retry options.
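
Those tips can be folded into a small reusable helper. This is only a sketch: `check_url` and its two arguments are made up for illustration and are not part of the script, and the timeout values are starting points to tune.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if URL responds and the body contains TEXT.

check_url() {
  local url=$1 expected=$2
  # -s silences progress output; --connect-timeout bounds the connection
  # phase; --max-time bounds each transfer attempt; --retry retries
  # transient failures; --retry-max-time caps the time spent retrying.
  curl -s --connect-timeout 5 --max-time 15 \
       --retry 3 --retry-max-time 50 "$url" \
    | grep -q -F -- "$expected"
}
```

Because the pipeline's exit status is grep's, this works directly in a conditional, for example `if check_url "https://a2mi.social/api/v1/streaming/health" "OK"; then ...`. Using `grep -q` keeps the check silent, while `grep -c` additionally prints the match count.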

Here are a couple of example checks I use for my own services:

# verify that my website is online & serving my home page:
curl -s --connect-timeout 5 --max-time 15 --retry 3 --retry-max-time 50 https://www.dzombak.com | grep -c "<title>Chris Dzombak</title>"

# verify that the a2mi.social streaming API is online:
curl -s --connect-timeout 5 https://a2mi.social/api/v1/streaming/health | grep -c OK