For websites like a2mi.social and other apps and services I run, I need to know when there's a problem. But for personal & crowdfunded projects with few (or 1) users, services like Pingdom are expensive and overkill.
So I wrote lightweight-healthcheck, a Bash script which I run periodically (typically every minute or two) via cron. When the service it's checking goes down, I get an email and text message. When it recovers, I get another email & text.
It has some useful features, above & beyond a trivial one-off script one might write to achieve minimum viable monitoring:
- Written in Bash & trivially deployable on Linux or macOS.
- Email & SMS notifications, including date/time and the hostname that sent the alert.
- Mail/SMS alert rate limiting, to avoid blowing through your Twilio/Mailgun quota.
- Customizable delay between first detecting a down condition and sending an alert.
- Logging of down/alert/clear events. (Logs are automatically written to ~/.lightweight-healthcheck for the user running the checks.)
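To make the alert delay and rate limiting ideas concrete, here is a minimal sketch of the decision they imply. It is not taken from lightweight-healthcheck itself: the function name `should_alert`, the variables `ALERT_DELAY` and `ALERT_INTERVAL`, and their values are all hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real script's logic: decide whether to send an
# alert, given a delay before the first alert and a minimum gap between alerts.

ALERT_DELAY=120       # assumed: seconds a failure must persist before alerting
ALERT_INTERVAL=3600   # assumed: minimum seconds between two alerts

# should_alert NOW DOWN_SINCE LAST_ALERT
#   NOW         current time, in epoch seconds
#   DOWN_SINCE  epoch second at which the failure was first seen
#   LAST_ALERT  epoch second of the previous alert, or 0 if none was sent
# Prints "yes" or "no".
should_alert() {
  local now=$1 down_since=$2 last_alert=$3
  if (( now - down_since < ALERT_DELAY )); then
    echo no    # failure is too recent; it may be a momentary blip
  elif (( last_alert != 0 && now - last_alert < ALERT_INTERVAL )); then
    echo no    # an alert went out recently; stay quiet to protect quotas
  else
    echo yes
  fi
}
```

Passing the timestamps in as arguments, rather than reading the clock inside the function, keeps the decision easy to test; a real script would presumably persist `DOWN_SINCE` and `LAST_ALERT` in a state file between cron runs.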
SMS delivery is achieved with Twilio. For emails, the script uses the mailx tool, and your server will need to be configured correctly to deliver mail. I personally use Mailgun to ensure mail from my servers reaches me reliably. You can set this up for your Linux servers using this guide.
Some tips on writing checks with curl:
- Use -s to silence its output.
- Set --connect-timeout to a small number (I usually start with 5 seconds, and tweak it from that as needed).
- See stackoverflow.com/a/42873372 for notes on curl's retry options.
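The tips above can be folded into a small reusable helper. This is only a sketch: `http_check` is a hypothetical name, the timeout values are starting points rather than recommendations, and it uses `grep -qF` (quiet, fixed-string match) where the examples in this post use `grep -c`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the body served at URL contains PATTERN.
#
# http_check URL PATTERN
#   -s                 silence curl's progress output
#   --connect-timeout  give up quickly if the connection cannot be opened
#   --max-time         cap the total time spent on the request
# The exit status is grep's: 0 if PATTERN was found, non-zero otherwise.
http_check() {
  local url=$1 pattern=$2
  curl -s --connect-timeout 5 --max-time 15 "$url" | grep -qF -- "$pattern"
}
```

A check could then be written as `http_check https://a2mi.social/api/v1/streaming/health OK`, assuming the tool running the check treats a non-zero exit status as "down".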
Here are a couple example checks I use for my own services:
# verify that my website is online & serving my home page:
curl -s --connect-timeout 5 --max-time 15 --retry 3 --retry-max-time 50 https://www.dzombak.com | grep -c "<title>Chris Dzombak</title>"
# verify that the a2mi.social streaming API is online:
curl -s --connect-timeout 5 https://a2mi.social/api/v1/streaming/health | grep -c OK
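Both checks end in `grep -c`, which prints the number of matching lines and exits with status 0 when at least one line matched and 1 when none did. Because a pipeline's exit status is that of its last command, the whole check succeeds or fails with grep. The sketch below shows one way a wrapper could turn that status into an up/down result; `report_status` is a hypothetical name, and this is an assumption about how such a check might be consumed, not a description of what lightweight-healthcheck does internally.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a check command and report "up" or "down"
# based solely on its exit status. The check's own output is discarded.
#
# report_status CHECK_COMMAND
report_status() {
  if bash -c "$1" >/dev/null 2>&1; then
    echo up
  else
    echo down
  fi
}
```

For instance, `report_status 'curl -s --connect-timeout 5 https://a2mi.social/api/v1/streaming/health | grep -c OK'` would print `up` only when the response contains a line matching `OK`.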