Part of the Raspberry Pi Reliability series.
In my blog posts about making long-running Raspberry Pis more reliable, I’ve suggested a number of changes. Some of these are pretty large, invasive changes; others are tiny and unremarkable.
With any intervention, it’s important to consider the risks involved and weigh them against your personal risk tolerance, your understanding of the system, and the potential benefits of the change.
You may well decide that potentially destructive interventions are fine, because you understand what’s going on and have a recovery plan; or you might decide to make only minimal changes, because your Pi is running fine and only occasionally exhibits a single problem.
In this post I’ll briefly describe examples of my personal risk-benefit considerations for a few different situations.
The Raspberry Pi’s hardware watchdog isn’t enabled by default. It is a useful and powerful tool, but enabling it can lead to:
- Reboot loops that require you to physically remove and edit the SD card
- Hard reboots in the middle of critical operations like software updates
- SD card corruption due to unclean shutdowns
Most of my Pis run with very little load and are overall quite stable, so reboot loops seem unlikely. Running with automatic software updates disabled or with an entirely read-only root filesystem limits the risk of a hard reboot causing system or SD card corruption (having backups helps with that, too).
Given the benefits the hardware watchdog provides, I think these risks are acceptable for my Raspberry Pi use cases, but I do carefully consider this decision before enabling the hardware watchdog for a given Pi. YMMV.
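Before weighing this decision for a given Pi, it can help to know whether the watchdog is already switched on. Here is a minimal Python sketch of that check. It assumes the watchdog is enabled through a dtparam=watchdog=on line in the Pi's config.txt, which this post doesn't cover; the function name and sample text are made up for illustration, and the sketch ignores conditional sections such as [pi4].

```python
def watchdog_enabled(config_text):
    """Return True if the last uncommented watchdog dtparam sets it to "on".

    Assumption: the watchdog is controlled by "dtparam=watchdog=on" in
    config.txt. Conditional sections like [pi4] are not interpreted.
    """
    enabled = False
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line.startswith("dtparam="):
            continue
        # One dtparam line may carry several comma-separated parameters.
        for param in line[len("dtparam="):].split(","):
            name, _, value = param.partition("=")
            if name.strip() == "watchdog":
                enabled = value.strip().lower() == "on"
    return enabled


# Hypothetical config.txt contents, for illustration only.
sample = "dtparam=audio=on\n# dtparam=watchdog=off\ndtparam=watchdog=on\n"
print(watchdog_enabled(sample))
```

On a real Pi you would pass in the contents of your own config.txt instead of the sample string.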
In contrast, disabling swap on the Pi’s SD card is a low-risk change, and I give it almost no thought.
Most Pis don’t need swap space to function properly, and if they do it’s simple to add swap on a sacrificial external drive.
There’s little that can go wrong with this change: it’s unlikely to take the Pi offline and require manual intervention, and it’s easy to undo if it does cause a problem.
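Part of what makes this change low-risk is that it's easy to verify. As a minimal sketch, the Python function below lists active swap areas from text in the format of Linux's /proc/swaps (one header line, then one row per swap area). The function name and the sample row are made up for illustration.

```python
def active_swap_devices(proc_swaps_text):
    """Return the swap file/device names listed in /proc/swaps-style text."""
    lines = proc_swaps_text.splitlines()
    # The first line is the column header; each later non-blank row starts
    # with the swap file or device path.
    return [line.split()[0] for line in lines[1:] if line.strip()]


# Illustrative samples, not captured from a real Pi.
with_swap = (
    "Filename\tType\tSize\tUsed\tPriority\n"
    "/var/swap\tfile\t102396\t0\t-2\n"
)
without_swap = "Filename\tType\tSize\tUsed\tPriority\n"

print(active_swap_devices(with_swap))     # ['/var/swap']
print(active_swap_devices(without_swap))  # []
```

On the Pi itself you would feed it the contents of /proc/swaps; an empty list means no swap is active.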
My original fix for Netdata causing excessive lighttpd log messages on my Pi-Hole was undeniably a hack: I suggested adding a cron job that would use tee to remove unwanted entries from the lighttpd log file in-place.
I knew this was a hack, and I figured there was some risk of corrupting the log file entirely. I judged that risk acceptable because:
- The change fixed a real problem I had.
- I had experimented with a few more elegant fixes in the lighttpd configuration, but I’m not a lighttpd power user and I couldn’t get these fixes to work.
- The change was quick to implement, allowing me to move on to other things.
- I don’t really care about the lighttpd log file; it’s nice to have, but I basically never look at it. If it gets corrupted once in a while, well, whatever.
lighttpd developer Glenn Strauss emailed me to note that this hack could cause “corruption, truncation, and missing log entries (other than the ones you want filtered).” He also kindly provided a working lighttpd configuration tweak to filter out the unwanted log entries.
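To make the "missing log entries" failure concrete, here is a small Python simulation. It is not my original cron job and not lighttpd's behavior verbatim; it is a simplified model of any read-then-rewrite filter, where a hypothetical callback stands in for the web server logging a request while the filter is running. All names and log lines are made up for illustration.

```python
import os
import tempfile


def filter_in_place(path, unwanted, between_read_and_write=None):
    """Naive in-place filter: read everything, then truncate and rewrite."""
    with open(path) as f:
        lines = f.readlines()
    if between_read_and_write is not None:
        between_read_and_write()  # stands in for the server logging mid-filter
    with open(path, "w") as f:  # mode "w" truncates the file before writing
        f.writelines(line for line in lines if unwanted not in line)


with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "access.log")
    with open(log, "w") as f:
        f.write("GET /admin ok\nGET /noise unwanted\n")

    def server_logs_a_request():
        # Appended after the filter has read the file but before it rewrites it.
        with open(log, "a") as f:
            f.write("GET /important ok\n")

    filter_in_place(log, "unwanted", server_logs_a_request)
    with open(log) as f:
        remaining = f.read().splitlines()

print(remaining)  # ['GET /admin ok']
```

The unwanted entry is gone, but so is the "/important" entry that arrived mid-filter. Filtering at the source, in the server's own configuration, avoids this window entirely.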
This change is not a hack, and it comes with the same benefits listed above, so I’m happily running it now. It’s always good to use lower-risk options when they’re available and come with no trade-offs!