Chris Dzombak

dzombak.com

Monitoring for NAS data corruption on ext4 with cshatag

Two open-source programs, cshatag and runner, make it easy to monitor an ext4 filesystem for data corruption.

Setting up some sort of monitoring for silent data corruption on my NAS had been on my to-do list for ages. It turned out to be surprisingly easy, thanks to two programs: cshatag and my own runner!

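Before we dive in, some background information:

cshatag stores each file’s SHA-256 checksum and modification time in extended attributes (xattrs). On later runs it recomputes the checksum, and a file whose content has changed while its modification time has not is reported as corrupt.

runner is a small wrapper for cron jobs: it logs each run and only produces output when the job fails, so cron only sends email when something needs attention.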

Both cshatag and runner provide installation directions in their READMEs, so I won’t duplicate them here.

TK: install links

On modern systems, including mine, xattrs are enabled by default on ext4 filesystems, so no changes to /etc/fstab were needed. And the system is already set up with Postfix and Mailgun to email me output from cron jobs, so no additional configuration was needed there. (runner also supports sending email directly, in case you can’t or don’t want to set up email systemwide.)
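To confirm that a given mount really accepts user xattrs before relying on it, a quick probe can help. What follows is a minimal sketch and not part of cshatag or runner: it assumes the setfattr and getfattr tools (the attr package on Debian-based systems) are installed, and both the check_xattr_support helper and the user.xattr_probe attribute name are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: check whether the filesystem behind a directory accepts user.* xattrs.
# Assumes setfattr/getfattr are available; names below are illustrative only.

# Hypothetical helper.
# Returns 0 if xattrs work, 1 if they do not, 2 if the tools are missing.
check_xattr_support() {
  local dir="$1" probe rc=1
  if ! command -v setfattr >/dev/null 2>&1 || ! command -v getfattr >/dev/null 2>&1; then
    echo "setfattr/getfattr not found" >&2
    return 2
  fi
  # Create a throwaway file inside the directory being checked.
  probe="$(mktemp "$dir/.xattr-probe.XXXXXX")" || return 1
  # Write an arbitrary attribute, then read it back and compare.
  if setfattr -n user.xattr_probe -v ok "$probe" 2>/dev/null \
     && [ "$(getfattr --only-values -n user.xattr_probe "$probe" 2>/dev/null)" = "ok" ]; then
    rc=0
  fi
  rm -f "$probe"
  return "$rc"
}
```

For example, `check_xattr_support /mnt/storage && echo "xattrs OK"` should print "xattrs OK" on an ext4 mount with default options.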

All that’s left is putting together a file in /etc/cron.d with the right combination of arguments! Here’s what the file /etc/cron.d/cshatag looks like on my NAS:

SHELL=/bin/bash
RUNNER_LOG_DIR=/var/log/cshatag

#
# cshatag: check for data corruption
# https://github.com/rfjakob/cshatag
#

# Monday: general
05 11 * * 1  root  runner -job-name cshatag-general  -work-dir /mnt/storage -healthy-exit 0 -healthy-exit 2 -healthy-exit 3 -- ionice -c2 -n7 nice -n19 /usr/local/bin/cshatag -qq -recursive ./general

# Wednesday: plex
05 11 * * 3  root  runner -job-name cshatag-plex     -work-dir /mnt/storage -healthy-exit 0 -healthy-exit 2 -healthy-exit 3 -- ionice -c2 -n7 nice -n19 /usr/local/bin/cshatag -qq -recursive ./plex

# Cleanup logs in RUNNER_LOG_DIR older than 30 days:
00 00 * * *  root  find "$RUNNER_LOG_DIR" -mtime +30 -name "*.log" -delete  >/dev/null
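Before trusting the cleanup entry with -delete, it can be worth previewing what it matches. Here is a small sketch, with list_stale_logs as a hypothetical helper name: it runs the same find tests as the cron entry but prints the matches instead of deleting them. Note that find rounds file ages down to whole days, so -mtime +30 only matches files last modified at least 31 days ago.

```shell
#!/usr/bin/env bash
# Sketch: list the logs the cleanup job would delete, without deleting anything.

# Hypothetical helper; takes the log directory and an age threshold in days
# (defaulting to the 30 days used in the cron entry).
list_stale_logs() {
  local log_dir="$1" days="${2:-30}"
  # Same tests as the cron entry, with -print in place of -delete.
  find "$log_dir" -mtime +"$days" -name "*.log" -print
}
```

Running `list_stale_logs /var/log/cshatag` as root shows the candidates for deletion; an empty result means the nightly job currently has nothing to remove.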

Let’s break down everything on the line for the “Monday: general” job:
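Piece by piece, the command can be annotated as follows. This is a sketch rather than authoritative documentation: the cron, ionice, and nice notes describe standard behavior, while the notes on runner's and cshatag's flags reflect the flag names and each project's README as best understood here, so check them against the versions you install. The monday_general array name is made up for illustration. The five cron fields before the command (05 11 * * 1) mean 11:05 on Mondays, and the root field is the user the job runs as.

```shell
#!/usr/bin/env bash
# Annotated copy of the "Monday: general" command from /etc/cron.d/cshatag.
# The array is only a way to attach a comment to each argument.

monday_general=(
  runner                      # wrapper that logs the run and reports failures
  -job-name cshatag-general   # label for this job's logs and notifications
  -work-dir /mnt/storage      # directory the wrapped command runs in
  -healthy-exit 0             # exit codes to treat as success: 0 ...
  -healthy-exit 2             # ... 2 ...
  -healthy-exit 3             # ... and 3
  --                          # end of runner's flags; the wrapped command follows
  ionice -c2 -n7              # best-effort IO class, lowest priority within it
  nice -n19                   # lowest CPU scheduling priority
  /usr/local/bin/cshatag      # the checksum checker itself
  -qq                         # quiet: report only corrupt files and errors
  -recursive                  # descend into subdirectories
  ./general                   # relative to -work-dir, i.e. /mnt/storage/general
)

# Print the command as one shell-quoted line instead of running it.
printf '%q ' "${monday_general[@]}"
echo
```

The "Wednesday: plex" entry differs only in the day-of-week field, the job name, and the target directory.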

The end result is that, every Monday at 11:05am, cshatag runs recursively on /mnt/storage/general with low IO and CPU priority. If cshatag returns an error (indicating data corruption or a failure writing xattrs), I’ll be notified via email.
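Which exit codes count as an error follows from the three -healthy-exit flags. The sketch below mirrors that split in plain bash. It is an illustration, not runner's actual implementation: is_healthy_exit is a hypothetical helper, and the meanings in the comments are taken from the exit-code list in the cshatag README as understood here, so verify them against your installed version.

```shell
#!/usr/bin/env bash
# Sketch of how "-healthy-exit 0 -healthy-exit 2 -healthy-exit 3" partitions
# cshatag's exit codes. Code meanings are assumptions based on the README.

# Hypothetical helper: succeeds if the given exit code is treated as healthy.
is_healthy_exit() {
  case "$1" in
    0) return 0 ;;  # success
    2) return 0 ;;  # one or more files could not be opened
    3) return 0 ;;  # one or more files were not regular files
    *) return 1 ;;  # e.g. 4 (xattr write failed), 5 (corruption), 6 (mixed errors)
  esac
}

for code in 0 2 3 4 5 6; do
  if is_healthy_exit "$code"; then
    echo "exit $code: healthy"
  else
    echo "exit $code: unhealthy"
  fi
done
```

Treating 2 and 3 as healthy keeps unreadable files and non-regular files (sockets, device nodes, and the like) from triggering an email every week, while corruption and xattr write failures still do.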