v3_tuto_check

Policy tutorial: checking data integrity

Table of Contents Presentation Configuration Policy declaration File classes Policy rules Policy triggers Running the policy From command line Using robinhood service Specific reporting Processing checksums on multiple hosts

Presentation

This tutorial explains how to setup a checksum policy to regularly check the contents of files and possibly detect data corruption.

The principle of this policy is to compute the checksum of files and store them in the robinhood database. Next time a file is checked, if the file doesn't seam to have changed (same mtime, same size), it should have the same checksum. If not, the file may be corrupted and its checksum status is marked as 'failed' in robinhood DB.

You can easily get a summary of the checksum status of filesystem entries, and list the entries for each status.

Configuration

Policy declaration

Include 'check.inc' in your config file to define the 'checksum' policy:

 %include "includes/check.inc"

File classes

Specify your policy targets as fileclasses. Fileclass definitions must be based on rather static criteria (like owner, path, group...). They should NOT be based on time criteria (age of last access, last modification, etc.): time-based criteria will be specified in policy rules.

Examples:

 fileclass important_files {
     definition { type == file and name == "*.data"
                  and tree == "/path/to/important_data" }
 }

Define the following fileclass that will help you to define you policy rules (this will make it possible to define a different rule for files that have never been checked):

 fileclass never_checked {
    # never checked => last_check == 0.
    # 'output' stands for previous command stdout
    definition { checksum.last_check == 0 or checksum.output == "" }
    # don't display this fileclass in --classinfo reports.
    report = no;
 }

Policy rules

Then specify checksum rules. In the following example, we run the initial checksum computation after 1day, and recheck entries weekly:

 checksum_rules {
    # simple filters to optimize policy run
    ignore { last_check < 1d }
    ignore { last_mod < 1d }
 
    rule initial_check {
        target_fileclass = never_checked;
        condition { last_mod > 1d }
    }
 
    rule default {
       condition { last_mod > 1d and last_check > 7d }
    }
 }

Policy triggers

If you want robinhood daemon to regularly run the checksum policy, define a trigger:

 checksum_trigger {
     trigger_on = scheduled;
     check_interval = 6h;
 }

Running the policy

From command line

Run a daemon that regularly apply the checksum policy (according to trigger interval):

robinhood --run=checksum -d

Check policy rules for all entries. The program exits when the policy run is complete.

robinhood --run=checksum --target=all

Match policy rules for a subset of entries:
- Example 1: apply policy to user 'foo':

robinhood --run=checksum --target=user:foo

- Example 2: apply policy to fileclass 'small':

robinhood --run=checksum --target=class:small

Limit the number of checked entries:
- Example 1: check 100TB of data

robinhood --run=checksum --target=all --max-vol=100TB

- Example 2: check 1000 files in OST 23

robinhood --run=checksum --target=ost:23 --max-count=1000

Using robinhood service

To make robinhood daemon run the checksum policy when you start robinhood service:

Edit /etc/sysconfig/robinhood (or /etc/sysconfig/robinhood.''fsname'' on RHEL7)
Append --run=checksum to RBH_OPT (if the --run option is already present, add it to the list of run arguments, e.g. --run=policy1,checksum).
Start (or restart) robinhood service:
- On RHEL6: service robinhood restart
- On RHEL7: systemctl restart robinhood[@''fsname'']

Specific reporting

Besides standard report commands, robinhood provides the following reports about checksums:

rbh-report --status-info checksum

Indicates the number of entries for each status:

- empty for entries that have not been checksummed yet
- ok: checksumming succeeded and is correct
- failed: checksumming failed, or unexpected checksum value

 checksum.status,     type,      count,     volume,   spc_used,   avg_size
                ,     file,   33433138,    2.14 TB,    2.13 TB,   68.62 KB
              ok,     file,    1517984,  147.25 GB,  149.09 GB,  101.72 KB
          failed,     file,    1901298,  120.60 GB,  125.52 GB,   66.51 KB
 ...
 Total: 43347748 entries, volume: 2652255534468 bytes (2.41 TB), space used: 2655941301248 bytes (2.42 TB)

rbh-find -status checksum:failed

List entries with the given checksum status (failed in this example).

rbh-report -e entry_path_or_fid | grep checksum

Displays checksum specific attributes, e.g.:

 checksum.status: 	ok
 checksum.last_check: 	2016/08/04 17:02:13
 checksum.last_success:	2016/08/04 17:02:13
 checksum.output: 	0:da39a3ee5e6b4b0d3255bfef95601890afd80709

rbh-du --status checksum:ok --details -H

Filter 'du' by checksum status.

 /ccc/scratch
         file count:1517984, size:147.3G, spc_used:149.1G

Processing checksums on multiple hosts

The checksum command wrapper provided by robinhood is executed on the same host as the robinhood process.

This can be limiting if you want to run checksum on huge volumes of data, or if you want to avoid overloading the robinhood host with too much I/Os.

You can easily distribute this processing on multiple hosts by writing a wrapper that will distribute the work on several hosts:

This wrapper can run 'ssh <node></node> <checksum_cmd></checksum_cmd> ' on a random node in a node set.
It can connect to a 'xinetd' service which is dedicated to process these checksums.
And there are probably many other ways to do so...

The wrapper must keep the same conventions (output format and error code) as the original robinhood checksum script (/usr/sbin/rbh_cksum.sh).

Back to wiki home

Download latest version: robinhood 3.1.6 (2020/09/29)
Papers and presentations
Online documentation
Subscribe robinhood-news
Mail robinhood-support
Wiki home

Provide feedback

Saved searches

Use saved searches to filter your results more quickly