
lhsm_tutorial

Thomas Leibovici edited this page Apr 9, 2015 · 12 revisions


First steps with robinhood-lhsm

robinhood v2.5.4

December 9th, 2014

Contact: [email protected]

Installation

Lustre/HSM

First, you need to set up the Lustre/HSM components:

  • Lustre >= 2.5
  • Copytool daemons for your HSM backend, running on Lustre clients
  • Enable mdt hsm coordinator
  • Enable Lustre MDT changelogs
Refer to Lustre documentation for more details about these steps: Lustre manual

Robinhood

Install from RPM

Pre-generated RPMs can be downloaded from sourceforge for the following configurations:

  • x86_64 architecture, RedHat 5/6 Linux families
  • MySQL database 5.x
  • Lustre 2.5, 2.6, 2.7
Purpose specific RPM: robinhood-lhsm

It must be installed on a Lustre client. It is recommended to run the same Lustre version on this client as on the Lustre servers.

It includes:

  • 'rbh-lhsm' daemon
  • Reporting commands: 'rbh-lhsm-report', 'rbh-lhsm-find', 'rbh-lhsm-du', 'rbh-lhsm-diff'
  • Man pages
  • '/etc/init.d/robinhood-lhsm' service and its configuration file '/etc/sysconfig/robinhood-lhsm'
  • A detailed configuration example: '/etc/robinhood.d/lhsm/templates/lhsm_detailed.conf'
Admin RPM (all purposes): robinhood-adm

Includes the 'rbh-config' configuration helper, which is useful on the DB host and the Lustre MDS.

The robinhood-webgui RPM installs a web interface to visualize stats from the Robinhood database.

It must be installed on an HTTP server.

Build and install from the source tarball

Requirements

Before building Robinhood, make sure the following packages are installed on your system:

  • mysql-devel
  • lustre API library (if Robinhood is to be run on a Lustre filesystem): '/usr/include/liblustreapi.h' and '/usr/lib*/liblustreapi.so' are installed by the lustre rpm.

Build

Retrieve Robinhood tarball from sourceforge: http://sourceforge.net/projects/robinhood/files

Unzip and untar the sources:

tar zxvf robinhood-2.5.4.tar.gz
cd robinhood-2.5.4

Then, use the "configure" script to generate Makefiles:

  • use the --with-purpose=LUSTRE_HSM option to build it for Lustre/HSM:
 ./configure --with-purpose=LUSTRE_HSM

Other './configure' options:

  • You can change the default installation prefix (default is /usr) using: '--prefix=<path>'
Then build the RPMs:
make rpm

RPMs are generated in the 'rpms/RPMS/<arch>' directory. The RPM is tagged with the Lustre version it was built for.

MySQL database

Robinhood needs a MySQL database for storing its data. This database can run on a different host from the Robinhood node. However, a common configuration is to install robinhood on the DB host to reduce DB request latency.

Requirements

Install 'mysql' and 'mysql-server' packages on the node where you want to run the database engine.

Start the database engine:
service mysqld start

Creating database

Using the helper script:

To easily create the robinhood database, you can use the rbh-config script. Run this script on the database host to check your system configuration and perform the database creation steps:

# check database requirements and create it:
rbh-config create_db

Note: if no option is given to rbh-config, it prompts for configuration parameters interactively. Otherwise, if you specify parameters on the command line, it runs in batch mode.

Write the database password to a file with restricted access (root/600), e.g. /etc/robinhood.d/.dbpassword
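That password file can be created like this (a minimal sketch; the /tmp path below is only so the example is self-contained — the tutorial's real path is /etc/robinhood.d/.dbpassword):

```shell
# Create the DB password file, readable by root only.
# NOTE: /tmp/.dbpassword is used here for illustration;
# use /etc/robinhood.d/.dbpassword in production.
umask 077                            # newly created files get mode 600
echo 'passw0rd' > /tmp/.dbpassword   # the password given when creating the DB user
chmod 600 /tmp/.dbpassword           # be explicit about the restricted mode
stat -c '%a' /tmp/.dbpassword        # check the resulting mode
```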

or manually:

Alternatively, if you want better control over the database configuration and access rights, you can perform the following steps on your own:

  • Create the database (one per filesystem) using the mysqladmin command: mysqladmin create <robinhood_db_name>
  • Connect to the database: mysql <robinhood_db_name>
Then execute the following commands in the MySQL session:
    • Create a robinhood user and set its password (MySQL 5+ only): create user 'robinhood' identified by 'password';
    • Give access rights on the database to this user (you can restrict client host access by replacing '%' with the node where robinhood will be running):
MySQL 5:
GRANT USAGE ON robinhood_db_name.* TO 'robinhood'@'%';
GRANT ALL PRIVILEGES ON robinhood_db_name.* TO 'robinhood'@'%';
MySQL 4.1:
GRANT USAGE ON robinhood_db_name.* TO 'robinhood'@'%' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON robinhood_db_name.* TO 'robinhood'@'%';
    • The 'super' privilege is required for creating DB triggers (needed for accounting optimizations):
GRANT SUPER ON *.* TO 'robinhood'@'%' IDENTIFIED BY 'password';
    • Refresh server access settings: FLUSH PRIVILEGES;
    • You can check user privileges using: SHOW GRANTS FOR robinhood;
  • To test access to the database, execute the following command on the machine where robinhood will be running:
mysql --user=robinhood --password=password --host=db_host robinhood_db_name

If the command is successful, a SQL shell is started. Otherwise, you will get a 'permission denied' error.

At this time, the database schema is empty. Robinhood will automatically create it the first time it is launched.

First run

The best way to use robinhood on Lustre v2 is to read MDT changelogs. This Lustre feature makes it possible to update the Robinhood database in near real-time. Scanning the filesystem is no longer required after the initial scan (which is still needed to populate the DB).

In any case, robinhood runs on a lustre client.

Activate MDT Changelogs

If filesystem MDS and MGS are on the same host, you can simply enable this feature by running 'rbh-config' on this host (for other cases, see Robinhood admin guide). 'rbh-config' is installed by robinhood-adm package.

rbh-config enable_chglogs

This registers a changelog reader as 'cl1' and sets the changelog event mask (see /proc/fs/lustre/mdd/<fsname>-*/changelog_mask).

The reader is registered persistently. However, the changelog mask must be set again whenever the MDS restarts (prior to any filesystem operation).

Simple configuration file

Let's start with a basic configuration file:

General {
    fs_path = "/path/to/lustre";
}
Log {
    log_file = "/var/log/robinhood/lustre.log";
    report_file = "/var/log/robinhood/reports.log";
    alert_file = "/var/log/robinhood/alerts.log";
}
ListManager {
    MySQL {
        server = db_host;
        db = robinhood_test;
        user = robinhood;
        password_file = /etc/robinhood.d/.dbpassword;
    }
}
ChangeLog {
    MDT {
        mdt_name = "MDT0000";
        reader_id = "cl1";
    }
}

General section:

  • fs_path is the mount point of the lustre filesystem we want to manage.
Log section:
  • Make sure the log directory exists.
  • You can also specify special values stderr, stdout or syslog for log parameters.
  • robinhood is compliant with log rotation (if its log file is renamed, it automatically switches to a new log file).
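Because robinhood follows a renamed log file automatically, a plain logrotate rule without any restart hook is enough. A minimal sketch (the schedule and retention values below are illustrative, not robinhood defaults):

```
/var/log/robinhood/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
```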
ListManager::MySQL section:

This section is for configuring database access.

Set the host name of the database server (server parameter), the database name (db parameter), the database user (user parameter) and specify a file where you wrote the password for connecting to the database (password_file parameter).

/!\ Make sure the password file cannot be read by unauthorized users, for example by setting its mode to '600'.

If you don't care about security, you can directly specify the password in the configuration file, by setting the password parameter. E.g.: password = 'passw0rd' ;

ChangeLog::MDT section:

This section controls Changelog reading. For this simple case, we only specify the mdt_name (always 'MDT0000' if you don't use DNE with multiple MDTs) and the registered Changelog reader as reader_id (usually 'cl1' if you have a single Changelog consumer).
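If you do use DNE, each MDT has its own changelog stream and its own registered reader. Assuming the ChangeLog section accepts one MDT block per MDT (an extrapolation of the single-MDT example above; check the detailed configuration template), a multi-MDT setup could be sketched as:

```
ChangeLog {
    MDT {
        mdt_name  = "MDT0000";
        reader_id = "cl1";
    }
    MDT {
        mdt_name  = "MDT0001";
        reader_id = "cl1";   # each MDT has its own changelog user registration
    }
}
```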

Running initial scan

To populate the DB, we need to run an initial scan. Unlike scanning in daemon mode, we just want to scan once and exit. Thus, we run rbh-lhsm with the --scan and --once options.
You can specify the configuration file using the -f option, else it will use the config file in '/etc/robinhood.d/lhsm'. If you have several config files in this directory, you can use a short name to distinguish them. e.g. '-f test' for '/etc/robinhood.d/lhsm/test.conf'.
If you want to override configuration values for log file, use the '-L' option. For example, let's specify '-L stdout'
rbh-lhsm -f test -L stdout --scan --once
or just:
rbh-lhsm -L stdout --scan --once
(if your config file is the only one in /etc/robinhood.d/lhsm)
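The short-name lookup described above can be sketched as a small shell function (a hypothetical reimplementation for illustration only, not robinhood code):

```shell
# Illustration only: mimic how '-f <name>' resolves to a config file path.
resolve_conf() {
    local name="$1"
    local dir="${2:-/etc/robinhood.d/lhsm}"
    case "$name" in
        /*|*.conf) printf '%s\n' "$name" ;;           # explicit path: used as-is
        *)         printf '%s\n' "$dir/$name.conf" ;; # short name: looked up in the config dir
    esac
}

resolve_conf test             # short name
resolve_conf /tmp/my.conf     # explicit path
```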

You should get something like this:

2013/07/17 13:49:06: FS Scan | Starting scan of /mnt/lustre
2013/07/17 13:49:06: FS Scan | Full scan of /mnt/lustre completed, 7130 entries found. Duration = 0.07s
2013/07/17 13:49:06: FS Scan | File list of /mnt/lustre has been updated
2013/07/17 13:49:06: Main | All tasks done! Exiting.

Reading Lustre Changelogs

Then we want to read MDT changelogs to keep the DB up-to-date. We start it as a daemon, as we want to do this continuously:
rbh-lhsm --readlog --detach

Getting filesystems statistics

Now that the DB is updated in near real-time, we can get fresh statistics about the filesystem.

rbh-lhsm-report

Now that we have performed a scan, we can get stats about users, files, directories, etc. using rbh-lhsm-report:

  • Get stats for a user: -u option
rbh-lhsm-report -u foo
user , type,  count,  spc_used,  avg_size
foo  ,  dir,  75450, 306.10 MB,   4.15 KB
foo  , file, 116396,  11.14 TB, 100.34 MB

Total: 191846 entries, 12248033808384 bytes used (11.14 TB)
  • Split user's usage per group: -S option
rbh-lhsm-report -u bar -S
user , group,  type,  count,  spc_used,   avg_size
bar  , proj1,  file,      4,  40.00 MB,   10.00 MB
bar  , proj2,  file,   3296, 947.80 MB,  273.30 KB
bar  , proj3,  file, 259781, 781.21 GB,    3.08 MB
  • Get largest files: --top-size option
rbh-lhsm-report --top-size
rank, path           ,      size,  user, group,         last_access,            last_mod, purge class
   1, /tmp/file.big1 , 512.00 GB,  foo1,   p01, 2012/10/14 17:41:38, 2011/05/25 14:22:41, BigFiles
   2, /tmp/file2.tar , 380.53 GB,  foo2,   p01, 2012/10/14 21:38:07, 2012/02/01 14:30:48, BigFiles
   3, /tmp/big.1     , 379.92 GB,  foo1,   p02, 2012/10/14 20:24:20, 2012/05/17 17:40:57, BigFiles
...
  • Get top space consumers: --top-users option
rbh-lhsm-report --top-users
rank, user    , spc_used,  count, avg_size
   1, usr0021 , 11.14 TB, 116396, 100.34 MB
   2, usr3562 ,  5.54 TB,    575,   9.86 GB
   3, usr2189 ,  5.52 TB,   9888, 585.50 MB
   4, usr2672 ,  3.21 TB, 238016,  14.49 MB
   5, usr7267 ,  2.09 TB,   8230, 266.17 MB
...

Notes:

  • --by-count option sorts users by entry count
  • --by-avgsize option sorts users by average file size
  • --reverse option reverses sort order (e.g. smallest first)
  • Use --count-min N option to only display users with at least N entries.
  • --by-size-ratio option makes it possible to sort users using the percentage of files in the given range.
  • Filesystem content summary: -i option
rbh-lhsm-report -i
status,    type ,    count,   volume, avg_size
n/a   ,     dir ,  1780074,  8.02 GB,  4.72 KB
n/a   , symlink ,   496142, 24.92 MB,       53
new   ,    file , 21366275, 91.15 TB,  4.47 MB

Total: 23475376 entries, 100399805708329 bytes (91.31 TB)

This report indicates the count and volume of each object type, and their status.
As we have not archived any data yet, all objects are marked as 'new'. This field does not make sense for directories (n/a), as they do not contain data.

  • Entry information: -e option
rbh-lhsm-report -e /mnt/lustre/dir1/file.1
id          :     [0x200000400:0x16a94:0x0]
parent_id   :     [0x200000007:0x1:0x0]
name        :     file.1
path updt   :     2013/10/30 10:25:30
path        :     /mnt/lustre/dir1/file.1
depth       :     0
user        :     root
group       :     root
size        :     1.42 MB
spc_used    :     1.42 MB
creation    :     2013/10/30 10:07:17
last_access :     2013/10/30 10:15:28
last_mod    :     2013/10/30 10:10:52
last_archive:     2013/10/30 10:13:34
type        :     file
mode        :     rw-r--r--
nlink       :     1
status      :     modified
md updt     :     2013/10/30 10:25:30
stripe_cnt, stripe_size, pool:  2, 1.00 MB,
stripes     :     ost#1: 30515, ost#0: 30520
  • fileclasses summary: --class-info option
Once you have defined fileclasses (see next sections of this tutorial), you can get file repartition by fileclass:

rbh-lhsm-report --class-info

archive class  ,    count, spc_used,   volume, min_size,  max_size,  avg_size
BigFiles       ,     1103, 19.66 TB, 20.76 TB,  8.00 GB, 512.00 GB,  19.28 GB
EmptyFiles     ,  1048697,  7.92 GB,  4.15 GB,        0,   1.96 GB,   4.15 KB
SmallFiles     , 20218577,  9.63 TB,  9.67 TB,        0,  95.71 MB, 513.79 KB
ImportantFiles ,   426427, 60.75 TB, 60.86 TB, 16.00 MB,   7.84 GB, 149.66 MB
  • ...and more: you can also generate reports, or dump files per directory, per OST, etc...
    To get more details about available reports, run 'rbh-lhsm-report --help'.

rbh-lhsm-find

A 'find' clone accessing the robinhood database.
Example:

rbh-lhsm-find /mnt/lustre/dir -u root -size +32M -mtime +1h -ost 2 -status new -ls

rbh-lhsm-du

A 'du' clone accessing the robinhood database. It provides extra features like filtering on a given user, group or type...
Example:

> rbh-lhsm-du -H -u foo /mnt/lustre/dir.3
45.0G /mnt/lustre/dir.3

Archiving

Now that we know how to set up and query robinhood, let's archive data to the HSM backend.

Robinhood archives data incrementally: it only copies new or modified files and does not copy unchanged files multiple times.
The admin can set the priority criterion that determines the copy order: last modification time, last archive time, creation time, last access time... By default, copy priority is based on last modification time (oldest first).

Using a single default policy

Robinhood makes it possible to define different migration policies for different file classes.
In this example, we will only define a single policy for all files.
This is done in the 'migration_policies' section of the config file:

migration_policies {
   policy default {
       condition {last_mod > 1h}
   }
}

'default' policy is a special policy that applies to files that don't match a file class.
In a policy, you must specify a condition for allowing entries to be migrated. In this example, we don't want to copy recently modified entries (modified within the last hour).

Run rbh-lhsm --migrate --once to apply this policy once.

You can also run it as a daemon (without the '--once' option). In this case, it will periodically run the migration on eligible entries.

Defining file classes

Robinhood makes it possible to apply different migration policies to files, depending on their properties (path, posix attributes, ...). This can be done by defining file classes that will be addressed in policies.

In this section of the tutorial, we will define 3 classes and apply different policies to them:

  • We don't want *.log files owned by root to be archived.
  • We want to quickly archive files from directory '/mnt/lustre/saveme' (1 hour after their creation, then hourly as long as they are modified).
  • Archive other entries 6h after their last modification.
First, we need to define those file classes in the 'filesets' section of the configuration file. We associate a custom name with each FileClass and specify the definition of the class:
Filesets {
    # log files owned by root
    FileClass root_log_files {
         definition {
             owner == root
             and
             name == "*.log"
         }
    }
    # files in filesystem tree /mnt/lustre/saveme
    FileClass saveme {
        definition { tree == "/mnt/lustre/saveme" }
    }
}

Then, those classes can be used in policies:

  • entries can be ignored for the policy, using an ignore_fileclass statement;
  • they can be targeted in a policy, using a target_fileclass directive.
migration_policies {
    # don't archive log files of 'root'
    ignore_fileclass = root_log_files;

    # quickly archive files in saveme
    policy copy_saveme {
        target_fileclass = saveme;
        # last_archive == 0 means "never archived"
        condition {(last_archive == 0 and creation > 1h)
                   or last_archive > 1h}
    }
    # The default policy applies to all other files
    policy default {
        condition {last_mod > 6h}
    }
}

Notes:

  • A given FileClass cannot be targeted simultaneously by several migration policies;
  • Policies are matched in the order they appear in the configuration file. In particular, if 2 policy targets overlap, the first matching policy is used;
  • You can also ignore entries directly by specifying a condition in the 'migration_policies' section (without a fileclass definition), using an 'ignore' block:
migration_policies {
    ignore { owner == root and name == "*.log" }
...

A FileClass can be defined as the union or the intersection of other FileClasses. To do so, use the special keywords union, inter and not in the fileclass definition:

FileClass root_log_A {
    definition {
         (root_log_files inter A_files)
         union (not B_files)
    }
}

Specifying a target archive

Lustre/HSM can manage multiple archive backends. Archive backends are identified by a unique and persistent index (archive_id). By default, robinhood performs 'hsm_archive' operations without specifying an archive_id, so Lustre uses the default archive_id from the Lustre MDT configuration.

Robinhood allows specifying a target archive per fileclass. This can be done in fileclass definitions by specifying an 'archive_id' parameter:

fileclass foo {
        definition { ... }
        archive_id = 2;
}

Target 'archive_id' can also be specified in each policy rule:

migration_policies {
    policy save_to_arch1 {
        target_fileclass = foo1;
        target_fileclass = foo2;
        condition {last_mod > 1h}
        archive_id = 1;
    }
    policy save_to_arch2 {
        target_fileclass = foo3;
        condition {last_mod > 1h}
        archive_id = 2;
    }
    policy default {
        condition {last_mod > 1h}
        archive_id = 3;
    }
}

Note: Policy rule 'archive_id' overrides Fileclass 'archive_id'.

Migration parameters

Robinhood provides a fine control of migration streams: number of simultaneous copies, runtime interval, max volume or file count to be copied per run, priority criteria...

Those parameters are set in the 'migration_parameters' section. See the main parameters below:

Migration_Parameters {
    # simultaneous copies
    nb_threads_migration = 4 ;
    # sort order for applying migration policy
    # can be one of: last_mod, last_access, creation, last_archive
    lru_sort_attr = last_mod ;
    # interval for running migrations
    runtime_interval = 15min ;

    # maximum number of migration requests per pass (0: unlimited)
    max_migration_count = 50000 ;
    # maximum volume of migration requests per pass (0: unlimited)
    max_migration_volume = 10TB ;

    # stop current migration if 50% of copies fail
    #(after at least 100 errors)
    suspend_error_pct = 50% ;
    suspend_error_min = 100 ;
}

Manual migration actions

In the migration commands we ran previously, all files were considered.
You can apply policies only to a subset of files:

  • To apply migration policies only for files on a specific OST, run (one-shot command):
    rbh-lhsm --migrate-ost=<ost_index>
  • To apply migration policies only for a given user, run (one-shot command):
    rbh-lhsm --migrate-user=<username>
  • To apply migration policies only for a given group, run (one-shot command):
    rbh-lhsm --migrate-group=<groupname>
  • To apply migration policies only for a given fileclass, run (one-shot command):
    rbh-lhsm --migrate-class=<fileclass>
  • To apply migration policies to a single file, run (one-shot command):
    rbh-lhsm --migrate-file=<filepath>
  • To force archiving all files and ignore policy time conditions, run:
    rbh-lhsm --sync

Releasing Lustre disk space

Once data is archived to the HSM backend, it can be released at the Lustre level when disk space is needed to write new data or to restore older data from the backend.

Robinhood lhsm schedules this kind of operation as 'Purge' policies.

Purges are scheduled by 'triggers', based on high and low threshold parameters: when an OST's usage exceeds the high threshold, Robinhood releases data on this OST until its usage is back down to the low threshold.

Purge order is based on last access time (oldest first).

Using a single default policy

Robinhood makes it possible to define different purge policies for different file classes.
In this example, we will only define a single policy for all files.
This is done in the 'purge_policies' section of the config file:

purge_policies {
   policy default {
       condition {last_access > 2h}
   }
}

'default' policy is a special policy that applies to files that don't match a file class.
In a policy, you must specify a condition for allowing entries to be purged. In this example, we don't want to release recently accessed entries (read or written within the last 2 hours).

Note: like for migration policies, you can define multiple purge rules that apply to multiple fileclasses (see above).

We also define a 'purge_trigger' to trigger purge operations when a given OST is full:

purge_trigger {
    trigger_on         = OST_usage;
    high_threshold_pct = 85%;
    low_threshold_pct  = 80%;
    check_interval     = 5min;
}
  • trigger_on specifies the type of trigger.
  • high_threshold_pct indicates the OST usage that must be reached for starting purge.
  • low_threshold_pct indicates the OST usage that must be reached for stopping purge.
  • check_interval is the interval for checking disk usage.
Once a purge policy and a trigger have been defined, we can:
  • Run a trigger check once and trigger purge operations if necessary:
    rbh-lhsm --purge --once
  • Check triggers continuously and purge data when needed (daemon mode):
    rbh-lhsm --purge --detach
  • To check trigger thresholds without executing 'hsm_release' operations, run: rbh-lhsm --check-thresholds -L stderr
Note: the list of executed 'hsm_release' actions is logged in the report file.

Other trigger parameters: To receive a mail notification each time a high threshold is reached, add this parameter to a trigger:
alert_high = yes ;

By default, robinhood raises an alert if it can't purge enough data to reach the low threshold.
You can disable those alerts by adding this in a trigger definition:
alert_low = no ;
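Putting the trigger parameters together, a trigger with both alert settings could look like this (same illustrative thresholds as in the earlier example):

```
purge_trigger {
    trigger_on         = OST_usage;
    high_threshold_pct = 85%;
    low_threshold_pct  = 80%;
    check_interval     = 5min;
    alert_high         = yes;  # mail notification when the high threshold is reached
    alert_low          = no;   # no alert when the low threshold cannot be reached
}
```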

Manual purge actions

In previous examples, purge was driven by a trigger. You can also force purging an OST or the whole filesystem to reach a given usage level:

  • To purge files of a given OST until its usage reaches a given level, run (one-shot command):
    rbh-lhsm --purge-ost=<ost_index>,<usage_pct>
  • To purge files until the overall filesystem usage reaches a given value, run (one-shot command):
    rbh-lhsm --purge-fs=<usage_pct>
  • To purge files of a given fileclass, run (one-shot command):
    rbh-lhsm --purge-class=<fileclass_name>

Purge parameters

You can add a "purge_parameters" block to the configuration file for better control over purges:

purge_parameters {
    nb_threads_purge = 4;
    post_purge_df_latency = 1min;
    db_result_size_max = 10000;
    recheck_ignored_classes = true;
}
  • Purge actions are performed in parallel. You can specify the number of purge threads by setting the nb_threads_purge parameter.
  • The filesystem may report OST usage asynchronously, so the 'lfs df' command may take a few minutes to return an up-to-date value after purging a lot of files. Thus, Robinhood must wait before checking disk usage again after a purge. This is driven by the post_purge_df_latency parameter.
  • If purge policy application looks too slow, you can speed it up by disabling recheck_ignored_classes. This will result in not rechecking previously ignored entries when applying a policy. It is recommended to re-enable it after you change policy definitions.

Orphan cleaning

Orphan cleaning in HSM backend

Robinhood keeps track of files deleted in Lustre and cleans the related entries in the archive after a certain delay (default is 1 day).
This cleaning is done by running 'rbh-lhsm --hsm-remove' (add --once to run it one-shot).
To change the delay, or disable orphan cleaning, define a 'hsm_remove_policy' block in the config:

hsm_remove_policy {
    # set this parameter to 'off' for disabling removal in the archive
    hsm_remove = enabled;

    # delay before cleaning deleted object in the archive
    deferred_remove_delay = 30d;
}

Daemon mode and 'robinhood-lhsm' service

So far, we only started actions independently using options like '--readlog', '--migrate', '--purge'...
Note that you can combine actions on the command line, e.g.:

rbh-lhsm --migrate --hsm-remove
You can also run all actions in a single instance of robinhood running as a daemon, just by executing 'rbh-lhsm' with no argument.
This is also done by default when starting the 'robinhood-lhsm' service:
service robinhood-lhsm start
You can change the behavior of the robinhood-lhsm service by editing /etc/sysconfig/robinhood-lhsm. For example:
RBH_OPT="--readlog --migrate"

Setting up web interface

The web interface makes it possible for an administrator to visualize top disk space consumers (per user or per group) and top inode consumers with fancy charts, with details for each user. It also makes it possible to search for specific entries in the filesystem.

You can install it on any machine with a web server (not necessarily the robinhood or the database node). Of course, the web server must be able to contact the Robinhood database.

Requirements: php/mysql support must be enabled in the web server configuration.
The following packages must be installed on the web server: php, php-mysql, php-xml, php-pdo, php-gd

The following parameter must be set in httpd.conf:

AllowOverride All
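For example, the AllowOverride setting can be scoped to the robinhood directory only (a sketch assuming the RPM install path /var/www/html/robinhood):

```
<Directory "/var/www/html/robinhood">
    AllowOverride All
</Directory>
```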

Install robinhood interface:

  • install the robinhood-webgui RPM on the web server (it installs php files into /var/www/html/robinhood)
or
  • untar the robinhood-webgui tarball in your web server root directory (e.g. /var/www/html)
Configuration:
In a web browser, enter the robinhood URL: http://yourserver/robinhood
The first time you connect to this address, fill-in database parameters (host, login, password, ...).
Those parameters are saved in: /var/www/html/robinhood/app/config/database.xml

That's it. You can now enjoy the statistics charts.

Optimizations and compatibility

[new 2.5] Performance strategy for DB operations

You have the choice between 2 strategies to maximize robinhood processing speed:

  • multi-threading: perform multiple DB operations in parallel as independent transactions.
  • batching: batch database operations (insert, update...) into single transactions, which minimizes the need for IOPS on the database backend. Batches are not executed in parallel.
The following benchmarks evaluated the DB performance for each strategy.

Slow DB backend

The following benchmark ran on a simple test-bed using a basic SATA disk as DB storage for innodb.

Database performance benchmark over ext3/SATA:

In this configuration, batching is more efficient than multi-threading whatever the thread count, so it has been made the default behavior for robinhood 2.5.

You can control the batch size by defining this parameter in the EntryProcessor configuration block (see section 3):

  • [new 2.5] max_batch_size (positive integer): by default, the entry processor tries to batch similar database operations to speed them up. This can be controlled by the max_batch_size parameter. The default max batch size is 1000.

Fast DB backend

The following benchmark ran with a fast device (e.g. an SSD drive) as DB storage for innodb. Database performance benchmark over SSD:

In this configuration, multi-threading gives better throughput, with an optimal value of 24 pipeline threads in this case.

If your DB storage backend is efficient enough (high IOPS), it may be better to use the multi-threading strategy. To switch from the batching strategy to multi-threading, set max_batch_size = 1. This automatically disables batching and enables multi-threading for DB operations. Consider increasing the nb_threads parameter in this case (in the EntryProcessor configuration block):

  • nb_threads (integer): total number of threads for performing pipeline tasks. Default is 4. Consider increasing it if you disable batching.

Database tunings

You can modify those parameters in /etc/my.cnf to speed-up database requests:
(tuning innodb_buffer_pool_size is strongly recommended)

innodb_file_per_table
# 50% to 90% of the physical memory
innodb_buffer_pool_size=16G
# 2*nbr_cpu_cores
innodb_thread_concurrency=32
# memory cache tuning
innodb_max_dirty_pages_pct=15
# robinhood is massively multithreaded: set enough connections for its threads, and its multiple instances
max_connections=256
# increase this parameter if you get DB connection failures
connect_timeout=60
# This parameter appears to have a significant impact on performances:
# see the following article to tune it appropriately:
# http://www.mysqlperformanceblog.com/2008/11/21/how-to-calculate-a-good-innodb-log-file-size
innodb_log_file_size=500M
innodb_log_buffer_size=8M

To manage transactions efficiently, innodb needs a storage backend with high IOPS performance. You can monitor your disk stress by running 'sar -d' on your DB storage device: if the %util field is close to 100%, your database rate is limited by disk IOPS. In this case, you have the choice between these 2 solutions, depending on how critical your robinhood DB content is:

  • Safe (needs specific hardware): put your DB on a SSD device, or use a write-back capable storage that is protected against power-failures. In this case, no DB operation can be lost.
  • Cheap (and unsafe): add this tuning to /etc/my.cnf: innodb_flush_log_at_trx_commit=2
This results in flushing transactions to disk only every second, which dramatically reduces the required IO rate. The risk is losing the last second of recorded information if the DB host crashes. This is affordable if you regularly scan your filesystem (the missing information will be added to the database during the next scan). If you rely on Lustre changelogs, you will need to scan your filesystem after a DB server failure.

This little script is also very convenient for analyzing your database performance, and it often suggests relevant tunings: http://mysqltuner.pl

Optimize scanning vs reporting speed

By default, robinhood is optimized for speeding up common accounting reports (by user, by group, ...), but this can slow down database operations during filesystem scans. If you only need specific reports, you can disable some parameters to make scans faster.
For instance, if you only need usage reports by user, you had better disable the group_acct parameter; this will improve scan performance. In this case, reports on groups will still be available, but their generation will be slower: if you request a group-info report while group_acct is off, the program iterates through all the entries (complexity: O(n), with n the number of entries). If group_acct is on, robinhood directly accesses the data in its accounting table, which is nearly instantaneous (complexity: O(1)).

Performance example: with the group_acct parameter activated, a group-info report is generated in 0.01 sec for 1M entries. With group_acct disabled, the same report takes about 10 sec.
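As a sketch, accounting parameters like group_acct are set alongside the database settings in the ListManager section; the exact block layout here is an assumption and should be checked against the detailed template ('/etc/robinhood.d/lhsm/templates/lhsm_detailed.conf'):

```
ListManager {
    # assumed layout: accounting sub-block controlling per-user/per-group stats
    accounting {
        user_acct  = enabled;
        group_acct = disabled;   # faster scans, slower group-info reports
    }
    MySQL {
        server = db_host;
        db     = robinhood_test;
        user   = robinhood;
        password_file = /etc/robinhood.d/.dbpassword;
    }
}
```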

SLES init dependencies

On SLES systems, the default dependency for boot scheduling is on the "mysql" service. However, in many cases, this is too early to start the robinhood daemon, especially if the filesystem it manages is not yet mounted. In such a case, modify the following line in scripts/robinhood.init.sles.in before running ./configure:

# Required-Start: <required service>

Lustre troubles

Several bugs or bad behaviours in Lustre can make your node crash or use a lot of memory when Robinhood is scanning or massively purging entries in the filesystem. Here are some workarounds we had to apply on our systems to make them stable:

  • If your system Oopses in a statahead function, disable the statahead feature:
echo 0 > /proc/fs/lustre/llite/*/statahead_max
  • CPU overload and client performance drop when free memory is low (bug #17282):
    in this case, lru_size must be set to CPU_count * 100:
 lctl set_param ldlm.namespaces.*.lru_size=800 

Getting more information

The Robinhood tmpfs admin guide gives more details about robinhood configuration (tmpfs mode). Most of the information in this guide also applies to lhsm mode.

You can find several tips and answers for frequently asked questions in the wiki pages on the robinhood project website:
http://robinhood.sf.net

You can also take a look at the archive of support mailing list on sourceforge:
http://sourceforge.net/projects/robinhood :
Mailing Lists > robinhood-support: archive / search

If that doesn't help, send your question to the support mailing list:
[email protected]
