Data Management

Tom Tucker edited this page Oct 7, 2021 · 4 revisions

D/SOS Data Management

Containers

A container is the named entity in which all object data and object indices are stored; together these make up the container's data.

Inside the container are one or more partitions, each of which contains a portion of the container's data. Partitions can be attached to and detached from a container. Partitions are the mechanism by which the contents of a container are managed, e.g. backed up or moved.

Create a new container and query it:

$ sos-db --create --path /tmp/cont-test --mode 0o660  
$ sos-db --query --path /tmp/cont-test  
Name               State      Accessed           Modified           Size       Path  
------------------ ---------- ------------------ ------------------ ----------- --------------------  
default            PRIMARY    10/04/21 12:56:04  10/04/21 12:55:50       0.1MB /tmp/cont-test/default  

Note that the new container has a single default partition. We can query the partition in more detail using the sos-part command.

$ sos-part --query --cont /tmp/cont-test
Name               State      uid   gid   Permissions  Size       Description                              Path
------------------ ---------- ----- ----- ------------ ---------- ------------------------------------ --------------------
default            PRIMARY     1002  1002 rw-rw----        65.5K  default container partition          /tmp/cont-test/default
------------------ ---------- ----- ----- ------------ ---------- ------------------------------------ --------------------

Note the default partition has an owner, group and permission. These control access to the partition's data. A container may have many partitions attached, owned by different users and with different access rights. When a client attempts to access data in a container, it only sees the data for which it has read access. In the case above only user 1002 or anyone in the 1002 group can "see" the partition's data.
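As a rough illustration of the access rule described above, the following Python sketch renders a permission mask the way sos-part prints it and applies a POSIX-style owner/group/other read check. The function names are hypothetical, and the order in which the bits are tested is an assumption; neither is taken from the D/SOS source.

```python
def perm_string(mode: int) -> str:
    """Render the low nine permission bits as an rwx string, e.g. 0o660 -> 'rw-rw----'."""
    flags = "rwxrwxrwx"
    return "".join(
        flag if mode & (1 << (8 - i)) else "-"
        for i, flag in enumerate(flags)
    )


def can_read(mode: int, owner_uid: int, owner_gid: int, uid: int, gids: set) -> bool:
    """Assumed POSIX-style check: owner bits first, then group, then other."""
    if uid == owner_uid:
        return bool(mode & 0o400)
    if owner_gid in gids:
        return bool(mode & 0o040)
    return bool(mode & 0o004)


# The default partition from the listing above: uid 1002, gid 1002, mode 0o660.
print(perm_string(0o660))
print(can_read(0o660, 1002, 1002, 1002, {1002}))  # the owner
print(can_read(0o660, 1002, 1002, 2001, {1002}))  # a member of group 1002
print(can_read(0o660, 1002, 1002, 2001, {2001}))  # everyone else
```

With mode 0o660 the last call returns False, which corresponds to the statement that only user 1002 and members of group 1002 can see the data.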

Partitions

A partition is a location in a filesystem into which data is stored. A partition is created with the sos-part command.

$ sos-part --create --path /tmp/partitions/current --desc "New data added will go here" --user 1002 --group 1002 --perm 0o666
$ sos-part --query --path /tmp/partitions/current
Name               State      uid   gid   Permissions  Size       Description                          Path
------------------ ---------- ----- ----- ------------ ---------- ------------------------------------ --------------------
                   DETACHED    1002  1002 rw-rw-rw-         196B  New data added will go here          

Note that this new partition is in the DETACHED state and the path is missing! This is because the path where a partition is located and its name are attributes of the container, not the partition. A partition is only a location in a filesystem that has been formatted to contain D/SOS data. Once created, a partition must then be attached to a container to be accessible. The reason for this is that a partition may be attached to multiple containers, with different names and different paths. The owner/group, access rights, and description, however, are stored in the partition itself and will appear the same in any container.
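The split between partition attributes and container attributes can be pictured with a small data model. This is only a sketch: the class names, the second container, the name "week-40" and the path "/mnt/shared/partitions/current" are hypothetical and do not reflect how D/SOS stores this information internally.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Attributes stored in the partition itself."""
    uid: int
    gid: int
    perm: int
    desc: str


@dataclass
class Attachment:
    """Attributes that belong to one container's view of a partition."""
    name: str
    path: str
    partition: Partition


part = Partition(uid=1002, gid=1002, perm=0o666, desc="New data added will go here")

# Two hypothetical containers attach the same partition under different names and paths.
cont_a = [Attachment("current", "/tmp/partitions/current", part)]
cont_b = [Attachment("week-40", "/mnt/shared/partitions/current", part)]

# The description lives in the partition, so a change shows up in every container.
part.desc = "Archived data"
print(cont_a[0].partition.desc, "|", cont_b[0].partition.desc)
```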

To store data in this new partition, we have to attach it to the container. After attaching it, query the container again:

$ sos-part --query --cont /tmp/cont-test
Name               State      uid   gid   Permissions  Size       Description                          Path
------------------ ---------- ----- ----- ------------ ---------- ------------------------------------ --------------------
default            PRIMARY     1002  1002 rw-rw----        65.5K  default container partition          /tmp/cont-test/default
current            OFFLINE     1002  1002 rw-rw-rw-         297B  New data added will go here          /tmp/partitions/current
------------------ ---------- ----- ----- ------------ ---------- ------------------------------------ --------------------

The partition is now attached to cont-test. However, a newly attached partition is in the OFFLINE state, and its data is not visible to clients of the container. This allows administrative control over access to partition data from a container, regardless of the partition's access rights.

The state of the attached partition controls administrative access. The states are as follows:

  • PRIMARY: The partition's data is visible to clients, and new data will flow into this partition by default.
  • ACTIVE: The partition's data is visible to clients, and new data can be added to this partition if the client requests so.
  • OFFLINE: The partition is visible to the container, but the partition data is not.

To make the current partition the target of new data, set its state to primary.

$ sos-part --cont /tmp/cont-test --name current --state primary
$ sos-part --query --cont /tmp/cont-test
Name               State      uid   gid   Permissions  Size       Description                          Path
------------------ ---------- ----- ----- ------------ ---------- ------------------------------------ --------------------
default            ACTIVE      1002  1002 rw-rw----        65.5K  default container partition          /tmp/cont-test/default
current            PRIMARY     1002  1002 rw-rw-rw-         402B  New data added will go here          /tmp/partitions/current
------------------ ---------- ----- ----- ------------ ---------- ------------------------------------ --------------------
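The state rules and the transition shown in the listing above can be modeled in a few lines of Python. This is a sketch, not the D/SOS implementation: the names State, data_visible and set_primary are hypothetical, and the rule that the previous primary becomes ACTIVE is inferred from the listing, where default changes from PRIMARY to ACTIVE.

```python
from enum import Enum


class State(Enum):
    PRIMARY = "PRIMARY"
    ACTIVE = "ACTIVE"
    OFFLINE = "OFFLINE"


def data_visible(state: State) -> bool:
    """Clients see partition data only in the PRIMARY and ACTIVE states."""
    return state in (State.PRIMARY, State.ACTIVE)


def set_primary(states: dict, name: str) -> dict:
    """Return a new name -> State mapping with `name` as the only PRIMARY partition.

    The previous primary becomes ACTIVE (inferred from the listing above).
    """
    updated = {
        part: State.ACTIVE if state is State.PRIMARY else state
        for part, state in states.items()
    }
    updated[name] = State.PRIMARY
    return updated


before = {"default": State.PRIMARY, "current": State.OFFLINE}
after = set_primary(before, "current")
for part, state in after.items():
    print(f"{part:10s} {state.value:8s} visible={data_visible(state)}")
```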

LDMSD Use Case

Even on a modest cluster of a thousand nodes, ldmsd will collect terabytes of data in a few days. The container on the storage assets local to the ldmsd is useful for run-time and near-term analysis, but it cannot grow unbounded without exceeding the capacity of those storage assets. A strategy is needed for migrating some of the data out of the local container to remote storage for subsequent large scale data analysis.

Consider an approach where we wish to maintain a week of live data in the local container. After the first week, the oldest data needs to be moved out of the local container to make room for the newly arriving data. The partition feature of D/SOS is essential to implementing this strategy.

Let's assume that this is all implemented with a set of cron rules as follows:

  • Every day at midnight a new partition is created.
    • The path needs to be unique to avoid clobbering other data, for example the weekday name with the UNIX timestamp (i.e. the output of date +%s) appended.
  • This partition is attached to the local container and its state is set to PRIMARY.
    • ldmsd will now store all subsequent data in this partition.
  • The oldest partition is detached from the local container.
  • This partition is copied to remote storage. Its path is arbitrary, but it must be unique in the remote storage to avoid clobbering existing content.
    • It is a good idea to use paths in remote storage that describe the partition content, e.g. host-month-year, or some finer granularity if desired.
      • The host is necessary for D/SOS because, although the local partition paths may be the same on each server, the paths in remote storage cannot be.
  • The partition path in local storage is removed, recovering this storage capacity.
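The naming and selection logic behind these rules can be sketched in Python. The actual sos-part invocations and the copy to remote storage are deliberately left out. All function names, the host name "node1", the seven-partition retention default and the exact layout of the remote name are assumptions made for illustration.

```python
from datetime import datetime, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def new_partition_name(now: datetime) -> str:
    """Weekday name with the UNIX timestamp appended."""
    return f"{WEEKDAYS[now.weekday()]}-{int(now.timestamp())}"


def created_at(name: str) -> int:
    """Recover the UNIX timestamp from a name built by new_partition_name()."""
    return int(name.rsplit("-", 1)[1])


def partitions_to_retire(names: list, keep: int = 7) -> list:
    """Return all but the `keep` newest partitions, oldest first."""
    ordered = sorted(names, key=created_at)
    return ordered[:max(len(ordered) - keep, 0)]


def remote_name(host: str, name: str) -> str:
    """Prefix with host-month-year so names stay unique across servers."""
    when = datetime.fromtimestamp(created_at(name), tz=timezone.utc)
    return f"{host}-{when:%m}-{when:%Y}/{name}"


midnight = datetime(2021, 10, 4, tzinfo=timezone.utc)
name = new_partition_name(midnight)
print(name)
print(remote_name("node1", name))
```

Embedding the timestamp in the name means the oldest partition can be found by sorting names alone, without querying the container.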

This implements the storage management function at the ldmsd aggregator, but what about analysis of historical data?

Analysis Containers

Researchers will want to perform analysis over time scales far greater than what is supported by the local containers. Again, the partition feature of D/SOS is essential to this capability.

Using only SOS and not D/SOS, the strategy is as follows:

  • Create an analysis container using the sos-db command
  • Attach the partitions containing the data of interest using the sos-part command

If you wish to distribute the access, the strategy is the same except that N containers will be created, one for each D/SOS server. Each of these containers will host a subset of the partitions.
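How the partitions are divided among the N containers is not prescribed here. As one possible approach, the sketch below assigns partitions to servers round-robin. The function name, the server names and the partition names are all hypothetical, and round-robin is an assumption rather than a D/SOS requirement.

```python
def distribute(partitions: list, servers: list) -> dict:
    """Assign partitions to servers round-robin; each server's container attaches its subset."""
    assignment = {server: [] for server in servers}
    for i, part in enumerate(partitions):
        assignment[servers[i % len(servers)]].append(part)
    return assignment


plan = distribute(
    ["node1-10-2021", "node2-10-2021", "node3-10-2021", "node4-10-2021", "node5-10-2021"],
    ["dsos-1", "dsos-2"],
)
for server, parts in plan.items():
    print(server, parts)
```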
