Releases: ovis-hpc/ldms

OVIS-4.2.2

10 Jun 17:55

This is the stable OVIS-4.2.X release.

The master branch will soon move from V3 to V4.

OVIS-3.4.13

29 Apr 14:23

Changes:

  • LDMS_LOG_PATH and gender attribute variable may now contain ${} for shell variable expansion in the daemon's runtime environment.
  • ldms-sensors-config expanded to support newer library use of openat.
  • simplified self check for host; fixes corner cases seen on clusters.
  • extended systemd init scripts to take default schema names from plugin names.
  • extended systemd init scripts to provide hooks for general site-specific extensions such as prepopulating job data on non-slurm nodes with the empty data set. The default hook handles the slurm case for the jobid/job_info plugins and may need disabling or tailoring per site in ldmsd.local.conf.
  • added csv utilities specific to ldms data export tasks.
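
The ${} expansion in the first item behaves like ordinary shell-style substitution against the daemon's environment. The sketch below only illustrates that effect. It is not the code the init scripts run, and the function name, path, and variable name are hypothetical.

```python
from string import Template


def expand_log_path(template, env):
    """Expand ${NAME} references in template from the env mapping.

    Illustrative only: unknown names are left untouched rather than
    replaced with an empty string, which may differ from shell behavior.
    """
    return Template(template).safe_substitute(env)


if __name__ == "__main__":
    # Hypothetical value for LDMS_LOG_PATH and a hypothetical variable.
    print(expand_log_path("/var/log/${CLUSTER}/ldmsd", {"CLUSTER": "testbed"}))
```

In a real deployment the values come from the environment the daemon is started in, so a variable that is unset there will not expand as expected.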

OVIS-4.2.1-rc1

02 Mar 02:55
Pre-release

OVIS-4.2.1-rc1.
OVIS-4.2_Beta will be deprecated.

OVIS-3.4.12

26 Feb 21:01

This release provides testing improvements and bug fixes, usability improvements, and a new sampler feature 'filesingle' for single-metric files typical of sysfs.

General

  • Improvements in user input checking in daemon and systemd scripts.
  • Added in-daemon check to prevent misconfiguration of aggregators collecting from themselves.
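
The self-collection check in the second item can be pictured as comparing the aggregator's own listening endpoint with each configured producer. This is a sketch under assumptions: the real check lives in the C daemon and may compare resolved addresses rather than names, and the function and argument names here are hypothetical.

```python
def collects_from_self(own, producers):
    """Return True if any producer endpoint equals the daemon's own endpoint.

    own       -- (host, port) the daemon listens on
    producers -- iterable of (host, port) the daemon would collect from
    Host names are compared case-insensitively; no DNS resolution is done.
    """
    own_host, own_port = own
    return any(
        host.lower() == own_host.lower() and int(port) == int(own_port)
        for host, port in producers
    )
```

A configuration generator could call this before emitting producer entries and refuse to continue when it returns True.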

Testing tools

  • Slurm-based parallel testing tool pll-ldms-static-test.sh added.
  • Option transportdata= added to store_csv to enable collection of transport debugging info.
  • Added configure option --enable-mmdebug which disables mmap of transport data and detection of buffer overruns (but only for the sock transport) if environment variable LDMS_ENABLE_MMALLOC_DEBUG is also defined.

Plugins

  • Sampler filesingle added for collecting sysfs metrics (temps, volts, speeds, lustre, etc); config helper ldms-sensors-config provided.
  • Errors in Lustre 2.8 sampler corrected.
  • Added alternate store_csvdbg plugin, which is store_csv compiled with the storing transport data enabled.
  • Added chkmeminfo plugin, which is the meminfo plugin compiled with data-corruption-check stored in high bits of metrics.

Systemd

  • Provided do-not-repeat-yourself genders configuration of aggregators and stores by adding LDMSD_GENDERS_1 and _2 to allow L1 and L2 aggregators to inspect L0 genders files for connection data.
  • Fixed scripting errors in interpretation of genders for certain storage policy specifications.
  • Added LDMSD_DEBUG_CONFIG_FILE option to ldmsd.%I.conf which allows arbitrary ldmsd scripting to be appended to genders-based configuration output in /var/run/ldmsd/all-config.%I.
  • Fixed error messages from systemd scripts to be tagged with the correct daemon identity instead of 'root'.

Notes:

The lustre2_client sampler in this release does not support Lustre 2.10 and later due to refactoring of the Lustre /proc/sys interface.

OVIS-3.4.11

25 Oct 20:34
  • Add metric whitelist and blacklist options to store_flatfile plugin.
  • Add rolltype=5 rollagain={period} (periodic rollover based on the wall clock time) to store_csv plugin.
  • ldmsd genders support changes:
    Add ldmsd_strgp_POLICYNAME support to customize containers.
    Fix ldmsaggd_event_thds gender support.
  • Add humane diagnostic of missing input files to ldms-static-test.sh.
  • Fix ldmsd mis-handling of empty aggregator interval specifications.
  • Fix ldms-static-test.sh bug (bug could NOT affect TOSS3 users).
  • Fix misformat of network port numbers in some log messages.
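
Periodic rollover tied to the wall clock, as in the rollagain item, means rollovers land on clock-aligned boundaries rather than at a fixed delay after startup. The sketch below shows one way to compute such a boundary. It assumes alignment to multiples of the period since the epoch, which may not match how store_csv schedules its rollovers, and the function name is hypothetical.

```python
def next_rollover(now_s, period_s):
    """Return the first period-aligned time strictly after now_s.

    now_s    -- current time in whole seconds since the epoch
    period_s -- rollover period in seconds (must be positive)
    """
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    return (now_s // period_s + 1) * period_s
```

With an hourly period, a daemon started one second past the hour would roll at the next hour boundary rather than 3600 seconds after it started.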

OVIS-3.4.10

26 Sep 18:00

Changes since 3.4.9:

  • Add %{env} support in csv rename template option.
  • Add -h option and rollover_created function to ldms-static-test.sh utility.
  • Updated man pages.
  • Fixes to insecure directory permissions (755) on rename/create in csv store.
  • Fix generation order for updtr_start command in ldmsctl_args3.
  • Fix init script miscomputed 'instance=' on certain sampler configuration cases.
  • Fix ldmsd@agg local example genders file.
  • Fix to SLURM prolog example in Plugin_jobid man page.

OVIS-3.4.9

16 Aug 20:18

Changes since 3.4.8:

  • lnet_stats bug fixed to not report stale data.
  • systemd init scripts updated to better handle custom schema names (or lack of them).
  • systemd libgenders specification error detection improved.

OVIS-3.4.8

25 Jul 17:32

New since 3.4.7:

  • a store_rabbitkw plugin (see man Plugin_store_rabbitkw).
  • a new script command 'lsdate' for those working with timestamped csv files from store_csv. (man lsdate)
  • various minor improvements to test scripts.

OVIS-3.4.7

12 Jul 21:08

Changes in 3.4.7 since 3.4.6

FEATURE ADDS:

  • Made rate computation in sysclassib optional.
  • Added options to csv store to define uid/gid/perm at file create time.
  • Include timezone offset in ldms_ls output date stamps.
  • Extended gender options with ldmsd_idsuffix and ldmsd_id.

BUG FIXES:

  • Fixed rate computation in sysclassib so that reset-drops appear as negative rates. Previously they appeared as large random numbers.
  • Fixed computation of host component ids in systemd scripts to account for fixed width (leading 0) integer fields in hostnames.
  • Fixed exclusion of 0 and uid/gid > 65536 on csv uid/gid options.
  • Detect and log once certain schema name conflicts at store_csv and store_rabbitkw. When the first instance of a schema name is larger than the second to hit the store, missing metrics are detectable. The reverse is not true, and data mislabelling may occur in this case.
  • Better logging of store_rabbitkw issues.
  • Fixed opa2 sampler log message priority (error -> info).
  • Fixed incorrect (excess) detection of comments in config parameter names containing #.
    A # following whitespace or beginning a line begins a comment.
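
The comment rule in the last item (a # starts a comment only at the beginning of a line or after whitespace) can be illustrated as follows. This is a minimal sketch, not the daemon's parser: it ignores quoting and escapes, and the function name and sample parameter values are hypothetical.

```python
def strip_comment(line):
    """Remove a trailing comment from one configuration line.

    A '#' begins a comment only when it is the first character or is
    preceded by whitespace, so 'name=a#b' keeps its '#'.
    """
    for i, ch in enumerate(line):
        if ch == "#" and (i == 0 or line[i - 1].isspace()):
            return line[:i].rstrip()
    return line.rstrip()
```

Under this rule a parameter such as name=a#b survives intact, while text after a whitespace-delimited # is dropped.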

OVIS-3.4.6

23 May 15:48

Changes in 3.4.6 since 3.4.4

FUNCTIONAL CHANGES:

  • Added /usr/bin/ldms-static-test.sh and numerous test examples of ldms configuration in /usr/share/doc/ovis-ldms-3.4.6/examples/static-test. See man ldms-static-test. Includes store, sampler, and multilevel aggregation examples.

  • Added dstat sampler for monitoring ldmsd itself. Expected use is to be
    loaded on aggregator and storage ldmsd instances. See Plugin_dstat man page.

  • Added jobid collection support to lustre2_client sampler.

  • Added opa2 sampler to collect omnipath hfi interface metrics. See Plugin_opa2 man page.

  • Updated libgenders support for managing ports in init scripts (see man ldms-attributes):
    ldmsd_use_unix_socket
    ldmsd_sockpath
    ldmsd_use_inet_socket
    ldmsd_config_port
    ldmsd_log
    ldmsd_vg
    ldmsd_vgargfile

  • Added filters to trap and warn about common gender spelling and punctuation errors.

  • Split the build/install of libgenders/boost tool from install of systemd scripts. Systemd scripts can be used without the ldmsctl_args3 tool if the user provides the daemon configuration commands in a named script listed in ldmsd.local.conf.

  • Added missing man pages for samplers ported from LDMS v2: clock, procstat, sysclassib, jobid, lustre2_client, procsensors.

  • New/updates to man, plugins for cray samplers aries_linkstatus, aries_mmr.

  • Changed defaults in systemd scripts to allow more open files at aggregators and syslogid.

  • Fixed overzealous failure condition handling in ldms_jobid.

  • Added debug output of registered memory (mmalloc) in use at exit to better bound -m option value needed for ldmsd instances. New mm_stat call in lib/mmalloc supplies the data.

SECURITY CHANGES:

  • Fixed default insecure (commonly known secret) ldmsauth file. Now it is invalid by default (too short).

RUNTIME CHANGES/BUG FIXES:

  • Fixed C bugs in store related code:

    • idx_delete
    • notification (memory leak)
    • avl (attribute/value list handling of error conditions)
    • thread locking error in store_csv
  • Fixed C bugs in network transports:

    • rdma connection resource leaks in error handling cases.
  • Fixed C bugs in samplers:

    • jobid minor fixes
    • procnfs sampler now accounts for variations in nfs file layout. The procnfs sampler has never supported nfsv4 metrics and does not now.
    • Reduced repetitive logging of the same transient failure conditions.
    • Updated several samplers to run through transient disappearance of /proc.

HOUSEKEEPING CHANGES:

  • Removed LDMS_BUILDTYPE from systemd control scripts (it was preventing relocatability, and is in any case obsolete).

  • Removed most old packaging scripts from ldms source tree packaging/ directory.

  • Changed install permissions on pedigree script.

  • Updated rpath macro in build (deprecates some old apple os versions).

  • Made rpms fully relocatable without forcing the user to manually set ld and zap related environment variables before invocation. This entails wrapping all the sbin/ldms binaries in .ldms-wrapper. Thanks to Cray for assistance with this.

DEVELOPER CHANGES:

  • Updated installed include files and /usr/lib/ovis-[ldms/lib]-configvars.sh so that 3rd party plugins can be built when only the installed ldms binaries and headers are used.

  • Updated .gitignore settings.