Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup LDMSD's command-line options #1369

Merged
merged 11 commits into from
Mar 5, 2024

Conversation

nichamon
Copy link
Collaborator

@nichamon nichamon commented Mar 4, 2024

No description provided.

The `default_auth` command is designed for integration into
configuration files, serving as an alternative to the `-a` and `-A`
command-line options. The command-line options take precedent over the
`default_auth` in configuration files.

usage:
 default_auth plugin=<auth plugin name> <auth-attribute value pairs>
The `set_memory` command is designed for integration into configuration
files, serving as an alternative to the `-m` command-line option. The
command-line option `-m` takes precedent over the `set_memory` command
in configuration files.

Usage:
   set_memory size=<memory size>

Examples:
   set_memory size=2GB
   set_memory size=512M
The `log_file` command is designed for integration into configuration
files, serving as an alternative to the `-l` command-line option. The
command-line option `-l` takes prededent over the `log_file` command in
configuration files.

Usage:
   log_file path=<path to the log file>
The `publish_kernel` command is designed to replace the `-k` and `-s`
command-line options.
The `daemon_name` command is designed for intergration into
configuration files, serving as a replacement of the `-n` command-line
option.

Usage:
   daemon_name name=<NAME>
The `worker_threads` command is designed for integration into
configuration files, serving as a replacement of the `-P` command-line
option.

Usage:
   worker_threads num=<number of threads>
The `default_credits` command is designed for integration into
configuration files, serving as a replacement of the `-C` command-line
option.

Usage:
   default_credits credits=<INTEGER>
The `pid_file` configuration command is an alternative approach to set
the PID file path in configuration file to the `-r` command-line option.
The `banner` configuration command is a replacement of the `-B`
command-line option.

Example:
banner mode=<0|1|2>
Without the patch, LDMSD always creates a PID file. It determines the
PID file location in the following order: the command-line option `-r`,
the environment variables `LDMSD_PIDFILE`, and the default path
`/var/run/ldmsd.pid`. Running LDMSD as a non-root user withput
specifying a PID file path always causes a permission-denied error
message because a non-root user cannot access `/var/run` directory.

The patch changes LDMSD so that it only create a PID file when either
the command-line option `-r` or the config command `pid_path` is
specified. Regarding starting LDMSD via `systemd`, `systemd` can be
configured to create a PID file for LDMSD.
The patch makes the daemon mode obsolete. The command-line option `-F`
is removed.
@nichamon nichamon requested a review from tom95858 March 4, 2024 17:44
cleanup(8, "daemon failed to start");
}
}

/*
* TODO: It should be a better way to get this information.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nichamon can you elaborate on this comment?

Copy link
Collaborator Author

@nichamon nichamon Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the comment, when I refactored LDMSD to use the ovis_log library, as a reminder to find a better way for admins to change the time format of log messages from date-time to timestamp. Currently, admins can configure LDMSD to use timestamps in seconds from the epoch in log messages by setting the environment variable LDMSD_LOG_TIME_SEC. I wasn't a fan of this approach because it is hard to use.

Since I introduce this, there haven't been any use cases. Here are two path forwards:

  1. We decide there's no practical use for multiple time formats: In this case, I will remove the code block.
  2. We decide supporting multiple formats may benefit in the future: I would change the new log_file config command from
    log_file path=<log file path>
    to
    log dest=<log file path> [time_format=<date-time|timestamp|none>].
    The ovis_log library supports the three modes: date-time (LDMSD default), timestamp in second from epoch, and no time information. Beside modifying the config command, I would remove the logic to get the time format from the environment. The only way to change the time format in LDMSD's log messages is to use the config command `log.

What do you think?

exit(1);
}
}
if (pidfile) {
if( !access( pidfile, F_OK ) ) {
ovis_log(NULL, OVIS_LERROR, "Existing pid file named '%s': %s\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nichamon, @valleydlr do we need this message?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this message could be helpful in identifying misconfiguration of the -r in LDMSD. For example, if two LDMSDs are configured to create a PID file at /home/my/var/pid/ldmsd.pid on different nodes that share the same home directory, they will overwrite each other's PID files.This scenario is more likely to occur in a test environment. Ultimately, I am neutral on whether to keep or remove the message.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nichamon, I think this is reasonable.

@@ -2241,13 +2245,12 @@ int main(int argc, char *argv[])
ldms_version.flags, OVIS_GIT_LONG

#if OVIS_LDMS_HAVE_AUTH
fprintf(bfile, BANNER_PART1_A
fprintf(bfile, BANNER_PART1_A
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nichamon, @valleydlr did we talk about getting rid of this banner file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the spreadsheet we went over together in Austin (I left early), it was decided to remove the command-line option -B and add a new config command to replace the option.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@valleydlr, do you want to retain the Banner file?

@tom95858 tom95858 merged commit 90d1563 into ovis-hpc:OVIS-4 Mar 5, 2024
13 of 14 checks passed
@baallan
Copy link
Collaborator

baallan commented Mar 6, 2024

@nichamon @tom95858 This merge makes subsequent subsequent pull requests fail in for example:
https://github.com/ovis-hpc/ovis/actions/runs/8175122029/job/22351519845?pr=1370
which hangs forever in 'starting agg4'
repeatably.

@tom95858
Copy link
Collaborator

tom95858 commented Mar 6, 2024

@narategithub I think this causes the compatibility test to hang trying to detect the successful start of a daemon. Could you please take a look a this and fix it?

@narategithub
Copy link
Collaborator

@tom95858 I'm looking into it.

@nichamon
Copy link
Collaborator Author

nichamon commented Mar 6, 2024

@narategithub created a patch to fix the problem here. #1371

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants