Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add log_level integration test and reduce order dependence between load and log_level #1344

Closed
wants to merge 1 commit into from

Conversation

baallan
Copy link
Collaborator

@baallan baallan commented Jan 30, 2024

fixes #1343 (excess order dependence between load and log_level).
updates the header documentation for ovis_log_set_level_by_name.
Make the ovis_log_t pointer opaque outside ovis_log.c

illustrates #1342 (excess reordering of log_level statements breaks debugging scenario log flow management).

$prefix/bin/ldms-static-test.sh loglvl

will generate a FAIL message at the end due to reordering of log_level commands in the input script.

@baallan
Copy link
Collaborator Author

baallan commented Jan 30, 2024

As the issue fixed here affects developer/admin usability rather than data collection, I don't propose to back-cherry-pick it to 4.4.x.

@tom95858
Copy link
Collaborator

tom95858 commented Feb 2, 2024

@nichamon, could you please review this?

@nichamon
Copy link
Collaborator

nichamon commented Feb 2, 2024

@nichamon, could you please review this?

The code looks okay.

A downside is that there may not be any log_subsystem with the name given to log_level ever, especially in the case of a typo, and LDMSD will not report the error.

I am running it against the ldms-test test for ovis_log.

@tom95858
Copy link
Collaborator

tom95858 commented Feb 2, 2024

@baallan the reason there is an order dependence is because the plugin decides what the log name is. I'm struggling with creating logs for plugins that are not loaded yet. What is the problem exactly with setting the log level after the plugin is loaded and the name is known and you can get a proper error message if the configuration writer finger fumbles the log name?

@baallan
Copy link
Collaborator Author

baallan commented Feb 3, 2024

@baallan the reason there is an order dependence is because the plugin decides what the log name is. I'm struggling with creating logs for plugins that are not loaded yet. What is the problem exactly with setting the log level after the plugin is loaded and the name is known and you can get a proper error message if the configuration writer finger fumbles the log name?

@tom95858
One of the problems is that debugging statements in the init function called as part of plugin loading cannot be controlled by specific log configuration statements if such log configurations can only be made after the plugin is fully loaded.

Another of the problems is that log contexts with dynamically determined names like a plugin store.store_csv.$context might want to be configured as part of the plugin configuration that happens between load and config (where $context is finally registered only after the first set of type 'context' arrives). The user knows the exact name they care about but the plugin only knows naming construction patterns.

Finally, least-surprise suggests that we follow the practice in other tools that provide context dependent logging; e.g. log4c/log4j let the end user write a configuration file parsed at global log init, not after context registration. This decouples
(when needed) debugging settings from the rest of the application loading logic, just as this change does.

The finger-fumble problem exists but is trivially handled. log_status clearly displays "(pending registration)" for any plugin not yet loaded or any name that is misspelled. For example a common error will be for someone to set the log level of a plugin with

log_level name=meminfo ...
instead of
log_level name=sampler.meminfo ...

As we do not in any other area of the C-based command language provide typo-detection messages of the sort "X123 not defined; did you mean item X124?" I don't see why we would do it here. If a core fuzzy-matching library was added, it could of course be supported in many areas of the interface including log config.

However, if you want a WARNING or INFO message emitted when log level is set for a pending name, i'll be happy to add that. This would be equivalent to the message GDB gives when you set a breakpoint to a function not yet loaded.

@tom95858
Copy link
Collaborator

tom95858 commented Feb 4, 2024

@baallan the reason there is an order dependence is because the plugin decides what the log name is. I'm struggling with creating logs for plugins that are not loaded yet. What is the problem exactly with setting the log level after the plugin is loaded and the name is known and you can get a proper error message if the configuration writer finger fumbles the log name?

@tom95858 One of the problems is that debugging statements in the init function called as part of plugin loading cannot be controlled by specific log configuration statements if such log configurations can only be made after the plugin is fully loaded.

What is hard about this?

Another of the problems is that log contexts with dynamically determined names like a plugin store.store_csv.$context might want to be configured as part of the plugin configuration that happens between load and config (where $context is finally registered only after the first set of type 'context' arrives). The user knows the exact name they care about but the plugin only knows naming construction patterns.

Finally, least-surprise suggests that we follow the practice in other tools that provide context dependent logging; e.g. log4c/log4j let the end user write a configuration file parsed at global log init, not after context registration. This decouples (when needed) debugging settings from the rest of the application loading logic, just as this change does.

You're not defining open() or printf() here @baallan which is where this quote comes from, these are application level interfaces and ironically this change makes them more unpredictable not less.

The finger-fumble problem exists but is trivially handled. log_status clearly displays "(pending registration)" for any plugin not yet loaded or any name that is misspelled. For example a common error will be for someone to set the log level of a plugin with

A person would have to understand what "pending registration" means. It's silly, they won't.

log_level name=meminfo ...
instead of
log_level name=sampler.meminfo ...

As we do not in any other area of the C-based command language provide typo-detection messages of the sort "X123 not defined; did you mean item X124?" I don't see why we would do it here. If a core fuzzy-matching library was added, it could of course be supported in many areas of the interface including log config.

However, if you want a WARNING or INFO message emitted when log level is set for a pending name, i'll be happy to add that. This would be equivalent to the message GDB gives when you set a breakpoint to a function not yet loaded.

I'm sorry, I can't parse this part in any way.

@tom95858
Copy link
Collaborator

tom95858 commented Feb 4, 2024

How do we feel about adding a log-level option to the load command? That way we could get what plugin writers want/need and not have the possibility of waif logs that are not associated with any plugin.

@baallan
Copy link
Collaborator Author

baallan commented Feb 5, 2024

How do we feel about adding a log-level option to the load command? That way we could get what plugin writers want/need and not have the possibility of waif logs that are not associated with any plugin.

The concern for "waif's" is massively overblown and customizing old commands to accommodate new orthogonal features seems like the wrong way to go. Separation of concerns (what logging i want vs what test i am applying it to) seems really obvious here. Also, this moves us for logging into the declarative direction in all but the most verbose debugging scenarios.

In fact I dislike 'extending load' intensely. Here's my motivating use-case for this dislike:

  • I turn on global default 'DEBUG' and start my daemon with the configuration to be tested (which may have come to me from elsewhere)
  • I inspect the log for all the unique context names (with grep/sed/cut/sort -u this is trivial) if running log_status isn't convenient.
  • I make a file with "log_level name=X level=Y" or some such for each item in the list.
  • I prepend that level setting file to whatever input I'm testing, set the global default back to error or critical, and proceed to rerun the test with no risk of screwing up anything in the test that came to me from elsewhere.

Similarly, we can in tutorials publish a variety of log level setting prologs for different purposes that users can then apply to any scenario they want to investigate. e.g.

  • monitor your store csv
  • monitor your sosdb store
  • monitor the auth processes
  • monitor the transmission of a new command you added

@morrone
Copy link
Collaborator

morrone commented Feb 6, 2024

I love making the log struct opaque.

One thing I'm not following in the "(pending registration)" part. It seems at first glance that the only way that any "subsystem" can log messages is through the ovis_log_t handle. The only way to get an ovis_log_t handle is through ovis_log_register(), at which time "desc" is set. Under what conditions would "(pending registration)" be displayed? Do we need that at all?

@tom95858
Copy link
Collaborator

tom95858 commented Feb 6, 2024

He's using at as a "sentinel" to determine when/if the log has been pre-registered or registered. I think this is something called "data coupling" which is, in general, highly discouraged. This is still top of tree, so we have time to refactor.

@baallan
Copy link
Collaborator Author

baallan commented Feb 6, 2024

@morrone "(pending registration)" means no plugin or framework has yet called register(context_name, context_description). I have provided usage examples above where the configurer of ldmsd would want to set the log configuration for a given plugin before the plugin is loaded. Allowing level setting only after the plugin is loaded (a) prevents controlling the logging that may happen during loading, (b) forces modifications of a configuration script to be debugged into that script after the load statement, introducing the possibility of other inadvertent changes to the configuration being debugged.

(Pending registration) is the place-holder for the description that the plugin will provide. It is displayed by log_status which lists all the registered (or with this change the expected) log names and descriptions.

@morrone
Copy link
Collaborator

morrone commented Feb 6, 2024

@baallan the reason there is an order dependence is because the plugin decides what the log name is.

That seems like a problematic design in the existing log code. How would a mere human ever know what to put in the "name=" field for the log command then, if the plugin can make that anything it wants? That is inconsistent with existing command design if the name isn't the same name used in the other commands.

I think I agree that this points to us needing to allow passing a log level in as part of "load".

I certainly sympathize with the desire to set log levels at the top of the file, but this is maybe one of the many down sides of us having an imperative configuration system rather than having a declarative configuration file. Perhaps more global log level setting (with proper checks to ensure no dangling log levels) can be part of the declarative configuration format that we are going to get from maestro?

@morrone
Copy link
Collaborator

morrone commented Feb 6, 2024

@morrone "(pending registration)" means no plugin or framework has yet called register(context_name, context_description). I have provided usage examples above where the configurer of ldmsd would want to set the log configuration for a given plugin before the plugin is loaded. Allowing level setting only after the plugin is loaded (a) prevents controlling the logging that may happen during loading, (b) forces modifications of a configuration script to be debugged into that script after the load statement, introducing the possibility of other inadvertent changes to the configuration being debugged.

(Pending registration) is the place-holder for the description that the plugin will provide. It is displayed by log_status which lists all the registered (or with this change the expected) log names and descriptions.

Oh, well in that case I think it would be more straight-forward to just add a:

  bool registered;

to the struct. That is far more clear to the code reviewer what you are trying to accomplish. As currently written, that says to me "we expect log messages to be sent to this before registration happens". Having a separate flag in the struct that gates usage based on whether or not it is registered makes things far clearer. Sure, the code that handles log_status needs to be updated to say "(Pending registration)" when a log struct is not yet registered, but you are genuinely adding a new state of existence to that struct, so making that explicit makes things clearer, I think.

@baallan
Copy link
Collaborator Author

baallan commented Feb 6, 2024

There is a much more complicated implementation I could propose for a separate "preregistration" list. Such a list would collect all the log_level settings for names that cannot be permanently resolved at the time the statement is parsed/first processed. No new command-line syntax needed; just semantics extended to make log configuration much more flexible.

This list would actually permanently include any regex name policies as well, since it is almost certain that some plugins or some contexts will be registered after regex policies are defined; it would be similar to the repeated handling of a regex matching list for producers.

There could still be a typo in the name during a preregistration, but we would separately list preregistration & regex log_level rules from the actually running logs.

@baallan
Copy link
Collaborator Author

baallan commented Feb 6, 2024

@morrone On the topic of "declarative" or not, a very common debugging log scenario is something like:

log_level name=sampler.foo level=debug
load name=foo
config name=foo
log_level name=sampler.foo level=WARNING

where we don't want per-sample info in bulk.
So this really is a procedural activity in some cases to adjust log_level.

We could expand the log_level API to have not just a plugin context but a second function context.

log_level name=sampler.meminfo action=config level=debug

but this then adds comparison logic and another argument for the action to the log call processing .
I think i would rather proliferate log objects: sampler.meminfo.config. It would handle the majority of cases where I want to turn logging on or off procedurally. The exception would be where i want to change the log level after the first N (N usually 1) calls to a specific api function on a specific instance.

@morrone morrone mentioned this pull request Feb 6, 2024
@tom95858
Copy link
Collaborator

tom95858 commented Feb 6, 2024

@baallan, this is getting a bit far afield. All we want to do is log messages from init(), correct?

@tom95858
Copy link
Collaborator

tom95858 commented Feb 7, 2024

@baallan the reason there is an order dependence is because the plugin decides what the log name is.

That seems like a problematic design in the existing log code. How would a mere human ever know what to put in the "name=" field for the log command then, if the plugin can make that anything it wants? That is inconsistent with existing command design if the name isn't the same name used in the other commands.

I think I agree that this points to us needing to allow passing a log level in as part of "load".

I certainly sympathize with the desire to set log levels at the top of the file, but this is maybe one of the many down sides of us having an imperative configuration system rather than having a declarative configuration file. Perhaps more global log level setting (with proper checks to ensure no dangling log levels) can be part of the declarative configuration format that we are going to get from maestro?

@morrone, there are commands to query the registered logs. And the log naming is structured, so I can say "log_level name=sampler.*" and select all sampler messages, etc... I think this whole thing stems from the multi-config changes that added an init function and @baallan wants to be able to control how messages are emitted by that function. This is why I suggested amending the load command to specify the initial log level. You could change it globally too, by setting a log level for the default log, but then you would get a flood of messages that you may not be interested in.

As I said above, the design case was to allow the user to issue a query command to see what logs are available and then use the log_level command to select what they were interested in changing. The man page would/should also specify what the log name is for each plugin.

With this change, anyone can submit a log_level request and create a log that may never be used (without error) which is why @nichamon and I are skeptical of this change. Also, they may request to see debug messages from an init function and never know that they had specified the wrong log name.

At this point, I think we should revert this change and replace it with the extra option(s) for the load command.

@nichamon, what do you think?

@morrone
Copy link
Collaborator

morrone commented Feb 7, 2024

At this point, I think we should revert this change and replace it with the extra option(s) for the load command.

I don't think this has landed yet. Nothing to revert, unless I'm missing something?

@tom95858
Copy link
Collaborator

tom95858 commented Feb 7, 2024

Sorry, I meant re-factor...

@baallan
Copy link
Collaborator Author

baallan commented Feb 7, 2024

Even if we mix in log_level with load, that doesn't fix the problem that the current implementation for wildcard log name matching only applies to logs that are already registered. (Admittedly, this patch as-is also doesn't fix the regex registration problem-- I am trying to first get the load debugging problem fixed and get the precedent set that we need forward-looking log level setting policy conceptually).

If folks generally prefer a one-off solution for load and for Mon to bring the regex policy processing up to the same methodology we have for producer/updtr matching, that would be adequate.

@nichamon
Copy link
Collaborator

nichamon commented Feb 7, 2024

Firstly, it's important to clarify that the pull request aims to enable applications to set the log level of a non-existing log subsystem. @baallan initiated the patch because he has a specific use case: he wants to observe DEBUG messages from a particular plugin during its loading and configuring stages, and then change the level to ERROR after the plugin has been configured.

It's crucial to note that the reason for the order dependency is not because plugins can name their log subsystems arbitrarily. Rather, it's because ovis_log doesn't permit applications to set the level of a non-existing subsystem. Therefore, ovis_log_set_level_by_name() and ovis_log_set_level_by_regex() are designed to only adjust the log level of existing log subsystems.

Another crucial point is that LDMSD plugins cannot arbitrarily name their log subsystems. A plugin's log subsystem must align with <plugin type>.<plugin name>, where <plugin name> corresponds to the string base.name. This requirement was discussed in a user group meeting and should be documented in the guidelines for writing LDMSD plugins. As a side note, ongoing efforts are being made to update and enhance the documentation, as it remains a work-in-progress.

@nichamon
Copy link
Collaborator

nichamon commented Feb 7, 2024

To address @baallan's use case of setting the log level of a specific plugin during both load and config times, different from the global log level, there are two proposed solutions:

  1. the patch created by Ben,
  2. modifying the load command and the plugin load interface to accept the default log level of a plugin.

In my view, I'm not particularly supportive of either option. Firstly, regarding option 1), I'm concerned that it could potentially lead to more errors in LDMSD configuration. While LDMSD might not rigorously check for typos in all instances, I don't believe we should introduce additional opportunities for typos to occur. As for option 2), modifying the plugin interface will effects on all plugin implementations, which leads to a huge change.

@nichamon
Copy link
Collaborator

nichamon commented Feb 7, 2024

I believe that the primary use case for changing the log level before and after configuring a plugin occurs mainly during the testing stage of a release or when troubleshooting issues arises.

An alternative approach to achieving what Ben desires is to initially set the global log level to DEBUG (or the most verbose level they want to see log messages) and then adjust it at the end of the configuration file. While this method would result in a higher volume of log messages, it offers a workaround. By utilizing tools like grep and specifying the plugin name along with the desired log level, one can filter out the relevant log messages. Although this method might not be as convenient as having the proposed patch integrated, it only requires a couple of bash commands to extract the desired log messages.

@nichamon
Copy link
Collaborator

nichamon commented Feb 8, 2024

@baallan I just reread issue #1343 and your example in #1342. I want to understand your use case and the problems you encountered, which led you to require more verbose logging during the loading process and also this patch. From this, I inferred the following:

  • LDMSD had been running.
  • You were sending a load command from a prolog script to load a sampler plugin.
  • The plugin didn't load as expected.
  • Therefore, you want to see the plugin-specific DEBUG messages reported in get_plugin().

Please confirm if my understanding is accurate.

@baallan
Copy link
Collaborator Author

baallan commented Feb 8, 2024

@nichamon That is partially correct. In addition to the not loading example you correctly extracted, there is information I want logged at successful load time rather than waiting for config() or sample().

@tom95858
Copy link
Collaborator

tom95858 commented Feb 8, 2024

@nichamon That is partially correct. In addition to the not loading example you correctly extracted, there is information I want logged at successful load time rather than waiting for config() or sample().

What Ben, please be specific.

@baallan
Copy link
Collaborator Author

baallan commented Feb 8, 2024

@nichamon That is partially correct. In addition to the not loading example you correctly extracted, there is information I want logged at successful load time rather than waiting for config() or sample().

What Ben, please be specific.

The specifics are highly plugin and developer dependent; they're messages to the debug or warning log that go beyond the standard plugin loading messages from the framework about finding the library. So how is it relevant to this discussion?

An obvious example would be something like "resource xyz does not exist; this plugin cannot do anything useful on this node. if it should, you may need to 'insmod xyz' before starting ldmsd".

@tom95858
Copy link
Collaborator

tom95858 commented Feb 9, 2024

@nichamon That is partially correct. In addition to the not loading example you correctly extracted, there is information I want logged at successful load time rather than waiting for config() or sample().

What Ben, please be specific.

The specifics are highly plugin and developer dependent; they're messages to the debug or warning log that go beyond the standard plugin loading messages from the framework about finding the library. So how is it relevant to this discussion?

An obvious example would be something like "resource xyz does not exist; this plugin cannot do anything useful on this node. if it should, you may need to 'insmod xyz' before starting ldmsd".

I apologize, but that is not an example, it's a vague statement about a presumed problem. What is the name of the sampler? What is the resource in question? From which function/context, etc... do you want this message emitted from?

@baallan
Copy link
Collaborator Author

baallan commented Feb 9, 2024

@tom95858 a substantial list of samplers (tx2mon, the various permutations on samplers of IB, omnipath, infiniband, slingshot, and lustre) would be improved by putting appropriate diagnostics in the load() implementation instead of delaying to config(). This is even more true for the multi-instance samplers where the problem is with the plugin/system combination and not with a specific configuration of the plugin.
The resources in question for the plugins/plugin groups listed above are the devices or drivers not present or present but their data in /proc or /sys is not accessible.

@baallan
Copy link
Collaborator Author

baallan commented Feb 9, 2024

@nichamon, you mentioned:

Another crucial point is that LDMSD plugins cannot arbitrarily name their log subsystems. A plugin's log subsystem must align with <plugin type>.<plugin name>, where <plugin name> corresponds to the string base.name. This requirement was discussed in a user group meeting and should be documented in the guidelines for writing LDMSD plugins. As a side note, ongoing efforts are being made to update and enhance the documentation, as it remains a work-in-progress.

That (type.name) was simply an opinion and a recommendation; it was not something broadly agreed to and it is not something enforceable when the registration name is simply a string. For example (type.name.instance_name) names are needed for filtering logs from multi-instance plugins. Because the plugin author may determine a need for further named log subcontexts for more complicated components (like samplers that also handle streams and sets or like transform plugins or like whatever else we cannot foresee), we have no business describing (type.name) as the only allowable context name.
No other OO logging system enforces something like this; they leave it to the discretion of developers of code to understand what contexts are useful in logging, after stating similar guidelines.

But this is far afield from the primary discussion which is that configuring a log context should not be tied to that context already having been defined (and that 'load' is the poster-child example of why they should not be sequentially tied).

@tom95858
Copy link
Collaborator

tom95858 commented Nov 1, 2024

@baallan, I am closing this request. Please move this to a discussion and we will continue there.

@tom95858 tom95858 closed this Nov 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

log_level command for a subsystem does not work before the subsystem is loaded
4 participants