Count the number of throttled log messages and print it. #1520

Closed
wants to merge 5 commits into from

Conversation

@ovalenti ovalenti commented Jan 26, 2024

Description

We recently increased the delay for some throttled messages significantly.

To get a better idea of how many messages are being suppressed, this PR adds a counter and prints it with the next message that gets through.

Checklist

inspected CI test results for throttled messages:

  • Seen a message not throttled.
  • Seen a message which was throttled.

Testing

With a container generating 100 zombie processes, the collector log contains:

...
[INFO    2024/07/05 19:12:09] Successfully established GRPC stream for signals.
[INFO    2024/07/05 19:12:09] Found self-check process event.
[ERROR   2024/07/05 19:12:09] Could not determine network namespace: No such file or directory
[INFO    2024/07/05 19:12:10] Found self-check connection event.
[INFO    2024/07/05 19:12:19] self-check (pid=76) exited with status: 0
[INFO    2024/07/05 19:12:39] Flushing thread table
[INFO    2024/07/05 19:12:39] Flushing container table
[ERROR   2024/07/05 19:12:39] [Throttled 99 messages] Could not determine network namespace: No such file or directory
[INFO    2024/07/05 19:13:09] Flushing container table
[ERROR   2024/07/05 19:13:09] [Throttled 99 messages] Could not determine network namespace: No such file or directory
...
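
The counting behavior visible in the log above can be sketched without macros. This is a minimal illustration, not the PR's implementation: `ThrottleState`, `ShouldLog`, and `FormatThrottled` are hypothetical names, and the explicit `logged_once` flag is an assumption added here so the first message always passes. The PR itself keeps equivalent state in statics generated by the macro.

```cpp
#include <chrono>
#include <sstream>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical per-call-site state (the PR stores this in macro-generated statics).
struct ThrottleState {
  Clock::time_point last_log{};  // when a message was last emitted
  unsigned long suppressed = 0;  // messages dropped since then
  bool logged_once = false;      // assumption: first message is never throttled
};

// Returns true when the message should be emitted. In that case
// `suppressed_out` receives the number of messages dropped since the
// previous emission, and the counter is reset.
bool ShouldLog(ThrottleState& s, Clock::time_point now,
               Clock::duration interval, unsigned long& suppressed_out) {
  if (s.logged_once && now - s.last_log < interval) {
    ++s.suppressed;  // still inside the throttle window: count and drop
    return false;
  }
  suppressed_out = s.suppressed;
  s.suppressed = 0;
  s.last_log = now;
  s.logged_once = true;
  return true;
}

// Prefixes the message with the counter only when something was dropped.
std::string FormatThrottled(unsigned long suppressed, const std::string& msg) {
  std::ostringstream out;
  if (suppressed > 0) {
    out << "[Throttled " << suppressed << " messages] ";
  }
  out << msg;
  return out.str();
}
```

Passing `now` explicitly keeps the sketch testable; a real call site would pass `Clock::now()`.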

@ovalenti ovalenti self-assigned this Jan 26, 2024
@ovalenti ovalenti closed this Feb 27, 2024
@ovalenti ovalenti reopened this Jun 18, 2024
@ovalenti ovalenti force-pushed the ovalenti/count-throttled-logs branch 4 times, most recently from 5e68bb8 to fb917d9 Compare July 5, 2024 19:01
@ovalenti ovalenti marked this pull request as ready for review July 5, 2024 19:57
@ovalenti ovalenti requested a review from a team as a code owner July 5, 2024 19:57
static std::chrono::steady_clock::time_point _clog_lastlog_##__LINE__; \
static unsigned long _clog_throttle_times_##__LINE__ = 0; \
std::chrono::duration _clog_elapsed_##__LINE__ = std::chrono::steady_clock::now() - _clog_lastlog_##__LINE__; \
if (collector::logging::CheckLogLevel(collector::logging::LogLevel::lvl) && (cond) && _clog_elapsed_##__LINE__ < interval) { \

I have not been able to refactor this to use nested ifs. I think this is okay.

@JoukoVirtanen JoukoVirtanen self-requested a review August 1, 2024 21:03
@JoukoVirtanen

I will approve after merge conflicts are fixed.

@@ -56,10 +56,15 @@ const size_t LevelPaddingWidth = 7;

class LogMessage {
public:
-  LogMessage(const char* file, int line, bool throttled, LogLevel level)
-      : file_(file), line_(line), level_(level), throttled_(throttled) {
+  LogMessage(const char* file, int line, LogLevel level, unsigned long* throttled_times = 0)

[nit] using nullptr rather than 0 would make this a little clearer

Suggested change
-  LogMessage(const char* file, int line, LogLevel level, unsigned long* throttled_times = 0)
+  LogMessage(const char* file, int line, LogLevel level, unsigned long* throttled_times = nullptr)

+1 on this one. We are terrible at using nullptr; we should try to do better.
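
One concrete reason to prefer `nullptr`, beyond readability, is overload resolution: a literal `0` is an `int` first and a null pointer constant second. The overloads below are hypothetical and exist only to illustrate the point; nothing like them appears in this PR.

```cpp
#include <string>

// Hypothetical overload set, for illustration only.
std::string Describe(int) { return "int"; }
std::string Describe(unsigned long*) { return "pointer"; }

// Describe(0) selects the int overload, because 0 is an exact match for int.
// Describe(nullptr) selects the pointer overload, because std::nullptr_t
// converts to a pointer type but not to int.
```

With a single constructor and a defaulted pointer parameter, as in this diff, `0` and `nullptr` behave the same; the suggestion is about making the intent obvious to the reader.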

collector/lib/Logging.h

@JoukoVirtanen JoukoVirtanen left a comment

LGTM!

(std::chrono::steady_clock::now() - _clog_lastlog_##__LINE__ >= interval)) \
_clog_lastlog_##__LINE__ = std::chrono::steady_clock::now(), \
collector::logging::LogMessage(__FILE__, __LINE__, true, collector::logging::LogLevel::lvl)
#define CLOG_THROTTLED_IF(cond, lvl, interval) \

I'm gonna be honest, I don't like macros. I think most of them are directly written by the devil himself, waiting for us to make a silly mistake in an expansion, triggering undefined behavior and bringing down entire programs in an instant.

This particular case was already pretty bad, but with the added changes it becomes borderline unreadable. I had to put a TON of effort into just understanding what was going on in it, and I have to assume that if we ever need to make further changes or (God forbid) fix a bug, it will take way too much effort.

That said, I am in favor of the added feature: counting the number of times a log has been throttled adds value (throttling twice is not as bad as throttling 2000 times). Maybe we can come up with a better solution. There might be a way to rewrite at least part of the macro as a concrete method and pass things like __LINE__ to it as an argument. Or maybe we have to rethink how our entire logging implementation works. The implementation itself is really clever, using the destructor of the class to do the actual printing, but that also means we create and destroy an object every time we log. Maybe we want a single logger instance instead, with this sort of throttling done through a map of sorts, keyed by line number and holding the number of triggers that have occurred.

I digress, I will not block the change from being merged because I like the idea behind it, but I cannot in good faith approve the PR because of the reasons stated before.
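
The map-based idea from the comment above could look roughly like the following. This is a sketch under stated assumptions, not code from the collector repository: the `Throttler` class, its `ShouldLog` method, and the choice to key on file plus line (rather than line alone) are all hypothetical, and the mutex is an assumption that call sites may log from several threads.

```cpp
#include <chrono>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical single-instance throttler keyed by call site.
class Throttler {
 public:
  using Clock = std::chrono::steady_clock;

  // Returns true when the call site identified by (file, line) should log.
  // On true, `suppressed_out` holds how many messages were dropped for that
  // site since its previous emission.
  bool ShouldLog(const char* file, int line, Clock::duration interval,
                 Clock::time_point now, unsigned long& suppressed_out) {
    std::lock_guard<std::mutex> lock(mutex_);
    const Key key(file, line);
    auto it = sites_.find(key);
    if (it != sites_.end() && now - it->second.last_log < interval) {
      ++it->second.suppressed;  // inside the window: count and drop
      return false;
    }
    Site& site = sites_[key];  // creates a zeroed entry on first use
    suppressed_out = site.suppressed;
    site.suppressed = 0;
    site.last_log = now;
    return true;
  }

 private:
  using Key = std::pair<std::string, int>;  // (file, line)
  struct Site {
    Clock::time_point last_log{};
    unsigned long suppressed = 0;
  };

  std::mutex mutex_;
  std::map<Key, Site> sites_;
};
```

A thin macro would then only need to forward `__FILE__` and `__LINE__` to `ShouldLog`, which keeps the stateful logic in ordinary, debuggable code. The trade-off is a map lookup and a lock on every throttled call, where the macro version uses statics with no lookup.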

@ovalenti ovalenti closed this Oct 4, 2024
@ovalenti ovalenti deleted the ovalenti/count-throttled-logs branch October 31, 2024 09:31