Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DAPHNE-#575] Error logger, meaningful default log level, cleanup #575

Merged
merged 6 commits into from
Jul 28, 2023

Conversation

corepointer
Copy link
Collaborator

@corepointer corepointer commented Jul 18, 2023

This commit introduces the "error" logger that should be used upon quitting the program abnormally due to some error or exception. This logger ignores the log level limit to ensure that fatal error messages get through to the std err console output.

Furthermore, this commit contains a lot of replacements of std::cout and std::cerr (exceptions to this are the distributed code paths for example).

@corepointer corepointer added this to the v0.2 milestone Jul 18, 2023
@corepointer corepointer marked this pull request as ready for review July 20, 2023 09:54
@corepointer corepointer requested a review from pdamme July 20, 2023 09:54
@corepointer corepointer changed the title [DAPHNE-#abc] Error logger, meaningful default log level, cleanup [DAPHNE-#575] Error logger, meaningful default log level, cleanup Jul 20, 2023
Copy link
Collaborator

@pdamme pdamme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @corepointer! Thanks for making the error messages show up on the console again. This is definitely very useful.

It already works well for parser errors and pass/compiler errors. However, exceptions thrown by kernels are not displayed. The simple DaphneDSL script print(ctable([1], [1, 2])); should print ctable: lhs and rhs must have the same number of rows, but fails silently. I think this can be fixed by changing spdlog::error() to spdlog::get("error")->error() in daphne_internal.cpp line 544.

Regarding the how, my main question is: Do we really need a separate error logger? My concerns with this approach are:

  • It is not intuitive when one should use spdlog::error() and when spdlog::get("error")->error(). They sound the same.
  • It is possible to use other log levels with the error logger, e.g., spdlog::get("error")->warn(), the semantics of those are not intuitive.
  • So far, I thought individual loggers are meant for individual "topics", e.g., CUDA, but error is actually a log level.

A simple solution could be calling it a console logger, because the special thing about it is that it always prints messages to the console (std::cerr). Nevertheless, isn't the output target a dimension orthogonal to log levels and topics? For instance, it is already possible to specify files where to append the logger output. I think printing to std::cerr may be better understood as an output target. To sum up, I'm wondering if the goal of ensuring that all error messages are printed on std::cerr may not be simpler achieved by using the default logger (i.e., spdlog::error()) with default log limit err and default output to std::cerr (instead of a file). This up for discussion, but I would be okay with merging it the way it is now to solve the functional problem first of all.

Required changes:

  1. Execution errors (e.g., when a kernel throws an exception) should be displayed (see above).
  2. A few SQL-related test cases are still failing.
  3. I think the changes in RewriteToCallKernelOpPass.cpp are unrelated to this feature, so I recommend committing them separately.

Optional changes (can also be addressed later):

  1. Consistent prefixes, e.g., "WARNING" vs. "Warning", "Execution error:" vs. "Execution error!"
  2. There are still a few instances of direct printing to std::cerr in src/runtime/local/context/FPGAContext.cpp, src/runtime/local/vectorized/MTWrapper.h, and src/runtme/local/vectorized/WorkerCPU.h, maybe they should be replaced, too.
  3. I think the message prefixes prepended when converting exceptions to log messages sound a bit too technical in some cases. Users of DAPHNE are not required to know about C++, so telling them that a std::runtime_error happened internally might make the error sound more complicated to them than it is.
  4. Exceptions thrown by compiler passes used to be caught in daphne_internal.cpp, but now there are try-catch blocks that log errors in multiple places (e.g., in InferencePass, RewriteSqlOpPass, DaphneInferTypesOpInterface). This approach is prone to inconsistencies (duplicated message prefixes; sometimes catching only std::runtime_error, sometimes also ... ellipsis) and bugs (e.g., if another pass throws an exception, maybe one that bubbled up from a utility function, it may not be displayed). Maybe we could have a central point for catching all those in the same binary by placing it in DaphneIrExecutor.

@corepointer
Copy link
Collaborator Author

Thx for the feedback @pdamme, I'll give it another pass :)

The reason why the error logger came into existence as it is in this PR is that we never decided on any guidelines when we need to output to std err and when to std out. So I went the way of least disruptive change and tried to keep current behavior. So the design goals of the separate error logger were

  • deal with the topic of errors (yes, having to write spdlog::get("error")->error() looks a bit redundant and awkward)
  • output to std err (to replace all occurrences of std::cerr)
  • be somewhat separate (e.g. ignore log level limit to still get critical error messages)

Having a console logger would be a way to go. But we could also say the default logger should fulfill this part as it prints to console anyway. File logging can be configured by the user in addition. If we go the route of a console/default logger we should probably let go of using std err and always output to std out. We'll have enough log levels to distinguish errors anyway.

Central catching point and uniform messages sgtm.

Removed some commented code and reformatted some lines
…eanups

This commit improves the behavior of the default logger to properly log all errors from the compiler passes and runtime kernels and test cases.
An initial effort has been made to have more uniform error messages but this remains work in progress.
A few occurrences of std::cout and std::cerr are left for the original authors of that code to decide how to log these outputs.

Closes daphne-eu#575
This commit sets up the DaphneLogger for the WorkerImplGRPC to make the test cases work.
Doing the same for MPI remains for future work (no test cases there yet).
@corepointer
Copy link
Collaborator Author

I addressed the required and a few optional changes. Some of the std::cerr and std::cout are still in the code for the original authors to fix (@AhmedEleliemy for the scheduling debug print and @pdamme for the timing) as you know best how you want to deal with those. I also left the commented out occurrences which mainly occur in the distributed code paths of @AhmedEleliemy.
Some observations for future work:

  • The MTWrapper and Worker classes have some compile time flags for verbosity. This can be replaced with loggers imho
  • There's the "critical" log level. We could make use of it to distinguish between errors that abort execution and errors that do not (which would raise the question how we want to use the "warn" log level). In general we could discuss/publish a guideline. It shares the fate of a coding standard imho
  • Same same for formatting of the default log outputs
  • I found a note in the spdlog documentation that spdlog::get() should be avoided in performance critical sections as it requires acquiring a lock and one should get() the logger once and use the shared_ptr in subsequent calls. I have implemented this guideline in the MarkCUDAOpsPass and the context objects for the runtime loggers to serve as examples. Will update documentation shortly.

Copy link
Collaborator

@pdamme pdamme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @corepointer, for the clarifications and for addressing the points mentioned above. This PR looks very good to me now. I will merge it in a minute.

Thanks for the clarification regarding the original ideas behind the error logger. I didn't realize at first that error was meant in the sense of cerr, while the default logger was meant for cout. Anyway, I like the current implementation very much!

All the required changes from my previous review have been addressed.

Thanks for also addressing some of the optional items, especially the more consistent (in terms of prefixes) and less technical log messages.

I think the logging support is very good for now and has surely improved the code base. We all can agree on some advanced aspects like logging guidelines in the near future :) .


Since we're about to release: I saw that you also changed run-daphne.sh, which could have an impact on a release artifact. Just want to make you aware in case in was by accident.

@pdamme pdamme merged commit fedc4cf into daphne-eu:main Jul 28, 2023
1 of 2 checks passed
pdamme pushed a commit that referenced this pull request Jul 28, 2023
This commit improves the behavior of the default logger to properly log all errors from the compiler passes and runtime kernels and test cases.
An initial effort has been made to have more uniform error messages but this remains work in progress.
A few occurrences of std::cout and std::cerr are left for the original authors of that code to decide how to log these outputs.

Closes #575
pdamme pushed a commit that referenced this pull request Jul 28, 2023
This commit sets up the DaphneLogger for the WorkerImplGRPC to make the test cases work.
Doing the same for MPI remains for future work (no test cases there yet).
@corepointer
Copy link
Collaborator Author

Thx for reviewing and merging @pdamme

Yes, I committed the run script change by accident - fixed.

Releasing v0.2 once the images and release artifacts have been rebuilt successfully =)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants