Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Target Process may crash or freezes on detaching process on windows #115712

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

anatawa12
Copy link

@anatawa12 anatawa12 commented Nov 11, 2024

I hope this fixes #67825 Actually a another issue, see comment below

Fixes #67825 Fixes #89077

Fixes RIDER-99436, which is upstream issue of #67825.

This PR changes the timing of calling DebugActiveProcessStop to after calling ContinueDebugEvent for last debugger exception.

I confirmed the crashing behavior is because we call DebugActiveProcessStop before ContinueDebugEvent for last debugger exception with https://github.com/anatawa12/debug-api-test.

I have no idea how should I test this so I did not add tests. Could you let me know how should I test for this chages?

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the lldb label Nov 11, 2024
@llvmbot
Copy link

llvmbot commented Nov 11, 2024

@llvm/pr-subscribers-lldb

Author: anatawa12 (anatawa12)

Changes

I hope this fixes #67825
Fixes RIDER-99436, which is upstream issue of #67825.

This PR changes the timing of calling DebugActiveProcessStop to after calling ContinueDebugEvent for last debugger exception.

I confirmed the crashing behavior is because we call DebugActiveProcessStop before ContinueDebugEvent for last debugger exception with https://github.com/anatawa12/debug-api-test.

I have no idea how should I test this so I did not add tests. Could you let me know how should I test for this chages?


Full diff: https://github.com/llvm/llvm-project/pull/115712.diff

2 Files Affected:

  • (modified) lldb/source/Plugins/Process/Windows/Common/DebuggerThread.cpp (+29-16)
  • (modified) lldb/source/Plugins/Process/Windows/Common/DebuggerThread.h (+1-1)
diff --git a/lldb/source/Plugins/Process/Windows/Common/DebuggerThread.cpp b/lldb/source/Plugins/Process/Windows/Common/DebuggerThread.cpp
index ca8e9c078e1f99..ac63997abe823d 100644
--- a/lldb/source/Plugins/Process/Windows/Common/DebuggerThread.cpp
+++ b/lldb/source/Plugins/Process/Windows/Common/DebuggerThread.cpp
@@ -239,12 +239,13 @@ void DebuggerThread::DebugLoop() {
     BOOL wait_result = WaitForDebugEvent(&dbe, INFINITE);
     if (wait_result) {
       DWORD continue_status = DBG_CONTINUE;
+      bool shutting_down = m_is_shutting_down;
       switch (dbe.dwDebugEventCode) {
       default:
         llvm_unreachable("Unhandle debug event code!");
       case EXCEPTION_DEBUG_EVENT: {
-        ExceptionResult status =
-            HandleExceptionEvent(dbe.u.Exception, dbe.dwThreadId);
+        ExceptionResult status = HandleExceptionEvent(
+            dbe.u.Exception, dbe.dwThreadId, shutting_down);
 
         if (status == ExceptionResult::MaskException)
           continue_status = DBG_CONTINUE;
@@ -292,6 +293,25 @@ void DebuggerThread::DebugLoop() {
 
       ::ContinueDebugEvent(dbe.dwProcessId, dbe.dwThreadId, continue_status);
 
+      // We have to DebugActiveProcessStop after ContinueDebugEvent, otherwise
+      // the target process will crash
+      if (shutting_down) {
+        // A breakpoint that occurs while `m_pid_to_detach` is non-zero is a
+        // magic exception that we use simply to wake up the DebuggerThread so
+        // that we can close out the debug loop.
+        if (m_pid_to_detach != 0 &&
+            (dbe.u.Exception.ExceptionRecord.ExceptionCode ==
+                 EXCEPTION_BREAKPOINT ||
+             dbe.u.Exception.ExceptionRecord.ExceptionCode ==
+                 STATUS_WX86_BREAKPOINT)) {
+          LLDB_LOG(log,
+                   "Breakpoint exception is cue to detach from process {0:x}",
+                   m_pid_to_detach.load());
+          ::DebugActiveProcessStop(m_pid_to_detach);
+          m_detached = true;
+        }
+      }
+
       if (m_detached) {
         should_debug = false;
       }
@@ -310,25 +330,18 @@ void DebuggerThread::DebugLoop() {
 
 ExceptionResult
 DebuggerThread::HandleExceptionEvent(const EXCEPTION_DEBUG_INFO &info,
-                                     DWORD thread_id) {
+                                     DWORD thread_id, bool shutting_down) {
   Log *log = GetLog(WindowsLog::Event | WindowsLog::Exception);
-  if (m_is_shutting_down) {
-    // A breakpoint that occurs while `m_pid_to_detach` is non-zero is a magic
-    // exception that
-    // we use simply to wake up the DebuggerThread so that we can close out the
-    // debug loop.
-    if (m_pid_to_detach != 0 &&
+  if (shutting_down) {
+    bool is_breakpoint =
         (info.ExceptionRecord.ExceptionCode == EXCEPTION_BREAKPOINT ||
-         info.ExceptionRecord.ExceptionCode == STATUS_WX86_BREAKPOINT)) {
-      LLDB_LOG(log, "Breakpoint exception is cue to detach from process {0:x}",
-               m_pid_to_detach.load());
-      ::DebugActiveProcessStop(m_pid_to_detach);
-      m_detached = true;
-    }
+         info.ExceptionRecord.ExceptionCode == STATUS_WX86_BREAKPOINT);
 
     // Don't perform any blocking operations while we're shutting down.  That
     // will cause TerminateProcess -> WaitForSingleObject to time out.
-    return ExceptionResult::SendToApplication;
+    // We should not send breakpoint exceptions to the application.
+    return is_breakpoint ? ExceptionResult::MaskException
+                         : ExceptionResult::SendToApplication;
   }
 
   bool first_chance = (info.dwFirstChance != 0);
diff --git a/lldb/source/Plugins/Process/Windows/Common/DebuggerThread.h b/lldb/source/Plugins/Process/Windows/Common/DebuggerThread.h
index e3439ff34584be..5960237b778673 100644
--- a/lldb/source/Plugins/Process/Windows/Common/DebuggerThread.h
+++ b/lldb/source/Plugins/Process/Windows/Common/DebuggerThread.h
@@ -46,7 +46,7 @@ class DebuggerThread : public std::enable_shared_from_this<DebuggerThread> {
   void FreeProcessHandles();
   void DebugLoop();
   ExceptionResult HandleExceptionEvent(const EXCEPTION_DEBUG_INFO &info,
-                                       DWORD thread_id);
+                                       DWORD thread_id, bool shutting_down);
   DWORD HandleCreateThreadEvent(const CREATE_THREAD_DEBUG_INFO &info,
                                 DWORD thread_id);
   DWORD HandleCreateProcessEvent(const CREATE_PROCESS_DEBUG_INFO &info,

@anatawa12 anatawa12 changed the title fix: Target Process may crash on detaching process fix: Target Process may crash on detaching process on windows Nov 11, 2024
@labath
Copy link
Collaborator

labath commented Nov 12, 2024

So, it sounds like this is fixing test/API/commands/process/detach-resumes/TestDetachResumes.py (which is linked to #89077, which may be a dupe of the bug you're referencing), so the first step would be to figure out whether this test actually passes with your patch. Note that the test was never successfully run on windows, so you may need to tweak it a bit to make it windows compatible (I don't see anything that should be a problem for windows, but it's very easy for os assumptions to sneak in).

I'm not really familiar with windows APIs, but the change doesn't seem particularly controversial.

@anatawa12
Copy link
Author

I tried to run TestDetachResumes but it fails (still in suspended state) so I may need some other work to them work correctly. I'll investigate more tomorrow.

@anatawa12
Copy link
Author

After I changed debug-api-test, I found that calling DebugActiveProcessStop before ContinueDebugEvent will follow behavior specified by DebugSetProcessKillOnExit for some cases.

This PR only fixes this behavior, which is reported on Rider issue tracker as RIDER-99436.

It looks freezing in Suspended state, which are reported as #67825 and #89077, are completely different issue and not fixed with this PR.

@anatawa12
Copy link
Author

anatawa12 commented Nov 13, 2024

I looked the process in suspended state with sysinternal process explorer and I found all threads have exactly one suspend count.
Therefore, I added logs to calling SuspendThread and ResumeThread and I found those calls are unbalanced exactly one for each thread so I think #67825 and #89077 are about calling ResumeThread and completely different problem.
I'm interested in #67825 but this is completely different bug so I'm going to create another Pull Request when I reached to fix.

EDIT: I found we have to both this and suspend issue to fix #89077 so I'll add commits to this PR for now. If needed, I'll split pull request to two.

@anatawa12 anatawa12 changed the title fix: Target Process may crash on detaching process on windows fix: Target Process may crash or freezes on detaching process on windows Nov 13, 2024
@anatawa12
Copy link
Author

anatawa12 commented Nov 13, 2024

I have implemented fix for #89077 and I checked test works on windows.

Sorry for many comments.

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking into this. So, from what I was able to gather, there are two issues/bugs here, but I'm having trouble figuring out what they are. Can you explain what's the difference between them? I'm trying to figure out if we can/should have a separate test case for the other bug.

for (uint32_t i = 0; i < m_thread_list.GetSize(); ++i) {
auto thread = std::static_pointer_cast<TargetThreadWindows>(
m_thread_list.GetThreadAtIndex(i));
Status result = thread->DoResume();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that if the thread/process gets another exception before we manage to detach from it, it will remain suspended? I'm not familiar with windows debugging API, so I don't know if there's a way to handle this, but I believe this is the reason why linux (and others) have a PTRACE_DETACH call which (atomically) detaches from a process/thread and resumes it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that if the thread/process gets another exception before we manage to detach from it, it will remain suspended?

No, the process will not suspend.
The process will receive the exception and the process handles it instead.
For simple programs, it will crash the process but some process may handle in their way.
We can see this behavior with TestDetachResumes by reverting 5ad1a98.

I'm not familiar with windows debugging API, so I don't know if there's a way to handle this, but I believe this is the reason why linux (and others) have a PTRACE_DETACH call which (atomically) detaches from a process/thread and resumes it.

I agree with that but as far as I researched I could not figure out the API to detach and resume thread simultaneously.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the process will not suspend.
The process will receive the exception and the process handles it instead.

Ok, just so we're clear: If the exception happens after the last call to WaitForDebugEvent (but before the call to DebugActiveProcessStop), the process will not stay stopped, but it will receive the exception (and then either handle it or get killed by it). True or false?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's True, (as far as I know)

@anatawa12
Copy link
Author

anatawa12 commented Nov 13, 2024

Can you explain what's the difference between them?

Sorry for confusion. I'll explain two bugs cleanly.

First of all, both bug must be fixed to pass TestDetachResumes test.

The first bug is the process may freeze in Suspended state on detaching process.
This bug is tracked as #67825.
It looks this only happens when we detached process without resuming process (continue command).

The second bug is the process may crash (killed)on detaching process.
It looks this bug is not tracked in llvm-project repository but I think RIDER-99436 is experiencing this bug.
It think this bug will always happen on detaching process. This is incorrect.

Both bug will make the attached process unusable on detaching process, but the difference between two bug is what happens on detaching process.
The first bug make the process frozen in Suspended state, the later bug crashes the process.

@labath
Copy link
Collaborator

labath commented Nov 13, 2024

Just one more clarification. I'm sorry for this takes so long, but I'm not familiar with the windows debugging API, which means that I might not be getting some of the things that are obvious. I just want to make sure I understand what's happening here.

The first bug is the process may freeze in Suspended state on detaching process.

Okay, I think I understand this part. The process is "suspended" by the breakpoint and detaching from it does not automatically resume it.

The second bug is the process may crash (killed)on detaching process.

I'm less sure about this one. Is that due to the breakpoint exception somehow remaining "pending" and that we need to clear/ignore/suppress it before detaching?

@anatawa12
Copy link
Author

I'm less sure about this one. Is that due to the breakpoint exception somehow remaining "pending" and that we need to clear/ignore/suppress it before detaching?

Sorry I don't know the reason of this behavior, but not calling corresponding ContinueDebugEvent for WaitForDebugEvent may cause crash is what I know.

It looks leaving some "pending" events / exceptions in queue doesn't trigger problem, but leaving "processing" events / exceptions would cause this problem.


I misunderstand the behavior a little.

Sorry for confusion again.
In my previous comment, I said "It think this bug will always happen on detaching process" for second bug but I found that this bug depends on the implementation of target process and a.out from TestDetachResumes would not have this problem.
I fixed the previous comment.

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I'm satisfied by the explanation. Thanks for your patience. Do you need me to push the "merge" button?

@anatawa12
Copy link
Author

anatawa12 commented Nov 15, 2024

Do you need me to push the "merge" button?

Yes please. I don't have merge button for this repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants