Race condition/bug on waiting for and killing process #80

Open
faern opened this issue Oct 28, 2019 · 9 comments

@faern
Contributor

faern commented Oct 28, 2019

This is going to be a very vague issue report, sorry for that. But we have seen this for quite some time now, and I figured a vague report is better than nothing. It could be that we are doing something wrong on our end, but we think there might be a bug in duct. It only shows up very rarely, and we don't have any reliable way to reproduce it, which is why we have not been able to debug it properly. But we are hit with what we think is a race condition or some other kind of bug related to waiting for and killing subprocesses in https://github.com/mullvad/mullvadvpn-app/

We have one thread blocked on handle.wait() to detect if the subprocess dies. Sometimes we send a SIGTERM from another thread, wait a few seconds, and then try to force-kill the process iff it's still alive. This works 99.999% of the time. But sometimes our logs report that the handle thinks the process is alive after the SIGTERM, and it goes on to kill it, only to get stuck in kill() forever. The thread waiting in wait() never returns either. When looking at the active processes, it seems the subprocess is already gone. So both threads are stuck in wait()/kill() for a PID that no longer exists.

This seems most common on macOS, but I think we have seen issues that could be the same thing on Linux and Windows as well. We're not sure, though.

@oconnor663
Owner

oconnor663 commented Oct 28, 2019

I'm glad you filed the issue. I'll try to brainstorm with you here, and maybe one of the cases I know of will sound like it could fit your situation.

Thinking about kill waiting forever

I don't know of any way kill could block on the act of sending a signal, so I assume it blocks waiting to reap the child that it killed. The only way I know of for a child process to "resist" SIGKILL is if it's in a long-lasting, uninterruptible system call. And the only example of that that I'm familiar with is reads from a FUSE filesystem, where the underlying FUSE server is itself blocked on something, maybe some network request that will never complete. Is there any chance your processes are reading from FUSE filesystems when you try to kill them? Or maybe some other type of network drive?

That said, this scenario wouldn't fit with the detail of the child process already being gone in the process monitor. But there's a lot that confuses me about that detail. How can the waiting thread wait on a process that's already gone? Even if there were a race condition leading to that situation, it should result in an immediate error. Or even if our waitid logic were broken, and the parent ended up waiting on someone else's child (because the system was under extreme load and quickly reusing PIDs), that wrong child should at least be visible in the process monitor.

Are you certain the child process is gone? That's a very important detail if so, and it strongly suggests some deadlock to do with duct's internal condvar (immediately below). But if there's any chance your process monitor might be e.g. hiding zombie processes or something, that could send us down the wrong track.

Thinking about deadlocks in duct itself

We do have a couple of mutexes that we acquire in different places. It's possible there's a deadlock there. But we should always be taking the state_lock before we take the child_lock, and that consistent acquisition order is supposed to prevent deadlocks. There could also be some mistake in how we're using the condvar. But I don't see how that could cause both the killing thread and the waiting thread to deadlock. Then again, it might fit if you have many threads waiting on and killing the same child? If somehow one or two of them succeed but the rest remain blocked on some deadlock? (This is also not a well-tested case.)
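
To illustrate the ordering rule I mean, here's a sketch with stand-in types (not duct's actual internals):

    use std::sync::Mutex;

    // Stand-ins for the two locks mentioned above. The rule is simply that
    // every code path takes state_lock before child_lock, so no two threads
    // can each hold one lock while waiting for the other.
    struct Shared {
        state_lock: Mutex<()>,
        child_lock: Mutex<()>,
    }

    fn kill_child(shared: &Shared) {
        let _state = shared.state_lock.lock().unwrap(); // always first
        let _child = shared.child_lock.lock().unwrap(); // always second
        // send the signal, update bookkeeping, then drop both guards
    }

    fn wait_for_child(shared: &Shared) {
        // Same order here; reversing it in any one function is what could
        // deadlock. Blocking waits happen outside this critical section.
        let _state = shared.state_lock.lock().unwrap();
        let _child = shared.child_lock.lock().unwrap();
        // check bookkeeping, drop the guards, then block on the condvar/wait
    }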

macOS-specific issues

One platform issue we have on macOS is that it doesn't implement pipe2. So the os_pipe crate has to race to make pipes non-inheritable. It's possible that if you're spawning children from multiple threads, you could hit that race, and one child would inherit an extra output pipe that belongs to another child. If the first child is long-lived, that could block threads trying to read the output of the second child. But that shouldn't ever cause kill to block. (As noted below, the automatic wait performed by kill doesn't wait on IO.)

Windows-specific issues

Windows had an issue similar to the above, but it was a bug in os_pipe that I fixed in version 0.9.1. If you had a process deadlock issue that was Windows-specific, that only showed up with multiple threads spawning children, and that seemed to go away with the latest version of os_pipe, this would almost certainly be the culprit.

Grandchildren issues

I have a longer discussion about grandchildren in the gotchas file. If your child processes are spawning further child processes of their own, you could be running into an issue with grandchildren. The kill function only kills direct children, so if a grandchild is holding its own copy of an output pipe, that could cause wait to continue blocking. That said, the inner wait performed by kill does not wait on IO for exactly this reason. So even if you had some kind of grandchild race, that shouldn't cause your kill thread to block. Unless there's some bug in how kill is waiting -- though we do specifically test this case.
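
For illustration, here's the general shape of that grandchild situation, sketched with a shell backgrounding a grandchild and captured output (your output is null'd, so this isn't your exact setup):

    // The backgrounded sleep inherits the shell's stdout pipe, so the
    // captured-output reader never sees EOF until the grandchild exits,
    // and wait() keeps blocking even though the direct child (the shell)
    // exits almost immediately.
    fn main() -> std::io::Result<()> {
        let handle = duct::cmd!("sh", "-c", "sleep 1000 & echo started")
            .stdout_capture()
            .start()?;
        // Blocks for ~1000 seconds: not because the shell is alive, but
        // because a copy of the stdout pipe is still open in the grandchild.
        let output = handle.wait()?;
        println!("{}", String::from_utf8_lossy(&output.stdout));
        Ok(())
    }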

@oconnor663
Owner

oconnor663 commented Oct 28, 2019

Ah it looks like you're using Duct v0.12 in most places. In that version, kill did not wait on child processes at all. (You've made me realize that I forgot to cover this in the changelog. Fixing that now.) If you found that kill was deadlocking, that would almost certainly point to some sort of internal locking bug in shared_child.

I wonder if these lines could be the issue:

    // The child has already exited, so this wait should clean up without blocking.
    let final_result = noreap_result.and_then(|_| self.child.lock().unwrap().wait());

The idea there is that we've already seen that the child has exited, so calling Child.wait() should never block. Perhaps there's some rare case where that's wrong?

@faern
Contributor Author

faern commented Oct 29, 2019

Thanks for the very thorough brainstorming.
I'm currently adding extra logging around all wait() and kill() calls, so next time it happens we will hopefully have a little bit more info. We also have more layers of abstraction than we really need. Much of this code was written before we knew the scope of what we had to support down the line. Maybe we can simplify our subprocess usage overall and narrow down the problem. We'll see.

Regarding the long-lasting, uninterruptible system call: our subprocess here is OpenVPN. It does not do anything related to FUSE to my knowledge. I'm not sure exactly what other system calls it makes. But as you say, that's probably irrelevant, since the process was gone from the process table by the time I got to inspect the computer.

Are you certain the child process is gone?

I did ps aux | grep -i openvpn and got nothing. I'm pretty sure our code did not spawn any new subprocesses in this state, so I'm fairly confident the PID was not being reused and that we were not waiting on some other child process.

But then again it might fit if you have many threads waiting and killing the same child?

I think we only have one thread waiting for it. But the kill might come in from different threads. The way we implement killing the process nicely is by first sending a SIGTERM and then polling the process with try_wait every 50ms for a maximum of 4 seconds; after that we use kill() if try_wait still reports that it's alive. During these deadlock cases the 4 seconds passed, so try_wait must at least have thought the process was alive.
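
To spell out the shape of that sequence (a minimal sketch against a duct Handle, not our actual code; the SIGTERM itself is sent out-of-band and elided here):

    use std::thread::sleep;
    use std::time::{Duration, Instant};

    // Poll try_wait() every 50ms for up to 4 seconds after nudging the
    // child, then fall back to kill() if it still looks alive.
    fn stop_gracefully(handle: &duct::Handle) -> std::io::Result<()> {
        let deadline = Instant::now() + Duration::from_secs(4);
        while Instant::now() < deadline {
            // Without .unchecked() on the expression, duct also reports a
            // signal-killed child as an Err here; for this sketch, anything
            // other than Ok(None) means the child is no longer running.
            match handle.try_wait() {
                Ok(None) => sleep(Duration::from_millis(50)),
                _ => return Ok(()),
            }
        }
        // try_wait still says it's running after 4 seconds: force-kill it.
        handle.kill()
    }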

grandchildren issues

I don't think OpenVPN uses any child processes in the setup we use.

so if a grandchild is holding its own copy of an output pipe, that could cause wait to continue blocking.

We use stdout_null() and stderr_null() when setting up our Expression. So hopefully wait should not care about any output pipe at all?

@oconnor663
Owner

Out of curiosity, how do you send SIGTERM? Duct hasn't exposed a way to get the child PID, so you must be using some side channel? (As an aside, I really should expose such a thing.)

@oconnor663
Owner

oconnor663 commented Oct 29, 2019

Here's a test program that tries to discover any deadlocks in shared_child. When I run this on my Linux machine, it doesn't seem to deadlock. But I'd be curious to see if you get any different results in your macOS environment.

# Cargo.toml
[package]
name = "example"
version = "0.1.0"
edition = "2018"

[dependencies]
shared_child = "0.3.4"
libc = "0.2.65"
os_pipe = "0.9.1"
anyhow = "1.0.18"
crossbeam-utils = "0.6.6"

// src/main.rs
use anyhow::Result;
use shared_child::unix::SharedChildExt;
use shared_child::SharedChild;
use std::os::unix::prelude::*;
use std::process::{Command, ExitStatus};
use std::sync::Barrier;

fn main() -> Result<()> {
    // Loop forever spawning a child and racing to kill it with both SIGTERM
    // and SIGKILL, while waiting with both wait and try_wait.
    loop {
        let handle = SharedChild::spawn(Command::new("/usr/bin/sleep").arg("1000"))?;
        let barrier = Barrier::new(4);
        crossbeam_utils::thread::scope(|s| -> Result<()> {
            let sigterm_thread = {
                s.spawn(|_| -> Result<()> {
                    barrier.wait();
                    handle.send_signal(libc::SIGTERM)?;
                    Ok(())
                })
            };

            let sigkill_thread = {
                s.spawn(|_| -> Result<()> {
                    barrier.wait();
                    handle.kill()?;
                    Ok(())
                })
            };

            let try_wait_thread = {
                s.spawn(|_| -> Result<Option<ExitStatus>> {
                    barrier.wait();
                    let maybe_status = handle.try_wait()?;
                    Ok(maybe_status)
                })
            };

            barrier.wait();
            let status = handle.wait()?;
            let maybe_status = match try_wait_thread.join().unwrap()? {
                Some(status) => status.signal().unwrap().to_string(),
                None => "_".to_string(),
            };
            sigterm_thread.join().unwrap()?;
            sigkill_thread.join().unwrap()?;
            eprint!("{}/{} ", status.signal().unwrap(), maybe_status);
            Ok(())
        })
        .unwrap()?;
    }
}

@faern
Contributor Author

faern commented Oct 30, 2019

Out of curiosity, how do you send SIGTERM? Duct hasn't exposed a way to get the child PID, so you must be using some side channel?

Sorry, my bad. I thought we did, and we probably did or tried to at some point. But since that does not exist on Windows, and we need to make OpenVPN close gracefully on all platforms, we went with another solution. We have our own patched version of OpenVPN that reads from stdin, and whenever the stdin pipe is closed it treats that the same way it would treat a Ctrl-C/SIGTERM. So the way we gracefully close it is by attaching a pipe to stdin and then just closing that pipe.
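
The shape of that is just the usual close-the-write-end trick. Here's a plain std::process sketch with cat as a stand-in for our patched OpenVPN (both exit when stdin hits EOF); our real setup goes through duct, but the principle is the same:

    use std::process::{Command, Stdio};

    fn main() -> std::io::Result<()> {
        // `cat` stands in for the patched OpenVPN: it exits when its stdin
        // reaches EOF.
        let mut child = Command::new("cat")
            .stdin(Stdio::piped())
            .stdout(Stdio::null())
            .spawn()?;

        // Keep the write end around; the child only sees EOF once every
        // copy of it has been closed.
        let stdin = child.stdin.take().expect("stdin was piped");

        // ... later, when we want a graceful shutdown:
        drop(stdin); // closing the pipe is the shutdown signal

        let status = child.wait()?;
        println!("child exited with {}", status);
        Ok(())
    }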

(As an aside, I really should expose such a thing.)

Yes! Just giving access to the PID is important and a nice thing to have. Otherwise people can't work around limitations in the library. Maybe I want to read the /proc/<pid>/ filesystem or write my own pidfile for the child or something.

@oconnor663
Owner

Just uploaded v0.13.3 with a new pids API.
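
Rough usage looks like this (assuming the handle-level method returns one OS process ID per child process in the expression):

    // Grab the OS process IDs from a running Handle so callers can use
    // side channels (signals, pidfiles, /proc/<pid>/, ...) that duct
    // doesn't provide itself.
    fn main() -> std::io::Result<()> {
        let handle = duct::cmd!("sleep", "1000")
            .stdout_null()
            .stderr_null()
            .start()?;
        for pid in handle.pids() {
            println!("child pid: {}", pid);
        }
        handle.kill()?;
        Ok(())
    }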

@oconnor663
Owner

So the way we gracefully close it is by attaching a pipe to stdin and then just close that pipe.

You probably know this, but if you pass any pipes to Expression::stdin_file (previously stdin_handle), you'll have to make sure you drop the expression if you want the child to see that the pipe is closed.

@faern
Contributor Author

faern commented Oct 30, 2019

I did not know that. But we don't store the Expression anywhere; it goes out of scope just after spawning the child anyway. This works fine 99.999% of the time, so that's not the issue here.
