Watchdog #475

Thomasdezeeuw · 2021-06-07T11:33:49Z

Currently, if a single asynchronous actor blocks or runs for a very long time, we have no way to signal that, let alone recover from it.

Since we're using cooperative scheduling, there is little we can do (as a framework) to recover from or fix this, but we could let the user know. For example, we could create a watchdog thread that "checks in" with each worker thread every x milliseconds to see whether another process has had the chance to run. This would detect long-running processes.
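A minimal sketch of the "check in" idea, assuming each worker thread bumps a counter every time it finishes running a process. If the counter has not moved after a full interval, no other process got a chance to run. `Heartbeat`, `beat`, and `watch` are hypothetical names for illustration, not part of Heph's API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;
use std::time::Duration;

/// Hypothetical per-worker heartbeat: the worker increments it every time it
/// returns from running a process, i.e. whenever it could schedule another one.
#[derive(Default)]
pub struct Heartbeat {
    count: AtomicU64,
}

impl Heartbeat {
    /// Called by the worker thread after each process run.
    pub fn beat(&self) {
        self.count.fetch_add(1, Ordering::Relaxed);
    }

    /// Current counter value, read by the watchdog thread.
    pub fn load(&self) -> u64 {
        self.count.load(Ordering::Relaxed)
    }
}

/// Watchdog loop: every `interval`, check whether the worker made progress
/// since the previous check. Runs `checks` rounds and returns how many of
/// them saw no progress at all.
pub fn watch(heartbeat: &Heartbeat, interval: Duration, checks: usize) -> usize {
    let mut last = heartbeat.load();
    let mut stalled = 0;
    for _ in 0..checks {
        thread::sleep(interval);
        let now = heartbeat.load();
        if now == last {
            // No process finished during the whole interval: the worker may be
            // stuck in a long-running or blocking process.
            eprintln!("watchdog: worker made no progress in {interval:?}");
            stalled += 1;
        }
        last = now;
    }
    stalled
}
```

A bare counter cannot tell a stuck worker from an idle one that is waiting for events, since neither bumps it. That is the gap the `State::Polling` variant proposed below closes.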

The idea is that each worker thread publishes its current State in a shared location that the watchdog thread can read. The watchdog then checks whether a worker thread has stayed in State::Running (with the same pid) for too long.

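```rust
use std::time::Instant;

/// Stand-in for the runtime's process identifier (hypothetical).
struct ProcessId(usize);

/// State a worker thread publishes for the watchdog to inspect.
enum State {
    /// Worker thread is polling.
    Polling,
    /// Worker thread is running process `pid`, which it started at `started`.
    Running {
        pid: ProcessId,
        started: Instant,
    },
}
```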
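A sketch of the watchdog side of this proposal, assuming each worker publishes its state behind an `Arc<Mutex<State>>` and sets it to `Running` right before running a process and back to `Polling` afterwards. `ProcessId` is a stand-in type, and `overdue`, `spawn_watchdog`, and the `stop` flag are hypothetical helpers, not existing Heph APIs.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the runtime's process identifier.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct ProcessId(pub usize);

/// Worker state as proposed in the issue.
#[derive(Copy, Clone, Debug)]
pub enum State {
    Polling,
    Running { pid: ProcessId, started: Instant },
}

/// Returns the process that has been running for longer than `limit` at
/// time `now`, if any. A polling worker is never overdue.
pub fn overdue(state: State, now: Instant, limit: Duration) -> Option<ProcessId> {
    match state {
        State::Polling => None,
        // Saturating: the worker may have switched processes after `now` was read.
        State::Running { pid, started } if now.saturating_duration_since(started) > limit => {
            Some(pid)
        }
        State::Running { .. } => None,
    }
}

/// Spawns a watchdog thread that inspects every worker's published state
/// each `interval` until `stop` is set, warning about overdue processes.
pub fn spawn_watchdog(
    workers: Vec<Arc<Mutex<State>>>,
    interval: Duration,
    limit: Duration,
    stop: Arc<AtomicBool>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        while !stop.load(Ordering::Relaxed) {
            thread::sleep(interval);
            let now = Instant::now();
            for (id, shared) in workers.iter().enumerate() {
                // Copy the state out so the lock is held only briefly.
                let state = *shared.lock().unwrap();
                if let Some(pid) = overdue(state, now, limit) {
                    eprintln!("watchdog: worker {id} has been running {pid:?} for over {limit:?}");
                }
            }
        }
    })
}
```

Because `started` is reset every time a worker switches to a process, the elapsed time since `started` is the time spent in the current run, which covers the "same pid for too long" condition. Taking a mutex twice per process run has a cost on the hot path; packing the state into atomics could be cheaper, at the price of a more involved encoding.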

This watchdog could be a separate thread, or the coordinator could take on this task.

Thomasdezeeuw added the "idea" label (An idea, open to discussion.) on Jun 7, 2021.