Skip to content

Commit

Permalink
test showing ProcessInfoCache crash
Browse files Browse the repository at this point in the history
Summary:
I want to use the process info cache in another spot in watchman so we can get
client information in a new spot. However, there weve seen a crash in this code
a few times.

Muir seems to be able to repro this pretty frequently. Here are a couple
sample stack traces from his machine: P1009632719 & P1517801272.

They are assertions in the folly future code
```
libc++abi: terminating due to uncaught exception of type std::logic_error: setResult unexpected state: 2
```
and
```
libc++abi: terminating due to uncaught exception of type std::logic_error: setCallback unexpected state: 4
```

The eden code that seems to trigger this is this future wait https://github.com/facebookexperimental/edencommon/blob/main/eden/common/utils/ProcessInfoCache.cpp#L116

I've learned that these messages each mean that:
- One thread is calling setResult and trying to update the future state, but
it unexpectedly gets set to "OnlyResult" before the code can update it's state.
Meaning that another thread sets the value of the future while this one is
trying to.
and
- One thread is calling setProxy and trying to update the future state, but
its unexpectedly gets set to "OnlyCallback". Meaning that one thread is trying
to set the future that the value should be provided to when another is also
setting the callback of the future.

When futures are used as intended there should only ever be one call to set
the value and one call to set the callback.

Some reference material:
- future states: https://github.com/facebook/folly/blob/main/folly/futures/detail/Core.h#L44-L53
- valid state transitions: https://github.com/facebook/folly/blob/main/folly/futures/detail/Core.h#L252-L271
- "proxy"-ing happens when a future callback returns a future and this needs
to be pipelined to the next future callback nicely: https://github.com/facebook/folly/blob/main/folly/futures/Future-inl.h#L454

So were doing something funky with futures that causes two producers or consumers
in the ProcessInfoCache code.

I looked at the produce side (the one that provides the value or sets the
promise). And it seems generally above board. We pull the promise out of the
queue and then should only set the promise once: https://github.com/facebookexperimental/edencommon/blob/main/eden/common/utils/ProcessInfoCache.cpp#L301-L325

However, on the consume side, we store this future in a thread local and
state cache: https://github.com/facebookexperimental/edencommon/blob/main/eden/common/utils/ProcessInfoCache.cpp#L38

That allows multiple threads to access and .wait() on the same future. https://github.com/facebookexperimental/edencommon/blob/main/eden/common/utils/ProcessInfoCache.cpp#L116

waiting looks like basically adding a callback to a future, and that should
only be done once I think.

This test shows teo threads trying to both wait at the same time and crash.

Reviewed By: jdelliot

Differential Revision: D60923796

fbshipit-source-id: 6956c8030890dab1a51d7039352f59c96b41d94d
  • Loading branch information
Katie Mancini authored and facebook-github-bot committed Aug 9, 2024
1 parent 50e94a7 commit 5444270
Show file tree
Hide file tree
Showing 3 changed files with 45 additions and 1 deletion.
2 changes: 1 addition & 1 deletion eden/common/utils/ProcessInfoCache.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -293,7 +293,7 @@ void ProcessInfoCache::workerThread() {

sem_.wait();
if (faultInjector_) {
faultInjector_->check("ProcessInfoCache::workerThread", "");
faultInjector_->check("ProcessInfoCache::workerThread", "workerThread");
}

size_t currentNamesSize;
Expand Down
1 change: 1 addition & 0 deletions eden/common/utils/ProcessInfoCache.h
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,7 @@ class ProcessInfoHandle {

private:
FRIEND_TEST(ProcessInfoCache, faultinjector);
FRIEND_TEST(ProcessInfoCache, multipleLookups);

const folly::SemiFuture<ProcessInfo>& future() const;

Expand Down
43 changes: 43 additions & 0 deletions eden/common/utils/test/ProcessInfoCacheTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
#include "eden/common/utils/ProcessInfoCache.h"

#include <folly/portability/GTest.h>
#include <folly/system/ThreadName.h>

#include "eden/common/utils/FaultInjector.h"

Expand Down Expand Up @@ -169,4 +170,46 @@ TEST(ProcessInfoCache, faultinjector) {
// is something legit.
EXPECT_NE("", info.get().name);
}

TEST(ProcessInfoCache, multipleLookups) {
FaultInjector faultInjector = FaultInjector{/*enabled=*/true};
ProcessInfoCache processInfoCache = ProcessInfoCache{
/*expiry=*/std::chrono::minutes{5},
/*threadLocalCache=*/nullptr,
/*clock=*/nullptr,
/*readInfo=*/nullptr,
/*faultInjector=*/&faultInjector};

// prevent the process info cache from making any progress on looking up pids
faultInjector.injectBlock("ProcessInfoCache::workerThread", ".*");

auto info1 = processInfoCache.lookup(getpid());
ASSERT_TRUE(
faultInjector.waitUntilBlocked("ProcessInfoCache::workerThread", 1s));

// auto info2 = processInfoCache.lookup(getpid());

// Assumption: these should share the same node since they are cached and
// worker could not have made any progress yet.
// ASSERT_EQ(info1.node_.get(), info2.node_.get());

// currently if we run both threads it cause a crash sometimes. because both
// threads are calling wait() at the same time which can cause multiple
// callbacks to be added to the same future core.
auto thread1 = std::thread{[info = std::move(info1)] {
folly::setThreadName("info1");
EXPECT_NE("", info.get().name);
}};

// auto thread2 = std::thread{[info = std::move(info2)] {
// folly::setThreadName("info2");
// EXPECT_NE("", info.get().name);
// }};

faultInjector.removeFault("ProcessInfoCache::workerThread", ".*");
faultInjector.unblock("ProcessInfoCache::workerThread", ".*");

thread1.join();
// thread2.join();
}
} // namespace facebook::eden

0 comments on commit 5444270

Please sign in to comment.