Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ideas and resources for fuzzing shared-everything threads #84

Open
fitzgen opened this issue Oct 15, 2024 · 6 comments
Open

Ideas and resources for fuzzing shared-everything threads #84

fitzgen opened this issue Oct 15, 2024 · 6 comments

Comments

@fitzgen
Copy link
Contributor

fitzgen commented Oct 15, 2024

This issue is intended as a place for discussion about approaches we can all use to improve our assurances that we correctly implement this proposal. A place to share relevant papers, moonshot ideas, etc...

Unlike many other Wasm proposals,1 we can't effectively exercise this proposal's additions to the language by just simply generating pseudo-random Wasm modules that contain this proposal's new instruction and feeding those into the system under test.

Off the top of my head, we need to additionally do things like

  • actually spawn multiple threads
  • check whether the dynamic trace of accesses across threads is valid given the happens-before relationships imposed by those operations used to make those accesses
  • ideally avoid dead locks in these generated concurrent programs

Footnotes

  1. To some degree everything here applies equally to the original threads proposal, but shared-everything threads expands the scope much further.

@fitzgen
Copy link
Contributor Author

fitzgen commented Oct 15, 2024

FWIW, @abrown has a pull request implementing support for shared-everything threads in wasm-smith: bytecodealliance/wasm-tools#1756

This is good for exercising validators, but won't give us much as far as engine-implementation correctness is concerned.

@tlively
Copy link
Member

tlively commented Oct 15, 2024

Ideas I've had include:

  • Use an abstract interpreter that collects all possible executions as the fuzzing oracle (but only extraordinarily simple programs would avoid combinatoric explosion in possible results).
  • Start by generating random happens-before/reads-from graphs that will produce reasonable numbers of possible results, then generate programs that embody those graphs.
  • Implement TSan instrumentation for WasmGC to check that fuzzer-generated programs that are intended to be race-free are in fact race-free (also useful for checking real-world programs for bugs).
  • Use methods other than differential fuzzing (e.g. translation validation) to find misoptimization bugs.

I'd be very interested in reading any available literature in this area.

@fitzgen
Copy link
Contributor Author

fitzgen commented Oct 15, 2024

Some things that seem vageuly related, but which I haven't actually read yet:

@cfallin
Copy link
Collaborator

cfallin commented Oct 16, 2024

The hardest part by far seems, to me right now at least, to be the generation of programs that terminate (as Nick mentions above "ideally don't deadlock").

I wonder if one could hit both goals -- actually use the primitives, and terminate -- with a grammar-based random testcase generator that builds out of primitives such that the testcase must terminate and must produce a deterministic result, by construction. Then test with the composition of two levels of randomness: (i) building test cases, (ii) running each test case many times, perhaps with artificial perturbation (token-passing in the style of deterministic execution proposals like DMP but with token order changed by some random seed). The oracle condition is: results match across perturbed executions.

The grammar could include primitives like: spawn into two subthreads, (recursively generate subprograms), join; lock/unlock a spinlock, carefully constructed so the spinlock is scoped to a level in the spawn hierarchy and locks/unlocks are well-paired; similar, with the atomic wakeup primitives, well-paired between subthreads; atomic-add on one single atomic, with the returned values summed into a checksum and not used otherwise. No other shared data; add other random computations as per normal fuzz testcase generation, perhaps with normal loads/stores to disjoint (per-subtask) addresses and to TLS. (Importantly, no memory accesses to memory "protected by a lock" -- not racy but would introduce nondeterminism as critical sections commute.)

This seems to possibly have an advantage of avoiding more complex oracles (building happens-before graphs and verifying dynamically or somesuch), with the usual downside of grammar-based fuzzing that one only gets the coverage one plans to generate (but maybe this gives enough coverage by exercising all the building blocks? empirical question for sure).

@fitzgen
Copy link
Contributor Author

fitzgen commented Oct 16, 2024

Thought: If the engine supports interruption and we are checking dynamic traces for validity, then we don’t need to generate terminating and non-deadlocking programs. We can just check as much of the trace as we got up to the interruption point.

@fitzgen
Copy link
Contributor Author

fitzgen commented Oct 16, 2024

Another source of possible inspiration would be Jensen: https://github.com/jepsen-io/jepsen

Jepsen is a Clojure library. A test is a Clojure program which uses the Jepsen library to set up a distributed system, run a bunch of operations against that system, and verify that the history of those operations makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants