You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This issue is partly about implementing it, but more importantly to have the discussion about whether we want this. There are tradeoffs that I'll try to explain in this issue.
Fairness and Determinism
This crate strives to be fair in polling futures when there is a choice between which future or stream to poll next. The exact details of the guarantee that we want aren't specified yet (#45), but roughly we want it so that no future is unfairly more likely to complete than another. So if we always polled 0..n, that would be unfair because it would be biased towards futures that are earlier in the list.
The way we guarantee fairness currently is to pick a random index and start polling from there. This keeps any particular entry in the list from getting an unfair advantage. This version does not give us determinism though.
From a discussion with @conradludgate, it seems like it might be enough to just remember the last index we polled and then go from there. This version would give us a deterministic scheduler, but in the long run we don't give an advantage to any index (modulo some special cases I'll get into later).
Why we want determinism?
Concurrency bugs can be extremely difficult to debug and create test cases for. While there will likely always be nondeterminism coming from external sources (e.g. network requests can take a variable amount of time to complete), we can make it so the scheduler does not introduce any additional nondeterminism. This also makes it possible to carefully construct futures for test cases that can exhibit a particular tricky interleaving that may expose bugs. For debugging, when trying to figure out why something went wrong it can be helpful to replay the exact same sequence of events.
Determinism is also potentially faster, since we can save a few instructions from calculating a random number each time around. I doubt this will matter much in practice though.
Why do we not want determinism?
As kind of a dual to the testing/debugging argument I made in the last section, nondeterminism can also help with development and testing. It might be that there are weird bugs that only happen in weird interleavings of futures that would be unlikely to come up during development but could happen in production. In that case, introducing nondeterminism can help uncover these. It's basically fuzz testing for the scheduler.
Also, determinism is somewhat in conflict with fairness. If we did something like [ready(1), ready(2), ready(3)].race() then by our intuitive definition of fairness we should expect to get either 1, 2, or 3 out. But any deterministic scheduler will always return the same answer. So a deterministic scheduler can only be fair in the long run, but not for short-lived processes.
What are some other options?
If we decide to have a deterministic scheduler, it might be worth adding a fuzz testing mode that randomizes the scheduling order, which would give us one of the main benefits of a nondeterministic scheduler.
If we decide not to have determinism, we could add an option to fix the random seed which would make things deterministic when needed, such as when writing a regression test for a nasty bug.
We could add a feature flag and let library users decide which mode they want, although in my opinion it'd be better to not have the complexity of maintaining two versions.
What should we do?
I'm partial to a deterministic scheduler (which I why I opened the issue). For real-world futures, things will be inherently nondeterministic, but I think it makes sense to be predictable where we can. I'm curious to hear other people's opinions though, since I think there are good arguments either way.
The text was updated successfully, but these errors were encountered:
I think you're correctly pointing out that fairness and non-determinism are not the same, and don't need to be implemented through the same mechanism. While we may want fairness, non-determinism comes at a cost.
Also, determinism is somewhat in conflict with fairness. If we did something like [ready(1), ready(2), ready(3)].race() then by our intuitive definition of fairness we should expect to get either 1, 2, or 3 out. But any deterministic scheduler will always return the same answer. So a deterministic scheduler can only be fair in the long run, but not for short-lived processes.
This to me is quite interesting! - To me the example you used is pretty much the "perfect" counter-example, and most cases wouldn't quite be like this. In practice it seems rare to expect someone to explicitly race two synchronous operations. In the case someone races 1 async + 1 sync operation, we already deterministically know that the synchronous one will complete before the async one. So the poll ordering practically doesn't matter.
The main example I'm thinking of is: we want to ensure that if we have a set of iterators, one iterator isn't starved before the other. We don't want Merge to accidentally provide Chain semantics when operating on non-async sequences. But as you say, we can trivially guarantee that by simply doing a round-robin iteration. [repeat(1), repeat(2), repeat(3)].merge() would then always yield 1, 2, 3, 1, 2, 3, 1, 2, 3. Which is fair and also deterministic.
Conclusion
I'm on board with this change ✅. Probably for the implementation it'd be nice if we could create a shared Indexer struct or something which replaces the current RandomGenerator and just cycles through the starting index. Basically the same semantics as (1..N).cycle() - but as a separate struct.
This issue is partly about implementing it, but more importantly to have the discussion about whether we want this. There are tradeoffs that I'll try to explain in this issue.
Fairness and Determinism
This crate strives to be fair in polling futures when there is a choice between which future or stream to poll next. The exact details of the guarantee that we want aren't specified yet (#45), but roughly we want it so that no future is unfairly more likely to complete than another. So if we always polled 0..n, that would be unfair because it would be biased towards futures that are earlier in the list.
The way we guarantee fairness currently is to pick a random index and start polling from there. This keeps any particular entry in the list from getting an unfair advantage. This version does not give us determinism though.
From a discussion with @conradludgate, it seems like it might be enough to just remember the last index we polled and then go from there. This version would give us a deterministic scheduler, but in the long run we don't give an advantage to any index (modulo some special cases I'll get into later).
Why we want determinism?
Concurrency bugs can be extremely difficult to debug and create test cases for. While there will likely always be nondeterminism coming from external sources (e.g. network requests can take a variable amount of time to complete), we can make it so the scheduler does not introduce any additional nondeterminism. This also makes it possible to carefully construct futures for test cases that can exhibit a particular tricky interleaving that may expose bugs. For debugging, when trying to figure out why something went wrong it can be helpful to replay the exact same sequence of events.
Determinism is also potentially faster, since we can save a few instructions from calculating a random number each time around. I doubt this will matter much in practice though.
Why do we not want determinism?
As kind of a dual to the testing/debugging argument I made in the last section, nondeterminism can also help with development and testing. It might be that there are weird bugs that only happen in weird interleavings of futures that would be unlikely to come up during development but could happen in production. In that case, introducing nondeterminism can help uncover these. It's basically fuzz testing for the scheduler.
Also, determinism is somewhat in conflict with fairness. If we did something like
[ready(1), ready(2), ready(3)].race()
then by our intuitive definition of fairness we should expect to get either 1, 2, or 3 out. But any deterministic scheduler will always return the same answer. So a deterministic scheduler can only be fair in the long run, but not for short-lived processes.What are some other options?
If we decide to have a deterministic scheduler, it might be worth adding a fuzz testing mode that randomizes the scheduling order, which would give us one of the main benefits of a nondeterministic scheduler.
If we decide not to have determinism, we could add an option to fix the random seed which would make things deterministic when needed, such as when writing a regression test for a nasty bug.
We could add a feature flag and let library users decide which mode they want, although in my opinion it'd be better to not have the complexity of maintaining two versions.
What should we do?
I'm partial to a deterministic scheduler (which I why I opened the issue). For real-world futures, things will be inherently nondeterministic, but I think it makes sense to be predictable where we can. I'm curious to hear other people's opinions though, since I think there are good arguments either way.
The text was updated successfully, but these errors were encountered: