Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URLSession._MultiHandle doesn't wait for DispatchSource cancel handler #4791

Closed
lxbndr opened this issue Jul 3, 2023 · 9 comments
Closed

Comments

@lxbndr
Copy link
Contributor

lxbndr commented Jul 3, 2023

The problem is pretty old and has been reported in Dispatch initially. After the investigation, it appears that it has to be fixed in Foundation. I am opening this issue for tracking and sharing the progress.

Background

URLSession utilizes cURL for socket management, including reading, writing, and basic protocol operations. cURL, in turn, requires external notification when there is pending data or an event on a socket. This is where DispatchSource comes into play. The implementation ties the lifecycle of a DispatchSource to the register/unregister requests from cURL.

The Issue

When cURL requests _MultiHandle to unregister a socket, the _SocketSources.tearDown function is implicitly called. This function invokes DispatchSource.cancel() and immediately returns control to cURL. Right after that cURL closes the socket. The Dispatch documentation says

It is invalid to close a file descriptor or deallocate a mach port that is currently being tracked by a dispatch source object before the cancellation handler is invoked.

Details

Although this violation is common for all platforms, it seems that Windows is the most severely impacted. It is crucial not to close the socket too early, as doing so could result in Windows reusing the freed socket handle for a new connection. This, in turn, may lead to Dispatch handling the same socket simultaneously as both dead and alive. Also, Dispatch proactively checks for invalid sockets in critical locations (like this) and crashes immediately if an invalid socket handle is encountered. Therefore, it is highly important to prevent the socket from being closed until Dispatch has finished its operations.

The simplest crashing scenario is HTTP timeout (originally reported here). You will get crash immediately as the timeout handler removes _EasyHandle from _MultiHandle. Trying to mute the crash (by ignoring WSAENOTSOCK error, as proposed here) reduces the crash rate, but leaves us open to much more cryptic crashes due to socket handle reuse.

Possible Solutions

Obviously, we have to wait for cancellation handlers, but it is a bit tricky to implement. URLSession is designed to process network events on a single queue. Cancellation handlers are also submitted on that queue. We can't just wait for a callback, because it would be an effective deadlock.

  • We introduced a separate queue for DispatchSource events/handlers and it worked flawlessly. However, we acknowledge that this approach can be seen as more of a hacky workaround and contradicts the overall design of URLSession and _MultiHandle.
  • A more natural solution appears to be making the socket closure asynchronous. cURL offers the option to provide a custom socket close function instead of using the standard one. Within this function, we can add the socket handle to a designated list instead of immediately closing it. When the DispatchSource close handler is triggered, it can search the list for any "close pending" sockets and then proceed with the actual closure.

Additional Case

There is a real scenario when cURL still closes the socket internally (ignoring the custom close function) and passes us an already closed socket. It's unclear whether this is a bug in cURL or a valid case that may require a client-side workaround. We probably should solve that separately.

Next Steps

  1. Convenient test cases are needed. I have some test code for all known crash scenarios, but it is not portable enough. Would be ideal to implement tests based on Foundation Tests internal HTTP server.
  2. Once we have tests, we could experiment with "postponed close" solution (and other solutions if we have any)
  3. Deal with CURL already-closed-socket issue (perhaps, extracting this scenario as separate issue)
@weissi
Copy link
Contributor

weissi commented Jul 31, 2023

@lxbndr I'd suggest you use https://github.com/swift-server/async-http-client instead of FoundationNetworking on Linux. On Linux, it has many issues and most users use AsyncHTTPClient which also works much more nicely with async/await (the async extensions for URLSession are Darwin-only right now).

@gregcotten
Copy link

I came back to see if any progress had been made in this area... You still cannot use URLSession on Windows without the threat of a crash, unfortunately...

@lxbndr
Copy link
Contributor Author

lxbndr commented Nov 21, 2023

@gregcotten unfortunately, yes :( I had some intermediate results, but not something worth to commit yet. But hey, this is not abandoned. There is real need in fixing this, in production, so I am planning to invest some effort into this soon.

@brianmichel
Copy link

I just wanted to lend my voice to this conversation as I was reminded about this issue as I was extending a third party library to support Windows. This issue is a pretty large problem when trying to approach Swift in a portable fashion. @weissi does async-http-client work on Windows?

@gregcotten
Copy link

I just wanted to lend my voice to this conversation as I was reminded about this issue as I was extending a third party library to support Windows. This issue is a pretty large problem when trying to approach Swift in a portable fashion. @weissi does async-http-client work on Windows?

Definitely not - swift-nio needs a Windows networking champion to make that happen.

@brianmichel
Copy link

Does anyone have test cases they can share on this issue so that we might make this a bit of a starting point for people looking to try to tackle this problem? @lxbndr would you be able to share any patches/code you've created while trying to work with this issue?

@lxbndr
Copy link
Contributor Author

lxbndr commented Nov 29, 2023

The basic crashing sample is here. Depending on the target host behavior you would get one of two kinds of crash.

  • If the timeout is purely http thing (could easily be simulated with any online service like httpstat.us through a parameter), that would be "soft" crash due to the SocketSource cancel rules violation. We're using a separate queue as a workaround: readdle@819eab9
  • In case of socket connection timeout, _MultiHandle will get invalid (closed) socket handle from cURL, and Dispatch would crash trying to work with that handle. We have no workaround for this other than suppressing crashes in Dispatch, unfortunately. Also, cURL version was bumped since then, so this have to be re-checked.

I believe I had a test sample for socket handle value reuse issue, but can't find it anywhere. Basically, I just repeated same (normal, not timeout) request in cycle. Every new request is started in URLSession's completion handler of previous request, and socket reuse appears almost immediately.

@lxbndr
Copy link
Contributor Author

lxbndr commented Nov 29, 2023

Also, I have to mention the fix for non-Windows platforms our Android team made: readdle@017ceb0
This looks better to me, as we don't have to play with queues. But AFAIK we can't apply this to Windows, as sockets there are different by nature.

lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Dec 19, 2023
This adds test cases for swiftlang#4791. Test HTTPServer needs to be reconfigured for one of the test scenarios, so there are options now.
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Dec 19, 2023
This adds test cases for swiftlang#4791. Test HTTPServer needs to be reconfigured for one of the test scenarios, so there are options now.
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Dec 19, 2023
This adds test cases for swiftlang#4791. Test HTTPServer needs to be reconfigured for one of the test scenarios, so there are options now.
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Dec 19, 2023
This adds test cases for swiftlang#4791. Test HTTPServer needs to be reconfigured for one of the test scenarios, so there are options now.
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Dec 19, 2023
This adds test cases for swiftlang#4791. Test HTTPServer needs to be reconfigured for one of the test scenarios, so there are options now.
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Dec 25, 2023
This adds test cases for swiftlang#4791. Test HTTPServer needs to be reconfigured for one of the test scenarios, so there are options now.
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Dec 26, 2023
Extends socket lifetime enough to let DispatchSource cancel properly.
Also prevents from creating new DispatchSources while other are in the
middle of cancelling.

Also includes tests (see swiftlang#4854 for test details).
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Dec 26, 2023
Extends socket lifetime enough to let DispatchSource cancel properly.
Also prevents from creating new DispatchSources while other are in the
middle of cancelling.

Also includes tests (see swiftlang#4854 for test details).
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Dec 27, 2023
Extends socket lifetime enough to let DispatchSource cancel properly.
Also prevents from creating new DispatchSources while other are in the
middle of cancelling.

Also includes tests (see swiftlang#4854 for test details).
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Dec 29, 2023
Extends socket lifetime enough to let DispatchSource cancel properly.
Also prevents from creating new DispatchSources while other are in the
middle of cancelling.

Also includes tests (see swiftlang#4854 for test details).
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Jan 8, 2024
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Jan 8, 2024
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Jan 16, 2024
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Jan 16, 2024
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Apr 23, 2024
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Apr 23, 2024
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Apr 23, 2024
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Apr 23, 2024
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Jun 9, 2024
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Jul 11, 2024
lxbndr added a commit to readdle/swift-corelibs-foundation that referenced this issue Jul 11, 2024
@lxbndr
Copy link
Contributor Author

lxbndr commented Oct 19, 2024

Closing this, as the subject is fixed now. There are still issues in Dispatch preventing the URLSession from being stable on Windows (and, perhaps, on Linux and Android). Should be investigated and tracked separately. We successfully mitigated the issue by replacing DispatchSource with simple custom socket handling on top of WinSDK.

@lxbndr lxbndr closed this as completed Oct 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants