Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-adding dispatch source for socket file descriptors results in reuse and incorrect update of dispatch_muxnote_t on Ubuntu, Android, and Windows #833

Open
hmelder opened this issue Jul 2, 2024 · 5 comments

Comments

@hmelder
Copy link

hmelder commented Jul 2, 2024

Creating a dispatch source for a socket file descriptor, removing it, and re-adding it shortly after results in a reuse of the underlying dispatch_muxnote_t which was created with the first dispatch source. This happens on Ubuntu 23.10 and Android API 24,34.

However, during registration of the new unote in _dispatch_unote_register_muxed https://github.com/apple/swift-corelibs-libdispatch/blob/542b7f32311680b11b6fc8fcb2576955460ba7da/src/event/event_epoll.c#L270-L272 _dispatch_epoll_update returns ENOENT and the existing muxnote is in turn not correctly updated.

Sometimes, this is because of https://github.com/apple/swift-corelibs-libdispatch/blob/542b7f32311680b11b6fc8fcb2576955460ba7da/src/event/event_epoll.c#L602-L611 EPOLLHUP.

I also noticed removal of muxnotes which still had an active reader or writer.

Here is a workaround that worked for us:

diff --git a/src/event/event_epoll.c b/src/event/event_epoll.c
index f31d13e..8654104 100644
--- a/src/event/event_epoll.c
+++ b/src/event/event_epoll.c
@@ -268,7 +268,10 @@ _dispatch_unote_register_muxed(dispatch_unote_t du)
 		if (events & ~_dispatch_muxnote_armed_events(dmn)) {
 			events |= _dispatch_muxnote_armed_events(dmn);
 			if (_dispatch_epoll_update(dmn, events, EPOLL_CTL_MOD) < 0) {
-				dmn = NULL;
+				// The file descriptor was closed, reregister in epoll
+				if (_dispatch_epoll_update(dmn, events, EPOLL_CTL_ADD) < 0) {
+					dmn = NULL;
+				}
 			} else {
 				dmn->dmn_events |= events;
 				dmn->dmn_disarmed_events &= ~events;
@@ -319,6 +322,8 @@ _dispatch_unote_unregister_muxed(dispatch_unote_t du)
 	dispatch_unote_linkage_t dul = _dispatch_unote_get_linkage(du);
 	dispatch_muxnote_t dmn = dul->du_muxnote;
 	uint32_t events = dmn->dmn_events;
+	int has_readers = 1;
+	int has_writers = 1;
 
 	LIST_REMOVE(dul, du_link);
 	_LIST_TRASH_ENTRY(dul, du_link);
@@ -326,6 +331,7 @@ _dispatch_unote_unregister_muxed(dispatch_unote_t du)
 
 	if (LIST_EMPTY(&dmn->dmn_readers_head)) {
 		events &= (uint32_t)~EPOLLIN;
+		has_readers = 0;
 		if (dmn->dmn_disarmed_events & EPOLLIN) {
 			dmn->dmn_disarmed_events &= (uint16_t)~EPOLLIN;
 			dmn->dmn_events &= (uint32_t)~EPOLLIN;
@@ -333,13 +339,14 @@ _dispatch_unote_unregister_muxed(dispatch_unote_t du)
 	}
 	if (LIST_EMPTY(&dmn->dmn_writers_head)) {
 		events &= (uint32_t)~EPOLLOUT;
+		has_writers = 0;
 		if (dmn->dmn_disarmed_events & EPOLLOUT) {
 			dmn->dmn_disarmed_events &= (uint16_t)~EPOLLOUT;
 			dmn->dmn_events &= (uint32_t)~EPOLLOUT;
 		}
 	}
-
-	if (events & (EPOLLIN | EPOLLOUT)) {
+	
+	if (events & (EPOLLIN | EPOLLOUT) || has_readers || has_writers) {
 		if (events != _dispatch_muxnote_armed_events(dmn)) {
 			dmn->dmn_events = events;
 			events = _dispatch_muxnote_armed_events(dmn);
Here is an example on Android API 34 with head at 542b7f3.
2024-07-02 14:33:26.479 30762-30762 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362555096] Resume Task!
2024-07-02 14:33:26.479 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] Timer Callback for Session <NSURLSession: 0x7bb7adf018>: multi=0x7c47b37360 timeout_ms=0
2024-07-02 14:33:26.480 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] File NSURLSession.m: 382. In [NSURLSession -_resumeTask:] Added task=<NSURLSessionDataTask: 0x7c47b3ad38> easy=0x7c07b48780 to multi=0x7c47b37360 with return value 0
2024-07-02 14:33:26.480 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] File NSURLSession.m: 280. In [NSURLSession -initWithConfiguration:delegate:delegateQueue:] Timer fired for session <NSURLSession: 0x7bb7adf018> with oldTimer=0 timer=0
2024-07-02 14:33:26.482 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] Socket Callback for Session <NSURLSession: 0x7bb7adf018>: socket=117 easy:0x7c07b48780 what=IN
2024-07-02 14:33:26.482 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] File NSURLSession.m: 459. In [NSURLSession -_addSocket:easyHandle:what:] Add Socket: 117 easy: 0x7c07b48780
2024-07-02 14:33:26.483 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] File NSURLSession.m: 498. In [NSURLSession -_setSocket:sources:what:] Creating a reading dispatch source: socket=117 sources=0x7be7baba30 what=1
2024-07-02 14:33:26.483 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] Timer Callback for Session <NSURLSession: 0x7bb7adf018>: multi=0x7c47b37360 timeout_ms=1
2024-07-02 14:33:26.483 30762-30870 GNUstep                 com.example.helloobjectivec          I  created new muxnote dmb=0x79fd9343b0 dmn=0x7bc7dd1710 for du=0x7a6bd1a830 events=0x40004001
2024-07-02 14:33:26.483 30762-30870 GNUstep                 com.example.helloobjectivec          I  adding reader to muxnote dmn=0x7bc7dd1710 for du=0x7a6bd1a830
2024-07-02 14:33:26.484 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] File NSURLSession.m: 280. In [NSURLSession -initWithConfiguration:delegate:delegateQueue:] Timer fired for session <NSURLSession: 0x7bb7adf018> with oldTimer=0 timer=1
2024-07-02 14:33:26.484 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] Timer Callback for Session <NSURLSession: 0x7bb7adf018>: multi=0x7c47b37360 timeout_ms=2
2024-07-02 14:33:26.489 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] File NSURLSession.m: 280. In [NSURLSession -initWithConfiguration:delegate:delegateQueue:] Timer fired for session <NSURLSession: 0x7bb7adf018> with oldTimer=1 timer=2
2024-07-02 14:33:26.489 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] Timer Callback for Session <NSURLSession: 0x7bb7adf018>: multi=0x7c47b37360 timeout_ms=4

2024-07-02 14:33:26.493 30762-30870 GNUstep                 com.example.helloobjectivec          I  dispatch_event_loop_drain: EVFILT_READ with dmn=0x7bc7dd1710 events=0x1
                                                                                                    Dispatch event merge fd dmn=0x7bc7dd1710 events=0x1 dwm
2024-07-02 14:33:26.493 30762-30870 GNUstep                 com.example.helloobjectivec          I  _disarmed_events=0x1
                                                                                                    dmn_events=0x40004001 dmn_disarmed_events=0x1 armed=0x40004000
                                                                                                    dispatch epoll update from merge fd dmn=0x7
2024-07-02 14:33:26.493 30762-30870 GNUstep                 com.example.helloobjectivec          I  bc7dd1710 events=0x40004000
2024-07-02 14:33:26.494 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] Socket Callback for Session <NSURLSession: 0x7bb7adf018>: socket=117 easy:0x7c07b48780 what=REMOVE
2024-07-02 14:33:26.494 30762-30870 GNUstep                 com.example.helloobjectivec          I  dispatch_event_loop_drain: EVFILT_READ with dmn=0x7bc7dd1710 events=0x10
                                                                                                    Dispatch event merge fd dmn=0x7bc7dd1710 events=0x10 d
2024-07-02 14:33:26.494 30762-30870 GNUstep                 com.example.helloobjectivec          I  wm_disarmed_events=0x1
                                                                                                    removing muxnote dmn=0x7bc7dd1710 from epoll for hangup

2024-07-02 14:33:26.494 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] File NSURLSession.m: 434. In [NSURLSession -_removeSocket:] Remove socket with SourceInfo: 0x7be7baba30
2024-07-02 14:33:26.495 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] Socket Callback for Session <NSURLSession: 0x7bb7adf018>: socket=117 easy:0x7c07b48780 what=OUT
2024-07-02 14:33:26.496 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] File NSURLSession.m: 459. In [NSURLSession -_addSocket:easyHandle:what:] Add Socket: 117 easy: 0x7c07b48780
2024-07-02 14:33:26.496 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] File NSURLSession.m: 540. In [NSURLSession -_setSocket:sources:what:] Creating a writing dispatch source: socket=117 sources=0x7be7ba9f30 what=2

2024-07-02 14:33:26.496 30762-30870 GNUstep                 com.example.helloobjectivec          I  found existing muxnote dmb=0x79fd9343b0 dmn=0x7bc7dd1710 for du=0x7a6bd1a830 events=0x40004004
                                                                                                    dmn_events=0x40004001 dmn_disarm
2024-07-02 14:33:26.496 30762-30870 GNUstep                 com.example.helloobjectivec          I  ed_events=0x1 armed=0x40004000
                                                                                                    dmn_events=0x40004001 dmn_disarmed_events=0x1 armed=0x40004000
                                                                                                    updating existing muxnote dmb=0x7
2024-07-02 14:33:26.496 30762-30870 GNUstep                 com.example.helloobjectivec          I  9fd9343b0 dmn=0x7bc7dd1710 for du=0x7a6bd1a830 events=0x40004004
                                                                                                    failed to update epoll dmn: No such file or directory
2024-07-02 14:33:26.496 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] Timer Callback for Session <NSURLSession: 0x7bb7adf018>: multi=0x7c47b37360 timeout_ms=2
2024-07-02 14:33:26.497 30762-30870 GNUstep                 com.example.helloobjectivec          I  unregistering muxnote dmn=0x7bc7dd1710 for du=0x7a6bd1a840 events=0x40004001
                                                                                                    no more readers for muxnote dmn=0x7bc7dd1710 for d
2024-07-02 14:33:26.497 30762-30870 GNUstep                 com.example.helloobjectivec          I  u=0x7a6bd1a840 events=0x40004000
                                                                                                    disarmed EPOLLIN for muxnote dmn=0x7bc7dd1710 for du=0x7a6bd1a840 events=0x40004000
                                                                                                    no more wr
2024-07-02 14:33:26.497 30762-30870 GNUstep                 com.example.helloobjectivec          I  iters for muxnote dmn=0x7bc7dd1710 for du=0x7a6bd1a840 events=0x40004000
                                                                                                    has_readers=0 has_writers=0
                                                                                                    removing muxnote dmn=0x7bc
2024-07-02 14:33:26.497 30762-30870 GNUstep                 com.example.helloobjectivec          I  7dd1710 for du=0x7a6bd1a840
                                                                                                    removed epoll dmn=0x7bc7dd1710 for du=0x7a6bd1a840

2024-07-02 14:33:26.499 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] File NSURLSession.m: 280. In [NSURLSession -initWithConfiguration:delegate:delegateQueue:] Timer fired for session <NSURLSession: 0x7bb7adf018> with oldTimer=4 timer=2
2024-07-02 14:33:26.499 30762-30883 com.exampl...objectivec com.example.helloobjectivec          E  [thread:531362589848] Timer Callback for Session <NSURLSession: 0x7bb7adf018>: multi=0x7c47b37360 timeout_ms=29990

The patch with logging

diff --git a/src/event/event_epoll.c b/src/event/event_epoll.c
index f31d13e..b65c1ec 100644
--- a/src/event/event_epoll.c
+++ b/src/event/event_epoll.c
@@ -265,19 +265,29 @@ _dispatch_unote_register_muxed(dispatch_unote_t du)
 	dmb = _dispatch_unote_muxnote_bucket(du);
 	dmn = _dispatch_unote_muxnote_find(dmb, du);
 	if (dmn) {
+		printf("found existing muxnote dmb=%p dmn=%p for du=%p events=0x%x\n", dmb, dmn, &du, events);
 		if (events & ~_dispatch_muxnote_armed_events(dmn)) {
 			events |= _dispatch_muxnote_armed_events(dmn);
+			printf("updating existing muxnote dmb=%p dmn=%p for du=%p events=0x%x\n", dmb, dmn, &du, events);
 			if (_dispatch_epoll_update(dmn, events, EPOLL_CTL_MOD) < 0) {
-				dmn = NULL;
+				// The file descriptor was closed, reregister in epoll
+				if (_dispatch_epoll_update(dmn, events, EPOLL_CTL_ADD) < 0) {
+					//printf("Failed to add epoll after not found dmn=%p for du=%p\n", dmn, &du);
+					dmn = NULL;
+				}
+				printf("Readded epoll dmn=%p for du=%p\n", dmn, &du);
 			} else {
 				dmn->dmn_events |= events;
 				dmn->dmn_disarmed_events &= ~events;
+				printf("Set dmn_events=0x%x dmn_disarmed_events=0x%x\n", dmn->dmn_events, dmn->dmn_disarmed_events);
 			}
 		}
 	} else {
 		dmn = _dispatch_muxnote_create(du, events);
+		printf("created new muxnote dmb=%p dmn=%p for du=%p events=0x%x\n", dmb, dmn, &du, events);
 		if (dmn) {
 			if (_dispatch_epoll_update(dmn, events, EPOLL_CTL_ADD) < 0) {
+				printf("failed to add epoll dmn=%p for du=%p\n", dmn, &du);
 				_dispatch_muxnote_dispose(dmn);
 				dmn = NULL;
 			} else {
@@ -289,8 +299,10 @@ _dispatch_unote_register_muxed(dispatch_unote_t du)
 	if (dmn) {
 		dispatch_unote_linkage_t dul = _dispatch_unote_get_linkage(du);
 		if (events & EPOLLOUT) {
+			printf("adding writer to muxnote dmn=%p for du=%p\n", dmn, &du);
 			LIST_INSERT_HEAD(&dmn->dmn_writers_head, dul, du_link);
 		} else {
+			printf("adding reader to muxnote dmn=%p for du=%p\n", dmn, &du);
 			LIST_INSERT_HEAD(&dmn->dmn_readers_head, dul, du_link);
 		}
 		dul->du_muxnote = dmn;
@@ -306,6 +318,8 @@ _dispatch_unote_resume_muxed(dispatch_unote_t du)
 	dispatch_assert(_dispatch_unote_registered(du));
 	uint32_t events = _dispatch_unote_required_events(du);
 
+	printf("resuming muxnote dmn=%p for du=%p events=0x%x\n", dmn, &du, events);
+
 	if (events & dmn->dmn_disarmed_events) {
 		dmn->dmn_disarmed_events &= ~events;
 		events = _dispatch_muxnote_armed_events(dmn);
@@ -319,6 +333,10 @@ _dispatch_unote_unregister_muxed(dispatch_unote_t du)
 	dispatch_unote_linkage_t dul = _dispatch_unote_get_linkage(du);
 	dispatch_muxnote_t dmn = dul->du_muxnote;
 	uint32_t events = dmn->dmn_events;
+	int has_readers = 1;
+	int has_writers = 1;
+
+	printf("unregistering muxnote dmn=%p for du=%p events=0x%x\n", dmn, &du, events);
 
 	LIST_REMOVE(dul, du_link);
 	_LIST_TRASH_ENTRY(dul, du_link);
@@ -326,27 +344,36 @@ _dispatch_unote_unregister_muxed(dispatch_unote_t du)
 
 	if (LIST_EMPTY(&dmn->dmn_readers_head)) {
 		events &= (uint32_t)~EPOLLIN;
+		has_readers = 0;
+		printf("no more readers for muxnote dmn=%p for du=%p events=0x%x\n", dmn, &du, events);
 		if (dmn->dmn_disarmed_events & EPOLLIN) {
 			dmn->dmn_disarmed_events &= (uint16_t)~EPOLLIN;
 			dmn->dmn_events &= (uint32_t)~EPOLLIN;
+			printf("disarmed EPOLLIN for muxnote dmn=%p for du=%p events=0x%x\n", dmn, &du, events);
 		}
 	}
 	if (LIST_EMPTY(&dmn->dmn_writers_head)) {
 		events &= (uint32_t)~EPOLLOUT;
+		has_writers = 0;
+		printf("no more writers for muxnote dmn=%p for du=%p events=0x%x\n", dmn, &du, events);
 		if (dmn->dmn_disarmed_events & EPOLLOUT) {
 			dmn->dmn_disarmed_events &= (uint16_t)~EPOLLOUT;
 			dmn->dmn_events &= (uint32_t)~EPOLLOUT;
+			printf("disarmed EPOLLOUT for muxnote dmn=%p for du=%p events=0x%x\n", dmn, &du, events);
 		}
 	}
-
-	if (events & (EPOLLIN | EPOLLOUT)) {
+	
+	if (events & (EPOLLIN | EPOLLOUT) || has_readers || has_writers) {
 		if (events != _dispatch_muxnote_armed_events(dmn)) {
+			printf("updating muxnote dmn=%p for du=%p events=0x%x\n", dmn, &du, events);
 			dmn->dmn_events = events;
 			events = _dispatch_muxnote_armed_events(dmn);
 			_dispatch_epoll_update(dmn, events, EPOLL_CTL_MOD);
 		}
 	} else {
+		printf("removing muxnote dmn=%p for du=%p\n", dmn, &du);
 		epoll_ctl(_dispatch_epfd, EPOLL_CTL_DEL, dmn->dmn_fd, NULL);
+		printf("removed epoll dmn=%p for du=%p\n", dmn, &du);
 		LIST_REMOVE(dmn, dmn_list);
 		_dispatch_muxnote_dispose(dmn);
 	}
@@ -422,6 +449,7 @@ _dispatch_timeout_program(uint32_t tidx, uint64_t target,
 			return;
 		}
 	} else {
+		printf("removing timer epoll registration %d\n", tidx);
 		op = EPOLL_CTL_DEL;
 	}
 	dispatch_assume_zero(epoll_ctl(_dispatch_epfd, op, timer->det_fd, &ev));
@@ -572,6 +600,8 @@ _dispatch_event_merge_fd(dispatch_muxnote_t dmn, uint32_t events)
 
 	dmn->dmn_disarmed_events |= (events & (EPOLLIN | EPOLLOUT));
 
+	printf("Dispatch event merge fd dmn=%p events=0x%x dwm_disarmed_events=0x%x\n", dmn, events, dmn->dmn_disarmed_events);
+
 	if (events & EPOLLIN) {
 		data = _dispatch_get_buffer_size(dmn, false);
 		LIST_FOREACH_SAFE(dul, &dmn->dmn_readers_head, du_link, dul_next) {
@@ -608,11 +638,13 @@ _dispatch_event_merge_fd(dispatch_muxnote_t dmn, uint32_t events)
 			dispatch_unote_t du = _dispatch_unote_linkage_get_unote(dul);
 			_dispatch_event_merge_hangup(du);
 		}
+		printf("removing muxnote dmn=%p from epoll for hangup\n", dmn);
 		epoll_ctl(_dispatch_epfd, EPOLL_CTL_DEL, dmn->dmn_fd, NULL);
 		return;
 	}
 
 	events = _dispatch_muxnote_armed_events(dmn);
+	printf("dispatch epoll update from merge fd dmn=%p events=0x%x\n", dmn, events);
 	if (events) _dispatch_epoll_update(dmn, events, EPOLL_CTL_MOD);
 }
 
@@ -674,6 +706,7 @@ retry:
 				break;
 
 			case EVFILT_READ:
+				printf("dispatch_event_loop_drain: EVFILT_READ with dmn=%p events=0x%x\n", dmn, ev[i].events);
 				_dispatch_event_merge_fd(dmn, ev[i].events);
 				break;
 			}
@lxbndr
Copy link

lxbndr commented Jul 7, 2024

I am guessing that Windows implementation follows epoll logic, at least in some parts. It also suffers from similar dispatch_muxnote_t issue. It is easily reproduced from Foundation by making POST URL request with body larger than 64k (which outstands CURL buffer and makes it to request adding and removing read/write dispatch sources exactly as needed to step on this issue). Our Android team reported that 64k POST is also a working breaker for them.

@hmelder
Copy link
Author

hmelder commented Jul 7, 2024

It also suffers from similar dispatch_muxnote_t issue

Interesting. We have not encountered the issues on Windows, but I'll investigate that too.

Our Android team reported that 64k POST is also a working breaker for them.

Do you have a hotfix for this that's different from mine?

@lxbndr
Copy link

lxbndr commented Jul 7, 2024

We workarounded it in Foundation by duplicating sockets with dup. This sort of works, because a "new" socket makes a separate muxntote, no collision.
Here is the commit. I am wondering if it works for you as well. But, anyway, that is a pure hack, unfortunately. Not to mention that it is not applicable to Windows. It have to be fixed in Dispatch. Your patch seems to be looking in the right direction.

@hmelder
Copy link
Author

hmelder commented Jul 7, 2024

This sort of works, because a "new" socket makes a separate muxntote, no collision.

Huh forgot that this is actually guaranteed by the API on Linux :)

The new file descriptor number is guaranteed to be the lowest-numbered
file descriptor that was unused in the calling process.

@andriydruk
Copy link

Hi, we created this fix with dub in Foundation to fix the 64K problem but I'm not very happy with it. Mostly, because it leads to over usage of file descriptors in runtime.

Android limits FD count to 1024, which may sound reasonable for most applications. But in our application users can add a lot of accounts that will use separate connections and we see that some power users reach this limit from time to time :(

That's why we would be glad to see a fix for this issue in libdispatch 👍

@hmelder hmelder changed the title Re-adding dispatch source for socket file descriptors results in reuse and incorrect update of dispatch_muxnote_t on Ubuntu and Android Re-adding dispatch source for socket file descriptors results in reuse and incorrect update of dispatch_muxnote_t on Ubuntu, Android, and Windows Jul 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants