Performance is constrained by single-core CPU speed when I/O does not depend on CPU #224
Comments
Thanks a lot for reporting, a very interesting case! I agree that more threads at merely 500 entries/s shouldn't be able to hog a whole CPU. Profiling would be needed, though, to see whether the time is spent in the kernel or truly in user space, but for now it's fine to assume it's indeed something in `dua` itself.

The next-generation traversal engine fixes the CPU problem, that much I could already validate, even though it's unclear when the underlying engine will be ready.
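Before reaching for a full profiler, one quick way to see whether CPU time goes to the kernel or to user space is to look at the process's accumulated user vs. system CPU time. A minimal sketch, assuming the `libc` crate on a Linux/POSIX target; it is purely illustrative and not part of `dua`:

```rust
// Minimal sketch: report the current process's user vs. system CPU time.
// Assumes the `libc` crate on a POSIX system; not part of dua itself.
fn cpu_times() -> (f64, f64) {
    // getrusage(RUSAGE_SELF) reports accumulated user and system CPU time.
    let mut ru: libc::rusage = unsafe { std::mem::zeroed() };
    let rc = unsafe { libc::getrusage(libc::RUSAGE_SELF, &mut ru) };
    assert_eq!(rc, 0, "getrusage failed");
    let secs = |tv: libc::timeval| tv.tv_sec as f64 + tv.tv_usec as f64 / 1_000_000.0;
    (secs(ru.ru_utime), secs(ru.ru_stime))
}

fn main() {
    // ... run the workload under investigation here ...
    let (user, system) = cpu_times();
    println!("user: {user:.2}s, system: {system:.2}s");
    // Mostly-system time points at kernel work (syscalls, network filesystem);
    // mostly-user time points at work inside the program itself.
}
```

A sampling profiler would of course break this down per function, but the user/system split already hints at whether the SMB layer or the program's own bookkeeping is burning the core.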
I tried to run `dua` with different numbers of threads, and the scanning speed grows roughly linearly with the thread count: with 4 threads it's ~twice as fast as with 2 threads; with 200 threads it's ~twice as fast as with 100 threads.

CPU consumption with 100 threads is around 110%, with 200 threads around 120%, indicating that the scanning threads are not CPU-bound; it's just something in the main thread that is.

So probably that's the same problem you already know about.

I should probably also mention that I observe this behavior in non-interactive mode.
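For what it's worth, the shape described here (scanner threads mostly waiting on I/O while a single consumer thread saturates one core) can be reproduced with a toy pipeline. This is only a sketch of that general pattern, not `dua`'s actual architecture, and the latency and workload numbers are made up:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

// Toy pipeline, not dua's code: many high-latency "scanner" threads feed a
// single consumer. Throughput grows with the number of scanners only until
// the consumer's per-entry CPU work saturates one core.
fn main() {
    let (tx, rx) = mpsc::channel::<u64>();
    let scanners: u64 = 100;

    for id in 0..scanners {
        let tx = tx.clone();
        thread::spawn(move || {
            for i in 0..200u64 {
                // Stand-in for a slow metadata lookup against a remote share.
                thread::sleep(Duration::from_millis(20));
                tx.send(id * 1_000 + i).ok();
            }
        });
    }
    drop(tx); // let the consumer finish once all scanners are done

    let start = Instant::now();
    let mut total = 0u64;
    for entry in rx {
        // Stand-in for per-entry bookkeeping done on the single consumer
        // thread; this is the serial section that caps overall throughput.
        let mut x = entry;
        for i in 0..200_000u64 {
            x = x.wrapping_mul(6364136223846793005).wrapping_add(i);
        }
        std::hint::black_box(x);
        total += 1;
    }
    println!("{} entries in {:.1}s", total, start.elapsed().as_secs_f64());
}
```

In such a setup, adding more scanner threads past the point where the consumer is saturated only grows the channel backlog and memory use, not throughput, which matches the symptoms above.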
For posterity, here is a profile run of a scan: [profile attachment not included here]. This is probably related to how […].

To me the next-gen traversal engine will be the solution, and even if it isn't at first, @pascalkuthe will make sure it will be :).
Yeah, the moonwalk traversal engine avoids almost all locking (it's lock-free), but it's very, very hard to get right. Eventually we will solve this :)
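As a toy illustration of what avoiding locks buys (this is not `moonwalk`'s design, just the general mutex-vs-atomic contrast): when many threads funnel every update through one mutex they serialize on it, while a lock-free atomic lets them proceed independently:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Toy contrast, not moonwalk's design: shared aggregation behind a mutex
// serializes the threads, while a lock-free atomic never blocks them.
fn main() {
    let locked = Arc::new(Mutex::new(0u64));
    let lockfree = Arc::new(AtomicU64::new(0));

    let mut handles = Vec::new();
    for _ in 0..8 {
        let locked = Arc::clone(&locked);
        let lockfree = Arc::clone(&lockfree);
        handles.push(thread::spawn(move || {
            for _ in 0..1_000_000u64 {
                // Every update briefly takes the shared lock, so under heavy
                // contention the threads end up proceeding one at a time here...
                *locked.lock().unwrap() += 1;
                // ...whereas the atomic increment never blocks another thread.
                lockfree.fetch_add(1, Ordering::Relaxed);
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    println!(
        "mutex total: {}, atomic total: {}",
        *locked.lock().unwrap(),
        lockfree.load(Ordering::Relaxed)
    );
}
```

A real lock-free traversal engine is of course far more involved than a single counter, which is exactly the "very hard to get right" part.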
I couldn't find any issue describing this scenario, so I'll just leave it here, so that at least it's known.
I'm trying to run `dua` against a remote share (a Hetzner storage box mounted over SMB on a non-Hetzner Linux-based system, to be precise). The file structure is: several million files, ~10 layers deep, with directories ranging from 1 to hundreds of subdirectories, and files always sitting in a leaf directory with 1-2 files (which means there are several million directories as well).

The problem is that I/O for this remote share has really high latency, so with `ncdu` scanning progresses at around 5-10 files per second, and with `dust` at maybe 15-30 files per second.
With `dua`, it seems to scale linearly with the number of threads... at first. So even at 100 threads I'm initially getting better results than at 50 threads, at least in the first seconds of the scan.

However, very quickly I become constrained by CPU for some reason, with `dua` consuming a full single CPU core (more specifically, `dua`'s CPU usage in `top` seems to hover around 110%, on a modern AMD CPU with eight physical cores). The performance stabilizes at ~1-1.5 million files per hour (or 300-500 files per second). That is incredibly better than `ncdu` or `dust`, but it still feels like `dua` is not supposed to be single-core-bound in this scenario, at least not at 500 files (or subdirectories) per second?

I understand that non-concurrent parts of the code can be a bottleneck in some scenarios, but maxing out a fast CPU at just 500 files/subdirectories per second does not seem right, so maybe there is something else going on.
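A back-of-envelope model shows why even a small amount of serial, single-threaded work per entry caps throughput no matter how many scanning threads there are. The latency and per-entry cost below are assumptions chosen to match the rough order of magnitude reported here, not measurements of `dua`:

```rust
// Back-of-envelope model: with N scanner threads hiding per-entry I/O latency,
// throughput grows roughly linearly in N until the serial (single-threaded)
// cost per entry becomes the bottleneck. All numbers are assumptions, not
// measurements of dua.
fn main() {
    let io_latency_s = 0.1; // assumed per-entry latency of the remote share
    let serial_cost_s = 0.002; // assumed single-threaded CPU cost per entry

    for threads in [1u32, 10, 50, 100, 200] {
        let io_bound_rate = threads as f64 / io_latency_s; // entries/s if only I/O mattered
        let serial_cap = 1.0 / serial_cost_s; // entries/s the serial section can sustain
        let effective = io_bound_rate.min(serial_cap);
        println!("{threads:>3} threads: ~{effective:.0} entries/s");
    }
}
```

Under these assumed numbers the curve flattens at ~500 entries/s somewhere around 50 threads, which is the kind of plateau described above.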
(That's on `dua` v2.26.0 from the Alpine package repository.)