Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unintuitive behavior of -l due to rounding #2460

Open
neldredge opened this issue Jun 9, 2024 · 0 comments
Open

Unintuitive behavior of -l due to rounding #2460

neldredge opened this issue Jun 9, 2024 · 0 comments

Comments

@neldredge
Copy link

The help text for the -l option says "do not start new jobs if the load average is greater than N". However, as currently implemented, that is not how it behaves.

The relevant code is:

ninja/src/build.cc

Lines 629 to 631 in 0b4b43a

int load_capacity = config_.max_load_average - GetLoadAverage();
if (load_capacity < capacity)
capacity = load_capacity;

Suppose we are running with -l5, so that config_.max_load_average == 5.0, and suppose GetLoadAverage() returns 4.1. Then the help text implies that new jobs should be able to start. However, this gives config_.max_load_average - GetLoadAverage() == 0.9, and assigning to int truncates the fractional part, so we end up with load_capacity == 0 and this stops any new jobs from starting until the load average falls to 4.0 or lower.

I think the code should instead be

int load_capacity = ceil(config_.max_load_average - GetLoadAverage()); 

so that we round up.

The code as it stands has other odd effects as well. I have been running ninja with -j4 -l5 on a 4-core system which is otherwise lightly loaded, so I expect that ninja will run 4 jobs at a time, and the load average will stabilize around 4.1. What happens instead is that ninja initially starts 4 jobs, but after a while it runs only 3 jobs at a time, and the load average stabilizes around 3.1, so most of a core goes unused.

Specifically, ninja initially runs 4 jobs at a time, until the load average reaches or slightly exceeds 4.0 (let's say 4.01). Then the next time a job finishes, we have load_capacity == 0, so a new job is not started and now only 3 jobs are running. The load average now falls down to say 3.99, and when the next job finishes, load_capacity == 1. This allows one new job to replace the one that just finished, and so we go on with just 3 jobs. We will continue to have load_capacity == 1 as long as the load average is greater than 3.0, which should remain true since we are running 3 jobs.

Switching to ceil() should fix that as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant