Unintuitive behavior of -l due to rounding #2460

neldredge · 2024-06-09T05:02:13Z

The help text for the -l option says "do not start new jobs if the load average is greater than N". However, as currently implemented, that is not how it behaves.

The relevant code is:

ninja/src/build.cc

Lines 629 to 631 in 0b4b43a

    
           int load_capacity = config_.max_load_average - GetLoadAverage(); 
        
           if (load_capacity < capacity) 
        
             capacity = load_capacity;

Suppose we are running with -l5, so that config_.max_load_average == 5.0, and suppose GetLoadAverage() returns 4.1. Then the help text implies that new jobs should be able to start. However, this gives config_.max_load_average - GetLoadAverage() == 0.9, and assigning to int truncates the fractional part, so we end up with load_capacity == 0 and this stops any new jobs from starting until the load average falls to 4.0 or lower.

I think the code should instead be

int load_capacity = ceil(config_.max_load_average - GetLoadAverage());

so that we round up.

The code as it stands has other odd effects as well. I have been running ninja with -j4 -l5 on a 4-core system which is otherwise lightly loaded, so I expect that ninja will run 4 jobs at a time, and the load average will stabilize around 4.1. What happens instead is that ninja initially starts 4 jobs, but after a while it runs only 3 jobs at a time, and the load average stabilizes around 3.1, so most of a core goes unused.

Specifically, ninja initially runs 4 jobs at a time, until the load average reaches or slightly exceeds 4.0 (let's say 4.01). Then the next time a job finishes, we have load_capacity == 0, so a new job is not started and now only 3 jobs are running. The load average now falls down to say 3.99, and when the next job finishes, load_capacity == 1. This allows one new job to replace the one that just finished, and so we go on with just 3 jobs. We will continue to have load_capacity == 1 as long as the load average is greater than 3.0, which should remain true since we are running 3 jobs.

Switching to ceil() should fix that as well.

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unintuitive behavior of -l due to rounding #2460

Unintuitive behavior of -l due to rounding #2460

neldredge commented Jun 9, 2024

Unintuitive behavior of -l due to rounding #2460

Unintuitive behavior of -l due to rounding #2460

Comments

neldredge commented Jun 9, 2024