Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent ninja crash when my build folder has a specific path #2309

Closed
jcelerier opened this issue Jul 28, 2023 · 8 comments · Fixed by #2489
Closed

Consistent ninja crash when my build folder has a specific path #2309

jcelerier opened this issue Jul 28, 2023 · 8 comments · Fixed by #2489

Comments

@jcelerier
Copy link

Hi,

  • ninja 1.11.1
  • generated by cmake 3.27.1
  • on an up-to-date arch linux x86_64

I'm having a bewildering issue.. my build suddenly started failing, and only when i'm doing it in a specific path on my machine (I don't have any specific mounts or something funky like that... basic linux on an XFS partition).

it's an issue I sometimes see sporadically but just rm'ing the build folder is enough.. this time it's a persistent repro though.

$ pwd                          
/home/jcelerier/score-master/3rdparty/libossia/3rdparty/build-libremidi-System_Qt6-Debug

$ cmake ~/score-master/3rdparty/libossia/3rdparty/libremidi -GNinja -DCMAKE_BUILD_TYPE=Debug   
-- The CXX compiler identification is GNU 13.1.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- libremidi: Using std::vector for libremidi::message
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found ALSA: /usr/lib/libasound.so (found version "1.2.9") 
-- libremidi: using ALSA
-- libremidi: using linked JACK
-- Looking for jack_port_rename
-- Looking for jack_port_rename - not found
-- Configuring done (0.4s)
-- Generating done (0.0s)
CMake Error:
  Running

   '/usr/bin/ninja' '-C' '/home/jcelerier/score-master/3rdparty/libossia/3rdparty/build-libremidi-System_Qt6-Debug' '-t' 'recompact'

  failed with:

   /usr/include/c++/12.2.1/bits/stl_vector.h:1123: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = Node*; _Alloc = std::allocator<Node*>; reference = Node*&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.

  Subprocess aborted

If I build in any other path (even with the same final folder name) I don't see the issue. I thought there'd be some off-by-one name buffer issue or something like this somewhere which happens to trigger in just this exact case, but if I change even one character anywhere in the path, it seems to work so it does not look like a buffer overflow.

$ pwd
/home/jcelerier/score-master/3rdparty/libossia/3rdparty/build-libremidi-System_Qt6-DebuX
$ cmake ~/score-master/3rdparty/libossia/3rdparty/libremidi -GNinja -DCMAKE_BUILD_TYPE=Debug   
-- The CXX compiler identification is GNU 13.1.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- libremidi: Using std::vector for libremidi::message
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found ALSA: /usr/lib/libasound.so (found version "1.2.9") 
-- libremidi: using ALSA
-- libremidi: using linked JACK
-- Looking for jack_port_rename
-- Looking for jack_port_rename - not found
-- Configuring done (0.4s)
-- Generating done (0.0s)
-- Build files have been written to: /home/jcelerier/score-master/3rdparty/libossia/3rdparty/build-libremidi-System_Qt6-DebuX

Trace:

(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff7a9f2d3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007ffff7a4fa08 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7a38538 in __GI_abort () at abort.c:79
#4  0x00007ffff7cdd3b2 in std::__glibcxx_assert_fail (file=file@entry=0x55555558ad00 "/usr/include/c++/12.2.1/bits/stl_vector.h", line=line@entry=1123, 
    function=function@entry=0x55555558b330 "std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = Node*; _Alloc = std::allocator<Node*>; reference = Node*&; size_type = long unsigned int]", condition=condition@entry=0x55555558a1b8 "__n < this->size()") at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/debug.cc:61
#5  0x000055555558519f in std::vector<Node*, std::allocator<Node*> >::operator[] (this=<optimized out>, __n=<optimized out>) at /usr/include/c++/12.2.1/bits/stl_vector.h:1123
#6  std::vector<Node*, std::allocator<Node*> >::operator[] (this=0x7fffffffdd30, __n=<optimized out>) at /usr/include/c++/12.2.1/bits/stl_vector.h:1121
#7  DepsLog::Load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, State*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) [clone .constprop.0] (this=0x7fffffffdd00, path=".ninja_deps", state=0x7fffffffdb68, err=0x7fffffffd850) at src/deps_log.cc:219
#8  0x000055555555ea1b in (anonymous namespace)::NinjaMain::OpenDepsLog (this=this@entry=0x7fffffffdb50, recompact_only=recompact_only@entry=false) at src/ninja.cc:1278
#9  0x0000555555561f11 in (anonymous namespace)::real_main (argc=<optimized out>, argv=<optimized out>) at src/ninja.cc:1572
#10 0x000055555555a59f in main (argc=<optimized out>, argv=<optimized out>) at src/ninja.cc:1620

I attach a core dump, my ninja binary and the source & build folder with their absolute paths:

repro.tar.gz

@thesamesam
Copy link

Note that this needs -D_GLIBCXX_ASSERTIONS to trigger. It might happen with the libcxx equivalent too.

@digit-google
Copy link
Contributor

Thank you for your archive. I wrote a small Python script to dump the content of your .ninja_deps file which shows that it is malformed! In particular, it contains multiple input path id values like 1441 that are out of bounds with respect of the max ids allocated in the file.

This means two things:

  1. Somehow Ninja managed to generate a malformed .ninja_deps file entries, which is unexplained and very concerning :-/

  2. Ninja doesn't protect against malicious / malformed input correctly. In particular, the following code in src/deps_log.cc should catch the bad value in the first assert here, but I assume you didn't build your Ninja with assertions enabled. Moreover, an assert() is not a valid protection against malformed input like this one, and should be turned into real compile-time checks + error handling:

      for (int i = 0; i < deps_count; ++i) {
        assert(deps_data[i] < (int)nodes_.size());
        assert(nodes_[deps_data[i]]);
        deps->nodes[i] = nodes_[deps_data[i]];
      }

It should be easy to fix the deps log loader against such input. On the other hand I do not understand how something like that could be produced. Maybe it's a bad recompaction, may
be it's something else. And I'd love to better understand the issue because it's just a little scary.

read_ninja_deps.zip

digit-google added a commit to digit-google/ninja that referenced this issue Sep 2, 2024
The parser used asserts to verify the consistency of the
deps log, which resulted in crashes in production. Replace
them with correct checks and return conditions.

This allows the parser to survive and report corrupted
files, and Ninja to ignore the latter.

+ Add related unit-test.

Fixes ninja-build#2472
Fixes ninja-build#2309
digit-google added a commit to digit-google/ninja that referenced this issue Sep 3, 2024
The parser used asserts to verify the consistency of the
deps log, which resulted in crashes in production. Replace
them with correct checks and return conditions.

This allows the parser to survive and report corrupted
files, and Ninja to ignore the latter.

+ Add related unit-test.

Fixes ninja-build#2472
Fixes ninja-build#2309
digit-google added a commit to digit-google/ninja that referenced this issue Sep 4, 2024
The parser used asserts to verify the consistency of the
deps log, which resulted in crashes in production. Replace
them with correct checks and return conditions.

This allows the parser to survive and report corrupted
files, and Ninja to ignore the latter.

+ Add related unit-test.

Fixes ninja-build#2472
Fixes ninja-build#2309
@llvee
Copy link

llvee commented Sep 10, 2024

Not sure if your team saw the updates, looks like this was similar to two other issues & that this is due to corrupted files in a specific path based on the commits below. Maybe there are no commit notifications? I noticed the other issues are still open also despite seeming to have resolutions.

@digit-google
Copy link
Contributor

I have uploaded a PR that should solve the crashes (see #2489), still awaiting approval from @jhasse at this point.

@llvee
Copy link

llvee commented Sep 10, 2024

Nice, any write ups around best issues that need solving? I am interested in job opportunities at Google.

@digit-google
Copy link
Contributor

I fear this is not the right place to ask for that. While some people at Google do work on Ninja from time to time, it is a now an open-source project maintained entirely externally on Github by many developers, from different companies and environments. For example the official maintainer (@jhasse) is not a Google employee.

If you are interested in helping with Ninja, everybody will be happy for it, and you can take a look at the list of current issues in this Github repository. But be warned that this will not help you reach your stated goal (though a quick lookup with any search engine will give you the official page where you can find available positions).

@llvee
Copy link

llvee commented Sep 10, 2024

@digit-google I understand where you are coming from. I am not expecting anything from helping. I do know that it will be good for employees to verify that I can solve the technical issues in the day to day work via issues.

I do also think it may help me connect with other engineers who find compiler engineering fun, interesting like yourself.

@jhasse jhasse closed this as completed in 197dbdb Oct 4, 2024
@jhasse jhasse added this to the 1.13.0 milestone Oct 4, 2024
@llvee
Copy link

llvee commented Oct 14, 2024

@digit-google
Do you happen to have any recommendations when it comes to support channels specifically for people interested in contributing to FOSS Google?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants