crypt format occasionally fails a --test-full=0 --format=cpu run on Ubuntu 22 powerpc64le #5491

Open
claudioandre-br opened this issue Jun 3, 2024 · 13 comments

Comments

@claudioandre-br
Member

claudioandre-br commented Jun 3, 2024

This is Ubuntu 22, Canonical's hardware. Nothing related changed in john itself recently.

This is probably a hard-to-reproduce issue.

Version: 1.9.0-jumbo-1+bleeding-d384b5be9a 2024-05-30 20:33:48 +0200
Build: linux-gnu 64-bit powerpc64le Altivec AC OMP
SIMD: AltiVec, interleaving: MD4:1 MD5:1 SHA1:1 SHA256:1 SHA512:1
[...]
gcc version: 11.4.0
GNU libc version: 2.35 (loaded: 2.35)
Crypto library: OpenSSL
OpenSSL library version: 030000020
OpenSSL 3.0.2 15 Mar 2022
Will run 4 OpenMP threads
Testing: descrypt, traditional crypt(3) [DES 128/128 AltiVec]... (4xOMP) PASS
Testing: bsdicrypt, BSDI crypt(3) ("_J9..", 725 iterations) [DES 128/128 AltiVec]... (4xOMP) PASS
[...]
Testing: dummy [N/A]... PASS
Testing: crypt, generic crypt(3) [?/64]... (4xOMP) FAILED (cmp_all(96))
1 out of 410 tests have FAILED
 FAILED: -test-full=0 --format=cpu

In any case, I'm documenting it here. A close look at the format can always make it better.


As I recall, crypt is a format that I have seen fail in some context(s).

Full log at https://launchpadlibrarian.net/733306772/buildlog_snap_ubuntu_jammy_ppc64el_john-the-ripper_BUILDING.txt.gz

@solardiz
Member

solardiz commented Jun 3, 2024

Oh wow, could be a thread-safety bug in libxcrypt. What do you mean by "Canonical hardware"?

@claudioandre-br
Member Author

The hardware is owned by Canonical (the company behind Ubuntu). This might be the first time I've seen this failure, but I'm not sure.

@solardiz
Member

solardiz commented Jun 3, 2024

Does this occur during snap package build? Does the build fail when this happens? I suppose we have no easy way to try and trigger the bug on its own (e.g., running just this one test)? Maybe we should try on Compile Farm's hardware.

@claudioandre-br
Member Author

> Does this occur during snap package build?

Yes

> Does the build fail when this happens?

Only if I want it to fail.

> I suppose we have no easy way to try and trigger the bug on its own (e.g., running just this one test)?

It's probably random, so it will be extremely difficult to get more information about it.
To create a build I have to join a queue. Some things are possible, but it is not a convenient setup for debugging.

@solardiz
Member

solardiz commented Jun 3, 2024

Reviewed the code in libxcrypt and our c3_fmt.c, found no relevant issues, including none recently fixed in libxcrypt (as Ubuntu may not have the latest version). However, found and will fix various other minor issues in our code, which should make no difference with respect to this issue.

@solardiz
Member

solardiz commented Jun 3, 2024

> GNU libc version: 2.35 (loaded: 2.35)

It wouldn't have been needed this time, but in general I wonder whether we want to, and can easily, add the libxcrypt version here, but only when we know we're linking against libxcrypt. That is tricky, since we don't link against it explicitly: we just use -lcrypt, which can be provided by different libraries depending on the system.
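
As a rough illustration of the detection problem, here is a minimal sketch of a build-time check. It assumes that libxcrypt's `crypt.h` defines the `XCRYPT_VERSION_STR` macro and that other providers of `crypt.h` do not; the function name and the default header path are hypothetical. Note that a header check only reflects what was present at build time, not the library loaded at run time.

```shell
# Print the libxcrypt version string found in a crypt.h header, or
# "unknown" if the header does not define XCRYPT_VERSION_STR
# (for example, a crypt.h that does not come from libxcrypt).
# The default header path is an assumption; adjust it per system.
xcrypt_version_from_header() {
    header=${1:-/usr/include/crypt.h}
    version=$(sed -n 's/^#define[[:space:]]\{1,\}XCRYPT_VERSION_STR[[:space:]]\{1,\}"\(.*\)".*$/\1/p' "$header" 2>/dev/null | head -n 1)
    if [ -n "$version" ]; then
        printf '%s\n' "$version"
    else
        printf 'unknown\n'
    fi
}
```

Calling `xcrypt_version_from_header` with no argument inspects the default path; passing a path inspects that header instead.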

@solardiz changed the title from "crypt format fails a --test-full=0 --format=cpu_ run" to "crypt format occasionally fails a --test-full=0 --format=cpu run on Ubuntu 22 powerpc64le" on Jun 3, 2024
@solardiz
Member

solardiz commented Jun 3, 2024

I can't reproduce this on cfarm29 (Raptor Blackbird ppc64le POWER9 Debian 12.5 bookworm 6.1.0-21-powerpc64le) running:

while :; do OMP_NUM_THREADS=4 ../run/john --test-full=0 --format=crypt || break; done
solar@cfarm29:~/john/src$ ldd ../run/john
        linux-vdso64.so.1 (0x00007fff94c80000)
        libcrypto.so.3 => /lib/powerpc64le-linux-gnu/libcrypto.so.3 (0x00007fff92800000)
        libgmp.so.10 => /lib/powerpc64le-linux-gnu/libgmp.so.10 (0x00007fff94b80000)
        libm.so.6 => /lib/powerpc64le-linux-gnu/libm.so.6 (0x00007fff926d0000)
        libz.so.1 => /lib/powerpc64le-linux-gnu/libz.so.1 (0x00007fff94b40000)
        libcrypt.so.1 => /lib/powerpc64le-linux-gnu/libcrypt.so.1 (0x00007fff94ae0000)
        libbz2.so.1.0 => /lib/powerpc64le-linux-gnu/libbz2.so.1.0 (0x00007fff92dc0000)
        libgomp.so.1 => /lib/powerpc64le-linux-gnu/libgomp.so.1 (0x00007fff92650000)
        libc.so.6 => /lib/powerpc64le-linux-gnu/libc.so.6 (0x00007fff92200000)
        /lib64/ld64.so.2 (0x00007fff94c90000)
solar@cfarm29:~/john/src$ dpkg -S /lib/powerpc64le-linux-gnu/libcrypt.so.1
libcrypt1:ppc64el: /lib/powerpc64le-linux-gnu/libcrypt.so.1
solar@cfarm29:~/john/src$ dpkg -s libcrypt1
Package: libcrypt1
Protected: yes
Status: install ok installed
Priority: optional
Section: libs
Installed-Size: 289
Maintainer: Marco d'Itri <[email protected]>
Architecture: ppc64el
Multi-Arch: same
Source: libxcrypt
Version: 1:4.4.33-2
Replaces: libc6 (<< 2.29-4)
Depends: libc6 (>= 2.36)
Conflicts: libpam0g (<< 1.4.0-10)
Description: libcrypt shared library
 libxcrypt is a modern library for one-way hashing of passwords.
 It supports DES, MD5, NTHASH, SUNMD5, SHA-2-256, SHA-2-512, and
 bcrypt-based password hashes
 It provides the traditional Unix 'crypt' and 'crypt_r' interfaces,
 as well as a set of extended interfaces like 'crypt_gensalt'.
Important: yes
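
The manual steps above (find which file provides `libcrypt.so.1`, then ask the package manager who owns it) could be scripted roughly as follows. This is only a sketch: the function name is made up, and the commented usage assumes a Debian-like system with `dpkg`.

```shell
# Read `ldd` output on stdin and print the resolved path of libcrypt.so.*
# (libcrypto.so.* is deliberately not matched).
libcrypt_path_from_ldd() {
    awk '$1 ~ /^libcrypt\.so/ && $2 == "=>" { print $3; exit }'
}

# Example use (commented out; assumes dpkg is available):
# path=$(ldd ../run/john | libcrypt_path_from_ldd)
# dpkg -S "$path"
```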

@solardiz
Member

solardiz commented Jun 4, 2024

> I can't reproduce this on cfarm29 (Raptor Blackbird ppc64le POWER9 Debian 12.5 bookworm 6.1.0-21-powerpc64le)

On the same system, I also couldn't reproduce this with:

for n in `seq 0 99`; do time OMP_NUM_THREADS=4 ../run/john --test-full=0 -form=cpu || break; done

which took over 10 hours.
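
For long unattended runs like this, it can help to report which iteration failed. A minimal sketch of such a wrapper is below; the function name is hypothetical, and the `john` invocation in the comment simply mirrors the commands used in this thread.

```shell
# Run a command up to MAX times and stop at the first non-zero exit status.
# Prints the number of the failing run to stderr and returns its status,
# or returns 0 if every run passed.
repeat_until_fail() {
    max=$1
    shift
    run=1
    while [ "$run" -le "$max" ]; do
        "$@"
        status=$?
        if [ "$status" -ne 0 ]; then
            printf 'run %d failed with status %d\n' "$run" "$status" >&2
            return "$status"
        fi
        run=$((run + 1))
    done
    return 0
}

# Example use (commented out):
# repeat_until_fail 100 env OMP_NUM_THREADS=4 ../run/john --test-full=0 --format=crypt
```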

@solardiz
Member

solardiz commented Jun 4, 2024

Still couldn't reproduce with 32, 33, 333 threads (this system has 32 hardware threads). However, after a while I got this:

Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) FAILED (cmp_all(64))

which could be a bug in our format, in OpenSSL, in the OpenMP implementation, in the kernel, or below.
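
Trying several thread counts in turn can be scripted along these lines. This is a sketch only: the function name is hypothetical, and the counts in the commented usage are the ones mentioned above.

```shell
# Run the same command once per thread count, with OMP_NUM_THREADS set
# for that run only. Stops and returns non-zero at the first failure.
sweep_omp_threads() {
    counts=$1
    shift
    for n in $counts; do
        printf 'OMP_NUM_THREADS=%s\n' "$n" >&2
        env OMP_NUM_THREADS="$n" "$@" || return 1
    done
}

# Example use (commented out):
# sweep_omp_threads "32 33 333" ../run/john --test-full=0 --format=crypt
```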

@solardiz
Member

solardiz commented Jun 4, 2024

The gpg issue is quite reproducible:

solar@cfarm29:~/john/src$ for n in `seq 0 99`; do ../run/john --test-full=0 -form=gpg -v=5 || break; done
initUnicode(UNICODE, RAW/RAW)
RAW -> RAW -> RAW
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) Loaded 30 hashes with 30 different salts to test db from test vectors

gpg OMP autotune using test db with s2k-count of 11534336

OMP scale 1: 32 crypts (1x32) in 64.405 ms, 496 c/s +
OMP scale 2: 64 crypts (1x64) in 115.352 ms, 554 c/s +
Autotune found best speed at OMP scale of 2
PASS
initUnicode(UNICODE, RAW/RAW)
RAW -> RAW -> RAW
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) Loaded 30 hashes with 30 different salts to test db from test vectors

gpg OMP autotune using test db with s2k-count of 11534336

OMP scale 1: 32 crypts (1x32) in 62.556 ms, 511 c/s +
OMP scale 2: 64 crypts (1x64) in 114.906 ms, 556 c/s +
Autotune found best speed at OMP scale of 2
PASS
initUnicode(UNICODE, RAW/RAW)
RAW -> RAW -> RAW
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) Loaded 30 hashes with 30 different salts to test db from test vectors

gpg OMP autotune using test db with s2k-count of 11534336

OMP scale 1: 32 crypts (1x32) in 62.355 ms, 513 c/s +
OMP scale 2: 64 crypts (1x64) in 114.695 ms, 557 c/s +
Autotune found best speed at OMP scale of 2
PASS
initUnicode(UNICODE, RAW/RAW)
RAW -> RAW -> RAW
Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) Loaded 30 hashes with 30 different salts to test db from test vectors

gpg OMP autotune using test db with s2k-count of 11534336

OMP scale 1: 32 crypts (1x32) in 65.688 ms, 487 c/s +
OMP scale 2: 64 crypts (1x64) in 115.867 ms, 552 c/s +
Autotune found best speed at OMP scale of 2
FAILED (cmp_all(64) $gpg$*1*650*2048*72624bb7243579c0c77cf1e64565251e0ac9d0dcb2f4b98fa54e1678ee4234409efe464a117b21aff978907cfbf19eb2547d44e3a2e6f7db5bfceb4af2391992f30ff55a292d0c011f05c3ab27a1a3fde1a9fd1fbf)

and on another occasion:

Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) Loaded 30 hashes with 30 different salts to test db from test vectors

gpg OMP autotune using test db with s2k-count of 11534336

OMP scale 1: 32 crypts (1x32) in 62.383 ms, 512 c/s +
OMP scale 2: 64 crypts (1x64) in 115.573 ms, 553 c/s +
Autotune found best speed at OMP scale of 2
FAILED (get_key(47) (case) 80808080�67890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901 openwall)

It usually fails on cmp_all(64), but one failure was in get_key(47), as above.
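
To see how the failures split between checks over many runs, the saved output can be tallied. The sketch below assumes the `FAILED (name(index) ...)` line shape seen in the logs above; the function name and the log file name are hypothetical.

```shell
# Read john self-test output on stdin and count FAILED lines by the
# check that failed, ignoring the index in parentheses.
tally_failures() {
    sed -n 's/.*FAILED (\([a-z_]*\)(.*/\1/p' | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Example use (commented out):
# tally_failures < gpg-test-runs.log
```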

On this system, it auto-tunes to OMP_SCALE 2, which isn't something we commonly test on x86_64 (where we have pre-tuned OMP_SCALE of 1 for this format).

@solardiz
Member

solardiz commented Jun 4, 2024

> On this system, it auto-tunes to OMP_SCALE 2, which isn't something we commonly test on x86_64 (where we have pre-tuned OMP_SCALE of 1 for this format).

Looks like a red herring. After a little while, I also got it to fail here with -tune=1:

Will run 32 OpenMP threads
Testing: gpg, OpenPGP / GnuPG Secret Key [32/64]... (32xOMP) Loaded 30 hashes with 30 different salts to test db from test vectors
FAILED (cmp_all(32) $gpg$*1*650*2048*72624bb7243579c0c77cf1e64565251e0ac9d0dcb2f4b98fa54e1678ee4234409efe464a117b21aff978907cfbf19eb2547d44e3a2e6f7db5bfceb4af2391992f30ff55a292d0c011f05c3ab27a1a3fde1a9fd1fbf)

@solardiz
Member

solardiz commented Jun 4, 2024

I moved further comments on the gpg issue to #3543, as we already had that issue open and its cause is quite likely separate from what Claudio observed with the crypt format.

@solardiz
Member

solardiz commented Jun 9, 2024

If anyone wants to proceed to debug this further, a next step could be to identify and take Ubuntu's exact libxcrypt version and binary package where the issue was triggered and experiment with that on cfarm29. I think the system is similar enough that the same libxcrypt binary could be loaded there via LD_LIBRARY_PATH. @claudioandre-br maybe you?
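
A minimal sketch of that experiment follows. The `.deb` file name and the extraction directory are placeholders, and the library subdirectory inside the extracted tree is an assumption modelled on the cfarm29 path shown earlier in this thread (it may be under `usr/lib/` instead).

```shell
# Run a command with DIR prepended to LD_LIBRARY_PATH for that run only.
run_with_lib_dir() {
    dir=$1
    shift
    env LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}

# Example use (commented out; file and directory names are placeholders):
# dpkg-deb -x libcrypt1_VERSION_ppc64el.deb ./ubuntu-libcrypt
# libdir=./ubuntu-libcrypt/lib/powerpc64le-linux-gnu
# run_with_lib_dir "$libdir" ldd ../run/john    # check which libcrypt.so.1 resolves
# run_with_lib_dir "$libdir" ../run/john --test-full=0 --format=crypt
```

Running `ldd` through the same wrapper first is a cheap way to confirm that the substituted library is actually the one being picked up.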
