Bug 1871087

Summary: lbzip2 testsuite hangs on s390x when built with LTO
Product: [Fedora] Fedora Reporter: Dominik 'Rathann' Mierzejewski <dominik>
Component: lbzip2Assignee: Dominik 'Rathann' Mierzejewski <dominik>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 34CC: dan, dominik, law, lzap, mizdebsk, tstaudt
Target Milestone: ---   
Target Release: ---   
Hardware: s390x   
OS: Unspecified   
Whiteboard:
Fixed In Version: lbzip2-2.5-23.20171011gitb6dc48a.fc37 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-02-22 11:55:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 467765, 485231    

Description Dominik 'Rathann' Mierzejewski 2020-08-21 10:10:30 UTC
Description of problem:
Test suite hangs on F33+
...
PASS: manual-expand.test 15 a20fc1f69cbf5e491cbd9a06352cb70d19e3e153
PASS: manual-expand.test 16 a904f3028c0ed3c565a1fb5c69404415f7a8f1d4
PASS: manual-expand.test 17 b229cc0d0253184fdf74b29dddaa8f11ea3f2e2f
PASS: manual-expand.test 18 da39a3ee5e6b4b0d3255bfef95601890afd80709
make[3]: Leaving directory '/builddir/build/BUILD/lbzip2-b6dc48a7b9bfe6b340ed1f6d72133608ad57144b/tests'
<hang>

Version-Release number of selected component (if applicable):
2.5-16.20171011gitb6dc48a

Comment 1 Dan Horák 2020-08-21 10:39:49 UTC
Seems to be LTO runtime related, with %global _lto_cflags %{nil} the rpm is built.

Comment 2 Dan Horák 2020-08-21 10:42:02 UTC
And for the record, build fails early in the build-aux/autogen.sh call when nspawn isolation is used (I guess on all arches), works OK in chroot.

Comment 3 Dominik 'Rathann' Mierzejewski 2020-08-21 12:12:27 UTC
(In reply to Dan Horák from comment #1)
> Seems to be LTO runtime related, with %global _lto_cflags %{nil} the rpm is
> built.

Confirmed and opt-out applied for s390x only, thank you!

Cc'ing Jeff.

(In reply to Dan Horák from comment #2)
> And for the record, build fails early in the build-aux/autogen.sh call when
> nspawn isolation is used (I guess on all arches), works OK in chroot.

It doesn't in koji anymore. glibc-2.32.9000-4.fc34 fixes this.

By the way, could you update the fedora-gpg-keys packages on your lfedora2 s390x test VM? It's unable to build any rawhide packages in mock:
...
Public key for zstd-1.4.5-5.fc33.s390x.rpm is not installed. Failing package is: zstd-1.4.5-5.fc33.s390x
 GPG Keys are configured as: file:///usr/share/distribution-gpg-keys/fedora/RPM-GPG-KEY-fedora-33-primary, file:///usr/share/distribution-gpg-keys/fedora/RPM-GPG-KEY-fedora-33-primary, file:///usr/share/distribution-gpg-keys/fedora/RPM-GPG-KEY-fedora-32-primary
Error: GPG check FAILED

Comment 4 Dan Horák 2020-08-21 12:28:32 UTC
some details from the stuck test, I wonder if there could be a real deadlock hidden somewhere in the app and LTO exposes it

ps output:
 110995 ?        Ss     0:00 /usr/bin/rpmbuild -bb --target s390x --nodeps /builddir/build/SPECS/lbzip2.spec
 131144 ?        S      0:00 /bin/sh -e /var/tmp/rpm-tmp.F1FJ3s
 131145 ?        S      0:00 /usr/bin/make -O -j8 V=1 VERBOSE=1 check
 131146 ?        S      0:00 /bin/sh -c fail=; \ if (target_option=k; case ${target_option-} in ?) ;; *) echo "am__make_running_with_option: internal error: invalid" "target option '${target
 131170 ?        S      0:00 /bin/sh -c fail=; \ if (target_option=k; case ${target_option-} in ?) ;; *) echo "am__make_running_with_option: internal error: invalid" "target option '${target
 131171 ?        S      0:00 /usr/bin/make check
 131205 ?        S      0:00 /usr/bin/make check-TESTS check-local
 131211 ?        S      0:00 /bin/sh -c set +e; bases='manual-compress.log fuzz-collect.log fuzz-divbwt.log manual-expand.log'; bases=`for i in $bases; do echo $i; done | sed 's/\.log$//'`; 
 131220 ?        S      0:00 /usr/bin/make test-suite.log TEST_LOGS=manual-compress.log fuzz-collect.log fuzz-divbwt.log manual-expand.log
 131221 ?        S      0:00 /bin/sh -c p='manual-compress.test'; \ case 'manual-compress.log' in */*) case 'manual-compress' in */*) b='manual-compress';; *) b=`echo 'manual-compress.log' |
 131237 ?        S      0:00 /bin/sh ../build-aux/tap-driver.sh --test-name manual-compress.test --log-file manual-compress.log --trs-file manual-compress.trs --color-tests no --enable-hard-
 131241 ?        S      0:00 /bin/sh ../build-aux/tap-driver.sh --test-name manual-compress.test --log-file manual-compress.log --trs-file manual-compress.trs --color-tests no --enable-hard-
 131242 ?        S      0:00 gawk -v me=tap-driver.sh -v test_script_name=manual-compress.test -v log_file=manual-compress.log -v trs_file=manual-compress.trs -v expect_failure=0 -v merge=0 
 131243 ?        S      0:00 ./driver compress suite/manual-compress
 131811 ?        S      0:00 sh -c cat >&3
 131814 ?        S      0:00 cat
 131933 ?        Sl     0:00 lbzip2

[sharkcz@rock-kvmlp-fedora lbzip2]$ sudo strace -p 131933
[sudo] password for sharkcz: 
strace: Process 131933 attached
rt_sigsuspend([PIPE XFSZ], 8

^Cstrace: Process 131933 detached
 <detached ...>


after attaching gdb to the process

(gdb) thread apply all bt

Thread 10 (LWP 131944):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d12c70 in worker_thread_proc () at process.c:452
#2  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
#3  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 9 (LWP 131943):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d12c70 in worker_thread_proc () at process.c:452
#2  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
#3  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 8 (LWP 131942):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d12c70 in worker_thread_proc () at process.c:452
#2  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
#3  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 7 (LWP 131941):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d12c70 in worker_thread_proc () at process.c:452
#2  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
#3  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 6 (LWP 131940):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d12c70 in worker_thread_proc () at process.c:452
#2  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
#3  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 5 (LWP 131939):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d12c70 in worker_thread_proc () at process.c:452
#2  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
#3  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 4 (LWP 131938):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d12c70 in worker_thread_proc () at process.c:452
#2  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
#3  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 3 (LWP 131936):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d128ba in sink_thread_proc () at process.c:373
#2  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
--Type <RET> for more, q to quit, c to continue without paging--
#3  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 2 (LWP 131935):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d152c4 in worker_thread_proc () at process.c:427
#2  primary_thread () at process.c:529
#3  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
#4  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#5  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 1 (LWP 131933):
#0  0x000003ff827cae7a in sigsuspend () from target:/lib64/libc.so.6
#1  0x000002aa28d03458 in halt () at signals.c:233
#2  schedule (proc=0x2aa28d26a28 <compression>) at process.c:611
#3  work () at process.c:654
#4  main (argc=<optimized out>, argv=<optimized out>) at main.c:934

Comment 5 Jeff Law 2020-08-21 15:23:33 UTC
It could well be the case there's a concurrency issue that is pre-existing and possibly made worse by LTO -- particulary if there's shared state across the threads which is not properly marked.  Has anyone tried the thread sanitizer on this code?

Comment 6 Ben Cotton 2021-02-09 15:16:09 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle.
Changing version to 34.

Comment 7 Dan Horák 2022-02-22 11:55:12 UTC
seems it's not a problem any more, closing