Bug 2188064
| Summary: | elfutils: eu-elfcompress now breaks hard links | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Jan Grulich <jgrulich> | |
| Component: | elfutils | Assignee: | Mark Wielaard <mjw> | |
| elfutils sub component: | system-version | QA Contact: | Martin Cermak <mcermak> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | unspecified | |||
| Priority: | unspecified | CC: | fweimer, mcermak, mjw, mprchlik, ohudlick, sipoyare, tpelka | |
| Version: | 9.3 | Keywords: | Regression, Triaged | |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ | |
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | elfutils-0.189-2.el9 | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2190006 (view as bug list) | Environment: | ||
| Last Closed: | 2023-11-07 08:51:58 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2190006 | |||
|
Description
Jan Grulich
2023-04-19 16:08:11 UTC
Note: I'm not sure the issue is in binutils, but we discussed this issue with @fweimer and it's just one of the potential packages where this issue might be. I tried to downgrade "file", "rpm" and "glibc" beforehand to see if the issue is there, but I could still reproduce it in local builds.

I am clutching at straws here, but I think that the culprit might be the elfutils package, or even the redhat-rpm-macros package, rather than binutils. The reason for saying this is that the extraction of debug information from a binary (including its .symtab section) is handled by the find-debuginfo.sh script, which is part of redhat-rpm-macros, and this script uses the eu-strip program from the elfutils package to strip binaries, rather than the strip program from the binutils package.

Looking at the build.log files for the 5.15.3 and 5.15.9 builds I see some differences in the command lines used to invoke find-debuginfo.sh:

/usr/lib/rpm/find-debuginfo.sh -j16 --strict-build-id -m -i --build-id-seed 5.15.3-1.el9 --unique-debug-suffix -5.15.3-1.el9.x86_64 --unique-debug-src-base qt5-qtxmlpatterns-5.15.3-1.el9.x86_64 --run-dwz --dwz-low-mem-die-limit 10000000 --dwz-max-die-limit 110000000 -S debugsourcefiles.list /builddir/build/BUILD/qtxmlpatterns-everywhere-src-5.15.3

/usr/lib/rpm/find-debuginfo.sh -j64 --strict-build-id -m -i --build-id-seed 5.15.9-1.el9 --unique-debug-suffix -5.15.9-1.el9.x86_64 --unique-debug-src-base qt5-qtxmlpatterns-5.15.9-1.el9.x86_64 --run-dwz --dwz-low-mem-die-limit 10000000 --dwz-max-die-limit 110000000 --remove-section .gnu.build.attributes -S debugsourcefiles.list /builddir/build/BUILD/qtxmlpatterns-everywhere-src-5.15.9

The -j option makes me wonder: could this be a race condition? If we used to run 16 debug extraction jobs in parallel and we are now running 64, could there be a greater chance for the script to attempt to extract debuginfo from the same file at the same time, because the two files are hard linked? I.e. could this be a latent bug that has always been there, but is now triggering more reliably because we are running more jobs in parallel?

Jan - quick (ish) question: Does adding:

%define _find_debuginfo_opts "-q1"

to the spec file result in binaries that are properly stripped? If not, then I will have to ponder some more.

Sorry make that:

%define _find_debuginfo_opts "-j1"

i.e. changing the number of parallel jobs to 1.

(In reply to Nick Clifton from comment #3)
> Sorry make that:
>
> %define _find_debuginfo_opts "-j1"
>
> ie changing the number of parallel jobs to 1.

I tried a scratch build, but it doesn't make any difference. Still the same issue :-/. Actually, I can see in the build log that it still uses "/usr/lib/rpm/find-debuginfo.sh -j64 ..." so it didn't change anything. Anyway, I also tried it locally, where it uses "-j16" by default, and it still happens. I don't know where this is invoked from, but I changed the "find-debuginfo.sh" script to use "-j=1" and it also didn't make any difference. I also think that if it were a random concurrency issue, it would not have happened in other packages, but I can see the same issue in two other Qt modules.

Installing elfutils-0.189-1.el9 into the 9.2 buildroot reproduces the issue.

I looked at the upstream commit history, and I do not see yet what is causing this. 8-(

(In reply to Florian Weimer from comment #9)
> I looked at the upstream commit history, and I do not see yet what is
> causing this. 8-(

Hard links are tricky; there is some special code in find-debuginfo for them. One issue here might be that find-debuginfo and friends were moved into their own upstream project, debugedit, which is packaged and shipped with rhel9. rpmbuild in fedora uses that, but not in rhel9. See https://bugzilla.redhat.com/show_bug.cgi?id=2166383

So does this happen only with 9.3? Does it happen in Fedora?

<mock-chroot> sh-5.1# cp ./src/xz/.libs/xz xz-1
<mock-chroot> sh-5.1# ln xz-1 xz-2
<mock-chroot> sh-5.1# ls -li xz-1 xz-2
183413389 -rwxr-xr-x. 2 root root 349440 Apr 21 13:59 xz-1
183413389 -rwxr-xr-x. 2 root root 349440 Apr 21 13:59 xz-2
<mock-chroot> sh-5.1# eu-elfcompress -q -p -t none xz-2
<mock-chroot> sh-5.1# ls -li xz-1 xz-2
183413389 -rwxr-xr-x. 1 root root 349440 Apr 21 13:59 xz-1
183413392 -rwxr-xr-x. 1 root root 349440 Apr 21 13:59 xz-2
<mock-chroot> sh-5.1# rpm -q elfutils
elfutils-0.189-1.el9.x86_64
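
The session above shows the inode of xz-2 changing and the link count dropping from 2 to 1. A minimal sketch of the same check as a reusable script follows. It does not call eu-elfcompress; `rewrite_via_rename` and `rewrite_in_place` are hypothetical stand-ins (assumptions for illustration) that model the two ways a tool might update a file, and `keeps_hardlink` and `inode_of` are likewise made-up helper names.

```shell
#!/bin/sh
# Sketch: detect whether a command breaks a hard link on the file it rewrites.

# Print the inode number of a file (first field of POSIX "ls -i" output).
inode_of() {
    ls -di "$1" | awk '{print $1}'
}

# Hypothetical stand-in 1: write a new file, then rename it over the
# original path. The path ends up pointing at a fresh inode.
rewrite_via_rename() {
    cp "$1" "$1.tmp" && mv "$1.tmp" "$1"
}

# Hypothetical stand-in 2: open the existing inode and append nothing to it.
rewrite_in_place() {
    printf '' >> "$1"
}

# keeps_hardlink CMD [ARGS...]: create a hard-linked pair in a scratch
# directory, run CMD on one of the two names, and succeed only if both
# names still share an inode afterwards.
keeps_hardlink() {
    dir=$(mktemp -d) || return 2
    printf 'payload\n' > "$dir/a"
    ln "$dir/a" "$dir/b"
    "$@" "$dir/b"
    a=$(inode_of "$dir/a")
    b=$(inode_of "$dir/b")
    rm -rf "$dir"
    [ "$a" = "$b" ]
}

for cmd in rewrite_via_rename rewrite_in_place; do
    if keeps_hardlink "$cmd"; then
        echo "$cmd: hard link preserved"
    else
        echo "$cmd: hard link broken"
    fi
done
```

To test a real tool, the same helper can be given the command line from the session, for example `keeps_hardlink eu-elfcompress -q -p -t none`, although the scratch file here is plain text rather than an ELF binary, so a real ELF file would have to be substituted for a meaningful result.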
I suspect it's either
commit 6bb3e0b5c2124d51c604ec0cf145419c6856f5c0
Author: Martin Liska <mliska>
Date: Mon Nov 28 14:10:36 2022 +0100
Refactor elf_compare
or:
commit a5b07cdf9c491fb7a4a16598c482c68b718f59b9
Author: Martin Liska <mliska>
Date: Tue Nov 29 10:59:30 2022 +0100
support ZSTD compression algorithm
The first one should say “elfcompress”, not “elfcompare”.
Sorry our comments crossed. And I now see Florian already tracked it down to an elfutils 0.188 -> 0.189 change with eu-elfcompress. This is slightly unfortunate because the eu-elfcompress -t none invocation is kind of unnecessary and a local fedora/rhel tweak. I don't fully understand yet how it happened though, will investigate/bisect.

Found it: https://patchwork.sourceware.org/project/elfutils/patch/20230421234543.1052146-1-mark@klomp.org/

(In reply to Mark Wielaard from comment #13)
> Found it:
> https://patchwork.sourceware.org/project/elfutils/patch/20230421234543.1052146-1-mark/

Aha, so eu-elfcompress breaks hard links because it makes a spurious change to the file that is not really needed? Does this mean we still break hard links after the fix is in if we actually need to uncompress something?

(In reply to Florian Weimer from comment #14)
> (In reply to Mark Wielaard from comment #13)
> > Found it:
> > https://patchwork.sourceware.org/project/elfutils/patch/20230421234543.1052146-1-mark/
>
> Aha, so eu-elfcompress breaks hard links because it makes a spurious change
> to the file that is not really needed? Does this mean we still break hard
> links after the fix is in if we actually need to uncompress something?

eu-elfcompress doesn't change the file in-place. It first writes any changes to a new file, then when done (and no errors occurred) moves it back. So yes, if a file actually contained compressed debug ELF sections, then it would become a new (unlinked) file.

Note that in practice this never happens. Calling eu-elfcompress before find-debuginfo seems to be a Fedora/RHEL specific thing because of a bug in the golang toolchain which did create compressed debug sections (which breaks other tools like debugedit and dwz).

Verified by rebuilding qt5-qtxmlpatterns-5.15.3-1.el9.src.rpm locally with old (elfutils-0.189-1.el9) and new (elfutils-0.189-2.el9). After the rpmbuild --rebuild, I've unpacked qt5-qtxmlpatterns-devel with rpm2cpio and checked with file:

9 x86_64 # find . -type f | grep 'usr/lib64/qt5/bin/xmlpatterns$' | xargs file
./rpmbuild_old/RPMS/x86_64/usr/lib64/qt5/bin/xmlpatterns: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=73a24c426185f0161fb611c84564a86b6253e4cb, for GNU/Linux 3.2.0, with debug_info, not stripped, too many notes (256)
./rpmbuild_new/RPMS/x86_64/usr/lib64/qt5/bin/xmlpatterns: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=6f402917827d9ee9da16ae2aebc0f13b5ff5287a, for GNU/Linux 3.2.0, stripped
9 x86_64 # ls

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (elfutils bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6609
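
The verification above compares the old and new builds by reading the tail of the file(1) description by eye. A small sketch of automating that comparison follows. The function name `strip_state` is hypothetical, and it only pattern-matches the description text, so it assumes the wording "stripped" / "not stripped" seen in the output quoted in this report.

```shell
#!/bin/sh
# Sketch: classify a file(1) description as "stripped" or "not stripped".

strip_state() {
    case "$1" in
        # Test the longer phrase first, since it also contains "stripped".
        *"not stripped"*) echo "not stripped" ;;
        *"stripped"*)     echo "stripped" ;;
        *)                echo "unknown" ;;
    esac
}

# Tails of the two descriptions quoted in this report (shortened here).
old='for GNU/Linux 3.2.0, with debug_info, not stripped, too many notes (256)'
new='for GNU/Linux 3.2.0, stripped'

echo "old build: $(strip_state "$old")"
echo "new build: $(strip_state "$new")"
```

Against a real binary, the description could be obtained with `file -b path/to/binary` and passed to the function as a single quoted argument.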