Bug 2119304
Summary: | glibc: Upgrading to glibc-2.28-209.el8.x86_64 causes segfaults during concurrent process launch | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Ben Morrice <ben.morrice> | |
Component: | glibc | Assignee: | Florian Weimer <fweimer> | |
Status: | CLOSED ERRATA | QA Contact: | Martin Coufal <mcoufal> | |
Severity: | unspecified | Docs Contact: | ||
Priority: | unspecified | |||
Version: | CentOS Stream | CC: | alex.iribarren, ashankar, bstinson, codonell, daniel.vanderster, davide, dj, fweimer, jwboyer, kpfleming, mcoufal, mnewsome, pfrankli, sipoyare, skolosov | |
Target Milestone: | rc | Keywords: | Bugfix, Patch, Triaged | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | glibc-2.28-211.el8 | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2121536 | Environment: | ||
Last Closed: | 2022-11-08 10:43:12 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 2121536 | |||
Deadline: | 2022-08-29 |
Description
Ben Morrice
2022-08-18 09:14:33 UTC
The new ld.so cannot load the old libc. There is a brief time window during which RPM has already renamed the new ld.so into place, but the file system still has the old libc.so (actual file names differ). This is not a new issue: we have tickled the same bug during the lifetime of Red Hat Enterprise Linux 8 as we implemented other dynamic loader enhancements. I'm not sure what we can do about this.

I posted some upstream patches to detect this situation, avoid the coredump, and print a clear error message (“Fatal glibc error: ld.so/libc.so mismatch detected”):

[PATCH 0/2] Check ld.so/libc.so consistency during startup
<https://sourceware.org/pipermail/libc-alpha/2022-August/141525.html>

During the update, processes will still fail to start (that's going to be much harder to fix because upstream needs to be sufficiently distribution-agnostic), but the failures do not result in coredumps, so the secondary crashes from systemd-coredumpd are gone.

The upstream change works to mask the segmentation fault in the reproducer (and a couple of related ones) with our 2.28-based glibc, but we have additional ABI exposure due to our still-separate libpthread and libdl downstream (merged into libc upstream in 2.34). I'm inclined to move forward with the coredump suppression logic we can implement today, although it is incomplete.

Here is what we are changing:

Updates from -208 and earlier to -211 on aarch64 and x86_64 are expected not to cause any disruptions (even in the presence of concurrent process creation) because we have taken steps to minimize internal ABI impact coming from the LD_AUDIT changes (internal bug 2047981). On ppc64le POWER9, an error message “undefined symbol: _dl_audit_symbind_alt, version GLIBC_PRIVATE” might be printed due to a different shared object upgrade order on this platform. For s390x, crash-free upgrades are possible from version -202 and earlier, because bug 2077835 had already introduced a private ABI change.
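As a reading aid, the release thresholds in this comment can be written down as a small lookup. The sketch below is Python; the function name, the return labels, and the use of the same -208 cut-off for ppc64le as for x86_64 are assumptions made for illustration, not something shipped with glibc or RPM. Releases above the cut-off fall under the caveat about upgrades from -209 and later (-203 and later for s390x).

```python
# Last release from which an upgrade to -211 is expected to be crash-free,
# per this comment. The ppc64le value is an assumption: the comment only
# gives the general -208 cut-off and the s390x exception explicitly.
LAST_SAFE_RELEASE = {"x86_64": 208, "aarch64": 208, "ppc64le": 208, "s390x": 202}

TARGET_RELEASE = 211  # glibc-2.28-211.el8


def upgrade_expectation(arch, installed_release):
    """Classify an upgrade from glibc-2.28-<installed_release>.el8 to -211.

    Return labels are hypothetical names for this sketch.
    """
    if installed_release >= TARGET_RELEASE:
        raise ValueError("installed release is not older than the target")
    if arch not in LAST_SAFE_RELEASE:
        return "unknown"
    if installed_release > LAST_SAFE_RELEASE[arch]:
        # Internal ABI changes were reverted, so concurrent process
        # creation during the update will likely crash once more.
        return "may-crash"
    if arch == "ppc64le":
        # The comment mentions a possible "undefined symbol" error message
        # on POWER9 because of a different shared object upgrade order.
        return "may-print-symbol-error"
    return "expected-clean"


if __name__ == "__main__":
    for arch, release in [("x86_64", 208), ("x86_64", 209), ("s390x", 203)]:
        print(arch, release, upgrade_expectation(arch, release))
```

The labels only restate the expectations given in the comment; they say nothing about downgrades, which are described as negatively impacted as well.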
Because we have reverted the internal ABI changes, upgrades from -209 and later to -211 (-203 and later for s390x) will likely crash once more if processes are created concurrently with an update. Downgrades are negatively impacted as well. There is really no good way to avoid either set of issues. We investigated version fingerprinting, but the impact on static dlopen was eventually deemed too high.

Please note that this ABI inconsistency is restricted to the internal glibc ABI, not the public ABI. The issue materializes if a process loads different components of glibc at different times. (This can happen because glibc is split across several files internally.) Typically, this is triggered if a process launches concurrently with a glibc update or downgrade, but issues can also arise if a long-running component loads parts of glibc later (say, indirectly via dlopen).

Any idea when this update will be released?

And of course it was released just after my post... :) Sorry for the noise.

(In reply to Alex Iribarren from comment #18)
> And of course it was released just after my post... :) Sorry for the noise.

No worries, I should have said here that the update was under way when I received word that it was.

Please let us know if there are remaining issues with updates on live systems. I think we addressed the main source of crashes, but delayed loading of libpthread with glibc 2.28 remains tricky.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (glibc bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7684
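For the long-running-process case mentioned in this thread (a process that loads further parts of glibc later, for example via dlopen), one way to spot candidates on a live system is to look for glibc files that a process still has mapped but that have since been replaced on disk. On Linux such mappings appear in /proc/<pid>/maps with a " (deleted)" suffix. The following is a minimal Python sketch, assuming the basename prefixes listed in the code cover the glibc files of interest; the helper names are hypothetical.

```python
import os

# Assumed basename prefixes for the glibc pieces named in this report
# (dynamic loader, libc, and the still-separate libpthread and libdl).
GLIBC_PREFIXES = ("ld-", "libc-", "libc.so", "libpthread", "libdl")
DELETED_SUFFIX = " (deleted)"


def stale_glibc_mappings(maps_text):
    """Return sorted paths of glibc files that are mapped but unlinked."""
    stale = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a pathname
        path = fields[5].rstrip()
        if not path.endswith(DELETED_SUFFIX):
            continue  # file is still present under this name
        path = path[: -len(DELETED_SUFFIX)]
        if os.path.basename(path).startswith(GLIBC_PREFIXES):
            stale.add(path)
    return sorted(stale)


def stale_glibc_for_pid(pid="self"):
    """Read /proc/<pid>/maps (Linux only) and report stale glibc mappings."""
    with open(f"/proc/{pid}/maps") as handle:
        return stale_glibc_mappings(handle.read())


if __name__ == "__main__":
    print(stale_glibc_for_pid())
```

A non-empty result does not mean the process is broken. It only indicates that the process started before the files were replaced, so it could run into the internal ABI mismatch if it loads libpthread or libdl afterwards.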