Bug 2119304 - glibc: Upgrading to glibc-2.28-209.el8.x86_64 causes segfaults during concurrent process launch
Keywords:
Status: CLOSED ERRATA
Alias: None
Deadline: 2022-08-29
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: glibc
Version: CentOS Stream
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Florian Weimer
QA Contact: Martin Coufal
URL:
Whiteboard:
Depends On:
Blocks: 2121536
Reported: 2022-08-18 09:14 UTC by Ben Morrice
Modified: 2023-07-18 14:30 UTC (History)
CC List: 15 users

Fixed In Version: glibc-2.28-211.el8
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2121536
Environment:
Last Closed: 2022-11-08 10:43:12 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHELPLAN-131400 | 0 | None | None | None | 2022-08-18 09:19:39 UTC
Red Hat Product Errata | RHBA-2022:7684 | 0 | None | None | None | 2022-11-08 10:43:17 UTC

Internal Links: 2149994

Description Ben Morrice 2022-08-18 09:14:33 UTC
Description of problem:

Version-Release number of selected component (if applicable):

glibc-2.28-209.el8.x86_64

How reproducible:

Easily / every time

Steps to Reproduce:

1. Have glibc-2.28-208.el8.x86_64 (or lower) installed.
2. Run a simple script such as:

#!/bin/bash
while true; do
  /bin/true
  sleep 0.05
done

3. Upgrade glibc to glibc-2.28-209.el8.x86_64.

This causes the processes launched by the script, such as '/bin/true', to segfault.

Example dmesg output:

[ 2264.221694] show_signal: 10 callbacks suppressed
[ 2264.221709] traps: true[36840] general protection fault ip:7f36ffe01d83 sp:7ffe42ce05e0 error:0 in libc-2.28.so[7f36ffdc7000+1bc000]
[ 2264.230469] traps: systemd-coredum[36841] general protection fault ip:7f2b6dabfd83 sp:7fffdd63f360 error:0 in libc-2.28.so[7f2b6da85000+1bc000]
[ 2264.230501] Process 36841(systemd-coredum) has RLIMIT_CORE set to 1
[ 2264.230503] Aborting core
[ 2264.232424] traps: sleep[36842] general protection fault ip:7fcc20367d83 sp:7fff0edec260 error:0 in libc-2.28.so[7fcc2032d000+1bc000]
[ 2264.238065] traps: systemd-coredum[36843] general protection fault ip:7f3235f57d83 sp:7ffcb67e90c0 error:0 in libc-2.28.so[7f3235f1d000+1bc000]
[ 2264.238095] Process 36843(systemd-coredum) has RLIMIT_CORE set to 1
[ 2264.238097] Aborting core
[ 2264.239801] traps: true[36844] general protection fault ip:7f95eec16d83 sp:7ffc2267d190 error:0 in libc-2.28.so[7f95eebdc000+1bc000]
[ 2264.244854] traps: systemd-coredum[36845] general protection fault ip:7f1517fe3d83 sp:7ffd75dbf3d0 error:0 in libc-2.28.so[7f1517fa9000+1bc000]
[ 2264.244876] Process 36845(systemd-coredum) has RLIMIT_CORE set to 1
[ 2264.244878] Aborting core
[ 2264.246514] traps: sleep[36846] general protection fault ip:7fa2cf43bd83 sp:7fff41aee840 error:0 in libc-2.28.so[7fa2cf401000+1bc000]
[ 2264.251660] traps: systemd-coredum[36847] general protection fault ip:7f36951c4d83 sp:7ffefb45ebf0 error:0 in libc-2.28.so[7f369518a000+1bc000]
[ 2264.251684] Process 36847(systemd-coredum) has RLIMIT_CORE set to 1
[ 2264.251685] Aborting core
[ 2264.253329] traps: true[36848] general protection fault ip:7fdd93ae0d83 sp:7fffeab980d0 error:0 in libc-2.28.so[7fdd93aa6000+1bc000]
[ 2264.258431] traps: systemd-coredum[36849] general protection fault ip:7f950145ed83 sp:7ffef6c5b960 error:0 in libc-2.28.so[7f9501424000+1bc000]
[ 2264.258455] Process 36849(systemd-coredum) has RLIMIT_CORE set to 1
[ 2264.258456] Aborting core
[ 2264.265079] Process 36851(systemd-coredum) has RLIMIT_CORE set to 1
[ 2264.265082] Aborting core
[ 2264.271712] Process 36853(systemd-coredum) has RLIMIT_CORE set to 1
[ 2264.271715] Aborting core
[ 2264.278745] Process 36855(systemd-coredum) has RLIMIT_CORE set to 1
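
To see at a glance which commands are affected in output like the above, the fault lines can be tallied per command name. This is only a minimal sketch: the function name `summarize_faults` is hypothetical, and it assumes the kernel log lines keep the `traps: NAME[PID] general protection fault` shape shown in this report.

```bash
# Hypothetical helper: count "general protection fault" lines per command
# name in dmesg-style text read from standard input.
summarize_faults() {
  awk '
    /traps: .* general protection fault/ {
      # The field after "traps:" looks like "true[36840]"; strip the "[pid]" suffix.
      for (i = 1; i <= NF; i++) {
        if ($i == "traps:") {
          name = $(i + 1)
          sub(/\[[0-9]+\]$/, "", name)
          count[name]++
          break
        }
      }
    }
    END {
      # One line per command: "<count> <command>".
      for (name in count) print count[name], name
    }
  ' | sort -k1,1nr -k2,2
}
```

It can be fed live output (`dmesg | summarize_faults`) or a saved log file. Lines such as "Aborting core" are ignored because they do not match the pattern.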

Actual results:

processes seg fault

Expected results:

processes should not seg fault

Additional info:

The script referred to above is just an example.
We are seeing this behaviour across a wide range of processes whilst glibc is upgraded on many systems.

Comment 1 Florian Weimer 2022-08-18 11:43:50 UTC
The new ld.so cannot load the old libc. There is a brief time window during which RPM has already renamed the new ld.so into place, but the file system still has the old libc.so (actual file names differ). This is not a new issue: we have hit the same bug before during the lifetime of Red Hat Enterprise Linux 8, as we implemented other dynamic loader enhancements.

I'm not sure what we can do about this.

Comment 2 Florian Weimer 2022-08-19 11:26:06 UTC
I posted some upstream patches to detect this situation, avoid the coredump, and print a clear error message (“Fatal glibc error: ld.so/libc.so mismatch detected”):

[PATCH 0/2] Check ld.so/libc.so consistency during startup
<https://sourceware.org/pipermail/libc-alpha/2022-August/141525.html>

During the update, processes will still fail to start (that is going to be much harder to fix because upstream needs to be sufficiently distribution-agnostic), but the failures do not result in coredumps, so the secondary crashes from systemd-coredump are gone.

Comment 3 Florian Weimer 2022-08-24 11:36:29 UTC
The upstream change works to mask the segmentation fault in the reproducer (and a couple of related ones) with our 2.28-based glibc, but we have additional ABI exposure due to our still-separate libpthread and libdl downstream (merged into libc upstream in 2.34). I'm inclined to move forward with the coredump suppression logic we can implement today, although it is incomplete.

Comment 10 Florian Weimer 2022-08-26 12:21:48 UTC
Here is what we are changing:

Updates from -208 and earlier to -211 on aarch64 and x86_64 are expected not to cause any disruptions (even in the presence of concurrent process creation) because we have taken steps to minimize internal ABI impact coming from the LD_AUDIT changes (internal bug 2047981). On ppc64le POWER9, an error message “undefined symbol: _dl_audit_symbind_alt, version GLIBC_PRIVATE” might be printed due to a different shared object upgrade order on this platform.

For s390x, crash-free upgrades are possible from version -202 and earlier. This is because bug 2077835 already introduced a private ABI change.

Because we have reverted the internal ABI changes, upgrades from -209 and later (-203 and later for s390x) to -211 will likely crash once more if processes are created concurrently with an update. Downgrades are negatively impacted as well. There is really no good way to avoid either set of issues. We investigated version fingerprinting, but the impact on static dlopen was eventually deemed too high.

Please note that this ABI inconsistency is restricted to the internal glibc ABI, not the public ABI. The issue materializes if a process loads different components of glibc at different times. (This can happen because glibc is split across several files internally.) Typically, this is triggered if a process launches concurrently with a glibc update or downgrade, but it is also possible that issues arise if a long-running component loads parts of glibc later (say indirectly via dlopen).
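
The version rules in this comment can be summarized in a small helper. This is a rough sketch, not an official tool: the function name `glibc_211_upgrade_risk` and its output labels are hypothetical, and the thresholds are one reading of the text above (-208 and earlier for x86_64 and aarch64, -202 and earlier for s390x). ppc64le is deliberately left unclassified because the comment only says an error message might be printed there.

```bash
# Hypothetical helper: classify an upgrade to glibc-2.28-211.el8.
# Arguments: architecture, currently installed release number (e.g. 209).
glibc_211_upgrade_risk() {
  local arch=$1 release=$2 safe_max
  case "$arch" in
    x86_64|aarch64) safe_max=208 ;;  # -208 and earlier: expected not to disrupt
    s390x)          safe_max=202 ;;  # -202 and earlier: crash-free
    *) echo "unknown"; return 0 ;;   # e.g. ppc64le: not classified here
  esac
  if (( release <= safe_max )); then
    echo "expected-clean"
  elif (( release < 211 )); then
    # Installed build carries the internal ABI changes that -211 reverted.
    echo "may-crash"
  else
    echo "not-an-upgrade"
  fi
}
```

For example, `glibc_211_upgrade_risk x86_64 209` prints `may-crash`, which matches the situation described in this report where processes are created while the update runs.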

Comment 16 Alex Iribarren 2022-09-13 11:31:13 UTC
Any idea when this update will be released?

Comment 18 Alex Iribarren 2022-09-14 08:25:33 UTC
And of course it was released just after my post... :) Sorry for the noise.

Comment 19 Florian Weimer 2022-09-14 08:31:46 UTC
(In reply to Alex Iribarren from comment #18)
> And of course it was released just after my post... :) Sorry for the noise.

No worries, I should have said here that the update was under way when I received word that it was.

Please let us know if there are remaining issues with updates on live systems. I think we addressed the main source of crashes, but delayed loading of libpthread with glibc 2.28 remains tricky.

Comment 21 errata-xmlrpc 2022-11-08 10:43:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glibc bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7684

