Description of problem:
A customer hit a high CPU consumption issue in bash, which occurs when bash tries to reap a dead child.
Taking coredumps at regular intervals, we could see from the backtraces that bash is spinning in bgp_delete():
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
#0 0x000055b4ce1a64cf in bgp_delete (pid=pid@entry=2074684) at jobs.c:879
#1 0x000055b4ce1aa7d1 in make_child (command=0x55b4d0c77ed0 "cut -f 1 -d '|'", async_p=async_p@entry=0)
at jobs.c:2104
#2 0x000055b4ce195e98 in execute_simple_command (simple_command=0x55b4d01725a0, pipe_in=pipe_in@entry=3,
pipe_out=pipe_out@entry=-1, async=async@entry=0, fds_to_close=fds_to_close@entry=0x55b4d0d17110)
at /usr/include/bits/string_fortified.h:90
#3 0x000055b4ce1981a6 in execute_command_internal (command=0x55b4d0175140, asynchronous=0, pipe_in=3, pipe_out=-1,
fds_to_close=0x55b4d0d17110) at execute_cmd.c:819
#4 0x000055b4ce19b157 in execute_pipeline (command=command@entry=0x55b4d016c860, asynchronous=asynchronous@entry=0,
pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x55b4d0d17110)
at execute_cmd.c:2502
[...]
(gdb) list
874 if (bgpids.storage == 0 || bgpids.nalloc == 0 || bgpids.npid == 0)
875 return 0;
876
877 /* Search chain using hash to find bucket in pidstat_table */
878 for (psi = *(pshash_getbucket (pid)); psi != NO_PIDSTAT; psi = bgpids.storage[psi].bucket_next)
879 if (bgpids.storage[psi].pid == pid)
880 break;
881
882 if (psi == NO_PIDSTAT)
883 return 0; /* not found */
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
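A search loop like the one at jobs.c:878 only terminates when it finds the pid or reaches NO_PIDSTAT, so it would spin forever if the bucket_next links ever formed a cycle that does not contain the pid being searched for. The following is a minimal sketch of that failure mode and of a bounded walk that detects it. It is not bash's code: `struct slot`, `find_bounded`, and the three-entry tables in the usage note are hypothetical, and only the names `NO_PIDSTAT` and `bucket_next` are borrowed from the listing above.

```c
#include <assert.h>

#define NO_PIDSTAT (-1)  /* end-of-chain sentinel, named as in the listing */

/* Hypothetical stand-in for one entry of an index-linked hash chain. */
struct slot {
  int pid;
  int bucket_next;  /* index of the next entry in the chain, or NO_PIDSTAT */
};

/*
 * Walk the chain starting at index `head`, looking for `pid`.
 * Returns the index of the matching entry, or NO_PIDSTAT if not found.
 * An acyclic chain can visit at most `nslots` entries, so visiting more
 * than that means the links form a loop; in that case *looped is set to 1
 * and the walk stops instead of spinning.
 */
static int find_bounded(const struct slot *storage, int nslots,
                        int head, int pid, int *looped)
{
  int psi = head;
  int steps = 0;

  assert(storage != 0 && looped != 0 && nslots > 0);
  *looped = 0;

  while (psi != NO_PIDSTAT) {
    if (storage[psi].pid == pid)
      return psi;               /* found */
    if (++steps > nslots) {
      *looped = 1;              /* more visits than entries: cycle */
      return NO_PIDSTAT;
    }
    psi = storage[psi].bucket_next;
  }
  return NO_PIDSTAT;            /* clean end of chain, not found */
}
```

With a healthy chain such as `{{100,1},{200,2},{300,NO_PIDSTAT}}`, a lookup of a missing pid ends at the sentinel. With a corrupted chain such as `{{100,1},{200,2},{300,0}}`, the unbounded loop from the listing would never return for a missing pid, while the bounded walk reports the loop after visiting more entries than the table holds.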
The issue started appearing after the fix for BZ #1890888 was applied, but since we don't have a reproducer yet, this is hard to verify.
We also found that the issue was reported upstream and on Ubuntu:
https://bugs.launchpad.net/ubuntu/+source/bash/+bug/1822776
https://lists.gnu.org/archive/html/bug-bash/2017-02/msg00025.html
It was patched by https://ftp.gnu.org/gnu/bash/bash-4.4-patches/bash44-020
Please backport this ASAP, since we cannot know whether other customers will hit this and the consequences are severe:
- zombies accumulating
- high CPU consumption
Version-Release number of selected component (if applicable):
bash-4.4.19-12.el8.x86_64 (RHEL8.3)
bash-4.4.19-11.el8_1.x86_64 (RHEL8.1)
How reproducible:
Unknown; the upstream reproducer seems to take days or weeks.
Comment 21 Siteshwar Vashisht
2021-04-08 16:22:42 UTC
*** Bug 1944734 has been marked as a duplicate of this bug. ***
Just wanted to bring attention to the fact that this particular beast has been spotted in the wild again (case #02920491). Any extra attention paid to this will be greatly appreciated. Thank you.
I also meant to mention that the customer in the new case from my earlier update took the upstream patch, added it to our .src.rpm, rebuilt the package, and is now testing it. They are "testing" it in production, which they know makes that bash officially unsupported, but it is testing nonetheless.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (bash bug fix and enhancement update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:4495