1391150 – bash stack smash due to longjmp back to wait_builtin after it returned

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	bash
Sub Component:
Version:	6.9
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Siteshwar Vashisht
QA Contact:	Martin Kyral
Docs Contact:
URL:
Whiteboard:
Depends On:	1372806
Blocks:

Reported:	2016-11-02 16:11 UTC by Martin Kyral
Modified:	2017-12-06 12:12 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1372806
Environment:
Last Closed:	2017-12-06 12:12:45 UTC
Target Upstream Version:
Embargoed:

Description Martin Kyral 2016-11-02 16:11:30 UTC

On bash-4.1.2-42.el6 the reproducer causes a segfault shortly after it is started (typically within 20-30 s), even without the 'stack smashing detected' error message:

------------------------------------->8-----
:: [ 12:06:04 ] :: Detected 1 CPU cores, the test will run for 1200s
:: [  BEGIN   ] :: Test runs witn bash :: actually running 'grep bash script.exp'
spawn bash
send -- "bash test.sh\r"
:: [   PASS   ] :: Test runs witn bash (Expected 0, got 0)
:: [ 12:06:04 ] :: Taking small nap
:: [ 12:06:14 ] :: test.sh running in bash with PID=7581
:: [ 12:06:14 ] :: Taking small nap
:: [ 12:06:24 ] :: kill.sh running with PID=7600
kill.sh: line 5: kill: (7581) - No such process
./runtest.sh: line 75: kill: (7600) - No such process
:: [   FAIL   ] :: bash with PID 7581 dies unexpectedly after 18s 
:: [   FAIL   ] :: The test shall have ran for 1200s, it ended after 18s 
./runtest.sh: line 91: kill: (7600) - No such process
:: [ 12:06:49 ] :: Taking small nap
:: [   PASS   ] :: File 'test.log' should not contain 'stack smashing detected' 
:: [   FAIL   ] :: File 'test.log' should not contain 'core dumped' 
Segmentation fault (core dumped)
-----8<---------------------------------------

The crash has FAF report: http://faf.lab.eng.brq.redhat.com/faf/reports/201/
ABRT-generated crash data: ftp://bordell.englab.brq.redhat.com/pub/bash-4.1.2-42.el6-ccpp-2016-11-02-11-50-34-21686.tar.gz

+++ This bug was initially created as a clone of Bug #1372806 +++

Under certain conditions, bash can longjmp back into
wait_builtin after that function has already returned.

  A user can easily reproduce this in their own environment,
but it is very difficult to catch in gdb, for example,
because attaching gdb changes the timing and the issue
no longer triggers.

Stack looks like this:
#0  0x00007ffff76215f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007ffff76215f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff7622e28 in __GI_abort () at abort.c:119
#2  0x00007ffff7661317 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff77696f4 "*** %s ***: %s terminated\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007ffff76f9b37 in __GI___fortify_fail (msg=msg@entry=0x7ffff77696dc "stack smashing detected") at fortify_fail.c:31
#4  0x00007ffff76f9b00 in __stack_chk_fail () at stack_chk_fail.c:28
#5  0x00000000004796d0 in wait_builtin (list=<optimized out>) at ./wait.def:186
#6  0x0000000000000000 in ?? ()

And stack protector message looks like:
*** stack smashing detected ***: /bin/bash terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7ffff76f9b37]
/lib64/libc.so.6(__fortify_fail+0x0)[0x7ffff76f9b00]
/bin/bash[0x4796d0]
======= Memory map: ========
00400000-004dd000 r-xp 00000000 fd:00 313150                             /usr/bin/bash
006dc000-006dd000 r--p 000dc000 fd:00 313150                             /usr/bin/bash
006dd000-006e6000 rw-p 000dd000 fd:00 313150                             /usr/bin/bash
006e6000-0080d000 rw-p 00000000 00:00 0                                  [heap]
7ffff0eaf000-7ffff0ec4000 r-xp 00000000 fd:00 354651                     /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7ffff0ec4000-7ffff10c3000 ---p 00015000 fd:00 354651                     /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7ffff10c3000-7ffff10c4000 r--p 00014000 fd:00 354651                     /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7ffff10c4000-7ffff10c5000 rw-p 00015000 fd:00 354651                     /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7ffff10c5000-7ffff75ec000 r--p 00000000 fd:00 442477                     /usr/lib/locale/locale-archive
7ffff75ec000-7ffff77a2000 r-xp 00000000 fd:00 354740                     /usr/lib64/libc-2.17.so
7ffff77a2000-7ffff79a2000 ---p 001b6000 fd:00 354740                     /usr/lib64/libc-2.17.so
7ffff79a2000-7ffff79a6000 r--p 001b6000 fd:00 354740                     /usr/lib64/libc-2.17.so
7ffff79a6000-7ffff79a8000 rw-p 001ba000 fd:00 354740                     /usr/lib64/libc-2.17.so
7ffff79a8000-7ffff79ad000 rw-p 00000000 00:00 0
7ffff79ad000-7ffff79b0000 r-xp 00000000 fd:00 354184                     /usr/lib64/libdl-2.17.so
7ffff79b0000-7ffff7baf000 ---p 00003000 fd:00 354184                     /usr/lib64/libdl-2.17.so
7ffff7baf000-7ffff7bb0000 r--p 00002000 fd:00 354184                     /usr/lib64/libdl-2.17.so
7ffff7bb0000-7ffff7bb1000 rw-p 00003000 fd:00 354184                     /usr/lib64/libdl-2.17.so
7ffff7bb1000-7ffff7bd6000 r-xp 00000000 fd:00 353667                     /usr/lib64/libtinfo.so.5.9
7ffff7bd6000-7ffff7dd6000 ---p 00025000 fd:00 353667                     /usr/lib64/libtinfo.so.5.9
7ffff7dd6000-7ffff7dda000 r--p 00025000 fd:00 353667                     /usr/lib64/libtinfo.so.5.9
7ffff7dda000-7ffff7ddb000 rw-p 00029000 fd:00 353667                     /usr/lib64/libtinfo.so.5.9
7ffff7ddb000-7ffff7dfc000 r-xp 00000000 fd:00 354581                     /usr/lib64/ld-2.17.so
7ffff7fa9000-7ffff7fde000 r--s 00000000 fd:00 345136                     /var/db/nscd/passwd
7ffff7fde000-7ffff7fe1000 rw-p 00000000 00:00 0
7ffff7ff1000-7ffff7ff2000 rw-p 00000000 00:00 0
7ffff7ff2000-7ffff7ff9000 r--s 00000000 fd:00 353531                     /usr/lib64/gconv/gconv-modules.cache
7ffff7ff9000-7ffff7ffa000 rw-p 00000000 00:00 0
7ffff7ffa000-7ffff7ffc000 r-xp 00000000 00:00 0                          [vdso]
7ffff7ffc000-7ffff7ffd000 r--p 00021000 fd:00 354581                     /usr/lib64/ld-2.17.so
7ffff7ffd000-7ffff7ffe000 rw-p 00022000 fd:00 354581                     /usr/lib64/ld-2.17.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

  The problem arises because the global variable
this_shell_builtin may stay set to wait_builtin for a
window of time in which a race is possible, and the code in
jobs.c does not properly check the value of the other
global, interrupt_immediately, which is nonzero
while the setjmp is still valid.

  Bash-4.3 should handle this condition, but the upstream
change, commit ac50fbac377e32b98d2de396f016ea81e8ee9961,
is too large and does not target exactly the problem
described here. The smallest possible patch is therefore
proposed instead, and it is confirmed to correct the problem.
  The proposed patch just adds a guard so that bash does not
longjmp back into wait_builtin after it has already returned.
Comment 5 Jan Kurik 2017-12-06 12:12:45 UTC

Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/
