Bug 1372806
Summary: | bash stack smash due to longjmp back to wait_builtin after it returned | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Paulo Andrade <pandrade> | ||||||||||||||
Component: | bash | Assignee: | Siteshwar Vashisht <svashisht> | ||||||||||||||
Status: | CLOSED ERRATA | QA Contact: | Martin Kyral <mkyral> | ||||||||||||||
Severity: | high | Docs Contact: | |||||||||||||||
Priority: | urgent | ||||||||||||||||
Version: | 7.2 | CC: | fkrska, hartsjc, isenfeld, jherrman, mkolaja, mkyral, ovasik, pandrade, svashisht | ||||||||||||||
Target Milestone: | rc | Keywords: | Patch, ZStream | ||||||||||||||
Target Release: | --- | Flags: | fkrska:
needinfo+
|
||||||||||||||
Hardware: | All | ||||||||||||||||
OS: | Linux | ||||||||||||||||
Whiteboard: | |||||||||||||||||
Fixed In Version: | bash-4.2.46-26.el7 | Doc Type: | Bug Fix | ||||||||||||||
Doc Text: |
Due to a bug in trap signal handling, bash in some cases terminated unexpectedly after performing the longjmp() function from the wait_sigint_handler() function to the wait_builtin() function. This update fixes the bug and thus prevents the described crashes from occurring.
|
Story Points: | --- | ||||||||||||||
Clone Of: | |||||||||||||||||
: | 1384520 1384521 1391150 (view as bug list) | Environment: | |||||||||||||||
Last Closed: | 2017-08-01 20:32:05 UTC | Type: | Bug | ||||||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||||||
Documentation: | --- | CRM: | |||||||||||||||
Verified Versions: | Category: | --- | |||||||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||||
Embargoed: | |||||||||||||||||
Bug Depends On: | |||||||||||||||||
Bug Blocks: | 1384520, 1384521, 1391150 | ||||||||||||||||
Attachments: |
|
I am able to reliably reproduce this issue by following below steps (Scripts test.sh and kill.sh are attached with the bug): 1. Execute './test.sh'. It will print the process id of bash process that execute the script. 2. Execute './kill.sh <process id of bash process executing test.sh>' 3. kill.sh will continuously send SIGINT and SIGUSR1 signals to test.sh until it crashes. I was not able to reproduce this issue on a single cpu RHEL 7.2 docker instance, however it reproduces very reliably on a system with 4 cpus. I tried the test rpms from sfdc case 01683190, but bash still crashes. However it shows a different stacktrace : (gdb) bt #0 0x00007febd59f25f7 in ?? () #1 0x00007febd59f3ce8 in ?? () #2 0x0000000000000020 in ?? () #3 0x0000000000000000 in ?? () So it looks like the patch only partially works around the problem. Created attachment 1200619 [details]
test.sh
Created attachment 1200620 [details]
kill.sh
Created attachment 1200814 [details]
bash-4.2-sfdc01683190.patch-complete
Very good job Siteshwar :)
My first patch, now attached, works, with your reproducer.
The patch in bash-4.2.46-20.el7.sfdc01683190.3
The second patch was the one I added to the bugzilla,
because without a reproducer the second one would be more
likely to be accepted, the one in
bash-4.2.46-20.el7.sfdc01683190.4
The earlier test packages were a libasan enabled bash
and another built without optimizations and with debug
information, to make it easier to debug the problem, but
neither did reproduce the issue.
Note that the "new" patch is basically a backport of
the diff from bash 4.3, and to be really complete/correct
need to add the definition and calls to CHECK_WAIT_INTR,
but it starts to become too large of a patch :(
As per my understanding changes in nojobs.c are not required as this file is not used. Is there any specific reason you have made those changes ? The patch to nojobs.c is not required by Red Hat packages. The change was intended to be a backport of the bash-4.3 diff, but as I said, the patch is not complete, so I stopped when the diff would became too large without a reproducer... I tried to create a reproducer with a C program that would spawn a bash that would execute a script and keep sending signals to it, but tried to create a "single shot" reproducer; a brute force one would have been better as you found :) I can remake the patch if you prefer, basically add definition of CHECK_WAIT_INTR and calls to it, but the patch will likely become twice as large. Created attachment 1202537 [details]
bash-4.2-sfdc01683190.patch
For the provided test case, this patch is very close
to latest bash-4.3 and shows the same behavior.
Does not crash, but, because the test case runs in
a loop sending signals, it does eventually stack
overflow due to not handling the traps fast enough.
A variant of the minimal patch that does not crash is: diff -up bash-4.2/jobs.c.orig bash-4.2/jobs.c --- bash-4.2/jobs.c~ 2016-09-26 10:07:28.117401802 -0300 +++ bash-4.2/jobs.c 2016-09-26 12:09:07.803979913 -0300 @@ -2244,10 +2244,14 @@ wait_sigint_handler (sig) signal_is_trapped (SIGINT) && ((sigint_handler = trap_to_sighandler (SIGINT)) == trap_handler)) { - interrupt_immediately = 0; trap_handler (SIGINT); /* set pending_traps[SIGINT] */ wait_signal_received = SIGINT; - longjmp (wait_intr_buf, 1); + if (interrupt_immediately) + { + interrupt_immediately = 0; + longjmp (wait_intr_buf, 1); + } + SIGRETURN (0); } ADDINTERRUPT; Created attachment 1206497 [details]
bash-4.3-trapped-signals.patch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:1931 |
Created attachment 1197263 [details] bash-4.2-sfdc01683190.patch On special conditions it is possible that bash will jump back to wait_builtin after having returned from it. An user can easily reproduce it on their environment, but it is very difficult to catch it for example in gdb, as attaching gdb will modify timings and no longer trigger the issue. Stack looks like this: #0 0x00007ffff76215f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig); (gdb) bt #0 0x00007ffff76215f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x00007ffff7622e28 in __GI_abort () at abort.c:119 #2 0x00007ffff7661317 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff77696f4 "*** %s ***: %s terminated\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196 #3 0x00007ffff76f9b37 in __GI___fortify_fail (msg=msg@entry=0x7ffff77696dc "stack smashing detected") at fortify_fail.c:31 #4 0x00007ffff76f9b00 in __stack_chk_fail () at stack_chk_fail.c:28 #5 0x00000000004796d0 in wait_builtin (list=<optimized out>) at ./wait.def:186 #6 0x0000000000000000 in ?? () And stack protector message looks like: *** stack smashing detected ***: /bin/bash terminated ======= Backtrace: ========= /lib64/libc.so.6(__fortify_fail+0x37)[0x7ffff76f9b37] /lib64/libc.so.6(__fortify_fail+0x0)[0x7ffff76f9b00] /bin/bash[0x4796d0] ======= Memory map: ======== 00400000-004dd000 r-xp 00000000 fd:00 313150 /usr/bin/bash 006dc000-006dd000 r--p 000dc000 fd:00 313150 /usr/bin/bash 006dd000-006e6000 rw-p 000dd000 fd:00 313150 /usr/bin/bash 006e6000-0080d000 rw-p 00000000 00:00 0 [heap] 7ffff0eaf000-7ffff0ec4000 r-xp 00000000 fd:00 354651 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7ffff0ec4000-7ffff10c3000 ---p 00015000 fd:00 354651 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7ffff10c3000-7ffff10c4000 r--p 00014000 fd:00 354651 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7ffff10c4000-7ffff10c5000 rw-p 00015000 fd:00 354651 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7ffff10c5000-7ffff75ec000 r--p 00000000 fd:00 442477 /usr/lib/locale/locale-archive 7ffff75ec000-7ffff77a2000 r-xp 00000000 fd:00 354740 /usr/lib64/libc-2.17.so 7ffff77a2000-7ffff79a2000 ---p 001b6000 fd:00 354740 /usr/lib64/libc-2.17.so 7ffff79a2000-7ffff79a6000 r--p 001b6000 fd:00 354740 /usr/lib64/libc-2.17.so 7ffff79a6000-7ffff79a8000 rw-p 001ba000 fd:00 354740 /usr/lib64/libc-2.17.so 7ffff79a8000-7ffff79ad000 rw-p 00000000 00:00 0 7ffff79ad000-7ffff79b0000 r-xp 00000000 fd:00 354184 /usr/lib64/libdl-2.17.so 7ffff79b0000-7ffff7baf000 ---p 00003000 fd:00 354184 /usr/lib64/libdl-2.17.so 7ffff7baf000-7ffff7bb0000 r--p 00002000 fd:00 354184 /usr/lib64/libdl-2.17.so 7ffff7bb0000-7ffff7bb1000 rw-p 00003000 fd:00 354184 /usr/lib64/libdl-2.17.so 7ffff7bb1000-7ffff7bd6000 r-xp 00000000 fd:00 353667 /usr/lib64/libtinfo.so.5.9 7ffff7bd6000-7ffff7dd6000 ---p 00025000 fd:00 353667 /usr/lib64/libtinfo.so.5.9 7ffff7dd6000-7ffff7dda000 r--p 00025000 fd:00 353667 /usr/lib64/libtinfo.so.5.9 7ffff7dda000-7ffff7ddb000 rw-p 00029000 fd:00 353667 /usr/lib64/libtinfo.so.5.9 7ffff7ddb000-7ffff7dfc000 r-xp 00000000 fd:00 354581 /usr/lib64/ld-2.17.so 7ffff7fa9000-7ffff7fde000 r--s 00000000 fd:00 345136 /var/db/nscd/passwd 7ffff7fde000-7ffff7fe1000 rw-p 00000000 00:00 0 7ffff7ff1000-7ffff7ff2000 rw-p 00000000 00:00 0 7ffff7ff2000-7ffff7ff9000 r--s 00000000 fd:00 353531 /usr/lib64/gconv/gconv-modules.cache 7ffff7ff9000-7ffff7ffa000 rw-p 00000000 00:00 0 7ffff7ffa000-7ffff7ffc000 r-xp 00000000 00:00 0 [vdso] 7ffff7ffc000-7ffff7ffd000 r--p 00021000 fd:00 354581 /usr/lib64/ld-2.17.so 7ffff7ffd000-7ffff7ffe000 rw-p 00022000 fd:00 354581 /usr/lib64/ld-2.17.so 7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0 7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0 [stack] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] The problem is caused because the global variable this_shell_builtin may be set to wait_builtin for some time that there may be concurrence, and the code in jobs.c does not properly check the value of the other global, interrupt_immediately, that will be non zero if the setjmp is still valid. Bash-4.3 should handle this condition, but the patch is too large, so, the smallest possible patch is proposed to correct the problem, and is confirmed to correct it. Upstream change is in commit ac50fbac377e32b98d2de396f016ea81e8ee9961 but it is too large and not exactly for the problem described here. The proposed patch just adds a guard to not longjmp back in wait_builtin if it has already returned.