On bash-4.1.2-42.el6 the reproducer causes a segfault shortly after being started (typically within 20-30s), even without the 'stack smashing detected' error message:

------------------------------------->8-----
:: [ 12:06:04 ] :: Detected 1 CPU cores, the test will run for 1200s
:: [ BEGIN ] :: Test runs witn bash :: actually running 'grep bash script.exp'
spawn bash
send -- "bash test.sh\r"
:: [ PASS ] :: Test runs witn bash (Expected 0, got 0)
:: [ 12:06:04 ] :: Taking small nap
:: [ 12:06:14 ] :: test.sh running in bash with PID=7581
:: [ 12:06:14 ] :: Taking small nap
:: [ 12:06:24 ] :: kill.sh running with PID=7600
kill.sh: line 5: kill: (7581) - No such process
./runtest.sh: line 75: kill: (7600) - No such process
:: [ FAIL ] :: bash with PID 7581 dies unexpectedly after 18s
:: [ FAIL ] :: The test shall have ran for 1200s, it ended after 18s
./runtest.sh: line 91: kill: (7600) - No such process
:: [ 12:06:49 ] :: Taking small nap
:: [ PASS ] :: File 'test.log' should not contain 'stack smashing detected'
:: [ FAIL ] :: File 'test.log' should not contain 'core dumped'
Segmentation fault (core dumped)
-----8<---------------------------------------

The crash has FAF report: http://faf.lab.eng.brq.redhat.com/faf/reports/201/

ABRT-generated crash data: ftp://bordell.englab.brq.redhat.com/pub/bash-4.1.2-42.el6-ccpp-2016-11-02-11-50-34-21686.tar.gz

+++ This bug was initially created as a clone of Bug #1372806 +++

Under special conditions it is possible for bash to jump back into wait_builtin after having returned from it. A user can easily reproduce this in their own environment, but it is very difficult to catch, for example in gdb, because attaching gdb changes the timing and the issue is no longer triggered.
The stack looks like this:

#0 0x00007ffff76215f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0 0x00007ffff76215f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff7622e28 in __GI_abort () at abort.c:119
#2 0x00007ffff7661317 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff77696f4 "*** %s ***: %s terminated\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3 0x00007ffff76f9b37 in __GI___fortify_fail (msg=msg@entry=0x7ffff77696dc "stack smashing detected") at fortify_fail.c:31
#4 0x00007ffff76f9b00 in __stack_chk_fail () at stack_chk_fail.c:28
#5 0x00000000004796d0 in wait_builtin (list=<optimized out>) at ./wait.def:186
#6 0x0000000000000000 in ?? ()

And the stack protector message looks like this:

*** stack smashing detected ***: /bin/bash terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7ffff76f9b37]
/lib64/libc.so.6(__fortify_fail+0x0)[0x7ffff76f9b00]
/bin/bash[0x4796d0]
======= Memory map: ========
00400000-004dd000 r-xp 00000000 fd:00 313150 /usr/bin/bash
006dc000-006dd000 r--p 000dc000 fd:00 313150 /usr/bin/bash
006dd000-006e6000 rw-p 000dd000 fd:00 313150 /usr/bin/bash
006e6000-0080d000 rw-p 00000000 00:00 0 [heap]
7ffff0eaf000-7ffff0ec4000 r-xp 00000000 fd:00 354651 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7ffff0ec4000-7ffff10c3000 ---p 00015000 fd:00 354651 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7ffff10c3000-7ffff10c4000 r--p 00014000 fd:00 354651 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7ffff10c4000-7ffff10c5000 rw-p 00015000 fd:00 354651 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7ffff10c5000-7ffff75ec000 r--p 00000000 fd:00 442477 /usr/lib/locale/locale-archive
7ffff75ec000-7ffff77a2000 r-xp 00000000 fd:00 354740 /usr/lib64/libc-2.17.so
7ffff77a2000-7ffff79a2000 ---p 001b6000 fd:00 354740 /usr/lib64/libc-2.17.so
7ffff79a2000-7ffff79a6000 r--p 001b6000 fd:00 354740 /usr/lib64/libc-2.17.so
7ffff79a6000-7ffff79a8000 rw-p 001ba000 fd:00 354740 /usr/lib64/libc-2.17.so
7ffff79a8000-7ffff79ad000 rw-p 00000000 00:00 0
7ffff79ad000-7ffff79b0000 r-xp 00000000 fd:00 354184 /usr/lib64/libdl-2.17.so
7ffff79b0000-7ffff7baf000 ---p 00003000 fd:00 354184 /usr/lib64/libdl-2.17.so
7ffff7baf000-7ffff7bb0000 r--p 00002000 fd:00 354184 /usr/lib64/libdl-2.17.so
7ffff7bb0000-7ffff7bb1000 rw-p 00003000 fd:00 354184 /usr/lib64/libdl-2.17.so
7ffff7bb1000-7ffff7bd6000 r-xp 00000000 fd:00 353667 /usr/lib64/libtinfo.so.5.9
7ffff7bd6000-7ffff7dd6000 ---p 00025000 fd:00 353667 /usr/lib64/libtinfo.so.5.9
7ffff7dd6000-7ffff7dda000 r--p 00025000 fd:00 353667 /usr/lib64/libtinfo.so.5.9
7ffff7dda000-7ffff7ddb000 rw-p 00029000 fd:00 353667 /usr/lib64/libtinfo.so.5.9
7ffff7ddb000-7ffff7dfc000 r-xp 00000000 fd:00 354581 /usr/lib64/ld-2.17.so
7ffff7fa9000-7ffff7fde000 r--s 00000000 fd:00 345136 /var/db/nscd/passwd
7ffff7fde000-7ffff7fe1000 rw-p 00000000 00:00 0
7ffff7ff1000-7ffff7ff2000 rw-p 00000000 00:00 0
7ffff7ff2000-7ffff7ff9000 r--s 00000000 fd:00 353531 /usr/lib64/gconv/gconv-modules.cache
7ffff7ff9000-7ffff7ffa000 rw-p 00000000 00:00 0
7ffff7ffa000-7ffff7ffc000 r-xp 00000000 00:00 0 [vdso]
7ffff7ffc000-7ffff7ffd000 r--p 00021000 fd:00 354581 /usr/lib64/ld-2.17.so
7ffff7ffd000-7ffff7ffe000 rw-p 00022000 fd:00 354581 /usr/lib64/ld-2.17.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0 [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

The problem arises because the global variable this_shell_builtin may stay set to wait_builtin during a window in which a race is possible, and the code in jobs.c does not properly check the value of the other global, interrupt_immediately, which is non-zero while the setjmp is still valid. Bash-4.3 should handle this condition, but that patch is too large, so the smallest possible patch is proposed instead; it is confirmed to correct the problem.
The upstream change is in commit ac50fbac377e32b98d2de396f016ea81e8ee9961, but it is too large and does not target exactly the problem described here. The proposed patch just adds a guard so that bash does not longjmp back into wait_builtin after it has already returned.
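
For illustration only, the idea of guarding a longjmp can be sketched in a standalone C program. This is not the proposed patch and not bash source: every name below (wait_env, wait_env_valid, on_sigint, demo_wait, demo_run) is hypothetical, and the flag is only assumed to play a role similar to the check described above. The handler jumps only while the frame that called sigsetjmp is still live; once the function has returned, the jump is skipped instead of landing in a dead stack frame.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* All identifiers here are hypothetical; none are taken from bash. */
static sigjmp_buf wait_env;                   /* jump target saved by demo_wait */
static volatile sig_atomic_t wait_env_valid;  /* non-zero only while demo_wait's frame is live */
static volatile sig_atomic_t jumps_taken;     /* handler jumped back into demo_wait */
static volatile sig_atomic_t jumps_skipped;   /* handler refused to jump (frame gone) */

/* Signal handler: jump only if the saved environment is still valid. */
static void on_sigint(int sig)
{
    (void)sig;
    if (wait_env_valid) {
        jumps_taken++;
        siglongjmp(wait_env, 1);
    }
    jumps_skipped++;
}

/* Stand-in for a builtin that can be interrupted.
   Returns 1 if it was interrupted, 0 if it finished normally. */
static int demo_wait(int interrupt)
{
    if (sigsetjmp(wait_env, 1) != 0) {
        wait_env_valid = 0;   /* arrived here via siglongjmp */
        return 1;
    }
    wait_env_valid = 1;
    if (interrupt)
        raise(SIGINT);        /* simulate a signal arriving mid-wait */
    wait_env_valid = 0;       /* invalidate before the frame disappears */
    return 0;
}

/* Install the handler with sigaction; returns 0 on success. */
static int install_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Exercise the interrupted path, a late signal, and the normal path. */
static void demo_run(void)
{
    int rc = install_handler();
    assert(rc == 0);

    rc = demo_wait(1);        /* signal arrives while the frame is live */
    assert(rc == 1);

    raise(SIGINT);            /* signal arrives after demo_wait returned */

    rc = demo_wait(0);        /* no signal: normal completion */
    assert(rc == 0);
    (void)rc;
}
```

Calling demo_run() from a main function exercises all three cases. Without the wait_env_valid check, the second raise(SIGINT) would siglongjmp into a frame that no longer exists, which is undefined behaviour of the same general kind as the stack corruption reported here.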
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available. The official life cycle policy can be reviewed here: http://redhat.com/rhel/lifecycle This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL: https://access.redhat.com/