Bug 1487748

Summary: bash crash due to async unsafe calls to malloc/free
Product: Red Hat Enterprise Linux 6 Reporter: Paulo Andrade <pandrade>
Component: bashAssignee: Siteshwar Vashisht <svashisht>
Status: CLOSED WONTFIX QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.9CC: ashankar, fweimer, kdudka, mnewsome, pfrankli, svashisht
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-06 13:23:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1461138    
Attachments:
Description Flags
bug1487748.patch kdudka: review+

Description Paulo Andrade 2017-09-01 19:32:32 UTC
While attempting to create a reproducer to a crash due to
"async unsafe" access to gettext related functions, I created
a test case that has a fail rate of around 10% in rhel6, and
around 1% on rhel7:

$ cat 01915026.sh
#!/bin/bash
function handleResult()
{
  # do nothing
  :
}
function killIt()
{
    for i in $(seq 250):; do
	kill -1 $1 || exit
	sleep .01
    done
}
killIt $$ &
for i in $(seq 250):; do
  trap "handleResult 1" 1 SIGQUIT SIGTSTP 15     
  read -p "reading..." foo < /proc/$$/mem # cause I/O error
  trap "handleResult 1" 1 15
  trap "" SIGQUIT SIGTSTP
done

$ for i in $(seq 100); do gdb -ex "handle SIGHUP nostop noprint" -ex "run" -ex "q" --args bash ./01915026.sh ; done

if any error happens, gdb will stop, or get stuck in an infinite
loop in malloc code. On rhel7 I got like 1% failure rate, and could
not trigger it (on not too many runs) in Fedora 26.

Comment 2 Siteshwar Vashisht 2017-09-06 10:48:14 UTC
This is the backtrace I receive in RHEL 6:

(gdb) bt
#0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x00000032f0e7d2eb in _L_lock_10288 () from /lib64/libc.so.6
#2  0x00000032f0e7ab13 in __libc_malloc (bytes=218793304352) at malloc.c:3665
#3  0x0000000000466743 in xmalloc (bytes=16) at xmalloc.c:112
#4  0x000000000041df3e in save_token_state () at ./parse.y:1746
#5  0x0000000000450e18 in run_pending_traps () at trap.c:342
#6  0x000000000045101d in trap_handler (sig=1) at trap.c:414
#7  <signal handler called>
#8  _int_malloc (av=0x32f118e120, bytes=8) at malloc.c:4341
#9  0x00000032f0e7aaac in __libc_malloc (bytes=8) at malloc.c:3667
#10 0x0000000000466743 in xmalloc (bytes=8) at xmalloc.c:112
#11 0x0000000000443dca in dequote_string (string=0x8f46a0 "SIGQUIT") at subst.c:3529
#12 0x000000000044a58c in glob_expand_word_list (list=<value optimized out>, eflags=31) at subst.c:8853
#13 expand_word_list_internal (list=<value optimized out>, eflags=31) at subst.c:9134
#14 0x000000000042ee9f in execute_simple_command (simple_command=0x8ee3c0, pipe_in=-1, pipe_out=-1, async=<value optimized out>, 
    fds_to_close=<value optimized out>) at execute_cmd.c:3656
#15 0x0000000000430463 in execute_command_internal (command=0x8ece40, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, 
    fds_to_close=0x8eee00) at execute_cmd.c:741
#16 0x00000000004335b3 in execute_connection (command=0x8ecf80, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x8eee00)
    at execute_cmd.c:2242
#17 0x00000000004303bd in execute_command_internal (command=0x8ecf80, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, 
    fds_to_close=0x8eee00) at execute_cmd.c:896
#18 0x000000000043110e in execute_command (command=0x8ecf80) at execute_cmd.c:379
#19 0x0000000000430e09 in execute_for_command (command=0x8ed760, asynchronous=<value optimized out>, 
    pipe_in=<value optimized out>, pipe_out=<value optimized out>, fds_to_close=0x0) at execute_cmd.c:2425
#20 execute_command_internal (command=0x8ed760, asynchronous=<value optimized out>, pipe_in=<value optimized out>, 
    pipe_out=<value optimized out>, fds_to_close=0x0) at execute_cmd.c:808
#21 0x000000000043110e in execute_command (command=0x8ed760) at execute_cmd.c:379
#22 0x000000000041d6d6 in reader_loop () at eval.c:153
#23 0x000000000041cebc in main (argc=<value optimized out>, argv=0x7fffffffe668, env=0x7fffffffe680) at shell.c:759

I suspect it is the same problem as described in https://sourceware.org/bugzilla/show_bug.cgi?id=13699

Comment 3 Florian Weimer 2017-09-06 11:24:32 UTC
This is a bash bug.  Calling malloc from a signal handler is not supported because malloc is not async-signal-safe:

#3  0x0000000000466743 in xmalloc (bytes=16) at xmalloc.c:112
#4  0x000000000041df3e in save_token_state () at ./parse.y:1746
#5  0x0000000000450e18 in run_pending_traps () at trap.c:342
#6  0x000000000045101d in trap_handler (sig=1) at trap.c:414
#7  <signal handler called>
#8  _int_malloc (av=0x32f118e120, bytes=8) at malloc.c:4341
#9  0x00000032f0e7aaac in __libc_malloc (bytes=8) at malloc.c:3667
#10 0x0000000000466743 in xmalloc (bytes=8) at xmalloc.c:112

bash attempts to delay execution of the signal handler using the interrupt_immediately flag, but I suspect that it is incorrectly set in this case.

Comment 4 Siteshwar Vashisht 2017-09-06 11:40:05 UTC
Yes, we noticed it and assigned back to bash.

Comment 5 Siteshwar Vashisht 2017-09-06 22:30:52 UTC
Signal handling had been reworked in bash-4.3 to avoid such issues [1]. 

"This release fixes many outstanding bugs in bash-4.2 and introduces several
new features.  The most significant bug fix is the reworking of signal
handling to avoid running signal and trap handlers in a signal handler
context.  This led to issues with glibc, which uses internal locks
extensively and handles longjmps from user code very poorly."

[1] https://lists.gnu.org/archive/html/bug-bash/2014-02/msg00081.html

Comment 7 Siteshwar Vashisht 2017-09-07 14:15:49 UTC
A bit simplified reproducer from comment 0:

#!/bin/bash                      
function handleResult()          
{                                
  # do nothing                   
  :                              
}                                
function killIt()                
{                                
    sleep 1                      
    for i in $(seq 25000):; do   
        kill -1 $1 || exit       
        sleep .01                
    done                         
}                                
killIt $$ &                      
for i in $(seq 25000):; do       
  trap "handleResult 1" 1 SIGQUIT SIGTSTP 15                      
  read -p "reading..." foo < /proc/$$/mem # cause I/O error       
  trap "handleResult 1" 1 15     
  trap "" SIGQUIT SIGTSTP        
done

Comment 8 Siteshwar Vashisht 2017-09-08 13:18:53 UTC
Created attachment 1323746 [details]
bug1487748.patch

Decrement interrupt_immediately and terminate_immediately if read fails.

Comment 9 Siteshwar Vashisht 2017-09-08 13:21:36 UTC
interrupt_immediately was not decremented if read returned due to error. Repeated read failures resulted into interrupt_immediately being set to a very high value. 

From the gdb session :

(gdb) p interrupt_immediately    
$3 = 23533                       


Patch in comment 8 fixes it.

Comment 10 Kamil Dudka 2017-09-08 15:27:18 UTC
Comment on attachment 1323746 [details]
bug1487748.patch

Looks good to me.  Does it mean we have the same bug in RHEL-7 bash?

Comment 11 Siteshwar Vashisht 2017-09-08 15:47:55 UTC
I have not been able to reproduce it in RHEL 7.4. However the code is same in read builtin, so this bug may exist there, but it's not as easily reproducible as RHEL 6.9.

Comment 12 Paulo Andrade 2017-09-13 12:28:50 UTC
  On a vm not fully updated, used to debug some other issues
and using the original reproducer:

$ rpm -q bash glibc
bash-4.2.46-19.el7.x86_64
glibc-2.17-106.el7_2.8.x86_64
glibc-2.17-106.el7_2.8.i686

after blocked for 10 sec:

...
foo.sh: line 17: read: read error: 0: Input/output error
^C
Program received signal SIGINT, Interrupt.
__lll_lock_wait_private ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
95	2:	movl	%edx, %eax
A debugging session is active.

	Inferior 1 [process 7373] will be killed.

Quit anyway? (y or n) n
Not confirmed.
(gdb) bt
#0  __lll_lock_wait_private ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1  0x00007ffff766e1c7 in _L_lock_14687 () at malloc.c:5252
#2  0x00007ffff766b2c3 in __GI___libc_malloc (bytes=140737347479392)
    at malloc.c:2893
#3  0x000000000046a22b in xmalloc (bytes=bytes@entry=32) at xmalloc.c:112
#4  0x0000000000458d4b in array_create () at array.c:89
#5  0x0000000000458d91 in array_copy (a=0x941cf0) at array.c:138
#6  0x000000000043ad0b in save_pipestatus_array () at variables.c:4777
#7  0x0000000000453f11 in run_pending_traps () at trap.c:287
#8  0x00000000004541c2 in trap_handler (sig=1) at trap.c:427
#9  <signal handler called>
#10 0x00007ffff7668d74 in _int_malloc (
    av=av@entry=0x7ffff79a6760 <main_arena>, bytes=bytes@entry=112)
    at malloc.c:3316
#11 0x00007ffff766b26c in __GI___libc_malloc (bytes=112) at malloc.c:2895
#12 0x000000000046a22b in xmalloc (bytes=bytes@entry=112) at xmalloc.c:112
#13 0x000000000044f2cf in expand_word_internal (word=0x941b90, 
    quoted=quoted@entry=0, isexp=isexp@entry=0, 
    contains_dollar_at=contains_dollar_at@entry=0x7fffffffd8b4, 
    expanded_something=expanded_something@entry=0x7fffffffd8b0) at subst.c:7951
#14 0x0000000000451c02 in shell_expand_word_list (eflags=31, tlist=0x6f7510)
    at subst.c:9254
#15 expand_word_list_internal (list=<optimized out>, eflags=eflags@entry=31)
    at subst.c:9371
#16 0x00000000004523fa in expand_words (list=<optimized out>) at subst.c:8993
#17 0x0000000000430b00 in execute_simple_command (
    simple_command=<optimized out>, pipe_in=pipe_in@entry=-1, 
    pipe_out=pipe_out@entry=-1, async=async@entry=0, 
    fds_to_close=fds_to_close@entry=0x941830) at execute_cmd.c:3787
#18 0x00000000004326ab in execute_command_internal (command=0x6f6590, 
    asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x941830)
    at execute_cmd.c:747
#19 0x00000000004324c0 in execute_connection (fds_to_close=0x941830, 
    pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x6f64e0)
    at execute_cmd.c:2338
#20 execute_command_internal (command=0x6f64e0, asynchronous=0, pipe_in=-1, 
    pipe_out=-1, fds_to_close=0x941830) at execute_cmd.c:903
#21 0x0000000000433b5e in execute_command (command=0x6f64e0)
    at execute_cmd.c:386
#22 0x0000000000432484 in execute_connection (fds_to_close=0x9435f0, 
    pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x6f5ad0)
    at execute_cmd.c:2336
#23 execute_command_internal (command=0x6f5ad0, asynchronous=0, pipe_in=-1, 
    pipe_out=-1, fds_to_close=0x9435f0) at execute_cmd.c:903
#24 0x0000000000433b5e in execute_command (command=0x6f5ad0)
    at execute_cmd.c:386
#25 0x0000000000432484 in execute_connection (fds_to_close=0x9413d0, 
    pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x6f5f20)
    at execute_cmd.c:2336
#26 execute_command_internal (command=0x6f5f20, asynchronous=0, pipe_in=-1, 
    pipe_out=-1, fds_to_close=0x9413d0) at execute_cmd.c:903
#27 0x0000000000433b5e in execute_command (command=0x6f5f20)
    at execute_cmd.c:386
#28 0x0000000000432ec5 in execute_for_command (for_command=0x6f5f70)
    at execute_cmd.c:2521
#29 execute_command_internal (command=0x6f6700, asynchronous=<optimized out>, 
    pipe_in=<optimized out>, pipe_out=<optimized out>, 
    fds_to_close=<optimized out>) at execute_cmd.c:815
#30 0x0000000000433b5e in execute_command (command=0x6f6700)
    at execute_cmd.c:386
#31 0x000000000041e285 in reader_loop () at eval.c:153
#32 0x000000000041c8ee in main (argc=2, argv=0x7fffffffe138, 
    env=0x7fffffffe150) at shell.c:759

a bit later:

foo.sh: line 17: read: read error: 0: Input/output error
*** Error in `/usr/bin/bash': free(): invalid pointer: 0x0000000000943500 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7d053)[0x7ffff7668053]
/usr/bin/bash(dispose_redirects+0x3c)[0x42e49c]
/usr/bin/bash[0x43138c]
/usr/bin/bash(execute_command_internal+0xf0b)[0x4326ab]
/usr/bin/bash(execute_command_internal+0xd20)[0x4324c0]
/usr/bin/bash(execute_command+0x4e)[0x433b5e]
/usr/bin/bash(execute_command_internal+0xce4)[0x432484]
/usr/bin/bash(execute_command+0x4e)[0x433b5e]
/usr/bin/bash(execute_command_internal+0xce4)[0x432484]
/usr/bin/bash(execute_command+0x4e)[0x433b5e]
/usr/bin/bash(execute_command_internal+0x1725)[0x432ec5]
/usr/bin/bash(execute_command+0x4e)[0x433b5e]
/usr/bin/bash(reader_loop+0x195)[0x41e285]
/usr/bin/bash(main+0xc4e)[0x41c8ee]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7ffff760cb15]
/usr/bin/bash[0x41d38d]
======= Memory map: ========
00400000-004dd000 r-xp 00000000 00:23 1258402                            /usr/bin/bash
006dc000-006dd000 r--p 000dc000 00:23 1258402                            /usr/bin/bash
006dd000-006e6000 rw-p 000dd000 00:23 1258402                            /usr/bin/bash
006e6000-00949000 rw-p 00000000 00:00 0                                  [heap]
7fffec000000-7fffec021000 rw-p 00000000 00:00 0 
7fffec021000-7ffff0000000 ---p 00000000 00:00 0 
7ffff0eae000-7ffff0ec3000 r-xp 00000000 00:23 186955                     /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7ffff0ec3000-7ffff10c2000 ---p 00015000 00:23 186955                     /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7ffff10c2000-7ffff10c3000 r--p 00014000 00:23 186955                     /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7ffff10c3000-7ffff10c4000 rw-p 00015000 00:23 186955                     /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7ffff10c4000-7ffff75eb000 r--p 00000000 00:23 1267105                    /usr/lib/locale/locale-archive
7ffff75eb000-7ffff77a2000 r-xp 00000000 00:23 1267115                    /usr/lib64/libc-2.17.so
7ffff77a2000-7ffff79a2000 ---p 001b7000 00:23 1267115                    /usr/lib64/libc-2.17.so
7ffff79a2000-7ffff79a6000 r--p 001b7000 00:23 1267115                    /usr/lib64/libc-2.17.so
7ffff79a6000-7ffff79a8000 rw-p 001bb000 00:23 1267115                    /usr/lib64/libc-2.17.so
7ffff79a8000-7ffff79ad000 rw-p 00000000 00:00 0 
7ffff79ad000-7ffff79b0000 r-xp 00000000 00:23 1267121                    /usr/lib64/libdl-2.17.so
7ffff79b0000-7ffff7baf000 ---p 00003000 00:23 1267121                    /usr/lib64/libdl-2.17.so
7ffff7baf000-7ffff7bb0000 r--p 00002000 00:23 1267121                    /usr/lib64/libdl-2.17.so
7ffff7bb0000-7ffff7bb1000 rw-p 00003000 00:23 1267121                    /usr/lib64/libdl-2.17.so
7ffff7bb1000-7ffff7bd6000 r-xp 00000000 00:23 22367                      /usr/lib64/libtinfo.so.5.9
7ffff7bd6000-7ffff7dd6000 ---p 00025000 00:23 22367                      /usr/lib64/libtinfo.so.5.9
7ffff7dd6000-7ffff7dda000 r--p 00025000 00:23 22367                      /usr/lib64/libtinfo.so.5.9
7ffff7dda000-7ffff7ddb000 rw-p 00029000 00:23 22367                      /usr/lib64/libtinfo.so.5.9
7ffff7ddb000-7ffff7dfc000 r-xp 00000000 00:23 1267108                    /usr/lib64/ld-2.17.so
7ffff7fdf000-7ffff7fe2000 rw-p 00000000 00:00 0 
7ffff7ff0000-7ffff7ff2000 rw-p 00000000 00:00 0 
7ffff7ff2000-7ffff7ff9000 r--s 00000000 00:23 1267429                    /usr/lib64/gconv/gconv-modules.cache
7ffff7ff9000-7ffff7ffa000 rw-p 00000000 00:00 0 
7ffff7ffa000-7ffff7ffc000 r-xp 00000000 00:00 0                          [vdso]
7ffff7ffc000-7ffff7ffd000 r--p 00021000 00:23 1267108                    /usr/lib64/ld-2.17.so
7ffff7ffd000-7ffff7ffe000 rw-p 00022000 00:23 1267108                    /usr/lib64/ld-2.17.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0 
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Program received signal SIGABRT, Aborted.
0x00007ffff76205f7 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
A debugging session is active.

	Inferior 1 [process 25346] will be killed.

Quit anyway? (y or n) 
Not confirmed.
(gdb) bt
#0  0x00007ffff76205f7 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff7621ce8 in __GI_abort () at abort.c:90
#2  0x00007ffff7660327 in __libc_message (do_abort=do_abort@entry=2, 
    fmt=fmt@entry=0x7ffff776a488 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007ffff7668053 in malloc_printerr (ar_ptr=0x7ffff79a6760 <main_arena>, 
    ptr=<optimized out>, str=0x7ffff7767b74 "free(): invalid pointer", 
    action=3) at malloc.c:5022
#4  _int_free (av=0x7ffff79a6760 <main_arena>, p=<optimized out>, have_lock=0)
    at malloc.c:3842
#5  0x000000000042e49c in dispose_redirects (list=0x0, list@entry=0x941e40)
    at dispose_cmd.c:340
#6  0x000000000043138c in cleanup_redirects (list=0x941e40)
    at execute_cmd.c:442
#7  execute_builtin_or_function (flags=0, fds_to_close=0x941830, 
    redirects=<optimized out>, var=<optimized out>, 
    builtin=0x474430 <read_builtin>, words=0x941a60) at execute_cmd.c:4586
#8  execute_simple_command (simple_command=<optimized out>, 
    pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, async=async@entry=0, 
    fds_to_close=fds_to_close@entry=0x941830) at execute_cmd.c:3956
#9  0x00000000004326ab in execute_command_internal (command=0x6f6590, 
    asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x941830)
    at execute_cmd.c:747
#10 0x00000000004324c0 in execute_connection (fds_to_close=0x941830, 
    pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x6f64e0)
    at execute_cmd.c:2338
#11 execute_command_internal (command=0x6f64e0, asynchronous=0, pipe_in=-1, 
    pipe_out=-1, fds_to_close=0x941830) at execute_cmd.c:903
#12 0x0000000000433b5e in execute_command (command=0x6f64e0)
    at execute_cmd.c:386
#13 0x0000000000432484 in execute_connection (fds_to_close=0x9435f0, 
    pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x6f5ad0)
    at execute_cmd.c:2336
#14 execute_command_internal (command=0x6f5ad0, asynchronous=0, pipe_in=-1, 
    pipe_out=-1, fds_to_close=0x9435f0) at execute_cmd.c:903
#15 0x0000000000433b5e in execute_command (command=0x6f5ad0)
    at execute_cmd.c:386
#16 0x0000000000432484 in execute_connection (fds_to_close=0x9413d0, 
    pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x6f5f20)
    at execute_cmd.c:2336
#17 execute_command_internal (command=0x6f5f20, asynchronous=0, pipe_in=-1, 
    pipe_out=-1, fds_to_close=0x9413d0) at execute_cmd.c:903
#18 0x0000000000433b5e in execute_command (command=0x6f5f20)
    at execute_cmd.c:386
#19 0x0000000000432ec5 in execute_for_command (for_command=0x6f5f70)
    at execute_cmd.c:2521
#20 execute_command_internal (command=0x6f6700, asynchronous=<optimized out>, 
    pipe_in=<optimized out>, pipe_out=<optimized out>, 
    fds_to_close=<optimized out>) at execute_cmd.c:815
#21 0x0000000000433b5e in execute_command (command=0x6f6700)
    at execute_cmd.c:386
#22 0x000000000041e285 in reader_loop () at eval.c:153
#23 0x000000000041c8ee in main (argc=2, argv=0x7fffffffe138, 
    env=0x7fffffffe150) at shell.c:759

Comment 14 Paulo Andrade 2017-09-29 16:24:59 UTC
  The package now appears to create some reproducible errors for the user case:

(gdb) bt
#0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x00000037abe2f80e in _L_lock_629 () from /lib64/libc-2.12.so
#2  0x00000037abe2f6f6 in _nl_expand_alias (name=0x37ac18f250 "\002") at localealias.c:150
#3  0x00000037abe2def8 in _nl_find_domain (dirname=0x37abf59920 "/usr/share/locale", locale=0x7ffe1088a0c0 "en_US.UTF-8", domainname=0x7ffe1088a0e0 "LC_MESSAGES/bash.mo", domainbinding=0x1a82000) at finddomain.c:120
#4  0x00000037abe2d6e0 in __dcigettext (domainname=0x1a82030 "bash", msgid1=0x4a4ff2 "%s: line %d: ", msgid2=0x0, plural=0, n=0, category=5) at dcigettext.c:628
#5  0x000000000043f0a6 in notify_of_job_status () at jobs.c:3519
#6  0x0000000000440241 in notify_and_cleanup () at jobs.c:2634
#7  0x0000000000440783 in wait_for (pid=<value optimized out>) at jobs.c:2581
#8  0x0000000000430f19 in execute_command_internal (command=0x1b7c490, asynchronous=<value optimized out>, pipe_in=3, pipe_out=<value optimized out>, fds_to_close=0x0) at execute_cmd.c:780
#9  0x000000000043336f in execute_pipeline (command=<value optimized out>, asynchronous=0, pipe_in=<value optimized out>, pipe_out=-1, fds_to_close=0x1b7eea0) at execute_cmd.c:2161
#10 0x000000000043362a in execute_connection (command=0x1b7e2c0, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b7eea0) at execute_cmd.c:2254
#11 0x00000000004303bd in execute_command_internal (command=0x1b7e2c0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b7eea0) at execute_cmd.c:896
#12 0x000000000043110e in execute_command (command=0x1b7e2c0) at execute_cmd.c:379
#13 0x0000000000430545 in execute_command_internal (command=0x1b85950, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b7bbc0) at execute_cmd.c:848
#14 0x000000000043110e in execute_command (command=0x1b85950) at execute_cmd.c:379
#15 0x000000000043357e in execute_connection (command=0x1b79bb0, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b4e9e0) at execute_cmd.c:2240
#16 0x00000000004303bd in execute_command_internal (command=0x1b79bb0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b4e9e0) at execute_cmd.c:896
#17 0x000000000043110e in execute_command (command=0x1b79bb0) at execute_cmd.c:379
#18 0x000000000043357e in execute_connection (command=0x1b87d10, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b7c600) at execute_cmd.c:2240
#19 0x00000000004303bd in execute_command_internal (command=0x1b87d10, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b7c600) at execute_cmd.c:896
#20 0x000000000043110e in execute_command (command=0x1b87d10) at execute_cmd.c:379
#21 0x0000000000430b0f in execute_case_command (command=0x1b81df0, asynchronous=<value optimized out>, pipe_in=<value optimized out>, pipe_out=<value optimized out>, fds_to_close=0x0) at execute_cmd.c:3010
#22 execute_command_internal (command=0x1b81df0, asynchronous=<value optimized out>, pipe_in=<value optimized out>, pipe_out=<value optimized out>, fds_to_close=0x0) at execute_cmd.c:830
#23 0x00000000004335b3 in execute_connection (command=0x1b4f540, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b7fbb0) at execute_cmd.c:2242
#24 0x00000000004303bd in execute_command_internal (command=0x1b4f540, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b7fbb0) at execute_cmd.c:896
#25 0x000000000043110e in execute_command (command=0x1b4f540) at execute_cmd.c:379
#26 0x000000000043357e in execute_connection (command=0x1b79370, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b7fb60) at execute_cmd.c:2240
#27 0x00000000004303bd in execute_command_internal (command=0x1b79370, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b7fb60) at execute_cmd.c:896
#28 0x000000000043110e in execute_command (command=0x1b79370) at execute_cmd.c:379
#29 0x000000000043357e in execute_connection (command=0x1b798a0, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b866a0) at execute_cmd.c:2240
#30 0x00000000004303bd in execute_command_internal (command=0x1b798a0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b866a0) at execute_cmd.c:896
#31 0x0000000000430362 in execute_command_internal (command=0x1b77c40, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b866a0) at execute_cmd.c:889
#32 0x0000000000432169 in execute_function (var=<value optimized out>, words=<value optimized out>, flags=<value optimized out>, fds_to_close=0x1b866a0, async=0, subshell=0) at execute_cmd.c:4172
#33 0x000000000042fa18 in execute_builtin_or_function (simple_command=<value optimized out>, pipe_in=-1, pipe_out=-1, async=<value optimized out>, fds_to_close=<value optimized out>) at execute_cmd.c:4402
#34 execute_simple_command (simple_command=<value optimized out>, pipe_in=-1, pipe_out=-1, async=<value optimized out>, fds_to_close=<value optimized out>) at execute_cmd.c:3817
#35 0x0000000000430463 in execute_command_internal (command=0x1b77ec0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b866a0) at execute_cmd.c:741
#36 0x000000000043110e in execute_command (command=0x1b77ec0) at execute_cmd.c:379
#37 0x000000000043357e in execute_connection (command=0x1b83890, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b7a4d0) at execute_cmd.c:2240
#38 0x00000000004303bd in execute_command_internal (command=0x1b83890, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b7a4d0) at execute_cmd.c:896
#39 0x000000000043110e in execute_command (command=0x1b83890) at execute_cmd.c:379
#40 0x000000000043357e in execute_connection (command=0x1b824b0, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b7c920) at execute_cmd.c:2240
#41 0x00000000004303bd in execute_command_internal (command=0x1b824b0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b7c920) at execute_cmd.c:896
#42 0x000000000043110e in execute_command (command=0x1b824b0) at execute_cmd.c:379
#43 0x0000000000430545 in execute_command_internal (command=0x1b7f0e0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b8b7e0) at execute_cmd.c:848
#44 0x00000000004335b3 in execute_connection (command=0x1b8c820, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b8b7e0) at execute_cmd.c:2242
#45 0x00000000004303bd in execute_command_internal (command=0x1b8c820, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b8b7e0) at execute_cmd.c:896
#46 0x0000000000430362 in execute_command_internal (command=0x1b87ca0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b8b7e0) at execute_cmd.c:889
#47 0x0000000000432169 in execute_function (var=<value optimized out>, words=<value optimized out>, flags=<value optimized out>, fds_to_close=0x1b8b7e0, async=0, subshell=0) at execute_cmd.c:4172
#48 0x000000000042fa18 in execute_builtin_or_function (simple_command=<value optimized out>, pipe_in=-1, pipe_out=-1, async=<value optimized out>, fds_to_close=<value optimized out>) at execute_cmd.c:4402
#49 execute_simple_command (simple_command=<value optimized out>, pipe_in=-1, pipe_out=-1, async=<value optimized out>, fds_to_close=<value optimized out>) at execute_cmd.c:3817
#50 0x0000000000430463 in execute_command_internal (command=0x1b75ff0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b8b7e0) at execute_cmd.c:741
#51 0x000000000046b86b in parse_and_execute (string=<value optimized out>, from_file=0x4a4f71 "trap", flags=<value optimized out>) at evalstring.c:339
#52 0x0000000000450e79 in run_pending_traps () at trap.c:349
#53 0x000000000045101d in trap_handler (sig=1) at trap.c:414
#54 <signal handler called>
#55 0x00000037abee5347 in munmap () at ../sysdeps/unix/syscall-template.S:82
#56 0x00000037abe7449a in _IO_setb (f=0x1b889b0, b=0x0, eb=<value optimized out>, a=<value optimized out>) at genops.c:406
#57 0x00000037abe727b8 in _IO_new_file_close_it (fp=0x1b889b0) at fileops.c:188
#58 0x00000037abe664f8 in _IO_new_fclose (fp=0x1b889b0) at iofclose.c:62
#59 0x00000037abe2f510 in read_alias_file (fname=<value optimized out>, fname_len=<value optimized out>) at localealias.c:355
#60 0x00000037abe2f77e in _nl_expand_alias (name=0x7ffe1088c400 "en_US.UTF-8") at localealias.c:189
#61 0x00000037abe2def8 in _nl_find_domain (dirname=0x37abf59920 "/usr/share/locale", locale=0x7ffe1088c400 "en_US.UTF-8", domainname=0x7ffe1088c420 "LC_MESSAGES/libc.mo", domainbinding=0x0) at finddomain.c:120
#62 0x00000037abe2d6e0 in __dcigettext (domainname=0x37abf55af5 "libc", msgid1=0x37abf55d4d "Input/output error", msgid2=0x0, plural=0, n=0, category=5) at dcigettext.c:628
#63 0x00000037abe81088 in __strerror_r (errnum=5, buf=0x0, buflen=0) at _strerror.c:65
#64 0x00000037abe80f8e in strerror (errnum=5) at strerror.c:33
#65 0x00000000004719fa in read_builtin (list=0x1b85610) at ./read.def:588
#66 0x000000000042d8ce in execute_builtin (builtin=0x470660 <read_builtin>, words=0x1b8c9f0, flags=<value optimized out>, subshell=<value optimized out>) at execute_cmd.c:3976
#67 0x000000000042f80b in execute_builtin_or_function (simple_command=<value optimized out>, pipe_in=-1, pipe_out=-1, async=<value optimized out>, fds_to_close=<value optimized out>) at execute_cmd.c:4400
#68 execute_simple_command (simple_command=<value optimized out>, pipe_in=-1, pipe_out=-1, async=<value optimized out>, fds_to_close=<value optimized out>) at execute_cmd.c:3817
#69 0x0000000000430463 in execute_command_internal (command=0x1b44050, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b75dc0) at execute_cmd.c:741
#70 0x000000000043110e in execute_command (command=0x1b44050) at execute_cmd.c:379
#71 0x0000000000430545 in execute_command_internal (command=0x1b44cb0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b860a0) at execute_cmd.c:848
#72 0x00000000004335b3 in execute_connection (command=0x1b4bee0, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b860a0) at execute_cmd.c:2242
#73 0x00000000004303bd in execute_command_internal (command=0x1b4bee0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b860a0) at execute_cmd.c:896
#74 0x000000000043110e in execute_command (command=0x1b4bee0) at execute_cmd.c:379
#75 0x000000000043357e in execute_connection (command=0x1b483b0, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b80f30) at execute_cmd.c:2240
#76 0x00000000004303bd in execute_command_internal (command=0x1b483b0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b80f30) at execute_cmd.c:896
#77 0x000000000043110e in execute_command (command=0x1b483b0) at execute_cmd.c:379
#78 0x000000000043357e in execute_connection (command=0x1b43ef0, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b75f90) at execute_cmd.c:2240
#79 0x00000000004303bd in execute_command_internal (command=0x1b43ef0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b75f90) at execute_cmd.c:896
#80 0x000000000043110e in execute_command (command=0x1b43ef0) at execute_cmd.c:379
#81 0x000000000043357e in execute_connection (command=0x1b43f20, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b4cca0) at execute_cmd.c:2240
#82 0x00000000004303bd in execute_command_internal (command=0x1b43f20, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b4cca0) at execute_cmd.c:896
#83 0x000000000043110e in execute_command (command=0x1b43f20) at execute_cmd.c:379
#84 0x0000000000430b0f in execute_case_command (command=0x1b3ed90, asynchronous=<value optimized out>, pipe_in=<value optimized out>, pipe_out=<value optimized out>, fds_to_close=0x0) at execute_cmd.c:3010
#85 execute_command_internal (command=0x1b3ed90, asynchronous=<value optimized out>, pipe_in=<value optimized out>, pipe_out=<value optimized out>, fds_to_close=0x0) at execute_cmd.c:830
#86 0x00000000004335b3 in execute_connection (command=0x1b3ff00, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b7f1d0) at execute_cmd.c:2242
#87 0x00000000004303bd in execute_command_internal (command=0x1b3ff00, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b7f1d0) at execute_cmd.c:896
#88 0x000000000043110e in execute_command (command=0x1b3ff00) at execute_cmd.c:379
#89 0x0000000000431a17 in execute_while_or_until (while_command=0x1b426d0, type=0) at execute_cmd.c:3094
#90 0x0000000000430565 in execute_while_command (command=0x1b3f4a0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b4d7b0) at execute_cmd.c:3040
#91 execute_command_internal (command=0x1b3f4a0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b4d7b0) at execute_cmd.c:836
#92 0x00000000004335b3 in execute_connection (command=0x1b40060, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b4d7b0) at execute_cmd.c:2242
#93 0x00000000004303bd in execute_command_internal (command=0x1b40060, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b4d7b0) at execute_cmd.c:896
#94 0x000000000043110e in execute_command (command=0x1b40060) at execute_cmd.c:379
#95 0x000000000043357e in execute_connection (command=0x1b40cc0, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b797f0) at execute_cmd.c:2240
#96 0x00000000004303bd in execute_command_internal (command=0x1b40cc0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b797f0) at execute_cmd.c:896
#97 0x000000000043110e in execute_command (command=0x1b40cc0) at execute_cmd.c:379
#98 0x000000000043357e in execute_connection (command=0x1b40870, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b79860) at execute_cmd.c:2240
#99 0x00000000004303bd in execute_command_internal (command=0x1b40870, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b79860) at execute_cmd.c:896
#100 0x000000000043110e in execute_command (command=0x1b40870) at execute_cmd.c:379
#101 0x000000000043357e in execute_connection (command=0x1b41bd0, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b79810) at execute_cmd.c:2240
#102 0x00000000004303bd in execute_command_internal (command=0x1b41bd0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b79810) at execute_cmd.c:896
#103 0x000000000043110e in execute_command (command=0x1b41bd0) at execute_cmd.c:379
#104 0x000000000043357e in execute_connection (command=0x1b41c80, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b42b80) at execute_cmd.c:2240
#105 0x00000000004303bd in execute_command_internal (command=0x1b41c80, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b42b80) at execute_cmd.c:896
#106 0x0000000000430362 in execute_command_internal (command=0x1b423e0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b42b80) at execute_cmd.c:889
#107 0x0000000000432169 in execute_function (var=<value optimized out>, words=<value optimized out>, flags=<value optimized out>, fds_to_close=0x1b42b80, async=0, subshell=0) at execute_cmd.c:4172
#108 0x000000000042fa18 in execute_builtin_or_function (simple_command=<value optimized out>, pipe_in=-1, pipe_out=-1, async=<value optimized out>, fds_to_close=<value optimized out>) at execute_cmd.c:4402
#109 execute_simple_command (simple_command=<value optimized out>, pipe_in=-1, pipe_out=-1, async=<value optimized out>, fds_to_close=<value optimized out>) at execute_cmd.c:3817
#110 0x0000000000430463 in execute_command_internal (command=0x1b43350, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b42b80) at execute_cmd.c:741
#111 0x00000000004335b3 in execute_connection (command=0x1b433b0, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b42b80) at execute_cmd.c:2242
#112 0x00000000004303bd in execute_command_internal (command=0x1b433b0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b42b80) at execute_cmd.c:896
#113 0x000000000043110e in execute_command (command=0x1b433b0) at execute_cmd.c:379
#114 0x000000000043357e in execute_connection (command=0x1b434a0, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b42c50) at execute_cmd.c:2240
#115 0x00000000004303bd in execute_command_internal (command=0x1b434a0, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b42c50) at execute_cmd.c:896
#116 0x000000000043110e in execute_command (command=0x1b434a0) at execute_cmd.c:379
#117 0x0000000000430545 in execute_command_internal (command=0x1b43620, asynchronous=<value optimized out>, pipe_in=-1, pipe_out=-1, fds_to_close=0x1b42c90) at execute_cmd.c:848
#118 0x000000000043110e in execute_command (command=0x1b43620) at execute_cmd.c:379
#119 0x000000000041d6d6 in reader_loop () at eval.c:153
#120 0x000000000041cebc in main (argc=<value optimized out>, argv=0x7ffe1088dfc8, env=0x7ffe1088dfe0) at shell.c:759
(gdb)

_nl_expand_alias is holding a mutex in frame 60, and frame 0 blocks attempting
to get it again.

  So far I could not create a reproducer for this specific issue, and it should
be very difficult due to require a very small time window.

  The ugliest fix should be to not translate the "read error" in the chunk
of builtins/read.def:

#if 1
  if (retval < 0)
    {
      builtin_error (_("read error: %d: %s"), fd, strerror (errno));
      run_unwind_frame ("read_builtin");
      interrupt_immediately--;
      terminate_immediately--;
      return (EXECUTION_FAILURE);
    }
#endif

  The least chance of problems, likely would be to decrease interrupt_immediately
before calling "builtin_error", and then increase again after it returns. Also,
when builtin_error returns, likely would need to call run_pending_traps blindly
*or* add extra states to know a signal was received during the builtin_error call.

  Any comments about my suggested change above?

Comment 15 Siteshwar Vashisht 2017-09-29 20:28:33 UTC
You can try moving code to decrement interrupt_immediately and terminate_immediately above call to builtin_error() function (and not increment it later). I am not able to understand why issue in comment 14 did not reproduce earlier before this patch.