Bug 1291782

Summary: deadlock when receiving SIGCHLD during free
Product: Red Hat Enterprise Linux 7
Reporter: Paulo Andrade <pandrade>
Component: zsh
Assignee: Kamil Dudka <kdudka>
Status: CLOSED ERRATA
QA Contact: Jan Kepler <jkejda>
Severity: high
Docs Contact: Maxim Svistunov <msvistun>
Priority: urgent
Version: 7.1
CC: fkrska, isenfeld, jkejda, kdudka, ovasik, zpytela
Target Milestone: rc
Keywords: Patch, ZStream
Target Release: ---
Hardware: All
OS: All
Whiteboard:
Fixed In Version: zsh-5.0.2-17.el7
Doc Type: Bug Fix
Doc Text:
*zsh* no longer hangs when receiving a signal while processing a job exit
Previously, signal handlers were enabled while processing a job exit in *zsh*. Consequently, if a signal was received while using the memory allocator and its handler attempted to allocate or free memory, the *zsh* process ended up in a deadlock and became unresponsive. With this update, signal handlers are no longer enabled while processing a job exit. Instead, signals are queued for delayed execution of the signal handlers. As a result, the deadlock no longer occurs and *zsh* no longer hangs.
Story Points: ---
Clone Of:
: 1337913
Environment:
Last Closed: 2016-11-03 23:02:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1203710, 1295829, 1313485, 1337913

Description Paulo Andrade 2015-12-15 15:19:25 UTC
User has this backtrace:

(gdb) bt
#0  0x00007f44e3639b6c in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007f44e35b8bd6 in _L_lock_12192 () from /lib64/libc.so.6
#2  0x00007f44e35b6181 in malloc () from /lib64/libc.so.6
#3  0x0000000000452260 in zalloc (size=<optimized out>, size@entry=8)
    at mem.c:802
#4  0x000000000044520b in addbgstatus (pid=31802, status=0) at jobs.c:1888
#5  0x0000000000474405 in wait_for_processes () at signals.c:530
#6  0x0000000000474b25 in zhandler (sig=17) at signals.c:592
#7  <signal handler called>
#8  0x00007f44e35b2d24 in _int_free () from /lib64/libc.so.6
#9  0x00000000004408d5 in inputsetline (str=str@entry=0x11bb2c0 "wait \n", 
    flags=flags@entry=1) at input.c:342
#10 0x0000000000440c8f in inputline () at input.c:327
#11 ingetc () at input.c:217
#12 0x000000000044ad3e in gettok () at lex.c:714
#13 zshlex () at lex.c:395
#14 0x0000000000468137 in parse_event () at parse.c:451
#15 0x000000000043d449 in loop (toplevel=toplevel@entry=1, 
    justonce=justonce@entry=0) at init.c:132
#16 0x000000000044074e in zsh_main (argc=<optimized out>, 
    argv=<optimized out>) at init.c:1616
#17 0x00007f44e3557af5 in __libc_start_main () from /lib64/libc.so.6
#18 0x000000000040ed21 in _start ()

  I believe a pseudo patch would be:

Src/mem.c:
[...]
 mod_export void
 zfree(void *p, UNUSED(int sz))
 {
     if (p)
+    {
+        queue_signals();
	 free(p);
+        unqueue_signals();
+    }
 }

 /**/
 mod_export void
 zsfree(char *p)
 {
     if (p)
+    {
+        queue_signals();
	 free(p);
+        unqueue_signals();
+    }
 }
[...]

and:

Src/input.c:
[...]
 static void
 inputsetline(char *str, int flags)
 {
     if ((inbufflags & INP_FREE) && inbuf) {
- 	free(inbuf);
+ 	zsfree(inbuf);
     }
[...]
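
For reference, the idea behind a queue_signals()/unqueue_signals() pair can be sketched roughly as follows. This is only a minimal illustration of deferring handler work around a critical section, not zsh's actual implementation. All names in it (sketch_queue_signals, sketch_unqueue_signals, do_real_work, demo_deferred_delivery) are hypothetical, SIGUSR1 stands in for SIGCHLD so that the demo is deterministic, and a single pending flag is assumed, so several signals arriving while queued would collapse into one delivery.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state for this sketch; not taken from the zsh sources. */
static volatile sig_atomic_t queueing_depth = 0; /* > 0: defer handler work */
static volatile sig_atomic_t pending = 0;        /* signal seen while deferred */
static volatile sig_atomic_t handled_count = 0;  /* times the real work ran */

/* Stand-in for the part of a handler that may touch the allocator. */
static void do_real_work(void)
{
    handled_count++;
}

/* The installed handler: while queueing is active it only records the signal. */
static void on_signal(int sig)
{
    (void)sig;
    if (queueing_depth > 0) {
        pending = 1;
        return;
    }
    do_real_work();
}

static void sketch_queue_signals(void)
{
    queueing_depth++;
}

/* Leaving the outermost critical section runs the deferred work, if any. */
static void sketch_unqueue_signals(void)
{
    assert(queueing_depth > 0);
    queueing_depth--;
    if (queueing_depth == 0 && pending) {
        pending = 0;
        do_real_work();
    }
}

/* Returns 1 if the handler work was deferred until after the critical section. */
static int demo_deferred_delivery(void)
{
    struct sigaction sa;
    int during, after;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sketch_queue_signals();
    raise(SIGUSR1);          /* handler runs, but only sets the pending flag */
    during = handled_count;  /* real work has not run yet */
    sketch_unqueue_signals();
    after = handled_count;   /* real work ran exactly once on unqueue */

    return during == 0 && after == 1;
}
```

Calling demo_deferred_delivery() from main() exercises the sketch. In the scenario from the backtrace, the free() call in inputsetline() would sit between the two queueing calls, so the SIGCHLD handler could not re-enter malloc() while the allocator lock is held.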

Comment 1 Kamil Dudka 2015-12-15 15:39:42 UTC
I believe this could be fixed by the following upstream commit:

http://sourceforge.net/p/zsh/code/ci/93ca77f8

(In reply to Paulo Andrade from comment #0)
>   I believe a pseudo patch would be:
> 
> Src/mem.c:
> [...]
>  mod_export void
>  zfree(void *p, UNUSED(int sz))
>  {
>      if (p)
> +    {
> +        queue_signals();
> 	 free(p);
> +        unqueue_signals();
> +    }
>  }

Upstream discourages wrapping [z]free() in signal queuing globally, because doing so could hide unprotected accesses to global state and result in wrong behavior, which is actually worse (and more difficult to debug) than a deadlock.

Comment 15 errata-xmlrpc 2016-11-03 23:02:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2152.html