zsh-5.0.2-18.el7 lacks the queue_signals() block in bld_eprog() as well.

+++ This bug was initially created as a clone of Bug #1311166 +++

Description of problem:

zhandler() - free() deadlock in bld_eprog():

(gdb) bt
#0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x000000333827cdc0 in _L_lock_5199 () from /lib64/libc-2.12.so
#2  0x000000333827871b in _int_free (av=0x333858fe80, p=0x24b9a10, have_lock=0) at malloc.c:4963
#3  0x000000000044362d in freejob (jn=0x249a560, deleting=1) at jobs.c:1103
#4  0x000000000044486d in printjob (jn=0x249a560, lng=0, synch=0) at jobs.c:1066
#5  0x00000000004472da in update_job (jn=0x249a560) at jobs.c:508
#6  0x0000000000473bcb in wait_for_processes () at signals.c:502
#7  0x0000000000474495 in zhandler (sig=17) at signals.c:584
#8  <signal handler called>
#9  0x0000003338278443 in _int_free (av=0x333858fe80, p=0x24baf20, have_lock=0) at malloc.c:4973
#10 0x0000000000465a47 in bld_eprog () at parse.c:413
#11 0x000000000040f134 in bin_test (name=0x7f49c1ad7120 "[", argv=0x7f49c1ad7528, ops=<value optimized out>, func=<value optimized out>) at builtin.c:5851

Version-Release number of selected component (if applicable):
zsh-4.3.11-4.el6_7.1.x86_64

How reproducible:
Happened in production, not reproduced in test environment so far.

Steps to Reproduce:
1.
2.
3.

Actual results:
zsh deadlocks in futex-wait state.

Expected results:
zsh doesn't deadlock

Additional info:
zsh-5.0.2-18.el7 will suffer as well.
Fixed in upstream zsh-5.2, backport patch proposal:

diff -up zsh-4.3.11/Src/parse.c.bld_eprog_sigleak zsh-4.3.11/Src/parse.c
--- zsh-4.3.11/Src/parse.c.bld_eprog_sigleak 2016-02-23 15:34:15.063441138 +0100
+++ zsh-4.3.11/Src/parse.c 2016-02-23 15:35:04.369447251 +0100
@@ -391,6 +391,8 @@ bld_eprog(void)
     Eprog ret;
     int l;
 
+    queue_signals();
+
     ecadd(WCB_END());
 
     ret = (Eprog) zhalloc(sizeof(*ret));
@@ -413,6 +415,8 @@ bld_eprog(void)
     zfree(ecbuf, eclen);
     ecbuf = NULL;
 
+    unqueue_signals();
+
     return ret;
 }

Please, consider re-initiating talks about calling free() (non signal safe function i.e. not supposed to be called from signal handlers) from zhandler() and re-scan of rhel6/7/upstream zsh code for any other possible signal leaks again.
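For readers unfamiliar with the pattern, the idea behind bracketing bld_eprog() with queue_signals()/unqueue_signals() can be sketched roughly as below. This is a minimal illustration under stated assumptions, not zsh's actual implementation: every `sketch_*` name, the `SKETCH_MAX_SIG` bound, and the counters are hypothetical. While the depth counter is non-zero the handler only records that a signal arrived; the real work (which in zsh's case may end up calling free()) runs later from normal context, after the critical section has finished.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bound; assumed large enough for the signals used here. */
#define SKETCH_MAX_SIG 64

static volatile sig_atomic_t queue_depth;              /* >0: defer handler work */
static volatile sig_atomic_t pending[SKETCH_MAX_SIG];  /* signals seen while deferred */
static volatile sig_atomic_t handled_count;            /* stands in for the real work */

/* Placeholder for work that is NOT async-signal-safe (e.g. freeing job data). */
static void sketch_real_work(int sig)
{
    (void)sig;
    handled_count++;
}

/* Signal handler: while queueing is active, only remember the signal. */
static void sketch_handler(int sig)
{
    if (queue_depth > 0) {
        /* Out-of-range signal numbers are simply dropped in this sketch. */
        if (sig > 0 && sig < SKETCH_MAX_SIG)
            pending[sig] = 1;
        return;
    }
    sketch_real_work(sig);
}

/* Enter a critical section; calls may nest. */
static void sketch_queue_signals(void)
{
    queue_depth++;
}

/* Leave a critical section; the outermost call runs the deferred work. */
static void sketch_unqueue_signals(void)
{
    sigset_t all, old;
    int sig;

    assert(queue_depth > 0);    /* catch unbalanced calls */
    if (queue_depth > 1) {
        queue_depth--;
        return;
    }

    /* Block signals while draining so the handler cannot re-enter the work. */
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &old);
    queue_depth = 0;
    for (sig = 1; sig < SKETCH_MAX_SIG; sig++) {
        if (pending[sig]) {
            pending[sig] = 0;
            sketch_real_work(sig);
        }
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
}

/* Returns 1 if a signal raised inside the section was deferred until exit. */
static int sketch_demo(void)
{
    struct sigaction sa;
    int before, inside;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sketch_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    before = handled_count;
    sketch_queue_signals();
    raise(SIGUSR1);             /* handler runs, but only sets pending[] */
    inside = handled_count;     /* unchanged while queueing is active */
    sketch_unqueue_signals();   /* deferred work runs here */

    return inside == before && handled_count == before + 1;
}
```

The depth counter lets nested sections compose, so a function that queues signals can safely call another that does the same. The sketch does not address the broader concern raised above: any handler path that reaches free() outside such a section remains unsafe.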
(In reply to Filip Krska from comment #0)
> zsh-5.0.2-18.el7 will suffer as well. Fixed in upstream zsh-5.2, backport
> patch proposal:
>
> diff -up zsh-4.3.11/Src/parse.c.bld_eprog_sigleak zsh-4.3.11/Src/parse.c
> --- zsh-4.3.11/Src/parse.c.bld_eprog_sigleak 2016-02-23 15:34:15.063441138 +0100
> +++ zsh-4.3.11/Src/parse.c 2016-02-23 15:35:04.369447251 +0100
> @@ -391,6 +391,8 @@ bld_eprog(void)
>      Eprog ret;
>      int l;
>
> +    queue_signals();
> +
>      ecadd(WCB_END());
>
>      ret = (Eprog) zhalloc(sizeof(*ret));
> @@ -413,6 +415,8 @@ bld_eprog(void)
>      zfree(ecbuf, eclen);
>      ecbuf = NULL;
>
> +    unqueue_signals();
> +
>      return ret;
>  }

Thanks! This hunk is taken from a bigger upstream commit:

https://sourceforge.net/p/zsh/code/ci/99586845

Would it also make sense to pick the other hunk for parse.c?

--- a/Src/parse.c
+++ b/Src/parse.c
@@ -456,6 +456,8 @@ init_parse_status(void)
 void
 init_parse(void)
 {
+    queue_signals();
+
     if (ecbuf) zfree(ecbuf, eclen);
     ecbuf = (Wordcode) zalloc((eclen = EC_INIT_SIZE) * sizeof(wordcode));
@@ -466,6 +468,8 @@ init_parse(void)
     ecnfunc = 0;
 
     init_parse_status();
+
+    unqueue_signals();
 }
 
 /* Build eprog. */

> Please, consider re-initiating talks about calling free() (non signal safe
> function i.e. not supposed to be called from signal handlers) from
> zhandler() and re-scan of rhel6/7/upstream zsh code for any other possible
> signal leaks again.

I keep gathering commits like this at bug #1198671. The one you picked the hunk from is already mentioned there (bug #1198671 comment #5). The problem is that these upstream patches are not exactly safe. They introduced regressions when they landed upstream. I do not think we want to propagate them to RHEL.
I am closing this as a duplicate of bug #1198671 because the proposed patch is already included in the patch for bug #1198671.

*** This bug has been marked as a duplicate of bug 1198671 ***