tcsh-6.18.01-12.fc22.x86_64 reproduces as well +++ This bug was initially created as a clone of Bug #1181682 +++ tcsh-6.18.01-7.el7.x86_64 reproduces as well +++ This bug was initially created as a clone of Bug #1181680 +++ Description of problem: tcsh hangs on wait builtin on system with more than one CPU. Version-Release number of selected component (if applicable): tcsh-6.17-24.el6.x86_64 How reproducible: Always Steps to Reproduce: 1. create executable script /tmp/reproduce.csh #!/bin/csh /bin/date & /bin/true wait 2. execute it in loop in csh shell: csh$ while ( 1 ) while? /tmp/reproduce.csh > & /dev/null while? end 3. in another terminal watch `ps -C reproduce.csh` command Actual results: after some time (several seconds or minutes) the script hangs (you observe the PID stops changing) Expected results: the watched PID keeps changing Additional info: Reproduces with tcsh-6.18.01-12.fc22.x86_64, tcsh-6.18.01-7.el7.x86_64 as well. (gdb) bt #0 0x00000034bf832c2a in do_sigsuspend (set=0x7fff735ec840) at ../sysdeps/unix/sysv/linux/sigsuspend.c:63 #1 __sigsuspend (set=0x7fff735ec840) at ../sysdeps/unix/sysv/linux/sigsuspend.c:74 #2 0x00000000004201c5 in dowait (v=<value optimized out>, c=<value optimized out>) at sh.proc.c:604 #3 0x000000000042197f in execute (t=0x2413850, wanttty=-1, pipein=0x0, pipeout=0x0, do_glob=1) at sh.sem.c:632 #4 0x0000000000421396 in execute (t=0x23cb1e0, wanttty=-1, pipein=0x0, pipeout=0x0, do_glob=1) at sh.sem.c:720 #5 0x0000000000403f38 in process (catch=1) at sh.c:2043 #6 0x000000000040605b in main (argc=<value optimized out>, argv=0x7fff735ed1a8) at sh.c:1290 (gdb) f 2 #2 0x00000000004201c5 in dowait (v=<value optimized out>, c=<value optimized out>) at sh.proc.c:604 604 sigsuspend(&pause_mask); (gdb) l 599 loop: 600 for (pp = proclist.p_next; pp; pp = pp->p_next) 601 if (pp->p_procid && /* pp->p_procid == pp->p_jobid && */ 602 pp->p_flags & PRUNNING) { 603 handle_pending_signals(); 604 sigsuspend(&pause_mask); 605 handle_pending_signals(); 606 goto loop; 607 } 608 pjobs = 0;
#2 0x0000000000420445 in dowait (v=<optimized out>, c=<optimized out>) at sh.proc.c:611 611 sigsuspend(&pause_mask); (gdb) l 606 loop: 607 for (pp = proclist.p_next; pp; pp = pp->p_next) 608 if (pp->p_procid && /* pp->p_procid == pp->p_jobid && */ 609 pp->p_flags & PRUNNING) { 610 (void)handle_pending_signals(); 611 sigsuspend(&pause_mask); 612 opintr_disabled = pintr_disabled; 613 pintr_disabled = 0; 614 gotsig = handle_pending_signals(); 615 pintr_disabled = opintr_disabled; (gdb) p proclist $15 = {p_next = 0x778908, p_friends = 0x0, p_cwd = 0x0, p_flags = 0, p_reason = 0 '\000', p_index = 0, p_parentid = 0, p_procid = 0, p_jobid = 0, p_btime = {tv_sec = 0, tv_usec = 0}, p_etime = {tv_sec = 0, tv_usec = 0}, p_rusage = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, {ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 0, __ru_minflt_word = 0}, {ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0}, {ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 0, __ru_nvcsw_word = 0}, {ru_nivcsw = 0, __ru_nivcsw_word = 0}}, p_command = 0x0} (gdb) p *proclist->p_next $17 = {p_next = 0x733508, p_friends = 0x778908, p_cwd = 0x7416c8, p_flags = 4, p_reason = 0 '\000', p_index = 0, p_parentid = 14001, p_procid = 0, p_jobid = 14017, p_btime = {tv_sec = 1421314672, tv_usec = 366544}, p_etime = {tv_sec = 0, tv_usec = 0}, p_rusage = { ru_utime = {tv_sec = 0, tv_usec = 2000}, ru_stime = {tv_sec = 0, tv_usec = 0}, {ru_maxrss = 2124, __ru_maxrss_word = 2124}, { ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 85, __ru_minflt_word = 85}, {ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0}, {ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 1, __ru_nvcsw_word = 1}, {ru_nivcsw = 1, __ru_nivcsw_word = 1}}, p_command = 0x7f3988 L"/bin/true"} (gdb) p *proclist->p_next->p_next $21 = {p_next = 0x0, p_friends = 0x733508, p_cwd = 0x7416c8, p_flags = 32772, p_reason = 0 '\000', p_index = 1, p_parentid = 14001, p_procid = 14016, p_jobid = 14016, p_btime = {tv_sec = 1421314672, tv_usec = 365773}, p_etime = {tv_sec = 0, tv_usec = 0}, p_rusage = {ru_utime = {tv_sec = 0, tv_usec = 4000}, ru_stime = {tv_sec = 0, tv_usec = 0}, {ru_maxrss = 2124, __ru_maxrss_word = 2124}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 100, __ru_minflt_word = 100}, {ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0}, {ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 1, __ru_nvcsw_word = 1}, {ru_nivcsw = 4, __ru_nivcsw_word = 4}}, p_command = 0x7f3dc8 L"/bin/date"} Race condition theory: /bin/date exits SIGCHLD is emited then pchild() is called then dowait() is called when wait met in the script 606 loop: 607 for (pp = proclist.p_next; pp; pp = pp->p_next) 608 if (pp->p_procid && /* pp->p_procid == pp->p_jobid && */ 609 pp->p_flags & PRUNNING) { reaches /bin/date process, however PRUNNING flag still may be in place sigsuspend() is called and waits forever... pchild() sets pp->p_flags &= ~(PRUNNING | PSTOPPED | PREPORTED), however, too late Does it make sense, Ondrej?
Reproduced. Sending SIGCHLD to tcsh then starts the loop again, yep it seems like SIGCHLD leak. I'll have a look.
Created attachment 981414 [details] Fix proposed upstream. I sent the attached patch to tcsh-bugs.com, this needs to be fixed upstream first definitely. Mail archives are slow now so I'll post the link to proposal later.
http://mx.gw.com/pipermail/tcsh-bugs/2015-January/000914.html
F22/F21: http://pkgs.fedoraproject.org/cgit/tcsh.git/commit/?id=ed7c627f46baa1e F20: http://pkgs.fedoraproject.org/cgit/tcsh.git/commit/?h=f20&id=bc7aea1b23ef3
tcsh-6.18.01-13.fc21 has been submitted as an update for Fedora 21. https://admin.fedoraproject.org/updates/tcsh-6.18.01-13.fc21
tcsh-6.18.01-6.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/tcsh-6.18.01-6.fc20
Package tcsh-6.18.01-6.fc20: * should fix your issue, * was pushed to the Fedora 20 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing tcsh-6.18.01-6.fc20' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2015-1280/tcsh-6.18.01-6.fc20 then log in and leave karma (feedback).
tcsh-6.18.01-6.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
tcsh-6.18.01-13.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.