1181685 – tcsh hangs on wait

Bug 1181685 - tcsh hangs on wait

Summary: tcsh hangs on wait

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	tcsh
Sub Component:
Version:	rawhide
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Pavel Raiskup
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2015-01-13 15:10 UTC by Filip Krska
Modified:	2015-02-06 04:03 UTC (History)
CC List:	5 users (show)
Fixed In Version:	tcsh-6.18.01-13.fc21
Clone Of:	1181682
Environment:
Last Closed:	2015-02-06 04:02:47 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Fix proposed upstream. (2.29 KB, text/plain) 2015-01-19 08:26 UTC, Pavel Raiskup	no flags	Details
View All

Description Filip Krska 2015-01-13 15:10:52 UTC

tcsh-6.18.01-12.fc22.x86_64 reproduces as well

+++ This bug was initially created as a clone of Bug #1181682 +++

tcsh-6.18.01-7.el7.x86_64 reproduces as well

+++ This bug was initially created as a clone of Bug #1181680 +++

Description of problem:

tcsh hangs on wait builtin on system with more than one CPU.


Version-Release number of selected component (if applicable):

tcsh-6.17-24.el6.x86_64

How reproducible:

Always

Steps to Reproduce:
1. create executable script /tmp/reproduce.csh

#!/bin/csh
/bin/date &
/bin/true
wait

2. execute it in loop in csh shell:
csh$ while ( 1 )
while? /tmp/reproduce.csh > & /dev/null
while? end

3. in another terminal watch `ps -C reproduce.csh` command

Actual results:

after some time (several seconds or minutes) the script hangs (you observe the PID stops changing)


Expected results:

the watched PID keeps changing

Additional info:

Reproduces with tcsh-6.18.01-12.fc22.x86_64, tcsh-6.18.01-7.el7.x86_64 as well.

(gdb) bt
#0  0x00000034bf832c2a in do_sigsuspend (set=0x7fff735ec840) at ../sysdeps/unix/sysv/linux/sigsuspend.c:63
#1  __sigsuspend (set=0x7fff735ec840) at ../sysdeps/unix/sysv/linux/sigsuspend.c:74
#2  0x00000000004201c5 in dowait (v=<value optimized out>, c=<value optimized out>) at sh.proc.c:604
#3  0x000000000042197f in execute (t=0x2413850, wanttty=-1, pipein=0x0, pipeout=0x0, do_glob=1) at sh.sem.c:632
#4  0x0000000000421396 in execute (t=0x23cb1e0, wanttty=-1, pipein=0x0, pipeout=0x0, do_glob=1) at sh.sem.c:720
#5  0x0000000000403f38 in process (catch=1) at sh.c:2043
#6  0x000000000040605b in main (argc=<value optimized out>, argv=0x7fff735ed1a8) at sh.c:1290
(gdb) f 2
#2  0x00000000004201c5 in dowait (v=<value optimized out>, c=<value optimized out>) at sh.proc.c:604
604		    sigsuspend(&pause_mask);
(gdb) l
599	loop:
600	    for (pp = proclist.p_next; pp; pp = pp->p_next)
601		if (pp->p_procid &&	/* pp->p_procid == pp->p_jobid && */
602		    pp->p_flags & PRUNNING) {
603		    handle_pending_signals();
604		    sigsuspend(&pause_mask);
605		    handle_pending_signals();
606		    goto loop;
607		}
608	    pjobs = 0;

Comment 1 Filip Krska 2015-01-15 15:48:22 UTC

#2  0x0000000000420445 in dowait (v=<optimized out>, c=<optimized out>) at sh.proc.c:611
611		    sigsuspend(&pause_mask);
(gdb) l
606	loop:
607	    for (pp = proclist.p_next; pp; pp = pp->p_next)
608		if (pp->p_procid &&	/* pp->p_procid == pp->p_jobid && */
609		    pp->p_flags & PRUNNING) {
610		    (void)handle_pending_signals();
611		    sigsuspend(&pause_mask);
612		    opintr_disabled = pintr_disabled;
613		    pintr_disabled = 0;
614		    gotsig = handle_pending_signals();
615		    pintr_disabled = opintr_disabled;
(gdb) p proclist 
$15 = {p_next = 0x778908, p_friends = 0x0, p_cwd = 0x0, p_flags = 0, p_reason = 0 '\000', p_index = 0, p_parentid = 0, p_procid = 0, 
  p_jobid = 0, p_btime = {tv_sec = 0, tv_usec = 0}, p_etime = {tv_sec = 0, tv_usec = 0}, p_rusage = {ru_utime = {tv_sec = 0, 
      tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, {ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, 
      __ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 0, 
      __ru_minflt_word = 0}, {ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, 
      __ru_inblock_word = 0}, {ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, 
      __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 0, __ru_nvcsw_word = 0}, {ru_nivcsw = 0, 
      __ru_nivcsw_word = 0}}, p_command = 0x0}
(gdb) p *proclist->p_next 
$17 = {p_next = 0x733508, p_friends = 0x778908, p_cwd = 0x7416c8, p_flags = 4, p_reason = 0 '\000', p_index = 0, p_parentid = 14001, 
  p_procid = 0, p_jobid = 14017, p_btime = {tv_sec = 1421314672, tv_usec = 366544}, p_etime = {tv_sec = 0, tv_usec = 0}, p_rusage = {
    ru_utime = {tv_sec = 0, tv_usec = 2000}, ru_stime = {tv_sec = 0, tv_usec = 0}, {ru_maxrss = 2124, __ru_maxrss_word = 2124}, {
      ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 85, 
      __ru_minflt_word = 85}, {ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, 
      __ru_inblock_word = 0}, {ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, 
      __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 1, __ru_nvcsw_word = 1}, {ru_nivcsw = 1, 
      __ru_nivcsw_word = 1}}, p_command = 0x7f3988 L"/bin/true"}
(gdb) p *proclist->p_next->p_next
$21 = {p_next = 0x0, p_friends = 0x733508, p_cwd = 0x7416c8, p_flags = 32772, p_reason = 0 '\000', p_index = 1, p_parentid = 14001, 
  p_procid = 14016, p_jobid = 14016, p_btime = {tv_sec = 1421314672, tv_usec = 365773}, p_etime = {tv_sec = 0, tv_usec = 0}, 
  p_rusage = {ru_utime = {tv_sec = 0, tv_usec = 4000}, ru_stime = {tv_sec = 0, tv_usec = 0}, {ru_maxrss = 2124, 
      __ru_maxrss_word = 2124}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, 
      __ru_isrss_word = 0}, {ru_minflt = 100, __ru_minflt_word = 100}, {ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, 
      __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0}, {ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, 
      __ru_msgsnd_word = 0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 1, 
      __ru_nvcsw_word = 1}, {ru_nivcsw = 4, __ru_nivcsw_word = 4}}, p_command = 0x7f3dc8 L"/bin/date"}

Race condition theory:

/bin/date exits
SIGCHLD is emited then
pchild() is called then
dowait() is called when wait met in the script

606	loop:
607	    for (pp = proclist.p_next; pp; pp = pp->p_next)
608		if (pp->p_procid &&	/* pp->p_procid == pp->p_jobid && */
609		    pp->p_flags & PRUNNING) {

reaches /bin/date process, however PRUNNING flag still may be in place
sigsuspend() is called and waits forever...
pchild() sets pp->p_flags &= ~(PRUNNING | PSTOPPED | PREPORTED), however, too late

Does it make sense, Ondrej?

Comment 3 Pavel Raiskup 2015-01-16 16:32:16 UTC

Reproduced.  Sending SIGCHLD to tcsh then starts the loop again, yep it seems
like SIGCHLD leak.  I'll have a look.

Comment 4 Pavel Raiskup 2015-01-19 08:26:39 UTC

Created attachment 981414 [details]
Fix proposed upstream.

I sent the attached patch to tcsh-bugs.com, this needs to be fixed
upstream first definitely.  Mail archives are slow now so I'll post the link
to proposal later.

Comment 5 Pavel Raiskup 2015-01-19 09:55:40 UTC

http://mx.gw.com/pipermail/tcsh-bugs/2015-January/000914.html

Comment 7 Pavel Raiskup 2015-01-27 06:34:55 UTC

F22/F21:
http://pkgs.fedoraproject.org/cgit/tcsh.git/commit/?id=ed7c627f46baa1e
F20:
http://pkgs.fedoraproject.org/cgit/tcsh.git/commit/?h=f20&id=bc7aea1b23ef3

Comment 8 Fedora Update System 2015-01-27 06:58:19 UTC

tcsh-6.18.01-13.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/tcsh-6.18.01-13.fc21

Comment 9 Fedora Update System 2015-01-27 06:59:37 UTC

tcsh-6.18.01-6.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/tcsh-6.18.01-6.fc20

Comment 10 Fedora Update System 2015-01-28 19:57:20 UTC

Package tcsh-6.18.01-6.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing tcsh-6.18.01-6.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-1280/tcsh-6.18.01-6.fc20
then log in and leave karma (feedback).

Comment 11 Fedora Update System 2015-02-06 04:02:47 UTC

tcsh-6.18.01-6.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 12 Fedora Update System 2015-02-06 04:03:07 UTC

tcsh-6.18.01-13.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.