Bug 690356
Summary: | tcsh doesn't exit when child killed, consumes 100% cpu | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Bob Arendt <rda> | ||||||
Component: | tcsh | Assignee: | Vojtech Vitek <vvitek> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Branislav NĂ¡ter <bnater> | ||||||
Severity: | urgent | Docs Contact: | |||||||
Priority: | medium | ||||||||
Version: | 6.0 | CC: | angelotech, bnater, hripps, jkaluza, jwest, msvoboda, ovasik | ||||||
Target Milestone: | rc | Keywords: | Patch | ||||||
Target Release: | --- | ||||||||
Hardware: | All | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | tcsh-6.17-13.el6 | Doc Type: | Bug Fix | ||||||
Doc Text: |
If the tcsh shell redirected standard output to a child process using a pipe and this child process terminated, the shell tried to print a message to the already-closed pipe as a high-priority event that could never been finished. As a consequence, tcsh entered an infinite loop and consumed up to 100% of the CPU. To fix this issue, this error event is now removed from the event queue before the shell tries to write it to the broken pipe. As result, the parent process terminates ordinarily.
|
Story Points: | --- | ||||||
Clone Of: | |||||||||
: | 690500 (view as bug list) | Environment: | |||||||
Last Closed: | 2011-12-06 17:02:06 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 690500 | ||||||||
Attachments: |
|
Description
Bob Arendt
2011-03-24 03:29:31 UTC
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative. Created attachment 487184 [details] patch for tcsh consumes 100% CPU when child killed Here's the analysis and patch that my colleague Justin Irwin came up with. This is happening in tcsh's process event reporting in sh.proc.c. When the child is killed the event (sigchild?) results in a call to pendjob(). But the output pipe is closed, so the xprintf() fails without clearing the event. But the queued event is still the top of the queue, so it tries to process the same event again. Forever. Justin's patch zero's the pcurrjob pointer just before xprintf, removing the current error from the queue just prior to the xprintf. If it fails - no output. Output isn't possible in this situation anyway. But the job event is off the queue and tcsh is able to complete and terminate. We've tested it and it solves the problem. Notice that pcurrjob is zero'd on exit anyway; This is a very safe patch. This may alleviate the other tcsh bugs cited in comment #1. Strace of stuck tcsh, after running the reproducer (attachment 487182 [details]):
<snip>
lseek(16, 0, SEEK_END) = -1 ESPIPE (Illegal seek)
lseek(16, 0, SEEK_END) = -1 ESPIPE (Illegal seek)
write(17, "[1] 10393\n", 10) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
lseek(16, 0, SEEK_END) = -1 ESPIPE (Illegal seek)
lseek(16, 0, SEEK_END) = -1 ESPIPE (Illegal seek)
write(17, "[1] 10393\n", 10) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
</snip>
Gdb:
<snip>
Program received signal SIGPIPE, Broken pipe.
0x0000003c696d34f0 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
</snip>
The patch seems ok.
Thanks Bob & Justin, I am going to apply the patch in all supported Fedoras and I'll escalate this for inclusion to RHEL-6.2 and RHEL-5.7 tcsh617. The patch has been accepted by upstream as well: http://mx.gw.com/pipermail/tcsh-bugs/2011-March/000727.html For the record, the strace output from this reproducer looks somewhat different on RHEL5 tcsh-6.14-12.el5. IMO, this looks an awful lot like the strace output from bug #618723 (https://bugzilla.redhat.com/show_bug.cgi?id=618723). fstat64(247, 0xbfca35a4) = -1 EBADF (Bad file descriptor) fstat64(248, 0xbfca35a4) = -1 EBADF (Bad file descriptor) fstat64(249, 0xbfca35a4) = -1 EBADF (Bad file descriptor) fstat64(250, 0xbfca35a4) = -1 EBADF (Bad file descriptor) fstat64(251, 0xbfca35a4) = -1 EBADF (Bad file descriptor) fstat64(252, 0xbfca35a4) = -1 EBADF (Bad file descriptor) fstat64(253, 0xbfca35a4) = -1 EBADF (Bad file descriptor) fstat64(254, 0xbfca35a4) = -1 EBADF (Bad file descriptor) fstat64(255, 0xbfca35a4) = -1 EBADF (Bad file descriptor) write(17, "[1] 18966\n", 10) = -1 EPIPE (Broken pipe) --- SIGPIPE (Broken pipe) @ 0 (0) --- _llseek(16, 0, 0xbfca3510, SEEK_END) = -1 ESPIPE (Illegal seek) rt_sigprocmask(SIG_SETMASK, NULL, [], 8) = 0 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 fstat64(0, 0xbfca35a4) = -1 EBADF (Bad file descriptor) fstat64(1, 0xbfca35a4) = -1 EBADF (Bad file descriptor) fstat64(2, 0xbfca35a4) = -1 EBADF (Bad file descriptor) fstat64(3, 0xbfca35a4) = -1 EBADF (Bad file descriptor) fstat64(4, 0xbfca35a4) = -1 EBADF (Bad file descriptor) fstat64(5, 0xbfca35a4) = -1 EBADF (Bad file descriptor) fstat64(6, 0xbfca35a4) = -1 EBADF (Bad file descriptor) Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: If the tcsh shell redirected standard output to a child process using a pipe and this child process terminated, the shell tried to print a message to the already-closed pipe as a high-priority event that could never been finished. As a consequence, tcsh entered an infinite loop and consumed up to 100% of the CPU. To fix this issue, this error event is now removed from the event queue before the shell tries to write it to the broken pipe. As result, the parent process terminates ordinarily. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1686.html |