Bug 166669
Summary: | [RHEL3 U5] waitpid() returns unexpected ECHILD
---|---
Product: | Red Hat Enterprise Linux 3
Component: | kernel
Version: | 3.0
Hardware: | All
OS: | Linux
Status: | CLOSED WONTFIX
Severity: | medium
Priority: | medium
Reporter: | Issue Tracker <tao>
Assignee: | Ingo Molnar <mingo>
QA Contact: | Brian Brock <bbrock>
CC: | lwang, maeno.masaki, pcormier, petrides, riek, tao
Fixed In Version: | RHSA-2006-0144
Doc Type: | Bug Fix
Last Closed: | 2006-05-08 11:59:47 UTC
Description
Issue Tracker
2005-08-24 14:48:39 UTC
Created attachment 118063 [details]
test case for the bug
Created attachment 118759 [details]
A simple test case from IT 72214
1) Compile with "gcc -O2 -o killipf killipf.c -lpthread"
2) Run with "PASS=0; while ./killipf; do let PASS=++PASS; echo $PASS; done"
3) Program dies occasionally with "waitpid failed!: No child processes"
4) With the Fujitsu patch, the program loops correctly forever with output:
child pid:xxxxx
Status returned:9
PASS : received expected signal 9
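The attached killipf.c is not reproduced in this report. As a rough illustration of the kind of check the steps above describe, here is a minimal sketch that forks a child, kills it with a signal, and reaps it with waitpid(). The helper name kill_and_reap() is hypothetical, the sketch does not use threads even though the original is linked with -lpthread, and it assumes SIGCHLD has its default disposition.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, NOT taken from killipf.c: fork a child, kill it
 * with `sig`, then reap it. Returns the signal that terminated the
 * child, or -1 on error (errno is preserved for waitpid failures). */
int kill_and_reap(int sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();            /* child: sleep until a signal arrives */
    }

    printf("child pid:%d\n", (int)pid);
    if (kill(pid, sig) < 0)
        return -1;

    int status = 0;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);

    if (r < 0) {
        /* This is the failure the report describes (ECHILD). */
        int saved = errno;
        fprintf(stderr, "waitpid failed!: %s\n", strerror(saved));
        errno = saved;
        return -1;
    }
    assert(r == pid);           /* waitpid(pid, ...) can only reap pid */

    printf("Status returned:%d\n", status);
    if (!WIFSIGNALED(status))
        return -1;              /* child ended some other way */
    return WTERMSIG(status);
}
```

A caller would loop on kill_and_reap(SIGKILL) and treat any return value other than 9 as a failed pass.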
A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.5.EL).

Please give me the beta kernel that includes the patch (kernel-2.4.21-37.EL --> kernel-2.4.21-37.5.EL).

We have encountered a problem that looks like this one, and we would like to try the patch.

I found kernel-2.4.21-37.6.EL on ftp.pbone.net. We will try kernel-2.4.21-37.6.EL.src.rpm.
ftp://ftp.pbone.net/mirror/people-redhat.com/zaitcev/171129/kernel-2.4.21-37.6.EL.bz171129.1.src.rpm

$ cat test.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <errno.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
    int i, rc;
    signal(SIGCHLD, SIG_IGN);
    for (i=0; i<100; i++) {
        //rc = system("/bin/zcat a.gz > b.txt");
        rc = system("ls -a");
        fprintf(stderr, "system result[%3d] = %x errno=[%3d]", i, rc, errno);
        perror("system");
        errno = 0;
    }
    return 0;
}
$ gcc -o test test.c
$ ./test 2>&1 | tee test.txt

rt_sigaction(SIGINT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0xbfffcad8) = 13409
. .. ignore.txt ignore2 ignore2.c ignore4 ignore4.c ignore_st.log
waitpid(13409, 0xbfffcad4, 0) = -1 ECHILD (No child processes)
rt_sigaction(SIGINT, {SIG_DFL}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(2, "system result[ 97] = ffffffff er"..., 41system result[ 97] = ffffffff errno=[ 10]) = 41
write(2, "system: No child processes\n", 27system: No child processes
) = 27
rt_sigaction(SIGINT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0xbfffcad8) = 13410
waitpid(13410, . .. ignore.txt ignore2 ignore2.c ignore4 ignore4.c ignore_st.log
[WIFEXITED(s) && WEXITSTATUS(s) == 0], 0) = 13410
rt_sigaction(SIGINT, {SIG_DFL}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
write(2, "system result[ 98] = 0 errno=[  "..., 34system result[ 98] = 0 errno=[  0]) = 34
write(2, "system: Success\n", 16system: Success
) = 16
rt_sigaction(SIGINT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0xbfffcad8) = 13411
waitpid(13411, . .. ignore.txt ignore2 ignore2.c ignore4 ignore4.c ignore_st.log
[WIFEXITED(s) && WEXITSTATUS(s) == 0], 0) = 13411
rt_sigaction(SIGINT, {SIG_DFL}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
write(2, "system result[ 99] = 0 errno=[  "..., 34system result[ 99] = 0 errno=[  0]) = 34
write(2, "system: Success\n", 16system: Success
) = 16
exit_group(0) = ?
$

If waitpid() is called before the child process has run, system() returns 0. Otherwise waitpid() fails and errno is ECHILD.
This contradicts the Single UNIX Specification:
http://www.opengroup.org/onlinepubs/009695399/functions/waitpid.html
======
If the calling process has SA_NOCLDWAIT set or has SIGCHLD set to SIG_IGN, and the process has no unwaited-for children that were transformed into zombie processes, the calling thread shall block until all of the children of the process containing the calling thread terminate, and wait() and waitpid() shall fail and set errno to [ECHILD].
======
- Kernel 2.6 always returns ECHILD. It conforms to the Single UNIX Specification.
- Kernel 2.4 always returns 0 where this behavior is not implemented. It does not conform to the specification.
- Only the RHEL3 kernel returns sometimes 0 and sometimes ECHILD; the result is not constant. The RHEL3 kernel is an imperfect backport.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2006-0144.html

> You may reopen this bug report if the solution does not work for you.
We confirmed that the problem reproduces on a newly installed RHEL3 U7 (kernel-2.4.21-40.ELsmp).
(a) Only the RHEL3 kernel returns sometimes 0 and sometimes ECHILD; the result is not constant.
The RHEL3 kernel is an imperfect backport from kernel 2.6.
(b) Kernel 2.6 always returns ECHILD. It conforms to the Single UNIX Specification.
(c) Kernel 2.4 always returns 0 where this behavior is not implemented. It does not conform
to the specification.
We are convinced that the problem is in the kernel's wait() implementation.
Please have a Red Hat engineer confirm this problem on RHEL3 U7 (result not constant) and
RHEL4 U3 (always ECHILD) by running test.c (from #23, or below).
$ cat test.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <errno.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
    int i, rc;
    signal(SIGCHLD, SIG_IGN);
    for (i=0; i<100; i++) {
        //rc = system("/bin/zcat a.gz > b.txt");
        rc = system("ls -a");
        fprintf(stderr, "system result[%3d] = %x errno=[%3d]", i, rc, errno);
        perror("system");
        errno = 0;
    }
    return 0;
}
$ gcc -o test test.c
$ ./test 2>&1 | tee test01.txt
$ ./test 2>&1 | tee test02.txt
$ ./test 2>&1 | tee test03.txt
$ ./test 2>&1 | tee test04.txt
.....
^
|
simultaneous multiple execution (about 10 - 10000)
Please reopen this issue, and read #23 and #33 carefully to understand the problem.

The code path in which this bug occurs is known to be a very sensitive and fragile section of code. Given where the RHEL3 product is in its lifecycle, we are not receptive to making changes in this area. The risk of introducing regressions or incompatibility issues is too high. Consequently, we are closing this issue as WONTFIX. This problem does not exist in RHEL4.