Bug 166669
| Summary: | [RHEL3 U5] waitpid() returns unexpected ECHILD | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 3 | Reporter: | Issue Tracker <tao> | ||||||
| Component: | kernel | Assignee: | Ingo Molnar <mingo> | ||||||
| Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 3.0 | CC: | lwang, maeno.masaki, pcormier, petrides, riek, tao | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | All | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | RHSA-2006-0144 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2006-05-08 11:59:47 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
Description
Issue Tracker
2005-08-24 14:48:39 UTC
Created attachment 118063: test case for the bug
Created attachment 118759: A simple test case from IT 72214
1) Compile with "gcc -O2 -o killipf killipf.c -lpthread"
2) Run with "PASS=0; while ./killipf; do let PASS=++PASS; echo $PASS; done"
3) Program dies occasionally with "waitpid failed!: No child processes"
4) With the Fujitsu patch, the program loops correctly forever with output:
child pid:xxxxx
Status returned:9
PASS : received expected signal 9
A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.5.EL).

Please give us the beta kernel (kernel-2.4.21-37.EL --> kernel-2.4.21-37.5.EL) that includes the patch. We are encountering a problem that looks similar, and we would like to try the patch.

I found kernel-2.4.21-37.6.EL on ftp.pbone.net, and we will try kernel-2.4.21-37.6.EL.src.rpm:
ftp://ftp.pbone.net/mirror/people-redhat.com/zaitcev/171129/kernel-2.4.21-37.6.EL.bz171129.1.src.rpm

$ cat test.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <errno.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
    int i, rc;
    signal(SIGCHLD, SIG_IGN);
    for (i=0; i<100; i++) {
        //rc = system("/bin/zcat a.gz > b.txt");
        rc = system("ls -a");
        fprintf(stderr, "system result[%3d] = %x errno=[%3d]", i, rc,
                errno);
        perror("system");
        errno = 0;
    }
    return 0;
}
$ gcc -o test test.c
$ ./test 2>&1 | tee test.txt
rt_sigaction(SIGINT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0xbfffcad8) = 13409
.
..
ignore.txt
ignore2
ignore2.c
ignore4
ignore4.c
ignore_st.log
waitpid(13409, 0xbfffcad4, 0) = -1 ECHILD (No child processes)
rt_sigaction(SIGINT, {SIG_DFL}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(2, "system result[ 97] = ffffffff er"..., 41system result[ 97] = ffffffff errno=[ 10]) = 41
write(2, "system: No child processes\n", 27system: No child processes
) = 27
rt_sigaction(SIGINT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0xbfffcad8) = 13410
waitpid(13410, .
..
ignore.txt
ignore2
ignore2.c
ignore4
ignore4.c
ignore_st.log
[WIFEXITED(s) && WEXITSTATUS(s) == 0], 0) = 13410
rt_sigaction(SIGINT, {SIG_DFL}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
write(2, "system result[ 98] = 0 errno=[  "..., 34system result[ 98] = 0 errno=[  0]) = 34
write(2, "system: Success\n", 16system: Success
) = 16
rt_sigaction(SIGINT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0xbfffcad8) = 13411
waitpid(13411, .
..
ignore.txt
ignore2
ignore2.c
ignore4
ignore4.c
ignore_st.log
[WIFEXITED(s) && WEXITSTATUS(s) == 0], 0) = 13411
rt_sigaction(SIGINT, {SIG_DFL}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
write(2, "system result[ 99] = 0 errno=[  "..., 34system result[ 99] = 0 errno=[  0]) = 34
write(2, "system: Success\n", 16system: Success
) = 16
exit_group(0) = ?
$
If waitpid() is called before the child process has exited, waitpid() succeeds and system() returns 0.
Otherwise, waitpid() fails with errno set to ECHILD.
This contradicts the Single UNIX Specification:
http://www.opengroup.org/onlinepubs/009695399/functions/waitpid.html
======
If the calling process has SA_NOCLDWAIT set or has SIGCHLD set to SIG_IGN, and
the process has no unwaited-for children that were transformed into zombie
processes, the calling thread shall block until all of the children of the
process containing the calling thread terminate, and wait() and waitpid() shall
fail and set errno to [ECHILD].
======
- Kernel 2.6 always fails with ECHILD, which conforms to the Single UNIX Specification.
- Kernel 2.4 always returns 0, because this behavior is not implemented in 2.4; it does not conform to the specification.
- Only the RHEL3 kernel is inconsistent, returning either 0 or ECHILD.
  The RHEL3 kernel contains an imperfect backport.
An advisory has been issued which should help resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2006-0144.html

> You may reopen this bug report if the solution does not work for you.
We confirmed that the problem reproduces on a freshly installed RHEL3 U7 system (kernel-2.4.21-40.ELsmp).
(a) "Only" the RHEL3 kernel is inconsistent, returning either 0 or ECHILD.
    The RHEL3 kernel contains an imperfect backport from kernel 2.6.
(b) Kernel 2.6 always fails with ECHILD, which conforms to the Single UNIX Specification.
(c) Kernel 2.4 always returns 0, because this behavior is not implemented in 2.4; it does not conform to the specification.
We are convinced that the problem lies in the kernel's wait() implementation.
Please have a Red Hat engineer confirm this problem on RHEL3 U7 (inconsistent) and RHEL4 U3 (always ECHILD) by running test.c (see #23, or below):
$ cat test.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <errno.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
    int i, rc;
    signal(SIGCHLD, SIG_IGN);
    for (i=0; i<100; i++) {
        //rc = system("/bin/zcat a.gz > b.txt");
        rc = system("ls -a");
        fprintf(stderr, "system result[%3d] = %x errno=[%3d]", i, rc,
                errno);
        perror("system");
        errno = 0;
    }
    return 0;
}
$ gcc -o test test.c
$ ./test 2>&1 | tee test01.txt
$ ./test 2>&1 | tee test02.txt
$ ./test 2>&1 | tee test03.txt
$ ./test 2>&1 | tee test04.txt
.....
(run these simultaneously: about 10 to 10000 instances at once)
Please reopen this issue, read #23 and #33 carefully, and make sure the problem is understood.

The codepath in which this bug occurs is known to be a very sensitive and fragile section of code. Given where the RHEL3 product is in its lifecycle, we are not receptive to making changes in this space. The risk of introducing regressions or incompatibility issues is too high. Consequently, this issue is being closed as WONTFIX. This problem does not exist in RHEL4.