Bug 166669 - [RHEL3 U5] waitpid() returns unexpected ECHILD
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Ingo Molnar
QA Contact: Brian Brock
 
Reported: 2005-08-24 10:48 EDT by Issue Tracker
Modified: 2013-01-10 16:42 EST (History)
Fixed In Version: RHSA-2006-0144
Doc Type: Bug Fix
Last Closed: 2006-05-08 07:59:47 EDT

Attachments
test case for the bug (3.59 KB, application/octet-stream)
2005-08-24 10:52 EDT, Jatin Nansi
A simple test case from IT 72214 (1.66 KB, text/x-c)
2005-09-13 12:26 EDT, Wendy Cheng

Description Issue Tracker 2005-08-24 10:48:39 EDT
Escalated to Bugzilla from IssueTracker
Comment 3 Jatin Nansi 2005-08-24 10:52:02 EDT
Created attachment 118063
test case for the bug
Comment 8 Wendy Cheng 2005-09-13 12:26:22 EDT
Created attachment 118759
A simple test case from IT 72214

1) Compile with "gcc -O2 -o killipf killipf.c -lpthread"
2) Run with "PASS=0; while ./killipf; do let PASS=++PASS; echo $PASS; done"
3) Program dies occasionally with "waitpid failed!: No child processes" 
4) With the Fujitsu patch, the program loops correctly forever with output:
child pid:xxxxx
Status returned:9
PASS : received expected signal 9
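
The attached killipf.c is not reproduced in this report. The following is only a minimal single-threaded sketch of the check that the output above implies: fork a child, kill it with signal 9, and expect waitpid() to report that signal. The helper name kill_and_reap() is hypothetical, and the real test case links against libpthread, so it exercises a different code path than this sketch does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not taken from killipf.c): fork a child, kill it
 * with SIGKILL and reap it. Returns the terminating signal number, or -1
 * if fork(), kill() or waitpid() failed or the child was not signalled. */
int kill_and_reap(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();                /* child: sleep until it is killed */
    }

    assert(pid > 0);                /* never pass 0 or -1 to kill() */
    printf("child pid:%d\n", (int)pid);
    if (kill(pid, SIGKILL) < 0)
        return -1;

    /* Retry only when the wait was interrupted by a signal handler. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            perror("waitpid failed!");  /* the failure seen in step 3 */
            return -1;
        }
    }

    if (WIFSIGNALED(status)) {
        printf("Status returned:%d\n", WTERMSIG(status));
        return WTERMSIG(status);
    }
    return -1;
}
```

With SIGCHLD at its default disposition, a caller would expect kill_and_reap() to return 9 (SIGKILL); an ECHILD failure here would correspond to the "No child processes" message in step 3.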
Comment 12 Ernie Petrides 2005-10-07 22:11:35 EDT
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.5.EL).
Comment 19 Masaki MAENO 2005-12-01 04:59:58 EST
Please provide the beta kernel (kernel-2.4.21-37.EL --> kernel-2.4.21-37.5.EL)
that includes the patch.

We are encountering what looks like the same problem, and we would like to try
the patch.
Comment 20 Masaki MAENO 2005-12-04 20:21:45 EST
I found kernel-2.4.21-37.6.EL on ftp.pbone.net.
We will try kernel-2.4.21-37.6.EL.src.rpm.

ftp://ftp.pbone.net/mirror/people-redhat.com/zaitcev/171129/kernel-2.4.21-37.6.EL.bz171129.1.src.rpm
Comment 23 Masaki MAENO 2006-01-30 23:14:34 EST
$ cat test.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <errno.h>
#include <unistd.h>
 
int main(int argc, char *argv[])
{
    int i, rc;
 
    signal(SIGCHLD, SIG_IGN);
 
    for (i=0; i<100; i++) {
        //rc = system("/bin/zcat a.gz > b.txt");
        rc = system("ls -a");
        fprintf(stderr, "system result[%3d] = %x errno=[%3d]", i, rc, errno);
        perror("system");
        errno = 0;
    }
 
    return 0;
}

$ gcc -o test test.c
$ ./test 2>&1 | tee test.txt
rt_sigaction(SIGINT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0xbfffcad8) = 13409
.
..
ignore.txt
ignore2
ignore2.c
ignore4
ignore4.c
ignore_st.log
waitpid(13409, 0xbfffcad4, 0)           = -1 ECHILD (No child processes)
rt_sigaction(SIGINT, {SIG_DFL}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(2, "system result[ 97] = ffffffff er"..., 41system result[ 97] = ffffffff errno=[ 10]) = 41
write(2, "system: No child processes\n", 27system: No child processes
) = 27
rt_sigaction(SIGINT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0xbfffcad8) = 13410
waitpid(13410, .
..
ignore.txt
ignore2
ignore2.c
ignore4
ignore4.c
ignore_st.log
[WIFEXITED(s) && WEXITSTATUS(s) == 0], 0) = 13410
rt_sigaction(SIGINT, {SIG_DFL}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
write(2, "system result[ 98] = 0 errno=[  "..., 34system result[ 98] = 0 errno=[  0]) = 34
write(2, "system: Success\n", 16system: Success
)       = 16
rt_sigaction(SIGINT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0xbfffcad8) = 13411
waitpid(13411, .
..
ignore.txt
ignore2
ignore2.c
ignore4
ignore4.c
ignore_st.log
[WIFEXITED(s) && WEXITSTATUS(s) == 0], 0) = 13411
rt_sigaction(SIGINT, {SIG_DFL}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
write(2, "system result[ 99] = 0 errno=[  "..., 34system result[ 99] = 0 errno=[  0]) = 34
write(2, "system: Success\n", 16system: Success
)       = 16
exit_group(0)                           = ?
$

If waitpid() is called before the child process has run, waitpid() succeeds
and system() returns 0.
Otherwise, waitpid() fails and errno is set to ECHILD.

This contradicts the Single UNIX Specification:
http://www.opengroup.org/onlinepubs/009695399/functions/waitpid.html
======
If the calling process has SA_NOCLDWAIT set or has SIGCHLD set to SIG_IGN, and 
the process has no unwaited-for children that were transformed into zombie 
processes, the calling thread shall block until all of the children of the 
process containing the calling thread terminate, and wait() and waitpid() shall 
fail and set errno to [ECHILD]. 
======

- Kernel 2.6 always returns ECHILD. It conforms to the Single UNIX
  Specification.
- Kernel 2.4, where this behavior is not implemented, always returns 0. It does
  not conform to the specification.
- Only the RHEL3 kernel returns either 0 or ECHILD; the result is not
  consistent. The RHEL3 kernel is an imperfect backport.
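
The rule quoted from the specification can be checked without going through system(). The following is a minimal sketch, assuming a POSIX environment; the helper name wait_with_sigchld_ignored() is hypothetical. It ignores SIGCHLD, forks a child that exits at once, and reports what waitpid() does. On a kernel that follows the quoted text it should return ECHILD on every call, whereas the results above suggest that the RHEL3 kernel would return either 0 or ECHILD.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if waitpid() reaped the child, otherwise
 * the errno value of the call that failed (ECHILD is the value the quoted
 * specification text requires from waitpid()). */
int wait_with_sigchld_ignored(void)
{
    struct sigaction ign, old;
    int status, result = 0;
    pid_t pid;

    memset(&ign, 0, sizeof ign);
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    if (sigaction(SIGCHLD, &ign, &old) < 0)
        return errno;

    pid = fork();
    if (pid < 0) {
        result = errno;
    } else if (pid == 0) {
        _exit(0);                   /* child: terminate immediately */
    } else {
        pid_t r;

        /* Retry only when the wait was interrupted by a signal handler. */
        do {
            r = waitpid(pid, &status, 0);
        } while (r < 0 && errno == EINTR);

        if (r < 0)
            result = errno;         /* ECHILD on a conforming kernel */
        else
            assert(r == pid);       /* the child was reaped normally */
    }

    sigaction(SIGCHLD, &old, NULL); /* restore the caller's disposition */
    return result;
}
```

Calling the helper in a loop and counting how often it returns 0 versus ECHILD would give the same signal as test.c, without the shell that system() starts in between.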
Comment 29 Red Hat Bugzilla 2006-03-15 11:29:06 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html
Comment 33 Masaki MAENO 2006-03-17 02:58:38 EST
> You may reopen this bug report if the solution does not work for you.
We confirmed that the problem still reproduces on a freshly installed
RHEL3 U7 (kernel-2.4.21-40.ELsmp).

(a) Only the RHEL3 kernel returns either 0 or ECHILD; the result is not
    consistent. The RHEL3 kernel is an imperfect backport from kernel 2.6.
(b) Kernel 2.6 always returns ECHILD. It conforms to the Single UNIX
    Specification.
(c) Kernel 2.4, where this behavior is not implemented, always returns 0. It
    does not conform to the specification.

We are convinced that the problem is in the kernel's wait() implementation.
Please have a Red Hat engineer confirm this by running test.c (from comment #23,
or below) on RHEL3 U7 (inconsistent result) and on RHEL4 U3 (always ECHILD).

$ cat test.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <errno.h>
#include <unistd.h>
 
int main(int argc, char *argv[])
{
    int i, rc;
 
    signal(SIGCHLD, SIG_IGN);
 
    for (i=0; i<100; i++) {
        //rc = system("/bin/zcat a.gz > b.txt");
        rc = system("ls -a");
        fprintf(stderr, "system result[%3d] = %x errno=[%3d]", i, rc, errno);
        perror("system");
        errno = 0;
    }
 
    return 0;
}

$ gcc -o test test.c

$ ./test 2>&1 | tee test01.txt
$ ./test 2>&1 | tee test02.txt
$ ./test 2>&1 | tee test03.txt
$ ./test 2>&1 | tee test04.txt
    .....
 ^
 |
simultaneous multiple execution (about 10 - 10000)
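
Starting that many instances by hand is tedious. The following is a minimal sketch of a launcher, assuming a POSIX environment; the helper name run_concurrently() is hypothetical and is not part of test.c. It starts several copies of a program at once and waits for all of them. Unlike the commands above, it does not redirect each instance's output to its own file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start `count` copies of `prog` at the same time,
 * then wait for all of them. Returns how many exited with status 0.
 * Note: wait() reaps any child of the caller, so this assumes the caller
 * has no other children outstanding and leaves SIGCHLD at its default. */
int run_concurrently(const char *prog, int count)
{
    int started = 0, ok = 0, status;

    assert(prog != NULL && count > 0);

    fflush(NULL);                   /* do not duplicate buffered output */
    for (int i = 0; i < count; i++) {
        pid_t pid = fork();

        if (pid < 0) {
            perror("fork");
            break;                  /* still wait for those that started */
        }
        if (pid == 0) {
            execlp(prog, prog, (char *)NULL);
            _exit(127);             /* exec failed */
        }
        started++;
    }

    /* Reap every child that was started. */
    while (started > 0) {
        pid_t r = wait(&status);

        if (r < 0) {
            if (errno == EINTR)
                continue;           /* interrupted: try again */
            break;                  /* e.g. ECHILD: nothing left to reap */
        }
        started--;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

For example, run_concurrently("./test", 100) would start 100 instances of the test program; the count of 100 is only an illustration within the range given above.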
Comment 34 Masaki MAENO 2006-03-17 03:04:46 EST
Please reopen this issue, and read comments #23 and #33 carefully to understand
the problem.
Comment 37 Tim Burke 2006-05-08 07:59:47 EDT
The codepath in which this bug occurs is known to be a very sensitive and
fragile section of code.  Given where the RHEL3 product is in its lifecycle we
are not receptive to making changes in this space. The risk of introducing
regressions, or incompatibility issues is too high.  Consequently, closing this
issue as wontfix.

This problem does not exist in RHEL4.
