Bug 168775 - wait() and waitpid() return inconsistencies under high load
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Brian Brock
Depends On:
Blocks: 168429
Reported: 2005-09-20 05:59 EDT by Ionut Leonte
Modified: 2007-11-30 17:07 EST
Fixed In Version: RHSA-2006-0132
Doc Type: Bug Fix
Last Closed: 2006-03-07 15:08:28 EST

Attachments
vfork() a child which fork()'s another (3.01 KB, text/plain)
2005-09-20 06:02 EDT, Ionut Leonte

Description Ionut Leonte 2005-09-20 05:59:19 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6

Description of problem:
On certain occasions the wait() and waitpid() calls erroneously fail with ECHILD (No child processes)
even though a child process was successfully (v)fork()-ed and is still running.
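
For illustration, the check described above can be sketched in C as follows. This is not the attached forktest.c; the helper name wait_for_child() and its return convention are assumptions made for this example, and the message only loosely mimics the output shown under "Actual Results". The sketch waits for one child, and if waitpid() fails with ECHILD it probes the same PID with kill(pid, 0) to see whether the process is still there.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from forktest.c.
 * Returns  0 if the child was reaped (exit status stored in *status),
 *          1 if waitpid() failed with ECHILD although kill(pid, 0)
 *            still finds the process (the symptom reported here),
 *         -1 on any other failure.
 */
static int wait_for_child(pid_t pid, int *status)
{
    pid_t r;

    assert(pid > 0);

    /* Retry if a signal interrupts the wait. */
    do {
        r = waitpid(pid, status, 0);
    } while (r == -1 && errno == EINTR);

    if (r == pid)
        return 0;

    if (r == -1 && errno == ECHILD) {
        /* The kernel claims there is no such child; probe the PID directly. */
        if (kill(pid, 0) == 0) {
            fprintf(stderr, "[%ld] waitpid(%ld) surprise: %s\n",
                    (long)getpid(), (long)pid, strerror(ECHILD));
            return 1;
        }
    }
    return -1;
}
```

A caller would fork() a child, keep the returned PID, and treat a return value of 1 from wait_for_child() as the inconsistency described in this report. The sketch assumes SIGCHLD has its default disposition; with SIGCHLD ignored, ECHILD from waitpid() is expected behaviour rather than a bug.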

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. compile the attached program:
        gcc -W -Wall -g3 -O0 -o test-bin forktest.c -lpthread

2. create a script called 'test-thread' with the following contents:
        <-------------------- CUT HERE ----------------------->
        for i in `seq 1000`; do ./test-bin 1 ; done
        <--------------------- END CUT ----------------------->

3. create a second script called 'test-main' with the following contents:
        <-------------------- CUT HERE ----------------------->

        # start 20 copies of test-thread in the background
        for i in `seq 20`; do
            ./test-thread &
        done

        # wait for Enter, then stop everything
        read ABCD

        killall -9 test-thread
        killall -9 test-bin
        <--------------------- END CUT ----------------------->

4. execute the following command:
        ./test-main | grep ">"

Actual Results:  .......................................................................
>>>>>>>>>>>>>>>>>>>>>>>>>> run_level1() waitpid surprise: No child processes
>>>>>>>>>>>>>>>>>>>>>>>>>> kill( [PID_OF_CHILD], 0 ): 0 Success
...................... (repeated) .....................................

Expected Results:  The output of 'test-main' should not contain any lines with '>' characters...

Additional info:

1. the call to kill() is always successful, thus confirming that the process exists and is running

2. the problem seems to occur only under high load. The rate at which processes are spawned also seems to matter: the slower the rate, the harder the problem is to reproduce
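
As a side note on point 1, a kill(pid, 0) probe can be interpreted roughly as in the sketch below. The helper name process_exists() is an assumption for this example and does not come from the attached program. Signal 0 delivers nothing and only performs the existence and permission checks. A successful probe shows that the PID is still allocated, which also holds for a child that has exited but has not been reaped yet, so it is slightly weaker evidence than "is running".

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Hypothetical helper: probe a PID without sending a real signal.
 * Returns  1 if the process exists (including an unreaped zombie),
 *          0 if no such process exists,
 *         -1 on an unexpected error.
 */
static int process_exists(pid_t pid)
{
    assert(pid > 0);

    if (kill(pid, 0) == 0)
        return 1;
    if (errno == EPERM)
        return 1;   /* exists, but we are not allowed to signal it */
    if (errno == ESRCH)
        return 0;   /* no process with this PID */
    return -1;
}
```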

The previous kernel version (kernel-smp-2.6.9-5.EL) does not seem to be affected by this issue.
Comment 1 Ionut Leonte 2005-09-20 06:02:51 EDT
Created attachment 119018
vfork() a child which fork()'s another

note the (sometimes) incorrect behaviour of waitpid() in run_level1()
Comment 4 Jason Baron 2005-10-10 15:10:03 EDT
Thanks for the test case. This looks a lot like bug 166454. In fact, I'm going
to proactively dup it. We can un-dup it later if I'm wrong.

*** This bug has been marked as a duplicate of 166454 ***
Comment 5 Mihai Maties 2005-10-11 07:21:42 EDT
We did some more testing (at BitDefender) using the Fedora Development kernels
and found that the last kernel version we tried did not have this bug.
Unfortunately I do not remember the precise version of the FC Devel kernel
we used, but I can give you a hint: it was released around the time
we submitted this bug.
Comment 6 Jason Baron 2005-10-12 13:13:44 EDT
Can you please try -22.3 at: http://people.redhat.com/~jbaron/rhel4/ thanks.
Comment 7 Mihai Maties 2005-10-13 09:32:20 EDT
I can confirm that the bug is gone in the -22.3 release. 
Comment 9 Red Hat Bugzilla 2006-03-07 15:08:28 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

