Bug 168775
Summary: | wait() and waitpid() return inconsistencies under high load | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Ionut Leonte <ileonte> | ||||
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> | ||||
Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 4.0 | CC: | drepper, mihai, mingo | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | i686 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | RHSA-2006-0132 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2006-03-07 20:08:28 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 168429 | ||||||
Created attachment 119018
vfork() a child which fork()s another

Note the (sometimes) incorrect behaviour of waitpid() in run_level1().
Thanks for the test case. This looks a lot like bug 166454. In fact, I'm going to proactively dup it. We can un-dup it later if I'm wrong.

*** This bug has been marked as a duplicate of 166454 ***

We did some more testing (at BitDefender) using the Fedora Development kernels and found that the last kernel version we tried did not have this bug. Unfortunately, I do not remember the precise version of the FC Devel kernel we used, but I can give you a hint: it was released around the time we submitted this bug.

Can you please try -22.3 from: http://people.redhat.com/~jbaron/rhel4/
Thanks.

I can confirm that the bug is gone in the -22.3 release.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6

Description of problem:
On certain occasions the wait() and waitpid() calls erroneously fail with ECHILD (No child processes) when in fact a child process was successfully (v)fork()-ed and is running.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-11.EL

How reproducible:
Always

Steps to Reproduce:
1. Compile the attached program:
   gcc -W -Wall -g3 -O0 -o test-bin forktest.c -lpthread
2. Create a script called 'test-thread' with the following contents (and make it executable with chmod +x):
<-------------------- CUT HERE ----------------------->
#!/bin/bash
for i in `seq 1000`; do ./test-bin 1 ; done
<--------------------- END CUT ----------------------->
3. Create a second script called 'test-main' with the following contents (also executable):
<-------------------- CUT HERE ----------------------->
#!/bin/bash
for i in `seq 20`; do ./test-thread & done
read ABCD
killall -9 test-thread
killall -9 test-bin
<--------------------- END CUT ----------------------->
4. Execute the following command (press Enter to stop the run):
   ./test-main | grep ">"

Actual Results:
.......................................................................
>>>>>>>>>>>>>>>>>>>>>>>>>> run_level1() waitpid surprise: No child processes
>>>>>>>>>>>>>>>>>>>>>>>>>> kill( [PID_OF_CHILD], 0 ): 0 Success
...................... (repeated) .....................................

Expected Results:
The output of 'test-main' should not contain any lines with '>' characters.

Additional info:
1. The call to kill() always succeeds, confirming that the child process still exists.
2. The problem seems to occur only under high load. The rate at which processes are spawned also seems to be an important factor: the slower the rate, the harder it is to reproduce the problem.

The previous kernel version (kernel-smp-2.6.9-5.EL) does not seem to be affected by this issue.