Red Hat Bugzilla – Bug 141896
false ECHILD result from wait* with zombie group leader
Last modified: 2007-11-30 17:10:56 EST
Description of problem:
A wait* syscall can sometimes fail with ECHILD even though a multithreaded
child process exists and is in the process of exiting.
Version-Release number of selected component (if applicable):
2.6.9 or thereabouts.
How reproducible:
The test program does 50 runs; the failure usually shows up in fewer than 10 runs on a 2-CPU machine.
Steps to Reproduce:
1. gcc -g waitpidbug.c -lpthread -o waitpidbug
Actual results:
The test program will eventually print "Kill failed!" and exit with status 1.
Expected results:
No such error from the test program; it exits with status 0.
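For readers without access to the attachment, the scenario in the description can be sketched roughly as follows. This is not the attached waitpidbug.c: the names `worker`, `child_body`, and `run_once` are hypothetical, and the choice to have the child's main thread call `pthread_exit()` while a second thread is still running is an assumption about how to produce a zombie group leader. It may or may not hit the race on an affected kernel; on a fixed kernel `waitpid()` should simply reap the child.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Worker thread: briefly outlives the group leader, then ends the process. */
static void *worker(void *arg)
{
    (void)arg;
    usleep(1000);   /* give the main thread time to exit first */
    _exit(0);       /* terminates the whole thread group */
    return NULL;    /* not reached */
}

/* Child process body: the main thread (the group leader) exits first. */
static void child_body(void)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        _exit(2);
    pthread_exit(NULL);   /* leader is now a zombie; worker still runs */
}

/*
 * One run. Returns 0 if waitpid() reaped the child, 1 if it failed with
 * ECHILD although the child exists (the false result described above),
 * and -1 on any other error.
 */
static int run_once(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        child_body();     /* never returns */

    for (;;) {
        pid_t r = waitpid(pid, &status, 0);

        if (r == pid)
            return 0;
        if (r < 0 && errno == EINTR)
            continue;     /* interrupted by a signal: retry */
        if (r < 0 && errno == ECHILD) {
            fprintf(stderr, "waitpid(%ld): false ECHILD\n", (long)pid);
            return 1;
        }
        return -1;
    }
}
```

Calling `run_once()` 50 times from a `main()` and exiting with status 1 on the first nonzero result would mirror the run count mentioned above. The sketch builds with the same flags as step 1.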
Fix on the way.
Created attachment 107907
test case for wait bug
Created attachment 107908
I've just posted this fix upstream.
Seems to work nicely with 2.6.9-1.1032_FC4smp, though I only tested the SMP
kernel on a uniprocessor (UP) machine with Hyper-Threading (HT).
The patch is in the upstream 2.6.10-rc3-mm1 tree, but not yet in Linus's tree.
This went into Linus's kernel and should be in 2.6.10 when released.
2.6.10 has this fixed.