Description of problem:
A wait* syscall can sometimes return ECHILD even though a multithreaded child process exists and is in the process of exiting.

Version-Release number of selected component (if applicable):
2.6.9 or thereabouts.

How reproducible:
The test program does 50 runs; the failure usually shows up in fewer than 10 on a 2-CPU machine.

Steps to Reproduce:
1. gcc -g waitpidbug.c -lpthread -o waitpidbug
2. ./waitpidbug

Actual results:
Will eventually say "Kill failed!", exit 1.

Expected results:
No such error from test program, exit 0.

Additional info:
Fix on the way.
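For readers without access to the attachment, here is a minimal sketch of the kind of scenario described: a parent forks a child that has more than one thread, then calls waitpid() on it while the child is exiting. This is not the attached waitpidbug.c; the names worker, run_once and run_many are made up for illustration, and the assumption that a main thread exiting before its worker is enough to exercise the race is mine, not something stated in this report. On a fixed kernel waitpid() should always reap the child.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker: lives slightly longer than the child's main thread. */
static void *worker(void *arg)
{
    (void)arg;
    usleep(1000);
    return NULL;
}

/*
 * Fork one multithreaded child and wait for it.
 * Returns 0 if waitpid() reaped the child, -1 otherwise
 * (for example ECHILD, which is the symptom described above).
 */
int run_once(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        pthread_t t;
        if (pthread_create(&t, NULL, worker, NULL) != 0)
            _exit(2);
        /* Main thread exits first; the process ends when worker returns. */
        pthread_exit(NULL);
    }

    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);

    if (r < 0) {
        perror("waitpid");
        return -1;
    }
    return 0;
}

/* Repeat the experiment and return the number of failed runs. */
int run_many(int runs)
{
    assert(runs > 0);
    int failures = 0;
    for (int i = 0; i < runs; i++) {
        if (run_once() != 0)
            failures++;
    }
    return failures;
}
```

Calling run_many(50) from a main() mirrors the 50 runs mentioned above; a nonzero return would mean waitpid() failed for a child that did exist. Build with -lpthread, as in the steps to reproduce.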
Created attachment 107907 [details] test case for wait bug
Created attachment 107908 [details]
proposed fix

I've just posted this fix upstream.
Seems to work nicely with 2.6.9-1.1032_FC4smp. But I only tested the SMP kernel on a UP HT machine.
The patch is in the 2.6.10-rc3-mm1 tree upstream, but not yet in Linus's tree.
This went into Linus's kernel and should be in 2.6.10 when released.
2.6.10 has this fixed.