Red Hat Bugzilla – Bug 85559
waitpid produces strange results
Last modified: 2007-04-18 12:51:50 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030221
Description of problem:
If you try to run a UML kernel on phoebe, (either a generic build (2.4.20 plus
uml patch 1) or vmlinuz-2.4.18-19.8.0uml from redhat 8.0), it quickly exits with
Kernel panic: outer trampoline didn't exit with SIGKILL
having failed an internal safety check.
The relevant bit of code generating this error is in arch/um/kernel/process.c,
/* Start the process and wait for it to kill itself */
new_pid = clone(outer_tramp, (void *) sp, clone_flags, &arg);
if(new_pid < 0) return(-errno);
while((err = waitpid(new_pid, &status, 0) < 0) && (errno == EINTR)) ;
if(err < 0) panic("Waiting for outer trampoline failed - errno = %d",
if(!WIFSIGNALED(status) || (WTERMSIG(status) != SIGKILL))
panic("outer trampoline didn't exit with SIGKILL");
if you hack the code a bit you find that
WIFSIGNALED(status)=1 and WTERMSIG(status)=82, which doesn't make a lot of sense
to me. There was no problem on 8.0 (vmlinuz-2.4.18-19.8.0uml worked unmodified)
and booting the main system kernel with the nosysinfo flag makes no difference.
If you disable the test altogether the uml system boots normally.
I have observed the bug with several kernel/glibc versions up to
kernel-2.4.20-2.49 and glibc-2.3.1-51
Steps to Reproduce:
1. Try to boot a uml kernel (no uml file system needed as it doesn't get that far!)
wouldn't be surprised if uml has a signal bug here; if it has SIGCHILD set to
SIG_IGN then waitpid is a nop....
Yes. It looks like status is unchanged by waitpid (eg. if you set it explicitly
beforehand, the numbers change), and that SIGCHLD is set to SIG_IGN at least
some of the time - explicitly setting it to SIG_DFL removes the warnings.
that's an application bug. the kernel will even printk a warning for it ;)
basically waitpid() while SIGCHILD is SIG_IGN is undefined behavior, you can
either get your child, or if timing is unlucky, the child is reaped by init
(which is the posix specified behavior of SIG_IGN SIGCHILD) before you hit
waitpid(). NPTL changed the timinig of this so the later is more happening more
Fixed in uml-patch-2.4.20-4.