Bug 85559 - waitpid produces strange results
Summary: waitpid produces strange results
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 9
Hardware: i386
OS: Linux
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
Depends On:
Reported: 2003-03-04 12:29 UTC by Michael Young
Modified: 2007-04-18 16:51 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2003-04-24 18:22:47 UTC

Description Michael Young 2003-03-04 12:29:37 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030221

Description of problem:
Running a UML kernel on phoebe, either a generic build (2.4.20 plus uml patch 1)
or vmlinuz-2.4.18-19.8.0uml from Red Hat 8.0, quickly exits with
Kernel panic: outer trampoline didn't exit with SIGKILL
having failed an internal safety check.
The relevant code generating this error is in arch/um/kernel/process.c:

        /* Start the process and wait for it to kill itself */
        new_pid = clone(outer_tramp, (void *) sp, clone_flags, &arg);
        if(new_pid < 0) return(-errno);
        while((err = waitpid(new_pid, &status, 0) < 0) && (errno == EINTR)) ;
        if(err < 0) panic("Waiting for outer trampoline failed - errno = %d",
                          errno);
        if(!WIFSIGNALED(status) || (WTERMSIG(status) != SIGKILL))
                panic("outer trampoline didn't exit with SIGKILL");

If you hack the code a bit, you find that WIFSIGNALED(status)=1 and
WTERMSIG(status)=82, which doesn't make a lot of sense to me. There was no
problem on 8.0 (vmlinuz-2.4.18-19.8.0uml worked unmodified), and booting the
main system kernel with the nosysinfo flag makes no difference.
If you disable the test altogether, the UML system boots normally.
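
One detail of the quoted loop may be worth noting here: in C, `<` binds more tightly than `=`, so `err = waitpid(...) < 0` stores the result of the comparison (0 or 1) in `err`, never a negative value, and the following `err < 0` check cannot fire. The sketch below shows the effect in isolation. `fake_wait` is a hypothetical stand-in that always fails; none of these helper names come from the UML source.

```c
/* Hypothetical stand-in for a wait call that fails and returns -1. */
static int fake_wait(void)
{
    return -1;
}

/* Same shape as the quoted condition: err receives (fake_wait() < 0). */
static int err_as_quoted(void)
{
    int err;
    int failed;

    failed = (err = fake_wait() < 0);
    (void) failed;
    return err;     /* 1, because the comparison was true */
}

/* Extra parentheses make err receive the return value itself. */
static int err_parenthesized(void)
{
    int err;
    int failed;

    failed = ((err = fake_wait()) < 0);
    (void) failed;
    return err;     /* -1, the value returned by fake_wait() */
}
```

With the first form, a wait that fails for any reason other than EINTR falls through to the status checks with whatever `status` happened to contain.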

I have observed the bug with several kernel/glibc versions up to
kernel-2.4.20-2.49 and glibc-2.3.1-51

Steps to Reproduce:
1. Try to boot a UML kernel (no UML file system is needed, as it doesn't get that far!)

Comment 1 Arjan van de Ven 2003-03-04 12:32:34 UTC
I wouldn't be surprised if UML has a signal bug here; if it has SIGCHLD set to
SIG_IGN then waitpid is a no-op....

Comment 2 Michael Young 2003-03-07 10:26:47 UTC
Yes. It looks like status is unchanged by waitpid (e.g. if you set it explicitly
beforehand, the numbers change), and SIGCHLD is set to SIG_IGN at least
some of the time - explicitly setting it to SIG_DFL removes the warnings.

Comment 3 Arjan van de Ven 2003-03-07 11:08:13 UTC
That's an application bug. The kernel will even printk a warning for it ;)
Basically, waitpid() while SIGCHLD is SIG_IGN is undefined behavior: you can
either get your child or, if timing is unlucky, the child is reaped by init
(which is the POSIX-specified behavior of SIG_IGN SIGCHLD) before you hit
waitpid(). NPTL changed the timing of this, so the latter now happens more often.

Comment 4 Michael Young 2003-04-24 18:22:47 UTC
Fixed in uml-patch-2.4.20-4.
