Bug 85559

Summary: waitpid produces strange results
Product: [Retired] Red Hat Linux
Reporter: Michael Young <m.a.young>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: medium
Docs Contact:
Priority: medium
Version: 9
Target Milestone: ---
Target Release: ---
Hardware: i386
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2003-04-24 18:22:47 UTC
Type: ---

Description Michael Young 2003-03-04 12:29:37 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030221

Description of problem:
If you try to run a UML kernel on phoebe (either a generic build (2.4.20 plus
uml patch 1) or vmlinuz-2.4.18-19.8.0uml from Red Hat 8.0), it quickly exits with
Kernel panic: outer trampoline didn't exit with SIGKILL
having failed an internal safety check.
The relevant bit of code generating this error is in arch/um/kernel/process.c:

        /* Start the process and wait for it to kill itself */
        new_pid = clone(outer_tramp, (void *) sp, clone_flags, &arg);
        if(new_pid < 0) return(-errno);
        while((err = waitpid(new_pid, &status, 0) < 0) && (errno == EINTR)) ;
        if(err < 0) panic("Waiting for outer trampoline failed - errno = %d",
                          errno);
        if(!WIFSIGNALED(status) || (WTERMSIG(status) != SIGKILL))
                panic("outer trampoline didn't exit with SIGKILL");

If you hack the code a bit you find that
WIFSIGNALED(status)=1 and WTERMSIG(status)=82, which doesn't make a lot of sense
to me. There was no problem on 8.0 (vmlinuz-2.4.18-19.8.0uml worked unmodified)
and booting the main system kernel with the nosysinfo flag makes no difference.
If you disable the test altogether the uml system boots normally.

I have observed the bug with several kernel/glibc versions up to
kernel-2.4.20-2.49 and glibc-2.3.1-51

Steps to Reproduce:
1. Try to boot a uml kernel (no uml file system needed as it doesn't get that far!)

Comment 1 Arjan van de Ven 2003-03-04 12:32:34 UTC
wouldn't be surprised if uml has a signal bug here; if it has SIGCHLD set to
SIG_IGN then waitpid is a nop....

Comment 2 Michael Young 2003-03-07 10:26:47 UTC
Yes. It looks like status is unchanged by waitpid (e.g. if you set it explicitly
beforehand, the numbers change), and that SIGCHLD is set to SIG_IGN at least
some of the time - explicitly setting it to SIG_DFL removes the warnings.

Comment 3 Arjan van de Ven 2003-03-07 11:08:13 UTC
that's an application bug. the kernel will even printk a warning for it ;)
basically waitpid() while SIGCHLD is SIG_IGN is undefined behavior, you can
either get your child, or if timing is unlucky, the child is reaped by init
(which is the POSIX specified behavior of SIG_IGN SIGCHLD) before you hit
waitpid(). NPTL changed the timing of this so the latter happens more
frequently.

Comment 4 Michael Young 2003-04-24 18:22:47 UTC
Fixed in uml-patch-2.4.20-4.