Bug 85559 - waitpid produces strange results
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 9
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Assigned To: Arjan van de Ven
QA Contact: Brian Brock
Reported: 2003-03-04 07:29 EST by Michael Young
Modified: 2007-04-18 12:51 EDT

Doc Type: Bug Fix
Last Closed: 2003-04-24 14:22:47 EDT


Attachments: None
Description Michael Young 2003-03-04 07:29:37 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030221

Description of problem:
If you try to run a UML kernel on phoebe (either a generic build, 2.4.20 plus UML patch 1, or vmlinuz-2.4.18-19.8.0uml from Red Hat 8.0), it quickly exits with

Kernel panic: outer trampoline didn't exit with SIGKILL

after failing an internal safety check.
The code that generates this error is in arch/um/kernel/process.c:

        /* Start the process and wait for it to kill itself */
        new_pid = clone(outer_tramp, (void *) sp, clone_flags, &arg);
        if(new_pid < 0) return(-errno);
        while((err = waitpid(new_pid, &status, 0) < 0) && (errno == EINTR)) ;
        if(err < 0) panic("Waiting for outer trampoline failed - errno = %d",
                          errno);
        if(!WIFSIGNALED(status) || (WTERMSIG(status) != SIGKILL))
                panic("outer trampoline didn't exit with SIGKILL");

If you hack in some debugging output, you find that WIFSIGNALED(status)=1 and WTERMSIG(status)=82, which doesn't make much sense to me. There was no problem on 8.0 (vmlinuz-2.4.18-19.8.0uml worked unmodified), and booting the main system kernel with the nosysinfo flag makes no difference. If you disable the test altogether, the UML system boots normally.

I have observed the bug with several kernel/glibc versions, up to kernel-2.4.20-2.49 and glibc-2.3.1-51.

Steps to Reproduce:
1. Try to boot a UML kernel (no UML file system is needed, as it doesn't get that far!)
Comment 1 Arjan van de Ven 2003-03-04 07:32:34 EST
I wouldn't be surprised if UML has a signal bug here; if it has SIGCHLD set to SIG_IGN, then waitpid is a no-op...
Comment 2 Michael Young 2003-03-07 05:26:47 EST
Yes. It looks like status is left unchanged by waitpid (e.g. if you set it explicitly beforehand, the reported numbers change with it), and SIGCHLD is set to SIG_IGN at least some of the time; explicitly setting it to SIG_DFL removes the warnings.
Comment 3 Arjan van de Ven 2003-03-07 06:08:13 EST
That's an application bug; the kernel will even printk a warning for it ;)
Basically, calling waitpid() while SIGCHLD is SIG_IGN is undefined behavior: you may get your child, or, if the timing is unlucky, the child is reaped by init (which is the POSIX-specified behavior for an ignored SIGCHLD) before you reach waitpid(). NPTL changed the timing of this, so the latter now happens more frequently.
Comment 4 Michael Young 2003-04-24 14:22:47 EDT
Fixed in uml-patch-2.4.20-4.
