From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030221

Description of problem:
If you try to run a UML kernel on phoebe, (either a generic build (2.4.20 plus uml patch 1) or vmlinuz-2.4.18-19.8.0uml from redhat 8.0), it quickly exits with

    Kernel panic: outer trampoline didn't exit with SIGKILL

having failed an internal safety check. The relevant bit of code generating this error is in arch/um/kernel/process.c,

    /* Start the process and wait for it to kill itself */
    new_pid = clone(outer_tramp, (void *) sp, clone_flags, &arg);
    if(new_pid < 0) return(-errno);
    while((err = waitpid(new_pid, &status, 0) < 0) && (errno == EINTR)) ;
    if(err < 0) panic("Waiting for outer trampoline failed - errno = %d", errno);
    if(!WIFSIGNALED(status) || (WTERMSIG(status) != SIGKILL))
        panic("outer trampoline didn't exit with SIGKILL");

if you hack the code a bit you find that WIFSIGNALED(status)=1 and WTERMSIG(status)=82, which doesn't make a lot of sense to me.

There was no problem on 8.0 (vmlinuz-2.4.18-19.8.0uml worked unmodified) and booting the main system kernel with the nosysinfo flag makes no difference. If you disable the test altogether the uml system boots normally.

I have observed the bug with several kernel/glibc versions up to kernel-2.4.20-2.49 and glibc-2.3.1-51

Steps to Reproduce:
1. Try to boot a uml kernel (no uml file system needed as it doesn't get that far!)
I wouldn't be surprised if uml has a signal bug here; if it has SIGCHLD set to SIG_IGN then waitpid is a nop....
Yes. It looks like status is unchanged by waitpid (e.g. if you set it explicitly beforehand, the numbers change), and SIGCHLD is set to SIG_IGN at least some of the time. Explicitly setting it to SIG_DFL removes the warnings.
That's an application bug. The kernel will even printk a warning for it ;) Basically, waitpid() while SIGCHLD is SIG_IGN is undefined behavior: you can either get your child or, if timing is unlucky, the child is reaped by init (which is the POSIX-specified behavior of SIG_IGN SIGCHLD) before you hit waitpid(). NPTL changed the timing of this, so the latter happens more frequently.
Fixed in uml-patch-2.4.20-4.