From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040506 Firefox/0.8

Description of problem:
Application 'nxserver' (compiled for RH9) exits unexpectedly immediately after the kernel logs the following error message:

kernel: application bug: nxserver(4788) has SIGCHLD set to SIG_IGN but calls wait().
Jul 14 11:12:17 is-fletch kernel: (see the NOTES section of 'man 2 wait'). Workaround activated.

I understand that you are not supporting NX, but something you have changed in the kernel may well be breaking a properly working application. Could it be something that was backported into the 2.4.21 kernel?

Sometimes the parent process involved in this call considers the child to have closed normally and reports the child process as terminating normally. Sometimes the parent process loses track of the child process completely; whether that is due to the particular workaround chosen or whether the child process actually aborts is anyone's guess. Possibly it is related to the child process exiting before the parent has even initiated the call to waitpid(). Maybe some new behavior exhibited in the signal handling prevents a zombie from being created. (I've read the notes in the man page as well as as much as I could find about this issue online.)

The vendor of 'nx' (nomachine) is participating in looking for the source of this bug.

Version-Release number of selected component (if applicable):
kernel-2.4.21-15-ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL3.
2. Download the eval 'nxserver' and nxclient from www.nomachine.com.
3. Install them and establish a connection from a remote machine to the server you just configured.
4. Observe /var/log/messages and notice that correctly authenticated sessions never start up an X environment. They either shut down immediately or hang for around 60 seconds and then shut down.

Actual Results:
See above.

Expected Results:
An X Windows login session should have been established and the selected X environment should have started up (either Gnome, KDE or other).

Additional info:
I have seen references to this kernel message elsewhere, but very little specific information on the workaround.
Hello, Jim. When the 'nxserver' application is compiled on RHEL3, does the problem still occur? (I'm not sure whether we guarantee application-binary-compatibility with RHL 9, but it would be nice to remove this variable from the equation.)
The SIGCHLD issue is that it's not valid to call wait() (and, by extension, library functions that call wait()) when you've set SIGCHLD to SIG_IGN. That will cause deadlocks in case the child gets reaped by init before the wait() executes. Older kernels sort of tolerated this; NPTL does not, but it tries to work around it somewhat in our kernels.