From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041020

Description of problem:
When an application opens a device, it may result in the kernel loading a new module. Internally, the kernel calls kernel_thread() and waitpid(). The problem is that if the application has SIGCHLD set to SIG_IGN, sys_wait4 issues this complaint:

application bug: <cmd>(<pid>) has SIGCHLD set to SIG_IGN but calls wait().
(see the NOTES section of 'man 2 wait'). Workaround activated.

The message is wrong: the application doesn't call wait(); it calls open(). This isn't a bug, because there is no rule against having SIGCHLD set to SIG_IGN when calling open().

Version-Release number of selected component (if applicable):
kernel-2.4.21-20.EL

How reproducible:
Always

Steps to Reproduce:
1. Compile this program (as rhel_bug):

    #include <unistd.h>
    #include <fcntl.h>
    #include <signal.h>

    int main(void)
    {
        /* Ignore SIGCHLD, then open a device whose driver is not loaded yet. */
        signal(SIGCHLD, SIG_IGN);
        open("/dev/cdrom", O_RDONLY | O_NONBLOCK);
        return 0;
    }

2. # rmmod ide-cd
3. # ./rhel_bug
4. # tail /var/log/messages

Actual Results:
Nov 23 10:59:38 redhat10 kernel: ide-floppy driver 0.99.newide
Nov 23 10:59:38 redhat10 kernel: application bug: rhel_bug(6919) has SIGCHLD set to SIG_IGN but calls wait().
Nov 23 10:59:38 redhat10 kernel: (see the NOTES section of 'man 2 wait'). Workaround activated.
Nov 23 10:59:38 redhat10 kernel: hda: attached ide-cdrom driver.
Nov 23 10:59:38 redhat10 kernel: hda: ATAPI 24X DVD-ROM drive, 192kB Cache, UDMA(33)

Expected Results:
No complaint should have been issued.

Additional info:
Hello, Lev. Thanks for your bug report. The kernel messages are produced by the handler for the waitpid() or wait4() system calls. Although your test program doesn't invoke them directly, there is a hook from misc_open() to request_module(), which calls waitpid() from within the kernel. I think we need to save, modify, and restore the SIGCHLD disposition there in order to make this work reliably. Ingo, could you please look into this? It looks like the same bug is upstream in 2.4. The 2.6 code is significantly different in this area.
If we need to save/modify/restore the SIGCHLD handler, the existing code already does that, but the message it prints is still wrong (since the application has no bug). My understanding, however, is that we don't need to touch the SIGCHLD handler at all, because the thread created in request_module() is a "clone" child and doesn't use SIGCHLD. This is corroborated by the fact that my test program works fine on SLES8, which does not have RHEL3's workaround. (If SIGCHLD were being used and ignored, waitpid() would return -ECHILD.)
A patch was posted by Ingo on 4-Jan-2004. Reassigning to him. I will set this to MODIFIED when the patch is committed to CVS.
A fix for this problem was committed to the RHEL3 U5 patch pool this evening (in kernel version 2.4.21-27.8.EL).
*** Bug 156617 has been marked as a duplicate of this bug. ***
Where do I find the patch?
The U5 kernel (2.4.21-32.EL) is in the RHN beta channels now. It is scheduled for release in two weeks.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-294.html
Kernel wrongly complains about application bug when loading modules

This error message also occurs with the U5 kernel 2.4.21-32.0.1.ELsmp. After applying Update 5, the message started to appear in dmesg; previously it appeared in /var/log/messages.

DMESG
####
application bug: XXXXXX(25546) has SIGCHLD set to SIG_IGN but calls wait().
(see the NOTES section of 'man 2 wait'). Workaround activated.
####

Please let us know whether any patch is available for this.
LinuxSystem: Perhaps your application truly does have a bug, in which case the kernel is right to complain. If so, the application should be fixed.
Please use "dmesg -nX", where X is a numeric log level, to filter out messages less important than the level you're interested in.
"dmesg -nX" only reports "klogctl: Invalid argument", and that is the same output on all machines. So does it mean that we can ignore the "application bug" error that shows up in a normal "dmesg | grep <application name>"?
Assuming that the application is not performing wait()/wait4()/waitpid() system calls while SIGCHLD signals are being ignored, yes.
I beg to differ with the whole reasoning behind the code. Consider these points:

1. An _application_ bug is _not_ for the kernel to bother with, period! It is as simple as that. The kernel must simply execute system calls and return error codes where appropriate. If the application does something stupid, that is not the kernel's business. For example, do you expect the kernel to warn about this program?

    #include <unistd.h>

    int main(void)
    {
        int p[2];
        char buf[1];

        pipe(p);
        read(p[0], buf, 1);  /* blocks forever: nothing is ever written to p[1] */
        return 0;
    }

2. There does not seem to be an easy way to switch off the warnings; one probably has to recompile the kernel.

3. I dispute that the application has a bug. Consider this case:
   - At a high level, the programmer does not want to be bothered with cleaning up child processes, so the SIGCHLD handler is set to SIG_IGN.
   - Some low-level library routine, like system() or popen(), needs to create a child process, which it will of course clean up as soon as possible using waitpid() [not wait()].
   The call to waitpid() will either work as if SIGCHLD were not set to SIG_IGN, as is the case on Linux, or it may sometimes or always return -1 (ECHILD), e.g. when the call to waitpid() comes too late because the child has already exited and been cleaned up automatically by the kernel. The latter case may or may not be a problem for the application; that is not for the kernel to judge.

4. The waitpid() call never deadlocks: the child either exists or it does not. The unqualified wait() call need not deadlock either: when SIGCHLD is set to SIG_IGN, it can simply return -1 (ECHILD), since there are no child processes for the user to clean up, or it could return the next child to exit. Either behavior may or may not upset a particular application. (A deadlock might result when some specific child is reaped by the kernel while the application expects to reap it by repeatedly calling wait().)

So, I suggest you take out those silly warnings completely.