Bug 140552 - Kernel wrongly complains about application bug when loading modules
Summary: Kernel wrongly complains about application bug when loading modules
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Ingo Molnar
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 170417
Reported: 2004-11-23 16:19 UTC by Lev Makhlis
Modified: 2007-11-30 22:07 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-11-01 07:14:30 UTC
Target Upstream Version:
Embargoed:


Links
System ID: Red Hat Product Errata RHSA-2005:294
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 5
Last Updated: 2005-05-18 04:00:00 UTC

Description Lev Makhlis 2004-11-23 16:19:24 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3)
Gecko/20041020

Description of problem:
When an application opens a device, it may result in the kernel
loading a new module.  Internally, the kernel calls kernel_thread()
and waitpid().  The problem is that if the application has SIGCHLD set
to SIG_IGN, sys_wait4 issues this complaint:

application bug: <cmd>(<pid>) has SIGCHLD set to SIG_IGN but calls wait().
(see the NOTES section of 'man 2 wait'). Workaround activated.

The message is wrong: the application doesn't call wait().  It calls
open().  This isn't a bug because there is no rule against having
SIGCHLD set to SIG_IGN when calling open().

Version-Release number of selected component (if applicable):
kernel-2.4.21-20.EL

How reproducible:
Always

Steps to Reproduce:
1. Compile this program:
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>

int main(void)
{
  /* Ignore SIGCHLD, then open a device whose driver module is unloaded. */
  signal(SIGCHLD, SIG_IGN);
  open("/dev/cdrom", O_RDONLY | O_NONBLOCK);
  return 0;
}

2. # rmmod ide-cd
3. # ./rhel_bug
4. # tail /var/log/messages
    

Actual Results:
Nov 23 10:59:38 redhat10 kernel: ide-floppy driver 0.99.newide
Nov 23 10:59:38 redhat10 kernel: application bug: rhel_bug(6919) has SIGCHLD set to SIG_IGN but calls wait().
Nov 23 10:59:38 redhat10 kernel: (see the NOTES section of 'man 2 wait'). Workaround activated.
Nov 23 10:59:38 redhat10 kernel: hda: attached ide-cdrom driver.
Nov 23 10:59:38 redhat10 kernel: hda: ATAPI 24X DVD-ROM drive, 192kB Cache, UDMA(33)


Expected Results:  No complaint should have been issued.

Additional info:

Comment 1 Ernie Petrides 2004-11-23 23:39:19 UTC
Hello, Lev.  Thanks for your bug report.

The kernel messages are produced by the handler for the waitpid() or wait4()
system calls.  Although your test program doesn't invoke them directly, there
is a hook from misc_open() to request_module(), which calls waitpid() from
within the kernel.  I think we need to save, modify, and restore the SIGCHLD
disposition there in order to make this work reliably.

Ingo, could you please look into this?  It looks like the same bug is
upstream in 2.4.  The 2.6 code is significantly different in this area.


Comment 2 Lev Makhlis 2004-11-24 03:07:10 UTC
If we need to save/modify/restore the SIGCHLD handler, then the existing
code already does that, but the message it prints is still wrong
(since the application has no bug).  But my understanding is that we
don't need to touch the SIGCHLD handler, because the thread created in
request_module() is a "clone" child and doesn't use SIGCHLD.  This is
corroborated by the fact that my test program works fine on SLES8,
which does not have RHEL3's workaround.  (If SIGCHLD were being used
and ignored, waitpid() would return -ECHILD.)
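
The ECHILD behavior mentioned above can be observed from user space with an ordinary fork() child, which does use SIGCHLD.  The sketch below is only an illustration; the function name waitpid_errno_with_sigchld_ignored() is made up for this example.  On kernels that follow the POSIX.1-2001 rule described in the NOTES section of 'man 2 wait', it is expected to return ECHILD.  Older kernels may behave differently, so treat the result as platform-dependent.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: ignore SIGCHLD, fork a child that exits at once,
 * then wait for it.  Returns 0 if waitpid() reaped the child, or the
 * errno value if a call failed.  The previous disposition is restored.
 */
int waitpid_errno_with_sigchld_ignored(void)
{
    void (*old)(int);
    pid_t pid;
    int status;
    int result = 0;

    old = signal(SIGCHLD, SIG_IGN);
    if (old == SIG_ERR)
        return errno;

    pid = fork();
    if (pid == 0)
        _exit(0);               /* child: exit immediately */

    if (pid == -1)
        result = errno;
    else if (waitpid(pid, &status, 0) == -1)
        result = errno;         /* child was reaped automatically */

    signal(SIGCHLD, old);
    return result;
}
```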

Comment 5 Ernie Petrides 2005-01-04 20:16:01 UTC
Patch has been posted by Ingo on 4-Jan-2005.  Reassigning to him.
I will set this to MODIFIED when the patch is committed to CVS.

Comment 6 Ernie Petrides 2005-01-15 00:28:02 UTC
A fix for this problem has just been committed to the RHEL3 U5
patch pool this evening (in kernel version 2.4.21-27.8.EL).


Comment 7 Ernie Petrides 2005-05-02 20:01:12 UTC
*** Bug 156617 has been marked as a duplicate of this bug. ***

Comment 9 it@interactivebrokers.ch 2005-05-05 08:43:20 UTC
where do i find the patch?

Comment 10 Ernie Petrides 2005-05-05 21:29:12 UTC
The U5 kernel (2.4.21-32.EL) is in the RHN beta channels now.  It is
scheduled for release in two weeks.

Comment 11 Tim Powers 2005-05-18 13:28:38 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-294.html


Comment 12 LinuxSystem 2005-05-27 08:17:49 UTC
Kernel wrongly complains about application bug when loading modules

The error message still occurs with the U5 kernel 2.4.21-32.0.1.ELsmp.
The only change after applying Update 5 is that the message now appears
in dmesg, whereas previously it appeared in /var/log/messages.

DMESG 

#### 
application bug: XXXXXX(25546) has SIGCHLD set to SIG_IGN but calls wait().
(see the NOTES section of 'man 2 wait'). Workaround activated.
####

Please let us know whether a patch is available for this.




Comment 13 Arjan van de Ven 2005-05-27 08:19:34 UTC
LinuxSystem: Maybe your application truly has a bug in it, in which
case the kernel is right to complain.  If so, the application should
be fixed.

Comment 14 Ernie Petrides 2005-05-27 21:38:00 UTC
Please use "dmesg -nX" (where X is a numeric log level) to filter out
messages less important than those you're interested in.


Comment 15 LinuxSystem 2005-05-31 02:05:32 UTC
dmesg -nX only reports 
klogctl: Invalid argument

That is the output on all of our machines.  So does that mean we can ignore
the application bug message that shows up in dmesg | grep <application name>?

Comment 16 Ernie Petrides 2005-05-31 19:47:47 UTC
Assuming that the application is not performing wait()/wait4()/waitpid()
system calls while SIGCHLD signals are being ignored, yes.


Comment 17 Maarten Litmaath 2005-06-25 00:28:54 UTC
I beg to differ with the whole reasoning behind the code.  Consider these:

1. An _application_ bug is _not_ for the kernel to bother with, period!
   It is as simple as that.  The kernel must simply execute system calls
   and return error codes where appropriate.  If the application does
   something stupid, that is not the kernel's business.  For example,
   do you expect the kernel to warn about this program:

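   #include <unistd.h>

   int main(void)
   {
       int p[2];
       char buf[1];

       pipe(p);
       /* Blocks forever: the write end is open but nothing is written. */
       read(p[0], buf, 1);
       return 0;
   }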

2. There does not seem to be an easy way to switch off the warnings;
   one probably must recompile the kernel.

3. I contest that the application has a bug.  Consider this case:

   - At a high level the programmer does not want to be bothered with
     cleaning up child processes, so the SIGCHLD handler is set to SIG_IGN.

   - Some low-level library routine, like system() or popen(), needs to
     create a child process, which of course it will clean up ASAP using
     waitpid() [not wait()].  The call to waitpid() will either work
     as if SIGCHLD were not set to SIG_IGN, as is the case on Linux,
     or it may sometimes or always return -1 (ECHILD), e.g. when the call
     to waitpid() came too late, as the child had already exited and was
     cleaned up automatically by the kernel.  The latter case may or may not
     be a problem for the application; that is not for the kernel to judge.

4. The waitpid() call never deadlocks: the child exists or it does not.
   The unqualified wait() call need not deadlock either: when SIGCHLD is
   set to SIG_IGN, it can simply return -1 (ECHILD), since there are no
   child processes for the user to clean up, or it could return the next
   child to exit.  Either behavior may or may not upset a particular
   application.  (A deadlock might result when some specific child is
   reaped by the kernel, while the application expects to reap it by
   repeatedly calling wait().)
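
To illustrate points 3 and 4, here is a minimal sketch of how a low-level routine might call waitpid() and simply tolerate ECHILD rather than treat it as a fatal error.  The names reap_child_tolerant() and demo_reap() are hypothetical, and whether "already reaped" is acceptable remains the application's decision.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait for one specific child.
 * Returns 1 if the exit status was collected, 0 if the child had
 * already been reaped automatically (ECHILD), -1 on any other error.
 */
int reap_child_tolerant(pid_t pid, int *status)
{
    pid_t r;

    assert(status != NULL);
    do {
        r = waitpid(pid, status, 0);
    } while (r == -1 && errno == EINTR);

    if (r == pid)
        return 1;
    return (errno == ECHILD) ? 0 : -1;
}

/*
 * Hypothetical driver: install the given SIGCHLD disposition, fork a
 * child that exits with code 3, reap it with the helper, then restore
 * the old disposition.  Returns the helper's result, or -1 if setup
 * fails.
 */
int demo_reap(void (*disposition)(int), int *status)
{
    void (*old)(int) = signal(SIGCHLD, disposition);
    pid_t pid;
    int result;

    if (old == SIG_ERR)
        return -1;

    pid = fork();
    if (pid == 0)
        _exit(3);               /* child: exit with a known code */

    result = (pid > 0) ? reap_child_tolerant(pid, status) : -1;
    signal(SIGCHLD, old);
    return result;
}
```

With SIG_DFL the helper is expected to collect the exit status.  With SIG_IGN on a kernel that reaps ignored children automatically, it should report that the child is already gone instead of failing.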

So, I suggest you take out those silly warnings completely.


