Description of problem:
Use of "rsync --daemon" results in the following kernel log messages:

    Jul 8 12:56:59 ftp kernel: application bug: rsync(1051) has SIGCHLD set to SIG_IGN but calls wait().
    Jul 8 12:56:59 ftp kernel: (see the NOTES section of 'man 2 wait'). Workaround activated.

These messages seem to appear every time a new connection is made (not verified).

Version-Release number of selected component (if applicable):
rsync-2.5.5-4 with kernel-bigmem-2.4.20-18.9
Does the rsync complete successfully in these cases, or does it crash? And how much physical memory does the rsync server box have? Please post the rsync server's /etc/rsyncd.conf file and an example of an rsync command line that fails or that produces the messages you reported. FYI, I don't see these messages on a stock RHL9 system (rsync-2.5.5-4) on an SMP box with up2date'd kernel 2.4.20-18.9smp. So my thought is that it has something to do with the bigmem kernel - I'll test that next.
The rsync server seems to work fine. When I test it, it gives the error when there is nothing to do (after "skipping directory" on the client side), but I'm not sure whether this is the only case (it's a public rsync server). In either case, it seems to complete ok. The server system has 8 GB of memory and runs the bigmem kernel.

Relevant part from /etc/rsyncd.conf:

    [vol]
    comment = /vol hierarchy
    path = /var/ftp/vol/
    read only = true
    uid = ftp
    gid = ftp
    hosts allow = *

Command and result that cause the message to appear in the server kernel log:

    [jos@test x]$ rsync rsync.server.name::vol/1/nilo /tmp/
    skipping directory /1/nilo
    client: nothing to do
    [jos@test x]$

Note that vol/1/nilo is a directory, so this is meant to fail. It doesn't seem to give the error when I specify -avx, when the tree is actually retrieved.
Are there valid cases (not the case you supplied for rsync'ing a directory that is meant to fail) where this is causing you a problem, or are these cases of clients trying to rsync *incorrectly*? I guess my question is what problem is this really causing you?
I looked at the log files in more detail. It seems that we get the error in more than 10% of all rsync sessions. The error sometimes occurs when the rsyncd log shows no actual transfers, but also sometimes when bytes are transferred. It looks like a timing-related issue that occurs more often when no data is transferred...
These messages are a warning that rsync is not standards compliant with respect to its handling of child processes. According to POSIX (3.3.1.3) it is unspecified what happens when SIGCHLD is set to SIG_IGN.
Created attachment 93129: signal.patch
Wayne Davison, one of the rsync maintainers, proposed this signal handling patch.
Here are Wayne's comments from the email to which he attached the patch:

I finally educated myself on this issue, and would like to propose a patch. Since there are reports that zombies can get created when using SIG_IGN on FreeBSD as well as other unices, I think we should change the code to catch the signal and clean up the zombies in the signal handler. This would make the code similar to the handling in main.c.
JW Schultz, another rsync maintainer, responds to Wayne Davison's patch:

Something along these lines might be appropriate. I did a little more digging as a result of your message here and it looks like this routine should either be setting up its own signal handler or integrate with wait_process and the signal handler in main(). Repeatedly setting SIGCHLD to SIG_IGN is dumb.
I've tested the patch and it works for me - no more warnings. Please test the patch (Jos Vos) and post your results here. If it works for you, then I'll contact the rsync maintainers via the rsync mailing list and let them know.
I have applied the patch, after modifying it (the patch does not apply cleanly to the rsync-2.5.5-4 code as in RHL 9, due to the context). Tomorrow morning I can say if the warnings are now gone.
Seems to work for me, no kernel warnings anymore since the upgrade to the patched version.
I posted a message to the rsync mailing list today saying that the patch worked for me and for the person who reported the bug, and asked that the patch be committed so that future rsync versions contain it. Wayne Davison, one of the rsync maintainers, responded:

I think it's at least better than what's currently there, so I've committed it to CVS.

--------------------------------------------------------

The next release of rsync is expected to be 2.5.7, and since that hasn't been released yet, it will not make it into Red Hat's next release. Since the patch has been committed upstream, I am not planning to create a Red Hat specific patch. Users wanting this patch functionality will need to:

1. apply the patch, or
2. wait for the Red Hat release containing rsync 2.5.7 (current release is RHL9 - rsync 2.5.7 will not be in the next release), or
3. wait for rsync 2.5.7 to be released and download it from the rsync website http://samba.anu.edu.au/rsync/, or
4. download the patched code now (8/4/2003) from CVS via the rsync website
Interesting statement: "Since the patch has been committed upstream, I am not planning to create a Red Hat specific patch". I would have thought the opposite: if a patch is committed upstream, this is a good reason to temporarily add a RH patch.
Ok, I've added the signal patch to the rpm - the new package can be found here: http://people.redhat.com/hmerrill/rsync-2.5.6-19.i386.rpm and will appear in rawhide soon. This package is intended for the next Red Hat release, but should work on RHL9 also.