I'm using rsync-2.5.4-2. Because of a buggy old version of /etc/profile.d/lang.csh on one of my machines, coupled with some commented-out lines in /etc/sysconfig/i18n, coupled with the fact that I use /bin/tcsh as my login shell, every time I log into this particular machine, I see a list of my entire environment (i.e., my shell runs the command "setenv" as part of my login sequence). All of this is relevant to rsync because rsync requires a "clean" connection when you use "rsync -e ssh", i.e., it can't cope with garbage on the line such as the output of "setenv". So, when I tried to copy files from this machine using "rsync -e ssh", I saw this:

protocol version mismatch - is your shell clean?
(see the rsync man page for an explanation)
rsync error: protocol incompatibility (code 2) at compat.c(58)
Received signal 10. (no core)

The problem is that after I got this message on the client, the rsync process on the server *continued to run*, consuming more and more memory and hosing the machine (the load average got above 45 before somebody rebooted the machine because they didn't realize they could solve the problem simply by killing the rsync process). I think I've provided enough information for you to be able to duplicate the problem; if not, please let me know and I'll debug it further here.
I've upped the severity on this. It's even worse than I thought. The scenario I described earlier is bad, but there's an even worse one -- if you start "rsync -e ssh" to a remote machine and then ctrl-c it, the rsync server process will keep running on the remote machine and exhibit the hosing behavior I described previously. Certainly, it's not unheard of to ctrl-c an rsync command, so I think this needs to be fixed. Incidentally, the version of ssh I'm using is from ssh.com, not from openssh. That could be relevant, although you should of course first try to duplicate the problem with openssh to see if it happens there.
I don't see this when using rsync between two rh 7.3 boxes. (Also not between rh 7.3 and Mac OS X).
Well, I found the problem and a fix. When the client rsync and ssh exit, and thus sshd on the other end exits, and then the server rsync tries to write to the client, it gets SIGPIPE. Alas, SIGPIPE is being ignored, because, quoting a comment in the rsync source code, "Ignore SIGPIPE; we consistently check error codes and will see the EPIPE." The comment is wrong; it does *not* see the SIGPIPE. What happens is that as a result of the SIGPIPE, exit_cleanup gets called. That's a macro which calls _exit_cleanup. That calls log_exit. That calls rwrite. rwrite tries to write an error to stderr, but that fails because of (you guessed it!) SIGPIPE, and so rwrite calls exit_cleanup. Presto, an infinite loop is born. The most straightforward fix I came up with is to modify rwrite so that it doesn't actually try to write the error if the FILE to which the error is to be written is showing a true error condition (checked with ferror). I will attach a patch. I suspect that the rsync server running under openssh doesn't have this problem because openssh sshd causes it to get a SIGHUP whereas the ssh.com version of sshd does not. That's just a guess though. I'll send the rsync maintainers a pointer to this bug report and the patch.
Created attachment 58744 [details] patch to avoid infinite loops on SIGPIPE
I recently inherited 'rsync', so I'm now looking into this. I am unable to reproduce this problem. I have tried:
* pushing from rsync-2.5.4-2 to rsync-2.5.4-2, and Ctrl-C in the middle
* pulling from rsync-2.5.4-2 to rsync-2.5.4-2, and Ctrl-C in the middle
and nothing seems to cause either the 'rsync' process or the 'ssh' (openssh) process to be left lying around - all related 'rsync' and 'ssh' processes are cleaned up properly. I have compared your patch code to the upstream rsync sources - it looks like code similar (but not the same) to your patch was included in rsync-2.5.6, which I haven't brought in to Red Hat yet, but plan to soon. But since I can't reproduce your problem even on 2.5.4-2, I can't confirm that it fixes the problem you describe. Do you have any more info on this?
I think the bug is gone in the current release of rsync. I think it's safe to close this with CURRENTRELEASE.
Closed.