Red Hat Bugzilla – Bug 65350
rsync process grows without bound, hoses machine
Last modified: 2014-08-31 19:24:09 EDT
I'm using rsync-2.5.4-2.
Because of a buggy old version of /etc/profile.d/lang.csh on one of my machines,
coupled with some commented out lines in /etc/sysconfig/i18n, coupled with the
fact that I use /bin/tcsh as my login shell, every time I log into this
particular machine, I see a listing of my entire environment (i.e., my shell runs
the command "setenv" as part of my login sequence).
All of this is relevant to rsync because rsync requires a "clean" connection
when you use "rsync -e ssh", i.e., it can't cope with garbage on the line such
as the output of "setenv". So, when I tried to rsync to copy files from this
machine using "rsync -e ssh", I saw this:
protocol version mismatch - is your shell clean?
(see the rsync man page for an explanation)
rsync error: protocol incompatibility (code 2) at compat.c(58)
Received signal 10. (no core)
The problem is that after I got this message on the client, the rsync process on
the server *continued to run*, consuming more and more memory and hosing the
machine (the load average got above 45 before somebody rebooted the machine
because they didn't realize they could solve the problem simply by killing the
rsync process).
I think I've provided enough information for you to be able to duplicate the
problem; if not, please let me know and I'll debug it further here.
I've upped the severity on this. It's even worse than I thought. The scenario
I described earlier is bad, but there's an even worse one -- if you start "rsync
-e ssh" to a remote machine and then ctrl-c it, the rsync server process will
keep running on the remote machine and exhibit the hosing behavior I described
previously. Certainly, it's not unheard of to ctrl-c an rsync command, so I
think this needs to be fixed.
Incidentally, the version of ssh I'm using is from ssh.com, not from openssh.
That could be relevant, although you should of course first try to duplicate the
problem with openssh to see if it happens there.
I don't see this when using rsync between two rh 7.3 boxes. (Also not
between rh 7.3 and Mac OS X).
Well, I found the problem and a fix.
When the client rsync and ssh exit, and thus sshd on the other end
exits, and then the server rsync tries to write to the client, it gets
SIGPIPE. Alas, SIGPIPE is being ignored, because, quoting a comment
in the rsync source code, "Ignore SIGPIPE; we consistently check error
codes and will see the EPIPE."
The comment is wrong; it does *not* see the SIGPIPE. What happens is
that as a result of the SIGPIPE, exit_cleanup gets called. That's a
macro which calls _exit_cleanup. That calls log_exit. That calls
rwrite. Rwrite tries to write an error to stderr, but that fails
because of (you guessed it!) SIGPIPE, and so rwrite calls
exit_cleanup. Presto, an infinite loop is born.
The most straightforward fix I came up with is to modify rwrite so
that it doesn't actually try to write the error if the FILE to which
the error would be written is already showing a true error condition
(checked with ferror). I will attach a patch.
I suspect that the rsync server running under openssh doesn't have
this problem because openssh sshd causes it to get a SIGHUP whereas
the ssh.com version of sshd does not. That's just a guess though.
I'll send the rsync maintainers a pointer to this bug report and the
patch.
Created attachment 58744
patch to avoid infinite loops on SIGPIPE
I recently inherited 'rsync', so I'm now looking into this.
I am unable to reproduce this problem. I have tried:
* pushing from rsync-2.5.4-2 to rsync-2.5.4-2, and Ctrl-C in the middle
* pulling from rsync-2.5.4-2 to rsync-2.5.4-2, and Ctrl-C in the middle
and nothing seems to leave an 'rsync' or 'ssh' (openssh) process lying
around - all related 'rsync' and 'ssh' processes are cleaned up properly.
I have compared your patch to the upstream rsync sources - it looks like code
similar (but not identical) to your patch was included in rsync-2.5.6, which I
haven't brought into Red Hat yet, but plan to soon. But since I can't
reproduce your problem even on 2.5.4-2, I can't confirm that it fixes the
problem you describe.
Do you have any more info on this?
I think the bug is gone in the current release of rsync. I think it's safe to
close this with CURRENTRELEASE.