Bug 208155

Summary: crash on syncing large directory trees
Product: Red Hat Enterprise Linux 4 Reporter: David Cantrell <dcantrell>
Component: rsync Assignee: Jan Zeleny <jzeleny>
Status: CLOSED NOTABUG QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 4.4   
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-03-05 09:46:09 UTC Type: ---

Description David Cantrell 2006-09-26 17:40:44 UTC
rsync on RHEL4 (version 2.6.3, protocol 28) crashes when syncing large directory
trees over ssh.  The command I use is:

rsync -Pav --delete --progress /source/path/ user@remote:/dest/path/

The sync starts normally and then crashes after a while.  The message given is
phase "unknown" [sender]: connection reset by peer.  The tree I was trying to
sync was 1198 MB in size with 1046 directories and 25516 files.
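
For anyone trying to reproduce this, a tree of similar shape can be generated
synthetically.  The following is a minimal sketch, not the original data: the
helper name make_tree, the flat dirN/fileN layout, and the uniform file size
are all assumptions chosen to approximate the counts above.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): build a synthetic test tree.
# Usage: make_tree ROOT NDIRS FILES_PER_DIR FILE_KB
make_tree() {
    root=$1 ndirs=$2 per_dir=$3 kb=$4
    d=1
    while [ "$d" -le "$ndirs" ]; do
        dir="$root/dir$d"
        mkdir -p "$dir" || return 1
        f=1
        while [ "$f" -le "$per_dir" ]; do
            # Random content, so the files are not trivially compressible.
            dd if=/dev/urandom of="$dir/file$f" bs=1024 count="$kb" \
                2>/dev/null || return 1
            f=$((f + 1))
        done
        d=$((d + 1))
    done
}

# Assumed approximation of the reported tree: 1046 directories with
# 24 files of 48 KB each (about 25000 files, a little under 1.2 GB).
# make_tree /source/path 1046 24 48
```

The real tree was presumably nested and had mixed file sizes, so a failure to
reproduce with this layout would not rule the problem out.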

It seems related to this problem:
https://bugzilla.samba.org/show_bug.cgi?id=2208

I've been able to work around the problem somewhat by adjusting the --bwlimit
and --timeout options, but those only seem to delay the crash.  I have not
really looked into it beyond searching for an existing bug.

Comment 1 Matt Whitted 2007-01-02 14:45:48 UTC
We are experiencing similar problems -- it looks like this bug was opened over 3
months ago and has no resolution yet.  Has this been investigated?  

Comment 2 David Cantrell 2007-03-06 19:02:26 UTC
Would like a fix for this to show up in a RHEL4 update.  Setting flags.

Comment 3 Simo Sorce 2007-05-14 19:27:18 UTC
Was this between 2 RHEL4 servers?
Or different machines?
Can you reproduce this at will? And if so how do you reproduce it?


Comment 4 David Cantrell 2007-05-14 19:38:44 UTC
Sort of.  It was between a CentOS 4.4 server and a RHEL 4.4 server.  I can
reproduce it using the description given in the first comment.  It's pretty
simple to reproduce.

Comment 5 Nalin Dahyabhai 2007-09-14 18:53:03 UTC
Checked with David, and it looks like this was happening between 2.6.3 on both
ends, and isn't reproducible when both ends are running 2.6.9.

Comment 6 Simo Sorce 2007-09-24 16:00:04 UTC
I can't reproduce this with 1-2 GB data sets on RHEL4 <-> RHEL4.
If someone has more info on how to reproduce it reliably, it would make it much
easier to find what's wrong and patch it.

Thanks.

Comment 7 Simo Sorce 2007-10-05 19:34:22 UTC
It seems that David can no longer reproduce the bug since he changed the test
environment, and I can't reproduce it either.
Matt, can you provide a reproducible test case?
Knowing the exact conditions that trigger it would really help a lot, so that
I can actually fix it.

Thanks.

Comment 8 Matt Whitted 2007-10-05 20:16:43 UTC
We worked around the issue by changing our iptables rulesets.  Originally we
used connection tracking to protect the system.  When we removed any rules that
use connection tracking, we stopped experiencing the issues reported here.

As such, I cannot tell whether this is rsync related or not, but I would
recommend setting up connection tracking rules roughly as follows to see
whether this reproduces the problem:

iptables -F
iptables -I INPUT 1 -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -I INPUT 2 -j DROP
iptables -I OUTPUT 1 -m state --state NEW -j ACCEPT

In our environment, with the rules above in place and gigabit ethernet between
the hosts, a command roughly like the following would fail after several
minutes:

rsync -avr --delete --progress --stats user@remote:/data /data
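
To check whether a host already has rules of this kind loaded before testing,
something like the following may help.  It is only a rough sketch: the helper
name conntrack_rules is made up for illustration, and it assumes the rules were
written with the state or conntrack match, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: read `iptables-save` output on stdin and print the
# rules that use the "state" or "conntrack" match.  The exit status is 0
# if at least one such rule is found and non-zero otherwise.
conntrack_rules() {
    grep -E -e '-m (state|conntrack)( |$)'
}

# Example use (needs root):
# iptables-save | conntrack_rules || echo "no connection tracking rules"
```

If this prints nothing on both ends, the failure is probably not the one
described in this comment.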

Comment 9 RHEL Program Management 2008-02-01 19:12:06 UTC
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".

Comment 10 Jan Zeleny 2010-03-05 09:46:09 UTC
Since the workaround implies that this issue might not be rsync related, I'm closing this bug.