Bug 645236

Summary: with big file, rsync times out out when it should not; the sender is still responsive
Product: [Fedora] Fedora Reporter: Tim Taiwanese Liim <tim.liim>
Component: rsyncAssignee: Michal Ruprich <mruprich>
Status: CLOSED EOL QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: 26CC: i, ssorce, thozza, vvitek
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-05-29 11:25:42 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Tim Taiwanese Liim 2010-10-21 05:58:17 UTC
Description of problem:
    When sending big file (eg. > 500MB) With --timeout=xx option, the
    server side of rsync timeouts out when it should not.  The components
    of rsync (generator or receiver or server) should exchange periodic
    keepalive msg, so they will not timeout when the connection is
    perfectly healthy.

Version-Release number of selected component (if applicable):
    rsync-3.0.7-3.fc13.i686

How reproducible:
    With --timeout=10: sometimes with 500MB file; almost always with 2GB file.

Steps to Reproduce:
    1. Assume timeout=10 sec.  generate the source file, eg. 
           i=/tmp/t55
           time dd if=/dev/zero of=$i bs=1M count=1000
       If it takes less than <timeout> x 3 (eg. 30 sec), increase
       the file size.  Bigger file better illustrates this issue.

    2. j=/tmp/t56
       rsync --timeout=10 localhost:$i $j

Actual results:
    Saw the following errors, depending on the mode of failure.

    #1
    [sender] io timeout after 10 seconds -- exiting
    rsync error: timeout in data send/receive (code 30) at io.c(140) 
           [sender=3.0.7]

    #2
    rsync: writefd_unbuffered failed to write 6 bytes to socket [generator]: 
           Broken pipe (32)
    rsync error: timeout in data send/receive (code 30) at io.c(1530) 
           [generator=3.0.7]
    rsync error: received SIGUSR1 (code 19) at main.c(1288) [receiver=3.0.7]


Expected results:
    The sync should complete without error; certainly no timeout.

Additional info:
    1. The issue is more prounced with two hosts of different speed,
       eg. h1 (dest) is much slower than h2 (src) (eg. h1 could be an
       slower machine (especially I/O bound) with LUKS; this could be
       simulated by strace on h1:receiver).
            i=/tmp/t55 j=/tmp/t56
            h2: dd if=/dev/zero      of=$i bs=1M count=1000
            h1: rsync --timeout 10   h2:$i $j

    2. From strace, the scenario with error msg in #2:
       - generator <g> read <dest> and sends checksum to 
         sender <s> (--server in this case).
       - <g> spawns receiver <r>
       - <r> opens <dest> and <tmp> file, start copying from <dest> to <tmp>
       - <s> reads <src> file and sends msgs to <r>
       - when <s> finishes, it starts to wait, for (timeout/2) sec.
       - Assume <r> is slower, <s> timeouts out after <timeout> sec.

       - <g> probably needs to send keepalive msg to <s>.

    3. strace for #1 is similar to #2: 
       - after <s> finishes, <r> takes more than 10 sec to finish 
         copying <dest> to <tmp>  and then rename <tmp> <dest>, 
         so <s> times out.

    4. The issue is somewhat reproducible on one single host.  But
       with strace on both <s> and <r>, it becomes harder to
       reproduce, probably because strace slows both <s> and <r> down.
       I need to use a much bigger file (2.5GB), and strace <r> from
       beginning, but strace <s> only torwards the end, to show this
       error with strace.

    5. There are several existing bugs that may be related to this
       unexpected timeout issue.

https://bugzilla.samba.org/show_bug.cgi?id=2783
    Random high loads during syncs (server side) / client stream errors
    rsync: connection unexpectedly closed (2761332 bytes received so far)
    [generator]
    rsync error: error in rsync protocol data stream (code 12) at io.c(365)

https://bugzilla.samba.org/show_bug.cgi?id=5478
    rsync: writefd_unbuffered failed to write 4092 bytes [sender]: 
                Broken pipe (32)

    rsync: writefd_unbuffered failed to write 4092 bytes [sender]: 
                Broken pipe (32)
    io timeout after 30 seconds -- exiting
    rsync error: timeout in data send/receive (code 30) at io.c(239) 
                [sender=3.0.2]

https://bugzilla.samba.org/show_bug.cgi?id=5695
    improve keep-alive code to handle long-running directory scans

    ./io.c:void maybe_send_keepalive(void)

https://bugzilla.samba.org/show_bug.cgi?id=6175
    write last transfer status when timeout or other error happens
    rsync: writefd_unbuffered failed to write 4 bytes [sender]: Broken pipe (32)
    rsync: connection unexpectedly closed (99113 bytes received so far) [sender]
    rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.5]

https://bugzilla.samba.org/show_bug.cgi?id=7195
    timeout reached while sending checksums for very large files

Comment 1 Tim Taiwanese Liim 2010-10-27 05:24:46 UTC
Opened upstream bug
    https://bugzilla.samba.org/show_bug.cgi?id=7757

Comment 2 Bug Zapper 2011-05-31 10:56:30 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 3 Tim Taiwanese Liim 2011-06-06 03:02:12 UTC
This issue is still present with rsync-3.0.8-1.fc14.x86_64.
Changed Version to F14.

Comment 4 Fedora End Of Life 2012-08-16 19:33:37 UTC
This message is a notice that Fedora 14 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 14. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained.  At this time, all open bugs with a Fedora 'version'
of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this 
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen 
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we were unable to fix it before Fedora 14 reached end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" (top right of this page) and open it against that 
version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 5 Christopher Meng 2016-09-03 06:47:31 UTC
This still happens with -z option enabled when trying to pull distro ISO files from nearby hosts. I myself can sync from src extremely fast.

----------------------------------------------------------------------
$/usr/bin/rsync -avzH --progress --delete-after --delay-updates --timeout=20 --contimeout=60 /rsync/src /path/to/dst

Small file...
Small file...
Small file...
Small file...
...
...
...(works well)
Large file...
Large file...
Large file...
Large file...

Suddenly it hangs...

After several minutes...

[generator] io timeout after 20 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(195) [generator=3.1.2]
rsync error: received SIGUSR1 (code 19) at main.c(1429) [receiver=3.1.2]

----------------------------------------------------------------------
All large files for instance .iso, .tar.gz with similar progress:
    654,569,472 100%  419.80MB/s    0:00:01 (xfr#805, to-chk=92473/93420)

Will eventually stop rsync.

#rpm -q --queryformat "%{VERSION}\n" rsync
3.1.2

Comment 6 Fedora End Of Life 2017-02-28 09:29:11 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.

Comment 8 Fedora End Of Life 2018-05-03 08:51:14 UTC
This message is a reminder that Fedora 26 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 26. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '26'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 26 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged  change the 'version' to a later Fedora
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 9 Fedora End Of Life 2018-05-29 11:25:42 UTC
Fedora 26 changed to end-of-life (EOL) status on 2018-05-29. Fedora 26
is no longer maintained, which means that it will not receive any
further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.