Bug 1236520 - rsync --sparse cannot copy /var/log/lastlog on x86_64 server
Summary: rsync --sparse cannot copy /var/log/lastlog on x86_64 server
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: rsync
Version: 7.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Michal Ruprich
QA Contact: Martin Zelený
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-06-29 10:23 UTC by Jaroslav Aster
Modified: 2017-11-20 18:06 UTC (History)

Fixed In Version: rsync-3.1.2-4.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 827429
Environment:
Last Closed: 2017-11-20 18:06:17 UTC
Target Upstream Version:
Embargoed:


Description Jaroslav Aster 2015-06-29 10:23:32 UTC
+++ This bug was initially created as a clone of Bug #827429 +++

+++ This bug was initially created as a clone of Bug #156809 +++

Description of problem:

We have two Intel x86_64 servers running RHEL 4.  When I try to rsync the /var
file system to either an i386 server or another x86_64 server (the rsync command
is run from the remote server), the rsync job hangs on /var/log/lastlog, even
if I run rsync with the "--sparse" option.  The same rsync command can back up
the /var file system on i386 servers just fine.

Version-Release number of selected component (if applicable):

rsync-2.6.3-1 

How reproducible: Always

Steps to Reproduce:
1. From a remote server, try to rsync /var/log/lastlog of an x86_64 RHEL 4 server
with the --sparse option.
  
Actual results:

The rsync job hangs on /var/log/lastlog forever

Expected results:

/var/log/lastlog should be copied to the remote server.

Additional info:

--- Additional comment from andy on 2005-06-08 18:13:24 EDT ---

Just to be clear: /var/log/lastlog is a sparse file indexed by UID.
On 64-bit systems (where UIDs are 32 bits long) it is 1.2 TB in size.
Running rsync with --sparse (I believe) causes it to detect and
coalesce large runs of zeros in the input file.  It still needs to
read the entire block of nothingness to do its job, however.

I verified similar behavior with tar --sparse, which appears to
terminate only after a very long time.  I don't believe it is an
infinite hang.

Here is a mild flame war on fedora-test-list that might provide more
information:

https://www.redhat.com/archives/fedora-test-list/2005-June/thread.html#00308

Unless the implementation of the lastlog file format is changed to
something other than a gargantuan sparse file (or removed from the
distribution -- it is only readable by root and provides very little
utility), this is going to be difficult to fix.
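
The size quoted above follows from the layout described in this comment: one fixed-size record per UID, stored at offset UID times record size. Here is a minimal sketch of that arithmetic. The 292-byte record size is an assumption (it is not stated in this comment); it matches the usual Linux/glibc struct lastlog layout and is consistent with the lastlog sizes quoted elsewhere in this report. The function names are made up for illustration.

```python
# Assumption: each lastlog record is 292 bytes (4-byte time, 32-byte line,
# 256-byte host) and the record for a UID is stored at offset UID * 292.
LASTLOG_RECORD_SIZE = 292


def lastlog_apparent_size(max_uid):
    """Apparent size (what ls -l shows) of a lastlog whose highest UID is max_uid."""
    return (max_uid + 1) * LASTLOG_RECORD_SIZE


def highest_uid_for_size(apparent_size):
    """Inverse calculation: highest UID that fits in a lastlog of this size."""
    return apparent_size // LASTLOG_RECORD_SIZE - 1


if __name__ == "__main__":
    # Full 32-bit UID range, roughly the 1.2 TB figure mentioned above.
    print(lastlog_apparent_size(2**32 - 1))
    # Size of a lastlog on a machine with local users only.
    print(highest_uid_for_size(292000))
```

Under that assumption, a single login by a user with a very high UID (for example one mapped from a directory service) is enough to give the file a huge apparent size, even though almost none of it is allocated on disk.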

--- Additional comment from jzeleny on 2010-03-05 04:34:15 EST ---

Since there is only one more release planned for RHEL4, this issue most likely won't be fixed in it. Considering the circumstances described both in this bugzilla and referenced discussion, I'm closing this bug as WONTFIX.


-------------------------------------------------------------------------------
I noticed this problem still exists on RHEL 6.2 and 6.3 using rsync 3.0.6 (using it via BackupPC, which takes forever on these files with this rsync).

# rsync --version
rsync  version 3.0.6  protocol version 30


This bug seems fixed in rsync 3.0.8, see release notes:
http://rsync.samba.org/ftp/rsync/src/rsync-3.0.8-NEWS

Indeed, this seems to work getting the lastlog from Fedora 17 with rsync 3.0.9:

# rsync --dry-run --rsh=ssh --sparse -avz my.machine:/var/log/lastlog ./ 
receiving incremental file list

sent 11 bytes  received 37 bytes  13.71 bytes/sec
total size is 292000  speedup is 6083.33 (DRY RUN)
# rsync  --rsh=ssh --sparse -avz my.machine:/var/log/lastlog ./
receiving incremental file list
lastlog

sent 30 bytes  received 388 bytes  55.73 bytes/sec
total size is 292000  speedup is 698.56
# ls -l lastlog 
-rw-r--r--. 1 root root 292000 Jun  1 14:54 lastlog
# du -hs lastlog 
8.0K    lastlog
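
The ls -l and du figures above differ because ls reports the apparent size while du reports the blocks actually allocated. A minimal Python sketch of the same comparison, assuming a Linux filesystem with sparse-file support; the helper names are hypothetical:

```python
import os
import tempfile


def make_sparse_file(path, size):
    """Create a file of the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)


def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes), roughly what ls -l and du report."""
    st = os.stat(path)
    # On Linux st_blocks is counted in 512-byte units.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparsefile")
        make_sparse_file(path, 1 << 30)  # 1 GiB apparent size
        apparent, allocated = apparent_and_allocated(path)
        print("apparent:", apparent, "allocated:", allocated)
```

On a filesystem that supports holes, the allocated figure is expected to stay far below the apparent size, which is the same pattern as the 292000-byte lastlog that du shows as 8.0K.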

--- Additional comment from RHEL Product and Program Management on 2012-06-01 09:17:11 EDT ---

Since this issue was entered in bugzilla and this package is not
scheduled to be updated in the current release, the release flag
has been set to ? to ensure that it is properly evaluated for
the next release.

--- Additional comment from RHEL Product and Program Management on 2012-09-07 01:09:34 EDT ---

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

--- Additional comment from Brian J. Murrell on 2014-01-15 09:41:58 EST ---

So fixing this missed RHEL 6.4 and 6.5.  Can we please have this fixed for the next (or current even!) release of RHEL?

--- Additional comment from Martin Žember on 2014-06-25 09:36:11 EDT ---

A reproducer for this bug has been written: /CoreOS/rsync/Regression/bz827429-rsync-sparse-var-log-lastlog-on-x86_64

It might pass once the bug is fixed.

An excerpt from the log with
rsync-3.0.6-9.el6_4.1:
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   INFO   ] :: Testing with a sparse file with a size of 1M
:: [   LOG    ] :: Runnning rsync --dry-run -e 'ssh -i ./id_dsa -o StrictHostKeyChecking=no' --sparse -avz localhost:/tmp/tmp.M05i4c7tcK/sparsefile /tmp/tmp.M05i4c7tcK/target, with 60 seconds timeout
:: [   LOG    ] :: Command ended itself, I am not killing it.
:: [   LOG    ] :: Runnning rsync  -e 'ssh -i ./id_dsa -o StrictHostKeyChecking=no' --sparse -avz localhost:/tmp/tmp.M05i4c7tcK/sparsefile /tmp/tmp.M05i4c7tcK/target, with 60 seconds timeout
:: [   LOG    ] :: Command ended itself, I am not killing it.
:: [   PASS   ] :: Running 'ls -l target/' (Expected 0, got 0)
:: [   PASS   ] :: Running 'du -sh target/' (Expected 0, got 0)
:: [   INFO   ] :: Testing with a sparse file with a size of 20G
:: [   LOG    ] :: Runnning rsync --dry-run -e 'ssh -i ./id_dsa -o StrictHostKeyChecking=no' --sparse -avz localhost:/tmp/tmp.M05i4c7tcK/sparsefile /tmp/tmp.M05i4c7tcK/target, with 60 seconds timeout
:: [   LOG    ] :: Command ended itself, I am not killing it.
:: [   LOG    ] :: Runnning rsync  -e 'ssh -i ./id_dsa -o StrictHostKeyChecking=no' --sparse -avz localhost:/tmp/tmp.M05i4c7tcK/sparsefile /tmp/tmp.M05i4c7tcK/target, with 60 seconds timeout
:: [   LOG    ] :: Command is still running, I am killing it with KILL
:: [   FAIL   ] :: rsync took too long to finish 
:: [   PASS   ] :: Running 'ls -l target/' (Expected 0, got 0)
:: [   PASS   ] :: Running 'du -sh target/' (Expected 0, got 0)
:: [   LOG    ] :: Duration: 1m 12s
:: [   LOG    ] :: Assertions: 4 good, 1 bad
:: [   FAIL   ] :: RESULT: Test

Behaves similarly on all four archs:
https://beaker.engineering.redhat.com/jobs/678626
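
The log above shows the pattern the reproducer follows: run a command under a watchdog and fail if it has to be killed. The real test is not shown here, so the following is only a sketch of that pattern in Python; the helper name is hypothetical and the rsync command line is an illustrative placeholder.

```python
import subprocess
import sys


def finishes_within(cmd, timeout_s):
    """Run cmd and report whether it ended by itself within timeout_s seconds.

    subprocess.run() kills the child process when the timeout expires.
    """
    try:
        subprocess.run(cmd, timeout=timeout_s, check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return False
    return True


# Placeholder command in the style of the log above; host and paths are
# assumptions, and it is not executed by this sketch.
EXAMPLE_RSYNC_CMD = ["rsync", "--rsh=ssh", "--sparse", "-avz",
                     "localhost:/tmp/sparsefile", "/tmp/target"]

if __name__ == "__main__":
    # Harmless self-check that does not need rsync or ssh.
    print(finishes_within([sys.executable, "-c", "pass"], 60))
```

Calling finishes_within(EXAMPLE_RSYNC_CMD, 60) would mirror the 60-second timeout in the log. A False result only says the command was slow, not why, which is the ambiguity discussed in the comments that follow.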

--- Additional comment from Michal Luscon on 2014-06-26 04:21:28 EDT ---

Are you sure that your test is correct? I ran this test on RHEL-7 with rsync-3.0.9 and it fails, which contradicts the previous comment. Also, the patch from rsync-3.0.8 has already been backported into RHEL (rsync-3.0.6-9).

--- Additional comment from Martin Žember on 2014-06-26 04:27:38 EDT ---

It fails on RHEL-7, that could mean it has not been fixed in RHEL-7 (rsync-3.0.9-15.el7)...

--- Additional comment from Michal Luscon on 2014-06-26 11:52:59 EDT ---

I am pointing out that you are probably not testing this bug. The original report concerns a lastlog much smaller than 10 GB, and the bug reporter claims it is fixed in 3.0.8. The transfer time of large sparse files is determined by the current rsync implementation, and there is an RFE bug report for it (https://bugzilla.redhat.com/show_bug.cgi?id=525545).

--- Additional comment from Martin Žember on 2014-06-27 07:11:42 EDT ---

Thanks. It may well be that it takes a while because of the current implementation and not because of the bug.

Do you know the name of the patch or the entry in the changelog? I do not know which version contains the bug; the reporter's version 3.0.6 is not specific enough. If I change the test to use a 1G-sized sparse file and a 120s timeout, the test PASSes with the oldest build of 3.0.6 I could find, rsync-3.0.6-3.el6.x86_64.

I used a 10G file because a size of 1.2TB was mentioned above. The lastlog can be of a different size; it depends on the UIDs used on the system. Do you know which size is the right one to trigger this bug?

--- Additional comment from Michal Luscon on 2014-07-02 08:53:11 EDT ---

Could you please confirm whether this bug is fixed or not in the current RHEL-6 version of rsync?

--- Additional comment from Joaquin on 2014-07-14 06:13:27 EDT ---

Still not working on an updated RHEL 6.5. The dry run works, the actual sync does not:

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.5 (Santiago)

# rpm -qi rsync
Name        : rsync                        Relocations: (not relocatable)
Version     : 3.0.6                             Vendor: Red Hat, Inc.
Release     : 9.el6_4.1                     Build Date: Wed 23 Oct 2013 10:31:58 AM CEST     
..

# rsync --dry-run --rsh=ssh --sparse -avz root@<other-rhel-6>:/var/log/lastlog ./

receiving incremental file list
lastlog

sent 14 bytes  received 43 bytes  8.77 bytes/sec
total size is 500825445128  speedup is 8786411318.04 (DRY RUN)
# rsync --rsh=ssh --sparse -avz root@<other-rhel-6-or-7>:/var/log/lastlog ./
receiving incremental file list
lastlog
^CKilled by signal 2.
rsync error: unexplained error (code 255) at rsync.c(544) [generator=3.0.6]
rsync error: received SIGUSR1 (code 19) at main.c(1285) [receiver=3.0.6]

Still takes forever without even returning the "file size". 

Using a RHEL 7 server immediately returns the "file size" and, after some time, the file:
# rsync --version
rsync  version 3.0.9  protocol version 30
..
# rsync --rsh=ssh --sparse -avz root@<my-rhel6.5-machine>:/var/log/lastlog ./
..
sent 30 bytes  received 279 bytes  41.20 bytes/sec
total size is 146000  speedup is 472.49
# ls -al lastlog 
-rw-r--r--. 1 root root 146000 Jul 10 13:24 lastlog

--- Additional comment from Martin Žember on 2014-07-14 08:23:53 EDT ---

Joaquin,
thank you very much for the info.

Now we see that on RHEL-7 it successfully transferred about 140kB, while on RHEL-6 we see (thanks to the dry run) that it tried to transfer about 500GiB.

BTW, an important piece of information about the version is the release number after the dash, e.g. -12 in rsync-3.0.6-12.el6, visible in "rpm -q" or in the full output of "rpm -qi". It no longer matters here, though, since the difference in file size is the information we were missing before.

--- Additional comment from Martin Žember on 2014-07-14 09:47:40 EDT ---

One solution would be to change the lastlog file implementation, which is probably not going to happen soon, see bug 525545.

Another way would be to change the rsync implementation of sparse files, see bug 771286.

A workaround could be to reduce the size of the lastlog (probably by changing the high UIDs for those users and recreating/deleting the file).

--- Additional comment from Joaquin on 2014-07-14 12:28:37 EDT ---

Apparently this is not fixed with rsync 3.0.9?

To be clear: The above 140kB (system with only local users) and 500GiB (system using AD for non local users) are different files. 
Maybe I made the same mistake before as I did today (get lastlog from a machine where the reported size was not too big). 

For me an exclusion of /var/log/lastlog resolved the issue at the time and at the moment I'm not using an rsync based backup solution. But maybe there are people who need this to work?


To be complete:

RHEL 6.5:
=========
# rpm -qi rsync
Name        : rsync                        Relocations: (not relocatable)
Version     : 3.0.6                             Vendor: Red Hat, Inc.
Release     : 9.el6_4.1                     Build Date: Wed 23 Oct 2013 10:31:58 AM CEST
...

The "500GiB" lastlog used above on another machine (CentOS 6.5, same rsync version):
centos6.5# du -hs /var/log/lastlog 
40K	/var/log/lastlog
centos6.5# ls -l /var/log/lastlog 
-rw-r--r--. 1 root root 500825445128 Jul  8 15:28 /var/log/lastlog


rsync on RHEL 7 getting file from CentOS 6.5
============================================
Now retrieving that file by rsync using the RHEL 7 machine:
rhel7# rpm -qi rsync
Name        : rsync
Version     : 3.0.9
Release     : 11.el7
Architecture: x86_64
..
rhel7# rsync --dry-run --rsh=ssh --sparse -avz root@<centos6.5>:/var/log/lastlog ./
...
sent 14 bytes  received 43 bytes  8.77 bytes/sec
total size is 500825445128  speedup is 8786411318.04 (DRY RUN)

But the actual transfer might try to download everything (it does not return the size immediately and takes forever).
rhel7# rsync --rsh=ssh --sparse -avz root@<centos6.5>:/var/log/lastlog ./
..
receiving incremental file list
lastlog
^CKilled by signal 2.
rsync error: unexplained error (code 255) at rsync.c(551) [generator=3.0.9]
 rsync error: received SIGUSR1 (code 19) at main.c(1298) [receiver=3.0.9]


RHEL 7 -> RHEL 7:
=================
Get the local /var/log/lastlog on RHEL 7 (also using AD for non local users):
rhel7#  ls -l /var/log/lastlog 
-rw-r--r--. 1 root root 500825445128 Jul 14 11:47 /var/log/lastlog
rhel7# du -ks /var/log/lastlog 
24	/var/log/lastlog

rhel7# time rsync --dry-run --rsh=ssh --sparse -avz root@localhost:/var/log/lastlog ./
root@localhost's password: 
receiving incremental file list
lastlog

sent 14 bytes  received 43 bytes  10.36 bytes/sec
total size is 500825445128  speedup is 8786411318.04 (DRY RUN)

real	10m4.280s
user	0m0.188s
sys	0m0.615s

rhel7# time rsync --rsh=ssh --sparse -avz root@localhost:/var/log/lastlog ./
root@localhost's password: 
receiving incremental file list
lastlog
unexpected tag 25 [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(1141) [receiver=3.0.9]
rsync: connection unexpectedly closed (37 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [generator=3.0.9]

real	10m3.797s
user	1m20.166s
sys	0m0.039s



rsync on RHEL 6.5 getting lastlog from RHEL 7
=============================================
Get the same file (from this RHEL 7 machine) using RHEL 6.5:
rhel6.5# time rsync --dry-run --rsh=ssh --sparse -avz root@<rhel7>:/var/log/lastlog ./
root@<rhel7> password: 
receiving incremental file list
lastlog

sent 14 bytes  received 43 bytes  10.36 bytes/sec
total size is 500825445128  speedup is 8786411318.04 (DRY RUN)

real	10m4.889s
user	0m0.287s
sys	0m0.311s



rhel6.5# time rsync --rsh=ssh --sparse -avz root@<rhel7>:/var/log/lastlog ./
root@<rhel7> password: 
receiving incremental file list
lastlog
unexpected tag 25 [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(1134) [receiver=3.0.6]
rsync: connection unexpectedly closed (37 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) [generator=3.0.6]

real	10m5.969s
user	0m0.604s
sys	0m0.558s

--- Additional comment from Martin Žember on 2014-08-27 05:52:49 EDT ---

Thanks for the update. The details are consistent with what we have discussed.

I think that you are right that it would be nice to have this resolved for other users even when you have worked around it.

One thing is the implementation of transferring sparse files in rsync which influences e.g. transferring virtual machine images as well.

The other thing is the size/implementation of the lastlog file. Regarding comment 12 that says it will not be fixed soon, there is a way -- there is a bug 951564 filed for Fedora and the fix could get into RHEL in the future.

--- Additional comment from Martin Žember on 2014-09-30 09:34:29 EDT ---

A note for possible future QA owner:

The reproducer /CoreOS/rsync/Regression/bz827429-rsync-sparse-var-log-lastlog-on-x86_64 has been failing for a long time, so it has been disabled in TIP. Please remove the 'NOTIP' tag in TCMS once it is fixed. Thanks.

Comment 6 Michal Ruprich 2017-07-26 12:22:38 UTC
I was playing around with sparse files for a while and rsync IS able to copy them. The problem is that it takes basically the same time as it would with a normal file of the same size. Even the test mentioned at the end of comment #1 works, but you have to increase the watchdog timeout to a much higher value. So from all this it seems that the --sparse option doesn't actually bring any speed improvement over normal mode.

But the problem is that the sparse mode in rsync might not be well understood. The performance on the sender is horrible because rsync actually reads the whole file, even the empty parts. The optimization happens on the receiver, where the blocks of zeros are not all stored in memory, which relieves the receiver a bit. For example, cp and tar have some optimization for sparse files AFAIK: the file is analysed very quickly and the zero blocks are not copied. This is something that rsync cannot do yet, and it seems that upstream doesn't plan to do anything about it in the near future.
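
To illustrate what "analysing the file quickly" can look like: on Linux, a program can ask the kernel where the data regions of a file are instead of reading through the zeros. The sketch below uses lseek with SEEK_DATA and SEEK_HOLE from Python. It is only an illustration of the general technique, not the code used by cp, tar or rsync, and the function name is hypothetical.

```python
import errno
import os
import tempfile


def data_extents(path):
    """Return (offset, length) pairs for the regions of path that hold data.

    Holes are skipped via lseek(SEEK_DATA/SEEK_HOLE). On filesystems without
    hole support the kernel reports the whole file as one data extent.
    """
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # only a hole remains up to EOF
                    break
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end - start))
            offset = end
    finally:
        os.close(fd)
    return extents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparsefile")
        with open(path, "wb") as f:
            f.truncate(1 << 30)   # 1 GiB apparent size
            f.seek(512 << 20)     # write a few bytes in the middle
            f.write(b"data")
        print(data_extents(path))
```

A copy tool built on this would only read and send the listed extents, so the cost scales with the allocated data rather than with the apparent file size.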

There is however a workaround that might help you improve the performance:
1. Create a sparse file on the receiver with the same name as the file that you want to sync. 

OR

Use the --sparse option for the first transfer and simply wait. The option works but isn't any faster than normal mode.

2. Next time use the --inplace option instead of --sparse. This will activate the delta mode which transfers only changed blocks. This could help you improve the performance.

3. Always use -z when transferring sparse files. It will not speed up the process on the sender, but you will save a considerable amount of bandwidth, since -z compresses the data during transfer and blocks of zeros compress very efficiently.
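
The steps above can be sketched as two command lines, one for the initial copy and one for later syncs. The helper names and the host in the usage part are placeholders; only the rsync options named in this comment (--sparse, --inplace, -z) plus the -av and --rsh=ssh options used earlier in this report appear.

```python
import shlex


def first_transfer_cmd(src, dst):
    """Initial copy: --sparse keeps the destination sparse, -z compresses zero runs."""
    return ["rsync", "--rsh=ssh", "--sparse", "-avz", src, dst]


def later_transfer_cmd(src, dst):
    """Later syncs: --inplace instead of --sparse, updating the existing file."""
    return ["rsync", "--rsh=ssh", "--inplace", "-avz", src, dst]


if __name__ == "__main__":
    src = "root@remote.example:/var/log/lastlog"  # placeholder host
    print(shlex.join(first_transfer_cmd(src, "./")))
    print(shlex.join(later_transfer_cmd(src, "./")))
```

The two options are kept in separate commands on purpose, following the "instead of" wording in step 2.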

Comment 7 Michal Ruprich 2017-07-27 15:04:57 UTC
Just an addition to comment #6:

I captured the traffic between two systems with tcpdump while transferring files with rsync. When transferring a sparse file that already existed on the receiver with --inplace and -z, the amount of data that went over the network was basically zero. But as I said, rsync still needs to read the whole sparse file...
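
Part of the reason so little data crosses the network is that long runs of zeros compress extremely well. A small sketch using Python's zlib module as a stand-in for the compression done by -z; this is an assumption for illustration and not rsync's actual code path, and the function name is hypothetical.

```python
import zlib


def compressed_size_of_zeros(n_bytes, level=6):
    """Size in bytes of n_bytes of zeros after zlib (deflate) compression."""
    return len(zlib.compress(bytes(n_bytes), level))


if __name__ == "__main__":
    original = 1 << 20  # 1 MiB of zeros
    print(original, "->", compressed_size_of_zeros(original))
```

The compressed result is a tiny fraction of the input, which fits the near-zero traffic seen in the capture. The sender still has to read every zero before it can compress it.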

Comment 8 Martin Zelený 2017-11-20 18:06:17 UTC
Fixed in rsync-3.1.2-4 on rhel-7.5

