Bug 879860 - rsync copy errors go undetected
Summary: rsync copy errors go undetected
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: rsync
Version: 17
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Michal Luscon
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-11-24 21:02 UTC by Johan Vromans
Modified: 2013-02-12 12:39 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-12 12:39:57 UTC
Type: Bug
Embargoed:


Attachments
Full log of the copy tool (4.53 KB, text/x-log)
2012-11-24 21:02 UTC, Johan Vromans

Description Johan Vromans 2012-11-24 21:02:32 UTC
Created attachment 651269
Full log of the copy tool

Description of problem:

While copying large files to a USB stick using rsync, I noticed that sometimes the resultant file has a copy error which goes unnoticed. The rsync documentation is very clear: 

   Note that rsync always verifies that each transferred file was
   correctly reconstructed on the receiving side by checking a
   whole-file checksum that is generated as the file is
   transferred, ...

Version-Release number of selected component (if applicable):

Fedora 17 with all patches as of today.
rsync-3.0.9-4.fc17.x86_64

How reproducible:

It occurs rather frequently, on average 1 out of 3 copies.
Note that the cause of the copy problems is still unclear.
However, according to the rsync documentation, rsync should have detected the copy error.

Excerpt from the log of a copy tool:

Setup the (encrypted) USB disk:
  2012-11-23 09:15:09 + /sbin/cryptsetup luksOpen /dev/sdj2 cbb
  2012-11-23 09:15:10 + mount /dev/mapper/cbb /mnt/bkpmstk

Calculate Original checksums
  2012-11-23 09:15:11 + md5sum /backup/tmp/tmp7475/*/*tar.gz
  ...
  ae1dade7c739052631f45fab6fe78e18  /backup/tmp/tmp7475/phoenix::home.level-1/2012-11-23.tar.gz
  ...

Start the rsync:
  2012-11-23 09:17:38 + rsync -av -L --safe-links -c --delete /backup/tmp/tmp7475/ /mnt/bkpmstk/
  sending incremental file list
  ./
  ...
  phoenix::home.level-1/
  phoenix::home.level-1/2012-11-23.tar.gz
  ...

  sent 3182891130 bytes  received 54 bytes  2572033.28 bytes/sec
  total size is 8641550697  speedup is 2.72

Calculate Checksums on copy
  2012-11-23 09:38:15 + md5sum /mnt/bkpmstk/*/*tar.gz
  ...
  f3c715688a7f8727c2dc100d21bd156c  /mnt/bkpmstk/phoenix::home.level-1/2012-11-23.tar.gz
  ...

Verify the copy:
  ...
  2012-11-23 09:54:10 + cmp /mnt/bkpmstk/phoenix::home.level-1/2012-11-23.tar.gz phoenix::home.level-1/2012-11-23.tar.gz
  /mnt/bkpmstk/phoenix::home.level-1/2012-11-23.tar.gz phoenix::home.level-1/2012-11-23.tar.gz differ: byte 551474687, line 2084699 is  34 ^\  74 <

There's a 1-bit error in the copied file.
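
The two byte values in the cmp output above are printed in octal: 034 (shown as ^\) in the copy on the stick and 074 (shown as <) in the original. A minimal Python sketch of the arithmetic, with illustrative variable names that do not come from the copy tool, confirms that the two bytes differ in exactly one bit and that the bit is clear in the copy:

```python
# Byte values as printed by cmp, in octal.
# The first file named on the cmp command line is the copy, the second the original.
copy_byte = 0o34   # displayed by cmp as ^\
orig_byte = 0o74   # displayed by cmp as <

diff = copy_byte ^ orig_byte            # bits that differ between the two bytes
bits_flipped = bin(diff).count("1")     # how many bits differ
lost_in_copy = bool(orig_byte & diff)   # True if the bit is set in the original only

print(f"xor = {diff:#o}, differing bits = {bits_flipped}, lost in copy = {lost_in_copy}")
```

The XOR of the two values is octal 040, a single bit, which matches the observation that one bit is off where it should be on.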

Additional info:

I see errors like this with different files and USB sticks. A badblock scan of the USB sticks reveals no problems with the media.

I started getting these errors a couple of days ago. This copy tool has been in use for years and never revealed problems like this. 

Again, the cause of the errors is unknown; it is always a single bit that is off when it should be on. However, rsync should have detected it.

Comment 1 Johan Vromans 2012-11-29 07:24:46 UTC
The cause of the errors has been tracked down to a memory problem.
However, that still doesn't explain why rsync doesn't catch it, given that all other checksum-calculating tools consistently report different checksums for the original and copied files.

Lowering priority to medium.

Comment 2 Michal Luscon 2012-12-03 13:54:40 UTC
Hi Johan,
could you provide binary copies of transmitted files?

Comment 3 Johan Vromans 2012-12-03 14:24:55 UTC
I'm sorry, that is not possible. The files are backups (tar.gz) of my /home partition. They're over 3G in size, and contain lots of things that I would rather not share.
However, the problems occurred with several different backups (taken on different days), so I don't think the actual content matters. It's likely that the size does matter: it probably forces memory to be used for the data transfer that would normally be occupied by kernel buffers.

I think you may be able to reproduce the copy problem as follows:
- rsync a huge file to a relatively slow medium,
- while the file is being written, clobber one of its already written bits,
- see if rsync detects that the final copy is not identical to the original, as per documentation.
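
The steps above can be approximated without rsync or a USB stick. The following Python sketch is only a simulation under stated assumptions: it does not invoke rsync or model its implementation, the function names (copy_with_stream_checksum, flip_bit, demo) are hypothetical, SHA-256 is an arbitrary hash choice, and the bit is flipped after the copy finishes rather than during it. It illustrates that a checksum taken over the data while it is being transferred can match the source even though the file that ends up on disk does not.

```python
import hashlib
import os
import tempfile

CHUNK = 64 * 1024  # read size in bytes (arbitrary)


def copy_with_stream_checksum(src, dst):
    """Copy src to dst and hash the bytes as they pass through memory."""
    h = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(CHUNK):
            h.update(chunk)
            fout.write(chunk)
    return h.hexdigest()


def file_checksum(path):
    """Hash a file by reading it back from the filesystem."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            h.update(chunk)
    return h.hexdigest()


def flip_bit(path, offset, bit):
    """Invert one bit of the byte at the given 0-based offset, in place."""
    with open(path, "r+b") as f:
        f.seek(offset)
        value = f.read(1)[0]
        f.seek(offset)
        f.write(bytes([value ^ (1 << bit)]))


def demo():
    """Return (stream checksum matches source, on-disk copy matches source)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "original.bin")
        dst = os.path.join(tmp, "copy.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(4 * CHUNK))
        stream_sum = copy_with_stream_checksum(src, dst)
        flip_bit(dst, 12345, 5)  # stand-in for the corruption on the way to disk
        source_sum = file_checksum(src)
        return stream_sum == source_sum, file_checksum(dst) == source_sum


if __name__ == "__main__":
    stream_ok, disk_ok = demo()
    print("stream checksum matches source:", stream_ok)
    print("on-disk copy matches source:", disk_ok)
```

In this simulation the first comparison succeeds and the second fails, which is the situation described in this report: only a check that reads the written file back can notice the flipped bit.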

Comment 4 Michal Luscon 2013-02-12 12:39:57 UTC
As far as I understand the rsync code, rsync only ensures that no error occurred during the file transfer and doesn't perform any memory->filesystem write check.

Closing as not a bug.
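
Since the written file is not re-read by the transfer itself, any memory-to-filesystem check has to be done separately, as the cmp step in the original log does. A minimal Python sketch of such a read-back comparison follows; the function name first_difference is hypothetical, and the 1-based offset is chosen only to mirror the way cmp reports positions. A read-back performed immediately after writing may be served from cached data rather than from the device, so this is a sketch of the comparison, not a guarantee about what is stored on the medium.

```python
CHUNK = 1 << 20  # read size in bytes (arbitrary)


def first_difference(path_a, path_b):
    """Compare two files byte by byte.

    Returns None if the files are identical, otherwise a tuple
    (offset, byte_a, byte_b) for the first mismatch, where offset is
    1-based. If one file ends first, its byte is reported as None.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(CHUNK)
            b = fb.read(CHUNK)
            if a == b:
                if not a:          # both files ended with no mismatch
                    return None
                offset += len(a)
                continue
            for i, (x, y) in enumerate(zip(a, b)):
                if x != y:
                    return (offset + i + 1, x, y)
            # No mismatch in the common part: one file is a prefix of the other.
            n = min(len(a), len(b))
            return (offset + n + 1,
                    a[n] if n < len(a) else None,
                    b[n] if n < len(b) else None)
```

A result of None corresponds to cmp printing nothing; any other result identifies the first byte where the copy and the original disagree.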

