Created attachment 651269
Full log of the copy tool
Description of problem:
While copying large files to a USB stick using rsync, I noticed that sometimes the resultant file has a copy error which goes unnoticed. The rsync documentation is very clear:
Note that rsync always verifies that each transferred file was
correctly reconstructed on the receiving side by checking a
whole-file checksum that is generated as the file is transferred
Version-Release number of selected component (if applicable):
Fedora 17 with all patches as of today.
It occurs rather frequently, on average in 1 out of 3 copies.
Note that the cause of the copy problems is still unclear.
However, according to the rsync documentation, rsync should have detected the copy error.
Excerpt from the log of the copy tool:
Setup the (encrypted) USB disk:
2012-11-23 09:15:09 + /sbin/cryptsetup luksOpen /dev/sdj2 cbb
2012-11-23 09:15:10 + mount /dev/mapper/cbb /mnt/bkpmstk
Calculate Original checksums
2012-11-23 09:15:11 + md5sum /backup/tmp/tmp7475/*/*tar.gz
Start the rsync:
2012-11-23 09:17:38 + rsync -av -L --safe-links -c --delete /backup/tmp/tmp7475/ /mnt/bkpmstk/
sending incremental file list
sent 3182891130 bytes received 54 bytes 2572033.28 bytes/sec
total size is 8641550697 speedup is 2.72
Calculate Checksums on copy
2012-11-23 09:38:15 + md5sum /mnt/bkpmstk/*/*tar.gz
Verify the copy:
2012-11-23 09:54:10 + cmp /mnt/bkpmstk/phoenix::home.level-1/2012-11-23.tar.gz phoenix::home.level-1/2012-11-23.tar.gz
/mnt/bkpmstk/phoenix::home.level-1/2012-11-23.tar.gz phoenix::home.level-1/2012-11-23.tar.gz differ: byte 551474687, line 2084699 is 34 ^\ 74 <
There's a 1-bit error in the copied file.
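The single-bit claim can be checked against the cmp output above. cmp prints the differing byte values in octal, so the bytes are 034 in the copy on the stick (the first file) and 074 in the original. A small sketch in Python, using only the two values shown in the log:

```python
# Byte values reported by cmp, in octal. The first file named on the
# cmp command line is the copy on the USB stick.
copy_byte = 0o34
orig_byte = 0o74

# XOR leaves a 1 in every bit position where the two bytes differ.
diff = copy_byte ^ orig_byte
flipped_bits = bin(diff).count("1")

# True when the differing bit is set in the original and clear in the copy.
bit_lost_in_copy = (orig_byte & diff) != 0 and (copy_byte & diff) == 0

print(f"differing mask: {diff:#04x}, bits flipped: {flipped_bits}")
print(f"bit set in original but clear in copy: {bit_lost_in_copy}")
```

The mask has exactly one bit set, and that bit is on in the original and off in the copy.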
I see errors like this with different files and USB sticks. A badblock scan of the USB sticks reveals no problems with the media.
I started getting these errors a couple of days ago. This copy tool has been in use for years and never revealed problems like this.
Again, the cause of the errors is unknown; it is always a single bit that is off when it should be on. However, rsync should have detected it.
The cause of the errors has been tracked down to a memory problem.
However, that still doesn't explain why rsync doesn't catch it given that all other checksum calculating tools consistently report different checksums for the original and copied files.
Lowering priority to medium.
Could you provide binary copies of the transmitted files?
I'm sorry, that is not possible. The files are backups (tar.gz) of my /home partition. They're over 3G in size, and contain lots of things that I would rather not share.
However, the problems occurred with several different backups (taken on different days), so I don't think the actual content matters. The size likely does matter: it probably forces memory to be used for the data transfer that would normally be occupied by kernel buffers.
I think you may be able to reproduce the copy problem as follows:
- rsync a huge file to a relatively slow medium,
- while the file is being written, clobber one of its already written bits,
- see if rsync detects that the final copy is not identical to the original, as per documentation.
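The steps above depend on timing when done against a live rsync run, so here is a rough stand-in in Python that only mimics their effect. It is a sketch under assumptions: shutil.copyfile replaces rsync, a 4 MiB random file replaces the multi-gigabyte backup, and the helper names (file_md5, flip_bit) and the offset are made up for illustration. It shows that a checksum taken by reading the copy back from the medium does notice one clobbered bit.

```python
import hashlib
import os
import shutil
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Hash a file by reading it back from disk in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flip_bit(path, offset, mask=0x20):
    """Flip the bits selected by `mask` in the byte at `offset`, in place."""
    with open(path, "r+b") as fh:
        fh.seek(offset)
        value = fh.read(1)[0]
        fh.seek(offset)
        fh.write(bytes([value ^ mask]))


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "original.bin")
dst = os.path.join(workdir, "copy.bin")

# Small stand-in for the large tar.gz from the report.
with open(src, "wb") as fh:
    fh.write(os.urandom(4 * 1024 * 1024))

# Stand-in for the rsync transfer.
shutil.copyfile(src, dst)
copies_match_before = file_md5(src) == file_md5(dst)

# Simulate the corruption: one bit of already written data changes.
flip_bit(dst, offset=123456)
copies_match_after = file_md5(src) == file_md5(dst)

print("match before corruption:", copies_match_before)
print("match after corruption:", copies_match_after)
shutil.rmtree(workdir)
```

This is the same kind of check the copy tool performs with md5sum and cmp after the transfer; whether rsync itself performs such a read-back is exactly the question raised in this report.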
As far as I understand the rsync code, rsync only ensures that no error occurred during the file transfer; it doesn't perform any memory->filesystem write check.
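To illustrate the distinction, here is a toy model in Python. It is not rsync's code, only a sketch of the behaviour described above: the receiver checksums each chunk as it arrives in memory and never re-reads what was stored. FlakyWriter and receive are hypothetical names, and MD5 is used only to match the md5sum calls in the log.

```python
import hashlib
import io


class FlakyWriter:
    """Hypothetical sink that corrupts one bit on the way to storage."""

    def __init__(self, corrupt_at):
        self.storage = io.BytesIO()
        self.corrupt_at = corrupt_at  # absolute byte offset to damage
        self.written = 0

    def write(self, chunk):
        data = bytearray(chunk)
        index = self.corrupt_at - self.written
        if 0 <= index < len(data):
            data[index] ^= 0x20  # single-bit fault below the checksum
        self.storage.write(data)
        self.written += len(chunk)


def receive(chunks, sink):
    """Checksum each chunk as it arrives, then hand it to the sink."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)  # sees the data as received, in memory
        sink.write(chunk)     # what reaches storage is never re-read
    return digest.hexdigest()


payload = bytes(range(256)) * 64
chunks = [payload[i:i + 4096] for i in range(0, len(payload), 4096)]

sender_sum = hashlib.md5(payload).hexdigest()
sink = FlakyWriter(corrupt_at=5000)
transfer_sum = receive(chunks, sink)
readback_sum = hashlib.md5(sink.storage.getvalue()).hexdigest()

transfer_ok = transfer_sum == sender_sum  # transfer-time check
readback_ok = readback_sum == sender_sum  # check against stored bytes

print("transfer-time checksum matches:", transfer_ok)
print("read-back checksum matches:", readback_ok)
```

In this model the transfer-time check passes while the stored bytes differ, which is consistent with the reporter's md5sum and cmp runs finding an error that the transfer itself did not flag.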
Closing as not a bug.