Bug 597304 - rsync is not doing whole-file checksum after file is transferred
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: rsync
Version: 12
Hardware: All Linux
low
urgent
Target Milestone: ---
Assignee: Jan Zeleny
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
Reported: 2010-05-28 15:51 UTC by jairo medina
Modified: 2010-06-03 07:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-06-03 07:11:50 UTC

Description jairo medina 2010-05-28 15:51:52 UTC
Description of problem: A file is not transferred unless I add the -c option. rsync behaves in two different ways: if the source is newer, the destination gets the newer timestamp but the actual file never gets updated; if the destination is newer, the file gets updated correctly.
This problem would be avoided if the whole-file checksum were performed after the transfer.

Manual page states: 
"Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred, but that automatic after-the-transfer verification has nothing to do with this option’s before-the-transfer "Does this file need to be updated?" check."

Version-Release number of selected component (if applicable):
rsync-3.0.7-3.fc12.i686

How reproducible: always


Steps to Reproduce:
[j@l temp]$ mkdir temp
[j@l temp]$ echo "this is a test file" > aaa
[j@l temp]$ echo "this is a tets file" > temp/aaa
[j@l temp]$ touch -t 1001011201.01 aaa
[j@l temp]$ touch -t 1001011201.01 temp/aaa
[j@l temp]$ md5sum aaa temp/aaa
4221d002ceb5d3c9e9137e495ceaa647  aaa
d5112fb75649998cda68b180b0ad7e9b  temp/aaa
[j@l temp]$ ls -lrt --full-time aaa temp/aaa
-rw-r--r--. 1 j a 20 2010-01-01 12:01:01.000000000 -0500 temp/aaa
-rw-r--r--. 1 j a 20 2010-01-01 12:01:01.000000000 -0500 aaa
[j@l temp]$ rsync -Cav aaa temp/aaa
sending incremental file list

sent 46 bytes  received 12 bytes  116.00 bytes/sec
total size is 20  speedup is 0.34
[j@l temp]$ ls -lrt --full-time aaa temp/aaa
-rw-r--r--. 1 j a 20 2010-01-01 12:01:01.000000000 -0500 temp/aaa
-rw-r--r--. 1 j a 20 2010-01-01 12:01:01.000000000 -0500 aaa
[j@l temp]$ md5sum aaa temp/aaa
4221d002ceb5d3c9e9137e495ceaa647  aaa
d5112fb75649998cda68b180b0ad7e9b  temp/aaa
[j@l temp]$ cat aaa
this is a test file
[j@l temp]$ cat temp/aaa
this is a tets file
[j@l temp]$ rsync -c -Cav aaa temp/aaa
sending incremental file list
aaa

sent 125 bytes  received 31 bytes  312.00 bytes/sec
total size is 20  speedup is 0.13
[j@l temp]$ md5sum aaa temp/aaa
4221d002ceb5d3c9e9137e495ceaa647  aaa
4221d002ceb5d3c9e9137e495ceaa647  temp/aaa
  
Actual results:
File at destination does not get updated, so it is different from the source.

Expected results:
Same file at destination as the file at the source

Additional info:

Comment 1 Jan Zeleny 2010-06-01 13:06:16 UTC
I believe your test is not entirely correct. When you touch those two files, you give them the same timestamp. Try giving them timestamps such that temp/aaa is older. In that case it seems to work, and I believe that is the correct behavior.

Comment 2 jairo medina 2010-06-01 14:05:42 UTC
The test is a replication of a real-life situation that I encountered, in which important files were not being synchronized. I was actually happy I could replicate it; this way it can be reviewed and corrected.

In my message I addressed the impact of the timestamps.

Let me know if you need more information to get this issue corrected.

Thank you.

Comment 3 Jan Zeleny 2010-06-02 07:14:29 UTC
I don't even think there is an issue; that's why I wrote comment 1. I think it is logical that rsync doesn't overwrite a file that has the same timestamp as the source file. My opinion is that situations like this usually mean one of two things: depending on its checksum, it is either an entirely different file or it hasn't changed since the last time it was synced. In both cases it would be a mistake to synchronize them. Or do you have any specific manual or documentation reference that says otherwise?

My previous point was that files are synchronized without any problem when the destination file is undoubtedly older than the source file. That works ok, right?

But back to the same timestamps scenario. How rsync will behave in that situation is not up to me. I recommend you contact upstream, because I certainly won't change this behavior (which I repeat is probably not an error) without their approval.

Comment 4 jairo medina 2010-06-02 14:37:17 UTC
My expectation is that the source will overwrite the destination if they are different, that is not happening.

I quoted a paragraph from the manual page for rsync. The man page states, if I understand it correctly, that a whole-file checksum is performed automatically.

To give you more information: 
 - the files I need to rsync are small files, approx. 951 bytes each.
 - as I mentioned, if I use -c, the checksum works and the files get sync'd (this also happens if the timestamps show the destination files are older).

I see multiple possibilities:
 - perhaps the manual page needs to be updated to clarify when the whole-file checksum is useful and when the -c should be used, i.e. the small file size voids the whole-file checksum when the timestamps are the same.
 - perhaps this is a bug and the file should have been rsync'd.

I think it is a good idea to involve the main developers of the program (upstream), their clarification will be useful.

If you are on their mailing list, could you point them to this bug report?

Comment 5 Jan Zeleny 2010-06-03 07:11:50 UTC
(In reply to comment #4)
> My expectation is that the source will overwrite the destination if they are
> different, that is not happening.

It will overwrite, but the destination has to have an older timestamp. As I pointed out, if the timestamp is exactly the same and the file is different, from rsync's point of view it is most likely an entirely different file.
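
The default decision discussed here can be sketched as follows. This is only a rough model, assuming the "quick check" documented in the rsync man page (a file is skipped when its size and last-modified time both match the destination). The function name quick_check is hypothetical, this is not rsync's actual code, and it relies on GNU stat -c.

```shell
#!/bin/sh
# Hypothetical model of rsync's default "quick check"; not rsync's real code.
# Prints "skip" when size and mtime both match, otherwise "transfer".
quick_check() {
    src=$1
    dst=$2
    # A missing destination always needs a transfer.
    [ -e "$dst" ] || { echo transfer; return; }
    # %s = size in bytes, %Y = mtime in seconds since the epoch (GNU stat).
    if [ "$(stat -c '%s %Y' "$src")" = "$(stat -c '%s %Y' "$dst")" ]; then
        echo skip
    else
        echo transfer
    fi
}
```

Under this model, the two 20-byte files from the steps to reproduce, which carry identical timestamps, are skipped even though their contents differ.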

> I quoted a paragraph from the manual page for rsync. The man page states, if I
> understand it correctly, that a whole-file checksum is performed automatically.

This is where you are wrong. The -c option is described perfectly well in the manual, and it doesn't say anything like this. Please read the manual again; it should be clear that -c has nothing to do with the section you quoted. Even the section you quoted states that "automatic after-the-transfer verification has nothing to do with this option’s before-the-transfer check".
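
For comparison, the before-the-transfer check that -c selects can be modelled roughly like this. It is a sketch under stated assumptions: the man page describes -c as comparing a checksum for each file that has a matching size; md5sum stands in here for rsync's own checksum, the function name checksum_check is hypothetical, and GNU stat -c is assumed.

```shell
#!/bin/sh
# Hypothetical model of the -c (--checksum) pre-transfer check; not rsync's real code.
# Prints "skip" when the contents match, otherwise "transfer". Timestamps are ignored.
checksum_check() {
    src=$1
    dst=$2
    # A missing destination always needs a transfer.
    [ -e "$dst" ] || { echo transfer; return; }
    # Different sizes already imply different content.
    if [ "$(stat -c '%s' "$src")" != "$(stat -c '%s' "$dst")" ]; then
        echo transfer
        return
    fi
    # Reading from stdin keeps the file name out of md5sum's output.
    if [ "$(md5sum < "$src")" = "$(md5sum < "$dst")" ]; then
        echo skip
    else
        echo transfer
    fi
}
```

This decision happens before any data is sent, which is separate from the after-the-transfer verification mentioned in the quoted man page paragraph.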

> I see multiple possibilities:
>  - perhaps the manual page needs to be updated to clarify when the whole-file
> checksum is useful and when the -c should be used, i.e. the small file size
> voids the whole-file checksum when the timestamps are the same.

The manual is pretty clear on this.

>  - perhaps this is a bug and the file should have been rsync'd.

No, it shouldn't have. The manual says it very clearly.

> I think it is a good idea to involve the main developers of the program
> (upstream), their clarification will be useful.

I don't think so. They would tell you the same thing I already told you.

Because your last comment confirmed my suspicion that this is a misunderstanding rather than an actual bug, I'm closing it as NOTABUG.

