Bug 211752
| Field | Value |
|---|---|
| Summary | Data corruption in copying large files |
| Product | [Fedora] Fedora |
| Component | kernel |
| Version | 6 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED INSUFFICIENT_DATA |
| Severity | low |
| Priority | medium |
| Reporter | Trevin Beattie <trevin> |
| Assignee | Kernel Maintainer List <kernel-maint> |
| QA Contact | Brian Brock <bbrock> |
| CC | wtogami |
| Doc Type | Bug Fix |
| Last Closed | 2006-10-23 02:11:25 UTC |
Description
Trevin Beattie
2006-10-21 21:32:37 UTC
Here are the results of my most recent test. I started with a group of files
that I had copied onto both hard drives, and repeatedly re-synced until the
checksums matched. I then ran a shell command which went through each file and
made one copy from disk A to disk B and another copy from disk B to disk A:
```
$ for file in *.DAT ; do
    cp -av /diska/$file /diskb/$file.2
    cp -av /diskb/$file /diska/$file.2
  done
```
Next, I gathered MD5 checksums for each 4K block of the source and destination
files:
```
$ for file in *.DAT *.DAT.2 ; do
    size=`du $file | cut -f 1` ; size=$((size/4))
    block=0 ; cat /dev/null > $file.MD5s
    while [ $block -lt $size ] ; do
      echo -ne "$file: $block\r"
      dd if=$file bs=4096 skip=$block count=1 2>/dev/null | md5sum >> $file.MD5s
      block=$((block+1))
    done
    echo "$file: finished"
  done
```
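As an aside, the same per-4K-block checksum list can be produced in a single pass with GNU split's `--filter` option instead of one `dd` invocation per block. A minimal sketch (the sample file name is illustrative, and `--filter` requires coreutils 8.13 or later):

```shell
# Sketch: per-4K-block MD5 list in one pass, equivalent to the dd loop
# above. Assumes GNU coreutils split >= 8.13; the file name is illustrative.
printf 'hello world' > sample.DAT                      # small stand-in test file
split -b 4096 --filter='md5sum' sample.DAT > sample.DAT.MD5s
cat sample.DAT.MD5s                                    # one checksum line per 4K block
```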
Using the .MD5s files I could easily compare chunks in one file against chunks
in other files, no matter which block they occurred in. I will attach an
extraction from these files showing the blocks that differed between the
original and copy A or B. In all but one case, the corrupted block matches a
block that came from an earlier file or from an earlier block of the same file.
The sole exception (copy A, file 3, block 159829) can be explained by the fact
that I copied that same file from A to B first, so the whole file would have
been cached when copying back from B to A.
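That cross-file comparison can be mechanized by searching every .MD5s index for the corrupted block's checksum; grep's reported line number minus one is the 4K block index. A sketch with stand-in file names and checksum values (not real data):

```shell
# Sketch: trace a corrupted block back to its source block. The .MD5s
# contents and the checksum value here are illustrative stand-ins.
printf 'sum_aa\nsum_bb\n' > one.DAT.MD5s
printf 'sum_cc\nsum_bb\n' > two.DAT.MD5s
bad_sum=sum_bb                  # checksum observed in the corrupted block
grep -n "$bad_sum" *.MD5s       # file:line matches; block index = line - 1
```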
This still doesn't rule out the possibility that buffers simply aren't being
written out. So for my next test, I'm going to erase all of these copies, fill
the hard disk (as much as I can) with a simple fixed pattern, and try creating
the copies again.
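In outline, that next test might look like the following sketch (file names and sizes are illustrative; the real test would fill the free space on the disk, not a single small file):

```shell
# Sketch of the planned test: write a fixed 4K pattern, copy it, and
# confirm the copy is bit-identical. Names and sizes are illustrative.
printf 'A%.0s' $(seq 4096) > pattern.DAT   # 4096 bytes of 'A'
cp pattern.DAT pattern.DAT.2
cmp -s pattern.DAT pattern.DAT.2 && echo "copy intact"
```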
Created attachment 139072 [details]
List of MD5 checksums for corrupted blocks
I'm sorry, but after numerous tests trying to analyze the problem in a more controlled environment, I have been unable to reproduce the file corruption. At the moment my best guess is that between upgrading the hard drive firmware and performing the test copies last Friday, I had not power-cycled the computer. All test files have copied without corruption since I switched the computer on yesterday, so I think the firmware upgrade required a cold restart in order to take effect.

I don't understand why that would be the case, since everything I can find on this firmware upgrade indicates it is meant to fix a drive-detection problem, not data corruption, and that doesn't explain why the errors occur in Linux page-sized chunks. But then Maxtor doesn't say exactly what the difference is between BANC1G10 and BANC1G20.