Red Hat Bugzilla – Bug 211752
Data corruption in copying large files
Last modified: 2007-11-30 17:11:46 EST
Description of problem:
I've spent the last two weeks tracking down the cause of data corruption in
large files (between 500MB and 4GB) on a pair of SATA hard drives. The symptom
is that after copying a file, an MD5 checksum of the copy against the original
would show a mismatch. Detailed comparison of the files using 'cmp -l' shows
that the differences are clustered into one or more 4K chunks, but almost every
byte within these chunks differs from the original.
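The chunk-level clustering can be measured directly from the 'cmp -l' output. A sketch (file paths are placeholders):

```shell
# cmp -l prints the 1-based byte offset of every differing byte.
# Group offsets by 4K chunk and count differing bytes per chunk.
cmp -l /diska/file.DAT /diskb/file.DAT \
  | awk '{ print int(($1 - 1) / 4096) }' \
  | uniq -c
# One output line per corrupted 4K chunk: "<differing-byte-count> <chunk-index>"
```

If almost every byte in a chunk differs, the count on each line will be close to 4096.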
I tested the hardware using Maxtor's PowerMax diagnostics (both a full read and
burn-in tests) and memtest86+ (multiple passes running overnight). The hardware
tests all passed.
I tried updating the hard disk firmware and the motherboard BIOS. That did not
solve the problem.
I tried running a similar copy test under Windows XP (32-bit, on separate
partitions) using Cygwin. All files copied correctly.
I had both FC5 and FC6 RC2 installed, so I tried copying files under different
kernels. In all versions -- 2.6.15-1.2056_FC5, 2.6.17-1.2630.fc6, and
2.6.18-1.2798.fc6 -- I was able to reproduce the problem.
I found a linux-kernel message on the web (http://lkml.org/lkml/2006/9/8/289)
from someone who has a similar hardware and software setup to mine and was
having data corruption problems as well. I wrote to him for a follow-up, and he
reports that after upgrading the kernel from 2.6.17-1.2157_FC5 to 2.6.18-rc6, he
has not seen any further data corruption, although he admits that they have not
done much follow-up testing.
I collected MD5 checksums for the individual 4K blocks in the files that had
been corrupted (both source and destination) and discovered something very
interesting: some of the corrupted blocks in the destination files were found to
be an exact MD5 match for *different* blocks in previously copied files! Since
the hard drive's cache is only 16MB and the distance between one of the
source blocks and the destination block to which it got incorrectly written was
over 751MB, I think that's a clear indication that the kernel is writing the
wrong disk buffer out to disk. Most of my system's 4GB of memory is being used
for cache (3.4GB).
At this point I suspect a race condition may be overwriting buffer pointers.
But given the number of times I've copied the same files over and over in
testing, it's also possible I'm looking at old sector data in which buffered
writes never got flushed out. I also would not rule out the possibility of a
hardware problem with DMA transfers, though I have no idea how to test that.
Version-Release number of selected component (if applicable):
Tested with the following kernels: 2.6.15-1.2056_FC5, 2.6.17-1.2630.fc6, and
2.6.18-1.2798.fc6.
How reproducible:
I'd estimate 1 bad 4K block in every 200,000 for unbroken streaming writes.
I've been able to reproduce the problem fairly consistently, except that this
morning, after starting the computer up, I was able to copy over 23GB of data
without any checksum errors. I don't know if that's because the computer is
cooler or if there is some other condition of the test that's different.
Steps to Reproduce:
1. Create (or download/copy from an external source) one or more files roughly
1GB in size.
2. Copy the file(s) from one hard disk to another.
3. Run md5sum over the original and the copy and compare.
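The steps above can be scripted as follows (a sketch; the mount points and test file name are placeholders):

```shell
# Step 1: create a ~1GB test file on disk A.
dd if=/dev/urandom of=/diska/test.DAT bs=1M count=1024
# Step 2: copy it to disk B.
cp -av /diska/test.DAT /diskb/test.DAT
# Step 3: checksum original and copy; the two hashes should match.
md5sum /diska/test.DAT /diskb/test.DAT
```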
Actual results:
MD5 checksums are different in 50-75% of the files. A bytewise comparison using
'cmp -l' shows that differences are localized in one or a few 4K chunks.
Expected results:
MD5 checksums should be equal.
Additional info:
Motherboard: Tyan Thunder K8WE (S2895A), BIOS upgraded to 1.04
CPU: (2) AMD Opteron 270 HE, 2.0GHz
I/O Controller: on-board nVidia nForce4
Hard disks: (2) Maxtor DiamondMax 10 300GB SATA, firmware upgraded to BANC1G20
Here are the results of my most recent test. I started with a group of files
that I had copied onto both hard drives, and repeatedly re-synced until the
checksums matched. I then ran a shell command which went through each file and
made one copy from disk A to disk B and another copy from disk B to disk A:
$ for file in *.DAT ; do
>   cp -av /diska/$file /diskb/$file.2
>   cp -av /diskb/$file /diska/$file.2
> done
Next, I gathered MD5 checksums for each 4K block in the source and destination
files:
$ for file in *.DAT *.DAT.2 ; do
>   size=`du $file | cut -f 1` ; size=$((size/4)) ; block=0
>   cat /dev/null > $file.MD5s
>   while [ $block -lt $size ] ; do
>     echo -ne "$file: $block\r"
>     dd if=$file bs=4096 skip=$block count=1 2>/dev/null | md5sum >> $file.MD5s
>     block=$((block+1))
>   done
>   echo "$file: finished"
> done
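The loop above runs one dd invocation per 4K block, which is slow for multi-GB files. An equivalent sketch (assuming GNU coreutils, with $file set to the file being checksummed) splits the file once and produces the same "hash  -" line format:

```shell
# Split the file into 4096-byte chunks in one pass, then checksum each chunk.
# Suffix length 8 keeps the chunk file names in block order.
tmp=$(mktemp -d)
split -b 4096 -a 8 -d "$file" "$tmp/blk."
ls "$tmp" | sort | while read chunk ; do
  md5sum < "$tmp/$chunk"
done > "$file.MD5s"
rm -r "$tmp"
```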
Using the .MD5s files I could easily compare chunks in one file to chunks in
other files no matter which block they occurred in. I will attach an extract
from these files showing the blocks that differed between the original and
copy A or B. In all but one case, the corrupted block matches a block that came
from an earlier file or an earlier block of the same file. The sole exception
(copy A, file 3, block 159829) can be explained by the fact that I copied the
same file from A to B first, so that whole file would have been cached when
copying back from B to A.
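Since each .MD5s file holds one hash per 4K block, a corrupted block's hash can be traced back to its origin with grep. A sketch (the FILE3 name is a placeholder):

```shell
# The hash of destination block N is on line N+1 of its .MD5s file.
block=159829
bad=$(sed -n "$((block + 1))p" FILE3.DAT.2.MD5s | cut -c 1-32)
# Search every block list for the same hash; grep -n reports line = block + 1.
grep -n "^$bad" *.MD5s
```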
This still doesn't rule out the possibility that buffers simply aren't being
written out. So for my next test, I'm going to erase all of these copies, fill
the hard disk (as much as I can) with a simple fixed pattern, and try creating
the copies again.
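One way to run that stale-sector test (a sketch; the mount point is a placeholder, and the dd write simply stops when the disk fills):

```shell
# Fill the free space with a recognizable repeating pattern, sync, then
# delete the fill file. If a later "corrupted" block contains this pattern,
# it is stale sector data rather than a misdirected buffer.
yes 'STALE-SECTOR-FILL-PATTERN' | dd of=/diskb/fill.tmp bs=1M
sync
rm /diskb/fill.tmp
```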
Created attachment 139072 [details]
List of MD5 checksums for corrupted blocks
I'm sorry, but after numerous tests in trying to analyze the problem in a more
controlled environment, I have been unable to reproduce the file corruption. At
the moment my best guess is that I had not power-cycled the computer between
upgrading the hard drive firmware and performing the test copies last Friday.
All test files have copied without corruption since I switched the computer on
yesterday. So I think that the firmware upgrade required a cold
restart in order to take effect.
I don't understand why that would be the case, since everything I can find on
this firmware upgrade seems to indicate it is meant to fix a drive detection
problem, not data corruption, and that doesn't explain why the errors are in
Linux page sizes. But then Maxtor doesn't say exactly what the difference is
between BANC1G10 and BANC1G20.