From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)

Description of problem:
I installed the RH7.1 server system on a PC containing 4 IBM UDMA disks. When those disks are configured as a RAID 5 array, the file produced by the cp (or dd) command is DIFFERENT from the original file! However, on the same hardware, cp (and dd) produce a target file that is the SAME as the original if I DO NOT use a RAID array.

How reproducible:
Always

Steps to Reproduce:
1. cp dx100.tar dx100.tar1 (dx100.tar is 650MB in size)
2. cmp dx100.tar dx100.tar1
3. cp dx100.tar dx100.tar2 (just repeat the procedure)
4. cmp dx100.tar dx100.tar2

Actual Results:
The cmp result of step 2: "dx100.tar and dx100.tar1 differ: char 9359357 line 33367"
The cmp result of step 4: "dx100.tar dx100.tar2 differ: char 106258429 line 1321617"

Expected Results:
cmp should return 0 (not 1), because both dx100.tar1 and dx100.tar2 are direct copies of dx100.tar.

Additional info:
The hardware is as follows:
Motherboard: Epox (EP-D3VA)
CPU: PIII 866MHz x 2 (dual CPU)
RAM: 512MB
Hard disks: 4 x IBM UDMA 41.0GB

I also installed RH7.1 server on the same PC without configuring the hard drives as a RAID array. The cp command produced the correct result. Furthermore, I have 2 other servers using SCSI RAID5 disks (under RedHat 6.2 and RedHat 7.0), and cp works correctly on them. I also have a server using IDE RAID1 under RedHat 7.0, where cp works correctly as well.
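For anyone trying to reproduce this, the copy-and-compare steps above could be looped roughly as follows. This is only a sketch: the function name `verify_copies` and the `.copyN` file suffix are made up for illustration, and the test file should probably be larger than RAM (as the 650MB file on a 512MB machine is here) so that cmp is more likely to read back from disk rather than from cache.

```shell
#!/bin/sh
# Hypothetical helper: copy SOURCE several times and compare every copy
# against the original. Usage: verify_copies SOURCE [ROUNDS]
verify_copies() {
    src=$1
    rounds=${2:-5}
    failures=0
    i=1
    while [ "$i" -le "$rounds" ]; do
        dst="$src.copy$i"
        cp "$src" "$dst" || return 2
        # cmp exits 0 when the files are identical and 1 when they differ;
        # on a mismatch it also prints the first differing byte.
        if cmp "$src" "$dst"; then
            echo "round $i: identical"
        else
            echo "round $i: MISMATCH"
            failures=$((failures + 1))
        fi
        rm -f "$dst"
        i=$((i + 1))
    done
    echo "$failures of $rounds copies differed"
    # Succeed only if no copy differed from the original.
    [ "$failures" -eq 0 ]
}
```

Running `verify_copies dx100.tar 10` on the RAID 5 filesystem and again on a non-RAID filesystem should show whether the mismatches are tied to the array.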
cp and dd don't distinguish between RAID and non-RAID devices. I tend to believe this is a RAID driver bug.
I assume this is Linux software RAID. Could you try booting with "ide=nodma" as a kernel command-line option to rule out IDE DMA corruption?
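After rebooting, it may be worth confirming that the option actually reached the kernel before re-running the test. A minimal sketch, assuming /proc/cmdline is available on the running kernel; the function name `has_boot_option` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if OPTION appears as a whole word on the
# kernel command line. Usage: has_boot_option OPTION [CMDLINE_FILE]
has_boot_option() {
    opt=$1
    file=${2:-/proc/cmdline}
    # Put each space-separated word on its own line, then look for an
    # exact, literal, whole-line match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$opt"
}
```

For example, `has_boot_option ide=nodma && echo "DMA disabled at boot"`.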
After further testing, I believe that the problem might be in the driver for the onboard HPT370 IDE adaptor. I have been using a motherboard (EPOX D3VA) with 2 conventional UDMA66 channels and 2 further UDMA100 channels based on an HPT370 (HighPoint 370) chipset. Under Redhat 7.1, the disks on the conventional UDMA66 channels are /dev/hda, /dev/hdb, ..., /dev/hdd and the disks on the HPT channels are /dev/hde, ..., /dev/hdh.

Using Redhat 7.1 software RAID 5 on the conventional UDMA66 channels (/dev/hda, ..., /dev/hdd), the result is OK and there is NO data corruption! This indicates that the RAID 5 software in Redhat 7.1 is OK. However, I got the reported data corruption problem when connecting the SAME disks, with the SAME RAID 5 configuration (just changing /dev/hda to /dev/hde, etc. in /etc/raidtab), to the HPT channels.

Furthermore, when I moved the disks, still holding the files that were reported as corrupted, back to the UDMA66 channels (using /dev/hda, etc.) and ran cmp, it magically returned 0 (i.e. no data corruption!). Further tests show NO data corruption on an individual disk connected to the HPT channels (i.e. not running RAID5). Therefore, I now suspect that the problem is caused by the combination of RAID5 and the HPT370 driver.
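The raidtab change described above could be scripted along these lines. This is a sketch under assumptions: the mapping hda→hde, hdb→hdf, hdc→hdg, hdd→hdh is a guess at what "etc." covers, and the function name `remap_raidtab` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rewrite device names in a raidtab so that an array
# defined on the conventional channels points at the HPT370 channels.
# Reads the old raidtab on stdin and writes the new one to stdout.
# Assumed mapping: hda->hde, hdb->hdf, hdc->hdg, hdd->hdh.
remap_raidtab() {
    sed -e 's|/dev/hda|/dev/hde|g' \
        -e 's|/dev/hdb|/dev/hdf|g' \
        -e 's|/dev/hdc|/dev/hdg|g' \
        -e 's|/dev/hdd|/dev/hdh|g'
}
```

For example, `remap_raidtab < /etc/raidtab > /tmp/raidtab.hpt` leaves the original file untouched so the two configurations can be swapped back and forth between tests.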
Sorry, I have already taken that machine apart and replaced the motherboard with a different one (SuperMicro 370DLE), using a 3ware (http://www.3ware.com) Escalade 6800 RAID card. However, I still choose to use software RAID, as it offers better performance than the hardware RAID5. Because the 3ware card presents the UDMA disks through a SCSI interface, the software RAID5 (on SCSI disks) does not suffer data corruption. I will try ide=nodma the next time I set up the Epox D3VA hardware and let you know the result.
Closing: this appears to be the old VIA chipset hardware problem, which has since been worked around.