From Bugzilla Helper:
User-Agent: Mozilla/4.77 [de] (X11; U; Linux 2.2.19-6.2.7 i686)

Description of problem:
Simultaneous access to more than one IDE disk with DMA enabled can lead to massive data corruption under RH 2.4 kernels with certain chipset combinations.

How reproducible:
Always

Steps to Reproduce:
1. Install two or more hard disks on the target system.
2. Install RH-7.1 and put filesystems on every disk.
3. Put heavy load on all disks simultaneously by copying directory trees from an NFS server to multiple disks.
4. Use diff on the copied filesystems to show differences. If your root partition is on RAID 0, 1, or 5 across more than one disk, use rpm -Va to show the mess.

Actual Results:
The copied filesystems differ, and sometimes the filesystem itself gets corrupted. rpm -Va reports far too many files.

Expected Results:
diff should not report differences. rpm -Va should show only some configuration files, and of course no static files like program binaries.

Additional info:

Hardware where I could reproduce it:
1. DELL Precision Workstation 220 (i820 chipset, onboard UDMA IDE controller), two IBM drives (IC35L060AVER07-0) connected to the onboard controller with the original 80-wire cable.
2. DELL Precision Workstation 220 (i820 chipset), Promise Ultra100TX2 UDMA IDE controller, four IBM drives (IC35L060AVER07-0) connected to the Promise controller with two 80-wire cables.
3. DELL PowerEdge 1400 Server, ServerWorks CNB20LE chipset, one Promise Ultra100TX2 UDMA IDE controller, four IBM drives (IC35L060AVER07-0) connected to the Promise controller with two 80-wire cables.
4. DELL PowerEdge 1400 Server, ServerWorks CNB20LE chipset, two Promise Ultra100TX2 UDMA IDE controllers, four IBM drives (IC35L060AVER07-0) connected to the Promise controllers, each as master, with four 80-wire cables.

Hardware that runs fine:
1. My old home server with i430HX chipset, Promise Ultra100TX2 UDMA IDE controller, four Quantum Fireball LM15 drives connected to the Promise controller with two 80-wire cables.
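For anyone who wants to repeat the comparison in step 4, here is a minimal sketch of the check, assuming Python is available on the test machine. It is not part of the original test procedure, which used plain diff. The function names (file_digest, tree_digests, compare_trees) are my own and purely illustrative. The sketch hashes every regular file in the reference tree and in the copy, then lists files that are missing, unexpected, or differ in content.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest.

    Symlinks are skipped so that only real file contents are compared.
    """
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_trees(reference, copy):
    """Return (missing, extra, corrupted) as sorted lists of relative paths.

    missing:   present in the reference tree but not in the copy
    extra:     present in the copy but not in the reference tree
    corrupted: present in both, but with different contents
    """
    ref = tree_digests(reference)
    cpy = tree_digests(copy)
    missing = sorted(set(ref) - set(cpy))
    extra = sorted(set(cpy) - set(ref))
    corrupted = sorted(p for p in set(ref) & set(cpy) if ref[p] != cpy[p])
    return missing, extra, corrupted
```

Call compare_trees with the NFS source directory and one of the local copies (the paths depend on your setup). On a healthy system all three lists should come back empty. Note that the page cache can hide on-disk corruption if the copy is read back right away, so it is safer to remount the filesystem or reboot before comparing.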
My Promise controller seems to work fine with my 2 IBM drives, but they are each on a separate 12" 80-wire ribbon cable. The surprising part is that non-IBM drives seem to work fine. Hmm. I assume this is kernel 2.4.2-2; we released an updated 2.4.3-12 kernel two weeks ago.
I was using both 2.4.2-2 and 2.4.3-12, and even rawhide 2.4.5-something. All the same. I guess your Promise is not an Ultra100TX2, since the TX2 is only supported from 2.4.3-12 on. But anyway, it happened on the DELL i820-based system in exactly the same way. That was the reason I bought the Promise controllers. I'm quite frustrated, because I've seen some similar Bugzilla reports here and it seems even people on W2K have the very same problem. Some blame VIA, some blame Promise, some blame Highpoint, but the problem still persists, and it is always corruption of data transfers when using more than one IDE disk. FYI, as stated above, I also connected every disk to its own channel, but it didn't change anything. I guess something goes wrong with the DMA transfers between the IDE chip and CPU/memory, and it's not a problem with the transfers between the disk and the IDE chip.
I've used a Promise 100 and a Promise 100 TX2 at the same time (each with 2 drives attached on separate channels, for a total of 4 drives) in a ServerWorks LE (SuperMicro 370DLE) system, at UDMA 5, with no corruption problems. I did occasionally get the "interrupt lost" error, followed by IDE I/O hanging and so on. BTW, don't even think about trying to use the on-board IDE on those ServerWorks systems; see bug 38429 for more info.
Has anyone else seen corruption of this sort on 2.2.19 or later kernels? We are dealing with various disk errors on a Supermicro 370DLE board with the ServerWorks LE chipset, an IDE system disk (master), and a CD-ROM (slave) on the ide0 bus of the onboard IDE controller. With a Seagate system disk we have seen the same corruption as described above. With a Western Digital system disk we have seen a large number of I/O errors that make the system hang and the disk appear corrupted, but it comes back after a reboot and fsck. Steve Timm
Further notes on this problem: We have now installed kernel 2.4.7-2.9smp out of Rawhide and we see the same behavior that we did before, namely that there are an excessive number of I/O errors on the system disk, the system hangs, but is OK after reboot and fsck. It would seem that rumours that this fault has been fixed are somewhat premature.
With the Tyan 2510 motherboard we see DMA problems with (1) all 3 disks on the motherboard; (2) the system disk on the motherboard and two data disks on the primary channel of a Promise Ultra-66; (3) the system disk on the primary channel of the Promise Ultra-66 and two data disks on the secondary channel. In this third configuration we also see the "interrupt lost" errors mentioned above, as well as the DMA timeouts on any and all three of the drives that are also seen in configurations (1) and (2). Does anyone know of a way to make a stable three-drive IDE configuration with this Tyan 2510 board? We could use either 2.2 or 2.4 kernels; we just want to find something that works.
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/