Bug 47098

Summary: Corrupt IDE data transfers when using more than one IDE disk
Product: [Retired] Red Hat Linux
Reporter: Simon Matter <simon.matter>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED CURRENTRELEASE
QA Contact: Brock Organ <borgan>
Severity: high
Priority: medium    
Version: 7.1   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-09-30 15:39:04 UTC

Description Simon Matter 2001-07-03 07:17:38 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.77 [de] (X11; U; Linux 2.2.19-6.2.7 i686)

Description of problem:
Simultaneous access to more than one IDE disk with DMA enabled can lead to
massive data corruption under RH kernels 2.4 with certain chipset
combinations.

How reproducible:
Always

Steps to Reproduce:
1. Install two or more hard disks on the target system.
2. Install RH-7.1 and put filesystems on every disk.
3. Run heavy load on all disks simultaneously by copying directory trees
from an NFS server to multiple disks.
4. Use diff on the copied filesystems to show differences. If your root
partition is on RAID 0, 1 or 5 across more than one disk, use rpm -Va to
show the mess.
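
The comparison in step 4 can also be scripted. The following is only a
minimal sketch, not part of the original report: it assumes Python is
available on the test machine, and the function names file_digest and
compare_trees are made up for illustration. It hashes every regular file in
a reference tree and in its copy and lists the files that are missing or
differ, which is roughly what a recursive diff would reveal.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(reference_root, copy_root):
    """Compare every regular file under reference_root with its copy.

    Returns two sorted lists of paths relative to reference_root:
    files missing from the copy, and files whose contents differ.
    (Hypothetical helper for illustration; files that exist only in
    the copy are not reported.)
    """
    missing, differing = [], []
    for dirpath, _dirnames, filenames in os.walk(reference_root):
        for name in filenames:
            ref_path = os.path.join(dirpath, name)
            rel = os.path.relpath(ref_path, reference_root)
            copy_path = os.path.join(copy_root, rel)
            if not os.path.isfile(copy_path):
                missing.append(rel)
            elif file_digest(ref_path) != file_digest(copy_path):
                differing.append(rel)
    return sorted(missing), sorted(differing)
```

Calling compare_trees with the NFS-mounted source tree and one of the local
copies returns both lists; on a healthy system both should be empty.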

Actual Results:  Copied filesystems differ, and sometimes the filesystem
itself becomes corrupt. rpm -Va reports far too many files.

Expected Results:  diff should not report differences, and rpm -Va should
list only some configuration files, but of course no static files like
program binaries.
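
To separate the expected rpm -Va noise (changed configuration files) from
the worrying entries, the output can be filtered. This is a rough sketch
under stated assumptions, not part of the original report: it assumes the
usual rpm verify line layout (a flag string such as S.5....T or the word
"missing", an optional one-letter marker such as c for config files, then
the path), and the function names are hypothetical.

```python
def parse_verify_line(line):
    """Split one line of `rpm -Va` output into (flags, marker, path).

    Returns None for lines that do not contain an absolute path.
    Assumes the layout "<flags> [marker] /path".
    """
    line = line.rstrip("\n")
    idx = line.find(" /")
    if idx == -1:
        return None
    head = line[:idx].split()
    if not head:
        return None
    flags = head[0]
    marker = head[1] if len(head) > 1 else ""
    return flags, marker, line[idx + 1:]


def suspicious_files(lines):
    """Return paths of non-config files that are missing or whose
    digest check failed (a "5" in the flag string)."""
    result = []
    for line in lines:
        parsed = parse_verify_line(line)
        if parsed is None:
            continue
        flags, marker, path = parsed
        if marker == "c":
            continue  # changed configuration files are expected
        if flags == "missing" or "5" in flags:
            result.append(path)
    return result
```

Feeding the captured output of rpm -Va to suspicious_files should give an
empty list on a healthy installation; program binaries showing up there
point to the corruption described above.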

Additional info:

Hardware where I could reproduce it:
1. DELL Precision Workstation 220 (i820 chipset, onboard UDMA IDE
controller), two IBM drives (IC35L060AVER07-0) connected to the onboard
controller with the original 80-wire cable.
2. DELL Precision Workstation 220 (i820 chipset), Promise Ultra100TX2 UDMA
IDE controller, four IBM drives (IC35L060AVER07-0) connected to the Promise
controller with two 80-wire cables.
3. DELL PowerEdge 1400 Server, ServerWorks CNB20LE chipset, one Promise
Ultra100TX2 UDMA IDE controller, four IBM drives (IC35L060AVER07-0)
connected to the Promise controller with two 80-wire cables.
4. DELL PowerEdge 1400 Server, ServerWorks CNB20LE chipset, two Promise
Ultra100TX2 UDMA IDE controllers, four IBM drives (IC35L060AVER07-0), each
connected to the Promise controllers as master with its own 80-wire cable
(four cables in total).

Hardware that runs fine:
1. My old home server with i430HX chipset, Promise Ultra100TX2 UDMA IDE
controller, four Quantum Fireball LM15 drives connected to the Promise
controller with two 80-wire cables.

Comment 1 Arjan van de Ven 2001-07-03 08:07:10 UTC
My Promise controller seems to work fine with my 2 IBM drives, but they
are each on a separate 12" 80-wire ribbon cable. The surprising part is that
non-IBM drives seem to work fine. Hmm.

I assume this is kernel 2.4.2-2; we released an updated 2.4.3-12 kernel 2 weeks
ago.


Comment 2 Simon Matter 2001-07-03 08:39:02 UTC
I was using 2.4.2-2, 2.4.3-12 and even rawhide 2.4.5-something; the result was
the same on all of them. I guess your Promise is not an Ultra100TX2, since the
TX2 is only supported in 2.4.3-12. But anyway, it happened in exactly the same
way on the DELL i820-based system. This was the reason for me to buy the
Promise controllers.

I'm quite frustrated because I've seen some similar Bugzilla reports here, and
it seems that even people on W2K are having the same problem. Some blame VIA,
some blame Promise, some blame Highpoint, but the problem still persists, and
it is always corruption of data transfers when using more than one IDE disk.

FYI, as stated above, I also connected every disk to its own channel, but it
didn't change anything.

I guess something goes wrong with the DMA transfers between IDE chip and
CPU/memory and it's not a problem of the transfers between the disk and the IDE
chip.


Comment 3 Bob Farmer 2001-07-30 11:23:19 UTC
I've used a Promise 100 and a Promise 100 TX2 at the same time (each with 2
drives attached on separate channels, for a total of 4 drives) in a ServerWorks
LE (SuperMicro 370DLE) system, at UDMA 5, with no corruption problems.  I did
occasionally get the "interrupt lost" error followed by IDE I/O hanging etc. 
BTW, don't even think about trying to use the on-board IDE on those ServerWorks
systems, see bug 38429 for more info. 


Comment 4 Steve Timm 2001-09-10 16:56:06 UTC
Has anyone else seen corruption of this sort on 2.2.19 or greater kernels?
We are dealing with various disk errors with a Supermicro 370DLE board,
ServerWorks LE chipset, and an IDE system disk (master) and CD-ROM (slave)
on the ide0 bus of the onboard IDE controller.  With a Seagate system disk we
have seen the same corruption as described above.  With a Western Digital
system disk we have seen a large number of I/O errors that make the
system hang and the disk appear to be corrupted, but it comes back
after reboot and fsck.

Steve Timm


Comment 5 Steve Timm 2001-09-25 17:27:00 UTC
Further notes on this problem:  

We have now installed kernel 2.4.7-2.9smp out of Rawhide and we 
see the same behavior that we did before, namely that there are an 
excessive number of I/O errors on the system disk, the system
hangs, but is OK after reboot and fsck.
It would seem that rumours that this fault has been fixed are
somewhat premature.



Comment 6 Steve Timm 2001-10-29 17:36:50 UTC
With a Tyan 2510 motherboard we see DMA problems with
(1) all 3 disks on the motherboard
(2) system disk on motherboard and two data disks on the primary
channel of a Promise Ultra-66.
(3) system disk on the primary channel of the Promise Ultra-66 and
two data disks on the secondary channel.
In this third configuration we also see the "interrupt lost"
errors mentioned above, as well as the DMA timeouts on any
and all three of the drives that are also seen in configurations (1)
and (2).

Does anyone know any way to make a stable three-drive IDE configuration
with this Tyan 2510 board...we could use either 2.2 or 2.4 kernels...
just want to find something that works.



Comment 7 Bugzilla owner 2004-09-30 15:39:04 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/