Bug 35981

Summary: file corruption
Product: [Retired] Red Hat Linux
Component: kernel
Version: 7.1
Hardware: i386
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Reporter: Chris Ricker <chris.ricker>
Assignee: Michael K. Johnson <johnsonm>
QA Contact: Brock Organ <borgan>
Doc Type: Bug Fix
Last Closed: 2001-09-05 17:37:44 UTC

Description Chris Ricker 2001-04-15 20:38:54 UTC
This is with a clean install of seawolf on a dual celery system with two
IDE drives, an IBM-DTLA-307045 as hde and a WDC AC313000R as hdh, running off
of a Promise PDC20267 ATA-100 controller.

I'm experiencing subtle file corruptions when I do large file transfers.
Occasionally binaries on /usr are getting modified (as evidenced by failure
to run and changed md5sums according to rpm -V).  In addition, I've noticed
some corruptions in C source files -- never more than 2 or 3 bytes worth
changed.
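
The rpm -V check described above can be narrowed to checksum failures only. This is a minimal sketch, assuming the usual verify output in which the third character of the flag column is "5" when a file's checksum differs. The function name digest_failures is hypothetical, and the simple field split assumes paths without spaces.

```shell
# digest_failures: read `rpm -V` output on stdin and print the paths of
# files whose flag string has '5' in the third position (checksum differs).
# Hypothetical helper, not part of rpm. Assumes paths contain no spaces.
digest_failures() {
    awk 'substr($1, 3, 1) == "5" { print $NF }'
}

# Usage (verifies every installed package, so it can take a while):
#   rpm -Va | digest_failures
```

Lines that report only a timestamp change, or that start with "missing", are skipped, which leaves the files whose contents actually changed.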

I'll fire up a cerberus run and see if I can get something more readily
reproducible out of that.

Comment 1 Arjan van de Ven 2001-04-15 20:49:11 UTC
What chipset is this?
And does it happen if you disable DMA?
(e.g. "ide=nodma" on the kernel command line)

Comment 2 Chris Ricker 2001-04-15 22:29:43 UTC
The motherboard is an i440-BX (ABIT BP6?).  Onboard, it has PIIX4 and HPT366
1.26, of which only the PIIX4 is being used (for a CD burner and a Zip).  The
hard drives are running off a Promise controller.

I can't boot the machine with ide=nodma.  When I do, I get constant status
errors, status timeouts, and drives not ready on both drives at the first
attempt to write to them, and the machine is hung.

Booting without any options, I can reliably reproduce corruption by running the
Cerberus (1.3pre1) data test simultaneously against partitions on both drives.
The box invariably fails md5sums on the writes, and will often eventually hard
lock (though the lockup is not totally reproducible).
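
For readers without Cerberus at hand, the kind of check described above (write data, read it back, compare md5sums, on both drives at once) can be approximated roughly as follows. This is only a sketch under stated assumptions and is not how Cerberus itself works: the function name write_verify and the mount points in the usage comment are hypothetical, and the read-back may be served from the page cache unless the file is larger than RAM.

```shell
# write_verify DIR SIZE_MB: write a random file into DIR, copy it within
# DIR, and compare md5sums of the original and the copy.
# Prints "DIR: ok" or "DIR: MISMATCH". Hypothetical helper.
# The body runs in a subshell so its variables stay local.
write_verify() (
    dir=$1
    size_mb=$2
    src="$dir/wv.$$.src"
    dst="$dir/wv.$$.dst"
    dd if=/dev/urandom of="$src" bs=1M count="$size_mb" 2>/dev/null
    cp "$src" "$dst"
    sync    # flush dirty pages to disk; cached pages may still serve the reads
    a=$(md5sum < "$src")
    b=$(md5sum < "$dst")
    rm -f "$src" "$dst"
    if [ "$a" = "$b" ]; then
        echo "$dir: ok"
    else
        echo "$dir: MISMATCH"
    fi
)

# Usage, with one hypothetical mount point per drive, run concurrently:
#   write_verify /mnt/on_hde 512 &
#   write_verify /mnt/on_hdh 512 &
#   wait
```

Running the two calls in the background at the same time matters here, since the load has to hit both drives simultaneously.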

I get these results whether I run the seawolf smp kernel or a custom-built smp
kernel (built from the seawolf 2.4.2-2 source tree).

Comment 3 Arjan van de Ven 2001-04-15 22:34:45 UTC
Are you using 80-conductor ribbon cables?

Comment 4 Chris Ricker 2001-04-15 22:37:52 UTC
yep

Comment 5 Arjan van de Ven 2001-04-15 22:42:29 UTC
I assume you're not overclocking your system.
Can you give the output of "hdparm -i /dev/hde" ?
(that tells us which DMA mode the drive is in)

Comment 6 Chris Ricker 2001-04-15 22:46:18 UTC
Nope, it's not overclocked (and it's an oldish setup that was quite stable with
7.0 and 6.x before that).  The drives are booting up in udma5 for the IBM and
udma2 for the WD (no hdparm tweaking done).

[kaboom@verdande kaboom]$ sudo hdparm -i /dev/hde

/dev/hde:

 Model=IBM-DTLA-307045, FwRev=TX6OA50C, SerialNo=YMDYMM98798
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=40
 BuffType=DualPortCache, BuffSize=1916kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=-66060037, LBA=yes, LBAsects=90069840
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
[kaboom@verdande kaboom]$ sudo hdparm -i /dev/hdh

/dev/hdh:

 Model=WDC AC313000R, FwRev=17.01J17, SerialNo=WD-WT6760209354
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
 BuffType=DualPortCache, BuffSize=512kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=-66060037, LBA=yes, LBAsects=25429824
 IORDY=on/off, tPIO={min:160,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4
[kaboom@verdande kaboom]$
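
In the hdparm -i output above, the active transfer mode is the one marked with an asterisk on the "DMA modes:" line. A small sketch for pulling that value out of the output follows; the function name active_dma_mode is hypothetical and it assumes the output layout shown above.

```shell
# active_dma_mode: read `hdparm -i` output on stdin and print the mode
# marked with '*' on the "DMA modes:" line. Prints nothing if no mode
# on that line is marked. Hypothetical helper, not part of hdparm.
active_dma_mode() {
    sed -n '/DMA modes:/s/.*\*\([a-z0-9]*\).*/\1/p'
}

# Usage:
#   sudo hdparm -i /dev/hde | active_dma_mode
```

Fed the two listings above, this yields udma5 for hde and udma2 for hdh, matching the modes reported in this comment.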


Comment 7 Arjan van de Ven 2001-04-15 22:56:23 UTC
Thanks for the information. I'll contact the IDE maintainer and see if any
of this rings a bell.

Comment 8 Chris Ricker 2001-06-27 17:18:51 UTC
This bug still exists with kernel-smp-2.4.3-12.

Comment 9 Chris Ricker 2001-08-15 19:12:53 UTC
I'm seeing what I think is the same problem on that machine (now with ext3)
under beta3 and using a kernel compiled from kernel-source-2.4.7-0.13.1.
Binaries are getting slightly corrupted, segfaulting, and failing checksum
validations.

Comment 10 Chris Ricker 2001-09-05 17:37:39 UTC
I only see the corruption if both the WD and the IBM are in use.  If I just have
either one by itself, things seem fine.

Might there be some sort of timing chatter going on?

Comment 11 Chris Ricker 2002-06-06 04:07:19 UTC
I'm closing this, since I've not been seeing this problem with 7.3 on that machine.