Bug 35981 - file corruption
Summary: file corruption
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i386
OS: Linux
Target Milestone: ---
Assignee: Michael K. Johnson
QA Contact: Brock Organ
Depends On:
Reported: 2001-04-15 20:38 UTC by Chris Ricker
Modified: 2005-10-31 22:00 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2001-09-05 17:37:44 UTC

Description Chris Ricker 2001-04-15 20:38:54 UTC
This is with a clean install of seawolf on a dual celery system with two
IDE drives, an IBM-DTLA-307045 as hde and a WDC AC313000R as hdh running off
of a Promise PDC20267 ATA-100 controller.

I'm experiencing subtle file corruptions when I do large file transfers.
Occasionally binaries on /usr are getting modified (as evidenced by failure
to run and changed md5sums according to rpm -V).  In addition, I've noticed
some corruptions in C source files -- never more than 2 or 3 bytes' worth.

I'll fire up a cerberus run and see if I can get something more readily
reproducible out of that.
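
To pin down corruption of only a few bytes, a suspect file can be compared against a known-good copy (for example one extracted fresh from the original package). The following is a minimal sketch, not a tool used in this report; the function names `md5_of` and `differing_offsets` are hypothetical, chosen for illustration.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_offsets(good_path, suspect_path, chunk_size=1 << 20):
    """Return (offset, good_byte, suspect_byte) for every mismatching byte.

    Only the common length of the two files is compared; a difference
    in file length should be checked separately.
    """
    mismatches = []
    offset = 0
    with open(good_path, "rb") as good, open(suspect_path, "rb") as suspect:
        while True:
            a = good.read(chunk_size)
            b = suspect.read(chunk_size)
            if not a or not b:
                break
            for i, (x, y) in enumerate(zip(a, b)):
                if x != y:
                    mismatches.append((offset + i, x, y))
            offset += min(len(a), len(b))
    return mismatches
```

Recording the offsets and the old and new byte values may help show whether the damage follows a pattern, such as a single flipped bit or a fixed alignment.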

Comment 1 Arjan van de Ven 2001-04-15 20:49:11 UTC
What chipset is this?
And does it happen if you disable DMA?
(e.g. "ide=nodma" on the kernel command line)

Comment 2 Chris Ricker 2001-04-15 22:29:43 UTC
The motherboard is an i440-BX (ABIT BP6?).  Onboard, it has PIIX4 and HPT366
1.26, of which only the PIIX4 is being used (for a CD burner and a Zip).  The
hard drives are running off a Promise controller.

I can't boot the machine with ide=nodma.  When I do, I get constant status
errors, status timeouts, and drives not ready on both drives at the first
attempt to write to them, and the machine is hung.

Booting without any options, I can reliably reproduce corruption by running the
Cerberus (1.3pre1) data test simultaneously against partitions on both drives.
The box invariably fails md5sums on the writes, and will often eventually hard
lock (though the lockup is not totally reproducible).

I get these results whether I run the seawolf smp kernel or a custom-built smp
kernel (built from the seawolf 2.4.2-2 source tree).
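
The kind of test described above (write data to both drives at once, then check md5sums) can be approximated with a short script. This is a rough sketch under stated assumptions, not Cerberus itself: the names `write_and_verify` and `stress_directories` and the scratch file name are hypothetical, and the directories passed in are assumed to sit on the two drives under test. Note that a read-back shortly after a write may be served from the page cache, so the test size would need to exceed available RAM to exercise the disks.

```python
import hashlib
import os
import threading


def write_and_verify(directory, size_bytes, results, chunk_size=1 << 20):
    """Write random data to a scratch file, read it back, compare MD5s."""
    path = os.path.join(directory, "corruption_test.bin")  # hypothetical name
    written = hashlib.md5()
    remaining = size_bytes
    with open(path, "wb") as out:
        while remaining > 0:
            chunk = os.urandom(min(chunk_size, remaining))
            written.update(chunk)
            out.write(chunk)
            remaining -= len(chunk)
        out.flush()
        os.fsync(out.fileno())  # push the data out to the device
    read_back = hashlib.md5()
    with open(path, "rb") as src:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            read_back.update(chunk)
    os.remove(path)
    # True means the data read back matches what was written.
    results[directory] = written.hexdigest() == read_back.hexdigest()


def stress_directories(directories, size_bytes):
    """Run write_and_verify on every directory at the same time."""
    results = {}
    threads = [
        threading.Thread(target=write_and_verify, args=(d, size_bytes, results))
        for d in directories
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

A call such as `stress_directories(["/mnt/ibm", "/mnt/wd"], 2 * 1024**3)` (mount points assumed) returns a dict mapping each directory to `True` if its checksums matched.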

Comment 3 Arjan van de Ven 2001-04-15 22:34:45 UTC
Are you using 80-conductor ribbon cables?

Comment 4 Chris Ricker 2001-04-15 22:37:52 UTC

Comment 5 Arjan van de Ven 2001-04-15 22:42:29 UTC
I assume you're not overclocking your system.
Can you give the output of "hdparm -i /dev/hde" ?
(that tells us which DMA mode the drive is in)

Comment 6 Chris Ricker 2001-04-15 22:46:18 UTC
Nope, it's not overclocked (and it's an oldish setup that was quite stable with
7.0 and 6.x before that).  The drives are booting up in udma5 for the IBM and
udma2 for the WD (no hdparm tweaking done).

[kaboom@verdande kaboom]$ sudo hdparm -i /dev/hde


 Model=IBM-DTLA-307045, FwRev=TX6OA50C, SerialNo=YMDYMM98798
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=40
 BuffType=DualPortCache, BuffSize=1916kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=-66060037, LBA=yes, LBAsects=90069840
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
[kaboom@verdande kaboom]$ sudo hdparm -i /dev/hdh


 Model=WDC AC313000R, FwRev=17.01J17, SerialNo=WD-WT6760209354
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
 BuffType=DualPortCache, BuffSize=512kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=-66060037, LBA=yes, LBAsects=25429824
 IORDY=on/off, tPIO={min:160,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4
[kaboom@verdande kaboom]$
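
In the `hdparm -i` output above, the asterisk on the "DMA modes:" line marks the mode currently in use (udma5 for the IBM, udma2 for the WD). When comparing several drives or boots, the active mode can be pulled out of saved output with a small helper. This is a minimal sketch; the function name `active_dma_mode` is hypothetical, and it assumes the output format shown in this report.

```python
import re


def active_dma_mode(hdparm_output):
    """Return the DMA mode marked with '*' in `hdparm -i` text, or None."""
    for line in hdparm_output.splitlines():
        if line.strip().startswith("DMA modes:"):
            # The active mode is the token immediately following the asterisk.
            match = re.search(r"\*(\S+)", line)
            return match.group(1) if match else None
    return None
```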

Comment 7 Arjan van de Ven 2001-04-15 22:56:23 UTC
Thanks for the information. I'll contact the IDE maintainer and see if any
of this rings a bell.

Comment 8 Chris Ricker 2001-06-27 17:18:51 UTC
This bug still exists with kernel-smp-2.4.3-12.

Comment 9 Chris Ricker 2001-08-15 19:12:53 UTC
I'm seeing what I think is the same problem on that machine (now with ext3)
under beta3 and using a kernel compiled from kernel-source-2.4.7-0.13.1.
Binaries are getting slightly corrupted, segfaulting, and failing checksums.

Comment 10 Chris Ricker 2001-09-05 17:37:39 UTC
I only see the corruption if both the WD and the IBM are in use.  If I just have
either one by itself, things seem fine.

Might there be some sort of timing chatter going on?

Comment 11 Chris Ricker 2002-06-06 04:07:19 UTC
I'm closing this, since I've not been seeing this problem with 7.3 on that machine.
