This is with a clean install of seawolf on a dual Celeron system with two IDE drives, an IBM-DTLA-307045 as hde and a WDC AC313000R as hdh, both running off a Promise PDC20267 ATA-100 controller. I'm experiencing subtle file corruption when I do large file transfers. Occasionally binaries on /usr are getting modified (as evidenced by failure to run and changed md5sums according to rpm -V). In addition, I've noticed some corruption in C source files -- never more than 2 or 3 bytes' worth changed. I'll fire up a cerberus run and see if I can get something more readily reproducible out of that.
What chipset is this? And does it happen if you disable DMA (e.g. "ide=nodma" on the kernel command line)?
The motherboard is an i440-BX (ABIT BP6?). Onboard, it has PIIX4 and HPT366 1.26, of which only the PIIX4 is being used (for a CD burner and a Zip). The hard drives are running off a Promise controller. I can't boot the machine with ide=nodma. When I do, I get constant status errors, status timeouts, and drives not ready on both drives at the first attempt to write to them, and the machine is hung. Booting without any options, I can reliably reproduce corruption by running the Cerberus (1.3pre1) data test simultaneously against partitions on both drives. The box invariably fails md5sums on the writes, and will often eventually hard lock (though the lockup is not totally reproducible). I get these results whether I run the seawolf smp kernel or a custom-built smp kernel (built from the seawolf 2.4.2-2 source tree).
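For anyone without Cerberus handy, the kind of test described above (write to both drives at once, then check md5sums) can be approximated with a rough Python sketch. This is not the Cerberus data test, just a stand-in under stated assumptions: the function names `write_and_verify` and `stress_both` are made up for illustration, and the directories passed in are assumed to live on partitions of the two different drives.

```python
import hashlib
import os
import threading


def write_and_verify(directory, n_files=8, size=4 * 1024 * 1024, results=None):
    """Write random files into `directory`, read them back, compare MD5 sums.

    Returns the list of paths whose read-back checksum did not match.
    """
    mismatches = []
    for i in range(n_files):
        data = os.urandom(size)
        expected = hashlib.md5(data).hexdigest()
        path = os.path.join(directory, f"stress_{i}.bin")
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the kernel to push the data to the drive
        with open(path, "rb") as f:
            actual = hashlib.md5(f.read()).hexdigest()
        if actual != expected:
            mismatches.append(path)
        os.remove(path)
    if results is not None:
        results[directory] = mismatches
    return mismatches


def stress_both(dirs, **kwargs):
    """Run write_and_verify against every directory in `dirs` concurrently."""
    results = {}
    threads = [
        threading.Thread(
            target=write_and_verify,
            args=(d,),
            kwargs=dict(kwargs, results=results),
        )
        for d in dirs
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A call such as `stress_both(["/mnt/ibm/tmp", "/mnt/wd/tmp"])` (hypothetical mount points) returns a dict mapping each directory to the files that failed verification. One caveat: the read-back may be served from the page cache rather than the disk, so a serious run would need to write more data than fits in RAM, or remount between the write and the verify pass.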
Are you using 80-conductor ribbon cables?
yep
I assume you're not overclocking your system. Can you give the output of "hdparm -i /dev/hde"? (That tells us which DMA mode the drive is in.)
Nope, it's not overclocked (and it's an oldish setup that was quite stable with 7.0 and 6.x before that). The drives are booting up in udma5 for the IBM and udma2 for the WD (no hdparm tweaking done).
[kaboom@verdande kaboom]$ sudo hdparm -i /dev/hde
/dev/hde:
 Model=IBM-DTLA-307045, FwRev=TX6OA50C, SerialNo=YMDYMM98798
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=40
 BuffType=DualPortCache, BuffSize=1916kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=-66060037, LBA=yes, LBAsects=90069840
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
[kaboom@verdande kaboom]$ sudo hdparm -i /dev/hdh
/dev/hdh:
 Model=WDC AC313000R, FwRev=17.01J17, SerialNo=WD-WT6760209354
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
 BuffType=DualPortCache, BuffSize=512kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=-66060037, LBA=yes, LBAsects=25429824
 IORDY=on/off, tPIO={min:160,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4
[kaboom@verdande kaboom]$
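In the hdparm output above, the mode the drive is currently using is the one marked with an asterisk on the "DMA modes:" line. For checking this across several boots or kernels, a small Python sketch along these lines may help. The helper name `active_dma_mode` is made up for illustration, and it assumes the output looks like the listings in this report.

```python
import re


def active_dma_mode(hdparm_output):
    """Return the mode marked with '*' after a 'DMA modes:' label, or None."""
    for line in hdparm_output.splitlines():
        if "DMA modes:" not in line:
            continue
        # Only look at the text after the label, so a starred PIO mode
        # earlier on the same line is not picked up by mistake.
        modes = line.split("DMA modes:", 1)[1]
        match = re.search(r"\*(\w+)", modes)
        if match:
            return match.group(1)
    return None
```

Fed the text captured from `hdparm -i /dev/hde`, this should report the starred entry; it returns None if no mode is marked as active.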
Thanks for the information. I'll contact the IDE maintainer and see if any of this rings a bell.
This bug still exists with kernel-smp-2.4.3-12.
I'm seeing what I think is the same problem on that machine (now with ext3) under beta3, using a kernel compiled from kernel-source-2.4.7-0.13.1. Binaries are getting slightly corrupted, segfaulting, and failing checksum validation.
I only see the corruption if both the WD and the IBM are in use. If I just have either one by itself, things seem fine. Might there be some sort of timing chatter going on?
I'm closing this, since I've not been seeing this problem with 7.3 on that machine.