Bug 97833

Summary: (IDE PDC202XX_NEW)Data corruption with Promise FastTrak and latest 2.4.20-18.7smp kernel
Product: [Retired] Red Hat Linux
Reporter: Matthias Saou <matthias>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 7.3
CC: alan, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-09-30 15:41:11 UTC

Description Matthias Saou 2003-06-22 23:17:13 UTC
After upgrading to the latest errata kernel, 2.4.20-18.7smp, some Intel 1U
servers with an onboard Promise FastTrak IDE RAID controller started
experiencing sudden data corruption on some partitions, a few days after the
upgrade. None of the servers still running 2.4.18-24.7.xsmp or 2.4.18-27.7.xsmp
showed problems, and all servers had been running for 4-5 months with no similar
problems prior to that.

Here are the most common log entries encountered:

Jun 20 23:25:07 erpweb05 kernel: EXT3-fs error (device ataraid(114,7)):
ext3_new_block: Allocating block in system zone - block = 6586369
Jun 20 23:25:07 erpweb05 kernel: EXT3-fs error (device ataraid(114,7)):
ext3_new_block: Allocating block in system zone - block = 6586376
Jun 20 23:25:08 erpweb05 kernel: EXT3-fs error (device ataraid(114,7)):
ext3_new_block: Allocating block in system zone - block = 6586377
Jun 20 23:25:08 erpweb05 kernel: EXT3-fs error (device ataraid(114,7)):
ext3_new_block: Allocating block in system zone - block = 6586378
Jun 20 23:25:08 erpweb05 kernel: EXT3-fs error (device ataraid(114,7)):
ext3_new_block: Allocating block in system zone - block = 6586379
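
To see how many errors each device logged and which block range was hit, entries
like the ones above can be tallied with a short script. This is only a sketch:
the regular expression is written against the message format shown in this
report, the name summarize_ext3_errors is made up for illustration, and it
assumes each entry sits on a single line, as in the original syslog file (the
entries above are wrapped for display).

```python
import re
from collections import Counter

# Pattern for the EXT3 error format quoted in this report. Other kernel
# versions may word these messages differently (assumption).
EXT3_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<name>\w+)\((?P<major>\d+),(?P<minor>\d+)\)\):\s+"
    r"(?P<func>\w+): (?P<msg>.+?) - block = (?P<block>\d+)"
)


def summarize_ext3_errors(lines):
    """Return (counts, blocks), both keyed by (driver name, major, minor)."""
    counts = Counter()
    blocks = {}
    for line in lines:
        m = EXT3_ERROR.search(line)
        if m is None:
            continue  # not an EXT3 error line in the expected format
        key = (m["name"], int(m["major"]), int(m["minor"]))
        counts[key] += 1
        blocks.setdefault(key, []).append(int(m["block"]))
    return counts, blocks


# Two of the entries from this report, joined back onto one line each.
SAMPLE = [
    "Jun 20 23:25:07 erpweb05 kernel: EXT3-fs error (device ataraid(114,7)): "
    "ext3_new_block: Allocating block in system zone - block = 6586369",
    "Jun 20 23:25:08 erpweb05 kernel: EXT3-fs error (device ataraid(114,7)): "
    "ext3_new_block: Allocating block in system zone - block = 6586379",
]

if __name__ == "__main__":
    counts, blocks = summarize_ext3_errors(SAMPLE)
    for key, n in counts.items():
        print(key, n, min(blocks[key]), max(blocks[key]))
```

Run over the five entries quoted above, this reports 5 errors for
ataraid(114,7), with blocks ranging from 6586369 to 6586379.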

The symptoms were the same on all 3 servers that suffered corruption: one
partition suddenly started having severe problems. When it was /var, not much
happened, but when it was /usr, many processes started dying. Running fsck then
trashed most of the filesystem, and a great many directories ended up as #<some
number> in lost+found.

Here is the detailed hardware and kernel module information:

00:02.0 RAID bus controller: Promise Technology, Inc. 20267 (rev 02)
        Subsystem: Intel Corp.: Unknown device 3410
        Flags: bus master, medium devsel, latency 64, IRQ 19
        I/O ports at 1400 [size=8]
        I/O ports at 1408 [size=4]
        I/O ports at 1410 [size=8]
        I/O ports at 140c [size=4]
        I/O ports at 1440 [size=64]
        Memory at fe7a0000 (32-bit, non-prefetchable) [size=128K]
        Expansion ROM at fe7e0000 [disabled] [size=64K]
        Capabilities: [58] Power Management version 1

# df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/ataraid/d0p1       202220    135856     55924  71% /
none                    515264         0    515264   0% /dev/shm
/dev/ataraid/d0p7       497829      8239    463888   2% /tmp
/dev/ataraid/d0p5      3067568    345672   2566068  12% /usr
/dev/ataraid/d0p8     72975932  19856400  49412536  29% /var
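
The EXT3 errors name the device as a (major, minor) pair, while df lists device
nodes. A rough way to line the two up is sketched here, assuming the 2.4 ataraid
numbering described in the kernel's devices.txt (major 114, 16 minors per array,
the first minor of each group being the whole array). The helper names
ataraid_node and parse_df are hypothetical, chosen for this example.

```python
ATARAID_MAJOR = 114
MINORS_PER_ARRAY = 16  # assumption: minors 0-15 belong to d0, 16-31 to d1, ...


def ataraid_node(major, minor):
    """Translate an ataraid (major, minor) pair into a /dev/ataraid node name."""
    if major != ATARAID_MAJOR:
        raise ValueError(f"major {major} is not the ataraid major")
    disk, part = divmod(minor, MINORS_PER_ARRAY)
    node = f"/dev/ataraid/d{disk}"
    # Partition 0 stands for the whole array, so it gets no "pN" suffix.
    return node if part == 0 else f"{node}p{part}"


def parse_df(text):
    """Map each filesystem in df output to its mount point."""
    mounts = {}
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) >= 6:
            mounts[fields[0]] = fields[5]
    return mounts


# The df output from this report.
DF_OUTPUT = """\
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/ataraid/d0p1       202220    135856     55924  71% /
none                    515264         0    515264   0% /dev/shm
/dev/ataraid/d0p7       497829      8239    463888   2% /tmp
/dev/ataraid/d0p5      3067568    345672   2566068  12% /usr
/dev/ataraid/d0p8     72975932  19856400  49412536  29% /var
"""

if __name__ == "__main__":
    node = ataraid_node(114, 7)
    print(node, parse_df(DF_OUTPUT).get(node))
```

Under that assumption, ataraid(114,7) is /dev/ataraid/d0p7, which is /tmp in the
df listing above. The report does not say whether the log excerpt and this df
output come from the same server, so the match is a hint only.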

# lsmod
Module                  Size  Used by    Not tainted
nfs                    86048   1  (autoclean)
lockd                  55904   1  (autoclean) [nfs]
sunrpc                 79252   1  (autoclean) [nfs lockd]
eepro100               20720   1
iptable_filter          2464   1  (autoclean)
ip_tables              14464   1  [iptable_filter]
ide-cd                 30208   0  (autoclean)
cdrom                  32384   0  (autoclean) [ide-cd]
usb-ohci               20896   0  (unused)
usbcore                74400   1  [usb-ohci]
pdcraid                14144   5
ataraid                 8736   5  [pdcraid]
ext3                   67392   3
jbd                    51528   3  [ext3]

All servers have now been downgraded to 2.4.18-27.7.xsmp.

Matthias

Comment 1 Bugzilla owner 2004-09-30 15:41:11 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/