Bug 185400 - Filesystem corruption on 2850 PERC 4e/Di
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Tom Coughlan
QA Contact: Brian Brock
Depends On:
Blocks:
Reported: 2006-03-14 08:09 EST by Charles Rose
Modified: 2007-11-30 17:07 EST (History)

Fixed In Version: 4.4
Doc Type: Bug Fix
Last Closed: 2007-02-21 04:54:58 EST


Attachments
ltp.xml config file (4.78 KB, text/plain)
2006-03-15 01:28 EST, Charles Rose
lspci -n output (900 bytes, text/plain)
2006-07-05 08:57 EDT, Shyam kumar Iyer
lspci -xxx output (21.54 KB, text/plain)
2006-07-05 08:58 EDT, Shyam kumar Iyer
lspci -n output (996 bytes, text/plain)
2006-07-06 05:20 EDT, Shyam kumar Iyer
lspci -xxx (21.60 KB, text/plain)
2006-07-06 05:22 EDT, Shyam kumar Iyer
lspci -xxx (21.60 KB, text/plain)
2006-07-06 05:22 EDT, Shyam kumar Iyer

Description Charles Rose 2006-03-14 08:09:39 EST
Description of problem:
Running the ltp tests (http://ltp.sf.net) on a Dell PE2850 with PERC 4e/Di
results in file system corruption after a couple of days.

Version-Release number of selected component (if applicable):
RHEL4 U3 Gold ISO. kernel-2.6.9-34.EL

How reproducible:
Easy

Steps to Reproduce:
1. Install RHEL4 U3 Gold on the above mentioned hardware. Install ltp suite from
ltp.sf.net
2. Run ltp for over a day
  
Actual results:
ext3 filesystem corruption occurs

Expected results:
No file system corruption should occur

Additional info:
This was root caused to a driver/firmware bug which LSI is working on. LSI might
open a bug shortly (or have done so already). Once they have the patch ready, we
can link this issue to the LSI bug.
Comment 1 Charles Rose 2006-03-14 08:12:13 EST
I will upload the xml config file for ltp shortly.
Comment 2 Charles Rose 2006-03-15 01:28:43 EST
Created attachment 126143 [details]
ltp.xml config file
Comment 4 sachin sanap 2006-06-14 14:20:48 EDT
Hi, I am facing a similar problem on a Dell PE 2950 with a PERC 5/i RAID
card. I ran the IOzone test (iozone -i0 -f test.out -g 4g -aZ) on this
hardware after installing Scientific Linux (which is maintained by Fermilab
and is based on Red Hat EL).

Problem Description:

I ran the IOzone test for a day. After the test finished, every command I
typed produced a segmentation fault. When I then rebooted the machine, it could
not get through the boot procedure and got stuck at initializing storage.

Before getting stuck at this point, it gave a segmentation fault at the message
starting udev: /sbin/start_udev: line 42: segmentation fault

It is reproducible:
 - install RHEL on a Dell 2950
 - run IOzone for a day
 - the bug appears as soon as the test is over

Please let me know what more information I can provide to help resolve this.
Comment 5 Tom Coughlan 2006-06-27 12:35:51 EDT
This problem may be a manifestation of bug 194533. Please test the kernel, or
driver patch, that is posted there if possible. 
Comment 6 Tom Coughlan 2006-06-29 11:54:02 EDT
I have updated the patch, and the test kernel, posted in BZ 194533. Please test. 
Comment 7 Tom Coughlan 2006-06-29 15:51:09 EDT
Sachin and Charles,

As you may have seen from the patch, one problem with the current driver is that
it enables 64-bit DMA on some adapter models that do not support it. I would
like to find out if your adapter is one of them. This will indicate whether the
patch may be the right fix. Please provide the output of

lspci -xxx
lspci -n

on a system that exhibits the failure. Also please send /var/log/messages, or
dmesg, that shows the messages when the megaraid driver loads. That will give me
the fw rev, and any other relevant messages. 

Thanks. 
Comment 8 Charles Rose 2006-07-05 07:46:59 EDT
I am trying to find this machine. I should have the info in a day.
Comment 9 Charles Rose 2006-07-05 07:47:53 EDT
I am trying to find a PE1850 on which this issue was seen. I should have the
info in a day.
Comment 10 Shyam kumar Iyer 2006-07-05 08:57:46 EDT
Created attachment 131918 [details]
lspci -n output
Comment 11 Shyam kumar Iyer 2006-07-05 08:58:26 EDT
Created attachment 131919 [details]
lspci -xxx output
Comment 12 Tom Coughlan 2006-07-05 14:17:30 EDT
re. comment 11:

There is no adapter that uses the megaraid driver in the system. 

There is this adapter:

02:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X
Fusion-MPT Dual Ultra320 SCSI (rev 08)

I am not entirely sure, but it may be the case that if you put this adapter in RAID
mode, it will present itself as a megaraid adapter. If so, please do that, and
repeat the lspci -xxx
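
The distinction drawn above (a plain SCSI storage controller versus a RAID bus controller) is also visible in the class code that lspci -n prints. The following is a minimal sketch, assuming the usual lspci -n line layout of slot, class code, and vendor:device pair. The function name and the sample lines are illustrative only and are not taken from the attachments to this bug.

```python
import re

# PCI class codes for the two controller types discussed above.
CLASS_NAMES = {
    "0100": "SCSI storage controller",
    "0104": "RAID bus controller",
}

# Matches e.g. "02:05.0 0100: 1000:0030 (rev 08)"; older pciutils
# releases print the word "Class" before the class code.
LINE_RE = re.compile(
    r"^(?P<slot>\S+)\s+(?:Class\s+)?(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})",
    re.IGNORECASE,
)


def storage_controllers(lspci_n_text):
    """Return (slot, class name, vendor:device) for SCSI and RAID devices."""
    found = []
    for line in lspci_n_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        cls = m.group("cls").lower()
        if cls in CLASS_NAMES:
            ids = f"{m.group('vendor')}:{m.group('device')}".lower()
            found.append((m.group("slot"), CLASS_NAMES[cls], ids))
    return found


if __name__ == "__main__":
    # Illustrative input; the vendor:device IDs are placeholders.
    sample = (
        "00:00.0 0600: 8086:3590 (rev 09)\n"
        "02:05.0 0100: 1000:0030 (rev 08)\n"
    )
    for slot, name, ids in storage_controllers(sample):
        print(slot, name, ids)
```

A device reported with class 0104 is a candidate for the megaraid driver; a device reported with class 0100 is not in RAID mode as far as the PCI class is concerned.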

Comment 13 Shyam kumar Iyer 2006-07-06 05:20:40 EDT
Created attachment 131986 [details]
lspci -n output
Comment 14 Shyam kumar Iyer 2006-07-06 05:22:33 EDT
Created attachment 131987 [details]
lspci -xxx

This output is with the RAID mode on the LSI Card.
Comment 15 Shyam kumar Iyer 2006-07-06 05:22:35 EDT
Created attachment 131988 [details]
lspci -xxx

This output is with the RAID mode on the LSI Card.
Comment 16 Tom Coughlan 2006-07-06 11:10:57 EDT
This looks better:

02:0e.0 RAID bus controller: Dell PowerEdge Expandable RAID controller 4 (rev 06)

a0: 00 00 00 00 00 f8 00 00 00 00 00 00 00 00 00 00

This output indicates that this HBA does not support 64-bit DMA, because the
value at config-space offset 0xa4 (0x0000f800 in the row above) is not equal to
0x0299. The existing driver incorrectly puts the board in 64-bit mode, which
may explain the corruption you have seen. The patched driver will refrain
from trying 64-bit DMA.

Please test to confirm that the new driver in fact resolves the problem.   
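
For anyone who wants to repeat the check on their own lspci -xxx output, here is a minimal sketch of the comparison. It assumes the signature is a little-endian 32-bit value at config-space offset 0xa4 and that 0x0299 marks 64-bit DMA support; both constants are my reading of the megaraid driver and should be verified against the driver source. The function names are hypothetical.

```python
import struct

# Assumed constants (see the note above); verify against the driver source.
AMISIG64_OFFSET = 0xA4     # config-space offset of the 64-bit signature
SIGNATURE_64_BIT = 0x0299  # value expected when 64-bit DMA is supported


def parse_dump_row(row):
    """Split one 'lspci -xxx' row, e.g. 'a0: 00 ...', into (offset, bytes)."""
    label, _, hex_bytes = row.partition(":")
    return int(label, 16), bytes.fromhex(hex_bytes.replace(" ", ""))


def supports_64bit_dma(row):
    """Return (signature matches, dword read) for the row covering 0xa4."""
    base, data = parse_dump_row(row)
    start = AMISIG64_OFFSET - base
    if start < 0 or start + 4 > len(data):
        raise ValueError("row does not cover offsets 0xa4-0xa7")
    # PCI config space is little-endian, hence the "<I" format.
    (value,) = struct.unpack_from("<I", data, start)
    return value == SIGNATURE_64_BIT, value


if __name__ == "__main__":
    row = "a0: 00 00 00 00 00 f8 00 00 00 00 00 00 00 00 00 00"
    ok, value = supports_64bit_dma(row)
    print(f"dword at 0xa4 = {value:#010x}; 64-bit signature present: {ok}")
```

With the a0 row quoted in this comment, the dword read is 0x0000f800, so the signature test fails and a driver applying this check would stay with 32-bit DMA.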
Comment 17 Shyam kumar Iyer 2006-07-12 09:32:36 EDT
Working on reproducing the original issue again. Will be posting results 
shortly.
Comment 18 Shyam kumar Iyer 2006-11-30 02:36:11 EST
Issue not reproducible.
Comment 19 Charles Rose 2006-11-30 05:00:24 EST
This is fixed in RHEL 4 U4 - kernel-2.6.9-42
