Bug 158169 - megaraid driver for x86_64 causes data corruption
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: x86_64 Linux
Priority: medium
Severity: high
Assigned To: Tom Coughlan
QA Contact: Brian Brock
Depends On:
Blocks: 176344
Reported: 2005-05-19 07:12 EDT by Need Real Name
Modified: 2008-08-02 19:40 EDT (History)
Doc Type: Bug Fix
Last Closed: 2007-01-02 09:01:19 EST

Description Need Real Name 2005-05-19 07:12:44 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.0.4-1.3.1 Firefox/1.0.4

Description of problem:
megaraid driver seems to cause data corruption randomly. Sometimes the filesystem cannot be safely used for more than a few seconds, sometimes it stays usable for hours.

Since this happens neither on RHEL 3 nor on RHEL 4 i386, it is most likely a driver problem.

We use an Intel SRCS16 raid controller, configured as a raid 5 volume (3 physical disks, serial ata). Data corruption tends to manifest sooner when write back policy is enabled on the controller, but it also happens with write through.

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.0.5

How reproducible:
Sometimes

Steps to Reproduce:
1. Start any kind of heavy I/O on the SRCS16 controller for some time (usually 10 minutes are enough)
2. Check the filesystem with fsck
3. There are severe errors on the filesystem
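
Since fsck only reports metadata damage, a content check can complement the steps above. The following is a minimal sketch, not part of the original report: it writes files of random data, records a SHA-256 digest for each, and re-reads them to find mismatches. The function names, file names, and sizes are hypothetical, and the directory is assumed to be on the affected volume. Reads may be served from the page cache, so a remount or reboot between the write and verify phases is probably needed to exercise the controller itself.

```python
import hashlib
import os


def write_files(directory, count=8, size=1 << 20):
    """Write `count` files of random bytes; return {path: sha256 hex digest}."""
    manifest = {}
    for i in range(count):
        path = os.path.join(directory, f"stress_{i:04d}.bin")
        data = os.urandom(size)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the kernel to push the data to the device
        manifest[path] = hashlib.sha256(data).hexdigest()
    return manifest


def verify_files(manifest):
    """Return the paths whose contents are missing, unreadable, or changed."""
    bad = []
    for path, expected in manifest.items():
        try:
            with open(path, "rb") as f:
                actual = hashlib.sha256(f.read()).hexdigest()
        except OSError:
            bad.append(path)  # file disappeared or cannot be read
            continue
        if actual != expected:
            bad.append(path)
    return bad
```

A run would call `write_files()` on a directory on the RAID volume, keep the returned manifest, and call `verify_files()` on it later; a non-empty result lists the damaged files.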
  

Actual Results:  Sometimes files become multi-terabyte in size, lose their names, or suddenly disappear; sometimes the journal aborts; and at other times dmesg shows that the driver had to reset the controller as a result of repeated failures.

Expected Results:  Data should not be corrupted.

Additional info:

This is a dual-Xeon system with EM64T technology. Tests were done with the SMP kernel. It is not in production _right now_, so I should be able to help with testing for at least a few days.
Comment 2 Need Real Name 2005-07-06 04:51:51 EDT

*** This bug has been marked as a duplicate of 141360 ***
Comment 3 Ernie Petrides 2005-07-21 17:39:04 EDT
Reopening -- please don't dup bugs across different product versions.
Comment 7 Tom Coughlan 2006-06-27 12:33:08 EDT
This problem may be a manifestation of bug 194533. Please test the kernel, or
driver patch, that is posted there if possible. 
Comment 9 Tom Coughlan 2006-06-29 11:54:57 EDT
I have updated the patch, and the test kernel, posted in BZ 194533. Please test. 
Comment 10 Tom Coughlan 2006-06-29 15:53:29 EDT
As you may have seen from the patch, one problem with the current driver is that
it enables 64-bit DMA on some adapter models that do not support it. I would
like to find out if your adapter is one of them. This will indicate whether the
patch may be the right fix. Please provide the output of

lspci -xxx
lspci -n

on a system that exhibits the failure. Also please send /var/log/messages, or
dmesg, that shows the messages when the megaraid driver loads. That will give me
the fw rev, and any other relevant messages. 

Thanks. 
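
For readers collecting the same information, the sketch below pulls the vendor and device IDs of RAID controllers out of saved `lspci -n` output. It is an illustration only and assumes the usual `slot class: vendor:device` line layout, with or without the `Class` keyword that older lspci versions print. The helper names are hypothetical, and the device IDs in the sample text are made up (they are not the SRCS16's IDs). PCI class 0104 is the RAID bus controller class.

```python
import re
from typing import NamedTuple


class PciDevice(NamedTuple):
    slot: str
    pci_class: str
    vendor: str
    device: str


# Matches e.g. "02:0e.0 Class 0104: 1000:abcd (rev 01)" or "02:0e.0 0104: 1000:abcd"
_LINE = re.compile(
    r"^(\S+)\s+(?:Class\s+)?([0-9a-fA-F]{4}):\s+([0-9a-fA-F]{4}):([0-9a-fA-F]{4})"
)


def parse_lspci_n(text):
    """Parse `lspci -n` output into PciDevice tuples, skipping unrecognized lines."""
    devices = []
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m:
            slot, cls, vendor, device = m.groups()
            devices.append(PciDevice(slot, cls.lower(), vendor.lower(), device.lower()))
    return devices


def raid_controllers(devices):
    """Keep only devices whose PCI class is 0104 (RAID bus controller)."""
    return [d for d in devices if d.pci_class == "0104"]


# Hypothetical sample; the device IDs are placeholders, not real hardware.
SAMPLE = """\
00:00.0 Class 0600: 8086:abcd (rev 0c)
02:0e.0 Class 0104: 1000:abcd (rev 01)
03:01.0 0200: 8086:abcd
"""

if __name__ == "__main__":
    for dev in raid_controllers(parse_lspci_n(SAMPLE)):
        print(f"{dev.slot}: vendor {dev.vendor}, device {dev.device}")
```

The resulting vendor:device pair is what would be compared against the adapter models named in the driver patch.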
Comment 12 Daniel Riek 2006-11-21 11:49:11 EST
Raising as an Exception as we need to find out if we are going to address this
or not. I doubt it will ever get addressed as the underlying IT was closed so my
recommendation is to close it.

Comment 13 Daniel Riek 2007-01-02 08:51:23 EST
PM NAK based on comment 12 and the lack of activity.
Comment 14 RHEL Product and Program Management 2007-01-02 09:01:19 EST
Product Management has reviewed and declined this request.  You may appeal this
decision by reopening this request. 
