Bug 158169 - megaraid driver for x86_64 causes data corruption
Summary: megaraid driver for x86_64 causes data corruption
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: Tom Coughlan
QA Contact: Brian Brock
Depends On:
Blocks: 176344
Reported: 2005-05-19 11:12 UTC by Need Real Name
Modified: 2008-08-02 23:40 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2007-01-02 14:01:19 UTC
Target Upstream Version:

Description Need Real Name 2005-05-19 11:12:44 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.0.4-1.3.1 Firefox/1.0.4

Description of problem:
The megaraid driver seems to cause random data corruption. Sometimes the filesystem cannot be used safely for more than a few seconds; sometimes it stays usable for hours.

Since this does not happen on RHEL 3 or on RHEL 4 i386, it is most likely a driver problem.

We use an Intel SRCS16 RAID controller configured as a RAID 5 volume (3 physical disks, Serial ATA). Data corruption tends to manifest sooner when the write-back policy is enabled on the controller, but it also happens with write-through.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start any kind of heavy I/O on the SRCS16 controller and let it run for some time (10 minutes is usually enough)
2. Check the filesystem with fsck
3. There are severe errors on the filesystem
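
One way to make the steps above detect corruption without waiting for fsck is to write files with known checksums and read them back. The following is only a minimal sketch, not something the reporter used: the function names `write_files` and `verify_files` are made up for illustration, and it assumes a modern Python 3 interpreter rather than the one shipped with RHEL 4. Note that reads may be served from the page cache, so the total data written should exceed RAM (or the cache should be dropped) before verifying.

```python
import hashlib
import os


def write_files(directory, count, size):
    """Write `count` files of `size` random bytes.

    Returns a dict mapping each path to the SHA-256 hex digest of the
    data that was handed to the kernel for that file.
    """
    expected = {}
    for i in range(count):
        data = os.urandom(size)
        path = os.path.join(directory, "stress_%05d.bin" % i)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the kernel to push the data to the device
        expected[path] = hashlib.sha256(data).hexdigest()
    return expected


def verify_files(expected):
    """Return the paths whose contents no longer match the recorded digest.

    Files that have disappeared or cannot be read are reported as well.
    """
    bad = []
    for path, digest in expected.items():
        try:
            with open(path, "rb") as f:
                actual = hashlib.sha256(f.read()).hexdigest()
        except OSError:
            bad.append(path)
            continue
        if actual != digest:
            bad.append(path)
    return bad
```

To use it, call `write_files` with a directory on the RAID volume, keep the returned mapping, and call `verify_files` on it after the I/O load has run for a while; a non-empty result points at damaged or missing files.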

Actual Results:  Sometimes files grow to multi-terabyte sizes, lose their names, or suddenly disappear; sometimes the journal aborts; and at other times dmesg shows that the driver had to reset the controller after repeated failures.

Expected Results:  Data should not be corrupted.

Additional info:

This is a dual-Xeon system with EM64T technology. Tests were done with the SMP kernel. It is not in production _right now_, so I should be able to help with testing for at least a few days.

Comment 2 Need Real Name 2005-07-06 08:51:51 UTC

*** This bug has been marked as a duplicate of 141360 ***

Comment 3 Ernie Petrides 2005-07-21 21:39:04 UTC
Reopening -- please don't dup bugs across different product versions.

Comment 7 Tom Coughlan 2006-06-27 16:33:08 UTC
This problem may be a manifestation of bug 194533. Please test the kernel, or
driver patch, that is posted there if possible. 

Comment 9 Tom Coughlan 2006-06-29 15:54:57 UTC
I have updated the patch, and the test kernel, posted in BZ 194533. Please test. 

Comment 10 Tom Coughlan 2006-06-29 19:53:29 UTC
As you may have seen from the patch, one problem with the current driver is that
it enables 64-bit DMA on some adapter models that do not support it. I would
like to find out if your adapter is one of them. This will indicate whether the
patch may be the right fix. Please provide the output of

lspci -xxx
lspci -n

on a system that exhibits the failure. Also please send /var/log/messages, or
dmesg, that shows the messages when the megaraid driver loads. That will give me
the fw rev, and any other relevant messages. 
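
To pick the adapter out of the requested `lspci -n` output, something like the following could be used. It is only a sketch under stated assumptions: the helper names `parse_lspci_n` and `raid_controllers` are hypothetical, and the regular expression assumes the usual `slot class: vendor:device` line layout, with or without the `Class` keyword that older pciutils releases print. It filters on PCI class code 0104 (mass storage, RAID); deciding whether a given ID is one of the affected models still requires the list from the patch in bug 194533, which is not reproduced here.

```python
import re

# Accepts both "02:0e.0 0104: vvvv:dddd (rev 01)" and the older
# "02:0e.0 Class 0104: vvvv:dddd (rev 01)" layout of `lspci -n`.
_LINE = re.compile(
    r"^(?P<slot>\S+)\s+(?:Class\s+)?(?P<cls>[0-9a-fA-F]{4}):\s+"
    r"(?P<vendor>[0-9a-fA-F]{4}):(?P<device>[0-9a-fA-F]{4})"
)


def parse_lspci_n(text):
    """Return a list of (slot, class, vendor, device) tuples, hex in lowercase."""
    devices = []
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m:
            devices.append((
                m.group("slot"),
                m.group("cls").lower(),
                m.group("vendor").lower(),
                m.group("device").lower(),
            ))
    return devices


def raid_controllers(devices):
    """Keep only devices whose PCI class code is 0104 (RAID controller)."""
    return [d for d in devices if d[1] == "0104"]
```

The vendor and device IDs of the entries returned by `raid_controllers` are the values to compare against the adapter models named in the patch.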


Comment 12 Daniel Riek 2006-11-21 16:49:11 UTC
Raising as an Exception as we need to find out if we are going to address this
or not. I doubt it will ever get addressed as the underlying IT was closed so my
recommendation is to close it.

Comment 13 Daniel Riek 2007-01-02 13:51:23 UTC
PM NAK based on comment 12 and the lack of activity.

Comment 14 RHEL Program Management 2007-01-02 14:01:19 UTC
Product Management has reviewed and declined this request.  You may appeal this
decision by reopening this request. 
