Red Hat Bugzilla – Bug 158169
megaraid driver for x86_64 causes data corruption
Last modified: 2008-08-02 19:40:33 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.0.4-1.3.1 Firefox/1.0.4
Description of problem:
The megaraid driver seems to cause random data corruption. Sometimes the filesystem cannot be safely used for more than a few seconds, sometimes it stays usable for hours.
Since this does not happen on RHEL 3 or on RHEL4_i386, it is most likely a driver problem.
We use an Intel SRCS16 RAID controller, configured as a RAID 5 volume (3 physical disks, Serial ATA). Data corruption tends to show up sooner when the write-back policy is enabled on the controller, but it also happens with write-through.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run any kind of heavy I/O on the SRCS16 controller for some time (10 minutes is usually enough)
2. Check the filesystem with fsck
3. There are severe errors on the filesystem
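For anyone who wants to script the steps above, here is a minimal sketch in Python. It is not the tool used for the original tests: the function names and the file naming scheme are made up for illustration. It writes files of random data to a directory on the affected volume, records a SHA-256 digest for each, and later re-reads them to list any file whose contents changed or that went missing.

```python
import hashlib
import os


def write_pattern_files(directory, count, size_bytes):
    """Write `count` files of random data; return {path: sha256 hex digest}."""
    expected = {}
    for i in range(count):
        # Hypothetical naming scheme, chosen only for this sketch.
        path = os.path.join(directory, f"stress_{i:04d}.bin")
        data = os.urandom(size_bytes)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            # Ask the kernel to flush the file to the device rather than
            # leaving it only in the page cache.
            os.fsync(f.fileno())
        expected[path] = hashlib.sha256(data).hexdigest()
    return expected


def find_corrupted(expected):
    """Re-read each file and return the paths that are missing or differ."""
    bad = []
    for path, digest in expected.items():
        try:
            with open(path, "rb") as f:
                actual = hashlib.sha256(f.read()).hexdigest()
        except OSError:
            # A file that vanished or cannot be read counts as corrupted.
            bad.append(path)
            continue
        if actual != digest:
            bad.append(path)
    return bad
```

An immediate read-back may be served from the page cache, so it is safer to unmount and remount the filesystem (or reboot) between writing and verifying. This only complements fsck; it does not detect metadata damage that leaves file contents intact.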
Actual Results: Sometimes files become multi-terabyte in size, lose their names, or suddenly disappear; sometimes the journal aborts; and at other times dmesg shows that the driver had to reset the controller as a result of repeated failures.
Expected Results: data should not be corrupted.
This is a dual-Xeon system with EM64T technology. Tests were done with the SMP kernel. It is not in production _right now_, so I should be able to help with testing at least for a few days.
*** This bug has been marked as a duplicate of 141360 ***
Reopening -- please don't dup bugs across different product versions.
This problem may be a manifestation of bug 194533. Please test the kernel, or
driver patch, that is posted there if possible.
I have updated the patch, and the test kernel, posted in BZ 194533. Please test.
As you may have seen from the patch, one problem with the current driver is that
it enables 64-bit DMA on some adapter models that do not support it. I would
like to find out if your adapter is one of them. This will indicate whether the
patch may be the right fix. Please provide the output of
on a system that exhibits the failure. Also please send /var/log/messages, or
dmesg, that shows the messages when the megaraid driver loads. That will give me
the fw rev, and any other relevant messages.
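To illustrate why enabling 64-bit DMA on an adapter that only handles 32-bit addresses could corrupt data, here is a minimal Python model. It assumes, purely for illustration, that such an adapter silently drops the upper 32 bits of a bus address. The function names are hypothetical, and this is neither the megaraid driver's code nor the patch posted in BZ 194533.

```python
DMA_MASK_32 = (1 << 32) - 1
DMA_MASK_64 = (1 << 64) - 1


def choose_dma_mask(adapter_supports_64bit):
    """Pick the widest mask the adapter can handle (hypothetical helper)."""
    return DMA_MASK_64 if adapter_supports_64bit else DMA_MASK_32


def address_seen_by_adapter(bus_address, adapter_supports_64bit):
    """Assumed behaviour: a 32-bit-only adapter keeps only the low 32 bits."""
    if adapter_supports_64bit:
        return bus_address
    return bus_address & DMA_MASK_32


def transfer_is_safe(bus_address, driver_mask, adapter_supports_64bit):
    """True if a buffer allowed by the driver's mask lands where intended."""
    if bus_address > driver_mask:
        raise ValueError("buffer lies outside the driver's DMA mask")
    seen = address_seen_by_adapter(bus_address, adapter_supports_64bit)
    return seen == bus_address
```

Under this model, a buffer just above 4 GiB that the driver hands to a 32-bit-only adapter with a 64-bit mask would be read or written at a different address, while the same buffer is never offered when the mask comes from `choose_dma_mask(False)`.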
Raising as an Exception as we need to find out if we are going to address this
or not. I doubt it will ever get addressed as the underlying IT was closed so my
recommendation is to close it.
PM NAK based on comment 12 and the lack of activity.
Product Management has reviewed and declined this request. You may appeal this
decision by reopening this request.