Bug 175535 - megaraid module causes total slowdown near lockup on Dell PERC4
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: x86_64 Linux
Priority: medium Severity: high
Assigned To: Dave Jones
QA Contact: Brian Brock
Keywords: MassClosed
Reported: 2005-12-12 09:23 EST by Jan Koop
Modified: 2015-01-04 17:23 EST
Doc Type: Bug Fix
Last Closed: 2008-01-19 23:41:49 EST
Description Jan Koop 2005-12-12 09:23:11 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8) Gecko/20051111 Firefox/1.5

Description of problem:
On a Dell PowerEdge 2800 with a Dell PERC4e/DC controller driven by the megaraid modules, we experienced sudden, unpredictable I/O slowdowns on individual logical drives. A slowdown eventually ends on its own, or can be ended by rebooting the machine; while it lasts, no load is visible on the physical drives. I/O is slow only to the affected logical drive; the others work fine. The machine is a file server, so, as expected, the load average piles up with client load during a slowdown and decreases again as the network load eases off.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.14-1.1644_FC4

How reproducible:
Didn't try

Steps to Reproduce:
1. Boot
2. Put the system under load
3. Wait for about 24 hours of uptime (or for peak load? It occurred around 11 am every day)

I know these are not really reproduction steps, but since this is a production system, we could not risk more downtime and were forced to downgrade.

Actual Results:  I/O on one of the logical disks is slow.

Expected Results:  I/O on all logical devices performs equally.
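
To put a number on "slow on one logical drive, fine on the others", two snapshots of /proc/diskstats taken a few seconds apart can be compared per device. The following is only a minimal sketch, not something used in this report: the function names `parse_diskstats` and `avg_latency_ms` are hypothetical, and it assumes the 14-field whole-disk line layout that 2.6 kernels print (major, minor, name, then read and write counters).

```python
def parse_diskstats(text):
    """Parse /proc/diskstats-style text into {device: (completed_ios, ms_spent)}."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip short lines, e.g. partition entries on older kernels
        name = f[2]
        reads, ms_reading = int(f[3]), int(f[6])
        writes, ms_writing = int(f[7]), int(f[10])
        stats[name] = (reads + writes, ms_reading + ms_writing)
    return stats


def avg_latency_ms(before, after):
    """Average milliseconds per completed I/O for each device between two snapshots."""
    result = {}
    for name, (ios_after, ms_after) in after.items():
        ios_before, ms_before = before.get(name, (0, 0))
        completed = ios_after - ios_before
        if completed > 0:
            result[name] = (ms_after - ms_before) / completed
    return result
```

Feeding it the text of /proc/diskstats read at two points in time gives one figure per device; a logical drive whose figure is far above the others would match the symptom described above.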

Additional info:

I noticed that there was a major change in 2.6.13, a module split (see http://lists.debian.org/debian-knoppix/2005/10/msg00054.html ); maybe the problem is related to that. After downgrading to kernel-smp-2.6.12-1.1398_FC4, the problem disappeared. I also noticed that the mptspi module was loaded on 2.6.14 and is not present with 2.6.12. Maybe simply not loading mptspi would fix the issue, but as stated above, we cannot really try anything more.
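
For anyone comparing the two kernels, the set of loaded drivers can be read from /proc/modules. The sketch below is an illustration only, not part of the original report: `loaded_modules`, `watched_status` and the `WATCHED` tuple are hypothetical names, and the list assumes the split drivers appear as megaraid_mbox and megaraid_mm alongside the legacy megaraid module.

```python
# Modules of interest (an assumption for this sketch): the legacy megaraid
# driver, the split megaraid_mbox/megaraid_mm pair, and mptspi.
WATCHED = ("megaraid", "megaraid_mbox", "megaraid_mm", "mptspi")


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # the first column is the module name
    return names


def watched_status(proc_modules_text, watched=WATCHED):
    """Map each watched module name to True if it is currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return {name: name in loaded for name in watched}
```

Running `watched_status` on the contents of /proc/modules under each kernel and comparing the two results would show whether mptspi is the only difference in loaded storage drivers.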
Comment 1 Dave Jones 2006-02-03 01:17:06 EST
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.
Comment 2 Don Knott 2006-02-06 18:19:34 EST
I have a similar problem on a box used for backups.

The box has an LSI MegaRAID SATA 300-8X that uses the megaraid module. It's
running Bru Server from TolisGroup for doing backups to disk and then to tape.

Originally I was going to use RHEL4 but Tolis expressed concern over the 2.6.9
kernel that was included. I opted for FC4 and got a 2.6.11 kernel that seems to
work fine. If I use any kernel newer than 2.6.11 up to and including the latest
2.6.15-1.1830 release, the system load skyrockets during a backup to disk
operation. Performance under 2.6.11 is better. Another symptom is that if I
don't reboot the box periodically the external SCSI LTO3 Certance tape drive
disappears. That may be unrelated to the performance bug.

I have the ability to perform tests or make changes as this box is not in
production yet. Alternatively, I can stick with 2.6.11.
Comment 3 Dave Jones 2006-07-29 01:51:50 EDT
Is this any better with the current errata?
Comment 4 Don Knott 2006-07-31 16:07:37 EDT
The fix was to upgrade to FC5 with current errata. It seems to have fixed 
things.
Comment 5 Dave Jones 2006-09-16 23:23:00 EDT
[This comment added as part of a mass-update to all open FC4 kernel bugs]

FC4 has now transitioned to the Fedora legacy project, which will continue to
release security related updates for the kernel.  As this bug is not security
related, it is unlikely to be fixed in an update for FC4, and has been migrated
to FC5.

Please retest with Fedora Core 5.

Thank you.
Comment 6 Dave Jones 2006-10-16 15:48:36 EDT
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.
Comment 7 Jon Stanley 2008-01-19 23:41:49 EST
(this is a mass-close to kernel bugs in NEEDINFO state)

As indicated previously, there has been no update on the progress of this bug;
therefore I am closing it as INSUFFICIENT_DATA. Please re-open if the issue
still occurs for you and I will try to assist in its resolution. Thank you for
taking the time to report the initial bug.

If you believe that this bug was closed in error, please feel free to reopen
this bug.
