Bug 438027 - RHEL4.6 Diskdump performance regression (mptfusion)
Summary: RHEL4.6 Diskdump performance regression (mptfusion)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.6
Hardware: ia64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Takao Indoh
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-03-18 18:38 UTC by Takao Indoh
Modified: 2013-08-06 03:49 UTC (History)

Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-24 19:27:44 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch to fix the length of buffer used in scsi_dump (1.85 KB, patch)
2008-03-18 18:38 UTC, Takao Indoh


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2008:0665 0 normal SHIPPED_LIVE Moderate: Updated kernel packages for Red Hat Enterprise Linux 4.7 2008-07-24 16:41:06 UTC

Description Takao Indoh 2008-03-18 18:38:32 UTC
Description of problem:
Diskdump with the mptfusion driver is much slower than usual. Dumping 16GB of
RAM takes about 1 hour, whereas diskdump normally dumps 16GB of RAM within
2 minutes.
The diskdump included in kernel-2.6.9-55.EL works correctly, so this is a
regression. Incidentally, i386 and x86_64 do not have this issue.
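
The figures above can be turned into approximate throughput numbers. The
following is a minimal sketch, assuming "16GB" means 16 GiB and that both
durations are round figures; the function names are hypothetical and are not
part of diskdump.

```c
#include <assert.h>
#include <stdio.h>

/* Average throughput in MiB/s for dumping `gib` GiB in `seconds` seconds. */
static double dump_rate_mib_per_s(double gib, double seconds)
{
    assert(seconds > 0.0);
    return gib * 1024.0 / seconds;
}

/* Prints the two rates implied by the report and the ratio between them. */
static void print_dump_rates(void)
{
    double slow = dump_rate_mib_per_s(16.0, 60.0 * 60.0); /* about 1 hour */
    double fast = dump_rate_mib_per_s(16.0, 2.0 * 60.0);  /* about 2 minutes */

    printf("regressed: %.1f MiB/s\n", slow);
    printf("expected:  %.1f MiB/s\n", fast);
    printf("slowdown:  %.0fx\n", fast / slow);
}
```

Calling print_dump_rates() from a main() function shows a rate of roughly
4.6 MiB/s for the regressed kernel against roughly 136.5 MiB/s normally, a
slowdown of about 30 times.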

Version-Release number of selected component (if applicable):
kernel-2.6.9-67.EL

How reproducible:
100%

Steps to Reproduce:
1. Configure a diskdump device using mptfusion.
2. Run "service diskdump initialformat".
3. Run "service diskdump start".
4. Overload the diskdump device.
5. Run "echo c > /proc/sysrq-trigger".

Actual results:
Diskdump dumps memory through mptfusion at very low speed.

Expected results:
Diskdump dumps memory through mptfusion at the usual speed.

Additional info:
[Background]
The scsi_dump module, a component of diskdump, used to issue a REQUEST SENSE
command to the driver before starting a dump. In 4.6 this was changed to a
TEST UNIT READY command to fix BZ#237900, and that change caused this
regression with mptfusion.
https://bugzilla.redhat.com/show_bug.cgi?id=237900
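
For readers unfamiliar with the two commands, the difference at the command
block level can be sketched as follows. This is an illustration only, not the
scsi_dump code: the opcodes (0x00 and 0x03) and the position of the allocation
length are standard SCSI, while the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CDB6_LEN           6
#define OP_TEST_UNIT_READY 0x00 /* no data phase */
#define OP_REQUEST_SENSE   0x03 /* device returns sense data */

/* Fill a 6-byte TEST UNIT READY CDB: opcode 0x00, all other bytes zero. */
static void build_test_unit_ready(uint8_t cdb[CDB6_LEN])
{
    assert(cdb != NULL);
    memset(cdb, 0, CDB6_LEN);
    cdb[0] = OP_TEST_UNIT_READY;
}

/* Fill a 6-byte REQUEST SENSE CDB; byte 4 carries the allocation length. */
static void build_request_sense(uint8_t cdb[CDB6_LEN], uint8_t alloc_len)
{
    assert(cdb != NULL);
    memset(cdb, 0, CDB6_LEN);
    cdb[0] = OP_REQUEST_SENSE;
    cdb[4] = alloc_len;
}
```

TEST UNIT READY transfers no data, whereas REQUEST SENSE asks the device to
return up to alloc_len bytes, so only the latter depends on a receive buffer
of matching size.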

[How to fix]
The best way to fix BZ#237900 is:
1) Remove the patch for BZ#237900
2) Fix the buffer size used in scsi_dump, because the real cause of
   BZ#237900 is that the buffer size of scsi_dump is invalid.
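
The attached patch is not reproduced in this report, so the exact defect is
not shown here. As a rough sketch only, assuming that "invalid buffer size"
means the length declared to the device did not match the buffer actually
provided, a helper like the hypothetical one below keeps the two consistent
for a 6-byte REQUEST SENSE, whose allocation length is a one-byte field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest value the one-byte allocation length field can hold. */
#define REQUEST_SENSE_MAX_ALLOC 255u

/*
 * Choose the allocation length for the CDB so that the device is never
 * asked to return more bytes than the receive buffer can hold.
 * (Hypothetical helper, not taken from scsi_dump.)
 */
static uint8_t sense_alloc_len(size_t buf_len)
{
    if (buf_len > REQUEST_SENSE_MAX_ALLOC)
        return (uint8_t)REQUEST_SENSE_MAX_ALLOC;
    return (uint8_t)buf_len;
}
```

With this approach the same value would be used both for the CDB and for the
transfer length handed to the driver, so the two cannot drift apart.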

However, changing the buffer size affects all adapters, and testing all
adapters on all architectures to rule out regressions takes a long time.

Meanwhile, Fujitsu needed an erratum for this problem ASAP because
mptfusion is the main SCSI adapter in their servers and this
regression is a very serious problem. Therefore, I proposed the following
solution.

1) Use the temporary fix patch for quick errata provisioning.
   The patch only affects the mptfusion driver, so testing
   can be narrowed down to it.
2) In parallel, make the real fix available by testing it
   sufficiently. Once testing is done, replace the errata
   fix with the real fix at some point (before 4.7 comes out).

bz284991 has already been used to check in the temporary fix, so I am opening
this bugzilla for the real fix patch.

Comment 1 Takao Indoh 2008-03-18 18:38:32 UTC
Created attachment 298436 [details]
Patch to fix the length of buffer used in scsi_dump

Comment 2 RHEL Program Management 2008-03-19 22:06:19 UTC
Since Keyword Regression exists, this is a blocker,
not an exception.  Cleared exception flag.
Set blocker flag.

Comment 5 Vivek Goyal 2008-03-25 21:05:34 UTC
Committed in 68.25. Released in 68.26. RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 8 errata-xmlrpc 2008-07-24 19:27:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html

