Bug 438027

Summary: RHEL4.6 Diskdump performance regression (mptfusion)
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.6
Hardware: ia64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Takao Indoh <tindoh>
Assignee: Takao Indoh <tindoh>
QA Contact: Martin Jenner <mjenner>
Docs Contact:
CC: brking, jbaron, ktokunag, lwang, mmatsuya, qcai, tsenglin00
Keywords: Regression
Target Milestone: rc
Target Release: ---
Whiteboard:
Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Last Closed: 2008-07-24 19:27:44 UTC
Attachments:
Patch to fix the length of buffer used in scsi_dump (flags: none)

Description Takao Indoh 2008-03-18 18:38:32 UTC
Description of problem:
Diskdump runs much slower than usual with the mptfusion driver. It takes
1 hour to dump 16GB of RAM, whereas diskdump normally dumps 16GB of RAM
within 2 minutes.
The diskdump included in kernel-2.6.9-55.EL works correctly, so this
is a regression. i386 and x86_64 do not have the same issue.
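
For scale, the two timings above can be converted into approximate throughput figures. This is only a rough sketch: it assumes "16GB" means 16 GiB and that the dump writes exactly the RAM size, neither of which is stated in the report. The function name is illustrative.

```python
# Rough throughput comparison for the timings quoted in this report.
# Assumptions (not from the report): 16GB == 16 GiB, and the dump writes
# exactly that many bytes with no compression or skipped pages.

GIB = 1024 ** 3
MIB = 1024 ** 2


def throughput_mib_per_s(size_bytes, seconds):
    """Average throughput in MiB/s for a transfer of size_bytes."""
    return size_bytes / MIB / seconds


ram_bytes = 16 * GIB
regressed = throughput_mib_per_s(ram_bytes, 60 * 60)  # 1 hour
expected = throughput_mib_per_s(ram_bytes, 2 * 60)    # 2 minutes (upper bound on time)
slowdown = expected / regressed

print(f"regressed: about {regressed:.1f} MiB/s")
print(f"expected:  at least {expected:.1f} MiB/s")
print(f"slowdown:  about {slowdown:.0f}x")
```

Under these assumptions the regressed dump averages under 5 MiB/s, roughly a 30x slowdown against the 2-minute figure.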

Version-Release number of selected component (if applicable):
kernel-2.6.9-67.EL

How reproducible:
100%

Steps to Reproduce:
1. Configure a diskdump device using mptfusion.
2. Run "service diskdump initialformat".
3. Run "service diskdump start".
4. Overload the diskdump device.
5. Run "echo c > /proc/sysrq-trigger".

Actual results:
The diskdump dumps memory with mpt fusion at very low speed.

Expected results:
The diskdump dumps memory with mpt fusion at usual speed.

Additional info:
[Background]
The scsi_dump module, which is a component of diskdump, used to issue a
REQUEST SENSE command to the driver before starting the dump. In 4.6, this
was changed to a TEST UNIT READY command to fix BZ#237900, and that change
caused this regression with mptfusion.
https://bugzilla.redhat.com/show_bug.cgi?id=237900
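
As a reminder of how the two commands differ at the CDB level, here is a minimal sketch that builds both 6-byte CDBs following the standard SCSI layouts. It is not the scsi_dump code: the function names are hypothetical, and the 18-byte sense length is only an illustrative value. The relevant point is that REQUEST SENSE carries an allocation length that has to agree with the data buffer given to the driver, while TEST UNIT READY transfers no data at all.

```python
# Illustrative only: 6-byte CDBs for the two commands named in this report.
# Opcodes and byte positions follow the standard SCSI CDB layouts; the
# helper names below are hypothetical and do not come from scsi_dump.

TEST_UNIT_READY = 0x00
REQUEST_SENSE = 0x03


def build_test_unit_ready():
    """TEST UNIT READY: opcode 0x00, remaining bytes zero, no data phase."""
    return bytes([TEST_UNIT_READY, 0, 0, 0, 0, 0])


def build_request_sense(alloc_len):
    """REQUEST SENSE: opcode 0x03, allocation length in byte 4."""
    if not 0 <= alloc_len <= 0xFF:
        raise ValueError("allocation length must fit in one byte")
    return bytes([REQUEST_SENSE, 0, 0, 0, alloc_len, 0])


def data_transfer_len(cdb):
    """Bytes the target may return for the two commands handled here."""
    if cdb[0] == REQUEST_SENSE:
        return cdb[4]
    if cdb[0] == TEST_UNIT_READY:
        return 0
    raise ValueError("unsupported opcode in this sketch")


tur = build_test_unit_ready()
sense = build_request_sense(18)  # 18 is an illustrative sense length

print(tur.hex(), data_transfer_len(tur))
print(sense.hex(), data_transfer_len(sense))
```

If the buffer length passed alongside a REQUEST SENSE CDB does not match its allocation length, the driver and the target see inconsistent transfer sizes, which is the general class of problem the "How to fix" section attributes to scsi_dump.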

[How to fix]
The best way to fix BZ#237900 is:
1) Remove the patch for BZ#237900
2) Fix the buffer size used in scsi_dump, because the real cause of
   BZ#237900 is that the buffer size of scsi_dump is invalid.

However, changing the buffer size affects all adapters. It would take a
long time to test all adapters on all architectures to prevent regressions.

Meanwhile, Fujitsu needed the errata for this problem ASAP because
mptfusion is the main SCSI adapter of their servers and this regression
is a very serious problem. Therefore, I proposed the following solution.

1) Use the temporary fix patch for quick errata provisioning.
   The patch only affects the mptfusion driver, so the testing
   can be narrowed down to it.
2) In parallel, make the real fix available by conducting
   enough testing on it.  Once the testing is done, replace the errata
   fix with the real fix at some point (before 4.7 comes out).

bz284991 has already been used for checking in the temporary fix, so I am
opening this bugzilla for the real fix patch.

Comment 1 Takao Indoh 2008-03-18 18:38:32 UTC
Created attachment 298436 [details]
Patch to fix the length of buffer used in scsi_dump

Comment 2 RHEL Program Management 2008-03-19 22:06:19 UTC
Since Keyword Regression exists, this is a blocker,
not an exception.  Cleared exception flag.
Set blocker flag.

Comment 5 Vivek Goyal 2008-03-25 21:05:34 UTC
Committed in 68.25. Released in 68.26. RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 8 errata-xmlrpc 2008-07-24 19:27:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html