Bug 498527

Summary: ehca performance impact during creation of queue pairs
Product: Red Hat Enterprise Linux 5
Reporter: IBM Bug Proxy <bugproxy>
Component: kernel
Assignee: Ameet Paranjape <aparanja>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: high
Version: 5.4
CC: aparanja, balkov, dzickus, jjarvis, peterm
Target Milestone: rc
Keywords: OtherQA
Hardware: ppc64
OS: All
Doc Type: Bug Fix
Last Closed: 2009-09-02 08:16:24 UTC
Attachments:
performance patch for ehca (flags: none)

Description IBM Bug Proxy 2009-04-30 21:00:54 UTC
=Comment: #0=================================================
Stefan Roscher <stefan.roscher.com> - 
If a userspace application tries to allocate a large number of queue pairs, the
performance of the creation process degrades rapidly and results in soft lockup
errors.

BUG: soft lockup - CPU#10 stuck for 10s! [mpi_lapi_gen_64:21687]
REGS: c000001bc72a7340 TRAP: 0901   Tainted: G       (2.6.18-128.el5)
TASK = c000001e4ad98d40[21687] 'mpi_lapi_gen_64' THREAD: c000001bc72a4000 CPU: 10
NIP [C0000000003C8E3C] ._write_lock+0x44/0x80
LR [C0000000000DB550] .__get_vm_area_node+0xd0/0x1f8
Call Trace:

The application ran on a RHEL-5.3 ppc64 system.
=Comment: #2=================================================
Stefan Roscher <stefan.roscher.com> - 
After further analysis we can point to two function calls that cause this
performance degradation.
The first is the use of vmalloc() within the device driver; the second is the
use of ioremap() for every queue pair. Both function calls result in a search
loop over a list in the generic kernel. The size of the list increases with the
number of QPs allocated. We will try to optimize the function calls for
better performance and will provide a patch.

regards Stefan
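
The scaling described above can be illustrated with a small userspace model. This is only a sketch, not kernel or ehca code: it assumes that each vmalloc()/ioremap() call walks a single address-sorted list of existing areas from the head until it finds a free gap, as the comment and the __get_vm_area_node frame in the trace suggest. The names area, area_alloc, area_free_all and total_walk_steps are hypothetical and exist only for this illustration.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one mapped area (not the kernel's vm_struct). */
struct area {
    unsigned long start;
    unsigned long size;
    struct area *next;
};

static struct area *area_list;      /* address-sorted list of all areas */
static unsigned long walk_steps;    /* list nodes visited so far */

/* Place a new area in the first gap that fits, walking from the list head. */
static struct area *area_alloc(unsigned long size)
{
    unsigned long addr = 0x1000;    /* arbitrary base address for the model */
    struct area **p, *tmp, *a;

    a = malloc(sizeof(*a));
    if (!a)
        return NULL;

    for (p = &area_list; (tmp = *p) != NULL; p = &tmp->next) {
        walk_steps++;
        if (addr + size <= tmp->start)
            break;                  /* found a gap in front of tmp */
        addr = tmp->start + tmp->size;
    }

    a->start = addr;
    a->size = size;
    a->next = *p;                   /* link in at the position found */
    *p = a;
    return a;
}

/* Release every area so that a new measurement starts from an empty list. */
static void area_free_all(void)
{
    while (area_list) {
        struct area *next = area_list->next;
        free(area_list);
        area_list = next;
    }
}

/* Allocate `count` areas without freeing any; return total nodes visited. */
unsigned long total_walk_steps(unsigned long count)
{
    unsigned long i;

    area_free_all();
    walk_steps = 0;
    for (i = 0; i < count; i++)
        if (!area_alloc(4096))
            break;
    return walk_steps;
}
```

In this model total_walk_steps(1000) returns 499500 and total_walk_steps(2000) returns 1999000, so doubling the number of areas roughly quadruples the total search work. In the kernel the walk additionally runs under a lock (the trace shows _write_lock), which is consistent with the soft lockup reported above.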


=Comment: #13=================================================
Duane L. Witherspoon <withersp.com> - 
The Cluster HPC testing of this patch has now been completed successfully.  
=Comment: #15=================================================
Stefan Roscher <stefan.roscher.com> - 

performance patch for ehca

performance patch for ehca driver

This patch contains performance improvements for the ehca driver.
It skips code that is not necessary for userspace queue pairs
and replaces vmalloc() calls with kmalloc().
We merged the three individual patches into one and tested it with the
2.6.18-141.el5 kernel.

The patch has already been applied for linux-2.6.31, as you can see below:
http://lkml.org/lkml/2009/4/21/290
http://lkml.org/lkml/2009/4/21/292
http://lkml.org/lkml/2009/4/21/293

regards Stefan

Comment 1 IBM Bug Proxy 2009-04-30 21:00:59 UTC
Created attachment 342005 [details]
performance patch for ehca

Comment 3 RHEL Program Management 2009-05-05 13:29:02 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 John Jarvis 2009-05-05 14:06:27 UTC
IBM is signed up to test and provide feedback.

Comment 5 Don Zickus 2009-05-06 17:18:30 UTC
in kernel-2.6.18-144.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 7 IBM Bug Proxy 2009-05-07 09:51:23 UTC
------- Comment From alexs.ibm.com 2009-05-07 05:48 EDT-------
Hello Redhat,

I have verified that the patch is included in kernel-2.6.18-144.el5, and our performance tests are running fine.

Thanks,
Alex

Comment 8 IBM Bug Proxy 2009-07-06 09:41:15 UTC
------- Comment From alexs.ibm.com 2009-07-06 05:39 EDT-------
I have verified that the fix is included in the RHEL-5.4 beta, and the test cases work fine.

I'll close this bug.

Comment 10 errata-xmlrpc 2009-09-02 08:16:24 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html