=Comment: #0=================================================
Stefan Roscher <stefan.roscher.com> -
If a userspace application tries to allocate a large number of queue pairs, the performance of the creation process degrades rapidly and results in soft lockup errors.

BUG: soft lockup - CPU#10 stuck for 10s! [mpi_lapi_gen_64:21687]
REGS: c000001bc72a7340 TRAP: 0901 Tainted: G (2.6.18-128.el5)
TASK = c000001e4ad98d40[21687] 'mpi_lapi_gen_64' THREAD: c000001bc72a4000 CPU: 10
NIP [C0000000003C8E3C] ._write_lock+0x44/0x80
LR [C0000000000DB550] .__get_vm_area_node+0xd0/0x1f8
Call Trace:

The application ran on a RHEL-5.3 ppc64 system.

=Comment: #2=================================================
Stefan Roscher <stefan.roscher.com> -
Further analysis points to two function calls as the cause of this performance degradation. The first is the use of vmalloc() within the device driver; the second is the use of ioremap() for every queue pair. Both function calls result in a search loop over a list in the generic kernel. The size of that list increases with the number of QPs allocated. We will try to optimize these function calls for better performance and will provide a patch.

regards Stefan

=Comment: #13=================================================
Duane L. Witherspoon <withersp.com> -
The Cluster HPC testing of this patch has now been completed successfully.

=Comment: #15=================================================
Stefan Roscher <stefan.roscher.com> - perfomance patch for ehca

performance patch for ehca driver

This patch contains performance improvements for the ehca driver. It skips code that is not necessary for userspace queue pairs and replaces vmalloc() calls with kmalloc(). We merged the three single patches into one and tested it with the 2.6.18-141.el5 kernel.

The patch has already been applied for linux-2.6.31, as you can see below:
http://lkml.org/lkml/2009/4/21/290
http://lkml.org/lkml/2009/4/21/292
http://lkml.org/lkml/2009/4/21/293

regards Stefan
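To illustrate the analysis in comment #2, here is a minimal userspace C sketch of why a per-allocation list search becomes expensive as the number of queue pairs grows. It is not kernel or ehca driver code. The names `struct area`, `insert_sorted`, and `total_search_steps`, and the `maps_per_qp` parameter, are hypothetical and exist only for this illustration. The sketch assumes each new mapping is placed at a higher address than all earlier ones, so every insert walks the whole list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one entry in a kernel-wide list of mapped areas. */
struct area {
    unsigned long addr;
    struct area *next;
};

/* Insert a new area in address order and return how many nodes were
 * visited. The linear walk models the search loop from comment #2. */
static unsigned long insert_sorted(struct area **head, unsigned long addr)
{
    unsigned long visited = 0;
    struct area **p = head;
    struct area *node;

    while (*p != NULL && (*p)->addr < addr) {
        p = &(*p)->next;
        visited++;
    }

    node = malloc(sizeof(*node));
    assert(node != NULL);
    node->addr = addr;
    node->next = *p;
    *p = node;
    return visited;
}

/* Release every node of the list. */
static void free_list(struct area *head)
{
    while (head != NULL) {
        struct area *next = head->next;
        free(head);
        head = next;
    }
}

/* Total nodes visited while creating n_qps queue pairs, where each queue
 * pair adds maps_per_qp areas (an assumed figure) at increasing addresses. */
unsigned long total_search_steps(unsigned long n_qps, unsigned long maps_per_qp)
{
    struct area *head = NULL;
    unsigned long steps = 0;
    unsigned long next_addr = 0;
    unsigned long q, m;

    for (q = 0; q < n_qps; q++)
        for (m = 0; m < maps_per_qp; m++)
            steps += insert_sorted(&head, next_addr++);

    free_list(head);
    return steps;
}
```

With N = n_qps * maps_per_qp inserts, the total work is N(N-1)/2 node visits: `total_search_steps(10, 2)` returns 190, while `total_search_steps(100, 2)` returns 19900. Ten times as many queue pairs therefore cost roughly a hundred times as many list steps in this model, which is consistent with the rapid degradation reported above.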
Created attachment 342005: perfomance patch for ehca
RKML post: http://post-office.corp.redhat.com/archives/rhkernel-list/2009-May/msg00046.html
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
IBM is signed up to test and provide feedback.
in kernel-2.6.18-144.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
------- Comment From alexs.ibm.com 2009-05-07 05:48 EDT------- Hello Redhat, I have verified that the patch is included in kernel-2.6.18-144.el5, our performance tests are running fine. Thanks, Alex
------- Comment From alexs.ibm.com 2009-07-06 05:39 EDT------- I have verified that the fix is included in RHEL-5.4 beta, testcases work fine. I'll close this bug.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html