Bug 599295 - Significant MSI performance issue due to redundant interrupt masking
Summary: Significant MSI performance issue due to redundant interrupt masking
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Assignee: Prarit Bhargava
QA Contact: Eryu Guan
URL:
Whiteboard:
Depends On:
Blocks: 621938 621939 621940
 
Reported: 2010-06-03 04:15 UTC by Wade Mealing
Modified: 2018-10-27 13:28 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-13 21:35:24 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch to bring msi performance inline with msi-x (1.64 KB, patch)
2010-06-03 04:52 UTC, Wade Mealing
RHEL5 fix for this issue (2.14 KB, patch)
2010-07-28 18:09 UTC, Prarit Bhargava
RHEL5 fix for this issue (2.61 KB, patch)
2010-08-05 15:06 UTC, Prarit Bhargava


Links
System: Red Hat Product Errata
ID: RHSA-2011:0017
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update
Last Updated: 2011-01-13 10:37:42 UTC

Description Wade Mealing 2010-06-03 04:15:34 UTC
Description of problem:

A large amount of CPU time is wasted handling MSI interrupts. This degrades performance considerably under high MSI interrupt load.

MSI-X cannot be used as a workaround for this issue because RHEL5.x supports only 256 interrupt vectors, and MSI-X easily exhausts them. When there are not enough vectors for MSI-X, MSI is used instead.
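
The fallback described above can be sketched roughly as follows. This is a simplified user-space model, not kernel code: the names `pick_mode` and `count_msi_fallbacks`, the assumption that MSI takes a single vector, and the per-device vector counts used when calling it are all hypothetical and are not taken from this report.

```c
#include <assert.h>

/* Hypothetical model: which interrupt mode a device ends up with. */
enum irq_mode { IRQ_MODE_MSIX, IRQ_MODE_MSI };

/* A device gets MSI-X only if all the vectors it wants are still free. */
static enum irq_mode pick_mode(unsigned free_vectors, unsigned msix_wanted)
{
    return (msix_wanted <= free_vectors) ? IRQ_MODE_MSIX : IRQ_MODE_MSI;
}

/*
 * Walk ndev identical devices against a fixed vector budget and return
 * how many of them had to fall back to MSI.
 */
static unsigned count_msi_fallbacks(unsigned total_vectors, unsigned ndev,
                                    unsigned msix_per_dev)
{
    unsigned free_vectors = total_vectors;
    unsigned fallbacks = 0;

    for (unsigned i = 0; i < ndev; i++) {
        if (pick_mode(free_vectors, msix_per_dev) == IRQ_MODE_MSIX) {
            free_vectors -= msix_per_dev;
        } else {
            fallbacks++;
            /* Assumption: MSI consumes one vector in this model. */
            if (free_vectors > 0)
                free_vectors--;
        }
    }
    return fallbacks;
}
```

With an illustrative budget of 256 vectors and 28 adapters each asking for 16 MSI-X vectors (a made-up figure), only the first 16 adapters fit and the rest fall back to MSI. A real system has fewer than 256 vectors available to devices, so the fallback would start even earlier.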


 Analysis of the problem
 -----------------------
 When an MSI interrupt is generated, the RHEL5.x kernel masks it at
 the beginning of the interrupt handler (at ack time) if the MSI
 interrupt is maskable. To mask the interrupt, the kernel writes to
 PCI configuration space while holding a spin-lock. This wastes a
 large amount of CPU time because:

 - PCI config access is very slow.
 - PCI config access is serialized among CPUs (it requires a
   spin-lock).

 Masking a maskable MSI interrupt is required only while its irq
 affinity is being changed, so the RHEL5.x behavior (masking every
 maskable MSI interrupt on every delivery) is redundant. This was
 already fixed in the upstream kernel by commit
 277bc33bc2479707e88b0b2ae6fe56e8e4aabe81.
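
To make the cost difference concrete, here is a minimal user-space model that only counts simulated config-space writes and lock acquisitions under the two policies. Every name in it (`msi_model`, `set_mask`, and so on) is hypothetical, and the unmask on the way out of the handler is an assumption: the analysis above only mentions the mask at ack time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one maskable MSI source; not a kernel structure. */
struct msi_model {
    unsigned long cfg_writes;        /* simulated PCI config-space writes */
    unsigned long lock_acquisitions; /* simulated spin-lock acquisitions  */
    bool masked;
};

static void model_init(struct msi_model *m)
{
    m->cfg_writes = 0;
    m->lock_acquisitions = 0;
    m->masked = false;
}

/* Each mask or unmask costs one lock acquisition and one config write. */
static void set_mask(struct msi_model *m, bool masked)
{
    m->lock_acquisitions++;
    m->masked = masked;
    m->cfg_writes++;
}

/* Policy A: mask at ack time, unmask afterwards (assumed), per interrupt. */
static void handle_irq_mask_every_time(struct msi_model *m)
{
    set_mask(m, true);
    /* the device's interrupt handler would run here */
    set_mask(m, false);
}

/* Policy B: the normal delivery path never touches config space. */
static void handle_irq_no_mask(struct msi_model *m)
{
    (void)m;
}

/* Masking is still needed around an affinity change under either policy. */
static void change_affinity(struct msi_model *m)
{
    set_mask(m, true);
    /* the interrupt would be retargeted to another CPU here */
    set_mask(m, false);
}
```

In this model, 1000 interrupts under policy A produce 2000 config writes, while 1000 interrupts plus one affinity change under policy B produce 2. The model says nothing about how slow each write is; it only shows how the number of serialized accesses scales with the interrupt rate.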

Version-Release number of selected component (if applicable):

 Red Hat Enterprise Linux Version Number:      RHEL5
 Release Number:                               4
 Architecture:                                 x86_64
 Kernel Version:                               2.6.18-164.9.1.el5
 Related Package Version:                      None
 Related Middleware / Application:             None

Drivers or hardware or architecture dependency:

 System with PCI adapter cards that support maskable MSI.
 This problem is not processor architecture specific.

The specifically benchmarked system is:

 Model:        PRIMEQUEST1800E
 CPU Info:     Xeon X7560 (2.27GHz/8core/24MB L3) * 8
 Memory Info:  512GB (DDR3-1066 8GB DIMM * 64)
 Hardware Component Information:
 - FC:    8GB FC (PCIe/single port) * 28
 - LAN:     On-board LAN (Intel igb 1000Mbps) * 4
 - Storage: 270 Logical Volumes
   * ETERNUS6000(8FC path, 24 RAID groups(RAID0)) * 4
   * ETERNUS3000(2FC path, 1 RAID group(RAID0), 6 RAID groups (RAID5)) * 1

How reproducible:
  Every time

Actual Results:

 Performance is severely degraded when MSI is used, compared to
 MSI-X.

Expected results:

 Performance with MSI is close to performance with MSI-X.

Additional info:

 No sosreport is available, unfortunately. The benchmark environment
 was very large (especially its I/O configuration), so it is
 difficult to rebuild it just to collect a sosreport.

 About the proposed patch
 ------------------------
 As mentioned above, this issue was fixed in the upstream kernel by
 commit 277bc33bc2479707e88b0b2ae6fe56e8e4aabe81, which changes the
 MSI logic to use irq_chip instead of hw_interrupt_type. That change
 seems too large to backport to RHEL5.x, so the proposed patch
 instead changes the existing hw_interrupt_type for MSI so that it
 does not mask maskable interrupts at ack time.
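
The shape of that change can be illustrated with a small mock of a per-type operations table. This is only a sketch of the idea, not the attached patch: `mock_irq_type` is a hypothetical stand-in for hw_interrupt_type with just the two callbacks needed here, and all function and variable names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for an interrupt-type ops table. */
struct mock_irq_type {
    const char *name;
    void (*ack)(unsigned int irq);
    void (*end)(unsigned int irq);
};

/* Counts simulated mask-bit writes to PCI configuration space. */
static unsigned int mask_writes;

static void mock_mask_msi(unsigned int irq, bool mask)
{
    (void)irq;
    (void)mask;
    mask_writes++; /* in the real kernel this is the slow, locked access */
}

/* Existing behavior as described in the report: mask at ack time. */
static void ack_with_mask(unsigned int irq)   { mock_mask_msi(irq, true); }
static void end_with_unmask(unsigned int irq) { mock_mask_msi(irq, false); }

/* Proposed behavior: ack and end leave the mask bit alone. */
static void ack_no_mask(unsigned int irq)   { (void)irq; }
static void end_no_unmask(unsigned int irq) { (void)irq; }

static const struct mock_irq_type msi_masking = {
    "msi-masking", ack_with_mask, end_with_unmask
};
static const struct mock_irq_type msi_no_masking = {
    "msi-no-masking", ack_no_mask, end_no_unmask
};

/* Deliver one interrupt through whichever ops table is installed. */
static void deliver(const struct mock_irq_type *type, unsigned int irq)
{
    type->ack(irq);
    /* the device's interrupt handler would run here */
    type->end(irq);
}
```

Swapping the callbacks in the table keeps the rest of the delivery path unchanged, which is why this is a much smaller change than moving to irq_chip. Masking around an affinity change is not modeled here.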

The customer's patch is attached.

Comment 2 Wade Mealing 2010-06-03 04:52:27 UTC
Created attachment 419243
Patch to bring msi performance inline with msi-x

I'm reporting this by proxy, on behalf of the SEG engineering team that holds the ticket, in order to keep the ball in motion.

Comment 11 Prarit Bhargava 2010-07-28 14:21:28 UTC
Wade, I think you're right in your analysis.  I'll update your patch against RHEL5 latest, do a quick test, and post on RHKL.

P.

Comment 12 Prarit Bhargava 2010-07-28 18:09:06 UTC
Created attachment 435098
RHEL5 fix for this issue

Comment 13 RHEL Program Management 2010-07-28 18:19:17 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 16 Prarit Bhargava 2010-08-05 15:06:55 UTC
Created attachment 436875
RHEL5 fix for this issue

Comment 20 Jarod Wilson 2010-08-06 15:28:43 UTC
We probably ought to release-note this so people can discover its existence.

Comment 22 Jarod Wilson 2010-08-11 00:12:41 UTC
in kernel-2.6.18-211.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 32 Eryu Guan 2010-11-25 10:39:27 UTC
FJ has confirmed the fix works as expected.

linux-2.6-pci-msi-add-option-for-lockless-interrupt-mode.patch is applied
correctly in kernel 2.6.18-194.14.1.el5.

Comment 34 errata-xmlrpc 2011-01-13 21:35:24 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

