Bug 432451

Summary: [RHEL5] Setting IRQ affinity does not work with MSI devices
Product: Red Hat Enterprise Linux 5 Reporter: Flavio Leitner <fleitner>
Component: kernel Assignee: Michal Schmidt <mschmidt>
Status: CLOSED WONTFIX QA Contact: Martin Jenner <mjenner>
Severity: high Docs Contact:
Priority: high    
Version: 5.1 CC: agospoda, benlu, ddomingo, jbrassow, rlerch, tao
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Configuring IRQ SMP affinity has no effect on some devices that use message signalled interrupts (MSI) with no MSI per-vector masking capability. Examples of such devices include Broadcom NetXtreme Ethernet devices that use the bnx2 driver. If you need to configure IRQ affinity for such a device, disable MSI by creating a file in /etc/modprobe.d/ containing the following line:

options bnx2 disable_msi=1

Alternatively, you can disable MSI completely using the kernel boot parameter pci=nomsi.
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-08-26 13:34:56 UTC Type: ---
Bug Depends On:    
Bug Blocks: 391501, 448732, 454962    
Attachments:
Description                      Flags
Patch fixing smp irq affinity    none
fix smp_affinity for MSI         none

Description Flavio Leitner 2008-02-12 03:29:23 UTC
Description of problem:
Echoing a mask to /proc/irq/<irq>/smp_affinity and then
checking /proc/interrupts shows that the mask had no effect.

Version-Release number of selected component (if applicable):
Up to 2.6.18-80.el5

How reproducible:
Always

Steps to Reproduce:
1. Select a device that uses MSI without mask and pending bit support
2. Find the device's IRQ
3. Change the mask by echoing a new value to /proc/irq/<IRQ>/smp_affinity
4. Check the results in /proc/interrupts
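
The steps above can be sketched in Python as follows. This is only an illustration and not part of the original report: the helper names (cpu_mask_hex, irq_counts, set_affinity) are made up here, the write requires root, and the parser assumes the usual /proc/interrupts layout with a header row and one count column per CPU.

```python
def cpu_mask_hex(cpus):
    """Return the hex bitmask that selects the given CPU numbers.

    Assumes CPU numbers below 32; larger masks need comma-separated
    32-bit groups, which this sketch does not produce.
    """
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def irq_counts(interrupts_text, irq):
    """Return the per-CPU counts for one IRQ, or None if it is not listed."""
    lines = interrupts_text.splitlines()
    if not lines:
        return None
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == f"{irq}:":
            return [int(n) for n in fields[1:1 + ncpus]]
    return None


def set_affinity(irq, cpus):
    """Write the mask for `cpus` to /proc/irq/<irq>/smp_affinity (needs root)."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(cpu_mask_hex(cpus))
```

Calling set_affinity(irq, [1]) and then comparing two irq_counts readings taken a few seconds apart shows whether the CPU1 column has started to grow. On an affected device it would not.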
  
Actual results:
The mask is ignored

Expected results:
IRQs moved to another CPU

Additional info:
MSI should be disabled while moving IRQs on chips without
Mask-and-Pending bits. This patch, based on upstream code,
merges the two IRQ chip drivers and also adds msi_set_enable() to
handle this case.

Flavio

Comment 1 Flavio Leitner 2008-02-12 03:29:23 UTC
Created attachment 294621 [details]
Patch fixing smp irq affinity

Comment 2 Steven Dake 2008-03-18 15:28:35 UTC
*** Bug 432877 has been marked as a duplicate of this bug. ***

Comment 3 Steven Dake 2008-03-18 16:47:44 UTC
Please ignore comment #2; it was made in error.

Comment 4 Michal Schmidt 2008-03-19 19:37:33 UTC
I'm testing the patch on hp-dl360g5-01.rhts.boston.redhat.com. It works.

I can see the patch combines pieces of upstream commits:
b1cbf4e4 [PATCH] msi: fix up the msi enable/disable logic
277bc33b [PATCH] msi: only use a single irq_chip for msi interrupts
58e0543e [PATCH] msi: support masking msi irqs without a mask bit

There should be no problem. I'm granting devel ACK.

Comment 5 RHEL Program Management 2008-03-19 19:39:47 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Michal Schmidt 2008-03-26 19:44:10 UTC
When I was running a Perl testcase for this bug on hp-dl360g5-02 I noticed
spurious interrupts on a seemingly unrelated IRQ.

The symptoms are very similar to a problem we had on the realtime kernel where
the masking of an active interrupt caused the interrupt controller to deliver an
extra "legacy" interrupt (bug 236150 and others).

On hp-dl360g5-02 with 2.6.18-86.el5 + the patch I can see in /proc/interrupts:
138: ...  PCI-MSI  eth0
169: ...  IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb2

With the patch I can migrate IRQ 138, but every interrupt there is duplicated to
IRQ 169. After 100000 such spurious interrupts, the kernel disables IRQ 169
completely. When booting with noapic, the three devices (eth0, usb1, usb2) share
the same interrupt line. This is consistent with the observations in the
realtime kernel bug.
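
One way to spot this kind of duplication is to take two snapshots of /proc/interrupts and look for an IRQ whose counter grew by about the same amount as the MSI one. The following Python sketch illustrates the idea. It is not the Perl testcase mentioned above; the function names and the 5% tolerance are assumptions chosen for illustration.

```python
def irq_totals(interrupts_text):
    """Map each IRQ label to its count summed over all CPUs."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        counts = fields[1:1 + ncpus]
        # Skip rows that do not carry one numeric count per CPU.
        if len(counts) == ncpus and all(c.isdigit() for c in counts):
            totals[fields[0][:-1]] = sum(int(c) for c in counts)
    return totals


def shadowed_irqs(before, after, source_irq, tolerance=0.05):
    """Return IRQs that grew by roughly as much as source_irq did."""
    b, a = irq_totals(before), irq_totals(after)
    grew = a[source_irq] - b[source_irq]
    if grew <= 0:
        return []
    result = []
    for irq, total in a.items():
        if irq == source_irq or irq not in b:
            continue
        delta = total - b[irq]
        if abs(delta - grew) <= tolerance * grew:
            result.append(irq)
    return result
```

With snapshots taken before and after a burst of network traffic, shadowed_irqs(before, after, "138") returning ["169"] would match the behaviour described here. A match is only a hint, since two unrelated IRQs can grow at similar rates by coincidence.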

I tried a 2.6.23-based Fedora kernel on the machine and it seemed to work
perfectly. I could migrate the MSI interrupt, and saw no spurious interrupts. So
there is a way to do it correctly. I just need to figure out which patches to
cherry-pick.

Comment 7 Michal Schmidt 2008-04-03 15:33:54 UTC
OK, the needed commit is 1769b46a "PCI MSI: always toggle legacy-INTx-enable bit
upon MSI entry/exit". I saw no spurious interrupts with this one added.

And I'm thinking of adding commits:
 - 348e3fd1 "msi: synchronously mask and unmask msi-x irqs."
 - 988cbb15 "PCI: Flush MSI-X table writes"

Their descriptions suggest they're important for MSI interrupt migration, but
the problems they are supposed to fix are not easily reproducible.


Comment 8 Flavio Leitner 2008-04-04 01:15:06 UTC
I saw commit 1769b46a while debugging and considered it, but I didn't notice
those spurious interrupts. Yes, I agree we should include it.

IMO, 348e3fd1 is also good to include.

The 988cbb15 commit is for MSI-X, so I would keep it separate in another bz to
avoid confusion.


Comment 9 Michal Schmidt 2008-04-10 12:50:00 UTC
Created attachment 301970 [details]
fix smp_affinity for MSI

I sent a series of 5 individual small patches to rhkernel-list. This is the
series in one diff.

Comment 10 Michal Schmidt 2008-04-14 14:52:14 UTC
Andy Gospodarek suggested on the list that we should rebase the MSI code to a
newer upstream version, rather than cherry-picking. I'll try and see if we can
go to 2.6.21.

Comment 13 Michal Schmidt 2008-07-10 10:02:13 UTC
Rebasing the code to 2.6.21 had too wide an impact. Instead I took most of the
patches from 2.6.19 (among them were one of the patches I needed and all the
patches Andy wanted) and then I cherry-picked the remaining patches. I built a
test kernel here:
http://knedlo.englab.brq.redhat.com/scratch/mschmidt/bz432451/
Flavio, could you please test it?
I haven't tested it myself yet, I'm waiting for a reservation in RHTS to succeed.

Comment 17 Michal Schmidt 2008-07-11 06:41:11 UTC
I have the test machine reserved from RHTS. The test kernel panics during boot
on it. There's no need to test this kernel further, I have to find the bug first.

Comment 18 Michal Schmidt 2008-07-11 14:07:58 UTC
I found the bug, fixed it and verified it on the machine in RHTS. A scratch
build is currently in progress in Brew:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1388619
It should finish in an hour or two. Then it would be nice if people could test
it. Thank you.


Comment 19 Michal Schmidt 2008-07-28 11:15:15 UTC
In a recent upstream discussion it became apparent that the patch "msi: support
masking msi irqs without a mask bit" causes the MSI registers to be handled in a
way that is prohibited by the PCI specification and it could cause spurious
interrupts to be generated and/or interrupts being lost.

The developers reached the conclusion that MSI IRQ affinity cannot be changed
reliably for PCI devices without MSI mask bits (an optional MSI feature):
http://lkml.org/lkml/2008/7/25/284
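
To find out which devices on a system fall into this category, the MSI capability line in the output of lspci -vv can be inspected. The Python sketch below is a rough illustration under one loud assumption: it expects the "MSI: ... Maskable-" wording printed by recent pciutils, and older versions format this line differently. The function name is made up for this example.

```python
import re


def devices_without_msi_masking(lspci_vv_text):
    """Return header lines of devices whose MSI capability shows Maskable-."""
    devices = []
    current = None
    for line in lspci_vv_text.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new device section.
            current = line.strip()
        elif current and re.search(r"\bMSI:.*\bMaskable-", line):
            # "MSI-X:" lines do not match because of the ":" right after MSI.
            devices.append(current)
    return devices
```

The input would typically be captured as root, for example with subprocess.run(["lspci", "-vv"], capture_output=True, text=True).stdout, because capability details may be hidden from unprivileged users.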

The Broadcom NetXtreme Ethernet device (bnx2) does not have MSI mask bits. If
setting IRQ affinity is needed for it, MSI can be disabled with the disable_msi
module parameter. Classical INTx interrupts are routed via an IO-APIC, for which
IRQ affinity works.
In modprobe configuration that would be:
  options bnx2 disable_msi=1
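
After reloading the driver with that option, one way to confirm that the device really fell back to INTx is to check the interrupt type shown on its line in /proc/interrupts. The following Python sketch shows the check. The function name is hypothetical, and the parser assumes the layout seen earlier in this bug: IRQ number, one count per CPU, the interrupt type, then the device names.

```python
def interrupt_type(interrupts_text, device):
    """Return the interrupt type shown for `device`, or None if not found."""
    lines = interrupts_text.splitlines()
    if not lines:
        return None
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        rest = fields[1 + ncpus:]  # interrupt type followed by device names
        if len(rest) < 2:
            continue
        names = [name.rstrip(",") for name in rest[1:]]
        if device in names:
            return rest[0]
    return None
```

If interrupt_type(text, "eth0") still returns "PCI-MSI", the option was not applied. A value starting with "IO-APIC" suggests the device is using a classical INTx line, where smp_affinity is expected to work.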

Would disabling MSI for bnx2 have a big performance impact?


Comment 20 Andy Gospodarek 2008-07-28 13:19:50 UTC
Disabling MSI generally has a performance impact, but on some systems it is more
significant than on others.

Comment 21 Michal Schmidt 2008-08-07 15:44:13 UTC
Don,

I suggest a release note for 5.3 "Known Issues":

<quote>
Setting of IRQ SMP affinity has no effect for some devices using message signalled interrupts (MSI) without MSI per-vector masking capability. Known affected are Broadcom NetXtreme Ethernet devices using the <code class="filename">bnx2</code> driver. If setting of IRQ affinity is required for such a device, MSI can be disabled using a module parameter by creating a file in <code class="filename">/etc/modprobe.d/</code> containing the line:
<pre>
options bnx2 disable_msi=1
</pre>
Alternatively, MSI capability can be disabled completely for the system by adding the kernel boot parameter <code class="command">pci=nomsi</code>.
</quote>

Comment 23 Ryan Lerch 2008-08-20 03:04:05 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Setting of IRQ SMP affinity has no effect for some devices using message
signalled interrupts (MSI) without MSI per-vector masking capability. Known
affected are Broadcom NetXtreme Ethernet devices using the bnx2 driver. 

If setting of IRQ affinity is required for
such a device, MSI can be disabled using a module parameter by creating a file
in /etc/modprobe.d/ containing the line:

options bnx2 disable_msi=1

Alternatively, MSI capability can be disabled completely for the system by
adding the kernel boot parameter pci=nomsi

Comment 24 Don Domingo 2008-08-20 03:11:53 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,12 +1,8 @@
-Setting of IRQ SMP affinity has no effect for some devices using message
-signalled interrupts (MSI) without MSI per-vector masking capability. Known
-affected are Broadcom NetXtreme Ethernet devices using the bnx2 driver. 
+Configuring IRQ SMP affinity has no effect on some devices that use message
+signalled interrupts (MSI) with no MSI per-vector masking capability. Examples of such devices include Broadcom NetXtreme Ethernet devices that use the bnx2 driver.
 
-If setting of IRQ affinity is required for
-such a device, MSI can be disabled using a module parameter by creating a file
-in /etc/modprobe.d/ containing the line:
+If you need to configure IRQ affinity for such a device, disable MSI by creating a file in /etc/modprobe.d/ containing the following line:
 
 options bnx2 disable_msi=1
 
-Alternatively, MSI capability can be disabled completely for the system by
-adding the kernel boot parameter pci=nomsi
+Alternatively, you can disable MSI completely using the kernel boot parameter pci=nomsi.

Comment 25 Ryan Lerch 2008-08-20 05:21:30 UTC
Tracking this bug for the Red Hat Enterprise Linux 5.3 Release Notes. 

This Release Note is currently located in the 'Known issues' section.

Comment 26 Michal Schmidt 2008-08-26 13:34:56 UTC
The related ITs are marked as Resolved and the issue is described in the release notes. Closing with WONTFIX.