Red Hat Bugzilla – Bug 468267
Interrupts presented through multiple P2P PCI Express bridges to the OS are not processed correctly
Last modified: 2014-06-02 09:09:23 EDT
It appears that the Red Hat Enterprise Linux 5.2 kernel processes only interrupts that are mapped to INTA and does not correctly map the INTx virtual wires across multiple P2P (PCI-to-PCI) bridges to the system interrupt resources. As a result, interrupts that are mapped to INTB, INTC, or INTD are not serviced.
According to the PCIe Base Specification Rev 1.0a (page 66), when INTx interrupts are presented across a switch, "Virtual and actual PCI to PCI Bridges must map the virtual wires tracked on the secondary side of the Bridge according to the Device Number of the Device on the secondary side of the Bridge, as shown in Table 2-13".
Table 2-13 (PCIe Base Specification Rev 1.0a):

Device Number for Device on Secondary Side of Bridge (Interrupt Source) | INTx Virtual Wire on Secondary Side of Bridge | Mapping to INTx Virtual Wire on Primary Side of Bridge
0, 4, 8, 12, 16, 20, 24, 28 | INTA | INTA
1, 5, 9, 13, 17, 21, 25, 29 | INTA | INTB
2, 6, 10, 14, 18, 22, 26, 30 | INTA | INTC
3, 7, 11, 15, 19, 23, 27, 31 | INTA | INTD
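Table 2-13 amounts to modular arithmetic on the device number: the primary-side wire index is the secondary-side wire index plus the device number, modulo 4. The following is a minimal Python sketch under that reading, assuming INTA through INTD are indexed 0 to 3; the helper name map_intx is hypothetical. It also accepts INTB to INTD on the secondary side, following the usual PCI-to-PCI bridge rotation, which the rows above do not list.

```python
PINS = ["INTA", "INTB", "INTC", "INTD"]

def map_intx(device_number: int, secondary_pin: str) -> str:
    """Return the primary-side INTx wire for a secondary-side device/pin pair."""
    if not 0 <= device_number <= 31:
        raise ValueError("PCI device numbers range from 0 to 31")
    index = PINS.index(secondary_pin)  # raises ValueError for an unknown pin name
    # Rotate the wire by the device number, wrapping around after INTD.
    return PINS[(index + device_number) % 4]

# Reproduce the INTA rows of Table 2-13.
for first in range(4):
    devices = list(range(first, 32, 4))
    print(devices, "INTA ->", map_intx(first, "INTA"))
```

Running it prints the four INTA rows of the table, one per group of device numbers.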
We have seen in PCI Express traces that the PCI Express endpoint generates ASSERT_INTx, but a DEASSERT_INTx is returned only when the interrupt is passed upstream as INTA. The failure is not exhibited by the EXACT same hardware/firmware running Fedora with kernel 2.6.22-5.
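The trace observations depend on how INTx travels in PCI Express. Each wire is level-sensitive and is signalled with ASSERT_INTx and DEASSERT_INTx messages, and a bridge or switch tracks the state of every virtual wire on its secondary side, forwarding a message upstream only when the combined (logical OR) state of a primary-side wire changes. The toy Python model below is a hypothetical sketch of that bookkeeping, not code from any kernel or device; the class and method names are made up for illustration. It shows why a source that never deasserts leaves its primary-side wire asserted.

```python
PINS = ["INTA", "INTB", "INTC", "INTD"]

class BridgeIntxModel:
    """Simplified model of INTx virtual-wire collapsing in a PCI-to-PCI bridge."""

    def __init__(self):
        # (device_number, secondary_pin) pairs that are currently asserted.
        self.asserted = set()

    @staticmethod
    def _primary_pin(device_number, pin):
        # Table 2-13 style mapping from the secondary to the primary side.
        return PINS[(PINS.index(pin) + device_number) % 4]

    def _primary_state(self, primary_pin):
        # A primary-side wire is asserted if any source mapping to it is asserted.
        return any(self._primary_pin(d, p) == primary_pin for d, p in self.asserted)

    def receive(self, device_number, pin, is_assert):
        """Handle a message from the secondary side; return the upstream message or None."""
        primary = self._primary_pin(device_number, pin)
        before = self._primary_state(primary)
        if is_assert:
            self.asserted.add((device_number, pin))
        else:
            self.asserted.discard((device_number, pin))
        after = self._primary_state(primary)
        if after == before:
            return None  # collapsed: the primary-side wire did not change state
        return ("ASSERT_" if after else "DEASSERT_") + primary
```

For example, an assert from device 2 on INTA is forwarded as ASSERT_INTC; a second source that also maps to INTC (device 6) is absorbed, and DEASSERT_INTC goes upstream only after both sources have deasserted.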
Steps to Reproduce:
1. The hardware setup must include multiple PCI Express P2P bridges, and the interrupt generated must be mapped to INTB, INTC, or INTD. INTA does not fail.
2. An IBM QS-21 blade running RHEL 5.2 for PPC64, with a NextIO N1400-PCM PCI Express High Speed Switch Module (PCM) and the N2800-I/O Consolidation Appliance (ICA), is useful for creating the failing scenario.
3. Set the kernel parameter pci=nomsi so that the PCI Express endpoint generates legacy interrupts instead of MSIs.
4. Force the endpoint to generate an interrupt.
Actual results: The interrupt is never serviced.
Expected results: The interrupt should be serviced.
If hardware is needed to reproduce this issue and further debug, please contact email@example.com
It will also be possible to debug further in NextIO's lab and send the results to Red Hat.
The problem is not a bug in RHEL5, but rather a known deficiency in the firmware. The QS21 firmware does not formally support P2P bridges at this point because INTx interrupts are known to be misrouted in the device tree information that is passed to the operating system.
If you wish to add support for INTx in QS21 and/or get formal support for P2P bridges in that firmware, please open a support request with your IBM contact.
Arnd, how can that explain why we have no issues with Fedora 7?
I suspect that on the Fedora 7 system, you had installed the extension card in one of the working slots. It is really hard to tell now because you are running an obscure kernel version on an outdated distribution.
Try reproducing with Fedora 9, or 2.6.27 or at least the latest Fedora 7 kernel (18.104.22.168-88.fc7), and post the output of 'lspci -vvx' on RHEL5.2 and a working kernel.
Also, if you have no issues with Fedora, why insist on using RHEL with a non-default kernel boot option?
We captured PCIe traces in Fedora 7 and RHEL 5.2 using the same slots. In Fedora 7 we saw INTA, INTB, INTC, and INTD asserted and deasserted. In RHEL 5.2 we see only INTA asserted and deasserted; INTB, INTC, and INTD are asserted, but we never see them deasserted.
Please attach the output of 'lspci -vvx' and the contents of /proc/interrupts on both systems.
Created attachment 321461
Tarball of /proc/interrupts and lspci -vvx from RHEL 5.2 and Fedora 7
Added a tarball of /proc/interrupts and lspci -vvx output from RHEL 5.2 and Fedora 7. The PCI Express endpoint of interest is a SysKonnect NIC (eth2) mapped to INTC.
The two listings show the same results:
The attachment lists device 0002:14:00.0 as having interrupt 106, which is INTA of the PCIe host bridge. The device is connected through the bridges 02:00:00.0, 02:01:00.0, 02:02:0a.9, 02:08:00.0 and finally 02:09:0c.0, which means that according to the PCI IRQ swizzling rules you listed in comment #1, it should be 108 (INTC) of the host controller.
Moreover, the device has never received any interrupts, indicating that it is not even initialized, although the device driver has clearly been loaded.
From all I can tell, the device is just as broken in Fedora 7 as it is in RHEL.
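The bridge chain described above applies the Table 2-13 rule once per bridge, each time using the device number (on that bridge's secondary bus) of the next device down the path, until the wire reaches the host bridge. A Python sketch of that walk follows; it is similar in spirit to the Linux kernel's pci_common_swizzle() helper, but it is not that code. The function names are hypothetical, and the mapping of host INTA to INTD onto IRQs 106 to 109 is an assumption extrapolated from the 106 (INTA) and 108 (INTC) values quoted in this report.

```python
PINS = ["INTA", "INTB", "INTC", "INTD"]

def swizzle_path(endpoint_pin, device_numbers):
    """Map an endpoint's INTx pin through a chain of PCI-to-PCI bridges.

    device_numbers lists, for each bridge crossed from the endpoint upward,
    the device number of the path's device on that bridge's secondary bus
    (the endpoint itself for the first bridge).
    """
    index = PINS.index(endpoint_pin)
    for dev in device_numbers:
        index = (index + dev) % 4  # Table 2-13 rule, applied once per bridge
    return PINS[index]

# Assumption: host bridge INTA..INTD are delivered as IRQs 106..109.
HOST_IRQ_BASE = 106

def expected_host_irq(endpoint_pin, device_numbers):
    """Return the host IRQ the endpoint's INTx should arrive on (under the assumption above)."""
    return HOST_IRQ_BASE + PINS.index(swizzle_path(endpoint_pin, device_numbers))
```

For instance, with illustrative device numbers [0, 2, 0] from the endpoint upward, an INTA from the endpoint arrives at the host as INTC, so expected_host_irq("INTA", [0, 2, 0]) returns 108.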
The fact that IRQ 106 shows up in /proc/interrupts means that the device has received an interrupt. It would not be in the table if it had not. The difference is that RHEL 5.2 is not getting the deassert back to the device, whereas Fedora does. This is confirmed in the PCI Express traces showing that the interrupt deassert happens in Fedora, but not in RHEL.
I can send you results from testing with another device if it will help to convince you. I have done testing with another device, an AJA video capture card, that shows the same end result. It is broken in RHEL 5.2 and fully functional in Fedora 7. The results are cleaner though. It will show the interrupt counter increasing in Fedora 7, but not in RHEL. It will also show you the same mapping in both OSes, pin INTA routed to IRQ 106. I will have to repeat the tests to capture the output that you want though. Do you want those results captured?
Although the device driver for the SysKonnect has issues in both kernels that I have not looked into, it is not just as broken in Fedora 7 as it is in RHEL. In Fedora 7, the device is functional as an ethernet controller and it is not in RHEL 5.2 (DHCP lease, ping, etc work in Fedora, not in RHEL).
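The interrupt-counter check described in this comment can be scripted by sampling /proc/interrupts twice and comparing per-IRQ totals. The following is a minimal Python sketch, assuming the usual layout of a header row of CPU columns followed by one row per IRQ with a count under each CPU; the function names parse_interrupts and irqs_that_fired are hypothetical.

```python
import time

def parse_interrupts(text):
    """Return {irq_label: total_count} summed over all CPU columns."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # some summary rows (e.g. ERR) carry fewer count columns
            counts.append(int(field))
        totals[label.strip()] = sum(counts)
    return totals

def irqs_that_fired(interval=5.0, path="/proc/interrupts"):
    """Sample the file twice and return the IRQ labels whose counters increased."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return sorted(irq for irq, n in after.items() if n > before.get(irq, 0))
```

Calling irqs_that_fired() while generating traffic on the device would be expected to list the device's IRQ only if its interrupts are actually being delivered and handled.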
I have to correct a statement made in comment #9. I have not tested the AJA card in Fedora 7. We do not have a driver available to test the device in that kernel. What I did test was that device mapped to pin INTA and then again to pin INTC in RHEL5.2 and showed that interrupts were hung on INTC. I will attach those results.
Created attachment 321627
Tarball of /proc/interrupts and lspci -vvx from RHEL 5.2 with AJA on INTA and INTC
This file contains the output of lspci -vvx and cat /proc/interrupts from two different tests, both in RHEL 5.2: the AJA card on INTA and the AJA card on INTC.
(In reply to comment #11)
> Created an attachment (id=321627)
Thanks, this confirms what I was saying earlier about the firmware. In your 'INTA' listing, INTA of the AJA card is correctly routed to IRQ 106; in your 'INTC' listing, the firmware also routes the INTA line to IRQ 106, but it should be IRQ 108.
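The comparison made here, reading the IRQ that lspci reports for a device and checking it against the IRQ its bridge path calls for, can be automated. A Python sketch follows, assuming the "Interrupt: pin X routed to IRQ N" line that lspci prints in verbose (-vv) mode under each device header; the function name routed_irqs is hypothetical.

```python
import re

# A device header line starts at column 0 with [domain:]bus:device.function.
DEVICE_RE = re.compile(r"^([0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]\s")
IRQ_RE = re.compile(r"Interrupt: pin ([A-D]) routed to IRQ (\d+)")

def routed_irqs(lspci_text):
    """Return {pci_address: (pin, irq)} parsed from `lspci -vv` style output."""
    result = {}
    current = None
    for line in lspci_text.splitlines():
        if DEVICE_RE.match(line):
            current = line.split()[0]  # e.g. "0002:14:00.0"
        elif current is not None:
            m = IRQ_RE.search(line)
            if m:
                result[current] = ("INT" + m.group(1), int(m.group(2)))
    return result
```

Applied to the 'INTC' listing described above, it should report IRQ 106 for the AJA card, where IRQ 108 is expected.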
(In reply to comment #9)
> In Fedora 7, the device is functional as an ethernet controller and it
> is not in RHEL 5.2 (DHCP lease, ping, etc work in Fedora, not in RHEL).
If it works in Fedora, that is probably the result of an unrelated bug in Fedora and purely coincidental.
IBM has confirmed that this is a bug in the QS-21 firmware as explained in comment #1. A bug fix has been provided and confirmed to work correctly.
This bug/component is not included in scope for RHEL-5.11.0 which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of RHEL5.11 development phase (Apr 22, 2014)). Please contact your account manager or support representative in case you need to escalate this bug.
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).