Bug 468267
Summary: | Interrupts presented through multiple P2P PCI Express bridges to the OS are not processed correctly
---|---
Product: | Red Hat Enterprise Linux 5
Reporter: | ldekay <ldekay>
Component: | kernel
Assignee: | Arnd Bergmann <arnd>
Status: | CLOSED WONTFIX
QA Contact: | Red Hat Kernel QE team <kernel-qe>
Severity: | low
Docs Contact: |
Priority: | low
Version: | 5.2
CC: | arnd, hannsj_uhl, ldekay
Target Milestone: | rc
Target Release: | ---
Hardware: | ppc64
OS: | Linux
Whiteboard: |
Fixed In Version: |
Doc Type: | Bug Fix
Doc Text: |
Story Points: | ---
Clone Of: |
Environment: |
Last Closed: | 2014-06-02 13:09:23 UTC
Type: | ---
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | ---
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Attachments: |
Description
ldekay
2008-10-23 19:59:01 UTC
The problem is not a bug in RHEL5, but rather a known deficiency in the firmware. The QS21 firmware does not formally support P2P bridges at this point because INTx interrupts are known to be misrouted in the device tree information that is passed to the operating system. If you wish to add support for INTx in QS21 and/or get formal support for P2P bridges in that firmware, please open a support request with your IBM contact.

Arnd, how can that explain why we have no issues with Fedora 7?

I suspect that on the Fedora 7 system, you had installed the extension card in one of the working slots. It is really hard to tell now because you are running an obscure kernel version on an outdated distribution. Try reproducing with Fedora 9, or 2.6.27, or at least the latest Fedora 7 kernel (2.6.23.17-88.fc7), and post the output of 'lspci -vvx' on RHEL5.2 and a working kernel. Also, if you have no issues with Fedora, why insist on using RHEL with a non-default kernel boot option?

We captured PCIe traces in Fedora 7 and RHEL 5.2 using the same slots. We saw INTA, INTB, INTC, and INTD asserted and deasserted in Fedora 7, but not in RHEL 5.2. In RHEL 5.2, we only see INTA asserted and deasserted; INTB, INTC, and INTD are asserted, but we never see them deasserted.

Please attach the output of 'lspci -vvx' and the contents of /proc/interrupts on both systems.

Created attachment 321461 [details]
Tarball of /proc/interrupts and lspci -vvx from rhel 5.2 and fedora 7
Added a tarball of /proc/interrupts and lspci -vvx from RHEL 5.2 and Fedora 7. The PCI Express endpoint of interest is a SysKonnect NIC (eth2) mapped to INTC.

The two listings show the same results: The attachment lists device 0002:14:00.0 as having interrupt 106, which is INTA of the PCIe host bridge. The device is connected through the bridges 02:00:00.0, 02:01:00.0, 02:02:0a.9, 02:08:00.0 and finally 02:09:0c.0, which means that according to the PCI IRQ swizzling rules you listed in comment #1, it should be 108 (INTC) of the host controller. Moreover, the device has never received any interrupts, indicating that it is not even initialized, although the device driver has clearly been loaded. From all I can tell, the device is just as broken in Fedora 7 as it is in RHEL.

The fact that IRQ 106 shows up in /proc/interrupts means that the device has received an interrupt. It would not be in the table if it had not. The difference is that RHEL 5.2 is not getting the deassert back to the device, whereas Fedora does. This is confirmed in the PCI Express traces showing that the interrupt deassert happens in Fedora, but not in RHEL. I can send you results from testing with another device if it will help to convince you.

I have done testing with another device, an AJA video capture card, that shows the same end result. It is broken in RHEL 5.2 and fully functional in Fedora 7. The results are cleaner, though. It will show the interrupt counter increasing in Fedora 7, but not in RHEL. It will also show you the same mapping in both OSes, pin INTA routed to IRQ 106. I will have to repeat the tests to capture the output that you want, though. Do you want those results captured? Although the device driver for the SysKonnect has issues in both kernels that I have not looked into, it is not just as broken in Fedora 7 as it is in RHEL. In Fedora 7, the device is functional as an Ethernet controller and it is not in RHEL 5.2 (DHCP lease, ping, etc. work in Fedora, not in RHEL).
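
The swizzling argument in the reply above can be checked with a short sketch. It assumes the conventional PCI-to-PCI bridge rule (the pin seen on the parent bus is the child's pin rotated by the child's device number, modulo 4), reads the addresses quoted above as bus:device.function and uses only the device numbers, and assumes the host bridge's INTA to INTD map to consecutive IRQs starting at 106. The function and constant names are hypothetical and do not come from the kernel or the firmware.

```python
# Sketch of conventional INTx swizzling across PCI-to-PCI bridges.
# Assumption: pins are numbered 1..4 for INTA..INTD.

PIN_NAMES = {1: "INTA", 2: "INTB", 3: "INTC", 4: "INTD"}


def swizzle_pin(pin, slot):
    """Return the pin seen on the parent bus for a device in `slot`."""
    return (pin - 1 + slot) % 4 + 1


def pin_at_host_bridge(pin, slots):
    """Apply the swizzle once per bus crossed, walking from the endpoint upward."""
    for slot in slots:
        pin = swizzle_pin(pin, slot)
    return pin


# Device numbers read from the addresses in the thread, endpoint first:
# 14:00.0 (endpoint), 09:0c.x, 08:00.0, 02:0a.x, 01:00.0.
# The topmost bridge (00:00.0) sits on the root bus and is not swizzled.
SLOTS = [0x00, 0x0C, 0x00, 0x0A, 0x00]

# Assumption: IRQ 106 is INTA of the host bridge, with INTB..INTD following it.
HOST_INTA_IRQ = 106

pin = pin_at_host_bridge(1, SLOTS)  # the endpoint raises INTA
irq = HOST_INTA_IRQ + (pin - 1)
print(PIN_NAMES[pin], irq)  # prints "INTC 108"
```

Under these assumptions the endpoint's INTA arrives at the host bridge as INTC, which matches the IRQ 108 expected in the reply rather than the IRQ 106 reported by lspci.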
I have to correct a statement made in comment #9. I have not tested the AJA card in Fedora 7. We do not have a driver available to test the device in that kernel. What I did test was that device mapped to pin INTA and then again to pin INTC in RHEL 5.2, which showed that interrupts were hung on INTC. I will attach those results.

Created attachment 321627 [details]
Tarball of /proc/interrupts and lspci -vvx from rhel 5.2 with AJA on INTA and INTC
This file contains the results of lspci -vvx and cat /proc/interrupts run in two different tests: the AJA card on INTA and the AJA card on INTC, both in RHEL 5.2.
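
For comparing two /proc/interrupts captures such as the ones attached, a minimal sketch along these lines can report whether the counter for a given IRQ moved between captures. The sample text below is made up for illustration (including the controller name) and is not taken from the attachments; the function names are hypothetical.

```python
# Sketch: compare the per-IRQ totals of two /proc/interrupts captures.

def parse_irq_counts(text):
    """Map each numeric IRQ to its count summed over all CPU columns."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip non-numeric rows such as summary counters
        fields = rest.split()[:ncpus]
        counts[int(label)] = sum(int(f) for f in fields if f.isdigit())
    return counts


def irq_delta(before, after, irq):
    """Return how many interrupts were counted for `irq` between two captures."""
    return parse_irq_counts(after).get(irq, 0) - parse_irq_counts(before).get(irq, 0)


# Hypothetical captures, not real data from this bug.
BEFORE = """\
           CPU0       CPU1
106:          5          0   EXAMPLE-PIC  Level  eth2
"""
AFTER = """\
           CPU0       CPU1
106:          5          0   EXAMPLE-PIC  Level  eth2
"""

if irq_delta(BEFORE, AFTER, 106) == 0:
    print("IRQ 106 counter did not advance between captures")
```

A delta of zero while the device is under load is consistent with an interrupt that was asserted once and never serviced again, but it does not by itself show where the routing goes wrong.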
(In reply to comment #11)
> Created an attachment (id=321627) [details]

Thanks, this confirms what I was saying earlier about the firmware. In your 'INTA' listing, INTA of the AJA card is correctly routed to IRQ 106. In your 'INTC' listing, the firmware also routes the INTA line to IRQ 106, but it should be IRQ 108.

(In reply to comment #9)
> In Fedora 7, the device is functional as an ethernet controller and it
> is not in RHEL 5.2 (DHCP lease, ping, etc work in Fedora, not in RHEL).

If it works in Fedora, that is probably the result of an unrelated bug in Fedora and purely a coincidence.

IBM has confirmed that this is a bug in the QS-21 firmware as explained in comment #1. A bug fix has been provided and confirmed to work correctly.

This bug/component is not included in scope for RHEL-5.11.0, which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of the RHEL5.11 development phase, Apr 22, 2014). Please contact your account manager or support representative in case you need to escalate this bug.

Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in the RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days