Bug 438330
Summary: | HP dl360g5: pci_enable_msix() fails
---|---
Product: | Red Hat Enterprise Linux 5
Reporter: | Eli Cohen <eli>
Component: | kernel
Assignee: | Tony Camuso <tcamuso>
Status: | CLOSED CURRENTRELEASE
QA Contact: | Red Hat Kernel QE team <kernel-qe>
Severity: | high
Priority: | low
Version: | 5.1
CC: | dledford, dzickus
Target Milestone: | rc
Hardware: | x86_64
OS: | Linux
Doc Type: | Bug Fix
Last Closed: | 2010-11-17 19:08:52 UTC
Description
Eli Cohen
2008-03-20 13:03:32 UTC
I think your problem may be hardware specific. From RHEL5.2 beta kernel on a Dell PowerEdge 1900 (or 1950, can't remember which):

    [dledford@ib0test2 ~]$ cat /proc/interrupts
               CPU0       CPU1       CPU2       CPU3
      0:  533316704          0          0          0  IO-APIC-edge   timer
      1:          3          0          0          0  IO-APIC-edge   i8042
      6:          5          0          0          0  IO-APIC-edge   floppy
      8:          1          0          0          0  IO-APIC-edge   rtc
      9:          0          0          0          0  IO-APIC-level  acpi
     12:          4          0          0          0  IO-APIC-edge   i8042
     14:    4787812       2952          0          0  IO-APIC-edge   ide0
     66:   23814940      15201          0          0  IO-APIC-level  uhci_hcd:usb1, uhci_hcd:usb3, ehci_hcd:usb5
     74:         39          0          0          0  IO-APIC-level  uhci_hcd:usb2, uhci_hcd:usb4
     82:     556320       2038         40          0  IO-APIC-level  libata
     90:         93        119          0          0  PCI-MSI        ib_ipath
     98:         61         57          0          0  PCI-MSI        ib_ipath
    106:          0          0          0          0  PCI-MSI-X      eth1
    114:         27          0          0          0  PCI-MSI-X      eth1 (queue 0)
    178:    1740629          0          0          0  PCI-MSI        eth0
    NMI:       5939       2273       2527       2550
    LOC:  522492125  522744130  522491999  522743982
    ERR:          0
    MIS:          0
    [dledford@ib0test2 ~]$

I seem to recall we had to blacklist certain motherboard chipsets due to faulty MMCFG cycles. You may have one of those motherboards/chipsets and it may be refusing to allocate MSI interrupts because of that. Can you check to see if this is one of the affected platforms?

The bugzillas for the RHEL5 bugs related to this are:

    Bugzillas
    =========
    182436  xw9400 AMD proccessors do not support ext config space
    239673  stalled installation over HP Compaq 7700
    250313  MCP55 chipset hides PCI EXTCFG
    251032  add HP dl385g2 and dl585g2 to whitelist
    252215  dl585g2/AMD8132 blacklist
    253288  PCI domain support for x86/x86_64
    408551  all PCI express registers are not accessible

It would be good to know what platform Eli was on.

MMCFG is completely orthogonal to MSI-X. But there are also PCI quirks that disable MSI on various systems, and Eli may be hitting a new one.

The server is HP Proliant DL 360G5. We have kernel 2.6.24 running on this server with MSIX working fine.

Please try the latest RHEL 5.2 build and attach the output of dmesg.
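For comparing machines the way the /proc/interrupts listings in this thread do, a small helper can pull out only the MSI and MSI-X lines. This is a minimal sketch, assuming the RHEL5-era format quoted here, where the interrupt type appears as the literal field PCI-MSI or PCI-MSI-X; newer kernels print the type differently. The function name list_msi_irqs is hypothetical.

```shell
#!/bin/sh
# Print "IRQ TYPE DEVICE" for every MSI or MSI-X interrupt.
# $1: path to a /proc/interrupts-style file (defaults to /proc/interrupts).
# Assumes the type column is exactly "PCI-MSI" or "PCI-MSI-X".
list_msi_irqs() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($i == "PCI-MSI" || $i == "PCI-MSI-X") {
                irq = $1
                sub(/:$/, "", irq)          # strip the trailing colon
                dev = $(i + 1)              # device name may span fields
                for (j = i + 2; j <= NF; j++)
                    dev = dev " " $j
                print irq, $i, dev
                break
            }
        }
    }' "${1:-/proc/interrupts}"
}
```

Called with no argument it reads the live /proc/interrupts; called with a saved copy it lets two machines be compared offline. An empty result on a kernel that uses this format would suggest that no driver obtained an MSI or MSI-X vector.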
There is a MMCONFIG patch ACKed in Dec '07 and incorporated in Jan that should help with any MMCONFIG problems. Rather than using a blacklist, the patch first tests the Northbridge to see if MMCONFIG works. If so, then MMCONFIG (ergo MSI) is available. If the Northbridge does not respond correctly to MMCONFIG cycles, it is constrained to PortIO accesses, which may preclude MSI configuration.

We don't have RHEL 5.2 here. If you send us a copy we will install and test it. Can you see any kind of HW using MSIX on this machine with RHEL 5.2?

Eli, you can grab the latest RHEL5.2 build from http://people.redhat.com/dzickus/el5/

AFAICT MSI-X is working:

    [root@hp-dl360g5-01 ~]# cat /proc/interrupts | grep MSI
    114:   8667   1958    183      0      0      0      0      0  PCI-MSI-X  cciss0
    138:   5558      0      0      0      0      0      0      0  PCI-MSI    eth0

P.

The RPMs we found at the URL you specified do not contain kernel header files, so we can't build our driver. Can you point us to the missing kernels?

Here is the rpm for the latest kernel sources: http://people.redhat.com/dzickus/el5/88.el5/src/kernel-2.6.18-88.el5.src.rpm

And the mlx4 driver is already in that kernel, so you don't need to build it separately. However, I'm pretty sure the problem here isn't related to the mlx4 driver but is related to the core msi interrupt handling instead. What I would greatly appreciate is if you could install the src rpm from comment #8, then build two new kernels using the two patches I'm going to attach to this bug. The easiest way to do that would be something like this:

    install the src rpm above
    download the two patches I'm attaching to this bug
    cd /usr/src/redhat/SPECS
    edit kernel-2.6.spec to uncomment the #% define buildid and use it to differentiate between the two builds
    for each build, copy one of the patches to /usr/src/redhat/SOURCES/linux-kernel-test.patch and then run rpmbuild -ba --with baseonly kernel-2.6.spec

The binary rpms will get spit out into /usr/src/redhat/RPMS/<arch> where they can be installed and tested.
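The build steps above can be sketched as a pair of shell functions. This is only an illustration under stated assumptions: the function names set_buildid and build_test_kernel are hypothetical, the spec file is assumed to contain a line beginning with "#% define buildid", and the source RPM is assumed to be installed already (for example with rpm -ivh). Note that rpmbuild spells its build-all option -ba, with a single dash.

```shell
#!/bin/sh
# Sketch only: paths come from the thread, the spec line format is an assumption.

# Uncomment (or rewrite) the buildid line in a spec file so it reads
# "%define buildid .TAG". TAG must not contain "/" or "&".
set_buildid() {
    spec=$1
    tag=$2
    sed "s/^#\{0,1\}% \{0,1\}define buildid.*/%define buildid .$tag/" \
        "$spec" > "$spec.new" && mv "$spec.new" "$spec"
}

# Build one test kernel from a single patch.
# $1: patch file, $2: tag that distinguishes this build from the other one.
# Runs in a subshell so the caller's working directory is left alone.
build_test_kernel() (
    patch=$1
    tag=$2
    topdir=/usr/src/redhat
    cp "$patch" "$topdir/SOURCES/linux-kernel-test.patch" || exit 1
    cd "$topdir/SPECS" || exit 1
    set_buildid kernel-2.6.spec "$tag" || exit 1
    rpmbuild -ba --with baseonly kernel-2.6.spec
)
```

Running build_test_kernel once per patch, with a different tag each time, would leave two distinguishable sets of binary rpms under /usr/src/redhat/RPMS/<arch>.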
I'm interested in knowing whether either of the attached patches solves your problem by itself and, if neither does, whether both of them together do.

Created attachment 300288 [details]
Add a quirk for ht1000 pci-e bridges
This patch should really only make a difference if the failing system uses
HT1000 pci-e bridges.
Created attachment 300289 [details]
4 different upstream changes related to msi handling
This is a more generic set of changes that might resolve the issue regardless
of the chipset in the system.
Doug, is there any additional status? Any testing you want me to do? Or can we close this?

Not getting any responses, so I assume the patch submitted by Doug in comment 11 has fixed the problem. Please close this bug.