Bug 438776
| Field | Value |
|---|---|
| Summary | kernel panic with ib_ipath module with kernel 2.6.18-85.el5.x86_64 |
| Product | Red Hat Enterprise Linux 5 |
| Component | kernel |
| Version | 5.2 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Target Milestone | rc |
| Reporter | Gurhan Ozen <gozen> |
| Assignee | Doug Ledford <dledford> |
| CC | ananth, hancockrwd, jburke |
| Fixed In Version | RHBA-2008-0314 |
| Doc Type | Bug Fix |
| Last Closed | 2008-05-21 15:12:27 UTC |
Description
Gurhan Ozen
2008-03-25 02:42:44 UTC
The fix isn't a kernel patch, it's a change to the initscripts. It will be present in the openib-1.3-2.el5 package.

Ok, so I have been trying all this with the latest packages from the RHBA-2008:8175-11 advisory and the 2.6.18-88.el5 kernel, but I can't get the QLogic PCIe cards to work at all. The kernel can see and recognize the cards:

    22:00.0 InfiniBand: PathScale, Inc InfiniPath PE-800 (rev 01)
            Subsystem: PathScale, Inc InfiniPath PE-800
    3a:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1)
            Subsystem: Mellanox Technologies MT23108 InfiniHost
    51:00.0 InfiniBand: PathScale, Inc InfiniPath PE-800 (rev 02)
            Subsystem: PathScale, Inc InfiniPath PE-800
    60:14.0 InfiniBand: PathScale, Inc InfiniPath HT-400 (rev 03)
            Subsystem: PathScale, Inc InfiniPath HT-400

But I can't get them up:

    Apr 2 22:40:29 ibm-ridgeback kernel: ib_ipath 0000:22:00.0: IB link is not ACTIVE
    Apr 2 22:40:30 ibm-ridgeback kernel: ib_ipath 0000:51:00.0: IB link is not ACTIVE

I'll attach the output of dmidecode to this bug. If you'd like to poke around, the box is ibm-ridgeback.rhts.boston.redhat.com.

Created attachment 300164
dmidecode output
This isn't an InfiniBand problem. It is related to the MMCONFIG PCI changes made in the RHEL 5.2 kernel; this hardware is either an accidental or an intentional victim of those changes. In particular, we see these messages from the kernel:

    Linux version 2.6.18-88.el5 (brewbuilder.redhat.com) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-41)) #1 SMP Tue Apr 1 19:01:18 EDT 2008
    Command line: ro root=/dev/VolGroup00/LogVol00 console=tty0 console=ttyS0,115200 rhgb quiet
    ...
    ACPI: bus type pci registered
    PCI: Using MMCONFIG at f0000000
    PCI: No mmconfig possible on device 0:18
    PCI: No mmconfig possible on device 0:19
    PCI: No mmconfig possible on device 0:1a
    PCI: No mmconfig possible on device 0:1b
    PCI: Buses that can't use MMCONFIG will use type 1 PCI conf access.
    ACPI: Interpreter enabled
    ACPI: Using IOAPIC for interrupt routing
    ACPI: PCI Root Bridge [PCI0] (0000:00)
    PCI: Probing PCI hardware (bus 00)
    Boot video device is 0000:00:01.0
    PCI: Ignoring BAR0-3 of IDE controller 0000:00:08.1
    PCI: If a device isn't working, try "pci=nommconf".
    ...
    PCI: Using ACPI for IRQ routing
    PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
    ...
    PCI: MSI quirk detected. MSI deactivated.
    PCI: Setting latency timer of device 0000:00:0a.0 to 64
    assign_interrupt_mode Found MSI capability
    Allocate Port Service[0000:00:0a.0:pcie00]
    Allocate Port Service[0000:00:0a.0:pcie01]
    PCI: Setting latency timer of device 0000:00:0b.0 to 64
    assign_interrupt_mode Found MSI capability
    Allocate Port Service[0000:00:0b.0:pcie00]
    Allocate Port Service[0000:00:0b.0:pcie01]
    PCI: Setting latency timer of device 0000:00:0c.0 to 64
    assign_interrupt_mode Found MSI capability
    Allocate Port Service[0000:00:0c.0:pcie00]
    Allocate Port Service[0000:00:0c.0:pcie01]
    PCI: Setting latency timer of device 0000:00:0d.0 to 64
    assign_interrupt_mode Found MSI capability
    Allocate Port Service[0000:00:0d.0:pcie00]
    Allocate Port Service[0000:00:0d.0:pcie01]
    PCI: Setting latency timer of device 0000:00:0e.0 to 64
    assign_interrupt_mode Found MSI capability
    Allocate Port Service[0000:00:0e.0:pcie00]
    Allocate Port Service[0000:00:0e.0:pcie01]
    PCI: Setting latency timer of device 0000:40:0f.0 to 64
    assign_interrupt_mode Found MSI capability
    Allocate Port Service[0000:40:0f.0:pcie00]
    Allocate Port Service[0000:40:0f.0:pcie01]
    PCI: Setting latency timer of device 0000:40:10.0 to 64
    assign_interrupt_mode Found MSI capability
    Allocate Port Service[0000:40:10.0:pcie00]
    Allocate Port Service[0000:40:10.0:pcie01]
    PCI: Setting latency timer of device 0000:40:11.0 to 64
    assign_interrupt_mode Found MSI capability
    Allocate Port Service[0000:40:11.0:pcie00]
    Allocate Port Service[0000:40:11.0:pcie01]
    PCI: Setting latency timer of device 0000:40:12.0 to 64
    assign_interrupt_mode Found MSI capability
    Allocate Port Service[0000:40:12.0:pcie00]
    Allocate Port Service[0000:40:12.0:pcie01]
    PCI: Setting latency timer of device 0000:40:13.0 to 64
    assign_interrupt_mode Found MSI capability
    Allocate Port Service[0000:40:13.0:pcie00]
    Allocate Port Service[0000:40:13.0:pcie01]
    ...
    PCI: Setting latency timer of device 0000:22:00.0 to 64
    ib_ipath 0000:22:00.0: infinipath0: pci_enable_msi failed: -22, interrupts may not work
    ib_ipath 0000:22:00.0: infinipath0: irq is 0, BIOS error? Interrupts won't work
    ib_ipath 0000:22:00.0: No interrupts detected, not usable.
    ...
    ib_ipath 0000:51:00.0: infinipath1: pci_enable_msi failed: -22, interrupts may not work
    ib_ipath 0000:51:00.0: infinipath1: irq is 0, BIOS error? Interrupts won't work
    ib_ipath 0000:51:00.0: No interrupts detected, not usable.
    ...
    ib_ipath 0000:22:00.0: IB link is not ACTIVE
    ib_ipath 0000:51:00.0: IB link is not ACTIVE

So, the long and short of it is that on this particular hardware, MSI interrupts worked on these two cards with RHEL 5.1 kernels, and now they don't. The -22 returned by pci_enable_msi is -EINVAL, which is consistent with MSI having been disabled system-wide by the quirk reported earlier in the boot log. It would seem that the changes to the MSI interrupt handling in the kernel are to blame. Changing component to kernel.

One of Andy Gospodarek's PCI patches resolved the issue entirely. I'll attach that patch to this report and also post it to rhkernel-list.

Created attachment 300290
Quirk patch
This patch (from Andy Gospodarek) has been confirmed to solve the problem on
the target machine.
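To check whether a given kernel still shows the symptoms seen on ibm-ridgeback, one option is to scan its boot log for the messages quoted above. The following is a minimal, hypothetical Python sketch (the names `SYMPTOMS`, `scan_kernel_log`, and `msi_looks_broken` are illustrative and not part of any Red Hat tool). It only does substring matching on the exact message text from this report, so other kernel versions or drivers may word the messages differently.

```python
from typing import Dict, List

# Hypothetical symptom table: each value is a substring copied verbatim
# from the kernel messages quoted in this report. Other kernel versions
# or drivers may word these messages differently.
SYMPTOMS: Dict[str, str] = {
    "msi_quirk": "PCI: MSI quirk detected. MSI deactivated.",
    "msi_enable_failed": "pci_enable_msi failed",
    "irq_zero": "irq is 0, BIOS error?",
    "link_not_active": "IB link is not ACTIVE",
}


def scan_kernel_log(text: str) -> Dict[str, List[str]]:
    """Collect, per symptom, every log line that contains its substring."""
    hits: Dict[str, List[str]] = {name: [] for name in SYMPTOMS}
    for line in text.splitlines():
        for name, needle in SYMPTOMS.items():
            if needle in line:
                hits[name].append(line.strip())
    return hits


def msi_looks_broken(hits: Dict[str, List[str]]) -> bool:
    """True if the log shows MSI disabled by a quirk or refused to a driver."""
    return bool(hits["msi_quirk"] or hits["msi_enable_failed"])
```

Feed it the text of saved `dmesg` output from the kernel under test. On the 2.6.18-88.el5 boot log above, `msi_looks_broken` would be true; on a kernel that carries the quirk patch, none of these lines should appear.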
Yeah, indeed the kernel with Andy's patch, 2.6.18-88.el5.ht1000_quirk, is working. We should have another spin for 5.2 to include this patch.

*** Bug 439110 has been marked as a duplicate of this bug. ***

in kernel-2.6.18-89.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

(In reply to comment #13)
> in kernel-2.6.18-89.el5
> You can download this test kernel from http://people.redhat.com/dzickus/el5

Yup, this kernel indeed works:

    [root@ibm-ridgeback ~]# uname -a
    Linux ibm-ridgeback.rhts.boston.redhat.com 2.6.18-89.el5 #1 SMP Tue Apr 8 16:04:14 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
    [root@ibm-ridgeback ~]# ibstat ipath1
    CA 'ipath1'
            CA type: InfiniPath_QLE7140
            Number of ports: 1
            Firmware version:
            Hardware version: 2
            Node GUID: 0x0011750000ffd9ce
            System image GUID: 0x001175000068709f
            Port 1:
                    State: Active
                    Physical state: LinkUp
                    Rate: 10
                    Base lid: 3
                    LMC: 0
                    SM lid: 2
                    Capability mask: 0x02010800
                    Port GUID: 0x0011750000ffd9ce

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html

*** Bug 241257 has been marked as a duplicate of this bug. ***