Bug 434822
| Summary: | OpenIB broken in 2.6.24.1-24.el5rt | | |
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Gurhan Ozen <gozen> |
| Component: | realtime-kernel | Assignee: | Clark Williams <williams> |
| Status: | CLOSED CANTFIX | QA Contact: | |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 1.0 | CC: | bhu, dledford, jburke |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2008-07-02 14:34:29 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
I just put the latest ofed-1.3 into our 2.6.24.3-rt3 based kernel. Please try that kernel from brew.

Clark

Ugh, I could have added the version you need, eh? kernel-rt-2.6.24.3-29.el5rt.

As a follow-up to my email to the rhel-rt-internal list, I am changing the state to fails-qa because in kernel-rt-2.6.24.3-29.el5rt the order of IB devices returned is reversed for some reason. I was comparing kernel-rt-2.6.24.3-29.el5rt against the RHEL5.2 kernel; when I ran the tests on kernel-rt-2.6.24.3-29.el5rtvanilla, the order of IB devices was reversed as well.

I have run all ib/openmpi tests and they all pass with kernel-rt-2.6.24.3-29.el5rt installed on the RHEL5.2-Server-20030313.1 tree.

Are we still broken in -65 (the GA kernel)?

Clark,

Looking at the change log from the kernel:

Changelog:

    * Fri Jun 06 2008 Clark Williams <williams> - 2.6.24-65
    - replaced peterz's slab fix with v2 patch
    - replaced rostedt's ftrace hotplug fix wth v2 patch

What would have changed to have fixed the issues Gurhan was seeing?

Ah, I didn't read closely enough to see that it's a device ordering issue. So yeah, we're still borken.

Actually, we aren't broken and this bug should be closed. Upstream has obviously changed the sort order on Gurhan's hardware (so pci=breadth or one of the other sort-modifying options may be in order), and if it had happened in the middle of a single product lifecycle, that would be a bug we have to fix. However, this went out GA with the sort reversed, and now that sorting order has to be maintained in order to preserve existing systems when updates to MRG go out. In short, it's too late to do anything about this issue, and we patently *can't* allow anything to be done about it.
Description of problem:

The OpenIB stack seems to be broken in kernel 2.6.24.1-24.el5rt. In a lot of cases, a local lid can't even be detected:

    # ib_send_lat
    ------------------------------------------------------------------
                        Send Latency Test
    Inline data is used up to 400 bytes message
    Connection type : RC
    Local lid 0x0 detected. Is an SM running?

Even though:

    # ibstat
    CA 'mthca0'
        CA type: MT25208 (MT23108 compat mode)
        Number of ports: 2
        Firmware version: 4.6.2
        Hardware version: a0
        Node GUID: 0x0002c90200200fcc
        System image GUID: 0x0002c90200200fcf
        Port 1:
            State: Active
            Physical state: LinkUp
            Rate: 10
            Base lid: 1
            LMC: 0
            SM lid: 1
            Capability mask: 0x00510a6a
            Port GUID: 0x0002c90200200fcd
        Port 2:
            State: Down
            Physical state: Polling
            Rate: 10
            Base lid: 0
            LMC: 0
            SM lid: 0
            Capability mask: 0x00510a68
            Port GUID: 0x0002c90200200fce

No programs can be run over the fabric. No obvious error/debug messages are in dmesg or /var/log/{messages,osm}.log.

Version-Release number of selected component (if applicable):

    # uname -a
    Linux dell-pe1950-02.rhts.boston.redhat.com 2.6.24.1-24.el5rt #1 SMP PREEMPT RT Mon Feb 11 17:19:56 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

    # rpm -qa | egrep "openib|libib|rdma|sdp" | sort |uniq
    libibcm-1.0.1-1.el5
    libibcm-debuginfo-1.0.1-1.el5
    libibcm-devel-1.0.1-1.el5
    libibcm-static-1.0.1-1.el5
    libibcommon-1.0.7-1.el5
    libibcommon-debuginfo-1.0.7-1.el5
    libibcommon-devel-1.0.7-1.el5
    libibcommon-static-1.0.7-1.el5
    libibmad-1.1.5-1.el5
    libibmad-debuginfo-1.1.5-1.el5
    libibmad-devel-1.1.5-1.el5
    libibmad-static-1.1.5-1.el5
    libibumad-1.1.6-1.el5
    libibumad-debuginfo-1.1.6-1.el5
    libibumad-devel-1.1.6-1.el5
    libibumad-static-1.1.6-1.el5
    libibverbs-1.1.1-8.el5
    libibverbs-debuginfo-1.1.1-8.el5
    libibverbs-devel-1.1.1-8.el5
    libibverbs-static-1.1.1-8.el5
    libibverbs-utils-1.1.1-8.el5
    librdmacm-1.0.5-1.el5
    librdmacm-debuginfo-1.0.5-1.el5
    librdmacm-devel-1.0.5-1.el5
    librdmacm-static-1.0.5-1.el5
    librdmacm-utils-1.0.5-1.el5
    libsdp-1.1.99-8.el5
    libsdp-debuginfo-1.1.99-8.el5
    openib-1.3-1.el5
    sdpnetstat-1.50-6.el5_1.1

How reproducible: Every time

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
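
Since the thread concludes that the enumeration order of IB devices cannot be relied on, one workaround for test scripts is to pick the device and port by name from the `ibstat` output rather than by position. The following is only a rough sketch under that assumption: the function names `parse_ibstat` and `usable_ports` are made up for illustration, the sample text is abbreviated from the `ibstat` output in the description, and the parser assumes the `Key: value` layout shown there.

```python
import re

# Abbreviated from the ibstat output in the description above.
SAMPLE = """\
CA 'mthca0'
    CA type: MT25208 (MT23108 compat mode)
    Number of ports: 2
    Port 1:
        State: Active
        Physical state: LinkUp
        Base lid: 1
        Port GUID: 0x0002c90200200fcd
    Port 2:
        State: Down
        Physical state: Polling
        Base lid: 0
        Port GUID: 0x0002c90200200fce
"""


def parse_ibstat(text):
    """Return one dict per port found in ibstat-style text (hypothetical helper)."""
    ports = []
    ca = None
    port = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '([^']+)'", line)
        if m:
            # New channel adapter: remember its name, reset the current port.
            ca = m.group(1)
            port = None
            continue
        m = re.match(r"Port (\d+):$", line)
        if m:
            # "Port N:" opens a port section; "Port GUID: ..." does not match.
            port = {"ca": ca, "port": int(m.group(1))}
            ports.append(port)
            continue
        if port is not None and ":" in line:
            key, _, value = line.partition(":")
            port[key.strip()] = value.strip()
    return ports


def usable_ports(ports):
    """Keep (ca, port, lid) for ports that are Active and have a non-zero base LID."""
    return [
        (p["ca"], p["port"], int(p["Base lid"]))
        for p in ports
        if p.get("State") == "Active" and int(p.get("Base lid", "0")) != 0
    ]


if __name__ == "__main__":
    print(usable_ports(parse_ibstat(SAMPLE)))  # [('mthca0', 1, 1)]
```

With a result like this, a test could name the device and port explicitly, for example `ib_send_lat -d mthca0 -i 1`, assuming the installed perftest build accepts the `-d` and `-i` options.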