Bug 477609 - Driver for IBM eHCA Infiniband adapter (ib-ehca.ko) is not working in RHEL5 kernel >122.el5
Keywords:
Status: CLOSED DUPLICATE of bug 477000
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: ppc64
OS: Linux
Priority: low
Severity: high
Target Milestone: beta
Assignee: Doug Ledford
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-12-22 12:52 UTC by Yury Konovalov
Modified: 2011-10-27 17:28 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-10-27 17:28:57 UTC
Target Upstream Version:
Embargoed:


Attachments
dmesg out on 2.6.18-128.el5 kernel (124.77 KB, text/plain)
2008-12-22 12:52 UTC, Yury Konovalov

Description Yury Konovalov 2008-12-22 12:52:37 UTC
Created attachment 327636
dmesg out on 2.6.18-128.el5 kernel

Description of problem:
The driver for the IBM eHCA InfiniBand adapter does not work in RHEL5 kernels newer than 122.el5.

Version-Release number of selected component (if applicable):
RHEL5 kernel >122.el5

How reproducible:
1) Enable IPoIB-CM (connected mode), for example by inserting the following lines into the openibd initscript:
    echo connected >/sys/class/net/ib0/mode
    echo connected >/sys/class/net/ib1/mode
    ip link set ib0 mtu 65520
    ip link set ib1 mtu 65520

2) Install one of the test kernels (the problem reproduces with 123 through 128) from http://people.redhat.com/dzickus/el5/ on an IBM p520 (9131-52A) [system firmware SF240_358] with a dual-port eHCA GX adapter (hw_ver: 0x1000002).
3) Reboot into the new kernel.
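For repeated testing, step 1 above can be wrapped in a small function. This is a sketch and is not part of the openibd initscript. It makes the following assumptions:

- The IPoIB interfaces are named ib*.
- The sysfs net directory is passed as an optional first argument. This is a hypothetical hook so the function can be tried against a scratch directory.
- The MTU is written through the sysfs mtu attribute, which has the same effect as "ip link set ... mtu".

The value 65520 comes from the report. Connected mode is set before the MTU because datagram mode rejects MTUs larger than the IB link MTU.

```shell
#!/bin/sh
# Sketch: put every IPoIB interface into connected mode and raise its MTU,
# mirroring step 1 of the reproduction. Assumes interfaces are named ib*;
# the optional first argument (a sysfs net directory) is a hypothetical
# hook for testing and defaults to the real /sys/class/net.
IPOIB_CM_MTU=65520

enable_ipoib_cm() {
    net_dir="${1:-/sys/class/net}"
    for dev in "$net_dir"/ib*; do
        # An unmatched glob stays literal, so skip anything without a mode file
        [ -e "$dev/mode" ] || continue
        # Switch the mode first: datagram mode rejects MTUs above the IB link MTU
        echo connected > "$dev/mode" || return 1
        # Writing the sysfs mtu attribute has the same effect as "ip link set ... mtu"
        echo "$IPOIB_CM_MTU" > "$dev/mtu" || return 1
        echo "$(basename "$dev"): mode=$(cat "$dev/mode") mtu=$(cat "$dev/mtu")"
    done
}
```

Run enable_ipoib_cm as root once the ib interfaces exist. Each interface it configures prints one line, such as "ib0: mode=connected mtu=65520".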

Actual results:
Any attempt to send an IP packet over IB fails, and the following messages are printed to the kernel log:

 ehca lhca@23000100: PU0000 EHCA_ERR:print_error_data EHCA ----- error data end ----------------------------------------------------                          
ehca lhca@23000100: PU0000 EHCA_ERR:print_error_data QP 0xe7 (resource=20000000000000e7) has errors.                                                         
ehca lhca@23000100: PU0000 EHCA_ERR:print_error_data Error data is available: 20000000000000e7.                                                              
ehca lhca@23000100: PU0000 EHCA_ERR:print_error_data EHCA ----- error data begin ---------------------------------------------------                         
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b000 ofs=0000 00000000000004d0 20000000000000e7                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b010 ofs=0010 0100000000000310 8000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b020 ofs=0020 a000000000000000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b030 ofs=0030 0000000000200000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b040 ofs=0040 0000000000000000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b050 ofs=0050 0000000000000400 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b060 ofs=0060 0000000000000000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b070 ofs=0070 000000000000ffff 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b080 ofs=0080 0b000000003363c9 0000000000ffffff                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b090 ofs=0090 0000000000ffffff 00000000ffffff00                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b0a0 ofs=00a0 0000000000000066 0000000000000006                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b0b0 ofs=00b0 0000000000000002 0000000000000007                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b0c0 ofs=00c0 0000000000000000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b0d0 ofs=00d0 0000000000000000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b0e0 ofs=00e0 0000000000000000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b0f0 ofs=00f0 0000000000000000 0000000000000004                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b100 ofs=0100 000000000000000c 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b110 ofs=0110 0000000000000000 0000000000000008                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b120 ofs=0120 0000000142a70000 0000000003af8680                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b130 ofs=0130 0000000000000000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b140 ofs=0140 0000000000000000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b150 ofs=0150 0000000000000000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b160 ofs=0160 0000000000000000 0000000000000001                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b170 ofs=0170 0000000000000002 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b180 ofs=0180 0000000000000005 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b190 ofs=0190 0000000000000000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b1a0 ofs=01a0 0000000000000000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b1b0 ofs=01b0 0000000142a70000 0000000003af8680                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b1c0 ofs=01c0 0000000000000001 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b1d0 ofs=01d0 0000000156260000 0000000006d1f800                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b1e0 ofs=01e0 000000000000000a 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b1f0 ofs=01f0 0000000000000003 00000000003363c9                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b200 ofs=0200 0000000000ffffff 0000000000000002                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b210 ofs=0210 0000000000000000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b220 ofs=0220 0000000000000000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b230 ofs=0230 0000000000000000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b240 ofs=0240 0000000000000000 0000000000ffffff                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b250 ofs=0250 00000000dd4a0f80 000000000000000c                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b260 ofs=0260 000000000000000c 0000000000000004                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b270 ofs=0270 0000000000000000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b280 ofs=0280 0b000000003363c9 0000000000ffffff                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b290 ofs=0290 0000000000ffffff 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b2a0 ofs=02a0 00000000003363c9 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b2b0 ofs=02b0 0000000000000000 0000000000000000                                          
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b2c0 ofs=02c0 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b2d0 ofs=02d0 0000000000000000 2000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b2e0 ofs=02e0 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b2f0 ofs=02f0 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b300 ofs=0300 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b310 ofs=0310 c000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b320 ofs=0320 0000000000000000 02000000000000c8
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b330 ofs=0330 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b340 ofs=0340 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b350 ofs=0350 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b360 ofs=0360 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b370 ofs=0370 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b380 ofs=0380 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b390 ofs=0390 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b3a0 ofs=03a0 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b3b0 ofs=03b0 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b3c0 ofs=03c0 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b3d0 ofs=03d0 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b3e0 ofs=03e0 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b3f0 ofs=03f0 0000000000000000 0400000000000060
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b400 ofs=0400 8000000000000000 c000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b410 ofs=0410 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b420 ofs=0420 0000000006be7013 00000001edfc2100
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b430 ofs=0430 0000000000020840 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b440 ofs=0440 0000000000000000 0004000000000004
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b450 ofs=0450 0000000000000004 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b460 ofs=0460 0300000000000068 8040000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b470 ofs=0470 c000c00000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b480 ofs=0480 0000000000000000 0000000006b6d003
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b490 ofs=0490 00000001eec90ca8 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b4a0 ofs=04a0 0000000000000000 0000000000000000
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b4b0 ofs=04b0 0000000000000000 0000000000000004
EHCA_DMP:print_error_data resource=20000000000000e7 adr=c000000130a3b4c0 ofs=04c0 0000000000000005 0000000000000000
ehca lhca@23000100: PU0000 EHCA_ERR:print_error_data EHCA ----- error data end ----------------------------------------------------
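Most rows of a dump like the one above are zero. That makes it hard to compare dumps from different kernels or firmware levels. The following sketch keeps only the rows that carry non-zero data. It assumes the "ofs=<offset> <word> <word>" layout seen in this report, and the function name is hypothetical. It does not decode the fields, whose meaning is hardware specific.

```shell
#!/bin/sh
# Sketch: reduce an eHCA error-data dump (EHCA_DMP lines) to the rows that
# hold non-zero data. Reads kernel log text on stdin. The function name is
# hypothetical; the field layout is assumed from the dump in this report.
ehca_nonzero_rows() {
    awk '
    /EHCA_DMP:print_error_data/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^ofs=/) {
                ofs = substr($i, 5)           # hex offset, e.g. 0010
                q0 = $(i + 1); q1 = $(i + 2)  # the two 64-bit words in the row
                # Print the row only if at least one word is non-zero
                if (q0 !~ /^0+$/ || q1 !~ /^0+$/)
                    print ofs, q0, q1
                break
            }
        }
    }'
}
```

Usage: dmesg | ehca_nonzero_rows. Applied to the dump above, the 77-line dump shrinks to the few dozen rows that actually contain values.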

Expected results:

IP packets should be sent normally.

Extra:
According to the kernel changelog, one of the following fixes could have caused this problem:
- [openib] ehca: deadlock race when creating small queues (Jesse Larrew) [470137]
- [openib] ehca: remove ref to QP if port activation fails (AMEET M. PARANJAPE) [469941]
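To narrow down candidates like these, the InfiniBand entries can be pulled out of an installed kernel's RPM changelog. The following is a minimal sketch. The function name is hypothetical. It assumes the "- [openib] <driver>: ..." entry format quoted above, and it matches case-insensitively.

```shell
#!/bin/sh
# Sketch: keep only the [openib] entries of a kernel RPM changelog read on
# stdin. The function name is hypothetical; the "- [openib] <driver>: ..."
# format is assumed from the changelog lines quoted in this report.
openib_changes() {
    # Optional first argument narrows the match to one driver, e.g. ehca or ipoib
    sub="${1:-}"
    grep -i -- "^[[:space:]]*- \[openib\] ${sub}"
}
```

For example: rpm -q --changelog kernel-2.6.18-128.el5 | openib_changes ehca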

Comment 1 Yury Konovalov 2008-12-29 09:46:12 UTC
I found that linux-2.6-openib-race-in-ipoib_cm_post_receive_nonsrq.patch is what prevents ehca from functioning on my system. I rebuilt the 128.el5 kernel without that patch, and ehca is now able to send and receive packets.

Comment 3 Doug Ledford 2009-04-22 22:33:26 UTC
This should no longer be an issue with the upcoming rhel5.4 kernel.

Comment 8 John Feeney 2011-10-27 17:28:57 UTC
I am closing this as a duplicate of bz477000, which has been fixed for a while. If this turns out not to be a duplicate, please reopen it or file a new bugzilla.

*** This bug has been marked as a duplicate of bug 477000 ***

