1. Feature Overview:
   a. Update bnx2i and cnic drivers
   b. Feature Description
      General bug fixes and added support for 10G devices. Bug fixes for ia64 and ppc.
2. Feature Details:
   a. Architectures:
      32-bit x86
      64-bit Intel EM64T/AMD64
      64-bit Itanium2
      ppc64
   b. Upstream acceptance information:
      Initial versions are upstream. Bug fixes and enhancement patch submission will be ongoing.
3. Business Justification:
   a. Why is this feature needed?
      Support 10G devices and bug fixes.
4. Primary contact at Broadcom, email, phone
   mchan (949)926-6170
*** Bug 512193 has been marked as a duplicate of this bug. ***
bnx2x and cnic patches for 10G iSCSI support have been merged into net-next-2.6. Thanks.
Are any brcm/uip changes? No bnx2i changes right? Do you have a bugzilla to update bnx2x for general networking issues? Maybe we could just piggy back the 10 gig iscsi bnx2x changes with it, so it will be easier to coordinate that.
(In reply to comment #3)
> Are any brcm/uip changes?

Yes, we need to add the uio userspace driver for 10G. Still working on that.

> No bnx2i changes right?

Right. May be some bug fixes during testing.

> Do you have a bugzilla to update bnx2x for general networking issues? Maybe we
> could just piggy back the 10 gig iscsi bnx2x changes with it, so it will be
> easier to coordinate that.

Bug 515716
@Broadcom, We need to confirm that there is commitment to test for the resolution of this request during the RHEL 5.5 test phase, if it is accepted into the release. Please post a confirmation before Oct 16th, 2009, including the contact information for testing engineers.
Yes, adding Nasser to CC to assign test engineers.
------- Comment From lcm.com 2009-10-13 14:42 EDT------- IBM will also provide test feedback. Please coordinate through Peter Bogdanovic, pbogdano.
PQA Test Engineers are as follows: bnx2x - Tung Nguyen (tungn) bnx2 - Joe Torricelli (jtorrice) tg3 - Jeff Leu (jleu) bnx2i - Emory Bestenlehner (emoryb) BRCM will also provide periodic test results. Any questions/comments, please let me know. Thanks Ed Narvaez, enarvaez, 949-926-6456
(In reply to comment #4)
> (In reply to comment #3)
> > Are any brcm/uip changes?
>
> Yes, we need to add the uio userspace driver for 10G. Still working on that.
>

Ok. Let me use this bugzilla for the userspace changes needed then.

> > No bnx2i changes right?
>
> Right. May be some bug fixes during testing.

Ok.

> > Do you have a bugzilla to update bnx2x for general networking issues? Maybe we
> > could just piggy back the 10 gig iscsi bnx2x changes with it, so it will be
> > easier to coordinate that.
>
> Bug 515716

And then just send the bnx2x iscsi stuff with your normal update, so that way we do not have any dependency issues with the kernel stuff.
This enhancement request was evaluated by the full Red Hat Enterprise Linux team for inclusion in a Red Hat Enterprise Linux minor release. As a result of this evaluation, Red Hat has tentatively approved inclusion of this feature in the next Red Hat Enterprise Linux Update minor release. While it is a goal to include this enhancement in the next minor release of Red Hat Enterprise Linux, the enhancement is not yet committed for inclusion in the next minor release pending the next phase of actual code integration and successful Red Hat and partner testing.
Created attachment 370624 [details] 0001-update-cnic-driver.patch
Created attachment 373042 [details] 0006-update-cnic-driver__v2.patch This patch is the same as 0001-update-cnic-driver.patch, but it's rebased atop Mike Christie patch for bug 516233, which also modifies cnic code
Brew build:
https://brewweb.devel.redhat.com/taskinfo?taskID=2099220

It contains patches from following bugzillas:
Bug 516233 - Panic on boot when loading iscsid with broadcom NIC
Bug 515716 - [Broadcom 5.5 FEAT] Update bnx2x to 1.52.1-5
Bug 517378 - [Broadcom 5.5 FEAT] Update bnx2i and cnic drivers

Public download (x86_64,i686,src):
http://people.redhat.com/sgruszka/rhel5.5-broadcom/

Please test.
Do the packages work? I would like to have this confirmation before posting the patch to RKML. Thanks.
(In reply to comment #15)
> Brew build:
> https://brewweb.devel.redhat.com/taskinfo?taskID=2099220
>
> It contains patches from following bugzillas:
> Bug 516233 - Panic on boot when loading iscsid with broadcom NIC
> Bug 515716 - [Broadcom 5.5 FEAT] Update bnx2x to 1.52.1-5
> Bug 517378 - [Broadcom 5.5 FEAT] Update bnx2i and cnic drivers
>
> Public download (x86_64,i686,src):
> http://people.redhat.com/sgruszka/rhel5.5-broadcom/
>
> Please test.

(In reply to comment #16)
> Does packages works? I would like to have this confirmation before posting
> patch to RKML. Thanks.

Hi Michael

Any news?
Hi Stanislaw, I just got back from vacation. I'll need to check with our QA tomorrow to see if they've done any testing. Thanks.
Emory will be providing test results soon.
I was able to run some basic target compatibility and read/write tests. More to follow. Thanks
We are unable to make any connections with bnx2i after initially getting it to work. I had restarted the iSCSI service while running disk I/O, and since then am only able to connect via L2. I am reloading 5.4 to check for procedural differences in connection establishment, etc.
We have an ongoing update of bnx2 (bug 517377); perhaps it should be included as well in the build for testing. We also do not apply this commit:

commit d0549382da9997834ce65e489d9dbdc4b4693a2b
Author: Michael Chan <mchan>
Date: Wed Oct 28 03:41:59 2009 -0700

    cnic: Fix L2CTX_STATUSB_NUM offset in context memory.

Michael, any thoughts?
Yes, this patch is needed if bnx2 is using 5.0.0.j3 firmware. Without it, bnx2 will crash when iSCSI is started. Not sure if that's what Emory was seeing. Thanks.
Seems there was some sort of corruption that occurred while testing prior. I removed the bnx2i and cnic modules and had to manually delete the iSCSI iface files previously created (was unable to via iscsiadm). Modprobed bnx2i and re-created ifaces and nodes via a separate interface and it worked. Will resume moderate testing.
Emory, could you please provide some more info on how you hit the issue (hardware, steps to reproduce, are you using cnic with bnx2 or bnx2x?). Perhaps we will be able to reproduce the problem at RH.
(In reply to comment #23)
> Yes, this patch is needed if bnx2 is using 5.0.0.j3 firmware. Without it, bnx2
> will crash when iSCSI is started. Not sure if that's what Emory was seeing.

In RHEL5 we use the older bnx2 firmware 4.6.16 and 4.6.15, so this must be a different problem.
I'll have a software engineer look into this tomorrow. We are testing 1G iSCSI using bnx2 because the final version of the userspace UIO driver for 10G bnx2x has not been sent to Mike Christie yet.
1. Load RedHat5.4, upgrade to 5.5 kernel (kernel-2.6.18-174.el5.bz515716.x86_64.rpm)
2. Create iSCSI interface and bind it to eth2, bnx2i and ip address
3. Connect to some targets and run disktest
4. Restart iSCSI service

I'm using the 5709x (4 port), but only 1 port is active (ifdown on other 3).
5.5 inbox bnx2i version 2.0.1e
Eddie Wai is now debugging the problem with Emory.
Great, thanks. Please remember that the kernel-2.6.18-174.el5.broadcom_test build does not include the requested bnx2 fixes from bug 517377. If you want a new rebuild with the bnx2 2.0.2 update, please let me know.
The problem is related to the iSCSI service being scripted to restart while running disktest on the iSCSI disks. The iSCSI sessions should get logged out upon the service going down and get re-established upon the service going back up. Apparently, some connections didn't get cleaned up correctly. This is still under investigation. Barring this cycling of the iSCSI service test, the code appears to run okay so far. Emory is in the process of executing more of our normal test plan to see if other failures shall occur.
(In reply to comment #28)
> 1. Load RedHat5.4, upgrade to 5.5 kernel
> (kernel-2.6.18-174.el5.bz515716.x86_64.rpm)

Oh dear. I took a look at this again. This kernel does not contain the cnic fixes. It only contains the bnx2x fixes from bug 515716. Please try the one from:
http://people.redhat.com/sgruszka/rhel5.5-broadcom/
The package should be named kernel-2.6.18-174.el5.broadcom_test.x86_64.rpm
Oops, good catch. Emory, please use test kernels specified in this BZ and check the versions of the cnic and bnx2i drivers. The cnic driver version should be 2.1.0. Thanks.
There are some small bug fix patches for cnic and bnx2i to support the bnx2x 10G devices. They are in the net-2.6 and scsi-misc-2.6 trees. Can those be added here or should we file a new BZ? Thanks.
This one is ok.
Please include these cnic patches:

commit 4e9c4fd3e7e022c7a5b8bb7cd06bf914b202cfea
cnic: Zero out status block and Event Queue indices.

commit 1bcdc32cf4d94442eba79599ce8438ea0b8f78b5
cnic: Send delete command when shutting down iSCSI ring.

commit 3248e1682035eef6774c280cd7be19984feb78bb
cnic: Use dma_alloc_coherent().

commit 15971c3ce3caf9a92b603a61b07e0be8c9b9d276
cnic: Fix rq_page_table DMA address.

commit dd2e4dbce32a2802088f6d0132046afec9bfb2ad
cnic: Fix bogus iSCSI MAC address

commit 8b065b671d3096bfe0dbc9a833cb592f84642436
cnic: Fix bnx2x ring shutdown.

commit c7596b79feb3d15bea64007254f77233bda811f4
cnic: Fix ring I/O address for bnx2x devices.

commit 164165dad7e607ec359e64b6fae72abbf3640ea6
drivers/net: tasklet_init - Remove unnecessary leading & from second arg

commit 0d37f36ff9bc41067c71635d14b6a5834853a779
cnic: ensure ulp_type is not negative

commit d0549382da9997834ce65e489d9dbdc4b4693a2b
cnic: Fix L2CTX_STATUSB_NUM offset in context memory.
(This one requires bnx2 update to use newer firmware)
Please also include these bnx2i patches (in scsi-misc-2.6):

commit 45ca38e753016432a266a18679268a4c4674fb52
[SCSI] bnx2i: minor code cleanup and update driver version

commit 85fef20222bda1ee41f97ff94a927180ef0b97e6
[SCSI] bnx2i: Task management ABORT TASK fixes

commit 8776193bc308553ac0011b3bb2dd1837e0c6ab28
[SCSI] bnx2i: update CQ arming algorith for 5771x chipsets

commit f8c9abe797c54e798b4025b54d71e5d2054c929a
[SCSI] bnx2i: Adjust sq_size module parametr to power of 2 only if a non-zero value is specified

commit 5d9e1fa99c2a9a5977f5757f4e0fd02697c995c2
[SCSI] bnx2i: Add 5771E device support to bnx2i driver
For the bnx2i patches I think we need to make another bugzilla. I will do this in the morning if I cannot find one. Stanislaw, I can send the patches in comment #37 since they are all scsi related.
(In reply to comment #38) > Stanislaw, I can send the patches in comment #37 since they are all scsi > related. I'll do this, bnx2i patches may have dependency with cnic ones. Thanks.
bnx2i patches in #37 are independent patches and not linked to any cnic driver changes.
Right, but I have already done it :)
Created attachment 378535 [details] 0001-cnic-fixes-for-RHEL5.5.patch
Created attachment 378536 [details] 0002-bnx2i-fixes-for-RHEL5.5.patch
in kernel-2.6.18-181.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details.
(In reply to comment #44) > in kernel-2.6.18-181.el5 Ahh, this bug needs additional fixes. Moving back to ASSIGNED.
Hello Broadcom,

I updated kernel-2.6.18-180.el5.broadcom_test at http://people.redhat.com/sgruszka/rhel5.5-broadcom/

It includes all the cnic, bnx2i and bnx2 fixes you requested. Please test and report back ASAP. Note there is a known issue with bnx2 with MTU=9000 (see bug 517377).
Thanks. We can continue 1G bnx2 iSCSI testing using the new kernel and this initiator: http://people.redhat.com/mchristi/iscsi/rhel5.5/iscsi-initiator-utils/ For 10G iSCSI testing on bnx2x devices, we need to wait for an updated initiator (Please see bug 517380).
(In reply to comment #49)
> Thanks. We can continue 1G bnx2 iSCSI testing using the new kernel and this
> initiator:

Any news? How is testing going?
Here's what I show in our testdbase re: progress. I will ask Emory to further update.

Software/Driver
Driver: iSCSI HBA - Linux
iSCSI HBA - Linux 2.01e Linux x64

Passed    Failed   N/A      Blocking   Done
7 (14%)   0 (0%)   0 (0%)   0 (0%)     7 of 47 (14%)

Joe verified that the MTU issue no longer occurs (517377)
With the latest kernel, the bnx2i driver should be 2.1.0
kernel-2.6.18-180.el5.broadcom_test contains bnx2i version 2.1.0. Please test that kernel. Don Zickus's 2.6.18-181 kernel is later, but it does not contain the new Broadcom driver patches. Thank you.
We're not getting iSCSI init interrupts. Currently being investigated.

/var/log/messages:
Dec 21 10:35:16 localhost kernel: iscsi: registered transport (bnx2i)
Dec 21 10:35:16 localhost kernel: scsi21 : Broadcom Offload iSCSI Initiator
Dec 21 10:35:16 localhost kernel: bnx2i: send ISCSI_INIT KWQE
Dec 21 10:35:18 localhost kernel: scsi22 : Broadcom Offload iSCSI Initiator
Dec 21 10:35:18 localhost kernel: scsi23 : Broadcom Offload iSCSI Initiator
Dec 21 10:35:18 localhost kernel: bnx2i: send ISCSI_INIT KWQE
Dec 21 10:35:20 localhost kernel: scsi24 : Broadcom Offload iSCSI Initiator
Dec 21 10:35:20 localhost kernel: bnx2i: send ISCSI_INIT KWQE
Dec 21 10:35:22 localhost kernel: scsi25 : Broadcom Offload iSCSI Initiator
Dec 21 10:35:22 localhost kernel: scsi26 : Broadcom Offload iSCSI Initiator
Dec 21 10:35:22 localhost kernel: bnx2i: ep connect - start...
Dec 21 10:35:22 localhost kernel: bnx2i: ep connect shost...
Dec 21 10:35:22 localhost kernel: bnx2i: ep connect - hba not ready ...
Dec 21 10:35:23 localhost iscsid: Received iferror -1

Thanks,
Emory
Hi Emory, Were you using bnx2i with bnx2x or bnx2? And if you are using bnx2x are you using the latest iscsi initiator utils? It is here http://people.redhat.com/mchristi/iscsi/rhel5.5/iscsi-initiator-utils.
Hello Mike, I'm using the latest initiator with bnx2i. Thanks, Emory
Adding Michael Chan from Broadcom. Michael see comment #57. It looks like bnx2i_adapter_ready is failing. I cannot remember the common case for this. Was it if the ethX device was not also set up or something like that.
It could be the link was down or we did not get ISCSI_KCQE_OPCODE_INIT completion from the firmware. We'll retest with the -180 kernel from sgruszka and debug it if it doesn't work.
Update - we have an internal automated test setup for iSCSI offload. For the first time we’ve managed to pass all the iSCSI offload IPv4 tests on both bnx2 and bnx2x with Kernel: 2.6.18-180.el5.broadcom_test and Initiator: 6.2.0.871-0.14.el5. We are still investigating the open issues, but significant progress has been achieved. Thanks, Gidi
I will update tonight with the latest test progress. It will help the PQA efforts to have bnx2i 2.1.0 as part of the Beta CD. Will bnx2i 2.1.0 be part of the next snapshot CD? Thanks, Gidi
Correct - Snapshot 1 is what you'll need once released.
Broadcom - looks like only a partial set of the patches made Snapshot 1 (bnx2x only). The cnic bits will land in Snapshot 2.
Testing update regarding BRCM PQA additional iSCSI offload testing (this doesn't include the automated protocol testing, which is fully passing):

10G 1st pass: iSCSI offload testing 42%
1G 1st pass: iSCSI offload testing 25%

This testing is done using test kernels and not snapshot CDs, since bnx2i is not part of the CDs yet.

Thanks,
Gidi
I am experiencing a failure while testing bnx2i over bnx2x from 2.6.18-189.el5 x86_64. The failure results in a bnx2x panic dump. The steps to reproduce are roughly:

1. Configure network interface on Broadcom 57711 device to use DHCP assigned address.
2. Configure iSCSI iface on same Broadcom 57711 device to use DHCP assigned address.
3. Discover Equallogic iSCSI target and connect to same.
4. Allow system to sit idle for several hours.

Which, if any, of the preceding details are important is currently unknown. I will attach the text of the dump.
Created attachment 397464 [details] bnx2x panic dump from comment 78
Please post complete driver logs leading into bnx2x driver assert.
NOTE: there are still two fixes slated to be included in Snapshot 3 in regard to bnx2x driver, and bnx2x firmware.
That's right. Hopefully the new firmware fixes this issue. I'm adding Eilon to CC so he can have the Israel team look at the MC assert message.
Still pending inclusion in Snapshot 3: https://bugzilla.redhat.com/show_bug.cgi?id=561578 https://bugzilla.redhat.com/show_bug.cgi?id=567979
The -190.el5 kernel should have the updated bnx2x firmware (see Bug 560556). I think the newer bnx2x firmware may have a fix for this issue. Thomas, can you run the same test using this kernel?
I cannot access Bug 560556. I need a more direct link to the -190.el5 kernel.
Bug permissions fixed. You can find all test kernels at: http://people.redhat.com/jwilson/el5/
Created attachment 397661 [details] bnx2x panic dump with kernel -190.el5 The bnx2x panic dump is still occurring with the -190.el5 kernel. See attachment. I will attempt the test on different hardware.
Right, please try -191 since that's where the fixes landed.
------- Comment From linuxram.com 2010-03-03 20:24 EDT------- -191 does not work. Fails to login to the target using bnx2i target. However login to the target using tcp transport works. BTW, the NIC is 5709 with iscsi key enabled.
(In reply to comment #88) > Right, please try -191 since that's where the fixes landed. The failure persisted with the -191.el5 kernel. I will run the test on a 57710 and a different 57711 overnight.
Michael Chan, Mike Christie - see Comment 89 and Comment 90. Any ideas? Have these been tested on your side?
The bnx2x panic dump failure occurred on both the 57710 and 57711 NIC overnight.
Our Israel team is looking into the 10G firmware panic reported by Thomas. The problem reported by IBM may be a configuration issue, but we need more information on their setup.
Thomas, the Israel firmware team is asking for a GRC dump. Can you provide that for us?
(In reply to comment #94) > Thomas, the Israel firmware team is asking for a GRC dump. Can you provide > that for us? Yes, I think that I can get the dump. Feel free to contact me off-list if there are any special instructions I need to follow.
I think no special instructions. Just use lediag to get the dump. Thanks.
Created attachment 397951 [details] GRC dump from 57710 Michael, I have attached the GRC dump.
Thanks Thomas. We've looked at the GRC dump and it is very likely a uIP userspace driver issue. We should have a fix later today. Do we need a new BZ for that? Can we still fix it for RHEL5.5?
Michael, there's no "uIP userspace driver" in RHEL 5.5 currently, correct? I thought this was a 5.6 item?
Yes Andrius, it is the brcm_iscsiuio package that we updated in Bug 517380. The uIP driver is part of that package. We need to fix one line in that userspace driver to fix the issue Thomas reported.
Go ahead and file a new bugzilla ASAP if you have the requisite business justification and I'll send it up ASAP.
------- Comment From coschult.com 2010-03-11 14:52 EDT------- I have not been able to log into the target while using bnx2i transport. If I used the default (no iface argument to iscsiadm) I did successfully log in. Attached are my most recent logs, using kernel 2.6.18-191.el5.
Created attachment 399437 [details] log from brcm_iscsiuio -f -d 100 ------- Comment (attachment only) From coschult.com 2010-03-11 14:53 EDT-------
Created attachment 399439 [details] log from iscsid -f -d 100 ------- Comment on attachment From coschult.com 2010-03-11 14:54 EDT------- One error message caught my eye, but I don't know if it's significant: iscsid: Recieved iferror -38: Unknown error 18446744073709551578
Created attachment 399440 [details] output from /var/log/messages ------- Comment on attachment From coschult.com 2010-03-11 14:55 EDT------- Nothing interesting in here, but I'm attaching it for completeness.
(In reply to comment #104)
> Created an attachment (id=399439) [details]
> log from iscsid -f -d 100
>
> ------- Comment on attachment From coschult.com 2010-03-11 14:54
> EDT-------
>
> One error message caught my eye, but I don't know if it's significant:
>
> iscsid: Recieved iferror -38: Unknown error 18446744073709551578

-38 is ENOSYS, which just means userspace tried to set some feature the kernel did not support. The 18446744073709551578 is a bug due to me using strerror and passing it the negative value, so instead of a nice string you get junk.

What is interesting is

iscsid: Received iferror -1
iscsid: cannot make a connection to 9.47.81.22:3260 (-1,11)

The bnx2i driver is returning -EPERM. Benjamin@broadcom, what was the reason for this again?
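For anyone reading these logs later, the two numbers in that message can be checked by hand. The following is only a sketch, not the iscsid code: it assumes the junk value is the negative return code reinterpreted as an unsigned 64-bit integer (the arithmetic matches the logged number), and the errno table is a hard-coded subset of the Linux values. The helper names `as_unsigned64` and `decode_iferror` are made up for illustration.

```python
# Linux errno values relevant to the log lines above. Hard-coded on purpose
# (an assumption taken from Linux's errno.h) so the sketch does not depend
# on the platform it runs on.
LINUX_ERRNO = {1: "EPERM", 38: "ENOSYS"}


def as_unsigned64(value: int) -> int:
    """Reinterpret a signed integer as an unsigned 64-bit quantity."""
    return value & 0xFFFFFFFFFFFFFFFF


def decode_iferror(code: int) -> str:
    """Map a negative kernel-style return code to its errno name."""
    return LINUX_ERRNO.get(-code, "unknown")


print(as_unsigned64(-38))   # 18446744073709551578
print(decode_iferror(-38))  # ENOSYS
print(decode_iferror(-1))   # EPERM
```

So the large number carries no extra information beyond the -38 itself; negating the code before looking it up gives the readable name.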
Hi Mike,

One place in ep_connect() where -EPERM is returned is when the state of the offloaded adapter is not ready. bnx2i catches the netevents for the adapter coming up/down and caches the state.

Corinna, did you get a chance to run the debug driver set that Michael sent on Thu, 11 Mar 2010 16:13:50 -0800? That driver set has some additional printk's to help us understand the flow of the code.

Thanks again.
-Ben
Yes, I just got debug output from Corinna. It failed at exactly the place where we check for bnx2i_adapter_ready(). I'm sending her another patch to print the hba->adapter_state.
------- Comment From coschult.com 2010-03-12 17:39 EDT-------
adapter_state is 0. Here's the kernel log output when I restart iscsi and it tried to log in to the target:

Mar 12 14:26:20 elm3b102 iscsid: iSCSI logger with pid=7864 started!
Mar 12 14:26:20 elm3b102 kernel: bnx2i_ep_connect, shost ffff81107b549000, hba ffff81107b5495a0
Mar 12 14:26:20 elm3b102 kernel: bnx2i_ep_connect, alloc_ep succeeded
Mar 12 14:26:20 elm3b102 kernel: bnx2i_ep_connect, hba adapter_state 0
Mar 12 14:26:21 elm3b102 iscsid: transport class version 2.0-871. iscsid version 2.0-871
Mar 12 14:26:21 elm3b102 iscsid: iSCSI daemon with pid=7865 started!
Mar 12 14:26:21 elm3b102 iscsid: Received iferror -1
Mar 12 14:26:21 elm3b102 iscsid: cannot make a connection to 9.47.69.22:3260 (-1,11)
Mar 12 14:26:22 elm3b102 kernel: bnx2i_ep_connect, shost ffff81107b549000, hba ffff81107b5495a0
Mar 12 14:26:22 elm3b102 kernel: bnx2i_ep_connect, alloc_ep succeeded
Mar 12 14:26:22 elm3b102 kernel: bnx2i_ep_connect, hba adapter_state 0
Mar 12 14:26:23 elm3b102 iscsid: Received iferror -1
Mar 12 14:26:23 elm3b102 iscsid: cannot make a connection to 9.47.81.22:3260 (-1,11)
Thanks Corinna. Can you do iscsiadm login one more time immediately after the above failure?
------- Comment From coschult.com 2010-03-12 18:21 EDT-------
Same result (with tail -f /var/log/messages & running in the same shell):

[root@elm3b102 ~]# iscsiadm -m discovery -t sendtargets -p 9.47.69.22 -I bnx2i.eth3
9.47.69.22:3260,1000 iqn.1992-08.com.netapp:sn.84183797
9.47.81.22:3260,1001 iqn.1992-08.com.netapp:sn.84183797
[root@elm3b102 ~]# iscsiadm -m node -l -I bnx2i.eth3
Logging in to [iface: bnx2i.eth3, target: iqn.1992-08.com.netapp:sn.84183797, portal: 9.47.69.22,3260]
Logging in to [iface: bnx2i.eth3, target: iqn.1992-08.com.netapp:sn.84183797, portal: 9.47.81.22,3260]
iscsiadm: Could not login to [iface: bnx2i.eth3, target: iqn.1992-08.com.netapp:sn.84183797, portal: 9.47.69.22,3260]:
iscsiadm: initiator reported error (4 - encountered connection failure)
Mar 12 15:03:39 elm3b102 kernel: bnx2i_ep_connect, shost ffff81107b549000, hba ffff81107b5495a0
Mar 12 15:03:39 elm3b102 kernel: bnx2i_ep_connect, alloc_ep succeeded
Mar 12 15:03:39 elm3b102 kernel: bnx2i_ep_connect, hba adapter_state 0
Mar 12 15:03:39 elm3b102 iscsid: Received iferror -1
Mar 12 15:03:39 elm3b102 iscsid: cannot make a connection to 9.47.69.22:3260 (-1,11)
iscsiadm: Could not login to [iface: bnx2i.eth3, target: iqn.1992-08.com.netapp:sn.84183797, portal: 9.47.81.22,3260]:
iscsiadm: initiator reported error (4 - encountered connection failure)
[root@elm3b102 ~]# Mar 12 15:03:41 elm3b102 kernel: bnx2i_ep_connect, shost ffff81107b549000, hba ffff81107b5495a0
Mar 12 15:03:41 elm3b102 kernel: bnx2i_ep_connect, alloc_ep succeeded
Mar 12 15:03:41 elm3b102 kernel: bnx2i_ep_connect, hba adapter_state 0
Mar 12 15:03:41 elm3b102 iscsid: Received iferror -1
Mar 12 15:03:41 elm3b102 iscsid: cannot make a connection to 9.47.81.22:3260 (-1,11)

[root@elm3b102 ~]# iscsiadm -m node -l -I bnx2i.eth3
Logging in to [iface: bnx2i.eth3, target: iqn.1992-08.com.netapp:sn.84183797, portal: 9.47.69.22,3260]
Logging in to [iface: bnx2i.eth3, target: iqn.1992-08.com.netapp:sn.84183797, portal: 9.47.81.22,3260]
iscsiadm: Could not login to [iface: bnx2i.eth3, target: iqn.1992-08.com.netapp:sn.84183797, portal: 9.47.69.22,3260]:
iscsiadm: initiator reported error (4 - encountered connection failure)
Mar 12 15:03:55 elm3b102 kernel: bnx2i_ep_connect, shost ffff81107b549000, hba ffff81107b5495a0
Mar 12 15:03:55 elm3b102 kernel: bnx2i_ep_connect, alloc_ep succeeded
Mar 12 15:03:55 elm3b102 kernel: bnx2i_ep_connect, hba adapter_state 0
Mar 12 15:03:55 elm3b102 iscsid: Received iferror -1
Mar 12 15:03:55 elm3b102 iscsid: cannot make a connection to 9.47.69.22:3260 (-1,11)
iscsiadm: Could not login to [iface: bnx2i.eth3, target: iqn.1992-08.com.netapp:sn.84183797, portal: 9.47.81.22,3260]:
iscsiadm: initiator reported error (4 - encountered connection failure)
[root@elm3b102 ~]# Mar 12 15:03:57 elm3b102 kernel: bnx2i_ep_connect, shost ffff81107b549000, hba ffff81107b5495a0
Mar 12 15:03:57 elm3b102 kernel: bnx2i_ep_connect, alloc_ep succeeded
Mar 12 15:03:57 elm3b102 kernel: bnx2i_ep_connect, hba adapter_state 0
Mar 12 15:03:57 elm3b102 iscsid: Received iferror -1
Mar 12 15:03:57 elm3b102 iscsid: cannot make a connection to 9.47.81.22:3260 (-1,11)
Is this device licensed for iSCSI offload? Is it possible to upload the complete logs which covers messages from driver load as well?
------- Comment From coschult.com 2010-03-12 19:19 EDT-------
Yes, there are messages in the kernel log indicating that offload is successful:

Broadcom NetXtreme II iSCSI Driver bnx2i v2.1.0 (Dec 06, 2009)
iscsi: registered transport (bnx2i)
scsi6 : Broadcom Offload iSCSI Initiator
scsi7 : Broadcom Offload iSCSI Initiator
scsi8 : Broadcom Offload iSCSI Initiator
scsi9 : Broadcom Offload iSCSI Initiator

I will attach the latest dmesg log. It shows several attempts to log into the target.
Created attachment 399791 [details] dmesg log with debugging information ------- Comment (attachment only) From coschult.com 2010-03-12 19:21 EDT-------
I think I know what's happening. bnx2i_init_one() calls cnic->register_device() and bnx2i_start() will be called immediately. bnx2i_start() will try to send the iscsi_init message to firmware but hba->cnic has not been setup yet so it fails and adapter_state will stay at 0. I think this can be easily fixed, but for now we can work around this by loading the bnx2i driver first before bringing up the eth? devices. IBM, please try this:

modprobe bnx2
modprobe cnic
modprobe bnx2i
ifup eth0

Or:

modprobe cnic
modprobe bnx2i
modprobe bnx2
ifup eth0
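To make the ordering problem described above easier to follow, here is a toy model of it. This is not driver code. The classes `Cnic` and `Hba`, the `link_up` flag, and both `init_one_*` functions are hypothetical stand-ins, and `init_one_fixed` is only one plausible shape for a fix, not the actual patch. The assumption is that the start callback fires during registration only when the network device is already up.

```python
class Cnic:
    """Hypothetical stand-in for the cnic device."""

    def __init__(self, link_up: bool):
        self.link_up = link_up

    def register_device(self, hba: "Hba") -> None:
        # Assumption: if the network device is already up, the start
        # callback runs immediately, before this call returns.
        if self.link_up:
            hba.start()


class Hba:
    """Hypothetical stand-in for the bnx2i hba structure."""

    def __init__(self) -> None:
        self.cnic = None
        self.adapter_state = 0

    def start(self) -> None:
        # Sending the init message needs hba.cnic. Without it the
        # request fails and adapter_state stays at 0.
        if self.cnic is not None:
            self.adapter_state = 1


def init_one_buggy(cnic: Cnic) -> Hba:
    hba = Hba()
    cnic.register_device(hba)  # start() may run here, too early
    hba.cnic = cnic
    return hba


def init_one_fixed(cnic: Cnic) -> Hba:
    hba = Hba()
    hba.cnic = cnic            # set the back-pointer before registering
    cnic.register_device(hba)
    return hba


print(init_one_buggy(Cnic(link_up=True)).adapter_state)  # 0
print(init_one_fixed(Cnic(link_up=True)).adapter_state)  # 1

# Workaround ordering: register while the link is down, start arrives later.
hba = init_one_buggy(Cnic(link_up=False))
hba.start()
print(hba.adapter_state)  # 1
```

The last three lines correspond to the suggested workaround: with bnx2i loaded before the interface comes up, the start callback arrives after hba->cnic has been assigned.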
------- Comment From coschult.com 2010-03-15 19:18 EDT------- I rmmod'd the drivers and modprobe'd them in both sequences, and got a different error. The first time, it was a connection timeout, and the second time it immediately returned with a connection failure. I retried the first sequence, and it didn't timeout but gave me a connection failure. Looking at the log, adapter_state is 1, so that much succeeded, at least. I'm attaching the various logs.
Created attachment 400330 [details] shell session ------- Comment on attachment From coschult.com 2010-03-15 19:20 EDT------- My retry of the first sequence of module loading is not shown here.
Created attachment 400332 [details] /var/log/messages output ------- Comment (attachment only) From coschult.com 2010-03-15 19:23 EDT-------
Created attachment 400333 [details] debugging output from brcm_iscsiuio ------- Comment (attachment only) From coschult.com 2010-03-15 19:24 EDT-------
Created attachment 400334 [details] debugging output from iscsid ------- Comment (attachment only) From coschult.com 2010-03-15 19:24 EDT-------
The kernel cannot find the route to the iSCSI target through eth3. Do you have eth0 and eth3 configured for the same subnet? If the shortest route to the target is through eth0 and not eth3, it will not connect. Please configure eth3 only with the subnet that can reach the iSCSI target.
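As a rough illustration of why a second interface on the same subnet matters, here is a sketch of a route lookup. It is a simplification, not the kernel's routing code or cnic_get_route(): it picks the most specific matching prefix and, as an assumption, lets the first entry win a tie. The function names, the route tables, and the addresses (taken from the documentation ranges) are all made up for the example.

```python
import ipaddress


def pick_interface(routes, target):
    """Return the interface of the most specific route covering target, or None."""
    addr = ipaddress.ip_address(target)
    best = None
    for network, iface in routes:
        net = ipaddress.ip_network(network)
        # Strict comparison: on equal prefix length the earlier entry is kept.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None


def can_offload(routes, target, offload_iface):
    """Offload connect works only if the route resolves to the offload interface."""
    return pick_interface(routes, target) == offload_iface


both = [("192.0.2.0/24", "eth0"), ("192.0.2.0/24", "eth3")]
only_eth3 = [("192.0.2.0/24", "eth3")]

print(can_offload(both, "192.0.2.22", "eth3"))       # False
print(can_offload(only_eth3, "192.0.2.22", "eth3"))  # True
print(pick_interface(only_eth3, "198.51.100.1"))     # None
```

In this model, removing the eth0 entry is what makes the lookup land on eth3, which is the effect of configuring only eth3 on the target's subnet.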
------- Comment From coschult.com 2010-03-16 18:21 EDT-------
It turns out that I did have eth0 configured (with the same address as eth3!). I did an ip address flush dev eth0 on it, and reloaded the modules, and still got the same result: get_route returns 0. I flushed usb0, which had an IPV6 address assigned to it, and eth4, which had been configured with the address 10.0.0.102. I ran tcpdump while trying to connect and saw no network traffic. Here's the output of ifconfig -a:

eth0      Link encap:Ethernet  HWaddr 00:21:5E:09:60:40
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:169 Memory:96000000-96012800

eth1      Link encap:Ethernet  HWaddr 00:21:5E:09:60:42
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:138 Memory:98000000-98012800

eth2      Link encap:Ethernet  HWaddr 00:10:18:57:0D:BC
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:209 Memory:92000000-92012800

eth3      Link encap:Ethernet  HWaddr 00:10:18:57:0D:BE
          inet addr:9.47.67.102  Bcast:9.47.67.255  Mask:255.255.254.0
          inet6 addr: fe80::210:18ff:fe57:dbe/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7479 errors:0 dropped:0 overruns:0 frame:0
          TX packets:439 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:757738 (739.9 KiB)  TX bytes:43750 (42.7 KiB)
          Interrupt:146 Memory:94000000-94012800

eth4      Link encap:Ethernet  HWaddr 00:14:5E:99:03:F4
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:45 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:6820 (6.6 KiB)
          Interrupt:185 Memory:9c800000-9c800fff

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:6363 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6363 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:9063624 (8.6 MiB)  TX bytes:9063624 (8.6 MiB)

sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

usb0      Link encap:Ethernet  HWaddr 02:21:5E:0A:60:43
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:468 errors:0 dropped:0 overruns:0 frame:0
          TX packets:29 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:30420 (29.7 KiB)  TX bytes:5831 (5.6 KiB)
(In reply to comment #122)
> ------- Comment From coschult.com 2010-03-16 18:21 EDT-------
> It turns out that I did have eth0 configured (with the same address as eth3!).
> I did a ip address flush dev eth0 on it, and reloaded the modules, and still
> got the same result: get_route returns 0.

Yesterday, cnic_get_route() was returning -101, which is -ENETUNREACH. Returning 0 means it is successful. So we're making progress! Can you post the logs or send them to me privately?
------- Comment From coschult.com 2010-03-16 19:40 EDT------- By the way, the error message I got back from iscsiadm was a connection time out.
Created attachment 400598 [details] output from /var/log/messages ------- Comment (attachment only) From coschult.com 2010-03-16 19:36 EDT-------
Created attachment 400599 [details] output from brcm_iscsiuio ------- Comment (attachment only) From coschult.com 2010-03-16 19:38 EDT-------
Created attachment 400600 [details] output from iscsid ------- Comment (attachment only) From coschult.com 2010-03-16 19:38 EDT-------
Hi Corinna,

We are definitely getting further. Now from the uIP daemon's point of view, we can see the ARP request come from CNIC. It looks like the ARP request packet was placed on the wire and uIP sees packets come in from the wire but they were not ARP packets because it didn't fill the uIP ARP packets. I was wondering if you are able to provide a packet trace of an iSCSI login attempt.

Note: Running wireshark on the L2 interface will only provide a partial view of the contents on the wire, because unicast traffic to the iSCSI offload interface is not sent to the L2 interface. If you could do a network capture on the iSCSI target or in the middle of the connection, that would be best.

Thanks again.
-Ben
------- Comment From coschult.com 2010-03-18 16:01 EDT------- I tried connecting to a target on a different machine, which had an ip address restriction on the target, restricted to my ethernet ip address. I was able to discover the target, but when I tried to log in, I received the error "non-retryable iSCSI login failure". When I had our admin add the iscsi ip address as well, then I was able to log in successfully. But I was only able to successfully login when I loaded the modules in the order suggested by Michael. When I used the init.d script to start the service, I got "encountered connection failure". I am now looking into why I am unable to log into the first machine.
Hi Corinna,

For the problem where you are not able to log in if you use the init.d scripts to start the service: the iscsid script will load the cnic and bnx2i drivers only. It will not load the bnx2 driver or bring that interface up.

Does your test system automatically run the iscsid init script when booted? If so, could you list the directory contents of the SysV init scripts of the runlevel you are having trouble with? (i.e. list the contents of '/etc/rc.d/rc<run level>.d')

Also, were you able to provide a wire trace of the problem described in comment 124?

Thanks again.
-Ben
------- Comment From coschult.com 2010-03-22 19:22 EDT------- Since I was able to log into a different target, on a machine with a different configuration, I'm going to not worry about my failure to log into the first machine. Likely it is a configuration problem, that we lack sufficient experience to diagnose. In any case, that machine is a Netapp, and I can't easily tap into its network traffic.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0178.html
------- Comment From coschult.com 2010-05-21 17:23 EDT------- Verified on rhel5.5 rc2 (the release referred to by http://rhn.redhat.com/errata/RHSA-2010-0178.html )