Cannot log in to an iSCSI target using the bnx2i transport if bnx2i is loaded after the network device has been brought up. The workaround is to run "service network restart". This upstream patch fixes the problem: http://marc.info/?l=linux-scsi&m=126953968406438&w=2
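The workaround above could be scripted roughly as follows. This is a minimal sketch, not part of the fix: the run helper, the DRY_RUN variable and the apply_workaround function name are illustrative assumptions, and the explicit "modprobe bnx2i" step is assumed only to make sure the module is present before networking is restarted.

```shell
#!/bin/sh
# Sketch of the workaround from this report; helper names are hypothetical.

# run: execute a command, or only print it when DRY_RUN=1 (illustrative switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# apply_workaround: make sure bnx2i is loaded (assumed step), then restart
# networking so the network device is brought up with the module in place.
apply_workaround() {
    run modprobe bnx2i || return 1
    run service network restart
}
```

After sourcing the snippet, setting DRY_RUN=1 and calling apply_workaround only prints the two commands; without DRY_RUN they run for real and need root.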
Michael, is this the patch you were requesting for 5.4.z?
(In reply to comment #1)
> Michael, is this the patch you were requesting for 5.4.z?

Yes, this is one of them. The other issues are still under investigation.
The issue I have with this request is that typical z-stream candidates have *no* workarounds available. I'll have to defer to Tom on this.
Adding devel ack for whatever version we decide. The patch only affects bnx2i and is low risk; I have reviewed it and OK'd it for upstream too.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 416806 [details] /var/log/message log

Tried with the following environment: two boxes, each with a Broadcom Corporation NetXtreme II BCM57711 10-Gigabit PCIe NIC, connected with the NICs on the same LAN, 192.168.2.0/24.

[RHEL 5.5 initiator vs. RHEL 5.5 target] Both run the 2.6.18-198.el5 kernel.

[target] 192.168.2.101/255.255.255.0
# service tgtd start
# tgtadm --lld iscsi --op new --mode target --tid 1 -T bnx2i_test
# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/sdb
# tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL

[initiator] 192.168.2.103/255.255.255.0
# iscsiadm --mode discovery --type sendtargets --portal 192.168.2.101
# iscsiadm --mode node --login
Logging in to [iface: default,target: bnx2i_test,portal:192.168.2.101,3260]
Vendor: IET Model: Cotroller Rev:0001
Type : RAID ANSI SCSI revision:05
scsi 9:0:0:0: Attached scsi generic sg2 type 12
Vendor: IET Model: Cotroller Rev:0001
Type : RAID ANSI SCSI revision:05
SCSI device sdb:419458080 512-byte hdwr sectors (214759 MB)
sdb:Write Protect is off
SCSI device sdb: driver cache: write back
SCSI device sdb:419458080 512-byte hdwr sectors (214759 MB)
sdb:Write Protect is off
SCSI device sdb: driver cache: write back
end_request: I/O error, dev sdb, sector 0
printk: 4 messages suppressed
Buffer I/O error on device sdb,logical block 0
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb,logical block 0
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb,logical block 0

At this point the target displays:
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
....
end_request: I/O error, dev sdb, sector 0

Checking whether the iSCSI LUN is attached on the initiator side:
# fdisk -l
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14       60801   488279610   8e  Linux LVM

/dev/sdb is not attached, and I/O errors start to appear on the console on both the target and the initiator side. After a long while (about two hours), a call trace appears on the initiator console, the initiator can no longer be pinged from remote machines, and the initiator machine hangs. Sorry, no serial console log was captured, because no device was available to capture it.
Created attachment 416809 [details] call trace screenshot 1

The call trace happened on the initiator machine, apparently at the end of the reboot.
Created attachment 416810 [details] call trace screenshot 2
Created attachment 416813 [details] call trace screenshot 3
Checked with rhev-hypervisor-5.5-2.2.0.16.1 as well, which has the 2.6.18-194 kernel. Hit the same error as described in comment 7.
Vivian, your problem does not seem to be the same as the problem we are fixing in this bugzilla. It is only related because both are bnx2i bugs. Could you open a new bugzilla?
(In reply to comment #11)
> Vivian,
>
> Your problem does not seem to be the same as the problem we are fixing in this
> bugzilla. It is only related because they both are bnx2i bugs.
>
> Could you open a new bugzilla?

Mike, I have opened a new bug for my part, Bug 597184. Sorry for spamming here.
in kernel-2.6.18-221.el5

You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
With kernel-2.6.18-221.el5 and iscsi-initiator-utils-6.2.0.872-2 (uIP 0.5.15), there are cases where the HBA will not log into the target under fairly normal circumstances. The problem appears to be uIP-related, since it does work with a newer internal version of uIP on top of the above components. Here is an example of a failure scenario:

1) iscsiadm -m node -o delete; reboot
2) After system is back up, attempt to log into a target (failure)
3) iscsiadm -m node -o delete; service iscsi restart
4) Attempt to log into a target again (success)
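The failure scenario above might be scripted along these lines for repeated testing. It is only a sketch: the run helper, DRY_RUN, the function names and the PORTAL placeholder are assumptions, the reboot in step 1 is left to the tester, and the discovery step is assumed because deleting the node records removes what a login would otherwise use.

```shell
#!/bin/sh
# Sketch of the reproduction steps; helper and function names are hypothetical.

# Placeholder portal address (documentation range), not taken from the report.
PORTAL="${PORTAL:-192.0.2.10}"

# run: execute a command, or only print it when DRY_RUN=1 (illustrative switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1 (before the reboot): delete the node records. Reboot manually afterwards.
clear_records() {
    run iscsiadm -m node -o delete
}

# Steps 2 and 4: rediscover the target (assumed necessary) and attempt a login.
try_login() {
    run iscsiadm --mode discovery --type sendtargets --portal "$PORTAL" || return 1
    run iscsiadm --mode node --login
}

# Step 3: delete the node records again and restart the iscsi service.
reset_iscsi() {
    clear_records
    run service iscsi restart
}
```

On an affected system the report suggests try_login would fail right after the reboot and succeed once reset_iscsi has been run; with DRY_RUN=1 the functions only print the commands they would execute.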
Did I merge the wrong uip version? You guys wanted the same version that was in rhel 6 right?
Oh yeah, Broadcom: is the uip version that fixes all these issues ready for production? I think I can just swap the versions, since the current one failed QA.
Yes, a newer version is ready. Ben, please send it to Mike. Thanks.
Created attachment 454166 [details] uip-0.6.2.1

Attaching the latest version of uIP. Ed tested this version over the weekend and it passed all the tests.
Created attachment 454510 [details] uip-0.6.2.2 Updating to uip-0.6.2.2
Created attachment 454512 [details] uip-0.6.2.2 Updating to the correct version of uIP-0.6.2.2
Benjamin - please file a new bugzilla with the updated uip, since this is ON_QA already. Obsoleting those that were submitted after 15-Sep-2010.
Hey Andrius, if for this bz https://bugzilla.redhat.com/show_bug.cgi?id=568609 the uip code failed QA, can I just use that bz to take in the fixed code?
(In reply to comment #27)
> Hey Andrius, if for this bz https://bugzilla.redhat.com/show_bug.cgi?id=568609
> the uip code failed QA, can I just use that bz to take in the fixed code?

That BZ isn't showing as failing QA though... I guess you can just set it back to ASSIGNED if you'd rather do that.
Reminder! There should be a fix present for this BZ in snapshot 3 -- unless otherwise noted in a previous comment. Please test and update this BZ with test results as soon as possible.
This problem hasn't been seen in a while since uIP has been updated to v0.6.2.2 and above. The last official kernel version tested was with kernel-2.6.18-233 (along with many others and internal builds). Marked as verified.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0017.html