578005 – [Broadcom 5.6 bug] Cannot login to iSCSI target when bnx2i is loaded last

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.6
Hardware:	All
OS:	Linux
Priority:	high
Severity:	urgent
Target Milestone:	rc
Target Release:	5.6
Assignee:	Mike Christie
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:	568606
Blocks:	557597 595548 595862 661735

Reported:	2010-03-30 00:54 UTC by Michael Chan
Modified:	2018-12-01 16:42 UTC (History)
CC List:	21 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	595548 595862 (view as bug list)
Environment:
Last Closed:	2011-01-13 21:22:05 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
/var/log/message log (178.40 KB, text/plain) 2010-05-26 12:47 UTC, Vivian Bian	no flags	Details
call trace screenshot 1 (153.57 KB, image/png) 2010-05-26 12:57 UTC, Vivian Bian	no flags	Details
call trace screenshot 2 (156.40 KB, text/png) 2010-05-26 12:58 UTC, Vivian Bian	no flags	Details
call trace screenshot 3 (157.54 KB, image/png) 2010-05-26 12:58 UTC, Vivian Bian	no flags	Details
uip-0.6.2.1 (483.48 KB, application/octet-stream) 2010-10-18 17:00 UTC, Benjamin Li	no flags	Details
uip-0.6.2.2 (1016.11 KB, application/octet-stream) 2010-10-20 07:13 UTC, Benjamin Li	no flags	Details
uip-0.6.2.2 (483.54 KB, application/octet-stream) 2010-10-20 07:17 UTC, Benjamin Li	no flags	Details

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2011:0017	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update	2011-01-13 10:37:42 UTC

Description Michael Chan 2010-03-30 00:54:09 UTC

Cannot log in to an iSCSI target using the bnx2i transport if bnx2i is loaded after the network device has been brought up.  The workaround is to run "service network restart".

This upstream patch fixes the problem:

http://marc.info/?l=linux-scsi&m=126953968406438&w=2
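
The workaround in the description could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: MODULES_FILE and RUN are illustrative variables introduced here (RUN defaults to echo, so the commands are only printed), and loading the module with modprobe before restarting the network is an assumption about how the workaround would be applied.

```shell
#!/bin/sh
# Sketch of the workaround: make sure bnx2i is loaded, then restart
# networking so the interface is brought up after the driver is present.
# MODULES_FILE and RUN are hypothetical knobs, not part of any Red Hat tool.
MODULES_FILE="${MODULES_FILE:-/proc/modules}"
RUN="${RUN-echo}"    # dry run by default; export RUN= (empty) to really execute

bnx2i_loaded() {
    # /proc/modules lists one module per line, with the module name first.
    grep -q '^bnx2i ' "$MODULES_FILE" 2>/dev/null
}

apply_workaround() {
    if ! bnx2i_loaded; then
        $RUN modprobe bnx2i
    fi
    $RUN service network restart
}
```

Calling apply_workaround with the defaults only prints the commands it would run, which makes it possible to review them before executing anything as root.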

Comment 1 Andrius Benokraitis 2010-03-30 15:13:30 UTC

Michael, is this the patch you were requesting for 5.4.z?

Comment 2 Michael Chan 2010-03-30 16:31:46 UTC

(In reply to comment #1)
> Michael, is this the patch you were requesting for 5.4.z?    

Yes, this is one of them.  The other issues are still under investigation.

Comment 3 Andrius Benokraitis 2010-03-30 16:42:51 UTC

The issue I have with this request is that typical z-stream candidates have *no* workarounds available. I'll have to defer to Tom on this.

Comment 4 Mike Christie 2010-03-30 18:08:44 UTC

Adding devel ack for whatever version we decide. The patch only affects bnx2i and is low risk. I have reviewed it and OK'd it for upstream too.

Comment 5 RHEL Program Management 2010-05-20 12:49:55 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Vivian Bian 2010-05-26 12:47:16 UTC

Created attachment 416806 [details]
/var/log/message log

Tried with the following environment:

Connect two boxes that each have a Broadcom Corporation NetXtreme II BCM57711 10-Gigabit PCIe NIC.
Put the NICs on the same LAN, 192.168.2.0/24.

[RHEL 5.5 initiator vs. RHEL 5.5 target]
Both run the 2.6.18-198.el5 kernel.

[target] 192.168.2.101/255.255.255.0
# service tgtd start
# tgtadm --lld iscsi --op new --mode target --tid 1 -T bnx2i_test
# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/sdb
# tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL

[initiator] 192.168.2.103/255.255.255.0
# iscsiadm --mode discovery --type sendtargets --portal 192.168.2.101
# iscsiadm --mode node --login

Logging in to [iface: default, target: bnx2i_test, portal: 192.168.2.101,3260]
  Vendor: IET     Model: Controller   Rev: 0001
  Type:   RAID                        ANSI SCSI revision: 05
scsi 9:0:0:0: Attached scsi generic sg2 type 12
  Vendor: IET     Model: Controller   Rev: 0001
  Type:   RAID                        ANSI SCSI revision: 05
SCSI device sdb: 419458080 512-byte hdwr sectors (214759 MB)
sdb: Write Protect is off
SCSI device sdb: driver cache: write back
SCSI device sdb: 419458080 512-byte hdwr sectors (214759 MB)
sdb: Write Protect is off
SCSI device sdb: driver cache: write back

end_request: I/O error, dev sdb, sector 0
printk: 4 messages suppressed
Buffer I/O error on device sdb,logical block 0

end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb,logical block 0

end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb,logical block 0


At this point the target console displays:
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
.... ... 
end_request: I/O error, dev sdb, sector 0

Check whether the iSCSI LUN is attached on the initiator side:
# fdisk -l

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14       60801   488279610   8e  Linux LVM

/dev/sdb is not attached, and I/O errors start appearing on the console on both the target and the initiator.

After a long while (about two hours), a call trace appears on the initiator console. The initiator cannot be pinged from remote machines, and the machine hangs.

Sorry, no serial console log was captured, because no device was available to capture it.

Comment 7 Vivian Bian 2010-05-26 12:57:36 UTC

Created attachment 416809 [details]
call trace screenshot 1

The call trace on the initiator machine appears to have occurred at the end of the reboot.

Comment 8 Vivian Bian 2010-05-26 12:58:01 UTC

Created attachment 416810 [details]
call trace screenshot 2

Comment 9 Vivian Bian 2010-05-26 12:58:32 UTC

Created attachment 416813 [details]
call trace screenshot 3

Comment 10 Vivian Bian 2010-05-26 13:17:30 UTC

Also checked with rhev-hypervisor-5.5-2.2.0.16.1, which has the 2.6.18-194 kernel. Hit the same error as described in comment 7.

Comment 11 Mike Christie 2010-05-26 22:15:25 UTC

Vivian,

Your problem does not seem to be the same as the problem we are fixing in this bugzilla. It is only related because they are both bnx2i bugs.

Could you open a new bugzilla?

Comment 12 Vivian Bian 2010-05-28 10:29:39 UTC

(In reply to comment #11)
> Vivian,
> 
> Your problem does not seem to be the same the problem we are fixing in this
> bugzilla. It is only related because they both are bnx2i bugs.
> 
> Could you open a new bugzilla?    

Mike, I have opened a new bug for my part, Bug 597184. Sorry for spamming here.

Comment 15 Jarod Wilson 2010-09-15 14:01:04 UTC

in kernel-2.6.18-221.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 19 edwardn 2010-10-14 17:19:48 UTC

With kernel-2.6.18-221.el5 and iscsi-initiator-utils-6.2.0.872-2 (uIP 0.5.15), there are cases where the HBA will not log in to the target under fairly normal circumstances.  The problem appears to be uIP-related, since login does work with a newer internal version of uIP on top of the above components.

Here is an example of a failure scenario: 
1) iscsiadm -m node -o delete; reboot
2) After system is back up, attempt to log into a target (failure)
3) iscsiadm -m node -o delete; service iscsi restart
4) Attempt to log into a target again (success)
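
The post-reboot part of this scenario (steps 2 to 4) could be exercised with a script along these lines. This is only a sketch under stated assumptions: the comment does not say how the login was attempted, so the discovery and login commands are borrowed from comment 6, PORTAL reuses the address from that comment, and the ISCSIADM and SERVICE variables are hypothetical hooks added so the commands can be swapped out.

```shell
#!/bin/sh
# Post-reboot half of the failure scenario (steps 2-4).
PORTAL="${PORTAL:-192.168.2.101}"   # assumption: portal address from comment 6
ISCSIADM="${ISCSIADM:-iscsiadm}"    # hypothetical hook for the iscsiadm binary
SERVICE="${SERVICE:-service}"       # hypothetical hook for the service command

try_login() {
    # Rediscover the target, then log in to all discovered node records.
    "$ISCSIADM" -m discovery -t sendtargets -p "$PORTAL" &&
        "$ISCSIADM" -m node --login
}

retry_after_restart() {
    # Step 2: first login attempt after the reboot.
    if try_login; then
        echo "login succeeded on first attempt"
        return 0
    fi
    # Step 3: delete the node records and restart the iscsi service.
    echo "first login failed; clearing node records and restarting iscsi"
    "$ISCSIADM" -m node -o delete
    "$SERVICE" iscsi restart
    # Step 4: second login attempt.
    if try_login; then
        echo "login succeeded after restart"
    else
        echo "login still failing"
        return 1
    fi
}
```

Running retry_after_restart as root after the reboot in step 1 reports which of the two attempts succeeded. With the failure described here, the expected path is a failed first attempt followed by a successful second one.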

Comment 20 Mike Christie 2010-10-14 21:18:09 UTC

Did I merge the wrong uip version? You guys wanted the same version that was in rhel 6 right?

Comment 21 Mike Christie 2010-10-14 21:21:32 UTC

Oh yeah Broadcom,

Is the uip version that fixes all these issues ready for production?

I think I can just swap the versions since the current one failed qa.

Comment 22 Michael Chan 2010-10-14 21:28:40 UTC

Yes, a newer version is ready.  Ben, please send it to Mike.  Thanks.

Comment 23 Benjamin Li 2010-10-18 17:00:51 UTC

Created attachment 454166 [details]
uip-0.6.2.1

Attaching the latest version of uIP.  Ed tested this version over the weekend and it passed all the tests.

Comment 24 Benjamin Li 2010-10-20 07:13:15 UTC

Created attachment 454510 [details]
uip-0.6.2.2

Updating to uip-0.6.2.2

Comment 25 Benjamin Li 2010-10-20 07:17:35 UTC

Created attachment 454512 [details]
uip-0.6.2.2

Updating to the correct version of uIP-0.6.2.2

Comment 26 Andrius Benokraitis 2010-10-20 14:42:55 UTC

Benjamin - please file a new bugzilla with the updated uip, since this is ON_QA already. Obsoleting those that were submitted after 15-Sep-2010.

Comment 27 Mike Christie 2010-10-20 19:56:32 UTC

Hey Andrius, if for this bz https://bugzilla.redhat.com/show_bug.cgi?id=568609 the uip code failed QA, can I just use that bz to take in the fixed code?

Comment 28 Andrius Benokraitis 2010-10-20 23:41:36 UTC

(In reply to comment #27)
> Hey Andrius, if for this bz https://bugzilla.redhat.com/show_bug.cgi?id=568609
> the uip code failed QA, can I just use that bz to take in the fixed code?

That BZ isn't showing as failing QA though... I guess you can just set it back to ASSIGNED if you'd rather do that.

Comment 29 Chris Ward 2010-12-02 15:25:08 UTC

Reminder! There should be a fix present for this BZ in snapshot 3 -- unless otherwise noted in a previous comment.

Please test and update this BZ with test results as soon as possible.

Comment 30 edwardn 2010-12-02 17:17:12 UTC

This problem hasn't been seen in a while, since uIP was updated to v0.6.2.2 and above.  The last official kernel tested was kernel-2.6.18-233 (along with many others and internal builds).  Marked as verified.

Comment 33 errata-xmlrpc 2011-01-13 21:22:05 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html
