Bug 590151 - cxgb3i_ddp Error occurred on the host and the Sessions failed to login to controller during controller reboot test - (CR175840)
Keywords:
Status: CLOSED DUPLICATE of bug 567444
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: powerpc
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Mike Christie
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-05-07 20:37 UTC by hong.chung
Modified: 2010-05-26 19:15 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-05-26 19:14:27 UTC
Target Upstream Version:
Embargoed:


Attachments
cxgb3i-dmesg and messages log (50.54 KB, application/x-zip-compressed)
2010-05-07 20:37 UTC, hong.chung

Description hong.chung 2010-05-07 20:37:26 UTC
Created attachment 412437
cxgb3i-dmesg and messages log

Description of problem:

While running a 12-hour controller reboot test, after a varying number of array reboots, the login sessions failed to log back in to the target, causing the host to report read/write errors.

We have another RHEL 5.5 host running the plain tcp transport instead of the TOE (cxgb3i), and we did not see this issue on it.

Version-Release number of selected component (if applicable):

Hostname:          Ayeka-RH55
Host IP:           10.10.10.35
Kernel Release:    2.6.18-194.el5
RHEL Release:      Red Hat Enterprise Linux Server release 5.5 (Tikanga)
Version:           Linux version 2.6.18-194.el5 (mockbuild.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Tue Mar 16 22:03:12 EDT 2010
Platform:          ppc64

filename:       /lib/modules/2.6.18-194.el5/kernel/drivers/scsi/cxgb3i/cxgb3i.ko
version:        1.0.2
license:        GPL
description:    Chelsio S3xx iSCSI Driver
author:         Karen Xie kxie>
srcversion:     92C55DB348F36EC6BDAA7B6
depends:        libiscsi2,libiscsi_tcp,cxgb3,scsi_transport_iscsi2,scsi_mod
vermagic:       2.6.18-194.el5 SMP mod_unload gcc-4.1
parm:           cxgb3_rcv_win:TCP receive window in bytes (default=256KB) (int)
parm:           cxgb3_snd_win:TCP send window in bytes (default=128KB) (int)
parm:           cxgb3_rx_credit_thres:int
parm:           rx_credit_thres:RX credits return threshold in bytes (default=10KB)
parm:           cxgb3_max_connect:Max. # of connections (default=8092) (uint)
parm:           cxgb3_sport_base:starting port number (default=20000) (uint)
module_sig:	883f3504ba0409b9ba3c1dbd688a8a41123a3d09f5e7f92eac401995da8ff3e3d5baca33afe8bd00a0b618675fcf9083773f11c5cdfe5871846c1f146b
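To compare the offload driver's tunables across hosts, the `parm:` lines above can be pulled into a small table. A minimal sketch in Python, assuming the output was captured as text from `modinfo cxgb3i`; `parse_parms` is a hypothetical helper, not part of any tool:

```python
import re
from typing import Dict, Optional, Tuple

# A "parm:" line looks like "parm:   <name>:<description>".
PARM_RE = re.compile(r"^parm:\s+(?P<name>\w+):(?P<desc>.*)$")
# Defaults, when documented, appear as "(default=...)" inside the description.
DEFAULT_RE = re.compile(r"\(default=(?P<default>[^)]+)\)")


def parse_parms(modinfo_text: str) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each module parameter name to (description, documented default)."""
    parms = {}
    for line in modinfo_text.splitlines():
        m = PARM_RE.match(line.strip())
        if m is None:
            continue  # not a parm: line (filename, version, module_sig, ...)
        desc = m.group("desc").strip()
        d = DEFAULT_RE.search(desc)
        parms[m.group("name")] = (desc, d.group("default") if d else None)
    return parms
```

Run against the output above, this lists `cxgb3_rx_credit_thres` with description `int` and no default, next to a separate `rx_credit_thres` entry that carries the descriptive text; that split is simply how the two lines appear in the captured output.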


How reproducible:
Often.

Steps to Reproduce:
1. Created 64 volumes in each array.
2. Created snapshots of all volumes.
3. Mapped 64 base volumes from each array to the RHEL 5.5 host.
4. Started 4 cxgb3i sessions, 2 for each array:
cxgb3i: [1] 10.10.10.10:3260,1 iqn.1992-01.com.lsi:4981.60080e500017b96a000000004b4db14b
cxgb3i: [2] 10.10.10.20:3260,1 iqn.1992-01.com.lsi:4981.60080e500017b962000000004b4db188
cxgb3i: [3] 11.11.11.10:3260,2 iqn.1992-01.com.lsi:4981.60080e500017b96a000000004b4db14b
cxgb3i: [4] 11.11.11.20:3260,2 iqn.1992-01.com.lsi:4981.60080e500017b962000000004b4db188

5. Started I/O using LunixSmash and started the sysreboot test.
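In the session list from step 4, each array's target IQN is reached through two portals, one on each subnet. A minimal Python sketch that groups the lines by IQN to confirm that layout, assuming the lines follow the `<transport>: [<sid>] <ip>:<port>,<tpgt> <iqn>` shape used by `iscsiadm -m session` (the number after the comma is taken to be the portal group tag); `sessions_by_target` is a hypothetical helper:

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# "<transport>: [<sid>] <ip>:<port>,<tpgt> <target iqn>"
SESSION_RE = re.compile(
    r"^(?P<transport>\S+): \[(?P<sid>\d+)\] "
    r"(?P<ip>[\d.]+):(?P<port>\d+),(?P<tpgt>\d+) (?P<iqn>\S+)$"
)


def sessions_by_target(lines) -> Dict[str, List[Tuple[str, int]]]:
    """Group (portal IP, portal group tag) pairs by target IQN."""
    groups = defaultdict(list)
    for line in lines:
        m = SESSION_RE.match(line.strip())
        if m:
            groups[m.group("iqn")].append((m.group("ip"), int(m.group("tpgt"))))
    return dict(groups)
```

For the four sessions listed, this yields two IQNs with two portals each (10.10.10.x with group tag 1, 11.11.11.x with group tag 2).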

*sysreboot test:
Both A controllers sleeps ten minutes and then sysReboots “reboots" both B controllers and sleeps ten minutes and so on. Test should run for 12 hours.
While rebooting the A controllers, I/Os will be running in the alt path "B path”. When rebooting is completed, I/O will get back to its preferred path "A path".
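The alternating schedule can be sketched as a list of reboot events. This is a minimal Python model, not the actual sysreboot script: it assumes the cycle starts with the A pair, that reboot time is absorbed into the ten-minute sleep, and that I/O runs on the surviving pair's path while the other pair is down. `Event` and `sysreboot_schedule` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    minute: int       # minutes since the start of the test
    rebooted: str     # controller pair being rebooted ("A" or "B")
    active_path: str  # path assumed to carry I/O while that pair is down


def sysreboot_schedule(duration_min: int = 12 * 60, sleep_min: int = 10) -> List[Event]:
    """Alternate reboots of the A and B controller pairs, sleeping between them."""
    events = []
    minute, pair = 0, "A"
    while minute < duration_min:
        other = "B" if pair == "A" else "A"
        # During an A reboot I/O fails over to B; during a B reboot it stays on A.
        events.append(Event(minute, pair, other))
        minute += sleep_min
        pair = other
    return events
```

Under these assumptions a 12-hour run gives 72 reboot events, 36 per controller pair, so every session is torn down and re-established dozens of times per run.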
  
Actual results:
After a varying number of target reboots, the login sessions failed to log back in to the target, causing the host to report read/write errors.

*The following log lines appeared in /var/log/messages before the I/O errors occurred:

May  6 13:05:23 Ayeka-RH55 kernel: cxgb3i_ddp: ERR! release 0x1005274b, idx 0x149d, gl 0x0000000000000000, 0.
May  6 13:05:24 Ayeka-RH55 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
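When checking other runs for the same signature, it helps to pull the DDP release error and the following connection error out of /var/log/messages together. A minimal Python sketch, assuming the messages keep exactly the wording shown above; `scan` is a hypothetical helper. The error code 1011 is assumed, from the kernel's `enum iscsi_err` (`ISCSI_ERR_BASE` = 1000), to be `ISCSI_ERR_CONN_FAILED`; the code only reports the raw number.

```python
import re

DDP_ERR_RE = re.compile(
    r"cxgb3i_ddp: ERR! release 0x(?P<tag>[0-9a-f]+), "
    r"idx 0x(?P<idx>[0-9a-f]+), gl 0x(?P<gl>[0-9a-f]+)"
)
CONN_ERR_RE = re.compile(
    r"iSCSI connection (?P<conn>\d+:\d+) error \((?P<err>\d+)\) "
    r"state \((?P<state>\d+)\)"
)


def scan(lines):
    """Yield (kind, fields) for DDP release errors and iSCSI connection errors."""
    for line in lines:
        m = DDP_ERR_RE.search(line)
        if m:
            yield "ddp_release", {
                "tag": int(m.group("tag"), 16),
                "idx": int(m.group("idx"), 16),
                "null_gl": int(m.group("gl"), 16) == 0,  # gl printed as all zeros
            }
            continue
        m = CONN_ERR_RE.search(line)
        if m:
            yield "conn_error", {
                "conn": m.group("conn"),
                "err": int(m.group("err")),
                "state": int(m.group("state")),
            }
```

On the two lines above this reports one DDP release with a zero `gl` followed by an error on connection 2:0.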

Expected results:
Sessions log back in to the target and I/O continues without any error.

Additional info:

Setup information (1x2)
Switch Cisco Nexus 5020 4.1(3)N2(1a)

HOST
Ayeka		172.22.229.116
OS – RHEL 5.5
Server Brand - IBM
Server Model - P520
Failover - MPP
Architecture - PPC
HBA 1 Model / Protocol - RedRiver / iscsi

eth0 : 10.10.10.35
eth1 : 11.11.11.35

/var/lib/iscsi/iface/

iface.iscsi_ifacename = cxgb3i.00:14:5e:99:04:68
iface.hwaddress = 00:14:5e:99:04:68
iface.ipaddress = 10.10.10.35
iface.net_ifacename = <empty>
iface.transport_name = cxgb3i
iface.initiator_name = iqn.1994-05.com.redhat:7ae845b4ba6b

iface.iscsi_ifacename = cxgb3i.00:14:5e:99:04:66
iface.hwaddress = 00:14:5e:99:04:66
iface.ipaddress = 11.11.11.35
iface.net_ifacename = <empty>
iface.transport_name = cxgb3i
iface.initiator_name = iqn.1994-05.com.redhat:7ae845b4ba6b
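Both iface records follow the same pattern: the iface name is the transport name joined to the port's MAC address, and each port sits on its own subnet. A minimal Python sketch that parses the records and checks that pattern, assuming the records are separated by blank lines as shown; `parse_ifaces` and `name_matches_hwaddress` are hypothetical helpers, and the naming rule is only what the two records here show, not a documented open-iscsi requirement.

```python
import re
from typing import Dict, List


def parse_ifaces(text: str) -> List[Dict[str, str]]:
    """Split blank-line-separated iface records into key -> value dicts."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        rec = {}
        for line in block.splitlines():
            key, sep, value = line.partition("=")
            if sep:
                rec[key.strip()] = value.strip()
        if rec:
            records.append(rec)
    return records


def name_matches_hwaddress(rec: Dict[str, str]) -> bool:
    """Check the '<transport>.<hwaddress>' naming pattern seen in both records."""
    expected = f"{rec['iface.transport_name']}.{rec['iface.hwaddress']}"
    return rec["iface.iscsi_ifacename"] == expected
```

A mismatch between name and MAC, or two records sharing an IP address, would be worth ruling out before suspecting the driver.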


Array 1
Model - 49xx
CFW - 07.60.35.00
Protocol - iscsi
Speed - 1Gb

Array 2
Model - 49xx
CFW - 07.70.10.00
Protocol - iscsi
Speed - 1Gb

Comment 1 Mike Christie 2010-05-10 19:46:13 UTC
Chelsio has confirmed this is fixed upstream. They were planning on bringing the fix into 5.6 through this bugzilla, which requests that we update their driver:
https://bugzilla.redhat.com/show_bug.cgi?id=567444

Comment 2 Abdel Sadek 2010-05-11 21:21:49 UTC
(In reply to comment #1)
> Chelsio has confirmed this is fixed upstream. They were planning on bringing in
> the fix into 5.6 in this bugzilla
> https://bugzilla.redhat.com/show_bug.cgi?id=567444
> requesting that we update their driver.    

Will the fix be backported to RHEL 5.5 maintenance?

Comment 3 Andrius Benokraitis 2010-05-26 19:14:27 UTC

*** This bug has been marked as a duplicate of bug 567444 ***

Comment 4 Andrius Benokraitis 2010-05-26 19:15:28 UTC
LSI - please add yourselves to the dupe'd bug 567444. Lobbying for z-stream should be done there.

