Created attachment 412437: cxgb3i-dsmeg and messages log

Description of problem:
While running a 12-hour controller reboot test, after several array reboots (the number varies from run to run), the login sessions failed to log back in to the target, and the host reported Read/Write errors. Another RHEL 5.5 host running over plain TCP, without the TOE (cxgb3i), did not show this issue.

Version-Release number of selected component (if applicable):
Hostname: Ayeka-RH55
Host IP: 10.10.10.35
Kernel Release: 2.6.18-194.el5
RHEL Release: Red Hat Enterprise Linux Server release 5.5 (Tikanga)
Version: Linux version 2.6.18-194.el5 (mockbuild.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Tue Mar 16 22:03:12 EDT 2010
Platform: ppc64

filename: /lib/modules/2.6.18-194.el5/kernel/drivers/scsi/cxgb3i/cxgb3i.ko
version: 1.0.2
license: GPL
description: Chelsio S3xx iSCSI Driver
author: Karen Xie kxie>
srcversion: 92C55DB348F36EC6BDAA7B6
depends: libiscsi2,libiscsi_tcp,cxgb3,scsi_transport_iscsi2,scsi_mod
vermagic: 2.6.18-194.el5 SMP mod_unload gcc-4.1
parm: cxgb3_rcv_win:TCP receive window in bytes (default=256KB) (int)
parm: cxgb3_snd_win:TCP send window in bytes (default=128KB) (int)
parm: cxgb3_rx_credit_thres:int
parm: rx_credit_thres:RX credits return threshold in bytes (default=10KB)
parm: cxgb3_max_connect:Max. # of connections (default=8092) (uint)
parm: cxgb3_sport_base:starting port number (default=20000) (uint)
module_sig: 883f3504ba0409b9ba3c1dbd688a8a41123a3d09f5e7f92eac401995da8ff3e3d5baca33afe8bd00a0b618675fcf9083773f11c5cdfe5871846c1f146b

How reproducible:
Often.

Steps to Reproduce:
1. Created 64 volumes in each array.
2. Created snapshots of all volumes.
3. Mapped the 64 base volumes from each array to the RHEL 5.5 host.
4. Started 4 cxgb3i sessions, 2 for each array:
cxgb3i: [1] 10.10.10.10:3260,1 iqn.1992-01.com.lsi:4981.60080e500017b96a000000004b4db14b
cxgb3i: [2] 10.10.10.20:3260,1 iqn.1992-01.com.lsi:4981.60080e500017b962000000004b4db188
cxgb3i: [3] 11.11.11.10:3260,2 iqn.1992-01.com.lsi:4981.60080e500017b96a000000004b4db14b
cxgb3i: [4] 11.11.11.20:3260,2 iqn.1992-01.com.lsi:4981.60080e500017b962000000004b4db188
5. Started I/O using LunixSmash and started the sysreboot test.

*sysreboot test: sysReboot both A controllers and sleep ten minutes, then sysReboot both B controllers and sleep ten minutes, and so on. The test should run for 12 hours. While the A controllers are rebooting, I/O runs on the alternate path ("B path"). When the reboot completes, I/O moves back to its preferred path ("A path").

Actual results:
After several target reboots (the number varies), the login sessions failed to log back in to the target, and the host reported Read/Write errors.

*The following lines appeared in /var/log/messages before the I/O errors occurred:
May 6 13:05:23 Ayeka-RH55 kernel: cxgb3i_ddp: ERR! release 0x1005274b, idx 0x149d, gl 0x0000000000000000, 0.
May 6 13:05:24 Ayeka-RH55 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)

Expected results:
Sessions log back in to the target and I/O continues without any error.
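When a 12-hour run produces a large /var/log/messages, the two failure lines quoted above can be picked out mechanically. The sketch below is a hypothetical helper, not part of cxgb3i, open-iscsi, or any tool named in this report; its regular expressions are assumptions modeled only on the two sample lines, and it extracts the connection error and state codes without interpreting them.

```python
import re
from typing import Iterable, List, NamedTuple

# Hypothetical patterns, modeled only on the two sample lines in this report.
TS = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2})")
DDP_ERR = re.compile(r"cxgb3i_ddp: ERR! release (0x[0-9a-f]+), idx (0x[0-9a-f]+)")
CONN_ERR = re.compile(
    r"iscsid: Kernel reported iSCSI connection (\d+):(\d+) "
    r"error \((\d+)\) state \((\d+)\)"
)


class Event(NamedTuple):
    kind: str        # "ddp_release_err" or "conn_err"
    timestamp: str   # syslog "Mon D HH:MM:SS" prefix, or "" if absent
    detail: tuple    # (tag, idx) for DDP errors; (sid, cid, err, state) otherwise


def scan(lines: Iterable[str]) -> List[Event]:
    """Return the cxgb3i DDP release errors and iscsid connection errors found."""
    events: List[Event] = []
    for line in lines:
        m_ts = TS.match(line)
        ts = m_ts.group(1) if m_ts else ""
        m = DDP_ERR.search(line)
        if m:
            events.append(Event("ddp_release_err", ts, (m.group(1), m.group(2))))
            continue
        m = CONN_ERR.search(line)
        if m:
            sid, cid, err, state = (int(g) for g in m.groups())
            events.append(Event("conn_err", ts, (sid, cid, err, state)))
    return events


if __name__ == "__main__":
    sample = [
        "May 6 13:05:23 Ayeka-RH55 kernel: cxgb3i_ddp: ERR! release 0x1005274b, "
        "idx 0x149d, gl 0x0000000000000000, 0.",
        "May 6 13:05:24 Ayeka-RH55 iscsid: Kernel reported iSCSI connection 2:0 "
        "error (1011) state (3)",
    ]
    for ev in scan(sample):
        print(ev)
```

Feeding it the full log with `scan(open("/var/log/messages"))` lists each occurrence in order, which makes it easy to see whether every connection error is preceded by a DDP release error, as in the sample above.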
Additional info:
Setup information (1x2)

Switch: Cisco Nexus 5020, 4.1(3)N2(1a)

HOST Ayeka 172.22.229.116
OS - RHEL 5.5
Server Brand - IBM
Server Model - P520
Failover - MPP
Architecture - PPC
HBA 1 Model / Protocol - RedRiver / iscsi
eth0 : 10.10.10.35
eth1 : 11.11.11.35

/var/lib/iscsi/iface/
iface.iscsi_ifacename = cxgb3i.00:14:5e:99:04:68
iface.hwaddress = 00:14:5e:99:04:68
iface.ipaddress = 10.10.10.35
iface.net_ifacename = <empty>
iface.transport_name = cxgb3i
iface.initiator_name = iqn.1994-05.com.redhat:7ae845b4ba6b

iface.iscsi_ifacename = cxgb3i.00:14:5e:99:04:66
iface.hwaddress = 00:14:5e:99:04:66
iface.ipaddress = 11.11.11.35
iface.net_ifacename = <empty>
iface.transport_name = cxgb3i
iface.initiator_name = iqn.1994-05.com.redhat:7ae845b4ba6b

Array 1
Model - 49xx
CFW - 07.60.35.00
Protocol - iscsi
Speed - 1Gb

Array 2
Model - 49xx
CFW - 07.70.10.00
Protocol - iscsi
Speed - 1Gb
Chelsio has confirmed this is fixed upstream. They were planning to bring the fix into 5.6 through this bugzilla, https://bugzilla.redhat.com/show_bug.cgi?id=567444, which requests that we update their driver.
(In reply to comment #1) > Chelsio has confirmed this is fixed upstream. They were planning on bringing in > the fix into 5.6 in this bugzilla > https://bugzilla.redhat.com/show_bug.cgi?id=567444 > requesting that we update their driver. Will the fix be backported to RHEL 5.5 maintenance?
*** This bug has been marked as a duplicate of bug 567444 ***
LSI - please add yourselves to the dupe'd bug 567444. Lobbying for z-stream should be done there.