Description of problem:
An iSER target may send iSCSI NOP-In PDUs as soon as it marks the iSER iSCSI session as fully operational. This means there is a window in time during which the initiator side has no posted receive buffers, so the iSER RC connection can break due to RNR NAK / retry errors. To solve that, rely on the flag bits in the login request having FFP (0x3) in the lower nibble as a marker for the final login request, and post an initial chunk of receive buffers before sending that login request instead of after getting the login response.

A patch to solve that was submitted upstream, see http://marc.info/?l=linux-rdma&m=133096464331279&w=2

We actually hit this bug in practice. Here are the prints from the target side that show it:

tgtd0: iser_cm_conn_established(1540) conn:0x8943e0 cm_id:0x8a1c90, 192.168.20.11 -> 192.168.20.17, established
tgtd0: handle_wc_error(2960) conn:0x8943e0 task:0x1790670 tag:0xffffffff wr_id:0x0x1790748 op:send err:RNR retry counter exceeded vendor_err:0x87
tgtd0: iser_conn_close(1219) conn:0x8943e0 cm_id:0x0x8a1c90 state: CLOSE, refcnt:395
tgtd0: iser_cm_disconnected(1560) conn:0x8943e0 cm_id:0x8a1c90 event:10, RDMA_CM_EVENT_DISCONNECTED

It reproduces reliably under a login/logout loop over a bunch (say five or more) of iSER targets exported by one tgt instance.
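To illustrate the ordering change, here is a minimal C model of the idea, not the actual ib_iser code. It assumes the check masks the two low bits of the login request flags byte (the NSG field in RFC 3720 terms) and compares them with FFP (0x3). All names (struct sketch_conn, post_recv_bufs, send_login, deliver_unsolicited_pdu, LOGIN_NSG_MASK, LOGIN_STAGE_FFP) are hypothetical, and the model only tracks the receive buffers meant for Full Feature Phase traffic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names: the low two bits of the login request flags byte
 * carry the next stage (NSG); 0x3 means Full Feature Phase. */
#define LOGIN_NSG_MASK  0x03u
#define LOGIN_STAGE_FFP 0x03u

/* Hypothetical initiator-side connection state, not the ib_iser struct. */
struct sketch_conn {
    unsigned rx_posted;  /* receive buffers currently posted */
    unsigned rx_initial; /* size of the initial chunk for Full Feature Phase */
};

/* True if this login request asks to move to Full Feature Phase,
 * i.e. it is the final login request of the exchange. */
static bool is_final_login(uint8_t flags)
{
    return (flags & LOGIN_NSG_MASK) == LOGIN_STAGE_FFP;
}

/* Stand-in for posting n receive work requests on the QP. */
static void post_recv_bufs(struct sketch_conn *c, unsigned n)
{
    c->rx_posted += n;
}

/* Send path for a login PDU. For the final login request, post the
 * initial receive chunk first, so buffers already exist by the time the
 * target can answer and start sending NOP-Ins. Returns the number of
 * buffers posted by this call. */
static unsigned send_login(struct sketch_conn *c, uint8_t flags)
{
    if (!is_final_login(flags))
        return 0; /* intermediate login step: nothing extra to post */

    post_recv_bufs(c, c->rx_initial);
    assert(c->rx_posted > 0); /* invariant the fix relies on (rx_initial > 0) */
    /* ...the login request itself would be posted to the send queue here... */
    return c->rx_initial;
}

/* Model of the target sending an unsolicited PDU (e.g. a NOP-In): it
 * consumes one posted receive buffer, or fails if none is posted, which
 * on a real RC QP shows up as an RNR NAK / retry error. */
static bool deliver_unsolicited_pdu(struct sketch_conn *c)
{
    if (c->rx_posted == 0)
        return false;
    c->rx_posted--;
    return true;
}
```

In this model an intermediate login (e.g. flags 0x81, NSG = 1) posts nothing, so an unsolicited PDU arriving at that point fails, while the final login (e.g. flags 0x87, NSG = 3) posts the initial chunk before the request is sent. Posting from the send path rather than from the login response handler is what closes the window: the target cannot send anything in Full Feature Phase before it has received the final login request.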
I've got this one Mike.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
(In reply to comment #2)
> I've got this one Mike.

If you do not have time, let me know and I can do it. I have been working with Or on this upstream, so it is tested and reviewed on my side. I will watch for your posting and ack it.
The patch was picked up by Roland, so it should be present in 3.4-rc1 and in -stable immediately following that: http://git.kernel.org/?p=linux/kernel/git/roland/infiniband.git;a=commitdiff;h=89e984e2c2cd14f77ccb26c47726ac7f13b70ae8
Sorry Mike, I knew you said you weren't planning on an iSCSI update this release so figured you weren't doing anything with this at the moment. I'll let you handle it.
Patch(es) available on kernel-2.6.32-252.el6
Tested login/logout more than 50 times and no error has been reported. tgtd was configured with 10 iSER targets.

uname -r
2.6.32-274.el6.x86_64
rpm -q iscsi-initiator-utils
iscsi-initiator-utils-6.2.0.872-41.el6.x86_64
rpm -q scsi-target-utils
scsi-target-utils-1.0.24-2.el6.x86_64

[root@rdma1 ~]# iscsiadm -m discovery -p 172.31.1.2 -t st -I iser
172.31.1.2:3260,1 iqn.2010-10.com.example:storage-1000
172.31.1.2:3260,1 iqn.2010-10.com.example:storage-2000
172.31.1.2:3260,1 iqn.2010-10.com.example:storage-3000
172.31.1.2:3260,1 iqn.2010-10.com.example:storage-4000
172.31.1.2:3260,1 iqn.2010-10.com.example:storage-5000
172.31.1.2:3260,1 iqn.2010-10.com.example:storage-6000
172.31.1.2:3260,1 iqn.2010-10.com.example:storage-7000
172.31.1.2:3260,1 iqn.2010-10.com.example:storage-8000
172.31.1.2:3260,1 iqn.2010-10.com.example:storage-9000
172.31.1.2:3260,1 iqn.2010-10.com.example:storage-A000
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-0862.html