Description of problem:
The iSCSI initiator issues a NOP-Out with the final bit set and the CmdSN is incremented when it should not be. Also, the ITT is not set to 0xFFFFFFFF. This causes our target (Overland REO, tape device) to kill the session, as supported by error recovery level 0.

I am not sure whether the NOP-Out is being used to send the target the latest value of ExpStatSN, in which case the immediate bit should be set, the CmdSN should not be incremented, and the ITT should be set to 0xFFFFFFFF, or whether it is being used to "ping" the target, in which case the immediate bit should not be set, the CmdSN should increment, and the ITT should be set accordingly.

Version-Release number of selected component (if applicable):

How reproducible:
Very reproducible with a medium load, 8 target devices.

Steps to Reproduce:
1. With 8 tape devices created on our REO product and all logged in via iSCSI, launch dt with the following parameters:
   of=/dev/stX pattern=incr limit=inf passes=inf enable=Debug dtype=tape bs=64k
   X indicates the tape drive number, in this case 0 through 7.
2.
3.

Actual results:
After about 20 minutes dt will exit on 1 or more streams.

Expected results:
I/O continues until dt is manually stopped.

Additional info:
A Finisar trace of the NOP-Out issue can be provided.
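The two NOP-Out uses described above can be told apart from the basic header segment alone. The following is a minimal sketch, not code from the initiator or the target: it assumes the 48-byte header layout from RFC 3720 (I bit at 0x40 of byte 0, ITT at offset 16, CmdSN at offset 24, ExpStatSN at offset 28), and the names `NopOut`, `parse_nop_out` and `classify` are made up for illustration.

```python
import struct
from dataclasses import dataclass

RESERVED_ITT = 0xFFFFFFFF   # ITT value meaning "no NOP-In response requested"
OPCODE_NOP_OUT = 0x00


@dataclass
class NopOut:
    immediate: bool
    itt: int
    ttt: int
    cmdsn: int
    exp_statsn: int


def parse_nop_out(bhs: bytes) -> NopOut:
    """Decode the fields of interest from a 48-byte NOP-Out header."""
    if len(bhs) < 48:
        raise ValueError("basic header segment must be 48 bytes")
    if bhs[0] & 0x3F != OPCODE_NOP_OUT:
        raise ValueError("not a NOP-Out PDU")
    # Four big-endian 32-bit words starting at offset 16:
    # ITT, TTT, CmdSN, ExpStatSN.
    itt, ttt, cmdsn, exp_statsn = struct.unpack_from(">IIII", bhs, 16)
    return NopOut(bool(bhs[0] & 0x40), itt, ttt, cmdsn, exp_statsn)


def classify(pdu: NopOut) -> str:
    """Label the NOP-Out according to its Initiator Task Tag."""
    if pdu.itt == RESERVED_ITT:
        # No response requested: the I bit has to be set and
        # CmdSN is not advanced after this PDU.
        if not pdu.immediate:
            return "invalid: reserved ITT without the I bit"
        return "status-update"
    # A valid ITT asks the target to answer with a NOP-In.
    return "ping"


# Hypothetical header: I bit set, reserved ITT, CmdSN 7, ExpStatSN 3.
example = bytearray(48)
example[0] = 0x40 | OPCODE_NOP_OUT
example[1] = 0x80
struct.pack_into(">IIII", example, 16, RESERVED_ITT, 0xFFFFFFFF, 7, 3)
print(classify(parse_nop_out(bytes(example))))
```

Running the same check over the headers in a captured trace would show which of the two cases the initiator is actually sending.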
Sorry for the late reply on this one. The nop is being used as a ping. Are you getting a valid ITT and an incremented CmdSN, so that the only issue is the immediate bit? I am not sure I see where the spec says the immediate bit cannot be set for a nop used as a ping. What section is that?

There was a bug in RHEL5 where a scsi command pdu and a nop could go out with their CmdSN numbers in the wrong order. A scsi command could get CmdSn 5 and a nop would get CmdSn 4, but the initiator would send the scsi command pdu first. This was causing all types of problems. If that is the issue, then kernel 2.6.18-27.el5 should fix the problem. You can download this test kernel from http://people.redhat.com/dzickus/el5
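To check a trace for the reordering described above, one can walk the PDUs in wire order and flag any whose CmdSN precedes the CmdSN of the PDU sent just before it. This is a rough sketch under assumptions, not the kernel's logic: it expects the caller to supply a list of (label, CmdSN) pairs extracted from the trace, it compares 32-bit sequence numbers with wraparound, and the names `serial_lt` and `find_reordered` are hypothetical.

```python
MOD = 1 << 32    # CmdSN is a 32-bit counter that wraps around
HALF = 1 << 31


def serial_lt(a: int, b: int) -> bool:
    """True if a comes before b in 32-bit wraparound arithmetic."""
    return a != b and ((b - a) % MOD) < HALF


def find_reordered(sent):
    """Return the indices of PDUs whose CmdSN precedes the previous PDU's.

    `sent` is a list of (label, cmdsn) tuples in the order the PDUs
    appeared on the wire.
    """
    bad = []
    for i in range(1, len(sent)):
        if serial_lt(sent[i][1], sent[i - 1][1]):
            bad.append(i)
    return bad


# The case from the comment: the scsi command (CmdSN 5) is sent
# before the nop (CmdSN 4).
print(find_reordered([("scsi", 5), ("nop", 4)]))
```

An empty result means each PDU's CmdSN was equal to or later than its predecessor's.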
I don't see a reference to the immediate bit in the spec. Some of the books I have indicate that this is the case. It could be an interpretation issue on the part of the books' authors. I'll try out the new kernel and post the results.
The test kernel solves the issue. When would this be available to our customers? Can we provide a link to this kernel?
It will be available in RHEL 5.1 (I do not think the release date is out). The kernel in that link is an unstable one, so I would not point customers to it. It is not supported by Red Hat.

You can work around the problem by turning the nop as a ping off. Set

    node.conn[0].timeo.noop_out_interval = 0
    node.conn[0].timeo.noop_out_timeout = 0

in the /etc/iscsi/iscsi.conf file and then rediscover the iscsi devices (rerun iscsiadm -m discovery -p ip:port).
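If the workaround has to be applied on several hosts, the edit can be scripted. The sketch below is only an illustration: it assumes the config file consists of `key = value` lines with `#` comments, it works on the file contents as a string and leaves reading and writing the file to the caller, and `set_options` is a hypothetical helper, not part of the iscsi tools.

```python
import re

# Matches an active (uncommented) "key = value" line and captures the key.
KEY_RE = re.compile(r"^\s*([^#\s=][^\s=]*)\s*=")

NOOP_OFF = {
    "node.conn[0].timeo.noop_out_interval": "0",
    "node.conn[0].timeo.noop_out_timeout": "0",
}


def set_options(conf_text: str, options: dict) -> str:
    """Return conf_text with each option set to the requested value.

    Existing active lines for a key are rewritten in place; keys that
    do not appear are appended at the end. Comments are left alone.
    """
    seen = set()
    out = []
    for line in conf_text.splitlines():
        match = KEY_RE.match(line)
        if match and match.group(1) in options:
            key = match.group(1)
            out.append(f"{key} = {options[key]}")
            seen.add(key)
        else:
            out.append(line)
    for key, value in options.items():
        if key not in seen:
            out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"


# Hypothetical file contents with only the interval present.
sample = "# nop settings\nnode.conn[0].timeo.noop_out_interval = 10\n"
print(set_options(sample, NOOP_OFF), end="")
```

The devices still need to be rediscovered afterwards, as described above, for the new values to take effect.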