Red Hat Bugzilla – Bug 241698
iSCSI protocol error with No-Op outs
Last modified: 2008-04-07 01:13:49 EDT
Description of problem:
The iSCSI initiator issues a No-Op Out with the final bit set and the CmdSN is
incremented when it should not be. Also, the ITT is not set to 0xFFFFFFFF.
This causes our target (Overland REO, tape device) to kill the session, as
supported by error recovery level 0.
I am not sure if the No-Op Out is being used to send the target the latest
value of ExpStatSN, in which case the immediate bit should be set, the CmdSN
should not be incremented, and the ITT should be set to 0xFFFFFFFF, or if it is
being used to "ping" the target, in which case the immediate bit should not be set,
the CmdSN should increment, and the ITT should be set accordingly.
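The two cases above can be sketched as a small check on the 48-byte header of a No-Op Out. This is only a sketch based on my reading of the NOP-Out layout in RFC 3720 (opcode 0x00, immediate bit 0x40 in byte 0, ITT at bytes 16-19, CmdSN at bytes 24-27); the function names are hypothetical and are not taken from any initiator or target code.

```python
import struct

NOP_OUT_OPCODE = 0x00
IMMEDIATE_BIT = 0x40
RESERVED_TAG = 0xFFFFFFFF


def build_nop_out(itt, cmdsn, exp_statsn, immediate, ttt=RESERVED_TAG):
    """Pack a 48-byte No-Op Out header with no data segment (illustrative)."""
    byte0 = NOP_OUT_OPCODE | (IMMEDIATE_BIT if immediate else 0)
    return struct.pack(
        ">BB2xB3s8sIIII16x",
        byte0,
        0x80,             # final bit
        0,                # TotalAHSLength
        b"\x00\x00\x00",  # DataSegmentLength (no ping data)
        b"\x00" * 8,      # LUN
        itt, ttt, cmdsn, exp_statsn,
    )


def classify_nop_out(header):
    """Return (kind, problems) for a No-Op Out header."""
    byte0 = header[0]
    if byte0 & 0x3F != NOP_OUT_OPCODE:
        raise ValueError("not a No-Op Out")
    immediate = bool(byte0 & IMMEDIATE_BIT)
    (itt,) = struct.unpack_from(">I", header, 16)
    problems = []
    if itt == RESERVED_TAG:
        kind = "no reply requested"  # e.g. passing along ExpStatSN
        if not immediate:
            problems.append("ITT is 0xFFFFFFFF but the immediate bit is clear")
    else:
        kind = "ping"  # a No-Op In is expected in reply
    return kind, problems


def next_cmdsn(cmdsn, immediate):
    """Immediate PDUs carry the current CmdSN without advancing it."""
    return cmdsn if immediate else (cmdsn + 1) & 0xFFFFFFFF
```

For example, `classify_nop_out(build_nop_out(0xFFFFFFFF, 10, 3, immediate=False))` reports a problem, while the same call with a valid ITT is classified as a ping with no problems.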
Version-Release number of selected component (if applicable):
Very reproducible with a medium load, 8 target devices
Steps to Reproduce:
1. With 8 tape devices created on our REO product and all logged in via iSCSI,
launch dt with the following parameters:
of=/dev/stX pattern=incr limit=inf passes=inf enable=Debug dtype=tape bs=64k
X indicates the tape drive number, in this case 0 through 7.
After about 20 minutes dt will exit out on 1 or more streams.
I/O to continue until dt is manually stopped.
Finisar trace of the no-op issue can be provided.
Sorry for the late reply on this one. The nop is being used as a ping. Are you
getting a valid ITT and an incremented CmdSN, so that the issue is only with the
immediate bit? I am not sure I see in the spec where the immediate bit cannot be
set for a nop used as a ping. What section is that?
There was a bug in RHEL5 where you could get a scsi command related pdu and a
nop with bad CmdSn numbers. What would happen is that a scsi command could get
CmdSn 5 and a nop would get CmdSn 4, but the initiator would send the scsi
command pdu first. This was causing all types of problems. If that is the issue,
then kernel 2.6.18-27.el5 should fix the problem. You can download this test
kernel from http://people.redhat.com/dzickus/el5
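The ordering problem described above can be illustrated with a small simulation. This is only a sketch of how a target that expects non-immediate commands in CmdSN order on a single connection might notice the swap; `find_out_of_order` and the starting CmdSN are hypothetical, and CmdSN wraparound is ignored for brevity.

```python
def find_out_of_order(exp_cmdsn, arrivals):
    """Return the (name, cmdsn) pairs that arrive ahead of the expected CmdSN.

    arrivals: list of (name, cmdsn) in wire order, non-immediate PDUs only.
    """
    pending = set()
    early = []
    for name, cmdsn in arrivals:
        if cmdsn != exp_cmdsn:
            early.append((name, cmdsn))
        pending.add(cmdsn)
        # Advance past every CmdSN that has now been received in sequence.
        while exp_cmdsn in pending:
            pending.remove(exp_cmdsn)
            exp_cmdsn += 1
    return early


# The case from the comment: the scsi command (CmdSN 5) is sent before the
# nop (CmdSN 4), so the scsi command shows up ahead of the expected value.
swapped = find_out_of_order(4, [("scsi", 5), ("nop", 4)])
in_order = find_out_of_order(4, [("nop", 4), ("scsi", 5)])
```

Here `swapped` contains the scsi command and `in_order` is empty. Whether a given target tolerates the early PDU or drops the session is up to its implementation.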
I don't see a reference to the immediate bit in the spec. Some of the books I have
indicate that this is the case. It could be an interpretation issue on the part
of the books' authors.
I'll try out the new kernel and post the results.
The test kernel solves the issue.
When would this be available to our customers?
Can we provide a link to this kernel?
It will be available in RHEL 5.1 (I do not think the release date is out). The
kernel in that link is an unstable one, so I would not point customers to it. It
is not supported by Red Hat.
You can work around the problem by turning off the nop ping. To do that, set
node.conn.timeo.noop_out_interval = 0
node.conn.timeo.noop_out_timeout = 0
in the /etc/iscsi/iscsid.conf file, and then rediscover the iSCSI devices
(rerun iscsiadm -m discovery -p ip:port).
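If the change has to be made on several hosts, the edit can be scripted. The following is a minimal sketch that works on the text of the config file only; `disable_noop_ping` is a hypothetical helper, it assumes plain `key = value` lines, and it leaves commented-out lines alone. Reading and writing the file, and the rediscovery step, are left to the caller.

```python
NOOP_KEYS = (
    "node.conn.timeo.noop_out_interval",
    "node.conn.timeo.noop_out_timeout",
)


def disable_noop_ping(conf_text):
    """Return conf_text with both noop_out settings forced to 0."""
    seen = set()
    out = []
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in NOOP_KEYS:
            # Replace every active assignment of the key with 0.
            out.append(f"{key} = 0")
            seen.add(key)
        else:
            out.append(line)
    # Append any setting that was missing or only present as a comment.
    out.extend(f"{key} = 0" for key in NOOP_KEYS if key not in seen)
    return "\n".join(out) + "\n"
```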