241698 – iSCSI protocol error with No-Op outs


Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	iscsi-initiator-utils
Sub Component:
Version:	5.0
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	Mike Christie
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-05-29 17:00 UTC by John Yeazell
Modified:	2008-04-07 05:13 UTC
CC List:	3 users
Fixed In Version:	5.1
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-11-14 15:00:16 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description John Yeazell 2007-05-29 17:00:09 UTC

Description of problem:
The iSCSI initiator issues a NOP-Out with the final bit set and the CmdSN
incremented, when it should not be. Also, the ITT is not set to 0xFFFFFFFF.
This causes our target (Overland REO, tape device) to kill the session, as
permitted at error recovery level 0.
I am not sure which way the NOP-Out is being used. If it is sending the target
the latest value of ExpStatSN, then the immediate bit should be set, the CmdSN
should not be incremented, and the ITT should be set to 0xFFFFFFFF. If it is
being used to "ping" the target, then the immediate bit should not be set, the
CmdSN should increment, and the ITT should be set accordingly.
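For anyone checking a trace against the two cases above, here is a minimal sketch that decodes the NOP-Out fields under discussion. It assumes the 48-byte Basic Header Segment layout from RFC 3720 as I read it; the function name describe_nop_out and the returned keys are made up for illustration and are not part of any initiator or target code.

```python
import struct

# Assumed NOP-Out Basic Header Segment layout (RFC 3720), big-endian, 48 bytes:
# byte 0 (I bit + opcode), byte 1 (F bit), 2 reserved, TotalAHSLength,
# DataSegmentLength (3 bytes), LUN (8 bytes), ITT, TTT, CmdSN, ExpStatSN, 16 reserved.
_BHS = struct.Struct(">BB2xB3s8sIIII16x")
RESERVED_TAG = 0xFFFFFFFF


def describe_nop_out(bhs: bytes) -> dict:
    """Decode the NOP-Out header fields this report talks about (hypothetical helper)."""
    if len(bhs) != _BHS.size:
        raise ValueError("a BHS is exactly 48 bytes")
    byte0, byte1, _ahs, _dlen, _lun, itt, ttt, cmdsn, exp_statsn = _BHS.unpack(bhs)
    if byte0 & 0x3F != 0x00:
        raise ValueError("not a NOP-Out (opcode 0x00)")
    immediate = bool(byte0 & 0x40)
    return {
        "immediate": immediate,
        "final": bool(byte1 & 0x80),
        "itt": itt,
        "ttt": ttt,
        "cmdsn": cmdsn,
        "exp_statsn": exp_statsn,
        # A valid ITT asks the target for a NOP-In reply, i.e. a ping.
        "is_ping": itt != RESERVED_TAG,
        # With the reserved ITT, the I bit is required to be set.
        "well_formed": itt != RESERVED_TAG or immediate,
    }
```

Feeding it the 48 header bytes of each NOP-Out from an analyzer capture would show whether the initiator is pinging (valid ITT) or only reporting ExpStatSN (ITT of 0xFFFFFFFF with the immediate bit set).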
Version-Release number of selected component (if applicable):


How reproducible:
Very reproducible with a medium load and 8 target devices.

Steps to Reproduce:
1. With 8 tape devices created on our REO product and all logged in via iSCSI,
launch dt with the following parameters:
  of=/dev/stX pattern=incr limit=inf passes=inf enable=Debug dtype=tape bs=64k
  X indicates the tape drive number, in this case 0 through 7.
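A rough sketch of how the eight dt streams could be started together. The dt arguments are the ones listed in step 1; the helper names dt_commands and launch_all are invented for illustration, and the sketch assumes dt is on the PATH and that the tape devices appear as /dev/st0 through /dev/st7.

```python
import subprocess


def dt_commands(drive_count=8):
    """Build one dt argument list per tape drive (hypothetical helper)."""
    return [
        ["dt", f"of=/dev/st{n}", "pattern=incr", "limit=inf", "passes=inf",
         "enable=Debug", "dtype=tape", "bs=64k"]
        for n in range(drive_count)
    ]


def launch_all(drive_count=8):
    """Start every dt stream in parallel and return the process handles."""
    return [subprocess.Popen(cmd) for cmd in dt_commands(drive_count)]
```

Calling launch_all() and then polling the returned handles would show which streams exit early.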
  
Actual results:
After about 20 minutes dt will exit out on 1 or more streams.

Expected results:
I/O to continue until dt is manually stopped.

Additional info:
A Finisar trace of the no-op issue can be provided.

Comment 1 Mike Christie 2007-06-20 16:40:24 UTC

Sorry for the late reply on this one. The nop is being used as a ping. Are you
getting a valid ITT and an incremented CmdSN, so that the only issue is the
immediate bit? I am not sure I see in the spec where the immediate bit cannot
be set for a nop used as a ping. What section is that?

There was a bug in RHEL5 where you could get a scsi command related pdu and a
nop with bad CmdSn numbers. What would happen is that a scsi command could get
CmdSn 5 and a nop would get CmdSn 4, but the initiator would send the scsi
command pdu first. This was causing all types of problems. If that is the issue,
then the 2.6.18-27.el5 test kernel should fix the problem. You can download it
from http://people.redhat.com/dzickus/el5
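To illustrate the ordering problem described above, here is a toy model of a target that strictly expects non-immediate PDUs on a single connection to arrive in CmdSN order. It is not the kernel or target code, and the function name first_cmdsn_gap is hypothetical; real targets may handle a gap differently.

```python
MOD = 1 << 32  # CmdSN is a 32-bit counter that wraps around


def first_cmdsn_gap(sent_cmdsns, exp_cmdsn):
    """Return the index of the first PDU whose CmdSN is not the one the
    target expects next, or None if the sequence is in order."""
    for index, cmdsn in enumerate(sent_cmdsns):
        if cmdsn != exp_cmdsn:
            return index
        exp_cmdsn = (exp_cmdsn + 1) % MOD
    return None
```

In the example from this comment, the SCSI command (CmdSN 5) goes on the wire before the nop (CmdSN 4), so a call such as first_cmdsn_gap([3, 5, 4], 3) flags the second PDU.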

Comment 2 John Yeazell 2007-06-21 15:34:37 UTC

I don't see a reference to the immediate bit in the spec. Some of the books I
have indicate that this is the case. It could be an interpretation issue on the
part of the books' authors.

I'll try out the new kernel and post the results.

Comment 3 John Yeazell 2007-06-22 21:36:44 UTC

The test kernel solves the issue.
When would this be available to our customers?
Can we provide a link to this kernel?

Comment 4 Mike Christie 2007-06-22 21:43:24 UTC

It will be available in RHEL 5.1 (I do not think the release date is out). The
kernel in that link is an unstable one, so I would not point customers to it. It
is not supported by Red Hat.

You can work around the problem by turning off the nop ping. Set

node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0

in the /etc/iscsi/iscsid.conf file, and then rediscover the iSCSI devices
(rerun iscsiadm -m discovery -p ip:port).
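A small sketch for confirming that the workaround is actually in place. It only parses "key = value" lines from the configuration text and checks the two settings above; the function name noop_ping_disabled is made up, and treating "#" as the comment character is an assumption about the file format.

```python
NOOP_KEYS = (
    "node.conn[0].timeo.noop_out_interval",
    "node.conn[0].timeo.noop_out_timeout",
)


def noop_ping_disabled(conf_text):
    """Return True only if both noop settings are present and set to 0."""
    values = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments (assumed '#')
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in NOOP_KEYS:
            values[key] = value  # a later line overrides an earlier one
    return all(values.get(key) == "0" for key in NOOP_KEYS)
```

Reading the configuration file into a string and passing it to noop_ping_disabled gives a quick yes/no before rerunning discovery.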
