Bug 197248 - iSCSI discovery times out returning 512 volumes; RHEL5 can't discover >71 volumes
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: iscsi-initiator-utils
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Mike Christie
Blocks: 176344
Reported: 2006-06-29 13:28 EDT by Cesar Garde
Modified: 2008-07-24 16:00 EDT
Fixed In Version: RHBA-2008-0743
Doc Type: Bug Fix
Last Closed: 2008-07-24 16:00:05 EDT

Attachments:
Wireshark trace of RHEL5 discovery (107.39 KB, application/octet-stream)
2007-06-06 11:59 EDT, Cesar Garde
Wireshark trace of Windows discovery (172.96 KB, application/octet-stream)
2007-06-06 12:08 EDT, Cesar Garde

Description Cesar Garde 2006-06-29 13:28:55 EDT
Description of problem:
The RHEL4 U3 software initiator times out when attempting discovery on an
EqualLogic array that has 512 volumes.

Version-Release number of selected component (if applicable):


How reproducible:
Reproduces every time, but requires EqualLogic v3 firmware.

Steps to Reproduce:
1. Create 512 volumes on the EQL array (unrestricted access)
2. Start the software initiator
  
Actual results:
Not all volumes are connected.

Expected results:
All volumes should be connected.

Additional info:
In the initiator, the MaxRecvDataSegmentLength for discovery is only 8K.  For
512 volumes, it takes 8 text request/response exchanges to return all the
information.  These exchanges take longer than the 15-second timeout that the EQL
array allows for the discovery process to complete.  This is a request to raise
the value of MaxRecvDataSegmentLength to 64K.
Comment 2 RHEL Product and Program Management 2006-08-18 11:18:50 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 5 RHEL Product and Program Management 2007-03-09 19:59:49 EST
This bugzilla had previously been approved for engineering
consideration but Red Hat Product Management is currently reevaluating
this issue for inclusion in RHEL4.6.
Comment 6 RHEL Product and Program Management 2007-05-09 06:06:57 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 7 Cesar Garde 2007-06-06 11:59:53 EDT
Created attachment 156365
Wireshark trace of RHEL5 discovery
Comment 8 Cesar Garde 2007-06-06 12:05:58 EDT
This problem also appears in RHEL5, where it limits discovery to at most 71 volumes.

The user sees the following failure: 

iscsiadm --mode discovery --type sendtargets --portal 10.127.16.150
iscsiadm: can not parse discovery info key '...' Bug?

The network trace shows that, as in RHEL 4.4, open-iscsi in RHEL 5.0 still
uses 8K for MaxRecvDataSegmentLength, so we can only fit 71 IQN strings in a
TextResponse. The open-iscsi initiator requests the rest and we respond with
another 71 IQN strings. The second response does not have the Final bit set,
since there are more targets to send. The open-iscsi initiator fails on the
second response. Looking at the open-iscsi initiator code, it appears to have
a problem because the second TextResponse does not start with the key
"TargetName=". The first TextResponse ends in the middle of an IQN string, so
the second TextResponse begins with the remainder of that IQN string and not
with "TargetName=".

I also took a network trace on Windows Server 2003 R2. Here I had 585 volumes.
Since Windows uses a MaxRecvDataSegmentLength of 64K, we are able to return 561
IQN strings. The remaining 24 IQN strings are returned in a follow-on
TextResponse. The second TextResponse does have the Final bit set.

I will upload both cap files (open-iscsi and Windows).

Aside from the fix to support a higher value for MaxRecvDataSegmentLength, the
TextResponse parsing code should be checked to fix the smaller
MaxRecvDataSegmentLength problem that was seen.
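
As a rough check on the figures above, the number of Text Request/Response
exchanges needed for discovery can be estimated from how many IQN entries fit
in one response. This is only an illustrative sketch: the function name is
hypothetical, and the per-response counts (71 at 8K, 561 at 64K) are simply
the values observed in the two traces, not something derived from the
protocol.

```python
def exchanges_needed(volumes: int, entries_per_response: int) -> int:
    """Estimate how many Text Request/Response exchanges discovery takes.

    Hypothetical helper: assumes every response carries the same number of
    complete entries, which is a simplification of the real traces.
    """
    if volumes < 0 or entries_per_response <= 0:
        raise ValueError("volumes must be >= 0 and entries_per_response > 0")
    # Integer ceiling division, avoids floating point.
    return (volumes + entries_per_response - 1) // entries_per_response


# 8K MaxRecvDataSegmentLength: 71 entries per response observed on RHEL5.
rhel_exchanges = exchanges_needed(512, 71)

# 64K MaxRecvDataSegmentLength: 561 entries per response observed on Windows.
windows_exchanges = exchanges_needed(585, 561)

print(rhel_exchanges, windows_exchanges)
```

With 71 entries per response, 512 volumes work out to 8 exchanges, which
matches the count given in the original description; the 585-volume Windows
case needs 2.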
Comment 9 Cesar Garde 2007-06-06 12:08:46 EDT
Created attachment 156366
Wireshark trace of Windows discovery
Comment 10 Mike Christie 2007-06-06 20:33:27 EDT
Yeah Cesar, it looks like the parsing code is broken (RHEL4 and RHEL5 share a
lot of discovery code so it is broken in both).

For RHEL5 though, I made the MaxRecvDataSegmentLength configurable since you had
asked about that a while back. Is there any way you can test that out real quick?

Just have you guys grab the svn tree and set

discovery.sendtargets.iscsi.MaxRecvDataSegmentLength =

in iscsid.conf to whatever you guys need and let er rip.

For RHEL4 we are still working on a fix and for RHEL5 I will try to fix up the
parsing code too, but I doubt I will be able to do so soon.
Comment 11 Mike Christie 2007-06-06 20:35:45 EDT
Oh yeah, when I say svn code, I mean the open-iscsi.org svn code. Don knows what
that is and should be able to help your guys use it with no trouble. If you guys
have trouble let me know.
Comment 12 Mike Christie 2007-08-23 12:53:00 EDT
Move to 4.7. It turns out that there is a bug in linux-iscsi which does not
parse targets/portals straddling pdus and that is the problem. This will take
more time to fix.
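
For illustration, one way to handle key=value pairs that straddle PDUs is to
carry the incomplete tail of each data segment over to the next one before
splitting. The sketch below is not the linux-iscsi or open-iscsi code; the
function name is hypothetical, and it assumes each data segment is a run of
NUL-terminated key=value strings, as in iSCSI text responses.

```python
from typing import Iterable, Iterator, Tuple


def parse_text_responses(segments: Iterable[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (key, value) pairs from consecutive Text Response data segments.

    Hypothetical helper: a pair cut off at the end of one segment is kept in
    `pending` and completed with the start of the next segment.
    """
    pending = b""
    for data in segments:
        pending += data
        # Everything before the last NUL is complete; the remainder (possibly
        # empty) is an unfinished pair that waits for the next segment.
        *complete, pending = pending.split(b"\x00")
        for item in complete:
            if not item:
                continue  # skip empty strings produced by extra NUL bytes
            key, sep, value = item.partition(b"=")
            if not sep:
                raise ValueError(f"not a key=value pair: {item!r}")
            yield key.decode("ascii"), value.decode("ascii")
    if pending:
        raise ValueError(f"truncated final pair: {pending!r}")


# Example: the second TargetName is split across two segments.
segments = [
    b"TargetName=iqn.2001-05.com.example:vol1\x00"
    b"TargetAddress=10.0.0.1:3260,1\x00"
    b"TargetName=iqn.2001-05.com.exa",
    b"mple:vol2\x00"
    b"TargetAddress=10.0.0.1:3260,1\x00",
]
pairs = list(parse_text_responses(segments))
print(pairs)
```

The target names and addresses in the example are made up. The point is only
that the second segment does not need to start with "TargetName=" for the
pair to be recovered.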
Comment 13 Cesar Garde 2007-08-23 13:15:16 EDT
Will this fix go into a v5.x release?  Thanks.
Comment 14 Mike Christie 2008-04-11 02:07:59 EDT
Hey Cesar, could you guys retry this test with the initiator in RHEL 4.6?

If it does not work could you run iscsid by hand and do:

iscsid -d 8

Send all the output.
Comment 16 Barry Donahue 2008-07-01 16:53:05 EDT
I created a series of scripts to create 512 volumes on our EqualLogic array. I
ran on the RHEL4-U7-re20080625.0 build. The iscsi build was
iscsi-initiator-utils-4.0.3.0-7. The system was able to log in to all the
targets and then successfully shut down all the targets.
Comment 18 errata-xmlrpc 2008-07-24 16:00:05 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0743.html
