Red Hat Bugzilla – Bug 197248
iSCSI discovery times out returning 512 volumes; RHEL5 can't discover >71 volumes
Last modified: 2008-07-24 16:00:05 EDT
Description of problem:
The RHEL4 U3 software initiator times out attempting to perform discovery on an
EqualLogic array that has 512 volumes.
Version-Release number of selected component (if applicable):
How reproducible:
Reproduces every time, but requires EqualLogic v3 firmware.
Steps to Reproduce:
1. Create 512 volumes on the EQL array (unrestricted access)
2. Start the software initiator
Actual results:
Not all volumes are connected.
Expected results:
All volumes should be connected.
In the initiator, the MaxRecvDataSegmentLength for discovery is only 8k. For
512 volumes, it takes 8 text request/response exchanges to return all the
information. The exchanges take longer than the 15 second timeout that the EQL
array allows for the discovery process to complete. This is a request to raise
the value of MaxRecvDataSegmentLength to 64k.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
This bugzilla had previously been approved for engineering
consideration but Red Hat Product Management is currently reevaluating
this issue for inclusion in RHEL4.6.
Created attachment 156365 [details]
Wireshark trace of RHEL5 discovery
This problem also appears in RHEL5, where it limits discovery to at most 71 volumes.
The user sees the following failure:
iscsiadm --mode discovery --type sendtargets --portal 10.127.16.150
iscsiadm: can not parse discovery info key '...' Bug?
The network trace shows that, like RHEL 4.4, open-iscsi in RHEL 5.0
continues to use 8K for MaxRecvDataSegmentLength, and hence we are only able to
fit 71 IQN strings in a TextResponse. The open-iscsi initiator requests the rest
and we respond with another 71 IQN strings. The second response does not have
the Final bit set since there are more targets to send. The open-iscsi initiator
barfs at the second response. Looking at the open-iscsi initiator code, it
appears to have trouble because the second TextResponse does not start with the
key "TargetName=". The first TextResponse ends in the middle of an IQN string,
and hence the second TextResponse begins with the remainder of that IQN string
and not "TargetName=".
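To illustrate the failure mode described above, here is a minimal sketch of a parser that tolerates a key=value pair straddling two TextResponses. It is not the open-iscsi code; the function name and the example IQNs and addresses are hypothetical. It assumes the data segments are available as byte strings and that each key=value pair is terminated by a NUL byte, as in iSCSI text negotiation.

```python
def parse_text_responses(segments):
    """Yield (key, value) pairs from a sequence of TextResponse data segments.

    Hypothetical helper for illustration only. Bytes left over after the
    last NUL terminator in one segment are carried into the next segment,
    so a pair split across two responses is reassembled before parsing.
    """
    pending = b""
    for segment in segments:
        pending += segment
        # Everything before the last NUL is complete; the tail may be partial.
        *complete, pending = pending.split(b"\x00")
        for item in complete:
            if not item:
                continue  # skip empty items such as padding NULs
            key, sep, value = item.partition(b"=")
            if not sep:
                raise ValueError("not a key=value pair: %r" % item)
            yield key.decode("utf-8"), value.decode("utf-8")
    if pending:
        raise ValueError("data ended in the middle of a pair: %r" % pending)


# Example: the first segment ends in the middle of an IQN string.
example_segments = [
    b"TargetName=iqn.2001-05.com.example:vol1\x00"
    b"TargetAddress=10.0.0.1:3260,1\x00"
    b"TargetName=iqn.2001-05.com.exa",
    b"mple:vol2\x00TargetAddress=10.0.0.1:3260,1\x00",
]
for key, value in parse_text_responses(example_segments):
    print(key, value)
```

A parser that instead expects every response to begin with "TargetName=" would reject the second segment in this example, which matches the symptom in the trace.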
Also took a network trace on Windows Server 2003 R2. Here I had 585 volumes.
Since Windows uses a MaxRecvDataSegmentLength of 64K, we are able to return 561
IQN strings. The remaining 24 IQN strings are returned in a follow-on
TextResponse. The second TextResponse does have the Final bit set.
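As a rough check on the numbers in this thread, the sketch below estimates how many text request/response exchanges a discovery needs. The per-response counts (71 IQN strings at 8K, 561 at 64K) are the figures observed in the traces above; treating them as fixed capacities is an assumption, since the real count depends on the length of each IQN string. The function name is hypothetical.

```python
import math


def exchanges_needed(volumes, targets_per_response):
    """Estimate the text request/response exchanges needed for discovery."""
    return math.ceil(volumes / targets_per_response)


# Per-response capacities observed in the traces (assumed constant here).
PER_RESPONSE_8K = 71
PER_RESPONSE_64K = 561

for volumes in (512, 585):
    print(
        volumes,
        exchanges_needed(volumes, PER_RESPONSE_8K),
        exchanges_needed(volumes, PER_RESPONSE_64K),
    )
```

For 512 volumes this gives 8 exchanges at 8K, which agrees with the original description, and 1 exchange at 64K. For 585 volumes it gives 2 exchanges at 64K, which agrees with the Windows trace.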
I will upload both cap files (open-iscsi and Windows).
Aside from the fix to support a higher value for MaxRecvDataSegmentLength, the
TextResponse parsing code should be fixed so that discovery also works with the
smaller MaxRecvDataSegmentLength.
Created attachment 156366 [details]
Wireshark trace of Windows discovery
Yeah Cesar, it looks like the parsing code is broken (RHEL4 and RHEL5 share a
lot of discovery code, so it is broken in both).
For RHEL5 though, I made the MaxRecvDataSegmentLength configurable since you had
asked about that a while back. Is there any way you can test that out real quick?
Just have you guys grab the svn tree and set
in iscsid.conf to whatever you guys need and let er rip.
For RHEL4 we are still working on a fix, and for RHEL5 I will try to fix up the
parsing code too, but I doubt I will be able to do so soon.
Oh yeah, when I say svn code, I mean the open-iscsi.org svn code. Don knows what
that is and should be able to help your guys use it with no trouble. If you guys
have trouble let me know.
Moving to 4.7. It turns out that there is a bug in linux-iscsi: it does not
parse targets/portals straddling PDUs, and that is the problem. This will take
more time to fix.
Will this fix go into a v5.x release? Thanks.
Hey Cesar, could you guys retry this test with the initiator in RHEL 4.6?
If it does not work could you run iscsid by hand and do:
iscsid -d 8
Send all the output.
I created a series of scripts to create 512 volumes on our EqualLogic array. I
ran on the RHEL4-U7-re20080625.0 build. The iscsi build was
iscsi-initiator-utils-18.104.22.168-7. The system was able to log in to all the
targets and then successfully shut down all the targets.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.