Bug 170656
Summary: | iSCSI connection recovery uses session address instead of portal address
---|---
Product: | Red Hat Enterprise Linux 4
Reporter: | Cesar Garde <cgarde>
Component: | kernel
Assignee: | Mike Christie <mchristi>
Status: | CLOSED ERRATA
QA Contact: | Brock Organ <borgan>
Severity: | high
Priority: | medium
Version: | 4.0
CC: | coughlan, mchristi, poelstra, rkenna
Hardware: | i386
OS: | Linux
Fixed In Version: | RHSA-2006-0132
Doc Type: | Bug Fix
Last Closed: | 2006-03-07 20:24:24 UTC
Bug Blocks: | 168429
Description
Cesar Garde
2005-10-13 15:44:39 UTC
Changing Component to kernel. In the future, please select iscsi-initiator-utils for userspace problems and kernel for driver problems. If both a kernel and a userspace change are required, then file two bugs :(. iSCSI is a legacy Component from when everything was bundled in one rpm. Thanks.

Note that if we add some sort of heuristics to determine when to switch paths, userspace changes will be necessary.

When you have a fix identified, would it be available for us in case we have customers that run into this problem?

Created attachment 121044 [details]
retry portal addr after 5 retries
Cesar, could you verify this patch works for you? It is the simple fix we
discussed earlier. To fully support login redirect, we need some fixes in
userspace for iSCSI management, and that might not be done in U3 (U4 for sure).
For now though, I would at least like to get something that works for you guys.
Oh yeah, that patch should work against the current RHEL4 U2 kernel source.

Comment on attachment 121044 [details]
retry portal addr after 5 retries
Wrong patch. Do not test this one. I will upload a correct patch in a minute.
Created attachment 121045 [details]
retry portal address
Ok Cesar, please try this patch.
Thanks Mike. I pulled the patch in comment 12. We'll let you know how it works.

Created attachment 121133 [details]
retry portal address immediately
Cesar, sorry about this. Cisco has sent us a fix for this that they prefer. It
turns out it is also what your engineer had mentioned. Please verify this
patch works for you guys.
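The patches in this thread differ mainly in how soon the initiator abandons the session (redirected) address and goes back to the portal address. The title of attachment 121044 describes falling back after 5 retries, while attachment 121133 retries the portal address as soon as possible. Below is a minimal C sketch of such a policy, assuming a simple per-attempt counter. The struct, the function names, and the retries_before_portal parameter are hypothetical and are not the iscsi-sfnet code. worst_case_fallback_delay() assumes each failed attempt blocks for a full login_timeout, which is why a shorter login_timeout shortens the wait.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch, not the iscsi-sfnet code: choose which address a
 * reconnect attempt should use. Attempts 0 .. retries_before_portal-1 go to
 * the session (redirected) address; later attempts fall back to the portal
 * address the session was originally configured with.
 */
struct iscsi_addrs {
    const char *portal_addr;  /* address from configuration/discovery */
    const char *session_addr; /* address the target redirected us to, or NULL */
};

static const char *pick_reconnect_addr(const struct iscsi_addrs *a,
                                       int attempt, int retries_before_portal)
{
    assert(a != NULL && a->portal_addr != NULL);

    /* No redirect recorded, or the retry budget is used up: use the portal. */
    if (a->session_addr == NULL || attempt >= retries_before_portal)
        return a->portal_addr;
    return a->session_addr;
}

/*
 * Worst-case seconds before the portal address is first tried, assuming
 * every failed attempt on the session address waits a full login_timeout.
 */
static int worst_case_fallback_delay(int retries_before_portal, int login_timeout)
{
    return retries_before_portal * login_timeout;
}
```

For example, with a threshold of 5 and login_timeout=5, the sketch's worst case is 25 seconds before the portal address is tried. With a threshold of 1, the portal is tried as soon as one relogin to the session address has failed.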
Just to be clear, I should use the attachment in comment 14 instead of the one in comment 12, correct?

Yes.

Created attachment 121946 [details]
network trace of 11-16-05 patch retest
Sorry for the delayed response. We finally had a chance to retest the patch and found that the initiator does not attempt a login to the portal address. I included the trace in comment 19.

This trace only shows the first relogin attempt, right? It does not fall back to the portal address until the relogin times out. BTW, you should be setting login_timeout to a much shorter value to avoid delays.

Created attachment 121949 [details]
patch retest with login_timeout=5
A little more detail about the retest:
login_timeout was set to 5 in iscsi.conf, and iscsid was restarted.
Initiator (172.19.31.8) connected to 3 volumes on the EQL array. 2 connected
to IP address 172.19.102.161, and 1 connected to IP address .162.
Ethernet port for the .161 address was shut down.
An fdisk command was issued against the iSCSI targets.
Response came back from .162 (not shut down), and the command did not return.
Trace was stopped after 5 minutes.
Could you post /var/log/messages too? Also, I am not sure how an iscsid restart will work if the session you are resetting the login_timeout for was already created. Could you just do:

echo 5 > /sys/class/scsi_host/hostN/login_timeout

or set the login_timeout before iscsid is run for the first time?

Created attachment 121983 [details]
messages file from 12-7-05 retest
We re-ran the test today and started with a reboot of the system. This is the
messages file. The network trace will be in the next attachment.
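The login_timeout change suggested above (echo 5 > /sys/class/scsi_host/hostN/login_timeout) can also be done from a program. Below is a minimal C sketch, assuming the sysfs attribute accepts a decimal value followed by a newline, as echo writes it. The helper names are hypothetical. Writing to a real sysfs attribute normally requires root.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helpers mirroring
 *   echo 5 > /sys/class/scsi_host/hostN/login_timeout
 * Both return 0 on success and -1 on error.
 */

/* Build the sysfs path for SCSI host number host_no into buf. */
static int login_timeout_path(char *buf, size_t len, int host_no)
{
    int n = snprintf(buf, len, "/sys/class/scsi_host/host%d/login_timeout",
                     host_no);
    /* Fail on encoding errors or truncation. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "<seconds>\n" to path, as echo would. */
static int write_login_timeout(const char *path, int seconds)
{
    FILE *f;

    assert(path != NULL);
    if (seconds <= 0)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fprintf(f, "%d\n", seconds) < 0) {
        fclose(f);
        return -1;
    }
    /* stdio buffers the data, so a value rejected by the kernel typically
     * surfaces when fclose() flushes it; check its result. */
    return fclose(f) == 0 ? 0 : -1;
}
```

For instance, build the path for host1 with login_timeout_path(), then pass it to write_login_timeout() with 5 before reproducing the failure.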
Created attachment 121984 [details]
network trace of 12-7-05 retest
Network trace from today's retest of the 11-16 patch. login_timeout was set to
5.
Thanks Cesar. I have not gotten to look at the network trace yet, but with respect to the messages, could you verify something for me? For the session on host0, did you do something that forced a logout, after which that session came back fine? And for the sessions connected to host1 and host2, did you just pull a cable?

We didn't do anything to force a logout on host0. After the boot, connections were made to 3 volumes on the array (which show up as host0, host1, and host2). Host0 connected to one of the 3 Ethernet ports, and host1 and host2 connected to another. Nothing was done with the host0 connection. We shut down the Ethernet port that host1 and host2 were using. Would the login redirect show up as a logout, then a login?

No. In the messages, the logout shows up as the result of an AEN:

Dec 7 10:19:38 serverb kernel: iscsi-sfnet:host0: Target requests logout within 3 seconds for session
Dec 7 10:19:38 serverb kernel: iscsi-sfnet:host0: Session logged out
Dec 7 10:19:38 serverb kernel: iscsi-sfnet:host0: Session dropped
Dec 7 10:19:39 serverb kernel: iscsi-sfnet:host0: Login failed to authenticate with target iqn.2001-05.com.equallogic:6-8a0900-7f4a52a01-60a000038e64395a-vol1
Dec 7 10:19:39 serverb kernel: iscsi-sfnet:host0: Session established

I may have misunderstood your question. I guess when we log in, we will get an error value indicating that the login failed because the target wants us to try another address. Then there is the case above, where host0 was logged in and then the target logged us out. I did not see that in your trace, so I am not sure which addresses we used.

Created attachment 122017 [details]
add debug output
Could you grab the kernel from http://people.redhat.com/~jbaron/rhel4/, apply this patch, and send me all the log messages (login and failure)? Could you also maybe simplify the setup and not use so many ports?
We took the update 2 initiator and built it with the kernel in http://people.redhat.com/~jbaron/rhel4/, and it works fine for us. We'll check it out in U3. In the interim, can we get a hotfix that we can make available to customers that may run into this problem before U3 is available?

The fix resolves this bug from our point of view. I'll let you move its status according to your process. Thanks for your help.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html