Bug 167946 - iscsi reload can hang during "iscsi-rescan" if targets are unreachable
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: iscsi-initiator-utils
Version: 4.0
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Mike Christie
Depends On:
Blocks: 168429
Reported: 2005-09-09 15:04 EDT by Dave Wysochanski
Modified: 2007-11-30 17:07 EST

See Also:
Fixed In Version: RHBA-2006-0109
Doc Type: Bug Fix
Last Closed: 2006-03-07 13:50:29 EST


Attachments
iscsi-kill-session (1.75 KB, text/plain)
2005-09-09 15:04 EDT, Dave Wysochanski

Description Dave Wysochanski 2005-09-09 15:04:32 EDT
Description of problem:

If any targets are unreachable and you try an "iscsi reload", the command will
hang indefinitely.

I came up with another simple script that issues the kill session command, and
seems to work ok in the session hangs I found.  I'll attach the script and let
you all decide whether you think this is ok to include or not - seems to meet my
needs but you may have issues with including it.  Also, for this particular
problem, there may be other more elegant solutions but I thought the simple
script was worth mentioning since it solved my problems.

Note that the README states how to shut down sessions (see below), but having a
simple utility do it seems worth considering.

  Note that any configuration changes will not affect existing target sessions.
  For example, removal of a DiscoveryAddress entry from /etc/iscsi.conf
  will not cause the removal of sessions to targets discovered through this
  DiscoveryAddress, but it will cause the removal of the discovery session
  corresponding to the deleted DiscoveryAddress.
  To remove these sessions execute the following for each of these sessions:
  - Set ConnFailTimeout, ResetTimeout and AbortTimeout on the session to a low
    value, like 5 seconds through sysfs by executing the following commands:
    echo 5 > /sys/class/scsi_host/host<host_no>/connfail_timeout
    echo 5 > /sys/class/scsi_host/host<host_no>/reset_timeout
    echo 5 > /sys/class/scsi_host/host<host_no>/abort_timeout
  - Stop IOs to all the devices discovered through the session.
  - If these devices have been mounted, unmount them.
  - Shutdown the session through sysfs by executing the following command:
    echo > /sys/class/scsi_host/host<host_no>/shutdown
    where <host_no> is the Host Number of the session that has to be removed.
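
The README steps above can be sketched as a small shell function. This is only
an illustration of the procedure, not the attached iscsi-kill-session script.
The function name kill_iscsi_session and the SCSI_HOST_ROOT variable are
hypothetical; the variable exists so the function can be tried against a
scratch directory instead of the real sysfs tree. It assumes I/O has already
been stopped and the devices unmounted.

```shell
# Sketch only: shut down one iSCSI session through sysfs, per the README steps.
# SCSI_HOST_ROOT is a hypothetical override, not part of iscsi-initiator-utils.
SCSI_HOST_ROOT=${SCSI_HOST_ROOT:-/sys/class/scsi_host}

kill_iscsi_session() {
    local host_no=$1        # host number of the session to remove
    local timeout=${2:-5}   # low timeout in seconds, 5 as the README suggests
    local host_dir="$SCSI_HOST_ROOT/host$host_no"
    local attr

    if [ -z "$host_no" ] || [ ! -d "$host_dir" ]; then
        echo "kill_iscsi_session: no such host: host$host_no" >&2
        return 1
    fi

    # Lower the timeouts first so an unreachable target cannot stall shutdown.
    for attr in connfail_timeout reset_timeout abort_timeout; do
        echo "$timeout" > "$host_dir/$attr" || return 1
    done

    # The caller must have stopped I/O and unmounted the devices by now.
    echo > "$host_dir/shutdown"
}
```

For example, "kill_iscsi_session 47" would apply the steps to host47 with the
default 5 second timeouts.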



Version-Release number of selected component (if applicable):
iscsi-initiator-utils-4.0.3.0-2

How reproducible:
Every time

Steps to Reproduce:
1. Enter DiscoveryAddress=IP1 into /etc/iscsi.conf
2. /etc/init.d/iscsi start
3. Add DiscoveryAddress=IP2 in /etc/iscsi.conf
4. Somehow make target on IP1 unreachable (pull cable, stop iscsi on target
side, etc)
5. /etc/init.d/iscsi reload *command hang*
  
Actual results:
"/etc/init.d/iscsi reload" hangs indefinitely

Expected results:
"/etc/init.d/iscsi reload" should not hang indefinitely

Additional info:
Comment 1 Dave Wysochanski 2005-09-09 15:04:32 EDT
Created attachment 118661
iscsi-kill-session
Comment 2 Dave Wysochanski 2005-09-09 15:30:17 EDT
Well, I thought maybe reversing the order of the "iscsi reload" and the
"killproc -HUP iscsid" would help. It does seem to solve the problem where the
initiator cannot reconnect due to the tpgt change, but the reload still hangs
indefinitely if another target is unreachable.

Unfortunately, this is a consequence of the change in "iscsi reload" behavior,
and of us putting the "iscsi-rescan" in there to get back the behavior of picking
up the newly mapped LUNs (the old iscsid/driver used to handle the rescan, but for
RHEL4 we put it in the userspace "iscsi-rescan" command).  This command will hang
indefinitely if any targets are unreachable.
Comment 3 Mike Christie 2005-09-09 15:45:25 EDT
I am fine with a script to kill sessions.

Where are we hanging exactly? The rescan or the discovery or something else?
Comment 4 Mike Christie 2005-09-09 16:46:06 EDT
Oh, is the INQUIRY or REPORT LUNS getting stuck in our driver's SCSI EH code?
Comment 5 Dave Wysochanski 2005-09-09 16:55:53 EDT
One case is in the driver if the tpgt changes (this can happen for us during
certain upgrade scenarios - it's not super likely but can happen so we need a
recovery procedure).

In this case, I see these:
Sep  9 16:31:27 tso kernel: iscsi-sfnet:host47: Portal group tag mismatch,
expected 1, received 1001
Sep  9 16:31:27 tso kernel: iscsi-sfnet:host48: Portal group tag mismatch,
expected 1, received 1002
Sep  9 16:31:27 tso kernel: iscsi-sfnet:host48: Waiting 1 seconds before next
login attempt
Sep  9 16:31:27 tso kernel: iscsi-sfnet:host47: Waiting 1 seconds before next
login attempt
Sep  9 16:31:30 tso kernel: iscsi-sfnet:host48: Portal group tag mismatch,
expected 1, received 1002
Sep  9 16:31:30 tso kernel: iscsi-sfnet:host47: Portal group tag mismatch,
expected 1, received 1001
Sep  9 16:31:30 tso kernel: iscsi-sfnet:host47: Waiting 1 seconds before next
login attempt
Sep  9 16:31:30 tso kernel: iscsi-sfnet:host48: Waiting 1 seconds before next
login attempt
Sep  9 16:31:33 tso kernel: iscsi-sfnet:host47: Portal group tag mismatch,
expected 1, received 1001
Sep  9 16:31:33 tso kernel: iscsi-sfnet:host48: Portal group tag mismatch,
expected 1, received 1002
Sep  9 16:31:33 tso kernel: iscsi-sfnet:host48: Waiting 1 seconds before next
login attempt
Sep  9 16:31:33 tso kernel: iscsi-sfnet:host47: Waiting 1 seconds before next
login attempt

On rhel3 and prior drivers, "iscsi reload" would signal the daemon which would
in turn signal the driver and it would reconnect (not sure exactly why - maybe
it would just ignore the old tpgt when it saw the rescan from iscsid?).  On
RHEL4, signalling iscsid does not recover things - the only way I found to
recover them is to kill the sessions explicitly.  Here's what I get if I signal
iscsid in RHEL4:


Sep  9 16:35:00 tso kernel: iscsi-sfnet:host48: Waiting 10 seconds before next
login attempt
Sep  9 16:35:12 tso kernel: iscsi-sfnet:host47: Portal group tag mismatch,
expected 1, received 1001
Sep  9 16:35:12 tso kernel: iscsi-sfnet:host48: Portal group tag mismatch,
expected 1, received 1002
Sep  9 16:35:12 tso kernel: iscsi-sfnet:host48: Waiting 10 seconds before next
login attempt
Sep  9 16:35:12 tso kernel: iscsi-sfnet:host47: Waiting 10 seconds before next
login attempt
Sep  9 16:35:14 tso iscsid[31126]: Connected to Discovery Address 10.60.155.91
Sep  9 16:35:14 tso iscsid[30525]: updating bus 0 target 0 to configuration #2
Sep  9 16:35:14 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:14 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:14 tso iscsid[30525]: updating bus 0 target 0 to configuration #2
Sep  9 16:35:14 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:14 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:15 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:15 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:15 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:15 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:16 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:16 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:16 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:16 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:17 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:17 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:17 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:17 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:18 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:18 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:18 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:18 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:19 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:19 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:19 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:19 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:20 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:20 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:20 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:20 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:21 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:21 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:21 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:21 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:22 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:22 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:22 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:22 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:23 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:23 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:23 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:23 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:24 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:24 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:24 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:24 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:24 tso kernel: iscsi-sfnet:host48: Portal group tag mismatch,
expected 1, received 1002
Sep  9 16:35:24 tso kernel: iscsi-sfnet:host47: Portal group tag mismatch,
expected 1, received 1001
Sep  9 16:35:24 tso kernel: iscsi-sfnet:host47: Waiting 10 seconds before next
login attempt
Sep  9 16:35:24 tso kernel: iscsi-sfnet:host48: Waiting 10 seconds before next
login attempt
Sep  9 16:35:25 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:25 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:25 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:25 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:26 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:26 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:26 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:26 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:27 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:27 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:27 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:27 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:28 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:28 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:28 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:28 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:29 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:29 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:29 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:29 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:30 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:30 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:30 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:30 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:31 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:31 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:31 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:31 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:32 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:32 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:32 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:32 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:33 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:33 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:33 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:33 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:34 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:34 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:34 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:34 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:35 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:35 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:35 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:35 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:36 tso kernel: iscsi-sfnet:host47: Portal group tag mismatch,
expected 1, received 1001
Sep  9 16:35:36 tso kernel: iscsi-sfnet:host48: Portal group tag mismatch,
expected 1, received 1002
Sep  9 16:35:36 tso kernel: iscsi-sfnet:host48: Waiting 10 seconds before next
login attempt
Sep  9 16:35:36 tso kernel: iscsi-sfnet:host47: Waiting 10 seconds before next
login attempt
Sep  9 16:35:36 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:36 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:36 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:36 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:37 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:37 tso iscsid[31127]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:37 tso kernel: iscsi-sfnet: Could not find session to update
Sep  9 16:35:37 tso iscsid[31128]: iSCSI session ioctl failed for
iqn.1992-08.com.netapp:sn.50391573.winston, Resource temporarily unavailable:
Resource temporarily unavailable
Sep  9 16:35:38 tso kernel: iscsi-sfnet: Could not find session to update


Comment 6 Dave Wysochanski 2005-09-09 17:03:06 EDT
The other case was entered above into the bugzilla, but I think there's a simple
workaround for that one (just check to see if session is established before
trying to write to scan attribute for each host).  Will try this and attach a
patch if it works.

The tpgt change case remains.  I think it might be ok to just tell them to kill
the sessions though - not sure if we need anything other than that (probably
unlikely case).  I'll check to see what people think about the workaround of
killing the sessions.

Thanks.
Comment 7 Dave Wysochanski 2005-09-09 17:18:02 EDT
The aforementioned scheme seems to work well to handle the case for which this
bugzilla was originally filed.  Here's the patch:

[root@tso SPECS]# diff -Nurp /sbin/iscsi-rescan.orig /sbin/iscsi-rescan
--- /sbin/iscsi-rescan.orig     2005-09-09 17:05:42.000000000 -0400
+++ /sbin/iscsi-rescan  2005-09-09 17:15:56.000000000 -0400
@@ -78,6 +78,11 @@ no_hosts=${#host_ids[*]}

 i=0
 while [ $i -lt $no_hosts ]; do
-       echo "0 0 -" > $SCSI_HOST/host${host_ids[$i]}/scan
+       if [ `cat $SCSI_HOST/host${host_ids[$i]}/session_established` -eq 1 ]; then
+               echo "Rescanning host${host_ids[$i]} "
+               echo "0 0 -" > $SCSI_HOST/host${host_ids[$i]}/scan
+       else
+               echo "Skipping host${host_ids[$i]} - session not established"
+       fi
        let "i += 1"
 done
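
For reference, the guarded loop from the patch can be sketched as a standalone
function. This is an illustrative variant, not the shipped /sbin/iscsi-rescan.
The function name rescan_established_hosts and the SCSI_HOST_ROOT override are
hypothetical. Unlike the patch, it walks every host* directory rather than a
host_ids array, and it treats a missing or unreadable session_established
attribute as "not established" instead of letting the test fail.

```shell
# Sketch only: rescan just the hosts whose iSCSI session is established.
# SCSI_HOST_ROOT is a hypothetical override for trying this outside sysfs.
SCSI_HOST_ROOT=${SCSI_HOST_ROOT:-/sys/class/scsi_host}

rescan_established_hosts() {
    local host_dir state
    for host_dir in "$SCSI_HOST_ROOT"/host*; do
        [ -d "$host_dir" ] || continue
        # Assumption: a host without this attribute has no usable session.
        state=$(cat "$host_dir/session_established" 2>/dev/null)
        if [ "$state" = "1" ]; then
            echo "Rescanning ${host_dir##*/}"
            # Same wildcard scan request the original script writes.
            echo "0 0 -" > "$host_dir/scan"
        else
            echo "Skipping ${host_dir##*/} - session not established"
        fi
    done
}
```

Writing to the scan attribute is the step that blocks on an unreachable
target, so skipping hosts without an established session avoids the hang.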
Comment 8 Dave Wysochanski 2005-09-09 17:30:20 EDT
*** Bug 167973 has been marked as a duplicate of this bug. ***
Comment 9 Dave Wysochanski 2005-09-09 17:39:00 EDT
If you want, I can file another bugzilla to track the tpgt scenario since it's
really different than this one.  New "iscsi-kill-session" script in attachment
of this bugzilla applies to that case, not the general case of "iscsi reload"
hanging.
Comment 10 Mike Christie 2005-09-09 18:09:56 EDT
I think it is because of how we match sessions. If the portal group changes and
iscsid uses the new value then it will never be able to update the old one.
Comment 11 Mike Christie 2005-09-09 18:27:59 EDT
I cannot find the mail where we discussed the update problem. But if I remember
correctly, we assumed iscsid would stay alive and that it would detect the
change, kill the old bad session and start a new one. But if iscsid was not
up for some reason then you were out of luck, since we could not rebuild iscsid's
old state reliably.
Comment 12 Mike Christie 2005-09-11 02:45:14 EDT
Actually, we did not want to kill the old session for updates since it will fail
all the IO in flight and disrupt a bunch of other things.
Comment 13 Mike Christie 2005-09-11 03:11:10 EDT
Yeah, so we discussed this on the list. Krishna removed it and I asked him to add
this functionality back. Obviously he did not. It turns out that to add it back now
we will have to change the interface in U3, which I think would violate some RHEL
thing. Problem is, for Cisco's IPv6 addition we will have to update the ioctl too.

Tom what can I do in this case?

Also Dave, can the targetname ever change?
Comment 14 Dave Wysochanski 2005-09-11 12:01:24 EDT
Yes, conceivably it can change.  But we need to be careful here.

From the protocol perspective, I think the right thing to do in the case of a
tpgt change (or anything that changes the I_T nexus identifiers), is to treat it
as a different session.  Here's the definition from RFC 3720:
"   - I_T nexus: According to [SAM2], the I_T nexus is a relationship
     between a SCSI Initiator Port and a SCSI Target Port.  For iSCSI,
     this relationship is a session, defined as a relationship between
     an iSCSI Initiator's end of the session (SCSI Initiator Port) and
     the iSCSI Target's Portal Group.  The I_T nexus can be identified
     by the conjunction of the SCSI port names; that is, the I_T nexus
     identifier is the tuple (iSCSI Initiator Name + ',i,'+ ISID, iSCSI
     Target Name + ',t,'+ Portal Group Tag)."
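
To make the tuple concrete, here is a minimal sketch that formats an I_T nexus
identifier the way the RFC text above describes it. The function name
it_nexus_id is hypothetical, and the initiator name and ISID in the usage note
are placeholder values, not taken from this report.

```shell
# Sketch only: format an I_T nexus identifier as the tuple quoted above.
it_nexus_id() {
    # $1 initiator name, $2 ISID, $3 target name, $4 portal group tag
    printf '(%s,i,%s, %s,t,%s)\n' "$1" "$2" "$3" "$4"
}
```

With a placeholder initiator "iqn.example:init" and ISID "0x1", the target
iqn.1992-08.com.netapp:sn.50391573.winston yields one identifier for tag 1 and
a different one for tag 1001, which is why a tpgt change reads as a different
session from the protocol's point of view.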

I personally don't know why exactly Cisco didn't want to do that.  Being a
target vendor, I can guess at why they wanted the initiator to just "look the
other way", but I'm not sure that's the right thing to do - seems unsafe for
certain conceivable (though perhaps arguably pathological) cases.

I guess that's why I went ahead and just said maybe we should just add the
"iscsi-kill-session" thing in this case (for us, it'll probably be pretty rare,
though we need to handle it).  I need to ask though for sure.  I think at one
point we had an upgrade scenario where the tpgt could change even though nothing
else changed, but then they went back and made the tpgt's backward compatible
across upgrades.  I need to check though.

This also might be a good question for the ips reflector if we want to be sure.
Comment 15 Dave Wysochanski 2005-09-11 17:42:28 EDT
Filed bz168057 to track tpgt change issue.
Comment 16 Dave Wysochanski 2005-09-28 14:43:02 EDT
Mike, do you agree we can include the patch in #7 for the next RHEL4 (update 3)?

I think that patch addresses this bugzilla directly (though there may be
other unrelated issues we should track separately).

Thanks.
Comment 17 Mike Christie 2005-09-28 15:02:29 EDT
yeah I am fine with it.
Comment 18 Mike Christie 2005-10-24 12:32:11 EDT
Dave, I am rolling this up for U3. Just want to confirm that the patch in #7 is
the final version of the patch you wanted. I am fine with it. Also remind me if
there were any other fixes we had queued up for U3 related to this. I saw the DM
one you filed, but it looks like a fix for that may be too invasive for
U3. Was there anything else?
Comment 19 Dave Wysochanski 2005-10-25 11:50:23 EDT
The patch in #7 is the final patch for this bugzilla.

Also, are you going to add the "iscsi-kill-session" attachment below?  It
doesn't fix the root of this bugzilla (patch in #7 does), but it's closely
related and a useful utility that easily implements the commands stated in the
README.

Thanks.
Comment 20 Mike Christie 2005-10-25 14:20:46 EDT
yeah it looks ok
Comment 22 Mike Christie 2005-11-23 14:52:25 EST
Note: Netapp will test patch.
Comment 27 Andrius Benokraitis 2006-01-09 11:03:19 EST
This fix has been included in RHEL4 U3 beta.

Action: Netapp, please test this change to iscsi-initiator-utils and give
feedback to Red Hat ASAP.
Comment 28 Dave Wysochanski 2006-01-09 15:50:16 EST
Fix confirmed.

Thanks.
Comment 31 Red Hat Bugzilla 2006-03-07 13:50:29 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0109.html