583218 – iscsid preventing machine shutdown or reboot

Summary: iscsid preventing machine shutdown or reboot

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	initscripts
Sub Component:
Version:	5.5
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	initscripts Maintenance Team
QA Contact:	qe-baseos-daemons
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	584912 590173 (view as bug list)
Depends On:
Blocks:	630538

Reported:	2010-04-17 05:37 UTC by Matt Clark
Modified:	2018-11-14 19:47 UTC (History)
CC List:	41 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Prior to this update, an attempt to reboot or shut down a system with a running Internet Small Computer System Interface (iSCSI) daemon may have caused the system to stop responding. This was caused by the fact that the system was waiting for iSCSI devices to sync, even though the network was already shut down. With this update, the /etc/rc.d/init.d/network startup script has been modified not to deactivate network interfaces when the iSCSI daemon is running, and the system can be shut down or rebooted as expected.
Clone Of:
Clones:	713162 (view as bug list)
Environment:
Last Closed:	2011-01-13 23:06:09 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Screenshot showing connection and timeout issues. (3.10 MB, application/octet-stream) 2010-04-17 05:37 UTC, Matt Clark	no flags	Details
screenshot of shutdown when using the redhat 5.4 iscsid script (84.38 KB, image/jpeg) 2010-04-17 05:39 UTC, Matt Clark	no flags	Details
Patch for /etc/init.d/network to check iSCSI sessions. (538 bytes, patch) 2010-07-06 10:25 UTC, Attila Lajko	no flags	Details \| Diff
Don't turn of net on shutdown if iscsi is running (546 bytes, patch) 2010-09-25 08:38 UTC, Mike Christie	no flags	Details \| Diff

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2011:0075	0	normal	SHIPPED_LIVE	initscripts bug fix update	2011-01-12 17:22:01 UTC

Description Matt Clark 2010-04-17 05:37:17 UTC

Created attachment 407233 [details]
Screenshot showing connection and timeout issues.

Description of problem:
The changes made to the /etc/init.d/iscsid script are preventing shutdown or reboot of the machine.

Version-Release number of selected component (if applicable):
iscsi-initiator-utils-6.2.0.871-0.16.el5

How reproducible:
Always


Steps to Reproduce:
1. Add an iscsi device to a RHEL5.5 machine through iscsiadm
2. Reboot machine
  
Actual results:
Machine hangs waiting for sync of iscsi devices after network is down.

Expected results:
Machine reboots.

Additional info:
Info from the job logged with Redhat support
16-APR-2010 04:38:33 Matt Clark 
File shutdown-screen-old-iscsid.jpg attached
 
16-APR-2010 04:38:32 Matt Clark 
Ok, it now is working with the old 5.4 shutdown scripts that delete the network shutdown sym links from the rc0.d and rc6.d directory. 
 
Attached a screen shot of the behaviour now that the network is not getting shut down as part of runlevel 0 or 6. 
 
The screen shot is from a fresh 5.5 build of redhat with the 2.6.18-194 kernel and the /etc/init.d/iscsid script from the iscsi-initiator-utils on redhat 5.4 (iscsi-initiator-utils-6.2.0.871-0.10.el5.x86_64.rpm). 
 
It seems to fail at a different layer, which is obviously not ideal; this probably goes back to the thread that I pasted in an earlier entry. Something is missing that would allow this all to happen gracefully. The error that now occurs is returned immediately instead of after a timeout. It could be that the network layer is still there but the ports are down, causing attempts to contact the iscsi device to fail immediately. 
 
16-APR-2010 04:15:52 Matt Clark 
Sorry that should be the /etc/rc6.d directory that I would like a listing of. 
 
Anyway, I had a look at the iscsi and iscsid kill order for 5.4 and they match what I have for 5.5. So this is not the problem.... 
 
I did find a difference in the two iscsi shutdown scripts 
[matt@axiom tmp]$ diff iscsid-5.4 iscsid-5.5  
19,26d18 
< echo -n $"Turning off network shutdown. " 
< # we do not want iscsi or network to run during system shutdown 
< # incase there are RAID or multipath devices using 
< # iscsi disks 
< chkconfig --level 06 network off 
< rm /etc/rc0.d/*network 
< rm /etc/rc6.d/*network 
<  
31a24 
> modprobe -q be2iscsi 
71a65,66 
> rmmod be2iscsi 2>/dev/null 
 
Looking at the comment above it seems pretty clear that this is exactly my case. I will try using the 5.4 shutdown script and see what happens, but obviously the long term solution would be to get this fixed in the release as I don't want to be using any non-standard redhat scripts on these machines. Are you able to find out the reasoning for the change? 
 
Thanks, 
Matt.  
16-APR-2010 03:26:55 Matt Clark 
Hi Kenji, 
 
To answer your questions:- 
1. There was no IO on the system (devices not even mounted). 
2. Sync not necessary as devices not mounted. 
3. The connection errors are because the network has shutdown through the init scripts and therefore can't get to the iscsi device. 
4. Yes, we are using the Redhat iscsi-initiator-utils. 
5. Downgraded the kernel and now find that I am having the same problem on 2.6.18-164. So maybe it's related to changes in the iscsi package. That explains why you didn't see the problem with 2.6.18-194 installed on your 5.4 system. 
 
Just to re-iterate, this problem wasn't there in 5.4 and I use exactly the same kickstart to rebuild these machines as fresh 5.5 machines. I did not do a yum upgrade from 5.4. 
 
Hopefully this is a simple issue with rcX.d shutdown ordering. I'll try and reinstall a 5.4 image. Actually if you could give me an ls of the /etc/rc5.d dir from your test system that could possibly help. 
 
Thanks, 
Matt.  
16-APR-2010 03:26:55 Matt Clark 
Status changes from "Waiting on Customer" to "Waiting on Red Hat".  
15-APR-2010 06:30:24 Suzuoki, Kenji 
Status changes from "Waiting on Red Hat" to "Waiting on Customer".  
15-APR-2010 06:05:19 Suzuoki, Kenji 
Hello,

I could not reproduce the issue by simply upgrading the kernel to 2.6.18-194.el5. 

The "Synchronizing SCSI cache for disk <disk name>" message suggests that the kernel sends an instruction to the device through the normal SCSI command structure and waits for the command to complete.

Could you provide us the following information ?

1. Was the system doing some I/O to the iSCSI device when shutting down ?

2. Can you execute sync command before shutting / rebooting the system to see if you still encounter the issue ?

3. There are many connection error related message in the screen shot. Are you aware of the cause and where it is from ?

4. Does the system use iscsi-initiator-utils package provided by Red Hat ?

5. Reproducer 

Since we did not manage to reproduce the issue by simply updating the kernel, we wonder if you could provide us the detailed steps of how to reproduce the issue.

Best Regards,

Kenji Suzuoki
Red Hat Global Support Services

 
15-APR-2010 04:16:49 Suzuoki, Kenji 
Hi Matt,

My name is Kenji Suzuoki, a technician in APAC region. I have taken the ownership of the issue.

From the URL and the symptom, it seems like a bug. I am currently trying to upgrade my RHEL5.4 system from 2.6.18-164.el5 to 2.6.18-194.el5 to see if it is reproducible on my end as well.

I will get back to you once I get some more information or manage to reproduce the issue.

Also it would be greatly appreciated if you can switch the kernel version to RHEL5.4 (2.6.18-164.el5) and confirm that the problem does NOT appear with the same setting on your end as well.


Best Regards,

Kenji Suzuoki
Red Hat Global Support Services

 
14-APR-2010 04:16:17 Matt Clark 
Sorry that should read that the VM's are NOT set to autostart on this machine hence are not part of the issue.  
14-APR-2010 04:14:26 Matt Clark 
File sosreport-mclark.2011844-996904-82cf37.tar.bz2 attached
 
14-APR-2010 04:14:25 Matt Clark 
Hi Dominic, 
 
The machine doesn't reboot after this. Because there are 4 interfaces on the iscsi and there are 2 LUN's shared, I would expect that it might take 24 (4x2x3) minutes to reboot. I did leave a machine in this state for quite some time and I am pretty sure it was longer than this, so the answer is I am not sure. Even at 24 minutes that is just too long. 
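The 24-minute estimate above can be sketched as shell arithmetic. This assumes each of the 4 x 2 interface/LUN combinations waits out one full 180-second command timeout (the "waited 180s" figure support quotes from the screenshot); the variable names are illustrative only.

```shell
#!/bin/sh
# Illustrative values taken from the description above.
interfaces=4          # iSCSI interfaces on the host
luns=2                # LUNs shared over those interfaces
timeout_seconds=180   # assumed per-command timeout ("waited 180s")

# Worst case: every interface/LUN pair waits out the full timeout in turn.
total_minutes=$(( interfaces * luns * timeout_seconds / 60 ))
echo "worst-case wait: ${total_minutes} minutes"
```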
 
As for virtual machines, yes there are virtual machines on this physical host however my tests were done without firing up any of the virtual machines. I.e. a reboot directly after the machine started (and the VM's are set to autostart). 
 
SOS report attached. 
 
Thread where something that sounds like this issue is discussed: 
http://osdir.com/ml/linux.iscsi.open-iscsi/2008-05/msg00198.html 
 
RHEL version is 5.5 and I can confirm this was not happening in 5.4. Unfortunately I need 5.5 to solve bug 487763 (multiple MAC's on a bonded interface with respect to the bridging interface). 
 
Thanks, 
Matt.  
14-APR-2010 04:14:25 Matt Clark 
Status changes from "Waiting on Customer" to "Waiting on Red Hat".  
13-APR-2010 13:10:36 Dominic Padinjattumkara Geevarghese 
Status changes from "Waiting on Red Hat" to "Waiting on Customer".  
13-APR-2010 13:04:43 Dominic Padinjattumkara Geevarghese 
Dear Sir,

Thank you for contacting Red Hat Support.

Are you using iscsi on VMs? Please provide details.

Also, I am able to see the line "timing out command, waited 180s" from the screenshot.
Does it reboot after 180s, or does it keep waiting even after that time?

What is the RHEL version you are using? It would be great if you could provide a sosreport to understand the 
general configuration details, logs etc. Please refer to http://kbase.redhat.com/faq/docs/DOC-2366

Thanks,
Dominic
 
13-APR-2010 05:07:14 Dominic Padinjattumkara Geevarghese 
Status changes from "Open" to "Waiting on Red Hat".  
13-APR-2010 03:52:48 Matt Clark 
There seems to be an issue with the ordering of the removal of the network and the flushing of the scsi cache. 
 
Looking at a few posts it seems that the flushing of the scsi cache is a kernel feature (and not something that can be run before the network is taken down). 
 
Basically the network is taken down, and then as one of the very last steps the md devices are stopped which triggers a scsi cache sync and this hangs as there is no access to the iSCSI device. 
 
Is there something I can do to avoid this?  
13-APR-2010 03:52:48 Matt Clark 
File iscsi-cache-issue.bmp attached.

Comment 1 Matt Clark 2010-04-17 05:39:36 UTC

Created attachment 407234 [details]
screenshot of shutdown when using the redhat 5.4 iscsid script

This screen shot is from a fresh build of redhat 5.5 with only the /etc/init.d/iscsid script replaced with the one from the redhat 5.4 iscsi-initiator-utils.

Comment 2 Mike Christie 2010-04-20 06:12:39 UTC

(In reply to comment #0)
> Is there something I can do to avoid this?  
> 13-APR-2010 03:52:48 Matt Clark 
> File iscsi-cache-issue.bmp attached.    

You can work around this problem by changing the cache settings on the target,
so it does not require a cache sync to be sent on shutdown. I think you would
set the cache settings to something like write through. If this is not possible
I think you can run the

chkconfig --level 06 network off

by hand.

However, I am working on a fix and should be done shortly. I think all we need
is a

iscsiadm -m node --logoutall=all

call added to the /etc/init.d/iscsi script in the "stop" section, but I have
to double check that for boot, it is setting the node.startup=boot so boot/root
sessions do not get shutdown too.

Comment 3 Matt Clark 2010-04-20 22:17:45 UTC

I am a bit lacking in the understanding of how the iscsiadm persistency works, so this may be an irrelevant question, but wouldn't that mean you would have to re-login to each of the iscsi portals at boot? Or does the automatic login still function as a result of the entries in /var/lib/iscsi/send_targets?

I don't have a test machine to play with for the next couple of days so I can't try this myself...

Comment 4 Mike Christie 2010-04-20 23:00:31 UTC

(In reply to comment #3)
> I am a bit lacking in the understanding of how the iscsiadm persistency works,
> so this may be an irrelevant question but wouldn't that mean you would have to
> re-login to the each of the iscsi portals at boot? Or does the automatic login


We already log into all the targets at boot. Currently when you shutdown/reboot, the session does not get a complete shutdown. There is no iscsi logout sent. But the disks are synced if needed. On startup then the initiator sends a login command and the target recognizes this as being a continuation of the old session or starts a new one if it has cleaned up the old one.

Comment 5 Mike Christie 2010-05-08 02:16:58 UTC

Just wanted to update with some status.

My first fix that I tried in comment #2 broke setups that did iscsi root. I thought they used the startup=boot flags, but they do not.

I am working on a more complex fix.

Comment 6 Mike Christie 2010-05-10 17:44:18 UTC

*** Bug 590173 has been marked as a duplicate of this bug. ***

Comment 7 Mike Christie 2010-06-07 21:27:55 UTC

Hi,

I am still working on a fix for this. I just wanted to add a temporary workaround. You can just run the same commands that the iscsi script was running. However, you only need to run them when you have made changes to the net init scripts (like when you update your system or the initscripts rpm). The iscsi script ran them every time it ran, in case a user updated the net init script settings after installing the iscsi tools.

So after you have installed iscsi-initiator-utils and the init scripts just run:

chkconfig --level 06 network off
rm /etc/rc0.d/*network
rm /etc/rc6.d/*network
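
The link-removal part of the workaround above could be wrapped so that it is safe to re-run after every initscripts update. This is only a sketch: remove_network_kill_links is a hypothetical helper, not something shipped in initscripts or iscsi-initiator-utils, and it takes the directory as an argument so it can be tried on a scratch tree before being pointed at /etc. It does not run chkconfig.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Red Hat package): remove the network
# shutdown links for runlevels 0 and 6 under the given root (default /etc).
remove_network_kill_links() {
    rc_root="${1:-/etc}"
    for level in 0 6; do
        # rm -f stays quiet if the links are already gone.
        rm -f "$rc_root/rc$level.d/"*network
    done
}

# Dry run against a throwaway tree instead of the live /etc.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/rc0.d" "$demo_root/rc3.d" "$demo_root/rc6.d"
touch "$demo_root/rc0.d/K90network" "$demo_root/rc6.d/K90network" \
      "$demo_root/rc3.d/S10network"
remove_network_kill_links "$demo_root"
remove_network_kill_links "$demo_root"   # second run is a harmless no-op
remaining=$(find "$demo_root" -name '*network' | wc -l)
echo "network links left in scratch tree: $remaining"
```

On a real system the call would be remove_network_kill_links with no argument, run as root after the chkconfig command.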

Comment 9 Mike Christie 2010-07-04 19:08:11 UTC

*** Bug 584912 has been marked as a duplicate of this bug. ***

Comment 10 Attila Lajko 2010-07-06 10:25:15 UTC

Created attachment 429730 [details]
Patch for /etc/init.d/network to check iSCSI sessions.

Hi, 

I made a modification in a /etc/init.d/network to check if there is an existing iSCSI session during reboot/shutdown. If there is one, the network service does not stop.

Comment 12 Mike Christie 2010-07-06 17:04:24 UTC

(In reply to comment #10)
> Created an attachment (id=429730) [details]
> Patch for /etc/init.d/network to check iSCSI sessions.
> 
> Hi, 
> 
> I made a modification in a /etc/init.d/network to check if there is an existing
> iSCSI session during reboot/shutdown. If there is one, the network service does
> not stop.    

Nice. Thanks for the patch. I will check with the net scripts maintainer to see if it is ok with them. It seems to handle all the setups/scenarios.

Comment 15 rob 2010-07-07 17:34:15 UTC

(In reply to comment #10)
> Created an attachment (id=429730) [details]
> Patch for /etc/init.d/network to check iSCSI sessions.
> 
> Hi, 
> 
> I made a modification in a /etc/init.d/network to check if there is an existing
> iSCSI session during reboot/shutdown. If there is one, the network service does
> not stop.    

A good patch, but fails if there is more than one iSCSI session open.
This would do the job for multiple sessions:


if [ `find /sys/class/iscsi_session/ -mindepth 1 -maxdepth 1 -type d | wc -l` -ge 1 ]; then
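
A self-contained sketch of the same count, for trying the test outside the network script. count_iscsi_sessions is a hypothetical name, and the directory argument exists only so the function can be exercised against a scratch directory; it also treats a missing sysfs directory as zero sessions, which is an assumption about the desired behaviour rather than something the patch is known to do.

```shell
#!/bin/sh
# Hypothetical helper: print the number of session directories found under
# the given sysfs path (default /sys/class/iscsi_session).
count_iscsi_sessions() {
    session_dir="${1:-/sys/class/iscsi_session}"
    # No directory at all (iSCSI modules not loaded) counts as zero sessions.
    [ -d "$session_dir" ] || { echo 0; return 0; }
    find "$session_dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Exercise it against a fake tree with four sessions.
demo_sessions=$(mktemp -d)
mkdir "$demo_sessions/session1" "$demo_sessions/session2" \
      "$demo_sessions/session3" "$demo_sessions/session4"
echo "sessions found: $(count_iscsi_sessions "$demo_sessions")"
```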

Comment 16 saveline 2010-07-13 07:48:18 UTC

I can confirm that (In reply to comment #15)
> (In reply to comment #10)
> > Created an attachment (id=429730) [details] [details]
> > Patch for /etc/init.d/network to check iSCSI sessions.
> > 
> > Hi, 
> > 
> > I made a modification in a /etc/init.d/network to check if there is an existing
> > iSCSI session during reboot/shutdown. If there is one, the network service does
> > not stop.    
> 
> A good patch, but fails if there is more than one iSCSI session open.
> This would do the job for multiple sessions:
> 
> 
> if [ `find /sys/class/iscsi_session/ -mindepth 1 -maxdepth 1 -type d | wc -l`
> -ge 1 ]; then    

Yes, it's true. I tested your patch with "find /sys/class/iscsi_session/ -mindepth 1 -maxdepth 1 -type d | wc -l" and it works very well. I use an MSA2312i with 4 sessions.
If I use this:
[ -d /sys/class/iscsi_session/session* ] && echo "OK"
I got:
-bash: [: too many arguments

Is there any chance that this patch will appear in the next release?
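
The "too many arguments" error above arises because the glob expands to one path per session and "[ -d ... ]" accepts only a single operand. A loop over the glob avoids that; the sketch below uses a hypothetical has_iscsi_session helper and a scratch directory standing in for /sys/class/iscsi_session.

```shell
#!/bin/sh
# Hypothetical helper: succeed if at least one session* directory exists.
has_iscsi_session() {
    session_dir="${1:-/sys/class/iscsi_session}"
    for entry in "$session_dir"/session*; do
        # With no match the pattern stays literal, -d fails, and we fall through.
        [ -d "$entry" ] && return 0
    done
    return 1
}

# Fake tree with four sessions, as on the MSA2312i setup described above.
demo_multi=$(mktemp -d)
mkdir "$demo_multi/session1" "$demo_multi/session2" \
      "$demo_multi/session3" "$demo_multi/session4"
demo_empty=$(mktemp -d)

has_iscsi_session "$demo_multi" && echo "OK"
has_iscsi_session "$demo_empty" || echo "no sessions"
```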

Comment 17 Mike Christie 2010-07-13 08:49:27 UTC

(In reply to comment #16)
> Is there any chance that this patch will appear in the next release?

For the net script patch in this bz, I am waiting on the init script maintainer to review the patch and ok it.

I made an iscsi-initiator-utils z-stream release that adds some code to turn off the network shutdown when the iscsi rpm is installed (basically it does what the iscsi init script was doing before). It is not perfect and is not a complete fix, but it is better than what we have now. It is being tested now. Hopefully it will just be a band-aid until we hear back from the init script maintainer.

Comment 18 Mike Christie 2010-07-13 08:50:50 UTC

(In reply to comment #17)
> I made a iscsi-initiator-utils z stream release that added some code to turn
> off the network shutdown when iscsi rpm is installed (basically does what the
> iscsi init script was doing before).

Oh yeah, I put the rpm I mentioned here:
http://people.redhat.com/mchristi/iscsi/rhel5.6/iscsi-initiator-utils/

Comment 22 Chris Schanzle 2010-08-02 22:54:18 UTC

I just moved my iscsi to a CentOS 5.5 system (the 5.4->5.5 system was fine), ran into this bug, tried iscsi-initiator-utils-6.2.0.871-0.18.el5.x86_64.rpm per comment 18, but failed to fix the issue.

I'm hanging on reboot while syncing scsi cache for sde, which is in /etc/fstab as:

/dev/sde1               /b                      xfs      noatime,_netdev,nodev 0 4

Boot-up is fine, iscsi logs in and /b is mounted.  Root is local disk.

Comment 23 Mike Christie 2010-08-03 19:28:39 UTC

What does:

chkconfig --list network

output?

Are there /etc/rc0.d/*network or /etc/rc6.d/*network links?

If you run:

chkconfig --level 06 network off
rm /etc/rc0.d/*network
rm /etc/rc6.d/*network   

by hand does it work (try several reboots to make sure something was not resetting the network init scripts to on)?

Comment 24 Chris Schanzle 2010-08-03 21:21:45 UTC

chkconfig --list network
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off

ls -1 /etc/rc[06].d/*network
/etc/rc0.d/K90network
/etc/rc6.d/K90network

sudo chkconfig --level 06 network off

ls -1 /etc/rc[06].d/*network
/etc/rc0.d/K90network
/etc/rc6.d/K90network

sudo rm /etc/rc[06].d/*network

ls -1 /etc/rc[06].d/*network
ls: /etc/rc[06].d/*network: No such file or directory


rebooted twice (first time was okay) with no hang.  Seems /etc/rc[06].d/*network need to be manually removed.

Thanks, Mike!  Let me know if I can be of further assistance in coming to the final resolution.

Comment 28 Shyam Iyer 2010-08-27 22:51:52 UTC

Adding Dell's request for a 5.5-z and RHEL 5.6 fix.

Dell would be testing the fix.

Comment 31 Chris Greenough 2010-09-03 23:24:25 UTC

We are also seeing this issue with a Dell MD3000i using their delivered MPP multi-path drivers. The patch above using the find variation fixed the problem. It seems to me that these _netdev, non root, iSCSI devices SHOULD be removed before network is stopped. The following in /etc/init.d/iscsi is what is causing the iscsi scripts to not remove the devices. 

        # If this is a final shutdown/halt, do nothing since
        # lvm/dm, md, power path, etc do not always handle this
        if [ "$RUNLEVEL" = "6" -o "$RUNLEVEL" = "0" -o "$RUNLEVEL" = "1" ]; then
                success
                return
        fi
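
For experimenting with this check in isolation, the same runlevel test can be written as a case statement. is_final_runlevel is a hypothetical name used only for this sketch; the shipped script uses the "-o" chain quoted above.

```shell
#!/bin/sh
# Hypothetical helper: succeed for the runlevels on which the quoted check
# makes stop() return early (halt, single-user, reboot).
is_final_runlevel() {
    case "$1" in
        0|1|6) return 0 ;;
        *)     return 1 ;;
    esac
}

for level in 0 1 3 5 6; do
    if is_final_runlevel "$level"; then
        echo "runlevel $level: stop() would return early"
    else
        echo "runlevel $level: stop() would continue"
    fi
done
```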

Which script should be monitoring these network dependent devices? Not sure. Should we just leave network up until the plug is pulled? 

Thanks for all the info! If there is anything I can do, or any information I can provide, please let me know.

-Chris

Comment 38 Mike Christie 2010-09-25 08:38:27 UTC

Created attachment 449573 [details]
Don't turn of net on shutdown if iscsi is running

This combines the patch from comment #10 with the comment from #15.

initscript devs, is this patch ok?

The previous fix I tried in this bz is not working when the initscripts are installed before iscsi.
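
The attachment itself is not reproduced in this report. As a rough sketch of the kind of decision it describes (leave the network up on halt or reboot when at least one iSCSI session exists), something like the following could be used. keep_network_for_iscsi and its arguments are hypothetical, and the real patch may be structured differently.

```shell
#!/bin/sh
# Hypothetical sketch, not the attached patch: succeed when the network
# should be left up, i.e. on runlevel 0 or 6 with at least one iSCSI session.
keep_network_for_iscsi() {
    runlevel="$1"
    session_dir="${2:-/sys/class/iscsi_session}"
    case "$runlevel" in
        0|6) ;;
        *)   return 1 ;;
    esac
    # A missing sysfs directory means no sessions, so the network may stop.
    [ -d "$session_dir" ] || return 1
    [ "$(find "$session_dir" -mindepth 1 -maxdepth 1 -type d | wc -l)" -ge 1 ]
}

# Scratch directories standing in for /sys/class/iscsi_session.
demo_busy=$(mktemp -d)
mkdir "$demo_busy/session1" "$demo_busy/session2"
demo_idle=$(mktemp -d)

if keep_network_for_iscsi 6 "$demo_busy"; then
    echo "iSCSI sessions present: leaving network up"
fi
```

In a stop handler, a successful return would translate into exiting before any interface is brought down.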

Comment 39 Bill Nottingham 2010-09-27 16:57:31 UTC

Given that we silently exit regardless of the runlevel if root is on a network block device, not sure why we'd test the runlevel here. But it's a reasonable fix.

Comment 42 saveline 2010-09-28 14:08:51 UTC

I've tried the patch from comment 38, and it works very well on reboot (I used shutdown -r now).
I just patched my /etc/init.d/network and rebooted my host.

Comment 43 Edek Pienkowski 2010-09-29 17:52:35 UTC

Just my two cents:

I remember that even when shutdown worked on the client, the TCP connections (or on ISCSI level too) according to target were not closed, preventing target reboot.

Are they closed properly now?

Comment 44 Mike Christie 2010-09-29 23:07:14 UTC

(In reply to comment #43)
> Just my two cents:
> 
> I remember that even when shutdown worked on the client, the TCP connections
> (or on ISCSI level too) according to target were not closed, preventing target
> reboot.
> 
> Are they closed properly now?

No. In RHEL 6 we do an explicit logout on shutdown/reboot, but in RHEL 5 we still leave them open due to apps using iscsi not being prepared for the devices to be removed (in RHEL5 apps thought it would work like fibre channel where during shutdown/reboot the /dev/sdXs do not get removed).

Comment 45 Matt 2010-10-07 17:21:43 UTC

(In reply to comment #31)
> We are also seeing this issue with a Dell MD3000i using their delivered MPP
> multi-path drivers. The patch above using the find variation fixed the problem.
> It seems to me that these _netdev, non root, iSCSI devices SHOULD be removed
> before network is stopped. The following in /etc/init.d/iscsi is what is
> causing the iscsi scripts to not remove the devices. 

Hi Chris, same issue here.  This is what worked for me:
The MPP driver install actually does handle this situation properly, using the less than elegant method of adding a few commands to /etc/init.d/iscsi in stop(). It adds the following code, between the check for root-on-iscsi and 'iscsiadm -m node --logoutall=all':
        #BEGIN_MPP_ADDITION 
        # added by MPP/RDAC driver to prevent filesystem corruption on mpp iscsi devices.
        if [ -x /opt/mpp/mppiscsi_umountall ] ; then
                /opt/mpp/mppiscsi_umountall -tkur5 
        fi
        #END_MPP_ADDITION

The problem is since the RUNLEVEL check from the comments above has been added, stop() returns before it gets there.  I moved the MPP addition above the RUNLEVEL check so it gets executed before stop() returns, which seems to work.  I'm tempted to remove the RUNLEVEL check so iscsiadm logs out properly, but I'm not sure I want to change more than I have to.

So, the stop() function in /etc/init.d/iscsi on my system starts like this:
stop()
{
        rm -f /var/lock/subsys/iscsi
        #BEGIN_MPP_ADDITION 
        # added by MPP/RDAC driver to prevent filesystem corruption on mpp iscsi devices.
        if [ -x /opt/mpp/mppiscsi_umountall ] ; then
                /opt/mpp/mppiscsi_umountall -tkur5 
        fi
        #END_MPP_ADDITION

        # If this is a final shutdown/halt, do nothing since
        # lvm/dm, md, power path, etc do not always handle this
....

The system reboots properly now, no longer hanging on "Syncing disk cache".

Comment 47 Matt 2010-10-15 18:07:26 UTC

Update:
My above fix worked until I actually had a filesystem mounted, then back to hanging on Syncing disk cache.  The filesystem was mounted with _netdev, so it was unmounted early in the shutdown sequence (checked with a 'mount' to print out during the process).

The next workaround was to revert my above changes and stop the physical interfaces from shutting down, which works.  Is there a disadvantage to leaving the network adapters up until power-off/reboot?

/etc/init.d/network:
246c246,249
<       for i in $vpninterfaces $xdslinterfaces $bridgeinterfaces $vlaninterfaces $remaining; do
---
> # MAP 20101013 - remove 'remaining' set (physical) since it hoses up iscsi
> # shutdown / mpp
>       #for i in $vpninterfaces $xdslinterfaces $bridgeinterfaces $vlaninterfaces $remaining; do
>       for i in $vpninterfaces $xdslinterfaces $bridgeinterfaces $vlaninterfaces ; do


One issue with this would be if the iSCSI route was on a vpn, xdsl, or bridge interface, since those still get shut down.

Comment 48 Mike Christie 2010-10-16 22:56:29 UTC

(In reply to comment #47)
> Update:
> My above fix worked until I actually had a filesystem mounted, then back to
> hanging on Syncing disk cache.  The filesystem was mounted with _netdev, so it
> was unmounted early in the shutdown sequence (checked with a 'mount' to print
> out during the process).
> 
> The next workaround was to revert my above changes and stop the physical
> interfaces from shutting down, which works.  Is there a disadvantage to leaving
> the network adapters up until power-off/reboot?
> 

That is what we were doing prior to RHEL 5.5 which is why we are hitting this problem now. See the patch in comment #38 which leaves the network on if iscsi is running.

Also for nfs and iscsi root we do this now.

Comment 49 Abdel Jalal 2010-10-19 18:31:09 UTC

Mike,

I tried this patch from comment #38 and it worked - Do you know when it will be released?

Comment 50 Mike Christie 2010-10-20 20:59:25 UTC

(In reply to comment #49)
> Mike,
> 
> I tried this patch from comment #38 and it worked - Do you know when it will be
> released?

It looks like it is checked in and being QAd for 5.6.

Comment 51 Abdel Jalal 2010-10-22 23:33:57 UTC

I take my comment #49 back: actually this did not work. I had tried it without mapping any volumes to the host, but once I mapped some volumes and rebooted, the host showed the soft panic below and never came back up; session logout did not help. The host was accessible via ssh. I used that to disable the iscsi ports, then rebooted and it worked, then re-enabled them again and re-established the sessions.

iscsi package version: iscsi-initiator-utils-6.2.0.871-0.16.el5

Oct 22 17:40:15 kswc-warden shutdown[5304]: shutting down for system reboot
Oct 22 17:40:16 kswc-warden kernel: INFO: task events/0:14 blocked for more than 120 seconds.
Oct 22 17:40:16 kswc-warden kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 22 17:40:16 kswc-warden kernel: events/0      D ffff81012ff64000     0    14      1            15    13 (L-TLB)
Oct 22 17:40:16 kswc-warden kernel:  ffff810037f35a40 0000000000000046 ffffffff880755a6 0000000000000000
Oct 22 17:40:16 kswc-warden kernel:  ffff81012ff64000 000000000000000a ffff81012fb4b080 ffff81010271b080
Oct 22 17:40:16 kswc-warden kernel:  000000257f0dc7dd 00000000000033e7 ffff81012fb4b268 0000000000000001
Oct 22 17:40:16 kswc-warden kernel: Call Trace:
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff880755a6>] :scsi_mod:scsi_done+0x0/0x18
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8006417d>] wait_for_completion+0x8f/0xa2
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8008e16d>] default_wake_function+0x0/0xe
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff80064c6f>] __mutex_lock_slowpath+0x60/0x9b
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff80064cb9>] .text.lock.mutex+0xf/0x14
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8009ecdc>] flush_workqueue+0x3f/0x87
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8014e897>] cfq_exit_queue+0x14/0xf4
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8014371a>] elevator_exit+0x29/0x45
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff801461f6>] blk_cleanup_queue+0x37/0x42
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8807d6dd>] :scsi_mod:scsi_device_dev_release_usercontext+0x8f/0xd9
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8009ebb9>] execute_in_process_context+0x23/0x5a
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff801519ef>] kobject_cleanup+0x53/0x7e
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff80151a1a>] kobject_release+0x0/0x9
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff80035748>] kref_put+0x6f/0x7a
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8807c707>] :scsi_mod:scsi_probe_and_add_lun+0x9a0/0x9c9
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8807ac4d>] :scsi_mod:scsi_execute_req+0x78/0xce
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8807d00f>] :scsi_mod:__scsi_scan_target+0x410/0x5c7
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff880cc729>] :mppUpper:mpp_SynchronousIo+0x104/0x13d
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8807d20b>] :scsi_mod:scsi_scan_channel+0x45/0x70
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8807d2f6>] :scsi_mod:scsi_scan_host_selected+0xc0/0xfa
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff882fa9b9>] :mppVhba:mppLnx_vhba_regVirtualHost+0x673/0x691
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff882faaf4>] :mppVhba:mppLnx_register_virtual_hosts+0x11d/0x168
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff882fab81>] :mppVhba:mppLnx_vhbaScanHost+0x42/0x6f
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff882fae9e>] :mppVhba:mppLnx_vdAddWorkHandler+0x2f0/0x32b
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff882fabae>] :mppVhba:mppLnx_vdAddWorkHandler+0x0/0x32b
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8004dc37>] run_workqueue+0x94/0xe4
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8004a472>] worker_thread+0x0/0x122
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8004a562>] worker_thread+0xf0/0x122
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8008e16d>] default_wake_function+0x0/0xe
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff80032bdc>] kthread+0xfe/0x132
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8005efb1>] child_rip+0xa/0x11
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff80032ade>] kthread+0x0/0x132
Oct 22 17:40:16 kswc-warden kernel:  [<ffffffff8005efa7>] child_rip+0x0/0x11
Oct 22 17:40:16 kswc-warden kernel:
Oct 22 17:40:16 kswc-warden kernel: INFO: task hald-probe-seri:4179 blocked for more than 120 seconds.
Oct 22 17:40:16 kswc-warden kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 22 17:40:17 kswc-warden kernel: hald-probe-se D ffff810080057aa0     0  4179   4077          4422  4177 (NOTLB)
Oct 22 17:40:17 kswc-warden kernel:  ffff81012d5f7db8 0000000000000082 0000000000000000 0000000000000001
Oct 22 17:40:17 kswc-warden kernel:  0000000000000296 0000000000000009 ffff81012d968820 ffff81012fc0c7a0
Oct 22 17:40:17 kswc-warden kernel:  00000024695de668 00000000000be794 ffff81012d968a08 000000032e08b180
Oct 22 17:40:17 kswc-warden kernel: Call Trace:
Oct 22 17:40:17 kswc-warden kernel:  [<ffffffff8009ec6f>] flush_cpu_workqueue+0x7f/0xad
Oct 22 17:40:17 kswc-warden kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
Oct 22 17:40:17 kswc-warden kernel:  [<ffffffff80064b05>] mutex_lock+0xd/0x1d
Oct 22 17:40:17 kswc-warden kernel:  [<ffffffff8009ecfd>] flush_workqueue+0x60/0x87
Oct 22 17:40:17 kswc-warden kernel:  [<ffffffff801a9f79>] release_dev+0x503/0x67b
Oct 22 17:40:17 kswc-warden kernel:  [<ffffffff80067b88>] do_page_fault+0x4fe/0x874
Oct 22 17:40:17 kswc-warden kernel:  [<ffffffff80053ca3>] tty_release+0x11/0x1a
Oct 22 17:40:17 kswc-warden kernel:  [<ffffffff80012ac5>] __fput+0xd3/0x1bd
Oct 22 17:40:17 kswc-warden kernel:  [<ffffffff80023bd1>] filp_close+0x5c/0x64
Oct 22 17:40:17 kswc-warden kernel:  [<ffffffff8001dff3>] sys_close+0x88/0xbd
Oct 22 17:40:17 kswc-warden kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
Oct 22 17:40:17 kswc-warden kernel:

Comment 52 Mike Christie 2010-10-25 18:11:54 UTC

(In reply to comment #51)
> I take my comment#49 back: Actually this did not work as I tried it without
> mapping any volumes to the host but once I mapped some volumes and rebooted,
> the host showed the soft panic below and the host never came back up - session
> logout did not help. The host was accessible via ssh. I used that to disable
> the iscsi ports then reboot and it worked then re-enabled them back again and
> re-established the sessions

Did this ever work for you or did the problem just start in RHEL 5.5? We never logged out of sessions before. In RHEL 5.4 and before we just left them running and the network up. In RHEL 5.5 we brought down the network. The patch in this bz is just adding back the behavior of leaving the network up.

> 
> Oct 22 17:40:15 kswc-warden shutdown[5304]: shutting down for system reboot

> :scsi_mod:__scsi_scan_target+0x410/0x5c7

Why are you scanning the target at shutdown?

Comment 54 Jaromir Hradilek 2010-11-12 16:03:03 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Prior to this update, an attempt to reboot or shut down a system with a running Internet Small Computer System Interface (iSCSI) daemon may have caused the system to stop responding. This was caused by the fact that the system was waiting for iSCSI devices to sync, even though the network was already shut down. With this update, the /etc/rc.d/init.d/network startup script has been modified not to deactivate network interfaces when the iSCSI daemon is running, and the system can be shut down or rebooted as expected.

Comment 55 Charlie Brady 2010-11-26 22:52:29 UTC

To avoid this:

....
Shutting down system logger:
find: /sys/class/iscsi_session/: No such file or directory
Shutting down interface eth0:
...

You could replace

if [ `find /sys/class/iscsi_session/ -mindepth 1 -maxdepth 1 -type d | wc -l` -ge 1 ]; then

with:

if [ $(ls -d /sys/class/iscsi_session/*/. 2>/dev/null | wc -l) -ge 1 ]; then

Comment 60 errata-xmlrpc 2011-01-13 23:06:09 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0075.html
