Bug 583218
Summary: | iscsid preventing machine shutdown or reboot | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Matt Clark <mattjclark0407> | |
Component: | initscripts | Assignee: | initscripts Maintenance Team <initscripts-maint-list> | |
Status: | CLOSED ERRATA | QA Contact: | qe-baseos-daemons | |
Severity: | urgent | Docs Contact: | ||
Priority: | urgent | |||
Version: | 5.5 | CC: | abdel.jalal, abdel.sadek, aveseb, bloch, bmr, bugzilla, cbuissar, cecilhsujp, charlieb-fedora-bugzilla, chris, coughlan, ctatman, cww, davdunc, dl-iop-bugzilla, gru, harald, james, jeff_burdette, jplans, lajko.attila, matthew.piechota, mbarker, mchristi, moshiro, mr_w, notting, pep, pveiga, rmusil, robin, rob, shiyer, spojenie, syeghiay, tao, tumeya, vchepkov, vogel, wwlinuxengineering, yuji.furui | |
Target Milestone: | rc | Keywords: | ZStream | |
Target Release: | --- | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Prior to this update, an attempt to reboot or shut down a system with a running Internet Small Computer System Interface (iSCSI) daemon may have caused the system to stop responding. This was caused by the fact that the system was waiting for iSCSI devices to sync, even though the network was already shut down. With this update, the /etc/rc.d/init.d/network startup script has been modified not to deactivate network interfaces when the iSCSI daemon is running, and the system can be shut down or rebooted as expected.
|
Story Points: | --- | |
Clone Of: | ||||
: | 713162 (view as bug list) | Environment: | ||
Last Closed: | 2011-01-13 23:06:09 UTC | Type: | --- | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 630538 | |||
Attachments: |
Description
Matt Clark
2010-04-17 05:37:17 UTC
Created attachment 407234 [details]
screenshot of shutdown when using the redhat 5.4 iscsid script
This screen shot is from a fresh build of redhat 5.5 with only the /etc/init.d/iscsid script replaced with the one from the redhat 5.4 iscsi-initiator-utils.
(In reply to comment #0)
> Is there something I can do to avoid this?
> 13-APR-2010 03:52:48 Matt Clark
> File iscsi-cache-issue.bmp attached.

You can work around this problem by changing the cache settings on the target so that it does not require a cache sync to be sent on shutdown. I think you would set the cache settings to something like write-through. If this is not possible, I think you can run chkconfig --level 06 network off by hand. However, I am working on a fix and should be done shortly. I think all we need is an iscsiadm -m node --logoutall=all call added to the "stop" section of the /etc/init.d/iscsi script, but I have to double-check that for boot it is setting node.startup=boot, so that boot/root sessions do not get shut down too.

I am a bit lacking in understanding of how iscsiadm persistency works, so this may be an irrelevant question, but wouldn't that mean you would have to log in to each of the iSCSI portals again at boot? Or does the automatic login still function as a result of the entries in /var/lib/iscsi/send_targets? I don't have a test machine to play with for the next couple of days, so I can't try this myself...

(In reply to comment #3)
> I am a bit lacking in the understanding of how the iscsiadm persistency works,
> so this may be an irrelevant question but wouldn't that mean you would have to
> re-login to the each of the iscsi portals at boot? Or does the automatic login

We already log into all the targets at boot. Currently, when you shut down or reboot, the session does not get a complete shutdown. No iSCSI logout is sent, but the disks are synced if needed. On startup, the initiator sends a login command, and the target either recognizes this as a continuation of the old session or starts a new one if it has cleaned up the old one.

Just wanted to update with some status. My first fix, the one I tried in comment #2, broke setups that did iSCSI root. I thought they used the startup=boot flag, but they do not.
I am working on a more complex fix.

*** Bug 590173 has been marked as a duplicate of this bug. ***

Hi,

I am still working on a fix for this. I just wanted to add a temporary workaround. You can run the same commands that the iscsi script was running. However, you only need to run them when you have made changes to the net init scripts (for example, when you update your system or the initscripts rpm). The iscsi scripts ran them every time the iscsi script ran, in case a user updated the net init script settings after installing the iSCSI tools.

So after you have installed iscsi-initiator-utils and the init scripts, just run:

chkconfig --level 06 network off
rm /etc/rc0.d/*network
rm /etc/rc6.d/*network

*** Bug 584912 has been marked as a duplicate of this bug. ***

Created attachment 429730 [details]
Patch for /etc/init.d/network to check iSCSI sessions.
Hi,
I made a modification in /etc/init.d/network to check whether there is an existing iSCSI session during reboot/shutdown. If there is one, the network service does not stop.
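For illustration, here is a minimal sketch of this kind of check. It is not the attached patch itself. It counts session directories with a shell glob, so it tolerates several open sessions as well as a missing /sys/class/iscsi_session directory. The function names count_iscsi_sessions and iscsi_sessions_active are hypothetical, and the session* entry naming is an assumption based on the sysfs paths quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of initscripts): count iSCSI session
# directories below a sysfs-style path. The path can be overridden for testing.
count_iscsi_sessions() {
    dir=${1:-/sys/class/iscsi_session}
    n=0
    for d in "$dir"/session*; do
        # An unmatched glob stays literal, so test every candidate.
        [ -d "$d" ] && n=$((n + 1))
    done
    echo "$n"
}

# Hypothetical wrapper: succeeds when at least one session directory exists.
iscsi_sessions_active() {
    [ "$(count_iscsi_sessions "$1")" -ge 1 ]
}
```

A stop path could then guard interface shutdown with something like `if iscsi_sessions_active; then ...; fi`. Looping over the glob avoids passing several matches to a single `[ -d ... ]` test.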
(In reply to comment #10)
> Created an attachment (id=429730) [details]
> Patch for /etc/init.d/network to check iSCSI sessions.
>
> Hi,
>
> I made a modification in a /etc/init.d/network to check if there is an existing
> iSCSI session during reboot/shutdown. If there is one, the network service does
> not stop.

Nice. Thanks for the patch. I will check with the net scripts maintainer to see if it is ok with them. It seems to handle all the setups/scenarios.

(In reply to comment #10)
> Created an attachment (id=429730) [details]
> Patch for /etc/init.d/network to check iSCSI sessions.
>
> Hi,
>
> I made a modification in a /etc/init.d/network to check if there is an existing
> iSCSI session during reboot/shutdown. If there is one, the network service does
> not stop.

A good patch, but it fails if there is more than one iSCSI session open. This would do the job for multiple sessions:

if [ `find /sys/class/iscsi_session/ -mindepth 1 -maxdepth 1 -type d | wc -l` -ge 1 ]; then

I can confirm that

(In reply to comment #15)
> (In reply to comment #10)
> > Created an attachment (id=429730) [details]
> > Patch for /etc/init.d/network to check iSCSI sessions.
> >
> > Hi,
> >
> > I made a modification in a /etc/init.d/network to check if there is an existing
> > iSCSI session during reboot/shutdown. If there is one, the network service does
> > not stop.
>
> A good patch, but fails if there is more than one iSCSI session open.
> This would do the job for multiple sessions:
>
>
> if [ `find /sys/class/iscsi_session/ -mindepth 1 -maxdepth 1 -type d | wc -l`
> -ge 1 ]; then

Yes, it's true. I tested your patch with "find /sys/class/iscsi_session/ -mindepth 1 -maxdepth 1 -type d | wc -l" and it works very well. I use an MSA2312i with 4 sessions.

If I use this:

[ -d /sys/class/iscsi_session/session* ] && echo "OK"

I get:

-bash: [: too many arguments

Is there any chance that this patch will appear in the next release?
(In reply to comment #16)
> Is there any chance that this patch we'll appear in a next release.

For the net script patch in this bz, I am waiting on the init script maintainer to review the patch and ok it.

I made an iscsi-initiator-utils z-stream release that adds some code to turn off the network shutdown when the iscsi rpm is installed (it basically does what the iscsi init script was doing before). It is not perfect and is not a complete fix, but it is better than what we have now. It is being tested now. Hopefully it will just be a band-aid until we hear back from the init script maintainer.

(In reply to comment #17)
> I made a iscsi-initiator-utils z stream release that added some code to turn
> off the network shutdown when iscsi rpm is installed (basically does what the
> iscsi init script was doing before).

Oh yeah, I put the rpm I mentioned here:
http://people.redhat.com/mchristi/iscsi/rhel5.6/iscsi-initiator-utils/

I just moved my iSCSI setup to a CentOS 5.5 system (the 5.4->5.5 system was fine), ran into this bug, and tried iscsi-initiator-utils-6.2.0.871-0.18.el5.x86_64.rpm per comment 18, but it failed to fix the issue. I'm hanging on reboot while syncing the SCSI cache for sde, which is in /etc/fstab as:

/dev/sde1 /b xfs noatime,_netdev,nodev 0 4

Boot-up is fine: iscsi logs in and /b is mounted. Root is on local disk.

What does:

chkconfig --list network

output? Are there /etc/rc0.d/*network or /etc/rc6.d/*network links? If you run:

chkconfig --level 06 network off
rm /etc/rc0.d/*network
rm /etc/rc6.d/*network

by hand, does it work (try several reboots to make sure something was not resetting the network init scripts to on)?
chkconfig --list network
network 0:off 1:off 2:on 3:on 4:on 5:on 6:off

ls -1 /etc/rc[06].d/*network
/etc/rc0.d/K90network
/etc/rc6.d/K90network

sudo chkconfig --level 06 network off
ls -1 /etc/rc[06].d/*network
/etc/rc0.d/K90network
/etc/rc6.d/K90network

sudo rm /etc/rc[06].d/*network
ls -1 /etc/rc[06].d/*network
ls: /etc/rc[06].d/*network: No such file or directory

Rebooted twice (the first time was okay) with no hang. It seems the /etc/rc[06].d/*network links need to be removed manually. Thanks, Mike! Let me know if I can be of further assistance in coming to the final resolution.

Adding Dell's request for a 5.5.z and RHEL 5.6 fix. Dell will be testing the fix.

We are also seeing this issue with a Dell MD3000i using their delivered MPP multi-path drivers. The patch above using the find variation fixed the problem. It seems to me that these _netdev, non-root iSCSI devices SHOULD be removed before the network is stopped. The following in /etc/init.d/iscsi is what is causing the iscsi scripts to not remove the devices.

# If this is a final shutdown/halt, do nothing since
# lvm/dm, md, power path, etc do not always handle this
if [ "$RUNLEVEL" = "6" -o "$RUNLEVEL" = "0" -o "$RUNLEVEL" = "1" ]; then
    success
    return
fi

Which script should be monitoring these network-dependent devices? Not sure. Should we just leave the network up until the plug is pulled? Thanks for all the info! If there is anything I can do, or any information I can provide, please let me know.

-Chris

Created attachment 449573 [details]
Don't turn off net on shutdown if iscsi is running

This combines the patch from comment #10 with the comment from #15. initscript devs, is this patch ok? The previous fix I tried in this bz does not work when the initscripts are installed before iscsi.

Given that we silently exit regardless of the runlevel if root is on a network block device, not sure why we'd test the runlevel here. But it's a reasonable fix.
I've tried the patch from comment 38, and it works very well on reboot (I used shutdown -r now). I just patched my /etc/init.d/network and rebooted my host.

Just my two cents:

I remember that even when shutdown worked on the client, the TCP connections (or the iSCSI-level connections too) were, according to the target, not closed, preventing target reboot.

Are they closed properly now?

(In reply to comment #43)
> Just my two cents:
>
> I remember that even when shutdown worked on the client, the TCP connections
> (or on ISCSI level too) according to target were not closed, preventing target
> reboot.
>
> Are they closed properly now?

No. In RHEL 6 we do an explicit logout on shutdown/reboot, but in RHEL 5 we still leave them open because apps using iSCSI are not prepared for the devices to be removed (in RHEL 5, apps assumed it would work like Fibre Channel, where the /dev/sdX devices do not get removed during shutdown/reboot).

(In reply to comment #31)
> We are also seeing this issue with a Dell MD3000i using their delivered MPP
> multi-path drivers. The patch above using the find variation fixed the problem.
> It seems to me that these _netdev, non root, iSCSI devices SHOULD be removed
> before network is stopped. The following in /etc/init.d/iscsi is what is
> causing the iscsi scripts to not remove the devices.

Hi Chris, same issue here. This is what worked for me:

The MPP driver install actually does handle this situation properly, using the less-than-elegant method of adding a few commands to stop() in /etc/init.d/iscsi. It adds the following code between the check for root-on-iscsi and 'iscsiadm -m node --logoutall=all':

#BEGIN_MPP_ADDITION
# added by MPP/RDAC driver to prevent filesystem corruption on mpp iscsi devices.
if [ -x /opt/mpp/mppiscsi_umountall ] ; then
    /opt/mpp/mppiscsi_umountall -tkur5
fi
#END_MPP_ADDITION

The problem is that, since the RUNLEVEL check from the comments above was added, stop() returns before it gets there.
I moved the MPP addition above the RUNLEVEL check so it gets executed before stop() returns, which seems to work. I'm tempted to remove the RUNLEVEL check so iscsiadm logs out properly, but I'm not sure I want to change more than I have to. So, the stop() function in /etc/init.d/iscsi on my system starts like this:

stop() {
    rm -f /var/lock/subsys/iscsi

    #BEGIN_MPP_ADDITION
    # added by MPP/RDAC driver to prevent filesystem corruption on mpp iscsi devices.
    if [ -x /opt/mpp/mppiscsi_umountall ] ; then
        /opt/mpp/mppiscsi_umountall -tkur5
    fi
    #END_MPP_ADDITION

    # If this is a final shutdown/halt, do nothing since
    # lvm/dm, md, power path, etc do not always handle this
    ....

The system reboots properly now, no longer hanging on "Syncing disk cache".

Update:
My above fix worked until I actually had a filesystem mounted, then back to hanging on Syncing disk cache. The filesystem was mounted with _netdev, so it was unmounted early in the shutdown sequence (checked with a 'mount' to print out during the process).
The next workaround was to revert my above changes and stop the physical interfaces from shutting down, which works. Is there a disadvantage to leaving the network adapters up until power-off/reboot?
/etc/init.d/network:
246c246,249
< for i in $vpninterfaces $xdslinterfaces $bridgeinterfaces $vlaninterfaces $remaining; do
---
> # MAP 20101013 - remove 'remaining' set (physical) since it hoses up iscsi
> # shutdown / mpp
> #for i in $vpninterfaces $xdslinterfaces $bridgeinterfaces $vlaninterfaces $remaining; do
> for i in $vpninterfaces $xdslinterfaces $bridgeinterfaces $vlaninterfaces ; do
One issue with this would be if the iSCSI route was on a vpn, xdsl, or bridge interface, since those still get shut down.
(In reply to comment #47)
> Update:
> My above fix worked until I actually had a filesystem mounted, then back to
> hanging on Syncing disk cache. The filesystem was mounted with _netdev, so it
> was unmounted early in the shutdown sequence (checked with a 'mount' to print
> out during the process).
>
> The next workaround was to revert my above changes and stop the physical
> interfaces from shutting down, which works. Is there a disadvantage to leaving
> the network adapters up until power-off/reboot?
>

That is what we were doing prior to RHEL 5.5, which is why we are hitting this problem now. See the patch in comment #38, which leaves the network on if iscsi is running. We also do this now for NFS and iSCSI root.

Mike,

I tried this patch from comment #38 and it worked. Do you know when it will be released?

(In reply to comment #49)
> Mike,
>
> I tried this patch from comment #38 and it worked - Do you know when it will be
> released?

It looks like it is checked in and being QA'd for 5.6.

I take my comment #49 back. It only appeared to work because I tried it without mapping any volumes to the host. Once I mapped some volumes and rebooted, the host showed the soft panic below and never came back up; session logout did not help. The host was still accessible via ssh, so I used that to disable the iSCSI ports and then rebooted, which worked. Afterwards I re-enabled the ports and re-established the sessions.

iscsi package version: iscsi-initiator-utils-6.2.0.871-0.16.el5

Oct 22 17:40:15 kswc-warden shutdown[5304]: shutting down for system reboot
Oct 22 17:40:16 kswc-warden kernel: INFO: task events/0:14 blocked for more than 120 seconds.
Oct 22 17:40:16 kswc-warden kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 22 17:40:16 kswc-warden kernel: events/0 D ffff81012ff64000 0 14 1 15 13 (L-TLB)
Oct 22 17:40:16 kswc-warden kernel: ffff810037f35a40 0000000000000046 ffffffff880755a6 0000000000000000
Oct 22 17:40:16 kswc-warden kernel: ffff81012ff64000 000000000000000a ffff81012fb4b080 ffff81010271b080
Oct 22 17:40:16 kswc-warden kernel: 000000257f0dc7dd 00000000000033e7 ffff81012fb4b268 0000000000000001
Oct 22 17:40:16 kswc-warden kernel: Call Trace:
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff880755a6>] :scsi_mod:scsi_done+0x0/0x18
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8006417d>] wait_for_completion+0x8f/0xa2
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8008e16d>] default_wake_function+0x0/0xe
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff80064c6f>] __mutex_lock_slowpath+0x60/0x9b
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff80064cb9>] .text.lock.mutex+0xf/0x14
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8009ecdc>] flush_workqueue+0x3f/0x87
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8014e897>] cfq_exit_queue+0x14/0xf4
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8014371a>] elevator_exit+0x29/0x45
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff801461f6>] blk_cleanup_queue+0x37/0x42
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8807d6dd>] :scsi_mod:scsi_device_dev_release_usercontext+0x8f/0xd9
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8009ebb9>] execute_in_process_context+0x23/0x5a
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff801519ef>] kobject_cleanup+0x53/0x7e
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff80151a1a>] kobject_release+0x0/0x9
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff80035748>] kref_put+0x6f/0x7a
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8807c707>] :scsi_mod:scsi_probe_and_add_lun+0x9a0/0x9c9
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8807ac4d>] :scsi_mod:scsi_execute_req+0x78/0xce
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8807d00f>] :scsi_mod:__scsi_scan_target+0x410/0x5c7
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff880cc729>] :mppUpper:mpp_SynchronousIo+0x104/0x13d
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8807d20b>] :scsi_mod:scsi_scan_channel+0x45/0x70
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8807d2f6>] :scsi_mod:scsi_scan_host_selected+0xc0/0xfa
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff882fa9b9>] :mppVhba:mppLnx_vhba_regVirtualHost+0x673/0x691
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff882faaf4>] :mppVhba:mppLnx_register_virtual_hosts+0x11d/0x168
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff882fab81>] :mppVhba:mppLnx_vhbaScanHost+0x42/0x6f
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff882fae9e>] :mppVhba:mppLnx_vdAddWorkHandler+0x2f0/0x32b
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff882fabae>] :mppVhba:mppLnx_vdAddWorkHandler+0x0/0x32b
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8004dc37>] run_workqueue+0x94/0xe4
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8004a472>] worker_thread+0x0/0x122
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8004a562>] worker_thread+0xf0/0x122
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8008e16d>] default_wake_function+0x0/0xe
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff80032bdc>] kthread+0xfe/0x132
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8005efb1>] child_rip+0xa/0x11
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff80032ade>] kthread+0x0/0x132
Oct 22 17:40:16 kswc-warden kernel: [<ffffffff8005efa7>] child_rip+0x0/0x11
Oct 22 17:40:16 kswc-warden kernel:
Oct 22 17:40:16 kswc-warden kernel: INFO: task hald-probe-seri:4179 blocked for more than 120 seconds.
Oct 22 17:40:16 kswc-warden kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 22 17:40:17 kswc-warden kernel: hald-probe-se D ffff810080057aa0 0 4179 4077 4422 4177 (NOTLB)
Oct 22 17:40:17 kswc-warden kernel: ffff81012d5f7db8 0000000000000082 0000000000000000 0000000000000001
Oct 22 17:40:17 kswc-warden kernel: 0000000000000296 0000000000000009 ffff81012d968820 ffff81012fc0c7a0
Oct 22 17:40:17 kswc-warden kernel: 00000024695de668 00000000000be794 ffff81012d968a08 000000032e08b180
Oct 22 17:40:17 kswc-warden kernel: Call Trace:
Oct 22 17:40:17 kswc-warden kernel: [<ffffffff8009ec6f>] flush_cpu_workqueue+0x7f/0xad
Oct 22 17:40:17 kswc-warden kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
Oct 22 17:40:17 kswc-warden kernel: [<ffffffff80064b05>] mutex_lock+0xd/0x1d
Oct 22 17:40:17 kswc-warden kernel: [<ffffffff8009ecfd>] flush_workqueue+0x60/0x87
Oct 22 17:40:17 kswc-warden kernel: [<ffffffff801a9f79>] release_dev+0x503/0x67b
Oct 22 17:40:17 kswc-warden kernel: [<ffffffff80067b88>] do_page_fault+0x4fe/0x874
Oct 22 17:40:17 kswc-warden kernel: [<ffffffff80053ca3>] tty_release+0x11/0x1a
Oct 22 17:40:17 kswc-warden kernel: [<ffffffff80012ac5>] __fput+0xd3/0x1bd
Oct 22 17:40:17 kswc-warden kernel: [<ffffffff80023bd1>] filp_close+0x5c/0x64
Oct 22 17:40:17 kswc-warden kernel: [<ffffffff8001dff3>] sys_close+0x88/0xbd
Oct 22 17:40:17 kswc-warden kernel: [<ffffffff8005e28d>] tracesys+0xd5/0xe0
Oct 22 17:40:17 kswc-warden kernel:

(In reply to comment #51)
> I take my comment#49 back: Actually this did not work as I tried it without
> mapping any volumes to the host but once I mapped some volumes and rebooted,
> the host showed the soft panic below and the host never came back up - session
> logout did not help. The host was accessible via ssh. I used that to disable
> the iscsi ports then reboot and it worked then renabled them back again and
> restablish the sessions

Did this ever work for you, or did the problem just start in RHEL 5.5? We never logged out of sessions before.
In RHEL 5.4 and earlier, we just left them running and the network up. In RHEL 5.5 we brought down the network. The patch in this bz just adds back the behavior of leaving the network up.

>
> Oct 22 17:40:15 kswc-warden shutdown[5304]: shutting down for system reboot

> :scsi_mod:__scsi_scan_target+0x410/0x5c7

Why are you scanning the target at shutdown?

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Prior to this update, an attempt to reboot or shut down a system with a running Internet Small Computer System Interface (iSCSI) daemon may have caused the system to stop responding. This was caused by the fact that the system was waiting for iSCSI devices to sync, even though the network was already shut down. With this update, the /etc/rc.d/init.d/network startup script has been modified not to deactivate network interfaces when the iSCSI daemon is running, and the system can be shut down or rebooted as expected.

To avoid this:

....
Shutting down system logger: find: /sys/class/iscsi_session/: No such file or directory
Shutting down interface eth0:
...

You could replace

if [ `find /sys/class/iscsi_session/ -mindepth 1 -maxdepth 1 -type d | wc -l` -ge 1 ]; then

with:

if [ $(ls -d /sys/class/iscsi_session/*/. 2>/dev/null | wc -l) -ge 1 ]; then

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0075.html