Steps to Reproduce:
1. Install a Xen guest and include the iscsi-initiator-utils package.
2. Use iscsiadm to discover, log in to, and access an iSCSI target.
3. Sync disks.
4. Hard-reboot the domain: xm destroy dom1
5. Start up the domain: xm create dom1
6. Make sure the iSCSI device shows up in /proc/scsi/scsi on the guest.
7. Shut down the guest: xm shutdown dom1
8. Monitor progress: xm console dom1
9. The system never disappears from the output of: xm list

Actual results:
Near the end of the shutdown process you see:

...
Shutting down system logger: [ OK ]
Shutting down hidd: [ OK ]
Stopping iSCSI initiator service: KERNEL: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (145)
[ OK ]
Shutting down interface eth0: [ OK ]
Shutting down interface eth1: [ OK ]
...
Turning off swap:
Turning off quotas:
Unmounting pipe file systems:
Unmounting file systems:
Halting system...
md: stopping all md devices.
Synchronizing SCSI cache for disk sda:
iscsi: can not broadcast skb (-3)
connection0:0: iscsi: detected conn error (1011)
-hangs-

Expected results:
Near the end of the shutdown process you see:

...
Shutting down system logger: [ OK ]
Shutting down hidd: [ OK ]
Stopping iSCSI initiator service: [ OK ]
Shutting down interface eth0: [ OK ]
Shutting down interface eth1: [ OK ]
...
Turning off swap:
Turning off quotas:
Unmounting pipe file systems:
Unmounting file systems:
Halting system...
md: stopping all md devices.
-domain disappears from the output of: xm list-
(In reply to comment #0)
> Stopping iSCSI initiator service: KERNEL: assertion
> (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (145)
> [ OK ]

This assert is fixed now. It is not the cause of your problem. I am just writing this so people reading the BZ do not get worried about it.

> Shutting down interface eth0: [ OK ]
> Shutting down interface eth1: [ OK ]
> ...

This may be the problem. Are these the interfaces the iSCSI traffic goes through? If so, that is why the shutdown hangs further down.

> Turning off swap:
> Turning off quotas:
> Unmounting pipe file systems:
> Unmounting file systems:
> Halting system...
> md: stopping all md devices.
> Synchronizing SCSI cache for disk sda:
> iscsi: can not broadcast skb (-3)
> connection0:0: iscsi: detected conn error (1011)
>

Down here the network is not up, but the iSCSI disks are still running, so we are sort of screwed. During kernel shutdown the SCSI layer sends a cache sync command to each disk and then waits for it to finish. But because the network is down we cannot send iSCSI commands, and since userspace is not up, iscsid cannot handle the error and fail the command. So we are stuck, and we wait around forever.

So currently you have to manually disable the networking shutdown.

Bill or Miloslav,

A while back I proposed something like this:

--- S00killall.orig	2006-05-03 11:29:17.000000000 -0500
+++ S00killall.work	2006-05-03 11:29:48.000000000 -0500
@@ -20,8 +20,10 @@ for i in /var/lock/subsys/* ; do
     # Get the subsystem name.
     subsys=${i#/var/lock/subsys/}
 
-    # Networking could be needed for NFS root.
+    # Networking could be needed for NFS root or services like raid
+    # or multipath over iscsi
     [ $subsys = network ] && continue
+    [ $subsys = iscsi ] && continue
 
     # Bring the subsystem down.
     if [ -f /etc/init.d/$subsys.init ]; then

But you guys did not like it. Are you still against it?
For RHEL5/FC6 we have to support iSCSI root boot and iSCSI with lots of stuff (RAID, multipath) on top of it, so it would be nice to make this work without the user having to touch anything. Should I instead do something like this in the iscsi script?

+       # we do not want iscsi or network to be stopped during system
+       # shutdown in case there are RAID or multipath devices using
+       # iscsi disks
+       chkconfig --level 06 network off
+       rm /etc/rc0.d/*network
+       rm /etc/rc6.d/*network
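To try the link-removal part of this proposal without touching a live system, it can be wrapped in a function that takes an alternate root directory. This is only a sketch: remove_network_kill_links is a hypothetical name, not part of any package, and the chkconfig step is left out because it needs the real /etc/rc.d tree.

```bash
# Hypothetical helper: delete the *network links from the halt (rc0.d) and
# reboot (rc6.d) runlevel directories under ROOT (default "/"), so that the
# network is not stopped before iSCSI disks are flushed at shutdown.
remove_network_kill_links() {
    local root="${1:-/}"
    local dir link
    for dir in "$root/etc/rc0.d" "$root/etc/rc6.d"; do
        [ -d "$dir" ] || continue
        for link in "$dir"/*network; do
            # An unmatched glob stays literal; skip it unless it is a real
            # file or a (possibly dangling) symlink.
            [ -e "$link" ] || [ -L "$link" ] || continue
            rm -f -- "$link"
            echo "removed $link"
        done
    done
}
```

For example, `remove_network_kill_links /tmp/fakeroot` only touches links under /tmp/fakeroot/etc/rc0.d and /tmp/fakeroot/etc/rc6.d.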
Are your filesystems mounted with the _netdev option?
No, they're not mounted with _netdev. The only iSCSI device (sda) contains a GFS2 filesystem, which was mounted at the time the shutdown was issued via xm. However, my understanding is that the GFS2 service should have come down before the iscsi and network services. That means the filesystem should no longer have been mounted when iscsi was shut down (the GFS2 service should have unmounted it). It seems to me that, for iSCSI devices, we should do the final sync when the iscsi service shuts down, and somehow make bloody sure it also removes any hooks that could cause a final SCSI sync on any iSCSI devices later on, since those are guaranteed to fail. That's just my non-developer opinion though :)
The reason I asked about _netdev is that it is the option the network scripts use to decide whether or not to shut down the network. It's somewhat tangential to the iscsi/gfs shutdown order, but it's needed so that the network script knows you have a network FS.
I'll try setting it and see what happens.
Uh-oh... I get:

kernel: GFS2: fsid=: unknown option: _netdev
kernel: GFS2: fsid=: invalid mount option(s)
kernel: GFS2: can't parse mount arguments

Though this seems like a separate problem. Shall I open another bug for it, or is it a known problem?
Is this being mounted by mount(8), or mount(2)?
mount(8), those were the entries in /var/log/messages.
Argh, I wasn't very clear, and sorry about the delay. _netdev is used in fstab to characterize network filesystems. Is this fs in fstab, or mounted by hand?
Yes, I put _netdev in fstab when I got that GFS2 error, though I think those GFS2 errors are a separate problem. The filesystem was in fstab and it was being mounted by hand. Wouldn't it be valid to use _netdev with an ext3 filesystem over iSCSI in a similar manner? Unfortunately my reproducer for this problem has been formatted and reinstalled. However, it should be fairly easy to recreate the setup in the lab.
I wonder whether this should be cloned to RHEL 5 final. A RHEL 5 machine with a GFS2 FS on an iSCSI target still does not reboot. I haven't yet found an easy workaround other than a manual shutdown.
Was this ever solved?
In Red Hat 5.3 this code was present in init.d/iscsid:

chkconfig --level 06 network off
rm -f /etc/rc0.d/*network
rm -f /etc/rc6.d/*network

But in 5.5 it was removed, and now the system hangs during shutdown. Was another solution introduced in 5.5 that I have to enable?

Thanks
For properly configured iscsi devices (with _netdev/_rnetdev set, etc.) the network scripts will not shut down networking.
Well, that would imply they have to be mounted from /etc/fstab. What about autofs? Pacemaker?
By my reading of /etc/init.d/network in RHEL 5.5 the network scripts check /proc/mounts and/or /etc/mtab so the situations you're concerned about should be covered (assuming autofs/pacemaker specify _netdev option for iscsi filesystems).
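The _netdev check described above can be approximated with a small shell function. This is a sketch under stated assumptions, not the actual code of /etc/init.d/network: has_netdev_mount is a hypothetical name, and it reads /etc/mtab by default because _netdev is a userspace-only option. mount(8) records it in /etc/mtab (when, as on RHEL 5, mtab is a regular file), but it generally does not reach the kernel, so /proc/mounts does not list it.

```bash
# Hypothetical helper: succeed (exit 0) if any entry in a mtab-style file
# (default /etc/mtab) carries the _netdev option in its 4th (options) field.
has_netdev_mount() {
    local mounts="${1:-/etc/mtab}"
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "_netdev") found = 1   # exact option match only
    }
    END { exit !found }' "$mounts"
}
```

A shutdown script could then skip stopping networking with something like `if has_netdev_mount; then ...; fi`.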
But it might not even be a file system, just a block device for the sbd daemon, for example. If you just log in to the targets and don't do anything else, you won't be able to reboot your server. That's what is happening to me, for example:

Signaling Corosync Cluster Engine (corosync) to terminate: [ OK ]
Waiting for corosync services to unload:..[ OK ]
Stopping sshd: [ OK ]
Shutting down sm-client: [ OK ]
Shutting down sendmail: [ OK ]
Shutting down ntpd: [ OK ]
Stopping system message bus: [ OK ]
Shutting down kernel logger: [ OK ]
Shutting down system logger: [ OK ]
Shutting down interface eth0: [ OK ]
Shutting down interface eth1: [ OK ]
Shutting down loopback interface: [ OK ]
Starting killall: [ OK ]
Sending all processes the TERM signal...
Sending all processes the KILL signal...
Saving random seed:
Syncing hardware clock to system time
Cannot access the Hardware Clock via any known method.
Use the --debug option to see the details of our search for an access method.
Turning off swap:
Please stand by while rebooting the system...
md: stopping all md devices.
Synchronizing SCSI cache for disk sdb:
connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4299361323, last ping 4299362602, now 4299363876
connection1:0: detected conn error (1011)

That's it, the system is toasted.
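A raw block device used by, e.g., sbd never appears in /etc/mtab, so a mount-based check cannot see it. One hedged alternative is to look for logged-in iSCSI sessions directly. The sketch below assumes the kernel's iSCSI transport class exposes one entry per session under /sys/class/iscsi_session (e.g. session1). has_iscsi_sessions is a hypothetical name, and this is not what the RHEL 5 scripts do.

```bash
# Hypothetical helper: succeed (exit 0) if at least one iSCSI session entry
# exists in DIR (default /sys/class/iscsi_session), i.e. some target is still
# logged in even if nothing from it is mounted.
has_iscsi_sessions() {
    local dir="${1:-/sys/class/iscsi_session}"
    local s
    for s in "$dir"/session*; do
        # An unmatched glob stays literal and fails the -e test.
        [ -e "$s" ] && return 0
    done
    return 1
}
```

A shutdown script could keep networking up whenever `has_iscsi_sessions` succeeds, which would also cover logged-in targets that are only used as raw block devices.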
Vadym, You are hitting this: https://bugzilla.redhat.com/show_bug.cgi?id=583218
This bug/component is not included in scope for RHEL-5.11.0 which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of RHEL5.11 development phase (Apr 22, 2014)). Please contact your account manager or support representative in case you need to escalate this bug.
Based on comment 14, and since this is a documented configuration requirement, I think we can just close this.