Created attachment 468237: screenshot of shutdown problem
Description of problem:
The problem seems to be similar to Bug 583218 on RHEL5.
How reproducible:
Always
Steps to Reproduce:
1. Activate an iSCSI volume on RHEL6.
2. Run the reboot command.
Actual results:
The system hangs during shutdown; the last line shown is "Shutting down loopback interface".
Expected results:
The system reboots.
I think this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=634021
To log in to the target, did you run "service iscsi start", or did you log in to the target manually with iscsiadm -m node ... -l/-L?
I'm not sure I understand your question. If you're asking how I activate iSCSI, the answer is "service iscsi start", run automatically at boot:
# chkconfig --list|grep scs
iscsi 0:off 1:off 2:off 3:on 4:on 5:on 6:off
iscsid 0:off 1:off 2:off 3:on 4:on 5:on 6:off
You can start the iscsi service with "service iscsi start", and that will log into the target portals found in the iscsi db. But you can also log into targets manually by just running iscsiadm. So if you have discovered some portals that are not yet logged into, you can run
iscsiadm -m node -T target -p ip -l
or
iscsiadm -m node -l
and this will log into the target portals. There is a bug with this, though: if you log in by running iscsiadm by hand, then when you reboot or run "service iscsi stop", the sessions might not get logged out. If you are rebooting, that can cause a hang.
So, just to confirm that it is iscsi causing the problem, run
iscsiadm -m node -u
before you reboot, then reboot and report whether it hangs. Also, you are not doing iscsi root, right?
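A minimal bash sketch of the check described above, assuming `iscsiadm -m session` prints one line per session in the form `tcp: [1] 172.30.2.2:3260,1 iqn...`. The function names are hypothetical; only the `iscsiadm` invocations come from this thread.

```shell
#!/bin/bash
# Hypothetical helpers for ruling iSCSI in or out as the cause of a
# shutdown hang: log out of every session, then confirm none remain.

# Count sessions in `iscsiadm -m session` output read from stdin.
# Assumes each session line starts like "tcp: [1] 172.30.2.2:3260,1 iqn...".
# grep -c prints 0 (and exits 1) when nothing matches, hence "|| true".
count_iscsi_sessions() {
    grep -c '^[a-z_]*: \[[0-9]*]' || true
}

# Log out of all node records, then verify that no session is left.
logout_all_iscsi() {
    iscsiadm -m node -u
    local remaining
    remaining=$(iscsiadm -m session 2>/dev/null | count_iscsi_sessions)
    if [ "$remaining" -ne 0 ]; then
        echo "still $remaining iSCSI session(s) active" >&2
        return 1
    fi
}
```

For example, `logout_all_iscsi && reboot`. If the machine still hangs after a successful logout, leftover iSCSI sessions are probably not the cause.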
# iscsiadm -m node -u
Logging out of session [sid: 1, target: iqn.2001-05.com.equallogic:0-8a0906-0b9469e03-def000ece0d4cfd5-vringo-1, portal: 172.30.2.2,3260]
Logging out of session [sid: 2, target: iqn.2001-05.com.equallogic:0-8a0906-0b9469e03-def000ece0d4cfd5-vringo-1, portal: 172.30.2.2,3260]
Logout of [sid: 1, target: iqn.2001-05.com.equallogic:0-8a0906-0b9469e03-def000ece0d4cfd5-vringo-1, portal: 172.30.2.2,3260] successful.
Logout of [sid: 2, target: iqn.2001-05.com.equallogic:0-8a0906-0b9469e03-def000ece0d4cfd5-vringo-1, portal: 172.30.2.2,3260] successful.
But the system hangs anyway.
Initscript developers,
This does not look like an iscsi issue. In comment #5 we manually logged out of iscsi before rebooting, so during shutdown there is no iscsi activity and no sessions are running.
If you enable sysrq, what does 'sysrq-t' or 'sysrq-p' show? (You may need a serial console to catch it all.)
Unfortunately, I wasn't able to get any dump using 'sysrq-t' or 'sysrq-p', even though I enabled sysrq. This is probably because of some issue with DRAC (Dell's remote console). I'll ask my hosting company for help; maybe they will be able to provide more details.
Thank you,
Vitaly
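If the remote console cannot deliver the SysRq key chord, the same dumps can be requested by writing to `/proc/sysrq-trigger` while a root shell is still reachable; the kernel allows this regardless of the `kernel.sysrq` setting, which only governs the keyboard. A hedged sketch of both pieces, assuming bash and the RHEL 6 `/etc/sysctl.conf` location; the function names are hypothetical. Booting with `console=ttyS0,115200` additionally sends kernel messages to the first serial port, assuming that is where the console server is attached.

```shell
#!/bin/bash
# Hypothetical helpers for capturing sysrq-t / sysrq-p output on RHEL 6.

# Persistently enable the SysRq keyboard chord by appending
# "kernel.sysrq = 1" to a sysctl file (default /etc/sysctl.conf) unless an
# enabling line is already present. A later line overrides an earlier
# "kernel.sysrq = 0" when the file is loaded with `sysctl -p`.
enable_sysrq_persistently() {
    local conf=${1:-/etc/sysctl.conf}
    if ! grep -qE '^[[:space:]]*kernel\.sysrq[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$conf" 2>/dev/null; then
        echo "kernel.sysrq = 1" >> "$conf"
    fi
}

# From a root shell, request the same dumps without any key chord:
# "t" lists all tasks with their stacks, "p" dumps CPU registers.
# Output goes to the kernel log and the console.
dump_via_sysrq_trigger() {
    echo t > /proc/sysrq-trigger
    echo p > /proc/sysrq-trigger
}
```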
I'm also seeing this issue on my RHEL6 64-bit test bed. During shutdown the system locks on
Not stopping iscsid: iscsi sessions still active
If I manually run
iscsiadm -m node -u
then the following exits correctly:
service iscsid stop
I still have an issue on shutdown, where there is a huge delay after displaying
stopping cgconfig service [OK]
If I then hit Ctrl-Alt-Delete, I get the following, but the machine won't power off:
Could not log bootup: Address already in use
This has been tracked in Fedora under
- https://bugzilla.redhat.com/show_bug.cgi?id=654762
- https://bugzilla.redhat.com/show_bug.cgi?id=642347
(In reply to comment #9)
> I'm also seeing this issue on my RHEL6 64bit test bed. During shutdown the
> system locks on
>
> Not stopping iscsid: iscsi sessions still active
>
> If I manually run
>
> iscsiadm -m node -u
>
> Then the following exits correctly
> service iscsid stop
Steven,
Your bug looks like https://bugzilla.redhat.com/show_bug.cgi?id=634021, or at least it will be fixed by the change there.
Hi Mike,
I'm afraid I don't have access to https://bugzilla.redhat.com/show_bug.cgi?id=634021, so I can't fully comment.
On shutdown the last displayed message is
Not stopping iscsid: iscsi sessions still active
I've disabled quiet boot and rhgb. Any suggestions for getting additional output?
(In reply to comment #11)
> Hi Mike
>
> Afraid I don't have access to
> https://bugzilla.redhat.com/show_bug.cgi?id=634021 so I can't fully comment.
>
> On shutdown the last displayed message is
> Not stopping iscsid: iscsi sessions still active
>
> I've disable quiet boot and rhgb.
>
> Any suggestions on how getting any additional output?
You could add
set -x
after the first line of /etc/init.d/halt, /etc/rc, /etc/init.d/iscsi and /etc/init.d/iscsid. Of course this will produce a lot of output, but you will see the last command that ran before the hang.
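As a hedged illustration of that suggestion: the shell option that turns tracing on is `set -x` (`set +x` turns it off). The sketch below, with hypothetical function names, inserts it as the second line of a script and keeps a backup. It assumes GNU sed, whose one-line `1a text` form and `--follow-symlinks` flag are GNU extensions; the flag matters because on RHEL `/etc/rc` is typically a symlink into `/etc/rc.d`.

```shell
#!/bin/bash
# Hypothetical helpers: enable shell tracing in an init script by inserting
# "set -x" as its second line (right after the #! line), keeping a backup.
# Assumes GNU sed; --follow-symlinks edits the real file when the path is
# a symlink instead of replacing the link with a regular file.

trace_init_script() {
    local script=$1
    cp -p "$script" "$script.orig" || return 1
    sed -i --follow-symlinks '1a set -x' "$script"
}

# Restore the backup made by trace_init_script. cp writes through a
# symlinked destination, so a symlink such as /etc/rc stays a symlink.
untrace_init_script() {
    local script=$1
    cp -p "$script.orig" "$script" && rm -f "$script.orig"
}
```

Usage would be a loop such as `for s in /etc/init.d/halt /etc/rc /etc/init.d/iscsi /etc/init.d/iscsid; do trace_init_script "$s"; done`, followed by `untrace_init_script` on each once the trace has been captured.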
I just tried a reboot with the above suggestion of adjusting the halt/iscsi/iscsid scripts, but didn't get much, if any, additional output. I did get a long pause on iscsi during shutdown and then a kernel panic. I'll attach a pair of screenshots showing the output.
Created attachment 473954: System paused on this iscsi message for a couple of minutes
Created attachment 473955: Device mapper kernel panic during shutdown. Once the iscsi error timed out, shutdown continued until this kernel panic appeared.
You are hitting bug 634021. I will send you a patch to test by email so this BZ is not cluttered, and I will find out what I need to do so you can see the other BZ.
Steven, could you try the RPM here: http://people.redhat.com/mchristi/iscsi/rhel6.1/iscsi-initiator-utils/
Hi Mike. I tried the updated version that you suggested and I'm afraid it actually made things worse.
Currently installed package:
* iscsi-initiator-utils-6.2.0.872-10.el6.x86_64
Proposed newer package:
* iscsi-initiator-utils-6.2.0.872-15.el6.x86_64
I upgraded to iscsi-initiator-utils-6.2.0.872-15.el6.x86_64 and rebooted the machine. It had the same shutdown issues as before and the same kernel panic messages. I forced a reboot and then brought the machine up in single-user mode to confirm there were no local storage issues.
I then tried a clean reboot with the updated version of iscsi-initiator-utils, and the system hung trying to mount an LVM filesystem sitting on iSCSI storage (see iscsi-lvm-error.png). Logging in showed that the iscsi daemon hadn't been started before the OS tried to mount filesystems tagged with _netdev in /etc/fstab, which caused the error. If I manually started iscsi at that point, the system found the storage.
I've attached two text files, boot.broken and boot.works, showing the section of /var/log/messages around the boot for the new and old versions of iscsi-initiator-utils respectively.
I then tried a normal shutdown of the system with your new version of iscsi-initiator-utils, and I still get the same errors and kernel panic on shutdown.
For the moment I've reverted to iscsi-initiator-utils-6.2.0.872-10.el6.x86_64 and confirmed that I now have a clean boot process.
Created attachment 479432: Screenshot of failure to mount LVM partition from iSCSI
Created attachment 479433: Output of /var/log/messages showing a valid boot process
Created attachment 479434: Output of /var/log/messages where iscsi hasn't started before network devices are mounted.
Created attachment 479435: kernel panic on shutdown with upgraded iscsi package.
(In reply to comment #23)
> Created attachment 479435
> kernel panic on shutdown with upgraded iscsi package.
Do you still see the hang at "Not stopping iscsid", like here:
https://bugzilla.redhat.com/show_bug.cgi?id=662433#c14
(In reply to comment #19)
>
> Logging in showed that the iscsi daemon hadn't been started before the OS tried
> to mount filesystems tagged with _netdev in /etc/fstab, and thus causing the
> error
Sorry about that. That looks like a dumb mistake on my part. I have now fixed the startup bug in iscsi-initiator-utils-6.2.0.872-17.el6. It can be downloaded here:
http://people.redhat.com/mchristi/iscsi/rhel6.1/iscsi-initiator-utils/
Let me know if you still see the "Not stopping iscsid" message, or if it hangs somewhere like the network interface shutdown, as here: https://bugzilla.redhat.com/attachment.cgi?id=468237.
I still have some issues with your newer package. The boot issues have now been resolved, but on shutdown the system still throws kernel errors. The shutdown gets as far as the multipath failing messages (see iscis-multipath-failing-over.png) and then pauses for around 5 minutes. Then I get the usual kernel panic error (iscsi-new-kernel-panic.png). I'll see if I can arrange a serial console into the blade where I'm testing this; at the moment I can't capture the full shutdown output because I have limited physical access to the hardware.
Created attachment 479800: Multipath failing over
Created attachment 479801: kernel panic on shutdown with iscsi-initiator-utils-6.2.0.872-17.el6.x86_64.rpm package.
I've now managed to reproduce the issue inside a KVM guest, so I have the full console output. First, the output before I patched the system. Here is what multipath can see:
[root@rh6test02 ~]# multipath -ll
mpathc (360a98000503363775a5a43453338594d) dm-3 NETAPP,LUN
size=150G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=2 status=active
  |- 3:0:0:0 sdb 8:16 active ready running
  `- 2:0:0:0 sdc 8:32 active ready running
mpathb (360a98000503363775a5a434c6e564a6b) dm-2 NETAPP,LUN
size=180G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=2 status=active
  |- 3:0:0:2 sdg 8:96 active ready running
  `- 2:0:0:2 sdf 8:80 active ready running
mpatha (360a98000503363775a5a434533435976) dm-1 NETAPP,LUN
size=150G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=2 status=active
  |- 3:0:0:1 sde 8:64 active ready running
  `- 2:0:0:1 sdd 8:48 active ready running
If we perform a shutdown, we get:
Stopping Red Hat Network Daemon: [ OK ]
Stopping atd: [ OK ]
Stopping abrt daemon: [ OK ]
Stopping sshd: [ OK ]
Shutting down postfix: [ OK ]
Stopping crond: [ OK ]
Stopping acpi daemon: [ OK ]
Stopping HAL daemon: [ OK ]
Killing mdmonitor: [ OK ]
Stopping system message bus: [ OK ]
Stopping multipathd daemon: [ OK ]
Stopping auditd: type=1305 audit(1298261350.484:14067): audit_pid=0 old=1537 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[ OK ]
Not stopping iscsid: iscsi sessions still active[WARNING]
Shutting down interface eth0: [ OK ]
Shutting down interface eth1: [ OK ]
Shutting down loopback interface: [ OK ]
IPv6 over IPv4 tunneling driver
sit0: Disabled Privacy Extensions
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295875817, last ping 4295880817, now 4295885817
connection2:0: detected conn error (1011)
connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295876635, last ping 4295881635, now 4295886635
connection1:0: detected conn error (1011)
session2: session recovery timed out after 120 secs
session1: session recovery timed out after 120 secs
sd 2:0:0:1: [sdd] Unhandled error code
sd 2:0:0:1: [sdd] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
sd 2:0:0:1: [sdd] CDB: Read(10): 28 00 12 bf ff 80 00 00 08 00
end_request: I/O error, dev sdd, sector 314572672
device-mapper: multipath: Failing path 8:48.
sd 3:0:0:1: [sde] Unhandled error code
sd 3:0:0:1: [sde] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
sd 3:0:0:1: [sde] CDB: Read(10): 28 00 12 bf ff 80 00 00 08 00
end_request: I/O error, dev sde, sector 314572672
device-mapper: multipath: Failing path 8:64.
INFO: task vgs:2713 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
vgs D 0000000000000000 0 2713 2712 0x00000080
ffff880037bc5b88 0000000000000086 ffff880000000000 0000024380c8b294
ffff88003c8142c0 ffff88003bea0ec0 000000000001ead8 ffffffffb29e1f53
ffff880037bf70a8 ffff880037bc5fd8 0000000000010518 ffff880037bf70a8
Call Trace:
[<ffffffff8109bae9>] ? ktime_get_ts+0xa9/0xe0
[<ffffffff814c9533>] io_schedule+0x73/0xc0
[<ffffffff811a688e>] __blockdev_direct_IO+0x70e/0xc40
[<ffffffff8125b82a>] ? kobject_get+0x1a/0x30
[<ffffffff811a44d7>] blkdev_direct_IO+0x57/0x60
[<ffffffff811a36c0>] ? blkdev_get_blocks+0x0/0xc0
[<ffffffff8110d8cb>] generic_file_aio_read+0x6db/0x730
[<ffffffff812071b1>] ? avc_has_perm+0x71/0x90
[<ffffffff81208c02>] ? selinux_inode_permission+0x72/0xb0
[<ffffffff8116ceda>] do_sync_read+0xfa/0x140
[<ffffffff81091de0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff811a3a8c>] ? block_ioctl+0x3c/0x40
[<ffffffff8117fa12>] ? vfs_ioctl+0x22/0xa0
[<ffffffff8120c70b>] ? selinux_file_permission+0xfb/0x150
[<ffffffff811ffb76>] ? security_file_permission+0x16/0x20
[<ffffffff8116d905>] vfs_read+0xb5/0x1a0
[<ffffffff810d42b2>] ? audit_syscall_entry+0x272/0x2a0
[<ffffffff8116da41>] sys_read+0x51/0x90
[<ffffffff81013172>] system_call_fastpath+0x16/0x1b
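The `features='1 queue_if_no_path'` field in the `multipath -ll` output above is what lets I/O wait indefinitely once every path has gone, which matches the blocked `vgs` task. A hedged sketch for listing the affected maps from `dmsetup table --target multipath`, assuming that command prints one `name: start length multipath features...` line per map; the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the names of multipath maps whose table has
# the queue_if_no_path feature, i.e. maps that hold I/O indefinitely once
# every path is gone. Reads `dmsetup table --target multipath` output on
# stdin, assuming one "name: start length multipath features..." line per map.
maps_queueing_if_no_path() {
    awk '$4 == "multipath" && / queue_if_no_path / {
        name = $1
        sub(/:$/, "", name)   # drop the trailing colon after the map name
        print name
    }'
}
```

Usage: `dmsetup table --target multipath | maps_queueing_if_no_path`. On the system above, all three maps (mpatha, mpathb, mpathc) would be expected to appear.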
With the newer iscsi-initiator-utils-6.2.0.872-17.el6.x86_64.rpm package, we get similar output on shutdown:
Stopping Red Hat Network Daemon: [ OK ]
Stopping atd: [ OK ]
Stopping abrt daemon: [ OK ]
Stopping sshd: [ OK ]
Shutting down postfix: [ OK ]
Stopping crond: [ OK ]
Stopping acpi daemon: [ OK ]
Stopping HAL daemon: [ OK ]
Killing mdmonitor: [ OK ]
Stopping system message bus: [ OK ]
Stopping multipathd daemon: [ OK ]
Stopping auditd: type=1305 audit(1298263802.100:16777): audit_pid=0 old=1698 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[ OK ]
Stopping iscsi: [ OK ]
Stopping iscsid:
Shutting down interface eth0: [ OK ]
Shutting down interface eth1: [ OK ]
Shutting down loopback interface: [ OK ]
IPv6 over IPv4 tunneling driver
sit0: Disabled Privacy Extensions
device-mapper: multipath: Failing path 8:80.
device-mapper: multipath: Failing path 8:96.
INFO: task vgs:2240 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
vgs D 0000000000000000 0 2240 2239 0x00000080
ffff88003b175b88 0000000000000086 ffff880000000000 000000efbfb3dd2e
ffff880037e642c0 ffff88003d42ef30 0000000000011b9e ffffffffb29e12af
ffff880037fac5f8 ffff88003b175fd8 0000000000010518 ffff880037fac5f8
Call Trace:
[<ffffffff8109bae9>] ? ktime_get_ts+0xa9/0xe0
[<ffffffff814c9533>] io_schedule+0x73/0xc0
[<ffffffff811a688e>] __blockdev_direct_IO+0x70e/0xc40
[<ffffffff8125b82a>] ? kobject_get+0x1a/0x30
[<ffffffff811a44d7>] blkdev_direct_IO+0x57/0x60
[<ffffffff811a36c0>] ? blkdev_get_blocks+0x0/0xc0
[<ffffffff8110d8cb>] generic_file_aio_read+0x6db/0x730
[<ffffffff812071b1>] ? avc_has_perm+0x71/0x90
[<ffffffff81208c02>] ? selinux_inode_permission+0x72/0xb0
[<ffffffff8116ceda>] do_sync_read+0xfa/0x140
[<ffffffff81091de0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff811a3a8c>] ? block_ioctl+0x3c/0x40
[<ffffffff8117fa12>] ? vfs_ioctl+0x22/0xa0
[<ffffffff8120c70b>] ? selinux_file_permission+0xfb/0x150
[<ffffffff811ffb76>] ? security_file_permission+0x16/0x20
[<ffffffff8116d905>] vfs_read+0xb5/0x1a0
[<ffffffff810d42b2>] ? audit_syscall_entry+0x272/0x2a0
[<ffffffff8116da41>] sys_read+0x51/0x90
[<ffffffff81013172>] system_call_fastpath+0x16/0x1b
(The same INFO message, register dump and call trace for vgs:2240 are printed two more times, unchanged.)
If I want a clean shutdown, I need to perform the following steps:
service multipathd stop
dmsetup remove /dev/mapper/mpathc
dmsetup remove /dev/mapper/mpathb
dmsetup remove /dev/mapper/mpatha
service iscsi stop
service iscsid stop
shutdown
If I don't use dmsetup to remove the device-mapper entries, I always get the kernel errors on shutdown.
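The manual sequence above could be scripted roughly as follows. This is a hedged bash sketch: the `run` wrapper and `DRY_RUN` variable are hypothetical additions for previewing the commands, the map names are passed in rather than discovered, and `shutdown -h now` is an assumed reading of the bare "shutdown" step.

```shell
#!/bin/bash
# Hypothetical script version of the manual clean-shutdown workaround.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: clean_iscsi_shutdown MAP...   (e.g. mpathc mpathb mpatha)
clean_iscsi_shutdown() {
    run service multipathd stop || return 1
    local map
    for map in "$@"; do
        # dmsetup fails with "Device or resource busy" while anything,
        # such as an active LVM logical volume, still holds the map open.
        run dmsetup remove "/dev/mapper/$map" || return 1
    done
    run service iscsi stop
    run service iscsid stop
    # The original steps only say "shutdown"; "-h now" is an assumption.
    run shutdown -h now
}
```

`DRY_RUN=1 clean_iscsi_shutdown mpathc mpathb mpatha` prints the commands without running them. If a map carries an active LVM volume group, deactivating it first (for example with `vgchange -an VG_NAME`, where VG_NAME is a placeholder) may be needed before `dmsetup remove` succeeds.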
In the defaults section of /etc/multipath.conf, can you add the line
queue_without_daemon no
This turns off queue_if_no_path when multipathd is shut down. That way, when all your SCSI devices get removed, any outstanding IO will simply be failed back. Let me know if that helps.
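For reference, the suggested setting as a multipath.conf fragment. If the file already has a `defaults` section, the line belongs inside it rather than in a second `defaults` block. multipathd applies the change only after re-reading its configuration; `multipathd -k"reconfigure"` asks a running daemon to do that.

```
# /etc/multipath.conf (fragment): merge into the existing defaults section.
defaults {
        queue_without_daemon no
}
```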
Hi Ben,
Yes, that does appear to fix things with regard to the shutdown. I am a little worried about the I/O errors that are thrown, although the system does boot cleanly after a shutdown. Here is the output of the shutdown:
Stopping Red Hat Network Daemon: [ OK ]
Stopping atd: [ OK ]
Stopping abrt daemon: [ OK ]
Stopping sshd: [ OK ]
Shutting down postfix: [ OK ]
Stopping crond: [ OK ]
Stopping acpi daemon: [ OK ]
Stopping HAL daemon: [ OK ]
Killing mdmonitor: [ OK ]
Stopping system message bus: [ OK ]
Stopping multipathd daemon: [ OK ]
Stopping auditd: type=1305 audit(1298416533.079:9404): audit_pid=0 old=1184 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[ OK ]
Stopping iscsi: connection2:0: detected conn error (1020)
connection1:0: detected conn error (1020)
[ OK ]
Stopping iscsid:
Shutting down interface eth0: [ OK ]
Shutting down interface eth1: [ OK ]
Shutting down loopback interface: [ OK ]
IPv6 over IPv4 tunneling driver
sit0: Disabled Privacy Extensions
device-mapper: multipath: Failing path 8:80.
device-mapper: multipath: Failing path 8:96.
end_request: I/O error, dev dm-1, sector 377495424
end_request: I/O error, dev dm-1, sector 377495536
end_request: I/O error, dev dm-1, sector 0
end_request: I/O error, dev dm-1, sector 8
end_request: I/O error, dev dm-1, sector 0
device-mapper: multipath: Failing path 8:32.
device-mapper: multipath: Failing path 8:16.
end_request: I/O error, dev dm-2, sector 314572672
end_request: I/O error, dev dm-2, sector 314572784
end_request: I/O error, dev dm-2, sector 0
end_request: I/O error, dev dm-2, sector 8
end_request: I/O error, dev dm-2, sector 0
device-mapper: multipath: Failing path 8:48.
device-mapper: multipath: Failing path 8:64.
end_request: I/O error, dev dm-3, sector 314572672
end_request: I/O error, dev dm-3, sector 314572784
end_request: I/O error, dev dm-3, sector 0
end_request: I/O error, dev dm-3, sector 8
end_request: I/O error, dev dm-3, sector 0
end_request: I/O error, dev dm-2, sector 227459328
end_request: I/O error, dev dm-2, sector 227459440
end_request: I/O error, dev dm-2, sector 227246464
end_request: I/O error, dev dm-2, sector 227246472
end_request: I/O error, dev dm-2, sector 227246464
Stopping monitoring for VG myvg:
end_request: I/O error, dev dm-1, sector 377495424
/dev/mapper/mpathb: read failed after 0 of 4096 at 193277657088: Input/output error
end_request: I/O error, dev dm-1, sector 377495536
/dev/mapper/mpathb: read failed after 0 of 4096 at 193277714432: Input/output error
end_request: I/O error, dev dm-1, sector 0
/dev/mapper/mpathb: read failed after 0 of 4096 at 0: Input/output error
end_request: I/O error, dev dm-1, sector 8
/dev/mapper/mpathb: read failed after 0 of 4096 at 4096: Input/output error
end_request: I/O error, dev dm-1, sector 0
end_request: I/O error, dev dm-2, sector 314572672
/dev/mapper/mpathc: read failed after 0 of 4096 at 161061208064: Input/output error
end_request: I/O error, dev dm-2, sector 314572784
/dev/mapper/mpathc: read failed after 0 of 4096 at 161061265408: Input/output error
end_request: I/O error, dev dm-2, sector 0
/dev/mapper/mpathc: read failed after 0 of 4096 at 0: Input/output error
end_request: I/O error, dev dm-2, sector 8
/dev/mapper/mpathc: read failed after 0 of 4096 at 4096: Input/output error
end_request: I/O error, dev dm-2, sector 0
end_request: I/O error, dev dm-3, sector 314572672
/dev/mapper/mpatha: read failed after 0 of 4096 at 161061208064: Input/output error
end_request: I/O error, dev dm-3, sector 314572784
/dev/mapper/mpatha: read failed after 0 of 4096 at 161061265408: Input/output error
end_request: I/O error, dev dm-3, sector 0
/dev/mapper/mpatha: read failed after 0 of 4096 at 0: Input/output error
end_request: I/O error, dev dm-3, sector 8
/dev/mapper/mpatha: read failed after 0 of 4096 at 4096: Input/output error
end_request: I/O error, dev dm-3, sector 0
end_request: I/O error, dev dm-2, sector 227459328
/dev/LinuxBackupVG/rhsat_boot: read failed after 0 of 4096 at 108986368: Input/output error
end_request: I/O error, dev dm-2, sector 227459440
/dev/LinuxBackupVG/rhsat_boot: read failed after 0 of 4096 at 109043712: Input/output error
end_request: I/O error, dev dm-2, sector 227246464
/dev/LinuxBackupVG/rhsat_boot: read failed after 0 of 4096 at 0: Input/output error
end_request: I/O error, dev dm-2, sector 227246472
/dev/LinuxBackupVG/rhsat_boot: read failed after 0 of 4096 at 4096: Input/output error
end_request: I/O error, dev dm-2, sector 227246464
1 logical volume(s) in volume group "myvg" unmonitored [ OK ]
Sending all processes the TERM signal... [ OK ]
Sending all processes the KILL signal... [ OK ]
Saving random seed: [ OK ]
Syncing hardware clock to system time type=1111 audit(1298416538.502:9405): user pid=23543 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:hwclock_t:s0 msg='changing system time: exe="/sbin/hwclock" hostname=? addr=? terminal=console res=success'
[ OK ]
Turning off swap: [ OK ]
Turning off quotas: [ OK ]
Unmounting file systems: [ OK ]
init: Re-executing /sbin/init
Halting system...
type=1128 audit(1298416539.657:9406): user pid=23117 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:initrc_t:s0 msg='init: exe="/sbin/reboot" hostname=? addr=? terminal=console res=success'
md: stopping all md devices.
sd 0:0:0:0: [sda] Synchronizing SCSI cache
sd 0:0:0:0: [sda] Stopping disk
ACPI: Preparing to enter system sleep state S5
Disabling non-boot CPUs ...
Power down.
(In reply to comment #33)
> I am a little bit worried about the I/O errors that are thrown, although the
> system does boot cleanly after a shutdown.
>
Hi,
Were you using a filesystem on top of the dm multipath device, or using the dm device directly?
Anthony @ HP, can you please try the suggestion in Comment #32? Thanks!
@David, yes, adding the setting from comment #32 to multipath.conf does help.
Just to document: this issue is still happening on RHEL6 Update 1 Snapshot 5.
(In reply to comment #37)
> @David, yes with comment #32 added to multipath.conf does help.
(In reply to comment #41)
> Just to document this issue is still happening on RHEL6 Update1 Snapshot5
Anthony, Elvir, Stephen,
If I understand the situation correctly, when you add the line
queue_without_daemon no
to the defaults section of /etc/multipath.conf, the system shuts down reliably, without hanging. The remaining open questions are:
1) Is there something we can do to avoid the I/O errors shown in comment 33?
2) Should we make "queue_without_daemon no" the default?
Is this a correct summary of the situation?
It would be nice to fix #1, but that may not be possible, since those I/Os are already queued to the device and we have to fail them. Right, Ben? If so, we could document this as an unavoidable consequence of selecting queue_if_no_path.
WRT #2, we do not normally change defaults in a minor release, because this can surprise people who already have this running with a change of behavior. This case might be an exception, though, since no one would really have any reason to want to run with "queue_without_daemon yes". Is that true, Ben?
Tom, yes, "queue_without_daemon no" helps the situation. It has been a while, but I don't recall seeing errors like those in comment 33 in my setup. Systems just shut down normally.
(In reply to comment #47)
> 1) is there something we can do to avoid the I/O errors shown in comment 33?
The issue is that with "queue_without_daemon no", multipath no longer queues IO when there are no available paths. This means that if you don't have any valid paths to the multipath device, and there is IO to the device, you will see these errors. If the paths are gone and never coming back, there is no way to avoid this: all queued IO will be failed.
> 2) should we make "queue_without_daemon no" the default?
>
> Is this a correct summary of the situation?
>
> It would be nice to fix #1, but that may not be possible, since those I/Os are
> already queued to the device. We have to fail them. Right Ben?
>
> If this is true, we could document this as an unavoidable consequence of
> selecting queue_if_no_path.
>
> WRT #2, we do not normally change defaults in a minor release because this can
> cause a surprise change of behavior for people who already have this running.
Correct.
> This case might be an exception, though, since no one would really have any
> reason to want to run with "queue_without_daemon yes". Is that true Ben?
The only issue is if someone needs to stop multipathd, even just to restart it (for example when upgrading the package). If they had a device with all of its paths failed, the IO would get failed back instead of being queued as they intended. Since you don't normally stop multipathd, this probably isn't a big issue, and I'm willing to consider changing the default. We could certainly put it into the multipath.conf template. That way, new installs will have it, but upgrades won't.
We should be able to solve this for all users. As I mentioned in Comment 49, the problem is with running
# service multipathd restart
This gets run when you upgrade your RPM. In this case, multipathd will be coming right back, so you don't want to fail the queued IOs. To solve this, I can add a multipathd command that overrides the queue_without_daemon setting, and make the multipathd init script use it when running a restart. With that in place, I don't see much risk of surprising customers with this change.
multipathd now has queue_without_daemon disabled by default. However, when doing
# service multipathd restart
queueing will always be preserved, as if queue_without_daemon were enabled, so multipath devices don't fail back IO when you restart the daemon.
# service multipathd restart
will now also work with multipathed root filesystems.
There are also two new multipathd interactive commands:
forcequeueing daemon
restorequeueing daemon
forcequeueing will force multipathd to behave as if queue_without_daemon were set. restorequeueing will revert it to the configured value.
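Since `service multipathd restart` already forces queueing after this change, the new commands matter mainly when the daemon is stopped and started as separate steps. A hedged sketch, assuming a device-mapper-multipath build that includes these commands and that `multipathd -k"command"` passes a single command to the running daemon; the function names and the `DRY_RUN` preview switch are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers around the forcequeueing/restorequeueing commands.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop multipathd by hand without failing queued I/O: tell the running
# daemon to keep queue_if_no_path set when it exits, then stop it.
stop_multipathd_keep_queueing() {
    run multipathd -k"forcequeueing daemon" || return 1
    run service multipathd stop
}

# Changed your mind before stopping? Revert to the configured behavior.
cancel_forced_queueing() {
    run multipathd -k"restorequeueing daemon"
}
```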
I reproduced this problem on RHEL 6.2 GA (which enables queue_without_daemon by default) with this line:
====
features "1 queue_if_no_path"
====
Since RHEL 6.3 disables "queue_without_daemon" by default, reboot will not be blocked.
Ben, one more concern: if a user specifies "queue_without_daemon yes" in their configuration, OS shutdown will still be blocked. Is this not considered a bug, or will we fix it later? Thanks.
I don't consider this a bug. If the user manually sets queue_without_daemon, then yes, it will get blocked, but it's only doing what it was told to do.
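To see which behavior a given system will get, the effective value can be read from the running daemon. A hedged sketch, assuming `multipathd -k"show config"` prints the keyword and its value as the first two fields of a line (some versions quote the value); the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the effective queue_without_daemon value
# ("yes" or "no") from `multipathd -k"show config"` output on stdin, or
# nothing if the keyword does not appear. Only the first occurrence (the
# defaults section) is reported; surrounding quotes are stripped.
queue_without_daemon_value() {
    awk '$1 == "queue_without_daemon" {
        v = $2
        gsub(/"/, "", v)
        print v
        exit
    }'
}
```

Usage: `multipathd -k"show config" | queue_without_daemon_value`. A result of "yes" means that, as described above, shutdown can still block when all paths to a queueing device are gone.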
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Cause: By default, multipathd did not disable queue_if_no_path on multipath devices when it stopped.
Consequence: Once multipathd stopped during shutdown, if all paths to a device were lost, IO to that device would queue, causing shutdown to hang.
Fix: multipath now sets the queue_without_daemon option to "no" by default, so all multipath devices stop queueing when multipathd is stopped.
Result: multipath devices no longer queue IO during shutdown, and the nodes no longer hang.
Thanks for the clarification. Verified this bug. Bug #818404 tracks the documentation update for this change.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-0946.html