Description of problem:
If the kernel recognizes a multipath device that holds an LVM volume, the RHEL shutdown sequence hangs.

Version-Release number of selected component (if applicable):
kernel-2.6.32-71.el6
lvm2-2.02-72-8.el6
device-mapper-multipath-0.4.9-31.el6
iscsi-initiator-utils-6.2.0.872-10.el6

How reproducible:
100% reproducible.

Steps to Reproduce:
1. Set up the following configuration:
   iSCSI target - iSCSI initiator (4 paths) - multipath dev (PV) - VG - LV
   The iSCSI target is a storage product.
2. Shut down the OS.

Actual results:
The kernel shutdown sequence hangs.

Expected results:
The shutdown sequence succeeds.

Additional info:
Stack output from /var/log/messages:
Jan 14 18:52:08 bhbhv-fcb2032 kernel: INFO: task vgs:5740 blocked for more than 120 seconds.
Jan 14 18:52:08 bhbhv-fcb2032 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 14 18:52:08 bhbhv-fcb2032 kernel: vgs D 0000000000000000 0 5740 5739 0x00000080
Jan 14 18:52:08 bhbhv-fcb2032 kernel: ffff88033095bb88 0000000000000082 ffff88033095bb48 ffffffffa000471c
Jan 14 18:52:08 bhbhv-fcb2032 kernel: ffff880630a60cd0 ffff8806301be200 0000000000000001 000000000000000c
Jan 14 18:52:08 bhbhv-fcb2032 kernel: ffff88032ecab028 ffff88033095bfd8 0000000000010518 ffff88032ecab028
Jan 14 18:52:08 bhbhv-fcb2032 kernel: Call Trace:
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffffa000471c>] ? dm_table_unplug_all+0x5c/0xd0 [dm_mod]
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8109b9a9>] ? ktime_get_ts+0xa9/0xe0
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff814c8a23>] io_schedule+0x73/0xc0
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff811a5ffe>] __blockdev_direct_IO+0x70e/0xc40
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff811a3c47>] blkdev_direct_IO+0x57/0x60
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff811a2e30>] ? blkdev_get_blocks+0x0/0xc0
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8110d69b>] generic_file_aio_read+0x6db/0x730
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff811a46b0>] ? blkdev_open+0x0/0xc0
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8120051f>] ? security_inode_permission+0x1f/0x30
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8116c65a>] do_sync_read+0xfa/0x140
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8117d32d>] ? do_filp_open+0x60d/0xd40
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff811a31fc>] ? block_ioctl+0x3c/0x40
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8117f182>] ? vfs_ioctl+0x22/0xa0
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8117f324>] ? do_vfs_ioctl+0x84/0x580
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff811ff3b6>] ? security_file_permission+0x16/0x20
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8116d085>] vfs_read+0xb5/0x1a0
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff810d40a2>] ? audit_syscall_entry+0x272/0x2a0
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8116d1c1>] sys_read+0x51/0x90
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.
This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.
This is because the lvm2-monitor script is enabled by default. Disabling it should allow the system to shut down properly.
# chkconfig lvm2-monitor off
Actually, the real reason is an inversion in the shutdown order of lvm2-monitor and iscsi. lvm2-monitor is shutting down really late, after iscsi and a lot of other things have been torn down:
# ls /etc/rc.d/rc6.d/*lvm2*
/etc/rc.d/rc6.d/K99lvm2-monitor
# ls /etc/rc.d/rc6.d/*iscsi*
/etc/rc.d/rc6.d/K88iscsi
/etc/rc.d/rc6.d/K89iscsid
This looks wrong.
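For anyone wanting to check the stop order on their own system, here is a minimal sketch that reads the K number from the rc6.d link names shown above. It assumes the usual SysV naming (K, two digits, then the service name) and that links are run in ascending K order; the helper names kill_prio and stops_before are made up for illustration and are not part of any RHEL tool.

```shell
#!/bin/sh
# kill_prio DIR SERVICE
# Hypothetical helper: print the two-digit K priority of SERVICE in DIR.
# Assumes links are named K<NN><service>, e.g. K99lvm2-monitor.
kill_prio() {
    dir=$1; svc=$2
    for link in "$dir"/K[0-9][0-9]"$svc"; do
        # Skip the unexpanded pattern when nothing matched.
        [ -e "$link" ] || [ -L "$link" ] || continue
        base=${link##*/}          # strip directory
        base=${base#K}            # strip leading K
        echo "${base%"$svc"}"     # strip service name, leaving NN
        return 0
    done
    return 1
}

# stops_before DIR A B
# Succeeds if A is stopped before B (lower K number runs first).
stops_before() {
    a=$(kill_prio "$1" "$2") || return 2
    b=$(kill_prio "$1" "$3") || return 2
    # Drop one leading zero so values like 08 compare as plain decimals.
    [ "${a#0}" -lt "${b#0}" ]
}
```

With the links listed above, stops_before /etc/rc.d/rc6.d iscsi lvm2-monitor would succeed, which is the ordering reported as wrong here.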
lvm2-monitor is required for systems with mirrors. Just as activation of all the LVs on the system is done in several steps with other things happening in between, shutdown should be similar. Which part of the monitor script is causing the problem? It should be late, after iscsi, yes, I'd have thought. But what is deactivating LVs on iscsi devs before iscsi is shut down?
I tested shifting the shutdown sequence entries "K87XXX, K88XXX, K89XXX, K90XXX" and changing K99lvm2-monitor to K87lvm2-monitor. I verified that the shutdown sequence succeeds in that environment. Besides, what order should these be in?
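The rename described above can be sketched roughly as follows. This is only an illustration of the manual test, not a recommended fix: the helper name set_kill_prio is hypothetical, it assumes SysV-style K<NN><service> links, and a hand-renamed link may not survive if the links are later regenerated from the init script's chkconfig header.

```shell
#!/bin/sh
# set_kill_prio DIR SERVICE NN
# Hypothetical helper: rename the K link of SERVICE in DIR to priority NN.
set_kill_prio() {
    dir=$1; svc=$2; new=$3
    # Only accept a two-digit priority.
    case $new in
        [0-9][0-9]) ;;
        *) return 2 ;;
    esac
    for link in "$dir"/K[0-9][0-9]"$svc"; do
        # Skip the unexpanded pattern when nothing matched.
        [ -e "$link" ] || [ -L "$link" ] || continue
        target="$dir/K$new$svc"
        # Nothing to do if the link already has the wanted priority.
        [ "$link" = "$target" ] && return 0
        mv -- "$link" "$target"
        return
    done
    return 1
}
```

For the test above this would be called as set_kill_prio /etc/rc.d/rc6.d lvm2-monitor 87, after the other entries have been shifted out of the way.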
(In reply to comment #7)
> But what is deactivating LVs on iscsi devs before iscsi is shut down?

I do not think anything is. I think we just hit a similar problem with software fcoe and lvm.
I'll switch this to initscripts rather than kernel for now for co-ordination of any necessary changes, but expect that other components may need to change their own scripts too.
Not sure what initscripts itself has to do with this - wouldn't this be in the specific fcoe/iscsi/lvm2-monitor init scripts?
Moving back - none of the scripts in question are in initscripts.
Does anyone know if this bug will make it into RHEL 6 SP 1? Thanks.
You could try adding

defaults {
    ...
    queue_without_daemon no
    ...
}

to /etc/multipath.conf to fix the queue_if_no_path problem. That disables queuing whenever the multipathd daemon is not running.
Did the suggestion in comment #15 help to address the problem reported? The shutdown should not hang with that setting used. A script to deactivate any remaining LVs/VGs should still be provided, though, with the exception of the LVs on which the root fs and any system fs reside (together with unmounting any filesystems using them and any layered devices/mappings underneath). I'll try to complete the script using existing lsblk output, which already shows the device tree based on information found in sysfs. lsblk needs a small patch to make the output more suitable for parsing...
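As a rough illustration of the parsing idea, and not the script referred to above, the sketch below picks out LVM devices that have no mountpoint from lsblk-style output. It assumes an lsblk that supports the -r (raw), -n (no headings) and -o (column selection) options; the function name unmounted_lvs is made up. It does not handle layered devices and performs no deactivation itself.

```shell
#!/bin/sh
# unmounted_lvs
# Hypothetical helper: read "NAME TYPE [MOUNTPOINT]" lines on stdin and
# print the names of lvm-type devices with an empty mountpoint column.
# Devices in use as swap show "[SWAP]" there and are therefore skipped.
unmounted_lvs() {
    while read -r name type mnt; do
        [ "$type" = "lvm" ] || continue
        if [ -z "$mnt" ]; then
            echo "$name"
        fi
    done
}

# Possible usage (assumption: this lsblk accepts these options). The same
# LV can appear once per underlying path, hence the sort -u:
#   lsblk -rno NAME,TYPE,MOUNTPOINT | unmounted_lvs | sort -u
```

The printed names are only candidates; an LV without a mountpoint can still be held open by something else, so a real script would need further checks before deactivating anything.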
I've put together a Kbase article covering this issue, and a couple workarounds. The current public version is here: https://access.redhat.com/kb/docs/DOC-60763
How is this "NOTABUG"? The Kbase article says "This is a known issue and a fix is being worked on"
We already have a bug tracking this problem, but it's marked as private as per customer's request. I've opened a public one with 6.4 as target - bug #800801. Please add any additional comments there. Thanks.