Bug 672530

Summary: OS shutdown sequence is stop on vgs?
Product: Red Hat Enterprise Linux 6
Reporter: Teruaki Ishizaki <teruaki.ishizaki>
Component: lvm2
Assignee: Peter Rajnoha <prajnoha>
Status: CLOSED NOTABUG
QA Contact: Corey Marthaler <cmarthal>
Severity: urgent
Priority: medium
Version: 6.0
CC: agk, bmarzins, dkelson, dwysocha, heinzm, jbrassow, jwest, kfujii, mbroz, mchristi, msnitzer, notting, prajnoha, prockai, Sean.Stewart, syeghiay, thornber, tomasz.kepczynski, zkabelac
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2011-08-25 18:48:03 UTC
Bug Blocks: 800801

Description Teruaki Ishizaki 2011-01-25 12:58:21 UTC
Description of problem:
If the kernel recognizes a multipath device that holds an LVM volume, the RHEL shutdown sequence hangs.

Version-Release number of selected component (if applicable):
kernel-2.6.32-71.el6
lvm2-2.02.72-8.el6
device-mapper-multipath-0.4.9-31.el6
iscsi-initiator-utils-6.2.0.872-10.el6

How reproducible:
100% reproducible.

Steps to Reproduce:
1. Build the following configuration:
 iSCSI target - iSCSI initiator (4 paths) - multipath dev (PV) - VG - LV

 The iSCSI target is a storage product.

2. Shut down the OS.
  
Actual results:
The shutdown sequence hangs.

Expected results:
The shutdown sequence completes successfully.

Additional info:
Stack trace from /var/log/messages:

Jan 14 18:52:08 bhbhv-fcb2032 kernel: INFO: task vgs:5740 blocked for more than 120 seconds.
Jan 14 18:52:08 bhbhv-fcb2032 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 14 18:52:08 bhbhv-fcb2032 kernel: vgs           D 0000000000000000     0  5740   5739 0x00000080
Jan 14 18:52:08 bhbhv-fcb2032 kernel: ffff88033095bb88 0000000000000082 ffff88033095bb48 ffffffffa000471c
Jan 14 18:52:08 bhbhv-fcb2032 kernel: ffff880630a60cd0 ffff8806301be200 0000000000000001 000000000000000c
Jan 14 18:52:08 bhbhv-fcb2032 kernel: ffff88032ecab028 ffff88033095bfd8 0000000000010518 ffff88032ecab028
Jan 14 18:52:08 bhbhv-fcb2032 kernel: Call Trace:
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffffa000471c>] ? dm_table_unplug_all+0x5c/0xd0 [dm_mod]
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8109b9a9>] ? ktime_get_ts+0xa9/0xe0
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff814c8a23>] io_schedule+0x73/0xc0
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff811a5ffe>] __blockdev_direct_IO+0x70e/0xc40
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff811a3c47>] blkdev_direct_IO+0x57/0x60
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff811a2e30>] ? blkdev_get_blocks+0x0/0xc0
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8110d69b>] generic_file_aio_read+0x6db/0x730
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff811a46b0>] ? blkdev_open+0x0/0xc0
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8120051f>] ? security_inode_permission+0x1f/0x30
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8116c65a>] do_sync_read+0xfa/0x140
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8117d32d>] ? do_filp_open+0x60d/0xd40
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff811a31fc>] ? block_ioctl+0x3c/0x40
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8117f182>] ? vfs_ioctl+0x22/0xa0
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8117f324>] ? do_vfs_ioctl+0x84/0x580
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff811ff3b6>] ? security_file_permission+0x16/0x20
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8116d085>] vfs_read+0xb5/0x1a0
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff810d40a2>] ? audit_syscall_entry+0x272/0x2a0
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff8116d1c1>] sys_read+0x51/0x90
Jan 14 18:52:08 bhbhv-fcb2032 kernel: [<ffffffff81013172>] system_call_fastpath+0x16/0x1b

Comment 2 RHEL Program Management 2011-01-25 13:28:29 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 4 RHEL Program Management 2011-02-01 19:10:55 UTC
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 5 Dave Wysochanski 2011-02-07 22:17:14 UTC
This is because of the lvm2-monitor script being enabled by default.
Disabling it should allow the system to shut down properly.
# chkconfig lvm2-monitor off

Comment 6 Dave Wysochanski 2011-02-07 22:24:57 UTC
Actually, the real reason is an inversion in the shutdown order of lvm2-monitor and iscsi.  lvm2-monitor is shutting down really late, after iscsi and a lot of other things have been torn down:
# ls /etc/rc.d/rc6.d/*lvm2*
/etc/rc.d/rc6.d/K99lvm2-monitor
# ls /etc/rc.d/rc6.d/*iscsi*
/etc/rc.d/rc6.d/K88iscsi  /etc/rc.d/rc6.d/K89iscsid

This looks wrong.
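
The ordering shown above can be checked mechanically. SysV init walks the K links of a runlevel directory in lexical order of their names, so a link with a lower number is stopped first. The following is only a minimal sketch; the helper names kill_prio and stops_before are made up for illustration and are not part of initscripts or lvm2.

```shell
#!/bin/sh
# Hypothetical helpers, not part of initscripts or lvm2.

# Print the two-digit kill priority from a link name such as K99lvm2-monitor.
kill_prio() {
    name=${1##*/}                       # drop any directory part
    case $name in
        K[0-9][0-9]*)
            rest=${name#K}              # e.g. 99lvm2-monitor
            printf '%s\n' "${rest%"${rest#??}"}"   # keep the first two characters
            ;;
        *)
            return 1                    # not a kill link
            ;;
    esac
}

# Succeed if link $1 is stopped before link $2 (lower number runs first).
# Links with equal numbers are not considered "before" each other here.
stops_before() {
    a=$(kill_prio "$1") || return 2
    b=$(kill_prio "$2") || return 2
    [ "${a#0}" -lt "${b#0}" ]
}

# Link names taken from the listing above.
if stops_before K88iscsi K99lvm2-monitor; then
    echo "iscsi is stopped before lvm2-monitor"
fi
```

With the link names from this report the check succeeds, which matches the inversion described in this comment.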

Comment 7 Alasdair Kergon 2011-02-08 00:31:42 UTC
lvm2-monitor is required for systems with mirrors.

Just as activation of all the LVs on the system is done in several steps with other things happening in between, shutdown should be similar.

Which part of the monitor script is causing the problem?

It should be late, after iscsi, yes, I'd have thought.

But what is deactivating LVs on iscsi devs before iscsi is shut down?

Comment 8 Teruaki Ishizaki 2011-02-08 04:50:19 UTC
I tested shifting the "K87XXX, K88XXX, K89XXX, K90XXX" scripts in the shutdown sequence and renaming K99lvm2-monitor to K87lvm2-monitor.

I verified that shutdown completes successfully in that environment.

Also, what should the correct order be?
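
The lvm2-monitor rename described in this comment can be previewed before touching a runlevel directory. This is a rough sketch under stated assumptions: plan_renumber is a hypothetical helper that only prints the mv commands (dry run), and it covers just the lvm2-monitor link, not the shifting of the other K87 to K90 scripts. A manual rename may also be undone whenever chkconfig regenerates the links from the "# chkconfig:" header of the init script.

```shell
#!/bin/sh
# Hypothetical helper: print (dry run) the mv commands that would give a
# service a new kill priority in one runlevel directory.
# Usage: plan_renumber DIR SERVICE NEWPRIO
plan_renumber() {
    dir=$1
    svc=$2
    new=$3
    for link in "$dir"/K[0-9][0-9]"$svc"; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$link" ] || [ -L "$link" ] || continue
        printf 'mv %s %s\n' "$link" "$dir/K${new}${svc}"
    done
}

# Example (prints a command only if the link exists on this system):
# plan_renumber /etc/rc.d/rc6.d lvm2-monitor 87
```

The same call would have to be repeated for /etc/rc.d/rc0.d to cover halt as well as reboot.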

Comment 10 Mike Christie 2011-02-14 20:58:51 UTC
(In reply to comment #7)
> But what is deactivating LVs on iscsi devs before iscsi is shut down?

I do not think anything is. I think we just hit a similar problem with software fcoe and lvm.

Comment 11 Alasdair Kergon 2011-02-15 20:54:55 UTC
I'll switch this to initscripts rather than kernel for now for co-ordination of any necessary changes, but expect that other components may need to change their own scripts too.

Comment 12 Bill Nottingham 2011-02-16 03:48:43 UTC
Not sure what initscripts itself has to do with this - wouldn't this be in the specific fcoe/iscsi/lvm2-monitor init scripts?

Comment 13 Bill Nottingham 2011-03-18 19:30:07 UTC
Moving back - none of the scripts in question are in initscripts.

Comment 14 Sean Stewart 2011-05-04 14:09:11 UTC
Does anyone know if this bug will make it into RHEL 6 SP 1?  Thanks.

Comment 15 Ben Marzinski 2011-06-06 17:30:28 UTC
You could try adding

defaults {
  ...
  queue_without_daemon no
  ...
}

to /etc/multipath.conf

to fix the queue_if_no_path problem.  That disables queuing whenever the multipathd daemon is not running.
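
To confirm that the setting is actually present in the file, a simple text check may be enough. This is a minimal sketch: has_queue_without_daemon_no is a hypothetical helper, it only looks for an uncommented "queue_without_daemon no" line, and it does not verify that the line sits inside the defaults section or that multipathd has re-read the file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE contains an uncommented
# "queue_without_daemon no" line (optionally quoted, optional trailing comment).
# It does not parse sections, so a line outside defaults { } also matches.
has_queue_without_daemon_no() {
    grep -Eq '^[[:space:]]*queue_without_daemon[[:space:]]+"?no"?[[:space:]]*(#.*)?$' "$1"
}

# Example:
# has_queue_without_daemon_no /etc/multipath.conf && echo "setting found"
```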

Comment 16 Peter Rajnoha 2011-08-01 12:12:15 UTC
Did the suggestion in comment #15 help to address the problem reported? The shutdown should not hang with that setting used.

A script should still be provided, though, to deactivate any remaining LVs/VGs (together with unmounting any filesystems using them and handling any layered devices/mappings underneath), with the exception of the LVs on which the root fs and any system fs reside. I'll try to complete the script using the existing lsblk output, which already shows the device tree based on information found in sysfs. lsblk needs a small patch to make its output more suitable for parsing...
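
The selection step of such a script could look roughly like the sketch below. It is not the script mentioned in this comment. It assumes input lines of the form "NAME TYPE [MOUNTPOINT]", such as "lsblk -rn -o NAME,TYPE,MOUNTPOINT" prints on util-linux versions that support those options; the function name lv_candidates and the default list of protected mount points are made up for illustration. It only lists candidate LVs and deactivates nothing.

```shell
#!/bin/sh
# Hypothetical helper: read "NAME TYPE [MOUNTPOINT]" lines on stdin and print
# the names of LVs that are candidates for deactivation, i.e. devices of type
# "lvm" that are not mounted on a protected path.
# $1 (optional): space-separated list of protected mount points (an assumption).
lv_candidates() {
    protected=${1:-"/ /boot /usr /var"}
    awk -v keep="$protected" '
        BEGIN {
            n = split(keep, p, " ")
            for (i = 1; i <= n; i++) skip[p[i]] = 1
        }
        # Unmounted LVs have an empty third field and are therefore listed.
        $2 == "lvm" && !($3 in skip) { print $1 }
    ' | sort -u    # an LV shows up once per underlying path; list it once
}

# Example:
# lsblk -rn -o NAME,TYPE,MOUNTPOINT | lv_candidates
```

Printing names instead of running lvchange keeps the sketch safe to try; unmounting and the actual deactivation order are left out on purpose.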

Comment 17 Dave Wysochanski 2011-08-08 19:35:37 UTC
I've put together a Kbase article covering this issue, and a couple workarounds.  The current public version is here: https://access.redhat.com/kb/docs/DOC-60763

Comment 19 Dax Kelson 2012-03-06 23:23:36 UTC
How is this "NOTABUG"? 

The Kbase article says "This is a known issue and a fix is being worked on"

Comment 20 Peter Rajnoha 2012-03-07 10:10:09 UTC
We already have a bug tracking this problem, but it's marked as private as per customer's request. I've opened a public one with 6.4 as target - bug #800801. Please add any additional comments there. Thanks.