Customer has reproduced this bug on RHEL6.

"We are trying to move the content of a physical disk in a volume group with 20 open logical volumes via the command "pvmove" to another physical disk freshly added to this volume group. To simulate database I/O we started two parallel iozone programs on two different logical volumes of the mentioned 20 logical volumes which are all mounted. We can reproduce that the pvmove command hangs after some time and the two iozone processes are stalled, too. We would expect that the pvmove command moves the physical volume even when there is some load on the volumes as we often have to move disks when there is a database accessing this volume."

$ grep -E 'Suspend|Resume' pvmove_verbose_1.txt
#libdm-deptree.c:1077 Suspending TEST1-test.1 (253:22) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.2 (253:23) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.3 (253:24) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.4 (253:25) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.5 (253:26) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.6 (253:27) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.7 (253:28) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.8 (253:29) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.9 (253:30) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.10 (253:31) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.11 (253:32) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.12 (253:33) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.13 (253:34) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.14 (253:35) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.15 (253:36) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.16 (253:37) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.17 (253:38) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.18 (253:39) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.19 (253:40) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.20 (253:41) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.1 (253:22) with device flush
#libdm-deptree.c:1077 Suspending TEST1-pvmove0 (253:42) with device flush <---- pvmove0 suspended
#libdm-deptree.c:1077 Suspending TEST1-test.2 (253:23) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.3 (253:24) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.4 (253:25) with device flush
#libdm-deptree.c:1077 Suspending TEST1-test.5 (253:26) with device flush <---- TEST1-test.5 is waiting for I/O to complete that's stuck in pvmove0

Mar 8 10:20:57 degtlun1843 kernel: pvmove S ffff8801a7828800 0 17762 17761 0x00000001
Mar 8 10:20:57 degtlun1843 kernel: ffff88018011dcb8 0000000000000082 0000000000000000 ffff88019258ec00
Mar 8 10:20:57 degtlun1843 kernel: ffff88018011dc38 ffffffff8123b274 ffff88019bc58ec0 0000000103f42701
Mar 8 10:20:57 degtlun1843 kernel: ffff88019d116678 ffff88018011dfd8 0000000000010518 ffff88019d116678
Mar 8 10:20:57 degtlun1843 kernel: Call Trace:
Mar 8 10:20:57 degtlun1843 kernel: [<ffffffff8123b274>] ? blk_unplug+0x34/0x70
Mar 8 10:20:57 degtlun1843 kernel: [<ffffffff814c9533>] io_schedule+0x73/0xc0
Mar 8 10:20:57 degtlun1843 kernel: [<ffffffffa000298b>] dm_wait_for_completion+0x9b/0x100 [dm_mod]
Mar 8 10:20:57 degtlun1843 kernel: [<ffffffff8105c530>] ? default_wake_function+0x0/0x20
Mar 8 10:20:57 degtlun1843 kernel: [<ffffffffa0002af8>] dm_suspend+0x108/0x1f0 [dm_mod]
Mar 8 10:20:57 degtlun1843 kernel: [<ffffffffa00085a6>] dev_suspend+0x76/0x240 [dm_mod]
Mar 8 10:20:57 degtlun1843 kernel: [<ffffffffa0008530>] ? dev_suspend+0x0/0x240 [dm_mod]
Mar 8 10:20:57 degtlun1843 kernel: [<ffffffffa0008fc3>] ctl_ioctl+0x1a3/0x240 [dm_mod]
Mar 8 10:20:57 degtlun1843 kernel: [<ffffffffa0009073>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Mar 8 10:20:57 degtlun1843 kernel: [<ffffffff8117fa12>] vfs_ioctl+0x22/0xa0
Mar 8 10:20:57 degtlun1843 kernel: [<ffffffff810c711c>] ? utrace_stop+0x12c/0x1e0
Mar 8 10:20:57 degtlun1843 kernel: [<ffffffff8117fbb4>] do_vfs_ioctl+0x84/0x580
Mar 8 10:20:57 degtlun1843 kernel: [<ffffffff810c865e>] ? utrace_report_syscall_entry+0x10e/0x160
Mar 8 10:20:57 degtlun1843 kernel: [<ffffffff81180131>] sys_ioctl+0x81/0xa0
Mar 8 10:20:57 degtlun1843 kernel: [<ffffffff81013387>] tracesys+0xd9/0xde
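
For anyone repeating this analysis on their own verbose pvmove output, the suspend sequence shown above can be extracted with a small script. This is only a sketch: the helper names are hypothetical, the "Suspending" line format is taken from the excerpt above, and the "Resuming NAME (MAJ:MIN)" form is an assumption since no resume line appears in the excerpt. The last suspend attempted is a reasonable first suspect for the ioctl that blocked, but that is a heuristic and not proof.

```python
import re

# Matches lines such as:
#   #libdm-deptree.c:1077 Suspending TEST1-test.5 (253:26) with device flush
# The "Resuming" form is an assumption; it does not appear in the excerpt.
EVENT_RE = re.compile(r"(Suspending|Resuming) (\S+) \((\d+):(\d+)\)")


def suspend_events(log_text):
    """Return (action, device_name) pairs in the order they appear in the log."""
    return [(m.group(1), m.group(2)) for m in EVENT_RE.finditer(log_text)]


def still_suspended(events):
    """Return devices whose most recent logged action was a suspend.

    Devices are listed in the order they first appear in the log.
    """
    suspended = {}
    for action, name in events:
        suspended[name] = (action == "Suspending")
    return [name for name, is_suspended in suspended.items() if is_suspended]


def last_suspend_attempt(events):
    """Return the device named in the final 'Suspending' line, or None."""
    for action, name in reversed(events):
        if action == "Suspending":
            return name
    return None
```

Feeding the contents of pvmove_verbose_1.txt to suspend_events() and then to last_suspend_attempt() should name TEST1-test.5 for the log above, matching the annotation on the final line.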
Since RHEL 6.1 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Adding QA ack for 6.2. However, Devel will need to provide unit testing results before this bug can be ultimately verified by QA.
The same bug is reported for RHEL 5: https://bugzilla.redhat.com/show_bug.cgi?id=706036 and a fix is being worked on. In comment 13, Alasdair provides a workaround:

In the meantime, as a workaround, try using the -n option of pvmove to move only one LV at once.

List of LVs in VG:
  lvs --noheadings -o name $vg

Move one LV:
  pvmove -i0 -n $lvname
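
The one-LV-at-a-time workaround can be scripted. The following is a minimal sketch, not a tested procedure: the function names are hypothetical, and passing the source PV as the final pvmove argument is an assumption added here (the workaround text above does not show it, but pvmove needs a source PV to start a new move). With dry_run=True the commands are only printed.

```python
import subprocess


def parse_lv_names(lvs_output):
    """Extract LV names from `lvs --noheadings -o name VG` output.

    Each line holds one name padded with whitespace; blank lines are skipped.
    """
    return [line.strip() for line in lvs_output.splitlines() if line.strip()]


def build_pvmove_commands(lv_names, source_pv):
    """Build one `pvmove -i0 -n LV SOURCE_PV` argument list per LV.

    Appending source_pv is an assumption; it is not part of the quoted workaround.
    """
    return [["pvmove", "-i0", "-n", name, source_pv] for name in lv_names]


def move_lvs_one_at_a_time(vg, source_pv, dry_run=True):
    """List the LVs in vg, then run (or print) one pvmove per LV, in sequence."""
    lvs_output = subprocess.run(
        ["lvs", "--noheadings", "-o", "name", vg],
        check=True, capture_output=True, text=True,
    ).stdout
    for cmd in build_pvmove_commands(parse_lv_names(lvs_output), source_pv):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop at the first pvmove that exits non-zero.
            subprocess.run(cmd, check=True)
```

A call such as move_lvs_one_at_a_time("TEST1", "/dev/sdX") with a placeholder device path prints the commands for review. How pvmove exits for an LV that has no extents on the source PV was not checked here, so the loop may need to tolerate such failures.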
This passed the upstream test suite for the first time last night. However, due to the complexity of the change and the amount of regression testing I believe it needs, I am not offering this as a Z-stream release, but only releasing it as part of the next scheduled update, viz. 6.2. In the meantime, I'm afraid the above workaround is the best I can offer.
Upstream release 2.02.86 is included in Fedora rawhide. Please test.
I added a basic pvmove during I/O regression test case. I didn't see any issues while running it on the latest rpms. Marking this verified (SanityOnly).

SCENARIO - [pvmove_during_io]
Pvmove a volume during active I/O
grant-01: lvcreate -n move_during_io -L 800M mirror_sanity
Starting io to linear to be pvmoved
Attempting pvmove of /dev/sdc6 on grant-01
Deactivating mirror move_during_io... and removing

2.6.32-203.el6.x86_64

lvm2-2.02.87-3.el6 BUILT: Wed Sep 21 09:54:55 CDT 2011
lvm2-libs-2.02.87-3.el6 BUILT: Wed Sep 21 09:54:55 CDT 2011
lvm2-cluster-2.02.87-3.el6 BUILT: Wed Sep 21 09:54:55 CDT 2011
udev-147-2.40.el6 BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.66-3.el6 BUILT: Wed Sep 21 09:54:55 CDT 2011
device-mapper-libs-1.02.66-3.el6 BUILT: Wed Sep 21 09:54:55 CDT 2011
device-mapper-event-1.02.66-3.el6 BUILT: Wed Sep 21 09:54:55 CDT 2011
device-mapper-event-libs-1.02.66-3.el6 BUILT: Wed Sep 21 09:54:55 CDT 2011
cmirror-2.02.87-3.el6 BUILT: Wed Sep 21 09:54:55 CDT 2011
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1522.html