Red Hat Bugzilla – Bug 1481383
need to protect against multiple queued up merge attempts while origin is inactive
Last modified: 2018-04-10 11:21:33 EDT
Description of problem:

[root@host-075 tmp]# lvs -a -o +devices
  LV     VG      Attr       LSize Pool Origin Devices
  merge1 snapper swi---s--- 2.00g      origin /dev/sda1(1024)
  merge2 snapper swi---s--- 2.00g      origin /dev/sda1(1536)
  merge3 snapper swi---s--- 2.00g      origin /dev/sda1(2048)
  merge4 snapper swi---s--- 2.00g      origin /dev/sda1(2560)
  merge5 snapper swi---s--- 2.00g      origin /dev/sda1(3072)
  origin snapper owi---s--- 4.00g             /dev/sda1(0)

[root@host-075 tmp]# lvconvert --merge snapper/merge1
  Merging of snapshot snapper/merge1 will occur on next activation of snapper/origin.
[root@host-075 tmp]# lvconvert --merge snapper/merge2
  Merging of snapshot snapper/merge2 will occur on next activation of snapper/origin.
[root@host-075 tmp]# lvconvert --merge snapper/merge3
  Merging of snapshot snapper/merge3 will occur on next activation of snapper/origin.
[root@host-075 tmp]# lvconvert --merge snapper/merge4
  Merging of snapshot snapper/merge4 will occur on next activation of snapper/origin.
[root@host-075 tmp]# lvconvert --merge snapper/merge5
  Merging of snapshot snapper/merge5 will occur on next activation of snapper/origin.

[root@host-075 tmp]# vgchange -ay snapper
  1 logical volume(s) in volume group "snapper" now active
  Background polling started for 1 logical volume(s) in volume group "snapper"

[root@host-075 ~]# dmsetup status
snapper-merge5-cow: 0 4194304 linear
snapper-merge5: 0 8388608 error
snapper-merge4: 0 8388608 error
snapper-merge3: 0 8388608 error
snapper-merge1-cow:
snapper-merge2: 0 8388608 error
snapper-merge1: 0 8388608 error
snapper-origin-real: 0 8388608 linear
snapper-merge2-cow:
rhel_host--075-swap: 0 1679360 linear
snapper-merge3-cow:
rhel_host--075-root: 0 12992512 linear
snapper-merge4-cow:
snapper-origin: 0 8388608 snapshot-merge 512/4194304 512

[root@host-075 tmp]# lvs -a -o +devices
[Deadlock]

Aug 14 12:35:18 host-075 lvmpolld: W: #011LVPOLL: PID 27744: STDERR: ' Internal error: Performing unsafe table load while 12 device(s) are known to be suspended:  (253:5) '
Aug 14 12:36:18 host-075 lvmpolld: W: LVMPOLLD: polling for output of the lvm cmd (PID 27744) has timed out
Aug 14 12:36:26 host-075 systemd: Started Session 7749 of user root.
Aug 14 12:36:26 host-075 systemd: Starting Session 7749 of user root.
Aug 14 12:36:26 host-075 systemd-logind: New session 7749 of user root.
Aug 14 12:36:26 host-075 dbus[619]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Aug 14 12:36:26 host-075 dbus-daemon: dbus[619]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Aug 14 12:36:26 host-075 dbus[619]: [system] Successfully activated service 'org.freedesktop.problems'
Aug 14 12:36:26 host-075 dbus-daemon: dbus[619]: [system] Successfully activated service 'org.freedesktop.problems'
Aug 14 12:37:18 host-075 lvmpolld: W: LVMPOLLD: polling for output of the lvm cmd (PID 27744) has timed out
Aug 14 12:38:18 host-075 lvmpolld[27740]: LVMPOLLD: LVM2 cmd is unresponsive too long (PID 27744) (no output for 180 seconds)
Aug 14 12:38:18 host-075 lvmpolld: W: LVMPOLLD: polling for output of the lvm cmd (PID 27744) has timed out
Aug 14 12:38:38 host-075 kernel: INFO: task kworker/0:0:25794 blocked for more than 120 seconds.
Aug 14 12:38:38 host-075 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 14 12:38:38 host-075 kernel: kworker/0:0     D ffff880016f217d4     0 25794      2 0x00000080
Aug 14 12:38:38 host-075 kernel: Workqueue: ksnaphd do_metadata [dm_snapshot]
Aug 14 12:38:38 host-075 kernel: ffff880028e5bc10 0000000000000046 ffff88001dd7bf40 ffff880028e5bfd8
Aug 14 12:38:38 host-075 kernel: ffff880028e5bfd8 ffff880028e5bfd8 ffff88001dd7bf40 ffff88003fc16cc0
Aug 14 12:38:38 host-075 kernel: 0000000000000000 7fffffffffffffff ffff88001dd7bf40 ffff880037fe0c60
Aug 14 12:38:38 host-075 kernel: Call Trace:
Aug 14 12:38:38 host-075 kernel: [<ffffffff816a8469>] schedule+0x29/0x70
Aug 14 12:38:38 host-075 kernel: [<ffffffff816a5f79>] schedule_timeout+0x239/0x2c0
Aug 14 12:38:38 host-075 kernel: [<ffffffff812f8e50>] ? submit_bio+0x70/0x150
Aug 14 12:38:38 host-075 kernel: [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
Aug 14 12:38:38 host-075 kernel: [<ffffffff816a7aed>] io_schedule_timeout+0xad/0x130
Aug 14 12:38:38 host-075 kernel: [<ffffffffc000acf0>] ? bvec_next_page+0x10/0x10 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffff816a8a9d>] wait_for_completion_io+0xfd/0x140
Aug 14 12:38:38 host-075 kernel: [<ffffffff810c4850>] ? wake_up_state+0x20/0x20
Aug 14 12:38:38 host-075 kernel: [<ffffffffc000b5bf>] dm_io+0x26f/0x2e0 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc000ae90>] ? dm_io_client_destroy+0x30/0x30 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc000acf0>] ? bvec_next_page+0x10/0x10 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc03f2931>] do_metadata+0x21/0x30 [dm_snapshot]
Aug 14 12:38:38 host-075 kernel: [<ffffffff810a885a>] process_one_work+0x17a/0x440
Aug 14 12:38:38 host-075 kernel: [<ffffffff810a9526>] worker_thread+0x126/0x3c0
Aug 14 12:38:38 host-075 kernel: [<ffffffff810a9400>] ? manage_workers.isra.24+0x2a0/0x2a0
Aug 14 12:38:38 host-075 kernel: [<ffffffff810b09cf>] kthread+0xcf/0xe0
Aug 14 12:38:38 host-075 kernel: [<ffffffff810b0900>] ? insert_kthread_work+0x40/0x40
Aug 14 12:38:38 host-075 kernel: [<ffffffff816b3ed8>] ret_from_fork+0x58/0x90
Aug 14 12:38:38 host-075 kernel: [<ffffffff810b0900>] ? insert_kthread_work+0x40/0x40
Aug 14 12:38:38 host-075 kernel: INFO: task lvm:27744 blocked for more than 120 seconds.
Aug 14 12:38:38 host-075 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 14 12:38:38 host-075 kernel: lvm             D ffff8800371dba50     0 27744  27743 0x00000080
Aug 14 12:38:38 host-075 kernel: ffff8800371db8f0 0000000000000082 ffff88001dd79fa0 ffff8800371dbfd8
Aug 14 12:38:38 host-075 kernel: ffff8800371dbfd8 ffff8800371dbfd8 ffff88001dd79fa0 ffff8800371dba70
Aug 14 12:38:38 host-075 kernel: 7fffffffffffffff ffff8800371dba68 ffff88001dd79fa0 ffff8800371dba50
Aug 14 12:38:38 host-075 kernel: Call Trace:
Aug 14 12:38:38 host-075 kernel: [<ffffffff816a8469>] schedule+0x29/0x70
Aug 14 12:38:38 host-075 kernel: [<ffffffff816a5f79>] schedule_timeout+0x239/0x2c0
Aug 14 12:38:38 host-075 kernel: [<ffffffff810c1315>] ? check_preempt_curr+0x85/0xa0
Aug 14 12:38:38 host-075 kernel: [<ffffffff810c1349>] ? ttwu_do_wakeup+0x19/0xd0
Aug 14 12:38:38 host-075 kernel: [<ffffffff816a881d>] wait_for_completion+0xfd/0x140
Aug 14 12:38:38 host-075 kernel: [<ffffffff810c4850>] ? wake_up_state+0x20/0x20
Aug 14 12:38:38 host-075 kernel: [<ffffffff810a750c>] flush_workqueue+0x12c/0x5b0
Aug 14 12:38:38 host-075 kernel: [<ffffffffc03f2a36>] chunk_io+0xf6/0x140 [dm_snapshot]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc03f2910>] ? persistent_prepare_exception+0x90/0x90 [dm_snapshot]
Aug 14 12:38:38 host-075 kernel: [<ffffffff811c192b>] ? vmalloc+0x4b/0x50
Aug 14 12:38:38 host-075 kernel: [<ffffffffc03f0210>] ? alloc_completed_exception+0x50/0x50 [dm_snapshot]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc03f31e6>] persistent_read_metadata+0xa6/0x560 [dm_snapshot]
Aug 14 12:38:38 host-075 kernel: [<ffffffff8118573d>] ? mempool_create_node+0xdd/0x140
Aug 14 12:38:38 host-075 kernel: [<ffffffff811df75c>] ? kmem_cache_alloc_trace+0x3c/0x200
Aug 14 12:38:38 host-075 kernel: [<ffffffffc03f0baa>] snapshot_ctr+0x5fa/0x670 [dm_snapshot]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc0005a57>] dm_table_add_target+0x177/0x430 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc0009417>] table_load+0x157/0x390 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc00092c0>] ? retrieve_status+0x1c0/0x1c0 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc000a015>] ctl_ioctl+0x1e5/0x500 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc000a343>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffff812151fd>] do_vfs_ioctl+0x33d/0x540
Aug 14 12:38:38 host-075 kernel: [<ffffffff812b766f>] ? file_has_perm+0x9f/0xb0
Aug 14 12:38:38 host-075 kernel: [<ffffffff812154a1>] SyS_ioctl+0xa1/0xc0
Aug 14 12:38:38 host-075 kernel: [<ffffffff816b3f89>] system_call_fastpath+0x16/0x1b

Version-Release number of selected component (if applicable):
3.10.0-686.el7.x86_64
lvm2-2.02.171-8.el7                          BUILT: Wed Jun 28 13:28:58 CDT 2017
lvm2-libs-2.02.171-8.el7                     BUILT: Wed Jun 28 13:28:58 CDT 2017
lvm2-cluster-2.02.171-8.el7                  BUILT: Wed Jun 28 13:28:58 CDT 2017
device-mapper-1.02.140-8.el7                 BUILT: Wed Jun 28 13:28:58 CDT 2017
device-mapper-libs-1.02.140-8.el7            BUILT: Wed Jun 28 13:28:58 CDT 2017
device-mapper-event-1.02.140-8.el7           BUILT: Wed Jun 28 13:28:58 CDT 2017
device-mapper-event-libs-1.02.140-8.el7      BUILT: Wed Jun 28 13:28:58 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017
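
In short, the hang is hit by queuing several merges against the same inactive origin and then reactivating the VG. A minimal shell sketch of the reproducer (names, sizes and chunk size mirror the ones used in this report; the explicit deactivation step is an assumption, since the lvs output above only shows that the volumes were inactive):

# origin plus five snapshots, laid out as in the lvs output above
lvcreate -L 4G snapper -n origin
for i in 1 2 3 4 5; do
    lvcreate -s /dev/snapper/origin -c 128 -n merge$i -L 2G
done

# deactivate so the merges can only be queued, not started
vgchange -an snapper

# queue several merges against the same inactive origin
for i in 1 2 3 4 5; do
    lvconvert --merge snapper/merge$i
done

# reactivation starts the first merge; on the unfixed build this deadlocks
vgchange -ay snapper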
Solved with upstream commit: https://www.redhat.com/archives/lvm-devel/2017-November/msg00014.html
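
With the fix, lvconvert itself refuses to queue a second merge against an origin that already has one pending (the "Cannot merge snapshot ... with merging snapshot ..." message in the verification below). On unfixed builds a script can approximate the same guard before issuing another merge. A rough sketch, assuming the origin's lv_attr already reports 'O' (origin with merging snapshot) once a merge has been queued; whether that holds for a merge that is only queued against an inactive origin may vary by lvm2 version, so treat this as an illustration rather than a guaranteed workaround:

# skip the merge if the origin already has a merging snapshot queued or running
attr=$(lvs --noheadings -o lv_attr snapper/origin | tr -d '[:space:]')
case "$attr" in
    O*) echo "snapper/origin already has a merging snapshot, not queuing another merge" >&2 ;;
    *)  lvconvert --merge snapper/merge2 ;;
esac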
Fix verified in the latest rpms.

3.10.0-811.el7.x86_64
lvm2-2.02.176-5.el7                          BUILT: Wed Dec 6 04:13:07 CST 2017
lvm2-libs-2.02.176-5.el7                     BUILT: Wed Dec 6 04:13:07 CST 2017
lvm2-cluster-2.02.176-5.el7                  BUILT: Wed Dec 6 04:13:07 CST 2017
lvm2-lockd-2.02.176-5.el7                    BUILT: Wed Dec 6 04:13:07 CST 2017
lvm2-python-boom-0.8.1-5.el7                 BUILT: Wed Dec 6 04:15:40 CST 2017
cmirror-2.02.176-5.el7                       BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-1.02.145-5.el7                 BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-libs-1.02.145-5.el7            BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-event-1.02.145-5.el7           BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-event-libs-1.02.145-5.el7      BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-persistent-data-0.7.3-2.el7    BUILT: Tue Oct 10 04:00:07 CDT 2017

SCENARIO - [write_to_snap_merge]
Create snaps of origin with fs data, verify data on snaps, change data on snaps, merge data back to origin, verify origin data

Making origin volume
lvcreate -L 4G snapper -n origin
Placing an ext filesystem on origin volume
mke2fs 1.42.9 (28-Dec-2013)
Mounting origin volume
Writing files to /mnt/origin
Checking files on /mnt/origin

Making 6 snapshots of the origin volume, mounting, and verifying original data
lvcreate -s /dev/snapper/origin -c 128 -n merge1 -L 2G
+++ Mounting and verifying snapshot merge1 data +++
Checking files on /mnt/merge1
lvcreate -s /dev/snapper/origin -c 128 -n merge2 -L 2G
+++ Mounting and verifying snapshot merge2 data +++
Checking files on /mnt/merge2
lvcreate -s /dev/snapper/origin -c 128 -n merge3 -L 2G
+++ Mounting and verifying snapshot merge3 data +++
Checking files on /mnt/merge3
lvcreate -s /dev/snapper/origin -c 128 -n merge4 -L 2G
+++ Mounting and verifying snapshot merge4 data +++
Checking files on /mnt/merge4
lvcreate -s /dev/snapper/origin -c 128 -n merge5 -L 2G
+++ Mounting and verifying snapshot merge5 data +++
Checking files on /mnt/merge5
lvcreate -s /dev/snapper/origin -c 128 -n merge6 -L 2G
+++ Mounting and verifying snapshot merge6 data +++
Checking files on /mnt/merge6

Writing new snapshot data and then merging back each of the snapshot volumes

+++ snapshot snapper/merge1 +++
Writing files to /mnt/merge1
Checking files on /mnt/merge1
Deactivating origin/snap volume(s)
Merge snapshot snapper/merge1 back into the origin
lvconvert --merge snapper/merge1
Attempting regression check for multiple queued up merge attempts
1. Same snap already initiated (bug 1481403): lvconvert --merge snapper/merge1
     Command on LV snapper/merge1 is invalid on LV with properties: lv_is_merging_cow .
     Command not permitted on LV snapper/merge1.
2. Different snap with other initiated (bug 1481383): lvconvert --merge snapper/merge2
     Cannot merge snapshot snapper/merge2 into the origin snapper/origin with merging snapshot snapper/merge1.
Activating origin/snap volume(s)
Waiting for the snap merge to complete...
Failed to find logical volume "snapper/merge1"
Mount and verify the proper data now exists on the origin
Checking files on /mnt/origin

+++ snapshot snapper/merge2 +++
Writing files to /mnt/merge2
Checking files on /mnt/merge2
Deactivating origin/snap volume(s)
Merge snapshot snapper/merge2 back into the origin
lvconvert --merge snapper/merge2
Attempting regression check for multiple queued up merge attempts
1. Same snap already initiated (bug 1481403): lvconvert --merge snapper/merge2
     Command on LV snapper/merge2 is invalid on LV with properties: lv_is_merging_cow .
     Command not permitted on LV snapper/merge2.
2. Different snap with other initiated (bug 1481383): lvconvert --merge snapper/merge3
     Cannot merge snapshot snapper/merge3 into the origin snapper/origin with merging snapshot snapper/merge2.
Activating origin/snap volume(s)
Waiting for the snap merge to complete...
Failed to find logical volume "snapper/merge2"
Mount and verify the proper data now exists on the origin
Checking files on /mnt/origin

[...]
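
The safe pattern the test exercises (one merge at a time: queue it while the origin is inactive, reactivate, and wait for that merge to finish before touching the next snapshot) could be scripted roughly as follows. This is a sketch only; it follows the test's own convention of polling until the merged snapshot LV disappears and omits the mount/write/verify steps:

merge_one() {
    snap=$1
    # assumes any filesystems on the origin/snapshots are already unmounted
    vgchange -an snapper                       # deactivate origin and snapshots
    lvconvert --merge "snapper/$snap"          # queue the merge while inactive
    vgchange -ay snapper                       # activation starts the merge
    # lvconvert's background polling removes the snapshot LV once the merge completes
    while lvs "snapper/$snap" >/dev/null 2>&1; do
        sleep 5
    done
}

for s in merge1 merge2 merge3 merge4 merge5 merge6; do
    merge_one "$s"
done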
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:0853