Bug 1481383 - need to protect against multiple queued up merge attempts while origin is inactive
Summary: need to protect against multiple queued up merge attempts while origin is inactive
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Zdenek Kabelac
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-08-14 19:42 UTC by Corey Marthaler
Modified: 2021-09-03 12:55 UTC
CC List: 9 users

Fixed In Version: lvm2-2.02.176-4.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-10 15:20:44 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
Red Hat Product Errata RHEA-2018:0853 (last updated 2018-04-10 15:21:32 UTC)

Description Corey Marthaler 2017-08-14 19:42:00 UTC
Description of problem:
[root@host-075 tmp]# lvs -a -o +devices
  LV     VG      Attr       LSize   Pool Origin Devices        
  merge1 snapper swi---s---   2.00g      origin /dev/sda1(1024)
  merge2 snapper swi---s---   2.00g      origin /dev/sda1(1536)
  merge3 snapper swi---s---   2.00g      origin /dev/sda1(2048)
  merge4 snapper swi---s---   2.00g      origin /dev/sda1(2560)
  merge5 snapper swi---s---   2.00g      origin /dev/sda1(3072)
  origin snapper owi---s---   4.00g             /dev/sda1(0)   

[root@host-075 tmp]# lvconvert --merge snapper/merge1
  Merging of snapshot snapper/merge1 will occur on next activation of snapper/origin.
[root@host-075 tmp]# lvconvert --merge snapper/merge2
  Merging of snapshot snapper/merge2 will occur on next activation of snapper/origin.
[root@host-075 tmp]# lvconvert --merge snapper/merge3
  Merging of snapshot snapper/merge3 will occur on next activation of snapper/origin.
[root@host-075 tmp]# lvconvert --merge snapper/merge4
  Merging of snapshot snapper/merge4 will occur on next activation of snapper/origin.
[root@host-075 tmp]# lvconvert --merge snapper/merge5
  Merging of snapshot snapper/merge5 will occur on next activation of snapper/origin.
[root@host-075 tmp]# vgchange -ay snapper
  1 logical volume(s) in volume group "snapper" now active
  Background polling started for 1 logical volume(s) in volume group "snapper"

[root@host-075 ~]# dmsetup status
snapper-merge5-cow: 0 4194304 linear 
snapper-merge5: 0 8388608 error 
snapper-merge4: 0 8388608 error 
snapper-merge3: 0 8388608 error 
snapper-merge1-cow: 
snapper-merge2: 0 8388608 error 
snapper-merge1: 0 8388608 error 
snapper-origin-real: 0 8388608 linear 
snapper-merge2-cow: 
rhel_host--075-swap: 0 1679360 linear 
snapper-merge3-cow: 
rhel_host--075-root: 0 12992512 linear 
snapper-merge4-cow: 
snapper-origin: 0 8388608 snapshot-merge 512/4194304 512

[root@host-075 tmp]# lvs -a -o +devices
[Deadlock]
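
For reference, the transcript above reduces to the following reproduction sketch (an outline, not a verified script; the VG and LV names match this report, but /dev/sda1 as the PV is an assumption about the test machine):

# Reproduction sketch based on the steps above. On the affected build
# (lvm2-2.02.171-8.el7) the final commands deadlock.
vgcreate snapper /dev/sda1
lvcreate -L 4G -n origin snapper
for i in 1 2 3 4 5; do
    lvcreate -s -L 2G -n merge$i snapper/origin
done
vgchange -an snapper                    # origin inactive, so merges below are deferred
for i in 1 2 3 4 5; do
    lvconvert --merge snapper/merge$i   # each merge is queued for the next activation
done
vgchange -ay snapper                    # activation starts merging with 5 merges queued
lvs -a -o +devices                      # hangs on the affected build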

Aug 14 12:35:18 host-075 lvmpolld: W: #011LVPOLL: PID 27744: STDERR: '  Internal error: Performing unsafe table load while 12 device(s) are known to be suspended:  (253:5) '
Aug 14 12:36:18 host-075 lvmpolld: W: LVMPOLLD: polling for output of the lvm cmd (PID 27744) has timed out
Aug 14 12:36:26 host-075 systemd: Started Session 7749 of user root.
Aug 14 12:36:26 host-075 systemd: Starting Session 7749 of user root.
Aug 14 12:36:26 host-075 systemd-logind: New session 7749 of user root.
Aug 14 12:36:26 host-075 dbus[619]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Aug 14 12:36:26 host-075 dbus-daemon: dbus[619]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Aug 14 12:36:26 host-075 dbus[619]: [system] Successfully activated service 'org.freedesktop.problems'
Aug 14 12:36:26 host-075 dbus-daemon: dbus[619]: [system] Successfully activated service 'org.freedesktop.problems'
Aug 14 12:37:18 host-075 lvmpolld: W: LVMPOLLD: polling for output of the lvm cmd (PID 27744) has timed out
Aug 14 12:38:18 host-075 lvmpolld[27740]: LVMPOLLD: LVM2 cmd is unresponsive too long (PID 27744) (no output for 180 seconds)
Aug 14 12:38:18 host-075 lvmpolld: W: LVMPOLLD: polling for output of the lvm cmd (PID 27744) has timed out
Aug 14 12:38:38 host-075 kernel: INFO: task kworker/0:0:25794 blocked for more than 120 seconds.
Aug 14 12:38:38 host-075 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 14 12:38:38 host-075 kernel: kworker/0:0     D ffff880016f217d4     0 25794      2 0x00000080
Aug 14 12:38:38 host-075 kernel: Workqueue: ksnaphd do_metadata [dm_snapshot]
Aug 14 12:38:38 host-075 kernel: ffff880028e5bc10 0000000000000046 ffff88001dd7bf40 ffff880028e5bfd8
Aug 14 12:38:38 host-075 kernel: ffff880028e5bfd8 ffff880028e5bfd8 ffff88001dd7bf40 ffff88003fc16cc0
Aug 14 12:38:38 host-075 kernel: 0000000000000000 7fffffffffffffff ffff88001dd7bf40 ffff880037fe0c60
Aug 14 12:38:38 host-075 kernel: Call Trace:
Aug 14 12:38:38 host-075 kernel: [<ffffffff816a8469>] schedule+0x29/0x70
Aug 14 12:38:38 host-075 kernel: [<ffffffff816a5f79>] schedule_timeout+0x239/0x2c0
Aug 14 12:38:38 host-075 kernel: [<ffffffff812f8e50>] ? submit_bio+0x70/0x150
Aug 14 12:38:38 host-075 kernel: [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
Aug 14 12:38:38 host-075 kernel: [<ffffffff816a7aed>] io_schedule_timeout+0xad/0x130
Aug 14 12:38:38 host-075 kernel: [<ffffffffc000acf0>] ? bvec_next_page+0x10/0x10 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffff816a8a9d>] wait_for_completion_io+0xfd/0x140
Aug 14 12:38:38 host-075 kernel: [<ffffffff810c4850>] ? wake_up_state+0x20/0x20
Aug 14 12:38:38 host-075 kernel: [<ffffffffc000b5bf>] dm_io+0x26f/0x2e0 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc000ae90>] ? dm_io_client_destroy+0x30/0x30 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc000acf0>] ? bvec_next_page+0x10/0x10 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc03f2931>] do_metadata+0x21/0x30 [dm_snapshot]
Aug 14 12:38:38 host-075 kernel: [<ffffffff810a885a>] process_one_work+0x17a/0x440
Aug 14 12:38:38 host-075 kernel: [<ffffffff810a9526>] worker_thread+0x126/0x3c0
Aug 14 12:38:38 host-075 kernel: [<ffffffff810a9400>] ? manage_workers.isra.24+0x2a0/0x2a0
Aug 14 12:38:38 host-075 kernel: [<ffffffff810b09cf>] kthread+0xcf/0xe0
Aug 14 12:38:38 host-075 kernel: [<ffffffff810b0900>] ? insert_kthread_work+0x40/0x40
Aug 14 12:38:38 host-075 kernel: [<ffffffff816b3ed8>] ret_from_fork+0x58/0x90
Aug 14 12:38:38 host-075 kernel: [<ffffffff810b0900>] ? insert_kthread_work+0x40/0x40
Aug 14 12:38:38 host-075 kernel: INFO: task lvm:27744 blocked for more than 120 seconds.
Aug 14 12:38:38 host-075 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 14 12:38:38 host-075 kernel: lvm             D ffff8800371dba50     0 27744  27743 0x00000080
Aug 14 12:38:38 host-075 kernel: ffff8800371db8f0 0000000000000082 ffff88001dd79fa0 ffff8800371dbfd8
Aug 14 12:38:38 host-075 kernel: ffff8800371dbfd8 ffff8800371dbfd8 ffff88001dd79fa0 ffff8800371dba70
Aug 14 12:38:38 host-075 kernel: 7fffffffffffffff ffff8800371dba68 ffff88001dd79fa0 ffff8800371dba50
Aug 14 12:38:38 host-075 kernel: Call Trace:
Aug 14 12:38:38 host-075 kernel: [<ffffffff816a8469>] schedule+0x29/0x70
Aug 14 12:38:38 host-075 kernel: [<ffffffff816a5f79>] schedule_timeout+0x239/0x2c0
Aug 14 12:38:38 host-075 kernel: [<ffffffff810c1315>] ? check_preempt_curr+0x85/0xa0
Aug 14 12:38:38 host-075 kernel: [<ffffffff810c1349>] ? ttwu_do_wakeup+0x19/0xd0
Aug 14 12:38:38 host-075 kernel: [<ffffffff816a881d>] wait_for_completion+0xfd/0x140
Aug 14 12:38:38 host-075 kernel: [<ffffffff810c4850>] ? wake_up_state+0x20/0x20
Aug 14 12:38:38 host-075 kernel: [<ffffffff810a750c>] flush_workqueue+0x12c/0x5b0
Aug 14 12:38:38 host-075 kernel: [<ffffffffc03f2a36>] chunk_io+0xf6/0x140 [dm_snapshot]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc03f2910>] ? persistent_prepare_exception+0x90/0x90 [dm_snapshot]
Aug 14 12:38:38 host-075 kernel: [<ffffffff811c192b>] ? vmalloc+0x4b/0x50
Aug 14 12:38:38 host-075 kernel: [<ffffffffc03f0210>] ? alloc_completed_exception+0x50/0x50 [dm_snapshot]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc03f31e6>] persistent_read_metadata+0xa6/0x560 [dm_snapshot]
Aug 14 12:38:38 host-075 kernel: [<ffffffff8118573d>] ? mempool_create_node+0xdd/0x140
Aug 14 12:38:38 host-075 kernel: [<ffffffff811df75c>] ? kmem_cache_alloc_trace+0x3c/0x200
Aug 14 12:38:38 host-075 kernel: [<ffffffffc03f0baa>] snapshot_ctr+0x5fa/0x670 [dm_snapshot]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc0005a57>] dm_table_add_target+0x177/0x430 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc0009417>] table_load+0x157/0x390 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc00092c0>] ? retrieve_status+0x1c0/0x1c0 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc000a015>] ctl_ioctl+0x1e5/0x500 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffffc000a343>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Aug 14 12:38:38 host-075 kernel: [<ffffffff812151fd>] do_vfs_ioctl+0x33d/0x540
Aug 14 12:38:38 host-075 kernel: [<ffffffff812b766f>] ? file_has_perm+0x9f/0xb0
Aug 14 12:38:38 host-075 kernel: [<ffffffff812154a1>] SyS_ioctl+0xa1/0xc0
Aug 14 12:38:38 host-075 kernel: [<ffffffff816b3f89>] system_call_fastpath+0x16/0x1b


Version-Release number of selected component (if applicable):
3.10.0-686.el7.x86_64

lvm2-2.02.171-8.el7    BUILT: Wed Jun 28 13:28:58 CDT 2017
lvm2-libs-2.02.171-8.el7    BUILT: Wed Jun 28 13:28:58 CDT 2017
lvm2-cluster-2.02.171-8.el7    BUILT: Wed Jun 28 13:28:58 CDT 2017
device-mapper-1.02.140-8.el7    BUILT: Wed Jun 28 13:28:58 CDT 2017
device-mapper-libs-1.02.140-8.el7    BUILT: Wed Jun 28 13:28:58 CDT 2017
device-mapper-event-1.02.140-8.el7    BUILT: Wed Jun 28 13:28:58 CDT 2017
device-mapper-event-libs-1.02.140-8.el7    BUILT: Wed Jun 28 13:28:58 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017

Comment 2 Zdenek Kabelac 2017-11-08 12:17:47 UTC
Solved with upstream commit:

https://www.redhat.com/archives/lvm-devel/2017-November/msg00014.html
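
With the fix, lvconvert rejects a merge request for an origin that already has a merging snapshot queued (see the verification output in comment 4 below). As a sketch, assuming the lv_merging report field available in lvm2 builds of this era, an in-flight merge can be checked for before queuing another:

# Before queuing a merge, check whether another snapshot of the same
# origin is already set to merge; lv_merging is set for such snapshots.
lvs -o lv_name,origin,lv_merging snapper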

Comment 4 Corey Marthaler 2017-12-08 23:17:25 UTC
Fix verified in the latest rpms.

3.10.0-811.el7.x86_64
lvm2-2.02.176-5.el7    BUILT: Wed Dec  6 04:13:07 CST 2017
lvm2-libs-2.02.176-5.el7    BUILT: Wed Dec  6 04:13:07 CST 2017
lvm2-cluster-2.02.176-5.el7    BUILT: Wed Dec  6 04:13:07 CST 2017
lvm2-lockd-2.02.176-5.el7    BUILT: Wed Dec  6 04:13:07 CST 2017
lvm2-python-boom-0.8.1-5.el7    BUILT: Wed Dec  6 04:15:40 CST 2017
cmirror-2.02.176-5.el7    BUILT: Wed Dec  6 04:13:07 CST 2017
device-mapper-1.02.145-5.el7    BUILT: Wed Dec  6 04:13:07 CST 2017
device-mapper-libs-1.02.145-5.el7    BUILT: Wed Dec  6 04:13:07 CST 2017
device-mapper-event-1.02.145-5.el7    BUILT: Wed Dec  6 04:13:07 CST 2017
device-mapper-event-libs-1.02.145-5.el7    BUILT: Wed Dec  6 04:13:07 CST 2017
device-mapper-persistent-data-0.7.3-2.el7    BUILT: Tue Oct 10 04:00:07 CDT 2017



SCENARIO - [write_to_snap_merge]
Create snaps of origin with fs data, verify data on snaps, change data on snaps, merge data back to origin, verify origin data
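
Condensed, the transcript below performs roughly the following per pass (a sketch; the mkfs call and mount/verify steps stand in for the harness's file checks):

# Setup: origin with an ext filesystem and six snapshots, as in the log below.
lvcreate -L 4G snapper -n origin
mkfs.ext4 /dev/snapper/origin
for i in 1 2 3 4 5 6; do
    lvcreate -s /dev/snapper/origin -c 128 -n merge$i -L 2G
done
# Per snapshot: write new data, deactivate, queue the merge, then confirm
# that further merge attempts are rejected before re-activating.
vgchange -an snapper
lvconvert --merge snapper/merge1   # queued for next activation
lvconvert --merge snapper/merge1   # same snap again: rejected (bug 1481403)
lvconvert --merge snapper/merge2   # different snap: rejected (this bug, 1481383)
vgchange -ay snapper               # merge of merge1 proceeds in the background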
Making origin volume
lvcreate -L 4G snapper -n origin
Placing an ext filesystem on origin volume
mke2fs 1.42.9 (28-Dec-2013)
Mounting origin volume

Writing files to /mnt/origin

Checking files on /mnt/origin


Making 6 snapshots of the origin volume, mounting, and verifying original data
lvcreate -s /dev/snapper/origin -c 128 -n merge1 -L 2G
+++ Mounting and verifying snapshot merge1 data +++
Checking files on /mnt/merge1

lvcreate -s /dev/snapper/origin -c 128 -n merge2 -L 2G
+++ Mounting and verifying snapshot merge2 data +++
Checking files on /mnt/merge2

lvcreate -s /dev/snapper/origin -c 128 -n merge3 -L 2G
+++ Mounting and verifying snapshot merge3 data +++
Checking files on /mnt/merge3

lvcreate -s /dev/snapper/origin -c 128 -n merge4 -L 2G
+++ Mounting and verifying snapshot merge4 data +++
Checking files on /mnt/merge4

lvcreate -s /dev/snapper/origin -c 128 -n merge5 -L 2G
+++ Mounting and verifying snapshot merge5 data +++
Checking files on /mnt/merge5

lvcreate -s /dev/snapper/origin -c 128 -n merge6 -L 2G
+++ Mounting and verifying snapshot merge6 data +++
Checking files on /mnt/merge6

Writing new snapshot data and then merging back each of the snapshot volumes
+++ snapshot snapper/merge1 +++
Writing files to /mnt/merge1
Checking files on /mnt/merge1
Deactivating origin/snap volume(s)
Merge snapshot snapper/merge1 back into the origin
lvconvert --merge snapper/merge1

Attempting regression check for multiple queued up merge attempts
1. Same snap already initiated (bug 1481403): lvconvert --merge snapper/merge1
  Command on LV snapper/merge1 is invalid on LV with properties: lv_is_merging_cow .
  Command not permitted on LV snapper/merge1.
2. Different snap with other initiated (bug 1481383): lvconvert --merge snapper/merge2
  Cannot merge snapshot snapper/merge2 into the origin snapper/origin with merging snapshot snapper/merge1.

Activating origin/snap volume(s)
Waiting for the snap merge to complete...
  Failed to find logical volume "snapper/merge1"
Mount and verify the proper data now exists on the origin
Checking files on /mnt/origin


+++ snapshot snapper/merge2 +++
Writing files to /mnt/merge2
Checking files on /mnt/merge2
Deactivating origin/snap volume(s)
Merge snapshot snapper/merge2 back into the origin
lvconvert --merge snapper/merge2

Attempting regression check for multiple queued up merge attempts
1. Same snap already initiated (bug 1481403): lvconvert --merge snapper/merge2
  Command on LV snapper/merge2 is invalid on LV with properties: lv_is_merging_cow .
  Command not permitted on LV snapper/merge2.
2. Different snap with other initiated (bug 1481383): lvconvert --merge snapper/merge3
  Cannot merge snapshot snapper/merge3 into the origin snapper/origin with merging snapshot snapper/merge2.

Activating origin/snap volume(s)
Waiting for the snap merge to complete...
  Failed to find logical volume "snapper/merge2"
Mount and verify the proper data now exists on the origin
Checking files on /mnt/origin
[...]

Comment 7 errata-xmlrpc 2018-04-10 15:20:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0853

