Bug 1753713 - vgchange --refresh can fail if running "delayed" merge is in progress: "snapshot-merge: A snapshot is already merging."
Summary: vgchange --refresh can fail if running "delayed" merge is in progress: "snapshot-merge: A snapshot is already merging."
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.8
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Zdenek Kabelac
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-09-19 15:37 UTC by Corey Marthaler
Modified: 2021-09-03 12:55 UTC
CC List: 8 users

Fixed In Version: lvm2-2.02.186-3.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-31 20:04:51 UTC
Target Upstream Version:
Embargoed:


Attachments
verbose vgchange --refresh attempt (77.27 KB, text/plain)
2019-09-19 15:39 UTC, Corey Marthaler


Links
Red Hat Product Errata RHBA-2020:1129 (last updated 2020-03-31 20:05:22 UTC)

Description Corey Marthaler 2019-09-19 15:37:35 UTC
Description of problem:
SCENARIO - [reboot_before_cache_snap_merge_starts]
Attempt to merge an inuse snapshot, then "reboot" the machine before the merge can take place

*** Cache info for this scenario ***
*  origin (slow):  /dev/mapper/mpatha1
*  pool (fast):    /dev/mapper/mpathh1
************************************

Adding "slow" and "fast" tags to corresponding pvs
Create origin (slow) volume
lvcreate --wipesignatures y  -L 4G -n corigin cache_sanity @slow

Create cache data and cache metadata (fast) volumes
lvcreate  -L 2G -n pool cache_sanity @fast
lvcreate  -L 12M -n pool_meta cache_sanity @fast
WARNING: xfs signature detected on /dev/cache_sanity/pool_meta at offset 0. Wipe it? [y/n]: [n]
  Aborted wiping of xfs.
  1 existing signature left on the device.

Create cache pool volume by combining the cache data and cache metadata (fast) volumes with policy: smq  mode: writeback
lvconvert --yes --type cache-pool --cachepolicy smq --cachemode writeback -c 64 --poolmetadata cache_sanity/pool_meta cache_sanity/pool
  WARNING: Converting cache_sanity/pool and cache_sanity/pool_meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Create cached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type cache --cachemetadataformat 2 --cachepool cache_sanity/pool cache_sanity/corigin

Placing an xfs filesystem on origin volume
warning: device is not properly aligned /dev/cache_sanity/corigin
Mounting origin volume

Making snapshot of origin volume
lvcreate  -s /dev/cache_sanity/corigin -c 128 -n merge_reboot -L 500M
Mounting snap volume

Attempt to merge snapshot cache_sanity/merge_reboot
lvconvert --merge cache_sanity/merge_reboot --yes
  Delaying merge since snapshot is open.
  Merging of snapshot cache_sanity/merge_reboot will occur on next activation of cache_sanity/corigin.


umount and deactivate volume group
vgchange --sysinit -ay cache_sanity
vgchange --refresh cache_sanity
  device-mapper: reload ioctl on  (253:24) failed: Invalid argument
  Failed to suspend cache_sanity/corigin.


[80033.514171] device-mapper: cache: Origin device (dm-21) discard unsupported: Disabling discard passdown.
[80034.448551] device-mapper: table: 253:24: snapshot-merge: A snapshot is already merging.
[80034.457612] device-mapper: ioctl: error adding target to table
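The two kernel lines above are the clearest signature of this failure. A minimal sketch for scanning a captured log for it; the helper name `merge_conflict_in_log` is an assumption for illustration, not part of lvm2 or the test harness:

```shell
# Return success if the log text on stdin contains the table-load
# conflict from this bug. Hypothetical helper, not part of lvm2.
merge_conflict_in_log() {
    grep -q 'snapshot-merge: A snapshot is already merging'
}

# Example: dmesg | merge_conflict_in_log && echo "delayed-merge conflict seen"
```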


From -vvvv:


#libdm-deptree.c:2734      Suppressed cache_sanity-pool_cmeta (253:20) identical table reload.
#libdm-deptree.c:2701      Loading table for cache_sanity-corigin-real (253:22).
#libdm-deptree.c:2646          Adding target to (253:22): 0 8388608 cache 253:20 253:19 253:21 64 1 writethrough cleaner 0
#ioctl/libdm-iface.c:1885          dm table   (253:22) [ opencount flush ]   [16384] (*1)
#libdm-deptree.c:2734      Suppressed cache_sanity-corigin-real (253:22) identical table reload.
#libdm-deptree.c:2701      Loading table for cache_sanity-corigin (253:24).
#libdm-deptree.c:2646          Adding target to (253:24): 0 8388608 snapshot-merge 253:22 253:23 P 256
#ioctl/libdm-iface.c:1885          dm table   (253:24) [ opencount flush ]   [16384] (*1)
#ioctl/libdm-iface.c:1885          dm reload   (253:24) [ noopencount flush ]   [16384] (*1)
#ioctl/libdm-iface.c:1923    device-mapper: reload ioctl on  (253:24) failed: Invalid argument
#libdm-deptree.c:2851          <backtrace>
#activate/dev_manager.c:3325          <backtrace>
#activate/dev_manager.c:3380          <backtrace>
#activate/activate.c:1419          <backtrace>
#activate/activate.c:2315          <backtrace>
#locking/locking.c:311           <backtrace>
#locking/locking.c:392           <backtrace>
#metadata/lv_manip.c:1482    Failed to suspend cache_sanity/corigin.
#locking/file_locking.c:90          Unlocking LV G9IYZ9lPEwMpQqPLA5ajw7pSV3RX8OyQM8FAVmtkXolH3OEPYXbLfCPiJJTQbviK
#activate/activate.c:2545          Resuming LV cache_sanity/corigin if active.
#activate/dev_manager.c:783           Getting device info for cache_sanity-corigin [LVM-G9IYZ9lPEwMpQqPLA5ajw7pSV3RX8OyQM8FAVmtkXolH3OEPYXbLfCPiJJTQbviK].
#ioctl/libdm-iface.c:1885          dm info  LVM-G9IYZ9lPEwMpQqPLA5ajw7pSV3RX8OyQM8FAVmtkXolH3OEPYXbLfCPiJJTQbviK [ noopencount flush ]   [16384] (*1)
#misc/lvm-flock.c:70          Unlocking /run/lock/lvm/A_cache_sanity
#misc/lvm-flock.c:47            _undo_flock /run/lock/lvm/A_cache_sanity





Version-Release number of selected component (if applicable):
3.10.0-1091.el7.x86_64

lvm2-2.02.186-1.el7    BUILT: Tue Aug 27 11:16:28 CDT 2019
lvm2-libs-2.02.186-1.el7    BUILT: Tue Aug 27 11:16:28 CDT 2019
lvm2-cluster-2.02.186-1.el7    BUILT: Tue Aug 27 11:16:28 CDT 2019
lvm2-lockd-2.02.186-1.el7    BUILT: Tue Aug 27 11:16:28 CDT 2019
lvm2-python-boom-0.9-19.el7    BUILT: Tue Aug 27 11:19:25 CDT 2019
cmirror-2.02.186-1.el7    BUILT: Tue Aug 27 11:16:28 CDT 2019
device-mapper-1.02.164-1.el7    BUILT: Tue Aug 27 11:16:28 CDT 2019
device-mapper-libs-1.02.164-1.el7    BUILT: Tue Aug 27 11:16:28 CDT 2019
device-mapper-event-1.02.164-1.el7    BUILT: Tue Aug 27 11:16:28 CDT 2019
device-mapper-event-libs-1.02.164-1.el7    BUILT: Tue Aug 27 11:16:28 CDT 2019
device-mapper-persistent-data-0.8.5-1.el7    BUILT: Mon Jun 10 03:58:20 CDT 2019


How reproducible:
Often, but not always

Comment 3 Corey Marthaler 2019-09-19 15:39:46 UTC
Created attachment 1616798 [details]
verbose vgchange --refresh attempt

Comment 4 Zdenek Kabelac 2019-10-28 19:04:38 UTC
I hope these patches fix this issue:

https://www.redhat.com/archives/lvm-devel/2019-October/msg00114.html
https://www.redhat.com/archives/lvm-devel/2019-October/msg00115.html
https://www.redhat.com/archives/lvm-devel/2019-October/msg00116.html

Effectively, after the merge the table was improperly updated.

If these patches do not fix the problem, we will need to revisit this.
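One way to observe the state described here is to inspect the device's dm table directly: after a completed merge, the `snapshot-merge` target should no longer be loaded. A minimal sketch, assuming `dmsetup table` output on stdin (lines of the form "start length target args..."); the helper name is hypothetical:

```shell
# Succeed if any dm table line on stdin still carries a snapshot-merge
# target (field 3 of "dmsetup table" output is the target type).
# Hypothetical helper for checking the post-merge table state.
table_has_snapshot_merge() {
    awk '$3 == "snapshot-merge" { found = 1 } END { exit !found }'
}

# Example (needs root and an active mapped device):
# dmsetup table cache_sanity-corigin | table_has_snapshot_merge \
#     && echo "snapshot-merge target still loaded"
```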

Comment 7 Corey Marthaler 2019-11-11 16:44:14 UTC
Fix verified in the latest rpms. I'm not sure about the occasional "/sys/dev/block/253:3/dm/uuid: fgets failed: Invalid argument" messages, though; that's likely a different issue.

3.10.0-1109.el7.x86_64

lvm2-2.02.186-3.el7    BUILT: Fri Nov  8 07:07:01 CST 2019
lvm2-libs-2.02.186-3.el7    BUILT: Fri Nov  8 07:07:01 CST 2019
lvm2-cluster-2.02.186-3.el7    BUILT: Fri Nov  8 07:07:01 CST 2019
lvm2-lockd-2.02.186-3.el7    BUILT: Fri Nov  8 07:07:01 CST 2019
device-mapper-1.02.164-3.el7    BUILT: Fri Nov  8 07:07:01 CST 2019
device-mapper-libs-1.02.164-3.el7    BUILT: Fri Nov  8 07:07:01 CST 2019
device-mapper-event-1.02.164-3.el7    BUILT: Fri Nov  8 07:07:01 CST 2019
device-mapper-event-libs-1.02.164-3.el7    BUILT: Fri Nov  8 07:07:01 CST 2019
device-mapper-persistent-data-0.8.5-1.el7    BUILT: Mon Jun 10 03:58:20 CDT 2019


[root@hayes-01 ~]# lvs -a -o +devices
  LV              VG           Attr       LSize   Pool   Origin          Data%  Meta%  Move Log Cpy%Sync Convert Devices         
  corigin         cache_sanity Owi---C---   4.00g [pool] [corigin_corig]                                         corigin_corig(0)
  [corigin_corig] cache_sanity owi---C---   4.00g                                                                /dev/sdg1(0)    
  [lvol0_pmspare] cache_sanity ewi-------  12.00m                                                                /dev/sdc1(0)    
  [merge_reboot]  cache_sanity Swi---s--- 500.00m        corigin                                                 /dev/sdc1(3)    
  [pool]          cache_sanity Cwi---C---   2.00g                                                                pool_cdata(0)   
  [pool_cdata]    cache_sanity Cwi-------   2.00g                                                                /dev/sde1(0)    
  [pool_cmeta]    cache_sanity ewi-------  12.00m                                                                /dev/sde1(512)  
[root@hayes-01 ~]# vgchange --sysinit -ay cache_sanity
  1 logical volume(s) in volume group "cache_sanity" now active
[root@hayes-01 ~]# vgchange --refresh cache_sanity
[root@hayes-01 ~]# lvs -a -o +devices
  LV              VG           Attr       LSize  Pool   Origin          Data%  Meta%  Move Log Cpy%Sync Convert Devices         
  corigin         cache_sanity Cwi-a-C---  4.00g [pool] [corigin_corig] 0.00   6.58            0.00             corigin_corig(0)
  [corigin_corig] cache_sanity owi-aoC---  4.00g                                                                /dev/sdg1(0)    
  [lvol0_pmspare] cache_sanity ewi------- 12.00m                                                                /dev/sdc1(0)    
  [pool]          cache_sanity Cwi---C---  2.00g                        0.00   6.58            0.00             pool_cdata(0)   
  [pool_cdata]    cache_sanity Cwi-ao----  2.00g                                                                /dev/sde1(0)    
  [pool_cmeta]    cache_sanity ewi-ao---- 12.00m                                                                /dev/sde1(512)  
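In the lvs listings above, the pending merge is visible in the attr column: per lvs(8), a first attr character of 'S' marks a merging snapshot and 'O' an origin with a merging snapshot (compare `Swi---s---` for merge_reboot and `Owi---C---` for corigin before the refresh). A sketch that filters for such volumes; the helper name is an assumption:

```shell
# Print LVs with a pending snapshot merge, keyed off the first lv_attr
# character ('S' = merging snapshot, 'O' = origin with merging snapshot,
# see lvs(8)). Hypothetical triage helper.
pending_merges() {
    # stdin: "lvs --noheadings -o lv_name,lv_attr" output
    awk 'substr($2, 1, 1) ~ /[SO]/ { print $1 }'
}

# Example: lvs --noheadings -o lv_name,lv_attr cache_sanity | pending_merges
```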






============================================================
Iteration 10 of 10 started at Mon Nov 11 10:38:08 CST 2019
============================================================
SCENARIO - [reboot_before_cache_snap_merge_starts]
Attempt to merge an inuse snapshot, then "reboot" the machine before the merge can take place

*** Cache info for this scenario ***
*  origin (slow):  /dev/sde1
*  pool (fast):    /dev/sdc1
************************************

Adding "slow" and "fast" tags to corresponding pvs
Create origin (slow) volume
lvcreate --wipesignatures y  -L 4G -n corigin cache_sanity @slow
WARNING: xfs signature detected on /dev/cache_sanity/corigin at offset 0. Wipe it? [y/n]: [n]
  Aborted wiping of xfs.
  1 existing signature left on the device.

Create cache data and cache metadata (fast) volumes
lvcreate  -L 2G -n pool cache_sanity @fast
lvcreate  -L 12M -n pool_meta cache_sanity @fast

Create cache pool volume by combining the cache data and cache metadata (fast) volumes with policy: cleaner  mode: writethrough
lvconvert --yes --type cache-pool --cachepolicy cleaner --cachemode writethrough -c 32 --poolmetadata cache_sanity/pool_meta cache_sanity/pool
  WARNING: Converting cache_sanity/pool and cache_sanity/pool_meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Create cached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type cache --cachemetadataformat 1 --cachepool cache_sanity/pool cache_sanity/corigin

Placing an xfs filesystem on origin volume
Mounting origin volume

Making snapshot of origin volume
lvcreate  -s /dev/cache_sanity/corigin -c 128 -n merge_reboot -L 500M
Mounting snap volume

Attempt to merge snapshot cache_sanity/merge_reboot
lvconvert --merge cache_sanity/merge_reboot --yes

umount and deactivate volume group
vgchange --sysinit -ay cache_sanity
vgchange --refresh cache_sanity
  /sys/dev/block/253:3/dm/uuid: fgets failed: Invalid argument
  /sys/dev/block/253:3/dm/uuid: fgets failed: Invalid argument
  /sys/dev/block/253:3/dm/uuid: fgets failed: Invalid argument
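The repeated messages above all name the same device. When triaging, it can help to pull the affected dm major:minor pairs out of the captured stderr; a minimal sketch (the helper name is hypothetical):

```shell
# Print the unique dm major:minor pairs named in
# "/sys/dev/block/<maj>:<min>/dm/uuid: fgets failed" messages on stdin.
# Hypothetical triage helper.
failing_dm_devices() {
    sed -n 's|.*/sys/dev/block/\([0-9]*:[0-9]*\)/dm/uuid.*|\1|p' | sort -u
}

# Example: vgchange --refresh cache_sanity 2>&1 | failing_dm_devices
```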

Uncaching cache origin (lvconvert --uncache) cache_sanity/corigin from cache origin
Removing cache origin volume cache_sanity/corigin
lvremove -f /dev/cache_sanity/corigin

Comment 9 errata-xmlrpc 2020-03-31 20:04:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1129

