Bug 1536015 - locking error when converting raid1 volumes to cache pool
Summary: locking error when converting raid1 volumes to cache pool
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-01-18 12:56 UTC by Roman Bednář
Modified: 2021-09-03 12:40 UTC
CC List: 9 users

Fixed In Version: lvm2-2.02.177-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-10 15:23:49 UTC
Target Upstream Version:
Embargoed:


Attachments: (none)


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHEA-2018:0853  0        None      None    None     2018-04-10 15:24:41 UTC

Description Roman Bednář 2018-01-18 12:56:54 UTC
Converting raid1 "slow" and raid1 "fast" volumes to a cache pool occasionally ends with a locking error.


This does not seem 100% reproducible, although there are scenarios that should eventually trigger this bug.
Unfortunately, I cannot attach verbose lvconvert output since I have not been able to reproduce it again yet.
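If this reproduces again, one way to keep the verbose output for the failing step would be to re-run it with -vvvv and capture stderr, along these lines (just a sketch; the LV names come from the scenario below and the log path is arbitrary):

  # Sketch: capture full debug output of the failing conversion step.
  # LVM verbose/debug messages go to stderr.
  lvconvert -vvvv --yes --type cache-pool --cachepolicy mq --cachemode writethrough \
      -c 32 --poolmetadata cache_sanity/pool_2_meta cache_sanity/pool_2 \
      2> /tmp/lvconvert_pool_2.log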


=============================================
SCENARIO - [many_caches]
 Create 500 cached volumes
 
 *** Cache info for this scenario ***
 *  origin (slow):  /dev/sdb1 /dev/sdf1
 *  pool (fast):    /dev/sdc1 /dev/sdd1
 ************************************
 
 Recreating VG and PVs to increase metadata size
 Adding "slow" and "fast" tags to corresponding pvs
 Create origin (slow) volume
 lvcreate --activate ey --type raid1 -m 1 -L 20M -n origin_1 cache_sanity @slow
 Waiting until all mirror|raid volumes become fully syncd...
    1/1 mirror(s) are fully synced: ( 100.00% )
 Sleeping 15 sec
 
 Create cache data and cache metadata (fast) volumes
 lvcreate --activate ey --type raid1 -m 1 -L 10M -n pool_1 cache_sanity @fast
 lvcreate --activate ey --type raid1 -m 1 -L 12M -n pool_1_meta cache_sanity @fast
 Waiting until all mirror|raid volumes become fully syncd...
    2/2 mirror(s) are fully synced: ( 100.00% 100.00% )
 Sleeping 15 sec
 Sleeping 15 sec
 
 Create cache pool volume by combining the cache data and cache metadata (fast) volumes with policy: cleaner  mode: writethrough
 lvconvert --yes --type cache-pool --cachepolicy cleaner --cachemode writethrough -c 32 --poolmetadata cache_sanity/pool_1_meta cache_sanity/pool_1
   WARNING: Converting cache_sanity/pool_1 and cache_sanity/pool_1_meta to cache pool's data and metadata volumes with metadata wiping.
   THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
 Create cached volume by combining the cache pool (fast) and origin (slow) volumes
 lvconvert --yes --type cache --cachemetadataformat 1 --cachepool cache_sanity/pool_1 cache_sanity/origin_1
 dmsetup status | grep cache_sanity-origin_1 | grep writethrough | grep -w cleaner
 
 
 Create origin (slow) volume
 lvcreate --activate ey --type raid1 -m 1 -L 20M -n origin_2 cache_sanity @slow
 Waiting until all mirror|raid volumes become fully syncd...
    1/1 mirror(s) are fully synced: ( 100.00% )
 Sleeping 15 sec
 
 Create cache data and cache metadata (fast) volumes
 lvcreate --activate ey --type raid1 -m 1 -L 10M -n pool_2 cache_sanity @fast
 lvcreate --activate ey --type raid1 -m 1 -L 12M -n pool_2_meta cache_sanity @fast
 Waiting until all mirror|raid volumes become fully syncd...
    2/2 mirror(s) are fully synced: ( 100.00% 100.00% )
 Sleeping 15 sec
 Sleeping 15 sec
 
 Create cache pool volume by combining the cache data and cache metadata (fast) volumes with policy: mq  mode: writethrough
 lvconvert --yes --type cache-pool --cachepolicy mq --cachemode writethrough -c 32 --poolmetadata cache_sanity/pool_2_meta cache_sanity/pool_2
   WARNING: Converting cache_sanity/pool_2 and cache_sanity/pool_2_meta to cache pool's data and metadata volumes with metadata wiping.
   THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
   Error locking on node 1: Command timed out
   Aborting. Failed to deactivate metadata lv. Manual intervention required.
 couldn't create combined cache pool volume

=============================================
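Since the failure is intermittent, a simple retry loop around the failing sequence is one way to hunt for it. A rough sketch (it assumes the cache_sanity VG with @slow/@fast tagged PVs from the scenario above is already in place; the sync wait is crude):

  # Sketch: repeat the create/convert sequence until the locking error shows up.
  for i in $(seq 1 500); do
      lvcreate --activate ey --type raid1 -m 1 -L 20M -n origin_$i cache_sanity @slow || break
      lvcreate --activate ey --type raid1 -m 1 -L 10M -n pool_$i cache_sanity @fast || break
      lvcreate --activate ey --type raid1 -m 1 -L 12M -n pool_${i}_meta cache_sanity @fast || break
      sleep 30   # crude wait for the raid1 legs to finish syncing
      lvconvert --yes --type cache-pool --cachemode writethrough -c 32 \
          --poolmetadata cache_sanity/pool_${i}_meta cache_sanity/pool_$i || break
      lvconvert --yes --type cache --cachepool cache_sanity/pool_$i cache_sanity/origin_$i || break
  done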

### Layout after the failed lvconvert attempt (a subsequent manual conversion attempt succeeded)

# lvs -a
  LV                        VG             Attr       LSize   Pool     Origin           Data%  Meta%  Move Log Cpy%Sync Convert
  [...]                                                               
  pool_2                    cache_sanity   rwi-a-r---  12.00m                                                  100.00          
  pool_2_meta               cache_sanity   rwi---r---  12.00m                                                                  
  [pool_2_meta_rimage_0]    cache_sanity   Iwi---r---  12.00m                                                                  
  [pool_2_meta_rimage_1]    cache_sanity   Iwi---r---  12.00m                                                                  
  [pool_2_meta_rmeta_0]     cache_sanity   ewi---r---   4.00m                                                                  
  [pool_2_meta_rmeta_1]     cache_sanity   ewi---r---   4.00m                                                                  
  [pool_2_rimage_0]         cache_sanity   iwi-aor---  12.00m                                                                  
  [pool_2_rimage_1]         cache_sanity   iwi-aor---  12.00m                                                                  
  [pool_2_rmeta_0]          cache_sanity   ewi-aor---   4.00m                                                                  
  [pool_2_rmeta_1]          cache_sanity   ewi-aor---   4.00m                                                                  
  [...]                           
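For reference, recovery from this state would presumably amount to making sure the half-converted metadata LV is inactive everywhere and re-running the conversion, roughly like the sketch below (not necessarily exactly what was run here):

  # Sketch of a manual recovery after the "Manual intervention required" abort.
  dmsetup info -c | grep pool_2            # check each node for leftover mappings
  lvchange -an cache_sanity/pool_2_meta    # deactivate the metadata LV if still active anywhere
  lvconvert --yes --type cache-pool --cachepolicy mq --cachemode writethrough -c 32 \
      --poolmetadata cache_sanity/pool_2_meta cache_sanity/pool_2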


=============================================
3.10.0-829.el7.x86_64

lvm2-2.02.176-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
lvm2-libs-2.02.176-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
lvm2-cluster-2.02.176-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
lvm2-python-boom-0.8.1-5.el7    BUILT: Wed Dec  6 11:15:40 CET 2017
cmirror-2.02.176-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
device-mapper-1.02.145-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
device-mapper-libs-1.02.145-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
device-mapper-event-1.02.145-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
device-mapper-event-libs-1.02.145-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
device-mapper-persistent-data-0.7.3-3.el7    BUILT: Tue Nov 14 12:07:18 CET 2017
vdo-6.1.0.98-13    BUILT: Tue Dec 12 14:35:03 CET 2017
kmod-kvdo-6.1.0.98-11.el7    BUILT: Tue Dec 12 15:08:52 CET 2017

Comment 2 Zdenek Kabelac 2018-01-18 17:10:33 UTC
Assuming the same patch as for Bug 1533932 fixes this issue as well.

https://www.redhat.com/archives/lvm-devel/2018-January/msg00049.html

Comment 7 Zdenek Kabelac 2018-02-08 12:50:31 UTC
I don't think it's the same case here.

Please open a new BZ for this case.


Using 'mirror' without dmeventd is a risky thing on its own.


The cluster does require that the nodes have no leaked 'devices' occupying a DM device.

So for a full exploration we need to get traces & tables from all nodes.
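For example, something along these lines on every cluster node (a sketch; the output paths are arbitrary):

  # Sketch: collect DM and LVM state from each cluster node for comparison.
  dmsetup info -c    > /tmp/$(hostname)_dmsetup_info.txt
  dmsetup table      > /tmp/$(hostname)_dmsetup_table.txt
  dmsetup ls --tree  > /tmp/$(hostname)_dmsetup_tree.txt
  lvs -a -o +devices > /tmp/$(hostname)_lvs.txt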

Comment 8 Roman Bednář 2018-02-08 13:13:33 UTC
OK, thank you for the explanation. Opening a separate BZ#1543429 to track this.

Just a note: dmeventd was actually running before (and after) the lvcreate.
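A quick way to double-check that on a node is simply to look for the daemon process, e.g.:

  # Check that the device-mapper event daemon is running on this node.
  pgrep -lx dmeventd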

Comment 11 errata-xmlrpc 2018-04-10 15:23:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0853

