Bug 1872695

Summary: Cannot create LV with cache when PV is encrypted
Product: Red Hat Enterprise Linux 8
Reporter: Vojtech Trefny <vtrefny>
Component: lvm2
Assignee: David Teigland <teigland>
lvm2 sub component: Cache Logical Volumes
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA
Severity: medium
Priority: high
CC: agk, anaconda-maint-list, cmarthal, heinzm, jbittner, jbrassow, jkonecny, jrusz, jstodola, lvm-team, mcsontos, msnitzer, prajnoha, release-test-team-automation, rmetrich, rvykydal, vtrefny, zkabelac
Version: 8.2
Keywords: TestCaseNeeded, Triaged
Target Milestone: rc
Target Release: 8.0
Hardware: All
OS: Linux
Fixed In Version: lvm2-2.03.11-2.el8
Clone Of: 1855973
Last Closed: 2021-05-18 15:01:53 UTC
Type: Bug
Bug Depends On: 1855973    

Comment 5 Zdenek Kabelac 2020-10-09 09:18:05 UTC
So the story here goes this way -

Wiping got a couple of fixes in lvm2 (bug 1805892), which slightly change behavior in some cases
like this one.

lvconvert wants to 'wipe' and prompts the user to confirm the action.

Since Anaconda likely calls lvm2 with stdin closed, the prompt gets the automatic answer 'n',
so under the new rule the command is stopped: if the user does not want wiping, they should use -Wn,
and if they want wiping without prompting, they need to pass '--yes', which is missing in this case.

However, since this change was introduced relatively late in this cycle, it has been reverted for 8.3
through bug 1868169.

But as we do want bug 1805892 fixed and do not want to ignore errors from wiping,
the next version of lvm2 (>= 2.03.11) will strictly require either --yes or -Wn.

Thus some change on the Anaconda side will be necessary.
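
For illustration, a minimal sketch of the non-interactive call (the VG/LV names here are made up), confirming the wiping prompt up front, which is what Anaconda effectively does with -y:

# answer the wiping prompt with 'yes' instead of reading the answer from (closed) stdin
lvconvert --yes --type cache-pool --poolmetadata vg/meta --cachemode writeback vg/fast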

Comment 6 Vojtech Trefny 2020-10-09 10:03:18 UTC
I think this might be a different issue -- we are calling lvconvert with the "-y" option:

12:04:46.883011 lvconvert[3174] lvmcmdline.c:3068  Processing command: lvconvert -y --type cache-pool --poolmetadata data_cache_meta --cachemode writeback rhel/data_cache '--config= devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] } log {level=7 file=/tmp/lvm.log syslog=0}'

Comment 7 Zdenek Kabelac 2020-10-09 11:19:45 UTC
Yep, the trace in comment 2 exposed:

12:04:47.153312 lvconvert[3174] device_mapper/libdm-common.c:1496  rhel-lvol0: Processing NODE_READ_AHEAD 256 (flags=1)
12:04:47.153350 lvconvert[3174] device_mapper/libdm-common.c:1250  rhel-lvol0 (253:4): read ahead is 256
12:04:47.153356 lvconvert[3174] device_mapper/libdm-common.c:1375  rhel-lvol0: retaining kernel read ahead of 256 (requested 256)
12:04:47.153423 lvconvert[3174] device/dev-cache.c:750  Found dev 253:4 /dev/rhel/lvol0 - new alias.
12:04:47.153450 lvconvert[3174] label/label.c:542  Device open /dev/mapper/rhel-data_cache 253:4 failed errno 2
12:04:47.153455 lvconvert[3174] label/label.c:546  Device open /dev/mapper/rhel-data_cache 253:4 stat failed errno 2
12:04:47.158543 lvconvert[3174] label/label.c:561  Device open /dev/mapper/rhel-data_cache retry
12:04:47.158572 lvconvert[3174] label/label.c:542  Device open /dev/mapper/rhel-data_cache 253:4 failed errno 2
12:04:47.158578 lvconvert[3174] label/label.c:546  Device open /dev/mapper/rhel-data_cache 253:4 stat failed errno 2
12:04:47.158587 lvconvert[3174] metadata/lv_manip.c:7618  Failed to open rhel/lvol0 for wiping and zeroing.
12:04:47.158596 lvconvert[3174] metadata/lv_manip.c:8486  Aborting. Failed to wipe start of new LV.
12:04:47.158599 lvconvert[3174] activate/activate.c:2432  Deactivating rhel/lvol0.
12:04:47.158604 lvconvert[3174] activate/dev_manager.c:817  Getting device info for rhel-lvol0 [LVM-Wlz0mo2FoskrGP1d76e7z7FjwQVrzpb4UIJ0rF0YokZNlhZKNqSivsfnbkmp6cS0].
12:04:47.158621 lvconvert[3174] device_mapper/ioctl/libdm-iface.c:1898  dm info  LVM-Wlz0mo2FoskrGP1d76e7z7FjwQVrzpb4UIJ0rF0YokZNlhZKNqSivsfnbkmp6cS0 [ noopencount


So there is still a bug on the lvm2 side.

Comment 8 Vojtech Trefny 2020-10-09 12:10:52 UTC
This new Fedora bug might be related: https://bugzilla.redhat.com/show_bug.cgi?id=1886767

Comment 10 Zdenek Kabelac 2021-01-13 14:36:43 UTC
Passing to David - it seems this issue is more related to our caching.

From the log it appears that during the conversion of the data & metadata LVs into a cache pool, the cache remembered
the device names for a major:minor pair and then tries to use the same name for our wiping - while the device has
already been deactivated and should already have been dropped from the label cache.

It seems there are several mismatches between our 2 internal caches here that need closer inspection.

Comment 11 Zdenek Kabelac 2021-01-13 14:52:15 UTC
It seems that dropping '/dev/mapper' from 'preferred_names' at least makes the Anaconda run work - however the lvm2 cache part still needs to be corrected.
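
A minimal sketch of that workaround, assuming the setting is passed the same way Anaconda passes it (via --config, keeping the other patterns from the original call):

--config='devices { preferred_names=["^/dev/md/", "^/dev/sd"] }'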

Comment 12 David Teigland 2021-01-13 23:57:12 UTC
I have a fairly simple fix which makes the command work correctly, but I'm going to take another day or so to study the problem from some other angles to see if more work is needed for a complete solution.

The problem is:
LVM creates a list of device paths for each major:minor at the start of the command ("dev-cache").  This is mostly used for PV paths, but it also includes paths to active LVs which are occasionally used (e.g. when wiping new LVs).  While doing the work of the command, lvm will sometimes deactivate an existing LV, which causes the device path for that LV to go away on the system.  But, lvm is not clearing its own dev-cache entry for the deactivated LV device path, so it's leaving stale info in its dev-cache.  When the same command later creates a new LV and activates it, that new LV can get the major:minor of the previously deactivated LV.  Because of the stale path, and because of the preferred_names setting, lvm is trying to use the path for the deactivated LV when trying to wipe the new LV.
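
Mapped onto the trace in comment 7 (device names taken from that log; the exact sequence is only an illustration of the explanation above):

# 1. rhel/data_cache is active as 253:4; dev-cache records /dev/mapper/rhel-data_cache for 253:4
# 2. lvconvert deactivates rhel/data_cache; the path disappears from the system, but the dev-cache entry is not cleared
# 3. the new LV rhel/lvol0 is activated and happens to reuse major:minor 253:4
# 4. due to preferred_names, wiping opens the stale /dev/mapper/rhel-data_cache path -> "failed errno 2"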

Comment 13 Zdenek Kabelac 2021-01-14 12:07:20 UTC
A few notes from looking at the code base we currently have -

It's not obvious whether using our caching engine has any benefit for the wiping (use of async I/O).
To take proper advantage of async I/O support, a few more changes around wiping multiple LVs at once (i.e. raid creation) need to happen.
The recently added ZERO-OUT ioctl is likely not running asynchronously.
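
For context, the block-layer zero-out path is an ordinary synchronous call from userspace; a generic util-linux illustration (not lvm2's internal code; the device path is made up):

# BLKZEROOUT-based zero-fill of the first 1MiB; the command blocks until the kernel is done
blkdiscard --zeroout --offset 0 --length 1048576 /dev/vg/lv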

We still have 2 different caches in lvm2 - those should be unified - this possibly relates to Joe's new engine, which is still missing.
So it might be worth trying to finally deploy it. Otherwise we have a single ->fd descriptor used in two contexts, and there
is now even a remapping engine behind the label cache - so it's getting cluttered.

We need to define what the 'atomic scan list of devices' is and whether we have cases where we actually need to keep this list.
It may be that the 'cache' can actually be almost emptied after the initial scan, so the RAM can be used more efficiently
and only VG-related devs need to be cached - it may even turn out that dropping the whole cache after the scan is
the reasonable approach - needs some analysis.

Comment 14 David Teigland 2021-01-15 22:41:51 UTC
fix and test https://sourceware.org/git/?p=lvm2.git;a=commit;h=37227b8ad6ba67804f98cdadd0ed6f2b369ae656

before:

# lvcreate -n fast -L500M test; lvcreate -n meta -L200M test
  Logical volume "fast" created.
  Logical volume "meta" created.

# lvconvert -y --type cache-pool --poolmetadata meta test/fast --config='devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] }'
  WARNING: Converting test/fast and test/meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  Device open /dev/mapper/test-fast 253:4 failed errno 2
  Device open /dev/mapper/test-fast 253:4 failed errno 2
  Failed to open test/lvol0 for wiping and zeroing.
  Aborting. Failed to wipe start of new LV.

after:

# lvcreate -n fast -L500M test; lvcreate -n meta -L200M test
  Logical volume "fast" created.
  Logical volume "meta" created.

# lvconvert -y --type cache-pool --poolmetadata meta test/fast --config='devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] }'
  WARNING: Converting test/fast and test/meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  Converted test/fast and test/meta to cache pool.

Comment 17 Corey Marthaler 2021-02-01 23:44:46 UTC
Marking Verified:Tested in the latest lvm build.

kernel-4.18.0-277.el8    BUILT: Wed Jan 20 09:06:28 CST 2021
lvm2-2.03.11-2.el8    BUILT: Thu Jan 28 14:40:36 CST 2021
lvm2-libs-2.03.11-2.el8    BUILT: Thu Jan 28 14:40:36 CST 2021


[root@hayes-02 ~]# lvcreate -n fast -L500M test; lvcreate -n meta -L200M test
  Logical volume "fast" created.
  Logical volume "meta" created.
[root@hayes-02 ~]# lvconvert -y --type cache-pool --poolmetadata meta test/fast --config='devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] }'
  WARNING: Converting test/fast and test/meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  Converted test/fast and test/meta to cache pool.
[root@hayes-02 ~]# lvs -a -o +devices
  LV              VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices      
  fast            test Cwi---C--- 500.00m                                                     fast_cdata(0)
  [fast_cdata]    test Cwi------- 500.00m                                                     /dev/sdb(0)  
  [fast_cmeta]    test ewi------- 200.00m                                                     /dev/sdb(125)
  [lvol0_pmspare] test ewi------- 200.00m                                                     /dev/sdb(175)

Comment 20 Corey Marthaler 2021-02-11 21:58:44 UTC
Fix verified in the latest rpms.

kernel-4.18.0-284.el8    BUILT: Mon Feb  8 04:33:33 CST 2021
lvm2-2.03.11-4.el8    BUILT: Thu Feb 11 04:35:23 CST 2021
lvm2-libs-2.03.11-4.el8    BUILT: Thu Feb 11 04:35:23 CST 2021


[root@host-086 ~]# lvcreate -n fast -L500M test; lvcreate -n meta -L200M test
  Logical volume "fast" created.
  Logical volume "meta" created.

[root@host-086 ~]# lvconvert -y --type cache-pool --poolmetadata meta test/fast --config='devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] }'
  WARNING: Converting test/fast and test/meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  Converted test/fast and test/meta to cache pool.

[root@host-086 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  fast            test          Cwi---C--- 500.00m                                                       fast_cdata(0)  
  [fast_cdata]    test          Cwi------- 500.00m                                                       /dev/sdb(0)    
  [fast_cmeta]    test          ewi------- 200.00m                                                       /dev/sdb(125)  
  [lvol0_pmspare] test          ewi------- 200.00m                                                       /dev/sdb(175)

Comment 22 errata-xmlrpc 2021-05-18 15:01:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (lvm2 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1659