Bug 1872695
| Summary: | Cannot create LV with cache when PV is encrypted | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Vojtech Trefny <vtrefny> |
| Component: | lvm2 | Assignee: | David Teigland <teigland> |
| lvm2 sub component: | Cache Logical Volumes | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | high | CC: | agk, anaconda-maint-list, cmarthal, heinzm, jbittner, jbrassow, jkonecny, jrusz, jstodola, lvm-team, mcsontos, msnitzer, prajnoha, release-test-team-automation, rmetrich, rvykydal, vtrefny, zkabelac |
| Version: | 8.2 | Keywords: | TestCaseNeeded, Triaged |
| Target Milestone: | rc | | |
| Target Release: | 8.0 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | lvm2-2.03.11-2.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1855973 | Environment: | |
| Last Closed: | 2021-05-18 15:01:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1855973 | | |
| Bug Blocks: | | | |
Comment 5 | Zdenek Kabelac | 2020-10-09 09:18:05 UTC
I think this might be a different issue -- we are calling lvconvert with the "-y" option:

```
12:04:46.883011 lvconvert[3174] lvmcmdline.c:3068  Processing command: lvconvert -y --type cache-pool --poolmetadata data_cache_meta --cachemode writeback rhel/data_cache '--config= devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] } log {level=7 file=/tmp/lvm.log syslog=0}'
```

Yep, the trace in comment 2 exposed it:

```
12:04:47.153312 lvconvert[3174] device_mapper/libdm-common.c:1496  rhel-lvol0: Processing NODE_READ_AHEAD 256 (flags=1)
12:04:47.153350 lvconvert[3174] device_mapper/libdm-common.c:1250  rhel-lvol0 (253:4): read ahead is 256
12:04:47.153356 lvconvert[3174] device_mapper/libdm-common.c:1375  rhel-lvol0: retaining kernel read ahead of 256 (requested 256)
12:04:47.153423 lvconvert[3174] device/dev-cache.c:750  Found dev 253:4 /dev/rhel/lvol0 - new alias.
12:04:47.153450 lvconvert[3174] label/label.c:542  Device open /dev/mapper/rhel-data_cache 253:4 failed errno 2
12:04:47.153455 lvconvert[3174] label/label.c:546  Device open /dev/mapper/rhel-data_cache 253:4 stat failed errno 2
12:04:47.158543 lvconvert[3174] label/label.c:561  Device open /dev/mapper/rhel-data_cache retry
12:04:47.158572 lvconvert[3174] label/label.c:542  Device open /dev/mapper/rhel-data_cache 253:4 failed errno 2
12:04:47.158578 lvconvert[3174] label/label.c:546  Device open /dev/mapper/rhel-data_cache 253:4 stat failed errno 2
12:04:47.158587 lvconvert[3174] metadata/lv_manip.c:7618  Failed to open rhel/lvol0 for wiping and zeroing.
12:04:47.158596 lvconvert[3174] metadata/lv_manip.c:8486  Aborting. Failed to wipe start of new LV.
12:04:47.158599 lvconvert[3174] activate/activate.c:2432  Deactivating rhel/lvol0.
12:04:47.158604 lvconvert[3174] activate/dev_manager.c:817  Getting device info for rhel-lvol0 [LVM-Wlz0mo2FoskrGP1d76e7z7FjwQVrzpb4UIJ0rF0YokZNlhZKNqSivsfnbkmp6cS0].
12:04:47.158621 lvconvert[3174] device_mapper/ioctl/libdm-iface.c:1898  dm info LVM-Wlz0mo2FoskrGP1d76e7z7FjwQVrzpb4UIJ0rF0YokZNlhZKNqSivsfnbkmp6cS0 [ noopencount
```

So there is still a bug on the lvm2 side. This new Fedora bug might be related: https://bugzilla.redhat.com/show_bug.cgi?id=1886767

Passing to David -- this issue seems to be more related to our caching. From the log it appears that during the conversion of the data & metadata LVs into a cache pool, the cache remembered the device names for a major:minor pair and then tried to use the same name for our wiping, while the device had already been deactivated and should already have been dropped from the label cache. There are several mismatches between our two internal caches here that need closer inspection.

Dropping the '/dev/mapper' entry from 'preferred_names' at least makes the Anaconda run pass; however, the lvm2 cache part still needs to be corrected.

I have a fairly simple fix which makes the command work correctly, but I'm going to take another day or so to study the problem from some other angles to see if more work is needed for a complete solution.

The problem is: LVM creates a list of device paths for each major:minor at the start of the command ("dev-cache"). This is mostly used for PV paths, but it also includes paths to active LVs, which are occasionally used (e.g. when wiping new LVs). While doing the work of the command, lvm will sometimes deactivate an existing LV, which causes the device path for that LV to go away on the system. But lvm does not clear its own dev-cache entry for the deactivated LV's device path, so stale info is left in the dev-cache. When the same command later creates and activates a new LV, that new LV can get the major:minor of the previously deactivated LV. Because of the stale path, and because of the preferred_names setting, lvm then tries to use the path of the deactivated LV when wiping the new LV.
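To make the failure mode concrete, here is a minimal, self-contained C sketch of the stale-alias pattern described above. It is not lvm2 source; the names (dev_entry, preferred_alias, drop_aliases) are invented for illustration, and the preferred_names matching is reduced to a single prefix check:

```c
/* Hypothetical sketch of the stale dev-cache alias bug described above.
 * All names are invented -- this is NOT lvm2 source code. */
#include <stdio.h>
#include <string.h>

#define MAX_ALIASES 8

struct dev_entry {                     /* one cache entry per major:minor */
    unsigned maj, mnr;
    const char *aliases[MAX_ALIASES];  /* known /dev paths for this devt */
    int n;
};

static void add_alias(struct dev_entry *e, const char *path)
{
    if (e->n < MAX_ALIASES)
        e->aliases[e->n++] = path;
}

/* With preferred_names=["^/dev/mapper/", ...], a /dev/mapper path wins. */
static const char *preferred_alias(const struct dev_entry *e)
{
    for (int i = 0; i < e->n; i++)
        if (!strncmp(e->aliases[i], "/dev/mapper/", 12))
            return e->aliases[i];
    return e->n ? e->aliases[0] : NULL;
}

/* The missing invalidation: forget every path when an LV is deactivated. */
static void drop_aliases(struct dev_entry *e)
{
    e->n = 0;
}

int main(void)
{
    struct dev_entry e = { 253, 4, { "/dev/mapper/rhel-data_cache" }, 1 };

    /* rhel/data_cache is deactivated; its node is gone from /dev, but
     * without an invalidation step the cache entry survives. */
    /* drop_aliases(&e);   <-- the missing call, i.e. the bug */

    /* A new LV (lvol0) is activated and reuses 253:4; dev-cache logs
     * "Found dev 253:4 /dev/rhel/lvol0 - new alias." */
    add_alias(&e, "/dev/rhel/lvol0");

    /* Wiping then opens the preferred path -- the stale one -- and
     * fails with errno 2 (ENOENT), exactly as in the trace above. */
    printf("wiping would open: %s\n", preferred_alias(&e));
    return 0;
}
```

Run as-is, it prints the stale /dev/mapper/rhel-data_cache path, mirroring the "failed errno 2" opens in the trace; uncommenting the drop_aliases() call at the deactivation point makes it pick /dev/rhel/lvol0 instead, which is the invalidate-on-deactivation behavior the analysis above calls for.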
A few notes from looking at the code base we currently have:

- It's not obvious whether using our caching engine brings any benefit to the wiping (i.e. use of async I/O). To take proper advantage of async I/O support, a few more changes around wiping multiple LVs at once (i.e. raid creation) need to happen. The recently added ZERO-OUT ioctl is likely not running async (see the sketch after this list).
- We still have 2 different caches in lvm2 -- those should be unified. This possibly relates to Joe's new engine, which is still missing, so it might be worth trying to finally deploy it. Otherwise we have a single ->fd descriptor used in two contexts, and there is now even a remapping engine behind the label cache -- so it's getting cluttered.
- We need to define what an 'atomic scan list of devices' is, and whether we have cases where we actually need to keep this list. It may be that the 'cache' can be almost emptied after the initial scan so the RAM is used more efficiently, with only VG-related devs kept cached; it may even turn out that dropping the whole cache after the scan is the reasonable match -- this needs some analysis.
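For reference, the "ZERO-OUT ioctl" mentioned in the notes is presumably the kernel's BLKZEROOUT block-device ioctl. A minimal sketch of using it to zero the start of a device follows; it only illustrates the kernel interface and is not lvm2's actual wiping code:

```c
/* zero_start.c -- minimal sketch of zeroing the start of a block device
 * with the BLKZEROOUT ioctl. Illustrates the kernel interface only. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKZEROOUT */

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s /dev/vg/lv\n", argv[0]);
        return 1;
    }

    /* O_EXCL on a block device refuses the open if the device is in use. */
    int fd = open(argv[1], O_WRONLY | O_EXCL);
    if (fd < 0) {
        perror("open");   /* errno 2 here is the failure seen in the trace */
        return 1;
    }

    /* Zero the first 4 KiB: { offset, length } in bytes. */
    uint64_t range[2] = { 0, 4096 };
    if (ioctl(fd, BLKZEROOUT, range) < 0)
        perror("ioctl(BLKZEROOUT)");

    close(fd);
    return 0;
}
```

BLKZEROOUT blocks until the zeroing completes, consistent with the note above that it is likely not running asynchronously.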
Fix and test:
https://sourceware.org/git/?p=lvm2.git;a=commit;h=37227b8ad6ba67804f98cdadd0ed6f2b369ae656

Before:

```
# lvcreate -n fast -L500M test; lvcreate -n meta -L200M test
  Logical volume "fast" created.
  Logical volume "meta" created.
# lvconvert -y --type cache-pool --poolmetadata meta test/fast --config='devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] }'
  WARNING: Converting test/fast and test/meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  Device open /dev/mapper/test-fast 253:4 failed errno 2
  Device open /dev/mapper/test-fast 253:4 failed errno 2
  Failed to open test/lvol0 for wiping and zeroing.
  Aborting. Failed to wipe start of new LV.
```

After:

```
# lvcreate -n fast -L500M test; lvcreate -n meta -L200M test
  Logical volume "fast" created.
  Logical volume "meta" created.
# lvconvert -y --type cache-pool --poolmetadata meta test/fast --config='devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] }'
  WARNING: Converting test/fast and test/meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  Converted test/fast and test/meta to cache pool.
```

Marking Verified:Tested in the latest lvm build.

```
kernel-4.18.0-277.el8    BUILT: Wed Jan 20 09:06:28 CST 2021
lvm2-2.03.11-2.el8       BUILT: Thu Jan 28 14:40:36 CST 2021
lvm2-libs-2.03.11-2.el8  BUILT: Thu Jan 28 14:40:36 CST 2021

[root@hayes-02 ~]# lvcreate -n fast -L500M test; lvcreate -n meta -L200M test
  Logical volume "fast" created.
  Logical volume "meta" created.
[root@hayes-02 ~]# lvconvert -y --type cache-pool --poolmetadata meta test/fast --config='devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] }'
  WARNING: Converting test/fast and test/meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  Converted test/fast and test/meta to cache pool.
[root@hayes-02 ~]# lvs -a -o +devices
  LV              VG   Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  fast            test Cwi---C--- 500.00m                                                   fast_cdata(0)
  [fast_cdata]    test Cwi------- 500.00m                                                   /dev/sdb(0)
  [fast_cmeta]    test ewi------- 200.00m                                                   /dev/sdb(125)
  [lvol0_pmspare] test ewi------- 200.00m                                                   /dev/sdb(175)
```

Fix verified in the latest rpms.

```
kernel-4.18.0-284.el8    BUILT: Mon Feb 8 04:33:33 CST 2021
lvm2-2.03.11-4.el8       BUILT: Thu Feb 11 04:35:23 CST 2021
lvm2-libs-2.03.11-4.el8  BUILT: Thu Feb 11 04:35:23 CST 2021

[root@host-086 ~]# lvcreate -n fast -L500M test; lvcreate -n meta -L200M test
  Logical volume "fast" created.
  Logical volume "meta" created.
[root@host-086 ~]# lvconvert -y --type cache-pool --poolmetadata meta test/fast --config='devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] }'
  WARNING: Converting test/fast and test/meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  Converted test/fast and test/meta to cache pool.
[root@host-086 ~]# lvs -a -o +devices
  LV              VG   Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  fast            test Cwi---C--- 500.00m                                                   fast_cdata(0)
  [fast_cdata]    test Cwi------- 500.00m                                                   /dev/sdb(0)
  [fast_cmeta]    test ewi------- 200.00m                                                   /dev/sdb(125)
  [lvol0_pmspare] test ewi------- 200.00m                                                   /dev/sdb(175)
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (lvm2 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1659