Bug 1809660
| Summary: | pvs without specifying a device fails with locking_type=4 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Nir Soffer <nsoffer> |
| Component: | lvm2 | Assignee: | David Teigland <teigland> |
| lvm2 sub component: | Command-line tools | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | agk, aperotti, cmarthal, dfediuck, heinzm, jbrassow, jmagrini, kzona, loberman, lvm-team, mcsontos, michal.skrivanek, mkalinin, msnitzer, mtessun, pelauter, prajnoha, rhandlin, teigland, thornber, vjuranek, zkabelac |
| Version: | 7.7 | Keywords: | ZStream |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | lvm2-2.02.186-7.el7_8.1 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1812441 (view as bug list) | Environment: | |
| Last Closed: | 2020-09-29 19:55:48 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1553133, 1711360, 1812441 | | |
Description
Nir Soffer
2020-03-03 16:09:22 UTC
The bug comes from commit 79c4971210a6337563ffa2fca08fb636423d93d4 (from 2017). It leads lvm to attempt a bogus recovery of the orphan VG whenever the orphan VG is read. (The orphan VG is a fake internal VG for handling orphan PVs.) No recovery code exists for the orphan VG, but lvm still attempts it. When lvm attempts the bogus/no-op orphan recovery, it tries to get a write lock. Usually the write lock succeeds, and the "recovery" does nothing, so it is fairly harmless. But with locking_type=4 the write lock fails, and that failure bubbles up and causes the whole command to fail.

The fix is to not attempt recovery of the orphan VG:

```diff
diff --git a/lib/metadata/metadata.c b/lib/metadata/metadata.c
index 81a6029c4b59..666ad78230d2 100644
--- a/lib/metadata/metadata.c
+++ b/lib/metadata/metadata.c
@@ -3433,6 +3433,8 @@ static struct volume_group *_vg_read_orphans(struct cmd_context *cmd,
 	dm_list_init(&head.list);
 
+	*consistent = 1;
+
 	if (!(vginfo = lvmcache_vginfo_from_vgname(orphan_vgname, NULL)))
 		return_NULL;
```

The bug above is exposed when a pvs command tries to process orphan PVs. That happens when pvs needs to look at all PVs on the system. This is obviously true for a 'pvs' command with no args, which by definition reports all PVs. It is not true for a 'pvs /dev/foo' command, which only looks at the named devs.

When using --select, lvm processes all objects on the system, because --select does a wide range of matching against various properties: it compares every PV, VG or LV to the --select matching pattern. So 'pvs --select' is going to process the orphan VG, like 'pvs', and hit the locking/recovery bug.

It turns out that RHV 4.3.9 will be delivered with RHEL 7.8, so we may not need a fix in RHEL 7.7.z.

Scratch build with the patch in comment 2: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=26995154

The fix seems to work:

```
[root@host1 ~]# pvs --readonly --config 'global { use_lvmetad=0 }'; echo $?
```
```
PV                                            VG                                   Fmt  Attr PSize   PFree
/dev/mapper/360014051ce5179112ae4fb98e72d9ba9 test                                 lvm2 a--   99.62g  99.62g
/dev/mapper/36001405271fe76b24b542bf858aaeef7 test                                 lvm2 a--   99.62g <22.88g
/dev/mapper/360014053b18095bd13c48158687153a5 91630622-c645-4397-a9fe-9ddf26690500 lvm2 a--   99.62g  91.62g
/dev/sda2                                     centos                               lvm2 a--  <19.00g       0
0
[root@host1 ~]# pvs --config 'global { use_lvmetad=0 locking_type=4 }'; echo $?
Scan of VG test from /dev/mapper/36001405271fe76b24b542bf858aaeef7 found metadata seqno 51296 vs previous 51295.
PV                                            VG                                   Fmt  Attr PSize   PFree
/dev/mapper/360014051ce5179112ae4fb98e72d9ba9 test                                 lvm2 a--   99.62g  99.62g
/dev/mapper/36001405271fe76b24b542bf858aaeef7 test                                 lvm2 a--   99.62g  37.50g
/dev/mapper/360014053b18095bd13c48158687153a5 91630622-c645-4397-a9fe-9ddf26690500 lvm2 a--   99.62g  91.62g
/dev/sda2                                     centos                               lvm2 a--  <19.00g       0
0
[root@host1 ~]# pvs --config 'global { use_lvmetad=0 locking_type=4 }' --select 'pv_name = test'; echo $?
0
[root@host1 ~]# pvs --config 'global { use_lvmetad=0 locking_type=4 }' --select 'pv_name = /dev/mapper/360014051ce5179112ae4fb98e72d9ba9'; echo $?
PV                                            VG   Fmt  Attr PSize  PFree
/dev/mapper/360014051ce5179112ae4fb98e72d9ba9 test lvm2 a--  99.62g 99.62g
0
```

The warning "Scan of VG test from /dev/mapper/36001405271fe76b24b542bf858aaeef7 found metadata seqno 51296 vs previous 51295." is expected; I'm running a stress test extending LVs in this VG on another host. We will do more testing later when we enable locking_type=4 in RHV.

This can happen also when the device is specified; with locking_type=4 this fails on one of my test VMs:

```
[root@localhost ~]# pvs /dev/sdc --config 'global { use_lvmetad=0 locking_type=4 }'; echo $?
WARNING: Not using lvmetad because config setting use_lvmetad=0.
WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
/dev/sdb: Checksum error at offset 160935559168
Couldn't read volume group metadata from /dev/sdb.
Metadata location on /dev/sdb at 160935559168 has invalid summary for VG.
```
```
Failed to read metadata summary from /dev/sdb
Failed to scan VG from /dev/sdb
Read-only locking type set. Write locks are prohibited.
Recovery of standalone physical volumes failed.
Cannot process standalone physical volumes
Read-only locking type set. Write locks are prohibited.
Recovery of standalone physical volumes failed.
Cannot process standalone physical volumes
Read-only locking type set. Write locks are prohibited.
Recovery of standalone physical volumes failed.
Cannot process standalone physical volumes
Failed to find physical volume "/dev/sdc".
5
```

while with locking_type=1 it succeeds:

```
[root@localhost ~]# pvs /dev/sdc --config 'global { use_lvmetad=0 locking_type=1 }'; echo $?
WARNING: Not using lvmetad because config setting use_lvmetad=0.
WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
/dev/sdb: Checksum error at offset 160935559168
Couldn't read volume group metadata from /dev/sdb.
Metadata location on /dev/sdb at 160935559168 has invalid summary for VG.
Failed to read metadata summary from /dev/sdb
Failed to scan VG from /dev/sdb
PV         VG Fmt  Attr PSize  PFree
/dev/sdc      lvm2 ---  20,00g 20,00g
0
```

lvm version:

```
[root@localhost ~]# rpm -qa|grep lvm
lvm2-libs-2.02.185-2.el7_7.2.x86_64
lvm2-2.02.185-2.el7_7.2.x86_64
```

(In reply to Vojtech Juranek from comment #6)
> this can happen also when the device is specified, with locking_type=4 this
> fails on one of my test VMs:
>
> [root@localhost ~]# pvs /dev/sdc --config 'global { use_lvmetad=0
> locking_type=4 }'; echo $?
> WARNING: Not using lvmetad because config setting use_lvmetad=0.
> WARNING: To avoid corruption, rescan devices to make changes visible
> (pvscan --cache).

This system is running lvmetad; we disable and mask this service on RHV hosts. Please test when the lvmetad service is masked and disabled.

> /dev/sdb: Checksum error at offset 160935559168
> Couldn't read volume group metadata from /dev/sdb.
> Metadata location on /dev/sdb at 160935559168 has invalid summary for VG.
VG metadata on /dev/sdb is corrupted.

> lvm version:
>
> [root@localhost ~]# rpm -qa|grep lvm
> lvm2-libs-2.02.185-2.el7_7.2.x86_64
> lvm2-2.02.185-2.el7_7.2.x86_64

Does it work with the scratch build mentioned in comment 4? (lvm2-2.02.186-7.el7.bz1809660_2.x86_64)

(In reply to Vojtech Juranek from comment #6)
> this can happen also when the device is specified, with locking_type=4 this
> fails on one of my test VMs:

Yes, it can still happen when devs are specified, if the dev is an orphan PV or is not pvcreated (which causes lvm to look in the orphans for it). In this case the corruption of sdb caused lvm to look through the orphans, which is where the problem appears. Generally, if PVs are named and exist in VGs, then pvs will not look in the orphans for them and will not hit the problem. The patch should fix the problem regardless.

Pushed to stable-2.02: https://sourceware.org/git/?p=lvm2.git;a=commit;h=20d61a2553a160570835cad6790108fa2365b936

Fix verified in the latest rpms.

```
3.10.0-1136.el7.x86_64

lvm2-2.02.187-2.el7                         BUILT: Thu Apr 16 11:56:15 CDT 2020
lvm2-libs-2.02.187-2.el7                    BUILT: Thu Apr 16 11:56:15 CDT 2020
lvm2-cluster-2.02.187-2.el7                 BUILT: Thu Apr 16 11:56:15 CDT 2020
lvm2-lockd-2.02.187-2.el7                   BUILT: Thu Apr 16 11:56:15 CDT 2020
lvm2-python-boom-0.9-27.el7                 BUILT: Thu Apr 16 12:10:50 CDT 2020
cmirror-2.02.187-2.el7                      BUILT: Thu Apr 16 11:56:15 CDT 2020
device-mapper-1.02.170-2.el7                BUILT: Thu Apr 16 11:56:15 CDT 2020
device-mapper-libs-1.02.170-2.el7           BUILT: Thu Apr 16 11:56:15 CDT 2020
device-mapper-event-1.02.170-2.el7          BUILT: Thu Apr 16 11:56:15 CDT 2020
device-mapper-event-libs-1.02.170-2.el7     BUILT: Thu Apr 16 11:56:15 CDT 2020
device-mapper-persistent-data-0.8.5-3.el7   BUILT: Mon Apr 20 09:49:16 CDT 2020
```

I attempted these with PV /dev/mapper/mpathf1 also being failed at the time.
```
[root@harding-02 ~]# systemctl status lvm2-lvmetad
● lvm2-lvmetad.service - LVM2 metadata daemon
   Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmetad.service; static; vendor preset: enabled)
   Active: active (running) since Fri 2020-05-01 10:41:39 CDT; 2h 23min ago
     Docs: man:lvmetad(8)
 Main PID: 1129 (lvmetad)
   CGroup: /system.slice/lvm2-lvmetad.service
           └─1129 /usr/sbin/lvmetad -f

May 01 10:41:39 harding-02.lab.msp.redhat.com systemd[1]: Started LVM2 metadata daemon.

[root@harding-02 ~]# vgs --config 'global { use_lvmetad=0 }'
WARNING: Not using lvmetad because config setting use_lvmetad=0.
WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
Couldn't find device with uuid x8BnVM-UX9h-BkNn-XAWS-IOzl-cEHe-LLW7EE.
VG              #PV #LV #SN Attr   VSize    VFree
black_bird        7   2   1 wz-pn-   <1.71t <1.71t
rhel_harding-02   3   3   0 wz--n- <278.47g      0

[root@harding-02 ~]# pvs --readonly --config 'global { use_lvmetad=0 }'; echo $?
WARNING: Not using lvmetad because config setting use_lvmetad=0.
WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
Error reading device /dev/mapper/mpathf at 0 length 512.
Error reading device /dev/mapper/mpathf at 0 length 4.
Error reading device /dev/mapper/mpathf at 4096 length 4.
Error reading device /dev/mapper/mpathf1 at 0 length 512.
Error reading device /dev/mapper/mpathf1 at 0 length 4.
Error reading device /dev/mapper/mpathf1 at 4096 length 4.
Couldn't find device with uuid x8BnVM-UX9h-BkNn-XAWS-IOzl-cEHe-LLW7EE.
```
```
PV                  VG              Fmt  Attr PSize   PFree
/dev/mapper/mpatha1 black_bird      lvm2 a--  249.96g <249.23g
/dev/mapper/mpathb1 black_bird      lvm2 a--  249.96g  249.47g
/dev/mapper/mpathc1 black_bird      lvm2 a--  249.96g  249.96g
/dev/mapper/mpathd1 black_bird      lvm2 a--  249.96g  249.96g
/dev/mapper/mpathe1 black_bird      lvm2 a--  249.96g  249.96g
/dev/mapper/mpathg1 black_bird      lvm2 a--  249.96g  249.47g
/dev/sda2           rhel_harding-02 lvm2 a--  <92.16g        0
/dev/sdb1           rhel_harding-02 lvm2 a--  <93.16g        0
/dev/sdc1           rhel_harding-02 lvm2 a--  <93.16g        0
[unknown]           black_bird      lvm2 a-m  249.96g  249.96g
0
[root@harding-02 ~]# pvs --config 'global { use_lvmetad=0 locking_type=4 }'; echo $?
WARNING: Not using lvmetad because config setting use_lvmetad=0.
WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
Error reading device /dev/mapper/mpathf at 0 length 512.
Error reading device /dev/mapper/mpathf at 0 length 4.
Error reading device /dev/mapper/mpathf at 4096 length 4.
Error reading device /dev/mapper/mpathf1 at 0 length 512.
Error reading device /dev/mapper/mpathf1 at 0 length 4.
Error reading device /dev/mapper/mpathf1 at 4096 length 4.
Couldn't find device with uuid x8BnVM-UX9h-BkNn-XAWS-IOzl-cEHe-LLW7EE.
PV                  VG              Fmt  Attr PSize   PFree
/dev/mapper/mpatha1 black_bird      lvm2 a--  249.96g <249.23g
/dev/mapper/mpathb1 black_bird      lvm2 a--  249.96g  249.47g
/dev/mapper/mpathc1 black_bird      lvm2 a--  249.96g  249.96g
/dev/mapper/mpathd1 black_bird      lvm2 a--  249.96g  249.96g
/dev/mapper/mpathe1 black_bird      lvm2 a--  249.96g  249.96g
/dev/mapper/mpathg1 black_bird      lvm2 a--  249.96g  249.47g
/dev/sda2           rhel_harding-02 lvm2 a--  <92.16g        0
/dev/sdb1           rhel_harding-02 lvm2 a--  <93.16g        0
/dev/sdc1           rhel_harding-02 lvm2 a--  <93.16g        0
[unknown]           black_bird      lvm2 a-m  249.96g  249.96g
0
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (lvm2 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3927