Bug 1809660 - pvs without specifying a device fails with locking_type=4
Summary: pvs without specifying a device fails with locking_type=4
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: David Teigland
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1553133 1711360 1812441
 
Reported: 2020-03-03 16:09 UTC by Nir Soffer
Modified: 2021-09-03 12:53 UTC
CC List: 22 users

Fixed In Version: lvm2-2.02.186-7.el7_8.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1812441
Environment:
Last Closed: 2020-09-29 19:55:48 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:3927 0 None None None 2020-09-29 19:56:28 UTC

Description Nir Soffer 2020-03-03 16:09:22 UTC
Description of problem:

For bug 1553133, RHV needs to use locking_type=4 for all LVM commands on a
host which is not the SPM.

We use "pvs" without specifying the devices to learn about the available PVs.
When running the command with locking_type=4 or with --readonly, the command
fails.

Here is a minimal reproducer using RHV storage domain:

[root@host1 ~]# pvdisplay /dev/mapper/360014053b18095bd13c48158687153a5
  --- Physical volume ---
  PV Name               /dev/mapper/360014053b18095bd13c48158687153a5
  VG Name               91630622-c645-4397-a9fe-9ddf26690500
  PV Size               100.00 GiB / not usable 384.00 MiB
  Allocatable           yes 
  PE Size               128.00 MiB
  Total PE              797
  Free PE               733
  Allocated PE          64
  PV UUID               utJUza-dIPy-sJ0j-dETX-LtdO-mAkB-yOiDIC
   
[root@host1 ~]# vgdisplay 91630622-c645-4397-a9fe-9ddf26690500
  --- Volume group ---
  VG Name               91630622-c645-4397-a9fe-9ddf26690500
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  143
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                10
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               99.62 GiB
  PE Size               128.00 MiB
  Total PE              797
  Alloc PE / Size       64 / 8.00 GiB
  Free  PE / Size       733 / 91.62 GiB
  VG UUID               lpBKnM-wL1e-Cxkz-1jGS-SK7X-flcf-fTaObo

[root@host1 ~]# vgs --config 'global { use_lvmetad=0 }'
  VG                                   #PV #LV #SN Attr   VSize   VFree
  91630622-c645-4397-a9fe-9ddf26690500   1  10   0 wz--n-  99.62g 91.62g
  centos                                 1   2   0 wz--n- <19.00g     0

[root@host1 ~]# pvs --config 'global { use_lvmetad=0 }'
  PV                                            VG                                   Fmt  Attr PSize   PFree
  /dev/mapper/360014053b18095bd13c48158687153a5 91630622-c645-4397-a9fe-9ddf26690500 lvm2 a--   99.62g 91.62g
  /dev/sda2                                     centos                               lvm2 a--  <19.00g     0

[root@host1 ~]# pvs --readonly --config 'global { use_lvmetad=0 }'; echo $?
  Operation prohibited while global/metadata_read_only is set.
  Recovery of standalone physical volumes failed.
  Cannot process standalone physical volumes
  Operation prohibited while global/metadata_read_only is set.
  Recovery of standalone physical volumes failed.
  Cannot process standalone physical volumes
  Operation prohibited while global/metadata_read_only is set.
  Recovery of standalone physical volumes failed.
  Cannot process standalone physical volumes
  PV                                            VG                                   Fmt  Attr PSize   PFree
  /dev/mapper/360014053b18095bd13c48158687153a5 91630622-c645-4397-a9fe-9ddf26690500 lvm2 a--   99.62g 91.62g
  /dev/sda2                                     centos                               lvm2 a--  <19.00g     0
5

[root@host1 ~]# pvs --config 'global { use_lvmetad=0 locking_type=4 }'; echo $?
  Read-only locking type set. Write locks are prohibited.
  Recovery of standalone physical volumes failed.
  Cannot process standalone physical volumes
  Read-only locking type set. Write locks are prohibited.
  Recovery of standalone physical volumes failed.
  Cannot process standalone physical volumes
  Read-only locking type set. Write locks are prohibited.
  Recovery of standalone physical volumes failed.
  Cannot process standalone physical volumes
  PV                                            VG                                   Fmt  Attr PSize   PFree
  /dev/mapper/360014053b18095bd13c48158687153a5 91630622-c645-4397-a9fe-9ddf26690500 lvm2 a--   99.62g 91.62g
  /dev/sda2                                     centos                               lvm2 a--  <19.00g     0
5
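
(For reference: the persistent equivalent of the --config override above
would be the following in /etc/lvm/lvm.conf. This is a sketch; RHV passes
the settings per command instead.)

global {
        use_lvmetad = 0
        locking_type = 4
}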

Version-Release number of selected component (if applicable):
# rpm -qa | grep lvm2
lvm2-libs-2.02.185-2.el7_7.2.x86_64
lvm2-2.02.185-2.el7_7.2.x86_64
udisks2-lvm2-2.7.3-9.el7.x86_64


How reproducible:
100%

Bug 1553133 is causing data corruption, so having a working locking_type=4
option in RHEL 7.7.z is very important for RHV.

Comment 2 David Teigland 2020-03-03 16:17:51 UTC
The bug comes from commit 79c4971210a6337563ffa2fca08fb636423d93d4 (from
2017). It leads lvm to attempt a bogus recovery of the orphan VG whenever
the orphan VG is read. (The orphan VG is a fake internal VG for handling
orphan PVs.) No recovery code exists for the orphan VG, but lvm still
attempts it.

When lvm attempts the bogus/no-op orphan recovery, it tries to take a
write lock. Usually the write lock succeeds and the "recovery" does
nothing, so it is fairly harmless. But with locking_type=4 the write lock
fails, and that failure bubbles up to fail the whole command.

The fix is to not attempt recovery of the orphan VG:

diff --git a/lib/metadata/metadata.c b/lib/metadata/metadata.c
index 81a6029c4b59..666ad78230d2 100644
--- a/lib/metadata/metadata.c
+++ b/lib/metadata/metadata.c
@@ -3433,6 +3433,8 @@ static struct volume_group *_vg_read_orphans(struct cmd_context *cmd,
 
        dm_list_init(&head.list);
 
+       *consistent = 1;
+
        if (!(vginfo = lvmcache_vginfo_from_vgname(orphan_vgname, NULL)))
                return_NULL;
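

To make the failure path concrete, here is a small toy model of the logic
described above. It is illustrative only, not lvm2 source; the function
names, messages, and exit code are modeled on the output in this report:

/* cc -o orphan_model orphan_model.c */
#include <stdio.h>

static int locking_type = 4;    /* read-only locking, as in the reproducer */

static int write_lock(void)
{
        if (locking_type == 4) {
                fprintf(stderr, "Read-only locking type set. "
                                "Write locks are prohibited.\n");
                return 0;
        }
        return 1;
}

static void read_orphans(int *consistent)
{
        /* The one-line fix sets *consistent = 1 here, so the caller never
         * attempts recovery.  Left out to show the failing path. */
        (void)consistent;
}

int main(void)
{
        int consistent = 0;

        read_orphans(&consistent);

        /* The caller treats an inconsistent VG as needing recovery, and
         * the first step of "recovery" is taking a write lock. */
        if (!consistent && !write_lock()) {
                fprintf(stderr, "Recovery of standalone physical volumes "
                                "failed.\n");
                fprintf(stderr, "Cannot process standalone physical "
                                "volumes\n");
                return 5;       /* the exit code seen in the reproducer */
        }
        return 0;
}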


The bug above is exposed when a pvs command tries to process orphan PVs.
That happens whenever pvs needs to look at all PVs on the system. This is
obviously true for a 'pvs' command with no args, which by definition
reports all PVs. It is not true for a 'pvs /dev/foo' command, which only
looks at the named devs.

When using --select, lvm processes all objects on the system, because
--select does a wide range of matching against various properties: it
compares every PV, VG, or LV to the --select matching pattern. So 'pvs
--select' is going to process the orphan VG, just like a plain 'pvs', and
hit the locking/recovery bug.
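
For example, before the fix, a select-based query with no device arguments
would be expected to fail the same way (hypothetical invocation; exit code
5 as in the reproducer above):

pvs --config 'global { use_lvmetad=0 locking_type=4 }' --select 'vg_name = foo'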

Comment 3 Nir Soffer 2020-03-03 17:50:29 UTC
It turns out that RHV 4.3.9 will be delivered with RHEL 7.8, so we may not
need a fix in RHEL 7.7.z.

Comment 4 David Teigland 2020-03-03 19:13:37 UTC
Scratch build with the patch from comment 2:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=26995154

Comment 5 Nir Soffer 2020-03-03 19:55:41 UTC
The fix seems to work:

[root@host1 ~]# pvs --readonly --config 'global { use_lvmetad=0 }'; echo $?
  PV                                            VG                                   Fmt  Attr PSize   PFree  
  /dev/mapper/360014051ce5179112ae4fb98e72d9ba9 test                                 lvm2 a--   99.62g  99.62g
  /dev/mapper/36001405271fe76b24b542bf858aaeef7 test                                 lvm2 a--   99.62g <22.88g
  /dev/mapper/360014053b18095bd13c48158687153a5 91630622-c645-4397-a9fe-9ddf26690500 lvm2 a--   99.62g  91.62g
  /dev/sda2                                     centos                               lvm2 a--  <19.00g      0 
0

[root@host1 ~]# pvs --config 'global { use_lvmetad=0 locking_type=4 }'; echo $?
  Scan of VG test from /dev/mapper/36001405271fe76b24b542bf858aaeef7 found metadata seqno 51296 vs previous 51295.
  PV                                            VG                                   Fmt  Attr PSize   PFree 
  /dev/mapper/360014051ce5179112ae4fb98e72d9ba9 test                                 lvm2 a--   99.62g 99.62g
  /dev/mapper/36001405271fe76b24b542bf858aaeef7 test                                 lvm2 a--   99.62g 37.50g
  /dev/mapper/360014053b18095bd13c48158687153a5 91630622-c645-4397-a9fe-9ddf26690500 lvm2 a--   99.62g 91.62g
  /dev/sda2                                     centos                               lvm2 a--  <19.00g     0 
0

[root@host1 ~]# pvs --config 'global { use_lvmetad=0 locking_type=4 }' --select 'pv_name = test'; echo $?
0

[root@host1 ~]# pvs --config 'global { use_lvmetad=0 locking_type=4 }' --select 'pv_name = /dev/mapper/360014051ce5179112ae4fb98e72d9ba9'; echo $?
  PV                                            VG   Fmt  Attr PSize  PFree 
  /dev/mapper/360014051ce5179112ae4fb98e72d9ba9 test lvm2 a--  99.62g 99.62g
0


The warning:

  Scan of VG test from /dev/mapper/36001405271fe76b24b542bf858aaeef7 found metadata seqno 51296 vs previous 51295.

is expected; I'm running a stress test extending LVs in this VG on another host.


We will do more testing later when we enable locking_type=4 in RHV.

Comment 6 Vojtech Juranek 2020-03-05 13:44:41 UTC
This can also happen when the device is specified. With locking_type=4,
this fails on one of my test VMs:

[root@localhost ~]# pvs /dev/sdc --config 'global { use_lvmetad=0 locking_type=4 }'; echo $?
  WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
  /dev/sdb: Checksum error at offset 160935559168
  Couldn't read volume group metadata from /dev/sdb.
  Metadata location on /dev/sdb at 160935559168 has invalid summary for VG.
  Failed to read metadata summary from /dev/sdb
  Failed to scan VG from /dev/sdb
  Read-only locking type set. Write locks are prohibited.
  Recovery of standalone physical volumes failed.
  Cannot process standalone physical volumes
  Read-only locking type set. Write locks are prohibited.
  Recovery of standalone physical volumes failed.
  Cannot process standalone physical volumes
  Read-only locking type set. Write locks are prohibited.
  Recovery of standalone physical volumes failed.
  Cannot process standalone physical volumes
  Failed to find physical volume "/dev/sdc".
5

while with locking_type=1 it succeeds:

[root@localhost ~]# pvs /dev/sdc --config 'global { use_lvmetad=0 locking_type=1 }'; echo $?
  WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
  /dev/sdb: Checksum error at offset 160935559168
  Couldn't read volume group metadata from /dev/sdb.
  Metadata location on /dev/sdb at 160935559168 has invalid summary for VG.
  Failed to read metadata summary from /dev/sdb
  Failed to scan VG from /dev/sdb
  PV         VG Fmt  Attr PSize  PFree 
  /dev/sdc      lvm2 ---  20,00g 20,00g
0


lvm version:

[root@localhost ~]# rpm -qa|grep lvm
lvm2-libs-2.02.185-2.el7_7.2.x86_64
lvm2-2.02.185-2.el7_7.2.x86_64

Comment 7 Nir Soffer 2020-03-05 13:56:10 UTC
(In reply to Vojtech Juranek from comment #6)
> this can happen also when the device is specified, with locking_type=4 this
> fails on one of my test VMs:
> 
> [root@localhost ~]# pvs /dev/sdc --config 'global { use_lvmetad=0
> locking_type=4 }'; echo $?
>   WARNING: Not using lvmetad because config setting use_lvmetad=0.
>   WARNING: To avoid corruption, rescan devices to make changes visible
> (pvscan --cache).

This system is running lvmetad - we disable and mask this service on RHV hosts.
Please test when lvmetad service is masked and disabled.
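
For reference, masking and disabling it would look something like this
(RHEL 7 unit names; illustrative):

systemctl stop lvm2-lvmetad.service lvm2-lvmetad.socket
systemctl disable lvm2-lvmetad.service lvm2-lvmetad.socket
systemctl mask lvm2-lvmetad.service lvm2-lvmetad.socket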

>   /dev/sdb: Checksum error at offset 160935559168
>   Couldn't read volume group metadata from /dev/sdb.
>   Metadata location on /dev/sdb at 160935559168 has invalid summary for VG.

VG metadata on /dev/sdb is corrupted.

> lvm version:
> 
> [root@localhost ~]# rpm -qa|grep lvm
> lvm2-libs-2.02.185-2.el7_7.2.x86_64
> lvm2-2.02.185-2.el7_7.2.x86_64

Does it work with the scratch build mentioned in comment 4?
(lvm2-2.02.186-7.el7.bz1809660_2.x86_64)

Comment 8 David Teigland 2020-03-05 17:24:01 UTC
(In reply to Vojtech Juranek from comment #6)
> this can happen also when the device is specified, with locking_type=4 this
> fails on one of my test VMs:

Yes, it can still happen when devs are specified, if the dev is an orphan
PV or is not pvcreated (which causes lvm to look in the orphans for it).
In this case the corruption of sdb caused lvm to look through the orphans,
which is where the problem appears. Generally, if the named PVs exist in
VGs, pvs will not look in the orphans for them and will not hit the
problem. The patch should fix the problem regardless.
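
To restate the distinction with commands (illustrative; /dev/foo stands in
for a PV that belongs to a VG, /dev/sdc for the problem dev from comment 6):

# Named PV that exists in a VG: the orphans are not searched, no failure.
pvs --config 'global { use_lvmetad=0 locking_type=4 }' /dev/foo

# Orphan / not-pvcreated / unreadable dev: lvm searches the orphan VG and,
# before the patch, hit the recovery failure (exit code 5).
pvs --config 'global { use_lvmetad=0 locking_type=4 }' /dev/sdc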

Comment 23 Corey Marthaler 2020-05-01 18:10:05 UTC
Fix verified in the latest rpms. 

3.10.0-1136.el7.x86_64
lvm2-2.02.187-2.el7    BUILT: Thu Apr 16 11:56:15 CDT 2020
lvm2-libs-2.02.187-2.el7    BUILT: Thu Apr 16 11:56:15 CDT 2020
lvm2-cluster-2.02.187-2.el7    BUILT: Thu Apr 16 11:56:15 CDT 2020
lvm2-lockd-2.02.187-2.el7    BUILT: Thu Apr 16 11:56:15 CDT 2020
lvm2-python-boom-0.9-27.el7    BUILT: Thu Apr 16 12:10:50 CDT 2020
cmirror-2.02.187-2.el7    BUILT: Thu Apr 16 11:56:15 CDT 2020
device-mapper-1.02.170-2.el7    BUILT: Thu Apr 16 11:56:15 CDT 2020
device-mapper-libs-1.02.170-2.el7    BUILT: Thu Apr 16 11:56:15 CDT 2020
device-mapper-event-1.02.170-2.el7    BUILT: Thu Apr 16 11:56:15 CDT 2020
device-mapper-event-libs-1.02.170-2.el7    BUILT: Thu Apr 16 11:56:15 CDT 2020
device-mapper-persistent-data-0.8.5-3.el7    BUILT: Mon Apr 20 09:49:16 CDT 2020




I attempted these with PV /dev/mapper/mpathf1 in a failed state at the time.

[root@harding-02 ~]# systemctl status lvm2-lvmetad
● lvm2-lvmetad.service - LVM2 metadata daemon
   Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmetad.service; static; vendor preset: enabled)
   Active: active (running) since Fri 2020-05-01 10:41:39 CDT; 2h 23min ago
     Docs: man:lvmetad(8)
 Main PID: 1129 (lvmetad)
   CGroup: /system.slice/lvm2-lvmetad.service
           └─1129 /usr/sbin/lvmetad -f

May 01 10:41:39 harding-02.lab.msp.redhat.com systemd[1]: Started LVM2 metadata daemon.

[root@harding-02 ~]# vgs --config 'global { use_lvmetad=0 }'
  WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
  Couldn't find device with uuid x8BnVM-UX9h-BkNn-XAWS-IOzl-cEHe-LLW7EE.
  VG              #PV #LV #SN Attr   VSize    VFree 
  black_bird        7   2   1 wz-pn-   <1.71t <1.71t
  rhel_harding-02   3   3   0 wz--n- <278.47g     0 

[root@harding-02 ~]# pvs --readonly --config 'global { use_lvmetad=0 }'; echo $?
  WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
  Error reading device /dev/mapper/mpathf at 0 length 512.
  Error reading device /dev/mapper/mpathf at 0 length 4.
  Error reading device /dev/mapper/mpathf at 4096 length 4.
  Error reading device /dev/mapper/mpathf1 at 0 length 512.
  Error reading device /dev/mapper/mpathf1 at 0 length 4.
  Error reading device /dev/mapper/mpathf1 at 4096 length 4.
  Couldn't find device with uuid x8BnVM-UX9h-BkNn-XAWS-IOzl-cEHe-LLW7EE.
  PV                  VG              Fmt  Attr PSize   PFree   
  /dev/mapper/mpatha1 black_bird      lvm2 a--  249.96g <249.23g
  /dev/mapper/mpathb1 black_bird      lvm2 a--  249.96g  249.47g
  /dev/mapper/mpathc1 black_bird      lvm2 a--  249.96g  249.96g
  /dev/mapper/mpathd1 black_bird      lvm2 a--  249.96g  249.96g
  /dev/mapper/mpathe1 black_bird      lvm2 a--  249.96g  249.96g
  /dev/mapper/mpathg1 black_bird      lvm2 a--  249.96g  249.47g
  /dev/sda2           rhel_harding-02 lvm2 a--  <92.16g       0 
  /dev/sdb1           rhel_harding-02 lvm2 a--  <93.16g       0 
  /dev/sdc1           rhel_harding-02 lvm2 a--  <93.16g       0 
  [unknown]           black_bird      lvm2 a-m  249.96g  249.96g
0

[root@harding-02 ~]# pvs --config 'global { use_lvmetad=0 locking_type=4 }'; echo $?
  WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
  Error reading device /dev/mapper/mpathf at 0 length 512.
  Error reading device /dev/mapper/mpathf at 0 length 4.
  Error reading device /dev/mapper/mpathf at 4096 length 4.
  Error reading device /dev/mapper/mpathf1 at 0 length 512.
  Error reading device /dev/mapper/mpathf1 at 0 length 4.
  Error reading device /dev/mapper/mpathf1 at 4096 length 4.
  Couldn't find device with uuid x8BnVM-UX9h-BkNn-XAWS-IOzl-cEHe-LLW7EE.
  PV                  VG              Fmt  Attr PSize   PFree   
  /dev/mapper/mpatha1 black_bird      lvm2 a--  249.96g <249.23g
  /dev/mapper/mpathb1 black_bird      lvm2 a--  249.96g  249.47g
  /dev/mapper/mpathc1 black_bird      lvm2 a--  249.96g  249.96g
  /dev/mapper/mpathd1 black_bird      lvm2 a--  249.96g  249.96g
  /dev/mapper/mpathe1 black_bird      lvm2 a--  249.96g  249.96g
  /dev/mapper/mpathg1 black_bird      lvm2 a--  249.96g  249.47g
  /dev/sda2           rhel_harding-02 lvm2 a--  <92.16g       0 
  /dev/sdb1           rhel_harding-02 lvm2 a--  <93.16g       0 
  /dev/sdc1           rhel_harding-02 lvm2 a--  <93.16g       0 
  [unknown]           black_bird      lvm2 a-m  249.96g  249.96g
0

Comment 25 errata-xmlrpc 2020-09-29 19:55:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (lvm2 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3927

