Bug 1034460

Summary: [lvmetad] VG mda corruption is not handled when using lvmetad
Product: Red Hat Enterprise Linux 7 Reporter: Corey Marthaler <cmarthal>
Component: lvm2 Assignee: LVM and device-mapper development team <lvm-team>
lvm2 sub component: Default / Unclassified QA Contact: cluster-qe <cluster-qe>
Status: CLOSED NOTABUG Docs Contact:
Severity: high    
Priority: high CC: agk, djansa, dwysocha, heinzm, jbrassow, lnovich, msnitzer, prajnoha, prockai, slevine, thornber, zkabelac
Version: 7.0   
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 894136 Environment:
Last Closed: 2014-01-09 00:16:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 894136    
Bug Blocks:    

Comment 3 Petr Rockai 2013-12-02 14:35:45 UTC
Not sure, but the upstream test is still passing, so I don't think anyone broke this upstream. The patch referred to in comment 1 should indeed fix the problem.

Comment 4 Corey Marthaler 2013-12-18 23:31:50 UTC
Any word on this? Should the tests simply stop relying on pvs to detect corruption? The pvs behavior differs between rhel6.5 (using lvmetad) and rhel7.0 (using lvmetad), as well as between 7.0 w/ lvmetad and 7.0 w/o lvmetad.

## RHEL6.5

[root@taft-01 ~]# lvs -a -o +devices
  LV                VG       Attr       LSize   Origin Data%  Devices
  corrupt_meta_snap snapper  swi-a-s--- 100.00m origin   0.00 /dev/sdb1(75)
  origin            snapper  owi-a-s--- 300.00m               /dev/sdb1(0)

[root@taft-01 ~]# dd if=/dev/zero of=/dev/sdb1 count=1000
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.0038089 s, 134 MB/s

[root@taft-01 ~]# vgck
  Couldn't find device with uuid rWf7wc-nwnD-x3Ml-eFvV-xYIX-E2Td-btUZMF.
  The volume group is missing 1 physical volumes.

[root@taft-01 ~]# pvs /dev/sdb1
  No physical volume found in lvmetad cache for /dev/sdb1
  Failed to read physical volume "/dev/sdb1"
[root@taft-01 ~]# echo $?
5

## RHEL7.0

[root@harding-02 ~]# lvs -a -o +devices
  LV                VG       Attr       LSize   Origin Data%  Devices
  corrupt_meta_snap snapper  swi-a-s--- 100.00m origin   0.00 /dev/sdc1(75)
  origin            snapper  owi-a-s--- 300.00m               /dev/sdc1(0)

[root@harding-02 ~]# dd if=/dev/zero of=/dev/sdc1 count=1000
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.0968152 s, 5.3 MB/s

[root@harding-02 ~]# vgck
  Couldn't find device with uuid OOeorC-2aY2-j177-rbju-QN4M-zN91-pzbNWg.
  The volume group is missing 1 physical volumes.

[root@harding-02 ~]# pvs /dev/sdc1
  PV         VG      Fmt  Attr PSize  PFree 
  /dev/sdc1  snapper lvm2 a--  93.16g 92.77g
[root@harding-02 ~]# echo $?
0



3.10.0-54.0.1.el7.x86_64
lvm2-2.02.103-6.el7    BUILT: Wed Nov 27 02:28:25 CST 2013
lvm2-libs-2.02.103-6.el7    BUILT: Wed Nov 27 02:28:25 CST 2013
lvm2-cluster-2.02.103-6.el7    BUILT: Wed Nov 27 02:28:25 CST 2013
device-mapper-1.02.82-6.el7    BUILT: Wed Nov 27 02:28:25 CST 2013
device-mapper-libs-1.02.82-6.el7    BUILT: Wed Nov 27 02:28:25 CST 2013
device-mapper-event-1.02.82-6.el7    BUILT: Wed Nov 27 02:28:25 CST 2013
device-mapper-event-libs-1.02.82-6.el7    BUILT: Wed Nov 27 02:28:25 CST 2013
device-mapper-persistent-data-0.2.8-2.el7    BUILT: Wed Oct 30 10:20:48 CDT 2013
cmirror-2.02.103-6.el7    BUILT: Wed Nov 27 02:28:25 CST 2013

Comment 5 Petr Rockai 2013-12-31 16:42:28 UTC
Oh, pvs! Yes, pvs no longer detects corruption: in 6.5, it goes to disk, but in 7.0 it relies on lvmetad. To check for metadata corruption, vgck should always be used. The original resolution for this bug was:

> Should be implemented upstream (vgck will not rely on lvmetad but check
> metadata stored on disk) in 0da72743ca46ae9f8185cd12d5c78b3c2b801872.

Please change the tests to use vgck instead of pvs. If that still fails, then we have a bug on our hands. I am still not sure though, because this bug report states:

> This fix must not have made it into rhel7 yet? A 'pvscan --cache $pv' will
> detect a corrupted pv, but vgck will not (unlike it does in rhel6.5).

Summary: pvs used to report corrupt metadata, but no longer does if lvmetad is in use. We have extended vgck instead to always check metadata on disk. Only vgck is guaranteed to check metadata on disk for corruption; if other commands currently do such checking, this behaviour may be removed in future.
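
For test authors, a check along the following lines could replace the pvs-based one. This is only a sketch, not the actual QA harness: the helper name vg_metadata_ok and the VGCK override variable are made up for illustration, and it assumes vgck signals trouble either with a non-zero exit status or with the "Couldn't find device" / "missing N physical volumes" messages seen in the transcripts in comment 4.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if vgck finds no problem with the VG.
# VGCK is an illustrative override (not an LVM setting) so the helper can
# be pointed at a stub when LVM is not available.
VGCK="${VGCK:-vgck}"

vg_metadata_ok() {
    vg="$1"
    # Capture stdout and stderr together; vgck is expected to read the
    # on-disk metadata instead of trusting the lvmetad cache.
    out=$("$VGCK" "$vg" 2>&1)
    status=$?
    # Assumption: a non-zero exit status means vgck found a problem.
    if [ "$status" -ne 0 ]; then
        printf '%s\n' "$out" >&2
        return 1
    fi
    # Assumption: also treat the messages from the transcripts as failure,
    # in case they are printed without a failing exit status.
    if printf '%s\n' "$out" | grep -Eq "Couldn't find device|missing [0-9]+ physical volume"; then
        printf '%s\n' "$out" >&2
        return 1
    fi
    return 0
}
```

A test would then call something like `vg_metadata_ok snapper || echo "corruption detected"` after the dd step, instead of inspecting the exit status of pvs.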

Comment 6 Corey Marthaler 2014-01-09 00:16:36 UTC
Ok thanks. I'll remove the 'pvs' checks and stick with vgck, which does detect the corruption.

Closing...