Bug 894136

Summary: [lvmetad] VG mda corruption is not handled when using lvmetad
Product: Red Hat Enterprise Linux 6
Component: lvm2
Version: 6.4
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: rc
Reporter: Corey Marthaler <cmarthal>
Assignee: Petr Rockai <prockai>
QA Contact: Cluster QE <mspqa-list>
CC: agk, djansa, dwysocha, heinzm, jbrassow, lnovich, msnitzer, prajnoha, prockai, slevine, thornber, zkabelac
Fixed In Version: lvm2-2.02.100-2.el6
Doc Type: Bug Fix
Doc Text:
Cause: When lvmetad is enabled, metadata is cached in RAM and most LVM commands do not consult on-disk metadata during normal operation.
Consequence: When metadata becomes corrupt on disk, LVM may fail to take notice until a restart of lvmetad or a reboot.
Fix: The pre-existing command for checking VG consistency, vgck, has been improved to detect such on-disk corruption even while lvmetad is active and the metadata is cached.
Result: Users can issue the "vgck" command to verify consistency of on-disk metadata at any time, or they can arrange a periodic check using cron.
Clones: 987085 1034460
Type: Bug
Last Closed: 2013-11-21 23:18:46 UTC
Bug Blocks: 1034460    

Description Corey Marthaler 2013-01-10 20:16:07 UTC
Description of problem:
This is similar to bug 892991. We have many test cases that attempt to recover/repair VGs with corrupted MDA areas; these tests corrupt the metadata by dd'ing over the MDA area on the actual PV. With lvmetad caching the metadata, that corruption now goes undetected. What can be done to remedy this now?
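
The corruption step itself is not shown verbatim in the logs below; a minimal sketch of what the harness presumably runs, inferred from the dd record counts in the output (device name taken from the first scenario):

   # Overwrite the first 1000 512-byte sectors of the PV, wiping the
   # LVM label and the start of the metadata area (hypothetical command;
   # 1000 x 512 bytes matches the "512000 bytes copied" in the log).
   dd if=/dev/zero of=/dev/sdb2 bs=512 count=1000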


SCENARIO - [recover_corrupt_mda_no_restorefile]
Create a mirror on taft-01, corrupt its metadata, and then restore the volume using no backup file
taft-01: lvcreate -m 1 -n corrupt_meta_mirror -L 300M --nosync mirror_sanity
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
Corrupting PV /dev/sdb2 (used in this mirror)
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.00363178 s, 141 MB/s
Verifying that this VG is now corrupt
physical volume did not appear to get corrupted


Version-Release number of selected component (if applicable):
2.6.32-353.el6.x86_64
lvm2-2.02.98-7.el6    BUILT: Wed Jan  9 03:34:27 CST 2013
lvm2-libs-2.02.98-7.el6    BUILT: Wed Jan  9 03:34:27 CST 2013
lvm2-cluster-2.02.98-7.el6    BUILT: Wed Jan  9 03:34:27 CST 2013
udev-147-2.43.el6    BUILT: Thu Oct 11 05:59:38 CDT 2012
device-mapper-1.02.77-7.el6    BUILT: Wed Jan  9 03:34:27 CST 2013
device-mapper-libs-1.02.77-7.el6    BUILT: Wed Jan  9 03:34:27 CST 2013
device-mapper-event-1.02.77-7.el6    BUILT: Wed Jan  9 03:34:27 CST 2013
device-mapper-event-libs-1.02.77-7.el6    BUILT: Wed Jan  9 03:34:27 CST 2013
cmirror-2.02.98-7.el6    BUILT: Wed Jan  9 03:34:27 CST 2013

Comment 1 Alasdair Kergon 2013-01-10 21:23:00 UTC
We need to think about this.

On the face of it, the new behaviour is better, because the system survives despite the corruption.

But on the other hand, the corruption still needs detecting and repairing and there should probably be something proactively doing this.

- How often should the system check for this sort of corruption?
  - Should this be tunable?
  - Should there be a command to do this on-demand?

Should 'vgck' perform this, and should we schedule regular 'vgck' runs?


At the moment, to test this, you'll have to restart lvmetad - to throw away its memory of the old configuration.
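
A minimal sketch of that restart on RHEL 6, assuming the stock lvm2-lvmetad init script is installed:

   # drop lvmetad's cached copy of the metadata so it re-reads disk
   service lvm2-lvmetad restart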

Comment 2 Alasdair Kergon 2013-01-10 21:26:00 UTC
Would:
   pvscan --cache <device_deliberately_corrupted>
be enough to tell lvmetad without restarting it?

Comment 3 Corey Marthaler 2013-01-10 22:50:47 UTC
Looks like that's the trick. Thanks!

[root@qalvm-01 ~]# lvs -a -o +devices
 LV                             Attr      LSize   Cpy%Sync Devices
 corrupt_meta_mirror            Mwi-a-m-- 300.00m   100.00 corrupt_meta_mirror_mimage_0(0),corrupt_meta_mirror_mimage_1(0)
 [corrupt_meta_mirror_mimage_0] iwi-aom-- 300.00m          /dev/vdh2(0)
 [corrupt_meta_mirror_mimage_1] iwi-aom-- 300.00m          /dev/vdh1(0)
 [corrupt_meta_mirror_mlog]     lwi-aom--   4.00m          /dev/vda1(0)

[root@qalvm-01 ~]# pvscan --cache /dev/vdh2
 No PV label found on /dev/vdh2.

[root@qalvm-01 ~]# lvs -a -o +devices
 PV 6Ekai9-WetR-N5RQ-nTBz-tFd6-375q-hEg3Cp not recognised. Is the device missing?
 PV 6Ekai9-WetR-N5RQ-nTBz-tFd6-375q-hEg3Cp not recognised. Is the device missing?
 LV                             Attr      LSize   Cpy%Sync Devices
 corrupt_meta_mirror            Mwi-a-m-p 300.00m   100.00 corrupt_meta_mirror_mimage_0(0),corrupt_meta_mirror_mimage_1(0)
 [corrupt_meta_mirror_mimage_0] iwi-aom-p 300.00m          unknown device(0)
 [corrupt_meta_mirror_mimage_1] iwi-aom-- 300.00m          /dev/vdh1(0)
 [corrupt_meta_mirror_mlog]     lwi-aom--   4.00m          /dev/vda1(0)

Comment 4 Peter Rajnoha 2013-01-15 15:00:33 UTC
(In reply to comment #1)
> - How often should the system check for this sort of corruption?
>   - Should this be tunable?
>   - Should there be a command to do this on-demand?
> 
> Should 'vgck' perform this, and should we schedule regular 'vgck' runs?

For now, I think the direct pvscan --cache call is just fine. Let's think about adding more automation to this for 6.5.

Comment 5 Petr Rockai 2013-01-21 15:22:30 UTC
I think vgck is the right entrypoint for this check. 6.5 of course.

Comment 8 Petr Rockai 2013-06-03 22:35:29 UTC
The requirement is that running "vgck" will detect the corrupt MDA and/or missing PV label. A cron job to run vgck periodically may be considered, but presumably that is not a QE concern. Basically, to test this, run "vgck" in the above scenario (in place of pvscan --cache ...) and verify that it detects and reports the problem.
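
A sketch of that verification flow, reusing the device and VG names from the first scenario (the cron schedule and file path are purely illustrative):

   # corrupt the on-disk MDA as before
   dd if=/dev/zero of=/dev/sdb2 bs=512 count=1000
   # vgck, rather than pvscan --cache, should now report the problem
   vgck mirror_sanity
   # a possible periodic check, e.g. a line in /etc/cron.d/vgck:
   0 * * * * root /sbin/vgck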

Comment 15 Alasdair Kergon 2013-08-13 15:14:23 UTC
Should vgck have an option to determine whether it just reports problems, or whether it fixes them?

Comment 16 Petr Rockai 2013-08-13 21:50:57 UTC
Should be implemented upstream (vgck no longer relies on lvmetad but checks the metadata stored on disk) in commit 0da72743ca46ae9f8185cd12d5c78b3c2b801872.

Comment 18 Corey Marthaler 2013-10-24 20:20:11 UTC
The vgck fix solves this lvmetad device corruption issue.

============================================================
Iteration 10 of 10 started at Thu Oct 24 14:37:02 CDT 2013
============================================================
SCENARIO - [recover_corrupt_mda_no_restorefile]
Create a mirror on harding-02, corrupt its metadata, and then restore the volume using no backup file
harding-02: lvcreate -m 1 -n corrupt_meta_mirror -L 300M --nosync mirror_sanity
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
Corrupting PV /dev/sdb6 (used in this mirror)
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.150019 s, 3.4 MB/s
Running vgck (bug 894136)
  Couldn't find device with uuid 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL.
  The volume group is missing 1 physical volumes.
Verifying that this VG is now corrupt
  No physical volume found in lvmetad cache for /dev/sdb6
  Failed to read physical volume "/dev/sdb6"
Activating VG in partial readonly mode
  PV 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL not recognised. Is the device missing?
  PV 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL not recognised. Is the device missing?
  Logical volume vg_harding02/lv_root contains a filesystem in use.
  Can't deactivate volume group "vg_harding02" with 3 open logical volume(s)
  PARTIAL MODE. Incomplete logical volumes will be processed.
  PV 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL not recognised. Is the device missing?
Recreating PV using its old uuid
Restoring the VG back to its original state
Reactivating VG
Deactivating mirror corrupt_meta_mirror... and removing


SCENARIO - [recover_corrupt_mda_restorefile]
Create a mirror on harding-02, corrupt its metadata, and then restore the volume using a backup file
harding-02: lvcreate -m 1 -n corrupt_meta_mirror -L 300M --nosync mirror_sanity
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
Corrupting PV /dev/sdb6 (used in this mirror)
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.14091 s, 3.6 MB/s
Running vgck (bug 894136)
  Couldn't find device with uuid 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL.
  The volume group is missing 1 physical volumes.
Verifying that this VG is now corrupt
  No physical volume found in lvmetad cache for /dev/sdb6
  Failed to read physical volume "/dev/sdb6"
Activating VG in partial readonly mode
  PV 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL not recognised. Is the device missing?
  PV 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL not recognised. Is the device missing?
  Logical volume vg_harding02/lv_root contains a filesystem in use.
  Can't deactivate volume group "vg_harding02" with 3 open logical volume(s)
  PARTIAL MODE. Incomplete logical volumes will be processed.
  PV 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL not recognised. Is the device missing?
Recreating PV using its old uuid
  Couldn't find device with uuid 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL.
Restoring the VG back to its original state
Reactivating VG
Deactivating mirror corrupt_meta_mirror... and removing
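
The "Recreating PV" and "Restoring the VG" steps above are only summarized by the harness; a minimal sketch of the usual repair sequence, with the UUID and device taken from the log above and the backup file path assumed (the no-restorefile variant would use --norestorefile instead):

   # recreate the wiped PV under its old uuid
   pvcreate --uuid 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL \
            --restorefile /etc/lvm/backup/mirror_sanity /dev/sdb6
   # restore the VG metadata from the backup and reactivate
   vgcfgrestore mirror_sanity
   vgchange -ay mirror_sanity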



2.6.32-410.el6.x86_64
lvm2-2.02.100-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013
lvm2-libs-2.02.100-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013
lvm2-cluster-2.02.100-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013
udev-147-2.50.el6    BUILT: Fri Oct 11 05:58:10 CDT 2013
device-mapper-1.02.79-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013
device-mapper-libs-1.02.79-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013
device-mapper-event-1.02.79-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013
device-mapper-event-libs-1.02.79-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013
device-mapper-persistent-data-0.2.8-2.el6    BUILT: Mon Oct 21 09:14:25 CDT 2013
cmirror-2.02.100-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013

Comment 19 errata-xmlrpc 2013-11-21 23:18:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1704.html