Bug 894136
Summary: [lvmetad] VG mda corruption is not handled when using lvmetad
Product: Red Hat Enterprise Linux 6
Component: lvm2
Version: 6.4
Status: CLOSED ERRATA
Severity: high
Priority: high
Hardware: x86_64
OS: Linux
Target Milestone: rc
Reporter: Corey Marthaler <cmarthal>
Assignee: Petr Rockai <prockai>
QA Contact: Cluster QE <mspqa-list>
CC: agk, djansa, dwysocha, heinzm, jbrassow, lnovich, msnitzer, prajnoha, prockai, slevine, thornber, zkabelac
Fixed In Version: lvm2-2.02.100-2.el6
Doc Type: Bug Fix
Doc Text:
    Cause: When lvmetad is enabled, metadata is cached in RAM and most LVM commands do not consult on-disk metadata during normal operation.
    Consequence: When metadata becomes corrupt on disk, LVM may fail to notice until lvmetad is restarted or the system is rebooted.
    Fix: The pre-existing command for checking VG consistency, vgck, has been improved to detect such on-disk corruption even while lvmetad is active and the metadata is cached.
    Result: Users can run the "vgck" command to verify the consistency of on-disk metadata at any time, or arrange a periodic check using cron.
Clones: 987085 1034460 (view as bug list)
Bug Blocks: 1034460
Last Closed: 2013-11-21 23:18:46 UTC
Type: Bug
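The Doc Text mentions arranging a periodic consistency check with cron. A minimal sketch of such a job, where the file name, schedule, and logging destination are illustrative assumptions, not part of the fix:

```shell
# Hypothetical /etc/cron.d/vgck fragment (schedule and paths are
# assumptions): run vgck hourly and send any metadata complaints
# to syslog, tagged "vgck" for easy grepping.
0 * * * * root /sbin/vgck 2>&1 | logger -t vgck
```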
Description (Corey Marthaler, 2013-01-10 20:16:07 UTC)
We need to think about this. On the face of it, the new behaviour is better, because the system survives despite the corruption. On the other hand, the corruption still needs detecting and repairing, and something should probably be doing that proactively.

- How often should the system check for this sort of corruption?
- Should this be tunable?
- Should there be a command to do this on-demand?

Should 'vgck' perform this, and should we schedule regular 'vgck' runs?

At the moment, to test this, you'll have to restart lvmetad to throw away its memory of the old configuration. Would:

    pvscan --cache <device_deliberately_corrupted>

be enough to tell lvmetad without restarting it?

Looks like that's the trick. Thanks!

```
[root@qalvm-01 ~]# lvs -a -o +devices
  LV                             Attr      LSize   Cpy%Sync Devices
  corrupt_meta_mirror            Mwi-a-m-- 300.00m 100.00   corrupt_meta_mirror_mimage_0(0),corrupt_meta_mirror_mimage_1(0)
  [corrupt_meta_mirror_mimage_0] iwi-aom-- 300.00m          /dev/vdh2(0)
  [corrupt_meta_mirror_mimage_1] iwi-aom-- 300.00m          /dev/vdh1(0)
  [corrupt_meta_mirror_mlog]     lwi-aom-- 4.00m            /dev/vda1(0)

[root@qalvm-01 ~]# pvscan --cache /dev/vdh2
  No PV label found on /dev/vdh2.

[root@qalvm-01 ~]# lvs -a -o +devices
  PV 6Ekai9-WetR-N5RQ-nTBz-tFd6-375q-hEg3Cp not recognised. Is the device missing?
  PV 6Ekai9-WetR-N5RQ-nTBz-tFd6-375q-hEg3Cp not recognised. Is the device missing?
  LV                             Attr      LSize   Cpy%Sync Devices
  corrupt_meta_mirror            Mwi-a-m-p 300.00m 100.00   corrupt_meta_mirror_mimage_0(0),corrupt_meta_mirror_mimage_1(0)
  [corrupt_meta_mirror_mimage_0] iwi-aom-p 300.00m          unknown device(0)
  [corrupt_meta_mirror_mimage_1] iwi-aom-- 300.00m          /dev/vdh1(0)
  [corrupt_meta_mirror_mlog]     lwi-aom-- 4.00m            /dev/vda1(0)
```

(In reply to comment #1)
> - How often should the system check for this sort of corruption?
> - Should this be tunable?
> - Should there be a command to do this on-demand?
>
> Should 'vgck' perform this, and should we schedule regular 'vgck' runs?
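The pvscan --cache trick shown above can be wrapped in a small helper so lvmetad re-reads a single device instead of being restarted. A sketch, where the function name is hypothetical:

```shell
# Hypothetical helper: ask lvmetad to re-scan one device rather than
# restarting the daemon.  If the on-disk label or metadata area is
# corrupt, lvmetad drops the PV from its cache, and the following lvs
# output shows the affected LVs as partial ("unknown device").
refresh_lvmetad_for() {
    local dev="$1"
    pvscan --cache "$dev"   # re-scan just this device into lvmetad
    lvs -a -o +devices      # inspect the result afterwards
}
```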
For now, I think the direct pvscan --cache call is just fine. Let's have a think about adding more automation to this for 6.5.

I think vgck is the right entry point for this check. 6.5, of course. The requirement is that running "vgck" will detect the corrupt MDA and/or missing PV label. A cron job to run vgck periodically may be considered, but presumably that is not really a concern for QE. Basically, to test this, you want to try running "vgck" in the above scenario (in place of pvscan --cache ...) and verify that it detects and reports the problem.

Should vgck have an option to determine whether it just reports problems, or whether it fixes them?

Implemented upstream (vgck will not rely on lvmetad but will check the metadata stored on disk) in 0da72743ca46ae9f8185cd12d5c78b3c2b801872.

vgck resolves this lvmetad device corruption issue.

```
============================================================
Iteration 10 of 10 started at Thu Oct 24 14:37:02 CDT 2013
============================================================
SCENARIO - [recover_corrupt_mda_no_restorefile]
Create a mirror on harding-02, corrupt its metadata, and then restore the volume using no backup file
harding-02: lvcreate -m 1 -n corrupt_meta_mirror -L 300M --nosync mirror_sanity
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
Corrupting PV /dev/sdb6 (used in this mirror)
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.150019 s, 3.4 MB/s
Running vgck (bug 894136)
  Couldn't find device with uuid 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL.
  The volume group is missing 1 physical volumes.
Verifying that this VG is now corrupt
  No physical volume found in lvmetad cache for /dev/sdb6
  Failed to read physical volume "/dev/sdb6"
Activating VG in partial readonly mode
  PV 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL not recognised. Is the device missing?
  PV 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL not recognised. Is the device missing?
```
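The QE scenario above boils down to a few commands: clobber a PV's metadata, refresh lvmetad's view, and confirm that vgck reports the damage. A sketch under the assumption of a scratch PV you can safely destroy; the function name is illustrative:

```shell
# Hypothetical reproduction of the QE scenario.  DESTRUCTIVE on $pv --
# only ever point this at a scratch device.
check_vgck_detects_corruption() {
    local pv="$1" vg="$2"
    dd if=/dev/zero of="$pv" bs=512 count=1000  # wipe PV label + mda
    pvscan --cache "$pv"                        # let lvmetad notice the damage
    vgck "$vg"                                  # should report the missing PV
}
```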
```
  Logical volume vg_harding02/lv_root contains a filesystem in use.
  Can't deactivate volume group "vg_harding02" with 3 open logical volume(s)
  PARTIAL MODE. Incomplete logical volumes will be processed.
  PV 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL not recognised. Is the device missing?
Recreating PV using its old uuid
Restoring the VG back to its original state
Reactivating VG
Deactivating mirror corrupt_meta_mirror... and removing

SCENARIO - [recover_corrupt_mda_restorefile]
Create a mirror on harding-02, corrupt its metadata, and then restore the volume using a backup file
harding-02: lvcreate -m 1 -n corrupt_meta_mirror -L 300M --nosync mirror_sanity
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
Corrupting PV /dev/sdb6 (used in this mirror)
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.14091 s, 3.6 MB/s
Running vgck (bug 894136)
  Couldn't find device with uuid 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL.
  The volume group is missing 1 physical volumes.
Verifying that this VG is now corrupt
  No physical volume found in lvmetad cache for /dev/sdb6
  Failed to read physical volume "/dev/sdb6"
Activating VG in partial readonly mode
  PV 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL not recognised. Is the device missing?
  PV 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL not recognised. Is the device missing?
  Logical volume vg_harding02/lv_root contains a filesystem in use.
  Can't deactivate volume group "vg_harding02" with 3 open logical volume(s)
  PARTIAL MODE. Incomplete logical volumes will be processed.
  PV 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL not recognised. Is the device missing?
Recreating PV using its old uuid
  Couldn't find device with uuid 0lq8i1-4dWy-shFK-MrBi-ch5A-cQYi-VuW1OL.
Restoring the VG back to its original state
Reactivating VG
Deactivating mirror corrupt_meta_mirror... and removing
```

Verified with:

```
2.6.32-410.el6.x86_64
lvm2-2.02.100-7.el6                       BUILT: Wed Oct 23 10:19:11 CDT 2013
lvm2-libs-2.02.100-7.el6                  BUILT: Wed Oct 23 10:19:11 CDT 2013
lvm2-cluster-2.02.100-7.el6               BUILT: Wed Oct 23 10:19:11 CDT 2013
udev-147-2.50.el6                         BUILT: Fri Oct 11 05:58:10 CDT 2013
device-mapper-1.02.79-7.el6               BUILT: Wed Oct 23 10:19:11 CDT 2013
device-mapper-libs-1.02.79-7.el6          BUILT: Wed Oct 23 10:19:11 CDT 2013
device-mapper-event-1.02.79-7.el6         BUILT: Wed Oct 23 10:19:11 CDT 2013
device-mapper-event-libs-1.02.79-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013
device-mapper-persistent-data-0.2.8-2.el6 BUILT: Mon Oct 21 09:14:25 CDT 2013
cmirror-2.02.100-7.el6                    BUILT: Wed Oct 23 10:19:11 CDT 2013
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1704.html
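The restore scenarios above recreate the PV with its old UUID and then put the VG metadata back from the automatic backup. A sketch of that recovery flow, where the helper name and the backup path under /etc/lvm/backup are assumptions; adjust to where your metadata backup or archive actually lives:

```shell
# Hypothetical recovery helper matching the scenario logs: write a new
# PV label carrying the old UUID, restore the VG metadata from backup,
# then reactivate the volume group.
recover_corrupt_pv() {
    local uuid="$1" dev="$2" vg="$3"
    pvcreate --uuid "$uuid" --restorefile "/etc/lvm/backup/$vg" "$dev"
    vgcfgrestore "$vg"   # write the backed-up metadata onto the recreated PV
    vgchange -ay "$vg"   # reactivate the volume group
}
```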