| Summary: | Missing PVs lead to corrupted metadata, and "vgreduce --removemissing --force" is unable to correct the metadata | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Dave Wysochanski <dwysocha> | |
| Component: | lvm2 | Assignee: | Milan Broz <mbroz> | |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 4.9 | CC: | agk, bmr, cmarthal, cww, dwysocha, heinzm, jbrassow, mbroz, mjuricek, mkhusid, prajnoha, prockai, pvrabec, ssaha, thornber, zkabelac | |
| Target Milestone: | rc | |||
| Target Release: | --- | |||
| Hardware: | All | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | lvm2-2.02.42-11.el4 | Doc Type: | Bug Fix | |
| Doc Text: |
Previously, if several physical volumes were missing from a volume group, the metadata that was written could contain
the wrong name for the missing physical volumes, and this was later detected as incorrect metadata
for the whole volume group. Once this condition occurred, the volume group could not be repaired or removed, even with commands such as "vgreduce --removemissing --force" or "vgremove --force". For recovery procedures, refer to https://access.redhat.com/kb/docs/DOC-55800.
This fix enforces the use of the physical volume UUID to reference physical volumes, which resolves the problem.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 727578 (view as bug list) | Environment: | ||
| Last Closed: | 2011-08-18 13:04:36 UTC | Type: | --- | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Bug Depends On: | ||||
| Bug Blocks: | 727578 | |||
| Attachments: | ||||
Created attachment 497031 [details]
output of verbose commands

Archive of the verbose command output, created as follows:

# vgreduce -vvvv --removemissing --force local_3par-dg &> vgreduce_removemissing_force.txt
# vgremove -vvvv local_3par-dg &> vgremove.txt
# pvs -vvvv &> pvs.txt
# tar -cvjf output.tar.bz2 *.txt

Created attachment 497032 [details]
metadata for dm devices, not picked up by lvmdump

lvmdump did not pick up the metadata on these dm devices (they are dm-multipath devices), so we asked the customer to do the following:

# dd if=/dev/dm-0 of=/tmp/dm-0.out bs=1M count=1
# dd if=/dev/dm-1 of=/tmp/dm-1.out bs=1M count=1
# dd if=/dev/dm-2 of=/tmp/dm-2.out bs=1M count=1
# dd if=/dev/dm-3 of=/tmp/dm-3.out bs=1M count=1
# dd if=/dev/dm-4 of=/tmp/dm-4.out bs=1M count=1
# dd if=/dev/dm-5 of=/tmp/dm-5.out bs=1M count=1
# dd if=/dev/dm-6 of=/tmp/dm-6.out bs=1M count=1
# tar -cvjf /tmp/metadata.tar.bz2 /tmp/*.out

Created attachment 497036 [details]
Script to attempt to repro the customer's failure
This script gets somewhat close to the customer's failure. I created it based on the history of the VG on the customer's system (from the lvmdump archive, by grepping out the command history; a sketch of that step follows below). It produces errors similar to what the customer was seeing, but does not reproduce the vgreduce failure.
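As a side note, a minimal sketch of how such a command history can be recovered, assuming the lvmdump archive contains a copy of /etc/lvm/archive; the path inside the unpacked archive is an assumption, not taken from this report. Each archive file records, in its description field, the command that triggered the metadata backup:

# grep -h 'executing' lvm/archive/local_3par-dg_*.vg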
Looks very similar to Bug 643538.

Created attachment 497045 [details]
This script reproduces the customer failure.
This script reproduces the customer failure. After the PVs go missing, one more command has to run to create the corrupt metadata - multiple entries of "pvNN" with the "MISSING" flag.
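For reference, a hedged illustration of how the symptom can be spotted, reusing the dd approach from the attachments above (device paths are examples only): dump the start of a surviving PV and look for repeated "unknown device" references and MISSING flags in the text metadata area.

# dd if=/dev/dm-0 of=/tmp/dm-0.out bs=1M count=1
# strings /tmp/dm-0.out | grep -E 'unknown device|MISSING'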
That problem was fixed long ago in newer packages but unfortunately not in
RHEL 4.
From the a7cac2463c15c915636e511887f022b8cb63a97e commit log:
Use PV UUID in hash for device name when exporting metadata.
Currently the code uses pv_dev_name() for the hash when getting the internal
"pvX" name.
This produces corrupted metadata if PVs are missing: pv->dev
is NULL, so all of the missing devices return the same name
("unknown device" is used as the hash key for every missing device).
I see quite a serious problem here - when a simple VG with several PVs
experiences the failure of several of those PVs, the code apparently generates wrong metadata, and
that metadata is not parsable, so it can lead to loss of the whole VG.
I think this bug should be fixed in a post-RHEL 4.9 update; dev_ack.
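As context for the UUID-based fix, a minimal illustration (not from this report) of why the UUID is the safer reference: pvs can report it alongside the device name, and the UUID stays unique even when a missing PV can no longer be resolved to a device name.

# pvs -o pv_name,pv_uuid,vg_name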
Created attachment 497362 [details]
Current KCS / Kbase article that describes the failure and recovery procedure
Since there are no plans (and it may not be possible) to have the LVM tools fix up metadata that is mangled in this way, I've created an article describing the possible recovery procedures.
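For orientation only, a hedged sketch of the general vgcfgrestore-based recovery approach; the authoritative steps are in the linked article, and the UUID, archive file name, VG name, and device below are placeholders, not values from this case. The idea is to recreate the missing PV with its original UUID from a pre-corruption metadata archive, restore that archive, and reactivate the VG:

# pvcreate --uuid <missing-pv-uuid> --restorefile /etc/lvm/archive/<vgname>_<seq>.vg /dev/sdX
# vgcfgrestore -f /etc/lvm/archive/<vgname>_<seq>.vg <vgname>
# vgchange -ay <vgname>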
Milan - it looks like there is no 4.10 planned. Do you want to push this for release through some other mechanism (e.g. an async errata)?

Fixed in lvm2-2.02.42-11.el4.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1185.html |
Created attachment 497030 [details]
lvmdump of customer system

Description of problem:
In some customer setups, when a PV that is part of an LV goes missing, "vgreduce --removemissing --force" does not produce a consistent VG that can then be removed. As a result, the customer is unable to remove the VG.

Version-Release number of selected component (if applicable):
lvm2-2.02.42-9.el4

How reproducible:
I had a hard time reproducing it, but I'll attach all the info from the customer's system, including an lvmdump and verbose output of the commands.

Steps to Reproduce:
1. Create a VG from multiple PVs.
2. Create at least one LV on the VG.
3. Remove at least one of the PVs backing the LV.
4. Try vgreduce, vgreduce --removemissing, and vgreduce --removemissing --force (a sketch of this sequence follows below).

Actual results:
Unable to use vgreduce to make a consistent VG that can be removed.
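A minimal sketch of the steps above using loop devices; the device, VG, and LV names are made up for illustration, and this sequence is not guaranteed to trigger the metadata corruption described in this bug. The final plain vgreduce is expected to fail since the device is gone; the point is to observe how the removemissing variants behave afterwards.

# dd if=/dev/zero of=/tmp/pv0.img bs=1M count=64
# dd if=/dev/zero of=/tmp/pv1.img bs=1M count=64
# dd if=/dev/zero of=/tmp/pv2.img bs=1M count=64
# losetup /dev/loop0 /tmp/pv0.img
# losetup /dev/loop1 /tmp/pv1.img
# losetup /dev/loop2 /tmp/pv2.img
# pvcreate /dev/loop0 /dev/loop1 /dev/loop2
# vgcreate testvg /dev/loop0 /dev/loop1 /dev/loop2
# lvcreate -i 3 -L 90M -n testlv testvg
# vgchange -an testvg
# losetup -d /dev/loop2
# vgreduce testvg /dev/loop2
# vgreduce --removemissing testvg
# vgreduce --removemissing --force testvg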