Bug 905063

Summary: Multiple issues with lvm2 and 2-disk mirrored configurations
Product: Red Hat Enterprise Linux 6
Reporter: loberman <loberman>
Component: lvm2
Assignee: Jonathan Earl Brassow <jbrassow>
lvm2 sub component: Activating existing Logical Volumes (RHEL6)
QA Contact: Cluster QE <mspqa-list>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: agk, cww, dwysocha, heinzm, jbrassow, juzou, mgoodwin, msnitzer, nperic, prajnoha, prockai, thornber, zkabelac
Version: 6.6
Keywords: Reopened
Target Milestone: rc
Target Release: 6.6
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: lvm2-2.02.107-2.el6
Doc Type: Bug Fix
Doc Text:
Previously, there were two modes of activation, an unnamed nominal mode (now referred to as "complete") and "partial" mode. The "complete" mode required that a volume group be 'complete' - that is, no missing PVs. If there were any missing PVs, no affected LVs were allowed to activate - even RAID LVs which might be able to tolerate a failure. The "partial" mode allowed anything to be activated (or at least attempted). If a non-redundant LV was missing a portion of its addressable space due to a device failure, it was replaced with an error target. RAID LVs would either activate or fail to activate, depending on how badly their redundancy was compromised. This update adds a third option, "degraded" mode. This mode can be selected via the '--activationmode {complete|degraded|partial}' option to lvchange/vgchange. It can also be set in lvm.conf. This new "degraded" mode is now the default activation mode for LVM. The "degraded" activation mode allows RAID LVs with a sufficient level of redundancy to activate (e.g. a RAID5 LV with one device failure, a RAID6 with two device failures, or RAID1 with n-1 failures). RAID LVs with too many device failures are not allowed to activate - nor are any non-redundant LVs that may have been affected. The degraded activation mode does not yet work in a cluster. When the locking_type is 3 (i.e. LVM cluster mode), the degraded mode flag simply gets dropped and the old ("complete") behavior is exhibited.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-14 08:24:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1056252, 1075263    
Attachments:
    Comment (flags: none)

Description loberman 2013-01-28 13:53:30 UTC
Created attachment 915664 [details]
Comment

(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).

Comment 6 RHEL Program Management 2013-02-01 06:47:56 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 16 Jonathan Earl Brassow 2013-07-29 16:36:55 UTC
Mirror support is not planned for Dracut (bug 833078); therefore, "mirror" cannot be used for the root volume.  You must use the "raid1" segment type instead.

To convert an old "mirror" segment type logical volume to a "raid1" type, do:
~> lvconvert --type raid1 vg/lv

The first comment of bug 907487 shows how to deal with the loss of a device when using LVM RAID devices.  Bug 907487 deals with allowing 'vgreduce --removemissing' to operate on RAID logical volumes.  While this ability will come in RHEL6.5, the first comment of that bug describes what can be done until then.

Comment 29 Nenad Peric 2014-04-10 14:35:15 UTC
Acking based on requirements stated in Comment 28.

Comment 31 Jonathan Earl Brassow 2014-05-30 18:46:31 UTC
For the "mirror" segment type LVM provides the following options for handling failures (these are set in lvm.conf):
    mirror_log_fault_policy = "remove"/"allocate"  # "allocate" is default
    mirror_image_fault_policy = "remove"/"allocate" # "remove" is default

For all "raid*" segment types (including raid1), the following options are provided:
    raid_fault_policy = "warn"/"allocate" # "warn" is the default
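For reference, a minimal sketch of how these might be set in the activation section of /etc/lvm/lvm.conf (the values shown are just the defaults listed above):

    activation {
        mirror_log_fault_policy = "allocate"    # reallocate the mirror log after a failure
        mirror_image_fault_policy = "remove"    # drop the failed mirror image from the LV
        raid_fault_policy = "warn"              # only warn; leave the RAID LV running degraded
    }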

The reason that 'mirror' has a "remove" policy is that a mirror will not function while there is a failed device in the array - it simply blocks all I/O until the failed device is removed.  RAID arrays do not have the same issue; they can operate quite happily with a failed device in the array.  Not forcing a failed device to be kicked out is part of the reason that RAID LVs are capable of handling transient failures.

It appears that the customer is asking for a "remove" policy for RAID - this could work for RAID1, but not for RAID4/5/6/10, since there is currently no way to reshape RAID arrays.  This would effectively remove the failed RAID image for you and leave you with a linear LV, but the VG would still need manual intervention (due to the missing PV).  I view this as a sub-optimal solution.  Remember, there is no need to do anything with the RAID LV after the failure - it will work fine without intervention.  Leaving the failed device in the array allows the user to intelligently decide what should be done later, as opposed to letting the machine pull the device out on the first error it receives.  If the idea is to prevent the error messages that accompany LVM commands when PVs are missing or failed, then the extra vgreduce command is required anyway - just as it is for the "mirror" segment type on non-root volumes.

The "mirror" segment type will never be supported for root volumes.  I would consider adding a "remove" method for RAID though.  Please advise.

Comment 32 Jonathan Earl Brassow 2014-06-05 22:19:30 UTC
Please advise on what is desired after reading comment 31.  It appears the customer wants the following (see comment 28):
"device should be failed out of the volume group cleanly and system should not hang, even with var in an LVM mirror configuration."

There are effectively 2 things being asked for:
1) device should be failed out of the VG cleanly
2) system should not hang.

Using RAID1, #2 should never happen.  There should never be an immediate need to intervene when using RAID.

#1 has two components to it.  The failed device should be removed from the logical volume AND from the volume group.  When using the "mirror" segment type for non-root volumes, a failure necessitates the device's removal from the LV, but the failed device remains in the VG until a 'vgreduce --removemissing' is performed.  RAID does not require a removal of a failed device to keep working.  To remove the failed device from the VG, only a 'vgreduce --removemissing' is required.  LVM always requires an admin to remove a failed device from the VG; but again, it does not have to be done immediately.
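As a minimal sketch of that cleanup step (assuming a volume group named 'vg' with a missing PV, and that no LV still depends on the missing device):

~> vgreduce --removemissing vg    # drop the missing/failed PV from the VG metadata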

Comment 36 Jonathan Earl Brassow 2014-06-20 02:20:09 UTC
I've tested this with two system LVs as RAID1 and it worked, but it has been reported that things fall down with > 2.  I hope to try this shortly.

Comment 37 Jonathan Earl Brassow 2014-06-20 02:59:34 UTC
I see the same failure (drops into admin mode) when I have 3 RAID LVs (/, swap, and /home).  It previously worked with / and 'swap'.

Perhaps I will retry with '/' and '/home'.

Comment 38 Jonathan Earl Brassow 2014-07-03 23:55:22 UTC
dracut is doing its job just fine getting the root file system going; it is the other mount points, controlled by the initscripts, that are failing to activate.  /etc/rc.sysinit is the trouble in this case: it only calls 'vgchange' without the '--partial' flag and makes no attempt to activate RAID volumes that have suffered a manageable failure.

We can handle the whole problem in LVM, without getting the initscripts involved, by adding a new activation mode, "degraded", alongside the two we have today ("normal" and "partial").  Degraded mode behaves like normal mode, but it attempts to activate RAID LVs even if they have suffered a device failure.  Other types of LVs that have suffered a failure would still require partial mode to activate, because part (or all) of their addressable space would be missing.

This new degraded mode would become the default, but a user could revert to the old behavior by switching a config file option.  I have tested a proof-of-concept patch, and it works.
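To illustrate the intended usage (mode and option names as described above; 'vg' is just a placeholder volume group name), activating in the new mode would look something like:

~> vgchange -ay --activationmode degraded vg    # activate LVs, allowing RAID LVs that are merely degraded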

Comment 39 Jonathan Earl Brassow 2014-07-10 03:59:44 UTC
Fix committed upstream.  (Note, this solution is not yet cluster capable and will not work with locking_type=3.)

commit be75076dfc842945a03fa42073e9e03f51bd3a3c
Author: Jonathan Brassow <jbrassow>
Date:   Wed Jul 9 22:56:11 2014 -0500

    activation: Add "degraded" activation mode
    
    Currently, we have two modes of activation, an unnamed nominal mode
    (which I will refer to as "complete") and "partial" mode.  The
    "complete" mode requires that a volume group be 'complete' - that
    is, no missing PVs.  If there are any missing PVs, no affected LVs
    are allowed to activate - even RAID LVs which might be able to
    tolerate a failure.  The "partial" mode allows anything to be
    activated (or at least attempted).  If a non-redundant LV is
    missing a portion of its addressable space due to a device failure,
    it will be replaced with an error target.  RAID LVs will either
    activate or fail to activate depending on how badly their
    redundancy is compromised.
    
    This patch adds a third option, "degraded" mode.  This mode can
    be selected via the '--activationmode {complete|degraded|partial}'
    option to lvchange/vgchange.  It can also be set in lvm.conf.
    The "degraded" activation mode allows RAID LVs with a sufficient
    level of redundancy to activate (e.g. a RAID5 LV with one device
    failure, a RAID6 with two device failures, or RAID1 with n-1
    failures).  RAID LVs with too many device failures are not allowed
    to activate - nor are any non-redundant LVs that may have been
    affected.  This patch also makes the "degraded" mode the default
    activation mode.
    
    The degraded activation mode does not yet work in a cluster.  A
    new cluster lock flag (LCK_DEGRADED_MODE) will need to be created
    to make that work.  Currently, there is limited space for this
    extra flag and I am looking for possible solutions.  One possible
    solution is to usurp LCK_CONVERT, as it is not used.  When the
    locking_type is 3, the degraded mode flag simply gets dropped and
    the old ("complete") behavior is exhibited.

Comment 41 Nenad Peric 2014-08-18 12:11:34 UTC
Tested with:

lvm2-2.02.109-1.el6


The system booted even though one leg of each of the RAID1 LVs holding /var, / and /home was missing.  It booted up in degraded mode, and the RAID1 LVs were all marked as partial, as one would expect.
Degraded activation was set by default in lvm.conf.
No additional settings were needed after a clean install of RHEL 6.6, but I suspect that after an upgrade, just setting activation_mode = "degraded" should suffice.
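For reference, a minimal sketch of that setting as it would appear in lvm.conf (placing it in the "activation" section is an assumption here):

    activation {
        activation_mode = "degraded"    # the other modes are "complete" and "partial"
    }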

Marking this one as VERIFIED with RAID LVs.

Comment 42 errata-xmlrpc 2014-10-14 08:24:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1387.html