Bug 593119 - RFE: LVM RAID - Handle transient failures of RAID1 images
Summary: RFE: LVM RAID - Handle transient failures of RAID1 images
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jonathan Earl Brassow
QA Contact: Corey Marthaler
URL:
Whiteboard:
Duplicates: 319221 (view as bug list)
Depends On: 758552
Blocks: 697866 732458 756082
 
Reported: 2010-05-17 20:59 UTC by Jonathan Earl Brassow
Modified: 2018-11-27 21:56 UTC
CC List: 14 users

Fixed In Version: lvm2-2.02.95-1.el6
Doc Type: Enhancement
Doc Text:
LVM RAID fully supported with the exception of RAID logical volumes in HA-LVM. The expanded RAID support in LVM is now fully supported in Red Hat Enterprise Linux 6.3. LVM now has the capability to create RAID 4/5/6 logical volumes and supports a new implementation of mirroring. The MD (software RAID) modules provide the backend support for these new features.
Clone Of:
Environment:
Last Closed: 2012-06-20 14:51:01 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID:    Red Hat Product Errata RHBA-2012:0962
Private:      0
Priority:     normal
Status:       SHIPPED_LIVE
Summary:      lvm2 bug fix and enhancement update
Last Updated: 2012-06-19 21:12:11 UTC

Description Jonathan Earl Brassow 2010-05-17 20:59:32 UTC
Implement a solution such that when an image (leg) of a mirror fails, the areas of the address space that change during its absence are tracked.  If/when the mirror device can be revived, the changes can be quickly copied, bringing the mirror image back in sync.

If possible, record in the mirror (set) device which image is unavailable.  This will help verify the correctness of the device when it is re-enabled as part of the group, and should eliminate the need for the administrator to specify the names of the devices that need to be re-added.

Comment 1 Jonathan Earl Brassow 2010-05-18 18:15:52 UTC
*** Bug 319221 has been marked as a duplicate of this bug. ***

Comment 2 Siddharth Nagar 2010-11-23 22:11:06 UTC
Deferring to RHEL 6.2.

Comment 5 Jonathan Earl Brassow 2011-08-31 16:40:34 UTC
This feature comes for free with the inclusion of RAID in LVM.  This bug will define how it works, how to test it, and what the release requirements are.

Comment 7 Corey Marthaler 2011-08-31 18:52:26 UTC
QE reviewed this BZ for QA_ACK but was unable to ack due to a lack of
requirements or description of how the new feature is supposed to work or be
tested.

Please add all the device failure cases to be tested/supported in 6.3.

Please see
https://wiki.test.redhat.com/ClusterStorage/WhyNoAck

Comment 9 Jonathan Earl Brassow 2011-12-06 20:46:42 UTC
This feature is now committed upstream in LVM version 2.02.89.

In order to handle transient failures, the user must set raid_fault_policy in lvm.conf to "warn".  This prevents the automated response from immediately replacing a device that suffers a failure; instead, the user is warned of the failure.
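For reference, a minimal lvm.conf fragment showing this policy (raid_fault_policy lives in the "activation" section; the alternative value "allocate" replaces the failed device automatically):

    activation {
        # Log a warning on RAID device failure instead of automatically
        # allocating a replacement; the administrator restores the device.
        raid_fault_policy = "warn"
    }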

Once the user is informed of the failure, they can take steps to restore the failing device.  Then, simply 'deactivate' and re-'activate' the logical volume at the next appropriate time.  This action will restore the device and update any portions of the device that may be out of sync.
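Using the vg/lv names from the demonstration in comment 14 below, that amounts to:

[root@bp-01 ~]# lvchange -an vg/lv
[root@bp-01 ~]# lvchange -ay vg/lv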

[ I've also been considering adding some code to 'lvconvert --repair' to check the RAID LV's status and "recycle" it if there is a device which is listed as failed but has come back.  This way, the user would not need to go through the cumbersome 'unmount, deactivate, activate, mount' process.  However, I've not addressed this idea in this bug. ]

Comment 13 Corey Marthaler 2011-12-21 23:54:21 UTC
Adding QA ack for 6.3.

Comment 14 Jonathan Earl Brassow 2012-01-17 21:52:17 UTC
Showing the feature in action...
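(For context: the creation of the LV isn't shown below.  A command along the lines of 'lvcreate --type raid1 -m 2 -L 100m -n lv vg' would produce an equivalent 3-way RAID1 layout; the exact invocation used here is an assumption.)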

# 3-way RAID1
[root@bp-01 ~]# devices vg
  LV            Copy%  Devices                                     
  lv            100.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
  [lv_rimage_0]        /dev/sde1(1)                                
  [lv_rimage_1]        /dev/sdf1(1)                                
  [lv_rimage_2]        /dev/sdg1(1)                                
  [lv_rmeta_0]         /dev/sde1(0)                                
  [lv_rmeta_1]         /dev/sdf1(0)                                
  [lv_rmeta_2]         /dev/sdg1(0)

                 
# Kill a device
[root@bp-01 ~]# off.sh sdf
Turning off sdf


# Writing to the LV reveals device failure (note: no problem with I/O)
[root@bp-01 ~]# dd if=/dev/zero of=/dev/vg/lv bs=4M count=1
1+0 records in
1+0 records out
4194304 bytes (4.2 MB) copied, 0.14839 s, 28.3 MB/s



# LVM messages found in system log after array failure
# (Lack of instructions to restore device may warrant some discussion...)
Jan 17 15:36:41 bp-01 lvm[8599]: Device #1 of raid1 array, vg-lv, has failed.
Jan 17 15:36:41 bp-01 lvm[8599]: /dev/sdf1: read failed after 0 of 2048 at 250994294784: Input/output error
Jan 17 15:36:41 bp-01 lvm[8599]: /dev/sdf1: read failed after 0 of 2048 at 250994376704: Input/output error
Jan 17 15:36:41 bp-01 lvm[8599]: /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
Jan 17 15:36:41 bp-01 lvm[8599]: /dev/sdf1: read failed after 0 of 2048 at 4096: Input/output error
Jan 17 15:36:42 bp-01 lvm[8599]: Couldn't find device with uuid VxseDx-HGqr-1Fan-TmvI-DS4S-5Xn9-OKpObo.
Jan 17 15:36:43 bp-01 lvm[8599]: Issue 'lvconvert --repair vg/lv' to replace failed device


# 'lvs' output
[root@bp-01 ~]# devices vg
  /dev/sdf1: read failed after 0 of 2048 at 250994294784: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 250994376704: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 4096: Input/output error
  Couldn't find device with uuid VxseDx-HGqr-1Fan-TmvI-DS4S-5Xn9-OKpObo.
  LV            Copy%  Devices                                     
  lv            100.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
  [lv_rimage_0]        /dev/sde1(1)                                
  [lv_rimage_1]        unknown device(1)                           
  [lv_rimage_2]        /dev/sdg1(1)                                
  [lv_rmeta_0]         /dev/sde1(0)                                
  [lv_rmeta_1]         unknown device(0)                           
  [lv_rmeta_2]         /dev/sdg1(0)                                


# Turn device back on
[root@bp-01 ~]# on.sh sdf
Turning on sdf


# 'lvs' shows device has recovered.
# (It might be worth a bug to have the Attr characters continue to report
# that one of the devices is considered "failed" - at least until the LV
# is recycled.)
[root@bp-01 ~]# devices vg
  LV            Copy%  Devices                                     
  lv            100.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
  [lv_rimage_0]        /dev/sde1(1)                                
  [lv_rimage_1]        /dev/sdf1(1)                                
  [lv_rimage_2]        /dev/sdg1(1)                                
  [lv_rmeta_0]         /dev/sde1(0)                                
  [lv_rmeta_1]         /dev/sdf1(0)                                
  [lv_rmeta_2]         /dev/sdg1(0)


# 'dmsetup status' does show the device as "failed" still
[root@bp-01 ~]# dmsetup status vg-lv
0 204800 raid raid1 3 ADA 204800/204800
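# (Field layout, per the dm-raid target: start, length, target type,
#  RAID level, device count, per-device health characters, and
#  in-sync/total sectors.  'A' = alive and in-sync, 'a' = alive but
#  not yet in-sync, 'D' = dead/failed.)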



# Recycle the LV
[root@bp-01 ~]# lvchange -an vg/lv; lvchange -ay vg/lv



# 'dmsetup status' shows the device is "alive but recovering" immediately after
# (It would be nice if 'lvs' also showed this.)
[root@bp-01 ~]# dmsetup status vg-lv
0 204800 raid raid1 3 AaA 198016/204800


# Once the drive is in-sync again, 'dmsetup status' shows it as 'A' again.
[root@bp-01 ~]# dmsetup status vg-lv
0 204800 raid raid1 3 AAA 204800/204800
[root@bp-01 ~]# devices vg
  LV            Copy%  Devices                                     
  lv            100.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
  [lv_rimage_0]        /dev/sde1(1)                                
  [lv_rimage_1]        /dev/sdf1(1)                                
  [lv_rimage_2]        /dev/sdg1(1)                                
  [lv_rmeta_0]         /dev/sde1(0)                                
  [lv_rmeta_1]         /dev/sdf1(0)                                
  [lv_rmeta_2]         /dev/sdg1(0)

Comment 17 Tom Coughlan 2012-03-28 21:49:33 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
The expanded RAID support in LVM moves from Tech. Preview to full suport in 6.4.

Comment 18 Tom Coughlan 2012-03-28 21:54:33 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1,3 @@
-The expanded RAID support in LVM moves from Tech. Preview to full suport in 6.4.+The expanded RAID support in LVM moves from Tech. Preview to full support in 6.3.
+
+LVM now has the capability to create RAID 4/5/6 logical volumes and supports a new implementation of mirroring. The MD (software RAID) modules provide the backend support for these new features.

Comment 19 Martin Prpič 2012-04-06 12:00:32 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,3 +1,3 @@
-The expanded RAID support in LVM moves from Tech. Preview to full support in 6.3.
+LVM RAID fully supported
 
-LVM now has the capability to create RAID 4/5/6 logical volumes and supports a new implementation of mirroring. The MD (software RAID) modules provide the backend support for these new features.+The expanded RAID support in LVM is now fully supported in Red Hat Enterprise Linux 6.3. LVM now has the capability to create RAID 4/5/6 logical volumes and supports a new implementation of mirroring. The MD (software RAID) modules provide the backend support for these new features.

Comment 20 Corey Marthaler 2012-05-03 22:46:40 UTC
Feature verified with the latest rpms.

2.6.32-269.el6.x86_64
lvm2-2.02.95-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
lvm2-libs-2.02.95-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
lvm2-cluster-2.02.95-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
device-mapper-libs-1.02.74-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
device-mapper-event-1.02.74-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
device-mapper-event-libs-1.02.74-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012
cmirror-2.02.95-7.el6    BUILT: Wed May  2 05:14:03 CDT 2012

Comment 21 Jonathan Earl Brassow 2012-05-22 20:32:24 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,3 +1,3 @@
-LVM RAID fully supported
+LVM RAID fully supported with the exception of RAID logical volumes in HA-LVM.
 
 The expanded RAID support in LVM is now fully supported in Red Hat Enterprise Linux 6.3. LVM now has the capability to create RAID 4/5/6 logical volumes and supports a new implementation of mirroring. The MD (software RAID) modules provide the backend support for these new features.

Comment 23 errata-xmlrpc 2012-06-20 14:51:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0962.html

