Bug 593119
Summary: | RFE: LVM RAID - Handle transient failures of RAID1 images | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Jonathan Earl Brassow <jbrassow> |
Component: | lvm2 | Assignee: | Jonathan Earl Brassow <jbrassow> |
Status: | CLOSED ERRATA | QA Contact: | Corey Marthaler <cmarthal> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 6.0 | CC: | agk, coughlan, crenshaw, djuran, dwysocha, heinzm, jbrassow, joe.thornber, jwest, mbroz, msnitzer, prockai, snagar, tao |
Target Milestone: | rc | Keywords: | FutureFeature |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | lvm2-2.02.95-1.el6 | Doc Type: | Enhancement |
Doc Text: |
LVM RAID fully supported with the exception of RAID logical volumes in HA-LVM.
The expanded RAID support in LVM is now fully supported in Red Hat Enterprise Linux 6.3. LVM now has the capability to create RAID 4/5/6 logical volumes and supports a new implementation of mirroring. The MD (software RAID) modules provide the backend support for these new features.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2012-06-20 14:51:01 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 758552 | ||
Bug Blocks: | 697866, 732458, 756082 |
Description
Jonathan Earl Brassow
2010-05-17 20:59:32 UTC
*** Bug 319221 has been marked as a duplicate of this bug. ***

Deferring to RHEL 6.2. This feature comes for free with the inclusion of RAID in LVM. This bug will define how it works, how to test it, and what the release requirements are.

QE reviewed this BZ for QA_ACK but was unable to ack due to a lack of requirements or a description of how the new feature is supposed to work or be tested. Please add all the device failure cases to be tested/supported in 6.3. Please see https://wiki.test.redhat.com/ClusterStorage/WhyNoAck

This feature is now committed upstream in LVM version 2.02.89. In order to handle transient failures, the user must set raid_fault_policy in lvm.conf to "warn". This prevents the automated response from immediately replacing a device that suffers a failure; instead, the user is warned of the failure. Once informed, the user can take steps to restore the failing device, then simply deactivate and re-activate the logical volume at the next appropriate time. This action restores the device and updates any portions of it that may be out-of-sync.

[ I've also been considering adding some code to 'lvconvert --repair' to check the RAID LV's status and "recycle" it if there is a device which is listed as failed but has come back. This way, the user would not need to go through the cumbersome 'unmount, deactivate, activate, mount' process. However, I've not addressed this idea in this bug. ]
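For reference, the procedure above condensed into a minimal sketch. It assumes a RAID1 LV named vg/lv and that the transiently failed device has already been restored; the mount point is hypothetical, and names should be adjusted before use.

# /etc/lvm/lvm.conf, activation section: only warn on a RAID image failure
# rather than automatically replacing the failed image.
#     raid_fault_policy = "warn"

# Later, at a convenient time, recycle the LV. Reactivation picks the
# restored device back up and resyncs any out-of-date regions.
umount /mnt/data               # hypothetical mount point, if the LV is mounted
lvchange -an vg/lv             # deactivate
lvchange -ay vg/lv             # re-activate; resync of stale regions begins
mount /dev/vg/lv /mnt/data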
Adding QA ack for 6.3.

Showing the feature in action...

# 3-way RAID1
[root@bp-01 ~]# devices vg
  LV            Copy%  Devices
  lv            100.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
  [lv_rimage_0]        /dev/sde1(1)
  [lv_rimage_1]        /dev/sdf1(1)
  [lv_rimage_2]        /dev/sdg1(1)
  [lv_rmeta_0]         /dev/sde1(0)
  [lv_rmeta_1]         /dev/sdf1(0)
  [lv_rmeta_2]         /dev/sdg1(0)

# Kill a device
[root@bp-01 ~]# off.sh sdf
Turning off sdf

# Writing to the LV reveals device failure (note: no problem with I/O)
[root@bp-01 ~]# dd if=/dev/zero of=/dev/vg/lv bs=4M count=1
1+0 records in
1+0 records out
4194304 bytes (4.2 MB) copied, 0.14839 s, 28.3 MB/s

# LVM messages found in system log after array failure
# (Lack of instructions to restore device may warrant some discussion...)
Jan 17 15:36:41 bp-01 lvm[8599]: Device #1 of raid1 array, vg-lv, has failed.
Jan 17 15:36:41 bp-01 lvm[8599]: /dev/sdf1: read failed after 0 of 2048 at 250994294784: Input/output error
Jan 17 15:36:41 bp-01 lvm[8599]: /dev/sdf1: read failed after 0 of 2048 at 250994376704: Input/output error
Jan 17 15:36:41 bp-01 lvm[8599]: /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
Jan 17 15:36:41 bp-01 lvm[8599]: /dev/sdf1: read failed after 0 of 2048 at 4096: Input/output error
Jan 17 15:36:42 bp-01 lvm[8599]: Couldn't find device with uuid VxseDx-HGqr-1Fan-TmvI-DS4S-5Xn9-OKpObo.
Jan 17 15:36:43 bp-01 lvm[8599]: Issue 'lvconvert --repair vg/lv' to replace failed device

# 'lvs' output
[root@bp-01 ~]# devices vg
  /dev/sdf1: read failed after 0 of 2048 at 250994294784: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 250994376704: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 4096: Input/output error
  Couldn't find device with uuid VxseDx-HGqr-1Fan-TmvI-DS4S-5Xn9-OKpObo.
  LV            Copy%  Devices
  lv            100.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
  [lv_rimage_0]        /dev/sde1(1)
  [lv_rimage_1]        unknown device(1)
  [lv_rimage_2]        /dev/sdg1(1)
  [lv_rmeta_0]         /dev/sde1(0)
  [lv_rmeta_1]         unknown device(0)
  [lv_rmeta_2]         /dev/sdg1(0)

# Turn device back on
[root@bp-01 ~]# on.sh sdf
Turning on sdf

# 'lvs' shows device has recovered.
# (It might be worth a bug to have the Attr characters still report that
#  one of the devices is considered "failed" still - at least until the LV
#  is recycled.)
[root@bp-01 ~]# devices vg
  LV            Copy%  Devices
  lv            100.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
  [lv_rimage_0]        /dev/sde1(1)
  [lv_rimage_1]        /dev/sdf1(1)
  [lv_rimage_2]        /dev/sdg1(1)
  [lv_rmeta_0]         /dev/sde1(0)
  [lv_rmeta_1]         /dev/sdf1(0)
  [lv_rmeta_2]         /dev/sdg1(0)

# 'dmsetup status' does show the device as "failed" still
[root@bp-01 ~]# dmsetup status vg-lv
0 204800 raid raid1 3 ADA 204800/204800

# Recycle the LV
[root@bp-01 ~]# lvchange -an vg/lv; lvchange -ay vg/lv

# 'dmsetup status' shows the device is "alive but recovering" immediately after
# (It would be nice if 'lvs' also showed this.)
[root@bp-01 ~]# dmsetup status vg-lv
0 204800 raid raid1 3 AaA 198016/204800

# Once the drive is in-sync again, 'dmsetup status' shows it as 'A' again.
[root@bp-01 ~]# dmsetup status vg-lv
0 204800 raid raid1 3 AAA 204800/204800

[root@bp-01 ~]# devices vg
  LV            Copy%  Devices
  lv            100.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
  [lv_rimage_0]        /dev/sde1(1)
  [lv_rimage_1]        /dev/sdf1(1)
  [lv_rimage_2]        /dev/sdg1(1)
  [lv_rmeta_0]         /dev/sde1(0)
  [lv_rmeta_1]         /dev/sdf1(0)
  [lv_rmeta_2]         /dev/sdg1(0)
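As a rough illustration of the health characters shown above, here is a small script sketch that inspects the dm-raid status line. The field positions are an assumption taken from the output format in this walkthrough (health string in field 6, sync ratio in field 7), and the LV name vg/lv is just the example used here.

#!/bin/sh
# Inspect the dm-raid health string for vg/lv as printed by 'dmsetup status':
#   <start> <len> raid raid1 <#devs> <health_chars> <in_sync/total> ...
# In the output above: 'A' = image alive and in sync, 'a' = alive but
# resyncing, 'D' = failed.
set -- $(dmsetup status vg-lv)
health=$6
sync=$7
case $health in
  *D*) echo "vg/lv: failed image ($health) - restore the device, then recycle or run 'lvconvert --repair vg/lv'" ;;
  *a*) echo "vg/lv: resyncing ($health, $sync in sync)" ;;
  *)   echo "vg/lv: all images in sync ($health)" ;;
esac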
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
The expanded RAID support in LVM moves from Tech. Preview to full suport in 6.4.

Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,3 @@
-The expanded RAID support in LVM moves from Tech. Preview to full suport in 6.4.
+The expanded RAID support in LVM moves from Tech. Preview to full support in 6.3.
+
+LVM now has the capability to create RAID 4/5/6 logical volumes and supports a new implementation of mirroring. The MD (software RAID) modules provide the backend support for these new features.

Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,3 +1,3 @@
-The expanded RAID support in LVM moves from Tech. Preview to full support in 6.3.
+LVM RAID fully supported
 
-LVM now has the capability to create RAID 4/5/6 logical volumes and supports a new implementation of mirroring. The MD (software RAID) modules provide the backend support for these new features.
+The expanded RAID support in LVM is now fully supported in Red Hat Enterprise Linux 6.3. LVM now has the capability to create RAID 4/5/6 logical volumes and supports a new implementation of mirroring. The MD (software RAID) modules provide the backend support for these new features.

Feature verified with the latest rpms.

2.6.32-269.el6.x86_64
lvm2-2.02.95-7.el6                     BUILT: Wed May 2 05:14:03 CDT 2012
lvm2-libs-2.02.95-7.el6                BUILT: Wed May 2 05:14:03 CDT 2012
lvm2-cluster-2.02.95-7.el6             BUILT: Wed May 2 05:14:03 CDT 2012
udev-147-2.41.el6                      BUILT: Thu Mar 1 13:01:08 CST 2012
device-mapper-1.02.74-7.el6            BUILT: Wed May 2 05:14:03 CDT 2012
device-mapper-libs-1.02.74-7.el6       BUILT: Wed May 2 05:14:03 CDT 2012
device-mapper-event-1.02.74-7.el6      BUILT: Wed May 2 05:14:03 CDT 2012
device-mapper-event-libs-1.02.74-7.el6 BUILT: Wed May 2 05:14:03 CDT 2012
cmirror-2.02.95-7.el6                  BUILT: Wed May 2 05:14:03 CDT 2012

Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,3 +1,3 @@
-LVM RAID fully supported
+LVM RAID fully supported with the exception of RAID logical volumes in HA-LVM.
 
 The expanded RAID support in LVM is now fully supported in Red Hat Enterprise Linux 6.3. LVM now has the capability to create RAID 4/5/6 logical volumes and supports a new implementation of mirroring. The MD (software RAID) modules provide the backend support for these new features.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0962.html