Description of problem:
If a device that holds one leg of a mirror fails in such a way that it returns I/O errors, the raid does not get repaired (the policy is set to "allocate"). /var/log/messages claims that the raid1 has been repaired, but this is not true.

Version-Release number of selected component (if applicable):

How reproducible:
Every time

Steps to Reproduce:
Create a raid and make a device fail with I/O errors (it should still be present in the system but return errors). An easy way is to unmap the iSCSI mapping on the storage server.

Actual results:
The failure is detected and these messages appear in the log:

Apr 18 16:04:55 bucek-03 lvm[3453]: Device #0 of raid1 array, vg-raid, has failed.
Apr 18 16:04:55 bucek-03 lvm[3453]: /dev/sdb1: read failed after 0 of 1024 at 99994566656: Input/output error
Apr 18 16:04:55 bucek-03 lvm[3453]: /dev/sdb1: read failed after 0 of 1024 at 99994685440: Input/output error
Apr 18 16:04:55 bucek-03 lvm[3453]: /dev/sdb1: read failed after 0 of 1024 at 0: Input/output error
Apr 18 16:04:55 bucek-03 lvm[3453]: /dev/sdb1: read failed after 0 of 1024 at 4096: Input/output error
Apr 18 16:04:55 bucek-03 lvm[3453]: Faulty devices in vg/raid successfully replaced.

However, this is not true (after pvscan --cache):

[root@bucek-03 ~]# lvs -a -o+devices
  PV QL855h-zBg4-ZO1A-Vatp-XhUF-gU6O-8G5h8j not recognised. Is the device missing?
  PV QL855h-zBg4-ZO1A-Vatp-XhUF-gU6O-8G5h8j not recognised. Is the device missing?
  LV              VG            Attr       LSize   Pool Origin Data%  Move Log Cpy%Sync Convert Devices
  home            rhel_bucek-03 -wi-ao---- 224.88g                                              /dev/sda2(1024)
  root            rhel_bucek-03 -wi-ao----  50.00g                                              /dev/sda2(58592)
  swap            rhel_bucek-03 -wi-ao----   4.00g                                              /dev/sda2(0)
  raid            vg            rwi-a-r-p-   2.00g                             100.00           raid_rimage_0(0),raid_rimage_1(0)
  [raid_rimage_0] vg            iwi-aor-p-   2.00g                                              unknown device(1)
  [raid_rimage_1] vg            iwi-aor---   2.00g                                              /dev/sdc1(1)
  [raid_rmeta_0]  vg            ewi-aor-p-   4.00m                                              unknown device(0)
  [raid_rmeta_1]  vg            ewi-aor---   4.00m                                              /dev/sdc1(0)

The raid LV is marked as partial, even though another device should have been allocated; there are 5 unused PVs in that VG. Manually running lvconvert --repair does the trick, which means the automatic raid_fault_policy is not working when lvmetad is enabled and running.

Expected results:
The repair of the raid finishes successfully (and automatically), based on the policies set in lvm.conf, provided enough PVs are available in the VG.
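For reference, a minimal reproduction sketch under the same assumptions as above (a VG named vg with several spare PVs, lvmetad and dmeventd running); the exact device names and sizes are illustrative, and the failure can be injected in whatever way makes the PV return I/O errors (e.g. unmapping the iSCSI LUN on the storage server):

# lvm.conf (activation section) -- automatic replacement of failed raid legs:
#     raid_fault_policy = "allocate"

# Create a two-legged raid1 LV in vg (size and name are illustrative):
lvcreate --type raid1 -m 1 -L 2G -n raid vg

# Inject I/O errors on the PV backing one leg (here: unmap the iSCSI LUN
# behind /dev/sdb1 on the storage server), then rescan:
pvscan --cache

# Check whether dmeventd actually replaced the failed leg; with this bug the
# LV stays partial ('p' in the attr field, "unknown device" in Devices):
lvs -a -o +devices vg

# Manual workaround that does allocate a new leg from a spare PV:
lvconvert --repair vg/raid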
Forgot to write the versions:
lvm2-2.02.105-14.el7.x86_64
device-mapper-1.02.84-14.el7.x86_64
This is a duplicate of bug 1085553, but I'd rather keep this bug open and dependent on bug 1085553, since the duplication is not immediately obvious.
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
Should be fixed along with bug 1085553, and for the same reason as bug 892991.
This was tested with a scratch build and works there.

Jan 20 16:39:10 tardis-03 lvm[2085]: Monitoring RAID device vg-raid_lv for events.
Jan 20 16:39:10 tardis-03 lvm[2085]: Monitoring mirror device vg-mirror_lv for events.
Jan 20 16:39:10 tardis-03 lvm: 2 logical volume(s) in volume group "vg" now active
Jan 20 16:39:10 tardis-03 lvm[2085]: vg-mirror_lv is now in-sync.
Jan 20 16:42:15 tardis-03 lvm[2085]: Device #1 of raid1 array, vg-raid_lv, has failed.
Jan 20 16:42:15 tardis-03 lvm[2085]: /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
Jan 20 16:42:15 tardis-03 lvm[2085]: No PV label found on /dev/sdf1.
Jan 20 16:42:15 tardis-03 lvm[2085]: WARNING: Device for PV Y6H7MU-ZVl5-nztA-Xlne-eANW-BgVQ-lM4KlU not found or rejected by a filter.
Jan 20 16:42:23 tardis-03 lvm[2085]: Monitoring RAID device vg-raid_lv for events.
Jan 20 16:42:32 tardis-03 lvm[2085]: Monitoring RAID device vg-raid_lv for events.
Jan 20 16:42:32 tardis-03 lvm[2085]: Faulty devices in vg/raid_lv successfully replaced.
Jan 20 16:42:32 tardis-03 lvm[2085]: raid1 array, vg-raid_lv, is not in-sync.
Jan 20 16:42:32 tardis-03 lvm[2085]: raid1 array, vg-raid_lv, is not in-sync.
Jan 20 16:42:34 tardis-03 lvm[2085]: device-mapper: waitevent ioctl on failed: Interrupted system call
Jan 20 16:42:34 tardis-03 lvm[2085]: No longer monitoring RAID device vg-raid_lv for events.
Jan 20 16:42:36 tardis-03 lvm[2085]: device-mapper: waitevent ioctl on failed: Interrupted system call
Jan 20 16:42:36 tardis-03 lvm[2085]: No longer monitoring RAID device vg-raid_lv for events.
Jan 20 16:42:58 tardis-03 lvm[2085]: raid1 array, vg-raid_lv, is now in-sync.

Will open a separate bug for these device-mapper errors, since they appear after any raid/mirror sync has completed.

Marking this one VERIFIED with:
3.10.0-223.el7.x86_64
lvm2-2.02.114-6.el7                         BUILT: Tue Jan 20 14:49:01 CET 2015
lvm2-libs-2.02.114-6.el7                    BUILT: Tue Jan 20 14:49:01 CET 2015
lvm2-cluster-2.02.114-6.el7                 BUILT: Tue Jan 20 14:49:01 CET 2015
device-mapper-1.02.92-6.el7                 BUILT: Tue Jan 20 14:49:01 CET 2015
device-mapper-libs-1.02.92-6.el7            BUILT: Tue Jan 20 14:49:01 CET 2015
device-mapper-event-1.02.92-6.el7           BUILT: Tue Jan 20 14:49:01 CET 2015
device-mapper-event-libs-1.02.92-6.el7      BUILT: Tue Jan 20 14:49:01 CET 2015
device-mapper-persistent-data-0.4.1-2.el7   BUILT: Wed Nov 12 19:39:46 CET 2014
cmirror-2.02.114-6.el7                      BUILT: Tue Jan 20 14:49:01 CET 2015
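As a rough illustration of the check behind this verification (LV and VG names taken from the test log above; field names per lvs(8), everything else is an assumption):

# After the automatic repair, the raid LV should no longer be partial
# (no 'p' in the attr string) and should not reference "unknown device":
lvs --noheadings -a -o name,attr,copy_percent,devices vg

# Once resynchronization of the newly allocated image finishes, dmeventd
# logs "vg-raid_lv is now in-sync" and Cpy%Sync reports 100.00:
lvs -o name,copy_percent vg/raid_lv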
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0513.html