Bug 1138452

Summary: mirror device repair failure when losing all but one leg and the log device
Product: Red Hat Enterprise Linux 6 Reporter: Corey Marthaler <cmarthal>
Component: lvm2Assignee: Heinz Mauelshagen <heinzm>
lvm2 sub component: Mirroring and RAID (RHEL6) QA Contact: cluster-qe <cluster-qe>
Status: CLOSED WONTFIX Docs Contact:
Severity: high    
Priority: unspecified CC: agk, heinzm, jbrassow, msnitzer, prajnoha, prockai, zkabelac
Version: 6.6   
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1215156 (view as bug list) Environment:
Last Closed: 2017-10-02 21:52:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1215156    
Attachments:
Description Flags
log file none

Description Corey Marthaler 2014-09-04 20:09:57 UTC
Description of problem:
This may not be that big of a deal. I've only seen this issue one time so far, and even then, the repair was attempted a second time and eventually finished successfully.

./revolution_9 -e kill_random_devices

[...]
================================================================================
Iteration 4.3 started at Wed Sep  3 20:55:55 CDT 2014
================================================================================
Scenario kill_random_devices: Kill random devices (log and legs)

********* Mirror info for this scenario *********
* mirrors:            mirror_1
* leg devices:        /dev/sdd1 /dev/sdg1 /dev/sde1
* log devices:        /dev/sdh1
* failpv(s):          /dev/sdh1 /dev/sdg1 /dev/sde1
* failnode(s):        host-081.virt.lab.msp.redhat.com
* lvmetad:            0
* leg fault policy:   allocate
* log fault policy:   remove
*************************************************


Current mirror/raid device structure(s):
  LV                  Attr       LSize   Cpy%Sync Devices
   mirror_1            mwi-aom---   6.00g 100.00   mirror_1_mimage_4(0),mirror_1_mimage_5(0),mirror_1_mimage_6(0)
   [mirror_1_mimage_4] iwi-aom---   6.00g          /dev/sdd1(0)
   [mirror_1_mimage_5] iwi-aom---   6.00g          /dev/sdg1(0)
   [mirror_1_mimage_6] iwi-aom---   6.00g          /dev/sde1(0)
   [mirror_1_mlog]     lwi-aom---   4.00m          /dev/sdh1(0)


* NOTE: not enough available devices for allocation fault polices to fully work *
(well technically, since we have 2, some allocation should work)

PV=/dev/sde1
        mirror_1_mimage_6: 5.2
PV=/dev/sdg1
        mirror_1_mimage_5: 5.2
PV=/dev/sdh1
        mirror_1_mlog: 2
PV=/dev/sde1
        mirror_1_mimage_6: 5.2
PV=/dev/sdg1
        mirror_1_mimage_5: 5.2
PV=/dev/sdh1
        mirror_1_mlog: 2

Writing verification files (checkit) to mirror(s) on...
        ---- host-081.virt.lab.msp.redhat.com ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        21346
Verify XIOR Stream: /tmp/checkit_mirror_1
Working dir:        /mnt/mirror_1/checkit

<start name="host-081.virt.lab.msp.redhat.com_mirror_1" pid="26356" time="Wed Sep  3 20:56:05 2014" type="cmd" />
Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-081.virt.lab.msp.redhat.com ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_mirror_1
Working dir:        /mnt/mirror_1/checkit


Disabling device sdh on host-081.virt.lab.msp.redhat.com
Disabling device sdg on host-081.virt.lab.msp.redhat.com
Disabling device sde on host-081.virt.lab.msp.redhat.com

Getting recovery check start time from /var/log/messages: Sep  3 20:56
Attempting I/O to cause mirror down conversion(s) on host-081.virt.lab.msp.redhat.com
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.44329 s, 94.6 MB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  Couldn't find device with uuid dPfkBF-zL63-UjxN-A8C2-AeSu-3U99-JhiuXt.
  Couldn't find device with uuid jGFKkS-ffhg-cXQc-x8cf-QeAc-mPWC-MXQfKg.
  Couldn't find device with uuid iRL9eF-ZdFJ-ITqK-5071-ZBjG-jk4k-beX57o.
  LV                  Attr       LSize   Cpy%Sync Devices
   mirror_1            mwi-aom---   6.00g 5.86     mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0)
   [mirror_1_mimage_0] Iwi-aom---   6.00g          /dev/sdd1(0)
   [mirror_1_mimage_1] Iwi-aom---   6.00g          /dev/sdb1(0)
   [mirror_1_mimage_2] Iwi-aom---   6.00g          /dev/sda1(0)
   mirror_1_mlog       -wi-----p-   4.00m          unknown device(0)

Verify that each of the mirror repairs finished successfully
repair of mirrored LV mirror_1 failed on host-081.virt.lab.msp.redhat.com



LOG:
host-081 qarshd[21355]: Running cmdline: echo offline > /sys/block/sdh/device/state
host-081 lvm[3891]: Log device 253:2 has failed (D).
host-081 lvm[3891]: Device failure in revolution_9-mirror_1.
host-081 lvm[3891]: /dev/sdh1: read failed after 0 of 512 at 16104947712: Input/output error
[...]
host-081 lvm[3891]: /dev/sdg1: read failed after 0 of 512 at 4096: Input/output error
host-081 lvm[3891]: WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 272
host-081 lvm[3891]: WARNING: Failed to write an MDA of VG revolution_9.
host-081 qarshd[21359]: Running cmdline: echo offline > /sys/block/sde/device/state
host-081 lvm[3891]: /dev/sde1: read failed after 0 of 512 at 4096: Input/output error
host-081 lvm[3891]: WARNING: Failed to write an MDA of VG revolution_9.
host-081 lvm[3891]: WARNING: Failed to write an MDA of VG revolution_9.
host-081 kernel: device-mapper: raid1: log postsuspend failed
host-081 lvm[3891]: Internal error: Performing unsafe table load while 5 device(s) are known to be suspended:  (253:8) 
host-081 lvm[3891]: Internal error: Performing unsafe table load while 5 device(s) are known to be suspended:  (253:5) 
host-081 lvm[3891]: Internal error: Performing unsafe table load while 5 device(s) are known to be suspended:  (253:9) 
host-081 lvm[3891]: Internal error: Performing unsafe table load while 5 device(s) are known to be suspended:  (253:4) 
host-081 lvm[3891]: Monitoring mirror device revolution_9-mirror_1 for events.
host-081 lvm[3891]: /dev/sde1: open failed: No such device or address
host-081 lvm[3891]: WARNING: Failed to write an MDA of VG revolution_9.
host-081 lvm[3891]: /dev/sdg1: open failed: No such device or address
host-081 lvm[3891]: WARNING: Failed to write an MDA of VG revolution_9.
host-081 lvm[3891]: /dev/sdg1: open failed: No such device or address
host-081 lvm[3891]: /dev/sde1: open failed: No such device or address
[...]
host-081 lvm[3891]: WARNING: Failed to write an MDA of VG revolution_9.
host-081 lvm[3891]: WARNING: Failed to write an MDA of VG revolution_9.
[...]
host-081 lvm[3891]: /dev/sde1: open failed: No such device or address
host-081 lvm[3891]: /dev/sdg1: open failed: No such device or address
host-081 lvm[3891]: Failed to lock mirror_1
host-081 kernel: device-mapper: table: 253:5: linear: dm-linear: Device lookup failed
host-081 kernel: device-mapper: ioctl: error adding target to table
host-081 lvm[3891]: /dev/sde1: open failed: No such device or address
host-081 lvm[3891]: Attempt to close device '/dev/sde1' which is not open.
host-081 lvm[3891]: /dev/sdg1: open failed: No such device or address
host-081 lvm[3891]: Attempt to close device '/dev/sdg1' which is not open.
host-081 lvm[3891]: Repair of mirrored device revolution_9-mirror_1 failed.
host-081 lvm[3891]: Failed to remove faulty devices in revolution_9-mirror_1.
host-081 lvm[3891]: revolution_9-mirror_1 is now in-sync.
host-081 lvm[3891]: Secondary mirror device 253:4 has failed (D).
host-081 lvm[3891]: Secondary mirror device 253:5 has failed (D).
host-081 lvm[3891]: Device failure in revolution_9-mirror_1.
host-081 lvm[3891]: Couldn't find device with uuid dPfkBF-zL63-UjxN-A8C2-AeSu-3U99-JhiuXt.
host-081 lvm[3891]: Couldn't find device with uuid jGFKkS-ffhg-cXQc-x8cf-QeAc-mPWC-MXQfKg.
host-081 lvm[3891]: Couldn't find device with uuid iRL9eF-ZdFJ-ITqK-5071-ZBjG-jk4k-beX57o.
host-081 lvm[3891]: No longer monitoring mirror device revolution_9-mirror_1 for events.
host-081 lvm[3891]: Mirror status: 2 of 3 images failed.
host-081 lvm[3891]: Trying to up-convert to 3 images, 0 logs.
host-081 lvm[3891]: Monitoring mirror device revolution_9-mirror_1 for events.
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 0.0%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 6.8%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 9.9%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 15.2%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 17.4%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 19.7%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 20.6%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 22.8%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 25.8%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 29.2%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 32.2%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 34.0%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 38.1%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 42.8%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 48.9%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 52.4%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 57.7%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 63.7%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 66.5%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 72.5%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 77.5%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 80.1%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 84.2%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 87.8%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 88.8%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 93.0%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 95.8%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 96.5%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 100.0%
host-081 lvm[3891]: Repair of mirrored device revolution_9-mirror_1 finished successfully.
host-081 lvm[3891]: revolution_9-mirror_1 is now in-sync.
host-081 lvm[3891]: No longer monitoring mirror device revolution_9-mirror_1 for events.



Version-Release number of selected component (if applicable):
2.6.32-495.el6.x86_64
lvm2-2.02.111-2.el6    BUILT: Mon Sep  1 06:46:43 CDT 2014
lvm2-libs-2.02.111-2.el6    BUILT: Mon Sep  1 06:46:43 CDT 2014
lvm2-cluster-2.02.111-2.el6    BUILT: Mon Sep  1 06:46:43 CDT 2014
udev-147-2.57.el6    BUILT: Thu Jul 24 08:48:47 CDT 2014
device-mapper-1.02.90-2.el6    BUILT: Mon Sep  1 06:46:43 CDT 2014
device-mapper-libs-1.02.90-2.el6    BUILT: Mon Sep  1 06:46:43 CDT 2014
device-mapper-event-1.02.90-2.el6    BUILT: Mon Sep  1 06:46:43 CDT 2014
device-mapper-event-libs-1.02.90-2.el6    BUILT: Mon Sep  1 06:46:43 CDT 2014
device-mapper-persistent-data-0.3.2-1.el6    BUILT: Fri Apr  4 08:43:06 CDT 2014
cmirror-2.02.111-2.el6    BUILT: Mon Sep  1 06:46:43 CDT 2014

Comment 2 Corey Marthaler 2014-09-05 16:54:15 UTC
This issue does appear to be reproducable.

================================================================================
Iteration 10.6 started at Fri Sep  5 01:34:53 CDT 2014
================================================================================
ACTUAL LEG ORDER: /dev/sdg1 /dev/sdf1 /dev/sda1
Scenario kill_random_devices: Kill random devices (log and legs)

********* Mirror info for this scenario *********
* mirrors:            mirror_1
* leg devices:        /dev/sdg1 /dev/sdf1 /dev/sda1
* log devices:        /dev/sdh1
* failpv(s):          /dev/sdh1 /dev/sdg1 /dev/sda1
* failnode(s):        host-074.virt.lab.msp.redhat.com
* lvmetad:            0
* leg fault policy:   allocate
* log fault policy:   allocate
*************************************************

Current mirror/raid device structure(s):
  LV                  Attr       LSize Cpy%Sync Devices
  mirror_1            mwi-aom--- 6.00g 100.00   mirror_1_mimage_1(0),mirror_1_mimage_3(0),mirror_1_mimage_4(0)
  [mirror_1_mimage_1] iwi-aom--- 6.00g          /dev/sdg1(0)
  [mirror_1_mimage_3] iwi-aom--- 6.00g          /dev/sdf1(0)
  [mirror_1_mimage_4] iwi-aom--- 6.00g          /dev/sda1(0)
  [mirror_1_mlog]     lwi-aom--- 4.00m          /dev/sdh1(0)

* NOTE: not enough available devices for allocation fault polices to fully work *
(well technically, since we have 2, some allocation should work)

PV=/dev/sda1
        mirror_1_mimage_4: 5.2
PV=/dev/sdg1
        mirror_1_mimage_1: 5.2
PV=/dev/sdh1
        mirror_1_mlog: 1.2
PV=/dev/sda1
        mirror_1_mimage_4: 5.2
PV=/dev/sdg1
        mirror_1_mimage_1: 5.2
PV=/dev/sdh1
        mirror_1_mlog: 1.2

Writing verification files (checkit) to mirror(s) on...
        ---- host-074.virt.lab.msp.redhat.com ----
checkit starting with:

Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-074.virt.lab.msp.redhat.com ----

Disabling device sdh on host-074.virt.lab.msp.redhat.com
Disabling device sdg on host-074.virt.lab.msp.redhat.com
Disabling device sda on host-074.virt.lab.msp.redhat.com

Getting recovery check start time from /var/log/messages: Sep  5 01:35
Attempting I/O to cause mirror down conversion(s) on host-074.virt.lab.msp.redhat.com
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.0976763 s, 429 MB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  Couldn't find device with uuid P2DTMe-vQrr-ArL8-D3s8-jhzd-BLOy-vtM6Vp.
  Couldn't find device with uuid iHLb4R-mzpv-1zFj-HMI1-62tV-6ZR0-sulMkq.
  Couldn't find device with uuid eFdxXo-mV8d-e0T9-mmAT-lFzA-dgEI-ewsmbH.
  LV                  Attr       LSize Cpy%Sync Devices
  mirror_1            mwi-aom--- 6.00g 2.60     mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0)
  [mirror_1_mimage_0] Iwi-aom--- 6.00g          /dev/sdf1(0)
  [mirror_1_mimage_1] Iwi-aom--- 6.00g          /dev/sdd1(0)
  [mirror_1_mimage_2] Iwi-aom--- 6.00g          /dev/sdb1(0)
  [mirror_1_mlog]     lwi-aom--- 4.00m          /dev/sdb1(1536)

Verify that each of the mirror repairs finished successfully
repair of mirrored LV mirror_1 failed on host-074.virt.lab.msp.redhat.com

Comment 3 Corey Marthaler 2014-09-05 16:55:49 UTC
Created attachment 934870 [details]
log file

Comment 4 Corey Marthaler 2014-09-09 19:44:46 UTC
This appears to be easier to hit when running in exclusive activation mode while in a cluster.

Comment 5 Corey Marthaler 2014-09-09 19:45:50 UTC
Sep  9 14:23:18 host-114 qarshd[9101]: Running cmdline: pvs -a
Sep  9 14:23:18 host-114 kernel: md/raid1:mdX: active with 2 out of 4 mirrors
Sep  9 14:23:18 host-114 kernel: created bitmap (1 pages) for device mdX
Sep  9 14:23:18 host-114 kernel: md: mdX: recovery interrupted.
Sep  9 14:23:20 host-114 lvm[7548]: device-mapper: waitevent ioctl on  failed: Interrupted system call
Sep  9 14:25:18 host-114 lvm[7548]: Error locking on node host-114: Command timed out
Sep  9 14:25:18 host-114 xinetd[1932]: EXIT: qarsh status=0 pid=9101 duration=120(sec)
Sep  9 14:26:17 host-114 kernel: INFO: task clvmd:6558 blocked for more than 120 seconds.
Sep  9 14:26:17 host-114 kernel:      Not tainted 2.6.32-500.el6.x86_64 #1
Sep  9 14:26:17 host-114 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep  9 14:26:17 host-114 kernel: clvmd         D 0000000000000000     0  6558      1 0x00000080
Sep  9 14:26:17 host-114 kernel: ffff88003dcdfa18 0000000000000082 ffff88003dcdf9a8 ffffffff81041e98
Sep  9 14:26:17 host-114 kernel: ffff880000000000 0003ac9bb260140c ffff88003dcdf9e8 ffff88003cf70590
Sep  9 14:26:17 host-114 kernel: 00000000000a4588 ffffffffabf0b056 ffff88003d35bab8 ffff88003dcdffd8
Sep  9 14:26:17 host-114 kernel: Call Trace:
Sep  9 14:26:17 host-114 kernel: [<ffffffff81041e98>] ? pvclock_clocksource_read+0x58/0xd0
Sep  9 14:26:17 host-114 kernel: [<ffffffff810aaa21>] ? ktime_get_ts+0xb1/0xf0
Sep  9 14:26:17 host-114 kernel: [<ffffffff8152a1f3>] io_schedule+0x73/0xc0
Sep  9 14:26:17 host-114 kernel: [<ffffffff811ce71d>] __blockdev_direct_IO_newtrunc+0xb7d/0x1270
Sep  9 14:26:17 host-114 kernel: [<ffffffff812965e0>] ? vsnprintf+0x450/0x5e0
Sep  9 14:26:17 host-114 kernel: [<ffffffff811ca080>] ? blkdev_get_block+0x0/0x20
Sep  9 14:26:17 host-114 kernel: [<ffffffff8127a04d>] ? get_disk+0x7d/0xf0
Sep  9 14:26:17 host-114 kernel: [<ffffffff811cee87>] __blockdev_direct_IO+0x77/0xe0
Sep  9 14:26:17 host-114 kernel: [<ffffffff811ca080>] ? blkdev_get_block+0x0/0x20
Sep  9 14:26:17 host-114 kernel: [<ffffffff811cb107>] blkdev_direct_IO+0x57/0x60
Sep  9 14:26:17 host-114 kernel: [<ffffffff811ca080>] ? blkdev_get_block+0x0/0x20
Sep  9 14:26:17 host-114 kernel: [<ffffffff811261db>] generic_file_aio_read+0x6bb/0x700
Sep  9 14:26:17 host-114 kernel: [<ffffffff81234b51>] ? avc_has_perm+0x71/0x90
Sep  9 14:26:17 host-114 kernel: [<ffffffff81236a02>] ? selinux_inode_permission+0x72/0xb0
Sep  9 14:26:17 host-114 kernel: [<ffffffff811ca4c1>] blkdev_aio_read+0x51/0x80
Sep  9 14:26:17 host-114 kernel: [<ffffffff8118de2a>] do_sync_read+0xfa/0x140
Sep  9 14:26:17 host-114 kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Sep  9 14:26:17 host-114 kernel: [<ffffffff811ca41c>] ? block_ioctl+0x3c/0x40
Sep  9 14:26:17 host-114 kernel: [<ffffffff811a36b2>] ? vfs_ioctl+0x22/0xa0
Sep  9 14:26:17 host-114 kernel: [<ffffffff8123a55b>] ? selinux_file_permission+0xfb/0x150
Sep  9 14:26:17 host-114 kernel: [<ffffffff8122d3b6>] ? security_file_permission+0x16/0x20
Sep  9 14:26:17 host-114 kernel: [<ffffffff8118e7e5>] vfs_read+0xb5/0x1a0
Sep  9 14:26:17 host-114 kernel: [<ffffffff8118e921>] sys_read+0x51/0x90
Sep  9 14:26:17 host-114 kernel: [<ffffffff810e5aee>] ? __audit_syscall_exit+0x25e/0x290
Sep  9 14:26:17 host-114 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Comment 8 Corey Marthaler 2015-04-29 19:11:30 UTC
Quick update that this is still reproducable on mirror volumes in the latest 6.7 rpms. That said, it takes many iterations in order to hit this problem.

2.6.32-554.el6.x86_64
lvm2-2.02.118-2.el6    BUILT: Wed Apr 15 06:34:08 CDT 2015
lvm2-libs-2.02.118-2.el6    BUILT: Wed Apr 15 06:34:08 CDT 2015
lvm2-cluster-2.02.118-2.el6    BUILT: Wed Apr 15 06:34:08 CDT 2015
udev-147-2.61.el6    BUILT: Mon Mar  2 05:08:11 CST 2015
device-mapper-1.02.95-2.el6    BUILT: Wed Apr 15 06:34:08 CDT 2015
device-mapper-libs-1.02.95-2.el6    BUILT: Wed Apr 15 06:34:08 CDT 2015
device-mapper-event-1.02.95-2.el6    BUILT: Wed Apr 15 06:34:08 CDT 2015
device-mapper-event-libs-1.02.95-2.el6    BUILT: Wed Apr 15 06:34:08 CDT 2015
device-mapper-persistent-data-0.3.2-1.el6    BUILT: Fri Apr  4 08:43:06 CDT 2014
cmirror-2.02.118-2.el6    BUILT: Wed Apr 15 06:34:08 CDT 2015



./revolution_9 -e kill_random_devices

================================================================================
Iteration 50.1 started at Tue Apr 28 07:00:48 CDT 2015
================================================================================
ACTUAL LEG ORDER: /dev/sdc1 /dev/sda1 /dev/sde1
Scenario kill_random_devices: Kill random devices (log and legs)

********* Mirror info for this scenario *********
* mirrors:            mirror_1
* leg devices:        /dev/sdc1 /dev/sda1 /dev/sde1
* log devices:        /dev/sdf1
* failpv(s):          /dev/sdf1 /dev/sdc1 /dev/sda1
* failnode(s):        host-073.virt.lab.msp.redhat.com
* lvmetad:            0
* leg fault policy:   remove
* log fault policy:   allocate
*************************************************

Current mirror/raid device structure(s):
  LV                  Attr       LSize   Cpy%Sync Devices
   mirror_1            mwi-aom---   6.00g 100.00   mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0)
   [mirror_1_mimage_0] iwi-aom---   6.00g          /dev/sdc1(0)
   [mirror_1_mimage_1] iwi-aom---   6.00g          /dev/sda1(0)
   [mirror_1_mimage_2] iwi-aom---   6.00g          /dev/sde1(0)
   [mirror_1_mlog]     lwi-aom---   4.00m          /dev/sdf1(0)

* NOTE: not enough available devices for allocation fault polices to fully work *
(well technically, since we have 2, some allocation should work)

PV=/dev/sda1
        mirror_1_mimage_1: 6
PV=/dev/sdf1
        mirror_1_mlog: 1.1
PV=/dev/sdc1
        mirror_1_mimage_0: 6
PV=/dev/sda1
        mirror_1_mimage_1: 6
PV=/dev/sdf1
        mirror_1_mlog: 1.1
PV=/dev/sdc1
        mirror_1_mimage_0: 6

Writing verification files (checkit) to mirror(s) on...
        ---- host-073.virt.lab.msp.redhat.com ----

Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-073.virt.lab.msp.redhat.com ----

Disabling device sdf on host-073.virt.lab.msp.redhat.com
Disabling device sdc on host-073.virt.lab.msp.redhat.com
Disabling device sda on host-073.virt.lab.msp.redhat.com

Getting recovery check start time from /var/log/messages: Apr 28 07:01
Attempting I/O to cause mirror down conversion(s) on host-073.virt.lab.msp.redhat.com
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.0790474 s, 531 MB/s


[root@host-073 ~]# lvs -a -o +devices
  Couldn't find device with uuid y4sg5S-IFm9-pJ7q-iogm-juYP-J0FO-XVsXj3.
  Couldn't find device with uuid qT9tZd-hBwb-L8Mi-6nRb-d2eW-fQSa-c65wg6.
  Couldn't find device with uuid pGhXZr-RD0N-g21I-UZkw-m1X4-A0YU-P0ykAk.
  LV                  VG           Attr       LSize Log Cpy%Sync Devices
  mirror_1            revolution_9 mwi-aom-p- 6.00g     0.00     mirror_1_mimage_1(0),mirror_1_mimage_2(0)
  [mirror_1_mimage_1] revolution_9 Iwi-aom-p- 6.00g              unknown device(0)
  [mirror_1_mimage_2] revolution_9 Iwi-aom--- 6.00g              /dev/sde1(0)
  mirror_1_mlog       revolution_9 -wi-----p- 4.00m              unknown device(0)

Apr 28 07:01:27 host-073 qarshd[22827]: Running cmdline: echo offline > /sys/block/sdf/device/state
Apr 28 07:01:27 host-073 qarshd[22829]: Running cmdline: echo offline > /sys/block/sdc/device/state
Apr 28 07:01:27 host-073 lvm[22580]: Log device 253:2 has failed (D).
Apr 28 07:01:27 host-073 lvm[22580]: Device failure in revolution_9-mirror_1.
Apr 28 07:01:27 host-073 lvm[22580]: /dev/sdc1: read failed after 0 of 4096 at 0: Input/output error
Apr 28 07:01:27 host-073 lvm[22580]: /dev/sdf1: read failed after 0 of 4096 at 0: Input/output error
Apr 28 07:01:27 host-073 qarshd[22831]: Running cmdline: echo offline > /sys/block/sda/device/state
[...]
Apr 28 07:01:27 host-073 lvm[22580]: WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 504
Apr 28 07:01:27 host-073 lvm[22580]: WARNING: Failed to write an MDA of VG revolution_9.
Apr 28 07:01:27 host-073 lvm[22580]: WARNING: Failed to write an MDA of VG revolution_9.
Apr 28 07:01:32 host-073 qarshd[22857]: Running cmdline: dd if=/dev/zero of=/mnt/mirror_1/ddfile count=10 bs=4M
Apr 28 07:01:32 host-073 qarshd[22859]: Running cmdline: sync
[...]
Apr 28 07:01:35 host-073 kernel: device-mapper: raid1: log postsuspend failed
Apr 28 07:01:35 host-073 lvm[22580]: Internal error: Performing unsafe table load while 5 device(s) are known to be suspended:  (253:9)
Apr 28 07:01:35 host-073 lvm[22580]: Internal error: Performing unsafe table load while 5 device(s) are known to be suspended:  (253:4)
Apr 28 07:01:35 host-073 kernel: device-mapper: raid1: Unable to read primary mirror during recovery
Apr 28 07:01:35 host-073 kernel: device-mapper: raid1: Primary mirror (253:4) failed while out-of-sync: Reads may fail.
Apr 28 07:01:35 host-073 lvm[22580]: Monitoring mirror device revolution_9-mirror_1 for events.
Apr 28 07:01:35 host-073 lvm[22580]: /dev/sda1: open failed: No such device or address
Apr 28 07:01:35 host-073 lvm[22580]: WARNING: Failed to write an MDA of VG revolution_9.
Apr 28 07:01:35 host-073 kernel: device-mapper: raid1: Unable to read primary mirror during recovery
Apr 28 07:01:35 host-073 kernel: device-mapper: raid1: Unable to read primary mirror during recovery
Apr 28 07:01:35 host-073 kernel: device-mapper: raid1: Unable to read primary mirror during recovery
Apr 28 07:01:35 host-073 kernel: device-mapper: raid1: Unable to read primary mirror during recovery
Apr 28 07:01:35 host-073 lvm[22580]: WARNING: Failed to write an MDA of VG revolution_9.
Apr 28 07:01:35 host-073 lvm[22580]: WARNING: Failed to write an MDA of VG revolution_9.
Apr 28 07:01:35 host-073 kernel: device-mapper: table: 253:4: linear: dm-linear: Device lookup failed
Apr 28 07:01:35 host-073 kernel: device-mapper: ioctl: error adding target to table
Apr 28 07:01:35 host-073 lvm[22580]: Failed to lock logical volume revolution_9/mirror_1.
Apr 28 07:01:35 host-073 lvm[22580]: /dev/sda1: open failed: No such device or address
Apr 28 07:01:35 host-073 lvm[22580]: Attempt to close device '/dev/sda1' which is not open.
Apr 28 07:01:35 host-073 lvm[22580]: Repair of mirrored device revolution_9-mirror_1 failed.
Apr 28 07:01:35 host-073 lvm[22580]: Failed to remove faulty devices in revolution_9-mirror_1.
Apr 28 07:01:35 host-073 lvm[22580]: Primary mirror device 253:4 sync failed.
Apr 28 07:01:37 host-073 lvm[22580]: No longer monitoring mirror device revolution_9-mirror_1 for events.
Apr 28 07:04:40 host-073 kernel: INFO: task flush-253:6:22909 blocked for more than 120 seconds.
Apr 28 07:04:40 host-073 kernel:      Not tainted 2.6.32-554.el6.x86_64 #1
Apr 28 07:04:40 host-073 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 28 07:04:40 host-073 kernel: flush-253:6   D 0000000000000000     0 22909      2 0x00000080
Apr 28 07:04:40 host-073 kernel: ffff88002eeb3840 0000000000000046 ffff880000000000 0000830b5203cba2
Apr 28 07:04:40 host-073 kernel: ffff880000822f60 ffff88003a3069f0 000000000673ac22 ffffffffaac19ea0
Apr 28 07:04:40 host-073 kernel: 000000001e4188ab 0000000104366706 ffff88003b4cc5f8 ffff88002eeb3fd8
Apr 28 07:04:40 host-073 kernel: Call Trace:
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c8580>] ? sync_buffer+0x0/0x50
Apr 28 07:04:40 host-073 kernel: [<ffffffff81537bc3>] io_schedule+0x73/0xc0
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c85c0>] sync_buffer+0x40/0x50
Apr 28 07:04:40 host-073 kernel: [<ffffffff8153845a>] __wait_on_bit_lock+0x5a/0xc0
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c8580>] ? sync_buffer+0x0/0x50
Apr 28 07:04:40 host-073 kernel: [<ffffffff81538538>] out_of_line_wait_on_bit_lock+0x78/0x90
Apr 28 07:04:40 host-073 kernel: [<ffffffff810a1520>] ? wake_bit_function+0x0/0x50
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c88f0>] ? end_buffer_async_write+0x0/0x190
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c8766>] __lock_buffer+0x36/0x40
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c9a65>] __block_write_full_page+0x305/0x330
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c88f0>] ? end_buffer_async_write+0x0/0x190
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c9b70>] block_write_full_page_endio+0xe0/0x120
Apr 28 07:04:40 host-073 kernel: [<ffffffffa0480d10>] ? buffer_unmapped+0x0/0x20 [ext3]
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c9bc5>] block_write_full_page+0x15/0x20
Apr 28 07:04:40 host-073 kernel: [<ffffffffa04818bd>] ext3_ordered_writepage+0x1ed/0x240 [ext3]
Apr 28 07:04:40 host-073 kernel: [<ffffffff8113b087>] __writepage+0x17/0x40
Apr 28 07:04:40 host-073 kernel: [<ffffffff8113c34d>] write_cache_pages+0x1fd/0x4c0
Apr 28 07:04:40 host-073 kernel: [<ffffffff8113b070>] ? __writepage+0x0/0x40
Apr 28 07:04:40 host-073 kernel: [<ffffffff8113c634>] generic_writepages+0x24/0x30
Apr 28 07:04:40 host-073 kernel: [<ffffffff8113c675>] do_writepages+0x35/0x40
Apr 28 07:04:40 host-073 kernel: [<ffffffff811be40d>] writeback_single_inode+0xdd/0x290
Apr 28 07:04:40 host-073 kernel: [<ffffffff811be80d>] writeback_sb_inodes+0xbd/0x170
Apr 28 07:04:40 host-073 kernel: [<ffffffff811be96b>] writeback_inodes_wb+0xab/0x1b0
Apr 28 07:04:40 host-073 kernel: [<ffffffff811bed63>] wb_writeback+0x2f3/0x410
Apr 28 07:04:40 host-073 kernel: [<ffffffff8108a772>] ? del_timer_sync+0x22/0x30
Apr 28 07:04:40 host-073 kernel: [<ffffffff811bef3b>] wb_do_writeback+0xbb/0x240
Apr 28 07:04:40 host-073 kernel: [<ffffffff811bf123>] bdi_writeback_task+0x63/0x1b0
Apr 28 07:04:40 host-073 kernel: [<ffffffff810a1327>] ? bit_waitqueue+0x17/0xd0
Apr 28 07:04:40 host-073 kernel: [<ffffffff8114b380>] ? bdi_start_fn+0x0/0x100
Apr 28 07:04:40 host-073 kernel: [<ffffffff8114b406>] bdi_start_fn+0x86/0x100
Apr 28 07:04:40 host-073 kernel: [<ffffffff8114b380>] ? bdi_start_fn+0x0/0x100
Apr 28 07:04:40 host-073 kernel: [<ffffffff810a100e>] kthread+0x9e/0xc0
Apr 28 07:04:40 host-073 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Apr 28 07:04:40 host-073 kernel: [<ffffffff810a0f70>] ? kthread+0x0/0xc0
Apr 28 07:04:40 host-073 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20