RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1138452 - mirror device repair failure when losing all but one leg and the log device
Summary: mirror device repair failure when losing all but one leg and the log device
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.6
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: rc
: ---
Assignee: Heinz Mauelshagen
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1215156
TreeView+ depends on / blocked
 
Reported: 2014-09-04 20:09 UTC by Corey Marthaler
Modified: 2017-10-02 21:52 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1215156 (view as bug list)
Environment:
Last Closed: 2017-10-02 21:52:14 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
log file (50.57 KB, text/plain)
2014-09-05 16:55 UTC, Corey Marthaler
no flags Details

Description Corey Marthaler 2014-09-04 20:09:57 UTC
Description of problem:
This may not be that big of a deal. I've only seen this issue one time so far, and even then, the repair was attempted a second time and eventually finished successfully.

./revolution_9 -e kill_random_devices

[...]
================================================================================
Iteration 4.3 started at Wed Sep  3 20:55:55 CDT 2014
================================================================================
Scenario kill_random_devices: Kill random devices (log and legs)

********* Mirror info for this scenario *********
* mirrors:            mirror_1
* leg devices:        /dev/sdd1 /dev/sdg1 /dev/sde1
* log devices:        /dev/sdh1
* failpv(s):          /dev/sdh1 /dev/sdg1 /dev/sde1
* failnode(s):        host-081.virt.lab.msp.redhat.com
* lvmetad:            0
* leg fault policy:   allocate
* log fault policy:   remove
*************************************************


Current mirror/raid device structure(s):
  LV                  Attr       LSize   Cpy%Sync Devices
   mirror_1            mwi-aom---   6.00g 100.00   mirror_1_mimage_4(0),mirror_1_mimage_5(0),mirror_1_mimage_6(0)
   [mirror_1_mimage_4] iwi-aom---   6.00g          /dev/sdd1(0)
   [mirror_1_mimage_5] iwi-aom---   6.00g          /dev/sdg1(0)
   [mirror_1_mimage_6] iwi-aom---   6.00g          /dev/sde1(0)
   [mirror_1_mlog]     lwi-aom---   4.00m          /dev/sdh1(0)


* NOTE: not enough available devices for allocation fault polices to fully work *
(well technically, since we have 2, some allocation should work)

PV=/dev/sde1
        mirror_1_mimage_6: 5.2
PV=/dev/sdg1
        mirror_1_mimage_5: 5.2
PV=/dev/sdh1
        mirror_1_mlog: 2
PV=/dev/sde1
        mirror_1_mimage_6: 5.2
PV=/dev/sdg1
        mirror_1_mimage_5: 5.2
PV=/dev/sdh1
        mirror_1_mlog: 2

Writing verification files (checkit) to mirror(s) on...
        ---- host-081.virt.lab.msp.redhat.com ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        21346
Verify XIOR Stream: /tmp/checkit_mirror_1
Working dir:        /mnt/mirror_1/checkit

<start name="host-081.virt.lab.msp.redhat.com_mirror_1" pid="26356" time="Wed Sep  3 20:56:05 2014" type="cmd" />
Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-081.virt.lab.msp.redhat.com ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_mirror_1
Working dir:        /mnt/mirror_1/checkit


Disabling device sdh on host-081.virt.lab.msp.redhat.com
Disabling device sdg on host-081.virt.lab.msp.redhat.com
Disabling device sde on host-081.virt.lab.msp.redhat.com

Getting recovery check start time from /var/log/messages: Sep  3 20:56
Attempting I/O to cause mirror down conversion(s) on host-081.virt.lab.msp.redhat.com
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.44329 s, 94.6 MB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  Couldn't find device with uuid dPfkBF-zL63-UjxN-A8C2-AeSu-3U99-JhiuXt.
  Couldn't find device with uuid jGFKkS-ffhg-cXQc-x8cf-QeAc-mPWC-MXQfKg.
  Couldn't find device with uuid iRL9eF-ZdFJ-ITqK-5071-ZBjG-jk4k-beX57o.
  LV                  Attr       LSize   Cpy%Sync Devices
   mirror_1            mwi-aom---   6.00g 5.86     mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0)
   [mirror_1_mimage_0] Iwi-aom---   6.00g          /dev/sdd1(0)
   [mirror_1_mimage_1] Iwi-aom---   6.00g          /dev/sdb1(0)
   [mirror_1_mimage_2] Iwi-aom---   6.00g          /dev/sda1(0)
   mirror_1_mlog       -wi-----p-   4.00m          unknown device(0)

Verify that each of the mirror repairs finished successfully
repair of mirrored LV mirror_1 failed on host-081.virt.lab.msp.redhat.com



LOG:
host-081 qarshd[21355]: Running cmdline: echo offline > /sys/block/sdh/device/state
host-081 lvm[3891]: Log device 253:2 has failed (D).
host-081 lvm[3891]: Device failure in revolution_9-mirror_1.
host-081 lvm[3891]: /dev/sdh1: read failed after 0 of 512 at 16104947712: Input/output error
[...]
host-081 lvm[3891]: /dev/sdg1: read failed after 0 of 512 at 4096: Input/output error
host-081 lvm[3891]: WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 272
host-081 lvm[3891]: WARNING: Failed to write an MDA of VG revolution_9.
host-081 qarshd[21359]: Running cmdline: echo offline > /sys/block/sde/device/state
host-081 lvm[3891]: /dev/sde1: read failed after 0 of 512 at 4096: Input/output error
host-081 lvm[3891]: WARNING: Failed to write an MDA of VG revolution_9.
host-081 lvm[3891]: WARNING: Failed to write an MDA of VG revolution_9.
host-081 kernel: device-mapper: raid1: log postsuspend failed
host-081 lvm[3891]: Internal error: Performing unsafe table load while 5 device(s) are known to be suspended:  (253:8) 
host-081 lvm[3891]: Internal error: Performing unsafe table load while 5 device(s) are known to be suspended:  (253:5) 
host-081 lvm[3891]: Internal error: Performing unsafe table load while 5 device(s) are known to be suspended:  (253:9) 
host-081 lvm[3891]: Internal error: Performing unsafe table load while 5 device(s) are known to be suspended:  (253:4) 
host-081 lvm[3891]: Monitoring mirror device revolution_9-mirror_1 for events.
host-081 lvm[3891]: /dev/sde1: open failed: No such device or address
host-081 lvm[3891]: WARNING: Failed to write an MDA of VG revolution_9.
host-081 lvm[3891]: /dev/sdg1: open failed: No such device or address
host-081 lvm[3891]: WARNING: Failed to write an MDA of VG revolution_9.
host-081 lvm[3891]: /dev/sdg1: open failed: No such device or address
host-081 lvm[3891]: /dev/sde1: open failed: No such device or address
[...]
host-081 lvm[3891]: WARNING: Failed to write an MDA of VG revolution_9.
host-081 lvm[3891]: WARNING: Failed to write an MDA of VG revolution_9.
[...]
host-081 lvm[3891]: /dev/sde1: open failed: No such device or address
host-081 lvm[3891]: /dev/sdg1: open failed: No such device or address
host-081 lvm[3891]: Failed to lock mirror_1
host-081 kernel: device-mapper: table: 253:5: linear: dm-linear: Device lookup failed
host-081 kernel: device-mapper: ioctl: error adding target to table
host-081 lvm[3891]: /dev/sde1: open failed: No such device or address
host-081 lvm[3891]: Attempt to close device '/dev/sde1' which is not open.
host-081 lvm[3891]: /dev/sdg1: open failed: No such device or address
host-081 lvm[3891]: Attempt to close device '/dev/sdg1' which is not open.
host-081 lvm[3891]: Repair of mirrored device revolution_9-mirror_1 failed.
host-081 lvm[3891]: Failed to remove faulty devices in revolution_9-mirror_1.
host-081 lvm[3891]: revolution_9-mirror_1 is now in-sync.
host-081 lvm[3891]: Secondary mirror device 253:4 has failed (D).
host-081 lvm[3891]: Secondary mirror device 253:5 has failed (D).
host-081 lvm[3891]: Device failure in revolution_9-mirror_1.
host-081 lvm[3891]: Couldn't find device with uuid dPfkBF-zL63-UjxN-A8C2-AeSu-3U99-JhiuXt.
host-081 lvm[3891]: Couldn't find device with uuid jGFKkS-ffhg-cXQc-x8cf-QeAc-mPWC-MXQfKg.
host-081 lvm[3891]: Couldn't find device with uuid iRL9eF-ZdFJ-ITqK-5071-ZBjG-jk4k-beX57o.
host-081 lvm[3891]: No longer monitoring mirror device revolution_9-mirror_1 for events.
host-081 lvm[3891]: Mirror status: 2 of 3 images failed.
host-081 lvm[3891]: Trying to up-convert to 3 images, 0 logs.
host-081 lvm[3891]: Monitoring mirror device revolution_9-mirror_1 for events.
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 0.0%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 6.8%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 9.9%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 15.2%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 17.4%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 19.7%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 20.6%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 22.8%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 25.8%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 29.2%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 32.2%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 34.0%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 38.1%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 42.8%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 48.9%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 52.4%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 57.7%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 63.7%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 66.5%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 72.5%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 77.5%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 80.1%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 84.2%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 87.8%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 88.8%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 93.0%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 95.8%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 96.5%
host-081 lvm[3891]: revolution_9/mirror_1: Converted: 100.0%
host-081 lvm[3891]: Repair of mirrored device revolution_9-mirror_1 finished successfully.
host-081 lvm[3891]: revolution_9-mirror_1 is now in-sync.
host-081 lvm[3891]: No longer monitoring mirror device revolution_9-mirror_1 for events.



Version-Release number of selected component (if applicable):
2.6.32-495.el6.x86_64
lvm2-2.02.111-2.el6    BUILT: Mon Sep  1 06:46:43 CDT 2014
lvm2-libs-2.02.111-2.el6    BUILT: Mon Sep  1 06:46:43 CDT 2014
lvm2-cluster-2.02.111-2.el6    BUILT: Mon Sep  1 06:46:43 CDT 2014
udev-147-2.57.el6    BUILT: Thu Jul 24 08:48:47 CDT 2014
device-mapper-1.02.90-2.el6    BUILT: Mon Sep  1 06:46:43 CDT 2014
device-mapper-libs-1.02.90-2.el6    BUILT: Mon Sep  1 06:46:43 CDT 2014
device-mapper-event-1.02.90-2.el6    BUILT: Mon Sep  1 06:46:43 CDT 2014
device-mapper-event-libs-1.02.90-2.el6    BUILT: Mon Sep  1 06:46:43 CDT 2014
device-mapper-persistent-data-0.3.2-1.el6    BUILT: Fri Apr  4 08:43:06 CDT 2014
cmirror-2.02.111-2.el6    BUILT: Mon Sep  1 06:46:43 CDT 2014

Comment 2 Corey Marthaler 2014-09-05 16:54:15 UTC
This issue does appear to be reproducable.

================================================================================
Iteration 10.6 started at Fri Sep  5 01:34:53 CDT 2014
================================================================================
ACTUAL LEG ORDER: /dev/sdg1 /dev/sdf1 /dev/sda1
Scenario kill_random_devices: Kill random devices (log and legs)

********* Mirror info for this scenario *********
* mirrors:            mirror_1
* leg devices:        /dev/sdg1 /dev/sdf1 /dev/sda1
* log devices:        /dev/sdh1
* failpv(s):          /dev/sdh1 /dev/sdg1 /dev/sda1
* failnode(s):        host-074.virt.lab.msp.redhat.com
* lvmetad:            0
* leg fault policy:   allocate
* log fault policy:   allocate
*************************************************

Current mirror/raid device structure(s):
  LV                  Attr       LSize Cpy%Sync Devices
  mirror_1            mwi-aom--- 6.00g 100.00   mirror_1_mimage_1(0),mirror_1_mimage_3(0),mirror_1_mimage_4(0)
  [mirror_1_mimage_1] iwi-aom--- 6.00g          /dev/sdg1(0)
  [mirror_1_mimage_3] iwi-aom--- 6.00g          /dev/sdf1(0)
  [mirror_1_mimage_4] iwi-aom--- 6.00g          /dev/sda1(0)
  [mirror_1_mlog]     lwi-aom--- 4.00m          /dev/sdh1(0)

* NOTE: not enough available devices for allocation fault polices to fully work *
(well technically, since we have 2, some allocation should work)

PV=/dev/sda1
        mirror_1_mimage_4: 5.2
PV=/dev/sdg1
        mirror_1_mimage_1: 5.2
PV=/dev/sdh1
        mirror_1_mlog: 1.2
PV=/dev/sda1
        mirror_1_mimage_4: 5.2
PV=/dev/sdg1
        mirror_1_mimage_1: 5.2
PV=/dev/sdh1
        mirror_1_mlog: 1.2

Writing verification files (checkit) to mirror(s) on...
        ---- host-074.virt.lab.msp.redhat.com ----
checkit starting with:

Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-074.virt.lab.msp.redhat.com ----

Disabling device sdh on host-074.virt.lab.msp.redhat.com
Disabling device sdg on host-074.virt.lab.msp.redhat.com
Disabling device sda on host-074.virt.lab.msp.redhat.com

Getting recovery check start time from /var/log/messages: Sep  5 01:35
Attempting I/O to cause mirror down conversion(s) on host-074.virt.lab.msp.redhat.com
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.0976763 s, 429 MB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  Couldn't find device with uuid P2DTMe-vQrr-ArL8-D3s8-jhzd-BLOy-vtM6Vp.
  Couldn't find device with uuid iHLb4R-mzpv-1zFj-HMI1-62tV-6ZR0-sulMkq.
  Couldn't find device with uuid eFdxXo-mV8d-e0T9-mmAT-lFzA-dgEI-ewsmbH.
  LV                  Attr       LSize Cpy%Sync Devices
  mirror_1            mwi-aom--- 6.00g 2.60     mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0)
  [mirror_1_mimage_0] Iwi-aom--- 6.00g          /dev/sdf1(0)
  [mirror_1_mimage_1] Iwi-aom--- 6.00g          /dev/sdd1(0)
  [mirror_1_mimage_2] Iwi-aom--- 6.00g          /dev/sdb1(0)
  [mirror_1_mlog]     lwi-aom--- 4.00m          /dev/sdb1(1536)

Verify that each of the mirror repairs finished successfully
repair of mirrored LV mirror_1 failed on host-074.virt.lab.msp.redhat.com

Comment 3 Corey Marthaler 2014-09-05 16:55:49 UTC
Created attachment 934870 [details]
log file

Comment 4 Corey Marthaler 2014-09-09 19:44:46 UTC
This appears to be easier to hit when running in exclusive activation mode while in a cluster.

Comment 5 Corey Marthaler 2014-09-09 19:45:50 UTC
Sep  9 14:23:18 host-114 qarshd[9101]: Running cmdline: pvs -a
Sep  9 14:23:18 host-114 kernel: md/raid1:mdX: active with 2 out of 4 mirrors
Sep  9 14:23:18 host-114 kernel: created bitmap (1 pages) for device mdX
Sep  9 14:23:18 host-114 kernel: md: mdX: recovery interrupted.
Sep  9 14:23:20 host-114 lvm[7548]: device-mapper: waitevent ioctl on  failed: Interrupted system call
Sep  9 14:25:18 host-114 lvm[7548]: Error locking on node host-114: Command timed out
Sep  9 14:25:18 host-114 xinetd[1932]: EXIT: qarsh status=0 pid=9101 duration=120(sec)
Sep  9 14:26:17 host-114 kernel: INFO: task clvmd:6558 blocked for more than 120 seconds.
Sep  9 14:26:17 host-114 kernel:      Not tainted 2.6.32-500.el6.x86_64 #1
Sep  9 14:26:17 host-114 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep  9 14:26:17 host-114 kernel: clvmd         D 0000000000000000     0  6558      1 0x00000080
Sep  9 14:26:17 host-114 kernel: ffff88003dcdfa18 0000000000000082 ffff88003dcdf9a8 ffffffff81041e98
Sep  9 14:26:17 host-114 kernel: ffff880000000000 0003ac9bb260140c ffff88003dcdf9e8 ffff88003cf70590
Sep  9 14:26:17 host-114 kernel: 00000000000a4588 ffffffffabf0b056 ffff88003d35bab8 ffff88003dcdffd8
Sep  9 14:26:17 host-114 kernel: Call Trace:
Sep  9 14:26:17 host-114 kernel: [<ffffffff81041e98>] ? pvclock_clocksource_read+0x58/0xd0
Sep  9 14:26:17 host-114 kernel: [<ffffffff810aaa21>] ? ktime_get_ts+0xb1/0xf0
Sep  9 14:26:17 host-114 kernel: [<ffffffff8152a1f3>] io_schedule+0x73/0xc0
Sep  9 14:26:17 host-114 kernel: [<ffffffff811ce71d>] __blockdev_direct_IO_newtrunc+0xb7d/0x1270
Sep  9 14:26:17 host-114 kernel: [<ffffffff812965e0>] ? vsnprintf+0x450/0x5e0
Sep  9 14:26:17 host-114 kernel: [<ffffffff811ca080>] ? blkdev_get_block+0x0/0x20
Sep  9 14:26:17 host-114 kernel: [<ffffffff8127a04d>] ? get_disk+0x7d/0xf0
Sep  9 14:26:17 host-114 kernel: [<ffffffff811cee87>] __blockdev_direct_IO+0x77/0xe0
Sep  9 14:26:17 host-114 kernel: [<ffffffff811ca080>] ? blkdev_get_block+0x0/0x20
Sep  9 14:26:17 host-114 kernel: [<ffffffff811cb107>] blkdev_direct_IO+0x57/0x60
Sep  9 14:26:17 host-114 kernel: [<ffffffff811ca080>] ? blkdev_get_block+0x0/0x20
Sep  9 14:26:17 host-114 kernel: [<ffffffff811261db>] generic_file_aio_read+0x6bb/0x700
Sep  9 14:26:17 host-114 kernel: [<ffffffff81234b51>] ? avc_has_perm+0x71/0x90
Sep  9 14:26:17 host-114 kernel: [<ffffffff81236a02>] ? selinux_inode_permission+0x72/0xb0
Sep  9 14:26:17 host-114 kernel: [<ffffffff811ca4c1>] blkdev_aio_read+0x51/0x80
Sep  9 14:26:17 host-114 kernel: [<ffffffff8118de2a>] do_sync_read+0xfa/0x140
Sep  9 14:26:17 host-114 kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Sep  9 14:26:17 host-114 kernel: [<ffffffff811ca41c>] ? block_ioctl+0x3c/0x40
Sep  9 14:26:17 host-114 kernel: [<ffffffff811a36b2>] ? vfs_ioctl+0x22/0xa0
Sep  9 14:26:17 host-114 kernel: [<ffffffff8123a55b>] ? selinux_file_permission+0xfb/0x150
Sep  9 14:26:17 host-114 kernel: [<ffffffff8122d3b6>] ? security_file_permission+0x16/0x20
Sep  9 14:26:17 host-114 kernel: [<ffffffff8118e7e5>] vfs_read+0xb5/0x1a0
Sep  9 14:26:17 host-114 kernel: [<ffffffff8118e921>] sys_read+0x51/0x90
Sep  9 14:26:17 host-114 kernel: [<ffffffff810e5aee>] ? __audit_syscall_exit+0x25e/0x290
Sep  9 14:26:17 host-114 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Comment 8 Corey Marthaler 2015-04-29 19:11:30 UTC
Quick update that this is still reproducable on mirror volumes in the latest 6.7 rpms. That said, it takes many iterations in order to hit this problem.

2.6.32-554.el6.x86_64
lvm2-2.02.118-2.el6    BUILT: Wed Apr 15 06:34:08 CDT 2015
lvm2-libs-2.02.118-2.el6    BUILT: Wed Apr 15 06:34:08 CDT 2015
lvm2-cluster-2.02.118-2.el6    BUILT: Wed Apr 15 06:34:08 CDT 2015
udev-147-2.61.el6    BUILT: Mon Mar  2 05:08:11 CST 2015
device-mapper-1.02.95-2.el6    BUILT: Wed Apr 15 06:34:08 CDT 2015
device-mapper-libs-1.02.95-2.el6    BUILT: Wed Apr 15 06:34:08 CDT 2015
device-mapper-event-1.02.95-2.el6    BUILT: Wed Apr 15 06:34:08 CDT 2015
device-mapper-event-libs-1.02.95-2.el6    BUILT: Wed Apr 15 06:34:08 CDT 2015
device-mapper-persistent-data-0.3.2-1.el6    BUILT: Fri Apr  4 08:43:06 CDT 2014
cmirror-2.02.118-2.el6    BUILT: Wed Apr 15 06:34:08 CDT 2015



./revolution_9 -e kill_random_devices

================================================================================
Iteration 50.1 started at Tue Apr 28 07:00:48 CDT 2015
================================================================================
ACTUAL LEG ORDER: /dev/sdc1 /dev/sda1 /dev/sde1
Scenario kill_random_devices: Kill random devices (log and legs)

********* Mirror info for this scenario *********
* mirrors:            mirror_1
* leg devices:        /dev/sdc1 /dev/sda1 /dev/sde1
* log devices:        /dev/sdf1
* failpv(s):          /dev/sdf1 /dev/sdc1 /dev/sda1
* failnode(s):        host-073.virt.lab.msp.redhat.com
* lvmetad:            0
* leg fault policy:   remove
* log fault policy:   allocate
*************************************************

Current mirror/raid device structure(s):
  LV                  Attr       LSize   Cpy%Sync Devices
   mirror_1            mwi-aom---   6.00g 100.00   mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0)
   [mirror_1_mimage_0] iwi-aom---   6.00g          /dev/sdc1(0)
   [mirror_1_mimage_1] iwi-aom---   6.00g          /dev/sda1(0)
   [mirror_1_mimage_2] iwi-aom---   6.00g          /dev/sde1(0)
   [mirror_1_mlog]     lwi-aom---   4.00m          /dev/sdf1(0)

* NOTE: not enough available devices for allocation fault polices to fully work *
(well technically, since we have 2, some allocation should work)

PV=/dev/sda1
        mirror_1_mimage_1: 6
PV=/dev/sdf1
        mirror_1_mlog: 1.1
PV=/dev/sdc1
        mirror_1_mimage_0: 6
PV=/dev/sda1
        mirror_1_mimage_1: 6
PV=/dev/sdf1
        mirror_1_mlog: 1.1
PV=/dev/sdc1
        mirror_1_mimage_0: 6

Writing verification files (checkit) to mirror(s) on...
        ---- host-073.virt.lab.msp.redhat.com ----

Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-073.virt.lab.msp.redhat.com ----

Disabling device sdf on host-073.virt.lab.msp.redhat.com
Disabling device sdc on host-073.virt.lab.msp.redhat.com
Disabling device sda on host-073.virt.lab.msp.redhat.com

Getting recovery check start time from /var/log/messages: Apr 28 07:01
Attempting I/O to cause mirror down conversion(s) on host-073.virt.lab.msp.redhat.com
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.0790474 s, 531 MB/s


[root@host-073 ~]# lvs -a -o +devices
  Couldn't find device with uuid y4sg5S-IFm9-pJ7q-iogm-juYP-J0FO-XVsXj3.
  Couldn't find device with uuid qT9tZd-hBwb-L8Mi-6nRb-d2eW-fQSa-c65wg6.
  Couldn't find device with uuid pGhXZr-RD0N-g21I-UZkw-m1X4-A0YU-P0ykAk.
  LV                  VG           Attr       LSize Log Cpy%Sync Devices
  mirror_1            revolution_9 mwi-aom-p- 6.00g     0.00     mirror_1_mimage_1(0),mirror_1_mimage_2(0)
  [mirror_1_mimage_1] revolution_9 Iwi-aom-p- 6.00g              unknown device(0)
  [mirror_1_mimage_2] revolution_9 Iwi-aom--- 6.00g              /dev/sde1(0)
  mirror_1_mlog       revolution_9 -wi-----p- 4.00m              unknown device(0)

Apr 28 07:01:27 host-073 qarshd[22827]: Running cmdline: echo offline > /sys/block/sdf/device/state
Apr 28 07:01:27 host-073 qarshd[22829]: Running cmdline: echo offline > /sys/block/sdc/device/state
Apr 28 07:01:27 host-073 lvm[22580]: Log device 253:2 has failed (D).
Apr 28 07:01:27 host-073 lvm[22580]: Device failure in revolution_9-mirror_1.
Apr 28 07:01:27 host-073 lvm[22580]: /dev/sdc1: read failed after 0 of 4096 at 0: Input/output error
Apr 28 07:01:27 host-073 lvm[22580]: /dev/sdf1: read failed after 0 of 4096 at 0: Input/output error
Apr 28 07:01:27 host-073 qarshd[22831]: Running cmdline: echo offline > /sys/block/sda/device/state
[...]
Apr 28 07:01:27 host-073 lvm[22580]: WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 504
Apr 28 07:01:27 host-073 lvm[22580]: WARNING: Failed to write an MDA of VG revolution_9.
Apr 28 07:01:27 host-073 lvm[22580]: WARNING: Failed to write an MDA of VG revolution_9.
Apr 28 07:01:32 host-073 qarshd[22857]: Running cmdline: dd if=/dev/zero of=/mnt/mirror_1/ddfile count=10 bs=4M
Apr 28 07:01:32 host-073 qarshd[22859]: Running cmdline: sync
[...]
Apr 28 07:01:35 host-073 kernel: device-mapper: raid1: log postsuspend failed
Apr 28 07:01:35 host-073 lvm[22580]: Internal error: Performing unsafe table load while 5 device(s) are known to be suspended:  (253:9)
Apr 28 07:01:35 host-073 lvm[22580]: Internal error: Performing unsafe table load while 5 device(s) are known to be suspended:  (253:4)
Apr 28 07:01:35 host-073 kernel: device-mapper: raid1: Unable to read primary mirror during recovery
Apr 28 07:01:35 host-073 kernel: device-mapper: raid1: Primary mirror (253:4) failed while out-of-sync: Reads may fail.
Apr 28 07:01:35 host-073 lvm[22580]: Monitoring mirror device revolution_9-mirror_1 for events.
Apr 28 07:01:35 host-073 lvm[22580]: /dev/sda1: open failed: No such device or address
Apr 28 07:01:35 host-073 lvm[22580]: WARNING: Failed to write an MDA of VG revolution_9.
Apr 28 07:01:35 host-073 kernel: device-mapper: raid1: Unable to read primary mirror during recovery
Apr 28 07:01:35 host-073 kernel: device-mapper: raid1: Unable to read primary mirror during recovery
Apr 28 07:01:35 host-073 kernel: device-mapper: raid1: Unable to read primary mirror during recovery
Apr 28 07:01:35 host-073 kernel: device-mapper: raid1: Unable to read primary mirror during recovery
Apr 28 07:01:35 host-073 lvm[22580]: WARNING: Failed to write an MDA of VG revolution_9.
Apr 28 07:01:35 host-073 lvm[22580]: WARNING: Failed to write an MDA of VG revolution_9.
Apr 28 07:01:35 host-073 kernel: device-mapper: table: 253:4: linear: dm-linear: Device lookup failed
Apr 28 07:01:35 host-073 kernel: device-mapper: ioctl: error adding target to table
Apr 28 07:01:35 host-073 lvm[22580]: Failed to lock logical volume revolution_9/mirror_1.
Apr 28 07:01:35 host-073 lvm[22580]: /dev/sda1: open failed: No such device or address
Apr 28 07:01:35 host-073 lvm[22580]: Attempt to close device '/dev/sda1' which is not open.
Apr 28 07:01:35 host-073 lvm[22580]: Repair of mirrored device revolution_9-mirror_1 failed.
Apr 28 07:01:35 host-073 lvm[22580]: Failed to remove faulty devices in revolution_9-mirror_1.
Apr 28 07:01:35 host-073 lvm[22580]: Primary mirror device 253:4 sync failed.
Apr 28 07:01:37 host-073 lvm[22580]: No longer monitoring mirror device revolution_9-mirror_1 for events.
Apr 28 07:04:40 host-073 kernel: INFO: task flush-253:6:22909 blocked for more than 120 seconds.
Apr 28 07:04:40 host-073 kernel:      Not tainted 2.6.32-554.el6.x86_64 #1
Apr 28 07:04:40 host-073 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 28 07:04:40 host-073 kernel: flush-253:6   D 0000000000000000     0 22909      2 0x00000080
Apr 28 07:04:40 host-073 kernel: ffff88002eeb3840 0000000000000046 ffff880000000000 0000830b5203cba2
Apr 28 07:04:40 host-073 kernel: ffff880000822f60 ffff88003a3069f0 000000000673ac22 ffffffffaac19ea0
Apr 28 07:04:40 host-073 kernel: 000000001e4188ab 0000000104366706 ffff88003b4cc5f8 ffff88002eeb3fd8
Apr 28 07:04:40 host-073 kernel: Call Trace:
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c8580>] ? sync_buffer+0x0/0x50
Apr 28 07:04:40 host-073 kernel: [<ffffffff81537bc3>] io_schedule+0x73/0xc0
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c85c0>] sync_buffer+0x40/0x50
Apr 28 07:04:40 host-073 kernel: [<ffffffff8153845a>] __wait_on_bit_lock+0x5a/0xc0
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c8580>] ? sync_buffer+0x0/0x50
Apr 28 07:04:40 host-073 kernel: [<ffffffff81538538>] out_of_line_wait_on_bit_lock+0x78/0x90
Apr 28 07:04:40 host-073 kernel: [<ffffffff810a1520>] ? wake_bit_function+0x0/0x50
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c88f0>] ? end_buffer_async_write+0x0/0x190
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c8766>] __lock_buffer+0x36/0x40
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c9a65>] __block_write_full_page+0x305/0x330
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c88f0>] ? end_buffer_async_write+0x0/0x190
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c9b70>] block_write_full_page_endio+0xe0/0x120
Apr 28 07:04:40 host-073 kernel: [<ffffffffa0480d10>] ? buffer_unmapped+0x0/0x20 [ext3]
Apr 28 07:04:40 host-073 kernel: [<ffffffff811c9bc5>] block_write_full_page+0x15/0x20
Apr 28 07:04:40 host-073 kernel: [<ffffffffa04818bd>] ext3_ordered_writepage+0x1ed/0x240 [ext3]
Apr 28 07:04:40 host-073 kernel: [<ffffffff8113b087>] __writepage+0x17/0x40
Apr 28 07:04:40 host-073 kernel: [<ffffffff8113c34d>] write_cache_pages+0x1fd/0x4c0
Apr 28 07:04:40 host-073 kernel: [<ffffffff8113b070>] ? __writepage+0x0/0x40
Apr 28 07:04:40 host-073 kernel: [<ffffffff8113c634>] generic_writepages+0x24/0x30
Apr 28 07:04:40 host-073 kernel: [<ffffffff8113c675>] do_writepages+0x35/0x40
Apr 28 07:04:40 host-073 kernel: [<ffffffff811be40d>] writeback_single_inode+0xdd/0x290
Apr 28 07:04:40 host-073 kernel: [<ffffffff811be80d>] writeback_sb_inodes+0xbd/0x170
Apr 28 07:04:40 host-073 kernel: [<ffffffff811be96b>] writeback_inodes_wb+0xab/0x1b0
Apr 28 07:04:40 host-073 kernel: [<ffffffff811bed63>] wb_writeback+0x2f3/0x410
Apr 28 07:04:40 host-073 kernel: [<ffffffff8108a772>] ? del_timer_sync+0x22/0x30
Apr 28 07:04:40 host-073 kernel: [<ffffffff811bef3b>] wb_do_writeback+0xbb/0x240
Apr 28 07:04:40 host-073 kernel: [<ffffffff811bf123>] bdi_writeback_task+0x63/0x1b0
Apr 28 07:04:40 host-073 kernel: [<ffffffff810a1327>] ? bit_waitqueue+0x17/0xd0
Apr 28 07:04:40 host-073 kernel: [<ffffffff8114b380>] ? bdi_start_fn+0x0/0x100
Apr 28 07:04:40 host-073 kernel: [<ffffffff8114b406>] bdi_start_fn+0x86/0x100
Apr 28 07:04:40 host-073 kernel: [<ffffffff8114b380>] ? bdi_start_fn+0x0/0x100
Apr 28 07:04:40 host-073 kernel: [<ffffffff810a100e>] kthread+0x9e/0xc0
Apr 28 07:04:40 host-073 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Apr 28 07:04:40 host-073 kernel: [<ffffffff810a0f70>] ? kthread+0x0/0xc0
Apr 28 07:04:40 host-073 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20


Note You need to log in before you can comment on or make changes to this bug.