Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1085983

Summary: mirror device failure scenarios lead to deadlocks
Product: Red Hat Enterprise Linux 7 Reporter: Corey Marthaler <cmarthal>
Component: lvm2Assignee: Jonathan Earl Brassow <jbrassow>
lvm2 sub component: Mirroring and RAID QA Contact: Cluster QE <mspqa-list>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: unspecified CC: agk, cmarthal, heinzm, jbrassow, msnitzer, nperic, ovasik, prajnoha, prockai, zkabelac
Version: 7.0   
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: lvm2-2.02.115-1.el7 Doc Type: Bug Fix
Doc Text:
Previously, when using the "mirror" segment type to create a logical volume while using lvmetad, it was possible for LVM to incorrectly use a failed device as the source for allocating a replacement device. This would lead to deadlocks in the affected device and LVM commands. While this issue has been resolved, the "raid1" segment type should be used instead of the "mirror" segment type. The "raid1" segment type is the default in RHEL 7.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-03-05 13:08:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
manual repair with captured output following device failure
none
Performing upconvert when there is a failed PV present. none

Description Corey Marthaler 2014-04-09 19:34:32 UTC
Description of problem:
I feel like this is just a rhel7 version of bug 1025456. Still deadlock scenarios that *don't* involve snapshots. 

./helter_skelter -o host-049.virt.lab.msp.redhat.com -l /home/msp/cmarthal/work/sts/sts-root -r /usr/tests/sts-rhel7.0 -F -e kill_primary_synced_core_log_4_legs

================================================================================
Iteration 2.3 started at Mon Apr  7 20:04:25 CDT 2014
================================================================================
Scenario kill_primary_synced_core_log_4_legs: Kill primary leg of synced core log 4 leg mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_primary_core_4legs_1 syncd_primary_core_4legs_2 syncd_primary_core_4legs_3
* sync:               1
* striped:            0
* leg devices:        /dev/sdd1 /dev/sdc1 /dev/sde1 /dev/sdg1
* log devices:        
* no MDA devices:     
* failpv(s):          /dev/sdd1
* failnode(s):        host-049.virt.lab.msp.redhat.com
* lvmetad:            1
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Creating mirror(s) on host-049.virt.lab.msp.redhat.com...
host-049.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog core -m 3 -n syncd_primary_core_4legs_1 -L 500M helter_skelter /dev/sdd1:0-1000 /dev/sdc1:0-1000 /dev/sde1:0-1000 /dev/sdg1:0-1000
host-049.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog core -m 3 -n syncd_primary_core_4legs_2 -L 500M helter_skelter /dev/sdd1:0-1000 /dev/sdc1:0-1000 /dev/sde1:0-1000 /dev/sdg1:0-1000
host-049.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog core -m 3 -n syncd_primary_core_4legs_3 -L 500M helter_skelter /dev/sdd1:0-1000 /dev/sdc1:0-1000 /dev/sde1:0-1000 /dev/sdg1:0-1000

Current mirror/raid device structure(s):
  LV                                    Attr       LSize   Cpy%Sync Devices
  syncd_primary_core_4legs_1            mwi-a-m--- 500.00m    10.40 syncd_primary_core_4legs_1_mimage_0(0),syncd_primary_core_4legs_1_mimage_1(0),syncd_primary_core_4legs_1_mimage_2(0),syncd_primary_core_4legs_1_mimage_3(0)
  [syncd_primary_core_4legs_1_mimage_0] Iwi-aom--- 500.00m          /dev/sdd1(0)
  [syncd_primary_core_4legs_1_mimage_1] Iwi-aom--- 500.00m          /dev/sdc1(0)
  [syncd_primary_core_4legs_1_mimage_2] Iwi-aom--- 500.00m          /dev/sde1(0)
  [syncd_primary_core_4legs_1_mimage_3] Iwi-aom--- 500.00m          /dev/sdg1(0)
  syncd_primary_core_4legs_2            mwi-a-m--- 500.00m     6.40 syncd_primary_core_4legs_2_mimage_0(0),syncd_primary_core_4legs_2_mimage_1(0),syncd_primary_core_4legs_2_mimage_2(0),syncd_primary_core_4legs_2_mimage_3(0)
  [syncd_primary_core_4legs_2_mimage_0] Iwi-aom--- 500.00m          /dev/sdd1(125)
  [syncd_primary_core_4legs_2_mimage_1] Iwi-aom--- 500.00m          /dev/sdc1(125)
  [syncd_primary_core_4legs_2_mimage_2] Iwi-aom--- 500.00m          /dev/sde1(125)
  [syncd_primary_core_4legs_2_mimage_3] Iwi-aom--- 500.00m          /dev/sdg1(125)
  syncd_primary_core_4legs_3            mwi-aom--- 500.00m     2.40 syncd_primary_core_4legs_3_mimage_0(0),syncd_primary_core_4legs_3_mimage_1(0),syncd_primary_core_4legs_3_mimage_2(0),syncd_primary_core_4legs_3_mimage_3(0)
  [syncd_primary_core_4legs_3_mimage_0] Iwi-aom--- 500.00m          /dev/sdd1(250)
  [syncd_primary_core_4legs_3_mimage_1] Iwi-aom--- 500.00m          /dev/sdc1(250)
  [syncd_primary_core_4legs_3_mimage_2] Iwi-aom--- 500.00m          /dev/sde1(250)
  [syncd_primary_core_4legs_3_mimage_3] Iwi-aom--- 500.00m          /dev/sdg1(250)

Waiting until all mirror|raid volumes become fully syncd...
   0/3 mirror(s) are fully synced: ( 30.70% 27.10% 20.10% )
   0/3 mirror(s) are fully synced: ( 53.40% 49.90% 35.60% )
   0/3 mirror(s) are fully synced: ( 80.70% 72.10% 45.40% )
   1/3 mirror(s) are fully synced: ( 100.00% 93.10% 56.90% )
   2/3 mirror(s) are fully synced: ( 100.00% 100.00% 85.10% )
   3/3 mirror(s) are fully synced: ( 100.00% 100.00% 100.00% )

Creating ext on top of mirror(s) on host-049.virt.lab.msp.redhat.com...
mke2fs 1.42.9 (28-Dec-2013)
mke2fs 1.42.9 (28-Dec-2013)
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on host-049.virt.lab.msp.redhat.com...

PV=/dev/sdd1
        syncd_primary_core_4legs_1_mimage_0: 4.1
        syncd_primary_core_4legs_2_mimage_0: 4.1
        syncd_primary_core_4legs_3_mimage_0: 4.1
PV=/dev/sdd1
        syncd_primary_core_4legs_1_mimage_0: 4.1
        syncd_primary_core_4legs_2_mimage_0: 4.1
        syncd_primary_core_4legs_3_mimage_0: 4.1

Writing verification files (checkit) to mirror(s) on...
        ---- host-049.virt.lab.msp.redhat.com ----

Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-049.virt.lab.msp.redhat.com ----

Disabling device sdd on host-049.virt.lab.msp.redhat.com  /dev/sdd1: read failed after 0 of 2048 at 0: Input/output error

Getting recovery check start time from /var/log/messages: Apr  7 20:06
Attempting I/O to cause mirror down conversion(s) on host-049.virt.lab.msp.redhat.com

[DEADLOCK]

[root@host-049 ~]# lvs -a -o +devices
  PV aLf1PV-900v-Sljd-aKT9-KO01-zeqb-8DNWRm not recognised. Is the device missing?
  PV aLf1PV-900v-Sljd-aKT9-KO01-zeqb-8DNWRm not recognised. Is the device missing?
  LV                                    Attr       LSize   Log Cpy%Sync Devices
  syncd_primary_core_4legs_1            mwi-aom-p- 500.00m        96.80 syncd_primary_core_4legs_1_mimage_0(0),syncd_primary_core_4legs_1_mimage_1(0),syncd_primary_core_4legs_1_mimage_2(0),syncd_primary_core_4legs_1_mimage_3(0)
  [syncd_primary_core_4legs_1_mimage_0] Iwi-aom-p- 500.00m              unknown device(0)
  [syncd_primary_core_4legs_1_mimage_1] Iwi-aom--- 500.00m              /dev/sdc1(0)
  [syncd_primary_core_4legs_1_mimage_2] Iwi-aom--- 500.00m              /dev/sde1(0)
  [syncd_primary_core_4legs_1_mimage_3] Iwi-aom--- 500.00m              /dev/sdg1(0)
  syncd_primary_core_4legs_2            mwi-aom-p- 500.00m        99.20 syncd_primary_core_4legs_2_mimage_0(0),syncd_primary_core_4legs_2_mimage_1(0),syncd_primary_core_4legs_2_mimage_2(0),syncd_primary_core_4legs_2_mimage_3(0)
  [syncd_primary_core_4legs_2_mimage_0] Iwi-aom-p- 500.00m              unknown device(125)
  [syncd_primary_core_4legs_2_mimage_1] Iwi-aom--- 500.00m              /dev/sdc1(125)
  [syncd_primary_core_4legs_2_mimage_2] Iwi-aom--- 500.00m              /dev/sde1(125)
  [syncd_primary_core_4legs_2_mimage_3] Iwi-aom--- 500.00m              /dev/sdg1(125)
  syncd_primary_core_4legs_3            mwi-aom-p- 500.00m        99.20 syncd_primary_core_4legs_3_mimage_0(0),syncd_primary_core_4legs_3_mimage_1(0),syncd_primary_core_4legs_3_mimage_2(0),syncd_primary_core_4legs_3_mimage_3(0)
  [syncd_primary_core_4legs_3_mimage_0] Iwi-aom-p- 500.00m              unknown device(250)
  [syncd_primary_core_4legs_3_mimage_1] Iwi-aom--- 500.00m              /dev/sdc1(250)
  [syncd_primary_core_4legs_3_mimage_2] Iwi-aom--- 500.00m              /dev/sde1(250)
  [syncd_primary_core_4legs_3_mimage_3] Iwi-aom--- 500.00m              /dev/sdg1(250)

LOG:
[...]
Apr  7 20:06:26 host-049 qarshd[24692]: Running cmdline: echo offline > /sys/block/sdd/device/state
Apr  7 20:06:26 host-049 kernel: sd 2:0:0:1: rejecting I/O to offline device
Apr  7 20:06:26 host-049 lvm[17030]: Primary mirror device 253:7 has failed (D).
Apr  7 20:06:26 host-049 lvm[17030]: Device failure in helter_skelter-syncd_primary_core_4legs_2.
Apr  7 20:06:26 host-049 lvm[17030]: /dev/sdd1: read failed after 0 of 1024 at 10733879296: Input/output error
Apr  7 20:06:26 host-049 lvm[17030]: /dev/sdd1: read failed after 0 of 1024 at 10733948928: Input/output error
Apr  7 20:06:26 host-049 qarshd[24695]: Running cmdline: pvscan --cache /dev/sdd1
Apr  7 20:06:26 host-049 lvm[17030]: intermediate VG write failed.
Apr  7 20:06:26 host-049 lvm[17030]: Mirror status: 1 of 4 images failed.
Apr  7 20:06:26 host-049 lvm[17030]: Trying to up-convert to 4 images, 0 logs.
Apr  7 20:06:26 host-049 lvm[17030]: Trying to up-convert to 3 images, 0 logs.
Apr  7 20:06:26 host-049 lvm[17030]: intermediate VG write failed.
Apr  7 20:06:26 host-049 lvm[17030]: Trying to up-convert to 2 images, 0 logs.
Apr  7 20:06:26 host-049 lvm[17030]: intermediate VG write failed.
Apr  7 20:06:26 host-049 lvm[17030]: WARNING: Failed to replace 2 of 4 images in volume syncd_primary_core_4legs_2
Apr  7 20:06:26 host-049 lvm[17030]: PV aLf1PV-900v-Sljd-aKT9-KO01-zeqb-8DNWRm not recognised. Is the device missing?
Apr  7 20:06:26 host-049 lvm[17030]: helter_skelter/syncd_primary_core_4legs_2: Converted: 99.2%
Apr  7 20:06:31 host-049 qarshd[24707]: Running cmdline: pvs -a
Apr  7 20:06:32 host-049 qarshd[24711]: Running cmdline: tail -n1 /var/log/messages
Apr  7 20:06:32 host-049 qarshd[24714]: Running cmdline: dd if=/dev/zero of=/mnt/syncd_primary_core_4legs_1/ddfile count=10 bs=4M
Apr  7 20:06:41 host-049 lvm[17030]: PV aLf1PV-900v-Sljd-aKT9-KO01-zeqb-8DNWRm not recognised. Is the device missing?
Apr  7 20:06:41 host-049 lvm[17030]: helter_skelter/syncd_primary_core_4legs_2: Converted: 99.2%
Apr  7 20:06:56 host-049 lvm[17030]: PV aLf1PV-900v-Sljd-aKT9-KO01-zeqb-8DNWRm not recognised. Is the device missing?
Apr  7 20:06:56 host-049 lvm[17030]: helter_skelter/syncd_primary_core_4legs_2: Converted: 99.2%
Apr  7 20:07:11 host-049 lvm[17030]: PV aLf1PV-900v-Sljd-aKT9-KO01-zeqb-8DNWRm not recognised. Is the device missing?
Apr  7 20:07:11 host-049 lvm[17030]: helter_skelter/syncd_primary_core_4legs_2: Converted: 99.2%
Apr  7 20:07:26 host-049 lvm[17030]: PV aLf1PV-900v-Sljd-aKT9-KO01-zeqb-8DNWRm not recognised. Is the device missing?
Apr  7 20:07:26 host-049 lvm[17030]: helter_skelter/syncd_primary_core_4legs_2: Converted: 99.2%
[...]
Apr  9 12:40:19 host-049 lvm[17030]: PV aLf1PV-900v-Sljd-aKT9-KO01-zeqb-8DNWRm not recognised. Is the device missing?
Apr  9 12:40:19 host-049 lvm[17030]: helter_skelter/syncd_primary_core_4legs_2: Converted: 99.2%
Apr  9 12:40:34 host-049 lvm[17030]: PV aLf1PV-900v-Sljd-aKT9-KO01-zeqb-8DNWRm not recognised. Is the device missing?
Apr  9 12:40:34 host-049 lvm[17030]: helter_skelter/syncd_primary_core_4legs_2: Converted: 99.2%



Version-Release number of selected component (if applicable):
3.10.0-116.el7.x86_64
lvm2-2.02.105-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
lvm2-libs-2.02.105-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
lvm2-cluster-2.02.105-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-1.02.84-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-libs-1.02.84-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-event-1.02.84-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-event-libs-1.02.84-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-persistent-data-0.3.0-1.el7    BUILT: Fri Mar 28 07:42:24 CDT 2014
cmirror-2.02.105-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014


How reproducible:
This is reproducible

Comment 1 Corey Marthaler 2014-04-09 19:54:33 UTC
Another case:

================================================================================
Iteration 2.1 started at Mon Apr  7 20:26:33 CDT 2014
================================================================================
Scenario kill_pri_log_and_pri_leg_2_legs_2_logs: Kill primary leg and primary log of synced 2 leg redundant log mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_pri_leg_pri_log_2legs_2logs_1 syncd_pri_leg_pri_log_2legs_2logs_2 syncd_pri_leg_pri_log_2legs_2logs_3
* sync:               1
* striped:            0
* leg devices:        /dev/sdd1 /dev/sdf1
* log devices:        /dev/sde1 /dev/sdh1
* no MDA devices:     
* failpv(s):          /dev/sdd1 /dev/sde1
* failnode(s):        host-052.virt.lab.msp.redhat.com
* lvmetad:            1
* leg fault policy:   remove
* log fault policy:   remove
******************************************************

Creating mirror(s) on host-052.virt.lab.msp.redhat.com...
host-052.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog mirrored -m 1 -n syncd_pri_leg_pri_log_2legs_2logs_1 -L 500M helter_skelter /dev/sdd1:0-1000 /dev/sdf1:0-1000 /dev/sde1:0-150 /dev/sdh1:0-150
host-052.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog mirrored -m 1 -n syncd_pri_leg_pri_log_2legs_2logs_2 -L 500M helter_skelter /dev/sdd1:0-1000 /dev/sdf1:0-1000 /dev/sde1:0-150 /dev/sdh1:0-150
WARNING: ext3 signature detected on /dev/helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_2 at offset 1080. Wipe it? [y/n] 
  1 existing signature left on the device.
host-052.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog mirrored -m 1 -n syncd_pri_leg_pri_log_2legs_2logs_3 -L 500M helter_skelter /dev/sdd1:0-1000 /dev/sdf1:0-1000 /dev/sde1:0-150 /dev/sdh1:0-150

Current mirror/raid device structure(s):
  LV                                                  Attr       LSize   Cpy%Sync Devices
  syncd_pri_leg_pri_log_2legs_2logs_1                 mwi-a-m--- 500.00m    26.40 syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_1_mimage_1(0)          
  [syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0]      Iwi-aom--- 500.00m          /dev/sdd1(0)
  [syncd_pri_leg_pri_log_2legs_2logs_1_mimage_1]      Iwi-aom--- 500.00m          /dev/sdf1(0)
  [syncd_pri_leg_pri_log_2legs_2logs_1_mlog]          mwi-aom---   4.00m   100.00 syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_1(0)
  [syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0] iwi-aom---   4.00m          /dev/sde1(0)
  [syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_1] iwi-aom---   4.00m          /dev/sdh1(0)
  syncd_pri_leg_pri_log_2legs_2logs_2                 mwi-a-m--- 500.00m    16.00 syncd_pri_leg_pri_log_2legs_2logs_2_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_2_mimage_1(0)          
  [syncd_pri_leg_pri_log_2legs_2logs_2_mimage_0]      Iwi-aom--- 500.00m          /dev/sdd1(125)
  [syncd_pri_leg_pri_log_2legs_2logs_2_mimage_1]      Iwi-aom--- 500.00m          /dev/sdf1(125)
  [syncd_pri_leg_pri_log_2legs_2logs_2_mlog]          mwi-aom---   4.00m   100.00 syncd_pri_leg_pri_log_2legs_2logs_2_mlog_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_2_mlog_mimage_1(0)
  [syncd_pri_leg_pri_log_2legs_2logs_2_mlog_mimage_0] iwi-aom---   4.00m          /dev/sde1(1)
  [syncd_pri_leg_pri_log_2legs_2logs_2_mlog_mimage_1] iwi-aom---   4.00m          /dev/sdh1(1)
  syncd_pri_leg_pri_log_2legs_2logs_3                 mwi-aom--- 500.00m     0.80 syncd_pri_leg_pri_log_2legs_2logs_3_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_3_mimage_1(0)          
  [syncd_pri_leg_pri_log_2legs_2logs_3_mimage_0]      Iwi-aom--- 500.00m          /dev/sdd1(250)
  [syncd_pri_leg_pri_log_2legs_2logs_3_mimage_1]      Iwi-aom--- 500.00m          /dev/sdf1(250)
  [syncd_pri_leg_pri_log_2legs_2logs_3_mlog]          mwi-aom---   4.00m   100.00 syncd_pri_leg_pri_log_2legs_2logs_3_mlog_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_3_mlog_mimage_1(0)
  [syncd_pri_leg_pri_log_2legs_2logs_3_mlog_mimage_0] iwi-aom---   4.00m          /dev/sde1(2)
  [syncd_pri_leg_pri_log_2legs_2logs_3_mlog_mimage_1] iwi-aom---   4.00m          /dev/sdh1(2)

Waiting until all mirror|raid volumes become fully syncd...
   0/3 mirror(s) are fully synced: ( 71.10% 62.10% 36.40% )
   2/3 mirror(s) are fully synced: ( 100.00% 100.00% 83.20% )
   3/3 mirror(s) are fully synced: ( 100.00% 100.00% 100.00% )

Creating ext on top of mirror(s) on host-052.virt.lab.msp.redhat.com...
mke2fs 1.42.9 (28-Dec-2013)
mke2fs 1.42.9 (28-Dec-2013)
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on host-052.virt.lab.msp.redhat.com...

PV=/dev/sde1
        syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 2
        syncd_pri_leg_pri_log_2legs_2logs_2_mlog_mimage_0: 2
        syncd_pri_leg_pri_log_2legs_2logs_3_mlog_mimage_0: 2
PV=/dev/sdd1
        syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0: 6
        syncd_pri_leg_pri_log_2legs_2logs_2_mimage_0: 6
        syncd_pri_leg_pri_log_2legs_2logs_3_mimage_0: 6
PV=/dev/sde1
        syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 2
        syncd_pri_leg_pri_log_2legs_2logs_2_mlog_mimage_0: 2
        syncd_pri_leg_pri_log_2legs_2logs_3_mlog_mimage_0: 2
PV=/dev/sdd1
        syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0: 6
        syncd_pri_leg_pri_log_2legs_2logs_2_mimage_0: 6
        syncd_pri_leg_pri_log_2legs_2logs_3_mimage_0: 6

Writing verification files (checkit) to mirror(s) on...
        ---- host-052.virt.lab.msp.redhat.com ----

Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-052.virt.lab.msp.redhat.com ----

Disabling device sdd on host-052.virt.lab.msp.redhat.com  /dev/sdd1: read failed after 0 of 2048 at 0: Input/output error
Disabling device sde on host-052.virt.lab.msp.redhat.com  /dev/sde1: read failed after 0 of 2048 at 0: Input/output error


Getting recovery check start time from /var/log/messages: Apr  7 20:28
Attempting I/O to cause mirror down conversion(s) on host-052.virt.lab.msp.redhat.com
[DEADLOCK]

Comment 2 Corey Marthaler 2014-04-09 20:17:34 UTC
Another case, all of these involve writes to the mirror hanging after a failure.

================================================================================
Iteration 0.10 started at Mon Apr  7 17:09:45 CDT 2014
================================================================================
Scenario kill_primary_synced_2_legs: Kill primary leg of synced 2 leg mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_primary_2legs_1
* sync:               1
* striped:            0
* leg devices:        /dev/sdh1 /dev/sdg1
* log devices:        /dev/sdc1
* no MDA devices:     
* failpv(s):          /dev/sdh1
* failnode(s):        host-053.virt.lab.msp.redhat.com
* lvmetad:            1
* leg fault policy:   remove
* log fault policy:   allocate
******************************************************

Creating mirror(s) on host-053.virt.lab.msp.redhat.com...
host-053.virt.lab.msp.redhat.com: lvcreate --type mirror -m 1 -n syncd_primary_2legs_1 -L 500M helter_skelter /dev/sdh1:0-1000 /dev/sdg1:0-1000 /dev/sdc1:0-150
WARNING: ext3 signature detected on /dev/helter_skelter/syncd_primary_2legs_1 at offset 1080. Wipe it? [y/n] 
  1 existing signature left on the device.

Current mirror/raid device structure(s):
  LV                               Attr       LSize   Cpy%Sync Devices
  syncd_primary_2legs_1            mwi-aom--- 500.00m     1.60 syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
  [syncd_primary_2legs_1_mimage_0] Iwi-aom--- 500.00m          /dev/sdh1(0)
  [syncd_primary_2legs_1_mimage_1] Iwi-aom--- 500.00m          /dev/sdg1(0)
  [syncd_primary_2legs_1_mlog]     lwi-aom---   4.00m          /dev/sdc1(0)

Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 83.30% )
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating ext on top of mirror(s) on host-053.virt.lab.msp.redhat.com...
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on host-053.virt.lab.msp.redhat.com...

PV=/dev/sdh1
        syncd_primary_2legs_1_mimage_0: 6
PV=/dev/sdh1
        syncd_primary_2legs_1_mimage_0: 6

Writing verification files (checkit) to mirror(s) on...
        ---- host-053.virt.lab.msp.redhat.com ----

Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-053.virt.lab.msp.redhat.com ----

Disabling device sdh on host-053.virt.lab.msp.redhat.com  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error

Getting recovery check start time from /var/log/messages: Apr  7 17:10
Attempting I/O to cause mirror down conversion(s) on host-053.virt.lab.msp.redhat.com
[DEADLOCK]

[root@host-053 ~]# lvs -a -o +devices
  PV 9hG5TK-6u1t-8QI5-6aCn-ZFjz-EEok-rNuYCf not recognised. Is the device missing?
  PV 9hG5TK-6u1t-8QI5-6aCn-ZFjz-EEok-rNuYCf not recognised. Is the device missing?
  LV                               Attr       LSize   Log                         Cpy%Sync  Devices
  syncd_primary_2legs_1            mwi-aom-p- 500.00m syncd_primary_2legs_1_mlog  97.60     syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
  [syncd_primary_2legs_1_mimage_0] Iwi-aom-p- 500.00m                                       unknown device(0)
  [syncd_primary_2legs_1_mimage_1] Iwi-aom--- 500.00m                                       /dev/sdg1(0)
  [syncd_primary_2legs_1_mlog]     lwi-aom---   4.00m                                       /dev/sdc1(0)

Comment 5 Jonathan Earl Brassow 2015-01-08 22:16:58 UTC
# ./helter_skelter -o bp-01 -F -e kill_primary_synced_core_log_4_legs

....

================================================================================
Iteration 34.1 started at Thu Jan  8 16:12:42 CST 2015
================================================================================
Scenario kill_primary_synced_core_log_4_legs: Kill primary leg of synced core log 4 leg mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_primary_core_4legs_1 syncd_primary_core_4legs_2
* sync:               1
* striped:            0
* leg devices:        /dev/sdf1 /dev/sdd1 /dev/sdi1 /dev/sdb1
* log devices:
* no MDA devices:
* failpv(s):          /dev/sdf1
* failnode(s):        bp-01
* lvmetad:            1
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Having trouble reproducing...

Comment 6 Corey Marthaler 2015-01-09 19:06:13 UTC
I appear to have hit this on all three machines I had helter_skelter running over night. All three were during a redundant mirror log testcase (like in comment #1). Also, instead of running just one case (like in comment #5), I left the '-e" off and had the test run all the cases since this appears to happen in multiple scenarios.

================================================================================
Iteration 1.2 started at Thu Jan  8 20:20:39 CST 2015
================================================================================
Scenario kill_pri_log_and_pri_leg_2_legs_2_logs: Kill primary leg and primary log of synced 2 leg redundant log mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_pri_leg_pri_log_2legs_2logs_1 syncd_pri_leg_pri_log_2legs_2logs_2
* sync:               1
* striped:            0
* leg devices:        /dev/sdf1 /dev/sdg1
* log devices:        /dev/sdc1 /dev/sda1
* no MDA devices:     
* failpv(s):          /dev/sdf1 /dev/sdc1
* failnode(s):        host-111.virt.lab.msp.redhat.com
* lvmetad:            1
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Creating mirror(s) on host-111.virt.lab.msp.redhat.com...
host-111.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog mirrored -m 1 -n syncd_pri_leg_pri_log_2legs_2logs_1 -L 500M helter_skelter /dev/sdf1:0-2400 /dev/sdg1:0-2400 /dev/sdc1:0-150 /dev/sda1:0-150
WARNING: ext3 signature detected on /dev/helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1 at offset 1080. Wipe it? [y/n]: n
  Aborted wiping of ext3.
  1 existing signature left on the device.
host-111.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog mirrored -m 1 -n syncd_pri_leg_pri_log_2legs_2logs_2 -L 500M helter_skelter /dev/sdf1:0-2400 /dev/sdg1:0-2400 /dev/sdc1:0-150 /dev/sda1:0-150
WARNING: ext3 signature detected on /dev/helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_2 at offset 1080. Wipe it? [y/n]: n
  Aborted wiping of ext3.
  1 existing signature left on the device.

Current mirror/raid device structure(s):
  LV                                                  Attr       LSize   Cpy%Sync Devices
  syncd_pri_leg_pri_log_2legs_2logs_1                 mwi-a-m--- 500.00m 16.00    syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_1_mimage_1(0)          
  [syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0]      Iwi-aom--- 500.00m          /dev/sdf1(0)
  [syncd_pri_leg_pri_log_2legs_2logs_1_mimage_1]      Iwi-aom--- 500.00m          /dev/sdg1(0)
  [syncd_pri_leg_pri_log_2legs_2logs_1_mlog]          mwi-aom---   4.00m 100.00   syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_1(0)
  [syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0] iwi-aom---   4.00m          /dev/sdc1(0)
  [syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_1] iwi-aom---   4.00m          /dev/sda1(0)
  syncd_pri_leg_pri_log_2legs_2logs_2                 mwi-aom--- 500.00m 0.80     syncd_pri_leg_pri_log_2legs_2logs_2_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_2_mimage_1(0)          
  [syncd_pri_leg_pri_log_2legs_2logs_2_mimage_0]      Iwi-aom--- 500.00m          /dev/sdf1(125)
  [syncd_pri_leg_pri_log_2legs_2logs_2_mimage_1]      Iwi-aom--- 500.00m          /dev/sdg1(125)
  [syncd_pri_leg_pri_log_2legs_2logs_2_mlog]          mwi-aom---   4.00m 100.00   syncd_pri_leg_pri_log_2legs_2logs_2_mlog_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_2_mlog_mimage_1(0)
  [syncd_pri_leg_pri_log_2legs_2logs_2_mlog_mimage_0] iwi-aom---   4.00m          /dev/sdc1(1)
  [syncd_pri_leg_pri_log_2legs_2logs_2_mlog_mimage_1] iwi-aom---   4.00m          /dev/sda1(1)

Waiting until all mirror|raid volumes become fully syncd...
   1/2 mirror(s) are fully synced: ( 100.00% 54.30% )
   2/2 mirror(s) are fully synced: ( 100.00% 100.00% )

Creating ext on top of mirror(s) on host-111.virt.lab.msp.redhat.com...
mke2fs 1.42.9 (28-Dec-2013)
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on host-111.virt.lab.msp.redhat.com...

PV=/dev/sdf1
        syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0: 5.1
        syncd_pri_leg_pri_log_2legs_2logs_2_mimage_0: 5.1
PV=/dev/sdc1
        syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 1.4
        syncd_pri_leg_pri_log_2legs_2logs_2_mlog_mimage_0: 1.4
PV=/dev/sdf1
        syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0: 5.1
        syncd_pri_leg_pri_log_2legs_2logs_2_mimage_0: 5.1
PV=/dev/sdc1
        syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 1.4
        syncd_pri_leg_pri_log_2legs_2logs_2_mlog_mimage_0: 1.4

Writing verification files (checkit) to mirror(s) on...
        ---- host-111.virt.lab.msp.redhat.com ----

Sleeping 15 seconds to get some outsanding I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-111.virt.lab.msp.redhat.com ----

Disabling device sdf on host-111.virt.lab.msp.redhat.com
Disabling device sdc on host-111.virt.lab.msp.redhat.com


[root@host-111 ~]# lvs -a 
  WARNING: Device for PV kNGAA7-0I5p-zySO-wrp6-kR7h-YzGp-n0ukeb not found or rejected by a filter.
  WARNING: Device for PV SgpsQV-nM1p-awvQ-PWeC-fyAO-swxA-qt8lq9 not found or rejected by a filter.

^C  Interrupted...
  Giving up waiting for lock.
  /run/lock/lvm/V_helter_skelter: flock failed: Interrupted system call
  Can't get lock for helter_skelter
  Cannot process volume group helter_skelter
  Interrupted...



Jan  8 20:21:42 host-111 qarshd[25562]: Running cmdline: echo offline > /sys/block/sdf/device/state
Jan  8 20:21:42 host-111 qarshd[25564]: Running cmdline: pvscan --cache /dev/sdf1
Jan  8 20:21:42 host-111 qarshd[25569]: Running cmdline: echo offline > /sys/block/sdc/device/state
Jan  8 20:21:42 host-111 lvm[1125]: Primary mirror device 253:2 has failed (D).
Jan  8 20:21:42 host-111 lvm[1125]: Device failure in helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog.
Jan  8 20:21:42 host-111 qarshd[25572]: Running cmdline: pvscan --cache /dev/sdc1
Jan  8 20:21:42 host-111 lvm[1125]: WARNING: Device for PV kNGAA7-0I5p-zySO-wrp6-kR7h-YzGp-n0ukeb not found or rejected by a filter.
Jan  8 20:21:42 host-111 lvm[1125]: WARNING: Failed to write an MDA of VG helter_skelter.
Jan  8 20:21:48 host-111 qarshd[25590]: Running cmdline: pvs -a
Jan  8 20:23:57 host-111 kernel: INFO: task kworker/0:1:18416 blocked for more than 120 seconds.
Jan  8 20:23:57 host-111 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 20:23:57 host-111 kernel: kworker/0:1     D ffff88003fc13680     0 18416      2 0x00000080
Jan  8 20:23:57 host-111 kernel: Workqueue: kmirrord do_mirror [dm_mirror]
Jan  8 20:23:57 host-111 kernel: ffff88003b2afac0 0000000000000046 ffff88003be7f1c0 ffff88003b2affd8
Jan  8 20:23:57 host-111 kernel: ffff88003b2affd8 ffff88003b2affd8 ffff88003be7f1c0 ffff88003be7f1c0
Jan  8 20:23:57 host-111 kernel: ffff88003fc13f48 7fffffffffffffff ffff88003be7f1c0 ffff88003d032230
Jan  8 20:23:57 host-111 kernel: Call Trace:
Jan  8 20:23:57 host-111 kernel: [<ffffffff81609f59>] schedule+0x29/0x70
Jan  8 20:23:57 host-111 kernel: [<ffffffff81607ea9>] schedule_timeout+0x209/0x2d0
Jan  8 20:23:57 host-111 kernel: [<ffffffff812a9dc1>] ? submit_bio+0x71/0x150
Jan  8 20:23:57 host-111 kernel: [<ffffffff81052b0f>] ? kvm_clock_get_cycles+0x1f/0x30
Jan  8 20:23:57 host-111 kernel: [<ffffffff810c8c9c>] ? ktime_get_ts+0x4c/0xf0
Jan  8 20:23:57 host-111 kernel: [<ffffffff8160971b>] io_schedule_timeout+0x9b/0xf0
Jan  8 20:23:57 host-111 kernel: [<ffffffffa000a730>] ? bvec_next_page+0x10/0x10 [dm_mod]
Jan  8 20:23:57 host-111 kernel: [<ffffffff8160a746>] wait_for_completion_io+0x116/0x170
Jan  8 20:23:57 host-111 kernel: [<ffffffff810a94d0>] ? wake_up_state+0x20/0x20
Jan  8 20:23:57 host-111 kernel: [<ffffffffa000afbf>] dm_io+0x26f/0x2e0 [dm_mod]
Jan  8 20:23:57 host-111 kernel: [<ffffffffa000a8d0>] ? dm_io_client_destroy+0x30/0x30 [dm_mod]
Jan  8 20:23:57 host-111 kernel: [<ffffffffa000a730>] ? bvec_next_page+0x10/0x10 [dm_mod]
Jan  8 20:23:57 host-111 kernel: [<ffffffffa0028111>] disk_flush+0xa1/0x18c [dm_log]
Jan  8 20:23:57 host-111 kernel: [<ffffffffa001bea2>] ? dm_rh_inc_pending+0x72/0x140 [dm_region_hash]
Jan  8 20:23:57 host-111 kernel: [<ffffffffa001b0b3>] dm_rh_flush+0x13/0x20 [dm_region_hash]
Jan  8 20:23:57 host-111 kernel: [<ffffffffa002dfd4>] do_writes+0x194/0x650 [dm_mirror]
Jan  8 20:23:57 host-111 kernel: [<ffffffffa001c080>] ? dm_rh_recovery_prepare+0x110/0x190 [dm_region_hash]
Jan  8 20:23:57 host-111 kernel: [<ffffffffa002f2bf>] do_mirror+0x10f/0x230 [dm_mirror]
Jan  8 20:23:57 host-111 kernel: [<ffffffff8108f0ab>] process_one_work+0x17b/0x470
Jan  8 20:23:57 host-111 kernel: [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
Jan  8 20:23:57 host-111 kernel: [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
Jan  8 20:23:57 host-111 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Jan  8 20:23:57 host-111 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Jan  8 20:23:57 host-111 kernel: [<ffffffff816149fc>] ret_from_fork+0x7c/0xb0
Jan  8 20:23:57 host-111 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140



3.10.0-219.el7.x86_64
lvm2-2.02.114-4.el7    BUILT: Wed Jan  7 07:07:47 CST 2015
lvm2-libs-2.02.114-4.el7    BUILT: Wed Jan  7 07:07:47 CST 2015
lvm2-cluster-2.02.114-4.el7    BUILT: Wed Jan  7 07:07:47 CST 2015
device-mapper-1.02.92-4.el7    BUILT: Wed Jan  7 07:07:47 CST 2015
device-mapper-libs-1.02.92-4.el7    BUILT: Wed Jan  7 07:07:47 CST 2015
device-mapper-event-1.02.92-4.el7    BUILT: Wed Jan  7 07:07:47 CST 2015
device-mapper-event-libs-1.02.92-4.el7    BUILT: Wed Jan  7 07:07:47 CST 2015
device-mapper-persistent-data-0.4.1-2.el7    BUILT: Wed Nov 12 12:39:46 CST 2014
cmirror-2.02.114-4.el7    BUILT: Wed Jan  7 07:07:47 CST 2015

Comment 7 Jonathan Earl Brassow 2015-01-09 21:51:20 UTC
I've hit the problem also...

########################### TEST OUTPUT:
================================================================================
Iteration 3.1 started at Thu Jan  8 22:58:46 CST 2015
================================================================================
Scenario kill_pri_log_and_pri_leg_2_legs_2_logs: Kill primary leg and primary log of synced 2 leg redundant log mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_pri_leg_pri_log_2legs_2logs_1
* sync:               1
* striped:            0
* leg devices:        /dev/sdf1 /dev/sde1
* log devices:        /dev/sdd1 /dev/sdi1
* no MDA devices:
* failpv(s):          /dev/sdf1 /dev/sdd1
* failnode(s):        bp-01
* lvmetad:            1
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Creating mirror(s) on bp-01...
bp-01: lvcreate --type mirror --mirrorlog mirrored -m 1 -n syncd_pri_leg_pri_log_2legs_2logs_1 -L 500M helter_skelter /dev/sdf1:0-2400 /dev/sde1:0-2400 /dev/sdd1:0-150 /dev/sdi1:0-150
WARNING: ext3 signature detected on /dev/helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1 at offset 1080. Wipe it? [y/n]: n
  Aborted wiping of ext3.
  1 existing signature left on the device.

Current mirror/raid device structure(s):
  LV                                                  Attr       LSize   Cpy%Sync Devices
   syncd_pri_leg_pri_log_2legs_2logs_1                 mwi-a-m--- 500.00m 2.40     syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_1_mimage_1(0)
   [syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0]      Iwi-aom--- 500.00m          /dev/sdf1(0)
   [syncd_pri_leg_pri_log_2legs_2logs_1_mimage_1]      Iwi-aom--- 500.00m          /dev/sde1(0)
   [syncd_pri_leg_pri_log_2legs_2logs_1_mlog]          mwi-aom---   4.00m 100.00   syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_1(0)
   [syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0] iwi-aom---   4.00m          /dev/sdd1(0)
   [syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_1] iwi-aom---   4.00m          /dev/sdi1(0)
   home                                                -wi-ao---- 407.33g          /dev/sda2(2016)
   root                                                -wi-ao----  50.00g          /dev/sda2(106293)
   swap                                                -wi-ao----   7.88g          /dev/sda2(0)



Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating ext on top of mirror(s) on bp-01...
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on bp-01...

PV=/dev/sdd1
        syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 1.3
PV=/dev/sdf1
Creating ext on top of mirror(s) on bp-01...
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on bp-01...

PV=/dev/sdd1
        syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 1.3
PV=/dev/sdf1
        syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0: 5.1
PV=/dev/sdd1
        syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 1.3
PV=/dev/sdf1
        syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0: 5.1
Writing verification files (checkit) to mirror(s) on...
        ---- bp-01 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        14823
Verify XIOR Stream: /tmp/checkit_syncd_pri_leg_pri_log_2legs_2logs_1
Working dir:        /mnt/syncd_pri_leg_pri_log_2legs_2logs_1/checkit

<start name="bp-01_syncd_pri_leg_pri_log_2legs_2logs_1" pid="38435" time="Thu Jan  8 22:59:08 2015" type="cmd" />
Sleeping 15 seconds to get some outsanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- bp-01 ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_pri_leg_pri_log_2legs_2logs_1
Working dir:        /mnt/syncd_pri_leg_pri_log_2legs_2logs_1/checkit


Disabling device sdf on bp-01  /dev/sdf1: read failed after 0 of 512 at 898381381632: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 898381488128: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error

Disabling device sdd on bp-01  /dev/sdd1: read failed after 0 of 512 at 898381381632: Input/output error
  /dev/sdd1: read failed after 0 of 512 at 898381488128: Input/output error
  /dev/sdd1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdd1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sdd1: read failed after 0 of 2048 at 0: Input/output error


######################## LOGS:


Jan  8 18:04:07 bp-01 kernel: INFO: task dmeventd:14685 blocked for more than 120 seconds.
Jan  8 18:04:07 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 18:04:07 bp-01 kernel: dmeventd        D ffff88041ea14680     0 14685      1 0x00000080
Jan  8 18:04:07 bp-01 kernel: ffff8803f654ba40 0000000000000086 ffff880411248000 ffff8803f654bfd8
Jan  8 18:04:07 bp-01 kernel: ffff8803f654bfd8 ffff8803f654bfd8 ffff880411248000 ffff8803f654bbb8
Jan  8 18:04:07 bp-01 kernel: ffff8803f654bbc0 7fffffffffffffff ffff880411248000 ffff8803f654bba0
Jan  8 18:04:07 bp-01 kernel: Call Trace:
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81608d09>] schedule+0x29/0x70
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81606c59>] schedule_timeout+0x209/0x2d0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff816000e5>] ? __slab_free+0x10e/0x277
Jan  8 18:04:07 bp-01 kernel: [<ffffffff816000e5>] ? __slab_free+0x10e/0x277
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81609206>] wait_for_completion+0x116/0x170
Jan  8 18:04:07 bp-01 kernel: [<ffffffff810a9470>] ? wake_up_state+0x20/0x20
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8108dd0c>] flush_workqueue+0x12c/0x5b0
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa002f52a>] mirror_presuspend+0x14a/0x1ac [dm_mirror]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8126e00b>] ? cred_has_capability+0x6b/0x120
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa00063ea>] dm_table_presuspend_targets+0x4a/0x60 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa000376b>] dm_suspend+0x6b/0x1e0 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa0008ba0>] ? table_load+0x380/0x380 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa0008d34>] dev_suspend+0x194/0x250 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa0008ba0>] ? table_load+0x380/0x380 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa00095e5>] ctl_ioctl+0x255/0x500 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff810aeb8e>] ? account_entity_dequeue+0xae/0xd0
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa00098a3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811d9225>] do_vfs_ioctl+0x2e5/0x4c0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8126e34e>] ? file_has_perm+0xae/0xc0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8109ae10>] ? hrtimer_get_res+0x50/0x50
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811d94a1>] SyS_ioctl+0xa1/0xc0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81613869>] system_call_fastpath+0x16/0x1b
Jan  8 18:04:07 bp-01 kernel: INFO: task kworker/0:1:9162 blocked for more than 120 seconds.
Jan  8 18:04:07 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 18:04:07 bp-01 kernel: kworker/0:1     D ffff880217a14680     0  9162      2 0x00000080
Jan  8 18:04:07 bp-01 kernel: Workqueue: kmirrord do_mirror [dm_mirror]
Jan  8 18:04:07 bp-01 kernel: ffff880213747b00 0000000000000046 ffff88021248db00 ffff880213747fd8
Jan  8 18:04:07 bp-01 kernel: ffff880213747fd8 ffff880213747fd8 ffff88021248db00 ffff88021248db00
Jan  8 18:04:07 bp-01 kernel: ffff880217a14f48 7fffffffffffffff ffff88021248db00 ffff8803fe531910
Jan  8 18:04:07 bp-01 kernel: Call Trace:
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81608d09>] schedule+0x29/0x70
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81606c59>] schedule_timeout+0x209/0x2d0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff812a9951>] ? submit_bio+0x71/0x150
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8101b4e9>] ? read_tsc+0x9/0x10
Jan  8 18:04:07 bp-01 kernel: [<ffffffff810c8c3c>] ? ktime_get_ts+0x4c/0xf0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff816084cb>] io_schedule_timeout+0x9b/0xf0
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa000a240>] ? bvec_next_page+0x10/0x10 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff816094f6>] wait_for_completion_io+0x116/0x170
Jan  8 18:04:07 bp-01 kernel: [<ffffffff810a9470>] ? wake_up_state+0x20/0x20
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa000aacf>] dm_io+0x26f/0x2e0 [dm_mod]
...skipping...
Jan  8 18:06:07 bp-01 kernel: INFO: task dmeventd:14685 blocked for more than 120 seconds.
Jan  8 18:06:07 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 18:06:08 bp-01 kernel: dmeventd        D ffff88041ea14680     0 14685      1 0x00000080
Jan  8 18:06:08 bp-01 kernel: ffff8803f654ba40 0000000000000086 ffff880411248000 ffff8803f654bfd8
Jan  8 18:06:08 bp-01 kernel: ffff8803f654bfd8 ffff8803f654bfd8 ffff880411248000 ffff8803f654bbb8
Jan  8 18:06:08 bp-01 kernel: ffff8803f654bbc0 7fffffffffffffff ffff880411248000 ffff8803f654bba0
Jan  8 18:06:08 bp-01 kernel: Call Trace:
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81608d09>] schedule+0x29/0x70
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81606c59>] schedule_timeout+0x209/0x2d0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff816000e5>] ? __slab_free+0x10e/0x277
Jan  8 18:06:08 bp-01 kernel: [<ffffffff816000e5>] ? __slab_free+0x10e/0x277
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81609206>] wait_for_completion+0x116/0x170
Jan  8 18:06:08 bp-01 kernel: [<ffffffff810a9470>] ? wake_up_state+0x20/0x20
Jan  8 18:06:08 bp-01 kernel: [<ffffffff8108dd0c>] flush_workqueue+0x12c/0x5b0
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa002f52a>] mirror_presuspend+0x14a/0x1ac [dm_mirror]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff8126e00b>] ? cred_has_capability+0x6b/0x120
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa00063ea>] dm_table_presuspend_targets+0x4a/0x60 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa000376b>] dm_suspend+0x6b/0x1e0 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa0008ba0>] ? table_load+0x380/0x380 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa0008d34>] dev_suspend+0x194/0x250 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa0008ba0>] ? table_load+0x380/0x380 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa00095e5>] ctl_ioctl+0x255/0x500 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff810aeb8e>] ? account_entity_dequeue+0xae/0xd0
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa00098a3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff811d9225>] do_vfs_ioctl+0x2e5/0x4c0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff8126e34e>] ? file_has_perm+0xae/0xc0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff8109ae10>] ? hrtimer_get_res+0x50/0x50
Jan  8 18:06:08 bp-01 kernel: [<ffffffff811d94a1>] SyS_ioctl+0xa1/0xc0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81613869>] system_call_fastpath+0x16/0x1b
Jan  8 18:06:08 bp-01 kernel: INFO: task kworker/0:1:9162 blocked for more than 120 seconds.
Jan  8 18:06:08 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 18:06:08 bp-01 kernel: kworker/0:1     D ffff880217a14680     0  9162      2 0x00000080
Jan  8 18:06:08 bp-01 kernel: Workqueue: kmirrord do_mirror [dm_mirror]
Jan  8 18:06:08 bp-01 kernel: ffff880213747b00 0000000000000046 ffff88021248db00 ffff880213747fd8
Jan  8 18:06:08 bp-01 kernel: ffff880213747fd8 ffff880213747fd8 ffff88021248db00 ffff88021248db00
Jan  8 18:06:08 bp-01 kernel: ffff880217a14f48 7fffffffffffffff ffff88021248db00 ffff8803fe531910
Jan  8 18:06:08 bp-01 kernel: Call Trace:
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81608d09>] schedule+0x29/0x70
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81606c59>] schedule_timeout+0x209/0x2d0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff812a9951>] ? submit_bio+0x71/0x150
Jan  8 18:06:08 bp-01 kernel: [<ffffffff8101b4e9>] ? read_tsc+0x9/0x10
Jan  8 18:06:08 bp-01 kernel: [<ffffffff810c8c3c>] ? ktime_get_ts+0x4c/0xf0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff816084cb>] io_schedule_timeout+0x9b/0xf0
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa000a240>] ? bvec_next_page+0x10/0x10 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff816094f6>] wait_for_completion_io+0x116/0x170
Jan  8 18:04:07 bp-01 kernel: dmeventd        D ffff88041ea14680     0 14685      1 0x00000080
Jan  8 18:04:07 bp-01 kernel: ffff8803f654ba40 0000000000000086 ffff880411248000 ffff8803f654bfd8
Jan  8 18:04:07 bp-01 kernel: ffff8803f654bfd8 ffff8803f654bfd8 ffff880411248000 ffff8803f654bbb8
Jan  8 18:04:07 bp-01 kernel: ffff8803f654bbc0 7fffffffffffffff ffff880411248000 ffff8803f654bba0
Jan  8 18:04:07 bp-01 kernel: Call Trace:
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81608d09>] schedule+0x29/0x70
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81606c59>] schedule_timeout+0x209/0x2d0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff816000e5>] ? __slab_free+0x10e/0x277
Jan  8 18:04:07 bp-01 kernel: [<ffffffff816000e5>] ? __slab_free+0x10e/0x277
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81609206>] wait_for_completion+0x116/0x170
Jan  8 18:04:07 bp-01 kernel: [<ffffffff810a9470>] ? wake_up_state+0x20/0x20
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8108dd0c>] flush_workqueue+0x12c/0x5b0
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa002f52a>] mirror_presuspend+0x14a/0x1ac [dm_mirror]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8126e00b>] ? cred_has_capability+0x6b/0x120
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa00063ea>] dm_table_presuspend_targets+0x4a/0x60 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa000376b>] dm_suspend+0x6b/0x1e0 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa0008ba0>] ? table_load+0x380/0x380 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa0008d34>] dev_suspend+0x194/0x250 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa0008ba0>] ? table_load+0x380/0x380 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa00095e5>] ctl_ioctl+0x255/0x500 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff810aeb8e>] ? account_entity_dequeue+0xae/0xd0
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa00098a3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811d9225>] do_vfs_ioctl+0x2e5/0x4c0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8126e34e>] ? file_has_perm+0xae/0xc0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8109ae10>] ? hrtimer_get_res+0x50/0x50
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811d94a1>] SyS_ioctl+0xa1/0xc0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81613869>] system_call_fastpath+0x16/0x1b
Jan  8 18:04:07 bp-01 kernel: INFO: task kworker/0:1:9162 blocked for more than 120 seconds.
Jan  8 18:04:07 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 18:04:07 bp-01 kernel: kworker/0:1     D ffff880217a14680     0  9162      2 0x00000080
Jan  8 18:04:07 bp-01 kernel: Workqueue: kmirrord do_mirror [dm_mirror]
Jan  8 18:04:07 bp-01 kernel: ffff880213747b00 0000000000000046 ffff88021248db00 ffff880213747fd8
Jan  8 18:04:07 bp-01 kernel: ffff880213747fd8 ffff880213747fd8 ffff88021248db00 ffff88021248db00
Jan  8 18:04:07 bp-01 kernel: ffff880217a14f48 7fffffffffffffff ffff88021248db00 ffff8803fe531910
Jan  8 18:04:07 bp-01 kernel: Call Trace:
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81608d09>] schedule+0x29/0x70
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81606c59>] schedule_timeout+0x209/0x2d0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff812a9951>] ? submit_bio+0x71/0x150
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8101b4e9>] ? read_tsc+0x9/0x10
Jan  8 18:04:07 bp-01 kernel: [<ffffffff810c8c3c>] ? ktime_get_ts+0x4c/0xf0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff816084cb>] io_schedule_timeout+0x9b/0xf0
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa000a240>] ? bvec_next_page+0x10/0x10 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff816094f6>] wait_for_completion_io+0x116/0x170
Jan  8 18:04:07 bp-01 kernel: [<ffffffff810a9470>] ? wake_up_state+0x20/0x20
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa000aacf>] dm_io+0x26f/0x2e0 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa000a3e0>] ? dm_io_client_destroy+0x30/0x30 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa000a240>] ? bvec_next_page+0x10/0x10 [dm_mod]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa0028111>] disk_flush+0xa1/0x18c [dm_log]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff810aeb8e>] ? account_entity_dequeue+0xae/0xd0
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa001b88b>] dm_rh_update_states+0x2bb/0x2e0 [dm_region_hash]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa002f263>] do_mirror+0xb3/0x230 [dm_mirror]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8108f03b>] process_one_work+0x17b/0x470
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8108fe1b>] worker_thread+0x11b/0x400
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8108fd00>] ? rescuer_thread+0x400/0x400
Jan  8 18:04:07 bp-01 kernel: [<ffffffff810971ff>] kthread+0xcf/0xe0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81097130>] ? kthread_create_on_node+0x140/0x140
Jan  8 18:04:07 bp-01 kernel: [<ffffffff816137bc>] ret_from_fork+0x7c/0xb0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81097130>] ? kthread_create_on_node+0x140/0x140
Jan  8 18:04:07 bp-01 kernel: INFO: task jbd2/dm-8-8:14771 blocked for more than 120 seconds.
Jan  8 18:04:07 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 18:04:07 bp-01 kernel: jbd2/dm-8-8     D ffff880217a54680     0 14771      2 0x00000080
Jan  8 18:04:07 bp-01 kernel: ffff8803f0febaf8 0000000000000046 ffff880410c94fa0 ffff8803f0febfd8
Jan  8 18:04:07 bp-01 kernel: ffff8803f0febfd8 ffff8803f0febfd8 ffff880410c94fa0 ffff880217a54f48
Jan  8 18:04:07 bp-01 kernel: ffff88041efc65e8 0000000000000002 ffffffff81155f50 ffff8803f0febb70
Jan  8 18:04:07 bp-01 kernel: Call Trace:
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81155f50>] ? wait_on_page_read+0x60/0x60
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8160900d>] io_schedule+0x9d/0x130
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81155f5e>] sleep_on_page+0xe/0x20
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81606de0>] __wait_on_bit+0x60/0x90
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81155ce6>] wait_on_page_bit+0x86/0xb0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81098210>] ? autoremove_wake_function+0x40/0x40
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81155e21>] filemap_fdatawait_range+0x111/0x1b0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81155ee7>] filemap_fdatawait+0x27/0x30
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa03e1a9f>] jbd2_journal_commit_transaction+0xa8f/0x19a0 [jbd2]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8101ba29>] ? sched_clock+0x9/0x10
Jan  8 18:04:07 bp-01 kernel: [<ffffffff810125c6>] ? __switch_to+0x136/0x4a0
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa03e6da9>] kjournald2+0xc9/0x260 [jbd2]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff810981d0>] ? wake_up_bit+0x30/0x30
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa03e6ce0>] ? commit_timeout+0x10/0x10 [jbd2]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff810971ff>] kthread+0xcf/0xe0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81097130>] ? kthread_create_on_node+0x140/0x140
Jan  8 18:04:07 bp-01 kernel: [<ffffffff816137bc>] ret_from_fork+0x7c/0xb0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81097130>] ? kthread_create_on_node+0x140/0x140
Jan  8 18:04:07 bp-01 kernel: INFO: task xdoio:14830 blocked for more than 120 seconds.
Jan  8 18:04:07 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 18:04:07 bp-01 kernel: xdoio           D ffff88041ea34680     0 14830  14829 0x00000080
Jan  8 18:04:07 bp-01 kernel: ffff8802119cba58 0000000000000082 ffff8800de7f2d80 ffff8802119cbfd8
Jan  8 18:04:07 bp-01 kernel: ffff8802119cbfd8 ffff8802119cbfd8 ffff8800de7f2d80 ffff88041ea34f48
Jan  8 18:04:07 bp-01 kernel: ffff88021ffa3c98 0000000000000002 ffffffffa03de720 ffff8802119cbad0
Jan  8 18:04:07 bp-01 kernel: Call Trace:
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa03de720>] ? start_this_handle+0x5d0/0x5d0 [jbd2]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8160900d>] io_schedule+0x9d/0x130
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa03de72e>] sleep_on_shadow_bh+0xe/0x20 [jbd2]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81606de0>] __wait_on_bit+0x60/0x90
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa03de720>] ? start_this_handle+0x5d0/0x5d0 [jbd2]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81606e97>] out_of_line_wait_on_bit+0x87/0xb0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81098210>] ? autoremove_wake_function+0x40/0x40
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa03dfc0d>] do_get_write_access+0x2ed/0x500 [jbd2]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811f91cd>] ? __getblk+0x2d/0x2e0
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa03dfe47>] jbd2_journal_get_write_access+0x27/0x40 [jbd2]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa065ad5b>] __ext4_journal_get_write_access+0x3b/0x80 [ext4]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa062d980>] ext4_reserve_inode_write+0x70/0xa0 [ext4]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa0630e10>] ? ext4_dirty_inode+0x40/0x60 [ext4]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa062da03>] ext4_mark_inode_dirty+0x53/0x210 [ext4]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa0630e10>] ext4_dirty_inode+0x40/0x60 [ext4]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811f1aea>] __mark_inode_dirty+0xca/0x290
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811e0b11>] update_time+0x81/0xd0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811e0c79>] touch_atime+0x119/0x160
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81158138>] generic_file_aio_read+0x5b8/0x750
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811c590d>] do_sync_read+0x8d/0xd0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811c5fec>] vfs_read+0x9c/0x170
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811c6b18>] SyS_read+0x58/0xb0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81613869>] system_call_fastpath+0x16/0x1b
Jan  8 18:06:07 bp-01 kernel: INFO: task dmeventd:14685 blocked for more than 120 seconds.
Jan  8 18:06:07 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 18:06:08 bp-01 kernel: dmeventd        D ffff88041ea14680     0 14685      1 0x00000080
Jan  8 18:06:08 bp-01 kernel: ffff8803f654ba40 0000000000000086 ffff880411248000 ffff8803f654bfd8
Jan  8 18:06:08 bp-01 kernel: ffff8803f654bfd8 ffff8803f654bfd8 ffff880411248000 ffff8803f654bbb8
Jan  8 18:06:08 bp-01 kernel: ffff8803f654bbc0 7fffffffffffffff ffff880411248000 ffff8803f654bba0
Jan  8 18:06:08 bp-01 kernel: Call Trace:
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81608d09>] schedule+0x29/0x70
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81606c59>] schedule_timeout+0x209/0x2d0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff816000e5>] ? __slab_free+0x10e/0x277
Jan  8 18:06:08 bp-01 kernel: [<ffffffff816000e5>] ? __slab_free+0x10e/0x277
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81609206>] wait_for_completion+0x116/0x170
Jan  8 18:06:08 bp-01 kernel: [<ffffffff810a9470>] ? wake_up_state+0x20/0x20
Jan  8 18:06:08 bp-01 kernel: [<ffffffff8108dd0c>] flush_workqueue+0x12c/0x5b0
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa002f52a>] mirror_presuspend+0x14a/0x1ac [dm_mirror]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff8126e00b>] ? cred_has_capability+0x6b/0x120
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa00063ea>] dm_table_presuspend_targets+0x4a/0x60 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa000376b>] dm_suspend+0x6b/0x1e0 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa0008ba0>] ? table_load+0x380/0x380 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa0008d34>] dev_suspend+0x194/0x250 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa0008ba0>] ? table_load+0x380/0x380 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa00095e5>] ctl_ioctl+0x255/0x500 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff810aeb8e>] ? account_entity_dequeue+0xae/0xd0
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa00098a3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff811d9225>] do_vfs_ioctl+0x2e5/0x4c0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff8126e34e>] ? file_has_perm+0xae/0xc0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff8109ae10>] ? hrtimer_get_res+0x50/0x50
Jan  8 18:06:08 bp-01 kernel: [<ffffffff811d94a1>] SyS_ioctl+0xa1/0xc0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81613869>] system_call_fastpath+0x16/0x1b
Jan  8 18:06:08 bp-01 kernel: INFO: task kworker/0:1:9162 blocked for more than 120 seconds.
Jan  8 18:06:08 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 18:06:08 bp-01 kernel: kworker/0:1     D ffff880217a14680     0  9162      2 0x00000080
Jan  8 18:06:08 bp-01 kernel: Workqueue: kmirrord do_mirror [dm_mirror]
Jan  8 18:06:08 bp-01 kernel: ffff880213747b00 0000000000000046 ffff88021248db00 ffff880213747fd8
Jan  8 18:06:08 bp-01 kernel: ffff880213747fd8 ffff880213747fd8 ffff88021248db00 ffff88021248db00
Jan  8 18:06:08 bp-01 kernel: ffff880217a14f48 7fffffffffffffff ffff88021248db00 ffff8803fe531910
Jan  8 18:06:08 bp-01 kernel: Call Trace:
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81608d09>] schedule+0x29/0x70
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81606c59>] schedule_timeout+0x209/0x2d0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff812a9951>] ? submit_bio+0x71/0x150
Jan  8 18:06:08 bp-01 kernel: [<ffffffff8101b4e9>] ? read_tsc+0x9/0x10
Jan  8 18:06:08 bp-01 kernel: [<ffffffff810c8c3c>] ? ktime_get_ts+0x4c/0xf0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff816084cb>] io_schedule_timeout+0x9b/0xf0
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa000a240>] ? bvec_next_page+0x10/0x10 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff816094f6>] wait_for_completion_io+0x116/0x170
Jan  8 18:06:08 bp-01 kernel: [<ffffffff810a9470>] ? wake_up_state+0x20/0x20
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa000aacf>] dm_io+0x26f/0x2e0 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa000a3e0>] ? dm_io_client_destroy+0x30/0x30 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa000a240>] ? bvec_next_page+0x10/0x10 [dm_mod]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa0028111>] disk_flush+0xa1/0x18c [dm_log]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff810aeb8e>] ? account_entity_dequeue+0xae/0xd0
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa001b88b>] dm_rh_update_states+0x2bb/0x2e0 [dm_region_hash]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa002f263>] do_mirror+0xb3/0x230 [dm_mirror]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff8108f03b>] process_one_work+0x17b/0x470
Jan  8 18:06:08 bp-01 kernel: [<ffffffff8108fe1b>] worker_thread+0x11b/0x400
Jan  8 18:06:08 bp-01 kernel: [<ffffffff8108fd00>] ? rescuer_thread+0x400/0x400
Jan  8 18:06:08 bp-01 kernel: [<ffffffff810971ff>] kthread+0xcf/0xe0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81097130>] ? kthread_create_on_node+0x140/0x140
Jan  8 18:06:08 bp-01 kernel: [<ffffffff816137bc>] ret_from_fork+0x7c/0xb0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81097130>] ? kthread_create_on_node+0x140/0x140
Jan  8 18:06:08 bp-01 kernel: INFO: task jbd2/dm-8-8:14771 blocked for more than 120 seconds.
Jan  8 18:06:08 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 18:06:08 bp-01 kernel: jbd2/dm-8-8     D ffff880217a54680     0 14771      2 0x00000080
Jan  8 18:06:08 bp-01 kernel: ffff8803f0febaf8 0000000000000046 ffff880410c94fa0 ffff8803f0febfd8
Jan  8 18:06:08 bp-01 kernel: ffff8803f0febfd8 ffff8803f0febfd8 ffff880410c94fa0 ffff880217a54f48
Jan  8 18:06:08 bp-01 kernel: ffff88041efc65e8 0000000000000002 ffffffff81155f50 ffff8803f0febb70
Jan  8 18:06:08 bp-01 kernel: Call Trace:
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81155f50>] ? wait_on_page_read+0x60/0x60
Jan  8 18:06:08 bp-01 kernel: [<ffffffff8160900d>] io_schedule+0x9d/0x130
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81155f5e>] sleep_on_page+0xe/0x20
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81606de0>] __wait_on_bit+0x60/0x90
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81155ce6>] wait_on_page_bit+0x86/0xb0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81098210>] ? autoremove_wake_function+0x40/0x40
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81155e21>] filemap_fdatawait_range+0x111/0x1b0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81155ee7>] filemap_fdatawait+0x27/0x30
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa03e1a9f>] jbd2_journal_commit_transaction+0xa8f/0x19a0 [jbd2]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff8101ba29>] ? sched_clock+0x9/0x10
Jan  8 18:06:08 bp-01 kernel: [<ffffffff810125c6>] ? __switch_to+0x136/0x4a0
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa03e6da9>] kjournald2+0xc9/0x260 [jbd2]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff810981d0>] ? wake_up_bit+0x30/0x30
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa03e6ce0>] ? commit_timeout+0x10/0x10 [jbd2]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff810971ff>] kthread+0xcf/0xe0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81097130>] ? kthread_create_on_node+0x140/0x140
Jan  8 18:06:08 bp-01 kernel: [<ffffffff816137bc>] ret_from_fork+0x7c/0xb0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81097130>] ? kthread_create_on_node+0x140/0x140
Jan  8 18:06:08 bp-01 kernel: INFO: task xdoio:14830 blocked for more than 120 seconds.
Jan  8 18:06:08 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 18:06:08 bp-01 kernel: xdoio           D ffff88041ea34680     0 14830  14829 0x00000080
Jan  8 18:06:08 bp-01 kernel: ffff8802119cba58 0000000000000082 ffff8800de7f2d80 ffff8802119cbfd8
Jan  8 18:06:08 bp-01 kernel: ffff8802119cbfd8 ffff8802119cbfd8 ffff8800de7f2d80 ffff88041ea34f48
Jan  8 18:06:08 bp-01 kernel: ffff88021ffa3c98 0000000000000002 ffffffffa03de720 ffff8802119cbad0
Jan  8 18:06:08 bp-01 kernel: Call Trace:
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa03de720>] ? start_this_handle+0x5d0/0x5d0 [jbd2]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff8160900d>] io_schedule+0x9d/0x130
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa03de72e>] sleep_on_shadow_bh+0xe/0x20 [jbd2]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81606de0>] __wait_on_bit+0x60/0x90
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa03de720>] ? start_this_handle+0x5d0/0x5d0 [jbd2]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81606e97>] out_of_line_wait_on_bit+0x87/0xb0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81098210>] ? autoremove_wake_function+0x40/0x40
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa03dfc0d>] do_get_write_access+0x2ed/0x500 [jbd2]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff811f91cd>] ? __getblk+0x2d/0x2e0
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa03dfe47>] jbd2_journal_get_write_access+0x27/0x40 [jbd2]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa065ad5b>] __ext4_journal_get_write_access+0x3b/0x80 [ext4]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa062d980>] ext4_reserve_inode_write+0x70/0xa0 [ext4]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa0630e10>] ? ext4_dirty_inode+0x40/0x60 [ext4]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa062da03>] ext4_mark_inode_dirty+0x53/0x210 [ext4]
Jan  8 18:06:08 bp-01 kernel: [<ffffffffa0630e10>] ext4_dirty_inode+0x40/0x60 [ext4]
Jan  8 18:06:08 bp-01 kernel: [<ffffffff811f1aea>] __mark_inode_dirty+0xca/0x290
Jan  8 18:06:08 bp-01 kernel: [<ffffffff811e0b11>] update_time+0x81/0xd0
Jan  8 18:06:08 bp-01 kernel: [<ffffffff811e0c79>] touch_atime+0x119/0x160
Jan  8 18:06:08 bp-01 kernel: [<ffffffff81158138>] generic_file_aio_read+0x5b8/0x750
Jan  8 18:06:08 bp-01 kernel: [<ffffffff811c590d>] do_sync_read+0x8d/0xd0
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa03de720>] ? start_this_handle+0x5d0/0x5d0 [jbd2]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff8160900d>] io_schedule+0x9d/0x130
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa03de72e>] sleep_on_shadow_bh+0xe/0x20 [jbd2]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81606de0>] __wait_on_bit+0x60/0x90
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa03de720>] ? start_this_handle+0x5d0/0x5d0 [jbd2]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81606e97>] out_of_line_wait_on_bit+0x87/0xb0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81098210>] ? autoremove_wake_function+0x40/0x40
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa03dfc0d>] do_get_write_access+0x2ed/0x500 [jbd2]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811f91cd>] ? __getblk+0x2d/0x2e0
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa03dfe47>] jbd2_journal_get_write_access+0x27/0x40 [jbd2]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa065ad5b>] __ext4_journal_get_write_access+0x3b/0x80 [ext4]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa062d980>] ext4_reserve_inode_write+0x70/0xa0 [ext4]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa0630e10>] ? ext4_dirty_inode+0x40/0x60 [ext4]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa062da03>] ext4_mark_inode_dirty+0x53/0x210 [ext4]
Jan  8 18:04:07 bp-01 kernel: [<ffffffffa0630e10>] ext4_dirty_inode+0x40/0x60 [ext4]
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811f1aea>] __mark_inode_dirty+0xca/0x290
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811e0b11>] update_time+0x81/0xd0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811e0c79>] touch_atime+0x119/0x160
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81158138>] generic_file_aio_read+0x5b8/0x750
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811c590d>] do_sync_read+0x8d/0xd0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811c5fec>] vfs_read+0x9c/0x170
Jan  8 18:04:07 bp-01 kernel: [<ffffffff811c6b18>] SyS_read+0x58/0xb0
Jan  8 18:04:07 bp-01 kernel: [<ffffffff81613869>] system_call_fastpath+0x16/0x1b

Comment 8 Jonathan Earl Brassow 2015-01-09 22:02:15 UTC
I'd like to try to reproduce this without a "mirrored" log - otherwise, we are dealing with two mirrors.

Comment 9 Jonathan Earl Brassow 2015-01-13 21:25:18 UTC
it is simple to create an issue using mirrored-log mirrors.  Disable the log device slightly after the image device.  dmeventd will try to respond to the failure in the leg, but before any I/O action is taken the device for the log is failed which causes the system to wait to flush I/O through the log.  This cannot be accomplished until the log is fixed, which can't happen until the current process attempting to repair the image finishes - a catch-22.

This type of bug will never be addressed.  Instead, users should use the "raid1" segment type to avoid this type of double-fault scenario.  (There is no excuse not to, especially since the "mirror" segment type should only ever be used if cluster mirroring is required and cluster mirroring does not support a mirrored log.)

I have also encountered a hang when using a core log mirror.  However, I hit this when a drive failed to respond during sync I/O.  This caused the QA test to stall, but that is to be expected, since the mirror can't reach 100% in-sync if there was an I/O failure during I/O.

I am continuing to test and see if I can reproduce other hang issues.  The original description and comment 2 seem like legitimate issues that should be worked on.  It seems odd that in each of those cases, sync% is not 100%.  I would love to see the logs for both and 'dmsetup status' for each as well.

Comment 10 Jonathan Earl Brassow 2015-01-13 21:33:16 UTC
Reproduced again:

================================================================================
Iteration 9.1 started at Tue Jan 13 15:17:53 CST 2015
================================================================================
Scenario kill_primary_synced_2_legs: Kill primary leg of synced 2 leg mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_primary_2legs_1
* sync:               1
* striped:            0
* leg devices:        /dev/sdh1 /dev/sdi1
* log devices:        /dev/sdd1
* no MDA devices:
* failpv(s):          /dev/sdh1
* failnode(s):        bp-01
* lvmetad:            1
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Creating mirror(s) on bp-01...
bp-01: lvcreate --type mirror -m 1 -n syncd_primary_2legs_1 -L 500M helter_skelter /dev/sdh1:0-2400 /dev/sdi1:0-2400 /dev/sdd1:0-150
WARNING: ext3 signature detected on /dev/helter_skelter/syncd_primary_2legs_1 at offset 1080. Wipe it? [y/n]: n
  Aborted wiping of ext3.
  1 existing signature left on the device.

Current mirror/raid device structure(s):
  LV                               Attr       LSize   Cpy%Sync Devices
   syncd_primary_2legs_1            mwi-a-m--- 500.00m 2.40     syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
   [syncd_primary_2legs_1_mimage_0] Iwi-aom--- 500.00m          /dev/sdh1(0)
   [syncd_primary_2legs_1_mimage_1] Iwi-aom--- 500.00m          /dev/sdi1(0)
   [syncd_primary_2legs_1_mlog]     lwi-aom---   4.00m          /dev/sdd1(0)
   home                             -wi-ao---- 407.33g          /dev/sda2(2016)
   root                             -wi-ao----  50.00g          /dev/sda2(106293)
   swap                             -wi-ao----   7.88g          /dev/sda2(0)



Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating ext on top of mirror(s) on bp-01...
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on bp-01...

PV=/dev/sdh1
        syncd_primary_2legs_1_mimage_0: 5.1
PV=/dev/sdh1
        syncd_primary_2legs_1_mimage_0: 5.1
Writing verification files (checkit) to mirror(s) on...
        ---- bp-01 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        18859
Verify XIOR Stream: /tmp/checkit_syncd_primary_2legs_1
Working dir:        /mnt/syncd_primary_2legs_1/checkit

<start name="bp-01_syncd_primary_2legs_1" pid="32191" time="Tue Jan 13 15:18:13 2015" type="cmd" />
Sleeping 15 seconds to get some outsanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- bp-01 ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_primary_2legs_1
Working dir:        /mnt/syncd_primary_2legs_1/checkit


Disabling device sdh on bp-01  /dev/sdh1: read failed after 0 of 512 at 898381381632: Input/output error
  /dev/sdh1: read failed after 0 of 512 at 898381488128: Input/output error
  /dev/sdh1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error


Getting recovery check start time from /var/log/messages: Jan 13 10:19
Attempting I/O to cause mirror down conversion(s) on bp-01
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.304708 s, 138 MB/s
[HANG]


[root@bp-01 ~]# dmsetup status
helter_skelter-syncd_primary_2legs_1_mimage_0: 0 1024000 linear
rhel_bp--01-home: 0 854237184 linear
helter_skelter-syncd_primary_2legs_1_mlog: 0 8192 linear
helter_skelter-syncd_primary_2legs_1: 0 1024000 mirror 2 253:4 253:5 0/1000 1 AF 3 disk 253:3 A
rhel_bp--01-swap: 0 16515072 linear
rhel_bp--01-root: 0 104857600 linear
helter_skelter-syncd_primary_2legs_1_mimage_1: 0 1024000 linear


From the logs:
Jan 13 10:19:42 bp-01 kernel: device-mapper: raid1: Write error during recovery (error = 0x1)
Jan 13 10:19:42 bp-01 kernel: sd 7:0:0:6: rejecting I/O to offline device
...
Jan 13 10:21:27 bp-01 lvm[1125]: helter_skelter/syncd_primary_2legs_1: Converted: 0.0%
Jan 13 10:21:42 bp-01 lvm[1125]: helter_skelter/syncd_primary_2legs_1: Converted: 0.0%
Jan 13 10:21:45 bp-01 kernel: INFO: task kworker/u130:0:12421 blocked for more than 120 seconds.
Jan 13 10:21:45 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 13 10:21:45 bp-01 kernel: kworker/u130:0  D ffff88041ea94680     0 12421      2 0x00000080
Jan 13 10:21:45 bp-01 kernel: Workqueue: writeback bdi_writeback_workfn (flush-253:6)
Jan 13 10:21:45 bp-01 kernel: ffff88040ea9f918 0000000000000046 ffff8804006a4fa0 ffff88040ea9ffd8
Jan 13 10:21:45 bp-01 kernel: ffff88040ea9ffd8 ffff88040ea9ffd8 ffff8804006a4fa0 ffff88041ea94f48
Jan 13 10:21:45 bp-01 kernel: ffff88021ff96be8 0000000000000002 ffffffff81155f50 ffff88040ea9f990
Jan 13 10:21:45 bp-01 kernel: Call Trace:
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81155f50>] ? wait_on_page_read+0x60/0x60
Jan 13 10:21:45 bp-01 kernel: [<ffffffff8160900d>] io_schedule+0x9d/0x130
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81155f5e>] sleep_on_page+0xe/0x20
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81606de0>] __wait_on_bit+0x60/0x90
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81155ce6>] wait_on_page_bit+0x86/0xb0
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81098210>] ? autoremove_wake_function+0x40/0x40
Jan 13 10:21:45 bp-01 kernel: [<ffffffffa0582e93>] mpage_prepare_extent_to_map+0x1e3/0x2e0 [ext4]
Jan 13 10:21:45 bp-01 kernel: [<ffffffffa0587023>] ext4_writepages+0x463/0xd60 [ext4]
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81162b3e>] do_writepages+0x1e/0x40
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811ef2e0>] __writeback_single_inode+0x40/0x220
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811efd5e>] writeback_sb_inodes+0x25e/0x420
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811f069f>] wb_writeback+0xff/0x2f0
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811f1f15>] bdi_writeback_workfn+0x115/0x450
Jan 13 10:21:45 bp-01 kernel: [<ffffffff810a9346>] ? try_to_wake_up+0x1b6/0x280
Jan 13 10:21:45 bp-01 kernel: [<ffffffff8108f03b>] process_one_work+0x17b/0x470
Jan 13 10:21:45 bp-01 kernel: [<ffffffff8108fe1b>] worker_thread+0x11b/0x400
Jan 13 10:21:45 bp-01 kernel: [<ffffffff8108fd00>] ? rescuer_thread+0x400/0x400
Jan 13 10:21:45 bp-01 kernel: [<ffffffff810971ff>] kthread+0xcf/0xe0
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81097130>] ? kthread_create_on_node+0x140/0x140
Jan 13 10:21:45 bp-01 kernel: [<ffffffff816137bc>] ret_from_fork+0x7c/0xb0
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81097130>] ? kthread_create_on_node+0x140/0x140
Jan 13 10:21:45 bp-01 kernel: INFO: task xdoio:18866 blocked for more than 120 seconds.
Jan 13 10:21:45 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 13 10:21:45 bp-01 kernel: xdoio           D ffff880217a94680     0 18866  18865 0x00000080
Jan 13 10:21:45 bp-01 kernel: ffff880211873978 0000000000000082 ffff880204e2c440 ffff880211873fd8
Jan 13 10:21:45 bp-01 kernel: ffff880211873fd8 ffff880211873fd8 ffff880204e2c440 ffff880217a94f48
Jan 13 10:21:45 bp-01 kernel: ffff880211873a00 0000000000000002 ffffffff81155f50 ffff8802118739f0
Jan 13 10:21:45 bp-01 kernel: Call Trace:
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81155f50>] ? wait_on_page_read+0x60/0x60
Jan 13 10:21:45 bp-01 kernel: [<ffffffff8160900d>] io_schedule+0x9d/0x130
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81155f5e>] sleep_on_page+0xe/0x20
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81606f1b>] __wait_on_bit_lock+0x5b/0xc0
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81156078>] __lock_page+0x78/0xa0
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81098210>] ? autoremove_wake_function+0x40/0x40
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811569b4>] __find_lock_page+0x54/0x70
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811574a2>] grab_cache_page_write_begin+0x62/0xd0
Jan 13 10:21:45 bp-01 kernel: [<ffffffffa058821c>] ext4_write_begin+0x9c/0x420 [ext4]
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811564cd>] generic_file_buffered_write+0x11d/0x290
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81158635>] __generic_file_aio_write+0x1d5/0x3e0
Jan 13 10:21:45 bp-01 kernel: [<ffffffff810abe05>] ? sched_clock_cpu+0xb5/0x100
Jan 13 10:21:45 bp-01 kernel: [<ffffffff8115889d>] generic_file_aio_write+0x5d/0xc0
Jan 13 10:21:45 bp-01 kernel: [<ffffffffa057db69>] ext4_file_write+0xa9/0x450 [ext4]
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811e18db>] ? iput+0x3b/0x180
Jan 13 10:21:45 bp-01 kernel: [<ffffffff810d1da5>] ? drop_futex_key_refs.isra.13+0x35/0x70
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811c5a99>] do_sync_readv_writev+0x79/0xd0
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811c6f8e>] do_readv_writev+0xce/0x260
Jan 13 10:21:45 bp-01 kernel: [<ffffffffa057dac0>] ? ext4_file_mmap+0x30/0x30 [ext4]
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811c5950>] ? do_sync_read+0xd0/0xd0
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811c71b5>] vfs_writev+0x35/0x60
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811c730c>] SyS_writev+0x5c/0xd0
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81613869>] system_call_fastpath+0x16/0x1b
Jan 13 10:21:45 bp-01 kernel: INFO: task sync:18962 blocked for more than 120 seconds.
Jan 13 10:21:45 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 13 10:21:45 bp-01 kernel: sync            D ffff88041ea34680     0 18962  18961 0x00000080
Jan 13 10:21:45 bp-01 kernel: ffff880036d6fd50 0000000000000086 ffff880204e2cfa0 ffff880036d6ffd8
Jan 13 10:21:45 bp-01 kernel: ffff880036d6ffd8 ffff880036d6ffd8 ffff880204e2cfa0 ffff880036d6fe80
Jan 13 10:21:45 bp-01 kernel: ffff880036d6fe88 7fffffffffffffff ffff880204e2cfa0 ffffffff811f5d00
Jan 13 10:21:45 bp-01 kernel: Call Trace:
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811f5d00>] ? generic_write_sync+0x60/0x60
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81608d09>] schedule+0x29/0x70
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81606c59>] schedule_timeout+0x209/0x2d0
Jan 13 10:21:45 bp-01 kernel: [<ffffffff8108cf76>] ? __queue_work+0x136/0x320
Jan 13 10:21:45 bp-01 kernel: [<ffffffff8108d22a>] ? __queue_delayed_work+0xaa/0x1a0
Jan 13 10:21:45 bp-01 kernel: [<ffffffff8108d591>] ? try_to_grab_pending+0xb1/0x160
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811f5d00>] ? generic_write_sync+0x60/0x60
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81609206>] wait_for_completion+0x116/0x170
Jan 13 10:21:45 bp-01 kernel: [<ffffffff810a9470>] ? wake_up_state+0x20/0x20
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811ef787>] sync_inodes_sb+0xb7/0x1e0
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811f5d00>] ? generic_write_sync+0x60/0x60
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811f5d19>] sync_inodes_one_sb+0x19/0x20
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811c9b22>] iterate_supers+0xb2/0x110
Jan 13 10:21:45 bp-01 kernel: [<ffffffff811f5fd4>] sys_sync+0x44/0xb0
Jan 13 10:21:45 bp-01 kernel: [<ffffffff81613869>] system_call_fastpath+0x16/0x1b
Jan 13 10:21:57 bp-01 lvm[1125]: helter_skelter/syncd_primary_2legs_1: Converted: 0.0%
Jan 13 10:22:12 bp-01 lvm[1125]: helter_skelter/syncd_primary_2legs_1: Converted: 0.0%

Comment 11 Jonathan Earl Brassow 2015-01-13 22:37:54 UTC
This is easily reproducible with the following steps (no STS tests required):
[root@bp-01 ~]# devices vg
  LV            Attr       Cpy%Sync Devices
  lv            mwi-a-m--- 100.00   lv_mimage_0(0),lv_mimage_1(0)
  [lv_mimage_0] iwi-aom---          /dev/sdb1(0)
  [lv_mimage_1] iwi-aom---          /dev/sdc1(0)
  [lv_mlog]     lwi-aom---          /dev/sdi1(0)
[root@bp-01 ~]# off.sh sdb
Turning off sdb
[root@bp-01 ~]# killall -9 dmeventd
[root@bp-01 ~]# dd if=/dev/zero of=/dev/vg/lv bs=4M count=1&
[1] 3576
[root@bp-01 ~]# dmsetup status
rhel_bp--01-home: 0 854237184 linear
vg-lv: 0 1024000 mirror 2 253:4 253:5 1000/1000 1 DA 3 disk 253:3 A
vg-lv_mimage_1: 0 1024000 linear
vg-lv_mimage_0: 0 1024000 linear
rhel_bp--01-swap: 0 16515072 linear
rhel_bp--01-root: 0 104857600 linear
vg-lv_mlog: 0 8192 linear
[root@bp-01 ~]# lvconvert --repair vg/lv
  /dev/sdb1: read failed after 0 of 512 at 4096: Input/output error
  WARNING: Failed to write an MDA of VG vg.
  WARNING: Failed to write an MDA of VG vg.
1+0 records in
1+0 records out
4194304 bytes (4.2 MB) copied, 14.0624 s, 298 kB/s
  WARNING: Failed to write an MDA of VG vg.
  WARNING: Failed to write an MDA of VG vg.
  WARNING: Failed to write an MDA of VG vg.
  Mirror status: 1 of 2 images failed.
Attempt to replace failed mirror images (requires full device resync)? [y/n]: y
  Trying to up-convert to 2 images, 1 logs.
  WARNING: Failed to write an MDA of VG vg.
  WARNING: Failed to write an MDA of VG vg.
  vg/lv: Converted: 0.0%
  vg/lv: Converted: 0.0%
  vg/lv: Converted: 0.0%
... forever.

Someone has broken the mirror repair code, it seems.

Comment 12 Jonathan Earl Brassow 2015-01-13 22:38:56 UTC
Status after the failure is interesting...

[root@bp-01 ~]# dmsetup status
rhel_bp--01-home: 0 854237184 linear
vg-lv: 0 1024000 mirror 2 253:4 253:5 0/1000 1 AF 3 disk 253:3 A
vg-lv_mimage_1: 0 1024000 linear
vg-lv_mimage_0: 0 1024000 linear
rhel_bp--01-swap: 0 16515072 linear
rhel_bp--01-root: 0 104857600 linear
vg-lv_mlog: 0 8192 linear

Comment 13 Jonathan Earl Brassow 2015-01-14 16:12:24 UTC
Created attachment 980097 [details]
manual repair with captured output following device failure

This is the captured output of a hung lvconvert --repair process when the fault_policy is set to "allocate" after the primary image of a mirror is failed.

The problem is clearly that the failed device is being used to allocate the replacement image.

Comment 14 Jonathan Earl Brassow 2015-01-14 16:14:45 UTC
Created attachment 980099 [details]
Performing upconvert when there is a failed PV present.

After a successful repair of a mirror after a device failure when using the "remove" fault_policy, we attempt to up-convert the linear device back to a mirror manually.  Again the result is to choose the failed PV.

Comment 15 Petr Rockai 2015-01-14 17:08:24 UTC
Fix in eac4e1e939c6fe7eba0d2c3865e25b8ed43fe6d9 -- we call lvscan --cache in the dmeventd plugin for RAID volumes but have neglected to do the same for old-style mirror volumes.

Comment 17 Nenad Peric 2015-01-22 14:34:46 UTC
[root@tardis-03 mnt]# dmsetup status
vg-mirror_lv_mlog: 0 8192 linear 
rhel_tardis--03-home: 0 471474176 linear 
rhel_tardis--03-swap: 0 8388608 linear 
rhel_tardis--03-root: 0 104857600 linear 
vg-mirror_lv: 0 6291456 mirror 2 253:4 253:5 6143/6144 1 DA 3 disk 253:3 A
vg-mirror_lv_mimage_1: 0 6291456 linear 
vg-mirror_lv_mimage_0: 0 6291456 linear 

[root@tardis-03 mnt]# lvconvert --repair vg/mirror_lv
  /dev/sdd1: read failed after 0 of 1024 at 99994566656: Input/output error
  /dev/sdd1: read failed after 0 of 1024 at 99994685440: Input/output error
  /dev/sdd1: read failed after 0 of 1024 at 0: Input/output error
  /dev/sdd1: read failed after 0 of 1024 at 4096: Input/output error
  /dev/sdd1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid MEfKLO-Bh2r-03R4-XvFE-9YFN-w65N-A1ra4S.
  Mirror status: 1 of 2 images failed.
Attempt to replace failed mirror images (requires full device resync)? [y/n]: y
  Trying to up-convert to 2 images, 1 logs.
  vg/mirror_lv: Converted: 0.5%
  vg/mirror_lv: Converted: 22.3%
  vg/mirror_lv: Converted: 42.6%
  vg/mirror_lv: Converted: 56.1%
  vg/mirror_lv: Converted: 65.0%
  vg/mirror_lv: Converted: 74.0%
  vg/mirror_lv: Converted: 82.9%
  vg/mirror_lv: Converted: 92.1%
  vg/mirror_lv: Converted: 100.0%
[root@tardis-03 mnt]# 


marking VERIFIED with:

lvm2-2.02.115-2.el7.x86_64

Comment 19 errata-xmlrpc 2015-03-05 13:08:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0513.html