RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1366760 - converted raid1 -> mirror takeover volume deadlocks during first mimage failure attempt
Summary: converted raid1 -> mirror takeover volume deadlocks during first mimage failu...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.3
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: rc
: ---
Assignee: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-08-12 17:50 UTC by Corey Marthaler
Modified: 2021-09-08 18:54 UTC (History)
9 users (show)

Fixed In Version: lvm2-2.02.164-4.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-04 04:17:39 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
syslog during failure (135.63 KB, text/plain)
2016-08-12 22:10 UTC, Corey Marthaler
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1445 0 normal SHIPPED_LIVE lvm2 bug fix and enhancement update 2016-11-03 13:46:41 UTC

Description Corey Marthaler 2016-08-12 17:50:20 UTC
Description of problem:
This failure scenario was run on a mirror that had been converted from a raid1 volume.



mirror device(s) (transform) found in revolution_1 on host-082
Creating ext on top of mirror(s) on host-082...
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on host-082...

================================================================================
Iteration 0.1 started at Fri Aug 12 12:31:17 CDT 2016
================================================================================
ACTUAL LEG ORDER: /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
Scenario kill_non_primary_leg: Kill non primary leg
********* Mirror info for this scenario *********
* mirrors:            transform
* leg devices:        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
* log devices:        /dev/sdh1
* failpv(s):          /dev/sdc1
* failnode(s):        host-082
* lvmetad:            1
* leg fault policy:   remove
* log fault policy:   allocate
*************************************************


Current mirror/raid device structure(s):
  LV                   Attr       LSize   Cpy%Sync Devices                                                                                
   transform            Mwi-aom---  10.00g 100.00   transform_mimage_0(0),transform_mimage_1(0),transform_mimage_2(0),transform_mimage_3(0)
   [transform_mimage_0] iwi-aom---  10.00g          /dev/sda1(1)                                                                           
   [transform_mimage_1] iwi-aom---  10.00g          /dev/sdb1(1)                                                                           
   [transform_mimage_2] iwi-aom---  10.00g          /dev/sdc1(1)                                                                           
   [transform_mimage_3] iwi-aom---  10.00g          /dev/sdd1(1)                                                                           
   [transform_mlog]     lwn-aom---   4.00m          /dev/sdh1(0)                                                                           
   root                 -wi-ao----   6.67g          /dev/vda2(205)                                                                         
   swap                 -wi-ao---- 820.00m          /dev/vda2(0)                                                                           


PV=/dev/sdc1
        transform_mimage_2: 6
PV=/dev/sdc1
        transform_mimage_2: 6

Writing verification files (checkit) to mirror(s) on...
        ---- host-082 ----

Sleeping 15 seconds to get some outsanding I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-082 ----


Disabling device sdc on host-082

Getting recovery check start time from /var/log/messages: Aug 12 12:31
Attempting I/O to cause mirror down conversion(s) on host-082
dd if=/dev/zero of=/mnt/transform/ddfile count=10 bs=4M

[DEADLOCK]

Aug 12 12:34:56 host-082 kernel: INFO: task kworker/u2:2:26234 blocked for more than 120 seconds.
Aug 12 12:34:56 host-082 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 12 12:34:56 host-082 kernel: kworker/u2:2    D ffff88001c412030     0 26234      2 0x00000080
Aug 12 12:34:56 host-082 kernel: Workqueue: writeback bdi_writeback_workfn (flush-253:10)
Aug 12 12:34:56 host-082 kernel: ffff88003d7938e8 0000000000000046 ffff88001b592f10 ffff88003d793fd8
Aug 12 12:34:56 host-082 kernel: ffff88003d793fd8 ffff88003d793fd8 ffff88001b592f10 ffff88003a2ad000
Aug 12 12:34:56 host-082 kernel: ffff88003a2ad078 0000000000000007 ffff88001c413000 ffff88001c412030
Aug 12 12:34:56 host-082 kernel: Call Trace:
Aug 12 12:34:56 host-082 kernel: [<ffffffff816933d9>] schedule+0x29/0x70
Aug 12 12:34:56 host-082 kernel: [<ffffffffa041b085>] wait_transaction_locked+0x85/0xd0 [jbd2]
Aug 12 12:34:56 host-082 kernel: [<ffffffff810b12d0>] ? wake_up_atomic_t+0x30/0x30
Aug 12 12:34:56 host-082 kernel: [<ffffffffa041b400>] start_this_handle+0x2b0/0x5d0 [jbd2]
Aug 12 12:34:56 host-082 kernel: [<ffffffff815d6a00>] ? tcp_write_xmit+0x2b0/0xce0
Aug 12 12:34:56 host-082 kernel: [<ffffffff811dba1a>] ? kmem_cache_alloc+0x1ba/0x1e0
Aug 12 12:34:56 host-082 kernel: [<ffffffffa041b913>] jbd2__journal_start+0xf3/0x1e0 [jbd2]
Aug 12 12:34:56 host-082 kernel: [<ffffffffa044283c>] ? ext4_writepages+0x42c/0xd60 [ext4]
Aug 12 12:34:56 host-082 kernel: [<ffffffffa0473989>] __ext4_journal_start_sb+0x69/0xe0 [ext4]
Aug 12 12:34:56 host-082 kernel: [<ffffffffa044283c>] ext4_writepages+0x42c/0xd60 [ext4]
Aug 12 12:34:56 host-082 kernel: [<ffffffff8118c04e>] do_writepages+0x1e/0x40
Aug 12 12:34:56 host-082 kernel: [<ffffffff81226510>] __writeback_single_inode+0x40/0x210
Aug 12 12:34:56 host-082 kernel: [<ffffffff812271fe>] writeback_sb_inodes+0x25e/0x420
Aug 12 12:34:56 host-082 kernel: [<ffffffff8122745f>] __writeback_inodes_wb+0x9f/0xd0
Aug 12 12:34:56 host-082 kernel: [<ffffffff81227ca3>] wb_writeback+0x263/0x2f0
Aug 12 12:34:56 host-082 kernel: [<ffffffff8118b6f0>] ? bdi_dirty_limit+0x40/0xe0
Aug 12 12:34:56 host-082 kernel: [<ffffffff81229b9c>] bdi_writeback_workfn+0x1cc/0x460
Aug 12 12:34:56 host-082 kernel: [<ffffffff810a7c2b>] process_one_work+0x17b/0x470
Aug 12 12:34:56 host-082 kernel: [<ffffffff810a8a66>] worker_thread+0x126/0x410
Aug 12 12:34:56 host-082 kernel: [<ffffffff810a8940>] ? rescuer_thread+0x460/0x460
Aug 12 12:34:56 host-082 kernel: [<ffffffff810b021f>] kthread+0xcf/0xe0
Aug 12 12:34:56 host-082 kernel: [<ffffffff810b0150>] ? kthread_create_on_node+0x140/0x140
Aug 12 12:34:56 host-082 kernel: [<ffffffff8169e198>] ret_from_fork+0x58/0x90
Aug 12 12:34:56 host-082 kernel: [<ffffffff810b0150>] ? kthread_create_on_node+0x140/0x140
Aug 12 12:34:56 host-082 kernel: INFO: task jbd2/dm-10-8:26666 blocked for more than 120 seconds.
Aug 12 12:34:56 host-082 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 12 12:34:56 host-082 kernel: jbd2/dm-10-8    D ffff88001c413000     0 26666      2 0x00000080
Aug 12 12:34:56 host-082 kernel: ffff88003b7efc90 0000000000000046 ffff88003cc8edd0 ffff88003b7effd8
Aug 12 12:34:56 host-082 kernel: ffff88003b7effd8 ffff88003b7effd8 ffff88003cc8edd0 ffff88003b7efda0
Aug 12 12:34:56 host-082 kernel: ffff88003a2ad0c0 ffff88003cc8edd0 ffff88003b7efd88 ffff88001c413000
Aug 12 12:34:56 host-082 kernel: Call Trace:
Aug 12 12:34:56 host-082 kernel: [<ffffffff816933d9>] schedule+0x29/0x70
Aug 12 12:34:56 host-082 kernel: [<ffffffffa041e0e8>] jbd2_journal_commit_transaction+0x248/0x1990 [jbd2]
Aug 12 12:34:56 host-082 kernel: [<ffffffff810ca57e>] ? account_entity_dequeue+0xae/0xd0
Aug 12 12:34:56 host-082 kernel: [<ffffffff810ce06c>] ? dequeue_entity+0x11c/0x5d0
Aug 12 12:34:56 host-082 kernel: [<ffffffff81060abf>] ? kvm_clock_read+0x1f/0x30
Aug 12 12:34:56 host-082 kernel: [<ffffffff81029569>] ? __switch_to+0xd9/0x4c0
Aug 12 12:34:56 host-082 kernel: [<ffffffff810b12d0>] ? wake_up_atomic_t+0x30/0x30
Aug 12 12:34:56 host-082 kernel: [<ffffffff81097dce>] ? try_to_del_timer_sync+0x5e/0x90
Aug 12 12:34:56 host-082 kernel: [<ffffffffa0423d39>] kjournald2+0xc9/0x260 [jbd2]
Aug 12 12:34:56 host-082 kernel: [<ffffffff810b12d0>] ? wake_up_atomic_t+0x30/0x30
Aug 12 12:34:56 host-082 kernel: [<ffffffffa0423c70>] ? commit_timeout+0x10/0x10 [jbd2]
Aug 12 12:34:56 host-082 kernel: [<ffffffff810b021f>] kthread+0xcf/0xe0
Aug 12 12:34:56 host-082 kernel: [<ffffffff810b0150>] ? kthread_create_on_node+0x140/0x140
Aug 12 12:34:56 host-082 kernel: [<ffffffff8169e198>] ret_from_fork+0x58/0x90
Aug 12 12:34:56 host-082 kernel: [<ffffffff810b0150>] ? kthread_create_on_node+0x140/0x140
Aug 12 12:34:56 host-082 kernel: INFO: task ext4lazyinit:26668 blocked for more than 120 seconds.
Aug 12 12:34:56 host-082 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 12 12:34:56 host-082 kernel: ext4lazyinit    D 0000000000001000     0 26668      2 0x00000080
Aug 12 12:34:56 host-082 kernel: ffff88002904fbf0 0000000000000046 ffff88003cc8de20 ffff88002904ffd8
Aug 12 12:34:56 host-082 kernel: ffff88002904ffd8 ffff88002904ffd8 ffff88003cc8de20 ffff88003fc16c40
Aug 12 12:34:56 host-082 kernel: 0000000000000000 7fffffffffffffff ffff88003cc8de20 0000000000001000
Aug 12 12:34:56 host-082 kernel: Call Trace:
Aug 12 12:34:56 host-082 kernel: [<ffffffff816933d9>] schedule+0x29/0x70
Aug 12 12:34:56 host-082 kernel: [<ffffffff81690e29>] schedule_timeout+0x239/0x2d0
Aug 12 12:34:56 host-082 kernel: [<ffffffff812e0000>] ? hash_recvmsg+0x80/0x100
Aug 12 12:34:56 host-082 kernel: [<ffffffff81060aef>] ? kvm_clock_get_cycles+0x1f/0x30
Aug 12 12:34:56 host-082 kernel: [<ffffffff810e903c>] ? ktime_get_ts64+0x4c/0xf0
Aug 12 12:34:56 host-082 kernel: [<ffffffff8169297e>] io_schedule_timeout+0xae/0x130
Aug 12 12:34:56 host-082 kernel: [<ffffffff81693aa6>] wait_for_completion_io+0x116/0x170
Aug 12 12:34:56 host-082 kernel: [<ffffffff810c4b90>] ? wake_up_state+0x20/0x20
Aug 12 12:34:56 host-082 kernel: [<ffffffff812f4cec>] blkdev_issue_zeroout+0x24c/0x260
Aug 12 12:34:56 host-082 kernel: [<ffffffffa041cdb2>] ? jbd2_journal_get_write_access+0x32/0x40 [jbd2]
Aug 12 12:34:56 host-082 kernel: [<ffffffffa043c86c>] ext4_init_inode_table+0x17c/0x3c0 [ext4]
Aug 12 12:34:56 host-082 kernel: [<ffffffffa0456985>] ext4_lazyinit_thread+0x275/0x2f0 [ext4]
Aug 12 12:34:56 host-082 kernel: [<ffffffffa0456710>] ? ext4_unregister_li_request+0x60/0x60 [ext4]
Aug 12 12:34:56 host-082 kernel: [<ffffffff810b021f>] kthread+0xcf/0xe0
Aug 12 12:34:56 host-082 kernel: [<ffffffff810b0150>] ? kthread_create_on_node+0x140/0x140
Aug 12 12:34:56 host-082 kernel: [<ffffffff8169e198>] ret_from_fork+0x58/0x90
Aug 12 12:34:56 host-082 kernel: [<ffffffff810b0150>] ? kthread_create_on_node+0x140/0x140
Aug 12 12:34:56 host-082 kernel: INFO: task xdoio:26824 blocked for more than 120 seconds.


Version-Release number of selected component (if applicable):
3.10.0-489.el7.x86_64

lvm2-2.02.163-1.el7    BUILT: Wed Aug 10 06:53:21 CDT 2016
lvm2-libs-2.02.163-1.el7    BUILT: Wed Aug 10 06:53:21 CDT 2016
lvm2-cluster-2.02.163-1.el7    BUILT: Wed Aug 10 06:53:21 CDT 2016
device-mapper-1.02.133-1.el7    BUILT: Wed Aug 10 06:53:21 CDT 2016
device-mapper-libs-1.02.133-1.el7    BUILT: Wed Aug 10 06:53:21 CDT 2016
device-mapper-event-1.02.133-1.el7    BUILT: Wed Aug 10 06:53:21 CDT 2016
device-mapper-event-libs-1.02.133-1.el7    BUILT: Wed Aug 10 06:53:21 CDT 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 05:29:13 CDT 2016
cmirror-2.02.163-1.el7    BUILT: Wed Aug 10 06:53:21 CDT 2016
sanlock-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
sanlock-lib-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
lvm2-lockd-2.02.163-1.el7    BUILT: Wed Aug 10 06:53:21 CDT 2016

Comment 2 Corey Marthaler 2016-08-12 22:09:03 UTC
This is reproducible. Fresh reboot, no existing or left over volumes/dm devices.


[root@host-082 ~]# lvcreate --type raid1 -L 3G -m 3 -n mirror_1 revolution_9
  Logical volume "mirror_1" created.

[root@host-082 ~]# lvs -a -o +devices
  LV                  VG            Attr       LSize Log Cpy%Sync Devices
  mirror_1            revolution_9  rwi-a-r--- 3.00g     100.00   mirror_1_rimage_0(0),mirror_1_rimage_1(0),mirror_1_rimage_2(0),mirror_1_rimage_3(0)
  [mirror_1_rimage_0] revolution_9  iwi-aor--- 3.00g              /dev/sda1(1)
  [mirror_1_rimage_1] revolution_9  iwi-aor--- 3.00g              /dev/sdb1(1)
  [mirror_1_rimage_2] revolution_9  iwi-aor--- 3.00g              /dev/sdc1(1)
  [mirror_1_rimage_3] revolution_9  iwi-aor--- 3.00g              /dev/sdd1(1)
  [mirror_1_rmeta_0]  revolution_9  ewi-aor--- 4.00m              /dev/sda1(0)
  [mirror_1_rmeta_1]  revolution_9  ewi-aor--- 4.00m              /dev/sdb1(0)
  [mirror_1_rmeta_2]  revolution_9  ewi-aor--- 4.00m              /dev/sdc1(0)
  [mirror_1_rmeta_3]  revolution_9  ewi-aor--- 4.00m              /dev/sdd1(0)

[root@host-082 ~]# lvconvert --type mirror --mirrorlog disk revolution_9/mirror_1
  Using default stripesize 64.00 KiB.
Are you sure you want to convert revolution_9/mirror_1 back to the older "mirror" type? [y/n]: y
  Monitoring revolution_9/mirror_1 failed.
  Logical volume revolution_9/mirror_1 successfully converted.

[root@host-082 ~]# lvs -a -o +devices
  LV                  VG            Attr       LSize Log             Cpy%Sync Devices
  mirror_1            revolution_9  mwi-a-m--- 3.00g [mirror_1_mlog] 100.00   mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0),mirror_1_mimage_3(0)
  [mirror_1_mimage_0] revolution_9  iwi-aom--- 3.00g                          /dev/sda1(1)
  [mirror_1_mimage_1] revolution_9  iwi-aom--- 3.00g                          /dev/sdb1(1)
  [mirror_1_mimage_2] revolution_9  iwi-aom--- 3.00g                          /dev/sdc1(1)
  [mirror_1_mimage_3] revolution_9  iwi-aom--- 3.00g                          /dev/sdd1(1)
  [mirror_1_mlog]     revolution_9  lwn-aom--- 4.00m                          /dev/sdh1(0)


mirror device(s) (mirror_1) found in revolution_9 on host-082
Creating ext on top of mirror(s) on host-082...
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on host-082...

================================================================================
Iteration 0.1 started at Fri Aug 12 16:26:43 CDT 2016
================================================================================
ACTUAL LEG ORDER: /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
Scenario kill_random_legs: Kill random legs
********* Mirror info for this scenario *********
* mirrors:            mirror_1
* leg devices:        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
* log devices:        /dev/sdh1
* failpv(s):          /dev/sda1
* failnode(s):        host-082
* lvmetad:            1
* leg fault policy:   remove
* log fault policy:   allocate
*************************************************


Current mirror/raid device structure(s):
  LV                  Attr       LSize   Cpy%Sync Devices
  mirror_1            mwi-aom---   3.00g 100.00   mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0),mirror_1_mimage_3(0)
  [mirror_1_mimage_0] iwi-aom---   3.00g          /dev/sda1(1)
  [mirror_1_mimage_1] iwi-aom---   3.00g          /dev/sdb1(1)
  [mirror_1_mimage_2] iwi-aom---   3.00g          /dev/sdc1(1)
  [mirror_1_mimage_3] iwi-aom---   3.00g          /dev/sdd1(1)
  [mirror_1_mlog]     lwn-aom---   4.00m          /dev/sdh1(0)

PV=/dev/sda1
        mirror_1_mimage_0: 6
PV=/dev/sda1
        mirror_1_mimage_0: 6

Writing verification files (checkit) to mirror(s) on...
        ---- host-082 ----

Sleeping 15 seconds to get some outsanding I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-082 ----

Disabling device sda on host-082rescan device...
  /dev/sda1: read failed after 0 of 4096 at 26838958080: Input/output error
  /dev/sda1: read failed after 0 of 4096 at 26839048192: Input/output error
  /dev/sda1: read failed after 0 of 4096 at 0: Input/output error
  /dev/sda1: read failed after 0 of 4096 at 4096: Input/output error

Getting recovery check start time from /var/log/messages: Aug 12 16:27
Attempting I/O to cause mirror down conversion(s) on host-082
dd if=/dev/zero of=/mnt/mirror_1/ddfile count=10 bs=4M
[I/O STUCK]


[root@host-082 ~]# lvs -a -o +devices
 WARNING: Device for PV GLbWf5-Q3Ep-xvwf-d78T-1RNX-mxzP-OUt9Bj not found or rejected by a filter.
 WARNING: Couldn't find all devices for LV revolution_9/mirror_1_mimage_0 while checking used and assumed devices.
 LV                  VG            Attr       LSize  Log             Cpy%Sync Devices
 mirror_1            revolution_9  mwi-aom-p- 3.00g  [mirror_1_mlog] 99.87    mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0),mirror_1_mimage_3(0)
 [mirror_1_mimage_0] revolution_9  Iwi-aom-p- 3.00g                           [unknown](1)
 [mirror_1_mimage_1] revolution_9  Iwi-aom--- 3.00g                           /dev/sdb1(1)
 [mirror_1_mimage_2] revolution_9  Iwi-aom--- 3.00g                           /dev/sdc1(1)
 [mirror_1_mimage_3] revolution_9  Iwi-aom--- 3.00g                           /dev/sdd1(1)
 [mirror_1_mlog]     revolution_9  lwn-aom--- 4.00m                           /dev/sdh1(0)

[root@host-082 ~]# dmsetup status
revolution_9-mirror_1: 0 6291456 mirror 4 253:3 253:5 253:7 253:9 6141/6144 1 DAAA 3 disk 253:11 A
revolution_9-mirror_1_mlog: 0 8192 linear 
revolution_9-mirror_1_mimage_3: 0 6291456 linear 
revolution_9-mirror_1_mimage_2: 0 6291456 linear 
revolution_9-mirror_1_mimage_1: 0 6291456 linear 
revolution_9-mirror_1_mimage_0: 0 6291456 linear

Comment 3 Corey Marthaler 2016-08-12 22:10:46 UTC
Created attachment 1190585 [details]
syslog during failure

Comment 5 Corey Marthaler 2016-08-19 16:42:19 UTC
This continues to exist in the latest test kernel/lvm rpms.

3.10.0-493.el7.bz1367223.x86_64

lvm2-2.02.164-2.el7    BUILT: Tue Aug 16 05:43:50 CDT 2016
lvm2-libs-2.02.164-2.el7    BUILT: Tue Aug 16 05:43:50 CDT 2016
lvm2-cluster-2.02.164-2.el7    BUILT: Tue Aug 16 05:43:50 CDT 2016
device-mapper-1.02.133-2.el7    BUILT: Tue Aug 16 05:43:50 CDT 2016
device-mapper-libs-1.02.133-2.el7    BUILT: Tue Aug 16 05:43:50 CDT 2016
device-mapper-event-1.02.133-2.el7    BUILT: Tue Aug 16 05:43:50 CDT 2016
device-mapper-event-libs-1.02.133-2.el7    BUILT: Tue Aug 16 05:43:50 CDT 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 05:29:13 CDT 2016
cmirror-2.02.164-2.el7    BUILT: Tue Aug 16 05:43:50 CDT 2016
sanlock-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
sanlock-lib-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
lvm2-lockd-2.02.164-2.el7    BUILT: Tue Aug 16 05:43:50 CDT 2016




# Create original raid1 volume                                                                                                                                                                                                                                          
[root@host-126 ~]# lvcreate --type raid1 -L 5G -n transform -m 2 revolution_1                                                                                                                                                                                           
  Logical volume "transform" created.                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                        
[root@host-126 ~]# lvs -a -o +devices                                                                                                                                                                                                                                   
  LV                   VG            Attr       LSize Cpy%Sync Devices                                                                                                                                                                                                  
  transform            revolution_1  rwi-a-r--- 5.00g 8.20     transform_rimage_0(0),transform_rimage_1(0),transform_rimage_2(0)                                                                                                                                        
  [transform_rimage_0] revolution_1  Iwi-aor--- 5.00g          /dev/sdb1(1)                                                                                                                                                                                             
  [transform_rimage_1] revolution_1  Iwi-aor--- 5.00g          /dev/sdg1(1)                                                                                                                                                                                             
  [transform_rimage_2] revolution_1  Iwi-aor--- 5.00g          /dev/sda1(1)                                                                                                                                                                                             
  [transform_rmeta_0]  revolution_1  ewi-aor--- 4.00m          /dev/sdb1(0)                                                                                                                                                                                             
  [transform_rmeta_1]  revolution_1  ewi-aor--- 4.00m          /dev/sdg1(0)                                                                                                                                                                                             
  [transform_rmeta_2]  revolution_1  ewi-aor--- 4.00m          /dev/sda1(0)                                                                                                                                                                                             
                                                                                                                                                                                                                                                                        
# Wait until it's 100% synced and then convert to mirror type                                                                                                                                                                                                           
[root@host-126 ~]# lvconvert --type mirror revolution_1/transform                                                                                                                                                                                                       
Are you sure you want to convert revolution_1/transform back to the older "mirror" type? [y/n]: y                                                                                                                                                                       
  Monitoring revolution_1/transform failed.                                                                                                                                                                                                                             
  Logical volume revolution_1/transform successfully converted.                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                        
[root@host-126 ~]# lvs -a -o +devices                                                                                                                                                                                                                                   
  LV                   VG            Attr       LSize  Log              Cpy%Sync Devices                                                                                                                                                                                
  transform            revolution_1  mwi-a-m--- 5.00g  [transform_mlog] 100.00   transform_mimage_0(0),transform_mimage_1(0),transform_mimage_2(0)
  [transform_mimage_0] revolution_1  iwi-aom--- 5.00g                            /dev/sdb1(1)
  [transform_mimage_1] revolution_1  iwi-aom--- 5.00g                            /dev/sdg1(1)
  [transform_mimage_2] revolution_1  iwi-aom--- 5.00g                            /dev/sda1(1)
  [transform_mlog]     revolution_1  lwn-aom--- 4.00m                            /dev/sdh1(0)


# Start failure test

mirror device(s) (transform) found in revolution_1 on host-126                                                                                                                                                                                   
Creating ext on top of mirror(s) on host-126...                                                                                                                                                                                                  
mke2fs 1.42.9 (28-Dec-2013)                                                                                                                                                                                                                      
Mounting mirrored ext filesystems on host-126...                                                                                                                                                                                                 
                                                                                                                                                                                                                                                 
================================================================================                                                                                                                                                                 
Iteration 0.1 started at Fri Aug 19 11:25:36 CDT 2016                                                                                                                                                                                            
================================================================================                                                                                                                                                                 
ACTUAL LEG ORDER: /dev/sdb1 /dev/sdg1 /dev/sda1                                                                                                                                                                                                  
Scenario kill_non_primary_leg: Kill non primary leg                                                                                                                                                                                              
********* Mirror info for this scenario *********
* mirrors:            transform
* leg devices:        /dev/sdb1 /dev/sdg1 /dev/sda1
* log devices:        /dev/sdh1
* failpv(s):          /dev/sda1
* failnode(s):        host-126
* lvmetad:            1
* leg fault policy:   remove
* log fault policy:   allocate
*************************************************


Current mirror/raid device structure(s):
  LV                   Attr       LSize   Cpy%Sync Devices                                                          
   transform            mwi-aom---   5.00g 100.00   transform_mimage_0(0),transform_mimage_1(0),transform_mimage_2(0)
   [transform_mimage_0] iwi-aom---   5.00g          /dev/sdb1(1)                                                     
   [transform_mimage_1] iwi-aom---   5.00g          /dev/sdg1(1)                                                     
   [transform_mimage_2] iwi-aom---   5.00g          /dev/sda1(1)                                                     
   [transform_mlog]     lwn-aom---   4.00m          /dev/sdh1(0)                                                     

PV=/dev/sda1
        transform_mimage_2: 6
PV=/dev/sda1
        transform_mimage_2: 6

Writing verification files (checkit) to mirror(s) on...
        ---- host-126 ----

Sleeping 15 seconds to get some outsanding I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-126 ----


Disabling device sda on host-126
rescan (pvscan --cache) device...
  /dev/sda1: read failed after 0 of 512 at 21467824128: Input/output error
  /dev/sda1: read failed after 0 of 512 at 21467938816: Input/output error
  /dev/sda1: read failed after 0 of 512 at 0: Input/output error
  /dev/sda1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sda1: read failed after 0 of 2048 at 0: Input/output error


Getting recovery check start time from /var/log/messages: Aug 19 11:26
Attempting I/O to cause mirror down conversion(s) on host-126
dd if=/dev/zero of=/mnt/transform/ddfile count=10 bs=4M

# Recovery is now stuck at 99.92% ??
[root@host-126 ~]# lvs -a -o +devices
  WARNING: Device for PV Wc2ieE-cV55-bd2L-l319-FsTq-snCa-JyPt0A not found or rejected by a filter.
  WARNING: Couldn't find all devices for LV revolution_1/transform_mimage_2 while checking used and assumed devices.
  LV                   VG            Attr       LSize    Log              Cpy%Sync Devices
  transform            revolution_1  mwi-aom-p-   5.00g  [transform_mlog] 99.92    transform_mimage_0(0),transform_mimage_1(0),transform_mimage_2(0)
  [transform_mimage_0] revolution_1  Iwi-aom---   5.00g                            /dev/sdb1(1)
  [transform_mimage_1] revolution_1  Iwi-aom---   5.00g                            /dev/sdg1(1)
  [transform_mimage_2] revolution_1  Iwi-aom-p-   5.00g                            [unknown](1)
  [transform_mlog]     revolution_1  lwn-aom---   4.00m                            /dev/sdh1(0)


Aug 19 11:26:09 host-126 qarshd[3109]: Running cmdline: dd if=/dev/zero of=/mnt/transform/ddfile count=10 bs=4M
Aug 19 11:26:09 host-126 kernel: sd 3:0:0:1: rejecting I/O to offline device
Aug 19 11:26:09 host-126 kernel: sd 3:0:0:1: rejecting I/O to offline device
Aug 19 11:26:09 host-126 kernel: sd 3:0:0:1: rejecting I/O to offline device
Aug 19 11:26:09 host-126 kernel: sd 3:0:0:1: rejecting I/O to offline device
Aug 19 11:26:09 host-126 kernel: sd 3:0:0:1: rejecting I/O to offline device
Aug 19 11:27:39 host-126 systemd: Starting Cleanup of Temporary Directories...
Aug 19 11:27:39 host-126 systemd: Started Cleanup of Temporary Directories.
Aug 19 11:28:14 host-126 kernel: INFO: task jbd2/dm-8-8:2918 blocked for more than 120 seconds.
Aug 19 11:28:14 host-126 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 19 11:28:14 host-126 kernel: jbd2/dm-8-8     D ffffffff81688150     0  2918      2 0x00000080
Aug 19 11:28:14 host-126 kernel: ffff88001b1e7ac0 0000000000000046 ffff88001b1e7fd8 0000000000016c40
Aug 19 11:28:14 host-126 kernel: ffff88001b1e7fd8 0000000000016c40 ffff88003b613ec0 ffff88003fc16c40
Aug 19 11:28:14 host-126 kernel: 0000000000000000 7fffffffffffffff ffff88003ff64a48 ffffffff81688150
Aug 19 11:28:14 host-126 kernel: Call Trace:
Aug 19 11:28:14 host-126 kernel: [<ffffffff81688150>] ? bit_wait+0x50/0x50
Aug 19 11:28:14 host-126 kernel: [<ffffffff8168a0e9>] schedule+0x29/0x70
Aug 19 11:28:14 host-126 kernel: [<ffffffff81687b29>] schedule_timeout+0x239/0x2d0
Aug 19 11:28:14 host-126 kernel: [<ffffffff81060aef>] ? kvm_clock_get_cycles+0x1f/0x30
Aug 19 11:28:14 host-126 kernel: [<ffffffff810e961c>] ? ktime_get_ts64+0x4c/0xf0
Aug 19 11:28:14 host-126 kernel: [<ffffffff81060aef>] ? kvm_clock_get_cycles+0x1f/0x30
Aug 19 11:28:14 host-126 kernel: [<ffffffff81688150>] ? bit_wait+0x50/0x50
Aug 19 11:28:14 host-126 kernel: [<ffffffff8168968e>] io_schedule_timeout+0xae/0x140
Aug 19 11:28:14 host-126 kernel: [<ffffffff81689738>] io_schedule+0x18/0x20
Aug 19 11:28:14 host-126 kernel: [<ffffffff81688161>] bit_wait_io+0x11/0x50
Aug 19 11:28:14 host-126 kernel: [<ffffffff81687c85>] __wait_on_bit+0x65/0x90
Aug 19 11:28:14 host-126 kernel: [<ffffffff81688150>] ? bit_wait+0x50/0x50
Aug 19 11:28:14 host-126 kernel: [<ffffffff81687d31>] out_of_line_wait_on_bit+0x81/0xb0
Aug 19 11:28:14 host-126 kernel: [<ffffffff810b18c0>] ? wake_bit_function+0x40/0x40
Aug 19 11:28:14 host-126 kernel: [<ffffffff81232eaa>] __wait_on_buffer+0x2a/0x30
Aug 19 11:28:14 host-126 kernel: [<ffffffffa03cce98>] jbd2_journal_commit_transaction+0xe88/0x1990 [jbd2]
Aug 19 11:28:14 host-126 kernel: [<ffffffff81029599>] ? __switch_to+0xd9/0x4c0
Aug 19 11:28:14 host-126 kernel: [<ffffffffa03d1ea9>] kjournald2+0xc9/0x270 [jbd2]
Aug 19 11:28:14 host-126 kernel: [<ffffffff810b1800>] ? wake_up_atomic_t+0x30/0x30
Aug 19 11:28:14 host-126 kernel: [<ffffffffa03d1de0>] ? commit_timeout+0x10/0x10 [jbd2]
Aug 19 11:28:14 host-126 kernel: [<ffffffff810b073f>] kthread+0xcf/0xe0
Aug 19 11:28:14 host-126 kernel: [<ffffffff810b0670>] ? kthread_create_on_node+0x140/0x140
Aug 19 11:28:14 host-126 kernel: [<ffffffff81694f18>] ret_from_fork+0x58/0x90
Aug 19 11:28:14 host-126 kernel: [<ffffffff810b0670>] ? kthread_create_on_node+0x140/0x140
Aug 19 11:28:14 host-126 kernel: INFO: task xdoio:3091 blocked for more than 120 seconds.
Aug 19 11:28:14 host-126 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 19 11:28:14 host-126 kernel: xdoio           D ffffffff81688150     0  3091   3090 0x00000080
Aug 19 11:28:14 host-126 kernel: ffff88001b23b960 0000000000000086 ffff88001b23bfd8 0000000000016c40
Aug 19 11:28:14 host-126 kernel: ffff88001b23bfd8 0000000000016c40 ffff88003b6a4e70 ffff88003fc16c40
Aug 19 11:28:14 host-126 kernel: 0000000000000000 7fffffffffffffff ffff88003ff5af80 ffffffff81688150
Aug 19 11:28:14 host-126 kernel: Call Trace:
Aug 19 11:28:14 host-126 kernel: [<ffffffff81688150>] ? bit_wait+0x50/0x50
Aug 19 11:28:14 host-126 kernel: [<ffffffff8168a0e9>] schedule+0x29/0x70
Aug 19 11:28:14 host-126 kernel: [<ffffffff81687b29>] schedule_timeout+0x239/0x2d0
Aug 19 11:28:14 host-126 kernel: [<ffffffff810bc194>] ? __wake_up+0x44/0x50
Aug 19 11:28:14 host-126 kernel: [<ffffffff81060aef>] ? kvm_clock_get_cycles+0x1f/0x30
Aug 19 11:28:14 host-126 kernel: [<ffffffff81688150>] ? bit_wait+0x50/0x50
Aug 19 11:28:14 host-126 kernel: [<ffffffff8168968e>] io_schedule_timeout+0xae/0x140
Aug 19 11:28:14 host-126 kernel: [<ffffffff81689738>] io_schedule+0x18/0x20
Aug 19 11:28:14 host-126 kernel: [<ffffffff81688161>] bit_wait_io+0x11/0x50
Aug 19 11:28:14 host-126 kernel: [<ffffffff81687c85>] __wait_on_bit+0x65/0x90
Aug 19 11:28:14 host-126 kernel: [<ffffffff81688150>] ? bit_wait+0x50/0x50
Aug 19 11:28:14 host-126 kernel: [<ffffffff81687d31>] out_of_line_wait_on_bit+0x81/0xb0
Aug 19 11:28:14 host-126 kernel: [<ffffffff810b18c0>] ? wake_bit_function+0x40/0x40
Aug 19 11:28:14 host-126 kernel: [<ffffffffa03caca5>] do_get_write_access+0x285/0x4d0 [jbd2]
Aug 19 11:28:14 host-126 kernel: [<ffffffff8123326d>] ? __getblk+0x2d/0x2e0
Aug 19 11:28:14 host-126 kernel: [<ffffffffa03caf17>] jbd2_journal_get_write_access+0x27/0x40 [jbd2]
Aug 19 11:28:14 host-126 kernel: [<ffffffffa0421bfb>] __ext4_journal_get_write_access+0x3b/0x80 [ext4]
Aug 19 11:28:14 host-126 kernel: [<ffffffffa03f01d0>] ext4_reserve_inode_write+0x70/0xa0 [ext4]
Aug 19 11:28:14 host-126 kernel: [<ffffffffa03f38f0>] ? ext4_dirty_inode+0x40/0x60 [ext4]
Aug 19 11:28:14 host-126 kernel: [<ffffffffa03f0253>] ext4_mark_inode_dirty+0x53/0x210 [ext4]
Aug 19 11:28:14 host-126 kernel: [<ffffffffa03f38f0>] ext4_dirty_inode+0x40/0x60 [ext4]
Aug 19 11:28:14 host-126 kernel: [<ffffffff81228f8a>] __mark_inode_dirty+0xca/0x290
Aug 19 11:28:14 host-126 kernel: [<ffffffff81219ba1>] update_time+0x81/0xd0
Aug 19 11:28:14 host-126 kernel: [<ffffffff81219d09>] touch_atime+0x119/0x160
Aug 19 11:28:14 host-126 kernel: [<ffffffff811824f8>] generic_file_aio_read+0x5d8/0x790
Aug 19 11:28:14 host-126 kernel: [<ffffffff811fda3d>] do_sync_read+0x8d/0xd0
Aug 19 11:28:14 host-126 kernel: [<ffffffff811fe1ee>] vfs_read+0x9e/0x170
Aug 19 11:28:14 host-126 kernel: [<ffffffff811fedbf>] SyS_read+0x7f/0xe0
Aug 19 11:28:14 host-126 kernel: [<ffffffff81694fc9>] system_call_fastpath+0x16/0x1b

Comment 8 Heinz Mauelshagen 2016-08-22 14:31:25 UTC
I can trigger this deadlock with just "lvcreate -m3 --ty mirror -L128M -nm VG" on an upstream kernel 4.8.0-rc2 (2-way "mirror" works fine).
Trying REHL7 kernel and without lvm2 now.

Comment 9 Heinz Mauelshagen 2016-08-23 14:46:25 UTC
Instrumenting/debugging I think I got to the core of this: half way port of dm-log.c to the new bio_set_op_attr(). Testing patch...

Comment 10 Alasdair Kergon 2016-08-23 18:02:35 UTC
On the userspace side, the fix for Bug 1366749 may be related to this too, as after the takeover, the raid1 monitoring dso will still have been in use even though it's become a mirror.

Comment 12 Heinz Mauelshagen 2016-08-30 14:31:07 UTC
dm-log fix with ommit 9c5a559d9495bba6e5b6c5ee4e8e2f2b71088684 queued in linux-dm.

Comment 13 Heinz Mauelshagen 2016-08-31 14:48:33 UTC
Corey,
based on comment #10 and new uspace package, update uspace and retest, please.

Comment 14 Heinz Mauelshagen 2016-08-31 15:06:24 UTC
The kernel fix as of comment #12 is irrelevant for the RHEL kernel, because it doesn't haveisn't needed for RHEL7, because that kernel doesn't have the bio operations changes backported yet.

Let's see if Alasdair's uspace fix WRT monitoring will solve this one.

Comment 15 Heinz Mauelshagen 2016-08-31 15:22:46 UTC
LVM2 upstream commit ids related:
952e4133283ac4a593d0fa8daccdc55c907379de
97ee5a1cd326cac4fa5642395d13ebbf13965970
2d65ce9711dace22cebb370b0d93fd7a98213bd8

Comment 16 Heinz Mauelshagen 2016-08-31 15:29:59 UTC
If bz1366749 (tested successfully here) tests ok for Corey, we can close this one as a duplicate.

Changing component to lvm2 and changing state to POST until further QE testing.

Comment 19 Corey Marthaler 2016-08-31 22:58:30 UTC
This does appear to be fixed in the latest kernel. First two iterations passed so far (this was always triggered on the very first iteration). Letting test run all night...

3.10.0-497.el7.x86_64
lvm2-2.02.164-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
lvm2-libs-2.02.164-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
lvm2-cluster-2.02.164-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
device-mapper-1.02.133-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
device-mapper-libs-1.02.133-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
device-mapper-event-1.02.133-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
device-mapper-event-libs-1.02.133-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 05:29:13 CDT 2016



[root@host-119 ~]# lvcreate --type raid1 -L 5G -n transform -m 2 revolution_9
  Logical volume "transform" created.

[root@host-119 ~]# lvs -a -o +devices
  LV                   VG            Attr       LSize   Cpy%Sync Devices
  transform            revolution_9  rwi-a-r---   5.00g 100.00   transform_rimage_0(0),transform_rimage_1(0),transform_rimage_2(0)
  [transform_rimage_0] revolution_9  iwi-aor---   5.00g          /dev/sda1(1)
  [transform_rimage_1] revolution_9  iwi-aor---   5.00g          /dev/sdb1(1)
  [transform_rimage_2] revolution_9  iwi-aor---   5.00g          /dev/sdc1(1)
  [transform_rmeta_0]  revolution_9  ewi-aor---   4.00m          /dev/sda1(0)
  [transform_rmeta_1]  revolution_9  ewi-aor---   4.00m          /dev/sdb1(0)
  [transform_rmeta_2]  revolution_9  ewi-aor---   4.00m          /dev/sdc1(0)

[root@host-119 ~]# lvconvert --type mirror revolution_9/transform
Are you sure you want to convert revolution_9/transform back to the older "mirror" type? [y/n]: y
  Logical volume revolution_9/transform successfully converted.

[root@host-119 ~]# lvs -a -o +devices  
  LV                   VG            Attr       LSize    Log              Cpy%Sync Devices
  transform            revolution_9  mwi-a-m---   5.00g  [transform_mlog] 100.00   transform_mimage_0(0),transform_mimage_1(0),transform_mimage_2(0)
  [transform_mimage_0] revolution_9  iwi-aom---   5.00g                            /dev/sda1(1)
  [transform_mimage_1] revolution_9  iwi-aom---   5.00g                            /dev/sdb1(1)
  [transform_mimage_2] revolution_9  iwi-aom---   5.00g                            /dev/sdc1(1)
  [transform_mlog]     revolution_9  lwn-aom---   4.00m                            /dev/sdh1(0)


mirror device(s) (transform) found in revolution_9 on host-119
Creating ext on top of mirror(s) on host-119...
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on host-119...

================================================================================
Iteration 0.1 started at Wed Aug 31 17:46:46 CDT 2016
================================================================================
ACTUAL LEG ORDER: /dev/sda1 /dev/sdb1 /dev/sdc1
Scenario kill_random_legs: Kill random legs
********* Mirror info for this scenario *********
* mirrors:            transform
* leg devices:        /dev/sda1 /dev/sdb1 /dev/sdc1
* log devices:        /dev/sdh1
* failpv(s):          /dev/sda1
* failnode(s):        host-119
* lvmetad:            1
* leg fault policy:   remove
* log fault policy:   allocate
*************************************************


Current mirror/raid device structure(s):
  LV                   Attr       LSize   Cpy%Sync Devices
   transform            mwi-aom---   5.00g 100.00   transform_mimage_0(0),transform_mimage_1(0),transform_mimage_2(0)
   [transform_mimage_0] iwi-aom---   5.00g          /dev/sda1(1)
   [transform_mimage_1] iwi-aom---   5.00g          /dev/sdb1(1)
   [transform_mimage_2] iwi-aom---   5.00g          /dev/sdc1(1)
   [transform_mlog]     lwn-aom---   4.00m          /dev/sdh1(0)

PV=/dev/sda1
        transform_mimage_0: 6
PV=/dev/sda1
        transform_mimage_0: 6

Writing verification files (checkit) to mirror(s) on...
        ---- host-119 ----

Sleeping 15 seconds to get some outsanding I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-119 ----


Disabling device sda on host-119

Getting recovery check start time from /var/log/messages: Aug 31 17:47
Attempting I/O to cause mirror down conversion(s) on host-119
dd if=/dev/zero of=/mnt/transform/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.177075 s, 237 MB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  WARNING: Device for PV SAKkmK-BMc0-e7Oc-tg0F-5JIi-UNoi-uOuew3 not found or rejected by a filter.
  LV                   Attr       LSize   Cpy%Sync Devices                                    
   transform            mwi-aom---   5.00g 100.00   transform_mimage_1(0),transform_mimage_2(0)
   [transform_mimage_1] iwi-aom---   5.00g          /dev/sdb1(1)                               
   [transform_mimage_2] iwi-aom---   5.00g          /dev/sdc1(1)                               
   [transform_mlog]     lwn-aom---   4.00m          /dev/sdh1(0)                               

Verify that each of the mirror repairs finished successfully

Verifying FAILED device /dev/sda1 is *NOT* in the volume(s)
olog: 2.2
Verifying LOG device(s) /dev/sdh1 *ARE* in the mirror(s)
Verifying LEG device /dev/sdb1 *IS* in the volume(s)
Verifying LEG device /dev/sdc1 *IS* in the volume(s)
verify the dm devices associated with /dev/sda1 have been removed as expected
Checking REMOVAL of transform_mimage_0 on:  host-119

Verify that the mirror image order remains the same after the down conversion
EXPECTED LEG ORDER: /dev/sdb1 /dev/sdc1
ACTUAL LEG ORDER: /dev/sdb1 /dev/sdc1
Verifying files (checkit) on mirror(s) on...
        ---- host-119 ----

Enabling device sda on host-119  WARNING: Inconsistent metadata found for VG revolution_9
        Running vgs to make LVM update metadata version if possible (will restore a-m PVs)
  WARNING: Missing device /dev/sda1 reappeared, updating metadata for VG revolution_9 to version 10.
  WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 10

-------------------------------------------------------------------------------
Force a vgreduce to either clean up the corrupt additional LV or lack of PV MDA
-------------------------------------------------------------------------------
Recreating PVs /dev/sda1 and then extending back into revolution_9
host-119 pvcreate /dev/sda1
  Can't initialize physical volume "/dev/sda1" of volume group "revolution_9" without -ff
recreation of /dev/sda1 failed, must still be in VG
host-119 vgextend revolution_9 /dev/sda1
  Physical volume '/dev/sda1' is already in volume group 'revolution_9'
  Unable to add physical volume '/dev/sda1' to volume group 'revolution_9'
extension of /dev/sda1 back into revolution_9 failed

Getting PE for leg: /dev/sda1
Getting PE for leg: /dev/sdb1
Getting PE for leg: /dev/sdc1
Up converting linear(s) back to mirror(s) on host-119...
host-119: lvconvert --yes --type mirror --mirrorlog disk -m 2 revolution_9/transform /dev/sda1:0-5117 /dev/sdb1:0-5117 /dev/sdc1:0-5117 /dev/sdh1:0-150
Waiting for convert to finish (loss of _mimagetmp_)

Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Verifying files (checkit) on mirror(s) on...
        ---- host-119 ----

Stopping the io load (collie/xdoio) on mirror(s)

Comment 20 Peter Rajnoha 2016-09-05 13:22:13 UTC
(In reply to Heinz Mauelshagen from comment #15)
> LVM2 upstream commit ids related:
> 952e4133283ac4a593d0fa8daccdc55c907379de
> 97ee5a1cd326cac4fa5642395d13ebbf13965970
> 2d65ce9711dace22cebb370b0d93fd7a98213bd8

(In reply to Corey Marthaler from comment #19)
> This does appear to be fixed in the latest kernel. First two iterations
> passed so far (this was always triggered on the very first iteration).
> Letting test run all night...
> 
> 3.10.0-497.el7.x86_64
> lvm2-2.02.164-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
> lvm2-libs-2.02.164-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
> lvm2-cluster-2.02.164-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
> device-mapper-1.02.133-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
> device-mapper-libs-1.02.133-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
> device-mapper-event-1.02.133-4.el7    BUILT: Wed Aug 31 08:47:09 CDT 2016
> device-mapper-event-libs-1.02.133-4.el7    BUILT: Wed Aug 31 08:47:09 CDT
> 2016
> device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 05:29:13 CDT
> 2016

The patches above are part of that lvm2 build. So I'm adding this to errata.

Comment 22 Corey Marthaler 2016-09-07 22:18:30 UTC
Marking verified in the latest rpms.

3.10.0-501.el7.x86_64

lvm2-2.02.165-1.el7    BUILT: Wed Sep  7 11:04:22 CDT 2016
lvm2-libs-2.02.165-1.el7    BUILT: Wed Sep  7 11:04:22 CDT 2016
lvm2-cluster-2.02.165-1.el7    BUILT: Wed Sep  7 11:04:22 CDT 2016
device-mapper-1.02.134-1.el7    BUILT: Wed Sep  7 11:04:22 CDT 2016
device-mapper-libs-1.02.134-1.el7    BUILT: Wed Sep  7 11:04:22 CDT 2016
device-mapper-event-1.02.134-1.el7    BUILT: Wed Sep  7 11:04:22 CDT 2016
device-mapper-event-libs-1.02.134-1.el7    BUILT: Wed Sep  7 11:04:22 CDT 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 05:29:13 CDT 2016

Comment 24 errata-xmlrpc 2016-11-04 04:17:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1445.html


Note You need to log in before you can comment on or make changes to this bug.