Bug 1366760
Summary: | converted raid1 -> mirror takeover volume deadlocks during first mimage failure attempt | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Corey Marthaler <cmarthal> | ||||
Component: | lvm2 | Assignee: | LVM and device-mapper development team <lvm-team> | ||||
lvm2 sub component: | Default / Unclassified | QA Contact: | cluster-qe <cluster-qe> | ||||
Status: | CLOSED ERRATA | Docs Contact: | |||||
Severity: | high | ||||||
Priority: | unspecified | CC: | agk, heinzm, jbrassow, mkolaja, msnitzer, prajnoha, prockai, rbednar, zkabelac | ||||
Version: | 7.3 | Keywords: | TestBlocker | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | lvm2-2.02.164-4.el7 | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2016-11-04 04:17:39 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Corey Marthaler
2016-08-12 17:50:20 UTC
This is reproducible. Fresh reboot, no existing or left over volumes/dm devices. [root@host-082 ~]# lvcreate --type raid1 -L 3G -m 3 -n mirror_1 revolution_9 Logical volume "mirror_1" created. [root@host-082 ~]# lvs -a -o +devices LV VG Attr LSize Log Cpy%Sync Devices mirror_1 revolution_9 rwi-a-r--- 3.00g 100.00 mirror_1_rimage_0(0),mirror_1_rimage_1(0),mirror_1_rimage_2(0),mirror_1_rimage_3(0) [mirror_1_rimage_0] revolution_9 iwi-aor--- 3.00g /dev/sda1(1) [mirror_1_rimage_1] revolution_9 iwi-aor--- 3.00g /dev/sdb1(1) [mirror_1_rimage_2] revolution_9 iwi-aor--- 3.00g /dev/sdc1(1) [mirror_1_rimage_3] revolution_9 iwi-aor--- 3.00g /dev/sdd1(1) [mirror_1_rmeta_0] revolution_9 ewi-aor--- 4.00m /dev/sda1(0) [mirror_1_rmeta_1] revolution_9 ewi-aor--- 4.00m /dev/sdb1(0) [mirror_1_rmeta_2] revolution_9 ewi-aor--- 4.00m /dev/sdc1(0) [mirror_1_rmeta_3] revolution_9 ewi-aor--- 4.00m /dev/sdd1(0) [root@host-082 ~]# lvconvert --type mirror --mirrorlog disk revolution_9/mirror_1 Using default stripesize 64.00 KiB. Are you sure you want to convert revolution_9/mirror_1 back to the older "mirror" type? [y/n]: y Monitoring revolution_9/mirror_1 failed. Logical volume revolution_9/mirror_1 successfully converted. [root@host-082 ~]# lvs -a -o +devices LV VG Attr LSize Log Cpy%Sync Devices mirror_1 revolution_9 mwi-a-m--- 3.00g [mirror_1_mlog] 100.00 mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0),mirror_1_mimage_3(0) [mirror_1_mimage_0] revolution_9 iwi-aom--- 3.00g /dev/sda1(1) [mirror_1_mimage_1] revolution_9 iwi-aom--- 3.00g /dev/sdb1(1) [mirror_1_mimage_2] revolution_9 iwi-aom--- 3.00g /dev/sdc1(1) [mirror_1_mimage_3] revolution_9 iwi-aom--- 3.00g /dev/sdd1(1) [mirror_1_mlog] revolution_9 lwn-aom--- 4.00m /dev/sdh1(0) mirror device(s) (mirror_1) found in revolution_9 on host-082 Creating ext on top of mirror(s) on host-082... mke2fs 1.42.9 (28-Dec-2013) Mounting mirrored ext filesystems on host-082... ================================================================================ Iteration 0.1 started at Fri Aug 12 16:26:43 CDT 2016 ================================================================================ ACTUAL LEG ORDER: /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 Scenario kill_random_legs: Kill random legs ********* Mirror info for this scenario ********* * mirrors: mirror_1 * leg devices: /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 * log devices: /dev/sdh1 * failpv(s): /dev/sda1 * failnode(s): host-082 * lvmetad: 1 * leg fault policy: remove * log fault policy: allocate ************************************************* Current mirror/raid device structure(s): LV Attr LSize Cpy%Sync Devices mirror_1 mwi-aom--- 3.00g 100.00 mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0),mirror_1_mimage_3(0) [mirror_1_mimage_0] iwi-aom--- 3.00g /dev/sda1(1) [mirror_1_mimage_1] iwi-aom--- 3.00g /dev/sdb1(1) [mirror_1_mimage_2] iwi-aom--- 3.00g /dev/sdc1(1) [mirror_1_mimage_3] iwi-aom--- 3.00g /dev/sdd1(1) [mirror_1_mlog] lwn-aom--- 4.00m /dev/sdh1(0) PV=/dev/sda1 mirror_1_mimage_0: 6 PV=/dev/sda1 mirror_1_mimage_0: 6 Writing verification files (checkit) to mirror(s) on... ---- host-082 ---- Sleeping 15 seconds to get some outsanding I/O locks before the failure Verifying files (checkit) on mirror(s) on... ---- host-082 ---- Disabling device sda on host-082rescan device... /dev/sda1: read failed after 0 of 4096 at 26838958080: Input/output error /dev/sda1: read failed after 0 of 4096 at 26839048192: Input/output error /dev/sda1: read failed after 0 of 4096 at 0: Input/output error /dev/sda1: read failed after 0 of 4096 at 4096: Input/output error Getting recovery check start time from /var/log/messages: Aug 12 16:27 Attempting I/O to cause mirror down conversion(s) on host-082 dd if=/dev/zero of=/mnt/mirror_1/ddfile count=10 bs=4M [I/O STUCK] [root@host-082 ~]# lvs -a -o +devices WARNING: Device for PV GLbWf5-Q3Ep-xvwf-d78T-1RNX-mxzP-OUt9Bj not found or rejected by a filter. WARNING: Couldn't find all devices for LV revolution_9/mirror_1_mimage_0 while checking used and assumed devices. LV VG Attr LSize Log Cpy%Sync Devices mirror_1 revolution_9 mwi-aom-p- 3.00g [mirror_1_mlog] 99.87 mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0),mirror_1_mimage_3(0) [mirror_1_mimage_0] revolution_9 Iwi-aom-p- 3.00g [unknown](1) [mirror_1_mimage_1] revolution_9 Iwi-aom--- 3.00g /dev/sdb1(1) [mirror_1_mimage_2] revolution_9 Iwi-aom--- 3.00g /dev/sdc1(1) [mirror_1_mimage_3] revolution_9 Iwi-aom--- 3.00g /dev/sdd1(1) [mirror_1_mlog] revolution_9 lwn-aom--- 4.00m /dev/sdh1(0) [root@host-082 ~]# dmsetup status revolution_9-mirror_1: 0 6291456 mirror 4 253:3 253:5 253:7 253:9 6141/6144 1 DAAA 3 disk 253:11 A revolution_9-mirror_1_mlog: 0 8192 linear revolution_9-mirror_1_mimage_3: 0 6291456 linear revolution_9-mirror_1_mimage_2: 0 6291456 linear revolution_9-mirror_1_mimage_1: 0 6291456 linear revolution_9-mirror_1_mimage_0: 0 6291456 linear Created attachment 1190585 [details]
syslog during failure
This continues to exist in the latest test kernel/lvm rpms. 3.10.0-493.el7.bz1367223.x86_64 lvm2-2.02.164-2.el7 BUILT: Tue Aug 16 05:43:50 CDT 2016 lvm2-libs-2.02.164-2.el7 BUILT: Tue Aug 16 05:43:50 CDT 2016 lvm2-cluster-2.02.164-2.el7 BUILT: Tue Aug 16 05:43:50 CDT 2016 device-mapper-1.02.133-2.el7 BUILT: Tue Aug 16 05:43:50 CDT 2016 device-mapper-libs-1.02.133-2.el7 BUILT: Tue Aug 16 05:43:50 CDT 2016 device-mapper-event-1.02.133-2.el7 BUILT: Tue Aug 16 05:43:50 CDT 2016 device-mapper-event-libs-1.02.133-2.el7 BUILT: Tue Aug 16 05:43:50 CDT 2016 device-mapper-persistent-data-0.6.3-1.el7 BUILT: Fri Jul 22 05:29:13 CDT 2016 cmirror-2.02.164-2.el7 BUILT: Tue Aug 16 05:43:50 CDT 2016 sanlock-3.4.0-1.el7 BUILT: Fri Jun 10 11:41:03 CDT 2016 sanlock-lib-3.4.0-1.el7 BUILT: Fri Jun 10 11:41:03 CDT 2016 lvm2-lockd-2.02.164-2.el7 BUILT: Tue Aug 16 05:43:50 CDT 2016 # Create original raid1 volume [root@host-126 ~]# lvcreate --type raid1 -L 5G -n transform -m 2 revolution_1 Logical volume "transform" created. [root@host-126 ~]# lvs -a -o +devices LV VG Attr LSize Cpy%Sync Devices transform revolution_1 rwi-a-r--- 5.00g 8.20 transform_rimage_0(0),transform_rimage_1(0),transform_rimage_2(0) [transform_rimage_0] revolution_1 Iwi-aor--- 5.00g /dev/sdb1(1) [transform_rimage_1] revolution_1 Iwi-aor--- 5.00g /dev/sdg1(1) [transform_rimage_2] revolution_1 Iwi-aor--- 5.00g /dev/sda1(1) [transform_rmeta_0] revolution_1 ewi-aor--- 4.00m /dev/sdb1(0) [transform_rmeta_1] revolution_1 ewi-aor--- 4.00m /dev/sdg1(0) [transform_rmeta_2] revolution_1 ewi-aor--- 4.00m /dev/sda1(0) # Wait until it's 100% synced and then convert to mirror type [root@host-126 ~]# lvconvert --type mirror revolution_1/transform Are you sure you want to convert revolution_1/transform back to the older "mirror" type? [y/n]: y Monitoring revolution_1/transform failed. Logical volume revolution_1/transform successfully converted. [root@host-126 ~]# lvs -a -o +devices LV VG Attr LSize Log Cpy%Sync Devices transform revolution_1 mwi-a-m--- 5.00g [transform_mlog] 100.00 transform_mimage_0(0),transform_mimage_1(0),transform_mimage_2(0) [transform_mimage_0] revolution_1 iwi-aom--- 5.00g /dev/sdb1(1) [transform_mimage_1] revolution_1 iwi-aom--- 5.00g /dev/sdg1(1) [transform_mimage_2] revolution_1 iwi-aom--- 5.00g /dev/sda1(1) [transform_mlog] revolution_1 lwn-aom--- 4.00m /dev/sdh1(0) # Start failure test mirror device(s) (transform) found in revolution_1 on host-126 Creating ext on top of mirror(s) on host-126... mke2fs 1.42.9 (28-Dec-2013) Mounting mirrored ext filesystems on host-126... ================================================================================ Iteration 0.1 started at Fri Aug 19 11:25:36 CDT 2016 ================================================================================ ACTUAL LEG ORDER: /dev/sdb1 /dev/sdg1 /dev/sda1 Scenario kill_non_primary_leg: Kill non primary leg ********* Mirror info for this scenario ********* * mirrors: transform * leg devices: /dev/sdb1 /dev/sdg1 /dev/sda1 * log devices: /dev/sdh1 * failpv(s): /dev/sda1 * failnode(s): host-126 * lvmetad: 1 * leg fault policy: remove * log fault policy: allocate ************************************************* Current mirror/raid device structure(s): LV Attr LSize Cpy%Sync Devices transform mwi-aom--- 5.00g 100.00 transform_mimage_0(0),transform_mimage_1(0),transform_mimage_2(0) [transform_mimage_0] iwi-aom--- 5.00g /dev/sdb1(1) [transform_mimage_1] iwi-aom--- 5.00g /dev/sdg1(1) [transform_mimage_2] iwi-aom--- 5.00g /dev/sda1(1) [transform_mlog] lwn-aom--- 4.00m /dev/sdh1(0) PV=/dev/sda1 transform_mimage_2: 6 PV=/dev/sda1 transform_mimage_2: 6 Writing verification files (checkit) to mirror(s) on... ---- host-126 ---- Sleeping 15 seconds to get some outsanding I/O locks before the failure Verifying files (checkit) on mirror(s) on... ---- host-126 ---- Disabling device sda on host-126 rescan (pvscan --cache) device... /dev/sda1: read failed after 0 of 512 at 21467824128: Input/output error /dev/sda1: read failed after 0 of 512 at 21467938816: Input/output error /dev/sda1: read failed after 0 of 512 at 0: Input/output error /dev/sda1: read failed after 0 of 512 at 4096: Input/output error /dev/sda1: read failed after 0 of 2048 at 0: Input/output error Getting recovery check start time from /var/log/messages: Aug 19 11:26 Attempting I/O to cause mirror down conversion(s) on host-126 dd if=/dev/zero of=/mnt/transform/ddfile count=10 bs=4M # Recovery is now stuck at 99.92% ?? [root@host-126 ~]# lvs -a -o +devices WARNING: Device for PV Wc2ieE-cV55-bd2L-l319-FsTq-snCa-JyPt0A not found or rejected by a filter. WARNING: Couldn't find all devices for LV revolution_1/transform_mimage_2 while checking used and assumed devices. LV VG Attr LSize Log Cpy%Sync Devices transform revolution_1 mwi-aom-p- 5.00g [transform_mlog] 99.92 transform_mimage_0(0),transform_mimage_1(0),transform_mimage_2(0) [transform_mimage_0] revolution_1 Iwi-aom--- 5.00g /dev/sdb1(1) [transform_mimage_1] revolution_1 Iwi-aom--- 5.00g /dev/sdg1(1) [transform_mimage_2] revolution_1 Iwi-aom-p- 5.00g [unknown](1) [transform_mlog] revolution_1 lwn-aom--- 4.00m /dev/sdh1(0) Aug 19 11:26:09 host-126 qarshd[3109]: Running cmdline: dd if=/dev/zero of=/mnt/transform/ddfile count=10 bs=4M Aug 19 11:26:09 host-126 kernel: sd 3:0:0:1: rejecting I/O to offline device Aug 19 11:26:09 host-126 kernel: sd 3:0:0:1: rejecting I/O to offline device Aug 19 11:26:09 host-126 kernel: sd 3:0:0:1: rejecting I/O to offline device Aug 19 11:26:09 host-126 kernel: sd 3:0:0:1: rejecting I/O to offline device Aug 19 11:26:09 host-126 kernel: sd 3:0:0:1: rejecting I/O to offline device Aug 19 11:27:39 host-126 systemd: Starting Cleanup of Temporary Directories... Aug 19 11:27:39 host-126 systemd: Started Cleanup of Temporary Directories. Aug 19 11:28:14 host-126 kernel: INFO: task jbd2/dm-8-8:2918 blocked for more than 120 seconds. Aug 19 11:28:14 host-126 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 19 11:28:14 host-126 kernel: jbd2/dm-8-8 D ffffffff81688150 0 2918 2 0x00000080 Aug 19 11:28:14 host-126 kernel: ffff88001b1e7ac0 0000000000000046 ffff88001b1e7fd8 0000000000016c40 Aug 19 11:28:14 host-126 kernel: ffff88001b1e7fd8 0000000000016c40 ffff88003b613ec0 ffff88003fc16c40 Aug 19 11:28:14 host-126 kernel: 0000000000000000 7fffffffffffffff ffff88003ff64a48 ffffffff81688150 Aug 19 11:28:14 host-126 kernel: Call Trace: Aug 19 11:28:14 host-126 kernel: [<ffffffff81688150>] ? bit_wait+0x50/0x50 Aug 19 11:28:14 host-126 kernel: [<ffffffff8168a0e9>] schedule+0x29/0x70 Aug 19 11:28:14 host-126 kernel: [<ffffffff81687b29>] schedule_timeout+0x239/0x2d0 Aug 19 11:28:14 host-126 kernel: [<ffffffff81060aef>] ? kvm_clock_get_cycles+0x1f/0x30 Aug 19 11:28:14 host-126 kernel: [<ffffffff810e961c>] ? ktime_get_ts64+0x4c/0xf0 Aug 19 11:28:14 host-126 kernel: [<ffffffff81060aef>] ? kvm_clock_get_cycles+0x1f/0x30 Aug 19 11:28:14 host-126 kernel: [<ffffffff81688150>] ? bit_wait+0x50/0x50 Aug 19 11:28:14 host-126 kernel: [<ffffffff8168968e>] io_schedule_timeout+0xae/0x140 Aug 19 11:28:14 host-126 kernel: [<ffffffff81689738>] io_schedule+0x18/0x20 Aug 19 11:28:14 host-126 kernel: [<ffffffff81688161>] bit_wait_io+0x11/0x50 Aug 19 11:28:14 host-126 kernel: [<ffffffff81687c85>] __wait_on_bit+0x65/0x90 Aug 19 11:28:14 host-126 kernel: [<ffffffff81688150>] ? bit_wait+0x50/0x50 Aug 19 11:28:14 host-126 kernel: [<ffffffff81687d31>] out_of_line_wait_on_bit+0x81/0xb0 Aug 19 11:28:14 host-126 kernel: [<ffffffff810b18c0>] ? wake_bit_function+0x40/0x40 Aug 19 11:28:14 host-126 kernel: [<ffffffff81232eaa>] __wait_on_buffer+0x2a/0x30 Aug 19 11:28:14 host-126 kernel: [<ffffffffa03cce98>] jbd2_journal_commit_transaction+0xe88/0x1990 [jbd2] Aug 19 11:28:14 host-126 kernel: [<ffffffff81029599>] ? __switch_to+0xd9/0x4c0 Aug 19 11:28:14 host-126 kernel: [<ffffffffa03d1ea9>] kjournald2+0xc9/0x270 [jbd2] Aug 19 11:28:14 host-126 kernel: [<ffffffff810b1800>] ? wake_up_atomic_t+0x30/0x30 Aug 19 11:28:14 host-126 kernel: [<ffffffffa03d1de0>] ? commit_timeout+0x10/0x10 [jbd2] Aug 19 11:28:14 host-126 kernel: [<ffffffff810b073f>] kthread+0xcf/0xe0 Aug 19 11:28:14 host-126 kernel: [<ffffffff810b0670>] ? kthread_create_on_node+0x140/0x140 Aug 19 11:28:14 host-126 kernel: [<ffffffff81694f18>] ret_from_fork+0x58/0x90 Aug 19 11:28:14 host-126 kernel: [<ffffffff810b0670>] ? kthread_create_on_node+0x140/0x140 Aug 19 11:28:14 host-126 kernel: INFO: task xdoio:3091 blocked for more than 120 seconds. Aug 19 11:28:14 host-126 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 19 11:28:14 host-126 kernel: xdoio D ffffffff81688150 0 3091 3090 0x00000080 Aug 19 11:28:14 host-126 kernel: ffff88001b23b960 0000000000000086 ffff88001b23bfd8 0000000000016c40 Aug 19 11:28:14 host-126 kernel: ffff88001b23bfd8 0000000000016c40 ffff88003b6a4e70 ffff88003fc16c40 Aug 19 11:28:14 host-126 kernel: 0000000000000000 7fffffffffffffff ffff88003ff5af80 ffffffff81688150 Aug 19 11:28:14 host-126 kernel: Call Trace: Aug 19 11:28:14 host-126 kernel: [<ffffffff81688150>] ? bit_wait+0x50/0x50 Aug 19 11:28:14 host-126 kernel: [<ffffffff8168a0e9>] schedule+0x29/0x70 Aug 19 11:28:14 host-126 kernel: [<ffffffff81687b29>] schedule_timeout+0x239/0x2d0 Aug 19 11:28:14 host-126 kernel: [<ffffffff810bc194>] ? __wake_up+0x44/0x50 Aug 19 11:28:14 host-126 kernel: [<ffffffff81060aef>] ? kvm_clock_get_cycles+0x1f/0x30 Aug 19 11:28:14 host-126 kernel: [<ffffffff81688150>] ? bit_wait+0x50/0x50 Aug 19 11:28:14 host-126 kernel: [<ffffffff8168968e>] io_schedule_timeout+0xae/0x140 Aug 19 11:28:14 host-126 kernel: [<ffffffff81689738>] io_schedule+0x18/0x20 Aug 19 11:28:14 host-126 kernel: [<ffffffff81688161>] bit_wait_io+0x11/0x50 Aug 19 11:28:14 host-126 kernel: [<ffffffff81687c85>] __wait_on_bit+0x65/0x90 Aug 19 11:28:14 host-126 kernel: [<ffffffff81688150>] ? bit_wait+0x50/0x50 Aug 19 11:28:14 host-126 kernel: [<ffffffff81687d31>] out_of_line_wait_on_bit+0x81/0xb0 Aug 19 11:28:14 host-126 kernel: [<ffffffff810b18c0>] ? wake_bit_function+0x40/0x40 Aug 19 11:28:14 host-126 kernel: [<ffffffffa03caca5>] do_get_write_access+0x285/0x4d0 [jbd2] Aug 19 11:28:14 host-126 kernel: [<ffffffff8123326d>] ? __getblk+0x2d/0x2e0 Aug 19 11:28:14 host-126 kernel: [<ffffffffa03caf17>] jbd2_journal_get_write_access+0x27/0x40 [jbd2] Aug 19 11:28:14 host-126 kernel: [<ffffffffa0421bfb>] __ext4_journal_get_write_access+0x3b/0x80 [ext4] Aug 19 11:28:14 host-126 kernel: [<ffffffffa03f01d0>] ext4_reserve_inode_write+0x70/0xa0 [ext4] Aug 19 11:28:14 host-126 kernel: [<ffffffffa03f38f0>] ? ext4_dirty_inode+0x40/0x60 [ext4] Aug 19 11:28:14 host-126 kernel: [<ffffffffa03f0253>] ext4_mark_inode_dirty+0x53/0x210 [ext4] Aug 19 11:28:14 host-126 kernel: [<ffffffffa03f38f0>] ext4_dirty_inode+0x40/0x60 [ext4] Aug 19 11:28:14 host-126 kernel: [<ffffffff81228f8a>] __mark_inode_dirty+0xca/0x290 Aug 19 11:28:14 host-126 kernel: [<ffffffff81219ba1>] update_time+0x81/0xd0 Aug 19 11:28:14 host-126 kernel: [<ffffffff81219d09>] touch_atime+0x119/0x160 Aug 19 11:28:14 host-126 kernel: [<ffffffff811824f8>] generic_file_aio_read+0x5d8/0x790 Aug 19 11:28:14 host-126 kernel: [<ffffffff811fda3d>] do_sync_read+0x8d/0xd0 Aug 19 11:28:14 host-126 kernel: [<ffffffff811fe1ee>] vfs_read+0x9e/0x170 Aug 19 11:28:14 host-126 kernel: [<ffffffff811fedbf>] SyS_read+0x7f/0xe0 Aug 19 11:28:14 host-126 kernel: [<ffffffff81694fc9>] system_call_fastpath+0x16/0x1b I can trigger this deadlock with just "lvcreate -m3 --ty mirror -L128M -nm VG" on an upstream kernel 4.8.0-rc2 (2-way "mirror" works fine). Trying REHL7 kernel and without lvm2 now. Instrumenting/debugging I think I got to the core of this: half way port of dm-log.c to the new bio_set_op_attr(). Testing patch... On the userspace side, the fix for Bug 1366749 may be related to this too, as after the takeover, the raid1 monitoring dso will still have been in use even though it's become a mirror. dm-log fix with ommit 9c5a559d9495bba6e5b6c5ee4e8e2f2b71088684 queued in linux-dm. Corey, based on comment #10 and new uspace package, update uspace and retest, please. The kernel fix as of comment #12 is irrelevant for the RHEL kernel, because it doesn't haveisn't needed for RHEL7, because that kernel doesn't have the bio operations changes backported yet. Let's see if Alasdair's uspace fix WRT monitoring will solve this one. LVM2 upstream commit ids related: 952e4133283ac4a593d0fa8daccdc55c907379de 97ee5a1cd326cac4fa5642395d13ebbf13965970 2d65ce9711dace22cebb370b0d93fd7a98213bd8 If bz1366749 (tested successfully here) tests ok for Corey, we can close this one as a duplicate. Changing component to lvm2 and changing state to POST until further QE testing. This does appear to be fixed in the latest kernel. First two iterations passed so far (this was always triggered on the very first iteration). Letting test run all night... 3.10.0-497.el7.x86_64 lvm2-2.02.164-4.el7 BUILT: Wed Aug 31 08:47:09 CDT 2016 lvm2-libs-2.02.164-4.el7 BUILT: Wed Aug 31 08:47:09 CDT 2016 lvm2-cluster-2.02.164-4.el7 BUILT: Wed Aug 31 08:47:09 CDT 2016 device-mapper-1.02.133-4.el7 BUILT: Wed Aug 31 08:47:09 CDT 2016 device-mapper-libs-1.02.133-4.el7 BUILT: Wed Aug 31 08:47:09 CDT 2016 device-mapper-event-1.02.133-4.el7 BUILT: Wed Aug 31 08:47:09 CDT 2016 device-mapper-event-libs-1.02.133-4.el7 BUILT: Wed Aug 31 08:47:09 CDT 2016 device-mapper-persistent-data-0.6.3-1.el7 BUILT: Fri Jul 22 05:29:13 CDT 2016 [root@host-119 ~]# lvcreate --type raid1 -L 5G -n transform -m 2 revolution_9 Logical volume "transform" created. [root@host-119 ~]# lvs -a -o +devices LV VG Attr LSize Cpy%Sync Devices transform revolution_9 rwi-a-r--- 5.00g 100.00 transform_rimage_0(0),transform_rimage_1(0),transform_rimage_2(0) [transform_rimage_0] revolution_9 iwi-aor--- 5.00g /dev/sda1(1) [transform_rimage_1] revolution_9 iwi-aor--- 5.00g /dev/sdb1(1) [transform_rimage_2] revolution_9 iwi-aor--- 5.00g /dev/sdc1(1) [transform_rmeta_0] revolution_9 ewi-aor--- 4.00m /dev/sda1(0) [transform_rmeta_1] revolution_9 ewi-aor--- 4.00m /dev/sdb1(0) [transform_rmeta_2] revolution_9 ewi-aor--- 4.00m /dev/sdc1(0) [root@host-119 ~]# lvconvert --type mirror revolution_9/transform Are you sure you want to convert revolution_9/transform back to the older "mirror" type? [y/n]: y Logical volume revolution_9/transform successfully converted. [root@host-119 ~]# lvs -a -o +devices LV VG Attr LSize Log Cpy%Sync Devices transform revolution_9 mwi-a-m--- 5.00g [transform_mlog] 100.00 transform_mimage_0(0),transform_mimage_1(0),transform_mimage_2(0) [transform_mimage_0] revolution_9 iwi-aom--- 5.00g /dev/sda1(1) [transform_mimage_1] revolution_9 iwi-aom--- 5.00g /dev/sdb1(1) [transform_mimage_2] revolution_9 iwi-aom--- 5.00g /dev/sdc1(1) [transform_mlog] revolution_9 lwn-aom--- 4.00m /dev/sdh1(0) mirror device(s) (transform) found in revolution_9 on host-119 Creating ext on top of mirror(s) on host-119... mke2fs 1.42.9 (28-Dec-2013) Mounting mirrored ext filesystems on host-119... ================================================================================ Iteration 0.1 started at Wed Aug 31 17:46:46 CDT 2016 ================================================================================ ACTUAL LEG ORDER: /dev/sda1 /dev/sdb1 /dev/sdc1 Scenario kill_random_legs: Kill random legs ********* Mirror info for this scenario ********* * mirrors: transform * leg devices: /dev/sda1 /dev/sdb1 /dev/sdc1 * log devices: /dev/sdh1 * failpv(s): /dev/sda1 * failnode(s): host-119 * lvmetad: 1 * leg fault policy: remove * log fault policy: allocate ************************************************* Current mirror/raid device structure(s): LV Attr LSize Cpy%Sync Devices transform mwi-aom--- 5.00g 100.00 transform_mimage_0(0),transform_mimage_1(0),transform_mimage_2(0) [transform_mimage_0] iwi-aom--- 5.00g /dev/sda1(1) [transform_mimage_1] iwi-aom--- 5.00g /dev/sdb1(1) [transform_mimage_2] iwi-aom--- 5.00g /dev/sdc1(1) [transform_mlog] lwn-aom--- 4.00m /dev/sdh1(0) PV=/dev/sda1 transform_mimage_0: 6 PV=/dev/sda1 transform_mimage_0: 6 Writing verification files (checkit) to mirror(s) on... ---- host-119 ---- Sleeping 15 seconds to get some outsanding I/O locks before the failure Verifying files (checkit) on mirror(s) on... ---- host-119 ---- Disabling device sda on host-119 Getting recovery check start time from /var/log/messages: Aug 31 17:47 Attempting I/O to cause mirror down conversion(s) on host-119 dd if=/dev/zero of=/mnt/transform/ddfile count=10 bs=4M 10+0 records in 10+0 records out 41943040 bytes (42 MB) copied, 0.177075 s, 237 MB/s Verifying current sanity of lvm after the failure Current mirror/raid device structure(s): WARNING: Device for PV SAKkmK-BMc0-e7Oc-tg0F-5JIi-UNoi-uOuew3 not found or rejected by a filter. LV Attr LSize Cpy%Sync Devices transform mwi-aom--- 5.00g 100.00 transform_mimage_1(0),transform_mimage_2(0) [transform_mimage_1] iwi-aom--- 5.00g /dev/sdb1(1) [transform_mimage_2] iwi-aom--- 5.00g /dev/sdc1(1) [transform_mlog] lwn-aom--- 4.00m /dev/sdh1(0) Verify that each of the mirror repairs finished successfully Verifying FAILED device /dev/sda1 is *NOT* in the volume(s) olog: 2.2 Verifying LOG device(s) /dev/sdh1 *ARE* in the mirror(s) Verifying LEG device /dev/sdb1 *IS* in the volume(s) Verifying LEG device /dev/sdc1 *IS* in the volume(s) verify the dm devices associated with /dev/sda1 have been removed as expected Checking REMOVAL of transform_mimage_0 on: host-119 Verify that the mirror image order remains the same after the down conversion EXPECTED LEG ORDER: /dev/sdb1 /dev/sdc1 ACTUAL LEG ORDER: /dev/sdb1 /dev/sdc1 Verifying files (checkit) on mirror(s) on... ---- host-119 ---- Enabling device sda on host-119 WARNING: Inconsistent metadata found for VG revolution_9 Running vgs to make LVM update metadata version if possible (will restore a-m PVs) WARNING: Missing device /dev/sda1 reappeared, updating metadata for VG revolution_9 to version 10. WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 10 ------------------------------------------------------------------------------- Force a vgreduce to either clean up the corrupt additional LV or lack of PV MDA ------------------------------------------------------------------------------- Recreating PVs /dev/sda1 and then extending back into revolution_9 host-119 pvcreate /dev/sda1 Can't initialize physical volume "/dev/sda1" of volume group "revolution_9" without -ff recreation of /dev/sda1 failed, must still be in VG host-119 vgextend revolution_9 /dev/sda1 Physical volume '/dev/sda1' is already in volume group 'revolution_9' Unable to add physical volume '/dev/sda1' to volume group 'revolution_9' extension of /dev/sda1 back into revolution_9 failed Getting PE for leg: /dev/sda1 Getting PE for leg: /dev/sdb1 Getting PE for leg: /dev/sdc1 Up converting linear(s) back to mirror(s) on host-119... host-119: lvconvert --yes --type mirror --mirrorlog disk -m 2 revolution_9/transform /dev/sda1:0-5117 /dev/sdb1:0-5117 /dev/sdc1:0-5117 /dev/sdh1:0-150 Waiting for convert to finish (loss of _mimagetmp_) Waiting until all mirror|raid volumes become fully syncd... 1/1 mirror(s) are fully synced: ( 100.00% ) Sleeping 15 sec Verifying files (checkit) on mirror(s) on... ---- host-119 ---- Stopping the io load (collie/xdoio) on mirror(s) (In reply to Heinz Mauelshagen from comment #15) > LVM2 upstream commit ids related: > 952e4133283ac4a593d0fa8daccdc55c907379de > 97ee5a1cd326cac4fa5642395d13ebbf13965970 > 2d65ce9711dace22cebb370b0d93fd7a98213bd8 (In reply to Corey Marthaler from comment #19) > This does appear to be fixed in the latest kernel. First two iterations > passed so far (this was always triggered on the very first iteration). > Letting test run all night... > > 3.10.0-497.el7.x86_64 > lvm2-2.02.164-4.el7 BUILT: Wed Aug 31 08:47:09 CDT 2016 > lvm2-libs-2.02.164-4.el7 BUILT: Wed Aug 31 08:47:09 CDT 2016 > lvm2-cluster-2.02.164-4.el7 BUILT: Wed Aug 31 08:47:09 CDT 2016 > device-mapper-1.02.133-4.el7 BUILT: Wed Aug 31 08:47:09 CDT 2016 > device-mapper-libs-1.02.133-4.el7 BUILT: Wed Aug 31 08:47:09 CDT 2016 > device-mapper-event-1.02.133-4.el7 BUILT: Wed Aug 31 08:47:09 CDT 2016 > device-mapper-event-libs-1.02.133-4.el7 BUILT: Wed Aug 31 08:47:09 CDT > 2016 > device-mapper-persistent-data-0.6.3-1.el7 BUILT: Fri Jul 22 05:29:13 CDT > 2016 The patches above are part of that lvm2 build. So I'm adding this to errata. Marking verified in the latest rpms. 3.10.0-501.el7.x86_64 lvm2-2.02.165-1.el7 BUILT: Wed Sep 7 11:04:22 CDT 2016 lvm2-libs-2.02.165-1.el7 BUILT: Wed Sep 7 11:04:22 CDT 2016 lvm2-cluster-2.02.165-1.el7 BUILT: Wed Sep 7 11:04:22 CDT 2016 device-mapper-1.02.134-1.el7 BUILT: Wed Sep 7 11:04:22 CDT 2016 device-mapper-libs-1.02.134-1.el7 BUILT: Wed Sep 7 11:04:22 CDT 2016 device-mapper-event-1.02.134-1.el7 BUILT: Wed Sep 7 11:04:22 CDT 2016 device-mapper-event-libs-1.02.134-1.el7 BUILT: Wed Sep 7 11:04:22 CDT 2016 device-mapper-persistent-data-0.6.3-1.el7 BUILT: Fri Jul 22 05:29:13 CDT 2016 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1445.html |