Bug 1074713
| Summary: | lvconvert can deadlock when attempting to repair failed raid volumes | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Corey Marthaler <cmarthal> |
| Component: | lvm2 | Assignee: | Jonathan Earl Brassow <jbrassow> |
| lvm2 sub component: | Mirroring and RAID | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED WORKSFORME | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | unspecified | CC: | agk, heinzm, jbrassow, msnitzer, prajnoha, prockai, zkabelac |
| Version: | 7.0 | ||
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-01-21 22:03:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1075262 | ||
Description of problem: ================================================================================ Iteration 2.11 started at Sat Mar 8 01:30:05 CST 2014 ================================================================================ Scenario kill_multiple_synced_raid1_3legs: Kill multiple legs of synced 3 leg raid1 volume(s) ********* RAID hash info for this scenario ********* * names: synced_multiple_raid1_3legs_1 synced_multiple_raid1_3legs_2 synced_multiple_raid1_3legs_3 * sync: 1 * type: raid1 * -m |-i value: 3 * leg devices: /dev/sdg1 /dev/sdb1 /dev/sde1 /dev/sdh1 * failpv(s): /dev/sdh1 /dev/sdb1 * failnode(s): host-005.virt.lab.msp.redhat.com * lvmetad: 1 * raid fault policy: warn ****************************************************** Creating raids(s) on host-005.virt.lab.msp.redhat.com... host-005.virt.lab.msp.redhat.com: lvcreate --type raid1 -m 3 -n synced_multiple_raid1_3legs_1 -L 500M black_bird /dev/sdg1:0-2000 /dev/sdb1:0-2000 /dev/sde1:0-2000 /dev/sdh1:0-2000 WARNING: ext3 signature detected on /dev/black_bird/synced_multiple_raid1_3legs_1 at offset 1080. Wipe it? [y/n] 1 existing signature left on the device. host-005.virt.lab.msp.redhat.com: lvcreate --type raid1 -m 3 -n synced_multiple_raid1_3legs_2 -L 500M black_bird /dev/sdg1:0-2000 /dev/sdb1:0-2000 /dev/sde1:0-2000 /dev/sdh1:0-2000 host-005.virt.lab.msp.redhat.com: lvcreate --type raid1 -m 3 -n synced_multiple_raid1_3legs_3 -L 500M black_bird /dev/sdg1:0-2000 /dev/sdb1:0-2000 /dev/sde1:0-2000 /dev/sdh1:0-2000 WARNING: ext3 signature detected on /dev/black_bird/synced_multiple_raid1_3legs_3 at offset 1080. Wipe it? [y/n] 1 existing signature left on the device. Current mirror/raid device structure(s): LV Attr LSize Cpy%Sync Devices synced_multiple_raid1_3legs_1 rwi-a-r--- 500.00m 96.00 synced_multiple_raid1_3legs_1_rimage_0(0),synced_multiple_raid1_3legs_1_rimage_1(0),synced_multiple_raid1_3legs_1_rimage_2(0),synced_multiple_raid1_3legs_1_rimage_3(0) [synced_multiple_raid1_3legs_1_rimage_0] Iwi-aor--- 500.00m /dev/sdg1(1) [synced_multiple_raid1_3legs_1_rimage_1] Iwi-aor--- 500.00m /dev/sdb1(1) [synced_multiple_raid1_3legs_1_rimage_2] Iwi-aor--- 500.00m /dev/sde1(1) [synced_multiple_raid1_3legs_1_rimage_3] Iwi-aor--- 500.00m /dev/sdh1(1) [synced_multiple_raid1_3legs_1_rmeta_0] ewi-aor--- 4.00m /dev/sdg1(0) [synced_multiple_raid1_3legs_1_rmeta_1] ewi-aor--- 4.00m /dev/sdb1(0) [synced_multiple_raid1_3legs_1_rmeta_2] ewi-aor--- 4.00m /dev/sde1(0) [synced_multiple_raid1_3legs_1_rmeta_3] ewi-aor--- 4.00m /dev/sdh1(0) synced_multiple_raid1_3legs_2 rwi-a-r--- 500.00m 39.20 synced_multiple_raid1_3legs_2_rimage_0(0),synced_multiple_raid1_3legs_2_rimage_1(0),synced_multiple_raid1_3legs_2_rimage_2(0),synced_multiple_raid1_3legs_2_rimage_3(0) [synced_multiple_raid1_3legs_2_rimage_0] Iwi-aor--- 500.00m /dev/sdg1(127) [synced_multiple_raid1_3legs_2_rimage_1] Iwi-aor--- 500.00m /dev/sdb1(127) [synced_multiple_raid1_3legs_2_rimage_2] Iwi-aor--- 500.00m /dev/sde1(127) [synced_multiple_raid1_3legs_2_rimage_3] Iwi-aor--- 500.00m /dev/sdh1(127) [synced_multiple_raid1_3legs_2_rmeta_0] ewi-aor--- 4.00m /dev/sdg1(126) [synced_multiple_raid1_3legs_2_rmeta_1] ewi-aor--- 4.00m /dev/sdb1(126) [synced_multiple_raid1_3legs_2_rmeta_2] ewi-aor--- 4.00m /dev/sde1(126) [synced_multiple_raid1_3legs_2_rmeta_3] ewi-aor--- 4.00m /dev/sdh1(126) synced_multiple_raid1_3legs_3 rwi-aor--- 500.00m 7.20 synced_multiple_raid1_3legs_3_rimage_0(0),synced_multiple_raid1_3legs_3_rimage_1(0),synced_multiple_raid1_3legs_3_rimage_2(0),synced_multiple_raid1_3legs_3_rimage_3(0) [synced_multiple_raid1_3legs_3_rimage_0] Iwi-aor--- 500.00m /dev/sdg1(253) [synced_multiple_raid1_3legs_3_rimage_1] Iwi-aor--- 500.00m /dev/sdb1(253) [synced_multiple_raid1_3legs_3_rimage_2] Iwi-aor--- 500.00m /dev/sde1(253) [synced_multiple_raid1_3legs_3_rimage_3] Iwi-aor--- 500.00m /dev/sdh1(253) [synced_multiple_raid1_3legs_3_rmeta_0] ewi-aor--- 4.00m /dev/sdg1(252) [synced_multiple_raid1_3legs_3_rmeta_1] ewi-aor--- 4.00m /dev/sdb1(252) [synced_multiple_raid1_3legs_3_rmeta_2] ewi-aor--- 4.00m /dev/sde1(252) [synced_multiple_raid1_3legs_3_rmeta_3] ewi-aor--- 4.00m /dev/sdh1(252) Waiting until all mirror|raid volumes become fully syncd... 1/3 mirror(s) are fully synced: ( 100.00% 71.24% 27.77% ) 1/3 mirror(s) are fully synced: ( 100.00% 77.66% 39.09% ) 1/3 mirror(s) are fully synced: ( 100.00% 93.97% 50.26% ) 2/3 mirror(s) are fully synced: ( 100.00% 100.00% 61.02% ) 2/3 mirror(s) are fully synced: ( 100.00% 100.00% 97.16% ) 3/3 mirror(s) are fully synced: ( 100.00% 100.00% 100.00% ) Creating ext on top of mirror(s) on host-005.virt.lab.msp.redhat.com... mke2fs 1.42.9 (28-Dec-2013) mke2fs 1.42.9 (28-Dec-2013) mke2fs 1.42.9 (28-Dec-2013) Mounting mirrored ext filesystems on host-005.virt.lab.msp.redhat.com... PV=/dev/sdh1 synced_multiple_raid1_3legs_1_rimage_3: 2 synced_multiple_raid1_3legs_1_rmeta_3: 2 synced_multiple_raid1_3legs_2_rimage_3: 2 synced_multiple_raid1_3legs_2_rmeta_3: 2 synced_multiple_raid1_3legs_3_rimage_3: 2 synced_multiple_raid1_3legs_3_rmeta_3: 2 PV=/dev/sdb1 synced_multiple_raid1_3legs_1_rimage_1: 2 synced_multiple_raid1_3legs_1_rmeta_1: 2 synced_multiple_raid1_3legs_2_rimage_1: 2 synced_multiple_raid1_3legs_2_rmeta_1: 2 synced_multiple_raid1_3legs_3_rimage_1: 2 synced_multiple_raid1_3legs_3_rmeta_1: 2 Writing verification files (checkit) to mirror(s) on... ---- host-005.virt.lab.msp.redhat.com ---- Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure Verifying files (checkit) on mirror(s) on... ---- host-005.virt.lab.msp.redhat.com ---- Disabling device sdh on host-005.virt.lab.msp.redhat.com /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error Disabling device sdb on host-005.virt.lab.msp.redhat.com /dev/sdb1: read failed after 0 of 2048 at 0: Input/output error Getting recovery check start time from /var/log/messages: Mar 8 01:38 Attempting I/O to cause mirror down conversion(s) on host-005.virt.lab.msp.redhat.com 10+0 records in 10+0 records out 41943040 bytes (42 MB) copied, 1.50675 s, 27.8 MB/s 10+0 records in 10+0 records out 41943040 bytes (42 MB) copied, 1.17548 s, 35.7 MB/s 10+0 records in 10+0 records out 41943040 bytes (42 MB) copied, 1.1676 s, 35.9 MB/s Verifying current sanity of lvm after the failure Current mirror/raid device structure(s): PV 8oEH1m-xnDJ-oNed-IfRE-LniQ-Wj99-1wYaf5 not recognised. Is the device missing? PV 4UwQoJ-gEnx-5swH-CKeD-xVD3-1Fyf-B2tTAv not recognised. Is the device missing? PV 8oEH1m-xnDJ-oNed-IfRE-LniQ-Wj99-1wYaf5 not recognised. Is the device missing? PV 4UwQoJ-gEnx-5swH-CKeD-xVD3-1Fyf-B2tTAv not recognised. Is the device missing? LV Attr LSize Cpy%Sync Devices synced_multiple_raid1_3legs_1 rwi-aor-p- 500.00m 100.00 synced_multiple_raid1_3legs_1_rimage_0(0),synced_multiple_raid1_3legs_1_rimage_1(0),synced_multiple_raid1_3legs_1_rimage_2(0),synced_multiple_raid1_3legs_1_rimage_3(0) [synced_multiple_raid1_3legs_1_rimage_0] iwi-aor--- 500.00m /dev/sdg1(1) [synced_multiple_raid1_3legs_1_rimage_1] iwi-aor-p- 500.00m unknown device(1) [synced_multiple_raid1_3legs_1_rimage_2] iwi-aor--- 500.00m /dev/sde1(1) [synced_multiple_raid1_3legs_1_rimage_3] iwi-aor-p- 500.00m unknown device(1) [synced_multiple_raid1_3legs_1_rmeta_0] ewi-aor--- 4.00m /dev/sdg1(0) [synced_multiple_raid1_3legs_1_rmeta_1] ewi-aor-p- 4.00m unknown device(0) [synced_multiple_raid1_3legs_1_rmeta_2] ewi-aor--- 4.00m /dev/sde1(0) [synced_multiple_raid1_3legs_1_rmeta_3] ewi-aor-p- 4.00m unknown device(0) synced_multiple_raid1_3legs_2 rwi-aor-p- 500.00m 100.00 synced_multiple_raid1_3legs_2_rimage_0(0),synced_multiple_raid1_3legs_2_rimage_1(0),synced_multiple_raid1_3legs_2_rimage_2(0),synced_multiple_raid1_3legs_2_rimage_3(0) [synced_multiple_raid1_3legs_2_rimage_0] iwi-aor--- 500.00m /dev/sdg1(127) [synced_multiple_raid1_3legs_2_rimage_1] iwi-aor-p- 500.00m unknown device(127) [synced_multiple_raid1_3legs_2_rimage_2] iwi-aor--- 500.00m /dev/sde1(127) [synced_multiple_raid1_3legs_2_rimage_3] iwi-aor-p- 500.00m unknown device(127) [synced_multiple_raid1_3legs_2_rmeta_0] ewi-aor--- 4.00m /dev/sdg1(126) [synced_multiple_raid1_3legs_2_rmeta_1] ewi-aor-p- 4.00m unknown device(126) [synced_multiple_raid1_3legs_2_rmeta_2] ewi-aor--- 4.00m /dev/sde1(126) [synced_multiple_raid1_3legs_2_rmeta_3] ewi-aor-p- 4.00m unknown device(126) synced_multiple_raid1_3legs_3 rwi-aor-p- 500.00m 100.00 synced_multiple_raid1_3legs_3_rimage_0(0),synced_multiple_raid1_3legs_3_rimage_1(0),synced_multiple_raid1_3legs_3_rimage_2(0),synced_multiple_raid1_3legs_3_rimage_3(0) [synced_multiple_raid1_3legs_3_rimage_0] iwi-aor--- 500.00m /dev/sdg1(253) [synced_multiple_raid1_3legs_3_rimage_1] iwi-aor-p- 500.00m unknown device(253) [synced_multiple_raid1_3legs_3_rimage_2] iwi-aor--- 500.00m /dev/sde1(253) [synced_multiple_raid1_3legs_3_rimage_3] iwi-aor-p- 500.00m unknown device(253) [synced_multiple_raid1_3legs_3_rmeta_0] ewi-aor--- 4.00m /dev/sdg1(252) [synced_multiple_raid1_3legs_3_rmeta_1] ewi-aor-p- 4.00m unknown device(252) [synced_multiple_raid1_3legs_3_rmeta_2] ewi-aor--- 4.00m /dev/sde1(252) [synced_multiple_raid1_3legs_3_rmeta_3] ewi-aor-p- 4.00m unknown device(252) Verifying FAILED device /dev/sdh1 is *NOT* in the volume(s) Verifying FAILED device /dev/sdb1 is *NOT* in the volume(s) Verifying IMAGE device /dev/sdg1 *IS* in the volume(s) Verifying IMAGE device /dev/sde1 *IS* in the volume(s) verify the rimage/rmeta dm devices remain after the failures Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_1_rimage_3 on: host-005.virt.lab.msp.redhat.com Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_1_rmeta_3 on: host-005.virt.lab.msp.redhat.com Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_2_rimage_3 on: host-005.virt.lab.msp.redhat.com Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_2_rmeta_3 on: host-005.virt.lab.msp.redhat.com Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_3_rimage_3 on: host-005.virt.lab.msp.redhat.com Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_3_rmeta_3 on: host-005.virt.lab.msp.redhat.com Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_1_rimage_1 on: host-005.virt.lab.msp.redhat.com Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_1_rmeta_1 on: host-005.virt.lab.msp.redhat.com Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_2_rimage_1 on: host-005.virt.lab.msp.redhat.com Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_2_rmeta_1 on: host-005.virt.lab.msp.redhat.com Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_3_rimage_1 on: host-005.virt.lab.msp.redhat.com Checking EXISTENCE and STATE of synced_multiple_raid1_3legs_3_rmeta_1 on: host-005.virt.lab.msp.redhat.com Verify the raid image order is what's expected based on raid fault policy EXPECTED LEG ORDER: /dev/sdg1 unknown /dev/sde1 unknown ACTUAL LEG ORDER: /dev/sdg1 unknown /dev/sde1 unknown ACTUAL LEG ORDER: /dev/sdg1 unknown /dev/sde1 unknown ACTUAL LEG ORDER: /dev/sdg1 unknown /dev/sde1 unknown Waiting until all mirror|raid volumes become fully syncd... 3/3 mirror(s) are fully synced: ( 100.00% 100.00% 100.00% ) Fault policy is warn, manually repairing failed raid volumes host-005.virt.lab.msp.redhat.com: 'lvconvert --yes --repair black_bird/synced_multiple_raid1_3legs_1' host-005.virt.lab.msp.redhat.com: 'lvconvert --yes --repair black_bird/synced_multiple_raid1_3legs_2' host-005.virt.lab.msp.redhat.com: 'lvconvert --yes --repair black_bird/synced_multiple_raid1_3legs_3' [DEADLOCK] [root@host-005 ~]# ps -ef | grep lvconvert root 23625 23624 0 Mar08 ? 00:00:00 lvconvert --yes --repair black_bird/synced_multiple_raid1_3legs_3 root 30181 2462 0 18:17 pts/0 00:00:00 grep --color=auto lvconvert [root@host-005 ~]# strace -p 23625 Process 23625 attached semop(42631169, {{0, 0, 0}}, 1 Mar 10 18:18:45 host-005 kernel: [269018.288029] lvconvert S ffff88003fc14540 0 23625 23624 0x00000081 Mar 10 18:18:45 host-005 kernel: [269018.288029] ffff88003fd25cf8 0000000000000086 ffff88003fd25fd8 0000000000014540 Mar 10 18:18:45 host-005 kernel: [269018.288029] ffff88003fd25fd8 0000000000014540 ffff880024b30af0 ffff8800210d5e40 Mar 10 18:18:45 host-005 kernel: [269018.288029] ffff88003fd25db0 0000000000000000 0000000000000000 0000000000000000 Mar 10 18:18:45 host-005 kernel: [269018.288029] Call Trace: Mar 10 18:18:45 host-005 kernel: [269018.288029] [<ffffffff815c6699>] schedule+0x29/0x70 Mar 10 18:18:45 host-005 kernel: [269018.288029] [<ffffffff8122c2fa>] SYSC_semtimedop+0x8ba/0xd00 Mar 10 18:18:45 host-005 kernel: [269018.288029] [<ffffffff81099e5e>] ? dequeue_task_fair+0x41e/0x620 Mar 10 18:18:45 host-005 kernel: [269018.288029] [<ffffffff815c6699>] ? schedule+0x29/0x70 Mar 10 18:18:45 host-005 kernel: [269018.288029] [<ffffffff8106d5bc>] ? ptrace_stop+0x1bc/0x2b0 Mar 10 18:18:45 host-005 kernel: [269018.288029] [<ffffffff8106d738>] ? ptrace_do_notify+0x88/0xa0 Mar 10 18:18:45 host-005 kernel: [269018.288029] [<ffffffff810de6ec>] ? __audit_syscall_entry+0x9c/0xf0 Mar 10 18:18:45 host-005 kernel: [269018.288029] [<ffffffff8101f08d>] ? syscall_trace_enter+0x23d/0x250 Mar 10 18:18:45 host-005 kernel: [269018.288029] [<ffffffff8122d21e>] SyS_semtimedop+0xe/0x10 Mar 10 18:18:45 host-005 kernel: [269018.288029] [<ffffffff8122d230>] SyS_semop+0x10/0x20 Mar 10 18:18:45 host-005 kernel: [269018.288029] [<ffffffff815d1167>] tracesys+0xdd/0xe2 Mar 10 18:18:46 host-005 kernel: lvconvert S ffff88003fc14540 0 23625 23624 0x00000081 Mar 10 18:18:46 host-005 kernel: ffff88003fd25cf8 0000000000000086 ffff88003fd25fd8 0000000000014540 Mar 10 18:18:46 host-005 kernel: ffff88003fd25fd8 0000000000014540 ffff880024b30af0 ffff8800210d5e40 Mar 10 18:18:46 host-005 kernel: ffff88003fd25db0 0000000000000000 0000000000000000 0000000000000000 Mar 10 18:18:46 host-005 kernel: Call Trace: Mar 10 18:18:46 host-005 kernel: [<ffffffff815c6699>] schedule+0x29/0x70 Mar 10 18:18:46 host-005 kernel: [<ffffffff8122c2fa>] SYSC_semtimedop+0x8ba/0xd00 Mar 10 18:18:46 host-005 kernel: [<ffffffff81099e5e>] ? dequeue_task_fair+0x41e/0x620 Mar 10 18:18:46 host-005 kernel: [<ffffffff815c6699>] ? schedule+0x29/0x70 Mar 10 18:18:46 host-005 kernel: [<ffffffff8106d5bc>] ? ptrace_stop+0x1bc/0x2b0 Mar 10 18:18:46 host-005 kernel: [<ffffffff8106d738>] ? ptrace_do_notify+0x88/0xa0 Mar 10 18:18:46 host-005 kernel: [<ffffffff810de6ec>] ? __audit_syscall_entry+0x9c/0xf0 Mar 10 18:18:46 host-005 kernel: [<ffffffff8101f08d>] ? syscall_trace_enter+0x23d/0x250 Mar 10 18:18:46 host-005 kernel: [<ffffffff8122d21e>] SyS_semtimedop+0xe/0x10 Mar 10 18:18:46 host-005 kernel: [<ffffffff8122d230>] SyS_semop+0x10/0x20 Mar 10 18:18:46 host-005 kernel: [<ffffffff815d1167>] tracesys+0xdd/0xe2 Version-Release number of selected component (if applicable): 3.10.0-90.el7.x86_64 lvm2-2.02.105-11.el7 BUILT: Thu Mar 6 02:07:46 CST 2014 lvm2-libs-2.02.105-11.el7 BUILT: Thu Mar 6 02:07:46 CST 2014 lvm2-cluster-2.02.105-11.el7 BUILT: Thu Mar 6 02:07:46 CST 2014 device-mapper-1.02.84-11.el7 BUILT: Thu Mar 6 02:07:46 CST 2014 device-mapper-libs-1.02.84-11.el7 BUILT: Thu Mar 6 02:07:46 CST 2014 device-mapper-event-1.02.84-11.el7 BUILT: Thu Mar 6 02:07:46 CST 2014 device-mapper-event-libs-1.02.84-11.el7 BUILT: Thu Mar 6 02:07:46 CST 2014 device-mapper-persistent-data-0.2.8-4.el7 BUILT: Fri Jan 24 14:28:55 CST 2014 cmirror-2.02.105-11.el7 BUILT: Thu Mar 6 02:07:46 CST 2014