Hide Forgot
Description of problem: Create a mirror, fail a leg, watch the down convert, bring the failed leg back online, upconvert it back to the original state, fail another leg while it's in process of converting, deadlock. There is also another bug here I'm tracking down as I've found times when although the convert finishes to 100%, the tmp image never gets dropped. If this test waits for the syncing to finish (assuming it doesn't hit the other bug I just mentioned) this deadlock will not happen, if it doesn't wait, the deadlock always happens. # ./revolution_9 Creating mirror(s) on host-064.virt.lab.msp.redhat.com... host-064.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog disk -m 2 -n mirror_1 -L 2G revolution_9 /dev/sdc1:0-2557 /dev/sdd1:0-2557 /dev/sde1:0-2557 /dev/sdf1:0-150 Waiting until all mirror|raid volumes become fully syncd... 0/1 mirror(s) are fully synced: ( 7.30% ) [...] 1/1 mirror(s) are fully synced: ( 100.00% ) Creating ext on top of mirror(s) on host-064.virt.lab.msp.redhat.com... mke2fs 1.42.8 (20-Jun-2013) Mounting mirrored ext filesystems on host-064.virt.lab.msp.redhat.com... ================================================================================ Iteration 0.1 started at Thu Dec 5 17:16:16 CST 2013 ================================================================================ ACTUAL LEG ORDER: /dev/sdc1 /dev/sdd1 /dev/sde1 Scenario kill_non_primary_leg: Kill non primary leg ********* Mirror info for this scenario ********* * mirrors: mirror_1 * leg devices: /dev/sdc1 /dev/sdd1 /dev/sde1 * log devices: /dev/sdf1 * failpv(s): /dev/sde1 * failnode(s): host-064.virt.lab.msp.redhat.com * lvmetad: 0 * leg fault policy: remove * log fault policy: allocate ************************************************* Current mirror/raid device structure(s): LV Attr LSize Cpy%Sync Devices mirror_1 mwi-aom--- 2.00g 100.00 mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0) [mirror_1_mimage_0] iwi-aom--- 2.00g /dev/sdc1(0) [mirror_1_mimage_1] iwi-aom--- 2.00g /dev/sdd1(0) [mirror_1_mimage_2] iwi-aom--- 2.00g /dev/sde1(0) [mirror_1_mlog] lwi-aom--- 4.00m /dev/sdf1(0) AVAIL:2 - NEEDED:1 will_alloc_work=1 PV=/dev/sde1 mirror_1_mimage_2: 6 PV=/dev/sde1 mirror_1_mimage_2: 6 Writing verification files (checkit) to mirror(s) on... ---- host-064.virt.lab.msp.redhat.com ---- Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure Verifying files (checkit) on mirror(s) on... ---- host-064.virt.lab.msp.redhat.com ---- Disabling device sde on host-064.virt.lab.msp.redhat.com Getting recovery check start time from /var/log/messages: Dec 5 17:16 Attempting I/O to cause mirror down conversion(s) on host-064.virt.lab.msp.redhat.com 10+0 records in 10+0 records out 41943040 bytes (42 MB) copied, 0.113929 s, 368 MB/s Verifying current sanity of lvm after the failure Current mirror/raid device structure(s): Couldn't find device with uuid M1Rpc0-G8Sr-fpoo-kSe9-hszw-KrBS-mPNydP. LV Attr LSize Cpy%Sync Devices mirror_1 mwi-aom--- 2.00g 100.00 mirror_1_mimage_0(0),mirror_1_mimage_1(0) [mirror_1_mimage_0] iwi-aom--- 2.00g /dev/sdc1(0) [mirror_1_mimage_1] iwi-aom--- 2.00g /dev/sdd1(0) [mirror_1_mlog] lwi-aom--- 4.00m /dev/sdf1(0) Verify that each of the mirror repairs finished successfully Verifying FAILED device /dev/sde1 is *NOT* in the volume(s) olog: 2.2 Verifying LOG device(s) /dev/sdf1 *ARE* in the mirror(s) Verifying LEG device /dev/sdc1 *IS* in the volume(s) Verifying LEG device /dev/sdd1 *IS* in the volume(s) verify the dm devices associated with /dev/sde1 have been removed as expected Checking REMOVAL of mirror_1_mimage_2 on: host-064.virt.lab.msp.redhat.com Verify that the mirror image order remains the same after the down conversion EXPECTED LEG ORDER: /dev/sdc1 /dev/sdd1 ACTUAL LEG ORDER: /dev/sdc1 /dev/sdd1 Verifying files (checkit) on mirror(s) on... ---- host-064.virt.lab.msp.redhat.com ---- Enabling device sde on host-064.virt.lab.msp.redhat.com HACK FOR BUG 799048: vgextend --restoremissing revolution_9 /dev/sde1 WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 7 HACK FOR BUG 799048: vgextend revolution_9 /dev/sde1 Physical volume '/dev/sde1' is already in volume group 'revolution_9' Unable to add physical volume '/dev/sde1' to volume group 'revolution_9'. Getting PE for leg: /dev/sdc1 Getting PE for leg: /dev/sdd1 Getting PE for leg: /dev/sde1 Up converting linear(s) back to mirror(s) on host-064.virt.lab.msp.redhat.com... host-064.virt.lab.msp.redhat.com: lvconvert --mirrorlog disk -m 2 -b revolution_9/mirror_1 /dev/sdc1:0-2557 /dev/sdd1:0-2557 /dev/sde1:0-2557 /dev/sdf1:0-150 Verifying the up conversions by checking if all original leg devices are back in the mirror(s) Verifying device /dev/sdc1 *IS* one of the legs in the mirror(s) Verifying device /dev/sdd1 *IS* one of the legs in the mirror(s) Verifying device /dev/sde1 *IS* one of the legs in the mirror(s) Waiting until all mirror|raid volumes become fully syncd... 0/1 mirror(s) are fully synced: ( 23.75% ) 0/1 mirror(s) are fully synced: ( 34.11% ) 0/1 mirror(s) are fully synced: ( 53.27% ) 0/1 mirror(s) are fully synced: ( 65.94% ) 0/1 mirror(s) are fully synced: ( 95.14% ) 1/1 mirror(s) are fully synced: ( 100.00% ) Verifying files (checkit) on mirror(s) on... ---- host-064.virt.lab.msp.redhat.com ---- Stopping the io load (collie/xdoio) on mirror(s) ================================================================================ Iteration 0.2 started at Thu Dec 5 17:18:57 CST 2013 ================================================================================ ACTUAL LEG ORDER: /dev/sdc1 /dev/sdd1 /dev/sde1 Scenario kill_log: Kill log ********* Mirror info for this scenario ********* * mirrors: mirror_1 * leg devices: /dev/sdc1 /dev/sdd1 /dev/sde1 * log devices: /dev/sdf1 * failpv(s): /dev/sdf1 * failnode(s): host-064.virt.lab.msp.redhat.com * lvmetad: 0 * leg fault policy: remove * log fault policy: allocate ************************************************* #### NOTE THE TMP IMAGE STILL EXISTS FROM PREVIOUS FAILURE ITERATION: #### Current mirror/raid device structure(s): LV Attr LSize Cpy%Sync Devices mirror_1 cwi-aom--- 2.00g 100.00 mirror_1_mimagetmp_2(0),mirror_1_mimage_2(0) [mirror_1_mimage_0] iwi-aom--- 2.00g /dev/sdc1(0) [mirror_1_mimage_1] iwi-aom--- 2.00g /dev/sdd1(0) [mirror_1_mimage_2] iwi-aom--- 2.00g /dev/sde1(0) [mirror_1_mimagetmp_2] mwi-aom--- 2.00g 100.00 mirror_1_mimage_0(0),mirror_1_mimage_1(0) [mirror_1_mlog] lwi-aom--- 4.00m /dev/sdf1(0) AVAIL:2 - NEEDED:1 will_alloc_work=1 PV=/dev/sdf1 mirror_1_mlog: 1.2 PV=/dev/sdf1 mirror_1_mlog: 1.2 Writing verification files (checkit) to mirror(s) on... ---- host-064.virt.lab.msp.redhat.com ---- Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure Verifying files (checkit) on mirror(s) on... ---- host-064.virt.lab.msp.redhat.com ---- Disabling device sdf on host-064.virt.lab.msp.redhat.com Getting recovery check start time from /var/log/messages: Dec 5 17:19 Attempting I/O to cause mirror down conversion(s) on host-064.virt.lab.msp.redhat.com [DEADLOCK] Version-Release number of selected component (if applicable): 3.10.0-53.el7.x86_64 lvm2-2.02.103-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013 lvm2-libs-2.02.103-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013 lvm2-cluster-2.02.103-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013 device-mapper-1.02.82-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013 device-mapper-libs-1.02.82-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013 device-mapper-event-1.02.82-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013 device-mapper-event-libs-1.02.82-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013 device-mapper-persistent-data-0.2.8-2.el7 BUILT: Wed Oct 30 10:20:48 CDT 2013 cmirror-2.02.103-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013 How reproducible: Often
same problem as bug 1029218
(In reply to Jonathan Earl Brassow from comment #1) > same problem as bug 1029218 D'oh! Same as bug 1039218