Note: This bug is displayed in read-only format because
the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
DescriptionCorey Marthaler
2013-12-06 01:34:19 UTC
Description of problem:
Create a mirror, fail a leg, watch the down convert, bring the failed leg back online, upconvert it back to the original state, fail another leg while it's in process of converting, deadlock.
There is also another bug here I'm tracking down as I've found times when although the convert finishes to 100%, the tmp image never gets dropped. If this test waits for the syncing to finish (assuming it doesn't hit the other bug I just mentioned) this deadlock will not happen, if it doesn't wait, the deadlock always happens.
# ./revolution_9
Creating mirror(s) on host-064.virt.lab.msp.redhat.com...
host-064.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog disk -m 2 -n mirror_1 -L 2G revolution_9 /dev/sdc1:0-2557 /dev/sdd1:0-2557 /dev/sde1:0-2557 /dev/sdf1:0-150
Waiting until all mirror|raid volumes become fully syncd...
0/1 mirror(s) are fully synced: ( 7.30% )
[...]
1/1 mirror(s) are fully synced: ( 100.00% )
Creating ext on top of mirror(s) on host-064.virt.lab.msp.redhat.com...
mke2fs 1.42.8 (20-Jun-2013)
Mounting mirrored ext filesystems on host-064.virt.lab.msp.redhat.com...
================================================================================
Iteration 0.1 started at Thu Dec 5 17:16:16 CST 2013
================================================================================
ACTUAL LEG ORDER: /dev/sdc1 /dev/sdd1 /dev/sde1
Scenario kill_non_primary_leg: Kill non primary leg
********* Mirror info for this scenario *********
* mirrors: mirror_1
* leg devices: /dev/sdc1 /dev/sdd1 /dev/sde1
* log devices: /dev/sdf1
* failpv(s): /dev/sde1
* failnode(s): host-064.virt.lab.msp.redhat.com
* lvmetad: 0
* leg fault policy: remove
* log fault policy: allocate
*************************************************
Current mirror/raid device structure(s):
LV Attr LSize Cpy%Sync Devices
mirror_1 mwi-aom--- 2.00g 100.00 mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0)
[mirror_1_mimage_0] iwi-aom--- 2.00g /dev/sdc1(0)
[mirror_1_mimage_1] iwi-aom--- 2.00g /dev/sdd1(0)
[mirror_1_mimage_2] iwi-aom--- 2.00g /dev/sde1(0)
[mirror_1_mlog] lwi-aom--- 4.00m /dev/sdf1(0)
AVAIL:2 - NEEDED:1
will_alloc_work=1
PV=/dev/sde1
mirror_1_mimage_2: 6
PV=/dev/sde1
mirror_1_mimage_2: 6
Writing verification files (checkit) to mirror(s) on...
---- host-064.virt.lab.msp.redhat.com ----
Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
---- host-064.virt.lab.msp.redhat.com ----
Disabling device sde on host-064.virt.lab.msp.redhat.com
Getting recovery check start time from /var/log/messages: Dec 5 17:16
Attempting I/O to cause mirror down conversion(s) on host-064.virt.lab.msp.redhat.com
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.113929 s, 368 MB/s
Verifying current sanity of lvm after the failure
Current mirror/raid device structure(s):
Couldn't find device with uuid M1Rpc0-G8Sr-fpoo-kSe9-hszw-KrBS-mPNydP.
LV Attr LSize Cpy%Sync Devices
mirror_1 mwi-aom--- 2.00g 100.00 mirror_1_mimage_0(0),mirror_1_mimage_1(0)
[mirror_1_mimage_0] iwi-aom--- 2.00g /dev/sdc1(0)
[mirror_1_mimage_1] iwi-aom--- 2.00g /dev/sdd1(0)
[mirror_1_mlog] lwi-aom--- 4.00m /dev/sdf1(0)
Verify that each of the mirror repairs finished successfully
Verifying FAILED device /dev/sde1 is *NOT* in the volume(s)
olog: 2.2
Verifying LOG device(s) /dev/sdf1 *ARE* in the mirror(s)
Verifying LEG device /dev/sdc1 *IS* in the volume(s)
Verifying LEG device /dev/sdd1 *IS* in the volume(s)
verify the dm devices associated with /dev/sde1 have been removed as expected
Checking REMOVAL of mirror_1_mimage_2 on: host-064.virt.lab.msp.redhat.com
Verify that the mirror image order remains the same after the down conversion
EXPECTED LEG ORDER: /dev/sdc1 /dev/sdd1
ACTUAL LEG ORDER: /dev/sdc1 /dev/sdd1
Verifying files (checkit) on mirror(s) on...
---- host-064.virt.lab.msp.redhat.com ----
Enabling device sde on host-064.virt.lab.msp.redhat.com
HACK FOR BUG 799048: vgextend --restoremissing revolution_9 /dev/sde1
WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 7
HACK FOR BUG 799048: vgextend revolution_9 /dev/sde1
Physical volume '/dev/sde1' is already in volume group 'revolution_9'
Unable to add physical volume '/dev/sde1' to volume group 'revolution_9'.
Getting PE for leg: /dev/sdc1
Getting PE for leg: /dev/sdd1
Getting PE for leg: /dev/sde1
Up converting linear(s) back to mirror(s) on host-064.virt.lab.msp.redhat.com...
host-064.virt.lab.msp.redhat.com: lvconvert --mirrorlog disk -m 2 -b revolution_9/mirror_1 /dev/sdc1:0-2557 /dev/sdd1:0-2557 /dev/sde1:0-2557 /dev/sdf1:0-150
Verifying the up conversions by checking if all original leg devices are back in the mirror(s)
Verifying device /dev/sdc1 *IS* one of the legs in the mirror(s)
Verifying device /dev/sdd1 *IS* one of the legs in the mirror(s)
Verifying device /dev/sde1 *IS* one of the legs in the mirror(s)
Waiting until all mirror|raid volumes become fully syncd...
0/1 mirror(s) are fully synced: ( 23.75% )
0/1 mirror(s) are fully synced: ( 34.11% )
0/1 mirror(s) are fully synced: ( 53.27% )
0/1 mirror(s) are fully synced: ( 65.94% )
0/1 mirror(s) are fully synced: ( 95.14% )
1/1 mirror(s) are fully synced: ( 100.00% )
Verifying files (checkit) on mirror(s) on...
---- host-064.virt.lab.msp.redhat.com ----
Stopping the io load (collie/xdoio) on mirror(s)
================================================================================
Iteration 0.2 started at Thu Dec 5 17:18:57 CST 2013
================================================================================
ACTUAL LEG ORDER: /dev/sdc1 /dev/sdd1 /dev/sde1
Scenario kill_log: Kill log
********* Mirror info for this scenario *********
* mirrors: mirror_1
* leg devices: /dev/sdc1 /dev/sdd1 /dev/sde1
* log devices: /dev/sdf1
* failpv(s): /dev/sdf1
* failnode(s): host-064.virt.lab.msp.redhat.com
* lvmetad: 0
* leg fault policy: remove
* log fault policy: allocate
*************************************************
#### NOTE THE TMP IMAGE STILL EXISTS FROM PREVIOUS FAILURE ITERATION: ####
Current mirror/raid device structure(s):
LV Attr LSize Cpy%Sync Devices
mirror_1 cwi-aom--- 2.00g 100.00 mirror_1_mimagetmp_2(0),mirror_1_mimage_2(0)
[mirror_1_mimage_0] iwi-aom--- 2.00g /dev/sdc1(0)
[mirror_1_mimage_1] iwi-aom--- 2.00g /dev/sdd1(0)
[mirror_1_mimage_2] iwi-aom--- 2.00g /dev/sde1(0)
[mirror_1_mimagetmp_2] mwi-aom--- 2.00g 100.00 mirror_1_mimage_0(0),mirror_1_mimage_1(0)
[mirror_1_mlog] lwi-aom--- 4.00m /dev/sdf1(0)
AVAIL:2 - NEEDED:1
will_alloc_work=1
PV=/dev/sdf1
mirror_1_mlog: 1.2
PV=/dev/sdf1
mirror_1_mlog: 1.2
Writing verification files (checkit) to mirror(s) on...
---- host-064.virt.lab.msp.redhat.com ----
Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
---- host-064.virt.lab.msp.redhat.com ----
Disabling device sdf on host-064.virt.lab.msp.redhat.com
Getting recovery check start time from /var/log/messages: Dec 5 17:19
Attempting I/O to cause mirror down conversion(s) on host-064.virt.lab.msp.redhat.com
[DEADLOCK]
Version-Release number of selected component (if applicable):
3.10.0-53.el7.x86_64
lvm2-2.02.103-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013
lvm2-libs-2.02.103-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013
lvm2-cluster-2.02.103-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013
device-mapper-1.02.82-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013
device-mapper-libs-1.02.82-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013
device-mapper-event-1.02.82-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013
device-mapper-event-libs-1.02.82-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013
device-mapper-persistent-data-0.2.8-2.el7 BUILT: Wed Oct 30 10:20:48 CDT 2013
cmirror-2.02.103-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013
How reproducible:
Often
Comment 1Jonathan Earl Brassow
2014-02-20 22:29:51 UTC
Description of problem: Create a mirror, fail a leg, watch the down convert, bring the failed leg back online, upconvert it back to the original state, fail another leg while it's in process of converting, deadlock. There is also another bug here I'm tracking down as I've found times when although the convert finishes to 100%, the tmp image never gets dropped. If this test waits for the syncing to finish (assuming it doesn't hit the other bug I just mentioned) this deadlock will not happen, if it doesn't wait, the deadlock always happens. # ./revolution_9 Creating mirror(s) on host-064.virt.lab.msp.redhat.com... host-064.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog disk -m 2 -n mirror_1 -L 2G revolution_9 /dev/sdc1:0-2557 /dev/sdd1:0-2557 /dev/sde1:0-2557 /dev/sdf1:0-150 Waiting until all mirror|raid volumes become fully syncd... 0/1 mirror(s) are fully synced: ( 7.30% ) [...] 1/1 mirror(s) are fully synced: ( 100.00% ) Creating ext on top of mirror(s) on host-064.virt.lab.msp.redhat.com... mke2fs 1.42.8 (20-Jun-2013) Mounting mirrored ext filesystems on host-064.virt.lab.msp.redhat.com... ================================================================================ Iteration 0.1 started at Thu Dec 5 17:16:16 CST 2013 ================================================================================ ACTUAL LEG ORDER: /dev/sdc1 /dev/sdd1 /dev/sde1 Scenario kill_non_primary_leg: Kill non primary leg ********* Mirror info for this scenario ********* * mirrors: mirror_1 * leg devices: /dev/sdc1 /dev/sdd1 /dev/sde1 * log devices: /dev/sdf1 * failpv(s): /dev/sde1 * failnode(s): host-064.virt.lab.msp.redhat.com * lvmetad: 0 * leg fault policy: remove * log fault policy: allocate ************************************************* Current mirror/raid device structure(s): LV Attr LSize Cpy%Sync Devices mirror_1 mwi-aom--- 2.00g 100.00 mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0) [mirror_1_mimage_0] iwi-aom--- 2.00g /dev/sdc1(0) [mirror_1_mimage_1] iwi-aom--- 2.00g /dev/sdd1(0) [mirror_1_mimage_2] iwi-aom--- 2.00g /dev/sde1(0) [mirror_1_mlog] lwi-aom--- 4.00m /dev/sdf1(0) AVAIL:2 - NEEDED:1 will_alloc_work=1 PV=/dev/sde1 mirror_1_mimage_2: 6 PV=/dev/sde1 mirror_1_mimage_2: 6 Writing verification files (checkit) to mirror(s) on... ---- host-064.virt.lab.msp.redhat.com ---- Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure Verifying files (checkit) on mirror(s) on... ---- host-064.virt.lab.msp.redhat.com ---- Disabling device sde on host-064.virt.lab.msp.redhat.com Getting recovery check start time from /var/log/messages: Dec 5 17:16 Attempting I/O to cause mirror down conversion(s) on host-064.virt.lab.msp.redhat.com 10+0 records in 10+0 records out 41943040 bytes (42 MB) copied, 0.113929 s, 368 MB/s Verifying current sanity of lvm after the failure Current mirror/raid device structure(s): Couldn't find device with uuid M1Rpc0-G8Sr-fpoo-kSe9-hszw-KrBS-mPNydP. LV Attr LSize Cpy%Sync Devices mirror_1 mwi-aom--- 2.00g 100.00 mirror_1_mimage_0(0),mirror_1_mimage_1(0) [mirror_1_mimage_0] iwi-aom--- 2.00g /dev/sdc1(0) [mirror_1_mimage_1] iwi-aom--- 2.00g /dev/sdd1(0) [mirror_1_mlog] lwi-aom--- 4.00m /dev/sdf1(0) Verify that each of the mirror repairs finished successfully Verifying FAILED device /dev/sde1 is *NOT* in the volume(s) olog: 2.2 Verifying LOG device(s) /dev/sdf1 *ARE* in the mirror(s) Verifying LEG device /dev/sdc1 *IS* in the volume(s) Verifying LEG device /dev/sdd1 *IS* in the volume(s) verify the dm devices associated with /dev/sde1 have been removed as expected Checking REMOVAL of mirror_1_mimage_2 on: host-064.virt.lab.msp.redhat.com Verify that the mirror image order remains the same after the down conversion EXPECTED LEG ORDER: /dev/sdc1 /dev/sdd1 ACTUAL LEG ORDER: /dev/sdc1 /dev/sdd1 Verifying files (checkit) on mirror(s) on... ---- host-064.virt.lab.msp.redhat.com ---- Enabling device sde on host-064.virt.lab.msp.redhat.com HACK FOR BUG 799048: vgextend --restoremissing revolution_9 /dev/sde1 WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 7 HACK FOR BUG 799048: vgextend revolution_9 /dev/sde1 Physical volume '/dev/sde1' is already in volume group 'revolution_9' Unable to add physical volume '/dev/sde1' to volume group 'revolution_9'. Getting PE for leg: /dev/sdc1 Getting PE for leg: /dev/sdd1 Getting PE for leg: /dev/sde1 Up converting linear(s) back to mirror(s) on host-064.virt.lab.msp.redhat.com... host-064.virt.lab.msp.redhat.com: lvconvert --mirrorlog disk -m 2 -b revolution_9/mirror_1 /dev/sdc1:0-2557 /dev/sdd1:0-2557 /dev/sde1:0-2557 /dev/sdf1:0-150 Verifying the up conversions by checking if all original leg devices are back in the mirror(s) Verifying device /dev/sdc1 *IS* one of the legs in the mirror(s) Verifying device /dev/sdd1 *IS* one of the legs in the mirror(s) Verifying device /dev/sde1 *IS* one of the legs in the mirror(s) Waiting until all mirror|raid volumes become fully syncd... 0/1 mirror(s) are fully synced: ( 23.75% ) 0/1 mirror(s) are fully synced: ( 34.11% ) 0/1 mirror(s) are fully synced: ( 53.27% ) 0/1 mirror(s) are fully synced: ( 65.94% ) 0/1 mirror(s) are fully synced: ( 95.14% ) 1/1 mirror(s) are fully synced: ( 100.00% ) Verifying files (checkit) on mirror(s) on... ---- host-064.virt.lab.msp.redhat.com ---- Stopping the io load (collie/xdoio) on mirror(s) ================================================================================ Iteration 0.2 started at Thu Dec 5 17:18:57 CST 2013 ================================================================================ ACTUAL LEG ORDER: /dev/sdc1 /dev/sdd1 /dev/sde1 Scenario kill_log: Kill log ********* Mirror info for this scenario ********* * mirrors: mirror_1 * leg devices: /dev/sdc1 /dev/sdd1 /dev/sde1 * log devices: /dev/sdf1 * failpv(s): /dev/sdf1 * failnode(s): host-064.virt.lab.msp.redhat.com * lvmetad: 0 * leg fault policy: remove * log fault policy: allocate ************************************************* #### NOTE THE TMP IMAGE STILL EXISTS FROM PREVIOUS FAILURE ITERATION: #### Current mirror/raid device structure(s): LV Attr LSize Cpy%Sync Devices mirror_1 cwi-aom--- 2.00g 100.00 mirror_1_mimagetmp_2(0),mirror_1_mimage_2(0) [mirror_1_mimage_0] iwi-aom--- 2.00g /dev/sdc1(0) [mirror_1_mimage_1] iwi-aom--- 2.00g /dev/sdd1(0) [mirror_1_mimage_2] iwi-aom--- 2.00g /dev/sde1(0) [mirror_1_mimagetmp_2] mwi-aom--- 2.00g 100.00 mirror_1_mimage_0(0),mirror_1_mimage_1(0) [mirror_1_mlog] lwi-aom--- 4.00m /dev/sdf1(0) AVAIL:2 - NEEDED:1 will_alloc_work=1 PV=/dev/sdf1 mirror_1_mlog: 1.2 PV=/dev/sdf1 mirror_1_mlog: 1.2 Writing verification files (checkit) to mirror(s) on... ---- host-064.virt.lab.msp.redhat.com ---- Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure Verifying files (checkit) on mirror(s) on... ---- host-064.virt.lab.msp.redhat.com ---- Disabling device sdf on host-064.virt.lab.msp.redhat.com Getting recovery check start time from /var/log/messages: Dec 5 17:19 Attempting I/O to cause mirror down conversion(s) on host-064.virt.lab.msp.redhat.com [DEADLOCK] Version-Release number of selected component (if applicable): 3.10.0-53.el7.x86_64 lvm2-2.02.103-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013 lvm2-libs-2.02.103-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013 lvm2-cluster-2.02.103-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013 device-mapper-1.02.82-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013 device-mapper-libs-1.02.82-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013 device-mapper-event-1.02.82-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013 device-mapper-event-libs-1.02.82-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013 device-mapper-persistent-data-0.2.8-2.el7 BUILT: Wed Oct 30 10:20:48 CDT 2013 cmirror-2.02.103-6.el7 BUILT: Wed Nov 27 02:28:25 CST 2013 How reproducible: Often