Bug 1020297

Summary: Reduction of a 2-legged mirror while the 3rd leg is being synchronized breaks the mirror.
Product: [Fedora] Fedora Reporter: Zdenek Kabelac <zkabelac>
Component: lvm2 Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED EOL QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 22 CC: agk, bmarzins, bmr, dwysocha, heinzm, jonathan, lvm-team, msnitzer, prajnoha, prockai, zkabelac
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2016-07-19 10:29:52 UTC Type: Bug

Description Zdenek Kabelac 2013-10-17 12:06:28 UTC
Description of problem:

When lvconvert -m-1 is executed on a mirror that is not yet fully synchronized,
the log removal causes the conversion process to abort.

This can be observed in the lvm2 internal test suite
(make check_cluster T=lvconvert-mirror.sh):

lvconvert-mirror.sh: line ~190  

lvconvert -m-1 $vg/$lv1 "$dev2"

If this conversion is executed at the right moment, while the conversion
started by the preceding -m+1 command is not yet fully synchronized, it
fails, and the log shows that some mimagetmp volumes are still present:

metadata/lv_manip.c:696   Stack LV1:0[0] on LV LV1_mimagetmp_2:0
metadata/lv_manip.c:300   Adding LV1:0 as an user of LV1_mimagetmp_2
metadata/lv_manip.c:696   Stack LV1:0[1] on LV LV1_mimage_2:0
metadata/lv_manip.c:300   Adding LV1:0 as an user of LV1_mimage_2
metadata/lv_manip.c:696   Stack LV1_mimagetmp_2:0[0] on LV LV1_mimage_0:0
metadata/lv_manip.c:300   Adding LV1_mimagetmp_2:0 as an user of LV1_mimage_0
metadata/lv_manip.c:300   Adding LV1_mimagetmp_2:0 as an user of LV1_mlog
locking/cluster_locking.c:428   Requesting backup of VG metadata for @PREFIX@vg
format_text/archiver.c:226   WARNING: This metadata update is NOT backed up
metadata/mirror.c:1102   Failed to initialize log device

Version-Release number of selected component (if applicable):
2.02.103

How reproducible:


Steps to Reproduce:
1. lvconvert -m+1 on a 2-legged mirror
2. Immediately afterwards, run lvconvert -m-1 while the first conversion is still synchronizing.

Actual results:


Expected results:


Additional info:

Comment 1 Zdenek Kabelac 2013-10-18 09:34:34 UTC
So the problem appears to be related to how the mirror evaluates whether it is still a mirror.

If the mirror has just 2 legs, and I add 1 leg which is slowly being synchronized, and while the synchronization is still in progress I cut off one of the in-sync legs, the mirror tries to demolish itself.
This is caught by the clvmd test, which is missing the lock on log_lv,
so the lvconvert attempt to deactivate this LV (which has no lock) fails.

That is probably the reason for the odd sequence in the lvm2 code
(lib/metadata/mirror.c, line ~300):

	/* If the LV is active, deactivate it first. */
	if (lv_is_active(log_lv)) {
		(void) deactivate_lv(cmd, log_lv);
		/*
		 * FIXME: workaround to fail early
		 * Ensure that log is really deactivated because deactivate_lv
		 * on cluster do not fail if there is log_lv with different UUID.
		 */
		if (lv_is_active(log_lv)) {
			log_error("Aborting. Unable to deactivate mirror log.");
			goto revert_new_lv;
		}


This fails in a cluster because without the lock no deactivation is performed, so the LV remains active and the second lv_is_active() check triggers the abort.

This is a sign of multiple bugs here: the mirror needs to be deactivated through its master LV, and deactivation should never go to the subdevices.

There is also the problem of detecting what is and what is not a mirror. If the mirror is being upscaled and removing a leg would cause the device to no longer be mirrored, we seem to have multiple choices:
let it go - the mirror will catch up later and become a mirror again?
abort instantly - and warn the user that they need to wait until the mirror has synced its legs?
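
From the user side, the second choice can be approximated by checking the sync state before reducing. This is only a minimal sketch, assuming the copy_percent field reported by lvs reads 100 once all legs are in sync. The helper names is_synced and safe_reduce are hypothetical and not part of lvm2:

```shell
#!/bin/sh
# Sketch only: refuse to drop a mirror leg until lvs reports 100% sync.
# is_synced and safe_reduce are hypothetical helpers, not lvm2 commands.

# is_synced PERCENT
# Succeeds when PERCENT (as printed by "lvs -o copy_percent") is 100,
# e.g. "100.00" or "100,00". Anything else, including an empty value,
# is treated as "not in sync".
is_synced() {
    case "$1" in
        100|100[.,]*) return 0 ;;
        *) return 1 ;;
    esac
}

# safe_reduce VG/LV PV
# Removes the leg on PV only if the mirror is fully synchronized.
safe_reduce() {
    pct=$(lvs --noheadings -o copy_percent "$1" | tr -d ' ')
    if is_synced "$pct"; then
        lvconvert -m-1 "$1" "$2"
    else
        echo "Mirror $1 is at ${pct:-unknown}% sync; wait before removing a leg." >&2
        return 1
    fi
}
```

For example, safe_reduce $vg/$lv1 "$dev2" would stand in for the plain lvconvert -m-1 call. This only sidesteps the failure from outside; it does not address the log deactivation problem in lvm2 itself.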

Here is a short 'test-suite' script that demonstrates the problem:

--
. lib/test

aux prepare_pvs 4
vgcreate -s 32k $vg $(cat DEVICES)

# "remove from original mirror (the original is still mirror)"
lvcreate -aey -l10 --type mirror -m1 -n $lv1 $vg "$dev1" "$dev2" "$dev3"
aux delay_dev "$dev4" 0 100
lvconvert -m+1 -b $vg/$lv1 "$dev4"
lvs -a -o+devices $vg

lvconvert -m-1 $vg/$lv1 "$dev2"
lvs -a -o+devices $vg

vgremove -ff $vg
--

Comment 2 Jaroslav Reznik 2015-03-03 15:08:51 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

Comment 3 Fedora End Of Life 2016-07-19 10:29:52 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.