Bug 1633167
| Summary: | lvm tried to deactivate subLV of raid1 LV in a tagged VG when VG deactivated | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | nikhil kshirsagar <nkshirsa> |
| Component: | lvm2 | Assignee: | Heinz Mauelshagen <heinzm> |
| lvm2 sub component: | Mirroring and RAID | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | agk, cmarthal, heinzm, jbrassow, loberman, lvm-team, mcsontos, msnitzer, nkshirsa, nwahl, prajnoha, rbednar, rhandlin, yoliynyk, zkabelac |
| Version: | 7.5 | | |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | lvm2-2.02.184-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-08-06 13:10:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1577173 | | |
| Attachments: | test results (attachment 1586603) | | |
Description  nikhil kshirsagar  2018-09-26 10:40:53 UTC
Looking over the BZ description, one issue stands out: using a clustered VG (clvmd) and passing the --config option with an activation command (vgchange) is an unsupported combination, because the clvmd activation service runs autonomously from the command itself. Going to take a deeper look at the original issue.

I'm still unclear about what is going on, but how did it happen that the _rmeta_ LVs are visible? Looking over the archive, it seems the LV itself was created on a different node? The core trouble seems to be that the _rmeta_ LVs are present with this tag, which the validation code likely failed to check, so they were stored in the metadata (in 'lvs -a' you can also notice the missing [] around the names). So is there something special going on during lvcreate? I have not yet been able to reproduce creating a raid1 LV with visible _rmeta_ devices.

There is definitely a bug in the validation code: it does not check that internal LVs are invisible. So during 'vgchange -an' all visible LVs are tried for deactivation, and if an _rmeta_ LV is visible and in use by the raid1 LV itself, the user gets an error about failing to deactivate an LV which is open. A workaround should be to 'vgcfgbackup' such a VG, remove the "VISIBLE" attribute from the _rmeta_ LVs and 'vgcfgrestore' them back (example commands are sketched a few comments below). The user should also always be able to deactivate the raid1 LV directly by name with 'lvchange -an vg/lv'. lvm2 surely needs an extension of its metadata validation so that an attempt to store a visible '_rmeta_' LV is caught before it hits metadata storage.

We need to see 'sos' reports from all nodes; please provide attachments from all clustered nodes. The reported case is not reproducible with the shown steps so far. The sos reports are needed to better analyze how the raid's _rmeta_ images were left visible in the metadata.

The single 'sos' report file provided just shows that the raid LV already appeared there invalid.

So after verification: there really is a short moment during RAID creation when the raid LV is committed with its _rmeta_ LVs still carrying the VISIBLE flag; those _rmeta_ LVs are then zeroed and a new commit making them invisible follows shortly. However, if there is a crash just within this tiny time window, the raid LV is left in an unknown state; the user likely should remove such an LV and create it again. This is seen as a creation sequence bug in raid LVs. If the customer has a deadlocking issue, they should probably switch to using the older '--type mirror', where the ordering of the zeroing is correct.

Upstream commit 16ae968d24b4fe3264dc9b46063345ff2846957b avoids committing SubLVs for wiping, thus not causing them to turn into remnants on crashes.

(In reply to Zdenek Kabelac from comment #22)
> So after verification: there really is a short moment during RAID creation
> when the raid LV is committed with its _rmeta_ LVs still carrying the
> VISIBLE flag; those _rmeta_ LVs are then zeroed and a new commit making
> them invisible follows shortly.
>
> However, if there is a crash just within this tiny time window, the raid LV
> is left in an unknown state; the user likely should remove such an LV and
> create it again.
>
> This is seen as a creation sequence bug in raid LVs.
>
> If the customer has a deadlocking issue, they should probably switch to
> using the older '--type mirror', where the ordering of the zeroing is
> correct.

Switching to the 'mirror' type is not mandatory with the patch from the previous comment.
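A minimal sketch of the workaround described above. The VG name "vg00", the LV name "raidlv" and the file path are hypothetical placeholders, not taken from this report; adapt and verify them before use:

```
# Dump the VG metadata into an editable text file (hypothetical names).
vgcfgbackup -f /tmp/vg00.metadata vg00

# In /tmp/vg00.metadata, locate the stray *_rmeta_* logical volume sections
# and drop "VISIBLE" from their status list, e.g. change
#     status = ["READ", "WRITE", "VISIBLE"]
# to
#     status = ["READ", "WRITE"]
# (edit by hand rather than scripting it; the file is plain LVM metadata text)

# Write the corrected metadata back into the VG.
vgcfgrestore -f /tmp/vg00.metadata vg00

# Independently of the metadata edit, the raid1 LV itself should still be
# deactivatable directly by name:
lvchange -an vg00/raidlv
```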
The patch 16ae968d24b4fe3264dc9b46063345ff2846957b still seems to lack a fix for committing raid metadata with visible _rmeta_ LVs.

And the patch also introduces another potential problem: the _rmeta_ LVs present in the DM table can be misidentified as other types of block devices. While the very generic activation for wipe_fs was a bit 'annoying', we could be sure there was later zero chance of this device being identified as e.g. some other device/fs/mdraid member. By clearing only the first sector, as the proposed patch does, we leave other signatures in place, so in case there is any issue with raid activation, such a device can possibly be misused. To stay 'reasonably' secure we probably need to erase at least 64K from the front and the end of the device, although there are unusual filesystems like ZFS whose signature is stored (and later identified) in a far more complicated way. It would probably be inefficient to clear the whole _rmeta_ device though, since with a large extent size we might end up erasing a lot of space.

Another slight advantage of the previous 'activation' method was the option to introduce use of the TRIM ioctl in a very simple way for such a device, although we already provide support for trimming 'removed' PV space, so we might consider generalizing this concept and using it also for 'allocation' of new LVs.

(In reply to Zdenek Kabelac from comment #25)
> The patch 16ae968d24b4fe3264dc9b46063345ff2846957b still seems to lack a
> fix for committing raid metadata with visible _rmeta_ LVs.
>
> And the patch also introduces another potential problem: the _rmeta_ LVs
> present in the DM table can be misidentified as other types of block
> devices.

The patch does not introduce the potential problem you describe anew at all! It keeps the semantics we had before (wiping one sector at the beginning). So your statement points at an enhancement request to reduce any potential bogus discoveries by e.g. libblkid even further from what we always allowed for. I can add an additional patch changing the given semantics after studying which signatures of what size we actually should take into consideration, so that we can minimize wiping overhead.

(In reply to Heinz Mauelshagen from comment #26)
> The patch does not introduce the potential problem you describe anew at
> all!

As per discussion with Zdenek: not using wipe_lv actually loses us the wiping of all known signatures, so we need to keep using that API for the time being. Reverting and coming up with a new patch...

lvm2 upstream commit dd5716ddf258c4a44819fa90d3356833ccf767b4; stable branch commit 9b04851fc574ce9cffd30a51d2b750955239f316.

Created attachment 1586603 [details]: test results

Marking verified. Testing consisted of running the reproducer from the initial comment (100 times) and a raid sanity regression check for raid1 in a single-node environment. See the attachment for logs and the reproducer script.

Regression run: https://beaker.cluster-qe.lab.eng.brq.redhat.com/bkr/jobs/96718

lvm2-2.02.185-2.el7.x86_64
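For reference, a rough sketch of how the visible-subLV symptom can be checked on a test system. The VG name "vg00" is a hypothetical placeholder and the comments describe expected behaviour, not output captured for this bug:

```
# Internal raid subLVs should be hidden: in 'lvs -a' output their *_rmeta_* /
# *_rimage_* names appear in square brackets and their lv_role is not "public".
lvs -a -o lv_name,lv_attr,lv_role vg00

# With all subLVs hidden again, deactivating the whole (tagged) VG should no
# longer fail on an "open" _rmeta_ LV:
vgchange -an vg00
```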
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2253