Bug 1561669
| Summary: | [RFE] Allow vgreduce --remove-missing to remove missing cache leg | | |
|---|---|---|---|
| Product: | [Community] LVM and device-mapper | Reporter: | Xen <bugs> |
| Component: | lvm2 | Assignee: | LVM Team <lvm-team> |
| lvm2 sub component: | Cache Logical Volumes | QA Contact: | cluster-qe <cluster-qe> |
| Status: | NEW --- | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | agk, heinzm, jbrassow, msnitzer, prajnoha, thornber, zkabelac |
| Version: | unspecified | Flags: | rule-engine: lvm-technical-solution? rule-engine: lvm-test-coverage? |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description

Xen 2018-03-28 16:49:26 UTC
Yep - thanks for the report. I believe 'lvconvert --uncache|--splitcache' is supposed to work in cases where the PV holding cache data & metadata is missing, but 'vgreduce --removemissing' does not use this code path. It's a long-outstanding problem where we try to define better, consistent rules - ATM focusing on the 'raid' logic - and then consistently apply those rules to the other targets.

Ah, I was wondering when I wrote it whether lvconvert --uncache would work; I hadn't actually tried it myself back then, sorry, and thanks for the heads up.

I think for LVM cache what's really important is that it should always auto-detect out-of-sync problems and automatically invalidate (wipe) the cache on such occasions; in addition to that, being able to boot without the cache, I believe. Perhaps by forcing it to activate (not the default), but a cache state should not be considered more important, I think, than a bootable system.

I mean, if the cache automatically invalidates when gone, and automatically rebuilds (anew) when present again, then you have a system that gracefully survives the cache disk being removed, apart from the problem that VG operations are not possible during that time. Perhaps automatically "ignoring the missing cache" would not always be desirable, but that could be a configuration choice.

If lvconvert --uncache works with a missing cache, that's okay, but recreating the cache (by manual commands) is annoying and should really be avoided, I think, if not necessary.

It's a shame cache disk metadata has equal "authority" to origin disk metadata. Honestly, if a disk is only used for cache - and in many cases it would be - it wouldn't really need any metadata of its own (I mean the VG config). That would reduce the chances of inconsistent VG metadata when an earlier missing cache is reattached to the system after it has been vgreduce --removemissing'd; otherwise there is conflicting metadata that can only be resolved with a manual edit?
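The scenario under discussion can be sketched as a command sequence. The device names (/dev/sdb, /dev/sdc), sizes, and VG/LV names below are illustrative assumptions, not taken from the report; the commands are destructive and must only be run as root on scratch disks:

```shell
# Build an origin LV on /dev/sdb cached by a pool on /dev/sdc.
pvcreate /dev/sdb /dev/sdc
vgcreate vg /dev/sdb /dev/sdc
lvcreate -L 10G -n origin vg /dev/sdb
lvcreate --type cache-pool -L 1G -n cpool vg /dev/sdc
lvconvert --type cache --cachepool vg/cpool vg/origin

# Simulate losing the cache disk (detach it from the kernel's view):
#   echo 1 > /sys/block/sdc/device/delete

# The RFE: this currently cannot drop the missing cache leg.
vgreduce --removemissing vg

# The workaround discussed in the thread - intended to work even
# with the cache PV missing - is to detach the cache entirely:
lvconvert --uncache vg/origin
```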
Any PV holding cache volumes should really not have any metadata (for the VG), and if you did have additional "ordinary" volumes on that PV, maybe you should just move them to a different PV (partition); at the least, the PV would be unusable without the "main" disk. I guess there is the --metadataignore option (for pvcreate) for such a use case? Sorry for rambling.

That would imply that a cache PV is best created with --metadataignore y or --metadatacopies 0; I didn't know that yet. In any case, --metadataignore y would work well with vgreduce --removemissing; of course, you would have to recreate the cache (and the PV, or its inclusion in the VG) afterwards if desired. Alternatively, automatic invalidation / rebuilding would make it easier to ignore the missing cache disk for a while without removing it. What happens if a PV has --metadataignore y and is missing - are VG operations still possible?

(In reply to Xen from comment #2)
> Ah, I was wondering when I wrote it; whether lvconvert --uncache would work,
> I hadn't actually tried it myself back then, sorry, thanks for the heads up.
>
> I think for LVM cache what's really important is that it should always
> auto-detect out-of-sync problems and automatic invalidation (cache wipe) on
> such occasions; in addition to that being able to boot without the cache I
> believe

For detection of an out-of-sync cache, the cache has to be physically present - LVM2 does not store any information of this sort in its metadata. There is an 'assumption' that when the cache is in 'writethrough' mode it can be dropped at any time - but if the user had manually changed the mode with 'dmsetup', that assumption could lead to data loss. So normally lvm2 prefers to have access to its cache metadata.

In general it's hard to guess when 'automatic' activation is wanted if the PV with the cache is missing (or the cache got e.g. a write error or some other kind of problem). So lvm2 aims rather for proper tool support, and the admin picks what's wanted at a given moment.
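The --metadataignore idea from the comment above can be sketched as follows; the device name /dev/nvme0n1 and VG name vg are illustrative assumptions:

```shell
# Dedicate a fast disk to caching while keeping no VG metadata copy on it,
# so the VG's metadata does not depend on this PV being present.
pvcreate --metadataignore y /dev/nvme0n1
vgextend vg /dev/nvme0n1

# Alternative at creation time: allocate zero metadata areas on the PV.
#   pvcreate --metadatacopies 0 /dev/nvme0n1
# Or flip the flag on an existing PV:
#   pvchange --metadataignore y /dev/nvme0n1

# Verify which PVs carry (in-use) metadata areas:
pvs -o pv_name,pv_mda_count,pv_mda_used_count
```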
> Perhaps by forcing it to activate (not the default) but a cache state should
> not be considered more important I think than a bootable system.

When a PV is missing, the VG becomes inconsistent, and there is a 'limited' number of actions we allow. This involves a large number of complex cases, so while it's not visible at first sight, there is a huge impact on LVs sharing space with the missing PV, and each case has a long list of details to work through.

> I mean if the cache automatically invalidates when gone, and automatically
> rebuilds when present (build anew) then you have a system that survives the
> cache disk being removed from the system with grace;

The simple case is: 'lvconvert --uncache' is meant to work. The cache can easily be recreated at any time later. The key problem with the current '--uncache' code is that it misuses 'partial' activation - this needs to be fixed (together with the raid code).

> Perhaps automatic "ignoring the missing cache" would not always be
> desirable, but that could be a configuration choice.

We have already created RFE Bug 1305573 for what we call an 'optional' PV. It is in the queue with low priority ATM.

> If lvconvert --uncache works with missing cache that's okay, but recreating
> the cache (by manual commands) is annoying and should really be avoided I
> think if not necessary.

Creating a cache isn't really complicated either - especially when the user prepares a profile. Also, a cache pool (data & metadata) can simply be split off (--splitcache), so it can be reattached later for caching any LV.

> It's a shame cache disk metadata has equal "authority" as origin disk
> metadata, honestly if a disk is only used for cache, and in many cases it

It's NOT a shame at all - it's a key principle of VG consistency and safety. We often get the 'suggestion' to support devices without any PV headers - just addressing them like /dev/sda :) - this, however, is unsupportable.
While users often think that would be secure, there is no concept of device ownership, and we do not intend to step into that 'dirty water' of plain craziness ;) So lvm2 always requires the PV UUID header as the key element that keeps things properly recognizable.

> I guess there is the --metadataignore option (for pvcreate) for such a use
> case?

A PV has a UUID (in its header) and some 'space' for metadata (the metadata area). The --metadataignore option makes this area ignored - but there is still a PV (some other PV in the VG just holds the metadata for it...).

If a user has 1000 PVs in a single VG, he may want only 10 metadata copies maintained - the remaining 990 PVs have their metadata areas ignored. This is surely far more efficient in terms of metadata-update speed, but clearly far less secure in case those 10 metadata-holding PVs are lost in an emergency... The admin has to choose the best trade-off...
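The split/reattach workflow mentioned above ('--splitcache', then recreating the cache) can be sketched like this; vg/origin and vg/cpool are illustrative names, not from the report:

```shell
# Detach the cache pool non-destructively: the origin LV keeps its data
# and the pool LV survives as a standalone cache-pool.
lvconvert --splitcache vg/origin

# ... later, reattach the same pool to cache any LV again:
lvconvert --type cache --cachepool vg/cpool vg/origin

# A metadata profile (a file under lvm.conf's profile_dir, here the
# hypothetical name "mycache") can hold preferred cache settings so
# recreation stays a one-liner:
lvconvert --type cache --cachepool vg/cpool \
          --metadataprofile mycache vg/origin
```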
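The 1000-PV / 10-copies example can be expressed directly; lvm2 can manage which metadata areas stay in use at the VG level (VG name vg is illustrative):

```shell
# Ask lvm2 to maintain roughly 10 in-use metadata areas across the VG's
# PVs, ignoring the metadata areas on the rest.
vgchange --metadatacopies 10 vg

# Inspect which PVs still hold an in-use metadata area:
pvs -o pv_name,pv_mda_count,pv_mda_used_count vg
```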