Bug 1359342 - [RFE] LVM Cache/Thin: Improve failure handling tools and process
Summary: [RFE] LVM Cache/Thin: Improve failure handling tools and process
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Joe Thornber
QA Contact: cluster-qe@redhat.com
Docs Contact: Steven J. Levine
URL:
Whiteboard:
Duplicates: 1471143 (view as bug list)
Depends On:
Blocks: 1482602 1577173
 
Reported: 2016-07-22 22:45 UTC by Jonathan Earl Brassow
Modified: 2019-04-29 14:07 UTC (History)
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1482602 (view as bug list)
Environment:
Last Closed: 2019-04-29 14:07:37 UTC
Target Upstream Version:
Embargoed:



Description Jonathan Earl Brassow 2016-07-22 22:45:22 UTC
The current method of swapping in a newly created LV in order to remove the errant metadata LV is overly cumbersome.  A new solution is needed - perhaps a single command that duplicates the existing (bad) metadata and places it into a file or new LV that can then be worked on with the cache/thin tools.

We get far too many complaints when problems are discovered in the metadata LVs - simply getting at the metadata LV is a struggle, before any investigation of the actual problem has even begun.
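For context, the swap-based procedure being called cumbersome here looks roughly like this for a thin pool (a sketch following the repair steps documented in lvmthin(7); the names and the size are illustrative):

  lvchange -an vg/pool                                          # pool must be inactive
  lvcreate -an -L256m -n meta_tmp vg                            # temp LV, sized like the pool's metadata LV
  lvconvert -y --thinpool vg/pool --poolmetadata vg/meta_tmp    # swap: damaged metadata is now vg/meta_tmp
  lvchange -ay vg/meta_tmp
  thin_check /dev/vg/meta_tmp                                   # inspect it with the thin tools
  thin_repair -i /dev/vg/meta_tmp -o /dev/vg/meta_fixed         # meta_fixed must be created beforehand
  lvchange -an vg/meta_tmp
  lvconvert -y --thinpool vg/pool --poolmetadata vg/meta_fixed  # swap the repaired copy back in

Every step has to be done by hand, in the right order, which is the pain point of this RFE.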

Comment 2 Zdenek Kabelac 2016-07-27 14:46:40 UTC
We have a couple of options - at this moment we could provide a 'scriptable' tool (a script composed from a couple of commands to capture all the relevant bits).

The current API is a very basic 'low-level' swap.

With commit:

https://www.redhat.com/archives/lvm-devel/2016-July/msg00222.html

we can now 're-compose' a cached LV from 3 separate volumes without zeroing the cache pool metadata.

So we could provide something like 'lvconvert --splitcacheandpool --noflush' to 'atomize' a cached LV into 3 individual LVs without uncaching.

Another way could be marking the LV as being in 'maintenance' mode.

lvconvert --maintenance vg/lv  |  lvchange -amy  vg/lv   ??

Or activating the 3 LVs as they are, with some 'public' made-up names?
(We currently treat the _tmeta/_cmeta and _tdata/_cdata volumes as private LVs ignored by udev.)

Open to further proposals.

I don't see much room for a 'simple' hack solution like letting the user activate a sub-LV anytime he wants with 'lvchange -ay vg/lv_cmeta' - we can't let hidden/sub-LVs be activated without any control - this would require a rework of every lvm2 command....

Maybe 'lvchange -ay --expose-metadata-only vg/lv', with a rework of clvmd,
could expose the _cmeta content as an LV.
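For reference, what is hidden today (illustrative names): the sub-LVs of a cached or thin LV are only listed with the -a option and cannot normally be activated on their own:

  lvs -a -o lv_name,lv_attr,devices vg   # shows [lv_cdata], [lv_cmeta], [lv_corig] etc. in brackets
  lvchange -ay vg/lv_cmeta               # rejected today - hidden sub-LVs are private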

Comment 3 David Teigland 2016-07-27 16:47:07 UTC
I went back and reviewed the IRC discussion about this from Nov 20, 2015.  The idea that seemed most promising was a "maintenance mode" for LVs.  This special mode would be written to the VG metadata, and would prevent normal LVM commands from using the LV while it's in this state (similar to the exported state of a VG).  Cluster locking would simply be disabled for LVs in this state.  LVM commands would require a special option, e.g. --maintenance, to operate on an LV in maintenance mode, e.g.

lvchange --set-maintenance y VG/LV

lvchange -ay VG/LV
Cannot access an LV in maintenance mode.

lvchange -ay --maintenance VG/LV
Activated LV in maintenance mode.

In maintenance mode, many normal restrictions would not apply, e.g. sub LVs could be activated directly:

lvchange -ay --maintenance VG/LV_tmeta
Activated LV_tmeta in maintenance mode.

One reason it's important to have the maintenance flag in metadata: if LV repairs are only partially completed and the machine is reset, the flag will prevent the system from automatically using the LV in that condition.

There may be other uses for maintenance mode, like manually playing with sub LVs.

This command would extract LV_cmeta from LV, making it a visible LV, and leaving LV with a missing sub LV:

lvconvert --maintenance --extract-sublv VG/LV_cmeta VG/LV

At this point, LV_cmeta could be modified, copied to another LV, etc.

This command would insert LV_cmeta as a sub LV.  It may be the same LV that was extracted or a different one that's been created/fixed:

lvconvert --maintenance --insert-sublv VG/LV_cmeta VG/LV
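Putting the proposal together, a cache-metadata repair session might look roughly like this.  This is purely hypothetical - --set-maintenance, --maintenance, --extract-sublv and --insert-sublv are the options proposed above, not existing lvm2 options; cache_check/cache_repair are the existing offline tools:

  lvchange --set-maintenance y VG/LV
  lvconvert --maintenance --extract-sublv VG/LV_cmeta VG/LV     # LV_cmeta becomes a visible LV
  lvchange -ay --maintenance VG/LV_cmeta
  cache_check /dev/VG/LV_cmeta
  cache_repair -i /dev/VG/LV_cmeta -o /dev/VG/LV_cmeta_fixed    # repaired copy goes to a pre-created LV
  lvchange -an VG/LV_cmeta
  lvconvert --maintenance --insert-sublv VG/LV_cmeta_fixed VG/LV
  lvchange --set-maintenance n VG/LV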

Comment 4 Zdenek Kabelac 2016-07-28 11:28:19 UTC
There are a few issues to consider for the solution we are looking for:

1. Are we looking for a solution where the user is even 'able' to change metadata - or should we be looking for some 'instant hack' that would work on a 'broken' system?

2. --extract-sublv is effectively a --split operation - so I'd prefer to keep the 'split' logic without introducing a new 'extract' department.

3. Do we want to protect the user from 'stupid' actions - or do we switch to an 'I'm an expert' mode, where we pass the responsibility for locking to the user:

i.e. letting 'lvchange -aly --IKnowWhatIMDoing vg/lv_cmeta' pass -
and if the user then tries activation of the top-level LV, lvm2 takes no responsibility.

4. Maybe providing a script tool - 'lvbackup_metadata -i vg/lv -o file' - is the thing we are looking for, and we can hide the swapping & activation logic behind the user's back?  (A rough sketch of such a wrapper is below.)
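A sketch of what such a wrapper could look like, built only from the documented swap commands (the tool name, its interface and the fixed size are illustrative - no such script ships with lvm2):

  #!/bin/sh
  # lvbackup_metadata vg/pool /path/to/dump   (hypothetical wrapper)
  set -e
  LV="$1"; OUT="$2"; VG="${LV%%/*}"
  lvchange -an "$LV"                                            # pool must be inactive for the swap
  lvcreate -an -L256m -n meta_tmp "$VG"                         # must match the size of the pool's metadata LV
  lvconvert -y --thinpool "$LV" --poolmetadata "$VG/meta_tmp"   # original metadata is now $VG/meta_tmp
  lvchange -ay "$VG/meta_tmp"
  dd if="/dev/$VG/meta_tmp" of="$OUT" bs=1M                     # capture it for offline examination
  lvchange -an "$VG/meta_tmp"
  lvconvert -y --thinpool "$LV" --poolmetadata "$VG/meta_tmp"   # swap the original metadata straight back
  lvremove -y "$VG/meta_tmp"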

Comment 5 David Teigland 2016-07-28 15:36:39 UTC
(In reply to Zdenek Kabelac from comment #4)
> There are a few issues to consider for the solution we are looking for:
> 
> 1. Are we looking for a solution where the user is even 'able' to change
> metadata - or should we be looking for some 'instant hack' that would work
> on a 'broken' system?

What I wrote in comment 3 is intended to be a new proper feature that I hope will be useful for quite a few different things.

> 2. --extract-sublv is effectively a --split operation - so I'd prefer to keep
> the 'split' logic without introducing a new 'extract' department.

It's similar to split, but it's actually different, so it requires a different option/command.

> 3. Do we want to protect the user from 'stupid' actions - or do we switch to
> an 'I'm an expert' mode, where we pass the responsibility for locking to the user:

This is a special expert mode which sets aside normal rules and protections like locking.

> 4. Maybe providing a script tool - 'lvbackup_metadata -i vg/lv -o file' - is
> the thing we are looking for, and we can hide the swapping & activation logic
> behind the user's back?

You could still do that, but in comment 3 I'm looking at the problem more broadly and trying to come up with a solution that can be used for other things also.

Comment 6 Zdenek Kabelac 2016-07-28 22:09:13 UTC
Another idea to consider -

We could implement support for taking an internal old-style snapshot of the metadata device.

lvcreate -s -L10 vg/pool_cmeta

This would present the same challenges - but when implemented properly, we could also use it to take a snapshot of an 'active' pool volume and provide a 'consistent' device for the tools to examine - probably something we want.

When the 'snapshot' becomes full, it would be automatically invalidated/dropped.
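If such snapshots were allowed (they are not today - the *_cmeta/*_tmeta volumes are private), usage might look like this (illustrative names and size):

  lvcreate -s -L16m -n cmeta_snap vg/pool_cmeta   # frozen, consistent copy of the live metadata
  cache_check /dev/vg/cmeta_snap                  # examine it while the pool stays active
  lvremove -y vg/cmeta_snap                       # or it invalidates itself once the COW space fills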

Comment 7 Jonathan Earl Brassow 2017-07-31 14:34:35 UTC
*** Bug 1471143 has been marked as a duplicate of this bug. ***

Comment 10 David Teigland 2017-08-29 18:34:35 UTC
"maintenance mode" has two main parts.

The first simple part I have done and pushed here
https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-maintenance

It provides the basic maintenance mode framework:

- setting/clearing the maintenance flag on an LV
  lvchange --setmaintenance y|n LV

- prevents activating an LV that's in maintenance mode
  (this step is incomplete; other changes need to be prevented
   on an LV while it's in maintenance mode, activation is just
   the main one.)

- adds an option (--maintenance) that allows a command to operate
  on an LV that's in maintenance mode, to perform "maintenance"
  of some kind.  e.g. lvchange -ay --maintenance LV

The second part, which is still largely undefined, is deciding what
operations constitute "maintenance" of some kind, and associating
these operations with maintenance mode.  Those operations would
either require the LV to already be in maintenance mode, or would
move the LV into maintenance mode themselves.  An easy example of
this is 'lvconvert --repair' which should move an LV into maintenance
mode before it begins, and only move it out of maintenance once it's
successful.  An LV would then be protected from being used after
an incomplete or unsuccessful repair.
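To illustrate the intended protection (hypothetical - the maintenance options only exist on the dev branch above, and the --repair integration is not implemented):

  lvconvert --repair vg/pool            # would set the maintenance flag before touching anything
  # ... machine resets while the repair is half done ...
  lvchange -ay vg/pool                  # refused: the LV is still flagged as in maintenance mode
  lvchange -ay --maintenance vg/pool    # explicit override to inspect or finish the repair
  lvchange --setmaintenance n vg/pool   # cleared only once the repair has succeeded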

Comment 11 Jonathan Earl Brassow 2017-09-14 16:53:51 UTC
Since the design is still unsettled, we won't be able to design, write, test, and give QA adequate lead time to expect this to land in 7.5.  I'm deferring to 7.6 for consideration.  I'd still love to see a solution emerge in 7.5, but it would likely be Tech Preview and untested by QA.

Comment 14 Jonathan Earl Brassow 2019-04-29 14:07:37 UTC
We have the ability now to activate sub-LVs, which makes this process much simpler.  We did not go with a "maintenance mode" approach in the final design.
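For reference, with sub-LV (component) activation the process is roughly the following - a sketch; exact behaviour depends on the lvm2 version, and component LVs activate read-only while the top-level LV is inactive:

  lvchange -an vg/pool              # top-level LV must be inactive
  lvchange -ay vg/pool_tmeta        # hidden metadata sub-LV is activated read-only
  thin_check /dev/vg/pool_tmeta     # run the thin/cache tools against it directly
  lvchange -an vg/pool_tmeta
  lvchange -ay vg/pool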

