
Bug 1164942

Summary: LVM Thin: Improve fault handling and recovery
Product: Red Hat Enterprise Linux 7
Component: lvm2
Sub component: Thin Provisioning
Version: 7.1
Status: CLOSED NOTABUG
Severity: unspecified
Priority: unspecified
Reporter: Jonathan Earl Brassow <jbrassow>
Assignee: Zdenek Kabelac <zkabelac>
QA Contact: cluster-qe <cluster-qe>
CC: agk, cmarthal, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2017-07-27 16:28:26 UTC
Type: Bug
Bug Blocks: 1119323

Description Jonathan Earl Brassow 2014-11-17 22:24:48 UTC
- Be clearer about what the effects of the damage are.
       e.g., "This thin volume is lost."  "The mappings in this logical
           range are lost; there were 'n' mapped blocks in this region."

   The thin_rmap tool has been written; it still needs cross-referencing
   with LVM metadata to identify the affected LVs.
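The cross-referencing step can be sketched in shell: pull the thin device id out of each thin_rmap line and look it up against the `thin_id` column that `lvs` can report. The sample data below is made up, and the exact thin_rmap output format should be checked against your version of the tool; on a live system the inputs would come from something like `thin_rmap --region 0..1000 /dev/mapper/vg-pool_tmeta` and `lvs --noheadings -o thin_id,lv_name vg`.

```shell
#!/bin/sh
# Made-up sample data standing in for real tool output (illustrative format).
rmap_sample='data 0..99 -> thin(1) 0..99
data 100..149 -> thin(3) 0..49'
lvs_sample='1 thinvol_a
3 thinvol_b'

# Cross-reference: extract the thin device id from each rmap line and look up
# the LV that owns it in the lvs listing.
echo "$rmap_sample" | while read -r line; do
    id=$(printf '%s\n' "$line" | sed 's/.*thin(\([0-9]*\)).*/\1/')
    name=$(printf '%s\n' "$lvs_sample" | awk -v id="$id" '$1 == id { print $2 }')
    printf '%s  # LV: %s\n' "$line" "$name"
done
```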

- Handle out of data space gracefully
  - Configure a set of thins that may be purged?
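  The purge set above is a proposal; what lvm2 already offers against
  running out of data space is automatic pool extension via dmeventd,
  configured in lvm.conf (the values below are illustrative):

```
activation {
    # Extend the thin pool automatically once it is 70% full...
    thin_pool_autoextend_threshold = 70
    # ...growing it by 20% of its current size each time.
    thin_pool_autoextend_percent = 20
}
```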

- Handle out of metadata space gracefully
  - Automate recovery in the case of damaged pool metadata

  Need to produce a set of scenarios (e.g., transaction ID mismatch
  after recovery, differing numbers of thin volumes, missing data).
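  For reference, the pieces that exist today for damaged pool metadata
  are lvconvert --repair and the thin_check/thin_repair tools; automating
  the scenarios above would build on them. A dry-run sketch that only
  echoes the commands rather than executing them (the VG, pool, and
  device names are made up):

```shell
#!/bin/sh
# Echo instead of executing: these commands operate on real block devices.
run() { echo "would run: $*"; }

run lvchange -an vg/pool            # deactivate the damaged pool first
run lvconvert --repair vg/pool      # automated path: repairs into the
                                    # pmspare metadata spare LV
# Manual path, when more control over the repair is needed:
run thin_check /dev/mapper/vg-pool_tmeta
run thin_repair -i /dev/mapper/vg-pool_tmeta -o /dev/vg/repaired_meta
```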

- Make sure recovery works at all levels in our supported stacks list.
  e.g., RAID recovery when a leg of the data volume is lost.

- Any thin volumes that have been recovered but have potentially
  lost blocks should be marked as such.

- Any completely lost thin volumes should be recreated empty, but
  marked as damaged.

  Alternatively, delete them (which is what current LVM does for
  non-thin volumes).  Decide.

- Damaged thins should report how much space was allocated at the
  time of recovery.
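  The "space allocated at the time of recovery" figure is simple
  arithmetic once a mapped-block count is known (e.g. from the metadata
  dump tools): mapped blocks times the pool chunk size. A sketch with
  made-up numbers, assuming the default 64 KiB chunk size:

```shell
#!/bin/sh
# Hypothetical values: 4096 mapped blocks, 64 KiB pool chunk size.
mapped_blocks=4096
chunk_size_kib=64

# allocated space = mapped blocks x chunk size (reported here in MiB)
allocated_mib=$(( mapped_blocks * chunk_size_kib / 1024 ))
echo "allocated at time of recovery: ${allocated_mib} MiB"
```

  With these inputs the script prints "allocated at time of recovery:
  256 MiB".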

- The admin should be able to delete all damaged thins with a single
  command (similar to vgreduce --removemissing).

- Work around transient failures of underlying PV/LV
  - We should be able to activate the pool and any thin volumes
    that are not provisioned on these failed PV/LVs.
  - We should be able to activate thins that *do* have allocations
    on the failed PV/LVs at the admin's request.  This will mark the
    thins as damaged.  (Punch error targets into the pool.)
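  The "punch error targets" idea can be illustrated with device-mapper's
  error target: stand up a same-sized stub that errors every I/O in place
  of the failed PV, then activate what remains. A dry-run sketch that
  only echoes the commands (device names and sizes are made up):

```shell
#!/bin/sh
# Echo instead of executing: these commands would modify the device stack.
run() { echo "would run: $*"; }

failed_pv_sectors=2097152    # size of the lost PV in 512-byte sectors (1 GiB)
# Stand-in device that returns an error for every I/O to the lost region:
run dmsetup create pv_stub --table "0 $failed_pv_sectors error"
# Activate whatever LVM can still assemble; thins with allocations on the
# failed PV would then see I/O errors there and could be flagged as damaged.
run vgchange -ay --activationmode partial vg
```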


**It is very likely that this will be broken up further into more bugs.**