Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September on pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com; it will be handled as a user management inquiry, and the e-mail creates a ServiceNow ticket with Red Hat.

Individual Bugzilla bugs that are migrated will be moved to status "CLOSED" with resolution "MIGRATED" and the keyword "MigratedToJIRA". The link to the successor Jira issue appears under "Links" with a small "two-footprint" icon next to it and leads to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). The same link is also shown in a blue banner at the top of the page indicating that the bug has been migrated.

Bug 1738642

Summary: use cache_writeback and cache_repair to lvconvert repair a dm-cache cachevol
Product: Red Hat Enterprise Linux 8
Reporter: David Teigland <teigland>
Component: lvm2
lvm2 sub component: Cache Logical Volumes
Assignee: David Teigland <teigland>
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED WONTFIX
Severity: unspecified
Priority: high
CC: agk, cmarthal, heinzm, jbrassow, msnitzer, pasik, prajnoha, rhandlin, zkabelac
Version: 8.1
Keywords: Reopened, Triaged
Target Milestone: rc
Target Release: 8.2
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2021-08-08 07:26:56 UTC

Description David Teigland 2019-08-07 17:16:09 UTC
Description of problem:

When a dm-cache LV is created using a cachevol, we want to be able to use the cache_repair utility on it.  With a cachevol, the dm-cache metadata and data live on a single LV (data following metadata), not on two separate LVs as is done with a cache-pool.

The cache_repair utility currently expects a single device with metadata.
We need to either tell cache_repair the size of metadata on a cachevol, so it only looks at those blocks, or set up a temporary dm device over the metadata blocks in the cachevol and pass the temporary dm device to cache_repair.
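The second option can be sketched with dmsetup's linear target: expose only the metadata region of the cachevol as a temporary device and hand that device to cache_repair. This is only an illustration; the 16 MiB metadata size, device names, and replacement LV are assumptions, and the real metadata size would have to be read from the LVM metadata. The privileged steps are wrapped in a function so the sector arithmetic can be checked without touching any devices.

```shell
#!/bin/sh
# Sketch: carve the metadata region out of a cachevol for cache_repair.
# ASSUMPTIONS: metadata occupies the first 16 MiB of /dev/vg/fast, and
# /dev/vg/fast2 is a same-size replacement LV. Running this requires root.
META_BYTES=$((16 * 1024 * 1024))
META_SECTORS=$((META_BYTES / 512))   # dm tables count 512-byte sectors

repair_via_temp_dev() {
    # Linear mapping covering sectors [0, META_SECTORS) of the cachevol.
    echo "0 ${META_SECTORS} linear /dev/vg/fast 0" | dmsetup create fast_cmeta
    cache_repair -i /dev/mapper/fast_cmeta -o /dev/vg/fast2
    dmsetup remove fast_cmeta
}

echo "metadata region: ${META_SECTORS} sectors"
```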


Comment 1 David Teigland 2019-08-07 17:22:20 UTC
The other part of the repair process was implemented in lvm in these commits:
https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-cachevol-repair-2

That process is described in the man page update, copied from the final commit here (note that the part about dd/cache_repair needs updating according to the original description above.)

   dm-cache cachevol repair

       If the cache metadata is damaged in a cachevol, follow these  steps  to
       attempt recovery.

       Ensure that the main LV and the attached cachevol are inactive.

           $ lvs -a vg -o+segtype
             LV           VG Attr       LSize  Pool   Origin        Type
             [fast]       vg Cwi---C--- 32.00m                      linear
             main         vg Cwi---C---  1.00g [fast] [main_corig]  cache
             [main_corig] vg owi---C---  1.00g                      linear

       Create  a  new LV that will hold a repaired copy of the cache.  It must
       be the same size as the existing cachevol it will replace.

           $ lvcreate -n fast2 -L 32m vg

       Activate the cachevol LV by itself so that it can be copied.  This is a
       special case of activation that requires confirmation, since a cachevol
       LV usually cannot be activated directly.

           $ lvchange -ay vg/fast
           Do you want to activate component LV in read-only mode? [y/n]: y
             Allowing activation of component LV.

       Create a repaired  copy  of  the  cache  on  the  replacement  LV.   If
       cache_repair  fails,  then  deactivate  the  old  and new cachevols and
       either contact support, or forcibly detach the unrepairable cache  from
       the main LV.

           (Copy the entire cachevol, until the cache_repair step does this.)
           $ dd if=/dev/vg/fast of=/dev/vg/fast2 bs=1M iflag=direct oflag=direct

           $ cache_repair -i /dev/vg/fast -o /dev/vg/fast2

       Deactivate both old and new cachevols (fast and fast2).

           $ lvchange -an vg/fast vg/fast2

       Replace the current cachevol (fast) with the repaired copy (fast2) that
       the main LV will use for caching.

           $ lvconvert --replace-cachevol fast2 vg/main

       Verify that the repaired copy is now attached to the main LV,  and  the
       original damaged cachevol is detached.

           $ lvs -a vg -o+segtype
             LV           VG  Attr       LSize  Pool    Origin        Type
             fast         vg -wi------- 32.00m                       linear
             [fast2]      vg Cwi---C--- 32.00m                       linear
             main         vg Cwi---C---  1.00g [fast2] [main_corig]  cache
             [main_corig] vg owi---C---  1.00g                       linear

       Try to activate the main LV with the repaired cache.

           $ lvchange -ay vg/main

       Try  using the main LV.  If bad data is seen, then the metadata was not
       successfully repaired on the new cachevol.  In this case, contact  sup‐
       port  for  further help, or forcibly detach the unrepairable cache from
       the main LV.

           $ lvconvert --splitcache --noflush vg/main

Comment 2 David Teigland 2020-06-03 16:43:20 UTC
After discussion we eventually decided on a different direction for this than what's described above.  The latest implementation uses cache_writeback combined with cache_repair, but requires a new option in the cache_writeback utility.

https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-cachevol-writeback

For a dm-cache LV with an attached cachevol using writeback,
'lvconvert --repair LV' will:

. detach the cachevol
. run cache_repair from the cachevol to a temp file
. run cache_writeback to copy blocks from the cachevol
  back to the original LV, using the repaired metadata in
  the temp file

Requires new --fast-device-offset option for cache_writeback
command.
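The steps above can be sketched as shell commands. This is a hedged illustration, not the exact lvconvert implementation: the device paths, temp file location, and 16 MiB metadata size are assumptions; --fast-device-offset is the new option described in this comment, and the other option names follow the cache_writeback utility from thin-provisioning-tools.

```shell
#!/bin/sh
# Sketch of the repair flow from comment 2. ASSUMPTIONS: the cachevol is
# /dev/vg/fast with a 16 MiB metadata area followed by the cache data,
# and /dev/vg/main_corig is the original (slow) LV. Requires root to run.
META_SECTORS=$((16 * 1024 * 1024 / 512))  # cache data begins at this sector

repair_and_writeback() {
    # 1. Repair the metadata at the start of the cachevol into a temp file.
    cache_repair -i /dev/vg/fast -o /tmp/repaired_cmeta
    # 2. Copy dirty cache blocks back to the origin, reading cache data
    #    from the cachevol at the offset where the data area begins.
    cache_writeback --metadata-device /tmp/repaired_cmeta \
                    --origin-device /dev/vg/main_corig \
                    --fast-device /dev/vg/fast \
                    --fast-device-offset "${META_SECTORS}"
}

echo "fast device offset: ${META_SECTORS} sectors"
```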

Comment 7 David Teigland 2021-02-03 16:50:12 UTC
Thanks for the new option, I'll switch this bug back to myself to continue with the new lvconvert repair code that uses the new cache tools.

Comment 8 RHEL Program Management 2021-02-07 07:29:50 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 9 David Teigland 2021-02-08 18:12:22 UTC
braindead bot

Comment 11 David Teigland 2021-04-21 16:12:17 UTC
The development of this is largely finished:
https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-cachevol-writeback-2

lvconvert --repair will detach the cachevol from the origin, create a temp LV, run cache_repair from the cachevol metadata to the temp LV, run cache_writeback from the cachevol data to the origin, using the temp LV for metadata, then remove the temp LV.
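From the administrator's side, the whole sequence above is driven by a single lvconvert call. A minimal usage sketch, assuming vg/main is a dm-cache LV with an attached cachevol (names hypothetical, requires root):

```shell
#!/bin/sh
# Hypothetical end-to-end use of the repair flow described above.
# ASSUMPTION: vg/main is a dm-cache LV using a cachevol.
repair_cached_lv() {
    lvchange -an vg/main          # the cached LV must be inactive
    lvconvert --repair vg/main    # repair metadata, write back dirty blocks
    lvchange -ay vg/main          # reactivate and verify the data
}
echo "repair helper defined"
```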

What is unclear is how to test and verify this feature.  After some searching and asking, I've not found realistic (real world) test scenarios in which cache_repair could be used to repair a cache that's been damaged.

The first approach to testing this would apply the same repair tests that we already perform with cache-pools. But while we support lvconvert repair using cache_repair with a cache-pool, it's not clear that this has seen real-world damage-repair validation either (at least I've not found any).

So, the immediate questions seem to be:
- what kinds of damage could realistically happen in real world use
- in what scenarios would those occur
- does cache_repair repair them

Comment 13 RHEL Program Management 2021-08-08 07:26:56 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.