Bug 1738642 - use cache_writeback and cache_repair to lvconvert repair a dm-cache cachevol
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: lvm2
Version: 8.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: 8.2
Assignee: David Teigland
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-07 17:16 UTC by David Teigland
Modified: 2021-09-07 11:52 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-08 07:26:56 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description David Teigland 2019-08-07 17:16:09 UTC
Description of problem:

When a dm-cache LV is created using a cachevol, we want to be able to use the cache_repair utility on it.  With a cachevol, the dm-cache metadata and data live on a single LV (data following metadata), not on two separate LVs as is done with a cache-pool.

The cache_repair utility currently expects a device that holds only the metadata.
We need to either tell cache_repair the size of the metadata on a cachevol, so that it only looks at those blocks, or set up a temporary dm device over the metadata blocks in the cachevol and pass that temporary dm device to cache_repair.
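
A minimal sketch of the second option (a temporary dm device over the metadata blocks), assuming the metadata starts at sector 0 of the cachevol and that its length in 512-byte sectors is already known. The helper name metadata_table, the device name fast_meta_tmp, and the 16384-sector length are hypothetical and used only for illustration; this is not what lvm2 actually does.

```shell
#!/bin/sh
# Sketch only: build a device-mapper "linear" table that exposes just the
# metadata region at the start of a cachevol.
# Assumption: metadata begins at sector 0 and is META_SECTORS long
# (512-byte sectors). How lvm2 records that size is not shown here.

# metadata_table DEVICE META_SECTORS
# Prints one table line: "<start> <length> linear <device> <offset>"
metadata_table() {
    dev=$1
    meta_sectors=$2
    printf '0 %s linear %s 0\n' "$meta_sectors" "$dev"
}

# Example use (not executed here; needs root and an activated cachevol).
# "fast_meta_tmp" and the 16384-sector length are hypothetical:
#   dmsetup create fast_meta_tmp --table "$(metadata_table /dev/vg/fast 16384)"
#   cache_repair -i /dev/mapper/fast_meta_tmp -o /dev/vg/fast2
#   dmsetup remove fast_meta_tmp
```

The function only prints the table line, so it can be inspected before anything is handed to dmsetup.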


Comment 1 David Teigland 2019-08-07 17:22:20 UTC
The other part of the repair process was implemented in lvm in these commits:
https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-cachevol-repair-2

That process is described in the man page update, copied from the final commit here (note that the part about dd/cache_repair needs updating according to the original description above.)

   dm-cache cachevol repair

       If the cache metadata is damaged in a cachevol, follow these  steps  to
       attempt recovery.

       Ensure that the main LV and the attached cachevol are inactive.

           $ lvs -a vg -o+segtype
             LV           VG Attr       LSize  Pool   Origin        Type
             [fast]       vg Cwi---C--- 32.00m                      linear
             main         vg Cwi---C---  1.00g [fast] [main_corig]  cache
             [main_corig] vg owi---C---  1.00g                      linear

       Create  a  new LV that will hold a repaired copy of the cache.  It must
       be the same size as the existing cachevol it will replace.

           $ lvcreate -n fast2 -L 32m vg

       Activate the cachevol LV by itself so that it can be copied.  This is a
       special case of activation that requires confirmation, since a cachevol
       LV usually cannot be activated directly.

           $ lvchange -ay vg/fast
           Do you want to activate component LV in read-only mode? [y/n]: y
             Allowing activation of component LV.

       Create a repaired  copy  of  the  cache  on  the  replacement  LV.   If
       cache_repair  fails,  then  deactivate  the  old  and new cachevols and
       either contact support, or forcibly detach the unrepairable cache  from
       the main LV.

           (Copy the entire cachevol, until the cache_repair step does this.)
           $ dd if=/dev/vg/fast of=/dev/vg/fast2 bs=1M iflag=direct oflag=direct

           $ cache_repair -i /dev/vg/fast -o /dev/vg/fast2

       Deactivate both old and new cachevols (fast and fast2).

           $ lvchange -an vg/fast vg/fast2

       Replace the current cachevol (fast) with the repaired copy (fast2) that
       the main LV will use for caching.

           $ lvconvert --replace-cachevol fast2 vg/main

       Verify that the repaired copy is now attached to the main LV,  and  the
       original damaged cachevol is detached.

           $ lvs -a vg -o+segtype
             LV           VG  Attr       LSize  Pool    Origin        Type
             fast         vg -wi------- 32.00m                       linear
             [fast2]      vg Cwi---C--- 32.00m                       linear
             main         vg Cwi---C---  1.00g [fast2] [main_corig]  cache
             [main_corig] vg owi---C---  1.00g                       linear

       Try to activate the main LV with the repaired cache.

           $ lvchange -ay vg/main

       Try using the main LV.  If bad data is seen, then the metadata was not
       successfully repaired on the new cachevol.  In this case, contact
       support for further help, or forcibly detach the unrepairable cache
       from the main LV.

           $ lvconvert --splitcache --noflush vg/main

Comment 2 David Teigland 2020-06-03 16:43:20 UTC
After discussion we eventually decided on a different direction for this than what's described above.  The latest implementation uses cache_writeback combined with cache_repair, but requires a new option in the cache_writeback utility.

https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-cachevol-writeback

For a dm-cache LV with an attached cachevol using writeback,
'lvconvert --repair LV' will:

. detach the cachevol
. run cache_repair from the cachevol to a temp file
. run cache_writeback to copy blocks from the cachevol
  back to the original LV, using the repaired metadata in
  the temp file

Requires new --fast-device-offset option for cache_writeback
command.
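
For illustration, the two tool invocations in that sequence could look roughly like the dry-run sketch below. It only prints the commands. The helper name print_repair_plan and all paths are hypothetical, the metadata size is a made-up input, and the unit of --fast-device-offset is assumed here to be bytes; check cache_writeback --help for the option names and units in the installed version.

```shell
#!/bin/sh
# Sketch only: print (do not run) the cache_repair and cache_writeback
# commands for a detached cachevol.
# Assumptions: META_OFFSET is the size of the metadata area at the start of
# the cachevol, in whatever unit --fast-device-offset expects (assumed bytes).

# print_repair_plan CACHEVOL ORIGIN META_OFFSET TMP_META
print_repair_plan() {
    cachevol=$1
    origin=$2
    meta_offset=$3
    tmp_meta=$4
    # Step 1: write repaired metadata from the cachevol to a temp file.
    echo "cache_repair -i $cachevol -o $tmp_meta"
    # Step 2: copy dirty blocks back to the origin using the repaired
    # metadata; cached data starts after the metadata on the cachevol.
    echo "cache_writeback --metadata-device $tmp_meta --origin-device $origin --fast-device $cachevol --fast-device-offset $meta_offset"
}

# Example with hypothetical paths and an 8 MiB metadata area:
#   print_repair_plan /dev/vg/fast /dev/vg/main_corig 8388608 /tmp/meta.bin
```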

Comment 7 David Teigland 2021-02-03 16:50:12 UTC
Thanks for the new option.  I'll switch this bug back to myself to continue with the new lvconvert repair code that uses the new cache tools.

Comment 8 RHEL Program Management 2021-02-07 07:29:50 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 9 David Teigland 2021-02-08 18:12:22 UTC
braindead bot

Comment 11 David Teigland 2021-04-21 16:12:17 UTC
The development of this is largely finished:
https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-cachevol-writeback-2

lvconvert --repair will:

. detach the cachevol from the origin
. create a temp LV
. run cache_repair from the cachevol metadata to the temp LV
. run cache_writeback from the cachevol data to the origin,
  using the temp LV for metadata
. remove the temp LV

What is unclear is how to test and verify this feature.  After some searching and asking, I've not found realistic (real world) test scenarios in which cache_repair could be used to repair a cache that's been damaged.

The first approach to testing this would be to apply the same repair tests that we already perform with cachepools.  But, while we support lvconvert repair using cache_repair with a cachepool, it's not clear that this has ever been validated against real world damage either (at least I've not found any such validation.)

So, the immediate questions seem to be:
- what kinds of damage could realistically happen in real world use
- in what scenarios would those occur
- does cache_repair repair them
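
Lacking realistic scenarios, one purely synthetic way to exercise the repair path would be to overwrite individual metadata blocks on a scratch copy and see what cache_repair makes of it. The sketch below assumes 4096-byte metadata blocks; the helper name corrupt_block is hypothetical, and random overwrites are not claimed to resemble real world damage.

```shell
#!/bin/sh
# Sketch only: overwrite one block of a file (or scratch device) with random
# bytes, leaving the rest untouched. Intended for synthetic damage tests on
# a copy, never on a live cachevol.

# corrupt_block FILE BLOCK_INDEX [BLOCK_SIZE]
# BLOCK_SIZE defaults to 4096 bytes (assumed metadata block size).
corrupt_block() {
    file=$1
    index=$2
    bs=${3:-4096}
    # conv=notrunc keeps everything after the overwritten block in place.
    dd if=/dev/urandom of="$file" bs="$bs" seek="$index" count=1 \
        conv=notrunc 2>/dev/null
}

# Example on a scratch copy of the metadata (paths are hypothetical):
#   dd if=/dev/vg/fast of=/tmp/meta.copy bs=1M count=8
#   corrupt_block /tmp/meta.copy 5
#   cache_repair -i /tmp/meta.copy -o /tmp/meta.repaired
```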

Comment 13 RHEL Program Management 2021-08-08 07:26:56 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

