Description of problem:

When a dm-cache LV is created using a cachevol, we want to be able to use the cache_repair utility on it. With a cachevol, the dm-cache metadata and data live on a single LV (data following metadata), not on two separate LVs as is done with a cache-pool. The cache_repair utility currently expects a single device containing only the metadata. We need to either tell cache_repair the size of the metadata on a cachevol, so that it only looks at those blocks, or set up a temporary dm device over the metadata blocks in the cachevol and pass that temporary device to cache_repair.
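To illustrate the second option, the temporary dm device over the metadata blocks could be a plain dmsetup linear mapping. This is only a sketch: the cachevol path and the metadata size (8192 sectors here) are assumed values, and dmsetup table units are 512-byte sectors.

  # assume metadata occupies the first 8192 sectors of the cachevol
  $ dmsetup create fast_cmeta --table "0 8192 linear /dev/vg/fast 0"
  $ cache_repair -i /dev/mapper/fast_cmeta -o /dev/vg/fast2
  $ dmsetup remove fast_cmeta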
The other part of the repair process was implemented in lvm in these commits:
https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-cachevol-repair-2

That process is described in the man page update, copied from the final commit here (note that the part about dd/cache_repair needs updating according to the original description above).

dm-cache cachevol repair

If the cache metadata is damaged in a cachevol, follow these steps to attempt recovery.

Ensure that the main LV and the attached cachevol are inactive.

  $ lvs -a vg -o+segtype
    LV           VG Attr       LSize  Pool   Origin       Type
    [fast]       vg Cwi---C--- 32.00m                     linear
    main         vg Cwi---C---  1.00g [fast] [main_corig] cache
    [main_corig] vg owi---C---  1.00g                     linear

Create a new LV that will hold a repaired copy of the cache. It must be the same size as the existing cachevol it will replace.

  $ lvcreate -n fast2 -L 32m vg

Activate the cachevol LV by itself so that it can be copied. This is a special case of activation that requires confirmation, since a cachevol LV usually cannot be activated directly.

  $ lvchange -ay vg/fast
  Do you want to activate component LV in read-only mode? [y/n]: y
  Allowing activation of component LV.

Create a repaired copy of the cache on the replacement LV. If cache_repair fails, then deactivate the old and new cachevols and either contact support, or forcibly detach the unrepairable cache from the main LV. (Copy the entire cachevol, until the cache_repair step does this.)

  $ dd if=/dev/vg/fast of=/dev/vg/fast2 bs=1M iflag=direct oflag=direct
  $ cache_repair -i /dev/vg/fast -o /dev/vg/fast2

Deactivate both old and new cachevols (fast and fast2).

  $ lvchange -an vg/fast vg/fast2

Replace the current cachevol (fast) with the repaired copy (fast2) that the main LV will use for caching.

  $ lvconvert --replace-cachevol fast2 vg/main

Verify that the repaired copy is now attached to the main LV, and the original damaged cachevol is detached.

  $ lvs -a vg -o+segtype
    LV           VG Attr       LSize  Pool    Origin       Type
    fast         vg -wi------- 32.00m                      linear
    [fast2]      vg Cwi---C--- 32.00m                      linear
    main         vg Cwi---C---  1.00g [fast2] [main_corig] cache
    [main_corig] vg owi---C---  1.00g                      linear

Try to activate the main LV with the repaired cache.

  $ lvchange -ay vg/main

Try using the main LV. If bad data is seen, then the metadata was not successfully repaired on the new cachevol. In this case, contact support for further help, or forcibly detach the unrepairable cache from the main LV.

  $ lvconvert --splitcache --noflush vg/main
After discussion we eventually decided on a different direction for this than what's described above. The latest implementation uses cache_writeback combined with cache_repair, but requires a new option in the cache_writeback utility.

https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-cachevol-writeback

For a dm-cache LV with an attached cachevol using writeback, 'lvconvert --repair LV' will:

. detach the cachevol
. run cache_repair from the cachevol to a temp file
. run cache_writeback to copy blocks from the cachevol back to the original LV, using the repaired metadata in the temp file

This requires a new --fast-device-offset option for the cache_writeback command (a rough sketch of the sequence follows).
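To make that sequence concrete, the steps above might look roughly like the following. This is only a sketch: the device paths, the temp file name, and the offset value are assumptions, and the unit of the offset is assumed to be 512-byte sectors. --metadata-device, --origin-device, and --fast-device are existing cache_writeback options; --fast-device-offset is the new one.

  $ cache_repair -i /dev/vg/fast -o /tmp/cmeta.repaired
  $ cache_writeback --metadata-device /tmp/cmeta.repaired \
        --origin-device /dev/vg/main \
        --fast-device /dev/vg/fast \
        --fast-device-offset 8192   # assumed offset of the data area in the cachevol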
Thanks for the new option. I'll switch this bug back to myself to continue with the new lvconvert repair code that uses the new cache tools.
After evaluating this issue, we have no plans to address it further or fix it in an upcoming release, so it is being closed. If plans change such that this issue will be fixed in an upcoming release, the bug can be reopened.
braindead bot
The development of this is largely finished:
https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-cachevol-writeback-2

lvconvert --repair will:

. detach the cachevol from the origin
. create a temp LV
. run cache_repair from the cachevol metadata to the temp LV
. run cache_writeback from the cachevol data to the origin, using the temp LV for metadata
. remove the temp LV

What is unclear is how to test and verify this feature. After some searching and asking, I've not found realistic (real-world) test scenarios in which cache_repair could be used to repair a cache that's been damaged.

The first approach to testing this would apply the same repair tests that we already perform with cache-pools. But, while we support lvconvert --repair using cache_repair with a cache-pool, it's not clear that this has seen real-world damage-repair validation either (at least I've not found it).

So, the immediate questions seem to be:

- what kinds of damage could realistically happen in real-world use
- in what scenarios would that damage occur
- does cache_repair repair them
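Absent known real-world damage cases, one synthetic way to probe the last question (in the spirit of the existing cache-pool repair tests) would be to corrupt a dump of known-good cache metadata, repair it, and validate the result with cache_check. This is only a sketch; the file names are placeholders and the corrupted offset is arbitrary.

  $ cp cmeta.good cmeta.damaged   # cmeta.good: a dump of known-good metadata
  $ dd if=/dev/urandom of=cmeta.damaged bs=4096 count=1 seek=2 conv=notrunc
  $ cache_repair -i cmeta.damaged -o cmeta.repaired
  $ cache_check cmeta.repaired && echo "repaired metadata is structurally valid"

This only exercises structural validity, though; it says nothing about whether the repaired mappings still point at the right data, which is the harder part of verification.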