Description of problem:
I have a 4-HDD LVM volume with a BTRFS filesystem on it, and I have attached an LVM cache on an SSD to it. I recently had to forcefully shut down the server and have had 2 dirty blocks since then that I can't get rid of.

------------------------------------
LVM Cache report of /dev/vg01/home
------------------------------------
- Cache Usage: 99.9%
- Metadata Usage: 23.9%
- Read Hit Rate: 4.4%
- Write Hit Rate: 86.2%
- Demotions/Promotions/Dirty: 1897/1898/2
- Features in use: metadata2 writeback no_discard_passdown

I tried --uncache and --splitcache, but both loop endlessly:

lvconvert --splitcache vg01/cache
Flushing 2 blocks for cache vg01/home.
Flushing 2 blocks for cache vg01/home.
Flushing 2 blocks for cache vg01/home.
Flushing 2 blocks for cache vg01/home.
Flushing 2 blocks for cache vg01/home.
Flushing 2 blocks for cache vg01/home.
Flushing 2 blocks for cache vg01/home.
^C Interrupted...
Flushing of vg01/home aborted.

I also tried increasing the migration threshold, as suggested in quite a few places, but it is not permitted for some reason:

lvchange --cachesettings migration_threshold=16384 vg01/cache
Operation not permitted on hidden LV vg01/cache.

Also, dmesg is constantly printing these errors regarding the cache:

[47768.838675] bio_check_eod: 2516 callbacks suppressed
[47768.838687] kworker/3:2: attempt to access beyond end of device dm-2: rw=1, sector=113271576960, nr_sectors = 960 limit=97664499712
[47768.838732] kworker/3:2: attempt to access beyond end of device dm-2: rw=1, sector=113292549120, nr_sectors = 960 limit=97664499712
[47768.843929] kworker/3:2: attempt to access beyond end of device dm-2: rw=1, sector=113271576960, nr_sectors = 960 limit=97664499712
[47768.843962] kworker/3:2: attempt to access beyond end of device dm-2: rw=1, sector=113292549120, nr_sectors = 960 limit=97664499712
[47768.849526] kworker/3:2: attempt to access beyond end of device dm-2: rw=1, sector=113271576960, nr_sectors = 960 limit=97664499712
[47768.849559] kworker/3:2: attempt to access beyond end of device dm-2: rw=1, sector=113292549120, nr_sectors = 960 limit=97664499712
[47768.855005] kworker/3:2: attempt to access beyond end of device dm-2: rw=1, sector=113271576960, nr_sectors = 960 limit=97664499712
[47768.855039] kworker/3:2: attempt to access beyond end of device dm-2: rw=1, sector=113292549120, nr_sectors = 960 limit=97664499712
[47768.858331] kworker/3:2: attempt to access beyond end of device dm-2: rw=1, sector=113271576960, nr_sectors = 960 limit=97664499712
[47768.858365] kworker/3:2: attempt to access beyond end of device dm-2: rw=1, sector=113292549120, nr_sectors = 960 limit=97664499712

Version-Release number of selected component (if applicable):
lvm2 2.03.16-2

How reproducible:
Always; survives reboots.

Steps to Reproduce:
1. Have a cache with dirty blocks
2. Try to split or remove the cache

Actual results:
The cache can't be split or removed.

Expected results:
I would like the dirty blocks to go away, or to just remove the cache and start fresh.

Additional info:
Happy to provide any additional information.
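For context, the figures in the report above can be pulled with standard LVM and device-mapper reporting; a minimal sketch using the names from this report (the exact field set available depends on the installed lvm2 version):

# lvs -a -o+cache_total_blocks,cache_used_blocks,cache_dirty_blocks vg01
  (reports the dirty-block count directly, i.e. the "2" in Demotions/Promotions/Dirty above)
# dmsetup status vg01-home
  (the cache target status line includes metadata/cache usage, hit counts, dirty blocks and the feature list)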
What versions of lvm2, the kernel, and the Linux distribution are you running? From the kernel messages it does look like the cache metadata is somehow 'incompatible/corrupted'. Can you activate only the cache metadata device, grab its contents (with just 'dd'), and attach them to this BZ?
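A minimal sketch of one way to do that, assuming the installed lvm2 supports read-only component activation and that the cache-pool metadata sub-LV is named cache_cmeta (the device node and output path below are assumptions, not taken from this system):

# lvchange -an vg01/home                   (the cached LV must be inactive first)
# lvchange -ay vg01/cache_cmeta            (component LVs activate read-only)
# dd if=/dev/mapper/vg01-cache_cmeta of=/root/cache_cmeta.img bs=1M
# xz /root/cache_cmeta.img
# lvchange -an vg01/cache_cmeta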
lvm2 is at 2.03.16 and the kernel at 5.19.13, running Arch Linux. I'm not sure what you meant by activating only the cache metadata device, but I removed all mounts related to that LVM volume and rebooted before grabbing the cache metadata with dd. Attaching the resulting file (compressed with xz).
Created attachment 1917872 [details] Cache metadata device contents
@zkabelac Did the cache dump tell you anything? Do you require anything else?
Hmm, can we get an 'lvmdump -m' from this machine? Was there any LV size manipulation going on? cache_check is reporting an OK status for the metadata file from the provided compressed archive.
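For reference, both are standard tools; a rough sketch of how the requested archive is produced and how the metadata image can be re-checked locally (the image path below is hypothetical):

# lvmdump -m
  (bundles system and LVM state, including metadata, into a lvmdump-*.tgz archive)
# cache_check /root/cache_cmeta.img
  (cache_check ships with thin-provisioning-tools / device-mapper-persistent-data)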
The last resize was done quite a while ago and completed successfully. It was part of a disk swap: some dead sectors had started appearing and I had to replace one of the drives. Scrubbing the filesystem only revealed one corrupted file, which has since been deleted, if that matters. Please find the dump report attached.
Created attachment 1920024 [details] lvmdump
Yep - in the metadata history we can see the LV once had a size of ~52.75TiB - and that is approximately the region of the 2 blocks that cannot be flushed. As a quick fix, would you be able to temporarily extend the LV to that size again so the flushing can proceed? (Don't do anything yet.)
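For reference, the arithmetic behind that match, assuming 512-byte sectors and that dm-2 is the cache origin device:

113271576960 sectors * 512 B ~= 52.75 TiB   (first failing write in dmesg)
113292549120 sectors * 512 B ~= 52.76 TiB   (second failing write in dmesg)
 97664499712 sectors * 512 B ~= 45.48 TiB   (current dm-2 limit in dmesg)

So the two stuck dirty blocks map to locations from the old ~52.75TiB layout, beyond the current end of the device.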
It should be simple to go with '--type zero' for the lvextend. Just make very sure you are not extending your 'filesystem' along with this extension!

So, steps like this:

# lvextend --type zero -L+10T [--fs ignore] vg01/home

(--fs ignore is for the most recent lvm2 and is likely not needed for your installed version.)

Now let the cache be flushed & dropped, and then reduce away this extra zero segment again (it can't be used for any real reads & writes):

# lvreduce -L-10T [--fs ignore] vg01/home
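For completeness, a hedged sketch of how the intermediate state could be checked before reducing again (the lvs options and field names are standard, but the exact output depends on the installed lvm2):

# lvs -a --segments vg01
  (the temporary zero segment should show up under vg01/home)
# lvs -a -o+cache_dirty_blocks vg01
  (should drop from 2 to 0 once the flush goes through)
# lvconvert --splitcache vg01/cache
  (the command that was looping before should now terminate)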
Looks like that did the trick, dirty blocks are gone, thank you very much! Should I close the ticket myself?