Bug 1881056

Summary: Can't remove lvmcache device
Product: [Community] LVM and device-mapper
Component: lvm2
Sub component: Cache Logical Volumes
Status: NEW
Severity: high
Priority: unspecified
Version: unspecified
Hardware: x86_64
OS: Linux
Type: Bug
Reporter: Roy Sigurd Karlsbakk <roy>
Assignee: Zdenek Kabelac <zkabelac>
QA Contact: cluster-qe <cluster-qe>
CC: agk, heinzm, jbrassow, msnitzer, prajnoha, roy, thornber, zkabelac
Flags: pm-rhel: lvm-technical-solution?, pm-rhel: lvm-test-coverage?
Target Milestone: ---
Target Release: ---
Regression: ---
Attachments:
  lvmdump -a
  Removed usage of cache

Description Roy Sigurd Karlsbakk 2020-09-21 13:21:50 UTC
Description of problem:

Unable to remove lvmcache device

Version-Release number of selected component (if applicable):

Debian Buster, fully upgraded
Kernel 5.7.0-0.bpo.2-amd64
LVM version as reported by lvs --version

# lvs --version
  LVM version:     2.03.02(2) (2018-12-18)
  Library version: 1.02.155 (2018-12-18)
  Driver version:  4.42.0
  Configuration:   ./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --exec-prefix= --bindir=/bin --libdir=/lib/x86_64-linux-gnu --sbindir=/sbin --with-usrlibdir=/usr/lib/x86_64-linux-gnu --with-optimisation=-O2 --with-cache=internal --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --with-thin=internal --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --enable-applib --enable-blkid_wiping --enable-cmdlib --enable-dmeventd --enable-dbus-service --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-lvmpolld --enable-notify-dbus --enable-pkgconfig --enable-readline --enable-udev_rules --enable-udev_sync
root@smilla:~# uname -r
5.7.0-0.bpo.2-amd64

How reproducible:

Always

Steps to Reproduce:
1. lvconvert --uncache data/data

Actual results:

It doesn't fail outright, but loops forever with this output:

  Unknown feature in status: 8 2484/262144 128 819199/819200 57472609 35325443 19996086 9940614 209 210 1 3 metadata2 writethrough no_discard_passdown 2 migration_threshold 2048 smq 0 rw -
  Flushing 1 blocks for cache data/data.

Expected results:

Cache is removed

Additional info:

I first tried with kernel 5.4, which didn't work. I upgraded to 5.7 - still the same. I also tried --uncache --force, with the same result as before.

# lvs -o+cache_mode data/data
  Unknown feature in status: 8 2488/262144 128 819200/819200 57472726 35325517 19996091 9940616 7 8 1 3 metadata2 writethrough no_discard_passdown 2 migration_threshold 2048 smq 0 rw -
  LV   VG   Attr       LSize  Pool     Origin       Data%  Meta%  Move Log Cpy%Sync Convert CacheMode
  data data Cwi-aoC--- 13,67t [_cache] [data_corig] 100,00 0,95            0,01             writethrough

So it shows a writethrough cache, which is what I wanted. It also reports an unknown feature that I can't identify.
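The line lvs complains about is the raw dm-cache status reported by the kernel. A sketch of decoding it, with field positions taken from the kernel's dm-cache documentation (Documentation/admin-guide/device-mapper/cache.rst); note field 11, the dirty-block count, is 1 here, which matches the "Flushing 1 blocks" message the loop keeps printing:

```shell
# The dm-cache status line from this report, as printed by lvs/dmsetup status.
status='8 2484/262144 128 819199/819200 57472609 35325443 19996086 9940614 209 210 1 3 metadata2 writethrough no_discard_passdown 2 migration_threshold 2048 smq 0 rw -'

# Field layout per the kernel's dm-cache docs: block sizes are in 512-byte sectors.
echo "$status" | awk '{
    printf "metadata block size : %s sectors\n", $1
    printf "metadata used/total : %s\n", $2
    printf "cache block size    : %s sectors\n", $3
    printf "cache used/total    : %s\n", $4
    printf "read hits/misses    : %s / %s\n", $5, $6
    printf "write hits/misses   : %s / %s\n", $7, $8
    printf "demotions/promotions: %s / %s\n", $9, $10
    printf "dirty blocks        : %s\n", $11
}'
```

After the dirty count comes the feature list (3 features here: metadata2, writethrough, no_discard_passdown), the policy name (smq) and its tunables.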

Zdenek Kabelac asked me on the mailing list to list relevant packages installed, so here you go

linux-image-5.7.0-0.bpo.2-amd64 <-- running now
linux-image-5.4.0-0.bpo.3-amd64 <-- was running before that
(some older kernel was probably running before that, the server has been around for some time)

Comment 1 Zdenek Kabelac 2020-09-21 13:26:11 UTC
Please attach the resulting file from the command 'lvmdump -a'.

Comment 2 Roy Sigurd Karlsbakk 2020-09-21 14:49:58 UTC
Created attachment 1715544 [details]
lvmdump -a

lvmdump -a as requested

Comment 3 Zdenek Kabelac 2020-09-21 15:08:50 UTC
Hmm - can we get a longer history of 'device-mapper' messages from the kernel log (from the journal, /var/log/syslog, dmesg... whatever fits)?

That way we can see which devices in the device stack are failing and which are correct.
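Any of the usual kernel-log sources works for collecting this history. A small sketch, filtering a captured sample line (from comment 4) so the pipeline itself can be exercised without a live system; the commented-out commands are what you would run on the affected host:

```shell
# Sample kernel log lines (one device-mapper message, one unrelated message).
sample='[Mon Sep 21 18:11:25 2020] device-mapper: cache: Origin device (dm-8) discard unsupported: Disabling discard passdown.
[Mon Sep 21 18:11:26 2020] EXT4-fs (dm-9): mounted filesystem'

# On the real host, equivalent streams come from any of:
#   dmesg -T | grep -i 'device-mapper'
#   journalctl -k --no-pager | grep -i 'device-mapper'
#   grep -i 'device-mapper' /var/log/syslog

# Keep only the device-mapper messages.
printf '%s\n' "$sample" | grep -i 'device-mapper'
```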

Comment 4 Roy Sigurd Karlsbakk 2020-09-21 16:13:29 UTC
From dmesg -T

[ma. sep. 21 18:11:25 2020] device-mapper: cache: Origin device (dm-8) discard unsupported: Disabling discard passdown.
[ma. sep. 21 18:11:28 2020] device-mapper: cache: Origin device (dm-8) discard unsupported: Disabling discard passdown.

Comment 5 Zdenek Kabelac 2020-09-29 16:06:32 UTC
Created attachment 1717580 [details]
Removed usage of cache

Until we provide an lvm2-side fix for this error case, the way to fix your case is to vgcfgrestore the attached modified metadata (I dropped the cache and kept your origin as the 'data' LV with its original UUID).

# vgcfgrestore -f data_new data

That should give you these LVs:

  LV              VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [data]          data -wi-------  13,67t                                                    
  [lvol0_pmspare] data ewi-------   1,00g                                                    
  vmtest          data -wi------- 100,00g                                                    

You can 'lvremove' lvol0_pmspare later on.

Assuming you had the 1 failing sector problem - you should probably run 'fsck' anyway.
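The full recovery sequence implied by this comment might look as follows. This is a sketch: every command is routed through a stub wrapper ('run', which only echoes) so the order can be read and tested without touching real devices, and the extra vgcfgbackup step is my own precaution, not something the comment prescribes:

```shell
# Stub: print each command instead of executing it. Remove the wrapper
# (or make run() actually exec "$@") to apply this on the real host.
run() { echo "+ $*"; }

run vgcfgbackup -f /root/data-before.vg data   # keep our own rollback copy first
run lvchange -an data/data                     # deactivate the cached LV
run vgcfgrestore -f data_new data              # restore the edited metadata from the attachment
run lvchange -ay data/data                     # reactivate; origin is now the plain 'data' LV
run fsck -n /dev/data/data                     # read-only filesystem check before trusting the data
```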

Comment 6 Roy Sigurd Karlsbakk 2020-09-29 16:15:10 UTC
Thanks. But - is this safe? Is there a way to roll back if it fails?

Comment 7 Zdenek Kabelac 2020-09-29 16:24:23 UTC
Yep - you have a full archive of the previous lvm2 metadata in /etc/lvm/archive.
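Rolling back from that archive can be sketched the same way, again stubbed with an echoing wrapper so nothing is executed; the archive filename shown is hypothetical, the real one comes from the --list output:

```shell
# Stub: print each command instead of executing it.
run() { echo "+ $*"; }

run vgcfgrestore --list data    # list archived metadata versions with timestamps

# Pick the archive file from the listing above, then restore it, e.g.:
run vgcfgrestore -f /etc/lvm/archive/data_00042-1234567890.vg data
```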