Description of problem:
Unable to remove LVM caching device in writeback mode
It started off saying
0 blocks must still be flushed.
This slowly increased, until it repeatedly said:
8 blocks must still be flushed.
8 blocks must still be flushed.
Left it for 12 hours; it now says:
143745 blocks must still be flushed.
143745 blocks must still be flushed.
143745 blocks must still be flushed.
143745 blocks must still be flushed.
143746 blocks must still be flushed.
143746 blocks must still be flushed.
Version-Release number of selected component (if applicable):
kernel-3.10.0-229.7.2.el7.jump2.x86_64
lvm2-2.02.115-3.el7.x86_64
How reproducible:
Notes from the SEG reproduction of the case:
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert LV Tags Devices
[ssd_cache] vg02 Cwi---C--- 2.91t 13.96 4.44 69.02 ssd_cache_cdata(0)
[ssd_cache_cdata] vg02 Cwi-ao---- 2.91t /dev/sdb(100)
[ssd_cache_cmeta] vg02 ewi-ao---- 400.00m /dev/sdb(0)
sybase_user_data_lv vg02 Cwi-aoC--- 20.00t [ssd_cache] [sybase_user_data_lv_corig] 13.96 4.44 69.02 sybase_user_data_lv_corig(0)
[sybase_user_data_lv_corig] vg02 owi-aoC--- 20.00t /dev/sdd(0),/dev/sde(0),/dev/sdg(0)
^^^
It seems the attempt was to remove the LV while it was still mounted, as noted from the Attr column above; most likely this is what resulted in the cache incoherence, since fsync() is called first when the fs is unmounted. Secondly, we see the thresholds being changed at the LV layer:
lvchange --cachesettings 'write_promote_adjustment=0 discard_promote_adjustment=0 \
migration_threshold=2048000 random_threshold=512 sequential_threshold=1000000 \
read_promote_adjustment=0' vg02/sybase_user_data_lv
However, the table was not updated (again, judging from the sosreport), i.e.:
vg02-sybase_user_data_lv: 0 42949681152 cache 253:3 253:2 253:4 4096 1 writeback cleaner 0
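A quick way to confirm whether such a change actually reached the kernel is to check the loaded table for the new policy arguments. A minimal sketch, using the captured table line above (on a live system the line would come from `dmsetup table vg02-sybase_user_data_lv`):

```shell
# The table line captured in the sosreport; if lvchange's cachesettings had
# been applied, it would contain key/value pairs such as migration_threshold.
table='0 42949681152 cache 253:3 253:2 253:4 4096 1 writeback cleaner 0'
if echo "$table" | grep -q 'migration_threshold'; then
    echo 'settings applied'
else
    echo 'settings NOT applied - table still shows defaults'
fi
```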
With the above in mind, I again attempted to reproduce the issue, leaving the fs mounted. Of the many attempts, I managed to reproduce it once:
# lvcreate -L 1G -n ssd_meta testvg /dev/sdc1
Logical volume "ssd_meta" created.
# lvcreate -L 10G -n ssd_cache testvg /dev/sdc1
Logical volume "ssd_cache" created.
# lvconvert -c 2M --yes --type cache-pool --cachemode writeback --poolmetadata testvg/ssd_meta testvg/ssd_cache
WARNING: Converting logical volume testvg/ssd_cache and testvg/ssd_meta to pool's data and metadata volumes.
THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Converted testvg/ssd_cache to cache pool.
# lvcreate --type cache --cachepool testvg/ssd_cache -L 5G -n cached_vol testvg
- As noted in #21:
# lvchange --cachesettings 'write_promote_adjustment=0 discard_promote_adjustment=0 migration_threshold=2048000 random_threshold=512 sequential_threshold=1000000 read_promote_adjustment=0' testvg/cached_vol
- Reload the table to apply changes:
# dmsetup reload --table '0 10485760 cache 253:15 253:14 253:16 4096 1 writeback cleaner 0 read_promote_adjustment 0 sequential_threshold 1000000 random_threshold 512 migration_threshold 2048000 discard_promote_adjustment 0 write_promote_adjustment 0' testvg-cached_vol
# dmsetup suspend testvg-cached_vol
# dmsetup resume testvg-cached_vol
# dmsetup table testvg-cached_vol
0 10485760 cache 253:15 253:14 253:16 4096 1 writeback cleaner 0 read_promote_adjustment 0 sequential_threshold 1000000 random_threshold 512 migration_threshold 2048000 discard_promote_adjustment 0 write_promote_adjustment 0
# mkfs.ext4 /dev/testvg/cached_vol
# mount /dev/testvg/cached_vol /cached_vol/
# dd if=/dev/zero of=/cached_vol/testfile bs=4M count=500
...
2147483648 bytes (2.1 GB) copied, 1.26503 s, 1.7 GB/s
Only a few dirty blocks:
# dmsetup status testvg-cached_vol
0 10485760 cache 8 41/262144 4096 3/5120 289 101 694 41209 0 0 3 1 writeback 2 migration_threshold 2048 cleaner 0
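The dirty-block count that lvremove keeps printing can be read directly out of this status line. A sketch, assuming the dm-cache status layout of this kernel era (after dmsetup's start/length/target prefix, the dirty count follows the demotion and promotion counters, landing in field 14 here):

```shell
# Captured status line from above; the dirty-block count is the
# 14th whitespace-separated field (3 dirty blocks, matching the
# "3 blocks must still be flushed." messages below).
status='0 10485760 cache 8 41/262144 4096 3/5120 289 101 694 41209 0 0 3 1 writeback 2 migration_threshold 2048 cleaner 0'
dirty=$(echo "$status" | awk '{print $14}')
echo "dirty blocks: $dirty"   # prints "dirty blocks: 3"
```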
# lvremove testvg/cached_vol
Do you really want to remove active logical volume cached_vol? [y/n]: y
3 blocks must still be flushed.
3 blocks must still be flushed.
3 blocks must still be flushed.
3 blocks must still be flushed.
3 blocks must still be flushed.
3 blocks must still be flushed.
[...]
# lvconvert --splitcache testvg/cached_vol
3 blocks must still be flushed.
3 blocks must still be flushed.
3 blocks must still be flushed.
3 blocks must still be flushed.
3 blocks must still be flushed.
3 blocks must still be flushed.
[...]
The issue is now dealt with inside lvm2 (2.02.133): whenever lvm2 uses the 'cleaner' policy, the cache mode is enforced to 'writethrough'.
Users with an older version of lvm2 who are affected by the 'never ending' flushing can, as a hot-fix, take the table line for the cached LV, replace the word 'writeback' with 'writethrough', load the new table line, and resume; this should allow the flushing to finish.
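The hot-fix above can be sketched as follows, using the table line captured earlier in this report. On a live system the line would come from `dmsetup table <dev>`, and the edited line would go back via `dmsetup reload --table "$new" <dev>` followed by suspend and resume, exactly as in the reproduction steps above:

```shell
# Take the cached LV's table line and swap the writeback feature
# for writethrough so the cleaner policy can finish flushing.
old='0 42949681152 cache 253:3 253:2 253:4 4096 1 writeback cleaner 0'
new=$(echo "$old" | sed 's/writeback/writethrough/')
echo "$new"
```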
This fix has already been in since lvm2-2.02.130-3.el7 in 7.2 (the official 7.2 even ships lvm2-2.02.130-5.el7). No need for a z-stream; we just forgot to add this to the 7.2 errata (we used bug #1269677 for that).