Bug 1665650

Summary: cannot remove cache lv when it's larger than 1TB (see bz 1661987)
Product: Red Hat Enterprise Linux 8
Component: lvm2
lvm2 sub component: Cache Logical Volumes
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Reporter: nikhil kshirsagar <nkshirsa>
Assignee: Joe Thornber <thornber>
QA Contact: cluster-qe <cluster-qe>
CC: agk, cmarthal, heinzm, jbrassow, loberman, mcsontos, msnitzer, prajnoha, rbednar, zkabelac
Flags: jruemker: mirror+
Target Milestone: rc
Target Release: 8.2
Fixed In Version: lvm2-2.03.05-1.el8
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2019-11-05 22:35:15 UTC
Bug Blocks: 1679810

Description nikhil kshirsagar 2019-01-12 02:51:45 UTC
Description of problem:

As discussed with ejt, snitm and visegrips on IRC, this is a RHEL8 bz opened for the issue described in detail at https://bugzilla.redhat.com/show_bug.cgi?id=1661987

Comment 2 Corey Marthaler 2019-01-15 00:12:53 UTC
FWIW, the cache origin volume doesn't need to be large in order to hit this, just the cache pool volume (with writeback mode).
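
In other words, a reproducer only needs the pool side to be big. A minimal sketch, paralleling the QA scenario further below (VG and device names are hypothetical, and it assumes PVs large enough to carve a pool of over 1TiB out of):

lvcreate -L 2G -n origin testvg /dev/slow1          # a small origin is fine
lvcreate -L 1.2T -n pool testvg /dev/fast1
lvcreate -L 2G -n pool_meta testvg /dev/fast1
lvconvert --yes --type cache-pool --cachemode writeback --poolmetadata testvg/pool_meta testvg/pool
lvconvert --yes --type cache --cachepool testvg/pool testvg/origin
lvremove -f testvg/origin      # on affected builds this loops printing "Flushing N blocks for cache ..."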

Comment 3 nikhil kshirsagar 2019-01-15 02:18:15 UTC
It's more of a chunksize issue: if the chunk size exceeds 1 MiB, the problem occurs regardless of the size of the cache LV.

The default chunk size goes above 1 MiB as soon as the size of the cached LV goes above 1 TiB, which is why we see the issue at 1 TiB and larger sizes.

Simply passing a larger chunk size while creating the cache LV will cause the issue.
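
For example (hypothetical names again), forcing a chunk size above 1 MiB on an otherwise small cache pool should be enough to hit it:

lvcreate --type cache-pool -L 4G --chunksize 2M --cachemode writeback -n smallpool testvg /dev/fast1
lvconvert --yes --type cache --cachepool testvg/smallpool testvg/origin
lvremove -f testvg/origin      # loops flushing even though the pool is only 4G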

This is because LVM tries to keep the number of chunks in the cache pool within one million, the maximum value defined in lvm.conf.
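
The arithmetic behind the 1 TiB threshold, assuming that one-million-chunk ceiling:

# 1 TiB of cache data split into at most 1,000,000 chunks:
#   1 TiB = 1,099,511,627,776 bytes
#   1,099,511,627,776 / 1,000,000 ≈ 1,099,512 bytes ≈ 1.05 MiB per chunk
# so once the cache data LV grows past roughly 1 TiB, the auto-selected chunk
# size has to exceed 1 MiB to stay under the chunk-count limit.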

That value may be increased, and cached LVs larger than 1 TiB then become possible, because the chunk size can remain within 1 MiB in that case.
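
A minimal sketch of that workaround, assuming the relevant lvm.conf setting is cache_pool_max_chunks in the allocation section (check lvm.conf(5) on the installed version for the exact name and supported range):

# /etc/lvm/lvm.conf  (setting name per the assumption above)
allocation {
        cache_pool_max_chunks = 2000000    # raise the ceiling above the one-million figure mentioned above
}

# With the higher ceiling, a cache data LV over 1 TiB can keep a 1 MiB (or smaller) chunk size:
lvcreate --type cache-pool -L 1.5T --chunksize 1M -n bigpool testvg @fast
lvs -o lv_name,lv_size,chunk_size testvg/bigpool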

Comment 4 Marian Csontos 2019-08-05 13:12:34 UTC
This has long since been fixed. See commit 74ae1c5bc1150005ae6e82c90415c433f4a24cbd, already present in 8.0.0.

Comment 6 Corey Marthaler 2019-08-13 22:08:29 UTC
Fix verified in the latest rpms.

kernel-4.18.0-127.el8    BUILT: Thu Aug  1 14:38:42 CDT 2019
lvm2-2.03.05-2.el8    BUILT: Wed Jul 24 08:05:11 CDT 2019
lvm2-libs-2.03.05-2.el8    BUILT: Wed Jul 24 08:05:11 CDT 2019
lvm2-dbusd-2.03.05-2.el8    BUILT: Wed Jul 24 08:07:38 CDT 2019
lvm2-lockd-2.03.05-2.el8    BUILT: Wed Jul 24 08:05:11 CDT 2019
device-mapper-1.02.163-2.el8    BUILT: Wed Jul 24 08:05:11 CDT 2019
device-mapper-libs-1.02.163-2.el8    BUILT: Wed Jul 24 08:05:11 CDT 2019
device-mapper-event-1.02.163-2.el8    BUILT: Wed Jul 24 08:05:11 CDT 2019
device-mapper-event-libs-1.02.163-2.el8    BUILT: Wed Jul 24 08:05:11 CDT 2019
device-mapper-persistent-data-0.8.5-2.el8    BUILT: Wed Jun  5 10:28:04 CDT 2019



============================================================
Iteration 4 of 4 started at Tue Aug 13 16:58:08 CDT 2019
============================================================
SCENARIO - [large_cache_removal]
Create a cache volume with a large pool (larger than 1TB if storage allows) with writeback mode and attempt to remove it w/o getting into a "Flushing blocks for cache" loop
Largest PV (/dev/sdp1) found is 476799, greater than 1T so this *is* a valid test for this bug

*** Cache info for this scenario ***
*  origin (slow):  /dev/sdm1
*  pool (fast):    /dev/sdp1 /dev/sdo1
************************************

Adding "slow" and "fast" tags to corresponding pvs
Create origin (slow) volume
lvcreate --wipesignatures y  -L 2G -n large_cache_removal cache_sanity @slow
WARNING: xfs signature detected on /dev/cache_sanity/large_cache_removal at offset 0. Wipe it? [y/n]: [n]
  Aborted wiping of xfs.
  1 existing signature left on the device.

lvcreate  -l 100%PVS -n pool cache_sanity /dev/sdp1
lvcreate  -L 2G -n pool_meta cache_sanity @fast
  WARNING: No free extents on physical volume "/dev/sdp1".

Create cache pool volume by combining the cache data and cache metadata (fast) volumes with WRITEBACK mode
lvconvert --yes --type cache-pool --cachemode writeback --poolmetadata cache_sanity/pool_meta cache_sanity/pool
  WARNING: Converting cache_sanity/pool and cache_sanity/pool_meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Create cached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type cache --cachepool cache_sanity/pool cache_sanity/large_cache_removal
Placing an xfs filesystem on origin volume
Mounting origin volume

Writing files to /mnt/large_cache_removal

Checking files on /mnt/large_cache_removal

lvremove -f /dev/cache_sanity/large_cache_removal
        (this should not loop "Flushing blocks for cache cache_sanity/large_cache_removal")

  Flushing 176 blocks for cache cache_sanity/large_cache_removal.
  Flushing 154 blocks for cache cache_sanity/large_cache_removal.
  Flushing 138 blocks for cache cache_sanity/large_cache_removal.
  Flushing 123 blocks for cache cache_sanity/large_cache_removal.
  Flushing 107 blocks for cache cache_sanity/large_cache_removal.
  Flushing 91 blocks for cache cache_sanity/large_cache_removal.
  Flushing 73 blocks for cache cache_sanity/large_cache_removal.
  Flushing 55 blocks for cache cache_sanity/large_cache_removal.
  Flushing 34 blocks for cache cache_sanity/large_cache_removal.
  Flushing 16 blocks for cache cache_sanity/large_cache_removal.
  Logical volume "pool" successfully removed
  Logical volume "large_cache_removal" successfully removed

Comment 8 errata-xmlrpc 2019-11-05 22:35:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3654