Bug 1959626

Summary: Ability to reduce (online resize) a cache origin can lead to a deadlock when attempting --splitcache
Product: Red Hat Enterprise Linux 8
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
Assignee: Zdenek Kabelac <zkabelac>
lvm2 sub component: Cache Logical Volumes
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED WONTFIX
Severity: high
Priority: high
CC: agk, heinzm, jbrassow, mcsontos, msnitzer, prajnoha, zkabelac
Version: 8.5
Keywords: Triaged
Target Milestone: beta
Flags: pm-rhel: mirror+
Hardware: x86_64
OS: Linux
Last Closed: 2022-11-11 07:25:28 UTC
Type: Bug

Description Corey Marthaler 2021-05-11 23:12:34 UTC
Description of problem:
[root@hayes-03 ~]# pvscan
  PV /dev/sdb1   VG cache_sanity    lvm2 [446.62 GiB / 446.62 GiB free]
  PV /dev/sdc1   VG cache_sanity    lvm2 [446.62 GiB / 446.62 GiB free]
  PV /dev/sdd1   VG cache_sanity    lvm2 [446.62 GiB / 446.62 GiB free]
  PV /dev/sde1   VG cache_sanity    lvm2 [446.62 GiB / 446.62 GiB free]
  PV /dev/sdf1   VG cache_sanity    lvm2 [<1.82 TiB / <1.82 TiB free]
  PV /dev/sdg1   VG cache_sanity    lvm2 [<1.82 TiB / <1.82 TiB free]
  PV /dev/sdh1   VG cache_sanity    lvm2 [<1.82 TiB / <1.82 TiB free]
  PV /dev/sdi1   VG cache_sanity    lvm2 [<1.82 TiB / <1.82 TiB free]
  PV /dev/sdj1   VG cache_sanity    lvm2 [<1.82 TiB / <1.82 TiB free]
  Total: 9 [<10.84 TiB] / in use: 9 [<10.84 TiB] / in no VG: 0 [0   ]

[root@hayes-03 ~]# lvcreate --yes --wipesignatures y  -L 4G -n corigin cache_sanity @slow
  Wiping ext4 signature on /dev/cache_sanity/corigin.
  Logical volume "corigin" created.
[root@hayes-03 ~]# lvcreate --yes  -L 4G -n pool cache_sanity @fast
  Wiping ext4 signature on /dev/cache_sanity/pool.
  Logical volume "pool" created.
[root@hayes-03 ~]# lvcreate --yes  -L 12M -n pool_meta cache_sanity @fast
  Logical volume "pool_meta" created.
[root@hayes-03 ~]# lvconvert --yes --type cache-pool --cachepolicy smq --cachemode writeback -c 32 --poolmetadata cache_sanity/pool_meta cache_sanity/pool
  WARNING: Converting cache_sanity/pool and cache_sanity/pool_meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  Converted cache_sanity/pool and cache_sanity/pool_meta to cache pool.
[root@hayes-03 ~]# lvconvert --yes --type cache --cachemetadataformat 1 --cachepool cache_sanity/pool cache_sanity/corigin
  Logical volume cache_sanity/corigin is now cached.
                                                                                                   
[root@hayes-03 ~]# lvs -a -o +devices
  LV                 VG           Attr       LSize  Pool         Origin          Data%  Meta%  Move Log Cpy%Sync Convert Devices            
  corigin            cache_sanity Cwi-a-C---  4.00g [pool_cpool] [corigin_corig] 0.00   8.72            0.00             corigin_corig(0)   
  [corigin_corig]    cache_sanity owi-aoC---  4.00g                                                                      /dev/sdj1(0)       
  [lvol0_pmspare]    cache_sanity ewi------- 12.00m                                                                      /dev/sdb1(0)       
  [pool_cpool]       cache_sanity Cwi---C---  4.00g                              0.00   8.72            0.00             pool_cpool_cdata(0)
  [pool_cpool_cdata] cache_sanity Cwi-ao----  4.00g                                                                      /dev/sdd1(0)       
  [pool_cpool_cmeta] cache_sanity ewi-ao---- 12.00m                                                                      /dev/sdd1(1024)    
[root@hayes-03 ~]# lvs
  LV      VG           Attr       LSize Pool         Origin          Data%  Meta%  Move Log Cpy%Sync Convert
  corigin cache_sanity Cwi-a-C--- 4.00g [pool_cpool] [corigin_corig] 0.00   8.72            0.00            
[root@hayes-03 ~]# mkfs.ext4 /dev/cache_sanity/corigin 
mke2fs 1.45.6 (20-Mar-2020)
Discarding device blocks: done                            
Creating filesystem with 1048576 4k blocks and 262144 inodes
[...]
Writing superblocks and filesystem accounting information: done 

[root@hayes-03 ~]# mount /dev/cache_sanity/corigin /mnt/corigin/
[root@hayes-03 ~]# df -h
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/cache_sanity-corigin  3.9G   16M  3.7G   1% /mnt/corigin

[root@hayes-03 ~]# lvreduce --yes -L -120M -r /dev/cache_sanity/corigin
Do you want to unmount "/mnt/corigin" ? [Y|n] y
fsck from util-linux 2.32.1
/dev/mapper/cache_sanity-corigin: 11/262144 files (0.0% non-contiguous), 36942/1048576 blocks
resize2fs 1.45.6 (20-Mar-2020)
Resizing the filesystem on /dev/mapper/cache_sanity-corigin to 1017856 (4k) blocks.
The filesystem on /dev/mapper/cache_sanity-corigin is now 1017856 (4k) blocks long.

  Size of logical volume cache_sanity/corigin_corig changed from 4.00 GiB (1024 extents) to 3.88 GiB (994 extents).
  Logical volume cache_sanity/corigin successfully resized.


[root@hayes-03 ~]# lvreduce --yes -L -120M -r /dev/cache_sanity/corigin
Do you want to unmount "/mnt/corigin" ? [Y|n] y
fsck from util-linux 2.32.1
/dev/mapper/cache_sanity-corigin: 11/262144 files (0.0% non-contiguous), 36942/1017856 blocks
resize2fs 1.45.6 (20-Mar-2020)
Resizing the filesystem on /dev/mapper/cache_sanity-corigin to 987136 (4k) blocks.
The filesystem on /dev/mapper/cache_sanity-corigin is now 987136 (4k) blocks long.

  Size of logical volume cache_sanity/corigin_corig changed from 3.88 GiB (994 extents) to <3.77 GiB (964 extents).
  Logical volume cache_sanity/corigin successfully resized.
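The sizes reported by the two lvreduce runs can be cross-checked with shell arithmetic. This is a minimal sketch; it assumes 4 MiB extents (implied by "4.00 GiB (1024 extents)") and the 4 KiB ext4 block size reported by mke2fs, and the variable names are illustrative only:

```shell
#!/usr/bin/env bash
# Cross-check the two "lvreduce -L -120M -r" steps.
# Assumptions: 4 MiB extents (4.00 GiB = 1024 extents) and
# 4 KiB ext4 blocks (as reported by mke2fs above).
extent_mib=4
fs_block_kib=4
step_mib=120

lv_extents=1024
fs_blocks=1048576
for step in 1 2; do
    # Shrink the LV and the filesystem by 120 MiB each.
    lv_extents=$(( lv_extents - step_mib / extent_mib ))
    fs_blocks=$(( fs_blocks - step_mib * 1024 / fs_block_kib ))
    # Size of the LV expressed in filesystem blocks.
    lv_blocks=$(( lv_extents * extent_mib * 1024 / fs_block_kib ))
    echo "step $step: LV $lv_extents extents = $lv_blocks fs blocks, fs $fs_blocks blocks"
done
```

Both steps print matching LV and filesystem sizes (994 extents = 1017856 blocks, then 964 extents = 987136 blocks), which suggests resize2fs shrank the filesystem to exactly the new LV size rather than leaving it larger than the device.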

[...]
May 11 17:50:36 hayes-03 kernel: attempt to access beyond end of device
May 11 17:50:36 hayes-03 kernel: dm-3: rw=1, want=8142272, limit=7897088
May 11 17:50:36 hayes-03 kernel: attempt to access beyond end of device
May 11 17:50:36 hayes-03 kernel: dm-3: rw=1, want=8142784, limit=7897088
May 11 17:50:36 hayes-03 kernel: attempt to access beyond end of device
May 11 17:50:36 hayes-03 kernel: dm-3: rw=1, want=8139776, limit=7897088
May 11 17:50:36 hayes-03 kernel: attempt to access beyond end of device
May 11 17:50:36 hayes-03 kernel: dm-3: rw=1, want=8142272, limit=7897088
[...]
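The want= and limit= values in the kernel messages are in 512-byte sectors and can be related to the extent counts from lvreduce. A minimal sketch, assuming 4 MiB extents (8192 sectors) and assuming want= is the end sector of the rejected write:

```shell
#!/usr/bin/env bash
# Relate the "attempt to access beyond end of device" messages to LV sizes.
# Assumptions: 512-byte sectors (2048 per MiB) and 4 MiB extents,
# so 8192 sectors per extent; want= taken as the end of the rejected write.
sectors_per_extent=$(( 4 * 2048 ))

limit=7897088
echo "limit = $(( limit / sectors_per_extent )) extents"

for want in 8142272 8142784 8139776; do
    # 0-based extent index containing the last sector of the write.
    echo "want $want -> extent $(( (want - 1) / sectors_per_extent ))"
done
```

Under these assumptions limit= is exactly 964 extents, the origin size after the second reduction, while every rejected write ends in extent 993, the last extent of the 994-extent origin that existed before the second lvreduce. That is consistent with, but does not prove, the cache writing back data for the range that the second lvreduce removed.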

[root@hayes-03 ~]# lvs
  LV      VG           Attr       LSize  Pool         Origin          Data%  Meta%  Move Log Cpy%Sync Convert
  corigin cache_sanity Cwi-aoC--- <3.77g [pool_cpool] [corigin_corig] 3.21   12.92           0.07            
[root@hayes-03 ~]# lvs -a -o +devices
  LV                 VG           Attr       LSize  Pool         Origin          Data%  Meta%  Move Log Cpy%Sync Convert Devices            
  corigin            cache_sanity Cwi-aoC--- <3.77g [pool_cpool] [corigin_corig] 3.21   12.92           0.07             corigin_corig(0)   
  [corigin_corig]    cache_sanity owi-aoC--- <3.77g                                                                      /dev/sdj1(0)       
  [lvol0_pmspare]    cache_sanity ewi------- 12.00m                                                                      /dev/sdb1(0)       
  [pool_cpool]       cache_sanity Cwi---C---  4.00g                              3.21   12.92           0.07             pool_cpool_cdata(0)
  [pool_cpool_cdata] cache_sanity Cwi-ao----  4.00g                                                                      /dev/sdd1(0)       
  [pool_cpool_cmeta] cache_sanity ewi-ao---- 12.00m                                                                      /dev/sdd1(1024) 

# This appears to spin forever; the same 3 blocks are reported on every pass:
[root@hayes-03 ~]# lvconvert --yes --splitcache /dev/cache_sanity/corigin
  Flushing 3 blocks for cache cache_sanity/corigin.
  Flushing 3 blocks for cache cache_sanity/corigin.
  Flushing 3 blocks for cache cache_sanity/corigin.
  Flushing 3 blocks for cache cache_sanity/corigin.
  Flushing 3 blocks for cache cache_sanity/corigin.
  Flushing 3 blocks for cache cache_sanity/corigin.
  Flushing 3 blocks for cache cache_sanity/corigin.
  Flushing 3 blocks for cache cache_sanity/corigin.
  Flushing 3 blocks for cache cache_sanity/corigin.
  Flushing 3 blocks for cache cache_sanity/corigin.
  Flushing 3 blocks for cache cache_sanity/corigin.
  Flushing 3 blocks for cache cache_sanity/corigin.
  Flushing 3 blocks for cache cache_sanity/corigin.
  Flushing 3 blocks for cache cache_sanity/corigin.
  [...]

# strace of the looping lvconvert process:
write(1, "  Flushing 3 blocks for cache ca"..., 52) = 52
rt_sigprocmask(SIG_BLOCK, NULL, ~[KILL STOP RTMIN RT_1], 8) = 0
rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f88e3681400}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x559ce4c03c90, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f88e3681400}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f88e3681400}, 8) = 0
rt_sigaction(SIGTERM, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f88e3681400}, 8) = 0
rt_sigaction(SIGTERM, {sa_handler=0x559ce4c03c90, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f88e3681400}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f88e3681400}, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT KILL TERM STOP RTMIN RT_1], NULL, 8) = 0
nanosleep({tv_sec=0, tv_nsec=500000000}, NULL) = 0
rt_sigprocmask(SIG_BLOCK, NULL, ~[INT KILL TERM STOP RTMIN RT_1], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1], NULL, 8) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f88e3681400}, NULL, 8) = 0
rt_sigaction(SIGTERM, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f88e3681400}, NULL, 8) = 0
ioctl(3, DM_TABLE_STATUS, {version=4.0.0, data_size=16384, data_start=312, uuid="LVM-jbOTZnz8QRfVePqpJWBdoEfMvi3e52Ab3DYLkcqkOzeIGQo4U0pX09xKWaFzft9r", flags=DM_EXISTS_FLAG|DM_SKIP_BDGET_FLAG|DM_NOFLUSH_FLAG} => {version=4.43.0, data_size=475, data_start=312, dev=makedev(0xfd, 0), name="cache_sanity-corigin", uuid="LVM-jbOTZnz8QRfVePqpJWBdoEfMvi3e52Ab3DYLkcqkOzeIGQo4U0pX09xKWaFzft9r", target_count=1, open_count=1, event_nr=5, flags=DM_EXISTS_FLAG|DM_ACTIVE_PRESENT_FLAG|DM_SKIP_BDGET_FLAG|DM_NOFLUSH_FLAG, ...}) = 0
write(1, "  Flushing 3 blocks for cache ca"..., 52) = 52
rt_sigprocmask(SIG_BLOCK, NULL, ~[KILL STOP RTMIN RT_1], 8) = 0
rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f88e3681400}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x559ce4c03c90, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f88e3681400}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f88e3681400}, 8) = 0
rt_sigaction(SIGTERM, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f88e3681400}, 8) = 0
rt_sigaction(SIGTERM, {sa_handler=0x559ce4c03c90, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f88e3681400}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f88e3681400}, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT KILL TERM STOP RTMIN RT_1], NULL, 8) = 0
nanosleep({tv_sec=0, tv_nsec=500000000}, NULL) = 0
rt_sigprocmask(SIG_BLOCK, NULL, ~[INT KILL TERM STOP RTMIN RT_1], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1], NULL, 8) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f88e3681400}, NULL, 8) = 0
rt_sigaction(SIGTERM, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f88e3681400}, NULL, 8) = 0
ioctl(3, DM_TABLE_STATUS, {version=4.0.0, data_size=16384, data_start=312, uuid="LVM-jbOTZnz8QRfVePqpJWBdoEfMvi3e52Ab3DYLkcqkOzeIGQo4U0pX09xKWaFzft9r", flags=DM_EXISTS_FLAG|DM_SKIP_BDGET_FLAG|DM_NOFLUSH_FLAG} => {version=4.43.0, data_size=475, data_start=312, dev=makedev(0xfd, 0), name="cache_sanity-corigin", uuid="LVM-jbOTZnz8QRfVePqpJWBdoEfMvi3e52Ab3DYLkcqkOzeIGQo4U0pX09xKWaFzft9r", target_count=1, open_count=1, event_nr=5, flags=DM_EXISTS_FLAG|DM_ACTIVE_PRESENT_FLAG|DM_SKIP_BDGET_FLAG|DM_NOFLUSH_FLAG, ...}) = 0
[...]
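The strace shows the loop: lvconvert reads the cache status via the DM_TABLE_STATUS ioctl, prints the same "Flushing 3 blocks" message and sleeps 0.5 s. The dirty count can be watched from outside with dmsetup status, where (per the kernel dm-cache documentation) it is the 14th whitespace-separated field of a cache target's status line. A bounded polling sketch; the function names and the default poll limit are hypothetical, and it only reports whether the count changes, it does not break the loop:

```shell
#!/usr/bin/env bash
# Print the dirty-block count from a single dm-cache status line on stdin.
# Layout: <start> <len> cache <md_block_size> <used/total md>
#   <cache_block_size> <used/total cache> <read_hits> <read_misses>
#   <write_hits> <write_misses> <demotions> <promotions> <dirty> ...
cache_dirty_from_status() {
    awk '$3 == "cache" { print $14; exit }'
}

# Poll the dirty count every 0.5 s (the interval visible in the strace)
# at most $2 times (hypothetical default: 20) for dm device $1.
watch_dirty() {
    local dev=$1 polls=${2:-20} first cur i
    first=$(dmsetup status "$dev" | cache_dirty_from_status)
    for (( i = 0; i < polls; i++ )); do
        sleep 0.5
        cur=$(dmsetup status "$dev" | cache_dirty_from_status)
        if [ "$cur" != "$first" ]; then
            echo "dirty count changed: $first -> $cur"
            return 0
        fi
    done
    echo "dirty count stuck at $first after $polls polls"
    return 1
}
```

For example, `watch_dirty cache_sanity-corigin 20` run alongside the looping lvconvert would be expected to report a count that never drops if the 3 blocks cannot be written back.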


Version-Release number of selected component (if applicable):
kernel-4.18.0-305.2.el8    BUILT: Wed May  5 10:35:03 CDT 2021
lvm2-2.03.12-0.1.20210426git4dc5d4a.el8    BUILT: Mon Apr 26 08:23:33 CDT 2021
lvm2-libs-2.03.12-0.1.20210426git4dc5d4a.el8    BUILT: Mon Apr 26 08:23:33 CDT 2021
lvm2-dbusd-2.03.12-0.1.20210426git4dc5d4a.el8    BUILT: Mon Apr 26 08:23:33 CDT 2021
device-mapper-1.02.177-0.1.20210426git4dc5d4a.el8    BUILT: Mon Apr 26 08:23:33 CDT 2021
device-mapper-libs-1.02.177-0.1.20210426git4dc5d4a.el8    BUILT: Mon Apr 26 08:23:33 CDT 2021
device-mapper-event-1.02.177-0.1.20210426git4dc5d4a.el8    BUILT: Mon Apr 26 08:23:33 CDT 2021
device-mapper-event-libs-1.02.177-0.1.20210426git4dc5d4a.el8    BUILT: Mon Apr 26 08:23:33 CDT 2021


How reproducible:
Every time

Comment 1 David Teigland 2021-06-28 19:29:13 UTC
I don't know whether this is supposed to work. Zdenek should be able to tell us the status of reducing dm-cache.

Comment 5 RHEL Program Management 2022-11-11 07:25:28 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.