Bug 1389471

Summary: deadlock when attempting to uncache a suspended exclusively activated cluster cache pool

Product: Red Hat Enterprise Linux 7
Component: lvm2 (sub component: Cache Logical Volumes)
Version: 7.3
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: unspecified
Reporter: Corey Marthaler <cmarthal>
Assignee: Zdenek Kabelac <zkabelac>
QA Contact: cluster-qe <cluster-qe>
CC: agk, heinzm, jbrassow, msnitzer, prajnoha, zkabelac
Target Milestone: rc
Target Release: ---
Fixed In Version: ---
Last Closed: 2020-11-18 16:57:44 UTC
Type: Bug
Regression: ---

Description Corey Marthaler 2016-10-27 16:05:16 UTC
Description of problem:
This may be related to bug 1388632; however, unlike that bug, this one fails only in exclusively activated cluster mode. The same case passes when run on non-clustered VGs.


SCENARIO - [attempt_to_remove_suspended_cache_pool]
Create a cache volume, suspend the pool and meta devices, then attempt to remove it

*** Cache info for this scenario ***
*  origin (slow):  /dev/sde1
*  pool (fast):    /dev/sdf1
************************************

Adding "slow" and "fast" tags to corresponding pvs
Create origin (slow) volume
lvcreate --activate ey -L 4G -n suspend cache_sanity @slow

Create cache data and cache metadata (fast) volumes
lvcreate --activate ey -L 4G -n POOL cache_sanity @fast
WARNING: xfs signature detected on /dev/cache_sanity/POOL at offset 0. Wipe it? [y/n]: [n]
  Aborted wiping of xfs.
  1 existing signature left on the device.
lvcreate --activate ey -L 12M -n POOL_meta cache_sanity @fast

Create cache pool volume by combining the cache data and cache metadata (fast) volumes with policy: mq  mode: writethrough
lvconvert --yes --type cache-pool --cachepolicy mq --cachemode writethrough -c 64 --poolmetadata cache_sanity/POOL_meta cache_sanity/POOL
  WARNING: Converting logical volume cache_sanity/POOL and cache_sanity/POOL_meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Create cached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type cache --cachepool cache_sanity/POOL cache_sanity/suspend
dmsetup status | grep cache_sanity-suspend | grep writethrough | grep -w mq

dmsetup suspend cache_sanity-POOL_cdata
Uncache/remove the cache pool while it's suspended
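
At this point the pool's cdata sub-LV is suspended at the device-mapper level while lvm2 still considers the LV active. A quick way to confirm the state before running the uncache (a sketch; the device name is taken from this scenario) is to query device-mapper directly:

# Reports "State: SUSPENDED" for the manually suspended sub-device
dmsetup info cache_sanity-POOL_cdata | grep -i state

The lvconvert that follows then blocks on I/O to this suspended device, which produces the hung-task messages below.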


Oct 26 18:07:55 host-082 qarshd[22699]: Running cmdline: dmsetup suspend cache_sanity-POOL_cdata
Oct 26 18:07:56 host-082 qarshd[22704]: Running cmdline: lvconvert --uncache cache_sanity/suspend
Oct 26 18:08:06 host-082 crmd[18409]:  notice: High CPU load detected: 1.290000
Oct 26 18:08:36 host-082 crmd[18409]:  notice: High CPU load detected: 1.180000
Oct 26 18:09:06 host-082 crmd[18409]:  notice: High CPU load detected: 1.110000
Oct 26 18:09:36 host-082 crmd[18409]:  notice: High CPU load detected: 1.060000
Oct 26 18:10:01 host-082 systemd: Started Session 389 of user root.
Oct 26 18:10:01 host-082 systemd: Starting Session 389 of user root.
Oct 26 18:10:06 host-082 crmd[18409]:  notice: High CPU load detected: 1.040000
Oct 26 18:10:36 host-082 crmd[18409]:  notice: High CPU load detected: 1.020000
Oct 26 18:11:06 host-082 crmd[18409]:  notice: High CPU load detected: 1.010000
Oct 26 18:11:33 host-082 kernel: INFO: task lvconvert:22705 blocked for more than 120 seconds.
Oct 26 18:11:33 host-082 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 26 18:11:33 host-082 kernel: lvconvert       D ffff880036637900     0 22705  22704 0x00000080
Oct 26 18:11:33 host-082 kernel: ffff880012917980 0000000000000082 ffff8800205ace70 ffff880012917fd8
Oct 26 18:11:33 host-082 kernel: ffff880012917fd8 ffff880012917fd8 ffff8800205ace70 ffff88003fc16c40
Oct 26 18:11:33 host-082 kernel: 0000000000000000 7fffffffffffffff 0000000000000000 ffff880036637900
Oct 26 18:11:33 host-082 kernel: Call Trace:
Oct 26 18:11:33 host-082 kernel: [<ffffffff8168b559>] schedule+0x29/0x70
Oct 26 18:11:33 host-082 kernel: [<ffffffff81688fa9>] schedule_timeout+0x239/0x2d0
Oct 26 18:11:33 host-082 kernel: [<ffffffffa0002699>] ? dm_make_request+0x119/0x170 [dm_mod]
Oct 26 18:11:33 host-082 kernel: [<ffffffff81060aef>] ? kvm_clock_get_cycles+0x1f/0x30
Oct 26 18:11:33 host-082 kernel: [<ffffffff810eaa6c>] ? ktime_get_ts64+0x4c/0xf0
Oct 26 18:11:33 host-082 kernel: [<ffffffff8168aafe>] io_schedule_timeout+0xae/0x130
Oct 26 18:11:33 host-082 kernel: [<ffffffff8168ab98>] io_schedule+0x18/0x20
Oct 26 18:11:33 host-082 kernel: [<ffffffff8123e6e5>] do_blockdev_direct_IO+0x1c45/0x2020
Oct 26 18:11:33 host-082 kernel: [<ffffffff81238fa0>] ? I_BDEV+0x10/0x10
Oct 26 18:11:33 host-082 kernel: [<ffffffff8123eb15>] __blockdev_direct_IO+0x55/0x60
Oct 26 18:11:33 host-082 kernel: [<ffffffff81238fa0>] ? I_BDEV+0x10/0x10
Oct 26 18:11:33 host-082 kernel: [<ffffffff81239860>] blkdev_direct_IO+0x60/0xa0
Oct 26 18:11:33 host-082 kernel: [<ffffffff81238fa0>] ? I_BDEV+0x10/0x10
Oct 26 18:11:33 host-082 kernel: [<ffffffff811826dc>] generic_file_aio_read+0x70c/0x790
Oct 26 18:11:33 host-082 kernel: [<ffffffff8121e7be>] ? mntput_no_expire+0x3e/0x120
Oct 26 18:11:33 host-082 kernel: [<ffffffff8121e7be>] ? mntput_no_expire+0x3e/0x120
Oct 26 18:11:33 host-082 kernel: [<ffffffff8120dcf6>] ? path_openat+0x166/0x490
Oct 26 18:11:33 host-082 kernel: [<ffffffff81239d9c>] blkdev_aio_read+0x4c/0x70
Oct 26 18:11:33 host-082 kernel: [<ffffffff811fd7cd>] do_sync_read+0x8d/0xd0
Oct 26 18:11:33 host-082 kernel: [<ffffffff811fdf7e>] vfs_read+0x9e/0x170
Oct 26 18:11:33 host-082 kernel: [<ffffffff811feb4f>] SyS_read+0x7f/0xe0
Oct 26 18:11:33 host-082 kernel: [<ffffffff81696489>] system_call_fastpath+0x16/0x1b
Oct 26 18:11:36 host-082 crmd[18409]:  notice: High CPU load detected: 1.010000
Oct 26 18:12:06 host-082 crmd[18409]:  notice: High CPU load detected: 1.010000
[The identical lvconvert hung-task trace repeats at 18:13:33 and 18:15:33.]



Version-Release number of selected component (if applicable):
3.10.0-513.el7.x86_64

lvm2-2.02.166-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
lvm2-libs-2.02.166-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
lvm2-cluster-2.02.166-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-1.02.135-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-libs-1.02.135-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-event-1.02.135-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-event-libs-1.02.135-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 05:29:13 CDT 2016
cmirror-2.02.166-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
sanlock-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
sanlock-lib-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
lvm2-lockd-2.02.166-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016



How reproducible:
Every time

Comment 2 Corey Marthaler 2016-10-27 23:03:47 UTC
FWIW, this is actually not reproducible every time. In fact, I have yet to reproduce this in a non-virtualized environment.

Comment 3 Corey Marthaler 2017-05-15 19:09:55 UTC
This is now easily reproducible in RHEL 7.4 with exclusively activated clustered VGs. It is not reproducible with lvmlockd or single-node VGs.


3.10.0-657.el7.x86_64
lvm2-2.02.171-1.el7    BUILT: Wed May  3 07:05:13 CDT 2017
lvm2-libs-2.02.171-1.el7    BUILT: Wed May  3 07:05:13 CDT 2017
lvm2-cluster-2.02.171-1.el7    BUILT: Wed May  3 07:05:13 CDT 2017
device-mapper-1.02.140-1.el7    BUILT: Wed May  3 07:05:13 CDT 2017
device-mapper-libs-1.02.140-1.el7    BUILT: Wed May  3 07:05:13 CDT 2017
device-mapper-event-1.02.140-1.el7    BUILT: Wed May  3 07:05:13 CDT 2017
device-mapper-event-libs-1.02.140-1.el7    BUILT: Wed May  3 07:05:13 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017


SCENARIO - [attempt_to_remove_suspended_cache_pool]
Create a cache volume, suspend the pool and meta devices, then attempt to remove it

*** Cache info for this scenario ***
*  origin (slow):  /dev/mapper/mpatha1
*  pool (fast):    /dev/mapper/mpathd1
************************************

Adding "slow" and "fast" tags to corresponding pvs
Create origin (slow) volume
lvcreate --wipesignatures y --activate ey -L 4G -n suspend cache_sanity @slow

Create cache data and cache metadata (fast) volumes
lvcreate --activate ey -L 4G -n POOL cache_sanity @fast
lvcreate --activate ey -L 12M -n POOL_meta cache_sanity @fast

Create cache pool volume by combining the cache data and cache metadata (fast) volumes with policy: smq  mode: writethrough
lvconvert --yes --type cache-pool --cachepolicy smq --cachemode writethrough -c 64 --poolmetadata cache_sanity/POOL_meta cache_sanity/POOL
  WARNING: Converting logical volume cache_sanity/POOL and cache_sanity/POOL_meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Create cached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type cache --cachemetadataformat 2 --cachepool cache_sanity/POOL cache_sanity/suspend
CHECKING META FMT: lvs --noheadings -o lv_name --select 'cachemetadataformat=2' | grep suspend
dmsetup status | grep cache_sanity-suspend | grep writethrough | grep -w smq

dmsetup suspend cache_sanity-POOL_cdata
Uncache/remove the cache pool while it's suspended
[deadlock]

May 11 12:15:42 mckinley-03 qarshd[46792]: Running cmdline: lvconvert --yes --type cache --cachemetadataformat 2 --cachepool cache_sanity/POOL cache_sanity/suspend
May 11 12:15:42 mckinley-03 systemd-udevd: inotify_add_watch(7, /dev/dm-19, 10) failed: No such file or directory
May 11 12:15:42 mckinley-03 multipathd: cache_sanity-POOL: adding map
May 11 12:15:42 mckinley-03 multipathd: cache_sanity-POOL: devmap dm-19 added
May 11 12:15:42 mckinley-03 multipathd: dm-19: remove map (uevent)
May 11 12:15:42 mckinley-03 multipathd: dm-19: devmap not registered, can't remove
May 11 12:15:42 mckinley-03 multipathd: dm-19: remove map (uevent)
May 11 12:15:43 mckinley-03 systemd: Started qarsh Per-Connection Server (10.15.80.218:33194).
May 11 12:15:43 mckinley-03 systemd: Starting qarsh Per-Connection Server (10.15.80.218:33194)...
May 11 12:15:43 mckinley-03 qarshd[46861]: Talking to peer ::ffff:10.15.80.218:33194 (IPv6)
May 11 12:15:43 mckinley-03 qarshd[46861]: Running cmdline: lvs --noheadings -o lv_name --select 'cachemetadataformat=2' | grep suspend
May 11 12:15:43 mckinley-03 systemd: Started qarsh Per-Connection Server (10.15.80.218:33196).
May 11 12:15:43 mckinley-03 systemd: Starting qarsh Per-Connection Server (10.15.80.218:33196)...
May 11 12:15:43 mckinley-03 qarshd[46869]: Talking to peer ::ffff:10.15.80.218:33196 (IPv6)
May 11 12:15:43 mckinley-03 qarshd[46869]: Running cmdline: lvs --noheadings -o lv_name --select 'cachemode=writethrough && cache_policy=smq' | grep suspend
May 11 12:15:44 mckinley-03 systemd: Started qarsh Per-Connection Server (10.15.80.218:33198).
May 11 12:15:44 mckinley-03 systemd: Starting qarsh Per-Connection Server (10.15.80.218:33198)...
May 11 12:15:44 mckinley-03 qarshd[46878]: Talking to peer ::ffff:10.15.80.218:33198 (IPv6)
May 11 12:15:44 mckinley-03 qarshd[46878]: Running cmdline: dmsetup status | grep cache_sanity-suspend | grep writethrough | grep -w smq
May 11 12:15:44 mckinley-03 systemd: Started qarsh Per-Connection Server (10.15.80.218:33200).
May 11 12:15:44 mckinley-03 systemd: Starting qarsh Per-Connection Server (10.15.80.218:33200)...
May 11 12:15:44 mckinley-03 qarshd[46887]: Talking to peer ::ffff:10.15.80.218:33200 (IPv6)
May 11 12:15:44 mckinley-03 qarshd[46887]: Running cmdline: dmsetup suspend cache_sanity-POOL_cdata
May 11 12:15:45 mckinley-03 systemd: Started qarsh Per-Connection Server (10.15.80.218:33206).
May 11 12:15:45 mckinley-03 systemd: Starting qarsh Per-Connection Server (10.15.80.218:33206)...
May 11 12:15:45 mckinley-03 qarshd[46892]: Talking to peer ::ffff:10.15.80.218:33206 (IPv6)
May 11 12:15:45 mckinley-03 qarshd[46892]: Running cmdline: lvconvert --uncache cache_sanity/suspend
May 11 12:18:59 mckinley-03 kernel: INFO: task lvconvert:46893 blocked for more than 120 seconds.
May 11 12:18:59 mckinley-03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 11 12:18:59 mckinley-03 kernel: lvconvert       D ffff99f1fcc44d80     0 46893  46892 0x00000080
May 11 12:18:59 mckinley-03 kernel: ffff99e1eda33990 0000000000000086 ffff99e1f6ea2f10 ffff99e1eda33fd8
May 11 12:18:59 mckinley-03 kernel: ffff99e1eda33fd8 ffff99e1eda33fd8 ffff99e1f6ea2f10 ffff99f1fea96cc0
May 11 12:19:00 mckinley-03 kernel: 0000000000000000 7fffffffffffffff 0000000000000000 ffff99f1fcc44d80
May 11 12:19:00 mckinley-03 kernel: Call Trace:
May 11 12:19:00 mckinley-03 kernel: [<ffffffff930a4aa9>] schedule+0x29/0x70
May 11 12:19:00 mckinley-03 kernel: [<ffffffff930a25b9>] schedule_timeout+0x239/0x2c0
May 11 12:19:00 mckinley-03 kernel: [<ffffffffc01b6e0b>] ? dm_make_request+0x16b/0x190 [dm_mod]
May 11 12:19:00 mckinley-03 kernel: [<ffffffff92cf6c26>] ? generic_make_request+0x106/0x1e0
May 11 12:19:00 mckinley-03 kernel: [<ffffffff92ae933c>] ? ktime_get_ts64+0x4c/0xf0
May 11 12:19:00 mckinley-03 kernel: [<ffffffff930a412d>] io_schedule_timeout+0xad/0x130
May 11 12:19:00 mckinley-03 kernel: [<ffffffff930a41c8>] io_schedule+0x18/0x20
May 11 12:19:00 mckinley-03 kernel: [<ffffffff92c41845>] do_blockdev_direct_IO+0x1c45/0x2020
May 11 12:19:00 mckinley-03 kernel: [<ffffffff92c3c1a0>] ? I_BDEV+0x10/0x10
May 11 12:19:00 mckinley-03 kernel: [<ffffffff92c41c75>] __blockdev_direct_IO+0x55/0x60
May 11 12:19:00 mckinley-03 kernel: [<ffffffff92c3c1a0>] ? I_BDEV+0x10/0x10
May 11 12:19:00 mckinley-03 kernel: [<ffffffff92c3ca90>] blkdev_direct_IO+0x60/0xa0
May 11 12:19:00 mckinley-03 kernel: [<ffffffff92c3c1a0>] ? I_BDEV+0x10/0x10
May 11 12:19:00 mckinley-03 kernel: [<ffffffff92b8438c>] generic_file_aio_read+0x70c/0x790
May 11 12:19:00 mckinley-03 kernel: [<ffffffff92c11116>] ? path_openat+0x166/0x490
May 11 12:19:00 mckinley-03 kernel: [<ffffffff92c3cfec>] blkdev_aio_read+0x4c/0x70
May 11 12:19:00 mckinley-03 kernel: [<ffffffff92bffd1d>] do_sync_read+0x8d/0xd0
May 11 12:19:00 mckinley-03 kernel: [<ffffffff92c0070c>] vfs_read+0x9c/0x170
May 11 12:19:00 mckinley-03 kernel: [<ffffffff92c015cf>] SyS_read+0x7f/0xe0
May 11 12:19:00 mckinley-03 kernel: [<ffffffff930b05c9>] system_call_fastpath+0x16/0x1b
[The identical lvconvert hung-task trace repeats at 12:21:00 and 12:23:00.]

Comment 4 Corey Marthaler 2018-07-30 16:39:38 UTC
This is actually reproducible in RHEL 7.6 with exclusively activated cache pools on shared lvmlockd VGs as well. Turning off this scenario in *all* clustered environments going forward...

3.10.0-927.el7.x86_64
lvm2-2.02.180-1.el7    BUILT: Fri Jul 20 12:21:35 CDT 2018
lvm2-libs-2.02.180-1.el7    BUILT: Fri Jul 20 12:21:35 CDT 2018
lvm2-cluster-2.02.180-1.el7    BUILT: Fri Jul 20 12:21:35 CDT 2018
lvm2-lockd-2.02.180-1.el7    BUILT: Fri Jul 20 12:21:35 CDT 2018
lvm2-python-boom-0.9-4.el7    BUILT: Fri Jul 20 12:23:30 CDT 2018
cmirror-2.02.180-1.el7    BUILT: Fri Jul 20 12:21:35 CDT 2018
device-mapper-1.02.149-1.el7    BUILT: Fri Jul 20 12:21:35 CDT 2018
device-mapper-libs-1.02.149-1.el7    BUILT: Fri Jul 20 12:21:35 CDT 2018
device-mapper-event-1.02.149-1.el7    BUILT: Fri Jul 20 12:21:35 CDT 2018
device-mapper-event-libs-1.02.149-1.el7    BUILT: Fri Jul 20 12:21:35 CDT 2018
device-mapper-persistent-data-0.7.3-3.el7    BUILT: Tue Nov 14 05:07:18 CST 2017




SCENARIO - [attempt_to_remove_suspended_cache_pool]
Create a cache volume, suspend the pool and meta devices, then attempt to remove it

*** Cache info for this scenario ***
*  origin (slow):  /dev/mapper/mpathb1
*  pool (fast):    /dev/mapper/mpathd1
************************************

Adding "slow" and "fast" tags to corresponding pvs
Create origin (slow) volume
lvcreate --wipesignatures y --activate ey -L 4G -n suspend cache_sanity @slow

Create cache data and cache metadata (fast) volumes
lvcreate --activate ey -L 4G -n POOL cache_sanity @fast
lvcreate --activate ey -L 12M -n POOL_meta cache_sanity @fast

Create cache pool volume by combining the cache data and cache metadata (fast) volumes with policy: mq  mode: writethrough
lvconvert --yes --type cache-pool --cachepolicy mq --cachemode writethrough -c 64 --poolmetadata cache_sanity/POOL_meta cache_sanity/POOL
  WARNING: Converting cache_sanity/POOL and cache_sanity/POOL_meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Create cached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type cache --cachemetadataformat 2 --cachepool cache_sanity/POOL cache_sanity/suspend
lvm cache metadata format doesn't appear to be consistent with what was specified (2)

dmsetup suspend cache_sanity-POOL_cdata
Uncache/remove the cache pool while it's suspended
lvconvert --uncache cache_sanity/suspend
[DEADLOCK]
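
Since the blocked lvconvert sits in uninterruptible sleep (D state), its kernel stack can be inspected right away instead of waiting for the 120-second hung-task watchdog. A sketch, assuming the PID reported in the log below (10826) and that magic sysrq is enabled:

# Dump the kernel stack of the blocked process
cat /proc/10826/stack
# Or dump all D-state tasks at once via magic sysrq
echo w > /proc/sysrq-trigger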


Jul 26 17:06:33 harding-03 qarshd[10825]: Running cmdline: lvconvert --uncache cache_sanity/suspend                                                                              
Jul 26 17:09:01 harding-03 kernel: INFO: task lvconvert:10826 blocked for more than 120 seconds.                                                                                 
Jul 26 17:09:01 harding-03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.                                                                     
Jul 26 17:09:01 harding-03 kernel: lvconvert       D ffffa0d16d34b0c0     0 10826  10825 0x00000080                                                                              
Jul 26 17:09:01 harding-03 kernel: Call Trace:                                                                                                                                   
Jul 26 17:09:01 harding-03 kernel: [<ffffffffa6d61069>] schedule+0x29/0x70                                                                                                       
Jul 26 17:09:01 harding-03 kernel: [<ffffffffa6d62955>] rwsem_down_write_failed+0x225/0x3a0                                                                                      
Jul 26 17:09:01 harding-03 kernel: [<ffffffffa66c67df>] ? try_check_zero+0xbf/0xf0                                                                                               
Jul 26 17:09:01 harding-03 kernel: [<ffffffffa69831c7>] call_rwsem_down_write_failed+0x17/0x30                                                                                   
Jul 26 17:09:01 harding-03 kernel: [<ffffffffa6d601dd>] down_write+0x2d/0x3d                                                                                                     
Jul 26 17:09:01 harding-03 kernel: [<ffffffffc08cb8d5>] cache_postsuspend+0x35/0x330 [dm_cache]                                                                                  
Jul 26 17:09:01 harding-03 kernel: [<ffffffffc08cb8a0>] ? writethrough_endio+0x170/0x170 [dm_cache]                                                                              
Jul 26 17:09:01 harding-03 kernel: [<ffffffffc02feedd>] dm_table_postsuspend_targets+0x4d/0x60 [dm_mod]                                                                          
Jul 26 17:09:01 harding-03 kernel: [<ffffffffc02fc09c>] dm_suspend+0xbc/0xc0 [dm_mod]                                                                                            
Jul 26 17:09:01 harding-03 kernel: [<ffffffffc0301a87>] dev_suspend+0x197/0x260 [dm_mod]                                                                                         
Jul 26 17:09:01 harding-03 kernel: [<ffffffffc0302b4e>] ctl_ioctl+0x24e/0x550 [dm_mod]                                                                                           
Jul 26 17:09:01 harding-03 kernel: [<ffffffffc03018f0>] ? table_load+0x390/0x390 [dm_mod]                                                                                        
Jul 26 17:09:01 harding-03 kernel: [<ffffffffc0302e5e>] dm_ctl_ioctl+0xe/0x20 [dm_mod]                                                                                           
Jul 26 17:09:01 harding-03 kernel: [<ffffffffa6854470>] do_vfs_ioctl+0x3a0/0x5a0                                                                                                 
Jul 26 17:09:01 harding-03 kernel: [<ffffffffa6d6076e>] ? __schedule+0x14e/0xa20
Jul 26 17:09:01 harding-03 kernel: [<ffffffffa6854711>] SyS_ioctl+0xa1/0xc0
Jul 26 17:09:01 harding-03 kernel: [<ffffffffa6d6dc95>] ? system_call_after_swapgs+0xa2/0x146
Jul 26 17:09:01 harding-03 kernel: [<ffffffffa6d6dd55>] system_call_fastpath+0x1c/0x21
Jul 26 17:09:01 harding-03 kernel: [<ffffffffa6d6dca1>] ? system_call_after_swapgs+0xae/0x146
[The identical trace repeats at 17:11:01.]

Comment 5 Corey Marthaler 2018-08-08 16:33:16 UTC
This can deadlock in single-machine (non-clustered/non-shared) mode as well.

3.10.0-931.el7.x86_64

lvm2-2.02.180-2.el7    BUILT: Wed Aug  1 11:22:48 CDT 2018
lvm2-libs-2.02.180-2.el7    BUILT: Wed Aug  1 11:22:48 CDT 2018
device-mapper-1.02.149-2.el7    BUILT: Wed Aug  1 11:22:48 CDT 2018
device-mapper-libs-1.02.149-2.el7    BUILT: Wed Aug  1 11:22:48 CDT 2018
device-mapper-event-1.02.149-2.el7    BUILT: Wed Aug  1 11:22:48 CDT 2018
device-mapper-event-libs-1.02.149-2.el7    BUILT: Wed Aug  1 11:22:48 CDT 2018
device-mapper-persistent-data-0.7.3-3.el7    BUILT: Tue Nov 14 05:07:18 CST 2017



SCENARIO - [attempt_to_remove_suspended_cache_pool]
Create a cache volume, suspend the pool and meta devices, then attempt to remove it
  WARNING: Not using lvmetad because a repair command was run.

*** Cache info for this scenario ***
*  origin (slow):  /dev/sda1 /dev/sdd1 /dev/sdf1
*  pool (fast):    /dev/sdc1 /dev/sde1 /dev/sdg1
************************************

Adding "slow" and "fast" tags to corresponding pvs
  WARNING: Not using lvmetad because a repair command was run.
  WARNING: Not using lvmetad because a repair command was run.
Create origin (slow) volume
lvcreate  -i 3 -L 4G -n suspend cache_sanity @slow
  WARNING: Not using lvmetad because a repair command was run.

Create cache data and cache metadata (fast) volumes
lvcreate  -i 3 -L 4G -n POOL cache_sanity @fast
  WARNING: Not using lvmetad because a repair command was run.
WARNING: xfs_external_log signature detected on /dev/cache_sanity/POOL at offset 196608. Wipe it? [y/n]: [n]
  Aborted wiping of xfs_external_log.
  1 existing signature left on the device.
lvcreate  -i 3 -L 12M -n POOL_meta cache_sanity @fast
  WARNING: Not using lvmetad because a repair command was run.

Create cache pool volume by combining the cache data and cache metadata (fast) volumes with policy: smq  mode: writethrough
lvconvert --yes --type cache-pool --cachepolicy smq --cachemode writethrough -c 64 --poolmetadata cache_sanity/POOL_meta cache_sanity/POOL
  WARNING: Not using lvmetad because a repair command was run.
  WARNING: Converting cache_sanity/POOL and cache_sanity/POOL_meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Create cached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type cache --cachemetadataformat 2 --cachepool cache_sanity/POOL cache_sanity/suspend
  WARNING: Not using lvmetad because a repair command was run.
  WARNING: Not using lvmetad because a repair command was run.

dmsetup suspend cache_sanity-POOL_cdata
Uncache/remove the cache pool while it's suspended
lvconvert --uncache cache_sanity/suspend
  WARNING: Not using lvmetad because a repair command was run.


Aug  6 19:13:16 host-093 kernel: [<ffffffffc03bafad>] dm_table_postsuspend_targets+0x4d/0x60 [dm_mod]
Aug  6 19:13:16 host-093 kernel: [<ffffffffc03b807c>] dm_suspend+0xbc/0xc0 [dm_mod]
Aug  6 19:13:16 host-093 kernel: [<ffffffffc03bdb57>] dev_suspend+0x197/0x260 [dm_mod]
Aug  6 19:13:16 host-093 kernel: [<ffffffffc03bec1e>] ctl_ioctl+0x24e/0x550 [dm_mod]
Aug  6 19:13:16 host-093 kernel: [<ffffffffc03bd9c0>] ? table_load+0x390/0x390 [dm_mod]
Aug  6 19:13:16 host-093 kernel: [<ffffffffc03bef2e>] dm_ctl_ioctl+0xe/0x20 [dm_mod]
Aug  6 19:13:16 host-093 kernel: [<ffffffff920549a0>] do_vfs_ioctl+0x3a0/0x5a0
Aug  6 19:13:16 host-093 kernel: [<ffffffff9256313e>] ? __schedule+0x14e/0xa20
Aug  6 19:13:16 host-093 kernel: [<ffffffff92054c41>] SyS_ioctl+0xa1/0xc0
Aug  6 19:13:16 host-093 kernel: [<ffffffff92570c95>] ? system_call_after_swapgs+0xa2/0x146
Aug  6 19:13:16 host-093 kernel: [<ffffffff92570d55>] system_call_fastpath+0x1c/0x21
Aug  6 19:13:16 host-093 kernel: [<ffffffff92570ca1>] ? system_call_after_swapgs+0xae/0x146
Aug  6 19:15:16 host-093 kernel: INFO: task lvconvert:7390 blocked for more than 120 seconds.
Aug  6 19:15:16 host-093 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug  6 19:15:16 host-093 kernel: lvconvert       D ffffa031ce0cc100     0  7390   7389 0x00000080
Aug  6 19:15:16 host-093 kernel: Call Trace:
Aug  6 19:15:16 host-093 kernel: [<ffffffff92563a39>] schedule+0x29/0x70
Aug  6 19:15:16 host-093 kernel: [<ffffffff92565325>] rwsem_down_write_failed+0x225/0x3a0
Aug  6 19:15:16 host-093 kernel: [<ffffffff91ec66bf>] ? try_check_zero+0xbf/0xf0
Aug  6 19:15:16 host-093 kernel: [<ffffffff92183d37>] call_rwsem_down_write_failed+0x17/0x30
Aug  6 19:15:16 host-093 kernel: [<ffffffff92562bad>] down_write+0x2d/0x3d
Aug  6 19:15:16 host-093 kernel: [<ffffffffc07a48d5>] cache_postsuspend+0x35/0x330 [dm_cache]
Aug  6 19:15:16 host-093 kernel: [<ffffffffc03bafad>] dm_table_postsuspend_targets+0x4d/0x60 [dm_mod]
Aug  6 19:15:16 host-093 kernel: [<ffffffffc03b807c>] dm_suspend+0xbc/0xc0 [dm_mod]
Aug  6 19:15:16 host-093 kernel: [<ffffffffc03bdb57>] dev_suspend+0x197/0x260 [dm_mod]
Aug  6 19:15:16 host-093 kernel: [<ffffffffc03bec1e>] ctl_ioctl+0x24e/0x550 [dm_mod]
Aug  6 19:15:16 host-093 kernel: [<ffffffffc03bd9c0>] ? table_load+0x390/0x390 [dm_mod]
Aug  6 19:15:16 host-093 kernel: [<ffffffffc03bef2e>] dm_ctl_ioctl+0xe/0x20 [dm_mod]
Aug  6 19:15:16 host-093 kernel: [<ffffffff920549a0>] do_vfs_ioctl+0x3a0/0x5a0
Aug  6 19:15:16 host-093 kernel: [<ffffffff9256313e>] ? __schedule+0x14e/0xa20
Aug  6 19:15:16 host-093 kernel: [<ffffffff92054c41>] SyS_ioctl+0xa1/0xc0
Aug  6 19:15:16 host-093 kernel: [<ffffffff92570c95>] ? system_call_after_swapgs+0xa2/0x146
Aug  6 19:15:16 host-093 kernel: [<ffffffff92570d55>] system_call_fastpath+0x1c/0x21
Aug  6 19:15:16 host-093 kernel: [<ffffffff92570ca1>] ? system_call_after_swapgs+0xae/0x146
[The identical trace repeats at 19:17:16.]

Comment 13 Zdenek Kabelac 2020-11-18 16:57:44 UTC
Closing, as there is not even an upstream design for handling these situations. Currently lvm2 works as expected and waits until devices that were suspended outside of lvm2 are resumed.
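
Until such handling exists, the practical way out of the hang is to resume the externally suspended device from another shell, after which the blocked lvconvert can proceed. A minimal sketch using the device name from this report:

# Resume the sub-device that was suspended outside of lvm2
dmsetup resume cache_sanity-POOL_cdata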