Bug 1249234 - smq cache policy + writeback cache mode leads to deadlocks
Status: CLOSED DUPLICATE of bug 1247192
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
Reported: 2015-07-31 18:41 EDT by Corey Marthaler
Modified: 2015-07-31 22:24 EDT
CC List: 6 users

Doc Type: Bug Fix
Last Closed: 2015-07-31 22:24:05 EDT
Type: Bug


Attachments: None

Description Corey Marthaler 2015-07-31 18:41:39 EDT
Description of problem:
I've now hit this on four different systems. In each case the smq cache policy and writeback cache mode were used together with a filesystem and I/O to many files (500), and after the I/O finished, the next lvm command run (lvconvert --uncache, lvconvert --splitcache, or lvrename) deadlocked.

I've tried to reproduce this with just smq + writeback, and with smq + writeback + simple dd I/O, without success. It looks like I/O to many files is required to reproduce it.


Create origin (slow) volume
lvcreate -L 4G -n rename_orig_A cache_sanity /dev/sde1

Create cache data and cache metadata (fast) volumes
lvcreate -L 4G -n rename_pool_A cache_sanity /dev/sda1
lvcreate -L 12M -n rename_pool_A_meta cache_sanity /dev/sda1

Create cache pool volume by combining the cache data and cache metadata (fast) volumes
lvconvert --yes --type cache-pool --cachemode writeback -c 32 --poolmetadata cache_sanity/rename_pool_A_meta cache_sanity/rename_pool_A
  WARNING: Converting logical volume cache_sanity/rename_pool_A and cache_sanity/rename_pool_A_meta to pool's data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)

Create cached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type cache --cachepool cache_sanity/rename_pool_A cache_sanity/rename_orig_A

Changing cache policy to smq
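The report does not show the command the test harness used to switch policies. A minimal sketch, assuming an lvm2 build whose `lvchange` accepts `--cachepolicy` and whose `lvs` reports the `cache_policy` and `cache_mode` fields (both are documented in current lvmcache(7); whether the 2.02.126 build listed below supports them is not confirmed here). `run` and `set_smq_policy` are hypothetical helpers: by default they only print the commands, and `DRY_RUN=0` executes them.

```shell
#!/bin/bash
# Hypothetical helper: print commands by default; execute them only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

# Switch the cached LV to the smq policy, then show the policy and mode lvs reports.
set_smq_policy() {
  local lv=${1:-cache_sanity/rename_orig_A}
  run lvchange --cachepolicy smq "$lv" || return 1
  run lvs -o lv_name,cache_policy,cache_mode "$lv"
}
```

Printing by default is deliberate: the surrounding steps destroy LV contents, so executing has to be an explicit choice.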

Placing an xfs filesystem on origin volume
Mounting origin volume
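The filesystem and mount steps are likewise described but not shown. A sketch under the same dry-run convention, assuming the standard `/dev/<vg>/<lv>` device path and the `/mnt/rename_orig_A` mount point that appears in the checkit output below; `make_fs_and_mount` and `run` are hypothetical helpers. `mkfs.xfs -f` is used because mkfs.xfs otherwise refuses to overwrite an existing filesystem signature, e.g. one left by a previous run.

```shell
#!/bin/bash
# Hypothetical helper: print commands by default; execute them only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

# Put XFS on the cached origin LV and mount it where the I/O step expects it.
make_fs_and_mount() {
  local dev=${1:-/dev/cache_sanity/rename_orig_A}
  local mnt=${2:-/mnt/rename_orig_A}
  run mkfs.xfs -f "$dev" || return 1   # -f: overwrite any existing filesystem signature
  run mkdir -p "$mnt" || return 1
  run mount "$dev" "$mnt"
}
```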

Writing files to /mnt/rename_orig_A
checkit starting with:
CREATE
Num files:          500
Random Seed:        7049
Verify XIOR Stream: /tmp/checkit_origin_1
Working dir:        /mnt/rename_orig_A

Checking files on /mnt/rename_orig_A
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_origin_1
Working dir:        /mnt/rename_orig_A
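checkit is an internal QA tool, and the report does not describe its file sizes or data pattern. For anyone trying to reproduce without it, here is a hypothetical stand-in, not checkit itself: `create_files` writes N files of random data (sizes drawn from bash's `$RANDOM` after seeding it) and records SHA-256 checksums in a manifest, and `verify_files` re-checks them. Per the description above, a simple dd stream did not trigger the hang, so there is no guarantee this I/O pattern does either.

```shell
#!/bin/bash
# Hypothetical stand-in for checkit CREATE/VERIFY (not the real tool).
# The manifest path should be absolute, because verification runs from inside $dir.

create_files() {
  local dir=$1 count=${2:-500} manifest=$3 seed=${4:-7049}
  local i size
  RANDOM=$seed                               # seed bash's PRNG so file sizes repeat per seed
  mkdir -p "$dir" || return 1
  for ((i = 1; i <= count; i++)); do
    size=$(( (RANDOM % 64 + 1) * 4096 ))     # 4 KiB to 256 KiB per file
    head -c "$size" /dev/urandom > "$dir/file_$i" || return 1
  done
  # Record checksums relative to $dir so the manifest can be checked from there.
  (cd "$dir" && sha256sum file_*) > "$manifest"
}

verify_files() {
  local dir=$1 manifest=$2
  # Exit status is non-zero if any listed file is missing or its checksum differs.
  (cd "$dir" && sha256sum --quiet -c "$manifest")
}
```

For example, `create_files /mnt/rename_orig_A 500 /tmp/origin_manifest 7049` followed by `verify_files /mnt/rename_orig_A /tmp/origin_manifest` mirrors the CREATE/VERIFY pair above (500 files, seed 7049); `/tmp/origin_manifest` is an arbitrary path chosen for illustration.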

syncing before snap creation...
Renaming cache ORIGIN volume...
lvrename cache_sanity/rename_orig_A cache_sanity/rename_orig_B


[12840.494138] INFO: task lvrename:7063 blocked for more than 120 seconds.
[12840.496267] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12840.498717] lvrename        D ffff88003a7c5000     0  7063   7062 0x00000080
[12840.501132]  ffff88001f223bf0 0000000000000086 ffff88003bacd080 ffff88001f223fd8
[12840.503643]  ffff88001f223fd8 ffff88001f223fd8 ffff88003bacd080 ffff88001c21a000
[12840.506267]  ffff88001f223c20 ffff88001c21a0d8 0000000000000000 ffff88003a7c5000
[12840.508822] Call Trace:
[12840.509652]  [<ffffffff81634dd9>] schedule+0x29/0x70
[12840.511266]  [<ffffffffa03d4965>] cache_postsuspend+0xe5/0x4b0 [dm_cache]
[12840.513396]  [<ffffffff810b7670>] ? wake_up_state+0x20/0x20
[12840.515178]  [<ffffffff810a5580>] ? wake_up_atomic_t+0x30/0x30
[12840.517068]  [<ffffffffa000779a>] dm_table_postsuspend_targets+0x4a/0x60 [dm_mod]
[12840.519408]  [<ffffffffa0004bb1>] dm_suspend+0xe1/0xf0 [dm_mod]
[12840.521302]  [<ffffffffa0009ee0>] ? table_load+0x380/0x380 [dm_mod]
[12840.523299]  [<ffffffffa000a074>] dev_suspend+0x194/0x250 [dm_mod]
[12840.525266]  [<ffffffffa0009ee0>] ? table_load+0x380/0x380 [dm_mod]
[12840.527240]  [<ffffffffa000a925>] ctl_ioctl+0x255/0x500 [dm_mod]
[12840.529157]  [<ffffffffa000abe3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[12840.531086]  [<ffffffff811ef345>] do_vfs_ioctl+0x2e5/0x4c0
[12840.532832]  [<ffffffff81286fae>] ? file_has_perm+0xae/0xc0
[12840.534685]  [<ffffffff811ef5c1>] SyS_ioctl+0xa1/0xc0
[12840.536305]  [<ffffffff8163fd89>] system_call_fastpath+0x16/0x1b
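When rerunning the reproducer, a quick way to confirm the same hang is to scan the kernel log for the hung-task warning shown above. A minimal sketch: `hung_tasks` is a hypothetical helper name, and it only matches the message format that appears in this trace (`INFO: task <name>:<pid> blocked for more than <n> seconds.`).

```shell
#!/bin/bash
# Print "<task> <pid> <seconds>" for each hung-task warning read from stdin.
hung_tasks() {
  sed -nE 's/.*INFO: task (.+):([0-9]+) blocked for more than ([0-9]+) seconds\..*/\1 \2 \3/p'
}
```

Typical use is `dmesg | hung_tasks` (reading the kernel ring buffer may require root); for the trace above it would report `lvrename 7063 120`.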

Version-Release number of selected component (if applicable):
3.10.0-300.el7.x86_64
lvm2-2.02.126-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015
lvm2-libs-2.02.126-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015
lvm2-cluster-2.02.126-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015
device-mapper-1.02.103-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015
device-mapper-libs-1.02.103-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015
device-mapper-event-1.02.103-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015
device-mapper-event-libs-1.02.103-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015
device-mapper-persistent-data-0.5.4-1.el7    BUILT: Fri Jul 17 08:56:22 CDT 2015
cmirror-2.02.126-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015
sanlock-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
sanlock-lib-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
lvm2-lockd-2.02.126-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015


How reproducible:
multiple times.

Comment 1 Mike Snitzer 2015-07-31 22:24:05 EDT
See patches that will be applied via bug#1247192:
https://bugzilla.redhat.com/show_bug.cgi?id=1247192#c1

*** This bug has been marked as a duplicate of bug 1247192 ***
