Bug 1249234 - smq cache policy + writeback cache mode leads to deadlocks
Summary: smq cache policy + writeback cache mode leads to deadlocks
Keywords:
Status: CLOSED DUPLICATE of bug 1247192
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-07-31 22:41 UTC by Corey Marthaler
Modified: 2023-03-08 07:27 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-01 02:24:05 UTC
Target Upstream Version:
Embargoed:



Description Corey Marthaler 2015-07-31 22:41:39 UTC
Description of problem:
I've hit this on four different systems now and in each case, smq + writeback was used along with a filesystem and I/O to many files (500). In each case, after the I/O was finished, the next lvm command run (lvconvert --uncache, lvconvert --splitcache, or lvrename) deadlocked.

I've tried to reproduce this with just smq + writeback, and with smq + writeback + simple dd I/O, and was unsuccessful. It looks like I/O to many files is required to reproduce.


Create origin (slow) volume
lvcreate -L 4G -n rename_orig_A cache_sanity /dev/sde1

Create cache data and cache metadata (fast) volumes
lvcreate -L 4G -n rename_pool_A cache_sanity /dev/sda1
lvcreate -L 12M -n rename_pool_A_meta cache_sanity /dev/sda1

Create cache pool volume by combining the cache data and cache metadata (fast) volumes
lvconvert --yes --type cache-pool --cachemode writeback -c 32 --poolmetadata cache_sanity/rename_pool_A_meta cache_sanity/rename_pool_A
  WARNING: Converting logical volume cache_sanity/rename_pool_A and cache_sanity/rename_pool_A_meta to pool's data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)

Create cached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type cache --cachepool cache_sanity/rename_pool_A cache_sanity/rename_orig_A

Changing cache policy to smq

Placing an xfs filesystem on origin volume
Mounting origin volume

Writing files to /mnt/rename_orig_A
checkit starting with:
CREATE
Num files:          500
Random Seed:        7049
Verify XIOR Stream: /tmp/checkit_origin_1
Working dir:        /mnt/rename_orig_A

Checking files on /mnt/rename_orig_A
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_origin_1
Working dir:        /mnt/rename_orig_A

syncing before snap creation...
Renaming cache ORIGIN volume...
lvrename cache_sanity/rename_orig_A cache_sanity/rename_orig_B


[12840.494138] INFO: task lvrename:7063 blocked for more than 120 seconds.
[12840.496267] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12840.498717] lvrename        D ffff88003a7c5000     0  7063   7062 0x00000080
[12840.501132]  ffff88001f223bf0 0000000000000086 ffff88003bacd080 ffff88001f223fd8
[12840.503643]  ffff88001f223fd8 ffff88001f223fd8 ffff88003bacd080 ffff88001c21a000
[12840.506267]  ffff88001f223c20 ffff88001c21a0d8 0000000000000000 ffff88003a7c5000
[12840.508822] Call Trace:
[12840.509652]  [<ffffffff81634dd9>] schedule+0x29/0x70
[12840.511266]  [<ffffffffa03d4965>] cache_postsuspend+0xe5/0x4b0 [dm_cache]
[12840.513396]  [<ffffffff810b7670>] ? wake_up_state+0x20/0x20
[12840.515178]  [<ffffffff810a5580>] ? wake_up_atomic_t+0x30/0x30
[12840.517068]  [<ffffffffa000779a>] dm_table_postsuspend_targets+0x4a/0x60 [dm_mod]
[12840.519408]  [<ffffffffa0004bb1>] dm_suspend+0xe1/0xf0 [dm_mod]
[12840.521302]  [<ffffffffa0009ee0>] ? table_load+0x380/0x380 [dm_mod]
[12840.523299]  [<ffffffffa000a074>] dev_suspend+0x194/0x250 [dm_mod]
[12840.525266]  [<ffffffffa0009ee0>] ? table_load+0x380/0x380 [dm_mod]
[12840.527240]  [<ffffffffa000a925>] ctl_ioctl+0x255/0x500 [dm_mod]
[12840.529157]  [<ffffffffa000abe3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[12840.531086]  [<ffffffff811ef345>] do_vfs_ioctl+0x2e5/0x4c0
[12840.532832]  [<ffffffff81286fae>] ? file_has_perm+0xae/0xc0
[12840.534685]  [<ffffffff811ef5c1>] SyS_ioctl+0xa1/0xc0
[12840.536305]  [<ffffffff8163fd89>] system_call_fastpath+0x16/0x1b
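
For triaging logs like the one above, the following is a minimal sketch (not part of the original test harness) of how the hung-task warning and its reliable call-trace frames could be pulled out of kernel log text. The function name summarize_hung_tasks and both regular expressions are assumptions written against the log format shown in this report; other kernel versions may format these lines differently, and task names containing spaces are not handled.

```python
import re

# Hung-task warning, e.g.
# "[12840.494138] INFO: task lvrename:7063 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# Call-trace frame, e.g.
# " [<ffffffffa03d4965>] cache_postsuspend+0xe5/0x4b0 [dm_cache]"
# A leading "? " marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\? )?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def summarize_hung_tasks(log_text):
    """Return one dict per hung-task warning found in log_text.

    Each dict holds the task name, pid, timeout in seconds, and the
    reliable frames that follow as (symbol, module-or-None) tuples.
    """
    reports = []
    current = None
    for line in log_text.splitlines():
        warning = HUNG_RE.search(line)
        if warning:
            current = {
                "comm": warning["comm"],
                "pid": int(warning["pid"]),
                "secs": int(warning["secs"]),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Skip frames seen before any warning and frames marked "?".
        if frame and current is not None and not frame["unreliable"]:
            current["frames"].append((frame["sym"], frame["mod"]))
    return reports
```

Applied to the trace in this report, the first reliable frame that belongs to a module is cache_postsuspend in dm_cache, which is where lvrename is blocked during the device-mapper suspend.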

Version-Release number of selected component (if applicable):
3.10.0-300.el7.x86_64
lvm2-2.02.126-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015
lvm2-libs-2.02.126-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015
lvm2-cluster-2.02.126-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015
device-mapper-1.02.103-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015
device-mapper-libs-1.02.103-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015
device-mapper-event-1.02.103-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015
device-mapper-event-libs-1.02.103-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015
device-mapper-persistent-data-0.5.4-1.el7    BUILT: Fri Jul 17 08:56:22 CDT 2015
cmirror-2.02.126-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015
sanlock-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
sanlock-lib-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
lvm2-lockd-2.02.126-1.el7    BUILT: Tue Jul 28 11:32:33 CDT 2015


How reproducible:
multiple times.

Comment 1 Mike Snitzer 2015-08-01 02:24:05 UTC
See patches that will be applied via bug#1247192:
https://bugzilla.redhat.com/show_bug.cgi?id=1247192#c1

*** This bug has been marked as a duplicate of bug 1247192 ***

