Bug 156569 - kernel dm: Potential for device mapper deadlock due to inconsistent ordering of obtaining mapped device's inode I_LOCK lock and the mapped device's lock semaphore.
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: device-mapper
Hardware: All
OS: Linux
Priority: medium
Severity: high
Assigned To: Alasdair Kergon
Reported: 2005-05-01 18:08 EDT by Ed Goggin
Modified: 2010-01-11 21:14 EST
Fixed In Version: RHEL4U2
Doc Type: Bug Fix
Last Closed: 2006-05-15 10:37:07 EDT

Description Ed Goggin 2005-05-01 18:08:10 EDT
Description of problem:

There looks to be deadlock potential between two processes due to the lock
ordering of a mapped device inode's I_LOCK i_state bit and the
mapped device's r/w semaphore lock.  I think the potential for such
deadlock exists anytime the dm.c code calls bdget_disk() or bdget()
in order to lock the block device inode of the mapped device for which
it already has read sharing or write exclusive ownership of the r/w
semaphore lock.  This deadlock potential exists due to the fact that
the page writeback code can call dm_request() to acquire the mapped
device's lock for reading while already owning the mapped device's
I_LOCK i_state bits.

This appears to happen in the call to __unlock_fs() from dm_suspend() and
in the call to __set_size() from __bind() from dm_swap_table() in dm.c.
It is not clear why dm_suspend() acquires the mapped device's lock for
reading while calling __lock_fs() yet acquires the same lock for writing
while calling __unlock_fs().

I've gotten several actual deadlocks between multipath(8) trying to swap
in a new table and a dd(1) performing page writeback using 2.6.11-rc3.
I do not see the problem fixed in Red Hat AS 4 Update 1 kernel code.

Multipath owns the multipath mapped device r/w semaphore lock for writing
obtained in dm_swap_table() and is blocked trying to obtain the I_LOCK
inode i_state bits for the mapped device in __set_size() called from
__bind() while trying to set the inode size of the mapped device as
part of binding a new mapping table to the device.

The dd(1) owns the I_LOCK i_state bits of the mapped device's inode from
__sync_single_inode() as part of page writeback and is trying to submit
an i/o to the mapped device but is blocked in dm_request() trying to obtain
the r/w semaphore lock of the mapped device for reading.
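
The two acquisition orders described above can be checked with a small userspace model. This is only a sketch under stated assumptions, not kernel code: the enum values, `struct order_graph`, and `record_path()` are hypothetical names invented for illustration, and each path is reduced to the sequence of locks it takes while still holding the earlier ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes for this model; these are not kernel identifiers. */
enum lock_class { INODE_I_LOCK, MD_RWSEM, NUM_LOCKS };

/* order[a][b] is true once some path has acquired b while holding a. */
struct order_graph {
    bool order[NUM_LOCKS][NUM_LOCKS];
};

static void graph_init(struct order_graph *g)
{
    memset(g, 0, sizeof(*g));
}

/*
 * Record one code path as the sequence of locks it acquires, each taken
 * while all earlier ones are still held. Returns true if any new edge
 * reverses an edge already in the graph (an ABBA inversion).
 */
static bool record_path(struct order_graph *g, const enum lock_class *seq, int n)
{
    bool inverted = false;

    for (int i = 0; i < n; i++)
        assert((int)seq[i] >= 0 && (int)seq[i] < NUM_LOCKS);

    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (g->order[seq[j]][seq[i]])
                inverted = true;
            g->order[seq[i]][seq[j]] = true;
        }
    }
    return inverted;
}

/* multipath(8) as reported: dm_swap_table() -> __bind() -> __set_size(). */
static const enum lock_class swap_table_path[] = { MD_RWSEM, INODE_I_LOCK };

/* dd(1) as reported: __sync_single_inode() -> dm_request(). */
static const enum lock_class writeback_path[] = { INODE_I_LOCK, MD_RWSEM };
```

Recording `swap_table_path` and then `writeback_path` into the same graph makes the second call report an inversion, which is the condition under which the two processes can block each other indefinitely.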

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
Actual results:

Expected results:

Additional info:
Comment 1 Heather Conway 2005-06-30 10:32:29 EDT
Is there any update on this issue?
Comment 2 Alasdair Kergon 2005-07-03 18:26:58 EDT
This looks tractable: the difficulty is in not introducing new race conditions
whilst solving it.  First draft of a patch (unfinished) is in 'editing' dir (00020).

1. lock/unlock fs should not hold the md lock any more.

2. suspend/swap_table/resume must still never be capable of interfering with
each other.
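
A minimal userspace sketch of what these two goals could look like, as opposed to the ordering in the original report. It is not the actual patch: `HELD_SUSPEND_LOCK` is an assumed, hypothetical lock that serializes suspend/swap_table/resume, and the `model_*` helpers and `struct task_model` exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock bits for this model; these are not kernel identifiers. */
#define HELD_SUSPEND_LOCK (1u << 0) /* assumed lock serializing suspend/swap_table/resume */
#define HELD_MD_LOCK      (1u << 1) /* models the mapped device's r/w semaphore */
#define HELD_I_LOCK       (1u << 2) /* models the inode's I_LOCK i_state bit */

struct task_model {
    unsigned held;          /* locks currently held by this task */
    bool md_held_at_i_lock; /* set if I_LOCK was ever taken under the md lock */
};

static void model_acquire(struct task_model *t, unsigned lock)
{
    assert(!(t->held & lock)); /* no recursive acquisition in this model */
    if (lock == HELD_I_LOCK && (t->held & HELD_MD_LOCK))
        t->md_held_at_i_lock = true;
    t->held |= lock;
}

static void model_release(struct task_model *t, unsigned lock)
{
    assert(t->held & lock);
    t->held &= ~lock;
}

/* Ordering from the report: the inode is locked while the md lock is held. */
static void unlock_fs_reported(struct task_model *t)
{
    model_acquire(t, HELD_MD_LOCK);
    model_acquire(t, HELD_I_LOCK);  /* stands in for bdget_disk() in __unlock_fs() */
    model_release(t, HELD_I_LOCK);
    model_release(t, HELD_MD_LOCK);
}

/* One possible restructuring: drop the md lock before touching the inode. */
static void unlock_fs_restructured(struct task_model *t)
{
    model_acquire(t, HELD_SUSPEND_LOCK); /* goal 2: keep suspend/swap/resume exclusive */
    model_acquire(t, HELD_MD_LOCK);      /* update device state only */
    model_release(t, HELD_MD_LOCK);
    model_acquire(t, HELD_I_LOCK);       /* goal 1: md lock is no longer held here */
    model_release(t, HELD_I_LOCK);
    model_release(t, HELD_SUSPEND_LOCK);
}
```

In this model only `unlock_fs_reported()` sets `md_held_at_i_lock`. The sketch says nothing about whether state can change in the window between releasing the md lock and taking I_LOCK, which is where new races could appear.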
Comment 3 Alasdair Kergon 2005-07-06 15:09:53 EDT
Patches aimed at achieving these goals are at:

Please can people review them and try them out?

My main concern is how many new race conditions I've introduced while attempting
to fix the existing ones...
Comment 4 Heather Conway 2005-08-30 14:59:30 EDT
Alasdair - have you had any feedback on the patches or on the request in 
Comment 5 Heather Conway 2005-08-30 15:22:23 EDT
Ed - have you reviewed the patches that Alasdair posted?  If so, do you have 
any feedback that you can share?
Comment 6 Alasdair Kergon 2005-09-20 16:46:36 EDT
I believe these fixes made it into RHEL4 U2.
Comment 8 Hari Kannan 2006-05-13 21:34:02 EDT
This item can be closed.
Andrius please close this issue.
Comment 9 Andrius Benokraitis 2006-05-15 10:37:07 EDT
Closing issue, as notabug, as this has been resolved in RHEL4 U2.