Bug 156569 - kernel dm: Potential for device mapper deadlock due to inconsistent ordering of obtaining mapped device's inode I_LOCK lock and the mapped device's lock semaphore.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: device-mapper
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Alasdair Kergon
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-05-01 22:08 UTC by Ed Goggin
Modified: 2010-01-12 02:14 UTC (History)

Fixed In Version: RHEL4U2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-05-15 14:37:07 UTC
Target Upstream Version:
Embargoed:


Description Ed Goggin 2005-05-01 22:08:10 UTC
Description of problem:

There appears to be a potential deadlock between two processes caused by
inconsistent ordering of two locks: the I_LOCK bit in the i_state of a
mapped device's inode, and the mapped device's r/w semaphore.  I think the
potential exists any time the dm.c code calls bdget_disk() or bdget() to
lock the block device inode of a mapped device whose r/w semaphore it
already holds, whether shared for reading or exclusively for writing.  The
deadlock is possible because the page writeback code can call dm_request(),
which acquires the mapped device's semaphore for reading, while already
holding the I_LOCK i_state bits of the mapped device's inode.

This appears to happen in the call to __unlock_fs() from dm_suspend() and
in the call to __set_size() from __bind() from dm_swap_table() in dm.c.
It is not clear why dm_suspend() acquires the mapped device's lock for
reading while calling __lock_fs() yet acquires the same lock for writing
while calling __unlock_fs().

I've gotten several actual deadlocks between multipath(8) trying to swap
in a new table and a dd(1) performing page writeback using 2.6.11-rc3.
I do not see the problem fixed in the Red Hat AS 4 Update 1 kernel code.

Multipath owns the multipath mapped device r/w semaphore lock for writing
obtained in dm_swap_table() and is blocked trying to obtain the I_LOCK
inode i_state bits for the mapped device in __set_size() called from
__bind() while trying to set the inode size of the mapped device as
part of binding a new mapping table to the device.

dd(1) owns the I_LOCK i_state bits of the mapped device's inode from
__sync_single_inode() as part of page writeback and is trying to submit
an i/o to the mapped device but is blocked in dm_request() trying to obtain
the r/w semaphore lock of the mapped device for reading.
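
The two acquisition orders described above can be modeled as a small
lock-order check.  This is only a sketch in Python, not kernel code: the
labels md_lock and I_LOCK stand for the mapped device's r/w semaphore and
the inode's I_LOCK state, the path descriptions come from this report, and
the helper names (ordered_pairs, find_inversions) are made up for
illustration.

```python
from itertools import combinations

# Lock acquisition order per code path, as described in this report.
# "md_lock" and "I_LOCK" are labels for this model, not kernel identifiers.
PATHS = {
    "multipath: dm_swap_table -> __bind -> __set_size": ["md_lock", "I_LOCK"],
    "dd: __sync_single_inode -> dm_request": ["I_LOCK", "md_lock"],
}


def ordered_pairs(order):
    """Return every (held, wanted) pair implied by one acquisition order."""
    return set(combinations(order, 2))


def find_inversions(paths):
    """Return lock pairs that two paths acquire in opposite orders."""
    inversions = []
    for (name_a, a), (name_b, b) in combinations(paths.items(), 2):
        for held, wanted in ordered_pairs(a):
            if (wanted, held) in ordered_pairs(b):
                inversions.append((name_a, name_b, held, wanted))
    return inversions


if __name__ == "__main__":
    for name_a, name_b, held, wanted in find_inversions(PATHS):
        print(f"{name_a} holds {held} and wants {wanted}; "
              f"{name_b} takes them in the opposite order")
```

The check reports one inversion for the pair (md_lock, I_LOCK), which is
the classic ABBA pattern: each process holds the lock the other one needs.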

Comment 1 Heather Conway 2005-06-30 14:32:29 UTC
Is there any update on this issue?
Thanks.

Comment 2 Alasdair Kergon 2005-07-03 22:26:58 UTC
This looks tractable: the difficulty is in not introducing new race conditions
whilst solving it.  First draft of a patch (unfinished) is in 'editing' dir (00020).

1. lock/unlock fs should not hold the md lock any more.

2. suspend/swap_table/resume must still never be capable of interfering with
each other.
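
One way to read goal 1 is that the inode lock and the md lock should never
be nested in the table-swap and suspend paths.  The sketch below is a
Python model, not the actual patch: the event sequences are assumptions
about how such an ordering might look, and lock_edges and conflicts are
hypothetical helper names.

```python
def lock_edges(events):
    """Return (held, wanted) pairs for a sequence of (op, lock) events."""
    held, edges = [], set()
    for op, lock in events:
        if op == "acquire":
            # Every lock already held is ordered before the new one.
            edges.update((h, lock) for h in held)
            held.append(lock)
        else:  # "release"
            held.remove(lock)
    return edges


def conflicts(a, b):
    """Lock pairs nested one way in a and the opposite way in b."""
    return {(x, y) for (x, y) in lock_edges(a) if (y, x) in lock_edges(b)}


# Assumed ordering after goal 1: the inode lock is taken and dropped
# before the md lock is acquired, so the two are never held together.
table_swap = [
    ("acquire", "I_LOCK"), ("release", "I_LOCK"),
    ("acquire", "md_lock"), ("release", "md_lock"),
]

# Writeback path as described in the original report (unchanged).
writeback = [
    ("acquire", "I_LOCK"), ("acquire", "md_lock"),
    ("release", "md_lock"), ("release", "I_LOCK"),
]

print(conflicts(table_swap, writeback))  # empty set: no inversion remains
```

This model only checks lock ordering.  It says nothing about goal 2, since
dropping a lock between steps is exactly where new races between
suspend/swap_table/resume could creep in.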


Comment 3 Alasdair Kergon 2005-07-06 19:09:53 UTC
Patches aimed at achieving these goals are at:
  ftp://sources.redhat.com/pub/dm/patches/2.6-unstable/editing/patches/

Please can people review them and try them out?

My main concern is how many new race conditions I've introduced while attempting
to fix the existing ones...


Comment 4 Heather Conway 2005-08-30 18:59:30 UTC
Alasdair - have you had any feedback on the patches or on the request in
general?
Thanks.
Heather

Comment 5 Heather Conway 2005-08-30 19:22:23 UTC
Ed - have you reviewed the patches that Alasdair posted?  If so, do you have
any feedback that you can share?
Thanks.
Heather

Comment 6 Alasdair Kergon 2005-09-20 20:46:36 UTC
I believe these fixes made it into RHEL4 U2.

Comment 8 Hari Kannan 2006-05-14 01:34:02 UTC
This item can be closed.
Andrius, please close this issue.

Comment 9 Andrius Benokraitis 2006-05-15 14:37:07 UTC
Closing issue, as notabug, as this has been resolved in RHEL4 U2.

