Bug 204311 - kernel - recursive locking in dm_request (md->io_lock)
Summary: kernel - recursive locking in dm_request (md->io_lock)
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Alasdair Kergon
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 206105 208754 238304 239925
Depends On:
Blocks: FCMETA_LOCKDEP
Reported: 2006-08-28 14:33 UTC by Prarit Bhargava
Modified: 2008-04-04 06:33 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-04-04 06:33:26 UTC
Type: ---
Embargoed:



Description Prarit Bhargava 2006-08-28 14:33:18 UTC
Description of problem:

The latest nightly Fedora (20060828) fails to install on an HP xw9400 because of a
recursive locking error in the device-mapper layer (md->io_lock).

Version-Release number of selected component (if applicable): Latest nightly
kernel 20060828


How reproducible: 100%


Steps to Reproduce:
1. Attempt to install via NFS.
  
Actual results:

After device detection the following error is displayed on the console:
=============================================
[ INFO: possible recursive locking detected ]
2.6.17-1.2586.fc6 #1
---------------------------------------------
lvm/557 is trying to acquire lock:
 (&md->io_lock){----}, at: [<f8b7542f>] dm_request+0x18/0xcd [dm_mod]

but task is already holding lock:
 (&md->io_lock){----}, at: [<f8b7542f>] dm_request+0x18/0xcd [dm_mod]

other info that might help us debug this:
1 lock held by lvm/557:
 #0:  (&md->io_lock){----}, at: [<f8b7542f>] dm_request+0x18/0xcd [dm_mod]

stack backtrace:
 [<c04037db>] show_trace_log_lvl+0x58/0x159
 [<c0403d9e>] show_trace+0xd/0x10
 [<c0403e3b>] dump_stack+0x19/0x1b
 [<c042bdef>] __lock_acquire+0x765/0x97c
 [<c042c577>] lock_acquire+0x4b/0x6c
 [<c042966c>] down_read+0x2d/0x3f
 [<f8b7542f>] dm_request+0x18/0xcd [dm_mod]
 [<c04c7b38>] generic_make_request+0x28e/0x29e
 [<f8b74437>] __map_bio+0xc0/0xee [dm_mod]
 [<f8b74cdd>] __split_bio+0x158/0x3a0 [dm_mod]
 [<f8b754d6>] dm_request+0xbf/0xcd [dm_mod]
 [<c04c7b38>] generic_make_request+0x28e/0x29e
 [<c04c983d>] submit_bio+0xa1/0xa9
 [<c047e84a>] dio_bio_submit+0x4f/0x61
 [<c047f643>] __blockdev_direct_IO+0x8fa/0xc52
 [<c0466437>] blkdev_direct_IO+0x30/0x35
 [<c0444891>] generic_file_direct_IO+0x88/0xe4
 [<c0444a98>] __generic_file_aio_read+0xb7/0x1ad
 [<c0445a6a>] generic_file_read+0x87/0x9b
 [<c045f57d>] vfs_read+0xa9/0x15b
 [<c045f909>] sys_read+0x3b/0x60
 [<c0402e57>] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb
Leftover inexact backtrace:
 [<c0403d9e>] show_trace+0xd/0x10
 [<c0403e3b>] dump_stack+0x19/0x1b
 [<c042bdef>] __lock_acquire+0x765/0x97c
 [<c042c577>] lock_acquire+0x4b/0x6c
 [<c042966c>] down_read+0x2d/0x3f
 [<f8b7542f>] dm_request+0x18/0xcd [dm_mod]
 [<c04c7b38>] generic_make_request+0x28e/0x29e
 [<f8b74437>] __map_bio+0xc0/0xee [dm_mod]
 [<f8b74cdd>] __split_bio+0x158/0x3a0 [dm_mod]
 [<f8b754d6>] dm_request+0xbf/0xcd [dm_mod]
 [<c04c7b38>] generic_make_request+0x28e/0x29e
 [<c04c983d>] submit_bio+0xa1/0xa9
 [<c047e84a>] dio_bio_submit+0x4f/0x61
 [<c047f643>] __blockdev_direct_IO+0x8fa/0xc52
 [<c0466437>] blkdev_direct_IO+0x30/0x35
 [<c0444891>] generic_file_direct_IO+0x88/0xe4
 [<c0444a98>] __generic_file_aio_read+0xb7/0x1ad
 [<c0445a6a>] generic_file_read+0x87/0x9b
 [<c045f57d>] vfs_read+0xa9/0x15b
 [<c045f909>] sys_read+0x3b/0x60
 [<c0402e57>] syscall_call+0x7/0xb
kjournald starting.  Commit interval 5 seconds
EXT3 FS on dm-1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev dm-1, type ext3), uses xattr
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
 sda: sda1 sda2

Expected results: No error should occur.

Additional info:  Adding Eric Sandeen for now -- he's the only FS guy I know ;)

Possible blocker for Fedora?

Comment 1 David Lawrence 2006-09-05 15:26:18 UTC
Reassigning to correct owner, kernel-maint.

Comment 3 Dave Jones 2006-10-04 23:28:24 UTC
*** Bug 208754 has been marked as a duplicate of this bug. ***

Comment 4 Michal Jaegermann 2006-10-05 00:18:16 UTC
Minor, but this bug still says "hardware: i386" while bug 208754
has a trace from x86_64.  Also, in that other trace there is
nothing after "Leftover inexact backtrace:". 1/2 :-)

Comment 5 Prarit Bhargava 2006-10-05 13:42:30 UTC
Seen on i386, x86_64, and ia64.

P.

Comment 7 Alasdair Kergon 2006-10-05 14:41:57 UTC
Known issue, first reported in July (during OLS), but nobody has written a patch for it yet.

Comment 9 Alasdair Kergon 2006-10-05 15:04:01 UTC
The locking prevents suspend requests from interfering with bio splitting.

The problem here is potential deadlock on SMP because lock requests are ordered.
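
A minimal sketch of the ordering problem described above, assuming a FIFO-fair reader/writer lock and assuming the suspend path is the writer on md->io_lock. The class and function names are hypothetical. This is a toy, single-threaded model in Python, not dm code: instead of sleeping, the try_* methods return False where a real lock would block.

```python
from collections import deque


class FairRWLockModel:
    """Toy model of a reader/writer lock that serves waiters in arrival order."""

    def __init__(self):
        self.readers = []       # tasks currently holding the lock for read
        self.writer = None      # task currently holding the lock for write
        self.waiters = deque()  # (task, kind) pairs, oldest first

    def try_read(self, task):
        # Ordering rule: a new reader queues behind any earlier waiter,
        # even when the lock is currently held only by readers.
        if self.writer is None and not self.waiters:
            self.readers.append(task)
            return True
        self.waiters.append((task, "read"))
        return False

    def try_write(self, task):
        if self.writer is None and not self.readers and not self.waiters:
            self.writer = task
            return True
        self.waiters.append((task, "write"))
        return False


def deadlocked(lock):
    """True if a read holder is itself queued behind a waiting writer."""
    seen_writer = False
    for task, kind in lock.waiters:
        if kind == "write":
            seen_writer = True
        elif seen_writer and task in lock.readers:
            return True
    return False


def simulate_nested_read_with_writer():
    lock = FairRWLockModel()
    events = [
        ("io: outer read", lock.try_read("io")),         # granted
        ("suspend: write", lock.try_write("suspend")),   # queued behind reader
        ("io: nested read", lock.try_read("io")),        # queued behind writer
    ]
    return lock, events
```

In this model the "io" task can never release its outer read lock, because its nested read is stuck behind the queued writer, and the writer is stuck behind the outer read. Without a writer arriving in between, the nested read is granted immediately, which is why the problem only shows up as a potential deadlock.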

Comment 10 Alasdair Kergon 2006-10-05 15:07:42 UTC
*** Bug 206105 has been marked as a duplicate of this bug. ***

Comment 11 Kenneth A. Redler 2007-04-13 11:40:51 UTC
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel

=============================================
[ INFO: possible recursive locking detected ]
2.6.20-1.3059.fc7 #1
---------------------------------------------
init/1 is trying to acquire lock:
 (&md->io_lock){----}, at: [<d08f67b1>] dm_request+0x18/0xea [dm_mod]

but task is already holding lock:
 (&md->io_lock){----}, at: [<d08f67b1>] dm_request+0x18/0xea [dm_mod]

other info that might help us debug this:
1 lock held by init/1:
 #0:  (&md->io_lock){----}, at: [<d08f67b1>] dm_request+0x18/0xea [dm_mod]

stack backtrace:
 [<c04061e9>] show_trace_log_lvl+0x1a/0x2f
 [<c04067ad>] show_trace+0x12/0x14
 [<c0406831>] dump_stack+0x16/0x18
 [<c0442089>] __lock_acquire+0x11f/0xba4
 [<c0442f00>] lock_acquire+0x56/0x6f
 [<c043b7e0>] down_read+0x3f/0x51
 [<d08f67b1>] dm_request+0x18/0xea [dm_mod]
 [<c04e4d06>] generic_make_request+0x2d8/0x2eb
 [<d08f5516>] __map_bio+0xd5/0x128 [dm_mod]
 [<d08f5e8e>] __split_bio+0x16f/0x3d2 [dm_mod]
 [<d08f6875>] dm_request+0xdc/0xea [dm_mod]
 [<c04e4d06>] generic_make_request+0x2d8/0x2eb
 [<c04e6d2d>] submit_bio+0xd7/0xdf
 [<c049a163>] submit_bh+0xf0/0x10f
 [<c049c964>] block_read_full_page+0x2c9/0x2d9
 [<c049e3c9>] blkdev_readpage+0xf/0x11
 [<c04662ab>] __do_page_cache_readahead+0x16a/0x1b6
 [<c0466344>] blockable_page_cache_readahead+0x4d/0xa0
 [<c046655c>] page_cache_readahead+0x129/0x190
 [<c0460f3b>] do_generic_mapping_read+0x12b/0x420
 [<c0462cdf>] generic_file_aio_read+0x16a/0x197
 [<c047e28b>] do_sync_read+0xc2/0xff
 [<c047eb30>] vfs_read+0xad/0x161
 [<c047efbc>] sys_read+0x3d/0x61
 [<c0405078>] syscall_call+0x7/0xb
 =======================



Also seen in the Fedora 7 Test 3 kernel.

Comment 12 dex 2007-04-27 04:09:37 UTC
I've been getting these errors since the PATA port on my motherboard was enabled many
months ago (Promise 376 fake RAID), and now it's bugging me! Latest rawhide kernel:
2.6.20-1.3116.fc7

Can these messages be silenced for the F7 release?

Comment 13 Milan Broz 2007-05-12 17:38:49 UTC
*** Bug 239925 has been marked as a duplicate of this bug. ***

Comment 14 Dave Jones 2007-05-15 18:10:59 UTC
*** Bug 238304 has been marked as a duplicate of this bug. ***

Comment 15 dex 2007-06-07 14:27:16 UTC
I made the jump to 2.6.21-1.3194.fc7 (to avoid the wi-fi stuff) and this bug has
now gone for me (but #240982 turned up), so another person can close it when they're
ready (or I will close it in about 2 weeks).

Comment 16 Milan Broz 2008-04-04 06:33:26 UTC
Because recent kernels serialize generic_make_request() calls for the
current process, dm_request is no longer called recursively (by the way, it was just
a warning, and a false positive).

Some situations can probably still cause this warning if another thread
calls dm_request (maybe invalidation of a filled snapshot, etc.). But that would be
a different problem with a different stack trace.
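
A rough sketch of the serialization idea mentioned in this comment, under the assumption that nested submissions from the same task are put on a per-task queue and handled in a loop instead of by recursion. All names here (Task, make_request, the device handlers) are hypothetical and only model the idea in Python; they are not the kernel's data structures or API.

```python
from collections import deque


class Task:
    """Stand-in for the current process."""

    def __init__(self):
        self.bio_queue = None  # None means "not inside make_request"


def make_request(task, bio, handlers, trace):
    """Submit a bio; nested submissions are queued rather than recursed into."""
    if task.bio_queue is not None:
        # Already inside make_request on this task: defer the bio.
        task.bio_queue.append(bio)
        return
    task.bio_queue = deque([bio])
    try:
        while task.bio_queue:
            dev, payload = task.bio_queue.popleft()
            trace.append(("enter", dev))
            handlers[dev](task, payload, handlers, trace)
            trace.append(("exit", dev))
    finally:
        task.bio_queue = None


def dm_top(task, payload, handlers, trace):
    # A stacked device remaps the bio to the device underneath it.
    make_request(task, ("dm-bottom", payload), handlers, trace)


def dm_bottom(task, payload, handlers, trace):
    make_request(task, ("disk", payload), handlers, trace)


def disk(task, payload, handlers, trace):
    pass  # the lowest level would start the actual I/O here


def run_demo():
    handlers = {"dm-top": dm_top, "dm-bottom": dm_bottom, "disk": disk}
    trace = []
    make_request(Task(), ("dm-top", "read block 0"), handlers, trace)
    return trace
```

In the trace returned by run_demo(), each handler exits before the next one is entered, so a handler that takes a lock on entry and drops it on exit never runs nested inside another handler on the same task.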


