Bug 204311

Summary:	kernel - recursive locking in dm_request (md->io_lock)
Product:	[Fedora] Fedora	Reporter:	Prarit Bhargava <prarit>
Component:	kernel	Assignee:	Alasdair Kergon <agk>
Status:	CLOSED RAWHIDE	QA Contact:	Brian Brock <bbrock>
Severity:	high	Docs Contact:
Priority:	high
Version:	rawhide	CC:	agk, dex.mbox, dwysocha, esandeen, jbrassow, kennyr68, kevin, mbroz, michal, njsharp, wtogami
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2008-04-04 06:33:26 UTC	Type:	---
Bug Blocks:	202141

Description Prarit Bhargava 2006-08-28 14:33:18 UTC

Description of problem:

The latest nightly Fedora build (20060828) fails to install on an HP xw9400 because of a
recursive locking error in the device-mapper layer (md->io_lock).

Version-Release number of selected component (if applicable): Latest nightly
kernel 20060828


How reproducible: 100%


Steps to Reproduce:
1. Attempt to install via NFS.
  
Actual results:

After device detection the following error is displayed on the console:
=============================================
[ INFO: possible recursive locking detected ]
2.6.17-1.2586.fc6 #1
---------------------------------------------
lvm/557 is trying to acquire lock:
 (&md->io_lock){----}, at: [<f8b7542f>] dm_request+0x18/0xcd [dm_mod]

but task is already holding lock:
 (&md->io_lock){----}, at: [<f8b7542f>] dm_request+0x18/0xcd [dm_mod]

other info that might help us debug this:
1 lock held by lvm/557:
 #0:  (&md->io_lock){----}, at: [<f8b7542f>] dm_request+0x18/0xcd [dm_mod]

stack backtrace:
 [<c04037db>] show_trace_log_lvl+0x58/0x159
 [<c0403d9e>] show_trace+0xd/0x10
 [<c0403e3b>] dump_stack+0x19/0x1b
 [<c042bdef>] __lock_acquire+0x765/0x97c
 [<c042c577>] lock_acquire+0x4b/0x6c
 [<c042966c>] down_read+0x2d/0x3f
 [<f8b7542f>] dm_request+0x18/0xcd [dm_mod]
 [<c04c7b38>] generic_make_request+0x28e/0x29e
 [<f8b74437>] __map_bio+0xc0/0xee [dm_mod]
 [<f8b74cdd>] __split_bio+0x158/0x3a0 [dm_mod]
 [<f8b754d6>] dm_request+0xbf/0xcd [dm_mod]
 [<c04c7b38>] generic_make_request+0x28e/0x29e
 [<c04c983d>] submit_bio+0xa1/0xa9
 [<c047e84a>] dio_bio_submit+0x4f/0x61
 [<c047f643>] __blockdev_direct_IO+0x8fa/0xc52
 [<c0466437>] blkdev_direct_IO+0x30/0x35
 [<c0444891>] generic_file_direct_IO+0x88/0xe4
 [<c0444a98>] __generic_file_aio_read+0xb7/0x1ad
 [<c0445a6a>] generic_file_read+0x87/0x9b
 [<c045f57d>] vfs_read+0xa9/0x15b
 [<c045f909>] sys_read+0x3b/0x60
 [<c0402e57>] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb
Leftover inexact backtrace:
 [<c0403d9e>] show_trace+0xd/0x10
 [<c0403e3b>] dump_stack+0x19/0x1b
 [<c042bdef>] __lock_acquire+0x765/0x97c
 [<c042c577>] lock_acquire+0x4b/0x6c
 [<c042966c>] down_read+0x2d/0x3f
 [<f8b7542f>] dm_request+0x18/0xcd [dm_mod]
 [<c04c7b38>] generic_make_request+0x28e/0x29e
 [<f8b74437>] __map_bio+0xc0/0xee [dm_mod]
 [<f8b74cdd>] __split_bio+0x158/0x3a0 [dm_mod]
 [<f8b754d6>] dm_request+0xbf/0xcd [dm_mod]
 [<c04c7b38>] generic_make_request+0x28e/0x29e
 [<c04c983d>] submit_bio+0xa1/0xa9
 [<c047e84a>] dio_bio_submit+0x4f/0x61
 [<c047f643>] __blockdev_direct_IO+0x8fa/0xc52
 [<c0466437>] blkdev_direct_IO+0x30/0x35
 [<c0444891>] generic_file_direct_IO+0x88/0xe4
 [<c0444a98>] __generic_file_aio_read+0xb7/0x1ad
 [<c0445a6a>] generic_file_read+0x87/0x9b
 [<c045f57d>] vfs_read+0xa9/0x15b
 [<c045f909>] sys_read+0x3b/0x60
 [<c0402e57>] syscall_call+0x7/0xb
kjournald starting.  Commit interval 5 seconds
EXT3 FS on dm-1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev dm-1, type ext3), uses xattr
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
 sda: sda1 sda2

Expected results: No error should occur.

Additional info:  Adding Eric Sandeen for now -- he's the only FS guy I know ;)

Possible blocker for Fedora?

Comment 1 David Lawrence 2006-09-05 15:26:18 UTC

Reassigning to correct owner, kernel-maint.

Comment 3 Dave Jones 2006-10-04 23:28:24 UTC

*** Bug 208754 has been marked as a duplicate of this bug. ***

Comment 4 Michal Jaegermann 2006-10-05 00:18:16 UTC

Minor, but this bug still says "hardware: i386" while bug 208754
has a trace from x86_64.  Also, in that other trace there is
nothing after "Leftover inexact backtrace:". 1/2 :-)

Comment 5 Prarit Bhargava 2006-10-05 13:42:30 UTC

Seen on i386, x86_64, and ia64.

P.

Comment 7 Alasdair Kergon 2006-10-05 14:41:57 UTC

Known issue, first reported in July (during OLS), but nobody has written a patch for it yet.

Comment 9 Alasdair Kergon 2006-10-05 15:04:01 UTC

The locking prevents suspend requests from interfering with bio splitting.

The problem here is a potential deadlock on SMP because lock requests are serviced in
order: a write request queued between the two read acquisitions blocks the second one.
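
The ordering concern can be sketched with a toy, single-threaded model. This is not the kernel's rw_semaphore: the `toy_*` names and the two-counter struct are hypothetical, and the model assumes that both read acquisitions target the same lock instance and that a queued writer keeps new readers out. Under those assumptions the nested reader waits behind the writer, while the writer waits for the outer reader that cannot finish.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a fair reader/writer lock (hypothetical, not kernel code). */
struct toy_rwsem {
    int active_readers;  /* readers currently holding the lock */
    int queued_writers;  /* writers waiting for the readers to drain */
};

/* A reader is admitted only while no writer is queued ahead of it. */
static bool toy_try_down_read(struct toy_rwsem *s)
{
    if (s->queued_writers > 0)
        return false;    /* a real lock would sleep here */
    s->active_readers++;
    return true;
}

/* Returns true if the writer could take the lock at once; otherwise it queues. */
static bool toy_try_down_write(struct toy_rwsem *s)
{
    if (s->active_readers == 0 && s->queued_writers == 0)
        return true;
    s->queued_writers++;
    return false;
}

/*
 * Replays the interleaving: outer read, writer arrives, nested read.
 * Returns true when the writer and the nested reader both end up waiting.
 */
static bool toy_nested_read_would_deadlock(void)
{
    struct toy_rwsem io_lock = { 0, 0 };

    bool outer = toy_try_down_read(&io_lock);    /* outer dm_request */
    assert(outer);                               /* an idle lock admits the first reader */
    bool writer = toy_try_down_write(&io_lock);  /* e.g. a suspend on another CPU */
    bool inner = toy_try_down_read(&io_lock);    /* nested dm_request */

    return !writer && !inner;
}
```

Calling `toy_nested_read_would_deadlock()` returns true in this model, whereas two nested reads with no writer in between both succeed. That is why the warning only points at a potential problem rather than one that occurs on every nested call.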

Comment 10 Alasdair Kergon 2006-10-05 15:07:42 UTC

*** Bug 206105 has been marked as a duplicate of this bug. ***

Comment 11 Kenneth A. Redler 2007-04-13 11:40:51 UTC

device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel

=============================================
[ INFO: possible recursive locking detected ]
2.6.20-1.3059.fc7 #1
---------------------------------------------
init/1 is trying to acquire lock:
 (&md->io_lock){----}, at: [<d08f67b1>] dm_request+0x18/0xea [dm_mod]

but task is already holding lock:
 (&md->io_lock){----}, at: [<d08f67b1>] dm_request+0x18/0xea [dm_mod]

other info that might help us debug this:
1 lock held by init/1:
 #0:  (&md->io_lock){----}, at: [<d08f67b1>] dm_request+0x18/0xea [dm_mod]

stack backtrace:
 [<c04061e9>] show_trace_log_lvl+0x1a/0x2f
 [<c04067ad>] show_trace+0x12/0x14
 [<c0406831>] dump_stack+0x16/0x18
 [<c0442089>] __lock_acquire+0x11f/0xba4
 [<c0442f00>] lock_acquire+0x56/0x6f
 [<c043b7e0>] down_read+0x3f/0x51
 [<d08f67b1>] dm_request+0x18/0xea [dm_mod]
 [<c04e4d06>] generic_make_request+0x2d8/0x2eb
 [<d08f5516>] __map_bio+0xd5/0x128 [dm_mod]
 [<d08f5e8e>] __split_bio+0x16f/0x3d2 [dm_mod]
 [<d08f6875>] dm_request+0xdc/0xea [dm_mod]
 [<c04e4d06>] generic_make_request+0x2d8/0x2eb
 [<c04e6d2d>] submit_bio+0xd7/0xdf
 [<c049a163>] submit_bh+0xf0/0x10f
 [<c049c964>] block_read_full_page+0x2c9/0x2d9
 [<c049e3c9>] blkdev_readpage+0xf/0x11
 [<c04662ab>] __do_page_cache_readahead+0x16a/0x1b6
 [<c0466344>] blockable_page_cache_readahead+0x4d/0xa0
 [<c046655c>] page_cache_readahead+0x129/0x190
 [<c0460f3b>] do_generic_mapping_read+0x12b/0x420
 [<c0462cdf>] generic_file_aio_read+0x16a/0x197
 [<c047e28b>] do_sync_read+0xc2/0xff
 [<c047eb30>] vfs_read+0xad/0x161
 [<c047efbc>] sys_read+0x3d/0x61
 [<c0405078>] syscall_call+0x7/0xb
 =======================

Also seen with the Fedora 7 Test 3 kernel.

Comment 12 dex 2007-04-27 04:09:37 UTC

I've been getting these errors ever since the PATA port on my motherboard was enabled
many months ago (Promise 376 fake RAID), and now it's bugging me! Latest rawhide kernel:
2.6.20-1.3116.fc7.

Can these messages be silenced for the F7 release?

Comment 13 Milan Broz 2007-05-12 17:38:49 UTC

*** Bug 239925 has been marked as a duplicate of this bug. ***

Comment 14 Dave Jones 2007-05-15 18:10:59 UTC

*** Bug 238304 has been marked as a duplicate of this bug. ***

Comment 15 dex 2007-06-07 14:27:16 UTC

I made the jump to 2.6.21-1.3194.fc7 (to avoid the wi-fi stuff) and this bug has
now gone away for me (but #240982 turned up), so someone else can close this when
they're ready (or I will close it in about 2 weeks).

Comment 16 Milan Broz 2008-04-04 06:33:26 UTC

Because recent kernels serialize the generic_make_request() calls made by the
current process, dm_request is no longer called recursively (by the way, it was
only a warning, and a false positive).

Some situations can probably still trigger this warning if another thread calls
dm_request (perhaps invalidation of a full snapshot, etc.). But that would be a
different problem with a different stack trace.
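
The serialization described above can be illustrated with a small userspace model. This is a sketch of the general idea only, not the kernel implementation: `toy_task`, `toy_bio`, `toy_submit` and `toy_make_request` are hypothetical names, and a plain counter stands in for md->io_lock. A submission made while the same task is already inside the submit loop is queued and handled after the current request returns, so the lock is never held at depth two.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PENDING 8

/* Hypothetical stand-in for a bio: counts the stacked devices still below it. */
struct toy_bio {
    int level;
};

/* Hypothetical per-task state, loosely playing the role of a per-task bio list. */
struct toy_task {
    struct toy_bio pending[TOY_MAX_PENDING];
    size_t head, tail;
    int in_submit;        /* nonzero while the submit loop is running */
    int lock_depth;       /* how many times the "lock" is currently held */
    int max_lock_depth;   /* deepest nesting observed */
    int requests_handled;
};

static void toy_submit(struct toy_task *t, struct toy_bio bio);

/* Models a stacked driver: take the lock, resubmit to the lower device, release. */
static void toy_make_request(struct toy_task *t, struct toy_bio bio)
{
    t->lock_depth++;                      /* stands in for down_read() */
    if (t->lock_depth > t->max_lock_depth)
        t->max_lock_depth = t->lock_depth;
    t->requests_handled++;

    if (bio.level > 0) {
        struct toy_bio lower = { bio.level - 1 };
        toy_submit(t, lower);             /* queued, not recursed into */
    }
    t->lock_depth--;                      /* stands in for up_read() */
}

/* If the task is already submitting, defer the bio instead of recursing. */
static void toy_submit(struct toy_task *t, struct toy_bio bio)
{
    if (t->in_submit) {
        assert(t->tail < TOY_MAX_PENDING);
        t->pending[t->tail++] = bio;
        return;
    }

    t->in_submit = 1;
    toy_make_request(t, bio);
    while (t->head < t->tail)             /* drain bios queued meanwhile */
        toy_make_request(t, t->pending[t->head++]);
    t->head = t->tail = 0;
    t->in_submit = 0;
}
```

Submitting a bio with `level` 2 to a zero-initialized `toy_task` handles three requests in this model while `max_lock_depth` stays at 1. Removing the `in_submit` check would instead nest the calls, as in the traces above.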