Bug 209931

Summary: possible recursive locking detected using umount with xfs
Product: [Fedora] Fedora
Reporter: Martin Ebourne <fedora>
Component: kernel
Assignee: Eric Sandeen <esandeen>
Status: CLOSED RAWHIDE
QA Contact: Brian Brock <bbrock>
Severity: medium
Docs Contact:
Priority: medium
Version: 5
CC: davej, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: NeedsRetesting
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-02-13 03:17:00 UTC
Type: ---

Description Martin Ebourne 2006-10-08 13:46:20 UTC
Description of problem:
Lockdep reported a possible recursive locking (deadlock) warning from umount on
XFS while running the Fedora testing kernel. (I'm also filing a separate bug for
another report that looks unrelated.)

Version-Release number of selected component (if applicable):
kernel-2.6.18-1.2189.fc5

How reproducible:
Seen only once in 24 hours. There is a lot of rebooting, mounting, and
unmounting going on because I'm trying to recover from two failed disks.

Steps to Reproduce:
1. Reboot and mount/umount a lot?
  
Actual results:

[With context in case it is useful]

init: Trying to re-exec init
kernel: SGI XFS with ACLs, security attributes, large block/inode numbers, no
debug enabled
kernel: SGI XFS Quota Management subsystem
kernel: Filesystem "dm-4": Disabling barriers, not supported by the underlying
device
kernel: XFS mounting filesystem dm-4
kernel: XFS: recovery required required on read-only device.
kernel: XFS: write access unavailable, cannot proceed.
kernel: XFS: log mount/recovery failed: error 30
kernel: XFS: log mount failed
kernel: Filesystem "dm-4": Disabling barriers, not supported by the underlying
device
kernel: XFS mounting filesystem dm-4
kernel: XFS: recovery required required on read-only device.
kernel: XFS: write access unavailable, cannot proceed.
kernel: XFS: log mount/recovery failed: error 30
kernel: XFS: log mount failed
kernel: Filesystem "dm-4": Disabling barriers, not supported by the underlying
device
kernel: XFS mounting filesystem dm-4
kernel: Starting XFS recovery on filesystem: dm-4 (logdev: internal)
kernel: Ending XFS recovery on filesystem: dm-4 (logdev: internal)
kernel: SELinux: initialized (dev dm-4, type xfs), uses xattr
kernel: Filesystem "dm-5": Disabling barriers, not supported by the underlying
device
kernel: XFS mounting filesystem dm-5
kernel: Starting XFS recovery on filesystem: dm-5 (logdev: internal)
kernel: Ending XFS recovery on filesystem: dm-5 (logdev: internal)
kernel: SELinux: initialized (dev dm-5, type xfs), uses xattr
kernel: 
kernel: =============================================
kernel: [ INFO: possible recursive locking detected ]
kernel: 2.6.18-1.2189.fc5 #1
kernel: ---------------------------------------------
kernel: umount/3873 is trying to acquire lock:
kernel:  (&(&ip->i_lock)->mr_lock){----}, at: [<ffffffff8887db38>]
xfs_ilock+0x58/0x7b [xfs]
kernel: 
kernel: but task is already holding lock:
kernel:  (&(&ip->i_lock)->mr_lock){----}, at: [<ffffffff8887db38>]
xfs_ilock+0x58/0x7b [xfs]
kernel: 
kernel: other info that might help us debug this:
kernel: 5 locks held by umount/3873:
kernel:  #0:  (&type->s_umount_key#24){----}, at: [<ffffffff802e4221>]
deactivate_super+0x67/0x84
kernel:  #1:  (&type->s_lock_key#13){--..}, at: [<ffffffff8026652b>]
mutex_lock+0x2a/0x2e
kernel:  #2:  (iprune_mutex){--..}, at: [<ffffffff8026652b>] mutex_lock+0x2a/0x2e
kernel:  #3:  (&(&ip->i_iolock)->mr_lock){--..}, at: [<ffffffff8887db08>]
xfs_ilock+0x28/0x7b [xfs]
kernel:  #4:  (&(&ip->i_lock)->mr_lock){----}, at: [<ffffffff8887db38>]
xfs_ilock+0x58/0x7b [xfs]
kernel: 
kernel: stack backtrace:
kernel: 
kernel: Call Trace:
kernel:  [<ffffffff8026e990>] show_trace+0xae/0x32b
kernel:  [<ffffffff8026ec22>] dump_stack+0x15/0x17
kernel:  [<ffffffff802a8601>] __lock_acquire+0x135/0xa59
kernel:  [<ffffffff802a94c6>] lock_acquire+0x4b/0x69
kernel:  [<ffffffff802a608e>] down_write+0x3b/0x47
kernel:  [<ffffffff8887db38>] :xfs:xfs_ilock+0x58/0x7b
kernel:  [<ffffffff88897bbd>] :xfs:xfs_reclaim+0x5e/0xe4
kernel:  [<ffffffff888a58e9>] :xfs:xfs_fs_clear_inode+0xd9/0xfd
kernel:  [<ffffffff802244c4>] clear_inode+0xfc/0x14f
kernel:  [<ffffffff80237a13>] dispose_list+0x43/0xe9
kernel:  [<ffffffff802ebf60>] invalidate_inodes+0xd3/0xf1
kernel:  [<ffffffff802e4066>] generic_shutdown_super+0x5b/0x10d
kernel:  [<ffffffff802e413e>] kill_block_super+0x26/0x3b
kernel:  [<ffffffff802e4229>] deactivate_super+0x6f/0x84
kernel:  [<ffffffff8022f025>] mntput_no_expire+0x56/0x9a
kernel:  [<ffffffff802346fb>] path_release_on_umount+0x1d/0x21
kernel:  [<ffffffff802ed86d>] sys_umount+0x24e/0x294
kernel:  [<ffffffff8026081a>] tracesys+0xd1/0xdb
kernel: DWARF2 unwinder stuck at tracesys+0xd1/0xdb
kernel: Leftover inexact backtrace:
kernel: 
kernel: Filesystem "dm-4": Disabling barriers, not supported by the underlying
device
kernel: XFS mounting filesystem dm-4
kernel: SELinux: initialized (dev dm-4, type xfs), uses xattr
kernel: Filesystem "dm-6": Disabling barriers, not supported by the underlying
device
kernel: XFS mounting filesystem dm-6
kernel: SELinux: initialized (dev dm-6, type xfs), uses xattr
kernel: kjournald starting.  Commit interval 5 seconds

Expected results:

No errors.

Additional info:

This filesystem is on LVM.
Fully updated FC5 with testing kernel as above.

Comment 1 Dave Jones 2006-10-16 19:33:37 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 2 Martin Ebourne 2006-10-31 00:34:17 UTC
Of course I've not seen this again, but does this really count as a fix?

-CONFIG_DEBUG_MUTEXES=y
-CONFIG_DEBUG_RWSEMS=y
-CONFIG_DEBUG_LOCK_ALLOC=y
-CONFIG_PROVE_LOCKING=y
+# CONFIG_DEBUG_MUTEXES is not set
+# CONFIG_DEBUG_RWSEMS is not set
+# CONFIG_DEBUG_LOCK_ALLOC is not set
+# CONFIG_PROVE_LOCKING is not set

And was the actual bug itself really fixed from 2.6.18-1.2189.fc5 to
2.6.18-1.2200.fc5? Seems unlikely (although I'm told the ext3 bug was, which is
nice).

It might be reasonable to put bugs in NEEDINFO when a new major version is out,
but not when the kernel has been rebuilt with the debugging features switched
off that allowed the problem to be detected in the first place.
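
For anyone retesting, a quick way to tell whether a given kernel build can produce this kind of report at all is to look at the four options from the diff above in that kernel's config file. The following is only a minimal sketch: the function name and the sample text are made up for illustration, and it only parses the config text it is handed.

```python
import re

# The four options that differ between the two builds in the diff above.
LOCK_DEBUG_OPTIONS = (
    "CONFIG_DEBUG_MUTEXES",
    "CONFIG_DEBUG_RWSEMS",
    "CONFIG_DEBUG_LOCK_ALLOC",
    "CONFIG_PROVE_LOCKING",
)


def lock_debug_state(config_text):
    """Return {option: True | False | None} for the options above.

    True  -> "OPTION=y"
    False -> "# OPTION is not set" (or any value other than "y")
    None  -> option not mentioned in the text at all
    """
    state = {name: None for name in LOCK_DEBUG_OPTIONS}
    for raw in config_text.splitlines():
        line = raw.strip()
        enabled = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if enabled and enabled.group(1) in state:
            state[enabled.group(1)] = enabled.group(2) == "y"
            continue
        disabled = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if disabled and disabled.group(1) in state:
            state[disabled.group(1)] = False
    return state


# Hypothetical sample text in kernel config syntax, for illustration only.
sample = """\
CONFIG_DEBUG_MUTEXES=y
# CONFIG_DEBUG_RWSEMS is not set
CONFIG_PROVE_LOCKING=y
"""

for option, value in lock_debug_state(sample).items():
    print(option, value)
```

On Fedora the config of an installed kernel normally sits in /boot/config-<kernel release>, so the text of that file could be passed in instead of the sample; treat that path as an assumption to check on the system in question.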


Comment 3 Dave Jones 2006-11-24 21:02:52 UTC
I'm going to look into offering separate -debug kernels soon in addition to the
regular ones, which will allow retesting of problems like this.

As to whether this got fixed: probably not in the Fedora kernel, but it may have
subsequently been fixed upstream. Eric might know.

Comment 4 Eric Sandeen 2007-05-31 16:24:11 UTC
Yep, the XFS guys have a fix for most of these:

http://oss.sgi.com/archives/xfs/2007-04/msg00177.html

I don't see it in Linus' tree yet, though; it will get there eventually.

Comment 5 Eric Sandeen 2008-02-13 03:17:00 UTC
As of rawhide at least, some of these are fixed. There are still XFS lockdep
reports out there which have more to do with lockdep not grokking XFS than with
any real problems. The SGI guys continue to find ways to annotate around it, if
for no other reason than to stop the flow of reports :)
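
To illustrate what "annotate around it" can mean: lockdep validates lock classes rather than individual lock instances, so taking the same class of lock on two different objects looks recursive unless the second acquisition is tagged with a different subclass (the kernel's nested variants such as down_write_nested() take a subclass argument for this). The toy model below is a user-space sketch, not kernel code. The class and function names are invented, and it assumes the two i_lock acquisitions in the report were on different inodes, which the trace itself does not show.

```python
class RecursiveLockingReport(Exception):
    """Raised where lockdep would print 'possible recursive locking detected'."""


class ToyLockdep:
    """Toy per-task validator keyed on (lock class, subclass), not on lock instance."""

    def __init__(self):
        self.held = []  # (lock_class, subclass) keys currently held by this task

    def acquire(self, lock_class, subclass=0):
        key = (lock_class, subclass)
        if key in self.held:
            raise RecursiveLockingReport(
                f"already holding {lock_class} (subclass {subclass})"
            )
        self.held.append(key)

    def release(self, lock_class, subclass=0):
        self.held.remove((lock_class, subclass))


def take_two_inode_locks(second_subclass):
    """Lock two different inodes whose i_lock shares one lock class."""
    dep = ToyLockdep()
    dep.acquire("xfs_inode.i_lock")  # first inode, default subclass 0
    try:
        dep.acquire("xfs_inode.i_lock", second_subclass)  # second inode
    except RecursiveLockingReport:
        return "report"
    return "quiet"


print(take_two_inode_locks(0))  # same class, same subclass
print(take_two_inode_locks(1))  # same class, annotated with subclass 1
```

The first call is flagged even though no single lock is taken twice; the second is accepted because the annotation tells the validator that the nesting is intentional. Whether a given nesting is actually safe is still up to the filesystem's own lock ordering rules, which this sketch does not model.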