Bug 483610 - possible recursive locking detected while starting cluster daemons
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: cman
Version: rawhide
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Fabio Massimo Di Nitto
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-02-02 16:16 UTC by Nate Straz
Modified: 2009-02-25 05:03 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-25 05:03:36 UTC
Type: ---
Embargoed:


Description Nate Straz 2009-02-02 16:16:43 UTC
Description of problem:

While starting cluster services on all nodes in the cluster, one node produced the following warning message.

[root@dash-01 ~]# service cman start
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Setting network parameters... done
   Starting cman... done
   Starting daemons...
=============================================
[ INFO: possible recursive locking detected ]
2.6.29-0.53.rc2.git1.fc11.x86_64 #1
---------------------------------------------
dlm_controld/5751 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#12/2){--..}, at: [<ffffffffa01c6c48>] configfs_attach_group+0x4a/0x183 [configfs]

but task is already holding lock:
 (&sb->s_type->i_mutex_key#12/2){--..}, at: [<ffffffffa01c6c48>] configfs_attach_group+0x4a/0x183 [configfs]

other info that might help us debug this:
2 locks held by dlm_controld/5751:
 #0:  (&sb->s_type->i_mutex_key#11/1){--..}, at: [<ffffffff810e794b>] lookup_create+0x26/0x94
 #1:  (&sb->s_type->i_mutex_key#12/2){--..}, at: [<ffffffffa01c6c48>] configfs_attach_group+0x4a/0x183 [configfs]

stack backtrace:
Pid: 5751, comm: dlm_controld Not tainted 2.6.29-0.53.rc2.git1.fc11.x86_64 #1
Call Trace:
 [<ffffffff8106e715>] __lock_acquire+0x863/0xc41
 [<ffffffff8106eb80>] lock_acquire+0x8d/0xba
 [<ffffffffa01c6c48>] ? configfs_attach_group+0x4a/0x183 [configfs]
 [<ffffffff813818aa>] __mutex_lock_common+0x107/0x39c
 [<ffffffffa01c6c48>] ? configfs_attach_group+0x4a/0x183 [configfs]
 [<ffffffff8138308b>] ? _spin_unlock+0x26/0x2a
 [<ffffffffa01c6c48>] ? configfs_attach_group+0x4a/0x183 [configfs]
 [<ffffffff81381be8>] mutex_lock_nested+0x35/0x3a
 [<ffffffffa01c6c48>] configfs_attach_group+0x4a/0x183 [configfs]
 [<ffffffff8138308b>] ? _spin_unlock+0x26/0x2a
 [<ffffffffa01c6cf8>] configfs_attach_group+0xfa/0x183 [configfs]
 [<ffffffffa01c6fbc>] configfs_mkdir+0x23b/0x326 [configfs]
 [<ffffffff810e7c1e>] vfs_mkdir+0x6c/0xbb
 [<ffffffff810e9922>] sys_mkdirat+0xa2/0xf5
 [<ffffffff8101130a>] ? sysret_check+0x46/0x81
 [<ffffffff8106d719>] ? trace_hardirqs_on_caller+0x12f/0x153
 [<ffffffff810e9988>] sys_mkdir+0x13/0x15
 [<ffffffff810112ba>] system_call_fastpath+0x16/0x1b
done
   Starting fencing... done
[  OK  ]


Version-Release number of selected component (if applicable):
kernel-2.6.29-0.53.rc2.git1.fc11.x86_64
cman-3.0.0-4.alpha3.fc11.x86_64


How reproducible:
Unknown

Steps to Reproduce:
1. service cman start
  
Actual results:
See above

Expected results:
[root@dash-01 ~]# service cman start
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Setting network parameters... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... done
[  OK  ]

Additional info:
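
For triage, the key fact in the report above is that the lock being acquired and the lock already held have the same lockdep class string (here `&sb->s_type->i_mutex_key#12/2`, both taken in `configfs_attach_group`). Lockdep validates by lock class rather than by individual lock instance, so a match like this is what produces the "possible recursive locking" message; it does not by itself prove a real deadlock. The following is a minimal sketch, not part of the original report, that checks a captured log for that condition. The function names `parse_lock` and `same_class_reacquired` are hypothetical helpers, and the regular expression assumes the 2.6.29-era line format shown above.

```python
import re

# Assumed format of a lockdep lock line, taken from the report above:
#  (<class>){<state>}, at: [<address>] <symbol+offset/size> [module]
LOCK_LINE = re.compile(
    r"\((?P<cls>[^)]+)\)\{[^}]*\}, at: \[<[0-9a-f]+>\] (?P<site>\S+)"
)

def parse_lock(line):
    """Return (lock_class, acquisition_site) for a lock line, or None."""
    m = LOCK_LINE.search(line)
    return (m.group("cls"), m.group("site")) if m else None

def same_class_reacquired(report):
    """True if the lock being acquired has the same class as the held lock."""
    lines = report.splitlines()
    wanted = held = None
    for i, line in enumerate(lines[:-1]):
        if "is trying to acquire lock:" in line:
            wanted = parse_lock(lines[i + 1])
        elif "but task is already holding lock:" in line:
            held = parse_lock(lines[i + 1])
    return wanted is not None and held is not None and wanted[0] == held[0]

# Excerpt copied from the warning in this report.
SAMPLE = """\
dlm_controld/5751 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#12/2){--..}, at: [<ffffffffa01c6c48>] configfs_attach_group+0x4a/0x183 [configfs]

but task is already holding lock:
 (&sb->s_type->i_mutex_key#12/2){--..}, at: [<ffffffffa01c6c48>] configfs_attach_group+0x4a/0x183 [configfs]
"""

if __name__ == "__main__":
    print(same_class_reacquired(SAMPLE))
```

Run against the excerpt, the check reports a match, consistent with the trace showing `configfs_attach_group` calling itself while already holding a mutex of the same class.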

Comment 1 Steve Whitehouse 2009-02-02 16:45:52 UTC
We already know about this one. It should really be filed against configfs since that is where the problem lies. There is a long correspondence on lkml about it, and it might even get fixed shortly.

Comment 2 Steve Whitehouse 2009-02-04 11:08:00 UTC
Looks like the "fix" for this has hit upstream now.

Comment 3 Chris Feist 2009-02-24 22:56:45 UTC
Reassigning to fabio (he does the fc11 builds).

Comment 4 Fabio Massimo Di Nitto 2009-02-25 05:03:36 UTC
This is a kernel bug and it has been fixed afaict, or at least I can't reproduce it any longer on my test machines.

