Bug 483610

Summary: possible recursive locking detected while starting cluster daemons
Product: [Fedora] Fedora
Reporter: Nate Straz <nstraz>
Component: cman
Assignee: Fabio Massimo Di Nitto <fdinitto>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: low
Version: rawhide
CC: agk, ccaulfie, cfeist, fdinitto, mbroz, swhiteho
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-02-25 05:03:36 UTC
Type: ---

Description Nate Straz 2009-02-02 16:16:43 UTC
Description of problem:

While starting cluster services on all nodes in the cluster, one node produced the following warning message.

[root@dash-01 ~]# service cman start
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Setting network parameters... done
   Starting cman... done
   Starting daemons...
=============================================
[ INFO: possible recursive locking detected ]
2.6.29-0.53.rc2.git1.fc11.x86_64 #1
---------------------------------------------
dlm_controld/5751 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#12/2){--..}, at: [<ffffffffa01c6c48>] configfs_attach_group+0x4a/0x183 [configfs]

but task is already holding lock:
 (&sb->s_type->i_mutex_key#12/2){--..}, at: [<ffffffffa01c6c48>] configfs_attach_group+0x4a/0x183 [configfs]

other info that might help us debug this:
2 locks held by dlm_controld/5751:
 #0:  (&sb->s_type->i_mutex_key#11/1){--..}, at: [<ffffffff810e794b>] lookup_create+0x26/0x94
 #1:  (&sb->s_type->i_mutex_key#12/2){--..}, at: [<ffffffffa01c6c48>] configfs_attach_group+0x4a/0x183 [configfs]

stack backtrace:
Pid: 5751, comm: dlm_controld Not tainted 2.6.29-0.53.rc2.git1.fc11.x86_64 #1
Call Trace:
 [<ffffffff8106e715>] __lock_acquire+0x863/0xc41
 [<ffffffff8106eb80>] lock_acquire+0x8d/0xba
 [<ffffffffa01c6c48>] ? configfs_attach_group+0x4a/0x183 [configfs]
 [<ffffffff813818aa>] __mutex_lock_common+0x107/0x39c
 [<ffffffffa01c6c48>] ? configfs_attach_group+0x4a/0x183 [configfs]
 [<ffffffff8138308b>] ? _spin_unlock+0x26/0x2a
 [<ffffffffa01c6c48>] ? configfs_attach_group+0x4a/0x183 [configfs]
 [<ffffffff81381be8>] mutex_lock_nested+0x35/0x3a
 [<ffffffffa01c6c48>] configfs_attach_group+0x4a/0x183 [configfs]
 [<ffffffff8138308b>] ? _spin_unlock+0x26/0x2a
 [<ffffffffa01c6cf8>] configfs_attach_group+0xfa/0x183 [configfs]
 [<ffffffffa01c6fbc>] configfs_mkdir+0x23b/0x326 [configfs]
 [<ffffffff810e7c1e>] vfs_mkdir+0x6c/0xbb
 [<ffffffff810e9922>] sys_mkdirat+0xa2/0xf5
 [<ffffffff8101130a>] ? sysret_check+0x46/0x81
 [<ffffffff8106d719>] ? trace_hardirqs_on_caller+0x12f/0x153
 [<ffffffff810e9988>] sys_mkdir+0x13/0x15
 [<ffffffff810112ba>] system_call_fastpath+0x16/0x1b
done
   Starting fencing... done
[  OK  ]


Version-Release number of selected component (if applicable):
kernel-2.6.29-0.53.rc2.git1.fc11.x86_64
cman-3.0.0-4.alpha3.fc11.x86_64


How reproducible:
Unknown

Steps to Reproduce:
1. service cman start
  
Actual results:
See above

Expected results:
[root@dash-01 ~]# service cman start
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Setting network parameters... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... done
[  OK  ]

Additional info:

Comment 1 Steve Whitehouse 2009-02-02 16:45:52 UTC
We already know about this one. It should really be filed against configfs since that is where the problem lies. There is a long correspondence on lkml about it, and it might even get fixed shortly.

Comment 2 Steve Whitehouse 2009-02-04 11:08:00 UTC
Looks like the "fix" for this has hit upstream now.

Comment 3 Chris Feist 2009-02-24 22:56:45 UTC
Reassigning to Fabio (he does the fc11 builds).

Comment 4 Fabio Massimo Di Nitto 2009-02-25 05:03:36 UTC
This is a kernel bug and it has been fixed, afaict, or at least I can't reproduce it any longer on my test machines.
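
Editor's note: as a rough illustration of why lockdep printed the warning in the description, here is a toy model of its same-class check. It is a sketch, not kernel code. It assumes lockdep treats each (class key, subclass) pair as one lock class and flags an acquisition when the task already holds a lock of that class. The names ToyLockdep and replay_trace are hypothetical; the keys and subclasses are copied from the "locks held" lines of the trace.

```python
from typing import List, Tuple

# A lock class in this toy model: (class key name, subclass),
# e.g. ("i_mutex_key#12", 2) for "i_mutex_key#12/2" in the trace.
LockClass = Tuple[str, int]


class ToyLockdep:
    """Hypothetical, simplified stand-in for lockdep's same-class check."""

    def __init__(self) -> None:
        self.held: List[LockClass] = []

    def acquire(self, key: str, subclass: int = 0) -> bool:
        """Record an acquisition; return True if it looks recursive,
        i.e. a lock of the same class and subclass is already held."""
        candidate = (key, subclass)
        recursive = candidate in self.held
        self.held.append(candidate)
        return recursive

    def release(self, key: str, subclass: int = 0) -> None:
        """Drop one held lock of the given class."""
        self.held.remove((key, subclass))


def replay_trace() -> List[bool]:
    """Replay the three acquisitions visible in the reported trace."""
    dep = ToyLockdep()
    return [
        dep.acquire("i_mutex_key#11", 1),  # held lock #0, taken in lookup_create
        dep.acquire("i_mutex_key#12", 2),  # held lock #1, taken in configfs_attach_group
        dep.acquire("i_mutex_key#12", 2),  # nested configfs_attach_group, same class again
    ]


if __name__ == "__main__":
    print(replay_trace())
```

Only the third acquisition is flagged in this model, which matches the report: the nested configfs_attach_group call takes a mutex with the same key and subclass as one already held. A warning of this kind says the locking pattern looks recursive to lockdep; by itself it does not show that a deadlock occurred.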