Bug 235675

Summary: LSPP: INFO: possible recursive locking detected
Product: Red Hat Enterprise Linux 5 Reporter: Linda Knippers <linda.knippers>
Component: kernel Assignee: Eric Sandeen <esandeen>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.0 CC: eparis, esandeen, iboverma, krisw, poelstra, sgrubb
Target Milestone: --- Keywords: OtherQA
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2007-0959 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-11-07 19:46:26 UTC Type: ---
Bug Blocks: 224041    

Description Linda Knippers 2007-04-09 15:07:26 UTC
Description of problem:
I had several systems configured for lspp running the lspp .72 kernel
sitting idle over the weekend.  Two of them had 
"INFO: possible recursive locking detected" 
messages on the console at about 4 AM on Saturday.

Version-Release number of selected component (if applicable):
2.6.18-8.1.1.lspp.72.el5 kernel with selinux .50 policy and
the other packages from the lspp repo.

How reproducible:
I'm not sure it is reproducible.

Steps to Reproduce:
1.
2.
3.
  
Actual results:

These are the messages from each system's messages file.

This system is an x86_64 system that was freshly installed on
Friday using the latest ks.

Apr  7 04:04:41 cert-e3 kernel:
Apr  7 04:04:41 cert-e3 kernel: =============================================
Apr  7 04:04:41 cert-e3 kernel: [ INFO: possible recursive locking detected ]
Apr  7 04:04:41 cert-e3 kernel: 2.6.18-8.1.1.lspp.72.el5 #1
Apr  7 04:04:41 cert-e3 init: Trying to re-exec init
Apr  7 04:04:41 cert-e3 kernel: ---------------------------------------------
Apr  7 04:04:41 cert-e3 kernel: do_mq_unlink/15758 is trying to acquire lock:
Apr  7 04:04:41 cert-e3 kernel:  (&inode->i_mutex){--..}, at:
[<ffffffff80067ad2>] mutex_lock+0x2a/0x2e
Apr  7 04:04:41 cert-e3 kernel:
Apr  7 04:04:41 cert-e3 kernel: but task is already holding lock:
Apr  7 04:04:41 cert-e3 kernel:  (&inode->i_mutex){--..}, at:
[<ffffffff80067ad2>] mutex_lock+0x2a/0x2e
Apr  7 04:04:41 cert-e3 kernel:
Apr  7 04:04:41 cert-e3 kernel: other info that might help us debug this:
Apr  7 04:04:41 cert-e3 kernel: 1 lock held by do_mq_unlink/15758:
Apr  7 04:04:41 cert-e3 kernel:  #0:  (&inode->i_mutex){--..}, at:
[<ffffffff80067ad2>] mutex_lock+0x2a/0x2e
Apr  7 04:04:41 cert-e3 kernel:
Apr  7 04:04:41 cert-e3 kernel: stack backtrace:
Apr  7 04:04:41 cert-e3 kernel:
Apr  7 04:04:41 cert-e3 kernel: Call Trace:
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff800a9dea>] __lock_acquire+0x135/0x9d9
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff800aac31>] lock_acquire+0x4b/0x69
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff80067ad2>] mutex_lock+0x2a/0x2e
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff8006792c>] __mutex_lock_slowpath+0xe5/0x261
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff80067ad2>] mutex_lock+0x2a/0x2e
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff8004ca97>] vfs_unlink+0x8c/0x114
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff801258ea>] sys_mq_unlink+0xb9/0x103
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff80060dda>] tracesys+0xd1/0xdb
Apr  7 04:04:41 cert-e3 kernel:


This system is an i386 box freshly installed on Thursday.

Apr  7 04:05:09 kipper kernel: security:  5 users, 12 roles, 1794 types, 93
bools, 16 sens, 1024 cats
Apr  7 04:05:09 kipper kernel: security:  59 classes, 169756 rules
Apr  7 04:05:09 kipper kernel: security:  5 users, 12 roles, 1794 types, 93
bools, 16 sens, 1024 cats
Apr  7 04:05:09 kipper kernel: security:  59 classes, 169756 rules
Apr  7 04:05:09 kipper kernel: security:  5 users, 12 roles, 1794 types, 93
bools, 16 sens, 1024 cats
Apr  7 04:05:09 kipper kernel: security:  59 classes, 169756 rules
Apr  7 04:05:09 kipper kernel: security:  5 users, 12 roles, 1794 types, 93
bools, 16 sens, 1024 cats
Apr  7 04:05:09 kipper kernel: security:  59 classes, 169756 rules
Apr  7 04:05:09 kipper kernel:
Apr  7 04:05:09 kipper kernel: =============================================
Apr  7 04:05:09 kipper kernel: [ INFO: possible recursive locking detected ]
Apr  7 04:05:09 kipper kernel: 2.6.18-8.1.1.lspp.72.el5 #1
Apr  7 04:05:09 kipper kernel: ---------------------------------------------
Apr  7 04:05:09 kipper kernel: do_mq_unlink/14933 is trying to acquire lock:
Apr  7 04:05:09 kipper kernel:  (&inode->i_mutex){--..}, at: [<c0612b7d>]
mutex_lock+0x21/0x24
Apr  7 04:05:09 kipper kernel:
Apr  7 04:05:09 kipper init: Trying to re-exec init
Apr  7 04:05:09 kipper kernel: but task is already holding lock:
Apr  7 04:05:09 kipper kernel:  (&inode->i_mutex){--..}, at: [<c0612b7d>]
mutex_lock+0x21/0x24
Apr  7 04:05:10 kipper kernel:
Apr  7 04:05:10 kipper kernel: other info that might help us debug this:
Apr  7 04:05:10 kipper kernel: 1 lock held by do_mq_unlink/14933:
Apr  7 04:05:10 kipper kernel:  #0:  (&inode->i_mutex){--..}, at: [<c0612b7d>]
mutex_lock+0x21/0x24
Apr  7 04:05:10 kipper kernel:
Apr  7 04:05:10 kipper kernel: stack backtrace:
Apr  7 04:05:10 kipper kernel:  [<c04051ff>] show_trace_log_lvl+0x12/0x25
Apr  7 04:05:10 kipper kernel:  [<c040570d>] show_trace+0xd/0x10
Apr  7 04:05:10 kipper kernel:  [<c0405826>] dump_stack+0x19/0x1b
Apr  7 04:05:10 kipper kernel:  [<c043bc4d>] __lock_acquire+0x6ea/0x90d
Apr  7 04:05:10 kipper kernel:  [<c043c3e1>] lock_acquire+0x4b/0x6a
Apr  7 04:05:10 kipper kernel:  [<c0612a0e>] __mutex_lock_slowpath+0xbc/0x20a
Apr  7 04:05:10 kipper kernel:  [<c0612b7d>] mutex_lock+0x21/0x24
Apr  7 04:05:10 kipper kernel:  [<c047f96c>] vfs_unlink+0x73/0xe3
Apr  7 04:05:10 kipper kernel:  [<c04bdb72>] sys_mq_unlink+0x7b/0xd8
Apr  7 04:05:10 kipper kernel:  [<c0403fd7>] syscall_call+0x7/0xb
Apr  7 04:05:10 kipper kernel:  =======================

I have a third system which is an ia64 box that hasn't shown
this.  The ia64 box was installed a long time ago but is running
the same kernel.

Expected results:
No messages.

Additional info:

Since both systems showed the warning at about the same time I looked
to see what is running out of cron about then.  I'm guessing it's
related to the prelink cron job running since it will signal init,
and both systems had a "Trying to re-exec init" message.  Another
clue is that the ia64 box, which didn't show the messages, isn't
configured for prelink.

On the two systems that did, I've seen the "Trying to re-exec init"
messages at about the same time on other days without the recursive
lock warning.

Comment 1 George C. Wilson 2007-04-09 20:50:46 UTC
eparis:  I will have somebody look at this.

Comment 2 Eric Sandeen 2007-04-09 21:51:09 UTC
Odds are this locking annotation will fix it:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7a434814c7a6500b08bf4419ba8712b152d08d08

--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -731,7 +731,8 @@ asmlinkage long sys_mq_unlink(const char __user *u_name)
 	if (IS_ERR(name))
 		return PTR_ERR(name);
 
-	mutex_lock(&mqueue_mnt->mnt_root->d_inode->i_mutex);
+	mutex_lock_nested(&mqueue_mnt->mnt_root->d_inode->i_mutex,
+			I_MUTEX_PARENT);
 	dentry = lookup_one_len(name, mqueue_mnt->mnt_root, strlen(name));
 	if (IS_ERR(dentry)) {
 		err = PTR_ERR(dentry);
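
To illustrate why an annotation alone can silence this warning, here is a
minimal userspace sketch of a class-based lock check. It is a toy model, not
kernel code: the names toy_lock_acquire, toy_mq_unlink, struct task and the
SUB_* constants are hypothetical, and the subclass values are assumed to
mirror I_MUTEX_NORMAL and I_MUTEX_PARENT. The assumption behind it is that
the checker identifies a lock by its class plus subclass rather than by the
individual lock instance, so two different inodes' i_mutex look identical
unless one acquisition is annotated with a different subclass.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of a class-based lock checker; NOT kernel code.
 * All identifiers below are hypothetical and exist only for illustration. */

/* Assumed to mirror I_MUTEX_NORMAL / I_MUTEX_PARENT. */
enum subclass { SUB_NORMAL = 0, SUB_PARENT = 1 };

struct held_lock {
    const char *lock_class; /* e.g. "&inode->i_mutex" */
    int subclass;
};

#define MAX_HELD 8

struct task {
    struct held_lock held[MAX_HELD];
    size_t depth;
};

/* Record an acquisition. Returns 1 if the task already holds a lock with
 * the same class and subclass (the "possible recursive locking" case),
 * otherwise 0. */
static int toy_lock_acquire(struct task *t, const char *lock_class,
                            int subclass)
{
    int warn = 0;

    for (size_t i = 0; i < t->depth; i++) {
        if (strcmp(t->held[i].lock_class, lock_class) == 0 &&
            t->held[i].subclass == subclass)
            warn = 1;
    }
    if (t->depth < MAX_HELD) {
        t->held[t->depth].lock_class = lock_class;
        t->held[t->depth].subclass = subclass;
        t->depth++;
    }
    return warn;
}

/* Model the path in the backtrace: sys_mq_unlink locks the mqueue root
 * directory inode, then vfs_unlink locks the inode being removed.
 * parent_subclass is the subclass used for the first acquisition. */
static int toy_mq_unlink(int parent_subclass)
{
    struct task t = { .depth = 0 };
    int warned = 0;

    warned |= toy_lock_acquire(&t, "&inode->i_mutex", parent_subclass);
    warned |= toy_lock_acquire(&t, "&inode->i_mutex", SUB_NORMAL);
    return warned;
}
```

In this model, toy_mq_unlink(SUB_NORMAL) corresponds to the unpatched
mutex_lock() call and reports a warning, while toy_mq_unlink(SUB_PARENT)
corresponds to the mutex_lock_nested(..., I_MUTEX_PARENT) call in the patch
and does not. The locking behavior itself is unchanged in both cases; only
the annotation differs.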

Comment 3 Eric Sandeen 2007-04-09 21:52:16 UTC
See also
http://bugzilla.kernel.org/show_bug.cgi?id=8130

Comment 4 George C. Wilson 2007-04-12 00:03:43 UTC
Linda, is there any way you can verify that this is fixed in the updated package?

Comment 5 Linda Knippers 2007-04-12 00:24:52 UTC
Did this get pulled into the lspp kernel?  I see references to the upstream
bug fix but I wasn't sure it was built into one of our kernels.

This problem wasn't reproducible so I'm not sure I can verify it but I'll
run the latest kernel and see what happens.  The upstream bug report looks
to be a good match.

Comment 6 Steve Grubb 2007-04-12 01:10:46 UTC
Yes, this was put into lspp.73.

Comment 8 Steve Grubb 2007-04-16 20:24:45 UTC
Removing lspp tracker. Seems fixed. Thanks.

Comment 9 RHEL Program Management 2007-04-17 19:42:40 UTC
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.

Comment 10 Don Zickus 2007-07-19 21:12:13 UTC
in kernel-2.6.18-16.el5

Comment 12 John Poelstra 2007-08-14 19:41:16 UTC
A fix for this issue has been included in the packages contained in the beta
(RHN channel) or most recent snapshot (partners.redhat.com) for RHEL5.1.  Please
verify that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

Comment 14 Linda Knippers 2007-08-14 22:29:40 UTC
The problem seems to be fixed in the 5.1 beta.

Comment 16 errata-xmlrpc 2007-11-07 19:46:26 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html