Bug 235675 - LSPP: INFO: possible recursive locking detected
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Eric Sandeen
QA Contact: Martin Jenner
Keywords: OtherQA
Depends On:
Blocks: RHEL5LSPPCertTracker
Reported: 2007-04-09 11:07 EDT by Linda Knippers
Modified: 2009-06-19 12:35 EDT

See Also:
Fixed In Version: RHBA-2007-0959
Doc Type: Bug Fix
Last Closed: 2007-11-07 14:46:26 EST
External Trackers: Linux Kernel, ID 8130

Description Linda Knippers 2007-04-09 11:07:26 EDT
Description of problem:
I had several systems configured for lspp running the lspp .72 kernel
sitting idle over the weekend.  Two of them had
"INFO: possible recursive locking detected"
messages on the console at about 4 AM Saturday morning.

Version-Release number of selected component (if applicable):
2.6.18-8.1.1.lspp.72.el5 kernel with selinux .50 policy and
the other packages from the lspp repo.

How reproducible:
I'm not sure it is reproducible.

Steps to Reproduce:
  
Actual results:

These are the messages from each system's messages file.

This system is an x86_64 system that was freshly installed on
Friday using the latest ks.

Apr  7 04:04:41 cert-e3 kernel:
Apr  7 04:04:41 cert-e3 kernel: =============================================
Apr  7 04:04:41 cert-e3 kernel: [ INFO: possible recursive locking detected ]
Apr  7 04:04:41 cert-e3 kernel: 2.6.18-8.1.1.lspp.72.el5 #1
Apr  7 04:04:41 cert-e3 init: Trying to re-exec init
Apr  7 04:04:41 cert-e3 kernel: ---------------------------------------------
Apr  7 04:04:41 cert-e3 kernel: do_mq_unlink/15758 is trying to acquire lock:
Apr  7 04:04:41 cert-e3 kernel:  (&inode->i_mutex){--..}, at: [<ffffffff80067ad2>] mutex_lock+0x2a/0x2e
Apr  7 04:04:41 cert-e3 kernel:
Apr  7 04:04:41 cert-e3 kernel: but task is already holding lock:
Apr  7 04:04:41 cert-e3 kernel:  (&inode->i_mutex){--..}, at: [<ffffffff80067ad2>] mutex_lock+0x2a/0x2e
Apr  7 04:04:41 cert-e3 kernel:
Apr  7 04:04:41 cert-e3 kernel: other info that might help us debug this:
Apr  7 04:04:41 cert-e3 kernel: 1 lock held by do_mq_unlink/15758:
Apr  7 04:04:41 cert-e3 kernel:  #0:  (&inode->i_mutex){--..}, at: [<ffffffff80067ad2>] mutex_lock+0x2a/0x2e
Apr  7 04:04:41 cert-e3 kernel:
Apr  7 04:04:41 cert-e3 kernel: stack backtrace:
Apr  7 04:04:41 cert-e3 kernel:
Apr  7 04:04:41 cert-e3 kernel: Call Trace:
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff800a9dea>] __lock_acquire+0x135/0x9d9
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff800aac31>] lock_acquire+0x4b/0x69
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff80067ad2>] mutex_lock+0x2a/0x2e
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff8006792c>] __mutex_lock_slowpath+0xe5/0x261
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff80067ad2>] mutex_lock+0x2a/0x2e
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff8004ca97>] vfs_unlink+0x8c/0x114
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff801258ea>] sys_mq_unlink+0xb9/0x103
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff80060dda>] tracesys+0xd1/0xdb
Apr  7 04:04:41 cert-e3 kernel:


This system is an i386 box freshly installed on Thursday.

Apr  7 04:05:09 kipper kernel: security:  5 users, 12 roles, 1794 types, 93 bools, 16 sens, 1024 cats
Apr  7 04:05:09 kipper kernel: security:  59 classes, 169756 rules
Apr  7 04:05:09 kipper kernel: security:  5 users, 12 roles, 1794 types, 93 bools, 16 sens, 1024 cats
Apr  7 04:05:09 kipper kernel: security:  59 classes, 169756 rules
Apr  7 04:05:09 kipper kernel: security:  5 users, 12 roles, 1794 types, 93 bools, 16 sens, 1024 cats
Apr  7 04:05:09 kipper kernel: security:  59 classes, 169756 rules
Apr  7 04:05:09 kipper kernel: security:  5 users, 12 roles, 1794 types, 93 bools, 16 sens, 1024 cats
Apr  7 04:05:09 kipper kernel: security:  59 classes, 169756 rules
Apr  7 04:05:09 kipper kernel:
Apr  7 04:05:09 kipper kernel: =============================================
Apr  7 04:05:09 kipper kernel: [ INFO: possible recursive locking detected ]
Apr  7 04:05:09 kipper kernel: 2.6.18-8.1.1.lspp.72.el5 #1
Apr  7 04:05:09 kipper kernel: ---------------------------------------------
Apr  7 04:05:09 kipper kernel: do_mq_unlink/14933 is trying to acquire lock:
Apr  7 04:05:09 kipper kernel:  (&inode->i_mutex){--..}, at: [<c0612b7d>] mutex_lock+0x21/0x24
Apr  7 04:05:09 kipper kernel:
Apr  7 04:05:09 kipper init: Trying to re-exec init
Apr  7 04:05:09 kipper kernel: but task is already holding lock:
Apr  7 04:05:09 kipper kernel:  (&inode->i_mutex){--..}, at: [<c0612b7d>] mutex_lock+0x21/0x24
Apr  7 04:05:10 kipper kernel:
Apr  7 04:05:10 kipper kernel: other info that might help us debug this:
Apr  7 04:05:10 kipper kernel: 1 lock held by do_mq_unlink/14933:
Apr  7 04:05:10 kipper kernel:  #0:  (&inode->i_mutex){--..}, at: [<c0612b7d>] mutex_lock+0x21/0x24
Apr  7 04:05:10 kipper kernel:
Apr  7 04:05:10 kipper kernel: stack backtrace:
Apr  7 04:05:10 kipper kernel:  [<c04051ff>] show_trace_log_lvl+0x12/0x25
Apr  7 04:05:10 kipper kernel:  [<c040570d>] show_trace+0xd/0x10
Apr  7 04:05:10 kipper kernel:  [<c0405826>] dump_stack+0x19/0x1b
Apr  7 04:05:10 kipper kernel:  [<c043bc4d>] __lock_acquire+0x6ea/0x90d
Apr  7 04:05:10 kipper kernel:  [<c043c3e1>] lock_acquire+0x4b/0x6a
Apr  7 04:05:10 kipper kernel:  [<c0612a0e>] __mutex_lock_slowpath+0xbc/0x20a
Apr  7 04:05:10 kipper kernel:  [<c0612b7d>] mutex_lock+0x21/0x24
Apr  7 04:05:10 kipper kernel:  [<c047f96c>] vfs_unlink+0x73/0xe3
Apr  7 04:05:10 kipper kernel:  [<c04bdb72>] sys_mq_unlink+0x7b/0xd8
Apr  7 04:05:10 kipper kernel:  [<c0403fd7>] syscall_call+0x7/0xb
Apr  7 04:05:10 kipper kernel:  =======================

I have a third system which is an ia64 box that hasn't shown
this.  The ia64 box was installed a long time ago but is running
the same kernel.

Expected results:
No messages.

Additional info:

Since both systems showed the warning at about the same time, I looked
to see what is running out of cron about then.  I'm guessing it's
related to the prelink cron job running, since it will signal init,
and both systems had a "Trying to re-exec init" message.  Another
clue is that the ia64 box, which didn't show the messages, isn't
configured for prelink.

On the two systems that did, I've seen the "Trying to re-exec init"
messages at about the same time on other days without the recursive
lock warning.
Comment 1 George C. Wilson 2007-04-09 16:50:46 EDT
eparis:  I will have somebody look at this.
Comment 2 Eric Sandeen 2007-04-09 17:51:09 EDT
Odds are this locking annotation will fix it:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7a434814c7a6500b08bf4419ba8712b152d08d08

--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -731,7 +731,8 @@ asmlinkage long sys_mq_unlink(const char __user *u_name)
 	if (IS_ERR(name))
 		return PTR_ERR(name);
 
-	mutex_lock(&mqueue_mnt->mnt_root->d_inode->i_mutex);
+	mutex_lock_nested(&mqueue_mnt->mnt_root->d_inode->i_mutex,
+			I_MUTEX_PARENT);
 	dentry = lookup_one_len(name, mqueue_mnt->mnt_root, strlen(name));
 	if (IS_ERR(dentry)) {
 		err = PTR_ERR(dentry);
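
For anyone wondering why a one-line annotation is enough: as far as I understand it, lockdep tracks lock classes rather than individual locks, and a (class, subclass) pair counts as its own class. Both traces above show the task holding one i_mutex (taken in sys_mq_unlink) and then taking another in vfs_unlink with the same class, which is what triggers the warning. The sketch below is only a userspace toy model of that same-class check, assuming this simplified view; every toy_* name is made up for illustration and none of it is kernel code or the real lockdep implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins (hypothetical); they only mirror the idea of
 * I_MUTEX_NORMAL vs. I_MUTEX_PARENT, not the kernel's definitions. */
enum toy_subclass { TOY_NORMAL = 0, TOY_PARENT = 1 };

#define TOY_MAX_HELD 8

struct toy_held {
	const char *class_name;	/* e.g. "&inode->i_mutex" */
	int subclass;
};

struct toy_task {
	struct toy_held held[TOY_MAX_HELD];
	size_t depth;
};

/* Record an acquisition; return true if the task already holds a lock
 * with the same class name AND the same subclass (the toy equivalent
 * of "possible recursive locking detected"). */
static bool toy_acquire(struct toy_task *t, const char *class_name, int subclass)
{
	bool warn = false;
	size_t i;

	for (i = 0; i < t->depth; i++)
		if (t->held[i].subclass == subclass &&
		    strcmp(t->held[i].class_name, class_name) == 0)
			warn = true;

	assert(t->depth < TOY_MAX_HELD);
	t->held[t->depth].class_name = class_name;
	t->held[t->depth].subclass = subclass;
	t->depth++;
	return warn;
}

/* Unpatched path: parent directory and victim inode are both taken
 * with the default subclass. */
static bool toy_unlink_before_fix(void)
{
	struct toy_task t = { .depth = 0 };

	toy_acquire(&t, "&inode->i_mutex", TOY_NORMAL);		/* parent dir */
	return toy_acquire(&t, "&inode->i_mutex", TOY_NORMAL);	/* victim */
}

/* Patched path: the parent directory is annotated with its own subclass. */
static bool toy_unlink_after_fix(void)
{
	struct toy_task t = { .depth = 0 };

	toy_acquire(&t, "&inode->i_mutex", TOY_PARENT);		/* parent dir */
	return toy_acquire(&t, "&inode->i_mutex", TOY_NORMAL);	/* victim */
}
```

In this model toy_unlink_before_fix() reports a warning and toy_unlink_after_fix() does not. The two mutexes belong to different inodes in both cases, so the annotation only tells the checker about the intended parent/child ordering; it does not change which locks are taken.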
Comment 3 Eric Sandeen 2007-04-09 17:52:16 EDT
See also
http://bugzilla.kernel.org/show_bug.cgi?id=8130
Comment 4 George C. Wilson 2007-04-11 20:03:43 EDT
Linda, is there any way you can verify that this is fixed in the updated package?
Comment 5 Linda Knippers 2007-04-11 20:24:52 EDT
Did this get pulled into the lspp kernel?  I see references to the upstream
bug fix but I wasn't sure it was built into one of our kernels.

This problem wasn't reproducible so I'm not sure I can verify it but I'll
run the latest kernel and see what happens.  The upstream bug report looks
to be a good match.
Comment 6 Steve Grubb 2007-04-11 21:10:46 EDT
Yes, this was put into lspp.73.
Comment 8 Steve Grubb 2007-04-16 16:24:45 EDT
Removing lspp tracker. Seems fixed. Thanks.
Comment 9 RHEL Product and Program Management 2007-04-17 15:42:40 EDT
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.
Comment 10 Don Zickus 2007-07-19 17:12:13 EDT
in kernel-2.6.18-16.el5
Comment 12 John Poelstra 2007-08-14 15:41:16 EDT
A fix for this issue has been included in the packages contained in the beta
(RHN channel) or most recent snapshot (partners.redhat.com) for RHEL5.1.  Please
verify that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.
Comment 14 Linda Knippers 2007-08-14 18:29:40 EDT
The problem seems to be fixed in the 5.1 beta.
Comment 16 errata-xmlrpc 2007-11-07 14:46:26 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html