Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
For bugs related to the Red Hat Enterprise Linux 5 product line. The current stable release is 5.10. For Red Hat Enterprise Linux 6 and above, please visit Red Hat JIRA https://issues.redhat.com/secure/CreateIssue!default.jspa?pid=12332745 to report new issues.

Bug 567092

Summary: possible recursive locking of inode by nfsd
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.5
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: low
Priority: low
Reporter: Jeff Layton <jlayton>
Assignee: Jeff Layton <jlayton>
QA Contact: yanfu,wang <yanwang>
CC: rwheeler, steved, yanwang
Target Milestone: rc
Target Release: ---
Whiteboard: Connectathon2010
Doc Type: Bug Fix
Last Closed: 2011-01-13 21:07:36 UTC

Description Jeff Layton 2010-02-21 16:12:43 UTC
Noticed this on the console this morning of the RHEL5 NFS server at connectathon:

nfs4_cb: server 0.0.0.0 not responding, timed out

=============================================
[ INFO: possible recursive locking detected ]
2.6.18-189.el5debug #1
---------------------------------------------
nfsd/2103 is trying to acquire lock:
 (&inode->i_mutex){--..}, at: [<ffffffff8004d1df>] vfs_rmdir+0x88/0x11f

but task is already holding lock:
 (&inode->i_mutex){--..}, at: [<ffffffff88637479>] nfsd4_clear_clid_dir+0x2e/0x56 [nfsd]

other info that might help us debug this:
3 locks held by nfsd/2103:
 #0:  (hash_sem){..--}, at: [<ffffffff8861b72f>] nfsd+0x18b/0x2cc [nfsd]
 #1:  (client_mutex){--..}, at: [<ffffffff88633a86>] nfsd4_setclientid_confirm+0x38/0x34f [nfsd]
 #2:  (&inode->i_mutex){--..}, at: [<ffffffff88637479>] nfsd4_clear_clid_dir+0x2e/0x56 [nfsd]

stack backtrace:

Call Trace:
 [<ffffffff800aa5b2>] __lock_acquire+0x1c4/0xaee
 [<ffffffff8004d1df>] vfs_rmdir+0x88/0x11f
 [<ffffffff800aaf31>] lock_acquire+0x55/0x6f
 [<ffffffff8004d1df>] vfs_rmdir+0x88/0x11f
 [<ffffffff80067224>] mutex_lock_nested+0x104/0x29c
 [<ffffffff8004d1df>] vfs_rmdir+0x88/0x11f
 [<ffffffff88637485>] :nfsd:nfsd4_clear_clid_dir+0x3a/0x56
 [<ffffffff886375b0>] :nfsd:nfsd4_remove_clid_dir+0xce/0x126
 [<ffffffff88633c3f>] :nfsd:nfsd4_setclientid_confirm+0x1f1/0x34f
 [<ffffffff8000b076>] kmem_cache_alloc+0xdf/0xeb
 [<ffffffff8862b048>] :nfsd:nfsd4_proc_compound+0x118d/0x1451
 [<ffffffff8003251b>] sock_recvmsg+0x107/0x15f
 [<ffffffff800a834a>] lock_release_holdtime+0x27/0x48
 [<ffffffff80049bd5>] try_to_wake_up+0x478/0x48a
 [<ffffffff80068955>] _spin_unlock_irqrestore+0x3e/0x44
 [<ffffffff800a9ec4>] mark_held_locks+0x50/0x6b
 [<ffffffff80068955>] _spin_unlock_irqrestore+0x3e/0x44
 [<ffffffff8009986b>] local_bh_enable_ip+0xed/0xf4
 [<ffffffff800aa099>] trace_hardirqs_on+0x11b/0x13f
 [<ffffffff8853401b>] :sunrpc:svc_tcp_recvfrom+0x744/0x7cc
 [<ffffffff800a834a>] lock_release_holdtime+0x27/0x48
 [<ffffffff8006840d>] _read_unlock+0x17/0x20
 [<ffffffff885382da>] :sunrpc:sunrpc_cache_lookup+0x59/0x136
 [<ffffffff8861b1db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
 [<ffffffff88531925>] :sunrpc:svc_process+0x454/0x71b
 [<ffffffff800688ed>] _spin_unlock_irq+0x24/0x27
 [<ffffffff8861b5a4>] :nfsd:nfsd+0x0/0x2cc
 [<ffffffff8861b74b>] :nfsd:nfsd+0x1a7/0x2cc
 [<ffffffff80061079>] child_rip+0xa/0x11
 [<ffffffff800606a8>] restore_args+0x0/0x30
 [<ffffffff800d7145>] zone_statistics+0x3e/0x6d
 [<ffffffff8861b5a4>] :nfsd:nfsd+0x0/0x2cc
 [<ffffffff8006106f>] child_rip+0x0/0x11

svc: unknown version (2 for prog 100227 nfsacl)
nfs4_cb: server 0.0.0.0 not responding, timed out
nfsd: request from insecure port (172.16.22.2:57705)!

Comment 1 Jeff Layton 2010-02-21 16:22:48 UTC
Kernel here is 2.6.18-189.el5debug

Comment 2 Jeff Layton 2010-02-21 16:29:08 UTC
The problem looks harmless and is likely fixed by upstream commit 4b75f78edcab291eb29fe9a205cbf7b80c1c644f.

Comment 4 RHEL Program Management 2010-06-02 17:32:10 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Jarod Wilson 2010-07-19 21:14:36 UTC
in kernel-2.6.18-207.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 8 yanfu,wang 2010-10-20 09:51:22 UTC
Hi Jeff,
Did you use the testcase from http://cvs.devel.redhat.com/cgi-bin/cvsweb.cgi/tests/kernel/filesystems/nfs/connectathon/ ?

Comment 9 Jeff Layton 2010-10-20 11:19:08 UTC
No, I saw this warning pop while we were at the Connectathon interop event this year. While there, we had a RHEL5 server set up for other clients to test against. I saw this pop up on the console, but was never quite sure what series of client operations caused it.

So, unfortunately we have no reproducer that makes this warning pop.

Comment 10 yanfu,wang 2010-10-21 09:44:53 UTC
The patch build is sane; setting the bug to SanityOnly.

Comment 12 errata-xmlrpc 2011-01-13 21:07:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html