Bug 280311

Summary: circular locking in NLM subsystem
Product: Fedora
Component: kernel
Version: rawhide
Reporter: Jeff Layton <jlayton>
Assignee: Jeff Layton <jlayton>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: dzu, staubach, steved
Status: CLOSED UPSTREAM
Severity: low
Priority: medium
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2007-09-22 21:34:01 UTC
Attachments:
- kernel log
- NLM: Fix a circular lock dependency in lockd
- reproducer program

Description Jeff Layton 2007-09-06 11:15:50 UTC
While trying to reproduce the problem in bug #253754 in rawhide, I saw this pop
up on a debug kernel. The situation was that a file on an exported filesystem
was locked by a local process and a client was blocking trying to get a lock on
the same file.

I was also playing with iptables rules at the time:


=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.23-0.130.rc3.git5.killattr.1.fc8debug #1
-------------------------------------------------------
lockd/2105 is trying to acquire lock:
 (&file->f_mutex){--..}, at: [<ffffffff88279493>]
nlmsvc_traverse_blocks+0x26/0x9f [lockd]

but task is already holding lock:
 (nlm_host_mutex){--..}, at: [<ffffffff88277b0a>] nlm_lookup_host+0xda/0x3a4 [lockd]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (nlm_host_mutex){--..}:
       [<ffffffff81055121>] __lock_acquire+0xac0/0xcc9
       [<ffffffff81055740>] lock_acquire+0x5a/0x73
       [<ffffffff88277b0a>] nlm_lookup_host+0xda/0x3a4 [lockd]
       [<ffffffff81051c61>] debug_mutex_lock_common+0x16/0x23
       [<ffffffff812716e3>] __mutex_lock_slowpath+0xf6/0x2a6
       [<ffffffff81053a2c>] check_usage_backwards+0x1e/0x43
       [<ffffffff88277b0a>] nlm_lookup_host+0xda/0x3a4 [lockd]
       [<ffffffff81053ff7>] mark_held_locks+0x49/0x67
       [<ffffffff8127187b>] __mutex_lock_slowpath+0x28e/0x2a6
       [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
       [<ffffffff88277e28>] nlmsvc_lookup_host+0x54/0x59 [lockd]
       [<ffffffff88279718>] nlmsvc_lock+0xb9/0x319 [lockd]
       [<ffffffff8827ce66>] nlm4svc_proc_lock+0x8f/0xda [lockd]
       [<ffffffff8820d758>] svc_process+0x385/0x709 [sunrpc]
       [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
       [<ffffffff882786c5>] lockd+0x18a/0x269 [lockd]
       [<ffffffff81272a7b>] trace_hardirqs_on_thunk+0x35/0x37
       [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
       [<ffffffff81272a7b>] trace_hardirqs_on_thunk+0x35/0x37
       [<ffffffff8100cb18>] child_rip+0xa/0x12
       [<ffffffff8100c22c>] restore_args+0x0/0x30
       [<ffffffff8820cc24>] __svc_create_thread+0xe6/0x1ef [sunrpc]
       [<ffffffff8827853b>] lockd+0x0/0x269 [lockd]
       [<ffffffff8100cb0e>] child_rip+0x0/0x12
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (&file->f_mutex){--..}:
       [<ffffffff81053495>] print_circular_bug_tail+0x32/0x72
       [<ffffffff810538dc>] print_circular_bug_header+0xcc/0xd3
       [<ffffffff81055022>] __lock_acquire+0x9c1/0xcc9
       [<ffffffff81055740>] lock_acquire+0x5a/0x73
       [<ffffffff88279493>] nlmsvc_traverse_blocks+0x26/0x9f [lockd]
       [<ffffffff81051c61>] debug_mutex_lock_common+0x16/0x23
       [<ffffffff812716e3>] __mutex_lock_slowpath+0xf6/0x2a6
       [<ffffffff8827a684>] nlmsvc_mark_host+0x0/0x7 [lockd]
       [<ffffffff88279493>] nlmsvc_traverse_blocks+0x26/0x9f [lockd]
       [<ffffffff8827a684>] nlmsvc_mark_host+0x0/0x7 [lockd]
       [<ffffffff8827a86d>] nlm_traverse_files+0x1be/0x216 [lockd]
       [<ffffffff81053ff7>] mark_held_locks+0x49/0x67
       [<ffffffff8127187b>] __mutex_lock_slowpath+0x28e/0x2a6
       [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
       [<ffffffff81271886>] __mutex_lock_slowpath+0x299/0x2a6
       [<ffffffff882775a8>] nlm_gc_hosts+0x54/0x18d [lockd]
       [<ffffffff88277b24>] nlm_lookup_host+0xf4/0x3a4 [lockd]
       [<ffffffff8100c22c>] restore_args+0x0/0x30
       [<ffffffff81043b48>] groups_alloc+0x4b/0xd4
       [<ffffffff88277e28>] nlmsvc_lookup_host+0x54/0x59 [lockd]
       [<ffffffff8827ca1e>] nlm4svc_retrieve_args+0x42/0xd7 [lockd]
       [<ffffffff8827ce3b>] nlm4svc_proc_lock+0x64/0xda [lockd]
       [<ffffffff8827c4e5>] nlm4svc_decode_lockargs+0x37/0x9a [lockd]
       [<ffffffff8820d758>] svc_process+0x385/0x709 [sunrpc]
       [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
       [<ffffffff882786c5>] lockd+0x18a/0x269 [lockd]
       [<ffffffff81272a7b>] trace_hardirqs_on_thunk+0x35/0x37
       [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
       [<ffffffff81272a7b>] trace_hardirqs_on_thunk+0x35/0x37
       [<ffffffff8100cb18>] child_rip+0xa/0x12
       [<ffffffff8100c22c>] restore_args+0x0/0x30
       [<ffffffff8820cc24>] __svc_create_thread+0xe6/0x1ef [sunrpc]
       [<ffffffff8827853b>] lockd+0x0/0x269 [lockd]
       [<ffffffff8100cb0e>] child_rip+0x0/0x12
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

1 lock held by lockd/2105:
 #0:  (nlm_host_mutex){--..}, at: [<ffffffff88277b0a>]
nlm_lookup_host+0xda/0x3a4 [lockd]

stack backtrace:

Call Trace:
 [<ffffffff810534cc>] print_circular_bug_tail+0x69/0x72
 [<ffffffff810538dc>] print_circular_bug_header+0xcc/0xd3
 [<ffffffff81055022>] __lock_acquire+0x9c1/0xcc9
 [<ffffffff81055740>] lock_acquire+0x5a/0x73
 [<ffffffff88279493>] :lockd:nlmsvc_traverse_blocks+0x26/0x9f
 [<ffffffff81051c61>] debug_mutex_lock_common+0x16/0x23
 [<ffffffff812716e3>] __mutex_lock_slowpath+0xf6/0x2a6
 [<ffffffff8827a684>] :lockd:nlmsvc_mark_host+0x0/0x7
 [<ffffffff88279493>] :lockd:nlmsvc_traverse_blocks+0x26/0x9f
 [<ffffffff8827a684>] :lockd:nlmsvc_mark_host+0x0/0x7
 [<ffffffff8827a86d>] :lockd:nlm_traverse_files+0x1be/0x216
 [<ffffffff81053ff7>] mark_held_locks+0x49/0x67
 [<ffffffff8127187b>] __mutex_lock_slowpath+0x28e/0x2a6
 [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
 [<ffffffff81271886>] __mutex_lock_slowpath+0x299/0x2a6
 [<ffffffff882775a8>] :lockd:nlm_gc_hosts+0x54/0x18d
 [<ffffffff88277b24>] :lockd:nlm_lookup_host+0xf4/0x3a4
 [<ffffffff8100c22c>] restore_args+0x0/0x30
 [<ffffffff81043b48>] groups_alloc+0x4b/0xd4
 [<ffffffff88277e28>] :lockd:nlmsvc_lookup_host+0x54/0x59
 [<ffffffff8827ca1e>] :lockd:nlm4svc_retrieve_args+0x42/0xd7
 [<ffffffff8827ce3b>] :lockd:nlm4svc_proc_lock+0x64/0xda
 [<ffffffff8827c4e5>] :lockd:nlm4svc_decode_lockargs+0x37/0x9a
 [<ffffffff8820d758>] :sunrpc:svc_process+0x385/0x709
 [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
 [<ffffffff882786c5>] :lockd:lockd+0x18a/0x269
 [<ffffffff81272a7b>] trace_hardirqs_on_thunk+0x35/0x37
 [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
 [<ffffffff81272a7b>] trace_hardirqs_on_thunk+0x35/0x37
 [<ffffffff8100cb18>] child_rip+0xa/0x12
 [<ffffffff8100c22c>] restore_args+0x0/0x30
 [<ffffffff8820cc24>] :sunrpc:__svc_create_thread+0xe6/0x1ef
 [<ffffffff8827853b>] :lockd:lockd+0x0/0x269
 [<ffffffff8100cb0e>] child_rip+0x0/0x12

Comment 1 Jeff Layton 2007-09-06 11:57:16 UTC
I'm not exactly clear on what lockdep is complaining about, but I think it's
lock ordering. It had already seen lockd take the f_mutex then the
nlm_host_mutex, but apparently the codepath above is now doing this in the
reverse order.


Comment 2 Kenneth Johansson 2007-09-15 02:08:54 UTC
Created attachment 196301 [details]
kernel log

Comment 3 Detlev Zundel 2007-09-19 10:33:23 UTC
As an additional data point, I am seeing the same problem with 2.6.22.5-49.fc6debug:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.22.5-49.fc6debug #1
-------------------------------------------------------
lockd/3100 is trying to acquire lock:
 (&file->f_mutex){--..}, at: [<c0646a8e>] mutex_lock+0x21/0x24

but task is already holding lock:
 (nlm_host_mutex){--..}, at: [<c0646a8e>] mutex_lock+0x21/0x24

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (nlm_host_mutex){--..}:
       [<c0446a92>] __lock_acquire+0x9ea/0xb75
       [<c044700f>] lock_acquire+0x56/0x6f
       [<c06468f6>] __mutex_lock_slowpath+0xf7/0x26e
       [<c0646a8e>] mutex_lock+0x21/0x24
       [<f8ee491c>] nlm_lookup_host+0x8f/0x2d9 [lockd]
       [<f8ee4b96>] nlmsvc_lookup_host+0x30/0x37 [lockd]
       [<f8ee6393>] nlmsvc_lock+0xc1/0x2f6 [lockd]
       [<f8ee9a25>] nlm4svc_proc_lock+0x8b/0xd3 [lockd]
       [<f8f4abc5>] svc_process+0x336/0x66d [sunrpc]
       [<f8ee5363>] lockd+0x14c/0x225 [lockd]
       [<c0405c8f>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

-> #0 (&file->f_mutex){--..}:
       [<c0446976>] __lock_acquire+0x8ce/0xb75
       [<c044700f>] lock_acquire+0x56/0x6f
       [<c06468f6>] __mutex_lock_slowpath+0xf7/0x26e
       [<c0646a8e>] mutex_lock+0x21/0x24
       [<f8ee6120>] nlmsvc_traverse_blocks+0x22/0x8c [lockd]
       [<f8ee7418>] nlm_traverse_files+0x1b1/0x20c [lockd]
       [<f8ee74f5>] nlmsvc_mark_resources+0x27/0x29 [lockd]
       [<f8ee446c>] nlm_gc_hosts+0x49/0x19b [lockd]
       [<f8ee4933>] nlm_lookup_host+0xa6/0x2d9 [lockd]
       [<f8ee4b96>] nlmsvc_lookup_host+0x30/0x37 [lockd]
       [<f8ee9570>] nlm4svc_retrieve_args+0x38/0xb6 [lockd]
       [<f8ee99f9>] nlm4svc_proc_lock+0x5f/0xd3 [lockd]
       [<f8f4abc5>] svc_process+0x336/0x66d [sunrpc]
       [<f8ee5363>] lockd+0x14c/0x225 [lockd]
       [<c0405c8f>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

1 lock held by lockd/3100:
 #0:  (nlm_host_mutex){--..}, at: [<c0646a8e>] mutex_lock+0x21/0x24

stack backtrace:
 [<c04061ac>] show_trace_log_lvl+0x1a/0x2f
 [<c0406c30>] show_trace+0x12/0x14
 [<c0406c89>] dump_stack+0x16/0x18
 [<c04451c3>] print_circular_bug_tail+0x5f/0x68
 [<c0446976>] __lock_acquire+0x8ce/0xb75
 [<c044700f>] lock_acquire+0x56/0x6f
 [<c06468f6>] __mutex_lock_slowpath+0xf7/0x26e
 [<c0646a8e>] mutex_lock+0x21/0x24
 [<f8ee6120>] nlmsvc_traverse_blocks+0x22/0x8c [lockd]
 [<f8ee7418>] nlm_traverse_files+0x1b1/0x20c [lockd]
 [<f8ee74f5>] nlmsvc_mark_resources+0x27/0x29 [lockd]
 [<f8ee446c>] nlm_gc_hosts+0x49/0x19b [lockd]
 [<f8ee4933>] nlm_lookup_host+0xa6/0x2d9 [lockd]
 [<f8ee4b96>] nlmsvc_lookup_host+0x30/0x37 [lockd]
 [<f8ee9570>] nlm4svc_retrieve_args+0x38/0xb6 [lockd]
 [<f8ee99f9>] nlm4svc_proc_lock+0x5f/0xd3 [lockd]
 [<f8f4abc5>] svc_process+0x336/0x66d [sunrpc]
 [<f8ee5363>] lockd+0x14c/0x225 [lockd]
 [<c0405c8f>] kernel_thread_helper+0x7/0x10
 =======================


Comment 4 Jeff Layton 2007-09-19 20:12:46 UTC
Detlev,
  Did you happen to notice whether lockd was hung after seeing this warning?
Unfortunately I've not been able to reproduce this since (though I haven't tried
too hard)...

After asking someone more knowledgeable about lockdep, I think the issue is that
the kernel believes there is a problem with lock ordering. The only taker of
these locks seems to be lockd though, so ordering shouldn't really be much of an
issue.

Supposedly, lockdep can get this wrong sometimes...maybe this is one of those cases?

I'd be less inclined to think so if lockd was really hung after these messages
popped...


Comment 5 Detlev Zundel 2007-09-20 10:27:39 UTC
Jeff,
lockd did not hang after the message.  In fact it is still running nicely.

From what I understand of lockdep, it warns us that one codepath has already
taken the two locks in a different order than the one currently being attempted -
so it _may_ happen that if both codepaths ever run concurrently with just the
right timing, they deadlock.

So even though lockd on my machine is still running (a Pentium IV, so 2 logical
CPUs), that is not really proof that we have a false positive - this kernel
doesn't even use CONFIG_PREEMPT, so conditions here may actually be quite
unfavourable to triggering the problem. CONFIG_PREEMPT and more CPUs could well
provoke the error quickly.

Pondering the problem some more, I cannot understand how this can be merely
harmless. Maybe it is necessary to order the locks into levels (perhaps even
introduce "an upper" level) and teach lockdep about that usage (as described
in Documentation/lockdep-design.txt), but unfortunately I won't be able to
do that :(

Comment 6 Jeff Layton 2007-09-20 12:47:53 UTC
Thanks for the confirmation. That was my interpretation too.

I think I might see what lockdep is complaining about (well, at least one
possible ordering violation):

First call chain (more or less the one in the stack traces above):

nlm4svc_proc_lock
   nlm4svc_retrieve_args
      nlmsvc_lookup_host
         nlm_lookup_host   (nlm_host_mutex is taken)
             nlm_gc_hosts
                nlmsvc_mark_resources
                   nlm_traverse_files
                      nlm_inspect_file
                         nlmsvc_traverse_blocks (file->f_mutex is taken)


...but later...

nlm4svc_proc_lock
   nlmsvc_lock    (file->f_mutex is taken)
      nlmsvc_create_block
         nlmsvc_lookup_host
            nlm_lookup_host (nlm_host_mutex is taken)
      

...so there does seem to be some ordering violation with this. It's not clear to
me that any of these functions are called outside of lockd however.

A potential fix might be to rejigger nlm_gc_hosts() such that we don't need to
hold the nlm_host_mutex when nlmsvc_mark_resources() is called, but I'd welcome
suggestions on this :-)

Comment 7 Trond Myklebust 2007-09-21 20:51:25 UTC
Created attachment 202831 [details]
NLM: Fix a circular lock dependency in lockd

The problem is that the garbage collector for the 'host' structures,
nlm_gc_hosts(), holds nlm_host_mutex while calling down to
nlmsvc_mark_resources(), which eventually takes file->f_mutex.

We cannot therefore call nlmsvc_lookup_host() from within
nlmsvc_create_block, since the caller will already hold file->f_mutex, so
the attempt to grab nlm_host_mutex may deadlock.

Fix the problem by calling nlmsvc_lookup_host() outside the file->f_mutex.

Comment 8 Jeff Layton 2007-09-22 13:25:01 UTC
Created attachment 203001 [details]
reproducer program

I've been able to reproduce this more or less at will by running this program
against the same file on the client and server. Eventually the lockdep warnings
pop.

Going to test Trond's patch now and see if it helps...

Comment 9 Jeff Layton 2007-09-22 21:34:01 UTC
Patch seems to correct the problem. Closing this with a resolution of UPSTREAM.