Bug 280311 - circular locking in NLM subsystem
Summary: circular locking in NLM subsystem
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Jeff Layton
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-09-06 11:15 UTC by Jeff Layton
Modified: 2007-11-30 22:12 UTC
CC List: 3 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2007-09-22 21:34:01 UTC
Type: ---
Embargoed:


Attachments
kernel log (5.48 KB, text/plain) - 2007-09-15 02:08 UTC, Kenneth Johansson
NLM: Fix a circular lock dependency in lockd (3.38 KB, patch) - 2007-09-21 20:51 UTC, Trond Myklebust
reproducer program (633 bytes, text/plain) - 2007-09-22 13:25 UTC, Jeff Layton

Description Jeff Layton 2007-09-06 11:15:50 UTC
While trying to reproduce the problem in bug #253754 in rawhide, I saw this pop
up on a debug kernel. The situation was that a file on an exported filesystem
was locked by a local process and a client was blocking trying to get a lock on
the same file.

I was also playing with iptables rules at the time:


=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.23-0.130.rc3.git5.killattr.1.fc8debug #1
-------------------------------------------------------
lockd/2105 is trying to acquire lock:
 (&file->f_mutex){--..}, at: [<ffffffff88279493>]
nlmsvc_traverse_blocks+0x26/0x9f [lockd]

but task is already holding lock:
 (nlm_host_mutex){--..}, at: [<ffffffff88277b0a>] nlm_lookup_host+0xda/0x3a4 [lockd]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (nlm_host_mutex){--..}:
       [<ffffffff81055121>] __lock_acquire+0xac0/0xcc9
       [<ffffffff81055740>] lock_acquire+0x5a/0x73
       [<ffffffff88277b0a>] nlm_lookup_host+0xda/0x3a4 [lockd]
       [<ffffffff81051c61>] debug_mutex_lock_common+0x16/0x23
       [<ffffffff812716e3>] __mutex_lock_slowpath+0xf6/0x2a6
       [<ffffffff81053a2c>] check_usage_backwards+0x1e/0x43
       [<ffffffff88277b0a>] nlm_lookup_host+0xda/0x3a4 [lockd]
       [<ffffffff81053ff7>] mark_held_locks+0x49/0x67
       [<ffffffff8127187b>] __mutex_lock_slowpath+0x28e/0x2a6
       [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
       [<ffffffff88277e28>] nlmsvc_lookup_host+0x54/0x59 [lockd]
       [<ffffffff88279718>] nlmsvc_lock+0xb9/0x319 [lockd]
       [<ffffffff8827ce66>] nlm4svc_proc_lock+0x8f/0xda [lockd]
       [<ffffffff8820d758>] svc_process+0x385/0x709 [sunrpc]
       [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
       [<ffffffff882786c5>] lockd+0x18a/0x269 [lockd]
       [<ffffffff81272a7b>] trace_hardirqs_on_thunk+0x35/0x37
       [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
       [<ffffffff81272a7b>] trace_hardirqs_on_thunk+0x35/0x37
       [<ffffffff8100cb18>] child_rip+0xa/0x12
       [<ffffffff8100c22c>] restore_args+0x0/0x30
       [<ffffffff8820cc24>] __svc_create_thread+0xe6/0x1ef [sunrpc]
       [<ffffffff8827853b>] lockd+0x0/0x269 [lockd]
       [<ffffffff8100cb0e>] child_rip+0x0/0x12
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (&file->f_mutex){--..}:
       [<ffffffff81053495>] print_circular_bug_tail+0x32/0x72
       [<ffffffff810538dc>] print_circular_bug_header+0xcc/0xd3
       [<ffffffff81055022>] __lock_acquire+0x9c1/0xcc9
       [<ffffffff81055740>] lock_acquire+0x5a/0x73
       [<ffffffff88279493>] nlmsvc_traverse_blocks+0x26/0x9f [lockd]
       [<ffffffff81051c61>] debug_mutex_lock_common+0x16/0x23
       [<ffffffff812716e3>] __mutex_lock_slowpath+0xf6/0x2a6
       [<ffffffff8827a684>] nlmsvc_mark_host+0x0/0x7 [lockd]
       [<ffffffff88279493>] nlmsvc_traverse_blocks+0x26/0x9f [lockd]
       [<ffffffff8827a684>] nlmsvc_mark_host+0x0/0x7 [lockd]
       [<ffffffff8827a86d>] nlm_traverse_files+0x1be/0x216 [lockd]
       [<ffffffff81053ff7>] mark_held_locks+0x49/0x67
       [<ffffffff8127187b>] __mutex_lock_slowpath+0x28e/0x2a6
       [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
       [<ffffffff81271886>] __mutex_lock_slowpath+0x299/0x2a6
       [<ffffffff882775a8>] nlm_gc_hosts+0x54/0x18d [lockd]
       [<ffffffff88277b24>] nlm_lookup_host+0xf4/0x3a4 [lockd]
       [<ffffffff8100c22c>] restore_args+0x0/0x30
       [<ffffffff81043b48>] groups_alloc+0x4b/0xd4
       [<ffffffff88277e28>] nlmsvc_lookup_host+0x54/0x59 [lockd]
       [<ffffffff8827ca1e>] nlm4svc_retrieve_args+0x42/0xd7 [lockd]
       [<ffffffff8827ce3b>] nlm4svc_proc_lock+0x64/0xda [lockd]
       [<ffffffff8827c4e5>] nlm4svc_decode_lockargs+0x37/0x9a [lockd]
       [<ffffffff8820d758>] svc_process+0x385/0x709 [sunrpc]
       [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
       [<ffffffff882786c5>] lockd+0x18a/0x269 [lockd]
       [<ffffffff81272a7b>] trace_hardirqs_on_thunk+0x35/0x37
       [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
       [<ffffffff81272a7b>] trace_hardirqs_on_thunk+0x35/0x37
       [<ffffffff8100cb18>] child_rip+0xa/0x12
       [<ffffffff8100c22c>] restore_args+0x0/0x30
       [<ffffffff8820cc24>] __svc_create_thread+0xe6/0x1ef [sunrpc]
       [<ffffffff8827853b>] lockd+0x0/0x269 [lockd]
       [<ffffffff8100cb0e>] child_rip+0x0/0x12
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

1 lock held by lockd/2105:
 #0:  (nlm_host_mutex){--..}, at: [<ffffffff88277b0a>]
nlm_lookup_host+0xda/0x3a4 [lockd]

stack backtrace:

Call Trace:
 [<ffffffff810534cc>] print_circular_bug_tail+0x69/0x72
 [<ffffffff810538dc>] print_circular_bug_header+0xcc/0xd3
 [<ffffffff81055022>] __lock_acquire+0x9c1/0xcc9
 [<ffffffff81055740>] lock_acquire+0x5a/0x73
 [<ffffffff88279493>] :lockd:nlmsvc_traverse_blocks+0x26/0x9f
 [<ffffffff81051c61>] debug_mutex_lock_common+0x16/0x23
 [<ffffffff812716e3>] __mutex_lock_slowpath+0xf6/0x2a6
 [<ffffffff8827a684>] :lockd:nlmsvc_mark_host+0x0/0x7
 [<ffffffff88279493>] :lockd:nlmsvc_traverse_blocks+0x26/0x9f
 [<ffffffff8827a684>] :lockd:nlmsvc_mark_host+0x0/0x7
 [<ffffffff8827a86d>] :lockd:nlm_traverse_files+0x1be/0x216
 [<ffffffff81053ff7>] mark_held_locks+0x49/0x67
 [<ffffffff8127187b>] __mutex_lock_slowpath+0x28e/0x2a6
 [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
 [<ffffffff81271886>] __mutex_lock_slowpath+0x299/0x2a6
 [<ffffffff882775a8>] :lockd:nlm_gc_hosts+0x54/0x18d
 [<ffffffff88277b24>] :lockd:nlm_lookup_host+0xf4/0x3a4
 [<ffffffff8100c22c>] restore_args+0x0/0x30
 [<ffffffff81043b48>] groups_alloc+0x4b/0xd4
 [<ffffffff88277e28>] :lockd:nlmsvc_lookup_host+0x54/0x59
 [<ffffffff8827ca1e>] :lockd:nlm4svc_retrieve_args+0x42/0xd7
 [<ffffffff8827ce3b>] :lockd:nlm4svc_proc_lock+0x64/0xda
 [<ffffffff8827c4e5>] :lockd:nlm4svc_decode_lockargs+0x37/0x9a
 [<ffffffff8820d758>] :sunrpc:svc_process+0x385/0x709
 [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
 [<ffffffff882786c5>] :lockd:lockd+0x18a/0x269
 [<ffffffff81272a7b>] trace_hardirqs_on_thunk+0x35/0x37
 [<ffffffff810541e9>] trace_hardirqs_on+0x12e/0x151
 [<ffffffff81272a7b>] trace_hardirqs_on_thunk+0x35/0x37
 [<ffffffff8100cb18>] child_rip+0xa/0x12
 [<ffffffff8100c22c>] restore_args+0x0/0x30
 [<ffffffff8820cc24>] :sunrpc:__svc_create_thread+0xe6/0x1ef
 [<ffffffff8827853b>] :lockd:lockd+0x0/0x269
 [<ffffffff8100cb0e>] child_rip+0x0/0x12

Comment 1 Jeff Layton 2007-09-06 11:57:16 UTC
I'm not exactly clear on what lockdep is complaining about, but I think it's
lock ordering. It had already seen lockd take the f_mutex then the
nlm_host_mutex, but apparently the codepath above is now doing this in the
reverse order.


Comment 2 Kenneth Johansson 2007-09-15 02:08:54 UTC
Created attachment 196301
kernel log

Comment 3 Detlev Zundel 2007-09-19 10:33:23 UTC
As an additional data point, I am seeing the same problem with 2.6.22.5-49.fc6debug:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.22.5-49.fc6debug #1
-------------------------------------------------------
lockd/3100 is trying to acquire lock:
 (&file->f_mutex){--..}, at: [<c0646a8e>] mutex_lock+0x21/0x24

but task is already holding lock:
 (nlm_host_mutex){--..}, at: [<c0646a8e>] mutex_lock+0x21/0x24

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (nlm_host_mutex){--..}:
       [<c0446a92>] __lock_acquire+0x9ea/0xb75
       [<c044700f>] lock_acquire+0x56/0x6f
       [<c06468f6>] __mutex_lock_slowpath+0xf7/0x26e
       [<c0646a8e>] mutex_lock+0x21/0x24
       [<f8ee491c>] nlm_lookup_host+0x8f/0x2d9 [lockd]
       [<f8ee4b96>] nlmsvc_lookup_host+0x30/0x37 [lockd]
       [<f8ee6393>] nlmsvc_lock+0xc1/0x2f6 [lockd]
       [<f8ee9a25>] nlm4svc_proc_lock+0x8b/0xd3 [lockd]
       [<f8f4abc5>] svc_process+0x336/0x66d [sunrpc]
       [<f8ee5363>] lockd+0x14c/0x225 [lockd]
       [<c0405c8f>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

-> #0 (&file->f_mutex){--..}:
       [<c0446976>] __lock_acquire+0x8ce/0xb75
       [<c044700f>] lock_acquire+0x56/0x6f
       [<c06468f6>] __mutex_lock_slowpath+0xf7/0x26e
       [<c0646a8e>] mutex_lock+0x21/0x24
       [<f8ee6120>] nlmsvc_traverse_blocks+0x22/0x8c [lockd]
       [<f8ee7418>] nlm_traverse_files+0x1b1/0x20c [lockd]
       [<f8ee74f5>] nlmsvc_mark_resources+0x27/0x29 [lockd]
       [<f8ee446c>] nlm_gc_hosts+0x49/0x19b [lockd]
       [<f8ee4933>] nlm_lookup_host+0xa6/0x2d9 [lockd]
       [<f8ee4b96>] nlmsvc_lookup_host+0x30/0x37 [lockd]
       [<f8ee9570>] nlm4svc_retrieve_args+0x38/0xb6 [lockd]
       [<f8ee99f9>] nlm4svc_proc_lock+0x5f/0xd3 [lockd]
       [<f8f4abc5>] svc_process+0x336/0x66d [sunrpc]
       [<f8ee5363>] lockd+0x14c/0x225 [lockd]
       [<c0405c8f>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

1 lock held by lockd/3100:
 #0:  (nlm_host_mutex){--..}, at: [<c0646a8e>] mutex_lock+0x21/0x24

stack backtrace:
 [<c04061ac>] show_trace_log_lvl+0x1a/0x2f
 [<c0406c30>] show_trace+0x12/0x14
 [<c0406c89>] dump_stack+0x16/0x18
 [<c04451c3>] print_circular_bug_tail+0x5f/0x68
 [<c0446976>] __lock_acquire+0x8ce/0xb75
 [<c044700f>] lock_acquire+0x56/0x6f
 [<c06468f6>] __mutex_lock_slowpath+0xf7/0x26e
 [<c0646a8e>] mutex_lock+0x21/0x24
 [<f8ee6120>] nlmsvc_traverse_blocks+0x22/0x8c [lockd]
 [<f8ee7418>] nlm_traverse_files+0x1b1/0x20c [lockd]
 [<f8ee74f5>] nlmsvc_mark_resources+0x27/0x29 [lockd]
 [<f8ee446c>] nlm_gc_hosts+0x49/0x19b [lockd]
 [<f8ee4933>] nlm_lookup_host+0xa6/0x2d9 [lockd]
 [<f8ee4b96>] nlmsvc_lookup_host+0x30/0x37 [lockd]
 [<f8ee9570>] nlm4svc_retrieve_args+0x38/0xb6 [lockd]
 [<f8ee99f9>] nlm4svc_proc_lock+0x5f/0xd3 [lockd]
 [<f8f4abc5>] svc_process+0x336/0x66d [sunrpc]
 [<f8ee5363>] lockd+0x14c/0x225 [lockd]
 [<c0405c8f>] kernel_thread_helper+0x7/0x10
 =======================


Comment 4 Jeff Layton 2007-09-19 20:12:46 UTC
Detlev,
  Did you happen to notice whether lockd was hung after seeing this warning?
Unfortunately I've not been able to reproduce this since (though I haven't tried
too hard)...

After asking someone more knowledgeable about lockdep, I think the issue is that
the kernel believes there is a lock-ordering problem. The only taker of these
locks seems to be lockd itself, though, so ordering shouldn't really be much of
an issue.

Supposedly, lockdep can get this wrong sometimes... maybe this is one of those
cases?

I'd be less inclined to think so if lockd were really hung after these messages
popped...


Comment 5 Detlev Zundel 2007-09-20 10:27:39 UTC
Jeff,
lockd did not hang after the message.  In fact it is still running nicely.

From what I understand of lockdep, it warns us that one codepath has already
taken these two locks in a different order than the one currently being
attempted - so it _may_ happen that if both codepaths ever run concurrently
with just the right timing, they deadlock.
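
(Illustration only: the following minimal userspace program models the pattern
being described here - two threads taking the same two mutexes in opposite
orders. The mutex names merely echo the lockd locks; this is not lockd code.
Build with "gcc -pthread".)

/*
 * Thread A mirrors the GC path (host mutex, then file mutex); thread B
 * mirrors the lock-granting path (file mutex, then host mutex). With the
 * sleeps widening the race window, the two threads usually deadlock.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t host_mutex = PTHREAD_MUTEX_INITIALIZER; /* plays nlm_host_mutex */
static pthread_mutex_t file_mutex = PTHREAD_MUTEX_INITIALIZER; /* plays file->f_mutex  */

static void *gc_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&host_mutex);
	usleep(100000);                     /* widen the race window */
	pthread_mutex_lock(&file_mutex);
	puts("gc_path got both locks");
	pthread_mutex_unlock(&file_mutex);
	pthread_mutex_unlock(&host_mutex);
	return NULL;
}

static void *grant_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&file_mutex);
	usleep(100000);
	pthread_mutex_lock(&host_mutex);
	puts("grant_path got both locks");
	pthread_mutex_unlock(&host_mutex);
	pthread_mutex_unlock(&file_mutex);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, gc_path, NULL);
	pthread_create(&b, NULL, grant_path, NULL);
	pthread_join(a, NULL);              /* usually never returns */
	pthread_join(b, NULL);
	return 0;
}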

So even though lockd on my machine is still running (a Pentium IV, so 2 CPUs),
that is not really proof that we have a false positive - this kernel doesn't
even use PREEMPT, so the conditions here may actually be quite benign with
respect to that problem. But CONFIG_PREEMPT and more CPUs could well provoke
the deadlock quickly.

Pondering the problem some more, I cannot understand how this can be harmless.
Maybe it is necessary to order the locks into levels (perhaps even introduce
"an upper" level) and teach lockdep about that usage (as described in
Documentation/lockdep-design.txt), but unfortunately I won't be able to do
that :(
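
(For reference, the general shape of such a lockdep nesting annotation, as
Documentation/lockdep-design.txt describes it, is sketched below with a
hypothetical structure. This is not lockd code and not a proposed fix for this
bug, which was ultimately resolved by reordering the acquisitions instead; see
comment 7.)

#include <linux/mutex.h>
#include <linux/lockdep.h>

/* Hypothetical object with a per-object mutex; locks assumed mutex_init()ed. */
struct demo_node {
	struct mutex lock;
	struct demo_node *parent;
};

static void demo_lock_parent_then_child(struct demo_node *child)
{
	mutex_lock(&child->parent->lock);
	/*
	 * Second acquisition of the same lock class: annotate the documented
	 * nesting level so lockdep does not treat it as a potential deadlock.
	 */
	mutex_lock_nested(&child->lock, SINGLE_DEPTH_NESTING);

	/* ... work under both locks ... */

	mutex_unlock(&child->lock);
	mutex_unlock(&child->parent->lock);
}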

Comment 6 Jeff Layton 2007-09-20 12:47:53 UTC
Thanks for the confirmation. That was my interpretation too.

I think I might see what lockdep is complaining about (well, at least one
possible ordering violation):

First call chain (more or less the one in the stack traces above):

nlm4svc_proc_lock
   nlm4svc_retrieve_args
      nlmsvc_lookup_host
         nlm_lookup_host   (nlm_host_mutex is taken)
             nlm_gc_hosts
                nlmsvc_mark_resources
                   nlm_traverse_files
                      nlm_inspect_file
                         nlmsvc_traverse_blocks (file->f_mutex is taken)


...but later...

nlm4svc_proc_lock
   nlmsvc_lock    (file->f_mutex is taken)
      nlmsvc_create_block
         nlmsvc_lookup_host
            nlm_lookup_host (nlm_host_mutex is taken)
      

...so there does seem to be an ordering violation here. It's not clear to me
that any of these functions are called outside of lockd, however.

A potential fix might be to rejigger nlm_gc_hosts() such that we don't need to
hold the nlm_host_mutex when nlmsvc_mark_resources() is called, but I'd welcome
suggestions on this :-)

Comment 7 Trond Myklebust 2007-09-21 20:51:25 UTC
Created attachment 202831
NLM: Fix a circular lock dependency in lockd

The problem is that the garbage collector for the 'host' structures,
nlm_gc_hosts(), holds nlm_host_mutex while calling down to
nlmsvc_mark_resources(), which eventually takes file->f_mutex.

We therefore cannot call nlmsvc_lookup_host() from within
nlmsvc_create_block(), since the caller will already hold file->f_mutex, and
the attempt to grab nlm_host_mutex may deadlock.

Fix the problem by calling nlmsvc_lookup_host() outside the file->f_mutex.
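
(Sketch only, not the attached patch: reusing the userspace stand-ins from the
illustration in comment 5, the reordering described above looks roughly like
this. The helper name lookup_host() is hypothetical.)

#include <pthread.h>

static pthread_mutex_t host_mutex = PTHREAD_MUTEX_INITIALIZER; /* plays nlm_host_mutex */
static pthread_mutex_t file_mutex = PTHREAD_MUTEX_INITIALIZER; /* plays file->f_mutex  */

/* Stand-in for nlmsvc_lookup_host(): takes and releases the host mutex. */
static void lookup_host(void)
{
	pthread_mutex_lock(&host_mutex);
	/* ... find or create the host entry ... */
	pthread_mutex_unlock(&host_mutex);
}

/*
 * Reordered grant path: the host lookup completes before the file mutex is
 * taken, so file_mutex is never held while waiting for host_mutex and the
 * inverted nesting disappears.
 */
static void grant_path_fixed(void)
{
	lookup_host();

	pthread_mutex_lock(&file_mutex);
	/* ... create the block and queue the lock using the host found above ... */
	pthread_mutex_unlock(&file_mutex);
}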

Comment 8 Jeff Layton 2007-09-22 13:25:01 UTC
Created attachment 203001
reproducer program

I've been able to reproduce this more or less at will by running this program
against the same file on the client and server. Eventually the lockdep warnings
pop.

Going to test Trond's patch now and see if it helps...
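
(The attached 633-byte program is not reproduced here. As a rough, hypothetical
sketch of the kind of loop described - repeatedly taking and dropping a POSIX
lock on the same file from both the NFS client and the server - it might look
something like this; the default path and the delay are made up.)

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/mnt/export/lockfile";
	int fd = open(path, O_RDWR | O_CREAT, 0644);
	struct flock fl;

	if (fd < 0) {
		perror("open");
		return 1;
	}

	for (;;) {
		fl.l_type = F_WRLCK;
		fl.l_whence = SEEK_SET;
		fl.l_start = 0;
		fl.l_len = 0;                         /* whole file */

		if (fcntl(fd, F_SETLKW, &fl) < 0) {   /* block until the lock is granted */
			perror("F_SETLKW");
			return 1;
		}

		usleep(10000);                        /* hold the lock briefly */

		fl.l_type = F_UNLCK;
		if (fcntl(fd, F_SETLK, &fl) < 0) {
			perror("F_UNLCK");
			return 1;
		}
	}
}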

Comment 9 Jeff Layton 2007-09-22 21:34:01 UTC
Patch seems to correct the problem. Closing this with a resolution of UPSTREAM.


