Bug 220966

Summary: NetLabel: 2.6.19-1.2891.fc7 - INFO: soft-safe -> soft-unsafe lock order detected
Product: [Fedora] Fedora Reporter: Miles Lane <miles.lane>
Component: kernelAssignee: Eric Paris <eparis>
Status: CLOSED CURRENTRELEASE QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: rawhideCC: dex.mbox, eparis, linda.knippers, mark, paul.moore, redhat-bugzilla, rwarsow, sgrubb, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: FC6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-03-13 19:48:34 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 221504    
Attachments:
Description Flags
Patch to David Miller's net-2.6 git tree none

Description Miles Lane 2006-12-29 18:01:50 UTC
Description of problem:

[ INFO: soft-safe -> soft-unsafe lock order detected ]
2.6.19-1.2891.fc7 #1
------------------------------------------------------
cupsd/1884 [HC0[0]:SC0[1]:HE1:SE0] is trying to acquire:
 (&ssec->nlbl_lock){--..}, at: [<c04cec37>]
selinux_netlbl_socket_setsid+0xbb/0x123

and this task is already holding:
 (af_callback_keys + sk->sk_family#3){-.-+}, at: [<c05daa1c>]
inet_accept+0x70/0xb5
which would create a new lock dependency:
 (af_callback_keys + sk->sk_family#3){-.-+} -> (&ssec->nlbl_lock){--..}

but this new dependency connects a soft-irq-safe lock:
 (af_callback_keys + sk->sk_family#3){-.-+}
... which became soft-irq-safe at:
 [<c043fff1>] __lock_acquire+0x37d/0x9f8
 [<c044094d>] lock_acquire+0x56/0x6f
 [<c05fbdb6>] _read_lock_bh+0x30/0x3d
 [<c04c687e>] selinux_socket_sock_rcv_skb+0xbd/0x252
 [<c05d0645>] tcp_v4_rcv+0x37a/0x909
 [<c05b7593>] ip_local_deliver+0x185/0x22e
 [<c05b73d6>] ip_rcv+0x418/0x450
 [<c059ae9c>] netif_receive_skb+0x2db/0x35a
 [<c059c85f>] process_backlog+0x95/0xf6
 [<c059ca46>] net_rx_action+0xa1/0x1a8
 [<c042bf5a>] __do_softirq+0x6f/0xe2
 [<c04063a1>] do_softirq+0x61/0xc7
 [<ffffffff>] 0xffffffff

to a soft-irq-unsafe lock:
 (&ssec->nlbl_lock){--..}
... which became soft-irq-unsafe at:
...  [<c044007d>] __lock_acquire+0x409/0x9f8
 [<c044094d>] lock_acquire+0x56/0x6f
 [<c05fbc89>] _spin_lock+0x2b/0x38
 [<c04cec37>] selinux_netlbl_socket_setsid+0xbb/0x123
 [<c04d0c92>] selinux_netlbl_socket_post_create+0x2d/0x2f
 [<c04c807b>] selinux_socket_post_create+0x156/0x15c
 [<c059213c>] __sock_create+0x179/0x1b2
 [<c05921ae>] sock_create+0x1a/0x1f
 [<c0592435>] sys_socket+0x1b/0x3c
 [<c0592cba>] sys_socketcall+0x77/0x241
 [<c0404050>] syscall_call+0x7/0xb
 [<ffffffff>] 0xffffffff

other info that might help us debug this:

3 locks held by cupsd/1884:
 #0:  (sk_lock-AF_INET){--..}, at: [<c05da9dd>] inet_accept+0x31/0xb5
 #1:  (af_callback_keys + sk->sk_family#3){-.-+}, at: [<c05daa1c>]
inet_accept+0x70/0xb5
 #2:  (policy_rwlock){..-?}, at: [<c04cebd0>]
selinux_netlbl_socket_setsid+0x54/0x123

the soft-irq-safe lock's dependencies:
-> (af_callback_keys + sk->sk_family#3){-.-+} ops: 0 {
  initial-use  at:
                       [<c0440098>] __lock_acquire+0x424/0x9f8
                       [<c044094d>] lock_acquire+0x56/0x6f
                       [<c05fbd3b>] _write_lock_bh+0x30/0x3d
                       [<c05c1871>] tcp_close+0x24b/0x52b
                       [<c05da570>] inet_release+0x43/0x49
                       [<c0591f72>] sock_release+0x17/0x68
                       [<c05921e0>] sock_close+0x2d/0x33
                       [<c047959b>] __fput+0xbe/0x168
                       [<c047965c>] fput+0x17/0x19
                       [<c0476f5d>] filp_close+0x54/0x5c
                       [<c0477ee1>] sys_close+0x78/0xb0
                       [<c0404050>] syscall_call+0x7/0xb
                       [<ffffffff>] 0xffffffff
  hardirq-on-W at:
                       [<c0440056>] __lock_acquire+0x3e2/0x9f8
                       [<c044094d>] lock_acquire+0x56/0x6f
                       [<c05fbd3b>] _write_lock_bh+0x30/0x3d
                       [<c05c1871>] tcp_close+0x24b/0x52b
                       [<c05da570>] inet_release+0x43/0x49
                       [<c0591f72>] sock_release+0x17/0x68
                       [<c05921e0>] sock_close+0x2d/0x33
                       [<c047959b>] __fput+0xbe/0x168
                       [<c047965c>] fput+0x17/0x19
                       [<c0476f5d>] filp_close+0x54/0x5c
                       [<c0477ee1>] sys_close+0x78/0xb0
                       [<c0404050>] syscall_call+0x7/0xb
                       [<ffffffff>] 0xffffffff
  in-softirq-R at:
                       [<c043fff1>] __lock_acquire+0x37d/0x9f8
                       [<c044094d>] lock_acquire+0x56/0x6f
                       [<c05fbdb6>] _read_lock_bh+0x30/0x3d
                       [<c04c687e>] selinux_socket_sock_rcv_skb+0xbd/0x252
                       [<c05d0645>] tcp_v4_rcv+0x37a/0x909
                       [<c05b7593>] ip_local_deliver+0x185/0x22e
                       [<c05b73d6>] ip_rcv+0x418/0x450
                       [<c059ae9c>] netif_receive_skb+0x2db/0x35a
                       [<c059c85f>] process_backlog+0x95/0xf6
                       [<c059ca46>] net_rx_action+0xa1/0x1a8
                       [<c042bf5a>] __do_softirq+0x6f/0xe2
                       [<c04063a1>] do_softirq+0x61/0xc7
                       [<ffffffff>] 0xffffffff
  hardirq-on-R at:
                       [<c044001f>] __lock_acquire+0x3ab/0x9f8
                       [<c044094d>] lock_acquire+0x56/0x6f
                       [<c05fbdb6>] _read_lock_bh+0x30/0x3d
                       [<c04c687e>] selinux_socket_sock_rcv_skb+0xbd/0x252
                       [<c05d0645>] tcp_v4_rcv+0x37a/0x909
                       [<c05b7593>] ip_local_deliver+0x185/0x22e
                       [<c05b73d6>] ip_rcv+0x418/0x450
                       [<c059ae9c>] netif_receive_skb+0x2db/0x35a
                       [<c059c85f>] process_backlog+0x95/0xf6
                       [<c059ca46>] net_rx_action+0xa1/0x1a8
                       [<c042bf5a>] __do_softirq+0x6f/0xe2
                       [<c04063a1>] do_softirq+0x61/0xc7
                       [<ffffffff>] 0xffffffff
 }
 ... key      at: [<c09f6770>] af_callback_keys+0x10/0x100

the soft-irq-unsafe lock's dependencies:
-> (&ssec->nlbl_lock){--..} ops: 0 {
  initial-use  at:
                       [<c0440098>] __lock_acquire+0x424/0x9f8
                       [<c044094d>] lock_acquire+0x56/0x6f
                       [<c05fbc89>] _spin_lock+0x2b/0x38
                       [<c04cec37>] selinux_netlbl_socket_setsid+0xbb/0x123
                       [<c04d0c92>] selinux_netlbl_socket_post_create+0x2d/0x2f
                       [<c04c807b>] selinux_socket_post_create+0x156/0x15c
                       [<c059213c>] __sock_create+0x179/0x1b2
                       [<c05921ae>] sock_create+0x1a/0x1f
                       [<c0592435>] sys_socket+0x1b/0x3c
                       [<c0592cba>] sys_socketcall+0x77/0x241
                       [<c0404050>] syscall_call+0x7/0xb
                       [<ffffffff>] 0xffffffff
  softirq-on-W at:
                       [<c044007d>] __lock_acquire+0x409/0x9f8
                       [<c044094d>] lock_acquire+0x56/0x6f
                       [<c05fbc89>] _spin_lock+0x2b/0x38
                       [<c04cec37>] selinux_netlbl_socket_setsid+0xbb/0x123
                       [<c04d0c92>] selinux_netlbl_socket_post_create+0x2d/0x2f
                       [<c04c807b>] selinux_socket_post_create+0x156/0x15c
                       [<c059213c>] __sock_create+0x179/0x1b2
                       [<c05921ae>] sock_create+0x1a/0x1f
                       [<c0592435>] sys_socket+0x1b/0x3c
                       [<c0592cba>] sys_socketcall+0x77/0x241
                       [<c0404050>] syscall_call+0x7/0xb
                       [<ffffffff>] 0xffffffff
  hardirq-on-W at:
                       [<c0440056>] __lock_acquire+0x3e2/0x9f8
                       [<c044094d>] lock_acquire+0x56/0x6f
                       [<c05fbc89>] _spin_lock+0x2b/0x38
                       [<c04cec37>] selinux_netlbl_socket_setsid+0xbb/0x123
                       [<c04d0c92>] selinux_netlbl_socket_post_create+0x2d/0x2f
                       [<c04c807b>] selinux_socket_post_create+0x156/0x15c
                       [<c059213c>] __sock_create+0x179/0x1b2
                       [<c05921ae>] sock_create+0x1a/0x1f
                       [<c0592435>] sys_socket+0x1b/0x3c
                       [<c0592cba>] sys_socketcall+0x77/0x241
                       [<c0404050>] syscall_call+0x7/0xb
                       [<ffffffff>] 0xffffffff
 }
 ... key      at: [<c09ebb20>] __key.27522+0x0/0x8

stack backtrace:
 [<c04051c1>] show_trace_log_lvl+0x1a/0x2f
 [<c0405766>] show_trace+0x12/0x14
 [<c04057ea>] dump_stack+0x16/0x18
 [<c043fbe0>] check_usage+0x242/0x24c
 [<c04404f3>] __lock_acquire+0x87f/0x9f8
 [<c044094d>] lock_acquire+0x56/0x6f
 [<c05fbc89>] _spin_lock+0x2b/0x38
 [<c04cec37>] selinux_netlbl_socket_setsid+0xbb/0x123
 [<c04d0f40>] selinux_netlbl_sock_graft+0xc7/0xcf
 [<c04c4b74>] selinux_sock_graft+0x32/0x36
 [<c05daa3b>] inet_accept+0x8f/0xb5
 [<c0592b93>] sys_accept+0xd0/0x180
 [<c0592d13>] sys_socketcall+0xd0/0x241
 [<c0404050>] syscall_call+0x7/0xb

Version-Release number of selected component (if applicable):
2.6.19-1.2891.fc7

How reproducible:
This seems to happen every time.

Comment 1 Steve Grubb 2006-12-30 12:47:29 UTC
Paul, Eric...any ideas? Are we missing some patches for 2.6.19  that went into
netdev tree that might account for this problem? Thanks.

Comment 2 Paul Moore 2007-01-02 21:31:17 UTC
I'm currently chasing two other bugs right now, but I'll grab the latest 
FC/Rawhide kernel sources and take a look sometime this week.  Thanks for 
capturing all of the debug information Miles.

Comment 3 Paul Moore 2007-01-03 20:48:58 UTC
I believe the following patch should correct this problem:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=9883a13c72dbf8c518814b6091019643cdb34429

I'll try rebuilding the 2.6.19-1.2891.fc7 kernel with this patch applied and 
see what I get.  In the meantime if Miles wants to give it a shot too that 
would be helpful.

Comment 4 Eric Paris 2007-01-04 17:22:27 UTC
there should be some test rawhide kernels with the patch in question at
http://people.redhat.com/eparis/BZ220966

please let us know what you find with these kernels

Comment 5 Paul Moore 2007-01-04 22:27:30 UTC
I believe there patch I mentioned in comment #3 does not entirely solve this 
problem as there is still a softirq "unsafe" spinlock in 
selinux_netlbl_socket_setsid().  I am currently working on a patch for to use 
the correct locking calls, I will update the BZ when I have a patch.

Comment 6 Paul Moore 2007-01-04 23:54:08 UTC
Created attachment 144868 [details]
Patch to David Miller's net-2.6 git tree

This patch should address the problem described here, however, due to other
problems with the 2.6.19-1.2891.fc7 I am having difficulty testing this to
verify that it both solves the problem and does not introduce any new
regressions.

I am in the process of building a new kernel, I'll update once I know more.

Comment 7 Miles Lane 2007-01-05 03:55:38 UTC
I am currently on a business trip and will need to look into these patches when
I get home.  Thanks.

Comment 8 Paul Moore 2007-01-05 19:56:54 UTC
I was able to get a kernel working with this fix today and I have been playing 
with it for the past couple of hours and it appears to fix the problem for me.  
I'll be pushing this patch upstream later today.

Comment 9 Eric Paris 2007-01-08 19:30:54 UTC
my people page now has fedora kernels with both of these fixes.  will wait for
reporters comments before claiming it is 'fixed upstream'

Comment 10 dex 2007-01-09 10:44:35 UTC
this bug seems to duplicate 220726

Comment 11 Eric Paris 2007-01-09 14:40:10 UTC
*** Bug 220726 has been marked as a duplicate of this bug. ***

Comment 12 Paul Moore 2007-01-09 16:40:19 UTC
(In reply to comment #10)
> this bug seems to duplicate 220726

Hi dex,

Can you verify that Eric's test kernels solve the lockdep problem you are 
seeing?

Thanks.

Comment 13 dex 2007-01-09 19:07:49 UTC
uname -a
Linux dexterFC5t1 2.6.19-1.2904.2.2.fc7.2.BZ220966.TESTING.ONLY #0 SMP Fri Jan 5
14:02:45 EST 2007 i686 athlon i386 GNU/Linux

testing ... so far so good a couple of hours up all's quiet 

Comment 14 Paul Moore 2007-01-09 19:10:58 UTC
Great, thanks for testing the fix.  If you see anything strange be sure to 
post an update.

Comment 15 dex 2007-01-10 08:46:15 UTC
from #13

uptime
 08:41:39 up 14:32,  0 users,  load average: 0.30, 0.34, 0.28

no sign of bug here under normal daily load well done.


Comment 16 Eric Paris 2007-02-14 15:41:19 UTC
*** Bug 223649 has been marked as a duplicate of this bug. ***

Comment 17 Eric Paris 2007-02-14 15:44:59 UTC
Moving back to waiting on me to check the upstream and FC devel kernel status of
this fix.

Comment 18 Eric Paris 2007-03-13 19:48:34 UTC
closing as this is fixed in the shipping FC6 kernels.