Bug 200038

Summary:	circular lock problem
Product:	[Fedora] Fedora	Reporter:	Trever Adams <trever>
Component:	kernel	Assignee:	Kernel Maintainer List <kernel-maint>
Status:	CLOSED RAWHIDE	QA Contact:	Brian Brock <bbrock>
Severity:	high	Docs Contact:
Priority:	high
Version:	6	CC:	jvdias, wtogami
Target Milestone:	---
Target Release:	---
Hardware:	i586
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2006-09-13 22:34:10 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Trever Adams 2006-07-25 00:06:19 UTC

Description of problem:
 [<c042cb4d>] lock_acquire+0x4b/0x6a
 [<c05ee648>] _spin_lock+0x19/0x28
 [<c05997c0>] dev_queue_xmit+0xda/0x1fe
 [<c059e08a>] neigh_resolve_output+0x1cd/0x1f9
 [<c05b617f>] ip_output+0x1c6/0x1f3
 [<d09ee7ce>] ipip6_tunnel_xmit+0x546/0x5ac [ipv6]
 [<c0597eb8>] dev_hard_start_xmit+0x15a/0x1b7
 [<c059983a>] dev_queue_xmit+0x154/0x1fe
 [<d09e6bfe>] ip6_output2+0x1db/0x1f9 [ipv6]
 [<d09e72d6>] ip6_output+0x6ba/0x6f4 [ipv6]
 [<d09e776a>] ip6_push_pending_frames+0x2c6/0x384 [ipv6]
 [<d09f624b>] udp_v6_push_pending_frames+0x154/0x176 [ipv6]
 [<d09f76ca>] udpv6_sendmsg+0x5ae/0x7b3 [ipv6]
 [<c05d1b98>] inet_sendmsg+0x3b/0x48
 [<c058e7a7>] sock_sendmsg+0xe8/0x103
 [<c058ef75>] sys_sendmsg+0x14d/0x1a8
 [<c0590098>] sys_socketcall+0x16b/0x186
 [<c0402e27>] syscall_call+0x7/0xb


Version-Release number of selected component (if applicable):
bind-9.3.2-32.fc6
kernel-2.6.17-1.2439.fc6

How reproducible:
I have been seeing this all day with no madwifi loaded. I haven't noticed it in
the logs before.

Steps to Reproduce:
1. Unknown
  
Actual results:
The log excerpt from above.

Comment 1 Trever Adams 2006-07-25 00:08:16 UTC

       [<c05d132e>] inet_ioctl+0x73/0x91
       [<c058ecfc>] sock_ioctl+0x1b5/0x1d3
       [<c046e3a5>] do_ioctl+0x19/0x50
       [<c046e5dc>] vfs_ioctl+0x200/0x217
       [<c046e63a>] sys_ioctl+0x47/0x61
       [<c0402e27>] syscall_call+0x7/0xb

       [<c058ef75>] sys_sendmsg+0x14d/0x1a8
       [<c0590098>] sys_socketcall+0x16b/0x186
       [<c0402e27>] syscall_call+0x7/0xb

other info that might help us debug this:

2 locks held by named/1528:
 #0:  (sk_lock-AF_INET6){--..}, at: [<d09f763b>] udpv6_sendmsg+0x51f/0x7b3 [ipv6]
 #1:  (&dev->_xmit_lock){-+..}, at: [<c059981f>] dev_queue_xmit+0x139/0x1fe

 [<c05b617f>] ip_output+0x1c6/0x1f3
 [<d09ee7ce>] ipip6_tunnel_xmit+0x546/0x5ac [ipv6]
 [<c0597eb8>] dev_hard_start_xmit+0x15a/0x1b7
 [<c059983a>] dev_queue_xmit+0x154/0x1fe
 [<d09e6bfe>] ip6_output2+0x1db/0x1f9 [ipv6]
 [<d09e72d6>] ip6_output+0x6ba/0x6f4 [ipv6]
 [<d09e776a>] ip6_push_pending_frames+0x2c6/0x384 [ipv6]
 [<d09f624b>] udp_v6_push_pending_frames+0x154/0x176 [ipv6]
 [<d09f76ca>] udpv6_sendmsg+0x5ae/0x7b3 [ipv6]
 [<c05d1b98>] inet_sendmsg+0x3b/0x48
 [<c058e7a7>] sock_sendmsg+0xe8/0x103
 [<c058ef75>] sys_sendmsg+0x14d/0x1a8
 [<c0590098>] sys_socketcall+0x16b/0x186
 [<c0402e27>] syscall_call+0x7/0xb



Sorry, the paste didn't work before. This is the complete excerpt.

Comment 2 Trever Adams 2006-07-25 00:10:33 UTC

=======================================================
[ INFO: possible circular locking dependency detected ]
-------------------------------------------------------
named/1528 is trying to acquire lock:
 (&dev->queue_lock){-+..}, at: [<c05997c0>] dev_queue_xmit+0xda/0x1fe

but task is already holding lock:
 (&dev->_xmit_lock){-+..}, at: [<c059981f>] dev_queue_xmit+0x139/0x1fe

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&dev->_xmit_lock){-+..}:
       [<c042cb4d>] lock_acquire+0x4b/0x6a
       [<c05ee675>] _spin_lock_bh+0x1e/0x2d
       [<c05a6267>] dev_activate+0xfe/0x12f
       [<c0598b01>] dev_open+0x4c/0x64
       [<c0597556>] dev_change_flags+0x50/0xf0
       [<c05d0db0>] devinet_ioctl+0x235/0x519
       [<c05d132e>] inet_ioctl+0x73/0x91
       [<c058ecfc>] sock_ioctl+0x1b5/0x1d3
       [<c046e3a5>] do_ioctl+0x19/0x50
       [<c046e5dc>] vfs_ioctl+0x200/0x217
       [<c046e63a>] sys_ioctl+0x47/0x61
       [<c0402e27>] syscall_call+0x7/0xb

-> #0 (&dev->queue_lock){-+..}:
       [<c042cb4d>] lock_acquire+0x4b/0x6a
       [<c05ee648>] _spin_lock+0x19/0x28
       [<c05997c0>] dev_queue_xmit+0xda/0x1fe
       [<c059e08a>] neigh_resolve_output+0x1cd/0x1f9
       [<c05b617f>] ip_output+0x1c6/0x1f3
       [<d09ee7ce>] ipip6_tunnel_xmit+0x546/0x5ac [ipv6]
       [<c0597eb8>] dev_hard_start_xmit+0x15a/0x1b7
       [<c059983a>] dev_queue_xmit+0x154/0x1fe
       [<d09e6bfe>] ip6_output2+0x1db/0x1f9 [ipv6]
       [<d09e72d6>] ip6_output+0x6ba/0x6f4 [ipv6]
       [<d09e776a>] ip6_push_pending_frames+0x2c6/0x384 [ipv6]
       [<d09f624b>] udp_v6_push_pending_frames+0x154/0x176 [ipv6]
       [<d09f76ca>] udpv6_sendmsg+0x5ae/0x7b3 [ipv6]
       [<c05d1b98>] inet_sendmsg+0x3b/0x48
       [<c058e7a7>] sock_sendmsg+0xe8/0x103
       [<c058ef75>] sys_sendmsg+0x14d/0x1a8
       [<c0590098>] sys_socketcall+0x16b/0x186
       [<c0402e27>] syscall_call+0x7/0xb


other info that might help us debug this:

2 locks held by named/1528:
 #0:  (sk_lock-AF_INET6){--..}, at: [<d09f763b>] udpv6_sendmsg+0x51f/0x7b3 [ipv6]
 #1:  (&dev->_xmit_lock){-+..}, at: [<c059981f>] dev_queue_xmit+0x139/0x1fe

stack backtrace:
 [<c04037a2>] show_trace_log_lvl+0x54/0xfd
 [<c0403d20>] show_trace+0xd/0x10
 [<c0403dbd>] dump_stack+0x19/0x1b
 [<c042bc4a>] print_circular_bug_tail+0x59/0x64
 [<c042c438>] __lock_acquire+0x7e3/0x987
 [<c042cb4d>] lock_acquire+0x4b/0x6a
 [<c05ee648>] _spin_lock+0x19/0x28
 [<c05997c0>] dev_queue_xmit+0xda/0x1fe
 [<c059e08a>] neigh_resolve_output+0x1cd/0x1f9
 [<c05b617f>] ip_output+0x1c6/0x1f3
 [<d09ee7ce>] ipip6_tunnel_xmit+0x546/0x5ac [ipv6]
 [<c0597eb8>] dev_hard_start_xmit+0x15a/0x1b7
 [<c059983a>] dev_queue_xmit+0x154/0x1fe
 [<d09e6bfe>] ip6_output2+0x1db/0x1f9 [ipv6]
 [<d09e72d6>] ip6_output+0x6ba/0x6f4 [ipv6]
 [<d09e776a>] ip6_push_pending_frames+0x2c6/0x384 [ipv6]
 [<d09f624b>] udp_v6_push_pending_frames+0x154/0x176 [ipv6]
 [<d09f76ca>] udpv6_sendmsg+0x5ae/0x7b3 [ipv6]
 [<c05d1b98>] inet_sendmsg+0x3b/0x48
 [<c058e7a7>] sock_sendmsg+0xe8/0x103
 [<c058ef75>] sys_sendmsg+0x14d/0x1a8
 [<c0590098>] sys_socketcall+0x16b/0x186
 [<c0402e27>] syscall_call+0x7/0xb

Comment 3 Jason Vas Dias 2006-07-25 14:59:01 UTC

This appears to be an issue with the new kernel lock detection code - 
tentatively moving this bug to the kernel component.

Please tell us :
 o Is this a multiprocessor machine ? How many processors ?
   (perhaps the output of /proc/cpuinfo might be useful)

 o Is there any degradation in named performance - ie. does the named process   
   actually lock up / hang or is there just the kernel stack dump and the 
   'possible circular locking dependency detected' log message ?
   If named is locking up, please enable named debugging:
   1. Ensure named.conf puts the named.run log in a writable directory,
      eg.:
      'logging {
               channel default_debug {
                     file "data/named.run";
                     severity dynamic;
               };
       };'
   2. Turn on named debugging from the command line with named running:
      # rndc trace 99
   3. Then send us both the $ROOTDIR/var/named/data/named.run file 
      (where $ROOTDIR may be set in /etc/sysconfig/named) and the 
      /var/log/messages containing the kernel lock detection message
      from the same time.
   Then we should be able to determine what named is doing to cause this
   kernel lock detection mechanism to trigger.
   If named is not locking up, then it appears to be a false alarm raised
   by the new kernel lock detection code.

Comment 4 Trever Adams 2006-07-25 19:14:51 UTC

I have not seen named locking up. I just figured it would be important to post
the bug. processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 5
model           : 8
model name      : AMD-K6(tm) 3D processor
stepping        : 12
cpu MHz         : 501.147
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
bogomips        : 1004.78


This is a single processor machine. This output is from the last kernel I used
and is currently tainted (the original bug report was not tainted).

Comment 7 Trever Adams 2006-09-05 12:09:34 UTC

Give me a few more days, but I have not seen this in the last few weeks. I will
close it if I see nothing within the next week.

Comment 8 Trever Adams 2006-09-13 22:34:10 UTC

This has been fixed. Closing.