Bug 167715

Summary: ipv6 badness in dst_release at include/net/dst.h:149
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.0
Hardware: i686
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Rob Braun <bbraun>
Assignee: David Miller <davem>
QA Contact: Brian Brock <bbrock>
CC: abo, jbaron, lwhatley, persteinar.iversen, terry
Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Last Closed: 2007-05-01 23:21:17 UTC
Bug Blocks: 176344
Attachments: Fix for ipv6 DST locking, backported from upstream. (flags: none)

Description Rob Braun 2005-09-07 15:32:32 UTC
Description of problem:
By default, this machine came up with IPv6 enabled.  No IPv6 services were
actively being used, although several other IPv6 enabled devices were on the
same switch, so there was IPv6 traffic being seen by this machine.  The IPv6
traffic was mostly router advertisements and such.  I do not know what other
systems were connected to the switch, as the machine is colocated at a remote
data center.  After several hours of being on the network, I started receiving
these messages in the log.  The machine also started rebooting, although I have
not 100% correlated the rebooting to this message.  This is the only suspicious
message in the logs prior to rebooting.

Sep  2 18:46:30 lh kernel: Badness in dst_release at include/net/dst.h:149
Sep  2 18:46:31 lh kernel:  [<f8bb3f3b>] ip6_dst_check+0x42/0x4a [ipv6]
Sep  2 18:46:31 lh kernel:  [<f8bac962>] ip6_dst_lookup+0x3d/0x12f [ipv6]
Sep  2 18:46:31 lh kernel:  [<f8bbbafe>] udpv6_sendmsg+0x4af/0x769 [ipv6]
Sep  2 18:46:31 lh kernel:  [<c01a2203>] socket_has_perm+0x51/0x5f
Sep  2 18:46:31 lh kernel:  [<c02acad1>] inet_sendmsg+0x38/0x42
Sep  2 18:46:31 lh kernel:  [<c026aa05>] sock_sendmsg+0xdb/0xf7
Sep  2 18:46:31 lh kernel:  [<c01281f8>] update_wall_time+0x8/0x30
Sep  2 18:46:31 lh kernel:  [<c011e912>] autoremove_wake_function+0x0/0x2d
Sep  2 18:46:31 lh kernel:  [<c026fa96>] verify_iovec+0x76/0xc2
Sep  2 18:46:31 lh kernel:  [<c026c157>] sys_sendmsg+0x1ee/0x23b
Sep  2 18:46:31 lh kernel:  [<c011b599>] activate_task+0x88/0x95
Sep  2 18:46:31 lh kernel:  [<c011ba1a>] try_to_wake_up+0x222/0x22d
Sep  2 18:46:31 lh kernel:  [<c011cf19>] __wake_up+0x29/0x3c
Sep  2 18:46:31 lh kernel:  [<c0132678>] wake_futex+0x3a/0x44
Sep  2 18:46:31 lh kernel:  [<c0132721>] futex_wake+0x9f/0xc5
Sep  2 18:46:31 lh kernel:  [<c026c540>] sys_socketcall+0x1c1/0x1dd
Sep  2 18:46:31 lh kernel:  [<c0124489>] sys_gettimeofday+0x53/0xac
Sep  2 18:46:31 lh kernel:  [<c02c62a3>] syscall_call+0x7/0xb
Sep  2 18:46:31 lh kernel:  [<c02c007b>] unix_detach_fds+0x2e/0x31

As a workaround, I have disabled IPv6, and the messages appear to have stopped.

Version-Release number of selected component (if applicable):
2.6.9-11.ELsmp

How reproducible:
It is very reproducible in my environment, although I am not exactly sure what
is causing it, making it difficult to reproduce elsewhere.

Comment 1 Jason Vas Dias 2006-07-27 00:14:08 UTC
Another report of this bug just came in, for the BIND named process on a
U3 i386 SMP kernel (I think; I've requested more details). It comes from a
customer at apnic.net with a large, busy BIND installation with IPv6 and
DNSSEC enabled. Here are the logs:

kernel: Badness in dst_release at include/net/dst.h:149
kernel:  [<f8a6df99>] udpv6_sendmsg+0x69c/0x770 [ipv6]
kernel:  [<c02b8aa1>] inet_sendmsg+0x38/0x42
kernel:  [<c027661d>] sock_sendmsg+0xdb/0xf7
kernel:  [<c0141aa9>] generic_file_buffered_write+0x3b0/0x47c
kernel:  [<c0120291>] autoremove_wake_function+0x0/0x2d
kernel:  [<c027b6c2>] verify_iovec+0x76/0xc2
kernel:  [<c0277d68>] sys_sendmsg+0x1ee/0x23b
kernel:  [<c011cc85>] activate_task+0x88/0x95
kernel:  [<c011d1a3>] try_to_wake_up+0x281/0x28c
kernel:  [<c011e7a1>] __wake_up+0x29/0x3c
kernel:  [<c01348bb>] wake_futex+0x3a/0x44
kernel:  [<c0134964>] futex_wake+0x9f/0xc5
kernel:  [<c027816f>] sys_socketcall+0x1df/0x1fb
kernel:  [<c0125fc5>] sys_gettimeofday+0x53/0xac
kernel:  [<c02d268f>] syscall_call+0x7/0xb
kernel:  [<c02d007b>] schedule+0x2f7/0x8d3

Could this possibly be related to the Fedora devel kernel's bug 200038,
where the new 'circular locking detection' mechanism is also triggered
by udpv6_sendmsg?

This bug could be quite important for our advertised full IPv6 support in
RHEL-5, so I'm tentatively bumping up the severity.

Comment 2 Terry Manderson 2006-07-27 02:02:30 UTC
Adding to Jason's comment, I am the customer having the issue.

The kernel is 2.6.9-34.0.2.ELsmp, an i386 kernel.

It is quite an active v6 server, located on the main v6 IX in Tokyo,
running bind with dnssec enabled (although I think that having
v6 services just triggers the issue more readily).

The issue doesn't cause named to core dump, so I let it run in the error state for some time and then
also got error messages like:

Jul 27 10:54:32 karashi kernel: Badness in dst_release at include/net/dst.h:149
Jul 27 10:54:33 karashi kernel:  [<f8a70709>] icmpv6_send+0x2a3/0x516 [ipv6]
Jul 27 10:54:33 karashi kernel:  [<c02b39a1>] udp_rcv+0x32e/0x33a
Jul 27 10:54:33 karashi kernel:  [<f8a6d550>] udpv6_rcv+0x2ac/0x4af [ipv6]
Jul 27 10:54:33 karashi kernel:  [<f8a5fa35>] ip6_input+0x18e/0x2a0 [ipv6]
Jul 27 10:54:33 karashi kernel:  [<f8a5f807>] ipv6_rcv+0x183/0x202 [ipv6]
Jul 27 10:54:33 karashi kernel:  [<c027f269>] netif_receive_skb+0x1f1/0x21f
Jul 27 10:54:33 karashi kernel:  [<f88a862e>] tg3_rx+0x284/0x35e [tg3]
Jul 27 10:54:33 karashi kernel:  [<f88a877f>] tg3_poll+0x77/0x12b [tg3]
Jul 27 10:54:33 karashi kernel:  [<c027f3f5>] net_rx_action+0x61/0xd8
Jul 27 10:54:33 karashi kernel:  [<c0126754>] __do_softirq+0x4c/0xb1
Jul 27 10:54:33 karashi kernel:  [<c0108143>] do_softirq+0x4f/0x56

So this is clearly affecting ICMP as well as UDP, the latter being what the kernel messages
Jason posted above reflect.

Stopping bind, then restarting the network (/etc/init.d/network restart), clears the issue
for a period of time, but it then returns.

Left alone, named begins to chew CPU and memory resources (I believe it is trying to answer
repeated unanswered requests, and trying to send the v6 answers with no joy). Eventually the
bind service is affected. This is a serious issue for us.

I have also submitted an RHN support service request, #943134.


Comment 3 Terry Manderson 2006-07-27 02:28:29 UTC
Also, it may be worth checking the differences between 2.6.9 and 2.6.11 from kernel.org for this patch:

http://oss.sgi.com/archives/netdev/2005-01/txt6VEiPGUBAU.txt

It appears to have been applied in later kernels. I couldn't find it in the Red Hat kernel source,
but I may have overlooked it; please confirm.

Comment 4 David Miller 2006-07-27 05:29:14 UTC
Yes, the patch at that URL would definitely fix the problem you
are seeing.

I remember this one, Jeff Garzik was seeing it on his systems.

ip6_dst_lookup() is called from both socket locked and socket
unlocked contexts.  Therefore, using __sk_dst_check() is racy.
We have to use sk_dst_check() which grabs the sk->sk_dst_lock.

UDPv6 is one of the non-locked contexts this gets invoked in, which
is why DNS servers will hit this quite readily.

So the thing to do is backport the mentioned patch.
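
To illustrate the race described above, here is a minimal user-space sketch. It is not kernel code and not the attached patch: the struct layouts, the helper names (dst_get_locked, dst_reset_locked, replay_unlocked, replay_locked) and the badness counter are all hypothetical, and a pthread mutex stands in for sk->sk_dst_lock. It assumes the "Badness in dst_release" message comes from a check that the reference count is at least 1 at release time. The two racing senders are replayed step by step on one thread so the outcome is deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct dst_entry and struct sock.
 * refcnt is atomic_t in the kernel; a plain int is enough here because
 * the racing callers are replayed one step at a time on a single thread. */
struct dst  { int refcnt; int obsolete; };
struct sock {
    pthread_mutex_t dst_lock;  /* models sk->sk_dst_lock */
    struct dst *dst_cache;     /* models sk->sk_dst_cache; the cache owns one reference */
};

static int badness;  /* number of times the refcount check would have warned */

static void dst_release(struct dst *d)
{
    if (!d)
        return;
    if (d->refcnt < 1)  /* condition assumed to be behind the "Badness" message */
        badness++;
    d->refcnt--;
}

/* Locked lookup: take a private reference while holding the lock. */
static struct dst *dst_get_locked(struct sock *sk)
{
    pthread_mutex_lock(&sk->dst_lock);
    struct dst *d = sk->dst_cache;
    if (d)
        d->refcnt++;
    pthread_mutex_unlock(&sk->dst_lock);
    return d;
}

/* Locked reset: detach the cached entry under the lock, so only one
 * caller ever drops the reference owned by the cache. */
static void dst_reset_locked(struct sock *sk)
{
    pthread_mutex_lock(&sk->dst_lock);
    struct dst *old = sk->dst_cache;
    sk->dst_cache = NULL;
    pthread_mutex_unlock(&sk->dst_lock);
    dst_release(old);
}

/* Two senders peek at the cache with no lock and no reference of their
 * own, both find the entry obsolete, and both release it. */
static int replay_unlocked(void)
{
    struct dst d = { .refcnt = 1, .obsolete = 1 };
    struct sock sk = { .dst_lock = PTHREAD_MUTEX_INITIALIZER, .dst_cache = &d };
    badness = 0;

    struct dst *a = sk.dst_cache;  /* sender A peeks */
    struct dst *b = sk.dst_cache;  /* sender B peeks before A has finished */
    if (a && a->obsolete) { sk.dst_cache = NULL; dst_release(a); }
    if (b && b->obsolete) { sk.dst_cache = NULL; dst_release(b); }  /* second drop of one reference */
    return badness;
}

/* Same interleaving, but each sender holds its own reference. */
static int replay_locked(void)
{
    struct dst d = { .refcnt = 1, .obsolete = 1 };
    struct sock sk = { .dst_lock = PTHREAD_MUTEX_INITIALIZER, .dst_cache = &d };
    badness = 0;

    struct dst *a = dst_get_locked(&sk);
    struct dst *b = dst_get_locked(&sk);
    if (a && a->obsolete) { dst_release(a); dst_reset_locked(&sk); }
    if (b && b->obsolete) { dst_release(b); dst_reset_locked(&sk); }
    assert(d.refcnt == 0);  /* every reference was dropped exactly once */
    return badness;
}
```

In this model replay_unlocked() counts one would-be warning, because the single reference owned by the cache is dropped twice, while replay_locked() counts none.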


Comment 5 David Miller 2006-07-27 05:35:59 UTC
Created attachment 133127 [details]
Fix for ipv6 DST locking, backported from upstream.

Comment 6 Terry Manderson 2006-07-27 05:40:41 UTC
Cool. What is the time frame for an RPM release of an updated kernel to get me out of this
bind ('scuse the pun)?

Comment 10 RHEL Program Management 2006-08-16 21:01:04 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this enhancement by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This enhancement is not yet committed for inclusion in an Update
release.

Comment 13 Jason Baron 2006-09-14 15:56:19 UTC
committed in stream U5 build 42.10. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 14 malek shabou 2006-09-22 15:32:07 UTC
Hi,
there is still a problem with IPv6 and fragmentation; see Service Request #1014896.

regards

Comment 17 Terry Manderson 2006-11-06 06:55:16 UTC
Finally got a scheduled downtime to implement the kernel.

Not even 7 hours of run-time...

Nov  6 14:27:06 karashi kernel: ------------[ cut here ]------------
Nov  6 14:27:06 karashi kernel: kernel BUG at net/ipv6/ip6_output.c:714!
Nov  6 14:27:06 karashi kernel: invalid operand: 0000 [#1]
Nov  6 14:27:06 karashi kernel: SMP
Nov  6 14:27:06 karashi kernel: Modules linked in: md5 ipv6 dm_mirror dm_mod button battery ac uhci_hcd ehci_hcd tg3 ext3 jbd megaraid_mbox megaraid_mm sd_mod scsi_mod
Nov  6 14:27:06 karashi kernel: CPU:    0
Nov  6 14:27:06 karashi kernel: EIP:    0060:[<f8a6c8a8>]    Not tainted VLI
Nov  6 14:27:06 karashi kernel: EFLAGS: 00010282   (2.6.9-42.22.ELsmp)
Nov  6 14:27:06 karashi kernel: EIP is at ip6_fragment+0x685/0x7ce [ipv6]
Nov  6 14:27:06 karashi kernel: eax: fffffff2   ebx: f5dfca80   ecx: 00000000   edx: f6cf2a80
Nov  6 14:27:06 karashi kernel: esi: f3b57160   edi: fffffd30   ebp: fffffd30   esp: f7274c14
Nov  6 14:27:06 karashi kernel: ds: 007b   es: 007b   ss: 0068
Nov  6 14:27:06 karashi kernel: Process named (pid: 3610, threadinfo=f7274000 task=f7187830)
Nov  6 14:27:06 karashi kernel: Stack: 00000000 86274eb0 00000000 000007c8 83010000 ffffff31 000007c8 fffffd30
Nov  6 14:27:06 karashi kernel:        f3b55758 f689d080 f8a6b409 f5dfca80 f6cf2a80 f3b569c8 f6cf2a80 f3b56998
Nov  6 14:27:06 karashi kernel:        f69f9880 00000000 f8a6d5da f621ee50 f689d080 00000000 f621eea4 f621ee00
Nov  6 14:27:06 karashi kernel: Call Trace:
Nov  6 14:27:06 karashi kernel:  [<f8a6b409>] ip6_output2+0x0/0x235 [ipv6]
Nov  6 14:27:06 karashi kernel:  [<f8a6d5da>] ip6_push_pending_frames+0x291/0x369 [ipv6]
Nov  6 14:27:06 karashi kernel:  [<f8a7bb2d>] udp_v6_push_pending_frames+0x169/0x185 [ipv6]
Nov  6 14:27:06 karashi kernel:  [<f8a7c16b>] udpv6_sendmsg+0x622/0x770 [ipv6]
Nov  6 14:27:06 karashi kernel:  [<c027d0e7>] skb_dequeue+0x40/0x46
Nov  6 14:27:06 karashi kernel:  [<c027dc5d>] skb_recv_datagram+0x61/0x9b
Nov  6 14:27:06 karashi kernel:  [<c02bb071>] inet_sendmsg+0x38/0x42
Nov  6 14:27:06 karashi kernel:  [<c027836d>] sock_sendmsg+0xdb/0xf7
Nov  6 14:27:06 karashi kernel:  [<c02d5822>] reschedule_interrupt+0x1a/0x20
Nov  6 14:27:06 karashi kernel:  [<c011de62>] find_busiest_group+0xdd/0x295
Nov  6 14:27:06 karashi kernel:  [<c0120519>] autoremove_wake_function+0x0/0x2d
Nov  6 14:27:06 karashi kernel:  [<c0279ab8>] sys_sendmsg+0x1ee/0x23b
Nov  6 14:27:06 karashi kernel:  [<c02d3576>] schedule_timeout+0xb9/0x154
Nov  6 14:27:06 karashi kernel:  [<c0150835>] find_extend_vma+0x12/0x4f
Nov  6 14:27:06 karashi kernel:  [<c01350ec>] unqueue_me+0x73/0x79
Nov  6 14:27:06 karashi kernel:  [<c0150835>] find_extend_vma+0x12/0x4f
Nov  6 14:27:06 karashi kernel:  [<c0134b9f>] get_futex_key+0x39/0x108
Nov  6 14:27:06 karashi kernel:  [<c0134d78>] futex_wake+0x9f/0xc5
Nov  6 14:27:06 karashi kernel:  [<c0279ebf>] sys_socketcall+0x1df/0x1fb
Nov  6 14:27:06 karashi kernel:  [<c02d4e27>] syscall_call+0x7/0xb
Nov  6 14:27:06 karashi kernel:  [<c02d007b>] ipv6_skip_exthdr_nolen+0x71/0x100
Nov  6 14:27:06 karashi kernel: Code: 89 44 24 10 eb 0b 8b 4c 24 10 8b 54 24 20 89 4a 04 8b 44 24 2c 8b 48 24 8b 44 24 30 55 8b 54 24 10 e8 e3 fe 80 c7 5f 85 c0 74 08 <0f> 0b ca 02 62 c9 a8 f8 0f b7 44 24 08 0f b6 d0 c1 e2 08 c1 e8
Nov  6 14:27:06 karashi kernel:  <0>Fatal exception: panic in 5 seconds

Comment 18 David Miller 2006-11-14 22:46:46 UTC
You are getting a different crash than the one for the bug being
worked on in this bugzilla.  Please open up a new bug report.


Comment 22 Lee Whatley 2007-03-07 15:16:31 UTC
Is it safe to say that this problem only affects SMP kernels?  I ask because I
just put together a machine using IPv6 that is experiencing this exact same
problem.  All of my web searching has produced references to SMP kernels.
I am wondering whether rebooting into the uniprocessor kernel would fix the
problem (unfortunately, I can't reboot this machine right now).

Comment 23 Mike Gahagan 2007-04-11 18:07:36 UTC
Fix is in the -54 kernel.


Comment 25 Neil Horman 2007-04-26 15:51:33 UTC
*** Bug 227849 has been marked as a duplicate of this bug. ***

Comment 26 Red Hat Bugzilla 2007-05-01 23:21:17 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html


Comment 29 John Feeney 2012-03-22 14:18:55 UTC
*** Bug 237622 has been marked as a duplicate of this bug. ***