Description of problem: By default, this machine came up with IPv6 enabled. No IPv6 services were actively in use. However, several other IPv6-enabled devices were on the same switch, so the machine was seeing IPv6 traffic, mostly router advertisements and the like. I do not know what other systems were connected to the switch, because the machine is colocated at a remote data center. After several hours on the network, I started receiving these messages in the log. The machine also started rebooting, although I have not yet firmly correlated the reboots with this message. It is the only suspicious message in the logs before each reboot:
Sep 2 18:46:30 lh kernel: Badness in dst_release at include/net/dst.h:149
Sep 2 18:46:31 lh kernel: [<f8bb3f3b>] ip6_dst_check+0x42/0x4a [ipv6]
Sep 2 18:46:31 lh kernel: [<f8bac962>] ip6_dst_lookup+0x3d/0x12f [ipv6]
Sep 2 18:46:31 lh kernel: [<f8bbbafe>] udpv6_sendmsg+0x4af/0x769 [ipv6]
Sep 2 18:46:31 lh kernel: [<c01a2203>] socket_has_perm+0x51/0x5f
Sep 2 18:46:31 lh kernel: [<c02acad1>] inet_sendmsg+0x38/0x42
Sep 2 18:46:31 lh kernel: [<c026aa05>] sock_sendmsg+0xdb/0xf7
Sep 2 18:46:31 lh kernel: [<c01281f8>] update_wall_time+0x8/0x30
Sep 2 18:46:31 lh kernel: [<c011e912>] autoremove_wake_function+0x0/0x2d
Sep 2 18:46:31 lh kernel: [<c026fa96>] verify_iovec+0x76/0xc2
Sep 2 18:46:31 lh kernel: [<c026c157>] sys_sendmsg+0x1ee/0x23b
Sep 2 18:46:31 lh kernel: [<c011b599>] activate_task+0x88/0x95
Sep 2 18:46:31 lh kernel: [<c011ba1a>] try_to_wake_up+0x222/0x22d
Sep 2 18:46:31 lh kernel: [<c011cf19>] __wake_up+0x29/0x3c
Sep 2 18:46:31 lh kernel: [<c0132678>] wake_futex+0x3a/0x44
Sep 2 18:46:31 lh kernel: [<c0132721>] futex_wake+0x9f/0xc5
Sep 2 18:46:31 lh kernel: [<c026c540>] sys_socketcall+0x1c1/0x1dd
Sep 2 18:46:31 lh kernel: [<c0124489>] sys_gettimeofday+0x53/0xac
Sep 2 18:46:31 lh kernel: [<c02c62a3>] syscall_call+0x7/0xb
Sep 2 18:46:31 lh kernel: [<c02c007b>] unix_detach_fds+0x2e/0x31
As a workaround, I have disabled IPv6, and the messages appear to have stopped.
Version-Release number of selected component (if applicable): 2.6.9-11.ELsmp
How reproducible: It reproduces reliably in my environment. However, I am not sure what triggers it, so it may be difficult to reproduce elsewhere.
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
Another report of this bug just came in, from a customer at apnic.net who runs a large, busy BIND installation with IPv6 and DNSSEC enabled. The affected process is BIND's named, and I think it is running on a U3 i386 SMP kernel. I've requested more details. Here are the logs:
kernel: Badness in dst_release at include/net/dst.h:149
kernel: [<f8a6df99>] udpv6_sendmsg+0x69c/0x770 [ipv6]
kernel: [<c02b8aa1>] inet_sendmsg+0x38/0x42
kernel: [<c027661d>] sock_sendmsg+0xdb/0xf7
kernel: [<c0141aa9>] generic_file_buffered_write+0x3b0/0x47c
kernel: [<c0120291>] autoremove_wake_function+0x0/0x2d
kernel: [<c027b6c2>] verify_iovec+0x76/0xc2
kernel: [<c0277d68>] sys_sendmsg+0x1ee/0x23b
kernel: [<c011cc85>] activate_task+0x88/0x95
kernel: [<c011d1a3>] try_to_wake_up+0x281/0x28c
kernel: [<c011e7a1>] __wake_up+0x29/0x3c
kernel: [<c01348bb>] wake_futex+0x3a/0x44
kernel: [<c0134964>] futex_wake+0x9f/0xc5
kernel: [<c027816f>] sys_socketcall+0x1df/0x1fb
kernel: [<c0125fc5>] sys_gettimeofday+0x53/0xac
kernel: [<c02d268f>] syscall_call+0x7/0xb
kernel: [<c02d007b>] schedule+0x2f7/0x8d3
Could this be related to bug 200038 in the Fedora devel kernel, where udpv6_sendmsg also triggers the new 'circular locking detection' mechanism? This bug could matter a great deal for our advertised full IPv6 support in RHEL-5, so I'm tentatively raising the severity.
Adding to Jason's comment: I am the customer having the issue. We run the 2.6.9-34.0.2.ELsmp kernel on i386. It is quite an active v6 server, located on the main v6 IX in Tokyo, and it runs BIND with DNSSEC enabled. I think, though, that simply having v6 services triggers the issue more readily. The issue doesn't cause named to dump core, so I let it run in the error state for some time, and then I also get error messages like:
Jul 27 10:54:32 karashi kernel: Badness in dst_release at include/net/dst.h:149
Jul 27 10:54:33 karashi kernel: [<f8a70709>] icmpv6_send+0x2a3/0x516 [ipv6]
Jul 27 10:54:33 karashi kernel: [<c02b39a1>] udp_rcv+0x32e/0x33a
Jul 27 10:54:33 karashi kernel: [<f8a6d550>] udpv6_rcv+0x2ac/0x4af [ipv6]
Jul 27 10:54:33 karashi kernel: [<f8a5fa35>] ip6_input+0x18e/0x2a0 [ipv6]
Jul 27 10:54:33 karashi kernel: [<f8a5f807>] ipv6_rcv+0x183/0x202 [ipv6]
Jul 27 10:54:33 karashi kernel: [<c027f269>] netif_receive_skb+0x1f1/0x21f
Jul 27 10:54:33 karashi kernel: [<f88a862e>] tg3_rx+0x284/0x35e [tg3]
Jul 27 10:54:33 karashi kernel: [<f88a877f>] tg3_poll+0x77/0x12b [tg3]
Jul 27 10:54:33 karashi kernel: [<c027f3f5>] net_rx_action+0x61/0xd8
Jul 27 10:54:33 karashi kernel: [<c0126754>] __do_softirq+0x4c/0xb1
Jul 27 10:54:33 karashi kernel: [<c0108143>] do_softirq+0x4f/0x56
This shows the issue affects ICMP as well as UDP, which the kernel messages Jason posted above also reflect. Stopping BIND and then restarting the network (/etc/init.d/network restart) clears the issue for a while, but it then returns. If the issue is left alone, named begins to chew up CPU and memory. I believe it keeps trying to answer repeated unanswered requests and fails to send the v6 answers. Eventually the BIND service itself is affected. This is a serious issue for us. I have also submitted an RHN support service request, #943134.
Also, it may be worth comparing 2.6.9 and 2.6.11 from kernel.org for this patch, which appears to have been applied in later kernels: http://oss.sgi.com/archives/netdev/2005-01/txt6VEiPGUBAU.txt I couldn't find it in the Red Hat kernel source, but I may have overlooked it. Please confirm.
Yes, the patch at that URL would definitely fix the problem you are seeing. I remember this one: Jeff Garzik was seeing it on his systems. ip6_dst_lookup() is called from both socket-locked and socket-unlocked contexts, so using __sk_dst_check() is racy. We have to use sk_dst_check(), which grabs sk->sk_dst_lock. UDPv6 is one of the unlocked contexts this gets invoked in, and that is why DNS servers hit it so readily. So the thing to do is backport the mentioned patch.
Created attachment 133127: Fix for IPv6 DST locking, backported from upstream.
Cool. What is the time frame for an RPM release of an updated kernel to get me out of this bind ('scuse the pun)?
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this enhancement by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This enhancement is not yet committed for inclusion in an Update release.
Committed in stream U5 build 42.10. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
Hi, there is still a problem with IPv6 and fragmentation; see Service Request #1014896. Regards
I finally got a scheduled downtime to install the kernel. Not even 7 hours of run-time:
Nov 6 14:27:06 karashi kernel: ------------[ cut here ]------------
Nov 6 14:27:06 karashi kernel: kernel BUG at net/ipv6/ip6_output.c:714!
Nov 6 14:27:06 karashi kernel: invalid operand: 0000 [#1]
Nov 6 14:27:06 karashi kernel: SMP
Nov 6 14:27:06 karashi kernel: Modules linked in: md5 ipv6 dm_mirror dm_mod button battery ac uhci_hcd ehci_hcd tg3 ext3 jbd megaraid_mbox megaraid_mm sd_mod scsi_mod
Nov 6 14:27:06 karashi kernel: CPU: 0
Nov 6 14:27:06 karashi kernel: EIP: 0060:[<f8a6c8a8>] Not tainted VLI
Nov 6 14:27:06 karashi kernel: EFLAGS: 00010282 (2.6.9-42.22.ELsmp)
Nov 6 14:27:06 karashi kernel: EIP is at ip6_fragment+0x685/0x7ce [ipv6]
Nov 6 14:27:06 karashi kernel: eax: fffffff2 ebx: f5dfca80 ecx: 00000000 edx: f6cf2a80
Nov 6 14:27:06 karashi kernel: esi: f3b57160 edi: fffffd30 ebp: fffffd30 esp: f7274c14
Nov 6 14:27:06 karashi kernel: ds: 007b es: 007b ss: 0068
Nov 6 14:27:06 karashi kernel: Process named (pid: 3610, threadinfo=f7274000 task=f7187830)
Nov 6 14:27:06 karashi kernel: Stack: 00000000 86274eb0 00000000 000007c8 83010000 ffffff31 000007c8 fffffd30
Nov 6 14:27:06 karashi kernel: f3b55758 f689d080 f8a6b409 f5dfca80 f6cf2a80 f3b569c8 f6cf2a80 f3b56998
Nov 6 14:27:06 karashi kernel: f69f9880 00000000 f8a6d5da f621ee50 f689d080 00000000 f621eea4 f621ee00
Nov 6 14:27:06 karashi kernel: Call Trace:
Nov 6 14:27:06 karashi kernel: [<f8a6b409>] ip6_output2+0x0/0x235 [ipv6]
Nov 6 14:27:06 karashi kernel: [<f8a6d5da>] ip6_push_pending_frames+0x291/0x369 [ipv6]
Nov 6 14:27:06 karashi kernel: [<f8a7bb2d>] udp_v6_push_pending_frames+0x169/0x185 [ipv6]
Nov 6 14:27:06 karashi kernel: [<f8a7c16b>] udpv6_sendmsg+0x622/0x770 [ipv6]
Nov 6 14:27:06 karashi kernel: [<c027d0e7>] skb_dequeue+0x40/0x46
Nov 6 14:27:06 karashi kernel: [<c027dc5d>] skb_recv_datagram+0x61/0x9b
Nov 6 14:27:06 karashi kernel: [<c02bb071>] inet_sendmsg+0x38/0x42
Nov 6 14:27:06 karashi kernel: [<c027836d>] sock_sendmsg+0xdb/0xf7
Nov 6 14:27:06 karashi kernel: [<c02d5822>] reschedule_interrupt+0x1a/0x20
Nov 6 14:27:06 karashi kernel: [<c011de62>] find_busiest_group+0xdd/0x295
Nov 6 14:27:06 karashi kernel: [<c0120519>] autoremove_wake_function+0x0/0x2d
Nov 6 14:27:06 karashi kernel: [<c0279ab8>] sys_sendmsg+0x1ee/0x23b
Nov 6 14:27:06 karashi kernel: [<c02d3576>] schedule_timeout+0xb9/0x154
Nov 6 14:27:06 karashi kernel: [<c0150835>] find_extend_vma+0x12/0x4f
Nov 6 14:27:06 karashi kernel: [<c01350ec>] unqueue_me+0x73/0x79
Nov 6 14:27:06 karashi kernel: [<c0150835>] find_extend_vma+0x12/0x4f
Nov 6 14:27:06 karashi kernel: [<c0134b9f>] get_futex_key+0x39/0x108
Nov 6 14:27:06 karashi kernel: [<c0134d78>] futex_wake+0x9f/0xc5
Nov 6 14:27:06 karashi kernel: [<c0279ebf>] sys_socketcall+0x1df/0x1fb
Nov 6 14:27:06 karashi kernel: [<c02d4e27>] syscall_call+0x7/0xb
Nov 6 14:27:06 karashi kernel: [<c02d007b>] ipv6_skip_exthdr_nolen+0x71/0x100
Nov 6 14:27:06 karashi kernel: Code: 89 44 24 10 eb 0b 8b 4c 24 10 8b 54 24 20 89 4a 04 8b 44 24 2c 8b 48 24 8b 44 24 30 55 8b 54 24 10 e8 e3 fe 80 c7 5f 85 c0 74 08 <0f> 0b ca 02 62 c9 a8 f8 0f b7 44 24 08 0f b6 d0 c1 e2 08 c1 e8
Nov 6 14:27:06 karashi kernel: <0>Fatal exception: panic in 5 seconds
You are getting a different crash than the one for the bug being worked on in this bugzilla. Please open up a new bug report.
Is it safe to say that this problem only affects SMP kernels? I ask because I just put together a machine that uses IPv6, and it is experiencing this exact problem. Every reference I have found while searching the web involves an SMP kernel. Would rebooting into the uniprocessor kernel fix the problem? Unfortunately, I can't reboot this machine right now to try it.
Fix is in the -54 kernel.
*** Bug 227849 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0304.html
*** Bug 237622 has been marked as a duplicate of this bug. ***