Description of problem:

Kernel caught BUG_ON:

Kernel BUG at include/linux/timer.h:82

Call trace:
  add_timer
  rt_secret_rebuild_oneshot
  rt_emergency_hash_rebuild
  rt_intern_hash
  __ip_route_output_key
  ip_route_output_flow
  ip_queue_xmit
  tcp_transmit_skb
  tcp_sendmsg

Original dmesg:

Route hash chain too long!
Route hash chain too long!
Adjust your secret_interval!
Adjust your secret_interval!
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at include/linux/timer.h:82
...
Process mysqld (pid: 55952, veid=9430, threadinfo ffff8100523e6000, task ffff81004ac43000)
Stack: ffff81042e0a8300 0000000000000000 ffff810300000008 ffff8100523e7bb0
 0000000000379e80 ffff81036cb99970 000000010c7fb028 0000000000000000
 00000001ffffffff ffff81038f3def00 ffff8100523e7b48 ffff810325d89508
Call Trace:
 [<ffffffff8005dbf9>] __ip_route_output_key+0x8fd/0x97f
 [<ffffffff8024d7ff>] ip_route_output_flow+0x1e/0x290
 [<ffffffff8003701f>] ip_queue_xmit+0x110/0x608
 [<ffffffff80023745>] tcp_transmit_skb+0x725/0x75d
 [<ffffffff8000c54b>] cache_alloc_debugcheck_after+0x40/0x1c1
 [<ffffffff80035e0f>] __tcp_push_pending_frames+0x792/0x87e
 [<ffffffff80030854>] __alloc_skb+0x8a/0x15e
 [<ffffffff800283fb>] tcp_sendmsg+0xb3f/0xc59
 [<ffffffff8008bfff>] enqueue_task+0x41/0x56
 [<ffffffff8003b066>] do_sock_write+0xa8/0xe4
 [<ffffffff8004bea6>] sock_aio_write+0x4f/0x5e
 [<ffffffff80018e93>] do_sync_write+0xc7/0x104
 [<ffffffff80033692>] release_sock+0x2f/0xd6
 [<ffffffff8022aae4>] sock_setsockopt+0x576/0x588
 [<ffffffff800a55ba>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8023ff7d>] compat_sys_setsockopt+0x406/0x41a
 [<ffffffff800175ec>] vfs_write+0xc0/0x153
 [<ffffffff80017c83>] sys_write+0x49/0xbf
 [<ffffffff80065766>] ia32_sysret+0x0/0xa

Version-Release number of selected component (if applicable):
2.6.18-164.11.1.el5

The issue is in the patch linux-2.6-net-allow-for-on-demand-emergency-route-cache-flushing.patch

Also see bug #545411, which is also related to the *-route-cache-flushing* patch.
Created attachment 394697
fix-route-add_timer.patch

PATCH: Fix caught BUG_ON during rt_secret_rebuild()
The problem was found by vzbugs.
hi,

I don't see this change upstream, and it looks like it should be there. Any plan to post it upstream?

thanks,
jirka
(In reply to comment #3)
> hi,
>
> I dont see this change in the upstream and it looks like it should be there.
> Any plan to post it upstream?
>
> thanks,
> jirka

I will send it upstream in two days.
Could you please post a link to the patch once it has been sent?

thanks,
jirka
(In reply to comment #5)
> could you plz post link to the sent patch once it's out

"route: Fix caught BUG_ON during rt_secret_rebuild_oneshot()"
http://patchwork.ozlabs.org/patch/47840/
thanks, any idea how to reproduce this? jirka
Yes. You should do two things in parallel:

1. Change the /proc/sys/net/ipv4/route/secret_interval value.
2. Force the rt cache to grow, e.g. by routing many packets with different IPs through the node.
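The two parallel steps above could be driven by something like the following. This is only a rough sketch, not a verified reproducer: it assumes Python 3 and root privileges on the test machine, the destination range 198.18.0.0/15, the UDP port, the two interval values, and the 60 second run time are arbitrary choices, and it sends packets from the node itself rather than forwarding them through it. All function names are made up for illustration.

```python
import ipaddress
import itertools
import socket
import threading
import time

# Path taken from the comment above.
SECRET_INTERVAL = "/proc/sys/net/ipv4/route/secret_interval"


def destinations(network="198.18.0.0/15"):
    """Yield distinct host addresses (the default range is an assumption)."""
    for host in ipaddress.ip_network(network).hosts():
        yield str(host)


def toggle_secret_interval(stop, values=(600, 601), delay=0.01):
    """Step 1: keep rewriting secret_interval (values are arbitrary)."""
    for value in itertools.cycle(values):
        if stop.is_set():
            return
        with open(SECRET_INTERVAL, "w") as f:
            f.write("%d\n" % value)
        time.sleep(delay)


def grow_route_cache(stop, port=9):
    """Step 2: send one UDP datagram to each distinct destination."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for dst in destinations():
            if stop.is_set():
                return
            try:
                sock.sendto(b"x", (dst, port))
            except OSError:
                pass  # unreachable destinations are expected
    finally:
        sock.close()


def main(duration=60.0):
    """Run both steps in parallel for `duration` seconds."""
    stop = threading.Event()
    workers = [
        threading.Thread(target=toggle_secret_interval, args=(stop,)),
        threading.Thread(target=grow_route_cache, args=(stop,)),
    ]
    for worker in workers:
        worker.start()
    time.sleep(duration)
    stop.set()
    for worker in workers:
        worker.join()


if __name__ == "__main__":
    main()
```

Since the race window is narrow, a run like this may well finish without hitting the BUG.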
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-200.el5 You can download this test kernel from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcomed.
I have been trying to reproduce the issue over the past few days with kernel 2.6.18-194.el5, but I couldn't trigger it. I did the following:

1. Changed the /proc/sys/net/ipv4/route/secret_interval value to 86400.

2. Changed the route cache hash entries to a smaller value:

[root@intel-s3e8132-01 ~]# dmesg | grep "IP route"
IP route cache hash table entries: 16 (order: -5, 128 bytes)
[root@intel-s3e8132-01 ~]# cat /proc/sys/net/ipv4/route/max_size
256

3. Routed many packets with different destination IPs, and the route cache grew:

[root@intel-s3e8132-01 ~]# ip -o route ls cache|wc -l
127

But it seems the route cache prunes itself when it reaches the limit? I couldn't get the message "Route hash chain too long!".

Any ideas will be much appreciated.
You will not be able to easily reproduce the issue, as this is a rare race. If you cannot accept the patch by *only* reviewing it - you can close the bug.
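To make the kind of race meant here easier to review, below is a toy user-space model. It is not kernel code and not the attached patch. It assumes that the BUG at include/linux/timer.h:82 is the "timer must not already be pending" check in add_timer(), and that two unsynchronized callers each do "delete the timer, then arm it again". The names ToyTimer, KernelBug, rearm_steps and interleave are made up for illustration, and the +600 expiry offset is arbitrary. The mod_timer() variant shows one common way to avoid this class of bug; the thread does not say whether the patch does exactly that.

```python
class KernelBug(Exception):
    """Stands in for the kernel's BUG_ON() firing."""


class ToyTimer:
    """Minimal model of a kernel timer: only the 'pending' state matters."""

    def __init__(self):
        self.pending = False
        self.expires = 0


def del_timer(timer):
    timer.pending = False


def add_timer(timer):
    # Models the check that fires in the report: arming an already
    # pending timer is treated as a bug.
    if timer.pending:
        raise KernelBug("add_timer() called on a pending timer")
    timer.pending = True


def mod_timer(timer, expires):
    # Re-arming is valid whether or not the timer is already pending.
    timer.expires = expires
    timer.pending = True


def rearm_steps(timer, use_mod_timer):
    """Split one 'delete, then re-arm' sequence into two separate steps."""

    def delete():
        del_timer(timer)

    def arm():
        if use_mod_timer:
            mod_timer(timer, timer.expires + 600)
        else:
            add_timer(timer)

    return delete, arm


def interleave(use_mod_timer):
    """Run two unsynchronized callers A and B: both delete, then both arm."""
    timer = ToyTimer()
    timer.pending = True
    del_a, arm_a = rearm_steps(timer, use_mod_timer)
    del_b, arm_b = rearm_steps(timer, use_mod_timer)
    try:
        del_a()
        del_b()
        arm_a()
        arm_b()  # with add_timer, the timer is already pending here
    except KernelBug:
        return "BUG"
    return "ok"
```

In this model, interleave(False) returns "BUG" and interleave(True) returns "ok". The failure needs both callers to pass the delete step before either one re-arms, which is why the window is so small in practice.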
(In reply to comment #17)
> If you cannot accept the patch by *only* reviewing it - you can close the bug.

AFAIU the patch was actually accepted (as per comment #14). It has not yet appeared in a released kernel, but it will be there. I am not sure what the RH policy is wrt closing bugs.

Dayong's question is not about whether to accept the patch; he just tried to verify the bug.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html