Bug 566104 - route: BUG at include/linux/timer.h:82 (call from rt_secret_rebuild_oneshot)
Summary: route: BUG at include/linux/timer.h:82 (call from rt_secret_rebuild_oneshot)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Jiri Olsa
QA Contact: Network QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-02-17 09:20 UTC by Vitaliy Gusev
Modified: 2011-01-13 21:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-13 21:07:06 UTC
Target Upstream Version:
Embargoed:


Attachments
fix-route-add_timer.patch (1.65 KB, patch)
2010-02-17 09:24 UTC, Vitaliy Gusev


Links
System: Red Hat Product Errata
ID: RHSA-2011:0017
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update
Last Updated: 2011-01-13 10:37:42 UTC

Description Vitaliy Gusev 2010-02-17 09:20:15 UTC
Description of problem:

Kernel caught BUG_ON:
Kernel BUG at include/linux/timer.h:82
Call trace:

  add_timer
  rt_secret_rebuild_oneshot
  rt_emergency_hash_rebuild
  rt_intern_hash
  __ip_route_output_key
  ip_route_output_flow
  ip_queue_xmit
  tcp_transmit_skb
  tcp_sendmsg

Original dmesg:

  Route hash chain too long!
  Route hash chain too long!
  Adjust your secret_interval!
  Adjust your secret_interval!
  ----------- [cut here ] --------- [please bite here ] ---------
  Kernel BUG at include/linux/timer.h:82
...
  Process mysqld (pid: 55952, veid=9430, threadinfo ffff8100523e6000, task ffff81004ac43000)
Stack:  ffff81042e0a8300 0000000000000000 ffff810300000008 ffff8100523e7bb0
 0000000000379e80 ffff81036cb99970 000000010c7fb028 0000000000000000
 00000001ffffffff ffff81038f3def00 ffff8100523e7b48 ffff810325d89508
Call Trace:
 [<ffffffff8005dbf9>] __ip_route_output_key+0x8fd/0x97f
 [<ffffffff8024d7ff>] ip_route_output_flow+0x1e/0x290
 [<ffffffff8003701f>] ip_queue_xmit+0x110/0x608
 [<ffffffff80023745>] tcp_transmit_skb+0x725/0x75d
 [<ffffffff8000c54b>] cache_alloc_debugcheck_after+0x40/0x1c1
 [<ffffffff80035e0f>] __tcp_push_pending_frames+0x792/0x87e
 [<ffffffff80030854>] __alloc_skb+0x8a/0x15e
 [<ffffffff800283fb>] tcp_sendmsg+0xb3f/0xc59
 [<ffffffff8008bfff>] enqueue_task+0x41/0x56
 [<ffffffff8003b066>] do_sock_write+0xa8/0xe4
 [<ffffffff8004bea6>] sock_aio_write+0x4f/0x5e
 [<ffffffff80018e93>] do_sync_write+0xc7/0x104
 [<ffffffff80033692>] release_sock+0x2f/0xd6
 [<ffffffff8022aae4>] sock_setsockopt+0x576/0x588
 [<ffffffff800a55ba>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8023ff7d>] compat_sys_setsockopt+0x406/0x41a
 [<ffffffff800175ec>] vfs_write+0xc0/0x153
 [<ffffffff80017c83>] sys_write+0x49/0xbf
 [<ffffffff80065766>] ia32_sysret+0x0/0xa

Version-Release number of selected component (if applicable):

 2.6.18-164.11.1.el5

  The issue is in the patch
  linux-2.6-net-allow-for-on-demand-emergency-route-cache-flushing.patch

  Also see bug #545411, which is also related to the *-route-cache-flushing*
  patch.
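
For readers following the trace: the top frame is add_timer, so the BUG at include/linux/timer.h:82 is most likely the check that refuses to arm a timer that is already pending. The attached patch is not quoted in this report, so the following is only a userspace Python model under that assumption, not the kernel code and not the actual fix. Timer, TimerBug, rebuild_steps and interleave are hypothetical names. The model shows how two unsynchronized callers that each delete and then re-add the same timer can hit the check, and how a re-arm that tolerates a pending timer avoids it.

```python
class TimerBug(Exception):
    """Stands in for the kernel's BUG_ON firing."""


class Timer:
    """Toy stand-in for a kernel timer: tracks only pending state and expiry."""

    def __init__(self):
        self.pending = False
        self.expires = 0

    def del_timer(self):
        self.pending = False

    def add_timer(self, expires):
        # Assumed behaviour: arming an already pending timer is fatal.
        if self.pending:
            raise TimerBug("add_timer() on a pending timer")
        self.expires = expires
        self.pending = True

    def mod_timer(self, expires):
        # Re-arming is legal whether or not the timer is pending.
        self.expires = expires
        self.pending = True


def rebuild_steps(timer, expires, arm):
    """Yield after each step so two callers can be interleaved by hand."""
    timer.del_timer()
    yield "deleted"
    arm(timer, expires)
    yield "armed"


def interleave(arm):
    """Run two rebuild callers as: A deletes, B deletes, A arms, B arms."""
    timer = Timer()
    a = rebuild_steps(timer, 100, arm)
    b = rebuild_steps(timer, 200, arm)
    try:
        next(a)
        next(b)
        next(a)
        next(b)
    except TimerBug as exc:
        return f"BUG: {exc}"
    return f"ok, expires={timer.expires}"


racy = interleave(Timer.add_timer)
safe = interleave(Timer.mod_timer)
print(racy)
print(safe)
```

With this fixed interleaving, the add_timer variant reports the modeled BUG, while the mod_timer variant finishes with the second caller's expiry in place.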

Comment 1 Vitaliy Gusev 2010-02-17 09:24:49 UTC
Created attachment 394697
fix-route-add_timer.patch

PATCH: Fix caught BUG_ON during rt_secret_rebuild()

Comment 2 Vitaliy Gusev 2010-02-17 09:26:09 UTC
The problem was found by vzbugs.

Comment 3 Jiri Olsa 2010-03-12 10:41:18 UTC
hi,

I don't see this change upstream and it looks like it should be there.
Any plan to post it upstream?

thanks,
jirka

Comment 4 Vitaliy Gusev 2010-03-12 11:20:24 UTC
(In reply to comment #3)
> hi,
> 
> I don't see this change upstream and it looks like it should be there.
> Any plan to post it upstream?
> 
> thanks,
> jirka    

I will send it upstream in two days.

Comment 5 Jiri Olsa 2010-03-16 08:31:34 UTC
Could you please post a link to the patch once it's sent out?

thanks,
jirka

Comment 6 Vitaliy Gusev 2010-03-17 09:42:54 UTC
(In reply to comment #5)
> Could you please post a link to the patch once it's sent out?


"route: Fix caught BUG_ON during rt_secret_rebuild_oneshot()"

http://patchwork.ozlabs.org/patch/47840/

Comment 7 Jiri Olsa 2010-03-18 14:49:23 UTC
thanks, any idea how to reproduce this?

jirka

Comment 11 Pavel Emelyanov 2010-05-13 14:48:21 UTC
Yes. You should do two things in parallel:
1. change the /proc/sys/net/ipv4/route/secret_interval value
2. force the rt cache to grow, e.g. by routing many packets with different IPs through the node
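
The two steps above can be sketched as a small Python script. This is an assumption-heavy illustration, not a verified reproducer: the interval values, base address, port, and all function names are made up for the example, and it sends locally generated UDP datagrams rather than forwarded traffic. It would need root and a disposable test machine, and nothing runs until run() is called explicitly.

```python
import ipaddress
import itertools
import socket
import threading

# Sysctl path taken from the report.
SECRET_INTERVAL = "/proc/sys/net/ipv4/route/secret_interval"


def destinations(base, count):
    """Return `count` consecutive IPv4 addresses starting at `base`."""
    start = ipaddress.IPv4Address(base)
    return [str(start + i) for i in range(count)]


def toggle_secret_interval(stop, path=SECRET_INTERVAL, values=(1, 600)):
    """Step 1: keep rewriting the sysctl until `stop` is set (needs root).

    The values are arbitrary placeholders, not taken from the report.
    """
    for value in itertools.cycle(values):
        if stop.is_set():
            return
        with open(path, "w") as f:
            f.write(f"{value}\n")


def spray(stop, dests, port=9):
    """Step 2: send one small UDP datagram per destination, repeatedly."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        while not stop.is_set():
            for dest in dests:
                try:
                    sock.sendto(b"x", (dest, port))
                except OSError:
                    pass  # unreachable destinations are expected
    finally:
        sock.close()


def run(base="10.0.0.1", count=4096, seconds=60):
    """Run both steps in parallel for `seconds`, then stop them."""
    stop = threading.Event()
    workers = [
        threading.Thread(target=toggle_secret_interval, args=(stop,)),
        threading.Thread(target=spray, args=(stop, destinations(base, count))),
    ]
    for w in workers:
        w.start()
    stop.wait(seconds)  # nothing sets the event yet, so this just sleeps
    stop.set()
    for w in workers:
        w.join()
```

Because the underlying problem is a timing race, a script like this may run for a long time without ever triggering the BUG.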

Comment 12 RHEL Program Management 2010-05-20 12:42:11 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 14 Jarod Wilson 2010-05-25 21:11:41 UTC
in kernel-2.6.18-200.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 16 Dayong Tian 2010-11-30 08:24:43 UTC
I have been trying to reproduce the issue with kernel 2.6.18-194.el5 over the past few days, but I couldn't trigger it. I did the following:

1. changed the /proc/sys/net/ipv4/route/secret_interval value to 86400.
2. changed the route cache hash entries to a smaller value:

   [root@intel-s3e8132-01 ~]# dmesg | grep "IP route"
   IP route cache hash table entries: 16 (order: -5, 128 bytes)
   [root@intel-s3e8132-01 ~]# cat /proc/sys/net/ipv4/route/max_size
   256

3. routed many packets with different destination IPs and the route cache grew.

   [root@intel-s3e8132-01 ~]# ip -o route ls cache|wc -l
   127

But it seems the route cache prunes itself when it reaches the limit? I couldn't get the "Route hash chain too long!" message.
Any ideas would be much appreciated.

Comment 17 Pavel Emelyanov 2010-11-30 08:40:03 UTC
You will not be able to easily reproduce the issue, as this is a rare race. If you cannot accept the patch by *only* reviewing it - you can close the bug.

Comment 19 Kirill Kolyshkin 2010-12-03 04:20:19 UTC
(In reply to comment #17)
> If you cannot accept the patch by *only* reviewing it - you can close the bug.

AFAIU the patch was actually accepted (as per comment #14). It has not yet appeared in a released kernel, but it will be there. I am not sure what the RH policy is wrt closing bugs.

Dayong's question is not about whether to accept the patch; he was just trying to verify the bug.

Comment 21 errata-xmlrpc 2011-01-13 21:07:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

