Description of problem: kernel 2.6.9-34.0.1.ELsmp freezes after 1-2 weeks of running as a highly loaded router/firewall. The BUG_ON output logged via serial cable is: Jul 05 22:16:27: ------------[ cut here ]------------ Jul 05 22:16:27: kernel BUG at kernel/timer.c:420! Jul 05 22:16:27: invalid operand: 0000 [#1] Jul 05 22:16:27: SMP Jul 05 22:16:27: Modules linked in: cls_u32 sch_sfq sch_cbq ipt_REDIRECT iptable_mangle iptable_nat ip_conntrack iptable_filter md5 ipv6 ip_tables 8021q cpuid dm_mirror dm_mod button battery ac ohci_hcd eepro100 e100 mii e1000 tg3 floppy ext3 jbd dpt_i2o sd_mod scsi_mod Jul 05 22:16:27: CPU: 1 Jul 05 22:16:27: EIP: 0060:[<c0129d99>] Not tainted VLI Jul 05 22:16:27: EFLAGS: 00010887 (2.6.9-34.0.1.ELsmp) Jul 05 22:16:27: EIP is at cascade+0x18/0x37 Jul 05 22:16:27: eax: c2ee57e0 ebx: c2ee6160 ecx: 0000002e edx: e045808c Jul 05 22:16:27: esi: e045808c edi: c2ee57e0 ebp: 0000002e esp: c03cbfb4 Jul 05 22:16:27: ds: 007b es: 007b ss: 0068 Jul 05 22:16:27: Process swapper (pid: 0, threadinfo=c03cb000 task=f7f310b0) Jul 05 22:16:27: Stack: 00000000 c038fb88 c2ee57e0 c03cbfcc c012a211 00000246 c03cbfcc c03cbfcc Jul 05 22:16:28: 0000012c 00000001 c038fb88 0000000a 00000001 c0126748 f7f00f78 00000046 Jul 05 22:16:28: 00000000 00000000 c0108143 Jul 05 22:16:28: Call Trace: Jul 05 22:16:28: [<c012a211>] run_timer_softirq+0x6f/0x145 Jul 05 22:16:28: [<c0126748>] __do_softirq+0x4c/0xb1 Jul 05 22:16:28: [<c0108143>] do_softirq+0x4f/0x56 Jul 05 22:16:28: ======================= Jul 05 22:16:28: [<c0117488>] smp_apic_timer_interrupt+0x9a/0x9c Jul 05 22:16:28: [<c02d315a>] apic_timer_interrupt+0x1a/0x20 Jul 05 22:16:28: [<c0104018>] default_idle+0x0/0x2f Jul 05 22:16:28: [<c0104041>] default_idle+0x29/0x2f Jul 05 22:16:28: [<c01040a0>] cpu_idle+0x26/0x3b Jul 05 22:16:28: Code: e8 51 ff ff ff 85 c0 74 08 0f 0b 8d 01 9e 46 2e c0 5b c3 55 89 cd 57 89 c7 56 53 8d 1c ca 8b 33 39 de 74 1a 39 7e 20 89 f2 74 08 <0f> 0b a4 01 9e 46 2e c0 8b 36 89 f8 e8 86 fc ff ff eb e2 89 1b After this message, the machine seems to be frozen. Hard-reset is needed to get back the system. The hardware spec is: Server Model: Gigabyte GS-SR222E CPU: 2 X Intel 2.4GHz (hyperthreading enabled) Motherboard: Gigabyte GA-8EGPDRE Memory: 4G ECC memory RAID card: Adaptec ASR-2015S Version-Release number of selected component (if applicable): kernel 2.6.9-34.0.1.ELsmp on RHEL 4 How reproducible: Cannot reproduce it. But it is recurring once every 1-2 weeks. The machine is used as a router + firwall and its average load from `uptime` is around 10-15 during peak hour. Steps to Reproduce: 1. 2. 3. Actual results: System freeze, even when /proc/sys/kernel/panic is 1. Expected results: System should not freeze Additional info:
*** This bug has been marked as a duplicate of 174990 ***
This issue is resolved in the U4 kernel. Please find test kernels available at: http://people.redhat.com/~jbaron/rhel4/
I spoke with jbaron, and he is unable to confirm that the fix for bug 174990 should have fixed this bug. We have a customer who is still seeing it with the 2.6.9-42.0.2.EL kernel.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
The x86_64 system triggering the BUG() in cascade() has two cpus, and therefore two tvec_base_t structures: crash> timer | grep TVEC TVEC_BASES[0]: 10001063be0 TVEC_BASES[1]: 1024009fbe0 crash> While running on cpu 1, run_timer_softirq() is kicked off: crash> bt PID: 16462 TASK: 1016d58a030 CPU: 1 COMMAND: "prelink" #0 [10438fd3d40] netpoll_start_netdump at ffffffffa0824366 #1 [10438fd3d70] die at ffffffff80111c00 #2 [10438fd3d90] do_invalid_op at ffffffff80111fc8 #3 [10438fd3dc8] cascade at ffffffff8013fef6 #4 [10438fd3de0] activate_task at ffffffff801321e3 #5 [10438fd3e10] recalc_task_prio at ffffffff80132155 #6 [10438fd3e50] error_exit at ffffffff80110d91 [exception RIP: cascade+0x28] RIP: ffffffff8013fef6 RSP: 0000010438fd3f00 RFLAGS: 00010087 RAX: 00000102400a05e8 RBX: 00000102400a0dd8 RCX: 000000010efe5e9f RDX: 00000102400a05e8 RSI: ffffffffa06c0a70 RDI: 000001024009fbe0 RBP: ffffffffa06c0a70 R8: 000000000000009f R9: 0000000000000080 R10: 0000000000000000 R11: 0000000000000002 R12: 000001024009fbe0 R13: 000000000000001e R14: 0000000000000246 R15: 0000000000800638 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 #7 [10438fd3f20] run_timer_softirq at ffffffff8013ffa0 #8 [10438fd3f70] __do_softirq at ffffffff8013c738 #9 [10438fd3fa0] do_softirq at ffffffff8013c7e1 #10 [10438fd3fb0] apic_timer_interrupt at ffffffff80110bf5 --- <IRQ stack> --- #11 [1034946df58] apic_timer_interrupt at ffffffff80110bf5 RIP: 0000000000401a38 RSP: 0000007fbffecc60 RFLAGS: 00000206 RAX: 0000000000630160 RBX: 0000007fbffef370 RCX: 0000000000000008 RDX: 0000000000046f58 RSI: 0000000000023010 RDI: 0000000000000001 RBP: 0000000000046f58 R8: 0000000000801848 R9: 0000000000000003 R10: 00000000ffffffff R11: 0000000000000016 R12: 0000007fbffef370 R13: 0000000000007a16 R14: 0000000000801650 R15: 0000000000800638 ORIG_RAX: ffffffffffffffef CS: 0033 SS: 002b crash> The cascade() BUG() occurs because, while manipulating its per-cpu (cpu 1's) timer lists, it finds a timer_list entry whose "base" pointer doesn't point back to cpu 1's tvec_base_t structure: static int cascade(tvec_base_t *base, tvec_t *tv, int index) { /* cascade all the timers from tv up one level */ struct list_head *head, *curr; head = tv->vec + index; curr = head->next; /* * We are removing _all_ timers from the list, so we don't have to * detach them individually, just clear the list afterwards. */ while (curr != head) { struct timer_list *tmp; tmp = list_entry(curr, struct timer_list, entry); BUG_ON(tmp->base != base); curr = curr->next; internal_add_timer(base, tmp); } INIT_LIST_HEAD(head); return index; } crash> dis -rl ffffffff8013fef6 /usr/src/build/765721-x86_64/BUILD/kernel-2.6.9/linux-2.6.9/kernel/timer.c: 406 0xffffffff8013fece <cascade>: push %r13 ... /usr/src/build/765721-x86_64/BUILD/kernel-2.6.9/linux-2.6.9/kernel/timer.c: 420 0xffffffff8013feed <cascade+0x1f>: cmp %r12,0x38(%rbp) /usr/src/build/765721-x86_64/BUILD/kernel-2.6.9/linux-2.6.9/kernel/timer.c: 419 0xffffffff8013fef1 <cascade+0x23>: mov %rbp,%rsi /usr/src/build/765721-x86_64/BUILD/kernel-2.6.9/linux-2.6.9/kernel/timer.c: 420 0xffffffff8013fef4 <cascade+0x26>: je 0xffffffff8013ff02 <cascade+0x34> 0xffffffff8013fef6 <cascade+0x28>: ud2a r12 contains 000001024009fbe0, which as shown above, is the address of cpu 1's tvec_base_t. rbp contains the timer_list entry being examined, at address ffffffffa06c0a70. It looks like this: crash> timer_list ffffffffa06c0a70 struct timer_list { entry = { next = 0x1023682dd88, prev = 0x10374c44298 }, expires = 0x10efe6177, lock = { lock = 0x1, magic = 0xdead4ead }, magic = 0x4b87ad6e, function = 0xffffffffa0409ed0, data = 0xffffffffa06c0920, base = 0x10001063be0 } It's "base" field -- 0x38(%rbp) -- contains address 0x10001063be0, which is the address of cpu 0's tvec_base_t, hence the resultant BUG(). The timer_list structure itself, its "function" pointer, and its "data" pointer are all part of the nvidia driver: crash> sym ffffffffa06c0a70 0xffffffffa0409ed0 0xffffffffa06c0920 ffffffffa06c0a70 (b) nv_linux_devices+0x150 ffffffffa0409ed0 (t) nv_kern_rc_timer ffffffffa06c0920 (b) nv_linux_devices crash> The problem is that it is impossible to tell whether the nvidia driver itself is doing something directly with the timer_list entry, or whether there's an issue with the core timer functionality. Typically problems with the timer code are due to drivers behaving badly. Anyway, after discussing this with Jason, we would really need to know whether this can be reproduced without the nvidia driver. I see that the original BUG() trace is for an x86 machine that doesn't have the nvidia driver. But it doesn't have netdump enabled, and the system "freezes". Can they reproduce this on an x86 machine and make the vmcore available to us? It would be really helpful if in that x86 dump, we could see what the offending timer_list entry looks like, i.e., from a non-tainted source.