Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

For bugs related to Red Hat Enterprise Linux 4 product line. The current stable release is 4.9. For Red Hat Enterprise Linux 6 and above, please visit Red Hat JIRA https://issues.redhat.com/secure/CreateIssue!default.jspa?pid=12332745 to report new issues.

Bug 198321

Summary:	kernel freeze at "kernel BUG at kernel/timer.c:420!"
Product:	Red Hat Enterprise Linux 4	Reporter:	Leonard Ye <leonardye>
Component:	kernel	Assignee:	Dave Anderson <anderson>
Status:	CLOSED WONTFIX	QA Contact:	Brian Brock <bbrock>
Severity:	high	Docs Contact:
Priority:	medium
Version:	4.0	CC:	jbaron, tao
Target Milestone:	---	Keywords:	Reopened
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2007-08-29 22:12:17 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	181409

Description Leonard Ye 2006-07-11 02:37:34 UTC

Description of problem:
kernel 2.6.9-34.0.1.ELsmp freezes after 1-2 weeks of running as a highly 
loaded router/firewall.  The BUG_ON output logged via serial cable is:

Jul 05 22:16:27: ------------[ cut here ]------------
Jul 05 22:16:27: kernel BUG at kernel/timer.c:420!
Jul 05 22:16:27: invalid operand: 0000 [#1]
Jul 05 22:16:27: SMP
Jul 05 22:16:27: Modules linked in: cls_u32 sch_sfq sch_cbq ipt_REDIRECT 
iptable_mangle iptable_nat ip_conntrack iptable_filter md5 ipv6 ip_tables 
8021q cpuid dm_mirror dm_mod button battery ac ohci_hcd eepro100 e100 mii 
e1000 tg3 floppy ext3 jbd dpt_i2o sd_mod scsi_mod
Jul 05 22:16:27: CPU:    1
Jul 05 22:16:27: EIP:    0060:[<c0129d99>]    Not tainted VLI
Jul 05 22:16:27: EFLAGS: 00010887   (2.6.9-34.0.1.ELsmp)
Jul 05 22:16:27: EIP is at cascade+0x18/0x37
Jul 05 22:16:27: eax: c2ee57e0   ebx: c2ee6160   ecx: 0000002e   edx: e045808c
Jul 05 22:16:27: esi: e045808c   edi: c2ee57e0   ebp: 0000002e   esp: c03cbfb4
Jul 05 22:16:27: ds: 007b   es: 007b   ss: 0068
Jul 05 22:16:27: Process swapper (pid: 0, threadinfo=c03cb000 task=f7f310b0)
Jul 05 22:16:27: Stack: 00000000 c038fb88 c2ee57e0 c03cbfcc c012a211 00000246 
c03cbfcc c03cbfcc
Jul 05 22:16:28:        0000012c 00000001 c038fb88 0000000a 00000001 c0126748 
f7f00f78 00000046
Jul 05 22:16:28:        00000000 00000000 c0108143
Jul 05 22:16:28: Call Trace:
Jul 05 22:16:28:  [<c012a211>] run_timer_softirq+0x6f/0x145
Jul 05 22:16:28:  [<c0126748>] __do_softirq+0x4c/0xb1
Jul 05 22:16:28:  [<c0108143>] do_softirq+0x4f/0x56
Jul 05 22:16:28:  =======================
Jul 05 22:16:28:  [<c0117488>] smp_apic_timer_interrupt+0x9a/0x9c
Jul 05 22:16:28:  [<c02d315a>] apic_timer_interrupt+0x1a/0x20
Jul 05 22:16:28:  [<c0104018>] default_idle+0x0/0x2f
Jul 05 22:16:28:  [<c0104041>] default_idle+0x29/0x2f
Jul 05 22:16:28:  [<c01040a0>] cpu_idle+0x26/0x3b
Jul 05 22:16:28: Code: e8 51 ff ff ff 85 c0 74 08 0f 0b 8d 01 9e 46 2e c0 5b 
c3 55 89 cd 57 89 c7 56 53 8d 1c ca 8b 33 39 de 74 1a 39 7e 20 89 f2 74 08 
<0f> 0b a4 01 9e 46 2e c0 8b 36 89 f8 e8 86 fc ff ff eb e2 89 1b

After this message, the machine seems to be frozen.  Hard-reset is needed to 
get back the system.

The hardware spec is:
Server Model: Gigabyte GS-SR222E
CPU: 2 X Intel 2.4GHz (hyperthreading enabled)
Motherboard: Gigabyte GA-8EGPDRE
Memory: 4G ECC memory
RAID card: Adaptec ASR-2015S


Version-Release number of selected component (if applicable):
kernel 2.6.9-34.0.1.ELsmp on RHEL 4

How reproducible:
Cannot reproduce it. But it is recurring once every 1-2 weeks.  The machine is 
used as a router + firwall and its average load from `uptime` is around 10-15 
during peak hour.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
System freeze, even when /proc/sys/kernel/panic is 1.

Expected results:
System should not freeze

Additional info:

Comment 1 Jason Baron 2006-07-13 19:48:40 UTC


*** This bug has been marked as a duplicate of 174990 ***

Comment 2 Jason Baron 2006-07-13 19:49:28 UTC

This issue is resolved in the U4 kernel. Please find test kernels available at:
http://people.redhat.com/~jbaron/rhel4/

Comment 4 Guy Streeter 2006-09-26 19:13:33 UTC

I spoke with jbaron, and he is unable to confirm that the fix for bug 174990
should have fixed this bug. We have a customer who is still seeing it with the
2.6.9-42.0.2.EL kernel.

Comment 11 RHEL Program Management 2006-09-26 19:32:13 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 15 Dave Anderson 2006-10-30 21:47:26 UTC

The x86_64 system triggering the BUG() in cascade() has
two cpus, and therefore two tvec_base_t structures:

crash> timer | grep TVEC
TVEC_BASES[0]: 10001063be0
TVEC_BASES[1]: 1024009fbe0
crash>

While running on cpu 1, run_timer_softirq() is kicked off:

crash> bt
PID: 16462  TASK: 1016d58a030       CPU: 1   COMMAND: "prelink"
 #0 [10438fd3d40] netpoll_start_netdump at ffffffffa0824366
 #1 [10438fd3d70] die at ffffffff80111c00
 #2 [10438fd3d90] do_invalid_op at ffffffff80111fc8
 #3 [10438fd3dc8] cascade at ffffffff8013fef6
 #4 [10438fd3de0] activate_task at ffffffff801321e3
 #5 [10438fd3e10] recalc_task_prio at ffffffff80132155
 #6 [10438fd3e50] error_exit at ffffffff80110d91
    [exception RIP: cascade+0x28]
    RIP: ffffffff8013fef6  RSP: 0000010438fd3f00  RFLAGS: 00010087
    RAX: 00000102400a05e8  RBX: 00000102400a0dd8  RCX: 000000010efe5e9f
    RDX: 00000102400a05e8  RSI: ffffffffa06c0a70  RDI: 000001024009fbe0
    RBP: ffffffffa06c0a70   R8: 000000000000009f   R9: 0000000000000080
    R10: 0000000000000000  R11: 0000000000000002  R12: 000001024009fbe0
    R13: 000000000000001e  R14: 0000000000000246  R15: 0000000000800638
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #7 [10438fd3f20] run_timer_softirq at ffffffff8013ffa0
 #8 [10438fd3f70] __do_softirq at ffffffff8013c738
 #9 [10438fd3fa0] do_softirq at ffffffff8013c7e1
#10 [10438fd3fb0] apic_timer_interrupt at ffffffff80110bf5
--- <IRQ stack> ---
#11 [1034946df58] apic_timer_interrupt at ffffffff80110bf5
    RIP: 0000000000401a38  RSP: 0000007fbffecc60  RFLAGS: 00000206
    RAX: 0000000000630160  RBX: 0000007fbffef370  RCX: 0000000000000008
    RDX: 0000000000046f58  RSI: 0000000000023010  RDI: 0000000000000001
    RBP: 0000000000046f58   R8: 0000000000801848   R9: 0000000000000003
    R10: 00000000ffffffff  R11: 0000000000000016  R12: 0000007fbffef370
    R13: 0000000000007a16  R14: 0000000000801650  R15: 0000000000800638
    ORIG_RAX: ffffffffffffffef  CS: 0033  SS: 002b
crash>

The cascade() BUG() occurs because, while manipulating
its per-cpu (cpu 1's) timer lists, it finds a timer_list entry whose
"base" pointer doesn't point back to cpu 1's tvec_base_t structure:

static int cascade(tvec_base_t *base, tvec_t *tv, int index)
{
        /* cascade all the timers from tv up one level */
        struct list_head *head, *curr;

        head = tv->vec + index;
        curr = head->next;
        /*
         * We are removing _all_ timers from the list, so we don't  have to
         * detach them individually, just clear the list afterwards.
         */
        while (curr != head) {
                struct timer_list *tmp;

                tmp = list_entry(curr, struct timer_list, entry);
                BUG_ON(tmp->base != base);
                curr = curr->next;
                internal_add_timer(base, tmp);
        }
        INIT_LIST_HEAD(head);

        return index;
}

crash> dis -rl ffffffff8013fef6
/usr/src/build/765721-x86_64/BUILD/kernel-2.6.9/linux-2.6.9/kernel/timer.c: 406
0xffffffff8013fece <cascade>:   push   %r13
...
/usr/src/build/765721-x86_64/BUILD/kernel-2.6.9/linux-2.6.9/kernel/timer.c: 420
0xffffffff8013feed <cascade+0x1f>:      cmp    %r12,0x38(%rbp)
/usr/src/build/765721-x86_64/BUILD/kernel-2.6.9/linux-2.6.9/kernel/timer.c: 419
0xffffffff8013fef1 <cascade+0x23>:      mov    %rbp,%rsi
/usr/src/build/765721-x86_64/BUILD/kernel-2.6.9/linux-2.6.9/kernel/timer.c: 420
0xffffffff8013fef4 <cascade+0x26>:      je     0xffffffff8013ff02 <cascade+0x34>
0xffffffff8013fef6 <cascade+0x28>:      ud2a

r12 contains 000001024009fbe0, which as shown above, is the address
of cpu 1's tvec_base_t.  rbp contains the timer_list entry being
examined, at address ffffffffa06c0a70.  It looks like this:

crash> timer_list ffffffffa06c0a70
struct timer_list {
  entry = {
    next = 0x1023682dd88,
    prev = 0x10374c44298
  },
  expires = 0x10efe6177,
  lock = {
    lock = 0x1,
    magic = 0xdead4ead
  },
  magic = 0x4b87ad6e,
  function = 0xffffffffa0409ed0,
  data = 0xffffffffa06c0920,
  base = 0x10001063be0
}

It's "base" field -- 0x38(%rbp) -- contains address 0x10001063be0,
which is the address of cpu 0's tvec_base_t, hence the resultant
BUG().

The timer_list structure itself, its "function" pointer, and its
"data" pointer are all part of the nvidia driver:

crash> sym ffffffffa06c0a70 0xffffffffa0409ed0 0xffffffffa06c0920
ffffffffa06c0a70 (b) nv_linux_devices+0x150
ffffffffa0409ed0 (t) nv_kern_rc_timer
ffffffffa06c0920 (b) nv_linux_devices
crash>

The problem is that it is impossible to tell whether the nvidia
driver itself is doing something directly with the timer_list
entry, or whether there's an issue with the core timer functionality.
Typically problems with the timer code are due to drivers
behaving badly.

Anyway, after discussing this with Jason, we would really need to 
know whether this can be reproduced without the nvidia driver.
I see that the original BUG() trace is for an x86 machine that
doesn't have the nvidia driver.  But it doesn't have netdump
enabled, and the system "freezes".  Can they reproduce this on
an x86 machine and make the vmcore available to us? 
It would be really helpful if in that x86 dump, we could
see what the offending timer_list entry looks like, i.e., from
a non-tainted source.