From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5b) Gecko/20030907 Firebird/0.6.1+

Description of problem:
During test runs of our high availability database product, we are seeing "fairly reproducible" kernel oopses after running for 10-20 hours. The oopses are reproducible in the sense that they seem to always occur after 10-20 hours during a certain test, but there seems to be a single particular point in the tests at which they happen. This would be consistent with our suspicion that certain load conditions provoke the problem.

Netdump output for this oops is in the "Additional Information" field. This is the first time we have used netdump to investigate this problem -- on some of the previous occurrences of the problem we have seen somewhat different oops messages. We are still running tests and expect to be able to add a few more oops messages to this bug report during the weekend.

Version-Release number of selected component (if applicable):
2.4.9-e.24smp

How reproducible:
Always

Steps to Reproduce:
1. Boot machine, start database tests
2. Crash occurs after approx. 10-20 hours

Actual Results:
Kernel oops, see Description

Additional info:

System:
Asus CUV4X-DLS Dual Socket 370, VIA 694XDP AGP chipset, LSI SYM53C1010-33 ULTRA 160 SCSI controller onboard. BIOS version 1016.
Dual Pentium III 800 MHz CPUs
1 GB RAM
2x18 GB SCSI disks

NOTE: We have also tried passing "noapic" to the kernel at boot. The behaviour was no different, except that the crash seemed to happen a bit earlier, i.e. the problem was actually _worse_ with noapic.
The following is the log produced by netdump (full memory dump is available as well):

invalid operand: 0000
Kernel 2.4.9-e.24smp
CPU:    0
EIP:    0010:[<c038f4f0>]    Not tainted
EFLAGS: 00010286
EIP is at tvec_bases [kernel] 0x1330
eax: c038e4c0   ebx: c038e1c0   ecx: c038f4e0   edx: d91140dc
esi: c038f4e8   edi: c038f4f0   ebp: 00000046   esp: c0311f30
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c0311000)
Stack: c01249d1 c038f4e8 c038e1c0 00000000 00000000 c0310000 c0125014 c038e1c0
       c038e1c0 00000000 00000000 00000000 c0105400 c0114228 00000046 c0108ed3
       00000000 00000000 c0105400 c0310000 c0310000 c0246629 c0311f8c c0105400
Call Trace:
[<c01249d1>] __run_timers [kernel] 0xd1
[<c0125014>] run_local_timers [kernel] 0x94
[<c0105400>] default_idle [kernel] 0x0
[<c0114228>] smp_apic_timer_interrupt [kernel] 0xb8
[<c0108ed3>] do_IRQ [kernel] 0xe3
[<c0105400>] default_idle [kernel] 0x0
[<c0246629>] call_apic_timer_interrupt [kernel] 0x5
[<c0105400>] default_idle [kernel] 0x0
[<c0105400>] default_idle [kernel] 0x0
[<c010542e>] default_idle [kernel] 0x2e
[<c0105492>] cpu_idle [kernel] 0x32
[<c0105000>] stext [kernel] 0x0
[<c02447a0>] .rodata.str1.32 [kernel] 0x560
Code: f0 f4 38 c0 f0 f4 38 c0 f8 f4 38 c0 f8 f4 38 c0 00 f5 38 c0
CPU#1 is frozen.
< netdump activated - performing handshake with the client. >
Process: 0, { swapper}
Kernel 2.4.9-e.24smp
EIP: 0010:[<c038f4f0>] CPU: 0
EIP is at tvec_bases [kernel] 0x1330
EFLAGS: 00010286 Not tainted
EAX: c038e4c0 EBX: c038e1c0 ECX: c038f4e0 EDX: d91140dc
ESI: c038f4e8 EDI: c038f4f0 EBP: 00000046 DS: 0018 ES: 0018
[<c01249d1>] __run_timers [kernel] 0xd1
CR0: 8005003b CR2: 40159c80 CR3: 140c0000 CR4: 000006d0
Call Trace:
[<c0125014>] run_local_timers [kernel] 0x94
[<c0105400>] default_idle [kernel] 0x0
[<c0114228>] smp_apic_timer_interrupt [kernel] 0xb8
[<c0108ed3>] do_IRQ [kernel] 0xe3
[<c0105400>] default_idle [kernel] 0x0
[<c0246629>] call_apic_timer_interrupt [kernel] 0x5
[<c0105400>] default_idle [kernel] 0x0
[<c0105400>] default_idle [kernel] 0x0
[<c010542e>] default_idle [kernel] 0x2e
[<c0105492>] cpu_idle [kernel] 0x32
[<c0105000>] stext [kernel] 0x0
[<c02447a0>] .rodata.str1.32 [kernel] 0x560
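A note on reading the "Code:" line of this oops: EIP is reported inside tvec_bases, which is a data symbol, so the bytes shown are probably data rather than instructions. A minimal sketch that reinterprets them as little-endian 32-bit words follows. The helper name decode_le32 is made up for illustration, and reading the words as list pointers is an assumption, not something confirmed in this report.

```python
import struct

# "Code:" bytes copied verbatim from the first oops in this report.
CODE_HEX = "f0 f4 38 c0 f0 f4 38 c0 f8 f4 38 c0 f8 f4 38 c0 00 f5 38 c0"


def decode_le32(hex_bytes):
    """Reinterpret a hex dump as little-endian 32-bit words (hypothetical helper)."""
    raw = bytes.fromhex(hex_bytes)
    usable = len(raw) - len(raw) % 4  # ignore a trailing partial word, if any
    return [word for (word,) in struct.iter_unpack("<I", raw[:usable])]


words = decode_le32(CODE_HEX)
for word in words:
    print(f"{word:08x}")
# prints c038f4f0, c038f4f0, c038f4f8, c038f4f8, c038f500 (one per line)
```

The first word equals the faulting EIP (c038f4f0), and the words come in pairs that point back at their own address. That pattern looks like empty doubly linked list heads, which would fit the CPU having jumped into the timer vector itself, but this is only one possible reading of the dump.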
Any chance of getting an lsmod for this machine?
Here's an lsmod:

[root@europa10 root]# lsmod
Module                  Size  Used by    Not tainted
nfs                    92736  14  (autoclean)
lockd                  61184   1  (autoclean) [nfs]
sunrpc                 86096   1  (autoclean) [nfs lockd]
netconsole             16320   0  (unused)
autofs                 13796   7  (autoclean)
eepro100               21968   1
usb-uhci               26948   0  (unused)
usbcore                68864   1  [usb-uhci]
ext3                   74176   5
jbd                    55304   5  [ext3]
sym53c8xx              67940   8
sd_mod                 13888   8
scsi_mod              126252   2  [sym53c8xx sd_mod]
Slight bug in the bug description. The sentence "The oopses are reproducible in the sense that they seem to always occur after 10-20 hours during a certain test, but there seems to be a single particular point in the tests at which they happen." should read "The oopses are reproducible in the sense that they seem to always occur after 10-20 hours during a certain test, but there seems to be NO single particular point in the tests at which they happen."
Another crash today on another, identical machine:

Unable to handle kernel paging request at virtual address 48f914f4
*pde = 00000000
Oops: 0002
Kernel 2.4.9-e.24smp
CPU:    0
EIP:    0010:[<c038f483>]    Not tainted
EFLAGS: 00010282
EIP is at tvec_bases [kernel] 0x12c3
eax: c038dc00   ebx: c038e1c0   ecx: c038f420   edx: f44b3f7c
esi: efd73f7c   edi: c038f430   ebp: 00000046   esp: c0311eec
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c0311000)
68c038f4
Stack: c038f430 efd73f7c 00000046 c0311f10 c038e1c0 f44b3f7c c038f420 c038e400
       c038f430 efd73f7c 00000046 c0311f30 c038e1c0 f44b3f7c c038f420 c038e400
       c01249d1 efd73f7c c038e1c0 00000000 00000000 c0310000 c0125014
Call Trace:
[<c01249d1>] __run_timers [kernel] 0xd1
[<c0125014>] run_local_timers [kernel] 0x94
[<c0105400>] default_idle [kernel] 0x0
[<c0114228>] smp_apic_timer_interrupt [kernel] 0xb8
[<c0108eee>] do_IRQ [kernel] 0xfe
[<c0105400>] default_idle [kernel] 0x0
[<c0246629>] call_apic_timer_interrupt [kernel] 0x5
[<c0105400>] default_idle [kernel] 0x0
[<c0105400>] default_idle [kernel] 0x0
[<c010542e>] default_idle [kernel] 0x2e
[<c0105492>] cpu_idle [kernel] 0x32
[<c0105000>] stext [kernel] 0x0
[<c02447a0>] .rodata.str1.32 [kernel] 0x560
Code: c0 80 f4 38 c0 88 f4 38 c0 88 f4 38 c0 90 f4 38 c0 90 f4 38
CPU#1 is frozen.
< netdump activated - performing handshake with the client. >
Process: 0, { swapper}
Kernel 2.4.9-e.24smp
EIP: 0010:[<c038f483>] CPU: 0
EIP is at tvec_bases [kernel] 0x12c3
EFLAGS: 00010282 Not tainted
EAX: c038dc00 EBX: c038e1c0 ECX: c038f420 EDX: f44b3f7c
ESI: efd73f7c EDI: c038f430 EBP: 00000046 DS: 0018 ES: 0018
CR0: 8005003b CR2: 48f914f4 CR3: 31f9c000 CR4: 000006d0
Call Trace:
[<c01249d1>] __run_timers [kernel] 0xd1
[<c0125014>] run_local_timers [kernel] 0x94
[<c0105400>] default_idle [kernel] 0x0
[<c0114228>] smp_apic_timer_interrupt [kernel] 0xb8
[<c0108eee>] do_IRQ [kernel] 0xfe
[<c0105400>] default_idle [kernel] 0x0
[<c0246629>] call_apic_timer_interrupt [kernel] 0x5
[<c0105400>] default_idle [kernel] 0x0
[<c0105400>] default_idle [kernel] 0x0
[<c010542e>] default_idle [kernel] 0x2e
[<c0105492>] cpu_idle [kernel] 0x32
[<c0105000>] stext [kernel] 0x0
[<c02447a0>] .rodata.str1.32 [kernel] 0x560
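The two oopses fault at different addresses, but both report EIP as an offset into tvec_bases. As a quick cross-check, the sketch below subtracts the reported offset from EIP to get the start of the symbol in each case and compares it with EBX. The values are copied from the two oopses in this report; the names OOPSES and symbol_base are made up for illustration, and any meaning attached to EBX is an assumption.

```python
# (EIP, reported offset into tvec_bases, EBX), copied from the two oopses above.
OOPSES = {
    "first": (0xC038F4F0, 0x1330, 0xC038E1C0),
    "second": (0xC038F483, 0x12C3, 0xC038E1C0),
}


def symbol_base(eip, offset):
    """Start address of the symbol, given EIP and its 'symbol 0xNNN' offset."""
    return eip - offset


for name, (eip, offset, ebx) in OOPSES.items():
    base = symbol_base(eip, offset)
    print(f"{name}: tvec_bases starts at {base:08x}, EBX={ebx:08x}, equal={base == ebx}")
```

In both crashes the computed start of tvec_bases is c038e1c0, which is also the value in EBX. That suggests, without proving it, that EBX still held the timer base pointer and that both machines ended up executing inside the same structure, just at different offsets.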
*** This bug has been marked as a duplicate of 84452 ***
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.