104297 – Oops in process swapper during call to __run_timers

Keywords:
Status:	CLOSED DUPLICATE of bug 84452
Alias:	None
Product:	Red Hat Enterprise Linux 2.1
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	2.1
Hardware:	i686
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Larry Woodman
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-09-12 08:57 UTC by Yngve Svendsen
Modified:	2007-11-30 22:06 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-02-21 18:58:34 UTC
Target Upstream Version:
Embargoed:

Description Yngve Svendsen 2003-09-12 08:57:27 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5b)
Gecko/20030907 Firebird/0.6.1+

Description of problem:
During test runs of our high availability database product, we are seeing
"fairly reproducible" kernel oopses after running for 10-20 hours. The oopses
are reproducible in the sense that they seem to always occur after 10-20 hours
during a certain test, but there seems to be a single particular point in the
tests at which they happen. This would be consistent with our suspicion that
certain load conditions provoke the problem.

Netdump output for this oops is in the "Additional Information" field. This is
the first time we have used netdump to investigate this problem -- on some of
the previous occurrences of the problem we have seen somewhat different oops
messages. We are still running tests and expect to be able to add a few more
oops messages to this bug report during the weekend.

Version-Release number of selected component (if applicable):
2.4.9-e.24smp

How reproducible:
Always

Steps to Reproduce:
1. Boot machine, start database tests
2. Crash occurs after approx. 10-20 hours
  

Actual Results:  Kernel oops, see Description

Additional info:

System:
Asus CUV4X-DLS Dual Socket 370, VIA 694XDP AGP chipset, LSI SYM53C1010-33 ULTRA
160 SCSI controller onboard. BIOS version 1016. Dual Pentium III 800 MHz CPUs
1 GB RAM
2x18 GB SCSI disks

NOTE: We have also tried passing "noapic" to the kernel at boot. The behaviour
was no different, except that it did seem as if the crash happened a bit earlier
then, i.e. that the problem was actually _worse_ with noapic.

The following is the log produced by netdump (full memory dump is available as
well):

invalid operand: 0000
Kernel 2.4.9-e.24smp
CPU:    0
EIP:    0010:[<c038f4f0>]    Not tainted
EFLAGS: 00010286
EIP is at tvec_bases [kernel] 0x1330 
eax: c038e4c0   ebx: c038e1c0   ecx: c038f4e0   edx: d91140dc
esi: c038f4e8   edi: c038f4f0   ebp: 00000046   esp: c0311f30
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c0311000)
Stack: c01249d1 c038f4e8 c038e1c0 00000000 00000000 c0310000 c0125014 c038e1c0 
       c038e1c0 00000000 00000000 00000000 c0105400 c0114228 00000046 c0108ed3 
       00000000 00000000 c0105400 c0310000 c0310000 c0246629 c0311f8c c0105400 
Call Trace: [<c01249d1>] __run_timers [kernel] 0xd1 
[<c0125014>] run_local_timers [kernel] 0x94 
[<c0105400>] default_idle [kernel] 0x0 
[<c0114228>] smp_apic_timer_interrupt [kernel] 0xb8 
[<c0108ed3>] do_IRQ [kernel] 0xe3 
[<c0105400>] default_idle [kernel] 0x0 
[<c0246629>] call_apic_timer_interrupt [kernel] 0x5 
[<c0105400>] default_idle [kernel] 0x0 
[<c0105400>] default_idle [kernel] 0x0 
[<c010542e>] default_idle [kernel] 0x2e 
[<c0105492>] cpu_idle [kernel] 0x32 
[<c0105000>] stext [kernel] 0x0 
[<c02447a0>] .rodata.str1.32 [kernel] 0x560 


Code: f0 f4 38 c0 f0 f4 38 c0 f8 f4 38 c0 f8 f4 38 c0 00 f5 38 c0 
CPU#1 is frozen.
< netdump activated - performing handshake with the client. >

Process: 0, {             swapper}
Kernel 2.4.9-e.24smp
EIP: 0010:[<c038f4f0>] CPU: 0
EIP is at tvec_bases [kernel] 0x1330 
 EFLAGS: 00010286    Not tainted
EAX: c038e4c0 EBX: c038e1c0 ECX: c038f4e0 EDX: d91140dc
ESI: c038f4e8 EDI: c038f4f0 EBP: 00000046 DS: 0018 ES: 0018
CR0: 8005003b CR2: 40159c80 CR3: 140c0000 CR4: 000006d0
Call Trace: [<c01249d1>] __run_timers [kernel] 0xd1 
[<c0125014>] run_local_timers [kernel] 0x94 
[<c0105400>] default_idle [kernel] 0x0 
[<c0114228>] smp_apic_timer_interrupt [kernel] 0xb8 
[<c0108ed3>] do_IRQ [kernel] 0xe3 
[<c0105400>] default_idle [kernel] 0x0 
[<c0246629>] call_apic_timer_interrupt [kernel] 0x5 
[<c0105400>] default_idle [kernel] 0x0 
[<c0105400>] default_idle [kernel] 0x0 
[<c010542e>] default_idle [kernel] 0x2e 
[<c0105492>] cpu_idle [kernel] 0x32 
[<c0105000>] stext [kernel] 0x0 
[<c02447a0>] .rodata.str1.32 [kernel] 0x560
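
A minimal sketch for reading the oops above, not a diagnosis. It uses only values copied from the log and makes two assumptions: that the "Code:" line shows the bytes starting at EIP (as 2.4 i386 kernels normally print it), and that the bytes can be read as little-endian 32-bit pointers. The variable and function names are made up for illustration. It checks that EIP minus the reported symbol offset gives the base of tvec_bases, and prints the words found at EIP.

```python
import struct

# Values copied verbatim from the oops above.
EIP = 0xC038F4F0
SYMBOL_OFFSET = 0x1330  # from "EIP is at tvec_bases [kernel] 0x1330"
EBX = 0xC038E1C0
CODE = "f0 f4 38 c0 f0 f4 38 c0 f8 f4 38 c0 f8 f4 38 c0 00 f5 38 c0"


def decode_words(code_line):
    """Interpret the hex bytes as consecutive little-endian 32-bit words."""
    raw = bytes.fromhex(code_line.replace(" ", ""))
    count = len(raw) // 4
    return list(struct.unpack("<%dI" % count, raw[: count * 4]))


words = decode_words(CODE)

# Base address of the symbol the CPU was "executing" in.
symbol_base = EIP - SYMBOL_OFFSET
print("tvec_bases base:", hex(symbol_base), "matches ebx:", symbol_base == EBX)

# Assumption: the first word printed sits at address EIP.
for i, word in enumerate(words):
    print(hex(EIP + 4 * i), "->", hex(word))
```

Under those assumptions, the words at c038f4f0 and c038f4f4 both hold c038f4f0, and the words at c038f4f8 and c038f4fc both hold c038f4f8. A pair of pointers that both point back at their own address is what an empty struct list_head (next, prev) looks like, so the bytes at EIP look like list heads inside the timer tables rather than instructions. That would be consistent with the CPU having jumped into data, but the log alone does not show how it got there.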

Comment 1 Arjan van de Ven 2003-09-12 09:03:19 UTC

Any chance of getting an lsmod for this machine?

Comment 2 Yngve Svendsen 2003-09-12 09:13:51 UTC

Here's an lsmod:

[root@europa10 root]# lsmod
Module                  Size  Used by    Not tainted
nfs                    92736  14  (autoclean)
lockd                  61184   1  (autoclean) [nfs]
sunrpc                 86096   1  (autoclean) [nfs lockd]
netconsole             16320   0  (unused)
autofs                 13796   7  (autoclean)
eepro100               21968   1 
usb-uhci               26948   0  (unused)
usbcore                68864   1  [usb-uhci]
ext3                   74176   5 
jbd                    55304   5  [ext3]
sym53c8xx              67940   8 
sd_mod                 13888   8 
scsi_mod              126252   2  [sym53c8xx sd_mod]

Comment 3 Yngve Svendsen 2003-09-12 09:46:16 UTC

Slight bug in the bug description. The sentence
"The oopses are reproducible in the sense that they seem to always occur after
10-20 hours during a certain test, but there seems to be a single particular
point in the tests at which they happen."

should read

"The oopses are reproducible in the sense that they seem to always occur after
10-20 hours during a certain test, but there seems to be NO single particular
point in the tests at which they happen."

Comment 4 Yngve Svendsen 2003-09-12 11:12:49 UTC

Another crash today, on another, identical machine:

Unable to handle kernel paging request at virtual address 48f914f4
*pde = 00000000
Oops: 0002
Kernel 2.4.9-e.24smp
CPU:    0
EIP:    0010:[<c038f483>]    Not tainted
EFLAGS: 00010282
EIP is at tvec_bases [kernel] 0x12c3 
eax: c038dc00   ebx: c038e1c0   ecx: c038f420   edx: f44b3f7c
esi: efd73f7c   edi: c038f430   ebp: 00000046   esp: c0311eec
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c0311000)
68c038f4 Stack: c038f430 efd73f7c 00000046 c0311f10 c038e1c0 f44b3f7c c038f420 
       c038e400 c038f430 efd73f7c 00000046 c0311f30 c038e1c0 f44b3f7c c038f420 
       c038e400 c01249d1 efd73f7c c038e1c0 00000000 00000000 c0310000 c0125014 
Call Trace: [<c01249d1>] __run_timers [kernel] 0xd1 
[<c0125014>] run_local_timers [kernel] 0x94 
[<c0105400>] default_idle [kernel] 0x0 
[<c0114228>] smp_apic_timer_interrupt [kernel] 0xb8 
[<c0108eee>] do_IRQ [kernel] 0xfe 
[<c0105400>] default_idle [kernel] 0x0 
[<c0246629>] call_apic_timer_interrupt [kernel] 0x5 
[<c0105400>] default_idle [kernel] 0x0 
[<c0105400>] default_idle [kernel] 0x0 
[<c010542e>] default_idle [kernel] 0x2e 
[<c0105492>] cpu_idle [kernel] 0x32 
[<c0105000>] stext [kernel] 0x0 
[<c02447a0>] .rodata.str1.32 [kernel] 0x560 


Code: c0 80 f4 38 c0 88 f4 38 c0 88 f4 38 c0 90 f4 38 c0 90 f4 38 
CPU#1 is frozen.
< netdump activated - performing handshake with the client. >

Process: 0, {             swapper}
Kernel 2.4.9-e.24smp
EIP: 0010:[<c038f483>] CPU: 0
EIP is at tvec_bases [kernel] 0x12c3 
 EFLAGS: 00010282    Not tainted
EAX: c038dc00 EBX: c038e1c0 ECX: c038f420 EDX: f44b3f7c
ESI: efd73f7c EDI: c038f430 EBP: 00000046 DS: 0018 ES: 0018
CR0: 8005003b CR2: 48f914f4 CR3: 31f9c000 CR4: 000006d0
Call Trace: [<c01249d1>] __run_timers [kernel] 0xd1 
[<c0125014>] run_local_timers [kernel] 0x94 
[<c0105400>] default_idle [kernel] 0x0 
[<c0114228>] smp_apic_timer_interrupt [kernel] 0xb8 
[<c0108eee>] do_IRQ [kernel] 0xfe 
[<c0105400>] default_idle [kernel] 0x0 
[<c0246629>] call_apic_timer_interrupt [kernel] 0x5 
[<c0105400>] default_idle [kernel] 0x0 
[<c0105400>] default_idle [kernel] 0x0 
[<c010542e>] default_idle [kernel] 0x2e 
[<c0105492>] cpu_idle [kernel] 0x32 
[<c0105000>] stext [kernel] 0x0 
[<c02447a0>] .rodata.str1.32 [kernel] 0x560
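
A minimal sketch for reading the second oops, using only values copied from the log above. The "Oops: 0002" value is the i386 page-fault error code, whose low three bits mean: bit 0 set for a protection violation (clear for a not-present page), bit 1 set for a write, bit 2 set for user mode. The function and variable names are made up for illustration.

```python
# Values copied verbatim from the second oops above.
ERROR_CODE = 0x0002
FAULT_ADDRESS = 0x48F914F4  # "Unable to handle kernel paging request at ..."
CR2 = 0x48F914F4
EIP = 0xC038F483
SYMBOL_OFFSET = 0x12C3      # from "EIP is at tvec_bases [kernel] 0x12c3"


def decode_fault(error_code):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "cause": "protection violation" if error_code & 1 else "page not present",
        "access": "write" if error_code & 2 else "read",
        "mode": "user" if error_code & 4 else "kernel",
    }


fault = decode_fault(ERROR_CODE)
symbol_base = EIP - SYMBOL_OFFSET

print(fault)
print("CR2 matches reported address:", CR2 == FAULT_ADDRESS)
print("tvec_bases base:", hex(symbol_base))
```

Read this way, the fault is a kernel-mode write to a page that is not mapped, which agrees with the "*pde = 00000000" line. The computed symbol base, c038e1c0, is the same as in the first oops, so both crashes have EIP inside tvec_bases when called from __run_timers.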

Comment 5 Jeff Needle 2003-10-02 14:31:23 UTC


*** This bug has been marked as a duplicate of 84452 ***

Comment 6 Red Hat Bugzilla 2006-02-21 18:58:34 UTC

Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
