79884 – Oops with 2.4.18-18.7.x

Summary: Oops with 2.4.18-18.7.x

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	7.3
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2002-12-17 20:05 UTC by Michal Jaegermann
Modified:	2007-04-18 16:49 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2003-01-11 18:27:49 UTC
Embargoed:

Description Michal Jaegermann 2002-12-17 20:05:24 UTC

Description of problem:

On one machine, around 6:40 local time, when the computer in question was
essentially idle, the kernel oopsed and the machine went down.
An automatic reboot attempt (nobody was around) ended with:

Uncompressing Linux....

crc error

-- System halted

Only later, after the machine had been powered down manually, was it
possible to power it up and restart.

Here is the decoded oops:

Unable to handle kernel NULL pointer dereference at virtual address 00000005
 printing eip:
c0116a3a
*pde = 00000000
Oops: 0002
CPU:    0
EIP:    0010:[<c0116a3a>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00210092
eax: 00000001   ebx: 00200292   ecx: 00000002   edx: dd97c03c
esi: cd97c000   edi: cd97c008   ebp: 00000000   esp: d390ff1c
ds: 0018   es: 0018   ss: 0018
Process gnome-smproxy (pid: 23583, stackpage=d390f000)
Stack: dd97c038 c0146a6e 00000000 d6a68340 00000001 c0146e26 d390ff54 d390ff54 
       00000020 d390e000 7fffffff 00000006 00000000 00000006 00000000 cd97c000 
       00000001 bffff7f4 deb1dd58 00000006 c01471a9 00000006 d390ff90 d390ff8c 
Call Trace: [<c0146a6e>] poll_freewait [kernel] 0x2e (0xd390ff20))
[<c0146e26>] do_select [kernel] 0x226 (0xd390ff30))
[<c01471a9>] sys_select [kernel] 0x339 (0xd390ff6c))
[<c010893b>] system_call [kernel] 0x33 (0xd390ffc0))
Code: 89 48 04 89 01 53 9d 5b c3 8d b6 00 00 00 00 8d bc 27 00 00 

>>EIP; c0116a3a <remove_wait_queue+a/20>   <=====
Trace; c0146a6e <poll_freewait+2e/50>
Trace; c0146e26 <do_select+226/240>
Trace; c01471a9 <sys_select+339/480>
Trace; c010893b <system_call+33/38>
Code;  c0116a3a <remove_wait_queue+a/20>
00000000 <_EIP>:
Code;  c0116a3a <remove_wait_queue+a/20>   <=====
   0:   89 48 04                  mov    %ecx,0x4(%eax)   <=====
Code;  c0116a3d <remove_wait_queue+d/20>
   3:   89 01                     mov    %eax,(%ecx)
Code;  c0116a3f <remove_wait_queue+f/20>
   5:   53                        push   %ebx
Code;  c0116a40 <remove_wait_queue+10/20>
   6:   9d                        popf   
Code;  c0116a41 <remove_wait_queue+11/20>
   7:   5b                        pop    %ebx
Code;  c0116a42 <remove_wait_queue+12/20>
   8:   c3                        ret    
Code;  c0116a43 <remove_wait_queue+13/20>
   9:   8d b6 00 00 00 00         lea    0x0(%esi),%esi
Code;  c0116a49 <remove_wait_queue+19/20>
   f:   8d bc 27 00 00 00 00      lea    0x0(%edi,1),%edi
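
As a reading aid, the reported fault address and the "Oops: 0002" code can be cross-checked against the faulting instruction and the register dump above. The following is a minimal sketch in Python. The function names are hypothetical and are not part of ksymoops or the kernel, and the bit meanings assumed are the low three bits of the standard i386 page-fault error code.

```python
# Values copied from the oops above.
EAX = 0x00000001               # base register of the faulting store
ERROR_CODE = 0x0002            # from the "Oops: 0002" line
REPORTED_ADDRESS = 0x00000005  # from the "Unable to handle ..." line


def decode_page_fault_error(code):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "protection_violation": bool(code & 0x1),  # False: page not present
        "write_access": bool(code & 0x2),          # False: read access
        "user_mode": bool(code & 0x4),             # False: kernel mode
    }


def store_target(base, displacement):
    """Effective address of `mov %ecx,disp(%eax)` on a 32-bit machine."""
    return (base + displacement) & 0xFFFFFFFF


if __name__ == "__main__":
    print(decode_page_fault_error(ERROR_CODE))
    print(hex(store_target(EAX, 0x4)), hex(REPORTED_ADDRESS))
```

With eax = 00000001, the store to 0x4(%eax) targets address 00000005, which matches the reported virtual address, and error code 0002 marks a kernel-mode write to a page that is not present.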

Version-Release number of selected component (if applicable):
2.4.18-18.7.x

Comment 1 Ben LaHaise 2002-12-17 20:37:02 UTC

Is the hardware for this machine known good?  Does it pass an overnight run of
memtest86?  The fact that a boot failed with an invalid crc strongly hints at
that, or possibly the cpu overheating.

Comment 2 Michal Jaegermann 2002-12-17 22:49:47 UTC

> Is the hardware for this machine known good?

Well, it has been in continuous use for the last two years and this is the
first incident of that sort (some three weeks after 2.4.18-18.7.x
was installed). In other words, so far the hardware has looked good. :-)
For now it is running again after a powerdown and reboot.

memtest86 has not been run so far, and that is not easy to do as the
machine is quite far from my desk. :-)  It is still an open option, but
not one that is easy to arrange.

Comment 3 Michal Jaegermann 2002-12-21 17:44:29 UTC

As of today, it looks like this oops was really caused by a broken CPU fan.
I will monitor the situation further.

Comment 4 Michal Jaegermann 2003-01-11 18:27:49 UTC

It definitely was a broken fan.
