Bug 79884 - Oops with 2.4.18-18.7.x
Summary: Oops with 2.4.18-18.7.x
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2002-12-17 20:05 UTC by Michal Jaegermann
Modified: 2007-04-18 16:49 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-01-11 18:27:49 UTC
Embargoed:



Description Michal Jaegermann 2002-12-17 20:05:24 UTC
Description of problem:

On one machine, around 6:40 local time, while the computer in question was
essentially idle, the kernel oopsed and the machine went down.
An attempted automatic reboot (nobody was around) ended with

Uncompressing Linux....

crc error

-- System halted

Only later, after the machine had been powered down manually, was it
possible to power it up and restart it.

Here is the decoded oops.

Unable to handle kernel NULL pointer dereference at virtual address 00000005
 printing eip:
c0116a3a
*pde = 00000000
Oops: 0002
CPU:    0
EIP:    0010:[<c0116a3a>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00210092
eax: 00000001   ebx: 00200292   ecx: 00000002   edx: dd97c03c
esi: cd97c000   edi: cd97c008   ebp: 00000000   esp: d390ff1c
ds: 0018   es: 0018   ss: 0018
Process gnome-smproxy (pid: 23583, stackpage=d390f000)
Stack: dd97c038 c0146a6e 00000000 d6a68340 00000001 c0146e26 d390ff54 d390ff54 
       00000020 d390e000 7fffffff 00000006 00000000 00000006 00000000 cd97c000 
       00000001 bffff7f4 deb1dd58 00000006 c01471a9 00000006 d390ff90 d390ff8c 
Call Trace: [<c0146a6e>] poll_freewait [kernel] 0x2e (0xd390ff20))
[<c0146e26>] do_select [kernel] 0x226 (0xd390ff30))
[<c01471a9>] sys_select [kernel] 0x339 (0xd390ff6c))
[<c010893b>] system_call [kernel] 0x33 (0xd390ffc0))
Code: 89 48 04 89 01 53 9d 5b c3 8d b6 00 00 00 00 8d bc 27 00 00 

>>EIP; c0116a3a <remove_wait_queue+a/20>   <=====
Trace; c0146a6e <poll_freewait+2e/50>
Trace; c0146e26 <do_select+226/240>
Trace; c01471a9 <sys_select+339/480>
Trace; c010893b <system_call+33/38>
Code;  c0116a3a <remove_wait_queue+a/20>
00000000 <_EIP>:
Code;  c0116a3a <remove_wait_queue+a/20>   <=====
   0:   89 48 04                  mov    %ecx,0x4(%eax)   <=====
Code;  c0116a3d <remove_wait_queue+d/20>
   3:   89 01                     mov    %eax,(%ecx)
Code;  c0116a3f <remove_wait_queue+f/20>
   5:   53                        push   %ebx
Code;  c0116a40 <remove_wait_queue+10/20>
   6:   9d                        popf   
Code;  c0116a41 <remove_wait_queue+11/20>
   7:   5b                        pop    %ebx
Code;  c0116a42 <remove_wait_queue+12/20>
   8:   c3                        ret    
Code;  c0116a43 <remove_wait_queue+13/20>
   9:   8d b6 00 00 00 00         lea    0x0(%esi),%esi
Code;  c0116a49 <remove_wait_queue+19/20>
   f:   8d bc 27 00 00 00 00      lea    0x0(%edi,1),%edi

Version-Release number of selected component (if applicable):
2.4.18-18.7.x

Comment 1 Ben LaHaise 2002-12-17 20:37:02 UTC
Is the hardware for this machine known good?  Does it pass an overnight run of
memtest86?  The fact that the boot failed with an invalid CRC strongly hints at
faulty hardware such as bad memory, or possibly at the CPU overheating.

Comment 2 Michal Jaegermann 2002-12-17 22:49:47 UTC
> Is the hardware for this machine known good?

Well, it has been in continuous use for the last two years, and this is the
first incident of this sort (some three weeks after 2.4.18-18.7.x
was installed). In other words, so far the hardware has looked good. :-)
For now it is running again after a power-down and reboot.

I have not run memtest86 so far, and that is not easy since the machine
is quite far from my desk. :-)  It is still an option, but not an easy
one to arrange.

Comment 3 Michal Jaegermann 2002-12-21 17:44:29 UTC
As of today, this oops looks like it was really caused by a broken CPU fan.
I will keep monitoring the situation.

Comment 4 Michal Jaegermann 2003-01-11 18:27:49 UTC
It definitely was a broken fan.

