Bug 5474

Summary:	frequent kernel crash
Product:	[Retired] Red Hat Linux	Reporter:	Brian Gunney <btng>
Component:	kernel	Assignee:	Michael K. Johnson <johnsonm>
Status:	CLOSED NOTABUG	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	6.0	CC:	davem
Target Milestone:	---
Target Release:	---
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2000-02-05 23:41:13 UTC	Type:	---

Description Brian Gunney 1999-10-02 15:38:19 UTC

Ever since RH 6.0, I've been having a lot of problems with
crashing kernels.  I've upgraded the kernel-related packages
to 2.2.5-22 and the XFree86 packages to 3.3.5-1.6.0.

Usually, the crash is associated with opening lots of
windows (swapping problem, bug 4224), with netscape (X font
server, bug 4187), or simply with leaving the computer
running for several days.  It usually occurs when I
shut down the X server, log out (as an ordinary user in run
level 5), or change to run level 1.  A large amount of
diagnostic output scrolls by and the computer hangs, requiring
a hard reboot.  (Occasionally, as a result, some packages get
corrupted and I have to reinstall them.)  A few days ago, the
diagnostic output was short enough to fit on the screen, so I
was able to transcribe it before rebooting.  Here goes:

Output left on the last screen before the computer hung.

...
INIT: Sending processes the TERM signal
Shutting down X Font Server:					[OK]
Stopping gpm							[OK]
Unable to handle kernel paging request at virtual address d10ea7d8
current->tss.cr3 = 00872000, %cr3=00872000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c01307ae>]
EFLAGS: 00010212
eax: d10ea770 ebx: 000008ae ecx: c022fd68 edx: d10ea770
esi: c3d9c200 edi: 000008ae ebp: c3d9c200 esp: c335df14
ds: 0018 es: 0018 ss: 0018
Process rc (pid:2932, process nr:45, stackpage=c335d000)
stack: c0130a0d c3d9c200 000008ae c022fd68 000008ae c2041360 c2595540 c259558c
       c013d414 c3d9c200 000008ae c2041360 c2595540 c335df88 c26dc4b0 c012b05a
       c2595540 c2041360 c335df88 00000000 c243e016 00000001 c012b222 c3f5c280
Call Trace:
[<c0130a0d>][<c013d414>][<c012b05a>][<c012b222>][<c012b314>][<c0129422>][<c01095a8>]
Code: 39 70 68 75 0d 39 58 18 75 08 ff 40 1c eb 0b 8d 76 00 8b 12
INIT: no more processes left in this runlevel

At this point, the computer hung.

At other times, the diagnostics report a kernel panic.  At still
other times, there is no diagnostic at all; it just hangs.
I realize that this is not very specific information.  Let
me know how to get better information on the crash.

Comment 1 Bill Nottingham 1999-10-04 15:32:59 UTC

One thing you can do is run the oops through the
'ksymoops' program (it's included in the kernel source
in /usr/src/linux/scripts/ksymoops). This will show exactly
where the kernel dies.
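
To illustrate the core idea behind that kind of decoding, here is a
minimal sketch, not ksymoops itself: it pulls the bracketed addresses
out of oops text and maps each one to the nearest preceding entry in a
table laid out like System.map ("address type name", sorted by address).
The function names resolve_addrs and extract_addrs are made up for this
example, and ksymoops does considerably more (it also decodes the Code
bytes, for instance).

```shell
#!/bin/sh
# Sketch only: a rough approximation of address-to-symbol lookup.
# resolve_addrs and extract_addrs are hypothetical helper names.

# extract_addrs: read oops text on stdin, print one address per line
# for every "[<xxxxxxxx>]" token found.
extract_addrs() {
    # Split after each "]" so there is at most one address per line.
    tr ']' '\n' | sed -n 's/.*\[<\([0-9a-f]*\)>.*/\1/p'
}

# resolve_addrs MAPFILE ADDR...
# MAPFILE is assumed to use the System.map layout, sorted by address.
# Prints "ADDR SYMBOL", or "ADDR ?" if no symbol precedes the address.
resolve_addrs() {
    map=$1
    shift
    for addr in "$@"; do
        awk -v a="$addr" '
            # Addresses are fixed-width lowercase hex, so string order
            # matches numeric order; the "" forces a string comparison.
            ($1 "") <= (a "") { sym = $3 }
            END { print a, (sym == "" ? "?" : sym) }
        ' "$map"
    done
}
```

Assuming the transcribed oops were saved in a file (say oops.txt, a
hypothetical name) and MAP pointed at the System.map matching the
crashed kernel, it could be used as
resolve_addrs "$MAP" $(extract_addrs < oops.txt).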

Comment 2 Cristian Gafton 1999-10-06 22:55:59 UTC

assigned to dledford

Comment 3 Cristian Gafton 1999-10-06 23:45:59 UTC

assigned to dledford

Comment 4 Doug Ledford 1999-10-07 02:14:59 UTC

Please download and install the 2.2.12-20 kernel from the Red Hat
Linux 6.1 distribution and see if the problem continues.  Report that
back here.  Until we know if the problem still happens this bug report
will be on hold.

------- Additional Comments From   10/11/99 09:04 -------
Where is the Oops file located?

Comment 5 Brian Gunney 1999-11-19 00:12:59 UTC

I've updated to kernel 2.2.12-20.  The problem still exists.

Comment 6 Doug Ledford 1999-11-19 04:11:59 UTC

The problem description you've given so far sounds largely like a hardware
corruption issue.  We have a standard test we run on systems to see if there are
hardware problems.  That test requires about 100 MB of free disk space, one
of the linux-2.2.x.tar.gz kernel source tarballs, and the following script:

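#!/bin/sh
# Extract the same kernel tarball repeatedly and diff each copy against
# a reference copy.  On healthy hardware this prints nothing.
TARBALL=linux-2.2.0.tar.gz   # any linux-2.2.x.tar.gz will do

rm -rf linux linux.save
tar xzf "$TARBALL"
mv linux linux.save          # reference copy
i=0
while [ "$i" -lt 10 ]; do
  tar xzf "$TARBALL"
  diff -U 3 -rN linux.save linux   # any output means the copies differ
  rm -rf linux
  i=$((i + 1))
done
rm -rf linux.save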
_______End of Script______

If you run that script and it generates any output on your computer screen at
all, then you most likely have hardware corruption.  I realize this isn't
something one would expect to show up just by upgrading the OS, but keep in mind
that with each version of our product that we release, the kernel typically just
keeps getting faster and faster, and the later kernels can cause problems to
show up on systems that were just barely eking by before.
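
For a lighter variant of the same idea, the following is a sketch under
stated assumptions, not the standard test described above: instead of
extracting to disk and diffing, it decompresses the tarball several
times and compares checksums of the resulting stream.  It needs almost
no free disk space, but it does not exercise the disk write path, so a
clean run is weaker evidence than a clean run of the full script.  The
function name check_passes is made up for this example; gzip and cksum
are assumed to be installed.

```shell
#!/bin/sh
# Sketch only: repeat a decompress-and-checksum pass and report any
# pass whose checksum differs from the first one.

# check_passes TARBALL COUNT
# Prints one line per mismatching pass; prints nothing if all agree.
check_passes() {
    tarball=$1
    count=$2
    # Reference checksum of the decompressed stream.
    ref=$(gzip -dc "$tarball" | cksum)
    i=1
    while [ "$i" -le "$count" ]; do
        cur=$(gzip -dc "$tarball" | cksum)
        if [ "$cur" != "$ref" ]; then
            echo "pass $i: checksum mismatch ($cur vs $ref)"
        fi
        i=$((i + 1))
    done
}
```

As with the script above, any output at all from something like
check_passes linux-2.2.0.tar.gz 10 would point toward hardware
corruption.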

Comment 7 Brian Gunney 1999-12-09 23:10:59 UTC

I got output from running the script, so it must be hardware.