Ever since RH 6.0, I've been having a lot of problems with kernel crashes. I've upgraded the kernel-related packages to 2.2.5-22 and the XFree86 packages to 3.3.5-1.6.0. Usually, the crash is associated with opening lots of windows (swapping problem, bug 4224), with netscape (X font server, bug 4187), or simply with leaving the computer running for several days.

Usually, the crash occurs when I shut down the X server, log out (as an ordinary user in run level 5), or change to run level 1. There is a great deal of diagnostic output, and then the computer hangs, requiring a hard reboot. (Occasionally, as a result, some packages get corrupted and I have to reinstall them.)

A few days ago, the diagnostic output was short enough to fit on the screen, so I was able to transcribe it before rebooting. Here is the output left on the last screen before the computer hung:

...
INIT: Sending processes the TERM signal
Shutting down X Font Server: [OK]
Stopping gpm [OK]
Unable to handle kernel paging request at virtual addresses d10ea7d8
current->tss.cr3 = 00872000, %cr3=00872000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c01307ae>]
EFLAGS: 00010212
eax: d10ea770 ebx: 000008ae ecx: c022fd68 edx: d10ea770
esi: c3d9c200 edi: 000008ae ebp: c3d9c200 esp: c335df14
ds: 0018 es: 0018 ss: 0018
Process rc (pid:2932, process nr:45, stackpage=c335d000)
stack: c0130a0d c3d9c200 000008ae c022fd68 000008ae c2041360 c2595540 c259558c
       c013d414 c3d9c200 000008ae c2041360 c2595540 c335df88 c26dc4b0 c012b05a
       c2595540 c2041360 c335df88 00000000 c243e016 00000001 c012b222 c3f5c280
Call Trace: [<c0130a0d>][<c013d414>][<c012b05a>][<c012b222>][<c012b314>][<c0129422>][<c01095a8>]
Code: 39 70 68 75 0d 39 58 18 75 08 ff 40 1c eb 0b 8d 76 00 8b 12
INIT: no more processes left in this runlevel

At this point, the computer hung. At other times, the diagnostics report a kernel panic. At yet other times, there is no diagnostic output at all; it just hangs.

I realize that this is not very specific information. Let me know how to get better information on the crash.
One thing you can do is run the oops through the 'ksymoops' program (it's included in the kernel source in /usr/src/linux/scripts/ksymoops). This will translate the raw addresses in the oops into symbol names and show exactly where the kernel dies.
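Even before ksymoops is run, the register dump can be sanity-checked by hand. The sketch below rests on two assumptions that are not stated in the report: that the Code: bytes begin at the faulting EIP, and that the first three bytes (39 70 68) decode as a compare of %esi against memory at 0x68(%eax). If both hold, the faulting address should equal eax plus 0x68.

```shell
#!/bin/sh
# Values copied from the transcribed oops above.
eax=0xd10ea770
# Assumption: "39 70 68" decodes as cmp %esi,0x68(%eax), so the displacement is 0x68.
disp=0x68

# Address the instruction would have tried to read.
fault=$(printf '%08x' $(($eax + $disp)))
echo "$fault"
```

The result can be compared with the address on the "Unable to handle kernel paging request" line. This only shows that the transcription is self-consistent; it does not identify the function, which is what ksymoops is for.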
assigned to dledford
Please download and install the 2.2.12-20 kernel from the Red Hat Linux 6.1 distribution and see if the problem continues. Report that back here. Until we know if the problem still happens, this bug report will be on hold.

------- Additional Comments From 10/11/99 09:04 -------

Where is the Oops file located?
I've updated to kernel 2.2.12-20. The problem still exists.
The problem description you've given so far sounds largely like a hardware corruption issue. We have a standard test we run on systems to see if there are hardware problems. That test requires about 100 MB of free disk space, one of the linux-2.2.x.tar.gz kernel source tarballs, and the following script:

#!/bin/sh
# Extract a reference copy of the kernel source tree.
rm -fr linux linux.save
tar xzf linux-2.2.0.tar.gz
mv linux linux.save
# Extract the same tarball ten more times and compare each copy
# against the reference. A healthy system prints nothing.
i=0
while [ "$i" -lt "10" ]; do
    tar xzf linux-2.2.0.tar.gz
    diff -U 3 -rN linux.save linux
    rm -rf linux
    i=`expr $i + 1`
done
rm -fr linux.save
_______End of Script______

If you run that script and it generates any output on your computer screen at all, then you most likely have hardware corruption. I realize this isn't something one would expect to show up just by upgrading the OS, but keep in mind that with each version of our product that we release, the kernel typically just keeps getting faster and faster, and the later kernels can cause problems to show up on systems that were just barely eking by before.
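For anyone who would rather have a count than scan the screen for diff output, the same idea can be wrapped in a function. This is only a sketch of the test described above, not the script we actually use: the function name count_bad_extractions is made up for illustration, it works in a temporary directory created with mktemp rather than the current directory, and it relies on tar accepting -C.

```shell
#!/bin/sh
# Hypothetical helper: extract TARBALL once as a reference, then RUNS more
# times, and print how many of those runs differed from the reference.
count_bad_extractions() {
    tarball=$1
    runs=${2:-10}
    work=$(mktemp -d) || return 1

    # Reference copy that every later extraction is compared against.
    mkdir "$work/ref" && tar xzf "$tarball" -C "$work/ref" || {
        rm -rf "$work"
        return 1
    }

    bad=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        mkdir "$work/run"
        tar xzf "$tarball" -C "$work/run"
        # Any difference (or a failed extraction) counts as a bad run.
        if ! diff -rN "$work/ref" "$work/run" >/dev/null 2>&1; then
            bad=$((bad + 1))
        fi
        rm -rf "$work/run"
        i=$((i + 1))
    done

    rm -rf "$work"
    echo "$bad"
}
```

Calling count_bad_extractions linux-2.2.0.tar.gz 10 should print 0 on a healthy machine. Any other number means at least one extraction did not match the reference, which points the same way as output from the original script.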
I got output from running the script, so it must be hardware.