Red Hat Bugzilla – Bug 43311
Kernel oops causes X to hang
Last modified: 2007-04-18 12:33:32 EDT
Description of Problem: Kernel oops causes X to hang. I can telnet
in from another system and perform an orderly shutdown. This happens
once or twice a day. It's not always X that gets the error, but it
usually is. The failure is always at the same location,
__pollwait+16/144, if I'm reading the report correctly
(see attached kernel messages).
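For readers unfamiliar with the notation, a location such as `__pollwait+16/144` is conventionally read as symbol, offset into the symbol, and total size of the symbol in bytes. The following is only a minimal sketch of that reading; the function name `parse_location` and the `base` parameter are hypothetical, and whether the numbers in a given report are decimal or hexadecimal depends on the tool that produced it (decimal is assumed here because that is how the value appears in this report).

```python
import re

# Matches "symbol+offset/size", e.g. "__pollwait+16/144".
LOCATION = re.compile(
    r"^(?P<symbol>\w+)\+(?P<offset>[0-9a-fA-F]+)/(?P<size>[0-9a-fA-F]+)$"
)


def parse_location(text, base=10):
    """Split 'symbol+offset/size' into (symbol, offset, size).

    `base` is an assumption supplied by the caller: 10 if the report
    prints decimal numbers, 16 if it prints hexadecimal ones.
    """
    match = LOCATION.match(text.strip())
    if match is None:
        raise ValueError(f"not in symbol+offset/size form: {text!r}")
    offset = int(match.group("offset"), base)
    size = int(match.group("size"), base)
    return match.group("symbol"), offset, size


symbol, offset, size = parse_location("__pollwait+16/144")
print(f"{symbol}: {offset} bytes into a {size}-byte routine")
```

Read this way, the fault sits 16 bytes into a 144-byte routine, that is, near its start. The same location written in hexadecimal would be `+10/90`, which `parse_location("__pollwait+10/90", base=16)` decodes to the same values.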
How Reproducible: Can't figure out a consistent pattern to reproduce.
Steps to Reproduce:
Created attachment 20151 [details]
Kernel OOPS output
I have similar problems, and they have become less frequent since I increased
swap space from approx. 400 MB to 600 MB. I have 224 MB of RAM and heard that
swap space should be at least double that, due to a bug...
I have 256 MB of RAM and my swap file is nearly 550 MB. I upgraded my kernel
to 2.4.5 a couple of days ago and have had only one hit since then. I also
upgraded my BIOS to the latest one. I am suspicious of the VIA chipset on my
motherboard. I have an ABit KT7A-RAID (nothing on the RAID controller at the
moment). I have no direct evidence that it's the VIA but a lot of people have
had problems with it. I've done some looking at the kernel source to try and
determine what the pollwait routine does and see if it may be a chipset issue.
Not much progress on that. When/if it happens again I'm going to telnet into
the box and run ksymoops at the time of the failure and see if that reveals
anything. At this point I'm waiting on another incident.
Some BIOSes are known to misconfigure (in hindsight) VIA chipsets...
If a BIOS upgrade fixed it, it does seem like a BIOS/chipset bug.
I'll mark this bug as "needinfo" so it's simple to reopen once this
Created attachment 21645 [details]
Just when I thought this problem had disappeared, I had
another occurrence this evening. This time I telnetted in
and ran ksymoops on the running system before rebooting.
I also captured the output from lsmod.
[root@localhost /root]# lsmod
Module Size Used by
iptable_filter 1952 0 (autoclean) (unused)
ip_tables 10752 1 [iptable_filter]
ppp_async 6320 1 (autoclean)
ppp_generic 13344 3 (autoclean) [ppp_async]
slhc 5024 0 (autoclean) [ppp_generic]
via686a 8336 0
sensors 6064 0 [via686a]
i2c-isa 1184 0 (unused)
i2c-core 13392 0 [via686a sensors i2c-isa]
cdrom 27520 0 [ide-cd]
autofs 10240 1 (autoclean)
ide-scsi 7968 0
scsi_mod 82896 1 [ide-scsi]
ide-cd 26336 0
[root@localhost /root]# lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev
00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev
00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:07.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev
00:08.0 Multimedia audio controller: Ensoniq CT5880 [AudioPCI] (rev 02)
00:0f.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet LANCE]
01:00.0 VGA compatible controller: ATI Technologies Inc Rage 128 SM
See the attachment for the ksymoops output.
Can you try removing the lmsensors module?
Also, what kernel was this exactly?
I can remove the lmsensors module. I have taken the oops with and without
those modules loaded. The kernel is 2.4.5. As noted before, the oops
was happening daily with 2.4.2-2. After flashing the BIOS and upgrading
the kernel the instances have become much less frequent (only 2 so far). I can't
say which is responsible -- the BIOS upgrade or the kernel upgrade. My
suspicion is the BIOS upgrade since there have been numerous problems with
Created attachment 21986 [details]
ksymoops output for kernel 2.4.3-12
Since my last update I have had many occurrences of this problem: 7 yesterday and
3 today. All were without LMSENSORS loaded. This morning I backed off to kernel
2.4.2-2 (the original from Red Hat 7.1). The hit this morning was taken with the 2.4.2-2
kernel. At noon I upgraded to the new 2.4.3-12 kernel. I took a hit this afternoon while
it was running. I've included the ksymoops output as an attachment.
I don't understand why I ran for nearly three weeks without incident. The only updates
to the system were via up2date. Those were...
I have no idea if this is a hardware problem or a software problem. The problem is
always at __pollwait+16. I have no idea what that routine does.
This problem should be closed. It turned out to be a bad CPU!!! The processor chip
was replaced and all is well.
Thank you for letting us know.
Bad hardware is not something I can fix ;)