Bug 43311

Summary: Kernel oops causes X to hang
Product: [Retired] Red Hat Linux Reporter: rich
Component: kernel Assignee: Arjan van de Ven <arjanv>
Status: CLOSED NOTABUG QA Contact: Brock Organ <borgan>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.1   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2001-08-25 02:24:12 UTC Type: ---
Attachments:
Description                            Flags
Kernel OOPS output                     none
ksymoops output                        none
ksymoops output for kernel 2.4.3-12    none

Description rich 2001-06-03 01:05:22 UTC
Description of Problem:  Kernel oops causes X to hang.  I can telnet
in from another system and perform an orderly shutdown.  This happens
once or twice a day.  It's not always X that gets the error, but it
usually is.  The failure is always at the same location,
__pollwait+16/144, if I'm reading the report correctly.

(see attached kernel messages)

How Reproducible:  I can't find a consistent pattern to reproduce it.



Comment 1 rich 2001-06-03 01:06:57 UTC
Created attachment 20151 [details]
Kernel OOPS output

Comment 2 Johan Leckstrom 2001-06-07 12:51:19 UTC
I have similar problems, and they occur less often since I increased swap space
from approx. 400 MB to 600 MB. I have 224 MB of RAM and heard that
swap space should be at least double that, due to a bug...

Comment 3 rich 2001-06-07 13:14:53 UTC
I have 256 MB of RAM and my swap file is nearly 550 MB.  I upgraded my kernel 
to 2.4.5 a couple of days ago and have had only one hit since then.  I also 
upgraded my BIOS to the latest one.  I am suspicious of the VIA chipset on my 
motherboard. I have an ABit KT7A-RAID (nothing on the RAID controller at the
moment).  I have no direct evidence that it's the VIA but a lot of people have 
had problems with it. I've done some looking at the kernel source to try and 
determine what the pollwait routine does and see if it may be a chipset issue. 
Not much progress on that.  When/if it happens again I'm going to telnet into 
the box and run ksymoops at the time of the failure and see if that reveals 
anything.  At this point I'm waiting on another incident.

Comment 4 Arjan van de Ven 2001-06-11 15:09:43 UTC
Some BIOSes are known to misconfigure (in hindsight) VIA chipsets...
If a BIOS upgrade fixed it, it seems like a BIOS/chipset bug indeed.
I'll mark this bug as "needinfo" so it's simple to reopen if this
returns...

Comment 5 rich 2001-06-24 01:42:44 UTC
Created attachment 21645 [details]
ksymoops output

Comment 6 rich 2001-06-24 01:43:02 UTC
Just when I thought this problem had disappeared, I had
another occurrence this evening.  This time I telnetted in
and ran ksymoops on the running system before rebooting.
I also captured the output from lsmod.

[root@localhost /root]# lsmod
Module                  Size  Used by
iptable_filter          1952   0  (autoclean) (unused)
ip_tables              10752   1  [iptable_filter]
ppp_async               6320   1  (autoclean)
ppp_generic            13344   3  (autoclean) [ppp_async]
slhc                    5024   0  (autoclean) [ppp_generic]
via686a                 8336   0 
sensors                 6064   0  [via686a]
i2c-isa                 1184   0  (unused)
i2c-core               13392   0  [via686a sensors i2c-isa]
cdrom                  27520   0  [ide-cd]
autofs                 10240   1  (autoclean)
ide-scsi                7968   0 
scsi_mod               82896   1  [ide-scsi]
ide-cd                 26336   0


[root@localhost /root]# lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:07.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:08.0 Multimedia audio controller: Ensoniq CT5880 [AudioPCI] (rev 02)
00:0f.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet LANCE] (rev 16)
01:00.0 VGA compatible controller: ATI Technologies Inc Rage 128 SM


See the attachment for the ksymoops output.



Comment 7 Arjan van de Ven 2001-06-25 06:36:28 UTC
Can you try removing the lmsensors module?
Also, what kernel was this exactly ?

Comment 8 rich 2001-06-25 12:31:09 UTC
I can remove the lmsensors module.  I have taken the oops with and without
those modules loaded.  The kernel is 2.4.5.  As noted before, the oops 
was happening daily with 2.4.2-2.  After flashing the bios and upgrading 
the kernel the instances have become much less frequent (i.e. 2).  I can't 
say which is responsible -- the bios upgrade or the kernel upgrade. My 
suspicion is the bios upgrade since there have been numerous problems with
VIA chipsets.

Comment 9 rich 2001-06-27 20:09:51 UTC
Created attachment 21986 [details]
ksymoops output for kernel 2.4.3-12

Comment 10 rich 2001-06-27 20:10:25 UTC
Since my last update I have had many occurrences of this problem: 7 yesterday and
3 today.  All were without LMSENSORS loaded.  This morning I backed off to kernel
2.4.2-2 (the original from Red Hat 7.1).  The hit this morning was taken with the 2.4.2-2
kernel.  At noon I upgraded to the new 2.4.3-12 kernel.  I took a hit this afternoon while
it was running.  I've included the ksymoops output as an attachment.

I don't understand why I ran for nearly three weeks without incident.  The only updates
to the system were via up2date.  Those were...


SysVinit-2.78-17.i386.rpm
XFree86-SVGA-3.3.6-38.i386.rpm
XFree86-VGA16-3.3.6-38.i386.rpm
cpp-2.96-85.i386.rpm
gcc-2.96-85.i386.rpm
gcc-c++-2.96-85.i386.rpm
libstdc++-2.96-85.i386.rpm
libstdc++-devel-2.96-85.i386.rpm

I have no idea if this is a hardware problem or a software problem.  The problem is 
always at __pollwait+16.  I have no idea what that routine does.
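
A note on reading the location string, since it comes up above: in oops
reports a location of the form symbol+offset/size is generally read as "offset
bytes into the function symbol, which is size bytes long".  The following is
only a minimal sketch of that reading, not a replacement for ksymoops.  The
helper names parse_location and describe are hypothetical, and it assumes the
numbers are decimal as quoted in this report unless they carry a 0x prefix;
some tools print bare hexadecimal instead, which this sketch does not detect.

```python
import re

# Matches a location such as "__pollwait+16/144".
# Assumption: numbers are decimal unless written with a 0x prefix.
LOCATION = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+(?P<offset>0[xX][0-9a-fA-F]+|\d+)"
    r"/(?P<size>0[xX][0-9a-fA-F]+|\d+)"
)


def _to_int(text):
    """Convert a decimal or 0x-prefixed hexadecimal string to an int."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text, 10)


def parse_location(text):
    """Return (symbol, offset, size) for the first location in text, or None."""
    match = LOCATION.search(text)
    if match is None:
        return None
    return (
        match.group("symbol"),
        _to_int(match.group("offset")),
        _to_int(match.group("size")),
    )


def describe(text):
    """Return a one-line, human-readable reading of the location, or None."""
    parsed = parse_location(text)
    if parsed is None:
        return None
    symbol, offset, size = parsed
    return f"{offset} bytes into {symbol}, which is {size} bytes long"


if __name__ == "__main__":
    print(describe("__pollwait+16/144"))
```

Read this way, the same small offset showing up in every report means the
fault is being taken at the same instruction near the start of the routine each
time.  That only identifies where the fault is taken; it does not by itself say
whether the cause is software or hardware.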


Comment 11 rich 2001-08-25 02:24:08 UTC
This problem should be closed.  It turned out to be a bad CPU!!!  The processor chip
was replaced and all is well.

Comment 12 Arjan van de Ven 2001-08-25 06:49:15 UTC
Thank you for letting us know.
Bad hardware is not something I can fix ;)