Bug 18006

Summary: Observed SMP kernel hang twice in two days
Product: [Retired] Red Hat Linux
Reporter: Jesper Skov <jskov>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 7.0
CC: alan, angela.duane, jfarinas, jskov, julianws, nigel
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2003-06-10 18:43:45 UTC

Description Jesper Skov 2000-10-01 09:34:02 UTC
I've had the 2.2.16-22smp kernel hang on me twice in two days: no activity
on the screen, no response to mouse or keyboard input. Had to hard reset
both times.

The second time I noticed that one of the CPUs was chugging away at
100% load, but I can't think of anything running at the time which could
have caused a solid load like that. No clues in the logs.

Both times I was doing interactive GDB work, and running eCos serial
tests at 115200 baud. GDB connects to the target via a local socket
connection. So high serial load and/or socket IO _may_ be part of the
reason for locking up the box.

My box is a dual 450MHz PIII w 256MB RAM and IDE disks (running in DMA
mode).


After the second crash I downgraded my kernel to the Pinstripe
2.2.16-17smp kernel, which I've been using for many, many days without
problems. I will add a comment to this bug if the kernel hangs again.

Comment 1 Alan Cox 2000-10-01 21:42:44 UTC
If 2.2.16-17 is reliable, then please also try 2.2.17 final to be sure it's a
Red Hat bogon, not a main kernel tree error.


Comment 2 Need Real Name 2000-10-30 02:31:26 UTC
I have updated the kernel to 2.2.17 and am still experiencing the same
behaviour. The machine freezes completely about once every two days or so; I am
unable to open a new terminal session, telnet in from the network, etc. The only
way out is the big red switch. I am running a Celeron 600 MHz with 256MB memory
and 500MB swap. The major application is Oracle 8.1.6i.


Comment 3 Jesper Skov 2000-11-01 10:30:28 UTC
The 2.2.16-17smp kernel just hung, so it isn't any better after all, contrary
to what I had thought.


Comment 4 Jesper Skov 2000-11-01 14:39:08 UTC
2.2.18-pre18 also hangs. Here's some more info:

 Load was about .3
 CPU1 3% sys, 94% idle
 CPU2 8% sys, 4% user, 88% idle

 Mem free: 2892k  - just before the system hung, free memory was
 decreasing at about 100k/sec.

 98MB shared, 90MB buffered

Uptime was 28 minutes. Mem free had hit bottom two times, I think, the
first time freeing back 56MB, the second time to 16MB.
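
Since nothing reached the logs, one way to keep a record of the free-memory trend up to the moment of a hang is to sample it and sync every sample to disk. This is only a minimal sketch, assuming a modern Python 3 interpreter and a /proc/meminfo that contains a "MemFree:" line in kB; the function names and the log file name are made up for illustration.

```python
import os
import time


def parse_memfree_kb(meminfo_text):
    """Return the MemFree value in kB from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemFree:"):
            return int(line.split()[1])
    raise ValueError("no MemFree line found")


def log_memfree(log_path, interval_s=1.0, samples=None):
    """Append timestamped MemFree samples, syncing each one to disk.

    Runs until interrupted when samples is None.
    """
    count = 0
    with open(log_path, "a") as log:
        while samples is None or count < samples:
            with open("/proc/meminfo") as f:
                free_kb = parse_memfree_kb(f.read())
            log.write("%d %d\n" % (time.time(), free_kb))
            log.flush()
            os.fsync(log.fileno())  # force the sample to disk before sleeping
            count += 1
            time.sleep(interval_s)

# Hypothetical usage: log_memfree("memfree.log") in a spare terminal,
# then read the tail of memfree.log after the reboot.
```

The fsync after every sample costs some disk traffic, but without it the last few samples would most likely be lost in the hard reset.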

Nothing in the logs. Biggest things running were:

 X 18MB
 Netscape 38MB

How I forced this hang:

 Downloading a big page in Netscape. The eCos test farm produces some
 very big HTML outputs which take minutes to load over the (saturated)
 64kb UK line.

 Downloading the latest kernel sources from a local mirror. Was pretty
 much maxing my 512kb ADSL line.

 Running GDB in a loop, continuously downloading files to a target
 board at 38400 baud.

I'm pretty sure the crash is related to the serial load. I've only ever seen
the kernel hang when I was making heavy use of the serial line. But it may also
be related to the high ethernet traffic.

The serial line in use is on an ISA plugin card. I guess I should have
mentioned that before, but I just thought of it. I'm using /dev/ttyS2.

 Serial driver version 4.27 with MANY_PORTS MULTIPORT SHARE_IRQ enabled
 ttyS00 at 0x03f8 (irq = 4) is a 16550A
 ttyS01 at 0x02f8 (irq = 3) is a 16550A
 ttyS02 at 0x03e8 (irq = 4) is a 16550A
 ttyS03 at 0x02e8 (irq = 3) is a 16550A
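
The listing above puts ttyS02, the port in use, on IRQ 4 together with ttyS00. Whether that sharing plays any part in the hang is not established here. As a minimal sketch (assuming Python 3 and the boot-message format shown above; the helper names are hypothetical), ports can be grouped by IRQ like this:

```python
import re
from collections import defaultdict

# Boot messages copied from the listing above.
BOOT_LINES = """\
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
ttyS02 at 0x03e8 (irq = 4) is a 16550A
ttyS03 at 0x02e8 (irq = 3) is a 16550A
"""

PORT_RE = re.compile(r"(ttyS\d+) at (0x[0-9a-fA-F]+) \(irq = (\d+)\)")


def ports_by_irq(text):
    """Map each IRQ number to the list of serial ports reported on it."""
    groups = defaultdict(list)
    for name, _addr, irq in PORT_RE.findall(text):
        groups[int(irq)].append(name)
    return dict(groups)


def shared_irqs(text):
    """Return only the IRQs that more than one port is reported on."""
    return {irq: ports
            for irq, ports in ports_by_irq(text).items()
            if len(ports) > 1}


for irq, ports in sorted(shared_irqs(BOOT_LINES).items()):
    print("irq %d shared by %s" % (irq, ", ".join(ports)))
```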


Don't know if this is of any help at all. I hope so. I'm going for a
2.4.x kernel now - it takes too much time to recover from a crash and I
don't want it to affect deliverables [it's bound to hit at the worst possible
time if it happens again].


Comment 5 Need Real Name 2000-11-09 14:12:20 UTC
New Red Hat user (new to Linux/Unix in general), so I may not know what I am
talking about...

I can reproduce this hang readily (10+ times a day) when I am booting Linux 
(enterprise??) or Linux SMP. This hang requires a hard reset of the system. I 
tried removing a processor and I was still able to reproduce. However, at this 
point I had no understanding at all of the various boot modes. After some 
reading I am now booting Linux UP (uniprocessor??) and I am fine. I have been 
running for 2 days without a single hang.

The kernel running is 2.2.16.

As I said, I am new to Red Hat and Linux/Unix in general. If I can provide more
information please do not hesitate to contact me. This has become a very 
serious issue for my site.

Comment 6 Need Real Name 2000-11-15 20:46:29 UTC
This issue, or something similar to it, is hindering our product development on
Red Hat 7.0. Please respond.

This is a regression that happened between 6.2 and 7.0. 

Like I said in my earlier update, I am new to Unix. Below is what I have looked 
at so far:

The system has a Voodoo3 video card and a 3ware card in it. The drivers for
both are installed. I am not booting off the 3ware card, but off the IDE
controller on the motherboard.

It's almost as if the X server hangs. The screen doesn't refresh. I have
mouse movement, but I can't click on anything. Also, I can access the system 
remotely, usually, so I don't think the kernel is dead. However, I sometimes 
can and sometimes cannot do a shutdown or reboot remotely and I cannot do it 
locally. It requires a hard reset of the system.

I don't have to be doing anything for this hang to occur. I can boot the system 
with the default boot (Linux) or Linux SMP, walk away from the system without 
ever having opened a single program and come back later to see that the system 
is hung, or it hangs after doing just a couple of things.

This made me think that maybe it was a problem with Gnome. So I tried KDE. 
Still get hang. 

I started looking more at it and thought that maybe it was the Window Manager 
(Sawfish). I switched to Enlightenment. Still get hang.

The only thing that seems to have alleviated the problem is booting Linux UP.

Please let me know if there is any other information I can provide to resolve 
this issue.

Comment 7 Need Real Name 2000-12-25 23:36:57 UTC
Hi all

I'm very new to these things as well and have an AMD K6-2 processor, 96MB, etc.
Win 98 runs very well on the same machine, and even faster than RH 7.0.
So I don't know if this is relevant to this query.

Anyway, I have installed and reinstalled RH 7.0 already (approx. 5 times, with
the anaconda updates), using all the different advice and recommendations, etc.

I have had this as well even without logging into X etc...
Sometimes I just find my hard drive light on for a while; other times I run
a few simple pwd and ls commands and then get these kernel pointer or page
request errors, etc.

Sometimes I can recover, but most times I need to use my reset button...

I also found that when I finally reboot and the system forces a check on
partitions that are not clean, and I then issue a free command, I get only 2250
(or so) free mem, as opposed to the usual 77000+ (or so) free mem when starting
with clean partitions.

Very disappointed, as everyone else in my user group simply recommends changing
to Mandrake, etc.


Comment 8 Alan Cox 2003-06-10 18:43:45 UTC
Closing old 2.2-specific bug reports. Since all our errata are 2.4-based, this
info is no longer useful.