Red Hat Bugzilla – Bug 18006
Observed SMP kernel hang twice in two days
Last modified: 2007-04-18 12:28:50 EDT
I've had the 2.2.16-22smp kernel hang on me twice in two days: no activity
on the screen, no response to mouse or keyboard input. Had to hard reset.
The second time I noticed that one of the CPUs was chugging away at
100% load, but I can't think of anything running at the time which could
have caused a solid load like that. No clues in the logs.
Both times I was doing interactive GDB work, and running eCos serial
tests at 115200 baud. GDB connects to the target via a local socket
connection. So high serial load and/or socket IO _may_ be part of the
reason for locking up the box.
My box is a dual 450MHz PIII w 256MB RAM and IDE disks (running in DMA
After the second crash I downgraded my kernel to the Pinstripe
2.2.16-17smp kernel, which I've been using for many, many days without
problems. I will add a comment to this bug if the kernel hangs again.
If 2.2.16-17 is reliable then please also try 2.2.17 final, to be sure it's a
Red Hat bogon and not a main kernel tree error.
I have updated the kernel to 2.2.17 and am still experiencing the same
behaviour. The machine freezes completely about once every two days or so; I am
unable to open a new terminal session, telnet in from the network, etc. The only
way out is the big red switch. I am running a Celeron 600 MHz with 256MB memory
and 500MB swap.
Major application is Oracle 8.1.6i.
The 2.2.16-17smp kernel just hung, so it isn't any better, contrary to what I
had thought.
2.2.18-pre18 also hangs. Here's some more info:
Load was about .3
CPU1 3% sys, 94% idle
CPU2 8% sys, 4% user, 88% idle
Mem free: 2892k - just before the system hung, free memory was
decreasing at about 100k/sec.
98MB shared, 90MB buffered
Uptime was 28 minutes. Mem free had hit bottom two times, I think, the
first time freeing back 56MB, the second time to 16MB.
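For anyone who wants to capture the free-memory trend before the next hang, here is a minimal sketch of a logger. It assumes /proc/meminfo carries a "MemFree: N kB" line on your kernel; the function names, the one-second interval and the sample count are my own choices for illustration, not something taken from this report.

```python
import re
import time

# Assumption: /proc/meminfo contains a line of the form "MemFree:  2892 kB".
MEMFREE_RE = re.compile(r"^MemFree:\s+(\d+)\s+kB", re.MULTILINE)


def mem_free_kb(meminfo_text):
    """Return the MemFree value (in kB) found in the given meminfo text."""
    match = MEMFREE_RE.search(meminfo_text)
    if match is None:
        raise ValueError("no MemFree line found")
    return int(match.group(1))


def rate_kb_per_sec(kb_before, kb_after, seconds):
    """Change in free memory per second; negative means memory is shrinking."""
    return (kb_after - kb_before) / seconds


def watch(path="/proc/meminfo", interval=1.0, samples=10):
    """Print free memory and its rate of change once per interval."""
    with open(path) as f:
        prev = mem_free_kb(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            cur = mem_free_kb(f.read())
        rate = rate_kb_per_sec(prev, cur, interval)
        # flush so the last lines reach the log even if the box locks up
        print(f"MemFree {cur} kB, change {rate:+.0f} kB/s", flush=True)
        prev = cur


if __name__ == "__main__":
    watch()
```

Redirecting the output to a file on a disk mounted sync, or to another machine over the network, gives a better chance of keeping the last few samples.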
Nothing in the logs. Biggest things running were:
How I forced this hang:
Downloading a big page in Netscape. The eCos test farm produces some
very big HTML outputs which take minutes to load over the (saturated)
64kb UK line.
Downloading the latest kernel sources from a local mirror. Was pretty
much maxing my 512kb ADSL line.
Running GDB in a loop, continuously downloading files to a target
board at 38400 baud.
I'm pretty sure the crash is related to the serial load. I've only ever seen
the kernel hang when I was making heavy use of the serial line. But it may also
be related to the high ethernet traffic.
The serial line in use is on an ISA plugin card. I guess I should have
mentioned that before, but I just thought of it. I'm using /dev/ttyS2
Serial driver version 4.27 with MANY_PORTS MULTIPORT SHARE_IRQ enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
ttyS02 at 0x03e8 (irq = 4) is a 16550A
ttyS03 at 0x02e8 (irq = 3) is a 16550A
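Note that in the listing above ttyS00 and ttyS02 are both on irq 4, and ttyS01 and ttyS03 are both on irq 3. Whether that sharing has anything to do with the hang is only a guess, but it is easy to check for. A small sketch that groups the boot-log lines by IRQ follows; the regular expression and the function names are mine and only match the line format shown here.

```python
import re
from collections import defaultdict

# Boot-log lines copied from the report above.
BOOT_LINES = """\
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
ttyS02 at 0x03e8 (irq = 4) is a 16550A
ttyS03 at 0x02e8 (irq = 3) is a 16550A
"""

# Assumption: every port line has exactly this shape.
LINE_RE = re.compile(
    r"^(ttyS\d+) at (0x[0-9a-fA-F]+) \(irq = (\d+)\) is a (\S+)$"
)


def ports_by_irq(text):
    """Map each IRQ number to the list of serial ports reported on it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            port, _addr, irq, _uart = match.groups()
            groups[int(irq)].append(port)
    return dict(groups)


def shared_irqs(text):
    """Return only the IRQs that more than one port is using."""
    return {
        irq: ports
        for irq, ports in ports_by_irq(text).items()
        if len(ports) > 1
    }


if __name__ == "__main__":
    for irq, ports in sorted(shared_irqs(BOOT_LINES).items()):
        print(f"irq {irq}: {', '.join(ports)}")
```

If the ISA card can be jumpered to an IRQ that nothing else uses, retesting with /dev/ttyS2 on its own interrupt would help rule sharing in or out.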
Don't know if this is of any help at all. I hope so. I'm going for a
2.4.x kernel now - it takes too much time to recover from a crash and I
don't want it to affect deliverables [it's bound to hit at worst possible
time if it happens again]
New Red Hat user (new to Linux/Unix in general), so I may not know what I am
talking about...
I can reproduce this hang readily (10+ times a day) when I am booting Linux
(enterprise??) or Linux SMP. This hang requires a hard reset of the system. I
tried removing a processor and I was still able to reproduce. However, at this
point I had no understanding at all of the various boot modes. After some
reading I am now booting Linux UP (uniprocessor??) and I am fine. I have been
running for 2 days without a single hang.
The kernel running is 2.2.16.
As I said, I am new to Red Hat and Linux/Unix in general. If I can provide more
information please do not hesitate to contact me. This has become a very
serious issue for my site.
This issue, or something similar to it, is hindering our product development on
Red Hat 7.0. Please respond.
This is a regression that happened between 6.2 and 7.0.
Like I said in my earlier update, I am new to Unix. Below is what I have looked
at so far:
The system has a Voodoo3 video card and a 3ware card in it. The drivers for
both are installed. I am not booted off of the 3ware card, but IDE off of
It's almost like xserv hangs or something. The screen doesn't refresh. I have
mouse movement, but I can't click on anything. Also, I can access the system
remotely, usually, so I don't think the kernel is dead. However, I sometimes
can and sometimes cannot do a shutdown or reboot remotely and I cannot do it
locally. It requires a hard reset of the system.
I don't have to be doing anything for this hang to occur. I can boot the system
with the default boot (Linux) or Linux SMP, walk away from the system without
ever having opened a single program and come back later to see that the system
is hung, or it hangs after doing just a couple of things.
This made me think that maybe it was a problem with Gnome. So I tried KDE.
Still get hang.
I started looking more at it and thought that maybe it was the Window Manager
(Sawfish). I switched to Enlightenment. Still get hang.
The only thing that seems to have alleviated the problem is booting Linux UP.
Please let me know if there is any other information I can provide to resolve
I'm very new to these things as well. I have an AMD K6-2 processor, 96MB etc.,
running Win 98 on the same machine very well, and even faster than RH 7.0???
So I don't know if it is relevant to this query.
Anyway, I have installed and reinstalled RH 7.0 already (approx. 5 times, with
the anaconda updates), following all sorts of advice and recommendations...
I have had this as well even without logging into X etc...
Sometimes I just find my hard drive light on for a while; other times I run a
few simple pwd and ls commands and then get these kernel pointer or page
request errors...
Sometimes I can recover, but most times I need to use my reset button...
I also found that when I finally reboot and the system forces a check on
partitions that were not clean etc., and I issue a free command, I get only 2250
(or so) free mem as opposed to the usual 77000+ (or so) free mem when starting
Very disappointed, as everyone else in my user group simply recommends changing
to Mandrake etc...
Closing old 2.2 specific bug reports. Since all our errata are 2.4 based this
info is no longer useful.