Red Hat Bugzilla – Bug 140910
mysterious system hangs
Last modified: 2007-11-30 17:07:05 EST
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) Opera
Description of problem:
The system stops working but still appears to be running: on the console
a username can be entered, but no password prompt follows; on the network
it listens on all ports but returns no banner, and resets the connection
after ~5 minutes; no log entries are written after that point. The problem
occurs on 4 machines running this kernel. The logs show no problem: there
is no entry in audit, process accounting, or syslog relevant to the hang.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. I don't know why it happens. It has occurred 1-4 times a month.
There are 4 machines:
1. MAIL: IDE disk, with the home partition mounted over NFS.
2. FILE: 3ware 9000 SATA controller (8 disks, RAID 10 array), uses a
FireWire-attached IDE disk for backups, home partition exported over
3. WWW: IDE disk, does not use NFS.
All machines use Pentium 4 processors.
4. Workstation: I don't know any hardware parameters, I'm not
*** Bug 140911 has been marked as a duplicate of this bug. ***
We have a similar problem with our loghost, which is running
(obviously) syslogd but also agetty at ttyS0 with kernel console set
to ttyS0. The symptoms described in this bug report very much
resemble the problem we have, where an strace on syslogd shows that
it is waiting for open() on ttyS0 (/dev/console). No agetty was
running (configured in /etc/inittab), and with syslogd unable to
respond all processes trying to syslog would hang as well.
Perhaps there is a race in the kernel serial driver and the serial
console? All we know is that syslogd was waiting for open(ttyS0,..)
and that agetty wasn't running, and that killing and restarting
syslogd fixed the problem. Killing syslogd made agetty start, and
restarting syslogd fixed the rest.
We got an error message on console:
1. When logging out from the serial console on ttyS0:
Warning: null TTY for (04:40) in tty_fasync
Warning: null TTY for (04:40) in tty_fasync
2. When the last of the syslogd processes (yes, there were 8 of them)
rs_close: bad serial port count; tty->count is 1, state->count is 6
Jozsef, does this in any way resemble your problem, and could it be
that we have experienced the same bug?
We are using the system's built-in mingetty, but syslogd.conf isn't the
default. In this config only file logging exists and there is no console
It could be the same bug, but we could not resolve the problem from
userspace, only by hard-resetting the system. Because of this I too
suspect that it is a kernel bug.
Further investigation of this bug suggests that the cause:
1. could be heavy swap usage
2. could be a SysV shared memory, semaphore, or message queue limit (/
3. could be VM settings (/proc/sys/vm/*)
4. I don't know :)
It's impossible to determine what's happening based upon the
information available so far. Given that it is capable of receiving
keyboard interrupts, then it should be able to respond to Alt-Sysrq
input on the console. The next time the hang occurs, please send the
output from Alt-Sysrq-m, Alt-sysrq-t, Alt-sysrq-p, and Alt-Sysrq-w,
in that order.
Also, make sure that /proc/sys/kernel/sysrq is equal to 1. If it is
not, then "echo 1 > /proc/sys/kernel/sysrq", or set it permanently
to 1 in /etc/sysctl.conf:
kernel.sysrq = 1
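For checking the setting ahead of the next hang, here is a minimal Python sketch. It assumes the key in /etc/sysctl.conf is spelled `kernel.sysrq` and that a value of exactly 1 is wanted, as requested above (on newer kernels other non-zero values enable only a subset of SysRq functions). The helper names `sysrq_fully_enabled` and `sysctl_value` are illustrative and not part of any existing tool.

```python
from pathlib import Path

SYSRQ_PROC = Path("/proc/sys/kernel/sysrq")
SYSCTL_CONF = Path("/etc/sysctl.conf")


def sysrq_fully_enabled(proc_text: str) -> bool:
    """True when the text read from /proc/sys/kernel/sysrq is exactly 1."""
    return proc_text.strip() == "1"


def sysctl_value(conf_text: str, key: str):
    """Return the last value assigned to `key` in sysctl.conf-style text, or None."""
    value = None
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        name, _, raw = line.partition("=")
        if name.strip() == key:
            value = raw.strip()  # later assignments override earlier ones
    return value


if __name__ == "__main__":
    if SYSRQ_PROC.exists():
        print("running kernel:", sysrq_fully_enabled(SYSRQ_PROC.read_text()))
    if SYSCTL_CONF.exists():
        print("sysctl.conf:", sysctl_value(SYSCTL_CONF.read_text(), "kernel.sysrq"))
```

The two checks are separate on purpose: the /proc value reflects the running kernel, while the sysctl.conf entry only takes effect at boot or after the file is reloaded.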
Created attachment 112804
Output from sysrq
We had this hang yet another time, and this time we were able to extract
the sysrq output. The attached file includes the output in sequence.
The state dump took a long time. Because of this, the register dump was
not taken directly after the state dump, but a few minutes later, when I
discovered that the state dump had finished.
Did the output from sysrq make it easier to find the problem? Any clues
on how to avoid the problem would be very much appreciated.
SysRq : Show Memory
Zone:DMA freepages: 2876 min: 0 low: 0 high: 0
Zone:Normal freepages: 1322 min: 1279 low: 4544 high: 6304
Zone:HighMem freepages: 571 min: 255 low: 6654 high: 9981
Free pages: 4768 ( 570 HighMem)
( Active: 350863/84675, inactive_laundry: 18465, inactive_clean: 11537, free:
aa:0 ac:0 id:0 il:0 ic:0 fr:2876
aa:633 ac:42957 id:7056 il:3268 ic:3391 fr:1324
aa:44890 ac:262383 id:77621 il:15197 ic:8146 fr:567
2*4kB 1*8kB 2*16kB 2*32kB 2*64kB 0*128kB 2*256kB 1*512kB 0*1024kB 1*2048kB
2*4096kB = 11504kB)
82*4kB 92*8kB 44*16kB 2*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB
0*4096kB = 5288kB)
302*4kB 6*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB
0*4096kB = 2264kB)
Swap cache: add 0, delete 0, find 0/0, race 0+0
140228 pages of slabcache
4060 pages of kernel stacks
0 lowmem pagetables, 15994 highmem pagetables
Free swap: 4192504kB
655339 pages of RAM
425963 pages of HIGHMEM
13876 reserved pages
477812 pages shared
0 pages swap cached
The system does not appear to be stopped or in any kind of
"hard" hang. It certainly is strapped for memory, although
there is page reclamation going on in order to keep the system
running. In fact, kswapd is currently blocked because there
is enough memory on the inactive clean (ic:) plus the free (fr:)
lists of both the normal and high zones that collectively are
greater than the "low:" values for those zones. So at the time
of the alt-sysrq-t, kswapd was not actively reclaiming memory from
either zone's respective page caches.
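The comparison described above can be reproduced from the Alt-SysRq-m figures. This is only a sketch of the stated rule (inactive clean plus free pages versus the zone's "low:" value), not the kernel's actual kswapd logic. It assumes the three unlabeled aa/ac/id/il/ic/fr lines in the dump are in DMA, Normal, HighMem order, which matches their fr: values against the freepages lines.

```python
# Per-zone figures (in pages) copied from the Alt-SysRq-m output.
# Assumption: the second and third aa/ac/id/il/ic/fr lines belong to the
# Normal and HighMem zones respectively.
ZONES = {
    "Normal": {"low": 4544, "inactive_clean": 3391, "free": 1324},
    "HighMem": {"low": 6654, "inactive_clean": 8146, "free": 567},
}


def reclaimable_headroom(zone: dict) -> int:
    """Pages by which inactive_clean + free exceed the zone's low watermark."""
    return zone["inactive_clean"] + zone["free"] - zone["low"]


def kswapd_would_idle(zones: dict) -> bool:
    """True when every zone is above its low watermark by the rule above."""
    return all(reclaimable_headroom(z) > 0 for z in zones.values())


if __name__ == "__main__":
    for name, zone in ZONES.items():
        print(f"{name}: headroom {reclaimable_headroom(zone)} pages")
    print("kswapd idle by this rule:", kswapd_would_idle(ZONES))
```

By this rule the Normal zone sits only 171 pages above its low watermark, while HighMem has 2059 pages of headroom.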
What is a bit troubling is the number of pages being used by
the slabcache (140228), all of which come from the normal
zone, which when fully populated can have a maximum of
~225000 pages (~896MB). Whenever the slabcache consumes
more than about 50% of the normal zone, there's potentially
a problem. A "cat /proc/slabinfo" at the time of the hang
(if possible) might yield some clues.
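The slabcache share can be worked out from the numbers in the dump. The sketch below assumes 4kB pages and a 16MB DMA zone (4096 pages); neither figure is stated in the output, although the DMA buddy-list line uses 4kB as its smallest block. The function names are illustrative.

```python
PAGE_KB = 4            # assumption: 4kB pages
DMA_ZONE_PAGES = 4096  # assumption: 16MB DMA zone

# Values copied from the Alt-SysRq-m output.
PAGES_RAM = 655339
PAGES_HIGHMEM = 425963
PAGES_SLAB = 140228


def lowmem_pages(ram: int, highmem: int) -> int:
    """Pages below the highmem boundary (DMA + Normal zones)."""
    return ram - highmem


def slab_fraction_of_normal(slab: int, ram: int, highmem: int, dma: int) -> float:
    """Share of the Normal zone occupied by the slabcache."""
    normal = lowmem_pages(ram, highmem) - dma
    return slab / normal


if __name__ == "__main__":
    low = lowmem_pages(PAGES_RAM, PAGES_HIGHMEM)
    print("lowmem:", low, "pages =", low * PAGE_KB // 1024, "MB")
    share = slab_fraction_of_normal(PAGES_SLAB, PAGES_RAM, PAGES_HIGHMEM, DMA_ZONE_PAGES)
    print(f"slabcache share of Normal zone: {share:.1%}")
```

Under these assumptions lowmem is 229376 pages (896MB), the Normal zone is 225280 pages, and the slabcache occupies a little over 62% of it, above the 50% mark mentioned above.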
However, swap is not even being used, because the page reclamation
process from each zone's page cache seems to be doing enough
to satisfy the memory requirements. Furthermore, the alt-sysrq-w
at the end of the output shows 3 processors idle, and the
4th one doing a syslog read.
What is hard to understand, though, is why there are so many
crond processes running. The system at the time of the alt-sysrq-t
had 2030 processes, and 1963 of them are "crond" processes, which
I've never seen before.
Is that a "normal" situation in your configuration? (Or has
crond somehow gone wild?)
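A pile-up like this can be spotted before the machine becomes unusable by counting processes per command name. The following is a rough Python sketch that reads /proc/<pid>/stat; the function names are hypothetical, and the 67 "other" entries in the usage example are a placeholder for the non-crond processes, whose names are not listed in this report.

```python
from collections import Counter
from pathlib import Path


def comm_from_stat(stat_text: str) -> str:
    """Extract the command name, which /proc/<pid>/stat wraps in parentheses."""
    return stat_text[stat_text.index("(") + 1 : stat_text.rindex(")")]


def dominant_command(names):
    """Return (name, count, share) for the most common command name."""
    counts = Counter(names)
    name, count = counts.most_common(1)[0]
    return name, count, count / sum(counts.values())


def running_command_names(proc: Path = Path("/proc")):
    """Collect command names of all processes currently visible under /proc."""
    names = []
    for entry in proc.iterdir():
        if entry.name.isdigit():
            try:
                names.append(comm_from_stat((entry / "stat").read_text()))
            except (OSError, ValueError):
                continue  # process exited meanwhile, or stat was unreadable
    return names


if __name__ == "__main__":
    # Figures from the analysis above: 1963 crond out of 2030 processes.
    sample = ["crond"] * 1963 + ["other"] * 67
    name, count, share = dominant_command(sample)
    print(f"{name}: {count} processes ({share:.1%} of total)")
```

On a live system, `dominant_command(running_command_names())` gives the same summary for the current process table.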
Lastly, this is a RHEL3-U3 kernel. Numerous memory-handling
updates have gone into the RHEL3-U4 kernel, as well as into the
soon-to-be-released RHEL3-U5 kernel. Before doing much more
with this case, the kernel will have to be updated.
crond was blocked, trying to syslog. sysklogd was hanging, and thus all
the crond processes got stuck. I believe sysklogd was hanging because of
some serial console problem, or something like that. This was definitely
not a normal situation. The number of processes had been growing linearly
for quite some time when we discovered this.
You can see this pattern from our munin graphs at
<URL: http://yggdrasil.uio.no/munin/uio.no/hvelvet.uio.no.html >
Notice how there are spikes in november, december, march and april.
Does this sound like something that could be associated
with your init-dev patch?
Nonetheless, an upgrade to RHEL3-U5 is in order here, due to several
fixes in the tty area. It was released to RHN this morning as:
RHSA-2005:294 - Updated kernel packages available for Red Hat Enterprise Linux 3
We upgraded our log host to a new kernel on 2005-05-20 (kernel version
2.4.21-32.ELsmp), and the problem with blocked processes repeated
itself last night. So the new kernel does not seem to make any difference for
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
For more information on the RHEL errata support policy, please visit:
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.