Red Hat Bugzilla – Bug 138611
Hung server recovers after several Alt-Sysrq - RHEL3 U4 *Beta* kernel
Last modified: 2007-11-30 17:07:05 EST
Description of problem:
The server is a RHEL3 U3 quad Xeon MP with 8GB RAM running kernel
2.4.21-22.ELsmp from RHEL3 U4 *Beta*.
The server became unreachable via ssh/telnet. The serial login console
seemed to respond, but logins always failed with "Login timed out after 60
seconds".
I issued several Alt-Sysrq M and T sequences. Just before I was about to
reboot the server, I noticed that it had started responding to telnet/login
again. I then logged on to the serial console; sshd needed a service restart
before it was available again.
The server has been updated to kernel 2.4.21-23.ELsmp.
Created attachment 106396
Alt Sysrq M T W
Attachment appears to be garbage; I certainly can't read it.
Sorry -- didn't realize it was a gzip file...
I don't know anything about how the sshd daemons work. There are 28
sshd processes, most of which are blocked in the sys_select system call,
while 10 of them are blocked like this:
sshd S 00000002 0 9052 943 10136 5816
Call Trace: [<c0123f14>] schedule [kernel] 0x2f4 (0xede93e14)
[<c0134fbc>] schedule_timeout [kernel] 0xbc (0xede93e58)
[<c013d023>] futex_wait [kernel] 0x303 (0xede93e90)
[<c013c4c0>] futex_vcache_callback [kernel] 0x0 (0xede93ea0)
[<c013c4c0>] futex_vcache_callback [kernel] 0x0 (0xede93ef4)
[<c010bf9e>] do_signal [kernel] 0x8e (0xede93f20)
[<c013c663>] do_futex [kernel] 0xe3 (0xede93f58)
[<c013c739>] sys_futex [kernel] 0xb9 (0xede93f88)
But I have no idea whether this is normal.
If you do an Alt-sysrq-t while everything is running OK, do
you see any sshd daemons with traces like the above?
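One way to answer that question is to tally the sshd tasks in each SysRq-T dump by the syscall they are sleeping in. The sketch below assumes the task header layout seen in the trace above (command name, one-letter state, then a hex field and numeric fields); that layout is inferred from a single example, and `classify_tasks` is a hypothetical helper, not part of any Red Hat tool.

```python
import re
from collections import Counter

# Assumed task header layout, inferred from the single example in this report
# ("sshd S 00000002 0 9052 943 10136 5816"): command name, one-letter state,
# a hex field, then numeric fields.
TASK_RE = re.compile(r"^(\S+)\s+([RSDTZ])\s+[0-9A-Fa-f]+\s+\d+\s+\d+")
# A stack frame looks like "[<c013c739>] sys_futex [kernel] 0xb9 (...)".
FRAME_RE = re.compile(r"\[<[0-9A-Fa-f]+>\]\s+(\w+)")


def classify_tasks(dump, command="sshd", syscalls=("sys_futex", "sys_select")):
    """Count tasks named `command` by which of `syscalls` appears in their trace.

    Hypothetical helper: `syscalls` is checked in the order given, and tasks
    whose trace contains none of them are counted under "other".
    """
    counts = Counter()
    current = None  # command name of the task whose frames are being collected
    frames = []

    def flush():
        # Record the task just finished, if it is one we care about.
        if current == command:
            hit = next((s for s in syscalls if s in frames), "other")
            counts[hit] += 1

    for line in dump.splitlines():
        header = TASK_RE.match(line.strip())
        if header:
            flush()
            current = header.group(1)
            frames = []
        else:
            frames.extend(FRAME_RE.findall(line))
    flush()  # the last task in the dump
    return counts
```

Running it once over the dump from the hung server and once over a dump taken while everything is running OK would show whether sshd tasks parked in sys_futex appear only in the former.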
Created attachment 106479
Alt Sysrq M T while everything is running OK
Sorry, I forgot to set the "content type" to "auto-detect" for the previous
attachment.
Please find attached the Alt-sysrq-t output you requested.
Note that this server had 12 hours of uptime when the problem showed up; it
has now been running 2.4.21-23.ELsmp for 32 hours without problems. Also note
that we have other servers running fine with 2.4.21-22.ELsmp, but this is the
only one with more than 4GB of RAM.
Created attachment 106885
Alt Sysrq M T W P
After 6 days of uptime with kernel 2.4.21-23, the problem has shown up again.
The scenario is almost the same, but this time the only way to recover
console access was to issue an Alt-sysrq-i, after which I logged on
successfully.
I also have Alt-sysrq-t and Alt-sysrq-m output captured after the Alt-sysrq-i,
which I can attach if it helps.
The server is now running 2.4.21-25.
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
For more information on the RHEL errata support policy, please visit:
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.