Bug 138611 - Hung server recovers after several Alt-Sysrq - RHEL3 U4 *Beta* kernel
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686 Linux
Priority: medium, Severity: high
Assigned To: Dave Anderson
QA Contact: Brian Brock
Reported: 2004-11-10 03:37 EST by Juanjo Villaplana
Modified: 2007-11-30 17:07 EST
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2007-10-19 15:14:34 EDT


Attachments
Alt Sysrq M T W (21.14 KB, application/octet-stream)
2004-11-10 03:43 EST, Juanjo Villaplana
Alt Sysrq M T while everything is running OK (13.65 KB, application/x-gzip)
2004-11-11 02:22 EST, Juanjo Villaplana
Alt Sysrq M T W P (67.21 KB, application/x-gzip)
2004-11-17 05:35 EST, Juanjo Villaplana

Description Juanjo Villaplana 2004-11-10 03:37:22 EST
Description of problem:

The server is a RHEL3 U3 quad Xeon MP with 8GB RAM running kernel
2.4.21-22.ELsmp from RHEL3 U4 *Beta*.

The server became unreachable via ssh/telnet. The serial console
appeared to respond, but every login attempt failed with "Login timed
out after 60 seconds".

I issued several Alt-Sysrq-M and Alt-Sysrq-T sequences. Just before I
was going to reboot the server, I noticed that it had started
responding again to telnet and console logins. I then logged on to the
serial console; sshd needed a service restart before it was available
again.

The server has been updated to kernel 2.4.21-23.ELsmp.

Version-Release number of selected component (if applicable):

kernel 2.4.21-22.ELsmp

How reproducible:

n/a

Comment 1 Juanjo Villaplana 2004-11-10 03:43:44 EST
Created attachment 106396
Alt Sysrq M T W
Comment 2 Dave Anderson 2004-11-10 16:29:47 EST
Attachment appears to be garbage; I certainly can't read it.
Comment 3 Dave Anderson 2004-11-10 17:16:31 EST
Sorry -- didn't realize it was a gzip file...

I don't know anything about how the sshd daemons work. There are 28
sshd processes, most of which are blocked in the sys_select system call,
while 10 of them are blocked in this manner:

sshd          S 00000002     0  9052    943         10136  5816 (NOTLB)
Call Trace:   [<c0123f14>] schedule [kernel] 0x2f4 (0xede93e14)
[<c0134fbc>] schedule_timeout [kernel] 0xbc (0xede93e58)
[<c013d023>] futex_wait [kernel] 0x303 (0xede93e90)
[<c013c4c0>] futex_vcache_callback [kernel] 0x0 (0xede93ea0)
[<c013c4c0>] futex_vcache_callback [kernel] 0x0 (0xede93ef4)
[<c010bf9e>] do_signal [kernel] 0x8e (0xede93f20)
[<c013c663>] do_futex [kernel] 0xe3 (0xede93f58)
[<c013c739>] sys_futex [kernel] 0xb9 (0xede93f88)

But I have no idea whether this is normal or not.

If you do an Alt-sysrq-t while everything is running OK, do 
you see any sshd daemons with traces like the above?

Comment 4 Juanjo Villaplana 2004-11-11 02:22:10 EST
Created attachment 106479
Alt Sysrq M T while everything is running OK

Hi Dave,

Sorry, I forgot to set the "content type" to "auto-detect" for the previous
attachment.

Please find attached the Alt-sysrq-t output you requested.

Note that this server had 12 hours of uptime when the problem showed up; it
has now been running 2.4.21-23.ELsmp for 32 hours without problems. Also note
that we have other servers running fine with 2.4.21-22.ELsmp, but this is the
only server with more than 4GB of RAM.

Regards,
	   Juanjo
Comment 5 Juanjo Villaplana 2004-11-17 05:35:44 EST
Created attachment 106885
Alt Sysrq M T W P

After 6 days of uptime with kernel 2.4.21-23, the problem has shown up again.

The scenario is almost the same, but this time the only way to recover
console access was to issue an Alt-sysrq-i; after that I was able to log on
to the console.

I also have additional Alt-sysrq-t and Alt-sysrq-m output, captured after the
Alt-sysrq-i, that I can attach if it helps.

The server is now running 2.4.21-25.

Regards,
	    Juanjo
Comment 6 RHEL Product and Program Management 2007-10-19 15:14:34 EDT
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet these criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/

If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
