Description of problem:
The server is a RHEL3 U3 quad Xeon MP with 8GB RAM, running kernel 2.4.21-22.ELsmp from RHEL3 U4 *Beta*. The server became unreachable via ssh/telnet. The serial login console seemed to respond, but logins always failed with "Login timed out after 60 seconds". I issued several Alt-Sysrq M and T requests and, just before rebooting the server, noticed that it had started responding to telnet/login again. I then logged on to the serial console; ssh needed a service restart to become available again. The server has since been updated to kernel 2.4.21-23.ELsmp.

Version-Release number of selected component (if applicable):
kernel 2.4.21-22.ELsmp

How reproducible:
n/a
Created attachment 106396 [details] Alt Sysrq M T W
Attachment appears to be garbage; I certainly can't read it.
Sorry -- didn't realize it was a gzip file...

I don't know anything about how the sshd daemons work. There are 28 sshd processes, most of which are blocked in the sys_select system call, while 10 of them are blocked in this manner:

sshd S 00000002 0 9052 943 10136 5816 (NOTLB)
Call Trace: [<c0123f14>] schedule [kernel] 0x2f4 (0xede93e14)
[<c0134fbc>] schedule_timeout [kernel] 0xbc (0xede93e58)
[<c013d023>] futex_wait [kernel] 0x303 (0xede93e90)
[<c013c4c0>] futex_vcache_callback [kernel] 0x0 (0xede93ea0)
[<c013c4c0>] futex_vcache_callback [kernel] 0x0 (0xede93ef4)
[<c010bf9e>] do_signal [kernel] 0x8e (0xede93f20)
[<c013c663>] do_futex [kernel] 0xe3 (0xede93f58)
[<c013c739>] sys_futex [kernel] 0xb9 (0xede93f88)

But I have no idea whether this is normal or not. If you do an Alt-sysrq-t while everything is running OK, do you see any sshd daemons with traces like the above?
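For comparing a dump taken during the hang against one taken while the server is healthy, the per-process tally described above (sshd tasks in sys_select versus sshd tasks in futex_wait) could be automated. The following is only a rough sketch: it assumes the Alt-sysrq-t output has been saved as plain text and that every task header has the same layout as the sshd line quoted above (command name, one-letter state, 8-digit hex value, a number, then the PID). The names `TASK_RE`, `FRAME_RE`, `tasks` and `classify` are made up for this illustration and are not part of any existing tool.

```python
import re
from collections import Counter

# Assumed task header layout, as in "sshd S 00000002 0 9052 943 ... (NOTLB)":
# command, one-letter state, 8 hex digits, a number, then the PID.
TASK_RE = re.compile(r"^(\S+)\s+([A-Z])\s+[0-9A-Fa-f]{8}\s+\d+\s+(\d+)\s")
# A stack frame looks like "[<c013d023>] futex_wait [kernel] 0x303 (...)".
FRAME_RE = re.compile(r"\[<[0-9A-Fa-f]+>\]\s+(\S+)")


def tasks(dump):
    """Yield (command, [symbol, ...]) for each task found in the dump text."""
    comm, syms = None, []
    for line in dump.splitlines():
        header = TASK_RE.match(line.strip())
        if header:
            if comm is not None:
                yield comm, syms
            comm, syms = header.group(1), []
        elif comm is not None:
            # Frames may share a line with "Call Trace:" or sit on their own.
            syms.extend(FRAME_RE.findall(line))
    if comm is not None:
        yield comm, syms


def classify(dump, comm="sshd"):
    """Count tasks named `comm` by the wait path seen in their call trace."""
    counts = Counter()
    for name, syms in tasks(dump):
        if name != comm:
            continue
        if "futex_wait" in syms:
            counts["futex_wait"] += 1
        elif "sys_select" in syms:
            counts["sys_select"] += 1
        else:
            counts["other"] += 1
    return counts
```

Running `classify()` on the text of both attachments and comparing the two counters would show whether sshd tasks waiting in futex_wait also appear when the system is behaving normally.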
Created attachment 106479 [details] Alt Sysrq M T while everything is running OK

Hi Dave,

Sorry, I forgot to set the "content type" to "auto-detect" for the previous attachment. Please find attached the Alt-sysrq-t output you requested.

Note that this server had 12 hours of uptime when the problem showed up; it has now been running 2.4.21-23.ELsmp for 32 hours without problems. Also note that we have other servers running fine with 2.4.21-22.ELsmp, but this is the only server with more than 4GB of RAM.

Regards,
Juanjo
Created attachment 106885 [details] Alt Sysrq M T W P

After 6 days of uptime with kernel 2.4.21-23, the problem has shown up again. The scenario is almost the same, but this time the only way to recover console access was to issue an Alt-sysrq-i, after which I was able to log on at the console. I also have Alt-sysrq-t and Alt-sysrq-m output captured after the Alt-sysrq-i, which I can attach if it helps. The server is now running 2.4.21-25.

Regards,
Juanjo
This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/ If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.