From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2) Gecko/20030716

Description of problem:
I was stress-testing an IBM x440 8-way system over the weekend. When I arrived on Monday the system was in an unusable state. The non-disk-based tests were still running but the others had stopped. Top was still running, but an NFS copy was not. I was unable to log on to the system via the console or ssh: it would take my username and password but would not give me a shell. My one open shell was lost to an uptime command that never returned and that I could not kill. Top showed very high load averages (17-20) with all the CPUs idle. It seems that new processes were not able to start. Top also showed plenty of free memory. After rebooting the box to check /var/log/messages, I found there was plenty of free disk space, so I don't know what is going on. All I know is that the box was unusable: commands would not return and users could not log on. I will attach the messages (I didn't see anything useful in there, but who knows). I am working to test on other boxes ASAP to see if I can reproduce the problem somewhere else. The test I am running is tools10-based (kernel compile, NFS copy, copy CD to disk, hell hound, ping loop).

Version-Release number of selected component (if applicable):
kernel-2.4.21-7.EL

How reproducible:
Didn't try

Steps to Reproduce:
1. Install AS3.0 update CDs re0108
2. Run tests
3. Wait a weekend

Actual Results: The system was unusable.

Expected Results: The system should have remained usable.

Additional info: Working to retest system.
Created attachment 96914: var/log/messages from system
Created attachment 96919: the first var/log/messages attachment had some of the data removed due to size constraints
Well, there's nothing to work with here. Please reproduce the hang state, and then forward the outputs of:

  Alt-Sysrq-m
  Alt-Sysrq-p (several in a row)
  Alt-Sysrq-w
  Alt-Sysrq-t

Before starting your tests, make sure /proc/sys/kernel/sysrq is set to 1, or that "kernel.sysrq" is set to 1 in /etc/sysctl.conf.
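For anyone repeating this test, the pre-flight check above could be scripted roughly as follows. This is only a sketch and not part of the original request: the function names `sysrq_enabled_now` and `sysrq_persistent` are made up for illustration, and the parser handles only the dotted `kernel.sysrq = 1` form of a sysctl.conf entry.

```python
from pathlib import Path


def sysrq_enabled_now(proc_path="/proc/sys/kernel/sysrq"):
    """Return True if the running kernel reports SysRq as enabled (value 1)."""
    try:
        return Path(proc_path).read_text().strip() == "1"
    except OSError:
        # File missing or unreadable: treat SysRq as not enabled.
        return False


def sysrq_persistent(conf_text):
    """Return True if sysctl.conf-style text sets kernel.sysrq to 1.

    Settings are applied in order, so the last matching line wins.
    """
    value = None
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments ('#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "kernel.sysrq":
            value = val.strip()
    return value == "1"


if __name__ == "__main__":
    try:
        conf = Path("/etc/sysctl.conf").read_text()
    except OSError:
        conf = ""
    print("enabled now:", sysrq_enabled_now())
    print("set in /etc/sysctl.conf:", sysrq_persistent(conf))
```

Running it before kicking off the weekend test should show at least one of the two checks as True. If neither is, the Alt-Sysrq keys will produce no output once the box hangs.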
I am currently testing with 2 systems to see the issue again and to get the above outputs.
Update to my last request: please do the Alt-Sysrq-w last, as it is possible that it will hang the console (and never return) if one of the CPUs is spinning on a lock with interrupts disabled. So, if you get the same hang, please do the Alt-Sysrq's in this order:

  Alt-Sysrq-m
  Alt-Sysrq-p (several in a row)
  Alt-Sysrq-t
  Alt-Sysrq-w
Well, I have been testing 2 systems for 6 days and haven't seen the issue again. If I see it again and am able to capture any debug output, I will post again. Thanks.
This sounds a bit like bug 117210.
This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/ If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.