Bug 113351 - System non-usable after 2 days of stress
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Dave Anderson
QA Contact: Brian Brock
 
Reported: 2004-01-12 18:59 EST by keith mannth
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2007-10-19 15:31:27 EDT


Attachments
var/log/messages from system (914.88 KB, text/plain)
2004-01-12 19:00 EST, keith mannth
first var/log/messages (947.92 KB, text/plain)
2004-01-12 19:14 EST, keith mannth

Description keith mannth 2004-01-12 18:59:15 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2)
Gecko/20030716

Description of problem:
  I was stress-testing an IBM x440 8-way system over the weekend.
When I arrived on Monday the system was in a non-usable state. The
non-disk-based tests were still running but the others had stopped.
Top was still running, but an NFS copy was not.
   I was unable to log on to the system via the console or ssh. It
would take my username and password but would not give me a shell. My
one open shell was lost to an uptime command that never returned and
that I could not kill.
  Top showed very high load averages (17-20) while all the CPUs were
mostly idle. It seems that new processes were not able to start. Top
also showed that there was lots of free memory left.
  After rebooting the box to check /var/log/messages, I found there was
lots of free disk space, so I don't know what is going on, just that
the box was unusable. Commands would not return and users could not
log on. I will attach the messages (I didn't see anything useful in
there, but who knows).
  I am working to test on other boxes ASAP to see if I can reproduce
the problem somewhere else. The test I am running is tools10-based
(kernel compile, NFS copy, copy CD to disk, hell hound, and a loop of
pings).

Version-Release number of selected component (if applicable):
kernel-2.4.21-7.EL

How reproducible:
Didn't try

Steps to Reproduce:
1. Install AS3.0 update CDs re0108
2. Run tests
3. Wait a weekend
    

Actual Results:    The system was non-usable

Expected Results:    The system should have behaved in a usable fashion.

Additional info:

  Working to retest system.
Comment 1 keith mannth 2004-01-12 19:00:27 EST
Created attachment 96914 [details]
var/log/messages from system
Comment 2 keith mannth 2004-01-12 19:14:34 EST
Created attachment 96919 [details]
first var/log/messages

I had to remove some of the data due to size constraints.
Comment 3 Dave Anderson 2004-01-15 08:21:16 EST
Well, there's nothing to work with here.

Please reproduce the hang state, and then forward the outputs
of:

  Alt-Sysrq-m
  Alt-Sysrq-p (several in a row)
  Alt-Sysrq-w
  Alt-Sysrq-t

Before starting your tests, make sure /proc/sys/kernel/sysrq is set
to 1, or that "kernel.sysrq" is set to 1 in /etc/sysctl.conf.
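
A minimal sketch of how that pre-test check could be scripted, assuming
Python is available on the test box. The function names are hypothetical
and not part of any existing tool; the comparison against "1" follows
the setting requested above (some kernels accept other non-zero values,
which this sketch deliberately treats as "not set to 1").

```python
from pathlib import Path


def sysrq_enabled_in_conf(conf_text):
    """Return True if the last kernel.sysrq assignment in
    sysctl.conf-style text sets it to 1."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (# or ;).
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "kernel.sysrq":
            value = val.strip()  # later assignments override earlier ones
    return value == "1"


def sysrq_enabled_at_runtime(proc_path="/proc/sys/kernel/sysrq"):
    """Return True if the running kernel currently reports sysrq as 1."""
    try:
        return Path(proc_path).read_text().strip() == "1"
    except OSError:
        # File missing or unreadable: treat as not enabled.
        return False
```

Running sysrq_enabled_at_runtime() before starting the stress run, and
sysrq_enabled_in_conf() on the contents of /etc/sysctl.conf, would
confirm that either condition is met.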

 
Comment 4 keith mannth 2004-01-15 13:16:44 EST
  I am currently testing with 2 systems to see the issue again and to
get the above outputs.  
Comment 5 Dave Anderson 2004-01-16 13:32:54 EST
update to my last request:

please do the Alt-Sysrq-w last, as it is possible that it will
hang the console (and never return) if one of the cpus is spinning
on a lock with interrupts disabled.  So, if you get the same hang
please do the Alt-Sysrq's in this order:

Alt-Sysrq-m
Alt-Sysrq-p (several in a row)
Alt-Sysrq-t
Alt-Sysrq-w
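
For a system where a root shell still responds, the same ordering could
be driven from a script instead of the keyboard. This is only a sketch
under assumptions: it presumes the kernel exposes /proc/sysrq-trigger,
which may not be the case on every kernel, and it does not help in the
hang described in this bug, where no shell could be obtained. The
function names, the repeat count of 3 for "p", and the one-second delay
are illustrative choices.

```python
import time


def sysrq_sequence(p_repeats=3):
    """Build the requested key order: m, then p several times, then t,
    with w last because it may hang the console."""
    return ["m"] + ["p"] * p_repeats + ["t", "w"]


def trigger(keys, trigger_path="/proc/sysrq-trigger", delay=1.0):
    """Write each key to the trigger file, one write per key (needs root)."""
    for key in keys:
        with open(trigger_path, "w") as f:
            f.write(key)
        time.sleep(delay)  # give the kernel time to print each dump


# Example (as root): trigger(sysrq_sequence())
```

The output of each key still goes to the console and kernel log, so it
would be collected the same way as for the keyboard sequence.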
 
Comment 6 keith mannth 2004-01-19 19:37:00 EST
  Well, I have been testing 2 systems for 6 days and haven't seen the
issue again.  If I see it again and am able to capture any debug output,
I will post again. Thanks.
Comment 7 Johan Walles 2004-03-19 08:51:01 EST
This sounds a bit like bug 117210.
Comment 8 RHEL Product and Program Management 2007-10-19 15:31:27 EDT
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
