Bug 113351 - System non-usable after 2 days of stress
Summary: System non-usable after 2 days of stress
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Dave Anderson
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-01-12 23:59 UTC by keith mannth
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-10-19 19:31:27 UTC
Target Upstream Version:
Embargoed:


Attachments
var/log/messages from system (914.88 KB, text/plain)
2004-01-13 00:00 UTC, keith mannth
first var/log/messages (947.92 KB, text/plain)
2004-01-13 00:14 UTC, keith mannth

Description keith mannth 2004-01-12 23:59:15 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2)
Gecko/20030716

Description of problem:
  I was stress-testing an IBM x440 8-way system over the weekend.
When I arrived on Monday the system was in a non-usable state. The
non-disk-based tests were still running but the others had stopped.
Top was still running, but an NFS copy was not.
  I was unable to log on to the system via the console or ssh. It
would take my username and password but would not give me a shell. My
one open shell was lost to an uptime command that never returned and
that I could not kill.
  Top showed very high load averages (17-20) with all the CPUs idle.
It seems that new processes were not able to start. It also showed
that there was lots of free memory left.
  After rebooting the box to check /var/log/messages, I found there
was lots of free disk space, so I don't know what is going on, just
that the box was unusable. Commands would not return and users could
not log on. I will attach the messages (I didn't see anything useful
in there, but who knows).
  I am working to test on other boxes ASAP to see if I can reproduce
the problem somewhere else. The test I am running is tools10-based
(kernel compile, NFS copy, copy CD to disk, hell hound, a loop doing
some pings).

Version-Release number of selected component (if applicable):
kernel-2.4.21-7.EL

How reproducible:
Didn't try

Steps to Reproduce:
1. Install AS3.0 update CDs re0108
2. Run tests
3. Wait a weekend
    

Actual Results:    The system was non-usable

Expected Results:    The system should have behaved in a usable fashion.

Additional info:

  Working to retest system.

Comment 1 keith mannth 2004-01-13 00:00:27 UTC
Created attachment 96914
var/log/messages from system

Comment 2 keith mannth 2004-01-13 00:14:34 UTC
Created attachment 96919
first var/log/messages

I had to remove some of the data due to size constraints.

Comment 3 Dave Anderson 2004-01-15 13:21:16 UTC
Well, there's nothing to work with here.

Please reproduce the hang state, and then forward the outputs
of:

  Alt-Sysrq-m
  Alt-Sysrq-p (several in a row)
  Alt-Sysrq-w
  Alt-Sysrq-t

Before starting your tests, make sure /proc/sys/kernel/sysrq is set
to 1, or that "kernel.sysrq" is set to 1 in /etc/sysctl.conf.
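
As an illustration of the check described above (not part of the
original request), here is a minimal Python sketch that reports whether
SysRq looks enabled. It assumes the two locations named in this
comment; the function names (parse_sysctl_conf, sysrq_enabled,
read_text) are hypothetical helpers, and the check only looks for the
exact value 1 as requested here.

```python
def parse_sysctl_conf(text):
    """Return a dict of key -> value from sysctl.conf-style text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a "key = value" line
        settings[key.strip()] = value.strip()
    return settings


def sysrq_enabled(proc_value, conf_text):
    """True if the running value is 1, or the config sets kernel.sysrq to 1."""
    running = proc_value.strip() == "1"
    configured = parse_sysctl_conf(conf_text).get("kernel.sysrq") == "1"
    return running or configured


def read_text(path):
    """Read a file, returning an empty string if it cannot be opened."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    proc = read_text("/proc/sys/kernel/sysrq")
    conf = read_text("/etc/sysctl.conf")
    print("sysrq enabled:", sysrq_enabled(proc, conf))
```

Running this before the stress tests start gives a quick confirmation
that the Alt-Sysrq key sequences will be honored if the hang recurs.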

 


Comment 4 keith mannth 2004-01-15 18:16:44 UTC
  I am currently testing with 2 systems to see the issue again and to
get the above outputs.  


Comment 5 Dave Anderson 2004-01-16 18:32:54 UTC
Update to my last request:

Please do the Alt-Sysrq-w last, as it is possible that it will
hang the console (and never return) if one of the CPUs is spinning
on a lock with interrupts disabled.  So, if you get the same hang,
please do the Alt-Sysrq's in this order:

Alt-Sysrq-m
Alt-Sysrq-p (several in a row)
Alt-Sysrq-t
Alt-Sysrq-w
 

Comment 6 keith mannth 2004-01-20 00:37:00 UTC
  Well, I have been testing 2 systems for 6 days and haven't seen the
issue again.  If I see it again and am able to capture any debug output
I will post again. Thanks.

Comment 7 Johan Walles 2004-03-19 13:51:01 UTC
This sounds a bit like bug 117210.


Comment 8 RHEL Program Management 2007-10-19 19:31:27 UTC
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.

