Bug 54114 - swapped-out process stuck in system-mode using 100% cpu
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: kernel (Show other bugs)
Version: 7.1
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Arjan van de Ven
QA Contact: Brock Organ
Reported: 2001-09-27 12:45 EDT by Richard T. Jones
Modified: 2007-04-18 12:37 EDT (History)
Doc Type: Bug Fix
Last Closed: 2003-06-07 14:13:13 EDT
Attachments: None

Description Richard T. Jones 2001-09-27 12:45:03 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.76C-CERN UNIX gryphn 45 [en] (X11; U; Linux 2.4.2-2 i686)

Description of problem:
After running for some time under heavy NFS traffic, a process on one or
more of the NFS client nodes starts consuming 100% of one CPU without
appearing to make progress.  Running kill -9 <pid> as root has no effect.
The system otherwise continues to respond.  Nothing unusual shows up in
/var/log/messages.  A normal reboot proceeds as usual, and the
rebooted system is idle again.

While the stuck process is running, /proc/loadavg reports 2.0 runnable
processes.  Running top shows 100% of one CPU devoted to the stuck job,
all of it system time.  The process does not respond to signals.  top
reports its state as RW (runnable and swapped out).  Its sizes (from top)
are all zero, as expected for a swapped-out process.
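
For anyone trying to confirm the same symptoms without top, the following is a minimal Python sketch that parses text in the /proc/loadavg and /proc/<pid>/stat formats. Field positions follow proc(5) and may differ on other kernels. The helper names, the sample strings, and the rule "state R with zero resident pages" as a stand-in for top's RW flag are all assumptions made for illustration, not output from the affected node.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    load1: float
    load5: float
    load15: float
    running: int   # currently runnable scheduling entities
    total: int     # total scheduling entities


class ProcState(NamedTuple):
    pid: int
    comm: str
    state: str
    rss_pages: int


def parse_loadavg(text: str) -> LoadAvg:
    """Parse text in the /proc/loadavg format, e.g. '2.00 1.97 1.80 2/61 1234'."""
    parts = text.split()
    running, total = parts[3].split("/")
    return LoadAvg(float(parts[0]), float(parts[1]), float(parts[2]),
                   int(running), int(total))


def parse_stat(line: str) -> ProcState:
    """Parse one line in the /proc/<pid>/stat format."""
    # comm is wrapped in parentheses and may contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left, _, rest = line.partition("(")
    comm, _, tail = rest.rpartition(")")
    fields = tail.split()
    # fields[0] is field 3 (state) in proc(5); rss is field 24, so index 21.
    return ProcState(int(left), comm, fields[0], int(fields[21]))


def matches_report(proc: ProcState) -> bool:
    """Assumed heuristic: runnable ('R') with no resident pages."""
    return proc.state == "R" and proc.rss_pages == 0


# Hypothetical samples, invented for illustration only.
SAMPLE_LOADAVG = "2.00 1.97 1.80 2/61 1234"
SAMPLE_STAT = ("1234 (cc1) R 1200 1234 1200 0 -1 0 0 0 0 0 "
               "5 98765 0 0 9 0 0 0 4321 0 0")

if __name__ == "__main__":
    print(parse_loadavg(SAMPLE_LOADAVG))
    print(parse_stat(SAMPLE_STAT), matches_report(parse_stat(SAMPLE_STAT)))
```

On a live system the same functions could be fed the contents of the real files instead of the sample strings.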

Version-Release number of selected component (if applicable):

2.4.2-2smp kernel.  The test node is running the stock i686 kernel.  Other
nodes run a diskless (IP-autoconfig + NFS-root) build and show the same
behavior.  I have not seen it on my Athlon SMP nodes so far.

How reproducible:
Sometimes

Steps to Reproduce:
1. Run several I/O-bound (NFS) jobs on the cluster, loading the network and the server.
2. Log onto one of the client nodes and start something (e.g. a compiler).
3. Watch top to see the process jump to 100% system-mode usage.
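
Step 3 can be approximated without top by sampling the process's system-time counter. The sketch below is a rough illustration under stated assumptions: it reads stime (field 15 of /proc/<pid>/stat per proc(5)) twice and reports what share of one CPU was spent in system mode over the interval. The function names, the 5-second interval, and the 0.95 threshold are hypothetical choices, not values from this report.

```python
import os
import time


def read_stime_ticks(pid: int) -> int:
    """Return the system-mode CPU time of a process, in clock ticks."""
    with open(f"/proc/{pid}/stat") as f:
        # Skip past the parenthesised command name, which may contain spaces.
        fields = f.read().rpartition(")")[2].split()
    return int(fields[12])  # field 15 (stime) in proc(5) numbering


def system_share(stime_before: int, stime_after: int,
                 interval_s: float, ticks_per_s: int) -> float:
    """Fraction of one CPU spent in system mode during the interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (stime_after - stime_before) / (ticks_per_s * interval_s)


def watch(pid: int, interval_s: float = 5.0, threshold: float = 0.95) -> None:
    """Print a line whenever the process is almost entirely in system mode."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    before = read_stime_ticks(pid)
    while True:
        time.sleep(interval_s)
        after = read_stime_ticks(pid)
        share = system_share(before, after, interval_s, ticks_per_s)
        if share >= threshold:
            print(f"pid {pid}: {share:.0%} of one CPU in system mode")
        before = after
```

Calling watch(<pid>) on a client node while the NFS load is running would flag the moment the process jumps to system-mode usage, assuming the counters are still updated for the stuck process.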
	

Actual Results:  At some point the job gets stuck.  Sequentially shutting
down everything except the network (with /etc/rc.d/init.d/* stop) fails to
unstick it.  A reboot works.

Expected Results:  The process should have finished normally, or at least
responded to kill -9 <pid> from the superuser.
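
The expectation about kill -9 can be checked mechanically. The following minimal sketch sends SIGKILL and polls to see whether the process goes away within a timeout. The function names and the timeout values are assumptions for illustration. A process that has become a zombie is treated as dead, since it has already exited and is only waiting to be reaped by its parent.

```python
import os
import signal
import time


def is_alive(pid: int) -> bool:
    """Best-effort check that a process exists and is not a zombie."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence and permissions
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    try:
        with open(f"/proc/{pid}/stat") as f:
            state = f.read().rpartition(")")[2].split()[0]
        return state != "Z"
    except OSError:
        return True  # no /proc entry readable; assume still alive


def kill_and_wait(pid: int, timeout_s: float = 5.0, poll_s: float = 0.1) -> bool:
    """Send SIGKILL; return True if the process is gone within timeout_s."""
    os.kill(pid, signal.SIGKILL)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return True
        time.sleep(poll_s)
    return False
```

For the stuck process described in this report, kill_and_wait would be expected to return False, which is the symptom being reported.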

Additional info:
Comment 1 Alan Cox 2003-06-07 14:13:13 EDT
Should have been fixed a long time ago by errata kernel. If not re-open.
