Bug 159799 - A lot of stopped processes on a SMP system with HTT enabled
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Hardware: i386 Linux
Priority: medium Severity: medium
Assigned To: Don Howard
QA Contact: Brian Brock
Reported: 2005-06-08 04:26 EDT by Vlady
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2005-11-07 16:31:33 EST
Description Vlady 2005-06-08 04:26:45 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.8) Gecko/20050512 Firefox/1.0.4

Description of problem:
We have a number of dual-CPU servers with HTT enabled on the processors.

They run various Red Hat kernels, version 2.4.21 and above.
On all of these servers we see a large number of stopped processes, i.e. processes shown with status "T" in the output of the ps(1) command.

If the servers run unattended for a long time, the total number of processes keeps growing. This degrades the servers' performance and in some cases causes "cannot fork" errors.

Version-Release number of selected component (if applicable):
kernel-2.4.21 and above

How reproducible:

Steps to Reproduce:
1. Run a multiprocessor machine with HTT enabled on the processors
2. Leave it running and executing tasks for several days
3. Execute ps -ax | grep T
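
Note that a plain `grep T` also matches any line with a capital T in the command text, so it can overcount. A minimal sketch of a stricter filter follows, assuming a procps-style ps that accepts `-o pid= -o stat= -o args=`; the function name `filter_stopped` is hypothetical and not part of the original report.

```shell
# Keep only rows whose STAT column (second field) starts with T,
# i.e. stopped or traced processes.
# Reads "PID STAT COMMAND..." rows on stdin.
# The name filter_stopped is illustrative only.
filter_stopped() {
    awk '$2 ~ /^T/'
}

# Usage on a live system (assumption: procps ps; an empty header
# after "=" on every column suppresses the header line):
#   ps -e -o pid= -o stat= -o args= | filter_stopped
#   ps -e -o pid= -o stat= -o args= | filter_stopped | wc -l
```

Running the second usage line from cron once a day would give a simple record of how fast the count grows.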

Actual Results:  From day to day the number of stopped processes (status "T" in the ps(1) output) keeps growing.

Expected Results:  Normally, only a few processes should be in this state.

Additional info:

I suppose the problem is caused by the process scheduler when running on an SMP machine with HTT enabled.
Comment 1 Ernie Petrides 2005-06-08 20:18:19 EDT
Thanks for your report, Vlady.  Could you please specify the exact
(most recent) kernel version that exhibits this problem?  Also, can
you give us an idea of what sort of processes are stopped?  And do
they go away if they are killed?
Comment 2 Don Howard 2005-06-08 20:34:05 EDT
Vlady -

Could you also capture 'ps -ax', sysrq-t and sysrq-m output from a system that
is experiencing this problem?
Comment 3 Vlady 2005-06-09 04:08:46 EDT
Below is a small part of the "ps -ax | grep T" output on a server with the
2.4.21-27.0.4.ELsmp kernel and HTT enabled.

26337 ?        T      0:00 cut -b 7-
24464 ?        T      0:00 sed s/$/<NL>/g
23302 ?        T      0:00 /bin/sed s|/|.|g
 9958 ?        T      0:00 id -u
 8551 ?        T      0:00 netstat -tln
22400 ?        T      0:00 mkdir -p /some/dir
 1697 ?        T      0:00 netstat -tln
13750 ?        T      0:00 /bin/sed s|/dev/||
20597 ?        T      0:00 netstat -tln
25570 ?        T      0:00 /bin/sed s|/dev/||
 7115 ?        T      0:00 /bin/sh /bin/egrep -q (^|:)/usr/X11R6/bin($|:)
22075 ?        T      0:00 id -un
 1386 ?        T      0:00 /usr/bin/tty
 2272 ?        T      0:00 sh -c date +%Z 2> /dev/null
21237 ?        T      0:00 sh -c sysctl fs.file-max
28206 ?        T      0:00 /bin/sh -c /usr/lib/sa/sa2 -A
32587 ?        T      0:00 sh -c date +%Z 2> /dev/null
30494 ?        T      0:00 sh -c date +%Z 2> /dev/null
 4591 ?        T      0:00 netstat -tln
 9427 ?        T      0:00 /bin/sed s|/dev/||
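
The listing is dominated by a handful of short-lived helpers (netstat, sed, date), so a per-command tally may help show which jobs leave stopped children behind. A rough sketch, assuming the five-column `ps -ax` layout shown above (PID, TTY, STAT, TIME, COMMAND); the function name `tally_stopped` is hypothetical.

```shell
# Count stopped processes per command line, most frequent first.
# Reads `ps -ax` style rows (PID TTY STAT TIME COMMAND...) on stdin.
# The name tally_stopped is illustrative only.
tally_stopped() {
    awk '$3 ~ /^T/ {
             # Rebuild the command line from field 5 onwards.
             cmd = $5
             for (i = 6; i <= NF; i++) cmd = cmd " " $i
             count[cmd]++
         }
         END { for (c in count) printf "%d %s\n", count[c], c }' |
        sort -k1,1nr -k2
}

# Usage on a live system:
#   ps -ax | tally_stopped
```

The header row is skipped automatically because its third field is the literal text STAT, which does not start with T.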

Sorry, but I can't supply you with the SysRq results. All of our servers that
experience the "stopped processes" problem are in production, and we don't want to
experiment with their kernels! :(
Comment 4 Vlady 2005-06-09 04:24:55 EDT
Also, I don't have console access to our servers, so I can't even execute the sysrq
+ t or sysrq + m key sequences.
Comment 5 Don Howard 2005-06-09 13:40:27 EDT
Vlady -

You can use sysrq-trigger remotely:

# enable sysrq-trigger
$ echo 1 > /proc/sys/kernel/sysrq 

# sysrq-t
$ echo t > /proc/sysrq-trigger

# sysrq-m
$ echo m > /proc/sysrq-trigger

The sysrq info is really important - I can't make any suggestions unless I can
see where these processes are blocking in the kernel.
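
The sysrq output goes to the kernel log rather than to the terminal, so it has to be collected afterwards. A minimal sketch of one way to do that, assuming root access and that `dmesg` is available; the variable `SYSRQ_TRIGGER` and the function name `capture_sysrq` are hypothetical helpers, not part of any Red Hat tool.

```shell
# Trigger one sysrq action and save the kernel log afterwards.
# Must be run as root. SYSRQ_TRIGGER and capture_sysrq are
# illustrative names; the default path is the one given above.
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}

capture_sysrq() {
    key=$1        # sysrq key, e.g. t (task list) or m (memory info)
    outfile=$2    # file that receives the kernel log
    echo "$key" > "$SYSRQ_TRIGGER" || return 1
    sleep 1       # give the kernel a moment to finish printing
    dmesg > "$outfile"
}

# Usage (as root, after enabling sysrq as shown above):
#   capture_sysrq t /tmp/sysrq-t.txt
#   capture_sysrq m /tmp/sysrq-m.txt
```

On a system with many processes the sysrq-t dump may be larger than the kernel ring buffer, in which case the syslog file (typically /var/log/messages) is likely to hold a more complete copy than `dmesg`.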
Comment 6 Don Howard 2005-11-07 16:31:33 EST
Hi Vlady

Were you able to use the sysrq-trigger mechanism I mentioned above? I'll need
the sysrq-t & sysrq-m info in order to see what's happening with the stopped
processes.

For now, I'm going to close this issue.  Please re-open it if you are able to
collect the info (and are still experiencing the problem).
