Red Hat Bugzilla – Bug 159799
A lot of stopped processes on a SMP system with HTT enabled
Last modified: 2007-11-30 17:07:07 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.8) Gecko/20050512 Firefox/1.0.4
Description of problem:
We have a number of 2-CPU servers with HTT enabled on the processors.
They run various RedHat kernels with versions 2.4.21 and above.
On all of these servers we encounter a problem with a large number of "stopped processes", i.e. processes shown with status ``T'' in the output of the ps(1) command.
If the servers run unattended for a long time, the total number of processes keeps growing. This affects the servers' performance and in some cases causes errors such as "cannot fork".
Version-Release number of selected component (if applicable):
kernel-2.4.21 and above
Steps to Reproduce:
1. Run a multiprocessor machine with HTT enabled on the processors
2. Leave it running and executing tasks for several days
3. Execute ps -ax | grep T
Actual Results: From day to day, the number of "stopped processes" (with status ``T'' in the ps(1) output) keeps growing.
Expected Results: Normally, there should be only a few processes with this status.
I suppose the problem is caused by the process scheduler on an SMP machine with HTT enabled.
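Note that a plain `grep T` also matches any line whose command contains a capital T. The following is a minimal sketch of a stricter check that filters on the state column only. The function names (`only_stopped`, `list_stopped`, `count_stopped`) are hypothetical helpers, and it assumes a ps(1) that accepts `-o` with header-suppressing `name=` fields, as procps does.

```shell
# Keep only rows whose state column (field 3 in the format used below)
# begins with T, i.e. stopped (or, on some ps versions, traced) processes.
only_stopped() {
    awk '$3 ~ /^T/'
}

# List stopped processes with PID, parent PID, state, and command line.
# The parent PID helps to see which program spawned the stuck children.
list_stopped() {
    ps -eo pid=,ppid=,stat=,args= | only_stopped
}

# Print how many processes are currently in state T.
count_stopped() {
    list_stopped | wc -l
}

count_stopped
```

Logging the `count_stopped` value periodically (for example from cron) would show how quickly the number grows on an affected server.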
Thanks for your report, Vlady. Could you please specify the exact
(most recent) kernel version that exhibits this problem? Also, can
you give us an idea of what sort of processes are stopped? And do
they go away if they are killed?
Could you also capture 'ps -ax', sysrq-t and sysrq-m output from a system that
is experiencing this problem?
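One possible way to answer the "do they go away if they are killed?" question is sketched below. This is an illustration only, not a procedure given in this report: `proc_state` and `probe_stopped` are hypothetical helper names, and the probe should only be pointed at a process that can safely be lost, since it ends with SIGKILL.

```shell
# Print the ps(1) state of a PID, or nothing if the process no longer exists.
proc_state() {
    ps -o stat= -p "$1" 2>/dev/null | awk '{ print $1 }'
}

# Send SIGCONT and then SIGKILL to a PID, printing its state after each step.
# An empty state means the process has gone away.
probe_stopped() {
    pid=$1
    echo "before:        $(proc_state "$pid")"
    kill -CONT "$pid" 2>/dev/null
    sleep 1
    echo "after SIGCONT: $(proc_state "$pid")"
    kill -KILL "$pid" 2>/dev/null
    sleep 1
    echo "after SIGKILL: $(proc_state "$pid")"
}
```

A process that is still shown in state T after SIGKILL would suggest it is stuck in the kernel rather than stopped by an ordinary job-control signal.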
Below is a small part of the output of ``ps -ax | grep T'' on a server with the
2.4.21-27.0.4.ELsmp kernel and HTT enabled.
26337 ? T 0:00 cut -b 7-
24464 ? T 0:00 sed s/$/<NL>/g
23302 ? T 0:00 /bin/sed s|/|.|g
9958 ? T 0:00 id -u
8551 ? T 0:00 netstat -tln
22400 ? T 0:00 mkdir -p /some/dir
1697 ? T 0:00 netstat -tln
13750 ? T 0:00 /bin/sed s|/dev/||
20597 ? T 0:00 netstat -tln
25570 ? T 0:00 /bin/sed s|/dev/||
7115 ? T 0:00 /bin/sh /bin/egrep -q (^|:)/usr/X11R6/bin($|:)
22075 ? T 0:00 id -un
1386 ? T 0:00 /usr/bin/tty
2272 ? T 0:00 sh -c date +%Z 2> /dev/null
21237 ? T 0:00 sh -c sysctl fs.file-max
28206 ? T 0:00 /bin/sh -c /usr/lib/sa/sa2 -A
32587 ? T 0:00 sh -c date +%Z 2> /dev/null
30494 ? T 0:00 sh -c date +%Z 2> /dev/null
4591 ? T 0:00 netstat -tln
9427 ? T 0:00 /bin/sed s|/dev/||
Sorry, but I can't supply you with the SysRq results. All our servers which
experience the "stopped processes" problem are in production, and I don't want to
experiment with their kernels! :(
Also, I don't have console access to our servers, so I can't even execute the
SysRq + t or SysRq + m keyboard sequences.
You can use sysrq-trigger remotely:
# enable sysrq-trigger
$ echo 1 > /proc/sys/kernel/sysrq
$ echo t > /proc/sysrq-trigger
$ echo m > /proc/sysrq-trigger
The sysrq info is really important - I can't make any suggestions unless I can
see where these processes are blocking in the kernel.
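The commands above only trigger the dumps; the output itself goes to the kernel log. A minimal sketch of saving it to a file follows. The helper name `capture_sysrq` and the output paths are hypothetical, it must run as root, and it assumes the dump still fits in the ring buffer that dmesg reads. On a busy system a long sysrq-t dump may overflow that buffer, in which case the syslog file (commonly /var/log/messages) is the place to look.

```shell
# Trigger one SysRq function and save the kernel log afterwards.
#   $1: SysRq key, e.g. t (task list) or m (memory info)
#   $2: file to write the kernel log to
#   $3: trigger file; defaults to /proc/sysrq-trigger
capture_sysrq() {
    key=$1
    out=$2
    trigger=${3:-/proc/sysrq-trigger}
    echo "$key" > "$trigger" || return 1
    sleep 1                      # give the kernel a moment to finish printing
    dmesg > "$out"
}

# Example usage (as root, with sysrq enabled as shown above):
#   capture_sysrq t /tmp/sysrq-t.txt
#   capture_sysrq m /tmp/sysrq-m.txt
```

The resulting files can then be attached to the bug report together with the 'ps -ax' output.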
Were you able to use the sysrq-trigger mechanism I mentioned above? I'll need
the sysrq-t & sysrq-m info in order to see what's happening with the stopped
processes.
For now, I'm going to close this issue. Please re-open it if you are able to
collect the info (and are still experiencing the problem).