Bug 111300 - proc's accounting of CPU and memory utilization wrong under NPTL
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: procps
Version: 3.0
Hardware: All
OS: Linux
Priority: medium  Severity: medium
Assigned To: Daniel Walsh
QA Contact: Brian Brock
Depends On:
Blocks: 116727
Reported: 2003-12-01 15:42 EST by Ben Woodard
Modified: 2007-11-30 17:06 EST
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2004-08-19 14:18:19 EDT


Attachments
simple test program that illustrates the problem (258 bytes, text/plain)
2003-12-01 15:47 EST, Ben Woodard
A patch to fix this problem (511 bytes, patch)
2004-03-11 09:08 EST, Bernd Schmidt

Description Ben Woodard 2003-12-01 15:42:13 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1)
Gecko/20030225

Description of problem:
This may be a duplicate of 110555 but I'm not 100% sure. They are at
least somewhat related.

When you run a computationally bound process with multiple threads, top
does not accurately show the process's utilization of the machine.

In particular, two fields seem to reflect incorrect data: the %CPU and
%MEM columns.

Even when a process is fully using all four processors on a
four-processor machine, its %CPU is only 24.9%. The header of the top
display shows that all four CPUs are fully utilized, but this is not
reflected in the process's own line.

The same thing is true of %MEM. However, the SIZE and RSS columns seem
to show the correct values.

Version-Release number of selected component (if applicable):
procps-2.0.13-9.2E

How reproducible:
Always

Steps to Reproduce:
1. run a program that hogs all the CPUs
2. run top

    

Actual Results:
 12:38:06  up 24 days,  6:14, 10 users,  load average: 0.02, 0.07, 0.07
75 processes: 73 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   55.9%    0.0%    0.0%   0.0%     0.0%    0.0%   44.0%
           cpu00   56.1%    0.0%    0.0%   0.0%     0.0%    0.0%   43.7%
           cpu01   55.8%    0.0%    0.0%   0.0%     0.0%    0.0%   44.1%
           cpu02   55.8%    0.0%    0.0%   0.0%     0.0%    0.0%   44.1%
           cpu03   55.8%    0.0%    0.0%   0.0%     0.0%    0.0%   44.0%
Mem:  16629968k av, 6702144k used, 9927824k free,       0k shrd,  214432k buff
      1041312k active,            5067296k inactive
Swap: 16779856k av,       0k used, 16779856k free                5839264k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 9150 ben       23   0   992  992   688 R    24.9  0.0   0:11   1 a.out


Expected Results:
 12:38:06  up 24 days,  6:14, 10 users,  load average: 0.02, 0.07, 0.07
75 processes: 73 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   55.9%    0.0%    0.0%   0.0%     0.0%    0.0%   44.0%
           cpu00   56.1%    0.0%    0.0%   0.0%     0.0%    0.0%   43.7%
           cpu01   55.8%    0.0%    0.0%   0.0%     0.0%    0.0%   44.1%
           cpu02   55.8%    0.0%    0.0%   0.0%     0.0%    0.0%   44.1%
           cpu03   55.8%    0.0%    0.0%   0.0%     0.0%    0.0%   44.0%
Mem:  16629968k av, 6702144k used, 9927824k free,       0k shrd,  214432k buff
      1041312k active,            5067296k inactive
Swap: 16779856k av,       0k used, 16779856k free                5839264k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 9150 ben       23   0   992  992   688 R    99.9  0.0   0:11   1 a.out

or 
 12:38:06  up 24 days,  6:14, 10 users,  load average: 0.02, 0.07, 0.07
75 processes: 73 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   55.9%    0.0%    0.0%   0.0%     0.0%    0.0%   44.0%
           cpu00   56.1%    0.0%    0.0%   0.0%     0.0%    0.0%   43.7%
           cpu01   55.8%    0.0%    0.0%   0.0%     0.0%    0.0%   44.1%
           cpu02   55.8%    0.0%    0.0%   0.0%     0.0%    0.0%   44.1%
           cpu03   55.8%    0.0%    0.0%   0.0%     0.0%    0.0%   44.0%
Mem:  16629968k av, 6702144k used, 9927824k free,       0k shrd,  214432k buff
      1041312k active,            5067296k inactive
Swap: 16779856k av,       0k used, 16779856k free                5839264k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 9150 ben       23   0   992  992   688 R    399.0  0.0   0:11   1 a.out




Additional info:
Comment 1 Ben Woodard 2003-12-01 15:44:09 EST
Here is a better example header; I captured the previous one before the
process had fully loaded all the CPUs.

 12:45:57  up 24 days,  6:22, 10 users,  load average: 0.69, 1.13, 0.76
76 processes: 74 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   99.9%    0.0%    0.0%   0.0%     0.0%    0.0%    0.0%
           cpu00   99.9%    0.0%    0.0%   0.0%     0.0%    0.0%    0.0%
           cpu01  100.0%    0.0%    0.0%   0.0%     0.0%    0.0%    0.0%
           cpu02  100.0%    0.0%    0.0%   0.0%     0.0%    0.0%    0.0%
           cpu03   99.9%    0.0%    0.0%   0.0%     0.0%    0.0%    0.0%
Mem:  16629968k av, 6704672k used, 9925296k free,       0k shrd,  214432k buff
      1041472k active,            5067296k inactive
Swap: 16779856k av,       0k used, 16779856k free                5839232k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 9200 ben       25   0   992  992   688 R    24.9  0.0   0:54   1 a.out
Comment 2 Ben Woodard 2003-12-01 15:47:00 EST
Created attachment 96265 [details]
simple test program that illustrates the problem

cc -lpthread top-test.c
Then run the program. This was designed to run on a 4-CPU box, but the same
effect should show up on any SMP box.
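
For reference, a minimal sketch of the kind of test program described above
(the actual 258-byte attachment 96265 is not reproduced here; the thread
count of 4 and the structure are assumptions matching the 4-CPU test box):

    /* top-test.c -- hypothetical reconstruction, not the original attachment.
     * Spawns one busy-looping thread per CPU so the whole process should
     * consume roughly NTHREADS * 100% of a single CPU's worth of time.
     * Build: cc -lpthread top-test.c
     */
    #include <pthread.h>

    #define NTHREADS 4      /* assumed: one thread per CPU on the 4-CPU box */

    static void *spin(void *arg)
    {
            volatile unsigned long x = 0;
            for (;;)        /* burn CPU forever */
                    x++;
            return NULL;
    }

    int main(void)
    {
            pthread_t t[NTHREADS];
            int i;

            for (i = 0; i < NTHREADS; i++)
                    pthread_create(&t[i], NULL, spin, NULL);
            for (i = 0; i < NTHREADS; i++)
                    pthread_join(t[i], NULL);   /* never returns */
            return 0;
    }

With a.out running, each spin thread should pin one CPU, which is what the
top headers quoted in the comments above show.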
Comment 3 Alexander Larsson 2004-02-05 05:39:21 EST
Mass reassign to new owner
Comment 4 Alexander Larsson 2004-02-05 05:43:51 EST
Mass reassign to new owner
Comment 6 Daniel Walsh 2004-02-16 15:18:37 EST
Could you try this with procps-3.1.15?

Many fixes have gone into this version.

Dan 
Comment 7 Dave Maley 2004-02-17 12:58:49 EST
Looks like the problem still exists in procps-3.1.15-4.  The %CPU
never seems to get above 25%, while the top header shows CPU at 100%.
 %MEM is also still never going above 0% or 0.1%.


top - 13:00:46 up 12 min,  3 users,  load average: 3.95, 3.52, 1.82
Tasks:  70 total,   4 running,  66 sleeping,   0 stopped,   0 zombie
Cpu(s): 100.0% user,   0.0% system,   0.0% nice,   0.0% idle
Mem:    384200k total,   265800k used,   118400k free,    13696k buffers
Swap:  1959920k total,        0k used,  1959920k free,   143760k cached
                                                                     
                                                                       
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3683 dave      25   0   344  344  276 R 25.3  0.1   0:05.44 top-test


I also tested procps-3.1.15-4 on an FC1 box with 2 CPUs and got the same results.
Comment 8 Bernd Schmidt 2004-03-11 09:08:04 EST
Created attachment 98458 [details]
A patch to fix this problem

Part of the problem is that top caps CPU usage at 100%, which isn't correct for
processes with multiple threads - they can consume more than 100% of one CPU. 
The second part of the problem is that we then divide the CPU usage by the
number of CPUs (this behaviour can be toggled with the "I" key).

This patch just raises the cap to nr_cpu * 100%.
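
(The real change is in attachment 98458; the fragment below is only a sketch
of the idea, not the procps source. The names pcpu, nr_cpus, and cap_pcpu are
placeholders.)

    /* Sketch of the capping change described above -- not the actual patch.
     * pcpu is the per-process CPU usage in tenths of a percent computed from
     * the /proc/<pid>/stat tick delta; nr_cpus is the number of online CPUs. */
    static unsigned cap_pcpu(unsigned pcpu, unsigned nr_cpus)
    {
            /* the old code effectively did: if (pcpu > 999) pcpu = 999; */
            unsigned limit = 999 * nr_cpus;  /* raise the cap to nr_cpus * 99.9% */
            return pcpu > limit ? limit : pcpu;
    }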
Comment 9 Bernd Schmidt 2004-03-11 09:15:24 EST
I've tried to reproduce a problem with %MEM, and couldn't.  On a
machine with 2G of RAM I added code to the test program to allocate
and touch 512MB; top reported %MEM as 25% which seems normal.
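
(The allocation code itself is not attached; something along these lines
would reproduce the check -- the 512 MB figure simply matches the
description above.)

    /* Sketch of the %MEM check described above: allocate and touch 512 MB so
     * the pages become resident (RSS), then idle so top can be observed. */
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            size_t sz = 512UL * 1024 * 1024;
            char *p = malloc(sz);
            if (p)
                    memset(p, 1, sz);  /* touch every page so it counts in RSS */
            pause();                   /* keep the process alive for top */
            return 0;
    }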
Comment 10 Daniel Walsh 2004-03-29 07:53:21 EST
According to the upstream maintainer, this requires a kernel
modification to work correctly.

Comment 11 Arjan van de Ven 2004-03-29 07:55:10 EST
Which isn't correct
Comment 12 Daniel Walsh 2004-06-24 13:59:53 EDT
Added patch to procps-2.0.17-9 in U3.

Dan
Comment 13 Albert Cahalan 2004-07-20 14:06:50 EDT
While it is possible to scan all threads, this would
mostly eliminate the performance advantages of having
distinct per-process and per-thread info in /proc.
On large systems, one might as well delete "top" then.
Even without this change, "top" is barely able to
slurp down ASCII /proc files fast enough to avoid
falling behind on a large system.

I have provided an outline of kernel changes that need
to be made. I'll include it in another comment.
Comment 14 Albert Cahalan 2004-07-20 14:07:39 EDT
I'll outline a solution below. While Arjan is in
some way correct, the non-kernel solutions are
unacceptably slow -- they nearly wipe out the
whole reason for having threads in task directories.
If I accept a user-space hack, there will be far
less motivation for fixing the kernel.

Here's part of an email I sent to somebody who was
interested in solving this problem. He had plenty of
code experience, but no kernel experience and little time.

I think these instructions should be simple to follow.
The only concerns are locking and cache line bouncing.
Aside from that, the code change is trivial.

-----------------------------------------------------------

> If you think there is something that I can do to help you in a 1 to 2
> day time frame, let me know.  I will work on it at night and over the
> weekend.

No, but you can produce a 90%-correct hack for yourself
in a day. This involves the kernel source, not procps.

In fs/proc/base.c find the proc_pident_lookup function.
Change this:
                case PROC_TID_STAT:
                case PROC_TGID_STAT:
                        inode->i_fop = &proc_info_file_operations;
                        ei->op.proc_read = proc_pid_stat;
                        break;

Into this:
                case PROC_TID_STAT:
                        inode->i_fop = &proc_info_file_operations;
                        ei->op.proc_read = proc_tid_stat;
                        break;
                case PROC_TGID_STAT:
                        inode->i_fop = &proc_info_file_operations;
                        ei->op.proc_read = proc_tgid_stat;
                        break;

In fs/proc/array.c, copy proc_pid_stat to make a second
function. Name one copy proc_tid_stat, and the other
copy proc_tgid_stat. There's also an extern declaration
that you need to duplicate in the fs/proc/base.c file.
Use grep to see if I missed anything.

Now you have separate per-process and per-thread functions,
but they show the same thing. Modify the per-process (tgid)
one to take CPU usage data from some new variables. You can
put these variables in the task_struct with the others,
causing non-leader tasks to have some wasted space. The
fancy solution would involve the signal struct, but that
requires different locking.

That leaves only one more thing: you need to ensure that
your new struct members get updated. Find the places where
this happens for the per-thread variables. Either use the
group_leader pointer for this (in case you placed the new
data in the task_struct of the leader) or... you're on your
own for using the signal struct.

There. Done, except for some rare glitches when multiple
threads try to update the counter at the exact same moment.
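
(As a rough illustration of the last two steps, a sketch follows. The fields
tgid_utime and tgid_stime are hypothetical additions, and account_tick is a
stand-in for the kernel's per-tick accounting path, not an existing function;
this is 2.4-era pseudocode, not a tested patch.)

    /* Hypothetical: accumulate process-wide CPU time in the thread group
     * leader's task_struct, alongside the existing per-thread counters. */
    void account_tick(struct task_struct *p, int user)
    {
            /* existing per-thread accounting */
            if (user)
                    p->utime++;
            else
                    p->stime++;

            /* new: also charge the leader's process-wide counters, so
             * /proc/<tgid>/stat can report them without scanning every
             * thread */
            if (user)
                    p->group_leader->tgid_utime++;
            else
                    p->group_leader->tgid_stime++;
    }

    /* proc_tgid_stat() in fs/proc/array.c would then print tgid_utime and
     * tgid_stime, while proc_tid_stat() keeps printing the per-thread
     * utime/stime values. */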
Comment 15 Albert Cahalan 2004-07-20 14:09:10 EDT
As you can see, this is now a kernel bug. :-)
No procps modifications are required at all.
Comment 16 Jay Turner 2004-08-17 10:24:25 EDT
I still think there might be something odd going on.  With
procps-2.0.17-10, I'm able to get the following output:

 10:25:09  up  2:44,  5 users,  load average: 5.02, 3.43, 1.81
80 processes: 78 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total  215.2%    0.0%  138.8%   0.0%     0.0%    0.0%   44.8%
           cpu00   54.0%    0.0%   35.6%   0.2%     0.0%    0.0%   10.2%
           cpu01   51.4%    0.0%   37.0%   0.0%     0.0%    0.0%   11.6%
           cpu02   55.0%    0.0%   32.7%   0.0%     0.0%    0.0%   12.1%
           cpu03   55.0%    0.0%   33.7%   0.0%     0.0%    0.0%   11.1%
Mem:  4025148k av, 2547936k used, 1477212k free,       0k shrd,    7292k buff
      2494512k active,                184k inactive
Swap: 2040244k av,  110144k used, 1930100k free                  24824k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 6179 root      25   0 2387M 2.3G   308 R    354.4 60.7   0:22   3 a.out
Comment 18 Jay Turner 2004-09-01 22:59:02 EDT
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-311.html
