Bug 137583

Summary: failures of the process accounting in ps, top, and time
Product: Red Hat Enterprise Linux 4
Reporter: Allen Brown <abrown>
Component: kernel
Assignee: Rik van Riel <riel>
Status: CLOSED UPSTREAM
Severity: medium
Priority: medium
Version: 4.0
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-11-01 20:51:18 UTC

Description Allen Brown 2004-10-29 16:48:11 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040207 Firefox/0.8

Description of problem:
cat /etc/redhat-release 
Red Hat Enterprise Linux ES release 3.90 (Nahant)

Process timing measurement is incorrect.  Also note that top, ps, and
/proc will not charge processor time to tasks which complete their
load in less than 1/HZ (a "jiffy").
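
The effect described above can be illustrated with a small simulation. This is only a sketch of tick-sampled accounting in general, not the kernel's actual code: it assumes a 1000 us tick (HZ=1000, which the report does not state), and the function names are hypothetical. A task is charged one tick for every timer tick that fires while it is on the CPU, so a task that starts and finishes between two ticks is charged nothing.

```python
def ticks_charged(start_us, run_us, tick_us=1000):
    """Count timer ticks that fire while a task is on the CPU.

    Ticks fire at multiples of tick_us (assumed 1000 us, i.e. HZ=1000).
    The task runs during [start_us, start_us + run_us).
    """
    end_us = start_us + run_us

    def ceil_div(a, b):
        # Integer ceiling division, avoids floating-point rounding.
        return -(-a // b)

    return ceil_div(end_us, tick_us) - ceil_div(start_us, tick_us)


def simulate(n_tasks, run_us, offset_us, tick_us=1000):
    """Run n_tasks short tasks, one per tick period, each starting
    offset_us after a tick. Returns (cpu_used_us, cpu_charged_us)."""
    used_us = n_tasks * run_us
    charged_us = sum(
        ticks_charged(i * tick_us + offset_us, run_us, tick_us) * tick_us
        for i in range(n_tasks)
    )
    return used_us, charged_us


# 1000 tasks of 800 us each, every one falling between two ticks.
used, charged = simulate(n_tasks=1000, run_us=800, offset_us=100)
print(f"CPU actually used: {used} us, CPU charged: {charged} us")
```

In this contrived schedule the tasks use 0.8 s of CPU in total and none of it is charged, because no tick ever lands inside a task's run.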



Version-Release number of selected component (if applicable):
kernel-smp-2.6.8-1.528.2.10

How reproducible:
Always

Steps to Reproduce:
1. Run top (see output #1 below)
2. Run a CPU-bound application (eatcpu.linux)
3. Observe top again (see output #2 below)
4. Kill eatcpu
5. Observe top again (see output #3 below)
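
The source of eatcpu.linux is not included in this report. As a hypothetical stand-in for step 2, a busy loop along the following lines would keep one CPU fully occupied; the function name and its parameter are illustrative assumptions only.

```python
import time


def eat_cpu(cpu_seconds):
    """Spin until this process has consumed cpu_seconds of CPU time.

    Hypothetical stand-in for eatcpu.linux (not the reporter's program).
    Returns the CPU time actually consumed, as seen by the process.
    """
    start = time.process_time()
    spins = 0
    while time.process_time() - start < cpu_seconds:
        spins += 1  # pure computation, no sleeping or I/O
    return time.process_time() - start
```

Calling eat_cpu with a large value (for example eat_cpu(3600)) in one terminal while top runs in another approximates steps 2 to 4; killing the process corresponds to step 4.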

Actual Results:
#1:

top - 12:36:24 up 2 days, 20:06,  1 user,  load average: 0.00, 0.00, 0.00
Tasks:  75 total,   1 running,  74 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0% us,  0.1% sy,  0.0% ni, 99.9% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   3976640k total,   399000k used,  3577640k free,   200328k buffers
Swap:  4096312k total,        0k used,  4096312k free,    96852k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
13122 root      16   0  3228  928 1652 R  0.5  0.0   0:00.09 top
    1 root      16   0  3620  492 1396 S  0.0  0.0   0:01.37 init

--------------------------------------------------
#2:

top - 12:38:14 up 2 days, 20:08,  1 user,  load average: 0.66, 0.19, 0.06
Tasks:  76 total,   2 running,  74 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.1% us,  0.0% sy,  0.0% ni, 74.9% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   3976640k total,   399896k used,  3576744k free,   200328k buffers
Swap:  4096312k total,        0k used,  4096312k free,    97632k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
13123 root      25   0  2208  688 2164 R 99.9  0.0   1:42.34 eatcpu.linux
    1 root      16   0  3620  492 1396 S  0.0  0.0   0:01.37 init
    2 root      RT   0     0    0    0 S  0.0  0.0   0:00.24 migration/0

-----------------------------------------------------
#3:

top - 12:39:27 up 2 days, 20:09,  1 user,  load average: 0.90, 0.37, 0.13
Tasks:  75 total,   1 running,  74 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0% us,  0.1% sy,  0.0% ni, 99.9% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   3976640k total,   399896k used,  3576744k free,   200344k buffers
Swap:  4096312k total,        0k used,  4096312k free,    97616k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1 root      16   0  3620  492 1396 S  0.0  0.0   0:01.37 init
    2 root      RT   0     0    0    0 S  0.0  0.0   0:00.24 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    4 root      RT   0     0    0    0 S  0.0  0.0   0:00.02 migration/1


Expected Results:  The CPU percentages reported in the Cpu(s) summary
line should match (or sum to) the CPU percentages reported per PID.
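
When comparing the two sets of figures, it may help to put them on the same scale first. The sketch below assumes that top's summary line is averaged over all CPUs while each per-PID %CPU is relative to a single CPU (top's default mode, as far as I know), and it assumes four logical CPUs, which the report does not state. The function name is hypothetical.

```python
def expected_summary_percent(task_cpu_percents, n_cpus):
    """Convert per-task %CPU values (each relative to one CPU) into the
    machine-wide percentage that a summary averaged over n_cpus would show."""
    return sum(task_cpu_percents) / n_cpus


# Output #2: one task at 99.9 %CPU; n_cpus=4 is an assumption.
print(expected_summary_percent([99.9], n_cpus=4))
```

Under those assumptions a single task at 99.9% corresponds to roughly 25% in the summary, which is close to the 25.1% us shown in output #2.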



Additional info:

Comment 2 Allen Brown 2004-11-01 15:49:20 UTC
Now that I look at my initial description, it was not accurate and 
didn't do the problem justice. We are running thousands of programs 
that each complete within a millisecond or two. These processes 
consume a considerable amount of processing power but are not charged 
processor time, because tasks which complete their load in less than 
2/HZ (two "jiffies") are not charged.
In previous kernel releases, poll latency was defined as 1/HZ with one 
run queue. Now it appears process scheduling is done with two run 
queues, making the poll latency two "jiffies". 
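
One way to observe the gap between CPU time used and CPU time charged is to run many short-lived children and compare the elapsed wall-clock time with what the kernel accounts to them. This is a rough sketch, not the reporter's sample code: the function name and the default command are assumptions, and Python interpreter start-up alone takes far longer than a millisecond, so a small compiled binary would have to be passed as command to approach the sub-jiffy case described here.

```python
import resource
import subprocess
import sys
import time


def run_short_tasks(n_tasks, command=None):
    """Run command n_tasks times in sequence.

    Returns (wall_seconds, charged_cpu_seconds), where the second value is
    the user + system time the kernel accounted to the reaped children.
    """
    if command is None:
        # Placeholder workload (assumption); substitute a short-lived binary.
        command = [sys.executable, "-c", "pass"]

    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    wall_start = time.monotonic()
    for _ in range(n_tasks):
        subprocess.run(command, check=True)  # waits for, and reaps, the child
    wall = time.monotonic() - wall_start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    charged = (after.ru_utime - before.ru_utime) + (
        after.ru_stime - before.ru_stime
    )
    return wall, charged


if __name__ == "__main__":
    wall, charged = run_short_tasks(100)
    print(f"wall: {wall:.3f} s, charged to children: {charged:.3f} s")
```

On an otherwise idle machine, a charged figure far below the wall-clock figure for a CPU-bound workload would be consistent with the under-accounting described in this comment.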

If it will help, I might be able to provide some sample code to 
illustrate this. But based on my limited research, this is a known 
issue within the community. Albert Cahalan (the maintainer of top and 
ps) suggested we make some kernel hacks to get this working, but that 
would render our release unsupportable by Red Hat.

Please advise ... thanks.

Comment 3 Rik van Riel 2004-11-01 20:51:18 UTC
Indeed, this is a known issue upstream.  I would be happy to work
upstream with you to get the issue fixed there.  You are right in that
the changes would probably be so invasive that such a hacked kernel
would not be Red Hat supportable...

Let me know if you want help fixing this issue in the community. IMHO
it is worth fixing, just not sure if Linus will agree ;)

Comment 4 Allen Brown 2004-11-02 21:30:41 UTC
We would like to have this resolved upstream and your help would be 
greatly appreciated. We were told it would render our release 
unsupportable if we made the changes ourselves.
Please advise ....