From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040207 Firefox/0.8

Description of problem:
cat /etc/redhat-release
Red Hat Enterprise Linux ES release 3.90 (Nahant)

Process timing measurement is incorrect. Also note that top, ps, and /proc will not charge processor time to tasks which complete their load in less than 1/HZ (a "jiffy").

Version-Release number of selected component (if applicable):
kernel-smp-2.6.8-1.528.2.10

How reproducible:
Always

Steps to Reproduce:
1. exec top (see output #1 below)
2. exec'ed an app (eatcpu.linux)
3. see output #2 below
4. kill eatcpu
5. see output #3 below

Actual Results:
#1:
top - 12:36:24 up 2 days, 20:06, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 75 total, 1 running, 74 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 0.1% sy, 0.0% ni, 99.9% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 3976640k total, 399000k used, 3577640k free, 200328k buffers
Swap: 4096312k total, 0k used, 4096312k free, 96852k cached

  PID USER  PR  NI  VIRT  RES   SHR S %CPU %MEM    TIME+ COMMAND
13122 root  16   0  3228  928  1652 R  0.5  0.0  0:00.09 top
    1 root  16   0  3620  492  1396 S  0.0  0.0  0:01.37 init
--------------------------------------------------
#2:
top - 12:38:14 up 2 days, 20:08, 1 user, load average: 0.66, 0.19, 0.06
Tasks: 76 total, 2 running, 74 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.1% us, 0.0% sy, 0.0% ni, 74.9% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 3976640k total, 399896k used, 3576744k free, 200328k buffers
Swap: 4096312k total, 0k used, 4096312k free, 97632k cached

  PID USER  PR  NI  VIRT  RES   SHR S %CPU %MEM    TIME+ COMMAND
13123 root  25   0  2208  688  2164 R 99.9  0.0  1:42.34 eatcpu.linux
    1 root  16   0  3620  492  1396 S  0.0  0.0  0:01.37 init
    2 root  RT   0     0    0     0 S  0.0  0.0  0:00.24 migration/0
-----------------------------------------------------
#3:
top - 12:39:27 up 2 days, 20:09, 1 user, load average: 0.90, 0.37, 0.13
Tasks: 75 total, 1 running, 74 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 0.1% sy, 0.0% ni, 99.9% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 3976640k total, 399896k used, 3576744k free, 200344k buffers
Swap: 4096312k total, 0k used, 4096312k free, 97616k cached

  PID USER  PR  NI  VIRT  RES   SHR S %CPU %MEM    TIME+ COMMAND
    1 root  16   0  3620  492  1396 S  0.0  0.0  0:01.37 init
    2 root  RT   0     0    0     0 S  0.0  0.0  0:00.24 migration/0
    3 root  34  19     0    0     0 S  0.0  0.0  0:00.00 ksoftirqd/0
    4 root  RT   0     0    0     0 S  0.0  0.0  0:00.02 migration/1

Expected Results:
Per CPU%'s reported in CPU states should match (or total) CPU%'s reported per PID.

Additional info:
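As a rough cross-check of the outputs above, the sketch below compares the per-PID %CPU column with the Cpu(s) summary line. It assumes that top reports per-task %CPU relative to a single CPU while the summary line is averaged over all CPUs, and it assumes four logical CPUs, which the report does not state. The function name `summary_user_pct` is made up for illustration.

```python
def summary_user_pct(per_task_pct, num_cpus):
    """Expected summary-line CPU% if every task's usage is charged.

    per_task_pct: per-PID %CPU values, each relative to one CPU.
    num_cpus:     number of logical CPUs (an assumption, not from the report).
    """
    return sum(per_task_pct) / num_cpus


if __name__ == "__main__":
    # Output #2: eatcpu.linux at 99.9% of one CPU, assumed 4 logical CPUs.
    expected = summary_user_pct([99.9], num_cpus=4)
    print(f"expected summary us%: {expected:.3f} (top reported 25.1)")
```

Under those assumptions, output #2 is roughly self-consistent for a single long-running task. The mismatch described in this report would show up when the summary line is busy but the per-PID values sum to far less.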
Looking back at my initial description, it was not accurate and did not do the problem justice. We are running thousands of programs that each complete within a millisecond or two. Together these processes consume a considerable amount of processing power, but they are not charged processor time, because tasks which complete their load in less than 2/HZ (two "jiffies") are not charged. In previous kernel releases the poll latency was 1/HZ with one run queue. Now it appears process scheduling is done with two run queues, making the poll latency two jiffies. If it will help, I might be able to provide some sample code to illustrate this. Based on my limited research, this is a known issue within the community. Albert Cahalan (the maintainer of top and ps) suggested we make some kernel hacks to get this working, but that would render our release unsupportable by Red Hat. Please advise ... thanks.
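To make the effect concrete, here is a minimal sketch of tick-sampled accounting. It is an idealized model, not kernel code: it assumes HZ is 1000, that a task is charged one whole jiffy only when a timer tick fires while it is running, and that each short task starts shortly after a tick (for example because it is woken by a timer). The names `charged_us` and `simulate_short_tasks` are made up for illustration.

```python
# Idealized model of tick-sampled CPU accounting (not actual kernel code).
HZ = 1000                      # assumed tick rate; one jiffy = 1/HZ seconds
JIFFY_US = 1_000_000 // HZ     # jiffy length in microseconds


def charged_us(start_us, run_us):
    """CPU time charged to a task that runs during [start_us, start_us + run_us).

    One whole jiffy is charged for every tick (a multiple of JIFFY_US)
    that fires while the task is running; nothing is charged otherwise.
    """
    end_us = start_us + run_us
    first_tick = -(-start_us // JIFFY_US)   # index of first tick at or after start
    last_tick = (end_us - 1) // JIFFY_US    # index of last tick before end
    return max(0, last_tick - first_tick + 1) * JIFFY_US


def simulate_short_tasks(count, offset_us, run_us):
    """Run `count` short tasks, one per jiffy, each starting offset_us after a tick.

    Returns (cpu_time_actually_used_us, cpu_time_charged_us).
    """
    used = charged = 0
    for i in range(count):
        used += run_us
        charged += charged_us(i * JIFFY_US + offset_us, run_us)
    return used, charged


if __name__ == "__main__":
    used, charged = simulate_short_tasks(count=1000, offset_us=100, run_us=600)
    print(f"short tasks: used {used} us, charged {charged} us")
    # One long task using the same total CPU time is sampled by many ticks.
    print(f"long task:   used {used} us, charged {charged_us(100, used)} us")
```

In this model the thousand short tasks use 60% of one CPU for a second yet are charged nothing, while a single long task using the same CPU time is charged in full. Real workloads are not perfectly aligned to the tick, so the actual undercount would depend on when tasks start relative to the timer.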
Indeed, this is a known issue upstream. I would be happy to work upstream with you to get the issue fixed there. You are right in that the changes would probably be so invasive that such a hacked kernel would not be Red Hat supportable... Let me know if you want help fixing this issue in the community. IMHO it is worth fixing, just not sure if Linus will agree ;)
We would like to have this resolved upstream and your help would be greatly appreciated. We were told it would render our release unsupportable if we made the changes ourselves. Please advise ....