Red Hat Bugzilla – Bug 458324
bogus %CPU values from top
Last modified: 2009-11-23 04:59:04 EST
With everything current from rawhide, including kernel 2.6.27-0.226.rc1.git5.fc10.i686 and procps 3.2.7-20.fc9.i386, here's what I see when I run "top":
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 3769 jik       20   0 25976 8888 7532 S 143.2  0.4  1126:42 multiload-apple
 3549 jik       20   0 46032 2044 1640 S  90.9  0.1 187:11.98 gvfs-fuse-daemo
11045 jik       20   0  2560 1100  828 R   1.3  0.1   0:00.10 top
Note the bogus %CPU values on the top two lines.
Values over 100 % are OK -- they mean the process is consuming more than one CPU (core). If you don't have a multiprocessor/multicore machine, then something is wrong.
How many (logical) CPUs does your system have?
I have an HT CPU which pretends to have two CPUs. "cat /proc/cpuinfo" gives me two of the blocks shown below.
I don't quite get how a single process can use more than one CPU at a time, but leaving that aside for the moment: since I have only two CPUs (or CPU-like things, since it's really only one CPU with HT), it seems to me that the percentages shouldn't be able to exceed 200%, and yet I regularly see processes listed at much higher percentages than that.
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping : 1
cpu MHz : 3016.881
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pebs bts pni monitor ds_cpl cid xtpr
bogomips : 6033.76
clflush size : 64
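As a side note on reading the cpuinfo block above: the "siblings" and "cpu cores" fields are what distinguish hyperthreading from real multicore. A minimal, illustrative parser (not part of the bug report; the sample text is taken from the fields quoted above):

```python
# Sketch: infer hyperthreading from the "siblings" and "cpu cores"
# fields of a /proc/cpuinfo block.

SAMPLE = """\
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
"""

def parse_cpuinfo(text):
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

info = parse_cpuinfo(SAMPLE)
siblings = int(info["siblings"])   # logical CPUs per physical package
cores = int(info["cpu cores"])     # physical cores per package
# siblings > cores means SMT/hyperthreading: here, 2 logical CPUs on 1 real core
print(siblings, cores, siblings > cores)
```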
Please start top, press '1' and post the summary data that should contain the overall numbers for each logical CPU.
One process can have multiple threads and each of them can run on a different CPU. That's how the usage goes over 100 %.
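The arithmetic behind that statement can be sketched as follows (made-up numbers, not procps source): a process's %CPU is the sum of its threads' CPU-time deltas divided by the sampling interval, so a two-thread process can legitimately exceed 100% on a multi-CPU machine.

```python
# Illustrative arithmetic: summing per-thread CPU time over one
# sampling interval can exceed 100% of a single CPU.

interval = 3.0                  # seconds between top screen updates
thread_cpu_deltas = [2.7, 2.7]  # CPU seconds each thread consumed in that interval

pcpu = 100.0 * sum(thread_cpu_deltas) / interval
print(f"{pcpu:.1f}")            # plausible on a 2-CPU box, impossible on 1 CPU
```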
top - 15:03:27 up 1:13, 1 user, load average: 0.01, 0.03, 0.00
Tasks: 140 total, 1 running, 137 sleeping, 2 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.7%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1990292k total, 553932k used, 1436360k free, 50660k buffers
Swap: 2104496k total, 0k used, 2104496k free, 330116k cached
Perhaps there's no bug here, but there's certainly a change in behavior: previously I rarely, if ever, saw top report a process with more than 100% CPU, but now I see it all the time.
Still, I find it hard to believe that the explanation is multiple threads. Some other anomalies:
When I hit H to display all threads, the next time top updates the display there are a bunch of processes that show %CPU as 9999.9. This seems to happen again when I switch back.
Watching the top output on an ongoing basis, I just saw rsyslogd report 9999.9% CPU, and imapd report over 2000% CPU for a number of updates in a row. The former is clearly bogus. I suppose the latter is possible, but it still seems quite odd.
I wonder if perhaps there's a bug in the code for amalgamating usage across all the threads in a process?
(In reply to comment #4)
> Perhaps there's no bug here, but there's certainly a change in behavior,
> though, because previously I rarely if ever saw top report a process with more
> than 100% CPU, but now I see it all the time.
I can't explain the behaviour you observe (I haven't reproduced it myself) by any change in procps' code, which hasn't changed much for some months now. This may be caused by a kernel change, and I can't do much about that. All in all, top only reads data from /proc, so it might be interesting to look at the raw values of /proc/<num>/stat.
I have this behaviour too.
model name : Intel(R) Core(TM)2 Duo CPU T9300 @ 2.50GHz
I agree that the numbers are clearly bogus. I can understand the up to 200% on a dual core machine but sometimes the values are well beyond that.
No matter how many times I repeatedly run
ps -eo pcpu,comm | sort -n
I can't get any pcpu value to exceed even 100% while top is concurrently showing values past 100% for the same processes. I believe that ps is also reading the /proc values.
I don't recall this behaviour happening in 2.6.26... it only seems to have started with 2.6.27.
(In reply to comment #6)
> ps -eo pcpu,comm | sort -n
> I can't get any pcpu value to exceed even 100% while top is concurrently
> showing values past 100% for the same processes. I believe that ps is also
> reading the /proc values.
It's true that top and ps both read the values from /proc, but there is a difference: ps computes %CPU as the total CPU time divided by the time the process has been running, while top divides the CPU time used since the last screen update by the length of that interval. You would have to be very lucky to get comparably high values from both.
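The two formulas just described can be sketched like this (hypothetical numbers, not the actual procps source):

```python
# Sketch of the difference between ps and top %CPU:
# ps averages over the process's whole lifetime,
# top averages over the last sampling interval only.

cpu_time_total = 120.0   # total CPU seconds the process has ever used
process_age = 600.0      # seconds since the process started

cpu_time_delta = 2.8     # CPU seconds used since the last screen update
update_interval = 3.0    # seconds since the last screen update

pcpu_ps = 100.0 * cpu_time_total / process_age       # lifetime average
pcpu_top = 100.0 * cpu_time_delta / update_interval  # instantaneous

print(f"ps: {pcpu_ps:.1f}%  top: {pcpu_top:.1f}%")
```

A long-running process that only recently became busy will show a low value in ps and a high value in top, which is consistent with what comment #6 observed.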
I will try to look into the kernel changes between 2.6.26 and 2.6.27 that could affect the /proc values.
The routine that fills out the proc values is
do_task_stat in fs/proc/array.c
There were changes from 2.6.26 to 2.6.27-rc4 for namespaces and for using seq_printf for buffering. Neither of those appears to be a problem (seq_printf and sprintf share the same formatting routine).
I'll wander through the code a bit more but I don't see any obvious problems.
The values are back to being correct/reasonable in
Jonathan, did the new kernel fix this for you, or does the problem still occur?
I am still seeing %CPU values from top higher than 100% for single-threaded applications and much higher than 200% for both single-threaded and multi-threaded applications (on a two-processor system), so no, this doesn't appear to be fixed.
I'm running kernel 184.108.40.206-23.rc1.fc10.i686 and procps-3.2.7-21.fc10.i386
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.
More information and reason for this action is here:
I am also seeing a general discrepancy between the values in /proc/<pid>/stat and /proc/stat, with user and system CPU being understated in the latter. The same discrepancy shows in top, where the sum of CPU usage across all processes exceeds the reported CPU utilization. The problem shows for both individual CPUs and the total CPU for the box.
run an application with a noticeable cpu consumption. Note its pid, e.g. 1020, sample the values in /proc/stat and /proc/1020/stat
cat /proc/1020/stat /proc/stat | head -2 > t1
cat /proc/1020/stat /proc/stat | head -2 >> t1
Compute the jiffies delta for user and system CPU for both the process and the whole host. I am seeing the former being 1.5 to 2 times larger.
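That delta computation can be sketched as follows, using made-up sample lines in the shape of the two files (in /proc/<pid>/stat, fields 14 and 15, 1-indexed, are utime and stime in jiffies; in the aggregate "cpu" line of /proc/stat, user, nice, and system are the first three counters). The sample values are hypothetical, chosen only to reproduce the anomaly described above:

```python
# Sketch: compute per-process vs whole-host jiffies deltas from two
# samples of /proc/<pid>/stat and the "cpu" line of /proc/stat.
# (Naive split: assumes the comm field in parentheses has no spaces.)

def pid_jiffies(stat_line):
    fields = stat_line.split()
    utime, stime = int(fields[13]), int(fields[14])  # 0-indexed here
    return utime + stime

def host_jiffies(cpu_line):
    user, nice, system = (int(v) for v in cpu_line.split()[1:4])
    return user + nice + system

# Two hypothetical samples, taken a few seconds apart:
pid_t1 = "1020 (burner) R 1 1020 1020 0 -1 4194304 100 0 0 0 500 120 0 0 20 0 1 0 9000 25976 2222 4294967295"
pid_t2 = "1020 (burner) R 1 1020 1020 0 -1 4194304 100 0 0 0 800 180 0 0 20 0 1 0 9000 25976 2222 4294967295"
cpu_t1 = "cpu 4000 10 900 500000 200 0 30 0"
cpu_t2 = "cpu 4200 10 960 500900 200 0 30 0"

pid_delta = pid_jiffies(pid_t2) - pid_jiffies(pid_t1)     # 980 - 620
host_delta = host_jiffies(cpu_t2) - host_jiffies(cpu_t1)  # 5170 - 4910
# The reported anomaly: the single process's delta exceeds the host's
print(pid_delta, host_delta)
```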
I am running kernel 220.127.116.11-170.2.35.fc10.x86_64 #1 SMP Mon Feb 23 13:00:23 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '10'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 10's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 10 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
I don't think I'm seeing this anymore in rawhide.
OK, closing. If you encounter this again, you can reopen.