Bug 174619

Summary: top reports wrong values for CPU(s) in batch mode
Product: [Fedora] Fedora
Reporter: Brian McEntire <brianm>
Component: procps
Assignee: Karel Zak <kzak>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: rawhide
Target Milestone: ---
Target Release: ---
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-04-20 22:54:49 UTC

Description Brian McEntire 2005-11-30 18:59:03 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050921 Red Hat/1.7.12-1.4.1

Description of problem:
When I run the 'top' command with the -b -n1 options, I do not get the proper output.

When running top in regular mode, it keeps updating and the values on the Cpu(s) line change with each update. However, when running top with the -bn1 switches (batch mode, 1 iteration), the Cpu(s) line does not change from run to run over seconds, minutes or even hours.

Here is an example of top, with no switches:

top - 13:42:52 up 21 days,  7:36, 38 users,  load average: 0.93, 0.86, 0.70
Tasks: 289 total,   3 running, 286 sleeping,   0 stopped,   0 zombie
Cpu(s): 15.9% us,  7.9% sy,  0.0% ni, 75.5% id,  0.2% wa,  0.5% hi,  0.0% si
Mem:   2074936k total,  1677308k used,   397628k free,    75060k buffers
Swap:  2064376k total,    53180k used,  2011196k free,   989352k cached

In another window, at close to the same time, I ran 'top -bn1':

top - 13:42:54 up 21 days,  7:36, 38 users,  load average: 0.93, 0.86, 0.70
Tasks: 292 total,   2 running, 290 sleeping,   0 stopped,   0 zombie
Cpu(s): 36.9% us,  2.3% sy,  0.0% ni, 50.3% id, 10.4% wa,  0.1% hi,  0.0% si
Mem:   2074936k total,  1678076k used,   396860k free,    75064k buffers
Swap:  2064376k total,    53180k used,  2011196k free,   989348k cached

Notice that most of the values on the Cpu(s) line are not even close. Any that are close seem to be a coincidence. This is especially true of 'id', which represents the percentage of time the CPU is idle.

I've been monitoring a number of RH 7.2, RHEL3, and RHEL4 systems. Only the RHEL4 systems have this problem.

On one RHEL4 system, about 50.3% id has been showing consistently for each check of top -bn1 run every 5 minutes. For another RHEL4 system, the id% has been constant at about 81%.

Version-Release number of selected component (if applicable):
procps-3.2.3-8.2

How reproducible:
Always

Steps to Reproduce:
1. Open one window, run 'top'
2. Open another window, run 'watch -n3 "top -bn1|head -7"'

Actual Results:  The window running 'top' with no parameters works normally: values for Cpu(s) change almost every update.

The window running top -bn1 repeatedly via 'watch' clearly demonstrates the problem. The values for the Cpu(s) line don't change.

The unchanging values from top -bn1 can be very different from normal top.

Expected Results:  top -bn1 should return approximately the same result as top when run at the same time.

Additional info:

This appears to be an initialization problem with top. 

When I first start top, without parameters, the first output shows the exact same values as the 'watch -n3 "top -bn1|head -7"'. On the next update of normal/interactive mode top, the Cpu(s) values change drastically, and then continue to change with each update.

It seems top on RHEL4 initially gives a false (or counter-based?) value for Cpu(s). If it takes an additional cycle to determine and provide real values, batch mode should be fixed so that proper values are reported.
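
Until batch mode reports real values on its first iteration, one way to act on the "additional cycle" observation is to request two iterations and keep only the last Cpu(s) line. The following is a minimal sketch, not part of procps. The helper names `last_cpu_fields` and `run_top_twice` are hypothetical, and the parser assumes the "NN.N% label" field layout shown in the output quoted in this report.

```python
import re
import subprocess

# One "NN.N% label" field, as on the Cpu(s) lines quoted in this report.
FIELD = re.compile(r"(\d+(?:\.\d+)?)%\s*(\w+)")


def last_cpu_fields(batch_output):
    """Return the fields of the LAST Cpu(s) line as {label: percent}.

    With two or more iterations, the last line is computed from the
    difference between two samples rather than from counters since boot.
    """
    cpu_lines = [line for line in batch_output.splitlines()
                 if line.startswith("Cpu(s):")]
    if not cpu_lines:
        raise ValueError("no Cpu(s) line found in top output")
    return {label: float(value)
            for value, label in FIELD.findall(cpu_lines[-1])}


def run_top_twice():
    """Hypothetical helper: run two batch iterations and capture the output."""
    result = subprocess.run(["top", "-b", "-n", "2"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `last_cpu_fields(run_top_twice())["id"]` would then give the idle percentage from the second iteration, at the cost of waiting one refresh interval.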

Comment 1 Brian McEntire 2005-11-30 19:01:08 UTC
In case it matters to troubleshooting, our RHEL4 systems have two CPUs.

To be clear, the value of CPU idle from top -bn1 is not correlated with the
value from interactive top.

Comment 2 Brian McEntire 2005-11-30 19:20:17 UTC
Just to give another data point, I found an RHEL4 system with only one CPU; it
also demonstrates the same behaviour when top runs in batch mode.

Comment 3 Karel Zak 2005-11-30 22:09:45 UTC
It's correct behaviour. The top command calculates %CPU by looking at the change
in CPU time values between samples. When you first run it, it has no previous
sample to compare to, so these initial values are the percentages since boot. It
means you need at least two loops. There is no other way to implement it.

Sorry, closing as NOTABUG.
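
The difference between "since boot" and "between samples" can be illustrated with the aggregate `cpu` line of /proc/stat, which holds cumulative tick counters. This is a rough sketch of the arithmetic only, not the procps source. The function names are hypothetical, and the counter values in the usage note are invented for illustration.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu ...' line of /proc/stat into tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def percentages(prev, curr):
    """Share of each counter in the ticks elapsed between two samples."""
    deltas = [c - p for p, c in zip(prev, curr)]
    total = sum(deltas)
    if total == 0:
        return [0.0] * len(deltas)
    return [100.0 * d / total for d in deltas]


def since_boot(curr):
    """With no previous sample, the only baseline is zero, i.e. boot time."""
    return percentages([0] * len(curr), curr)
```

For example, with invented counters `cpu 500 0 100 400` followed later by `cpu 510 0 120 470`, `since_boot` on the first sample yields 40% in the last column, while `percentages` over the two samples yields 70%. A single-iteration run can only produce the first kind of number, which is why it barely moves from run to run on a machine with a long uptime.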



Comment 4 Brian McEntire 2005-12-01 02:25:06 UTC
I don't want to be a pain, but please take a second look.

I'm not sure how they do it, but top -bn1 in both RH 7.2 and RHEL3 does not
share this behavior. They provide real values on the first iteration. This seems
like something that changed for the worse between RHEL3 and RHEL4.

At a minimum, if the number given by the first iteration is bogus, it would be
better to provide no value at all. It would also be easier to parse.

Comment 5 Karel Zak 2005-12-01 11:35:44 UTC
The top command in RHEL3 uses sleep(1) during startup. It means the first loop
is there but it's hidden from users. I don't think this is a good solution for
interactive mode.
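
The hidden first loop described above can be sketched as: take one sample, sleep, take a second sample, and report only the difference. This is an illustration under assumptions, not the RHEL3 code. `read_proc_stat` and `first_report` are hypothetical names, and the reader and sleep function are passed in as parameters only so the logic can be exercised without waiting.

```python
import time


def read_proc_stat(path="/proc/stat"):
    """Read the aggregate CPU tick counters from /proc/stat (Linux only)."""
    with open(path) as stat_file:
        for line in stat_file:
            if line.startswith("cpu "):
                return [int(value) for value in line.split()[1:]]
    raise RuntimeError("no aggregate 'cpu' line in " + path)


def first_report(read_sample=read_proc_stat, sleep=time.sleep, delay=1.0):
    """Return percentages for the first visible report.

    The first sample is never shown; it only serves as the baseline
    for the second one, taken after `delay` seconds.
    """
    prev = read_sample()
    sleep(delay)  # the startup pause that users notice in interactive mode
    curr = read_sample()
    deltas = [c - p for p, c in zip(prev, curr)]
    total = sum(deltas)
    return [100.0 * d / total if total else 0.0 for d in deltas]
```

The trade-off is visible in the sketch: the first report is meaningful, but it arrives `delay` seconds late, which matters more for interactive use than for a batch run.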

I've thought about it and I think we can add a new option there, and top with
this option will follow the good old behaviour of RHEL3. I'm going to talk about
it with the upstream developer.

Thanks for your patience ;-)

Comment 6 Brian McEntire 2005-12-01 20:55:21 UTC
Thank you!  I haven't looked at the code so I don't know the difficulty of this,
but maybe you could put that sleep(1) back, but only when the -b switch is used.

Comment 7 Karel Zak 2006-03-20 21:31:03 UTC
I don't want to change this behaviour in RHEL4 (due to possible regression). I
will try to improve it for FC5/RHEL5. That's the safer way.

Comment 8 Karel Zak 2006-04-20 22:54:49 UTC
It's released to FC5 now. But it's implemented by the CPULOOP=1 environment
variable -- yeah, it's a workaround... I don't want to release some FC/RHEL
specific command line option now. I'm trying to find a better solution with the
upstream maintainer too. Maybe it will be possible to use something better than
an env. variable in the future.
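
For scripts that collect the Cpu(s) line, the workaround amounts to setting CPULOOP=1 in the environment of the top process. The sketch below only shows how that could be done from a monitoring script. The function names are hypothetical, and the variable is known from this comment alone, so it is assumed to have an effect only on the patched FC5 procps and to be ignored elsewhere.

```python
import os
import subprocess


def top_batch_invocation(base_env=None):
    """Build argv and environment for one batch iteration with CPULOOP=1.

    Assumption: only the Fedora-patched procps described in this comment
    reads CPULOOP; other builds of top simply ignore the variable.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CPULOOP"] = "1"
    argv = ["top", "-b", "-n", "1"]
    return argv, env


def run_top_once():
    """Run top once in batch mode with the workaround enabled."""
    argv, env = top_batch_invocation()
    result = subprocess.run(argv, env=env, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

Copying the environment instead of modifying `os.environ` keeps the setting local to the one child process.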

Comment 9 Albert Cahalan 2007-05-28 04:24:35 UTC
BSD, AIX, and SysV all supply a per-process running average from the kernel. If
the Linux kernel would supply that, then the first iteration of top could be
both fast and correct.

Note that this data is also required for ps to meet the requirements of the
Single UNIX Specification. Right now ps is in violation, and not fixable in any
remotely acceptable way.