174619 – top reports wrong values for CPU(s) in batch mode

Summary: top reports wrong values for CPU(s) in batch mode

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	procps
Sub Component:
Version:	rawhide
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Karel Zak
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-11-30 18:59 UTC by Brian McEntire
Modified:	2007-11-30 22:11 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2006-04-20 22:54:49 UTC
Type:	---
Embargoed:
Dependent Products:

Description Brian McEntire 2005-11-30 18:59:03 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050921 Red Hat/1.7.12-1.4.1

Description of problem:
When I use a particular option to the 'top' command, specifically, -b -n1, I do not get the proper output. 

When running top in regular mode, it keeps updating and the values on the Cpu(s) line change with each update. However, when running top with the -bn1 switches (batch mode, 1 iteration), the Cpu(s) line does not change from run to run over seconds, minutes or even hours.

Here is an example of top, with no switches:

top - 13:42:52 up 21 days,  7:36, 38 users,  load average: 0.93, 0.86, 0.70
Tasks: 289 total,   3 running, 286 sleeping,   0 stopped,   0 zombie
Cpu(s): 15.9% us,  7.9% sy,  0.0% ni, 75.5% id,  0.2% wa,  0.5% hi,  0.0% si
Mem:   2074936k total,  1677308k used,   397628k free,    75060k buffers
Swap:  2064376k total,    53180k used,  2011196k free,   989352k cached

In another window, at close to the same time, I ran 'top -bn1':

top - 13:42:54 up 21 days,  7:36, 38 users,  load average: 0.93, 0.86, 0.70
Tasks: 292 total,   2 running, 290 sleeping,   0 stopped,   0 zombie
Cpu(s): 36.9% us,  2.3% sy,  0.0% ni, 50.3% id, 10.4% wa,  0.1% hi,  0.0% si
Mem:   2074936k total,  1678076k used,   396860k free,    75064k buffers
Swap:  2064376k total,    53180k used,  2011196k free,   989348k cached

Notice that most of the values on the Cpu(s) line are not even close, and any that are seem to be a coincidence. This is especially true of 'id', which represents the percentage of time the CPU is idle.

I've been monitoring a number of RH 7.2, RHEL3, and RHEL4 systems. Only the RHEL4 systems have this problem.

On one RHEL4 system, top -bn1, run every 5 minutes, has consistently shown about 50.3% id on each check. On another RHEL4 system, the id value has stayed constant at about 81%.

Version-Release number of selected component (if applicable):
procps-3.2.3-8.2

How reproducible:
Always

Steps to Reproduce:
1. Open one window, run 'top'
2. Open another window, run 'watch -n3 "top -bn1|head -7"'

Actual Results:  The window running 'top' with no parameters works normally, values for Cpu(s) change almost every update.

The window running top -bn1 repeatedly via 'watch' clearly demonstrates the problem. The values for the Cpu(s) line don't change.

The unchanging values from top -bn1 can be very different from those of normal top.

Expected Results:  top -bn1 should return approximately the same result as top when run at the same time.

Additional info:

This appears to be an initialization problem with top. 

When I first start top without parameters, the first output shows exactly the same values as 'watch -n3 "top -bn1|head -7"'. On the next update of normal/interactive top, the Cpu(s) values change drastically and then continue to change with each update.

It seems top on RHEL4 initially gives a false (or counter-based?) value for Cpu(s). If it takes an additional cycle to determine and provide real values, batch mode should be fixed so that proper values are reported.

Comment 1 Brian McEntire 2005-11-30 19:01:08 UTC

In case it matters to troubleshooting, our RHEL4 systems have two CPUs.

To be clear, the value of CPU idle from top -bn1 is not correlated with
interactive top.

Comment 2 Brian McEntire 2005-11-30 19:20:17 UTC

Just to give another data point, I found an RHEL4 system with only one CPU, it
also demonstrates the same behaviour when top runs in batch mode.

Comment 3 Karel Zak 2005-11-30 22:09:45 UTC

It's correct behaviour. The top command calculates %CPU by looking at the change
in CPU time values between samples. When you first run it, it has no previous
sample to compare to, so these initial values are the percentages since boot. It
means you need at least two loops. There is no other way to implement it.

Sorry, closing as NOTABUG.
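
The calculation described in this comment can be illustrated with a minimal sketch. It assumes the counter layout of the aggregate "cpu" line in /proc/stat (user, nice, system, idle, iowait, irq, softirq). The function names and counter values are hypothetical, and this is not top's actual source. The first sample was constructed so that its since-boot percentages resemble the batch-mode output quoted in the description.

```python
# Sketch: CPU percentages from /proc/stat-style tick counters.
# Assumed field order of the aggregate "cpu" line:
# user, nice, system, idle, iowait, irq, softirq.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def percentages(current, previous=None):
    """Percent of time spent in each state between two samples.

    With previous=None the counters are compared against zero, which
    yields the average since boot: the only thing a first iteration
    can report when it has no earlier sample.
    """
    if previous is None:
        previous = dict.fromkeys(FIELDS, 0)
    deltas = {k: current[k] - previous[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return dict.fromkeys(FIELDS, 0.0)
    return {k: 100.0 * d / total for k, d in deltas.items()}


# Hypothetical samples taken one refresh interval apart.
first = parse_cpu_line("cpu 3690 0 230 5030 1040 10 0")
second = parse_cpu_line("cpu 3706 0 238 5105 1040 11 0")

since_boot = percentages(first)        # what a lone first iteration sees
interval = percentages(second, first)  # what the second iteration sees
```

With these made-up counters, since_boot["id"] is about 50.3 while interval["id"] is 75.0, which mirrors the gap between the batch-mode and interactive outputs in the report.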

Comment 4 Brian McEntire 2005-12-01 02:25:06 UTC

I don't want to be a pain, but please take a second look.

I'm not sure how they do it, but top -bn1 in both RH 7.2 and RHEL3 does not
share this behavior. They provide real values on the first iteration. This seems
like something that changed for the worse between RHEL3 and RHEL4.

At a minimum, if the number given by the first iteration is bogus, it would be
better to provide no value at all. It would also be easier to parse.
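
For scripts that parse batch output, one possible approach is to request more than one iteration and read only the last Cpu(s) line, since Comment 3 states that at least two loops are needed for meaningful values. The sketch below assumes the "Cpu(s): 15.9% us, ..." line format shown in the description; other top versions may print a different format. The helper name last_cpu_fields is hypothetical.

```python
import re

# Matches a summary line such as "Cpu(s): 15.9% us,  7.9% sy, ...".
CPU_LINE = re.compile(r"^Cpu\(s\):[ \t]*(.*)$", re.MULTILINE)
# Matches one "<number>% <label>" pair inside that line.
FIELD = re.compile(r"([\d.]+)%\s*(\w+)")


def last_cpu_fields(batch_output):
    """Return e.g. {'us': 15.9, 'id': 75.5, ...} from the last Cpu(s) line.

    Returns None when the text contains no Cpu(s) line.
    """
    lines = CPU_LINE.findall(batch_output)
    if not lines:
        return None
    return {name: float(value) for value, name in FIELD.findall(lines[-1])}
```

The text passed in could come from something like 'top -b -n 2', where the second iteration has a previous sample to compare against.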

Comment 5 Karel Zak 2005-12-01 11:35:44 UTC

The top command in RHEL3 uses sleep(1) during startup. It means the first loop
is there, but it's hidden from users. I don't think this is a good solution for
interactive mode.

I've thought about it, and I think we can add a new option so that top with
this option follows the good old RHEL3 behaviour. I'm going to talk about it
with the upstream developer.

Thanks for your patience ;-)
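
The hidden first loop described for RHEL3 can be sketched as follows: take one sample silently, wait, take a second sample, and report only the difference. This is an illustration under assumptions, not the RHEL3 code. The function names are hypothetical, and treating everything except the idle counter as busy is a simplification.

```python
import time


def read_proc_stat_cpu(path="/proc/stat"):
    """Return (busy_ticks, total_ticks) from the aggregate 'cpu' line.

    Assumes the first line of the file is the aggregate line and that
    the fourth counter is idle time.
    """
    with open(path) as f:
        fields = [int(v) for v in f.readline().split()[1:8]]
    total = sum(fields)
    return total - fields[3], total


def busy_percent_with_warmup(read_sample=read_proc_stat_cpu,
                             delay=1.0, sleep=time.sleep):
    """Take a hidden first sample, wait, then report the interval's busy %."""
    busy0, total0 = read_sample()
    sleep(delay)  # the hidden first loop: costs `delay` seconds at startup
    busy1, total1 = read_sample()
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (busy1 - busy0) / elapsed
```

The trade-off is the one raised in this comment: the first reported value is meaningful, but startup is delayed by the length of the hidden interval.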

Comment 6 Brian McEntire 2005-12-01 20:55:21 UTC

Thank you!  I haven't looked at the code, so I don't know how difficult this is,
but maybe you could put that sleep(1) back, but only when the -b switch is used.

Comment 7 Karel Zak 2006-03-20 21:31:03 UTC

I don't want to change this behaviour in RHEL4 (due to possible regressions). I
will try to improve it for FC5/RHEL5. That's the safer way.

Comment 8 Karel Zak 2006-04-20 22:54:49 UTC

It's released to FC5 now, but it's implemented by the CPULOOP=1 environment
variable. Yeah, it's a workaround... I don't want to release an FC/RHEL-specific
command line option now. I'm also trying to find a better solution with the
upstream maintainer. Maybe it will be possible to use something better than an
env. variable in the future.
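
A script that wants to use this workaround could set the variable only for the top child process. The sketch below assumes a top build that carries the FC5 change; other builds will most likely ignore the variable. The comment does not describe the variable's effect in more detail, so no particular output is assumed. The helper names are hypothetical.

```python
import os
import subprocess


def top_batch_env(base_env=None):
    """Return a copy of the environment with CPULOOP=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CPULOOP"] = "1"
    return env


def run_top_once(base_env=None):
    """Run a single batch iteration of top and return its output as text."""
    result = subprocess.run(
        ["top", "-b", "-n", "1"],
        env=top_batch_env(base_env),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Copying the environment keeps the setting local to the child process, so other programs started by the same script are unaffected.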

Comment 9 Albert Cahalan 2007-05-28 04:24:35 UTC

BSD, AIX, and SysV all supply a per-process running average from the kernel. If
the Linux kernel would supply that, then the first iteration of top could be
both fast and correct.

Note that this data is also required for ps to meet the requirements of the
Single UNIX Specification. Right now ps is in violation, and not fixable in any
remotely acceptable way.
