Bug 66319

Summary:	Erroneous process usage reported with non-standard HZ
Product:	[Retired] Red Hat Linux	Reporter:	Mark H Johnson <mark_h_johnson>
Component:	kernel	Assignee:	Arjan van de Ven <arjanv>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Aaron Brown <abrown>
Severity:	low	Docs Contact:
Priority:	medium
Version:	7.1
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2002-07-12 13:25:59 UTC	Type:	---

Description Mark H Johnson 2002-06-07 16:47:12 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (Windows NT 5.0; U)

Description of problem:
After building a kernel based on a recent source RPM and changing HZ from 100 to 1200,
programs such as top report too much CPU utilization for individual processes. Several
processes can report >90% utilization while total CPU loading is 20% or less.
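
The size of the error follows from how a per-process percentage is derived from tick counts. The following is a minimal sketch, not the actual code of top; the function names are hypothetical and it assumes the tool divides the ticks charged to a process by an assumed tick rate:

```c
#include <stdio.h>

/* Percent of one CPU used by a process over a sampling interval.
 * delta_ticks: clock ticks charged to the process during the interval
 * hz:          ticks per second that the monitoring tool ASSUMES
 * interval_s:  length of the sampling interval in seconds */
double cpu_percent(unsigned long delta_ticks, double hz, double interval_s)
{
    return 100.0 * (double)delta_ticks / (hz * interval_s);
}

/* Hypothetical helper: print the same sample under both assumptions. */
void show_sample(unsigned long delta_ticks, double interval_s)
{
    printf("assuming HZ=100:  %.1f%%\n",
           cpu_percent(delta_ticks, 100.0, interval_s));
    printf("assuming HZ=1200: %.1f%%\n",
           cpu_percent(delta_ticks, 1200.0, interval_s));
}
```

With a kernel ticking at 1200 per second, a process charged 96 ticks in one second has used 8% of a CPU, but a tool that assumes HZ=100 would show 96%, a factor of 12 too high.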

I have isolated the problem to the source code that estimates HZ from the uptime and
other values within the /proc file system.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Build kernel after changing HZ to 1200
2. Install kernel and reboot
3. Run some processes and monitor CPU usage w/ top
	

Actual Results:  Several processes can show about 100% usage, but actual CPU loading is much less.

Expected Results:  Processes should show accurate values.

Additional info:

Three suggestions:
 - continue to use the case statement, but if the result is not within the expected ranges
   (e.g., 50, 60, 100, 128, ... 1024), use the calculated value instead of defaulting to
   the nominal value (HZ=100), OR
 - use the calculated value in all cases; round-off error is probably not large enough to be
   detectable by the user anyway, OR
 - use the value of HZ exported by the kernel (sysconf?) or taken from the kernel header files
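
The first suggestion could look roughly like the sketch below. It is an illustration under stated assumptions, not the existing code: the function names, the 5% tolerance, and the short table (only the values named above) are all hypothetical.

```c
#include <stddef.h>

/* Tick rate implied by the counters: total jiffies accumulated across
 * all CPUs, divided by uptime in seconds and by the number of CPUs. */
double measure_hz(unsigned long long total_jiffies, double uptime_s, int ncpus)
{
    return (double)total_jiffies / (uptime_s * (double)ncpus);
}

/* Snap the measured rate to a well-known HZ value when it is close to one.
 * Otherwise return the rounded measurement instead of defaulting to 100. */
long pick_hz(double measured)
{
    /* Assumption: illustrative table holding only the values named above. */
    static const long known[] = { 50, 60, 100, 128, 1024 };
    const double tolerance = 0.05; /* assumed +/- 5% window */

    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        double lo = (double)known[i] * (1.0 - tolerance);
        double hi = (double)known[i] * (1.0 + tolerance);
        if (measured >= lo && measured <= hi)
            return known[i];
    }
    /* Not a recognized value: trust the measurement, rounded to nearest. */
    return (long)(measured + 0.5);
}
```

With this fallback, a measured rate near 1200 is returned as 1200 rather than being replaced by the nominal 100.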

Thanks.

Comment 1 Arjan van de Ven 2002-06-07 17:09:01 UTC

Nah, this is a kernel bug.
The kernel should always export everything in HZ=100 even if the internal HZ is
higher. The code for that is present, just not error-free.

What exact kernel version are you using?

Comment 2 Mark H Johnson 2002-06-07 19:40:29 UTC

Re: kernel should export data as if HZ=100

I don't think that is true. Let me refer you to "man 2 times" which states...
 The function times returns the number of clock ticks that have elapsed since an arbitrary point in the past....
The number of clock ticks per second can be obtained by using
  sysconf(_SC_CLK_TCK);
which w/ the custom kernel is 1200 (which matches HZ).
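
As a minimal sketch of the approach described in the man page, the tick counts returned by times() can be converted to seconds with the value from sysconf(_SC_CLK_TCK) rather than a hard-coded 100. The helper names below are hypothetical:

```c
#include <sys/times.h>
#include <unistd.h>

/* Convert a tick count from times() into seconds, using the tick rate
 * reported by the system. Returns -1.0 if the rate is unavailable. */
double ticks_to_seconds(clock_t ticks)
{
    long ticks_per_sec = sysconf(_SC_CLK_TCK);
    if (ticks_per_sec <= 0)
        return -1.0;
    return (double)ticks / (double)ticks_per_sec;
}

/* CPU time (user + system) consumed so far by the calling process,
 * in seconds. Returns -1.0 on error. */
double self_cpu_seconds(void)
{
    struct tms t;
    if (times(&t) == (clock_t)-1)
        return -1.0;
    return ticks_to_seconds(t.tms_utime + t.tms_stime);
}
```

Because the divisor is queried at run time, the same binary gives consistent results whatever tick rate the running kernel reports.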

We are building a "real time" kernel based on ...
  kernel-source-2.4.18-4.i386.rpm (from Red Hat)
  changed HZ to 1200 (smallest value that is a multiple of 10, 20, 30, 40, 50, 60, 80, 100 Hz)
  cpu affinity & preempt patches from Robert Love
  bigphysmem patches (to support a driver we use)
  our own patch to allow mlockall to get up to 90% physical memory (not 50%)
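
The choice of 1200 in the list above can be checked by computing the least common multiple of the listed frame rates. This is only a verification sketch with hypothetical function names:

```c
#include <stddef.h>

/* Greatest common divisor by Euclid's algorithm. */
unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple of n positive values. */
unsigned long lcm_all(const unsigned long *values, size_t n)
{
    unsigned long result = 1;
    for (size_t i = 0; i < n; i++) {
        /* Divide before multiplying to keep intermediate values small. */
        result = result / gcd_ul(result, values[i]) * values[i];
    }
    return result;
}
```

Calling lcm_all on {10, 20, 30, 40, 50, 60, 80, 100} gives 2^4 * 3 * 5^2 = 1200, so a 1200 Hz tick lands exactly on every one of those periods.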
 
We have a modified version of top (we call ttop) which does the correct calculations
for HZ=1200 and generates reasonable results. We can send you the patch if you
are interested, but we recommend one of the other fixes as a more general solution.

Comment 3 Arjan van de Ven 2002-06-07 19:56:32 UTC

Hmm. Well, in theory, reporting to userspace in HZ=100 should be the case. It
obviously isn't.

As for the rest: I assume you are aware that the preempt patch is incompatible
with the 2.4 TCP/IP stack (and that it basically doesn't reduce latency if the
lowlatency patch is applied, as is the case for 2.4.18-4).

Comment 4 Mark H Johnson 2002-06-07 20:38:27 UTC

Hmm. Not sure what NEEDINFO means, but I'll reply to the comments.

We've been running w/ a 2.4 kernel w/ the preempt patches for months now
w/o any TCP/IP problems that we have been able to determine. Perhaps you
could elaborate on that separately [via email?]?

The reason we use both the low latency & preemption patches is that it
does appear to work better w/ both (based on our measurements). I would
also expect both in the Red Hat 2.5 kernels when the transition is made.

Let me also point out the paper "Linux Scheduler Latency" by Clark Williams
at Red Hat (March 2002), which reports results against 2.4.17 where
both patches together worked better than the low latency patch alone by 2-3 msec
[see last chart & accompanying text].

Comment 5 Alan Cox 2002-07-12 13:25:54 UTC

Linus has specifically stated that he wants the userspace view to see the
existing 1/100th of a second behaviour.
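
Keeping the 1/100th-of-a-second view while the kernel ticks faster internally amounts to scaling tick counts before they are exported. The following is a rough sketch of that idea, not the kernel's actual code; the function name and the fallback path are assumptions:

```c
/* Scale a count of internal ticks to the 100-per-second unit that
 * userspace expects. internal_hz must be nonzero. */
unsigned long to_user_ticks(unsigned long internal_ticks,
                            unsigned long internal_hz)
{
    const unsigned long user_hz = 100;

    if (internal_hz % user_hz == 0) {
        /* Common case (e.g. 1000 or 1200): a single exact division. */
        return internal_ticks / (internal_hz / user_hz);
    }
    /* Assumed fallback for other rates: widen to avoid overflow. */
    return (unsigned long)((unsigned long long)internal_ticks * user_hz
                           / internal_hz);
}
```

With this scaling, one second of CPU time is reported as 100 ticks whether the internal rate is 1000 or 1200, so tools that assume HZ=100 stay correct.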

Comment 6 Michael K. Johnson 2002-07-12 13:34:19 UTC

0) This isn't officially supported.  :-)
1) You can look at our current rawhide kernels (and our Limbo beta
   kernels) to see how we are handling this (HZ=1000, reporting as
   if HZ=100).  Linus is doing the same with the 2.5 kernel.
2) By "basically" I think Arjan meant "not very much" which is supported
   by Clark's white paper; Clark's work was what I understand Arjan to
   have been referring to.  I think you are probably in violent agreement
   there.

I don't see any bugs here for us to fix.