From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (Windows NT 5.0; U)
Description of problem:
After building a kernel based on a recent source RPM and changing HZ from 100 to 1200,
programs such as top report too much CPU utilization for individual processes. Several
processes can report >90% utilization while total CPU loading is 20% or less.
We have isolated the problem to the source code that estimates HZ from the uptime and
other values within the /proc file system.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Build kernel after changing HZ to 1200
2. Install kernel and reboot
3. Run some processes and monitor CPU usage w/ top
Actual Results: Several processes can show about 100% usage, but actual CPU loading is much less.
Expected Results: Processes should show accurate values.
- continue to use the case statement, and if the result is not within expected ranges (e.g., 50, 60, 100, 128, ... 1024)
then use the calculated value instead of defaulting to the nominal value (HZ=100), OR
- use the calculated value in all cases - round-off error is probably not enough to be detectable
by the user anyway, OR
- use value exported by the kernel (sysconf?) for HZ or from kernel header files
Nah, this is a kernel bug.
The kernel should always export everything as if HZ=100, even if the internal HZ is
higher. The code for that is present; it's just not error-free.
What exact kernel version are you using?
Re: kernel should export data as if HZ=100
I don't think that is true. Let me refer you to "man 2 times" which states...
The function times returns the number of clock ticks that have elapsed since an arbitrary point in the past....
The number of clock ticks per second can be obtained by using sysconf(_SC_CLK_TCK),
which w/ the custom kernel is 1200 (which matches HZ).
We are building a "real time" kernel based on ...
kernel-source-2.4.18-4.i386.rpm (from Red Hat)
changed HZ to 1200 (smallest value that is a multiple of 10, 20, 30, 40, 50, 60, 80, 100 hz)
cpu affinity & preempt patches from Robert Love
bigphysmem patches (to support a driver we use)
our own patch to allow mlockall to get up to 90% physical memory (not 50%)
We have a modified version of top (we call ttop) which does the correct calculations
for HZ=1200 and generates reasonable results. We can send you the patch if you
are interested, but we recommend one of the other fixes as a more general solution.
Hmm. Well, in theory, reporting to userspace as if HZ=100 should be the case.
As for the rest: I assume you are aware that the preempt patch is incompatible
with the 2.4 TCP/IP stack (and that it basically doesn't reduce latency if the
lowlatency patch is applied, as is the case for 2.4.18-4).
Hmm. Not sure what NEEDINFO means, but I'll reply to the comments.
We've been running w/ a 2.4 kernel w/ the preempt patches for months now
w/o any TCP/IP problems that we have been able to determine. Perhaps you
could elaborate on that separately [via email?]?
The reason we use both the low latency & preemption patch is that it
does appear to work better w/ both (based on our measurements). I would
also expect both in the Red Hat 2.5 kernels when the transition is made.
Let me also point out the paper "Linux Scheduler Latency" by Clark Williams
at Red Hat (March 2002) which reports results against 2.4.17 where
both patches together worked better than the low latency alone by 2-3 msec
[see last chart & accompanying text].
Linus has specifically stated that he wants the userspace view to see the
existing 1/100th of a second behaviour.
0) This isn't officially supported. :-)
1) You can look at our current rawhide kernels (and our Limbo beta
kernels) to see how we are handling this (HZ=1000, reporting as
if HZ=100). Linus is doing the same with the 2.5 kernel.
2) By "basically" I think Arjan meant "not very much" which is supported
by Clark's white paper; Clark's work was what I understand Arjan to
have been referring to. I think you are probably in violent agreement.
I don't see any bugs here for us to fix.