From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (Windows NT 5.0; U)
Description of problem:
After building a kernel based on a recent source RPM and changing HZ from 100 to 1200,
programs such as top report too much CPU utilization for individual processes. Several
processes can report >90% utilization while total CPU loading is 20% or less.
We have isolated the problem to the source code that estimates HZ from the uptime and
other values within the /proc file system.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Build kernel after changing HZ to 1200
2. Install kernel and reboot
3. Run some processes and monitor CPU usage w/ top
Actual Results: Several processes can show about 100% usage, but actual CPU loading is much less.
Expected Results: Processes should show accurate values.
- continue to use the case statement, and if the result is not within expected ranges (e.g., 50, 60, 100, 128, ... 1024)
then use the calculated value instead of defaulting to the nominal value (HZ=100), OR
- use the calculated value in all cases - round-off error is probably not enough to be detectable
by the user anyway, OR
- use value exported by the kernel (sysconf?) for HZ or from kernel header files
Nah, this is a kernel bug.
The kernel should always export everything as if HZ=100, even if the internal HZ is
higher. The code for that is present; it's just not error-free.
What exact kernel version are you using?
Re: kernel should export data as if HZ=100
I don't think that is true. Let me refer you to "man 2 times" which states...
The function times returns the number of clock ticks that have elapsed since an arbitrary point in the past....
The number of clock ticks per second can be obtained by using sysconf(_SC_CLK_TCK),
which w/ the custom kernel is 1200 (which matches HZ).
We are building a "real time" kernel based on ...
kernel-source-2.4.18-4.i386.rpm (from Red Hat)
changed HZ to 1200 (smallest value that is a multiple of 10, 20, 30, 40, 50, 60, 80, 100 hz)
cpu affinity & preempt patches from Robert Love
bigphysmem patches (to support a driver we use)
our own patch to allow mlockall to get up to 90% physical memory (not 50%)
We have a modified version of top (we call ttop) which does the correct calculations
for HZ=1200 and generates reasonable results. We can send you the patch if you
are interested, but we recommend one of the other fixes as a more general solution.
Hmm. Well, in theory, reporting to userspace as if HZ=100 should be the case.
As for the rest: I assume you are aware that the preempt patch is incompatible
with the 2.4 TCP/IP stack (and that it basically doesn't reduce latency if the
lowlatency patch is applied, as is the case for 2.4.18-4).
Hmm. Not sure what NEEDINFO means, but I'll reply to the comments.
We've been running w/ a 2.4 kernel w/ the preempt patches for months now
w/o any TCP/IP problems that we have been able to determine. Perhaps you
could elaborate on that separately [via email?]?
The reason we use both the low latency & preemption patch is that it
does appear to work better w/ both (based on our measurements). I would
also expect both in the Red Hat 2.5 kernels when the transition is made.
Let me also point out the paper "Linux Scheduler Latency" by Clark Williams
at Red Hat (March 2002) which reports results against 2.4.17 where
both patches together worked better than the low latency alone by 2-3 msec
[see last chart & accompanying text].
Linus has specifically stated that he wants the userspace view to see the
existing 1/100th of a second behaviour.
0) This isn't officially supported. :-)
1) You can look at our current rawhide kernels (and our Limbo beta
kernels) to see how we are handling this (HZ=1000, reporting as
if HZ=100). Linus is doing the same with the 2.5 kernel.
2) By "basically" I think Arjan meant "not very much" which is supported
by Clark's white paper; Clark's work was what I understand Arjan to
have been referring to. I think you are probably in violent agreement.
I don't see any bugs here for us to fix.