Red Hat Bugzilla – Bug 433954
[MRG] Kernel config changes required
Last modified: 2008-04-23 17:51:20 EDT
Sripathi Kodi <email@example.com> - 2008-02-19 06:17 EDT
We have seen that some config options in RH's 2.6.24-21.el5rt kernel are
affecting latencies in our tests. We need these options changed to improve
Options CONFIG_CPU_FREQ and CONFIG_NO_HZ should be turned OFF for predictable
Also, with CONFIG_FAIR_GROUP_SCHED on, specJBB produced an oops. We have
reported it to linux-rt-users with the subject line "Oops while running Specjbb
on -rt kernel". We believe this code is not yet perfect. Hence we need
CONFIG_FAIR_GROUP_SCHED turned OFF.
So we need:
CONFIG_CPU_FREQ to be turned OFF
CONFIG_NO_HZ to be turned OFF
CONFIG_FAIR_GROUP_SCHED to be turned OFF.
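One way to check a given kernel config against the three requests above is sketched below. This is only an illustration: the helper names (`parse_kernel_config`, `mismatches`) and the inline sample are hypothetical. It assumes the usual `.config` text format, where an enabled option appears as `CONFIG_X=y` and a disabled one as `# CONFIG_X is not set`.

```python
import re

# Options named in this report and the state requested for each (False = OFF).
REQUESTED = {
    "CONFIG_CPU_FREQ": False,
    "CONFIG_NO_HZ": False,
    "CONFIG_FAIR_GROUP_SCHED": False,
}

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kernel_config(text):
    """Map each option found in .config-style text to True (set) or False (not set)."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            state[m.group(1)] = m.group(2) != "n"
            continue
        m = _UNSET.match(line)
        if m:
            state[m.group(1)] = False
    return state


def mismatches(text, requested=REQUESTED):
    """Return the options whose state differs from the requested one."""
    state = parse_kernel_config(text)
    # Assumption: an option absent from the file is treated as disabled.
    return sorted(opt for opt, want in requested.items()
                  if state.get(opt, False) != want)


# Hypothetical sample, not taken from any real MRG kernel config.
sample = """\
CONFIG_CPU_FREQ=y
# CONFIG_NO_HZ is not set
CONFIG_FAIR_GROUP_SCHED=y
"""
print(mismatches(sample))
```

To check a real system, the text of the installed kernel's config file could be passed to `mismatches` in place of the sample.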
Sripathi Kodi <firstname.lastname@example.org> - 2008-02-22 06:04 EDT
In addition to the above, we are analyzing the effect of CONFIG_NUMA and
CONFIG_CPU_IDLE on the latencies.
So in summary:
The following options should be turned OFF:
CONFIG_RELOCATABLE should be turned ON. There is another bug,
https://bugzilla.redhat.com/show_bug.cgi?id=432378 specifically for this option.
------- Comment From email@example.com 2008-02-26 22:02 EDT-------
Minor update on this: From the discussion on Monday, RH claimed
CONFIG_FAIR_GROUP_SCHED would be disabled in a future release.
The good news is that we've turned off CONFIG_FAIR_GROUP_SCHED.
The bad news (from your perspective) is that turning off CPU_FREQ and NO_HZ is
not as easy a decision. If we were focusing strictly on latency then we could do
this and default the system to idle=poll and we'd have the best latency we
could. Unfortunately we cannot ignore the power savings that NO_HZ and the
frequency governors buy us.
So, I'd like to try and quantify the sorts of performance hits we're seeing with
CPU_FREQ and NO_HZ, so that we can address those rather than tossing them out.
We'd like to get some more information concerning NO_HZ and CPU_FREQ. We've
specifically seen workloads where latency was improved by booting with nohz=true
(since it's disabled by default currently), so we'd like to see what kind of
workload is being hurt by having NO_HZ code in place.
Also, what settings did you have for the CPU_FREQ governors?
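For reference, the active governor on each CPU could be collected with something like the sketch below. It assumes the standard cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`); the function name `read_governors` is made up for this example, and on a kernel built without CONFIG_CPU_FREQ the result would simply be empty.

```python
import glob
import os


def read_governors(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes a cpufreq directory."""
    governors = {}
    pattern = os.path.join(sysfs_cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        # Path looks like .../cpuN/cpufreq/scaling_governor; pick out "cpuN".
        cpu = path.split(os.sep)[-3]
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors


# Prints an empty dict when no cpufreq entries are present.
print(read_governors())
```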
Can you share any test code or can we try and duplicate the test conditions?
Have you tried using a later 2.6.24 kernel than the one listed in the original
------- Comment From firstname.lastname@example.org 2008-03-13 07:19 EDT-------
We are running some tests to get comparison numbers for NO_HZ. For CPU_FREQ, we
have some numbers.
CONFIG_CPU_FREQ does not seem to have much impact on latencies on LS21 and HS21
blades. However, when run on an Intellistation zPro, we see a significant
impact. I suspect it depends on how much power control is supported by the
hardware. There is a setting in the BIOS of zPro to disable power management,
which makes it perform much better. However, we saw the best numbers when
CONFIG_CPU_FREQ was disabled in the kernel.
Some numbers below. Except for one, they are from the rt-test suite that is now
part of LTP.
test name || MRG kernel 126.96.36.199-29.el5rt || MRG with CPU_FREQ disabled.
async handler || Max: 32 us / Avg: 8.0527 us || Max: 25 us / Avg: 5.1222 us
gtod latency || Max: 7 us / Avg: 0.5148 us || Max: 2 us / Avg: 0.1292 us
pi_perf || Max: 85 us / Avg: 46.69 us || Max: 60 us / Avg: 40.63 us
pthread_kill_lat || Max: 22 us / Avg: 8.1215 us || Max: 19 us / Avg: 5.7992
sched_jitter || Max jitter: 305.623993 us || Max jitter: 141.701004 us
sched_latency || Max: 9 us / Avg: 5 us || Max: 6 us / Avg: 3 us
matrix mult Sequential:
|| Max: 108219 /Avg :107882.4609 us || Max: 71929 us /Avg:
|| Max: 27600 /Avg : 27097.2109 || Max : 18421 us /Avg :
Seq/Conc Ratios :
|| Max: 2.9210 /Avg: 3.9813 || Max: 3.9047 /Avg : 3.9812
Proprietary benchmark(100run) || 77/100 PASS || 95/100 PASS
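To make the comparison easier to read, the relative drop in maximum latency can be worked out from the complete rows of the table. The sketch below does only that arithmetic; the dictionary and the helper `reduction_pct` are illustrative names, and the matrix mult rows are left out because their values are incomplete above.

```python
# Max latency in us: (stock MRG kernel, MRG with CPU_FREQ disabled),
# copied from the table above.
max_latency_us = {
    "async handler": (32.0, 25.0),
    "gtod latency": (7.0, 2.0),
    "pi_perf": (85.0, 60.0),
    "pthread_kill_lat": (22.0, 19.0),
    "sched_jitter": (305.623993, 141.701004),
    "sched_latency": (9.0, 6.0),
}


def reduction_pct(before, after):
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before


for name, (before, after) in max_latency_us.items():
    print(f"{name}: {reduction_pct(before, after):.1f}% lower max")
```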
------- Comment From email@example.com 2008-03-17 16:25 EDT-------
Sripathi: So does idle=poll along with the BIOS disable affect this at all?
------- Comment From firstname.lastname@example.org 2008-03-25 19:25 EDT-------
As discussed on the call, we're now in line w/ the current MRG config for
everything except RELOCATE and KDUMP (covered by ltc bug #42253 and RH bug
I think CONFIG_NUMA is fine to be left alone, as we can boot w/ numa=off if
necessary (and we can still hunt down the performance issue in the meantime).
So I think this can be marked resolved.