Red Hat Bugzilla – Bug 1013225
CONFIG_SCHEDSTATS isn't enabled due to performance impact (systemd-bootchart generates no bootcharts)
Last modified: 2015-09-03 23:23:40 EDT
Description of problem:
systemd-bootchart generates no (empty) bootcharts. The journal does not contain a message with MESSAGE_ID=9f26aa562cf440c2b16c773d0479b518.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Boot with init=/usr/lib/systemd/systemd-bootchart
2. Check the size of the generated svg in /run/log
Actual results:
The svg file is empty (size: 0 bytes).
Expected results:
A bootchart should have been generated and stored in /run/log.
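For anyone repeating step 2, the size check can be scripted. This is only a minimal sketch: the helper name `empty_svgs` is made up for illustration, and it assumes the charts are the `*.svg` files directly under /run/log, as described above.

```python
from pathlib import Path


def empty_svgs(log_dir="/run/log"):
    """Return the *.svg files in log_dir that are zero bytes long."""
    directory = Path(log_dir)
    if not directory.is_dir():
        # Nothing to report if the directory does not exist.
        return []
    return sorted(p for p in directory.glob("*.svg") if p.stat().st_size == 0)
```

A non-empty result after booting with systemd-bootchart would match the symptom reported here.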
I have reproduced this on a minimal install done from the F20 alpha netinstall iso.
I checked my system, and I have the same problem. But when I run systemd-bootchart from command line, then I see:
open /proc/schedstat: No such file or directory
The same message is shown during boot.
Created attachment 810072 [details]
# fgrep CONFIG_SCHEDSTATS /boot/config-*
/boot/config-3.12.0-0.rc3.git5.2.fc21.x86_64:# CONFIG_SCHEDSTATS is not set
/boot/config-3.12.0-0.rc4.git1.2.fc21.x86_64:# CONFIG_SCHEDSTATS is not set
/boot/config-3.12.0-0.rc6.git0.2.fc21.x86_64:# CONFIG_SCHEDSTATS is not set
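The same check as the fgrep above can be done programmatically. The following is a rough sketch, not an official tool: `config_option_state` is a hypothetical helper that takes the text of a kernel config file and reports how one option is set, relying only on the two line shapes visible in this thread (`OPTION=value` and `# OPTION is not set`).

```python
import re


def config_option_state(config_text, option):
    """Return the assigned value (e.g. 'y'), 'not set', or None if absent."""
    set_re = re.compile(r"^%s=(.*)$" % re.escape(option))
    unset_re = re.compile(r"^# %s is not set$" % re.escape(option))
    for line in config_text.splitlines():
        line = line.strip()
        match = set_re.match(line)
        if match:
            return match.group(1)
        if unset_re.match(line):
            return "not set"
    # The option is not mentioned in this config at all.
    return None
```

To check the running kernel, one could pass in the contents of `"/boot/config-" + platform.release()`, assuming the distribution installs its config under that name as Fedora does above.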
We switched to enabling that only on debug builds a while ago. It seems that was turned off entirely with the final 3.11.0 build and has remained off since. Internal testing shows the option has a non-trivial performance impact for context switches.
We can turn this on in debug kernels again, but I'm not sure it's worthwhile. Given that there are other debug options enabled which slow things down even more, all a bootchart would show there is slow things being slow. It isn't typically what someone wants to measure.
*** Bug 1026506 has been marked as a duplicate of this bug. ***
Josh P., can you elaborate a bit on some of the scheduling performance impacts you saw during your measurements?
In my tests I did a lot of context switches under various CPU loads. I saw a ~5-10% drop in average context switch speed when CONFIG_SCHEDSTATS was enabled. It varied depending on number of CPUs, CPU load, kernel version, and other kernel config options.
The performance hit only seemed to happen on post-CFS kernels (>= 2.6.23). The previous O(1) scheduler didn't seem to have this issue.
That's odd. I'll take a look. It shouldn't be that expensive.
*** Bug 1046021 has been marked as a duplicate of this bug. ***
So how do I, as a user, turn this feature on at runtime (or boot-time)?
I need this to try bootcharting the GNOME login as part of https://bugzilla.gnome.org/show_bug.cgi?id=645756
I don't believe it is something that can be enabled at runtime. You'd need to rebuild a kernel with the config option set.
Having also just been forced to compile a custom kernel, I consider this seriously retrograde. I happened to want latencytop, which I had used previously on Fedora.
What is the target audience of Fedora these days? The build created the perf tools, but we don't have a low-latency kernel or latencytop? Where's the logic?
Decisions by the developers are making it hard to stay committed to Fedora.
It's turned off because of the performance impact and, according to comment #9, under investigation.
Has anyone actually gotten around to investigating why the performance impact of this option is noticeable? Josh, Andi?
Tim took a look I believe. Unfortunately nothing conclusive. Maybe we ran the wrong workload.
It would be good to have some function traces from a workload that shows slowdowns with it on.
Can you guys reevaluate this for Fedora 21 workstation?
It's quite frustrating to not be able to provide the requested information to upstream GNOME because of this; GNOME users get to pay the price of eternally bad login performance because the upstream issue cannot get investigated without this profiling information.
This is just to note that the bug appears to still be there: booting with init=/usr/lib/systemd/systemd-bootchart leaves an empty svg file in /run/log. And yes, the "open /proc/schedstat: No such file or directory" message can be seen during boot.
Is there any chance that this will be fixed? My system boots extremely slowly for reasons unknown at this time, so a bootchart graph would help greatly.
It's not a bug, it's a choice that's been made to disable the config option. You can build a kernel with it set fairly easily if you need to.
This is a bug though:
$ latencytop
Failed to open /proc/latency_stats: No such file or directory
Please enable the CONFIG_LATENCYTOP configuration in your kernel.
That was bug 1046021 which was closed as a duplicate of this.
One can also argue that systemd-bootchart not working is also a bug (though I'm not inclined to do so).
Isn't it time that CONFIG_SCHEDSTATS was looked at again? And CONFIG_LATENCYTOP? (I notice that CONFIG_LATENCYTOP=y for RHEL7.)
Perhaps suggestions as to what to test for (and how) so that people can report back here about the impact (if any)?
(In reply to firstname.lastname@example.org from comment #20)
> This is a bug though:
> $ latencytop
> Failed to open /proc/latency_stats: No such file or directory
> Please enable the CONFIG_LATENCYTOP configuration in your kernel.
> That was bug 1046021 which was closed as a duplicate of this.
> One can also argue that systemd-bootchart not working is also a bug (though
> I'm not inclined to do so).
> Isn't it time that CONFIG_SCHEDSTATS was looked at again? And
> CONFIG_LATENCYTOP? (I notice that CONFIG_LATENCYTOP=y for RHEL7.)
I'm not aware of CONFIG_LATENCYTOP=y being set in the RHEL7 kernel. If it is, it would select SCHEDSTATS and that would be enabled as well. That would contradict the entire reasoning behind it being disabled in Fedora, given that it was found to cause the issues in RHEL.
Can you point me to which RHEL7 kernel RPM has LATENCYTOP enabled?
Josh, are you aware of any change on the RHEL side of things here?
(In reply to Josh Boyer from comment #21)
> Josh, are you aware of any change on the RHEL side of things here?
No. LATENCYTOP and SCHEDSTATS are (and always have been) both disabled on the RHEL7 production kernel.
They are however both (and always have been) enabled on the RHEL7 debug kernel.
Sorry, eyes not tracking properly, too many kernels installed:
# CONFIG_SCHEDSTATS is not set
# CONFIG_LATENCYTOP is not set
That's the current RHEL7 kernel. So far as I can tell, though, RHEL7 also ships latencytop. Does this mean that latencytop is only intended to work with the debug kernel? (In my experience, running the debug kernel to test for performance is not a good move, it affects performance too much.)
One question is whether CONFIG_SCHEDSTATS and CONFIG_LATENCYTOP should be enabled by default or not.
In my opinion, the bigger problem with CONFIG_SCHEDSTATS and CONFIG_LATENCYTOP being disabled is that the end result is an empty bootchart.svg. The end result should be some explanatory message in bootchart.svg, like "Please recompile and reinstall your kernel with THIS and THAT option enabled". Or this requirement should be documented in the manpage, wiki etc.
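To illustrate the kind of explanatory message suggested above, here is a hedged sketch of a pre-flight check. It is not how systemd-bootchart or latencytop actually behave; the names `REQUIRED`, `missing_kernel_options` and `explain` are invented for this example, and the file-to-option mapping is only what has been reported in this bug.

```python
import os

# /proc files mentioned in this bug and the kernel option each one depends on.
REQUIRED = {
    "schedstat": "CONFIG_SCHEDSTATS",
    "latency_stats": "CONFIG_LATENCYTOP",
}


def missing_kernel_options(proc_root="/proc"):
    """Return the options whose /proc file is absent under proc_root."""
    missing = []
    for name, option in sorted(REQUIRED.items()):
        if not os.path.exists(os.path.join(proc_root, name)):
            missing.append(option)
    return missing


def explain(missing):
    """Build a user-facing message, or an empty string if nothing is missing."""
    if not missing:
        return ""
    return ("Cannot collect scheduler data: the running kernel lacks %s. "
            "Please rebuild the kernel with these options enabled."
            % ", ".join(missing))
```

A tool could print `explain(missing_kernel_options())` (or embed it in the svg) instead of silently writing an empty file.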
I would like to revitalize this discussion around the change of disabling CONFIG_SCHEDSTATS.
I am a performance consultant, working on all sorts of commercial application performance issues. The _complete_ metrics under /proc/<PID>/task/<TID>/sched are, in my opinion, invaluable.
One can immediately check for the severity of a CPU bottleneck, estimate IO waiting or the likelihood of priority inversion.
Other distributions still do provide these metrics, like e.g. SLES 11/12.
Could you please explain where you saw the performance impact when having CONFIG_SCHEDSTATS activated?
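For readers who have not used the per-thread files mentioned above, this is a small sketch of reading /proc/<PID>/task/<TID>/sched into a dictionary. `parse_sched` is a hypothetical helper. It assumes the file consists of `name : number` lines after a header, and the field names in the sample are illustrative only, since they vary between kernel versions and configs.

```python
def parse_sched(text):
    """Parse 'key : value' lines of a sched dump into {key: float}."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.rpartition(":")
        if not sep:
            continue  # separator lines carry no colon
        try:
            metrics[key.strip()] = float(value.strip())
        except ValueError:
            continue  # header line such as "name (pid, #threads: n)"
    return metrics


# Illustrative sample; real field names depend on the kernel.
SAMPLE = """cs (1234, #threads: 1)
-------------------------------------------------------------------
se.sum_exec_runtime                          :          4980.123456
se.statistics.wait_sum                       :           120.500000
se.statistics.iowait_sum                     :            30.250000
nr_switches                                  :                 4200
"""
```

Comparing a wait-time field against the execution time is one way to get the rough "how CPU-starved is this thread" estimate described above, provided the kernel exposes those fields.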
Created attachment 1066951 [details]
Try the attached (crude) microbenchmark and run like this:
perf stat -e cs ./cs 5 100
That will spawn 100 threads which call sched_yield() in a tight loop for 5 seconds. I think perf will generally report fewer context switches when CONFIG_SCHEDSTATS is enabled.
That said, I'm not really convinced that this microbenchmark corresponds to a sane real world usage scenario.
Also, given the number of people who have complained about latencytop being disabled and systemd-bootchart being broken, it might not be worth the tradeoff.
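For those who cannot build the attachment, the idea of the microbenchmark can be approximated in Python. This is not the attached program, only a rough analogue: `yield_storm` is an invented name, and because of the interpreter's global lock the absolute numbers are not comparable with the C version. It may still be useful for comparing the same script on two kernels.

```python
import os
import resource
import threading
import time


def yield_storm(seconds, nthreads):
    """Run nthreads threads calling sched_yield() in a loop for `seconds`.

    Returns (total sched_yield calls, context switches of this process).
    """
    deadline = time.monotonic() + seconds
    counts = [0] * nthreads

    def worker(index):
        calls = 0
        while time.monotonic() < deadline:
            os.sched_yield()
            calls += 1
        counts[index] = calls

    before = resource.getrusage(resource.RUSAGE_SELF)
    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(nthreads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    after = resource.getrusage(resource.RUSAGE_SELF)

    # Sum voluntary and involuntary context switches over the run.
    switches = ((after.ru_nvcsw - before.ru_nvcsw)
                + (after.ru_nivcsw - before.ru_nivcsw))
    return sum(counts), switches
```

Calling `yield_storm(5, 100)` mirrors the `./cs 5 100` invocation above in spirit; the same caveat about real-world relevance applies.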
Created attachment 1066952 [details]
FWIW we did some tests and couldn't measure a difference with CONFIG_SCHEDSTATS. Also, in theory the code shouldn't have much impact. Unless the impact can be measured in something a bit more macro, it would be good to consider re-enabling it, as it's very useful.
There's the concept of performance you're losing by not having the right tools to improve performance. And latencytop and other SCHEDSTATS based tools have a lot of potential here.
Thanks for these positive thoughts and feedback.
So how can we convince the decision makers at RedHat re-enabling CONFIG_SCHEDSTATS again?
As I am new to this whole process, I would like to mention that I am looking for RHEL 7.x onwards. Not Fedora. Is this still the correct place to beg? Or would I need to file another ER?
(In reply to Jan Schreiber from comment #29)
> Thanks for these positive thoughts and feedback.
> So how can we convince the decision makers at RedHat re-enabling
> CONFIG_SCHEDSTATS again?
We can enable it in Fedora whenever.
> As I am new to this whole process, I would like to mention that I am looking
> for RHEL 7.x onwards. Not Fedora. Is this still the correct place to beg? Or
> would I need to file another ER?
You would need to file a bug against the RHEL7 kernel making that request. I personally have no insight into how likely it will be granted.
Just filed https://bugzilla.redhat.com/show_bug.cgi?id=1256961.
Feel free to post your comments there, if your concern is the issue described above under RHEL 7.x.
Thanks for revitalising this Jan - I had given up in frustration. Couldn't believe it still wasn't enabled in F22.
For the devs, please consider this a vote for the change ASAP. Nice to see some potential for light at the end of the tunnel.
I've enabled the options in f23 and rawhide. The rc8-git1 kernel will have them set.
We'll look at the stable releases later.
Thank you Josh, that just dropped on the rawhide nodebug repo. _much_ better.
kernel-4.2.0-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-14782
kernel-4.2.0-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-14782
kernel-4.2.0-1.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.