Bug 1054473

Summary: RFE: Backport CGroups optimizations
Product: [Fedora] Fedora
Reporter: David Strauss <david>
Component: systemd
Assignee: systemd-maint
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 20
CC: johannbg, lnykryn, msekleta, plautrba, systemd-maint, vpavlin, zbyszek
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2014-12-09 02:43:17 UTC
Type: Bug

Description David Strauss 2014-01-16 21:57:20 UTC
Upstream patch: http://cgit.freedesktop.org/systemd/systemd/commit/src/core/cgroup.c?id=6414b7c981378a6eef480f6806d7cbfc98ca22a1

Description of problem:
The number of units has an exponential effect on how long systemd takes to apply cgroup settings at boot and on daemon-reload/daemon-reexec. This can make a system that booted fine under Fedora 19/systemd v204 take hours to boot under Fedora 20/systemd v208. Even disabled units are affected.

Version-Release number of selected component (if applicable):
Any systemd v208

How reproducible:
Every boot.

Steps to Reproduce:
1. Create a bunch of units (hundreds to thousands) that use a proportional (weight-based) cgroup setting, such as CPUShares=.
2. Boot.
3. Wait a really long time.
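Step 1 can be scripted. The sketch below writes a given number of service files that each set CPUShares=. The `write_units` helper, the `cgtest-` name prefix, the `sleep infinity` command and the share value 512 are all hypothetical choices for illustration, not taken from the report. Per the report, the units do not need to be enabled to trigger the slowdown.

```python
from pathlib import Path

# Hypothetical prefix for the generated test units.
UNIT_PREFIX = "cgtest"


def write_units(directory: Path, count: int, cpu_shares: int = 512) -> list[Path]:
    """Write `count` trivial service units that each set CPUShares=.

    Returns the paths of the files written, in creation order.
    """
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        path = directory / f"{UNIT_PREFIX}-{i:04d}.service"
        path.write_text(
            "[Unit]\n"
            f"Description=cgroup scaling test unit {i}\n"
            "\n"
            "[Service]\n"
            # A long-running placeholder process so the unit stays active.
            "ExecStart=/bin/sleep infinity\n"
            # Weight-based (proportional) setting for the cpu controller.
            f"CPUShares={cpu_shares}\n"
            "\n"
            "[Install]\n"
            "WantedBy=multi-user.target\n"
        )
        paths.append(path)
    return paths
```

For example, running `write_units(Path("/etc/systemd/system"), 1000)` as root writes 1,000 such units; rebooting (or running `systemctl daemon-reload`, which the report says is affected too) then exercises the slow path.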

Actual results:
System takes a really, really long time to reach login.

Expected results:
System takes an amount of time roughly proportional to the number of units it starts.
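A toy cost model makes the gap between the actual and expected results concrete. It is not systemd's code path: it assumes, purely for illustration, that applying one unit's weight-based settings also re-applies them to every unit sharing its slice, and compares that with applying each unit's settings exactly once. Both function names are hypothetical. Even this quadratic pattern grows far faster than the linear cost the report expects.

```python
def writes_with_sibling_requeue(units_in_slice: int) -> int:
    """Hypothetical: each unit start re-applies settings to all N units in its slice."""
    writes = 0
    for _ in range(units_in_slice):  # one iteration per unit started
        writes += units_in_slice     # the unit itself plus its N-1 siblings
    return writes


def writes_once_per_unit(units_in_slice: int) -> int:
    """Hypothetical: each unit's settings are applied exactly once."""
    return units_in_slice


for n in (100, 1000, 4000):
    print(n, writes_with_sibling_requeue(n), writes_once_per_unit(n))
# 100 10000 100
# 1000 1000000 1000
# 4000 16000000 4000
```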

Additional info:
This patch has been running in production at Pantheon for over a month. We can't boot our boxes without it. It applies cleanly to Fedora's systemd package.

Comment 1 Lennart Poettering 2014-02-23 15:55:00 UTC
If this is backported, the more recent fixes from 209 should be backported too; they make sure cgroup membership can be removed again if a cgroup setting is reset.

That said, I am not convinced that this is material to backport, and would suggest leaving this for F21.

Comment 2 David Strauss 2014-02-28 19:10:58 UTC
> If this is backported, the more recent fixes from 209 should be backported too; they make sure cgroup membership can be removed again if a cgroup setting is reset.

Agreed.

> That said, I am not convinced that this is material to backport, and would suggest leaving this for F21.

We're currently running both the cgroup performance patch and an interim one for is-enabled performance on our production systems. While I have someone working on an upstream-friendly fix for the latter right now, it may be some time before that's done. As long as we have to patch the RPM at least once, it's about the same effort for us.

In short, I'm fine leaving this un-backported.

Comment 3 Zbigniew Jędrzejewski-Szmek 2014-03-03 13:28:16 UTC
If you have backported patches that work, it'd be great to add them to the v208-stable branch and release them in F20, since F21 is another 6 months away.

Comment 4 David Strauss 2014-03-03 18:26:24 UTC
I know my patch applies cleanly to v208 and is stable in production:
http://cgit.freedesktop.org/systemd/systemd/commit/src/core/cgroup.c?id=6414b7c981378a6eef480f6806d7cbfc98ca22a1

I haven't tried Lennart's follow-up fixes to core/cgroup.c for removing controllers, but they will probably also apply and run fine.

Comment 5 Lennart Poettering 2014-12-09 02:43:17 UTC
I figure we won't backport these anymore. This is too complex, and F20 is too old. Sorry.