Bug 707205 - systemd mounts cgroups
Summary: systemd mounts cgroups
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 15
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Lennart Poettering
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-05-24 11:51 UTC by Dhaval Giani
Modified: 2011-05-25 14:57 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-05-25 14:57:52 UTC
Type: ---



Description Dhaval Giani 2011-05-24 11:51:07 UTC
Systemd should not mount cgroup hierarchies it has no business with. This currently breaks cgconfig, and also makes it hard to set up system configurations that systemd does not expect. (IOW, first the user needs to unmount the other subsystems, and then use cgconfigparser.)

Comment 1 Michal Schmidt 2011-05-24 12:05:50 UTC
Could you please show us the specific lines of /proc/mounts which you believe should not be there?

Comment 2 Dhaval Giani 2011-05-24 12:16:03 UTC
Honestly?

grep cgroup /proc/mounts

should get you all of it.

Comment 3 Jan Safranek 2011-05-24 12:26:24 UTC
Well, I think /sys/fs/cgroup/systemd is fine, but the rest of /sys/fs/cgroup/ is problematic.

Comment 4 Michal Schmidt 2011-05-24 12:35:58 UTC
I get 4 lines:

The first two of them are essential to systemd. I hope these do not cause any trouble:
tmpfs /sys/fs/cgroup tmpfs rw,seclabel,nosuid,nodev,noexec,relatime,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd 0 0

The other two are causing the problem, right?

cgroup /sys/fs/cgroup/ns cgroup rw,nosuid,nodev,noexec,relatime,ns 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0

Comment 5 Dhaval Giani 2011-05-24 12:41:58 UTC
Right. You get two more lines because the cgconfig service fails at startup and, as its fallback mode, removes the subsystems mentioned in its configuration. Disable cgconfig from starting at bootup, and you will get something more interesting.

Comment 6 Michal Schmidt 2011-05-24 12:48:00 UTC
I see. systemd mounts all controllers it can find in /proc/cgroups during initialization (in the function mount_cgroup_controllers()).
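The discovery step described here can be sketched roughly as follows (a Python illustration of the logic only, not systemd's actual C implementation; the sample /proc/cgroups content is made up but follows the kernel's field layout):

```python
# Each non-comment line of /proc/cgroups names a controller and, in the
# last field, says whether it is enabled in the running kernel.
SAMPLE = """\
#subsys_name\thierarchy\tnum_cgroups\tenabled
cpu\t2\t1\t1
cpuacct\t3\t1\t1
memory\t4\t1\t1
blkio\t5\t1\t0
"""

def enabled_controllers(text):
    """Return the controllers an init system would mount, one hierarchy
    each, under /sys/fs/cgroup/<name>."""
    controllers = []
    for line in text.splitlines():
        if line.startswith("#"):  # skip the header line
            continue
        name, _hierarchy, _num_cgroups, enabled = line.split()
        if enabled == "1":
            controllers.append(name)
    return controllers

print(enabled_controllers(SAMPLE))  # ['cpu', 'cpuacct', 'memory']
```

On a real system one would read `open("/proc/cgroups").read()` instead of the sample string.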

Comment 7 Bill Nottingham 2011-05-24 15:31:40 UTC
(In reply to comment #5)
> right. you get two more lines because the cgconfig service fails at startup,
> and as its fallback mode, removes the subsystems mentioned in its
> configuration. disable cgconfig from starting up at bootup, and you will get
> something more interesting.

That seems pretty broken - cgconfig shouldn't be removing things it didn't mount.

Comment 8 Dhaval Giani 2011-05-24 16:09:22 UTC
> That seems pretty broken - cgconfig shouldn't be removing things it didn't
> mount.

Sure, but this happens because the user wants cgconfig to mount those things, and cgconfig tries to do so. We really do not want systemd to be mounting things by hardcoding them.

Comment 9 Bill Nottingham 2011-05-24 17:07:34 UTC
So, systemd mounts all kernel cgroups by default under /sys/fs/cgroup/<name>.

cgconfig, by default, mounts all kernel cgroups under /sys/fs/cgroup/<name>.

Frankly, if cgconfig can't handle the case where they're already mounted where it wants them mounted, I don't see how that's a systemd issue - that's just bad implementation.

Note that systemd mounts them for a reason - they're exposed as an API for units to use if they need to or want to. See 'ControlGroup' in systemd.exec(5). By mounting them in early boot, it makes sure that service is available for any unit that needs it, and that can be easily configured by that unit in its own definition.

The argument seems to be that rather than have them available as a system resource as early as possible for whatever might need them, mounted in a simple and quick fashion, as a standard namespace, you'd prefer to have this only available after some userspace service runs, giving the user full control to shoot themselves by deciding that the cpu controller lives under /sys/fs/cgroup/memory and the memory controller lives under /sys/fs/cgroup/cpu.

Why is that an API we should expose to the system in that way? What benefits does that give you over just having them always mounted under the standard names, and leaving cgconfig.conf for more esoteric combinations that the admin can configure?

Comment 10 Dhaval Giani 2011-05-24 17:26:23 UTC
(In reply to comment #9)
> So, systemd mounts all kernel cgroups by default under /sys/fs/cgroup/<name>.
> 
> cgconfig, by default, mounts all kernel cgroups under /sys/fs/cgroup/<name>.
> 
> Frankly, if cgconfig can't handle the case where they're already mounted where
> it wants them mounted, I don't see how that's a systemd issue - that's just bad
> implementation.
> 
> Note that systemd mounts them for a reason - they're exposed as an API for
> units to use if they need to or want to. See 'ControlGroup' in systemd.exec(5).
> By mounting them in early boot, it makes sure that service is available for any
> unit that needs it, and that can be easily configured by that unit in its own
> definition.
>

How many units today are using cgroups which are used for resource management?
 
> The argument seems to be that rather than have them available as a system
> resource as early as possible for whatever might need them, mounted in a simple
> and quick fashion, as a standard namespace, you'd prefer to have this only
> available after some userspace service runs, giving the user full control to
> shoot themselves by deciding that the cpu controller lives under
> /sys/fs/cgroup/memory and the memory controller lives under /sys/fs/cgroup/cpu.
> 

I do not agree. The problem here is that systemd has no clue how the cgroups are going to be used; there are enough use cases where the systemd technique works, but where different setups are much preferable. systemd has absolutely no business looking at resource management, since it has no clue what applications want.

> Why is that an API we should expose to the system in that way? What benefits
> does that give you over just having them always mounted under the standard
> names, and leaving cgconfig.conf for more esoteric combinations that the admin
> can configure?

Right, so unfortunately by having systemd do what it is doing (for no additional benefit), you increase complexity elsewhere. I do not see any benefit in systemd mounting cgroups as it sees fit. systemd's requirements of cgroups are limited to only its named subsystem. I know Lennart had plans of mounting it together with cpu, but there are enough issues as it is over there, and afaik he dropped those plans.

Finally, what is the benefit of systemd doing things it is not designed to do? systemd is an init replacement, a session manager. It is not a resource manager and therefore has no business mucking around there.

Comment 11 Bill Nottingham 2011-05-24 19:34:47 UTC
(In reply to comment #10)
> > Note that systemd mounts them for a reason - they're exposed as an API for
> > units to use if they need to or want to. See 'ControlGroup' in systemd.exec(5).
> > By mounting them in early boot, it makes sure that service is available for any
> > unit that needs it, and that can be easily configured by that unit in its own
> > definition.
> >
> 
> How many units today are using cgroups which are used for resource management?

Roughly, just as many that are using the libcgroup ABI.

> > The argument seems to be that rather than have them available as a system
> > resource as early as possible for whatever might need them, mounted in a simple
> > and quick fashion, as a standard namespace, you'd prefer to have this only
> > available after some userspace service runs, giving the user full control to
> > shoot themselves by deciding that the cpu controller lives under
> > /sys/fs/cgroup/memory and the memory controller lives under /sys/fs/cgroup/cpu.
> > 
> 
> I do not seem to agree with it. The problem here is that systemd has no clue
> about how the cgroups are going to be used, there are enough use cases where
> the systemd technique can be used, but it is much preferable to have different
> setups. 

Why? You never explain this, you just state this.

> > Why is that an API we should expose to the system in that way? What benefits
> > does that give you over just having them always mounted under the standard
> > names, and leaving cgconfig.conf for more esoteric combinations that the admin
> > can configure?
> 
> Right, so unfortunately by having systemd do what it is doing (for no
> additional benefit), you increase complexity elsewhere.

If you're complaining about 'increasing complexity' meaning 'having to handle the case where the filesystem is already mounted', I'm not sure we can have a reasonable discussion on this.

What other complexity are you referring to ?

> I do not see any
> benefit in systemd mounting cgroups as it sees fit.

It makes sure they're available in a standard location for services that need them. Without this, every service that an admin might want to put in a group needs to implicitly have a dependency on a service that mounts the cgroups, which seems like overkill when the init process can handle it, just like it handles making sure /proc, /sys, or similar resources are available.

> Finally, what is the benefit of systemd doing things it is not designed to do.
> systemd is an init replacement, a session manager. It is not a resource manager
> and therefore has no business mucking around there.

By virtue of managing daemons, it certainly is a resource manager - it controls what uid/gid they run as, it manages rlimits (as it should); managing what control group a daemon goes into is a logical extension of that. It certainly makes more sense as an interface to have in systemd than to have a separate service that reclassifies daemons into control groups.

Closing as NOTABUG; if you've got specific use cases that this breaks, I would like to know what they are.

Comment 12 Dhaval Giani 2011-05-24 19:47:22 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > > Note that systemd mounts them for a reason - they're exposed as an API for
> > > units to use if they need to or want to. See 'ControlGroup' in systemd.exec(5).
> > > By mounting them in early boot, it makes sure that service is available for any
> > > unit that needs it, and that can be easily configured by that unit in its own
> > > definition.
> > >
> > 
> > How many units today are using cgroups which are used for resource management?
> 
> Roughly, just as many that are using the libcgroup ABI.
> 
> > > The argument seems to be that rather than have them available as a system
> > > resource as early as possible for whatever might need them, mounted in a simple
> > > and quick fashion, as a standard namespace, you'd prefer to have this only
> > > available after some userspace service runs, giving the user full control to
> > > shoot themselves by deciding that the cpu controller lives under
> > > /sys/fs/cgroup/memory and the memory controller lives under /sys/fs/cgroup/cpu.
> > > 
> > 
> > I do not seem to agree with it. The problem here is that systemd has no clue
> > about how the cgroups are going to be used, there are enough use cases where
> > the systemd technique can be used, but it is much preferable to have different
> > setups. 
> 
> Why? You never explain this, you just state this.
> 
> > > Why is that an API we should expose to the system in that way? What benefits
> > > does that give you over just having them always mounted under the standard
> > > names, and leaving cgconfig.conf for more esoteric combinations that the admin
> > > can configure?
> > 
> > Right, so unfortunately by having systemd do what it is doing (for no
> > additional benefit), you increase complexity elsewhere.
> 
> If you're complaining about 'increasing complexity' meaning 'having to handle
> the case where the filesystem is already mounted', I'm not sure we can have a
> reasonable discussion on this.
> 
> What other complexity are you referring to ?
>

The fact that anyone else who does not want to configure cgroups the way systemd decides to has to go through a lot of pain, especially *when* most use cases still do not use cgroups. Yes, when there are a lot of users it makes sense, but right now it does not. Remember, premature optimization is never good.
 
> > I do not see any
> > benefit in systemd mounting cgroups as it sees fit.
> 
> It makes sure they're available in a standard location for services that need
> them. Without this, every service that an admin might want to put in a group
> needs to implicitly have a dependency on a service that mounts the cgroups,
> which seems like overkill when the init process can handle it, just like it
> handles making sure /proc, /sys, or similar resources are available.
> 
> > Finally, what is the benefit of systemd doing things it is not designed to do.
> > systemd is an init replacement, a session manager. It is not a resource manager
> > and therefore has no business mucking around there.
> 
> By virtue of managing daemons, it certainly is a resource manager - it controls
> what uid/gid they run as, it manages rlimits (as it should); managing what
> control group a daemon goes into is a logical extension of that. It certainly
> makes more sense as an interface to have in systemd than to have a separate
> service that reclassifies daemons into control groups.
> 

Until systemd actually does that, it has no business doing what it is doing.

> Closing as NOTABUG; if you've got specific usage cases that this breaks, I
> would like to know what they are.

I am sorry, I don't quite agree with your assessment, so I am leaving it open until systemd is fixed.

Comment 13 Dhaval Giani 2011-05-24 20:20:00 UTC
In order to explain this bug better (as mjg59 wants):

Please disable cgconfig, and try doing

mkdir -p /sys/fs/cgroup/all
mount -t cgroup -o memory,cpu,cpuacct none /sys/fs/cgroup/all

If you give me the capability to do this with systemd, I shall accept I am wrong. But seriously, stop attacking me and fix the software.

Comment 14 Bill Nottingham 2011-05-24 20:50:28 UTC
(In reply to comment #12)
> The fact that anyone else who does not want to configure cgroups the way
> systemd decides to has to go through a lot of pain, especially *when* most use
> cases still do not use cgroups. Yes, when there are a lot of users, it makes
> sense, but right now, it does not. Remember, premature optimization is never
> good.

This pain (by all appearances) boils down to the fact that you can't mount a subsystem in multiple places. What is the reason for this limitation? (I'm assuming it's an implementation choice, but it appears arbitrary.) Supporting one hierarchy managed by cpu & blkio, and one managed by cpu & memory, seems like a logical case to support.

(This is leaving aside that brief testing on F-15 shows that mounting a hierarchy with multiple controllers is currently broken.)
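The limitation at issue is the kernel rule that a controller can be bound to at most one hierarchy at a time, which is exactly what forbids a cpu & blkio hierarchy coexisting with a cpu & memory one. A toy validity check over a proposed layout (a sketch of the constraint only, not kernel code; the mountpoints are illustrative):

```python
def check_layout(hierarchies):
    """Reject any layout that binds one controller to two hierarchies,
    mirroring the kernel rule under discussion."""
    bound = {}  # controller name -> mountpoint it is already bound to
    for mountpoint, controllers in hierarchies.items():
        for c in controllers:
            if c in bound:
                raise ValueError(
                    "controller %r already bound to %s" % (c, bound[c]))
            bound[c] = mountpoint
    return True

# One controller per hierarchy (systemd's layout): always valid.
check_layout({"/sys/fs/cgroup/cpu": ["cpu"],
              "/sys/fs/cgroup/memory": ["memory"]})

# cpu bound to two hierarchies: rejected.
try:
    check_layout({"/a": ["cpu", "blkio"], "/b": ["cpu", "memory"]})
except ValueError as err:
    print(err)  # controller 'cpu' already bound to /a
```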

> > By virtue of managing daemons, it certainly is a resource manager - it controls
> > what uid/gid they run as, it manages rlimits (as it should); managing what
> > control group a daemon goes into is a logical extension of that. It certainly
> > makes more sense as an interface to have in systemd than to have a separate
> > service that reclassifies daemons into control groups.
> > 
> 
> Until systemd actually does that, it has no business doing what it is doing.

Not sure what you mean by this - it already does uid, gid, rlimits, and assignment to a cgroup.

Comment 15 Dhaval Giani 2011-05-24 20:59:11 UTC
(In reply to comment #14)
> (In reply to comment #12)
> > The fact that anoyone else who does not want to configure cgroups the way
> > systemd decides to has to go through a lot of pain, especially *when* most use
> > cases still do not use cgroups. Yes, when there are a lot of users, it makes
> > sense, but right now, it does not. Remember, premature optimization is never
> > good.
> 
> This pain (by all appearances) boils down to the fact that you can't mount a
> subsystem in multiple places. What is the reason for this limitation? (I'm
> assuming it's an implementation choice, but it appears arbitrary.) Supporting
> one hierarchy managed by cpu & blkio, and one managed by cpu & memory, seems
> like a logical case to support.
> 
> (This is leaving aside that a brief testing on F-15 shows mounting a hierarchy
> with multiple controllers is currently broken.)
> 

Please do give me a design where a task can be part of groupA in cpu,memory and groupB in cpu,blkio. Unless, of course, you are talking about cpu,memory,blkio mounted together.

> 
> Not sure what you mean by this - it already does uid, gid, rlimits, and
> assignment to a cgroup.

Assignment based on some heuristic that systemd thinks is useful is not really useful for a lot of other users.

Comment 16 Lennart Poettering 2011-05-25 14:57:52 UTC
I am not sure I understand what this bug is about. We mount all the hierarchies at boot that are enabled in the kernel, and the reason for that is that we want them to be available with no subsystem-specific hacks.

Mounting cgroup hierarchies together makes them useless for things like systemd, because then creating cgroups in them often does not work unless you initialize all the parameters. Hence we mount them separately.
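A toy illustration of this point, assuming (hypothetically) that each controller carries a set of parameters that must be initialized before a new group in its hierarchy is usable; cpuset genuinely requires cpuset.cpus and cpuset.mems to be set before tasks can join a group, while the empty sets below are placeholders:

```python
# Hypothetical per-controller requirements; only the cpuset entry
# reflects real kernel behavior, the others are placeholders.
REQUIRED = {
    "cpuset": ["cpuset.cpus", "cpuset.mems"],
    "cpu": [],
    "memory": [],
}

def params_to_initialize(controllers):
    """Parameters every new group in a hierarchy bound to these
    controllers would have to initialize before it is usable."""
    params = []
    for c in controllers:
        params.extend(REQUIRED[c])
    return params

# Separate hierarchy: a group under a cpu-only mount needs nothing extra.
print(params_to_initialize(["cpu"]))            # []
# Co-mounted cpu+cpuset: every group creation must also set cpuset values.
print(params_to_initialize(["cpu", "cpuset"]))  # ['cpuset.cpus', 'cpuset.mems']
```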

Also, Dhaval, you are working on libcgroup which hides whether things are mounted together or not, and is able to create groups in multiple trees simultaneously without user interaction.

