Bug 628004

Summary: Doesn't work on old kernels lacking /sys/fs/cgroup
Product: [Fedora] Fedora
Reporter: W. Michael Petullo <mike>
Component: systemd
Assignee: Lennart Poettering <lpoetter>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: rawhide
CC: jaminjay+rhbz, lpoetter, mattdm, metherid, mingo, misek, mschmidt, notting, plautrba, pzijlstr, sangu.fedora
Keywords: Reopened
Hardware: All
OS: Linux
Fixed In Version: systemd-25-1.fc15
Doc Type: Bug Fix
Last Closed: 2011-05-01 03:23:39 UTC
Attachments:
  systemd cgroups fix (flags: none)
  semi-serious patch: kernel cgroups fix! (flags: none)

Description W. Michael Petullo 2010-08-27 16:18:06 UTC
Description of problem:
After updating to F14 (development version), systemd tries to mount /sys/fs/cgroup but fails with "Failed to mount /sys/fs/cgroup: No such file or directory." The boot fails at this point.

Version-Release number of selected component (if applicable):
systemd-8-2.fc14.x86_64
kernel-2.6.36-0.0.rc0.git1.fc15.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. Update to F14 as of 25 AUG 10
2. Boot the system
  
Actual results:
See above.

Expected results:
System should boot.

Additional info:
Booting with init=/sbin/upstart works.

Once I boot using upstart, I can confirm that /sys/fs/cgroup does not exist. I cannot create /sys/fs/cgroup manually either:
# mkdir /sys/fs/cgroup
mkdir: cannot create directory `/sys/fs/cgroup': No such file or directory

Comment 1 Matthew Miller 2010-08-27 17:48:30 UTC
You shouldn't create that directory; the kernel does. But that commit is:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ab69bcd66fb4be64edfc767365cb9eb084961246

And I think that's later than the kernel you're running, which isn't the latest in rawhide. And with the fc15 tag, is that really in F14?

Updating to the latest rawhide kernel should fix you up.

Lennart, should we put in a requires > line for the kernel? (Ugh.)

Comment 2 Vaclav "sHINOBI" Misek 2010-08-27 18:43:50 UTC
Hmm, too bad for me, as I use the latest FC 13 kernel because of the proprietary nvidia drivers. They currently can't be compiled against the debugging-enabled kernels used in FC 14 :-(. Is there any workaround?

Comment 3 Matthew Miller 2010-08-27 18:53:20 UTC
1. Use nouveau.
2. Recompile your old kernel with the patch from http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=676db4af043014e852f67ba0349dae0071bd11f3 .
3. Recompile the new kernel with whatever kludges you need to do to make nvidia happy.

Comment 4 Matthew Miller 2010-08-27 18:53:46 UTC
4. Complain to nvidia.

Comment 5 Vaclav "sHINOBI" Misek 2010-08-27 19:17:09 UTC
Yes, yes. I know nvidia drivers are unsupported and should be avoided. Unfortunately Fermi chip leaves me no other viable option.

The point is that some users will run systemd on different kernels without the above-mentioned patch, so it would be nice if this could be fixed in systemd.

Comment 6 Matthew Miller 2010-08-27 19:21:48 UTC
In absence of a systemd fix, 2 and 3 are still viable options. And honestly, 4 is too.

Comment 7 Bill Nottingham 2010-08-27 19:41:30 UTC
systemd already Requires: a newer kernel. However, that requirement is based on the F14 2.6.35 version numbering, and the older 2.6.36 prerelease kernels compare as 'newer' than that.
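To illustrate why a prerelease kernel satisfied the requirement, here is a simplified sketch of numeric version comparison in C. It is not RPM's actual rpmvercmp() algorithm, which also handles alphabetic segments, tildes and the release field; this version treats each run of digits as one segment and ignores everything else. The minimum version "2.6.35.2" used in the usage note is an assumption taken from comment #9; the exact value in the systemd spec file is not given in this report.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Skip anything that is not a digit; each run of digits is a segment. */
static const char *skip_non_digits(const char *s)
{
    while (*s != '\0' && !isdigit((unsigned char)*s))
        s++;
    return s;
}

/* Compare two version strings numerically, segment by segment.
 * Returns a negative value, zero or a positive value, like strcmp().
 * A missing segment counts as 0, so "2.6.35" equals "2.6.35.0".
 * Simplified: letters are skipped rather than compared. */
int version_cmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;;) {
        a = skip_non_digits(a);
        b = skip_non_digits(b);
        if (*a == '\0' && *b == '\0')
            return 0;
        char *end_a, *end_b;
        /* strtoul() yields 0 for an exhausted string. */
        unsigned long seg_a = strtoul(a, &end_a, 10);
        unsigned long seg_b = strtoul(b, &end_b, 10);
        if (seg_a != seg_b)
            return seg_a < seg_b ? -1 : 1;
        a = end_a;
        b = end_b;
    }
}
```

With this comparison, version_cmp("2.6.36", "2.6.35.2") is positive, so a versioned Requires: on the kernel cannot tell a 2.6.36 snapshot that predates the /sys/fs/cgroup commit apart from a 2.6.35.x kernel that works.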

Comment 8 Matthew Miller 2010-08-27 19:56:12 UTC
Oh awesome. I ♥ RPM.

I assume the F14 2.6.35.2 kernel has the patch backported? In that case, this is basically either a) not a bug or b) a feature request for systemd to fall back to mounting the cgroup filesystem in some other (emergency?) location.

Comment 9 W. Michael Petullo 2010-08-27 20:42:40 UTC
"Downgrading" to kernel-2.6.35.2-9.fc14.x86_64 from kernel-2.6.36-0.0.rc0.git1.fc15.x86_64 fixed the problem. In response to comment #1, yes, I am testing some packages from F15. That 2.6.36-rc0 would not suffice when 2.6.35.2 worked kind of threw me.

I agree with comment #8 that systemd should have some kind of fallback if things fail in this (or a similar) manner. Eventually, init=/sbin/upstart will not suffice.

Comment 10 W. Michael Petullo 2010-08-27 20:44:19 UTC
Of course, init=/bin/sh will work, so perhaps I would reword my previous comment.

Comment 11 Benjamin Hahne 2010-09-01 13:52:47 UTC
This bug occurred after running preupgrade from F13 to F14 Alpha. I believe it was caused by menu.lst not being set to boot the new kernel. I had to fix the configuration manually to get a working system, after #fedora-qa helped me identify the kernel mismatch.

Comment 12 Lennart Poettering 2010-09-14 23:31:58 UTC
I guess I should just be honest: I have no plans to support systemd on older kernel versions. Sorry.

Comment 14 Ingo Molnar 2011-03-13 15:32:09 UTC
Created attachment 484007
systemd cgroups fix


Right now systemd hangs during early bootup when a !cgroups kernel
is booted. This behavior is pretty harmful not just because it
breaks compatibility with kernel .config's that booted just
fine before, but also because:

 - it makes it impossible to boot older kernels

 - it makes it impossible to bisect kernel bugs back into days
   when cgroups support was not available - or into commits where
   cgroups support is broken

 - it makes it impossible to boot randconfig or allnoconfig kernels.

The bootup critical path must be as robust and permissive as possible.

This patch fixes the cgroups problem by:

 - probing for cgroups support more gently
 - making the cgroup abstractions behave as if the controller were empty

The cgroups data structures are left alone to keep things simple
and easily maintainable.

I have tested this on latest rawhide, with a cgroups enabled
custom-built-kernel, with a cgroups-disabled custom built kernel
and with the stock Rawhide kernel as well.

Signed-off-by: Ingo Molnar <mingo>
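The attachment itself is not quoted in this report. As a rough sketch of what "probing for cgroups support more gently" could mean, the C code below checks whether the running kernel lists the cgroup filesystem type in /proc/filesystems, and returns false instead of aborting when it cannot tell. The function names fs_listed() and kernel_has_cgroup_fs() are hypothetical; this is not the code from attachment 484007.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `fstype` appears as a filesystem name in text laid out
 * like /proc/filesystems: one entry per line, an optional "nodev" flag,
 * a tab, then the filesystem name. */
bool fs_listed(const char *text, const char *fstype)
{
    assert(text != NULL && fstype != NULL);
    size_t n = strlen(fstype);
    for (const char *line = text; *line != '\0';) {
        const char *eol = strchr(line, '\n');
        if (eol == NULL)
            eol = line + strlen(line);
        /* The name is whatever follows the last tab on the line. */
        const char *name = line;
        for (const char *p = line; p < eol; p++)
            if (*p == '\t')
                name = p + 1;
        if ((size_t)(eol - name) == n && strncmp(name, fstype, n) == 0)
            return true;
        line = (*eol == '\n') ? eol + 1 : eol;
    }
    return false;
}

/* Probe the running kernel. Any failure counts as "no cgroups",
 * so the caller can degrade instead of stopping the boot. */
bool kernel_has_cgroup_fs(void)
{
    char buf[8192];   /* /proc/filesystems is normally far smaller */
    FILE *f = fopen("/proc/filesystems", "r");
    if (f == NULL)
        return false;
    size_t len = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[len] = '\0';
    return fs_listed(buf, "cgroup");
}
```

A kernel can list cgroup here and still lack the /sys/fs/cgroup mount point, which appears to be the situation in the original report (comment #1), so a real probe would likely also check that the directory exists before mounting.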

Comment 15 Bill Nottingham 2011-03-14 14:59:55 UTC
That might get you a systemd system that boots. Unless I'm reading it wrong, it's not going to give you one that works in a reliably correct manner.

Comment 16 Ingo Molnar 2011-03-14 18:34:17 UTC
I've seen no problems so far with a fairly complete Rawhide install. What kind of reliability problems would you expect?

Booting is a critical path: the kernel will not spuriously panic the system just because there's no keyboard, for example; we try hard to issue warnings and continue whenever possible. It is absolutely vital for the boot critical path to be permissive and resilient.

Comment 17 Bill Nottingham 2011-03-14 18:41:53 UTC
The default method for killing services is via cgroups; if it just stubs out and succeeds (as your patch adds), I wouldn't expect stopping of services to work right at all.

Comment 18 Ingo Molnar 2011-03-14 19:35:23 UTC
Ah, "systemctl kill"?

It was introduced in v12, but Google only finds 177 references to "systemctl kill" while "systemctl stop" has 10 times as many references, so I suspect kill is a very rarely used feature. (I only found documentation references to it.)

To solve this i can modify the patch to display a "cannot kill tasks on cgroup-less kernels" message, or i can add fallback logic to send a signal to the process group of the service. Which one would you prefer?

Thanks.
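The process-group fallback offered above could look roughly like the C sketch below. The helper name kill_process_group() is hypothetical, and it assumes the service's main process was made a process-group leader (via setsid() or setpgid()) when it was started, so that its PID doubles as the group ID.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback for cgroup-less kernels: send `sig` to every
 * process in the service's process group instead of its cgroup.
 * Assumes the service's main process was made a group leader at spawn
 * time, so its PID is the PGID.
 * Returns 0 on success, -1 with errno set on failure. */
int kill_process_group(pid_t pgid, int sig)
{
    assert(sig >= 0);   /* 0 only checks that the group exists */
    /* kill(0, sig) targets the caller's own group and kill(-1, sig)
     * every process the caller may signal; refuse both, and never
     * signal the caller's own group either. */
    if (pgid <= 1 || pgid == getpgrp()) {
        errno = EINVAL;
        return -1;
    }
    /* A negative PID makes kill() target the whole process group. */
    return kill(-pgid, sig);
}
```

The weak spot is the one behind comments #17 and #19: a daemon that forks and calls setsid() moves into a new process group, so a group-wide kill() no longer reaches it, whereas a cgroup keeps tracking the process however it detaches.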

Comment 19 Lennart Poettering 2011-03-15 02:23:06 UTC
Well, we use the cgroups for a couple of things: we use it when shutting down a service to kill all its processes. We use it to determine the main process of a SysV service (enumerate through the cgroup and find the parent process of the others, if that is possible). We use it to figure out when a SysV service with no declared "main" process exits. We use it to implement "systemctl kill". We use it to implement the cgroup tree in "systemctl status". We use it to implement "systemd-cgls". We use it to do user session management, where your processes are killed when you log out. We use it for lifecycle management of XDG_RUNTIME_DIR. We use it to safely kill user processes before going down. Binaries and shell scripts use the equivalent of "test -d /sys/fs/cgroup/systemd" to detect a systemd boot. We use it to do "systemctl status $PID". We use it for this, and that and everything else.

Sure if you disable cgroups you can make the basics of systemd work, but the question is: do we want to support that? I am pretty sure I don't. I am not interested in all the bug reports that have to do with people having disabled cgroups in their kernel.

I am not entirely sure what the big deal is anyway. You cannot boot up a F14 system up without hotplug or devtmpfs or anything either. The cgroup stuff is just one option more.

If you need something minimal to test, then use init=/bin/sh. If you want something more powerful, then use systemd, but I am not sure I want to care for those cases where people want systemd but are unwilling to switch one more kernel option for that. We don't support systems without CONFIG_HOTPLUG either...

Sorry if that is disappointing.

Comment 20 Peter Zijlstra 2011-03-15 10:09:56 UTC
Yes you can boot without HOTPLUG and devtmpfs crap, just pre-populate a static /dev. The trouble is that getting a distro to boot a bare bzImage is getting harder and harder. Anaconda actively discourages !LVM setups (crashes half the time I try) (Linus complained about this LVM crap default too).

Having this LVM crap on by default mandates you use initrd nonsense, so FAIL.

Then there's the lack of /dev, so you have to boot with some distro kernel, bind mount /dev from under a tmpfs mount, cp device entries over, and then you're mostly set again.

Then you find that if you try to boot a localyesconfig half the init scripts fall flat on their face because modules are missing (they're built in) and refuse to start services like nfsd etc. FAIL, so you go hack initscripts to just start already.

Most times after an upgrade serial console stops working too, so then you have to go carry around computers/monitors/keyboard etc. to get shit working again, another FAIL.

But now, lo and behold, init won't even begin to work now, progress they call it, FAIL. /bin/sh isn't an option because then network,sshd etc. don't get started. FAIL.

Guys, please get a grip, this is total crap. Kernel devs don't run distro configs (for the very simple reason that building those takes forever -- >5 minutes), your distro is actively getting worse for non-default configs.

As it stands my last few fedora machines are going to get switched to another distro when its upgrade time, because I've utterly had it (and I haven't even talked about the utter POS called yum and release upgrades).

Comment 21 Ingo Molnar 2011-03-15 10:53:13 UTC
 "I am not entirely sure what the big deal is anyway. You cannot boot up a F14
  system up without hotplug or devtmpfs or anything either. The cgroup stuff is
  just one option more."

Unfortunately you seem to have no idea what you are talking about.

Firstly, your claim is factually false, i regularly boot !devtmpfs kernels:

 [root@aldebaran ~]# uname -a
 Linux aldebaran 2.6.38-tip+ #9 SMP Tue Mar 15 11:55:18 CET 2011 x86_64 x86_64  x86_64 GNU/Linux
 [root@aldebaran ~]# rpm -q fedora-release
 fedora-release-16-0.1.noarch
 [root@aldebaran ~]# zgrep DEVTMPFS /proc/config.gz 
 # CONFIG_DEVTMPFS is not set

(That's on one of my testboxes, updated to latest Rawhide and with a patched systemd to fix the boot hang i reported here. It is very simple to boot !devtmpfs kernels.)

Secondly, your claim about CONFIG_HOTPLUG is highly misleading: while hotplug is indeed required, hotplug is an infinitely more important kernel feature than cgroups will ever become: you cannot even disable HOTPLUG in the kernel config without going through some serious steps such as disabling various other kernel features and enabling CONFIG_EXPERT first.

The key guiding principle is that any restrictions on the Linux boot path should be added very, very conservatively. If at all possible the bootup logic must be permissive, modular and neutral, not restrictive, monolithic and opinionated. (Go ask Linus or Andrew Morton if you need a different upstream authority than my word.)

The kernel follows that principle very, very strongly: the kernel will not crash on bootup if there is no keyboard (or monitor) plugged in, and it will try to WARN_ON() when it finds a problem - not BUG_ON(). Linus regularly flames developers who add a bogus boot-stopper aspect to the kernel.

systemd, by extension, must follow the same principle. There are no ifs and buts about it.

Your excuses about 'systemctl kill' functionality are also misleading: i repeat, my testsystem with patched systemd is working fine and i am not using "systemctl kill" nor do i want to - the regular "systemctl stop" path works just fine. (I also offered to print a message if it's used on a !cgroups kernel - it's similar to what other tools do if some particular kernel feature is not available.)

You are adding additional, arbitrary restrictions on what will boot and what will not under Fedora and this is actively harmful to those who try to develop, maintain and test kernel features using Fedora.

I am a Red Hat employed upstream kernel maintainer and i am in the somewhat grotesque situation currently that i am unable to boot latest Fedora Rawhide with about half of the kernel configs i get from testers - and after refusing to fix it yourself you are now refusing to take this rather simple fix and you keep coming up with factually false reasons not to apply it. Why?

Are you suggesting that i should migrate over to another Linux distribution, because you feel that it's not important that kernel maintainers be able to test Fedora Rawhide? I need to be able to test generic kernels, as the scheduler maintainer i need to be able to test !cgroups kernels just as much as i am testing cgroups kernels.

What you are doing here is not merely "disappointing", it is unfortunately also very harmful to Red Hat's upstream kernel efforts, so i cannot ignore this.

If you do not have the time and experience to care about a generic, modular, resilient Fedora boot path i can understand that - please declare if it's so and then i can co-maintain those bits for you. You do not *have to* do this, just do not stand in the way of others who are willing to do it.

Thanks.

Comment 22 Michal Schmidt 2011-03-15 12:44:42 UTC
(In reply to comment #21)
> my testsystem with patched systemd is working fine and i am not using
> "systemctl kill" nor do i want to - the regular "systemctl stop" path works
> just fine. (I also offered to print a message if it's used on a !cgroups kernel
> - it's similar to what other tools do if some particular kernel feature is not
> available.)

Lennart,
if systemd with crippled functionality is sufficient for Ingo and others, I don't see why we shouldn't allow it. I understand your worry about useless bug reports, but as long as a clear warning is put into the log, few of these cases will be filed in BZ, and those that are filed anyway should be easy to spot and close as invalid.
Perhaps to make it very obvious systemctl could repeat the warning every time it is run by the user.

Comment 23 Michal Schmidt 2011-03-15 13:05:07 UTC
(In reply to comment #20)
> Anaconda actively discourages !LVM setups (crashes half the time I try)

LVM is merely a default. Non-LVM setups are supposed to work. Many users have no problems with it. If you're seeing bugs in anaconda, please file them in Bugzilla. I can assure you there are no intentional bugs to discourage non-LVM installations.
By the way, Fedora will probably abandon LVM by default in the future.

> Then you find that if you try to boot a localyesconfig half the init scripts
> fall flat on their face because modules are missing (they're built in) and
> refuse to start services like nfsd etc. FAIL, so you go hack initscripts to
> just start already.

If you hit a bug like this and fix the initscript for yourself, why don't you file a bug with the diff in order to have it fixed for others? I don't see why maintainers would object to having the scripts work in both modular and built-in configs.

> Most times after an upgrade serial console stops working too, so then you have
> to go carry around computers/monitors/keyboard etc. to get shit working again,
> another FAIL.

This too is certainly not intentional and should be treated as a bug.

> (and I haven't even talked about the utter POS called yum and release
> upgrades).

FWIW, though yum is far from perfect, yum upgrades have been working fine for me for several releases.

If you want to discuss specific bugs, file them in Bugzilla under appropriate components. 
If you want to discuss the general direction of Fedora, try the devel@ mailing list.
Feel free to reply to me in private.
I apologize to others for replying to off-topic rants.

Comment 24 Bill Nottingham 2011-03-15 21:17:56 UTC
Created attachment 485606
semi-serious patch: kernel cgroups fix!

(In reply to comment #21)
> Your excuses about 'systemctl kill' functionality is also misleading: i repeat,
> my testsystem with patched systemd is working fine

That really depends on what features you're using, and expecting to work. Some examples:
Normal system:

# systemctl status sendmail.service
sendmail.service - LSB: start and stop sendmail
	  Loaded: loaded (/etc/rc.d/init.d/sendmail)
	  Active: active (exited) since Tue, 15 Mar 2011 12:29:36 -0400; 4h 25min ago
	 Process: 24476 ExecReload=/etc/rc.d/init.d/sendmail reload (code=exited, status=0/SUCCESS)
	Main PID: 17073 (code=exited, status=0/SUCCESS)
	  CGroup: name=systemd:/system/sendmail.service
		  └ 24487 sendmail: accepting connections

Patched system:

# systemctl status sendmail.service
sendmail.service - LSB: start and stop sendmail
	  Loaded: loaded (/etc/rc.d/init.d/sendmail)
	  Active: active (exited) since Tue, 15 Mar 2011 16:52:00 -0400; 43s ago
	 Process: 676 ExecStart=/etc/rc.d/init.d/sendmail start (code=exited, status=0/SUCCESS)
	  CGroup: name=systemd:(null)/sendmail.service

No pid, no process information, etc.

Again on a patched system:

[root@localhost ~]# systemctl start acpid.service
[root@localhost ~]# ps axf | grep acpid
 1321 ?        Ss     0:00 /usr/sbin/acpid
[root@localhost ~]# systemctl status acpid.service
acpid.service - ACPI Event Daemon
	  Loaded: loaded (/lib/systemd/system/acpid.service)
	  Active: inactive (dead) since Tue, 15 Mar 2011 16:58:45 -0400; 8s ago
	 Process: 1320 ExecStart=/usr/sbin/acpid $OPTIONS (code=exited, status=0/SUCCESS)
	  CGroup: name=systemd:(null)/acpid.service

Here it's not even that it's missing the pid... it's claiming it's not running when it is.

So, while we could patch systemd to limp along in unsupported mode in the absence of cgroups, claiming that it's 'working fine' in that case with the patch seems patently false.

There's the alternative method, of course, in the patch attached here - make cgroups & devtmpfs mandatory in the same way as other things. Given the amount of stuff that defaults to yes hidden behind expert that has nothing to do with boot (AIO? kallsyms?), it seems like something that's at least worth some discussion.

Comment 25 Ingo Molnar 2011-03-18 13:32:27 UTC
  "So, while we could patch systemd to limp along in unsupported mode in the
   absence of cgroups, claiming that it's 'working fine' in that case with the
   patch seems patently false."

It's still working fine for me on several boxes i use for kernel testing so what's your point? (And no, i did not claim it's working fine in general - obviously cgroups functionality will not work - i have even described that.)

You have also not replied to my big picture arguments *at all*.

I am running non-cgroups or older kernels regularly, and i have to, because i develop the kernel. We work hard on the kernel side to allow old userspace to still boot fine. I expect the extended arm of the kernel boot mechanism, systemd, to be similarly resilient when it boots up older (or just differently configured) kernels. Like it was for 15 years before it was broken ...

If i do not build NFS functionality into the kernel i do not expect 'service nfs start' to work. If i do not build futex support into the kernel i do not expect 'yum update' (or other, futex-dependent functionality) to work.

If i do not build cgroups support into the kernel, and i boot a kernel where systemd warns me during startup that cgroups are not there, i do not expect cgroups functionality to work.

But i very much do not want 'service nfs start' or 'yum update' to intentionally prevent my box from booting - and i do not want systemd to prevent my box from booting either.

Please include this small trivial patch to make it possible for me (and other upstream developers, some of whom commented in this bugzilla entry) to continue to be able to develop the kernel under Fedora. Please also try to understand the principles we described here as they are important - so that we are not in a similar situation a few months down the line.

I can make any modification to it you ask to make it more palatable to you. It will not affect the Fedora build and installation of systemd, since Fedora boots with a cgroups-enabled kernel.

Thanks.

Comment 26 Bill Nottingham 2011-03-18 17:40:58 UTC
(In reply to comment #25)
>   "So, while we could patch systemd to limp along in unsupported mode in the
>    absence of cgroups, claiming that it's 'working fine' in that case with the
>    patch seems patently false."
> 
> It's still working fine for me on several boxes i use for kernel testing so
> what's your point? (And no, i did not claim it's working fine in general -
> obviously cgroups functionality will not work - i have even described that.)

The point is that it's not a useful way to run it. cgroups is core functionality of systemd, not an option. You can claim it's "working fine for you", but when core functionality doesn't work (i.e., proper tracking of services), that's a pretty large stretch.

You're essentially asking a program to allow users to run it in a crippled mode where it's known not to work properly for users/admins, where it's not going to be tested, and where it's not going to be supported anyway. In the grand scheme of things, it's pretty dumb to spend time coding for that case - it could be added, but it's not going to be regularly tested, it's not going to be maintained, and so on.

I'm not saying it can't be added; just that adding these sorts of code paths isn't really good practice.

> I am running non-cgroups or older kernels regularly, and i have to, because i
> develop the kernel. We work hard on the kernel side to allow old userspace to
> still boot fine. I expect the extended arm of the kernel boot mechanism,
> systemd, to be similarly resilient when it boots up older (or just differently
> configured) kernels. Like it was for 15 years before it was broken ...

Fedora already requires 2.6.32 or later to boot, completely independent of systemd. I haven't seen any righteous indignation about that.

> If i do not build NFS functionality into the kernel i do not expect 'service
> nfs start' to work. If i do not build futex support into the kernel i do not
> expect 'yum update' (or other, futex-dependent functionality) to work.

If you disable futex, libc breaks. That's why the kernel doesn't let you disable it unless you configure it in 'I want to shoot myself' mode, because the commonly accepted userspace requires it as minimum functionality.

All I'm saying is that if the commonly accepted userspace requires cgroups, then perhaps the kernel should adjust for that feature as well in its configury. (Perhaps not now, but 6 months down the road...)

Comment 27 Lennart Poettering 2011-03-18 17:57:33 UTC
So, in the interest of peace I will modify systemd to ignore cgroup errors, as Ingo requested. However, it will print a big big warning if you do this, and will sleep for 5s so that people see it. And I will ignore all bugs reported when things don't work properly if this is used, for example when the system cannot shut down anymore, or if the systemctl status output is all borked. I will mercilessly close all bug reports that whine about this. It is your own fault if you run things this way, and I have no plans to support this hack the tiniest bit beyond making systemd not terminate when it stumbles over a kernel with missing cgroup support.
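The behaviour described here amounts to a small decision point during early boot. The C sketch below is hypothetical: the function name and warning text are illustrative, and it is not the code from the commit linked in comment #28. It takes the outcome of cgroup setup as input, warns loudly on failure, pauses so the message can be read, and tells the caller to carry on without cgroup-based features.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical degraded-mode check: `cgroup_setup_ok` is the outcome
 * of setting up the cgroup hierarchy earlier in boot.
 * Returns true if cgroup-based features may be used. */
bool cgroups_usable_or_warn(bool cgroup_setup_ok, unsigned int warn_delay_s)
{
    if (cgroup_setup_ok)
        return true;

    fputs("WARNING: failed to set up cgroups; this kernel probably lacks\n"
          "cgroup support. Continuing in an UNSUPPORTED mode: stopping\n"
          "services, shutdown and 'systemctl status' may not work.\n",
          stderr);
    /* Pause so the warning stays on screen; 5s is proposed above. */
    if (warn_delay_s > 0)
        sleep(warn_delay_s);
    return false;
}
```

A caller would pass 5 as the delay and, when the function returns false, skip the cgroup-dependent paths (service tracking, the cgroup tree in "systemctl status", and so on) rather than terminating.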

Comment 28 Lennart Poettering 2011-04-12 21:12:11 UTC
So, here you go:

http://cgit.freedesktop.org/systemd/patch/?id=e5a53dc74636ffa9de639733a0bef65f967c9ffa

Very lightly tested, but should do what is needed. Will enter F15 eventually, too.

Comment 29 Fedora Update System 2011-04-21 02:00:30 UTC
systemd-25-1.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/systemd-25-1.fc15

Comment 30 Fedora Update System 2011-04-21 03:01:55 UTC
Package systemd-25-1.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-25-1.fc15'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/systemd-25-1.fc15
then log in and leave karma (feedback).

Comment 31 Michal Schmidt 2011-04-22 11:17:34 UTC
systemd-25 did not quite fix it. I have posted a patch which makes it work for me:
http://lists.freedesktop.org/archives/systemd-devel/2011-April/002065.html

Comment 32 Lennart Poettering 2011-04-27 02:17:53 UTC
fixed in git.

Comment 33 Fedora Update System 2011-05-01 03:22:55 UTC
systemd-25-1.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.