Bug 1746355 - Error starting daemon: Devices cgroup isn't mounted
Summary: Error starting daemon: Devices cgroup isn't mounted
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: moby-engine
Version: 31
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Olivier Lemasle
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: RejectedBlocker
Duplicates: 1751636 1757078
Depends On:
Blocks:
Reported: 2019-08-28 09:17 UTC by Lukas Slebodnik
Modified: 2019-12-01 20:01 UTC
CC List: 36 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-29 02:00:51 UTC



Description Lukas Slebodnik 2019-08-28 09:17:55 UTC
Description of problem:
The default cgroup hierarchy is now unified (cgroups v2) (#1732114, see https://fedoraproject.org/wiki/Changes/CGroupsV2),
so moby-engine (docker.service) does not work on F31 by default.
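A quick way to confirm which hierarchy a machine booted with is to inspect the mount table. The function below is a minimal sketch that assumes systemd mounts cgroups under /sys/fs/cgroup (the Fedora default); the name `cgroup_mode` and the three labels it prints are illustrative, not part of any Fedora tool.

```shell
# Minimal sketch: classify the cgroup layout from a mount table.
# Assumption: systemd mounts cgroups under /sys/fs/cgroup (Fedora default).
#   unified -> /sys/fs/cgroup itself is cgroup2 (the Fedora 31 default)
#   hybrid  -> v1 controllers plus a cgroup2 mount at /sys/fs/cgroup/unified
#   legacy  -> anything else (v1 controllers only)
# A one-line alternative: `stat -fc %T /sys/fs/cgroup` prints cgroup2fs on unified.
cgroup_mode() {
    mounts=${1:-/proc/self/mounts}   # mount table to read; overridable for testing
    awk '
        $2 == "/sys/fs/cgroup" && $3 == "cgroup2"         { unified = 1 }
        $2 == "/sys/fs/cgroup/unified" && $3 == "cgroup2" { hybrid = 1 }
        END {
            if (unified)     print "unified"
            else if (hybrid) print "hybrid"
            else             print "legacy"
        }
    ' "$mounts"
}

# Usage: cgroup_mode
```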


Version-Release number of selected component (if applicable):
sh$ rpm -q moby-engine systemd
moby-engine-18.09.8-2.ce.git0dd43dd.fc31.x86_64
systemd-243~rc2-1.fc31.x86_64

How reproducible:
Deterministic

Steps to Reproduce:
1. Boot a minimal machine with systemd-243~rc2-1.fc31.x86_64 or newer
2. dnf install -y moby-engine
3. systemctl start docker.service

Actual results:

sh# systemctl start docker.service
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.

sh# systemctl status docker.service | cat
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2019-08-28 11:16:10 CEST; 4s ago
     Docs: https://docs.docker.com
  Process: 21555 ExecStart=/usr/bin/dockerd --host=fd:// --exec-opt native.cgroupdriver=systemd $OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 21555 (code=exited, status=1/FAILURE)
      CPU: 192ms

Aug 28 11:16:10 kvm-01-guest06.lab.eng.brq.redhat.com systemd[1]: docker.service: Service RestartSec=100ms expired, scheduling restart.
Aug 28 11:16:10 kvm-01-guest06.lab.eng.brq.redhat.com systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Aug 28 11:16:10 kvm-01-guest06.lab.eng.brq.redhat.com systemd[1]: Stopped Docker Application Container Engine.
Aug 28 11:16:10 kvm-01-guest06.lab.eng.brq.redhat.com systemd[1]: docker.service: Start request repeated too quickly.
Aug 28 11:16:10 kvm-01-guest06.lab.eng.brq.redhat.com systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 28 11:16:10 kvm-01-guest06.lab.eng.brq.redhat.com systemd[1]: Failed to start Docker Application Container Engine.

Expected results:
The service docker.service is running without any problem


Additional info:
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.246674880+02:00" level=info msg=serving... address=/var/run/docker/containerd/containerd-debug.sock
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.246765661+02:00" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock.ttrpc
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.246850456+02:00" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.246908728+02:00" level=info msg="containerd successfully booted in 0.005902s"
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.250414430+02:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc00090edb0, READY" module=grpc
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.257215489+02:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.257326949+02:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.257423104+02:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.257490048+02:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.259201027+02:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.268368337+02:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}]" module=grpc
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.268525439+02:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.268608193+02:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc000652c70, CONNECTING" module=grpc
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.268975701+02:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc000652c70, READY" module=grpc
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.269076346+02:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}]" module=grpc
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.269121236+02:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.269184880+02:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc000652f40, CONNECTING" module=grpc
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.269537140+02:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc000652f40, READY" module=grpc
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.342301674+02:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.342516945+02:00" level=warning msg="Your kernel does not support cgroup memory limit"
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.342566592+02:00" level=warning msg="Unable to find cpu cgroup in mounts"
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.342612586+02:00" level=warning msg="Unable to find blkio cgroup in mounts"
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.342657229+02:00" level=warning msg="Unable to find cpuset cgroup in mounts"
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.342712706+02:00" level=warning msg="mountpoint for pids not found"
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.343018181+02:00" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.343162904+02:00" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.345078327+02:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc000652f40, TRANSIENT_FAILURE" module=grpc
Aug 28 11:16:09 host.example.com dockerd[21555]: time="2019-08-28T11:16:09.345181054+02:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc000652f40, CONNECTING" module=grpc
Aug 28 11:16:10 host.example.com dockerd[21555]: Error starting daemon: Devices cgroup isn't mounted
Aug 28 11:16:10 host.example.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=docker comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Aug 28 11:16:10 host.example.com systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 28 11:16:10 host.example.com systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 28 11:16:10 host.example.com systemd[1]: Failed to start Docker Application Container Engine.
Aug 28 11:16:10 host.example.com systemd[1]: docker.service: Service RestartSec=100ms expired, scheduling restart.
Aug 28 11:16:10 host.example.com systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Aug 28 11:16:10 host.example.com systemd[1]: Stopped Docker Application Container Engine.
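The fatal line in the log is "Error starting daemon: Devices cgroup isn't mounted", which indicates the daemon expects a cgroup v1 hierarchy carrying the devices controller; no such mount exists under the unified hierarchy. The sketch below checks for that mount. `devices_cgroup_mount` is a hypothetical helper that only inspects the mount table, so it is a diagnostic aid, not a reproduction of dockerd's own check.

```shell
# Hypothetical helper: print the mount point of the cgroup v1 "devices"
# controller and succeed, or fail silently if there is none (as on a
# unified-only system).
devices_cgroup_mount() {
    mounts=${1:-/proc/self/mounts}   # mount table to read; overridable for testing
    # Field 3 is the filesystem type and field 4 the comma-separated options;
    # v1 controllers show up as options, e.g. "rw,nosuid,...,devices".
    awk '$3 == "cgroup" && ("," $4 ",") ~ /,devices,/ { print $2; found = 1; exit }
         END { exit !found }' "$mounts"
}

# Usage: devices_cgroup_mount || echo "no v1 devices cgroup mounted"
```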

Comment 1 Lukas Slebodnik 2019-08-28 09:20:02 UTC
Workaround: add the kernel command-line option systemd.unified_cgroup_hierarchy=0
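On a regular (non-ostree) Fedora install, one way to apply this option to every installed kernel is grubby. A minimal sketch, assuming the commands run as root; the wrapper function names are hypothetical and exist only to keep the apply and undo steps together.

```shell
# Kernel argument that makes systemd mount the legacy (v1) cgroup hierarchy.
CGROUP_V1_ARG="systemd.unified_cgroup_hierarchy=0"

# Hypothetical wrapper: add the argument to every installed kernel entry.
enable_cgroup_v1() {
    grubby --update-kernel=ALL --args="$CGROUP_V1_ARG"
}

# Hypothetical wrapper: remove it again, e.g. once docker supports cgroups v2.
disable_cgroup_v1() {
    grubby --update-kernel=ALL --remove-args="$CGROUP_V1_ARG"
}

# Usage (as root): enable_cgroup_v1 && systemctl reboot
```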

Comment 2 nicolasoliver03 2019-09-13 22:21:35 UTC
I am having the same problem in Fedora IoT 31.
The workaround posted by Lukas also works for me (rpm-ostree kargs --editor, add systemd.unified_cgroup_hierarchy=0, and systemctl reboot)
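On rpm-ostree based systems such as Fedora IoT, the same argument can also be added without the interactive editor. A minimal sketch, assuming it runs as root; the wrapper names are hypothetical, while `rpm-ostree kargs --append`/`--delete` are the tool's own options. The change lands in a new deployment and takes effect after the reboot.

```shell
# Kernel argument that makes systemd mount the legacy (v1) cgroup hierarchy.
CGROUP_V1_ARG="systemd.unified_cgroup_hierarchy=0"

# Hypothetical wrapper: append the argument to the next deployment's kernel
# command line without opening an editor.
ostree_enable_cgroup_v1() {
    rpm-ostree kargs --append="$CGROUP_V1_ARG"
}

# Hypothetical wrapper: drop it again later.
ostree_disable_cgroup_v1() {
    rpm-ostree kargs --delete="$CGROUP_V1_ARG"
}

# Usage (as root): ostree_enable_cgroup_v1 && systemctl reboot
```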

Comment 3 Lukas Slebodnik 2019-09-16 15:23:45 UTC
*** Bug 1751636 has been marked as a duplicate of this bug. ***

Comment 4 Fedora Blocker Bugs Application 2019-10-14 21:30:33 UTC
Proposed as a Blocker for 31-final by Fedora user leonid224 using the blocker tracking app because:

 Docker (moby-engine) is a major component with a lot of users. The workaround in the bug proposes switching systemd to use cgroups1 instead of cgroups2. I suspect that cgroups1, while well-tested by the virtue of being used in Fedora for many releases, isn't well-tested specifically with Fedora 31, where cgroups2 is the default and many things might implicitly rely on cgroups2.

Comment 5 Lukas Slebodnik 2019-10-14 22:34:16 UTC
(In reply to Fedora Blocker Bugs Application from comment #4)
> Proposed as a Blocker for 31-final by Fedora user leonid224 using the
> blocker tracking app because:
> 
>  Docker (moby-engine) is a major component with a lot of users. The
> workaround in the bug proposes switching systemd to use cgroups1 instead of
> cgroups2. I suspect that cgroups1, while well-tested by the virtue of being
> used in Fedora for many releases, isn't well-tested specifically with Fedora
> 31, where cgroups2 is the default and many things might implicitly rely on
> cgroups2.

I test cgroups v1 daily with moby-engine on Fedora 31,
and not just with moby-engine but also with podman. Life is much more stable with cgroups v1.

Comment 6 Zbigniew Jędrzejewski-Szmek 2019-10-15 06:38:10 UTC
-1 for blocker.

This is unfortunate, but docker is a package with troubled upstream. This issue could have
been handled on the docker side any time during the last ... 5 years (I think that as of
kernel 3.16 from August 2014 the cgroupsv2 api was more or less finalized). We cannot block
or delay Fedora based on the hope that this will happen next week.

The number of users who need docker is a small fraction of Fedora users.

Comment 7 Sam 2019-10-15 11:20:05 UTC
Without Docker how can Fedora serve as a development platform for Kubernetes deployments?

Not supporting Docker seems to remove Fedora from much of the cloud work being done by OpenShift and will set back its adoption significantly in a space that is still gathering interest and momentum.

Comment 8 Pablo Iranzo Gómez 2019-10-15 11:24:44 UTC
(In reply to Sam from comment #7)
> Without Docker how can Fedora serve as a development platform for Kubernetes
> deployments?
> 
> Not supporting Docker seems to remove Fedora from much of the cloud work
> being done by OpenShift and will set back its adoption significantly in a
> space that is still gathering interest and momentum.

In the meantime, you can use podman to run the containers as a workaround (as I do)

Comment 9 Zbigniew Jędrzejewski-Szmek 2019-10-15 12:05:53 UTC
> Without Docker how can Fedora serve as a development platform for Kubernetes deployments?

Let's not get overly dramatic. You can either a) use one of the other implementations, podman,
etc, or b) simply set the kernel option. Having to set a kernel option is not the end of the world.
Running with cgroups v1 is still supported, just not the default.

Comment 10 Sam 2019-10-15 12:40:00 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #9)
> > Without Docker how can Fedora serve as a development platform for Kubernetes deployments?
> 
> a) use one of the other implementations, podman,

The problem is not that Fedora cannot run containers, the problem is that a development environment for many Kubernetes installations needs to run `docker build` with a Dockerfile. If a developer can't build their deployment artifact on their machine, they will be very unlikely to use Fedora. For good or bad, Kubernetes most often means Docker.

> Running with cgroups v1 is still supported, just not the default.

I appreciate this. The engineers I work with will not use Fedora if they must touch kernel arguments.

Comment 11 Sam 2019-10-15 12:42:54 UTC
For others reading, `podman build` does support Dockerfiles. I wasn't clear in my previous post. You _can_ build images, just not with the technology that may finally end up running them.

Comment 12 Lukas Slebodnik 2019-10-15 21:28:38 UTC
BTW moby-engine (docker) can use an OCI runtime that supports cgroups v2, but that is not enough because the docker daemon itself still expects cgroups v1.

(In reply to Sam from comment #10)
> > Running with cgroups v1 is still supported, just not the default.
> 
> I appreciate this. The engineers I work with will not use Fedora if they
> must touch kernel arguments.

Adding "systemd.unified_cgroup_hierarchy=0" to the GRUB_CMDLINE_LINUX option in /etc/sysconfig/grub
is trivial. And they can still use Fedora 30 for moby-engine if they do not want to touch kernel arguments.
Maybe moby-engine upstream will solve it in the meantime.
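The GRUB_CMDLINE_LINUX edit can be scripted. A minimal sketch, assuming GNU sed (its `--follow-symlinks` option matters because on Fedora /etc/sysconfig/grub is normally a symlink to /etc/default/grub) and a line written as GRUB_CMDLINE_LINUX="..."; `add_cgroup_v1_arg` is a hypothetical helper. The setting only takes effect after the GRUB configuration is regenerated and the machine is rebooted.

```shell
# Hypothetical helper: prepend the cgroup v1 argument to GRUB_CMDLINE_LINUX.
# Assumes GNU sed and a line of the form GRUB_CMDLINE_LINUX="...".
add_cgroup_v1_arg() {
    grub_defaults=${1:-/etc/sysconfig/grub}
    arg="systemd.unified_cgroup_hierarchy=0"
    # Leave the file alone if the argument is already there (idempotent).
    grep -q "$arg" "$grub_defaults" && return 0
    # "&" re-inserts the matched GRUB_CMDLINE_LINUX=" prefix; --follow-symlinks
    # edits the symlink target instead of replacing the symlink with a copy.
    sed -i --follow-symlinks "s/^GRUB_CMDLINE_LINUX=\"/&$arg /" "$grub_defaults"
}

# Usage (as root), then regenerate the config and reboot. The output path is an
# assumption: /boot/grub2/grub.cfg on BIOS, /boot/efi/EFI/fedora/grub.cfg on UEFI.
#   add_cgroup_v1_arg && grub2-mkconfig -o /boot/grub2/grub.cfg && systemctl reboot
```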

Comment 13 Adam Williamson 2019-10-16 01:04:22 UTC
Yeah, I think I'm -1 on this. Per the criteria we don't block on anything container-y, and if we were going to, it'd likely be podman, not docker.

Comment 14 Brian 'redbeard' Harrington 2019-10-16 01:41:58 UTC
-1

This sounds like errata to be documented given the scope of effect (only users who chose to use Docker for containerization) and numerous workarounds.

Comment 15 František Zatloukal 2019-10-16 14:40:54 UTC
-1 Blocker

Comment 16 Adam Williamson 2019-10-16 15:06:25 UTC
That's -4, so rejecting.

Comment 17 Adam Williamson 2019-10-29 02:00:51 UTC

*** This bug has been marked as a duplicate of bug 1757078 ***

Comment 18 Lukas Slebodnik 2019-10-31 21:42:04 UTC
*** Bug 1757078 has been marked as a duplicate of this bug. ***

Comment 19 Alexander von Gluck IV 2019-11-09 15:23:33 UTC
A quick bit of commentary late to the game.

I'd personally prefer to use podman for building my containers, however podman images are *not* compatible with docker 19.03 at the moment due to the following bug:
https://github.com/moby/moby/issues/39727

Pretty much:
  * Build Dockerfile with podman, no issues
  * Push image to hub.docker.com, no issues
  * Pull image to docker 19.03.x system to deploy:
    * Error response from daemon: mediaType in manifest should be 'application/vnd.docker.distribution.manifest.v2+json' not ''

That kind of sucks... so on Fedora 31 I can't generate container images compatible with docker without the cgroup v1 hack.
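The empty mediaType in that error suggests podman pushed an OCI-format manifest, which the docker daemon then rejected. One possible mitigation, assuming a podman version whose `build` accepts `--format docker` and whose `push` accepts `--format v2s2`, is to ask for Docker-format images and Docker v2 schema 2 manifests instead of OCI. This is a sketch, not a verified fix for the linked issue; `push_docker_format` and the image name are hypothetical.

```shell
# Hypothetical helper: build the current directory's Dockerfile and push it
# using Docker image/manifest formats rather than OCI.
# The argument is a placeholder such as docker.io/<user>/<repo>:<tag>.
push_docker_format() {
    image=$1
    podman build --format docker -t "$image" . &&
        podman push --format v2s2 "$image"
}

# Usage: push_docker_format docker.io/<user>/<repo>:<tag>
```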

Comment 20 Adam Williamson 2019-11-12 23:55:45 UTC
"That kind of sucks... so on Fedora 31 I can't generate container images compatible with docker without the cgroup v1 hack."

I mean, to be clear, it's not a "hack". It's a configuration option. It is an entirely supported one that we expect people to use, and that's why it's there: we know some people will need cgroups v1, for legitimate reasons. You don't need to worry that you're doing something hacky or temporary or potentially broken or anything, just because you're picking this configuration option.

Comment 21 Lukas Slebodnik 2019-11-13 21:19:30 UTC
BTW, is there an upstream issue for moby and cgroups v2?

Comment 22 Sam 2019-11-15 06:42:32 UTC
Re Adam Williamson, it is true kernel arguments are not "hacks," but they are never addressed during system upgrades. In 5 months when I'm installing Fedora 32, I expect I will not be notified if cgroups v1 is required or not, nor that I can move to v2 when the Docker installation understands it. My kernel options may just be kicked to an rpmnew file and I'll be back to this ticket again.

At this very moment I'm staring at the SBT docker plugin's error message that it can't build a docker image now that I've aliased podman to docker. I don't want a dormant configuration surprise in my GRUB configuration that will break my system 5 months from now when I've forgotten about cgroup namespacing.

I love that Fedora is often very cutting-edge, but breaking Docker is a significant every-day problem for me. Podman isn't sufficient. Kernel arguments are not well supported, and I frankly don't trust that v1 is fully regression tested so that it is _actually_ an expected configuration more than just a possible one that _should_ work.
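One way to address the worry that the setting silently disappears across upgrades is to check the running kernel's command line after each upgrade. A minimal sketch; `cgroup_v1_requested` is a hypothetical helper and only recognises the literal `systemd.unified_cgroup_hierarchy=0` form used in this report (systemd also accepts other boolean spellings).

```shell
# Hypothetical helper: succeed if the kernel was booted with the cgroup v1
# override. Reads /proc/cmdline by default; pass a file to test.
cgroup_v1_requested() {
    cmdline_file=${1:-/proc/cmdline}
    # One argument per line, then look for the exact token.
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'systemd\.unified_cgroup_hierarchy=0'
}

# Usage after an upgrade:
#   cgroup_v1_requested || echo "cgroup v1 override missing from this boot"
#   grubby --info=ALL | grep '^args'   # stored arguments per installed kernel
```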

Comment 23 Leonid Podolny 2019-11-16 16:04:00 UTC
To Sam:
(I am the person who proposed this ticket as a release blocker).

On a practical level, instead of aliasing, you could install a package called "podman-docker", which expressly mimics docker command line options. For me, it solved all the inconsistencies I have seen so far.
As to your philosophical argument, it's docker's rather than Fedora's fault; cgroups v2 isn't exactly new or a surprise to anyone. Docker just dropped the ball as an upstream. You could take the "I was trying to upgrade Fedora, so it's Fedora's regression" stance, but the thing is that Fedora has many thousands of upstreams, each with their own bugs, and it needs to find some kind of compromise between them. Here it seems to have chosen to remain close to systemd, which is a more important upstream than docker.
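For tools that shell out to a `docker` binary (such as the SBT plugin mentioned in comment 22), it can help to confirm that the `docker` on PATH really is the podman-docker shim. A minimal sketch; it assumes the shim passes `--version` through to podman, whose output contains "podman version", and `docker_is_podman` is a hypothetical helper.

```shell
# Install the shim once (as root):
#   dnf install -y podman-docker

# Hypothetical helper: succeed if the "docker" found on PATH is really podman.
docker_is_podman() {
    command -v docker >/dev/null 2>&1 || return 1
    docker --version 2>/dev/null | grep -qi podman
}

# Usage: docker_is_podman && echo "docker commands are handled by podman"
```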

