Description of problem:

Using a Dockerfile like this:

FROM mattdm/fedora:latest
RUN yum update -y
RUN yum install -y redis
RUN systemctl enable redis.service
RUN systemctl start redis.service
EXPOSE 6379
ENTRYPOINT ["/usr/bin/redis-cli"]

the step 'systemctl start redis.service' fails with this error message:

Failed to get D-Bus connection: No connection to service manager.

Version-Release number of selected component (if applicable):
Name    : docker-io
Arch    : x86_64
Version : 0.7
Release : 0.17.rc6.fc20

Steps to Reproduce:
1. Save the above Dockerfile.
2. Run: docker build -t test/redis .
3. The build fails on the 'systemctl start' step.

Actual results:
The service failed to start because no D-Bus connection to the service manager is available.

Expected results:
The service should be started.

Additional info:
https://github.com/dotcloud/docker/issues/2296

There is another issue reported on GitHub with the same problem. The solution, as far as I understand it, is to use a Fedora 'machine' container, in other words to start systemd itself in the Dockerfile? So, wanting to build a Docker container on top of the Fedora 20 image, I created this Dockerfile (I might be completely wrong on this ;-)

FROM mattdm/fedora
RUN yum install -y redis
RUN systemctl enable redis.service
RUN /usr/lib/systemd/systemd --system &
EXPOSE 6379
ENTRYPOINT ["/usr/bin/redis-cli"]

I wonder about this line:

RUN /usr/lib/systemd/systemd --system &

If I do this inside the container, it starts the systemd daemon and some services seem to start as well:

[root@localhost redis-server]# docker run -i -t 275d2bce86d7 /bin/bash
bash-4.2# systemctl start redis.service
Failed to get D-Bus connection: No connection to service manager.
bash-4.2# /usr/lib/systemd/systemd --system &
[1] 7
bash-4.2# systemd 204 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ)
Detected virtualization 'lxc'.
Failed to set hostname to <41a78414b3fd>: Operation not permitted
No control group support available, not creating root group.
Failed to open /dev/autofs: No such file or directory
Failed to initialize automounter: No such file or directory
...
...more logs...
bash-4.2# ps aux
USER   PID %CPU %MEM    VSZ  RSS TTY  STAT START TIME COMMAND
root     1  0.0  0.0  11732 1652 ?    S    20:24 0:00 /bin/bash
root     7  0.2  0.1  46156 3380 ?    S    20:25 0:00 /usr/lib/systemd/systemd --system
redis   40  0.0  0.3  46576 7548 ?    Ssl  20:25 0:00 /usr/sbin/redis-server /etc/redis.conf
root    47  0.0  0.0 124096 1496 ?    Ss   20:25 0:00 /usr/sbin/crond -n
root    48  0.0  0.0 110000  824 tty1 Ss+  20:25 0:00 /sbin/agetty --noclear -s console 115200 38400 9600
root    62  0.0  0.1  83620 3708 ?    Ss   20:25 0:00 /usr/sbin/sendmail -bd -q1h
root    64  0.0  0.0  11264 1040 ?    R+   20:25 0:00 ps aux

And Redis is here too (because I enabled it in one of the RUN commands). I wonder whether this is the right way to build Fedora-based Docker containers, and whether it is documented somewhere.
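One caveat I realized afterwards: each RUN step executes in a throwaway build-time container, so a systemd backgrounded during 'docker build' does not survive into the final image; the process tree above only exists because I started systemd by hand in an interactive session. A minimal sketch of moving that step to container start time (same base image; untested):

FROM mattdm/fedora
RUN yum install -y redis
RUN systemctl enable redis.service
EXPOSE 6379
# Start systemd as PID 1 at run time; a backgrounded systemd inside a
# RUN step dies when that build step's container exits.
CMD ["/usr/lib/systemd/systemd", "--system"]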
Some documentation would be good. There are basically two ways to go. First, the "application container" model. Here, you don't use systemctl to launch the service -- you just run it directly. This works very well for relatively simple daemons. The other approach is the "system container" model, where your docker environment is like a lightweight virtual machine. Here, you run systemd, and systemd manages your session (almost) as if it were running in a "real" machine. That looks like what you're trying to do.
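For the application-container model with the redis example from the original report, a minimal sketch might look like this (assuming the Fedora-packaged redis.conf; untested):

FROM mattdm/fedora:latest
RUN yum install -y redis
EXPOSE 6379
# Run the daemon directly as PID 1 instead of going through systemctl.
# redis.conf needs 'daemonize no' so the server stays in the foreground
# and keeps the container alive.
CMD ["/usr/sbin/redis-server", "/etc/redis.conf"]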
Matthew, thanks for the answer. So the 'recommended' way is to have a container for each service? Like one for the database, one for the httpd server, one for redis, etc., and then glue them together?
I tried to use the same approach as above in order to run postgresql-server, but with no luck. The process isn't running, and checking the service status results in:

Failed to get D-Bus connection: No connection to service manager.
Oved: I think it might be not a bug but a wrong approach :-) See the 'system container' model Matthew described in comment 2. In order to start the postgresql service, you need to start systemd itself inside the container, which will then pick up the postgresql unit file and start it.
(In reply to Michal Fojtik from comment #5)
> Oved: I think it might be not a bug but a wrong approach :-) See the
> 'system container' model Matthew described in comment 2. In order to start
> the postgresql service, you need to start systemd itself inside the
> container, which will then pick up the postgresql unit file and start it.

I tried it, but got a lot of errors:

sh-4.2# /usr/lib/systemd/systemd --system &
[1] 6
sh-4.2# systemd 204 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ)
Detected virtualization 'lxc'.
Failed to set hostname to <38620592650b>: Operation not permitted
No control group support available, not creating root group.
Failed to open /dev/autofs: No such file or directory
Failed to initialize automounter: No such file or directory
Unit proc-sys-fs-binfmt_misc.automount entered failed state.
systemd-journald.service: main process exited, code=exited, status=218/CAPABILITIES
Unit systemd-journald.service entered failed state.
systemd-journald.service holdoff time over, scheduling restart.
var-lib-nfs-rpc_pipefs.mount mount process exited, code=exited status=32
Unit var-lib-nfs-rpc_pipefs.mount entered failed state.
proc-fs-nfsd.mount mount process exited, code=exited status=32
Unit proc-fs-nfsd.mount entered failed state.
sys-kernel-debug.mount mount process exited, code=exited status=32
Unit sys-kernel-debug.mount entered failed state.
dev-mqueue.mount mount process exited, code=exited status=32
Unit dev-mqueue.mount entered failed state.
sys-kernel-config.mount mount process exited, code=exited status=32
Unit sys-kernel-config.mount entered failed state.
dev-hugepages.mount mount process exited, code=exited status=32
Unit dev-hugepages.mount entered failed state.
systemd-binfmt.service: main process exited, code=exited, status=1/FAILURE
Unit systemd-binfmt.service entered failed state.
lvm2-lvmetad.service: Supervising process 33 which is not our child. We'll most likely not notice when it exits.
systemd-journald.service: main process exited, code=exited, status=218/CAPABILITIES
Unit systemd-journald.service entered failed state.
systemd-journald.service holdoff time over, scheduling restart.
... (the systemd-journald.service failure and restart repeats many more times) ...
systemd-journald.service start request repeated too quickly, refusing to start.
Unit systemd-journald.service entered failed state.
systemd-journald.service start request repeated too quickly, refusing to start.
Unit systemd-journald.socket entered failed state.
systemd-remount-fs.service: main process exited, code=exited, status=1/FAILURE
Unit systemd-remount-fs.service entered failed state.
Failed to create cgroup cpu:/: No such file or directory
Failed to create cgroup cpu:/: No such file or directory
iscsid.service: Supervising process 56 which is not our child. We'll most likely not notice when it exits.
avahi-daemon.service: main process exited, code=exited, status=255/n/a
Unit avahi-daemon.service entered failed state.
postgresql.service: control process exited, code=exited status=206
Unit postgresql.service entered failed state.
systemd-logind.service: main process exited, code=exited, status=218/CAPABILITIES
Unit systemd-logind.service entered failed state.
dbus.service: main process exited, code=exited, status=206/OOM_ADJUST
Unit dbus.service entered failed state.
... (the dbus.service and systemd-logind.service failures repeat several times) ...
dbus.service start request repeated too quickly, refusing to start.
Unit dbus.socket entered failed state.
systemd-logind.service holdoff time over, scheduling restart.
systemd-logind.service: main process exited, code=exited, status=218/CAPABILITIES
Unit systemd-logind.service entered failed state.

Some services eventually got started, but not all of them. Also, trying to restart postgresql manually results in:

# service postgresql restart
Redirecting to /bin/systemctl restart postgresql.service
Failed to get D-Bus connection: No connection to service manager.
More information:

Similar error as above:

bash-4.2# /usr/lib/systemd/systemd --system
systemd 208 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ)

Welcome to Fedora 20 (Heisenbug)!

Failed to set hostname to <24db4bdb5323>: Operation not permitted
No control group support available, not creating root group.
Failed to verify GPT partition /dev/dm-6: No such file or directory
/usr/lib/systemd/system-generators/systemd-gpt-auto-generator exited with exit status 1.
[  OK  ] Reached target Remote File Systems.
         Starting Collect Read-Ahead Data...
Segmentation fault (core dumped)

When attempting to start without explicitly specifying --system:

# /usr/lib/systemd/systemd
Trying to run as user instance, but the system has not been booted with systemd.
(In reply to Raman Gupta from comment #7)
> bash-4.2# /usr/lib/systemd/systemd --system
> systemd 208 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA
> +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ)
> [...]
>          Starting Collect Read-Ahead Data...
> Segmentation fault (core dumped)

I assume the segfault is bug https://bugs.freedesktop.org/show_bug.cgi?id=74589 showing up.
In order to see this hang in Rawhide, build an image based on the following Dockerfile:

# cat > Dockerfile << _EOF
FROM fedora:rawhide
MAINTAINER "Scott Collier" <scollier>
ENV container docker
RUN yum -y update; yum clean all
RUN yum -y install httpd systemd; yum clean all; systemctl enable httpd.service
RUN echo "Apache" >> /var/www/html/index.html
EXPOSE 80
CMD ["/usr/sbin/init"]
_EOF
# docker build -t httpd_systemd .
# docker run --rm httpd_systemd

It will start systemd, but then it just hangs. It is running with the environment variable container=docker set, but I think the problem may be that systemd sees the process still has the "mknod" capability, so it tries to do something that ends up getting blocked.
It looks like systemd is not allowed to mount /dev:

[root@notas ~]$ docker run --rm -i -t httpd_systemd /usr/lib/systemd/systemd --log-level=debug --log-target=console
Failed to mount /dev: Operation not permitted
Failed to mount /run: Operation not permitted
And if I am not mistaken it is not blocked, but simply gives up and freezes itself.
Since /dev is already set up by docker, we do not want systemd to attempt to mount /dev. I believe this is the same way it worked in virt-sandbox. One difference is that docker does not remove cap_mknod, but instead blocks the creation of device nodes via the devices cgroup configuration. Is this what is causing the problem?
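To illustrate what that looks like from inside a non-privileged container: the capability is still in the effective set, but the devices cgroup denies the actual mknod(2) call. A hypothetical session (exact behaviour depends on the docker version's devices-cgroup whitelist):

bash-4.2# grep CapEff /proc/self/status   # the cap_mknod bit is still set here
bash-4.2# mknod /dev/fake c 42 0          # ...but the devices cgroup says no
mknod: '/dev/fake': Operation not permitted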
Created attachment 889251 [details]
output without freezing on dev

There are other issues:

Failed to mount /sys/fs/cgroup/devices: No such file or directory
Detected virtualization 'other'.
No control group support available, not creating root group.
Failed to create mount unit file /run/systemd/generator/-.mount, as it already exists. Duplicate entry in /etc/fstab?
Failed to open /dev/autofs: No such file or directory
Failed to initialize automounter: No such file or directory
Well, we had a long conversation on @docker with the docker guys and Lennart. It came down to systemd expecting /run and /dev to be mounted and pre-populated. Lennart also says that things will break for systemd running certain unit files, since docker was removing CAP_SYS_ADMIN: basically any unit file that specifies PrivateNetwork, PrivateDevices, or PrivateTmp, since systemd requires CAP_SYS_ADMIN privs to be able to mount file systems and create device nodes.
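As a concrete illustration, a service with a unit file like this hypothetical one would fail inside such a container:

[Unit]
Description=Example daemon (hypothetical unit, for illustration only)

[Service]
ExecStart=/usr/sbin/mydaemon
# Each of these directives makes systemd create new namespaces and/or
# device nodes for the service, which needs CAP_SYS_ADMIN in the container:
PrivateTmp=yes
PrivateNetwork=yes
PrivateDevices=yes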
So what is the correct way to set things up and get them running?
Right now you could attempt to run systemd within a privileged container; I think some people have had success with this. Once we get the --opt option from docker, you should be able to turn on CAP_SYS_ADMIN and then it will work. Upstream is working on some patches that should make this easier, without --privileged.
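A sketch of the privileged approach, reusing the httpd_systemd image from comment 10 (untested, and still subject to the cgroup issues discussed below):

# docker run --privileged --rm -ti -e 'container=docker' httpd_systemd /usr/sbin/init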
I'm sorry, but at least on Fedora 20, with a Dockerfile having just:

# cat Dockerfile
# Clone from the Fedora 20 image
FROM fedora:20

I get:

# docker run --privileged -ti -e 'container=docker' ContainerInterface /bin/bash
bash-4.2# /usr/lib/systemd/systemd --system
systemd 208 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ)
Detected virtualization 'other'.

Welcome to Fedora 20 (Heisenbug)!

Set hostname to <9982f7b90c53>.
No control group support available, not creating root group.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
[  OK  ] Listening on Delayed Shutdown Socket.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Encrypted Volumes.
[  OK  ] Listening on udev Kernel Socket.
[  OK  ] Listening on udev Control Socket.
[  OK  ] Listening on Journal Socket.
         Starting udev Coldplug all Devices...
Segmentation fault
bash-4.2#

The same happens when I throw 'yum upgrade -y' in there as well.
Well, that is better than before. I think we have to fix docker to provide /dev and /run so that systemd does not start udev.
The segfault in systemd is at src/core/unit.c:2145, because we are trying to strdup u->manager->cgroup_root, but as shown in the log we don't have a root cgroup. I've always thought that process tracking, hence *some* cgroup support, is a hard dependency, so I am surprised systemd doesn't exit immediately after the failed attempt to create the root cgroup. Maybe we should fix that and not pretend we can do something sensible without any cgroup support whatsoever. Not sure what we are missing here in the privileged docker container, but as systemd works just fine in nspawn containers, I'd say the docker guys should check how nspawn prepares a container for running systemd and do likewise for privileged docker containers.
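For comparison, the nspawn case being pointed at boots an extracted root file system with a single command (the path here is a hypothetical example); nspawn mounts /dev, /run, and the cgroup tree inside the container before executing systemd:

# systemd-nspawn -bD /srv/fedora-rootfs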
Well, as you said, systemd should not segfault. I am working with upstream to separate out the creation of /dev and /run as mount points. But I am pretty sure the docker team is not going to care about running systemd in a container. Those of us who want it will need to provide patches to systemd and/or docker to make it work.
Here is what I get with the latest Docker and Rawhide systemd:

# docker run --privileged --rm -ti -e 'container=docker' systemd_httpd sh
sh-4.3# /usr/lib/systemd/systemd --system
systemd 212 running in system mode. (+PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ +SECCOMP -APPARMOR)
Detected virtualization 'other'.
Detected architecture 'x86-64'.

Welcome to Fedora 21 (Rawhide)!

Set hostname to <d1a38d4e7d25>.
No control group support available, not creating root group.
Running in a container, ignoring fstab device entry for /dev/disk/by-uuid/82836b15-6e06-4406-8368-3a1a864947ae.
bind() failed: No such file or directory
[  OK  ] Reached target Remote File Systems.
[  OK  ] Reached target Paths.
Failed to open /dev/autofs: No such file or directory
Failed to initialize automounter: No such file or directory
[FAILED] Failed to set up automount Arbitrary Executable File Formats File System Automount Point.
See 'systemctl status proc-sys-fs-binfmt_misc.automount' for details.
Unit proc-sys-fs-binfmt_misc.automount entered failed state.
[  OK  ] Reached target Encrypted Volumes.
[  OK  ] Reached target Swap.
Segmentation fault (core dumped)

Not sure how to get it to stop the automounter?
Same error. The crash is in unit_default_cgroup_path():

char *unit_default_cgroup_path(Unit *u) {
        _cleanup_free_ char *escaped = NULL, *slice = NULL;
        int r;

        assert(u);

        if (unit_has_name(u, SPECIAL_ROOT_SLICE))
                return strdup(u->manager->cgroup_root);

It segfaults on the strdup(), because cgroup_root is NULL. There is nothing mounted on /sys/fs/cgroup. Shouldn't systemd see this and do nothing with cgroups?
(In reply to Daniel Walsh from comment #22)
> There is nothing mounted on /sys/fs/cgroup. Shouldn't systemd see this and
> do nothing with cgroups?

Systemd requires cgroups (the grouping, not the controllers). Systemd should fail properly here, explaining what went wrong, instead of trying to go ahead without cgroups and crashing. There once was the idea of being able to run systemd in a degraded mode without cgroups, but that code path is not used or tested anywhere, and I doubt that mode would be too useful today; random stuff would just fail in weird ways. The container environment has to set up and provide access to the cgroup tree for systemd to work as expected.
Where does systemd-nspawn set up the cgroup file system?
This seems to work for me:

# docker run --privileged --rm -ti -e 'container=docker' -v /sys/fs/cgroup:/sys/fs/cgroup systemd_httpd sh
sh-4.3# /lib/systemd/systemd --system
systemd 212 running in system mode. (+PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ +SECCOMP -APPARMOR)
Detected virtualization 'other'.
Detected architecture 'x86-64'.

Welcome to Fedora 21 (Rawhide)!

Set hostname to <6ee2dae7c974>.
Running in a container, ignoring fstab device entry for /dev/disk/by-uuid/82836b15-6e06-4406-8368-3a1a864947ae.
bind() failed: No such file or directory
[  OK  ] Reached target Remote File Systems.
[  OK  ] Reached target Paths.
Failed to open /dev/autofs: No such file or directory
Failed to initialize automounter: No such file or directory
[FAILED] Failed to set up automount Arbitrary Executable File Formats File System Automount Point.
See 'systemctl status proc-sys-fs-binfmt_misc.automount' for details.
Unit proc-sys-fs-binfmt_misc.automount entered failed state.
[  OK  ] Reached target Encrypted Volumes.
[  OK  ] Reached target Swap.
[  OK  ] Created slice Root Slice.
[  OK  ] Listening on Journal Socket.
[  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
[  OK  ] Listening on udev Kernel Socket.
[  OK  ] Listening on udev Control Socket.
[  OK  ] Listening on Delayed Shutdown Socket.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Created slice System Slice.
         Mounting Debug File System...
         Starting Create Static Device Nodes in /dev...
         Starting Journal Service...
[  OK  ] Started Journal Service.
         Mounting POSIX Message Queue File System...
         Mounting Huge Pages File System...
         Starting udev Coldplug all Devices...
[  OK  ] Reached target Slices.
         Mounting Configuration File System...
         Mounting FUSE Control File System...
         Starting Apply Kernel Variables...
         Starting Remount Root and Kernel File Systems...
[  OK  ] Created slice system-getty.slice.
[  OK  ] Mounted FUSE Control File System.
[  OK  ] Mounted Huge Pages File System.
[  OK  ] Mounted POSIX Message Queue File System.
[  OK  ] Mounted Configuration File System.
[  OK  ] Mounted Debug File System.
[  OK  ] Started Create Static Device Nodes in /dev.
         Starting udev Kernel Device Manager...
[  OK  ] Started Apply Kernel Variables.
systemd-remount-fs.service: main process exited, code=exited, status=1/FAILURE
[FAILED] Failed to start Remount Root and Kernel File Systems.
See 'systemctl status systemd-remount-fs.service' for details.
Unit systemd-remount-fs.service entered failed state.
         Starting Load/Save Random Seed...
         Starting Configure read-only root support...
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Started Load/Save Random Seed.
[  OK  ] Started Configure read-only root support.
[  OK  ] Reached target Local File Systems.
         Starting Trigger Flushing of Journal to Persistent Storage...
         Starting Mark the need to relabel after reboot...
         Starting Create Volatile Files and Directories...
[  OK  ] Started Mark the need to relabel after reboot.
systemd-journal-flush.service: main process exited, code=exited, status=1/FAILURE
[FAILED] Failed to start Trigger Flushing of Journal to Persistent Storage.
See 'systemctl status systemd-journal-flush.service' for details.
Unit systemd-journal-flush.service entered failed state.
[  OK  ] Started Create Volatile Files and Directories.
         Starting Update UTMP about System Boot/Shutdown...
systemd-update-utmp.service: main process exited, code=exited, status=1/FAILURE
[FAILED] Failed to start Update UTMP about System Boot/Shutdown.
See 'systemctl status systemd-update-utmp.service' for details.
[DEPEND] Dependency failed for Update UTMP about System Runlevel Changes.
Unit systemd-update-utmp.service entered failed state.
[  OK  ] Started udev Coldplug all Devices.
[ ***  ] A start job is running for udev Kernel Device Manager (24s / 1min 30s)
We need a hell of a lot of cleanup to make this work the way I would want, since we want to get to the point where the container runs only systemd, journald, and httpd.
Got it working. http://rhatdan.wordpress.com/2014/04/30/running-systemd-within-a-docker-container/
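For readers who don't want to click through, the recipe in that post boils down to roughly the following (a condensed sketch, not a verbatim copy; see the post for the authoritative version). The build removes unit files that cannot work in a container so systemd does not keep trying to start them:

FROM fedora:rawhide
ENV container docker
RUN yum -y install httpd systemd; yum clean all; systemctl enable httpd.service
# Strip out units that make no sense in a container (udev sockets,
# local-fs mounts, ...), leaving only what systemd can actually run:
RUN (cd /lib/systemd/system/sysinit.target.wants/; \
     for i in *; do [ "$i" = systemd-tmpfiles-setup.service ] || rm -f "$i"; done); \
    rm -f /lib/systemd/system/multi-user.target.wants/*; \
    rm -f /etc/systemd/system/*.wants/*; \
    rm -f /lib/systemd/system/local-fs.target.wants/*; \
    rm -f /lib/systemd/system/sockets.target.wants/*udev*; \
    rm -f /lib/systemd/system/sockets.target.wants/*initctl*
EXPOSE 80
CMD ["/usr/sbin/init"]

Then run it with the host's cgroup tree bind-mounted in, as in the working example above:

# docker build -t httpd_systemd .
# docker run --privileged -d -p 80:80 -v /sys/fs/cgroup:/sys/fs/cgroup:ro httpd_systemd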
On Fedora 20, this only works with docker-io-0.10, but that seems to introduce other regressions.
What regressions are you seeing?
(In reply to Daniel Walsh from comment #29)
> What regressions are you seeing?

https://bugzilla.redhat.com/show_bug.cgi?id=1094664 and https://bugzilla.redhat.com/show_bug.cgi?id=1086430#c8.
I believe those might be SELinux issues.
Might be interesting for you: http://maci0.wordpress.com/2014/07/23/run-systemd-in-an-unprivileged-docker-container/
Note for others, to save investigation time: none of this can work on RHEL 6 type kernels: https://bugs.freedesktop.org/show_bug.cgi?id=90517