Bug 1514511 - systemd-machined can stop the docker.service
Summary: systemd-machined can stop the docker.service
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: oci-register-machine
Version: 7.4
Hardware: All
OS: Linux
Target Milestone: pre-dev-freeze
Target Release: 7.5
Assignee: Lokesh Mandvekar
QA Contact: atomic-bugs@redhat.com
Depends On:
Reported: 2017-11-17 16:03 UTC by Michele Baldessari
Modified: 2018-01-25 12:57 UTC
22 users

Fixed In Version: oci-register-machine-0-3.14.gitcd1e331.el7_4
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2018-01-25 12:57:03 UTC
Target Upstream Version:

Attachments (Terms of Use)
journal log from the last boot, where docker stopped unexpectedly (7.53 MB, application/x-bzip)
2017-11-22 09:21 UTC, Damien Ciabrini

System ID Private Priority Status Summary Last Updated
OpenStack gerrit 522388 0 None MERGED Set disable:true for oci-machine-register 2020-08-30 06:47:52 UTC
Red Hat Product Errata RHBA-2018:0196 0 normal SHIPPED_LIVE oci-register-machine bug fix update 2018-01-25 17:24:23 UTC

Description Michele Baldessari 2017-11-17 16:03:06 UTC
Description of problem:
It seems that under certain conditions (it's a frequently reproducible race; more on this below), after a container errors out, systemd-machined wrongly invokes a stop on the docker.service unit (instead of the usual docker-<UUID>.scope unit), thereby stopping docker and all other containers completely.

In RH OpenStack 12 we moved to docker containers for most of our services. Some of these containers are managed directly via docker. Some are managed by pacemaker which invokes docker to start them. We observed the following race after we restart some nodes:
1) Hard node reboot
2) At boot docker starts and it starts a bunch of containers (~50 or so) automatically (all of them have RestartPolicy “always” with unlimited maximum retries)
3) Sometimes, after the reboot, when a container stops it triggers the following [1] actions:
Nov 17 14:16:42 controller-2 systemd-machined[4873]: Machine c8a401499ab88f44796ced1216efc76c terminated. FOOBAR: Unit file docker.service
Nov 17 14:16:42 controller-2 systemd[1]: Stopping Docker Application Container Engine...

This causes a major disruption because the containerized OpenStack services will not start since docker has been stopped.

Sosreport can be found here: http://file.rdu.redhat.com/~mbaldess/systemd-docker/sosreport-controller-2-20171117144151.tar.bz2

Un-tarred sosreport files here:

[1] Note we installed a systemd package with the following debugging patch applied:
diff --git a/src/machine/machine.c b/src/machine/machine.c
index 223eb0f3641b..9cfc43a4d1b0 100644
--- a/src/machine/machine.c
+++ b/src/machine/machine.c
@@ -429,7 +429,7 @@ int machine_stop(Machine *m) {
                            "NAME=%s", m->name,
                            "LEADER="PID_FMT, m->leader,
-                           LOG_MESSAGE("Machine %s terminated.", m->name),
+                           LOG_MESSAGE("Machine %s terminated. FOOBAR: Unit file %s", m->name, m->unit),
         /* Kill cgroup */

Version-Release number of selected component (if applicable):

NB: in the sosreport you will see the systemd-219-42.fml.1.el7_4.4.x86_64 package, which only has [1] applied on top of the original package.

How reproducible:
Very often, but not all the time.

Comment 3 Lukáš Nykrýn 2017-11-21 13:05:12 UTC
This is just a hunch, but I think that this might be a bug in docker.

When I start docker I see that it uses RegisterMachine dbus method:
"RegisterMachine() is similar to CreateMachine(), however only registers a machine, but does not create a scope unit for it. The caller's unit will be registered instead."

So the unit is determined from the cgroup of the leader process.

I don't know how this is implemented in docker, so let's reassign this to them, with a question:
Could there be a race between docker starting the leader process, creating a scope and registering the process in machined?

Comment 4 Daniel Walsh 2017-11-21 13:37:39 UTC
The register machine code is called from runc, outside of docker. It should happen after PID 1 has been created inside the container but before it is running. Could this be happening because docker is not using systemd for managing the cgroup, so machined gets confused?

Comment 5 Lukáš Nykrýn 2017-11-21 13:53:24 UTC
To be more precise.
libcontainer calls StartTransientUnit method
oci-register-machine calls RegisterMachine method

Could there be a race between those two calls?

Comment 6 Michal Sekletar 2017-11-21 14:27:07 UTC
StartTransientUnit is asynchronous. The return value is a DBus object path, and one needs to subscribe to DBus signal notifications for that object. Once the job is completed, systemd sends out a signal with the result. StartTransientUnit from go-systemd actually allows waiting for job completion (by supplying a channel on which you later wait for the message containing the job result [1]).

However, libcontainer currently doesn't pass any channel into StartTransientUnit [2]. Hence code called after StartTransientUnit races with systemd, and systemd-machined may still see the container PID as part of docker.service, because the transient unit doesn't exist yet and the PID has not yet been migrated into the new cgroup.

[1] https://github.com/coreos/go-systemd/blob/master/dbus/methods.go#L136
[2] https://github.com/docker/libcontainer/blob/master/cgroups/systemd/apply_systemd.go#L201

Comment 7 Marian Krcmarik 2017-11-21 14:28:43 UTC
This may NOT be related, but I can often see some containers exit or fail to start with something like:
Nov 18 01:57:57 controller-1 dockerd-current[2664]: time="2017-11-18T01:57:56.990514935Z" level=error msg="Create container failed with error: invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:258: applying cgroup configuration for process caused \\\\\\\"open /sys/fs/cgroup/pids/system.slice/docker-20152b32aea39235a8a0d2ed7b3559456c3e3aa6ccd201128ec9f5bccdb69006.scope/cgroup.procs: no such file or directory\\\\\\\"\\\"\\n\""

The docker service is up and running in such cases; sometimes I see what is described in the bug description - the docker service is stopped, so all containers are stopped.

Just mentioning since cgroups were mentioned.

Comment 8 Daniel Walsh 2017-11-21 14:45:34 UTC
Mrunal, could this be a bug in runc?

Comment 10 Damien Ciabrini 2017-11-22 09:19:54 UTC
We had a recurrence today in our environment, so I'll add some details probably worth mentioning:

* Our machines are VMs, configured with debug logs enabled in docker and systemd
[root@controller-1 ~]# cat /etc/docker/daemon.json 
[root@controller-1 ~]# cat /etc/systemd/system.conf | grep debug

* Like previously, one of the controllers was rebooted ungracefully, via "echo c > /proc/sysrq-trigger"

* On restart, one container (75568d917f5be14168ef2b961825e834118e795296457d65eb9de5092ebcd36b) 
seemed to have caused the stop of docker service.

I don't know if it's worth noting, but during the restart libcontainer complained in the journal about the client being "out of sync":

Nov 21 23:31:18 controller-1 dockerd-current[2688]: time="2017-11-21T23:31:18.805758013Z" level=debug msg="Loaded container 75568d917f5be14168ef2b961825e834118e795296457d65eb9de5092ebcd36b"
Nov 21 23:31:19 controller-1 dockerd-current[2688]: time="2017-11-21T23:31:19.318214148Z" level=warning msg="libcontainerd: client is out of sync, restore was called on a fully synced contai
Nov 21 23:31:19 controller-1 dockerd-current[2688]: time="2017-11-21T23:31:19.318729253Z" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/75568d
Nov 21 23:31:22 controller-1 dockerd-current[2688]: time="2017-11-21T23:31:22.202612037Z" level=info msg="Removing stale sandbox 78f3d3aee2d232ceef89da0756ab58be14e326ba48c30b3698678950c70d7
Nov 21 23:31:26 controller-1 dockerd-current[2688]: time="2017-11-21T23:31:26.633063633Z" level=debug msg="Starting container 75568d917f5be14168ef2b961825e834118e795296457d65eb9de5092ebcd36b
Nov 21 23:31:28 controller-1 dockerd-current[2688]: time="2017-11-21T23:31:28.268646949Z" level=debug msg="createSpec: cgroupsPath: system.slice:docker:75568d917f5be14168ef2b961825e834118e79
Nov 21 23:31:29 controller-1 systemd[1]: Started docker container 75568d917f5be14168ef2b961825e834118e795296457d65eb9de5092ebcd36b.
Nov 21 23:31:29 controller-1 systemd[1]: Starting docker container 75568d917f5be14168ef2b961825e834118e795296457d65eb9de5092ebcd36b.
Nov 21 23:31:30 controller-1 oci-register-machine[5433]: 2017/11/21 23:31:30 Register machine: prestart 75568d917f5be14168ef2b961825e834118e795296457d65eb9de5092ebcd36b 5216 /var/lib/docker/
Nov 21 23:31:30 controller-1 systemd-machined[4105]: New machine 75568d917f5be14168ef2b961825e834.

* Then, when the aforementioned container had had too many consecutive restart failures, systemd got called to stop what is obviously the wrong service (docker itself):

Nov 21 23:33:18 controller-1 dockerd-current[2688]: time="2017-11-21T23:33:18.037365080Z" level=debug msg="libcontainerd: received containerd event: &types.Event{Type:\"start-process\", Id:\
Nov 21 23:33:18 controller-1 dockerd-current[2688]: time="2017-11-21T23:33:18.037473326Z" level=debug msg="libcontainerd: event unhandled: type:\"start-process\" id:\"75568d917f5be14168ef2b9
Nov 21 23:33:18 controller-1 dockerd-current[2688]: time="2017-11-21T23:33:18.239694897Z" level=debug msg="containerd: process exited" id=75568d917f5be14168ef2b961825e834118e795296457d65eb9d
Nov 21 23:33:18 controller-1 dockerd-current[2688]: time="2017-11-21T23:33:18.240014402Z" level=debug msg="libcontainerd: received containerd event: &types.Event{Type:\"exit\", Id:\"75568d91
Nov 21 23:33:28 controller-1 systemd[1]: Stopping Docker Application Container Engine...
Nov 21 23:33:28 controller-1 dockerd-current[2688]: time="2017-11-21T23:33:28.173239468Z" level=debug msg="stopping 75568d917f5be14168ef2b961825e834118e795296457d65eb9de5092ebcd36b"
Nov 21 23:33:29 controller-1 dockerd-current[2688]: time="2017-11-21T23:33:28.173298775Z" level=debug msg="Sending 15 to 75568d917f5be14168ef2b961825e834118e795296457d65eb9de5092ebcd36b"
Nov 21 23:33:30 controller-1 oci-register-machine[22832]: 2017/11/21 23:33:30 Register machine: poststop 75568d917f5be14168ef2b961825e834118e795296457d65eb9de5092ebcd36b 0 /var/lib/docker/ov
Nov 21 23:33:29 controller-1 systemd[1]: Got cgroup empty notification for: /system.slice/docker-75568d917f5be14168ef2b961825e834118e795296457d65eb9de5092ebcd36b.scope
Nov 21 23:33:29 controller-1 systemd[1]: docker-75568d917f5be14168ef2b961825e834118e795296457d65eb9de5092ebcd36b.scope: cgroup is empty
Nov 21 23:33:29 controller-1 systemd[1]: docker-75568d917f5be14168ef2b961825e834118e795296457d65eb9de5092ebcd36b.scope changed running -> dead
Nov 21 23:33:29 controller-1 systemd[1]: Collecting docker-75568d917f5be14168ef2b961825e834118e795296457d65eb9de5092ebcd36b.scope
Nov 21 23:33:30 controller-1 systemd[1]: Failed to load configuration for docker-75568d917f5be14168ef2b961825e834118e795296457d65eb9de5092ebcd36b.scope: No such file or directory

* The last log seems interesting: there's no scope file for docker-75568d917f5be14168ef2b961825e834118e795296457d65eb9de5092ebcd36b.scope in /var/run/systemd/machines.

Instead, the only entries there for that machine are:

[root@controller-1 ~]# ls -l /var/run/systemd/machines/ | grep 75568d917f5be14168ef2b961825e834
-rw-r--r--. 1 root root 326 Nov 22 00:00 75568d917f5be14168ef2b961825e834
lrwxrwxrwx. 1 root root  32 Nov 22 00:00 unit:docker.service -> 75568d917f5be14168ef2b961825e834

so this "unit:docker.service" symlink is probably the reason why systemd is stopping docker instead of the failing container.

I'm attaching the journal of the machine in case it is of any value.

Comment 11 Damien Ciabrini 2017-11-22 09:21:18 UTC
Created attachment 1357319 [details]
journal log from the last boot, where docker stopped unexpectedly

Comment 18 Daniel Walsh 2017-11-22 18:44:32 UTC
Simple fix:

Edit /etc/oci-register-machine and change the disable flag to "true".
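For reference, the workaround amounts to flipping one key in the config file. The exact layout below is an assumption based on the YAML-style config the package ships; adjust it to whatever your installed version actually contains:

```yaml
# /etc/oci-register-machine (hypothetical layout)
# Setting disable to true makes the prestart/poststop hooks no-ops,
# so containers are no longer registered with systemd-machined.
disable: true
```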

Comment 27 Dan Prince 2017-11-22 20:58:26 UTC
I've pushed an upstream fix to puppet-tripleo here for OSP12:

https://review.openstack.org/522388 Set disable:true for oci-machine-register

Comment 28 Dan Prince 2017-11-22 21:31:30 UTC
Also pushed the fix downstream for OSP12 testing:

https://code.engineering.redhat.com/gerrit/124023 Set disable:true for oci-machine-register

Comment 29 Fabio Massimo Di Nitto 2017-11-23 03:47:07 UTC
Sorry for the newbie question here, but aside from working around this race condition, what does changing the default imply in terms of system behavior? There has to be a reason why we register the containers with systemd-machined in the first place.

Comment 30 Lukáš Nykrýn 2017-11-23 09:17:44 UTC
Also, the proper fix here should not be that hard. You just need to make the start of the scope synchronous, or wait for it in oci-register-machine. Michal already posted the relevant parts of the code here.

Comment 32 Daniel Walsh 2017-11-23 11:40:39 UTC
Register machine is mainly used for systemd-based containers, so you can see the log/journal information on the host without entering the container.

Lukáš, Mrunal will work on the proper fix, but I believe it has to go into runc, which means taking it through upstream. This bug has existed for a while, and for people who do not need RegisterMachine, it is easy to turn off while we go through the upstream process.

We should clone this bug on runc to make sure the true fix gets in.

Comment 33 Red Hat Bugzilla Rules Engine 2017-11-23 14:39:34 UTC
This bugzilla has been removed from the release since it has not been triaged, and needs to be reviewed for targeting another release.

Comment 41 errata-xmlrpc 2018-01-25 12:57:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

