Bug 1320601

Summary: Unable to start docker container because oci-register-machine fails
Product: [Fedora] Fedora Reporter: Andy Goldstein <agoldste>
Component: dockerAssignee: Lokesh Mandvekar <lsm5>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhideCC: abb, adimania, admiller, amurdaca, dwalsh, ichavero, jcajka, jchaloup, lsm5, marianne, miminar, mpatel, msekleta, systemd-maint, vbatts
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: docker-1.10.3-4.gitf8a9a2a.fc25 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-03-29 18:21:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Andy Goldstein 2016-03-23 15:28:33 UTC
Description of problem: Docker containers fail to start because ici-register-machine fails.


Version-Release number of selected component (if applicable): docker-1.10.3-3.gitd93ee51.fc25.x86_64


How reproducible: 100%


Steps to Reproduce:
1. docker run --rm busybox date
2.
3.

Actual results: Container fails to start


Expected results: Container starts


Additional info: from the journal:

Mar 23 11:05:21 hack2015 docker[7326]: time="2016-03-23T11:05:21.655385103-04:00" level=debug msg="Calling POST /v1.22/containers/7b687c0c73f3824e1bc0f8145621b54dd46768f0d24bb
b02de1995895e12fa3c/start"
Mar 23 11:05:21 hack2015 docker[7326]: time="2016-03-23T11:05:21.655413547-04:00" level=debug msg="POST /v1.22/containers/7b687c0c73f3824e1bc0f8145621b54dd46768f0d24bbb02de199
5895e12fa3c/start"
...
[snip]
...
Mar 23 11:05:21 hack2015 oci-register-machine[7740]: 2016/03/23 11:05:21 Register machine: prestart %!d(string=7b687c0c73f3824e1bc0f8145621b54dd46768f0d24bbb02de1995895e12fa3c) %!s(int=7713) /var/lib/docker/devicemapper/mnt/6f9a36c0ab1a6dfeb991b51dbfc2f4636d31cfc71c346ce930ea6c42b396ab2a/rootfs
Mar 23 11:05:21 hack2015 oci-register-machine[7740]: 2016/03/23 11:05:21 Register machine failed: Failed to determine unit of process 7713 : No such device or address
Mar 23 11:05:21 hack2015 docker[7326]: time="2016-03-23T11:05:21.976319905-04:00" level=warning msg="signal: killed"
Mar 23 11:05:21 hack2015 oci-systemd-hook[7746]: systemdhook <debug>: Skipping as container command is date, not init or systemd
Mar 23 11:05:21 hack2015 oci-register-machine[7747]: 2016/03/23 11:05:21 Register machine: poststop %!d(string=7b687c0c73f3824e1bc0f8145621b54dd46768f0d24bbb02de1995895e12fa3c) %!s(int=0) /var/lib/docker/devicemapper/mnt/6f9a36c0ab1a6dfeb991b51dbfc2f4636d31cfc71c346ce930ea6c42b396ab2a/rootfs
Mar 23 11:05:21 hack2015 oci-register-machine[7747]: 2016/03/23 11:05:21 TerminateMachine failed: No machine '7b687c0c73f3824e1bc0f8145621b54dd46768f0d24bbb02de1995895e12fa3c'

Comment 1 Daniel Walsh 2016-03-23 18:21:14 UTC
This looks like a new feature of systemd?

Comment 2 Daniel Walsh 2016-03-23 18:23:00 UTC
systemd guys, is there some new requirement for registering a machine, that is just showing up in rawhide?

Andy could you attach the output of 

cat /proc/self/cgroup

From inside of the container.

Comment 3 Andy Goldstein 2016-03-23 18:26:42 UTC
I can't start any containers, so I can't provide that info. Do you want me to disable the oci-register-machine hook and then get the cgroup info? If so, here it is:

$ sudo docker run --rm busybox cat /proc/self/cgroup
11:hugetlb:/docker/5b93dd25de4f0330f34ac997784f0610983187028ad3240922093f5d76306054
10:devices:/docker/5b93dd25de4f0330f34ac997784f0610983187028ad3240922093f5d76306054
9:cpu,cpuacct:/docker/5b93dd25de4f0330f34ac997784f0610983187028ad3240922093f5d76306054
8:freezer:/docker/5b93dd25de4f0330f34ac997784f0610983187028ad3240922093f5d76306054
7:blkio:/docker/5b93dd25de4f0330f34ac997784f0610983187028ad3240922093f5d76306054
6:pids:/system.slice/docker.service
5:memory:/docker/5b93dd25de4f0330f34ac997784f0610983187028ad3240922093f5d76306054
4:perf_event:/docker/5b93dd25de4f0330f34ac997784f0610983187028ad3240922093f5d76306054
3:net_cls,net_prio:/docker/5b93dd25de4f0330f34ac997784f0610983187028ad3240922093f5d76306054
2:cpuset:/docker/5b93dd25de4f0330f34ac997784f0610983187028ad3240922093f5d76306054
1:name=systemd:/docker/5b93dd25de4f0330f34ac997784f0610983187028ad3240922093f5d76306054


Note, I am running systemd-229-7.fc25.x86_64.

Comment 4 Michal Sekletar 2016-03-23 19:02:17 UTC
At the first glance this looks like a race. I think docker creates scope for the container but doesn't wait for systemd start job to complete and proceeds immediately with registration of the machine. This may fail because machined tries to determine cgroup of the container scope, however cgroup was not realized and processes were not migrated there yet.

Are you sure you wait for completion of scope start job?

Comment 5 Daniel Walsh 2016-03-23 19:43:13 UTC
Andy are you running with the standard docker unit file?

Is your docker daemon running with 

--exec-opt native.cgroupdriver=systemd 

Michal 
We are registering the machine after the process is created, so I figure it is registered with the cgroup.  But Mrunal would no for sure.

Is this a new check in systemd-229-7?  We have not seen this problem before.

Comment 6 Andy Goldstein 2016-03-23 19:47:53 UTC
I'm using the standard unit file from the rpm. The only change I made to /etc/sysconfig/docker was to enable debug logging (-D). It does not appear to be running with any --exec-opt settings. The output from ps is

/usr/bin/docker daemon --selinux-enabled --log-driver=journald -D

Comment 7 Daniel Walsh 2016-03-23 19:49:48 UTC
Could you add that option to /etc/sysconfig/docker and see if that fixes the issue.  docker-1.10 is no longer using systemd by default for setting up cgroups.  We are supposed to have changed the default back in our package.

Comment 8 Andy Goldstein 2016-03-23 20:00:39 UTC
Setting the cgroupdriver to systemd fixed the problem.

Comment 9 Daniel Walsh 2016-03-23 20:49:15 UTC
Lokesh could you make sure this flag is in /etc/sysconfig/docker

Comment 10 Lokesh Mandvekar 2016-03-28 17:43:00 UTC
*** Bug 1321618 has been marked as a duplicate of this bug. ***

Comment 12 Andreu Botella 2016-03-30 14:29:25 UTC
I'm having the same problem in Fedora 24. Any plans to backport the fix?

Comment 13 Daniel Walsh 2016-03-30 14:32:49 UTC
Yes we need to make this the default in F24 and all docker-1.10 packages.

Comment 14 Lokesh Mandvekar 2016-04-01 18:40:21 UTC
(In reply to Andreu Botella from comment #12)
> I'm having the same problem in Fedora 24. Any plans to backport the fix?

Yes, I've added this to f24 as well, the update can be found here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-976f49409c . Please add karma so we can move it to stable quicker.

Comment 15 Daniel Walsh 2016-04-03 09:46:53 UTC
Lokesh that fixes the problem, but you should have put the cgroup option into /etc/sysconfig/docker rather then in the unit file, so people could easily change it.