With docker-io-1.4.1-27.git76baa35.fc22.x86_64 "docker run busybox echo hello" works fine. It fails with: docker-1.5.0-24.git5ebfacd.fc22.x86_64 Looks like the fatal error below may be: Failed to determine unit of process 1074 : Invalid argument An issue with the systemd cgroup integration? Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="+job wait(179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543)" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="POST /v1.18/containers/179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543/wait" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="-job attach(179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543) = OK (0)" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="-job release_interface(179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543) = OK (0)" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="+job release_interface(179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543)" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=warning msg="Your kernel does not support OOM notifications: open /sys/fs/cgroup/memory/system.slice/docker-179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543.scope/memory.oom_control: no such file or directory" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="-job start(179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543) = OK (0)" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=error msg="Unable to RegisterMachine hopeful_hodgkin for 179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543: Failed to determine unit of process 1074 : Invalid argument" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="-job log(start, 179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543, busybox:latest) = OK (0)" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="+job log(start, 179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543, busybox:latest)" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="-job allocate_interface(179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543) = OK (0)" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="+job allocate_interface(179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543)" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="+job start(179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543)" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="POST /v1.18/containers/179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543/start" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="+job attach(179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543)" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="-job container_inspect(179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543) = OK (0)" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="+job container_inspect(179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543)" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="POST /v1.18/containers/179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543/attach?stderr=1&stdout=1&stream=1" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="-job create() = OK (0)" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="-job log(create, 179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543, busybox:latest) = OK (0)" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="+job log(create, 179a5460785b92c923ecdeecc52ee81119a1cf85a6c1bf52c32a3321bfac9543, busybox:latest)" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="+job create()" Mar 24 22:32:20 localhost.localdomain docker[826]: time="2015-03-24T22:32:20Z" level=info msg="POST /v1.18/containers/create"
Ok, I think this is a race condition between RegisterMachine and a container that exits quickly. This happens with a simple "docker run centos true", but works fine with "docker run centos sleep 5".
Are you just seeing the error in the logs? Or are you seeing a fatal error. We saw this in testing, and just basically ignored the failure. You are correct that we register with a pid, which could have disappeared before we register the machine.
(In reply to Daniel Walsh from comment #2) > Are you just seeing the error in the logs? Or are you seeing a fatal error. > > We saw this in testing, and just basically ignored the failure. I have a wedged container that appears running but can't be stopped. > You are correct that we register with a pid, which could have disappeared > before we register the machine. I think the parent process here needs to avoid calling waitpid() on it until the RegisterMachine call has returned. Ah, looking at the code there's even a comment for this. Although, I'm not sure offhand why this causes an issue if we're just logging an error.
I am not sure we can get the pid of the child process and register the machine before the process exits. I though the WaitFor
Is there a way for us to know the error is PID does not exist, in which case we can just quietly ignore the error.
(In reply to Daniel Walsh from comment #5) > Is there a way for us to know the error is PID does not exist, in which case > we can just quietly ignore the error. That would cover up the race condition here where the pid is recycled and we end up calling RegisterMachine for someone else's process. I'd much prefer the docker daemon avoided calling waitpid() until RegisterMachine was done.
I am trying to move it to the StartCallback, which looks like it should be called before waitpid, but I still see the problem 1 out of 5 tries. :^( I am asking Michael Crosby if he has any ideas.
This happens for Fedora-Cloud-Atomic-22_Beta_TC8-20150405.x86_64.qcow2: -bash-4.3# atomic host status TIMESTAMP (UTC) VERSION ID OSNAME REFSPEC * 2015-04-15 12:50:37 22.39 a8d8656489 fedora-atomic fedora-atomic:fedora-atomic/f22/x86_64/docker-host -bash-4.3# rpm -q docker docker-1.5.0-25.git5ebfacd.fc22.x86_64 The command I am trying to run is: docker run -it --rm busybox true && echo "PASS" The logs give me: time="2015-05-07T15:54:43Z" level=error msg="Unable to RegisterMachine mad_bartik for def45f097c895cec20c407eea09bf6887caa08750903fc7624097261ce404147: Failed to determine unit of process 1150 : Invalid argument"
I just looked at this (again) and I am unable to reproduce. Docker 1.6.2, RHEL 7 I don't get any errors with either: docker run -it --rm busybox true && echo "PASS" docker run busybox echo hello Anything else/other versions/distributions I should try?
and also can't reproduce with Docker 1.7 Fedora 21
You were not testing with my patches. Make sure you use the systems docker to see this not just vanilla docker from github
I submitted https://github.com/rhatdan/docker/pull/88
Now that we have this working with hooks. Sally you need to get this added to the docker package.
Using Mrunal's hooks (https://github.com/mrunalp/runc/tree/feature/hooks) I have a branch (https://github.com/sallyom/runc/tree/registermachine) that implements RegisterMachine/TerminateMachine with no race condition. Will open PR once hooks are merged.
Docker-1.8 should no longer have the registry machine patch. As Sally states we are attempting to do this a different way in the future.
Probably fixed long back. Putting this ON_QA so we can close this one out soon. We have docker 1.10.1 in testing repos.
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.