Description of problem:
Output from docker exec sometimes isn't returned/displayed. I've seen this in both 1.11.2 and 1.12.1. I think this may apply to docker attach as well, but I have only ever tested with exec.

Version-Release number of selected component (if applicable):
1.11.x, 1.12.x

How reproducible:
Sporadically, but often enough to be problematic.

Steps to Reproduce:
1. sudo docker run -d --name nginx nginx
2. while true; do output=$(sudo docker exec nginx echo running in container); echo "${output}"; if [ "${output}" != "running in container" ]; then echo OMG; break; fi; done

Actual results:
After numerous iterations, each printing "running in container" correctly, I see a blank line followed by my error indicator (OMG). Here is the tail of the results:

running in container
running in container
running in container

OMG

Expected results:
An endless display of "running in container" and never any blank lines.

Additional info:
Upstream issue: https://github.com/docker/docker/issues/27289
Fixed by: https://github.com/docker/docker/pull/27467
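For convenience, here is the same reproducer as a standalone script (a minimal sketch, assuming the nginx container from step 1 is already running and that passwordless sudo is available; the iteration counter and timestamp are additions to make it easier to correlate a failure with the daemon logs):

#!/bin/bash
# Reproducer sketch: loop "docker exec" until its output goes missing.
# Assumes a running container named "nginx" (see step 1 above).
set -u
i=0
while true; do
    i=$((i + 1))
    output=$(sudo docker exec nginx echo running in container)
    echo "${output}"
    if [ "${output}" != "running in container" ]; then
        echo "OMG (iteration ${i}, $(date))"
        break
    fi
done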
Candidate fix is being worked/tested here: https://github.com/projectatomic/docker/pull/212
Andy confirmed in the github comment above that Nalin's patch resolves this issue. The patch is included in the package we originally delivered at 7.3.1 dev freeze.
Ian - does this BZ need to be included in an errata so QE can verify / close it?
The fix in docker-1.12.3-4.el7.x86_64 works well for me, moving to VERIFIED.
Was able to reproduce a failure last night on rhel 7.3.0 with the following setup:

running in container
running in container
running in container
running in container
running in container
rpc error: code = 2 desc = containerd: container not started
OMG
[root@rhel-730 ~]#
[root@rhel-730 ~]# docker ps
CONTAINER ID        IMAGE                                     COMMAND                  CREATED             STATUS              PORTS                            NAMES
ea8413ec4d93        httpd                                     "httpd-foreground"       14 hours ago        Up 14 hours         80/tcp                           loving_agnesi
165d5a7aba11        nginx                                     "nginx -g 'daemon off"   16 hours ago        Up 16 hours         80/tcp, 443/tcp                  nginx
04f9249f23e9        cockpit/kubernetes:latest                 "/usr/libexec/cockpit"   16 hours ago        Up 16 hours         0.0.0.0:9090->9090/tcp           atomic-registry-console
5d3bd0447229        openshift/origin:latest                   "/usr/bin/openshift s"   16 hours ago        Up 16 hours         53/tcp, 0.0.0.0:8443->8443/tcp   atomic-registry-master
b1f6b13574e2        openshift/origin-docker-registry:latest   "/bin/sh -c 'DOCKER_R"   16 hours ago        Up 16 hours                                          atomic-registry
[root@rhel-730 ~]# docker version
Client:
 Version:         1.12.3
 API version:     1.24
 Package version: docker-common-1.12.3-4.el7.x86_64
 Go version:      go1.6.2
 Git commit:      f320458-redhat
 Built:           Mon Nov 7 10:15:24 2016
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.3
 API version:     1.24
 Package version: docker-common-1.12.3-4.el7.x86_64
 Go version:      go1.6.2
 Git commit:      f320458-redhat
 Built:           Mon Nov 7 10:15:24 2016
 OS/Arch:         linux/amd64
[root@rhel-730 ~]#
Reproduced on multiple setups (different machines, different users) using the same docker version.
David, the output you pasted is a different failure condition. The failure I experienced was that when you run 'docker exec ...', sometimes you get no output whatsoever. Your failure shows that docker/containerd thinks that the container isn't started for some reason. Do you have the docker daemon logs from around the time this error occurred?
Andy, So, we are unable to reproduce the original "no output" issue, but may now have unmasked or created a new issue where exec thinks the container hasn't started. Should we close this BZ out and open up a new one to track the new problem? (And, if we do, should the new BZ be an immediate blocker?)
It's hard to say whether Nalin's patch is the issue without root-causing the new failure. I would recommend verifying/closing this BZ and opening a new one to track the 'container not started' issue. I initially was going to say the new bug shouldn't be a blocker, but I'm starting to lean toward 'yes' given that David said he can reproduce the issue in multiple environments. David, could you please clarify how many environments and how easy it was to reproduce? Also, was it always "container not started", or did you sometimes get a blank line (this issue)?
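To help answer that, a variant of the reproducer (a sketch only, same assumptions as the original loop) that also captures stderr and the exit status of 'docker exec', so a silent blank line (this issue) can be told apart from the 'container not started' rpc error:

# Sketch: distinguish "empty output, exit 0" (this BZ) from the rpc error.
while true; do
    output=$(sudo docker exec nginx echo running in container 2>/tmp/exec-stderr)
    rc=$?
    if [ "${output}" != "running in container" ]; then
        echo "failure at $(date): exit status ${rc}"
        echo "stdout: '${output}'"
        echo "stderr:"
        cat /tmp/exec-stderr
        break
    fi
done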
I see the difference between our issue and the original - fine with calling it a new issue and opening a new bug.
System logs from the failure on my virt. Not useful, but including anyway:

Nov 15 00:15:04 esm-rhel7-d12-5 dockerd-current: time="2016-11-15T00:15:04.332109579-05:00" level=info msg="{Action=start, Username=root, LoginUID=0, PID=22894}"
Nov 15 00:15:04 esm-rhel7-d12-5 dockerd-current: time="2016-11-15T00:15:04.386343921-05:00" level=error msg="Error running exec in container: rpc error: code = 2 desc = containerd: container not started"

docker-1.12.3-4.el7.x86_64
Do we have the journal output (journalctl -u docker -n 100) from the time of the failure? There could be information about why the container wasn't started.
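For reference, a sketch of how that could be collected right after the reproducer breaks (the --since/--until window below is illustrative, using the timestamp from the log snippet above; adjust it to the actual failure time):

# Last 100 daemon log lines, as suggested above:
journalctl -u docker -n 100 > docker-journal.txt

# Or a window around the failure (timestamps are examples):
journalctl -u docker --since "2016-11-15 00:14:00" --until "2016-11-15 00:16:00" >> docker-journal.txt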
Would be nice to also have "shim-log.json" and "log.json" (https://github.com/docker/containerd/blob/master/runtime/container.go#L617 and https://github.com/docker/containerd/blob/master/runtime/container.go#L629) Those should be somewhere in /var/run/docker (or /var/run/containerd to my best knowledge)
(In reply to Antonio Murdaca from comment #16)
> Would be nice to also have "shim-log.json" and "log.json"
> (https://github.com/docker/containerd/blob/master/runtime/container.go#L617
> and
> https://github.com/docker/containerd/blob/master/runtime/container.go#L629)
> Those should be somewhere in /var/run/docker (or /var/run/containerd to my
> best knowledge)

The reason for this is that it seems the "container not started" error can only come from the code path inside this if statement: https://github.com/docker/containerd/blob/master/runtime/container.go#L614
In particular, after running 'docker start nginx' to force the container's filesystem to be mounted, the contents of /var/run/docker/libcontainerd/containerd/ should have containerd's logs.
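A sketch of how those logs could be dumped for the nginx container, assuming the /var/run/docker/libcontainerd/containerd/<container-id>/ layout described above (the glob over per-exec subdirectories is an assumption; adjust if the layout differs on your install):

# Dump containerd's shim-log.json / log.json for the nginx container.
cid=$(docker inspect --format '{{.Id}}' nginx)
for f in /var/run/docker/libcontainerd/containerd/"${cid}"/*/shim-log.json \
         /var/run/docker/libcontainerd/containerd/"${cid}"/*/log.json; do
    [ -e "$f" ] || continue
    echo "=== $f ==="
    cat "$f"
done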
I've got a failure:

rpc error: code = 2 desc = containerd: container not started
OMG Tue Nov 15 18:39:44 EST 2016

I took the liberty of running docker ps, hope that's OK:

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
5e52ae521795        nginx               "nginx -g 'daemon off"   7 hours ago         Up 7 hours          80/tcp, 443/tcp     nginx

Relevant (to my eye) section of /var/log/messages:

Nov 15 18:39:38 esm-rhel7-d12-5 dockerd-current: time="2016-11-15T18:39:38.403054161-05:00" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"echo\",\"running\",\"in\",\"container\"],\"Detach\":false,\"DetachKeys\":\"\",\"Privileged\":false,\"Tty\":false,\"User\":\"\"}"
Nov 15 18:39:38 esm-rhel7-d12-5 dockerd-current: time="2016-11-15T18:39:38.403158712-05:00" level=info msg="{Action=exec, Username=root, LoginUID=0, PID=20707}"
Nov 15 18:39:38 esm-rhel7-d12-5 dockerd-current: time="2016-11-15T18:39:38.405863542-05:00" level=debug msg="AuthZ request using plugin rhel-push-plugin"
Nov 15 18:39:38 esm-rhel7-d12-5 dockerd-current: time="2016-11-15T18:39:38.406222991-05:00" level=debug msg="AuthZ response using plugin rhel-push-plugin"
Nov 15 18:39:38 esm-rhel7-d12-5 dockerd-current: time="2016-11-15T18:39:38.406779523-05:00" level=debug msg="Calling POST /v1.24/exec/3dd59c1e036e700a9402e2780d68a12a3b3716d2d91bd626c734c361db4ab42d/start"
Nov 15 18:39:38 esm-rhel7-d12-5 systemd: Scope libcontainer-20718-systemd-test-default-dependencies.scope has no PIDs. Refusing.
[ repeated 6,763 times ]
Nov 15 18:42:38 esm-rhel7-d12-5 journal: Suppressed 1461 messages from /system.slice/docker.service

No shim logs I can see, but I don't think we were expecting any:

# find /var/run/docker -name 'shim*' -ls
80972938    0 -rw-r--r--   1 root     root            0 Nov 15 10:48 /var/run/docker/libcontainerd/containerd/5e52ae521795ee5dad9d7ff5ce2181098445d2f9bb836149c56d2d33ead39f10/init/shim-log.json

Please contact me directly for access to the host. I will not run any more docker commands on it or do anything else that could affect its state.
Ran the reproducer for over 4 days using docker-latest.x86_64 0:1.12.3-9.el7. No errors were produced.
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.