Bug 1389474

Summary: Output from stdout for `docker exec` sometimes isn't returned / displayed for docker 1.11.x and 1.12.x
Product: Red Hat Enterprise Linux 7 Reporter: Andy Goldstein <agoldste>
Component: dockerAssignee: Nalin Dahyabhai <nalin>
Status: CLOSED WONTFIX QA Contact: atomic-bugs <atomic-bugs>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.3CC: amurdaca, ddarrah, eparis, imcleod, jhonce, lfriedma, lsm5, pasik, santiago, xtian
Target Milestone: rcKeywords: Extras
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1397182 (view as bug list) Environment:
Last Closed: 2020-12-15 07:48:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1397182    

Description Andy Goldstein 2016-10-27 16:16:10 UTC
Description of problem: Output from docker exec sometimes isn't returned/displayed. I've seen this in both 1.11.2 and 1.12.1. I think this may apply to docker attach as well, but I have only ever tested with exec.


Version-Release number of selected component (if applicable): 1.11.x, 1.12.x


How reproducible: sporadically, but often enough to be problematic


Steps to Reproduce:
1. sudo docker run -d --name nginx nginx
2. while true; do output=$(sudo docker exec nginx echo running in container); echo "${output}"; if [ "${output}" != "running in container" ]; then echo OMG; break; fi; done

Actual results:

After numerous iterations, each printing out running in container correctly, I see a blank line followed by my error indicator (OMG). Here is the tail of the results:

running in container
running in container
running in container

OMG


Expected results: An endless display of running in container and never any blank lines.


Additional info:

Upstream issue: https://github.com/docker/docker/issues/27289
Fixed by: https://github.com/docker/docker/pull/27467

Comment 2 Ian McLeod 2016-10-31 20:48:41 UTC
Candidate fix is being worked/tested here:

https://github.com/projectatomic/docker/pull/212

Comment 3 Ian McLeod 2016-11-14 19:28:41 UTC
Andy confirmed in the github comment above that Nalin's patch resolves this issue.  The patch is included in the package we originally delivered at 7.3.1 dev freeze.

Comment 4 Laurie Friedman 2016-11-14 19:46:24 UTC
Ian - does this BZ need to be included in an errata so QE can verify / close it?

Comment 6 Luwen Su 2016-11-15 06:20:25 UTC
The fix in docker-1.12.3-4.el7.x86_64 works well for me, move to verified

Comment 7 David Darrah/Red Hat QE 2016-11-15 12:39:04 UTC
was able to verify last night on rhel 7.3.0 with following setup:

running in container
running in container
running in container
running in container
running in container
rpc error: code = 2 desc = containerd: container not started
OMG
[root@rhel-730 ~]# 
[root@rhel-730 ~]# docker ps
CONTAINER ID        IMAGE                                     COMMAND                  CREATED             STATUS              PORTS                            NAMES
ea8413ec4d93        httpd                                     "httpd-foreground"       14 hours ago        Up 14 hours         80/tcp                           loving_agnesi
165d5a7aba11        nginx                                     "nginx -g 'daemon off"   16 hours ago        Up 16 hours         80/tcp, 443/tcp                  nginx
04f9249f23e9        cockpit/kubernetes:latest                 "/usr/libexec/cockpit"   16 hours ago        Up 16 hours         0.0.0.0:9090->9090/tcp           atomic-registry-console
5d3bd0447229        openshift/origin:latest                   "/usr/bin/openshift s"   16 hours ago        Up 16 hours         53/tcp, 0.0.0.0:8443->8443/tcp   atomic-registry-master
b1f6b13574e2        openshift/origin-docker-registry:latest   "/bin/sh -c 'DOCKER_R"   16 hours ago        Up 16 hours                                          atomic-registry
[root@rhel-730 ~]# docker version
Client:
 Version:         1.12.3
 API version:     1.24
 Package version: docker-common-1.12.3-4.el7.x86_64
 Go version:      go1.6.2
 Git commit:      f320458-redhat
 Built:           Mon Nov  7 10:15:24 2016
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.3
 API version:     1.24
 Package version: docker-common-1.12.3-4.el7.x86_64
 Go version:      go1.6.2
 Git commit:      f320458-redhat
 Built:           Mon Nov  7 10:15:24 2016
 OS/Arch:         linux/amd64
[root@rhel-730 ~]#

Comment 8 David Darrah/Red Hat QE 2016-11-15 12:43:44 UTC
multiple setups (different machines, users) reproduced using the same docker version.

Comment 9 Andy Goldstein 2016-11-15 13:48:12 UTC
David, the output you pasted is a different failure condition. The failure I experienced was that when you run 'docker exec ...', sometimes you get no output whatsoever. Your failure shows that docker/containerd thinks that the container isn't started for some reason.

Do you have the docker daemon logs from around the time this error occurred?

Comment 10 Ian McLeod 2016-11-15 15:14:56 UTC
Andy,

So, we are unable to reproduce the original "no output" issue, but may now have unmasked or created a new issue where exec thinks the container hasn't started.

Should we close this BZ out and open up a new one to track the new problem?  (And, if we do, should the new BZ be an immediate blocker?)

Comment 11 Andy Goldstein 2016-11-15 15:21:08 UTC
It's hard to say without root causing the new failure if Nalin's patch is the issue. I would recommend verifying/closing this BZ and opening a new one to track the 'container not started' issue. I initially was going to say the new bug shouldn't be a blocker, but I'm starting to lean toward 'yes' given that David said he can reproduce the issue in multiple environments. David, could you please clarify how many environments and how easy it was to reproduce? Also, was it always "container not started", or did you sometimes get a blank line (this issue)?

Comment 13 David Darrah/Red Hat QE 2016-11-15 15:30:35 UTC
I see the difference between our issue and the original - fine with calling it a new issue and opening a new bug.

Comment 14 Ed Santiago 2016-11-15 15:44:19 UTC
System logs from the failure on my virt. Not useful, but including anyway:

    Nov 15 00:15:04 esm-rhel7-d12-5 dockerd-current: time="2016-11-15T00:15:04.332109579-05:00" level=info msg="{Action=start, Username=root, LoginUID=0, PID=22894}"
    Nov 15 00:15:04 esm-rhel7-d12-5 dockerd-current: time="2016-11-15T00:15:04.386343921-05:00" level=error msg="Error running exec in container: rpc error: code = 2 desc = containerd: container not started"

docker-1.12.3-4.el7.x86_64

Comment 15 Nalin Dahyabhai 2016-11-15 16:35:05 UTC
Do we have the journal output (journalctl -u docker -n 100) from the time of the failure?  There could be information about why the container wasn't started.

Comment 16 Antonio Murdaca 2016-11-15 16:45:50 UTC
Would be nice to also have "shim-log.json" and "log.json" (https://github.com/docker/containerd/blob/master/runtime/container.go#L617 and https://github.com/docker/containerd/blob/master/runtime/container.go#L629)
Those should be somewhere in /var/run/docker (or /var/run/containerd to my best knowledge)

Comment 17 Antonio Murdaca 2016-11-15 16:46:37 UTC
(In reply to Antonio Murdaca from comment #16)
> Would be nice to also have "shim-log.json" and "log.json"
> (https://github.com/docker/containerd/blob/master/runtime/container.go#L617
> and
> https://github.com/docker/containerd/blob/master/runtime/container.go#L629)
> Those should be somewhere in /var/run/docker (or /var/run/containerd to my
> best knowledge)

reason for this is that it seems that "container not started" error can only come from that code path inside this if https://github.com/docker/containerd/blob/master/runtime/container.go#L614

Comment 18 Nalin Dahyabhai 2016-11-15 17:11:54 UTC
In particular, after running 'docker start nginx' to force the container's filesystem to be mounted, the contents of /var/run/docker/libcontainerd/containerd/ should have containerd's logs.

Comment 19 Ed Santiago 2016-11-15 23:50:55 UTC
I've got a failure:

    rpc error: code = 2 desc = containerd: container not started
    OMG
    Tue Nov 15 18:39:44 EST 2016

I took the liberty of running docker ps, hope that's OK:

    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
    5e52ae521795        nginx               "nginx -g 'daemon off"   7 hours ago         Up 7 hours          80/tcp, 443/tcp     nginx

Relevant (to my eye) section of /var/log/messages:

    Nov 15 18:39:38 esm-rhel7-d12-5 dockerd-current: time="2016-11-15T18:39:38.403054161-05:00" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"echo\",\"running\",\"in\",\"container\"],\"Detach\":false,\"DetachKeys\":\"\",\"Privileged\":false,\"Tty\":false,\"User\":\"\"}"
    Nov 15 18:39:38 esm-rhel7-d12-5 dockerd-current: time="2016-11-15T18:39:38.403158712-05:00" level=info msg="{Action=exec, Username=root, LoginUID=0, PID=20707}"
    Nov 15 18:39:38 esm-rhel7-d12-5 dockerd-current: time="2016-11-15T18:39:38.405863542-05:00" level=debug msg="AuthZ request using plugin rhel-push-plugin"
    Nov 15 18:39:38 esm-rhel7-d12-5 dockerd-current: time="2016-11-15T18:39:38.406222991-05:00" level=debug msg="AuthZ response using plugin rhel-push-plugin"
    Nov 15 18:39:38 esm-rhel7-d12-5 dockerd-current: time="2016-11-15T18:39:38.406779523-05:00" level=debug msg="Calling POST /v1.24/exec/3dd59c1e036e700a9402e2780d68a12a3b3716d2d91bd626c734c361db4ab42d/start"
    Nov 15 18:39:38 esm-rhel7-d12-5 systemd: Scope libcontainer-20718-systemd-test-default-dependencies.scope has no PIDs. Refusing.
    [ repeated 6,763 times ]
    Nov 15 18:42:38 esm-rhel7-d12-5 journal: Suppressed 1461 messages from /system.slice/docker.service

No shim logs I can see, but I don't think we were expecting any:

    # find /var/run/docker -name 'shim*' -ls
    80972938    0 -rw-r--r--   1 root     root            0 Nov 15 10:48 /var/run/docker/libcontainerd/containerd/5e52ae521795ee5dad9d7ff5ce2181098445d2f9bb836149c56d2d33ead39f10/init/shim-log.json

Please contact me directly for access to the host. I will not run any more docker commands on it or do anything else that could affect its state.

Comment 22 David Darrah/Red Hat QE 2016-11-21 16:30:00 UTC
Ran the reproducer for over 4 days using docker-latest.x86_64 0:1.12.3-9.el7.  No errors were produced.

Comment 24 RHEL Program Management 2020-12-15 07:48:34 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.