Bug 1579814
| Summary: | Incorrect exit code of command executed inside container via python API | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Daniele <dconsoli> |
| Component: | python-docker-py | Assignee: | Tomas Tomecek <ttomecek> |
| Status: | CLOSED ERRATA | QA Contact: | atomic-bugs <atomic-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.5 | CC: | amurdaca, dornelas, lsu |
| Target Milestone: | rc | Keywords: | Extras |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-06-26 18:36:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1186913, 1561841 | ||
Now I understand the confusion.
The problem Motorola is seeing is actually a race condition: the moment they invoke `c.exec_start(id); pprint.pprint(c.exec_inspect(id)`, the command haven't finished yet and hence dockerd reports `ExitCode: None`.
I played with this a bit and came up with this solution:
import docker
c = docker.Client()
c.create_container(image="rhel7", name="python_test", command="sleep infinity", tty=True, stdin_open=True)
c.start("python_test")
id = c.exec_create("python_test", "ls -lha /", tty=True)
stream = c.exec_start(id, detach=False, stream=True)
for i in stream:
print(i) # will print output of the command to stdout
ret = c.exec_inspect(id)
print("Exit code: %s" % ret["ExitCode"])
And running it:
[root@vm ~]# python t.py
total 0
drwxr-xr-x. 1 root root 6 May 18 14:40 .
drwxr-xr-x. 1 root root 6 May 18 14:40 ..
-rwxr-xr-x. 1 root root 0 May 18 14:40 .dockerenv
lrwxrwxrwx. 1 root root 7 Apr 4 17:06 bin -> usr/bin
dr-xr-xr-x. 2 root root 6 Dec 14 17:23 boot
drwxr-xr-x. 5 root root 360 May 18 14:40 dev
drwxr-xr-x. 1 root root 66 May 18 14:40 etc
drwxr-xr-x. 2 root root 6 Apr 4 17:08 home
lrwxrwxrwx. 1 root root 7 Apr 4 17:06 lib -> usr/lib
lrwxrwxrwx. 1 root root 9 Apr 4 17:06 lib64 -> usr/lib64
drwxr-xr-x. 2 root root 6 Dec 14 17:23 media
drwxr-xr-x. 2 root root 6 Dec 14 17:23 mnt
drwxr-xr-x. 2 root root 6 Dec 14 17:23 opt
dr-xr-xr-x. 108 root root 0 May 18 14:40 proc
dr-xr-x---. 1 root root 23 Apr 4 17:12 root
drwxr-xr-x. 1 root root 21 Apr 4 17:12 run
lrwxrwxrwx. 1 root root 8 Apr 4 17:06 sbin -> usr/sbin
drwxr-xr-x. 2 root root 6 Dec 14 17:23 srv
dr-xr-xr-x. 13 root root 0 May 18 14:14 sys
drwxrwxrwt. 7 root root 132 Apr 4 17:08 tmp
drwxr-xr-x. 13 root root 155 Apr 4 17:06 usr
drwxr-xr-x. 18 root root 238 Apr 4 17:06 var
Exit code: 0
When I used the touch command, I was still hitting the race condition, probably because 'touch' doesn't output anything to stdout (is this a bug in dockerd)?
I also recommend reading docker's source code on how it handles exec:
https://github.com/projectatomic/docker/blob/docker-1.13.1/cli/command/container/exec.go
https://github.com/projectatomic/docker/blob/docker-1.13.1/client/container_exec.go
Please let me know whether the proposed solution works for Motorola.
If this is a regression, it is possible that this has changed in docker daemon.
Antonio, would you know more about this?
From my PoV, it's not a problem to add something like this into docker-py's method exec_start:
if not detach:
while True:
ret = c.exec_inspect(id)
if ret["ExitCode"] is not None:
break
sleep(0.01)
To me, it really sounds like there is a problem in detach mechanic.
I finally figured this out! This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1501782 I was able to reproduce. As soon as I used the update from the bug above, the problem was gone. Attaching test case for QE: import docker import pprint import time d = docker.Client() runs_count = 10 for tty_value in (True, False): for i in range(runs_count): co = d.create_container(image="rhel7", command="sleep infinity", tty=tty_value) d.start(co) ex = d.exec_create(co, "touch /test", tty=tty_value) d.exec_start(ex, detach=False) # sl = float(i) / float(1000) sl = 0.001 * (2 ** i) time.sleep(sl) ins = d.exec_inspect(ex) ec = ins["ExitCode"] pprint.pprint("[%d] Exit code: %s, tty: %s, sleep: %s" % (i, ec, tty_value, sl)) if ec is None: print "FAILURE" Tried with: python-docker-py-1.10.6-4.el7.noarch docker-1.13.1-68.gitdded712.el7.x86_64 From the #Comment 0 # docker exec -ti python_test ls / bin dev home lib64 media opt root sbin sys tmp var boot etc lib lost+found mnt proc run srv test usr # echo $? 0 # docker exec -ti cli_test ls / bin dev home lib64 media opt root sbin sys tmp var boot etc lib lost+found mnt proc run srv test usr # echo $? 0 From the #Comment 6 # python 1.py '[0] Exit code: 0, tty: True, sleep: 0.001' '[1] Exit code: 0, tty: True, sleep: 0.002' '[2] Exit code: 0, tty: True, sleep: 0.004' '[3] Exit code: 0, tty: True, sleep: 0.008' '[4] Exit code: 0, tty: True, sleep: 0.016' '[5] Exit code: 0, tty: True, sleep: 0.032' '[6] Exit code: 0, tty: True, sleep: 0.064' '[7] Exit code: 0, tty: True, sleep: 0.128' '[8] Exit code: 0, tty: True, sleep: 0.256' '[9] Exit code: 0, tty: True, sleep: 0.512' '[0] Exit code: 0, tty: False, sleep: 0.001' '[1] Exit code: 0, tty: False, sleep: 0.002' '[2] Exit code: 0, tty: False, sleep: 0.004' '[3] Exit code: 0, tty: False, sleep: 0.008' '[4] Exit code: 0, tty: False, sleep: 0.016' '[5] Exit code: 0, tty: False, sleep: 0.032' '[6] Exit code: 0, tty: False, sleep: 0.064' '[7] Exit code: 0, tty: False, sleep: 0.128' '[8] Exit code: 0, tty: False, sleep: 0.256' '[9] Exit code: 0, tty: False, sleep: 0.512' Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2025 |
Customer tried to run the following: =============================== # docker ps -a --quiet | xargs docker rm -f 06eb4a8936e7 4658bc1ea116 # docker create -ti --name cli_test registry.access.redhat.com/rhel7.4 sleep infinity; docker start cli_test; docker exec -ti cli_test touch /test ; echo $? 97485390d26bd97caff4d1d64886b71e8fdaa8cd9d9f3b56f29effe523fb811e cli_test 0 # python -c 'import docker; import pprint; c = docker.Client(); c.create_container(image="registry.access.redhat.com/rhel7.4", name="python_test", command="sleep infinity", tty=True, stdin_open=True); c.start("python_test"); id = c.exec_create("python_test", "touch /test", tty=True); c.exec_start(id); pprint.pprint(c.exec_inspect(id))' {u'CanRemove': False, u'ContainerID': u'3412e0f488dd592ca078def74cee107ab8662312b2b0bc663b605a43eda1e579', u'DetachKeys': u'', u'ExitCode': None, u'ID': u'be2de26297383919e2ab26b04f76f760f6d28db7ab4b0718bbdfa363d8153c60', u'OpenStderr': True, u'OpenStdin': False, u'OpenStdout': True, u'Pid': 0, u'ProcessConfig': {u'arguments': [u'/test'], u'entrypoint': u'touch', u'privileged': False, u'tty': True}, u'Running': True} # docker exec -ti cli_test ls / bin boot dev etc home lib lib64 media mnt opt proc rhel root run sbin srv sys test tmp usr var # docker exec -ti python_test ls / bin boot dev etc home lib lib64 media mnt opt proc rhel root run sbin srv sys test tmp usr var =============================== This exits with ExitCode=None instead of 1 or 0. We suggested: ExitCode==None means that the command either haven't started or haven't finished. You probably need to start the exec. Can you try d.exec_start(...), after doing exec_create()? Now the command finishes, /test is created inside the container, but ExitCode is still "None". They're expecting 1 or 0.