Bug 1579814

Summary: Incorrect exit code of command executed inside container via python API
Product: Red Hat Enterprise Linux 7 Reporter: Daniele <dconsoli>
Component: python-docker-pyAssignee: Tomas Tomecek <ttomecek>
Status: CLOSED ERRATA QA Contact: atomic-bugs <atomic-bugs>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.5CC: amurdaca, dornelas, lsu
Target Milestone: rcKeywords: Extras
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-06-26 18:36:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1186913, 1561841    

Description Daniele 2018-05-18 11:53:47 UTC
Customer tried to run the following:

===============================
# docker ps -a --quiet | xargs docker rm -f
06eb4a8936e7
4658bc1ea116
# docker create -ti --name cli_test registry.access.redhat.com/rhel7.4 sleep infinity; docker start cli_test; docker exec -ti cli_test touch /test ; echo $?
97485390d26bd97caff4d1d64886b71e8fdaa8cd9d9f3b56f29effe523fb811e
cli_test
0
# python -c 'import docker; import pprint; c = docker.Client(); c.create_container(image="registry.access.redhat.com/rhel7.4", name="python_test", command="sleep infinity", tty=True, stdin_open=True); c.start("python_test"); id = c.exec_create("python_test", "touch /test", tty=True); c.exec_start(id); pprint.pprint(c.exec_inspect(id))'
{u'CanRemove': False,
 u'ContainerID': u'3412e0f488dd592ca078def74cee107ab8662312b2b0bc663b605a43eda1e579',
 u'DetachKeys': u'',
 u'ExitCode': None,
 u'ID': u'be2de26297383919e2ab26b04f76f760f6d28db7ab4b0718bbdfa363d8153c60',
 u'OpenStderr': True,
 u'OpenStdin': False,
 u'OpenStdout': True,
 u'Pid': 0,
 u'ProcessConfig': {u'arguments': [u'/test'],
                    u'entrypoint': u'touch',
                    u'privileged': False,
                    u'tty': True},
 u'Running': True}
# docker exec -ti cli_test ls /
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  rhel  root  run  sbin  srv  sys  test  tmp  usr  var
# docker exec -ti python_test ls /
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  rhel  root  run  sbin  srv  sys  test  tmp  usr  var

===============================

This exits with ExitCode=None instead of 1 or 0.

We suggested: ExitCode==None means that the command either haven't started or haven't finished. You probably need to start the exec. Can you try d.exec_start(...), after doing exec_create()?

Now the command finishes, /test is created inside the container, but ExitCode is still "None". They're expecting 1 or 0.

Comment 2 Tomas Tomecek 2018-05-18 14:45:28 UTC
Now I understand the confusion.

The problem Motorola is seeing is actually a race condition: the moment they invoke `c.exec_start(id); pprint.pprint(c.exec_inspect(id)`, the command haven't finished yet and hence dockerd reports `ExitCode: None`.

I played with this a bit and came up with this solution:


import docker

c = docker.Client()
c.create_container(image="rhel7", name="python_test", command="sleep infinity", tty=True, stdin_open=True)
c.start("python_test")
id = c.exec_create("python_test", "ls -lha /", tty=True)
stream = c.exec_start(id, detach=False, stream=True)
for i in stream:
    print(i)  # will print output of the command to stdout
ret = c.exec_inspect(id)
print("Exit code: %s" % ret["ExitCode"])


And running it:


[root@vm ~]# python t.py
total 0
drwxr-xr-x.   1 root root   6 May 18 14:40 .
drwxr-xr-x.   1 root root   6 May 18 14:40 ..
-rwxr-xr-x.   1 root root   0 May 18 14:40 .dockerenv
lrwxrwxrwx.   1 root root   7 Apr  4 17:06 bin -> usr/bin
dr-xr-xr-x.   2 root root   6 Dec 14 17:23 boot
drwxr-xr-x.   5 root root 360 May 18 14:40 dev
drwxr-xr-x.   1 root root  66 May 18 14:40 etc
drwxr-xr-x.   2 root root   6 Apr  4 17:08 home
lrwxrwxrwx.   1 root root   7 Apr  4 17:06 lib -> usr/lib
lrwxrwxrwx.   1 root root   9 Apr  4 17:06 lib64 -> usr/lib64
drwxr-xr-x.   2 root root   6 Dec 14 17:23 media
drwxr-xr-x.   2 root root   6 Dec 14 17:23 mnt
drwxr-xr-x.   2 root root   6 Dec 14 17:23 opt
dr-xr-xr-x. 108 root root   0 May 18 14:40 proc
dr-xr-x---.   1 root root  23 Apr  4 17:12 root
drwxr-xr-x.   1 root root  21 Apr  4 17:12 run
lrwxrwxrwx.   1 root root   8 Apr  4 17:06 sbin -> usr/sbin
drwxr-xr-x.   2 root root   6 Dec 14 17:23 srv
dr-xr-xr-x.  13 root root   0 May 18 14:14 sys
drwxrwxrwt.   7 root root 132 Apr  4 17:08 tmp
drwxr-xr-x.  13 root root 155 Apr  4 17:06 usr
drwxr-xr-x.  18 root root 238 Apr  4 17:06 var

Exit code: 0


When I used the touch command, I was still hitting the race condition, probably because 'touch' doesn't output anything to stdout (is this a bug in dockerd)?


I also recommend reading docker's source code on how it handles exec:

https://github.com/projectatomic/docker/blob/docker-1.13.1/cli/command/container/exec.go
https://github.com/projectatomic/docker/blob/docker-1.13.1/client/container_exec.go


Please let me know whether the proposed solution works for Motorola.

Comment 4 Tomas Tomecek 2018-05-30 10:56:26 UTC
If this is a regression, it is possible that this has changed in docker daemon.

Antonio, would you know more about this?

From my PoV, it's not a problem to add something like this into docker-py's method exec_start:

if not detach:
  while True:
    ret = c.exec_inspect(id)
    if ret["ExitCode"] is not None:
      break
    sleep(0.01)

To me, it really sounds like there is a problem in detach mechanic.

Comment 6 Tomas Tomecek 2018-06-01 09:40:34 UTC
I finally figured this out!

This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1501782

I was able to reproduce. As soon as I used the update from the bug above, the problem was gone.

Attaching test case for QE:


import docker
import pprint
import time

d = docker.Client()

runs_count = 10

for tty_value in (True, False):
    for i in range(runs_count):
        co = d.create_container(image="rhel7", command="sleep infinity", tty=tty_value)
        d.start(co)
        ex = d.exec_create(co, "touch /test", tty=tty_value)
        d.exec_start(ex, detach=False)
        # sl = float(i) / float(1000)
        sl = 0.001 * (2 ** i)
        time.sleep(sl)
        ins = d.exec_inspect(ex)
        ec = ins["ExitCode"]
        pprint.pprint("[%d] Exit code: %s, tty: %s, sleep: %s" % (i, ec, tty_value, sl))
        if ec is None:
            print "FAILURE"

Comment 9 Luwen Su 2018-06-14 08:58:05 UTC
Tried with:
python-docker-py-1.10.6-4.el7.noarch
docker-1.13.1-68.gitdded712.el7.x86_64

From the #Comment 0

# docker exec -ti python_test ls /
bin   dev  home  lib64	     media  opt   root	sbin  sys   tmp  var
boot  etc  lib	 lost+found  mnt    proc  run	srv   test  usr
# echo $?
0

# docker exec -ti cli_test ls /
bin   dev  home  lib64	     media  opt   root	sbin  sys   tmp  var
boot  etc  lib	 lost+found  mnt    proc  run	srv   test  usr
# echo $?
0

From the #Comment 6
# python 1.py 
'[0] Exit code: 0, tty: True, sleep: 0.001'
'[1] Exit code: 0, tty: True, sleep: 0.002'
'[2] Exit code: 0, tty: True, sleep: 0.004'
'[3] Exit code: 0, tty: True, sleep: 0.008'
'[4] Exit code: 0, tty: True, sleep: 0.016'
'[5] Exit code: 0, tty: True, sleep: 0.032'
'[6] Exit code: 0, tty: True, sleep: 0.064'
'[7] Exit code: 0, tty: True, sleep: 0.128'
'[8] Exit code: 0, tty: True, sleep: 0.256'
'[9] Exit code: 0, tty: True, sleep: 0.512'
'[0] Exit code: 0, tty: False, sleep: 0.001'
'[1] Exit code: 0, tty: False, sleep: 0.002'
'[2] Exit code: 0, tty: False, sleep: 0.004'
'[3] Exit code: 0, tty: False, sleep: 0.008'
'[4] Exit code: 0, tty: False, sleep: 0.016'
'[5] Exit code: 0, tty: False, sleep: 0.032'
'[6] Exit code: 0, tty: False, sleep: 0.064'
'[7] Exit code: 0, tty: False, sleep: 0.128'
'[8] Exit code: 0, tty: False, sleep: 0.256'
'[9] Exit code: 0, tty: False, sleep: 0.512'

Comment 11 errata-xmlrpc 2018-06-26 18:36:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2025