Bug 1374265

Summary: [extras-rhel-7.3.0] docker-py 1.7.2 hangs when trying to access ctr stats against docker 1.10
Product: Red Hat Enterprise Linux 7
Component: docker
Version: 7.2
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Keywords: Extras
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Reporter: Thomas Wiest <twiest>
Assignee: Antonio Murdaca <amurdaca>
QA Contact: atomic-bugs <atomic-bugs>
CC: gouyang, lsm5, ttomecek
Type: Bug
Doc Type: If docs needed, set a value
Last Closed: 2016-11-04 09:09:23 UTC

Description Thomas Wiest 2016-09-08 11:25:12 UTC
Description of problem:
When trying to access a container's stats using docker-py 1.7.2 against a docker 1.10 daemon, the call hangs for 2 minutes and then exits with the following error:

requests.exceptions.ReadTimeout: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=120)


I'm able to do other operations, like listing containers; it only seems to hang and error out on the stats call.

This works against docker 1.9.




Version-Release number of selected component (if applicable):
python-docker-py-1.7.2-1.el7.noarch
docker-1.10.3-46.el7.10.x86_64




How reproducible:
Consistently reproducible with the following script:


#!/usr/bin/env python

from docker import AutoVersionClient

cli = AutoVersionClient(base_url='unix://var/run/docker.sock', timeout=120)

first_ctr = cli.containers()[0]
name = first_ctr['Names'][0]
print name

stats = cli.stats(first_ctr['Id'], stream=False)
print stats





Steps to Reproduce:
1. Make sure you're running against docker-1.10 with docker-py 1.7.2
2. Run the reproducer script above
3. Notice that it hangs after printing the container name; after 2 minutes it errors out.
4. Run the same reproducer against docker-1.9 and notice that it works there.





Actual results:
$ time ./reproducer.py 
/k8s_fluentd-elasticsearch.c2b7b021_logging-fluentd-0fyea_logging_6d005e90-743d-11e6-a23f-0a3cb09b029f_132818d6
Traceback (most recent call last):
  File "./reproducer.py", line 11, in <module>
    stats = cli.stats(first_ctr['Id'], stream=False)
  File "/usr/lib/python2.7/site-packages/docker/utils/decorators.py", line 35, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/docker/utils/decorators.py", line 21, in wrapped
    return f(self, resource_id, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/docker/api/container.py", line 373, in stats
    return self._result(self._get(url, params={'stream': False}),
  File "/usr/lib/python2.7/site-packages/docker/utils/decorators.py", line 47, in inner
    return f(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/docker/client.py", line 112, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 476, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 464, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 433, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=120)

real	2m0.233s
user	0m0.085s
sys	0m0.027s






Expected results:
This is what happens when I run the same reproducer against docker-1.9.1-40.el7.x86_64:

$ time ./reproducer.py 
/k8s_router.a7369d28_router-10-b2nrs_default_5f03cf62-7544-11e6-a051-124ba03dd4fb_6b159c64
{u'read': u'2016-09-08T11:15:32.691956665Z', u'memory_stats': {u'usage': 87830528, u'limit': 2000000000, u'failcnt': 0, u'stats': {u'unevictable': 0, u'total_inactive_file': 33382400 [... snip ...]

real	0m2.132s
user	0m0.139s
sys	0m0.116s

Comment 1 Thomas Wiest 2016-09-08 11:37:16 UTC
I tried upgrading to the latest docker-py using pip, and I got the same results.

So it doesn't look like this is fixed in newer versions of docker-py.

$ pip list | grep docker-py
You are using pip version 7.1.0, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
docker-py (1.9.0)

$ time ./reproducer.py 
/k8s_fluentd-elasticsearch.c2b7b021_logging-fluentd-0fyea_logging_6d005e90-743d-11e6-a23f-0a3cb09b029f_132818d6
Traceback (most recent call last):
  File "./reproducer.py", line 11, in <module>
    stats = cli.stats(first_ctr['Id'], stream=False)
  File "/usr/lib/python2.7/site-packages/docker/utils/decorators.py", line 35, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/docker/utils/decorators.py", line 21, in wrapped
    return f(self, resource_id, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/docker/api/container.py", line 378, in stats
    return self._result(self._get(url, params={'stream': False}),
  File "/usr/lib/python2.7/site-packages/docker/utils/decorators.py", line 47, in inner
    return f(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/docker/client.py", line 140, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 488, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 596, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 499, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=120)

real	2m0.171s
user	0m0.083s
sys	0m0.023s

Comment 3 Tomas Tomecek 2016-09-09 10:00:22 UTC
I'm still not able to reproduce this because my local libvirt isn't working, so I can't even access RHEL 7 (https://bugzilla.redhat.com/show_bug.cgi?id=1374325, https://bugzilla.redhat.com/show_bug.cgi?id=1374486), and our internal OpenStack won't even start a VM.


So in the meantime I checked the docker engine sources, and this is what I came up with:

The code that forms the HTTP request:

https://github.com/projectatomic/docker/blob/rhel7-1.10.3/vendor/src/github.com/docker/engine-api/client/container_stats.go#L10

The code that processes the response:

https://github.com/projectatomic/docker/blob/rhel7-1.10.3/api/client/stats.go#L40


The interesting thing is that regardless of whether you specify `stream`, the server always responds with `Transfer-Encoding: chunked`, which is odd. If the client is interested in only a single stats instance, the server should send the JSON in a single response and close the connection, not return a stream. You can easily observe this:

$ ncat -U /var/run/docker.sock
GET /v1.22/containers/580dc1eaae69/stats?stream=0 HTTP/1.1  # press <enter> twice and wait for response

HTTP/1.1 200 OK
Content-Type: application/json
Server: Docker/1.10.3 (linux)
Date: Fri, 09 Sep 2016 09:57:20 GMT
Transfer-Encoding: chunked

1127
{"read":"2016-09-09T09:57:20.896712812Z","precpu_stats":{"cpu_usage":{"total_usage":981229


The docker client handles this odd behavior by doing only a single iteration while processing the response, then breaking out of the loop [1]; that's how it prints a single stats instance with --no-stream. The funny thing is that the client sets a "timeout" of two seconds [2], and if it doesn't get a response within that timeframe it just prints all zeros. That's what you'd see with the docker client: instead of saying "hey, I received no stats", it prints zeros. To me, docker-py behaves correctly here: it keeps waiting and eventually times out.
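
A possible stopgap on the docker-py side (a sketch only, not a fix for the engine behavior) is to mimic the docker client's single-iteration trick: request the streaming endpoint and read exactly one sample. This assumes `cli` and `first_ctr` from the reproducer above, and that this docker-py version accepts decode=True in stats():

# Workaround sketch (not the engine fix): request the streaming stats
# endpoint and take exactly one sample, mirroring the docker client's
# single iteration. `cli` and `first_ctr` are from the reproducer above.
stats_stream = cli.stats(first_ctr['Id'], decode=True, stream=True)
first_sample = next(stats_stream)  # returns as soon as the first chunk arrives
print first_sample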

Hence it's pretty clear that the issue is in the docker engine: it doesn't respond with the stats in time, which is why the client just keeps waiting and gets nothing.


Therefore I'm reassigning this to the docker component.


[1] https://github.com/projectatomic/docker/blob/rhel7-1.10.3/api/client/stats.go#L86
[2] https://github.com/projectatomic/docker/blob/rhel7-1.10.3/api/client/stats.go#L93

Comment 4 Antonio Murdaca 2016-09-09 12:15:03 UTC
Is this happening with upstream docker and docker-py as well? I doubt this went unnoticed upstream; we probably need to backport a patch for this.

Comment 5 Thomas Wiest 2016-09-09 13:02:13 UTC
@amurdaca: Looks like upstream is experiencing something similar (found by ttomecek):

https://github.com/docker/docker-py/issues/1195

Comment 6 Tomas Tomecek 2016-09-09 14:51:52 UTC
I have found and fixed the issue in 1.10: https://github.com/projectatomic/docker/pull/195. Right now I'm working on the upstream fix.

Comment 7 Tomas Tomecek 2016-09-09 15:15:50 UTC
Upstream PR: https://github.com/docker/docker/pull/25905; my PR is just a backport.

Comment 8 Antonio Murdaca 2016-09-09 15:22:56 UTC
Backported to docker-1.12 and 1.10.3 with https://github.com/projectatomic/docker/pull/195.

Comment 11 errata-xmlrpc 2016-11-04 09:09:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2634.html