Bug 1309900

Summary: docker pull wedges, daemon restart required
Product: Red Hat Enterprise Linux 7
Reporter: Chris Evich <cevich>
Component: docker
Assignee: Daniel Walsh <dwalsh>
Status: CLOSED ERRATA
QA Contact: atomic-bugs <atomic-bugs>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 7.3
CC: amurdaca, cevich, lsm5, lsu, vgoyal, walters
Target Milestone: rc
Keywords: Extras
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-06-23 16:17:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Chris Evich 2016-02-18 22:29:04 UTC
Description of problem:
Found this on a "continuous-build" VM constructed as follows: start with the latest RHEL Atomic Host, then apply an updated os-tree composed from the latest brew builds plus recent builds from several upstream projects (docker, atomic, etc.). I totally understand this could end up being a garbage-in-garbage-out case.

Version-Release number of selected component (if applicable):
-bash-4.2# rpm -q atomic
atomic-1.8.63-d14c26ba39613990cf0be1bfcff8cc730923798c.451f96f7c7eeaeb7efe364cc7bcbe269f351ff21.el7.x86_64
-bash-4.2# rpm -q docker
docker-1.4.1.7082-b720092d04d672d3639882045cd52aba090f916b.F.13.split.116.5a94566a184652f126ec2fcd5acb67ac57512428.el7.x86_64

-bash-4.2# docker version
Client:
 Version:         1.9.1-el7
 API version:     1.21
 Package version: docker-1.4.1.7082-b720092d04d672d3639882045cd52aba090f916b.F.13.split.116.5a94566a184652f126ec2fcd5acb67ac57512428.el7.x86_64
 Go version:      go1.6rc1
 Git commit:      21ef73f-dirty
 Built:           Thu Jan 28 21:21:16 UTC 2016
 OS/Arch:         linux/amd64

How reproducible:
Not easily

Steps to Reproduce:
1. Run playbook against test machine w/ latest RHEL AH 7.2.1 - internal-link: https://url.corp.redhat.com/97cd50c - passing it 'ostree_type=stage' (via -e)
2. Connect to host
3. Edit /etc/sysconfig/docker to point at stage repo - internal-link: https://url.corp.redhat.com/586d681
4. Restart docker daemon
5. Run docker pull rhel7/rhel-tools:latest

Actual results:
Docker pull wedged forever (possibly due to excessive registry load).  Further 'docker pull' commands also become wedged.  Only recourse is to restart docker daemon, killing any existing/running/happy containers.  The registry host is ping-able.

Expected results:
Docker should time out "eventually" and/or abort the pull if the connection to the registry is "unresponsive", a timeout expires, or throughput to the registry is excessively "slow".

Here "eventually", "unresponsive", and "slow" should match general-use-case user expectations.
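
As a stopgap on the client side, a test harness could put its own deadline around the pull so the calling job does not hang forever. The following is only a minimal sketch under stated assumptions: run_with_deadline is a hypothetical helper name, not part of docker or atomic, and the deadline value is arbitrary. It bounds only the client process; it does not unwedge the daemon, so later pulls may still hang until the daemon is restarted.

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline_s):
    """Run cmd and return (exit_code, timed_out).

    Hypothetical helper for illustration. subprocess.run() kills the
    child and raises TimeoutExpired once deadline_s seconds have passed,
    so a hung client process cannot block the caller indefinitely.
    """
    try:
        completed = subprocess.run(cmd, timeout=deadline_s)
    except subprocess.TimeoutExpired:
        # Report which command ran out of time; the exit code is unknown.
        print("deadline of %ss exceeded: %s" % (deadline_s, " ".join(cmd)),
              file=sys.stderr)
        return None, True
    return completed.returncode, False
```

A harness might call run_with_deadline(["docker", "pull", "rhel7/rhel-tools:latest"], 600) and treat a timed_out result as a signal to collect daemon logs before anything restarts the daemon.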

Additional info:
-bash-4.2# docker info
Containers: 0
Images: 1
Server Version: 1.9.1-el7
Storage Driver: devicemapper
 Pool Name: atomicos-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 107.4 GB
 Backing Filesystem: xfs
 Data file: 
 Metadata file: 
 Data Space Used: 558.4 MB
 Data Space Total: 15.77 GB
 Data Space Available: 15.22 GB
 Metadata Space Used: 90.11 kB
 Metadata Space Total: 46.14 MB
 Metadata Space Available: 46.05 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2015-10-14)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.10.0-327.el7.x86_64
Operating System: Red Hat Enterprise Linux Atomic Host 7.2
CPUs: 2
Total Memory: 3.703 GiB
Name: rhelah72-jxfhw.novalocal
ID: 5XKI:XZGF:3O5T:KJSB:3G5F:JBNY:5MOV:2ANQ:2ZKI:77MP:C5QJ:I76N
WARNING: bridge-nf-call-ip6tables is disabled
-bash-4.2# atomic --version
Traceback (most recent call last):
  File "/usr/bin/atomic", line 34, in <module>
    from Atomic.verify import Verify
  File "/usr/lib/python2.7/site-packages/Atomic/verify.py", line 4, in <module>
    from docker.errors import NotFound
ImportError: cannot import name NotFound

Comment 3 Daniel Walsh 2016-02-19 14:02:22 UTC
Any chance you can try this with docker-1.10.

What was the rpm version of the docker-1.9?

Need to get an update of the atomic package also.

Comment 4 Chris Evich 2016-02-19 16:13:57 UTC
(In reply to Daniel Walsh from comment #3)
> Any chance you can try this with docker-1.10.
> 
> What was the rpm version of the docker-1.9?
> 
> Need to get an update of the atomic package also.

Versions are in the description above (the really long fugly-looking name).  It's Greek to me, but IIUC those should be commit IDs in the name.  I can find out more details if you'd like to know where it came from / how it was built. Just let me know.

Still trying to re-reproduce this w/ -D to the daemon, it's not cooperating consistently.  Seems likely it's dependent on state/status of the network and registry it's using.  Still trying...

Comment 5 Daniel Walsh 2016-02-19 16:21:00 UTC
docker-1.4.1.7082-b720092d04d672d3639882045cd52aba090f916b.F.13.split.116.5a94566a184652f126ec2fcd5acb67ac57512428.el7.x86_64

Well if this is what you are talking about it looks a little (Lot ) out of date.

Comment 7 Colin Walters 2016-02-19 16:24:34 UTC
(In reply to Daniel Walsh from comment #5)
> docker-1.4.1.7082-b720092d04d672d3639882045cd52aba090f916b.F.13.split.116.
> 5a94566a184652f126ec2fcd5acb67ac57512428.el7.x86_64
> 
> Well if this is what you are talking about it looks a little (Lot ) out of
> date.

Yeah, unfortunately:
http://gitlab.osas.lab.eng.rdu2.redhat.com/walters/rhel-atomic-host-continuous/blob/master/overlay.yml#L95

rpmdistro-gitoverlay still hasn't quite yet learned how to build all 4 git repos in docker.spec.

Comment 8 Chris Evich 2016-02-19 16:29:30 UTC
(In reply to Daniel Walsh from comment #5)
> Well if this is what you are talking about it looks a little (Lot ) out of
> date.

Colin,

Could you share some insight into how docker & atomic were built in this os-tree?  

-bash-4.2# rpm -q atomic
atomic-1.8.63-d14c26ba39613990cf0be1bfcff8cc730923798c.451f96f7c7eeaeb7efe364cc7bcbe269f351ff21.el7.x86_64
-bash-4.2# rpm -q docker
docker-1.4.1.7082-b720092d04d672d3639882045cd52aba090f916b.F.13.split.116.5a94566a184652f126ec2fcd5acb67ac57512428.el7.x86_64

Here's the console from the job that produced the vm:
https://url.corp.redhat.com/6a6e913

It's from your 'continuous' playbook.  Links to build-logs or anything like that could be helpful as well.  Thanks.

Comment 9 Chris Evich 2016-02-19 16:30:11 UTC
Eek, "Mid-air collision detected!" hehe, was just gunna ping you Colin :D

Comment 10 Antonio Murdaca 2016-02-19 17:03:27 UTC
(In reply to Daniel Walsh from comment #5)
> docker-1.4.1.7082-b720092d04d672d3639882045cd52aba090f916b.F.13.split.116.
> 5a94566a184652f126ec2fcd5acb67ac57512428.el7.x86_64
> 
> Well if this is what you are talking about it looks a little (Lot ) out of
> date.

But it seems like the docker actually running is 1.9.1-el7 (weird).

Comment 11 Daniel Walsh 2016-06-03 18:23:56 UTC
Since we have newer dockers, I am going to mark this as fixed in the new release.

Comment 12 Chris Evich 2016-06-06 14:30:25 UTC
Yep, NP.

Comment 13 Daniel Walsh 2016-06-07 14:58:15 UTC
Fixed in docker-1.10

Comment 18 errata-xmlrpc 2016-06-23 16:17:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1274