Bug 1189840

Summary: [atomic] Negative DNS caching of the dockerd breaks docker pull
Product: [Retired] Atomic
Reporter: Ladislav Jozsa <ljozsa>
Component: docker-io
Assignee: Michal Minar <miminar>
Status: CLOSED UPSTREAM
QA Contact: Ladislav Jozsa <ljozsa>
Severity: medium
Docs Contact:
Priority: medium
Version: unspecified
CC: dwalsh, walters
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-06-03 12:18:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1208834
Bug Blocks: 1147383
Attachments:
Description       Flags
go reproducer     none
go reproducer     none
ping in python    none

Description Ladislav Jozsa 2015-02-05 15:26:58 UTC
Description of problem:
When running 'docker pull fedora' without an internet connection (networking down), dockerd remembers that it cannot resolve index.docker.io and keeps failing to pull the image even after networking is back up.

Version-Release number of selected component (if applicable):
docker-1.4.1-30.el7

How reproducible:
always

Steps to Reproduce:
1. Make sure networking is deconfigured (no IPv4 address on the primary NIC)
2. docker pull fedora
3. ifup <interface>
4. docker pull fedora
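To repeat the four steps above many times, they can be scripted. The following is a minimal Python sketch, not part of the original report: it assumes `ifdown`/`ifup` are available (step 3 uses `ifup`; `ifdown` is an assumed way to perform step 1) and uses `eth0` as a hypothetical interface name. The helper `reproduce` is likewise hypothetical. The command runner is injectable so the sequence can be checked without touching the network.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Default runner: execute argv and return its exit status."""
    return subprocess.run(list(argv)).returncode


def reproduce(iface: str = "eth0", run: Runner = run_command) -> List[int]:
    """Replay the steps to reproduce; return the exit codes of both pulls.

    Hypothetical helper: 'eth0' and the use of ifdown are assumptions.
    """
    run(["ifdown", iface])                      # 1. deconfigure networking
    first = run(["docker", "pull", "fedora"])   # 2. expected to fail (no network)
    run(["ifup", iface])                        # 3. restore networking
    second = run(["docker", "pull", "fedora"])  # 4. should succeed once DNS works
    return [first, second]
```

On an affected host, the "Actual results" below would show up as a non-zero exit code from the second pull as well as the first.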

Actual results:
'docker pull' still cannot resolve index.docker.io after the internet connection is restored, even though 'ping index.docker.io' works.

Expected results:
'docker pull' works as soon as internet connectivity is restored, even if an earlier attempt failed.

Additional info:

Comment 2 Michal Minar 2015-04-02 09:46:42 UTC
Created attachment 1010091
go reproducer

This looks like a golang issue. I wrote a simple Go reproducer (attached) that pings the Docker index, shuts down the interface, pings again, brings the interface back up, and pings once more.
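The ping/down/up/ping sequence can be sketched in Python for illustration. This is a hypothetical reconstruction from the description, not the attached Go program or the attached Python rewrite. It assumes the index answers `GET <index>/_ping` and reports its version in an `X-Docker-Registry-Version` header (an assumption), and it toggles the interface with `nmcli d disconnect`/`nmcli d connect eth0`, as in the one-liner used later in this comment. The function names are hypothetical; the callables are injectable so the sequence can be exercised offline.

```python
import subprocess
import time
import urllib.request
from typing import Callable, List

INDEX = "https://index.docker.io/v1"


def ping_index(index: str = INDEX, timeout: float = 10.0) -> bool:
    """GET <index>/_ping and report whether it returned HTTP 200."""
    print("Pinging index", index)
    try:
        with urllib.request.urlopen(index + "/_ping", timeout=timeout) as resp:
            # Assumed header name carrying the registry version.
            version = resp.headers.get("X-Docker-Registry-Version")
            print("Ping successful. StatusCode=%d, Registry Version: %s"
                  % (resp.status, version))
            return resp.status == 200
    except OSError as err:
        # URLError (including a wrapped DNS lookup failure) subclasses OSError.
        print("Failed to ping %s: %s" % (index, err))
        return False


def nmcli_device(action: str, iface: str = "eth0") -> None:
    """Run 'nmcli d <action> <iface>'; 'eth0' is an assumed interface name."""
    subprocess.run(["nmcli", "d", action, iface], check=True)


def ping_after_iface_restart(
    ping: Callable[[], bool] = ping_index,
    iface_down: Callable[[], None] = lambda: nmcli_device("disconnect"),
    iface_up: Callable[[], None] = lambda: nmcli_device("connect"),
    sleep: Callable[[float], None] = time.sleep,
) -> List[bool]:
    """Ping, take the interface down, ping, bring it up, ping again."""
    results = [ping()]       # 1. baseline: network is up
    iface_down()
    sleep(1.0)               # 1 s pause before each new ping
    results.append(ping())   # 2. expected to fail: no network
    iface_up()
    sleep(1.0)
    results.append(ping())   # 3. expected to succeed; the Go reproducer often failed here on RHEL7
    return results
```

Calling `ping_after_iface_restart()` with its defaults needs NetworkManager and enough privileges to disconnect the device. Passing fakes for `ping`, `iface_down`, `iface_up`, and `sleep` exercises the same sequence without a network.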

If I run it on RHEL7 (golang-1.3.3-3.el7), I get:
$ go run ping-after-iface-restart.go
Pinging index https://index.docker.io/v1
Ping successful. StatusCode=200, Registry Version: 0.6.3
Shutting down interface "eth0".
Pinging index https://index.docker.io/v1
Failed to ping https://index.docker.io/v1: Get https://index.docker.io/v1/_ping: dial tcp: lookup index.docker.io: no such host
Bringing interface "eth0" back up.
Pinging index https://index.docker.io/v1
Failed to ping https://index.docker.io/v1: Get https://index.docker.io/v1/_ping: dial tcp: lookup index.docker.io: no such host

^^ Not always, but fairly often, even though there is a 1-second sleep before each new ping.

However, the same program on Fedora rawhide (golang-1.4.2-1.fc23) produces:
Pinging index https://index.docker.io/v1
Ping successful. StatusCode=200, Registry Version: 0.6.3
Shutting down interface "eth0".
Pinging index https://index.docker.io/v1
Failed to ping https://index.docker.io/v1: Get https://index.docker.io/v1/_ping: dial tcp: lookup index.docker.io: no such host
Bringing interface "eth0" back up.
Pinging index https://index.docker.io/v1
Ping successful. StatusCode=200, Registry Version: 0.6.3

^^ Always.

Now again on RHEL7, with a slight modification: rerun the program binary right after the interface is restarted.
$ go run ping-after-iface-restart.go --once; nmcli d disconnect eth0; nmcli d connect eth0; go run ping-after-iface-restart.go --once
Pinging index https://index.docker.io/v1
Ping successful. StatusCode=200, Registry Version: 0.6.3
Device 'eth0' successfully disconnected.
Device 'eth0' successfully activated with 'b3431d45-3f8d-47cd-ad11-71b737eec98b'.
Pinging index https://index.docker.io/v1
Ping successful. StatusCode=200, Registry Version: 0.6.3

^^ Always.

I'll try the same on RHEL7 with a newer golang, if I manage to build it.

Comment 3 Michal Minar 2015-04-02 09:51:45 UTC
Created attachment 1010094
go reproducer

Updated the script (the previous version lacked the `--once` option).

Comment 4 Michal Minar 2015-04-03 10:39:10 UTC
Created attachment 1010593
ping in python

The Go reproducer rewritten in Python.

Comment 5 Michal Minar 2015-04-03 10:43:34 UTC
So rawhide's golang (1.4.2), built on RHEL7, suffers from the same issue. However, the same program written in Python (attached) behaves correctly: the third ping always succeeds.

I'll create a new bug against the golang component as a blocker for this one.

Comment 6 Michal Minar 2015-04-03 11:02:57 UTC
Actually, there is a difference between golang 1.3.3 and 1.4.2: out of 100 runs, the former succeeded 48% of the time and the latter 72% :-).
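Success rates like these come from repeating the down/up cycle and counting how often the final ping succeeds. A minimal Python sketch of that measurement, assuming a hypothetical `trial` callable that runs one full cycle and returns whether the last ping succeeded (the report does not say how the 100 runs were driven):

```python
from typing import Callable


def success_rate(trial: Callable[[], bool], runs: int = 100) -> float:
    """Run `trial` `runs` times and return the percentage of successes."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    successes = sum(1 for _ in range(runs) if trial())
    return 100.0 * successes / runs
```

Each trial should start from a working network, so that a failure on the final ping reflects the lookup after restart and not leftover state from the previous iteration.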

Comment 7 Daniel Walsh 2015-04-14 19:59:14 UTC
If this is a golang issue, we should close it upstream.