Bug 1189840 - [atomic] Negative DNS caching of the dockerd breaks docker pull
Summary: [atomic] Negative DNS caching of the dockerd breaks docker pull
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Atomic
Classification: Retired
Component: docker-io
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Michal Minar
QA Contact: Ladislav Jozsa
URL:
Whiteboard:
Depends On: 1208834
Blocks: 1147383
Reported: 2015-02-05 15:26 UTC by Ladislav Jozsa
Modified: 2015-06-03 12:18 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-06-03 12:18:16 UTC
RHEL 7.3 requirements from Atomic Host:
Embargoed:


Attachments
go reproducer (2.74 KB, text/plain)
2015-04-02 09:46 UTC, Michal Minar
go reproducer (1.44 KB, text/plain)
2015-04-02 09:51 UTC, Michal Minar
ping in python (1.61 KB, text/plain)
2015-04-03 10:39 UTC, Michal Minar

Description Ladislav Jozsa 2015-02-05 15:26:58 UTC
Description of problem:
When running 'docker pull fedora' without an internet connection (networking down), dockerd remembers that it cannot resolve index.docker.io and keeps failing to pull the image even after networking is back up.

Version-Release number of selected component (if applicable):
docker-1.4.1-30.el7

How reproducible:
always

Steps to Reproduce:
1. Make sure networking is deconfigured - no IPv4 address on the primary NIC
2. docker pull fedora
3. ifup <interface>
4. docker pull fedora

Actual results:
'docker pull' still fails to resolve index.docker.io after the internet connection is restored, while 'ping index.docker.io' works fine.

Expected results:
'docker pull' works as soon as internet connectivity is restored, even if an earlier attempt failed.

Additional info:

Comment 2 Michal Minar 2015-04-02 09:46:42 UTC
Created attachment 1010091 [details]
go reproducer

This looks like a golang issue. I wrote a simple Go reproducer (attached) that pings the Docker index, shuts down the interface, pings again, brings the interface back up, and pings once more.
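A minimal Go sketch of such a reproducer, for readers without access to the attachment. It is not the attached ping-after-iface-restart.go; it assumes the interface is named eth0, toggles it with 'nmcli d disconnect' / 'nmcli d connect' (the commands used in the --once run in this comment), and assumes the v1 index reports its version in the X-Docker-Registry-Version response header. Keep-alives are disabled so every ping dials a new connection and therefore performs a fresh name lookup.

```go
package main

// Hypothetical reproducer sketch; NOT the attached ping-after-iface-restart.go.
// Assumptions: NIC is "eth0", nmcli is available and the caller may change
// device state, and the registry version comes from X-Docker-Registry-Version.
import (
	"fmt"
	"net/http"
	"os/exec"
	"strings"
	"time"
)

const indexURL = "https://index.docker.io/v1"

// pingURL returns the v1 ping endpoint for an index URL.
func pingURL(index string) string {
	return strings.TrimRight(index, "/") + "/_ping"
}

// ping performs one GET against the ping endpoint and prints the outcome.
func ping(client *http.Client, index string) {
	fmt.Println("Pinging index", index)
	resp, err := client.Get(pingURL(index))
	if err != nil {
		fmt.Printf("Failed to ping %s: %v\n", index, err)
		return
	}
	defer resp.Body.Close()
	fmt.Printf("Ping successful. StatusCode=%d, Registry Version: %s\n",
		resp.StatusCode, resp.Header.Get("X-Docker-Registry-Version"))
}

// nmcli runs "nmcli d <action> <iface>" and reports any failure.
func nmcli(action, iface string) {
	out, err := exec.Command("nmcli", "d", action, iface).CombinedOutput()
	if err != nil {
		fmt.Printf("nmcli d %s %s failed: %v: %s\n", action, iface, err, out)
	}
}

func main() {
	const iface = "eth0" // assumption: name of the primary NIC
	client := &http.Client{
		Timeout: 10 * time.Second,
		// No connection reuse: each ping must dial, and so resolve, again.
		Transport: &http.Transport{DisableKeepAlives: true},
	}

	ping(client, indexURL)
	fmt.Printf("Shutting down interface %q.\n", iface)
	nmcli("disconnect", iface)
	time.Sleep(time.Second)
	ping(client, indexURL)
	fmt.Printf("Bringing interface %q back up.\n", iface)
	nmcli("connect", iface)
	time.Sleep(time.Second) // 1 second pause before the next ping
	ping(client, indexURL)
}
```

All three pings run inside one long-lived process, which is the situation dockerd is in; rerunning the binary between pings (the --once variant) starts each lookup in a fresh process instead.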

If I run it on RHEL7 (golang-1.3.3-3.el7), I get:
$ go run ping-after-iface-restart.go
Pinging index https://index.docker.io/v1
Ping successful. StatusCode=200, Registry Version: 0.6.3
Shutting down interface "eth0".
Pinging index https://index.docker.io/v1
Failed to ping https://index.docker.io/v1: Get https://index.docker.io/v1/_ping: dial tcp: lookup index.docker.io: no such host
Bringing interface "eth0" back up.
Pinging index https://index.docker.io/v1
Failed to ping https://index.docker.io/v1: Get https://index.docker.io/v1/_ping: dial tcp: lookup index.docker.io: no such host

^^ Not always, but fairly often, even though there's a 1-second sleep before each new ping.

However, the same program on Fedora rawhide (golang-1.4.2-1.fc23) produces:
Pinging index https://index.docker.io/v1
Ping successful. StatusCode=200, Registry Version: 0.6.3
Shutting down interface "eth0".
Pinging index https://index.docker.io/v1
Failed to ping https://index.docker.io/v1: Get https://index.docker.io/v1/_ping: dial tcp: lookup index.docker.io: no such host
Bringing interface "eth0" back up.
Pinging index https://index.docker.io/v1
Ping successful. StatusCode=200, Registry Version: 0.6.3

^^ Always.

Now again on RHEL7, with a slight modification: rerun the program binary just after the interface is restarted.
$ go run ping-after-iface-restart.go --once; nmcli d disconnect eth0; nmcli d connect eth0; go run ping-after-iface-restart.go --once
Pinging index https://index.docker.io/v1
Ping successful. StatusCode=200, Registry Version: 0.6.3
Device 'eth0' successfully disconnected.
Device 'eth0' successfully activated with 'b3431d45-3f8d-47cd-ad11-71b737eec98b'.
Pinging index https://index.docker.io/v1
Ping successful. StatusCode=200, Registry Version: 0.6.3

^^ Always.

I'll try the same on RHEL7 with a newer golang, if I manage to build it.

Comment 3 Michal Minar 2015-04-02 09:51:45 UTC
Created attachment 1010094 [details]
go reproducer

Updated the script (the previous version lacked the `--once` option).

Comment 4 Michal Minar 2015-04-03 10:39:10 UTC
Created attachment 1010593 [details]
ping in python

The Go reproducer rewritten in Python.

Comment 5 Michal Minar 2015-04-03 10:43:34 UTC
So rawhide's golang (1.4.2), compiled on RHEL7, suffers from the same issue. However, the same program written in Python (attached) behaves correctly: the third ping always succeeds.

I'll file a new bug against the golang component as a blocker for this one.

Comment 6 Michal Minar 2015-04-03 11:02:57 UTC
Actually, there is a difference between golang 1.3.3 and 1.4.2: measured over 100 runs, the former succeeds 48% of the time and the latter 72% :-).
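Out of 100 runs, 48% and 72% correspond to 48 and 72 successful runs. A sketch of a harness for repeating such a measurement; the successRate name is hypothetical, the bug does not say how the runs were scripted, and the demo trial below is a stub, not a real interface down/up/ping cycle:

```go
package main

// Hypothetical measurement harness; the original runs' scripting is unknown.
import "fmt"

// successRate runs trial n times and returns the percentage of runs
// for which trial reported success.
func successRate(n int, trial func() bool) float64 {
	if n <= 0 {
		return 0
	}
	ok := 0
	for i := 0; i < n; i++ {
		if trial() {
			ok++
		}
	}
	return 100 * float64(ok) / float64(n)
}

func main() {
	// Stub trial for demonstration only: succeeds on every other run.
	// A real trial would take the interface down and up, then ping the index.
	i := 0
	stub := func() bool {
		i++
		return i%2 == 0
	}
	fmt.Printf("success rate: %.0f%%\n", successRate(100, stub))
}
```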

Comment 7 Daniel Walsh 2015-04-14 19:59:14 UTC
If this is a golang issue, we should close it upstream.

