Description of problem:
See the following details.

Version-Release number of selected component (if applicable):
openshift v3.4.0.30+e10cc28
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

How reproducible:
Always

Steps to Reproduce:
1. Set openshift_dns_ip=<a-non-existing-ip>, e.g: 172.30.0.1
2. After installation, log into docker-registry via the oc rsh command.

# oc rsh docker-registry-2-n4ms1
sh-4.2$ cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal lab.sjc.redhat.com
nameserver 172.30.0.2
nameserver 192.168.2.15
options ndots:5

# curl google.com => PASS

3. Try to log in to this docker-registry via the docker command.

Actual results:
# docker login -u unused -p $(oc sa get-token builder -n openshift3) 172.30.19.67:5000
Error response from daemon: Get http://172.30.19.67:5000/v2/: Get http://172.30.19.67:5000/openshift/token?account=unused&client_id=docker&offline_token=true: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (Client.Timeout exceeded while awaiting headers)

The sti build will also fail.

# oc logs django-psql-example-1-build -n install-test
<--snip-->
Pushing image 172.30.19.67:5000/install-test/django-psql-example:latest ...
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Expected results:
When the first DNS server is down but the second DNS server is still available, docker login should succeed.

Additional info:
There are several similar issues reported on GitHub:
https://github.com/docker/docker/issues/22635#issuecomment-260063252
https://github.com/concourse/concourse/issues/374#issuecomment-211466240

After I corrected dnsIP in node-config.yaml, restarted the node server, and re-deployed the docker-registry pod so that a working DNS server is listed at the top of /etc/resolv.conf in the container, this issue disappeared.
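For anyone checking a pod for the same condition, here is a minimal Python sketch that reads the resolver configuration in the order the resolver will see it. The function name parse_resolv_conf and the SAMPLE constant are hypothetical helpers, not part of OpenShift or docker; SAMPLE is the /etc/resolv.conf content quoted in the report above. It only handles the three keywords that appear there (nameserver, search, options ndots).

```python
# Contents of /etc/resolv.conf as quoted in the report.
SAMPLE = """\
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal lab.sjc.redhat.com
nameserver 172.30.0.2
nameserver 192.168.2.15
options ndots:5
"""


def parse_resolv_conf(text):
    """Return (nameservers in listed order, search domains, ndots)."""
    nameservers, search, ndots = [], [], 1  # ndots defaults to 1
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        keyword, args = fields[0], fields[1:]
        if keyword == "nameserver" and args:
            nameservers.append(args[0])
        elif keyword == "search":
            search = args
        elif keyword == "options":
            for opt in args:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return nameservers, search, ndots


nameservers, search, ndots = parse_resolv_conf(SAMPLE)
print("first nameserver tried:", nameservers[0])
print("fallback nameservers:", nameservers[1:])
```

The first entry in the returned list is the server that has to time out before any fallback is consulted, which is why re-deploying the pod with a working server listed first made the problem disappear.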
Jianlin,

> 1. Set openshift_dns_ip=<a-non-existing-ip>, e.g: 172.30.0.1

So you're saying that isn't a valid ip? That's not the kubernetes service ip?

Can you provide the logs from the registry too?

Given this sounds like a deliberate misconfiguration I'm going to mark this UpcomingRelease.
Created attachment 1228815 [details] docker registry log
(In reply to Scott Dodson from comment #1)
> Jianlin,
>
> > 1. Set openshift_dns_ip=<a-non-existing-ip>, e.g: 172.30.0.1
>
> So you're saying that isn't a valid ip? That's not the kubernetes service ip?
>

Sorry for my typo. I was setting "openshift_dns_ip=172.30.0.2"

> Can you provide the logs from the registry too?

Attached.
Re-assigning to the Image Registry component. However, reviewing the logs attached in comment #3, it doesn't look like it's actually a dns failure but a failure to connect to the api server, based on this log entry:

time="2016-12-07T02:53:43.511149145Z" level=debug msg="invalid token: Get https://openshift-136.lab.sjc.redhat.com:443/oapi/v1/users/~: dial tcp: i/o timeout" go.version=go1.7.3 http.request.host="172.30.245.180:5000" http.request.id=80af04bb-3e33-4502-9c59-5c78c3172260 http.request.method=GET http.request.remoteaddr="10.128.0.1:40022" http.request.uri="/openshift/token?account=unused&client_id=docker&offline_token=true" http.request.useragent="docker/1.12.3 go/go1.6.2 git-commit/8b91553-redhat kernel/3.10.0-514.2.2.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.12.3 \\(linux\\))" instance.id=f90a06b7-512b-4633-96eb-b8517dce4b08

If it is a dns issue, then it'd be the registry's dns resolver library not properly failing over from one dns server to the other.
With v3.6.0-alpha.2 and go v1.7.5, it tries to use another DNS server after waiting for 20 seconds. Which timeout do you have for master api calls from the registry?
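To illustrate why a slow failover can look like a hard failure to the client, here is a rough Python sketch of how the delay can add up. It rests on assumptions that are not confirmed anywhere in this bug: that the resolver applies the usual search-list rule (names with fewer dots than ndots get the search suffixes tried first), that every candidate name is sent to the unreachable first server before the second one answers, and that the per-query timeout is 5 seconds. The function names candidate_names and delay_before_answer are hypothetical, and the result is not meant to reproduce the 20 seconds observed above.

```python
def candidate_names(name, search, ndots):
    """Names a resolver would try, in order, under the usual ndots rule."""
    if name.endswith("."):
        return [name]  # already fully qualified, search list is skipped
    absolute = name + "."
    suffixed = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + suffixed
    return suffixed + [absolute]


def delay_before_answer(candidates, per_query_timeout_s):
    """Simplified model (assumption): each candidate first times out on the
    dead server, then is answered immediately by the working one."""
    return len(candidates) * per_query_timeout_s


# Search list and ndots from the resolv.conf in the original report;
# the host name is the master API host from the registry log.
search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local",
          "openstacklocal", "lab.sjc.redhat.com"]
names = candidate_names("openshift-136.lab.sjc.redhat.com", search, ndots=5)

# 5 seconds is an assumed per-query timeout, not a measured value.
print(len(names), "candidate names,",
      delay_before_answer(names, per_query_timeout_s=5), "seconds in this model")
```

If the client's own timeout is shorter than the figure this kind of model produces, the request is cancelled before the working nameserver ever gets to answer the final name, which would match the "Client.Timeout exceeded" error in the report.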
(In reply to Oleg Bulatov from comment #5)
> v3.6.0-alpha.2, go v1.7.5 - it tries to use another DNS server after waiting
> for 20 second. Which timeout do you have for master api calls from the
> registry?

I am not deeply familiar with how the api and the registry interact. Can you tell me how to get that timeout value?
Much has probably changed in this space since 3.6. Can you confirm it's still an issue in 3.9?
In 3.9, I have no way to reproduce the scenario from the initial report in 3.4. Now only dnsIP is kept in /etc/resolv.conf in the pod, and there is no way to add a second nameserver.

# oc rsh docker-registry-2-bhlbg
sh-4.2$ cat /etc/resolv.conf
nameserver 172.16.120.117
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal bb.com
options ndots:5

So this scenario is not applicable to version 3.9.

But I tried some negative testing: I set dnsIP to a wrong IP and then tried docker login, and it succeeded.

# oc rsh docker-registry-2-wrcvw
sh-4.2$ cat /etc/resolv.conf
nameserver 172.16.120.7
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal bb.com
options ndots:5

# docker login -u unused -p $(oc sa get-token builder -n openshift) 172.31.43.231:5000
Login Succeeded

# openshift version
openshift v3.9.0-0.16.0
kubernetes v1.9.0-beta1
etcd 3.2.8
Ok, going to close this out then. Sorry we let it sit so long. (I am surprised it works even with a completely wrong dns value).